I haven’t posted any ramblings about information retrieval theory in a while. Some of you might be grateful for this lull, but this post is for those of you who miss such thoughts. Everyone else: you’ve been warned!
Here’s what I’ve been thinking about. At one extreme, we have set retrieval, which, given a query, divides a corpus into two subsets corresponding to those documents the system believes to be relevant and those it does not–a binary split. At the other extreme, we have ranked retrieval, which orders documents according to their estimated likelihood of relevance. Given the poor reputation of extremism, I want to explore the space between these extremes.
In both extreme cases, the system returns an ordered sequence of subsets of the corpus, and I propose we consider this as a general framework, which we might call ranked set retrieval. In the first case, the system returns two sets; in the second case, it returns as many singleton sets as there are documents in the corpus. In practice, of course, even ranked retrieval systems tend to dismiss some subset of the corpus as irrelevant, which we can model in our ranked set retrieval framework by appending that subset to the end of the ranked sequence of singletons.
Now that we can consider set retrieval and ranked retrieval in the same framework, we can ask interesting questions and reason about how they should inform the evaluation criteria for information retrieval systems.
For example, when is set retrieval a more appropriate response to a query than ranked retrieval? An easy–though only partial–answer there is evident from symmetry: set retrieval is more appropriate in cases where our estimates of relevance are themselves binary, and where we thus have no principled basis for a finer-grained partition. Hence, given such binary relevance assessments, our retrieval algorithm should recognize that our optimal response is to return two subsets. Conversely, the more fine-grained our estimates of relevance, the greater a basis we have for returning more subsets and including those documents estimated to be more relevant in earlier subsets. At the extreme, the relevance estimates for all documents may be so well separated that the optimal response is, in fact, to return a sequence of singleton sets as per conventional ranked retrieval.
Of course, the interesting cases are in between, i.e., where the optimal response to a query is a collection of subsets corresponding to varying ranges of relevance assessment. Or perhaps we should go beyond bucketing by relevance estimates, and instead optimize for the probability that one of the offered subsets has a high utility reflecting a combination of precision and recall. We could then ordering the subsets by their utility. In fact, a utility measure for such an approach could be recursive–since each subset is really a subquery or query refinement that can then be partitioned into ranked subsets. Indeed, such a recursive approach closely models the behavior we see with information retrieval systems that support interaction.
Why does this subject concern me so much? It’s not just that I’d like to see robust evaluation measures for faceted search and clustering–I’d like to see measures that are able to compare them against ranked retrieval in a common framework, without having to depend on user studies.
Perhaps I’m naively rediscovering paths already explored by folks like Yi Zhang and Jonathan Koren. Their notion of “expected utility based evaluation” does strike a chord. But I don’t see them or anyone else taking the next step and using such an approach to compare the apples and oranges of set and ranked retrieval methods. It’s a missed opportunity, and maybe even a way to bring IR respectability to approaches designed for interactive and exploratory search. If IR can’t come to HCIR, perhaps HCIR can come to IR.
10 replies on “Ranked Set Retrieval”
I think Fernando Diaz did some interesting work, if not directly on the things you are talking about, at least one something related. Check out his 2008 dissertation:
This approach makes sense. In fact, it reminds me a bit of scatter/gather. One HCI concern is people’s expectations. Search results will need to be presented in a palatable way. Perhaps this approach may work better for injecting search results into other tasks rather than serving the up raw.
Jeremy, you’re right, at least in that his future work suggestion on optimal cluster retrieval (section 8.3.1) is very much in this vein. I’m a fan of Fernando’s work, though I’ll confess I’ve never read his dissertation from cover to cover. Do you know if he or anyone else has followed up on these ideas?
Actually, Fernando, maybe you’re reading this yourself and can chime in.
Gene, I like scatter/gather, but the question I think it leaves is how you compare it to a ranked retrieval approach. For that matter, how could a system decide between the two for the same query? I feel like there’s a false dichotomy between clustering / query refinement / set-oriented approaches and ranked ones that is impeding research progress. Indeed, I think this is a major issue for the HCIR community, at least with respect to finding common ground with IR traditionalists.
bah – i just wrote a nice long comment and it got lost. It was along the lines of:
This post has been stewing in the back of my head… yada yada… Bringing the two communities has been a long time coming. Its on the way… I agree with your post… then:
There was an interesting paper at SearchSolutions2008 in London last year. It presented a faceted system they had built that applied weightings to the index, rather than performing set retrieval. So it was facets, but returning a ranked list, in conjunction with keywords. It was presented by Lemur Consulting.
Well, there’s no reason you can’t rank the list of results returned in a faceted search engine. We do that at Endeca all the time. But that’s still a sequence of singleton sets: ranked retrieval with the implied appended subset of non-matching documents at the end.
But, since we return facet values associated with sets of documents, you could say that each facet value represents a subset of the result set, and that our ordering of facet values is a more general kind of ranked set retrieval. In fact, you can even see it recursively, since selecting one of these values will lead to another such set.
But we have never approached the subject with the theoretical rigor needed for something as ambitious as I outlined above. I don’t think anyone has.
[…] of retrieval. The two are complementary: for example, recently Daniel Tunkelang posted about using sets rather than ranked lists as a way of representing search results. This has implications on one hand […]
[…] Fuhr’s model. I suspect I was trying to back into a similar approach in my post on “Ranked Set Retrieval” a few months ago. But, even if I’d try to formalize my proposal, I don’t believe […]
As you said, there is nothing with tiered/ranked set retrieval that prevents you from ranking the documents within each tier.
For example, you can always rank the documents within each set, say according to the ranking formula used in ranked retrieval. Then, if the result sets are ranked relative to all the other sets, you get a global ranked list of documents, even for your tiered method, and you can directly compare the two approaches.
In a flexible search engine e.g. Indri, the independence between filtering (ranking sets) and ranking documents is very clear. There are two kinds of query operators, the Boolean filtering kind for tiered ranking, and the scoring kind, for ranking. Users can mix these two kinds of operators in their queries. However, the result will always be a single ranked list.
The only problem might be result clustering. As there is no ranking across the different clusters, and that changes the result presentation, thus, might be difficult to directly compare with ranked retrieval. I guess this is where the problem really is? Not the previous set ranking case?
Le, you’re correct that the system can mix up the set retrieval and ranked retrieval modes–indeed, that’s the whole point of this model. But forcing a total ordering on the individual documents defeats the point of the model. The idea is to use such a model to optimize for interaction.
For example, let’s say that you can return 10 result elements in response to a query, where a result element is a set of 1 or more documents. What should those 10 result elements be? This is the sort of question that invites a framework like the one proposed by Norbert Fuhr.