The Noisy Channel

 

Recommending Diversity

November 17th, 2008 · 10 Comments · General

Another nice post from Daniel Lemire today, this time about a paper by Mi Zhang and Neil Hurley on “Avoiding monotony: improving the diversity of recommendation lists” (ACM Digital Library subscription required to see full text).

Here’s an abstract of the abstract:

Noting that the retrieval of a set of items matching a user query is a common problem across many applications of information retrieval, we model the competing goals of maximizing the diversity of the retrieved list while maintaining adequate similarity to the user query as a binary optimization problem.

It’s nice to see a similarity vs. diversity trade-off for recommendations analogous to the precision vs. recall trade-off for typical information retrieval evaluation.
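To make the trade-off concrete, here is a minimal Python sketch. It is not Zhang and Hurley's method: they pose a binary optimization problem, whereas this is a simple greedy heuristic swapped in for illustration. Everything in it is an assumption made for the example: the item names, the similarity scores, the attribute sets, the use of Jaccard distance, and the `theta` weight that balances similarity against diversity.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def intra_list_diversity(items, features):
    """Average pairwise distance of a list; 0.0 for fewer than two items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard_distance(features[x], features[y]) for x, y in pairs)
    return total / len(pairs)


def greedy_diverse_list(similarity, features, k, theta=0.5):
    """Greedily pick up to k items, trading similarity against diversity.

    theta=1.0 ranks purely by similarity to the query; theta=0.0 ranks
    purely by average distance from the items already chosen.
    """
    chosen = []
    candidates = set(similarity)

    def score(item):
        if chosen:
            novelty = sum(
                jaccard_distance(features[item], features[c]) for c in chosen
            ) / len(chosen)
        else:
            novelty = 0.0
        return theta * similarity[item] + (1 - theta) * novelty

    while candidates and len(chosen) < k:
        # Iterating in sorted order makes ties resolve to the smallest name.
        best = max(sorted(candidates), key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen


# Hypothetical data: similarity of each item to the query, plus attributes.
similarity = {"a1": 0.9, "a2": 0.85, "a3": 0.8, "b1": 0.6, "c1": 0.5}
features = {
    "a1": frozenset({"rock", "guitar"}),
    "a2": frozenset({"rock", "guitar"}),
    "a3": frozenset({"rock", "drums"}),
    "b1": frozenset({"jazz", "piano"}),
    "c1": frozenset({"classical", "violin"}),
}

for theta in (1.0, 0.5):
    picks = greedy_diverse_list(similarity, features, k=3, theta=theta)
    print(theta, picks, round(intra_list_diversity(picks, features), 3))
```

With these made-up numbers, lowering `theta` from 1.0 to 0.5 swaps a near-duplicate rock item for the jazz item, so the list gives up a little similarity and gains diversity. Note that the diversity score needs no relevance judgments at all, only a distance between items.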

Our experience at Endeca is certainly that most of the approaches out there underemphasize diversity, which not only leads to the “monotony” problem but also breaks down when the query does not unambiguously express the user’s intent. Since our approach emphasizes interaction, we leverage the diversity of the options we present to maximize the opportunity for users to make progress in satisfying their information needs.

I would like to second Daniel Lemire’s suggestion to perform user studies to investigate the optimal balance between diversity and accuracy. They’d make for great papers. Just remember to send him (and me!) copies!

10 responses so far ↓

  • 1 Daniel Lemire // Nov 17, 2008 at 5:13 pm

    One nice thing about recall/precision is that, to compute them, you just need to know whether document x is relevant to query y. Measuring diversity requires that you also know the “distance” between documents x1 and x2. It is not hard, but it adds an extra layer of math.

  • 2 Daniel Tunkelang // Nov 17, 2008 at 10:52 pm

    True. But the nice thing about diversity is that it’s an unsupervised measure. I’m a big fan of such unsupervised set measures, like the query-less version of query clarity that we use at Endeca. There really should be unsupervised analogs of the precision vs. recall trade-off that can be used in the absence of relevance assessments.

  • 3 Max L. Wilson // Dec 5, 2008 at 8:43 am

    Diversity in results seems surprisingly underrepresented in both academia and industry. I saw a couple of posters at SIGIR07 on it, but they were largely calculating the diff between two pages, and not reporting what made each unique! It seems all the more important in faceted search, for example, where every result is often equally related to the selections made by the user. A group that presented at SearchSolutions2008 weights every attribute given to an object in faceted search. I’m not sure how they can put a weighting on things like the manufacturer.

  • 4 Daniel Tunkelang // Dec 6, 2008 at 11:14 pm

    Indeed, it’s hard to do research on a problem that doesn’t have accepted measures for evaluation.

    As for putting a weight on each attribute, I suppose you can turn anything into a vector space (e.g., by treating each manufacturer as a binary value). But distance measures can do funny things in high-dimensional spaces.

  • 5 When Recommendations Become a Problem - Things On Top // Mar 2, 2009 at 3:39 pm

    [...] so we cover up by design systems that encourage exploration and discovery. We strive for transparency and diversity, designing an experience for the eager and motivated user. But for some of us, the Paradox of [...]

  • 6 jeremy // Mar 2, 2009 at 6:16 pm

    I’ve talked with insiders at Google, and they claim that they strive to automatically generate diversity in their rankings. I believe what I’ve heard, but Google is still such a black box that I have no idea how to evaluate or understand exactly how much diversity is being created.

  • 7 Daniel Tunkelang // Mar 2, 2009 at 6:58 pm

    “This is the black box. The diversity you asked for is inside.”

    With apologies to Saint Exupéry.

  • 8 jeremy // Mar 2, 2009 at 10:12 pm

    And how does the story continue?

    ---
    I was very surprised to see a light break over the face of my young relevance assessor:

    “That is exactly the way I wanted it! Do you think that this diversity will have to have a great deal of click-throughs?”

    “Why?”

    “Because where I live everything must be relevant…”

    “There will surely be enough click-throughs from it,” I said. “It is a very large diversity that I have given you.”

    He bent his head over the SERP.

    “Not so large that–Look! It has gone to sleep…”
    ---

    Non? ;-)

  • 9 Daniel Tunkelang // Mar 2, 2009 at 11:52 pm

    “Don’t you see–I am very busy with matters of consequence!”

  • 10 jeremy // Mar 3, 2009 at 2:54 am

    Busy letting the boa constrictors of navigational, known-item search swallow the elephants of relevance? ;-)
