The Noisy Channel


A Question of User Expectations

October 25th, 2010 · 17 Comments · General

Ideally, a search engine would read the user’s mind. Shy of that, a search engine should provide the user with an efficient process for expressing an information need and then provide the user with results relevant to that need.

From an information scientist’s perspective, these are two distinct problems to solve in the information seeking process: establishing the user’s information need (query elaboration) and retrieving relevant information (information retrieval).

When open-domain search engines (i.e., web search engines) went mainstream in the late 1990s, they did so by glossing over the problem of query elaboration and focusing almost entirely on information retrieval. More precisely, they addressed the query elaboration problem by requiring users to provide reasonable queries and search engines to infer information needs from those queries. In recent years, there has been more explicit support for query elaboration–most notably in the form of type-ahead query suggestions (e.g., Google Instant). There have also been a variety of efforts to offer related queries as refinements.

But even with such support, query elaboration typically yields an informal, free-text string. All vocabularies have their flaws, but search engines compound the inherent imprecision of language by not even trying to guide users to a common standard. At best, query suggestion nudges users towards more popular–and hopefully more effective–queries.

In contrast, consider closed-domain search engines that operate on curated collections, e.g., the catalog search for an ecommerce site. These search engines often provide users with the opportunity to express precise queries, e.g., black digital cameras for under $250. Moreover, well-designed sites offer users faceted search interfaces that support progressive query elaboration through guided refinements.
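To make this concrete, a closed-domain query like the one above amounts to a conjunction of facet filters over curated metadata. Here is a minimal sketch in Python; the catalog, field names, and products are invented for illustration, not drawn from any real site:

```python
# Toy curated catalog; every item carries hand-assigned metadata.
CATALOG = [
    {"name": "Acme Snap 100", "category": "digital camera",
     "color": "black", "price": 199.99},
    {"name": "Acme Snap 200", "category": "digital camera",
     "color": "silver", "price": 229.99},
    {"name": "Acme Pro X", "category": "digital camera",
     "color": "black", "price": 499.00},
]

def faceted_search(catalog, category=None, color=None, max_price=None):
    """Return items satisfying every supplied facet filter."""
    results = []
    for item in catalog:
        if category is not None and item["category"] != category:
            continue
        if color is not None and item["color"] != color:
            continue
        # "under $250" means strictly below the price ceiling
        if max_price is not None and item["price"] >= max_price:
            continue
        results.append(item)
    return results

hits = faceted_search(CATALOG, category="digital camera",
                      color="black", max_price=250)
```

Each refinement in a faceted interface simply adds one more filter, which is what makes progressive query elaboration so natural in the closed-domain setting.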

Many (though not all) closed-domain search engines have an advantage over their open-domain counterparts: they can rely on manually curated metadata. The scale and heterogeneity of the open web defies human curation. Perhaps we’ll reach a point when automatic information extraction offers quality competitive with curation, but we’re not there yet. Indeed, the lack of good, automatically generated metadata has been cited as the top challenge facing those who would implement faceted search for the open web.

What can we do in the meantime? Here is a simple idea: use a closed-domain search engine to guide users to precise queries, and then apply the resulting queries to the open web. In other words, mash up the closed and open collections.

Of course, this is easier said than done. It is not at all clear if or how we can apply a query like “black digital cameras for under $250” to a collection that is not annotated with the necessary metadata. But we can certainly try. And our ability to perform information retrieval from structured queries will improve over time–in fact, it may even improve more quickly if we can start to assume that users are being guided to precise, unambiguous queries.
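One naive way to bridge the two worlds is simply to flatten the structured query back into free text before handing it to an open-web engine. A minimal sketch (the facet names are my own invention), which also makes the lossiness obvious: a relational constraint like a price ceiling degrades into plain words that the open-web engine may or may not interpret correctly:

```python
def to_keyword_query(facets):
    """Flatten a structured query into a free-text string that an
    open-web engine can consume. Note the loss of precision: the
    numeric constraint becomes the literal phrase "under $N"."""
    parts = []
    if "color" in facets:
        parts.append(facets["color"])
    if "category" in facets:
        parts.append(facets["category"])
    if "max_price" in facets:
        parts.append("under $%d" % facets["max_price"])
    return " ".join(parts)

q = to_keyword_query({"category": "digital cameras",
                      "color": "black", "max_price": 250})
```

The point of the exercise is that the structured query remains available on the closed-domain side, so a smarter retrieval back end could consume it directly once the open collection has usable metadata.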

Even though result quality would be variable, such an approach would at least eliminate a source of uncertainty in the information seeking process: the user would be certain of having a query that accurately represented his or her information need. That is no small victory!

I fear, however, that users might not respond positively to such an interface. Given the certainty that a query accurately represents his or her information need, a user is likely to have higher expectations of result quality than without that certainty. Retrieval errors are harder to forgive when the query elaboration process eliminates almost any chance of misunderstanding. Even if the results were more accurate, they might not be accurate enough to satisfy user expectations.

As an HCIR evangelist, I am saddened by this prospect. Reducing uncertainty in any part of the information seeking process seems like it should always be a good thing for the user. I’m curious to hear what folks here think of this idea.

17 responses so far ↓

  • 1 Bill Bliss // Oct 25, 2010 at 10:08 pm

    Daniel, your post got me thinking…

    Let’s say you used sites for which there was strong metadata as a training set, of sorts, and then used that to derive a set of search queries?

    For example, if you know that “http://www.bhphotovideo.com/c/search?ci=9811&N=4291645412+4293918168+4294956965” is a good result for “black digital cameras under $250”, then derive the set of queries which included that page in the first n (say 10) results. That set of queries (perhaps cleansed/de-duped) might be a pretty good proxy for that same concept on the open-domain web.

    Overlay clickthrough and/or dwell-time on the results (some of Susan Dumais’ research contemplates this) and you might have a way of annotating results with metadata in an automated, post-facto way.

  • 2 Daniel Tunkelang // Oct 26, 2010 at 12:31 am

    I agree that the click graph and post-click behavior can be valuable signals. I did like the dwell-time work that Sue and colleagues presented at this year’s SIGIR, and I’m also a fan of earlier work by Craswell and Szummer on random walks on the click graph. And there’s classification, entity extraction, query expansion, language modeling, and a lot of other fun IR tools that could be brought to bear.
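    For readers unfamiliar with the click-graph idea: treat queries and documents as the two sides of a bipartite graph, weight edges by click counts, and walk from a query to its clicked documents and back to surface related queries. A toy sketch (the graph is invented purely for illustration, and this is a bare two-step walk, not the full Craswell–Szummer model):

```python
# Toy bipartite click graph: (query, document) -> click count.
CLICKS = {
    ("cheap cameras", "doc_reviews"): 5,
    ("black digital cameras", "doc_reviews"): 3,
    ("black digital cameras", "doc_shop"): 4,
    ("camera deals", "doc_shop"): 2,
}

def walk_step(dist, forward=True):
    """Take one random-walk step, queries -> documents (forward) or
    documents -> queries (backward), with transition probabilities
    proportional to click counts."""
    edges = {}
    for (q, d), w in CLICKS.items():
        src, dst = (q, d) if forward else (d, q)
        edges.setdefault(src, []).append((dst, w))
    result = {}
    for src, p in dist.items():
        total = sum(w for _, w in edges.get(src, []))
        for dst, w in edges.get(src, []):
            result[dst] = result.get(dst, 0.0) + p * w / total
    return result

# Two steps (query -> documents -> queries) yield a distribution
# over related queries, weighted by shared click behavior.
docs = walk_step({"black digital cameras": 1.0}, forward=True)
related = walk_step(docs, forward=False)
```

    Longer walks smooth the distribution further; the appeal is that the signal comes entirely from user behavior, with no manual curation.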

    My colleagues and I did some light mash-ups of this sort while I was at Endeca and showed them off as demos. I thought it was a cool idea at the time, and still do–despite my concerns about user expectations.

  • 3 Lars Ludwig // Oct 26, 2010 at 11:08 am

    It seems to me that this is, in a way, what eBay and others are already doing–or have already achieved–in their semi-open domains.

  • 4 Daniel Tunkelang // Oct 26, 2010 at 11:31 am

    Lars, my impression is that eBay’s browsing is driven by metadata supplied by sellers–though it could be that much of that metadata comes from the categories that eBay automatically suggests to sellers. Aggregators have a tougher job, but many of them work with feeds.

    But yes, these kinds of efforts are certainly in the spirit of what I’m advocating. For product search, even on the open web, I think faceted search is a no-brainer. Even if I’m biased. :-)

  • 5 Thomas Martin // Oct 26, 2010 at 1:01 pm

    Daniel,

    This post really resonates with some current work.

    As organizations open up their closed domains to user-generated content, the curated content becomes diluted, and we start to lose ground on the query elaboration front. Even with automated mapping processes, it starts to feel like we are trying to do the closed- and open-domain searches in one space. The jury is still out on our ability to maintain the integrity of the closed search while at the same time handling the additional metadata needed to support the product catalog.

  • 6 Daniel Tunkelang // Oct 27, 2010 at 3:43 pm

    I still am inspired by the work Fernando Diaz and Don Metzler presented at SIGIR 2006 on “Improving the Estimation of Relevance Models Using Large External Corpora”. Based on their results, other work I’ve seen (e.g., the work by Wisam Dakka et al. on “Automatic Discovery of Useful Facet Terms”), and the practical work I’ve done in this vein, I’m convinced we can do a lot more to borrow the high signal from closed collections to improve information retrieval on noisier, open collections.
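    The gist of borrowing signal from a clean external collection can be sketched very simply: run the query against the curated collection, mine co-occurring terms, and use them to expand the query against the noisy one. A toy illustration (the documents and the bare term-counting heuristic are mine, not the relevance-model estimation from the paper):

```python
from collections import Counter

# Toy "closed" collection of curated, on-topic documents.
CLOSED_DOCS = [
    "black digital camera compact zoom",
    "digital camera black budget zoom lens",
    "camera bag accessories strap",
]

def expansion_terms(query, docs, k=2):
    """Count terms co-occurring with the query terms in matching
    closed-collection documents; return the top-k new terms."""
    q_terms = set(query.split())
    counts = Counter()
    for doc in docs:
        terms = doc.split()
        if q_terms & set(terms):
            counts.update(t for t in terms if t not in q_terms)
    return [t for t, _ in counts.most_common(k)]

# Expand the original query with high-signal terms before
# issuing it against the noisier open collection.
expanded = "digital camera " + " ".join(
    expansion_terms("digital camera", CLOSED_DOCS))
```

    The full relevance-model machinery weights terms probabilistically rather than by raw counts, but the direction of information flow–clean collection informing noisy retrieval–is the same.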

    And I think this approach will be even more effective if we take the uncertainty out of query elaboration.

  • 7 tim // Oct 28, 2010 at 9:45 am

    Anyone tried ‘black digital cameras for under $250’ at Google? Give it a try–it’s not bad at all.
    Among the top 10:
    – Best Inexpensive Digital Cameras
    – Reviewed: The Top 5 Budget Digital Cameras Under $250
    – 5 Excellent Digital Cameras for 2009 Under $250

    The images are useful, too. It would have to be a hell of a faceted search system that outperforms this answer. I mean, all I fear is that the result pages are not neutral, making some cameras look better than they are. 5-star/1-star reviews written by paid marketing departments, things like that…

    The only existing system that I think is able to beat google here is my social network. If my father says buy that one, I’d just do so.

    Don’t get me wrong.
    I love classification. It makes me feel like I’m in control of what’s out there. It’s just: I don’t know what I would like to see as a faceted answer to my daily information needs! It is in fact ‘A Question of User Expectations’.

  • 8 Daniel Tunkelang // Oct 28, 2010 at 11:28 am

    Tim, point made. As I noted in a post a couple of years ago, there’s a symbiosis that occurs between web search engines and the sites they index. Sometimes SEO is win/win, and I think this is such a case.

    But I think it’s very different outside of retail, e.g., real-estate, digital content, people. Maybe the lack of SEO means the economic incentives aren’t there to support the use cases I care about. Or maybe retail is just an easier / more mature market.

  • 9 tim // Oct 29, 2010 at 5:24 am

    ok.
    I had a look at the papers mentioned in this post. There’s one fundamental thing I couldn’t figure out clearly. In a faceted search system for the web, what would the result set be made of?
    Links to matching web pages, augmented by a snippet?
    Or independent, structured entities, aggregated from multiple web pages (something that looks like the result set in [1])?
    While the first seems feasible and automatable, the second seems revolutionary in its power.

    [1] http://well-formed-data.net/experiments/elastic_lists/

  • 10 Daniel Tunkelang // Oct 29, 2010 at 9:09 am

    The latter is certainly what I’d love to see. But yes, it’s a harder problem, given where we stand today.

  • 11 jeremy // Oct 31, 2010 at 3:44 pm

    Ideally, a search engine would read the user’s mind. Shy of that, a search engine should provide the user with an efficient process for expressing an information need and then provide the user with results relevant to that need. From an information scientist’s perspective, these are two distinct problems to solve in the information seeking process: establishing the user’s information need (query elaboration) and retrieving relevant information (information retrieval).

    Sorry, I just have to make my usual point: There is more to search than query elaboration and information retrieval.

    I am thinking in particular of the broader notion of “exploratory” search (i.e., not just faceted search), which assumes that the user does not yet know what he or she wants. No amount of mind reading will ever help for certain (arguably large) types of information needs. The goal of the system should therefore be to help the user learn and explore, rather than (only) elaborate and retrieve. It’s still an information retrieval system, and it is still most certainly HCIR. It’s just not ad hoc or faceted retrieval.

    Just making sure we don’t prematurely collapse the research space :-)

  • 12 Daniel Tunkelang // Nov 1, 2010 at 9:21 am

    Point taken. In my defense, support for query elaboration is a broader notion than faceted search — it’s just that faceted search is the poster child. Also, as I think I’ve also argued before, much of the technology needed to support query elaboration would also help support exploration in general.

  • 13 dan nicollet // Nov 1, 2010 at 1:12 pm

    Good post and good idea. It seems to me, however, that this is somewhat done already on search engines like Google Shopping (i.e., http://www.google.com/products?hl=en&q=monitor+56+inch&um=1&ie=UTF-8&sa=N&tab=wf). There is some level of faceting available when you search the large web-native Google Shopping collection. The outcome is of course often disappointing, because the target data doesn’t always respond well to advanced multi-parameter queries.

    Another idea I feel very convinced of is that instant search tools (like Google Suggest) will become a requirement of search to help with query elaboration. As you said well in your post, query elaboration was too complex to handle. I worked at Infoseek in the late ’90s, and we had nothing to resolve this, so we assumed people had to come up with a good query by iteration (back-and-forthing between query and results) until they hit a productive and relevant query.

    I wrote a paper on the use of AJAX suggestion tools to resolve this if you are interested (http://bit.ly/c7UV5g). At Exorbyte we have seen our suggest tool be a major driver of our foray in ecommerce.
    Dan

  • 14 Daniel Tunkelang // Nov 1, 2010 at 8:10 pm

    Indeed, I see some 5.6″ monitors sneaking into the results. Google tries hard, but doesn’t always get these right. But in this case, there’s also no way to specify the size as a facet. What you really want is more like this:

    http://www4.shopping.com/flat-panel-televisions/screen-size–search–56—60-in/products

    And thanks for the link to the Exorbyte paper. I worked on something similar at Endeca in an enterprise context.

  • 15 Jinyoung Kim // Nov 11, 2010 at 4:10 pm

    Hi, Daniel.

    As a researcher who has been working on structuring keyword queries for structured document retrieval, I guess that using a faceted search interface can be more cumbersome than typing in a keyword query, especially when the user knows what she wants. This might be related to the higher expectation for result-set quality in a faceted search setting. However, I believe faceted search will be more suitable in some cases–e.g., exploratory search–in that it informs the user of the space of exploration.

    For query structuring, you can look at my work or related work from MSR.

  • 16 Daniel Tunkelang // Nov 12, 2010 at 11:55 am

    Jinyoung, thanks for the links. I’m impressed that you and the MSR folks manage to infer structure from keyword queries. But I’d like to see interfaces make that process explicit, reflecting the structure back to users, and making it easy for users to supply it directly.

  • 17 Marketing via Aggregation, Filtering and Curation – Tools and Resources // Jan 31, 2011 at 3:00 pm

    [...] A Question of User Expectations [...]
