
Idea Navigation

Last summer, my colleague Vladimir Zelevinsky worked with two interns, Robin Stewart (MIT) and Greg Scott (Tufts), on a novel approach to information exploration. They call it “idea navigation”: the basic idea is to extract subject-verb-object triples from unstructured text, group them into hierarchies, and then expose them in a faceted search and browsing interface. I like to think of it as an exploratory search take on question answering.

We found out later that Powerset developed similar functionality that they called “Powermouse” in their private beta and now call “Factz”. While the idea navigation prototype is on a smaller scale (about 100k news articles from October 2000), it does some cool things that I haven’t seen on Powerset, like leveraging verb hypernyms from WordNet.
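Neither prototype exposes its internals, but the core pipeline is easy to sketch. Here is a minimal illustration, assuming spaCy for dependency parsing and WordNet (via NLTK) for verb hypernyms; those library choices are mine, not necessarily what either system uses.

```python
# A minimal sketch of subject-verb-object extraction plus verb hypernyms.
# The library choices (spaCy, NLTK/WordNet) are illustrative assumptions,
# not necessarily what the idea navigation prototype or Powerset uses.
import spacy
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

nlp = spacy.load("en_core_web_sm")

def extract_svo(text):
    """Yield (subject, verb, object) triples from raw text."""
    for token in nlp(text):
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj")]
            for s in subjects:
                for o in objects:
                    yield (s.text, token.lemma_, o.text)

def verb_hypernyms(verb):
    """Return more general verbs from WordNet (e.g. 'get' for 'buy'),
    which lets triples be grouped into hierarchies by verb."""
    return {lemma.name()
            for synset in wn.synsets(verb, pos=wn.VERB)
            for hyper in synset.hypernyms()
            for lemma in hyper.lemmas()}

for triple in extract_svo("Cisco acquired the startup in October 2000."):
    print(triple, "->", verb_hypernyms(triple[1]))
```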

Here is the presentation they delivered at CHI ’08:

Idea Navigation: Structured Browsing for Unstructured Text


Clarification vs. Refinement

The other day, in between braving the Hulk and Spiderman rides at Endeca Discover ’08, I was chatting with Peter Morville about one of my favorite pet peeves in faceted search implementations: the confounding of clarification and refinement. To my delight, he posted about it at findability.org today.

What is the difference? I think it’s easiest to understand by thinking of a free-text search query as causing you to be dropped at some arbitrary point on a map. Our planet is sparsely populated, so most of the area of the map is off-road. Hence, if you’re dropped somewhere at random, you’re really in the middle of nowhere. Before you start trying to find nearby towns and attractions, your first task is to find a road.

How does this metaphor relate to clarification vs. refinement? Clarification is the process of finding the road, while refinement leverages the network of relationships in your content (i.e., the network of roads connecting towns and cities) to enable navigation and exploration.

“Did you mean…” is the prototypical example of clarification, while faceted navigation is the prototypical example of refinement. But it is important not to confuse the concrete user interfaces with their intentions. The key point, on which I’m glad to see Peter agrees, is that clarification, when needed, is a prerequisite for refinement, since it gets the user and the system on the same page. Refinement then allows the user to fully exploit the relationships in the data.
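To make the distinction concrete in code, here is a toy sketch of a search flow that clarifies before it refines. The tiny document set, the fuzzy matching, and the facet counts are all hypothetical illustrations, not any particular product’s implementation.

```python
# A toy sketch of clarification before refinement; the document set and
# the helper logic are hypothetical, not any particular product's API.
from collections import Counter
from difflib import get_close_matches

DOCS = [
    {"title": "Hotels in Boston", "city": "Boston"},
    {"title": "Hotels in Cambridge", "city": "Cambridge"},
    {"title": "Museums in Boston", "city": "Boston"},
]
VOCAB = {word.lower() for doc in DOCS for word in doc["title"].split()}

def search(query):
    terms = query.lower().split()
    results = [d for d in DOCS if all(t in d["title"].lower() for t in terms)]
    if not results:
        # Clarification: the query landed "off-road", so first get the user
        # and the system on the same page ("Did you mean...").
        suggestions = [m for t in terms for m in get_close_matches(t, VOCAB)]
        return {"did_you_mean": suggestions}
    # Refinement: expose the relationships in the matched content (facets)
    # so the user can navigate and explore.
    refinements = Counter(d["city"] for d in results)
    return {"results": [d["title"] for d in results],
            "refine_by_city": dict(refinements)}

print(search("hotles"))  # -> {'did_you_mean': ['hotels']}
print(search("hotels"))  # -> results plus city facet counts
```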


Is Search Broken?

Last night, I had the privilege of speaking to fellow CMU School of Computer Science alumni at Fidelity’s Center for Advanced Technology in Boston. Dean Randy Bryant, Associate Director of Corporate Relations Dan Jenkins, and Director of Alumni Relations Tina Carr organized the event, and they encouraged me to pick a provocative subject.

Thus encouraged, I decided to ask the question: Is Search Broken?

Slides are available here as a PowerPoint show for anyone interested, or on SlideShare.


Another HCIR Game

I just received an announcement from the SIG-IRList about the flickling challenge, a “game” designed around known-item image retrieval from Flickr. The user is given an image (not annotated), and the goal is to find that image again on Flickr using the system.

I’m not sure how well it will catch on with casual gamers–but that is hardly its primary motivation. Rather, the challenge was designed to help provide a foundation for evaluating interactive information retrieval–in a cross-language setting, no less. Details available at the iCLEF 2008 site or in this paper.

I’m thrilled to see efforts like these emerging to evaluate interactive retrieval–indeed, this feels like a solitaire version of Phetch.


Games With an HCIR Purpose?

A couple of weeks ago, my colleague Luis Von Ahn at CMU launched Games With a Purpose.

Here is a brief explanation from the site:

When you play a game at Gwap, you aren’t just having fun. You’re helping the world become a better place. By playing our games, you’re training computers to solve problems for humans all over the world.

Von Ahn has made a career (and earned a MacArthur Fellowship) from his work on such games, most notably the ESP Game and reCAPTCHA. His games emphasize tagging tasks that are difficult for machines but easy for human beings, such as labeling images with high-level descriptors.

I’ve been interested in Von Ahn’s work for several years, and most particularly in a game called Phetch, which never quite made it out of beta but strikes me as one of the most ambitious examples of “human computation”. Here is a description from the Phetch site:

Quick! Find an image of Michael Jackson wearing a sailor hat.
Phetch is like a treasure hunt — you must find or help find an image from the Web.

One of the players is the Describer and the others are Seekers. Only the Describer can see the hidden image, and has to help the Seekers find it by giving them descriptions.

If the image is found, the Describer wins 200 points. The first to find it wins 100 points and becomes the new Describer.

A few important details that this description leaves out:

  • The Seeker (but not the Describer) has access to a search engine that has indexed the images based on results from the ESP Game.
  • A Seeker loses points (I can’t recall how many) for wrong guesses.
  • The game has a time limit (hence the “Quick!”).

Now, let’s unpack the game description and analyze it in terms of the Human-Computer Information Retrieval (HCIR) paradigm. First, let us simplify the game, so that there is only one Seeker. In that case, we have a cooperative information retrieval game, where the Describer is trying to describe a target document (specifically, an image) as informatively as possible, while the Seeker is trying to execute clever algorithms in his or her wetware to retrieve it. If we think in terms of a traditional information retrieval setup, that makes the Describer the user and the Seeker the information retrieval system. Sort of.

A full analysis of this game is beyond the scope of a single blog post, but let’s look at the game from the Seeker’s perspective, holding our assumption that there is only one Seeker, and adding the additional assumption that the Describer’s input is static and supplied before the Seeker starts trying to find the image.

Assuming these simplifications, here is how a Seeker plays Phetch:

  • Read the description provided by the Describer and use it to compose a search.
  • Scan the results sequentially, interrupting either to make a guess or to reformulate the search.

The key observation is that Phetch is about interactive information retrieval. A good Seeker recognizes when it is better to try reformulating the search than to keep scanning.
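To make that loop concrete, here is a toy simulation of the simplified, single-Seeker game; the tiny label index, the scoring constants, and the Seeker’s strategy are my own illustrative assumptions, not the actual Phetch implementation.

```python
# A toy simulation of the simplified Seeker loop: search, scan, and
# interrupt either to guess or to reformulate. The label "index", scoring,
# and strategy are illustrative assumptions, not the real Phetch system.

INDEX = {  # image id -> ESP-Game-style labels
    "img1": {"man", "hat", "sailor", "singer"},
    "img2": {"dog", "beach", "hat"},
    "img3": {"man", "guitar"},
}

def image_search(terms):
    """Rank images by label overlap with the query (a stand-in search engine)."""
    ranked = sorted(INDEX, key=lambda img: len(INDEX[img] & terms), reverse=True)
    return [img for img in ranked if INDEX[img] & terms]

def play_seeker(description, target, time_limit=5, wrong_guess_penalty=50):
    terms = set(description.lower().split())
    guessed, score, turns = set(), 0, 0
    while terms and turns < time_limit:
        turns += 1
        for img in image_search(terms)[:2]:              # scan the top results
            if img not in guessed and len(INDEX[img] & terms) >= 2:
                guessed.add(img)                          # promising: make a guess
                if img == target:
                    return score + 100                    # found the hidden image
                score -= wrong_guess_penalty              # wrong guesses cost points
        terms.pop()                                       # reformulate: drop a term
    return score

print(play_seeker("man wearing a sailor hat", target="img1"))  # -> 100
```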

Returning to our theme of evaluation, we can envision modifying Phetch to create a system for evaluating interactive information retrieval. In fact, I persuaded my colleague Shiry Ginosar, who worked with Von Ahn on Phetch and is now a software engineer at Endeca, to elaborate such an approach at HCIR ’07. There are a lot of details to work out, but I find this vision very compelling and perhaps a route to addressing Nick Belkin’s grand challenge.


A Utilitarian View of IR Evaluation

In many information retrieval papers that propose new techniques, the authors validate those techniques by demonstrating improved mean average precision over a standard test collection. The value of such results–at least to a practitioner–hinges on whether mean average precision correlates to utility for users. Not only do user studies place this correlation in doubt, but I have yet to see an empirical argument defending the utility of average precision as an evaluation measure. Please send me any references if you are aware of them!
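For readers who haven’t computed it by hand, here is the standard definition of average precision written out as a short script; mean average precision is just its mean over a set of topics. This is the textbook formula, not tied to any particular evaluation toolkit.

```python
# Average precision for one ranked list, and MAP over several queries.
# This is the standard definition, not any particular toolkit's API.

def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant document appears,
    divided by the total number of relevant documents for the topic."""
    relevant_ids = set(relevant_ids)
    hits, precisions = 0, []
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precisions.append(hits / k)  # precision at this rank
    return sum(precisions) / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Relevant documents retrieved at ranks 1 and 3 (2 relevant in the collection):
print(average_precision(["d1", "d5", "d3"], {"d1", "d3"}))  # (1/1 + 2/3) / 2 = 0.833
```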

Of course, user studies are fraught with complications, the most practical one being their expense. I’m not suggesting that we need to replace Cranfield studies with user studies wholesale. Rather, I see the purpose of user studies as establishing the utility of measures that can then be evaluated by Cranfield studies. As with any other science, we need to work with simplified, abstract models to achieve progress, but we also need to ground those models by validating them in the real world.

For example, consider the scenario where a collection contains no documents that match a user’s need. In this case, it is ideal for the user to reach this conclusion as accurately, quickly, and confidently as possible. Holding the interface constant, are there evaluation measures that correlate to how well users perform on these three criteria? Alternatively, can we demonstrate that some interfaces lead to better user performance than others? If so, can we establish measures suitable for those interfaces?

The “no documents” case is just one of many real-world scenarios, and I don’t mean to suggest we should study it at the expense of all others. That said, I think it’s a particularly valuable scenario that, as far as I can tell, has been neglected by the information retrieval community. I use it to drive home the argument that practical use cases should drive our process of defining evaluation measures.


Thinking about IR Evaluation

I just read the recent Information Processing & Management special issue on Evaluation of Interactive Information Retrieval Systems. The articles were a worthwhile read, and yet they weren’t exactly what I was looking for. Let me explain.

In fact, let’s start by going back to Cranfield. The Cranfield paradigm offers us a quantitative, repeatable means to evaluate information retrieval systems. Its proponents make a strong case that it is effective and cost-effective. Its critics object that it measures the wrong thing because it neglects the user.

But let’s look a bit harder at the proponents’ case. The primary measure in use today is average precision–indeed, most authors of SIGIR papers validate their proposed approaches by demonstrating increased mean average precision (MAP) over a standard test collection of queries. The dominance of average precision as a measure is no accident: it has been shown to be the best single predictor of the precision-recall graph.

So why are folks like me complaining? There are various user studies asserting that MAP does not predict user performance on search tasks. Those have me at hello, but the studies are controversial in the information retrieval community, and in any case not constructive.

Instead, consider a paper by Harr Chen and David Karger (both at MIT) entitled “Less is More.” Here is a snippet from the abstract:

Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user’s information need.

Let me rephrase that: the precision-recall graph, which indicates how well a ranked retrieval algorithm does at ranking relevant documents ahead of irrelevant ones, does not necessarily characterize how well a system meets a user’s information need.

One of Chen and Karger’s examples is the case where the user is only interested in retrieving one relevant document. In this case, a system does well to return a diverse set of results that hedges against different possible query interpretations or query processing strategies. The authors also discuss more general scenarios, along with heuristics to address them.
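To make the intuition concrete, here is a toy calculation in the spirit of their argument; the numbers and the two-interpretation setup are my own illustration, not an example from the paper.

```python
# Toy illustration: when the user needs only one relevant document, a
# diversified ranking can beat the probability ranking principle (PRP).
# The numbers below are my own, not Chen and Karger's.

# Two interpretations of an ambiguous query, with prior probabilities:
P_INTERP = {"A": 0.7, "B": 0.3}

# Candidate documents and the single interpretation each one satisfies:
DOCS = {"a1": "A", "a2": "A", "b1": "B"}

def p_at_least_one_relevant(ranking, k=2):
    """Probability that the top-k list covers the interpretation the user
    actually meant, i.e. contains at least one relevant document."""
    covered = {DOCS[d] for d in ranking[:k]}
    return sum(P_INTERP[i] for i in covered)

# PRP ranks by probability of relevance, so both A-documents come first:
print(p_at_least_one_relevant(["a1", "a2", "b1"]))  # 0.7
# A diversified ranking hedges across interpretations:
print(p_at_least_one_relevant(["a1", "b1", "a2"]))  # 1.0
```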

But the main contribution of this paper, at least in my eyes, is a philosophical one. The authors consider the diversity of user needs and offer a quantitative, repeatable way to evaluate information retrieval systems with respect to different needs. Granted, they do not even consider the challenge of evaluating interactive information retrieval. But they do set a good example.

Stay tuned for more musings on this theme…


A Lofty Goal

The blogosphere is all atwitter with Powerset’s public launch last night. Over at Techcrunch, Michael Arrington refers to their approach as a lofty goal.

But I’d like us to dream bigger. In the science fiction stories that inspired me to study computer and information science, the human-computer interface is not just natural language input. It’s dialogue. The authors do not treat machine understanding of unambiguous requests as a wonder, but instead take it for granted as an artifact of technical progress. Indeed, the human-computer interface only becomes relevant to the plot when communication breaks down (aka “that does not compute”).

Ever since I hacked a BASIC version of ELIZA on a Commodore 64, I’ve felt the visceral appeal of natural language input as an interface. Conversely, the progress of speech synthesis attests to our desire to humanize the machine’s output. It is as if we want to reduce the Turing Test to a look-and-feel.

But the essence of dialogue lies beneath the surface. The conversations we have with machines are driven by our information needs, and should be optimized to that end. Even we humans drop natural language among ourselves when circumstances call for more efficient communication. Consider an example as mundane as Starbucks baristas eliciting and delegating a latte order.

In short, let’s remember that we want to talk with our computers, not just at them. Today’s natural language input may be a step towards that end, or it may be just a detour.


Powerset: Public Launch Later Today

As a member of the Powerset private beta, I just received this announcement:

Greetings Powerlabbers,

Later today, Powerset is going to launch the first publicly available version of our product. Since you’ve been active in the Powerlabs community, we wanted to give you a special heads-up to look for our release. Your suggestions, help, feedback, bug reports, and conversation have helped us immensely in creating an innovative and useful product. We hope that you’ll continue to be active in Powerlabs and make more great suggestions.

More information will be posted on Powerset’s blog later today, so keep your eye out for updates. Also, consider following us on Twitter or becoming a fan of Powerset on Facebook.

If you have a blog, we’d especially appreciate it if you’d write a blog post about your experience with this first Powerset product. Since you’ve been on the journey with us, your insight will be helpful in showing other people all of the amazing features in this release.

Again, we want to extend special thanks to you for sticking with us. We hope you feel almost as invested in this release as we are.

Thanks!

The Powerset Team

As loyal readers know, I’ve posted my impressions in the past. Now that the beta will be publicly available, I’m curious to hear impressions from you all.


Special Issues of Information Processing & Management

My colleague Max Wilson at the University of Southampton recently called my attention to a pair of special issues of Information Processing & Management. The first is on Evaluation of Interactive Information Retrieval Systems; the second is on Evaluating Exploratory Search Systems. Both are available online at ScienceDirect. The interactive IR papers can be downloaded for free; the exploratory search papers are available for purchase to folks who don’t have access through their institutions.

I’m behind on my reading, but the titles look promising. Stay tuned!