
Attending Endeca Discover ’08

I’ll be attending Endeca Discover ’08, Endeca’s annual user conference, from Sunday, May 18th to Wednesday, May 21st, so you might see a bit of a lull in my verbiage here while I live blog at http://blog.endeca.com and hang out in sunny Orlando with Endeca customers and partners.

If you’re attending Discover, please give me a shout and come to my sessions:

Otherwise, I’ll do my best to sneak in a post or comment, and I’ll be back in full force later next week.


A Utilitarian View of IR Evaluation

In many information retrieval papers that propose new techniques, the authors validate those techniques by demonstrating improved mean average precision over a standard test collection. The value of such results–at least to a practitioner–hinges on whether mean average precision correlates to utility for users. Not only do user studies place this correlation in doubt, but I have yet to see an empirical argument defending the utility of average precision as an evaluation measure. Please send me any references if you are aware of them!
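To keep the discussion concrete, here is a minimal sketch of how average precision is typically computed from binary relevance judgments (the function names and the toy judgments are mine, not drawn from any particular test collection); mean average precision is just this quantity averaged over a set of queries.

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average precision for a single query: the mean of precision@k
    over the ranks k at which a relevant document is retrieved."""
    if not relevant_doc_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_doc_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_doc_ids)


def mean_average_precision(runs):
    """MAP over a set of queries; runs is a list of
    (ranked_doc_ids, relevant_doc_ids) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Toy judgments: three relevant documents, two retrieved at ranks 1 and 3.
print(average_precision(["d1", "d7", "d3", "d9"], {"d1", "d3", "d5"}))
# (1/1 + 2/3) / 3 ≈ 0.56
```

Note that nothing in that computation refers to what a user actually does with the results, which is precisely the gap the user studies are probing.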

Of course, user studies are fraught with complications, the most practical one being their expense. I’m not suggesting that we need to replace Cranfield studies with user studies wholesale. Rather, I see the purpose of user studies as establishing the utility of measures that can then be evaluated by Cranfield studies. As with any other science, we need to work with simplified, abstract models to achieve progress, but we also need to ground those models by validating them in the real world.

For example, consider the scenario where a collection contains no documents that match a user’s need. In this case, it is ideal for the user to reach this conclusion as accurately, quickly, and confidently as possible. Holding the interface constant, are there evaluation measures that correlate to how well users perform on these three criteria? Alternatively, can we demonstrate that some interfaces lead to better user performance than others? If so, can we establish measures suitable for those interfaces?

The “no documents” case is just one of many real-world scenarios, and I don’t mean to suggest we should study it at the expense of all others. That said, I think it’s a particularly valuable scenario that, as far as I can tell, has been neglected by the information retrieval community. I use it to drive home the argument that practical use cases should drive our process of defining evaluation measures.


Thinking about IR Evaluation

I just read the recent Information Processing & Management special issue on Evaluation of Interactive Information Retrieval Systems. The articles were a worthwhile read, and yet they weren’t exactly what I was looking for. Let me explain.

In fact, let’s start by going back to Cranfield. The Cranfield paradigm offers us a quantitative, repeatable means to evaluate information retrieval systems. Its proponents make a strong case that it is effective and cost-effective. Its critics object that it measures the wrong thing because it neglects the user.

But let’s look a bit harder at the proponents’ case. The primary measure in use today is average precision–indeed, most authors of SIGIR papers validate their proposed approaches by demonstrating increased mean average precision (MAP) over a standard test collection of queries. The dominance of average precision as a measure is no accident: it has been shown to be the best single predictor of the precision-recall graph.

So why are folks like me complaining? There are the various user studies asserting that MAP does not predict user performance on search tasks. Those have me at hello, but the studies are controversial in the information retrieval community, and in any case not constructive.

Instead, consider a paper by Harr Chen and David Karger (both at MIT) entitled "Less is more." Here is a snippet from the abstract:

Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user’s information need.

Let me rephrase that: the precision-recall graph, which indicates how well a ranked retrieval algorithm does at ranking relevant documents ahead of irrelevant ones, does not necessarily characterize how well a system meets a user’s information need.

One of Chen and Karger’s examples is the case where the user is only interested in retrieving one relevant document. In this case, a system does well to return a diverse set of results that hedges against different possible query interpretations or query processing strategies. The authors also discuss more general scenarios, along with heuristics to address them.
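To make that contrast concrete, here is a minimal sketch (the rankings and relevance judgments are invented for illustration) comparing precision at n, which rewards piling up relevant documents, with a measure in the spirit of Chen and Karger’s one-relevant-document scenario that asks only whether at least one relevant document appears in the top n.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top n documents that are relevant."""
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def one_call_at_n(ranked, relevant, n):
    """1 if at least one relevant document appears in the top n, else 0."""
    return 1 if any(doc in relevant for doc in ranked[:n]) else 0


# Hypothetical judgments: documents matching interpretation A are relevant.
relevant = {"a1", "a2", "a3"}

# A ranking that bets everything on interpretation A...
focused = ["a1", "a2", "a3", "a4", "a5"]
# ...versus one that hedges across interpretations A and B.
diverse = ["a1", "b1", "b2", "b3", "b4"]

for name, ranking in [("focused", focused), ("diverse", diverse)]:
    print(name, precision_at_n(ranking, relevant, 5),
          one_call_at_n(ranking, relevant, 5))
# Precision at 5 favors the focused ranking (0.6 vs. 0.2), but both rankings
# fully satisfy a user who needs only one relevant document, and the diverse
# ranking is the safer bet if interpretation B turns out to be the right one.
```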

But the main contribution of this paper, at least in my eyes, is a philosophical one. The authors consider the diversity of user needs and offer a quantitative, repeatable way to evaluate information retrieval systems with respect to different needs. Granted, they do not even consider the challenge of evaluating interactive information retrieval. But they do set a good example.

Stay tuned for more musings on this theme…


A Lofty Goal

The blogosphere is all atwitter with Powerset’s public launch last night. Over at Techcrunch, Michael Arrington refers to their approach as a lofty goal.

But I’d like us to dream bigger. In the science fiction stories that inspired me to study computer and information science, the human-computer interface is not just natural language input. It’s dialogue. The authors do not treat machine understanding of unambiguous requests as a wonder, but instead take it for granted as an artifact of technical progress. Indeed, the human-computer interface only becomes relevant to the plot when communication breaks down (aka “that does not compute”).

Ever since I hacked a BASIC version of ELIZA on a Commodore 64, I’ve felt the visceral appeal of natural language input as an interface. Conversely, the progress of speech synthesis attests to our desire to humanize the machine’s output. It is as if we want to reduce the Turing Test to a look-and-feel.

But the essence of dialogue lies beneath the surface. The conversations we have with machines are driven by our information needs, and should be optimized to that end. Even we humans drop natural language among ourselves when circumstances call for more efficient communication. Consider an example as mundane as Starbucks baristas eliciting and delegating a latte order.

In short, let’s remember that we want to talk with our computers, not just at them. Today’s natural language input may be a step towards that end, or it may be just a detour.


Powerset: Public Launch Later Today

As a member of the Powerset private beta, I just received this announcement:

Greetings Powerlabbers,

Later today, Powerset is going to launch the first publicly available version of our product. Since you’ve been active in the Powerlabs community, we wanted to give you a special heads-up to look for our release. Your suggestions, help, feedback, bug reports, and conversation have helped us immensely in creating an innovative and useful product. We hope that you’ll continue to be active in Powerlabs and make more great suggestions.

More information will be posted on Powerset’s blog later today, so keep your eye out for updates. Also, consider following us on Twitter or becoming a fan of Powerset on Facebook.

If you have a blog, we’d especially appreciate it if you’d write a blog post about your experience with this first Powerset product. Since you’ve been on the journey with us, your insight will be helpful in showing other people all of the amazing features in this release.

Again, we want to extend special thanks to you for sticking with us. We hope you feel almost as invested in this release as we are.

Thanks!

The Powerset Team

As loyal readers know, I’ve posted my impressions in the past. Now that the beta will be publicly available, I’m curious to hear impressions from you all.


Special Issues of Information Processing & Management

My colleague Max Wilson at the University of Southampton recently called my attention to a pair of special issues of Information Processing & Management. The first is on Evaluation of Interactive Information Retrieval Systems; the second is on Evaluating Exploratory Search Systems. Both are available online at ScienceDirect. The interactive IR papers can be downloaded for free; the exploratory search papers are available for purchase to folks who don’t have access through their institutions.

I’m behind on my reading, but the titles look promising. Stay tuned!


A Harmonic Convergence

This week, Forrester released a report entitled “Search + BI = Unified Information Access”. The authors assert the convergence of search and business intelligence, a case that Forrester has been developing for quite some time.

The executive summary:

Search and business intelligence (BI) really are two sides of the same coin. Enterprise search enables people to access unstructured content like documents, blog and wiki entries, and emails stored in repositories across their organizations. BI surfaces structured data in reports and dashboards. As both technologies mature, the boundary between them is beginning to blur. Search platforms are beginning to perform BI functions like data visualization and reporting, and BI vendors have begun to incorporate simple to use search experiences into their products. Information and knowledge management professionals should take advantage of this convergence, which will have the same effect from both sides: to give businesspeople better context and information for the decisions they make every day.

It’s hard to find any fault here. In fact, the convergence of search and BI is a corollary to the fact that people (yes, businesspeople are people too) use these systems, and that the same people have no desire to distinguish between “structured” and “unstructured” content as they pursue their information needs.

That said, I do have some quibbles with how the authors expect the convergence to play out. The authors make two assertions that I have a hard time accepting at face value:

    • People will be able to execute data queries via a search box using natural language.

    Sure, but will they want to? Natural language is fraught with communication challenges, and I’m no more persuaded by natural language queries for BI than I am by natural language queries for search.

    • Visual data representations will increase understanding of linkages among concepts.

    We’ve all heard the cliché that a picture is worth a thousand words. I know this better than most, as I earned my PhD by producing visual representations of networks. But I worry that people overestimate the value of these visualizations. Data visualization is simply a way to represent data analytics. I see more value in making analytics interactive (e.g., supporting and guiding incremental refinement) than in emphasizing visual representations.

But I quibble. I strongly agree with most of their points, including:

    • BI interfaces will encourage discovery of additional data dimensions.
    • BI and search tools will provide proactive suggestions.
    • BI and search will continue to borrow techniques from each other.

And it doesn’t hurt that the authors express a very favorable view of Endeca. I can only hope they won’t change their minds after reading this post!


This Conversation is Public

An interesting implication of blogging and other social media is that conversations once conducted privately have become public. The most common examples are conversations that take place through the comment areas for posts, rather than through private email.

My initial reaction to this phenomenon was to bemoan the loss of boundaries. But, in keeping with my recent musings about privacy, I increasingly see the virtues of public conversations. After all, a synonym for privacy, albeit with a somewhat different connotation, is secrecy. Near-antonyms include transparency and openness.

I can’t promise to always serve personally as an open, transparent information access provider. But I’ll do so where possible. Here at The Noisy Channel, the conversation is public.


Business, Technology, and Information

I was fortunate to attend the Tri-State CIO Forum these last couple of days, and I thought I’d change the pace a bit by posting some reflections about it.

In his keynote speech last night, George Colony, Chairman and CEO of Forrester Research, called on the business community to drop the name “information technology” (IT) in favor of “business technology” (BT). His reasoning, in a nutshell, was that such nomenclature would reflect the centrality of technology’s role for businesses.

Following similar reasoning but reaching a different conclusion, Julia King, an Executive Editor for Computerworld and one of today’s speakers, noted that IT titles are being “techno-scrubbed”, and that there is a shift from managing technology to managing information.

While I can’t get excited about a naming debate, I do feel there’s an important point overlooked in this discussion. Even though we’ve achieved consensus on the importance of technology, we need a sharper focus on information. It is a cliché that we live in an information age, but expertise about information is scarce. Information scientists struggle to influence technology development, and information theory is mostly confined to areas like cryptography and compression.

We have no lack of information technology. Search engines, databases, and applications built on top of them are ubiquitous. But we are still just learning how to work with information.


Saracevic on Relevance and Interaction

There is no Nobel Prize in computer science, despite computer science having done more than any other discipline in the past fifty years to change the world. Instead, there is the Turing Award, which serves as a Nobel Prize of computing.

But the Turing Award has never been given to anyone in information retrieval. Instead, there is the Gerard Salton Award, which serves as a Turing Award of information retrieval. Its recipients represent an A-list of information retrieval researchers.

Last week, I had the opportunity to talk with Salton Award recipient Tefko Saracevic. If you are not familiar with Saracevic, I suggest you take an hour to watch his 2007 lecture on “Relevance in information science”.

I won’t try to capture an hour of conversation in a blog post, but here are a few highlights:

    • We learn from philosophers, particularly Alfred Schütz, that we cannot reduce relevance to a single concept, but rather have to consider a system of interdependent relevancies, such as topical relevance, interpretational relevance, and motivational relevance.
    • When we talk about relevance measures, such as precision and recall, we evaluate results from the perspective of a user. But information retrieval approaches necessarily take a systems perspective, making assumptions about what people will want and encoding those assumptions in models and algorithms.
    • A major challenge in information retrieval is that users–particularly web search users–often formulate queries that are ineffective, particularly because they are too short. Studies have shown that reference interviews can lead to improved retrieval effectiveness (typically through longer, more informative queries). He said that automated systems could help too, but he wasn’t aware of any that had achieved traction.
    • A variety of factors affect interactive information retrieval, including task context, intent, and expertise. Moreover, people react to certain relevance clues more than others, and more so within some populations than others.

As I expected, I walked away with more questions than answers. But I did walk away reassured that my colleagues and I at Endeca, along with others in the HCIR community, are attacking the right problem: helping users formulate better queries.

I’d like to close with an anecdote that Saracevic recounts in his 2007 lecture. Bruce Croft had just delivered an information retrieval talk, and Nick Belkin raised the objection that users need to be incorporated into the study. Croft’s conversation-ending response: “Tell us what to do, and we will do it.”

We’re halfway there. We’ve built interactive information retrieval systems, and we see from deployment after deployment that they work. Not that there isn’t plenty of room for improvement, but the unmet challenge, as Ellen Voorhees makes clear, is evaluation. We need to address Nick Belkin’s grand challenge and establish a paradigm suitable for evaluation of interactive IR systems.