The Search for Meaning

By a fortuitous coincidence, I had the opportunity to see two consecutive presentations from search engine companies banking on natural language processing (NLP) to power the next generation of search. The first was from Ron Kaplan, Chief Technology and Science Officer of Powerset, who presented at Columbia University. The second was from Christian Hempelmann, Chief Scientific Officer of hakia, who presented at the New York Semantic Web Meetup.

The Powerset talk was entitled “Deep natural language processing for web-scale indexing and retrieval.” Jon Elsas, who attended the same talk earlier this week at CMU, did an excellent job summarizing it on his blog. I’ll simply express my reaction: I don’t get it. I have no reason to doubt that their NLP pipeline is best-in-class. The team has impressive credentials. But I see no evidence that they have produced better results than keyword search. After participating in their private beta for several months, I’d hoped that the presentation would help me see what I’d missed. I specifically asked Ron what measures they used to evaluate their system, and he was mum. So now I am more unconvinced than ever. To steal a line from a colleague, I cannot reconcile their enthusiasm with their results.

The hakia talk was entitled “Search for Meaning.” Christian started by making the case for a semantic, rather than statistical, approach to NLP. He then presented hakia’s technology in a fair amount of detail, including walking through examples of word sense disambiguation using context. I’m not convinced that semantics trump statistics, but I thoroughly enjoyed the presentation, and was intrigued enough to want to learn more. I find the company refreshingly open about its technology (not to mention that their beta is public), and I hope it works well enough to be practical.

Still, I’m not convinced that NLP is either the right answer or the right question. I’m no expert on the history of language, but it’s clear that natural languages are hardly optimal means of communication, even among human beings. Rather, they are artifacts of our satisficing and resisting change. Since we are lucky enough to not have developed expectations that people can communicate with computers using natural language (HAL and Star Trek notwithstanding), why take a step backwards now? Rather than advocating for inefficient, unreliable communication mechanisms like natural language, we should be figuring out ways to make communication more efficient.

To use an analogy, there’s a reason that programming languages have strict rules, and that compilers output errors rather than just trying to guess what you mean. The mild inconvenience upstream is a small cost, compared to the downstream benefits of unambiguous communication. I’m not suggesting that people start speaking in formal languages. But I do feel we should strive for a dialog-oriented approach where both the human and the computer have confidence in their mutual understanding. I can’t resist a plug for HCIR.


Ellen Voorhees defends Cranfield

I was extremely flattered to receive an email from Ellen Voorhees responding to my post about Nick Belkin’s keynote. Then I was a little bit scared, since she is a strong advocate of the Cranfield tradition, and I braced myself for her rebuttal.

She pointed me to a talk she gave at the First International Workshop on Adaptive Information Retrieval (AIR) in 2006. I’d paraphrase her argument as follows: Nick and others (including me) are right to push for a paradigm that supports AIR research, but are being naïve regarding what is necessary for such research to deliver effective–and cost-effective–results. It’s a strong case, and I’d be the first to concede that the advocates for AIR research have not (at least to my knowledge) produced a plausible abstract task that is amenable to efficient evaluation.

To quote Nick again, it’s a grand challenge. And Ellen makes it clear that what we’ve learned so far is not encouraging.


Privacy and Information Theory

Privacy is an evergreen topic in technology discussions, and increasingly finds its way into the mainstream (cf. AOL, NSA, Facebook). My impression is that most people feel that companies and government agencies are amassing their “private” data to some nefarious end.

Let’s forget about technology for a moment and subject the notion of privacy to basic examination. If I truly want to keep a secret, I don’t tell anyone. If I want to share information with you but no one else, I only disclose the information under the proviso of a social or legal contract of non-disclosure.

But there’s a major catch here: you–or I–may disclose the information involuntarily by our actions. The various establishments I frequent know my favorite foods, drinks, and even karaoke songs. More subtly, if I tell you in confidence that I don’t like or trust someone, that information is likely to visibly affect your interaction with that person. Moreover, someone who knows that we are friends might even suspect me as the cause for your change in behavior.

What does this have to do with privacy of information? Everything! The mainstream debates treat information privacy as binary. Even when people discuss gradations of privacy, they tend to think in terms of each particular disclosure (e.g., age, favorite flavor of ice cream) as binary. But, if we take an information-theoretic look at disclosure, we immediately see that this binary view of disclosure is illusory.

For example, if you know I work for a software company and live in New York City, you know more about my gender, education, and salary than if you only know that I live in the United States. We can quantify this information gain in bits of conditional entropy.

Information theory provides a unifying framework for thinking about privacy. We can answer questions like “if I disclose that I like bagels and smoked salmon, to what extent do I disclose that I live in New York?” Or: to what extent does an anonymized search log identify me personally?
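As a rough sketch of the bagel question, here is how one might measure the disclosure in bits. The joint probabilities below are invented purely for illustration (they are not real demographic data), and the function names are my own. The quantity computed is the reduction in uncertainty about where I live once my taste is known, i.e. H(city) minus H(city | taste).

```python
from math import log2

# HYPOTHETICAL joint probabilities P(city, taste); made-up numbers, not real data.
joint = {
    ("nyc", "likes_bagels"): 0.02,
    ("nyc", "dislikes_bagels"): 0.01,
    ("elsewhere", "likes_bagels"): 0.17,
    ("elsewhere", "dislikes_bagels"): 0.80,
}

def entropy(dist):
    """Shannon entropy, in bits, of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint_dist, axis):
    """Marginalize a joint distribution onto coordinate 0 or 1 of its keys."""
    out = {}
    for key, p in joint_dist.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

def conditional_entropy(joint_dist):
    """H(X | Y) = H(X, Y) - H(Y), with X the first coordinate and Y the second."""
    return entropy(joint_dist) - entropy(marginal(joint_dist, 1))

h_city = entropy(marginal(joint, 0))            # uncertainty about city before disclosure
h_city_given_taste = conditional_entropy(joint)  # uncertainty remaining after disclosure
bits_disclosed = h_city - h_city_given_taste     # information gained about city

print(f"H(city) = {h_city:.4f} bits")
print(f"H(city | taste) = {h_city_given_taste:.4f} bits")
print(f"bits disclosed about city = {bits_disclosed:.4f}")
```

The point of the exercise is that the answer is a number somewhere between zero and H(city), not a yes or no: a disclosure that looks harmless on its own still leaks a measurable fraction of a bit about something else.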

If we can take this framework and make it consumable to non-information theorists, perhaps we can improve the quality of the privacy debate.


Can Search be a Utility?

A recent lecture at the New York CTO club inspired a heated discussion on what is wrong with enterprise search solutions. Specifically, Jon Williams asked why search can’t be a utility.

It’s unfortunate when such a simple question calls for a complicated answer, but I’ll try to tackle it.

On the web, almost all attempts to deviate even slightly from the venerable ranked-list paradigm have been resounding flops. More sophisticated interfaces, such as Clusty, receive favorable press coverage, but users don’t vote for them with their virtual feet. And web search users seem reasonably satisfied with their experience.

Conversely, in the enterprise, there is widespread dissatisfaction with enterprise search solutions. A number of my colleagues have said that they installed a Google Search Appliance and “it didn’t work.” (Full disclosure: Google competes with Endeca in the enterprise).

While the GSA does have some significant technical limitations, I don’t think the failures were primarily for technical reasons. Rather, I believe there was a failure of expectations. I believe the problem comes down to the question of whether relevance is subjective.

On the web, we get away with pretending that relevance is objective because there is so much agreement among users–particularly in the restricted class of queries that web search handles well, and that hence constitute the majority of actual searches.

In the enterprise, however, we not only lack the redundant and highly social structure of the web; we also tend to have more sophisticated information needs. Specifically, we tend to ask the kinds of informational queries that web search serves poorly, particularly when there is no Wikipedia page that addresses our needs.

It seems we can go in two directions.

The first is to make enterprise search more like web search by reducing the enterprise search problem to one that is user-independent and does not rely on the social generation of enterprise data. Such a problem encompasses such mundane but important tasks as finding documents by title or finding department home pages. The challenges here are fundamentally ones of infrastructure, reflecting the heterogeneous content repositories in enterprises and the controls mandated by business processes and regulatory compliance. Solving these problems is no cakewalk, but I think all of the major enterprise search vendors understand the framework for solving them.

The second is to embrace the difference between enterprise knowledge workers and casual web users, and to abandon the quest for an objective relevance measure. Such an approach requires admitting that there is no free lunch–that you can’t just plug in a box and expect it to solve an enterprise’s knowledge management problem. Rather, enterprise workers need to help shape the solution by supplying their proprietary knowledge and information needs. The main challenges for information access vendors are to make this process as painless as possible for enterprises, and to demonstrate the return so that enterprises make the necessary investment.


Multiple-Query Sessions

As Nick Belkin pointed out in his recent ECIR 2008 keynote, a grand challenge for the IR community is to figure out how to bring the user into the evaluation process. A key aspect of this challenge is rethinking system evaluation in terms of sessions rather than queries.

Some recent work in the IR community is very encouraging:

– Work by Ryen White and colleagues at Microsoft Research that mines session data to guide users to popular web destinations. Their paper was awarded Best Paper at SIGIR 2007.

– Work by Nick Craswell and Martin Szummer (also at Microsoft Research, and also presented at SIGIR 2007) that performs random walks on the click graph to use click data effectively as evidence to improve relevance ranking for image search on the web.

– Work by Kalervo Järvelin (at the University of Tampere in Finland) and colleagues on discounted cumulated gain based evaluation of multiple-query IR sessions that was awarded Best Paper at ECIR 2008.

This recent work–and the prominence it has received in the IR community–is refreshing, especially in light of the relative lack of academic work on interactive IR and the demise of the short-lived TREC interactive track. These are first steps, but hopefully IR researchers and practitioners will pick up on them.
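To make the idea of evaluating sessions rather than queries a bit more concrete, here is a minimal sketch in the spirit of discounted cumulated gain extended to multiple-query sessions. It is my own simplification, not necessarily the exact formulation in the ECIR 2008 paper: the function names, the log bases b and bq, and the form of the per-query discount are all assumptions made for illustration. Each query's result list is scored by rank-discounted gain, and each later query in the session is further discounted, on the theory that relevant documents found only after more reformulation cost the user more effort.

```python
from math import log

def dcg(gains, b=2):
    """Discounted cumulated gain of one ranked list of graded relevance gains.

    Ranks are 1-based. Gains at ranks below the log base b are not discounted;
    from rank b onward each gain is divided by log_b(rank). Assumes b > 1.
    """
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < b else gain / log(rank, b)
    return total

def session_dcg(session, b=2, bq=4):
    """Session-level score: per-query DCG, with the q-th query (1-based)
    discounted by 1 / (1 + log_bq(q)). Both bases are illustrative assumptions."""
    total = 0.0
    for q, gains in enumerate(session, start=1):
        total += dcg(gains, b) / (1 + log(q, bq))
    return total

# Two hypothetical sessions with the same gains, found early versus late.
found_early = [[3, 2, 0], [0, 0, 0]]
found_late = [[0, 0, 0], [3, 2, 0]]
print(session_dcg(found_early), session_dcg(found_late))
```

Under this sketch, the session that surfaces its relevant documents on the first query scores higher than the one that needs a reformulation to reach the same documents, which is the kind of distinction a single-query measure cannot express.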


Q&A with Amit Singhal

Amit Singhal, who is head of search quality at Google, gave a very entertaining keynote at ECIR ’08 that focused on the adversarial aspects of Web IR. Specifically, he discussed some of the techniques used in the arms race to game Google’s ranking algorithms. Perhaps he revealed more than he intended!

During the question and answer session, I reminded Amit of the admonition against security through obscurity that is well accepted in the security and cryptography communities. I questioned whether his team is pursuing the wrong strategy by failing to respect this maxim. Amit replied that a relevance analog to security by design was an interesting challenge (which he delegated to the audience), but he appealed to the subjectivity of relevance as a reason for it being harder to make relevance as transparent as security.

While I accept the difficulty of this challenge, I reject the suggestion that subjectivity makes it harder. To begin with, Google and other web search engines rank results objectively, rather than based on user-specific considerations. Furthermore, the subjectivity of relevance should make the adversarial problem easier rather than harder, as has been observed in the security industry.

But the challenge is indeed a daunting one. Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?

At Endeca, we emphasize the transparency of our engine as a core value of our offering to enterprises. Granted, our clients generally do not have an adversarial relationship with their data. Still, I am convinced that the same approach not only can work on the web, but will be the only way to end the arms race between spammers and Amit’s army of tweakers.


Nick Belkin at ECIR ’08

Last week, I had the pleasure of attending the 30th European Conference on Information Retrieval, chaired by Iadh Ounis at the University of Glasgow. The conference was outstanding in several respects, not least of which was a keynote address by Nick Belkin, one of the world’s leading researchers on interactive information retrieval.

Nick’s keynote, entitled “Some(what) Grand Challenges for Information Retrieval”, was a full frontal attack on the Cranfield evaluation paradigm that has dominated IR research for the past half century. I am hoping to see his keynote published and posted online, but in the meantime here is a choice excerpt:

in accepting the [Gerard Salton] award at the 1997 SIGIR meeting, Tefko Saracevic stressed the significance of integrating research in information seeking behavior with research in IR system models and algorithms, saying: “if we consider that unlike art IR is not there for its own sake, that is, IR systems are researched and built to be used, then IR is far, far more than a branch of computer science, concerned primarily with issues of algorithms, computers, and computing.”

Nevertheless, we can still see the dominance of the TREC (i.e. Cranfield) evaluation paradigm in most IR research, the inability of this paradigm to accommodate study of people in interaction with information systems (cf. the death of the TREC Interactive Track), and a dearth of research which integrates study of users’ goals, tasks and behaviors with research on models and methods which respond to results of such studies and supports those goals, tasks and behaviors.

This situation is especially striking for several reasons. First, it is clearly the case that IR as practiced is inherently interactive; secondly, it is clearly the case that the new models and associated representation and ranking techniques lead to only incremental (if that) improvement in performance over previous models and techniques, which is generally not statistically significant; and thirdly, that such improvement, as determined in TREC-style evaluation, rarely, if ever, leads to improved performance by human searchers in interactive IR systems.

Nick has long been critical of the IR community’s neglect of users and interaction. But this keynote was significant for two reasons. First, the ECIR program committee’s decision to invite a keynote speaker from the information science community acknowledges the need for collaboration between these two communities. Second, Nick reciprocated this overture by calling for interdisciplinary efforts to bridge the gap between the formal study of information retrieval and the practical understanding of information behavior. As an avid proponent of HCIR, I am heartily encouraged by steps like these.