
Questions. But Why?

Yahoo! Answers and Answers.com have been around since 2005. But community question answering (as distinct from question answering using natural language processing) has witnessed a resurgence of popularity–at least in the blogosphere and among investors. Quora and Hunch are two of the hottest startups on the web, and Aardvark was acquired by Google earlier this year. Most recently, Ask.com relaunched with a return to its question-answering roots and Facebook began rolling out Facebook Questions.

So there’s no question that community question answering is hot. The question is why? In particular, is community question answering a step forward or backward relative to today’s search engines, or is it something different?

Regarding Facebook Questions, Jason Kincaid writes in TechCrunch:

Given its size, it won’t take long for Facebook to build up a massive amount of data — if that data is consistently reliable, Questions could turn into a viable alternative to Google for many queries.

That’s a big if.  But I think the bigger caveat is the vague quantifier “many”. The success of community question answering services will depend on how these services position themselves relative to users’ information needs. Anyone arguing that these services can or should replace today’s web search engines might want to consider the following examples of information needs that are typical of current search engine use:

I hope I don’t have to keep going to convince you that web search engines have earned their popularity by serving a broad class of information needs (i.e., answering lots of questions)–and that’s without even using the wide variety of personalized and social features that web search engines are rapidly developing.

The common thread in the above questions is that they focus on objective information. In general, such questions are effectively and efficiently answered by search engines based on indexed, published content (including “deep web” content made available to search engines via APIs). There’s a lot of work we can do to improve search engines, particularly in the area of supporting query formulation. But it seems silly and wasteful to route such questions to other people–human beings should not be reduced to performing tasks at which machines excel.

That said, I agree with Kincaid that there are many information needs that are well addressed by  community question answering. In particular:

  • Questions for which point of view is a feature, not a bug. Review sites succeed when they provide sincere, informed personal reactions to products and services. Similarly, routing questions to people makes sense when we care about the answerer’s point of view. For some questions, I want the opinion of someone who shares my taste (which is what Hunch is pursuing with its “taste graph”). For others, I want a diversity of expert opinions–for which I might turn to Aardvark (which tries to route questions to topic experts), Quora (where people follow particular topics), or LinkedIn Answers. Over time, the answers to many such questions can be published and indexed–and indeed some answers sites receive a large share of their traffic from search engines.
  • Niche topics. As much as web search has improved information accessibility for the “long tail” of published information, the effectiveness of web search can be highly variable for the most obscure information needs. Moreover, this effectiveness depends significantly on the user: some people are better at searching than others, especially in their areas of domain expertise. Social search can help level the playing field. Much as Wikipedia has surfaced much of the expertise at the head of the information distribution, community question answering can help out in the tail.
  • Community for its own sake. Even in cases where search engines are more effective and efficient than community question answering services, some people prefer to participate in a social exchange rather than to conduct a transaction with an impersonal algorithm. Indeed, researchers at Aardvark found that many of the questions posed through their service (pre-acquisition) could be answered successfully using Google. I’ll go out on a limb and assume that Aardvark’s users were early technology adopters who are quite conversant with search engines–but in some cases chose to use a social alternative simply because they wanted to be social.

Conclusions? Community question answering may be overhyped right now, but it isn’t a fad. There are broad classes of subjective information needs that require a point of view, if not a diversity of views. And even if much of the use of community question answering sites is mediated by search engines indexing their archives, there will always be a need for fresh content. I also believe that social search will continue to be valuable for niche topics, since neither search engines nor searchers will ever be perfect.

But I think the biggest open question is whether people will favor community question answering simply to be social. I conjecture that, by very publicly integrating community question answering into its social networking platform, Facebook is testing the hypothesis that it can turn information seeking from a utilitarian individual task into an entertaining social destination. Given Facebook’s highly engaged user population, we won’t have to wait long to find out.


SIGIR 2010: Day 3 Industry Track Afternoon Sessions

While the SIGIR 2010 Industry Track keynotes had the highest-profile speakers, the rest of the day assembled an impressive line-up:

  • The new frontiers of Web search: going beyond the 10 blue links
    Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, Yahoo! Labs
  • Cross-Language Information Retrieval in the Legal Domain
    Samir Abdou and Thomas Arni, Eurospider
  • Building and Configuring a Real-Time Indexing System
    Garret Swart, Ravi Palakodety, Mohammad Faisal, Wesley Lin, Oracle
  • Lessons and Challenges from Product Search
    Daniel E. Rose, A9.com (Amazon)
  • Being Social: Research in Context-aware and Personalized Information Access @ Telefonica
    Xavier Amatriain, Karen Church and Josep M. Pujol, Telefónica
  • Searching and Finding in a Long Tail Marketplace
    Neel Sundaresan, eBay
  • When No Clicks are Good News
    Carlos Castillo, Aris Gionis, Ronny Lempel, and Yoelle Maarek, Yahoo! Research

I missed the Eurospider and Oracle talks, but otherwise I spent the afternoon enjoying these sessions. The slides, along with all of the keynote slides, are available here.

Some highlights from the talks I attended:

  • Andrei Broder, a pioneer of Web IR and author of the highly cited “Taxonomy of Web Search”, enumerated a half-dozen challenges for web search to move from its current state to one that not only accomplishes semantic analysis but also supports task completion. Naturally, the one that appeals to me is the need for search engines to move beyond query suggestion and truly engage the user in a dialog.
  • Dan Rose talked about the challenges of product search, and in particular the blessing and curse of implementing search applications for structured data (something that I’m very familiar with from my previous role at Endeca). He also warned of the dangers of over-interpreting behavioral data, e.g., a site change that increases revenue does not necessarily imply a better user experience (it could just be favoring higher-priced inventory), and may ultimately alienate customers.
  • Xavier Amatriain focused on social search, and talked about how, as we’ve turned to context to help mitigate information overload, we find ourselves confronted with the new problem of context overload. Specifically, he cited the research questioning the wisdom of the crowd, and proposed the wisdom of the (expert) few as a better alternative.
  • Neel Sundaresan offered an interesting tour of eBay Research Labs prototypes, including BayEstimate, which helps sellers improve listing titles by discovering keywords that are both representative of the item and used in buyers’ queries.
  • Finally, Carlos Castillo offered a nice approach to discovering when search engine abandonment is “good abandonment”: identify a subset of “tenacious” users who almost never abandon searches and measure their abandonment–since it is almost certain to be the good kind (a rough sketch follows below).
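
Since the idea is simple to state, here is a rough sketch of how one might operationalize it in Python. The thresholds and the log format are illustrative assumptions on my part, not values from the talk.

    from collections import defaultdict

    def good_abandonment_rate(log, max_user_abandon_rate=0.05, min_queries=50):
        """log: iterable of (user_id, abandoned) pairs, one per query."""
        totals = defaultdict(int)
        abandons = defaultdict(int)
        for user, abandoned in log:
            totals[user] += 1
            if abandoned:
                abandons[user] += 1

        # "Tenacious" users almost never abandon, so when they do, the
        # abandonment is presumed to be the good kind: they found what
        # they needed on the results page itself.
        tenacious = [u for u in totals
                     if totals[u] >= min_queries
                     and abandons[u] / totals[u] <= max_user_abandon_rate]

        queries = sum(totals[u] for u in tenacious)
        abandoned = sum(abandons[u] for u in tenacious)
        return abandoned / queries if queries else 0.0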

All in all, I was very impressed with the quality of the Industry Track, and gratified to see how it had improved on the program I put together last year. Given the key role that industry plays in information retrieval, I think it is important that the top-tier IR conference promote the best that industry has to offer.


SIGIR 2010: Day 3 Industry Track Keynotes

When I organized the SIGIR 2009 Industry Track last year, my goal was to meet the standard set by the CIKM 2008 Industry Event: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the field in which they are working. I was mostly happy with the results last year, and the popularity of the industry track relative to the parallel technical sessions suggests that my assessment does not simply reflect personal bias.

But this year the SIGIR 2010 Industry Track broke new ground. The keynotes came from some of the most senior technologists at the world’s largest web search engines: William from Baidu, Yossi from Google, Jan from Bing, and Ilya from Yandex.

I won’t attempt to provide much detail about these presentations, first because I’m hoping they will all be posted online, and second because Jeff Dalton has already done an excellent job of posting live-blogged notes. Rather, I’ll offer a few reactions.

William’s presentation, “Future Search: From Information Retrieval to Information Enabled Commerce”, unsurprisingly focused on the Chinese search market. While the topic of Google in China was an elephant in the room, it did not surface even obliquely in the presentation–and I commend William for taking the high road. As for Baidu itself, its most interesting innovation is Aladdin, an open search platform that allows participating webmasters to submit query-content pairs.

Yossi’s presentation on “Search Flavours at Google” was a tour de force of Google’s recent innovations in the search and data mining space. The search examples mostly focused on the challenges of incorporating context into query understanding–where context might involve geography, time, social network, etc. But some of the more impressive examples showed off the power of data to predict the present. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.

Jan talked about “Query Understanding at Bing”. I hope he makes these slides available, since they do a really nice job of describing a machine-learning-based architecture for processing search queries. To get an idea of this topic, check out Nick Craswell’s presentation from last year’s SIGIR.

Finally, Ilya of Yandex, the largest search engine in Russia, talked about “Machine Learning in Search Quality at Yandex”. He described the main challenge in Russia as handling the local aspects of search: he gave as an example that, if you’re in a small town in Russia, then local results in Moscow may as well be on the moon. Local search is a topic close to my heart, not least because it is my day job! Ilya’s talk focused largely on Yandex’s MatrixNet implementation of learning to rank. I’m surprised he didn’t mention the challenges of data acquisition–in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.

All in all, the four keynotes collectively offered an excellent state-of-the-search-engine address.  As with last year, the industry track talks were the most popular morning sessions, and the speakers delivered the goods.


SIGIR 2010: Day 2 Technical Sessions

On the second day of the SIGIR 2010 conference, I did start shuttling between sessions to attend particular talks.

In the morning session, I attended three talks. The first, “Geometric Representations for Multiple Documents” by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides both theoretical and experimental evidence that geometric means work better than arithmetic means for representing such combinations. The second, “Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction” by Anna Shtok, Oren Kurland, and David Carmel, shows the efficacy of a utility estimation framework comprised of relevance models, measures like query clarity to estimate the representativeness of relevance models, and similarity measures to estimate the similarity or correlation between two ranked lists. The authors demonstrated significant improvements from the framework over simply using the representativeness measures for performance prediction. The third paper, “Evaluating Verbose Query Processing Techniques” by Samuel Huston and Bruce Croft, showed that removing “stop structures”, a generalization of stop words, could significantly improve performance on long queries. Interestingly, the authors evaluated their approach on “black box” commercial search engines Yahoo and Bing without knowledge of their retrieval models.
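
To make the geometric-versus-arithmetic distinction from the first talk concrete, here is a toy illustration of combining two unigram document models each way. This is just to convey the intuition; it is not Seo and Croft’s actual formulation, and the smoothing constant is my own assumption.

    import math

    def arithmetic_mean_model(models):
        # Simple average of term probabilities across the documents.
        vocab = set().union(*models)
        return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}

    def geometric_mean_model(models, eps=1e-9):
        # Smooth with eps so a term missing from one document does not zero
        # out the combined probability, then renormalize.
        vocab = set().union(*models)
        raw = {w: math.exp(sum(math.log(m.get(w, eps)) for m in models) / len(models))
               for w in vocab}
        z = sum(raw.values())
        return {w: p / z for w, p in raw.items()}

    d1 = {"jaguar": 0.6, "car": 0.4}
    d2 = {"jaguar": 0.5, "cat": 0.5}
    print(arithmetic_mean_model([d1, d2]))
    print(geometric_mean_model([d1, d2]))

The toy output shows why the choice matters: the geometric mean concentrates mass on terms shared by all the documents, while the arithmetic mean can be dominated by terms that appear in only one of them.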

After lunch, I mostly attended talks from the session on user feedback and user models. The first, “Incorporating Post-Click Behaviors Into a Click Model” by Feimin Zhong, Dong Wang, Gang Wang, Weizhu Chen, Yuchen Zhang, Zheng Chen, and Haixun Wang, proposed and experimentally validated a click model that infers document relevance from post-click behavior, such as dwell time, that can be derived from logs. The second, “Interactive Retrieval Based on Faceted Feedback” by Lanbo Zhang and Yi Zhang, described an approach using facet values for relevance and pseudo-relevance feedback. It’s interesting work, but I think the authors should look at work my colleagues and I presented at HCIR 2008 on distinguishing whether facet values are useful for summarization or for refinement. The third, “Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time” by Chao Liu, Ryen White, and Susan Dumais, offered an elegant model of dwell time and used it to predict dwell time distributions from page-level features (see the sketch below). Finally, I attended one talk from the session on retrieval models and ranking: “Finding Support Sentences for Entities” by Roi Blanco and Hugo Zaragoza. They presented a novel approach that generalizes snippets to interfaces offering named entities (e.g., people) as supplements to the search results. I am excited to see research that could make richer interfaces more explainable to users.
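
For readers unfamiliar with Weibull analysis, here is a minimal example of the kind of fit involved, using synthetic dwell times. The data and parameters below are my own illustration, not the paper’s.

    from scipy import stats

    # Synthetic dwell times (seconds); real data would come from browsing logs.
    dwell_times = stats.weibull_min.rvs(c=0.8, scale=30.0, size=5000, random_state=0)

    # Fit a two-parameter Weibull, with the location fixed at zero.
    shape, loc, scale = stats.weibull_min.fit(dwell_times, floc=0)
    print(f"fitted shape={shape:.2f}, scale={scale:.1f}")

    # A fitted shape below 1 corresponds to a decreasing hazard rate
    # ("negative aging"): the longer a user has already stayed on a page,
    # the less likely they are to leave in the next instant.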

I spent the last session of the day listening to a couple of talks about users and interactive IR. The first, “Studying Trailfinding Algorithms for Enhanced Web Search” by Adish Singla, Ryen White, and Jeff Huang, turned out to be the best-paper winner. This work extends previous work that Ryen and colleagues have done on search trails and shows that various trailfinding algorithms outperform the trails users follow on their own. The second, “Context-Aware Ranking in Web Search” by Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, and Hang Li, analyzes requerying behavior as reformulation, specialization, generalization, or general association, and demonstrates that knowing or inferring which of these the user is doing significantly improves ranking of the second query’s results.

The day wrapped up with a luxurious banquet at the Hotel Intercontinental, near the Nations Plaza. After sweating through conference sessions without air conditioning, I was pleasantly surprised to enjoy great food in such an elegant setting.


SIGIR 2010: Day 2 Keynote

The second day of the SIGIR 2010 conference kicked off with a keynote by TREC pioneer Donna Harman entitled “Is the Cranfield Paradigm Outdated?”. If you are at all familiar with Donna’s work on TREC, you’ll hardly be surprised that her answer was a resounding “NO!”.

But of course she did a lot more than defend Cranfield. She offered a comprehensive and fascinating history of the Cranfield paradigm, starting with the Cranfield 1 experiments in the late 1950s which evaluated manual indexing systems.

Most importantly, she characterized the Cranfield paradigm as defining a metric that reflects a real user model and building the collection before the experiments to prevent human bias and enable reusability. As she noted, this model does not say anything about only returning a ranked list of ten blue links–which is what most people (myself included) associate with the Cranfield model. Indeed, she urged us to think outside this mindset.

I loved the presentation and found the history enlightening (though Stephen Robertson corrected a few minor details). Still, I wondered if she was defining the Cranfield paradigm so broadly as to co-opt all of its critics. But I think the clear dividing line between Cranfield and non-Cranfield is whether user effects are something to avoid or embrace. I perceive the success of Cranfield as coming in large part from its reduction of user effects. But I think that much of the HCIR community sees user effects as precisely what we need to be evaluating for information seeking support systems.

In any case, it was a great keynote, and Donna promises me she will make the slides available. Of course I’ll post them here. In the mean time, check out Jeff Dalton’s notes on his great blog and the tweets at #sigir2010.


SIGIR 2010: Day 1 Posters

The first day of SIGIR 2010 ended with a monster poster session–over 100 posters to see in 2 hours in a hall without air conditioning! I managed to see a handful:

  • “Query Quality: User Ratings and System Predictions” by Claudia Hauff, Franciska de Jong, Diane Kelly, and Leif Azzopardi offered the startling (to me at least) result that human prediction of query difficulty did not correlate (or at best correlated weakly) with post-retrieval query performance prediction (QPP) measures like query clarity. I talked with Diane about it, and I wonder how strongly the human prediction, which was pre-retrieval, would correlate with human assessments of the results. I also don’t know how well the QPP measures she used apply to web search contexts.
  • Which leads me to the next poster I saw, “Predicting Query Performance on the Web” by Niranjan Balasubramanian, Giridhar Kumaran, and Vitor Carvalho. They offered what I saw as a much more encouraging result–namely that QPP is highly reliable when it returns low scores. In other words, a search engine may wrongly believe that it did well on a query, but it is almost certainly right when it thinks it failed. This certainty on the negative side is exactly the opening that HCIR advocates need to offer richer interaction for queries where a conventional ranking approach recognizes its own failure. While some of the specifics of the authors’ approach are proprietary (they perform regression on features used by Bing), the approach seems broadly applicable.
  • Next I saw “Hashtag Retrieval in a Microblogging Environment” by Miles Efron. He provided evidence that hashtags could be an effective foundation for query expansion of Twitter search queries, using a language model approach. The approach may generalize beyond hashtags, but hashtags do have the advantage of being highly topical and relatively unambiguous by convention.
  • “The Power of Naive Query Segmentation” by Matthias Hagen, Martin Potthast, Benno Stein, and Christof Brautigam suggested a simple approach for segmenting long queries into quoted phrases: consider all segmentations and, for a given segmentation, compute a weighted sum of the Google ngram counts for each quoted phrase, the weight of a phrase of length s being s^s (see the sketch after this list). I don’t find the weighting particularly intuitive, but the accuracy numbers they present look quite nice relative to more sophisticated approaches.
  • “Investigating the Suboptimality and Instability of Pseudo-Relevance Feedback” by Raghavendra Udupa and Abhijit Bhole showed that an oracle with knowledge of a few high-scoring non-relevant documents could vastly improve the performance of pseudo-relevance feedback. While this information does not lead directly to any applications, it does suggest that obtaining a very small amount of feedback from the user might go a long way. I’m curious how much is possible from even a single negative-feedback input.
  • “Short Text Classification in Twitter to Improve Information Filtering” by Bharath Sriram, David Fuhry, Engin Demir, Hakan Ferhatosmanoglu, and Murat Demirbas challenged the conventional wisdom that tweets are too short for traditional classification methods. They achieved nice results, but on the relatively simple problem of classifying tweets as news, events, opinions, deals, and private messages. I was offered promises of future work, but I think the more general classification problem is much harder.
  • “Metrics for Assessing Sets of Subtopics” by Filip Radlinski, Martin Szummer, and Nick Craswell proposed an evaluation framework for result diversity based on coherence, distinctness, plausibility, and completeness. I suggested that this framework would apply nicely to faceted search interfaces, and that I’d love to see it demonstrated on production systems–especially since I think that might be easier to achieve than convincing the SIGIR community to embrace it.
  • Which leads me nicely to the last poster I saw, “Machine Learned Ranking of Entity Facets” by Roelof van Zwol, Lluis Garcia Pueyo, Mridul Muralidharan, and Borkur Sigurbjornsson. They found that they could accurately predict click-through rates on named entity facets (people, places) by learning from click logs. It’s worth noting that their entity facets are extremely clean, since they are derived from sources like Wikipedia, IMDB, GeoPlanet, and Freebase. It’s not clear to me how well their approach would work for noisier facets extracted from open-domain data.
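
As promised above, here is a small sketch of the naive segmentation scoring described in the Hagen et al. poster. The toy n-gram counts are made up, and I restrict the s^s weighting to multi-word phrases (single-word segments score zero), which I believe matches the method’s intent; treat the details as an approximation rather than a faithful reimplementation of the paper.

    def segmentations(words):
        # Yield every split of the word list into contiguous phrases. There
        # are 2^(len(words) - 1) of them, which is fine for short web queries.
        if not words:
            yield []
            return
        for i in range(1, len(words) + 1):
            head = tuple(words[:i])
            for rest in segmentations(words[i:]):
                yield [head] + rest

    def best_segmentation(query, ngram_counts):
        words = query.split()

        def score(seg):
            # Weight each multi-word phrase by |phrase|^|phrase| times its
            # n-gram count; single-word segments contribute nothing here.
            return sum((len(p) ** len(p)) * ngram_counts.get(" ".join(p), 0)
                       for p in seg if len(p) > 1)

        return max(segmentations(words), key=score)

    # Toy counts standing in for the Google n-gram corpus.
    counts = {"new york": 5_000_000, "times square": 3_000_000,
              "new york times": 1_000_000, "york times": 600_000}
    print(best_segmentation("new york times square", counts))
    # [('new', 'york'), ('times', 'square')]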

As I said, there were over a hundred posters, and I’d meant to see far more of them. Hopefully other people will blog about some of them! Or perhaps tweet about them at #sigir2010.


SIGIR 2010: Day 1 Technical Sessions

I’ve always felt that parallel conference sessions are designed to optimize for anticipated regret, and SIGIR 2010 is no exception. I decided that I’d try to attend whole sessions rather than shuttle between them. I started by attending the descriptively titled “Applications I” session.

Jinyoung Kim of UMass presented joint work with Bruce Croft on “Ranking using Multiple Document Types in Desktop Search” in which they showed that type prediction can significantly improve known-item search performance in simulated desktop settings. I like the approach and result, but I’d be very interested to see how well it applied to more recall-oriented tasks.

Then came work by Googlers Enrique Alfonseca, Marius Pasca, and Enrique Robledo-Arnuncio on “Acquisition of Instance Attributes via Labeled and Related Instances” that overcomes the data sparseness of open-domain attribute extraction by computing relationships among instances and injecting this relatedness data into the instance-attribute graph so that attributes can be propagated to more instances. This is a nice enhancement to earlier work by Pasca and others on obtaining these instance-attribute graphs.

The session ended with an intriguing paper on “Relevance and Ranking in Online Dating Systems” by Yahoo researchers Fernando Diaz, Donald Metzler, and Sihem Amer-Yahia that formulated a two-way relevance model for matchmaking systems but unfortunately found that it did no better than query-independent ranking in the context of a production personals system. I would be very interested to see how the model applied to other matchmaking scenarios, such as matching job seekers to employers.

After a wonderful lunch hosted by Morgan & Claypool for authors, I attended a session on Filtering and Recommendation.

It started with a paper on “Social Media Recommendation Based on People and Tags” by IBM researchers Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. They analyzed item recommendation in an enterprise setting and found that a hybrid approach combining algorithmic tag-based recommendations with people-based recommendations achieves better performance at delivering interesting recommendations than either approach alone. I’m curious how well these results generalize outside of enterprise settings–or even how well they apply across the large variation in enterprises.

Then came work by Nikolaos Nanas, Manolis Vavalis, and Anne De Roeck on “A Network-Based Model for High-Dimensional Information Filtering”. The authors propose to overcome the “curse of dimensionality” of vector space representations of profiles by instead modeling keyword dependencies in a directed graph and applying a non-iterative activation model to it. The presentation was excellent, but I’m not entirely convinced by the baseline they used for their comparisons.

After that was a paper by Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain on “Temporal Diversity in Recommender Systems”. They focused on the problem that users get bored and frustrated by recommender systems that keep recommending the same items over time. They provided evidence that users prefer temporal diversity of recommendations and suggested some methods to promote it. I like the research, but I still think that recommendation engines cry out for transparency, and that transparency can also help address the diversity problem–e.g., pick a random movie the user watched and propose recommendations explicitly based on that movie.

Unfortunately I missed the last paper of the session, in which Noriaki Kawamae talked about “Serendipitous Recommendations via Innovators”.

Reminder: also check out the tweet stream with hash tag #sigir2010.


SIGIR 2010: Day 1 Keynote

As promised, here are some highlights of the SIGIR 2010 conference thus far. Also check out the tweet stream with hash tag #sigir2010.

I arrived here on Monday, too jet-lagged to even imagine attending the tutorials, but fortunately I recovered enough to go to the welcome reception in the Parc des Bastions that evening. Then a night of sleep and on to the main event.

Tuesday morning kicked off with a keynote by Microsoft Live Labs director Gary Flake entitled “Zoomable UIs, Information Retrieval, and the Uncanny Valley”. Flake’s premise is that information retrieval is stuck in the “uncanny valley”, a metaphor he borrows from the robotics community. According to Wikipedia:

The theory holds that when robots and other facsimiles of humans look and act almost like actual humans, it causes a response of revulsion among human observers. The “valley” in question is a dip in a proposed graph of the positivity of human reaction as a function of a robot’s lifelikeness.

Flake offered Grokker (R.I.P.) as an example of a search interface that emphasized visual clustering and got stuck in the uncanny valley. He called it “the sexiest search experience that no one was going to use”. Flake then went on to propose that moving beyond the uncanny valley would require replacing our current discrete interactions with search engines with a mode of continuous, fluid interaction in which the whole of the data is greater than the sum of its parts. He offered some demos, emphasizing the recently released Pivot client, that he felt provided a vision for overcoming the uncanny valley.

As became clear in the question and answer period, many people (myself included) felt that this rich visual approach might work well for browsing images but was a less clear fit for text-oriented information needs–despite Flake offering a demo based on the collection of Wikipedia documents. In fairness, it may be too early to assess a proof of concept.


Off to Geneva for SIGIR

I’m flying to Geneva tonight to attend SIGIR. Hope to see some of you there! I’ll be back in a week and will post highlights and personal reactions.


The War on Attention Poverty: Measuring Twitter Authority


I gave this presentation today at AT&T Labs, hosted by Stephen North of Graphviz fame. The talk was recorded, but I don’t know when the video will be available. In the mean time, the slides are available on SlideShare.

The audience was very engaged and questioned just about all of the TunkRank model’s assumptions. I’m hopeful that as Jason Adams and Israel Kloss work on making a business out of TunkRank, they’ll bridge some of the gap between simplicity and realism.
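
For readers who haven’t seen it, the TunkRank recurrence, in simplified form, sets a user’s influence to the sum, over that user’s followers, of (1 + p times the follower’s influence) divided by the number of accounts the follower follows, where p is the assumed probability that a follower acts on a tweet. Below is a minimal, illustrative sketch on a toy graph; it glosses over normalization and convergence details and should not be read as a production implementation.

    def tunkrank(followers, following_count, p=0.05, iterations=50):
        # followers maps each user to the set of users who follow them;
        # following_count gives how many accounts each user follows;
        # p is the assumed probability that a follower acts on a tweet.
        users = list(followers)
        influence = {u: 1.0 for u in users}
        for _ in range(iterations):
            new = {}
            for u in users:
                new[u] = sum((1.0 + p * influence[f]) / following_count[f]
                             for f in followers[u]
                             if following_count.get(f, 0) > 0)
            influence = new
        return influence

    # Tiny toy graph: a follows b and c; b follows c; c follows no one.
    followers = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
    following_count = {"a": 2, "b": 1, "c": 0}
    print(tunkrank(followers, following_count))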