SIGIR 2010: Day 3 Industry Track Keynotes

When I organized the SIGIR 2009 Industry Track last year, my goal was to meet the standard set by the CIKM 2008 Industry Event: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the field in which they are working. I was mostly happy with the results last year, and the popularity of the industry track relative to the parallel technical sessions suggests that my assessment does not simply reflect personal bias.

But this year the SIGIR 2010 Industry Track broke new ground. The keynotes were from some of the most senior technologists at the world’s largest web search engines:

- William (Baidu): “Future Search: From Information Retrieval to Information Enabled Commerce”
- Yossi (Google): “Search Flavours at Google”
- Jan (Bing): “Query Understanding at Bing”
- Ilya (Yandex): “Machine Learning in Search Quality at Yandex”

I won’t attempt to provide much detail about these presentations, first because they have all been posted online and second because Jeff Dalton has already done an excellent job of posting live-blogged notes. Rather, I’ll offer a few reactions.

William’s presentation on “Future Search: From Information Retrieval to Information Enabled Commerce” unsurprisingly focused on the Chinese search-related market. While the topic of Google in China was an elephant in the room, it did not surface even obliquely in the presentation–and I commend William for taking the high road. As for Baidu itself, its most interesting innovation is Aladdin, an open search platform that allows participating webmasters to submit query-content pairs.

Yossi’s presentation on “Search Flavours at Google” was a tour de force of Google’s recent innovations in the search and data mining space. The search examples focused mostly on the challenges of incorporating context into query understanding–where context might involve geography, time, social network, etc. But some of the more impressive examples showed off the power of using data to predict the present. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.

Jan talked about “Query Understanding at Bing”. I really hope he makes these slides available, since they do a nice job of describing a machine-learning-based architecture for processing search queries. To get a sense of this topic, check out Nick Craswell’s presentation from last year’s SIGIR.
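Until the slides are up, here is a minimal sketch of the kind of stage such a pipeline might include: a statistical query-intent classifier that runs before retrieval and ranking. Everything in it–the toy queries, the intent labels, the choice of model–is my own illustration and has nothing to do with Bing’s actual architecture.

```python
# Minimal sketch of one stage in a machine-learning-based query pipeline:
# classifying raw queries into coarse intents before retrieval and ranking.
# All queries, labels, and model choices here are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: (query, intent) pairs a real system would mine from logs.
queries = [
    "facebook login",          # navigational
    "youtube",                 # navigational
    "pizza near me",           # local
    "plumber in brooklyn",     # local
    "how does pagerank work",  # informational
    "symptoms of flu",         # informational
]
intents = ["navigational", "navigational", "local", "local",
           "informational", "informational"]

# Character n-grams cope reasonably well with short, misspelled queries.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(queries, intents)

# Downstream components (rankers, verticals) can key off the predicted intent.
print(model.predict(["cheap sushi downtown", "wikipedia"]))
```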

Finally, Ilya talked about “Machine Learning in Search Quality at Yandex”, the largest search engine in Russia. He described the main challenge in Russia as handling the local aspects of search: he gave as an example that, if you’re in a small town in Russia, then local results in Moscow may as well be on the moon. Local search is a topic close to my heart, not least because it is my day job! Ilya’s talk focused largely on Yandex’s MatrixNet implementation of learning to rank. What I’m surprised he didn’t mention is the challenges of data acquisition–in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.
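Ilya understandably didn’t get into MatrixNet’s internals–it is a proprietary gradient-boosting system–but the learning-to-rank idea behind it can be illustrated with a much simpler model. The following is a purely illustrative sketch of pairwise learning to rank with a linear scoring function; the features and relevance judgments are invented, and a production system uses boosted trees over far richer signals.

```python
# Minimal pairwise learning-to-rank sketch: learn a linear scoring function w
# so that, within each query, documents labeled more relevant outscore documents
# labeled less relevant (logistic loss on score differences, RankNet-style).
# Features and relevance judgments are fabricated for illustration only.
import numpy as np

# Toy data: 3 features per document (say, text match, link score, freshness),
# grouped by query, with graded relevance labels (2 = best, 0 = worst).
docs = {
    "q1": (np.array([[0.9, 0.2, 0.1],
                     [0.4, 0.8, 0.3],
                     [0.1, 0.1, 0.9]]), np.array([2, 1, 0])),
    "q2": (np.array([[0.7, 0.6, 0.2],
                     [0.2, 0.3, 0.8]]), np.array([1, 0])),
}

w = np.zeros(3)
learning_rate = 0.1

for _ in range(200):
    grad = np.zeros_like(w)
    for X, y in docs.values():
        scores = X @ w
        # For every pair where doc i is labeled more relevant than doc j,
        # nudge w so that score(i) exceeds score(j).
        for i in range(len(y)):
            for j in range(len(y)):
                if y[i] > y[j]:
                    p_misordered = 1.0 / (1.0 + np.exp(scores[i] - scores[j]))
                    grad -= p_misordered * (X[i] - X[j])
    w -= learning_rate * grad

# Rank q1's documents by learned score (indices, best first).
X1, _ = docs["q1"]
print(np.argsort(-(X1 @ w)))
```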

All in all, the four keynotes collectively offered an excellent state-of-the-search-engine address.  As with last year, the industry track talks were the most popular morning sessions, and the speakers delivered the goods.

By Daniel Tunkelang

High-Class Consultant.

6 replies on “SIGIR 2010: Day 3 Industry Track Keynotes”

Yossi’s presentation on “Search Flavours at Google” was a tour de force of Google’s recent innovations in the search and data mining space. The search examples focused mostly on the challenges of incorporating context into query understanding–where context might involve geography, time, social network, etc. But some of the more impressive examples showed off the power of using data to predict the present. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.

Sorry, I missed Yossi’s presentation, so could you clarify: How exactly does integrating social and geographical context, as well as predicting the present, mean that Google has gone beyond 10 blue links?

What I mean is, you can integrate prediction and context, and still only show the user 10 blue links. Those 10 links will be higher “quality”, because you’ve integrated that additional information. But using social/geographic/etc context alone doesn’t automatically mean that you’ve gone beyond 10 links.

Or what am I not understanding?


What I’m surprised he didn’t mention is the challenges of data acquisition–in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.

Could you elaborate on this? Yandex is a web search engine. What kind of challenges should it mention then?


The Yandex presentation emphasized the need for geo-specific ranking because relevant result sets vary across regions and countries. Consider a query for “pizza”. Returning locally relevant results is a lot easier if you have some idea of where the user is and what the nearby pizza places are.
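To make that concrete, here is a toy sketch of geo-aware scoring that discounts a text-relevance score by distance from the user. The blending formula and the decay parameter are entirely made up–a simple way to show the idea, not how Yandex (or anyone else) actually scores local results.

```python
# Toy illustration of geo-aware scoring: blend a text relevance score with
# proximity to the user, so that "pizza" favors nearby places. Entirely
# made up; not any particular engine's actual local-ranking formula.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def local_score(text_relevance, user_loc, result_loc, decay_km=5.0):
    """Discount relevance exponentially with distance from the user."""
    d = haversine_km(*user_loc, *result_loc)
    return text_relevance * math.exp(-d / decay_km)

# A pizzeria a kilometer away beats an equally relevant one hundreds of km away.
user = (55.75, 37.62)  # roughly Moscow
print(local_score(1.0, user, (55.76, 37.63)))  # nearby
print(local_score(1.0, user, (59.94, 30.31)))  # roughly St. Petersburg
```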

