CIKM 2011 Industry Event: Ilya Segalovich on Improving Search Quality at Yandex

This post is the last in a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose.

The final talk of the CIKM 2011 Industry Event came from Yandex co-founder and CTO Ilya Segalovich, on “Improving Search Quality at Yandex: Current Challenges and Solutions”.

Yandex is the world’s #5 search engine. It dominates the Russian search market, where it has over 64% market share. Ilya focused on three challenges facing Yandex: result diversification, recency-specific ranking, and cross-lingual search.

For result diversification, Ilya focused on queries containing entities without any additional indicators of intent. He asserted that entities offer a strong but incomplete signal of query intent, and in particular that entities often call for suggested query reformulations. The first step in processing such a query is entity categorization. Ilya said that Yandex achieved almost 90% precision using machine learning, and over 95% precision by incorporating manually tuned heuristics. The second step is enumerating possible search intents for the identified category in order to optimize for intent-aware expected reciprocal rank. By diversifying entity queries, Yandex reduced abandonment on popular queries, increased click-through rates, and was able to highlight possible intents in result snippets.
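
For readers unfamiliar with the metric, here is a minimal sketch of intent-aware expected reciprocal rank as it is usually defined in the IR literature: compute ERR separately for each intent, then average the per-intent scores weighted by the probability of each intent. The talk did not go into Yandex's implementation, so the function names, the grading scale, and the example intents below are my own assumptions.

```python
from typing import Dict, List


def err(grades: List[int], max_grade: int = 4) -> float:
    """Expected reciprocal rank for one ranked list of relevance grades.

    Assumes the common cascade model: a result with grade g satisfies the
    user with probability (2**g - 1) / 2**max_grade.
    """
    score = 0.0
    p_reach = 1.0  # probability that the user gets as far as this rank
    for rank, grade in enumerate(grades, start=1):
        p_satisfied = (2 ** grade - 1) / (2 ** max_grade)
        score += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return score


def err_ia(
    intent_probs: Dict[str, float],
    grades_by_intent: Dict[str, List[int]],
    max_grade: int = 4,
) -> float:
    """Intent-aware ERR: per-intent ERR weighted by intent probability."""
    return sum(
        prob * err(grades_by_intent[intent], max_grade)
        for intent, prob in intent_probs.items()
    )


# Hypothetical entity query with two equally likely intents and binary
# grades; each list grades the same two-result ranking for one intent.
intent_probs = {"official_site": 0.5, "reviews": 0.5}
grades_by_intent = {"official_site": [1, 0], "reviews": [0, 1]}
score = err_ia(intent_probs, grades_by_intent, max_grade=1)
```

A ranking that serves only one intent at the top scores well for that intent and poorly for the others, which is why optimizing this metric pushes toward diversified results.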

Ilya then talked about the problem of balancing recency and relevance in handling queries about current events. He sees recency ranking as a diversification problem, since a desire for recent content is a kind of query intent. A challenge in managing recency-specific ranking is predicting the recency sensitivity of the user for a given query. Yandex considers factors such as the fraction of results found that are at most 3 days old, the number of news results, spikes in the query stream, lexical cues (e.g., searches for “explosion” or “fire”), and Twitter trending topics. He also referred to a WWW 2006 paper he co-authored on extracting news-related queries from web query logs. The results of these efforts led to measurable improvements in click-based metrics of user happiness.
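
To make the first of those factors concrete, here is a small sketch of how one might compute the fraction of retrieved results that are at most 3 days old. This is only an illustration of the feature as described in the talk: the function name, the use of publication timestamps, and the handling of empty result sets are my assumptions, not details Ilya presented.

```python
from datetime import datetime, timedelta
from typing import List


def fresh_fraction(
    publish_times: List[datetime],
    now: datetime,
    max_age_days: int = 3,
) -> float:
    """Fraction of results published within the last max_age_days days."""
    if not publish_times:
        return 0.0  # assumption: no results means no evidence of freshness
    cutoff = now - timedelta(days=max_age_days)
    fresh = sum(1 for t in publish_times if cutoff <= t <= now)
    return fresh / len(publish_times)


# Hypothetical result set: two documents within 3 days, two older ones.
now = datetime(2011, 10, 28)
publish_times = [
    now - timedelta(days=1),
    now - timedelta(days=3),
    now - timedelta(days=10),
    now - timedelta(days=30),
]
feature = fresh_fraction(publish_times, now)
```

A feature like this would be one input among several (news result counts, query-stream spikes, lexical cues) to whatever model predicts recency sensitivity.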

Ilya talked about a variety of efforts to support cross-lingual search. Russian users enter a significant fraction (about 15%) of non-Russian queries, but many still prefer Russian-language results. For example, a search for a company name should return that company’s Russian-language home page if one is available. Yandex implements language personalization by learning a user’s language knowledge and using it as a factor in relevance computation. Yandex also uses machine translation to serve results for Russian-language queries when there are no relevant Russian-language results.

Ilya concluded by pitching the efforts that Yandex is making to participate in and support the broader information retrieval community, including running (and releasing data for) a relevance prediction challenge. It’s great to see a reminder that there is more to web search than Google vs. Bing, and refreshing to see how much Yandex shares its methodology and results with the IR community.

By Daniel Tunkelang

High-Class Consultant.