A topic that became an obsession for me in my last few years at Endeca is predicting query difficulty and performance, where performance is meant in the information retrieval sense of results quality rather than computational efficiency. If only we knew how well a search engine would do (or did) at meeting the user’s information need, we could adapt the user experience to reflect our degree of confidence.
I was particularly interested in work related to the query clarity score, first proposed by Steve Cronen-Townsend, Yun Zhou, and Bruce Croft in their 2002 paper “Predicting Query Performance”. But there is a broad body of work in this area, including methods that predict performance either before or after results are retrieved.
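For readers unfamiliar with the clarity score: as I understand the 2002 paper, it measures the Kullback-Leibler divergence between a query language model, estimated from the top-ranked documents, and the language model of the whole collection. A high score suggests a focused query; a low score suggests an ambiguous one. Below is a minimal Python sketch, not the authors’ implementation, under simplifying assumptions: the function name clarity_score is hypothetical, documents are non-empty lists of tokens, the top-ranked documents are drawn from the collection, document models use Jelinek-Mercer smoothing with an assumed λ = 0.6, and every document has the same prior.

```python
import math
from collections import Counter


def clarity_score(query, top_docs, collection, lam=0.6):
    """Simplified query clarity (hypothetical helper): KL divergence, in bits,
    between a query model built from top_docs and the collection model.

    Assumptions: docs are non-empty token lists, top_docs come from collection,
    lam is an assumed Jelinek-Mercer smoothing weight, document priors are uniform.
    """
    # Collection language model: relative frequency of each term.
    coll_counts = Counter(w for doc in collection for w in doc)
    coll_total = sum(coll_counts.values())
    p_coll = {w: c / coll_total for w, c in coll_counts.items()}

    def p_w_given_d(w, counts, length):
        # Smoothed document model: mix document and collection estimates.
        return lam * counts[w] / length + (1 - lam) * p_coll.get(w, 0.0)

    doc_stats = [(Counter(d), len(d)) for d in top_docs]

    # P(Q|D) as a product of smoothed query-term probabilities.
    likelihoods = []
    for counts, length in doc_stats:
        lik = 1.0
        for q in query:
            lik *= p_w_given_d(q, counts, length)
        likelihoods.append(lik)
    z = sum(likelihoods)
    if z == 0:
        return 0.0
    # P(D|Q) under a uniform document prior.
    posteriors = [lik / z for lik in likelihoods]

    # Query model P(w|Q) = sum_D P(w|D) P(D|Q); accumulate KL against P_coll(w).
    score = 0.0
    for w, pc in p_coll.items():
        p_wq = sum(post * p_w_given_d(w, counts, length)
                   for post, (counts, length) in zip(posteriors, doc_stats))
        if p_wq > 0:
            score += p_wq * math.log2(p_wq / pc)
    return score


collection = [
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "jungle"],
    ["apple", "fruit", "tree"],
    ["apple", "iphone", "phone"],
]
focused = clarity_score(["engine"], [collection[0]], collection)
ambiguous = clarity_score(["jaguar"], [collection[0], collection[1]], collection)
print(focused > ambiguous)
```

In this toy example the query “jaguar” retrieves one document about cars and one about cats, and mixing those two document models pulls the query model toward the collection model, so it should score lower than “engine”, whose single top document has a distinctive vocabulary.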
Happily, Claudia Hauff has just published a dissertation on this topic, entitled “Predicting the Effectiveness of Queries and Retrieval Systems”. It is very well written, and I recommend it to anyone interested in learning more about this subject. She presents not only her own original research but also a comprehensive analysis of related work by others.
Here is an excerpt from the abstract:
In this thesis we consider users’ attempts to express their information needs through queries, or search requests and try to predict whether those requests will be of high or low quality. Intuitively, a query’s quality is determined by the outcome of the query, that is, whether the retrieved search results meet the user’s expectations. The second type of prediction methods under investigation are those which attempt to predict the quality of search systems themselves. Given a number of search systems to consider, these methods estimate how well or how poorly the systems will perform in comparison to each other.
I look forward to seeing researchers continue to build on these results, and I am excited for the day when search engines can reflect on their own strengths and weaknesses.