Categories
General

SIGIR 2010: Day 3 Industry Track Keynotes

When I organized the SIGIR 2009 Industry Track last year, my goal was to meet the standard set by the CIKM 2008 Industry Event: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the field in which they are working. I was mostly happy with the results last year, and the popularity of the industry track relative to the parallel technical sessions suggests that my assessment is not simply personal bias.

But this year the SIGIR 2010 Industry Track broke new ground. The keynotes were from some of the most senior technologists at the world’s largest web search engines:

I won’t attempt to provide much detail about these presentations, first because I’m hoping they will all be posted online and second because Jeff Dalton has already done an excellent job of posting live-blogged notes. Rather, I’ll offer a few reactions.

William’s presentation on “Future Search: From Information Retrieval to Information Enabled Commerce” unsurprisingly focused on the Chinese search-related market. While the topic of Google in China was the elephant in the room, it did not surface even obliquely in the presentation–and I commend William for taking the high road. As for Baidu itself, its most interesting innovation is Aladdin, an open search platform that allows participating webmasters to submit query-content pairs.

Yossi’s presentation on “Search Flavours at Google” was a tour de force of Google’s recent innovations in the search and data mining space. The search examples focused mostly on the challenges of incorporating context into query understanding–where context might involve geography, time, social network, etc. But some of the more impressive examples showed off the power of data to predict the present. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.

Jan talked about “Query Understanding at Bing”. I really hope he makes these slides available, since they do a really nice job of describing a machine learning based architecture for processing search queries. To get an idea of this topic, check out Nick Craswell’s presentation from last year’s SIGIR.

Finally, Ilya talked about “Machine Learning in Search Quality at Yandex”, the largest search engine in Russia. He described the main challenge in Russia as handling the local aspects of search: he gave as an example that, if you’re in a small town in Russia, then local results in Moscow may as well be on the moon. Local search is a topic close to my heart, not least because it is my day job! Ilya’s talk focused largely on Yandex’s MatrixNet implementation of learning to rank. I’m surprised he didn’t mention the challenges of data acquisition–in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.

All in all, the four keynotes collectively offered an excellent state-of-the-search-engine address.  As with last year, the industry track talks were the most popular morning sessions, and the speakers delivered the goods.

SIGIR 2010: Day 2 Technical Sessions

On the second day of the SIGIR 2010 conference, I did start shuttling between sessions to attend particular talks.

In the morning session, I attended three talks. The first, “Geometric Representations for Multiple Documents” by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides both theoretical and experimental evidence that geometric means work better than arithmetic means for representing such combinations. The second, “Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction” by Anna Shtok, Oren Kurland, and David Carmel, shows the efficacy of a utility estimation framework comprising relevance models, measures like query clarity to estimate the representativeness of relevance models, and similarity measures to estimate the similarity or correlation between two ranked lists. The authors demonstrated significant improvements from the framework over simply using the representativeness measures for performance prediction. The third paper, “Evaluating Verbose Query Processing Techniques” by Samuel Huston and Bruce Croft, showed that removing “stop structures”, a generalization of stop words, could significantly improve performance on long queries. Interestingly, the authors evaluated their approach on “black box” commercial search engines Yahoo and Bing without knowledge of their retrieval models.

In the session after lunch, I mostly attended talks from the session on user feedback and user models. The first, “Incorporating Post-Click Behaviors Into a Click Model” by Feimin Zhong, Dong Wang, Gang Wang, Weizhu Chen, Yuchen Zhang, Zheng Chen, and Haixun Wang, proposed and experimentally validated a click model to infer document relevance from post-click behavior like dwell time that can be derived from logs. The second, “Interactive Retrieval Based on Faceted Feedback” by Lanbo Zhang and Yi Zhang, described an approach using facet values for relevance and pseudo-relevance feedback. It’s interesting work, but I think the authors should look at work my colleagues and I presented at HCIR 2008 on distinguishing whether facet values are useful for summarization or for refinement. The third, “Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time” by Chao Liu, Ryen White, and Susan Dumais, offered an elegant model of dwell time and used it to predict dwell time distribution from page-level features. Finally, I attended one talk from the session on retrieval models and ranking: “Finding Support Sentences for Entities” by Roi Blanco and Hugo Zaragoza. They presented a novel approach that generalizes snippets to interfaces offering named entities (e.g., people) as supplements to the search results. I am excited to see research that could make richer interfaces more explainable to users.

I spent the last session of the day listening to a couple of talks about users and interactive IR. The first, “Studying Trailfinding Algorithms for Enhanced Web Search” by Adish Singla, Ryen White, and Jeff Huang, turned out to be the best-paper winner. This work extends previous work that Ryen and colleagues have done on search trails and showed results of various trailfinding algorithms that outperform the trails users follow on their own. The second, “Context-Aware Ranking in Web Search” by Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, and Hang Li, analyzes requerying behavior as reformulation, specialization, generalization, or general association, and demonstrates that knowing or inferring which the user is doing significantly improves ranking of the second query’s results.

The day wrapped up with a luxurious banquet at the Hotel Intercontinental, near the Nations Plaza. After sweating through conference sessions without air conditioning, it was a welcome surprise to enjoy great food in such an elegant setting.

SIGIR 2010: Day 2 Keynote

The second day of the SIGIR 2010 conference kicked off with a keynote by TREC pioneer Donna Harman entitled “Is the Cranfield Paradigm Outdated?”. If you are at all familiar with Donna’s work on TREC, you’ll hardly be surprised that her answer was a resounding “NO!”.

But of course she did a lot more than defend Cranfield. She offered a comprehensive and fascinating history of the Cranfield paradigm, starting with the Cranfield 1 experiments in the late 1950s which evaluated manual indexing systems.

Most importantly, she defined the Cranfield paradigm as requiring a metric that reflects a real user model, and a collection built before the experiments to prevent human bias and enable reusability. As she noted, this model does not say anything about only returning a ranked list of ten blue links–which is what most people (myself included) associate with the Cranfield model. Indeed, she urged us to think outside this mindset.

I loved the presentation and found the history enlightening (though Stephen Robertson corrected a few minor details). Still, I wondered if she was defining the Cranfield paradigm so broadly as to co-opt all of its critics. But I think the clear dividing line between Cranfield and non-Cranfield is whether user effects are something to avoid or embrace. I perceive the success of Cranfield as coming in large part from its reduction of user effects. But I think that much of the HCIR community sees user effects as precisely what we need to be evaluating for information seeking support systems.

In any case, it was a great keynote, and Donna promises me she will make the slides available. Of course I’ll post them here. In the meantime, check out Jeff Dalton’s notes on his great blog and the tweets at #sigir2010.

SIGIR 2010: Day 1 Posters

The first day of SIGIR 2010 ended with a monster poster session–over 100 posters to see in 2 hours in a hall without air conditioning! I managed to see a handful:

  • “Query Quality: User Ratings and System Predictions” by Claudia Hauff, Franciska de Jong, Diane Kelly, and Leif Azzopardi offered the startling (to me at least) result that human prediction of query difficulty did not correlate (or at best correlated weakly) with post-retrieval query performance prediction (QPP) measures like query clarity. I talked with Diane about it, and I wonder how strongly the human prediction, which was pre-retrieval, would correlate with human assessments of the results. I also don’t know how well the QPP measures she used apply to web search contexts.
  • Which leads me to the next poster I saw, “Predicting Query Performance on the Web” by Niranjan Balasubramanian, Giridhar Kumaran, and Vitor Carvalho. They offered what I saw as a much more encouraging result–namely that QPP is highly reliable when it returns low scores. In other words, a search engine may wrongly believe that it did well on a query, but it is almost certainly right when it thinks it failed. This certainty on the negative side is exactly the opening that HCIR advocates need: offer richer interaction for queries where a conventional ranking approach recognizes its own failure. While some of the specifics of the authors’ approach are proprietary (they perform regression on features used by Bing), the approach seems broadly applicable.
  • Next I saw “Hashtag Retrieval in a Microblogging Environment” by Miles Efron. He provided evidence that hashtags could be an effective foundation for query expansion of Twitter search queries, using a language model approach. The approach may generalize beyond hashtags, but hashtags do have the advantage of being highly topical and relatively unambiguous by convention.
  • “The Power of Naive Query Segmentation” by Matthias Hagen, Martin Potthast, Benno Stein, and Christof Brautigam suggested a simple approach for segmenting long queries into quoted phrases: consider all segmentations and, for a given segmentation, compute a weighted sum of the Google ngram counts for each quoted phrase, the weight of a phrase of length s being s^s. I don’t find the weighting particularly intuitive, but the accuracy numbers they present look quite nice relative to more sophisticated approaches.
  • “Investigating the Suboptimality and Instability of Pseudo-Relevance Feedback” by Raghavendra Udupa and Abhijit Bhole showed that an oracle with knowledge of a few high-scoring non-relevant documents could vastly improve the performance of pseudo-relevance feedback. While this information does not lead directly to any applications, it does suggest that obtaining a very small amount of feedback from the user might go a long way. I’m curious how much is possible from even a single negative-feedback input.
  • “Short Text Classification in Twitter to Improve Information Filtering” by Bharath Sriram, David Fuhry, Engin Demir, Hakan Ferhatosmanoglu, and Murat Demirbas challenged the conventional wisdom that tweets are too short for traditional classification methods. They achieved nice results, but on the relatively simple problem of classifying tweets as news, events, opinions, deals, and private messages. The authors promised future work, but I think the more general classification problem is much harder.
  • “Metrics for Assessing Sets of Subtopics” by Filip Radlinski, Martin Szummer, and Nick Craswell proposed an evaluation framework for result diversity based on coherence, distinctness, plausibility, and completeness. I suggested that this framework would apply nicely to faceted search interfaces, and that I’d love to see it demonstrated on production systems–especially since I think that might be easier to achieve than convincing the SIGIR community to embrace it.
  • Which leads me nicely to the last poster I saw, “Machine Learned Ranking of Entity Facets” by Roelof van Zwol, Lluis Garcia Pueyo, Mridul Muralidharan, and Borkur Sigurbjornsson. They found that they could accurately predict click-through rates on named entity facets (people, places) by learning from click logs. It’s worth noting that their entity facets are extremely clean, since they are derived from sources like Wikipedia, IMDB, GeoPlanet, and Freebase. It’s not clear to me how well their approach would work for noisier facets extracted from open-domain data.

As I said, there were over a hundred posters, and I’d meant to see far more of them. Hopefully other people will blog about some of them! Or perhaps tweet about them at #sigir2010.

SIGIR 2010: Day 1 Technical Sessions

I’ve always felt that parallel conference sessions are designed to optimize for anticipated regret, and SIGIR 2010 is no exception. I decided that I’d try to attend whole sessions rather than shuttle between them. I started by attending the descriptively titled “Applications I” session.

Jinyoung Kim of UMass presented joint work with Bruce Croft on “Ranking using Multiple Document Types in Desktop Search” in which they showed that type prediction can significantly improve known-item search performance in simulated desktop settings. I like the approach and result, but I’d be very interested to see how well it applied to more recall-oriented tasks.

Then came work by Googlers Enrique Alfonseca, Marius Pasca, and Enrique Robledo-Arnuncio on “Acquisition of Instance Attributes via Labeled and Related Instances” that overcomes the data sparseness of open-domain attribute extraction by computing relationships among instances and injecting this relatedness data into the instance-attribute graph so that attributes can be propagated to more instances. This is a nice enhancement to earlier work by Pasca and others on obtaining these instance-attribute graphs.

The session ended with an intriguing paper on “Relevance and Ranking in Online Dating Systems” by Yahoo researchers Fernando Diaz, Donald Metzler, and Sihem Amer-Yahia that formulated a two-way relevance model for matchmaking systems but unfortunately found that it did no better than query-independent ranking in the context of a production personals system. I would be very interested to see how the model applied to other matchmaking scenarios, such as matching job seekers to employers.

After a wonderful lunch hosted by Morgan & Claypool for authors, I attended a session on Filtering and Recommendation.

It started with a paper on “Social Media Recommendation Based on People and Tags” by IBM researchers Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. They analyzed item recommendation in an enterprise setting and found that a hybrid approach combining algorithmic tag-based recommendations with people-based recommendations achieves better performance at delivering interesting recommendations than either approach alone. I’m curious how well these results generalize outside of enterprise settings–or even how well they apply across the large variation in enterprises.

Then came work by Nikolaos Nanas, Manolis Vavalis, and Anne De Roeck on “A Network-Based Model for High-Dimensional Information Filtering”. The authors propose to overcome the “curse of dimensionality” of vector space representations of profiles by instead modeling keyword dependencies in a directed graph and applying a non-iterative activation model to it. The presentation was excellent, but I’m not entirely convinced by the baseline they used for their comparisons.

After that was a paper by Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain on “Temporal Diversity in Recommender Systems”. They focused on the problem that users get bored and frustrated by recommender systems that keep recommending the same items over time. They provided evidence that users prefer temporal diversity of recommendations and suggested some methods to promote it. I like the research, but I still think that recommendation engines cry out for transparency, and that transparency can also help address the diversity problem–e.g., pick a random movie the user watched and propose recommendations explicitly based on that movie.

Unfortunately I missed the last paper of the session, in which Noriaki Kawamae talked about “Serendipitous Recommendations via Innovators”.

Reminder: also check out the tweet stream with hash tag #sigir2010.

SIGIR 2010: Day 1 Keynote

As promised, here are some highlights of the SIGIR 2010 conference thus far. Also check out the tweet stream with hash tag #sigir2010.

I arrived here on Monday, too jet-lagged to even imagine attending the tutorials, but fortunately I recovered enough to go to the welcome reception in the Parc de Bastions that evening. Then a night of sleep and on to the main event.

Tuesday morning kicked off with a keynote by Microsoft Live Labs director Gary Flake entitled “Zoomable UIs, Information Retrieval, and the Uncanny Valley”. Flake’s premise is that information retrieval is stuck in the “uncanny valley”, a metaphor he borrows from the robotics community. According to Wikipedia:

The theory holds that when robots and other facsimiles of humans look and act almost like actual humans, it causes a response of revulsion among human observers. The “valley” in question is a dip in a proposed graph of the positivity of human reaction as a function of a robot’s lifelikeness.

Flake offered Grokker (R.I.P.) as an example of a search interface that emphasized visual clustering and got stuck in the uncanny valley. He called it “the sexiest search experience that no one was going to use”. Flake then went on to propose that moving beyond the uncanny valley would require replacing our current discrete interactions with search engines with a mode of continuous, fluid interaction in which the whole of the data is greater than the sum of its parts. He offered some demos, emphasizing the recently released Pivot client, that he felt provided a vision to overcome the uncanny valley.

As became clear in the question and answer period, many people (myself included) felt that this rich visual approach might work well for browsing images but was a less clear fit for text-oriented information needs–despite Flake offering a demo based on the collection of Wikipedia documents. In fairness, it may be too early to assess a proof of concept.

The War on Attention Poverty: Measuring Twitter Authority

http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=waronattentionpoverty-100713213804-phpapp01&stripped_title=the-war-on-attention-poverty-measuring-twitter-authority

I gave this presentation today at AT&T Labs, hosted by Stephen North of Graphviz fame. The talk was recorded, but I don’t know when the video will be available. In the meantime, here are the slides.

The audience was very engaged and questioned just about all of the TunkRank model’s assumptions. I’m hopeful that as Jason Adams and Israel Kloss work on making a business out of TunkRank, they’ll bridge some of the gap between simplicity and realism.
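For readers unfamiliar with the model: TunkRank scores a user’s influence as the expected attention their tweets receive, where each follower divides attention among everyone they follow and passes along a fraction of their own influence with some retweet probability p. Here is a minimal fixed-point sketch of that recursion, using a made-up three-user graph and an assumed value of p:

```python
def tunkrank(followers, following_count, p=0.5, iters=50):
    """Iteratively compute TunkRank-style influence scores.

    followers[u] is the set of users who follow u; following_count[f] is how
    many accounts f follows. Each follower f contributes attention
    (1 + p * influence(f)) / following_count[f], where p is the probability
    that f retweets (and thus amplifies) a message.
    """
    infl = {u: 1.0 for u in followers}
    for _ in range(iters):
        infl = {u: sum((1 + p * infl[f]) / following_count[f]
                       for f in followers[u])
                for u in followers}
    return infl

# Hypothetical toy graph: b follows a; c follows both a and b.
followers = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
following_count = {"a": 0, "b": 1, "c": 2}
scores = tunkrank(followers, following_count)
```

On this toy graph the scores converge after a few iterations, with a (two followers, one of whom is undivided) well ahead of b, and c (no followers) at zero, which matches the intuition that attention, not follower count alone, drives influence.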

Recruiting and a Lesson in Attention Scarcity

Several people have asked me recently for advice on how to recruit for their tech startups. I’ve responded by digging out the following email that someone emailed me last year. I reproduce it in full here, minus the company name:

Subject: we just got Beatles Rock Band for the office and are looking for a vocalist !!

Good Afternoon,

I hope you don’t mind me reaching out to you, but came across your LinkedIn page and my interest is peaked, to say the least. I hope after reading this you feel the same.

If you’re unfamiliar with XXXXXX, we are a distinct small and agile team that functions as an incubated start-up funded by a larger organization. What we are working on is still kind of a secret but I can tell you that it’s focused on completely changing the way we find, consume, share, and manage content on the web today. We are focused on the growing importance of the real-time web and the concurrent need to reduce the noise. We are driven by a strong desire to deliver a better overall experience with a lot less effort required from our users.

Our office is extremely open and collegiate, and we are committed to letting ideas thrive above all else. We’re a very eclectic bunch of characters, but we all share a common commitment to taking whatever we do, fun or work, to the max. Some words that have been used to describe us are: passionate, fun, funny, innovative, contrarian, automagical, brilliant, academic, whimsical, and most importantly respectful. If you fit 3 or more of those descriptions, you might just have some of that magic we’re looking for.

If you’re interested in exploring this opportunity, please email me your resume and I’ll follow up with you ASAP, and have you come by meet the team some time soon.

Either way, I hope to hear from you!

Have a great weekend,

We too are BIG karaoke fans ( I read your website) , and as I said above we just got Beatles Rock Band for the office and are looking for a vocalist !!

Cheers,
XXXXX

I see this email as a poster child of how a startup should recruit. It’s well-written, funny, and shares enough about the opportunity to be an effective hook. Most importantly it’s *personal*. Starting from the subject line that made a great first impression, the email showed proof that the sender–a complete stranger–had taken time to get to know about me.

This is a strategy that does not scale arbitrarily–and that is the whole point. A startup that is building a small team needs to choose its prospective employees carefully and then go after those prospects with full force. If you really want to earn someone’s attention, you have to show that you’ve invested attention yourself. There’s no free lunch–if you want to send out a hundred emails like this one, you’ve got your work cut out for you! But no startup should be recruiting on such a massive scale, and the increase in yield justifies the additional per-candidate investment.

Of course, this principle applies beyond the narrow context of recruiting. Indeed, it is much like an attention bond mechanism: prove to me that you’ve invested in targeting me personally, and I’ll be more inclined to invest my attention in reading your message. Indeed, search advertising follows a similar principle. I still maintain that search is not advertising, but perhaps this aspect of negotiating a shared interest between messenger and messengee is a common thread.

Beyond Social Currency

A research study I like enough to have blogged about it a few times is Princeton sociologist Matt Salganik’s dissertation work on music preferences and social contagion. For those unfamiliar with this work, here is the abstract of his Science article “Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market” (co-authored with Peter Dodds and Duncan Watts):

Hit songs, books, and movies are many times more successful than average, suggesting that “the best” alternatives are qualitatively different from “the rest”; yet experts routinely fail to predict which products will succeed. We investigated this paradox experimentally, by creating an artificial “music market” in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants’ choices. Increasing the strength of social influence increased both inequality and unpredictability of success. Success was also only partly determined by quality: The best songs rarely did poorly, and the worst rarely did well, but any other result was possible.

The result is hardly surprising to anyone familiar with the history of pop music. But I’m intrigued by the possibility that technology is simultaneously pulling music as a social phenomenon in two opposite directions.

On one hand, YouTube and social networks may actually be amplifying the positive feedback of music popularity. The recent story of YouTube sensation Greyson Chance (yes, a 13-year old with his own Wikipedia entry) becoming a national phenomenon in a couple of weeks attests to the power of social contagion. I don’t mean to take anything away from Chance’s talent, but I feel safe asserting that his talent was necessary but hardly sufficient to achieve his popular success.

On the other hand, Internet radio services like Pandora and Last.fm, despite their social features, offer the possibility of drastically reducing the effect of social influence. Both of these services require users to provide some representation of their musical tastes as initial inputs, whether by selecting preset stations or using particular artists or songs as seeds. Presumably those tastes are in large part the product of social influence. But the subsequent interaction between users and these services is relatively buffered from social influence. Users hear songs while listening privately through headphones–in many cases at work or while commuting. No one else is around when those users decide how to rate what they are listening to.

Granted, social context will always seep in–I don’t think I could give a thumbs-up to a Justin Bieber song even in the privacy of my own Pandora profile. But much of the music I discover is from artists I’ve never heard of–and thus evaluate without the explicit social influence of preconceptions about those artists.

As it turns out, I often discover after the fact that a number of the artists I like have achieved popular success. I can’t tell whether that reflects on their objective music quality, my own conformity of musical taste, or skew on the part of the recommendation system (cf. does everything sound like Coldplay?). Still, I’m quite sure that I’m not favoring music based on prior knowledge of its popularity–for the most part, I don’t have that information at the time that I decide whether I like a song. Indeed, I hear new music almost exclusively through Pandora.

I don’t know how exceptional I am as a media consumer, but I suspect my case is increasingly common. Perhaps we are heading into a world where there will be a split between musical taste as social currency vs. musical taste as purely personal pleasure. It’s harder for me to imagine books or feature-length movies becoming so divorced from social context, if only because consuming them is a much larger and concentrated investment.

Still, I think it’s a big deal that this is happening in music. It’s a welcome counterpoint to the winner-take-all dynamic that has dominated the past decades of pop music. I can’t say that it will make the music industry more of a meritocracy–or that I even know what that would mean. But I think it’s a welcome step away from the caricature of conformity demonstrated by Salganik’s research.

SIGIR 2010 and SimInt 2010

I’m looking forward to attending SIGIR 2010 in a few weeks and particularly to the SimInt 2010 Workshop on the Automated Evaluation of Interactive Information Retrieval. I hope I get to see a little bit of the city of Geneva, but mostly I’m excited to spend the greater part of a week immersed in the global information retrieval community.

Of course I’ll blog about the conference, though I can’t promise it will be at quite the level of detail I managed last year. Also, I’m glad that SIGIR is continuing to have an industry track, and I am impressed with the program that David Harper and Peter Schäuble have put together. Needless to say, I’m glad to not have the stress of being an organizer this year! Though I’ll put in an early plug for CIKM 2011 in Glasgow, where I’ll be organizing the industry track with former co-worker Tony Russell-Rose.

Some SIGIR papers that caught my attention in the program:

  • Predicting Search Frustration
    Henry Feild, James Allan (University of Massachusetts Amherst), Rosie Jones (Yahoo! Labs)
    (looks like a follow-up to the first two authors’ HCIR 2009 paper on Modeling Searcher Frustration)
  • Relevance and Ranking in Online Dating Systems
    Fernando Diaz, Donald Metzler, Sihem Amer-Yahia (Yahoo! Labs)
  • On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics
    Jun Wang, Jianhan Zhu (University College London)
  • Is the Cranfield Paradigm Outdated? (keynote)
    Donna Harman (NIST)
  • Interactive Retrieval Based on Faceted Feedback
    Lanbo Zhang, Yi Zhang (University of California at Santa Cruz)
  • Do User Preferences and Evaluation Measures Line Up?
    Mark Sanderson, Monica Lestari Paramita, Paul Clough, Evangelos Kanoulas (University of Sheffield)
  • Human Performance and Retrieval Precision Revisited
    Mark D. Smucker, Chandra Prakash Jethani (University of Waterloo)

As for the SimInt workshop, it aims “to explore the use of Simulation of Interactions to enable automated evaluation of Interactive Information Retrieval Systems and Applications.” I’m very excited about this attempt to bridge the gap between TREC/Cranfield and IIR/HCIR through simulation. Props to Leif Azzopardi, Kal Järvelin, Jaap Kamps, and Mark Smucker for organizing it!

If you’re planning to attend SIGIR, please give me a shout! I plan to be there for the entire conference, and you’ll probably find me at the Google booth during some of the coffee breaks.