Categories
General

SIGIR 2010: Day 3 Industry Track Afternoon Sessions

While the SIGIR 2010 Industry Track keynotes had the highest-profile speakers, the rest of the day assembled an impressive line-up:

  • The new frontiers of Web search: going beyond the 10 blue links
    Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, Yahoo! Labs
  • Cross-Language Information Retrieval in the Legal Domain
    Samir Abdou and Thomas Arni, Eurospider
  • Building and Configuring a Real-Time Indexing System
    Garret Swart, Ravi Palakodety, Mohammad Faisal, Wesley Lin, Oracle
  • Lessons and Challenges from Product Search
    Daniel E. Rose, A9.com (Amazon)
  • Being Social: Research in Context-aware and Personalized Information Access @ Telefonica
    Xavier Amatriain, Karen Church and Josep M. Pujol, Telefónica
  • Searching and Finding in a Long Tail Marketplace
    Neel Sundaresan, eBay
  • When No Clicks are Good News
    Carlos Castillo, Aris Gionis, Ronny Lempel, and Yoelle Maarek, Yahoo! Research

I missed the Eurospider and Oracle talks, but otherwise I spent the afternoon enjoying these sessions. The slides, along with all of the keynote slides, are available here.

Some highlights from the talks I attended:

  • Andrei Broder, a pioneer of Web IR and author of the highly cited “A Taxonomy of Web Search”, enumerated a half-dozen challenges for web search to move from its current state to one that not only accomplishes semantic analysis but also supports task completion. Naturally, the one that appeals to me is the need for search engines to move beyond query suggestion and truly engage the user in a dialog.
  • Dan Rose talked about the challenges of product search, and in particular the blessing and curse of implementing search applications for structured data (something that I’m very familiar with from my previous role at Endeca). He also warned of the dangers of over-interpreting behavioral data, e.g., a site change that increases revenue does not necessarily imply a better user experience (it could just be favoring higher-priced inventory), and may ultimately alienate customers.
  • Xavier Amatriain focused on social search, and talked about how, as we’ve turned to context to help mitigate information overload, we find ourselves confronted with the new problem of context overload. Specifically, he cited the research questioning the wisdom of the crowd, and proposed the wisdom of the (expert) few as a better alternative.
  • Neel Sundaresan offered an interesting tour of eBay Research Labs prototypes, including the BayEstimate that helps sellers improve listing titles by discovering keywords that are both representative of the item and used in buyers’ queries.
  • Finally, Carlos Castillo offered a nice approach to discovering when search engine abandonment is “good abandonment”: identify a subset of “tenacious” users who almost never abandon searches and measure their abandonment, since it is almost certain to be the good kind (a minimal sketch of the idea follows this list).
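To make the idea concrete, here is a minimal sketch of how one might compute it over a click log. The log format and the 5% tenacity threshold are my assumptions, not details from the talk:

```python
from collections import defaultdict

# Each log record: (user_id, query, clicked), where clicked is True
# if the user clicked at least one result for that query.
def good_abandonment_rate(log, max_abandon_rate=0.05):
    issued, abandoned = defaultdict(int), defaultdict(int)
    for user, query, clicked in log:
        issued[user] += 1
        if not clicked:
            abandoned[user] += 1

    # "Tenacious" users almost never abandon, so when they do,
    # it is likely because the results page itself satisfied them.
    tenacious = {u for u in issued
                 if abandoned[u] / issued[u] <= max_abandon_rate}

    t_issued = sum(issued[u] for u in tenacious)
    t_abandoned = sum(abandoned[u] for u in tenacious)
    return t_abandoned / t_issued if t_issued else 0.0
```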

All in all, I was very impressed with the quality of the Industry Track, and gratified to see how it had improved on the program I put together last year. Given the key role that industry plays in information retrieval, I think it is important that the top-tier IR conference promote the best that industry has to offer.

Categories
General

SIGIR 2010: Day 3 Industry Track Keynotes

When I organized the SIGIR 2009 Industry Track last year, my goal was to meet the standard set by the CIKM 2008 Industry Event: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the field in which they are working. I was mostly happy with the results last year, and the popularity of the industry track relative to the parallel technical sessions suggests that my assessment is not simply from personal bias.

But this year the SIGIR 2010 Industry Track broke new ground. The keynotes were from some of the most senior technologists at the world’s largest web search engines:

  • Future Search: From Information Retrieval to Information Enabled Commerce
    William, Baidu
  • Search Flavours at Google
    Yossi, Google
  • Query Understanding at Bing
    Jan, Microsoft
  • Machine Learning in Search Quality at Yandex
    Ilya, Yandex

I won’t attempt to provide much detail about these presentations, first because I’m hoping they will all be posted online, and second because Jeff Dalton has already done an excellent job of posting live-blogged notes. Rather, I’ll offer a few reactions.

William’s presentation on “Future Search: From Information Retrieval to Information Enabled Commerce” unsurprisingly focused on the Chinese search-related market. While the topic of Google in China was an elephant in the room, it did not surface even obliquely in the presentation–and I commend William for taking the high road. As for Baidu itself, its most interesting innovation is Aladdin, an open search platform that allows participating webmasters to submit query-content pairs.

Yossi’s presentation on “Search Flavours at Google” was a tour de force of Google’s recent innovations in the search and data mining space. The search examples mostly focused on the challenges of incorporating context into query understanding–where context might involve geography, time, social network, etc. But some of the more impressive examples showed off using the power of data to predict the present. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.

Jan talked about “Query Understanding at Bing”. I hope he makes these slides available, since they do a really nice job of describing a machine-learning-based architecture for processing search queries. To get an idea of this topic, check out Nick Craswell’s presentation from last year’s SIGIR.

Finally, Ilya talked about “Machine Learning in Search Quality at Yandex”, the largest search engine in Russia. He described the main challenge in Russia as handling the local aspects of search: if you’re in a small town in Russia, he noted, then local results in Moscow may as well be on the moon. Local search is a topic close to my heart, not least because it is my day job! Ilya’s talk focused largely on Yandex’s MatrixNet implementation of learning to rank. What I’m surprised he didn’t mention are the challenges of data acquisition–in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.

All in all, the four keynotes collectively offered an excellent state-of-the-search-engine address.  As with last year, the industry track talks were the most popular morning sessions, and the speakers delivered the goods.

Categories
General

SIGIR 2010: Day 2 Technical Sessions

On the second day of the SIGIR 2010 conference, I did start shuttling between sessions to attend particular talks.

In the morning session, I attended three talks. The first, “Geometric Representations for Multiple Documents” by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides both theoretical and experimental evidence that geometric means work better than arithmetic means for representing such combinations. The second, “Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction” by Anna Shtok, Oren Kurland, and David Carmel, shows the efficacy of a utility estimation framework comprising relevance models, measures like query clarity to estimate the representativeness of relevance models, and similarity measures to estimate the similarity or correlation between two ranked lists. The authors demonstrated significant improvements from the framework over simply using the representativeness measures for performance prediction. The third paper, “Evaluating Verbose Query Processing Techniques” by Samuel Huston and Bruce Croft, showed that removing “stop structures”, a generalization of stop words, could significantly improve performance on long queries. Interestingly, the authors evaluated their approach on “black box” commercial search engines Yahoo and Bing without knowledge of their retrieval models.
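To make the first result concrete, here is a toy sketch of the two combination strategies, assuming each document is represented as a unigram language model over a shared vocabulary (a simplification of the paper’s geometric framework):

```python
import numpy as np

def arithmetic_combination(doc_models):
    """Average the document distributions term by term."""
    return np.mean(doc_models, axis=0)

def geometric_combination(doc_models, eps=1e-12):
    """Take the mean in log space (i.e., multiply the distributions
    term by term), then renormalize back into a distribution."""
    combined = np.exp(np.mean(np.log(np.asarray(doc_models) + eps), axis=0))
    return combined / combined.sum()

# Two documents as unigram distributions over a 4-term vocabulary
docs = [np.array([0.7, 0.1, 0.1, 0.1]),
        np.array([0.1, 0.7, 0.1, 0.1])]
print(arithmetic_combination(docs))  # rewards terms strong in either document
print(geometric_combination(docs))   # rewards terms strong in both documents
```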

In the session after lunch, I mostly attended talks from the session on user feedback and user models. The first, “Incorporating Post-Click Behaviors Into a Click Model” by Feimin Zhong, Dong Wang, Gang Wang, Weizhu Chen, Yuchen Zhang, Zheng Chen, and Haixun Wang, proposed and experimentally validated a click model to infer document relevance from post-click behavior, such as dwell time, that can be derived from logs. The second, “Interactive Retrieval Based on Faceted Feedback” by Lanbo Zhang and Yi Zhang, described an approach using facet values for relevance and pseudo-relevance feedback. It’s interesting work, but I think the authors should look at work my colleagues and I presented at HCIR 2008 on distinguishing whether facet values are useful for summarization or for refinement. The third, “Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time” by Chao Liu, Ryen White, and Susan Dumais, offered an elegant model of dwell time and used it to predict dwell time distribution from page-level features. Finally, I attended one talk from the session on retrieval models and ranking: “Finding Support Sentences for Entities” by Roi Blanco and Hugo Zaragoza. They present a novel approach that generalizes snippets to interfaces that offer named entities (e.g., people) as supplements to the search results. I am excited to see research that could make richer interfaces more explainable to users.
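The Weibull paper invites a quick illustration. Here is a sketch of fitting a Weibull distribution to dwell times with SciPy; the data below is synthetic, and the closing comment is my gloss on why the shape parameter matters, not a claim from the paper:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic dwell times in seconds, standing in for real browser logs
dwell_times = rng.weibull(0.8, size=2000) * 45.0

# Fix the location at zero to fit the two-parameter Weibull
shape, _, scale = stats.weibull_min.fit(dwell_times, floc=0)
print(f"shape k = {shape:.2f}, scale = {scale:.1f}s")

# A shape k < 1 implies a decreasing hazard rate: the longer a user has
# already dwelt on a page, the less likely they are to leave right now.
```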

I spent the last session of the day listening to a couple of talks about users and interactive IR. The first, “Studying Trailfinding Algorithms for Enhanced Web Search” by Adish Singla, Ryen White, and Jeff Huang, turned out to be the best-paper winner. This work extends previous work that Ryen and colleagues have done on search trails and showed results of various trailfinding algorithms that outperform the trails users follow on their own. The second, “Context-Aware Ranking in Web Search” by Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, and Hang Li, analyzes requerying behavior as reformulation, specialization, generalization, or general association, and demonstrates that knowing or inferring which of these the user is doing significantly improves ranking of the second query’s results.
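As a rough illustration of the second paper’s taxonomy, here is one simple way one might bucket consecutive query pairs by term overlap. This heuristic is my own sketch, not the authors’ method:

```python
def classify_requery(q1: str, q2: str) -> str:
    """Crudely classify the relation of q2 to q1 by term overlap."""
    t1, t2 = set(q1.lower().split()), set(q2.lower().split())
    if t2 > t1:
        return "specialization"      # q2 adds terms to q1
    if t2 < t1:
        return "generalization"      # q2 drops terms from q1
    if t1 & t2:
        return "reformulation"       # overlapping, neither contains the other
    return "general association"     # no shared terms

print(classify_requery("geneva hotels", "cheap geneva hotels"))  # specialization
```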

The day wrapped up with a luxurious banquet at the Hotel Intercontinental, near the Nations Plaza. After sweating through conference sessions without air conditioning, it was a welcome surprise to enjoy great food in such an elegant setting.

Categories
General

SIGIR 2010: Day 2 Keynote

The second day of the SIGIR 2010 conference kicked off with a keynote by TREC pioneer Donna Harman entitled “Is the Cranfield Paradigm Outdated?”. If you are at all familiar with Donna’s work on TREC, you’ll hardly be surprised that her answer was a resounding “NO!”.

But of course she did a lot more than defend Cranfield. She offered a comprehensive and fascinating history of the Cranfield paradigm, starting with the Cranfield 1 experiments in the late 1950s which evaluated manual indexing systems.

Most importantly, she defined the Cranfield paradigm as defining a metric that reflects a real user model, and building the collection before the experiments to prevent human bias and enable reusability. As she noted, this model says nothing about only returning a ranked list of ten blue links–which is what most people (myself included) associate with the Cranfield model. Indeed, she urged us to think outside this mindset.

I loved the presentation and found the history enlightening (though Stephen Robertson corrected a few minor details). Still, I wondered if she was defining the Cranfield paradigm so broadly as to co-opt all of its critics. But I think the clear dividing line between Cranfield and non-Cranfield is whether user effects are something to avoid or embrace. I perceive the success of Cranfield as coming in large part from its reduction of user effects. But I think that much of the HCIR community sees user effects as precisely what we need to be evaluating for information seeking support systems.

In any case, it was a great keynote, and Donna promised me she will make the slides available. Of course I’ll post them here. In the meantime, check out Jeff Dalton’s notes on his great blog and the tweets at #sigir2010.

Categories
General

SIGIR 2010: Day 1 Posters

The first day of SIGIR 2010 ended with a monster poster session–over 100 posters to see in 2 hours in a hall without air conditioning! I managed to see a handful:

  • “Query Quality: User Ratings and System Predictions” by Claudia Hauff, Franciska de Jong, Diane Kelly, and Leif Azzopardi offered the startling (to me at least) result that human prediction of query difficulty did not correlate (or at best correlated weakly) with post-retrieval query performance prediction (QPP) measures like query clarity. I talked with Diane about it, and I wonder how strongly the human prediction, which was pre-retrieval, would correlate with human assessments of the results. I also don’t know how well the QPP measures she used apply to web search contexts.
  • Which leads me to the next poster I saw, “Predicting Query Performance on the Web” by Niranjan Balasubramanian, Giridhar Kumaran, and Vitor Carvalho. They offered what I saw as a much more encouraging result–namely that QPP is highly reliable when it returns low scores. In other words, a search engine may wrongly believe that it did well on a query, but it is almost certainly right when it thinks it failed. This certainty on the negative side is exactly the opening that HCIR advocates need: offer richer interaction for queries where a conventional ranking approach recognizes its own failure. While some of the specifics of the authors’ approach are proprietary (they perform regression on features used by Bing), the approach seems broadly applicable.
  • Next I saw “Hashtag Retrieval in a Microblogging Environment” by Miles Efron. He provided evidence that hashtags could be an effective foundation for query expansion of Twitter search queries, using a language model approach. The approach may generalize beyond hashtags, but hashtags do have the advantage of being highly topical and relatively unambiguous by convention.
  • “The Power of Naive Query Segmentation” by Matthias Hagen, Martin Potthast, Benno Stein, and Christof Brautigam suggested a simple approach for segmenting long queries into quoted phrases: consider all segmentations and, for a given segmentation, compute a weighted sum of the Google n-gram counts for each quoted phrase, the weight of a phrase of length s being s^s. I don’t find the weighting particularly intuitive, but the accuracy numbers they present look quite nice relative to more sophisticated approaches (see the sketch after this list).
  • “Investigating the Suboptimality and Instability of Pseudo-Relevance Feedback” by Raghavendra Udupa and Abhijit Bhole showed that an oracle with knowledge of a few high-scoring non-relevant documents could vastly improve the performance of pseudo-relevance feedback. While this information does not lead directly to any applications, it does suggest that obtaining a very small amount of feedback from the user might go a long way. I’m curious how much is possible from even a single negative-feedback input.
  • “Short Text Classification in Twitter to Improve Information Filtering” by Bharath Sriram, David Fuhry, Engin Demir, Hakan Ferhatosmanoglu, and Murat Demirbas challenged the conventional wisdom that tweets are too short for traditional classification methods. They achieved nice results, but on the relatively simple problem of classifying tweets as news, events, opinions, deals, and private messages. The authors promised future work, but I think the more general classification problem is much harder.
  • “Metrics for Assessing Sets of Subtopics” by Filip Radlinski, Martin Szummer, and Nick Craswell proposed an evaluation framework for result diversity based on coherence, distinctness, plausibility, and completeness. I suggested that this framework would apply nicely to faceted search interfaces, and that I’d love to see it demonstrated on production systems–especially since I think that might be easier to achieve than convincing the SIGIR community to embrace it.
  • Which leads me nicely to the last poster I saw, “Machine Learned Ranking of Entity Facets” by Roelof van Zwol, Lluis Garcia Pueyo, Mridul Muralidharan, and Borkur Sigurbjornsson. They found that they could accurately predict click-through rates on named entity facets (people, places) by learning from click logs. It’s worth noting that their entity facets are extremely clean, since they are derived from sources like Wikipedia, IMDB, GeoPlanet, and Freebase. It’s not clear to me how well their approach would work for noisier facets extracted from open-domain data.
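Since the naive segmentation scheme is easy to state, here is the sketch promised above. The `ngram_count` function is a hypothetical lookup into the Google n-gram counts, and the handling of single-word and zero-count segments is my assumption rather than the authors’ exact specification:

```python
def segmentations(words):
    """Enumerate every way to split a word list into contiguous segments."""
    n = len(words)
    for mask in range(2 ** max(n - 1, 0)):
        segs, start = [], 0
        for i in range(n - 1):
            if mask & (1 << i):   # a break after position i
                segs.append(words[start:i + 1])
                start = i + 1
        segs.append(words[start:])
        yield segs

def score(segs, ngram_count):
    """Weighted sum of phrase counts; the |s|^|s| weight boosts long phrases."""
    total = 0
    for s in segs:
        if len(s) == 1:
            continue  # assumption: single words contribute no phrase evidence
        count = ngram_count(" ".join(s))
        if count == 0:
            return 0  # assumption: reject segmentations with unseen phrases
        total += len(s) ** len(s) * count
    return total

def best_segmentation(query, ngram_count):
    return max(segmentations(query.split()),
               key=lambda segs: score(segs, ngram_count))
```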

As I said, there were over a hundred posters, and I’d meant to see far more of them. Hopefully other people will blog about some of them! Or perhaps tweet about them at #sigir2010.

Categories
General

SIGIR 2010: Day 1 Technical Sessions

I’ve always felt that parallel conference sessions are designed to optimize for anticipated regret, and SIGIR 2010 is no exception. I decided that I’d try to attend whole sessions rather than shuttle between them. I started by attending the descriptively titled “Applications I” session.

Jinyoung Kim of UMass presented joint work with Bruce Croft on “Ranking using Multiple Document Types in Desktop Search” in which they showed that type prediction can significantly improve known-item search performance in simulated desktop settings. I like the approach and result, but I’d be very interested to see how well it applied to more recall-oriented tasks.

Then came work by Googlers Enrique Alfonseca, Marius Pasca, and Enrique Robledo-Arnuncio on “Acquisition of Instance Attributes via Labeled and Related Instances” that overcomes the data sparseness of open-domain attribute extraction by computing relationships among instances and injecting this relatedness data into the instance-attribute graph so that attributes can be propagated to more instances. This is a nice enhancement to earlier work by Pasca and others on obtaining these instance-attribute graphs.
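Here is a toy one-step version of the propagation idea as I understood it; the data layout and the damping factor are my inventions, and the actual paper operates on a much richer instance-attribute graph:

```python
def propagate_attributes(attrs, relatedness, alpha=0.5):
    """One step of pushing attributes across related instances.

    attrs:       instance -> {attribute: weight}
    relatedness: instance -> [(related_instance, similarity)]
    """
    propagated = {}
    for inst in attrs:
        scores = dict(attrs[inst])
        for neighbor, sim in relatedness.get(inst, []):
            for attr, w in attrs.get(neighbor, {}).items():
                scores[attr] = scores.get(attr, 0.0) + alpha * sim * w
        propagated[inst] = scores
    return propagated

attrs = {"tokyo": {"population": 1.0}, "kyoto": {}}
related = {"kyoto": [("tokyo", 0.8)]}
print(propagate_attributes(attrs, related)["kyoto"])  # inherits "population"
```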

The session ended with an intriguing paper on “Relevance and Ranking in Online Dating Systems” by Yahoo researchers Fernando Diaz, Donald Metzler, and Sihem Amer-Yahia that formulated a two-way relevance model for matchmaking systems but unfortunately found that it did no better than query-independent ranking in the context of a production personals system. I would be very interested to see how the model applied to other matchmaking scenarios, such as matching job seekers to employers.
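The core intuition of a two-way relevance model is that a match is only as good as its weaker direction. A minimal sketch; the harmonic-mean combination is my assumption, not the paper’s actual model:

```python
def match_score(pref_a_for_b: float, pref_b_for_a: float) -> float:
    """Combine directional preferences; the harmonic mean punishes
    one-sided matches much more than an arithmetic mean would."""
    if pref_a_for_b == 0 or pref_b_for_a == 0:
        return 0.0
    return 2 * pref_a_for_b * pref_b_for_a / (pref_a_for_b + pref_b_for_a)
```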

After a wonderful lunch hosted by Morgan & Claypool for authors, I attended a session on Filtering and Recommendation.

It started with a paper on “Social Media Recommendation Based on People and Tags” by IBM researchers Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. They analyzed item recommendation in an enterprise setting and found that a hybrid approach combining algorithmic tag-based recommendations with people-based recommendations achieves better performance at delivering interesting recommendations than either approach alone. I’m curious how well these results generalize outside of enterprise settings–or even how well they apply across the large variation in enterprises.
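The hybrid itself is easy to sketch: blend per-item scores from the two recommenders. The linear blend and equal default weight below are my assumptions, not the paper’s aggregation:

```python
def hybrid_score(tag_scores, people_scores, beta=0.5):
    """Linear blend of tag-based and people-based recommendation scores."""
    items = set(tag_scores) | set(people_scores)
    return {item: beta * tag_scores.get(item, 0.0)
                  + (1 - beta) * people_scores.get(item, 0.0)
            for item in items}
```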

Then came work by Nikolaos Nanas, Manolis Vavalis, and Anne De Roeck on “A Network-Based Model for High-Dimensional Information Filtering”. The authors propose to overcome the “curse of dimensionality” of vector space representations of profiles by instead modeling keyword dependencies in a directed graph and applying a non-iterative activation model to it. The presentation was excellent, but I’m not entirely convinced by the baseline they used for their comparisons.
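To give a flavor of what a non-iterative activation model might look like, here is a single pass over a keyword-dependency graph; this is my guess at the general shape of the method, not its actual equations:

```python
def single_pass_activation(graph, seeds, decay=0.5):
    """One pass of spreading activation, with no iteration to convergence.

    graph: keyword -> [(dependent_keyword, edge_weight)]
    seeds: keyword -> initial activation (e.g., from a document's terms)
    """
    activation = dict(seeds)
    for node, energy in seeds.items():
        for neighbor, weight in graph.get(node, []):
            activation[neighbor] = (activation.get(neighbor, 0.0)
                                    + decay * weight * energy)
    return activation
```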

After that was a paper by Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain on “Temporal Diversity in Recommender Systems”. They focused on the problem that users get bored and frustrated by recommender systems that keep recommending the same items over time. They provided evidence that users prefer temporal diversity of recommendations and suggested some methods to promote it. I like the research, but I still think that recommendation engines cry out for transparency, and that transparency can also help address the diversity problem–e.g., pick a random movie the user watched and propose recommendations explicitly based on that movie.
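That transparency suggestion is almost trivial to implement. A sketch, where `similar_items` is a hypothetical item-to-item similarity lookup:

```python
import random

def because_you_watched(user_history, similar_items, k=5):
    """Anchor a recommendation list to one randomly chosen watched item,
    so the system can explain itself: 'because you watched X'."""
    anchor = random.choice(user_history)
    recs = [item for item in similar_items(anchor)
            if item not in user_history][:k]
    return anchor, recs
```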

Unfortunately I missed the last paper of the session, in which Noriaki Kawamae talked about “Serendipitous Recommendations via Innovators”.

Reminder: also check out the tweet stream with hash tag #sigir2010.

Categories
General

SIGIR 2010: Day 1 Keynote

As promised, here are some highlights of the SIGIR 2010 conference thus far. Also check out the tweet stream with hash tag #sigir2010.

I arrived here on Monday, too jet-lagged to even imagine attending the tutorials, but fortunately I recovered enough to go to the welcome reception in the Parc des Bastions that evening. Then a night of sleep and on to the main event.

Tuesday morning kicked off with a keynote by Microsoft Live Labs director Gary Flake entitled “Zoomable UIs, Information Retrieval, and the Uncanny Valley”. Flake’s premise is that information retrieval is stuck in the “uncanny valley”, a metaphor he borrows from the robotics community. According to Wikipedia:

The theory holds that when robots and other facsimiles of humans look and act almost like actual humans, it causes a response of revulsion among human observers. The “valley” in question is a dip in a proposed graph of the positivity of human reaction as a function of a robot’s lifelikeness.

Flake offered Grokker (R.I.P.) as an example of a search interface that emphasized visual clustering and got stuck in the uncanny valley. He called it “the sexiest search experience that no one was going to use”. Flake then went on to propose that moving beyond the uncanny valley would require replacing our current discrete interactions with search engines by a mode of continuous, fluid interaction in which the whole of the data is greater than the sum of its parts. He offered some demos, emphasizing the recently released Pivot client, that he felt provided a vision to overcome the uncanny valley.

As became clear in the question and answer period, many people (myself included) felt that this rich visual approach might work well for browsing images but was a less clear fit for text-oriented information needs–despite Flake offering a demo based on the collection of Wikipedia documents. In fairness, it may be too early to assess a proof of concept.

Categories
Uncategorized

Off to Geneva for SIGIR

I’m flying to Geneva tonight to attend SIGIR. Hope to see some of you there! I’ll be back in a week and will post highlights and personal reactions.

Categories
General

The War on Attention Poverty: Measuring Twitter Authority

Slides: http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=waronattentionpoverty-100713213804-phpapp01&stripped_title=the-war-on-attention-poverty-measuring-twitter-authority

I gave this presentation today at AT&T Labs, hosted by Stephen North of Graphviz fame. The talk was recorded, but I don’t know when the video will be available. In the meantime, here are the slides.

The audience was very engaged and questioned just about all of the TunkRank model’s assumptions. I’m hopeful that as Jason Adams and Israel Kloss work on making a business out of TunkRank, they’ll bridge some of the gap between simplicity and realism.
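For readers who haven’t seen the model: TunkRank measures a user’s influence by the expected attention of their followers. Here is a minimal fixed-point sketch of the recurrence, where p is the probability that a follower passes a tweet along; the iteration scheme and data layout are just for illustration:

```python
def tunkrank(followers, friends_count, p=0.05, iters=50):
    """Influence(X) = sum over followers Y of (1 + p*Influence(Y)) / |Friends(Y)|

    followers:     user -> list of users who follow them
    friends_count: user -> number of accounts that user follows
    p:             probability a follower passes a tweet along
    """
    influence = {u: 1.0 for u in followers}
    for _ in range(iters):
        influence = {
            u: sum((1.0 + p * influence.get(f, 0.0)) / max(friends_count[f], 1)
                   for f in followers[u])
            for u in followers
        }
    return influence

followers = {"a": ["b", "c"], "b": [], "c": ["b"]}
friends_count = {"a": 0, "b": 2, "c": 1}
print(tunkrank(followers, friends_count))
```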

Categories
General

Recruiting and a Lesson in Attention Scarcity

Several people have asked me recently for advice on how to recruit for their tech startups. I’ve responded by digging out the following email that someone sent me last year. I reproduce it in full here, minus the company name:

Subject: we just got Beatles Rock Band for the office and are looking for a vocalist !!

Good Afternoon,

I hope you don’t mind me reaching out to you, but came across your LinkedIn page and my interest is peaked, to say the least. I hope after reading this you feel the same.

If you’re unfamiliar with XXXXXX, we are a distinct small and agile team that functions as an incubated start-up funded by a larger organization. What we are working on is still kind of a secret but I can tell you that it’s focused on completely changing the way we find, consume, share, and manage content on the web today. We are focused on the growing importance of the real-time web and the concurrent need to reduce the noise. We are driven by a strong desire to deliver a better overall experience with a lot less effort required from our users.

Our office is extremely open and collegiate, and we are committed to letting ideas thrive above all else. We’re a very eclectic bunch of characters, but we all share a common commitment to taking whatever we do, fun or work, to the max. Some words that have been used to describe us are: passionate, fun, funny, innovative, contrarian, automagical, brilliant, academic, whimsical, and most importantly respectful. If you fit 3 or more of those descriptions, you might just have some of that magic we’re looking for.

If you’re interested in exploring this opportunity, please email me your resume and I’ll follow up with you ASAP, and have you come by meet the team some time soon.

Either way, I hope to hear from you!

Have a great weekend,

We too are BIG karaoke fans ( I read your website) , and as I said above we just got Beatles Rock Band for the office and are looking for a vocalist !!

Cheers,
XXXXX

I see this email as a poster child of how a startup should recruit. It’s well-written, funny, and shares enough about the opportunity to be an effective hook. Most importantly, it’s *personal*. Starting with the subject line, which made a great first impression, the email showed that the sender–a complete stranger–had taken the time to get to know me.

This is a strategy that does not scale arbitrarily–and that is the whole point. A startup that is building a small team needs to choose its prospective employees carefully and then go after those prospects with full force. If you really want to earn someone’s attention, you have to show that you’ve invested attention yourself. There’s no free lunch–if you want to send out a hundred emails like this one, you’ve got your work cut out for you! But no startup should be recruiting on such a massive scale, and the increase in yield justifies the additional per-candidate investment.

Of course, this principle applies beyond the narrow context of recruiting. Indeed, it is much like an attention bond mechanism: prove to me that you’ve invested in targeting me personally, and I’ll be more inclined to invest my attention in reading your message. Indeed, search advertising follows a similar principle. I still maintain that search is not advertising, but perhaps this aspect of negotiating a shared interest between messenger and messengee is a common thread.