The Noisy Channel


SIGIR 2009: Day 2, Morning Sessions (Anchor Text, Vertical Search)

July 25th, 2009 · 1 Comment · General

Sorry for the delay in postings. Not only was I super-busy the past week, but I had some connectivity challenges (both at SIGIR and at the apartment where I was staying) and mostly restricted my online activity to occasional tweets during talks. I meant to catch up on my blogging yesterday, but instead spent the day wine tasting in Long Island. But enough apologizing, I’m refreshed and ready to blog up a storm!

The second day of SIGIR (Tuesday) started straight off with research talks. I went to the web retrieval session, which consisted of two talks about anchor text and one about privacy-preserving link analysis.

The first talk was “Building Enriched Document Representations using Aggregated Anchor Text”, by Don Metzler and colleagues at Yahoo Labs. They address the challenge of anchor text sparsity (the distribution of in-links for web pages follows a power law) by enriching document representation through aggregation of anchor text along the web graph. Their technique is intuitive, and the authors demonstrate statistically significant improvements in retrieval effectiveness. Unfortunately, their results are not repeatable, since they used a proprietary test collection to obtain them.

The second talk of the session, “Using Anchor Texts with Their Hyperlink Structure for Web Search”, was by a group of authors from Microsoft Research Asia. They address the opposite problem of the previous paper: how to handle too much, rather than too little, anchor text. Specifically, they model dependence among multiple anchor texts associated with the same target document. Like the Yahoo folks, they demonstrate statistically significant results on a proprietary test collection.

The third talk, “Link Analysis for Private Weighted Graphs” (ACM DL subscribers only) by Jun Sakuma (University of Tsukuba) and Shigenobu Kobayashi (Tokyo Institute of Technology), was a bit of an outlier, if one can call a paper in a three-paper session an outlier. The authors offer privacy-preserving expansions of PageRank and HITS, the best-known link analysis methods associated with relevance and authority in web search. I’ve noticed an increasing number of papers like these that mix cryptography with information retrieval or database concerns. One of my frustrations in reading such papers is that I always suspect that people are re-inventing wheels because so few people are able to keep up with research in multiple disciplines.
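For readers who have not seen PageRank written down, here is a minimal sketch of the standard, non-private power iteration. It is not the authors' privacy-preserving protocol, which the talk summary above does not describe in detail. The function name, the toy graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank.

    graph maps each node to a list of the nodes it links to; every link
    target is assumed to appear as a key (illustrative simplification).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Each node passes an equal share of its rank along every out-link.
        for node in nodes:
            out_links = graph[node]
            for target in out_links:
                new_rank[target] += damping * rank[node] / len(out_links)
        rank = new_rank
    return rank


# Hypothetical four-page web graph.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

In the private setting the paper targets, the edge weights are presumably not all visible to a single party, so a centralized loop like this one would not apply directly.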

Then I had the coffee break to solve my own research problem: how to fill the 11:30 slot in the Wednesday Industry Track, since a speaker called in sick that morning. When I walked by the Bing table, I saw Jan Pedersen (Chief Scientist for Core Search at Microsoft), and I begged him to help me out. I must have been a persuasive supplicant, because he procured me Nick Craswell, an applied researcher who works on Bing. Out of gratitude for this 11th-hour favor, I wore a Bing t-shirt all day yesterday as I went wine-tasting. Bing drinking, not binge drinking!

Anyway, that urgent problem resolved, I went back to enjoying the conference. For the second morning session, I went to the vertical search session.

As it turns out, that session kicked off with the SIGIR Best Paper winner: “Sources of Evidence for Vertical Selection” by Jaime Arguello (CMU), Fernando Diaz (Yahoo), Jamie Callan (CMU), and Jean-François Crespo (Yahoo). The authors do a lot of things I like: they apply query clarity as a performance predictor, and they bootstrap on an external collection (specifically Wikipedia). The test collection they use for evaluation is proprietary, but that seems to be the price (at least today) of doing this kind of work.
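Query clarity, mentioned above, is usually defined as the KL divergence between a language model estimated for the query and the language model of the whole collection. The sketch below follows that general definition only; how the paper computes or uses clarity is not described here. The function name, the whitespace tokenizer, the interpolation weight, and the toy documents are assumptions, and the documents standing in for a query's top results are assumed to come from the collection.

```python
import math
from collections import Counter


def clarity(query_docs, collection_docs, weight=0.6):
    """KL divergence, in bits, between a query model and the collection model.

    query_docs stands in for the documents retrieved for a query and is
    assumed to be drawn from collection_docs (illustrative simplification).
    """
    coll_counts = Counter(w for doc in collection_docs for w in doc.split())
    coll_total = sum(coll_counts.values())
    query_counts = Counter(w for doc in query_docs for w in doc.split())
    query_total = sum(query_counts.values())

    score = 0.0
    for word, count in coll_counts.items():
        p_coll = count / coll_total
        # Smooth the query model by interpolating with the collection model,
        # so every vocabulary word has nonzero probability.
        p_query = weight * query_counts[word] / query_total + (1 - weight) * p_coll
        score += p_query * math.log2(p_query / p_coll)
    return score


# Hypothetical toy collection.
collection = [
    "jaguar car engine speed",
    "jaguar cat jungle prey",
    "stock market price index",
    "weather rain forecast wind",
]
focused = clarity(collection[:1], collection)   # results about one topic
unfocused = clarity(collection, collection)     # results look like everything
```

A query whose results resemble the collection as a whole scores near zero, while a query whose results concentrate on a few terms scores higher, which is why clarity is used as a performance predictor.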

The second talk of the session was by a subset of the previous paper’s authors: “Adaptation of Offline Vertical Selection Predictions in the Presence of User Feedback” by Fernando Diaz and Jaime Arguello. The authors creatively used simulation to evaluate their approach. They did a nice job, but I have to admit I’m skeptical of results about feedback that aren’t based on user studies.

Unfortunately, I missed the third talk of the session because I had to play organizer. But I must have earned some good karma, because I got to enjoy a delightful lunch with Marti Hearst and David Grossman.

Stay tuned for more posts about the interactive search session, the keynote by Albert-László Barabási, the banquet at the JFK Presidential Library and Museum, and of course the Industry Track.

1 response so far ↓

  • 1 CJ // Jul 26, 2009 at 2:52 am

    Thank you for blogging about this Dan, lovely to hear about what is going on and what is of interest this year.
