October 2011 – The Noisy Channel

If you followed the #hcir2011 tweet stream, then you already know what I have to say: the Fifth Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011) was an extraordinary success. We had about 100 people attending, 14 paper presentations, 28 posters, and 4 challenge entries, all packed into one intense day at Google’s beautiful Mountain View headquarters.

Wednesday evening before the workshop, we were treated to a welcome reception, the first of a few meals provided by Google’s excellent chefs. It was a great opportunity to reconnect with old friends and meet many first-time HCIR attendees.

Thursday started with a scrumptious breakfast that included chilaquiles, coconut fritters, and bacon. Last year’s keynote and this year’s local host Dan Russell pulled out all the stops: apparently BigTable is the only Google cafe that serves bacon for breakfast! We then proceeded to a poster boaster session in which each poster presenter had a minute to pitch his or her poster. This session set the tone for the rest of the workshop: concentrated ideas and intense audience engagement.

Then came this year’s keynote, Gary Marchionini. It was a particular treat to have Gary as a keynote, since his lecture on “Toward Human-Computer Information Retrieval” inspired me to conceive the HCIR workshop back in 2007. And Gary delivered the goods. He started with a review of the history of HCIR, including some lesser-known figures like Don Hawkins (who was in the audience), Pauline Cochrane, Richard Marcus, and Charles Meadow. He brought a few chuckles by citing Nick Belkin (who was present) and Sue Dumais (who was not) as the father and mother of HCIR. Naturally he described some of his own work at the University of North Carolina, including the Open Video, Relation Browser, and ResultsSpace projects. But the highlight of his talk was a graph he presented showing two paths to the same user end-state, one of the paths being a smooth progression and the other being a roller-coaster of ups and downs. The question of which one was better drew a wide variety of responses, my favorite being Gene Golovchinsky observing that learning is the friction of the information-seeking process.

We broke for coffee and then came back to the first session of paper presentations. Sofia Athenikos presented a semantic search engine that outperformed IMDB in a user study. Chang Liu explored the effect of task difficulty and domain knowledge on dwell times, finding counterintuitive results (at least for me) regarding the correlation of expertise to dwell time. Jingjing Liu presented research on knowledge examination in multi-session tasks. Then came the lightning talks: Mark Smucker on how users examine and process ranked document lists; Jin Kim on simulating associative browsing; Bill Kules on visualizing the stages of exploratory search; and Michael Cole on user domain knowledge and eye movement patterns during search. Way too much goodness to summarize here — I suggest you read the full papers on the workshop site.

Then came lunch (again in BigTable, but this time with outdoor seating) and the poster session. As always, this is the most interactive part of the day: two hours of non-stop discussion that starts over food and ends with prying people away from discussions about posters. I was especially proud of LinkedIn’s contributions to the poster session, which covered faceted search log analysis, social navigation, and whether it is time to abandon abandonment.

Then back to the second session of paper presentations. Luanne Freund talked about document usefulness and genre, finding that genre, besides being hard for users to reliably identify, only matters for tasks that involve doing, deciding, or learning, but not for those that involve fact finding or problem solving. Gene Golovchinsky presented work on designing for collaboration in information seeking, previewing the system he used for his challenge entry. Alyona Medelyan used the Pingar search engine to evaluate how search interface features affect performance on biosciences tasks. Then more lightning talks: Rob Capra analyzing faceted search on mobile devices; Keith Bagley on conceptual mile markers for exploratory search; Xiaojun Yuan on how cognitive styles affect user performance; and Mike Zarro on using social tags and controlled vocabularies as search filters.

Last but not least came the HCIR Challenge:

The HCIR 2011 Challenge focuses on the case where recall is everything – namely, the problem of information availability. The information availability problem arises when the seeker faces uncertainty as to whether the information of interest is available at all. Instances of this problem include some of the highest-value information tasks, such as those facing national security and legal/patent professionals, who might spend hours or days searching to determine whether the desired information exists.

The corpus we will use for the HCIR 2011 Challenge is the CiteSeer digital library of scientific literature. The CiteSeer corpus contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.

There were four entries:

FreeSearch – Literature Search in a Natural Way
Claudiu S. Firan, Wolfgang Nejdl, Mihai Georgescu (University of Hanover), and Xinyun Sun (DEKE Lab MOE, Renmin)
Session-based search with Querium
Gene Golovchinsky (FX Palo Alto Lab) and Abdigani Diriye (University College London)
GisterPro
David L. Ostby and Edmond Brian (Visual Purple)
Query Analytics Workbench
Antony Scerri, Matthew Corkum, Keith Gutfreund, Ron Daniel Jr., Michael Taylor (Elsevier Labs)

The competition was fierce. Claudiu showed off the Faceted DBLP interface, which is well suited to the information availability task on CiteSeer data. Ed showed how GisterPro uses visualization to support the information seeking process. But it came down to a close call between the Query Analytics Workbench and Querium. Despite the Elsevier team’s impressive functionality and animated presentation, Gene’s simpler interface and application of ranked fusion won the day. Congratulations to Gene and Abdigani, this year’s HCIR Challenge winners!

We wrapped up the evening at the Tied House, a local microbrewery. And of course the discussion turned to where, when, and how we will hold next year’s workshop. Watch this space. In the meantime, my heartfelt thanks to everyone who made this year’s workshop such a success — and especially to our sponsors. Thank you Endeca, Kent State, Microsoft, and Google!

Today is a wonderful day for Endeca and Oracle! Oracle has announced that it has entered into an agreement to acquire Endeca, bringing together two of the powerhouses of information access. Quoting from the announcement: “The combination of Oracle and Endeca is expected to create a comprehensive technology platform to process, store, manage, search and analyze structured and unstructured information together.”

As part of Endeca’s founding team, I am very proud to see this day. My ten years at Endeca were a formative experience that established my professional identity and inspired my passion to pursue the vision of human-computer information retrieval (by happy coincidence, the 5th annual HCIR workshop takes place on Thursday). Reading Oracle’s presentation about the acquisition, I’m excited to see how Endeca’s technology will play a key role in unifying structured and unstructured data management and analysis for Oracle’s customers.

I take pride in my contributions to Endeca — I still slip sometimes and refer to Endeca as “we”. But the real heroes here are the folks — and especially the leadership — who have seen this journey through from start to finish. In particular, I am grateful to Steve Papa, Pete Bell, Adam Ferrari, Jack Walter, Keith Johnson, Nik Bates-Haus, and Jason Purcell for everything they have done to bring about this extraordinary outcome.

Finally, excited as I am about this event, it is only the beginning. I am excited to see Endeca’s people and technology powering one of the world’s largest enterprise software companies. Looking forward to the next play!