Author: Daniel Tunkelang

High-Class Consultant.

General

Gone Medium

Post author By Daniel Tunkelang
Post date January 26, 2016

As of 2013, I stopped posting to this blog. You are welcome to find my older content here and on LinkedIn.

You can find my newer posts on Medium.

Also, check out my Query Understanding blog!


CIKM 2012: Notes from a Conference in Paradise

Post author By Daniel Tunkelang
Post date November 12, 2012

The moment I learned that CIKM 2012 would be held in Maui, I knew I had to be there. Having co-organized the CIKM 2011 industry event, I had enough karma to be invited as part of this year’s industry event, representing LinkedIn.

PRE-CONFERENCE: PSEUNAMI WARNING AND A WORKSHOP

I arrived in Maui on Sunday, October 28th, fortunate to miss the “pseunami” warnings prompted by an earthquake off the Canadian coast. And even more fortunate to be thousands of miles away from Hurricane Sandy.

Monday, I attended the Workshop on Data-Driven User Behavioral Modeling and Mining from Social Media. The topics within this area were diverse: they included Pinterest users, resume-job matching (unfortunately without the benefit of LinkedIn data), and street harassment stories reported via Project Hollaback.

But ironically in this workshop — and throughout the conference — there was more use of Twitter data than of Twitter itself. Most of the tweets using the #cikm2012 hashtag were my own.

DAY 1: USER ENGAGEMENT, EVALUATION BIAS

Tuesday opened with a welcome that included statistics showing how far CIKM has come as a top-tier international conference. There were 1,088 submissions this year! But the highlight of the opening was program co-chair Guy Lebanon demoing software to “improve” paper reviews. It was hilarious, if a bit close to home: the automatically generated reviews looked a lot like those generated by allegedly human reviewers.

We then proceeded to a keynote by Yahoo! Research VP Ricardo Baeza-Yates entitled “User Engagement: The Network Effect Matters!” The title was a bit confusing: it wasn’t about the conventional “network effect”, but rather about user engagement across a network of sites like those owned by Yahoo. He talked about different ways to measure user engagement, and noted that off-site (or, rather, off-network) links ultimately improve users’ downstream engagement. He also observed that style attributes outperform content attributes as predictors of user engagement. Lots of fascinating observations, but I’m curious how well they generalize beyond Yahoo.

I spent the rest of the day making tough choices among the various parallel sessions, starting with the morning session on information retrieval evaluation. Some nuggets from that session: captions and other surface features introduce significant evaluation bias; assessors have poor agreement when evaluating relevance in eDiscovery contexts; and system evaluation improves when it models user differences.

After lunch, I attended a session on web search. Some themes from that session: neighborhood-based methods are effective, whether the neighborhoods are based on document or user similarity; entities and structure are increasingly important for web search. After the coffee break, I went to the social networks session. Topics there included social contagion, online question answering, and social network data anonymization. The talks wrapped up just in time for us to watch the daily cliff diving ceremony before heading to the poster session.

DAY 2: QUERY PERFORMANCE PREDICTION, ABANDONMENT

Wednesday opened with a keynote by CMU professor William Cohen on “Learning Similarity Measures based on Random Walks in Graphs”. He described the framework and techniques that he and his colleagues used to build NELL (“Never-Ending Language Learning”). The keynote was pretty dense, but there are lots of papers available on the NELL publications page.

Then back to choosing among parallel sessions. Although I was tempted by the recommender systems session featuring presentations by my LinkedIn colleagues Mitul Tiwari and Bee-Chung Chen, I instead attended the session on ads and products. Two takeaways from that session: ad targeting benefits from explicit identification of user interests; influence maximization can be modeled adversarially as a two-player game.

After lunch, I attended the session on formal retrieval models and learning to rank. I most enjoyed the two talks by Oren Kurland that focused on query performance prediction. In particular, he offered a comprehensive probabilistic prediction framework that unifies most of the previously proposed prediction methods using a common formal basis. The session also included a deep dive into aspects of the IBM Watson question-answering system.

After the coffee break, I headed to another session on web search, one of my favorite sessions of the conference. There was a talk on query segmentation, a topic responsible for my most popular blog post. There was also a great talk on identifying good abandonment, a problem I’ve been interested in ever since hearing about it at SIGIR 2010. Another talk covered learning from search logs: generalizing from click entropy to “click pattern entropy” to analyze query ambiguity. The last talk modeled domain-dependent query reformulation as machine translation using a pseudo-parallel corpus. All in all, a great session packed with practical content.
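For readers who haven’t met click entropy before, here is a minimal sketch in Python. It uses the standard definition (the Shannon entropy of the distribution of clicked results for a query), where a low value suggests that most users want the same result and a high value suggests an ambiguous query. The second function is only my guess at the spirit of “click pattern entropy”: it assumes a pattern is the set of results clicked within a single impression, which may not match the paper’s exact definition. The function names and the sample data are made up for illustration.

```python
import math
from collections import Counter


def click_entropy(clicked_items):
    """Shannon entropy (in bits) of the click distribution for one query.

    clicked_items is a list with one entry per click, e.g. the clicked URL.
    """
    counts = Counter(clicked_items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def click_pattern_entropy(impressions):
    """Entropy over whole click patterns rather than individual clicks.

    Assumption (not taken from the paper): a pattern is the set of results
    clicked within a single impression of the query.
    """
    patterns = [frozenset(clicks) for clicks in impressions]
    return click_entropy(patterns)


# Hypothetical logs: everyone clicks the same result for the first query,
# while clicks are split evenly across four results for the second.
navigational = ["site.example/home"] * 4
ambiguous = ["a.example", "b.example", "c.example", "d.example"]
print(click_entropy(navigational), click_entropy(ambiguous))
```

Under this sketch, two queries with the same per-result click counts can still differ in pattern entropy if users combine their clicks in different ways.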

Then came a purely social evening. The conference reception was a luau, complete with kalua pig, mai tais, hula, and of course poi. Certainly my most memorable conference banquet. I didn’t take pictures, but I recommend Craig Stanfill’s photos on Flickr.

And then some of us went to downtown Lahaina to see how the locals (and tourists) celebrate Halloween. Much as I missed spending Halloween with my family, I had a blast!

DAY 3: INDUSTRY EVENT

Thursday began with the last conference keynote: University of Kansas provost Jeffrey Vitter on “Compressed Data Structures with Relevance”. Like the previous keynotes, it was fairly dense, and I suggest you read the papers cited in his abstract if you’re interested in the technical details of how to search for query patterns in massive document collections.

Then came my main reason for attending the conference: the industry event. As seems to have become a pattern at information retrieval conferences, the industry event dominated the other parallel sessions, drawing a standing-room only crowd.

The event started with eBay VP of Research Eric Brill talking about “Having A Great Career in Research”. Unusual in a conference talk, he offered personal and practical advice to students on how to focus their passion and effort towards a happy and successful career. It reminded me of my blog post about dream, fit, and passion, and I hope students took it to heart.

IBM researcher David Carmel gave a talk entitled “Is This Entity Relevant to Your Needs?”. Noting that 71% of web search queries contain named entities (people, places, organizations), he advocated a probabilistic ranking approach to entity-oriented search that ranks retrieved entities according to amount and quality of supporting evidence.

Microsoft Technical Fellow (and former Yahoo! Fellow) Raghu Ramakrishnan talked about “The Future of Information Discovery and Search: Content Optimization, Interactivity, Semantics, and Social Networks”. He packed in a lot of nice material, most of which was from his tenure at Yahoo. He included a nice explanation of explore/exploit, which was also a reminder of how lucky we are at LinkedIn to have hired his former Yahoo colleague Deepak Agarwal.

After lunch, WalmartLabs Chief Scientist AnHai Doan gave a talk entitled “Social Media, Data Integration, and Human Computation”, in which he described constructing a “social genome” by mining social data, connecting it to web data, and representing the combined information in a knowledge base. If you’re interested in more details, he’ll be giving an extended version of that talk at LinkedIn on November 29th!

Tencent Research Director Chao Liu talked about “Question Answering through Tencent Open Platform”. Beyond giving a great overview of one of the world’s largest internet platforms, he delivered great self-deprecating lines like “The name is ten cents, and the search engine is soso”.

I spoke next about LinkedIn’s “Data By The People, For The People”. Given that the talk was right after Halloween and just before the presidential elections, I thought it appropriate to choose a title that would have appealed to one of America’s most distinguished presidents and vampire hunters. If you’re curious to learn more about data science and engineering at LinkedIn (including the publications I cited in my talk), check out http://data.linkedin.com/.

Groupon Director of Research Rajesh Parekh talked about “Leveraging Data to Power Local Commerce”. He focused on a key problem Groupon faces: determining an optimal category mix for each local market. He described how they approach this problem using portfolio theory.

After a coffee break, Adobe Chief Software Architect Tom Malloy talked about “Revolutionizing Digital Marketing with Big Data Analytics”. Apache Pig co-creator — and now Google researcher — Christopher Olston talked about work he did at Yahoo on “Programming and Debugging Large-Scale Data Processing Workflows”. Finally, Microsoft Distinguished Engineer Xuedong (“XD”) Huang gave a talk entitled “From HyperText to HyperTEC”, in which he woke up the audience by having us all participate in the “Bing it On” challenge.

FINAL THOUGHTS

All in all, CIKM 2012 was a great conference in an idyllic setting. Holding the conference in Maui might have been a bit distracting, but the desirability of the location also ensured a high-quality program.

My main complaint is that I don’t like parallel sessions, especially when the topics overlap significantly (e.g., web search sessions competed with those on ranking and recommendations). I’m also not convinced that talks have to be 25 minutes long. Perhaps the conference could move to a format of shorter talks and at least reduce the number of parallel sessions. It would also be great to see more opportunity for interaction; the coffee breaks always felt too short. For more of my thoughts on reforming academic conferences, see my 2009 blog post on the subject.

I also wish more attendees would embrace social media. It’s ironic that researchers who depend so heavily on social media data (especially Twitter) don’t engage in it personally. While I’m honored to have been the conference’s unofficial tweeter (see this visualization of the #cikm2012 tweets), I would have liked to see more attendees engage in a public online conversation. Hopefully others will at least blog about the conference.

But these are quibbles. CIKM continues to be an outstanding conference, and I’m very excited it’s coming to the Bay Area next year. See you at CIKM 2013!


Data By The People, For The People

Post author By Daniel Tunkelang
Post date November 11, 2012

I was fortunate this year not only to be able to attend the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) in Maui, but also to be invited as part of this year’s industry event, representing LinkedIn.

Above are the slides I presented on “Data By The People, For The People”. Enjoy!


LinkedIn at CIKM 2012

Last year, I had the pleasure of co-organizing the CIKM 2011 Industry Event in Glasgow. This year, I’m honored to be part of the CIKM 2012 Industry Event program, along with top industry researchers from Adobe, eBay, Google, Groupon, IBM, Microsoft, Tencent, and Walmart Labs. I’ll be giving a talk on “Data By The People, For The People”.

I’m also thrilled to be joined by my colleague Mitul Tiwari, who will be presenting a paper on “Metaphor: A System for Related Search Recommendations”, work that he did with Azarias Reda, Yubin Park, Christian Posse, and Sam Shah.

Finally, we’ll be representing Manuel Gomez-Rodriguez and Monica Rogati at the poster session, presenting their work on “Bridging Offline and Online Social Graph Dynamics”.

If you’re attending CIKM, please make sure to say hi to me and Mitul. We’d be delighted to talk to you about the work that we and our colleagues are doing. Our data team also has a site showcasing our team, our publications, and some of the projects we can discuss publicly. And, of course, we’re hiring!


HCIR 2012: A Personal Report

Post author By Daniel Tunkelang
Post date October 8, 2012

Human-computer information retrieval (HCIR) is the study of information retrieval techniques that integrate human intelligence and algorithmic search to help people explore, understand, and use information. Since 2007, the HCIR Symposium (previously known as the HCIR Workshop) has provided a venue for the theoretical and practical study of HCIR. We even inspired a EuroHCIR workshop across the pond that started in 2011 and is going strong.

Overview

The Sixth Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2012) took place on October 4th and 5th at IBM Research in Cambridge, MA. The 75 attendees represented a cross-section of HCIR research and practice. Over a third of the attendees were from industry — including startups and large technology firms. We had a similar diversity of sponsors, benefiting from the generosity of FXPAL, IBM Research, LinkedIn, Mendeley, Microsoft Research, MIT CSAIL, and Oracle. And we had participants from 6 countries: Canada, Germany, Israel, New Zealand, Switzerland, and the United States.

Keynote

We started the Symposium with a keynote from UC Berkeley professor Marti Hearst, a pioneer in the area of search user interfaces, as well as a prominent researcher of information visualization, natural language processing, and social media analysis. Marti set the tone for the symposium with a visionary keynote that she entitled her “Halloween Cauldron of Ideas for Research”.

She started by talking about the unaddressed seams of sensemaking, reminding us that information seeking is only one part of an overall sensemaking process. She used the challenge of saving and personally organizing search results as an example of a neglected but crucial part of a search interface.

She then challenged us to think about how audio could be used in search interfaces. She cited a study showing that programmers comment their code better when the commenting interface uses speech rather than the keyboard. She then challenged us to consider how auditory notification or feedback could enhance the search experience.

Finally, she presented the idea of “radical collaboration”, offering as an example the use of Mechanical Turk to crowdsource vacation planning. The plans were tested by real tourists, who were delighted with the results.

Marti’s keynote was not only insightful and entertaining (one of her slides featured brain cupcakes!), but notable in how much she engaged all of us in discussion throughout her presentation. This approach was especially appropriate for an HCIR Symposium, given our emphasis on human interaction. For more detail about the keynote, I recommend Gene Golovchinsky’s summary.

Short Paper Presentations

After a coffee break, we had a session devoted to 5 short papers. Each presenter had 10 minutes: 5 minutes to present and 5 minutes for discussion.

We started off with UXLabs director Tony Russell-Rose presenting “Designing for Consumer Search Behaviour”, joint work with University College London researcher Stephann Makri. Tony could not attend in person, so he submitted a video. He presented a framework for describing consumer search behavior along with concrete examples, many of them familiar from the time that Tony and I both worked at Endeca. Most of all, he emphasized the need to close the gap between information science research and industry practice.
Then MIT professor (and Haystack principal investigator) David Karger talked about “Standards Opportunities around Data-Bearing Web Pages”. He argued that there is a small set of standard user interface patterns for authoring structured data: text search, sorting by properties, presenting items in a template, and faceted browsing. He then advocated that these primitives (which have already been implemented in the popular Exhibit framework) be incorporated into a W3C standard so that content authors can use them with the expectation that all modern browsers support them.
Next, Harvard student Elena Agapie presented joint work that she did at FXPAL with Gene Golovchinsky and Pernilla Qvarfordt, entitled “Encouraging Behavior: A Foray into Persuasive Computing”. Information retrieval researchers and practitioners have often argued that longer queries lead to better retrieval performance. But how do we get users to enter longer queries? Elena and colleagues found that the best way was not to explicitly tell them that longer queries are better, but rather to present a halo around the search box that changes color as the query gets longer. A very interesting way to apply persuasive technology to search!
Then Rutgers student Roberto González-Ibáñez presented joint work with Chirag Shah and Ryen White on “Pseudo-Collaboration as a Method to Perform Selective Algorithmic Mediation in Collaborative IR Systems”. He presented a novel approach that identified when a user should be aided by a collaborator, and to what extent such help could enhance the user’s search success. An interesting way to achieve the benefits of both user-mediated and system-mediated collaboration.
Finally, University of Washington student Jeff Huang presented joint work with Abdigani Diriye on “Web User Interaction Mining from Touch-Enabled Mobile Devices”. He focused on the practical concerns of instrumenting interaction with search engines in mobile environments. Specifically, he suggested tracking the viewport coordinates, that is, the visible portion of the page at any given time.

The short presentation format was extremely effective, encouraging presenters to communicate their ideas efficiently and leaving ample time for discussion.

Posters and Demos

As in previous years, we followed lunch with a vibrant session for posters and demos. Some of the more popular poster themes included question answering, task difficulty, and collaborative information seeking. Here is the full list of poster / demo presentations:

Developing a Typology of Online Q&A Models and Recommending the Right Model for Each Question Type
Erik Choi, Vanessa Kitzie, Chirag Shah
Investigating Positive and Negative Affects in Collaborative Information Seeking: A Pilot Study Report
Roberto González-Ibáñez, Chirag Shah
To Ask or Not to Ask, That is The Question: Investigating Methods and Motivations for Online Q&A
Vanessa Kitzie, Erik Choi, Chirag Shah
Information Seeking Tasks: Why Do Searchers Feel Difficult?
Jingjing Liu, Chang Suk Kim
Finding Literary Themes with Relevance Feedback
Aditi Muralidharan, Marti Hearst
InFrame-Browsing: Enhancing Standard Web Search
Marcus Nitsche, Andreas Nürnberger
Trailblazer: Towards the Design of an Exploratory Search User Interface
Marcus Nitsche, Andreas Nürnberger
min: A Multi-Modal Web Interface for Math Search
Christopher Sasarak, Kevin Hart, Siyu Zhu, Richard Pospesel, David Stalnaker, Lei Hu, Robert Livolsi, Richard Zanibbi
Search Tactics in Collaborative Exploratory Web Search
Zhen Yue, Shuguang Han, Daqing He
Developing a Dual-Process Information-Seeking Model for Exploratory Search
Michael Zarro
Interactive Data Mining at the Speed of Thought
Vladimir Zelevinsky
Do Users with Different Domain Knowledge Select Different Sets of Documents?
Xiangmin Zhang, Jingjing Liu, Xiaojun Yuan, Michael Cole, Nicholas Belkin, Chang Liu
Predicting Task Difficulty from a User’s Moment to Moment Cognitive Effort During Information Seeking
Michael Cole, Jacek Gwizdka, Chang Liu, Nicholas Belkin
Effects of Domain Knowledge on User Task Performance in a Knowledge Domain Visualization System
Xiaojun Yuan, Chaomei Chen, Xiangmin Zhang, Joshua Avery, Tao Xu
Investigating the Effect of Visualization on User Performance of Information Systems
Xiaojun Yuan

Full Paper Presentations

The full paper presentations were split into two sessions, the first held on the 4th and the second held on the 5th. Each presentation slot was 30 minutes. The full papers will be made available soon through the ACM Digital Library.

University of Magdeburg student Marcus Nitsche presented “Knowledge Journey: A Web Search Interface for Young Users”, joint work with Tatiana Gossen and Andreas Nürnberger. The authors performed a study in which they found that children liked having personalized avatars that offer guidance, a wheel-shaped browsing menu, and a coverflow-style results presentation. It will be interesting to see how their study holds up in larger-scale user studies, and whether adults like some of these interface elements too.
Oregon State University professor Carlos Jensen presented “Leyline: Provenance-Based Search Using a Graphical Sketchpad”, joint work with Seyedsoroush Ghorashi. I was intrigued to see a search approach focused entirely on provenance, that is, the history of a document’s ownership and transformations. I’m particularly curious about this area, since I’m a committee member for Aleatha Parker-Wood, who is pursuing a dissertation on “Making Sense of File Systems Through Provenance and Rich Metadata”.
University of Waterloo professor Mark Smucker presented joint work with Charlie Clarke on “Modeling User Variance in Time-Biased Gain”. Their simulation-based approach produced distributions of gain that agree with distributions produced by real users. By emphasizing the effect size of differences, their approach could help uncover how much the performance differences among systems matter to real users.
Finally, University of North Carolina at Chapel Hill professor Barbara Wildemuth and University of British Columbia professor Luanne Freund delivered a highly interactive presentation on “Assigning Search Tasks Designed to Elicit Exploratory Search Behaviors”. They performed an extensive survey of information exploration literature to identify concepts that authors have used to characterize exploratory search tasks. They tested examples on the audience to see how well we agreed with their characterization and with one another.

HCIR Challenge

With Friday morning came the most anticipated event of the Symposium: the HCIR Challenge. The Challenge is now in its third year: the 2010 Challenge focused on historical exploration of news using the New York Times Annotated Corpus; the 2011 Challenge focused on the problem of information availability using the CiteSeer digital library of scientific literature.

This year, we turned to the problem of people and expertise finding, a topic of obvious personal interest. We are grateful to Mendeley for providing this year’s corpus: a database of over a million researcher profiles with associated metadata including published papers, academic status, disciplines, awards, and more taken from Mendeley’s network of 1.6M+ researchers and 180M+ academic documents.

We asked participants to build systems that could perform three kinds of tasks:

Hiring. Given a job description, produce a set of suitable candidates for the position.
Assembling a Conference Program. Given a conference’s past history, produce a set of suitable candidates for keynotes, program committee members, etc. for the conference.
Finding People to deliver Patent Research or Expert Testimony. Given a patent, produce a set of suitable candidates who could deliver relevant research or expert testimony for use in a trial. These people can be further segmented, e.g., students and other practitioners might be good at the research, while more senior experts might be more credible in high-stakes litigation.

Each of the 5 teams was given 30 minutes to present.

École Polytechnique Fédérale de Lausanne student Na Li presented “Magnifico: A Platform For Expert Mining Using Metadata”, joint work with Lei Zhou and Denis Gillet. Magnifico used a modified TF-IDF approach, in which the IDF is an inverse discipline frequency, to match search queries to topic experts. It also assigned a multi-disciplinary reputation metric based on the expertise distribution of an author’s readers.
Ben-Gurion University student Dima Kagan presented “Social Network Based Search for Experts”, joint work with Yehonatan Bitton, Michael Fire, Bracha Shapira, Lior Rokach, and Judit Bar-Ilan. Their system made excellent use of additional publicly available data, cross-referencing the Mendeley user profiles with data from Academia.edu and using Microsoft Academic Search to categorize publications and journals. You can try out their application here.
University of Pittsburgh student Shuguang Han presented “IRIS-IPS: An Interactive People Search System for HCIR Challenge”, joint work with Daqing He, Zhen Yue, Jiepu Jiang, and Wei Jeng. The system used three different types of evidence to suggest candidates: expertise relevance, authority based on a PageRank algorithm applied to the co-authorship network, and social similarity using the Jaccard similarity between co-authors.
Luanne Freund and Kristof Kessler, both from the University of British Columbia, presented “Exposing and exploring academic expertise with Virtu”, joint work with Michael Huggett and Edie Rasmussen. Virtu takes a task-based approach to expertise, exposing and giving the user control over dimensions of expertise that are more or less desirable depending on the type of expert-finding task. The search interface supports information interaction and exploration through a number of browsing and filtering tools, including facets and sliders. You can try out their application here.
UCLA student Fei Liu presented the “‘iF’ People Search System”, an impressive solo effort. Also unique among the entries, iF is a mobile application, designed for the iPad and supporting swipe and multi-touch gestures. A very slick application, iF offered a novel approach to exploring the corpus of documents and people using the analysis of their reputations and social network relationships.
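
Two of the scoring ideas mentioned above are easy to illustrate. What follows is a rough sketch in Python and not the actual Magnifico or IRIS-IPS code. It assumes that an “inverse discipline frequency” works like ordinary IDF with disciplines playing the role of documents, so a term used in few disciplines gets a high weight, and that social similarity is the Jaccard similarity of two researchers’ co-author sets. All function names and sample data are hypothetical.

```python
import math
from collections import Counter


def inverse_discipline_frequency(term_disciplines, n_disciplines):
    """IDF where disciplines play the role of documents (an assumption).

    term_disciplines maps each term to the set of disciplines that use it.
    """
    return {
        term: math.log(n_disciplines / len(disciplines))
        for term, disciplines in term_disciplines.items()
        if disciplines
    }


def expert_score(query_terms, author_terms, idf):
    """Sum of tf * idf over the query terms for one author's vocabulary."""
    tf = Counter(author_terms)
    return sum(tf[term] * idf.get(term, 0.0) for term in query_terms)


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: "search" appears in two of four disciplines and
# "protein" in only one, so "protein" is the more discriminating term.
idf = inverse_discipline_frequency(
    {"search": {"cs", "lis"}, "protein": {"bio"}}, n_disciplines=4
)
print(expert_score(["protein"], ["protein", "protein", "search"], idf))
print(jaccard({"ann", "bob"}, {"bob", "cy"}))
```

A real system would need smoothing, normalization by profile length, and some way to combine these signals with others such as the PageRank-style authority score.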

THE WINNER: Virtu! The competition was fierce, but Virtu stood out for the compelling approach it took to offering users control over the expert-finding process. Congratulations to Luanne, Kristof, and their colleagues for their outstanding work and well-deserved honor.

Reception

After we wrapped up the first day of the Symposium, we walked over to the nearby Technique, a restaurant in the Athenaeum Press building (home to two of Endeca’s offices in our early years) where students of Le Cordon Bleu practice their culinary skills. I’m no master chef, but I certainly hope these students earned excellent grades for their performance. We enjoyed a delightful sampling of wines, appetizers, main courses, and desserts.

Conclusion

HCIR has been getting better every year, and this year was no exception. Many attendees in previous years had felt that the one-day format made the event feel rushed, and expanding to a second day took off much of the time pressure. We had ample opportunity for discussion, during the presentations as well as at the coffee breaks and reception. Finally, the Challenge was our best yet, eliciting extraordinary results from the five participating teams.

I’m proud of how far we’ve taken HCIR in these six years, and especially grateful to co-organizers Robert Capra, Gene Golovchinsky, Bill Kules, Catherine Smith, and Ryen White.

Time to start thinking about HCIR 2013!


Office Hours at Cambridge Brewing Company

Post author By Daniel Tunkelang
Post date October 1, 2012

I’ll be in Cambridge, MA this Thursday and Friday for the Sixth Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2012). Hope to see many of you there!

But I’ll also have a few hours on Wednesday evening to meet people informally. If you’re interested in learning more about LinkedIn, data science, or anything else, then hop over to the Cambridge Brewing Company on Wednesday, October 3rd. I should be there by 5pm, assuming my flight arrives on time, and I’ll plan to stay there through dinner. I’m pretty easy to contact, so feel free to reach out to me through the usual channels.

MIT and Harvard students and faculty are especially welcome!


HCIR 2012 Symposium: Oct 4-5 in Cambridge, MA

Post author By Daniel Tunkelang
Post date September 21, 2012

It’s the event you’ve been waiting for: the Sixth Symposium on Human-Computer Interaction and Information Retrieval! HCIR 2012 will take place October 4th and 5th at IBM Research in Cambridge, MA.

Who should attend?

Researchers, practitioners, and anyone else interested in the exciting work at the intersection of HCI and IR. Areas like interactive information retrieval, exploratory search, and information visualization.

Why attend?

You’ll enjoy a highly interactive day and a half of learning from HCIR leaders and pioneers. People like keynote speaker Marti Hearst, who literally wrote the book on search user interfaces. Folks from top universities and industry labs who are developing new methods and models for information seeking. And you’ll get to see the five teams competing to win the third annual HCIR Challenge, focused this year on people and expertise finding.

How to register?

Just click here and fill out the information requested. The $150 registration fee includes all sessions on both days, all meeting materials, and a reception on October 4th at Technique. We’re grateful to our sponsors (FXPAL, IBM Research, LinkedIn, Microsoft Research, MIT CSAIL, and Oracle) for helping us keep the costs so low.

Capacity is limited, so please register as soon as possible to ensure your attendance.


LinkedIn Presentations at RecSys 2012

Post author By Daniel Tunkelang
Post date September 16, 2012

LinkedIn showed up in force at the 6th ACM International Conference on Recommender Systems (RecSys 2012)! Here are the slides from all of our presentations.

Daniel Tunkelang: Content, Connections, and Context

Mario Rodriguez, Christian Posse, and Ethan Zhang: Multiple Objective Optimization in Recommender Systems

Anmol Bhasin: Beyond Ratings and Followers

Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, and Christian Posse: Social Referral: Leveraging Network Connections to Deliver Recommendations


RecSys 2012: Beyond Five Stars

Post author By Daniel Tunkelang
Post date September 14, 2012

I spent the past week in Dublin attending the 6th ACM International Conference on Recommender Systems (RecSys 2012). This young conference has become the premier global forum for discussing the state of the art in recommender systems, and I’m thrilled to have had the opportunity to participate.

Sunday: Workshops

The conference began on Sunday with a day of parallel workshops.

I attended the Workshop on Recommender Systems and the Social Web, where I presented a keynote entitled “Content, Connections, and Context”. Major workshop themes included folksonomies, trust, and pinning down what we mean by “social” and “context”. The most interesting presentation was “Online Dating Recommender Systems: The Split-complex Number Approach”, in which Jérôme Kunegis modeled the dating recommendation problem (specifically, the interaction of “like” and “is-similar” relationships) using a variation of quaternions introduced in the 19th century! The full workshop program, including slides of all the presentations, is available here.

Unfortunately, I was not able to attend the other workshops that day, which focused on Human Decision Making in Recommender Systems, Context-Aware Recommender Systems (CARS), and Recommendation Utility Evaluation (RUE). But I did hear that Carlos Gomez-Uribe delivered an excellent keynote at the RUE workshop on the challenges of offline and online evaluation of Netflix’s recommender systems.

Monday: Experiments, Evaluations, and Pints All Around

Monday started with parallel tutorial sessions. I attended Bart Knijnenburg’s tutorial on “Conducting User Experiments in Recommender Systems”. Bart is an outstanding lecturer, and he delivered an excellent overview of the evaluation landscape. My only complaint is that there was too much material for even a 90-minute session. Fortunately, his slides are online, and perhaps he’ll be persuaded to expand them into book form. Unfortunately, I missed Maria Augusta Nunes and Rong Hu’s parallel tutorial on personality-based recommender systems.

Then came a rousing research keynote by Jure Leskovec on “How Users Evaluate Things and Each Other in Social Media”. I won’t try to summarize the keynote here — the slides of this and other presentations are available online. But the point Jure made that attracted the most interest was that voting is so predictable that results are determined mostly by turn-out. Aside from the immediate applications of this observation to the US presidential elections, there are many research and practical questions about how to obtain or incent a representative participant pool — a topic I’ve been passionate about for a long time.

The program continued with research presentations on multi-objective recommendation and social recommendations. I may be biased, but my favorite presentation was the work that my colleague Mario Rodriguez presented on multiple-objective optimization in LinkedIn’s recommendation systems. I’ll post the slides and paper here as soon as they are available.

Monday night, we went to the Guinness Storehouse for a tour that culminated with fresh pints of Guinness in the Gravity Bar overlooking the city. We’re all grateful to William Gosset, who was working as a chemist for the Guinness brewery when he introduced the now ubiquitous t-test in 1908 as a way to monitor the quality of his product. A toast to statistics and to great beer!
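
In case the t-test is only a dim memory, here is a minimal sketch of the one-sample version using just the Python standard library. The readings and the target value are invented for illustration and have nothing to do with any real brewery data; the function name one_sample_t is likewise hypothetical.

```python
import math
import statistics


def one_sample_t(sample, target):
    """Return the one-sample t statistic for `sample` against `target`."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - target) / (sd / math.sqrt(n))


# Hypothetical quality readings from five batches, compared to a target of 5.0.
readings = [5.1, 4.9, 5.3, 5.2, 5.0]
t = one_sample_t(readings, 5.0)
print(f"t = {t:.3f} with {len(readings) - 1} degrees of freedom")
```

The statistic is then compared against the t distribution with n - 1 degrees of freedom, which is what makes the test usable with the small samples a brewer can afford.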

Tuesday: Math, Posters, and Dancing

Tuesday started with another pair of parallel tutorial sessions. I attended the tutorial on “Building Industrial-scale Real-world Recommender Systems” by Xavier Amatriain of Netflix. It was an excellent presentation, especially considering that Xavier had just come from a transatlantic flight! A major theme in his presentation was that Netflix is moving beyond the emphasis on user ratings to make the interaction with the user more transparent and conversational. Unfortunately I had to miss the parallel tutorial on “The Challenge of Recommender Systems Challenges” by Alan Said, Domonkos Tikk, and Andreas Hotho.

Tuesday continued with research papers on implicit feedback and context-aware recommendations. One that drew particular interest was Daniel Kluver’s information-theoretical work to quantify the preference information contained in ratings and predictions, measured in preference bits per second (paper available here for ACM DL subscribers). And Gabor Takacs had the day’s best line with “if you don’t like math, leave the room.” He wasn’t kidding!

Then came the posters and demos — first a “slam” session where each author could make a 60-second pitch, and then two hours for everyone to interact with the authors while enjoying LinkedIn-sponsored drinks. There were lots of great posters, but my favorite was Michael Ekstrand’s “When Recommenders Fail: Predicting Recommender Failure for Algorithm Selection and Combination”.

Tuesday night we had a delightful banquet capped by a performance of traditional Irish step dancing. The dancers, girls ranging from 4 to 18 years old, were extraordinary. I’m sorry I didn’t capture any of the performance on camera, and I’m hoping someone else did.

Wednesday: Industry Track and a Grand Finale

Wednesday morning we had the industry track. I’m biased as a co-organizer, but I heard resounding feedback that the industry track was the highlight of the conference. I was very impressed with the presentations by senior technologists at Facebook, Yahoo, StumbleUpon, LinkedIn, Microsoft, and Echo Nest. And Ronny Kohavi’s keynote on “Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics” was a masterpiece. I encourage you to look at the slides for all of these excellent presentations.

Afterward came the last two research sessions, which included the best-paper awardee “CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering”. I’ve been a fan of “less is more” ever since seeing Harr Chen present a paper with that title at SIGIR 2006, and I’m delighted to see these concepts making their way to the RecSys community. In fact, I saw some other ideas, like learning to rank, crossing over from IR to RecSys, and I believe this cross-pollination benefits both fields. Finally, I really enjoyed the last research presentation of the conference, in which Smriti Bhagat talked about inferring and obfuscating user demographics based on ratings. The technical and ethical facets of inferring private data are topics close to my heart.
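
For anyone unfamiliar with the metric in that paper's title, here is a minimal sketch of reciprocal rank and its mean over several users. It shows only the evaluation measure, not the CLiMF learning method itself; the function names and the toy item IDs are made up for this example.

```python
def reciprocal_rank(ranked_items, relevant):
    """Return 1 / position of the first relevant item, or 0.0 if none appears."""
    for position, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average the reciprocal rank over one ranked list per user."""
    scores = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


# Two hypothetical users: the first sees a relevant item at position 2,
# the second at position 1.
rankings = [["a", "b", "c"], ["d", "e", "f"]]
relevant_sets = [{"b"}, {"d"}]
print(mean_reciprocal_rank(rankings, relevant_sets))  # 0.75
```

Because only the first relevant item counts, the measure rewards getting one good recommendation to the top of the list rather than ranking everything well, which is the "less is more" spirit.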

Finally, next year’s hosts exhorted this year’s participants to come to Hong Kong for RecSys 2013, and we heard the final conference presentation: Neal Lathia’s 100-euro-winning entry in the RecSys Limerick Challenge.

Thursday: Flying Home

Sadly I missed the last day of conference-related activity: the doctoral symposium, the RecSys Data Challenge, and additional workshops. I’m looking forward to seeing discussion of these online, as well as reviewing the very active #recsys2012 tweet stream.

All in all, it was an excellent conference. LinkedIn, Netflix, and other industry participants made up about a third of attendees, and there was a strong conversation bridging the gap between academic research and industry practice. I appreciate the focus on the nuances of evaluation, particularly the challenges of combining offline evaluation with online testing, and ensuring that the participant pool is robust. The one topic where I would have liked to see more discussion was that of creating robust incentives for people to participate in recommender systems. Maybe next year in Hong Kong?

Oh, and we’re hiring!

General

Content, Connections, and Context

Post author By Daniel Tunkelang
Post date September 9, 2012

This is the keynote presentation I delivered at the Workshop on Recommender Systems and the Social Web, held as part of the 6th ACM International Conference on Recommender Systems (RecSys 2012):

Content, Connections, and Context

Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context.

Content comes first – we need to understand what we are recommending and to whom we are recommending it in order to decide whether the recommendation is relevant. Connections supply a social dimension, both as inputs to improve relevance and as social proof to explain the recommendations. Finally, context determines where and when a recommendation is appropriate.

I’ll talk about how we use these three kinds of signals in LinkedIn’s recommender systems, as well as the challenges we see in delivering social recommendations and measuring their relevance.

When I’m back from Dublin, I promise to blog about my impressions and reflections from the conference. In the meantime, I hope you enjoy the slides!