The Noisy Channel

 

HCIR 2012 Symposium: Oct 4-5 in Cambridge, MA

September 21st, 2012 by Daniel Tunkelang

It’s the event you’ve been waiting for: the Sixth Symposium on Human-Computer Interaction and Information Retrieval! HCIR 2012 will take place October 4th and 5th at IBM Research in Cambridge, MA.

Who should attend?

Researchers, practitioners, and anyone else interested in the exciting work at the intersection of HCI and IR, in areas such as interactive information retrieval, exploratory search, and information visualization.

Why attend?

You’ll enjoy a highly interactive day and a half of learning from HCIR leaders and pioneers. People like keynote speaker Marti Hearst, who literally wrote the book on search user interfaces. Folks from top universities and industry labs who are developing new methods and models for information seeking. And you’ll get to see the five teams competing to win the third annual HCIR Challenge, focused this year on people and expertise finding.

How to register?

Just click here and fill out the information requested. The $150 registration fee includes all sessions on both days, all meeting materials, and a reception on October 4th at Technique. We’re grateful to our sponsors — FXPAL, IBM Research, LinkedIn, Microsoft Research, MIT CSAIL, and Oracle — for helping us keep the costs so low.

Capacity is limited, so please register as soon as possible to ensure your attendance.


LinkedIn Presentations at RecSys 2012

September 16th, 2012 by Daniel Tunkelang

LinkedIn showed up in force at the 6th ACM International Conference on Recommender Systems (RecSys 2012)! Here are the slides from all of our presentations.

Daniel Tunkelang: Content, Connections, and Context

 

Mario Rodriguez, Christian Posse, and Ethan Zhang: Multiple Objective Optimization in Recommender Systems

 

Anmol Bhasin: Beyond Ratings and Followers

 

Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, and Christian Posse: Social Referral: Leveraging Network Connections to Deliver Recommendations


RecSys 2012: Beyond Five Stars

September 14th, 2012 by Daniel Tunkelang

I spent the past week in Dublin attending the 6th ACM International Conference on Recommender Systems (RecSys 2012). This young conference has become the premier global forum for discussing the state of the art in recommender systems, and I’m thrilled to have had the opportunity to participate.

Sunday: Workshops

The conference began on Sunday with a day of parallel workshops.

I attended the Workshop on Recommender Systems and the Social Web, where I presented a keynote entitled “Content, Connections, and Context”. Major workshop themes included folksonomies, trust, and pinning down what we mean by “social” and “context”. The most interesting presentation was “Online Dating Recommender Systems: The Split-complex Number Approach”, in which Jérôme Kunegis modeled the dating recommendation problem (specifically, the interaction of “like” and “is-similar” relationships) using a variation of quaternions introduced in the 19th century! The full workshop program, including slides of all the presentations, is available here.

Unfortunately, I was not able to attend the other workshops that day, which focused on Human Decision Making in Recommender Systems, Context-Aware Recommender Systems (CARS), and Recommendation Utility Evaluation (RUE). But I did hear that Carlos Gomez-Uribe delivered an excellent keynote at the RUE workshop on the challenges of offline and online evaluation of Netflix’s recommender systems.

Monday: Experiments, Evaluations, and Pints All Around

Monday started with parallel tutorial sessions. I attended Bart Knijnenburg’s tutorial on “Conducting User Experiments in Recommender Systems”. Bart is an outstanding lecturer, and he delivered an excellent overview of the evaluation landscape. My only complaint is that there was too much material for even a 90-minute session. Fortunately, his slides are online, and perhaps he’ll be persuaded to expand them into book form. Unfortunately, I missed Maria Augusta Nunes and Rong Hu’s parallel tutorial on personality-based recommender systems.

Then came a rousing research keynote by Jure Leskovec on “How Users Evaluate Things and Each Other in Social Media”. I won’t try to summarize the keynote here — the slides of this and other presentations are available online. But the point Jure made that attracted the most interest was that voting is so predictable that results are determined mostly by turnout. Aside from the immediate applications of this observation to the US presidential elections, there are many research and practical questions about how to obtain or incent a representative participant pool — a topic I’ve been passionate about for a long time.

The program continued with research presentations on multi-objective recommendation and social recommendations. I may be biased, but my favorite presentation was the work that my colleague Mario Rodriguez presented on multiple-objective optimization in LinkedIn’s recommendation systems. I’ll post the slides and paper here as soon as they are available.

Monday night, we went to the Guinness Storehouse for a tour that culminated with fresh pints of Guinness in the Gravity Bar overlooking the city. We’re all grateful to William Gosset, a chemist at the Guinness brewery who introduced the now ubiquitous t-test in 1908 as a way to monitor the quality of his product. A toast to statistics and to great beer!

Tuesday: Math, Posters, and Dancing

Tuesday started with another pair of parallel tutorial sessions. I attended Xavier Amatriain’s tutorial on “Building Industrial-scale Real-world Recommender Systems” at Netflix. It was an excellent presentation, especially considering that Xavier had just come from a transatlantic flight! A major theme in his presentation was that Netflix is moving beyond the emphasis on user ratings to make the interaction with the user more transparent and conversational. Unfortunately, I had to miss the parallel tutorial on “The Challenge of Recommender Systems Challenges” by Alan Said, Domonkos Tikk, and Andreas Hotho.

Tuesday continued with research papers on implicit feedback and context-aware recommendations. One that drew particular interest was Daniel Kluver’s information-theoretical work to quantify the preference information contained in ratings and predictions, measured in preference bits per second (paper available here for ACM DL subscribers). And Gabor Takacs had the day’s best line with “if you don’t like math, leave the room.” He wasn’t kidding!

Then came the posters and demos — first a “slam” session where each author could make a 60-second pitch, and then two hours for everyone to interact with the authors while enjoying LinkedIn-sponsored drinks. There were lots of great posters, but my favorite was Michael Ekstrand’s “When Recommenders Fail: Predicting Recommender Failure for Algorithm Selection and Combination”.

Tuesday night we had a delightful banquet capped by a performance of traditional Irish step dancing. The dancers, girls ranging from 4 to 18 years old, were extraordinary. I’m sorry I didn’t capture any of the performance on camera, and I’m hoping someone else did.

Wednesday: Industry Track and a Grand Finale

Wednesday morning we had the industry track. I’m biased as a co-organizer, but I heard resounding feedback that the industry track was the highlight of the conference. I was very impressed with the presentations by senior technologists at Facebook, Yahoo, StumbleUpon, LinkedIn, Microsoft, and Echo Nest. And Ronny Kohavi’s keynote on “Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics” was a masterpiece. I encourage you to look at the slides for all of these excellent presentations.

Afterward came the last two research sessions, which included the best-paper awardee “CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering”. I’ve been a fan of “less is more” ever since seeing Harr Chen present a paper with that title at SIGIR 2006, and I’m delighted to see these concepts making their way to the RecSys community. In fact, I saw some other ideas, like learning to rank, crossing over from IR to RecSys, and I believe this cross-pollination benefits both fields. Finally, I really enjoyed the last research presentation of the conference, in which Smriti Bhagat talked about inferring and obfuscating user demographics based on ratings. The technical and ethical facets of inferring private data are topics close to my heart.

Finally, next year’s hosts exhorted this year’s participants to come to Hong Kong for RecSys 2013, and we heard the final conference presentation: Neal Lathia’s 100-euro-winning entry in the RecSys Limerick Challenge.

Thursday: Flying Home

Sadly, I missed the last day of conference-related activities: the doctoral symposium, the RecSys Data Challenge, and additional workshops. I’m looking forward to seeing discussion of these online, as well as reviewing the very active #recsys2012 tweet stream.

All in all, it was an excellent conference. LinkedIn, Netflix, and other industry participants comprised about a third of attendees, and there was a strong conversation bridging the gap between academic research and industry practice. I appreciated the focus on the nuances of evaluation, particularly the challenges of combining offline evaluation with online testing and of ensuring that the participant pool is robust. The one topic where I would have liked to see more discussion was that of creating robust incentives for people to participate in recommender systems. Maybe next year in Hong Kong?

Oh, and we’re hiring!


Content, Connections, and Context

September 9th, 2012 by Daniel Tunkelang

This is the keynote presentation I delivered at the Workshop on Recommender Systems and the Social Web, held as part of the 6th ACM International Conference on Recommender Systems (RecSys 2012):

Content, Connections, and Context 

Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context.

Content comes first – we need to understand what we are recommending and to whom we are recommending it in order to decide whether the recommendation is relevant. Connections supply a social dimension, both as inputs to improve relevance and as social proof to explain the recommendations. Finally, context determines where and when a recommendation is appropriate.

I’ll talk about how we use these three kinds of signals in LinkedIn’s recommender systems, as well as the challenges we see in delivering social recommendations and measuring their relevance.
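To make the blending of these signals a bit more concrete, here is a minimal sketch in Python. The feature names, the cap on the connection signal, and the linear weights are illustrative assumptions for the sake of this post; they are not a description of LinkedIn’s actual models.

```python
# A hypothetical, minimal blend of content, connection, and context signals.
# All features, weights, and the linear form are illustrative assumptions.

def recommendation_score(content_sim, shared_connections, context_match,
                         w_content=0.5, w_social=0.3, w_context=0.2):
    """Combine the three signal families into a single score.

    content_sim        -- similarity between the member and the item (0..1)
    shared_connections -- number of mutual connections providing social proof
    context_match      -- 1.0 if the recommendation fits the current context
                          (e.g., the page or task at hand), else 0.0
    """
    social_signal = min(shared_connections / 10.0, 1.0)  # crude saturation
    return (w_content * content_sim
            + w_social * social_signal
            + w_context * context_match)

# Example: a job that matches the member's profile well, shared with three of
# their connections, shown while they are browsing jobs.
print(recommendation_score(content_sim=0.8, shared_connections=3, context_match=1.0))
```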

When I’m back from Dublin, I promise to blog about my impressions and reflections from the conference. In the meantime, I hope you enjoy the slides!


LinkedIn at RecSys 2012

September 4th, 2012 by Daniel Tunkelang

LinkedIn is an industry leader in the area of recommender systems — a place where big data meets clever algorithms and content meets social. If you’re one of the 175M+ people using LinkedIn, you’ve probably noticed some of our recommendation products, such as People You May Know, Jobs You Might Be Interested In, and LinkedIn Today.

So it’s no surprise we’re participating in the 6th ACM International Conference on Recommender Systems (RecSys 2012), which will take place in Dublin next week.

Here’s a preview:

I hope to see many of you at the conference, especially if you’re interested in learning about opportunities to work on recommendation systems and related areas at LinkedIn. And perhaps you can provide your own recommendations — specifically, local pubs where we can take in the local spirit.

See you in Dublin. Sláinte!


Panos Ipeirotis talking at LinkedIn about Crowdsourcing!

August 30th, 2012 by Daniel Tunkelang

Sharing knowledge is part of our core culture at LinkedIn, whether it’s through hackdays or contributions to open-source projects. We actively participate in academic conferences, such as KDD, SIGIR, RecSys, and CIKM, as well as industry conferences like QCON and Strata.

Beyond sharing our own knowledge, we provide a platform for researchers and practitioners to share their insights with the technical community. We host a Tech Talk series at our Mountain View headquarters that we open up to the general public. Some of our recent speakers include Coursera founders Daphne Koller and Andrew Ng, UC-Berkeley professor Joe Hellerstein, and Hadapt Chief Scientist Daniel Abadi. It’s an excellent opportunity for people with shared professional interests to reconnect with people they know, as well as to make new connections. For those who cannot attend, we offer a live stream.

Our next talk will be by Panos Ipeirotis, a professor at NYU and one of the world’s top experts on crowdsourcing. Here is a full description:

Crowdsourcing: Achieving Data Quality with Imperfect Humans
Friday, September 7, 2012 at 3:00 PM
LinkedIn (map)

Crowdsourcing is a great tool to collect data and support machine learning — it is the ultimate form of outsourcing. But crowdsourcing introduces budget and quality challenges that must be addressed to realize its benefits.

In this talk, I will discuss the use of crowdsourcing for building robust machine learning models quickly and under budget constraints. I’ll operate under the realistic assumption that we are processing imperfect labels that reflect random and systematic error on the part of human workers. I will also describe our “beat the machine” system, which engages humans to improve a machine learning system by discovering cases where the machine fails, and fails while confident that it is correct. I’ll illustrate these ideas with classification problems that arise in online advertising.

Finally, I’ll discuss our latest results showing that mice and Mechanical Turk workers are not that different after all.

Panos Ipeirotis is an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at the Leonard N. Stern School of Business of New York University. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, with distinction. He has received three “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation.

If you’re in the Bay Area, I encourage you to attend in person — Panos is a great speaker, and it’s also a great opportunity to network with other attendees. If not, then you can follow on the live stream.

The event is free, but please sign up on the event page. See you next week!


Data Werewolves

August 23rd, 2012 by Daniel Tunkelang

Thank you, Scott Adams, for the free advertising. Of course, LinkedIn is the place to find data werewolves.

 

Want to find more data werewolves? Check out my team! Don’t worry, they only bite when they’re hungry.


Matt Lease: Recent Adventures in Crowdsourcing and Human Computation

August 20th, 2012 by Daniel Tunkelang

Today we (specifically, my colleague Daria Sorokina) had the pleasure of hosting UT-Austin professor Matt Lease at LinkedIn to give a talk on his “Recent Adventures in Crowdsourcing and Human Computation”. It was a great talk, and the slides above are full of references to research that he and his colleagues have done in this area. A great resource for people interested in the theory and practice of crowdsourcing!

If you are interested in learning more about crowdsourcing, then sign up for an upcoming LinkedIn tech talk by NYU professor Panos Ipeirotis on “Crowdsourcing: Achieving Data Quality with Imperfect Humans”.

And if you’re already an expert, then perhaps you’d like to work on crowdsourcing at LinkedIn!


WTF! @ k: Measuring Ineffectiveness

August 20th, 2012 by Daniel Tunkelang

At SIGIR 2004, Ellen Voorhees presented a paper entitled “Measuring Ineffectiveness” in which she asserted:

Using average values of traditional evaluation measures [for information retrieval systems] is not an appropriate methodology because it emphasizes effective topics: poorly performing topics’ scores are by definition small, and they are therefore difficult to distinguish from the noise inherent in retrieval evaluation.

Ellen is one of the world’s top researchers in the field of information retrieval evaluation. And for those not familiar with TREC terminology, “topics” are the queries used to evaluate information retrieval systems. So what she’s saying above is that, in order to evaluate systems effectively, we need to focus more on failures than on successes.

Specifically, she proposed that we judge information retrieval system performance by measuring the percentage of topics (i.e., queries) with no relevant results in the top 10 retrieved (%no), a measure that was then adopted by the TREC robust retrieval track.
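For readers who like to see a measure spelled out, here is a minimal sketch of computing %no from a set of ranked results and relevance judgments. The data structures and the toy example are assumptions made purely for illustration.

```python
# A minimal sketch of the %no measure: the percentage of topics (queries) with
# no relevant result among the top k retrieved. Data structures are illustrative.

def percent_no(results_by_topic, qrels, k=10):
    """results_by_topic: {topic_id: [doc_id, ...]} ranked results per topic.
    qrels: {topic_id: set of relevant doc_ids}."""
    failures = 0
    for topic, ranked in results_by_topic.items():
        relevant = qrels.get(topic, set())
        if not any(doc in relevant for doc in ranked[:k]):
            failures += 1
    return 100.0 * failures / len(results_by_topic)

# Toy example: one of the two topics has no relevant document in its top 10.
results = {"t1": ["d1", "d2"], "t2": ["d3", "d4"]}
qrels = {"t1": {"d2"}, "t2": {"d9"}}
print(percent_no(results, qrels))  # 50.0
```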

Information Retrieval in the Wild

Information retrieval (aka search) in the wild is a bit different from information retrieval in the lab. We don’t have a gold standard of human relevance judgments against which we can compare search engine results. And even if we can assemble a representative collection of test queries, it isn’t economically feasible to assemble this gold standard for a large document corpus where each query can have thousands — even millions — of relevant results.

Moreover, the massive growth of the internet and the advent of social networks have changed the landscape of information retrieval. The idea that the relationship between a document and a search query would be sufficient to determine relevance was always a crude approximation, but now the diversity of a global user base makes this approximation even cruder.

For example, consider a query on Google for [nlp].

Hopefully Google’s hundreds of ranking factors — and all of you — know me well enough to know that, when I say NLP, I’m probably referring to natural language processing rather than neuro-linguistic programming. Still, it’s an understandable mistake — the latter NLP sells a lot more books.

And search in the context of a social network makes the user’s identity and task context key factors for determining relevance — factors that are uniquely available to each user. For example, if I search on LinkedIn for [peter kim], the search engine cannot know for certain whether I’m looking for my former co-worker, a celebrity I’m connected to, a current co-worker who is a 2nd-degree connection, or someone else entirely.

In short, we cannot rely on human relevance judgments to determine if we are delivering users the most relevant results.

From %no to WTF! @ k

But human judgments can still provide enormous value for evaluating search engine and recommender system performance. Even if we can’t use them to distinguish the most relevant results, we can identify situations where we are delivering glaringly irrelevant results. Situations where the user’s natural reaction is “WTF!”.

People understand that search engines and recommender systems aren’t mind readers. We humans recognize that computers make mistakes, much as other people do. To err, after all, is human.

What we don’t forgive — especially from computers — are seemingly inexplicable mistakes that any reasonable person would be able to recognize.

I’m not going to single out any sites to provide examples. I’m sure you are familiar with the experience of a search engine or recommender system returning a result that makes you want to scream “WTF!”. I may even bear some responsibility, in which case I apologize. Besides, everyone is entitled to the occasional mistake.

But I’m hard-pressed to come up with a better measure to optimize (i.e., minimize) than WTF! @ k — that is, the number of top-k results that elicit a WTF! reaction. The value of k depends on the application. For a search engine, k = 10 could correspond to the first page of results. For a recommender system, k is probably smaller, e.g., 3.
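To spell that out, here is a minimal sketch of the measure, assuming we already have a set of results that judges have flagged as glaringly irrelevant. The function and data structures are illustrative rather than any standard implementation.

```python
# A minimal sketch of WTF! @ k: count the results in the top k that judges have
# flagged as glaringly irrelevant. The flag set is assumed to come from human raters.

def wtf_at_k(ranked_results, wtf_flags, k=10):
    """ranked_results: list of result ids in rank order.
    wtf_flags: set of result ids judged glaringly irrelevant."""
    return sum(1 for result in ranked_results[:k] if result in wtf_flags)

# A search results page (k = 10) versus a small recommendation module (k = 3).
page = ["r1", "r2", "r3", "r4", "r5"]
flags = {"r3", "r5"}
print(wtf_at_k(page, flags, k=10))  # 2
print(wtf_at_k(page, flags, k=3))   # 1
```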

Also, the system can substantially mitigate the risk of WTF! results by providing explanations for results and making the information seeking process more of a conversation with the user.

Measuring WTF! @ k

Hopefully you agree that we should strive to minimize WTF! @ k. But, as Lord Kelvin tells us, if you can’t measure it, then you can’t improve it. How do we measure WTF! @ k?

On one hand, we cannot rely on click behavior to measure it implicitly. All non-clicks look the same, and we can’t tell which ones were WTF! results. In fact, egregiously irrelevant results may inspire clicks out of sheer curiosity. One of the phenomena that search engines watch out for is an unusually high click-through rate — those clicks often signal something other than relevance, like a racy or offensive result.

On the other hand, we can measure WTF! @ k with human judgments. A rater does not need to have the personal and task context of a user to evaluate whether a result is at least plausibly relevant. WTF! @ k is thus a measure that is amenable to crowdsourcing, a technique that both Google and Bing use to improve search quality. As does LinkedIn, and we are hiring a program manager for crowdsourcing.
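As a hypothetical sketch of how such crowdsourced judgments could feed the measure above, the following aggregates individual rater votes into a per-result WTF flag by majority vote. The minimum-vote threshold and the majority rule are assumptions for illustration, not a description of how any of these companies actually aggregates judgments.

```python
# A hypothetical aggregation of raw crowd judgments into the WTF flags used
# above: flag a result only when enough raters saw it and a majority of them
# called it glaringly irrelevant. The threshold and rule are assumptions.

from collections import defaultdict

def aggregate_wtf_flags(judgments, min_votes=3):
    """judgments: iterable of (result_id, is_wtf) pairs from individual raters.
    Returns the set of result ids that a majority of raters flagged."""
    votes = defaultdict(list)
    for result_id, is_wtf in judgments:
        votes[result_id].append(is_wtf)
    return {rid for rid, vs in votes.items()
            if len(vs) >= min_votes and sum(vs) > len(vs) / 2}

raw = [("r3", True), ("r3", True), ("r3", False),
       ("r5", False), ("r5", False), ("r5", True)]
print(aggregate_wtf_flags(raw))  # {'r3'}
```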

Conclusion

As information retrieval systems become increasingly personalized and task-centric, I hope we will see more people using measures like WTF! @ k to evaluate their performance, as well as working to make results more explainable. After all, no one likes hurting their computer’s feelings by screaming WTF! at it.

 


Hiring: Taking It Personally

August 1st, 2012 by Daniel Tunkelang

As a manager, I’ve found that I mostly have two jobs: bringing great people onto the team, and creating the conditions for their success. The second job is the reason I became a manager — there’s nothing more satisfying than seeing people achieve greatness in both the value they create and their own professional development.

But the first step is getting those people on your team. And hiring great people is hard, even when you and your colleagues are building the world’s best hiring solutions! By definition, the best people are scarce and highly sought after.

At the risk of giving away my competitive edge, I’d like to offer a word of advice to hiring managers: take it personally. That is, make the hiring process all about the people you’re trying to hire and the people on your team.

How does that work in practice? It means that everyone on the team participates in every part of the hiring process — from sourcing to interviewing to closing. A candidate interviews with the team he or she will work with, so everyone is invested in the process. The interview questions reflect the real problems the candidate would work on. And interviews communicate culture in both directions — by the end of the interviews, it’s clear to both the interviewers and the candidate whether they would enjoy working together.

I’ve seen and been part of impersonal hiring processes. And I understand how the desire to build a scalable process can lead to a bureaucratic, assembly-line approach. But I wholeheartedly reject it. Hiring is fundamentally about people, and that means making the process a human one for everyone involved.

And taking it personally extends to sourcing. Earlier this week, the LinkedIn data science team hosted a happy hour for folks interested in learning more about us and our work. Of course we used our own technology to identify amazing candidates, but I emailed everyone personally, and the whole point of the event was to get to know one another in an informal atmosphere. It was a great time for everyone, and I can’t imagine a better way to convey the unique team culture we have built.

I’m all for technology and process that offers efficiency and scalability. But sometimes your most effective tool is your own humanity. When it comes to hiring, take it personally.

