LinkedIn at RecSys 2012

LinkedIn is an industry leader in the area of recommender systems — a place where big data meets clever algorithms and content meets social. If you’re one of the 175M+ people using LinkedIn, you’ve probably noticed some of our recommendation products, such as People You May Know, Jobs You Might Be Interested In, and LinkedIn Today.

So it’s no surprise we’re participating in the 6th ACM International Conference on Recommender Systems (RecSys 2012), which will take place in Dublin next week.

Here’s a preview:

I hope to see many of you at the conference, especially if you’re interested in learning about opportunities to work on recommendation systems and related areas at LinkedIn. And perhaps you can provide your own recommendations — specifically, local pubs where we can take in the local spirit.

See you in Dublin. Sláinte!


Panos Ipeirotis talking at LinkedIn about Crowdsourcing!

Sharing knowledge is part of our core culture at LinkedIn, whether it’s through hackdays or contributions to open-source projects. We actively participate in academic conferences, such as KDD, SIGIR, RecSys, and CIKM, as well as industry conferences like QCON and Strata.

Beyond sharing our own knowledge, we provide a platform for researchers and practitioners to share their insights with the technical community. We host a Tech Talk series at our Mountain View headquarters that we open up to the general public. Some of our recent speakers include Coursera founders Daphne Koller and Andrew Ng, UC-Berkeley professor Joe Hellerstein, and Hadapt Chief Scientist Daniel Abadi. It’s an excellent opportunity for people with shared professional interests to reconnect with people they know, as well as make new connections. For those who cannot attend, we offer a live stream.

Our next talk will be by Panos Ipeirotis, a professor at NYU and one of the world’s top experts on crowdsourcing. Here is a full description:

Crowdsourcing: Achieving Data Quality with Imperfect Humans
Friday, September 7, 2012 at 3:00 PM
LinkedIn (map)

Crowdsourcing is a great tool to collect data and support machine learning — it is the ultimate form of outsourcing. But crowdsourcing introduces budget and quality challenges that must be addressed to realize its benefits.

In this talk, I will discuss the use of crowdsourcing for building robust machine learning models quickly and under budget constraints. I’ll operate under the realistic assumption that we are processing imperfect labels that reflect random and systematic error on the part of human workers. I will also describe our “beat the machine” system, which engages humans to improve a machine learning system by discovering cases where the machine fails, and fails while confident that it is correct. I’ll draw my examples from classification problems that arise in online advertising.

Finally, I’ll discuss our latest results showing that mice and Mechanical Turk workers are not that different after all.

Panos Ipeirotis is an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at Leonard N. Stern School of Business of New York University. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, with distinction. He has received three “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation.

If you’re in the Bay Area, I encourage you to attend in person — Panos is a great speaker, and it’s also a great opportunity to network with other attendees. If not, then you can follow on the live stream.

The event is free, but please sign up on the event page. See you next week!


Data Werewolves

Thank you Scott Adams for the free advertising. Of course, LinkedIn is the place to find data werewolves.


Want to find more data werewolves? Check out my team! Don’t worry, they only bite when they’re hungry.


Matt Lease: Recent Adventures in Crowdsourcing and Human Computation

Today we (specifically, my colleague Daria Sorokina) had the pleasure of hosting UT-Austin professor Matt Lease at LinkedIn to give a talk on his “Recent Adventures in Crowdsourcing and Human Computation”. It was a great talk, and the slides above are full of references to research that he and his colleagues have done in this area. A great resource for people interested in the theory and practice of crowdsourcing!

If you are interested in learning more about crowdsourcing, then sign up for an upcoming LinkedIn tech talk by NYU professor Panos Ipeirotis on “Crowdsourcing: Achieving Data Quality with Imperfect Humans”.

And if you’re already an expert, then perhaps you’d like to work on crowdsourcing at LinkedIn!


WTF! @ k: Measuring Ineffectiveness

At SIGIR 2004, Ellen Voorhees presented a paper entitled “Measuring Ineffectiveness” in which she asserted:

Using average values of traditional evaluation measures [for information retrieval systems] is not an appropriate methodology because it emphasizes effective topics: poorly performing topics’ scores are by definition small, and they are therefore difficult to distinguish from the noise inherent in retrieval evaluation.

Ellen is one of the world’s top researchers in the field of information retrieval evaluation. And for those not familiar with TREC terminology, “topics” are the queries used to evaluate information retrieval systems. So what she’s saying above is that, in order to evaluate systems effectively, we need to focus more on failures than on successes.

Specifically, she proposed that we judge information retrieval system performance by measuring the percentage of topics (i.e., queries) with no relevant results in the top 10 retrieved (%no), a measure that was then adopted by the TREC robust retrieval track.
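To make the measure concrete, here is a minimal sketch of the %no computation in Python. The function name and the judgment format are my own illustration, not anything from the TREC tooling:

```python
def percent_no(topic_judgments, k=10):
    """%no: the fraction of topics with no relevant result in the top k.

    topic_judgments maps each topic (query) to a list of booleans marking
    whether each of its top-ranked results is relevant.
    """
    failures = sum(1 for rels in topic_judgments.values() if not any(rels[:k]))
    return failures / len(topic_judgments)


# Hypothetical judgments for three topics; q2 is a total failure.
judgments = {
    "q1": [True, False, False],
    "q2": [False, False, False],
    "q3": [False, True, False],
}
print(percent_no(judgments))  # one of three topics has no relevant result
```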

Information Retrieval in the Wild

Information retrieval (aka search) in the wild is a bit different from information retrieval in the lab. We don’t have a gold standard of human relevance judgements against which we can compare search engine results. And even if we can assemble a representative collection of test queries, it isn’t economically plausible to assemble this gold standard for a large document corpus where each query can have thousands — even millions — of relevant results.

Moreover, the massive growth of the internet and the advent of social networks have changed the landscape of information retrieval. The idea that the relationship between a document and a search query would be sufficient to determine relevance was always a crude approximation, but now the diversity of a global user base makes this approximation even cruder.

For example, consider this query on Google for [nlp]:

Hopefully Google’s hundreds of ranking factors — and all of you — know me well enough to realize that, when I say NLP, I’m probably referring to natural language processing rather than neuro-linguistic programming. Still, it’s an understandable mistake — the latter NLP sells a lot more books.

And search in the context of a social network makes the user’s identity and task context key factors for determining relevance — factors that are uniquely available to each user. For example, if I search on LinkedIn for [peter kim], the search engine cannot know for certain whether I’m looking for my former co-worker, a celebrity I’m connected to, a current co-worker who is a 2nd-degree connection, or someone else entirely.

In short, we cannot rely on human relevance judgments to determine if we are delivering users the most relevant results.

From %no to WTF! @ k

But human judgments can still provide enormous value for evaluating search engine and recommender system performance. Even if we can’t use them to distinguish the most relevant results, we can identify situations where we are delivering glaringly irrelevant results. Situations where the user’s natural reaction is “WTF!”.

People understand that search engines and recommender systems aren’t mind readers. We humans recognize that computers make mistakes, much as other people do. To err, after all, is human.

What we don’t forgive — especially from computers — are seemingly inexplicable mistakes that any reasonable person would be able to recognize.

I’m not going to single out any sites to provide examples. I’m sure you are familiar with the experience of a search engine or recommender system returning a result that makes you want to scream “WTF!”. I may even bear some responsibility, in which case I apologize. Besides, everyone is entitled to the occasional mistake.

But I’m hard-pressed to come up with a better measure to optimize (i.e., minimize) than WTF! @ k — that is, the number of top-k results that elicit a WTF! reaction. The value of k depends on the application. For a search engine, k = 10 could correspond to the first page of results. For a recommender system, k is probably smaller, e.g., 3.
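As a back-of-the-envelope illustration, here is how WTF! @ k might be computed from crowdsourced judgments. The labels and function names are hypothetical — a sketch of the measure, not a production metric pipeline:

```python
def wtf_at_k(judged_results, k):
    """Count the top-k results judged egregiously irrelevant (the "wtf" label)."""
    return sum(1 for label in judged_results[:k] if label == "wtf")


def mean_wtf_at_k(queries, k):
    """Average WTF! @ k over a set of judged queries -- the value to minimize."""
    return sum(wtf_at_k(results, k) for results in queries.values()) / len(queries)


# Hypothetical labels: "ok" = plausibly relevant, "wtf" = glaring miss.
labels = {
    "[nlp]": ["ok", "ok", "wtf"],
    "[peter kim]": ["ok", "ok", "ok"],
}
print(mean_wtf_at_k(labels, k=3))  # 0.5
```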

Moreover, the system can substantially mitigate the risk of WTF! results by providing explanations for results and by making the information-seeking process more of a conversation with the user.

Measuring WTF! @ k

Hopefully you agree that we should strive to minimize WTF! @ k. But, as Lord Kelvin tells us, if you can’t measure it, then you can’t improve it. How do we measure WTF! @ k?

On one hand, we cannot rely on click behavior to measure it implicitly. All non-clicks look the same, and we can’t tell which ones were WTF! results. In fact, egregiously irrelevant results may inspire clicks out of sheer curiosity. One of the phenomena that search engines watch out for is an unusually high click-through rate — those clicks often signal something other than relevance, like a racy or offensive result.

On the other hand, we can measure WTF! @ k with human judgments. A rater does not need to have the personal and task context of a user to evaluate whether a result is at least plausibly relevant. WTF! @ k is thus a measure that is amenable to crowdsourcing, a technique that both Google and Bing use to improve search quality. As does LinkedIn, and we are hiring a program manager for crowdsourcing.


As information retrieval systems become increasingly personalized and task-centric, I hope we will see more people using measures like WTF! @ k to evaluate their performance, as well as working to make results more explainable. After all, no one likes hurting their computer’s feelings by screaming WTF! at it.



Hiring: Taking It Personally

As a manager, I’ve found that I mostly have two jobs: bringing great people onto the team, and creating the conditions for their success. The second job is the reason I became a manager — there’s nothing more satisfying than seeing people achieve greatness in both the value they create and their own professional development.

But the first step is getting those people on your team. And hiring great people is hard, even when you and your colleagues are building the world’s best hiring solutions! By definition, the best people are scarce and highly sought after.

At the risk of giving away my competitive edge, I’d like to offer a word of advice to hiring managers: take it personally. That is, make the hiring process all about the people you’re trying to hire and the people on your team.

How does that work in practice? It means that everyone on the team participates in every part of the hiring process — from sourcing to interviewing to closing. A candidate interviews with the team he or she will work with, so everyone is invested in the process. The interview questions reflect the real problems the candidate would work on. And interviews communicate culture in both directions — by the end of the interviews, it’s clear to both the interviewers and the candidate whether they would enjoy working together.

I’ve seen and been part of impersonal hiring processes. And I understand how the desire to build a scalable process can lead to a bureaucratic, assembly-line approach. But I wholeheartedly reject it. Hiring is fundamentally about people, and that means making the process a human one for everyone involved.

And taking it personally extends to sourcing. Earlier this week, the LinkedIn data science team hosted a happy hour for folks interested in learning more about us and our work. Of course we used our own technology to identify amazing candidates, but I emailed everyone personally, and the whole point of the event was to get to know one another in an informal atmosphere. It was a great time for everyone, and I can’t imagine a better way to convey the unique team culture we have built.

I’m all for technology and process that offers efficiency and scalability. But sometimes your most effective tool is your own humanity. When it comes to hiring, take it personally.


Upcoming Conferences: RecSys, HCIR, CIKM

Long-time readers know that I have strong opinions about academic conferences. I find the main value of conferences and workshops to be facilitating face-to-face interaction among researchers and practitioners who share professional interests. An offline version of LinkedIn, if you will.

This year, I’m focusing my attention on three conferences: RecSys, HCIR, and CIKM. Regrettably I won’t be able to attend SIGIR, Strata NY, or UIST. But fortunately my colleagues are attending the first two, and hopefully some UIST attendees will be able to arrive a few days early and attend HCIR. Perhaps we can steal a page from WSDM and CSCW and arrange a cross-conference social in Cambridge.

6th ACM Recommender System Conference (RecSys 2012)

At RecSys, which will take place September 9-13 in Dublin, I’m co-organizing the Industry Track with Yehuda Koren. The program features technology leaders from Facebook, Microsoft, StumbleUpon, The Echo Nest, Yahoo, and of course LinkedIn. I’m also delivering a keynote at the Workshop on Recommender Systems and the Social Web. I hope to see you there, along with several of my colleagues who will be presenting their work on recommender systems at LinkedIn.

6th Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2012)

The 6th HCIR represents a milestone — we’ve upgraded from a 1-day workshop to a 2-day symposium. We are continuing two great traditions: strong keynotes (Marti Hearst) and the HCIR Challenge (focused on people search). The symposium will take place October 4-5 in Cambridge, MA. Hope to see many of you there. And, if you’re still working on your submissions and challenge entries, good luck wrapping them up by the July 29 deadline!

21st ACM International Conference on Information and Knowledge Management (CIKM 2012)

Finally, you can’t miss CIKM in Hawaii! This year’s conference will take place October 29 – November 2 in Maui. After co-organizing last year’s industry track in Glasgow, I’m delighted to be a speaker in this year’s track, which also includes researchers and practitioners from Adobe, eBay, Google, Groupon, IBM, Microsoft, Tencent, Walmart Labs, and Yahoo. A great program in one of the world’s most beautiful settings; how can you resist?

I hope to see many of you at one — hopefully all! — of these great events! But, if you can’t make it, be reassured that I’ll blog about them here.


HCIR 2012 Challenge: People Search

As we get ready for the Sixth Symposium on Human-Computer Interaction and Information Retrieval this October in Cambridge, MA, people around the world are working on their entries for the third HCIR Challenge.

Our first HCIR Challenge in 2010 focused on exploratory search of a news archive. Thanks to the generosity of the Linguistic Data Consortium (LDC), we were able to provide participants with access to the New York Times (NYT) Annotated Corpus free of charge. Six teams presented their entries:

Search for Journalists: New York Times Challenge Report
Corrado Boscarino, Arjen P. de Vries, and Wouter Alink (Centrum Wiskunde & Informatica)

Exploring the New York Times Corpus with NewsClub
Christian Kohlschütter (Leibniz Universität Hannover)

Searching Through Time in the New York Times (WINNER)
Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)
(covered in Technology Review: “A Search Service that Can Peer into the Future“)

News Sync: Three Reasons to Visualize News Better
V.G. Vinod Vydiswaran (University of Illinois), Jeroen van den Eijkhof (University of Washington), Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research) 

Custom Dimensions for Text Corpus Navigation
Vladimir Zelevinsky (Endeca Technologies)

A Retrieval System Based on Sentiment Analysis
Wei Zheng and Hui Fang (University of Delaware)

In 2011, we continued with a Challenge focused on the problem of information availability. Four teams presented their systems to address this particularly difficult area of information retrieval:

FreeSearch – Literature Search in a Natural Way
Claudiu S. Firan, Wolfgang Nejdl, Mihai Georgescu (University of Hanover), and Xinyun Sun (DEKE Lab MOE, Renmin)

Session-based search with Querium (WINNER)
Gene Golovchinsky (FX Palo Alto Lab) and Abdigani Diriye (University College London)

David L. Ostby and Edmond Brian (Visual Purple)

Query Analytics Workbench
Antony Scerri, Matthew Corkum, Keith Gutfreund, Ron Daniel Jr., Michael Taylor (Elsevier Labs)

This year’s Challenge focuses on people search — that is, on the problem of people and expertise finding.

Here are examples of the kinds of tasks we will publish after the systems are frozen at the end of August:

  • Hiring

    Given a job description, produce a set of suitable candidates for the position. An example of a job description:

  • Assembling a Conference Program

    Given a conference’s past history, produce a set of suitable candidates for keynotes, program committee members, etc. for the conference. An example conference could be HCIR 2013, where past conferences are described at

  • Finding People to deliver Patent Research or Expert Testimony

    Given a patent, produce a set of suitable candidates who could deliver relevant research or expert testimony for use in a trial. These people can be further segmented, e.g., students and other practitioners might be good at the research, while more senior experts might be more credible in high-stakes litigation. An example task would be to find people for

For all of the tasks there is a dual goal of obtaining a set of candidates (ideally organized or ranked) and producing a repeatable and extensible search strategy.

Best of luck to this year’s HCIR Challenge participants — I’m excited to see the systems that they present this October at the Symposium!


RecSys 2012 Industry Track

I’m proud to be co-organizing the RecSys 2012 Industry Track with Yehuda Koren.

Check out the line-up:

  • Ronny Kohavi (Microsoft), Keynote
    Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics
  • Ralf Herbrich (Facebook)
    Distributed, Real-Time Bayesian Learning in Online Services
  • Ronny Lempel (Yahoo! Research)
    Recommendation Challenges in Web Media Settings
  • Sumanth Kolar (StumbleUpon)
    Recommendations and Discovery at StumbleUpon
  • Anmol Bhasin (LinkedIn)
    Recommender Systems & The Social Web
  • Thore Graepel (Microsoft Research)
    Towards Personality-Based Personalization
  • Paul Lamere (The Echo Nest)
    I’ve got 10 million songs in my pocket. Now what?

Hope to see you at RecSys this September in Dublin! Registration is open now.


Information Cascades, Revisited

A couple of years ago, I blogged about an information cascade problem I’d read about in David Easley and Jon Kleinberg’s textbook on Networks, Crowds, and Markets. To recall the problem (which they themselves borrowed from Lisa Anderson and Charles Holt):

The experimenter puts an urn at the front of the room with three marbles in it; she announces that there is a 50% chance that the urn contains two red marbles and one blue marble, and a 50% chance that the urn contains two blue marbles and one red marble…one by one, each student comes to the front of the room and draws a marble from the urn; he looks at the color and then places it back in the urn without showing it to the rest of the class. The student then guesses whether the urn is majority-red or majority-blue and publicly announces this guess to the class.

The fascinating result is that the sequence of guesses locks in on a single color as soon as two consecutive students agree. For example, if the first two marbles drawn are blue, then all subsequent students will guess blue. If the urn is majority-red, then it turns out there is a 4/5 probability that the sequence will converge to red and a 1/5 probability that it will converge to blue.
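If you’d like to check the 4/5 figure yourself, here is a quick Monte Carlo sketch in Python. It relies on the observation from the Easley-Kleinberg analysis that a mixed pair of guesses cancels out, so the cascade locks in on the color of the first pair of matching draws:

```python
import random

def cascade_color(p_red=2/3, rng=random):
    """Simulate one run of the urn experiment for a majority-red urn.

    Students guess their own draws until two consecutive guesses agree;
    a mixed pair of signals cancels out, so we keep drawing pairs until
    one matches, and that pair's color locks in the cascade.
    """
    while True:
        a = "red" if rng.random() < p_red else "blue"
        b = "red" if rng.random() < p_red else "blue"
        if a == b:
            return a

random.seed(42)
trials = 100_000
reds = sum(cascade_color() == "red" for _ in range(trials))
print(reds / trials)  # close to the 4/5 probability of converging to red
```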

Let me explain why I find this problem so fascinating.

Consider a scenario where you are among a group of people faced with a single binary decision — let’s say, choosing red or blue — and that each of you is independently tasked with recommending the best decision given your own judgement and all available information. Assume further that each of you is perfectly rational and that your prior decisions (i.e., without knowing what anyone else thinks) are based on independent and identically distributed random variables. Let’s follow the example above, in which each participant in the decision process has a prior corresponding to a Bernoulli random variable with probability p = 2/3.

If each of you makes a decision independently, then the expected fraction of participants who make the right decision is 2/3.

But you could do better if you have a chance to observe others’ independent decision making first. For example, if you get to witness 100 independent decisions, then you have a very low probability of going wrong by voting the majority. If you’d like the gory details, review the cumulative distribution function of binomial random variables.
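For those who want the gory details in code, here is a small sketch of that binomial computation: the probability that a strict majority of 100 independent guessers, each correct with probability 2/3, votes for the right answer:

```python
from math import comb

def majority_correct_prob(n, p):
    """Probability that a strict majority of n independent voters,
    each correct with probability p, picks the right answer."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n // 2 + 1, n + 1))

print(majority_correct_prob(100, 2/3))  # well above 0.999
```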

On the other hand, if the decisions happen sequentially and every person has access to all of the previous decisions, then we see an information cascade. Rationally, it makes sense to let previous decisions influence your own — and indeed 4/5 > 2/3. But there’s still a one in five chance of making the wrong decision, even after you witness 100 previous decisions. We are wasting a lot of independent input because of how participants are incented.

I can’t help wondering how changing the incentives could affect the outcome of this process. What would happen if participants were rewarded based, in whole or in part, on the accuracy of the participants who guess after them?

Consider as an extreme case rewarding all participants based solely on the accuracy of the final participant’s guess. In that case, the optimal strategy for all but the last participant is to ignore previous participants’ guesses and vote based solely on their own independent judgements. Then the final participant combines these judgements with his or her own and votes based on the majority. The result makes optimal use of all participants’ independent judgments, despite the sequential decision process.

But what if individuals are rewarded based on a combination of individual and collective success? Consider the 3rd participant in our example, who draws a red marble after the previous participants guess blue. Let’s say that there are 5 participants in total. If the reward is entirely based on individual success, the 3rd participant will vote blue, yielding an expected reward of 2/3. If the reward is entirely based on group success, the 3rd participant will vote red, yielding an expected reward of 20/27 (details left as an exercise for the reader). If we make the reward evenly split between individual success and group success, the 3rd participant will still vote blue — the benefit from helping the group will not be enough to overcome the cost to the individual reward.

There’s a lot more math in the details of this problem, e.g. “The Mathematics of Bayesian Learning Traps“, by Simon Loertscher and Andrew McLennan. But there’s a simple take-away: incentives are crucial in determining how we best exploit our collective wisdom. Something to think about the next time you’re on a committee.