Panos Ipeirotis talking at LinkedIn about Crowdsourcing!

Sharing knowledge is part of our core culture at LinkedIn, whether it’s through hackdays or contributions to open-source projects. We actively participate in academic conferences, such as KDD, SIGIR, RecSys, and CIKM, as well as industry conferences like QCon and Strata.

Beyond sharing our own knowledge, we provide a platform for researchers and practitioners to share their insights with the technical community. We host a Tech Talk series at our Mountain View headquarters that we open up to the general public. Some of our recent speakers include Coursera founders Daphne Koller and Andrew Ng, UC-Berkeley professor Joe Hellerstein, and Hadapt Chief Scientist Daniel Abadi. It’s an excellent opportunity for people with shared professional interests to reconnect with people they know, as well as make new connections. For those who cannot attend, we offer a live stream.

Our next talk will be by Panos Ipeirotis, a professor at NYU and one of the world’s top experts on crowdsourcing. Here is a full description:

Crowdsourcing: Achieving Data Quality with Imperfect Humans
Friday, September 7, 2012 at 3:00 PM
LinkedIn headquarters, Mountain View

Crowdsourcing is a great tool to collect data and support machine learning — it is the ultimate form of outsourcing. But crowdsourcing introduces budget and quality challenges that must be addressed to realize its benefits.

In this talk, I will discuss the use of crowdsourcing for building robust machine learning models quickly and under budget constraints. I’ll operate under the realistic assumption that we are processing imperfect labels that reflect random and systematic error on the part of human workers. I will also describe our “beat the machine” system, which engages humans to improve a machine learning system by discovering cases where the machine fails, and fails while confident that it is correct. I’ll use classification problems that arise in online advertising as examples.

Finally, I’ll discuss our latest results showing that mice and Mechanical Turk workers are not that different after all.

Panos Ipeirotis is an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at Leonard N. Stern School of Business of New York University. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, with distinction. He has received three “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation.

If you’re in the Bay Area, I encourage you to attend in person — Panos is a great speaker, and it’s also a great opportunity to network with other attendees. If not, then you can follow on the live stream.

The event is free, but please sign up on the event page. See you next week!

Data Werewolves

Thank you, Scott Adams, for the free advertising. Of course, LinkedIn is the place to find data werewolves.

Want to find more data werewolves? Check out my team! Don’t worry, they only bite when they’re hungry.

Matt Lease: Recent Adventures in Crowdsourcing and Human Computation

Today we (specifically, my colleague Daria Sorokina) had the pleasure of hosting UT-Austin professor Matt Lease at LinkedIn to give a talk on his “Recent Adventures in Crowdsourcing and Human Computation”. It was a great talk, and the slides above are full of references to research that he and his colleagues have done in this area. A great resource for people interested in the theory and practice of crowdsourcing!

If you are interested in learning more about crowdsourcing, then sign up for an upcoming LinkedIn tech talk by NYU professor Panos Ipeirotis on “Crowdsourcing: Achieving Data Quality with Imperfect Humans”.

And if you’re already an expert, then perhaps you’d like to work on crowdsourcing at LinkedIn!

WTF! @ k: Measuring Ineffectiveness

At SIGIR 2004, Ellen Voorhees presented a paper entitled “Measuring Ineffectiveness” in which she asserted:

Using average values of traditional evaluation measures [for information retrieval systems] is not an appropriate methodology because it emphasizes effective topics: poorly performing topics’ scores are by definition small, and they are therefore difficult to distinguish from the noise inherent in retrieval evaluation.

Ellen is one of the world’s top researchers in the field of information retrieval evaluation. And for those not familiar with TREC terminology, “topics” are the queries used to evaluate information retrieval systems. So what she’s saying above is that, in order to evaluate systems effectively, we need to focus more on failures than on successes.

Specifically, she proposed that we judge information retrieval system performance by measuring the percentage of topics (i.e., queries) with no relevant results in the top 10 retrieved (%no), a measure that was then adopted by the TREC robust retrieval track.
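
To make that concrete, here is a minimal Python sketch of how %no could be computed from per-topic rankings and relevance judgments (the function and variable names are mine, not TREC’s):

def percent_no(rankings, qrels, k=10):
    """Percentage of topics with no relevant result in the top k retrieved.

    rankings: dict mapping topic id -> list of doc ids in ranked order
    qrels: dict mapping topic id -> set of doc ids judged relevant
    """
    failures = sum(
        1 for topic, ranked in rankings.items()
        if not set(ranked[:k]) & qrels.get(topic, set())
    )
    return 100.0 * failures / len(rankings)

A score of zero means every topic retrieved at least one relevant document in its top 10; higher values mean more outright failures.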

Information Retrieval in the Wild

Information retrieval (aka search) in the wild is a bit different from information retrieval in the lab. We don’t have a gold standard of human relevance judgments against which we can compare search engine results. And even if we can assemble a representative collection of test queries, it isn’t economically feasible to assemble this gold standard for a large document corpus where each query can have thousands — even millions — of relevant results.

Moreover, the massive growth of the internet and the advent of social networks have changed the landscape of information retrieval. The idea that the relationship between a document and a search query would be sufficient to determine relevance was always a crude approximation, but now the diversity of a global user base makes this approximation even cruder.

For example, consider this query on Google for [nlp]:

Hopefully Google’s hundreds of ranking factors — and all of you — know me well enough to know that, when I say NLP, I’m probably referring to natural language processing rather than neuro-linguistic programming. Still, it’s an understandable mistake — the latter NLP sells a lot more books.

And search in the context of a social network makes the user’s identity and task context key factors for determining relevance — factors that are uniquely available to each user. For example, if I search on LinkedIn for [peter kim], the search engine cannot know for certain whether I’m looking for my former co-worker, a celebrity I’m connected to, a current co-worker who is a 2nd-degree connection, or someone else entirely.

In short, we cannot rely on human relevance judgments to determine if we are delivering users the most relevant results.

From %no to WTF! @ k

But human judgments can still provide enormous value for evaluating search engine and recommender system performance. Even if we can’t use them to distinguish the most relevant results, we can identify situations where we are delivering glaringly irrelevant results. Situations where the user’s natural reaction is “WTF!”.

People understand that search engines and recommender systems aren’t mind readers. We humans recognize that computers make mistakes, much as other people do. To err, after all, is human.

What we don’t forgive — especially from computers — are seemingly inexplicable mistakes that any reasonable person would be able to recognize.

I’m not going to single out any sites to provide examples. I’m sure you are familiar with the experience of a search engine or recommender system returning a result that makes you want to scream “WTF!”. I may even bear some responsibility, in which case I apologize. Besides, everyone is entitled to the occasional mistake.

But I’m hard-pressed to come up with a better measure to optimize (i.e., minimize) than WTF! @ k — that is, the number of top-k results that elicit a WTF! reaction. The value of k depends on the application. For a search engine, k = 10 could correspond to the first page of results. For a recommender system, k is probably smaller, e.g., 3.
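
For concreteness, here is a rough sketch of how one might tally it from human judgments; the labels and function names are illustrative rather than any established standard:

def wtf_at_k(labels, k=10):
    """Count the top-k results judged egregiously irrelevant ("wtf").

    labels: judgments for one query's results, in rank order, each one of
    "relevant", "plausible", or "wtf" (illustrative labels, not a standard).
    """
    return sum(1 for label in labels[:k] if label == "wtf")

def mean_wtf_at_k(labels_by_query, k=10):
    """Average WTF! @ k over a sample of evaluated queries."""
    return sum(wtf_at_k(labels, k) for labels in labels_by_query) / len(labels_by_query)

Averaging the per-query counts over a representative sample of queries, much as %no is computed over a set of topics, gives a single number to track over time.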

Also, a system can substantially mitigate the risk of WTF! results by providing explanations for its results and by making the information-seeking process more of a conversation with the user.

Measuring WTF! @ k

Hopefully you agree that we should strive to minimize WTF! @ k. But, as Lord Kelvin tells us, if you can’t measure it, then you can’t improve it. How do we measure WTF! @ k?

On one hand, we cannot rely on click behavior to measure it implicitly. All non-clicks look the same, and we can’t tell which ones were WTF! results. In fact, egregiously irrelevant results may inspire clicks out of sheer curiosity. One of the phenomena that search engines watch out for is an unusually high click-through rate — those clicks often signal something other than relevance, like a racy or offensive result.

On the other hand, we can measure WTF! @ k with human judgments. A rater does not need to have the personal and task context of a user to evaluate whether a result is at least plausibly relevant. WTF! @ k is thus a measure that is amenable to crowdsourcing, a technique that both Google and Bing use to improve search quality. As does LinkedIn, and we are hiring a program manager for crowdsourcing.

Conclusion

As information retrieval systems become increasingly personalized and task-centric, I hope we will see more people using measures like WTF! @ k to evaluate their performance, as well as working to make results more explainable. After all, no one likes hurting their computer’s feelings by screaming WTF! at it.

 

Hiring: Taking It Personally

As a manager, I’ve found that I mostly have two jobs: bringing great people onto the team, and creating the conditions for their success. The second job is the reason I became a manager — there’s nothing more satisfying than seeing people achieve greatness in both the value they create and their own professional development.

But the first step is getting those people on your team. And hiring great people is hard, even when you and your colleagues are building the world’s best hiring solutions! By definition, the best people are scarce and highly sought after.

At the risk of giving away my competitive edge, I’d like to offer a word of advice to hiring managers: take it personally. That is, make the hiring process all about the people you’re trying to hire and the people on your team.

How does that work in practice? It means that everyone on the team participates in every part of the hiring process — from sourcing to interviewing to closing. A candidate interviews with the team he or she will work with, so everyone is invested in the process. The interview questions reflect the real problems the candidate would work on. And interviews communicate culture in both directions — by the end of the interviews, it’s clear to both the interviewers and the candidate whether they would enjoy working together.

I’ve seen and been part of impersonal hiring processes. And I understand how the desire to build a scalable process can lead to a bureaucratic, assembly-line approach. But I wholeheartedly reject it. Hiring is fundamentally about people, and that means making the process a human one for everyone involved.

And taking it personally extends to sourcing. Earlier this week, the LinkedIn data science team hosted a happy hour for folks interested in learning more about us and our work. Of course we used our own technology to identify amazing candidates, but I emailed everyone personally, and the whole point of the event was to get to know one another in an informal atmosphere. It was a great time for everyone, and I can’t imagine a better way to convey the unique team culture we have built.

I’m all for technology and process that offers efficiency and scalability. But sometimes your most effective tool is your own humanity. When it comes to hiring, take it personally.