Long-time readers know that I have strong opinions about academic conferences. I find the main value of conferences and workshops to be facilitating face-to-face interaction among researchers and practitioners who share professional interests. An offline version of LinkedIn, if you will.
This year, I’m focusing my attention on three conferences: RecSys, HCIR, and CIKM. Regrettably I won’t be able to attend SIGIR, Strata NY, or UIST. But fortunately my colleagues are attending the first two, and hopefully some UIST attendees will be able to arrive a few days early and attend HCIR. Perhaps we can steal a page from WSDM and CSCW and arrange a cross-conference social in Cambridge.
6th ACM Recommender System Conference (RecSys 2012)
At RecSys, which will take place September 9-13 in Dublin, I’m co-organizing the Industry Track with Yehuda Koren. The program features technology leaders from Facebook, Microsoft, StumbleUpon, The Echo Nest, Yahoo, and of course LinkedIn. I’m also delivering a keynote at the Workshop on Recommender Systems and the Social Web. I hope to see you there, along with several of my colleagues who will be presenting their work on recommender systems at LinkedIn.
6th Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2012)
The 6th HCIR represents a milestone — we’ve upgraded from a 1-day workshop to a 2-day symposium. We are continuing two great traditions: strong keynotes (Marti Hearst) and the HCIR Challenge (focused on people search). The symposium will take place October 4-5 in Cambridge, MA. Hope to see many of you there. And, if you’re still working on your submissions and challenge entries, good luck wrapping them up by the July 29 deadline!
21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
Finally, you can’t miss CIKM in Hawaii! This year’s conference will take place October 29 – November 2 in Maui. After co-organizing last year’s industry track in Glasgow, I’m delighted to be a speaker in this year’s track, which also includes researchers and practitioners from Adobe, eBay, Google, Groupon, IBM, Microsoft, Tencent, Walmart Labs, and Yahoo. A great program in one of the world’s most beautiful settings — how can you resist?
I hope to see many of you at one — hopefully all! — of these great events! But, if you can’t make it, be reassured that I’ll blog about them here.
News Sync: Three Reasons to Visualize News Better V.G. Vinod Vydiswaran (University of Illinois), Jeroen van den Eijkhof (University of Washington), Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research)
Assembling a Conference Program
Given a conference’s past history, produce a set of suitable candidates for keynotes, program committee members, etc. for the conference. An example conference could be HCIR 2013, where past conferences are described at http://hcir.info/.
Finding People to Deliver Patent Research or Expert Testimony
Given a patent, produce a set of suitable candidates who could deliver relevant research or expert testimony for use in a trial. These people can be further segmented, e.g., students and other practitioners might be good at the research, while more senior experts might be more credible in high-stakes litigation. An example task would be to find people for http://www.articleonepartners.com/study/index/1658-system-and-method-for-providing-consumer-rewards.
For all of the tasks there is a dual goal of obtaining a set of candidates (ideally organized or ranked) and producing a repeatable and extensible search strategy.
Best of luck to this year’s HCIR Challenge participants — I’m excited to see the systems that they present this October at the Symposium!
The experimenter puts an urn at the front of the room with three marbles in it; she announces that there is a 50% chance that the urn contains two red marbles and one blue marble, and a 50% chance that the urn contains two blue marbles and one red marble…one by one, each student comes to the front of the room and draws a marble from the urn; he looks at the color and then places it back in the urn without showing it to the rest of the class. The student then guesses whether the urn is majority-red or majority-blue and publicly announces this guess to the class.
The fascinating result is that the sequence of guesses locks in on a single color as soon as two consecutive students agree. For example, if the first two marbles drawn are blue, then all subsequent students will guess blue. If the urn is majority-red, then it turns out there is a 16/21 probability that the sequence will converge to red and a 5/21 probability that it will converge to blue.
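Under a standard Bayesian analysis (with each student breaking ties in favor of his or her own signal), the guesses end up tracking each student’s own draw until two consecutive draws match, at which point the cascade locks. Here is a minimal sketch that reproduces the 16/21 figure exactly, assuming a majority-red urn, so each draw is red with probability 2/3:

```python
from fractions import Fraction

def p_lock_red(p_red):
    """Probability the cascade locks on red, given each draw is red with
    probability p_red. Until two consecutive guesses agree, each student
    effectively guesses his or her own draw, so we solve a two-state
    recursion exactly:
      f_R = P(lock red | last guess red)  = p * 1 + q * f_B
      f_B = P(lock red | last guess blue) = q * 0 + p * f_R
    with p = p_red and q = 1 - p_red.
    """
    p, q = p_red, 1 - p_red
    f_R = p / (1 - p * q)      # substitute f_B = p * f_R into f_R's equation
    f_B = p * f_R
    return p * f_R + q * f_B   # condition on the first student's draw

print(p_lock_red(Fraction(2, 3)))  # 16/21
```

The complementary probability of locking on blue is then 5/21, matching the result above.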
Let me explain why I find this problem so fascinating.
Consider a scenario where you are among a group of people faced with a single binary decision — let’s say, choosing red or blue — and each of you is independently tasked with recommending the best decision given your own judgement and all available information. Assume further that each of you is perfectly rational and that each of your prior decisions (i.e., without knowing what anyone else thinks) is based on independent and identically distributed random variables. Let’s follow the example above, in which each participant in the decision process has a prior corresponding to a Bernoulli random variable with probability p = 2/3.
If each of you makes a decision independently, then the expected fraction of participants who make the right decision is 2/3.
But you could do better if you have a chance to observe others’ independent decision making first. For example, if you get to witness 100 independent decisions, then you have a very low probability of going wrong by voting with the majority. If you’d like the gory details, review the cumulative distribution function of binomial random variables.
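For those gory details, here’s a sketch that computes the exact binomial tail, assuming 100 independent decisions that are each correct with probability p = 2/3 and a strict-majority vote:

```python
from math import comb

def majority_correct(n, p):
    """P(a strict majority of n independent decisions is correct),
    where each decision is right with probability p.
    This is the upper tail of a Binomial(n, p) distribution."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(majority_correct(100, 2/3))  # well above 0.99 — going wrong is very unlikely
```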
On the other hand, if the decisions happen sequentially and every person has access to all of the previous decisions, then we see an information cascade. Rationally, it makes sense to let previous decisions influence your own — and indeed 16/21 > 2/3. But 16/21 is still almost a one in four chance of making the wrong decision, even after you witness 100 previous decisions. We are wasting a lot of independent input because of how participants are incented.
I can’t help wondering how changing the incentives could affect the outcome of this process. What would happen if participants were rewarded based, in whole or in part, on the accuracy of the participants who guess after them?
Consider as an extreme case rewarding all participants based solely on the accuracy of the final participant’s guess. In that case, the optimal strategy for all but the last participant is to ignore previous participants’ guesses and vote based solely on their own independent judgements. Then the final participant combines these judgements with his or her own and votes based on the majority. The result makes optimal use of all participants’ independent judgements, despite the sequential decision process.
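With five participants and p = 2/3, this last-aggregator scheme reduces to a majority vote over five independent signals, which we can compute exactly (a sketch; the 64/81 figure is my own arithmetic from the binomial sum, and it beats the cascade’s 16/21):

```python
from fractions import Fraction
from math import comb

def majority_of_signals(n, p):
    """P(a majority of n independent signals is correct). When the final
    participant aggregates everyone's revealed signals plus his or her own,
    this is the group's accuracy."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

p = Fraction(2, 3)
print(majority_of_signals(5, p))  # 64/81, about 0.79
print(Fraction(16, 21))           # the cascade's accuracy, about 0.76
```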
But what if individuals are rewarded based on a combination of individual and collective success? Consider the 3rd participant in our example who draws a red marble after the previous participants guess blue. Let’s say that there are 5 participants in total. If the reward is entirely based on individual success, the 3rd participant will vote blue, yielding an expected reward of 2/3. If the reward is entirely based on group success, the 3rd participant will vote red, yielding an expected reward of 20/27 (details left as an exercise for the reader). If we make the reward evenly split between individual success and group success, the 3rd participant will still vote blue — the benefit from helping the group will not be enough to overcome the cost to the individual reward.
There’s a lot more math in the details of this problem, e.g. “The Mathematics of Bayesian Learning Traps“, by Simon Loertscher and Andrew McLennan. But there’s a simple take-away: incentives are crucial in determining how we best exploit our collective wisdom. Something to think about the next time you’re on a committee.
But I’m not exactly a fanboy of the semantic web, and I wasn’t sure how the audience would respond to some of my more provocative assertions. Fortunately the reception was very positive. Several people approached me afterwards to thank me for presenting a balanced argument for combining big data with structured representations and for raising HCIR issues.
A couple of people felt that faceted search was old news. I’m delighted that faceted search is becoming increasingly common, but there is still a lot of opportunity to use it more often and more effectively. And I was pleasantly surprised at the interest in discussing extensions of faceted search to address relationships between entities, as well as other nuances. I’ll have to dive into those in future posts.
For now, I hope you enjoy the slides, and I encourage you to ask questions in the comments.
The HCIR Symposium (formerly known as the HCIR Workshop) has run annually since 2007. The event unites academic researchers and industrial practitioners working at the intersection of HCI and IR to develop more sophisticated models, tools, and evaluation metrics to support activities such as interactive information retrieval and exploratory search. It provides an opportunity for attendees to informally share ideas via posters, small group discussions and selected short talks.
Topics for discussion and presentation at the symposium include, but are not limited to:
Novel interaction techniques for information retrieval.
Modeling and evaluation of interactive information retrieval.
Exploratory search and information discovery.
Information visualization and visual analytics.
Applications of HCI techniques to information retrieval needs in specific domains.
Ethnography and user studies relevant to information retrieval and access.
Scale and efficiency considerations for interactive information retrieval systems.
Relevance feedback and active learning approaches for information retrieval.
Demonstrations of systems and prototypes are particularly welcome.
We are also excited to continue the HCIR Challenge, this year focusing on the problem of people and expertise finding. We are grateful to Mendeley for providing this year’s corpus: a database based on Mendeley’s network of 1.6M+ researchers and 180M+ academic documents. Participants will build systems to enable efficient discovery of experts or expertise for applications such as collaborative research, team building, and competitive analysis.
In addition to the Challenge and a small number of research presentations, we will leave plenty of time for what participants have consistently told us they find extremely valuable: informal discussions, posters, and directed group discussions. Finally, we are extending our previous format to include a few full-length, fully-refereed, archival-quality papers that will be indexed in the ACM Digital Library.
We have extended the event to a second day to accommodate more presentations (including the full papers), and to leave plenty of time for discussion and for interaction around the poster session. There will be a reception on Thursday evening.
Lots of people ask me what it’s like to be a data scientist at LinkedIn. The short answer: it’s awesome. Folks like Pete Skomoroch and team are building data products related to identity and reputation, such as Skills and InMaps. Yael Garten is leading the effort to understand and increase mobile engagement. And other folks work on everything from open-source infrastructure to fraud detection. Amazing people helping our 160M+ members by deriving valuable insights from big data.
I wanted to take a moment to showcase my own team. As a team, we straddle the boundary between science and engineering. We work closely with several engineering teams to deliver products that our members use every day.
Joseph Adler is a name you might recognize from your bookshelf: he wrote Baseball Hacks and R in a Nutshell, both published by O’Reilly. At LinkedIn, he is a data hacker extraordinaire, currently focused on improving the network update stream.
Ahmet Bugdayci just joined LinkedIn this year, and he’s already on a tear. He’s working on a better approach to representing job titles, one of the most fundamental facets of our members’ professional identity. And he’s a polyglot.
Heyning Cheng is our innovator in chief. He envisions data products and does whatever it takes to hack them together. Our recruiters are especially happy to be his beta testers, and we’re working to turn those prototypes into shipped product.
Gloria Lau leads all things data for the student initiative. Check out LinkedIn Alumni to see what she’s been up to. Students are the future, and we’re excited to be making LinkedIn a great tool for students, alumni, and universities.
Monica Rogati spearheaded many of LinkedIn’s key products: the Talent Match system that matches jobs to candidates; the first machine learning model for People You May Know; and the first version of Groups You May Like. When she’s not working on our products, she gives awesome presentations.
Daria Sorokina recently joined us and is working on search quality. She’s a hard-core machine learning researcher and developer: check out her open-source code for additive groves.
Ramesh Subramonian has been focused on data efforts for our international expansion. Over 60% of our members live outside the United States, and his efforts ensure that LinkedIn’s value proposition is a global one.
Joyce Wang is a data science generalist. She is part of the search team, but she’s built great tools for log analysis and human evaluation that are finding great use across the company.
I hope that gives you a flavor of what it’s like to be a data scientist at LinkedIn — and on my team in particular.
Do you possess that rare combination of computer science background, technical skill, creative problem-solving ability, and product sense? If so, then I’d love to talk with you about opportunities to work on challenging problems with amazing people!
Last night, I had the pleasure to deliver the keynote address at the CIO Summit US. It was an honor to address an assembly of CIOs, CTOs, and technology executives from the nation’s top organizations. My theme was “Science as a Strategy”.
To set the stage, I told the story of TunkRank: how, back in 2009, I proposed a Twitter influence measure based on an explicit model of attention scarcity which proved better than the intuitive but flawed approach of counting followers. The point of the story was not self-promotion, but rather to introduce my core message:
Science is the difference between instinct and strategy.
Given the audience, I didn’t expect this message to be particularly controversial. But we all know that belief is not the same as action, and science is not always popular in the C-Suite. Thus, I offered three suggestions to overcome the HIPPO (Highest Paid Person’s Opinion):
Ask the right questions.
Practice good data hygiene.
Don’t argue when you can experiment!
Asking the Right Questions
Asking the right questions seems obvious — after all, our answers can only be as good as the questions we ask. But science is littered with examples of people asking the wrong questions — from 19th-century phrenologists measuring the sizes of people’s skulls to evaluate intelligence to IT executives measuring lines of code to evaluate programmer productivity. It’s easy for us (today) to recognize these approaches as pseudoscience, but we have to make sure we ask the right questions in our own organizations.
As an example, I turned to the challenge of improving the hiring process. One approach I’ve seen tried at both Google and LinkedIn is to measure the accuracy of interviewers — that is, to see how well the hire / no-hire recommendations of individual interviewers predict the final decisions. But this turns out to be the wrong question — in large part because negative recommendations (especially early ones) weigh much more heavily in the decision than positive ones.
What we found instead was that we should focus on efficiency as an optimization problem. More specifically, there is a trade-off: short-circuiting the process as early as possible (e.g., after the candidate performs poorly on the first phone screen) reduces the average time per candidate, but it also reduces the number of good candidates who make it through the process. To optimize overall throughput (while keeping our high bar), we’ve had to calibrate the upstream filters. How to optimize that upstream filter turns out to be the right question to ask — and one we continue to iterate on.
More generally, I talked about how, when we hire data scientists at LinkedIn, we look for not only strong analytical skills but also the product and business sense to pick the right questions to ask – questions whose answers create value for users and drive key business decisions. Asking the right questions is the foundation of good science.
Practicing Good Data Hygiene
Data mining is amazing, but we have to watch out for its pejorative meaning of discovering spurious patterns. I used the Super Bowl Indicator as an example of data mining gone wrong — with 80% accuracy, the division (AFC vs. NFC) of the Super Bowl champion predicts the coming year’s stock market performance. Indeed, the NFC won this year (go Giants!) and subsequent market gains have been consistent with this indicator (so far).
We can all laugh at these misguided investors, but we make these mistakes all the time. Despite what researchers have called the “unreasonable effectiveness of data”, we still need the scientific method of first hypothesizing and then experimenting in order to obtain valid and useful conclusions. Without data hygiene, our desires, preconceptions, and other human frailties infect our rational analysis.
A very different example is using click-through data to measure the effectiveness of relevance ranking. This approach isn’t completely wrong, but it suffers from several flaws. And the fundamental flaw relates to data hygiene: how we present information to users infects their perception of relevance. Users assume that top-ranked results are more relevant than lower-ranked results. Also, they can only click on the results presented to them. To paraphrase Donald Rumsfeld: they don’t know what they don’t know. If we aren’t careful, a click-based evaluation of relevance creates positive feedback and only reinforces our initial assumptions – which certainly isn’t the point of evaluation!
Fortunately, there are ways to avoid these biases. We can pay people to rate results presented to them in random order. We can use the explore / exploit technique to hedge against the ranking algorithm’s preconceived bias. And so on.
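As a hypothetical sketch of the explore / exploit idea (the function name and parameters here are mine, not any production system): serve a small, randomly chosen fraction of sessions with the results in random order, and log them separately. Clicks on the shuffled traffic then estimate relevance free of the position bias that the current ranking induces.

```python
import random

def serve_results(ranked_results, explore_rate=0.1, rng=random):
    """Hedge against the ranker's preconceived bias: with probability
    explore_rate, serve the results in random order and tag the session
    'explore' so its clicks can feed an unbiased relevance estimate;
    otherwise serve the ranker's order as-is, tagged 'exploit'."""
    if rng.random() < explore_rate:
        shuffled = list(ranked_results)
        rng.shuffle(shuffled)
        return shuffled, "explore"
    return list(ranked_results), "exploit"

results, mode = serve_results(["doc_a", "doc_b", "doc_c"], explore_rate=0.1)
```

The explore rate trades off measurement quality against the short-term user experience, which is why it is kept small.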
But the key take-away is that we have to practice good data hygiene, splitting our projects into the two distinct activities of hypothesis generation (i.e., exploratory analysis) and hypothesis testing using withheld data.
Don’t Argue when you can Experiment
I couldn’t resist the opportunity to cite Nobel laureate Daniel Kahneman’s seminal work on understanding human irrationality. I also threw in Mercier and Sperber’s recent work on reasoning as argumentative. The summary: don’t trust anyone’s theories, not even mine!
Then what can you trust? The results of a well-run experiment. Rather than debating data-free assertions, subject your hypotheses to the ultimate test: controlled experiments. Not every hypothesis can be tested using a controlled experiment, but most can be.
I recounted the story of how Greg Linden persuaded his colleagues at Amazon to implement shopping-cart recommendations through A/B testing, despite objections from a marketing SVP. Indeed, his work — and Amazon’s generally — has strongly advanced the practice of A/B testing in online settings.
Don’t argue when you can experiment. Decisions about how to improve products and processes should not be settled by an Oxford-style debate. Rather, those decisions should be informed by data.
Conclusion: Even Steve Jobs Made Mistakes
Some of you may think that this is all good advice, but that science is no match for an inspired leader. Indeed, some pundits have seen Apple’s success relative to Google as an indictment of data-driven decision making in favor of an approach that follows a leader’s gut instinct. Are they right? Should we throw out all of our data and follow our CEOs’ instincts?
Let’s go back a decade. In 2002, Apple faced a pivotal decision – perhaps the most important decision in its history. The iPod was clearly a breakthrough product, but it was only compatible with the Mac. Remember that, back in 2002, Apple had only a 3.5% market share in the PC business. Apple’s top executives did their analysis and predicted that they could drive the massive success of the iPod by making it compatible with Windows, the dominant operating system with over 95% market share.
Steve Jobs resisted. At one point he said that Windows users would get to use the iPod “over [his] dead body”. After continued convincing, Jobs gave up. According to authorized biographer Walter Isaacson, Steve’s exact words were: “Screw it. I’m sick of listening to you assholes. Go do whatever the hell you want.” Luckily for Steve, Apple, and the consumer public, they did, and the rest is history.
It isn’t easy being one of those assholes. But that’s our job, much as it was theirs. It’s up to us to turn data into gold, to apply science and technology to create value for our organizations. Because without data, we are gambling on our leaders’ gut feelings. And our leaders, however inspired, have fallible instincts.
Science is the difference between instinct and strategy.
The second was a live interview on Internet Evolution, hosted by Mary Jander and Nicole Ferraro. They clearly did their homework, scouring my blog posts and web commentary for everything controversial I’d ever said — and then some! If that’s enough to pique your interest, then I encourage you to listen to the recorded interview and read the chat transcript at Internet Evolution.
Happy to answer questions based on either of these sessions — comment away!