Month: November 2010

Giving Thanks as an Information Scientist

Post author By Daniel Tunkelang
Post date November 25, 2010

As a first-generation American who is married to a card-carrying Native American, I celebrate Thanksgiving the traditional way: a day of gluttony followed by yummy leftovers. But, trite as it may be, I do like to take the time to reflect on the countless things for which I am thankful. A wonderful family, of course, but also the great fortune to live in an age where some of the subjects that I find most intellectually stimulating have become highly relevant to our practical daily lives.

Consider information retrieval. Perhaps I’m dating myself, but as an undergraduate computer science major, I hardly imagined that information retrieval would have much significance outside of academia. Sure, there were commercial IR systems being built in the 1980s, but it wasn’t until the late 1990s that web search brought IR to the mainstream. Today, it’s hard to imagine studying computer science without learning about IR. Sure, my career makes me a tad biased, but it is undeniable that information retrieval is one of the defining problems of our generation.

And then there are social networks. When I studied graph drawing in the 1990s, the canonical example of a social network was “Six Degrees of Kevin Bacon”. Sure, many of my peers would talk about their Erdős numbers (they were more discreet about their placement in the Tarjan graph), but the study of social networks was surely an academic pursuit. Who would imagine that, barely a decade later, a movie entitled The Social Network would be a blockbuster grossing $175M? Leaving aside Hollywood, social networks have become a significant part of our daily lives. Not only do Facebook, Twitter, and LinkedIn account for a large fraction of our time online, but they also affect our offline personal and professional lives.

From childhood, I’ve been interested in mathematics, computer science, and psychology. Living in an age of information retrieval and social networks means that I can apply these interests in my daily work. Today I give thanks for being born at the right place and right time, blessed with a lifetime of interesting and practical problems to solve. Happy Thanksgiving to all, and enjoy the leftovers!


An Information Cascade

I’ve been reading Networks, Crowds, and Markets, a great textbook by David Easley and Jon Kleinberg. I’m very grateful to Cambridge University Press for surprising me with an unsolicited review copy. I’m more than halfway through its 700+ pages. Much of the material is familiar in this “interdisciplinary look at economics, sociology, computing and information science, and applied mathematics to understand networks and behavior”. But I’m delighted by much that is new to me, including a particularly elegant description of an information cascade.

I excerpt the following example from section 16.2, which the authors in turn borrow from Lisa Anderson and Charles Holt:

The experimenter puts an urn at the front of the room with three marbles in it; she announces that there is a 50% chance that the urn contains two red marbles and one blue marble, and a 50% chance that the urn contains two blue marbles and one red marble…one by one, each student comes to the front of the room and draws a marble from the urn; he looks at the color and then places it back in the urn without showing it to the rest of the class. The student then guesses whether the urn is majority-red or majority-blue and publicly announces this guess to the class.

Let’s simulate how a set of rational students would perform in this experiment.

The first student has it easy: if he selects a blue marble, he guesses blue; if he selects a red marble, he guesses red. Either way, his guess publicly discloses the first marble’s color.

Thus the second student knows exactly the colors of the first two selected marbles. If he selects the same color as the first student, he will make the same guess. If, however, the second student selects the other color, he has seen one marble of each color and has no reason to prefer one color over the other. Let’s assume that, when the odds are 50/50, an indifferent student breaks symmetry by guessing the color in his hand. That way, we guarantee that the second student discloses the color of the marble he selects.

Things get interesting with the third student’s selection. What happens if the first two students have both guessed red, but the third student selects a blue marble? Rationally, the third student will guess red, since he knows that two of the first three selected marbles were red. In fact, if the first two students select red marbles, *every* subsequent student will ignore his own selection and guess red. Of course, analogous reasoning applies if we reverse the colors.
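The third student’s reasoning can be checked with a few lines of Bayes’ rule. The following is a minimal sketch, assuming the setup described above (a 50/50 prior and draws with replacement); the function name `posterior_majority_red` is my own and does not come from the textbook.

```python
from fractions import Fraction


def posterior_majority_red(reds: int, blues: int) -> Fraction:
    """P(urn is majority-red | observed draws), assuming a 50/50 prior
    and draws with replacement from a three-marble urn."""
    p = Fraction(2, 3)  # chance of drawing the majority color
    likelihood_if_red = p**reds * (1 - p)**blues
    likelihood_if_blue = (1 - p)**reds * p**blues
    # The equal priors cancel, leaving a ratio of likelihoods.
    return likelihood_if_red / (likelihood_if_red + likelihood_if_blue)


# Two public red guesses plus a private blue draw still favor red.
print(posterior_majority_red(2, 1))  # 2/3
# One marble of each color leaves the student indifferent.
print(posterior_majority_red(1, 1))  # 1/2
```

Only the difference between the red and blue counts matters here, which is why a lead of two is enough to outweigh any single private draw.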

Generalizing from this case, we can see that the sequence of guesses locks in on a single color as soon as the count for one color is ahead of the other by two. I leave it as an exercise to the reader to determine that, if the urn is majority-red, there is a 4/5 probability that the sequence will converge to red and a 1/5 probability that it will converge to blue.
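For readers who would rather check the exercise than solve it, here is a hedged sketch in Python. It assumes the lead-by-two rule stated above and a majority-red urn, so each draw is red with probability 2/3; the names `cascade_red_probability` and `simulate_cascade` are mine, chosen for illustration.

```python
import random
from fractions import Fraction


def cascade_red_probability(p_red: Fraction = Fraction(2, 3)) -> Fraction:
    """Exact chance that the guesses lock in on red under the lead-by-two rule."""
    # Starting from a tie, look at the next two draws: two reds lock in red,
    # two blues lock in blue, and a mixed pair returns to a tie and restarts.
    both_red = p_red * p_red
    both_blue = (1 - p_red) * (1 - p_red)
    return both_red / (both_red + both_blue)


def simulate_cascade(p_red: float = 2 / 3, rng=None) -> str:
    """Draw marbles until one color leads by two; return the locked-in color."""
    rng = rng or random.Random()
    lead = 0  # red draws minus blue draws among the revealed marbles
    while abs(lead) < 2:
        lead += 1 if rng.random() < p_red else -1
    return "red" if lead > 0 else "blue"


print(cascade_red_probability())  # 4/5

rng = random.Random(2010)
trials = 100_000
red_share = sum(simulate_cascade(rng=rng) == "red" for _ in range(trials)) / trials
print(red_share)  # should land close to 0.8
```

The exact calculation and the simulation take different routes to the same number, which is a useful sanity check on the lead-by-two argument.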

A 1/5 probability of arriving at the wrong answer may not seem so bad. But imagine if you could see the actual marbles sampled and not just the guesses (i.e., each student provides an independent signal). The law of large numbers kicks in quickly, and the probability of the sample majority color being different from the true majority converges to 0.
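To make the contrast concrete, the sketch below computes the exact probability that the majority of n independent draws points to the wrong color. It assumes an odd number of draws (so there are no ties) and the same 2/3 chance of drawing the true majority color; `wrong_majority_probability` is a name I made up for this illustration.

```python
from math import comb


def wrong_majority_probability(n: int, p_true: float = 2 / 3) -> float:
    """P(the sample majority of n independent draws is the urn's minority color)."""
    if n % 2 == 0:
        raise ValueError("use an odd n so that the sample always has a majority")
    # The sample is wrong when the true majority color shows up in fewer
    # than half of the draws, i.e. k <= n // 2 times out of n.
    return sum(
        comb(n, k) * p_true**k * (1 - p_true) ** (n - k)
        for k in range(n // 2 + 1)
    )


for n in (1, 3, 11, 51, 101):
    print(n, wrong_majority_probability(n))
```

The error rate shrinks steadily as n grows, whereas the cascade's error rate stays stuck at 1/5 no matter how many students follow.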

This example of an information cascade is unrealistically simple, but is eerily suggestive of the way many sequential decision processes work. I hope we all see it as a cautionary tale. The wisdom of the crowd breaks down when we throw away the independent signals of its participants.


The Element of Surprise

Surprise is not a word that user interface designers typically like to hear. Indeed, the principle of least surprise (also called the principle of least astonishment) is that systems should always strive to act in a way that least surprises the user.

Like many interface design principles, the principle of least surprise reflects the premise that software applications exist to be useful. In utility-oriented applications, surprise means distraction and delay — negatives that good designers work to avoid.

But we increasingly see applications whose main value to the user is not utility, but entertainment. Indeed, a recent Nielsen report claims that the top two online activities for Americans are social networks / blogs and games. I take the report with a grain of salt, but it seems safe to argue that people have come to expect the internet to be at least as fun as it is useful.

Even search, which would seem to be the poster child for the utility of online services, is being pressed into the service of entertainment. Max Wilson and David Elsweiler argued as much in their HCIR 2010 presentation about “casual leisure searching”. They mined Twitter to analyze a variety of scenarios where search isn’t about the user finding something, but rather about enjoying the experience. Indeed, their controversial definition of search is broad enough to include the possibility that the user does not have an information need.

Like the businessman in Antoine de Saint-Exupéry’s Le Petit Prince, I’ve long felt that, as “un homme sérieux”, my job is delivering utility to users. Users already have lots of ways to waste time; I focus on making their productivity-oriented time more effective and efficient. I’m glad there are folks who devote their lives to making the rest of us have more fun (especially all the computer scientists who left academia for Pixar), but entertainment simply isn’t a vocation for me.

However, I’ve been coming around to the realization that fun and utility are not mutually exclusive. For example, news serves the utilitarian ideal of informing the citizenry, but many (most?) of us read news as a pleasant way to pass the time. Social networks are another example serving a similar function, perhaps with a balance that tilts more toward the entertainment end of the spectrum but still providing genuine social utility.

A common feature of both of these examples is that users regularly return to the same site expecting the unexpected. The transient nature of news and social news feeds promises an endless supply of fresh content, produced more quickly than users can consume it. This situation is in stark contrast to those of typical web search queries, for which the results are expected to be largely static. Indeed, we may set up alerts to inform us of novel search results, but we are unlikely to regularly visit a bookmarked search results page the way we regularly visit a news or social network site.

Is novelty the only source of surprise? Novelty certainly helps, but it is not a necessity. An alternative source is randomness. I’ve known people to use Wikipedia’s “random article” feature. But a more plausible place to introduce randomness is in recommendations, whether for products or content. Since recommendations are good guesses at best, a bit of randomness can help ensure that the guesses are interesting. Indeed, a SIGIR 2010 paper by Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain on “Temporal Diversity in Recommender Systems” explored the use of randomness to induce diversity in recommendations and arrived at the conclusion that people don’t like being recommended the same things over and over again.

Can we generalize from these examples? I think so. For utility-oriented information needs, it is important to provide users with accurate, predictable, and efficient tools. But we can’t dismiss everything else as frivolous. Sometimes we just need to offer our users a little bit of surprise to keep it interesting.

Or, as Mary Poppins tells us: “In every job that must be done, there is an element of fun. You find the fun, and – SNAP – the job’s a game!”