The Napoleon Dynamite Problem

This week’s New York Times Magazine features an article by Clive Thompson about the Netflix Prize. The Netflix Prize, sponsored by the Netflix movie rental company, is perhaps the best marketing stunt I’ve seen in the history of machine learning:

The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.

The Netflix Prize has captured the imagination of many information retrieval and machine learning researchers, and I tip my hat to the folks at Netflix for inspiring researchers while pursuing their self-interest.

But for all of the energy that the contest has generated, I have my doubts as to whether it focuses on the right problem. As I commented a few months ago:

What are the real problems we’re trying to help users address through recommendations? I know that, as a Netflix customer, I don’t get much out of the numerical ratings (or out of sorting by them). When a friend recommends a movie to me, he explains why he thinks I’ll like it. Not only is that far more informative, but it also allows me to interact with the process, since I can correct errors in his explanation (e.g., I like British comedy but not slapstick, and hence hate Monty Python) and thus arrive at different recommendations.

I understand that a simple model of assigning a number to each movie is very easy to measure and lends itself to a competition like the Netflix Prize. But, as an end-user, I question the practical utility of the approach, both in the specific context of movie recommendations and more generally to the problem of helping users find what they want / need.

I am gratified to see the New York Times article raises a similar concern:

As the teams have grown better at predicting human preferences, the more incomprehensible their computer programs have become, even to their creators. Each team has lined up a gantlet of scores of algorithms, each one analyzing a slightly different correlation between movies and users. The upshot is that while the teams are producing ever-more-accurate recommendations, they cannot precisely explain how they’re doing this. Chris Volinsky admits that his team’s program has become a black box, its internal logic unknowable.

Specifically, the off-beat film “Napoleon Dynamite” confounds many of the algorithms, because it is nearly impossible to predict whether and why someone will like it.

It’s a matter of debate whether black box recommendation engines are better than transparent ones. I’ve repeatedly made the case for transparency–whether for relevance or recommendations. But the machine learning community, much like the information retrieval community, generally prefers black box approaches, because the restriction of transparency adversely impacts the accuracy of recommendations.

If the goal is to optimize one-shot recommendations, they are probably right. But I maintain that the process of picking a movie, like most information seeking tasks, is inherently interactive, and thus that transparency ultimately pays off. Then again, I have drunk the HCIR kool-aid.

By Daniel Tunkelang

High-Class Consultant.

15 replies on “The Napoleon Dynamite Problem”

First of all, even if this is one-shot recommendation, I disagree that they are on the right track, because I disagree that accuracy is all that matters. See:

Hint: in IR, nobody would focus entirely on precision at the expense of recall. We know that a balance is needed. Yet, in collaborative filtering, people use a single metric, without any balance.

But even so, is their accuracy likely to pan out in the real world? Take into account that they work with a static data set… ignoring the feedback effect:

Hint: in practise, users will react to your recommender system and not rate the same items. This may play in your favour or against you.

I need to write a position paper of some kind.


As I replied on Twitter, I think your disagreement is that I don’t take a strong enough position against the value of single-metric recommendation systems. Which I interpret as your agreeing with me, only more so.

