The Noisy Channel

 

Prediction Is Hard, Especially About The Future

August 18th, 2009 · 9 Comments · General

That Niels Bohr certainly knew what he was talking about! But that hasn’t discouraged folks in any number of industries from trying to make predictions.

Google in particular has been researching the predictability of search trends (just to be fair and balanced, so have Bing and Yahoo). Yossi Matias, Niv Efron, and Yair Shimshoni at Google Labs Israel have made some fascinating observations based on Google Trends, including the following:

  • Over half of the most popular Google search queries are predictable in a 12 month ahead forecast, with a mean absolute prediction error of about 12%.
  • Nearly half of the most popular queries are not predictable (with respect to the model we have used).
  • Some categories have particularly high fraction of predictable queries; for instance, Health (74%), Food & Drink (67%) and Travel (65%).
  • Some categories have particularly low fraction of predictable queries; for instance, Entertainment (35%) and Social Networks & Online Communities (27%).
  • The trends of aggregated queries per categories are much more predictable: 88% of the aggregated category search trends of over 600 categories in Insights for Search are predictable, with a mean absolute prediction error of less than 6%.
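To make "mean absolute prediction error in a 12 month ahead forecast" concrete, here is a minimal sketch of how such a number could be computed. It is not the model from the Google paper: the seasonal-naive forecast (repeat last year's values), the function names, and the monthly sample figures are all assumptions chosen purely for illustration.

```python
def seasonal_naive_forecast(history, season=12, horizon=12):
    """Forecast by repeating the last full season of observations.

    Assumption: this stands in for whatever forecasting model is
    actually used; it is only the simplest seasonal baseline.
    """
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def mean_absolute_prediction_error(actual, forecast):
    """Average of |forecast - actual| / actual, returned as a fraction."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs(f - a) / abs(a) for a, f in zip(actual, forecast))
    return total / len(actual)


if __name__ == "__main__":
    # Hypothetical monthly query volumes (made-up numbers).
    last_year = [100, 200] * 6
    this_year = [110, 180] * 6

    forecast = seasonal_naive_forecast(last_year)
    error = mean_absolute_prediction_error(this_year, forecast)
    print(f"mean absolute prediction error: {error:.1%}")
```

Under this reading, a query would count as "predictable" when the error stays below some chosen threshold; what threshold to use is a modeling decision that this sketch leaves open.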

You can read their full 32-page paper here.

I’m not surprised at the predictability of human search behavior, especially for stable topics or even for unstable ones viewed as aggregates–one could argue the celebrities and scandals du jour are unpredictable but interchangeable. What I’m curious about is what we can do with this predictability.

In the SIGIR ’09 session on Interactive Search, Peter Bailey talked about “Predicting User Interests from Contextual Information”, analyzing the predictive performance of contextual information sources (interaction, task, collection, social, historic) for different temporal durations. Max Van Kleek wrote a nice summary of the talk at the Haystack blog. The paper doesn’t investigate seasonality (perhaps because they only looked at four months of data), but I’d imagine they would subsume it under the broader categories of historic and social context. But they do set a clear goal:

Postquery navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity…Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.

I hope Google is following a similar agenda. If you’re going to go through the trouble of predicting the future, then help make it a better one for users!

9 responses so far ↓

  • 1 Daniel Lemire // Aug 18, 2009 at 11:14 am

    I think that the quality of our predictions is probably proportional to the amount of innovation. Since we innovate faster and faster, we are going to become more and more unpredictable. No?

    Think about making predictions about behavior in the Middle Ages (in Europe). Certainly, it was not very difficult! There was the odd, unexpected local war or drought… but mostly, everything was routine.

    Anyhow, I worked a bit on recommender systems before they became such a big deal… and I eventually gave up for several reasons, and one of them was my own inability to predict my own behavior, even based on very reliable data. I would study the reviews of a given movie for hours, and find that I could not predict at all whether I would enjoy it (even obviously bad movies could generate enjoyment…)

    I don’t think I can predict what I’ll be searching for in a week, let alone in a year.

    Maybe I am an odd fellow. Maybe.

  • 2 Daniel Tunkelang // Aug 18, 2009 at 6:51 pm

    the quality of our predictions is probably proportional to the amount of innovation

    Isn’t that a tautology? :-) But your point is well taken–there’s little value in investing in forecasting when we know that the world is changing quickly.

    The value of recommender systems–that’s a whole other story, but I think we’ve beaten that one to death a few times.

  • 3 jeremy // Aug 19, 2009 at 12:37 am

    I think that the quality of our predictions is probably proportional to the amount of innovation. Since we innovate faster and faster, we are going to become more and more unpredictable. No?

    It depends on whether you are more a follower of Parmenides, or Heraclitus. Right?

  • 4 Daniel Tunkelang // Aug 19, 2009 at 12:48 am

    I just listen to Mr. Tompkins:

    Increases, decreases
    Decreases, Increases
    What the hell do we care
    What entropy does?

  • 5 dinesh vadhia // Aug 19, 2009 at 3:59 pm

    I plan to read the paper so the following can be taken with a pinch of salt:

    If they know that 50% of queries are predictable in a 12 month ahead forecast then doesn’t this mean that they can organize their infrastructure better to satisfy these ‘same’ queries?

    Doesn’t it demonstrate that people (in large numbers) show similar behavior and interests (health, food and drink, travel) which helps in the targeting of ads?

  • 6 Daniel Lemire // Aug 19, 2009 at 4:16 pm

    @dinesh

    My experience is that it is fairly easy to be right with your predictions 50% of the time. Or even 80% of the time. Sometimes using rather crude algorithms.

    But the devil is in the remaining 50% or 20%.

    Maybe you are a publisher. You are pretty good at predicting bestsellers. Some guy comes in with a crazy novel, “Dune”. It is nonsensical to you. You reject it. In fact, 40 different publishers reject it. Turns out that the novel in question becomes one of the two most important sci-fi novels of all time.

    My point is that if all you are predicting is the “predictable” stuff, there might not be much value in your work. Basically, you must be able to predict the unexpected.

    That’s how Physics made its bread and butter. Set up a crazy experiment to measure the speed of light in different directions. Nobody has any idea what will come out of it, but some crazy guy called Einstein predicts that the speed of light will always come out to the same value. His prediction made him famous (among other things).

    Predicting that I will need milk next week, on the other hand, has very little value.

    I mean, we have been there in the late nineties with the “data mining” fad. All companies were doing association rules to find out “hidden” relations between products and services. Most of this work was quickly discarded since it did not bring any actual value to the business.

    (Some of it did, when Greg Linden got Amazon to add the item-to-item recommender system, but that’s another unexpected story. Managers back then predicted that it would not be useful to the company!)

  • 7 jeremy // Aug 19, 2009 at 4:58 pm

    My point is that if all you are predicting is the “predictable” stuff, there might not be much value in your work. Basically, you must be able to predict the unexpected.

    Daniel L: I could not agree with you more. I think you’re absolutely spot on. As a researcher, the more interesting problem is predicting the unexpected, finding the uncommon, discovering the atypical.

    However, I think most of the world disagrees with you (us). Web search engines make lots and lots of money by basically serving people the common, mundane, expected results (and advertisements), but doing it in a way that allows the user to remain lazy.

    Giving lazy people the expected appears to be more lucrative than giving active information seekers the unexpected.

  • 8 Daniel Tunkelang // Aug 19, 2009 at 7:17 pm

    While I’m with you about the interesting research–and many of the high-value problems in practice–being about the hard stuff, I wouldn’t knock the value of answering easy questions efficiently at large scale. I’ve been known to compare Google to McDonald’s. Let’s not forget that McDonald’s is an extremely successful company with lots of satisfied customers, none of whom are complaining about their lack of gourmet cuisine.

  • 9 Daniel Lemire // Aug 19, 2009 at 8:22 pm

    Did I mention that I don’t eat at McDonald’s?

    ;-)
