In an interview with CNET’s Tom Krazit, Google Chief Economist Hal Varian made a nice argument regarding the relative advantages of scale to a search engine:
On this data issue, people keep talking about how more data gives you a bigger advantage. But when you look at data, there’s a small statistical point that the accuracy with which you can measure things as they go up is the square root of the sample size. So there’s a kind of natural diminishing returns to scale just because of statistics: you have to have four times as big a sample to get twice as good an estimate.
Another point that I think is very important to remember…query traffic is growing at over 40 percent a year. If you have something that is growing at 40 percent a year, that means it doubles in two years.
So the amount of traffic that Yahoo, say, has now is about what Google had two years ago. So where’s this scale business? I mean, this is kind of crazy.
The other thing is, when we do improvements at Google, everything we do essentially is tested on a 1 percent or 0.5 percent experiment to see whether it’s really offering an improvement. So, if you’re half the size, well, you run a 2 percent experiment.
For those unfamiliar with statistics, I encourage you to look at the Wikipedia entry on standard deviation. Varian is obviously reducing the argument to a sound bite, but the sound bite rings true. More is better, but there’s a dramatically diminishing return at the scale of either Microsoft or Google.
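To make the arithmetic behind the sound bite concrete, here is a minimal Python sketch. It assumes the thing being measured is a simple proportion, such as a click-through rate, estimated from independent samples. The numbers (`p`, `n`, the query volumes) and the function names are made up for illustration and do not come from Varian or Google.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed proportion p over n independent samples."""
    return math.sqrt(p * (1 - p) / n)


def experiment_sample(total_queries: int, fraction: float) -> float:
    """Number of queries that land in an experiment covering `fraction` of traffic."""
    return total_queries * fraction


# Hypothetical values, chosen only for illustration.
p = 0.3
n = 10_000

# Quadrupling the sample halves the standard error ("twice as good an estimate").
ratio = standard_error(p, n) / standard_error(p, 4 * n)

# 40 percent annual growth compounds to roughly a doubling in two years.
growth_two_years = 1.4 ** 2

# An engine half the size gets the same sample by doubling the experiment fraction.
big = experiment_sample(1_000_000, 0.01)
small = experiment_sample(500_000, 0.02)

print(f"standard error ratio (n vs 4n): {ratio:.2f}")
print(f"two-year growth factor: {growth_two_years:.2f}")
print(f"experiment samples: {big:.0f} vs {small:.0f}")
```

The sketch only covers the aggregate case Varian describes; it says nothing about how many experiments can run side by side.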
However, I do think there’s a big difference when you start talking about running lots of experiments on small subsets of your users. The ability to run twice as many simultaneous tests without noticeably disrupting overall user experience is a major competitive advantage. But even there quality trumps quantity–how you choose what to test matters a lot more than how many tests you run.
What does strike me as ironic is that the moral here is a great counterpoint to Varian’s colleagues’ arguments about the “unreasonable effectiveness of data”. Granted, it’s apples and oranges–Alon Halevy, Peter Norvig, and Fernando Pereira are talking about data scale, not user scale. Still, the same arguments apply. Sampling is sampling.
ps. Also check out Nick Carr’s commentary here.
5 replies on “Google’s Chief Economist Hal Varian Talks Stats 101”
But even there quality trumps quality
Do you mean “quality trumps quantity”?
how you choose what to test matters a lot more than how many tests you run.
And how you choose to measure what you test matters a lot more than either of those two factors. It’s a point that we discussed in your comments, before, with Max Wilson, etc. If someone runs a query and then doesn’t click anything, how can you tell the difference between that search being unsuccessful, because they didn’t find the information that they wanted, and that search being successful, because they found that the information that they wanted didn’t appear?
For example, someone queries for his or her name, and finds that the embarrassing drunk photo doesn’t show up in the first page of results. That’s a success! Or someone writing a patent does a bunch of searches for related work, and finds that there is nothing related. Success!
How does the search engine measure those scenarios? Does a 0.5%, 1% or even 2% experiment really give you enough data to tease out the difference in these two types of searches? Does a 100% experiment even give you enough data?
Oops! Corrected now.
And I agree, what you measure and thus optimize for is critical. I still suspect that the bottleneck is creativity, not volume or scale.
[…] There is a lot of interesting data on Nick Carr’s blog, more data at Search Engine Land, and an excellent post by Daniel Tunkelang […]
What Varian is saying here is that for tuning you do not need to run the experiment on all users. Correct, if you tune elements on a wide scale (e.g., the infamous “what shade of blue to use for our interface element”).
However, there are many aspects of tuning in which you do run into sparse data problems. How do you estimate, say, clickthrough for queries that are typed only a few hundred times a day? Or how do you estimate clickthrough for an ad-query pair? In such cases, you *do* have to deal with sparse data, and doubling or tripling the number of users can indeed be beneficial.
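To put rough numbers on the sparse-data point, here is a small Python sketch. It assumes independent impressions and a simple binomial model of clicks. The query volume, click-through rate, target precision, and experiment fraction are all hypothetical, picked only to show the shape of the problem.

```python
import math


def ctr_standard_error(ctr: float, impressions: float) -> float:
    """Standard error of a click-through rate estimated from `impressions` samples."""
    return math.sqrt(ctr * (1 - ctr) / impressions)


def days_to_reach(target_se: float, ctr: float,
                  daily_queries: float, experiment_fraction: float) -> float:
    """Days of traffic needed before the CTR estimate reaches `target_se`."""
    needed_impressions = ctr * (1 - ctr) / target_se ** 2
    return needed_impressions / (daily_queries * experiment_fraction)


# Hypothetical rare query: 300 searches a day, 10% CTR, 1% experiment,
# and a target standard error of one percentage point.
days_small = days_to_reach(0.01, 0.10, 300, 0.01)

# The same query on an engine with twice the traffic.
days_large = days_to_reach(0.01, 0.10, 600, 0.01)

print(f"days needed at 300 queries/day: {days_small:.0f}")
print(f"days needed at 600 queries/day: {days_large:.0f}")
```

Under these assumptions the waiting time is inversely proportional to traffic, so for rare queries doubling the user base halves the time to a usable estimate, which is a much more direct benefit than the square-root gain on aggregate metrics.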
I will not even mention the network effects for the bipartite ad network (advertisers will choose a network with many content nodes, content nodes will put ads from a network with many advertisers). I found it rather ironic that Hal Varian, of all people, chose to ignore that aspect.
I understand the need from Google to play down the Microsoft-Yahoo agreement but sometimes it feels like listening to propaganda…
Panos, I think we’re on the same page. That’s what I meant by saying that the ability to run twice as many simultaneous tests without noticeably disrupting overall user experience is a major competitive advantage. But, as you point out, that’s a somewhat inconvenient truth if you’re trying to argue that scale isn’t that big an advantage.
BTW, I don’t think this is about Google playing down the Microsoft-Yahoo agreement. Rather, it’s to dismiss the argument that Google has a monopolistic advantage because of its scale. I.e., he’s arguing that Microsoft doesn’t need Google’s scale to be competitive.