The Long Tail of Search

The “long tail” is one of the most abused buzzwords of recent years, and I hesitate to use it myself in respectable company.

Nonetheless, SEO veteran Dustin Woodard has a nice guest post at the Hitwise Intelligence blog entitled “Sizing Up the Long Tail of Search”. Here are some statistics he cites about the distribution of search term frequency for web search data collected by Hitwise:


  • Top 100 terms: 5.7% of all search traffic
  • Top 500 terms: 8.9% of all search traffic
  • Top 1,000 terms: 10.6% of all search traffic
  • Top 10,000 terms: 18.5% of all search traffic
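
The figures above are cumulative, so the size of the tail follows by subtraction. Here is a minimal Python sketch of that arithmetic, using only the four percentages quoted in the list; the names `cumulative_share`, `bucket_shares`, and `tail_share` are made up for illustration and do not come from the Hitwise post.

```python
# Cumulative share of search traffic (percent) for the top-N terms,
# copied from the four figures quoted in the list.
cumulative_share = {100: 5.7, 500: 8.9, 1_000: 10.6, 10_000: 18.5}


def bucket_shares(cumulative):
    """Return (low_rank, high_rank, share) for each consecutive rank bucket."""
    buckets = []
    prev_rank, prev_share = 0, 0.0
    for rank in sorted(cumulative):
        share = cumulative[rank]
        # Traffic contributed by the terms between the previous cutoff and this one.
        buckets.append((prev_rank + 1, rank, round(share - prev_share, 1)))
        prev_rank, prev_share = rank, share
    return buckets


def tail_share(cumulative):
    """Share of traffic that falls outside the largest listed top-N cutoff."""
    return round(100.0 - cumulative[max(cumulative)], 1)


for low, high, share in bucket_shares(cumulative_share):
    print(f"terms ranked {low}-{high}: {share}% of traffic")
print(f"everything past rank {max(cumulative_share)}: {tail_share(cumulative_share)}% of traffic")
```

Read this way, the 9,000 terms ranked 1,001 through 10,000 together account for 7.9% of traffic, and everything beyond the top 10,000 accounts for the remaining 81.5%.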

It’s nice to see concrete data to validate conventional wisdom. Of course, I’d be curious to see the corresponding distribution of ad revenue associated with terms.

By Daniel Tunkelang

High-Class Consultant.

5 replies on “The Long Tail of Search”

There are actually quite a few publications studying query logs and query distributions. Here are two early ones:

B. J. Jansen, A. Spink, J. Bateman, and T. Saracevic. Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1):5–17, 1998.

C. Silverstein, M. Henzinger, J. Marais, and M. Moricz. Analysis of a very large AltaVista query log. Technical Report SRC-TN-1998-014, HP Labs Technical Report, 1998.


Fernando, thanks. But I’d be curious to see something a bit more current, ideally from insiders at the major search players. I have to imagine that online behavior has changed a bit in the past decade.


Does this include “adult search terms” or not?
I’d imagine that those terms are in that big head, not the tail, and that there are enough of them to fill… ah, I see they filtered out adult searches. I’d love to see what portion of search engine resources (plus bandwidth, etc.) is being consumed by porn or sexual content.

