The “long tail” is one of the most abused buzzwords of recent years, and I hesitate to use it myself in respectable company.
Nonetheless, SEO veteran Dustin Woodard has a nice guest post at the Hitwise Intelligence blog entitled “Sizing Up the Long Tail of Search”. Here are some statistics he cites about the distribution of search term frequency for web search data collected by Hitwise:
- Top 100 terms: 5.7% of all search traffic
- Top 500 terms: 8.9% of all search traffic
- Top 1,000 terms: 10.6% of all search traffic
- Top 10,000 terms: 18.5% of all search traffic
It’s nice to see concrete data to validate conventional wisdom. Of course, I’d be curious to see the corresponding distribution of ad revenue associated with terms.
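As a back-of-the-envelope reading of the figures above, here is a minimal Python sketch that turns the cumulative shares into per-band shares and computes how much traffic falls outside the top 10,000 terms. The names `cumulative_share`, `incremental_shares`, and `tail_share` are illustrative only, and the sketch assumes the quoted percentages are cumulative, which is how the list reads.

```python
# Cumulative share of search traffic (in percent) for the top-N terms,
# copied from the statistics quoted in the post.
cumulative_share = {100: 5.7, 500: 8.9, 1_000: 10.6, 10_000: 18.5}


def incremental_shares(cumulative):
    """Return (first_rank, last_rank, share_pct) for each band between cutoffs."""
    bands = []
    prev_n, prev_share = 0, 0.0
    for n in sorted(cumulative):
        # Share contributed only by terms ranked prev_n + 1 through n.
        bands.append((prev_n + 1, n, round(cumulative[n] - prev_share, 1)))
        prev_n, prev_share = n, cumulative[n]
    return bands


def tail_share(top_n_share_pct):
    """Percent of traffic from every term outside the top-N cutoff."""
    return round(100.0 - top_n_share_pct, 1)


bands = incremental_shares(cumulative_share)
tail = tail_share(cumulative_share[10_000])

for first, last, share in bands:
    print(f"ranks {first:>6,} to {last:>6,}: {share}% of traffic")
print(f"everything beyond the top 10,000 terms: {tail}% of traffic")
```

Read this way, the tail beyond the top 10,000 terms accounts for 81.5% of all search traffic, which is the point of the original statistics.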
5 replies on “The Long Tail of Search”
There are actually quite a few publications studying query logs and query distributions. Here are two early ones:
B. J. Jansen, A. Spink, J. Bateman, and T. Saracevic. Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1):5–17, 1998.
C. Silverstein, M. Henzinger, J. Marais, and M. Moricz. Analysis of a very large AltaVista query log. Technical Report SRC-TN-1998-014, HP Labs Technical Report, 1998.
Fernando, thanks. But I’d be curious to see something a bit more current, ideally from insiders at the major search players. I have to imagine that online behavior has changed a bit in the past decade.
For those with access to the ACM Digital Library (or a hard copy of this month’s issue of Communications of the ACM), check out Avi Goldfarb and Catherine Tucker’s article on search engine advertising: http://portal.acm.org/citation.cfm?id=1400214.1400222
Does this include “adult search terms” or not?
I’d imagine that those terms are in that big head, not the tail, and that there are enough of them to fill….. ah, I see they filtered out adult searches. I’d love to see what portion of search engine resources (plus bandwidth, etc.) is being consumed by porn or sexual content.
Yeah, I’d love to see that data too. There are some reports, e.g., 18% of queries for adult content in a 1997 Excite log reported in http://www.cs.usask.ca/UM99/Proc/lau.pdf and 10% in a 2003-4 AOL log (not sure if this is the same as the infamously released AOL log), as reported in http://www.ir.iit.edu/publications/downloads/p249-beitzel.pdf.