The Noisy Channel

 

The Long Tail of Search

November 7th, 2008 · 5 Comments · Uncategorized

The “long tail” is one of the most abused buzzwords of recent years, and I hesitate to use it myself in respectable company.

Nonetheless, SEO veteran Dustin Woodard has a nice guest post at the Hitwise Intelligence blog entitled “Sizing Up the Long Tail of Search”. Here are some statistics he cites about the distribution of search term frequency for web search data collected by Hitwise:

 

  • Top 100 terms: 5.7% of all search traffic
  • Top 500 terms: 8.9% of all search traffic
  • Top 1,000 terms: 10.6% of all search traffic
  • Top 10,000 terms: 18.5% of all search traffic

It’s nice to see concrete data to validate conventional wisdom. Of course, I’d be curious to see the corresponding distribution of ad revenue associated with terms.
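The Hitwise figures quoted above are cumulative, so the size of the tail is easier to see after a little arithmetic. Here is a minimal Python sketch that splits them into rank bands and computes the remainder. The only data used are the four percentages cited above; the names `cumulative_share`, `bucket_shares`, and `tail_share` are illustrative, and the sketch assumes each figure is the cumulative share of all traffic for the top N terms.

```python
# Cumulative share of search traffic (percent) for the top-N terms,
# copied from the Hitwise statistics quoted in the post.
cumulative_share = [
    (100, 5.7),
    (500, 8.9),
    (1_000, 10.6),
    (10_000, 18.5),
]


def bucket_shares(cumulative):
    """Return (low_rank, high_rank, share) for each band between cutoffs."""
    bands = []
    prev_n, prev_share = 0, 0.0
    for n, share in cumulative:
        # Share contributed by terms ranked prev_n+1 through n.
        bands.append((prev_n + 1, n, round(share - prev_share, 1)))
        prev_n, prev_share = n, share
    return bands


def tail_share(cumulative):
    """Percent of traffic from terms ranked below the last listed cutoff."""
    return round(100.0 - cumulative[-1][1], 1)


bands = bucket_shares(cumulative_share)
tail = tail_share(cumulative_share)

for low, high, share in bands:
    print(f"ranks {low}-{high}: {share}% of traffic")
print(f"ranks beyond {cumulative_share[-1][0]}: {tail}% of traffic")
```

Under that reading, terms outside the top 10,000 account for 81.5% of the search traffic in this data set.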

5 responses so far ↓

  • 1 FD // Nov 8, 2008 at 12:22 pm

    There are actually quite a few publications studying query logs and query distributions. Here are two early ones,

    B. J. Jansen, A. Spink, J. Bateman, and T. Saracevic. Real life information retrieval: A study of user queries on the web. SIGIR Forum, 32(1):5–17, 1998.

    C. Silverstein, M. Henzinger, J. Marais, and M. Moricz. Analysis of a very large AltaVista query log. Technical Report SRC-TN-1998-014, HP Labs Technical Report, 1998.

  • 2 Daniel Tunkelang // Nov 8, 2008 at 1:07 pm

    Fernando, thanks. But I’d be curious to see something a bit more current, ideally from insiders at the major search players. I have to imagine that online behavior has changed a bit in the past decade.

  • 3 Daniel Tunkelang // Nov 8, 2008 at 1:45 pm

    For those with access to the ACM Digital Library (or a hard copy of this month’s issue of Communications of the ACM), check out Avi Goldfarb and Catherine Tucker’s article on search engine advertising: http://portal.acm.org/citation.cfm?id=1400214.1400222

  • 4 Otis Gospodnetic // Nov 10, 2008 at 6:12 pm

    Does this include “adult search terms” or not?
    I’d imagine that those terms are in that big head, not the tail, and that there are enough of them to fill….. ah, I see they filtered out adult searches. I’d love to see what portion of search engine resources (plus bandwidth, etc.) is being consumed by porn or sexual content.

  • 5 Daniel Tunkelang // Nov 10, 2008 at 9:40 pm

    Yeah, I’d love to see that data too. There are some reports, e.g., 18% of queries for adult content in a 1997 Excite log reported in http://www.cs.usask.ca/UM99/Proc/lau.pdf and 10% in a 2003-4 AOL log (not sure if this is the same as the infamously released AOL log), as reported in http://www.ir.iit.edu/publications/downloads/p249-beitzel.pdf.
