OK, this is an oldie but goodie from xkcd, but I saw it in a recent presentation and couldn’t resist sharing.
Of course, you’d never know that from the sensationalist press.
© 2008 The Noisy Channel — Cutline by Chris Pearson

6 responses so far ↓
1 Milan // Nov 17, 2008 at 5:33 pm
Of course, since that was first published there’s been more than a bit of self-referential measurement error introduced. The current count for “skydiving” results is 633, for “blogging” 652, and for “knitting” more than 10,700.
Whether that indicates high overlap in the sample sets or a deadly wave of recent yarn & needle fatalities, I cannot imagine.
2 Daniel Lemire // Nov 17, 2008 at 9:24 pm
Blogging leads to weight loss? Who knew?
Where do these “journalists” get their info? I know plenty of overweight bloggers.
3 Daniel Tunkelang // Nov 17, 2008 at 10:44 pm
I can’t speak to the rigor of journalists–or bloggers–in obtaining their statistics.
But, in all seriousness, the techniques that some people have discussed for learning from distributional similarity in documents and query logs are strikingly in line with that xkcd comic.
It does make you (or at least me) wonder how to game the log-mining approaches by spamming the query log, much as spammers already exploit distributional similarity to create documents that get past spam filters.
4 jeremy // Nov 18, 2008 at 7:45 pm
Be careful about what assumptions you draw based on Google hit and/or suggestion counts.
There is ongoing evidence that Google does not report the real numbers.
http://sethf.com/infothought/blog/archives/001402.html
5 jeremy // Nov 18, 2008 at 7:46 pm
Here is a better link:
http://www.hyperorg.com/blogger/2008/11/12/obama-v-bush-google-counts/
6 Daniel Tunkelang // Nov 19, 2008 at 1:15 pm
Seth Finkelstein–haven’t seen him since I was an undergrad at MIT! But I do agree with him and Dave Weinberger that Google’s counts are systematically skewed, though it’s not clear what causes that skew. Mr. Schmidt, tear down this black box!