At last we arrive at the SIGIR 2009 Industry Track. Since I organized this track (which mainly involved coming up with a program and then actually producing the speakers), I’m not exactly an impartial observer. But hopefully the organizers of future industry tracks will benefit from my perspective as an organizer.
Last December (New Year’s Eve, to be precise), I started recruiting speakers. I started with a list of topics I wanted to see covered, and one of those topics was spam / adversarial information retrieval. My top two choices were Matt Cutts and Amit Singhal, both members of the Search Quality group at Google. I’d heard Amit speak before: he delivered one of the keynotes at ECIR 2008 (and inspired one of my first blog posts!). So I decided to aim for Matt Cutts, despite having no way to contact him (the head of Google’s Webspam team is understandably a bit protective of his personal email address). And, just two weeks later, I had Matt locked in to the program.
Matt was an incredible speaker, and he had the unenviable task of opening the Industry Track at 8:30 AM, the morning after the banquet. His title, “WebSpam and Adversarial IR: The Road Ahead”, gave him a fair amount of maneuvering room, and he used his 45 minutes to give the audience a peek into his world.
He opened the talk by inducing the audience to try to think like a spammer. He then gave examples of social engineering attacks, to put us in a “black hat” mindset. He also pointed out the danger of punishing sites with spammy inlinks: people and companies would use this knowledge against their competitors / enemies (the practice has been called “Google bowling”).
He offered a common-sense framework for fighting spam: reduce the return on investment. Unfortunately, he sees a trend in spam where spammers are aiming for faster, higher payoffs by hacking sites and installing malware. Indeed, the democratizing effect of social media means that a lot more people have pages that can serve spam, including their Twitter and Facebook pages. He invited the information retrieval community to invest effort in learning how to automatically detect that a page or server has been hacked.
My only quibble with the talk is that Matt did not discuss the inherent subjectivity of spam. Sure, there are many cases that are black and white, but ultimately spam (like relevance) is in the eye of the user. I’d love to see more use of techniques like attention bond mechanisms that accommodate a subjective definition of spam, e.g., “any email that you would rather have not received.”
But I quibble. Matt delivered an excellent talk to a packed audience, and it was a real privilege to have him kick off the Industry Track.
P.S. You can also read Jeff Dalton’s notes on Matt’s presentation.