The Noisy Channel

Q&A with Amit Singhal

April 8th, 2008 · 9 Comments · General

Amit Singhal, who is head of search quality at Google, gave a very entertaining keynote at ECIR ’08 that focused on the adversarial aspects of Web IR. Specifically, he discussed some of the techniques used in the arms race to game Google’s ranking algorithms. Perhaps he revealed more than he intended!

During the question and answer session, I reminded Amit of the admonition against security through obscurity that is well accepted in the security and cryptography communities. I questioned whether his team is pursuing the wrong strategy by failing to respect this maxim. Amit replied that a relevance analog to security by design was an interesting challenge (which he delegated to the audience), but he appealed to the subjectivity of relevance as a reason for it being harder to make relevance as transparent as security.

While I accept the difficulty of this challenge, I reject the suggestion that subjectivity makes it harder. To begin with, Google and other web search engines rank results objectively, rather than based on user-specific considerations. Furthermore, the subjectivity of relevance should make the adversarial problem easier rather than harder, as has been observed in the security industry.

But the challenge is indeed a daunting one. Is there a way we can give control to users and thus make the search engines objective referees rather than paternalistic gatekeepers?

At Endeca, we emphasize the transparency of our engine as a core value of our offering to enterprises. Granted, our clients generally do not have an adversarial relationship with their data. Still, I am convinced that the same approach not only can work on the web, but will be the only way to end the arms race between spammers and Amit’s army of tweakers.
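Since so much of this arms race centers on link-based ranking, it may help to see how little machinery is at PageRank's core: a power iteration over the link graph. The sketch below is illustrative only — the toy graph is hypothetical, the damping factor d=0.85 follows the original Brin/Page formulation, and real engines layer on many safeguards (dangling-node handling, spam demotion) omitted here.

```python
# Minimal PageRank power-iteration sketch over a toy link graph.
# Hypothetical graph; d=0.85 is the damping factor from the original
# Brin/Page formulation. Not Google's production system.

def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Each page keeps a (1-d)/n baseline plus a d-weighted share
        # of the rank of every page that links to it.
        rank = {
            p: (1 - d) / n
               + d * sum(rank[q] / len(links[q])
                         for q in pages if p in links[q])
            for p in pages
        }
    return rank

# Toy "link farm": pages b and c both point at a, inflating its score.
graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
ranks = pagerank(graph)
```

The toy graph also shows why link farms work: page a outranks b and c simply because more pages point at it, regardless of its content — which is exactly the signal spammers manufacture.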

9 responses so far ↓

  • 1 Mark Watkins // Apr 10, 2008 at 11:35 am

    From the “loyal opposition” (the author of “How Pagerank wrecked the web,” who founded a new search company) – essentially arguing that Google’s ranking algorithms (effective, but, ultimately, arbitrary in some sense) are in fact the origin of the arms race. There must be some “less gamable” ranking system out there….

  • 2 Daniel Tunkelang // Apr 11, 2008 at 12:07 pm

    Well, the initial success of PageRank, at least as far as I can tell, came from it being harder to game than the IR measures that other search engines were using at the time. Since then, of course, it’s been an arms race.

    I’d really love to see the relevance arms race replaced with a principled approach based on attention economics.

  • 3 Mark // Apr 11, 2008 at 2:30 pm

    Well back in the day Page Rank succeeded because it worked better than existing approaches, as well as being harder to game.

    But yes the “attention economics” approach would be interesting, if there were some way to measure that, that did not incent people to mount DoS attacks to simulate attention 8).

  • 4 fd // Apr 12, 2008 at 12:11 pm

    Back in the day PageRank succeeded because it was a baseline approximation for user data. With richer user visitation data (through e.g. toolbars) PageRank becomes moot. See here.

  • 5 Mark // Apr 14, 2008 at 5:35 am

    I might be wrong, but if we built a ranking system based on user visitation data, wouldn’t that be gamed as well? Instead of “link farms,” wouldn’t we get “traffic farms” that artificially inflate the user visitation values of some sites by directing extra (fake) traffic to them?

  • 6 Aaswath // Apr 14, 2008 at 9:50 pm

    This was also a topic of discussion while I was at MS — we did in fact publish some papers on spam and adversarial IR based on collaborations with MSR, but we were well aware that blackhats/spammers out there were reading these papers. When spam proves to be an existential threat to the relevance of search engines, things like transparency are sometimes hard to justify.

    However, Google (and Yahoo and Live) have done a good job of providing at least some transparency to siteowners with their webmaster tools (ie: are you being hit with spam filters, etc). These didn’t exist until relatively recently, but I’d say everyone’s now come around to seeing the positive benefits of engaging with legitimate siteowners and offering them information in return for their registering themselves, their sites and sitemaps in a formal way. This is a sea-change from a previously adversarial relationship to all siteowners to one that tries to engage with normal/”good” sites.

    Some other thoughts:
    1) Search engines are always looking for proxies for relevance (like links) and aspects of an attention economics-approach to this are in play already. Unless I’m misunderstanding however, this too is currently prone to be gamed as well — for example, botnets can be frighteningly effective.

    2) Can there ever be a truly (or even quasi-) objective definition of relevance across the web? Obviously engines have ways of measuring their effectiveness, but those definitions are ultimately subjective ones. Would an “open” standard of relevance achieve this?

    I’m inclined to think web relevance will remain subjective by virtue of the nature of the dataset, and because the stakes are high, monetarily speaking for all involved. I’ll also say given the amount of money involved, spammers will try very, very hard to game whatever system is put out there. There’s a few billion too many involved, and thus, unsurprisingly, a large number of smart folks working on spamming. Perhaps I have a dimmer view of human nature after dealing with this for a while, but I’m skeptical we can ever end the arms race with spammers unless the monetary incentive decreases in some way :)

    Another thought is that an ancillary beneficiary to spam succeeding is often the search engine’s ad wing itself in terms of fees. This isn’t to suggest any conspiracy, just an example of the weird dynamics often at play vis-a-vis spam.

  • 7 Disincenting Spam | The Noisy Channel // Oct 20, 2008 at 2:55 pm

    […] Greg’s argument reminds me of one of the first posts I wrote on this blog. I was criticizing Google’s approach of keeping its relevance approach […]

  • 8 Age-Old Questions about BWBX « Network(ed)News // Jan 26, 2009 at 5:11 pm

    […] So if we’re still getting it wrong, why? And if we’re getting it right, why can’t we be more transparent about it? We know how pagerank is the beating heart of google’s effort to out-engineer spam, and some argue that’s not even enough. […]

  • 9 SIGIR 2009: Day 3, Industry Track: Matt Cutts | The Noisy Channel // Jul 29, 2009 at 12:06 am

    […] I’d heard Amit speak before: he delivered one of the keynotes at ECIR 2008 (and inspired one of my first blog posts!). So I decided to aim for Matt Cutts, despite having no way to contact him (the head of […]
