Let me preface this post with a clear disclaimer: I work at Google, but the views I express on this blog are my own personal views.
Last week, Google head of webspam Matt Cutts posted a full-throated defense of Google’s transparency on Google’s European Policy Blog in response to complaints that a few companies raised to the European Commission. Long-time readers of my blog know that I’m a big fan of search engine transparency and have made my own calls on this blog for Google to be more transparent. The fact that I work at Google now doesn’t change my values. But being on the inside has informed my perspective.
In particular, as Matt elaborates in his post, Google deserves more credit for transparency than it often gets from its critics. For example, Google has published:
- “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, which not only details the formula for PageRank, but also mentions other signals that Google uses to rank search results: anchor text, location of query terms within documents, proximity of query terms, etc.
- details of its key infrastructure innovations: MapReduce, the Google File System, Bigtable, and Protocol Buffers
- hundreds of research papers by Googlers in diverse areas of computer science
He goes on to describe the various webmaster tools and social media resources that Google has made available. The popularity of these tools is a testament to their utility.
Still, as Matt points out:
we don’t think it’s unreasonable for any business to have some trade secrets, not least because we don’t want to help spammers and crackers game our system. If people who are trying to game search rankings knew every single detail about how we rank sites, it would be easier for them to ‘spam’ our results with pages that are not relevant and are frustrating to users — including porn and malware sites.
As I blogged back in 2008, I still hope that someday we won’t have to rely on a relevance analog of security through obscurity in order to deter spam and abusive SEO practices. But I recognize that we haven’t yet developed an alternative to that analog, and hence that complete transparency today for web search ranking algorithms would have a far greater downside than upside for ordinary users.
I suspect that a prerequisite for complete transparency in search is moving from a ranking-based retrieval approach to a set-based approach. For many web search information needs (e.g., navigational queries), it’s hard to see how users would benefit from such a radical change. For queries that represent more exploratory information needs, a set-based approach would be (at least in my view) far preferable to one based on ranking. But there’s a lot of work to do on the content side before such exploratory interfaces for the web are usable.
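To make the distinction concrete, here is a toy sketch of the two approaches. Everything in it is an assumption made for illustration: the four-document corpus, the `topic` facet, and the function names `ranked_search` and `set_search` are invented, and the term-count score is a stand-in for a real ranking function, not a description of how any search engine works.

```python
from collections import Counter

# Toy corpus; the documents and the "topic" facet are invented for illustration.
DOCS = {
    "d1": {"text": "jaguar car dealer prices", "topic": "autos"},
    "d2": {"text": "jaguar habitat rainforest cat", "topic": "wildlife"},
    "d3": {"text": "jaguar car review jaguar engine", "topic": "autos"},
    "d4": {"text": "rainforest conservation news", "topic": "wildlife"},
}


def ranked_search(query, docs=DOCS):
    """Ranking-based retrieval: order matching doc ids by a numeric score.

    The score here is a naive count of query-term occurrences (hypothetical).
    """
    terms = query.split()
    scores = {}
    for doc_id, doc in docs.items():
        counts = Counter(doc["text"].split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def set_search(query, docs=DOCS):
    """Set-based retrieval: return every doc containing all query terms,
    unordered, plus facet counts the user can use to narrow the set."""
    terms = set(query.split())
    matches = {d for d, doc in docs.items() if terms <= set(doc["text"].split())}
    facets = Counter(docs[d]["topic"] for d in matches)
    return matches, dict(facets)


if __name__ == "__main__":
    print(ranked_search("jaguar car"))
    print(set_search("jaguar"))
```

In the ranked version, a document's position depends on a score that a site owner can try to inflate. In the set-based version, the membership rule can be stated in full (a document is in the result if and only if it contains every query term), and the user, not the engine, narrows the set by choosing a facet. Whether that property would survive at web scale is exactly the open question.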
In summary, I’m happy to see Matt taking a public stand in Google’s defense. I don’t always agree with my employer’s decisions, but I do believe that my colleagues act in good faith and with good intentions. I understand how many people, especially site owners, fixate on whatever Google keeps secret. In a world where so many people compete for attention, information is power. Google tries to provide maximum quality to users while keeping the playing field level for site owners. As Google Fellow Amit Singhal points out, “this stuff is tough”.