Unless you’ve been living in a cone of silence, you’ve probably heard about the epic war of words between Google and Bing. But just in case, here’s a quick summary:
Amit Singhal, Google Fellow: “Microsoft’s Bing uses Google search results—and denies it”:
Bing is using some combination of:
- Internet Explorer 8, which can send data to Microsoft via its Suggested Sites feature
- the Bing Toolbar, which can send data via Microsoft’s Customer Experience Improvement Program
or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.
Harry Shum, Corporate Vice President, Bing: “Thoughts on search quality”:
We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.
Yusuf Mehdi, Senior Vice President, Online Services Division, Bing: “Setting the record straight”:
Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results. What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.
Matt Cutts, Head of Webspam, Google: “My thoughts on this week’s debate”:
Something I’ve heard smart people say is that this could be due to generalized clickstream processing rather than code that targets Google specifically. I’d love if Microsoft would clarify that, but at least one example has surfaced in which Microsoft was targeting Google’s urls specifically. The paper is titled Learning Phrase-Based Spelling Error Models from Clickthrough Data and here’s some of the relevant parts:
The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser [I assume this is Internet Explorer. –Matt] …. In our experiments, we “reverse-engineer” the parameters from the URLs of these [query formulation] sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs.
This paper very much sounds like Microsoft reverse engineered which specific url parameters on Google corresponded to a spelling correction. Figure 1 of that paper looks like Microsoft used specific Google url parameters such as “&spell=1” to extract spell corrections from Google. Targeting Google deliberately is quite different than using lots of clicks from different places.
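To make the technique Matt describes a bit more concrete, here is a rough sketch of how query-correction pairs could be pulled out of logged search URLs. This is my own illustration, not code from the paper or from Microsoft. The only detail taken from the quote is the “spell=1” marker; the “q” query parameter, the search.example.com host, and all of the function names are assumptions made up for the example.

```python
from urllib.parse import urlparse, parse_qs


def query_of(url):
    """Return the search query encoded in a URL, or None if there is none."""
    params = parse_qs(urlparse(url).query)
    values = params.get("q")  # assumption: the query is carried in a "q" parameter
    return values[0] if values else None


def is_spelling_click(url):
    """True if the URL is flagged as a click on a spelling suggestion."""
    params = parse_qs(urlparse(url).query)
    return params.get("spell") == ["1"]  # the "spell=1" marker mentioned in the quote


def correction_pairs(session_urls):
    """Yield (original_query, corrected_query) pairs from one session's URLs, in order."""
    previous = None
    for url in session_urls:
        current = query_of(url)
        if current is None:
            continue  # not a search URL, so ignore it
        if previous and is_spelling_click(url) and current != previous:
            yield previous, current
        previous = current


# Hypothetical session: a misspelled query, then a click on the suggested spelling.
session = [
    "http://search.example.com/search?q=britny+spears",
    "http://search.example.com/search?q=britney+spears&spell=1",
]
pairs = list(correction_pairs(session))
```

For the hypothetical session above, `pairs` holds the single pair ("britny spears", "britney spears"). A search URL without the marker produces nothing, which is the whole point of the allegation: the extraction only works if you know how a particular search engine encodes a spelling-suggestion click.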
Let me start by saying that these are very serious words from very serious people.
Amit and Matt, both of whom I know personally, are not just two of the most prominent Google employees — they have a deep personal investment in Google’s search quality. Amit is personally responsible for much of Google’s web search ranking algorithm, and Matt is surely the person whom spammers (and many SEO consultants) most love to hate. There is no question in my mind that the emotion both of them are expressing is sincere.
I haven’t met Harry or Yusuf, but I have no reason to doubt their sincerity, especially since everything they are saying seems consistent with the facts, and indeed with the substantive parts of Google’s allegations. The facts don’t really seem to be in dispute. And more generally, I’ve met some of the folks who lead the Bing team (like Jan Pedersen), and, like Matt, I believe they are thoughtful and sincere and are devoted to building a great search engine of their own.
The debate is not about the facts. Rather, it’s about what is right and wrong. I will try to summarize the two sides’ positions without editorializing.
Bing is claiming that:
- Users have a right to do as they please with their own clickthrough data, which includes data from Google search sessions.
- Bing toolbar users opted in to share this clickthrough data with Bing.
- By using this clickthrough data, Bing creates value for users.
Google is claiming that:
- Bing’s specific targeting of Google clickthrough data amounts to copying Google and is wrong.
- Bing toolbar users are not necessarily aware that they are complicit in this behavior.
- Bing is disingenuous in understating how much it benefits from Google as a signal.
What do I think?
I agree with Bing that users have the right to do as they please with clickthrough data. I’d think Google would agree too, given that Google wrote the sermon on “the meaning of open”:
Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.
I agree with all three of the points I listed as Google’s claims, except for the part about Bing’s behavior being wrong. It’s up to users if they want to help Bing compete with Google. Do users know that they’re doing so? Probably not. But would they stop doing so if they did? I doubt it. I can’t see why most users would have a dog in this fight, and in fact it may be in users’ interest to help Bing be more competitive.
I do think Bing should be forthright about what it is doing — and how much this user-provided data from Google search sessions is contributing to its own quality improvements. Bing can, of course, keep this information secret, but I’d think that Bing would want to defend its reputation as an innovator — especially as the David in a David vs. Goliath fight.
But I also think that Google should be careful with its accusations. Accusing Bing of not being innovative is one thing, and that accusation, backed by concrete examples, is probably enough to score points. But implying that Google owns its users’ clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.
I’m curious to hear what others here think. It’s been a while since I could freely express opinions about Google and Bing, so I’m delighted to have such a hot controversy to incite discussion. Because everyone enjoys a muddle puddle tweetle poodle beetle noodle bottle paddle battle!