The Noisy Channel

 

Google vs. Bing: A Tweetle Beetle Battle Muddle

February 5th, 2011 · 52 Comments · General

Unless you’ve been living in a cone of silence, you’ve probably heard about the epic war of words between Google and Bing. But just in case, here’s a quick summary:

Amit Singhal, Google Fellow: “Microsoft’s Bing uses Google search results—and denies it”:

Bing is using some combination of: […]

or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.

Harry Shum, Corporate Vice President, Bing: “Thoughts on search quality”:

We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.

Yusuf Mehdi, Senior Vice President, Online Services Division, Bing: “Setting the record straight”:

Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results. What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.

Matt Cutts, Head of Webspam, Google: “My thoughts on this week’s debate”:

Something I’ve heard smart people say is that this could be due to generalized clickstream processing rather than code that targets Google specifically. I’d love if Microsoft would clarify that, but at least one example has surfaced in which Microsoft was targeting Google’s urls specifically. The paper is titled Learning Phrase-Based Spelling Error Models from Clickthrough Data and here’s some of the relevant parts:

The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser [I assume this is Internet Explorer. –Matt] …. In our experiments, we “reverse-engineer” the parameters from the URLs of these [query formulation] sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs.”

This paper very much sounds like Microsoft reverse engineered which specific url parameters on Google corresponded to a spelling correction. Figure 1 of that paper looks like Microsoft used specific Google url parameters such as “&spell=1” to extract spell corrections from Google. Targeting Google deliberately is quite different than using lots of clicks from different places.
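To make the mechanism Matt describes concrete, here is a minimal sketch of how a query-correction pair could be pulled out of a logged URL. It is an illustration under stated assumptions, not the code from the paper: the function name `extract_spell_correction`, the `prev_query` argument, and the example URL are hypothetical, and treating `q` as the query parameter is an assumption. Only the `spell=1` marker comes from the quoted text.

```python
from urllib.parse import urlparse, parse_qs

def extract_spell_correction(url, prev_query):
    """Return (original_query, corrected_query) if the URL looks like a
    click on a spelling suggestion, otherwise None.

    Assumptions (not taken from the paper): the query text is carried in
    the `q` parameter, and `spell=1` marks a click on a suggestion.
    """
    params = parse_qs(urlparse(url).query)
    # parse_qs maps each parameter name to a list of its values.
    if params.get("spell") != ["1"]:
        return None
    corrected = params.get("q", [None])[0]
    if not corrected or corrected == prev_query:
        return None
    return (prev_query, corrected)

# Hypothetical session: the user typed a misspelling, then clicked the suggestion.
url = "http://www.example.com/search?q=britney+spears&spell=1"
pair = extract_spell_correction(url, "britny spears")
```

The point of the sketch is that nothing beyond the URL and the previous query in the session is needed, which is why a browser or toolbar log would be sufficient for this kind of extraction.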

Let me start by saying that these are very serious words from very serious people.

Amit and Matt, both of whom I know personally, are not just two of the most prominent Google employees — they have a deep personal investment in Google’s search quality. Amit is personally responsible for much of Google’s web search ranking algorithm, and Matt is surely the person whom spammers (and many SEO consultants) most love to hate. There is no question in my mind that the emotion both of them are expressing is sincere.

I haven’t met Harry or Yusuf, but I have no reason to doubt their sincerity, especially since everything they are saying seems consistent with the facts, including the substantive parts of Google’s allegations. Indeed, the facts don’t really seem to be in dispute. More generally, I’ve met some of the folks who lead the Bing team (like Jan Pedersen), and, like Matt, I believe they are thoughtful, sincere, and devoted to building a great search engine of their own.

The debate is not about the facts. Rather, it’s about what is right and wrong. I will try to summarize the two sides’ positions without editorializing.

Bing is claiming that:

  • Users have a right to do as they please with their own clickthrough data, which includes data from Google search sessions.
  • Bing toolbar users opted in to share this clickthrough data with Bing.
  • By using this clickthrough data, Bing creates value for users.

Google is claiming that:

  • Bing’s specific targeting of Google clickthrough data amounts to copying Google and is wrong.
  • Bing toolbar users are not necessarily aware that they are complicit in this behavior.
  • Bing is disingenuous in understating how much it benefits from Google as a signal.

What do I think?

I agree with Bing that users have the right to do as they please with their own clickthrough data. I’d think Google would agree too, given that Google wrote the sermon on “the meaning of open”:

Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.

I agree with all three of the points I listed as Google’s claims, except for the part that Bing’s behavior is wrong. It’s up to users if they want to help Bing compete with Google. Do users know that they’re doing so? Probably not. But would they stop doing so if they did? I doubt it. I can’t see why most users would have a dog in this fight. In fact, it may be in users’ interest to help Bing be more competitive.

I do think Bing should be forthright about what it is doing — and how much this user-provided data from Google search sessions is contributing to its own quality improvements. Bing can, of course, keep this information secret, but I’d think that Bing would want to defend its reputation as an innovator — especially as the David in a David vs. Goliath fight.

But I also think that Google should be careful with its accusations. Accusing Bing of not being innovative is one thing, and that accusation, backed by concrete examples, is probably enough to score points. But implying that Google owns its users’ clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.

I’m curious to hear what others here think. It’s been a while since I could freely express opinions about Google and Bing, so I’m delighted to have such a hot controversy to incite discussion. Because everyone enjoys a muddle puddle tweetle poodle beetle noodle bottle paddle battle!

52 responses so far ↓

  • 1 Corey K Katir // Jun 11, 2011 at 1:10 pm

    Is Bing A Better Search Engine?

    We have created a logical test that shows which search engine provides better search results. Google or Bing? I will explain the test on this page.

    First, I would like to make the test concept more clear with several examples:

    Say we take a series of Titles to search on Google and Bing for comparison.

    Here are several examples (all the tests are at http://www.rssfeedrss.com/index2.html):

    Title 1) Patients are willing to undergo multiple tests for new cancer treatments

    http://www.rssfeedrss.com/test2.html

    Title 2) Conference on composite materials for structural performance: Towards higher limits

    http://www.rssfeedrss.com/test4.html

    Now I will explain how this test works.
    Each title is about two or three main keywords.
    For example Title 1 is about cancer treatment.
    Title 2 is about composite material.

    I propose a logical test that uses both Google and Bing search results and extracts the main keywords in a logical manner. The better search engine will provide a better and more relevant extraction based on this logical test. I would like to emphasize logic.

    Now what is this logical test?
    The better search engine provides search results that contain a higher number of the main keywords on the search results page (usually in bold).

    For example, if we take title 1 to either Google or Bing and make a search on the whole title and then count the number of times the main keywords appear in the search results (usually in Bold), the better search engine will give us cancer treatment and not other words. That means if you count the number of times the keywords cancer treatment appear in search results in both Google and Bing, Bing provides a higher quantity.

    I used both Google and Bing for the test on the page http://www.rssfeedrss.com/index2.html and Bing provided a better search. You can do this test in-house.

    I will propose this test in search engine conferences. It is a valid test.
    I can email you the perl file that performed the test. Call 949-500-8638 or email info@katir.com.

    In fact, if you continue the test to second page results, it also shows which search engine provides better search results for the second page or third page or….

    Why is this test valid?

    It is not very complex to prove why this test is valid. If you type a sentence that contains several main keywords, you prefer more information about those main keywords. A higher quantity of those main keywords proves the page is more relevant and the search engine has delivered more relevant results.
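The counting procedure described in the comment above can be sketched roughly as follows. This is not the commenter's Perl script, which is not shown here. The function name `keyword_hits` and the sample snippets are hypothetical, and the sketch only counts matches in text you supply; it does not fetch results from either engine.

```python
import re

def keyword_hits(snippets, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword
    across a list of result snippets and return the total."""
    total = 0
    for kw in keywords:
        pattern = re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        total += sum(len(pattern.findall(s)) for s in snippets)
    return total

# Hypothetical snippets standing in for one results page from each engine.
engine_a = ["New cancer treatment trials", "Cancer patients and tests"]
engine_b = ["Patients willing to undergo tests", "Treatment options"]
keywords = ["cancer", "treatment"]

hits_a = keyword_hits(engine_a, keywords)
hits_b = keyword_hits(engine_b, keywords)
```

Under the comment's criterion, the engine with the larger count for a given title would be the one judged to have returned the more relevant page.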

  • 2 Strata 2012: Big Data is Bigger than Ever! // Mar 2, 2012 at 12:58 am

    […] to address my charge that Google doesn’t think users own their search history (cf. “Google vs. Bing: A Tweetle Beetle Battle Muddle”), but she said she was unfamiliar with the details of that event. I do wish that someone at […]
