The Noisy Channel

 

Google vs. Bing: A Tweetle Beetle Battle Muddle

February 5th, 2011 · 52 Comments · General

Unless you’ve been living in a cone of silence, you’ve probably heard about the epic war of words between Google and Bing. But just in case, here’s a quick summary:

Amit Singhal, Google Fellow: “Microsoft’s Bing uses Google search results—and denies it”:

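Bing is using some combination of:

  • Internet Explorer 8, which can send data to Microsoft via its Suggested Sites feature;
  • the Bing Toolbar, which can send data via Microsoft’s Customer Experience Improvement Program;

or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.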

Harry Shum, Corporate Vice President, Bing: “Thoughts on search quality”:

We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.

Yusuf Mehdi, Senior Vice President, Online Services Division, Bing: “Setting the record straight”:

Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results. What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.

Matt Cutts, Head of Webspam, Google: “My thoughts on this week’s debate”:

Something I’ve heard smart people say is that this could be due to generalized clickstream processing rather than code that targets Google specifically. I’d love if Microsoft would clarify that, but at least one example has surfaced in which Microsoft was targeting Google’s urls specifically. The paper is titled Learning Phrase-Based Spelling Error Models from Clickthrough Data and here’s some of the relevant parts:

The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser [I assume this is Internet Explorer. –Matt] …. In our experiments, we “reverse-engineer” the parameters from the URLs of these [query formulation] sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs.

This paper very much sounds like Microsoft reverse engineered which specific url parameters on Google corresponded to a spelling correction. Figure 1 of that paper looks like Microsoft used specific Google url parameters such as “&spell=1” to extract spell corrections from Google. Targeting Google deliberately is quite different than using lots of clicks from different places.
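
For readers who want the mechanics spelled out, here is a minimal sketch of how query-correction pairs might be pulled from a browser’s URL log. It is an illustration only, not the paper’s or Microsoft’s actual pipeline: it assumes Google-style URLs in which `q` carries the query and `spell=1` marks a click on a spelling suggestion (the latter is the parameter Matt cites), and the function names and sample URLs are made up for this example.

```python
from urllib.parse import urlparse, parse_qs


def parse_search_url(url):
    """Return (query, came_from_spelling_suggestion), or None if no query is present.

    Assumes the query lives in `q` and that `spell=1` flags a spelling click.
    """
    params = parse_qs(urlparse(url).query)
    if "q" not in params:
        return None
    query = params["q"][0]
    via_spell = params.get("spell", ["0"])[0] == "1"
    return query, via_spell


def correction_pairs(session):
    """Collect (original, corrected) pairs from consecutive URLs in one session."""
    pairs = []
    previous_query = None
    for url in session:
        parsed = parse_search_url(url)
        if parsed is None:
            previous_query = None  # a non-search page breaks the reformulation chain
            continue
        query, via_spell = parsed
        if via_spell and previous_query is not None and previous_query != query:
            pairs.append((previous_query, query))
        previous_query = query
    return pairs


# Hypothetical session: a misspelled query followed by a click on the suggestion.
session = [
    "http://www.google.com/search?q=britny+spears",
    "http://www.google.com/search?q=britney+spears&spell=1",
]
pairs = correction_pairs(session)
```

The point of the sketch is that a parser like this has to know one particular engine’s parameter names, which is why Matt reads the paper as targeting Google specifically rather than processing clicks generically.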

Let me start by saying that these are very serious words from very serious people.

Amit and Matt, both of whom I know personally, are not just two of the most prominent Google employees — they have a deep personal investment in Google’s search quality. Amit is personally responsible for much of Google’s web search ranking algorithm, and Matt is surely the person whom spammers (and many SEO consultants) most love to hate. There is no question in my mind that the emotion both of them are expressing is sincere.

I haven’t met Harry or Yusuf, but I have no reason to doubt their own sincerity — especially since everything they are saying seems consistent with the facts — in fact, consistent with the substantive parts of Google’s allegations. Indeed, the facts don’t really seem to be in dispute. And more generally, I’ve met some of the folks who lead the Bing team (like Jan Pedersen), and, like Matt, I believe they are thoughtful and sincere and are devoted to building a great search engine of their own.

The debate is not about the facts. Rather, it’s about what is right and wrong. I will try to summarize the two sides’ positions without editorializing.

Bing is claiming that:

  • Users have a right to do as they please with their own clickthrough data, which includes data from Google search sessions.
  • Bing toolbar users opted in to share this clickthrough data with Bing.
  • By using this clickthrough data, Bing creates value for users.

Google is claiming that:

  • Bing’s specific targeting of Google clickthrough data amounts to copying Google and is wrong.
  • Bing toolbar users are not necessarily aware that they are complicit in this behavior.
  • Bing is disingenuous in understating how much it benefits from Google as a signal.

What do I think?

I agree with Bing that users have the right to do as they please with clickthrough data. I’d think Google would agree too, given that Google wrote the sermon on “the meaning of open”:

Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.

I agree with all three of the points I listed as Google’s claims, except for the claim that Bing’s behavior is wrong. It’s up to users if they want to help Bing compete with Google. Do users know that they’re doing so? Probably not. But would they stop doing so if they did? I doubt it. I can’t see why most users would have a dog in this fight, and in fact it may be in users’ interest to help Bing be more competitive.

I do think Bing should be forthright about what it is doing — and how much this user-provided data from Google search sessions is contributing to its own quality improvements. Bing can, of course, keep this information secret, but I’d think that Bing would want to defend its reputation as an innovator — especially as the David in a David vs. Goliath fight.

But I also think that Google should be careful with its accusations. Accusing Bing of not being innovative is one thing, and that accusation, backed by concrete examples, is probably enough to score points. But implying that Google owns its users’ clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.

I’m curious to hear what others here think. It’s been a while since I could freely express opinions about Google and Bing, so I’m delighted to have such a hot controversy to incite discussion. Because everyone enjoys a muddle puddle tweetle poodle beetle noodle bottle paddle battle!

52 responses so far ↓

  • 1 Jon // Feb 5, 2011 at 9:04 pm

    Daniel – I think you hit the nail on the head with this:

    “But implying that Google owns its users’ clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.”

    The fact is that the users in question are users of both MS and Google services. They simultaneously generate usage data for both of those companies, and the companies are allowed to use the data how they see fit. (Assuming, of course, appropriate opt-in agreements are made, legal CYA is provided, yadda yadda yadda.)

    I think there’s an argument to be made that Google’s ranking is proprietary, and Bing shouldn’t be scraping results pages. But it’s not clear that was actually happening. Bing seems to be using click features that are easily manipulated by Google in some cases.

  • 2 jeremy // Feb 5, 2011 at 9:33 pm

    or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.

    How about “put another way, this demonstrates the fantastic power of the mashup culture of the web, and the spirit of openness that feeds (or should feed) it”?

    Why is the automatic assumption that the Bing results are nothing more than incomplete, stale Google? Why is the automatic assumption that no results could possibly be better than Google…that it is conceptually impossible to augment Google results in any way that makes them at all better?

    This is information retrieval, folks. We’ve known for 30 years that the same algorithm produces wide variance on different queries, i.e. there is no one size fits all algorithm.

    So what if Bing is simply getting the big picture.. learning more than Google ever can about which queries work on which engines.. and doing dynamic, personalized, per query mashups.. giving users Bing results when Bing wins, and Google results when Google wins?

    Does Singhal also feel the same way about other kinds of mashups? That the whole mashup culture is nothing but incomplete, stale versions of the original?

  • 3 Daniel Tunkelang // Feb 5, 2011 at 10:15 pm

    By the way, it’s possible I’m misreading Google’s argument — I should reiterate that these paraphrases are my own attempts to distill from the primary materials I’ve cited.

    And I’m not giving Bing a free pass here: if Bing’s quality improvements reflect convergence to Google’s results by way of Google users’ clickthrough data, I think that’s at least embarrassing. Especially since capturing search results clickthrough data isn’t that different from scraping results in terms of net effect: users mostly click on top-ranked results.

    Moreover, I’m very sympathetic to Google’s search quality team. I think they are really upset, to the point that their reaction is not in Google’s best business interests. It would have been better to make their point with humor, like Apple did with the C:\NGRTLNS.W95 ad to suggest that Windows 95 was a cheap imitation of Mac OS, and to focus on the copying not as unethical but rather as an implicit admission that Google was better. And, of course, Google could use technical solutions to interfere with this approach if it were so inclined.

  • 4 jeremy // Feb 5, 2011 at 10:37 pm

    But I don’t think Bing’s results converge to Google’s. I rotate my search engine usage all the time, and Bing feels different. Not worse. Different. Like its metrics are slightly different.. more.. scenic route. Not irrelevant scenic route. But still scenic. That’s not a bad thing.

    And sure, for many queries there might be the same results in the top 3. But is that because of a copy, or because there are objectively only a small handful of relevant results? How different does one expect the results to really be, when the query is something like [new york times]?

    I’m left thinking about this line, from Google’s “openness” manifesto:

    Open systems are just the opposite. They are competitive and far more dynamic. In an open system, a competitive advantage doesn’t derive from locking in customers, but rather from understanding the fast-moving system better than anyone else and using that knowledge to generate better, more innovative products. The successful company in an open system is both a fast innovator and a thought leader; the brand value of thought leadership attracts customers and then fast innovation keeps them. This isn’t easy — far from it — but fast companies have nothing to fear, and when they are successful they can generate great shareholder value. Open systems have the potential to spawn industries. They harness the intellect of the general population and spur businesses to compete, innovate, and win based on the merits of their products and not just the brilliance of their business tactics.

    I’m not unsympathetic to Google’s search quality team. But I really don’t understand what it is that they’re upset about, if they really believe that they (Google) are a fast company, and therefore have nothing to fear? If they truly believe that Bing’s results are nothing but a stale, incomplete version of Google’s, why are they even concerned in the first place? No user would ever switch, hmm?

    If, however, Bing is using its toolbar/etc data to augment and improve open Google’s results.. to add value, and if Bing has actually succeeded in improving upon Google and has truly added that user value then I can see why Google would be concerned. But then it goes back to the argument about the user. If this is really better for the user, then how could Google be against it?

    Quote: “Focus on the user and all else will follow. Since the beginning, we’ve focused on providing the best user experience possible. Whether we’re designing a new Internet browser or a new tweak to the look of the homepage, we take great care to ensure that they will ultimately serve you, rather than our own internal goal or bottom line.”

    In summary, whether or not you or I understand their current argument, it seems it doesn’t matter. Either Google is that fast moving company that doesn’t need to worry that someone else has an incomplete, stale version of itself, or else Bing is an even faster moving company.. and it got to be that way by harnessing the wonderful, mashup-laden beauty that is the user-focused web…taking what is out there and improving on it even further…a very Googly thing to do, wouldn’t you say?

  • 5 jeremy // Feb 5, 2011 at 10:39 pm

    If, however, Bing is using its toolbar/etc data to augment and improve open Google’s results..

    I mean “improve upon”.

  • 6 jeremy // Feb 5, 2011 at 10:42 pm

    After all, is Microsoft copying Google’s advertisers, too? Google says time and again how wonderfully, user-helpfully relevant its ads are.. how the user experience would be worse if the ads weren’t there (or if ads were only shown in an opt-in manner). Therefore Google is always going to be better than Bing, because of Google’s relevant ads, right? ;-)

  • 7 Daniel Tunkelang // Feb 5, 2011 at 11:11 pm

    My understanding is that Google and Bing track the similarity of each other’s top-ranked results to their own. Of course, convergence hardly implies cheating — or that either is improving in quality. But Google is suggesting that Bing is trying to improve its quality through a mechanism that, in effect, uses Google’s ranking as a signal (via user clickthroughs) and results in Bing improving quality by looking more like Google. It would be great to see broad evidence. The results of the honeypot experiment are consistent with Google’s case, but what would be more interesting is the evidence that Amit cites as motivating the honeypot experiment in the first place.
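
    As a rough illustration of what tracking that similarity could look like, the sketch below computes the Jaccard overlap of two engines’ top-k result URLs. The metric, the function name, and the example URLs are assumptions made for illustration; neither Google nor Bing has said how they actually measure this.

```python
def top_k_overlap(results_a, results_b, k=10):
    """Jaccard overlap between the top-k URLs of two ranked result lists.

    Returns 1.0 when both lists are empty, since there is nothing to disagree on.
    """
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    if not top_a and not top_b:
        return 1.0
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical result lists for the same query on two engines.
engine_one = ["a.example", "b.example", "c.example"]
engine_two = ["b.example", "c.example", "d.example"]
overlap = top_k_overlap(engine_one, engine_two, k=3)  # 2 shared of 4 distinct URLs
```

    A high overlap by itself says nothing about copying: for a query like [new york times] any two decent engines should agree on the top results.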

  • 8 Bored // Feb 6, 2011 at 12:32 am

    “But implying that Google owns its users’ clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.”

    Google DOESN’T own clickdata. Users own their clickdata, and by participating in the opt-in program, they are transferring ownership, or at least the right to use it, to MSFT.

    Is it ethical to use Google’s clickdata even if Google owns it? Yes.

    What was the breakthrough in search? PageRank. How does it work?

    It uses the outbound and inbound links on x.com/blah to rank all the pages that x.com/blah links to. Isn’t Google using x.com/blah’s data to rank other pages? Is it ethical of Google to use x.com/blah’s links to rank the content of the pages it links to? Yes, because it helps produce better rankings. Along the same lines, it’s ethical to use clickdata from _ANY DAMN SITE_ on earth to rank results.
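
    For reference, a simplified, textbook-style version of the idea can be sketched as a power iteration over a link graph. This is not Google’s production algorithm; the damping factor of 0.85, the iteration count, the even spreading of rank from pages with no outbound links, and the toy graph are all assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Toy graph: a and b both link to c, and c links back to a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

    In the toy graph, c ends up ranked highest because two pages link to it, which is the sense in which one site’s links are used to rank other sites’ pages.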

  • 9 Bored // Feb 6, 2011 at 12:39 am

    @Daniel :
    “And I’m not giving Bing a free pass here: if Bing’s quality improvements reflect convergence to Google’s results by way of Google users’ clickthrough data, I think that’s at least embarrassing. Especially since capturing search results clickthrough data isn’t that different from scraping results in terms of net effect: users mostly click on top-ranked results.”

    There is a lot of difference between clickthrough and scraping.

    Get it straight: every real-time sensor lacks context.

    The user was shown 10 links and clicked on the 4th. So what? To use this well, you need to know what else was displayed to him. Say a guy queries for his girlfriend’s name, and there is a celebrity with the same name whose Facebook profile is the first link while his girlfriend’s Facebook profile is the 2nd; he clicks on the second. Does that mean the 2nd was the top result for this query? No.

    Also, clickdata is for static ranking, which is query independent. It doesn’t make any sense to associate it with a query.

    Also, imagine how royally Bing could screw up its results by just including Google click data. It has so much noise. It’s hard to normalize: people from country X tend to click more than people from country Y. You can’t normalize it.

    Disclaimer: I am a Stanford data mining student, not associated with either Google or Bing.

  • 10 Daniel Tunkelang // Feb 6, 2011 at 5:53 am

    @Bored:

    I think we’re agreeing on the ethical point. As for the difference between clickthrough and scraping, my point is simply that users are entering search queries, finding results through Google, and then sending the query-result pairs to Bing. This isn’t the same as sending a copy of Google’s ranked list of results, but it does send information derived from that list. It’s hard to avoid the conclusion that Bing is benefiting from Google’s ranking algorithms.
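
    The query-result pairs described here can be pictured with a small sketch: tally which URL users clicked for each query, then read off the most-clicked URLs. How Bing actually processes toolbar data is not public; the event format, function names, and example URLs below are hypothetical.

```python
from collections import Counter, defaultdict


def click_counts(events):
    """Tally clicks per (query, url) from an iterable of (query, clicked_url) pairs."""
    counts = defaultdict(Counter)
    for query, url in events:
        counts[query.strip().lower()][url] += 1
    return counts


def most_clicked(counts, query, n=3):
    """Return up to n URLs for the query, most-clicked first."""
    per_query = counts.get(query.strip().lower(), Counter())
    return [url for url, _ in per_query.most_common(n)]


# Hypothetical clickstream: three users ran the same query on some search engine.
events = [
    ("new york times", "nytimes.example/home"),
    ("New York Times", "nytimes.example/home"),
    ("new york times", "other.example/about-nyt"),
]
counts = click_counts(events)
top = most_clicked(counts, "new york times")
```

    Since users mostly click on top-ranked results, the most-clicked URL for a query will usually be one the originating engine ranked highly, which is how ranking information leaks through without any list being copied.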

  • 11 Jmaaan // Feb 6, 2011 at 7:06 am

    http://www.seattle20.com/blog/Big-twist-Google-Has-Been-Using-Bing-Data-to-Improve-Its-Search-Results.aspx

    This is interesting.

  • 12 Jon // Feb 6, 2011 at 9:14 am

    “It’s hard to avoid the conclusion that Bing is benefiting from Google’s ranking algorithms.”

    OTOH, Google is clearly benefitting from MS Toolbar users since they’re running enough Google searches to make a signal worth extracting :)

  • 13 Bored // Feb 6, 2011 at 2:20 pm

    @Jon :
    More searches don’t make the signal worth extracting; they add an equal amount of noise, making the signal unusable.

    @Daniel :
    Don’t you think it’s very hard to find the relation between query and result based on clicks alone?

    Google results are specific to geography. They are query dependent. Given that you can’t analyse logs in real time, clickstream is mainly a static ranking feature. And static rank is always query independent.

    Moreover, if I were to use that feature, I wouldn’t do it on Google. Web results don’t have any context. I would rather use clickstreams from a specific site. For ranking technical articles, I would prefer the clickstream from TechCrunch or some tech blog to that of generic web results.

    I do agree that Bing benefits from Google’s ranking, but not because it’s Google; it would benefit from any search engine or any popular site. If an item is getting more clickthroughs on eBay search, it will get a better rank in Bing. So I don’t see why Google is so cocky about their IP. It’s not their IP. It’s the user’s click action that is benefitting Bing.

    Sorry to go round and round again.

  • 14 Robert // Feb 6, 2011 at 4:36 pm

    “The debate is not about the facts.”

    Maybe that’s true among those of us who have gathered more facts than those shown by Google and pieced them together with our knowledge of search engine technology.

    We (some more than others) understand how Google’s screen shots came to be, and can now think deep thoughts about the morality of it all.

    For the other 99.99% of the general public, including everyone I’m likely to meet at my family’s Super Bowl party, the debate IS about the facts. To wit: “How can Microsoft get away with this?” and “Why can’t Google just block Microsoft from stealing its pages?”, and “So what? Everyone knows that Microsoft cheats.”

    No, the people who will vote with their clicks are the ones who paid attention to this matter for about 5 minutes, long enough to see some screen shots on TV, laugh about it with Colbert, and go to their computers thinking “Why bother trying Bing, it’s just Google”.

    This sad state of affairs came about because Google showed some screenshots and stood by while the media reported it as proof that Bing gets all its results from Google.

    So, part of our moral discussion should at least consider Google’s failure to provide the genuine facts.

  • 15 jeremy // Feb 6, 2011 at 4:42 pm

    long enough to see some screen shots on TV, laugh about it with Colbert, and go to their computers thinking “Why bother trying Bing, it’s just Google”.

    Interesting. So what you’re saying is that this is actually a pre-emptive attack on Google’s part, to make Bing look like an also-ran Google? I.e. maybe the truth is that Bing is adding value to Google results, beyond which Google has the capability, and so Google’s reaction is to play the politics of public perception?

  • 16 jeremy // Feb 6, 2011 at 4:51 pm

    i.e. Bing is giving Google good competition, and Google is running scared? Is that your premise, Robert?

  • 17 Robert // Feb 6, 2011 at 5:04 pm

    @jeremy:

    Not exactly. I’m just saying that Google hasn’t been responsible with the facts.

    As to Google’s motivations, I’m completely in the dark, but like any mega-corporation, their ultimate aim is to increase shareholder value, which is how they measure morality.

  • 18 Everfluxx // Feb 6, 2011 at 6:25 pm

    What Bored said here.

  • 19 Daniel Tunkelang // Feb 6, 2011 at 6:35 pm

    @Bored:

    Presumably Bing’s toolbar also knows something about the user’s geography and other context. So all of this, combined with the clickstream, can be used to generate training data for ranking. But a key part of generating that training data is having the URLs that users click on — and that is what Google is providing through its search results. Yes, other search engines in the toolbar do that too, but Google is by far the most popular search engine in the markets where Bing competes. I’m not justifying Google’s claim/implication that Bing’s behavior is wrong — simply trying to characterize what is going on. And I’ve made it clear that I see this as users providing the data to Bing, even if they are doing so inadvertently. I’m not sure which part of my analysis, if any, you are disagreeing with.

    @Robert:

    Fair point: Google may have won this PR war in the eyes of the general public, regardless of how it fares in the tech blogosphere. And its disclosure has been selective, as is Bing’s. I think both could be more forthcoming — as I’ve tried to make clear in earlier comments. Specifically, Google could talk more about the evidence that inspired the honeypot experiment in the first place.

    But I think your expectations about what it means to be “responsible with the facts” are a bit extreme. Google has made its points in writing and in public video, with concrete evidence and clear explanations. I don’t think it’s fair to hold Google responsible for “standing by” while the media reports the story with less than perfect fidelity. Google is responsible for its own words, not those of others. Conversely, I don’t blame Google’s competitors for standing by when the media reports exaggerations or outright falsehoods at Google’s expense. I believe in a free press, warts and all.

  • 20 Robert // Feb 6, 2011 at 6:46 pm

    @Daniel: thanks for the response. Perhaps where we differ is that I feel Google, having made the accusations public in the first place, bears some responsibility to see that those accusations are understood by the public. Are they duty bound to do this? No; I simply expect better of them.

  • 21 Jeff // Feb 6, 2011 at 7:11 pm

    I am going to make a guarantee that Microsoft will be forced to stop doing this. How? Spammers. Let’s say I pay $100 for 100 people in India to change their hosts files so that “google.com” points to my server, which serves up Google-style results for my domains, all while they have the Microsoft toolbar or Suggested Sites enabled. Now, I’m not encouraging anyone to do this, but you could make a ton of money doing it. Google did this with 20 people. Think hundreds.

    So these people in India then click on the links my server returns. See, I have a feeling that Bing is weighting Google results really high. Notice the PR about Bing search quality? They’re very proud of it; the question we should ask ourselves is: how did it get this way? Is Microsoft taking in a ton of Google clicks and creating Bing from that? Could very well be. And if so, spammers are about to make a killing until Microsoft disables this “signal”.

    I say “signal” because this should in no way influence search results; it shouldn’t be a signal. But it doesn’t matter what I think, because the spammers are really busy right now, and in the end they will make sure Microsoft does the right thing. Expect Bing search quality to drop real soon.

  • 22 Daniel Tunkelang // Feb 6, 2011 at 7:19 pm

    @Jeff:

    I give Bing’s engineers more credit than that. Clickthrough is not the only measure that search engines use for evaluation, and search engines have been aware of the potential for click fraud / abuse for a long time. If the signal loses value, I’m sure Bing will drop it.

  • 23 Robert // Feb 6, 2011 at 7:25 pm

    @Jeff: cool idea, but I wonder how many link-clickers it would take to actually influence rankings from real-world search terms. I’d guess a lot more than 100. Something tells me Google already has a good estimate. Do you really think 20 engineers spent all that time searching only on nonexistent search terms? And do you really think that, being engineers, they actually clicked with mice? I doubt that Google would hire an engineer who couldn’t code a customized client in a half hour.

  • 24 Jeff // Feb 6, 2011 at 7:26 pm

    @Daniel Tunkelang

    Not all search engines do this. Google claims it does not consider clickthroughs via search engines a valid signal that influences search results. Microsoft is more or less parsing clicks on Google search results to grab keywords and associating them with search results. That is something entirely different and, it seems, relatively unique to Bing.

    I don’t think it’s a question of “if” this signal loses value; it will. There are intelligent spammers on top of this as we speak, and it will go down in value: there’s too much money to be made and it’s way too easy. It would be interesting to measure changes in Bing’s search result quality over the next few months.

  • 25 Robert // Feb 6, 2011 at 7:31 pm

    To build on comments from @Bored, and after prompting from @Jeremy: Bing’s use of clickstream data has something in common with Google’s use of hyperlink data, and maybe that’s why Googlers are alarmed.

    As far as I know, Google was the first general-purpose search engine to extract and make good use of the information contained in hyperlinks. A page’s inbound links represent human decisions: they indicate the regard that web page authors have for other web pages. Google succeeded wildly by recognizing the importance of these human decisions. Maybe not so much anymore, in view of SEO, content farms and link spam, but quite true 10 years ago.

    I should note that this wasn’t a huge innovation: As Page and Brin themselves mentioned in an early publication, they were inspired by citation analysis of the scientific literature, pioneered in the 1950s by Eugene Garfield at the Institute for Scientific Information. Garfield used citations, the hyperlink of his day, in his successful citation index reference books and, starting in the 1980s, end-user search engines such as the Science Citation Index.

    Fast-forward to 2011. Google has noticed that Bing extracts information from another kind of human decision: the best-looking item in a page full of search results. Again, this isn’t a new idea: Information retrieval geeks have been discussing this for decades, using terms like “implicit relevance feedback”. As @Bored notes, this feedback is noisy, depending as it does on the user’s unfathomable intentions. But some intentions are shared by many users, so relevance feedback has been found to be useful as one part of the entire stew of data. It certainly adds flavor that few other ingredients can. Like an inbound link, relevance feedback adds a human touch that’s missing in a generic text index.

    It would amaze me if Google didn’t already collect this kind of feedback from its own users. They are certainly able to do so, because nowadays all end-user clicks are redirected through Google itself. As the search market-leader, and with access to ALL of its users’ clicks, they have access to a trove of relevance feedback data.

    However, Bing might be the first search application that infers relevance from end-user decisions made on OTHER search engines. Surely they can apply this capability to any other search engine that Bing’s toolbar users might go to, and whose links can be parsed. And perhaps this is why the Googlers are getting their knickers in a knot: Bing has recognized the value of implicit relevance feedback, and is collecting it from (some) users on Google and other engines (including, no doubt, Bing’s own results). Perhaps Google was also planning to do this with its own toolbar, but knew they would eventually be discovered and hammered for it. So they decided to hammer Bing, like a kid squealing on a sister who snuck the last cookie out of the jar before he could.

    Who knows. This is all idle speculation on my part.

  • 26 Jeff // Feb 6, 2011 at 7:33 pm

    @Robert

    I have no idea if these Google engineers automated their clicks. I am sure spammers are testing the waters now. I can think of a few ways to automate this process for search terms which get a lot of queries. This is going to be a cat and mouse game between Bing and spammers. In the end I think this form of signal is fundamentally flawed and can always be gamed. What if someone creates a bot that takes control of your browser and submits this? Then it’s game over. Hopefully Bing is correct in saying this is a small signal; I am pretty sure we will find out soon enough if this is true.

  • 27 Hao Wooi Lim // Feb 7, 2011 at 3:23 am

    I have nothing against Bing, but I do think they need to be more forthcoming about what they’re doing after a user installs the Bing toolbar. Yes, sure, the user has agreed to let Bing collect anonymous data to help improve Bing services. But I believe most users do not realize that Bing is collecting click through data when they’re doing searches on Google. They may know that Bing is collecting some kind of data, most probably when they are searching on Bing or through the Bing toolbar. Google, on the other hand, should not accuse Bing of not innovating. Speaking of collecting click through data, I believe Google itself is doing it. However, I’m not sure whether Chrome also collects click through data if the default search engine is set to “Bing”. If it does, then did Google use click through data from Bing? Google claimed they did not. So I guess the real question is if it is ethical to use click through data generated from a competitor’s search engine to improve one’s own search engine?

  • 28 Bored // Feb 7, 2011 at 11:01 am

    @Hao Wooi Lim
    “So I guess the real question is if it is ethical to use click through data generated from a competitor’s search engine to improve one’s own search engine?”

    I have already commented on this.
    Whether it is ethical or not is an axiomatic question that depends on how we see the ownership of the data.

    There are two possibilities: 1) Google owns its clickdata, or 2) the user owns his or her clickdata.

    In either case, is it ethical to use someone else’s clickdata to rank results?

    Yes; Google itself started this trend. I will quote Robert’s comment:

    “To build on comments from @Bored, and after prompting from @Jeremy: Bing’s use of clickstream data has something in common with Google’s use of hyperlink data, and maybe that’s why Googlers are alarmed.

    As far as I know, Google was the first general-purpose search engine to extract and make good use of the information contained in hyperlinks. A page’s inbound links represent human decisions: they indicate the regard that web page authors have for other web pages. Google succeeded wildly by recognizing the importance of these human decisions. Maybe not so much anymore, in view of SEO, content farms and link spam, but quite true 10 years ago.”

    @Daniel :
    I think we are on the same page about the ethics of using clickstream data and about Bing’s usage of clickstream data.

    Now the point is: did Google do their homework before going public? Or did they stand by and let the media take over and blame Bing?

    No, they are incomplete in their accusations.

    None of their experiments prove that Bing uses Google as a special signal. Those queries were such gibberish that only the clickstream signal had a nonzero score. Moreover, the results were clicked only on Google, so only Google’s clickstream mattered. Nowhere does this show that Bing uses Google’s clickstream data in a special way.

    I believe you would agree that a monster search engine should have done a full-fledged experiment before going public with such a half-baked result. Unless, that is, they have planned a really huge PR coup, either to get away from the hot topic of spam on Google or to hamper Bing.
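
The point about gibberish queries can be illustrated with a toy model. The sketch below assumes a ranker that scores pages as a weighted sum of signals; the signal names, weights, and example URLs are all hypothetical and are not taken from Bing or Google. With a synthetic query, the clickstream signal is the only nonzero input, so it decides the ranking even at a small weight; with an ordinary query, the other signals dominate.

```python
# Hypothetical toy ranker: score = weighted sum of per-signal values.
# Signal names and weights are assumptions made up for illustration only.
WEIGHTS = {"text_match": 0.6, "links": 0.3, "clickstream": 0.1}


def score(signals):
    """Weighted sum over known signals; a missing signal counts as 0."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def rank(candidates):
    """Return candidate URLs ordered from highest to lowest score."""
    return sorted(candidates, key=lambda url: score(candidates[url]), reverse=True)


# Synthetic (gibberish) query: no text match, no links, one observed click.
synthetic = {
    "planted.example": {"clickstream": 1.0},
    "other.example": {},
}

# Ordinary query: text and link signals are available for every page.
ordinary = {
    "popular.example": {"text_match": 0.9, "links": 0.8},
    "clicked.example": {"text_match": 0.2, "links": 0.1, "clickstream": 1.0},
}

print(rank(synthetic))  # the clicked page leads: it is the only nonzero score
print(rank(ordinary))   # text and link signals outweigh the single click
```

Under these assumed weights, a result like the ones in Google’s experiment would be consistent with a generic clickstream signal and would not by itself show special treatment of Google.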

  • 29 Daniel Tunkelang // Feb 7, 2011 at 12:52 pm

    As I see it, the key questions are:

    1) Who owns the user’s interaction data: the user or the search engine?

    I’d say the user at least has full rights to his or her side of the conversation.

    2) Does the user have the right to exercise ultimate control over this data?

    Yes. Moreover, Google explicitly says so in a blog post.

    3) Did Bing sufficiently disclose to users how their interaction data would be collected and used?

    Not sure — that’s a judgment call.

    4) Would informed users be less likely to give Bing permission to collect and use their interaction data?

    Hard to know. I suspect users might say no because of privacy concerns, but not out of concern about improperly favoring Bing in its rivalry with Google.

    5) Assuming that users are informed, is Bing’s targeting of users’ interaction data with a competitor wrong?

    At last a moral question! I’d say no, but there’s certainly room to debate this one. Perhaps it’s a bit unsportsmanlike, but I don’t think it’s unethical–and, while I am not a lawyer, I don’t think it’s illegal.

  • 30 Robert // Feb 7, 2011 at 1:23 pm

    @Bored: Very thoughtful comment. I especially liked your suggestion that Google should have done a full experiment. I’d bet dollars to donuts that they did a lot more testing than they are owning up to, but they cherry-picked the most (only?) incriminating results.

    A test using genuine but very rarely used words might add some perspective, by assessing whether the strength of Google clickstream “signal” is at all significant when it’s not the only information available about a term.

    A sneaking suspicion tells me that the Google kids tried this kind of test, but we’re not hearing about it because it didn’t help them advance their cause.

    @Daniel Tunkelang: I’m with you on the user owning his or her side of the conversation, even if that side includes hints such as the Referrer. There’s much less reason to support a user’s ownership of a SERP, or their right to transfer it to a third party.

  • 31 Hao Wooi Lim // Feb 7, 2011 at 3:52 pm

    @Bored
    If the user is to truly own the click-through data, right now we do not have the tools or means to do so. Ideally there should be a web-accessible list of the data that I’m “sharing” with Bing or Google, and I should be able to revoke access to that data at any time.

    The other thing is whether a search engine should reveal to its users which search engine a particular result’s ranking is attributable to.

    I think this all boils down to how transparent a search engine should be. If Bing had been more transparent with regard to how they use the data they collect, and how they rank the stories, then this saga wouldn’t have happened the way it did.

  • 32 jeremy // Feb 7, 2011 at 3:55 pm

    There’s much less reason to support a user’s ownership of a SERP, or their right to transfer it to a third party.

    I strongly, strongly disagree.

    There is just as much user data wrapped up in the fact that I didn’t click something, as in the fact that I did click something. There is just as much implicit non-relevance in a non-click as there is implicit relevance in a click. Moreover, a click in large part derives its meaning from context, i.e. in relation to what was around it, that was not clicked.

    I as a user create that interaction. I create my clicks and my non-clicks. Therefore, I own that.

    If the sum of my clicks and non-clicks happens to also be capable of re-creating a SERP, then so be it. That doesn’t change the fact that the user still owns his/her actions, even if that action is an explicit decision to not click.

    And as the owner, the user can transfer that to whomever he or she wishes.

    (More about the topic in the following link. But, I warn you, it’s a long-ish rant. However, it does go into more detail about not only why the user owns his/her clicks and non-clicks but also about why both the clicks and non-clicks are of value to a user: http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/)

  • 33 PasserBy // Feb 7, 2011 at 4:19 pm

    Let’s consider this in terms of intellectual property. When Bing uses clicks on Google search results, it clearly uses Google’s intellectual property. Users just refine what Google offers them. So in a click there is Google’s contribution and the user’s contribution. I would say that the user’s contribution is fairly small compared to Google’s.

    So in my opinion Bing is stealing.

  • 34 Robert // Feb 7, 2011 at 5:44 pm

    @jeremy:

    My browser kept the “)” in your link; this one should work better:

    http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/

    Maybe your post makes this clear (I’ll read it later, I promise!), but your comment here on thenoisychannel could be used to support copying anything that appears on your browser, with the possible exception of copyrighted content. (Is Google’s SERP copyrighted? I imagine that its value-added components, including rank order, might be considered as IP, but I don’t know).

    Just my opinion, but I think the question of clickstream ownership is big enough for now, without getting into ownership of the SERP.

    Having said that, I was very surprised to read this post from a Bing developer (speaking for himself), implying that Bing indeed does make use of the unselected links on the SERP:

    http://willwhim.wordpress.com/2011/02/04/is-bing-cheating-at-search-redux-its-all-in-the-clicks/

    After that, I’m starting (just starting) to appreciate Google’s position. But more importantly, I’m less confident that either side will ever disclose all the pertinent facts in this mess. Maybe they’re both evil.

  • 35 jeremy // Feb 7, 2011 at 6:21 pm

    @Robert,

    No need to promise to read my post. It’s overly verbose, and I only mean to suggest, not inflict :-)

    but your comment here on thenoisychannel could be used to support copying anything that appears on your browser

    Actually, let me help you bypass my post completely, and point you to three other (shorter) posts that I link to:

    http://blog.jonudell.net/2009/10/08/magic-glasses-and-magic-projectors-private-versus-public-augmentation-of-experience/

    http://www.windley.com/archives/2009/09/claiming_my_right_to_a_purposecentric_web_sidewiki.shtml

    http://www.windley.com/archives/2009/10/its_my_browser_and_ill_autoclick_if_i_want_to.shtml

    All three posts address different aspects of the same fundamental question:

    do people have the right to control how Web content is displayed in their browser?

    That’s one of the larger issues here, and I think it is a fascinating question.

  • 36 Dave // Feb 7, 2011 at 6:55 pm

    Daniel: Thank you for generating comments on this topic as an ex-Google employee who is no longer bound to maintain tight and restricted commentary on Google’s actions. That is refreshing.

    I was not at all bothered by Bing’s actions, or swayed by Google’s comments. Google personnel were decidedly public on this issue. On endless other issues they never say a peep. (I spend a lot of time on Local Issues–where Google’s silence is extraordinary) It was only on this issue, wherein they opted to score points against their rival that they decided to become very “public friendly”.

    On search terms with even modest search volume the two engines provide similar but different results.

    On a variety of search terms with very scant activity I’ve seen all three engines (when Yahoo had its own engine) show the same site at the top of the SERPs based on the most minimal of anchor text. Results under the top-ranked site were different, but clearly if a single anchor text link (from a relatively weak web page, no less) could impact SERPs in all 3 engines, it speaks to algos that are “stretched” to look for signals in the absence of signals.

    The Bing results referenced by Google were obscure–even made up phrases. There were no signals.

    Correct me if I’m wrong, but at least in local/maps algo patent language were there not references to click through volume as a “signal” that could impact serps?

  • 37 Daniel Tunkelang // Feb 7, 2011 at 10:36 pm

    I’m not a lawyer, but even if search engine result pages are in general covered by copyright law, there would surely be a fair use exemption. The terms of service may be more strict than copyright, but I don’t know if they are enforceable. My own opinion is that mirroring Google without permission is clearly not ok, while sharing small amounts of information from your personal use of it is ok. But there’s surely some gray in between.

    Dave: can’t comment on how Google does ranking (NDA still applies), but it’s no secret that Google could do a lot more on the public relations front.

  • 38 Aaswath // Feb 8, 2011 at 1:08 am

    I think my NDA on the other side of things has long since expired :) (and so has the currency of my knowledge of their ranking systems), but here’s my take:

    Bing’s use of this toolbar information seems OK to me. I know Google’s toolbar collects a lot of information as well (whether about clicks on competitors’ SERPs is a good question, but I’m not sure why one wouldn’t). I assume Google would use this data for competitive intelligence/ benchmarking at the very least. I suppose the issue is that this appears to be directly plugged in to Bing’s ranker, though frankly, I can think of ways it could arise without a ‘direct’ connection. Here’s a thought experiment: Say Google discovers through Google Toolbar click data that on Bing, users were clicking on the top result for a query more than they were in Google for the same query. This leads to some development and investigation, and perhaps the algorithm is changed to correct for this. That seems fine to me, and I’m not sure there’s a big moral difference here between that and what Bing is apparently doing.

    Moreover, it would surprise me if this particular ‘signal’ is a strong factor in ranking for non-synthetic or non-ultra-tail queries. For one, if spammers have figured it out, I’d expect Bing to have corrected for the signal mattering! ;) Way too easily gamed! For these synthetic ones I’m guessing they’re showing up because Bing thinks this one little scrap of information it knows is better than showing nothing for a query it’s never seen before. For other more ‘normal’ queries, it would indeed be difficult to conclusively show that this signal is that significant a factor, because like they said, it’s one of 1000s. Also, there’s no evidence to me that this actually directly and *non-trivially helps* Bing in its search quality. If anything, Google has shown that it is very gameable.

    Overall, I suspect that instead of being that directly useful for ranking, this toolbar data is far more useful to Bing for analytics and competitive intelligence. Being able to satisfy users on random tail queries is nice, but only one small part when thinking of your overall search quality. (Some may disagree on this). Knowing how they’re interacting with, and using your competitor’s SERPs is far more interesting and useful; and something I suspect both sides are interested in and pursuing.

    [PS: If I might take issue with Matt Cutts’ reference to that paper: MSR papers on something don’t mean it’s actually being implemented in the search engine. They’re experiments and research after all, and hey at least they’re published openly ;)]

  • 39 Daniel Tunkelang // Feb 8, 2011 at 12:18 pm

    Aaswath, thanks for the informed commentary! Definitely need to catch up soon — been far too long!

  • 40 Daniel Tunkelang // Feb 8, 2011 at 12:21 pm

    Further reading on this debate that I recommend:

    Greg Linden: Google, Bing, and web browsing data

    John Langford: User preferences for search engines

  • 41 jeremy // Feb 8, 2011 at 2:37 pm

    At the risk of too much self-promotion, let me also point to another post that I wrote in Dec. 2009 that mirrors what Langford is saying, above:

    http://irgupf.com/2009/12/23/a-fragile-local-maximum-for-the-web/

  • 42 Zubair // Feb 8, 2011 at 2:38 pm

    Nice post and lively debate. IMO tough for Google to have it both ways on this – and I side with the openness and encourage Bing along. Happy for the competition.

  • 43 Jeremy Hoffman // Feb 9, 2011 at 5:27 am

    I’m a search quality engineer at Google. I don’t speak for Google, but here’s my personal perspective. (Two personal notes: 1. I had a great internship at Microsoft search in 2007 and I think they’re good people. 2. I had absolutely nothing to do with Google’s “sting” operation.)

    Consider the pharmaceutical industry, where bringing a new product to market requires a large sunk cost in R&D. If patents of drug formulas were eliminated, no one could recoup the R&D costs, so they would have to stop investing in developing new drugs, which would hurt the people who need those drugs.

    Now imagine this scenario: I spend six months developing, testing, and launching a new ranking signal that brings up some great new results for a whole bunch of queries. Three weeks after I finally launch, my great new results start appearing as the top Bing results for those queries, *automatically*. What did I do all that work for, then? How are my manager and I going to justify working on a similar project in the future? Software engineers ain’t cheap.

    Think that’s far-fetched? This is basically what happened to Google’s spelling team with their impressive correction from “torsorophy” to “tarsorrhaphy.” And since Google’s announcement, Bing has been unapologetically adamant that they will continue using click data in this way, so there’s no reason to think that they won’t try to do more of it in the future.

    If competitors can’t differentiate their products on quality, it removes the incentive to improve quality. Isn’t that bad for users in the long run?

  • 44 jeremy // Feb 9, 2011 at 10:35 am

    @JeremyH: Let me say that your argument sounds like the same argument that copyright holders (musicians, movie producers, etc.) were making when Google went out and bought YouTube.

    A musician spends months writing and then producing his or her songs. Three weeks after the album is released, those great new songs start appearing in Google’s YouTube *automatically*. What did the musician do all the work for, then? How is the dude and his band going to justify working on another album in the future? Writing and producing music ain’t cheap.

    Think that’s far fetched? This is basically what happens with YouTube.

    The arguments that I heard back from Google were that by exposing more people to the music, overall demand would increase, and musicians would sell more live seats at their concerts.

    Now, see my comment #6, above.

    Bing can’t copy the advertising that goes with the Google search results, because that requires the additional step of the advertisers themselves pulling out their dollars from Google and going to Bing. So if you think about it, advertising links on Google constitute the “live performance”, by analogy, to the ripped off YouTube musician.

    So given that Google claims that advertising is fantastically, wonderfully relevant to the user.. that it’s so helpful that Google refuses to make it opt-in, and refuses to provide an official opt-out.. then my argument is that Google should compete on “live performance”, i.e. advertising. That’s the way Google can truly make itself differentiable.

    And if Google truly believes advertising is good for the users, then yes, that is good for the users in the long run.

  • 45 jeremy // Feb 9, 2011 at 10:37 am

    (Please note, too, that in this analogy, songs are appearing in YouTube not because of what Google is explicitly trying to rip off, but because of what the users themselves are doing. Same thing with Bing<-Google results. Bing isn't actively going out and scraping/crawling Google SERPs. The users are, again, the ones doing it.)

  • 46 Daniel Tunkelang // Feb 9, 2011 at 12:48 pm

    @JeremyH: first off, thank you for taking a personal stand here. It means a lot.

    That said, I differ with your position. As a general rule, innovation is only a competitive advantage for the short term. Competitors learn and catch up. In some industries, patents offer a trade-off: tell the world how you did it, and you get a legal monopoly on your methods for the duration of the patent. But in this case, Google is protecting its IP using trade secrets, and there is no protection once information protected as trade secret is uncovered by others through reverse engineering.

    Of course reverse engineering may be unethical even if it is legal. If Bing were simply scraping SERPs to reverse engineer Google’s ranking algorithms, I’d call that unethical. Even if Bing were encouraging users to do that, I’d be pretty queasy, although I’d want to be clear that any such accusations apply to users as much as to Bing.

    But what’s happening here is more subtle. Users are telling Bing what they click on. Sure, they are clicking on links provided in Google’s SERPs, but surely users own their click data. I have a really hard time seeing users as doing something wrong here.

    Which leads me to conclude that the strongest statement Google could make against Bing would be something like the following:

    We would not object to what Bing was doing with toolbar data if Bing were to clearly disclose to users how the data is being used. Our users have the right to do as they please with their usage data, including giving it to Bing, but their consent should be an informed one. We still believe that targeting the usage data associated with a competitor is unsportsmanlike, but we recognize the right of our users to exercise ultimate control over their information, even if that means helping our competitors.

    I’d love to see Google take this higher ground.

  • 47 Al Miller // Feb 10, 2011 at 1:45 pm

    I still have faith that Google will win.

  • 48 jeremy // Feb 10, 2011 at 4:13 pm

    @Al: And I have faith that the user will win ;-)

  • 49 Life’s a Beach // Feb 14, 2011 at 8:48 pm

    […] to Punta Cana for a week. Feel free to keep writing great comments — will catch up when I get back! If you enjoyed this post, make sure you subscribe to my RSS […]

  • 50 Google Said, Bing Said | Ethics in the News // Mar 3, 2011 at 2:33 pm

    […] http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/ […]

  • 51 Corey K Katir // Jun 11, 2011 at 1:10 pm

    Is Bing A Better Search Engine?

    We have created a logical test that shows which search engine provides better search results. Google or Bing? I will explain the test on this page.

    First, I would like to make the test concept more clear with several examples:

    Say we take a series of Titles to search on Google and Bing for comparison.

    Here are several examples (all the tests are at http://www.rssfeedrss.com/index2.html):

    Title 1) Patients are willing to undergo multiple tests for new cancer treatments

    http://www.rssfeedrss.com/test2.html

    Title 2) Conference on composite materials for structural performance: Towards higher limits

    http://www.rssfeedrss.com/test4.html

    Now I will explain the way this test works.
    Each title is about two or three main keywords.
    For example Title 1 is about cancer treatment.
    Title 2 is about composite material.

    I propose a logical test that uses Google and Bing search results and extracts the main keywords in a logical manner. The better search engine will provide a better and more relevant extraction based on this logical test. I like to emphasize logic.

    Now what is this logical test?
    The better search engine provides search results that contain a higher number of main keywords in the search results page (usually in bold).

    For example, if we take title 1 to either Google or Bing, search on the whole title, and then count the number of times the main keywords appear in the search results (usually in bold), the better search engine will give us cancer treatment and not other words. That means if you count the number of times the keywords cancer treatment appear in search results in both Google and Bing, Bing provides a higher quantity.

    I used both Google and Bing for the test on the page http://www.rssfeedrss.com/index2.html and Bing provided a better search. You can do this test in-house.

    I will propose this test in search engine conferences. It is a valid test.
    I can email you the perl file that performed the test. Call 949-500-8638 or email info@katir.com.

    In fact, if you continue the test to second page results, it also shows which search engine provides better search results for the second page or third page or….

    Why is this test valid?

    It is not very complex to prove why this test is valid. If you type a sentence that contains several main keywords, you prefer more information about those main keywords. A higher quantity of those main keywords proves the page is more relevant and the search engine has delivered more relevant results.
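
A minimal sketch of the counting step described above, assuming the result snippets have already been collected as plain strings. The function name and the sample snippets are made up for illustration; this is not the Perl script mentioned in the comment, and it only counts whole-word keyword matches without judging whether a higher count means better relevance.

```python
import re


def keyword_hits(snippets, keywords):
    """Count whole-word, case-insensitive keyword matches across snippets."""
    total = 0
    for snippet in snippets:
        text = snippet.lower()
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            total += len(re.findall(pattern, text))
    return total


# Made-up snippets standing in for two engines' first-page results.
engine_a = ["New cancer treatment trials", "Cancer patients and treatment options"]
engine_b = ["Patients willing to undergo tests", "Cancer research news"]
main_keywords = ["cancer", "treatment"]

hits_a = keyword_hits(engine_a, main_keywords)
hits_b = keyword_hits(engine_b, main_keywords)
print(hits_a, hits_b)
```

Because the match is whole-word, a plural such as "treatments" would not be counted for "treatment"; a real comparison would need to decide how to handle such variants.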

  • 52 Strata 2012: Big Data is Bigger than Ever! // Mar 2, 2012 at 12:58 am

    […] to address my charge that Google doesn’t think users own their search history (cf. “Google vs. Bing: A Tweetle Beetle Battle Muddle“), but she said she was unfamiliar with the details of that event. I do wish that someone at […]
