The Noisy Channel

 

Memo to Steve Ballmer: Just Ask Them!

March 20th, 2009 · 62 Comments · General

Dina Bass from Bloomberg reports today that Steve Ballmer is talking smack about Google:

“Google does have to be all things to all people,” Ballmer said yesterday in an interview in New York. “Our search does not need to be all things to all people.”

That is an interesting take on the search market–an attempt to turn a bug (Microsoft’s 8% market share vs. Google’s 63%–at least in the United States) into a feature. Ballmer essentially claims that Google’s near-monopoly stifles its innovation.

I agree that Google hasn’t been particularly innovative when it comes to search interfaces, but I’m not persuaded by Ballmer’s hand-waving “innovator’s dilemma” reasoning. Besides, as Chris Lake at Econsultancy notes, “it’s hard to know exactly what Microsoft is trying to be, in terms of search.”

Lake goes on to make an excellent point:

Search has always been about intent. That’s essentially what a search query is: an indicator of intent. You want something, you need something, you mean to purchase something, you’re going to do something.

But most queries do not reveal the exact nature of intent.

And here is the money shot:

The search engines might be able to determine intent automatically… But for me nothing works as well as asking the question, and seeking out some explicit data. Ask the question!

Yes, it’s like Feynman said: you just ask them! So much effort in the search industry aims at coming up with more clever ways to divine the user’s intent automatically, and so little focuses on building better tools to work *with* the user. Yes, I’m just beating the HCIR drum again–it’s what I do here. 🙂

But I can’t let a moment like this pass without pointing it out. If Microsoft wants a serious shot at Google, it should invest less in bribing users and more in HCIR.

Sadly, as Silicon Alley Insider’s Eric Krangel learned from a discussion with Microsoft Search director Stefan Weitz, “Microsoft doesn’t want to scare off users by introducing any dramatic changes to what people expect from the search engine experience.” Or, as Krangel summarized it pithily, what we can expect are “tweaks.”

I understand that Microsoft can’t ignore the fact that users have been trained on Google. But Microsoft is in a market where it has little to lose and everything to gain. This is the time to be bold, not conservative. Moreover, some of the best HCIR researchers are working for its own research division! Want to build a better search engine? Just ask them!

62 responses so far ↓

  • 1 jeremy // Mar 20, 2009 at 4:50 pm

    I long-windedly talk about some research done in which the search interface “just asks” the user:

    http://irgupf.com/2009/03/19/long-term-versus-evolutionary-thinking-part-2-of-2/

    By simply having a 5-line input box, rather than a 1-line input box, and asking the user to say more, the search engine takes a step in the right direction without adding confusion/complexity.

    And I think Google really is stifled by the innovator’s dilemma in this matter.

  • 2 Daniel Tunkelang // Mar 20, 2009 at 4:56 pm

    Just read it. As I commented there, I wonder whether it’s better to elicit long queries immediately or progressively, i.e., through iterative query elaboration. I’d think the latter would not only be more efficient, but also would give users a sense of making steady progress towards their goals.

  • 3 jeremy // Mar 20, 2009 at 7:53 pm

    My comment in return (also see the blog; this is where the blogosphere needs threaded comments) is that having more information to start with gives the back-end algorithm more to work with when inferring interactive elaboration suggestions.

  • 4 jeremy // Mar 20, 2009 at 8:00 pm

    It doesn’t have to be one or the other, long query vs. iteration. You can have long query and iteration. In fact, starting with a long query means you have more information to start with, which means you can probably do a better system-supported job at providing meaningful interaction and refinement.

  • 5 Daniel Tunkelang // Mar 20, 2009 at 11:37 pm

    My concern about starting with a long query is that the user may put in a lot of initial effort to get an initially disappointing result. I am curious about the point of diminishing returns, or when the cost (real or perceived) of entering a longer query outweighs the benefit.

  • 6 jeremy // Mar 21, 2009 at 12:13 am

    But that’s what I’m saying: if the user perceives no additional difficulty in entering a longer query, as appears to be the case, then we’re already starting at an even point.

    So there are three possibilities:

    (1) If both short and long queries return equally good results, there is no issue. Maybe the user gets mad at having entered an additional 2.25 query terms. But that takes all of, what, 1.4 seconds? It takes more time to read over the first result than it does to type an additional few words. Pretty soon, I bet the user won’t even notice that they’re entering more words. Over the past 5-6 years, hasn’t the average query length risen from 1.7 to 2.2 anyway? Users are already much more comfortable entering longer queries today than they were half a decade ago.

    (2) If one returns better results than the other, then the user would be more satisfied with the one that returned better results. According to Belkin, there is a higher likelihood of that query being the long query.

    (3) If both long and short queries return bad results, then the user of course isn’t going to be happy. BUT, let’s then look at the quality of the exploratory options and suggestions. My hypothesis is that the longer query is going to have a better range of suggestions, a better set of hints and pathways to lead the user out into what they’re really after, than a short query. In that case, even though the base results are bad for both long and short, the overall interaction is going to be better for the user when issuing the long query. Is an extra 2.25 words at the outset worth having much better exploratory options in the first shot? I would argue that it is, by orders of magnitude. It saves another whole query, as well as having to look through two sets of poor results rather than one.

    Maybe we should run a study. Anyone interested? 🙂

  • 7 jeremy // Mar 21, 2009 at 12:17 am

    Ok, this is just the result of 5 seconds of internet searching, but I found someone who claims that in 2007 the average web query length was up to 4 words:

    http://www.beussery.com/blog/index.php/2008/02/google-average-number-of-words-per-query-have-increased/

    Now I am 98% positive that earlier in this decade the average length was below two words. I think it was 1.7, but it was definitely below two.

    So if people have risen from 1.7 to 4 already, more than doubled their query length, it doesn’t seem all that unreasonable to have a taller input box, as Belkin studied, so that they naturally and effortlessly (without even thinking of it as extra effort) enter longer queries.

  • 8 Daniel Tunkelang // Mar 21, 2009 at 12:19 am

    We’re still talking about fairly short queries, so I’m inclined to accept your argument. In fact, it may even be less work to write a 4-word query than a 2-word one, along the lines of Blaise Pascal’s famous (if often misattributed) quotation, “I have made this letter longer than usual, only because I have not had time to make it shorter.”

    I thought we were talking about much longer queries that would take noticeably more time and effort to construct or even type.

  • 9 Daniel Tunkelang // Mar 21, 2009 at 12:20 am

    I’m also curious how type-ahead interfaces affect the equation.

  • 10 jeremy // Mar 21, 2009 at 12:21 am

    For reference: Here is a JASIS&T article from 2000 claiming an average of 1.66 words per query, on one search engine.

    http://jimjansen.tripod.com/academic/pubs/wus.html

  • 11 jeremy // Mar 21, 2009 at 12:23 am

    “I have made this letter longer than usual, only because I have not had time to make it shorter.”

    Really, was that Pascal? I always thought it was Jefferson. But yes, parsimony is difficult 🙂

    Type-ahead probably helps. Though someone had to have entered that longer query to begin with. And that longer query probably yielded better results, which I’m sure is part of the reason that query string becomes a likely type-ahead expansion candidate.

    So, yeah, longer queries! 🙂
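To make the type-ahead point concrete, here is a minimal Python sketch of prefix completion ranked purely by how often each full query string appears in a log. The toy log, the function names (`build_suggester`, `suggest`), and the frequency-only ranking are hypothetical assumptions for illustration; production type-ahead systems use many more signals.

```python
from collections import Counter


def build_suggester(query_log):
    """Count each logged query string; more frequent queries rank higher as completions."""
    counts = Counter(q.strip().lower() for q in query_log)

    def suggest(prefix, k=3):
        prefix = prefix.lower()
        matches = [(n, q) for q, n in counts.items() if q.startswith(prefix)]
        # Sort by descending frequency, then alphabetically so ties are stable.
        matches.sort(key=lambda item: (-item[0], item[1]))
        return [q for _, q in matches[:k]]

    return suggest


# Hypothetical query log.
log = [
    "exploratory search",
    "exploratory search interfaces",
    "exploratory search interfaces",
    "exploratory search user study",
    "explain like i'm five",
]
suggest = build_suggester(log)
print(suggest("exploratory s"))
# ['exploratory search interfaces', 'exploratory search', 'exploratory search user study']
```

In this toy log the longer query is also the most frequent, so it is offered first; note that a longer query can only become a suggestion after someone has typed it, which is Jeremy's caveat.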

  • 12 jeremy // Mar 21, 2009 at 12:26 am

    And now imagine simply selecting a whole relevant passage of text, and using that as a query. 50 words. You’re not looking for an exact quote. You’re looking for other documents that talk about similar things that are talked about in this 50-word passage.

    It’s as easy as pie to enter that 50-word query: Ctrl-C, Ctrl-V. And the search engine does a terrible job of utilizing it.

    There is so much richness there, in such a passage, so much “seed data” for exploratory search. So much the search engines should be doing, that they’re not.

  • 13 Daniel Tunkelang // Mar 21, 2009 at 12:45 am

    Points taken. There are real cases where it’s easy to enter a lot of initial data to work with, and it may be easiest to enter more than a couple of words even for most cases. I suspect I’m just experiencing a knee-jerk reaction against NLP query interfaces, but that’s not at all what you’re describing. More probably is better, particularly for an exploratory search engine that now has more fodder to work with.

    The crowd-sourcing aspect of type-ahead is a good point. I wonder how much search goodness can and should result from more sophisticated information seekers forging the best paths and others following them.

  • 14 Christopher // Mar 21, 2009 at 3:00 am

    If I were leading Microsoft’s search team, I would be designing on the edge, testing multiple new (or new to MS) HCIR ideas. Google is doing a fine job with the basic interface; the next leap in efficiency is going to come from a very different user experience.

    A note on type-ahead: it can be an excellent visual cue, but it can also create a pigeonholed view for the querier. It definitely has its place but must be used judiciously. As an example, I use it in some areas of my mobile application to cut down on keystrokes and surface contextual linkages.

    Type-ahead used properly can almost be seen as a discovery view within the query box if done right. It’s a balance though.

  • 15 MarkH // Mar 21, 2009 at 4:53 am

    Re long vs. short queries:
    surely the effectiveness hinges on AND vs. OR?
    Jeremy, in the paper you link to in your first comment, the Google “use short queries” quote is simply describing the effects of the default AND Boolean operator.
    The choice of AND over OR is significant here.

  • 16 jeremy // Mar 21, 2009 at 11:14 am

    @Daniel: I suspect I’m just experiencing a knee-jerk reaction against NLP query interfaces, but that’s not at all what you’re describing.

    Yup, I’m not talking about NLP at all.

    @Christopher: Type-ahead is not the final, best answer in HCIR interfaces, and I agree there is a risk of pigeonholing. I was just agreeing with Daniel that type-ahead has probably contributed to the rise in average query length from 1.66 words in 2000 to 4 in 2007.

    (Note, however, that Belkin et al. found an average query length of 5.45 in 2003. So there are still improvements to be made to current search engine interfaces.)

    @MarkH: Yes, Google has that AND problem. And that’s a problem. That interface (and yes, it is an interface, because it inserts an AND between each of your query terms) is a problem. First, it’s a problem because it is a very subtle but important alteration to your query, and the interface does an extremely poor job of letting the user know what is actually happening (you never actually see the ANDs). Second, it’s a problem because it becomes too restrictive as users start to ask longer queries.

    I consider as fundamentally flawed any engine that gets worse as you give it more information. That’s exactly the opposite of how a search engine should work.

  • 17 MarkH // Mar 21, 2009 at 1:05 pm

    >>gets worse as you give it more information.

    With ANDs I suspect 2 words are generally better than one, but then it begins to tail off rapidly. Of course, having Google-scale content always helps improve the chances of finding at least *something* with all your search terms.

    Anyhoo, ANDs are more efficient for an engine to process than ORs and this brings me back to a point I brought up round here previously about suspecting Google are largely constrained by scaling issues – not a lack of imagination about what might be useful search features.

    With this firmly in mind, complaining about Google’s service can seem like someone standing in the food queue of an over-crowded refugee camp loudly bemoaning the lack of haute cuisine.

  • 18 Daniel Tunkelang // Mar 21, 2009 at 1:16 pm

    Actually, Google does let you OR, even if it requires extra work. Perhaps they prefer it that way for scale reasons, but I suspect it’s more because AND is more intuitive for short (i.e., 2-word) queries.

    As for “the food here is terrible…and the portions are so small,” I think folks complaining about Google would do well to offer constructive suggestions. Better yet, to build them. The problem, of course, is that it isn’t enough to offer an alternative. Google is, after all, a near-monopoly.

  • 19 Christopher Rines // Mar 21, 2009 at 6:36 pm

    @Jeremy

    Ah, good point; yes, type-ahead is definitely an ingredient in longer queries.

    We need more research into query to document store formatting combined with some advances in how the data is returned. Still think someone like MS should take a shot at something different. Why compete on the same playing field when you can define your own?

  • 20 jeremy // Mar 21, 2009 at 7:55 pm

    “Anyhoo, ANDs are more efficient for an engine to process than ORs and this brings me back to a point I brought up round here previously about suspecting Google are largely constrained by scaling issues – not a lack of imagination about what might be useful search features.”

    I suspect you’re right that this is really about scaling. But that’s not what they say. They say that people don’t want to search for information in an exploratory manner. Well, that fellow from Google who commented on your blog a few weeks ago, Daniel, did seem to admit that it might be a scale problem. But everyone else I’ve heard talk about it, both publicly and privately, says things like “we checked the logs; nobody does exploratory searches. Therefore, we don’t need to solve that problem.”

    In short, their PR is that it’s not their own fault. It’s your and my fault. That doesn’t sit well with me.

    “With this firmly in mind, complaining about Google’s service can seem like someone standing in the food queue of an over-crowded refugee camp loudly bemoaning the lack of haute cuisine.”

    This is true, if I were at a refugee camp. Instead, I am at a camp where they say that they have world-class services and world-class ambitions. Their goal is to organize the world’s information. Not just the world’s information that is AND-scalable, but all the world’s information. That includes everything from Taco Bell to haute cuisine.

    In such a camp, I find it perfectly reasonable to complain about the lack of haute cuisine.

    If they changed the rhetoric and scaled back the mission statement, then I would be a fool to complain. But they don’t.

  • 21 jeremy // Mar 21, 2009 at 9:08 pm

    @Daniel: I suspect MarkH really is right; I was at the first WSDM conference, and there were a few papers about Boolean ANDs and speed/efficiency. The AND nature of the queries let you take shortcuts that you probably couldn’t have taken otherwise.

    But even if AND is the correct way to start, especially for most 2-word queries, I would expect more graceful degradation: the search engine giving you an interface mechanism for intelligently backing off the AND assumption through some sort of query refinement.

    In short, what the search engine is doing with your query should be more transparent than it is now. And refinements should be simpler and more transparent than having to type the whole query in again with a couple of ORs here and there.
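Jeremy's "graceful degradation" can be sketched as a back-off from strict AND to "at least m of n query terms" (often called coordination-level or minimum-should-match matching). The toy documents, the function names, and the relax-one-term-at-a-time policy below are illustrative assumptions, not a description of how Google or any particular engine behaves.

```python
def match(docs, query_terms, min_should_match):
    """Return ids of docs containing at least `min_should_match` of the query terms."""
    q = set(query_terms)
    return sorted(d for d, terms in docs.items() if len(q & terms) >= min_should_match)


def search_with_backoff(docs, query_terms, want=2):
    """Start with strict AND (all terms), then relax one term at a time
    until at least `want` documents match. Returns (terms required, hits)."""
    n = len(set(query_terms))
    if n == 0:
        return 0, []
    hits = []
    for m in range(n, 0, -1):
        hits = match(docs, query_terms, m)
        if len(hits) >= want:
            return m, hits
    return 1, hits  # even one shared term was not enough; return what matched


# Hypothetical toy index: doc id -> set of terms it contains (no stemming).
docs = {
    "d1": {"birds", "migration", "feathers"},
    "d2": {"birds", "zoo"},
    "d3": {"birds", "migration", "zoo", "feather"},
    "d4": {"recipes"},
}
query = ["birds", "zoo", "migration", "feather"]

print(match(docs, ["birds", "zoo"], 2))  # ['d2', 'd3']
print(match(docs, query, len(query)))    # ['d3']: strict AND shrinks as terms are added
print(search_with_backoff(docs, query))  # (2, ['d1', 'd2', 'd3'])
```

Reporting the value of m back to the user (for example, "results match 2 of your 4 terms") is one way to make the back-off transparent rather than silent.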

  • 22 Daniel Tunkelang // Mar 21, 2009 at 11:06 pm

    Try some OR queries on Google. They run very quickly. Remember–they’re not really returning a full set of results, so run time is hardly proportional to result set size. In fact, populating 10 arbitrary results from A OR B is easier than populating 10 arbitrary results from A AND B: you could take the top 10 results from each, dedupe, and sort. In contrast, even finding 10 results in A AND B might require much more work, especially if A is large, B is large, but A AND B is small.

    Thus I’m not persuaded that efficiency drives the choice of AND vs. OR in a web search engine that only shows 10 results, doesn’t even pretend its estimated number of results is meaningful, and doesn’t commit to a deterministic sort. I think it’s simply that AND is more intuitive for users for short queries. Or at least it is today.
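Daniel's shortcut can be sketched over toy posting lists that are already sorted by descending per-term score. Assumptions not taken from the thread: a document's OR score is its best single-term score, its AND score is the sum of its per-term scores, and "work" is approximated by the number of posting entries read. Real engines use compressed, skip-indexed lists, so this only illustrates the asymmetry, not any engine's actual query evaluation.

```python
import heapq


def top_k_or(postings, terms, k):
    """A OR B: read at most k entries from each score-sorted list, dedupe, sort.
    With max-of-terms scoring this is exact, apart from tie-breaking."""
    best, examined = {}, 0
    for t in terms:
        for doc, score in postings.get(t, [])[:k]:
            examined += 1
            best[doc] = max(score, best.get(doc, score))
    top = heapq.nlargest(k, best.items(), key=lambda item: item[1])
    return [doc for doc, _ in top], examined


def top_k_and(postings, terms, k):
    """A AND B: a doc must appear in every list; this naive version reads every
    entry, the worst case for an intersection. Scores are summed per doc."""
    lists = [dict(postings.get(t, [])) for t in terms]
    examined = sum(len(lst) for lst in lists)
    if not lists:
        return [], examined
    common = set(lists[0]).intersection(*lists[1:])
    scored = {doc: sum(lst[doc] for lst in lists) for doc in common}
    top = heapq.nlargest(k, scored.items(), key=lambda item: item[1])
    return [doc for doc, _ in top], examined


# Hypothetical postings: A and B are large, but share a single low-scoring doc "x".
postings = {
    "A": [(f"a{i}", 100 - i * 0.01) for i in range(1000)] + [("x", 0.5)],
    "B": [(f"b{i}", 90 - i * 0.01) for i in range(1000)] + [("x", 0.4)],
}
print(top_k_or(postings, ["A", "B"], 3))   # (['a0', 'a1', 'a2'], 6)
print(top_k_and(postings, ["A", "B"], 3))  # (['x'], 2002)
```

The OR query touches 6 entries while the AND query touches all 2,002, because the only shared document sits at the bottom of both lists: the "A is large, B is large, but A AND B is small" case.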

  • 23 jeremy // Mar 22, 2009 at 12:10 am

    Just did an experiment:

    Query 1a: [collaborative search OR exploratory search OR retrieval OR hcir OR explanatory search]
    Time elapsed: 0.31 seconds

    Query 1b: [collaborative search exploratory search retrieval hcir explanatory search]
    Time elapsed: 0.55 seconds

    Query 2a: [birds OR bird OR zoo OR migration OR feather OR feathers]
    Time elapsed: 0.16 seconds

    Query 2b: [birds bird zoo migration feather feathers]
    Time elapsed: 0.40 seconds

    Well, I must say that I stand corrected. I apologize. The AND queries seem to take twice as long as the OR queries, not the other way around.

    What about what the Belkin paper says, though? It’s not an OR query, and not an AND query. It’s a best-match query, which is a sum. You need to sum the relative contribution of every single word/term. So it’s not as “loose” as an OR, but not as tight as an AND. You can’t automatically filter a document out from your scoring computation step, just because it does not contain one or more of your query terms. You have to let document X (which let’s say does not contain all the query terms) still rank higher than document Y (which does). Because the 4 of 6 query terms in docX might have a higher cumulative tf.idf value (or whatever your scoring function is) than the 6 of 6 terms in docY.

    So is efficiency still an issue in a best-match environment?
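Jeremy's best-match point can be checked with a toy tf·idf scorer. Assumptions: raw term frequency, idf = ln(N/df), and a tiny hypothetical corpus; for brevity the query has 3 terms rather than 6, with document "x" matching 2 of them and "y" matching all 3. Real engines use tuned functions such as BM25, but the additive structure is the same.

```python
import math
from collections import Counter


def best_match(corpus, query):
    """Rank docs by the sum of tf * idf over query terms. Nothing is filtered out
    for lacking a term, unlike a strict AND; docs scoring 0 are simply omitted."""
    n = len(corpus)
    tfs = {doc: Counter(tokens) for doc, tokens in corpus.items()}
    df = Counter(term for counts in tfs.values() for term in counts)  # docs per term
    idf = {t: math.log(n / df[t]) for t in query if df[t]}
    scores = {}
    for doc, counts in tfs.items():
        s = sum(counts[t] * w for t, w in idf.items())
        if s > 0:
            scores[doc] = s
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


corpus = {
    "x": ["hcir"] * 3 + ["exploratory"] * 2,  # 2 of 3 query terms, repeated
    "y": ["hcir", "exploratory", "search"],   # all 3 query terms, once each
    "f1": ["search"],
    "f2": ["search"],
}
query = ["hcir", "exploratory", "search"]
ranking = best_match(corpus, query)
print([doc for doc, _ in ranking])  # ['x', 'y', 'f1', 'f2']

strict_and = [d for d, toks in corpus.items() if set(query) <= set(toks)]
print(strict_and)  # ['y']
```

Because "search" appears in 3 of the 4 documents, it contributes only ln(4/3), about 0.29, so x's repeated rarer terms (5 × ln 2, about 3.47) outweigh y's full coverage (about 1.67). A strict AND would never have scored x at all.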

  • 24 Daniel Tunkelang // Mar 22, 2009 at 12:25 am

    Ah, now you’re asking harder questions. If your scoring is a black box function, then returning the best results takes time linear in the size of your filtered set. But of course it may be a black box to us, but it’s not a black box to Google. So they can arrange their indexes to minimize the number of documents they need to look at. They can also cut corners, since users can’t actually verify the scores.

    A good resource on this topic is Trevor Strohman. He’s at Google now, which means he’s probably not allowed to reveal the secrets of the Illuminati therein, but before that he wrote a dissertation entitled “Efficient Processing of Complex Features for Information Retrieval”.
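Daniel's contrast between a black-box scorer and one whose internals you control can be sketched with generic upper-bound pruning, in the spirit of published dynamic-pruning techniques such as MaxScore and WAND. This is not a claim about what Google does. The key assumption is that a cheap `upper_bound(d)` exists that is never below the exact `score(d)`; the function names and toy scores are hypothetical.

```python
import heapq


def top_k_exhaustive(candidates, score, k):
    """Black-box scoring: every candidate must be scored, so the cost is
    linear in the size of the filtered set."""
    return heapq.nlargest(k, ((score(d), d) for d in candidates))


def top_k_with_bounds(candidates, upper_bound, score, k):
    """Skip the exact (expensive) score of any candidate whose upper bound
    cannot beat the current k-th best. Requires upper_bound(d) >= score(d)."""
    if k <= 0:
        return [], 0
    heap = []  # min-heap of (score, doc) holding the best k seen so far
    evaluated = 0
    for d in candidates:
        if len(heap) == k and upper_bound(d) <= heap[0][0]:
            continue  # cannot enter the top k, so never compute its exact score
        s = score(d)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (s, d))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, d))
    return sorted(heap, reverse=True), evaluated


# Hypothetical exact scores; the cheap bound overestimates each by 1.
exact = {"d1": 9, "d2": 8, "d3": 1, "d4": 2}
print(top_k_exhaustive(exact, exact.__getitem__, 2))
# [(9, 'd1'), (8, 'd2')]
print(top_k_with_bounds(exact, lambda d: exact[d] + 1, exact.__getitem__, 2))
# ([(9, 'd1'), (8, 'd2')], 2)
```

Both functions return the same top 2, but the bounded version computes only 2 exact scores instead of 4. Pruning pays off most when high-scoring candidates are visited early, which is one reason an engine might order its index by precomputed quality or impact.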

  • 25 Gene Golovchinsky // Mar 22, 2009 at 1:03 am

    Clearly we have nothing better to do on Saturday night. Oh well.

    To continue Jeremy’s thread of selecting text to form queries: one of the results from my PhD thesis was that people had strong preferences in terms of how to express queries, but that my system (which used InQuery from UMass) performed equally with respect to various recall and precision metrics.

    I had given users three choices: click on a link (that was created automatically, and resolved via a query), type some text into a textbox (3 lines high, I think), or select a passage in one of the previously retrieved documents. I did some clustering of users based on patterns of behavior, and found that clusters of people who preferred to follow links or select text had higher recall scores than those who typed queries. The complete results are more complex, but this is the relevant bit. In short, more text can produce better results, but people who type their own queries may not be as effective.

  • 26 Gene Golovchinsky // Mar 22, 2009 at 1:04 am

    More info in my CHI 97 paper.

  • 27 MarkH // Mar 22, 2009 at 6:15 am

    @jeremy :

    >>But all the world’s information. That includes everything from Taco Bell to haute cuisine.

    That’s turning a comment about service quality around to a discussion about the range of topics covered.

    What I meant by my “refugee camp” analogy is that Google is the equivalent of mass-catering. The volume of customers in this environment dictates that you cannot spend huge amounts of time servicing each customer. Fancier menu options will be less well understood by the average customer and take way too long to cook up.
    What seems to be missing from much of the criticism here about “the Google experience” is a basic acknowledgement of the mass-catering restrictions they work with. It’s hard to say exactly how much this restricts their thinking but at least consider this state of affairs in your criticism of them (e.g. tell me how your proposed improvement can be achieved *without* doubling Google’s existing compute costs of ~200ms per request).

  • 28 Daniel Tunkelang // Mar 22, 2009 at 11:15 am

    Mark, I actually am inclined to agree with you here. In fact, when I compared Google to McDonald’s in my “Reconsidering Relevance” tech talk, the Googlers didn’t even seem all that offended. Perhaps they embrace their place in the ecosystem.

    But what I don’t want is a world where, as in Demolition Man, every restaurant is a Taco Bell. I don’t mind Google as a mass caterer; I mind that everyone else thinks they should also be in the fast food business.

  • 29 jeremy // Mar 22, 2009 at 12:21 pm

    @Gene: “In short, more text can produce better results, but people who type their own queries may not be as effective.”

    Sounds like the perfect environment into which one should introduce exploratory search. 🙂

    @MarkH: “That’s turning a comment about service quality around to a discussion about the range of topics covered.”

    No, this is still about service quality, not about topics. You were the one who brought up haute cuisine as a service-quality analogy. I merely ran with it.

    “What seems to be missing from much of the criticism here about ‘the Google experience’ is a basic acknowledgement of the mass-catering restrictions they work with.”

    No, this is exactly what is not missing. This is exactly the point I was speaking to with haute cuisine. Yes, most of the people most of the time are going to want Happy Meal #3, i.e., the mass-produced experience.

    However, some of the people, some of the time, are going to want to be able to look deeper, to get more interesting (less popular) results, and are going to want to engage with the information a little bit more. What I am saying is that there should be a way for me to tell Google, “On these next few queries that I’m going to do, I don’t care if it takes ~500ms, or even 5 seconds, to get an answer back. Because it’s worth it to me to get a better answer, and I am willing to wait a little longer to get that answer.”

    Currently, there is no way for me to do that.

    That is the haute cuisine — the ability to also order slow food. Sure, make the default the fast, mass food. But if you say your goal is to organize the world’s information, i.e. “all the world’s food” (by analogy) then that should include slow food/information as well as fast food/information.

    So I also do not have a problem with Google being a McDonald’s, if that’s what they want to be. However, I do have a problem with them claiming to deal with all the world’s food, when they clearly only serve burgers and fries. Oh, and those horrible filet of fish things. Who even orders that?!

    Do you see what I mean?

    @Daniel: I also agree that an annoying byproduct of all this is that now everyone else is trying to be McDonald’s, too. So that no one else wants to be a “Zagat-rated” search engine, either.

  • 30 Daniel Tunkelang // Mar 22, 2009 at 12:55 pm

    “Oh, and those horrible filet of fish things. Who even orders that?!”

    It’s Lent, they’re 2 for $3, and there’s a recession on. Do you want fries with that?

  • 31 MarkH // Mar 22, 2009 at 2:28 pm

    >>I don’t care if it takes ~500ms, or even 5 seconds, to get an answer back.

    But Google (and anyone else running data-centres with hundreds of thousands of servers and billions of users) really, really does care about this number.

    No special orders, all ingredients flash-fried in an instant and no friendly waiter service. Welcome to mass catering, please take a tray and have your money ready.
    🙂

  • 32 jeremy // Mar 22, 2009 at 7:57 pm

    @MarkH: Why does Google care?

    Do they (1) care because they think the user cares?
    or
    Do they (2) care because they can’t afford the processing power to handle the 5 second query?

    If the answer is (1), well, then they’re wrong. This user (me) cares about better information more than he cares about having to wait a few seconds. In fact, I consider it more of a waste of my time to have to type in 2, 3, 4, or more queries, over and over, because Google sucked the first time around. That takes me more total time than if I had only had to wait 5 seconds initially. And Google is supposed to be serving the user. So they should care about my information need, but they don’t.

    If the answer is (2), well, then they can be as McDonald’s as they want to be. That’s completely their right as a corporation. But what they need to stop doing is portraying themselves as anything but a McDonald’s. They are not organizing the world’s information (i.e., cooking all the world’s food). They are only organizing that small portion of the world’s information (only those few dishes) that can be flash-fried in an instant. Which, again, they have every right to do. What they have no right to do is misrepresent what it is that they do. That “all the world’s information” misrepresentation has got to stop.

    Daniel’s right; if Google is unwilling to do it, then someone else needs to step up and offer haute cuisine information retrieval for the web. But in the meantime, Google needs to stop saying that they’re doing it. Their mission statement needs to change to “To organize some of the world’s information, and make only that portion of it accessible that can be discovered within half a second.”

    Right?

  • 33 Daniel Tunkelang // Mar 22, 2009 at 8:03 pm

    For what it’s worth, McDonald’s is now promoting the quality of its ingredients.

    And I think Google believes that the user does care. Take a look at this page:


    Every millisecond counts.

    Nothing is more valuable than people’s time. Google pages load quickly, thanks to slim code and carefully selected image files. The most essential features and text are placed in the easiest-to-find locations. Unnecessary clicks, typing, steps, and other actions are eliminated. Google products ask for information only once and include smart defaults. Tasks are streamlined.

    Speed is a boon to users. It is also a competitive advantage that Google doesn’t sacrifice without good reason.

  • 34 Christopher // Mar 22, 2009 at 8:19 pm

    What this discussion still highlights for me is the need for one of Google’s consumer search competitors or a new player in the general consumer search space to take a hard left and/or right turn and release search interface(s) that are radically different to Google’s. Bring on experimentation.

    In a nutshell, Microsoft et al. won’t beat Google or improve the user’s life by subtle changes to a Google-style query box & result set.

    While I don’t want to discount user knowledge, users simply don’t know they need something different.

    Google works for Google and many/most of its customers and that won’t change until someone releases a revolutionary change not an evolutionary improvement.

    I’d love to see what companies like Endeca or university/lab/independent research teams could do in the areas of user experience by connecting their HCIR advances pointed against an API exposed search back-end like Yahoo’s.

  • 35 Christopher // Mar 22, 2009 at 8:26 pm

    Follow-up to my last comment (34).

    While companies or individuals like myself are working on radically improving domain-specific HCIR or on specific indexing issues (again, like my work on semantic data such as people, places, and events), what we really need is companies or labs that can afford to spend resources & eat failed experiments in the open search domain, which is much more difficult.

  • 36 Daniel Tunkelang // Mar 22, 2009 at 8:49 pm

    The subject has come up among my colleagues at Endeca, but the history of companies trying to do both enterprise search and web search is not a pretty one. Yes, Google and Microsoft both have presences in both areas, but in both cases I think it’s as if their enterprise search divisions are separate companies.

    So I think the best options for an HCIR open web search play are either for a major company already significantly invested in web search to pursue it (i.e., Yahoo or Microsoft, conceivably Ask), or for a startup like Kosmix. It’s a major challenge either way, but better than trying to be an also-ran as Google cleans up on its near-monopoly.

  • 37 Christopher // Mar 22, 2009 at 8:55 pm

    Agreed, it’s hard to split focus between enterprise & consumer search. I’m not a real fan of either Google’s or Microsoft’s enterprise offering so it does show even the big guys struggle with it.

    Someone (like those you mention) has to take the shot though & really do something different.

  • 38 jeremy // Mar 22, 2009 at 10:56 pm

    @Daniel: And I think Google believes that the user does care. Take a look at this page: Every millisecond counts. Nothing is more valuable than people’s time. Google pages load quickly, thanks to slim code and carefully selected image files.

    Ever heard the expression “penny-wise, pound-foolish”?

    Google totally cares about speed and user time, on the penny level. 0.33 seconds is better than 0.5 seconds is better than 2 seconds.

    But if a user has to iterate 5 times on Google to finally get what they want, because Google doesn’t offer any exploratory IR user interfaces or algorithmic support, then the “pound” is so heavy that it outweighs all the penny gains.

    Let me put it another way: Using Google, suppose I have to run 5 two-word queries to get what I want. What is the Total Cost (capital letters) of that whole search task?

    There are three parts to that total time cost: query typing time, engine results return time, and user evaluation of results time.

    Let’s assume it takes 1 second per query word to type. Let’s also assume that it takes 2 seconds per query result (ranked list) to read/evaluate, and that Google returns an answer within 0.2 seconds.

    1 second/typedTerm * 2 typedTerms/query * 5 queries
    +
    2 examining seconds/result * 10 results/query * 5 queries
    +
    0.2 searching seconds/query * 5 queries
    =
    111 seconds total

    Now, contrast that with an exploratory engine in which the user types in more query terms, and it takes longer to get an answer back: let’s say it takes the search engine 5 seconds per query. And now, let’s suppose the user enters 5 words per query, rather than 2. But suppose that because the “slower” gourmet search engine returns better results, the user only has to run 3 queries rather than 5. The math works out as:

    1 second/typedTerm * 5 typedTerms/query * 3 queries
    +
    2 examining seconds/result * 10 results/query * 3 queries
    +
    5 searching seconds/query * 3 queries
    =
    90 seconds total

    So even though the “non-Googly” gourmet search engine took 5 seconds rather than 0.2 seconds to respond, per query, the total search task time was 21 seconds shorter when using that engine!

    In fact, with the former, Google-like engine, the total algorithmic search time across 5 queries was only 1 second, as opposed to 15 seconds spent by the gourmet search engine. Big savings, right? A 15x speedup? But this is being penny-wise: the total cost to the user of having to run more queries, even shorter ones, plus the cost of reading the top 10 results from each query to see whether they’re getting closer to what they’re after, is the pound. And if you’re foolish about the pound, then it doesn’t matter how many pennies you’ve saved.
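    The three-part cost model above (typing time, reading time, engine time) fits in a few lines of code, which makes it easy to re-check both totals or vary the assumptions. This is only a sketch: `task_cost` and its parameter names are hypothetical, and the defaults are the commenter's assumed figures (1 second per typed word, 2 seconds per result, 10 results per page), not measurements.

```python
def task_cost(queries, terms_per_query, engine_seconds,
              seconds_per_term=1.0, results_per_query=10,
              seconds_per_result=2.0):
    """Total seconds for a multi-query search task: typing + reading + waiting."""
    typing = seconds_per_term * terms_per_query * queries
    reading = seconds_per_result * results_per_query * queries
    waiting = engine_seconds * queries
    return typing + reading + waiting


# Scenario 1: fast engine, five 2-word queries.
fast = task_cost(queries=5, terms_per_query=2, engine_seconds=0.2)
# Scenario 2: slower "gourmet" engine, three 5-word queries.
gourmet = task_cost(queries=3, terms_per_query=5, engine_seconds=5.0)
print(round(fast, 1), round(gourmet, 1), round(fast - gourmet, 1))
# prints: 111.0 90.0 21.0
```

    In the first scenario, waiting on the engine accounts for only 1 of the 111 seconds; the other 110 are spent typing and reading.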

    Again, this is only true of certain types of queries, i.e. informational queries. Certainly, if you’re doing navigational queries, you want the first, 0.2-second search engine, because those usually involve only a single query and 1 to 3 top results examined. In that case there is less of a gap between the penny and the pound, so slight differences in penny-speed matter.
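    Running the same three-part cost model on a navigational task shows why latency matters more there. This sketch assumes one 2-word query, three results examined (the top of the 1-to-3 range mentioned above), and that the slower engine finds the target no faster than the fast one; `task_cost` is a hypothetical helper.

```python
def task_cost(queries, terms_per_query, engine_seconds, results_examined,
              seconds_per_term=1.0, seconds_per_result=2.0):
    """Total task time: typing + reading the examined results + waiting on the engine."""
    typing = seconds_per_term * terms_per_query * queries
    reading = seconds_per_result * results_examined * queries
    waiting = engine_seconds * queries
    return typing + reading + waiting


# Navigational task (assumed): one 2-word query, top 3 results examined.
for name, latency in [("fast engine", 0.2), ("slow engine", 5.0)]:
    total = task_cost(queries=1, terms_per_query=2,
                      engine_seconds=latency, results_examined=3)
    print(f"{name}: {total:.1f} s total, engine share {latency / total:.0%}")
# prints:
# fast engine: 8.2 s total, engine share 2%
# slow engine: 13.0 s total, engine share 38%
```

    Here engine time is a large fraction of the whole task on the slow engine, so the pennies make up a real share of the pound.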

    But for gourmet queries, Google is very penny wise, but pound foolish.

    So I don’t think that Google cares about every single user, because they’ve optimized their engine toward one kind of user, and against another kind of user. They’ve shown, through pound foolishness, that they don’t care about gourmet, deeper need users.

  • 39 Daniel Tunkelang // Mar 22, 2009 at 11:06 pm

    Like you, I don’t agree with Google’s approach. My point is that I think they are, in their view, putting users first–at least when it comes to emphasizing processing speed. Let’s not get started on advertising in this thread. 🙂

  • 40 jeremy // Mar 22, 2009 at 11:21 pm

    Oh, yeah, talking about advertising in this context would lead to another hundred comments 🙂 Totally exciting to a search geek. Totally boring to everyone else 🙂

    I think we need to throw the baby, the bathwater, and maybe even the nursemaid out!

    I kid, you realize. I kid 🙂

    But seriously, I think you’re right when you say that Google is earnestly and honestly trying its best, in its own mind and within certain constraints, to put the user first. I’m not completely disagreeing with you.

    It just has a colossal blind spot when it comes to certain aspects of what it really means to put the user first. A colossal one.

    Again there are two possibilities:

    (Possibility 1) G doesn’t realize that it has a blind spot.
    (Solution 1) In that case, it needs to come to that understanding. Someone needs to go to Google (say in NYC) and give a talk or something.

    (Possibility 2) G does realize it has that blind spot, but doesn’t want to do anything about it.
    (Solution 2) Google needs to change its motto to “Organizing Some of the World’s Information”, as suggested above.

    I’m curious which case it is: 1 or 2?

  • 41 Christopher // Mar 22, 2009 at 11:26 pm

    @Jeremy,

    It’s #2 BUT it’s not so much a blind spot as a decision to move into new interfaces slowly as they are unsure what it’ll do to their main business – advertising.

    Google has a long view and will slowly move towards really fulfilling their motto. I’m not convinced they will get there before someone else does, but they are doing lots of research and looking at many ideas.

    For competitors though this long view is the opening they need to create something truly new that changes the equation again.

  • 42 jeremy // Mar 22, 2009 at 11:36 pm

    “My point is that I think they are, in their view, putting users first–at least when it comes to emphasizing processing speed.”

    Again, I agree with you.. but what popped into my mind a few minutes after posting that last comment was a quote from Samuel Johnson: “The road to hell is paved with good intentions”.

    😉

  • 43 jeremy // Mar 22, 2009 at 11:45 pm

    “Google has a long view and will slowly move towards really fulfilling their motto.”

    How slowly? And how does all the work that they put into developing, say, Google Lively, play into this goal?

    And I heard a talk by Peter Norvig two years ago in which he described the research that Google does as very, very different from your typical PARC/Bell Labs research. He said that Google Research works by putting developments immediately into the hands of users, so that users give feedback on the product and the research stays directly tied to what users are doing.

    So if Google is doing all this research, why have I never seen any of it, during my normal, everyday use of Google?

    “It’s #2 BUT it’s not so much a blind spot as a decision to move into new interfaces slowly as they are unsure what it’ll do to their main business – advertising.”

    I.. But.. argh. Daniel said not to get into advertising on this thread. I’m chomping at the bit. We can come back to it later. Let me point you to this, for now:

    http://palblog.fxpal.com/?p=82

  • 44 Christopher // Mar 23, 2009 at 12:23 am

    While I may be overstating Google’s long view, and I agree they are quite different from a PARC or Bell Labs, I think it’s rather naive to believe that all the research Google is doing or sponsoring (which is a lot, I’d bet) ends up in the hands of users for beta testing and that none of it is speculative.

    Yes, they do a lot of applied, product-focused research, but I believe Peter’s statement is a simplified view of what is really going on inside.

  • 45 Gene Golovchinsky // Mar 23, 2009 at 1:22 am

    Re: @Jeremy’s comment on exploratory search: I think that’s what I was trying to do during my graduate work. It was a small step, granted, and it had little to do with the web. But overall, I think the notion of query-mediated links is one way to operationalize exploratory search while retaining some of the ease of interaction associated with hypertext links.

  • 46 MarkH // Mar 23, 2009 at 6:03 am

    Re 5 second queries:

    >>Do they (1) care because they think the user cares?
    >>Do they (2) care because they can’t afford the processing power to handle the 5 second query?

    It’s a mix but I suspect #2 is the dominant factor given the computing challenges they face: http://research.google.com/people/jeff/WSDM09-keynote.pdf

    While it might be nice to think of Google as some benevolent “organiser of the world’s information” they are a business and it would be unreasonable to expect them to throw limitless resource at that task.
    Stepping up from a 200 ms query to a 5-second query is an example of something which would demand a massive increase in overall compute resource.

  • 47 jeremy // Mar 23, 2009 at 12:11 pm

    “While it might be nice to think of Google as some benevolent ‘organiser of the world’s information’ they are a business and it would be unreasonable to expect them to throw limitless resource at that task.”

    Agreed. But then, is it not also reasonable to ask them to change their mission statement to one that is a little more honest?

    “Stepping up from a 200 ms query to a 5-second query is an example of something which would demand a massive increase in overall compute resource.”

    But would it really? I am not saying that Google should do 5 seconds of processing on every single query. Rather, I think there should be a button next to “I’m Feeling Lucky” that says “I’m Feeling Frustrated” or something like that. This button would then run a more deeply processed search, looking for more relationships among the data and uncovering those hidden, difficult-to-discover pieces of information (that haute cuisine) that are not discoverable by regular McDonald’s flash-frying.

    Suppose this button gets used 5% of the time. So now you have 95% of the queries taking 0.2 seconds each, and 5% of the queries taking 5 seconds each. Some quick algebra reveals that this would take a 120% increase in their current processing power (an average of 0.44 seconds per query instead of 0.2). A little more than double.
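    The blended-load arithmetic can be checked the same way. This sketch adds two assumptions of my own to the comment's figures: compute cost is taken to be proportional to per-query processing time, and each “frustrated” search replaces an ordinary one rather than running on top of it. `capacity_multiplier` is a hypothetical name.

```python
def capacity_multiplier(deep_fraction, fast_seconds=0.2, deep_seconds=5.0):
    """Ratio of the new average per-query processing time to today's.

    Assumes compute scales linearly with per-query processing time and that
    a deep search replaces (rather than adds to) an ordinary query.
    """
    current = fast_seconds
    blended = (1 - deep_fraction) * fast_seconds + deep_fraction * deep_seconds
    return blended / current


m = capacity_multiplier(0.05)
print(f"{m:.2f}x current capacity, a {m - 1:.0%} increase")
# prints: 2.20x current capacity, a 120% increase
```

    Under these assumptions the average works out to 0.95 × 0.2 + 0.05 × 5 = 0.44 seconds per query, 2.2 times today's 0.2 seconds. If each deep search were instead run in addition to an ordinary query, the average would be 0.2 + 0.05 × 5 = 0.45 seconds, or 2.25 times current load.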

    Well, I have an easy suggestion: dump Google Docs and dump Gmail. There’s your doubled capacity right there. In fact, they should never have developed those other products in the first place, because their corporate mission statement used to say (as of June 2004):

    It’s best to do one thing really, really well. Google does search. Google does not do horoscopes, financial advice or chat. With the largest research group in the world focused exclusively on solving search problems, Google knows what it does well and how it could be done better. Through continued iteration on difficult problems, Google has been able to solve complex issues that stymie others and provide continuous improvements to a service already considered the best on the web. Innovations like Google’s spell checker and the Google Toolbar, which enables users to search using Google from any website, make finding information a fast and seamless experience for millions of users. Google’s entire staff is dedicated to creating the perfect search engine and work tirelessly toward that goal.

    I guess they stopped dedicating their entire staff toward that goal of making a better search engine, and applied those extra resources elsewhere, because today their mission statement says:

    It’s best to do one thing really, really well. Google does search. With one of the world’s largest research groups focused exclusively on solving search problems, we know what we do well, and how we could do it better. Through continued iteration on difficult problems, we’ve been able to solve complex issues and provide continuous improvements to a service already considered the best on the web at making finding information a fast and seamless experience for millions of users. Our dedication to improving search has also allowed us to apply what we’ve learned to new products, including Gmail, Google Desktop, and Google Maps. As we continue to build new products* while making search better, our hope is to bring the power of search to previously unexplored areas, and to help users access and use even more of the ever-expanding information in their lives.

    So either they need to change their mission statement even further and say “Google thinks it’s best to do a couple of things really, really well and not just focus on one thing (search) like we used to”, or else Google needs to dump Docs and Gmail and use that extra processing power to give people real, deep, exploratory, valuable search.

  • 48 jeremy // Mar 23, 2009 at 1:14 pm

    ..because Google is a business, after all, like you said, and it’s not like Gmail and Google Docs are huge revenue generators. So dump them as big cost liabilities, and apply that same processing power toward improving search, something that is (or at least used to be) Google’s mission anyway.

  • 49 MarkH // Mar 23, 2009 at 1:17 pm

    I can see it now….
    “Dear Gmail users,
    a small but militant band of campaigners for ‘real search’ have taken us to task over our mission statement. Unfortunately their demands are such that we will be forced to withdraw the Gmail service. You have one week to notify all your contacts about your change of address and export your email before we permanently close all accounts. Thanks for your understanding in this matter”
    🙂
    But more seriously – the one thing Google “do really, really well” is provide a scalable computing platform for applications (be that search or mail or whatever). While you may want them to remain true to something else the reality is they have moved on in ways that are not easily reversed.

  • 50 Daniel Tunkelang // Mar 23, 2009 at 1:21 pm

    http://www.markcarey.com/googleguy-says/archives/discuss-googles-mission-with-gmail.html

  • 51 MarkH // Mar 23, 2009 at 2:09 pm

    …and while we’re considering the likelihood of large-scale UI change at Google:

    http://www.theregister.co.uk/2009/03/23/douglas_bowman_quits_google/

  • 52 jeremy // Mar 23, 2009 at 2:12 pm

    Yes, MarkH, that’s funny. I agree that wouldn’t work.

    But look at your serious response: “the one thing Google ‘do really, really well’ is provide a scalable computing platform for applications (be that search or mail or whatever).”

    Today, that is very true. I agree; that’s what they do well.

    But when did it change? When did they lose focus on their main goal, which was search?

    “While you may want them to remain true to something else the reality is they have moved on in ways that are not easily reversed.”

    I want them to remain true to what they say is their main goal. Google’s mission statement does not say: “It is good to do one thing really well — scalable architecture”. No, it says search.

    So they can do whatever they want. If they say search, then do search. If they are no longer doing search, but doing scalable architectures, then just change what their mission statement says. And do it publicly, visibly, and transparently. The same sort of transparency that Tim O’Reilly wants out of Government 2.0, I want out of Corporations 2.0.

    Is it really so unreasonable for me to ask them to be honest and transparent about their mission statement? I thought Google was a different kind of company. A special one.

  • 53 jeremy // Mar 23, 2009 at 2:16 pm

    @Daniel:

    GoogleGuy says:
    Google’s mission: to organize the world’s information, making it universally accessible and useful. Email == information. 🙂

    Email == information? Oh, that is sooo rich. Guess what? Device drivers are also tools for information organization. So why doesn’t Google build a consumer-facing operating system, while they’re at it?

    Computer science, by definition, is the study of the representation, transformation, and reuse of information. How broad do you want to go? At some point, as Google keeps reinterpreting its mission statement, it becomes so broad that it has no meaning and no value. Is that what the founders really intended?

  • 54 jeremy // Mar 23, 2009 at 2:23 pm

    “Dear Gmail users,
    a small but militant band of campaigners for ‘real search’ have taken us to task over our mission statement.”

    You forget, MarkH, that when Google started, its founders were a small but militant band of campaigners for ‘real search’.

    Remember?

    😉

  • 55 MarkH // Mar 23, 2009 at 2:30 pm

    🙂

  • 56 jeremy // Mar 23, 2009 at 3:14 pm

    MarkH, how about this letter instead?

    “Dear Gmail users,

    A small but militant band of campaigners for ‘real search’ reminded us that we used to be a small but militant band of campaigners for ‘real search’. We realize that we have lost our way, loosened our focus, and generally halted all major search developments other than trying to decide whether the lines of a box should be 3 pts, 4 pts, or 5 pts wide.

    Therefore, in order to return to our core competency and passion as a search company, we are suspending active development on Gmail, the same way we similarly suspended development of Google Video a year or so ago. This will allow us to rededicate ourselves to the application that made us great in the first place.

    Unfortunately, this re-dedication comes at a cost; we cannot do everything. As a result, we will slowly start to phase out the Gmail service. Your email will still be available via POP3 for another two years. And we will continue forwarding email sent to your Gmail account for the next 5 years. However, we will be dedicating our servers to improving search, not processing your email so that we can serve advertisements next to it.

    Thanks for your understanding in this matter.”

  • 57 Christopher // Mar 23, 2009 at 9:46 pm

    @Jeremy,

    I have to applaud your passion on this topic, but email is searchable information, and valuable searchable information at that. IMO comparing it to device drivers’ information feeds is not valid. Email & communication streams in all forms contain more value than almost any other form of information; it’s truly general purpose, whereas information created and collected by OS-level services, while important to a few, is not important to many.

    Whether right or wrong, it looks like Google is trying to organize the majority of personal information, or information a person would like to have access to.

    Now while staying away from advertising it’s obvious why they want to index as much personal information as possible – targeted advertising. 😉

    I spend a lot of time working with communication data, and my personal view is that being able to search this stream is even more important than expanding the available public data. Don’t get me wrong I want to be able to access and navigate as much data as possible but in my day to day life it’s communication streams, domain specific documents (which often come with a domain specific index & ui) and finally more results in my general search.

    For me, improving general search still comes down to improving HCIR. Everyone here knows I am a fan of what Endeca does in enterprise search, and I want someone to do it for general search, or even try something truly new that none of us has thought of.

    I think faceted, exploratory UIs are the next step in turning data into information. The engine does not have to read my mind; let me interact. I built a rudimentary version of this for email as an experiment, and even in that area it seems to show marked improvement in accuracy and recall.

  • 58 Christopher // Mar 23, 2009 at 9:52 pm

    A final comment from me on general search and less than stellar recall.

    Here is what I have seen; I’ll call it 3 steps to happy:

    General consumer does a search at Google, finds what they want = Happy.

    General consumer does a search at Google, can’t find what they are looking for, reaches out to a “computer literate” friend or family member = Happy.

    General consumer does a search at Google, can’t find what they are looking for, reaches out to a “computer literate” friend or family member; the expert can’t find the answer either, writes a social message to their social graph, gets an answer, and passes it on to the general consumer = Happy.

    I have rarely got to the 3rd stage and not been able to get a result. In fact I don’t remember the last time I actually didn’t.

  • 59 Daniel Tunkelang // Mar 23, 2009 at 9:58 pm

    I fail to find things on Google a bit more frequently, but in most cases I don’t know if the information I’m looking for is even out there. My complaint about Google is not that it doesn’t help me establish confidence that my failed searches mean the information isn’t out there.

  • 60 Christopher // Mar 23, 2009 at 10:05 pm

    Definitely not a Google fanboy here; there’s lots that’s wrong, and I fully agree with this statement, which is one of my pet peeves as well: “My complaint about Google is not that it doesn’t help me establish confidence that my failed searches mean the information isn’t out there.”

    I have found that, using my 3 steps to happiness, I tend to get a result, whether directly or from a helpful third party.

  • 61 jeremy // Mar 23, 2009 at 11:28 pm

    Christopher, Thank you for not mistaking my passion for belligerence 🙂 I tend to get a little fist-pounding at times, but rest assured it’s more bark than bite. This is all quite friendly discussion.

    You say: “Don’t get me wrong I want to be able to access and navigate as much data as possible but in my day to day life it’s communication streams, domain specific documents (which often come with a domain specific index & ui) and finally more results in my general search.”

    I guess I don’t really share this same need. Most of the time, email that comes and goes just comes and goes. I really don’t need to search back over it. If there was an important link that came in via email, I will have already bookmarked it. If there was an address that I needed to write down, I will write it down.

    If I ever do need to search for something in email, it’s not to find one, top-ranked message. It’s mostly to find and reconstruct a series of he said/she said exchanges, between and among multiple recipients, which sometimes have occurred over many weeks or months, in order to reconstruct the logical flow of some issue my team is dealing with. But then, that sort of information need is an exploratory, recall-oriented need, and is something that Google already sucks at anyway. So having Google/Gmail doesn’t really help me, because they don’t do that kind of search.

    What I question about diving into email is the need or the efficacy of Google having to own not only the raw data, but also the UI, the servers, etc.

    In short, if email really is so important for search, then Google should be integrating email searching features into Google Desktop so that it can search email already on your system. It should be working on tools to crawl Exchange databases, to parse old Eudora and Pine and Elm mailboxes, and (with your permission and password) crawl and index your Yahoo and Hotmail accounts. That way, and only that way, would email search truly be useful.. because Google would really be searching through whatever email tool I used.

    Instead, they chose to develop a whole, hosted email service themselves. And while your data is not locked in, your ability to search that data is. In short, Google provides an inferior search solution, because it only has the ability to search your Google email. Gmail is wholly incapable of searching your old Pine and Eudora mailboxes. Which I’ve frankly got quite a few of.

    Know what I’m saying?

    And Daniel, that’s a great way to express it: Google gives you no way of establishing confidence that you’ve done a thorough and proper search, and the information simply isn’t there to be found. Apologies for promoting my own blog posts in your comments section — not something I want to make a regular habit — but I had a similar thought a few weeks ago and wrote about it:

    http://irgupf.com/2009/03/04/ranked-lists-and-the-paradox-of-choice/

    Finally, Christopher, in response to your comment #60: Maybe for many of your information needs you find your answer via one of those three steps. But let’s take this back to your email example. What if you’re searching in your email, and can’t find it, but know that it has to exist? You try a few Google searches and they fail. Do you really then ask a computer expert / friend to search your email for you? And after that, do you really chat up someone on your social graph, to find those email(s)? It seems like email search is not something where you could ask the resident computer expert or your social network for help, right?

  • 62 Christopher // Mar 23, 2009 at 11:37 pm

    Regarding #60, no, this does not work for email at all, and I certainly agree that I’d like an exploratory UI for my email as well as for my general search.
