Sometimes the response to a comment is worthy of an entire post, and this is one of those times. In response to my recent post about Able Grape, a wine search engine developed by Doug Cook (now Director of Twitter Search), Lee asked:
Let’s say I know almost nothing about wines/digital cameras/cars and a search site offers me “options” to drill down. However, I can’t use those effectively, and eventually it comes down to availability and price for me. My questions are: what are your thoughts on these kinds of situations, and is there a scientific explanation or theory for this case?
This may be why Google does not endorse faceted search except for experimental projects.
It’s a great question. There’s been a lot of research on how people make decisions when they have to manage trade-offs among multiple attributes, and the surge of interest in behavioral economics since Daniel Kahneman won the Nobel Prize in 2002 has helped; some of that research has even percolated into the mainstream thanks to bestsellers like Freakonomics and Dan Ariely’s Predictably Irrational.
The short answer is that there’s no point in offering users options that they can’t (or won’t) use effectively. Choice overload is certainly a problem, and our reaction to it is to satisfice, typically resorting to “fast and frugal” heuristics that throw out most of the potential decision criteria and instead focus on one or two attributes, e.g., price and availability.
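As a rough illustration, a “fast and frugal” heuristic can be sketched as a lexicographic chooser that consults one cue at a time and stops at the first cue that discriminates. The wines and attributes below are hypothetical, not real data:

```python
# Sketch of a "fast and frugal" lexicographic chooser: instead of weighing
# every attribute, consider cues one at a time (most important first) and
# stop at the first cue that discriminates among the remaining options.

def fast_and_frugal(options, cues):
    """Return the option that wins on the first discriminating cue.

    options: list of dicts mapping attribute -> value
    cues: list of (attribute, prefer) pairs, most important first,
          where prefer is 'low' or 'high'.
    """
    candidates = list(options)
    for attr, prefer in cues:
        values = [o[attr] for o in candidates]
        best = min(values) if prefer == "low" else max(values)
        candidates = [o for o in candidates if o[attr] == best]
        if len(candidates) == 1:  # this cue discriminated; stop here
            break
    return candidates[0]  # ties beyond the last cue: pick arbitrarily

wines = [
    {"name": "A", "price": 7, "in_stock": 1, "rating": 88},
    {"name": "B", "price": 7, "in_stock": 0, "rating": 95},
    {"name": "C", "price": 12, "in_stock": 1, "rating": 93},
]

# Price first, then availability.
choice = fast_and_frugal(wines, [("price", "low"), ("in_stock", "high")])
print(choice["name"])  # "A": cheapest in-stock wine, despite its lower rating
```

Note that the rating cue is never consulted: most of the potential decision criteria are simply thrown away, which is exactly what makes the heuristic fast, frugal, and potentially suboptimal.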
But that’s no reason to dumb down the data we make available to decision makers. We make hard choices all the time, and fast and frugal can be horrendously suboptimal. We don’t hire employees based solely on their price and availability–or at least good employers don’t! For that matter, I don’t think most people pick wines that way, given that even Trader Joe’s has to diversify beyond “Two Buck Chuck”. And, while there’s probably more of a market for cheap cameras and cars, I’m pretty sure you’re an extreme outlier if you completely ignore other criteria.
That said, there are some caveats about exposing options to users. Faceted search is hard, especially on the open web. Take it from the folks at Microsoft Research–but I’m sure Googlers would be the first to agree, especially given their experience with projects like Google Squared that, while promising, are nowhere near ready for prime time.
I appreciate that Google is conservative about embracing faceted search–and HCIR in general. I’m actually impressed by the steadily improving quality of their related terms for search queries–even if they do hide them behind two clicks (show options -> related searches). Perhaps they’re feeling some pressure from Bing. But I think they’re largely following the dictum of “if it ain’t broke, don’t fix it”. Google is an extremely successful company. And, as Clayton Christensen argues, successful companies are great at incremental innovation and bad at disruptive innovation. As far as I can tell, faceted search is very disruptive to their model.
20 replies on “Why Does Google Hold Back On Faceted Search?”
But I think they’re largely following the dictum of “if it ain’t broke, don’t fix it”. Google is an extremely successful company. And, as Clayton Christensen argues, successful companies are great at incremental innovation and bad at disruptive innovation. As far as I can tell, faceted search is very disruptive to their model.
The more time people spend exploring facets, the less time people spend clicking ads. Isn’t it as simple as that?
Speaking of Christensen, see also: http://irgupf.com/2009/03/19/long-term-versus-evolutionary-thinking-part-2-of-2/
Perhaps, but conversely faceted search offers users more opportunities to express their intent, which in turn offers the prospect of more targeted advertising. There may be interface challenges in where to put those ads, but it doesn’t sound like an insurmountable challenge.
I think the bigger issue is that faceted search on the web is hard, and offers many opportunities to mismanage user expectations. Google has incredible customer satisfaction. There’s a strong case for not risking that through innovation in the user interface. I think it’s not surprising that Microsoft is approaching such innovation more aggressively–though still not aggressively enough for my tastes.
Great post, Daniel. It’s interesting to think about this from a cost-benefit perspective, as you have. I’ve been focusing on a notion of support vs. complexity. In a space that is not well bounded, facets ‘make you think’. In a space that’s well bounded and understood, however, we maybe feel empowered. The functionality provides a benefit, and knowing the bounds reduces complexity. It’s maybe too simple, but less cynical than ad clicks 🙂
which in turn offers the prospect of more targeted advertising. There may be interface challenges in where to put those ads, but it doesn’t sound like an insurmountable challenge.
But doesn’t the more targeted advertising also require input from the advertisers? And that’s where it gets complex. Advertisers already have to manage all sorts of keywording options (exact match, subset match, fuzzy match, etc.). Once they have to start deciding which facets and subfacets they want to bid competitively for, it will become unmanageable for them. So while Google might be able to make web facets work, ad facets are even harder. So perhaps it still is the advertising that is holding them back.
Maybe. But I’m not convinced. As you note, specifying the rules for when an ad triggers is already complex, but I can’t imagine advertisers asking for a less expressive API. I’m pretty sure that advertisers are willing to accept a more complex interface if it leads to a more lucrative outcome. Indeed, the added complexity could be optional. But I’m pretty sure advertisers would quickly jump to use it, much as they use the matching options today.
I very much like faceted search, and I even wrote a couple of papers on using tag clouds to do something similar to faceted search.
Alas, with tools like DBLP, I am unimpressed by the practical result. It just does not “flow”. It does not do what I’d expect. (And it is not nearly as hard a problem as web search, you’ll agree, I’m sure.)
The ACM Digital Library integrated some faceted search (though I’m not sure you would call it faceted search) and it is terrible. I tried to refine a search today and ended up closing the browser out of frustration.
No doubt, Endeca must have gotten it right. (Some of the time, at least!) But clearly it is difficult.
I have not bought your book yet (sorry), but from what I could see in the reviews, you don’t seem to have some kind of magic wand to “make it work”.
So maybe we need to do more research…
Ah! But how to get started?
Is faceted search defined exclusively by pointwise metadata? Or are there other ways of defining facets, in your book?
Daniel, you’re right that there’s no magic wand. A lot of faceted search implementations–including, to my chagrin, some that are powered by Endeca–make questionable design choices. The reasons vary–design by committee, insistence on mimicking a previous interface that did not include faceted search, etc. As many have said: good design takes work; bad design just happens.
As for my book, I do offer what I hope is useful advice, much as Marti Hearst and others do elsewhere–much of the advice about search and even web usability in general applies to faceted search. But you’re right, there’s no magic wand. If I had one, I’d certainly charge more for the book!
Jeremy, I would not say that faceted search is defined exclusively by point-wise meta-data. But I don’t really explore alternatives in the book (unless you count the use of multiple entity types, e.g., refining books by facets of their authors). In theory, a facet doesn’t have to be explicitly represented in the data at all, e.g., a facet could be derived from other data elements. In practice, however, no one seems to take such approaches–probably because the query-time computation becomes a major challenge.
In practice, however, no one seems to take such approaches
Surely you can, OLAP-style, do a rollup. For example, given a “city” facet, you could roll-up to the “state” facet and then a country facet…
I’ll go order your book. 😉
Sure, but isn’t that just grouping records into aggregates based on their point-wise metadata? I thought Jeremy had in mind facets whose values were implicitly rather than explicitly represented, e.g., a facet derived from point-wise metadata. If you allow for such facets to be defined at query time, you get great flexibility, but at the expense of a serious (though in some cases acceptable) computational cost.
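To make the distinction concrete, here’s a minimal sketch of a query-time derived facet, using a hypothetical city-to-state rollup: the records carry only point-wise “city” metadata, and the “state” facet is computed on the fly per result set.

```python
# Sketch of explicit vs. derived facet counts. Records carry only a
# point-wise "city" value; a "state" facet exists only implicitly, via
# a rollup mapping applied at query time. Data and mapping are made up.
from collections import Counter

CITY_TO_STATE = {"Boston": "MA", "Cambridge": "MA", "Palo Alto": "CA"}

def facet_counts(results, facet):
    """Count explicit facet values over a result set."""
    return Counter(r[facet] for r in results)

def derived_facet_counts(results, source_facet, rollup):
    """Count a facet that is only implicitly represented, via a rollup."""
    return Counter(rollup[r[source_facet]] for r in results)

results = [
    {"title": "doc1", "city": "Boston"},
    {"title": "doc2", "city": "Cambridge"},
    {"title": "doc3", "city": "Palo Alto"},
]

print(facet_counts(results, "city"))
print(derived_facet_counts(results, "city", CITY_TO_STATE))
# The derived "state" facet: MA -> 2, CA -> 1. It must be recomputed for
# every query, which is where the query-time cost comes from at scale.
```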
In theory, a facet doesn’t have to be explicitly represented in the data at all, e.g., a facet could be derived from other data elements. In practice, however, no one seems to take such approaches–probably because the query-time computation becomes a major challenge.
Sure, but isn’t that just grouping records into aggregates based on their point-wise metadata? I thought Jeremy had in mind facets whose values were implicitly rather than explicitly represented, e.g., a facet derived from point-wise metadata.
Ah, it’s either fantastic that you “get” exactly where I was headed with that line of reasoning — because we’re on the same wavelength — or terrible, because my line of reasoning is so immediately transparent and naive 🙂
Either way, yes. This is what I was after.
Computation aside, I see those types of facets as still useful, but also something that advertisers would have a horrendous time trying to bid against. Given their implicit, user-driven nature, it would be difficult.
Otherwise, don’t we live in a “cloud” world these days? Isn’t one of the benefits of the cloud the fact that we can now start to do some of this higher-order computation?
I think it’s content variability and search spam that precludes any possibility of faceted results for web search.
Web search indexers can’t rely on page metadata, because it’s usually missing, duplicate, wrong, or deliberately misleading. They can’t rely on entity extraction because of language ambiguity (ruby on rails = red train on a track?). They can’t rely on any kind of content review process because the scope of their corpus is too big — even with crowdsourcing.
Dirty data is bad enough when enterprises own it and are able to clean it up for facets. But when the content is not just unreliable but also actively tuned to reverse-engineer search algorithms… I just can’t see it.
This situation does give enterprise search a nice space for improvement. I’m sure we’ve all pointed to facets when someone complains that the search is ‘not as good as Google’.
Avi, I partly agree, but I think “precludes” is a very strong word. For example, Able Grape doesn’t rely on page metadata and it does, IMO, a very respectable job. Granted, the content, restricted to a single domain, is hardly as variable as the general web, but the scale is respectable (18M pages crawled from 38K sites)–and it’s a one-person effort. I think there’s room for broader vertical plays, and that there’s even an incremental path to much of what we do with web search today.
The adversarial nature of web search is a real challenge. But I think web search engines make that problem even worse by reducing everything to one-dimensional ranking. Giving users more control through interfaces like faceted search makes it harder for a page author to execute an effective spamming strategy–it diffuses the adversary.
Yes, enterprises (at least most of them) do have the advantage of not having to deal with the partially adversarial dynamics of web search. But that’s no excuse for web search developers to rest on their laurels, exploiting the scale advantages of the web and the low expectations of users. It may be a great short-term strategy, but it does provide an opportunity for disruption.
I’m sure that Google, Microsoft, and others know this. The question is what anyone can and will do about it.
An even more surprising fact is that, despite some early marketing announcements, the Google appliance still does not support faceted search (while everybody would agree that this is a must-have feature for the enterprise).
So the answer to the question may well simply be: because Google has yet to develop the capability… It’s actually not as simple as it seems. Google does a lot of post-processing work on the search results (to remove duplicates, group by site, increase diversity, etc.), which makes it very hard to have meaningful counts in the facets.
Ouch, I didn’t realize the GSA add-on was that, um, lame. Thanks, that is quite surprising–I need to pay more attention and not take press releases at face value!
Good story. It’s all about structuring data within an ontology/taxonomy. The web is unstructured data, and more and more people are trying to structure the web and build taxonomies, for example with tags or via linked data formats. In my opinion, Google is trying to put mechanisms in place that will automatically build these taxonomies (by communities), so that they can then be used for faceted navigation and search.
Actually, the GSA does offer faceted search (at least through the front-end XSLT). I don’t know if it’s accessible through the API, and the facets are static. The Google team points to parametric search for dynamic facets, but it’s limited to 1000 (not 100) results, which eliminates any chance for deep discovery.
That’s actually another way to try to hack it, but it is not a viable solution in most cases… Without counts, this only amounts to some static shortcuts to new queries; with the counts (which at least would let you know that a link is useless and would lead to no results), it’s extremely inefficient, as explained in the documentation:
“The code will then submit an additional query per facet and obviously multiply the load to the GSA by the number of facets you have defined. ”
For the parametric hack, going to 1000 results just means that you need to retrieve 1000 extra results per query, making it again very inefficient…
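For illustration, the per-facet-requery workaround looks roughly like this; the tiny in-memory “engine” below is a hypothetical stand-in for the appliance’s query API, and the documents and facets are made up:

```python
# Sketch of the per-facet-query workaround: when an engine exposes no
# native facet counts, the front end issues one extra restricted query
# per facet value just to learn that value's result count.

DOCS = [
    {"text": "quarterly report", "type": "pdf"},
    {"text": "quarterly summary", "type": "html"},
    {"text": "annual report", "type": "pdf"},
]

QUERIES_ISSUED = 0  # track load on the "engine"

def search(terms, **filters):
    """Toy search: substring match plus metadata filters; returns a count."""
    global QUERIES_ISSUED
    QUERIES_ISSUED += 1
    hits = [d for d in DOCS
            if terms in d["text"]
            and all(d.get(k) == v for k, v in filters.items())]
    return len(hits)

def facet_counts_by_requery(terms, facet, values):
    # One extra query per facet value: load on the engine grows linearly
    # with the number of facet values shown, which is why this scales badly.
    return {v: search(terms, **{facet: v}) for v in values}

counts = facet_counts_by_requery("report", "type", ["pdf", "html"])
print(counts)           # {'pdf': 2, 'html': 0}
print(QUERIES_ISSUED)   # 2: one query per facet value
```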
I just wrote a blog post about this.
Jerome, I agree with you on the hackish nature of those GSA facets. The hardcoded ones are just way too static, and the “parametric” ones have seen no activity in 2009, which is not encouraging.
I guess the Google Enterprise Search folks just chose to put their effort elsewhere.