As long-time readers know, one of my recurring themes is that there is a world of difference between web search and enterprise search–at least as those concepts are understood today. The other day, I had a conversation with my friend Carl Eklof, and we arrived at an aspect of that difference that I have at best understated in the past. Let me try to elaborate it now.

In web search, the immediate results for a query are pages on web sites. But these pages aren’t necessarily “documents”. In fact, the most popular web sites are portals or destinations, designed to help a user shop, research specialized information, communicate with other people, etc. When a web search takes a user to a page on such a site, the site (if it is well designed) takes on the responsibility for contextualizing the user’s experience.

In contrast, enterprise content often consists of a heterogeneous collection of documents whose organization is at best implicit in its physical and logical arrangement. Departments within an enterprise may build user-centered portals, but it’s rare to see the sort of symbiosis that occurs between web search engines and the sites they index.

As a result, one of the challenges of an enterprise search application is that it must deliver a holistic user experience that compensates for the lack of effort on the part of the documents it indexes. Users still need context and guidance, but now the responsibility falls almost entirely on the search engine to deliver it.

Admittedly this picture is oversimplified. I don’t even like the term “enterprise search” because it’s often construed so narrowly. But I realize that many folks struggle with the idea that finding information within a proprietary document collection could be harder than doing so on the web. I hope this explanation helps shed some light.

Excellent point.

Isn’t it funny that people doing enterprise search have to “differentiate” themselves from web search?

If we go back in time, “enterprise search” came first… by something like 30 years. Ok. We did not have full text search in the sixties, but many of the first “enterprise” computers were used to help retrieve documents. Web search is a relatively recent addition. Yet, it is not the reference.


It is funny. When I went through the history of search in my MIT talk, I tried to make the point that search existed for a long time before the web, but that the mass adoption of the web meant that most people’s first experience with search was web search. Before the web, searching electronic collections was something that most of us associated with libraries–and often with professional librarians.


Daniel T: I agree with what you are saying. However, an even larger overlooked problem is the fact that the web itself also contains regions or subsets that are quite enterprise-like in nature. They have exactly the properties that you describe. And even though these web pages are just that... web pages, most internet search engines do a horrible job finding them.


Jeremy: you’re right. I do think that the popular evaluation of web search reflects a selection bias towards the queries that it handles well. Moreover, the ecosystem adapts: there is an entire industry of search engine optimization, which goes beyond pursuing traffic to creating landing pages best designed to exploit it. For queries that don’t land in this sweet spot, web search isn’t quite as compelling an experience.

Daniel L.: history is written by the victors.


I think one should be very careful about suggesting that the majority of web search is about getting people to portal sites. Certainly, the most popular queries have navigational intent. However, there’s a very long, rich tail and the majority of the interesting work goes on at that end. Representing web search by navigational queries is like representing English by stopwords.

That said, I’m not arguing that web IR = enterprise IR = legal IR = cross-lingual IR. Each corpus and query base has its particularities that need to be understood well before applying IR techniques.


However, there’s a very long, rich tail and the majority of the interesting work goes on at that end.

Could you give specific examples? Because frankly, I don’t see evidence of this work that you’re talking about.

If anything, I see a shocking lack of this work on the web. Take, for example, a company like Google. In their enterprise search, they do clustering of results. In their web search they do not. Why?


i “suspect” there is more clustering in web search than you think.

wrt the tail. head queries are pretty easy to detect and handle. tail queries are harder to detect, sometimes easy to handle, sometimes hard. in general, that’s what differentiates systems.


Clustering that is exposed to the user, I mean. Clustering w/ interactive feedback. Google Enterprise has it. Google Web does not.


Fight, fight, fight! But seriously, let me try to clarify.

Many queries lead users to portals even if they don’t have navigational intent. A search for a notable book title likely yields Amazon and Wikipedia pages as top results. A search for a person yields LinkedIn and Facebook pages. These aren’t navigational queries in the sense of Broder et al., but they do lead to portals.

And of course I agree that, to the extent that web search companies are doing interesting work, it’s almost entirely opaque to the user.


No, no fight. I was just seeking to clarify what each of us meant about clustering.

Am I incorrect? Is it not the case that Google Web lacks (user-facing, exploratory-search directed) clustering, while Google Enterprise offers it? If so, how can it be true, as the Google Enterprise guy said in that interview that you posted a few days ago, that they are trying to provide the same experience in the enterprise as they are on the web? That seems not to be the case, which was my main point.


