Should We Build Task-Centric Search Engines?

Greg blogged today about a video of a DEMOfall08 panel on “Where the Web is Going” where Peter Norvig from Google and Prabhakar Raghavan from Yahoo both advocated that, rather than supporting only one search at a time, search engines should focus on helping people accomplish larger tasks, such as booking a vacation or finding a job. I won’t be so vain as to assume they read my blog; these are simply canonical examples of tasks that current search engines don’t address very well.

I do see value in building task-specific applications that encapsulate the process of accomplishing particular classes of tasks, including any information seeking necessary toward that end. But I’m not convinced that such applications live inside search engines. Rather, I think that a search engine (if that is still the right word for it) should be adaptive enough for task-centric applications to leverage it as a tool.

Perhaps it’s natural that leading researchers from Google and Yahoo have a search-centric view of the world. Given my daily work, I sometimes lapse into that view myself. But it’s important for us to realize that search–or, more broadly speaking, information seeking–is a means to an end. At least in the future I envision, information seeking support tools will be so well embedded in task-centric applications that we will almost never be conscious of information seeking as a distinct activity.

By Daniel Tunkelang

High-Class Consultant.

20 replies on “Should We Build Task-Centric Search Engines?”

That is the right idea: “a search engine should be adaptive enough for task-centric applications to leverage it as a tool”.

I don’t know why people do not learn more from the Unix model. Lots of small, bug-free, highly optimized tools. You can then chain them together to make whatever you need.

That might not be what the user wants, but as a tech guy, that’s what I want to see. And maybe, for anyone more sophisticated than a low-level clerk, the Unix model is workable: have them learn 12 or 15 tools and let them combine those tools to be productive.


The Unix model is great for building applications. Naturally most users don’t want to build applications themselves, but they benefit from open architectures that allow tech guys to put nifty applications together.

But I think there’s more to this than the gospel of open architecture. Too many people see search as a central activity. We’ve got to get over this transitory phase. Along the way, I hope that we’ll stop confusing search as a problem (i.e., information seeking needs) with search as a solution (i.e., the technology that retrieves content based on a query).


One challenge for this combinatoric approach to supporting exploratory search is the need for more capable APIs that provide the variety of data and computation that specific applications need.


I wrote a story in, hmmm, 1996 on “vertical search engines” for Web Week. No reason not to have them, but the fact is — and here I think Dan and I again find ourselves agreeing — search is either

1. A Great Big Search Brand (Google, Alta Vista, Yahoo, Lycos)


2. A brand that isn’t search so much as it is a task-related brand, such as Kayak, the New York Times, or Gartner. These brands have search-oriented value propositions, with search as a critical element, and they may have federated capabilities or process-completion capabilities (go to the Gartner Web site and do a search on “enterprise search value calculator” to see what I mean in our context).


Gene: I agree that search engine APIs seem impoverished for supporting exploration. That’s part of why I push so hard for set retrieval.

Whit: That’s a nice way to segment the market, even if the online world has changed a bit since 1996. Information seeking today revolves around the Great Big Search Brand. But we’re headed for a world where end users think about tasks rather than about search. In that world, search technologists provide infrastructure, not the end-user destination.


More on search APIs: search on the web (typically) consists of three tasks: a) crawling and updating, b) indexing, and c) retrieval. They all involve different core competencies. It would be interesting if there were a way to segment the search space to allow independent innovation in all three segments without duplicating effort or sacrificing expressiveness, particularly in indexing and querying. This implies (to me) that the indexer should have all sorts of metadata about documents (links, how often content changes, etc.) and the retrieval component should have access to document and collection statistics (e.g., TF, IDF, etc.).
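As a minimal sketch of the indexing/retrieval split described above, assuming a toy in-memory index: the indexing component exposes term frequencies and document frequencies through a small interface, and a separate retrieval component computes TF-IDF scores using only that interface. The names `InvertedIndex`, `term_freq`, `doc_freq`, `docs_with`, and `tfidf_rank` are hypothetical, and plain TF-IDF stands in for whatever ranking function a real retrieval component would use.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Hypothetical indexing component: stores postings and exposes statistics."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.num_docs = 0

    def add(self, doc_id, text):
        # Assumes each doc_id is added only once (no updates in this sketch).
        self.num_docs += 1
        for term, count in Counter(text.lower().split()).items():
            self.postings[term][doc_id] = count

    def term_freq(self, term, doc_id):
        # TF: occurrences of term in one document (0 if absent).
        return self.postings.get(term, {}).get(doc_id, 0)

    def doc_freq(self, term):
        # DF: number of documents containing the term.
        return len(self.postings.get(term, {}))

    def docs_with(self, term):
        return set(self.postings.get(term, {}))


def tfidf_rank(index, query):
    """Hypothetical retrieval component: ranks using only the index's statistics."""
    terms = query.lower().split()
    candidates = set()
    for term in terms:
        candidates |= index.docs_with(term)
    scores = {}
    for doc_id in candidates:
        score = 0.0
        for term in terms:
            df = index.doc_freq(term)
            if df:
                idf = math.log(index.num_docs / df)  # IDF from collection statistics
                score += index.term_freq(term, doc_id) * idf
        scores[doc_id] = score
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = InvertedIndex()
index.add("d1", "cheap flights to paris")
index.add("d2", "paris hotels")
index.add("d3", "job listings in boston")
print(tfidf_rank(index, "paris flights"))
```

Because `tfidf_rank` touches the index only through `term_freq`, `doc_freq`, `docs_with`, and `num_docs`, either side could in principle be swapped out independently, which is the kind of separation the comment is asking about.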


Gene, it seems to me that the IR research community sees (a) as a systems issue and (b) as an implementation detail of (c). Naturally, all of the attention focuses on (c).

While I can see the virtue of pursuing independent innovation on subtasks–and I think this has been done for crawling, though perhaps not by the IR community–I’m not sure how we can meaningfully separate indexing from retrieval. How do we measure the quality of indexing in a vacuum?


While the quality of results from an end-user perspective is a system-wide measure (gotta have the right docs, they gotta be indexed well, gotta have good query matching), it should still be possible to assess how well the indexing component supports a rich query interface. Does it support document modification? Does it expose corpus statistics? Does it allow persistent sub-collections to be defined? etc.

I think we would see many more task-specific search engines if it were possible to leverage existing indexing services to construct specialized collections and associated query capability.
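To make that checklist concrete, here is a minimal sketch, under assumed names, of an indexing service that answers the three questions posed above: documents can be replaced or deleted, corpus statistics are exposed, and named sub-collections persist between queries so that a task-specific application can search only its own slice. `IndexService`, `upsert`, `define_collection`, and `stats` are hypothetical, not any real product's API, and "persistent" here only means the definition survives across queries within one process.

```python
from collections import Counter


class IndexService:
    """Hypothetical indexing service for task-specific applications."""

    def __init__(self):
        self.docs = {}         # doc_id -> Counter of terms
        self.metadata = {}     # doc_id -> dict of fields
        self.collections = {}  # collection name -> predicate over metadata

    def upsert(self, doc_id, text, **fields):
        # Document modification: re-adding an existing doc_id replaces it.
        self.docs[doc_id] = Counter(text.lower().split())
        self.metadata[doc_id] = fields

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)
        self.metadata.pop(doc_id, None)

    def define_collection(self, name, predicate):
        # A sub-collection is stored as a rule, so it reflects later updates.
        self.collections[name] = predicate

    def members(self, collection=None):
        if collection is None:
            return set(self.docs)
        rule = self.collections[collection]
        return {d for d, meta in self.metadata.items() if rule(meta)}

    def stats(self, collection=None):
        # Corpus statistics, optionally scoped to a sub-collection.
        ids = self.members(collection)
        df = Counter()
        for d in ids:
            df.update(self.docs[d].keys())
        return {"num_docs": len(ids), "doc_freq": dict(df)}

    def search(self, term, collection=None):
        # Counter returns 0 for missing terms without inserting them.
        term = term.lower()
        return sorted(d for d in self.members(collection) if self.docs[d][term] > 0)


svc = IndexService()
svc.upsert("a", "Boston software job", kind="job")
svc.upsert("b", "Boston hotel deals", kind="travel")
svc.upsert("c", "Denver data job", kind="job")
svc.define_collection("jobs", lambda m: m.get("kind") == "job")
print(svc.search("boston", collection="jobs"))
print(svc.stats("jobs"))
```

Storing a sub-collection as a rule rather than a frozen list of IDs is one design choice among several: it keeps the collection consistent as documents change, at the cost of re-evaluating the rule on every query.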


Gene, I hear you loud and clear. But I’m trying to translate this into language that the research community could relate to, i.e., What Would TREC Do?

Speaking as a practitioner who focuses on enterprise settings, I can tell you that it’s easy to create a checklist of features that software buyers could use for their RFPs. What is hard is to provide guidance to them on how each of these features affects the overall experience that an application ultimately delivers. And, naturally, vendors provide guidance that emphasizes the importance of their best or unique features.

What I’d like to see is an enumeration of “plays well with others” scenarios that provide realistic tests of an engine’s ability to be integrated into specialized applications. Unfortunately, this is an area underserved by both academic and industry analysts. See my earlier “call to action” posts. They did draw interest from some of the major industry analysts, but there’s got to be another leading vendor besides Endeca interested before anyone will pay attention.


There seems to be a continual meme running through the tech & IR communities around “what is the next Google?”. I assert that there isn’t going to be a next Google, and Daniel has made essentially the same point elsewhere (e.g.

As has been argued elsewhere by myself & others in The Noisy Channel, task-centric information seeking is where the opportunity for innovation is (We covered some of this ground in one of Daniel’s previous posts here: Rephrasing Daniel’s current question a bit, is this going to be the province of vertical search engines, or will these capabilities be embedded in applications?

There’s a basic definitional question that needs to be answered first – is Kayak a search engine or an application? I think if you asked Kayak, they’d say they make a search engine. If you asked my mom, she’d call it an application. In this case, I think it may be a distinction without a difference. Search is in the eye of the beholder. One might argue that Kayak isn’t a search engine because it’s not using web data, it’s using structured data from feeds and databases, but that might be begging the question…

To my mind, the biggest challenge in consumer-oriented IR comes from the state of data on the web. Google is mired in a world of words, tokens, html – it’s fundamentally text-centric, like much of the web. A raft of companies are trying to impose order on this chaos through application of NLP or other statistical classification technologies (FirstRain, Clearforest,, or through rebuilding from the ground up (the Semantic Web approach). Context, dialog, and exploration (which are required to provide the kind of exploratory search that’s needed) require some kind of metadata to power them. Today, this metadata is hard to come by, requires sophisticated technologies to produce, and often requires a very domain-centric processing strategy. As a result (IMHO), task-oriented, vertical information access approaches are going to remain the province of IR specialists (hooray for job security!), and not general purpose application development companies.


This is a great post topic. There have been quite a few projects at CHI and UIST working on support for tasks that span multiple search sessions. TaskBar (I think that was the name) was an interesting spin on the history view in a web browser like IE. Even cognitive load theory recognises task segmentation as one of the key methods of reducing cognitive load on users working toward a larger goal. I’ve not seen any services, however, that try to identify processes and represent them to users; I’ve only heard talk about it. It would be interesting to see whether larger processes can be derived from keyword analysis of large search sessions.


Mark, I think the distinction that might be most important practically is that Kayak users see Kayak doing something that Google doesn’t do. To the average person, “search” = “what Google does”.

Where I think things get interesting is when Google leads someone to a site that then takes over for the rest of a complex information seeking task. In that case, Google is acting as a switchboard, but probably gets disproportionate credit for the overall experience.

And that raises a question: will there be one switchboard to rule them all? And, if so, will it look like–or even be–Google?


@Daniel’s comment 14: If Search == Google, then what is a task-centric search engine? I’m confused about the sense of your original question. Google is today the “one switchboard” and I can’t see that changing anytime soon. And Kayak is a task-centric search engine. Google will be the switchboard that takes you to a task-centric search engine (likely through that search engine’s SEO efforts).

TripIt is a great process-oriented application: information seeking is part of the TripIt experience, but it’s not at the center of their identity, and in the end, that’s probably what distinguishes “search” from “applications”. Search engines are about finding information located outside the search experience, whereas apps are about keeping you in the experience. By this logic, Kayak is a search engine, but Yelp is not, which feels right to me.


To clarify: I’m saying that almost everyone who does not work in the convex hull of our industries equates “search” to “Google”. It’s only within our rarified circle that the question of distinguishing “search” from information retrieval is even meaningful.

In any case, I’m certainly ready to accept the centrality of the information seeking aspect as the defining characteristic of a search engine as distinct from an application that happens to include an incidental information seeking component. Though I’m not sure how Yelp is not a search engine by that definition.


My suggested differentiation was that Kayak is designed to take the user off-site (e.g. to a fare page on, whereas Yelp is designed to keep the user on-site (e.g. on the restaurant page for Sushi-Teq, with reviews & lots of meta), but I’ll agree the difference might be one of emphasis, as Yelp does have a pointer to their web site.

