The Noisy Channel

 

What Is (Not) Search?

December 2nd, 2008 · 9 Comments · General

I had a conversation the other day that raised a conundrum: what is *not* search? What do I mean by that? Well, as Stephen Arnold points out in a recent post, “search” can be anything from a “find-a-phone number problem” to a “glittering generality” that encompasses end-to-end information processing.

Language is imperfect, so is it really that important to define what is and isn’t “search”? It certainly matters when you’re trying to sell search technology! But, more importantly, we need some shared understanding in order to make progress.

At the very least, I propose that we distinguish “search” as a problem from “search” as a solution. By the former, I mean the problem of information seeking, which is traditionally the domain of library and information scientists. By the latter, I mean the approach most commonly associated with information retrieval, in which a user enters a query into the system (typically as free text) and the system returns a set of objects that match the query, perhaps with different degrees of relevance.
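To make “search as a solution” concrete, here is a minimal sketch of that query-in, ranked-results-out loop. It is only an illustration: the function names, the tiny document collection, and the scoring rule (counting occurrences of query terms) are all assumptions made for this example, not a description of how any real engine works.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def search(query, documents):
    """Return (doc_id, score) pairs for documents that match the query.

    A document matches if it contains at least one query term. Its score
    is the total number of occurrences of query terms, a crude stand-in
    for "degree of relevance". Results are sorted best first, with ties
    broken by doc_id.
    """
    query_terms = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((doc_id, score))
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical three-document collection.
docs = {
    "d1": "library science studies information seeking",
    "d2": "search engines rank documents for a query",
    "d3": "a query is matched against documents and documents are ranked",
}

print(search("query documents", docs))  # [('d3', 3), ('d2', 2)]
```

Note what the sketch leaves out: it knows nothing about why the user asked, which is the gap between search as a solution and search as a problem.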

Beyond that, we need to recognize that search exists within the context of tasks. It is easy to lump together every task that involves information seeking as “search”, but doing so oversimplifies a complex landscape of activities and needs. I believe we are headed for a world where end users think about tasks rather than about the search activities that form part of those tasks. In that world, search technologists provide infrastructure, not the end-user destination.

9 responses so far ↓

  • 1 jeremy // Dec 2, 2008 at 2:27 pm

    I hope we continue to move toward a more task-centric view of information seeking, as well.

    I agree with your characterization of “information seeking”, but let me slightly disagree with your use of “information retrieval”. What you are calling information retrieval I would call “ad hoc retrieval”. Information retrieval is a superset of that, dealing not only with ad hoc, but also with clustering, routing, recommendation, question answering, etc.

    And what gets called “search”, especially in today’s web environment, is a subset of ad hoc retrieval. Search is ad hoc retrieval with an emphasis on high precision and low recall.

    In my understanding of things, this is the chain of concepts, from most general to least general:

    (1) “information seeking”
    (2) “information retrieval” is one way of approaching information seeking
    (3) “ad hoc retrieval” is one way of approaching information retrieval
    (4) “search” is one way of approaching ad hoc retrieval

  • 2 Daniel Lemire // Dec 2, 2008 at 5:41 pm

    What is *not* search? As a database researcher, I would say that search implies a high selectivity. Precisely, it means the desired result set is much smaller than the entire data set.

    So, plotting the number of papers written by all scientists in DBLP per year is not “search” because we need to visit every single entry.

    But of course, this has probably nothing to do with what you meant.

  • 3 jeremy // Dec 2, 2008 at 6:49 pm

    What is *not* search?

    As I said above, what is not search is:
    (A) ad hoc retrieval that focuses on something other than high precision. E.g. high-recall.
    (B) other information retrieval, such as faceted retrieval, recommendation, question-answering, etc.
    (C) other forms of information seeking, for example the collaborative information retrieval stuff that I’ve been working on.

    “As a database researcher, I would say that search implies a high selectivity. Precisely, it means the desired result set is much smaller than the entire data set.”

    I agree. That’s essentially what I was saying when I wrote “search is ad hoc retrieval, with an emphasis on high precision, low-recall.”

    “So, plotting the number of papers written by all scientists in DBLP per year is not ‘search’ because we need to visit every single entry.”

    Well, this is where we can start to split hairs, because if the information that you are looking for is a plot of the number of papers written by all scientists in DBLP, then the process of finding that plot is search. But once you’re already at the DBLP website and are trying to generate that plot, then I agree, that is no longer search. That is now information retrieval, which I think is a broader category than just “search”.

  • 4 Max L. Wilson // Dec 5, 2008 at 9:03 am

    Interestingly, I totally agree with the hierarchy Jeremy put up of IS, IR, ad hoc, etc., but feel VERY dodgy about saying that ‘search’ is ad hoc retrieval and not more. In fact, suggesting that IR is wider than search makes me wanna freak out! Terminology in this area, though, has been a key topic in a number of forums recently, including the Collaborative Information Seeking, where some people were using the term for everything broader than even Information Seeking. I disagreed with that too, as did Nick Belkin. When you put ‘searching’ as an alternative to ‘browsing’, ‘exploring’, ‘navigating’, then search becomes broader than IR alone, despite the fact that many people use IR to navigate to pages. In the example above, generating a plot is barely IR; it is information behaviour that is outside even Information Seeking.

    Thoughts?

  • 5 jeremy // Dec 5, 2008 at 12:12 pm

    Max, let me back up for a second and clarify: I am not talking about how I would like the word “search” to be used. I am talking about how it actually gets used, in the community today.

    When you look at web “search” engines today, they really are not doing anything more than known-item lookup, aka precision-based ad hoc retrieval. They could be, but they are not.

    I agree with you that search should be more than this, but in practice it isn’t. So this public understanding of the definition of “search” was the one that I used in my hierarchy.

  • 6 jeremy // Dec 5, 2008 at 12:19 pm

    …that said, I still think that no matter how you define “search”, it is still narrower than “IR”. IR is the field or study of setting up all the document similarities and structural relationships, so as to allow you to search, browse, explore, navigate, etc. IR makes useful and intelligent searching, browsing, etc. possible.

  • 7 Daniel Tunkelang // Dec 6, 2008 at 8:30 pm

    Wow, great to see that this place stays noisy even when I disappear for a week!

    While I don’t shy away from making provocative statements, I didn’t actually mean to make a strong statement about IR vs. search.

    I think that Jeremy is describing how the word “search” is used in the IR community, while Max is describing how it is used among information scientists. Or perhaps I’m confusing the matter further.

    In any case, the definitions I’ve seen of IR (e.g., the Wikipedia entry) are so broad that they probably encompass library science, databases, and then some!

  • 8 Daniel Tunkelang // Dec 6, 2008 at 8:33 pm

    And, in response to Daniel Lemire’s comment, that’s an interesting perspective. Given my IR training, I interpret the emphasis on high selectivity in a retrieval context as a concern for precision rather than recall.

  • 9 Finding, Locating, Discovering | The Noisy Channel // Aug 31, 2009 at 12:11 pm

    […] This breadth of meaning causes a lot of confusion. I’ve blogged about this before: “What is (Not) Search?“: At the very least, I propose that we distinguish “search” as a problem from […]
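
Much of the thread above turns on precision, recall, and selectivity, so a small worked example may help pin the terms down. This is a sketch under stated assumptions: the function names, the collection size of 1,000, and the document ids are invented for illustration, and `result_fraction` is just one plausible way to quantify how small the result set is relative to the whole data set.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


def result_fraction(retrieved, collection_size):
    """Share of the whole collection that ends up in the result set."""
    return len(set(retrieved)) / collection_size


# Hypothetical setup: 1,000 documents, of which ids 1..20 are relevant.
collection_size = 1000
relevant_ids = set(range(1, 21))

# A web-style result list: five hits, four of them relevant.
top_hits = [1, 2, 3, 4, 99]

print(precision(top_hits, relevant_ids))           # 0.8
print(recall(top_hits, relevant_ids))              # 0.2
print(result_fraction(top_hits, collection_size))  # 0.005
```

In this toy case the result list is mostly correct (high precision) yet misses most of what is relevant (low recall), and it touches only a tiny share of the collection. An aggregate over every entry, like the DBLP plot, sits at the opposite extreme with a result fraction of 1.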
