Categories
General

Conversation with Seth Grimes

I had a great conversation with Intelligent Enterprise columnist Seth Grimes today. Apparently there’s an upside to writing critical commentary on Google’s aspirations in the enterprise!

One of the challenges in talking about enterprise search is that no one seems to agree on what it is. Indeed, as I’ve been discussing with Ryan Shaw, I use the term broadly to describe information access scenarios distinct from web search where an organization has some ownership or control of the content (in contrast to the somewhat adversarial relationship that web search companies have with the content they index). But I realize that many folks define enterprise search more narrowly to be a search box hooked up to the intranet.

Perhaps a better way to think about enterprise search is as a problem rather than a solution. Many people expect a search box because they’re familiar with searching the web using Google. I don’t blame anyone for expecting that the same interface will work for enterprise information collections. Unfortunately, wishful thinking and clever advertising notwithstanding, it doesn’t.

I’ve blogged about this subject from several different perspectives over the past weeks, so I’ll refer recent readers to earlier posts on the subject rather than bore the regulars.

But I did want to mention a comment Seth made that I found particularly insightful. He defined enterprise search even more broadly than I do, suggesting that it encompassed any information seeking performed in the pursuit of enterprise-centric needs. In that context, he does see Google as the leader in enterprise search–not because of their enterprise offerings, but rather because of the web search they offer for free.

I’m not sure how I feel about his definition, but I think he raises a point that enterprise vendors often neglect. No matter how much information an enterprise controls, there will always be valuable information outside the enterprise. I find today’s APIs to that information woefully inadequate; for example, I can’t even choose a sort order through any of the web search APIs. But I am optimistic that those APIs will evolve, and that we will see “federated” information seeking that goes beyond merging ranked lists from different sources.
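To make the baseline concrete: merging ranked lists from different sources is typically done with a simple fusion rule. The sketch below uses reciprocal rank fusion (my choice for illustration; the post doesn’t prescribe any particular method) over two hypothetical sources:

```python
def fuse(rankings, k=60):
    """Merge ranked lists from several sources via reciprocal rank fusion:
    each document scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

web_results = ["a", "b", "c"]       # hypothetical ranked list from a web API
intranet_results = ["b", "d", "a"]  # hypothetical ranked list from an intranet index
print(fuse([web_results, intranet_results]))  # → ['b', 'a', 'd', 'c']
```

Documents that rank highly in both sources rise to the top, which is exactly the “merging ranked lists” approach that richer federated information seeking would need to move beyond.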

Indeed, I look forward to the day that web search providers take a cue from the enterprise and drop the focus on black box relevance ranking in favor of an approach that offers users control and interaction.

Categories
General

Position papers for NSF IS3 Workshop

I just wanted to let folks know that the position papers for the NSF Information Seeking Support Systems Workshop are now available at this link.

Here is a listing to whet your curiosity:

  • Supporting Interaction and Familiarity
    James Allan, University of Massachusetts Amherst, USA
  • From Web Search to Exploratory Search: Can we get there from here?
    Peter Anick, Yahoo! Inc., USA
  • Complex and Exploratory Web Search (with Daniel Russell)
    Anne Aula, Google, USA
  • Really Supporting Information Seeking: A Position Paper
    Nicholas J. Belkin, Rutgers University, USA
  • Transparent and User-Controllable Personalization For Information Exploration
    Peter Brusilovsky, University of Pittsburgh, USA
  • Faceted Exploratory Search Using the Relation Browser
    Robert Capra, UNC, USA
  • Towards a Model of Understanding Social Search
    Ed Chi, Palo Alto Research Center, USA
  • Building Blocks For Rapid Development of Information Seeking Support Systems
    Gary Geisler, University of Texas at Austin, USA
  • Collaborative Information Seeking in Electronic Environments
    Gene Golovchinsky, FX Palo Alto Laboratory, USA
  • NeoNote: User Centered Design Suggestions for a Global Shared Scholarly Annotation System
    Brad Hemminger, UNC, USA
  • Speaking the Same Language About Exploratory Information Seeking
    Bill Kules, The Catholic University of America, USA
  • Musings on Information Seeking Support Systems
    Michael Levi, U.S. Bureau of Labor Statistics, USA
  • Social Bookmarking and Information Seeking
    David Millen, IBM Research, USA
  • Making Sense of Search Result Pages
    Jan Pedersen, Yahoo, USA
  • A Multilevel Science of Social Information Foraging and Sensemaking
    Peter Pirolli, XEROX PARC, USA
  • Characterizing, Supporting and Evaluating Exploratory Search
    Edie Rasmussen, University of British Columbia, Canada
  • The Information-Seeking Funnel
    Daniel Rose, A9.com, USA
  • Complex and Exploratory Web Search (with Anne Aula)
    Daniel Russell, Google, USA
  • Research Agenda: Visual Overviews for Exploratory Search
    Ben Shneiderman, University of Maryland, USA
  • Five Challenges for Research to Support IS3
    Elaine Toms, Dalhousie University, Canada
  • Resolving the Battle Royale between Information Retrieval and Information Science
    Daniel Tunkelang, Endeca, USA

Categories
General

Why Enterprise Search Will Never Be Google-y

As I prepared to end my trilogy of Google-themed posts, I ran into two recently published items. They provide an excellent context for what I intended to talk about: the challenges and opportunities of enterprise search.

The first is Google’s announcement of an upgrade to their search appliance that allows one box to index 10 million documents and offers improved search quality and personalization.

The second is an article by Chris Sherman in the Enterprise Search Sourcebook 2008 entitled Why Enterprise Search Will Never Be Google-y.

First, the Google announcement. These are certainly improvements for the GSA, and Google does seem to be aiming to compete with the Big Three: Autonomy, Endeca, and FAST (now a subsidiary of Microsoft). But these improvements should be seen in the context of the state of the art. In particular, Google’s scalability claims, while impressive, still fall short of the market leaders in enterprise search. Moreover, the bottleneck in enterprise search hasn’t been the scale of document indexing, but rather the effectiveness with which people can access and interact with the indexed content. Interestingly, Google’s strongest selling point for the GSA, the claim that it works “out of the box”, is also its biggest weakness: even with the new set of features, the GSA does not offer the flexibility or rich functionality that enterprises have come to expect.

Second, the Chris Sherman piece. Here is an excerpt:

Enterprise search and web search are fundamentally different animals, and I’d argue that enterprise search won’t–and shouldn’t–be Google-y any time soon…. Like web search, Google’s enterprise search is easy to use–if you’re willing to go along with how Google’s algorithms view and present your business information…. Ironically, enterprises, with all of their highly structured and carefully organized silos of information, require a very different and paradoxically more complex approach.

I highly recommend you read the whole article (it’s only 2 pages), not only because it is informative and well written, but also because the author isn’t working for one of the Big Three.

The upshot? There is no question that Google is raising the bar for simple search in the enterprise. I wouldn’t recommend that anyone try to compete with the GSA on its turf.

But information needs in the enterprise go far beyond known-item search. What enterprises want when they ask for “enterprise search” is not just a search box, but an interactive tool that helps them (or their customers) work through the process of articulating and fulfilling their information needs, for tasks as diverse as customer segmentation, knowledge management, and e-discovery.

If you’re interested in search and want to be on the cutting edge of innovation, I suggest you think about the enterprise.

Categories
General

Where Google Isn’t Good Enough

My last post, Is Google Good Enough?, challenged would-be Google killers to identify and address clear consumer needs for which Google isn’t good enough as a solution. I like helping my readers, so here are some ideas.

  • Shopping. Google Product Search (fka Froogle) is not one of Google’s crown jewels. At best, it works well when you know the exact name of the product you are looking for. But it pales in comparison to any modern ecommerce site, such as Amazon or Home Depot. What makes a shopping site successful? Put simply, it helps users find what they want, even when they didn’t know exactly what they wanted when they started.
  • Finding a job. Google has not thrown its hat into the ring of job search, and even the page they offer for finding jobs at Google could use some improvement. The two biggest job sites, Monster and Careerbuilder, succeed in terms of the number of jobs posted, but aren’t exactly optimized for user experience. Dice does better, but only for technology jobs. Interestingly, the best job finding site may be LinkedIn–not because of their search implementation (which is adequate but not innovative), but because of their success in getting millions of professionals to provide high-quality data.
  • Finding employees. Again, LinkedIn has probably come closest to providing a good employee finding site. The large job sites (all of which I’ve used at some point) not only fail to support exploratory search, but also suffer from a skew towards ineligible candidates and a nuisance of recruiters posing as job seekers. Here again, Google has not tried to compete.
  • Planning a trip. Sure, you can use Expedia, Travelocity, or Kayak to find a flight, hotel, and car rental. But there’s a lot of room for improvement when it comes to planning a trip, whether for business or pleasure. The existing tools do a poor job of putting together a coordinated itinerary (e.g., meals, activities), and also don’t integrate with relevant information sources, such as local directories and reviews. This is another area where Google has not tried to play.

Note two general themes here. The first is thinking beyond the mechanics of search and focusing on the ability to meet user needs at the task level. The second is the need for exploratory search. These only scratch the surface of opportunities in consumer-facing “search” applications. The opportunities within the enterprise are even greater, but I’ll save that for my next post.

Categories
General

Is Google Good Enough?

As Chief Scientist of Endeca, I spend a lot of my time explaining to people why they should not be satisfied with an information seeking interface that only offers them keyword search as an input mechanism and a ranked list of results as output. I tell them about query clarification dialogs, faceted navigation, and set analysis. More broadly, I evangelize exploratory search and human computer information retrieval as critical to addressing the inherent weakness of conventional ranked retrieval. If you haven’t heard me expound on the subject, feel free to check out this slide show on Is Search Broken?.
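For readers unfamiliar with faceted navigation, the core mechanic is easy to sketch: alongside the matching results, the engine counts the values of each structured field so users can narrow the set interactively instead of reformulating keyword queries. A toy illustration (the catalog, field names, and values are all invented):

```python
from collections import Counter

# Toy catalog; the fields and values are invented for illustration.
docs = [
    {"title": "red shirt", "brand": "Acme", "color": "red"},
    {"title": "blue shirt", "brand": "Acme", "color": "blue"},
    {"title": "red mug", "brand": "Bolt", "color": "red"},
]

def search_with_facets(query, docs, facets=("brand", "color")):
    """Return matching documents plus value counts for each facet field."""
    hits = [d for d in docs if query in d["title"]]
    counts = {f: Counter(d[f] for d in hits) for f in facets}
    return hits, counts

hits, counts = search_with_facets("shirt", docs)
# counts shows how the hits break down: brand Acme (2); color red (1), blue (1)
```

The counts are what let a user see the shape of the result set and refine it, which a bare ranked list cannot offer.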

But today I wanted to put my ideology aside and ask the simple question: Is Google good enough? Here is a good faith attempt to make the case for the status quo. I’ll focus on web search, since, as I’ve discussed before on this blog, enterprise search is different.

1) Google does well enough on result quality, enough of the time.

While Google doesn’t publish statistics about user satisfaction, it’s common knowledge that Google usually succeeds in returning results that users find relevant. Granted, so do all of the major search engines: you can compare Google and Yahoo graphically at this site. But the question is not whether other search engines are also good enough–or even whether they are better. The point is that Google is good enough.

2) Google doesn’t support exploratory search. But it often leads you to a tool that does.

The classic instance of this synergy is when Google leads you to a Wikipedia entry. For example, I look up Daniel Kahneman on Google. The top result is his Wikipedia entry. From there, I can traverse links to learn about his research areas, his colleagues, etc.

3) Google is a benign monopoly that mitigates choice overload.

Many people, myself included, have concerns about Google’s increasing role in mediating our access to information. But it’s hard to ignore the upside of a single portal that gives you access to everything in one place: web pages, blogs, maps, email, etc. And it’s all “free”–at least insofar as ad-supported services can be said to be free.

In summary, Google sets the bar pretty high. There are places where Google performs poorly (e.g., shopping) or doesn’t even try to compete (e.g., travel). But when I see the series of companies lining up to challenge Google, I have to wonder how many of them have identified and addressed clear consumer needs for which Google isn’t good enough as a solution. Given Google’s near-monopoly in web search, parity or even incremental advantage isn’t enough.

Categories
General

Not as Cuil as I Expected

Today’s big tech news is the launch of Cuil, the latest challenger to Google’s hegemony in Web search. Given the impressive team of Xooglers that put it together, I had high expectations for the launch.

My overall reaction: not bad, but not good enough to take seriously as a challenge to Google. They may be “The World’s Biggest Search Engine” based on the number of pages indexed, but they return zero results for a number of queries where Google does just fine, including noisy channel blog (compare to Google). But I’m not taking it personally–after all, their own site doesn’t show up when you search for their name (again, compare to Google). As for their interface features (column display, explore by category, query suggestions), they’re fine, but neither the concepts nor the quality of their implementation strikes me as revolutionary.

Perhaps I’m expecting too much on day 1. But they’re not just trying to beat Gigablast; they’re trying to beat Google, and they surely expected to get lots of critical attention the moment they launched. Regardless of the improvements they’ve made in indexing, they clearly need to do more work on their crawler. It’s hard to judge the quality of results when it’s clear that at least some of the problem is that the most relevant documents simply aren’t in their index. I’m also surprised not to see Wikipedia documents showing up much for my searches–particularly for searches where I’m quite sure the most relevant document is in Wikipedia. Again, it’s hard to tell whether this is an indexing or results quality issue.

I wish them luck–I speak for many in my desire to see Google face worthy competition in web search.

Categories
General

Catching up on SIGIR ’08

Now that SIGIR ’08 is over, I hope to see more folks blogging about it. I’m jealous of everyone who had the opportunity to attend, not only because of the culinary delights of Singapore, but because the program seems to reflect an increasing interest of the academic community in real-world IR problems.

Some notes from looking over the proceedings:

  • Of the 27 paper sessions, 2 include the word “user” in their titles, 2 include the word “social”, 2 focus on Query Analysis & Models, and 1 is about exploratory search. Compared to the last few SIGIR conferences, this is a significant increase in focus on users and interaction.
  • A paper on whether test collections predict users’ effectiveness offers an admirable defense of the Cranfield paradigm, much along the lines I’ve been advocating.
  • A nice paper from Microsoft Research looks at the problem of whether to personalize results for a query, recognizing that not all queries benefit from personalization. This approach may well be able to reap the benefits of personalization while avoiding much of its harm.
  • Two papers on tag prediction: Real-time Automatic Tag Recommendation (ACM Digital Library subscription required) and Social Tag Prediction. Semi-automated tagging tools are one of the best ways to leverage the best of both human and machine capabilities.

And I haven’t even gotten to the posters! I’m sad to see that they dropped the industry day, but perhaps they’ll bring it back next year in Boston.

Categories
General

Knol: Google takes on Wikipedia

Just a few days ago, I was commenting on a New York Times article about Wikipedia’s new approval system that the biggest problem with Wikipedia is anonymous authorship. By synchronous coincidence, Google unveiled Knol today, which is something of a cross between Wikipedia and Squidoo. Its most salient feature is that each entry will have a clearly identified author. They even allow authors to verify their identities using credit cards or phone directories.

It’s a nice idea, since anonymous authorship is a major factor in the adversarial nature of information retrieval on the web. Not only does the accountability of authorship inhibit vandalism and edit wars, but it also allows readers to decide for themselves whom to trust–at least to the extent that readers are able and willing to obtain reliable information about the authors. Without question, they are addressing Wikipedia’s biggest weakness.

But it’s too little, too late. Wikipedia is already there. And, despite complaints about its inaccuracy and bias, Wikipedia is a fantastic, highly utilized resource. The only way I see for Knol to supplant Wikipedia in a reasonable time frame is through a massive cut-and-paste to make up for the huge difference in content.

Interestingly, Wikipedia does not seem to place any onerous restrictions on verbatim copying. However, unless a single author is 100% responsible for authoring a Wikipedia entry, it isn’t clear that anyone can simply copy the entry into Knol.

I know that it’s dangerous to bet against Google. But I’m really skeptical about this latest effort. It’s a pity, because I think their emphasis is the right one. But for once I wish they’d been a bit more humble and accepted that they aren’t going to build a better Wikipedia from scratch.

Categories
General

Predictably Irrational

As regular readers have surely noticed by now, I’ve been on a bit of a behavioral psychology kick lately. Some of this reflects long-standing personal interest and my latest reading. But I also feel increasingly concerned that researchers in information seeking–especially those working on tools–have neglected the impact of cognitive bias.

For those who are unfamiliar with the last few decades of research in this field, I highly recommend a recent lecture by behavioral economist Dan Ariely on predictable irrationality. Not only is he a very informative and entertaining speaker, but he chooses very concrete and credible examples, starting with his contemplating how we experience pain, based on his own experience of suffering third-degree burns over 70 percent of his body. I promise you, the lecture is an hour well spent, and the time will fly by.

A running theme through this and my other posts on cognitive bias is that the way information is presented to us has dramatic effects on how we interpret that information.

This is great news for anyone who wants to manipulate people. In fact, I once asked Dan about the relative importance of people’s inherent preferences vs. those induced by presentation on retail web sites, and he all but dismissed the former (i.e., you can sell ice cubes to Eskimos, if you can manipulate their cognitive biases appropriately). But it’s sobering news for those of us who want to empower users to evaluate information objectively to support decision making.

Categories
General

Beyond a Reasonable Doubt

In Psychology of Intelligence Analysis, Richards Heuer advocates that we quantify expressions of uncertainty: “To avoid ambiguity, insert an odds ratio or probability range in parentheses after expressions of uncertainty in key judgments.”

His suggestion reminds me of my pet peeve about the unquantified notion of reasonable doubt in the American justice system. I’ve always wanted (but never had the opportunity) to ask a judge what probability of innocence constitutes a reasonable doubt.

Unfortunately, as Heuer himself notes elsewhere in his book, we human beings are really bad at estimating probabilities. I suspect (with a confidence of 90 to 95%) that quantifying our uncertainties as probability ranges will only convey a false sense of precision.

So, what can we do to better communicate uncertainty? Here are a couple of thoughts:

  • We can calibrate estimates based on past performance. It’s unclear what will happen if people realize that their estimates are being translated, but, at worst, it feels like good fodder for research in judgment and decision making.
  • We can ask people to express relative probability judgments. While these are also susceptible to bias, at least they don’t demand as much precision. And we can always vary the framing of questions to try to factor out the cognitive biases they induce.
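The first bullet, calibration against past performance, is easy to sketch: record each stated confidence alongside whether the judgment turned out correct, then report the observed accuracy at each confidence level. A minimal illustration (the judgment data is invented):

```python
from collections import defaultdict

def calibration_table(judgments):
    """judgments: iterable of (stated_confidence, was_correct) pairs.
    Returns observed accuracy at each stated confidence level."""
    tally = defaultdict(lambda: [0, 0])  # confidence -> [correct, total]
    for confidence, correct in judgments:
        tally[confidence][0] += int(correct)
        tally[confidence][1] += 1
    return {c: hit / total for c, (hit, total) in tally.items()}

# A forecaster who says "90% sure" but turns out to be right only half the time:
history = [(0.9, True), (0.9, True), (0.9, False), (0.9, False)]
print(calibration_table(history))  # → {0.9: 0.5}
```

A table like this is the “translation” the bullet worries about: once you know a person’s 90% really means 50%, you can adjust their future estimates accordingly.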

Also, when we talk about uncertainty, it is important that we distinguish between aleatory and epistemic uncertainty.

When I flip a coin, I am certain it has a 50% chance of landing heads, because I know the probability distribution of the event space. This is aleatory uncertainty, and forms the basis of probability and statistics.

But when I reason about less contrived uncertain events, such as estimating the likelihood that my bank will collapse this year, the challenge is my ignorance of the probability distribution. This is epistemic uncertainty, and it’s a lot messier.
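The contrast can be made concrete in a few lines: with a coin we sample from a known distribution, while with messier events we must estimate the distribution itself from data, and that estimate carries its own uncertainty. A toy sketch (the “true” rate is invented purely so the code runs):

```python
import random

random.seed(0)

# Aleatory uncertainty: the distribution is known (a fair coin),
# yet individual outcomes remain unpredictable.
flips = [random.random() < 0.5 for _ in range(10_000)]

# Epistemic uncertainty: the true rate is unknown to the reasoner;
# all we can do is estimate it from observations, and the estimate
# (unlike the coin's known 50%) is itself uncertain.
true_rate = 0.37  # hidden in practice; shown here only to generate data
observed = [random.random() < true_rate for _ in range(100)]
estimate = sum(observed) / len(observed)
```

With the coin, more flips only reduce sampling noise; with the unknown event, more observations also shrink our ignorance about the distribution itself, which is why the two kinds of uncertainty behave so differently.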

If you’d like to learn more about aleatory and epistemic uncertainty, I recommend Nassim Nicholas Taleb’s Fooled by Randomness (which is a better read than his better-known The Black Swan).

In summary, we have to accept the bad news that the real world is messy. As a mathematician and computer scientist, I’ve learned to pursue theoretical rigor as an ideal. Like me, you may find it very disconcerting to not be able to treat all real-world uncertainty in terms of probability spaces. Tell it to the judge!