Category: General

General posts, typically analyzing HCIR issues.

Real Time Search Is Personal

Post author By Daniel Tunkelang
Post date January 18, 2010

The other day, I promised in a comment thread that I’d write about what I see as real use cases for real-time search. As it happens, I’m experiencing one right now.

As my wife, daughter, and I were walking home from a playground, we noticed a large number of fire trucks congregating a block away from our house. A quick search on Twitter explained what was going on, particularly by pointing us to this post on Gothamist–which as of this writing seems to be the only reporting about this incident.

I think this example tells us a lot about the utility of real-time search. Most of us don’t need real-time search to tell us about the news in Haiti, since a critical mass of major news providers is covering the story around the clock. Where real-time search matters most is at the personal level–specifically, when our personal urgency to obtain information is higher than that of the general population. In such situations, we’re willing to accept less polished–and even risk less accurate–information, particularly if the alternative is to wait until news providers cover the story, if they ever do. At least to some extent, urgency trumps authority.

Yes, there are other use cases for conversational media like Facebook and Twitter, such as sharing the experience of watching a live event, or simply chatting with friends and strangers about arbitrary topics. But I wouldn’t consider such use of these media to be search. Real-time search, in my view, is about helping users obtain the latest information available–in accordance with their personal needs. Twitter and Google served me well today, and I’m grateful that real-time search gave me real-time peace of mind.

When Is Faceted Search Appropriate?

Post author By Daniel Tunkelang
Post date January 15, 2010

http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uiedesignpatternstrailermerged-091210133302-phpapp01&stripped_title=search-discovery-patterns-a-uie-virtual-seminar

Earlier this week, Peter Morville and Mark Burrell presented a UIE virtual seminar on “Leveraging Search & Discovery Patterns For Great Online Experiences”. It sold out! And I thought Pete Bell and I had done well with our seminar on faceted search!

But I’m hardly surprised. Although I wasn’t able to attend it myself, I gather from Twitter and the blogosphere that it was a great presentation. I enjoyed serving as a reviewer for Peter’s new book on Search Patterns, and I contributed a bit to Endeca’s UI Design Pattern Library while I was there and Mark’s team was developing it.

In reading reactions to the seminar, I was particularly intrigued by a post entitled “Search and Browse” by Livia Labate on her fantastically named blog, “I think, therefore IA”. She raised a question that I think needs to be asked more often: when is (or isn’t) faceted search appropriate?

Her conversation with readers in a comment thread offered some possible answers:

Faceted search helps users who think in terms of attribute specifications as filtering criteria.
Faceted search supports search by exclusion, as opposed to by discovery.
Faceted search requires a set of useful facets that is neither too small nor too large.

I’d like to propose my own answers. Here are the conditions for which I see faceted search being most useful:

Faceted search supports exploratory use cases, in contrast to known-item search. For known-item search, users are better served by a search box to specify an item by name, or a non-faceted hierarchy to locate it. In contrast, faceted search optimizes for cases where users are unsure either of what they want or of how to specify it.
Faceted search helps users who need or want to learn about the search space as they execute the search process. Facets educate users about different ways to characterize items in a collection. If users do not need or want this education, they may be frustrated by an interface that makes them do more work.
The search space is classified using accurate, understandable facets that relate to the users’ information needs. As I’ve discussed before, data quality is often the bottleneck in designing search interfaces. Offering users facets that are either unreliable or unrelated to their needs is worse than providing no facets at all.
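
To make the mechanics concrete, here is a minimal sketch of the two operations a faceted interface relies on: refining a result set by selected facet values, and counting the remaining values so users can see how the collection is characterized. The catalog, the facet names, and the functions `refine` and `facet_counts` are all invented for illustration; this is not how any particular product implements it.

```python
from collections import Counter

# Hypothetical catalog: every item and facet value is invented for illustration.
ITEMS = [
    {"name": "item-1", "category": "shoes", "color": "red", "brand": "A"},
    {"name": "item-2", "category": "shoes", "color": "blue", "brand": "B"},
    {"name": "item-3", "category": "shirts", "color": "red", "brand": "A"},
    {"name": "item-4", "category": "shoes", "color": "red", "brand": "B"},
]
FACETS = ("category", "color", "brand")


def refine(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facets=FACETS):
    """Count how many items carry each value of each facet."""
    return {
        facet: Counter(item[facet] for item in items if facet in item)
        for facet in facets
    }


# A user who has only said "shoes" can now see which colors and brands remain.
results = refine(ITEMS, {"category": "shoes"})
counts = facet_counts(results)
```

The counts are what do the educating: they show which refinements are available and how much of the collection each one covers. They are also only as trustworthy as the underlying facet data.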

Given the above criteria, it’s not surprising that faceted search has been a huge success in online retail: shopping is often an exploratory learning experience, and retailers tend to have good data.

But the success of faceted search in retail overshadows other domains where faceted search may be even more valuable. My favorite example is faceted people search, most recently demonstrated by LinkedIn. I would love to see other entities (locations, businesses, etc.) receive similar treatment, at least in contexts where exploration is a common use case.

I think Livia is right to be skeptical about any interface that introduces complexity–and facets do introduce complexity. I hope that my guidelines help answer her question as to when that complexity is worthwhile and perhaps even necessary to help users satisfy their information needs.

Search Questions for 2010: What’s On My Mind

Post author By Daniel Tunkelang
Post date January 3, 2010

Happy New Year to the Noisy Community and everyone else in virtual earshot! I hope everyone is entering 2010 well-rested and ready for great things. And I don’t just mean shiny new gadgets.

For me, 2009 marked the end of a decade-long run at Endeca, where I focused on bringing HCIR to enterprises. I’m particularly proud of two professional accomplishments: writing a book on faceted search, and organizing the SIGIR 2009 Industry Track.

But past is prologue. I spent the last several weeks of 2009 as a Noogler, and I launch into 2010 living and breathing search on the open web.

What’s on my mind? Here are some top-of-mind questions to which I hope to have better answers by this time next year:

Exploratory Search: how should we determine that users want a more exploratory search experience, rather than one that minimizes time to a best-effort result? How should we respond to queries that clearly don’t have a single best answer, such as queries of the form [category] or [category location]?

Mobile Search: should it be just like non-mobile search with a few tweaks to accommodate the device form factor? Or does / should mobile search fundamentally change the way we interact with information?

Real-Time Search: is it more than real-time indexing plus emphasizing recency as a query-independent relevance factor? What are the use cases, and how should we be addressing them?

Social / Collaborative Search: should we be looking to microblogging or other social media signals to augment (or even supplant!) link-based citations as authority cues? Should we be supporting mediated search by linking people to people, rather than directly to information?

Transparency: is it possible to offer more transparency in relevance ranking without losing ground in the battle against spam and black-hat SEO?
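
The real-time search question above mentions "emphasizing recency as a query-independent relevance factor". As a rough sketch of what that baseline could look like, the snippet below blends a query-dependent relevance score with an exponentially decaying recency boost. The half-life, the weight, the function names, and the sample documents are all assumptions chosen for illustration, not a description of any real ranking system.

```python
def recency_boost(age_seconds, half_life_seconds=3600.0):
    """Exponential decay: 1.0 for a brand-new document, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)


def combined_score(text_relevance, age_seconds, recency_weight=0.5,
                   half_life_seconds=3600.0):
    """Blend query-dependent relevance with a query-independent recency boost."""
    boost = recency_boost(age_seconds, half_life_seconds)
    return (1.0 - recency_weight) * text_relevance + recency_weight * boost


# Hypothetical documents: relevance in [0, 1], age in seconds.
docs = [
    {"id": "old-but-relevant", "relevance": 0.9, "age": 86400},
    {"id": "fresh", "relevance": 0.6, "age": 60},
]
ranked = sorted(
    docs,
    key=lambda d: combined_score(d["relevance"], d["age"]),
    reverse=True,
)
```

With these made-up numbers the minute-old document outranks the day-old one despite its lower relevance score. Whether real-time search amounts to more than this kind of blending is exactly the open question.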

To be clear, these are simply the questions that are on my mind–I’m speaking as an individual and not as a Google employee. That said, a great thing about being at Google is that there are people working on all of these areas. So I expect 2010 to be an exciting year!

Curious to hear what problems are on other people’s minds as we enter the new year. Comment away!

Faceted Web Search?

Researchers from Microsoft say it’s very challenging. Google is trying, but there’s a long way to go. And Eric Iverson just wrote me to describe his own preliminary efforts to build faceted search on top of Yahoo! BOSS.

I believe there’s a clearly established business case for faceted search inside the enterprise, for site search (especially for retail and media / publishing sites), even for vertical search on the open web. In all of these cases, relevance-ranked results are insufficient to meet a large subset of users’ more exploratory information needs, and HCIR approaches like faceted search are an easy sell.

But it seems much harder to make this case for general web search. The track record of startups in this space isn’t very encouraging. That could be because no one has done it right, but Clayton Christensen’s theory of disruptive innovation would suggest that a successful entrant wouldn’t have to have parity across the board, but would simply need to win on an underserved market segment. Perhaps the increasing use of faceted search for vertical search is how this process is playing out, and faceted search for general web search may end up being a slow agglomeration of verticals.

I’m curious if others have been pursuing efforts like Eric’s. Are the available APIs powerful enough to prototype your own faceted web search engine? If they aren’t, then is this a potential business opportunity for one of the major (or non-major) search engines to promote innovation by offering an open system? Or, if Yahoo! BOSS already offers such an open system, what should we make of the scale of its impact?

R.I.P. Modista

Long-time readers may recall my post about visual search startup Modista last November, or this guest post by one of its principals. Unfortunately, the story has a sad ending. I hope that both this technology and its developers find a good home.

LinkedIn Faceted Search Now Out Of Beta

Post author By Daniel Tunkelang
Post date December 15, 2009

LinkedIn started rolling out a beta version of faceted people search back in July. Now it’s officially out of beta, as announced on their blog. I’ve re-posted the video above in case you missed it in July.

Interestingly, LinkedIn developed its own tool to support the combination of faceted search with social network search: Bobo-Browse (Otis mentioned it in our recent presentation to the New York CTO Club). I helped develop similar functionality when I was at Endeca, so I know how hard this problem is. LinkedIn has done an impressive job–and has applied it to one of the most valuable data sets on the web. Bravo!

But I can’t help asking for just one more thing. LinkedIn has great semi-structured data about its 50+ million members. I’d love to be able to explore that data using more facets–in particular, facets relating to people’s job skills and expertise. I hope that’s something they’re working on. Perhaps a good topic of conversation at the upcoming Workshop on Search and Social Media!

Karaoke: A Hotbed for Micro-IR?

Post author By Daniel Tunkelang
Post date December 13, 2009

I’m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I’m sure I’m not the only person who encounters this micro-IR problem, and it occurred to me that there might be better technical solutions to it.

Most karaoke venues provide printed song books, typically sorted by title and by artist. This approach is certainly adequate for very limited selections, but it doesn’t scale gracefully. Indeed, one of my favorite karaoke bars, the Courtside in Cambridge, MA, has a fantastic song selection that is only accessible through printed books. Kinda frustrating for a search guy, even though the staff is very helpful!

My regular karaoke venue in New York, Second on Second, is a bit more technologically advanced: it provides computers with dedicated software that allows patrons to search through their song catalog. Aside from being faster than thumbing through books, the software makes it possible to find songs when you only remember words that are in the middle of song or artist names.
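
For illustration only, here is a minimal sketch of that kind of lookup: a case-insensitive substring match over song titles and artist names, so that words from the middle of a name still find the song. The function `search_catalog` and the placeholder catalog entries are hypothetical; I have no knowledge of how the venue's software actually works.

```python
def search_catalog(catalog, query):
    """Return songs whose title or artist contains every query word (case-insensitive)."""
    words = query.lower().split()
    hits = []
    for song in catalog:
        haystack = f"{song['title']} {song['artist']}".lower()
        # Substring test, so a word from the middle of a name still matches.
        if all(word in haystack for word in words):
            hits.append(song)
    return hits


# Small illustrative catalog; the second and third entries are placeholders.
CATALOG = [
    {"title": "Space Oddity", "artist": "David Bowie"},
    {"title": "Example Song One", "artist": "Example Artist"},
    {"title": "Another Example", "artist": "Someone Else"},
]

by_middle_word = search_catalog(CATALOG, "oddity")
by_artist_and_title = search_catalog(CATALOG, "bowie space")
```

Note that an empty query matches every song in this sketch, and that matching is on raw substrings rather than whole words.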

But even such a system only addresses known-item search–in this case, looking for a song or artist by name when you know precisely what you are looking for. There’s room for incremental improvement here, e.g., searching for songs based on the lyrics you remember. For example, many people remember a famous David Bowie song based on its protagonist “Major Tom” rather than its title “Space Oddity”; fortunately, tools like Google’s music search are happy to make such connections.

But none of the karaoke search technology I’ve seen to date supports exploration. Specifically, I’d love to go into a karaoke bar and have a procedure for finding songs I know that is better than trial and error. For example, I’d like to be able to see my options for hard rock 80s songs with male vocals. Or to find out which downtempo bands, if any, are on the menu. A little faceted search would go a long way towards making the song-finding experience more pleasant and efficient.

But why stop there? I’d really like a system that suggests songs based on what it knows about me. For example, knowing that I like to sing Scorpions songs is a reasonable basis to suggest similar artists like Def Leppard and Guns N’ Roses. Or perhaps to suggest 80s songs in general–after all, karaoke roulette notwithstanding, most people sing songs they know (or at least think they know), and their song knowledge tends to have some temporal locality. I’m sure you can imagine far more sophisticated personalization–and such personalization could be accomplished with complete transparency to the user.
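
As a deliberately simple sketch of the temporal-locality idea, the snippet below suggests songs a singer has not yet performed from the decade they pick most often. The function `suggest_by_decade`, the `decade` field, and the placeholder song data are all assumptions made for illustration; a real system would presumably draw on far richer signals.

```python
from collections import Counter


def suggest_by_decade(history, catalog, limit=3):
    """Suggest unsung catalog songs from the decade the singer picks most often."""
    if not history:
        return []
    favorite_decade, _ = Counter(song["decade"] for song in history).most_common(1)[0]
    already_sung = {song["title"] for song in history}
    picks = [
        song
        for song in catalog
        if song["decade"] == favorite_decade and song["title"] not in already_sung
    ]
    return picks[:limit]


# Placeholder data: titles and decades are invented, not real songs.
HISTORY = [
    {"title": "song-a", "decade": 1980},
    {"title": "song-b", "decade": 1980},
    {"title": "song-c", "decade": 1990},
]
SONGS = [
    {"title": "song-a", "decade": 1980},
    {"title": "song-d", "decade": 1980},
    {"title": "song-e", "decade": 1980},
    {"title": "song-f", "decade": 1990},
    {"title": "song-g", "decade": 1970},
]

suggestions = suggest_by_decade(HISTORY, SONGS)
```

Because the rule is a single visible heuristic ("you mostly sing 80s songs"), the system can explain every suggestion it makes, which is one way to keep personalization transparent.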

Even if you aren’t into karaoke (and yet have managed to read this far!), I hope you can appreciate the universality of the information needs I’m describing. Exploratory search is everywhere. But I think it’s easiest to demonstrate its practical importance by working through concrete use cases. As an HCIR advocate, I’ve repeatedly learned the lesson that such demonstrations are critical in order to successfully evangelize this worldview.

Faceted Search Presentation at New York CTO Club

Post author By Daniel Tunkelang
Post date December 10, 2009

http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=facetedsearchnyctotalk-091210081555-phpapp01&stripped_title=faceted-search-nycto-talk

Otis Gospodnetic and I recently gave a talk at the New York CTO Club on faceted search. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my book or attended the UIE virtual seminar a few months ago that I gave with Pete Bell (whom I worked with for 10 years at Endeca) might recognize some of my material. Otis focused on the specifics of implementing faceted search using the open-source Solr platform.

Here were the major take-aways:

Think about what users are trying to do, not just how they search.
Facets get polluted with bad result sets, so offer clarification before refinement.
Don’t just move the information overload problem to the facets! Show less, not more.
Look at the potential data facets you already have; you will be surprised.
Facets can come from new data, e.g. sentiment.

Search User Interfaces and Data Quality

Post author By Daniel Tunkelang
Post date December 3, 2009

One of the many things I’ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about HCIR. Indeed, some of the folks working on “more and better search refinements” are just steps away from my desk. Very cool!

But working on the inside has also helped me appreciate what Bob Wyman tried to tell me months ago–that Google has no philosophical predilection towards black box approaches, but rather is only limited by what technology makes possible and what its engineers can implement. I’d qualify that slightly by saying that I perceive an additional constraint: Google does have a strong predilection towards data-driven decisions. Some folks have found that approach objectionable in the context of interface design.

Anyway, if you’re a regular here, then you’re probably predisposed towards HCIR and exploratory search. In that case, I’d like to take a moment to help you appreciate the challenge I face on a day-to-day basis.

Which one of these two statements do you most agree with?

We need better data quality in order to support richer search user interfaces.
Richer search user interfaces allow us to overcome data quality limitations.

On one hand, consider two search engines whose interfaces are designed to support exploratory search: Cuil and Kosmix. Sometimes they’re great, e.g., [michael jackson] on Cuil and [iraq] on Kosmix. But look what can happen for queries that are further out in the tail, e.g., [faceted search] on Cuil and [real time search] on Kosmix. Yes, the kinds of queries I make. 🙂 I don’t mean to knock these guys–they’re trying, and their efforts are admirable. Moreover, both generally return respectable search results on the first pages (in Kosmix’s case, through federation). But the search refinements can be way off, and that undermines the overall experience. I strongly suspect that the problem is one of data quality, along the lines of what others have argued.

On the other hand, some of the work that I did with colleagues at Endeca (e.g., work presented at HCIR 2008 on “Supporting Exploratory Search for the ACM Digital Library”) at least dangles the possibility that the second statement holds–namely, a richer user interface could help overcome data quality limitations. Interaction draws more of the information need out of the user, and the process may be able to mask imperfection in the data. For example, it’s clear to users–and clear from the search refinements–that [michael jackson beer] and [michael jackson -beer] are about different people. If we can just get that incremental information from the user, we don’t have to achieve perfection in named entity recognition and disambiguation.
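
Here is a toy sketch of that point: one extra term from the user, required or excluded, separates the two people without any entity disambiguation in the data. The `parse_query` and `matches` helpers and the two document snippets are made up for illustration, and the sketch handles only a leading `-` on a term, nothing like a full query language.

```python
def parse_query(query):
    """Split a query into required terms and excluded terms (those prefixed with '-')."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


def matches(text, query):
    """True if the text contains every required term and none of the excluded ones."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    return all(t in words for t in required) and not any(t in words for t in excluded)


# Made-up snippets standing in for documents about two different people.
DOCS = [
    "michael jackson writes about beer and whisky",
    "michael jackson released a pop album",
]

with_beer = [d for d in DOCS if matches(d, "michael jackson beer")]
without_beer = [d for d in DOCS if matches(d, "michael jackson -beer")]
```

Neither snippet is tagged with which Michael Jackson it refers to; the user's single refinement does the work that named entity recognition would otherwise have to do.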

I think there’s some truth in both arguments. Data quality is a major bottleneck for effectively delivering an exploratory search experience, and data quantity, much as it helps, is not a guarantee of quality. Richer interfaces offer the enticing possibility of leveraging human computation, but they also introduce the risk of disappointing and alienating users. Even for an HCIR zealot like me, the constraints of reality are sobering.

And yes, speed and computational cost matter too. But hey, it wouldn’t be a grand challenge if it were easy!

Fun with Google, Bing, and Yahoo

Post author By Daniel Tunkelang
Post date November 29, 2009

Web search is a fiercely competitive space–as Google points out, “competition is just one click away”. In practice, I take that claim with a grain of salt–but I do think the switching costs are much lower than in most competitive markets. With that in mind, it’s interesting to look at what happens if you search for the name of one of the major search engines on one of its competitors’ sites.

Google returns standard results for such searches:

Bing is generous to a fault, saving you a click if you choose to use one of its leading competitors:

Finally Yahoo, whose CEO claims “we have never been a search company,” seems quite eager to keep searchers from going elsewhere:

It’s easy to dismiss these queries as corner cases, but the logs show that they really happen. And, as browsers increasingly blur the line between an address bar and a search box, it’s not unreasonable to consider that switches between search engines are likely to commence with such queries.