Categories
Uncategorized

Blogs I Read: Search Facets

A couple of years ago, I started The Noisy Channel as a personal blog. Since my then-employer Endeca didn’t have a corporate blog, I became the company’s ambassador to the blogosphere, despite my protests that this was not a corporate blog.

But I’m pleased to report that Endeca now has its own blog, aptly entitled Search Facets. I’m not usually a fan of corporate blogs, but I like the approach Endeca is taking to this one. The folks who have posted so far are Adam Ferrari (CTO), Vladimir Zelevinsky (Research Scientist), and Pete Bell (Co-Founder)–an indication that the blog will contain substance, rather than warmed-over press releases.

Indeed, the posts so far are nice and meaty. I particularly like Adam’s post about “Vertical stores for vertical web search?”–it’s nice to read intelligent analysis from someone who understands the strengths of both MapReduce and column-oriented relational databases.

Anyway, I’m delighted that my former co-workers have taken to the blogosphere, and I look forward to reading what they have to say!

Categories
General

LinkedIn Search: A Look Beneath the Hood

Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn introducing faceted search and celebrating the interface from a user perspective. John’s presentation at the SDForum took a developer’s perspective, discussing the challenges of combining faceted search and social networking at scale.

John was kind enough to publish his slides, and I’ve embedded them above. Unfortunately, there’s no recording of the extensive Q&A (which included various attempts to get John to reveal the precise details of LinkedIn’s data volume), but the slides are quite meaty.

Personally, I learned two surprising things from the talk.

First, I was surprised that LinkedIn dismisses index/cache warming as “cheating”, instead computing almost everything in real time. Specifically, I would have expected LinkedIn to cache information like a user’s set of degree-two connections: these are expensive to compute at query time, especially when the social graph is distributed and sharded by user. I did ask John whether LinkedIn recomputes a user’s degree-two network during a session, and he admitted that LinkedIn is sensible enough to “cheat” and not perform this expensive but almost useless re-computation.
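To make the cost concrete, here is a minimal sketch of the degree-two computation, assuming a toy in-memory graph sharded by user id. Everything in it (`SHARDS`, `shard_for`, `connections`, `degree_two`, and the data) is hypothetical and says nothing about how LinkedIn actually implements this. The `lru_cache` decorator stands in for the per-session “cheat” of not recomputing the network.

```python
from functools import lru_cache

# Hypothetical toy graph: each shard maps a user id to that user's direct connections.
NUM_SHARDS = 2
SHARDS = [
    {2: {1, 3}, 4: {3, 5}},             # shard 0 holds even user ids
    {1: {2, 3}, 3: {1, 2, 4}, 5: {4}},  # shard 1 holds odd user ids
]


def shard_for(user):
    """Pick the shard that owns this user's adjacency list."""
    return SHARDS[user % NUM_SHARDS]


def connections(user):
    """One shard lookup; in a distributed graph this would be a network call."""
    return shard_for(user).get(user, set())


@lru_cache(maxsize=None)  # stand-in for a per-session cache (an assumption)
def degree_two(user):
    """Users exactly two hops away: friends of friends, minus the user
    and the user's direct connections."""
    first = connections(user)
    second = set()
    for friend in first:  # one lookup per direct connection, possibly on another shard
        second |= connections(friend)
    return frozenset(second - first - {user})
```

Each call fans out to one lookup per direct connection, which is why the uncached version gets expensive for well-connected users. With the cache, a repeated `degree_two(1)` in the same process returns the stored result without touching the shards again.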

Second, I learned about reference search, a feature I may have missed because it is only available for premium LinkedIn accounts. It’s a nice feature, allowing you to search against company + date range pairs. People who are familiar with implementing faceted search may recognize the preservation of such associations between facet values as a gnarly implementation challenge.
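Here is a small sketch of why those associations are gnarly, assuming each profile is a list of positions with a company and a year range. The data and the names (`Position`, `flattened_match`, `paired_match`, `search`) are invented for illustration and are not LinkedIn’s design. The point is that treating company and date range as independent facets returns people who worked at the company, just not during those years.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    company: str
    start: int  # first year at the company
    end: int    # last year at the company


# Hypothetical profiles, invented for illustration.
PROFILES = {
    "alice": [Position("Acme", 2001, 2004), Position("Globex", 2005, 2009)],
    "bob": [Position("Globex", 2001, 2003), Position("Acme", 2006, 2009)],
}


def overlaps(position, start, end):
    """True if the position's years intersect the queried range."""
    return position.start <= end and start <= position.end


def flattened_match(positions, company, start, end):
    """Independent facets: any position matches the company AND any
    (possibly different) position matches the date range."""
    return (any(p.company == company for p in positions)
            and any(overlaps(p, start, end) for p in positions))


def paired_match(positions, company, start, end):
    """Preserved association: one position must satisfy both conditions."""
    return any(p.company == company and overlaps(p, start, end) for p in positions)


def search(match, company, start, end):
    """Return the sorted names of profiles accepted by the given matcher."""
    return sorted(name for name, positions in PROFILES.items()
                  if match(positions, company, start, end))
```

Searching for Acme in 2002 to 2003 with `flattened_match` returns both alice and bob, even though bob only joined Acme in 2006. With `paired_match`, only alice comes back, which is what someone looking for a reference would want.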

All in all, it was a treat to get this look under the hood, as well as to finally meet John in person. I also ran into Gene Golovchinsky there–so much for my spending a few days on the west coast incognito!

In any case, I’m looking forward to seeing Gene, some of John’s colleagues, and many more interesting people at the Search and Social Media Workshop (SSM 2010) on Wednesday. My apologies to those who aren’t able to attend this oversubscribed event. I promise to blog about it!

Categories
General

Workshop on Search and Social Media (SSM 2010)

The 3rd Annual Workshop on Search in Social Media (SSM 2010) will be held on Wednesday, February 3rd at the Polytechnic Institute in Brooklyn, NY. It’s co-located with the WSDM 2010 conference on Web Search and Data Mining. As a co-organizer, I’m proud to announce that the workshop program is now online.

It features a keynote from Jan Pedersen, Chief Scientist for Core Search at Microsoft, as well as an impressive set of posters and panel sessions. Other participants include:

  • Sihem Amer-Yahia, Yahoo!
  • Jon Elsas, CMU
  • Gene Golovchinsky, FXPAL
  • David Hendi, MySpace
  • LiangJie Hong, Lehigh U.
  • Jeremy Hylton, Google
  • Matthew Hurst, Microsoft
  • Hilary Mason, bit.ly
  • Richard McCreadie, U. of Glasgow
  • Mor Naaman, Rutgers U.
  • Meena Nagarajan, Wright State U.
  • Igor Perisic, LinkedIn
  • Jeremy Pickens, FXPAL

There’s still time to register if you’re interested!

Categories
General

Real Time Search Is Personal

The other day, I promised in a comment thread that I’d write about what I see as real use cases for real-time search. As it happens, I’m experiencing one right now.

As my wife, daughter, and I were walking home from a playground, we noticed a large number of fire trucks congregating a block away from our house. A quick search on Twitter explained what was going on, particularly by pointing us to this post on Gothamist–which as of this writing seems to be the only reporting about this incident.

I think this example tells us a lot about the utility of real-time search. Most of us don’t need real-time search to tell us about the news in Haiti, since a critical mass of major news providers is covering the story around the clock. Where real-time search matters most is at the personal level–specifically, when our personal urgency to obtain information is higher than that of the general population. In such situations, we’re willing to accept less polished–and even risk less accurate–information, particularly if the alternative is to wait until news providers cover the story, if they ever do. At least to some extent, urgency trumps authority.

Yes, there are other use cases for conversational media like Facebook and Twitter, such as sharing the experience of watching a live event, or simply chatting with friends and strangers about arbitrary topics. But I wouldn’t consider such use of these media to be search. Real-time search, in my view, is about helping users obtain the latest information available–in accordance with their personal needs. Twitter and Google served me well today, and I’m grateful that real-time search gave me real-time peace of mind.

Categories
General

When Is Faceted Search Appropriate?

Earlier this week, Peter Morville and Mark Burrell presented a UIE virtual seminar on “Leveraging Search & Discovery Patterns For Great Online Experiences”. It sold out! And I thought Pete Bell and I had done well with our seminar on faceted search!

But I’m hardly surprised. Although I wasn’t able to attend it myself, I gather from Twitter and the blogosphere that it was a great presentation. I enjoyed serving as a reviewer for Peter’s new book on Search Patterns, and I contributed a bit to Endeca’s UI Design Pattern Library while I was there and Mark’s team was developing it.

In reading reactions to the seminar, I was particularly intrigued by a post entitled “Search and Browse” by Livia Labate on her fantastically named blog, “I think, therefore IA”. She raised a question that I think needs to be asked more often: when is (or isn’t) faceted search appropriate?

Her conversation with readers in a comment thread offered some possible answers:

  • Faceted search helps users who think in terms of attribute specifications as filtering criteria.
  • Faceted search supports search by exclusion, as opposed to by discovery.
  • Faceted search requires a set of useful facets that is neither too small nor too large.

I’d like to propose my own answers. Here are the conditions under which I see faceted search being most useful:

  • Faceted search supports exploratory use cases, in contrast to known-item search. For known-item search, users are better served by a search box to specify an item by name, or a non-faceted hierarchy to locate it. In contrast, faceted search optimizes for cases where users are either unsure of what they want or of how to specify it.
  • Faceted search helps users who need or want to learn about the search space as they execute the search process. Facets educate users about different ways to characterize items in a collection. If users do not need or want this education, they may be frustrated by an interface that makes them do more work.
  • The search space is classified using accurate, understandable facets that relate to the users’ information needs. As I’ve discussed before, data quality is often the bottleneck in designing search interfaces. Offering users facets that are either unreliable or unrelated to their needs is worse than providing no facets at all.

Given the above criteria, it’s not surprising that faceted search has been a huge success in online retail: shopping is often an exploratory learning experience, and retailers tend to have good data.

But the success of faceted search in retail overshadows other domains where faceted search may be even more valuable. My favorite example is faceted people search, most recently demonstrated by LinkedIn. I would love to see other entities (locations, businesses, etc.) receive similar treatment, at least in contexts where exploration is a common use case.

I think Livia is right to be skeptical about any interface that introduces complexity–and facets do introduce complexity. I hope that my guidelines help answer her question as to when that complexity is worthwhile and perhaps even necessary to help users satisfy their information needs.

Categories
Uncategorized

Can You “Near Me Now”?

Weren’t we just talking about what’s different about mobile search use cases and about how to make web search more exploratory? I may be biased, but I think that Google’s recently launched “near me now” button is a step in the right direction (no pun intended!) on both of these fronts.

I’m curious to hear unbiased feedback from iPhone and Android users who have gotten to play with it.

Categories
General

Search Questions for 2010: What’s On My Mind

Happy New Year to the Noisy Community and everyone else in virtual earshot! I hope everyone is entering 2010 well-rested and ready for great things. And I don’t just mean shiny new gadgets.

For me, 2009 marked the end of a decade-long run at Endeca, where I focused on bringing HCIR to enterprises. I’m particularly proud of two professional accomplishments: writing a book on faceted search, and organizing the SIGIR 2009 Industry Track.

But past is prologue. I spent the last several weeks of 2009 as a Noogler, and I launch into 2010 living and breathing search on the open web.

What’s on my mind? Here are some top-of-mind questions to which I hope to have better answers by this time next year:

  • Exploratory Search: how should we determine that users want a more exploratory search experience, rather than one that minimizes time to a best-effort result? How should we respond to queries that clearly don’t have a single best answer, such as queries of the form [category] or [category location]?
  • Mobile Search: should it be just like non-mobile search with a few tweaks to accommodate the device form factor? Or does / should mobile search fundamentally change the way we interact with information?
  • Real-Time Search: is it more than real-time indexing plus emphasizing recency as a query-independent relevance factor? What are the use cases, and how should we be addressing them?
  • Social / Collaborative Search: should we be looking to microblogging or other social media signals to augment (or even supplant!) link-based citations as authority cues? Should we be supporting mediated search by linking people to people, rather than directly to information?

To be clear, these are simply the questions that are on my mind–I’m speaking as an individual and not as a Google employee. That said, a great thing about being at Google is that there are people working on all of these areas. So I expect 2010 to be an exciting year!

I’m curious to hear what problems are on other people’s minds as we enter the new year. Comment away!