
LinkedIn Faceted Search Now Out Of Beta

LinkedIn started rolling out a beta version of faceted people search back in July. Now it’s officially out of beta, as announced on their blog. I’ve re-posted the video above in case you missed it in July.

Interestingly, LinkedIn developed its own tool to support the combination of faceted search with social network search: Bobo-Browse (Otis mentioned it in our recent presentation to the New York CTO Club). I helped develop similar functionality when I was at Endeca, so I know how hard this problem is. LinkedIn has done an impressive job–and has applied it to one of the most valuable data sets on the web. Bravo!

But I can’t help asking for just one more thing. LinkedIn has great semi-structured data about its 50+ million members. I’d love to be able to explore that data using more facets–in particular, facets relating to people’s job skills and expertise. I hope that’s something they’re working on. Perhaps a good topic of conversation at the upcoming Workshop on Search and Social Media!


Karaoke: A Hotbed for Micro-IR?

I’m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I’m sure I’m not the only person who encounters this micro-IR problem, and it occurred to me that there might be better technical solutions to it.

Most karaoke venues provide printed song books, typically sorted by title and by artist. This approach is certainly adequate for very limited selections, but it doesn’t scale gracefully. Indeed, one of my favorite karaoke bars, the Courtside in Cambridge, MA, has a fantastic song selection that is only accessible through printed books. Kinda frustrating for a search guy, even though the staff is very helpful!

My regular karaoke venue in New York, Second on Second, is a bit more technologically advanced: it provides computers with dedicated software that allows patrons to search through their song catalog. Aside from being faster than thumbing through books, the software makes it possible to find songs when you only remember words that are in the middle of song or artist names.

But even such a system only addresses known-item search–in this case, looking for a song or artist by name when you know precisely what you are looking for. There’s room for incremental improvement here, e.g., searching for songs based on the lyrics you remember. For example, many people remember a famous David Bowie song based on its protagonist “Major Tom” rather than its title “Space Oddity”; fortunately, tools like Google’s music search are happy to make such connections.

But none of the karaoke search technology I’ve seen to date supports exploration. Specifically, I’d love to go into a karaoke bar and have a procedure for finding songs I know that is better than trial and error. For example, I’d like to be able to see my options for hard rock 80s songs with male vocals. Or to find out which downtempo bands, if any, are on the menu. A little faceted search would go a long way towards making the song-finding experience more pleasant and efficient.
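To make that concrete, here is a minimal sketch of faceted narrowing over a karaoke catalog. The catalog, the facet fields (genre, decade, vocals), and their values are all invented for illustration; a real system would of course work off the venue's actual song database.

```python
from collections import Counter

# A toy karaoke catalog; fields and values are invented for illustration.
CATALOG = [
    {"title": "Rock You Like a Hurricane", "artist": "Scorpions",
     "genre": "hard rock", "decade": "1980s", "vocals": "male"},
    {"title": "Pour Some Sugar on Me", "artist": "Def Leppard",
     "genre": "hard rock", "decade": "1980s", "vocals": "male"},
    {"title": "Space Oddity", "artist": "David Bowie",
     "genre": "rock", "decade": "1960s", "vocals": "male"},
    {"title": "Teardrop", "artist": "Massive Attack",
     "genre": "downtempo", "decade": "1990s", "vocals": "female"},
]

def refine(songs, **selections):
    """Keep only the songs that match every selected facet value."""
    return [s for s in songs if all(s.get(f) == v for f, v in selections.items())]

def facet_counts(songs, field):
    """Count the values of one facet over the current result set."""
    return Counter(s[field] for s in songs)

# "Show me my options for hard rock 80s songs with male vocals."
print([s["title"] for s in refine(CATALOG, genre="hard rock", decade="1980s", vocals="male")])

# "Which downtempo bands, if any, are on the menu?"
print(facet_counts(refine(CATALOG, genre="downtempo"), "artist"))
```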

But why stop there? I’d really like a system that suggests songs based on what it knows about me. For example, knowing that I like to sing Scorpions songs is a reasonable basis to suggest similar artists like Def Leppard and Guns N’ Roses. Or perhaps to suggest 80s songs in general–after all, karaoke roulette notwithstanding, most people sing songs they know (or at least think they know), and their song knowledge tends to have some temporal locality. I’m sure you can imagine far more sophisticated personalization–and such personalization could be accomplished with complete transparency to the user.
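For what it's worth, here is a minimal sketch of that kind of suggestion, using nothing fancier than artist co-occurrence across other singers' histories. The histories are invented, and a real system would draw on much richer signals, but it captures the "people who sing Scorpions also sing..." intuition.

```python
from collections import Counter

# Invented singing histories: singer -> artists they have performed.
HISTORIES = {
    "alice": {"Scorpions", "Def Leppard", "Guns N' Roses"},
    "bob": {"Scorpions", "Def Leppard", "Bon Jovi"},
    "carol": {"David Bowie", "Queen"},
}

def suggest_artists(my_artists, histories, top_n=3):
    """Suggest artists that co-occur with the ones I already like to sing."""
    scores = Counter()
    for artists in histories.values():
        if artists & my_artists:                  # this singer shares some of my taste
            for artist in artists - my_artists:   # credit whatever else they sing
                scores[artist] += 1
    return [artist for artist, _ in scores.most_common(top_n)]

print(suggest_artists({"Scorpions"}, HISTORIES))
# e.g. ['Def Leppard', "Guns N' Roses", 'Bon Jovi']
```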

Even if you aren’t into karaoke (and yet have managed to read this far!), I hope you can appreciate the universality of the information needs I’m describing. Exploratory search is everywhere. But I think it’s easiest to demonstrate its practical importance by working through concrete use cases. As an HCIR advocate, I’ve repeatedly learned the lesson that such demonstrations are critical in order to successfully evangelize this worldview.


Faceted Search Presentation at New York CTO Club

Otis Gospodnetic and I recently gave a talk at the New York CTO Club on faceted search. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my book or attended the UIE virtual seminar that I gave a few months ago with Pete Bell (whom I worked with for 10 years at Endeca) might recognize some of my material. Otis focused on the specifics of implementing faceted search using the open-source Solr platform.

Here were the major take-aways:

  • Think about what users are trying to do, not just how they search.
  • Facets get polluted with bad result sets, so offer clarification before refinement.
  • Don’t just move the information overload problem to the facets! Show less, not more.
  • Look at the potential data facets you already have; you will be surprised.
  • Facets can come from new data, e.g. sentiment.
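For the practically minded, here is a minimal sketch of what faceted search looks like through Solr's standard HTTP API. The collection name and facet fields (products, category, brand) are hypothetical; the parameters themselves (facet, facet.field, facet.mincount, fq) are the standard ones Solr exposes for faceting and refinement.

```python
import requests

SOLR = "http://localhost:8983/solr/products/select"  # hypothetical collection

# Initial query: ask Solr for facet counts alongside the top results.
params = {
    "q": "camera",
    "rows": 10,
    "facet": "true",
    "facet.field": ["category", "brand"],  # hypothetical facet fields
    "facet.mincount": 1,
    "wt": "json",
}
results = requests.get(SOLR, params=params).json()
print(results["facet_counts"]["facet_fields"])

# Refinement: a clicked facet value becomes a filter query (fq).
params["fq"] = 'brand:"Canon"'
refined = requests.get(SOLR, params=params).json()
print(refined["response"]["numFound"])
```

Note that the refinement goes into fq rather than the main query, which keeps the user's keywords intact and lets Solr cache the filter separately.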

Blogs I Read: Living La Vida Local

My new role at Google (yes, it still feels new after not quite a month!) has given me a professional interest in local search. I’ve adjusted my reading materials accordingly, and I’ve started reading blogs that focus on local. Here are a handful that I’ve discovered so far:

Not surprisingly, these blogs offer me a critical perspective on how Google and other search engines serve the local space. Granted, everyone has their own motives–and it’s hard to avoid some tension in a space with the competitive dynamics of local search. But now that I’m no longer an outsider myself, I appreciate having others to help keep me honest as I work to make local search better for users and businesses.


Search User Interfaces and Data Quality

One of the many things I’ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about HCIR. Indeed, some of the folks working on “more and better search refinements” are just steps away from my desk. Very cool!

But working on the inside has also helped me appreciate what Bob Wyman tried to tell me months ago–that Google has no philosophical predilection towards black box approaches, but rather is only limited by what technology makes possible and what its engineers can implement. I’d qualify that slightly by saying that I perceive an additional constraint: Google does have a strong predilection towards data-driven decisions. Some folks have found that approach objectionable in the context of interface design.

Anyway, if you’re a regular here, then you’re probably predisposed towards HCIR and exploratory search. In that case, I’d like to take a moment to help you appreciate the challenge I face on a day-to-day basis.

Which one of these two statements do you most agree with?

  1. We need better data quality in order to support richer search user interfaces.
  2. Richer search user interfaces allow us to overcome data quality limitations.

On one hand, consider two search engines whose interfaces are designed to support exploratory search: Cuil and Kosmix. Sometimes they’re great, e.g., [michael jackson] on Cuil and [iraq] on Kosmix. But look what can happen for queries that are further out in the tail, e.g., [faceted search] on Cuil and [real time search] on Kosmix. Yes, the kinds of queries I make. 🙂 I don’t mean to knock these guys–they’re trying, and their efforts are admirable. Moreover, both generally return respectable search results on the first pages (in Kosmix’s case, through federation). But the search refinements can be way off, and that undermines the overall experience. I strongly suspect that the problem is one of data quality, along the lines of what others have argued.

On the other hand, some of the work that I did with colleagues at Endeca (e.g., work presented at HCIR 2008 on “Supporting Exploratory Search for the ACM Digital Library”) at least dangles the possibility that the second statement holds–namely, a richer user interface could help overcome data quality limitations. Interaction draws more of the information need out of the user, and the process may be able to mask imperfection in the data. For example, it’s clear to users–and clear from the search refinements–that [michael jackson beer] and [michael jackson -beer] are about different people. If we can just get that incremental information from the user, we don’t have to achieve perfection in named entity recognition and disambiguation.

I think there’s some truth in both arguments. Data quality is a major bottleneck for effectively delivering an exploratory search experience, and data quantity, much as it helps, is not a guarantee of quality. Richer interfaces offer the enticing possibility of leveraging human computation, but they also introduce the risk of disappointing and alienating users. Even for an HCIR zealot like me, the constraints of reality are sobering.

And yes, speed and computational cost matter too. But hey, it wouldn’t be a grand challenge if it were easy!


Fun with Google, Bing, and Yahoo

Web search is a fiercely competitive space–as Google points out, “competition is just one click away”. In practice, I take that claim with a grain of salt–but I do think the switching costs are much lower than in most competitive markets. With that in mind, it’s interesting to look at what happens if you search for the name of one of the major search engines on one of its competitor’s sites.

Google returns standard results for such searches:

[bing] on Google

[yahoo] on Google

Bing is generous to a fault, saving you a click if you choose to use one of its leading competitors:

[google] on Bing

[yahoo] on Bing

Finally Yahoo, whose CEO claims “we have never been a search company,” seems quite eager to keep searchers from going elsewhere:

[bing] on Yahoo

[google] on Yahoo

It’s easy to dismiss these queries as corner cases, but the logs show that they really happen. And, as browsers increasingly blur the line between an address bar and a search box, it’s not unreasonable to consider that switches between search engines are likely to commence with such queries.


Marti Hearst: Tech Talk on Search User Interfaces

Earlier this week, Marti Hearst gave a Tech Talk at Google about her recently published book, Search User Interfaces. Fortunately for those of us who missed it (myself included!), it is now available on YouTube. Enjoy! (via Jon Elsas)


Can We Learn From Anti-Social Users?

One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise–be they links, re-tweets, or any number of other ways that online consumers (or more correctly “prosumers”) actively and passively communicate value judgments about information. On the other hand, our reliance on these social signals makes us vulnerable to positive feedback and spammers.

Consider MusicLab, an “experimental study of self-fulfilling prophecies in an artificial cultural market”. In this study, sociologists Matt Salganik, Peter Dodds, and Duncan Watts manipulated the social information available to consumers (specifically teens) regarding their peers’ musical tastes. The experimenters’ goal was to empirically validate a quantitative model of social contagion.

But we can look at this study another way: by isolating the social factors that influence musical taste, the experimenters were also isolating the non-social signal–in theory, how popular a song would be in the absence of social signaling. Indeed, they found that, if they measured a song’s quality by isolating out the social factor, “the best songs never do very badly, and the worst songs never do extremely well, but almost any other result is possible”.

It’s interesting–interesting to me, at least!–to ask if search engines can do the same for search. One of the frequent objections to link-based authority measures like PageRank is that they make the rich get richer. “Real-time” variants like re-tweet frequency (and even TunkRank) suffer from the same weakness. Unchecked, these measures can cause the authority / influence market to resemble a winner-take-all market.
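For readers who haven't looked under the hood, here is a toy sketch of PageRank-style power iteration that shows the dynamic. The link graph is invented, and real implementations add many refinements (dangling nodes, personalization, and so on); the point is simply that authority flows along links, so pages that already attract links keep attracting rank.

```python
# Toy link graph: page -> pages it links to (invented for illustration).
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],  # a newcomer nobody links back to
}

def pagerank(links, damping=0.85, iterations=50):
    """Plain power iteration over a small link graph."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# "c" keeps accumulating authority because everyone links to it, while "d"
# stays poor regardless of how good its content might be.
print(sorted(pagerank(LINKS).items(), key=lambda kv: -kv[1]))
```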

It strikes me as interesting to learn from cases where searchers swim upstream against the social signals to find information. Of course, you may already see the contradiction–this is just another kind of social signaling! Still, it seems like it might be a way to hedge our bets against the weaknesses of positive feedback and spammers. In a similar vein, we might look at how users find information that suffers from poor accessibility or retrievability.

I don’t have answers about how to pursue such an approach, or whether it would even be feasible to do so. But I hope you agree with me that it’s an interesting question.


Exploring Exploratory Search

Google’s recently released Image Swirl is slick. But I’ve been struggling to figure out whether it’s useful or simply a showcase for cool technology.

And that’s prompted me to think about the overloaded term “exploratory search”. A while back, I tried to define exploratory search based on what it is not. This time, let me aim to positively characterize what I see as its two primary use cases:

  1. I know what I want, but I don’t know how to describe it.
  2. I don’t know what I want, but I hope to figure it out once I see what’s out there.

The first use case cries out for tools that support query refinement or elaboration. Existing tools span a range from suggesting spelling corrections (aka “did you mean”) to offering semantically or statistically related searches that hopefully provide the user with at least a step in the right direction. One of my favorite approaches, faceted search, is primarily used to support query refinement through progressive narrowing of an initial search query.
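As a toy illustration of the "did you mean" end of that spectrum, here is a sketch that suggests corrections by fuzzy-matching against a made-up log of popular queries; production systems obviously draw on far richer signals (click data, session context, language models).

```python
import difflib

# Invented log of popular past queries to suggest from.
QUERY_LOG = ["faceted search", "exploratory search", "federated search",
             "image swirl", "query refinement"]

def did_you_mean(query, log, cutoff=0.6):
    """Return the closest popular query, if any, as a spelling suggestion."""
    matches = difflib.get_close_matches(query, log, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(did_you_mean("facetted serch", QUERY_LOG))  # -> 'faceted search'
```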

The second “I don’t know what I want” use case is fuzzier. In the language of machine learning, this use case is unsupervised, while the previous one is supervised. In general, it’s a lot harder to define or evaluate outcomes for unsupervised scenarios. Indeed, Hal Daume has argued that we should only do unsupervised learning if we do not have a trustworthy automatic evaluation metric. That’s a strong position, and you can see some of the counterarguments in his comment thread. But, going back to our scenario, it’s really hard to judge the effectiveness of tools like similarity browsing when they support exploration in the absence of any concrete goal.

With that in mind, I’ll reserve judgment on the utility of tools like Image Swirl. To the extent that it aims at the first use case, clustering images for a particular search, I’m ambivalent. I’d prefer a more transparent interface, in which I have more of a sense of control over the navigational experience. I suspect it is more aimed at the second use case, offering a compact visualization of what is out there.

Besides, as some folks have brought up at the HCIR workshops, it’s important that we make information seeking fun. And Swirl certainly scores on that front.


An Ad-Supported Model With Teeth?

A computer-implemented method for operating a device, the method comprising:
disabling a function of an operating system in a device;
presenting an advertisement in the device while the function is disabled;
and enabling the function in response to the advertisement ending.

So reads the first claim from a patent application that Apple recently filed (with Steve Jobs as first inventor, no less!) for technology to deliver a rather compelling ad-supported business model. Or perhaps the better word is compulsory. You can read an analysis by Randall Stross in the New York Times.

I agree with Stross that it’s hard to imagine Apple ever implementing the technology described by the patent application–indeed, Apple has been one of the few success stories for paid digital content models. That said, the approach does feel like at least one endpoint for the ad-supported model–it guarantees the advertisers the attention that they are paying for by subsidizing content or services.

The advertising business is a bit more top of mind for me, now that it pays my salary. Google’s approach, however, follows the aphorism that honey catches more flies than vinegar: it tries to target ads well enough that users want to click on them, rather than to simply endure them as a cost of subsidizing free services. Google’s revenue (and the popularity of PPC models in general) is a testament to the success of this approach, my occasional rant notwithstanding.

In general, the industry seems to have found a compromise in how aggressively to push ads at users. Users can safely ignore (or even block) sponsored links, but few people do.  Pre-roll ads on video sites (i.e., advertising before a video starts)  are more invasive, but a number of sites let users skip them. You can read why the YouTube folks are testing this approach. Advertisers–or at least ad-supported services–seem to recognize that they can’t cross the line between pursuing users’ attention and annoying users to the point of alienation.

Still, technology like what Apple’s patent application describes shows that it is possible for the ad-supported model to take a more aggressive approach. Part of me wonders if more aggressive ad-supported models would revitalize paid content models, as users would stop perceiving the former as free. But I suspect that the gentler ad-supported model is here to stay, and that it will continue to strive toward the point of optimal effectiveness.