Forget Real-Time, Give Us Over Time!

In a recent announcement, Twitter Platform / API Product Manager Ryan Sarver tells us that Twitter is:

committed to providing a framework for any company big or small, rich or poor to do a deal with us to get access to the Firehose in the same way we did deals with Google and Microsoft. We want everyone to have the opportunity — terms will vary based on a number of variables but we want a two-person startup in a  garage to have the same opportunity to build great things with the full feed that someone with a billion dollar market cap does. There are still a lot of details to be fleshed out and communicated, but this a top priority for us and we look forward to what types of companies and products get built on top of this unique and rich stream.

That and some other details, like raising the API rate limit from 150 requests per hour to 1500, may well bring on what Marshall Kirkpatrick of ReadWriteWeb calls “Twitter 2.0”. But what caught my attention in Kirkpatrick’s write-up was something else, a quote from Wow.ly co-founder Kevin Marshall:

The more I do with and around social data, the less interested I seem to become in ‘realtime’ and the more interested I become in ‘over time.’ When I first started hacking on Twitter (and Facebook) apps, I was in love with the idea of parsing and analyzing data in real-time and I was very link/content focused. But the more I build and use these tools, the more I see the value in the history and the trails of the data set.

I couldn’t have said it better! Not that I haven’t tried: if you look back at my post about Topsy, you’ll see where real-time and over time meet. Recency matters, but the signal is far too sparse without some way to aggregate and analyze over time.
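
The sparsity point can be made concrete with a small sketch. It assumes mentions arrive as hypothetical (topic, timestamp) pairs; the function names, the sample data, and the seven-day trailing window are illustrative choices, not anything Twitter's API provides. On any single day the count is tiny, but a rolling total over time starts to look like a signal.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(mentions, topic):
    """Count mentions of `topic` per calendar day; mentions are (topic, datetime) pairs."""
    return Counter(ts.date() for t, ts in mentions if t == topic)


def rolling_totals(counts, start, end, window_days=7):
    """For each day from start to end, sum the counts over the trailing window."""
    totals = {}
    day = start
    while day <= end:
        # Counter returns 0 for days with no mentions, without storing them.
        totals[day] = sum(counts[day - timedelta(days=k)] for k in range(window_days))
        day += timedelta(days=1)
    return totals


# Hypothetical data: one mention every other day looks like noise in real time.
mentions = [
    ("topsy", datetime(2010, 3, 1, 9)),
    ("topsy", datetime(2010, 3, 3, 14)),
    ("other", datetime(2010, 3, 3, 15)),
    ("topsy", datetime(2010, 3, 5, 8)),
]
totals = rolling_totals(daily_counts(mentions, "topsy"), date(2010, 3, 1), date(2010, 3, 5))
print(totals[date(2010, 3, 5)])  # 3
```

A trailing window is only one way to aggregate; exponential decay would trade the hard cutoff for a smooth preference for recency.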

I’m thrilled that Twitter plans to open up its platform in a way that could enable analysis over semantic, social, and temporal dimensions. Now I’m curious to see what that access will look like, and what everyone who has been clamoring for it will do with it.

Faceted Web Search?

Researchers from Microsoft say it’s very challenging. Google is trying, but there’s a long way to go. And Eric Iverson just wrote me to describe his own preliminary efforts to build faceted search on top of Yahoo! BOSS.

I believe there’s a clearly established business case for faceted search inside the enterprise, for site search (especially for retail and media / publishing sites), even for vertical search on the open web. In all of these cases, relevance-ranked results are insufficient to meet a large subset of users’ more exploratory information needs, and HCIR approaches like faceted search are an easy sell.

But it seems much harder to make this case for general web search. The track record of startups in this space isn’t very encouraging. That could be because no one has done it right, but Clayton Christensen’s theory of disruptive innovation would suggest that a successful entrant wouldn’t have to have parity across the board, but would simply need to win on an underserved market segment. Perhaps the increasing use of faceted search for vertical search is how this process is playing out, and faceted search for general web search may end up being a slow agglomeration of verticals.

I’m curious if others have been pursuing efforts like Eric’s. Are the available APIs powerful enough to prototype your own faceted web search engine? If they aren’t, then is this a potential business opportunity for one of the major (or non-major) search engines to promote innovation by offering an open system? Or, if Yahoo! BOSS already offers such an open system, what should we make of the scale of its impact?
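
For anyone trying an experiment like Eric’s, the client-side core of a faceted prototype is small: count facet values over whatever results the API returns, then filter on the user’s selection. The following is a minimal sketch that assumes each result has already been annotated with facet fields (the field names are hypothetical). It does not call the Yahoo! BOSS API, and it deliberately leaves out the hard part, which is extracting reliable facet values from arbitrary web pages.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results carry each value of `facet`; results lacking it are skipped."""
    return Counter(r[facet] for r in results if facet in r)


def refine(results, facet, value):
    """Keep only the results whose `facet` equals the selected value."""
    return [r for r in results if r.get(facet) == value]


# Hypothetical results, as a prototype might assemble them from a search API
# plus its own metadata extraction. The field names are assumptions.
results = [
    {"url": "a.example", "site_type": "news", "year": 2009},
    {"url": "b.example", "site_type": "blog", "year": 2009},
    {"url": "c.example", "site_type": "news", "year": 2008},
    {"url": "d.example"},  # extraction failed: no facet values at all
]
print(facet_counts(results, "site_type"))  # Counter({'news': 2, 'blog': 1})
print(facet_counts(refine(results, "site_type", "news"), "year"))
```

Note that the last result simply drops out of every facet: on the open web, that coverage gap is likely to matter more than the counting logic.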

R.I.P. Modista

Long-time readers may recall my post about visual search startup Modista last November, or this guest post by one of its principals. Unfortunately, the story has a sad ending. I hope that both this technology and its developers find a good home.

Recovering From Being Hacked

I discovered today that I’d been hacked earlier this week by a spam link injection attack. I’m still not sure how it happened, but I believe I’ve cleaned out all of the offending PHP from my WordPress installation. I’ve also removed most of my plug-ins in the process, and I may have broken some things in my zeal to clean up the site. My apologies for any inconveniences, and my thanks to @awaisathar and @gsingers for helping me resolve this quickly.

Blogs I Read: UXmatters

According to Wikipedia, user experience is “the overarching experience a person has as a result of their interactions with a particular product or service, its delivery, and related artifacts, according to their design.” While I’ve never labeled myself a designer, I have always cared deeply about user experience, even back before my information retrieval days, when I was working on graph drawing. Indeed, user experience is the defining problem for HCIR.

One of my favorite resources for learning about user experience is the UXmatters blog. This group blog’s authors represent a diverse collection of industry practitioners (and one academic), and they offer concrete case studies and recommendations.

For example, in “Best Practices for Designing Faceted Search Filters”, Greg Nudelman offers a constructive critique of the Office Depot search user interface. Some of his material will be familiar to those who have read my faceted search book (particularly the chapter on front-end concerns), but the focus on a single example makes for a compelling read. I also liked Greg’s most recent post, entitled “Cameras, Music, and Mattresses: Designing Query Disambiguation Solutions for the Real World”. I was amused that he and I use the same “canonical” example for the need to offer clarification before refinement. 🙂

Here are a few more posts from other authors to give you a taste for the blog:

If you are a user experience professional, in name or in deed, then you should be reading the UXmatters blog, or perhaps even contributing to it. Of course, you’re always welcome to contribute a guest post here too.

LinkedIn Faceted Search Now Out Of Beta

LinkedIn started rolling out a beta version of faceted people search back in July. Now it’s officially out of beta, as announced on their blog. I’ve re-posted the video above in case you missed it in July.

Interestingly, LinkedIn developed its own tool to support the combination of faceted search with social network search: Bobo-Browse (Otis mentioned it in our recent presentation to the New York CTO Club). I helped develop similar functionality when I was at Endeca, so I know how hard this problem is. LinkedIn has done an impressive job–and has applied it to one of the most valuable data sets on the web. Bravo!
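
To make the combination concrete, here is a hypothetical toy sketch; it is not Bobo-Browse’s API. A breadth-first search assigns each member a connection degree, and that degree is then counted like any other facet over a people-search result set. At the scale of LinkedIn’s network, degrees would presumably have to be precomputed or bounded far more aggressively, which hints at why the problem is hard.

```python
from collections import Counter, deque


def network_degrees(graph, me, max_degree=3):
    """Breadth-first search from `me`, mapping each reachable member to a connection degree."""
    degrees = {me: 0}
    queue = deque([me])
    while queue:
        person = queue.popleft()
        if degrees[person] == max_degree:
            continue  # do not expand beyond the maximum degree
        for contact in graph.get(person, ()):
            if contact not in degrees:
                degrees[contact] = degrees[person] + 1
                queue.append(contact)
    return degrees


def degree_facet(result_ids, degrees):
    """Treat connection degree as a facet over a people-search result set."""
    return Counter(degrees.get(r, "out of network") for r in result_ids)


# Hypothetical toy network (adjacency lists) and result set.
graph = {"me": ["ann", "bob"], "ann": ["cat"], "bob": ["cat", "dan"], "cat": ["eve"]}
degrees = network_degrees(graph, "me")
print(degree_facet(["cat", "dan", "eve", "zed"], degrees))
```

Once degree is just another facet value, it composes with location, industry, or any other facet in the usual way.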

But I can’t help asking for just one more thing. LinkedIn has great semi-structured data about its 50+ million members. I’d love to be able to explore that data using more facets–in particular, facets relating to people’s job skills and expertise. I hope that’s something they’re working on. Perhaps a good topic of conversation at the upcoming Workshop on Search and Social Media!

Karaoke: A Hotbed for Micro-IR?

I’m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I’m sure I’m not the only person who encounters this micro-IR problem, and it occurred to me that there might be better technical solutions to it.

Most karaoke venues provide printed song books, typically sorted by title and by artist. This approach is certainly adequate for very limited selections, but it doesn’t scale gracefully. Indeed, one of my favorite karaoke bars, the Courtside in Cambridge, MA, has a fantastic song selection that is only accessible through printed books. Kinda frustrating for a search guy, even though the staff is very helpful!

My regular karaoke venue in New York, Second on Second, is a bit more technologically advanced: it provides computers with dedicated software that allows patrons to search through their song catalog. Aside from being faster than thumbing through books, the software makes it possible to find songs when you only remember words from the middle of a song title or artist name.

But even such a system only addresses known-item search–in this case, looking for a song or artist by name when you know precisely what you are looking for. There’s room for incremental improvement here, e.g., searching for songs based on the lyrics you remember. For example, many people remember a famous David Bowie song based on its protagonist “Major Tom” rather than its title “Space Oddity”; fortunately, tools like Google’s music search are happy to make such connections.
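
Both improvements, matching words from the middle of a title or artist name and matching remembered lyrics, amount to searching more fields. A minimal sketch, assuming a hypothetical catalog in which some songs carry a lyrics snippet; a real system would use an inverted index rather than a linear scan.

```python
def search_catalog(catalog, query):
    """Case-insensitive substring match on title, artist, and any lyrics snippet."""
    q = query.lower()
    hits = []
    for song in catalog:
        fields = (song["title"], song["artist"], song.get("lyrics", ""))
        if any(q in field.lower() for field in fields):
            hits.append(song["title"])
    return hits


# Hypothetical catalog; only some songs carry a lyrics snippet.
catalog = [
    {"title": "Space Oddity", "artist": "David Bowie", "lyrics": "Ground control to Major Tom"},
    {"title": "Rock You Like a Hurricane", "artist": "Scorpions"},
    {"title": "Pour Some Sugar on Me", "artist": "Def Leppard"},
]
print(search_catalog(catalog, "major tom"))  # ['Space Oddity']
print(search_catalog(catalog, "leppard"))    # ['Pour Some Sugar on Me']
```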

But none of the karaoke search technology I’ve seen to date supports exploration. Specifically, I’d love to go into a karaoke bar and have a procedure for finding songs I know that is better than trial and error. For example, I’d like to be able to see my options for hard rock 80s songs with male vocals. Or to find out which downtempo bands, if any, are on the menu. A little faceted search would go a long way towards making the song-finding experience more pleasant and efficient.
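
The “hard rock 80s songs with male vocals” example maps directly onto a faceted query: filter on the selected values, then count the values of the facets not yet selected so the singer can see where to go next. A minimal sketch over a hypothetical catalog with hand-assigned facet values; the facet names are assumptions.

```python
from collections import Counter


def explore(catalog, **selected):
    """Return songs matching every selected facet value, plus value counts for other facets."""
    matches = [s for s in catalog if all(s.get(f) == v for f, v in selected.items())]
    counts = {}
    for song in matches:
        for facet, value in song.items():
            if facet != "title" and facet not in selected:
                counts.setdefault(facet, Counter())[value] += 1
    return matches, counts


# Hypothetical catalog with hand-assigned facet values.
catalog = [
    {"title": "Rock You Like a Hurricane", "artist": "Scorpions",
     "genre": "hard rock", "decade": "80s", "vocals": "male"},
    {"title": "Pour Some Sugar on Me", "artist": "Def Leppard",
     "genre": "hard rock", "decade": "80s", "vocals": "male"},
    {"title": "Heart of Glass", "artist": "Blondie",
     "genre": "new wave", "decade": "70s", "vocals": "female"},
]
songs, remaining = explore(catalog, genre="hard rock", decade="80s", vocals="male")
print([s["title"] for s in songs])
print(remaining["artist"])
```

An empty result for a query like `genre="downtempo"` is itself useful: it answers “which downtempo bands, if any” without any trial and error.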

But why stop there? I’d really like a system that suggests songs based on what it knows about me. For example, knowing that I like to sing Scorpions songs is a reasonable basis to suggest similar artists like Def Leppard and Guns N’ Roses. Or perhaps to suggest 80s songs in general–after all, karaoke roulette notwithstanding, most people sing songs they know (or at least think they know), and their song knowledge tends to have some temporal locality. I’m sure you can imagine far more sophisticated personalization–and such personalization could be accomplished with complete transparency to the user.
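
One way to keep such personalization transparent is a simple additive score whose reasons can be shown to the singer. A minimal sketch, assuming a hypothetical hand-built table of similar artists and a record of songs the person has already sung; the weights (2 for the same artist, 1 for a similar artist, 1 for a shared decade) are arbitrary illustrations, not tuned values.

```python
def suggest(catalog, history, similar_artists, limit=3):
    """Rank songs not yet sung by overlap with the singer's artists and decades."""
    sung_artists = {s["artist"] for s in history}
    sung_decades = {s["decade"] for s in history}
    sung_titles = {s["title"] for s in history}
    related = set()
    for artist in sung_artists:
        related.update(similar_artists.get(artist, ()))
    scored = []
    for song in catalog:
        if song["title"] in sung_titles:
            continue
        score = 0
        if song["artist"] in sung_artists:
            score += 2  # same artist: strongest signal
        elif song["artist"] in related:
            score += 1  # similar artist
        if song["decade"] in sung_decades:
            score += 1  # temporal locality of song knowledge
        if score > 0:
            scored.append((score, song["title"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by title
    return [title for _, title in scored[:limit]]


# Hypothetical catalog, history, and similarity table.
catalog = [
    {"title": "Rock You Like a Hurricane", "artist": "Scorpions", "decade": "80s"},
    {"title": "Wind of Change", "artist": "Scorpions", "decade": "90s"},
    {"title": "Pour Some Sugar on Me", "artist": "Def Leppard", "decade": "80s"},
    {"title": "Sweet Child o' Mine", "artist": "Guns N' Roses", "decade": "80s"},
    {"title": "Heart of Glass", "artist": "Blondie", "decade": "70s"},
]
history = [catalog[0]]
similar = {"Scorpions": ["Def Leppard", "Guns N' Roses"]}
print(suggest(catalog, history, similar))
```

Because every point comes from a nameable reason, an interface built on a score like this could tell the singer why each song was suggested.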

Even if you aren’t into karaoke (and yet have managed to read this far!), I hope you can appreciate the universality of the information needs I’m describing. Exploratory search is everywhere. But I think it’s easiest to demonstrate its practical importance by working through concrete use cases. As an HCIR advocate, I’ve repeatedly learned the lesson that such demonstrations are critical in order to successfully evangelize this worldview.

Faceted Search Presentation at New York CTO Club

Otis Gospodnetic and I recently gave a talk on faceted search at the New York CTO Club. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my book, or who attended the UIE virtual seminar that Pete Bell (whom I worked with for 10 years at Endeca) and I gave a few months ago, might recognize some of my material. Otis focused on the specifics of implementing faceted search using the open-source Solr platform.

Here were the major take-aways:

  • Think about what users are trying to do, not just how they search.
  • Facets get polluted with bad result sets, so offer clarification before refinement.
  • Don’t just move the information overload problem to the facets! Show less, not more.
  • Look at the potential data facets you already have; you will be surprised.
  • Facets can come from new data, e.g. sentiment.
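
The “show less, not more” take-away can be expressed as a filtering step over candidate facets. A minimal sketch, assuming results arrive as dictionaries of facet values: it hides facets that cannot narrow the result set and keeps only the most frequent values of the rest. The default of three values is an arbitrary assumption, and the code is a plain-Python illustration rather than Solr’s logic; on the server side, Solr exposes related knobs such as `facet.limit` and `facet.mincount`.

```python
from collections import Counter


def facets_to_show(results, facets, max_values=3):
    """Keep facets that can narrow the results, each trimmed to its most frequent values."""
    shown = {}
    for facet in facets:
        counts = Counter(r[facet] for r in results if facet in r)
        if not counts:
            continue  # no result has this facet
        if len(counts) == 1 and sum(counts.values()) == len(results):
            continue  # every result shares one value, so selecting it changes nothing
        shown[facet] = counts.most_common(max_values)
    return shown


# Hypothetical product results.
results = [
    {"brand": "A", "color": "red", "size": "M"},
    {"brand": "B", "color": "red", "size": "L"},
    {"brand": "C", "color": "red", "size": "M"},
    {"brand": "A", "color": "red"},
]
print(facets_to_show(results, ["brand", "color", "size", "weight"], max_values=2))
```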

Blogs I Read: Living La Vida Local

My new role at Google (yes, it still feels new after not quite a month!) has given me a professional interest in local search. I’ve adjusted my reading materials accordingly, and I’ve started reading blogs that focus on local. Here are a handful that I’ve discovered so far:

Not surprisingly, these blogs offer me a critical perspective on how Google and other search engines serve the local space. Granted, everyone has their own motives–and it’s hard to avoid some tension in a space with the competitive dynamics of local search. But now that I’m no longer an outsider myself, I appreciate having others to help keep me honest as I work to make local search better for users and businesses.

Search User Interfaces and Data Quality

One of the many things I’ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about HCIR. Indeed, some of the folks working on “more and better search refinements” are just steps away from my desk. Very cool!

But working on the inside has also helped me appreciate what Bob Wyman tried to tell me months ago–that Google has no philosophical predilection towards black box approaches, but rather is only limited by what technology makes possible and what its engineers can implement. I’d qualify that slightly by saying that I perceive an additional constraint: Google does have a strong predilection towards data-driven decisions. Some folks have found that approach objectionable in the context of interface design.

Anyway, if you’re a regular here, then you’re probably predisposed towards HCIR and exploratory search. In that case, I’d like to take a moment to help you appreciate the challenge I face on a day-to-day basis.

Which one of these two statements do you most agree with?

  1. We need better data quality in order to support richer search user interfaces.
  2. Richer search user interfaces allow us to overcome data quality limitations.

On one hand, consider two search engines whose interfaces are designed to support exploratory search: Cuil and Kosmix. Sometimes they’re great, e.g., [michael jackson] on Cuil and [iraq] on Kosmix. But look what can happen for queries that are further out in the tail, e.g., [faceted search] on Cuil and [real time search] on Kosmix. Yes, the kinds of queries I make. 🙂 I don’t mean to knock these guys–they’re trying, and their efforts are admirable. Moreover, both generally return respectable search results on their first pages (in Kosmix’s case, through federation). But the search refinements can be way off, and that undermines the overall experience. I strongly suspect that the problem is one of data quality, along the lines of what others have argued.

On the other hand, some of the work that I did with colleagues at Endeca (e.g., work presented at HCIR 2008 on “Supporting Exploratory Search for the ACM Digital Library”) at least dangles the possibility that the second statement holds–namely, a richer user interface could help overcome data quality limitations. Interaction draws more of the information need out of the user, and the process may be able to mask imperfection in the data. For example, it’s clear to users–and clear from the search refinements–that [michael jackson beer] and [michael jackson -beer] are about different people. If we can just get that incremental information from the user, we don’t have to achieve perfection in named entity recognition and disambiguation.
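
The refinement in the Michael Jackson example needs no entity recognition at all; it only needs the user’s extra term. A hypothetical sketch of that include/exclude split over a result set, using naive whitespace tokenization; real engines would apply such operators at query time against an index rather than filtering a list.

```python
def split_by_term(results, term):
    """Partition results like the query pair [q term] and [q -term], using naive tokenization."""
    term = term.lower()
    with_term, without_term = [], []
    for r in results:
        tokens = r["text"].lower().split()
        (with_term if term in tokens else without_term).append(r)
    return with_term, without_term


# Hypothetical results for [michael jackson]: two different people, no entity labels.
results = [
    {"id": 1, "text": "Michael Jackson reviews a Belgian beer"},
    {"id": 2, "text": "Michael Jackson Thriller album anniversary"},
    {"id": 3, "text": "beer writer Michael Jackson remembered"},
]
beer, no_beer = split_by_term(results, "beer")
print([r["id"] for r in beer], [r["id"] for r in no_beer])  # [1, 3] [2]
```

The user’s one-word interaction does the disambiguation work that would otherwise fall to named entity recognition over noisy data.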

I think there’s some truth in both arguments. Data quality is a major bottleneck for effectively delivering an exploratory search experience, and data quantity, much as it helps, is not a guarantee of quality. Richer interfaces offer the enticing possibility of leveraging human computation, but they also introduce the risk of disappointing and alienating users. Even for an HCIR zealot like me, the constraints of reality are sobering.

And yes, speed and computational cost matter too. But hey, it wouldn’t be a grand challenge if it were easy!