Categories
General

HCIR ’08: A Great Interaction!

I’m back from HCIR ’08 and pleased to report that it was a rousing success. We had about 40 attendees, including such HCIR luminaries as Gary Marchionini, Marti Hearst, and mc schraefel. Microsoft Research supplied us not only with space and great food, but also workshop co-chair Ryen White and keynote speaker Sue Dumais, not to mention distinguished attendees Ken Church and Ashok Chandra. With a group like that, it was clear we were in for a great workshop.

And a great workshop we had! Some highlights: 

  • Sue Dumais’s keynote on “Thinking Outside the (Search) Box” reviewed a variety of projects she and her colleagues at MSR have pursued in personal and personalized information retrieval.
  • Marti Hearst discussed design issues in faceted search interfaces, as well as extensions to the faceted model.
  • Steven Voida discussed a novel activity-based approach to personal information and task management.

At the beginning of the day, program chair Bill Kules had us write up our top HCIR concerns on post-it notes. We clustered these to form the basis for four breakout groups that discussed interactivity, task / workflow integration, sharing/collaboration, and results presentation. The results of these discussions, as well as the accepted papers, will be published online soon.

The workshop room also served as a space for posters. Posters were displayed throughout the day, and attendees congregated around posters and demos during the various breaks between sessions.

We concluded the workshop by soliciting feedback on how to improve it for next year. A fair number of attendees expressed interest in making the structure less formal, reducing the time spent on presentations and increasing the time available for more informal interaction. A number of folks expressed interest in continuing the discussions online, though there was no consensus on the best forum for doing so.

All in all, I was delighted by the energy of the group, and I believe that these workshops are helping to support HCIR efforts in both academia and industry.

Finally, I was delighted to recognize a number of Noisy Channel readers among the attendees. In turn, Raman Chandrasekar was nice enough to (inadvertently) advertise this blog by leaving this screen up for a few minutes at the end of his presentation while he took Q&A:

I’ll keep folks here posted as more materials from the workshop become available. And, of course, you’ll be among the first to know where and when HCIR ’09 will take place.

Categories
General

In Defense of Web 2.0

Dave Kellogg has a nice post entitled “Web 2.Over?” in which he eloquently reviews the various reasons that most web 2.0 startups are “in for a reality check”.

But what I liked most about the post was his defense of the spirit of web 2.0:

While a swarm of eyeball-catching, oddly-named, twenty-something-led startups may get obliterated — outside venture circles at least — that wasn’t the point of web 2.0. To me, web 2.0 was, is, and remains an important collection of concepts that will endure:

  • A read/write web, where we can participate, update, annotate, comment, etc.
  • A social web, where there is awareness of relationships that can be leveraged appropriately
  • User-generated content, which is here to stay and always has been (think: radio call-in shows, Kids Say the Darndest Things, or America’s Funniest Home Videos)
  • The use of the web for communication and entertainment. People are natural communicators. We will always adapt our tools to that fundamental need.
  • A personalized web, that understands what we like and how we like to get it

Amen! The good news is that there is no turning back on this vision of a more interactive online medium. Today it’s blogs and tweets; tomorrow it may be something we haven’t even imagined. But, now that an increasing number of us fancy ourselves as publishers and communicators, I don’t see us giving up that power without a fight.

Categories
Uncategorized

New Wikipedia Entry: Faceted Search

I used a six-hour plane flight to finally take a crack at a Wikipedia entry for faceted search. I did recycle the faceted browser entry, but you’ll see that I largely overhauled it.

I’ve tried my best to capture the main academic and commercial efforts in this space. But I realize that, particularly on the commercial front, the entry is likely to attract attention from enterprise search vendors, particularly those that may feel slighted by not being included in the entry. I also realize that, despite my concerted attempt to write the entry from a neutral point of view, Wikipedia editors may have a knee-jerk reaction that I have a conflict of interest because of my association with Endeca. I do think, however, that Endeca is one of the few notable vendors associated with faceted search, and that it is appropriate for Endeca to receive particular mention in the history of its commercial application.

I respect the challenge that Wikipedia editors face, especially when they are curating content outside their areas of expertise. As before, I call upon the readership here to help out. If you see anything missing, add it! If you see anything wrong or misleading, fix it! Remember: ask not what Wikipedia can do for you; ask what you can do to improve Wikipedia.

Categories
Uncategorized

On My Way to HCIR ’08

Just wanted to let folks know that I’ll be offline for the next couple of days, attending HCIR ’08. I’m excited about the workshop and promise to blog about it when I return.

Categories
Uncategorized

Wikipedia and the Meaning of Truth

Nice article from Simson Garfinkel in Technology Review: “Wikipedia and the Meaning of Truth“.

An excerpt:

So what is Truth? According to Wikipedia’s entry on the subject, “the term has no single definition about which the majority of professional philosophers and scholars agree.” But in practice, Wikipedia’s standard for inclusion has become its de facto standard for truth, and since Wikipedia is the most widely read online reference on the planet, it’s the standard of truth that most people are implicitly using when they type a search term into Google or Yahoo. On Wikipedia, truth is received truth: the consensus view of a subject.

Categories
General

Tag Clouds: the Good, the Bad, and the Ugly

A while back, I promised to write about tag clouds. I’m a man of my word, and I apologize for the delay in making good on that promise.

First, let’s define tag clouds. A tag cloud is a visual depiction of a set of words or phrases that characterize a set of documents. While “tag” suggests that the words and phrases are user-generated, the contents of tag clouds are often supplied by authors or even automatically extracted. Typically, tag clouds order tags alphabetically and use the size (or some similar typographical aspect) of a tag to indicate its frequency or relevance to the document set.
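To make the sizing convention above concrete, here is a minimal sketch (my own illustration, not any particular site’s implementation) that maps tag frequencies to font sizes on a logarithmic scale, so a few very popular tags don’t dwarf everything else, and returns the tags in alphabetical order:

```python
import math

def tag_cloud_sizes(tag_counts, min_px=12, max_px=36):
    """Map each tag's frequency to a font size in pixels.

    Uses a log scale so that the most frequent tags don't
    completely dominate the cloud visually.
    """
    lo = math.log(min(tag_counts.values()))
    hi = math.log(max(tag_counts.values()))
    span = (hi - lo) or 1.0  # guard against all-equal counts
    sizes = {}
    for tag, count in tag_counts.items():
        weight = (math.log(count) - lo) / span  # 0.0 .. 1.0
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    # Tag clouds are conventionally displayed alphabetically.
    return sorted(sizes.items())

# Hypothetical counts, just for illustration:
cloud = tag_cloud_sizes({"search": 40, "hcir": 12, "facets": 25, "ui": 12})
```

The rarest tags get the minimum size, the most frequent get the maximum, and everything else falls in between on the log scale.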

Tag clouds have been derided as “the mullets of Web 2.0” (I believe the original “mullet” critic was Jeffrey Zeldman). As someone who at least finds himself advising clients about how to improve user experience, I have seen companies clamor for tag clouds without necessarily thinking through how users would benefit from them. Indeed, while a picture may be worth a thousand words, a tag cloud may simply look like a thousand words.

The Good

My favorite example of a tag cloud interface is the ESPN website. Here is a “before” and “after” view of Roger Clemens:

Before:

After:

I know that both Red Sox and Yankees fans read this blog, so I won’t take sides on the accuracy of the Mitchell Report. But the change in the tag cloud clearly and concisely shows how the news about Roger Clemens changed when that report came out.

The Bad

Unfortunately, tag clouds that offer insight are the exception, rather than the rule. Part of the problem is that tag clouds are only as good as the tags they depict: garbage in, garbage out. Tag clouds can also be so large and heterogeneous that they communicate little beyond noise. Ryan Turner cites Flickr as such an example in his post, “Tag Clouds Are Bad (Usually)“:

The Ugly

As Greg recently blogged, tag clouds generated by social tagging systems can be worse than unhelpful; they can be actively misleading. Since tag clouds often occupy prime real estate on web sites, they are a natural target for what Gartner analyst Whit Andrews calls “denial of insight” attacks.

In summary, tag clouds are a too-often abused but sometimes useful means to communicate information about a set of documents. But sites need to avoid presenting tag clouds that simply expose the poverty of their tagging.

Also, while tag clouds may be an appropriate visualization for summarizing a set of documents, they may not be the best means of presenting users with options for refining it. My colleagues and I discuss this problem in our upcoming HCIR presentation, and I’ll blog about it when I get back from the workshop.

Categories
General

Disincenting Spam

Greg called my attention today to news that Digg is shifting from popularity-based aggregation to personalized news. I can’t say I’m thrilled at the prospect of a system that “would make guesses about what [users] like based on information mined from the giant demographic veins of social networks”. I don’t suppose the results are necessarily worse than showing users stories based solely on their popularity, but at least the latter offers me some transparency.

But it was an older post Greg pointed to that caught my attention: “Combating web spam with personalization“. Here is his argument in a nutshell:

Personalized search shows different search results to different people based on their history and their interests. Not only does this increase the relevance of the search results, but also it makes the search results harder to spam.

In this 2006 post, Greg is specifically referring to the personalized search that Google was beta testing back in 2004. Google has since implemented personalized search, but without sharing much detail about how it works.

Nonetheless, Greg’s argument reminds me of one of the first posts I wrote on this blog. I was criticizing Google’s practice of keeping its relevance approach secret, and particularly the argument that Amit Singhal has advanced to justify it–that the subjectivity of relevance makes it harder to develop an open approach to relevance. My response: “the subjectivity of relevance should make the adversarial problem easier rather than harder, as has been observed in the security industry”.

I suppose personalization can help fight spam even if it is not coupled with transparency to the user. But what a great opportunity to do both by providing more user control over the information seeking process.

Categories
General

Google Exec Udi Manber: In-House Search is “Not That Good”

On Friday, David Needle of InternetNews published an article with the provocative title, “Google Exec Disses Google’s In-House Search“. The essence of the article: Udi Manber, the Google VP of Engineering who is responsible for core search, evaluated Google’s internal search tools less than enthusiastically, saying “It’s not that good — I’m complaining about it”.

The article is a bit short on details. It quotes Nitin Mangtani describing recent updates to the Google Search Appliance to enable clustering of search results. But the most telling snippet is towards the end of the article, when Manber expresses his views on user interfaces:

While the search giant is constantly tinkering with new user interfaces, Manber said the simplicity of its standard, bare bones design remains tough to beat.

“Google has been very successful by being very minimal,” Manber said. “We’re doing hundreds of experiments with user interfaces; I see two to three new ones everyday.”

He added that Google might offer users the option of different views on its main search page, similar to the way it does so already on its personalized iGoogle page.

“Otherwise, I expect very incremental changes.” He said advanced users appreciate things like 3D and interfaces that offer more detailed views, but for the vast majority, “what happens now works. You type in two words, click and you’re done. You can’t beat that.”

To borrow a popular political slogan, yes we can. In fact, as IDC analyst Sue Feldman (also quoted in the article) said the other day, “One of the problems we have with search is that people ask such lousy questions…anytime tools hand people clues, it helps.”

Google’s success on the consumer web affords them the luxury of hiding their heads in the sand when it comes to enterprise information access. And I understand the appeal that Google’s computer scientists (and others) feel in approaching the information seeking problem as one of optimizing relevance ranking.

It’s not just Manber. Here are some quotes from Google Enterprise Product Manager Cyrus Mistry at a recent presentation:

  • “[the ideal search engine] knows exactly what you meant, gives you exactly what you want.”
  • “If you think tagging is the way to go, good luck. See me in 10 years.”
  • “We’ll decide where to show it” (an explanation of the value proposition of Google’s universal search, which blends results from multiple sources into a single ranking)

It’s easy for Google to be cocky when they’re making $1.35B in quarterly profits. But it doesn’t make them right, especially when it comes to an area that accounts for about 1% of their business. Mind reading may not be impossible; some of my colleagues at CMU are working on it as we speak.

In the meantime, the only practical means our systems have for determining user intent is their input. And, as has been widely reported, the average search query contains 1.7 words. Perhaps the entropy of web search makes it possible to reliably infer intent from such a small signal. But enterprise search–which is to say information seeking in the enterprise–is harder.

At Endeca, we use our own technology in house. Our solution isn’t perfect, and we’re constantly working to improve it. But, most importantly, we’re going after the right problem. To respond to Mistry’s comments:

  • The information access tool does not presume to know exactly what you meant or what you want, but instead works with you to establish this understanding through dialogue.
  • Tagging can be a very effective way to bring in human expertise, especially when it is distributed across a broad population of users. But the tagging mechanism has to be easy for users, and the system needs to be smart about extrapolating from those tags to fill in the gaps.
  • Often the best way to present diverse results is not by blending them into a single ranking but rather exposing that diversity to users in the form of a progressive refinement dialogue.

Google aspires to “organize the world’s information” but admits that its approach falls short when it comes to organizing the information inside their firewall. I commend Manber for his candor. But I hope he and his fellow Googlers take the next step and recognize that they have to think outside the search box.

Categories
Uncategorized

Advertisers are Irrational

I’ve learned a lot from what Herb Simon, Danny Kahneman, and others tell us about the fallacy of assuming that human behavior conforms to unbounded–or even bounded–rationality. But it’s always nice to see reminders in real-world scenarios, especially ones where real money is at stake.

If you enjoy this topic, I recommend Greg Linden’s post: “Are advertisers rational?“. Or, if you’re up for it, read the original paper by Jason Auerbach, Joel Galenson, and Mukund Sundararajan: “An Empirical Analysis of Return on Investment Maximization in Sponsored Search Auctions“.

Categories
Uncategorized

Blogs I Read: Jeff’s Search Engine Caffe

One of the great things about blogging is that its public nature helps keep me honest. For all that I talk about “give to get,” I could do a bit more of it myself. One way I’d like to try is by adding a new category of posts called Blogs I Read to talk about other blogs that appeal to me and, I hope, to readers here at The Noisy Channel.

To inaugurate this series, I’m starting with Jeff’s Search Engine Caffe, published by Jeff Dalton. Jeff is a grad student in the PhD program at UMass Amherst’s Center for Intelligent Information Retrieval. He’s a bit more practically minded than your average PhD student in information retrieval, perhaps owing to his previous experience as a software engineer at Globalspec, where he worked on vertical search for engineering and manufacturing.

It’s thanks to Jeff that I’m blogging in the first place. I first met Jeff at SIGIR 2006 in Seattle, but it was at ECIR 2008 in Glasgow that he persuaded me to start a blog. Moreover, his promotion of my blog on his own was a critical factor in helping me build up a critical mass of readers.

But I hardly need gratitude as a pretext to read Jeff’s blog. Jeff does a great job of keeping up with happenings in information retrieval, particularly those that span academia and industry like Yahoo BOSS and developments in blog search.

I know that graduate students aren’t exactly encouraged to blog, since the currency of the realm is peer-reviewed publication. But I hope that Jeff keeps up blogging as a way to share his ideas with a broader audience.