Earlier this week, Marti Hearst gave a Tech Talk at Google about her recently published book, Search User Interfaces. Fortunately for those of us who missed it (myself included!), it is now available on YouTube. Enjoy! (via Jon Elsas)
Can We Learn From Anti-Social Users?
One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise–be they links, re-tweets, or any number of other ways that online consumers (or more correctly “prosumers”) actively and passively communicate value judgments about information. On the other hand, our reliance on these social signals makes us vulnerable to positive feedback and spammers.
Consider MusicLab, an “experimental study of self-fulfilling prophecies in an artificial cultural market“. In this study, sociologists Matt Salganik, Peter Dodds, and Duncan Watts manipulated the social information available to consumers (specifically teens) regarding their peers’ musical tastes. The experimenters’ goal was to empirically validate a quantitative model of social contagion.
But we can look at this study another way: by isolating the social factors that influence musical taste, the experimenters were also isolating the non-social signal–in theory, how popular a song would be in the absence of social signaling. Indeed, they found that, if they measured a song’s quality by isolating out the social factor, “the best songs never do very badly, and the worst songs never do extremely well, but almost any other result is possible”.
It’s interesting–interesting to me, at least!–to ask if search engines can do the same for search. One of the frequent objections to link-based authority measures like PageRank is that they make the rich get richer. “Real-time” variants like re-tweet frequency (and even TunkRank) suffer from the same weakness. Unchecked, these measures can cause the authority / influence market to resemble a winner-take-all market.
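To make the rich-get-richer worry concrete, here is a minimal sketch of a toy simulation. It is not PageRank or any real ranking system: the function names, the "+1" smoothing, and the page and link counts are all assumptions chosen for illustration. Each new link either picks a page uniformly at random or picks one with probability proportional to the links it already has, and we compare how much of the total the most-linked page ends up holding.

```python
import random


def simulate_links(n_pages, n_links, preferential, seed=0):
    """Hand out n_links one at a time and return the link count per page.

    If preferential is True, a page is chosen with probability proportional
    to (links it already has + 1); otherwise every page is equally likely.
    """
    rng = random.Random(seed)
    links = [0] * n_pages
    for _ in range(n_links):
        if preferential:
            weights = [count + 1 for count in links]
        else:
            weights = [1] * n_pages
        winner = rng.choices(range(n_pages), weights=weights, k=1)[0]
        links[winner] += 1
    return links


def top_share(links):
    """Fraction of all links held by the single most-linked page."""
    return max(links) / sum(links)


if __name__ == "__main__":
    for label, preferential in (("uniform", False), ("preferential", True)):
        links = simulate_links(n_pages=20, n_links=2000, preferential=preferential)
        print(f"{label:>12}: top page holds {top_share(links):.1%} of links")
```

Trying a few different seeds is more informative than any single run: the interesting part is how much the identity and share of the "winner" vary under the preferential rule, which echoes the MusicLab finding that almost any outcome is possible for songs in the middle.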
It strikes me as interesting to learn from cases where searchers swim upstream against the social signals to find information. Of course, you may already see the contradiction–this is just another kind of social signaling! Still, it seems like it might be a way to hedge our bets against the weaknesses of positive feedback and spammers. In a similar vein, we might look at how users find information that suffers from poor accessibility or retrievability.
I don’t have answers about how to pursue such an approach, or whether it would even be feasible to do so. But I hope you agree with me that it’s an interesting question.
Exploring Exploratory Search
Google’s recently released Image Swirl is slick. But I’ve been struggling to figure out whether it’s useful or simply a showcase for cool technology.
And that’s prompted me to think about the overloaded term “exploratory search“. A while back, I tried to define exploratory search based on what it is not. This time, let me aim to positively characterize what I see as its two primary use cases:
- I know what I want, but I don’t know how to describe it.
- I don’t know what I want, but I hope to figure it out once I see what’s out there.
The first use case cries out for tools that support query refinement or elaboration. Existing tools span a range from suggesting spelling corrections (aka “did you mean”) to offering semantically or statistically related searches that hopefully provide the user with at least a step in the right direction. One of my favorite approaches, faceted search, is primarily used to support query refinement through progressive narrowing of an initial search query.
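As a rough illustration of what "progressive narrowing" means in faceted search, here is a minimal sketch. The catalog, the facet names, and the helper functions are all hypothetical; a real system would compute facet counts inside the search engine rather than over an in-memory list.

```python
from collections import Counter

# Hypothetical result set for some initial query.
CATALOG = [
    {"title": "Track A", "genre": "jazz", "decade": "1960s"},
    {"title": "Track B", "genre": "jazz", "decade": "1990s"},
    {"title": "Track C", "genre": "rock", "decade": "1960s"},
    {"title": "Track D", "genre": "jazz", "decade": "1960s"},
]


def facet_counts(results, facet):
    """Count how many current results carry each value of a facet."""
    return Counter(item[facet] for item in results if facet in item)


def refine(results, facet, value):
    """Narrow the current results to those matching one facet value."""
    return [item for item in results if item.get(facet) == value]


if __name__ == "__main__":
    results = CATALOG
    print(facet_counts(results, "genre"))    # refinements offered to the user
    results = refine(results, "genre", "jazz")
    print(facet_counts(results, "decade"))   # counts reflect the narrowed set
    results = refine(results, "decade", "1960s")
    print([item["title"] for item in results])
```

The point of showing counts at every step is that the user never has to guess which refinements lead somewhere: each one is offered with evidence that it narrows the set without emptying it.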
The second “I don’t know what I want” use case is fuzzier. In the language of machine learning, this use case is unsupervised, while the previous one is supervised. In general, it’s a lot harder to define or evaluate outcomes for unsupervised scenarios. Indeed, Hal Daume has argued that we should only do unsupervised learning if we do not have a trustworthy automatic evaluation metric. That’s a strong position, and you can see some of the counterarguments in his comment thread. But, going back to our scenario, it’s really hard to judge the effectiveness of tools like similarity browsing when they support exploration in the absence of any concrete goal.
With that in mind, I’ll reserve judgment on the utility of tools like Image Swirl. To the extent that it aims at the first use case, clustering images for a particular search, I’m ambivalent. I’d prefer a more transparent interface, in which I have more of a sense of control over the navigational experience. I suspect it is more aimed at the second use case, offering a compact visualization of what is out there.
Besides, as some folks have brought up at the HCIR workshops, it’s important that we make information seeking fun. And Swirl certainly scores on that front.
An Ad-Supported Model With Teeth?
A computer-implemented method for operating a device, the method comprising:
disabling a function of an operating system in a device;
presenting an advertisement in the device while the function is disabled;
and enabling the function in response to the advertisement ending.
So reads the first claim from a patent application that Apple recently filed (with Steve Jobs as first inventor, no less!) for technology to deliver a rather compelling ad-supported business model. Or perhaps the better word is compulsory. You can read an analysis by Randall Stross in the New York Times.
I agree with Stross that it’s hard to imagine Apple ever implementing the technology described by the patent application–indeed, Apple has been one of the few success stories for paid digital content models. That said, the approach does feel like at least one endpoint for the ad-supported model–it guarantees the advertisers the attention that they are paying for by subsidizing content or services.
The advertising business is a bit more top of mind for me, now that it pays my salary. Google’s approach, however, follows the aphorism that honey catches more flies than vinegar: it tries to target ads well enough that users want to click on them, rather than to simply endure them as a cost of subsidizing free services. Google’s revenue (and the popularity of PPC models in general) is a testament to the success of this approach, my occasional rant notwithstanding.
In general, the industry seems to have found a compromise in how aggressively to push ads at users. Users can safely ignore (or even block) sponsored links, but few people do. Pre-roll ads on video sites (i.e., advertising before a video starts) are more invasive, but a number of sites let users skip them. You can read why the YouTube folks are testing this approach. Advertisers–or at least ad-supported services–seem to recognize that they can’t cross the line between pursuing users’ attention and annoying users to the point of alienation.
Still, technology like that described in Apple’s patent application shows that it is possible for the ad-supported model to take a more aggressive approach. Part of me wonders if more aggressive ad-supported models would revitalize paid content models, as users would stop perceiving the former as free. But I suspect that the gentler ad-supported model is here to stay, and that it will continue to strive toward the point of optimal effectiveness.
The Noisy Noogler: A Quick FAQ
I’m barely 24 hours into my new life as a Googler, and I’ve already gotten lots of questions! Here are the answers to a few of them:
Will I continue blogging at The Noisy Channel?
Absolutely! I’m committed to posting at least weekly, and I’ll try to do better than that once I’m settled into my new environment.
Will I participate in scholarly conferences and workshops?
Of course! I’m co-organizing SSM 2010, which will be held in conjunction with WSDM 2010 in February, and of course HCIR 2010, which will be held in conjunction with IIiX 2010 in August. You probably won’t see me at vendor fests, but I do hope to continue bringing industry practitioners and academic researchers together.
Will I blog about Google?
I certainly won’t disclose any confidential information–people get fired for that–or worse. And, given how much access I will have to such information, I will err on the side of caution, only discussing information that I’m sure Google has released to the general public. Beyond that, I’ll exercise common sense. I don’t want to either come across as a shill for my employer or to spar with my new colleagues in public. Subject to those constraints, however, I can and will blog about Google.
Can I get you a job at Google?
I can advise you and connect you to a recruiter, but that’s the limit of my power. The hiring process here is specifically designed to prevent any individual from manipulating it–even me!
Will I talk about what I’m working on?
See above regarding confidential information. I’ll be delighted to talk about anything I’m working on that Google has decided to disclose publicly.
Does Google know about my karaoke habit?
Too late, they’ve already signed the offer letter. 🙂
Twitter Lists as an Influence Measure?
In “Using Twitter Lists To Judge Influence“, Todd Zeigler of the Bivings Report writes:
I think Twitter Lists will end up helping separate the men from the boys when it comes to influence. In addition to seeing a Twitter users follower count, we can now see the number of other Twitter users who have added them to lists (example to the right). I would argue that getting added to a list is a bigger deal than simply getting someone to follow you.
I’m certainly intrigued by Twitter Lists, but I’m skeptical that counting how many lists someone is on will prove that much more useful than follower count. For example, I currently have 1159 followers, am on 33 lists, and have a TunkRank of 24.1. For grins, here’s a handful of people who have similar stats:
- Evan Sandhaus: 796 followers, 21 lists, TunkRank = 17.2
- Josh Young: 801 followers, 25 lists, TunkRank = 14.3
- Chris Ahearn: 1108 followers, 14 lists, TunkRank = 30.1
- Brynn Evans: 1303 followers, 33 lists, TunkRank = 18.9
- Eric Andersen: 1543 followers, 37 lists, TunkRank = 3.1
While I can’t generalize from a few arbitrarily selected data points (though Gladwell seems to have no trouble doing so in Outliers), my suspicion is that list count will be highly correlated with follower count–and may actually be a noisier signal because the numbers are so much smaller.
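For anyone who wants to check that suspicion against their own data, here is a minimal sketch that computes the Pearson correlation between follower count and list count. It uses only the six accounts quoted above, so the result says nothing general; the `pearson` helper and the `ACCOUNTS` table are illustrative names, not part of any Twitter or TunkRank API.

```python
from math import sqrt

# (followers, lists) pairs quoted in this post, including the author's own.
ACCOUNTS = {
    "author": (1159, 33),
    "Evan Sandhaus": (796, 21),
    "Josh Young": (801, 25),
    "Chris Ahearn": (1108, 14),
    "Brynn Evans": (1303, 33),
    "Eric Andersen": (1543, 37),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


followers = [f for f, _ in ACCOUNTS.values()]
lists = [l for _, l in ACCOUNTS.values()]
r = pearson(followers, lists)

if __name__ == "__main__":
    print(f"Pearson r over {len(ACCOUNTS)} accounts: {r:.2f}")
```

With a sample this small, a single account can swing the coefficient substantially, which is exactly the noise concern raised above.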
Of course, there’s no reason we should use raw list counts–any more than we should use raw follower counts. Just as TunkRank aspires to model attention scarcity and recognizes that not all followers are created equal, an effective measure of how lists contribute to influence must recognize that not all list memberships are created equal either.
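To show what "not all followers are created equal" can look like in practice, here is a minimal sketch of a TunkRank-style computation. It assumes the recurrence usually given for TunkRank, which this post does not spell out: a user's influence is the sum, over their followers, of (1 + p × the follower's influence) divided by the number of accounts that follower follows. The value of `p`, the fixed iteration count, and the toy graph are all assumptions for illustration.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iteratively estimate TunkRank-style influence scores.

    following maps each user to the set of users they follow.
    p is the assumed probability that a follower passes a message along.
    """
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    influence = {user: 0.0 for user in users}
    for _ in range(iterations):
        updated = {user: 0.0 for user in users}
        for follower, followed in following.items():
            if not followed:
                continue
            # A follower's attention is split across everyone they follow.
            share = (1 + p * influence[follower]) / len(followed)
            for target in followed:
                updated[target] += share
        influence = updated
    return influence


if __name__ == "__main__":
    # Toy graph: "a" follows only "b"; "c" follows both "b" and "d".
    toy = {"a": {"b"}, "c": {"b", "d"}}
    for user, score in sorted(tunkrank(toy).items()):
        print(user, round(score, 3))
```

In the toy graph, the follower who follows only one account contributes more than the follower who splits attention between two. A list-based measure would presumably need an analogous discount, for example weighting a list membership by how selective the list is and how much attention the list itself receives.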
I’ve been chatting with Chris Langreiter, who is working on enhancements to TunkRank to address some of the oversimplifications of its model, as well as with Jonathan Glick and Ken Reisman at TLists. I’d like to see online influence–on Twitter and in general–measured more effectively. It will be great if lists can help, but we can’t make the same naive mistakes as those who were quick to embrace follower count as a measure of authority.
Tuning in to Google Music Search
With all of the activity around e-books last week, you might think that the online world wasn’t paying attention to the media category most transformed by the Internet: music. But a week is a lifetime in the ADD-addled technology press, and today’s top story is that Google is “making search more musical“. From the official blog post:
Now, when you enter a music-related query — like the name of a song, artist or album — your search results will include links to an audio preview of those songs provided by our music search partners MySpace (which just acquired iLike) or Lala. When you click the result you’ll be able to listen to an audio preview of the song directly from one of those partners.
As with most Google features, this one is being rolled out gradually. If you’re impatient (like me), you can try it directly from this page. Or you can watch the video above.
My first impression: this is a great feature to improve known-item search, and it’s nice that they’ve partnered with folks that often let you hear whole songs, rather than 30-second snippets. The selection seems limited, but it could be that my tastes are a bit obscure. I’m curious if others share my sense that the catalog is much smaller than the ones on iTunes or Amazon.
But, as music IR specialist and fellow HCIR advocate Jeremy Pickens points out, Google is “doing to music what they did to the web“. I’m not as concerned as Jeremy is about the prospect of musical tastes being homogenized through the “rich get richer” effect of ranking–perhaps because we’re already there. Not only is pop music self-perpetuating (see this great study by my friend (and Princeton sociologist) Matt Salganik and his former advisor Duncan Watts), but even recommendation engines quash diversity. Google really can’t make things that much worse.
Besides, much as Google’s default search leads many searchers to Wikipedia, a great starting point for exploratory search, the new music search leads users to Pandora, which offers users a more exploratory experience (though it would be great if they also linked to last.fm) (thanks Jeremy!). OK, maybe “leads” is a strong word for a “listen on” link below the search result, but it’s there for people in the know.
I’d love to see Google embrace HCIR. But I appreciate the improvements to known-item search too, especially if they can delegate the HCIR functionality to others that focus on it.
The slides for Ben Shneiderman‘s HCIR 2009 keynote on “The Future of Information Discovery” are now available on the workshop web site. I’ve also taken the liberty to upload them to SlideShare and embed them here. The slides don’t do justice to Ben’s presentation style, but hopefully they at least communicate a taste of the material he covered and his vision of where HCIR needs to go as a field and community.
Google Experimenting with Social Search
Google may be an also-ran in the social networking market with its Brazil-centric Orkut service, but that hasn’t stopped the search giant from adding social features to its products. A post at the (unofficial) Google Operating System blog recounts the history of Google Reader’s social evolution, up to but not including its latest update last week. SearchWiki, though not a social search feature per se, allows users to share personal annotations of their search results, as does the more recently introduced Sidewiki. And, like Bing, Google has established a partnership with Twitter in order to surface “social” results.
But the feature announced today, which Google is actually calling “Social Search“, is a much bigger step, even if it is tucked away as an experiment on Google Labs. From the official blog post:
With Social Search, Google finds relevant public content from your friends and contacts and highlights it for you at the bottom of your search results. When I do a simple query for [new york], Google Social Search includes my friend’s blog on the results page under the heading “Results from people in your social circle for New York.” I can also filter my results to see only content from my social circle by clicking “Show options” on the results page and clicking “Social.”
I gave it a whirl, searching for “noisy channel” and then restricting the search to content from what Google considers my social circle. The results are as promised, and I could further refine the results by author name, selecting from a familiar list of Neal Richter, Jason Adams, Daniel Lemire, Ken Ellis, and Joshua Young (though for some reason Josh’s link didn’t work). Cool! Except that there are a lot of names missing (check out the bloggers in The Noisy Community) and, more importantly, I can’t refine any further or even sort the search results. Indeed, the ordering of search results seems quite arbitrary–a phenomenon I’ve noticed more generally for search engine ranking of social media content.
In short, Google Social Search is a welcome initiative, but there’s a lot more work to do before I would find a productive use for it. Given the mismatch between social search and black-box relevance ranking, a little bit of HCIR would go a long way towards making this feature practically useful.
HCIR 2009: Human-Human Interaction
On Friday, I had the privilege of seeing just how much the annual Workshop on Human-Computer Information Retrieval has grown up since I conceived it in the summer of 2007. Back then, my co-conspirators and I worried about attracting a critical mass of participants–indeed, Endeca employees easily accounted for a quarter of the attendees (and submissions) at the first HCIR workshop. And even last year host and co-sponsor Microsoft Research supplied a disproportionate share of the attendees.
But this year was different. We were overloaded with strong submissions from all corners, and we had to turn people away for lack of capacity! While we didn’t relish saying no to prospective participants, these are great problems to have! And, thanks to Nick Belkin and Diane Kelly, we’ve arranged to greatly increase that capacity at HCIR 2010–more on that in a moment.
Max Wilson has already written up an excellent summary of the workshop, which I encourage you to read. You can also see the live tweet stream at #hcir09. Rather than duplicate these efforts, let me add my personal reflections as an organizer and participant.
Ben Shneiderman‘s keynote address was sweeping and inspiring. I expected him to talk about information visualization, the area where he is most known for his contributions. He did present some examples of his group’s work on visualization-centric interfaces to support medical research, but his overall presentation took the much more ambitious approach of discussing the past, present, and possible future of HCIR. Specifically, he urged us to link our work to societal goals, such as the United Nations Millennium Development Goals. His challenge may seem impossibly idealistic, but I agree with his assertion that it is a practical one: we will do our best research by grounding ourselves firmly in the real and pressing problems of our age. Last year’s keynote speaker went on to win the Gerard Salton Award; I can only hope that Ben receives comparable accolades for his past accomplishments and future contributions to HCIR.
A new feature for this year’s workshop was a “poster boaster” session, in which each of the presenters in the poster session had one minute to pitch his or her work. For those of you unfamiliar with this format, I highly recommend it. The compressed format forces presenters to distill the essence of their contributions–a useful exercise in general. And the audience doesn’t get bored: if you decide halfway into a presentation that you aren’t interested, then you only have to wait 30 seconds until the next one! Not that we had that problem: the posters were consistently interesting, as the submissions were unusually strong this year. You can download the full workshop proceedings here.
Even the full presentations weren’t that long. The five speakers were each allotted ten minutes, with a healthy amount of time reserved for a panel-style Q&A session. The papers in this session were, by design, some of the more controversial ones. In particular, Ellen Voorhees delivered a full-throated defense of Cranfield / TREC-style evaluation: “I Come Not to Bury Cranfield, but to Praise It” (similar to her presentation at the 2006 Workshop on Adaptive Information Retrieval that I discussed on this blog last year). Her reminder of HCIR’s challenges on the evaluation front surely ruffled some feathers, but all of us HCIR advocates need to address these challenges if we want researchers (and practitioners) outside our community to drink our kool-aid.
The above format was already quite interactive (as befits a workshop about interaction), but the second half of the day was explicitly designed to facilitate discussion. We had lunch on site, followed by a one-hour poster session. We then had two one-hour guided discussion sessions to address the theoretical and practical concerns of HCIR. As organizers, we seeded both sessions with questions, but we also incorporated concerns that had come up during earlier discussions.
Finally, I am grateful to our sponsors. Catholic University was a gracious host and sponsor, providing the workshop with a great space and very helpful student volunteers. Between that and the financial contributions of Endeca and Microsoft Research, we were able to continue our tradition of not charging attendees for the workshop. I can’t promise that will continue indefinitely, but I am glad that our insistence on emphasizing substance over frivolous amenities has helped us deliver what I believe to be some of the best bang-for-buck in the scholarly community.
I’m already excited about HCIR 2010. Unlike the past three workshops, which have been held as independent events, next year’s workshop will be co-located with the Information Interaction in Context Symposium (IIiX’10) in New Brunswick, New Jersey. The workshop will take place on August 22nd, breaking our unintended tradition of holding the workshop on October 23rd. Nick Belkin assures us that there will be lots of space, so hopefully we’ll be able to accommodate everyone who is interested. We’ll also be soliciting sponsors for both the workshop and the broader symposium.
But there’s more to HCIR than enjoying each other’s company at workshops. We must spend the remaining 364 days of the year fleshing out our vision, and relating that vision not only to the disciplines HCIR explicitly integrates, but to pressing social concerns. It is up to us all to make our work relevant.

