Tuning in to Google Music Search

With all of the activity around e-books last week, you might think that the online world wasn’t paying attention to the media category most transformed by the Internet: music. But a week is a lifetime in the ADD-addled technology press, and today’s top story is that Google is “making search more musical”. From the official blog post:

Now, when you enter a music-related query — like the name of a song, artist or album — your search results will include links to an audio preview of those songs provided by our music search partners MySpace (which just acquired iLike) or Lala. When you click the result you’ll be able to listen to an audio preview of the song directly from one of those partners.

As with most Google features, this one is being rolled out gradually. If you’re impatient (like me), you can try it directly from this page. Or you can watch the video above.

My first impression: this is a great feature for improving known-item search, and it’s nice that they’ve partnered with folks that often let you hear whole songs rather than 30-second snippets. The selection seems limited, but it could be that my tastes are a bit obscure. I’m curious whether others share my sense that the catalog is much smaller than the ones on iTunes or Amazon.

But, as music IR specialist and fellow HCIR advocate Jeremy Pickens points out, Google is “doing to music what they did to the web”. I’m not as concerned as Jeremy is about the prospect of musical tastes being homogenized through the “rich get richer” effect of ranking–perhaps because we’re already there. Not only is pop music self-perpetuating (see this great study by my friend Matt Salganik, a Princeton sociologist, and his former advisor Duncan Watts), but even recommendation engines quash diversity. Google really can’t make things that much worse.
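
To make that “rich get richer” dynamic concrete, here is a minimal simulation sketch (my own toy model, not anyone’s actual ranking or recommendation algorithm): listeners pick songs with probability proportional to current play counts, and a small head of the catalog ends up with a disproportionate share of the plays.

```python
import random

# Toy cumulative-advantage ("rich get richer") model. Drawing a uniformly
# random past play from the urn is equivalent to sampling songs in
# proportion to their current play counts.
def simulate(num_songs=1000, num_plays=100_000, seed=42):
    rng = random.Random(seed)
    urn = list(range(num_songs))   # seed every song with one play
    counts = [1] * num_songs
    for _ in range(num_plays):
        song = rng.choice(urn)     # popularity-proportional draw
        urn.append(song)
        counts[song] += 1
    counts.sort(reverse=True)
    top_share = sum(counts[:num_songs // 100]) / sum(counts)
    print(f"Top 1% of songs captured {top_share:.0%} of all plays")

simulate()
```

Under an equal split, the top 1% of songs would get exactly 1% of the plays; even this mild feedback loop reliably hands them several times that.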

Besides, much as Google’s default search leads many searchers to Wikipedia, a great starting point for exploratory search, the new music search leads users to Pandora, which is probably the leading engine for exploratory music search (though it would be great if they also linked to last.fm; thanks Jeremy!). OK, maybe “leads” is a strong word for a “listen on” link below the search result, but it’s there for people in the know.

I’d love to see Google embrace HCIR. But I appreciate the improvements to known-item search too, especially if they can delegate the HCIR functionality to others that focus on it.

Ben Shneiderman’s HCIR 2009 Keynote: The Future of Information Discovery

The slides for Ben Shneiderman‘s HCIR 2009 keynote on “The Future of Information Discovery” are now available on the workshop web site. I’ve also taken the liberty of uploading them to SlideShare and embedding them here. The slides don’t do justice to Ben’s presentation style, but hopefully they at least communicate a taste of the material he covered and his vision of where HCIR needs to go as a field and community.

Google Experimenting with Social Search

Google may be an also-ran in the social networking market with its Brazil-centric Orkut service, but that hasn’t stopped the search giant from adding social features to its products. A post at the (unofficial) Google Operating System blog recounts the history of Google Reader’s social evolution, up to but not including its latest update last week. SearchWiki, though not a social search feature per se, allows users to share personal annotations of their search results, as does the more recently introduced Sidewiki. And, like Bing, Google has established a partnership with Twitter in order to surface “social” results.

But the feature announced today, which Google is actually calling “Social Search“, is a much bigger step, even if it is tucked away as an experiment on Google Labs. From the official blog post:

With Social Search, Google finds relevant public content from your friends and contacts and highlights it for you at the bottom of your search results. When I do a simple query for [new york], Google Social Search includes my friend’s blog on the results page under the heading “Results from people in your social circle for New York.” I can also filter my results to see only content from my social circle by clicking “Show options” on the results page and clicking “Social.”

I gave it a whirl, searching for “noisy channel” and then restricting the search to content from what Google considers my social circle. The results are as promised, and I could further refine them by author name, selecting from a familiar list: Neal Richter, Jason Adams, Daniel Lemire, Ken Ellis, and Joshua Young (though for some reason Josh’s link didn’t work). Cool! Except that a lot of names are missing (check out the bloggers in The Noisy Community) and, more importantly, I can’t further refine or even sort the search results. Indeed, the ordering of the results seems quite arbitrary–a phenomenon I’ve noticed more generally in search engine ranking of social media content.

In short, Google Social Search is a welcome initiative, but there’s a lot more work to do before I would find a productive use for it. Given the mismatch between social search and black-box relevance ranking, a little bit of HCIR would go a long way towards making this feature practically useful.

HCIR 2009: Human-Human Interaction

On Friday, I had the privilege of seeing just how much the annual Workshop on Human-Computer Information Retrieval has grown up since I conceived it in the summer of 2007. Back then, my co-conspirators and I worried about attracting a critical mass of participants–indeed, Endeca employees easily accounted for a quarter of the attendees (and submissions) at the first HCIR workshop. And even last year, host and co-sponsor Microsoft Research supplied a disproportionate share of the attendees.

But this year was different. We were overloaded with strong submissions from all corners, and we had to turn people away for lack of capacity! While we didn’t relish saying no to prospective participants, these are great problems to have! And, thanks to Nick Belkin and Diane Kelly, we’ve arranged to greatly increase that capacity at HCIR 2010–more on that in a moment.

Max Wilson has already written up an excellent summary of the workshop, which I encourage you to read. You can also see the live tweet stream at #hcir09. Rather than duplicate these efforts, let me add my personal reflections as an organizer and participant.

Ben Shneiderman‘s keynote address was sweeping and inspiring. I expected him to talk about information visualization, the area for which he is best known. He did present some examples of his group’s work on visualization-centric interfaces to support medical research, but his overall presentation took the much more ambitious approach of discussing the past, present, and possible future of HCIR. Specifically, he urged us to link our work to societal goals, such as the United Nations Millennium Development Goals. His challenge may seem impossibly idealistic, but I agree with his assertion that it is a practical one: we will do our best research by grounding ourselves firmly in the real and pressing problems of our age. Last year’s keynote speaker went on to win the Gerard Salton Award; I can only hope that Ben receives comparable accolades for his past accomplishments and future contributions to HCIR.

A new feature of this year’s workshop was a “poster boaster” session, in which each presenter in the poster session had one minute to pitch his or her work. For those of you unfamiliar with this format, I highly recommend it. The compressed format forces presenters to distill the essence of their contributions–a useful exercise in general. And the audience doesn’t get bored: if you decide halfway into a presentation that you aren’t interested, then you only have to wait 30 seconds until the next one! Not that we had that problem: the posters were consistently interesting, as the submissions were unusually strong this year. You can download the full workshop proceedings here.

Even the full presentations weren’t that long. The five speakers were each allotted ten minutes, with a healthy amount of time reserved for a panel-style Q&A session. The papers in this session were, by design, some of the more controversial ones. In particular, Ellen Voorhees delivered a full-throated defense of Cranfield / TREC-style evaluation: “I Come Not to Bury Cranfield, but to Praise It” (similar to her presentation at the 2006 Workshop on Adaptive Information Retrieval that I discussed on this blog last year). Her reminder of HCIR’s challenges on the evaluation front surely ruffled some feathers, but all of us HCIR advocates need to address these challenges if we want researchers (and practitioners) outside our community to drink our kool-aid.

The above format was already quite interactive (as befits a workshop about interaction), but the second half of the day was explicitly designed to facilitate discussion. We had lunch on site, followed by a one-hour poster session.  We then had two one-hour guided discussion sessions to address the theoretical and practical concerns of HCIR. As organizers, we seeded both sessions with questions, but we also incorporated concerns that had come up during earlier discussions.

Finally, I am grateful to our sponsors. Catholic University was a gracious host and sponsor, providing the workshop with a great space and very helpful student volunteers. Between that and the financial contributions of Endeca and Microsoft Research, we were able to continue our tradition of not charging attendees for the workshop. I can’t promise that will continue indefinitely, but I am glad that our insistence on emphasizing substance over frivolous amenities has helped us deliver what I believe to be some of the best bang-for-buck in the scholarly community.

I’m already excited about HCIR 2010. Unlike the past three workshops, which have been held as independent events, next year’s workshop will be co-located with the Information Interaction in Context Symposium (IIiX’10) in New Brunswick, New Jersey. The workshop will take place on August 22nd, breaking our unintended tradition of holding the workshop on October 23rd. Nick Belkin assures us that there will be lots of space, so hopefully we’ll be able to accommodate everyone who is interested. We’ll also be soliciting sponsors for both the workshop and the broader symposium.

But there’s more to HCIR than enjoying each other’s company at workshops. We must spend the remaining 364 days of the year fleshing out our vision, and relating that vision not only to the disciplines HCIR explicitly integrates, but to pressing social concerns. It is up to us all to make our work relevant.

Off To DC

I’m heading to Washington, DC tomorrow morning, a couple of days before the HCIR ’09 workshop. I’m not sure I’ll have any opportunities to blog while I’m in the nation’s capital, but of course I’ll post a write-up about the workshop when I’m back! Meanwhile, if you need your blog fix, I encourage you to check out some of the blogs I read.

Books! Books! Books!

When my daughter was born almost two years ago, I wondered if she’d grow up reading books. After all, I do most of my reading online, and increasingly find myself reading short articles rather than whole books. Needless to say, she’s loved books so far, even if she’s shredded a few.

But the bigger surprise for me is that books–specifically e-books–have become such a hot industry. When I briefly worked for a consulting firm after grad school in 1999, my first assignment was to evaluate the e-book market. The readers then consisted of the Rocket eBook and the SoftBook Reader. Suffice it to say, I correctly predicted at the time that the e-book market wasn’t ready for prime time.

But fast forward to the present. Amazon has given the e-book market some credibility: Citigroup estimates that Amazon sold 500K Kindles in 2008, and Forrester predicts it will sell 1.8M units this year.

But the last few days (and even the last 24 hours!) of news show that the e-book market is only starting to open up:

  • In May, Sony, whose e-reader sales have lagged behind the Kindle’s, announced a partnership with Google to make copyright-free books available for free.
  • Google just announced a service called Editions that it plans to launch in 2010 (by when it will have presumably settled the Google Books Settlement Agreement).
  • The Internet Archive just announced the Bookserver project as “a growing open architecture for vending and lending digital books over the Internet”.
  • Spring Design just announced Alex, an e-book reader based on Google’s Android operating system.
  • Barnes & Noble is expected to announce an e-reader that competes directly with the Kindle and has generated lots of buzz through leaked photos.

I grew up on books, and I’m excited to see that, a decade after the initial market failures, e-books (like touchscreens) are a mainstream reality. I still worry about who will buy them, especially considering that the marginal cost of distributing a typical e-book is even less than that of distributing a 5-minute song. A quick scan of a popular file-sharing site reveals that the PDF version of the bestseller The Lost Symbol takes up less than 3MB.
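
For a back-of-the-envelope comparison (the bitrate is my assumption, not a figure from either industry), a five-minute song at a typical 256 kbps encoding is more than three times the size of that PDF:

```python
# Back-of-the-envelope file sizes: a 5-minute song vs. the ~3MB e-book PDF
# mentioned above. The 256 kbps bitrate is an assumption (an iTunes
# Plus-style encoding), not a quoted figure.
song_seconds = 5 * 60
bitrate_kbps = 256
song_mb = song_seconds * bitrate_kbps / 8 / 1000   # kilobits -> megabytes

print(f"5-minute song: ~{song_mb:.1f} MB")          # ~9.6 MB
print("e-book PDF:    ~3.0 MB")
```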

Still, I’ll take a moment to celebrate the progress of technology. I’ve always known that reading was cool, but now we have the gadgets to prove it!

Who Will Buy?

As some of you know, I’m a karaoke junkie. But it’s my wife who has the classier repertoire, including “Who Will Buy?” from the musical Oliver!:

Who will buy this wonderful morning?
Such a sky you never did see!
Who will tie it up with a ribbon
And put it in a box for me?

Of course, the trope that the best things in life are free predates musical theater, let alone the web. But recent years have witnessed dramatic changes in our price sensitivities in every genre of digital (or digitizable) content, and I’m curious (sometimes morbidly so) about where it goes from here.

I won’t make you suffer through a rant about the malaise of the music and news industries–those topics, important as they are, have been overplayed in the blogosphere. If you need a refresher, I suggest Lawrence Lessig and the Nieman Journalism Lab as some of the more rational voices contributing to the discussion.

But it’s not just news and music that are experiencing the effects of the “information wants to be free” movement. Consider these industries:

  • Books. Many publishers worry that the Kindle has been setting a consumer expectation that a book should only cost $10. Indeed, a recent price war between Amazon and Wal-Mart drove some of those prices down to $8.99. Is this a boon for consumers, or a body blow to the publishing industry? It’s easy to evoke the $0.99-per-song expectation set by iTunes–but that change was more about disaggregating albums than about changing the per-unit cost. Besides, books have not yet had to confront the scale of unauthorized distribution that we see in the music industry. Legal or not, free is a potent source of price pressure.
  • Software. Wolfram Alpha just made headlines by releasing a $50 iPhone app. Many have reacted that such a high price is outrageous and will doom the application to failure. They may be right on that latter point–the market will vote with its clicks soon enough. But I’m old enough to remember $50 as being in the ballpark of what it cost to purchase a new consumer software application. Even then, unauthorized distribution was an issue–remember the “don’t copy that floppy” campaign? Today, my impression is that few people consciously purchase consumer software–a trend that I date at least to Microsoft’s strategy of bundling its software into PC purchases. The most notable exceptions are console games (which are impressive holdouts in the consumer software space) and iPhone apps–with the caveat that only a tiny minority of apps make enough money for the creators to live on. (Update: just saw this note about how EA Sports President Peter Moore sees the current console game business model of cartridges and discs as a “burning platform”.)
  • Television. Between Boxee and Netflix, there is a real chance that digital content’s cash cow, cable television, will see its regional monopolies disrupted. I can’t imagine that anyone will shed a tear for the cable companies. And yet I can’t help but wonder what happens as the notion of premium content is subsumed by an expectation that video content should be free. Are we heading towards a proliferation of cheaply produced reality TV, contests, and game shows–all sponsored by rampant product placement?

If we are to believe Mike Masnick, then the price of content is driven to its marginal cost. It’s pretty clear that the marginal cost of distributing most digital content is, while not zero, close enough to be a rounding error. Should we be looking forward to a world where no one can charge consumers for content? Folks like Jeff Jarvis and Chris Anderson are cheerleading such a world as not only inevitable but a good thing–though both of them have had the sense to make some money on non-free books while the going is good.
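
To see just how small a rounding error, here is the arithmetic under an assumed bulk bandwidth price (an invented ballpark figure, not a real quoted rate):

```python
# Hypothetical marginal delivery cost. The $0.10/GB bandwidth price is an
# invented ballpark for illustration, not a quoted rate from any provider.
bandwidth_dollars_per_gb = 0.10
file_size_mb = 3.0                # e.g., the e-book PDF from the previous post

cost = bandwidth_dollars_per_gb * file_size_mb / 1024    # MB -> GB, then price
print(f"Marginal cost to deliver one copy: ${cost:.5f}")  # roughly $0.0003
```

Even if that price is off by an order of magnitude, the conclusion doesn’t change: the cost of one more copy rounds to zero.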

Yes, there are and will always be business models to support content creators. In particular, one-time content (live events, consulting services) has some degree of insulation from the inexorable trend toward free. But what an inefficient turn of events, if people are rewarded for creating one-time content but not for creating far more valuable content that is useful to a broad audience of consumers!

I know that there are non-financial incentives that drive scholars, open-source developers, and activists to create free content. Indeed, I personally write this blog without any direct financial incentive. Perhaps these incentives will be the driving forces for content creation in the 21st century. One way or another, I hope we find a way to fund the things we value, rather than devolving into a locally optimal rut where value creation isn’t economic for the creators.

p.s. You can find the lyrics to Oliver! for free online, and you can easily view a free (unauthorized) copy of a performance of “Who Will Buy?” on YouTube. Or you can buy the song for $0.99.

Third Annual Workshop on Search in Social Media (SSM 2010)

I’m proud to announce that Eugene Agichtein, Marti Hearst, and Ian Soboroff have invited me to help organize the upcoming Workshop on Search in Social Media (SSM 2010). The workshop will take place in conjunction with the ACM Conference on Web Search and Data Mining (WSDM 2010), a young conference that has quickly become a top-tier forum for work in these areas. The conference and workshop will take place in my hometown of New York–Brooklyn, to be precise!

Here’s the key information from the workshop web site:

Overview

Social applications are the fastest growing segment of the web. They establish new forums for content creation, allow people to connect to each other and share information, and permit novel applications at the intersection of people and information. However, to date, social media has been primarily popular for connecting people, not for finding information. While there has been progress on searching particular kinds of social media, such as blogs, search in others (e.g., Facebook, Myspace, or flickr) is not as well understood.

The purpose of the 3rd Annual Workshop on Search in Social Media (SSM 2010) is to bring together information retrieval and social media researchers to consider the following questions: How should we search in social media? What are the needs of users, and models of those needs, specific to social media search? What models make the most sense? How does search interact with existing uses of social media? How can social media search complement traditional web search? What new search paradigms for information finding can be facilitated by social media?

SSM 2010 follows up on the highly successful SSM 2009 and SSM 2008 workshops held at SIGIR 2009 and CIKM 2008 respectively. We are looking forward to an equally exciting workshop at WSDM 2010 in New York!

Format and Topics

We are planning for a full-day workshop consisting of invited speakers, organized in both plenary and panel sessions, and a contributed poster/demo session.

We solicit short (under 2 pages) position papers, posters or demo proposals to be presented as part of a poster session, describing late-breaking and novel research results or demonstrations of prototypes or working systems. All topics at the intersection of information finding and social media are of interest, including, but not limited to:

  • Searching blogs, tweets, and other textual social media.
  • Searching within social networks, including expert finding.
  • Searching Wikipedia discussions and revision histories.
  • Searching online discussions, mailing lists, forums, and community question answering sites.
  • The role of human-powered and community question answering.
  • Novel models of information finding and new search applications for social media.
  • The role of timeliness, authority, and accuracy in social media search.
  • Interaction between traditional web search and social media search.
  • User needs assessments and task analysis for social media search.
  • Interactions between searching and browsing in social media.
  • Searching and exploiting folksonomies, tags, and tagged data.
  • Spam and adversarial interactions in social media.

Ideal papers may include late-breaking and novel research results, position and vision papers discussing the role of search in social media, and demonstrations of prototypes or working systems. Note that the workshop proceedings will not be archived or considered as formal publication, to encourage the informal atmosphere and to allow the authors to publish expanded versions of the work elsewhere.

The poster/demo proposals should be in standard ACM SIG format; more details will be posted soon.

Submissions are due on December 15th. I hope to see some of you there! Meanwhile, feel free to suggest ideas for invited speakers who have done interesting work at the intersection of social media and search, and I’ll share your suggestions with my co-organizers.

Innovation at Huffington Post: Data-Driven Headlines

The other day, I was suggesting to one of my colleagues that Endeca‘s software could help authors write better (translate: more SEO-friendly) headlines. The details of that discussion are proprietary, but I’m sure you can imagine the gist. Still, we all wondered whether authors would be willing to stomach such a left-brain infringement on their right-brain creativity.

But apparently the Huffington Post is blazing new trails in this area. The Nieman Journalism Lab reports that:

The Huffington Post applies A/B testing to some of its headlines. Readers are randomly shown one of two headlines for the same story. After five minutes, which is enough time for such a high-traffic site, the version with the most clicks becomes the wood that everyone sees.
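
For the mechanically inclined, here is a minimal sketch of how such a test might work. The five-minute window, random assignment, and click counting come from the quoted description; the names and structure are my own invention, not the Huffington Post’s actual code:

```python
import random
import time

class HeadlineTest:
    """Serve two headline variants at random for a fixed window, then
    promote whichever drew more clicks. A sketch, not anyone's real code."""

    def __init__(self, headline_a, headline_b, window_seconds=300):
        self.headlines = {"A": headline_a, "B": headline_b}
        self.clicks = {"A": 0, "B": 0}
        self.deadline = time.time() + window_seconds
        self.winner = None

    def serve(self):
        # Once the window closes, lock in the variant with the most clicks.
        if self.winner is None and time.time() >= self.deadline:
            self.winner = max(self.clicks, key=self.clicks.get)
        variant = self.winner or random.choice(["A", "B"])
        return variant, self.headlines[variant]

    def record_click(self, variant):
        if self.winner is None:
            self.clicks[variant] += 1

test = HeadlineTest("Senate Passes Health Bill",
                    "What the Senate's All-Night Vote Means for You")
variant, headline = test.serve()   # show this headline to the visitor
test.record_click(variant)         # ...and log it if the visitor clicks through
```

Since visitors are assigned evenly at random, raw click counts are a fair proxy for click-through rate; the subtler risk, as I note below, is optimizing for the wrong outcome.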

NJL also reports that Huffington Post social media editor–and long-time Noisy Channel reader–Josh Young uses Twitter to help crowd-source better headlines.

I’m sure this approach must rattle some old-school journalists. And there is a real danger of optimizing for the wrong outcome. For example, including the word “sex” in this message might improve its traffic (the popularity of this post attests to that), but to what end?

Still, I don’t see this use of technology as cramping anyone’s style. Most of us write to be read–especially those in the media industry who are trying to monetize their audiences. Measurable success matters, and there’s no harm in trying to maximize it.

Are Duplicate Tweets Spam?

The Twitterverse is all a-twitter with a new controversy: Twitter has rolled out a new feature that blocks duplicate tweets. From the SocialOomph blog, relaying Twitter’s explanation:

Recurring Tweets are a violation no matter how they are done, including whether or not someone pays you to have a special privilege. We don’t want to see any duplicate tweets whatsoever- They pollute Twitter, and tools shouldn’t be given to enable people to break the rules. Spinnable text seems to just be a way to bypass the rules against duplicate updates and essentially provides the same problems.

Hence, from Thursday, October 15th, 2009, 00:00 AM CST we will prevent the entry of recurring tweets on Twitter accounts within the SocialOomph system. Existing recurring tweets on Twitter accounts will all be placed in paused state at that time, so that the content of the tweet text is still accessible to you, but no publishing to Twitter of those tweets will take place.

Not everyone is thrilled with this new feature. My friend (and Noisy Channel reader) Eric Andersen notes: “this doesn’t make a lot of sense to me – many highly regarded Twitter users (e.g. @GuyKawasaki) regularly re-post tweets…primarily because of the “dip” model: re-posting the same tweet means more people will see, especially with an int’l audience.”

On one hand, I loathe inefficient communication, and I see repeated tweets as exposing the inefficiency of the dip model. We won’t get into my differences of opinion with Guy Kawasaki. If Twitter offered better search and control to users, then I think it would make sense for them to consider duplicate tweets as a spam issue.

On the other hand, Twitter search is crude. And the dip model, much as it may raise my personal hackles, is, in fact, what many users embrace. Twitter takes pride in letting users drive innovation, and I think they should be cautious about being too autocratic. Surely many of the people who post duplicate tweets do so with unspammy intentions.

Let’s face it: Twitter is going through growing pains, even if it just inherited the mother of all trust funds. They really do have to address spam. But they might consider doing so in a less heavy-handed way. I suspect that duplicate tweets are mainly a problem because they affect the statistics for Trending Topics–a problem they could easily address without prohibiting the tweets themselves, as I sketch below. Better search would make it easier for users to take charge of the user experience–a small dose of HCIR would go a long way.
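
To illustrate, here is one way that fix might look: leave duplicate tweets alone, but count each distinct (author, text) pair only once when tallying Trending Topics. The normalization rules below are my own guesses, not Twitter’s:

```python
import re
from collections import Counter

def normalize(text):
    """Collapse case and whitespace so trivial variants count as duplicates."""
    return re.sub(r"\s+", " ", text.strip().lower())

def trending_counts(tweets):
    """tweets: iterable of (author, text) pairs.
    Returns hashtag -> count, counting each distinct tweet only once."""
    seen = set()
    counts = Counter()
    for author, text in tweets:
        key = (author, normalize(text))
        if key in seen:
            continue   # the repeat still gets published; it just isn't re-counted
        seen.add(key)
        for tag in re.findall(r"#(\w+)", text):
            counts[tag.lower()] += 1
    return counts

print(trending_counts([
    ("guy", "Check out my post! #socialmedia"),
    ("guy", "Check out my post! #socialmedia"),   # duplicate: not re-counted
    ("eric", "Interesting debate today #socialmedia"),
]))
# Counter({'socialmedia': 2})
```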

I think Twitter has the best of intentions, and that it is confronting a real problem. I hope they work harder to find the right solution.