Categories
General

Upcoming Information Retrieval Conferences

I hope everyone who attended the recent SIGIR 2011 in Beijing had an excellent experience. I didn’t manage to make it to that side of the globe myself, but I’m looking forward to hearing back from my LinkedIn colleagues who were there — particularly Paul Ogilvie, who gave an invited talk at the first Workshop on Entity-Oriented Search (EOS) on “Anchoring Relevance with Entities”.

There are four outstanding information retrieval conferences coming up, and I will have the pleasure of participating in three of them. I’d like to make sure readers here are aware of all of them.

The first is KDD 2011, which will take place August 21-24, 2011 in San Diego, CA. The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition, and will bring hundreds of practitioners and academic data miners together in one location.

I will not be attending KDD myself, but several of my colleagues will be there. In particular, Ron Bekkerman will be presenting a paper on “High-Precision Phrase-Based Document Classification on a Modern Scale”, as well as offering a tutorial on “Scaling Up Machine Learning: Parallel and Distributed Approaches”.

Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011) – Mountain View, CA – October 20, 2011

The second is HCIR 2011, the fifth annual HCIR workshop, which I am co-organizing. It will be held all day on Thursday, October 20th, 2011 at Google’s main campus in Mountain View, California. There will be a reception on Wednesday evening before the workshop. Our keynote speaker this year will be Gary Marchionini, Dean of the School of Information and Library Science, University of North Carolina at Chapel Hill. We are also excited to continue the HCIR Challenge, this year focusing on the problem of information availability, where the seeker faces uncertainty as to whether the information of interest is available at all. The corpus will be the CiteSeer digital library of scientific literature, which contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.

Thanks to generous contributions made by Google, Microsoft Research, and Endeca, there will be no registration fee for HCIR this year. Information about how to register will be sent to authors of accepted position papers, research papers, and challenge reports. Note that the submission deadline has been extended by two weeks to Sunday, August 14th. I strongly encourage you to submit in one of these categories if you are working in this field.

The third is RecSys 2011, the 5th ACM International Conference on Recommender Systems. RecSys 2011 builds on the success of the Recommenders 06 Summer School in Bilbao, Spain and the series of four successful conference events from 2007 to 2010 in Minneapolis (2007), Lausanne (2008), New York (2009) and Barcelona (2010). In these events many members of the practitioner and research communities valued the rich exchange of ideas made possible by the shared plenary sessions. The 5th International conference will promote the same close interaction among practitioners and researchers.

I will be giving a tutorial at RecSys 2011 on “Recommendations as a Conversation with the User”.

The fourth is CIKM 2011, the 20th ACM Conference on Information and Knowledge Management. It will take place in Glasgow, Scotland, UK, 24th-28th October 2011. Since 1992, CIKM has successfully brought together leading researchers and developers from the database, information retrieval, and knowledge management communities. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future research directions through the publication of high quality, applied and theoretical research findings. CIKM 2011 will continue the tradition of promoting collaboration across databases, information retrieval, and knowledge management.

I am proud to be organizing the CIKM 2011 Industry Event, which will feature such industry heavyweights as Stephen Robertson (Microsoft Research), John Giannandrea (Google), Vanja Josifovski (Yahoo! Research), Ilya Segalovich (Yandex), Jeff Hammerbacher (Cloudera), and Chavdar Botev (LinkedIn).

I’m very excited about all four of these opportunities to exchange ideas about information retrieval and related areas, and I am grateful to LinkedIn for supporting my participation, as well as that of my colleagues. I hope to see some of you at these events!

Attention vs. Privacy

A major feature of the recently released Google+ is Circles, which allows you to “share relevant content with the right people, and follow content posted by people you find interesting.”

Most people seem to look at Circles as a privacy feature — and indeed Google’s official description gives the impression that Circles exist to manage privacy based on real-life social contexts. Of course, re-sharing can result in unintended consequences, and Google even offers a warning that:

Unless you disable reshares, anything you share (either publicly or with your circles) can be reshared beyond the original people you shared the content with. This could happen either through reshares or through mentions in comments.

Privacy is a big deal, especially for Google — and particularly in the context of rolling out a new social network. Still, I’m not persuaded that privacy is the only or even the primary concern motivating the concept of social circles.

Sharing content with someone is not just about giving that person permission to see it. Sharing content with someone asserts a claim on that person’s attention. While it may be a privilege for me to have access to your content, it may be even more of a privilege for you that I allocate my scarce attention to consume it.

What if we focus on routing content to the people who would find it most interesting? Such an approach works best if all of the shared content is public with respect to permissions — that is, people post it without any expectation of privacy. Twitter demonstrates that many people are comfortable with such a sharing model. Imagine if they could learn to trust a system that optimizes (or at least attempts to optimize) the allocation of everyone’s attention. This is not an easy problem by any means, nor is it one that is likely to be solved by algorithms alone. It will take a strong dose of HCIR to get it right. But, at least in my view, optimizing the allocation of human attention is the grand challenge that everyone working with information retrieval or social networks should be striving to address.

Privacy is important, and social networks should offer simple, robust privacy controls that users understand. We all have experienced the problem of filter failure. But sharing isn’t just about privacy. Our attention is our most precious cognitive asset, both as individuals and as a society. Moreover, our attention faces ever-increasing demands as our social lives evolve in an online world relatively free of physical constraints. Social network developers would do well to pay attention…to attention.

Guest Post: Diego Basch on The Need for Speed

Diego Basch is the CEO and founder of IndexTank, a hosted search service that powers major web sites such as Reddit, Twitvid, and blip.tv, and also provides a WordPress plug-in for blogs (like this one). Diego gained his search experience working with Inktomi, where he wrote some of the world’s first web-scale link analysis algorithms. He is on a mission to make every search box blazing fast and useful.

So much brainpower is spent solving the wrong problems. The world is filled with solutions looking for problems that nobody has — as illustrated by a Google query for [stupidest inventions ever]. More often, people focus narrowly on a particular approach when they should focus on the problem the approach is intended to solve. Or they take a solution for one problem and assume it will apply to another.

Consider the emphasis that search engine developers place on relevance ranking. It is not hard to understand why web-scale search engines emphasize relevance. For example, a search on Google for [emergency locksmith] returns tens of billions of web pages, among which there are only a handful of results that you want. Google must filter out the growing number of lead generation companies that spend a ton of money trying to game its results.

Most web and application developers are familiar with the concept of relevance, so they naturally assume that it should be the primary concern when they add search to their own sites or apps. When I talk to people who want full-text search for their 40,000 book titles or 100k classified ads, they ask me about all the ways they can tune relevance. But often they are focusing on a solution, rather than their fundamental problem.

Developers are (or should be!) trying to improve the user experience of their application search. Too often they wrongly assume that relevance is the single most important factor for optimizing this user experience. Let’s surface this confusion in a concrete example.

As a rock climber, once in a while I feel the aches and pains caused by the sport. As the years go by it’s very important to keep your tendons healthy if you do not want to take forced breaks (or type with one hand!). Rockclimbing.com is one of the most popular climbing sites, and I know some medical professionals who occasionally answer health-related questions there. Let’s search there for [tendon injury prevention].

In the above example, part of the problem is that the search results do not have contextual snippets. Maybe there is relevant information hiding behind a click, but the user has no way of knowing. More generally, there’s no hint as to which results could be better. Information such as the score of the answer (which is available) or the author’s bio (e.g. “climber, physical therapist”) would make the decision easier. If you need to click and scroll, search within the page, go back and try something else, then the search engine is wasting your time.
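To make the idea of a contextual snippet concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function name `contextual_snippet` and the `width` parameter are hypothetical, and this is not how Rockclimbing.com or any particular search engine builds its snippets. It simply shows the text surrounding the first query term that appears in a result, so the user can judge the result without clicking.

```python
import re


def contextual_snippet(text, query, width=60):
    """Return a short excerpt of `text` around the first query term found.

    `width` is the number of characters of context kept on each side
    of the matched term (an illustrative default, not a tuned value).
    """
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        # Empty query: just show the beginning of the document.
        return text[: 2 * width].strip()

    # Find the earliest occurrence of any query term, ignoring case.
    match = re.search("|".join(terms), text, flags=re.IGNORECASE)
    if match is None:
        # No term appears in the text: fall back to the beginning.
        return text[: 2 * width].strip()

    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)

    # Mark truncation so the reader knows the excerpt is partial.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end].strip() + suffix


post = (
    "Warm up thoroughly before climbing. Most tendon problems come "
    "from overuse, so rest days matter."
)
print(contextual_snippet(post, "tendon injury prevention", width=20))
```

A real implementation would likely pick the window that covers the most query terms and highlight them, but even this crude version gives the user a hint about what is hiding behind the click.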

Which brings us to the broader point: when users search, they want to spend the least amount of time possible getting to the information they want. Relevance is a means to this end. In particular, clicks and typing cost users time. That time can come from page load, rendering, repeated use of the back button, and of course typing (and re-typing) search queries.

Some application search engines really nail the user experience. Let’s say we’re looking for the movie Koyaanits-however-you-spell-it. Go to the Internet Movie Database (IMDB) and start typing k-o-y-e — and there it is, as the second result. Notice that there is a ton of irrelevant stuff around it but it doesn’t matter. I see what I want very quickly.
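As a rough illustration of this kind of forgiving, as-you-type matching, here is a small Python sketch. It is an assumption-laden toy, not a description of how IMDB works: the `suggest` function and the sample titles are hypothetical, and the scoring simply compares what the user has typed so far against the same-length prefix of each title using the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def suggest(typed, titles, limit=5):
    """Rank `titles` by how closely their prefix resembles `typed`.

    A misspelled prefix still scores well as long as most of its
    characters line up with the start of the title.
    """
    typed = typed.lower()

    def score(title):
        # Compare only the part of the title the user could have typed so far.
        prefix = title.lower()[: len(typed)]
        return SequenceMatcher(None, typed, prefix).ratio()

    # Highest similarity first; ties keep their original order.
    return sorted(titles, key=score, reverse=True)[:limit]


# Hypothetical catalog used only for this example.
titles = ["Key Largo", "Kung Fu Panda", "Koyaanisqatsi", "The Matrix"]
print(suggest("koye", titles, limit=3))
```

The point of the sketch is the design choice rather than the scoring function: the ranking can be sloppy, because the list updates on every keystroke and the user only needs to spot the right title somewhere near the top.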

Hopefully these two examples serve to illustrate the broader point: search engines should not focus on relevance as an end in itself, but rather on whatever helps users find the information they want as quickly as possible. That means offering contextual snippets, instant feedback, and of course snappy response times. Give users speed, and you will make them happy.

Google±?

When I left Google last December, it was an open secret that Google was developing a social networking product. Now that Google has released Google+, I am at liberty to share my personal impressions.

Let’s start with the clear wins.

  • Impressive launch. Google has certainly learned its lesson from the past launches of Wave and Buzz. Google+ is unambiguously opt-in — no one is going to complain about being ambushed. People have been begging for invites. But Google is wisely releasing invites quickly enough to build critical mass. I’d say that Google has at least picked up the Quora crowd of early adopters in Silicon Valley.
  • Clean design. Design lead Andy Hertzfeld (of Macintosh fame) has nailed it, leading bloggers to comment that this looks too well designed to be a Google product. Comparing Google+ to Facebook now, I’m reminded at least a little of comparisons between Facebook and Myspace. Great move for Google here.

Now let’s talk about Google’s three big features here: Circles, Sparks, and Hangouts.

  • Circles. Straight out of Paul Adams’s presentation of social networking (which he created before he left Google for Facebook), the idea is simple: a person doesn’t have a single group of friends, but rather several groups that tend to be mostly disjoint. Through Circles, Google+ makes this soft partitioning of the social space a core design principle. You add people to one or more circles, follow the stream of activity from a circle, and share with circles. It’s great in theory. But in practice it creates friction, especially for people trained on Facebook. There’s a trade-off between simplicity and expressive power, and Google is placing a strong bet on how users will make this trade-off. I’m inclined to agree with Yishan Wong that “the sorting of friends into buckets (friend lists) is something that only nerds do”. Given Google’s deep expertise in machine learning, I’m expecting Google to reduce this friction by giving users intelligent suggestions. Full disclosure: my colleagues at LinkedIn built InMaps, which infers communities from your social network.
  • Sparks. The tagline for Sparks is “For nerding out. Together.” It feels like a positioning designed by Googlers for Googlers; you can see promotional videos here and here. I haven’t seen much talk about Sparks, and what little commentary I’ve seen is less than gushing. I’ve experimented with it a bit from a consumption side, and I confess I’m underwhelmed. Perhaps it’s a chicken-and-egg problem: Sparks will only be useful if users populate their profiles with interests, but right now users have no incentive to do so. If Sparks is Google’s attempt to make Reader more social, there’s still a ways to go. Full disclosure: LinkedIn has its own approach to social news, LinkedIn Today, which seems to be doing something right. 🙂
  • Hangouts. In plain English, Hangouts are group video chat embedded in a social network. Which sounds a lot like what Facebook is rumored to be releasing this week through a partnership with Skype. Which in turn was just acquired by Microsoft. Will Apple join the party too by implementing group chat in FaceTime? Competitive dynamics aside, this is a very cool feature that hopefully won’t devolve into Chatroulette. Nothing to, um, disclose here.

But the $64B question is whether all this will matter. Can Google+ sustainably co-exist with Facebook? Will people use both services — and, if so, how will they allocate their attention between them? Or is the success of Google+ predicated on displacing Facebook? Or Twitter? Either of those would certainly qualify as a Big Hairy Audacious Goal.

Like Fred Wilson, I’m rooting for Google+ to succeed — but even Fred notes that he would not be able to get his family on Google+, as they are already happy with Facebook. It’s not clear to me what I can get *today* from Google+ that I can’t get from Facebook.

Granted, I’m not a heavy Facebook user, so I’m not the best person to ask this question. So readers, I ask you: why will or won’t you use Google+?