Categories
Uncategorized

Is Google destroying the planet?

In Lewis Carroll’s Through the Looking Glass, the White Queen tells Alice that she’s “believed as many as six impossible things before breakfast”. Well, it’s a good thing I read this post about the environmental impact of Google searches before breakfast. It cites physicist Alex Wissner-Gross as saying that the average Google search generates 7g of CO2, using up half as much energy as boiling a kettle for a cup of tea.

I ate some breakfast after reading it, and now I’m feeling appropriately skeptical again. If nothing else, I doubt Google reveals enough about its internals for anyone to come to such a precise calculation.

[Note: Google explains here that the calculation is off by a fair amount–the average search generated 0.2g of CO2. Thanks to Jeremy for the heads up. Also, Jason Kincaid criticizes the Times Of London’s reporting here.]

Nonetheless, there may be something worth exploring in this argument. Even if the environmental cost of Google is far less than this research claims, it’s still a cost. Perhaps we should think about information seeking systems not only in terms of efficient use of human attention, but also in terms of other non-renewable natural resources.

In the automobile industry, we (at least in the United States) made the mistake of assuming the customer was always right, thus favoring SUVs over more economical and energy-efficient alternatives. Hopefully we’ve learned our lesson, though that’s still to be determined. In any case, perhaps there’s a similar lesson to be learned in the world of search engines.

Categories
General

A Word of Thanks to Thanx Media

As I hope I’ve made abundantly clear in the past, this is not a corporate blog, and I try to avoid even the appearance of being a shill for my employer, our customers, or our partners.

But I hope you will understand that, in this case, it’s personal.

A couple of months ago, SLI Systems CEO Shaun Ryan did something which I thought was, to put it generously, not taking the high road. In the guise of sending out a helpful “note of caution” to SLI’s customers and prospects, he proceeded to make an attack of the kind I typically associate with desperate political campaigns.

The intended target was Endeca partner Thanx Media. But here’s where we get to the personal part. He used this post of mine to suggest that the software I’ve helped develop and deploy was difficult to set up.

At the time, I was persuaded by colleagues to take the high road myself and not respond. But now that Thanx Media has announced its latest successes, including displacing SLI at CableOrganizer.com only weeks after Ryan blogged about it, I feel it is appropriate to thank the guys at Thanx Media for defending my honor along with their own.

I’m all for healthy competition. I recently gave a technical talk at Google, whose enterprise division competes with Endeca, and I even invited a former EVP at FAST to attend. My aim in organizing the SIGIR Industry Track is to raise the caliber of discussion among competitors. I try to give credit to competitors for their successes, but more importantly I try to keep my criticism fair. I also open up my blog to comments, which means that you folks can keep me honest if I stray from the path.

Here in the United States, many of us are hopeful for an era that will bring us a new kind of politics. Why don’t we start by practicing it ourselves?

Categories
Uncategorized

If you’re an IR / NLP person looking for work…

I get pinged from time to time by colleagues and recruiters looking to hire IR / NLP people, for everything ranging from short-term contract work to CTO-level roles. Unfortunately for them, I’m very happy in my role at Endeca, so the best I can ever offer is to route them to colleagues.

That’s where you come in. If you’re in this space and want to be on my radar, please let me know, either by commenting here or by sending me email. I can’t promise anything, but I’ll do my best to play matchmaker when I see a potential fit.

Alternatively, if there’s a good existing forum to bring together employers and job-seekers in this space, please let me know, and I’ll encourage everyone to congregate there.

Categories
General

Is online friendship worth less than a piece of meat?

In a brilliant marketing campaign, Burger King is offering a coupon for a free Whopper to anyone who “sacrifices” ten of their Facebook friends. The “whopper sacrifice” campaign is earning mass media coverage, including in the New York Times. I checked it out myself and took the opportunity to trade ten of my more questionable online friendships for a slightly less questionable repast.

Of course, the interesting question in the context of much of the discussion on this blog is what such a campaign tells us about the value of online social network connections. On Facebook, friendship is symmetric, as is also the case on LinkedIn. But it’s interesting to consider how such a campaign might have worked on Twitter. Would you be asked to sacrifice followers or followees?

On one hand, you choose whom you follow, and in theory you follow them because you’re interested in what they have to say. It stands to reason that unfollowing someone would be a sacrifice.

On the other hand, having lots of followers signals status and perhaps even authority. So perhaps it’s giving up followers that would be a sacrifice.

Of course, these two possibilities aren’t mutually exclusive: there may be value both in following and being followed. Regardless of whether it is better to give than receive, it may be good to do both.

Nonetheless, I suspect that the average online “friendship” is worth less than $0.37 (a whopper goes for $3.69). I’m sure Burger King will have no trouble giving away whoppers.

Categories
Uncategorized

Making Whuffie

I know it’s unseemly to brag, but I’m very excited about the feedback I’ve gotten about the “Reconsidering Relevance” presentation, and I wanted to share that excitement.

Here’s some of the whuffie I’ve received:

I’m looking forward to posting the video, which I’m told will be available early next week.

Categories
General

Google Tech Talk: Reconsidering Relevance

I’m still waiting for Google to post a video of the talk to YouTube (the wait is over!), but in the meantime I’ve posted the slides to Scribd and SlideShare. I’ve included speaker notes designed to make the talk completely self-contained.

I’d like to add that my hosts at Google NYC were very gracious, particularly considering that my material was more than a little critical of their approach to search and information retrieval.

Here is the abstract again as a reminder:

We’ve become complacent about relevance. The overwhelming success of web search engines has lulled even information retrieval (IR) researchers to expect only incremental improvements in relevance in the near future. And beyond web search, there are still broad search problems where relevance still feels hopelessly like the pre-Google web.

But even some of the most basic IR questions about relevance are unresolved. We take for granted the very idea that a computer can determine which documents are relevant to a person’s needs. And we still rely on two-word queries (on average) to communicate a user’s information need. But this approach is a contrivance; in reality, we need to think of information-seeking as a problem of optimizing the communication between people and machines.

We can do better. In fact, there are a variety of ongoing efforts to do so, often under the banners of “interactive information retrieval”, “exploratory search”, and “human computer information retrieval”. In this talk, I’ll discuss these initiatives and how they are helping to move “relevance” beyond today’s outdated assumptions.

Categories
General

The Real Twitter

I just came back from the monthly NY Tech Meetup, whose theme this evening was “Built on Twitter”. While the meeting was well organized (a testament to Nate Westheimer, who received the torch from Meetup CEO Scott Heiferman), I had mixed feelings about the demos. Everyone is capitalizing on Twitter’s buzz, but so few people seem to be creating anything valuable on top of it.

But, by luck, Daniel Lemire sent me a link to Sylvie Noël’s post about a paper from HP Labs, “Social Networks that Matter: Twitter under the microscope”, by Bernardo A. Huberman, Daniel M. Romero and Fang Wu. She also pointed to an executive summary by Forrester analyst Jeremiah Owyang.

The paper is insightful. The authors practically had me at hello–this is the paper’s third paragraph:

While the standard definition of a social network embodies the notion of all the people with whom one shares a social relationship, in reality people interact with very few of those “listed” as part of their network. One important reason behind this fact is that attention is the scarce resource in the age of the web. Users faced with many daily tasks and large number of social links default to interacting with those few that matter and that reciprocate their attention. For example, a recent study of Facebook showed that users only poke and message a small number of people while they have a large number of declared friends. And a casual search through recent calls made through any mobile phone usually reveals that a small percentage of the contacts stored in the phone are frequently contacted by the user.

They then define a user’s “friend” as a person to whom that user has specifically directed at least two posts, and show that a user’s number of friends is a better predictor of the user’s activity (number of posts) than the user’s number of followers. Having thus validated the number of friends as a more important input variable than the number of followers, they explore the friend graph, which turns out to be much sparser than the follower graph.
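To make that definition of “friend” concrete, here is a minimal sketch in Python. It is my own illustration, not the authors’ code: the function name `friends_by_user`, the `(sender, recipient)` input format, and the sample data are all assumptions, and I am treating a “directed post” as any post explicitly addressed to another user.

```python
from collections import Counter, defaultdict


def friends_by_user(directed_posts, min_posts=2):
    """Map each sender to the set of people they directed at least
    `min_posts` posts to (the paper's threshold is two)."""
    counts = defaultdict(Counter)
    for sender, recipient in directed_posts:
        counts[sender][recipient] += 1
    return {
        sender: {r for r, n in recipients.items() if n >= min_posts}
        for sender, recipients in counts.items()
    }


# Hypothetical sample data: one (sender, recipient) pair per directed post.
posts = [
    ("alice", "bob"), ("alice", "bob"), ("alice", "carol"),
    ("bob", "alice"), ("bob", "alice"), ("bob", "alice"),
]

friends = friends_by_user(posts)
```

In this toy data alice has directed only one post to carol, so carol does not count as her friend, no matter who follows whom. Run over a real crawl, the same counting would give you the sparser friend graph that the authors contrast with the follower graph.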

Their conclusion:

Many people, including scholars, advertisers and political activists, see online social networks as an opportunity to study the propagation of ideas, the formation of social bonds and viral marketing, among others. This view should be tempered by our findings that a link between any two people does not necessarily imply an interaction between them. As we showed in the case of Twitter, most of the links declared within Twitter were meaningless from an interaction point of view. Thus the need to find the hidden social network; the one that matters when trying to rely on word of mouth to spread an idea, a belief, or a trend.

I urge you to read the whole paper, as my abbreviated version hardly does it justice. And then, if you’re practically minded, think about ways to build applications on Twitter that leverage this real social network that is hidden in plain sight.

I further suspect that the authors’ results generalize beyond Twitter to other social networks where the cost of connecting is far lower than the cost of actually investing in the connection. It doesn’t seem hard to identify the hidden social network, and by doing so we can unlock its value.

Of course, Twitter has the virtue that its network is mostly available to the public, not hidden behind a walled garden like LinkedIn or Facebook. As a result, I expect that Twitter will drive both research and innovation in the social network space, at least in the near term.

Categories
Uncategorized

Back to the Future: Amazon lets Data Providers Charge for Access

Jeremy Kirk reports in Computerworld that:

Amazon.com Inc. has rolled out a new option for its Simple Storage Service (S3) that lets data owners shift the cost of accessing their information to other people or entities.

This may not seem like a big deal; all they’re doing is offering data owners the opportunity to shift costs. But it strikes me as counter to the general industry trend, which is to offer all online information for free and make it up in the volume–I mean, in advertising revenue.

Will Amazon’s retro move overcome this inexorable trend? Perhaps not, but perhaps they don’t have to. It may be enough to capture a small segment of the market that is willing to have those who access data pay for it. And there’s certainly a lot of opportunity for such a model in the B2B space.

Personally, I’m hopeful at any sign that someone is seriously pursuing a non-free business model for content access. Once upon a time, that was a cultural norm, and it’s not clear that the “information wants to be free” approach has done much to preserve the value of information. Perhaps Amazon is just tilting at windmills, but I’m hoping that the 2008 CTO of the Year knows what he’s doing.

Categories
Uncategorized

Upcoming Google NYC Talk: Reconsidering Relevance

I’m giving a talk in the Google New York office this Wednesday (1/7) at 3pm entitled “Reconsidering Relevance”. The title is an allusion to Tefko Saracevic’s paper, “Relevance Reconsidered”.

I’m not sure how many NYC Googlers read this blog, but I encourage you all to attend. I’ve also been able to put a few folks on the guest list. For everyone else, my host assures me that the recorded presentation will be posted on YouTube. And of course I’ll post the slides here at The Noisy Channel, as well as on SlideShare.

Here’s the title and abstract:

Reconsidering Relevance

We’ve become complacent about relevance. The overwhelming success of web search engines has lulled even information retrieval (IR) researchers to expect only incremental improvements in relevance in the near future. And beyond web search, there are still broad search problems where relevance still feels hopelessly like the pre-Google web.

But even some of the most basic IR questions about relevance are unresolved. We take for granted the very idea that a computer can determine which documents are relevant to a person’s needs. And we still rely on two-word queries (on average) to communicate a user’s information need. But this approach is a contrivance; in reality, we need to think of information-seeking as a problem of optimizing the communication between people and machines.

We can do better. In fact, there are a variety of ongoing efforts to do so, often under the banners of “interactive information retrieval”, “exploratory search”, and “human computer information retrieval”. In this talk, I’ll discuss these initiatives and how they are helping to move “relevance” beyond today’s outdated assumptions.

Categories
General

Enterprise Search Hype: An Example

I was going through my alerts this morning for enterprise search and found this nugget in an interview with an enterprise search executive (the interview is easy to find on the web):

What is the basis of your firm’s technical approach?

XXX provides a highly scalable and manageable information access platform built on open standards. XXX transforms raw data, whatever its nature, into actionable intelligence through best of breed indexing, extraction and classification technologies.

The term “information access platform” has become standard enough, but it tells us nothing about the company’s technical approach. “Highly scalable and manageable”? Again, these are neither specific nor informative about the approach. In fact, the only thing we learn about the company’s technical approach is that it uses open standards, and we don’t even hear which ones. And perhaps it is unfair to single out this company, since I’ve seen similar content-free descriptions across the industry.

But compare that to how I answered Ron Miller in a recent one-on-one:

What is the differentiator for Endeca search?

Endeca combines a set-oriented retrieval approach with user interaction to create an interactive dialogue, offering next steps or refinements to help guide users to the results most relevant for their unique needs. An Endeca-powered application responds to a query with not just relevant results, but with an overview of the user’s current context and an organized set of options for incremental exploration.

I’m not claiming that my answer is complete–it is hard to go deep in a short interview. But I at least tried to explain *what* Endeca does, rather than just concatenate buzzwords. A litmus test is that you could not pick another enterprise search vendor name out of a hat and substitute it into that paragraph.

As I posted the other day, the enterprise search market is beset by marketing and hype. There are lots of folks to blame for this state of affairs, but we who are selling enterprise search technology have a particular responsibility for emphasizing reality over hype. Let’s keep it real.