Author: Daniel Tunkelang

High-Class Consultant.

The Noisy Community

One of the great things about blogging is that it’s given me the chance to assemble a community of people whose interests overlap with my own. But so far that community only becomes interactive through the comment threads and, to a lesser extent, Twitter.

I’ve wondered if it would be worthwhile to invest in cultivating more of a sense of community at The Noisy Channel. I have no desire to replicate social network functionality available elsewhere, and I trust that most readers use some combination of LinkedIn, Facebook, and Twitter.

But I could do some things here that might be helpful:

Create an opt-in directory page that lists readers with a short tagline and a contact URL. For example, mine would be:
Daniel Tunkelang
Chief Scientist, Endeca
https://thenoisychannel.com/
Post job descriptions targeted at readers, along with contact information.
Institute a regular appearance of guest blog posts.

Do any of the above appeal to people? The directory strikes me as the simplest first step. I’d love to find out more about who my readers are, and hopefully I can make it worth your while by directing some traffic in your direction. Conversely, a list of Noisy Channel readers with short descriptions sounds like just what the SEO doctor ordered. Of course, it would only take off if enough people are interested in being on this page.

I’m intrigued by the possibility of posting targeted job descriptions, but that’s only worthwhile if there are enough of them. I have no desire to compete with the large job sites!

Finally, guest posts are something I’ve talked about before, but that somehow have never taken off. A directory might make it easier for me to know whom to ask.

If any of the above interest you, please give me feedback in the comments. I’ll take silence as a lack of interest.

Uncategorized

Forums for Enterprise Search Practitioners

Post author By Daniel Tunkelang
Post date January 14, 2009
4 Comments on Forums for Enterprise Search Practitioners

Since a substantial fraction of readers here are involved with enterprise search (in its broadest sense), I thought it might be helpful to share links to three online forums targeting this field.

The search_dev Yahoo group, “a technical and business discussion group for developers, consultants, IT people and managers who work with Enterprise Search engines”.
The Enterprise Search Engine Professionals LinkedIn group, which welcomes “product managers, developers, designers, buyers, business developers etc. working with Enterprise Search products and platforms”.
The Information Access and Search Professionals LinkedIn group, which welcomes “specialists in Information Retrieval, Knowledge Management, Natural Language Engineering, Image, Audio and Video Analysis, and related areas of Human Computer Interaction and User Studies”.

If you are aware of other valuable forums and resources (preferably ones that are vendor-neutral), please let the rest of us know!

General

A Twitter Analog to PageRank

Post author By Daniel Tunkelang
Post date January 13, 2009
77 Comments on A Twitter Analog to PageRank

A few weeks ago, there was a flame war about Twitter authority, and I was all too eager to throw fuel on the pyre. But now that the blogosphere has calmed down a bit, I’d like to propose a ranking measure that I think might work. My apologies if it isn’t original. In fact, if you’ve seen it elsewhere, please point me to it.

Let me start with the assumptions about the model:

Influence(X) = Expected number of people who will read a tweet that X tweets, including all retweets of that tweet. For simplicity, we assume that, if a person reads the same message twice (because of retweets), both readings count.
If X is a member of Followers(Y), then there is a 1/||Following(X)|| probability that X will read a tweet posted by Y, where Following(X) is the set of people that X follows.
If X reads a tweet from Y, there’s a constant probability p that X will retweet it.

This model is obviously simplistic in all three assumptions. But I think it’s a reasonable first cut. In particular, it accounts for the inflation that occurs from people who follow in the hopes of reciprocity. There’s less value in being followed by someone who follows a lot of people, because that person is less likely to read your messages or retweet them.

Of course, there’s room for adding more realism to this model, but I hope it is at least close enough to the truth to be interesting.

From this model, it’s easy to measure someone’s influence recursively, assuming that we know the constant retweet probability p:

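$$\mathrm{Influence}(X) = \sum_{Y \in \mathrm{Followers}(X)} \frac{1 + p \cdot \mathrm{Influence}(Y)}{\lVert \mathrm{Following}(Y) \rVert}$$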

The recursion is infinite over a graph with directed cycles, but rapidly converges as high powers of p approach zero. I would think this measure wouldn’t be hard to compute to a reasonable accuracy.
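Here is a minimal sketch of how the measure might be computed, assuming my reading of the three assumptions: each follower Y of X contributes (1 + p · Influence(Y)) / ||Following(Y)|| to Influence(X). The function name `influence`, the input format (a dict mapping each user to the set of their followers), the default value of p, and the fixed iteration count are all illustrative choices of mine, not part of the model.

```python
def influence(followers, p=0.05, iterations=50):
    """Approximate Influence(X) for every user by repeated substitution.

    followers: dict mapping each user to the set of users who follow them
               (hypothetical input format chosen for this sketch).
    p:         assumed constant retweet probability.
    """
    # Collect every user mentioned, either as a key or as a follower.
    users = set(followers)
    for ys in followers.values():
        users |= set(ys)

    # ||Following(Y)||: how many people Y follows.
    following_count = {u: 0 for u in users}
    for ys in followers.values():
        for y in ys:
            following_count[y] += 1

    # Start from zero and apply the recursion a fixed number of times.
    # Each pass adds one more "hop" of retweets, weighted by another factor of p.
    scores = {u: 0.0 for u in users}
    for _ in range(iterations):
        scores = {
            x: sum(
                (1 + p * scores[y]) / following_count[y]
                for y in followers.get(x, ())
            )
            for x in users
        }
    return scores


# Toy graph: b and c follow a; c also follows b.
toy = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
for user, score in sorted(influence(toy, p=0.1).items()):
    print(user, round(score, 4))
```

Since every follower Y appears in the sums of exactly ||Following(Y)|| people, each extra pass changes the scores by at most a factor of p more than the last, so a few dozen passes should be plenty for small p. A real implementation would probably stop when the scores stop changing rather than after a fixed number of passes.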

This measure strikes me as a PageRank for Twitter or any system with similar properties. There’s more room for nuance, but I at least find this approach more plausible than the ones I’ve seen. It also strikes me as hard to game, since it isn’t counting retweets, and it’s hard to add much influence through followers who don’t have any influence themselves.

What do folks think? Has anyone tried this? If not, is there anyone who’d like to try hacking an application to compute it? Either way, please let me know!

Uncategorized

Is Google destroying the planet?

Post author By Daniel Tunkelang
Post date January 11, 2009
11 Comments on Is Google destroying the planet?

In Lewis Carroll’s Through the Looking Glass, the White Queen tells Alice that she’s “believed as many as six impossible things before breakfast”. Well, it’s a good thing I read this post about the environmental impact of Google searches before breakfast. It cites physicist Alex Wissner-Gross as saying that the average Google search generates 7g of CO2, using up half as much energy as boiling a kettle for a cup of tea.

I ate some breakfast after reading it, and now I’m feeling appropriately skeptical again. If nothing else, I doubt Google reveals enough about its internals for anyone to come to such a precise calculation.

[Note: Google explains here that the calculation is off by a fair amount–the average search generated 0.2g of CO2. Thanks to Jeremy for the heads up. Also, Jason Kincaid criticizes the Times Of London’s reporting here.]

Nonetheless, there may be something worth exploring in this argument. Even if the environmental cost of Google is far less than this research claims, it’s still a cost. Perhaps we should think about information seeking systems not only in terms of efficient use of human attention, but also in terms of other non-renewable natural resources.

In the automobile industry, we (at least in the United States) made the mistake of assuming the customer was always right, thus favoring SUVs over more economical and energy-efficient alternatives. Hopefully we’ve learned our lesson, though that’s still to be determined. In any case, perhaps there’s a similar lesson to be learned in the world of search engines.

General

A Word of Thanks to Thanx Media

Post author By Daniel Tunkelang
Post date January 11, 2009

As I hope I’ve made abundantly clear in the past, this is not a corporate blog, and I try to avoid even the appearance of being a shill for my employer, our customers, or our partners.

But I hope you will understand that, in this case, it’s personal.

A couple of months ago, SLI Systems CEO Shaun Ryan did something which I thought was, to put it generously, not taking the high road. In the guise of sending out a helpful “note of caution” to SLI’s customers and prospects, he proceeded to make an attack of the kind I typically associate with desperate political campaigns.

The intended target was Endeca partner Thanx Media. But here’s where we get to the personal part. He used this post of mine to suggest that the software I’ve helped develop and deploy was difficult to set up.

At the time, I was persuaded by colleagues to take the high road myself and not respond. But now that Thanx Media has announced its latest successes, including displacing SLI at CableOrganizer.com only weeks after Ryan blogged about it, I feel it is appropriate to thank the guys at Thanx Media for defending my honor along with their own.

I’m all for healthy competition. I recently gave a technical talk at Google, whose enterprise division competes with Endeca, and I even invited a former EVP at FAST to attend. My aim in organizing the SIGIR Industry Track is to raise the caliber of discussion among competitors. I try to give credit to competitors for their successes, but more importantly I try to keep my criticism fair. I also open up my blog to comments, which means that you folks can keep me honest if I stray from the path.

Here in the United States, many of us are hopeful for an era that will bring us a new kind of politics. Why don’t we start by practicing it ourselves?

Uncategorized

If you’re an IR / NLP person looking for work…

Post author By Daniel Tunkelang
Post date January 10, 2009

I get pinged from time to time by colleagues and recruiters looking to hire IR / NLP people, for everything ranging from short-term contract work to CTO-level roles. Unfortunately for them, I’m very happy in my role at Endeca, so the best I can ever offer is to route them to colleagues.

That’s where you come in. If you’re in this space and want to be on my radar, please let me know, either by commenting here or by sending me email. I can’t promise anything, but I’ll do my best to play matchmaker when I see a potential fit.

Alternatively, if there’s a good existing forum to bring together employers and job-seekers in this space, please let me know, and I’ll encourage everyone to congregate there.

General

Is online friendship worth less than a piece of meat?

Post author By Daniel Tunkelang
Post date January 10, 2009
4 Comments on Is online friendship worth less than a piece of meat?

In a brilliant marketing campaign, Burger King is offering a coupon for a free Whopper to anyone who “sacrifices” ten of their Facebook friends. The “whopper sacrifice” campaign is earning mass media coverage, including in the New York Times. I checked it out myself and took the opportunity to trade ten of my more questionable online friendships for a slightly less questionable repast.

Of course, the interesting question in the context of much of the discussion on this blog is what such a campaign tells us about the value of online social network connections. On Facebook, friendship is symmetric, as is also the case on LinkedIn. But it’s interesting to consider how such a campaign might have worked on Twitter. Would you be asked to sacrifice followers or followees?

On one hand, you choose whom you follow, and in theory you follow them because you’re interested in what they have to say. It stands to reason that unfollowing someone would be a sacrifice.

On the other hand, having lots of followers signals status and perhaps even authority. So perhaps it’s giving up followers that would be a sacrifice.

Of course, these two possibilities aren’t mutually exclusive: there may be value both in following and being followed. Regardless of whether it is better to give than receive, it may be good to do both.

Nonetheless, I suspect that the average online “friendship” is worth less than $0.37 (a whopper goes for $3.69). I’m sure Burger King will have no trouble giving away whoppers.

Uncategorized

Making Whuffie

I know it’s unseemly to brag, but I’m very excited about the feedback I’ve gotten about the “Reconsidering Relevance” presentation, and I wanted to share that excitement.

Here’s some of the whuffie I’ve received:

Selected as the Top Presentation of the Day on the SlideShare homepage
A nice review by Ken Ellis, Chief Scientist at Daylife
Picked up by Gwen Harris at Taxonomy Watch

I’m looking forward to posting the video, which I’m told will be available early next week.

General

Google Tech Talk: Reconsidering Relevance

Post author By Daniel Tunkelang
Post date January 8, 2009
20 Comments on Google Tech Talk: Reconsidering Relevance


I’m still waiting for Google to post a video of the talk to YouTube (the wait is over!), but in the meantime I’ve posted the slides to Scribd and SlideShare. I’ve included speaker notes designed to make the talk completely self-contained.

I’d like to add that my hosts at Google NYC were very gracious, particularly considering that my material was more than a little critical of their approach to search and information retrieval.

Here is the abstract again as a reminder:

We’ve become complacent about relevance. The overwhelming success of web search engines has lulled even information retrieval (IR) researchers to expect only incremental improvements in relevance in the near future. And beyond web search, there are still broad search problems where relevance still feels hopelessly like the pre-Google web.

But even some of the most basic IR questions about relevance are unresolved. We take for granted the very idea that a computer can determine which documents are relevant to a person’s needs. And we still rely on two-word queries (on average) to communicate a user’s information need. But this approach is a contrivance; in reality, we need to think of information-seeking as a problem of optimizing the communication between people and machines.

We can do better. In fact, there are a variety of ongoing efforts to do so, often under the banners of “interactive information retrieval”, “exploratory search”, and “human computer information retrieval”. In this talk, I’ll discuss these initiatives and how they are helping to move “relevance” beyond today’s outdated assumptions.

General

The Real Twitter

I just came back from the monthly NY Tech Meetup, whose theme this evening was “Built on Twitter”. While the meeting was well organized (a testament to Nate Westheimer, who received the torch from Meetup CEO Scott Heiferman), I had mixed feelings about the demos. Everyone is capitalizing on Twitter’s buzz, but so few people seem to be creating anything valuable on top of it.

But, by luck, Daniel Lemire sent me a link to Sylvie Noël’s post about a paper from HP Labs on Twitter, “Social Networks that Matter: Twitter Under the Microscope”, by Bernardo A. Huberman, Daniel M. Romero and Fang Wu. She also pointed to an executive summary by Forrester analyst Jeremiah Owyang.

The paper is insightful. The authors practically had me at hello–this is the paper’s third paragraph:

While the standard definition of a social network embodies the notion of all the people with whom one shares a social relationship, in reality people interact with very few of those “listed” as part of their network. One important reason behind this fact is that attention is the scarce resource in the age of the web. Users faced with many daily tasks and large number of social links default to interacting with those few that matter and that reciprocate their attention. For example, a recent study of Facebook showed that users only poke and message a small number of people while they have a large number of declared friends. And a casual search through recent calls made through any mobile phone usually reveals that a small percentage of the contacts stored in the phone are frequently contacted by the user.

They then define a user’s “friend” as a person to whom that user has specifically directed at least two posts and show that a user’s number of friends is a better predictor of the user’s activity (number of posts) than the user’s number of followers. Having thus validated the number of friends as a more important input variable than the number of followers, they explore the friend graph, which turns out to be much sparser than the follower graph.
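To make the “friend” definition concrete, here is a minimal sketch of how one might extract a friend graph from a log of directed posts. This is my own illustration, not the authors’ code: the function name `friend_graph`, the (sender, recipient) input format, and the choice to ignore posts a user directs at themselves are all assumptions.

```python
from collections import Counter, defaultdict


def friend_graph(directed_posts, min_posts=2):
    """Build a friend graph from an iterable of (sender, recipient) pairs.

    A recipient counts as a sender's "friend" once the sender has directed
    at least min_posts posts to them (two, per the definition above).
    """
    # Count how many posts each sender has directed at each recipient.
    counts = Counter(directed_posts)

    friends = defaultdict(set)
    for (sender, recipient), n in counts.items():
        # Assumption: self-directed posts do not make someone their own friend.
        if n >= min_posts and sender != recipient:
            friends[sender].add(recipient)
    return dict(friends)


# Toy log: a directs two posts to b, one to c; b directs one post to a.
posts = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a")]
print(friend_graph(posts))
```

In this toy log only b qualifies as a friend of a, even though a has directed posts to two people, which is the kind of sparsity relative to declared links that the authors describe.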

Their conclusion:

Many people, including scholars, advertisers and political activists, see online social networks as an opportunity to study the propagation of ideas, the formation of social bonds and viral marketing, among others. This view should be tempered by our findings that a link between any two people does not necessarily imply an interaction between them. As we showed in the case of Twitter, most of the links declared within Twitter were meaningless from an interaction point of view. Thus the need to find the hidden social network; the one that matters when trying to rely on word of mouth to spread an idea, a belief, or a trend.

I urge you to read the whole paper, as my abbreviated version hardly does it justice. And then, if you’re practically minded, think about ways to build applications on Twitter that leverage this real social network that is hidden in plain sight.

I further suspect that the authors’ results generalize beyond Twitter to other social networks where the cost of connecting is far lower than the cost of actually investing in the connection. It doesn’t seem hard to identify the hidden social network, and by doing so we can unlock its value.

Of course, Twitter has the virtue that its network is mostly available to the public, not hidden behind a walled garden like LinkedIn or Facebook. As a result, I expect that Twitter will drive both research and innovation in the social network space, at least in the near term.