Category: General

General posts, typically analyzing HCIR issues.

Lucid Imagination

After last week’s big news in the enterprise search space about Autonomy and FAST, the announcement that Lucid Imagination has raised $6 million may seem anticlimactic. That’s nothing compared to the $775M Autonomy spent to acquire Interwoven, let alone the $1.2B that Microsoft paid for FAST last year. And can Lucid Imagination really succeed as the Red Hat of enterprise search, making money by supporting open-source Lucene and Solr?

Perhaps. Lucene is certainly popular among folks looking for a free search engine. Moreover, for people who want to tinker with it, its being open source is a big plus.

But Lucene deployments require extensive customization. This is often the downside of open source, and the reason that industrial use of open source software often involves a significant transfer of funds from enterprises to consultants. In contrast, closed-source solutions tend to come with more tooling, integration support, etc. Those are the sorts of details that don’t necessarily excite open-source developers but are crucial for enterprise software.

Will Lucid Imagination revolutionize the enterprise search market by providing low-cost services on top of free software? Perhaps, though I’m skeptical–and not just because they are a potential competitor to my employer. If they are to be more than a body shop, they’ll have to productize their customization efforts. But I’d imagine that, if Lucid Imagination were to build such products, it would contribute them back into the Lucene code base. That might be great for customers, but it’s not clear how it translates into a sustainable revenue model.

It’s also worth noting that Lucid Imagination isn’t the first company pursuing this model. Sematext, founded in 2007, is another company implementing solutions on top of open source software, including Lucene. In fact, its founder, Otis Gospodnetic, is a regular at The Noisy Channel. Perhaps he can comment on the space.

The Information Triumvirate

Post author By Daniel Tunkelang
Post date January 24, 2009

Nicholas Carr (of “Does IT Matter?” fame) wrote a post a couple of days ago entitled “All hail the information triumvirate!”

Here is his argument in a nutshell:

what we seem to have here is evidence of a fundamental failure of the Web as an information-delivery service. Three things have happened, in a blink of history’s eye: (1) a single medium, the Web, has come to dominate the storage and supply of information, (2) a single search engine, Google, has come to dominate the navigation of that medium, and (3) a single information source, Wikipedia, has come to dominate the results served up by that search engine. Even if you adore the Web, Google, and Wikipedia – and I admit there’s much to adore – you have to wonder if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing. Is culture best served by an information triumvirate?

Carr discloses that he is on the Encyclopedia Britannica’s board of editorial advisors, but I don’t think he’s writing this as a hit piece against Wikipedia. Nor does he recycle the usual pablum about Wikipedia’s inaccuracies; he has surely read the research that Wikipedia is just as accurate as Britannica. [Note: he has read it and criticized the study.]

I agree with Carr that our current use of Google and Wikipedia impoverishes our experience of the information that the web has to offer. In fact, that was a major subtext of my recent presentation on reconsidering relevance. I’m not sure that the dominance of the web is itself a problem, but that’s because I assume that the web is relentlessly assimilating all of the world’s information.

But Carr’s advice isn’t constructive, nor are most of the comments in response to it. Indeed, people wrongly focus their ire on Wikipedia rather than Google. It’s Google that promotes a winner-take-all information economy through its relevance-centric paradigm; if anything, Wikipedia mitigates this effect because its results are collaboratively edited.

I think that, if we constrain ourselves to a relevance-centric information seeking system, our current Google + Wikipedia model is close to optimal. To do better, we need web-scale tools that support exploratory search. Long live the HCIR revolution!

Autonomy acquires Interwoven: Felix Nube

Post author By Daniel Tunkelang
Post date January 22, 2009

Emperor Maximilian I of Habsburg cultivated the motto, “Bella gerant alii, tu felix Austria nube” (What others achieve by war, let you, happy Austria, achieve by marriage). This morning’s announcement that Autonomy (AUTNF) is acquiring Interwoven (IWOV) for $775M would have made Maximilian proud. Indeed, while some (including Alan Pelz-Sharpe at CMS Watch) have gone as far as to call Autonomy a holding company, I think of Autonomy more as a Habsburgian dynasty.

Autonomy’s acquisitions over the past decade include:

Softsound (speech recognition)
Virage (multimedia search)
etalk (call center software)
Verity (enterprise search)
Blinkx (multimedia search, since divested)
Zantaz (email archiving and litigation support)

And now Autonomy acquires Interwoven, a player in the Enterprise Content Management (ECM) space. It’s not quite an AOL-acquires-Time Warner moment, but it is dramatic to see an enterprise search player acquire a content management player. Indeed, most of the speculation last year was that Autonomy would be acquired, not vice versa.

What does this mean for the rest of us? Given that Autonomy and Interwoven have both been focusing on the compliance and e-discovery space, it is reasonable to expect that the acquisition will deepen this focus. The more interesting question, at least to me, is what this means for the rest of the enterprise search space. Autonomy was recently crowing that it “has won the enterprise search wars”. That’s news to me, but I won’t claim to be objective. In any case, with Autonomy focusing its investment in the compliance / e-discovery space and Microsoft acquiring FAST in order to fold its technology into SharePoint, I do wonder what will happen to the competitive landscape of enterprise search. Will it be Endeca vs. Google?

Information Sharing We Can Believe In

Post author By Daniel Tunkelang
Post date January 20, 2009

Today is a monumental day in the history of the United States, and I suspect many readers will be too caught up in the inaugural festivities to be checking their RSS readers. But everyone has a job to do, and part of mine is to speak truth to power through blogging.

A few months ago, when Obama’s victory was hardly certain, I was fortunate to attend a meeting of New York technology executives where Jed Katz, a managing director at DFJ Gotham Ventures and a technology advisor to the Obama campaign, responded to questions about Obama’s technology strategy. What he made clear was that he, and by extension the Obama campaign and administration-to-be, was there as much to listen and learn as to respond.

I followed up with Jed by sending my ideas about how the federal government can better make public domain information available to the public. Today, much of that data is locked up in the hands of vendors who then dole it out in dribs and drabs. While these arrangements may have been necessary at the time to fund the distribution of that information, they are out of step with today’s technology. What we need is for anyone to be able to obtain that information in raw form and then add value to it by creating better ways to access and analyze it. In fact, private sector companies, universities, and NGOs could compete in their offerings.

I’m hardly the first person to suggest this. Vivek Kundra, the CTO of the District of Columbia and a short-listed candidate for CTO of the United States, organized Apps for Democracy, a contest much along the lines of what I’m describing.

I work in the private sector, and I’m not swayed by the naive conception that all information needs to be free. Some of us would like to earn a living! But public domain information produced by government needs to be freely available to an informed and active citizenry. Moreover, information freely contributed by that citizenry should be available to decision makers, as well as to the public at large.

Today is a milestone many of us never expected we’d live to see, a president who “doesn’t look like all those other presidents on the dollar bills.” Let’s raise the stakes and aim for a government that doesn’t look like the information-hiding governments of our past. That is change I can believe in.

Open Calais at the New York Semantic Web Meetup

Post author By Daniel Tunkelang
Post date January 16, 2009

http://static.slideshare.net/swf/ssplayer2.swf?doc=opencalais-release-40-1232126721137134-2&stripped_title=open-calais-release-40-presentation

Tom Tague, who leads the Calais initiative at Thomson Reuters, delivered an excellent presentation this week at the New York Semantic Web Meetup. While the slides hardly do justice to this highly interactive session, they’re still worth a look. More importantly, Calais itself is worth a look if you are interested in semantic tagging. For free.

The Influence Economy

There’s an interesting convergence of two ideas in recent days. On one hand, there’s been a lot of attention to the problem of measuring Twitter authority / influence. On the other hand, there have been efforts, some more serious than others, to monetize the connections established on social networks like Twitter.

Of course, these are flip sides of the same problem: measuring and optimizing value in a social network. Or, as I like to think of it, the influence economy.

I recently proposed a way to measure influence on Twitter–or, more generally, in an asymmetric social network. While the measure is simplistic, it has the virtue of modeling attention scarcity, thus making it resilient to the inflationary effect of people following more people in the hope of reciprocity. I’m quite bullish about it, and looking forward to seeing someone implement it.

Given such a measure, let’s turn to the question of buying and selling friends. If we can measure influence, then we can monetize it, much as content providers monetize their audience’s attention by selling it to advertisers. But, just as content providers destroy their value by spamming their audiences with ads, influencers stand to destroy their own value by selling out.

But, as the saying goes, everyone has a price. It may be crude, but we can certainly compute how much influence X gains from Y following X–as well as how much Y’s value as a follower decreases through the dilution of Y’s attention. Thus, if X wants Y as a follower, perhaps X should offer Y compensation that reflects X’s gain and Y’s loss.

I haven’t yet worked out the math, but it seems straightforward. And it might even translate into a business model for Twitter and other social networks. By supporting real value creation in the network, an online social network is in the best position to demand a cut of that value as a commission.
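
As a rough illustration of what that math might look like, here is a minimal Python sketch. It assumes the model from the “A Twitter Analog to PageRank” post: Y reads any one followee’s tweet with probability 1/||Following(Y)|| and retweets with a constant probability p. The function name follow_effect, the default p = 0.05, and the decision to hold Influence(Y) fixed (a first-order approximation that ignores feedback through the rest of the graph) are all assumptions made for the example, not something worked out in the post.

```python
def follow_effect(influence_y, n_following, p=0.05):
    """First-order effect of Y following one additional account X.

    influence_y -- Y's current influence score (held fixed; an assumption)
    n_following -- number of accounts Y follows before adding X
    p           -- assumed constant retweet probability

    Returns (gain_x, loss_per_followee):
      gain_x            -- influence X gains from Y's follow
      loss_per_followee -- influence each account Y already follows loses,
                           because Y's attention is now split n + 1 ways
    """
    if n_following < 0:
        raise ValueError("n_following must be non-negative")

    # Expected readers that Y's attention is worth in total:
    # Y itself, plus p times Y's own audience via retweets.
    reach = 1 + p * influence_y

    # X receives one share of Y's attention after the follow.
    gain_x = reach / (n_following + 1)

    # Each existing followee's share shrinks from 1/n to 1/(n + 1).
    if n_following == 0:
        loss_per_followee = 0.0
    else:
        loss_per_followee = reach / n_following - reach / (n_following + 1)

    return gain_x, loss_per_followee


if __name__ == "__main__":
    gain, loss = follow_effect(influence_y=10, n_following=4, p=0.1)
    print(f"X gains {gain:.3f}; each existing followee loses {loss:.3f}")
```

Under these assumptions, the total attention Y hands out stays constant once Y follows at least one account, so X’s gain comes entirely at the expense of the accounts Y already follows. A price for the follow could then be set somewhere between the two numbers the function returns.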

A Twitter Analog to PageRank

Post author By Daniel Tunkelang
Post date January 13, 2009

A few weeks ago, there was a flame war about Twitter authority, and I was all too eager to throw fuel on the pyre. But now that the blogosphere has calmed down a bit, I’d like to propose a ranking measure that I think might work. My apologies if it isn’t original. In fact, if you’ve seen it elsewhere, please point me to it.

Let me start with the assumptions about the model:

Influence(X) = Expected number of people who will read a tweet that X tweets, including all retweets of that tweet. For simplicity, we assume that, if a person reads the same message twice (because of retweets), both readings count.
If X is a member of Followers(Y), then there is a 1/||Following(X)|| probability that X will read a tweet posted by Y, where Following(X) is the set of people that X follows.
If X reads a tweet from Y, there’s a constant probability p that X will retweet it.

This model is obviously simplistic in all three assumptions. But I think it’s a reasonable first cut. In particular, it accounts for the inflation that occurs from people who follow in the hopes of reciprocity. There’s less value in being followed by someone who follows a lot of people, because that person is less likely to read your messages or retweet them.

Of course, there’s room for adding more realism to this model, but I hope it is at least close enough to the truth to be interesting.

From this model, it’s easy to measure someone’s influence recursively, assuming that we know the constant retweet probability p:

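\[
\mathrm{Influence}(X) = \sum_{Y \in \mathrm{Followers}(X)} \frac{1 + p \cdot \mathrm{Influence}(Y)}{\lVert \mathrm{Following}(Y) \rVert}
\]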

The recursion is infinite over a graph with directed cycles, but rapidly converges as high powers of p approach zero. I would think this measure wouldn’t be hard to compute to a reasonable accuracy.
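
For anyone who wants to try it, here is a minimal sketch of one way to compute the measure in Python. It reads the three assumptions as the recurrence Influence(X) = sum over Y in Followers(X) of (1 + p * Influence(Y)) / ||Following(Y)|| and solves it by simple fixed-point iteration. The function name influence, the dict-of-sets graph representation, the default p = 0.05, and the stopping tolerance are all assumptions chosen for illustration.

```python
def influence(following, p=0.05, tol=1e-9, max_iter=1000):
    """Approximate influence scores by fixed-point iteration.

    following -- dict mapping each user to the set of users they follow
    p         -- assumed constant retweet probability, 0 <= p < 1
    Returns a dict mapping every user to an influence score.
    """
    # Collect every user, including those who are followed but follow no one.
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    # Invert the graph: followers[y] is the set of users who follow y.
    followers = {u: set() for u in users}
    for x, followed in following.items():
        for y in followed:
            followers[y].add(x)

    scores = {u: 0.0 for u in users}
    for _ in range(max_iter):
        # Each follower f contributes (1 + p * Influence(f)) / |Following(f)|.
        # Any f in followers[x] follows at least x, so the divisor is >= 1.
        new_scores = {
            x: sum((1 + p * scores[f]) / len(following[f]) for f in followers[x])
            for x in users
        }
        delta = max((abs(new_scores[u] - scores[u]) for u in users), default=0.0)
        scores = new_scores
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Hypothetical three-user graph: a follows b; b follows a and c; c follows a.
    graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"a"}}
    for user, score in sorted(influence(graph, p=0.1).items()):
        print(user, round(score, 4))
```

Each pass multiplies the remaining error by at most p, since every user’s attention is split into shares that sum to one, so for small p a handful of iterations should be enough. A sparse linear solver would also work for large graphs; the iteration is just the simplest thing to write down.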

This measure strikes me as a PageRank for Twitter or any system with similar properties. There’s more room for nuance, but I at least find this approach more plausible than the ones I’ve seen. It also strikes me as hard to game, since it isn’t counting retweets, and it’s hard to add much influence through followers who don’t have any influence themselves.

What do folks think? Has anyone tried this? If not, is there anyone who’d like to try hacking an application to compute it? Either way, please let me know!

A Word of Thanks to Thanx Media

Post author By Daniel Tunkelang
Post date January 11, 2009

As I hope I’ve made abundantly clear in the past, this is not a corporate blog, and I try to avoid even the appearance of being a shill for my employer, our customers, or our partners.

But I hope you will understand that, in this case, it’s personal.

A couple of months ago, SLI Systems CEO Shaun Ryan did something which I thought was, to put it generously, not taking the high road. In the guise of sending out a helpful “note of caution” to SLI’s customers and prospects, he proceeded to make an attack of the kind I typically associate with desperate political campaigns.

The intended target was Endeca partner Thanx Media. But here’s where we get to the personal part. He used this post of mine to suggest that the software I’ve helped develop and deploy was difficult to set up.

At the time, I was persuaded by colleagues to take the high road myself and not respond. But now that Thanx Media has announced its latest successes, including displacing SLI at CableOrganizer.com only weeks after Ryan blogged about it, I feel it is appropriate to thank the guys at Thanx Media for defending my honor along with their own.

I’m all for healthy competition. I recently gave a technical talk at Google, whose enterprise division competes with Endeca, and I even invited a former EVP at FAST to attend. My aim in organizing the SIGIR Industry Track is to raise the caliber of discussion among competitors. I try to give credit to competitors for their successes, but more importantly I try to keep my criticism fair. I also open up my blog to comments, which means that you folks can keep me honest if I stray from the path.

Here in the United States, many of us are hopeful for an era that will bring us a new kind of politics. Why don’t we start by practicing it ourselves?

Is online friendship worth less than a piece of meat?

Post author By Daniel Tunkelang
Post date January 10, 2009

In a brilliant marketing campaign, Burger King is offering a coupon for a free Whopper to anyone who “sacrifices” ten of their Facebook friends. The “whopper sacrifice” campaign is earning mass media coverage, including in the New York Times. I checked it out myself and took the opportunity to trade ten of my more questionable online friendships for a slightly less questionable repast.

Of course, the interesting question in the context of much of the discussion on this blog is what such a campaign tells us about the value of online social network connections. On Facebook, friendship is symmetric, as is also the case on LinkedIn. But it’s interesting to consider how such a campaign might have worked on Twitter. Would you be asked to sacrifice followers or followees?

On one hand, you choose whom you follow, and in theory you follow them because you’re interested in what they have to say. It stands to reason that unfollowing someone would be a sacrifice.

On the other hand, having lots of followers signals status and perhaps even authority. So perhaps it’s giving up followers that would be a sacrifice.

Of course, these two possibilities aren’t mutually exclusive: there may be value both in following and being followed. Regardless of whether it is better to give than receive, it may be good to do both.

Nonetheless, I suspect that the average online “friendship” is worth less than $0.37 (a Whopper goes for $3.69, and the coupon costs ten friends). I’m sure Burger King will have no trouble giving away Whoppers.

Google Tech Talk: Reconsidering Relevance

Post author By Daniel Tunkelang
Post date January 8, 2009

http://static.slideshare.net/swf/ssplayer2.swf?doc=reconsidering-relevance-1231426605583628-1&stripped_title=google-tech-talk-reconsidering-relevance-presentation

I’m still waiting for Google to post a video of the talk to YouTube (the wait is over!), but in the meantime I’ve posted the slides to Scribd and SlideShare. I’ve included speaker notes designed to make the talk completely self-contained.

I’d like to add that my hosts at Google NYC were very gracious, particularly considering that my material was more than a little critical of their approach to search and information retrieval.

Here is the abstract again as a reminder:

We’ve become complacent about relevance. The overwhelming success of web search engines has lulled even information retrieval (IR) researchers into expecting only incremental improvements in relevance in the near future. And beyond web search, there are broad search problems where relevance still feels hopelessly like the pre-Google web.

But even some of the most basic IR questions about relevance are unresolved. We take for granted the very idea that a computer can determine which documents are relevant to a person’s needs. And we still rely on two-word queries (on average) to communicate a user’s information need. But this approach is a contrivance; in reality, we need to think of information-seeking as a problem of optimizing the communication between people and machines.

We can do better. In fact, there are a variety of ongoing efforts to do so, often under the banners of “interactive information retrieval”, “exploratory search”, and “human computer information retrieval”. In this talk, I’ll discuss these initiatives and how they are helping to move “relevance” beyond today’s outdated assumptions.