Categories
General

The Information Triumvirate

Nicholas Carr (of “Does IT Matter?” fame) wrote a post a couple of days ago entitled “All hail the information triumvirate!”

Here is his argument in a nutshell:

what we seem to have here is evidence of a fundamental failure of the Web as an information-delivery service. Three things have happened, in a blink of history’s eye: (1) a single medium, the Web, has come to dominate the storage and supply of information, (2) a single search engine, Google, has come to dominate the navigation of that medium, and (3) a single information source, Wikipedia, has come to dominate the results served up by that search engine. Even if you adore the Web, Google, and Wikipedia – and I admit there’s much to adore – you have to wonder if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing. Is culture best served by an information triumvirate?

Carr discloses that he is on the Encyclopedia Britannica’s board of editorial advisors, but I don’t think he’s writing this as a hit piece against Wikipedia. Nor does he recycle the usual pablum about Wikipedia’s inaccuracies; he has surely read the research suggesting that Wikipedia is just as accurate as Britannica. [Note: he has read it and criticized the study.]

I agree with Carr that our current use of Google and Wikipedia impoverishes our experience of the information that the web has to offer. In fact, that was a major subtext of my recent presentation on reconsidering relevance. I’m not sure that the dominance of the web is itself a problem, but that’s because I assume that the web is relentlessly assimilating all of the world’s information.

But Carr’s advice isn’t constructive, nor are most of the comments in response to it. Indeed, people wrongly focus their ire on Wikipedia rather than Google. It’s Google that promotes a winner-take-all information economy through its relevance-centric paradigm; if anything, Wikipedia mitigates this effect because its results are collaboratively edited.

I think that, if we constrain ourselves to a relevance-centric information-seeking system, our current Google + Wikipedia model is close to optimal. To do better, we need web-scale tools that support exploratory search. Long live the HCIR revolution!

Categories
Uncategorized

Early Stage IR / NLP Investment Opportunity

Those of you who know me offline or online hopefully know that I have a very critical eye, especially when it comes to technical innovations. But every now and then I run into colleagues whose research projects scream for productization.

I’ve been talking to such a colleague and am eager to hook him up with the right angel or VC investors. He’s in the New York area, if geography matters to you, and the IR / NLP technology he has developed is of the sort that would generally appeal to readers here, but is in no way competitive with my employer’s.

If you are an investor looking for opportunities and are interested in learning more, please contact me directly. Please don’t contact me just because you are curious. I’m not trying to tease anyone, and I promise that, if something comes of this, everyone here will find out about it in due course.

Also, I realize I’m not giving much information away, which is not my style. I recognize that this stinginess may put off potential investors. I’m sorry, but it’s a conscious trade-off. I have a deep respect for my colleague’s privacy, and I also feel that the trust I am asking for acts as a useful filter: if you don’t trust my taste in ready-for-prime-time technology, then there’s no reason for either of us to waste each other’s time.

In any case, I’m not asking for money, just evidence that you are an investor (and no, not evidence in the Nigerian Letter sense). Also, I don’t have any direct stake in the outcome; I’m just trying to help a colleague, although it’s the sort of venture in which I’d happily serve as an advisor.

You can contact me at dtunkelang at gmail dot com.

Categories
Uncategorized

Shake-Up at Microsoft FAST

Wow, this is quite a week for news in the enterprise search industry! Yesterday, I woke up to hear that Autonomy is acquiring Interwoven; today, I hear from CMS Watch analyst Adriaan Bloem that FAST CEO John Lervik is resigning, with their CTO Bjørn Olstad stepping in to fill his duties.

FAST had some accounting issues that came out shortly after Microsoft acquired them last year, and it’s possible that Microsoft is simply clearing the decks of anyone associated with those issues. I’m just an innocent bystander, and my pedestrian concern is whether Bjørn’s new duties will prevent him from participating in a panel I’m organizing at SIGIR.

Categories
General

Autonomy acquires Interwoven: Felix Nube

Emperor Maximilian I of Habsburg cultivated the motto, “Bella gerant alii, tu felix Austria nube” (What others achieve by war, let you, happy Austria, achieve by marriage). This morning’s announcement that Autonomy (AUTNF) is acquiring Interwoven (IWOV) for $775M would have made Maximilian proud. Indeed, while some (including Alan Pelz-Sharpe at CMS Watch) have gone so far as to call Autonomy a holding company, I think of Autonomy more as a Habsburgian dynasty.

Autonomy’s acquisitions over the past decade include:

  • Softsound (speech recognition)
  • Virage (multimedia search)
  • etalk (call center software)
  • Verity (enterprise search)
  • Blinkx (multimedia search, since divested)
  • Zantaz (email archiving and litigation support)

And now Autonomy acquires Interwoven, a player in the Enterprise Content Management (ECM) space. It’s not quite an AOL-acquires-Time Warner moment, but it is dramatic to see an enterprise search player acquire a content management player. Indeed, most of the speculation last year was that Autonomy would be acquired, not vice versa.

What does this mean for the rest of us? Given that Autonomy and Interwoven have both been focusing on the compliance and e-discovery space, it is reasonable to expect that the acquisition will deepen this focus. The more interesting question, at least to me, is what this means for the rest of the enterprise search space. Autonomy was recently crowing that it “has won the enterprise search wars”. That’s news to me, but I won’t claim to be objective. In any case, with Autonomy focusing its investment in the compliance / e-discovery space and Microsoft acquiring FAST in order to fold its technology into SharePoint, I do wonder what will happen to the competitive landscape of enterprise search. Will it be Endeca vs. Google?

Categories
Uncategorized

A Warm Reception for “Reconsidering Relevance”

I am proud to report that the “Reconsidering Relevance” presentation has been enjoying a warm reception:

  • SlideShare: made top (most viewed) presentation on the day it was posted (January 9th) and was featured today (January 21st) by the SlideShare editorial team. Viewed over 1,000 times.
  • Interviewed by Cloud of Data blogger Paul Miller about the presentation, and more generally about Endeca’s approach to enterprise search. Click here to listen to the podcast.
  • Presentation re-posted on Oracle’s Enterprise 2.0 and Content Management blog.

I am grateful for the attention this presentation has received, and I hope that the attention helps further the HCIR vision that the presentation advocates. I also promise that the YouTube video is forthcoming. Google is very apologetic about the delay, but they assure me that the upload is in process.

Categories
Uncategorized

A Turker’s Got To Know His Limitations

“A man’s got to know his limitations,” as Clint Eastwood famously said as Dirty Harry in Magnum Force. Well, as Panos Ipeirotis reports on his blog, Turkers (the people who are paid to perform tasks using Amazon’s Mechanical Turk service) actually do know their limitations.

Read the full post for details, but here is the punch line:

Turkers can self-report accurately the difficulty of labeling an example correctly! Since example difficulty and labeling quality are strongly interconnected, this also means that they are good at estimating their own quality!

It’s great to see Mechanical Turk used in the service of productive research, and not just as a way to shill on the cheap.

Categories
General

Information Sharing We Can Believe In

Today is a monumental day in the history of the United States, and I suspect many readers will be too caught up in the inaugural festivities to be checking their RSS readers. But everyone has a job to do, and part of mine is to speak truth to power through blogging.

A few months ago, when Obama’s victory was hardly certain, I was fortunate to attend a meeting of New York technology executives where Jed Katz, a managing director at DFJ Gotham Ventures and a technology advisor to the Obama campaign, responded to questions about Obama’s technology strategy. What he made clear was that he, and by extension the Obama campaign and administration-to-be, was there as much to listen and learn as to respond.

I followed up with Jed by sending my ideas about how the federal government can better make public domain information available to the public. Today, much of that data is locked up in the hands of vendors who then dole it out in dribs and drabs. While these arrangements may have been necessary at the time to fund the distribution of that information, they are out of step with today’s technology. What we need is for anyone to be able to obtain that information in raw form and then add value to it by creating better ways to access and analyze it. In fact, private sector companies, universities, and NGOs could compete in their offerings.

I’m hardly the first person to suggest this. Vivek Kundra, the CTO of the District of Columbia and a short-listed candidate for CTO of the United States, organized Apps for Democracy, a contest much along the lines of what I’m describing.

I work in the private sector, and I’m not swayed by the naive conception that all information needs to be free. Some of us would like to earn a living! But public domain information produced by government needs to be freely available to an informed and active citizenry. Moreover, information freely contributed by that citizenry should be available to decision makers, as well as to the public at large.

Today is a milestone many of us never expected we’d live to see: a president who “doesn’t look like all those other presidents on the dollar bills.” Let’s raise the stakes and aim for a government that doesn’t look like the information-hiding governments of our past. That is change I can believe in.

Categories
Uncategorized

Taken Out of Context: Danah Boyd’s Dissertation

Just a heads up that Danah Boyd has published her PhD dissertation, entitled “Taken Out of Context: American Teen Sociality in Networked Publics”. Danah is a rock star in the social networking research community; you might have noticed that I cite her master’s thesis from time to time. I’m looking forward to reading her latest work, and to welcoming her to the Boston area, where she’ll be joining Microsoft Research New England.

Categories
Uncategorized

Google Improves Personalization

Today was a harsh news day for Google, with TechCrunch posting a leaked thread of internal emails on why Google employees quit. The emails are intriguing if highly redundant; the schadenfreude comments are merely predictable.

But the more interesting piece of news is that Google is moving beyond its initial SearchWiki efforts to offer more meaningful personalization to users. Their new feature is called Google Preferred Sites. According to the unofficial Google Operating System Blog:

Preferred Sites is a new experimental feature for Google Search that lets you personalize the results by adding a list of sites you want to appear more often when you search. Based on your search history, Google suggests some frequently-visited sites, but you can add any other site.

As regular readers know, I have little love for SearchWiki. But Preferred Sites seems to be a real step, albeit a small one, towards allowing users to meaningfully–and transparently–personalize their search experience. I say “seems” because I haven’t had a chance to try it out. Perhaps someone with a lucky cookie who’s gotten to try it can comment on his or her experience.

Categories
Uncategorized

Jeff’s Search Engine Caffe: Open Source Resources

Jeff Dalton recently updated the articles he maintains on open-source NLP and machine learning tools and on open-source search engines. Check them out, as well as the discussion on LinkedIn about the pros and cons of open-source enterprise search.