Categories
Uncategorized

Is Google Google-y?

Interviewing blogger Jeff Jarvis about his recently published book, “What Would Google Do?”, Nick Summers asks:

Are there any areas in which Google itself doesn’t act very “Google-y?” Not disclosing its advertising revenue splits, for example.

Jarvis answers:

Right. There are areas where Google doesn’t act very Google-y, which are mainly about transparency. It can’t be transparent about its algorithms and how they operate, because then they will get gamed more. And those are special sauce. I wish Google were more open about its advertising arrangements and splits, so we had a better sense of the value of the market; I wish it were more open about the sources that it puts into Google News.

I’m glad to see Jarvis recognizing this lack of transparency, which I’ve occasionally railed about on his blog. Read the rest of the interview at Newsweek.

Categories
General

Lucid Imagination

After the big news in the enterprise search space about Autonomy and FAST last week, the announcement of Lucid Imagination raising $6 million may seem anticlimactic. That’s nothing compared to the $775M Autonomy spent to acquire Interwoven, let alone the $1.2B that Microsoft paid for FAST last year. And can Lucid Imagination really succeed as the Red Hat of enterprise search, making money by supporting open-source Lucene and Solr?

Perhaps. Lucene is certainly popular among folks looking for a free search engine. Moreover, for people who want to tinker with it, its being open source is a big plus.

But Lucene deployments require extensive customization. This is often the downside of open source, and the reason that industrial use of open source software often involves a significant transfer of funds from enterprises to consultants. In contrast, closed-source solutions tend to come with more tooling, integration support, etc. Those are the sorts of details that don’t necessarily excite open-source developers but are crucial for enterprise software.

Will Lucid Imagination revolutionize the enterprise search market by providing low-cost services on top of free software? Perhaps, though I’m skeptical–and not just because they are a potential competitor to my employer. If they are to be more than a body shop, they’ll have to productize their customization efforts. But I’d imagine that, if Lucid Imagination were to build such products, it would contribute them back into the Lucene code base. That might be great for customers, but it’s not clear how it translates into a sustainable revenue model.

It’s also worth noting that Lucid Imagination isn’t the first company pursuing this model. Sematext, founded in 2007, is another company implementing solutions on top of open source software, including Lucene. In fact, its founder, Otis Gospodnetic, is a regular at The Noisy Channel. Perhaps he can comment on the space.

Categories
Uncategorized

Microsoft Songsmith: Reverse Karaoke

Some readers have noticed that I often take shots at Google on this blog, but seem to give Microsoft a pass. I assure you that I am not a Microsoft fan boy–in fact, Microsoft’s enterprise search subsidiary, FAST, competes with Endeca more than Google does. But today I’ll prove that I’m an equal-opportunity critic by talking about Microsoft Songsmith.

The idea is brilliant, at least in theory:

Just open up Songsmith, choose from one of thirty different musical styles, and press record. Sing whatever you like – a birthday song for Mom, a love song for that special someone (they’ll be impressed that you wrote a song for them!), or maybe just try playing with your favorite pop songs. As soon as you press “stop”, Songsmith will generate musical accompaniment to match your voice, and play back your song for you. It’s that simple.

What do the critics say? Here’s what Randall Stross writes in the New York Times:

How satisfying are the musical results? Microsoft lets you hear for yourself in a promotional video titled “Everyone Has a Song Inside.” The video is getting more attention than the software because it’s awful, in unintentional ways. “Notes on ‘Camp’, ” the 1964 essay by Susan Sontag, identifies a category of art that isn’t campy, just “bad to the point of being laughable, but not bad to the point of being enjoyable.” The Songsmith video is exactly that.

But I have to wonder if the researchers and product developers at Microsoft imagined what would happen when they released Songsmith into the wild. People have been taking actual vocal tracks from pop songs and feeding them into Songsmith. The results are–well, I’ll let you judge for yourself from this rendition of “Sgt. Pepper’s Lonely Hearts Club Band.”

Categories
General

The Information Triumvirate

Nicholas Carr (of “Does IT Matter?” fame) wrote a post a couple of days ago entitled “All hail the information triumvirate!”

Here is his argument in a nutshell:

what we seem to have here is evidence of a fundamental failure of the Web as an information-delivery service. Three things have happened, in a blink of history’s eye: (1) a single medium, the Web, has come to dominate the storage and supply of information, (2) a single search engine, Google, has come to dominate the navigation of that medium, and (3) a single information source, Wikipedia, has come to dominate the results served up by that search engine. Even if you adore the Web, Google, and Wikipedia – and I admit there’s much to adore – you have to wonder if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing. Is culture best served by an information triumvirate?

Carr discloses that he is on the Encyclopedia Britannica’s board of editorial advisors, but I don’t think he’s writing this as a hit piece against Wikipedia. Nor does he recycle the usual pablum about Wikipedia’s inaccuracies; he has surely read the research that Wikipedia is just as accurate as Britannica. [Note: he has read it and criticized the study.]

I agree with Carr that our current use of Google and Wikipedia impoverishes our experience of the information that the web has to offer. In fact, that was a major subtext of my recent presentation on reconsidering relevance. I’m not sure that the dominance of the web is itself a problem, but that’s because I assume that the web is relentlessly assimilating all of the world’s information.

But Carr’s advice isn’t constructive, nor are most of the comments in response to it. Indeed, people wrongly focus their ire on Wikipedia rather than Google. It’s Google that promotes a winner-take-all information economy through its relevance-centric paradigm; if anything, Wikipedia mitigates this effect because its results are collaboratively edited.

I think that, if we constrain ourselves to a relevance-centric information seeking system, our current Google + Wikipedia model is close to optimal. To do better, we need web-scale tools that support exploratory search. Long live the HCIR revolution!

Categories
Uncategorized

Early Stage IR / NLP Investment Opportunity

Those of you who know me offline or online hopefully know that I have a very critical eye, especially when it comes to technical innovations. But every now and then I run into colleagues whose research projects scream for productization.

I’ve been talking to such a colleague and am eager to hook him up with the right angel or VC investors. He’s in the New York area, if geography matters to you, and the IR / NLP technology he has developed is of the sort that would generally appeal to readers here, but is in no way competitive with my employer’s.

If you are an investor looking for opportunities and are interested in learning more, please contact me directly. Please don’t contact me just because you are curious. I’m not trying to tease anyone, and I promise that, if something comes of this, everyone here will find out about it in due course.

Also, I realize I’m not giving much information away, which is not my style. I recognize that this stinginess may put off potential investors. I’m sorry, but it’s a conscious trade-off. I have a deep respect for my colleague’s privacy, and I also feel that the trust I am asking for acts as a useful filter: if you don’t trust my taste in ready-for-prime-time technology, then there’s no reason for either of us to waste each other’s time.

In any case, I’m not asking for money, just evidence that you are an investor (and no, not evidence in the Nigerian Letter sense). Also, I don’t have any direct stake in the outcome; I’m just trying to help a colleague, although it’s the sort of venture in which I’d happily serve as an advisor.

You can contact me at dtunkelang at gmail dot com.

Categories
Uncategorized

Shake-Up at Microsoft FAST

Wow, this is quite a week for news in the enterprise search industry! Yesterday, I woke up to hear that Autonomy is acquiring Interwoven; today, I hear from CMS Watch analyst Adriaan Bloem that FAST CEO John Lervik is resigning, with their CTO Bjørn Olstad stepping in to fill his duties.

FAST had some accounting issues that came out shortly after Microsoft acquired them last year, and it’s possible that Microsoft is simply clearing the deck of anyone associated with those issues. I’m just an innocent bystander, and my pedestrian concern is whether Bjørn’s new duties will prevent him from participating in a panel I’m organizing at SIGIR.

Categories
General

Autonomy acquires Interwoven: Felix Nube

Emperor Maximilian I of Habsburg cultivated the motto, “Bella gerant alii, tu felix Austria nube” (What others achieve by war, let you, happy Austria, achieve by marriage). This morning’s announcement that Autonomy (AUTNF) is acquiring Interwoven (IWOV) for $775M would have made Maximilian proud. Indeed, while some (including Alan Pelz-Sharpe at CMS Watch) have gone as far as to call Autonomy a holding company, I think of Autonomy more as a Habsburgian dynasty.

Autonomy’s acquisitions over the past decade include:

  • Softsound (speech recognition)
  • Virage (multimedia search)
  • etalk (call center software)
  • Verity (enterprise search)
  • Blinkx (multimedia search, since divested)
  • Zantaz (email archiving and litigation support)

And now Autonomy acquires Interwoven, a player in the Enterprise Content Management (ECM) space. It’s not quite an AOL-acquires-Time Warner moment, but it is dramatic to see an enterprise search player acquire a content management player. Indeed, most of the speculation last year was that Autonomy would be acquired, not vice versa.

What does this mean for the rest of us? Given that Autonomy and Interwoven have both been focusing on the compliance and e-discovery space, it is reasonable to expect that the acquisition will deepen this focus. The more interesting question, at least to me, is what this means for the rest of the enterprise search space. Autonomy was recently crowing that it “has won the enterprise search wars”. That’s news to me, but I won’t claim to be objective. In any case, with Autonomy focusing its investment in the compliance / e-discovery space and Microsoft acquiring FAST in order to fold its technology into SharePoint, I do wonder what will happen to the competitive landscape of enterprise search. Will it be Endeca vs. Google?

Categories
Uncategorized

A Warm Reception for “Reconsidering Relevance”

I am proud to report that the “Reconsidering Relevance” presentation has been enjoying a warm reception:

  • SlideShare: made top (most viewed) presentation on the day it was posted (January 9th) and was featured today (January 21st) by the SlideShare editorial team. Viewed over 1,000 times.
  • Interviewed by Cloud of Data blogger Paul Miller about the presentation, and more generally about Endeca’s approach to enterprise search. Click here to listen to the podcast.
  • Presentation re-posted on Oracle’s Enterprise 2.0 and Content Management blog.

I am grateful for the attention this presentation has received, and I hope that the attention helps further the HCIR vision that the presentation advocates. I also promise that the YouTube video is forthcoming. Google is very apologetic about the delay, but they assure me that the upload is in process.

Categories
Uncategorized

A Turker’s Got To Know His Limitations

“A man’s got to know his limitations,” as Clint Eastwood famously said as Dirty Harry in Magnum Force. Well, as Panos Ipeirotis reports on his blog, Turkers (the people who are paid to perform tasks using Amazon’s Mechanical Turk service) actually do know their limitations.

Read the full post for details, but here is the punch line:

Turkers can self-report accurately the difficulty of labeling an example correctly! Since example difficulty and labeling quality are strongly interconnected, this also means that they are good at estimating their own quality!

It’s great to see Mechanical Turk used in the service of productive research, and not just as a way to shill on the cheap.

Categories
General

Information Sharing We Can Believe In

Today is a monumental day in the history of the United States, and I suspect many readers will be too caught up in the inaugural festivities to be checking their RSS readers. But everyone has a job to do, and part of mine is to speak truth to power through blogging.

A few months ago, when Obama’s victory was hardly certain, I was fortunate to attend a meeting of New York technology executives where Jed Katz, a managing director at DFJ Gotham Ventures and a technology advisor to the Obama campaign, responded to questions about Obama’s technology strategy. What he made clear was that he, and by extension the Obama campaign and administration-to-be, was there as much to listen and learn as to respond.

I followed up with Jed by sending my ideas about how the federal government can better make public domain information available to the public. Today, much of that data is locked up in the hands of vendors who then dole it out in drabs. While these arrangements may have been necessary at the time to fund the distribution of that information, they are out of step with today’s technology. What we need is for anyone to be able to obtain that information in raw form and then add value to it by creating better ways to access and analyze it. In fact, private sector companies, universities, and NGOs could compete in their offerings.

I’m hardly the first person to suggest this. Vivek Kundra, the CTO of the District of Columbia and a short-listed candidate for CTO of the United States, organized Apps for Democracy, a contest much along the lines of what I’m describing.

I work in the private sector, and I’m not swayed by the naive conception that all information needs to be free. Some of us would like to earn a living! But public domain information produced by government needs to be freely available to an informed and active citizenry. Moreover, information freely contributed by that citizenry should be available to decision makers, as well as to the public at large.

Today is a milestone many of us never expected we’d live to see: a president who “doesn’t look like all those other presidents on the dollar bills.” Let’s raise the stakes and aim for a government that doesn’t look like the information-hiding governments of our past. That is change I can believe in.