Is Google Conjuring a “Magic Inbox” for Gmail?

Alex Chitu at the unofficial Google Operating System blog reports that:

Gmail’s code reveals an upcoming feature called “magic inbox” or “icebox inbox”, which is likely to prioritize the messages sent by your friends and other contacts you email frequently.

That wouldn’t be hard to implement for Google, or for any other email service or application with access to your history, but I’m skeptical of the value of prioritizing messages this way. I can’t speak for others, but I personally have no reason to believe there is a correlation between frequency of contact and priority. Indeed, I’ve found that non-spam out-of-the-blue emails are sometimes the most pressing ones, e.g., requests to write something for a publication or to present at a conference. Not to say that my more frequent correspondents aren’t important, but if anything they have other ways to reach me with time-sensitive requests.
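
The mechanics really are trivial. Here’s a minimal sketch, in Python, of the kind of frequency-based scoring I’d guess underlies such a feature; the message objects and their sender/recipients attributes are hypothetical stand-ins, not anything taken from Gmail:

    from collections import Counter

    def contact_frequencies(sent_messages):
        """Count how often each address appears among the messages you've sent."""
        counts = Counter()
        for msg in sent_messages:              # hypothetical message objects
            for recipient in msg.recipients:   # assumed: a list of address strings
                counts[recipient.lower()] += 1
        return counts

    def prioritize(inbox, sent_messages):
        """Order incoming messages by how often you email each sender.

        This bakes in exactly the assumption I'm questioning: that
        frequency of contact is a proxy for priority.
        """
        freq = contact_frequencies(sent_messages)
        return sorted(inbox, key=lambda m: freq.get(m.sender.lower(), 0), reverse=True)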

I’ve pushed for attention bond mechanisms before, and I’ll do it again. I’d love to see them implemented in a way that plays well with the infrastructure and is usable. To my knowledge, they are the most promising way both to improve spam filtering (though, in fairness, current spam filters work adequately) and to prioritize non-spam. But I recognize that the infrastructure and usability hurdles are significant.
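
For readers who haven’t encountered the idea, here’s a toy sketch of the core mechanism as I understand it: an unknown sender escrows a small bond, which the recipient can refund (legitimate mail) or keep (unwanted mail). The amounts and interfaces below are purely illustrative, not any deployed system:

    BOND_AMOUNT = 0.05  # illustrative stake per message, in dollars

    class AttentionBondInbox:
        """Toy model of an attention-bond inbox: known correspondents get
        through freely; strangers must put a small amount of money at risk
        to claim the recipient's attention."""

        def __init__(self, known_senders):
            self.known_senders = set(known_senders)
            self.escrow = {}  # message id -> bonded amount

        def receive(self, msg_id, sender, bond_posted=0.0):
            if sender in self.known_senders:
                return "delivered"                 # whitelisted, no bond needed
            if bond_posted >= BOND_AMOUNT:
                self.escrow[msg_id] = bond_posted  # hold the bond until judged
                return "delivered"
            return "deferred"                      # no bond, no attention

        def mark_legitimate(self, msg_id, sender):
            self.escrow.pop(msg_id, None)          # refund the bond
            self.known_senders.add(sender)         # future mail skips the bond

        def mark_unwanted(self, msg_id):
            return self.escrow.pop(msg_id, 0.0)    # recipient keeps the bond

The infrastructure and usability hurdles I mentioned are precisely the parts this sketch glosses over: the escrow, the payment rails, and the user experience of judging messages.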

Google Suggests…Ads

I haven’t seen this in my own browser yet, but MG Siegler at TechCrunch reports that Google Suggest has added advertising (see Google’s official post here). It also talks about personalization, but I’ve been seeing that for a while, so I don’t know that there’s anything new on that front.

In any case, here’s an example of a suggested ad, courtesy of TechCrunch:

I’m sure Firefox extensions like CustomizeGoogle will soon block these ads, if they aren’t doing so already. Granted, I can hardly blame an ad-supported service for pushing more ads–and in this case the ad is actually a relevant result, independent of the fact that it’s sponsored. In fact, it’s the top-ranked organic search result for “south park episodes”. I imagine the feature will be considerably more annoying when the sponsored links are more typical ads, but probably not enough so to incite people to install ad blockers. Google seems to know how not to push people too far.

SIGIR ’09 Industry Track Program

At long last, SIGIR 2009 has posted the program for the Industry Track! It will take place on Wednesday, July 22, 2009 during the regular conference program (in parallel with the technical tracks). There is no additional registration fee for full conference attendees, but there is a one-day registration option for people who only want to attend the Industry Track.

Here’s the condensed version of the program:

Presentations

  • Matt Cutts, Google: “Web Spam and Adversarial IR: The Road Ahead”
  • danah boyd, Microsoft Research: “The Searchable Nature of Acts in Networked Publics”
  • Vanja Josifovski, Yahoo! Research: “Ad Retrieval – A New Frontier of Information Retrieval”
  • Thomas (Tom) Tague, Thomson Reuters: “Semantic Web and the Linked Data Economy”
  • Tip House, OCLC: “Alexandria 2.0: Search Innovations Keep Libraries Relevant in an Online World”

Panel of Search Industry Analysts

  • Whit Andrews, Gartner
  • Susan Feldman, IDC
  • Theresa Regli, CMS Watch

Panel of Enterprise Search Vendors

  • Øystein Torbjørnsen, FAST
  • Peter Menell, Autonomy
  • Adam Ferrari, Endeca

More details are available on the Industry Track page. The early registration deadline is this Sunday, May 24th, so please register soon if you haven’t already, before the fees go up by $50.

Approach and Identify

Back on my 30th birthday, my wife gave me a copy of Logan’s Run, with a card assuring me that I’d found sanctuary. The joke is probably lost on those who haven’t seen this wonderful sci-fi B-movie, as is the title of this post, but you can crib from the script here.

But to get to the point of this post: I just read in the New York Times that Equifax, one of the larger consumer credit reporting agencies in the United States, is developing an “i-card” service that will let you create and then assert an online identity, backed up by them. Yes, they’re hardly the first to offer some kind of online identity validation, but being a major offline player may make them different from OpenID and similar services. Then again, the article suggests that the service is complex to use, so it might just collapse under its own weight.

In any case, I hope that the blogosphere takes these efforts seriously. As I’ve noted in the past (e.g., here), it strikes me as oddly antisocial that anonymous publishing is the norm in social media, at least for commenters. Yes, anonymity makes sense for whistleblowers, political dissidents, and anyone else who fears retribution. But it is hardly necessary for your average TechCrunch commenter. Instead, it makes it easy for people to post vitriol–or just nonsense–without any risk to personal reputation. I don’t see the social value.

Moreover, just imagine how easy it would be for someone who didn’t like you to start posting embarrassing comments and signing them with your name. Or perhaps someone might pursue a more subtle strategy, such as posting reasonable-sounding comments in order to advance an agenda. Less speculatively, we’ve seen how anonymity can be troublesome for the integrity of Wikipedia editing.

Given the growing role of social media, we’re going to have to cross this information accountability bridge sooner or later. I hope it’s sooner. Wouldn’t it be nice if we developed a cultural norm that people stood proudly behind their online words?

Where Have All The Google Killers Gone?

Harry McCracken at Technologizer just posted “A Brief History of Google Killers”, in which he enumerates fourteen companies that “were supposed to do away with the Web’s biggest brand”. He forgot a few–I’d love to see a more comprehensive list (e.g., where’s Dipsie?). Still, it’s an informative and entertaining analysis, and would-be Google executioners would do well to heed its lessons. I have a “So You Want To Kill Google” post in the virtual queue, but this will have to tide you over until I have time to write it.

Great Press, But Where Are The Customers?

One of the things I love about being in the enterprise search / information access business is that there is always new blood keeping us old-timers on our toes and maintaining the pressure to innovate. While the competitive landscape is brutal (ask any analyst who has covered it over the past decade!), it apparently doesn’t dissuade entrepreneurs from making their own attempts to tackle the fundamental problems of making information accessible and useful.

Two of the higher profile newcomers to the scene are Attivio and Digital Reef. Attivio seems to be everywhere these days: sending its CTO to give talks; sponsoring conferences and dinners; and even winning awards. Digital Reef is a bit less gregarious, but they generated a lot of press in March when they emerged from stealth mode after two years. Just today, they announced a partnership with FAST, the enterprise search subsidiary of Microsoft.

I’ve interacted with a couple of people at Attivio, and I’ve read some of the Digital Reef blog posts. Both companies intrigue me. But what intrigues me more is that they say almost nothing about their customers. As far as I can tell, Attivio has only announced two customers (Thumbplay.com, Intralinks) and Digital Reef hasn’t announced any. There’s nothing wrong with ramping up (I still remember the early years myself), but I’m struck by the discrepancy between the highly visible marketing and the seemingly invisible customers.

If anyone here knows more about these companies (including representatives from the companies themselves), I’d love to hear your perspectives.

Copying TREC is the Wrong Track for the Enterprise

Otis just wrote a post in which he cited the Open Relevance Project, an embryonic effort by the Lucene project to build a free, public information retrieval evaluation framework analogous to the TREC conference. Not surprisingly, he sees this as an opportunity for Lucene to prove that it is just as good as the commercial enterprise search engines.

On one hand, I’m delighted to see an attempt to make a TREC-like infrastructure more widely accessible. While the Linguistic Data Consortium and the University of Glasgow may only be charging enough to cover their costs, perhaps there are more efficient ways to manage corpora today. Indeed, alternatives include distributing corpora as torrents or hosting them as Amazon public data sets. If the bottleneck is licensing costs, then perhaps there should be a call to donate data–or to assemble collections from public domain sources.

On the other hand, if the goal of this project is to help companies evaluate competing search offerings, then I think its proponents are chasing the wrong problem. Lest you think I’m biased because of my affiliation with one of those commercial search vendors Otis taunts in his post, I encourage you to check out a post that Jeff Dalton (who is certainly pro-Lucene) wrote a year ago, entitled “Open Source Search Engine Evaluation: Test Collections”. In it, he raises a number of issues that go beyond data availability. One of them is the evaluation of interactive retrieval, an area where even TREC has struggled.

I understand the desire of Lucene advocates to prove that Lucene is just as good as or better than the commercial search engines–it’s not that different from the desire every vendor has to make competitive claims about its own technology. To Otis’s credit, he recognizes that relevance isn’t the only criterion worthy of assessment–he also suggests extending the Open Relevance Project to include the non-functional metrics of efficiency and scalability. But he still seems to accept an evaluation framework that would treat search engines as out-of-the-box relevance ranking engines.

I dare say I have a little bit of experience with how companies make decisions about search technology, so let me offer my perspective. Companies build search applications to support specific tasks and information needs. For example, ecommerce sites want to help users find what they are looking for, as well as to target those users with their marketing strategies. Manufacturing companies want to optimize their own part reuse, as well as to make sense of their supply chains. Staffing agencies want to optimize utilization of their consultants and minimize their own costs. And so on.

All of the above rely on search applications to meet their needs. But I don’t think they’d be swayed by a TREC-style relevance bake-off. That’s why companies (and vendors) trumpet success in the form of metrics that reflect task performance (and there are often standard key performance indicators for the various application areas) rather than information retrieval performance. Yes, non-functional requirements like efficiency and scalability matter too–but they presume the functional requirements. If an application can’t meet the functional needs, it really doesn’t matter how quickly it processes queries, or how many documents it can index. Moreover, many companies ask for a proof of concept as part of the sales process. Why? Because they recognize that their needs are idiosyncratic, and they are even skeptical of vendors who have built similar solutions in their space. They see success stories and satisfied customers as positive–but not definitive–evidence.
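
To be concrete about what a TREC-style bake-off actually measures, here’s a minimal sketch of batch relevance evaluation: mean average precision over a set of queries, given a fixed set of human relevance judgments. Note how little of a deployed application’s behavior it captures:

    def average_precision(ranked_doc_ids, relevant_doc_ids):
        """Average precision for one query: the mean of precision@k over
        every rank k at which a relevant document appears."""
        relevant = set(relevant_doc_ids)
        if not relevant:
            return 0.0
        hits, precision_sum = 0, 0.0
        for k, doc_id in enumerate(ranked_doc_ids, start=1):
            if doc_id in relevant:
                hits += 1
                precision_sum += hits / k
        return precision_sum / len(relevant)

    def mean_average_precision(runs, qrels):
        """runs: query id -> ranked list of doc ids returned by an engine.
        qrels: query id -> set of doc ids judged relevant by assessors."""
        scores = [average_precision(docs, qrels.get(qid, set()))
                  for qid, docs in runs.items()]
        return sum(scores) / len(scores) if scores else 0.0

A number like this says nothing about whether shoppers found what they wanted, whether parts got reused, or whether consultants got staffed.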

To summarize: the quest to open up TREC may be of great interest to information retrieval researchers, but I’m highly skeptical that it will create a practically useful framework for comparing search technologies. I think it would be more useful to set up public frameworks where applications (both vendor-sponsored and open-source) can compete on how effectively they help users complete information-seeking tasks that are representative of practical applications. I’d love to see a framework like Luis von Ahn’s “games with a purpose” used for such an endeavor. I would happily participate in such an effort myself, and I’m pretty sure I could drag my employer into it.

A Consumer-Centric View of Business Models for Publishing

Curt Monash has a nice post that turns around the question of innovating business models for online publishing. He considers the reasons that people consume information, and uses that as the basis for evaluating the potential of the various business models (e.g., freemium, metered) available to the companies that produce it.

It’s a long post, so I’ll excerpt his conclusions:

  • “Freemium” models, in which one gives away some good information but charges for the best stuff, can succeed.
  • Charging by some kind of usage metric doesn’t make sense.
  • Grand cosmic all-you-can-consume-of-all-but-the-most-highly-valuable-information subscriptions — e.g., an “ASCAP for news” — could be marketable.

Monash doesn’t bring up the possibility of monetizing participation–a route that I think a number of publishers should consider. But he covers a lot of ground, and would-be saviors of the publishing industry would do well to read his sober, common-sense analysis before latching onto a new business model as a get-saved-quick scheme.

Wolfram Alpha is Live, But Struggling

Wolfram Alpha is live, though it is experiencing some strain under load. There are lots of reactions on Techmeme, both commenting on the brief launch delay and offering mixed impressions of the service itself.

I encourage you all to try it, at least once it recovers from the initial load. If nothing else, I need everyone here to keep me honest after I’ve been spouting my opinions about Wolfram Alpha for the past few weeks!

Free Advice to the NYT: Monetize Community

I just read in The Observer that the New York Times is considering two plans to charge online users:

One includes a “meter system,” in which the reader can roam freely on the Web site until hitting a predetermined limit of word-count or pageviews, after which a meter will start running and the reader is charged for movement on the site thereafter…the second proposal [is] a “membership” system. In this model, readers pledge money to the site and are invited into a “New York Times community.”

Here is my free advice: ditch the first option and embrace the second. It’s not that I don’t believe in charging for content, but rather that nobody else does, and it’s quixotic for even the New York Times to think it can buck the trend solo. OK, not quite solo, but the article cites New York Times executive editor Bill Keller as saying that the Times makes significantly more money from digital advertising than The Wall Street Journal makes from its subscription-based pay model. Of course, past performance isn’t necessarily a great predictor of the future, but it’s probably indicative of the near term.

I wrote a couple of months ago that “Community = Copy Protection”. It may also equal business model protection. Of course, the New York Times would have to put serious thought and effort into offering a community worth paying for (I hope the “baseball cap or a T-shirt” suggestion in the article is a joke). But I do believe it’s a vision they should pursue.