Categories
General

Approach and Identify

Back on my 30th birthday, my wife gave me a copy of Logan’s Run, with a card assuring me that I’d found sanctuary. The joke is probably lost on those who haven’t seen this wonderful sci-fi B-movie, as is the title of this post, but you can crib from the script here.

But to get to the point of this post: I just read in the New York Times that Equifax, one of the larger consumer credit reporting agencies in the United States, is developing an “i-card” service that will let you create and then assert an online identity, backed up by them. Yes, they’re hardly the first to offer some kind of online identity validation, but their being a major offline player may make them different from OpenID or similar services. Then again, the article suggests that the service is complex to use, so it might just collapse under its own weight.

In any case, I hope that the blogosphere takes these efforts seriously. As I’ve noted in the past (e.g., here), it strikes me as oddly antisocial that anonymous publishing is the norm in social media, at least for commenters. Yes, anonymity makes sense for whistle blowers, political dissidents, and anyone else who fears retribution. But it is hardly necessary for your average TechCrunch commenter. Instead, it makes it easy for people to post vitriol–or just nonsense–without any risk to personal reputation. I don’t see the social value.

Moreover, just imagine how easy it would be for someone who didn’t like you to start posting embarrassing comments and signing them with your name. Or perhaps someone might pursue a more subtle strategy, such as posting reasonable-sounding comments in order to advance an agenda. Less speculatively, we’ve seen how anonymity can be troublesome for the integrity of Wikipedia editing.

Given the growing role of social media, we’re going to have to cross this information accountability bridge sooner or later. I hope it’s sooner. Wouldn’t it be nice if we developed a cultural norm that people stood proudly behind their online words?

Categories
Uncategorized

Where Have All The Google Killers Gone?

Harry McCracken at Technologizer just posted “A Brief History of Google Killers“, in which he enumerates fourteen companies that “were supposed to do away with the Web’s biggest brand”. He forgot a few–I’d love to see a more comprehensive list (e.g., where’s Dipsie?). Still, it’s an informative and entertaining analysis, and would-be Google executioners would do well to read the lessons. I have a “So You Want To Kill Google” post in the virtual queue, but this will have to tide you over until I have time to write it.

Categories
General

Great Press, But Where Are The Customers?

One of the things I love about being in the enterprise search / information access business is that there is always new blood keeping us old-timers on our toes and maintaining the pressure to innovate. While the competitive landscape is brutal (ask any analyst who has covered it over the past decade!), it apparently doesn’t dissuade entrepreneurs from making their own attempts to tackle the fundamental problems of making information accessible and useful.

Two of the higher profile newcomers to the scene are Attivio and Digital Reef. Attivio seems to be everywhere these days: sending its CTO to give talks; sponsoring conferences and dinners; and even winning awards. Digital Reef is a bit less gregarious, but they generated a lot of press in March when they emerged from stealth mode after two years. Just today, they announced a partnership with FAST, the enterprise search subsidiary of Microsoft.

I’ve interacted with a couple of people at Attivio, and I’ve read some of the Digital Reef blog posts. Both companies intrigue me. But what intrigues me more is that they say almost nothing about their customers. As far as I can tell, Attivio has only announced two customers (Thumbplay.com, Intralinks) and Digital Reef hasn’t announced any. There’s nothing wrong with ramping up (I still remember the early years myself), but I’m struck by the discrepancy between the highly visible marketing and the seemingly invisible customers.

If anyone here knows more about these companies (including representatives from the companies themselves), I’d love to hear your perspectives.

Categories
General

Copying TREC is the Wrong Track for the Enterprise

Otis just wrote a post in which he cited the Open Relevance Project, an embryonic effort by the Lucene project to build a free, public information retrieval evaluation framework analogous to the TREC conference. Not surprisingly, he sees this as an opportunity for Lucene to prove that it is just as good as the commercial enterprise search engines.

On one hand, I’m delighted to see an attempt to make a TREC-like infrastructure more widely accessible. While the Linguistic Data Consortium and the University of Glasgow may only be charging enough to cover their costs, perhaps there are more efficient ways to manage corpora today. Indeed, alternatives include publishing torrents and using Amazon’s public data sets. If the bottleneck is licensing costs, then perhaps there should be a call to donate data–or to assemble collections from public domain sources.

On the other hand, if the goal of this project is to help companies evaluate competing search offerings, then I think its proponents are chasing the wrong problem. Lest you think I’m biased because of my affiliation with one of those commercial search vendors Otis taunts in his post, I encourage you to check out a post that Jeff Dalton (who is certainly pro-Lucene) wrote a year ago, entitled “Open Source Search Engine Evaluation: Test Collections“. In it, he raises a number of issues that go beyond the issue of data availability. One of the issues he brings up is the evaluation of interactive retrieval, an area where even TREC has struggled.

I understand the desire of Lucene advocates to prove that Lucene is just as good as or better than the commercial search engines–it’s not that different from the desire every vendor has to make competitive claims about his or her own technology. To Otis’s credit, he recognizes that relevance isn’t the only criterion worthy of assessment–he also suggests extending the Open Relevance Project to include the non-functional metrics of efficiency and scalability. But he still seems to accept an evaluation framework that would treat search engines as out-of-the-box relevance ranking engines.

I dare say I have a little bit of experience with how companies make decisions about search technology, so let me offer my perspective. Companies build search applications to support specific tasks and information needs. For example, ecommerce sites want to help users find what they are looking for, as well as to target those users with their marketing strategies. Manufacturing companies want to optimize their own part reuse, as well as to make sense of their supply chains. Staffing agencies want to optimize utilization of their consultants and minimize their own costs. And so on.

All of the above rely on search applications to meet their needs. But I don’t think they’d be swayed by a TREC-style relevance bake-off. That’s why companies (and vendors) trumpet success in the form of metrics that reflect task performance (and there are often standard key performance indicators for the various application areas) rather than information retrieval performance. Yes, non-functional requirements like efficiency and scalability matter too–but they presume the functional requirements. If an application can’t meet the functional needs, it really doesn’t matter how quickly it processes queries, or how many documents it can index. Moreover, many companies ask for a proof of concept as part of the sales process. Why? Because they recognize that their needs are idiosyncratic, and they are even skeptical of vendors who have built similar solutions in their space. They see success stories and satisfied customers as positive–but not definitive–evidence.
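To make concrete what a TREC-style relevance bake-off actually measures, here is a minimal sketch of one standard metric, precision at k, computed against human relevance judgments. The document IDs and judgments below are entirely hypothetical, just for illustration:

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents judged relevant."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k

# Hypothetical run: the engine returns document IDs in ranked order;
# qrels are the human relevance judgments for this query.
run = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
qrels = {"d3", "d1", "d4", "d8", "d5"}

print(precision_at_k(run, qrels))  # 5 relevant docs in the top 10 -> 0.5
```

Note what this number does and doesn’t tell you: it compares a ranked list against static judgments, and says nothing about whether an application actually helped a user complete her task.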

To summarize: the quest to open up TREC may be of great interest to information retrieval researchers, but I’m highly skeptical that it will create a practically useful framework for comparing search technologies. I think it would be more useful to set up public frameworks where applications (both vendor-sponsored and open-source) can compete on how effectively they help users complete information seeking tasks that are representative of practical applications. I’d love to see a framework like Luis von Ahn’s “games with a purpose” used for such an endeavor. I would happily participate in such an effort myself, and I’m pretty sure I could drag my employer into it.

Categories
Uncategorized

A Consumer-Centric View of Business Models for Publishing

Curt Monash has a nice post that turns around the question of innovating business models for online publishing. He considers the reasons that people consume information, and uses that as the basis for evaluating the potential of the various business models (e.g., freemium, metered) available to the companies that produce it.

It’s a long post, so I’ll excerpt his conclusions:

  • “Freemium” models, in which one gives away some good information but charges for the best stuff, can succeed.
  • Charging by some kind of usage metric doesn’t make sense.
  • Grand cosmic all-you-can-consume-of-all-but-the-most-highly-valuable-information subscriptions — e.g., an “ASCAP for news” — could be marketable.

Monash doesn’t bring up the possibility of monetizing participation–a route that I think a number of publishers should consider. But he covers a lot of ground, and would-be saviors of the publishing industry would do well to read his sober, common-sense analysis before latching onto a new business model as a get-saved-quick scheme.

Categories
Uncategorized

Wolfram Alpha is Live, But Struggling

Wolfram Alpha is live, though it is experiencing some strain under load. Lots of reactions on Techmeme, both commenting on the brief launch delay and expressing mixed reactions to the service itself.

I encourage you all to try it, at least when it recovers from the initial load. If nothing else, I need for everyone here to keep me honest after I’ve been spouting my opinions about Wolfram Alpha for the past few weeks!

Categories
General

Free Advice to the NYT: Monetize Community

I just read in The Observer that the New York Times is considering two plans to charge online users:

One includes a “meter system,” in which the reader can roam freely on the Web site until hitting a predetermined limit of word-count or pageviews, after which a meter will start running and the reader is charged for movement on the site thereafter…the second proposal [is] a “membership” system. In this model, readers pledge money to the site and are invited into a “New York Times community.”

Here is my free advice: ditch option one, and embrace option two. It’s not that I don’t believe in charging for content, but rather that nobody else does, and it’s quixotic for even the New York Times to think it can buck the trend solo. OK, not quite solo, but the article cites New York Times executive editor Bill Keller as saying that the Times makes significantly more money from digital advertising than The Wall Street Journal makes from its subscription-based pay model. Of course, past performance isn’t necessarily a great predictor of the future, but it’s probably indicative of the near term.

I wrote a couple of months ago that “Community = Copy Protection“. It may also equal business model protection. Of course, the New York Times would have to put serious thought and effort into offering a community worth paying for (I hope the “baseball cap or a T-shirt” suggestion in the article is a joke). But I do believe it’s a vision they should pursue.

Categories
General

The Wolfram Cometh

If you’re curious about Wolfram Alpha but tired of reading second-hand reports and third-hand hysteria about it, then be assured that your wait for first-hand access is almost over. Their blog reports that they will launch today.

UPDATE: The Wolfram Alpha site says:

Watch a live webcast of the Wolfram|Alpha system being brought online for the first time. Friday, May 15, beginning at 7pm CST

It’s been an interesting pre-launch hype cycle, particularly since I’ve gotten to watch it from a pretty good seat–a perk of being an obsessively prolific blogger.

The initial marketing and buzz offered a level of hyperbole comparable to the hype that surrounded the publication of Wolfram’s New Kind of Science (abbreviated NKS by Wolfram and fans) seven years ago. Regulars may recall that I responded by calling it “A New Kind of Marketing (NKM)“.

Apparently sensitive to the dangers of being hyped up (and thus set up to fail) as a “Google killer”, Wolfram Alpha’s marketing team reached out to influencers (and a few little people like me), offering demos and explanations. In fact, I think they were doing an admirable job of mitigating the original damage–up until April 28th. That day, as Wolfram himself was giving a public demonstration of Wolfram Alpha at Harvard, Google released Google Public Data to the general public. Ouch. To add insult to injury, Google’s Matt Cutts says the timing was a coincidence–an accidental upstaging!

In any case, the release of Google Public Data amplified the pressure on Wolfram Alpha. A week later, the latter was offering preview access to reporters and bloggers, presumably knowing that the testers would compare the two offerings side by side. Meanwhile, Google has continued raising the stakes through announcements like this week’s preview of Google Squared. I doubt the two companies are as focused on one another as the blogosphere makes them out to be, but it’s certainly an entertaining David vs. Goliath story (albeit one where David has a Goliath-sized ego).

And today is public launch day for Wolfram Alpha. I wish them luck! But I’m pretty sure that many of the people who’ve been waiting for this access will be disillusioned as they struggle with the NLP interface. None of the marketing team’s attempts at expectation management can mitigate the frustration of an undocumented, brittle interface. Ah well. I did try to tell them. I hope they can make it past that initial blow and then reconsider their approach to the interface.

Categories
General

Reprising the Enterprise Search Summit

‘Tis the season of search conferences, at least for me, and I spent the last couple of days attending the Enterprise Search Summit in New York. I enjoyed it immensely–it’s one of the better networking events in the industry. It was great to catch up with analysts, consultants, information architects, and even competitors–or, as Nate put it, “respected foes”. I’m grateful to Michelle Manafy for putting it all together, and to Will Evans for getting me included on a social search panel.

I found the informal conversations and the final discussion session to be the most valuable activities during the two days. But I also appreciated a number of the presentations, particularly Jared Spool’s keynote on “Search, Scent, and the Happiness of Pursuit”. Perhaps unsurprisingly, I found a few of the talks disappointingly shallow or salesy. I wish that more vendors and consultants realized that people expect substance from a 45-minute talk, and that the best sales pitch is to deliver that substance. I’m determined to see that all the talks at the SIGIR ’09 Industry Track are substantive; I hope I’ve recruited a line-up that will deliver!

One thing I also liked about the conference is that the various participating vendors (and there were a lot of us!) were very collegial. There was, however, an exception that I feel compelled to point out. An article on internetnews.com sums it up: “Google Talks Enterprise Search, Bashes Microsoft“, in which Nitin Mangtani, lead product manager for Google enterprise search, said:

One way of doing enterprise search would be to start something in 2001 that didn’t work. You could then do a complete overhaul in 2003, which also didn’t work. In 2007, you could launch a rip-and-replace system and then … you could acquire a large, random, non-integrated system. I’m not going to name any specific company.

I thought that was catty, especially since the reference was obvious to an audience of enterprise search professionals. Moreover, Mangtani’s characterization of the enterprise search space was, to put it diplomatically, interesting. He described Google’s approach to enterprise search as being distinctive because of their attention to structured and semi-structured data. If you’re familiar with both Google’s and Endeca’s offerings (or FAST’s for that matter), I think you’ll share my surprise at this particular characterization. I don’t want to commit the same offense I’m criticizing. I’ve had cordial exchanges with a few of the Google enterprise folks (including Mangtani), and I think Google has a respectable offering in the enterprise space. But they should be big enough (and mindful enough of Google’s corporate reputation) to treat their competitors with respect–and they might do a bit more homework on competitive analysis.

In any case, I can’t complain about Endeca’s visibility among participants and speakers. It seemed that I was hearing Endeca mentioned in every other talk I attended–quite a feat considering that none of the talks were by Endeca partisans and that, until recently, Endeca’s marketing department studiously avoided using the term “enterprise search”! Even now, the official corporate positioning is that Endeca enables search applications. I understand the distinction, and in any case I probably shouldn’t use my blog to second-guess my colleagues in marketing. Still, it’s clear that many people looking for what they consider to be enterprise search want what Endeca has to offer, and I’m not one to let a vocabulary problem get in the way of selling software and meeting customers’ needs!

All in all, two days well spent, though I’m glad to get back to my blog–and to my day job. At least I get a small break before the Text Analytics Summit on June 1-2!

Categories
Uncategorized

Is Google Diving Head First Into HCIR?

I’ve been at the Enterprise Search Summit all day, so I didn’t have the chance to pore over the buzz about Google’s Searchology announcements. But I did see a snippet that struck me as very unusual language for Google:

Our first announcement today is a new set of features that we call Search Options, which are a collection of tools that let you slice and dice your results and generate different views to find what you need faster and easier. Search Options helps solve a problem that can be vexing: what query should I ask?

Google, focusing on query refinement and elaboration? I’m all ears! In fact, eyes and ears–here is the video tour:

Well, on second thought, it’s a bit of a rehash of features they’d already rolled out, and that I personally found underwhelming (see here and here). Still, I’m pleased that their marketing language is embracing HCIR–that’s a big step for a company that has perhaps done more than anyone to emphasize the primacy of relevance ranking in the search interface. Even if they’re only taking baby steps at this point, I am cautiously optimistic that they will build on them.