Categories
Uncategorized

2009 Enterprise Search Sourcebook

I just noticed that the 2009 edition of the Enterprise Search Sourcebook is now available. Published by Information Today, it’s a nice way to survey the landscape. Of course, it goes without saying that you need to take vendor claims with a grain of salt, but you have to start somewhere!

Of course, if you’re interested in learning more about enterprise search, be sure to check out the program for the SIGIR 2009 Industry Track–particularly the two panels: one comprised of leading industry analysts, and the other of senior technologists from the top three enterprise search vendors. Yes, I’m biased: I’m organizing it. 🙂

Categories
General

Topsy: Tippling the Stream of Conversations

Cited as “amazing” by the master of hype, TechCrunch’s Mike Arrington, it’s…Topsy: “The first index is based exclusively on Twitter statuses and the wonderful people who write them.” Apparently they have been in stealth mode for three years!

I’ve only played around with it a little, but I think I have a feel for the quality. It’s hardly amazing (sorry, Mike), but it’s not embarrassingly bad either–especially if they really are only relying on Twitter rather than crawling and indexing the web. If that is the case, then they have certainly shown that it is possible to build a serviceable search engine using only the social stream, and that is an impressive proof point.

Moreover, Topsy is taking an approach that Google (and web search engines in general) neglect at their peril: treating people as first-class objects. For example, a search for exploratory search returns a list of Twitter users, many of whom should be familiar to readers here. They also have pages associated with Twitter users, like this one.

I see Topsy as a very early proof of concept–I can’t imagine anyone relying on it in its present form. But it does deserve a look. Forget all the hoopla about “real-time” search. As far as I can tell, that obsession is a sideshow compared to the real value of Twitter and other social media tools, which is to make search as much about people as about content.

Categories
General

NYT Appoints a “Social Media Editor”

What’s a social media editor? I have no idea, but the New York Times now has one! As reported in ReadWriteWeb, paidContent.org, and of course Valleywag, the paper of record has appointed Jennifer Preston, former editor of the regional sections, as its first social media editor.

I agree with Marshall Kirkpatrick at ReadWriteWeb that

We would love to see Preston fill a role similar to what Mathew Ingram does at the Toronto Globe and Mail, Canada’s largest daily paper. Ingram’s position is “Communities Editor” but he interfaces with social media activities both on and off of the paper’s site.

I think of Ingram more as a blogger than an editor, but in any case he’s certainly a credible voice in the brave new world of social media, and the New York Times would do well to have such a person on its staff.

It’s not as if the Times has been sitting on its hands–check out their APIs and their Open blog. But these efforts seem driven more by their technologists than by the editorial side of the house. My sense at Times Open was that the editors are still scared that any change could dilute their brand equity.

I’ve taken the apparently controversial stance that the New York Times should seek ways to monetize community. A hopefully less controversial assertion is that the paper needs to expose the value of that community. Few papers have the sort of brand-name writers that can act as attention magnets in a highly competitive attention economy, sucking in readers from Facebook, Twitter, the blogosphere, and the web as a whole. Of course, the management has to allow those writers to do so, which may be tough for an old guard used to assuring quality through control.

Still, I’m hopeful that the New York Times is taking a step in the right direction. I know nothing about Preston, or about the Times’s intentions beyond what’s been published in the articles I cited. Nonetheless, the gray lady seems to understand that now is the time to learn new tricks.

Categories
General

News, Search Experience, and Value

I’ve been known to spar with Jeff Jarvis about Google’s role in the present and future of journalism, but I readily admit I’m something of an amateur. Thus I’m delighted to see a thoughtful post from Josh Young, a more serious and informed media junkie, entitled “Not by Links Alone”. In it, he does a great job of explaining the main opposing positions in the debate over whether Google is good or bad for journalism.

Jarvis offers his own summary of the post:

He’s saying that Google is causing news to be reshaped so it can be found, now that it has been unbundled from the products we used to have no choice but to buy: our newspapers. He says that news is an “experience good” we can’t really know until we taste it. He says we need a new experience of news and it ain’t Google.

Jarvis further suggests that he adds value to the post by adding a “search-engine-and-browsing-friendly summary”, i.e., a lede to make the article SEO friendly. Without a doubt Jarvis does encourage readers to find the article, since citation (and a link) from a prominent blogger is a boon to traffic. I’m less persuaded that this has anything to do with SEO.

Regardless, I’d like to excerpt what I see as the main point of the post, its summary of the perspective of news executives:

Google’s approach to the Web can’t reproduce the important connection the news once had with readers. Google just doesn’t fit layered, subtle, multi-dimensional products—experience goods—like articles of serious journalism. Because news is an experience good, we need really good recommendations about whether we’re going to enjoy it. And the Google-centered link economy just won’t do. It doesn’t add quite enough value.

Because, as Jarvis said a few years ago (and as Josh cites in his post), Google commodifies everything (my bad for not citing him here). Needless to say, I agree with Josh that:

What we need is a search experience that let’s us discover the news in ways that fit why we actually care about it. We need a search experience built around concretely identifiable sources and writers. We need a search experience built around our friends and, lest we dwell too snugly in our own comfort zones, other expert readers we trust.

It is that need that motivates much of my work at Endeca, particularly in working with media organizations like the Financial Times, the Guardian, and WebMD. Yes, we live in the present and can’t neglect the importance of SEO in a Google-dominated world, but I’m much more excited about adding value through a user-centered search experience than about helping sites compete in a zero-sum game. Google’s good or evil notwithstanding, I don’t want Google to be the gatekeeper for the world’s information.

Categories
Uncategorized

NSF Report on Information Seeking Support Systems

Long-time readers may recall that I participated last year in an NSF Information Seeking Support Systems Workshop at the University of North Carolina, organized by Gary Marchionini and Ryen White. Some of the output of that workshop recently surfaced in a special issue of IEEE Computer. It’s a great issue, and I recommend it to those who have access to it either in print or online form (through the IEEE digital library).

But I know that many readers here do not have ready access to this material. Hence, I am delighted to announce that the workshop report is now available for free online. It’s a great introduction to the concerns occupying HCIR researchers.

Of course, if you’d like to meet some of these people face-to-face, I recommend you participate in HCIR ’09 this October. The deadline for position papers is August 24th. I hope to see you there!

Categories
Uncategorized

Book. Is. Done.

I couldn’t think of a better way to start a holiday weekend than by uploading the revised chapters of my faceted search book to the publisher. It was the first–and hopefully last–time that I have hand-edited PDF files (download a trial version of Acrobat here if you’re jealous). Barring some unforeseen event, the publishers will incorporate these last edits and then make the book available in hard-copy and electronic format in a few weeks!

I’d like to thank everyone who helped me put the book together. I’m grateful to Candy Schwartz (the Co-Editor of Library & Information Science Research–how cool is that?) for her thorough feedback, as well as to the veritable army of voluntary reviewers who offered great suggestions to improve both the content and style of the book: Omar Alonso, Pete Bell, Amitava Biswas, Blade Kotelly, Sol Lederman (who also wrote a review based on the draft), Milan Merhar, Jennifer Novosad, XiaoGuang Qi, Brett Randall, Dusan Rnic, and Joshua Young. I hope I haven’t missed anyone!

And of course I am grateful for the support of my family and co-workers. Hopefully they will be glad to have me back among the ranks of the living.

Categories
Uncategorized

Data.gov

When I first heard that Vivek Kundra was on the short list to be the CTO of the United States, I was very excited about the possibility that he would implement information sharing at a national level like he had in DC. Today I’m happy to read that the Federal CIO Council has launched Data.gov:

The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data.

OK, so it’s still in beta, but I’m still gratified to see this step toward government transparency. And what a boon to information retrieval researchers looking for public data sets!

Categories
General

Is Google Conjuring a “Magic Inbox” for Gmail?

Alex Chitu at the unofficial Google Operating System blog reports that:

Gmail’s code reveals an upcoming feature called “magic inbox” or “icebox inbox”, which is likely to prioritize the messages sent by your friends and other contacts you email frequently.

That wouldn’t be hard to implement for Google or any other email service / application that has access to your history, but I’m skeptical of the value of implementing prioritization this way. I can’t speak for others, but I personally have no reason to believe there is a correlation between frequency of contact and priority. Indeed, I’ve found that non-spam out-of-the-blue emails are sometimes the most pressing ones, e.g., requests to write something for a publication or present at a conference. Not to say that my more frequent correspondents aren’t important, but if anything they have other ways to reach me with time-sensitive requests.
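To make the concern concrete, here is a minimal Python sketch of what frequency-based prioritization could look like. This is purely my illustration, not Gmail’s actual code: the function name, the data shapes, and the example addresses are all hypothetical, and a real feature would presumably weigh many more signals.

```python
from collections import Counter


def prioritize_by_contact_frequency(inbox, sent_history):
    """Order inbox messages so senders the user emails most often come first.

    inbox: list of (sender, subject) tuples (hypothetical data shape).
    sent_history: list of recipient addresses taken from the user's sent mail.
    """
    contact_counts = Counter(sent_history)
    # sorted() is stable, so messages from equally frequent senders
    # keep their original relative order.
    return sorted(inbox, key=lambda message: -contact_counts[message[0]])


# Hypothetical example data.
sent_history = ["alice@example.com", "alice@example.com", "bob@example.com"]
inbox = [
    ("editor@journal.example", "Invitation to write an article"),
    ("bob@example.com", "Lunch?"),
    ("alice@example.com", "Re: weekend plans"),
]

ranked = prioritize_by_contact_frequency(inbox, sent_history)
for sender, subject in ranked:
    print(sender, "-", subject)
```

Under this scheme the out-of-the-blue invitation from the editor lands at the bottom of the list, behind the lunch plans, which is exactly the kind of ordering I would not want.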

I’ve pushed for attention bond mechanisms before, and I’ll do it again. I’d love to see them implemented in a way that plays well with the infrastructure and is usable. To my knowledge, they are the most promising way both to improve spam filtering (though, in fairness, current spam filters work adequately) and to prioritize non-spam. But I recognize that the infrastructure and usability hurdles are significant.

Categories
Uncategorized

Google Suggests…Ads

I haven’t seen this in my own browser yet, but MG Siegler at TechCrunch reports that Google Suggest has added advertising (see Google’s official post here). It also talks about personalization, but I’ve been seeing that for a while, so I don’t know that there’s anything new on that front.

In any case, here’s an example of a suggested ad, courtesy of TechCrunch:

I’m sure Firefox extensions like CustomizeGoogle will soon block these ads, if they aren’t doing so already. Granted, I can hardly blame an ad-supported service for pushing more ads–and in this case the ad is actually a relevant result, independent of the fact that it’s sponsored. In fact, it’s the top-ranked organic search result for south park episodes. I imagine the feature will be considerably more annoying when the sponsored links are more typical ads, but probably not enough so to incite people to install ad blockers. Google seems to know how not to push people too far.

Categories
General

SIGIR ’09 Industry Track Program

At long last, SIGIR 2009 has posted the program for the Industry Track! It will take place on Wednesday, July 22, 2009 during the regular conference program (in parallel with the technical tracks). There is no additional registration fee for full conference attendees, but there is a one-day registration option for people who only want to attend the Industry Track.

Here’s the condensed version of the program:

Presentations

  • Matt Cutts, Google: “Web Spam and Adversarial IR: The Road Ahead”
  • danah boyd, Microsoft Research: “The Searchable Nature of Acts in Networked Publics”
  • Vanja Josifovski, Yahoo! Research: “Ad Retrieval – A new Frontier of Information Retrieval”
  • Thomas (Tom) Tague, Thomson Reuters: “Semantic Web and the Linked Data Economy”
  • Tip House, OCLC:  “Alexandria 2.0: Search Innovations Keep Libraries Relevant in an Online World”

Panel of Search Industry Analysts

  • Whit Andrews, Gartner
  • Susan Feldman, IDC
  • Theresa Regli, CMS Watch

Panel of Enterprise Search Vendors

  • Øystein Torbjørnsen, FAST
  • Peter Menell, Autonomy
  • Adam Ferrari, Endeca

More details are available on the Industry Track page. The early registration deadline is this Sunday, May 24th, so please register soon if you haven’t already, before the fees go up by $50.