Category: General

General posts, typically analyzing HCIR issues.

Google Squared: A Great First Step

Post author By Daniel Tunkelang
Post date June 4, 2009
23 Comments on Google Squared: A Great First Step

Regular readers know that I am not a Google fan boy, and that much of my commentary on Google focuses on their neglect of exploratory search. Nonetheless, when I saw the initial Youtubeware describing Google Squared a few weeks ago, my ears perked up. I decided to wait until it went live to assess it. Well, it’s live now.

The idea of Google Squared is simple: it “collects facts from the web and presents them in an organized collection, similar to a spreadsheet.” The best way to understand it is to try it. For example, search for hybrid car, and you’ll see a table of hybrids, with columns corresponding to image, description, type of transmission, year, and height. Add a price column if you’d like, and it will populate it for you. Very slick.

Of course, it is, as Google admits, “by no means perfect”. Most queries will show its warts, and some, like information scientists, are way off (it doesn’t even try to return results for library scientists). But it does pretty well when there is structured data out there, and it makes an admirable attempt to find it! I suspect the real trick here is that it does a decent job of determining instances of the query category (perhaps a souped-up version of work they started discussing back in 2004), and then mining structured content about those instances from repositories like Freebase.

I mean, look at these results:

To be clear, I picked these examples after a fair amount of trial and error–like Wolfram Alpha, it is hit and miss, with more miss than hit. But, as Seth Grimes said at the recent Text Analytics Summit, when Wolfram Alpha is good, it’s very very good, but when it’s bad, it’s horrid. Google Squared doesn’t fail quite so spectacularly, and it gives you a lot more of a chance to interact with it.

This is, by far, the best step I’ve seen Google take towards HCIR, and I’m impressed. It’s still a toy at this stage, but I think it has a future. My warmest congratulations to Daniel Dulitz and the rest of the magpie team that developed it; I’m looking forward to seeing it evolve.

General

Google Search Appliance Woos, But Does It Wow?

Post author By Daniel Tunkelang
Post date June 3, 2009
2 Comments on Google Search Appliance Woos, But Does It Wow?

Yesterday, Google announced the latest version of its search appliance, GSA 6.0, to great fanfare. As usual, their emphasis was on scale: they’re pushing a distributed architecture that lets them “push it to a new realm: billions”. It’s a nice sound bite, and it played well to the press.

The few analysts who commented on it were somewhat more critical. Matthew Brown from Forrester said, “They’re coming to market so late, with requirements that were established years and years ago. They’ve reached parity with where the market was four or five years ago.” Adriaan Bloem from CMS Watch was even harsher, assessing many of Google’s claims as exaggerated and requiring a complexity at odds with their positioning as a plug-and-play appliance.

Given my role at Endeca, I’m in no position to be objective. But I’ll share my impressions, which you can take with the appropriate grain of salt. We don’t encounter Google much as a competitor; FAST and Autonomy are still more likely to show up with us on prospective customers’ short lists. And, while I have met happy GSA customers, I’ve met many more enterprise buyers who scoff when I suggest the GSA as a candidate solution for them (yes, that’s why I’m not in sales). Also, my recent experience of seeing how Google positions the GSA was less than persuasive. There is still a widespread impression that Google is not serious about this market segment.

Of course, the market will decide, and a data-driven company like Google will surely track the success of its efforts quantitatively. But for now, I don’t feel that Google’s announcement has changed the competitive landscape. As always, I’m curious to hear others’ opinions.

General

Banging on Bing: A Bummer

Post author By Daniel Tunkelang
Post date June 1, 2009
23 Comments on Banging on Bing: A Bummer

So, Bing is out early. Yes, an early release from Microsoft! And it’s snappy, attractive, and offers decent quality. If I needed to use Bing as my main search engine for the web (yes, readers, imagine a world without Google and Yahoo as search options), I’d survive.

But I can’t say I’d be thrilled. I’ve only had a short time to play with Bing, but I’m not overwhelmed. In fact, I’m quite disappointed: given their big talk about delivering a “decision engine”, I expected at least a little bit of innovation in the user experience. No such luck. The focus is still on the ranked list, and their ranking is, at least to my taste, perceptibly inferior to Google’s. I could live with that small difference if the interface offered real opportunities for interaction. But there isn’t anything new there. You can refine by result type (Web, Images, Videos, Shopping, News, Maps, Local, Travel), but search engines have been doing that for years.

The only novelty is “xRank”, which lets you “see who and what everyone’s searching for most”. It’s intriguing, but it seems half-baked, and I suspect that others are further ahead on crowd-sourcing relevance through the social stream.

I take no pleasure in throwing cold water on the queue of challengers that attempt to provide competition for Google in web search. Perhaps Bing is truly in beta, and will prove itself a more formidable challenger in the future. But it’s surely not there now.

General

Journalism Is Not Like Craigslist

Post author By Daniel Tunkelang
Post date May 29, 2009
7 Comments on Journalism Is Not Like Craigslist

In response to a meeting sponsored by the Newspaper Association of America about “Models to Monetize Content” (i.e., how to charge for online news), Scott Rosenberg writes that charging for articles could hobble the future of journalism.

His main argument is an appeal to analogy with the classifieds business:

In at least one area, the newspaper web sites of the 90s didn’t give away the store…But the greatest success of all came in the unlikely form of Craigslist, a community-based enterprise led by a shy programmer who offered classifieds not as a profit-making enterprise but (in all but a tiny subset of categories) as a free service. As a result, newspapers’ classified businesses today have been devastated.

It’s a sobering point: staying the course is not always the best strategy, and Craigslist is certainly a case study in disruptive innovation. As a user, I find the Craigslist user experience awful. But it turns out that price (to sellers) often trumps user experience–though I would note that, at least in New York City, many people still pay to have their apartments listed in the New York Times real estate section.

But journalism is very different from the classifieds market. Classifieds are the canonical example of a peer-to-peer business. You hardly need to consider a quasi-altruistic endeavor like Craigslist–consider what eBay did to the classifieds market by reducing its friction. Sellers and buyers are naturally incented to participate, so it’s just a matter of supplying the right infrastructure and then taking enough of a cut to sustain that infrastructure and, in the case of eBay or other for-profit marketplaces, a bit more than that to make a profit.

I’d like to see someone explain how free online journalism is supposed to work that way. Sure, there are content producers and consumers, but these roles aren’t quite like sellers and buyers in a world where no one pays for content. Of course the status quo is the ad-supported model, where what is being sold for money is the readers’ attention, rather than the content itself. But the advocates of this model might want to remember that free, ad-supported print media has existed far longer than the internet. I believe the technical term for such publications is “rags”. Not exactly “change” I can believe in.

I’m realistic that it will be very hard for newspapers to put the free content genie back in the bottle. But I cringe when I hear pontification against their even trying to do so. I think they belatedly recognize their mistake, and are desperately trying for a do-over.

General

Waiting for the Big Bing

Everyone is talking about Bing today–well, everyone who isn’t too busy watching Google do the Wave. If you haven’t been paying attention, Microsoft is about to spend $100M to market an upgrade of its web search engine, rebranding it from Live to Bing (by way of Kumo).

I’m reserving judgment about it until I have a chance to play with it myself–and I assume I’ll have to wait until next week just like everyone else (unless any kind insider cares to offer me a sneak preview). But it is interesting to see how Microsoft is positioning Bing as a “decision engine” rather than a search engine, with messaging at least mildly suggestive of HCIR. At least according to their marketing, their focus is on organizing results, simplifying tasks, and supporting decision making.

I personally can’t help noticing how the messaging looks familiar–and they even name-check a very familiar customer in their marketing video. But it’s probably just a coincidence. 🙂

In any case, Google also claims to organize the world’s information–and it’s even starting to offer users limited query refinement capabilities. If Microsoft is going to make headway on this round, I suspect they’ll have to deliver a significantly better experience than Google at the task level for at least a couple of common tasks. The areas they tout in their marketing are travel, health, and shopping. It certainly wouldn’t be hard for Microsoft to beat Google on all three–note the overlap with areas where Google isn’t good enough. But it remains to be seen if they will.

General

Topsy: Tippling the Stream of Conversations

Post author By Daniel Tunkelang
Post date May 27, 2009
3 Comments on Topsy: Tippling the Stream of Conversations

Cited as “amazing” by the master of hype, TechCrunch’s Mike Arrington, it’s…Topsy: “The first index is based exclusively on Twitter statuses and the wonderful people who write them.” Apparently they have been in stealth mode for three years!

I’ve only played around with it a little, but I think I have a feel for the quality. It’s hardly amazing (sorry, Mike), but it’s not embarrassingly bad either–especially if they really are only relying on Twitter rather than crawling and indexing the web. If so, then they have certainly shown that it is possible to build a serviceable search engine using only the social stream, and that is an impressive proof point.

Moreover, Topsy is taking an approach that Google (and web search engines in general) neglect at their peril: treating people as first-class objects. For example, a search for exploratory search returns a list of Twitter users, many of whom should be familiar to readers here. They also have pages associated with Twitter users, like this one.

I see Topsy as a very early proof of concept–I can’t imagine anyone relying on it in its present form. But it does deserve a look. Forget all the hoopla about “real-time” search. As far as I can tell, that obsession is a sideshow compared to the real value of Twitter and other social media tools, which is to make search as much about people as about content.

General

NYT Appoints a “Social Media Editor”

Post author By Daniel Tunkelang
Post date May 26, 2009

What’s a social media editor? I have no idea, but the New York Times now has one! As reported in ReadWriteWeb, paidContent.org, and of course Valleywag, the paper of record has appointed Jennifer Preston, former editor of the regional sections, as its first social media editor.

I agree with Marshall Kirkpatrick at ReadWriteWeb that

We would love to see Preston fill a role similar to what Mathew Ingram does at the Toronto Globe and Mail, Canada’s largest daily paper. Ingram’s position is “Communities Editor” but he interfaces with social media activities both on and off of the paper’s site.

I think of Ingram more as a blogger than an editor, but in any case he’s certainly a credible voice in the brave new world of social media, and the New York Times would do well to have such a person on its staff.

It’s not as if the Times has been sitting on its hands–check out their APIs and their Open blog. But these efforts seem driven more by their technologists than by the editorial side of the house. My sense at Times Open was that the editors are still scared that any change could dilute their brand equity.

I’ve taken the apparently controversial stance that the New York Times should seek ways to monetize community. A hopefully less controversial assertion is that the paper needs to expose the value of that community. Few papers have the sort of brand-name writers that can act as attention magnets in a highly competitive attention economy, sucking in readers from Facebook, Twitter, the blogosphere, and the web as a whole. Of course, the management has to allow those writers to do so, which may be tough for an old guard used to assuring quality through control.

Still, I’m hopeful that the New York Times is taking a step in the right direction. I know nothing about Preston, or about the Times’s intentions beyond what’s been published in the articles I cited. Nonetheless, the gray lady seems to understand that now is the time to learn new tricks.

General

News, Search Experience, and Value

Post author By Daniel Tunkelang
Post date May 23, 2009
3 Comments on News, Search Experience, and Value

I’ve been known to spar with Jeff Jarvis about Google’s role in the present and future of journalism, but I readily admit I’m something of an amateur. Thus I’m delighted to see a thoughtful post from Josh Young, a more serious and informed media junkie, entitled “Not by Links Alone“. In it, he does a great job of explaining the main opposing positions in the debate over whether Google is good or bad for journalism.

Jarvis offers his own summary of the post:

He’s saying that Google is causing news to be reshaped so it can be found, now that it has been unbundled from the products we used to have no choice but to buy: our newspapers. He says that news is an “experience good” we can’t really know until we taste it. He says we need a new experience of news and it ain’t Google.

Jarvis further suggests that he adds value to the post by adding a “search-engine-and-browsing-friendly summary”, i.e., a lede to make the article SEO friendly. Without a doubt Jarvis does encourage readers to find the article, since citation (and a link) from a prominent blogger is a boon to traffic. I’m less persuaded that this has anything to do with SEO.

Regardless, I’d like to excerpt what I see as the main point of the post, its summary of the perspective of news executives:

Google’s approach to the Web can’t reproduce the important connection the news once had with readers. Google just doesn’t fit layered, subtle, multi-dimensional products—experience goods—like articles of serious journalism. Because news is an experience good, we need really good recommendations about whether we’re going to enjoy it. And the Google-centered link economy just won’t do. It doesn’t add quite enough value.

Because, as Jarvis said a few years ago (and as Josh cites in his post), Google commodifies everything (my bad for not citing him here). Needless to say, I agree with Josh that:

What we need is a search experience that let’s us discover the news in ways that fit why we actually care about it. We need a search experience built around concretely identifiable sources and writers. We need a search experience built around our friends and, lest we dwell too snugly in our own comfort zones, other expert readers we trust.

It is that need that motivates much of my work at Endeca, particularly in working with media organizations like the Financial Times, the Guardian, and WebMD. Yes, we live in the present and can’t neglect the importance of SEO in a Google-dominated world, but I’m much more excited about adding value through a user-centered search experience than about helping sites compete in a zero-sum game. Google’s good or evil notwithstanding, I don’t want Google to be the gatekeeper for the world’s information.

General

Is Google Conjuring a “Magic Inbox” for Gmail?

Post author By Daniel Tunkelang
Post date May 21, 2009
2 Comments on Is Google Conjuring a “Magic Inbox” for Gmail?

Alex Chitu at the unofficial Google Operating System blog reports that:

Gmail’s code reveals an upcoming feature called “magic inbox” or “icebox inbox”, which is likely to prioritize the messages sent by your friends and other contacts you email frequently.

That wouldn’t be hard to implement for Google or any other email service / application that has access to your history, but I’m skeptical of the value of implementing prioritization this way. I can’t speak for others, but I personally have no reason to believe there is a correlation between frequency of contact and priority. Indeed, I’ve found that non-spam out-of-the-blue emails are sometimes the most pressing ones, e.g., requests to write something for a publication or present at a conference. Not to say that my more frequent correspondents aren’t important, but if anything they have other ways to reach me with time-sensitive requests.
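
To make the concern concrete, here is a minimal Python sketch of what frequency-based prioritization might look like. It is only an illustration of the idea as described in the quote, not Gmail’s actual feature: the function name `rank_inbox`, the message format, and the sample addresses are all hypothetical.

```python
from collections import Counter


def rank_inbox(inbox, sent_history):
    """Order inbox messages so that senders the user emails most often come first.

    Hypothetical illustration only: `inbox` is a list of dicts with a "sender"
    key, and `sent_history` is a list of addresses the user has written to.
    """
    # Count how many times the user has written to each address.
    contact_counts = Counter(sent_history)
    # sorted() is stable, so senders with equal counts keep their arrival order.
    return sorted(inbox, key=lambda msg: contact_counts[msg["sender"]], reverse=True)


# Made-up data: two frequent correspondents and one out-of-the-blue request.
sent_history = ["alice@example.com", "alice@example.com", "bob@example.com"]
inbox = [
    {"sender": "editor@journal.example", "subject": "Invitation to write an article"},
    {"sender": "bob@example.com", "subject": "Lunch?"},
    {"sender": "alice@example.com", "subject": "Re: weekend"},
]

ranked = rank_inbox(inbox, sent_history)
for message in ranked:
    print(message["sender"], "|", message["subject"])
```

With this made-up data, the invitation from a previously unknown editor lands at the bottom of the list, below the lunch plans, which is exactly the failure mode I’m worried about.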

I’ve pushed for attention bond mechanisms before, and I’ll do it again. I’d love to see them implemented in a way that plays well with the infrastructure and is usable. To my knowledge, they are the most promising way both to improve spam filtering (though, in fairness, current spam filters work adequately) and to prioritize non-spam. But I recognize that the infrastructure and usability hurdles are significant.

General

SIGIR ’09 Industry Track Program

Post author By Daniel Tunkelang
Post date May 19, 2009
5 Comments on SIGIR ’09 Industry Track Program

At long last, SIGIR 2009 has posted the program for the Industry Track! It will take place on Wednesday, July 22, 2009 during the regular conference program (in parallel with the technical tracks). There is no additional registration fee for full conference attendees, but there is a one-day registration option for people who only want to attend the Industry Track.

Here’s the condensed version of the program:

Presentations

Matt Cutts, Google: “Web Spam and Adversarial IR: The Road Ahead”
danah boyd, Microsoft Research: “The Searchable Nature of Acts in Networked Publics”
Vanja Josifovski, Yahoo! Research: “Ad Retrieval – A new Frontier of Information Retrieval”
Thomas (Tom) Tague, Thomson Reuters: “Semantic Web and the Linked Data Economy”
Tip House, OCLC: “Alexandria 2.0: Search Innovations Keep Libraries Relevant in an Online World”

Panel of Search Industry Analysts

Whit Andrews, Gartner
Susan Feldman, IDC
Theresa Regli, CMS Watch

Panel of Enterprise Search Vendors

Øystein Torbjørnsen, FAST
Peter Menell, Autonomy
Adam Ferrari, Endeca

More details are available on the Industry Track page. The early registration deadline is this Sunday, May 24th, so please register soon if you haven’t already, before the fees go up by $50.