Categories
General

Real-Time But Not Ready For Prime Time

Extra, extra, read all about it–two new real-time search engines debuted today: CrowdEye and Collecta.

I love the headlines from Techmeme:

Yes, folks, it’s really, really, real-time! Of course Twitter and Facebook have their own real-time search offerings. And apparently Google, Yahoo, and Microsoft are looking hard at real-time too.

I concede that there’s something in this real-time mania. I’ve live-tweeted events, and I’ve followed others who were doing so. I certainly read current news and blogs–as they say, today’s newspaper wraps tomorrow’s fish (someone will have to translate the expression for folks who’ve never read an analog newspaper). But yes, recency / freshness is certainly a concern in information seeking.

But it’s not the only one, and I doubt it’s the dominant one. Moreover, the dismissal of web search engines as if their index contents are ancient history is preposterous. Search for iran election on Google, Yahoo, or Bing, and you see a lot of current news. I suppose Twitter offers more recently generated bits, but the main virtue there is not the immediacy–rather, it’s the social nature of the content. For example, a number of people are following @persiankiwi for a personal perspective. I’ll let you decide for yourselves if Collecta or CrowdEye offers something new or valuable–I’m still waiting for the former to show me anything at all!

I know that the technology press likes new buzzwords, and “real-time” search is surely the buzzword du jour, even giving “semantic” search a run for its money. And I understand how many in the blogosphere feel it is their moral duty to cheer on any start-up that makes a go at disrupting the current regime. But I wish these folks would evaluate the new entrants on their merits, rather than simply on the drama of the David vs. Goliath story.

I understand what it’s like on the startup side–it wasn’t that long ago that few people outside the Boston-area technology scene had heard of Endeca. For a long time, I was jealous of people whose companies had generated more buzz. But, in retrospect, I’m at least glad that my colleagues and I had a chance to build a robust product before the press noticed us. Overenthusiastic press isn’t necessarily a good thing, as I’m sure a line-up of prematurely crowned Google killers can attest.

In that spirit, I hope that CrowdEye and Collecta bring something interesting to the market. But I doubt that “real-time” search will cut it, especially if it’s not ready for prime time.

Categories
Uncategorized

Google Markets Itself

I still don’t buy that Google is “gripped with fear”, but I agree with Danny Sullivan’s analysis that Google’s new “Explore Google Search” page (with a link in the usually sacrosanct real estate on the home page) is a reaction to Microsoft’s campaign to market Bing. I’d be curious to know what fraction of Google’s users are aware of the features Google enumerates on that page–perhaps users will actually benefit from the education. But more likely Google is simply disgruntled to see Microsoft getting press for an allegedly different approach that, in most cases, looks a lot like what Google (and, as Sullivan points out, Yahoo) already does.

I’d like to see competition over actual innovation, rather than over perceived innovation through marketing. I suppose I can’t blame Google for tooting its own horn. But the timing does make Google look a bit defensive–or at least reactive. I’m sure it at least put smiles on the faces of the Bing team to have drawn out a reaction.

Categories
Uncategorized

JCDL 2009

For the benefit of those of us not lucky enough to be attending this year’s Joint Conference on Digital Libraries (JCDL 2009), a number of attendees are live-tweeting the conference using the hashtag #jcdl2009. I’m sure there will be blog posts (like these), and I’ll try to round up what I can when the conference wraps up. I also understand that papers will eventually be available in the ACM Digital Library, and that authors are being encouraged to post their own papers on their web sites–if / when that happens, I’ll try to assemble a list here, at least of the ones that particularly catch my attention.

Categories
General

Spam in the Twitterverse

I’ve noted in the past that “real-time” alerting systems, in contrast to search engines that place less emphasis on immediacy, are particularly vulnerable to spamming. It’s a lot like telemarketing–you could avoid it entirely if you routed any questionable calls to voicemail, but then you would, at the very least, not be able to be reached in real time.

At first glance, Twitter seems immune to this sort of spamming, since you only see tweets from the users you follow. Yes, Barack Obama and Guy Kawasaki must spend a lot of time on Twitter! But, regardless of how many users you follow, you are the one in control.

At least that’s the theory. Of course, things tend to work a bit differently in practice. Like many Twitter users, I use Twitter Search to maintain a running vanity query for mentions of my user name, employer, blog, etc. As a result, a user I don’t follow can nonetheless get my attention by tweeting an “at reply” to me. Twitter has struggled to figure out whether that is a good thing or a bad thing, but I suspect that my erring on the side of vanity is a common behavior.

But I do recognize that I’m opening myself up to alert spamming–perhaps not just in theory, but in practice. Today I read on All Things Digital that:

Pontiflex, a lead generation startup that hoovers up names and other info from users that visit its network of publishers, then sells the data to marketers. The Brooklyn-based company is rolling out a Twitter product that lets marketers compile a list of interested Twitter users.

Since the users aren’t actually signing up to “follow” any of the marketers, said marketers can’t send them direct messages. The marketers could try to “at reply” their leads — the equivalent of shouting out the name of someone you think might be at a loud cocktail party, but who you can’t actually see. But that’s about it.

That’s about enough, if enough users are like me. Fortunately, I’m not enough of a celebrity to be particularly concerned about being singled out–at this stage. But I think the writing is on the wall, and spammers will innovate to embrace social media. I’ve already experienced a few examples of such innovation, and I’m sure that they are child’s play compared to what’s in store.

Personally, I look forward to this spamageddon. Why? Because I think we already have a problem managing attention scarcity in social media, but haven’t found sufficient motivation to confront the problem head on. A spam epidemic will certainly cause us to revisit our priorities, and I’m optimistic that we’ll innovate beyond the existing approaches used for email spam.

Categories
Uncategorized

Wikipedia: Play The Ball, Not The Man

Today’s Freakonomics blog in the New York Times has a nice post entitled “By a Bunch of Nobodies: A Q&A With the Author of The Wikipedia Revolution”, in which Annika Mengisen interviews Wikipedia editor/administrator Andrew Lih.

Here’s an excerpt to whet your appetite:

Q: A while ago, Essjay, one of Wikipedia’s most prominent editors, lied about his background. What, if anything, did this do to Wikipedia’s credibility?

A: A prominent Wikipedia editor nicknamed Essjay claimed to be a tenured academic theologian who had to stay anonymous to protect him from trouble with his school. He was exposed in the end to not have any of those credentials, also lying to The New Yorker magazine about his background.

In this case, what’s interesting is despite his deception, the tens of thousands of edits he made and the community decisions he oversaw were, by all accounts, legitimate and useful. Even with much forensic investigation by community members who were skeptical about whether his fraudulent identity translated into fraudulent edits, they found nothing of note that was considered malfeasance.

This is perhaps why the biggest identity fraud in Wikipedia’s history has not created much of a crisis in community. From the very beginning, to borrow a sports analogy, Wikipedians “played the ball and not the man.”

Read the rest of the article here.

Categories
Uncategorized

Hunch Has Launched

For anyone who has been waiting to try Hunch (which really is a “decision engine”) but didn’t manage to snarf an invite, today is your lucky day: Hunch has launched. They’ve added some new features too–for example, they offer a faceted navigation interface that lets you bypass their ordering of the questions in the decision tree (e.g., for choosing a cocktail).

But be warned, the site may be a bit sluggish. They’re certainly getting bombarded with traffic from the blogosphere and Twitterverse!

Categories
General

Don’t believe everything you read in the New York Post

Now this is the sort of publicity that even $100M can’t buy: the New York Post is reporting that, in response to Microsoft’s recent Bing launch, “FEAR GRIPS GOOGLE” (all caps in the original):

Sergey Brin is so rattled by the launch of Microsoft’s rival search engine that he has assembled a team of top engineers to work on urgent upgrades to his Web service.

I never imagined that anyone would get their technology news from the New York Post, but evidently it’s well read in the blogosphere. Techmeme reports the following articles as citing the New York Post article:

I know that the press loves a good fight, and in technology it’s hard to ask for a better pairing than Google and Microsoft. Moreover, I do think that Google should be paying attention to Microsoft’s positioning of Bing, regardless of how well Microsoft has delivered on that positioning. In any case, it makes sense for Google to keep close tabs on its competitors. After all, even a fraction of a percent of web search market share translates into millions–more than enough revenue to justify a few full-time employees.

Still, to assert that Google is gripped with fear stretches credibility, even for a tabloid. I don’t mean to suggest that Google is so self-confident as to be fearless. Google may well have reacted with fear when it looked like Microsoft would acquire Yahoo–in fact, some have suggested that Google’s proposed (but ultimately abandoned) advertising deal with Yahoo was a Machiavellian maneuver to scuttle the acquisition.

But, unless I’m missing something, Bing simply isn’t a threat to Google’s market dominance. If anyone should be concerned, it’s folks like Kayak who might lose some market share to Bing’s travel search–which seems to be generally acknowledged as Bing’s strongest vertical.

Personally, after being underwhelmed by Bing, I decided to try it for 2 weeks. I made it for about a week and a half, and you can see some of my commentary on Twitter. I stand by my initial impression: it’s not bad, but it’s noticeably inferior to Google, and even parity is not enough to reverse the tide. Perhaps the tiny gain–or the slowdown in loss–that they will make in market share will justify their investment. But this is no revolution, and the Gevil Empire is not running scared.

Categories
General

Guest Post at the Federated Search Blog

I wrote a guest post at Sol Lederman’s Federated Search blog entitled “The Problem with Federated Search”. Here’s an excerpt:

The case for federated search is straightforward: no single organization has all of the answers, and therefore no single index can ever hope to completely satisfy its users’ needs. Federation allows the developer of a search application to hedge his or her bets by bringing in knowledge from outside resources.

But federation is no panacea, at least as it is implemented today.

Read the rest here!

Categories
General

Google Wave or just a Blip?

Yesterday, I was fortunate to attend a presentation from a Google Engineering Director about Google Wave, an online communication and collaboration tool that Google recently unveiled at the Google I/O developer conference. For those who, like me, were unable to attend I/O, Google has posted the entire 80-minute presentation on YouTube (embedded above). For those of you without 80 minutes to spare, Gina Trapani has assembled a highlight reel.

The pitch is that email, the most popular technology for online communication, is 40 years old and needs an overhaul to reflect the opportunities of an always-on world. They also emphasize that everything they’ve done works inside the browser.

The video is sexy, showing off both the real-time updating capabilities of Wave (blurring the lines between email and instant messaging) and the ability to support structure more cleanly than email (e.g., responding to only part of an email). The conversation model is also nice: for example, participants can bring someone new into a conversation, and that new person can access the evolution of a conversation (a sort of retroactive cc). Indeed, Wave looks more like Basecamp than like email.

Google is pitching Wave to developers–they even stole a page from Oprah and gave every Google I/O attendee a new Android phone in order to develop applications using their early-access Wave accounts. I haven’t studied the APIs, but the object model seems reasonable, ranging from a “blip” (a low-level event associated with content, possibly as fine-grained as someone typing a single character) to “wavelets” (the sub-conversations that comprise a wave) to of course the wave itself. And, given that the team is led by the folks who developed Google Maps, I have no doubt that they understand how to play well with mash-ups.
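To make that hierarchy concrete, here is a minimal sketch of how I picture the containment: a wave holds wavelets, and a wavelet holds blips. To be clear, this is not the Wave API (which, as I said, I haven’t studied); every class, field, and method name below is my own invention for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Blip:
    """Smallest unit: a piece of content contributed by one author (hypothetical fields)."""
    author: str
    content: str


@dataclass
class Wavelet:
    """A sub-conversation: an ordered list of blips shared by some participants."""
    participants: List[str] = field(default_factory=list)
    blips: List[Blip] = field(default_factory=list)


@dataclass
class Wave:
    """The top-level conversation, made up of wavelets."""
    title: str
    wavelets: List[Wavelet] = field(default_factory=list)

    def blip_count(self) -> int:
        # Total number of blips across every wavelet in this wave.
        return sum(len(wavelet.blips) for wavelet in self.wavelets)


# Build a small wave: one shared sub-conversation and one private note.
lunch = Wave(
    title="Lunch plans",
    wavelets=[
        Wavelet(["ann", "bob"], [Blip("ann", "Noon?"), Blip("bob", "Sure")]),
        Wavelet(["ann"], [Blip("ann", "Remember to book a table")]),
    ],
)
```

Even in a toy like this, the nesting hints at why searching a wave could get complicated: a single hit might be a blip, a wavelet, or the whole wave.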

But I’m left with two big questions.

The first is what it would feel like to access this rich structured history of conversation. The search interface feels a lot like Gmail’s–and I don’t mean that as a compliment. I use Gmail, but I curse every time I have to deal with managing search results that include large conversational threads. I think there will be a lot of challenges for managing search results, and I’m curious how Google, with its historically spartan approach to search interfaces, will address them.

The second is about interoperability. For all of the openness, I get the sense that everything can be brought into Wave and Waves can be embedded anywhere. That feels about as open as Facebook. What I’m missing is a sense of how (or even if!) Google Wave will interoperate with other communication platforms. They do show an example of building a Twitter client within Wave–perhaps that is representative of their interoperability strategy.

The Google Wave demo is impressive, and I have no doubt that developers will play with it and build cool demos of their own. But I believe the ultimate success of Google Wave will depend on how they address the above two questions. Time will tell.

Categories
General

Back from Endeca Discover ’09

I hope that regular readers forgive the recent sparsity of posts. I spent most of the last three days attending Discover, Endeca’s annual user conference. It might come as a shock to some (especially the PR folks who keep sending me press releases), but I’m not a professional blogger, and I actually hold down a day job at Endeca as Chief Scientist!

As promised, here are some of the highlights of that user conference, and my thoughts about what makes an event like this successful.

Outside a handful of general sessions, the conference agenda consisted of three parallel tracks: business, technical, and labs.

Most of the business sessions centered around case studies presented by customers (particularly those recognized as Navigator Award winners). I attended as many of these as I could, learning about what Scripps Networks is doing with Endeca’s Page Builder (check out the Food Network site!) and how Expedia and CHIP Online (a massively popular German site similar to CNET) are implementing SEO. The CHIP guys went as far as to perform live queries on Google.de to show off their SEO success–quite a tour de force! Of course, I also made a point of attending the Newssift (Financial Times) and ESPN sessions, since I’m especially proud of what they’ve done with text analytics.

I didn’t attend quite as many of the technical sessions, though I did manage to make it to the one I was presenting: “money for nothing and your tags for free”. I was lucky to have an 80s-friendly crowd, though someone in the audience did confuse Dee Snider with Roger Daltrey. But my favorite technical session was the one about the extensible Endeca: it featured some of our coolest lab-ware, some of it developed by my team. One of the demos even had a Google Squared sort of feel: it uses WordNet to support dynamic facet creation in response to queries. And it was built by one person on my team in 24 hours for a Guardian Hack Day!

Finally, I didn’t attend the labs, but I heard great feedback about them, especially from customers who had been nervous about using new product features. There’s nothing like hands-on training to build comfort with new technology.

In short, I had a blast, and I heard lots of positive feedback from attendees. I was impressed with the level of the presentations–especially since I’d seen a few fluffy ones in past years’ conferences. This year, I suspect some people will complain that the presentations had too much content! And, while Boston isn’t as nice a setting as Orlando (nor bowling quite as fun as the rides at Universal Studios), I suspect attendees appreciated that the focus was squarely on the substance of the event. I certainly did.