Categories
Uncategorized

Attending Endeca Discover

Apologies for the unusual hiatus in posting–I’ve been attending Endeca Discover (an annual user conference) and haven’t managed to allocate time for blogging. I’ll make up for it by blogging about the conference tomorrow, when I’m back to what passes for normality. In the meantime, feel free to follow the conference on Twitter.

Categories
General

Those Who Give Twitter *Get* Twitter

Marshall Kirkpatrick at ReadWriteWeb wrote a post arguing that the people working at Twitter aren’t using the service the way its power users do, and that this bodes ill for Twitter. His main arguments:

  • Twitter’s employees don’t twitter very much: an average of 2 to 3 tweets per person per day.
  • Twitter employees don’t follow very many other people: only 2 out of 49 Twitter team members follow more than 500 people and no one was over 1k.
  • Twitter staff members aren’t following top Twitter developers in the community.

I can’t really address the third point, but the first two–and especially the second–are hardly helpful to Kirkpatrick’s case. To the contrary, they argue that the people who work at Twitter get it. And, to make sure Kirkpatrick got it, Twitter CEO Ev Williams even wrote him a letter, in which he said:

Many people fall into the trap that you should follow all or most people back out of a sense of politeness or so-called engagement with the community… At a certain point, you’re not actually reading any more tweets by following more people — you’re just dipping into the stream somewhat randomly and missing a whole lot of what people say. That’s fine, but I believe people will generally get more value out of Twitter by dropping the symmetrical relationship expectation and simply curating their following list based on the information and people they want to tune in to.

Amen! I’ve been hammering this point here in most of my posts about Twitter, but here is a handful of examples for newer readers:

And of course the whole point of TunkRank is to discourage the vicious circle of reciprocity and fake following. That’s baked into the measure which, like PageRank, divides the voting power by the number of out-links.
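To make the "divide by out-links" idea concrete, here is a minimal sketch of a TunkRank-style computation. It assumes the commonly cited formulation Influence(X) = Σ over followers Y of X of (1 + p·Influence(Y)) / |Following(Y)|, where p is an assumed constant probability that a follower passes a tweet along. The function name `tunkrank` and its parameters are hypothetical, chosen for illustration only.

```python
from typing import Dict, List


def tunkrank(following: Dict[str, List[str]], p: float = 0.05,
             iterations: int = 50) -> Dict[str, float]:
    """Estimate TunkRank-style influence by fixed-point iteration (sketch).

    `following` maps each user to the distinct users they follow (out-links).
    Assumed formulation:
        Influence(X) = sum over followers Y of X of
                       (1 + p * Influence(Y)) / len(following[Y])
    """
    # Collect every user who appears as a follower or as someone followed.
    users = set(following)
    for targets in following.values():
        users.update(targets)

    # Reverse the graph: for each user, list who follows them.
    followers: Dict[str, List[str]] = {u: [] for u in users}
    for y, targets in following.items():
        for x in targets:
            followers[x].append(y)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Each follower Y splits its vote evenly across everyone Y follows,
        # so following indiscriminately dilutes each individual vote.
        influence = {
            x: sum((1 + p * influence[y]) / len(following[y])
                   for y in followers[x])
            for x in users
        }
    return influence
```

For example, if alice follows only bob while carol follows both bob and dave, bob receives a full vote from alice and half a vote from carol (1.5), while dave receives only 0.5. Because each user passes along at most a fraction p of their own influence, the iteration converges for 0 ≤ p < 1.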

The comments on Kirkpatrick’s post suggest that a lot of regular Twitter users also get it. I find that reassuring, especially given the hype around Twitter in the last several weeks. Twitter can be a useful tool, but it will stay useful only if people avoid imposing cultural norms that devalue the social network. I’m glad the folks who have given us Twitter realize that.

Categories
General

Enabling Exploration Through Text Analytics

As promised, here are my slides from the recently held Text Analytics Summit. Feel free to download them from SlideShare–some of the animation may not come through in this version (though I try to use such animation sparingly).

I enjoyed the conference, and was pleasantly surprised by the overall intellectual level of participants, who included a number of end-users of text analytics, as well as senior technologists from the leading text analytics vendors. Yes, there were sales and marketing people there too, and the occasional vendor fluff piece, but someone’s got to pay the bills. I believe that all of the presentations will be posted online in the next few weeks.

And, speaking of vendors, I hope to see some of you next week at Endeca Discover. I’ll be delivering an 80s-themed presentation entitled “money for nothing and your tags for free”.

Also, while I have your attention, I urge everyone to spread the word about SIGIR. If you are in industry and have little patience for academic conferences, I still encourage you to consider the one-day SIGIR Industry Track. For $300 (compare that to any other industry conference!), you get a chance to hear and meet a star-studded line-up, including:

I hope to see many of you there, and I would also appreciate it if you could spread the word, since SIGIR doesn’t traditionally market to industry professionals.

Categories
General

Google Squared: A Great First Step

Regular readers know that I am not a Google fan boy, and that much of my commentary on Google focuses on their neglect of exploratory search. Nonetheless, when I saw the initial Youtubeware describing Google Squared a few weeks ago, my ears perked up. I decided to wait until it went live to assess it. Well, it’s live now.

The idea of Google Squared is simple: it “collects facts from the web and presents them in an organized collection, similar to a spreadsheet.” The best way to understand it is to try it. For example, search for hybrid car, and you’ll see a table of hybrids, with columns corresponding to image, description, type of transmission, year, and height. Add a price column if you’d like, and it will populate it for you. Very slick.

Of course, it is, as Google admits, “by no means perfect”. Most queries will show its warts, and some, like information scientists, are way off (it doesn’t even try to return results for library scientists). But it does pretty well when there is structured data out there, and it makes an admirable attempt to find it! I suspect the real trick here is that it does a decent job of identifying instances of the query category (perhaps a souped-up version of work they started discussing back in 2004), and then mining structured content about those instances from repositories like Freebase.

I mean, look at these results:

To be clear, I picked these examples after a fair amount of trial and error–like Wolfram Alpha, it is hit and miss, with more miss than hit. But, as Seth Grimes said at the recent Text Analytics Summit, when Wolfram Alpha is good, it’s very very good, but when it’s bad, it’s horrid. Google Squared doesn’t fail quite so spectacularly, and it gives you a lot more of a chance to interact with it.

This is, by far, the best step I’ve seen Google take towards HCIR, and I’m impressed. It’s still a toy at this stage, but I think it has a future. My warmest congratulations to Daniel Dulitz and the rest of the magpie team that developed it; I’m looking forward to seeing it evolve.

Categories
General

Google Search Appliance Woos, But Does It Wow?

Yesterday, Google announced the latest version of its search appliance, GSA 6.0, to great fanfare. As usual, their emphasis was on scale: they’re pushing a distributed architecture that lets them “push it to a new realm: billions”. It’s a nice sound bite, and it played well to the press.

The few analysts who commented about it were somewhat more critical. Matthew Brown from Forrester said, “They’re coming to market so late, with requirements that were established years and years ago. They’ve reached parity with where the market was four or five years ago.” Adriaan Bloem from CMS Watch was even harsher, assessing many of Google’s claims as exaggerated and requiring a complexity at odds with their positioning as a plug-and-play appliance.

Given my role at Endeca, I’m in no position to be objective. But I’ll share my impressions, which you can take with the appropriate grain of salt. We don’t encounter Google much as a competitor; FAST and Autonomy are still more likely to show up with us on prospective customers’ short lists. And, while I have met happy GSA customers, I’ve met many more enterprise buyers who scoff when I suggest the GSA as a candidate solution for them (yes, that’s why I’m not in sales). Also, my recent experience of seeing how Google positions the GSA was less than persuasive. There is still a widespread impression that Google is not serious about this market segment.

Of course, the market will decide, and a data-driven company like Google will surely track the success of its efforts quantitatively. But for now, I don’t feel that Google’s announcement has changed the competitive landscape. As always, I’m curious to hear others’ opinions.

Categories
Uncategorized

Faceted Search Book: Now Available Online!

I’m delighted to report that my faceted search book is now available for online purchase at the Morgan & Claypool site! The printed version should be going out shortly (you can pre-order at Barnes & Noble or Amazon); the publisher assures me that there will be copies in time for SIGIR.

Categories
General

Banging on Bing: A Bummer

So, Bing is out early. Yes, an early release from Microsoft! And it’s snappy, attractive, and offers decent quality. If I needed to use Bing as my main search engine for the web (yes, readers, imagine a world without Google and Yahoo as search options), I’d survive.

But I can’t say I’d be thrilled. I’ve only had a short time to play with Bing, but I’m not overwhelmed. In fact, I’m quite disappointed: given their big talk about delivering a “decision engine”, I expected at least a little bit of innovation in the user experience. No such luck. The focus is still on the ranked list, and their ranking is, at least to my taste, perceptibly inferior to Google’s. I could live with that small difference if the interface offered real opportunities for interaction. But there isn’t anything new there. You can refine by result type (Web, Images, Videos, Shopping, News, Maps, Local, Travel), but search engines have been doing that for years.

The only novelty is “xRank”, which lets you “see who and what everyone’s searching for most”. It’s intriguing, but it seems half-baked, and I suspect that others are further ahead on crowd-sourcing relevance through the social stream.

I take no pleasure in throwing cold water on the queue of challengers that attempt to provide competition for Google in web search. Perhaps Bing is truly in beta, and will prove itself a more formidable challenger in the future. But it’s surely not there now.

Categories
General

Journalism Is Not Like Craigslist

In response to a meeting sponsored by the Newspaper Association of America about “Models to Monetize Content” (i.e., how to charge for online news), Scott Rosenberg writes that charging for articles could hobble the future of journalism.

His main argument is an appeal to analogy with the classifieds business:

In at least one area, the newspaper web sites of the 90s didn’t give away the store…But the greatest success of all came in the unlikely form of Craigslist, a community-based enterprise led by a shy programmer who offered classifieds not as a profit-making enterprise but (in all but a tiny subset of categories) as a free service. As a result, newspapers’ classified businesses today have been devastated.

It’s a sobering point: staying the course is not always the best strategy, and Craigslist is certainly a case study in disruptive innovation. As a user, I find the Craigslist user experience awful. But it turns out that price (to sellers) often trumps user experience–though I would note that, at least in New York City, many people still pay to have their apartments listed in the New York Times real estate section.

But journalism is very different from the classifieds market. Classifieds are the canonical example of a peer-to-peer business. You hardly need to consider a quasi-altruistic endeavor like Craigslist–consider what eBay did to the classifieds market by reducing its friction. Sellers and buyers are naturally incented to participate, so it’s just a matter of supplying the right infrastructure and then taking enough of a cut to sustain that infrastructure and, in the case of eBay or other for-profit marketplaces, a bit more than that to make a profit.

I’d like to see someone explain how free online journalism is supposed to work that way. Sure, there are content producers and consumers, but these roles aren’t quite like sellers and buyers in a world where no one pays for content. Of course the status quo is the ad-supported model, where what is being sold for money is the readers’ attention, rather than the content itself. But the advocates of this model might want to remember that free, ad-supported print media has existed far longer than the internet. I believe the technical term for such publications is “rags”. Not exactly “change” I can believe in.

I’m realistic that it will be very hard for newspapers to put the free content genie back in the bottle. But I cringe when I hear pontification against their even trying to do so. I think they belatedly recognize their mistake, and are desperately trying for a do-over.

Categories
Uncategorized

Page’s Law? Try Wirth’s Law. Or Gates’s.

I hesitate to cite Valleywag as a news source, but I did read there that Sergey Brin is crediting fellow Google co-founder Larry Page with “Page’s Law“, the assertion that software gets twice as slow every 18 months, and thus outpaces Moore’s law.

Fortunately for Page, he is already assured of a solid entry in the history books, because this law won’t earn him one: Page’s Law sounds suspiciously like Wirth’s law, pronounced by computer science titan Niklaus Wirth in 1995: “Software is getting slower more rapidly than hardware becomes faster.” In fact, the more precise version cited by Page is known as Gates’s law–though I don’t think Bill Gates wants to take credit for it.

Categories
General

Waiting for the Big Bing

Everyone is talking about Bing today–well, everyone who isn’t too busy watching Google do the Wave. If you haven’t been paying attention, Microsoft is about to spend $100M to market an upgrade of its web search engine, rebranding it from Live to Bing (by way of Kumo).

I’m reserving judgment about it until I have a chance to play with it myself–and I assume I’ll have to wait until next week just like everyone else (unless any kind insider cares to offer me a sneak preview). But it is interesting to see how Microsoft is positioning Bing as a “decision engine” rather than a search engine, with messaging at least mildly suggestive of HCIR. At least according to their marketing, their focus is on organizing results, simplifying tasks, and supporting decision making.

I personally can’t help noticing how familiar the messaging looks–and they even name-check a very familiar customer in their marketing video. But it’s probably just a coincidence. 🙂

In any case, Google also claims to organize the world’s information–and it’s even starting to offer users limited query refinement capabilities. If Microsoft is going to make headway on this round, I suspect they’ll have to deliver a significantly better experience than Google at the task level for at least a couple of common tasks. The areas they tout in their marketing are travel, health, and shopping. It certainly wouldn’t be hard for Microsoft to beat Google on all three–note the overlap with areas where Google isn’t good enough. But it remains to be seen if they will.