The Noisy Channel

 

The Twouble with Twitter Search

May 9th, 2009 · 10 Comments · General

There has been a flurry of reports about Twitter search–whether about Twitter’s plans to improve their search functionality or about alternative ways to search Twitter. But Danny Sullivan makes a great point in a recent post about Google:

Ironically, Google gets a taste of its own medicine with Twitter. It still can’t access the “firehose” of information that Twitter has, in order to build a decent real-time search service. If it can’t strike a deal, expect to hear the company start pushing on how “real-time data” should be open.

Of course, that logic applies not only to Google, but also to anyone with aspirations to build a better mousetrap for Twitter search. As things stand, applications can’t do much better than post-processing native Twitter search results–which makes it hard to offer any noticeable improvement on them. If Twitter offered full Boolean set retrieval (e.g., if a search for star trek returned the set of all tweets containing both words), then applications could implement lots of interesting algorithms and interfaces on top of their API. I’d love to work on exploratory search applications myself! But the trickle that Twitter returns is hardly enough.
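
To make "full Boolean set retrieval" concrete, here is a minimal sketch in Python of what I have in mind: a toy inverted index where a query returns every tweet containing all of the query terms, not just the most recent handful. The tweets, the function names, and the whitespace tokenization are all assumptions made up for illustration; this says nothing about how Twitter actually stores or serves tweets.

```python
from collections import defaultdict


def build_index(tweets):
    """Map each lowercased token to the set of tweet ids that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for token in text.lower().split():
            index[token].add(tweet_id)
    return index


def boolean_and(index, query):
    """Return the ids of ALL tweets containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Look up each term's posting set; an unknown term yields an empty set.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample data, invented for this example.
tweets = {
    1: "just saw star trek and loved it",
    2: "a star fell last night",
    3: "long trek home",
    4: "Star Trek opening weekend numbers",
}
index = build_index(tweets)
matches = boolean_and(index, "star trek")  # ids of tweets with both words
```

The point is that the result is a complete set, so an application built on top of it could count, cluster, or chart the matches over time, which is the kind of exploratory interface that a most-recent-results-only API rules out.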

I believe this limitation is by design–that Twitter knows the value of such access and isn’t about to give it away. I just hope Twitter will figure out a way to provide this access for a price, and that an ecology of information access providers develops around it. Of course, if Google or Microsoft buys Twitter first, that probably won’t happen.

10 responses so far ↓

  • 1 Hannes Helander // May 10, 2009 at 1:03 am

    Just to bring to your attention the different search operators actually supported by Twitter, including phrases and sentiment (love the operator) although the latter seems very rudimentary.

    http://search.twitter.com/operators

  • 2 Daniel Tunkelang // May 10, 2009 at 10:11 am

    Hannes, I’m aware of the operators–you might notice that I use some of them to generate my Twitter feed (bottom right). The problem I’m trying to highlight is not the lack of operators. Rather, it is that Twitter only returns the most recent results.

    That’s extremely limiting for popular queries (presumably the main fodder for “real-time” search), e.g., as I’m writing, all of the results for http://search.twitter.com/search?q=star+trek are from less than a minute ago and hardly give me a coherent, holistic picture. It’s also limiting for queries that would show an interesting arc over time, e.g., http://search.twitter.com/search?q=google+books+settlement.

    I could see cost and business model reasons for Twitter not to give this access away. I just hope they find a way to make this access available, under terms and conditions that lead people to make use of it.

  • 3 Brendan O'Connor // May 10, 2009 at 2:53 pm

    I believe this limitation is by design–that Twitter knows the value of such access and isn’t about to give it away.

    Simpler explanation: they’re too busy to implement new features!

    Have you tried using max date restrictions? You can go back further though it’s still limited.

    They even have a blog post saying they wished they could serve up more, but are limited by available hardware and are working on getting more. There’s no business conspiracy. Just a little company with too much to do.

  • 4 Daniel Tunkelang // May 10, 2009 at 5:23 pm

    Brendan, conspiracy is a strong word. But I’m sure a number of folks, including Google, would happily fund the availability of the firehose with both money and labor if Twitter would give them a chance–and I’m equally sure that Twitter knows this.

  • 5 David Sterry // May 11, 2009 at 1:37 pm

    Twitter had a decent way for others to get access to the real-time information but that method didn’t scale well. A firehose is just what it is…we’re talking a million or more tweets per day and can expect 10 million tweets a day in a year or two. Jabber could handle it but they could also post the messages to a consumer server 100, 1000, or 10000 messages at a time. The realtime-ness is important to within a minute but people notice missing tweets so the top priority should be completeness.

  • 6 Daniel Tunkelang // May 11, 2009 at 1:49 pm

    I think that full access to the Twitter archives (e.g., tweets that are at least a day old) would be valuable in its own right, though I realize that what excites people is the “real-time” aspect of Twitter.

    Are you saying that the scaling challenge is because of the desire for sub-minute latency or because of the challenge of servicing a large number of consuming applications?

  • 7 An Able Grape at the Helm of Twitter Search | The Noisy Channel // Aug 13, 2009 at 10:56 am

    [...] While I am an avid Twitter user (and apparently a tradeable commodity in a “Fantasy Twitter” game that some friends are playing), regular readers know that I’ve offered mixed reviews of Twitter Search. [...]

  • 9 An Able Grape at the Helm of Twitter Search « AltSearchEngines // Aug 13, 2009 at 4:08 pm

    [...] While I am an avid Twitter user (and apparently a tradeable commodity in a “Fantasy Twitter” game that some friends are playing), regular readers know that I’ve offered mixed reviews of Twitter Search. [...]

  • 10 Is Twitter Planning To Monetize The Firehose? | The Noisy Channel // Oct 8, 2009 at 9:05 am

    [...] few months ago, I wrote in “The Twouble with Twitter Search“: But the trickle that Twitter returns is hardly [...]
