A Scaling Challenge for Twitter Search

The other day, I explained why, as far as I can tell, Twitter’s existing search functionality isn’t that hard to implement. In a subsequent post, I argued that Twitter is not a search engine, an opinion that seems to place me in a minority of the blogosphere, albeit a substantial one.

Today I see that Twitter is having some trouble with what strikes me as a core constituency, SXSW attendees.

Daniel Terdiman at CNET writes that “At SXSW, attendees confront Twitter saturation”:

At SXSW, the standard is for everyone to include the tag “#sxsw” in their tweets. For example, on Friday, I was looking for sources for a different story and tweeted, “If you are launching an iPhone app at #sxsw, or know someone who is, please let me know. Thanks!”

That’s a great convention because it allows anyone wanting to know what’s going on to search Twitter for posts using any search term important to them.

I did a search for the “#sxsw” tag on Saturday afternoon and found that there had been 392 tweets with the term in just the previous 10 minutes. That number mushroomed to more than 1,500 in the previous hour.

Large volumes of results wouldn’t be such a problem if users had a way to summarize, navigate, and explore them. But that will take a search engine that offers more than reverse-date ordering of Boolean query results.
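To make the contrast concrete, here is a minimal sketch in Python. It is not how Twitter search works, just an illustration under stated assumptions: the sample tweets are invented, and the function names `boolean_search` and `summarize_by_hashtag` are hypothetical. The first function mimics a Boolean match ordered newest first; the second groups the same hits by the other hashtags they contain, which is about the simplest kind of summary a user could navigate.

```python
import re
from collections import Counter

# Invented sample data for illustration: (minutes_ago, text). Not real tweets.
TWEETS = [
    (1, "Long line for the #sxsw keynote #interactive"),
    (2, "Launching our #iphone app at #sxsw today"),
    (4, "Great #iphone demo at #sxsw"),
    (7, "Lunch in Austin"),
    (9, "Panel on search at #sxsw #interactive"),
]

HASHTAG = re.compile(r"#\w+")


def boolean_search(tweets, term):
    """Return the tweets that contain the hashtag `term`, newest first."""
    term = term.lower()
    hits = [t for t in tweets if term in HASHTAG.findall(t[1].lower())]
    # A smaller minutes_ago value means a more recent tweet.
    return sorted(hits, key=lambda t: t[0])


def summarize_by_hashtag(hits, term):
    """Count the other hashtags that co-occur with `term` in the hits."""
    counts = Counter()
    for _, text in hits:
        tags = set(HASHTAG.findall(text.lower()))
        tags.discard(term.lower())  # the query tag itself is not informative
        counts.update(tags)
    return counts.most_common()


hits = boolean_search(TWEETS, "#sxsw")
summary = summarize_by_hashtag(hits, "#sxsw")
```

With 1,500 hits an hour, a list like `hits` is a firehose, while something like `summary` at least gives the user a handful of subtopics to drill into. Real interfaces would need much more than hashtag counts, but the difference in kind is the point.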

I wonder if Abdur Chowdhury and his team are working on this problem. Perhaps if Twitter is willing to make some of the historical logs available for download (they’re already public, just not easily downloaded), some of us HCIR wonks could implement interfaces on top of them to explore the possibilities. Abdur, if you read this and are interested, please let us know!

By Daniel Tunkelang

High-Class Consultant.

8 replies on “A Scaling Challenge for Twitter Search”

Cool–nice to see text mining apps working on top of the Twitter stream. Is this basically a focused version of Twitter’s trending topics, with some sentiment mining thrown in?


We have been exploring the idea of research collections but have not really had the bandwidth to fully work out all the logistics of such a project.

Summarization for some topics is one area that could help users, but I can envision many other ways to slice and dice the data being produced.


I agree with you about the Twitter search. You made me think about how to make it a real search engine. One problem is that there is no ranking at all. IMHO, without some kind of ranking, the functionality is very poor.
I am working on a project about that; hopefully someday I will finish it.


Abdur, I know you’ve had mixed experiences with releasing such research collections. But I hope that, with the data already public and searchable, there really aren’t any privacy issues to sidestep. But I can imagine that you guys are a bit short-staffed to handle the logistics of publishing a large collection that would attract enormous attention.

