The Noisy Channel

 

Twitter’s “Real-Time Search” Ain’t That Hard

March 4th, 2009 · 13 Comments · General

Google CEO Eric Schmidt’s dismissal of Twitter as a poor man’s email was petty and comical, but no more amusing than the blogosphere’s obsession with the wonders of “real-time search”. The dominant narrative in the echo chamber seems to be that Google is in danger of being usurped for writing off this segment of the information seeking market.

I actually see some merit to this narrative: search engines in general, and Google in particular, could do a lot to improve their alerting tools. But I want to get something straight: from a technical perspective, real-time search, at least as Twitter implements it, is not that hard.

Let me try to explain this in terms that hopefully do not require a technical background.

Twitter’s search interface offers a simple search box. If users do not use any operators, then the results of a search are those tweets containing all of the words the user enters. In logical terms, it is as if the terms were combined with a logical AND (e.g., information seeking is treated as information AND seeking). In fact, Twitter supports a few Boolean logic operators, so that it is possible to combine terms with OR (e.g., tunkelang OR dtunkelang) and to use the minus sign (-) for negation (e.g., dtunkelang -published). Twitter also supports quotes as a way of requiring two or more words to occur as a phrase (e.g., “noisy channel”). Finally, Twitter supports some other filtering on its advanced search page.
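To make those query semantics concrete, here is a minimal sketch in Python of how a single tweet could be tested against a query. It is not Twitter's code: the function names `tokenize` and `matches`, the parameter names, and the tokenization rule are all assumptions made for illustration, and the query is passed in already parsed rather than as a raw search string.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9_']+", text.lower())

def matches(tweet, all_of=(), any_of=(), none_of=(), phrases=()):
    """Return True if the tweet satisfies every part of the query.

    all_of  -- words that must all occur (the implicit AND)
    any_of  -- words of which at least one must occur (OR)
    none_of -- words that must not occur (the minus sign)
    phrases -- word sequences that must occur contiguously (quotes)
    """
    words = tokenize(tweet)
    word_set = set(words)
    if not all(w in word_set for w in all_of):
        return False
    if any_of and not any(w in word_set for w in any_of):
        return False
    if any(w in word_set for w in none_of):
        return False
    for phrase in phrases:
        target = tokenize(phrase)
        n = len(target)
        # Slide a window of the phrase's length across the tweet's words.
        if not any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            return False
    return True

# Example queries from the paragraph above, applied to made-up tweets.
and_hit = matches("Notes on information seeking", all_of=["information", "seeking"])
or_hit = matches("hello tunkelang", any_of=["tunkelang", "dtunkelang"])
negated = matches("dtunkelang published a book", all_of=["dtunkelang"], none_of=["published"])
phrase_hit = matches("the noisy channel blog", phrases=["noisy channel"])
```

Scanning every tweet this way would be far too slow at scale, which is why a real system answers such queries from an index instead.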

But what is important is that Twitter only supports one sort order: reverse order by date. This vastly simplifies the requirements for Twitter’s inverted index. For those of you unfamiliar with an inverted index, it is much like an index at the back of a book (remember those relics? don’t forget to buy mine!) that associates each word with a list of the documents (in this case, tweets) in which it occurs.
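For readers who prefer code to book analogies, a toy inverted index takes only a few lines of Python. This is a sketch under simple assumptions (whitespace-and-punctuation tokenization, made-up tweets, a hypothetical `build_index` helper), not a description of how Twitter actually stores its index.

```python
import re
from collections import defaultdict

def build_index(tweets):
    """Map each word to the list of ids of the tweets that contain it."""
    index = defaultdict(list)
    for tweet_id, text in tweets:
        # Use a set so a word repeated within one tweet is listed once.
        for word in set(re.findall(r"\w+", text.lower())):
            index[word].append(tweet_id)
    return index

# Made-up tweets, listed oldest first.
tweets = [
    (1, "Real-time search is not that hard"),
    (2, "Search engines could improve alerting"),
    (3, "Alerting is real-time search"),
]
index = build_index(tweets)
search_postings = index["search"]      # ids of tweets containing "search"
alerting_postings = index["alerting"]  # ids of tweets containing "alerting"
```

Because the tweets are processed oldest first, each word's list comes out in date order without any sorting step.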

Since Twitter users can only see search results sorted by date, the inverted index presumably maintains its lists in date order. Doing so makes it trivial to add new content, since all additions are at the end of the lists. Moreover, as the index grows, there’s a natural way to partition it into smaller chunks: time-slicing. The problem is, as computer scientists say, embarrassingly parallel.
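The following Python sketch illustrates both points: new tweets are appended to the end of each word's list, and the index is split into time slices that can be searched independently, newest slice first. Everything here is an assumption for illustration, including the class name `TimeSlicedIndex`, the one-hour slice width, and the requirement that tweets arrive in timestamp order; a production system would place slices on separate machines and query them in parallel.

```python
from collections import defaultdict

class TimeSlicedIndex:
    """Toy inverted index partitioned into fixed-width time slices."""

    def __init__(self, slice_seconds=3600):
        self.slice_seconds = slice_seconds
        # slice number -> word -> list of (timestamp, tweet_id), oldest first
        self.slices = defaultdict(lambda: defaultdict(list))

    def add(self, timestamp, tweet_id, text):
        """Index a new tweet; assumes tweets arrive in timestamp order."""
        shard = self.slices[timestamp // self.slice_seconds]
        for word in set(text.lower().split()):
            shard[word].append((timestamp, tweet_id))  # always an append

    def search(self, words, limit=10):
        """Return ids of tweets containing all the words, newest first."""
        words = [w.lower() for w in words]
        if not words:
            return []
        results = []
        for slice_no in sorted(self.slices, reverse=True):  # newest slice first
            shard = self.slices[slice_no]
            if any(w not in shard for w in words):
                continue  # some word never occurs in this slice
            hits = set.intersection(*(set(shard[w]) for w in words))
            results.extend(tid for _, tid in sorted(hits, reverse=True))
            if len(results) >= limit:
                break  # older slices never need to be touched
        return results[:limit]

idx = TimeSlicedIndex(slice_seconds=60)
idx.add(10, "a", "real time search")
idx.add(70, "b", "search is easy")
idx.add(75, "c", "real time search rocks")
idx.add(130, "d", "nothing here")
newest_first = idx.search(["search"])
both_words = idx.search(["real", "search"])
```

Since results are always wanted newest first, a query can stop as soon as the most recent slices have produced enough hits. That early exit would not be available if results had to be ranked by relevance across the whole index.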

I’m not trying to suggest that real-time search–or alerting, as it used to be called in ancient pre-Twitter times–isn’t valuable. But, if there is an entry barrier, it is surely not a technical one. Rather, it’s a human one. Twitter’s great achievement, much like Wikipedia’s, is one of human computation: its users supply the content that makes it valuable. Twitter may be much smaller than Facebook, but its single-minded focus on micro-blogging has made it incredibly efficient at what it does, the noise from follower-whores notwithstanding. Twitter’s strength comes from the loyalty of its users.

But this strength is also a vulnerability. As Twitter looks into ways to monetize the attention of its users, it has to be extraordinarily careful not to alienate them. Loyalty is a two-way street.

13 responses so far ↓

  • 1 Joe Cardillo // Mar 4, 2009 at 10:46 am

    I think you’re right that Twitter has a strong search capability. The main difference between traditional search engines like Google and Twitter—never thought I’d be calling Google traditional :)—is that the content serves a different purpose. At least for now.

    Google’s content has been quantified monetarily, so anytime you are searching anything you are still getting both the pulse of what’s on people’s minds, but also advertising/marketing, etc. Versus Twitter which for now continues to be mostly a way to see what’s on people’s minds.

    Even experiments like the Skittles website still haven’t managed to put an ad value on Twitter. But obviously that is something marketing folks are working on. I’m sure Ev and Biz have thought about that plenty, (Suicide Girls interview touches on it http://snipurl.com/d2wjl ).

    I like the current Twitter search function. It’s clean, simple and isn’t designed for advertising/marketing. For now your point about content being human generated is true, so it’ll be interesting to see how they deal with spambots or similar accounts.

  • 2 Jonathan Mendez // Mar 4, 2009 at 10:57 am

    Excellent technical analysis. Complements my goal analysis http://bit.ly/1456Jd quite well. 🙂

    Interestingly Fred Wilson mentioned a forthcoming real-time search Q&A feature for Twitter today on his blog. Seems like some real SEO juice but also a recipe for spamming that Twitter has not really had to confront yet at scale.

    RTS is a “vertical” and Twitter may be more closely aligned with a site search like Endeca than anything else.

  • 3 Daniel Tunkelang // Mar 4, 2009 at 11:25 am

    Jonathan, that goal analysis is great! Also, here’s a link to a past post about alerting spam.

  • 4 Daniel Lemire // Mar 4, 2009 at 11:30 am

    Google is pretty good at indexing content quickly, if Google Reader is any indication.

    I honestly don’t use Twitter’s search function all that much.

  • 5 Daniel Tunkelang // Mar 4, 2009 at 11:33 am

    I’m a big fan of Twitter’s search function, and the feed published on this blog is a good example of how I use it. But just because it’s useful doesn’t mean it would be hard to replicate.

  • 6 Abdur // Mar 4, 2009 at 1:06 pm

    Nice post… Not sure I would agree with the “Ain’t That Hard”, but either way real-time search over Twitter has value that is just different from other search products. Results change as you sit there, click-through is to engage in conversation with someone, etc…

  • 7 Daniel Tunkelang // Mar 4, 2009 at 1:47 pm

    Abdur, I’m honored to see you here at The Noisy Channel! Have you or any of your colleagues ever published any details of how you maintain search.twitter.com? Of course I understand if you feel the need to keep these details secret for competitiveness reasons.

  • 8 Valencio // Mar 4, 2009 at 2:41 pm

    I am actually a huge fan of twitter and its search function, with keywords tied to links.. I think Google should not dismiss twitter as such..

  • 9 Twitter is Not a Search Engine | The Noisy Channel // Mar 5, 2009 at 2:48 pm

    […] No relevance ranking, user-specified sorting, query refinement, etc. I talked about it in a previous post. If Twitter wants to be taken seriously as a search engine (and I’m not sure that they do), […]

  • 10 Thomas Kjelsrud // Mar 6, 2009 at 9:43 am

    Very good article! But as always in these discussions lately, little or no emphasis is put on relevancy, which is a hurdle at this point. Retweets are not really linked in a way that lets us measure which tweets are “good” and which are not. Furthermore, the use of several URL services makes it difficult to find a consistent way to measure link popularity (like Google PageRank). And, last but not least, there is so much chatter on Twitter that, even if the technology works out, there might be huge amounts of unrelated comments in the results, as most people try to maximize their exposure by using as many tags as possible. Sorry for the long rant!

  • 11 Daniel Tunkelang // Mar 6, 2009 at 10:02 am

    I suspect that, given the small size of messages, it is probably better to associate authority scores with users than with messages–though quite possibly to make those authority scores multidimensional (e.g., this user is authoritative on software but not on politics) and transparent to users. But yes, it’s a long road to get there from here.

  • 12 A Scaling Challenge for Twitter Search | The Noisy Channel // Mar 15, 2009 at 1:35 am

    […] other day, I explained why, as far as I can tell, Twitter’s existing search functionality isn’t that hard to implement. In a subsequent post, I argued that Twitter is not a search engine, an opinion that seems to place […]

  • 13 An Able Grape at the Helm of Twitter Search « AltSearchEngines // Aug 13, 2009 at 3:55 pm

    […] link-baited Summize founder and Twitter Chief Scientist Abdur Chowdhury here once or twice, but I understand that he’s no longer running Twitter Search. They’ve got a new guy, […]
