# A Twitter Analog to PageRank

A few weeks ago, there was a flame war about Twitter authority, and I was all too eager to throw fuel on the pyre. But now that the blogosphere has calmed down a bit, I’d like to propose a ranking measure that I think might work. My apologies if it isn’t original. In fact, if you’ve seen it elsewhere, please point me to it.

• Influence(X) = Expected number of people who will read a tweet that X tweets, including all retweets of that tweet. For simplicity, we assume that, if a person reads the same message twice (because of retweets), both readings count.
• If X is a member of Followers(Y), then there is a 1/||Following(X)|| probability that X will read a tweet posted by Y, where Following(X) is the set of people that X follows.
• If X reads a tweet from Y, there’s a constant probability p that X will retweet it.

This model is obviously simplistic in all three assumptions. But I think it’s a reasonable first cut. In particular, it accounts for the inflation that occurs from people who follow in the hopes of reciprocity. There’s less value in being followed by someone who follows a lot of people, because that person is less likely to read your messages or retweet them.

Of course, there’s room for adding more realism to this model, but I hope it is at least close enough to the truth to be interesting.

From this model, it’s easy to measure someone’s influence recursively, assuming that we know the constant retweet probability p:
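
Spelled out from the three assumptions above (a sketch, with Followers(X) denoting the set of people who follow X): each follower Y reads the tweet with probability 1/||Following(Y)||, which counts as one expected reading, and with probability p passes it on to Y’s own audience.

$$
\mathrm{Influence}(X) \;=\; \sum_{Y \in \mathrm{Followers}(X)} \frac{1 + p \cdot \mathrm{Influence}(Y)}{\|\mathrm{Following}(Y)\|}
$$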

The recursion is infinite over a graph with directed cycles, but rapidly converges as high powers of p approach zero. I would think this measure wouldn’t be hard to compute to a reasonable accuracy.
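
A minimal sketch of how this could be computed by repeated substitution, assuming the whole follow graph fits in memory as a dictionary. The function name, the `following` mapping, and the `tol` and `max_iter` parameters are illustrative assumptions, and p has to be supplied by the caller because the model does not say what it should be.

```python
def influence_scores(following, p, tol=1e-9, max_iter=1000):
    """Iteratively estimate influence under the three model assumptions.

    following: dict mapping each user to the set of users they follow.
    p: assumed constant retweet probability, 0 <= p < 1.
    """
    # Collect every user that appears as a follower or as someone followed.
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    # Invert the graph: followers[x] is the set of users who follow x.
    followers = {u: set() for u in users}
    for y, followed in following.items():
        for x in followed:
            followers[x].add(y)

    scores = {u: 0.0 for u in users}
    for _ in range(max_iter):
        # Each follower y reads the tweet with probability 1/|Following(y)|
        # and, with probability p, exposes y's own audience to it.
        updated = {
            x: sum((1 + p * scores[y]) / len(following[y]) for y in followers[x])
            for x in users
        }
        delta = max((abs(updated[u] - scores[u]) for u in users), default=0.0)
        scores = updated
        if delta < tol:
            break
    return scores
```

Each pass adds one more level of retweets to the estimate, so the change between passes shrinks roughly like powers of p; that is the convergence argument above in code form.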

This measure strikes me as a PageRank for Twitter or any system with similar properties. There’s more room for nuance, but I at least find this approach more plausible than the ones I’ve seen. It also strikes me as hard to game, since it isn’t counting retweets, and it’s hard to add much influence through followers who don’t have any influence themselves.

What do folks think? Has anyone tried this? If not, is there anyone who’d like to try hacking an application to compute it? Either way, please let me know!

## By Daniel Tunkelang

High-Class Consultant.

## 77 replies on “A Twitter Analog to PageRank”

Hi Daniel,

An excellent starting point. I’ve been following the talk around Tweet Rank for the last couple of weeks, and the missing component (IMHO) that you address is that “There’s less value in being followed by someone who follows a lot of people, because that person is less likely to read your messages or retweet them.”

As you say, it represents attention scarcity. In social networks more is not necessarily better and often has the opposite effect.

Cheers,

Christopher

This looks like a standard type of heuristic ranking to me? The magic of PageRank is its use of the stationary distribution (first eigenvector) of the link graph. As an aside, I hate attributing this technique to Google: people have been using the stationary distribution to analyze complex systems since long before Larry Page plucked it out of a textbook.

Off the cuff, I’d say the retweet matrix is far too sparse to really measure the dynamics of information sharing on Twitter.

I’d put forth that measuring click-through of shared links (bit.ly, tinyurl, etc.) is a more implicit measure. Of the 50+ people I follow, retweeting is not too common as a percentage of posts… but my bit.ly analytics show that people often click on the links I share.

One could probably add all the various #XXXX and other ‘social command line’ stuff on twitter.

Hmm TwitterRank smells like an area with real meat behind the first bite 😉

I like it. I ran a simulation on a toy social graph and the results are pretty much what you’d expect. For my small example, the score was fairly sensitive to the effect of being followed by people who follow many other people. It seems to converge very quickly too, as you say.

One thing that was interesting was that users who are only followed by people with no followers will have a score of zero. This properly reflects their lack of retweetability, so maybe you should call it RetweetRank.

I named my Python script TunkRank, so that might work too. 🙂

Neal, you’re right that it’s sloppy for me to use “PageRank” as shorthand for stationary distribution on the link graph. My apologies.

I should also note that I intend “retweet” in a general sense. I don’t see much of a practical difference between citing a post, retweeting it, or in some cases even replying to it in a way that draws attention to it. So I kept the model simple, but I intend the concept generally.

Jason, I’m psyched that you’re already looking at this empirically! The zero-follower case is a nice reality check. And, just to clarify the sensitivity you observed, do you mean that a follower who follows many other people adds almost nothing to a person’s rank?

As for naming the measure, I think RetweetRank is taken.

Correct, they add very little to your rank. Sorry for the lack of clarity.

Click stats might be a great thing to add to this somehow, but I don’t see how you could possibly implement that. I’m not sure every (or even most) URL shorteners publish those stats. If you implement that unevenly, it will certainly bias your results. And then occasionally Twitter doesn’t auto-shorten, and you’d have no way to get to those unless you could harvest the clicks from all the ten million Twitter clients and proxies out there. And then there are URLs that aren’t turned into links… 😛

Maybe instead, just track the number of links published. Someone who publishes too many would have to be penalized, since followers will only click on so many.

I do like that my model can’t be gamed by clicking or even by sending messages. Perhaps that’s disguising a bug as a feature–it models expected influence rather than trying to measure it empirically. But I see that as similar to PageRank, which uses a stationary distribution of the link graph rather than the click stream. Of course, it wouldn’t be a bad idea to validate the model against reality. And there’s still the question of how to pick the retweet probability p.

Daniel, I like your formulation and plan to play around with something like it. As for validating models against reality here’s a short post on retweeting you might enjoy: http://xlink.cc/00002e

Shilman, thanks for the link–and for the reminder that, for an influence/authority measure to be useful, it must be somewhat resistant to gaming. Naive retweet counts certainly fail that test.

jeremy says:

Sorry, this comment is probably a bit orthogonal to the type of discussion that you’re seeking here, but… I have to ask.. why do we need a Twitter analogy to PageRank?

Isn’t one of the problems of PageRank that it emphasizes authority and influence a bit too much, to the detriment of exploratory search? It’s basically a “find me what’s popular” mechanism, rather than a “find me something interesting that I might not have found in any other way” mechanism.

Why seek to duplicate that sort of bias in the Twitter world? Wouldn’t it be more interesting to do exploratory Twitter search, and come across less influential, less connected voices, but perhaps who have something more interesting to say?

Orthogonal, schmorthogonal, it’s not like we have rules here.

Why do we need a Twitter analogy to PageRank?

Well, to start off, what I’m trying to measure here is people (or AIs that have Twitter accounts), not messages. And I’m trying to measure influence, which is a bit more subtle than popularity. The question I’m trying to answer is: if X says something, what will be the expected impact?

What do I care about measuring influence? Here are a couple of reasons:

– I’d like to be able to measure my own influence, since one of my goals is to increase the leverage associated with my ideas. If I were a company, the same would apply to measuring brand capital. In my own case, I’d like to be able to check my balance of reputation capital.

– I’d like to know who the influencers are so I can monitor them and in some cases court them. Of course I’ll have other criteria about the people and their areas of expertise. But the ability to explore and the ability to sort by influence are complementary. For example, I’d love to know who are the most influential people tweeting about information seeking.

For me, these are the practical applications. I’m sure that others would find it interesting for different reasons, perhaps even simply as an interesting research problem in social networks.

I vote for “TunkRank”…

“a follower who follows many other people adds almost nothing to a person’s rank”
You might want to take into account that Twitter clients (like TweetDeck and others) allow a different model: following for following’s sake versus following and actually engaging (by creating filter views). I’m not sure how prevalent that is, but it’s something people have also been actively requesting directly from Twitter as a feature. Just a thought.

I can imagine an extension to the model where, even though Y follows n people, Y doesn’t follow all of them equally. In that case, the inside of the sum shouldn’t be weighted uniformly by 1/||Following(Y)||, but rather the attention of Y should be allocated to reflect how Y allocates attention among Following(Y). Of course, that can only be done in practice if Y is able and willing to publicize this allocation.
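
A minimal sketch of that extension, assuming the attention allocation is somehow known. The function name and the `attention` mapping are hypothetical, and p is left to the caller as before. Here `attention[y][x]` is the share of Y’s attention that goes to X, replacing the uniform 1/||Following(Y)|| weight.

```python
def weighted_influence_scores(attention, p, tol=1e-9, max_iter=1000):
    """Influence when each follower splits attention unevenly.

    attention: dict mapping follower y to a dict {x: share}, where share is
        the assumed probability that y reads a tweet from x. Each follower's
        shares should sum to at most 1 for the iteration to converge.
    p: assumed constant retweet probability, 0 <= p < 1.
    """
    users = set(attention)
    for shares in attention.values():
        users |= set(shares)

    scores = {u: 0.0 for u in users}
    for _ in range(max_iter):
        updated = {u: 0.0 for u in users}
        for y, shares in attention.items():
            for x, share in shares.items():
                # y reads x's tweet with probability `share`, then may retweet.
                updated[x] += share * (1 + p * scores[y])
        delta = max((abs(updated[u] - scores[u]) for u in users), default=0.0)
        scores = updated
        if delta < tol:
            break
    return scores
```

Setting every share to 1/||Following(Y)|| recovers the uniform model, so this is a generalization rather than a different measure.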

I am one of those people Daniel alludes to who use filters, though it is a practice some might disagree with. While I follow 250+ people, I really only keep up with what 50-75 are tweeting. The others I do occasionally read, but with considerably lower frequency. Twalala is great for this purpose, btw.

Indeed, I’ve noticed that some people who follow a thousand people nonetheless seem to notice my tweets with frequency that belies my simple proposed model. The modification I proposed in response to Daniela should handle this case. E.g., for Jason, those 50-75 people might each get 1% of the weight, with the rest divided among the remainder. I don’t know how you’d compute the weights in practice, but I think it’s at least the right model in theory.

Sorry, I meant Daniela. Too many “Daniel”s! 🙂

One approximation might be to count the number of people you reference or reply to. That will miss a lot of where your attention is going and it may discourage interaction since people will be penalized for communicating. So forget I said it.

Sean McGuire says:

I think this is a reasonable first pass but assuming p is constant across posters is limiting and, as the nature of posters changes, will skew the results. Suppose a Hollywood A-list celebrity starts twittering; my intuition is that s/he would get a lot of followers but fewer retweets. If that holds true, someone ‘twitter famous’ like timoreilly might have significantly fewer followers but significantly more impact.

The first-order retweet probability for any given poster can be empirically determined, though it changes the problem from O(number of links in the graph) to O(number of links + number of posters * number of messages per poster).

I readily concede that assuming p is constant across posters is limiting. But I am stumped on how best to remedy the problem. The obvious approach of inferring p from behavior seems to invite gaming, assuming that such a measure was adopted and people cared about their influence scores. I feel I’ve addressed the follower inflation problem, but I don’t know how to address the retweet inflation problem, so I’ve avoided considering actual behavior in the model.

But I do wonder about your Hollywood vs. “Twitter famous” celebrity example. I’d be curious if there are differences among their followers that would show up in their follower subgraphs. Is it wishful thinking to imagine that the average person following @timoreilly is more influential than the average person following @britneyspears?

Colby Dyess says:

>> Is it wishful thinking to imagine that the average person following @timoreilly
>> is more influential than the average person following @britneyspears?

Not what you intended to mean, but I couldn’t help wondering how many pop music fans desperately want Britney’s observations on the power of real-time enterprise. Or, for that matter, how many IT executives are purchasing make-up based on Tim O’Reilly’s endorsements?

[…] other day, I proposed a sort of Twitter analog to PageRank that readers generously dubbed “TunkRank”. I know that some readers started looking […]

I’m actually working on something similar, although from a slightly different perspective.

Some points:

> If X is a member of Followers(Y), then there is a 1/||Following(X)|| probability that X will read a tweet posted by Y, where Following(X) is the set of people that X follows.

I’m using p_xy = (Messages sent by Y)/(Total Messages Received by X). I’m also playing with capping it for people with a small in-flux.
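
A rough sketch of that read probability, assuming per-user message counts are available. The function name, the `messages_sent` mapping, and the `cap` parameter are hypothetical; in particular, the cap is only one guess at what capping for people with a small influx might look like.

```python
def read_probabilities(following, messages_sent, cap=1.0):
    """Estimate p_xy = (messages sent by y) / (total messages received by x).

    following: dict mapping each user x to the set of users x follows.
    messages_sent: dict mapping each user to a message count (assumed data).
    cap: hypothetical upper bound applied to each probability.
    """
    probs = {}
    for x, followed in following.items():
        # Everything x receives comes from the accounts x follows.
        received = sum(messages_sent.get(y, 0) for y in followed)
        if received == 0:
            continue  # x receives nothing, so there is nothing to estimate
        probs[x] = {
            y: min(messages_sent.get(y, 0) / received, cap) for y in followed
        }
    return probs
```

Unlike the uniform 1/||Following(X)|| weight, this gives a chattier account a larger share of each follower’s attention.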

> If X reads a tweet from Y, there’s a constant probability p that X will retweet it.

I think I can test this. I’ll try to get you numbers.

There’s also, of course, the @mentions, replies, and favorites — all of which show that the initiator read something from the target.

All this comes out of the massive twitter scrape I gathered: http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-friend-graph/
… which we will share as soon as the lawyertypes have at it.

Sorry for the double post.

> But I do wonder about your Hollywood vs. “Twitter famous” celebrity example. I’d be curious if there are differences among their followers that would show up in their follower subgraphs.

There’s going to be a lot less closure, that is, fewer “triangles” in the user’s 1-neighborhood, in the graphs of celebrities and entities (e.g., @ruwtbot or @zappos). The triangles will also have a different spectrum: an organic person has a lot more 2-2-2 triangles (all three follow each other) reinforced by @, RT and fave links.

Tim O’Reilly’s cluster will probably have a stronger topical signature than Britney’s (a “word cloud” of his 2-neighborhood will have a sharper distribution than hers). He also probably has many more conversation threads (reply-reply-reply), although that’s so messy to measure that I’m not looking at it.

But I guarantee you THE REAL SHAQ is more influential by far.

… Incidentally, if anyone’s interested in collaborating on some factor analysis / Bayesian classification with this, please email me: flip at infochimps org

Interesting. It’s certainly intuitive (almost tautological) that the graphs of followers of mass-appeal celebrities will be sparser than those of people who have a more targeted appeal.

But that doesn’t answer the question of whose followers themselves follow more people. Maybe there’s no correlation.

In any case, I’m excited about your analysis (sorry the comment was initially swallowed by Akismet’s spam filter). It also seems to capture attention scarcity, but based on how the network is dynamically used rather than how it is statically configured.

I think your ranking is also based on individual pages. You could rank really well for one page, as there are a ton of links pointing to it from other related stories linking to yours.

Versus a page that doesn’t rank very well due to the nature of the page and few links.

Just to be clear, the ranking is for people (well, Twitter ids), not entries or web pages.

Is it just me, or was that next to last comment one of the new breed of blog spam?

I think it’s legit, even though it’s a bit confused. Unsure, I decided to give it the benefit of the doubt.

[…] the bugs in his TunkRank implementation, I’ve been thinking about the relationship between TunkRank and retweet rank as influence […]

Interesting idea. But I think we should wait until Jason is ready to say his code is working as intended before we read much into the results and try to improve them.

[…] asked about a comparison to PageRank; anyone who enjoys formulas should read this article: “A Twitter Analog to PageRank”. Keep in mind, though, that PageRank is put together from dozens of criteria […]

[…] We will participate in a gift economy. Reputation will count. Attention is scarce. Something like tunkrank will help, I’m […]

Ouch .. I think my head hurts after reading through this very interesting conversation but I have a question (or two).

If I understand this correctly, you are trying to come up with a ranking system for a person’s influence within the TwitterSphere. What I am curious about is whether an individual’s influence ‘rank’ would be higher than that of a company or even a celebrity.

If so, how would you differentiate between accounts of actual people using Twitter for more than just self-promotion, companies that only announce products, and companies that actually use Twitter to interact on a full-time basis?

Would a company utilizing Twitter as a ‘help desk’ interaction have more influence than a celebrity or say someone like myself or yourself?

I’m just more curious than anything else.

It would be great to have meta-data about users to know if they are people vs. companies, computer scientists vs. basketball players, etc. But I see this sort of information as orthogonal to their influence. I don’t think it is worthwhile to compare Tim O’Reilly with Shaquille O’Neal: it is more likely a question of whether people care more about Web 2.0 or Basketball 7’1″.

Granted, people might not choose to describe who they are accurately. But hopefully the fakers would get culled out by their low influence once people discovered they were fakers–open fakers like Fake Steve Jobs being the exception.

[…] A couple of months ago, I put out a challenge to implement an influence measure for Twitter that acquired the personally gratifying (if unmellifluous) name TunkRank. […]

[…] into consideration the probability of a retweet.  There’s a pretty equation for you to see on his post and he explains it well also, take a […]

[…] designed by Daniel Tunkelang, measures the influence of a twitter user by the possible number of times his tweet is read, and […]

[…] of stuff like wefollow and SpeakerRate is a harbinger. Or you can look at the prominence of Twitter influence […]

[…] TunkRank. TunkRank is an application Jason Adams built, in response to a challenge to implement a measure that takes a PageRank-like approach to measuring influence on […]

[…] to discourage the vicious circle of reciprocity and fake following. That’s baked into the measure which, like PageRank, divides the voting power by the number of […]

Otis: The reason is that I stopped following you!

😉 😉

[…] I’ve dabbled a bit on the theoretical side myself. The TunkRank measure (I’m indebted to Jason Adams for his implementing it on a live site!) attempts to […]

[…] A Twitter Analog to PageRank | The Noisy Channel […]

[…] through some later Googling I came across an article that starts from similar assumptions and arrives at similar conclusions; we both reckon that […]

[…] they make the rich get richer. “Real-time” variants like re-tweet frequency (and even TunkRank) suffer from the same weakness. Unchecked, these measures can cause authority / influence market […]

[…] Influence algorithm is inspired by Daniel Tunkelang’s Tunkrank for Twitter. We’re still tweaking it a bit (guess we’ll never stop doing […]

Thanks for the link! I’ve seen something similar before, but not as a GreaseMonkey script. It would be interesting to go one step further than looking at common followers, e.g., using some sort of feature reduction based on sets of people that are often followed as a group or even analyzing the content of their tweets.
