Identifying Influencers on Twitter

One of the perks of working at LinkedIn is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a WSDM 2011 paper on “Identifying ‘Influencers’ on Twitter” by Eytan Bakshy, Jake Hofman, Winter Mason, and Duncan Watts. It’s great to see the folks at Yahoo! Research doing cutting-edge work in this space.

I thought I’d prepare for the discussion by sharing my thoughts here. Perhaps some of you will even be kind enough to add your own ideas, which I promise to share with the reading group.

I encourage you to read the paper, but here’s a summary of its results:

  • A user’s influence on Twitter is the extent to which that user can cause diffusion of a posted URL, as measured by reposts propagated through follower edges in Twitter’s directed social graph.
  • The best predictors of future total influence are follower count and past local influence, where local influence refers to the average number of reposts by that user’s immediate followers, and total influence refers to average total cascade size.
  • The content features of individual posts do not have identifiable predictive value.
  • Barring a high per-influencer acquisition cost, the most cost-effective strategy for buying influence is to target users of average influence.
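To make these definitions concrete, here is a toy reconstruction (not the authors' code) of the two influence measures, assuming a hypothetical `repost_edges` mapping from each reposter to the user they reposted the URL from: local influence counts a seed user's direct reposts, while total influence counts the full cascade.

```python
from collections import defaultdict

def influence(seed, repost_edges):
    """Toy cascade metrics. `repost_edges` maps each reposter to the
    user they reposted the URL from (one tree per seeded URL)."""
    children = defaultdict(list)
    for reposter, source in repost_edges.items():
        children[source].append(reposter)

    local = len(children[seed])    # direct reposts: local influence
    total, frontier = 0, [seed]    # full cascade size: total influence
    while frontier:
        node = frontier.pop()
        for child in children[node]:
            total += 1
            frontier.append(child)
    return local, total

# A depth-2 cascade: b and c repost from a; d reposts from b.
print(influence("a", {"b": "a", "c": "a", "d": "b"}))  # (2, 3)
```

Note that when most cascades have depth 1, the two numbers coincide, which bears on the local-vs-total discussion below.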

Let’s dive in a bit deeper.

The definitions of influence and influencers are, by the authors’ own admission, narrow and arbitrary. There are many ways one could define influence, even within the context of Twitter use. But I agree with the authors that these definitions have enough verisimilitude to be useful, and their simplicity facilitates quantitative analysis.

It’s hardly surprising that past influence is a strong predictor of future influence. But it might seem counterintuitive that, for predicting future total influence, past local influence is more informative than past total influence. The authors suggest the explanation that most non-trivial cascades are of depth 1 — i.e., total influence is mostly local influence. But at most that would make the two features equally informative, and total influence should still be a mildly better predictor.

I suspect that another factor is in play — namely, that the difference between local influence and total influence reflects the unpredictable and rare virality of the content (e.g., a random Facebook Question generated 4M votes). If this hypothesis is correct, then past local influence factors out this unpredictable factor and is thus a better predictor of both future local influence and future total influence.

I’m a bit surprised that follower count supplies additional informative value beyond the past local influence; after all, local influence should already reflect the extent to which the followers are being influenced. It’s possible that past influence lags the follower count, since it does not sufficiently weigh the potential contributions of more recent followers. But another possibility is one analogous to the predictive value of past local vs. global influence: past local influence may include an unpredictable content factor which follower count factors out.

Of course, I can’t help suggesting that TunkRank might be a more useful indicator than follower count. Unfortunately the authors don’t seem to be aware of the TunkRank work — or perhaps they preferred to restrict their attention to basic features.
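For readers unfamiliar with TunkRank: it models the expected attention a user receives, on the premise that a follower who follows many accounts spreads attention thin, and that retweets pass attention along. A minimal iterative sketch of the recurrence, with an assumed retweet probability `p` and purely illustrative inputs:

```python
def tunkrank(followers, following_count, p=0.5, iters=50):
    """Iterative TunkRank sketch: a user's influence is the sum, over each
    follower F, of (1 + p * influence(F)) / following_count[F], where p is
    the probability that a follower retweets."""
    infl = {u: 0.0 for u in followers}
    for _ in range(iters):
        infl = {
            u: sum((1 + p * infl[f]) / following_count[f]
                   for f in followers[u])
            for u in followers
        }
    return infl

# b follows only one account (a), so all of b's attention goes to a.
print(tunkrank({"a": ["b"], "b": []}, {"a": 0, "b": 1}))
```

The key design choice is the division by the follower's following count: attention, unlike follower count, is a conserved resource.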

I’m not surprised by the inability to exploit content features to predict influence. If it were easy to generate viral content, everyone would do it. Granted, a deeper analysis might squeeze out a few features (like those suggested in the Buddy Media report), but I don’t think there are any silver bullets here.

Finally, the authors consider the question of designing a cost-effective strategy to buy influence. The authors assume that the cost of buying influence can be modeled in terms of two parameters: a per-influencer acquisition cost (which is the same for each influencer) and a per-follower cost for each influencer. They conclude that, until the acquisition cost is extremely high (i.e., over 10,000 times the per-follower cost), the most cost-efficient influencers are those of average influence. In other words, there’s no reason to target the small number of highly influential users.
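A toy sketch of that cost model makes the conclusion easy to see; the numbers below are invented for illustration, whereas the paper estimates expected influence and cost from data:

```python
def best_target(users, acq_cost, per_follower_cost):
    """Rank candidate influencers by expected influence per unit cost.
    `users` maps name -> (expected_influence, follower_count)."""
    def efficiency(item):
        _, (infl, followers) = item
        return infl / (acq_cost + per_follower_cost * followers)
    return max(users.items(), key=efficiency)[0]

users = {
    "celebrity": (500.0, 1_000_000),  # huge reach, huge per-follower cost
    "average":   (2.0, 300),          # modest reach, cheap to acquire
}
print(best_target(users, acq_cost=1.0, per_follower_cost=0.01))        # average
print(best_target(users, acq_cost=100_000.0, per_follower_cost=0.01))  # celebrity
```

Only when the fixed acquisition cost dwarfs the per-follower cost does the high-follower user become the efficient choice, which is the paper's "over 10,000 times" threshold in miniature.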

The authors may be arriving at the right conclusion (Watts’s earlier work with Peter Dodds, which the paper cites, questions the “influentials” hypothesis), but I’m not convinced by their economic model of an influence market. It may be the case that professional influencers are trying to peddle their followers’ attention on a per-follower basis — there are sites that offer this model.

But why should anyone believe that an influencer’s value is proportional to his or her number of followers? The authors’ own work suggests that past local influence is a more valuable predictor than follower count, and again they might want to look at TunkRank.

Regardless, I’m not surprised that a fixed per-follower cost makes users with high follower counts less cost-effective, as I subscribe to its corollary: as a user’s follower count goes up, the per-follower value diminishes. I haven’t done the analysis, but I believe that the ratio of a user’s TunkRank to the user’s follower count tends to go down as a user’s follower count goes up. A more interesting research (and practical) question would be to establish a correctly calibrated model of influencer value and then explore portfolio strategies.

In any case, it’s an interesting paper, and I look forward to discussing it with my colleagues next week. Of course, I’m happy to discuss it here in the meantime. If you’re in my reading group, feel free to chime in. And if you’re not in my reading group, consider joining. We do have openings. 🙂

By Daniel Tunkelang

High-Class Consultant.

26 replies on “Identifying Influencers on Twitter”

First time writing here, though I’m a long-time lurker. Which brings me to something I felt was missing in the discussed paper: whether the influenced users (the ones who spread the links, supposedly showing engagement with the content) are real users or something else. We know how easy it is on Twitter to distribute content automatically; thus, I wonder whether it has become necessary to study who the so-called influenced are and whether there is any diversity within that group at all. Maybe someone has done it and I am simply not aware of it, so I will appreciate pointers.

I guess the point I want to make is expressed better by the group of tweets I’m adding at the end of this comment. I have dozens of such examples in my dataset from the Massachusetts Senate election in 2010 (which is relatively small: 250K tweets in one week). But I wonder whether the few deep cascades the paper found are examples like this or genuine examples of information diffusion involving real users, since their qualitative analysis included only 1,000 URLs, and 20% of them were spam.

What concerns me is that the services you mentioned, which sell Twitter followers, might have been successful in creating large networks of fake users that appear normal all the time, because they don’t engage in spam but mimic the behavior of human Twitter users.

Bill987232 Sat Jan 16 15:28:04 Coakley counts on union muscle to win Senate race (AP)
coolyoung31 Sat Jan 16 15:38:05 Coakley counts on union muscle to win Senate race (AP)
xavier132 Sat Jan 16 15:48:07 Coakley counts on union muscle to win Senate race (AP)
hannah98712 Sat Jan 16 15:58:09 Coakley counts on union muscle to win Senate race (AP)
Bonnie07123 Sat Jan 16 16:08:09 Coakley counts on union muscle to win Senate race (AP)
LoggedInUs Sat Jan 16 16:18:15 Coakley counts on union muscle to win Senate race (AP)
eBayItems Sat Jan 16 16:28:28 Coakley counts on union muscle to win Senate race (AP)
TheLatestStuff Sat Jan 16 16:38:35 Coakley counts on union muscle to win Senate race (AP)
GoodDay999 Sat Jan 16 16:48:24 Coakley counts on union muscle to win Senate race (AP)
DailyLog2 Sat Jan 16 16:58:26 Coakley counts on union muscle to win Senate race (AP)
HeadsUp22 Sat Jan 16 17:08:34 Coakley counts on union muscle to win Senate race (AP)
Zowie34 Sat Jan 16 17:19:14 Coakley counts on union muscle to win Senate race (AP)
ThisIsIt04 Sat Jan 16 17:28:32 Coakley counts on union muscle to win Senate race (AP)
Hilary12378 Sat Jan 16 17:38:46 Coakley counts on union muscle to win Senate race (AP)
Frank12392 Sat Jan 16 17:48:36 Coakley counts on union muscle to win Senate race (AP)
Alex1237812 Sat Jan 16 17:58:41 Coakley counts on union muscle to win Senate race (AP)
hector98 Sat Jan 16 18:18:35 Coakley counts on union muscle to win Senate race (AP)
Billy0989 Sat Jan 16 18:28:50 Coakley counts on union muscle to win Senate race (AP)
Shirley1376 Sat Jan 16 18:39:28 Coakley counts on union muscle to win Senate race (AP)


Eni, I’m glad to see that this post has inspired you to comment! Thank you for contributing to the discussion.

I don’t know that anyone has done the analysis you describe. The first person I’d ask is someone I believe you have already worked with: Daniel Gayo at the University of Oviedo.


I’m intrigued at how this post is generating enormous interest (it’s already my most popular post in 2011) but very little discussion here.

I did have a nice conversation on Twitter with Chris Dixon, which I’ll reproduce here so it doesn’t fall into the void:

@cdixon: does that Yahoo paper discount for effect of SUL? i didn’t see it mentioned there.

@dtunkelang: No mention of SUL in the paper. In general, no discussion of how someone’s followers were obtained.

@cdixon: then the paper is deeply flawed.

@dtunkelang: And I thought I was being harsh! 🙂 But yes, they didn’t sufficiently address the variability of follower quality in the paper.

@cdixon: twitter should feature some score like tunkrank pronto so people optimize for that instead of followers.

@cdixon: its not variability. SUL decoupled follower count and active follower count.

@dtunkelang: That’s what I meant. Not all followers are created equal. SUL is certainly part of that.

@dtunkelang: By follower quality I primarily meant how much attention a follower invests. The raison d’etre for TunkRank.

@cdixon: perception also becomes reality as many people look at “social proof” of # of followers to decide whether to follow.

@dtunkelang: Indeed, if Twitter offers follower count as best public measure correlated to influence, there’s a self-fulfilling element.


This is a great post. There’s an even deeper flaw in the Yahoo! Research paper that I’m posting about on my website in the morning (timing is everything), but I’ll reproduce it in advance, in full, below:


Yahoo! Research recently published a paper entitled “Identifying ‘Influencers’ on Twitter”. Though they correctly recognize that the term “influencer” has become confused, since the definition can refer to personal, celebrity, or subject-matter-expert broadcast relationships, they completely miss the mark with respect to categorizing influence quantitatively:

“We quantify the influence of a given post by the number of users who subsequently repost the URL, meaning that they can be traced back to the originating user through the follower graph.”

It appears that they measure influence via the diffusion of retweets, old and new, alone from seeded shortened URLs. The flaw here is that retweets represent an act of curation, not consumption; therefore, they are testing how primary influencers (originators) push content to secondary influencers (curators) via visible means. To measure consumption-based influence (reading of URLs), Yahoo! Research should have measured the click counts on those originally seeded short URLs (they could also have measured diffusion via the geolocation of those clicks as a proxy for follower-graph diffusion). To say that more people consume URLs via clicks than curate them via retweets on Twitter is not preposterous, and it can be empirically tested because the data is freely available via the API. Further, I submit that a retweet of a URL without an associated click on the content first is specious at best and inauthentic at worst (though this behavior is not nearly as easy to test).

Given this flaw, I have to discount if not completely dismiss their predictive influence model (although I do admire the approach and the effort). I’d love to see this research done correctly as it’s a critical piece to driving the next wave of smart distribution/discovery applications in the Twitter ecosystem.


Greg, thanks for sharing your perspective. I accepted their definition as being useful, but you’re right that consumption makes more sense for the objective function. Indeed, this is implicit in the TunkRank model, since TunkRank models expected consumption rather than expected sharing.


It got a lot of attention, but hey, it’s hard to add value beyond what you already got in the post 🙂

However, after reading the paper, I got here to discuss exactly the point that @cdixon was raising on Twitter (and I hadn’t read that exchange): the SUL was a necessary step to build Twitter communities, but it also set up a “flawed” ecosystem to measure.

It’s ironic: where I live (Argentina), one of my commenters got onto the SUL by first buying followers, then stopped buying, deleted almost everyone on his “following” list, and suddenly started touting (at #SMcamps and events like that) the value of influence as follower count.

And since the paper doesn’t take the SUL effect into account, nor any way to insert some sort of TunkRank (or something like that), it ends up being naive, since “activities/engagement” are not the kind of thing you get from an SUL-list follower.


Thanks Daniel, I will talk to (the other) Daniel.

I wanted to point out that the paper describes who the followers are:

In this way, we obtained a large fraction of the Twitter follower graph comprising all active posters and anyone connected to these users via one-way directed chains of followers. Specifically, the subgraph comprised approximately 56M users and 1.7B edges.

Indeed, the authors only look for information diffusion in this network. But, without addressing the quality of followers, their finding about the “efficacy of ordinary influencers” remains questionable.

However, I wonder whether TunkRank could help. I went and checked the data for the example I showed in my first comment, and almost all senders had on average 20 friends and 200 followers. So whoever creates these account farms is being very smart in avoiding known red flags, though their consistent choice of usernames combining names and digits is a red flag in itself. But you become aware of this only when you see all the accounts together and notice that they post the same content; normal users also combine names and digits all the time.

Finally, I think I found a paper that studies a particular class of Twitter users (it will appear at ICWSM 2011), but I have yet to read it:
“Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter”



I think this is the best post I have read on Twitter influence in a long time. As a user of several analysis tools and the creator of a Twitter application, I find stuff like this really, really interesting.

I think the point about buying influence is an interesting one. I wouldn’t be surprised if people actually do this already. I often query what 50,000 followers gets you without any engagement.


The first problem Twitter needs to solve is spam. Any research that doesn’t account for that is flawed at best. I mean, how can you analyse a service assuming it has 200 million users when in reality there are only 25 million?


Mariano: thanks! And I do hope Twitter is working to make sure it doesn’t end up with an ecosystem where people buy followers just for their counts. A key part of Twitter’s value proposition, at least in my view, is that following is a meaningful act.

Gerald: thanks! I borrowed the picture from David Armano’s blog (which the post links to).

Eni: it may well be that the spammers are acting in a way that would confuse TunkRank too. I agree with StatSpotting that there’s probably no substitute for spam detection to make these measures robust. Thanks for the link, which I’ll check out.
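As a strawman for what such detection might look like: a crude heuristic could flag texts posted verbatim by many distinct accounts whose usernames follow the name-plus-digits pattern Eni describes. This is a toy sketch with a hypothetical `tweets` list of (username, text) pairs; real spam detection needs far more signals.

```python
import re
from collections import defaultdict

NAME_DIGITS = re.compile(r"^[A-Za-z]+\d+$")

def flag_coordinated(tweets, min_senders=10):
    """Flag tweet texts posted verbatim by many distinct accounts, most of
    whose usernames are a name followed by digits. Heuristic sketch only."""
    by_text = defaultdict(set)
    for user, text in tweets:
        by_text[text].add(user)
    return [
        text for text, users in by_text.items()
        if len(users) >= min_senders
        and sum(bool(NAME_DIGITS.match(u)) for u in users) / len(users) > 0.5
    ]
```

Run against the Coakley example above, a rule like this would fire; run against ordinary traffic, where identical texts rarely have that many distinct senders, it mostly wouldn't.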

Chris: thanks. And yes, there is a market for buying followers — though I have no idea how much money it’s generating. It will be interesting to see if this influence economy matures into something more useful.


Hi all,

I’m the second Daniel that Eni referred to 🙂

I still have to read the paper by Bakshy et al. carefully, but I agree that “influence” is certainly a hot topic, and also a subtle one.

We should first decide whether we are identifying influence with popularity (aka fame) or with something deeper. Besides, RTs are certainly a way of measuring (a type of) influence, but there should be other ways, especially if we are thinking of “monetizable” actions or actual outcomes in the physical world (e.g., buying a product, watching a film, voting for a candidate).

At this moment, there are plenty of studies on the topic but, unfortunately, they are a bit “atomized” (i.e. some groups don’t seem to be aware of what other groups are doing in the same area).

For interested readers I highly recommend “Influence and Passivity in Social Media” by Romero et al., which provides a very smart way of measuring influence.

Of course, I should also recommend that you read a paper of my own (to appear in Europhysics Letters this April): “De retibus socialibus et legibus momenti”

It has a slide deck as well.

BTW, Daniel Tunkelang is right: TunkRank is a pretty damn good predictor of influence.

Regards, Dani


Dani, thanks for chiming in! The definition of influence is a touchy subject, but I’m cutting the authors some slack since there are limits to what they’re able to measure. I at least think it’s useful to measure the ability to persuade people to consume or retweet a post — modulo that those should be people and not bots.

And thanks for the links to your papers! Love the Latin title on the latter one. De definitionibus potentiae non est disputandum.


[…] Create official materials and assets to reach people and get them involved – emails, agreements, buttons and banners, tweets and hashtags. Consider throughout (unless you are paying them) the motivations behind giving and sharing: altruism, enjoyment, status seeking, reputation seeking (from CR). A good approach is to get some ‘big names’ involved up front and then use them on communications to encourage others to get on board. But if you decide you need to pay, go for the middle: the most cost-effective strategy is to target those of average influence […]


It seems to me that all current attempts at valuing Twitter users, at least all that I’ve seen, miss one vital element.

Don’t demographics play a vital role in tweet shelf life?

Don’t I need to find influential persons in fields related to my passions that I tweet about in order to gain the advantages of having a relationship with someone influential?

Is this a basis of “Identifying Influencers on Twitter” that is simply taken as a given and that I’ve missed?

Also, isn’t it a bit pointless to be “Identifying Influencers on Twitter” when that person certainly won’t retweet what I have to say merely because the person is identified as an influencer? The Dalai Lama gets a lot of depth, but will the Dalai Lama ever retweet my spammy sales pitches? I think not.

I understand why a company would want to find highly influential twitter users, but is merely saying “buy Coke” stated from someone identified as influential really going to get anywhere?

It seems demographics are completely ignored, unless, again, this is a given that is oftentimes left unsaid.


Earl, thanks for the comment! As I said at the beginning of the post:

The definitions of influence and influencers are, by the authors’ own admission, narrow and arbitrary. There are many ways one could define influence, even within the context of Twitter use. But I agree with the authors that these definitions have enough verisimilitude to be useful, and their simplicity facilitates quantitative analysis.

There is research that relates Twitter influence to demographics, e.g., “Understanding the Demographics of Twitter Users”, as well as research on finding topic-sensitive influential Twitterers. But as always, there’s a trade-off between the realism of the model and the ability to use the model for analysis.


This whole line of research supports and furthers the process by which the successful become more successful, and the individual withers.
Knowing the “influencers” does not democratize the process.
Instead, it should be more obvious how to ADDRESS a tweet to a TOPIC, and to read the tweets on a TOPIC directly without searching.

Now THAT would be useful.


An assumption, a thought, and a question:
(Given that all analysis can only be done on historical data)
I believe local re-tweets attract new followers:
My follower count is likely to increase if a user sees relevant content of mine re-tweeted by someone we follow in common (or via search, Top Tweets, etc.) and chooses to follow me as a result.
Therefore, wouldn’t follower count have to be captured at the time the tweet was posted for it to be used as a measure of how successful that tweet was?

As for clicks, I stopped using to prevent details about my activist network being exposed in ways that may reveal too much information about them to hostile watchers. I can vouch for the fact that even links that attract no re-tweets can register hundreds of clicks. But it’s important to bear in mind that not all those clicks are made by humans.

When this article was written, I think Twitter had not yet made urls mandatory, and was allowing less restricted access to the search API. The changes to these and other aspects of the service since then (such as the near-elimination of RSS feeds) will have an impact on all analysis.

And for Karel: the way to address a tweet to or follow a TOPIC is still by using and following a hashtag, as far as I know.


Karel: check out if you’re interested in a topic-oriented approach to Twitter.

Anita: I’m pretty sure the Yahoo researchers captured statistics at the time of the tweets. And they used retweets, not clicks. I prefer measuring consumption using clicks, but I understand the pitfalls of the clickstream being polluted by bots or obfuscated for privacy reasons. And yes, the ability to measure any of these things depends on what data Twitter (and other social networks) makes available to researchers and practitioners.


Comments are closed.