Identifying Influencers on Twitter

Post author By Daniel Tunkelang
Post date April 16, 2011
26 Comments on Identifying Influencers on Twitter

One of the perks of working at LinkedIn is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a WSDM 2011 paper on “Identifying ‘Influencers’ on Twitter” by Eytan Bakshy, Jake Hofman, Winter Mason, and Duncan Watts. It’s great to see the folks at Yahoo! Research doing cutting-edge work in this space.

I thought I’d prepare for the discussion by sharing my thoughts here. Perhaps some of you will even be kind enough to add your own ideas, which I promise to share with the reading group.

I encourage you to read the paper, but here’s a summary of its results:

A user’s influence on Twitter is the extent to which that user can cause diffusion a posted URL, as measured by reposts propagated through follower edges in Twitter’s directed social graph.
The best predictors of future total influence are follower count and past local influence, where local influence refers to the average number of reposts by that user’s immediate followers, and total influence refers to average total cascade size.
The content features of individual posts do not have identifiable predictive value.
Barring a high per-influencer acquisition cost, the most cost-effective strategy for buying influence is to target users of average influence.

Let’s dive in a bit deeper.

The definitions of influence and influencers are, by the authors’ own admission, narrow and arbitrary. There are many ways one could define influence, even within the context of Twitter use. But I agree with the authors that these definitions have enough verisimilitude to be useful, and their simplicity facilitates quantitative analysis.

It’s hardly surprising that past influence is a strong predictor of future influence. But it might seem counterintuitive that, for predicting future total influence, past local influence is more informative than past total influence. The authors suggest the explanation that most non-trivial cascades are of depth 1 — i.e., total influence is mostly local influence. But at most that would make the two features equally informative, and total influence should still be a mildly better predictor.

I suspect that another factor is in play — namely, that the difference between local influence and total influence reflects the unpredictable and rare virality of the content (e.g., a random Facebook Question generated 4M votes). If this hypothesis is correct, then past local influence factors out this unpredictable factor and is thus a better predictor of both future local influence and future total influence.

I’m a bit surprised that follower count supplies additional informative value beyond the past local influence; after all, local influence should already reflect the extent to which the followers are being influenced. It’s possible that past influence lags the follower count, since it does not sufficiently weigh the potential contributions of more recent followers. But another possibility is one analogous to the predictive value of past local vs. global influence: past local influence may include an unpredictable content factor which follower count factors out.

Of course, I can’t help suggesting that TunkRank might be a more useful indicator than follower count. Unfortunately the authors don’t seem to be aware of the TunkRank work — or perhaps they preferred to restrict their attention to basic features.

I’m not surprised by the inability to exploit content features to predict influence. If it were easy to generate viral content, everyone would do it. Granted, a deeper analysis might squeeze out a few features (like those suggested in the Buddy Media report), but I don’t think there are any silver bullets here.

Finally, the authors consider the question of designing a cost-effective strategy to buy influence. The authors assume that the cost of buying influence can be modeled in terms of two parameters: a per-influencer acquisition cost (which is the same for each influencer) and a per-follower cost for each influencer. They conclude that, until the acquisition cost is extremely high (i.e., over 10,000 times the per-follower cost), the most cost-efficient influencers are those of average influence. In other words, there’s no reason to target the small number of highly influential users.

The authors may be arriving at the right conclusion (Watts’s earlier work with Peter Dodds, which the paper cites, questions the “influentials” hypothesis), but I’m not convinced by their economic model of an influence market. It may be the case that professional influencers are trying to peddle their followers’ attention on a per-follower basis — there are sites that offer this model.

But why should anyone believe that an influencer’s value is proportional to his or her number of followers? The authors’ own work suggests that past local influence is a more valuable predictor than follower count, and again they might want to look at TunkRank.

Regardless, I’m not surprised that a fixed per-follower cost makes users with high follower counts less cost-effective, as I subscribe to its corollary: as a user’s follower count goes up, the per-follower value diminishes. I haven’t done the analysis, but I believe that the ratio of a user’s TunkRank to the user’s follower count tends to go down as a user’s follower count goes up. A more interesting research (and practical) question would be to establish a correctly calibrated model of influencer value and then explore portfolio strategies.

In any case, it’s an interesting paper, and I look forward to discussing it with my colleagues next week. Of course, I’m happy to discuss it here in the meantime. If you’re in my reading group, feel free to chime in. And you’re not in you’re not in my reading group, consider joining. We do have openings. 🙂

By Daniel Tunkelang

High-Class Consultant.

View Archive

26 replies on “Identifying Influencers on Twitter”

First time writing here, though I’m a long-time lurker. Which brings me to something I felt is missing in the discussed paper, that is, whether the influenced users (the ones who spread the links, supposedly showing engagement with content) are real users or something else. We know how easy it is in Twitter to distribute content automatically, thus, I wonder whether it has become necessary to study who the so-called influenced are and whether there is some diversity inside that group at all. Maybe someone has done it and I am simply not aware of it, so I will appreciate pointers.

I guess the point I want to make is expressed better by the group of tweets I’m adding at the end of this post. I have dozens of such examples in my dataset from the Massachusetts Senate election in 2010 (which is relatively small, 250K tweets in one week). But I wonder whether the few deep cascades that were found by the paper are examples like this, or genuine examples of information diffusion involving real users, since their qualitative analysis included only 1000 URLs, and 20% of them were spam.

What concerns me is that the services you mentioned, who sell Twitter followers, might have been successful in creating large networks of fake users who appear normal all the time, because they don’t engage in spam, but mimic the behavior human Twitter users.

Bill987232 Sat Jan 16 15:28:04 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
coolyoung31 Sat Jan 16 15:38:05 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
xavier132 Sat Jan 16 15:48:07 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
hannah98712 Sat Jan 16 15:58:09 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Bonnie07123 Sat Jan 16 16:08:09 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
LoggedInUs Sat Jan 16 16:18:15 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
eBayItems Sat Jan 16 16:28:28 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
TheLatestStuff Sat Jan 16 16:38:35 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
GoodDay999 Sat Jan 16 16:48:24 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
DailyLog2 Sat Jan 16 16:58:26 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
HeadsUp22 Sat Jan 16 17:08:34 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Zowie34 Sat Jan 16 17:19:14 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
ThisIsIt04 Sat Jan 16 17:28:32 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Hilary12378 Sat Jan 16 17:38:46 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Frank12392 Sat Jan 16 17:48:36 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Alex1237812 Sat Jan 16 17:58:41 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
hector98 Sat Jan 16 18:18:35 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Billy0989 Sat Jan 16 18:28:50 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR
Shirley1376 Sat Jan 16 18:39:28 Coakley counts on union muscle to win Senate race (AP)http://bit.ly/6BWSXR

Share this:

Related

By Daniel Tunkelang

26 replies on “Identifying Influencers on Twitter”