
CFP: CIKM 2011 Industry Event

As I posted a few months ago, I’m organizing the Industry Event at CIKM 2011 with Tony Russell-Rose. We have a great set of keynotes lined up.

We’re also looking for submissions from industry researchers and practitioners. The submission deadline is June 21.

Here is a copy of the call for papers:

This year’s CIKM conference will include an Industry Event, which will be held during the regular conference program in parallel with the technical tracks.

The Industry Event’s objectives are twofold. The first objective is to present the state-of-the-art in information retrieval, knowledge management, databases, and data mining, delivered as keynote talks by influential technical leaders who work in industry. The second objective is to present interesting, novel and innovative industry developments in these areas.

Industry authors are invited to prepare proposals for presenting interesting, novel, and innovative ideas, and to submit them to industry@cikm2011.org by June 21, 2011. Proposals should contain (with respective lengths):

  • Short company portrait (125 words)
  • Short CV of the presenter (125 words)
  • Title and abstract of the presentation (250 words)
  • Reasons why the presentation should be interesting to the CIKM audience

When submitting a proposal, please bear in mind the following:

  • Ensure the presentation is relevant to the CIKM audience (the Call for Papers gives a good idea of the conference scope).
  • Try to highlight interesting R&D challenges in the work you present. Please do not present a sales pitch.
  • All slides will be made public (no confidential information on the slides; you will be expected to ensure your slides are approved by your company before being presented).
  • Presenters may opt to have their presentation recorded and made public; if so, they will be asked to sign a release form.

We look forward to receiving your submissions, and welcoming you to the CIKM 2011 Conference and Industry Event.

Important dates:
21 June 2011: Industry Event paper proposals due
19 July 2011: Notifications sent
27 October 2011: Industry Event
24-28 October 2011: CIKM conference



CFP: IEEE Internet Computing Special Issue on Context-Aware Computing

Pankaj Mehra and I are guest editors for an upcoming special issue of IEEE Internet Computing with the topic “Beyond Search: Context-Aware Computing”.

Here is a copy of the call for papers:

Context is the unstated actor in human communications, actions, and situations. It makes our communication efficient, our commands actionable, and our situations understandable to the people, organizations, and devices that provide us with content or services. The increased embedding of technology into our personal and social environments drives a need for context-aware computing.

Context-aware computing offers mobile Internet users an experience that goes beyond user-initiated search and location-based services. Context awareness sharpens relevance when responding to user-initiated actions (such as product search and support calls). It also enables proactive communications through analysis of a user’s behavior and environment, thereby forming the basis for key business imperatives targeting customer-engagement systems. Even greater opportunity arises from context use in systems that can make sense of and engage in customer dialogs and forums.

This special issue seeks original articles that support and illustrate context use in creating enhanced user experiences. Sample topics include

  • proactive, contextualized delivery of information, alerts, and advertisements;
  • context-mediated Web service orchestration, yielding actionable interpretation of spoken high-level commands;
  • system architecture, economics, and ecosystems for comprehensively capturing, representing, communicating, gathering, and brokering the larger user context;
  • systems of engagement that treat discourse as text plus context and process textual communication as an event in which linguistic, cognitive, and social actions converge; and
  • reasoning and knowledge representation mechanisms that use context in selecting the body of knowledge to use, the level of detail to model, and the point of view with which to communicate and interpret text and data.

All submissions must be original manuscripts of fewer than 5,000 words, focused on Internet technologies and implementations. All manuscripts are subject to peer review on both technical merit and relevance to IC’s international readership—primarily system and software design engineers. We do not accept white papers, and we discourage strictly theoretical or mathematical papers. To submit a manuscript, please log on to ScholarOne (https://mc.manuscriptcentral.com:443/ic-cs) to create or access an account, which you can use to log on to IC’s Author Center and upload your submission.

I hope some of you will submit articles in time for the June 15 deadline, and Pankaj and I look forward to reviewing them.


Identifying Influencers on Twitter


One of the perks of working at LinkedIn is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a WSDM 2011 paper on “Identifying ‘Influencers’ on Twitter” by Eytan Bakshy, Jake Hofman, Winter Mason, and Duncan Watts. It’s great to see the folks at Yahoo! Research doing cutting-edge work in this space.

I thought I’d prepare for the discussion by sharing my thoughts here. Perhaps some of you will even be kind enough to add your own ideas, which I promise to share with the reading group.

I encourage you to read the paper, but here’s a summary of its results:

  • A user’s influence on Twitter is the extent to which that user can cause diffusion of a posted URL, as measured by reposts propagated through follower edges in Twitter’s directed social graph.
  • The best predictors of future total influence are follower count and past local influence, where local influence refers to the average number of reposts by that user’s immediate followers, and total influence refers to average total cascade size (both measures are illustrated in the sketch after this list).
  • The content features of individual posts do not have identifiable predictive value.
  • Barring a high per-influencer acquisition cost, the most cost-effective strategy for buying influence is to target users of average influence.
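
To make those definitions concrete, here’s a toy illustration (mine, not the authors’ code) of the two measures on a single repost cascade, represented as a mapping from each user to the followers who reposted directly from them:

```python
# Hypothetical cascade for one posted URL: user -> followers who reposted
# directly from that user. The paper averages these measures over a user's
# posts; this sketch computes them for a single cascade.
cascade = {
    "seed": ["a", "b", "c"],  # depth-1 reposts by the seed's immediate followers
    "a": ["d", "e"],          # depth-2 reposts
    "b": [],
    "c": ["f"],
    "d": [], "e": [], "f": [],
}

def local_influence(user, cascade):
    """Reposts by the user's immediate followers (depth 1 only)."""
    return len(cascade.get(user, []))

def total_influence(user, cascade):
    """Total cascade size: reposts at all depths seeded by the user."""
    total, frontier = 0, list(cascade.get(user, []))
    while frontier:
        reposter = frontier.pop()
        total += 1
        frontier.extend(cascade.get(reposter, []))
    return total

print(local_influence("seed", cascade))  # 3
print(total_influence("seed", cascade))  # 6
```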

Let’s dive in a bit deeper.

The definitions of influence and influencers are, by the authors’ own admission, narrow and arbitrary. There are many ways one could define influence, even within the context of Twitter use. But I agree with the authors that these definitions have enough verisimilitude to be useful, and their simplicity facilitates quantitative analysis.

It’s hardly surprising that past influence is a strong predictor of future influence. But it might seem counterintuitive that, for predicting future total influence, past local influence is more informative than past total influence. The authors suggest the explanation that most non-trivial cascades are of depth 1 — i.e., total influence is mostly local influence. But at most that would make the two features equally informative, and total influence should still be a mildly better predictor.

I suspect that another factor is in play — namely, that the difference between local influence and total influence reflects the unpredictable and rare virality of the content (e.g., a random Facebook Question generated 4M votes). If this hypothesis is correct, then past local influence factors out this unpredictable factor and is thus a better predictor of both future local influence and future total influence.
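
If the hypothesis is right, a toy simulation (my construction, not anything from the paper) shows why it would matter: model total influence as a stable local component plus a rare, heavy-tailed viral bonus, and past local influence ends up correlating better with future total influence than past total influence does, because the viral noise in past totals predicts nothing.

```python
# Toy simulation of the "virality as noise" hypothesis above.
# total = local + rare heavy-tailed viral bonus; the bonus is unpredictable,
# so past local influence better predicts future total influence.
import random

random.seed(0)
latent = [random.uniform(1, 50) for _ in range(10_000)]  # stable local influence

def total(l):
    viral = 200 * random.paretovariate(2.5) if random.random() < 0.02 else 0.0
    return l + viral

past_local = latent                       # local influence: no viral component here
past_total = [total(l) for l in latent]
future_total = [total(l) for l in latent]

def corr(xs, ys):
    """Pearson correlation, stdlib only."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(corr(past_local, future_total))  # higher: viral noise factored out
print(corr(past_total, future_total))  # lower: past viral noise is uninformative
```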

I’m a bit surprised that follower count supplies additional predictive value beyond past local influence; after all, local influence should already reflect the extent to which the followers are being influenced. It’s possible that past influence lags the follower count, since it does not sufficiently weigh the potential contributions of more recent followers. But another possibility is analogous to the local vs. total comparison above: past local influence may include an unpredictable content factor that follower count factors out.

Of course, I can’t help suggesting that TunkRank might be a more useful indicator than follower count. Unfortunately the authors don’t seem to be aware of the TunkRank work — or perhaps they preferred to restrict their attention to basic features.
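
For readers unfamiliar with it: TunkRank scores a user X as the sum, over X’s followers Y, of (1 + p · TunkRank(Y)) / |following(Y)|, where p is the probability that a follower retweets what they read. Here’s a minimal sketch of the fixed-point iteration — toy scale, not a production implementation — assuming a small in-memory follower graph in which every account appears as a key:

```python
# Minimal TunkRank sketch. `follows` maps each user to the set of accounts
# that user follows; every account is assumed to appear as a key.
def tunkrank(follows, p=0.05, iterations=50):
    # Invert the graph: followers[x] = users who follow x.
    followers = {u: [] for u in follows}
    for y, followees in follows.items():
        for x in followees:
            followers[x].append(y)
    rank = {u: 1.0 for u in follows}
    for _ in range(iterations):
        rank = {
            x: sum((1.0 + p * rank[y]) / max(len(follows[y]), 1)  # guard /0
                   for y in followers[x])
            for x in follows
        }
    return rank

# Example: c follows a and b; b follows a. a ends up most influential.
print(tunkrank({"a": set(), "b": {"a"}, "c": {"a", "b"}}))
```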

I’m not surprised by the inability to exploit content features to predict influence. If it were easy to generate viral content, everyone would do it. Granted, a deeper analysis might squeeze out a few features (like those suggested in the Buddy Media report), but I don’t think there are any silver bullets here.

Finally, the authors consider the question of designing a cost-effective strategy to buy influence. The authors assume that the cost of buying influence can be modeled in terms of two parameters: a per-influencer acquisition cost (which is the same for each influencer) and a per-follower cost for each influencer. They conclude that, unless the acquisition cost is extremely high (i.e., over 10,000 times the per-follower cost), the most cost-efficient influencers are those of average influence. In other words, there’s no reason to target the small number of highly influential users.
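
To make the economics concrete, here’s a back-of-the-envelope sketch of that model as I read it (the function and the example numbers are mine, not the authors’): rank influencers by expected influence divided by acquisition cost plus per-follower cost times follower count.

```python
# Sketch of the two-parameter cost model described above (my reading, not the
# authors' code): influencer i costs c_a + c_f * followers_i, and we rank by
# expected influence per unit cost.
def rank_by_cost_effectiveness(influencers, c_a, c_f):
    """influencers: iterable of (name, expected_influence, follower_count)."""
    return sorted(
        influencers,
        key=lambda t: t[1] / (c_a + c_f * t[2]),
        reverse=True,
    )

# With a modest acquisition cost, the average user wins on influence per dollar:
# 2 / (1 + 3) = 0.5 vs. 10_000 / (1 + 50_000) ~= 0.2.
candidates = [("celebrity", 10_000, 5_000_000), ("average_user", 2, 300)]
print(rank_by_cost_effectiveness(candidates, c_a=1.0, c_f=0.01))
```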

The authors may be arriving at the right conclusion (Watts’s earlier work with Peter Dodds, which the paper cites, questions the “influentials” hypothesis), but I’m not convinced by their economic model of an influence market. It may be the case that professional influencers are trying to peddle their followers’ attention on a per-follower basis — there are sites that offer this model.

But why should anyone believe that an influencer’s value is proportional to his or her number of followers? The authors’ own work suggests that past local influence is a more valuable predictor than follower count, and again they might want to look at TunkRank.

Regardless, I’m not surprised that a fixed per-follower cost makes users with high follower counts less cost-effective, as I subscribe to its corollary: as a user’s follower count goes up, the per-follower value diminishes. I haven’t done the analysis, but I believe that the ratio of a user’s TunkRank to the user’s follower count tends to go down as a user’s follower count goes up. A more interesting research (and practical) question would be to establish a correctly calibrated model of influencer value and then explore portfolio strategies.

In any case, it’s an interesting paper, and I look forward to discussing it with my colleagues next week. Of course, I’m happy to discuss it here in the meantime. If you’re in my reading group, feel free to chime in. And if you’re not in my reading group, consider joining. We do have openings. 🙂


Social Utility, +/- 25%

I like Google…

I’ve been a regular Google user since the day I first discovered its existence in 1999. Indeed, I’ve consistently found Google to be the most useful service on the web. That’s not love, but it’s a very strong +1.

Moreover, I’d say that my preference for Google is an informed one. I’ve given all of the major search engines a fair chance, and even tried a fair number of obscure ones. They all have their strengths, but none have delivered enough utility to me to justify the cognitive load of using more than one search engine for the open web.

…but I don’t need Google.

Nonetheless, I know that, if Google disappeared tomorrow or became inconvenient to access, I’d be content with one of its competitors. I have no particular investment in Google beyond brand loyalty.

Actually, that’s not entirely true. I could easily walk away from Google search, but I’d be apoplectic if I suddenly lost access to my Gmail account — much as if I lost access to my LinkedIn or Twitter accounts. Indeed, Gmail is the only way in which Google has me locked in, but I don’t see my Gmail account as entangled with my access to Google’s other services.

Perhaps that’s not a bug but a feature: after all, Google trumpets the virtues of “open” and the portability of user data (including Gmail) through the Data Liberation Front. Nonetheless, it’s no secret that Google has a major case of Facebook envy. And if rumors hold, Google is now making the success of its social strategy a major component in all employee compensation.

Social is Give to Get.

Google critics often assert that Google doesn’t get social. But I think the problem isn’t so much with what Google gets as what it gives. When it comes to social, you have to give to get. That is, to get data and engagement, you have to provide social utility.

To start off, Google would love to know who you are. That’s why it developed Google Profiles in 2007. People are more than willing to provide data about who they are, as proven by the hundreds of millions of people who create profiles on Facebook and LinkedIn. Perhaps Google was a little bit late to the game. More likely, people didn’t see enough utility in creating Google profiles. Facebook, on the other hand, helps people be found by their friends and family in a context designed for social interaction. LinkedIn offers people the opportunity to be found by people who can help them professionally: colleagues, classmates, potential employers, etc. Google didn’t give people much reason to invest effort — in fact, it seems to treat Profiles as a dumping ground populated by Google’s other products rather than a valuable piece of online real estate embedded in a living social context. Not surprisingly, users invest their efforts elsewhere.

Google would also love to know where you are and where you’ve been — that’s why Google created Latitude in 2009. Moreover, Google developed this pioneering location-based service as a complement to Google Maps, perhaps the best product Google has produced outside of search. Given its dominance in mapping services, directions, and local search, Google should be the leader of all things local. And yet, while Latitude has flopped, Foursquare — which launched in the same year as a tiny startup, after Google acquired and shut down its previous incarnation — succeeded in defining location-based services as a category. Before Foursquare, the idea of a service tracking your location was one that most of us associated with LoJack and Big Brother — if not with modern totalitarian regimes. Yet, by making a game out of “checking in” to venues, Foursquare inspired its users to willingly — and eagerly! — share and publish their whereabouts. It’s unclear whether this model will create sustained interest (cf. Mark Watkins’s analysis at ReadWriteWeb), but Foursquare’s success thus far is predicated on its offering social utility in exchange for data and attention.

Of course, Google also wants to know what you like. That’s why Google developed SearchWiki (RIP), Hotpot (now merged into Places), and most recently +1. As Amazon, Facebook, Netflix, and Yelp have demonstrated, people aren’t shy about sharing their opinions publicly, given the right social context and utility. Unfortunately, Google seems to struggle with that last part. Google embedded SearchWiki in the non-social context of search — and has launched +1 the same way. It’s not at all clear what users would gain by going out of their flow to annotate search results. Hotpot may simply be a case of too little, too late — people are already trained to go to Yelp and Facebook Fan pages for subjective information about service businesses. Overall, Google has not given users a reason to believe there is significant return on their investment in sharing opinions.

Collecting Data Doesn’t Count.

Of course, Google is able to collect a significant amount of data about users’ identities through their search history, cookies, browser toolbars, and purchase history (if they use Google Checkout). Indeed, it is Google’s inference of user intent in search queries that has allowed Google to become the poster child of online advertising.

But collecting data is not the same as having the user volunteer it. Most users have a transactional relationship with Google, tolerating data collection and advertising in exchange for a free service. Google wants more — it wants users to invest in identities associated with their Google accounts. But Google doesn’t seem to understand that users won’t make these investments unless they receive some social or professional utility in return.

If it’s true that Larry Page is making “social” Google’s top OKR, then I hope for the sake of my former colleagues that Google has learned from its past experiments.


Guest Blog: Data 2.0 Conference Report


Note: This post was written by Scott Nicholson, a Senior Data Scientist at LinkedIn. Scott is a data and modeling geek with a passion for startups, product, and user experience. His work at LinkedIn focuses on analyzing and improving user engagement and monetization.

I’m happy to report back on my experience at the Data 2.0 conference, an event organized by midVentures and targeted at entrepreneurs building products to leverage the dramatic increase in publicly and privately collected data. The conference had four main themes: what data is available, how to obtain data, how to store and access data, and how to create value from data products. For data nerds or hackers, the conference offered a delightful stream of “you know what would be cool…” ideas.

The morning started off on a strong foot with a talk by Vivek Wadhwa on how data is going to define the next generation of successful startups in a new information age. He observed that data previously restricted to offline access (or no access at all) is increasingly available online. He also emphasized the importance of new sources of data, such as medical records and genome data. We need to think of social uses of data beyond Twitter, Facebook, and LinkedIn: for example, genome data will allow us to connect to each other in ways that help us better understand our similarities and differences. Meanwhile, some existing data sources will become increasingly open and available to all. Wadhwa stressed the importance of leveraging open sources of federal, state, and local government data to come up with alternatives to the closed and clunky legacy systems that governments use to generate data reports (a pity that data.gov and related programs may be defunded — DT).

The morning keynote segued nicely into the panel on open data sources. Jay Nath, Director of CRM for the city of San Francisco, noted that, while many applications are using government data and APIs, they mostly address consumer convenience (e.g., public transit apps) rather than government efficiency. Panelists agreed that government employees have few incentives to take risks on new technology: legacy systems might be expensive, inflexible, and inefficient, but they do perform their limited function. Alluding to Eric Ries’s idea of a “lean startup”, Nath suggested the concept of a “lean government” that lowered costs, sped up its operations, and avoided procurement processes by using open source technology — all in the context of providing services to its citizens.

The inspiring mid-day keynote by former Amazon Chief Scientist Andreas Weigend took a different perspective from the morning sessions: he focused on how data sharing can provide tangible value to end users, even resulting in significant behavior change. He cited products like tweeting weight scales, Fitbit, and Nike+ that allow people to share data about their fitness efforts, thus leading to social reinforcement for positive behaviors. I personally see this area as a great example of where data scientists and engineers can create enormous economic value and increase people’s welfare.

The day also featured various product launches and presentations. Here are a few that caught my attention:

  • Micello: Google Maps for indoors. They won the startup competition held in conjunction with the conference.
  • Tropo: an API for voice calls and SMS.
  • DataStax Brisk: a new Hadoop distribution, powered by Cassandra, that unifies Hadoop, Hive, and Cassandra.
  • Neer: an always-on location-awareness app from Qualcomm for privately sharing location with groups and families.
  • Heritage Health Prize: a $3MM prize for predictive modeling of who will require hospitalization (a follow-up on their announcing the prize at Strata).

Overall, it was great to see hundreds of people exploring innovations and opportunities to use data to improve business, technology and society.