One of the highlights of the recent Data 2.0 Summit was a panel featuring:
- Alexander Gray, CTO of Skytree
- Anthony Goldbloom, CEO of Kaggle
- Josh Wills, Director of Data Science at Cloudera
The panel was supposed to focus on “Data Science and Predicting the Future”, but the most contentious topic was whether data, algorithms, or people (that is, the data scientists themselves) matter most to the practice and success of data science.
Yes, we one-upped the debate that my colleague Monica Rogati instigated at this year’s Strata conference. In fact, Josh cited the “better data beats more data beats clever algorithms” argument that Monica made in her own Strata presentation. And, just like at Strata, there was a healthy dose of audience participation.
Of course, I came down on the side of data — which I believe won the debate hands down.
I’m a fan of clever algorithms, which Alexander had to defend given that Skytree’s core value proposition is better machine learning algorithms delivered at scale. But I’m with Peter Norvig et al. on the dominance of data over algorithms.
Favoring data over people was a harder choice. Anthony naturally made the case for people (Kaggle’s claim to fame is assembling many of the world’s best data scientists by organizing competitions). Hopefully my team won’t quit en masse when they read this blog post! But I think they’ll agree with me that, without the incredible data we work with at LinkedIn, they’d be unable to deliver the awesomeness that I’ve come to expect from them.
There’s a saying that we all cook from the same cookbooks, so it’s the ingredients that make all the difference. To take the metaphor further, you can also try to poach your rival’s chefs. But data is the biggest barrier to entry, and the most sustainable competitive advantage.
Of course, we should have the best people apply the best algorithms to work with the best data. But data comes first. The best meal starts with the best ingredients.