I’ve started reading a few different blogs in the past months, and one that I particularly like is Chris Dixon’s, which has the simple (if uncreative) title cdixon.org.
Chris has an interesting history that includes heading R&D at a hedge fund, co-founding SiteAdvisor, investing in a number of technology companies (including Skype and Postini), and most recently co-founding Hunch (which I’ve blogged about here a few times). As a karaoke junkie, I can’t help noting that he developed the software that became MySpace Karaoke.
Not surprisingly, Chris brings the combined perspective of an investor and a technologist to his blog. Here are some examples of recent posts that illustrate his range.
Thoughts on machine learning:
- To make smarter systems, it’s all about the data
- Machine learning is really good at partially solving just about any problem
Career advice for entrepreneurs:
- The worst time to join a startup is right after it gets initial VC financing
- Why you shouldn’t keep your startup idea secret
And of course he occasionally blogs about Hunch, his current venture.
Chris has a strong personality that comes through as a blogger. I think that’s critical for making a blog both informative and entertaining, and I try to channel my own personality (which I’m told, for better or worse, is quite distinctive) through this blog.
In short, check out cdixon.org if you’re interested in the perspective of a practical (and successful) technologist-entrepreneur.
5 replies on “Blogs I Read: Chris Dixon (cdixon.org)”
To make smarter systems, it’s all about the data
Do you believe it?
Personally, I think statements like this are true for certain limited domains or application types, such as home-page finding and factoid lookup: areas where large data is very useful in web search.
But when your information need is exploratory, I have a difficult time seeing how large data will help. By definition, you have an information need or task that is orthogonal to the direction that the large data is pointing. Exploratory search runs against the large data grain, not with it.
I think it’s worth quoting from the post: “significant AI breakthroughs come from identifying or creating new sources of data, not inventing new algorithms.” He’s not arguing that data scale is everything, but rather that looking at the right data trumps picking the right data mining algorithm. And I’m generally in agreement with him there.
looking at the right data trumps picking the right data mining algorithm
Just to clarify: We’re not talking about good feature selection, are we? (i.e. just picking the right attributes out of whatever data we currently have.)
And we’re also not talking about intelligently-initialized machine learning algorithms (i.e. what some might call “structured” learning, or initializing your learning algorithm with domain-dependent knowledge so as to guide the learning algorithm into the best task-specific models.)
Instead, we’re talking about simply identifying new sources of data? Is this correct?
Well, I obviously can’t speak for Chris, but I think he means discovering data you haven’t been using, rather than improving the feature selection on the data that you have been.
In particular, I read his example of PageRank as arguing that Google's improvement over the dominant approaches it displaced was its introduction of links as a data source, rather than simply its assigning more weight to them relative to other factors.
Another way of looking at this is that it’s often more important to pick a good objective function (and particularly the right inputs) than to choose the best algorithm for optimizing relative to that objective function.
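The PageRank example above can be made concrete with a toy sketch. This is not Google's implementation, just a minimal power-iteration version (the graph, damping factor, and iteration count are all illustrative) showing how link structure alone, with no text features at all, yields a ranking signal:

```python
# Toy power-iteration PageRank: a sketch of treating link structure
# as a data source in its own right. All values here are illustrative.
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform rank
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:  # dangling page: spread its rank uniformly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:  # otherwise split its rank among the pages it links to
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
        rank = new_rank
    return rank

# A page that many others link to ("c" here) ends up ranked highest,
# regardless of anything about its own content.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

The point of the sketch is that the interesting step is deciding to feed links into the objective at all; the power-iteration solver itself is almost incidental.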
Another way of looking at this is that it’s often more important to pick a good objective function (and particularly the right inputs) than to choose the best algorithm for optimizing relative to that objective function.
I… yes, OK. If the claim is that narrow, I think I can get on board.
I still have a nagging feeling, though. Think about it this way: what is the difference between (a) choosing the right input to your learning algorithm and (b) choosing the right constraint on that input within the learning algorithm?
Everyone else seems to be expressing a preference for (a), but I think the two are equivalent, just with different advantages and disadvantages.
I think I’ll blog about it tomorrow, using a music information retrieval example. I’ll let you know when it’s up.