
Blogs I Read: Chris Dixon (cdixon.org)

I’ve started reading a few different blogs over the past few months, and one that I particularly like is Chris Dixon’s, which has the simple (if uncreative) title cdixon.org.

Chris has an interesting history that includes heading R&D at a hedge fund, co-founding SiteAdvisor, investing in a number of technology companies (including Skype and Postini), and most recently co-founding Hunch (which I’ve blogged about here a few times). As a karaoke junkie, I can’t help noting that he developed the software that became MySpace Karaoke.

Not surprisingly, Chris brings the combined perspective of an investor and a technologist to his blog. Here are some examples of recent posts that illustrate his range.

Thoughts on machine learning:

Career advice for entrepreneurs:

And of course he occasionally blogs about Hunch, his current venture.

Chris has a strong personality that comes through as a blogger. I think that’s critical for making a blog both informative and entertaining, and I try to channel my own personality (which I’m told, for better or worse, is quite distinctive) through this blog.

In short, check out cdixon.org if you’re interested in the perspective of a practical (and successful) technologist-entrepreneur.

By Daniel Tunkelang

High-Class Consultant.

5 replies on “Blogs I Read: Chris Dixon (cdixon.org)”

To make smarter systems, it’s all about the data

Do you believe it?

Personally, I think statements like this are true for certain limited domains or application types: home page finding and factoid lookup, for example, are areas where large data is very useful in web search.

But when your information need is exploratory, I have a difficult time seeing how large data will help. By definition, you have an information need or task that is orthogonal to the direction that the large data is pointing. Exploratory search runs against the large data grain, not with it.


I think it’s worth quoting from the post: “significant AI breakthroughs come from identifying or creating new sources of data, not inventing new algorithms.” He’s not arguing that data scale is everything, but rather that looking at the right data trumps picking the right data mining algorithm. And I’m generally in agreement with him there.


looking at the right data trumps picking the right data mining algorithm

Just to clarify: We’re not talking about good feature selection, are we? (i.e. just picking the right attributes out of whatever data we currently have.)

And we’re also not talking about intelligently initialized machine learning algorithms (i.e. what some might call “structured” learning, or initializing your learning algorithm with domain-dependent knowledge so as to guide it toward the best task-specific models.)

Instead, we’re talking about simply identifying new sources of data? Is this correct?


Well, I obviously can’t speak for Chris, but I think he means discovering data you haven’t been using, rather than improving the feature selection on the data that you have been.

In particular, I read his example of PageRank as arguing that Google’s improvement over the dominant approaches it displaced came from introducing the use of links, rather than from simply assigning more weight to them relative to other factors.

Another way of looking at this is that it’s often more important to pick a good objective function (and particularly the right inputs) than to choose the best algorithm for optimizing relative to that objective function.
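The objective-function point can be illustrated with a toy sketch (entirely hypothetical data, not drawn from Chris’s post): if the label is driven by a signal your model never sees, no amount of algorithmic sophistication helps, while a trivial threshold rule on the right input does nearly all the work.

```python
import random

random.seed(0)

# Synthetic data: the label depends entirely on a "hidden" signal b;
# feature a is pure noise.
data = []
for _ in range(1000):
    a = random.random()          # irrelevant feature (noise)
    b = random.random()          # informative feature
    label = 1 if b > 0.5 else 0  # ground truth driven by b alone
    data.append((a, b, label))

# "Best algorithm, wrong input": any model restricted to a alone can do
# no better than guessing the majority class.
majority = 1 if sum(label for _, _, label in data) >= 500 else 0
acc_wrong_input = sum(label == majority for _, _, label in data) / len(data)

# "Trivial algorithm, right input": a one-line threshold on b is perfect
# on this toy data.
acc_right_input = sum((1 if b > 0.5 else 0) == label
                      for _, b, label in data) / len(data)

print(acc_wrong_input, acc_right_input)
```

Of course the toy is rigged by construction; the point is only that discovering the informative input moves accuracy far more than swapping optimizers ever could.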


Another way of looking at this is that it’s often more important to pick a good objective function (and particularly the right inputs) than to choose the best algorithm for optimizing relative to that objective function.

I… yes, OK. If we’re only making that narrow a claim, I think I might be able to get on board.

I still have a nagging feeling, though. Think about it this way: what is the difference between (a) choosing the right input to your learning algorithm, and (b) choosing the right constraint within the learning algorithm?

Everyone else seems to be expressing a preference for (a). But I think the two are equivalent, and that each has its own advantages and disadvantages.

I think I’ll blog about it tomorrow, using a music information retrieval example. Will let you know when it’s up.

