The Noisy Channel


Innovation at Huffington Post: Data-Driven Headlines

October 15th, 2009 · 31 Comments · General

The other day, I was suggesting to one of my colleagues that Endeca’s software could help authors write better (read: more SEO-friendly) headlines. The details of that discussion are proprietary, but I’m sure you can imagine the gist. But we all wondered whether authors would be willing to stomach such a left-brain infringement on their right-brain creativity.

But apparently the Huffington Post is blazing new trails in this area. The Nieman Journalism Lab reports that:

The Huffington Post applies A/B testing to some of its headlines. Readers are randomly shown one of two headlines for the same story. After five minutes, which is enough time for such a high-traffic site, the version with the most clicks becomes the wood that everyone sees.
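The procedure in that quote is simple enough to sketch. What follows is a minimal, hypothetical Python illustration, not the Huffington Post's actual system: the class name `HeadlineABTest`, the 50/50 random split, the simulated click rates, and the tie-breaking rule are all assumptions. Only the two-headline split, the five-minute window, and the most-clicks winner come from the quote.

```python
import random
import time
from typing import Optional


class HeadlineABTest:
    """Hypothetical two-headline A/B test: assign readers at random, then
    pick the headline with the most clicks once a fixed window has elapsed."""

    def __init__(self, headline_a: str, headline_b: str,
                 window_seconds: float = 300.0,  # five minutes, per the quote
                 seed: Optional[int] = None,
                 start_time: Optional[float] = None):
        self.headlines = [headline_a, headline_b]
        self.clicks = [0, 0]
        self.impressions = [0, 0]
        self.window_seconds = window_seconds
        self.rng = random.Random(seed)
        self.start_time = time.time() if start_time is None else start_time

    def serve(self) -> int:
        """Randomly choose variant 0 or 1 for a reader and count the impression."""
        variant = self.rng.randrange(2)
        self.impressions[variant] += 1
        return variant

    def record_click(self, variant: int) -> None:
        """Record a click on the headline shown as `variant`."""
        self.clicks[variant] += 1

    def winner(self, now: Optional[float] = None) -> Optional[str]:
        """Return the winning headline once the window has elapsed, else None.
        Ties go to headline A (an arbitrary assumption)."""
        now = time.time() if now is None else now
        if now - self.start_time < self.window_seconds:
            return None
        best = 0 if self.clicks[0] >= self.clicks[1] else 1
        return self.headlines[best]


# Usage: simulate traffic with made-up click-through rates.
test = HeadlineABTest("Software Patents: A Personal Story",
                      "Dan's Friend Screwed by Trolls",
                      seed=42, start_time=0.0)
click_prob = [0.02, 0.05]  # hypothetical per-headline click rates
sim = random.Random(7)
for _ in range(10_000):
    v = test.serve()
    if sim.random() < click_prob[v]:
        test.record_click(v)
print(test.winner(now=60.0))   # None: still inside the five-minute window
print(test.winner(now=300.0))  # the headline with more clicks
```

Because readers are split at random, the two headlines get roughly equal impressions, so raw click counts are a reasonable proxy here. With uneven traffic, comparing click-through rates (clicks divided by impressions) would be the safer choice.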

NJL also reports that Huffington Post social media editor–and long-time Noisy Channel reader–Josh Young uses Twitter to help crowd-source better headlines.

I’m sure this approach must rattle some old-school journalists. And there is a real danger of optimizing for the wrong outcome. For example, including the word “sex” in this message might improve its traffic (the popularity of this post attests to that), but to what end?

Still, I don’t see this use of technology as cramping anyone’s style. Most of us write to be read–especially those in the media industry who are trying to monetize their audiences. Measurable success matters, and there’s no harm in trying to maximize it.

31 responses so far ↓

  • 1 Bob Carpenter // Oct 15, 2009 at 7:09 pm

    Are you sure you want to trade readers of “Software Patents: A Personal Story” for readers of “Dan’s Friend Screwed by Trolls”?

    Don’t all the big web companies do this kind of alternative exploration all the time? For instance, check out this description of Google News ranking.

    I think evaluating a title for SEO sexiness would be relatively easy, though there are lots of moving parts to control for, and the target’s not stationary (e.g. “Brangelina”’s cachet might expire).

    Generating headlines from articles, or even paraphrasing existing headlines, is much more poorly understood. That is, assuming you want the title to match the article; if you’re just a spammer, searching over existing titles is probably enough.

    You could use English-to-English machine translation, but the tech’s pretty sketchy, and certainly won’t be up to the New York Post’s standard.

  • 2 Daniel Tunkelang // Oct 15, 2009 at 10:22 pm

    Bob, from now on I’ll let you write titles for me! And you’re right that evaluating a title for SEO is tractable–the more interesting challenge is not just evaluation but guidance. The New York Post, of course, is the gold standard.

  • 3 Kevin Burton // Oct 17, 2009 at 5:42 pm

    A number of our customers are working in some similar areas.

    Tailrank’s clustering engine (way back in the day) had the ability to determine the cluster title by looking at the term distribution of the cluster text and then selecting the most mathematically appropriate title.

  • 4 Daniel Tunkelang // Oct 18, 2009 at 3:33 pm

    Kevin, centroid approaches (like this one) make a lot of sense for summarization. But I think that playing the SEO game effectively requires more inputs and more work.

  • 5 jeremy // Oct 19, 2009 at 2:26 pm

    I can’t help the feeling that all this focus on SEO and completely data-driven decision-making is going to lead us into.. well.. mediocrity.

    It reminds me of this great Onion parody: “Live Poll Allows Pundits to Pander to Viewers in Real Time”

    While it is still a parody at this point, every joke reveals an underlying truth: data-driven headline selection is a step in the direction of real-time pandering.

    And is this really the social incentive structure that we’re trying to create?

  • 6 Kevin Burton // Oct 19, 2009 at 2:28 pm

    Hey Jeremy… I think you’re right but I also think it’s inevitable :-/

  • 7 Daniel Tunkelang // Oct 19, 2009 at 2:36 pm

    Pandering is such a loaded word–and we hardly need automated techniques to incent mediocrity. I agree that there’s a real danger of implementing the dystopia envisioned in Epic 2014 / 2015, but let’s not confuse the implementation with the vision it is trying to implement. Many publishers knowingly act in ways that forgo opportunities to broaden their audiences. I’m sure that will continue, regardless of the analytical tools available to them.

  • 8 jeremy // Oct 19, 2009 at 4:56 pm

    Well, “pandering” was the Onion’s choice of words, not mine.

    But it’s indeed the vision, rather than the implementation, that makes me a little squeamish. The vision is the changing of a product to make it have more mass market appeal. The vision is self-alteration to a lowest common denominator. Correct?

  • 9 jeremy // Oct 19, 2009 at 5:04 pm

    ..that is to say, we’re dealing with a fundamental tradeoff here, regardless of the implementation. That tradeoff is “broad appeal but generic” vs. “narrow appeal and specific”.

    The decision to create something new and valuable is inherently a decision to move away from the broad and generic, and toward the specific and interesting.

    The decision to increase one’s market share is inherently a decision to move in the opposite direction, toward the broad and generic.

    I would argue that one of the biggest problems in news today is not that it has either a liberal or a conservative bias (plenty of both), but that it tries to be too broad and therefore too generic. And it never ends up tackling real issues, for fear of losing mass market appeal.

    So again, is this the vision that we really want for society? A continued move toward broad, bland, greater-appeal but less-substance information?

  • 10 Daniel Tunkelang // Oct 19, 2009 at 5:06 pm

    In the specific case of the Huffington Post, we are just talking about the headlines, right? :-)

    But your point is well taken–the quest for popularity at the expense of all other values is one that does lead to mediocrity, if not outright idiocracy. But the other extreme of ignoring the audience’s preferences as input leads to pathological narcissism. In fact, I’d say that we see that problem in some scholarly circles–but that’s a subject for a blog post in its own right.

    In any case, many writers do want, subject to some constraints, to optimize for readership goals. Those goals may be more specific than maximizing the number of readers–typically, some readers are more important to the writer than others. And if writers are consciously trying to optimize their writing for audience response, I at least see no further harm in providing writers with the tools to do so efficiently.

  • 11 Daniel Tunkelang // Oct 19, 2009 at 5:09 pm

    I wrote my comment before seeing your latest one–so let me just add one point: pandering doesn’t necessarily mean going broad. Part of the dystopian vision of Epic 2014/5 is that news is personalized to conform to the reader’s biases. That is the ultimate pandering!

  • 12 jeremy // Oct 19, 2009 at 5:13 pm

    But the other extreme of ignoring the audience’s preferences as input leads to pathological narcissism.

    Oh, true. Good point.

    Part of the dystopian vision of Epic 2014/5 is that news is personalized to conform to the reader’s biases.

    Hmmm.. would we call this “micro-generic”? :-)

    Yes, obviously we don’t want to ignore our customers completely. That’s not good business, as well as narcissistic, as you say. At the same time, we don’t want to only do/follow the gradient defined by the current customer base, because that shows a lack of vision and guts.

    I guess our conclusion is that it’s a fine line?

  • 13 Daniel Tunkelang // Oct 19, 2009 at 5:20 pm

    How, um, generic a conclusion! What I’d really like is predictive analytics. For example, I’m sure I could increase traffic to this blog in a number of ways that would be ultimately counterproductive (e.g., by posting porn). And I certainly wouldn’t want any automated SEO to change my headlines in such a way that I attracted readers I’d inevitably disappoint.

    Put another way–it’s not just about eyeballs; it’s about the conversion rate. The difference is that, like many writers, I’m not selling products but evangelizing ideas. Still, I’m all about maximizing conversion!

  • 14 jeremy // Oct 19, 2009 at 5:27 pm

    How, um, generic a conclusion!

    Generic, or timeless?

    Still, I’m all about maximizing conversion!

    But in the news cycle, eyeballs is your number of viewers, while conversion is how much you get people talking about what you’ve told them. And the quickest route to that goal is still sensationalism. Glenn Beck does a fantastic job at maximizing conversion, but I hardly think that is a style that we should be seeking to emulate, regardless of our politics.

  • 15 jeremy // Oct 19, 2009 at 5:29 pm

    but I hardly think that is a style that we should be seeking to emulate, regardless of our politics.

    … too literal in its interpretations, too black and white in its conclusions, not nuanced enough in its headlines.

    In short, it’s a perfect SEO style, is it not?

  • 16 Daniel Tunkelang // Oct 19, 2009 at 5:39 pm

    Was kidding about it being generic, but I agree that timeless is a less generic term than generic. :-)

    As for whether Glenn Beck (or Michael Moore) is a role model, I’m torn. As much as I’d like to see rational and civil discussion as the foundation for how we inform and persuade one another, it’s hard not to be jealous of successful evangelists. The end may not justify the means, but it certainly poses interesting moral dilemmas.

    For example, consider the recent debate over vaccination. Perhaps doctors could be more effective at persuading the general public–and thus in actually saving lives–if they employed more sensationalist tactics. In the long term, that would probably cost them credibility–although people have frightfully short-term memories.

    Perhaps journalistic integrity should trump all other considerations. I’m not that much of a purist. But then again, I wouldn’t consider myself a journalist.

  • 17 jeremy // Oct 19, 2009 at 6:23 pm

    I haven’t read this book, but I would eventually like to get to it:

    Take Postman’s hypothesis:

    “The presentation most often de-emphasizes quality; all data becomes burdened to the far-reaching need for entertainment.”

    and replace “entertainment” with “SEO”:

    “The presentation most often de-emphasizes quality; all data becomes burdened to the far-reaching need for SEO.”

    I am certainly not a lone or even unique voice in raising these questions. Others have done it more eloquently. Still, this extremism when it comes to data-driven approaches doesn’t make me comfortable. Kevin Burton, above, is probably correct when he says that a move to these approaches is inevitable.

  • 18 Daniel Tunkelang // Oct 19, 2009 at 9:09 pm

    I appreciate the concern, and I specifically agree in abhorring the drive toward sensationalism. Robert Frank describes this trend as a consequence of globalized news and entertainment being a winner-take-all market. Nonetheless, I don’t see the answer being to change the optimization method. If we’re going to go to hell, we may as well get there efficiently.

  • 19 jeremy // Oct 20, 2009 at 12:49 pm

    Yeah, search is pretty much turning into a winner-take-all domain as well…with 60% or more of the global web search market share being taken up by a single company.. and a very, very short tail of 2nd and 3rd place competitors.

    I disagree that we all necessarily need to go to hell. If the world is burning, why feed fuel to the fire? Why play the game, and consciously participate in reducing the diversity of the ecosystem?

  • 20 Daniel Tunkelang // Oct 20, 2009 at 1:32 pm

    Just for grins, let’s turn this around. Remember when San Francisco magazine posted a story about Google VP Marissa Mayer with an unfortunate title? A bit of automation might have warned them of the sort of traffic the article would attract. So would a savvier editor, but my point is that we shouldn’t confuse the means with the end.

  • 21 jeremy // Oct 20, 2009 at 1:59 pm

    Grins indeed ;-)

    Sure, I’m not opposed to using automated systems to help avoid unfortunate “googirl” references. But that sort of automation is not what we’re talking about here, is it? There is a difference between avoiding sexual slang and choosing words geared to win the maximum audience and a high-ranked position from automated machine learning algorithms.

    The unfortunate double entendre aside, I still admire the fact that there was a human writer behind the headline, attempting to inject (sorry!) honest cleverness and creativity into a form of human-to-human communication.

    As Nick Carr wrote a few years ago, it’s still different than writing for the machine (SEO):

  • 22 Daniel Tunkelang // Oct 20, 2009 at 2:10 pm

    Agreed, my example is pretty unrepresentative: I can’t argue with a straight face that anyone I know is using automated techniques in such a sophisticated way. Most are probably aiming to increase raw traffic through what they perceive as harmless changes to their presentation. And yes, it’s easy for them to slide down a slippery slope in how they define “harmless”.

    Nonetheless, I see no difference in principle between manual and automatic optimization of content / presentation. Automatic tools that optimize content for its intended audience without compromising its integrity–cui malo?

  • 23 jeremy // Oct 20, 2009 at 2:33 pm

    Ok, maybe I didn’t make myself clear earlier, but I’m actually uncomfortable with both manual and automatic optimization of content for the purpose of SEO. I’m uncomfortable with the idea of writing for the machine rather than for the human, no matter whether that writing is done by humans manually or assisted by machines.

    So yes, I agree with you that there is no difference in the composition stage between a human who is manually writing for the machine, and a human who is getting borg-assistance in the same task.

    My problem is not with the automated writing assistance. My problem is with the target audience, who the writing is for.

    Maybe I’m too idealistic, but I think the need for SEO stems from a lack of decent HCIR tools. It’s search engines trying to be “McDonald’s” that creates the need to optimize for them in the first place. That’s all backwards. People shouldn’t be writing for the search engines. Search engines should be finding what people write. That’s IR. That’s HCIR.

  • 24 Daniel Tunkelang // Oct 20, 2009 at 3:02 pm

    These back and forth comments make me nostalgic!

    Anyway, I think we’re converging. Except that I’ll push a bit harder–I think some of what writers do for SEO (manually or automatically) is actually adding value to the content. The analogy I like to use is polishing a resume vs. embellishing it: the latter is unethical, but the former is win/win for both the applicant and the prospective employer. At least to some extent, optimizing for search engines overlaps with optimizing for human consumption. Where it doesn’t, I’m with you–SEO should not create an artificial skew on the content we produce.

  • 25 jeremy // Oct 20, 2009 at 7:48 pm

    Damn — like any good internet flame war, we are supposed to diverge rather than converge. What are we doing wrong? ;-)

    Ok, I see your “polish the resume” point. Hrm. Sure, nothing wrong with polishing. But again, you’re polishing for human/HR/boss consumption, rather than polishing for a retrieval algorithm.

    I suppose to the extent that the tangible outcomes when polishing for machine vs. for man overlap, I can’t disagree with you. I would be foolish to say that, even if outcomes are identical, motives have to be unsullied.

    Where I don’t have as much confidence as you is in how often the final outcome will be (essentially) the same after polishing with man vs. machine as the target audience. I think those scenarios will diverge more than they converge.

    Any ideas about a way we could test that (divergence/convergence rates) empirically?

  • 26 Daniel Tunkelang // Oct 20, 2009 at 8:49 pm

    One thing I’ve noticed is that optimizing for Twitter traffic is quite different from SEO. The former feels more like optimizing for humans (getting people to retweet, post links, etc.), while SEO is of course optimizing for the search engines. As a crude approximation, compare ranking on Topsy to ranking on Google.

  • 27 jeremy // Oct 20, 2009 at 9:02 pm

    When does “optimization” cross the line into “baiting”?


  • 28 Daniel Tunkelang // Oct 20, 2009 at 9:24 pm

    What is this line you speak of? :-)

  • 29 jeremy // Oct 20, 2009 at 10:45 pm

    Um.. Maginot? :-))

  • 30 jeremy // Oct 21, 2009 at 1:11 pm

    But I agree, I enjoy these back-and-forths as well. Thanks for the discussion.

    To me, that’s the point of blogging: the ensuing conversations. Wish more folks would join in.

  • 31 On Twitter, A/B Analysis, and the Art of Headlines // Oct 27, 2009 at 11:40 am

    [...] no, perhaps, not so cool? Maybe data-driven headlines are a problem (quoting The Noisy Channel on this subject): I’m sure this approach must rattle [...]
