Innovation at Huffington Post: Data-Driven Headlines

The other day, I was suggesting to one of my colleagues that Endeca’s software could help authors write better (translation: more SEO-friendly) headlines. The details of that discussion are proprietary, but I’m sure you can imagine the gist. But we all wondered whether authors would be willing to stomach such a left-brain infringement on their right-brain creativity.

But apparently the Huffington Post is blazing new trails in this area. The Nieman Journalism Lab reports that:

The Huffington Post applies A/B testing to some of its headlines. Readers are randomly shown one of two headlines for the same story. After five minutes, which is enough time for such a high-traffic site, the version with the most clicks becomes the wood that everyone sees.
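To make the mechanics concrete, here is a minimal Python sketch of that kind of test. It is only an illustration under stated assumptions, not the Huffington Post's actual system: the function name `run_headline_test`, the simulated click probabilities, and the fixed reader count (standing in for the five-minute window) are all hypothetical.

```python
import random
from collections import Counter


def run_headline_test(headlines, click_prob, n_readers, seed=0):
    """Simulate a headline test and return (winner, clicks per headline).

    headlines  -- list of candidate headlines for one story
    click_prob -- hypothetical per-headline click probability (simulation only)
    n_readers  -- number of simulated readers, standing in for the time window
    """
    rng = random.Random(seed)
    clicks = Counter({h: 0 for h in headlines})
    for _ in range(n_readers):
        shown = rng.choice(headlines)         # each reader sees one headline at random
        if rng.random() < click_prob[shown]:  # simulated decision to click
            clicks[shown] += 1
    # The headline with the most clicks wins; ties go to the earlier headline.
    winner = max(headlines, key=lambda h: clicks[h])
    return winner, dict(clicks)


if __name__ == "__main__":
    candidates = ["Software Patents: A Personal Story", "A Patent Fight Up Close"]
    assumed_rates = {candidates[0]: 0.03, candidates[1]: 0.05}  # made-up numbers
    print(run_headline_test(candidates, assumed_rates, n_readers=5000))
```

Counting raw clicks follows the description above. Because readers are split at random, each headline gets roughly equal exposure; with an uneven split, comparing click-through rates would be the fairer measure.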

NJL also reports that Huffington Post social media editor–and long-time Noisy Channel reader–Josh Young uses Twitter to help crowd-source better headlines.

I’m sure this approach must rattle some old-school journalists. And there is a real danger of optimizing for the wrong outcome. For example, including the word “sex” in this message might improve its traffic (the popularity of this post attests to that), but to what end?

Still, I don’t see this use of technology as cramping anyone’s style. Most of us write to be read–especially those in the media industry who are trying to monetize their audiences. Measurable success matters, and there’s no harm in trying to maximize it.

By Daniel Tunkelang

High-Class Consultant.

31 replies on “Innovation at Huffington Post: Data-Driven Headlines”

Are you sure you want to trade readers of “Software Patents: A Personal Story” for readers of “Dan’s Friend Screwed by Trolls”?

Don’t all the big web companies do this kind of alternative exploration all the time? For instance, check out this description of Google News ranking.

I think evaluating a title for SEO sexiness would be relatively easy, though there are lots of moving parts to control for, and the target’s not stationary (e.g. “Brangelina”’s cachet might expire).

Generating headlines from articles, or even paraphrasing existing headlines, is much more poorly understood. Assuming you want the title to match the article; if you’re just a spammer, search over existing titles is probably enough.

You could use English-to-English machine translation, but the tech’s pretty sketchy, and certainly won’t be up to the New York Post’s standard.


A number of our customers are working in some similar areas.

Tailrank’s clustering engine (way back in the day) had the ability to determine the cluster title by looking at the term distribution of the cluster text and then selecting the most mathematically appropriate title.


I can’t help the feeling that all this focus on SEO and completely data-driven decision-making is going to lead us into.. well.. mediocrity.

It reminds me of this great Onion parody: “Live Poll Allows Pundits to Pander to Viewers in Real Time”

While it is still a parody at this point, every joke reveals the underlying truth: Data-driven headline selection is a step in the direction of real-time pandering.

And is this really the social incentive structure that we’re trying to create?


Pandering is such a loaded word–and we hardly need automated techniques to incent mediocrity. I agree that there’s a real danger of implementing the dystopia envisioned in Epic 2014 / 2015, but let’s not confuse the implementation with the vision it is trying to implement. Many publishers knowingly act in ways that forgo opportunities to broaden their audiences. I’m sure that will continue, regardless of the analytical tools available to them.


Well, “pandering” was the Onion’s choice of words, not mine.

But it’s indeed the vision, rather than the implementation, that makes me a little squeamish. The vision is the changing of a product to make it have more mass market appeal. The vision is self-alteration to a lowest common denominator. Correct?


..that is to say, we’re dealing with a fundamental tradeoff here, regardless of the implementation. That tradeoff is “broad appeal but generic” vs. “narrow appeal and specific”.

The decision to create something new and valuable is inherently a decision to move away from the broad and generic, and toward the specific and interesting.

The decision to increase one’s market share is inherently a decision to move in the opposite direction, toward the broad and generic.

I would argue that one of the biggest problems in news today is not that it has either a liberal or a conservative bias (plenty of both), but that it tries to be too broad and therefore too generic. And it never ends up tackling real issues, for fear of losing mass market appeal.

So again, is this the vision that we really want for society? A continued move toward broad, bland, greater-appeal but less-substance information?


In the specific case of the Huffington Post, we are just talking about the headlines, right? 🙂

But your point is well taken–the quest for popularity at the expense of all other values is one that does lead to mediocrity, if not outright idiocracy. But the other extreme of ignoring the audience’s preferences as input leads to pathological narcissism. In fact, I’d say that we see that problem in some scholarly circles–but that’s a subject for a blog post in its own right.

In any case, many writers do want, subject to some constraints, to optimize for readership goals. Those goals may be more specific than maximizing the number of readers–typically, some readers are more important to the writer than others. And if writers are consciously trying to optimize their writing for audience response, I at least see no further harm in providing writers with the tools to do so efficiently.


I wrote my comment before seeing your latest one–so let me just add one point: pandering doesn’t necessarily mean going broad. Part of the dystopian vision of Epic 2014/5 is that news is personalized to conform to the reader’s biases. That is the ultimate pandering!


But the other extreme of ignoring the audience’s preferences as input leads to pathological narcissism.

Oh, true. Good point.

Part of the dystopian vision of Epic 2014/5 is that news is personalized to conform to the reader’s biases.

Hmmm.. would we call this “micro-generic”? 🙂

Yes, obviously we don’t want to ignore our customers completely. That’s not good business, as well as narcissistic, as you say. At the same time, we don’t want to only do/follow the gradient defined by the current customer base, because that shows a lack of vision and guts.

I guess our conclusion is that it’s a fine line?


How, um, generic a conclusion! What I’d really like is predictive analytics. For example, I’m sure I could increase traffic to this blog in a number of ways that would be ultimately counterproductive (e.g., by posting porn). And I certainly wouldn’t want any automated SEO to change my headlines in such a way that I attracted readers I’d inevitably disappoint.

Put another way–it’s not just about eyeballs; it’s about the conversion rate. Only that, like many writers, I’m not selling products, but rather evangelizing ideas. Still, I’m all about maximizing conversion!


How, um, generic a conclusion!

Generic, or timeless?

Still, I’m all about maximizing conversion!

But in the news cycle, eyeballs is your number of viewers, while conversion is how much you get people talking about what you’ve told them. And the quickest route to that goal is still sensationalism. Glenn Beck does a fantastic job at maximizing conversion, but I hardly think that is a style that we should be seeking to emulate, regardless of our politics.


but I hardly think that is a style that we should be seeking to emulate, regardless of our politics.

… too literal in its interpretations, too black and white in its conclusions, not nuanced enough in its headlines.

In short, it’s a perfect SEO style, is it not?


Was kidding about it being generic, but I agree that timelessness is a less generic term than generic. 🙂

As for whether Glenn Beck (or Michael Moore) is a role model, I’m torn. As much as I’d like to see rational and civil discussion as the foundation for how we inform and persuade one another, it’s hard not to be jealous of successful evangelists. The end may not justify the means, but it certainly poses interesting moral dilemmas.

For example, consider the recent debate over vaccination. Perhaps doctors could be more effective at persuading the general public–and thus in actually saving lives–if they employed more sensationalist tactics. In the long term, that would probably cost them credibility–although people have frightfully short-term memories.

Perhaps journalistic integrity should trump all other considerations. I’m not that much of a purist. But then again, I wouldn’t consider myself a journalist.


I haven’t read this book, but I would eventually like to get to it:

Replace Postman's hypothesis:

"The presentation most often de-emphasizes quality; all data becomes burdened to the far-reaching need for entertainment."

with:

"The presentation most often de-emphasizes quality; all data becomes burdened to the far-reaching need for SEO."

I am certainly not a lone or even unique voice in raising these questions. Others have done it more eloquently. Still, this extremism when it comes to data-driven approaches doesn't make me comfortable. Kevin Burton, above, is probably correct when he says that a move to these approaches is inevitable.


I appreciate the concern, and I specifically agree in abhorring the drive toward sensationalism. Robert Frank describes this trend as a consequence of globalized news and entertainment being a winner-take-all market. Nonetheless, I don't see the answer being to change the optimization method. If we're going to go to hell, we may as well get there efficiently.


Yeah, search is pretty much turning into a winner-take-all domain as well…with 60% or more of the global web search market share being taken up by a single company.. and a very, very short tail of 2nd and 3rd place competitors.

I disagree that we all necessarily need to go to hell. If the world is burning, why feed fuel to the fire? Why play the game, and consciously participate in reducing the diversity of the ecosystem?


Just for grins, let's turn this around. Remember when San Francisco magazine posted a story about Google VP Marissa Mayer with an unfortunate title? A bit of automation might have warned them of the sort of traffic the article would attract. So would a savvier editor, but my point is that we shouldn't confuse the means with the end.


Grins indeed 😉

Sure, I’m not opposed to using automated systems to help avoid unfortunate “googirl” references. But that sort of automation is not what we’re talking about here, is it? There is a difference between avoiding sexual slang, and choosing words that are geared to get a maximum audience, high-ranked position from automated machine learning algorithms.

The unfortunate double entendre aside, I still admire the fact that there was a human writer behind the headline, attempting to inject (sorry!) honest cleverness and creativity into a form of human-to-human communication.

As Nick Carr wrote a few years ago, it’s still different than writing for the machine (SEO):


Agreed, my example is pretty unrepresentative: I can’t argue with a straight face that anyone I know is using automated techniques in such a sophisticated way. Most are probably aiming to increase raw traffic through what they perceive as harmless changes to their presentation. And yes, it’s easy for them to slide down a slippery slope in how they define “harmless”.

Nonetheless, I see no difference in principle between manual and automatic optimization of content / presentation. Automatic tools that optimize content for its intended audience without compromising its integrity–cui malo?


Ok, maybe I didn't make myself clear earlier, but I'm actually uncomfortable with both manual and automatic optimization of content for the purpose of SEO. I'm uncomfortable with the idea of writing for the machine, rather than writing for the human, no matter if that writing is done by humans manually, or assisted by machines.

So yes, I agree with you that there is no difference in the composition stage between a human who is manually writing for the machine, and a human who is getting borg-assistance in the same task.

My problem is not with the automated writing assistance. My problem is with the target audience, who the writing is for.

Maybe I’m too idealistic, but I think the need for SEO stems from a lack of decent HCIR tools. It’s because search engines are trying to be “McDonalds” that creates the need to optimize for them in the first place. That’s all backwards. People shouldn’t be writing for the search engines. Search engines should be finding what people write. That’s IR. That’s HCIR.


These back and forth comments make me nostalgic!

Anyway, I think we’re converging. Only that I’ll push a bit harder–I think some of what writers do for SEO (manually or automatically) is actually adding value to the content. The analogy I like to use is polishing a resume vs. embellishing it: the latter is unethical, but the former is win/win for both the applicant and the prospective employer. At least to some extent, optimizing for search engines overlaps with optimizing for human consumption. Where it doesn’t, I’m with you–SEO should not create an artificial skew on the content we produce.


Damn–like any good internet flame war, we are supposed to diverge rather than converge. What are we doing wrong? 😉

Ok, I see your “polish the resume” point. Hrm. Sure, nothing wrong with polishing. But again, you’re polishing for human/HR/boss consumption, rather than polishing for a retrieval algorithm.

I suppose to the extent that the tangible outcomes when polishing for machine vs. for man overlap, I can't disagree with you. I would be foolish to say that, even if outcomes are identical, motives have to be unsullied.

Where I don’t have as much confidence as you is in how often the final outcome will be (essentially) the same after polishing with man vs. machine as the target audience. I think those scenarios will diverge more than they converge.

Any ideas about a way we could test that (divergence/convergence rates) empirically?


One thing I've noticed is that optimizing for Twitter traffic is quite different from SEO. The former feels more like optimizing for humans (getting people to retweet, post links, etc.), while SEO is of course optimizing for the search engines. As a crude approximation, compare ranking on Topsy to ranking on Google.


But I agree, I enjoy these back-and-forths as well. Thanks for the discussion.

To me, that’s the point of blogging: the ensuing conversations. Wish more folks would join in.

