Author: Daniel Tunkelang

High-Class Consultant.

Jeremy Blogged in Class Today!

Post author By Daniel Tunkelang
Post date February 22, 2009

OK, for those of you who don’t recognize the Pearl Jam allusion, here are the lyrics and video. But most of all, congrats to Jeremy Pickens for jumping into the blogosphere with his new blog: Information Retrieval Gupf.

Most readers here who read the comments (note to RSS readers–you can and should also read the comments feed via RSS!) know Jeremy as Google’s outsourced conscience. So it’s not surprising to see him blogging about Google’s attitude towards competition in his most recent post, entitled “One Click Away”.

I’ve of course added Jeremy’s blog to the feeds I read. While I realize he’ll now be splitting his time between commenting at The Noisy Channel and posting to his own blog, I’d like to think that I’m not losing a commenter but gaining a blogger. Jeremy, welcome to the dark side!

Uncategorized

Quasi-Property Rights: Associated Press and the “Hot News” Doctrine

Post author By Daniel Tunkelang
Post date February 22, 2009

Like many bloggers, I learn which topics are “hot” from aggregators like Techmeme–which in turn automatically aggregate news sources from around the world (though lately they’re also receiving help from human volunteers). I’ve always thought this fell under the doctrine of fair use.

But apparently neither the Associated Press nor the courts think so. As Joe Mullin reports at The Prior Art:

A New York federal judge ruled Tuesday that The Associated Press can sue its competitors not merely for copyright infringement, but for a “quasi property” right in the news known as the “hot news” doctrine.

As Mullin points out, this doctrine seems broad enough to cover any instance where one news organization covers a topic after being “scooped” by a competitor. Surely no one would dream of applying it so broadly, but stranger things have been known to happen.

Intellectual property law is crazy enough without entertaining a world where me-too coverage–or even mere citation–is considered theft. I hope that this lawsuit establishes a sustainable standard of fair use. I shudder to think that Techmeme could sue me for writing this post–if Mullin didn’t sue both of us first!

Uncategorized

Relevance is Not a Game Changer in Search

Post author By Daniel Tunkelang
Post date February 22, 2009

The top story on Techmeme as I write is Randall Stross’s New York Times article, “Everyone Loves Google, Until It’s Too Big”. As Stephen Arnold notes, don’t read the article expecting to learn something new. Still, not everyone who reads the New York Times is current on the search industry, and the article does sum up the state of affairs neatly.

In particular, there’s a nice point by Prabhakar Raghavan, head of Yahoo! Research:

“Whether we’re slightly ahead or slightly behind Google in core relevance is not a game changer in search,” said Prabhakar Raghavan, Yahoo’s chief search strategist.

Yahoo’s best opportunity, Mr. Raghavan said, is to offer radically new ways of presenting information that will help users finish whatever it is they started before the search, like finding a job or buying a plane ticket. “People don’t want to search; it’s a digression,” he said. “They want to complete a task.”

Yes, I told you so. And so did Prabhakar, among others. I just wonder how long it will take for the rest of the world to figure this out. That’s why I’m happy to see this discussion making it into the paper of record.

General

Reflecting on Times Open

Like everyone else who managed to get into the standing-room-only event (or at least everyone I had a chance to meet there), I had a great time at Times Open, the New York Times’s coming-out party for its APIs. I won’t try to summarize, since Taylor Barstow has already done that superbly (also, check out his nytexplorer app!).

Also, people have commented about the intense Twitter conversations that took place during the presentations. I feel strongly that they added to the event (see my response to Owen Thomas on Valleywag). And, by no coincidence, I want to use this space to talk about what I felt to be the most interesting topic that came up during the event: the handling of user-generated content.

The New York Times is one of the world’s oldest and most prestigious media brands. As their CEO Janet Robinson proudly told us, the paper has won 98 Pulitzer Prizes, more than any other newspaper. Moreover, she added, they have been around for 158 years and plan to be around for another 158 years. For perspective, some folks don’t even give the New York Times 158 days to live! Despite their recent financial troubles, I think they’re here for the long haul–and not because I have any financial interest in their success. Rather, it is because I see the vast numbers of people who look to the New York Times as something more than a news wire with a pretty logo.

They care about the most personal aspects of the paper, like its columnists. When Tim O’Reilly called attention to science and technology writer John Markoff, everyone turned to him, eager to put a face on a writer whom many of us have been reading since before there was a World Wide Web. With all respect for the Associated Press, I can’t imagine that they or their staff command this sort of devotion. Personality makes a big difference, and personality comes from people.

But of course there are far more people reading the New York Times than writing it. Those people increasingly want to play a more active role. They do so in various ways today:

Contributing to the “Most Popular” stats by emailing articles to one another.
Commenting on the select articles that offer this opportunity.
Linking to articles in their blog posts, tweets, and other social media.

Here is an audience eager to participate! But there’s a big catch: the New York Times is paranoid about diluting its brand equity by mixing up user contributions with its carefully vetted writing. As a result, all comments are moderated, and its aggregation of blogs linking to articles is a limited, proprietary system (Blogrunner). The New York Times wants to have its cake and eat it–all the benefits of users’ active engagement without the cost of diluting its brand.

I think their APIs make this possible, at least in theory. Someone else can now repurpose New York Times content, allow others to annotate it, etc. The concern now is licensing and monetization. Surely the New York Times won’t simply let someone else mirror their site with looser restrictions about user-generated content. Or will it, for the right price?

Money issues aside, can readers get used to the idea that authorship and its associated brand equity are independent of the site on which content appears? Can media brands embrace such a new world? And, if user-generated content starts to blur the line between readers and writers, does a media company morph from a publisher into a professional editor-in-chief of a sprawling graph of writers and amateur editors?

These questions, which strike me as some of the key questions for the future of newspapers, stuck in my mind as I left Times Open. I’m curious what others think. Are these the right questions? And, if they are, what are the answers?

Uncategorized

Attending Times Open

Just wanted to let readers know that I’m attending Times Open, learning about how the New York Times is opening up its APIs to better engage the developer community. Follow the day remotely (or non-remotely) in real time on Twitter!

Uncategorized

A Reply to All PR People

If you’re not in the public relations industry or have not been emailing me your story ideas for The Noisy Channel, please feel free to skip the rest of this post.

To those of you who have been sending me pitches for your companies or your clients, this post is for you. I’m flattered that I make it onto your list of target media outlets–I can’t deny it’s cool to be so valued when I’ve been blogging for less than a year. And I’m sure my readers are flattered that you value their eyeballs enough to seek them out. But your approach–well, it just isn’t very effective.

Like most bloggers, I’m pretty quick to tune out press releases, or similarly fluffy sales / marketing pitches. It’s pretty easy to identify them from their tone, from the immediate sensation that someone pasted “Dear Daniel” onto a message that was sent out in bulk. As soon as I detect that impersonal tone, I conclude that you have no idea what might interest me, and that you are talking at me rather than with me. While I used to feel guilty about not responding to email with my name on it, I don’t feel the same qualm about ignoring–or even reporting as spam–emails that clearly were sent to a distribution list I didn’t sign up for.

Moreover, bear in mind that blogging is not my day job–in fact, this blog generates zero income for me. Perhaps some reporters are grateful to be spoon-fed content that they can use to fill empty pages. Not me–I’d rather go dark for a week than post content that would annoy my readers. My credibility is my coin.

So please, if you are one of the folks who have been filling my inbox with PR pitches, unsubscribe me from your lists. If you think that I or my readers want to know more about you or your products, write me a personal email to persuade me. I can tell the difference. If writing a personal email is too much investment for you, then reading an impersonal one is surely too much investment for me.

Finally, I can relate to your challenge. When I started this blog, I sent out scores of emails to let people know about it. But I didn’t email in bulk, and every message I sent out was clearly meant for its recipient. It took me much, much longer to write those messages than if I’d simply written one and sent it out to everyone. But the success of this blog is a testament to the human approach.

Uncategorized

Matt Cutts Keeps Google Honest

Post author By Daniel Tunkelang
Post date February 19, 2009

The other day, I was shocked to hear that Google was employing a pay-per-post strategy in Japan–precisely the sort of strategy they’ve historically condemned. I was certainly among those crying “hypocrisy”.

Well, to his credit, so was Matt Cutts, head of Google’s Webspam team. In fact, he didn’t just complain–his team did something about it. Via “Google Penalizing Google” at Google Blogoscoped:

head of Google’s anti web-spam team Matt Cutts via Twitter writes, “Google.co.jp PageRank is now ~5 instead of ~9. I expect that to remain for a while.”

Matt Cutts blogged about it himself today, saying “To the extent that I can speak on behalf of Google, I apologize that this happened.” I haven’t met Matt yet (I’m looking forward to meeting him at the SIGIR ’09 Industry Track!), but I’m delighted to see this preview of his integrity.

Uncategorized

Blogs I Read: FXPAL Blog

Post author By Daniel Tunkelang
Post date February 18, 2009

OK, I’ve just started reading it–in fact, they’ve just started writing it! But, given the quality of comments on this blog from FXPALers Gene Golovchinsky and Jeremy Pickens, I’m expecting great things from the FXPAL Blog.

Check out their recent post about “Recall-oriented search on the web”.

Uncategorized

The Sultans of Speed

Post author By Daniel Tunkelang
Post date February 18, 2009

Whatever else you might say about Google, they understand how to engineer web-scale systems. Check out Greg’s notes (or Michael Bendersky’s via Jeff’s Search Engine Caffe) about Google Fellow Jeff Dean’s keynote at last week’s WSDM 2009 conference.

Here’s the teaser from Greg’s notes:

Jeff gave several examples of how Google has grown from 1999 to 2009. They have x1000 the number of queries now. They have x1000 the processing power (# machines * speed of the machines). They went from query latency normally under 1000ms to normally under 200ms. And, they dropped the update latency by a factor of x10000, going from months to detect a changed web page and update their search results to just minutes.

This is no small feat, and it gives you a sense of the bar that Google has set in the web search market. Students of the Innovator’s Dilemma take note: if you want to beat Google, you’re not going to do it by struggling to incrementally outdo them on their own turf. For the use cases it addresses, Google is surely good enough. And damn fast.

General

TunkRank and Retweet Rank

While we wait for Jason to iron out the bugs in his TunkRank implementation, I’ve been thinking about the relationship between TunkRank and retweet rank as influence measures.

Here’s my thought: TunkRank assumes in its model that, if X reads a tweet from Y, then there’s a constant probability p that X will retweet it. If this assumption holds true, then the TunkRank of X should be roughly proportional to X’s retweet rank.

Of course, one of the reasons this assumption might fail is that X is using bots (or bot-like people) to game his or her retweet rank. It’s also possible that the TunkRank assumption about a constant probability of retweeting is too simplistic.

But I’m intrigued at the idea that, subject to the assumptions of its model, TunkRank acts as a sort of ungameable retweet rank.
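
To make the model concrete, here is a minimal sketch of how a TunkRank-style score could be computed. It assumes the recurrence usually given for TunkRank, Influence(X) = sum over followers Y of (1 + p * Influence(Y)) / |Following(Y)|, which is not spelled out in this post. The function name `tunkrank`, the `following` data layout, the fixed iteration count, and the value p = 0.05 are all illustrative assumptions, not Jason’s implementation.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Estimate TunkRank-style influence by fixed-point iteration.

    following: dict mapping each user to the set of accounts that user follows.
    p: assumed constant probability that a reader retweets a tweet they read.
    """
    # Collect every user who appears as a reader or as a followed account.
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    # Invert the graph: followers[x] is the set of users who follow x.
    followers = {user: set() for user in users}
    for reader, followed in following.items():
        for author in followed:
            followers[author].add(reader)

    influence = {user: 0.0 for user in users}
    for _ in range(iterations):
        # Each follower y splits attention evenly across everyone y follows,
        # and passes along a fraction p of y's own influence via retweets.
        influence = {
            x: sum((1.0 + p * influence[y]) / len(following[y])
                   for y in followers[x])
            for x in users
        }
    return influence


# Hypothetical toy graph: a and b follow c, and d follows a.
example_following = {"a": {"c"}, "b": {"c"}, "d": {"a"}}
scores = tunkrank(example_following)
for user in sorted(scores):
    print(user, round(scores[user], 4))
```

Because each follower’s contribution is divided by the number of accounts that follower reads, a crowd of bots that follow thousands of accounts adds very little to anyone’s score under this sketch, which is the sense in which the measure resists gaming.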