
Do Text-to-Speech Readers Need To License Performance Rights?

Now that the new Kindle includes an apparently listenable text-to-speech reader, the Authors Guild is crying foul that this feature exploits authors and violates their rights:

Publishers certainly could contractually prohibit Amazon from adding audio functionality to its e-books without authorization, and Amazon could comply by adding a software tag that would prohibit its machine from creating an audio version of a book unless Amazon has acquired the appropriate rights. Until this issue is worked out, Amazon may be undermining your audio market as it exploits your e-books.

In a New York Times op-ed entitled “The Kindle Swindle“, Authors Guild president Roy Blount Jr. says:

What the guild is asserting is that authors have a right to a fair share of the value that audio adds to Kindle 2’s version of books.

In my view, doing so would set a frightening precedent. The text-to-speech transformation is completely mechanical. I have no doubt that the Authors Guild can come up with contract language that forbids applying this transformation to their members’ content, essentially as a kind of digital rights management (DRM). But I’d be sad to see this happen. I thought we were moving beyond this stuff.

Besides, this is simply not worth fighting over. Good audio books are dramatic readings, and those will never be possible from anything we’ve seen in mechanical text-to-speech. Perhaps I’m being short-sighted on that front. But I’ll eat my words when AI proves me wrong.

I have no doubt that the author guilds in the various creative industries can find a way to codify these claims in their licensing terms. I just hope that we’re not heading in a direction where private, mechanical transformation is no longer simply part of the fair use package.


Are Media Companies Out-Innovating Their Advertisers?

In “Three Ways the Media is Innovating with New Interfaces“, Micro Persuasion blogger Steve Rubel argues that “media must innovate their way out of this situation from both editorial and sales, but no one seems to be really doing so on the advertising side.” By “this situation”, he means the dismal state of display advertising, whose quality is suffering in lockstep with the economy as a whole.

There are lots of people who beat up on media for its mistakes, but it’s interesting that Rubel singles out advertisers. On the editorial side, he praises innovations such as nytexplorer and even “retro” subscriber-based models, like Sporting News Today. He’s also optimistic that media companies will exploit the user experience potential of the iPhone and Kindle.

But he raises the concern that advertising is behind–and this is a major concern if media companies are bound to the ad-supported model. My own hope is that we find a way to move beyond that model. But, if not, then it’s important that advertising catch up with editorial, or all of the latter’s innovation will be in vain.


Everything is a Platform

I spent all day Friday learning how the New York Times aspires to become a platform for a brave new world of online news (though they’re still figuring out how to handle user-generated content). Meanwhile, every social network hopes to be *the* platform for social media, be it Facebook, Twitter, or LinkedIn. To be clear, it’s not just that platforms are the new black; rather, everyone wants to control whatever is left after Google has exercised its droit de seigneur as the gateway to online information.

The latest entrant on the aspiring platform front is Wikipedia, at least according to Marshall Kirkpatrick at ReadWriteWeb. In a post entitled “Could Wikipedia’s Future Be as a Development Platform?“, Kirkpatrick suggests:

Wikipedia can offer developers opportunities to glean analysis, supplemental content and structured data from its years-old store of collaboratively generated information.

He also observes that:

There is no formal Wikipedia Application Programming Interface (API) but the data there is relatively accessible anyway. It can be downloaded and processed locally.

Having worked with Wikipedia data, I think that access via download is actually a better option than access via an API, particularly since most APIs come with parsimonious rate limits, e.g., 5,000 requests per day for the New York Times APIs. Indeed, the overwhelming majority of New York Times article data *is* available for download, albeit only under non-commercial licensing terms.
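
For those who want to try the download route, here is a minimal sketch in Python of streaming through a compressed Wikipedia dump without loading it into memory. The file name and the export namespace string are illustrative (the namespace varies by dump version), so treat this as a starting point rather than production code:

```python
# A minimal sketch of processing a Wikipedia dump locally, assuming you've
# downloaded a pages-articles dump. The file name and namespace below are
# illustrative; check your dump's header for the actual namespace version.
import bz2
import xml.etree.ElementTree as ET

DUMP_PATH = "enwiki-latest-pages-articles.xml.bz2"  # hypothetical local file
NS = "{http://www.mediawiki.org/xml/export-0.10/}"  # varies by dump release

def iter_titles(path):
    """Stream page titles from a compressed MediaWiki XML dump."""
    with bz2.open(path, "rb") as f:
        for _, elem in ET.iterparse(f, events=("end",)):
            if elem.tag == NS + "page":
                title = elem.find(NS + "title")
                if title is not None:
                    yield title.text
                elem.clear()  # discard processed pages to keep memory flat

if __name__ == "__main__":
    for i, title in enumerate(iter_titles(DUMP_PATH)):
        print(title)
        if i >= 4:  # smoke test: print the first five titles
            break
```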

In any case, it’s interesting to see the rush to transform everything–but particularly content–into a platform. I can only imagine the marketing geniuses getting ready for platforms of platforms. Of course, what we really need is for all of these information resources to play together nicely enough that we can seamlessly integrate them into applications (yes, that’s what platforms are supposed to help you build!) without worrying about which of them are platforms.

Out of the platform frying pan and into the SOA fire…


Reflecting on Times Open

Like everyone else who managed to get into the standing-room-only event (or at least everyone I had a chance to meet there), I had a great time at Times Open, the New York Times’s coming-out party for its APIs. I won’t try to summarize, since Taylor Barstow has already done that superbly (also, check out his nytexplorer app!).

Also, people have commented about the intense Twitter conversations that took place during the presentations. I feel strongly that they added to the event (see my response to Owen Thomas on Valleywag). Not coincidentally, I want to use this space to talk about what I felt was the most interesting topic that came up during the event: the handling of user-generated content.

The New York Times is one of the world’s oldest and most prestigious media brands. As their CEO Janet Robinson proudly told us, the paper has won 98 Pulitzer Prizes, more than any other newspaper. Moreover, she added, they have been around for 158 years and plan to be around for another 158 years. For perspective, some folks don’t even give the New York Times 158 days to live! Despite their recent financial troubles, I think they’re here for the long haul–and not because I have any financial interest in their success. Rather, it is because I see the vast numbers of people who look to the New York Times as something more than a news wire with a pretty logo.

They care about the most personal aspects of the paper, like its columnists. When Tim O’Reilly called attention to science and technology writer John Markoff, everyone turned to him, eager to put a face on a writer whom many of us have been reading since before there was a World Wide Web. With all respect for the Associated Press, I can’t imagine that they or their staff command this sort of devotion. Personality makes a big difference, and personality comes from people.

But of course there are far more people reading the New York Times than writing it. Those people increasingly want to play a more active role. They do so in various ways today:

  • Contributing to the “Most Popular” stats by emailing articles to one another.
  • Commenting on the select articles that offer this opportunity.
  • Linking to articles in their blog posts, tweets, and other social media.

Here is an audience eager to participate! But there’s a big catch: the New York Times is paranoid about diluting its brand equity by mixing up user contributions with its carefully vetted writing. As a result, all comments are moderated, and their aggregation of blogs linking to articles is a limited, proprietary system (Blogrunner). The New York Times wants to have its cake and eat it too–all the benefits of users’ active engagement without the cost of diluting its brand.

I think their APIs make this possible, at least in theory. Someone else can now repurpose New York Times content, allow others to annotate it, etc. The concern now is licensing and monetization. Surely the New York Times won’t simply let someone else mirror their site with looser restrictions about user-generated content. Or will it, for the right price?

Money issues aside, can readers get used to the idea that authorship and its associated brand equity are independent of the site on which content appears? Can media brands embrace such a new world? And, if user-generated content starts to blur the line between readers and writers, does a media company morph from a publisher into the professional editor-in-chief of a sprawling graph of writers and amateur editors?

These questions, which strike me as some of the key questions for the future of newspapers, stuck in my mind as I left Times Open. I’m curious what others think. Are these the right questions? And, if they are, what are the answers?


TunkRank and Retweet Rank

While we wait for Jason to iron out the bugs in his TunkRank implementation, I’ve been thinking about the relationship between TunkRank and retweet rank as influence measures.

Here’s my thought: TunkRank assumes in its model that, if X reads a tweet from Y, then there’s a constant probability p that X will retweet it. If this assumption holds true, then the TunkRank of Y should be roughly proportional to Y’s retweet rank.
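
To make that concrete, here is a toy sketch of the TunkRank recurrence, assuming the definition Influence(X) = Σ over followers F of X of (1 + p·Influence(F)) / |Following(F)|, iterated to a fixed point. The three-user graph is made up, and the implementation is my own illustration, not Jason’s:

```python
# A toy sketch of the TunkRank recurrence, iterated to a fixed point. The
# follower graph is made up; p is the assumed constant probability that a
# reader retweets what he or she reads.

def tunkrank(followers, following_count, p=0.05, iterations=50):
    """followers maps each user to the set of users who follow them;
    following_count maps each user to how many accounts they follow."""
    influence = {user: 1.0 for user in followers}
    for _ in range(iterations):
        influence = {
            # Each follower f divides attention across everyone they follow
            # and passes along a p-discounted share of their own influence.
            user: sum((1.0 + p * influence[f]) / following_count[f]
                      for f in followers[user])
            for user in followers
        }
    return influence

# Toy graph: alice is followed by bob and carol; bob is followed by carol.
followers = {"alice": {"bob", "carol"}, "bob": {"carol"}, "carol": set()}
following_count = {"alice": 0, "bob": 2, "carol": 2}
print(tunkrank(followers, following_count))
```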

Of course, one of the reasons this assumption might fail is that X is using bots (or bot-like people) to game his or her retweet rank. It’s also possible that the TunkRank assumption about a constant probability of retweeting is too simplistic.

But I’m intrigued by the idea that, subject to the assumptions of its model, TunkRank acts as a sort of ungameable retweet rank.


Previews of Upcoming Industry Search Conferences

Some of you, despite President Obama’s admonition, may have enjoyed FASTForward in Las Vegas last week. Others may be spending $2,295 to attend the Omniture Summit in Utah this week, which includes skiing at Snowbird and a Maroon 5 concert. I certainly hope that those of you who are Endeca customers and partners will be able to attend the more modest Discover ’09 in Boston this June, despite the lack of showgirls, skiing, or “neo-soul” pop bands. I’ll be there, and I promise there will be lots of hands-on sessions, as well as presentations specifically targeting the hot topics in enterprise and site search.

But no one needs to wait that long to get a preview of the upcoming industry search conferences.

As a preview for the Infonortics Search Engine Meeting, which will take place April 27-28 in Boston, Stephen Arnold, known for his Beyond Search blog, has been publishing interviews with some of the speakers. You can find mine here. I’m partial to this format, since I find text a more efficient medium for this sort of content than audio or video. Of course, audio and video make more sense when there’s more to see than one or two talking heads (e.g., my Reconsidering Relevance video).

But some people prefer podcasts (especially people with long commutes!), and they will be happy to know that the Enterprise Search Summit, which will take place May 12-13 in New York, is offering interviews with some of its speakers to give attendees (and potential attendees) a taste of what to expect. Only a few are up there now, but Michelle Manafy tells me that there will be more coming over the next few days.

I’ve toyed with trying to do something similar for the SIGIR ’09 Industry Track (mark your calendars: July 22 in Boston). If I do, I promise that you will be the first to know.


Yes, Virginia, Google Does Devalue Everything It Touches

Mike Masnick at TechDirt just published a post entitled “WSJ Editor Claims Google Devalues Everything” in which he objects to Wall Street Journal managing editor Robert Thomson’s claim on the Charlie Rose show that “Google devalues everything it touches.”

His main objections:

This is wrong on so many levels it’s hard to know where to begin. Google doesn’t devalue things it touches. It increases their value by making them easier to find and access. Google increases your audience as a content creator, which is the most important asset you have. It takes a special kind of cluelessness to claim that something that increases your biggest asset “devalues” your business. Thomson’s mistake seems to be that he’s confusing “price” and “value” which is a bit scary for the managing editor of a business publication. Yes, the widespread availability of news may push down the price (that’s just supply and demand), but it doesn’t decrease the value at all. It opens up more opportunities to capture that value.

In a word, no. And he’s wrong on so many levels that it’s hard for me to know where to begin! But I’ll try.

He’s right that Google makes it easy to find a news article, but only in the limited sense that it’s easy to find if you’re explicitly looking for it. That’s only a marginal improvement on the pre-Google world. Google also makes it easy for readers to find commodity information on a particular subject–and frankly, the real innovation there is Wikipedia. Google has never made serious investments in supporting exploratory search.

Google doesn’t do much to help users appreciate the differentiation among competing sources for news–or for products in general. For users, this may achieve a satisficing outcome–with minimal effort, they obtain the gist of the news from good-enough sources. But for content creators, this is commoditization: because the interface de-emphasizes the differentiation, users perceive a set of undifferentiated choices.

Masnick complains that Thomson is confusing price and value, but in fact Masnick is confusing value with breadth of distribution. There are numerous examples where controlling distribution increases value: first-class seating, peer-reviewed publication, and even Google’s PageRank measure. In fact, to the extent that Google helps identify the best sources of information, it adds value. But Google destroys far more value by reducing the notion of value to a single, scalar (i.e., one-dimensional) measure.

By analogy, think of what has happened to the retail industry as comparison shoppers started using online aggregators to compare competitors on price, but not much else. Other dimensions of utility started to lose value–most notably, customer service. Retailers have suffered, and consumers suffer too, no longer able to make trade-offs based on the utility they assign to dimensions that they can no longer observe. What shopbots have done for retail, Google has done for everyone, but most of all for media.

One can reasonably ask why publishers don’t simply opt out of Google, using robots.txt to turn away Google’s crawlers. The answer is that they can’t unless their competitors opt out too. Google has lowered the value of content by persuading everyone, en masse, to offer packaging that masks the content providers’ differentiation. Like Wal-Mart, they’ve made consumers happy with lower prices, but don’t be surprised that some content providers are concerned about being strong-armed out of business (cf. Vlasic Pickles).
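
For reference, the mechanics of opting out really are trivial: a two-line robots.txt turns away Google’s crawler site-wide. The barrier is strategic, not technical.

```
User-agent: Googlebot
Disallow: /
```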

There’s no point in whining about it, and I commend media providers who are struggling to create value under such hostile conditions. I also know the media players have made many of their own mistakes to help get them into this pickle, not least of which was collectively giving Google so much leverage over them. But let’s dispense with the myth that Google’s gale of creative destruction is creating value for media providers. At best, Google is creating value at their expense.


Canonical URLs and Faceted Search

Big news from Google, Yahoo, and Microsoft: the three web search leaders announced yesterday that they will jointly support a standard by which a web page can indicate the address of its “canonical” version. By using this standard, a site can avoid having duplicate copies of its pages indexed and suffering, from an SEO perspective, in how well those pages are ranked.
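
Concretely, the announced mechanism is a link element placed in the head of each duplicate or variant page, pointing at the preferred URL (the URL below is made up):

```html
<!-- in the <head> of every duplicate or variant page -->
<link rel="canonical" href="http://www.example.com/dresses/green-dress" />
```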

This is a great development for everyone, but especially for anyone building sites that use faceted search (which should be everyone!). One of the problems we identified early on at Endeca is that faceted search, if implemented naively, can lead to massive duplication of URLs. The whole point of a faceted information architecture is that there are many paths that lead to a given product or document page.

For example, consider a page that is associated with values from 10 facets. There may be 10! = 3,628,800 ways to reach it–and that’s assuming that none of the facets are hierarchical. In fairness, it also assumes that none of the paths contract from implicit selection. Regardless, the number of paths is large enough to be a problem for SEO if each path receives its own URL.

Endeca recognized this problem a while ago, and addressed it through what we call “URL beautification”–our own means of canonicalizing URLs that, in addition to deduping the multiple paths, has the side benefit of creating URLs that are SEO-friendly.
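
As a simple illustration of the canonicalization idea (this is not Endeca’s actual URL beautification scheme, just a sketch), one strategy is to treat the facet selections as an order-independent set and sort them, so that every path to the same navigation state yields the same URL:

```python
# A sketch of one canonicalization strategy for faceted navigation: sort the
# facet selections so that all orderings of the same refinements collapse to
# a single URL. Illustrative only; not Endeca's actual scheme.
from urllib.parse import urlencode

def canonical_facet_url(base, selections):
    """selections: dict of facet -> value, e.g. {'color': 'green', 'size': 'M'}."""
    ordered = sorted(selections.items())  # collapse all orderings to one
    return base + "?" + urlencode(ordered)

# Both refinement orders map to the same canonical URL.
assert (canonical_facet_url("/dresses", {"size": "M", "color": "green"}) ==
        canonical_facet_url("/dresses", {"color": "green", "size": "M"}))
print(canonical_facet_url("/dresses", {"size": "M", "color": "green"}))
# -> /dresses?color=green&size=M
```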

Nonetheless, my colleagues and I are delighted to see the major web search engines recognizing this problem and making it easier for everyone to solve it. It’s a rare day to see Google, Yahoo, and Microsoft working together, but it’s nice when it happens. Good thing they got the news out before “Be Evil” day!


Think Evil

Every now and then, I think about ways to subvert the ad-supported model, particularly for web search. It’s my token resistance to the tyranny of free. Some of my thoughts undoubtedly qualify as evil. And today, Friday the 13th, feels like an appropriate day to let my evil side take over the blog.

A few years ago, when it became clear that Microsoft was losing the search wars to Google–but before they had lost much browser market share to Firefox–I thought they should have used a scorched-earth strategy of including an ad blocker in Internet Explorer. The ad blocker would be on by default and would block all ads, including sponsored links from search engines. Actually, I can’t bring myself to consider this particular approach evil–from my perspective, the end would justify the means. I can only speculate about how the antitrust courts would have reacted to this browser enhancement.

But, even after Microsoft missed its chance to make ad-blocking an above-board feature, there was still an opportunity to let others do the job. I imagined a virus whose sole function, beyond propagating itself, would be to install ad blockers on the machines of its “victims”. Somehow I doubt there would be much of an outcry from users, and even the eradication of this virus might take long enough that many users would be introduced to ad blocking and find it attractive. I imagined that, before Google negotiated with them, the Chinese government might have considered this strategy themselves as a preemptive strike. In any case, there is no lack of virus writers around the world who could implement such a scheme, and some of them live in countries with even worse economies than the United States.

Finally, it occurred to me that a more subtle variation of this strategy would be to leave the ads intact, but route clicks directly to the advertised links, bypassing the search engines. In the immediate term, users would not notice a difference, but search engines would get no clickthrough data, and thus could not charge the advertisers. Unchecked, such an approach could destroy the pay-per-click (PPC) model.

I truly doubt that any of the above will come to pass. But I can still dream my evil dreams. Happy Friday the 13th!


Social Media: Making it Measure Up

This morning, I was privileged to attend a social media breakfast hosted by Crimson Hexagon at the Roger Smith Hotel in New York. I wasn’t sure what to expect, other than breakfast. Breakfast was excellent, but the real fare came from the speakers.

First up was Brad McCormick from Porter Novelli. He had my attention at the first mention of Duncan Watts. His best take-away: brands vastly overestimate the extent to which consumers pay attention to branding. He knows this because he’s measured consumer response, at least in the context of an unnamed client in the grooming industry.

Then came Shiv Singh from Avenue A | Razorfish. His thesis was that online brand success factors differ from those of offline brands, alluding to a presentation from fellow Razorfisher Joe Crump on “Digital Darwinism“. He then performed some live (if informal) market research to see if the audience shared his concerns about the future of social media in the branding industry. He found consensus on the concern that metrics are a major challenge in establishing credibility for the business of social media. The more controversial issue was the relative value of online vs. offline word of mouth marketing. I have my own point of view on this, but I’ll save that for a future post.

The final speaker was Melanie Notkin, founder and CEO of SavvyAuntie.com. I’d never heard of SavvyAuntie before, but perhaps that’s because I’m not in their target demographic: the roughly 50% of American women who don’t have children of their own. Her presentation was phenomenal, and I can’t do justice to it here. But I’ll try to give you a taste of this case study in success through social media.

First off, she established a clear brand: Professional Aunts, No Kids (PANK) and “playful luxury”, targeting non-moms with discretionary income. She then used social media–particularly her blog and Twitter–to rapidly perform field research and build brand recognition. Some gems: “aunt farm”, “auntrepreneur”. And perhaps the best take-away on the hype around “community”: “Community is for those who want it.”

All in all, it was two hours well spent, and I am grateful to Noisy Community member Perry Hewitt for inviting me!