Books! Books! Books!

When my daughter was born almost two years ago, I wondered if she’d grow up reading books. After all, I do most of my reading online, and increasingly find myself reading short articles rather than whole books. As it turns out, she’s loved books so far, even if she’s shredded a few.

But the bigger surprise for me is that books–specifically e-books–have become such a hot industry. When I briefly worked for a consulting firm after grad school in 1999, my first assignment was to evaluate the e-book market. The readers then consisted of the Rocket eBook and the SoftBook Reader. Needless to say, I correctly predicted at the time that the e-book market wasn’t ready for prime time.

But fast forward to the present. Amazon has given the e-book market some credibility: Citigroup estimates that Amazon sold 500K Kindles in 2008, and Forrester predicts it will sell 1.8M units this year.

But the news of the past few days (and even the past 24 hours!) shows that the e-book market is only starting to open up:

  • In May, Sony, whose e-reader sales have lagged behind the Kindle’s, announced a partnership with Google to make public-domain books available for free.
  • Google just announced a service called Editions that it plans to launch in 2010 (by which time it presumably will have finalized the Google Books Settlement Agreement).
  • The Internet Archive just announced the Bookserver project as “a growing open architecture for vending and lending digital books over the Internet”.
  • Spring Design just announced Alex, an e-book reader based on Google’s Android operating system.
  • Barnes & Noble is expected to announce an e-reader that competes directly with the Kindle and has generated lots of buzz through leaked photos.

I grew up on books, and I’m excited to see that, a decade after the initial market failures, e-books (like touchscreens) are a mainstream reality. I still worry about who will buy them, especially considering that the marginal cost of distributing a typical e-book is even less than that of distributing a 5-minute song. A quick scan of a popular file-sharing site reveals that the PDF version of the bestseller The Lost Symbol takes up less than 3MB.
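
To put rough numbers on that comparison, here is a back-of-the-envelope sketch (the 128 kbps MP3 bitrate is my assumption for a typical song of the era, not a figure from any source cited here):

    # Back-of-the-envelope: a typical 5-minute MP3 vs. a ~3 MB e-book PDF.
    # The 128 kbps bitrate is an assumed typical value, not a cited figure.
    song_seconds = 5 * 60
    bitrate_kbps = 128
    song_mb = song_seconds * bitrate_kbps / 8 / 1000  # kilobits -> kilobytes -> megabytes
    print(f"5-minute song: ~{song_mb:.1f} MB vs. e-book PDF: < 3 MB")
    # Prints ~4.8 MB for the song; the entire book is the smaller download.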

Still, I’ll take a moment to celebrate the progress of technology. I’ve always known that reading was cool, but now we have the gadgets to prove it!

Who Will Buy?

As some of you know, I’m a karaoke junkie. But it’s my wife who has the classier repertoire, including “Who Will Buy?” from the musical Oliver!:

Who will buy this wonderful morning?
Such a sky you never did see!
Who will tie it up with a ribbon
And put it in a box for me?

Of course, the trope that the best things in life are free predates musical theater, let alone the web. But recent years have witnessed dramatic changes in our price sensitivities in every genre of digital (or digitizable) content, and I’m curious (sometimes morbidly so) about where it goes from here.

I won’t make you suffer through a rant about the malaise of the music and news industries–those topics, important as they are, have been overplayed in the blogosphere. If you need a refresher, I suggest Lawrence Lessig and the Nieman Journalism Lab as some of the more rational voices contributing to the discussion.

But it’s not just news and music that are experiencing the effects of the “information wants to be free” movement. Consider these industries:

  • Books. Many publishers worry that the Kindle has been setting a consumer expectation that a book should cost only $10. Indeed, a recent price war between Amazon and Wal-Mart drove some of those prices down to $8.99. Is this a boon for consumers, or a body blow to the publishing industry? It’s easy to evoke the $0.99-per-song expectation set by iTunes–but that change was more about disaggregating albums than about changing the per-unit cost. Besides, books have not yet had to confront the scale of unauthorized distribution that we see in the music industry. Legal or not, free is a potent source of price pressure.
  • Software. Wolfram Alpha just made headlines by releasing a $50 iPhone app. Many have reacted that such a high price is outrageous and will doom the application to failure. They may be right on that latter point–the market will vote with its clicks soon enough. But I’m old enough to remember when $50 was in the ballpark of what it cost to purchase a new consumer software application. Even then, unauthorized distribution was an issue–remember the “don’t copy that floppy” campaign? Today, my impression is that few people consciously purchase consumer software–a trend that I date at least back to Microsoft’s strategy of bundling its software into PC purchases. The most notable exceptions are console games (which are impressive holdouts in the consumer software space) and iPhone apps–with the caveat that only a tiny minority of apps make enough money for their creators to live on. (Update: just saw this note about how EA Sports President Peter Moore sees the current console game business model of cartridges and discs as a “burning platform”.)
  • Television. Between Boxee and Netflix, there is a real chance that digital content’s cash cow, cable television, will see its regional monopolies disrupted. I can’t imagine that anyone will shed a tear for the cable companies. And yet I can’t help but wonder what happens as the notion of premium content is subsumed by an expectation that video content should be free. Are we heading towards a proliferation of cheaply produced reality TV, contests, and game shows–all sponsored by rampant product placement?

If we are to believe Mike Masnick, the price of content is driven to its marginal cost. It’s pretty clear that the marginal cost of distributing most digital content, while not zero, is close enough to zero to be a rounding error. Should we be looking forward to a world where no one can charge consumers for content? Folks like Jeff Jarvis and Chris Anderson are cheerleading such a world as not only inevitable but a good thing–though both have had the sense to make some money on non-free books while the going is good.

Yes, there are and will always be business models to support content creators. In particular, one-time content (live events, consulting services) has some degree of insulation from the inexorable trend toward free. But what an inefficient turn of events, if people are rewarded for creating one-time content but not for creating far more valuable content that is useful to a broad audience of consumers!

I know that there are non-financial incentives that drive scholars, open-source developers, and activists to create free content. Indeed, I personally write this blog without any direct financial incentive. Perhaps these incentives will be the driving forces for content creation in the 21st century. One way or another, I hope we find a way to fund the things we value, rather than devolving into a locally optimal rut where value creation isn’t economic for the creators.

p.s. You can find the lyrics to Oliver! for free online, and you can easily view a free (unauthorized) copy of a performance of “Who Will Buy?” on YouTube. Or you can buy the song for $0.99.

Innovation at Huffington Post: Data-Driven Headlines

The other day, I was suggesting to one of my colleagues that Endeca’s software could help authors write better (translate: more SEO-friendly) headlines. The details of that discussion are proprietary, but I’m sure you can imagine the gist. We all wondered, though, whether authors would be willing to stomach such a left-brain infringement on their right-brain creativity.

But apparently the Huffington Post is blazing new trails in this area. The Nieman Journalism Lab reports that:

The Huffington Post applies A/B testing to some of its headlines. Readers are randomly shown one of two headlines for the same story. After five minutes, which is enough time for such a high-traffic site, the version with the most clicks becomes the wood that everyone sees.
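
Mechanically, such a test is simple. Here is a minimal sketch in Python of the loop the quote describes (entirely hypothetical; this is not HuffPost’s actual code, and all names here are mine):

    import random
    from collections import Counter

    HEADLINES = ["Variant A", "Variant B"]  # two candidate headlines for one story
    impressions = Counter()
    clicks = Counter()

    def serve_headline():
        """Randomly assign this reader one of the two variants."""
        headline = random.choice(HEADLINES)
        impressions[headline] += 1
        return headline

    def record_click(headline):
        clicks[headline] += 1

    def pick_winner():
        """After the five-minute window, keep the variant with the most clicks.
        (A refinement: compare click-through rates, clicks / impressions,
        in case random assignment left the impression counts uneven.)"""
        return max(HEADLINES, key=lambda h: clicks[h])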

NJL also reports that Huffington Post social media editor–and long-time Noisy Channel reader–Josh Young uses Twitter to help crowd-source better headlines.

I’m sure this approach must rattle some old-school journalists. And there is a real danger of optimizing for the wrong outcome. For example, including the word “sex” in a headline might improve its traffic (the popularity of this post attests to that), but to what end?

Still, I don’t see this use of technology as cramping anyone’s style. Most of us write to be read–especially those in the media industry who are trying to monetize their audiences. Measurable success matters, and there’s no harm in trying to maximize it.

Are Duplicate Tweets Spam?

The Twitterverse is all a-twitter with a new controversy: Twitter has rolled out a new feature that blocks duplicate tweets. SocialOomph, relaying Twitter’s position, announced on its blog that:

Recurring Tweets are a violation no matter how they are done, including whether or not someone pays you to have a special privilege. We don’t want to see any duplicate tweets whatsoever- They pollute Twitter, and tools shouldn’t be given to enable people to break the rules. Spinnable text seems to just be a way to bypass the rules against duplicate updates and essentially provides the same problems.

Hence, from Thursday, October 15th, 2009, 00:00 AM CST we will prevent the entry of recurring tweets on Twitter accounts within the SocialOomph system. Existing recurring tweets on Twitter accounts will all be placed in paused state at that time, so that the content of the tweet text is still accessible to you, but no publishing to Twitter of those tweets will take place.

Not everyone is thrilled with this new feature. My friend (and Noisy Channel reader) Eric Andersen notes: “this doesn’t make a lot of sense to me – many highly regarded Twitter users (e.g. @GuyKawasaki) regularly re-post tweets…primarily because of the “dip” model: re-posting the same tweet means more people will see, especially with an int’l audience.”

On one hand, I loathe inefficient communication, and I see repeated tweets as exposing the inefficiency of the dip model. We won’t get into my differences of opinion with Guy Kawasaki. If Twitter offered better search and control to users, then I think it would make sense for them to consider duplicate tweets as a spam issue.

On the other hand, Twitter search is crude. And the dip model, much as it may raise my personal hackles, is, in fact, what many users embrace. Twitter takes pride in letting users drive innovation, and I think they should be cautious about being too autocratic. Surely many of the people who post duplicate tweets do so with unspammy intentions.

Let’s face it: Twitter is going through growing pains, even if it just inherited the mother of all trust funds. They really do have to address spam. But they might consider doing so in a less heavy-handed way. I suspect that duplicate tweets are mainly a problem because they skew the statistics for Trending Topics–a problem Twitter could easily address without prohibiting the tweets themselves, as the sketch below suggests. Better search would make it easier for users to take charge of the user experience–a small dose of HCIR would go a long way.
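
For instance, here is a hypothetical sketch of one such fix (my speculation, not anything Twitter has announced): the trending pipeline could skip repeats when tallying, while still letting them post.

    from collections import Counter

    def trending_counts(tweets):
        """Tally term frequencies for Trending Topics while ignoring duplicate
        (user, text) pairs, instead of blocking them at posting time.
        Here `tweets` is an iterable of (user, text) pairs -- a toy model."""
        seen = set()
        counts = Counter()
        for user, text in tweets:
            key = (user, text.strip().lower())  # naive duplicate detection
            if key in seen:
                continue  # the repeat still posts; it just doesn't count
            seen.add(key)
            counts.update(text.lower().split())
        return counts

    # A repeated tweet contributes to the counts only once:
    sample = [("guy", "Read my post!"), ("guy", "Read my post!"), ("eric", "HCIR rocks")]
    print(trending_counts(sample).most_common(3))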

I think Twitter has the best of intentions, and that it is confronting a real problem. I hope they work harder to find the right solution.

Structured Search Is On The Table

Freebase. Wolfram Alpha. Google Squared. I hesitate to declare a trend, but there does seem to be a growing interest in more structured approaches to information seeking.

The latest entry is Factual, launched today by Gil Elbaz. Elbaz is no slouch: in 1998, he and Adam Weissman co-founded Applied Semantics (originally known as Oingo) and built a word sense disambiguation engine based on WordNet. In 2003, they sold the company to Google for $102M, where it became the basis of Google’s very lucrative AdSense offering.

According to Factual’s website:

Factual is a platform where anyone can share and mash open data on any subject.  For example, you might find a comprehensive directory of restaurants along with dozens of searchable attributes, a huge database of published books, or a list of every video game and their cheat codes.  We provide smart tools to help the community build and maintain a trusted source of structured data.

Factual’s key product, the Factual Table, provides a unique way to view and work with structured data.  Information in Factual Tables comes from the wisdom of the community and from our powerful data mining tools, and the result is rich, dynamic, and transparent data.

You can read more detailed coverage in Search Engine Land, TechCrunch, ReadWriteWeb, GigaOM, and VentureBeat.

To me, Factual sounds like a hybrid between Freebase and Many Eyes. And, like both, it’s free (as in free beer). Free cuts both ways: the Factual site states clearly that “There is currently no way for us to help you monetize these tables.” As with many companies at this stage, the business model is TBD.

I have mixed feelings. I like the increasing interest by startups in structured search. It’s a step in the right direction, since structure is a key enabler for interaction. But we already have one Freebase (and even Google Base), and it’s not clear that we need yet another company to enable crowd-sourced submission of structured data. Perhaps what we need is a way to incent the sort of behavior that has made Wikipedia so successful. As my colleague Rob Gonzalez (who is rumored to have a blog in the works) is always happy to point out, structured data repositories are a public good that no one is ever willing to pay for. The current best hope seems to be the Linked Data initiative, which sounds great in theory–though I think the jury is still out on whether it will succeed in practice.

My ambivalence aside, I am excited that some of the greatest minds in computer science are focused on bringing more structure to the information seeking process. Even if some of these efforts prove to be false starts, we’re going in the right direction. Structured search is on the table.

Google Is Sharpening Its Squares

As some of you may remember, I’m excited about Google Squared, a project I see as a great first step toward exploratory search at a web scale. Yes, I know that Duck Duck Go, Kosmix and others are already taking on this challenge, but it makes a difference to see Google throw its weight behind such an ungoogley initiative. Plus Google Squared is ambitious, to say the least–the input is free-form text and the output is highly structured.

Since I’ve beaten up Wolfram Alpha for its overreliance on NLP, I can’t give Google a free pass. It would be nice to be able to give Google Squared more structured guidance (yes, I’m still an HCIR fanatic). But Google Squared seems to achieve far more robust query interpretation than Wolfram Alpha does–perhaps because supporting exploratory search is less brittle than question answering.

The quality of the tables that Google Squared produces as results is still spotty, but it is a major improvement over the initial release. To those who wrote off Google Squared in June, I suggest you take a second look.

Is Twitter Planning To Monetize The Firehose?

A few months ago, I wrote in “The Twouble with Twitter Search”:

But the trickle that Twitter returns is hardly enough.

I believe this limitation is by design–that Twitter knows the value of such access and isn’t about to give it away. I just hope Twitter will figure out a way to provide this access for a price, and that an ecology of information access providers develops around it. Of course, if Google or Microsoft buys Twitter first, that probably won’t happen.

Now that Twitter has raised $100M at a valuation of $1B, I doubt any acquisition will happen anytime soon. But, according to Kara Swisher’s unnamed sources:

Twitter is in advanced talks with Microsoft and Google separately about striking data-mining deals, in which the companies would license a full feed from the microblogging service that could then be integrated into the results of their competing search engines.

If so, then it’s about time! How much either Microsoft or Google would pay for this feed is an interesting question. It’s probably not a coincidence that Twitter raised its last round of funding before pursuing this path–the revenue they obtain this way could be significant, but is unlikely to justify a $1B valuation.

In any case, I’m excited as a consumer that Twitter may finally allow Google and Microsoft to better expose the value of its content. But I’m also curious what my friends on the Twitter Search team think of the potential competition from the web search titans. Until now, no one has been able to compete effectively with Twitter’s native search for lack of access to the firehose. Having such access would give Google and Microsoft more than a fighting chance. Given the centrality of search to Twitter’s user experience, it’s an interesting corporate strategy.

Jeff Jarvis and Matt Cutts on the New FTC Blog Regulations

As has been anticipated for a while–and discussed during the Ethics of Blogging panel–the United States Federal Trade Commission (FTC) has published explicit guidelines regarding how bloggers (at least within its jurisdiction) must disclose any “material connections” they have to the companies they endorse.  The full details are available here.

There have been a number of reactions across the blogosphere, but I’d like to home in on two opposing views: those of What Would Google Do author (and blogger) Jeff Jarvis and Googler Matt Cutts.

Jarvis describes the regulations as “a monument to unintended consequence, hidden dangers, and dangerous assumptions…the greatest myth embedded within the FTC’s rules [is] that the government can and should sanitize the internet for our protection.”

Commenting on Jarvis’s post, Cutts replies:

As a Google engineer who has seen the damage done by fake blogs, sock puppets, and endless scams on the internet, I’m happy to take the opposite position: I think the FTC guidelines will make the web more useful and more trustworthy for consumers. Consumers don’t want to be shilled and they don’t want payola; they want a web that they can trust. The FTC guidelines just say that material connections should be disclosed. From having dealt with these issues over several years, I believe that will be a good thing for the web.

It’s a fascinating debate, and I can see merit in both sides. Like the folks at Reason, I lean libertarian (at least on issues of freedom of expression) and am not eager to see more government regulation of online speech. That said, I see the value of laws requiring truth in advertising, and I don’t see why pay-for-play bloggers should get a free pass if they are acting as advertisers. Interestingly, Jarvis’s response to Cutts is: “I trust you to regulate spam more than the FTC. You are better at it and have more impact.” That’s probably true today, but I wouldn’t want to vest that responsibility in a company that makes 99% of its revenue from advertising.

Everyone in this discussion sees the value of transparency–the question is whether it should be a legal norm enforced through FTC regulation or a social norm enforced by the marketplace. Despite my general skepticism about regulation of expression, I temper my libertarianism with a dose of pragmatism. For example, I’m glad that the Food and Drug Administration (FDA) at least tries to regulate health claims–its efforts may not eliminate quackery, but they surely reduce the problem.

Do we need FTC regulation in order to tame the jungle of social media? For that matter, will regulations have a positive effect, or will sploggers and other scammers simply ignore them–and perhaps even move offshore? I share Jarvis’s fear that the regulation will cause more harm than good–perhaps even having a chilling effect on would-be bloggers. Certainly the FTC will have to use its new power wisely–both to avoid trampling the existing blogosphere and to avoid scaring off newcomers. Still, if the FTC shows that it is only out to get true scammers, it may help establish, in Cutts’s words, a web we can trust.

I’m Daniel Tunkelang, and I endorse this blog post.

Software Patents: A Personal Story

Given the radioactive nature of this post’s subject matter, I feel the need to remind readers that this is not a corporate blog, and that the opinions expressed within are my personal opinions, not those of my employer. Also, please understand that I cannot comment on any intellectual property issues specifically related to my employer.

With that preamble out of the way, let me tell you a true story. The other day, I received a phone call from a friend who has been building a kick-ass startup. That friend had been contacted by a much larger competitor with what amounted to an ultimatum: shut down and come work for us, or we’ll crush you with a patent infringement suit. My friend’s startup didn’t cave in–in fact, my friend even went through the trouble of sharing a pile of incontrovertible prior art with the competitor. The competitor was unimpressed, and my friend’s startup is now facing a potentially ruinous lawsuit.

If you know any of the characters in this story, I beg you to keep that information to yourself–at least for now. I’d like my friend to have a chance of getting his company out of this predicament, and premature publicity might hurt his case.

But back to the case: let me give you an idea of how a story like this can play out. At a high level, the startup can choose to fight or not fight.

Not fighting means that the entrepreneurs write off their startup, but it allows them to move on and try something new. It might be the best career move for the entrepreneurs, but it means that the world loses a promising startup, and the surrender rewards bad behavior, reinforcing a regime where innovators can’t afford to compete with more established players.

Fighting means mounting a non-infringement defense, an invalidation defense, or both.

A non-infringement argument asserts that, regardless of the validity of the patent, its claims don’t cover what the startup is doing. Since patents carry a presumption of validity, the non-infringement route is appealing–there’s no need to slog through the much longer invalidation process. Leaving a bad patent alive may be a worse outcome for the rest of the world, but entrepreneurs don’t have the luxury of taking the weight of the world onto their own shoulders.

Unfortunately, the very characteristics of a bad patent make it hard for an accused infringer to succeed in a non-infringement argument. If a patent is overly broad, then it’s more likely that the infringement argument will be valid (but not sound, since the patent itself is–or should be–invalid). Vaguely worded claims are also a problem–while a patent examiner may have granted a patent based on one interpretation of the claim language, the patent holder may now be asserting infringement under a different (and typically broader) interpretation of that same language.

As a result, a non-infringement argument often depends almost entirely on the result of a Markman hearing, more formally known as a claim construction hearing. In such a hearing, a judge decides how to interpret any language in the claims whose meaning is contested by the opposing parties in the suit. Such a hearing is often a crapshoot for the accused infringer. An unfavorable claim construction that supports the infringement accusation may ultimately help invalidate the patent, but that vindication is likely to come too late–justice delayed for a startup is often an extreme case of justice denied.

Which brings us to the invalidation route. In theory, invalidation is the right approach to take when confronted with an invalid patent. Ideally, the accused infringer presents prior art to the patent office to reexamine the patent, resulting in the patent either being invalidated or rewritten with a much narrower scope. In practice, however, this approach requires significant effort, time, and money–especially if you depend on lawyers to do the heavy lifting–as well as luck. The best hope is to rapidly request and obtain a reexamination, and then to request and obtain a stay of the infringement suit pending reexamination. Needless to say, the patent holder will fight tooth and nail to avoid this outcome.

I don’t know how my friend’s story will end. But, as the above analysis should make clear, he’s between a rock and a hard place. Whether or not you believe that there should be software patents–and there is room for reasonable people to debate this question–I hope you agree that the situation my friend is facing amounts to legalized extortion. I understand that no system is perfect, and that our legal system requires compromises that have inevitable casualties.

Nonetheless, my friend’s story does not feel like an isolated incident, but rather evidence of a systemic problem. There are a lot of software patents floating around right now of dubious validity, many of them granted to companies that have since folded and have unloaded their assets in fire sales. It would be sad for this supply of ersatz intellectual property to impede the real innovation that the patent system was intended to protect.

Update: this post has been picked up by Y Combinator’s Hacker News.

Privacy, Pseudonymity, and Copyright

A lunch conversation during the Transparent Text symposium about transparency in social media (also a hot topic in the Ethics of Blogging panel) led me to watch the following presentation from Lawrence Lessig on “Privacy 2.0”:

http://blip.tv/play/lG372wMC

Another topic in that conversation was pseudonymity. Someone pointed to a 2000 USENIX paper entitled “Can Pseudonymity Really Guarantee Privacy?” The challenges of implementing pseudonymity have, of course, received lots of attention in the past few years. The most notorious example is the AOL search data scandal, which made the front page of the New York Times. But there’s also the work co-authored by my friend Vitaly Shmatikov on de-anonymizing Netflix data. Indeed, some have expressed concern that the new Netflix competition is a privacy lawsuit waiting to happen.

Finally, danah boyd’s master’s thesis on “faceted id/entity: managing representation in a digital world” also came up–and I recently discovered by way of Robert Scoble that she’ll be keynoting at SXSW next year. Now I feel even more proud that I convinced her to speak at the SIGIR Industry Track this year. But I digress.

What does any of this have to do with copyright? Watch Lessig’s presentation–it’s long, but I promise you it’s worthwhile and entertaining to boot. Besides, I’ve made it easy by embedding it for you! He makes an analogy–rather, he makes fair use of Jonathan Zittrain’s analogy–between privacy rights and copyright.

The executive (and overgeneralized) summary is that both privacy-holders (“consumers”) and copyright-holders (“industry”) have complained that technology has undermined their rights, and both have sought legal remedies. Consumers push back on industry, frustrated by its legal strategies to enforce copyright at the expense of consumer freedom; industry pushes back on consumers, frustrated by their legal strategies to enforce privacy rights at the expense of industry freedom. In each case, the side pushing back would rather let technology dictate policy. The analogy may not be perfect, but it is close enough to be compelling.

But I’d like to stretch the analogy further than Lessig and Zittrain do, to consider pseudonymity and derivative works. The pseudonymity challenge (e.g., the recent reports about Project Gaydar) reminds us that privacy isn’t binary, and that we have to accept at least some loss of privacy if we are going to live in a social world. Similarly, provisions like fair use exist because copyright is an inherent trade-off between protecting creators’ rights and embracing the value of creation in a social context.

As I said, I find Zittrain’s analogy and Lessig’s presentation compelling. While they may not answer any of society’s urgent questions about privacy and copyright, they may at least further the conversation. At the very least, I hope the topic is intellectually stimulating.