Month: December 2008

The Noisy Channel: Now Better Than Sex!

Post author By Daniel Tunkelang
Post date December 12, 2008

Well, I’ll admit the evidence is a bit shaky. But an online survey commissioned by Intel reports that about half of women and a third of men would rather go without sex for two weeks than give up the Internet for that long. I’m not quite sure what to make of the survey, or the premise that the survey sought to prove “how essential the Internet has become to people–even during tough economic times.”

All I can say is that, if you are spending your time on the Internet, I hope you are enjoying The Noisy Channel.


Computational Information Design

Post author By Daniel Tunkelang
Post date December 12, 2008

Tonight I had the good fortune to attend a talk by Ben Fry on Computational Information Design at the Broad Institute of MIT and Harvard. Ben Fry is one of those rare human beings whose work spans from the heart of academia (he’s worked with Eric Lander on visualizing genetic data) to popular culture (his work appears in Minority Report and The Hulk). And he’s an outstanding speaker.

The content of his talk reflected his dissertation work at the MIT Media Laboratory, his postdoctoral work at the Broad Institute, and some of his more recent work as a designer and consultant. I can’t do justice to the talk, which unfortunately is not available in any recorded form. But I do suggest you seize the opportunity to hear him speak, should it come your way. He communicates the power of visualization through examples, in a way that conveys both their practical value and their beauty.

The Q&A session was almost as long as the talk, and probably could have gone on indefinitely if the organizers hadn’t finally cut it off. Suspecting that I was one of the few non-academics in the audience, I asked two eminently practical questions: how do you know that a visualization is effective, and how do you guard against a visualization skewing your perception of the data?

Fry’s answers were incisive. He judges the effectiveness of a visualization based on whether people give up their previous tools to use it. And he selects problem areas where he sees a significant opportunity to improve the state of the art. That way, the difference in adoption is so obvious that you don’t need to perform user studies to observe it.

As for the concern that a visualization can skew perception of the data, he acknowledges it as valid but points out that we don’t seem to raise the same concern about non-visual (e.g., textual) presentations of data. Somehow we are especially suspicious of aesthetic representation, a sort of “don’t hate me because I’m beautiful” bias. He adds that the risk of design skewing our perception is dwarfed by the cost of not designing at all.

Visualization is a tricky subject, and I’ll freely admit that I’m underwhelmed by much of the work I’ve encountered. Perhaps my past work in information visualization makes me a particularly harsh critic. But Fry presents a compelling picture–or rather, a compelling video, since his work is full of motion. My only complaint is that he hasn’t explored the world of search and information retrieval. His work seems to beg for application in that domain. Food for thought.


Upgraded to WordPress 2.7

Just wanted to let readers know that I’ve upgraded to the latest version of WordPress, 2.7 (“Coltrane”). Please let me know if you experience any technical difficulties.


This is not a corporate blog

Post author By Daniel Tunkelang
Post date December 10, 2008

To paraphrase René Magritte, ceci n’est pas un blogue corporate (this is not a corporate blog).

Why do I bring this up? Because today I saw a post by Richard MacManus on ReadWriteWeb entitled “Report: Corporate Blogs Not Trusted” and a similar post by Joe Wilcox about how to “Make Your Corporate Blog Believable“. They cite a report from Forrester that company blogs represent the least trusted information source, down at 16%. Actually, personal blogs don’t fare much better at 18%, but I’d like to use this report as a pretext to talk about what it means to blog as an industry professional.

Frankly, I don’t think corporate blogs, at least as they are conventionally understood, are a good idea. Companies put out press releases that no sane person would trust as an objective information source. A corporate blog is just a repackaging of a press release web page, trying to masquerade as something more hip. I don’t think it fools anyone, and I’m not aware of any corporate blogs other than the Official Google Blog that have significant readership.

Bloggers are people who speak with individual voices. And industry professionals are still people, regardless of their corporate affiliations. I am no more the voice of Endeca’s public relations department than Greg Linden is the voice of Microsoft’s or Matt Cutts the voice of Google’s.

Of course, I have my point of view, which unsurprisingly has some alignment with my employer’s overall vision. When I advocate for HCIR, I don’t make excuses for the fact that HCIR underlies Endeca’s approach to information access. But I speak as an individual and in my own voice. When I blog, I put my own credibility on the line, and I cultivate a reputation that extends beyond my corporate affiliation.

I think the interesting question for companies is not whether they should publish corporate blogs, but rather whether they should encourage their employees to publish personal blogs that relate to the work the company does. As someone who has been involved in the development of Endeca’s core intellectual property, I understand the reservations that companies have about letting their employees publish. But I think that companies are often too conservative, and incur an enormous opportunity cost in the name of protecting trade secrets. Letting employees blog (and, more generally, publish) not only provides the companies with free marketing, but also provides employees with an avenue for personal development.

I’d be curious to hear perspectives from readers here who work for companies. Perhaps I’m lucky to work for an enlightened employer; do most corporate citizens get the memo from Legal saying that blogging is something only the marketing department should do?


Freemium is the new black

An article by Claire Cain Miller in today’s New York Times proclaims: “A Web Start-Up Counting on Ad Sales? Good Luck“. The article isn’t kind to the ad-supported model in general, but its particular concern is for startups. The article quotes David Weiden from Khosla Ventures:

“The ad model is somewhat worse but not radically worse,” he said. “What’s worse is getting funded that way.” If a company approaches investors with a plan to lose money for three or four years while building an audience, it will encounter many closed doors, he said. “It’s gone from plausible to almost implausible.”

The preferred alternative is the “freemium” model: offer basic services for free, and upsell advanced or special features. I marvel at how controversial it is to talk about charging for services. But that’s what happens in a world where people have come to expect information to be free–an expectation that the current economic conditions will surely reinforce.


Why the SEO stakes are so high

Post author By Daniel Tunkelang
Post date December 8, 2008

According to an article published today in IT Business Canada:

The typical Web site gets 61 per cent of its traffic from organic (nonpaid) search engine results, and 41 per cent of all traffic from Google alone.

In part, these numbers reflect Google’s dominance in web search–41/61 is a whopping 67%, which is within epsilon of Google’s reported share of the web search market. But the larger point is that most sites depend on web search for the majority of their traffic, which makes search engine optimization (SEO) a matter of life or death for commercial sites in general, and especially for online retailers and publishers.
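The 67% figure is just Google’s 41% divided by the 61% of traffic that comes from organic search, treating (as the post does) Google’s share as part of the organic total. A quick check in Python, using only the two percentages quoted from the article; the variable names are illustrative:

```python
# Percentages of a typical site's traffic, as quoted in the IT Business Canada article.
organic_search_pct = 61  # organic (nonpaid) search results, all engines
google_pct = 41          # share of all traffic from Google alone

# Google's share of the organic search traffic, i.e. 41/61,
# assuming Google's 41% is contained within the 61% organic total.
google_share_of_organic = google_pct / organic_search_pct
print(f"Google's share of organic search traffic: {google_share_of_organic:.0%}")
```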

So it’s not surprising that SEO is a multi-billion dollar industry, comparable in size with the pay-per-click (PPC) advertising industry. And, to the extent that the SEO industry is helping to organize the world’s information, it’s earning its keep. But it’s hard to know how much SEO improves the efficiency of the information market vs. how much it simply fuels an arms race. Again, full disclosure: I am one of the arms dealers.


Overwhelmed by Email?

Normally, I don’t post about press releases that people email me. But in this case, the title, “Half of Americans Are Overwhelmed by E-Mail“, hit far too close to home. Having spent the day catching up on a week of email, I’m feeling more than a little overwhelmed. And it’s made me think hard about imposing more discipline on how I manage email.

I’m in no immediate danger of declaring email bankruptcy, but I have reached a point where my ad hoc approach to managing email–particularly to checking email as it arrives–costs me so much in productivity that I am considering reducing the frequency with which I check email to once or twice a day.

One might ask if there’s anything unique about email as a source of context switching. Don’t the same issues apply to news feeds, Twitter, etc.? And there’s instant messenger–which is intended to trigger an immediate context switch. Why single out email?

My suspicion is that email satisfies an unholy mix of properties:

A substantial fraction of email is personal and important, and there are no reliable automatic ways of identifying this fraction.
The sender’s expectation of how long to wait for a response varies widely–from as soon as possible (e.g., one-line bodyless emails used as instant messages) to days or even indefinite (e.g., an FYI email that does not require a response).
The typical sender sees email as the least invasive way to communicate, and therefore uses it as the default means of doing so.

The result: you (or at least I) end up with a relentless queue of email, faced with a choice between looking at all of it frequently and likely deferring something urgent.
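To make that trade-off concrete, here is a toy model in Python. It is a hypothetical sketch, not anything from the post: the function `simulate_checking`, the arrival times, and the check schedules are all invented for illustration. Reading each message as it arrives costs one context switch per message but no waiting; checking on a fixed schedule batches messages into a few sessions at the cost of a longer worst-case delay.

```python
import bisect
from typing import List, Optional, Tuple


def simulate_checking(
    arrivals: List[float], check_times: Optional[List[float]] = None
) -> Tuple[int, float]:
    """Toy model of an email-checking policy (hypothetical, for illustration).

    arrivals:    hours of the day at which messages arrive.
    check_times: hours at which the inbox is checked; None means every
                 message is read the moment it arrives.
    Returns (context_switches, worst_case_wait_in_hours).
    """
    if check_times is None:
        # Each distinct arrival interrupts you immediately; nothing waits.
        return len(set(arrivals)), 0.0

    checks = sorted(check_times)
    read_times = []
    for t in arrivals:
        # Index of the first scheduled check at or after the arrival time.
        i = bisect.bisect_left(checks, t)
        if i == len(checks):
            raise ValueError(f"message at {t} arrives after the last check")
        read_times.append(checks[i])

    # One context switch per check session that actually finds new mail.
    switches = len(set(read_times))
    worst_wait = max((r - t for r, t in zip(read_times, arrivals)), default=0.0)
    return switches, worst_wait


if __name__ == "__main__":
    # Made-up arrival times (hours of a working day).
    arrivals = [9.1, 9.5, 10.2, 11.0, 13.7, 14.1, 16.4, 16.9]
    for label, schedule in [
        ("as it arrives", None),
        ("noon and 5 p.m.", [12.0, 17.0]),
        ("5 p.m. only", [17.0]),
    ]:
        switches, wait = simulate_checking(arrivals, schedule)
        print(f"{label:>16}: {switches} switches, worst wait {wait:.1f} h")
```

With these made-up arrivals, checking as mail arrives means eight interruptions, checking at noon and 5 p.m. means two, and checking only at 5 p.m. means one, while the longest wait grows from zero to several hours. That growing wait is where something urgent gets deferred.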

Of course, there are conventions, like marking emails as urgent, that are intended to sort out some of the above. But it isn’t realistic to expect everyone to use these consistently, at least not this late in the game.

Perhaps the answer is, as was suggested in an earlier post, to make it public. Specifically, Tantek suggests to “Move as much 1:1 communication into 1:many or 1:all mediums.” At some level, that is counterintuitive–after all, doesn’t that just make my problem everyone’s problem? But the key is that public communication sets different expectations. I might know the answer to your question, but so might any number of other people, so let’s balance the load.

It’s a nice idea, though it’s not clear how anarchic 1:many and 1:all mediums can accomplish this load balancing efficiently. But arguably that’s just an implementation detail. The first step is to calibrate the specificity of distribution to the specificity of the information / communication need.

Where does this leave me and others who are overwhelmed by email? For now, stuck with heuristics like disciplined management. For the long haul, advocating for more scalable social norms.

p.s. Ironically, this post is a public response to a private email.


Noisy Channel, Back on Manual

Post author By Daniel Tunkelang
Post date December 6, 2008

As the Captain says in WALL-E, “AUTO, you are relieved of duty!” It’s good to be back in the blogger’s seat, so stay tuned for fresh content coming up this week.


Humans and Machines: Collaborators or Competitors?

Post author By Daniel Tunkelang
Post date December 3, 2008

Last week, Hal Daume wrote a nice post entitled “Supplanting vs Augmenting Human Language Capabilities“. Drawing an analogy between natural language processing (NLP) and robotics, he says:

I would say that most NLP research aims to supplant humans. Machine translation puts translators out of work. Summarization puts summarizers out of work (though there aren’t as many of these). Information extraction puts (one form of) information analysts out of work. Parsing puts, well… hrm…

There seems actually to be quite little in the way of trying to augment human capabilities.

He then offers possible ways that NLP might be used to augment, rather than supplant, human capabilities:

Tools for language learning.
Interactive information retrieval.
Adaptive tutorials.

The main tenet of HCIR is that information retrieval systems should be working with users, rather than trying to do all of the work on their own. It’s great to see a kindred spirit thinking about machine learning and NLP in the same light.


What Is (Not) Search?

I had a conversation the other day that raised a conundrum: what is *not* search? What do I mean by that? Well, as Stephen Arnold points out in a recent post, “search” can be anything from a “find-a-phone number problem” to a “glittering generality” that encompasses end-to-end information processing.

Language is imperfect, so is it really that important to define what is and isn’t “search”? It certainly matters when you’re trying to sell search technology! But, more importantly, we need some shared understanding in order to make progress.

At the very least, I propose that we distinguish “search” as a problem from “search” as a solution. By the former, I mean the problem of information seeking, which is traditionally the domain of library and information scientists. By the latter, I mean the approach most commonly associated with information retrieval, in which a user enters a query into the system (typically as free text) and the system returns a set of objects that match the query, perhaps with different degrees of relevancy.
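For readers who want “search as a solution” spelled out, here is a deliberately minimal Python sketch: a free-text query goes in, and the documents sharing at least one query term come back ordered by a crude relevance score (the total number of query-term occurrences). The function names, the scoring rule, and the sample documents are illustrative assumptions; production engines use far more refined weighting, such as TF-IDF or BM25.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query: str, documents: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (doc_id, score) for every document matching the query.

    A document matches if it contains at least one query term; its score is
    the total number of occurrences of the query terms (a toy relevance measure).
    """
    terms = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((doc_id, score))
    # Most relevant first; ties broken by doc_id so the output is deterministic.
    results.sort(key=lambda r: (-r[1], r[0]))
    return results


if __name__ == "__main__":
    docs = {
        "a": "Faceted search supports information seeking",
        "b": "Search engines, search ads, and search optimization",
        "c": "Visualizing genetic data",
    }
    for doc_id, score in search("information search", docs):
        print(doc_id, score)
```

Note what the sketch leaves out: it knows nothing about the task or information need behind the query, which is exactly the gap between search as a solution and search as a problem.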

Beyond that, we need to recognize that search exists within the context of tasks. It is easy to lump every task that involves information seeking as “search”, but doing so oversimplifies a complex landscape of activities and needs. I believe we are headed for a world where end users think about tasks rather than about the search activities that form part of those tasks. In that world, search technologists provide infrastructure, not the end-user destination.