Author: Daniel Tunkelang

High-Class Consultant.

This is not a corporate blog

Post author By Daniel Tunkelang
Post date December 10, 2008

To paraphrase René Magritte, ceci n’est pas un blogue corporate (this is not a corporate blog).

Why do I bring this up? Because today I saw a post by Richard MacManus on ReadWriteWeb entitled “Report: Corporate Blogs Not Trusted” and a similar post by Joe Wilcox about how to “Make Your Corporate Blog Believable”. They cite a report from Forrester that company blogs represent the least trusted information source, down at 16%. Actually, personal blogs don’t fare much better at 18%, but I’d like to use this report as a pretext to talk about what it means to blog as an industry professional.

Frankly, I don’t think corporate blogs, at least as they are conventionally understood, are a good idea. Companies put out press releases that no sane person would trust as an objective information source. A corporate blog is just a repackaging of a press release web page, trying to masquerade as something more hip. I don’t think it fools anyone, and I’m not aware of any corporate blogs other than the Official Google Blog that have significant readership.

Bloggers are people who speak with individual voices. And industry professionals are still people, regardless of their corporate affiliations. I am no more the voice of Endeca’s public relations department than Greg Linden is the voice of Microsoft’s or Matt Cutts the voice of Google’s.

Of course, I have my point of view, which unsurprisingly has some alignment with my employer’s overall vision. When I advocate for HCIR, I don’t make excuses for the fact that HCIR underlies Endeca’s approach to information access. But I speak as an individual and in my own voice. When I blog, I put my own credibility on the line, and I cultivate a reputation that extends beyond my corporate affiliation.

I think the interesting question for companies is not whether they should publish corporate blogs, but rather whether they should encourage their employees to publish personal blogs that relate to the work the company does. As someone who has been involved in the development of Endeca’s core intellectual property, I understand the reservations that companies have about letting their employees publish. But I think that companies are often too conservative, and incur an enormous opportunity cost in the name of protecting trade secrets. Letting employees blog (and, more generally, publish) not only provides the companies with free marketing, but also provides employees with an avenue for personal development.

I’d be curious to hear perspectives from readers here who work for companies. Perhaps I’m lucky to work for an enlightened employer; do most corporate citizens get the memo from Legal saying that blogging is something only the marketing department should do?

Uncategorized

Freemium is the new black

An article by Claire Cain Miller in today’s New York Times proclaims: “A Web Start-Up Counting on Ad Sales? Good Luck”. The article isn’t kind to the ad-supported model in general, but the particular concern is for startups. The article quotes David Weiden from Khosla Ventures:

“The ad model is somewhat worse but not radically worse,” he said. “What’s worse is getting funded that way.” If a company approaches investors with a plan to lose money for three or four years while building an audience, it will encounter many closed doors, he said. “It’s gone from plausible to almost implausible.”

The preferred alternative is the “freemium” model: offer basic services for free, and upsell advanced or special features. I marvel that it’s controversial to talk about charging for services. But that’s what happens in a world where people have come to expect information to be free–an expectation that the current economic conditions will surely reinforce.

Uncategorized

Why the SEO stakes are so high

Post author By Daniel Tunkelang
Post date December 8, 2008

According to an article published today in IT Business Canada:

The typical Web site gets 61 per cent of its traffic from organic (nonpaid) search engine results, and 41 per cent of all traffic from Google alone.

In part, these numbers reflect Google’s dominance in web search–41/61 is a whopping 67%, which is within epsilon of Google’s reported share of the web search market. But the larger point is that most sites depend on web search for the majority of their traffic, which makes search engine optimization (SEO) a matter of life or death for commercial sites in general, but especially for online retailers and publishers.
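As a quick check of that ratio, here is the arithmetic as a small sketch. The variable names are mine, and it assumes (as the comparison above implicitly does) that the 41 per cent of traffic from Google is all organic search traffic.

```python
# Figures quoted from the IT Business Canada article.
organic_search_share = 0.61  # fraction of all traffic from organic search results
google_share = 0.41          # fraction of all traffic from Google alone

# Assumption: all of the Google traffic counted here is organic search traffic,
# so Google's share of organic search traffic is simply the ratio of the two.
google_share_of_search = google_share / organic_search_share

print(f"Google's share of organic search traffic: {google_share_of_search:.0%}")
```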

So it’s not surprising that SEO is a multi-billion dollar industry, comparable in size with the pay-per-click (PPC) advertising industry. And, to the extent that the SEO industry is helping to organize the world’s information, it’s earning its keep. But it’s hard to know how much SEO improves the efficiency of the information market vs. how much simply fuels an arms race. Again, full disclosure: I am one of the arms dealers.

General

Overwhelmed by Email?

Normally, I don’t post about press releases that people email me. But in this case, the title, “Half of Americans Are Overwhelmed by E-Mail”, hit far too close to home. Having spent the day catching up on a week of email, I’m feeling more than a little overwhelmed. And it’s made me think hard about imposing more discipline on how I manage email.

I’m in no immediate danger of declaring email bankruptcy, but I have reached a point where my ad hoc approach to managing email–particularly to checking email as it arrives–costs me so much in productivity that I am considering reducing the frequency with which I check email to once or twice a day.

One might ask if there’s anything unique about email as a source of context switching. Don’t the same issues apply to news feeds, Twitter, etc? And there’s instant messenger–which is intended to trigger an immediate context switch. Why single out email?

My suspicion is that email satisfies an unholy mix of properties:

A substantial fraction of email is personal and important, and there are no reliable automatic ways of identifying this fraction.
The sender’s expectation of how long to wait for a response varies widely–from as soon as possible (e.g., one-line bodyless emails used as instant messages) to days or even indefinite (e.g., an FYI email that does not require a response).
The typical sender sees email as the least invasive way to communicate, and therefore uses it as the default means of doing so.

The result: you (or at least I) end up with a relentless queue of email, faced with a choice of looking at all of it frequently, or likely deferring something urgent.

Of course, there are conventions, like marking emails as urgent, that are intended to sort out some of the above. But it isn’t realistic to expect everyone to use these consistently, at least not this late in the game.

Perhaps the answer is, as was suggested in an earlier post, to make it public. Specifically, Tantek suggests to “Move as much 1:1 communication into 1:many or 1:all mediums.” At some level, that is counterintuitive–after all, doesn’t that just make my problem everyone’s problem? But the key is that public communication sets different expectations. I might know the answer to your question, but so might any number of other people, so let’s balance the load.

It’s a nice idea, though it’s not clear how anarchic 1:many and 1:all mediums can accomplish this load balancing efficiently. But arguably that is just an implementation detail. The first step is to calibrate the specificity of distribution to the specificity of the information / communication need.

Where does this leave me and others who are overwhelmed by email? For now, stuck with heuristics like disciplined management. For the long haul, advocating for more scalable social norms.

p.s. Ironically, this post is a public response to a private email.

Uncategorized

Noisy Channel, Back on Manual

Post author By Daniel Tunkelang
Post date December 6, 2008

As the Captain says in WALL-E, “AUTO, you are relieved of duty!” It’s good to be back in the blogger’s seat, so stay tuned for fresh content coming up this week.

Uncategorized

Humans and Machines: Collaborators or Competitors?

Post author By Daniel Tunkelang
Post date December 3, 2008

Last week, Hal Daume wrote a nice post entitled “Supplanting vs Augmenting Human Language Capabilities”. Drawing an analogy between natural language processing (NLP) and robotics, he says:

I would say that most NLP research aims to supplant humans. Machine translation puts translators out of work. Summarization puts summarizers out of work (though there aren’t as many of these). Information extraction puts (one form of) information analysts out of work. Parsing puts, well… hrm…

There seems actually to be quite little in the way of trying to augment human capabilities.

He then offers possible ways that NLP might be used to augment, rather than supplant, human capabilities:

Tools for language learning.
Interactive information retrieval.
Adaptive tutorials.

The main tenet of HCIR is that information retrieval systems should be working with users, rather than trying to do all of the work on their own. It’s great to see a kindred spirit thinking about machine learning and NLP in the same light.

General

What Is (Not) Search?

I had a conversation the other day that raised a conundrum: what is *not* search? What do I mean by that? Well, as Stephen Arnold points out in a recent post, “search” can be anything from a “find-a-phone number problem” to a “glittering generality” that encompasses end-to-end information processing.

Language is imperfect, so is it really that important to define what is and isn’t “search”? It certainly matters when you’re trying to sell search technology! But, more importantly, we need some shared understanding in order to make progress.

At the very least, I propose that we distinguish “search” as a problem from “search” as a solution. By the former, I mean the problem of information seeking, which is traditionally the domain of library and information scientists. By the latter, I mean the approach most commonly associated with information retrieval, in which a user enters a query into the system (typically as free text) and the system returns a set of objects that match the query, perhaps with different degrees of relevancy.

Beyond that, we need to recognize that search exists within the context of tasks. It is easy to lump every task that involves information seeking as “search”, but doing so oversimplifies a complex landscape of activities and needs. I believe we are headed for a world where end users think about tasks rather than about the search activities that form part of those tasks. In that world, search technologists provide infrastructure, not the end-user destination.

General

Reflecting on AltaVista

Today is Dec 1, and it seems like an appropriate day to reflect on DEC’s one-time foray into web search: AltaVista. In fact, AltaVista was publicly launched as an internet search engine on December 15, 1995 as altavista.digital.com.

I was an avid AltaVista user, and I was shocked by the rapidity of its demise. Why did AltaVista fail?

According to Don Dodge, former Director of Engineering at AltaVista:

The AltaVista experience is sad to remember. We should have been the “Google” of today. We were pure search, no frills, no consumer portal crap.

DEC is guilty of neglect in its handling of AltaVista. Compaq put a bunch of PC guys in charge who relied on McKinsey consultants and copied AOL, Excite, Yahoo and Lycos into the consumer portal game. It should have been clear that being the 5th or 6th player in the consumer portal business wouldn’t work. AltaVista spent hundreds of millions on acquisitions that never worked, and spent $100M on a brand advertising campaign. They spent NOTHING to improve core search. That was the undoing of AltaVista. (via Greg).

Perhaps. I think that doesn’t give Google enough credit for its key innovation: using link analysis to compute a then unspammable measure of a site’s authority, and then using that authority as a prior for its relevance. Of course, spammers caught up and have engaged Google in an arms race ever since, but the head start was enough for Google to establish its supremacy.
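To make the idea concrete, here is a toy sketch of link-based authority used as a prior on relevance. It is a simplified PageRank-style power iteration, not Google's actual implementation; the function names, the 0.85 damping factor, the toy link graph, and the text-relevance numbers are all assumptions made up for illustration.

```python
def authority_scores(links, damping=0.85, iterations=50):
    """PageRank-style authority via power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of authority.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its authority evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its authority over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


def rank_results(text_relevance, authority):
    """Order matching pages by text relevance weighted by authority (the prior)."""
    return sorted(
        text_relevance,
        key=lambda page: text_relevance[page] * authority[page],
        reverse=True,
    )


# Hypothetical four-page web: a, b and d all link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
authority = authority_scores(links)

# Hypothetical text-match scores for a query that matches pages b and c.
text_relevance = {"b": 0.9, "c": 0.8}
ranking = rank_results(text_relevance, authority)
print(ranking)  # c outranks b despite its lower text-match score
```

The point of the toy example is that page b matches the query text slightly better, but nobody links to it, so the heavily linked page c wins once authority is factored in.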

Is there a moral? Surely Dodge is right in condemning DEC’s business strategy. But I am sad to see how web search technology has settled in its current local optimum. So, at the risk of being cliché, I’ll draw the lesson that no technologist can afford to be complacent.

General

Software Agents and Rationality

Post author By Daniel Tunkelang
Post date November 30, 2008

Back when I was an undergraduate (yes, a long long time ago), there was a lot of excitement about software agents, also called intelligent agents. The general idea was that a software agent would be able to pursue goal-directed behavior on a person’s behalf. Of course, what that meant ran the gamut from the mundane (e.g., autodialers) to science fiction (e.g., Brainiac in the Superman comics).

With the increasing role that the web plays in our interactions, I wonder about the role of software agents on the web. We already see comment spammers and prankster instant messaging bots, as well as more benign shopbots.

But a question that plagues me is how to reconcile the inherent rationality of software agents with the systematic irrationality of the human beings they represent. Herb Simon argued that humans exercise bounded rationality, but the research from prospect theory suggests that the situation is even worse: not only are we bounded by our limited mental resources, but we don’t even make the most rational use of the resources we have.

So, if software agents start making decisions on our behalf, I wonder how happy we’ll be with those decisions. Will software agents have to simulate our deviations from rationality? Or will we have to learn to be more rational?

Finally, I should note that machine agents are not restricted to the web or even to software. Just pick up the New York Times, and you can read about attempts to make Terminators a reality. Those efforts raise concerns not only about rationality, but about ethics and accountability.

I’ll be back.

General

Beware of Google

According to the generally accepted history of Google, the company’s name originated from a common misspelling of the word “googol”, which refers to 10¹⁰⁰.

But, for folks who spend their nights worrying whether Google is evil, you might want to explore the possibility that its name comes from the horrid monster depicted in V. C. Vickers’s 1913 children’s tale, “The Google Book”.

Don’t worry, I’m not scaring my daughter with tales of Googles and Yahoos.