Category: General

General posts, typically analyzing HCIR issues.

Google Exec Udi Manber: In-House Search is “Not That Good”

Post author By Daniel Tunkelang
Post date October 19, 2008

On Friday, David Needle of InternetNews published an article with the provocative title, “Google Exec Disses Google’s In-House Search”. The essence of the article: Udi Manber, the Google VP of Engineering who is responsible for core search, evaluated Google’s internal search tools less than enthusiastically, saying “It’s not that good — I’m complaining about it”.

The article is a bit short on details. It quotes Nitin Mangtani describing recent updates to the Google Search Appliance to enable clustering of search results. But the most telling snippet is towards the end of the article, when Manber expresses his views on user interfaces:

While the search giant is constantly tinkering with new user interfaces, Manber said the simplicity of its standard, bare bones design remains tough to beat.

“Google has been very successful by being very minimal,” Manber said. “We’re doing hundreds of experiments with user interfaces; I see two to three new ones everyday.”

He added that Google might offer users the option of different views on its main search page, similar to the way it does so already on its personalized iGoogle page.

“Otherwise, I expect very incremental changes.” He said advanced users appreciate things like 3D and interfaces that offer more detailed views, but for the vast majority, “what happens now works. You type in two words, click and you’re done. You can’t beat that.”

To borrow a popular political slogan, yes we can. In fact, as IDC analyst Sue Feldman (also quoted in the article) said the other day, “One of the problems we have with search is that people ask such lousy questions…anytime tools hand people clues, it helps.”

Google’s success on the consumer web affords them the luxury of hiding their heads in the sand when it comes to enterprise information access. And I understand the appeal that Google’s computer scientists (and others) feel in approaching the information seeking problem as one of optimizing relevance ranking.

It’s not just Manber. Here are some quotes from Google Enterprise Product Manager Cyrus Mistry at a recent presentation:

“[the ideal search engine] knows exactly what you meant, gives you exactly what you want.”
“If you think tagging is the way to go, good luck. See me in 10 years.”
“We’ll decide where to show it” (an explanation of the value proposition of Google’s universal search, which blends results from multiple sources into a single ranking)

It’s easy for Google to be cocky when they’re making $1.35B in quarterly profits. But it doesn’t make them right, especially when it comes to an area that accounts for about 1% of their business. Mind reading may not be impossible; some of my colleagues at CMU are working on it as we speak.

In the meantime, the only practical means our systems have for determining user intent is the user’s input. And, as has been widely reported, the average search query contains 1.7 words. Perhaps the entropy of web search makes it possible to reliably infer intent from such a small signal. But enterprise search–which is to say information seeking in the enterprise–is harder.

At Endeca, we use our own technology in house. Our solution isn’t perfect, and we’re constantly working to improve it. But, most importantly, we’re going after the right problem. To respond to Mistry’s comments:

The information access tool does not presume to know exactly what you meant or what you want, but instead works with you to establish this understanding through dialogue.
Tagging can be a very effective way to bring in human expertise, especially when it is distributed across a broad population of users. But the tagging mechanism has to be easy for users, and the system needs to be smart about extrapolating from those tags to fill in the gaps.
Often the best way to present diverse results is not by blending them into a single ranking but rather exposing that diversity to users in the form of a progressive refinement dialogue.
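To make the last point a bit more concrete, here is a minimal sketch of one refinement step. It is an illustration under my own assumptions, not a description of how Endeca or any other product works: the result records, the `source` and `author` fields, and the function names are all hypothetical.

```python
from collections import Counter


def facet_counts(results, facet):
    """Count how many results fall under each value of a facet."""
    return Counter(r[facet] for r in results if facet in r)


def refine(results, facet, value):
    """Keep only the results matching the facet value the user picked."""
    return [r for r in results if r.get(facet) == value]


# Hypothetical results for the query "sales", drawn from several sources.
results = [
    {"title": "Q3 sales deck", "source": "wiki", "author": "ana"},
    {"title": "Sales forecast", "source": "email", "author": "ben"},
    {"title": "Sales playbook", "source": "wiki", "author": "ben"},
]

# Step 1: expose the diversity instead of blending it into one ranking.
by_source = facet_counts(results, "source")

# Step 2: the user picks "wiki"; the remaining results get new facet counts.
wiki_only = refine(results, "source", "wiki")
by_author = facet_counts(wiki_only, "author")
```

Each round shows the user how the remaining results break down, and the user's choice becomes the next turn of the dialogue.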

Google aspires to “organize the world’s information” but admits that its approach falls short when it comes to organizing the information inside their firewall. I commend Manber for his candor. But I hope he and his fellow Googlers take the next step and recognize that they have to think outside the search box.


Duck Duck Go!

Recently I’ve started using Twitter Search to find people talking about topics that interest me. One of my serendipitous finds was Gabriel Weinberg, who is reported to have single-handedly built a search engine called Duck Duck Go. I’ll suspend judgement on the name–after all, Beyond Search blogger Steve Arnold proudly calls himself an addled goose.

Regular readers know that I’m highly skeptical of quixotic attempts to take on the web search market. And I have no reason to believe that Duck Duck Go will achieve meaningful market share in our lifetime.

But Weinberg has truly done more with less. For example, when I do a query for SIGIR, I get a disambiguation dialog that bootstraps on Wikipedia. Yes, these are also the top two hits on Google, but Duck Duck Go presents them in a dialog that implements clarification before refinement.

Unambiguous queries like Endeca or Warren Buffett need no clarification and instead return clean pages of top results from the major content types.

I suspect that Weinberg is heavily leveraging Wikipedia. But why not? Why work hard when you can work smart?

And Duck Duck Go can go off the rails, particularly for harder queries. I haven’t tested it enough to scientifically compare its quality to that of the major web search engines.

Still, it makes a strong first impression and has a great interface. At the very least, he’s raising the user experience bar for the common web search use cases. Check it out!


2.0 Means Give-to-Get

I’ve been living in a kaleidoscope of “2.0”s recently: web 2.0, enterprise 2.0, government 2.0. I know some of those are already shedding 2s in favor of 3s. But I wanted to reflect on the core tenet of these 2.0 visions: give-to-get.

First, let me give immediate credit to the folks at the Greater IBM Connection, who put the phrase in my head at a recent summit. Those who have studied pre-2.0 history may recognize give-to-get as the Golden Rule: Do unto others as you would have them do unto you. I don’t know of any other precept that has achieved such universality across cultures.

What does give-to-get mean in the context of these 2.0 approaches to technology?

Web 2.0:

Blogs where readers’ comments become even more valuable than the original posts.
Sites that send users away when they can’t meet users’ needs themselves.

Enterprise 2.0:

Application architectures evaluated based on their ability to play well with others.
Professional conversations increasingly taking place openly, outside company firewalls.

Government 2.0:

Information sharing factoring into government employees’ performance reviews.
Efforts by intelligence agencies to replicate the communication style of consumer social networks.

Some people who are far more expert than I have written about this stuff:

Jeff Jarvis’s Golden Rule of Links
Dan Woods’s work on Mesh Collaboration

What I hope is clear is that, despite the overplaying of “2.0” as a buzzword, the real value of this trend is in promoting one of the cornerstones of our success as a species: enlightened altruism.

And, at the risk of beating a dead horse, I’d like to call attention to my own efforts to give credit where it’s due on this blog. I’ve consciously reduced internal linking, only using it to refer to earlier posts. But, as you’ll see, this “altruism” is quite self-serving. I’ve been delighted to see folks link to this blog in order to cite its ideas.

Because what 2.0 is ultimately about is better information sharing for all of us. And for that, we all have to give to get.


Alerting: Push or Pull?

The other day, I was ranting about how Google is conflating the goals of search and advertising. One of the questions that we discussed over at Greg Linden’s blog was whether the difference between search and advertising is push vs. pull. But, as we concluded, that isn’t quite it. The difference is not the means, but rather the end: meeting the user’s needs rather than those of advertisers.

And, indeed, the perfect example of a user-driven push interface is alerting. In a typical alerting system, users specify a running query that triggers whenever matching content is published. Certainly this is more akin to web search than to advertising.

But, like web search, alerting runs into the challenge of adversarial information retrieval. If SEO is about maximizing exposure to users through high rankings in search results, there must be an analogous concept of maximizing exposure to users by triggering their alerts.

For example, I happen to know that Gartner analyst Whit Andrews, like many of us, has an alert on his own name. By placing his name in this post, I am quite confident that he will read it.

But why go after individuals when you can spam wholesale? Including the name of a company in a blog post is certain to attract the attention of a fair number of employees. Including the ticker symbol of a publicly traded company in a document will trigger stock tracking alerts. Et cetera.
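Here is a minimal sketch of why this works, assuming the simplest possible alerting rule: an alert fires when every term of the standing query appears in a newly published document. Real alerting systems are surely more sophisticated; the class, the subscriber names, and the sample texts below are all hypothetical.

```python
import re
from dataclasses import dataclass, field


def tokenize(text):
    """Lowercase a text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class AlertingSystem:
    # Maps each (hypothetical) subscriber to the terms of their standing query.
    alerts: dict = field(default_factory=dict)

    def subscribe(self, owner, query):
        self.alerts[owner] = tokenize(query)

    def publish(self, text):
        """Return the subscribers whose alerts a new document triggers."""
        words = tokenize(text)
        return sorted(o for o, terms in self.alerts.items() if terms <= words)


system = AlertingSystem()
system.subscribe("analyst", "Whit Andrews")
system.subscribe("employee", "Endeca")
system.subscribe("investor", "GOOG")

honest = "Whit Andrews spoke about enterprise search."
stuffed = "Cheap watches! Whit Andrews Endeca GOOG"

print(system.publish(honest))   # only the analyst's alert fires
print(system.publish(stuffed))  # all three alerts fire
```

The matcher has no notion of relevance, only of term presence, so a publisher who stuffs a document with names and ticker symbols reaches every subscriber at once.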

Others have noticed the ability to spam through alerting systems. I imagine that alerting systems will eventually adopt strategies similar to those of search engines to inhibit spam and decide what is relevant. And perhaps those same systems will include ads.


Extra, Extra: Newspapers’ Web Revenue is Stalling

Post author By Daniel Tunkelang
Post date October 13, 2008

Today’s news: newspapers’ web revenue is stalling.

No wonder: Google is mixing up search, advertising, and publication, while newspapers are responding to this competitive pressure by sliding down the slippery slope into becoming aggregators.

This is a tough game, and I’m not sure how it plays out. I understand how media players fear Google commodifying their content, but I don’t think the best strategy for them is to accelerate this process.

On one hand, the increasing dependence on Google for traffic degrades brand loyalty, since it leads to hit-and-run users. On the other hand, the increasing dependence on ad networks means that, as Paul Iaffaldano of the TWC Media Solutions Group suggests, “the publishers commoditize their own inventory”.

I use Google News and Techmeme to get an overview of general news and technology headlines, but I am still loyal to several sites and feeds. I’m probably more of a news junkie than most. Even so, if publishers sacrifice their differentiation as a short-term survival tactic, they will ultimately lose everything.

It’s a bit late, but I think publishers need to figure out how to renegotiate their balance of power with aggregators and even search engines. If the economics of publishing on the web reduce to an SEO war, it ain’t gonna be pretty.


The Link Economy goes Mainstream

Post author By Daniel Tunkelang
Post date October 13, 2008

I just read an article in the New York Times by Brian Stelter describing how mainstream news outlets like NBC and the New York Times itself are starting to link to other sites. This is a pretty radical change, since these sites have historically aimed to be sticky and thus maximize their customers’ exposure to their content and their ads.

The article quotes Scott Karp, chief of the Web-based newswire Publish2, justifying this “link journalism” approach by relating it to Google’s success: “It’s all about sending people away, and it does such a good job of it that people keep coming back for more.”

Blogger Jeff Jarvis (who is also involved with the Daylife news aggregator) offers a “golden rule” of links: “Link unto others’ good stuff as you would have them link unto your good stuff.”

As a blogger, I find a lot to agree with in the above. But I’m operating a niche site aimed at a highly targeted audience. And, while I aspire to have hordes of readers, I am not counting on them for my livelihood. I’m not even monetizing my readership by selling their attention to advertisers!

But I’m not sure how well this approach will work for broad media outlets. As the article states, these news organizations are acting in effect like aggregators. So much for “content is king”. I exaggerate–I assume that none of these media companies are planning to dump their own content and reduce themselves to branded aggregators. Still, it is a slippery slope, and it’s hard to resist the lure of free content.

I’m curious to see where this all goes. As a user, I’ve moved from media loyalty (I grew up receiving the New York Times on my doorstep) to using media-commodifying aggregators (Google News) to pulling together RSS feeds into my own reader. I suppose most people lack the patience, inclination, or technical sophistication to put together personalized newsfeeds. Still, I’m not convinced that it’s a good idea for media players who have valuable content to turn themselves into aggregators.

Rather, I think they should follow the advice of Dan Farber, vice-president of editorial at CNET Networks and editor in chief of ZDNet:

At CNET we link to our stories and to others. Generally if it is a standard news item that everyone has, we link to our version. If someone has the seed of a story or a take that helps to carry a story forward or deeper, we link to whatever. A challenge for all of us is finding and linking to content that we should point our readers at…often we don’t have the time to go figure who has the best take or where a story came from before it got refactored by the blogosphere…so we continue to improve on it every day.

I think this advice confirms Jarvis’s “golden rule”, but doesn’t go as far as Karp’s “link journalism”. If you are a media outlet, you should send your readers away if you don’t have what they want. But you should try to do a good job of having what your readers want. After all, you are a media outlet, not an aggregator or search engine.


Information Retrieval on Wikipedia: How You Can Help

Post author By Daniel Tunkelang
Post date October 12, 2008

In response to my various calls to action here at The Noisy Channel, I’ve gotten a fair number of requests from readers asking me for specifics on how they can help. I’d like to offer some concrete suggestions. I’m hoping to make them bite-sized enough that we can make a task queue that volunteers will pick up.

Proposed projects:

Add a History section to the Enterprise Search entry. Some suggested sources:
– “Challenges in Enterprise Search” by David Hawking
– “Enterprise Search: Tough Stuff” by Rajat Mukherjee and Jianchang Mao
– Enterprise Search Sourcebook 2008
– The list of Enterprise Search Vendors (but please don’t revert this page back to a vendor list!)
– Enterprise Track at TREC
Add a “Definition” section to the Enterprise Search entry that subsumes the current entry and explains the various competing definitions of “enterprise search”. Posts on this blog that mention enterprise search, as well as the material they reference, would be a good starting point. This task might also include eliminating the Enterprise Information Access stub and pointing it to the Enterprise Search entry.
Create a Faceted Search entry. Amazingly, Wikipedia doesn’t have one, and the Faceted Classification entry is not (and should not be) a substitute. Marti Hearst’s HCIR ’08 paper would be a great starting point.
Go through the Information Retrieval category and propose at least incremental steps to organize the 91 entries in it. For example, should vendors be on this list? Open source software packages? I don’t know what is standard for a Wikipedia category, but it is ironic that the Information Retrieval category is so chaotic!

I also encourage people here to add to this list, though I suggest that those same people consider contributing more than just work for others. And, to be clear, some of the entries in this category are excellent, e.g., the entry on stemming. We should aspire to raise all of the entries to this level, and at least to promote high-quality entries so that they are not buried by their lower-quality brethren.


Twitter’s Twist on the Attention Economy

Post author By Daniel Tunkelang
Post date October 10, 2008

I am a long-time LinkedIn user, and over time I’ve accumulated over 1,000 connections. Most of them are people I actually know or at least have interacted with online beyond “connecting”.

You might think that’s a large number of people to have as connections, and that I could afford to have a more selective velvet rope. And, as you may have noted, I know only most of my connections; some of them are link spammers whose connection requests I nonetheless accepted.

But, you see, there’s no incentive for an individual to reject a spammy connection request. Link spammers do reduce the relative value of legitimate links, and as a result devalue the LinkedIn network as a whole. But it’s a classic tragedy of the commons. Why should I personally sacrifice the reach of my network if I gain nothing? As far as I can tell, this problem applies just as much to Facebook and other social networking platforms.

Twitter is a different beast. Granted, Twitter and LinkedIn may not even see each other as competitors, but that is beside the point. They are competing for people’s social networking cycles, and all of today’s social networking platforms / applications are surely keeping their options open as to what positions they will ultimately stake out.

In any case, what most differentiates Twitter from LinkedIn is their attention economics. On LinkedIn, you accrue a benefit–at no apparent cost–from the size of your network, up to degree 3. In contrast, all that matters in the Twitter “social graph” are your immediate links. You don’t get any direct benefit from connections at distance greater than 1. Moreover, the connections are asymmetric, as are their costs and benefits. Following people is an investment of your attention, where the return is access to information (in a broad sense). Being followed is an investment of their attention, and hence an opportunity to exert influence. The asymmetry of Twitter connections is most evident for celebrity influencers, who have far more followers than followees.
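A small sketch may help illustrate the contrast. It assumes a toy model of my own, not either company's actual implementation: LinkedIn as an undirected graph where reach is counted out to three hops, and Twitter as a directed graph where only immediate links count. The graphs and names below are made up.

```python
from collections import deque


def reach_within(graph, start, max_degree=3):
    """Count distinct people within max_degree hops in an undirected graph."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the degree limit
        for neighbor in graph.get(person, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return len(seen) - 1  # exclude the starting person


def follow_counts(follows, user):
    """Return (followers, followees) for a user in a directed follow graph."""
    followees = len(follows.get(user, ()))
    followers = sum(1 for targets in follows.values() if user in targets)
    return followers, followees


# Toy LinkedIn: accepting the spammer's request extends my reach for free.
linkedin = {
    "me": ["a", "spammer"],
    "a": ["me", "b"],
    "b": ["a"],
    "spammer": ["me", "x", "y", "z"],
    "x": ["spammer"], "y": ["spammer"], "z": ["spammer"],
}

# Toy Twitter: each follow is a one-way investment of attention.
twitter = {
    "me": {"celebrity"},
    "fan1": {"celebrity"},
    "fan2": {"celebrity"},
    "celebrity": set(),
}

print(reach_within(linkedin, "me"))         # people within three hops
print(follow_counts(twitter, "celebrity"))  # (followers, followees)
```

In the toy LinkedIn graph, the spammer's connections inflate my three-hop reach at no cost to me. In the toy Twitter graph, the only numbers that exist are direct follower and followee counts, and they need not match.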

While Twitter, at least in my view, is a work in progress, I think they have done well to align their model with attention scarcity. I’m most keenly aware of this scarcity as I decide whom to follow. Accepting a connection from a LinkedIn spammer costs me nothing, while following someone on Twitter who updates on every inhale and exhale would render the service completely worthless.

As a result, connections in Twitter reflect real value. They correspond to investments of attention. Someone with many followers is much like an author with many readers. While I’m sure this metric can be gamed (e.g., by creating bogus Twitter accounts and having them follow you), at least Twitter has the model right in principle.

Speaking of which, if you’re interested in following my tweets, you can find them here.


Search is Not Advertising

Post author By Daniel Tunkelang
Post date October 9, 2008

Thanks to Greg Linden (who in turn thanks John Battelle) for calling my attention to a post by Google VP of Product Management Susan Wojcicki entitled “Ad Perfect”.

We can distill Wojcicki’s post to three principles, each a direct quote:

“advertising should deliver the right information to the right person at the right time”
“help you learn about something you didn’t know you wanted”
“it needs to be very easy and quick for anyone to create good ads, to show them only to people for whom they are useful, and to measure how effective they are”

While Wojcicki does call out the similarity between Google’s mission in advertising and its mission in search, she fails to see a key difference–a difference that exposes a fundamental problem with web search today.

Search is all about the user. If you can help me, the user, find what I’m looking for, or to find something I didn’t know I wanted, then I’m all ears (or eyes). Of course, I’d like to understand your motives if you’re offering to help me make decisions, especially if they involve my money or even my health.

Advertising is about selling the user’s attention to the highest bidder. Google has done more than anyone to make that bidding process economically efficient. But any utility that advertising provides to users is a means to an end. Advertising is all about the advertisers, and the advertisers only care about providing value to users insofar as their interests are aligned. Absent alignment, advertisers naturally look out for themselves.

This dynamic is hardly unique to search; it applies to any situation where we allow someone or something to influence our decisions. Indeed, persuasion and critical thinking have been locked in an arms race for millennia. The use of advertising to subsidize content dates back to the early 1800s. Wikipedia offers a nice history of the subject.

But supporting search through advertising is a tricky business. Google insists that it maintains a wall between its search and advertising businesses. But Wojcicki’s post–which is on Google’s official blog–suggests otherwise, at least in spirit. If Google believes that both search and advertising aim to “offer relevant content” and “deliver the right information to the right person at the right time”, then why put up a wall at all?

In any case, it is at best misguided and at worst intellectually dishonest to claim that the main goal of advertising is to inform or help the user. The goal of advertising is to influence the user, a goal whose achievement requires delivering a message to which the user is receptive. But influencing is not the same as informing. I hope we all have the critical thinking skills to appreciate the difference.


NRC Report: Data Mining won’t find the Terrorists

Post author By Daniel Tunkelang
Post date October 7, 2008

According to Declan McCullagh, a just-released U.S. National Research Council report entitled Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Assessment concludes that automated identification of terrorists through data mining or any other mechanism “is neither feasible as an objective nor desirable as a goal of technology development efforts.”

I haven’t had the time to read through the 352-page report. The committee that wrote the report includes Stanford professor William Perry, former MIT president Charles Vest, and Microsoft researcher Cynthia Dwork. Such a crew undoubtedly realizes that any data mining technique yields false positives. The big questions are whether the data mining techniques are more effective than the alternatives, and whether using them is consistent with law and policy.

Based on McCullagh’s summary, the report seems to mainly call for oversight and objective evaluation. Nothing controversial there. And, as he wryly notes, Americans may have watched too many episodes of 24 to have a realistic sense of what data mining can and can’t do.

Still, I think we’d be naive to give up entirely on machine learning approaches to fight crime and improve national security. As with all science, we need to subject hypotheses to rigorous, objective testing. But remember, low-tech approaches have false positives too. There is no moral superiority in being a Luddite.