Category: General

General posts, typically analyzing HCIR issues.

Why LinkedIn Frustrates Me

Post author By Daniel Tunkelang
Post date March 31, 2009
3 Comments on Why LinkedIn Frustrates Me

Let me start by saying that I really like LinkedIn. I use it as everything from a self-updating address book to a gateway to professional communities like the enterprise search professionals group. I am delighted by the information LinkedIn has assembled about companies just by aggregating user profiles. In short, I take LinkedIn quite seriously as a professional networking tool.

With that preamble out of the way, I’d like to vent a bit about LinkedIn’s approach to search. Directories are a poster-child domain for faceted search. LinkedIn specifically has high-quality semi-structured data, since users are personally incented to optimize their own findability. Moreover, the process doesn’t even seem adversarial–I haven’t seen any Joe the Scammer claiming to be the CEO of a Fortune 500 company (oops, bad example). LinkedIn has done the best job I’ve seen of aggregating high-quality data about people’s professional history–in a volume that is not only unprecedented but more importantly is large enough to be broadly useful. And the site designers clearly care about search: they still proclaim above the search box, “New Improved Search!” (see my earlier review here).

Why is faceted search so valuable for directories? Because finding someone is often a task that calls for exploratory search. Unless you’re looking someone up by name (and hopefully by a unique name), you’re not performing known-item search. Rather, you’re looking for a potential employee, employer, business partner, or expert. You may not even know what you want until you have a sense of what’s out there–the different companies in the space, the different relevant job titles, etc. Moreover, now that LinkedIn has a significant amount of text associated with its users (e.g., the Q&A section and forums), it could do a much better job of linking people to the content they produce.
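To make the idea concrete, here is a minimal sketch of what faceted refinement over a people directory might look like. It is purely illustrative: the profile records, field names, and the `refine` and `facet_counts` helpers are my own assumptions, not anything LinkedIn exposes.

```python
from collections import Counter

# Hypothetical profile records; names, fields, and values are invented for illustration.
PROFILES = [
    {"name": "A. Rivera", "title": "Search Engineer", "company": "Acme", "location": "Boston"},
    {"name": "B. Chen", "title": "Search Engineer", "company": "Globex", "location": "New York"},
    {"name": "C. Okafor", "title": "Product Manager", "company": "Acme", "location": "Boston"},
    {"name": "D. Haddad", "title": "Data Scientist", "company": "Initech", "location": "Boston"},
]

# The fields we assume are clean enough to treat as facets.
FACETS = ("title", "company", "location")


def refine(profiles, selections):
    """Keep only the profiles that match every selected facet value."""
    return [p for p in profiles if all(p[f] == v for f, v in selections.items())]


def facet_counts(profiles, facets=FACETS):
    """Count how many of the given profiles carry each value of each facet."""
    return {f: Counter(p[f] for p in profiles) for f in facets}


# Start broad: the counts show which titles and companies exist in the space.
overview = facet_counts(PROFILES)

# Narrow to one location, then look at how the remaining facets are distributed.
boston = refine(PROFILES, {"location": "Boston"})
boston_counts = facet_counts(boston)
```

The point of the counts is that the searcher sees the available titles, companies, and locations before committing to any of them, which is exactly the situation of not knowing what you want until you see what is out there.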

I understand that uniting the social network functionality of LinkedIn with search is hard enough, and that introducing faceted search makes the problem that much harder. But it’s not impossible, and the value of such an application more than justifies the effort. So far, LinkedIn has benefited from having the best data. But users have no incentive to give LinkedIn exclusive access to that data. LinkedIn knows this, and its increasing emphasis on community will surely make it harder for someone to compete just by offering better information seeking support.

Still, the core value proposition of LinkedIn is tightly bound to the site’s search functionality, and LinkedIn would do well to take a more modern approach. Doing so would increase the site’s value dramatically, and I’m certain LinkedIn would find ways to monetize that value.

General

Wikipedia or Potemkinpedia?

Post author By Daniel Tunkelang
Post date March 29, 2009
3 Comments on Wikipedia or Potemkinpedia?

Today’s New York Times features an essay by Noam Cohen entitled “Wikipedia: Exploring Fact City”, in which Cohen explores the metaphor of Wikipedia as an information metropolis, replete with dead-end streets and industrial districts. It’s a nice read, even if Cohen gets a bit carried away with his artistic license.

But Nick Carr reacts strongly to the notion that Wikipedia offers “sidewalk-like transparency and collective responsibility”. In his post, entitled “Potemkinpedia”, Carr argues:

Wikipedia has imposed editorial controls on those articles, restricting who can edit them. Wikipedia has, to extend Cohen’s metaphor, erected a whole lot of police barricades, cordoning off large areas of the site and requiring would-be editors to show their government-issued ID cards before passing through.

Carr is right that Wikipedia exerts editorial controls, and it’s true that not everyone agrees with Wikipedia’s means of doing so. Still, his tone seems unnecessarily hostile, and he doesn’t point out that the amount of top-down control exerted to maintain Wikipedia is extremely low compared to the value it creates. Indeed, he cites “hundreds” of pages being protected without noting that Wikipedia contains 2,817,176 articles in English alone.
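For a sense of scale, here is a back-of-the-envelope calculation. Since “hundreds” is not a precise count, the bounds of 100 and 999 protected pages are my own assumptions; only the article total comes from the figure above.

```python
# English-language article count cited in the post.
TOTAL_ENGLISH_ARTICLES = 2_817_176


def protected_share(protected_pages, total=TOTAL_ENGLISH_ARTICLES):
    """Return the fraction of articles that are protected."""
    return protected_pages / total


# "Hundreds" is vague, so bracket it with assumed lower and upper bounds.
for assumed_count in (100, 999):
    share = protected_share(assumed_count)
    print(f"{assumed_count} protected pages -> {share:.4%} of English articles")
```

Even at the assumed upper bound, the protected pages come to well under a tenth of a percent of the English-language articles.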

One of Wikipedia’s founding principles is “Assume good faith.” Perhaps I should do that myself, yet I can’t help but wonder if Carr’s relationship with Britannica predisposes him against Wikipedia. In fairness to Carr, he has pointed out that relationship in some of his articles (like this one). But I actually hunted down a reference on his Wikipedia entry.

Personally, I give Wikipedia high marks for transparency. What other publication exposes the full revision history of its content and provides a public discussion forum for both its content and editorial process? I’ve had my disagreements with that process, but I can’t fault its openness. Indeed, my main criticism of Wikipedia is that it doesn’t require such transparency of authors, but instead allows authors to contribute anonymously.

Perhaps Carr sincerely wants Wikipedia to be more like Britannica, or at least wants everyone to know that Wikipedia is not 100% unlike Britannica. And perhaps Cohen offered an oversimplified vision of Wikipedia’s governance that some could see as a utopian vision of anarchy. But I think Carr is being a bit disingenuous. After all, even a metropolis has a mayor.

General

Curt Monash on the Information Ecosystem

Post author By Daniel Tunkelang
Post date March 29, 2009
7 Comments on Curt Monash on the Information Ecosystem

Curt Monash recently published a pair of sweeping posts about the future of the information ecosystem:

They’re nice posts, and Monash does a great job of playing by the rules of the link economy (the one time I agree with Jeff Jarvis) and bringing together thinking from around the blogosphere.

But I do have some differences. I posted a comment on Monash’s post about where the information ecosystem is headed, which I’ll reproduce here:

I think you may be underestimating the challenge of transitioning from an ecosystem dominated by independent information providers to one dominated by vendors or analysts motivated by self-interest.

Yes, everyone has self-interest–I’m not trying to suggest otherwise. But, much as I’m sure anyone who reads my blog takes anything I say about enterprise search with a grain of salt, I’m sure anyone who reads your blog maintains a healthy skepticism towards what you say about the vendors you do business with–or about their competitors. I think there will always be a market for information that comes without conflicts of interest.

The question, of course, is how large that market will be, and where consumers’ willingness to pay will intersect the cost of production, especially if independent information providers are competing with vendors or analysts who provide information for free in order to market their products and services.

And, while I’m specifically thinking about technology news / analysis, the same goes for other arenas, like politics. Do we want all of the reporting to come from activist organizations? Some would argue that’s already the case, but it could be much, much worse.

General

I Gotta “Hunch” You’ll Wanna Check This Out

Post author By Daniel Tunkelang
Post date March 27, 2009
20 Comments on I Gotta “Hunch” You’ll Wanna Check This Out

It’s all about who you know, and this week I was lucky to meet one of the investors in Hunch, a decision-making tool from Flickr founder Caterina Fake that “gives you its best hunch” of what you would like when you’re feeling indecisive. It made a big splash today and is the top story on Techmeme as I write.

It’s an intriguing concept. I played with it for a bit, and I have to confess I’m not indecisive enough to fully appreciate it. But I can see the appeal for folks who would like to crowd-source minor decisions and thus introduce a bit of randomness and novelty into their lives. Especially for decisions like what to eat or where to go to college.

In any case, my new acquaintance was kind enough not only to give me an account, but also to provide me with 10 invites to distribute to the incorrigibly curious.

As always, it’s first come, first served.

General

Does Metadata Matter?

I’m thrilled at the discussion that my call for devil’s advocacy has incited. Keep bringing it–and let me know if you’d like to contribute a guest post.

But it’s also nice to find strong views elsewhere in the blogosphere. This morning I saw a post at The Findability Project entitled “Metadata Schmetadata, Relevance and Reality”, in which the authors argue that they don’t need metadata.

Specifically, they say:

Working on this project, we have evaluated what we need from metadata as part of enterprise search implementation. Our conclusion? We don’t need metadata.

Or better said, we don’t need to add metadata for a Google Search Appliance (GSA) to accomplish what we want to accomplish with enterprise search.

I posted the following comment, which is currently pending moderation:

An interesting article. But perhaps I missed an explanation of how you performed your evaluation. Did you assign tasks to your users and compare their effectiveness on the two systems? Did you ask them to express their subjective satisfaction with the system? Did you have some productivity measure external to the system, such as efficiency at completing projects?

It may be that a simple out-of-the-box ranked search approach, with no annotation, manual or automatic, of your documents, is exactly what your organization needs. But it’s very hard to generalize from your experience without understanding better what exactly you were evaluating.

I am on board with the argument that 100% manual annotation offers poor return on investment. But that’s a straw-man argument. I would think that the real question is whether you want fully automated metadata generation, a semi-automated approach, or none at all.

And, as per my comment, it seems hard to justify any design decisions about enterprise search without success metrics, even imperfect ones.

In any case, look forward to more such posts, as I strive to increase the diversity of views at The Noisy Channel, even if I have to import them!

General

Taking the Google Wonder Wheel for a Spin

Post author By Daniel Tunkelang
Post date March 25, 2009
10 Comments on Taking the Google Wonder Wheel for a Spin

I tried out the Google Wonder Wheel today–it’s being rolled out as an experiment, but you can enable the cookie yourself by entering the following into the address bar after a Google query:

javascript:void(document.cookie="PREF=ID=4a609673baf685b5:TB=2:LD=en:CR=2:TM=1227543998:LM=1233568652:DV=AA:GM=1:IG=3:S=yFGqYec2D7L0wgxW;path=/; domain=.google.com");

As I blogged yesterday, I’m glad that Google is giving exploratory / HCIR approaches a shot. But I’m shocked at how far behind they seem to be, especially given the incredible talent of Google employees.

In fact, the Wonder Wheel reminds me of work AltaVista did over a decade ago, which is close to my heart because of related work I did at IBM Research. I’d be more impressed if the related topics were of higher quality, but they seem to be lagging far behind the state of the art. Which makes it all the more understandable that they don’t promote such features to users.

I’d like to get more excited about such efforts, but I fear that they are being set up to fail and will be used as evidence against HCIR in general. I hope at least that more informed technologists recognize these efforts as less than representative of what can be done using richer interface metaphors, and will keep working on improving the tools to support information seeking.

General

HCIR 2009: A Pre-CFP

Post author By Daniel Tunkelang
Post date March 25, 2009

We’re still a week or two away from officially announcing the 3rd Annual Workshop on Human Computer Information Retrieval (HCIR ’09), but there are some details I wanted to share now, as well as a favor I’d like to ask of the community.

First, the details.

The workshop will be held on October 23, 2009 at the Catholic University of America, in Washington, DC. We have a great keynote speaker lined up, but I’ll save that for the official announcement. As in the past couple of years, attendees will be expected to engage as active participants, and the format of the workshop will increase the emphasis on participation.

Second, the request for a favor.

Since we will be in the nation’s capital (please forgive the U.S.-centrism), we see this year’s workshop as an opportunity for the HCIR community to engage with the federal public sector. Seeing the success that Vivek Kundra, now our national CIO, has had with the Apps for Democracy program in DC should inspire anyone who believes in the value of an informed citizenry. The struggles of intelligence agencies to make sense of enormous amounts of data that is quite literally a matter of life and death call for approaches that best combine the skills of people and machines. And we all want economists and public policy experts to have the best access to any information that could help them help us.

If you are in the federal public sector or know people who are, I would greatly appreciate the opportunity to talk with you about HCIR ’09. Ideally, we are hoping for a U.S. government agency to join Catholic, Endeca, and Microsoft Research in sponsoring the event. But this appeal isn’t about money–even before the recession, we ran the workshop with fiscal restraint. Rather, we’d like to make sure that we make effective use of the workshop’s location to educate the public sector about HCIR–and to educate the HCIR community about the needs of the public sector.

If you would like to get involved, please reach out to me, either publicly here or privately by email.

General

Google Offers “More And Better Search Refinements”

Post author By Daniel Tunkelang
Post date March 24, 2009
11 Comments on Google Offers “More And Better Search Refinements”

Fresh news, hot from the Official Google Blog:

Starting today, we’re deploying a new technology that can better understand associations and concepts related to your search, and one of its first applications lets us offer you even more useful related searches (the terms found at the bottom, and sometimes at the top, of the search results page).

For example, if you search for [principles of physics], our algorithms understand that “angular momentum,” “special relativity,” “big bang” and “quantum mechanic” are related terms that could help you find what you need.

A couple of reactions. First, Google has offered related searches for a while, so I’d love to know what makes these “more and better”. I can’t tell from playing with it, and the suggestions I see aren’t as good as those from, say, Kosmix. Second, if they believe that this feature can improve user experience, why are they putting the results at the bottom of the page (at least on all of my queries)? Surely they know from their own logs that only a minority of users look to the end of the results list.

While I see this enhancement as a step in the right direction for Google, I wonder if they have their hearts in it. Google used to promote refinements–actually faceted search refinements–on their product search site, but pushed those to the bottom too. It seems very hard for them to get away from the primacy of those ten blue links.

I’d like to get excited about Google embracing HCIR, especially after they were so kind as to let me lecture them about it. And perhaps I’m being too harsh a critic. Their post concludes:

Even if you don’t notice all of our changes, rest assured we’re hard at work making sure you have the highest quality

It seems to me that they go out of their way to make sure that changes aren’t noticeable to users. I suppose their conservative attitude might cost them the occasional designer, but hasn’t hurt their pocketbooks.

Come on, guys, you’re the market leaders! Don’t be so timid.

General

Text Analytics Summit: Early-Bird Discount

Post author By Daniel Tunkelang
Post date March 23, 2009

I’m giving a presentation on “Enabling Exploration through Text Analytics” at the 5th Annual Text Analytics Summit, which will take place June 1st-2nd in Boston. Check out the agenda and line-up of speakers; you’ll find some impressive names.

June is far away, so why am I posting about this now? Well, as is usual for conferences like these, there are early-bird registration discounts. To get the cheapest option, you should register by Thursday, March 27th. Better yet, enter my name in the promotional offer box and you get another $100 off. And no, I don’t get any kickbacks. 😦

I am fully aware that conferences like these are expensive, and that budgets are tight. But if you are in a business that could benefit from better text analytics and can afford to attend (and hopefully the discounts make that more of a possibility), then I encourage you to do so. Education is the best investment you can make in order to maximize the value of your investments in information technology, whether you buy or build.

Also, to the best of my knowledge, all of the speakers are invited and their participation is not predicated on any kind of corporate sponsorship. This is a big deal when it comes to industry conferences; the last thing you want to do is pay a lot of money for glorified sales pitches. These guys are putting in the effort to deliver quality content. I hope to see some of you there–let me know if you’re attending!

Of course, if you’re an Endeca customer, I urge you even more strongly to attend Discover ’09, Endeca’s annual user conference, where I’ll be talking about “Money for Nothing and Your Tags for Free”. It’s also in Boston in June: June 8th to June 10th.

General

The Internet Is About Freedom

Post author By Daniel Tunkelang
Post date March 22, 2009
12 Comments on The Internet Is About Freedom

I was in a bit of shock when I saw that the top story on Techmeme was a post on TechCrunch entitled “Why Advertising Is Failing On The Internet”. After all, TechCrunch is an ad-supported site–something I admittedly had to confirm using a browser without an ad blocker.

But my confusion subsided when I realized that the TechCrunch post was actually a guest post by Eric Clemons, Professor of Operations and Information Management at The Wharton School of the University of Pennsylvania.

Here’s the outline:

1. There Must Be Something Other Than Advertising

2. Advertising will fail

3. Advertising will fail for three reasons:

Consumers do not trust advertising.
Consumers do not want to view advertising.
Consumers do not need advertising.

4. Alternative models for monetization are available:

Selling content and information.
Selling experience and participation in a virtual community.
Selling accessories for virtual communities.

In my case he’s preaching to the converted, and I don’t see why his arguments should be so controversial. But clearly they are in a world where the ad-supported model dominates to such an extent that most people don’t imagine any other business model is viable. I hope his post helps persuade a few skeptics.

Finally, I love his conclusion:

The internet is about freedom, and I suspect that a truly free population will not be held captive and forced to watch ads. We always knew that freedom comes at a price; perhaps the price of internet freedom and the failure of ads will be paying a fair price for the content and the experience and the recommendations that we value.