Author: Daniel Tunkelang

High-Class Consultant.

MIT User Interface Design Teatime Blog

Post author By Daniel Tunkelang
Post date November 8, 2008

I just discovered that the User Interface Design group at MIT has started blogging. Here’s the mission statement from their opening post:

The sharing of knowledge and ideas is of fundamental importance to the advancement of technology. With this goal in mind, MIT’s User Interface Design group meets once a day at Tea Time to brainstorm new ideas, review new technologies and ideas, and share their experiences working in the field.

If we hope to herald innovation by sharing ideas with a research group, then there’s a boundless value to sharing ideas and thoughts with the world at large. With this goal in mind, we will post a daily log of the musings and observations we discuss in our tea time meetings, and welcome your thoughts and comments about Human Computer Interaction, User Interface Design, and increasing the value and effectiveness of how we use technology.

I’m psyched whenever I see academics blogging, and even more psyched to see a collective effort like this one.

Uncategorized

The Long Tail of Search

The “long tail” is one of the most abused buzzwords of recent years, and I hesitate to use it myself in respectable company.

Nonetheless, SEO veteran Dustin Woodard has a nice guest post at the Hitwise Intelligence blog entitled “Sizing Up the Long Tail of Search”. Here are some statistics he cites about the distribution of search term frequency in web search data collected by Hitwise:

Top 100 terms: 5.7% of all search traffic
Top 500 terms: 8.9% of all search traffic
Top 1,000 terms: 10.6% of all search traffic
Top 10,000 terms: 18.5% of all search traffic

It’s nice to see concrete data to validate conventional wisdom. Of course, I’d be curious to see the corresponding distribution of ad revenue associated with terms.
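Reading the “Top N” figures as cumulative shares (which is what the phrasing suggests), subtracting adjacent rows shows how little each additional band of terms contributes, and that about 81.5% of traffic comes from terms outside the top 10,000. The sketch below just does that arithmetic on the numbers cited above; the function name `band_shares` is my own, and nothing beyond the four cited figures is assumed.

```python
# Cumulative share (in percent) of all search traffic for the top-N terms,
# as cited from the Hitwise data. Assumes the figures are cumulative.
CUMULATIVE_SHARE = {
    100: 5.7,
    500: 8.9,
    1_000: 10.6,
    10_000: 18.5,
}


def band_shares(cumulative: dict[int, float]) -> list[tuple[str, float]]:
    """Turn cumulative top-N shares into per-band shares plus the remaining tail."""
    bands = []
    prev_n, prev_share = 0, 0.0
    for n in sorted(cumulative):
        share = cumulative[n]
        # Share contributed by ranks prev_n+1 .. n, rounded to the data's precision.
        bands.append((f"terms {prev_n + 1:,}-{n:,}", round(share - prev_share, 1)))
        prev_n, prev_share = n, share
    # Everything beyond the last cited rank is the "long tail".
    bands.append((f"terms beyond {prev_n:,}", round(100.0 - prev_share, 1)))
    return bands


if __name__ == "__main__":
    for label, share in band_shares(CUMULATIVE_SHARE):
        print(f"{label}: {share}%")
```

The band for terms 1,001 through 10,000 adds only 7.9 points, which is the point of the post: even the 10,000 most frequent terms leave most of the traffic in the tail.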

Uncategorized

IRF Symposium on Patent Retrieval

Post author By Daniel Tunkelang
Post date November 7, 2008

Thanks to Jeff for writing up notes on the annual IR Facility Symposium 2008. Related links:

Uncategorized

Daniel Lemire on What Makes Database Indexes Work

Post author By Daniel Tunkelang
Post date November 7, 2008

Daniel Lemire has a great post today entitled “Understanding what makes database indexes work”. There’s nothing that should be surprising for folks who live and breathe this stuff, but it’s a great introduction for those who don’t. Here are his bullet points:

You expect specific queries: restructure your data!
You expect specific queries: materialize them!
You expect specific queries: redundancy is (sometimes) your friend
Use multiresolution!
Your data is not random: compress it!
In any case: optimize your code
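To make two of those bullets concrete, here is a toy sketch of my own (not Lemire’s code): restructuring a column into a value-to-row-ids index so an expected lookup query avoids a full scan, and run-length encoding a sorted column to show that non-random data compresses well. The table `ROWS` and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical table of (row_id, country) pairs; any column would do.
ROWS = [
    (0, "CA"), (1, "US"), (2, "CA"), (3, "FR"), (4, "US"), (5, "US"),
]


def build_index(rows):
    """Restructure the data for an expected query: value -> list of row ids."""
    index = defaultdict(list)
    for row_id, value in rows:
        index[value].append(row_id)
    return dict(index)


def scan(rows, value):
    """Baseline: answer the same query by scanning every row."""
    return [row_id for row_id, v in rows if v == value]


def run_length_encode(values):
    """Compress a column as (value, run length) pairs; sorted data gives few, long runs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    index = build_index(ROWS)
    # The index lookup returns the same answer as a full scan.
    assert index["US"] == scan(ROWS, "US")
    sorted_column = sorted(v for _, v in ROWS)
    print(run_length_encode(sorted_column))  # [('CA', 2), ('FR', 1), ('US', 3)]
```

Sorting first is what makes run-length encoding pay off: on the unsorted column the same function would produce nearly one run per row.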

Read his post to get the details.

General

Another Difference Between Enterprise Search and Web Search

Post author By Daniel Tunkelang
Post date November 6, 2008

As long-time readers know, one of my recurring themes is that there is a world of difference between web search and enterprise search–at least as those concepts are understood today. The other day, I had a conversation with my friend Carl Eklof, and we arrived at an aspect of that difference that I have at best understated in the past. Let me try to elaborate on it now.

In web search, the immediate results for a query are pages on web sites. But these pages aren’t necessarily “documents”. In fact, the most popular web sites are portals or destinations, designed to help a user shop, research specialized information, communicate with other people, etc. When a web search takes a user to a page on such a site, the site (if it is well designed) takes on the responsibility for contextualizing the user’s experience.

In contrast, enterprise content is often a heterogeneous collection whose organization is at best implicit in its physical and logical arrangement. Departments within an enterprise may build user-centered portals, but it’s rare to see the sort of symbiosis that occurs between web search engines and the sites they index.

As a result, one of the challenges of an enterprise search application is that it must deliver a holistic user experience that compensates for the lack of effort on the part of the documents it indexes. Users still need context and guidance, but now the responsibility falls almost entirely on the search engine to deliver it.

Admittedly this picture is oversimplified. I don’t even like the term “enterprise search” because it’s often construed so narrowly. But I realize that many folks struggle with the idea that finding information within a proprietary document collection could be harder than doing so on the web. I hope this explanation helps shed some light.

Uncategorized

No Correlation Between Reading Difficulty and Popularity?

Post author By Daniel Tunkelang
Post date November 6, 2008

Paul Ogilvie just started blogging at mSpoke, and his first post asks “What makes a blog post popular? Part I: Comparing popularity and reading difficulty”. Specifically, he explores “whether well-written feed items are more likely to receive attention than poorly-written ones”. At the risk of stealing his thunder, I’ll deliver the punchline: he found no correlations between surface features of reading difficulty and popularity. Fortunately, he’s not planning to give up on writing quality!

Like Paul, I find that the absence of correlation goes against common-sense wisdom. I’m curious whether the problem is the measures he’s using (which he admits are crude), or other factors that confound the popularity statistics.
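For readers curious what such a test looks like, here is a minimal sketch under stated assumptions: I don’t know exactly which surface features Paul used, so this swaps in the Flesch Reading Ease score (a classic surface measure based on words per sentence and syllables per word) with a crude vowel-run syllable counter, and computes a Pearson correlation against hypothetical popularity counts. All names and sample data are illustrative only.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count runs of vowels (including y), at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient; both samples must have nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical feed items and popularity counts (e.g. clicks or shares).
    posts = [
        "Short words help. Readers like them.",
        "Institutional considerations necessitate comprehensive reevaluation.",
        "We tried a new search engine today. It was fast.",
    ]
    popularity = [12, 40, 7]
    scores = [flesch_reading_ease(p) for p in posts]
    print(pearson(scores, popularity))
```

A coefficient near zero on real data is what “no correlation” means here; it says nothing about whether a better readability measure, or controlling for confounders like author or topic, would change the picture.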

Via Jon Elsas.

General

Modista: Similarity Browsing…for Shoes!

Post author By Daniel Tunkelang
Post date November 5, 2008

Let me start with a disclaimer. My idea of “finding shoes” is finding the one pair of shoes I own in the closet. In general, I’m not much of a shopper, let alone a shoe shopper.

That said, I really love what Arlo Faria and AJ Shankar, two Berkeley PhD students on leave, have done with Modista. In their own words:

Modista simplifies online shopping by searching inventories across multiple retailers and displaying results in an intuitive interface. Our patent-pending technology organizes items according to their visual similarity using digital image processing and machine learning algorithms.

All that is true, but it doesn’t capture what makes Modista cool. Modista delivers what m c schraefel calls the “joy of search”. Even for someone like me who only buys classic black loafers, they’ve created a fun exploratory experience. To see what a real shoe-shopper thinks of it, check out this post at ShoeBlog.

I’ve been skeptical of both similarity browsing and visual search. I’m still skeptical about the breadth of either technique’s applicability. But I am impressed with this application.
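Modista’s own pipeline is patent-pending and not described beyond the quote above, so the following is only a generic sketch of the core of similarity browsing: represent each item as a feature vector and rank the other items by cosine similarity to the one the user picked. The catalog and its three-number “features” (imagine coarse color histograms extracted from product images) are hypothetical.

```python
import math

# Hypothetical feature vectors, e.g. coarse color histograms from product
# images. A real system would extract much richer visual features.
CATALOG = {
    "black loafer": [0.9, 0.05, 0.05],
    "brown loafer": [0.6, 0.3, 0.1],
    "red pump": [0.1, 0.1, 0.8],
    "black boot": [0.85, 0.1, 0.05],
}


def cosine(a, b):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(item, catalog, k=2):
    """Rank the other items by cosine similarity to the chosen item's features."""
    query = catalog[item]
    others = [(name, cosine(query, vec)) for name, vec in catalog.items() if name != item]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    for name, score in most_similar("black loafer", CATALOG):
        print(f"{name}: {score:.3f}")
```

Clicking a result and calling `most_similar` again on it is what turns ranking into browsing; the exploratory feel comes from that loop, not from any single query.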

General

Transparency 2.0

Anyone who doubts the impact of blogging, Twitter, and other Web 2.0 technologies would do well to read yesterday’s New York Times article, “In Era of Blog Sniping, Companies Shoot First”.

While the article focuses on the more drastic aspects of corporate communication (“In the age of transparency, the layoff will be blogged”), there is a larger point here. NDAs notwithstanding, employees talk–especially disgruntled employees who have lost or are about to lose their jobs. Even before Web 2.0, there were sites that encouraged anonymous tipsters to supply news of companies experiencing financial or moral difficulty. But blogs and Twitter have made the propagation of juicy information almost instantaneous.

Our notions of privacy and secrecy are changing as we no longer have privacy through difficulty. Many people–as well as governments and institutions–are reacting with alarm, trying to find ways to safeguard individual or corporate confidentiality in an age of hypercommunication. Perhaps we would do better to accept that privacy as we used to know it is lost, and come up with legal and social norms that reflect the world we live in today.

Uncategorized

Knowledge Management is a Process

Post author By Daniel Tunkelang
Post date November 4, 2008

Kudos to Lynda Moulton at the Enterprise Search Practice Blog for a post entitled “Apples and Orangutans: Enterprise Search and Knowledge Management”. She criticizes some commentary in CIO Magazine that “search is being implemented in enterprises as the new knowledge management”. Her thesis in a nutshell is that “knowledge management (KM) is not now, nor has it ever been, a software product or even a suite of products”. If that’s not enough to get you to read the full article, here’s an excerpt:

Because I follow enterprise search for the Gilbane Group while maintaining a separate consulting practice in knowledge management, I am struggling with his conflation of the two terms or even the migration of one to the other. The search we talk about is a set of software technologies that retrieve content. I’m tired of the debate about the terminology “enterprise search” vs. “behind the firewall search.” I tell vendors and buyers that my focus is on software products supporting search executed within (or from outside looking in) the enterprise on content that originates from within the enterprise or that is collected by the enterprise. I don’t judge whether the product is for an exclusive domain, content type or audience, or whether it is deployed with the “intent” of finding and retrieving every last scrap of content lying around the enterprise. It never does nor will do the latter but if that is what an enterprise aspires to, theirs is a judgment call I might help them re-evaluate in consultation.

Uncategorized

MIT Talk Now Available Online

Post author By Daniel Tunkelang
Post date November 4, 2008

I haven’t been able to watch it over my wi-fi connection on the Amtrak Acela (which is clearly a work in progress, but nonetheless a very exciting development), but here’s a link to the video.

Note that you’ll have to install Microsoft Silverlight to view it.

Alternatively, you can watch the slideshow below, but I’m not sure how much sense it will make without the voice-over.

http://static.slideshare.net/swf/ssplayer2.swf?doc=set-retrieval-20-1225903234513602-9&stripped_title=set-retrieval-20-presentation