Month: November 2008

The Word of the Day is…Ambient

Post author By Daniel Tunkelang
Post date November 9, 2008

No, not Ambien or ambiance, but ambient as in ambient findability.

Two items caught my attention this weekend. The first was a post by Oscar Berg at The Content Economy about ambient awareness and findability. The second was a presentation by Marianne Sweeny, posted at Ambient Insight, about SEO for Web 2.0.

An excerpt from Oscar’s post:

I am however more fond of the term “ambient awareness” and I am especially interested in how ambient awareness relates to findability which has traditionally been focused mainly on active methods of finding information such as searching and browsing.

I dare to say that humans are lazy by nature and that we are likely to use the method that requires the least effort when we look for information. We even tend to use less reliable information if it’s just easy to find and use. Instead of actively looking for information we prefer to passively monitor the flow of information in our environment. In fact, some say that actively looking for information is a relatively new phenomenon in human history. So, just being in an environment and becoming passively aware about things that happen in it is something we find very natural and convenient.

It’s an interesting point. Most of the systems we build for finding information presume an active information-seeking motive, but perhaps such systems are not optimizing for the way people are used to obtaining information. Still, I think that, until systems can passively surmise what information people need, we are stuck with requiring at least some active expression of that intent.

That leads us to the Sweeny presentation. It traces the history of search from an SEO point of view:

Human-Mediated
Human-Mediated plus Catalogs
Machine-Mediated
Human-Directed / Machine-Mediated
Human-Like Machine Mediation (aspirational)

It’s a nice presentation, and I recommend you give it a look. I’m delighted to see someone in the SEO community express a version of history and vision that is largely in line with that of the information seeking support folks.

Uncategorized

Happy Birthday to the ACM Digital Library!

Post author By Daniel Tunkelang
Post date November 8, 2008

This month’s issue of the Communications of the ACM includes a letter from ACM CEO John White celebrating the 10th anniversary of ACM’s Digital Library. As some of you may know, my colleagues and I at Endeca have been working with the ACM to improve the search and navigation functionality that the Digital Library provides.

In particular, ACM recently deployed a terminology extraction feature that we presented at HCIR ’08. While it’s still a work in progress (their version isn’t quite as current as what we demonstrated at the workshop), it represents a strong step in the direction of supporting exploratory search as part of the online library experience.

Please check it out and provide them with feedback, especially regarding the user interface that they designed using their own consultants.

Uncategorized

MIT User Interface Design Teatime Blog

Post author By Daniel Tunkelang
Post date November 8, 2008

I just discovered that the User Interface Design group at MIT has started blogging. Here’s the mission statement from their opening post:

The sharing of knowledge and ideas is of fundamental importance to the advancement of technology. With this goal in mind, MIT’s User Interface Design group meets once a day at Tea Time to brainstorm new ideas, review new technologies and ideas, and share their experiences working in the field.

If we hope to herald innovation by sharing ideas with a research group, then there’s a boundless value to sharing ideas and thoughts with the world at large. With this goal in mind, we will post a daily log of the musings and observations we discuss in our tea time meetings, and welcome your thoughts and comments about Human Computer Interaction, User Interface Design, and increasing the value and effectiveness of how we use technology.

I’m psyched whenever I see academics blogging, and even more psyched to see a collective effort like this one.

Uncategorized

The Long Tail of Search

The “long tail” is one of the most abused buzzwords of recent years, and I hesitate to use it myself in respectable company.

Nonetheless, SEO veteran Dustin Woodard has a nice guest post at the Hitwise Intelligence blog entitled “Sizing Up the Long Tail of Search”. Here are some statistics he cites about the distribution of search term frequency for web search data collected by Hitwise:

Top 100 terms: 5.7% of all search traffic
Top 500 terms: 8.9% of all search traffic
Top 1,000 terms: 10.6% of all search traffic
Top 10,000 terms: 18.5% of all search traffic

It’s nice to see concrete data to validate conventional wisdom. Of course, I’d be curious to see the corresponding distribution of ad revenue associated with terms.
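The figures quoted above are cumulative, so it can help to split them into bands to see how much each additional block of terms contributes. The following is a minimal sketch that uses only the four numbers cited from the Hitwise post; the function names `band_breakdown` and `tail_share` are made up for this illustration and are not from the original post.

```python
# Cumulative share of search traffic (in percent) captured by the top-N terms,
# exactly as quoted from the Hitwise guest post.
CUMULATIVE_SHARE = [
    (100, 5.7),
    (500, 8.9),
    (1_000, 10.6),
    (10_000, 18.5),
]


def band_breakdown(cumulative):
    """Turn cumulative top-N shares into per-band shares.

    Returns a list of (first_rank, last_rank, term_count, band_share) tuples,
    where band_share is the percentage of traffic added by that band alone.
    """
    bands = []
    prev_n, prev_share = 0, 0.0
    for n, share in cumulative:
        term_count = n - prev_n
        band_share = round(share - prev_share, 1)
        bands.append((prev_n + 1, n, term_count, band_share))
        prev_n, prev_share = n, share
    return bands


def tail_share(cumulative):
    """Percentage of traffic that falls outside the largest top-N bucket."""
    return round(100.0 - cumulative[-1][1], 1)


if __name__ == "__main__":
    for first, last, count, share in band_breakdown(CUMULATIVE_SHARE):
        per_term = share / count
        print(f"ranks {first:>6,} to {last:>6,}: {share:4.1f}% of traffic "
              f"(about {per_term:.4f}% per term)")
    print(f"beyond the top 10,000 terms: {tail_share(CUMULATIVE_SHARE)}% of traffic")
```

Read this way, the top 10,000 terms together account for 18.5% of traffic, which leaves 81.5% for everything beyond them.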

Uncategorized

IRF Symposium on Patent Retrieval

Post author By Daniel Tunkelang
Post date November 7, 2008

Thanks to Jeff for writing up notes on the annual IR Facility Symposium 2008. Related links:

Uncategorized

Daniel Lemire on What Makes Database Indexes Work

Post author By Daniel Tunkelang
Post date November 7, 2008

Daniel Lemire has a great post today entitled “Understanding what makes database indexes work”. There’s nothing that should be surprising for folks who live and breathe this stuff, but it’s a great introduction for those who don’t. Here are his bullet points:

You expect specific queries: restructure your data!
You expect specific queries: materialize them!
You expect specific queries: redundancy is (sometimes) your friend
Use multiresolution!
Your data is not random: compress it!
In any case: optimize your code

Read his post to get the details.
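To make the first bullet concrete, here is a minimal sketch of what "restructure your data" can mean for one expected query type, a range lookup on a single key. It is an illustration only, not code from Lemire's post: the `SortedIndex` class, the `range_scan` baseline, and the `(year, title)` record layout are all assumptions invented for this example.

```python
from bisect import bisect_left, bisect_right


def range_scan(records, low, high):
    """Baseline with no index: examine every record on every query."""
    return [r for r in records if low <= r[0] <= high]


class SortedIndex:
    """Hypothetical index for range queries on the first field of each record.

    The data is restructured once (sorted by key), so each query needs only
    two binary searches plus a slice instead of a full scan.
    """

    def __init__(self, records):
        # Pay the sorting cost up front, because range queries are expected.
        self._records = sorted(records, key=lambda r: r[0])
        self._keys = [r[0] for r in self._records]

    def range(self, low, high):
        """Return all records whose key lies in [low, high], in key order."""
        start = bisect_left(self._keys, low)
        end = bisect_right(self._keys, high)
        return self._records[start:end]


if __name__ == "__main__":
    # Made-up (year, title) records, purely for demonstration.
    records = [(2008, "a"), (1999, "b"), (2003, "c"), (2008, "d"), (2001, "e")]
    index = SortedIndex(records)
    print(index.range(2001, 2003))
    # Both approaches return the same records; only the work per query differs.
    assert sorted(range_scan(records, 2001, 2008)) == sorted(index.range(2001, 2008))
```

The trade-off is the usual one for indexes: extra work and memory when the data is loaded or changed, in exchange for cheaper queries of the kind you anticipated.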

General

Another Difference Between Enterprise Search and Web Search

Post author By Daniel Tunkelang
Post date November 6, 2008

As long-time readers know, one of my recurring themes is that there is a world of difference between web search and enterprise search–at least as those concepts are understood today. The other day, I had a conversation with my friend Carl Eklof, and we arrived at an aspect of that difference that I have at best understated in the past. Let me try to elaborate it now.

In web search, the immediate results for a query are pages on web sites. But these pages aren’t necessarily “documents”. In fact, the most popular web sites are portals or destinations, designed to help a user shop, research specialized information, communicate with other people, etc. When a web search takes a user to a page on such a site, the site (if it is well designed) takes on the responsibility for contextualizing the user’s experience.

In contrast, enterprise content often consists of a heterogeneous collection of content whose organization is at best implicit in its physical and logical arrangement. Departments within an enterprise may build user-centered portals, but it’s rare to see the sort of symbiosis that occurs between web search engines and the sites they index.

As a result, one of the challenges of an enterprise search application is that it must deliver a holistic user experience that compensates for the lack of effort on the part of the documents it indexes. Users still need context and guidance, but now the responsibility falls almost entirely on the search engine to deliver it.

Admittedly this picture is oversimplified. I don’t even like the term “enterprise search” because it’s often construed so narrowly. But I realize that many folks struggle with the idea that finding information within a proprietary document collection could be harder than doing so on the web. I hope this explanation helps shed some light.

Uncategorized

No Correlation Between Reading Difficulty and Popularity?

Post author By Daniel Tunkelang
Post date November 6, 2008

Paul Ogilvie just started blogging at mSpoke, and his first post asks “What makes a blog post popular? Part I: Comparing popularity and reading difficulty”. Specifically, he explores “whether well-written feed items are more likely to receive attention than poorly-written ones”. At the risk of stealing his thunder, I’ll deliver the punchline: he found no correlations between surface features of reading difficulty and popularity. Fortunately, he’s not planning to give up on writing quality!

Like Paul, I find that the absence of correlation goes against common sense. I’m curious whether the problem is the measures he’s using (which he admits are crude), or other factors that confound the popularity statistics.

Via Jon Elsas.

General

Modista: Similarity Browsing…for Shoes!

Post author By Daniel Tunkelang
Post date November 5, 2008

Let me start with a disclaimer. My idea of “finding shoes” is finding the one pair of shoes I own in the closet. In general, I’m not much of a shopper, let alone a shoe shopper.

That said, I really love what Arlo Faria and AJ Shankar, two Berkeley PhD students on leave, have done with Modista. In their own words:

Modista simplifies online shopping by searching inventories across multiple retailers and displaying results in an intuitive interface. Our patent-pending technology organizes items according to their visual similarity using digital image processing and machine learning algorithms.

All that is true, but it doesn’t capture what makes Modista cool. Modista delivers what m c schraefel calls the “joy of search”. Even for someone like me who only buys classic black loafers, they’ve created a fun exploratory experience. To see what a real shoe-shopper thinks of it, check out this post at ShoeBlog.

I’ve been skeptical of both similarity browsing and visual search. I’m still skeptical about the breadth of either technique’s applicability. But I am impressed with this application.

General

Transparency 2.0

Anyone who doubts the impact of blogging, Twitter, and other Web 2.0 technologies would do well to read yesterday’s New York Times article, “In Era of Blog Sniping, Companies Shoot First”.

While the article focuses on the more drastic aspects of corporate communication (“In the age of transparency, the layoff will be blogged”), there is a larger point here. NDAs notwithstanding, employees talk–especially disgruntled employees who have lost or are about to lose their jobs. Even before Web 2.0, there were sites that encouraged anonymous tipsters to supply news of companies experiencing financial or moral difficulty. But blogs and Twitter have made the propagation of juicy information almost instantaneous.

Our notions of privacy and secrecy are changing as we no longer have privacy through difficulty. Many people–as well as governments and institutions–are reacting with alarm, trying to find ways to safeguard individual or corporate confidentiality in an age of hypercommunication. Perhaps we would do better to accept that privacy as we used to know it is lost, and come up with legal and social norms that reflect the world we live in today.