Categories
General

The Word of the Day is…Ambient

No, not Ambien or ambiance, but ambient as in ambient findability.

Two items caught my attention this weekend. The first was a post by Oscar Berg at The Content Economy about ambient awareness and findability. The second was a presentation by Marianne Sweeny, posted at Ambient Insight, about SEO for Web 2.0.

An excerpt from Oscar’s post:

I am however more fond of the term “ambient awareness” and I am especially interested in how ambient awareness relates to findability which has traditionally been focused mainly on active methods of finding information such as searching and browsing.

I dare to say that humans are lazy by nature and that we are likely to use the method that requires the least effort when we look for information. We even tend to use less reliable information if it’s just easy to find and use. Instead of actively looking for information we prefer to passively monitor the flow of information in our environment. In fact, some say that actively looking for information is a relatively new phenomenon in human history. So, just being in an environment and becoming passively aware about things that happen in it is something we find very natural and convenient.

It’s an interesting point. Most of the systems we build for finding information presume an active information-seeking motive, but perhaps such systems are not optimizing for the way people are used to obtaining information. Still, I think that, until systems can passively surmise what information people need, we are stuck with requiring at least some active expression of that intent.

That leads us to the Sweeny presentation. It traces the history of search from an SEO point of view:

  • Human-Mediated
  • Human-Mediated plus Catalogs
  • Machine-Mediated
  • Human-Directed / Machine-Mediated
  • Human-Like Machine Mediation (aspirational)

It’s a nice presentation, and I recommend you give it a look. I’m delighted to see someone in the SEO community express a version of history and vision that is largely in line with that of the information seeking support folks.

Categories
Uncategorized

Happy Birthday to the ACM Digital Library!

This month’s issue of the Communications of the ACM includes a letter from ACM CEO John White celebrating the 10th anniversary of ACM’s Digital Library. As some of you may know, my colleagues and I at Endeca have been working with the ACM to improve the search and navigation functionality that the Digital Library provides.

In particular, ACM recently deployed a terminology extraction feature that we presented at HCIR ’08. While it’s still a work in progress (their version isn’t quite as current as what we demonstrated at the workshop), it represents a strong step in the direction of supporting exploratory search as part of the online library experience.

Please check it out and provide them with feedback, especially regarding the user interface that they designed using their own consultants. 

Categories
Uncategorized

MIT User Interface Design Teatime Blog

I just discovered that the User Interface Design group at MIT has started blogging. Here’s the mission statement from their opening post:

The sharing of knowledge and ideas is of fundamental importance to the advancement of technology. With this goal in mind, MIT’s User Interface Design group meets once a day at Tea Time to brainstorm new ideas, review new technologies and ideas, and share their experiences working in the field.

If we hope to herald innovation by sharing ideas with a research group, then there’s a boundless value to sharing ideas and thoughts with the world at large. With this goal in mind, we will post a daily log of the musings and observations we discuss in our tea time meetings, and welcome your thoughts and comments about Human Computer Interaction, User Interface Design, and increasing the value and effectiveness of how we use technology.

I’m psyched whenever I see academics blogging, and even more psyched to see a collective effort like this one.

Categories
Uncategorized

The Long Tail of Search

The “long tail” is one of the most abused buzzwords of recent years, and I hesitate to use it myself in respectable company.

Nonetheless, SEO veteran Dustin Woodard has a nice guest post at the Hitwise Intelligence blog entitled “Sizing Up the Long Tail of Search”. Here are some statistics he cites about the distribution of search term frequency for web search data collected by Hitwise:

  • Top 100 terms: 5.7% of all search traffic
  • Top 500 terms: 8.9% of all search traffic
  • Top 1,000 terms: 10.6% of all search traffic
  • Top 10,000 terms: 18.5% of all search traffic

It’s nice to see concrete data to validate conventional wisdom. Of course, I’d be curious to see the corresponding distribution of ad revenue associated with terms.
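
To make the size of the tail explicit, here is a minimal sketch of the arithmetic implied by the figures above. The cumulative percentages are the ones cited from the Hitwise post; the function names are my own, purely for illustration.

```python
# Cumulative share of search traffic (in percent) captured by the top-N terms,
# as cited from the Hitwise post.
cumulative_share = {100: 5.7, 500: 8.9, 1_000: 10.6, 10_000: 18.5}


def tail_share(top_n):
    """Percent of traffic that falls outside the top_n most frequent terms."""
    return round(100.0 - cumulative_share[top_n], 1)


def marginal_shares():
    """Percent of traffic contributed by each successive band of terms."""
    bands = []
    prev_n, prev_share = 0, 0.0
    for n, share in sorted(cumulative_share.items()):
        bands.append((prev_n + 1, n, round(share - prev_share, 1)))
        prev_n, prev_share = n, share
    return bands


for first, last, share in marginal_shares():
    print(f"terms {first}-{last}: {share}% of traffic")
print(f"beyond the top 10,000 terms: {tail_share(10_000)}% of traffic")
```

In other words, going from the top 1,000 to the top 10,000 terms adds only 7.9 points of traffic, and more than four fifths of all searches fall outside even the top 10,000.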

Categories
Uncategorized

IRF Symposium on Patent Retrieval

Thanks to Jeff for writing up notes on the annual IR Facility Symposium 2008.

Categories
Uncategorized

Daniel Lemire on What Makes Database Indexes Work

Daniel Lemire has a great post today entitled “Understanding what makes database indexes work”. There’s nothing that should be surprising for folks who live and breathe this stuff, but it’s a great introduction for those who don’t. Here are his bullet points:

  1. You expect specific queries: restructure your data!
  2. You expect specific queries: materialize them!
  3. You expect specific queries: redundancy is (sometimes) your friend
  4. Use multiresolution!
  5. Your data is not random: compress it!
  6. In any case: optimize your code

Read his post to get the details.
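
For readers who don’t live and breathe this stuff, here is a toy sketch of the first few points: if you expect a specific query (here, lookup by city), you can restructure the data once, at the cost of a redundant copy, so that each query no longer scans every row. This is my own illustration, not code from Daniel’s post; the table, its contents, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy table of (id, city) rows.
rows = [(1, "Boston"), (2, "Montreal"), (3, "Boston"), (4, "New York")]


def scan(rows, city):
    """No index: examine every row on every query."""
    return [row_id for row_id, row_city in rows if row_city == city]


def build_index(rows):
    """Restructure the data once, around the query we expect (lookup by city).

    The index is a redundant copy of information already in the table,
    traded for faster lookups.
    """
    index = defaultdict(list)
    for row_id, row_city in rows:
        index[row_city].append(row_id)
    return index


def lookup(index, city):
    """With the index: go straight to the matching ids."""
    return index.get(city, [])


index = build_index(rows)
print(scan(rows, "Boston"))     # full pass over the table
print(lookup(index, "Boston"))  # single dictionary lookup
```

Both calls return the same ids; the difference is that the scan touches every row each time, while the index pays that cost once up front.
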
Categories
General

Another Difference Between Enterprise Search and Web Search

As long-time readers know, one of my recurring themes is that there is a world of difference between web search and enterprise search–at least as those concepts are understood today. The other day, I had a conversation with my friend Carl Eklof, and we arrived at an aspect of that difference that I have at best understated in the past. Let me try to elaborate on it now.

In web search, the immediate results for a query are pages on web sites. But these pages aren’t necessarily “documents”. In fact, the most popular web sites are portals or destinations, designed to help a user shop, research specialized information, communicate with other people, etc. When a web search takes a user to a page on such a site, the site (if it is well designed) takes on the responsibility for contextualizing the user’s experience.

In contrast, enterprise content is often a heterogeneous collection of documents whose organization is at best implicit in its physical and logical arrangement. Departments within an enterprise may build user-centered portals, but it’s rare to see the sort of symbiosis that occurs between web search engines and the sites they index.

As a result, one of the challenges of an enterprise search application is that it must deliver a holistic user experience that compensates for the lack of effort on the part of the documents it indexes. Users still need context and guidance, but now the responsibility falls almost entirely on the search engine to deliver it.

Admittedly, this picture is oversimplified. I don’t even like the term “enterprise search” because it’s often construed so narrowly. But I realize that many folks struggle with the idea that finding information within a proprietary document collection could be harder than doing so on the web. I hope this explanation helps shed some light.

Categories
Uncategorized

No Correlation Between Reading Difficulty and Popularity?

Paul Ogilvie just started blogging at mSpoke, and his first post asks “What makes a blog post popular? Part I: Comparing popularity and reading difficulty”. Specifically, he explores “whether well-written feed items are more likely to receive attention than poorly-written ones”. At the risk of stealing his thunder, I’ll deliver the punchline: he found no correlations between surface features of reading difficulty and popularity. Fortunately, he’s not planning to give up on writing quality!

Like Paul, I find that the absence of correlation goes against common sense. I’m curious whether the problem is the measures he’s using (which he admits are crude), or other factors that confound the popularity statistics.

Via Jon Elsas.

Categories
General

Modista: Similarity Browsing…for Shoes!

Let me start with a disclaimer. My idea of “finding shoes” is finding the one pair of shoes I own in the closet. In general, I’m not much of a shopper, let alone a shoe shopper.

That said, I really love what Arlo Faria and AJ Shankar, two Berkeley PhD students on leave, have done with Modista. In their own words:

Modista simplifies online shopping by searching inventories across multiple retailers and displaying results in an intuitive interface. Our patent-pending technology organizes items according to their visual similarity using digital image processing and machine learning algorithms.

All that is true, but it doesn’t capture what makes Modista cool. Modista delivers what m c schraefel calls the “joy of search”. Even for someone like me who only buys classic black loafers, they’ve created a fun exploratory experience. To see what a real shoe-shopper thinks of it, check out this post at ShoeBlog.

I’ve been skeptical of both similarity browsing and visual search. I’m still skeptical about the breadth of either technique’s applicability. But I am impressed with this application.

Categories
General

Transparency 2.0

Anyone who doubts the impact of blogging, Twitter, and other Web 2.0 technologies would do well to read yesterday’s New York Times article, “In Era of Blog Sniping, Companies Shoot First”.

While the article focuses on the more drastic aspects of corporate communication (“In the age of transparency, the layoff will be blogged”), there is a larger point here. NDAs notwithstanding, employees talk–especially disgruntled employees who have lost or are about to lose their jobs. Even before Web 2.0, there were sites that encouraged anonymous tipsters to supply news of companies experiencing financial or moral difficulty. But blogs and Twitter have made the propagation of juicy information almost instantaneous.

Our notions of privacy and secrecy are changing as we no longer have privacy through difficulty. Many people–as well as governments and institutions–are reacting with alarm, trying to find ways to safeguard individual or corporate confidentiality in an age of hypercommunication. Perhaps we would do better to accept that privacy as we used to know it is lost, and come up with legal and social norms that reflect the world we live in today.