Kosmix: I’m Impressed

One of the recurring objections to exploratory search is that it can’t work for the web. For example, some folks at Microsoft Research presented a poster at HCIR ’08 explaining what they see as daunting challenges to implementing faceted search for the web. I’ve often plugged Duck Duck Go as a successful example of exploratory search for the web, though it does not go so far as to offer faceted search.

Neither does Kosmix, but I’m impressed nonetheless. Kosmix has been around for a few years, and Ken Ellis has blogged about it a fair amount at NP-Harder. But, perhaps because of an underwhelming first impression, I forgot about it. Today, I was reminded by a New York Times article entitled “Just Don’t Compare Kosmix to Google”. Intrigued, I decided to give Kosmix another whirl.

Here are some of the searches I tried (yes, of course I start with vanity searches):

Verdict: I’m impressed. When there’s a Wikipedia entry, it piggy-backs on it, but most of those queries are far enough into the “long tail” not to have their own Wikipedia entries. I slightly prefer how Duck Duck Go handles ambiguity (e.g., compare a query for sigir on Kosmix vs. Duck Duck Go), but I have to say that, on the whole, Kosmix is delivering on its promise of offering exploratory search for the web, and the functionality is the richest I’ve seen so far.

My only complaint is that it returns a lot of content for every query. I’d prefer a more progressive experience that returns a bit less initially but simply shows me the directions for further exploration–perhaps even including them in the initially returned page, but tucked away for a cleaner presentation. And of course I wonder how well it will work for the research tasks that most demand exploratory search. There’s still a lot of work to do.

By Daniel Tunkelang

High-Class Consultant.

10 replies on “Kosmix: I’m Impressed”

Hey Daniel, thanks for the review. I’ll personally take it as feedback. 🙂

In dealing with such a massive taxonomy, so many relationships and so much content, two of our big design challenges are disambiguation and content density.

Since we launched in December 2008 (the product just turned 4 months old!), we’ve come a long way, but as you state, there’s still a lot of work to do. Oftentimes, in a fast-moving startup that works on this scale, it’s that technology/computing power can’t quite do (even if just for a few months) what we would design in an ideal world. Other times, it’s that we frankly aren’t 100% positive about the best approach because we’re inventing the approach. 🙂 So, we/Kosmix sometimes take the web-as-live-lab approach; we make educated guesses about designing a feature, launch the feature, gain input from folks, look at usage patterns, and refine/iterate/nuke from there.

Disambiguation and content density certainly weigh heavily on my mind, and I think it is safe to say that nothing you see now in the product is permanent. Progressive disclosure, customization and personalization are all things we think about on a daily basis. We’re very much in a learning/iterative mode, and I think that we will continue to improve the experience based on input from the folks that use the product.

Cheers,
Gino Zahnd
Director of User Experience, Kosmix

Gino, you’re quite welcome, and thanks for engaging me on my home turf.

I didn’t realize your launch was so recent. But now I realize that, while the company has been around for a while, what I’d remembered was the vertical search engine you launched early on. I like this new direction.

Of course, while I’m offering feedback, I’ll ask for what I always want out of web search tools: more control. But I know too well from my own experience that you have to crawl before you walk, and I approve wholeheartedly of your iterative approach. Looking forward to great things!

What does it mean for exploratory search to not work for the web? Do they mean that it takes too much processing power? Or that faceted search requires too many taxonomies, and the web is too broad? Faceted search is just one of many possible manifestations of exploratory search.

Broadly speaking, exploratory search is about helping you discover pathways of information that you might not have discovered otherwise, if you are just using a single-line input box with 10 blue links. So I don’t see how anyone can say, with a single broad brushstroke, that exploratory search won’t work for the web, because I can think of dozens of ways to take exploratory search steps right now, ways that are scalable.

To be clear, I think the Microsoft researchers only said that faceted search doesn’t work for the web, not exploratory search in general.

I see Duck Duck Go and Kosmix as evidence that exploratory search can and does work for the web. And I do think faceted web search is a good idea too, but I admit it will require more work.

MSR only said that about faceted search. But you start the blogpost by saying: “One of the recurring objections to exploratory search is that it can’t work for the web.” Who else is objecting?

I feel I’ve heard it often enough that I take it for granted as being part of the conventional wisdom. That’s sloppy of me, and I’ll try to find sources or else I’ll drop this claim as a straw man.

Well, FWIW, I also “sense” it in the air. I was just wondering if you had any specific details, i.e. any specific instances where someone had specifically said something about exploratory search not working for the web. I’m not trying to call you out on your reasoning so much as I’m just looking for something specific to refute 🙂

I agree with you that most big search companies don’t do it. I want to understand if they aren’t doing it because (1) they think it isn’t necessary, (2) even if it were necessary, that it doesn’t/can’t function properly on web-type data, (3) even if it could be made to function well, it wouldn’t scale, or (4) it can scale, but they’re too lazy to do it, or (5) they’re not too lazy to do it, but it would undercut their current business model, so they won’t do it, even though it can be done.

I’m seeking a little more clarity in what “they” mean when they say that exploratory search doesn’t work for the web. At what level does it “not work”?

Well, I was impressed by Googler Bob Wyman’s insistence that the reasons Google hasn’t developed more exploratory functionality are technical, not philosophical. I have heard people, at Google and elsewhere (e.g., LinkedIn), insist that anything beyond ranked lists is too complicated for their users. And of course it doesn’t help matters that so many attempts to build “Google killers” use richer interfaces but don’t actually offer substantial improvement in resolving tasks Google doesn’t already solve adequately.

For example, I like Duck Duck Go, but I readily admit that it is only incrementally better than Google and thus no threat to it. Kosmix seems like a more ambitious effort, but I’m not ready to offer more than a positive first impression and cautious optimism.

Make your own impressions. Check out a query close to my heart that I doubt any of the engines have tuned for:

http://www.google.com/search?q=exploratory+search

http://duckduckgo.com/?q=exploratory+search

http://www.kosmix.com/topic/exploratory+search

Great article and thread! I’m with Deep Web Technologies (www.deepwebtech.com).

Try:

http://tinyurl.com/c22utf

Sorry about the long URL. Our software at http://www.science.gov is a bit dated.

And, the sources aren’t the best for this particular search (our clients pick the sources we federate).

One of the big issues we’re trying to overcome is user context. Someone searching for “breast cancer” is going to care very differently about the results set, depending on whether they are a cancer patient, medical researcher or pharmaceutical researcher.

Larry.

Larry, hope you don’t mind that I shortened the URL. I know of Deep Web through Abe and Sol Lederman. Exploratory interfaces over federated results can be tricky, from picking the right ones to managing overlapping content, diversity of user needs, etc. And I know a lot of folks think it sufficient to just reduce the presentation to sorting the federated results by a single, scalar score. I’m glad you guys are trying to expose the richness of the data you work with.
