The Noisy Channel

 

Guest Demo: Eric Iverson’s Itty Bitty Search

February 16th, 2010 · 17 Comments · General

I’m back from vacation, and still digging my way out of everything that’s piled up while I’ve been offline.

While I catch up, I thought I’d share with you a demo that Eric Iverson was gracious enough to share with me. It uses Yahoo! BOSS to support an exploratory search experience on top of a general web search engine.

When you perform a query, the application retrieves a set of related term candidates using Yahoo’s key terms API. It then scores each term by dividing its occurrence count within the result set by its global occurrence count–a relevance measure similar to one my former colleagues and I used at Endeca in enterprise contexts.
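
To make the scoring step concrete, here is a minimal sketch in Python. It assumes you already have, for each candidate term, an occurrence count within the result set and a global occurrence count. The function name `score_terms` and the counts in the example are made up for illustration; this is not Eric's actual code, and it does not call Yahoo! BOSS.

```python
def score_terms(result_counts, global_counts):
    """Rank candidate terms by (count in result set) / (global count).

    Both arguments are dicts mapping a term to an integer count.
    Terms with no known (or zero) global count are skipped, since
    the ratio is undefined for them.
    """
    scores = {}
    for term, local_count in result_counts.items():
        global_count = global_counts.get(term, 0)
        if global_count > 0:
            scores[term] = local_count / global_count
    # Highest ratio first: terms that are unusually concentrated in the results.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts, purely for illustration.
result_counts = {
    "interactive information retrieval": 12,
    "search engine": 30,
    "web": 45,
}
global_counts = {
    "interactive information retrieval": 40_000,
    "search engine": 90_000_000,
    "web": 5_000_000_000,
}

for term, score in score_terms(result_counts, global_counts):
    print(f"{score:.2e}  {term}")
```

With these invented numbers, "web" appears most often in the results but ranks last, because it is common everywhere. Dividing by the global count is what pushes a rarer, more specific phrase to the top.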

You can try out the demo yourself at http://www.ittybittysearch.com/. While it has rough edges, it produces nice results–especially considering the simplicity of the approach.

Here’s an example of how I used the application to explore and learn something new. I started with [“information retrieval”]. I noticed “interactive information retrieval” as a top term, so I used it to refine. Most of the refinement suggestions looked familiar to me–but an unfamiliar name caught my attention: “Anton Leuski”. Following my curiosity, I refined again. Looking at the results, I immediately saw that Leuski had done work on evaluating document clustering for interactive information retrieval. Further exploration made it clear this is someone whose work I should get to know–check out his home page!

I can’t promise that you’ll have as productive an experience as I did, but I encourage you to try Eric’s demo. It’s simple examples like these that remind me of the value of pursuing HCIR for the open web.

Speaking of which, HCIR 2010 is in the works. We’ll flesh out the details over the next few weeks, and of course I’ll share them here.

17 responses so far ↓

  • 1 jeremy // Feb 16, 2010 at 2:45 am

    Anton is an old colleague of mine from the UMass days. If you’ve never seen his Lighthouse system, you really should. It’s a great example of exploratory search, combining ranked lists with clustering, in a way that lets you see both at the same time.

    http://people.ict.usc.edu/~leuski/projects/lighthouse/index.html

  • 2 jeremy // Feb 16, 2010 at 2:52 am

    ..and not only see both the cluster and the ranked list.. but (more importantly) interact with both the cluster and ranked list at the same time, and have the changes that you effect on one instantly appear on the other, so you can get a sense of what is happening as you interactively adjust your search strategy, online.

    I first encountered Anton’s work back in ’98 or so. I agree; it’s a shame that this sort of interactivity never took off on the open web. I’ve always been fascinated by it.

  • 3 Eric Iverson // Feb 16, 2010 at 9:43 am

    Thanks for the interest in itty bitty search. This is a work in progress, and sidebar term ordering may change.

    I am particularly leaning towards only having one sidebar per search. Right now I have one sidebar per page of search results. It’s too much work to scan to see what’s new, what went away, and what moved.

    Also, I am looking into using Google AdWords keyword cost per day and total searches per day to help order the sidebar results.

  • 4 Otis Gospodnetic // Feb 16, 2010 at 1:51 pm

    Cool cool.
    Using term counts in SERPs reminds me of what you see on top-right of, say, http://www.simpy.com/user/otis/tag/%22information+retrieval%22 (no division by global occurrence counts)

    If I had to do that again, I’d use this Key Phrase Extractor (I’m biased):
    http://www.sematext.com/products/key-phrase-extractor/index.html :)

  • 5 Daniel Tunkelang // Feb 16, 2010 at 6:41 pm

    Eric: thank you for putting the demo out there!

    Jeremy: I’m intrigued by Anton’s work, though I have to confess a certain skepticism about information retrieval using graph-layout visualizations. Perhaps I’m too burned by my own past attempts in this area.

    Otis: thanks for the link. I do think global occurrence counts are helpful. I also assume that Yahoo is using its query logs to improve the quality of keyword extraction beyond what would be possible if all you could look at was the document text.

  • 6 jeremy // Feb 16, 2010 at 6:51 pm

    That’s the thing: Most graph layout visualizations do that and only that. You’re forced into that one and only mode.

    Anton’s work, on the other hand, lets you simultaneously see both the graph and the ranked list (top 50 or 100 results). So when you make a change to the list view, it appears in the graph view, and vice versa.

    This is the power of his work, imho. It’s the multiplicity of the views, which share state and are mutually informative.

  • 7 Ed H. Chi // Feb 16, 2010 at 10:40 pm

    jeremy: there have been many graph layouts and visualizations of search results over the last decade, and most of them seem not to work out very well, since they spend precious screen space that should be allocated to actual presentation of the words (snippets) of the search results. I believe graph layouts and visualizations of search results make the wrong tradeoff in design, and I speak of this as an information visualization researcher.

    Eric’s tool very much reminds me of our own work at http://MrTaggy.com

    The issue he speaks of about “It’s too much work to scan to see what’s new, what went away, and what moved”, we spent a lot of time dealing with in our design.

  • 8 jeremy // Feb 18, 2010 at 12:14 pm

    Ed: Well, Anton’s work was over a decade ago, and to this day I haven’t seen anything like it. One thing that it let you do is control the amount of screen space dedicated to the ranked snippets vs. to the graph. You could make the snippets as large (and the graph as small) as you wanted them to be. So it allowed you to basically have the ranked list take up 4/5 of the screen, and the graph take up 1/5. Then, while you are traversing the ranked list, as you normally would in a non-visual interface, if any of the documents that you are marking as “interesting” appear to start clustering, you could easily detect that and make the graph larger again, mark the notes that you’d like to see more of, based on the visualization, then switch back to the ranked list.

    It was the interactivity, the back-and-forth play between list and visualization, that I liked, and that makes this work interesting.

    Everything else that I’ve seen in graph visualization over the past decade gives you the graph, and only the graph, with maybe a summary snippet of whatever current node you are mousing over. And I will definitely agree with you that having only the graph is not good.

    But you do have more experience than me in this, so could you point me to any sort of evaluation of something like Anton’s work, where the user essentially has both visualizations simultaneously, i.e. the list AND the graph, and can see the effect that manipulating one has on the other, simultaneously?

    Because again, it is wrong to call Anton’s work a “graph visualization”. Because it’s not. It’s a “list plus graph visualization”. It works because it does both, simultaneously.

  • 9 jeremy // Feb 18, 2010 at 1:52 pm

    mark the notes that you’d like to see more of

    Correction: “mark the nodes that you’d like to see more of”.

    Too much music IR in my background ;-)

  • 10 Don Byrd // Feb 20, 2010 at 3:44 pm

    Anton is an old colleague of mine from those UMass days (the late 1990s), too. As is Jeremy, not coincidentally. (In fact, it’s mostly my fault Jeremy has so much music IR background :-) .) I was also very impressed with Lighthouse, for the same reasons Jeremy was. Of course tradeoffs in using screen space are always important, but Anton’s design lets you weight things the way you want… Incidentally, Marti Hearst’s new book includes some work of mine from UMass back then, the “Scrollbar with Confetti”, that visualizes the content of a document at zero cost in screen space.

  • 11 Daniel Tunkelang // Feb 20, 2010 at 4:44 pm

    I’m psyched to have surrounded myself online and offline with UMass alums! And I recognize that my skepticism about graph drawing (even when mixed with non-visual approaches) may be a prejudice–albeit one informed by personal experience.

  • 12 jeremy // Feb 20, 2010 at 9:15 pm

    Well, that’s what I am curious about, Daniel. What is your experience with graph drawing? More to the point: Is your experience solely with a graph-only visualization? Or does it include a graph-list interplay? Because the latter is a very different beast than the former, a very different mode of user interaction.

    And it’s the interaction, rather than the visualization, that I think is the key here.

  • 13 Daniel Tunkelang // Feb 22, 2010 at 9:48 am

    Admittedly my experience is not with a graph-list interplay. I worked on stuff like this, where visualization of the concept network was intended to guide query refinement. But I’ve seen lots of attempts to use graph layout to communicate document or term similarity. I feel this often is better expressed textually if the measure is reliable / useful–or not expressed at all otherwise.

    In any case, I’d love to play with Anton’s demo. Hard for me to conclude much from a screenshot.

  • 14 Don Byrd // Feb 22, 2010 at 9:06 pm

    Glad to hear you’re psyched about us UMass alums, Dan :-) . In support of Jeremy’s observation about the interplay being the important thing, one of Anton’s publications about Lighthouse is entitled “The best of both worlds: Combining ranked list and clustering”. And true, you can’t tell much from a screenshot, but there’s a movie of a demo on Anton’s site too. Unfortunately, Firefox 3.5.8 — at least on my MacBook — renders the page without the QuickTime movie frame!, but it displays in Safari… It’d be nice to hear from Anton, wouldn’t it? I emailed him about this conversation a few days ago but haven’t heard from him yet.

  • 15 jeremy // Feb 22, 2010 at 11:02 pm

    What I find about most graph visualization interfaces is that they require you to traverse the information along pathways defined by the graph itself.

    I’ve always found that completely problematic, because even if the document similarity algorithm is 100% reliable, there is still a difference between similar documents and relevant documents. A ranked list orders documents by their relevance to the query, whereas a graph layout orders documents by their similarity to each other.

    So what you really want is to traverse the documents by their relevance to the query. (Well, I know you’re not a huge fan of ranking, but bear with me for this moment; generally sorting by probability of relevance is a good thing.) So what Anton’s system lets you do is traverse by relevance to the query. But if there then does happen to be a set of documents that do cluster, you can easily visualize that in a way that you can’t from the ranked list. You can quickly jump to those few documents, then jump back to the ranked list traversal.

    So that’s the interaction, that’s the process. Walk the list, but see the graph.

    Most of everything else I’ve seen in this space requires searchers to both walk and see the graph, with no list in sight. And that’s (imo) why they don’t work.

    But that difference in interaction — walk the list vs. walk the graph — is key.

    Here’s the reference that Don mentioned: http://ciir-publications.cs.umass.edu/getpdf.php?id=176

  • 16 Daniel Tunkelang // Feb 23, 2010 at 9:12 am

    I guess I’m just not a big fan of scalar document similarity measures. I’d like to see document similarity expressed in terms of common membership in explainable sets. Those sets could be presented visually, but text seems quite up to the task. In any case, this isn’t a criticism of Anton’s work–just that I’m interested in a different approach to the problem.

  • 17 jeremy // Feb 23, 2010 at 4:47 pm

    No, I understand that is not your interest. But given the predominance of ranked list approaches, I’m proposing that Anton’s work is better than a raw ranked list, rather than worse. I’m just saying that Anton’s work is such that the graph is an augmentation of interaction as list iteration, rather than a replacement of it.

    Most other graph visualization approaches try to replace standard list-following interaction with graph-based traversal, and that’s why (imo) they don’t work.

    Maybe similar-set-common-membership is the best way to do things. But w/r/t ranked lists, I am proposing the following ordering of systems:

    (worst = 3) Graph traversal + graph visualization interfaces
    (2) Ranked list traversal + ranked list visualization interfaces
    (best = 1) Ranked list traversal + graph visualization interfaces

    I’m just trying to make the case that 3 != 1, that you cannot just conflate anything with a graph into the same thing. And that by understanding the difference in interaction (the “I” in HCIR) between (graph traversal + graph visualization) versus (list traversal + graph visualization), we don’t risk throwing the baby out with the bathwater.

    I’m still interested in hearing back from Ed Chi, too. Ed, are you following this thread? Is the work that you’ve done in any way similar to Anton’s? Because again, most of everything else that I’ve seen over the past 10 years is not like that. And so I can see why folks dismiss it. They haven’t actually played around with ideas like Anton’s.
