The Noisy Channel


Exploring Semantic Means

March 11th, 2009 · 8 Comments · General

I gave a talk last week at the New York Semantic Web Meetup entitled “exploring semantic means”, and I thought readers here might want to peruse the slides. You can see more pictures of the event here, as well as the slides Ken Ellis presented about the work he’s doing at Daylife. I was also interviewed for a few minutes after the talk; I’ll post a link to the podcast when it’s available.

8 responses so far ↓

  • 1 MarkH // Mar 12, 2009 at 9:20 am

    Hi Daniel,
    So is slide 25 the “ask me in a month..” answer to my original question?

    It’s certainly nice, but my original question was less about the specifics of bank bonuses and more about how fuzzier styles of query (using example documents as queries) can sit with a faceted search interface when all results are far from being equally relevant.
    I suppose your start query could have been a document and not a keyword but the key here (unlike in previous faceted interfaces of yours I’ve seen) is NOT to show any totals for the number of matches in suggested groups. The numbers would be meaningless because they imply that you have either considered all results (including the very low-similarity matches) or you have introduced an arbitrary relevance cut-off that makes a nonsense of any precision in group totals.
    Is the technology you show here exhaustive or sample-based in its grouping of results? The former would preclude a fuzzy “like this doc/para” style of query.

  • 2 Daniel Tunkelang // Mar 12, 2009 at 1:28 pm

    Indeed, and you didn’t even have to wait a month! I assumed that you weren’t concerned about the specifics of bank bonuses, but rather about the ability to satisfy information needs like the one you described. And I hope the interface there does take a step in the right direction.

    You’re right that they aren’t showing numbers–that’s a design choice, but the technology is based on actual results, not something fuzzy. I’m not entirely sure how they are ranking them–but there’s no rule that you have to rank facet values based on their frequency in the result set, particularly if the results involve a search that favors recall over precision. That’s a topic for broader discussion, and I promise to post about it.

  • 3 MarkH // Mar 12, 2009 at 1:43 pm

    >>I assumed you[ were].. concerned about the ability to satisfy information needs like the one you described

    Yes, but specifically using the retrieval technique I suggested: “given an example document”. Do I detect that you do not subscribe to that approach to specifying queries?

  • 4 Daniel Tunkelang // Mar 12, 2009 at 1:50 pm

    I’m not morally opposed to it, but it’s not my preferred entry point into the information seeking process–at least not with a browser-based search engine. In general, it strikes me as making the problem harder than it has to be–turning the query into a similarity query on a document. I see that approach being more useful when it is later in the process, e.g., I like this document, show me more like it. Better still when the basis for similarity is transparent.

  • 5 MarkH // Mar 12, 2009 at 1:56 pm

    >>it strikes me as making the problem harder than it has to be

    Harder for who? The user trying to specify the information need, the system that processes it or the user interpreting results?

  • 6 Daniel Tunkelang // Mar 12, 2009 at 2:01 pm

    For the user. My information needs usually aren’t in document form.

  • 7 MarkH // Mar 12, 2009 at 2:13 pm

    I went on an Autonomy course many years ago. The mantra was keywords are “legacy” and that we should use example documents/passages to express intent.
    As you suggest, I felt this was like trying to have a conversation with someone where you couldn’t speak for yourself but had to select a book from a shelf that best represented your intent.
    I think the technique can have its place but I see issues trying to combine that with faceted result UIs.

  • 8 Financial Times + Endeca = Newssift | The Noisy Channel // Mar 18, 2009 at 8:46 pm

    […] Now that the cat is out of the bag, I’m proud to tell readers here about an effort I’ve been involved with over the past few months. As reported in TechCrunch and Search Engine Land, the Financial Times just launched Newssift, a semantic search engine, powered by Endeca, that sifts through business news. Regular readers may recognize the application from an example I used in my presentation on exploring semantic means. […]
