I’m a big fan of using machine learning and automated information extraction to improve search performance and generally support information seeking. I’ve had some very good experiences with both supervised (e.g., classification) and unsupervised (e.g., terminology extraction) learning approaches, and I think that anyone today who is developing an application to help people access text documents should at least give serious consideration to both kinds of algorithmic approaches. Sometimes automatic techniques work like magic!
But sometimes they don’t. Netbase’s recent experience with HealthBase is, unfortunately, a case study in why you shouldn’t have too much faith in magic. As Jeff Dalton noted, the “semantic search” is hit-or-miss. The hits are great, but it’s the misses that generate headlines like this one in TechCrunch: “Netbase Thinks You Can Get Rid Of Jews With Alcohol And Salt”. Ouch.
It seems unfair to single out Netbase for a problem endemic to fully automated approaches, but they did invite the publicity. It would be easy to dig up a host of other purely automated approaches that are just as embarrassing, if less publicized.
Dave Kellogg put it well (if a bit melodramatically) when he characterized this experience as a “tragicomedy” that reveals the perils of magic. His argument, in a nutshell, is that you don’t want to be completely dependent on an approach for which 80% accuracy is considered good enough. As he says, the problem with magic is that it can fail in truly spectacular ways.
Granted, there’s a lot more nuance to using automated content enrichment approaches. Some techniques (or implementations of general techniques) optimize for precision (i.e., minimizing false positives), while others optimize for recall (i.e., minimizing false negatives). Supervised techniques are generally more conservative than unsupervised ones: you might incorrectly assert that a document is about disease, but that’s less dramatic a failure than adding the word “Jews” to an automatically extracted medical vocabulary. In general, the more human input into the process, the more opportunity to improve the effectiveness and avoid embarrassing mistakes.
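To make the precision/recall trade-off concrete, here’s a toy illustration (entirely hypothetical, not a description of Netbase’s system): score an automated term extractor against a small hand-curated gold list.

```python
# Hypothetical example: evaluating an automated medical-term extractor
# against a human-curated gold-standard vocabulary.
gold = {"diabetes", "insulin", "hypertension", "aspirin"}  # terms a curator approved
extracted = {"diabetes", "insulin", "alcohol", "salt"}     # terms the extractor produced

true_positives = extracted & gold                 # extracted terms that are actually correct
precision = len(true_positives) / len(extracted)  # fraction of extracted terms that are correct
recall = len(true_positives) / len(gold)          # fraction of correct terms that were found

print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.50, recall=0.50
```

A precision-oriented system would shrink `extracted` to only its most confident terms (fewer “alcohol and salt” gaffes, at the cost of missing valid terms); a recall-oriented one would do the opposite.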
Of course, the whole point of automation is to reduce the need for human input. Human labor is a lot more expensive than machine labor! But there’s a big difference between the mirage of eliminating human labor and the realistic aspiration to make its use more efficient and effective. That’s what human-computer information retrieval (HCIR) is all about, and all of the evidence I’ve encountered confirms that it’s the right way to crack this nut. Look for yourselves at the proceedings of HCIR ’07 and ’08. Having just read through all of the submissions to HCIR ’09, I can tell you that the state of the art keeps getting better.
Interestingly, even Google CEO Eric Schmidt may be getting around to drinking the kool-aid. In an interview published today in TechCrunch, he says: “We have to get from the sort of casual use of asking, querying…to ‘what did you mean?’.” Unfortunately, he then goes into science-fiction-AI land and seems to end up suggesting a natural language question-answering approach like Wolfram Alpha. Still, at least his heart is in the right place.
Anyway, as they say, experience is the best teacher. Hopefully Netbase can recover from what could generously be called a public relations hiccup. But, as the aphorism continues, only a fool will learn from no other. Let’s not be fools, and instead take away the moral of this story: rather than trying to automate everything, optimize the division of labor between human and machine. HCIR.
9 replies on “HCIR: Better Than Magic!”
Interestingly, even Google CEO Eric Schmidt may be getting around to drinking the kool-aid. In an interview published today in TechCrunch, he says: “We have to get from the sort of casual use of asking, querying…to ‘what did you mean?’.”
I don’t see this quote as a sign that Schmidt is finally coming to his senses and seeing the value of human input in the search process/dialogue.
I see it as a sign that he believes that more machine learning will be able to automatically infer “what you mean”.
Maybe I’m too optimistic–or too quick to give him the benefit of the doubt. But I do want to clarify that I’m not anti-semantic.
I agree with you: automation can only get so far, that being the 80/20 split. I’ve seen some systems that combined techniques to get to 85% or 90%, but it’s that last few percent that are so wrong and embarrassing. There’s a lot to be said for human editorial judgment.
There’s a lot of talk about sentiment analysis these days, following a few recent articles from the NYT, El País, etc. Are we really capable of extracting “sentiment” (let’s just leave it at “meaning”) out of unstructured information, in this case social media tools such as blogs, Twitter, FB, etc.? Talking about sentiment is way too ambitious based on my experience in classification and search projects, but hey, it sells terribly well to government and enterprise marketing folks. Everybody wants to know: what do people think about you?
Technology can monitor and spot comments about us, but what about interpreting them and extracting meaning or a conclusion? Very complicated, I think…
We’re having an interesting debate here. Sorry it’s in Spanish:
Borja, thanks for the comment and link. To the credit of automated text processing tools, Google’s translation of your post is readable.
In short, I think you’re right that the state of the art in alerting is way ahead of the state of the art in extracting objective meaning, let alone subjective opinion.
A belated comment on this. An early example of a high-profile gaffe by an IR system appeared in the Boston Globe in 1995. An article criticized the then-new THOMAS system for legislative information, pointing out that its top match for “elderly black Americans” was a bill on “black bears”! The problem is discussed in the well-known (and easily found) paper by Bruce Croft et al., Providing Government Information on the Internet: Experiences with THOMAS.
One other thing. You say “the whole point of automation is to reduce the need for human input. Human labor is a lot more expensive than machine labor! But there’s a big difference between the mirage of eliminating human labor and the realistic aspiration to make its use more efficient and effective. That’s what HCIR is all about, and all of the evidence I’ve encountered confirms that it’s the right way to crack this nut.” I mostly agree. I’d agree completely except for Luis von Ahn’s amazing discovery of “games with a purpose”, a way to make human labor _free_! Free, that is, except for the overhead of developing the game, and collecting and analyzing the information. And, of course, this offline method of adding intelligence to an IR system has severe limitations.
Better late than never! I hadn’t heard about that example, but I’m not surprised, as I’ve heard similar stories about other automated tools (including those in earlier versions of Microsoft Word). Here’s a link to the Croft et al. paper.
As for Luis von Ahn’s GWAP approach, I’m a big fan. Check out some of my earlier posts on the subject, or stay tuned for my HCIR ’09 position paper! In any case, I’ll amend my statement: reduce the need for human labor and/or make it more fun.
[…] There’s no navigation, no browsing. There’s no search—and especially none involving interaction between the human and computer. There’s no news judgment beyond what newspaper editors originally add. And the corpus is […]
Daniel, thanks for your quick response; I’m pleased but not at all surprised you’re a GWAP fan. And Josh, I like your analysis of the Slate visualization; they have an environment that could add so much more HCI to the IR!