<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Catching Up With Hunch</title>
	<atom:link href="http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/</link>
	<description></description>
	<lastBuildDate>Mon, 21 May 2012 05:21:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/comment-page-1/#comment-3119</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Wed, 13 May 2009 23:15:14 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2073#comment-3119</guid>
		<description>Looking at a few measures, but the one that I&#039;m most interested in is based on relative entropy between the starting set and expanded set, using distributions on explicit or extracted document annotations. We can also use lexical databases like WordNet, but I (and my fellow Endecans generally) much prefer statistical techniques to linguistic ones. And we don&#039;t expect the measure to be right all the time, so we&#039;re also looking at ways to give users not only fine-grained control of the expansion, but also previews of how expanding the query changes the results. After all, I am an HCIR zealot!

This is work in progress--I&#039;ll say and show more when I can!</description>
		<content:encoded><![CDATA[<p>Looking at a few measures, but the one that I&#8217;m most interested in is based on relative entropy between the starting set and expanded set, using distributions on explicit or extracted document annotations. We can also use lexical databases like WordNet, but I (and my fellow Endecans generally) much prefer statistical techniques to linguistic ones. And we don&#8217;t expect the measure to be right all the time, so we&#8217;re also looking at ways to give users not only fine-grained control of the expansion, but also previews of how expanding the query changes the results. After all, I am an HCIR zealot!</p>
<p>This is work in progress&#8211;I&#8217;ll say and show more when I can!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/comment-page-1/#comment-3112</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Wed, 13 May 2009 05:19:25 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2073#comment-3112</guid>
		<description>How are you measuring topic drift? Through non-relevance alone, or by some other metric?</description>
		<content:encoded><![CDATA[<p>How are you measuring topic drift? Through non-relevance alone, or by some other metric?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/comment-page-1/#comment-3104</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Wed, 13 May 2009 01:25:35 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2073#comment-3104</guid>
		<description>I think most of these sites put more emphasis on precision than recall--it&#039;s very hard to make the case for the latter, &lt;a href=&quot;http://thenoisychannel.com/2009/03/17/precision-and-recall/&quot; rel=&quot;nofollow&quot;&gt;not that I haven&#039;t tried&lt;/a&gt;. My colleagues and I are working on some approaches that focus on recall in the face of noisy annotation, exposing a trade-off between number of records and topic drift (which is an unsupervised analog of recall vs. precision). Demo in progress!</description>
		<content:encoded><![CDATA[<p>I think most of these sites put more emphasis on precision than recall&#8211;it&#8217;s very hard to make the case for the latter, <a href="http://thenoisychannel.com/2009/03/17/precision-and-recall/" rel="nofollow">not that I haven&#8217;t tried</a>. My colleagues and I are working on some approaches that focus on recall in the face of noisy annotation, exposing a trade-off between number of records and topic drift (which is an unsupervised analog of recall vs. precision). Demo in progress!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/comment-page-1/#comment-3099</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Tue, 12 May 2009 17:39:36 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2073#comment-3099</guid>
		<description>The main problem I have as a consumer with both of these technologies is recall.   Even highly faceted sites like newegg.com make decisions about which facets to present to the user in what order, just like a decision tree.  

If I go to newegg.com, for instance, and type &quot;2gb memory&quot;, it by default pops me into &quot;Guided Search&quot;.   But even if I go to &quot;Advanced Search&quot;, I don&#039;t get any of the subcategories I&#039;d expect, like the type of memory (pin config, DDR2/DDR3), the speed of the memory, etc.  I can expand subcategories, useful links (?), price, and manufacturer.   I have to make several choices, limiting my choice of products, until I see the &quot;type&quot; and &quot;speed&quot; and &quot;cas latency&quot; facets.  Someone (or some algorithm) decided that manufacturer was more important than the type of memory!  And I can&#039;t click on the breadcrumb path at the top of my search to just limit myself to DDR2 and forget the other facets I&#039;ve chosen.

The second issue is plain old database errors. With faceting at sites like NewEgg and Amazon, the DB annotation isn&#039;t perfect, so you get false positives and false negatives on faceted search that you tend not to get with plain old text search.  If I select DDR2, I have to trust that they&#039;ve entered &quot;DDR2&quot; for every product that really is DDR2.   Amazon also has serious issues faceting by recording artist (granted, it&#039;s a hard problem), but it means that I can&#039;t click on an artist&#039;s name and expect to see all their other albums.  The faceting&#039;s actually hurting recall.

Of course, we could hope that they do a better job on the database, selecting facets, etc., but it&#039;s a really difficult problem when there are literally hundreds or thousands of facets in the system and humans doing data entry.</description>
		<content:encoded><![CDATA[<p>The main problem I have as a consumer with both of these technologies is recall.   Even highly faceted sites like newegg.com make decisions about which facets to present to the user in what order, just like a decision tree.  </p>
<p>If I go to newegg.com, for instance, and type &#8220;2gb memory&#8221;, it by default pops me into &#8220;Guided Search&#8221;.   But even if I go to &#8220;Advanced Search&#8221;, I don&#8217;t get any of the subcategories I&#8217;d expect, like the type of memory (pin config, DDR2/DDR3), the speed of the memory, etc.  I can expand subcategories, useful links (?), price, and manufacturer.   I have to make several choices, limiting my choice of products, until I see the &#8220;type&#8221; and &#8220;speed&#8221; and &#8220;cas latency&#8221; facets.  Someone (or some algorithm) decided that manufacturer was more important than the type of memory!  And I can&#8217;t click on the breadcrumb path at the top of my search to just limit myself to DDR2 and forget the other facets I&#8217;ve chosen.</p>
<p>The second issue is plain old database errors. With faceting at sites like NewEgg and Amazon, the DB annotation isn&#8217;t perfect, so you get false positives and false negatives on faceted search that you tend not to get with plain old text search.  If I select DDR2, I have to trust that they&#8217;ve entered &#8220;DDR2&#8221; for every product that really is DDR2.   Amazon also has serious issues faceting by recording artist (granted, it&#8217;s a hard problem), but it means that I can&#8217;t click on an artist&#8217;s name and expect to see all their other albums.  The faceting&#8217;s actually hurting recall.</p>
<p>Of course, we could hope that they do a better job on the database, selecting facets, etc., but it&#8217;s a really difficult problem when there are literally hundreds or thousands of facets in the system and humans doing data entry.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Christopher</title>
		<link>http://thenoisychannel.com/2009/05/11/catching-up-with-hunch/comment-page-1/#comment-3096</link>
		<dc:creator>Christopher</dc:creator>
		<pubDate>Tue, 12 May 2009 02:36:49 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2073#comment-3096</guid>
		<description>I was very skeptical (and still am), as I don&#039;t think decision trees are the best mechanism for this. But I&#039;m thinking like a researcher/scientist, not like the average consumer who is, I assume, their target, at least for this incarnation of Hunch.

When I found out who their Chief Scientist is, I changed my tune a bit: he has done some great stuff, like using common sense to improve aspects of NL parsing while at MIT (I am highly impressed with the concept).

The other reason I am now quite interested is the thing you nailed: the availability of an API (hello Wolfram!). While Hunch as a web site is not overly exciting to me, taking their data (whether they or a 3rd party does it) into the deeper NLP realm could allow some very interesting things to be built, I think.</description>
		<content:encoded><![CDATA[<p>I was very skeptical (and still am), as I don&#8217;t think decision trees are the best mechanism for this. But I&#8217;m thinking like a researcher/scientist, not like the average consumer who is, I assume, their target, at least for this incarnation of Hunch.</p>
<p>When I found out who their Chief Scientist is, I changed my tune a bit: he has done some great stuff, like using common sense to improve aspects of NL parsing while at MIT (I am highly impressed with the concept).</p>
<p>The other reason I am now quite interested is the thing you nailed: the availability of an API (hello Wolfram!). While Hunch as a web site is not overly exciting to me, taking their data (whether they or a 3rd party does it) into the deeper NLP realm could allow some very interesting things to be built, I think.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

