<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: In Defense Of Recall</title>
	<atom:link href="http://thenoisychannel.com/2009/07/17/in-defense-of-recall/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/</link>
	<description></description>
	<lastBuildDate>Sat, 11 Feb 2012 00:39:47 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: CIKM 2011 Industry Event: Stephen Robertson on Why Recall Matters</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-10511</link>
		<dc:creator>CIKM 2011 Industry Event: Stephen Robertson on Why Recall Matters</dc:creator>
		<pubDate>Mon, 14 Nov 2011 16:02:18 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-10511</guid>
		<description>[...] on &#8220;Why Recall Matters&#8221;. For the record, I didn&#8217;t put him up to this, despite my strong opinions on the [...]</description>
		<content:encoded><![CDATA[<p>[...] on &#8220;Why Recall Matters&#8221;. For the record, I didn&#8217;t put him up to this, despite my strong opinions on the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: In Search Of Structure</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-9884</link>
		<dc:creator>In Search Of Structure</dc:creator>
		<pubDate>Sun, 15 May 2011 19:25:45 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-9884</guid>
		<description>[...] use of recall as a measure of search satisfaction.&#8221; I posted a rebuttal entitled &#8220;In Defense of Recall&#8221;, arguing that recall is much more useful as a measure for set retrieval than for ranked [...]</description>
		<content:encoded><![CDATA[<p>[...] use of recall as a measure of search satisfaction.&#8221; I posted a rebuttal entitled &#8220;In Defense of Recall&#8221;, arguing that recall is much more useful as a measure for set retrieval than for ranked [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4033</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Sun, 26 Jul 2009 22:46:24 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4033</guid>
		<description>Also, William, thanks for the heads up. As it turns out, I sat with Justin at breakfast on the &lt;a href=&quot;http://thenoisychannel.com/2009/07/21/sigir-2009-day-1/&quot; rel=&quot;nofollow&quot;&gt;first day of SIGIR&lt;/a&gt;. I believe that Alistair was at the table next to us, but he let Justin field my tough questions. :-)</description>
		<content:encoded><![CDATA[<p>Also, William, thanks for the heads up. As it turns out, I sat with Justin at breakfast on the <a href="http://thenoisychannel.com/2009/07/21/sigir-2009-day-1/" rel="nofollow">first day of SIGIR</a>. I believe that Alistair was at the table next to us, but he let Justin field my tough questions. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Le Zhao</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4031</link>
		<dc:creator>Le Zhao</dc:creator>
		<pubDate>Sat, 25 Jul 2009 23:18:43 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4031</guid>
		<description>Dan, glad to know you already considered that option.  When Googling fails, it is indeed difficult to find accurate and affordable answers quickly.  Glad to know mother and baby are fine!</description>
		<content:encoded><![CDATA[<p>Dan, glad to know you already considered that option.  When Googling fails, it is indeed difficult to find accurate and affordable answers quickly.  Glad to know mother and baby are fine!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4030</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Sat, 25 Jul 2009 22:31:13 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4030</guid>
		<description>Le, not off topic at all--in fact, I did consider switching modes from information search to expert search, and would have done so in this case had there been time. Earlier in the pregnancy, we did consult a doctor in our social network to help make sense of the literature about various risks and trade-offs.</description>
		<content:encoded><![CDATA[<p>Le, not off topic at all&#8211;in fact, I did consider switching modes from information search to expert search, and would have done so in this case had there been time. Earlier in the pregnancy, we did consult a doctor in our social network to help make sense of the literature about various risks and trade-offs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Le Zhao</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4029</link>
		<dc:creator>Le Zhao</dc:creator>
		<pubDate>Sat, 25 Jul 2009 22:24:39 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4029</guid>
		<description>At the risk of being off topic, I want to point out that for technical questions like the medical example in the article, I would switch to something like Yahoo! Answers, after failing to find useful information for some time.

In library science, the first question to ask is not how to formulate a query, but where to look for the information.  Cheers.</description>
		<content:encoded><![CDATA[<p>At the risk of being off topic, I want to point out that for technical questions like the medical example in the article, I would switch to something like Yahoo! Answers, after failing to find useful information for some time.</p>
<p>In library science, the first question to ask is not how to formulate a query, but where to look for the information.  Cheers.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4028</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Sat, 25 Jul 2009 00:40:35 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4028</guid>
		<description>Bob, sorry not to respond to this sooner--was a bit occupied with SIGIR. Your posts reinforce my feeling that the IR community has under-emphasized recall relative to its value in real-world problems. Of course, I blame this state of affairs on the laziness of people to think beyond ranked retrieval as experienced in web search. But I&#039;m glad to see more people questioning the status quo, e.g., two of the speakers at the SIGIR Industry Track (whom I didn&#039;t pay off!).</description>
		<content:encoded><![CDATA[<p>Bob, sorry not to respond to this sooner&#8211;was a bit occupied with SIGIR. Your posts reinforce my feeling that the IR community has under-emphasized recall relative to its value in real-world problems. Of course, I blame this state of affairs on the laziness of people to think beyond ranked retrieval as experienced in web search. But I&#8217;m glad to see more people questioning the status quo, e.g., two of the speakers at the SIGIR Industry Track (whom I didn&#8217;t pay off!).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4026</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Tue, 21 Jul 2009 00:02:31 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4026</guid>
		<description>I&#039;ve mentioned (to Dan) before that faceting on some sites I use (e.g. NewEgg or Amazon) induces low recall because of data entry errors (e.g. listing the wrong DDR type for a memory chip, so I can&#039;t find it when I search under the DDR3 facet).

I&#039;ve &lt;a href=&quot;http://lingpipe-blog.com/2007/06/20/is-90-entity-detection-good-enough/&quot; rel=&quot;nofollow&quot;&gt;blogged about high recall&lt;/a&gt; (and here and &lt;a href=&quot;http://lingpipe-blog.com/2006/07/25/confidence-based-gene-mentions-for-all-of-medline/&quot; rel=&quot;nofollow&quot;&gt;here&lt;/a&gt;).

Our NIH grant is focused on high recall linkage of the research literature to gene databases (and other databases).  In the gene case, for any given release of the databases, the queries are fixed -- they&#039;re just the genes in the Entrez-Gene database.    And we just want to find articles that mention them.

A serious issue is results diversity when trying to formalize useful recall.  Both micro- and macro-averages are misleading, as is the pooled evaluation strategy of TREC, which tends to overestimate true recall (by assuming each true positive was found by at least one team in the top 1000 results).  

It&#039;s very easy to find some genes and very hard to find others.  We can find all gazillion mentions of &quot;p53&quot; with high precision, but have a harder time with the mouse gene &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;term=11904&quot; rel=&quot;nofollow&quot;&gt;&quot;at&quot;&lt;/a&gt;.  And that applies across aliases within genes, too; &lt;a href=&quot;http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;dopt=default&amp;rn=1&amp;list_uids=12&quot; rel=&quot;nofollow&quot;&gt;Alpha-1-antichymotrypsin&lt;/a&gt; is easy to find when referred to as &quot;SERPINA3&quot;, but much harder when called &quot;ACT&quot; (there&#039;s lots of all caps titles, etc. in MEDLINE).</description>
		<content:encoded><![CDATA[<p>I&#8217;ve mentioned (to Dan) before that faceting on some sites I use (e.g. NewEgg or Amazon) induces low recall because of data entry errors (e.g. listing the wrong DDR type for a memory chip, so I can&#8217;t find it when I search under the DDR3 facet).</p>
<p>I&#8217;ve <a href="http://lingpipe-blog.com/2007/06/20/is-90-entity-detection-good-enough/" rel="nofollow">blogged about high recall</a> (and here and <a href="http://lingpipe-blog.com/2006/07/25/confidence-based-gene-mentions-for-all-of-medline/" rel="nofollow">here</a>).</p>
<p>Our NIH grant is focused on high recall linkage of the research literature to gene databases (and other databases).  In the gene case, for any given release of the databases, the queries are fixed &#8212; they&#8217;re just the genes in the Entrez-Gene database.    And we just want to find articles that mention them.</p>
<p>A serious issue is results diversity when trying to formalize useful recall.  Both micro- and macro-averages are misleading, as is the pooled evaluation strategy of TREC, which tends to overestimate true recall (by assuming each true positive was found by at least one team in the top 1000 results).  </p>
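<p>A minimal sketch of why the two averages can disagree so sharply when queries differ in difficulty. The counts below are made up purely for illustration, and the function name <code>micro_macro_recall</code> is hypothetical; it assumes every query has at least one true mention.</p>

```python
def micro_macro_recall(counts):
    """Return (micro, macro) averaged recall.

    counts maps each query (here, a gene name) to a pair
    (mentions found, true mentions in the collection).
    Assumes every query has at least one true mention.
    """
    total_found = sum(found for found, _ in counts.values())
    total_true = sum(total for _, total in counts.values())
    # Micro-average: pool all mentions, so frequent genes dominate.
    micro = total_found / total_true
    # Macro-average: mean of per-gene recall, so each gene counts equally.
    macro = sum(found / total for found, total in counts.values()) / len(counts)
    return micro, macro


# Hypothetical counts: one easy, frequent gene and one hard, rare gene.
counts = {"p53": (990, 1000), "at": (2, 10)}
micro, macro = micro_macro_recall(counts)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

<p>With these invented numbers the micro-average is roughly 0.98 while the macro-average is about 0.60, and neither figure alone says that one gene is found almost always and the other almost never.</p>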
<p>It&#8217;s very easy to find some genes and very hard to find others.  We can find all gazillion mentions of &#8220;p53&#8221; with high precision, but have a harder time with the mouse gene <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;term=11904" rel="nofollow">&#8220;at&#8221;</a>.  And that applies across aliases within genes, too; <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&amp;cmd=retrieve&amp;dopt=default&amp;rn=1&amp;list_uids=12" rel="nofollow">Alpha-1-antichymotrypsin</a> is easy to find when referred to as &#8220;SERPINA3&#8221;, but much harder when called &#8220;ACT&#8221; (there&#8217;s lots of all caps titles, etc. in MEDLINE).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: William Webber</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4025</link>
		<dc:creator>William Webber</dc:creator>
		<pubDate>Mon, 20 Jul 2009 04:59:30 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4025</guid>
		<description>Alistair, Justin, and Laurence will all be at SIGIR -- I&#039;m sure they&#039;d love to chat with you about recall. You can bail them up at the poster/demo sessions.</description>
		<content:encoded><![CDATA[<p>Alistair, Justin, and Laurence will all be at SIGIR &#8212; I&#8217;m sure they&#8217;d love to chat with you about recall. You can bail them up at the poster/demo sessions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Heading to SIGIR &#124; The Noisy Channel</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4023</link>
		<dc:creator>Heading to SIGIR &#124; The Noisy Channel</dc:creator>
		<pubDate>Sun, 19 Jul 2009 15:54:04 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4023</guid>
		<description>[...] RSS    &#160;     &#8592; In Defense Of Recall [...]</description>
		<content:encoded><![CDATA[<p>[...] RSS    &nbsp;     &larr; In Defense Of Recall [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4022</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sun, 19 Jul 2009 02:46:35 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4022</guid>
		<description>&lt;i&gt;My point was precisely as stated. If we have a proliferation of metrics, then we are not likely to have a problem.&lt;/i&gt;

Your point (I thought) was that if in the historical literature there is a proliferation, then we do not have a problem.  (&lt;i&gt;It would be interesting to do a survey and see whether metrics “proliferate”.&lt;/i&gt;)

My counterpoint is that we don&#039;t have to do a survey, because we can proliferate the metrics ourselves.  By writing papers that propose, and utilize, new metrics.  

Both recall and precision aside, one of the things I learned very early on in my IR career is that there is a strong (at least theoretical) willingness in the general community to accept any paper that uses a non-traditional evaluation metric, as long as that metric is justified and objective (not just made up to fit the data).  

Granted, not every single reviewer feels this way, but I&#039;ve had enough private conversations over the years to know that a lot of people do.  In fact, I&#039;ll venture out on a limb and make the observation that the younger the researcher, the more dogmatic he or she tends to be about the evaluation metric used.  I&#039;ve had many more conversations with IR &quot;old timers&quot; in which they&#039;ve expressed willingness to let metrics proliferate.  

And hypothetically, suppose almost no one allowed metrics to proliferate. If that were the case, it doesn&#039;t mean we can&#039;t do good science and use good metrics anyway.  Yes, it might be much more difficult to get papers accepted, and to get funding, etc.  But that&#039;s a constant problem that we face either way.. convincing others of the value of our work.  

But luckily, that&#039;s not the case. I do find a willingness in the IR community to accept papers with non-traditional metrics.</description>
		<content:encoded><![CDATA[<p><i>My point was precisely as stated. If we have a proliferation of metrics, then we are not likely to have a problem.</i></p>
<p>Your point (I thought) was that if in the historical literature there is a proliferation, then we do not have a problem.  (<i>It would be interesting to do a survey and see whether metrics “proliferate”.</i>)</p>
<p>My counterpoint is that we don&#8217;t have to do a survey, because we can proliferate the metrics ourselves.  By writing papers that propose, and utilize, new metrics.  </p>
<p>Both recall and precision aside, one of the things I learned very early on in my IR career is that there is a strong (at least theoretical) willingness in the general community to accept any paper that uses a non-traditional evaluation metric, as long as that metric is justified and objective (not just made up to fit the data).  </p>
<p>Granted, not every single reviewer feels this way, but I&#8217;ve had enough private conversations over the years to know that a lot of people do.  In fact, I&#8217;ll venture out on a limb and make the observation that the younger the researcher, the more dogmatic he or she tends to be about the evaluation metric used.  I&#8217;ve had many more conversations with IR &#8220;old timers&#8221; in which they&#8217;ve expressed willingness to let metrics proliferate.  </p>
<p>And hypothetically, suppose almost no one allowed metrics to proliferate. If that were the case, it doesn&#8217;t mean we can&#8217;t do good science and use good metrics anyway.  Yes, it might be much more difficult to get papers accepted, and to get funding, etc.  But that&#8217;s a constant problem that we face either way.. convincing others of the value of our work.  </p>
<p>But luckily, that&#8217;s not the case. I do find a willingness in the IR community to accept papers with non-traditional metrics.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4020</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Sat, 18 Jul 2009 23:19:19 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4020</guid>
		<description>Area under the ROC is my favorite Precision/Recall mix metric at the moment.

Beyond that I think we still are dealing with context.  Even the results are contextual.</description>
		<content:encoded><![CDATA[<p>Area under the ROC is my favorite Precision/Recall mix metric at the moment.</p>
<p>Beyond that I think we still are dealing with context.  Even the results are contextual.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4019</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Sat, 18 Jul 2009 03:53:08 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4019</guid>
		<description>@jeremy

My point was precisely as stated. If we have a proliferation of metrics, then we are not likely to have a problem.

Thus, in any field, not just IR, it would be interesting to do surveys to see whether people use and develop a wide range of metrics.

Then, if they don&#039;t, we should investigate and determine whether it is a problem.

I think it is an interesting, generic, research program.</description>
		<content:encoded><![CDATA[<p>@jeremy</p>
<p>My point was precisely as stated. If we have a proliferation of metrics, then we are not likely to have a problem.</p>
<p>Thus, in any field, not just IR, it would be interesting to do surveys to see whether people use and develop a wide range of metrics.</p>
<p>Then, if they don&#8217;t, we should investigate and determine whether it is a problem.</p>
<p>I think it is an interesting, generic, research program.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4018</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sat, 18 Jul 2009 03:05:47 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4018</guid>
		<description>oops, scratch that last sentence fragment ;-)</description>
		<content:encoded><![CDATA[<p>oops, scratch that last sentence fragment <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4017</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sat, 18 Jul 2009 03:04:27 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4017</guid>
		<description>&lt;i&gt;It would be interesting to do a survey and see whether metrics “proliferate”. If they do, then I agree we do not have a problem.&lt;/i&gt;

I don&#039;t quite understand what you mean by doing a survey.  Mine was a prescriptive suggestion, not a descriptive one.  

But the suggestion does have precedent.  In the early days of TREC (early-mid 90s), there wasn&#039;t just one way of presenting precision or recall.  You have everything from MAP, to Prec@n, to R-prec, to recall@1000, to interpolated precision/recall (e.g. prec at 0.0 interpolated recall, at 0.1, etc.) .. and then later GMAP, and ndcg, and.. and..

And a lot of the early studies didn&#039;t just report one number, one metric.  A lot of the early studies reported *lots* of metrics.

It is only relatively recently.. really since the popularization of the web and web search engines, that the number of reported metrics per paper seems to have decreased.

I still say, go with all of &#039;em.  Give the biggest, most robust picture available.


And a lot</description>
		<content:encoded><![CDATA[<p><i>It would be interesting to do a survey and see whether metrics “proliferate”. If they do, then I agree we do not have a problem.</i></p>
<p>I don&#8217;t quite understand what you mean by doing a survey.  Mine was a prescriptive suggestion, not a descriptive one.  </p>
<p>But the suggestion does have precedent.  In the early days of TREC (early-mid 90s), there wasn&#8217;t just one way of presenting precision or recall.  You have everything from MAP, to Prec@n, to R-prec, to recall@1000, to interpolated precision/recall (e.g. prec at 0.0 interpolated recall, at 0.1, etc.) .. and then later GMAP, and ndcg, and.. and..</p>
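<p>For readers less familiar with these names, here is a rough sketch of three of them using their usual textbook definitions. The function names and the toy ranking are assumptions for illustration, not TREC's own tooling, and the code assumes a non-empty set of relevant documents.</p>

```python
def precision_at(ranking, relevant, n):
    """Prec@n: fraction of the top n ranked documents that are relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n


def r_precision(ranking, relevant):
    """R-prec: precision at rank R, where R is the number of relevant documents."""
    return precision_at(ranking, relevant, len(relevant))


def average_precision(ranking, relevant):
    """AP: mean of the precision values at each rank where a relevant document
    appears, divided by the total number of relevant documents.
    MAP is the mean of this value over a set of queries."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy example: five ranked results, three relevant documents, one never retrieved.
ranking = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p5 = precision_at(ranking, relevant, 5)
rp = r_precision(ranking, relevant)
ap = average_precision(ranking, relevant)
print(p5, rp, ap)
```

<p>Even on this tiny example the three numbers differ, which is the point of reporting more than one.</p>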
<p>And a lot of the early studies didn&#8217;t just report one number, one metric.  A lot of the early studies reported *lots* of metrics.</p>
<p>It is only relatively recently.. really since the popularization of the web and web search engines, that the number of reported metrics per paper seems to have decreased.</p>
<p>I still say, go with all of &#8216;em.  Give the biggest, most robust picture available.</p>
<p>And a lot</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4016</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Sat, 18 Jul 2009 02:00:58 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4016</guid>
		<description>&lt;i&gt;You can still do multi-shot query evaluation and measure it in terms of precision and recall.&lt;/i&gt;

I think it might be a bit more difficult to execute with rigor. I&#039;d be interested in any reference.

&lt;i&gt;All metrics are only approximations. And as long as we don’t forget that, the approach that I favor is to let metrics proliferate.&lt;/i&gt;

It would be interesting to do a survey and see whether metrics &quot;proliferate&quot;. If they do, then I agree we do not have a problem.</description>
		<content:encoded><![CDATA[<p><i>You can still do multi-shot query evaluation and measure it in terms of precision and recall.</i></p>
<p>I think it might be a bit more difficult to execute with rigor. I&#8217;d be interested in any reference.</p>
<p><i>All metrics are only approximations. And as long as we don’t forget that, the approach that I favor is to let metrics proliferate.</i></p>
<p>It would be interesting to do a survey and see whether metrics &#8220;proliferate&#8221;. If they do, then I agree we do not have a problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4013</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sat, 18 Jul 2009 00:54:08 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4013</guid>
		<description>I think the single-shot/batch issue is a bit of a red herring.

You can still do multi-shot query evaluation and measure it in terms of precision and recall.  You  might have to ask users to trawl through real results until they&#039;ve completed their task, but at the end of the day the entire set of information that the user has examined can be thought of as a sequence or a list.  And with that list, you can perform precision and recall calculations.

Well, you might have to change the name of the metric to &quot;viewed precision&quot; and &quot;viewed recall&quot;.    But the overall concept still applies.

However, even if you do measure it this way, there is still the issue of how well recall and/or precision approximate task completion (aka &quot;information need satisfaction&quot;), which we all agree is what we&#039;re really after.  

Yes, there is a danger of reifying the evaluation metric, whether that metric is precision, recall, ndcg or whatever.  But that temptation to reify is going to be true of any and every evaluation metric, no matter what it is.  All metrics are only approximations.  And as long as we don&#039;t forget that, the approach that I favor is to let metrics proliferate.  The more metrics you use, the more approximations, the more sample points you have, and that can only be a good thing.  

That is, instead of being &quot;against recall&quot; because it is unclear whether one means cardinality, coverage, density, etc...why not be pro-recall, pro-cardinality, pro-coverage, pro-density, pro-precision, pro-etc.?</description>
		<content:encoded><![CDATA[<p>I think the single-shot/batch issue is a bit of a red herring.</p>
<p>You can still do multi-shot query evaluation and measure it in terms of precision and recall.  You  might have to ask users to trawl through real results until they&#8217;ve completed their task, but at the end of the day the entire set of information that the user has examined can be thought of as a sequence or a list.  And with that list, you can perform precision and recall calculations.</p>
<p>Well, you might have to change the name of the metric to &#8220;viewed precision&#8221; and &#8220;viewed recall&#8221;.    But the overall concept still applies.</p>
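<p>A minimal sketch of that idea, assuming the session log is simply the list of document ids the user examined across all queries and that relevance judgments for the task exist. The name <code>viewed_precision_recall</code> and the sample data are hypothetical.</p>

```python
def viewed_precision_recall(viewed, relevant):
    """Precision and recall over the documents a user actually examined.

    viewed:   document ids in the order the user looked at them,
              pooled across every query in the session.
    relevant: set of document ids judged relevant to the task.
    """
    seen = set(viewed)  # a document examined twice counts once
    hits = seen.intersection(relevant)
    precision = len(hits) / len(seen) if seen else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical multi-query session; "d7" came back for two different queries.
session = ["d1", "d7", "d3", "d7", "d9"]
judged_relevant = {"d1", "d3", "d4", "d8"}
p, r = viewed_precision_recall(session, judged_relevant)
print(p, r)
```

<p>Treating the viewed documents as a set is one design choice among several; keeping the order instead would allow rank-sensitive variants over the same list.</p>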
<p>However, even if you do measure it this way, there is still the issue of how well recall and/or precision approximate task completion (aka &#8220;information need satisfaction&#8221;), which we all agree is what we&#8217;re really after.  </p>
<p>Yes, there is a danger of reifying the evaluation metric, whether that metric is precision, recall, ndcg or whatever.  But that temptation to reify is going to be true of any and every evaluation metric, no matter what it is.  All metrics are only approximations.  And as long as we don&#8217;t forget that, the approach that I favor is to let metrics proliferate.  The more metrics you use, the more approximations, the more sample points you have, and that can only be a good thing.  </p>
<p>That is, instead of being &#8220;against recall&#8221; because it is unclear whether one means cardinality, coverage, density, etc&#8230;why not be pro-recall, pro-cardinality, pro-coverage, pro-density, pro-precision, pro-etc.?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4012</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Fri, 17 Jul 2009 18:43:47 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4012</guid>
		<description>Thanks. The library and information scientists get that too, and indeed they take a very cognitive / user-centric approach to information seeking.  I think most information retrieval researchers view the batch retrieval model as simply a necessary evil to make their work measurable. But assumptions have consequences, and I think it&#039;s a real problem that the IR community hasn&#039;t established a consensus on how--or even whether--to measure retrieval effectiveness when multiple queries are involved.</description>
		<content:encoded><![CDATA[<p>Thanks. The library and information scientists get that too, and indeed they take a very cognitive / user-centric approach to information seeking.  I think most information retrieval researchers view the batch retrieval model as simply a necessary evil to make their work measurable. But assumptions have consequences, and I think it&#8217;s a real problem that the IR community hasn&#8217;t established a consensus on how&#8211;or even whether&#8211;to measure retrieval effectiveness when multiple queries are involved.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Lemire</title>
		<link>http://thenoisychannel.com/2009/07/17/in-defense-of-recall/comment-page-1/#comment-4011</link>
		<dc:creator>Daniel Lemire</dc:creator>
		<pubDate>Fri, 17 Jul 2009 18:35:58 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=2331#comment-4011</guid>
		<description>Insightful.

I don&#039;t think that database people ever assume that the user will only issue one query. In applications such as OLAP, you actually expect a stream of queries.</description>
		<content:encoded><![CDATA[<p>Insightful.</p>
<p>I don&#8217;t think that database people ever assume that the user will only issue one query. In applications such as OLAP, you actually expect a stream of queries.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

