<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Guest Post: Information Retrieval using a Bayesian Model of Learning and Generalization</title>
	<atom:link href="http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/</link>
	<description></description>
	<lastBuildDate>Mon, 21 May 2012 05:21:00 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<item>
		<title>By: Dan</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5941</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Mon, 26 Apr 2010 14:02:38 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5941</guid>
		<description>Katherine,

I agree with what you said that the process of category learning is happening with one or two examples.  The more interesting questions are how long does it take to reach its peak performance and whether you have a model for this process that performs as well as human learners (or better) in reaching its peak performance and whether this process actually works very well in a test environment without labeled data.   Everything else is theoretical and academic, as I am not sure that we yet have an understanding of &#039;How the Mind Works&#039; and whether Bayesian learning will ultimately prove to be helpful here.</description>
		<content:encoded><![CDATA[<p>Katherine,</p>
<p>I agree with what you said that the process of category learning is happening with one or two examples.  The more interesting questions are how long does it take to reach its peak performance and whether you have a model for this process that performs as well as human learners (or better) in reaching its peak performance and whether this process actually works very well in a test environment without labeled data.   Everything else is theoretical and academic, as I am not sure that we yet have an understanding of &#8216;How the Mind Works&#8217; and whether Bayesian learning will ultimately prove to be helpful here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5940</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Mon, 26 Apr 2010 11:44:03 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5940</guid>
		<description>Hi Dan,

The entire query in Xyggy is formed from examples; the text word is just a quick way of indexing a few examples.

In terms of &quot;category learning&quot;, if you give people even one or two examples they will use those examples to learn, or generalize, about the category. They may not always be able to generalize completely correctly, but that doesn&#039;t mean that they aren&#039;t learning. Just because someone is confused about whether a whale is a mammal or a fish doesn&#039;t mean they don&#039;t know anything about mammals or fish.

References for people learning novel categories from one or just a couple of examples: Xu and Tenenbaum 2007, Word learning as Bayesian
inference, Psychological Review; and Pinker 1999, How the Mind Works.</description>
		<content:encoded><![CDATA[<p>Hi Dan,</p>
<p>The entire query in Xyggy is formed from examples; the text word is just a quick way of indexing a few examples.</p>
<p>In terms of &#8220;category learning&#8221;, if you give people even one or two examples they will use those examples to learn, or generalize, about the category. They may not always be able to generalize completely correctly, but that doesn&#8217;t mean that they aren&#8217;t learning. Just because someone is confused about whether a whale is a mammal or a fish doesn&#8217;t mean they don&#8217;t know anything about mammals or fish.</p>
<p>References for people learning novel categories from one or just a couple of examples: Xu and Tenenbaum 2007, Word learning as Bayesian<br />
inference, Psychological Review; and Pinker 1999, How the Mind Works.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5932</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Sun, 25 Apr 2010 19:56:21 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5932</guid>
		<description>Katherine,

Thanks for the references.  I agree that human category learning is complex and the actual rate of learning must depend upon individual difference factors as well as the nature of the category, such as the number of features.    The same may be said about animal and machine learning of categories.  A complex category with many features like &#039;dog&#039; has features that overlap with many other similar categories like &#039;cat&#039; and &#039;goat&#039; etc. If you have an example of where something like this can be learned by a human learner with just a couple of training trials, I would be very interested. 

I am just not sure that category learning is happening with one or two examples in Xyggy, as it seems to me that the cued recall of the previously stored category is what is being aided by the one or two examples in the query.  That is not to discount Xyggy as being a potentially new and interesting retrieval system that may have benefits beyond LDA and Google Sets.  I do not know enough about Bayesian Sets to comment on that.  I do know a lot about how many trials it takes a supervised machine learning system to accurately learn a category with many features that overlap with similar categories, as that has been our business for many years and in multitudes of applications.  I can assure you that, with the typical high-dimensional, multicollinear business data that we see daily, this cannot be accomplished with just a couple of training examples.</description>
		<content:encoded><![CDATA[<p>Katherine,</p>
<p>Thanks for the references.  I agree that human category learning is complex and the actual rate of learning must depend upon individual difference factors as well as the nature of the category, such as the number of features.    The same may be said about animal and machine learning of categories.  A complex category with many features like &#8216;dog&#8217; has features that overlap with many other similar categories like &#8216;cat&#8217; and &#8216;goat&#8217; etc. If you have an example of where something like this can be learned by a human learner with just a couple of training trials, I would be very interested. </p>
<p>I am just not sure that category learning is happening with one or two examples in Xyggy, as it seems to me that the cued recall of the previously stored category is what is being aided by the one or two examples in the query.  That is not to discount Xyggy as being a potentially new and interesting retrieval system that may have benefits beyond LDA and Google Sets.  I do not know enough about Bayesian Sets to comment on that.  I do know a lot about how many trials it takes a supervised machine learning system to accurately learn a category with many features that overlap with similar categories, as that has been our business for many years and in multitudes of applications.  I can assure you that, with the typical high-dimensional, multicollinear business data that we see daily, this cannot be accomplished with just a couple of training examples.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5931</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Sun, 25 Apr 2010 18:04:12 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5931</guid>
		<description>@dan

The congratulations are really appreciated.  Hopefully, down the road we or others will generate empirical data that compares the results of using different feature vectors on the same data set.  Katherine has jumped in to answer the category learning portion of the question.</description>
		<content:encoded><![CDATA[<p>@dan</p>
<p>The congratulations are really appreciated.  Hopefully, down the road we or others will generate empirical data that compares the results of using different feature vectors on the same data set.  Katherine has jumped in to answer the category learning portion of the question.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5930</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Sun, 25 Apr 2010 17:40:55 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5930</guid>
		<description>Hi Dan,

Human category learning is obviously a complex issue. In terms of how quickly adults learn new categories, well, that really depends on the category being learned. See, for example, Nosofsky, Gluck et al.&#039;s replication of the Shepard, Hovland, and Jenkins data: http://love.psy.utexas.edu/~love/papers/love_etal_2004.pdf - figure 3. Some categories people learn very quickly, while others are much more complicated, and it&#039;s not even clear that people ever completely learn them.

There has been a lot of recent work on Bayesian models for human category learning and generalization by Tenenbaum and Griffiths. We had a paper recently: http://www.gatsby.ucl.ac.uk/~heller/prior_knowledge_nips_revision.pdf -- we have another paper in submission, in fact, which compares human generalization from one versus two training examples -- and these Bayesian models seem to match up with human data quite well. While much of the discussion in these papers focuses on the prior, the basic way a category is being modeled, or learned, is the same way in which we model a category, or concept, in Bayesian Sets.</description>
		<content:encoded><![CDATA[<p>Hi Dan,</p>
<p>Human category learning is obviously a complex issue. In terms of how quickly adults learn new categories, well, that really depends on the category being learned. See, for example, Nosofsky, Gluck et al.&#8217;s replication of the Shepard, Hovland, and Jenkins data: <a href="http://love.psy.utexas.edu/~love/papers/love_etal_2004.pdf" rel="nofollow">http://love.psy.utexas.edu/~love/papers/love_etal_2004.pdf</a> &#8211; figure 3. Some categories people learn very quickly, while others are much more complicated, and it&#8217;s not even clear that people ever completely learn them.</p>
<p>There has been a lot of recent work on Bayesian models for human category learning and generalization by Tenenbaum and Griffiths. We had a paper recently: <a href="http://www.gatsby.ucl.ac.uk/~heller/prior_knowledge_nips_revision.pdf" rel="nofollow">http://www.gatsby.ucl.ac.uk/~heller/prior_knowledge_nips_revision.pdf</a> &#8212; we have another paper in submission, in fact, which compares human generalization from one versus two training examples &#8212; and these Bayesian models seem to match up with human data quite well. While much of the discussion in these papers focuses on the prior, the basic way a category is being modeled, or learned, is the same way in which we model a category, or concept, in Bayesian Sets.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5929</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Sun, 25 Apr 2010 16:18:06 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5929</guid>
		<description>Dinesh,

I appreciate your response.  I did have a look at the Bayesian Sets paper.  I do not know enough about this method to know if it would perform better under the circumstances that you list.  I am not sure if the trigram would be generally sensitive to  the type of interaction that I mention.  I am also not sure if adding the complete text would help with this anomaly that I find.   These are empirical questions.  However, I congratulate you and the rest of the team on a very interesting approach to a very difficult problem.

I would again mention, though, that this does not seem to be a model of concept or category learning, such as when a child learns the category of &#039;dog&#039;.  It seems more to be a model for cued recall of the contents of previously stored categories.  It is more like asking a professor who is an expert on patents to give me a reference to all patents that concern the concept of &#039;dog&#039; or &#039;regression&#039;.  If she stumbles on &#039;regression&#039; because of the multiple meanings, you might give a couple of examples and then she will suddenly know that you mean regression related to disease or tumor.  Humans can obviously do this type of task with very few examples, but the original learning of the category actually seems to require a much larger training sample size.  Please see this paper, for example:

http://www.psych.nyu.edu/rehder/Hoffman_&amp;_Rehder_10.pdf

This paper presents a couple of experiments with adult humans where  category learning requires at least 150-250 training trials (and in some cases it may not have even asymptoted yet).</description>
		<content:encoded><![CDATA[<p>Dinesh,</p>
<p>I appreciate your response.  I did have a look at the Bayesian Sets paper.  I do not know enough about this method to know if it would perform better under the circumstances that you list.  I am not sure if the trigram would be generally sensitive to  the type of interaction that I mention.  I am also not sure if adding the complete text would help with this anomaly that I find.   These are empirical questions.  However, I congratulate you and the rest of the team on a very interesting approach to a very difficult problem.</p>
<p>I would again mention, though, that this does not seem to be a model of concept or category learning, such as when a child learns the category of &#8216;dog&#8217;.  It seems more to be a model for cued recall of the contents of previously stored categories.  It is more like asking a professor who is an expert on patents to give me a reference to all patents that concern the concept of &#8216;dog&#8217; or &#8216;regression&#8217;.  If she stumbles on &#8216;regression&#8217; because of the multiple meanings, you might give a couple of examples and then she will suddenly know that you mean regression related to disease or tumor.  Humans can obviously do this type of task with very few examples, but the original learning of the category actually seems to require a much larger training sample size.  Please see this paper, for example:</p>
<p><a href="http://www.psych.nyu.edu/rehder/Hoffman_&#038;_Rehder_10.pdf" rel="nofollow">http://www.psych.nyu.edu/rehder/Hoffman_&#038;_Rehder_10.pdf</a></p>
<p>This paper presents a couple of experiments with adult humans where  category learning requires at least 150-250 training trials (and in some cases it may not have even asymptoted yet).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5927</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Sun, 25 Apr 2010 08:29:48 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5927</guid>
		<description>Hi Dan
 
The patent demo uses bibliographic data only which generally represents about 10% of the available textual information (for non-design patents).  We are using a sophisticated bag-of-words method for the feature vectors but using all available textual data would improve the search results further.  With Bayesian Sets you have the flexibility to define the features of the feature vectors to fit your needs.  For example, you can use tri-grams instead of bag-of-words and also include concepts and semantic relationships.  
 
Wrt the demo, experiment by toggling on/off the keyword search line in the search box as well as toggling each item on/off to improve relevance.  Also, take a look at the psychological research literature cited in the Bayesian Sets paper.</description>
		<content:encoded><![CDATA[<p>Hi Dan</p>
<p>The patent demo uses bibliographic data only which generally represents about 10% of the available textual information (for non-design patents).  We are using a sophisticated bag-of-words method for the feature vectors but using all available textual data would improve the search results further.  With Bayesian Sets you have the flexibility to define the features of the feature vectors to fit your needs.  For example, you can use tri-grams instead of bag-of-words and also include concepts and semantic relationships.  </p>
<p>Wrt the demo, experiment by toggling on/off the keyword search line in the search box as well as toggling each item on/off to improve relevance.  Also, take a look at the psychological research literature cited in the Bayesian Sets paper.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5920</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Sat, 24 Apr 2010 10:54:20 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5920</guid>
		<description>Dinesh,

I do not believe that Google&#039;s research blog will publish my last comment, which was an answer to you, because I mentioned the name of your product.  All that I said was that a &quot;bag of words&quot; approach will not be powerful enough to deal with polysemy, whereas an approach that can handle interactions would be.  In other words, the statistical interaction of frequency counts of &quot;regression&quot; and &quot;tumor&quot; would be a feature that would uniquely separate the two meanings of &quot;regression&quot; in this example even with a small training sample size.  In order to do this, though, one would need massively parallel processing capabilities to compute the very large number of interaction features.</description>
		<content:encoded><![CDATA[<p>Dinesh,</p>
<p>I do not believe that Google&#8217;s research blog will publish my last comment, which was an answer to you, because I mentioned the name of your product.  All that I said was that a &#8220;bag of words&#8221; approach will not be powerful enough to deal with polysemy, whereas an approach that can handle interactions would be.  In other words, the statistical interaction of frequency counts of &#8220;regression&#8221; and &#8220;tumor&#8221; would be a feature that would uniquely separate the two meanings of &#8220;regression&#8221; in this example even with a small training sample size.  In order to do this, though, one would need massively parallel processing capabilities to compute the very large number of interaction features.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dan</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5917</link>
		<dc:creator>Dan</dc:creator>
		<pubDate>Fri, 23 Apr 2010 23:52:56 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5917</guid>
		<description>Dinesh,

I posted a reply on Google&#039;s Research blog that references my concerns about whether this method should be compared to human concept learning until it can better deal with polysemy.

Here are the details of my test.  I was after patents related to regression of disease, such as the regression of cancer.  The search term was &quot;regression&quot;, and this is what my search looked like after I added four patents to refine the meaning of regression in this context.

regression
5414019 Regression of mammalian carcinomas
5356817 Method for detecting the onset, progression and regression of gynecologic cancers
4544305 Aiding the regression of neoplastic diseases
4757056 Method for tumor regression in rats, mice and hamsters

When you do this, you will see that 6 of the top 12 patents returned have the meaning of regression in the statistical sense rather than the disease sense.  I would be interested in the comments of the bloggers on this anomaly.  In fairness, when I run the search for &quot;regression&quot; on Google, it also gives the more frequent usage related to statistics as the predominant result.  I would need to qualify it as &quot;regression cancer disease&quot; before Google would understand what I was after.  Yet humans would easily be able to learn what I was after in my search just by looking at the titles of a few added items, as in this example.  The problem appears to be that &quot;statistical regression&quot; was mentioned in the methods section of some of these patents even though the predominant idea in all of them was &quot;regression&quot; in the disease sense.</description>
		<content:encoded><![CDATA[<p>Dinesh,</p>
<p>I posted a reply on Google&#8217;s Research blog that references my concerns about whether this method should be compared to human concept learning until it can better deal with polysemy.</p>
<p>Here are the details of my test.  I was after patents related to regression of disease, such as the regression of cancer.  The search term was &#8220;regression&#8221;, and this is what my search looked like after I added four patents to refine the meaning of regression in this context.</p>
<p>regression<br />
5414019 Regression of mammalian carcinomas<br />
5356817 Method for detecting the onset, progression and regression of gynecologic cancers<br />
4544305 Aiding the regression of neoplastic diseases<br />
4757056 Method for tumor regression in rats, mice and hamsters</p>
<p>When you do this, you will see that 6 of the top 12 patents returned have the meaning of regression in the statistical sense rather than the disease sense.  I would be interested in the comments of the bloggers on this anomaly.  In fairness, when I run the search for &#8220;regression&#8221; on Google, it also gives the more frequent usage related to statistics as the predominant result.  I would need to qualify it as &#8220;regression cancer disease&#8221; before Google would understand what I was after.  Yet humans would easily be able to learn what I was after in my search just by looking at the titles of a few added items, as in this example.  The problem appears to be that &#8220;statistical regression&#8221; was mentioned in the methods section of some of these patents even though the predominant idea in all of them was &#8220;regression&#8221; in the disease sense.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5845</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Wed, 14 Apr 2010 22:07:21 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5845</guid>
		<description>Great! Dinesh forwarded your email to me. I&#039;ll let you know when I&#039;m in town :)</description>
		<content:encoded><![CDATA[<p>Great! Dinesh forwarded your email to me. I&#8217;ll let you know when I&#8217;m in town <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5838</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 13 Apr 2010 22:06:02 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5838</guid>
		<description>Thank you, Dinesh and Katherine, for all your patient explanations!

Katherine, let me know when you get here!  I&#039;m around.</description>
		<content:encoded><![CDATA[<p>Thank you, Dinesh and Katherine, for all your patient explanations!</p>
<p>Katherine, let me know when you get here!  I&#8217;m around.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Weekly Search &#38; Social News: 04/13/2010 &#124; Search Engine Journal</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5834</link>
		<dc:creator>Weekly Search &#38; Social News: 04/13/2010 &#124; Search Engine Journal</dc:creator>
		<pubDate>Tue, 13 Apr 2010 14:24:04 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5834</guid>
		<description>[...] Guest Post: Information Retrieval using a Bayesian Model of Learning and Generalization &#8211; Noisy channel [...]</description>
		<content:encoded><![CDATA[<p>[...] Guest Post: Information Retrieval using a Bayesian Model of Learning and Generalization &#8211; Noisy channel [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5831</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Tue, 13 Apr 2010 10:53:01 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5831</guid>
		<description>Hi Jeremy,

I&#039;m planning to be in the SF area for most of the summer. If you&#039;re around, perhaps we can meet up there.</description>
		<content:encoded><![CDATA[<p>Hi Jeremy,</p>
<p>I&#8217;m planning to be in the SF area for most of the summer. If you&#8217;re around, perhaps we can meet up there.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5830</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Tue, 13 Apr 2010 10:45:21 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5830</guid>
		<description>@ jeremy &amp; daniel wrt feature engineering

The following are the data sets that we created demos from at Xyggy:

- Content-based image search using unlabelled and labelled images with corel and flickr pictures
- last.fm listener playcount and tag data by song to provide a music suggestion service (“if you liked this you will also like these”, which is about understanding the user&#039;s mindset instead of the traditional “people who liked your choice also liked these”)
- Netflix ratings data to provide a movie suggestion service 
- Patents using patent bibliographic data only
- Legal cases with citations
- New York Times annotated corpus consisting of 1.8m articles from 1987 to 2007

The feature vectors are defined and created prior to the search indexes.  Some of the above data sets are rich (eg. images, patents and NYT) and others such as last.fm and Netflix would be difficult to classify as rich data.  Rich or not, all the above demos worked very well.  With flickr images we also built a version that combined the low-level features (eg. color, texture etc.) with available text data (labels, tags, user annotations and so on).  This mixing of types could also be applied to music/audio data.  With the last.fm data two versions were built: one with playcount data only and the other with playcount plus tag data. With text data, you can create new &#039;custom&#039; features based on concepts or semantic relationships.  In general, it really doesn&#039;t hurt to have too many features - directly from the raw data and custom ones - if they are at least plausibly relevant to search.  As can be seen there is plenty of flexibility and creativity available for feature engineering using Bayesian Sets.</description>
		<content:encoded><![CDATA[<p>@ jeremy &amp; daniel wrt feature engineering</p>
<p>The following are the data sets that we created demos from at Xyggy:</p>
<p>- Content-based image search using unlabelled and labelled images with corel and flickr pictures<br />
- last.fm listener playcount and tag data by song to provide a music suggestion service (“if you liked this you will also like these”, which is about understanding the user&#8217;s mindset instead of the traditional “people who liked your choice also liked these”)<br />
- Netflix ratings data to provide a movie suggestion service<br />
- Patents using patent bibliographic data only<br />
- Legal cases with citations<br />
- New York Times annotated corpus consisting of 1.8m articles from 1987 to 2007</p>
<p>The feature vectors are defined and created prior to the search indexes.  Some of the above data sets are rich (eg. images, patents and NYT) and others such as last.fm and Netflix would be difficult to classify as rich data.  Rich or not, all the above demos worked very well.  With flickr images we also built a version that combined the low-level features (eg. color, texture etc.) with available text data (labels, tags, user annotations and so on).  This mixing of types could also be applied to music/audio data.  With the last.fm data two versions were built: one with playcount data only and the other with playcount plus tag data. With text data, you can create new &#8216;custom&#8217; features based on concepts or semantic relationships.  In general, it really doesn&#8217;t hurt to have too many features &#8211; directly from the raw data and custom ones &#8211; if they are at least plausibly relevant to search.  As can be seen there is plenty of flexibility and creativity available for feature engineering using Bayesian Sets.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5826</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 23:32:16 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5826</guid>
		<description>&lt;i&gt;But even there the richness of the raw input matters.&lt;/i&gt;

Yes, but if you&#039;ve got the audio of each of the songs listed above, then you should have enough raw input to help with my WCS seeking task.   The song itself contains the song itself.

I&#039;m going to try and make it to IIiX as well as HCIR.</description>
		<content:encoded><![CDATA[<p><i>But even there the richness of the raw input matters.</i></p>
<p>Yes, but if you&#8217;ve got the audio of each of the songs listed above, then you should have enough raw input to help with my WCS seeking task.   The song itself contains the song itself.</p>
<p>I&#8217;m going to try and make it to IIiX as well as HCIR.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5825</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Mon, 12 Apr 2010 23:29:32 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5825</guid>
		<description>Re feature induction / selection: I hear you. I can imagine applying a technique like &lt;a href=&quot;http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.6979&amp;rep=rep1&amp;type=pdf&quot; rel=&quot;nofollow&quot;&gt;streamwise feature selection&lt;/a&gt; toward this end. But even there the richness of the raw input matters.

I probably won&#039;t make it to SIGIR this year, but I hope you guys carry this conversation forward there--and continue it at the HCIR workshop a few weeks later!</description>
		<content:encoded><![CDATA[<p>Re feature induction / selection: I hear you. I can imagine applying a technique like <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.6979&#038;rep=rep1&#038;type=pdf" rel="nofollow">streamwise feature selection</a> toward this end. But even there the richness of the raw input matters.</p>
<p>I probably won&#8217;t make it to SIGIR this year, but I hope you guys carry this conversation forward there&#8211;and continue it at the HCIR workshop a few weeks later!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-2/#comment-5824</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 23:12:23 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5824</guid>
		<description>Yes, the Greiff paper was a question about the mathematics of Bayesian sets, not a question about relevance feedback.  

And I still think that the only difference between query by example and relevance feedback is at the stage of the process at which they are executed.  Conceptually, I mean.  Marking a document as relevant is, imho, no different than saying &quot;add this document to the next query that I execute&quot;.  But maybe I&#039;m just splitting hairs.

But I do think our conversation is becoming complex enough, and with enough subtle subthreads, that it&#039;s becoming difficult to continue properly.  Perhaps in person?  Will you be at SIGIR?

Oh, and @Daniel: there is a middle ground between taking exactly the features that are given to you on the one hand, and trying to draw water from a stone on the other.  Look at some of the old Della Pietra work on feature induction.  That&#039;s where you create new features out of existing raw data, in a task-driven manner.</description>
		<content:encoded><![CDATA[<p>Yes, the Greiff paper was a question about the mathematics of Bayesian sets, not a question about relevance feedback.  </p>
<p>And I still think that the only difference between query by example and relevance feedback is at the stage of the process at which they are executed.  Conceptually, I mean.  Marking a document as relevant is, imho, no different than saying &#8220;add this document to the next query that I execute&#8221;.  But maybe I&#8217;m just splitting hairs.</p>
<p>But I do think our conversation is becoming complex enough, and with enough subtle subthreads, that it&#8217;s becoming difficult to continue properly.  Perhaps in person?  Will you be at SIGIR?</p>
<p>Oh, and @Daniel: there is a middle ground between taking exactly the features that are given to you on the one hand, and trying to draw water from a stone on the other.  Look at some of the old Della Pietra work on feature induction.  That&#8217;s where you create new features out of existing raw data, in a task-driven manner.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5822</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Mon, 12 Apr 2010 17:56:50 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5822</guid>
		<description>Hi Jeremy,

Ok, I think I understand where this is coming from now. So vanilla Bayesian Sets is not designed for the purpose of doing relevance feedback. It could easily be extended to do so, and I have to add in a caveat here that I&#039;m not totally up on all that has been happening at Xyggy. But, you started out asking me the difference between a particular relevance feedback score and the Bayes Sets score, and I said that there were differences because of Bayesian Sets being Bayesian and comparing different hypotheses because it wasn&#039;t designed specifically to perform relevance feedback (i.e. include negative examples, etc.). This is still all true.

So I&#039;m glad we agree that the initial query-by-example is different than relevance feedback. That is what the Bayesian Sets score is designed to do, be a retrieval method you can use instead of the standard text query (btw even when you input a text query on the xyggy site, it&#039;s just being used as a shorthand way of getting a query set). Thus the hypotheses are different from those of the relevance feedback work.

Now, you point out that on the Xyggy site there is this interactive querying situation, and isn&#039;t this the same as relevance feedback? It&#039;s doing relevance feedback I suppose, if that&#039;s what the user is using the interactive process for. However, I believe, the score is not different from the initial query (though as I keep saying, relevance feedback in Bayes Sets is an extension being worked on). It&#039;s also possible that the interaction with the user has happened because the user is forming a new query, so the difficulty here in incorporating the negative examples is in determining when the user is refining an old query, versus forming a new one. Right now the algorithm treats every change to the query as a new query. So I&#039;d call what&#039;s going on &quot;query exploration&quot; or something like that. I don&#039;t know if there&#039;s a name for this. The user is using the retrieved items to modify the query, but the scoring process is not changing, and treats every query like the first one. I&#039;d say it&#039;s sort of like a sandbox.

In terms of the complexity discussion, I thought that you were implying complexity in mathematics (complex functions of the features). In terms of other things, like poor algorithms, features, UI etc. - I feel like there are a bunch of issues that are getting conflated here. I certainly agree with Daniel, that if the information isn&#039;t there, there&#039;s not much anyone can do about it.</description>
		<content:encoded><![CDATA[<p>Hi Jeremy,</p>
<p>Ok, I think I understand where this is coming from now. So vanilla Bayesian Sets is not designed for the purpose of doing relevance feedback. It could easily be extended to do so, and I have to add in a caveat here that I&#8217;m not totally up on all that has been happening at Xyggy. But, you started out asking me the difference between a particular relevance feedback score and the Bayes Sets score, and I said that there were differences because of Bayesian Sets being Bayesian and comparing different hypotheses because it wasn&#8217;t designed specifically to perform relevance feedback (i.e. include negative examples, etc.). This is still all true.</p>
<p>So I&#8217;m glad we agree that the initial query-by-example is different than relevance feedback. That is what the Bayesian Sets score is designed to do, be a retrieval method you can use instead of the standard text query (btw even when you input a text query on the xyggy site, it&#8217;s just being used as a shorthand way of getting a query set). Thus the hypotheses are different from those of the relevance feedback work.</p>
<p>Now, you point out that on the Xyggy site there is this interactive querying situation, and isn&#8217;t this the same as relevance feedback? It&#8217;s doing relevance feedback I suppose, if that&#8217;s what the user is using the interactive process for. However, I believe, the score is not different from the initial query (though as I keep saying, relevance feedback in Bayes Sets is an extension being worked on). It&#8217;s also possible that the interaction with the user has happened because the user is forming a new query, so the difficulty here in incorporating the negative examples is in determining when the user is refining an old query, versus forming a new one. Right now the algorithm treats every change to the query as a new query. So I&#8217;d call what&#8217;s going on &#8220;query exploration&#8221; or something like that. I don&#8217;t know if there&#8217;s a name for this. The user is using the retrieved items to modify the query, but the scoring process is not changing, and treats every query like the first one. I&#8217;d say it&#8217;s sort of like a sandbox.</p>
<p>In terms of the complexity discussion, I thought that you were implying complexity in mathematics (complex functions of the features). In terms of other things, like poor algorithms, features, UI etc. &#8211; I feel like there are a bunch of issues that are getting conflated here. I certainly agree with Daniel, that if the information isn&#8217;t there, there&#8217;s not much anyone can do about it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5820</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 15:20:16 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5820</guid>
		<description>&lt;i&gt;However, I think it’s also interesting to think about retrieval systems that perhaps start out doing something fast, and then if the user remains unhappy with the results, offer the option of spending more time and learning something more complicated.&lt;/i&gt;

I totally agree.

&lt;i&gt;My guess is right now, though, that most people’s frustration comes from poor algorithms, features, and user interface design, and not from needing to learn some super-complex model of the concept they’d like retrieved.&lt;/i&gt;

I submit that what makes a concept &quot;complex&quot; is that the algorithms and features, in their out-of-the-box configuration, do not align well with the concept that the user is trying to express.  For example, my ongoing Google frustration that I am unable to sort by least recent, topically relevant information when doing a literature search.  

Complexity is not always a function of mathematics; it can also be a function of constrained design.</description>
		<content:encoded><![CDATA[<p><i>However, I think it’s also interesting to think about retrieval systems that perhaps start out doing something fast, and then if the user remains unhappy with the results, offer the option of spending more time and learning something more complicated.</i></p>
<p>I totally agree.</p>
<p><i>My guess is right now, though, that most people’s frustration comes from poor algorithms, features, and user interface design, and not from needing to learn some super-complex model of the concept they’d like retrieved.</i></p>
<p>I submit that what makes a concept &#8220;complex&#8221; is that the algorithms and features, in their out-of-the-box configuration, do not align well with the concept that the user is trying to express.  For example, my ongoing Google frustration that I am unable to sort by least recent, topically relevant information when doing a literature search.  </p>
<p>Complexity is not always a function of mathematics; it can also be a function of constrained design.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5819</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 15:07:40 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5819</guid>
		<description>&lt;i&gt;The problem of relevance feedback is different than that of querying with examples if you think about it. When a user performs a query with examples, they select a few positive examples only from a very large collection of items. When a user performs (standard) relevance feedback, there has already been some stab at what the user wants, and a limited number of “close” items have been returned. From this small number of items, you’re able to get the user to tell you which ones are positive examples, but not only that, the ones they haven’t selected are also (likely) negative examples. You can use this “close but negative” information to refine your results as well.&lt;/i&gt;

Um, maybe I&#039;m just dense, but I really don&#039;t see how this is any different from relevance feedback.  What I mean is, this is an interactive process, correct?  It doesn&#039;t matter if you&#039;ve started with a &quot;query by text&quot; or with a &quot;query by example&quot;, if in the 2nd round of interaction, the system produces a list of relevant results mixed with &quot;close but nonrelevant&quot; ones.  At that point, as you start then dragging more relevant examples in, and non-relevant examples out, how is that not relevance feedback?  Procedurally and conceptually, I mean?  I understand that the mathematics that make it possible are different.  But if those mathematics produce a set of relevant and close but nonrelevant results after the initial query-by-example, then you&#039;ve got relevance feedback, do you not?  As Dinesh writes:

&lt;i&gt;An important aspect of our approach is that the search box accepts text queries as well as items, by dragging them in and out of the search box.  An implementation using patent data is at http://www.xyggy.com/patent.php.  Enter keywords (e.g., “earthquake sensor”) and relevant items to the keywords are displayed.  Drag an item of interest from the results into the search box and the relevance changes.  When two or more items are added into the search box, the system discovers what they have in common and returns better results.  Items can be toggled in/out of the search by clicking the +/- symbol and items can be completely removed by dragging them out of the search box.  Each change to an item in the search box automatically retrieves new relevant results. &lt;/i&gt;

So when you first, immediately start interacting with the system, I do see that there is a difference in that standard IR requires query-by-text, whereas you enable query-by-example -- it doesn&#039;t have to be a text query.  But the &lt;i&gt;results&lt;/i&gt; of that query by example produce a ranked list of relevant and close-but-not-relevant items, which you then use more Bayesian sets to tease apart, given that the user starts dragging each example in/out.  Right?  Or am I fundamentally still not understanding something?</description>
		<content:encoded><![CDATA[<p><i>The problem of relevance feedback is different than that of querying with examples if you think about it. When a user performs a query with examples, they select a few positive examples only from a very large collection of items. When a user performs (standard) relevance feedback, there has already been some stab at what the user wants, and a limited number of “close” items have been returned. From this small number of items, you’re able to get the user to tell you which ones are positive examples, but not only that, the ones they haven’t selected are also (likely) negative examples. You can use this “close but negative” information to refine your results as well.</i></p>
<p>Um, maybe I&#8217;m just dense, but I really don&#8217;t see how this is any different from relevance feedback.  What I mean is, this is an interactive process, correct?  It doesn&#8217;t matter if you&#8217;ve started with a &#8220;query by text&#8221; or with a &#8220;query by example&#8221;, if in the 2nd round of interaction, the system produces a list of relevant results mixed with &#8220;close but nonrelevant&#8221; ones.  At that point, as you start then dragging more relevant examples in, and non-relevant examples out, how is that not relevance feedback?  Procedurally and conceptually, I mean?  I understand that the mathematics that make it possible are different.  But if those mathematics produce a set of relevant and close but nonrelevant results after the initial query-by-example, then you&#8217;ve got relevance feedback, do you not?  As Dinesh writes:</p>
<p><i>An important aspect of our approach is that the search box accepts text queries as well as items, by dragging them in and out of the search box.  An implementation using patent data is at <a href="http://www.xyggy.com/patent.php" rel="nofollow">http://www.xyggy.com/patent.php</a>.  Enter keywords (e.g., “earthquake sensor”) and relevant items to the keywords are displayed.  Drag an item of interest from the results into the search box and the relevance changes.  When two or more items are added into the search box, the system discovers what they have in common and returns better results.  Items can be toggled in/out of the search by clicking the +/- symbol and items can be completely removed by dragging them out of the search box.  Each change to an item in the search box automatically retrieves new relevant results. </i></p>
<p>So when you first, immediately start interacting with the system, I do see that there is a difference in that standard IR requires query-by-text, whereas you enable query-by-example &#8212; it doesn&#8217;t have to be a text query.  But the <i>results</i> of that query by example produce a ranked list of relevant and close-but-not-relevant items, which you then use more Bayesian sets to tease apart, given that the user starts dragging each example in/out.  Right?  Or am I fundamentally still not understanding something?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5817</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Mon, 12 Apr 2010 12:24:40 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5817</guid>
		<description>(tempo &gt; 50 bpm) and (tempo &lt; 90 bpm) and (beatEmphasis == (2 &#124; 4))  would be nice, but I could imagine something not quite as elegant--as long as rhythm is somehow represented. Just want to state the obvious that you can&#039;t draw water from a stone: the information necessary to identify the basis for similarity may simply not be accessible. I think it would be a highly desirable system quality to help the user figure that out as quickly as possible.</description>
		<content:encoded><![CDATA[<p>(tempo > 50 bpm) and (tempo < 90 bpm) and (beatEmphasis == (2 | 4))  would be nice, but I could imagine something not quite as elegant&#8211;as long as rhythm is somehow represented. Just want to state the obvious that you can&#8217;t draw water from a stone: the information necessary to identify the basis for similarity may simply not be accessible. I think it would be a highly desirable system quality to help the user figure that out as quickly as possible.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Antonio</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5816</link>
		<dc:creator>Antonio</dc:creator>
		<pubDate>Mon, 12 Apr 2010 11:22:30 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5816</guid>
		<description>There is also an interesting IR approach here:

whatisprymas.wordpress.com/</description>
		<content:encoded><![CDATA[<p>There is also an interesting IR approach here:</p>
<p>whatisprymas.wordpress.com/</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5814</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Mon, 12 Apr 2010 10:24:29 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5814</guid>
		<description>Hi Jeremy,

I know what relevance feedback is. The difference is as I described 3 posts ago (I&#039;ll copy here):

---
In terms of relevance feedback, it does seem that it’s a problem which is a natural fit for Bayes Sets to address, and one that we’re working on. The problem of relevance feedback is different than that of querying with examples if you think about it. When a user performs a query with examples, they select a few positive examples only from a very large collection of items. When a user performs (standard) relevance feedback, there has already been some stab at what the user wants, and a limited number of “close” items have been returned. From this small number of items, you’re able to get the user to tell you which ones are positive examples, but not only that, the ones they haven’t selected are also (likely) negative examples. You can use this “close but negative” information to refine your results as well. This would be good, for example, in the case that you’ve been talking about with 80s and concept X songs. The downside is that the items being labelled by the user are from a very small pool which is biased by the original retrieval process, and therefore makes most sense as a refinement technique.
---

The score in the relevance feedback paper you linked to uses the &quot;close negative&quot; examples that the user just labelled in the non-relevant hypothesis, in order to refine the results. When we query with examples, we don&#039;t have these &quot;close negative&quot; examples, nor has the algorithm presented the user with retrieval results to label (because we&#039;re not doing relevance feedback), nor is it attempting to refine an already used retrieval process. Therefore our &quot;non-relevant&quot; hypothesis does not include any &quot;close negative&quot; examples, or &quot;negative&quot; examples at all. Instead the hypothesis says that the item being scored is in a different cluster from the query items, where a cluster is defined by a probabilistic model.

WCS songs: Assuming your features do capture the information you&#039;re trying to retrieve, the complexity you describe will depend to some degree on the probabilistic model you choose to represent a cluster, within the BS framework. The problem is, in general, if you want to have an extremely complicated model, learning is slow, and therefore retrieval is slow. In practice we&#039;ve found that you most often get really, really good results with simple Bernoulli models. And honestly I would be surprised if, given a reasonable number of music features, the concept you want couldn&#039;t be distinguished with this.

However, I think it&#039;s also interesting to think about retrieval systems that perhaps start out doing something fast, and then if the user remains unhappy with the results, offer the option of spending more time and learning something more complicated. My guess is right now, though, that most people&#039;s frustration comes from poor algorithms, features, and user interface design, and not from needing to learn some super-complex model of the concept they&#039;d like retrieved.</description>
		<content:encoded><![CDATA[<p>Hi Jeremy,</p>
<p>I know what relevance feedback is. The difference is as I described 3 posts ago (I&#8217;ll copy here):</p>
<p>&#8212;<br />
In terms of relevance feedback, it does seem that it’s a problem which is a natural fit for Bayes Sets to address, and one that we’re working on. The problem of relevance feedback is different than that of querying with examples if you think about it. When a user performs a query with examples, they select a few positive examples only from a very large collection of items. When a user performs (standard) relevance feedback, there has already been some stab at what the user wants, and a limited number of “close” items have been returned. From this small number of items, you’re able to get the user to tell you which ones are positive examples, but not only that, the ones they haven’t selected are also (likely) negative examples. You can use this “close but negative” information to refine your results as well. This would be good, for example, in the case that you’ve been talking about with 80s and concept X songs. The downside is that the items being labelled by the user are from a very small pool which is biased by the original retrieval process, and therefore makes most sense as a refinement technique.<br />
&#8212;</p>
<p>The score in the relevance feedback paper you linked to uses the &#8220;close negative&#8221; examples that the user just labelled in the non-relevant hypothesis, in order to refine the results. When we query with examples, we don&#8217;t have these &#8220;close negative&#8221; examples, nor has the algorithm presented the user with retrieval results to label (because we&#8217;re not doing relevance feedback), nor is it attempting to refine an already used retrieval process. Therefore our &#8220;non-relevant&#8221; hypothesis does not include any &#8220;close negative&#8221; examples, or &#8220;negative&#8221; examples at all. Instead the hypothesis says that the item being scored is in a different cluster from the query items, where a cluster is defined by a probabilistic model.</p>
<p>WCS songs: Assuming your features do capture the information you&#8217;re trying to retrieve, the complexity you describe will depend to some degree on the probabilistic model you choose to represent a cluster, within the BS framework. The problem is, in general, if you want to have an extremely complicated model, learning is slow, and therefore retrieval is slow. In practice we&#8217;ve found that you most often get really, really good results with simple Bernoulli models. And honestly I would be surprised if, given a reasonable number of music features, the concept you want couldn&#8217;t be distinguished with this.</p>
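<p>To make the simple Bernoulli case concrete, here is a minimal sketch of Bayesian Sets scoring over binary feature vectors, following the Beta-Bernoulli form in the Bayesian Sets paper (Ghahramani and Heller). The function name, the prior strength <code>c=2.0</code> and the clamping constant are choices made for this illustration; it is not the Xyggy implementation.</p>

```python
import math


def bayesian_sets_scores(items, query_idx, c=2.0):
    """Log Bayesian Sets score of every item under a Beta-Bernoulli model.

    items: list of equal-length binary feature lists (0/1 values).
    query_idx: indices of the items used as the query (the examples).
    c: assumed prior strength; alpha_j = c * mean_j, beta_j = c * (1 - mean_j).
    """
    n_items = len(items)
    n_feats = len(items[0])
    n_query = len(query_idx)

    # Empirical feature means set the prior; clamp them so no log(0) occurs.
    eps = 1e-6
    means = []
    for j in range(n_feats):
        m = sum(row[j] for row in items) / n_items
        means.append(min(max(m, eps), 1.0 - eps))

    const = 0.0
    weights = []
    for j in range(n_feats):
        alpha = c * means[j]
        beta = c * (1.0 - means[j])
        # How many query items have feature j switched on.
        on = sum(items[i][j] for i in query_idx)
        alpha_post = alpha + on
        beta_post = beta + n_query - on
        const += (math.log(alpha + beta) - math.log(alpha + beta + n_query)
                  + math.log(beta_post) - math.log(beta))
        weights.append(math.log(alpha_post) - math.log(alpha)
                       - math.log(beta_post) + math.log(beta))

    # The log score is linear in the item's features: const + weights . x
    return [const + sum(w * x for w, x in zip(weights, row)) for row in items]
```

<p>Because the log score reduces to a constant plus a weighted sum of the item&#8217;s binary features, scoring a whole collection is one sparse matrix-vector product, which is why the simple model stays fast. Items that share the query&#8217;s features receive higher scores; no negative examples are involved.</p>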
<p>However, I think it&#8217;s also interesting to think about retrieval systems that perhaps start out doing something fast, and then if the user remains unhappy with the results, offer the option of spending more time and learning something more complicated. My guess is right now, though, that most people&#8217;s frustration comes from poor algorithms, features, and user interface design, and not from needing to learn some super-complex model of the concept they&#8217;d like retrieved.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5811</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 04:09:26 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5811</guid>
		<description>If you think about it abstractly, &quot;find all songs that are good to dance WCS to&quot; is quite a similar type of information need to &quot;find all the information necessary to describe how subway crime has varied in New York over the past two decades.&quot;  

The very fact that crime varies means that the best concise keywords or descriptors for the late 80s might not be the best ones for the late 90s.  But the information need as a whole encompasses both.  Just like certain timbres might be good for one set of WCS songs and not for another.</description>
		<content:encoded><![CDATA[<p>If you think about it abstractly, &#8220;find all songs that are good to dance WCS to&#8221; is quite a similar type of information need to &#8220;find all the information necessary to describe how subway crime has varied in New York over the past two decades.&#8221;  </p>
<p>The very fact that crime varies means that the best concise keywords or descriptors for the late 80s might not be the best ones for the late 90s.  But the information need as a whole encompasses both.  Just like certain timbres might be good for one set of WCS songs and not for another.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5810</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 02:55:04 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5810</guid>
		<description>&lt;i&gt;But I think it reasonable to require that a set have a concise description in terms of the feature space in order to be findable.&lt;/i&gt;

You mean something like:

If (tempo &gt; 50 bpm) and (tempo &lt; 90 bpm) and (beatEmphasis == (2 &#124; 4)) and ... then song = relevant (i.e. WCS-able)?

Maybe something so simple does exist.  But if it doesn&#039;t exist, then I would hope that we could develop methods that learned how to come up with (discover) more complex interactions in the primitives that do concisely describe the information need.  The primitive features given to the system might be tempo and beat onsets.  But the features necessary for classification might be something like &quot;histogram of note duration ratios&quot;.  A good IR system should be able to induce some of this more complicated structure as part of the active learning, interactive, HCIR process.  

Do Bayesian sets do that, or do they rather focus more on adjusting weights on existing feature primitive vectors?</description>
		<content:encoded><![CDATA[<p><i>But I think it reasonable to require that a set have a concise description in terms of the feature space in order to be findable.</i></p>
<p>You mean something like:</p>
<p>If (tempo &gt; 50 bpm) and (tempo &lt; 90 bpm) and (beatEmphasis == (2 | 4)) and &#8230; then song = relevant (i.e. WCS-able)?</p>
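<p>Written out as code, a hand-built rule of this kind might look like the sketch below. The function and parameter names are made up for the illustration, and <code>beatEmphasis == (2 | 4)</code> is read here as &#8220;the emphasis falls on beats 2 and 4&#8221;, which is an assumption about the notation.</p>

```python
def looks_wcs_able(tempo_bpm, emphasized_beats):
    """Hand-written rule: tempo strictly between 50 and 90 bpm,
    with the emphasis on beats 2 and 4 (assumed reading of the rule)."""
    return 50 < tempo_bpm < 90 and set(emphasized_beats) == {2, 4}


candidate = looks_wcs_able(72, [2, 4])
```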
<p>Maybe something so simple does exist.  But if it doesn&#039;t exist, then I would hope that we could develop methods that learned how to come up with (discover) more complex interactions in the primitives that do concisely describe the information need.  The primitive features given to the system might be tempo and beat onsets.  But the features necessary for classification might be something like &quot;histogram of note duration ratios&quot;.  A good IR system should be able to induce some of this more complicated structure as part of the active learning, interactive, HCIR process.  </p>
<p>Do Bayesian sets do that, or do they rather focus more on adjusting weights on existing feature primitive vectors?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5809</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Mon, 12 Apr 2010 02:35:18 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5809</guid>
		<description>Point taken. I&#039;m not expecting that the songs would have to be tagged &quot;west coast swing&quot;. But I think it reasonable to require that a set have a concise description in terms of the feature space in order to be findable. The challenge is arriving at that concise description. But if no such concise description exists, then I think the best one can hope for is an efficient process that yields a negative result.</description>
		<content:encoded><![CDATA[<p>Point taken. I&#8217;m not expecting that the songs would have to be tagged &#8220;west coast swing&#8221;. But I think it reasonable to require that a set have a concise description in terms of the feature space in order to be findable. The challenge is arriving at that concise description. But if no such concise description exists, then I think the best one can hope for is an efficient process that yields a negative result.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5808</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Mon, 12 Apr 2010 02:15:05 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5808</guid>
		<description>Well, that&#039;s part of my question, really.  Oftentimes, the metadata that you do have are orthogonal to your information need.  

Think about this in terms of regular text search: navigation vs. exploratory search.  If the relevant document has the &quot;right&quot; text, and your query has the &quot;right&quot; text, then modulo spam, the problem of finding your relevant document is relatively easy. 

But if your query terms don&#039;t match the terms in the relevant document, or if there are multiple relevant documents and some match and some don&#039;t, or if your information need is not easily expressed in query terms (e.g. my ongoing frustration with Google about not being able to sort by *least recent* most topically relevant documents), then you&#039;ll have a problem.

Now, this same issue applies to music and Bayesian sets.  Or &quot;examples&quot; of any kind, and Bayesian sets.  How well do Bayesian sets do with an &quot;exploratory&quot; information need?  How well do they do when no specific set of features easily describes the information need?  (e.g. west coast swing).  

I ask, because to me, that&#039;s the more interesting kind of IR problem.  I believe that there needs to be an acknowledgment that the features (whether data or metadata) will never exactly match the information need.  The research question is: What does one do about that?</description>
		<content:encoded><![CDATA[<p>Well, that&#8217;s part of my question, really.  Oftentimes, the metadata that you do have are orthogonal to your information need.  </p>
<p>Think about this in terms of regular text search: navigation vs. exploratory search.  If the relevant document has the &#8220;right&#8221; text, and your query has the &#8220;right&#8221; text, then modulo spam, the problem of finding your relevant document is relatively easy. </p>
<p>But if your query terms don&#8217;t match the terms in the relevant document, or if there are multiple relevant documents and some match and some don&#8217;t, or if your information need is not easily expressed in query terms (e.g. my ongoing frustration with Google about not being able to sort by *least recent* most topically relevant documents), then you&#8217;ll have a problem.</p>
<p>Now, this same issue applies to music and Bayesian sets.  Or &#8220;examples&#8221; of any kind, and Bayesian sets.  How well do Bayesian sets do with an &#8220;exploratory&#8221; information need?  How well do they do when no specific set of features easily describes the information need?  (e.g. west coast swing).  </p>
<p>I ask, because to me, that&#8217;s the more interesting kind of IR problem.  I believe that there needs to be an acknowledgment that the features (whether data or metadata) will never exactly match the information need.  The research question is: What does one do about that?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5807</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Mon, 12 Apr 2010 00:17:24 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5807</guid>
		<description>I wonder if even Pandora would be able to generalize from a large set of west coast swing songs which had no other strong commonality. Seems this is at least as much about available metadata (e.g., bpm, playlists) as choice of algorithm.</description>
		<content:encoded><![CDATA[<p>I wonder if even Pandora would be able to generalize from a large set of west coast swing songs which had no other strong commonality. Seems this is at least as much about available metadata (e.g., bpm, playlists) as choice of algorithm.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5806</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sun, 11 Apr 2010 23:48:30 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5806</guid>
		<description>By the way, these two versions of &quot;If you go away&quot; are not very good to dance the WCS to:

http://www.youtube.com/watch?v=y9OJCoTQqbA

http://www.youtube.com/watch?v=i2wmKcBm4Ik

So it&#039;s not the song itself.</description>
		<content:encoded><![CDATA[<p>By the way, these two versions of &#8220;If you go away&#8221; are not very good to dance the WCS to:</p>
<p><a href="http://www.youtube.com/watch?v=y9OJCoTQqbA" rel="nofollow">http://www.youtube.com/watch?v=y9OJCoTQqbA</a></p>
<p><a href="http://www.youtube.com/watch?v=i2wmKcBm4Ik" rel="nofollow">http://www.youtube.com/watch?v=i2wmKcBm4Ik</a></p>
<p>So it&#8217;s not the song itself.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5805</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Sun, 11 Apr 2010 23:43:42 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5805</guid>
		<description>Ok, I understand the difference between ML estimation and Bayesian methods.  But the distinction between &quot;his method is relevance feedback&quot; and &quot;our method is examples&quot; is something I still don&#039;t quite get.  Relevance feedback is essentially the idea of getting the user to examine a set of documents, and say &quot;yes&quot; to some, and &quot;no&quot; to others.  Relevance feedback *is* example-based.  Don&#039;t let the word &quot;relevance&quot; fool you.  What&#039;s happening is that the user is doing a by-example query, by saying &quot;yes&quot; to some and &quot;no&quot; to others.

I found another example song.. too bad you can&#039;t see it:

http://www.youtube.com/watch?v=kF0IVXXg240

It&#039;s &quot;If you go away&quot; by Cyndi Lauper, covering a 1959 French song by Jacques Brel.  80s, 50s, cover songs, etc.  That&#039;s why I really am curious about how well the concept is detectable.

FWIW, the concept I was after was &quot;songs that are all good to dance a &#039;west coast swing&#039; to&quot;.  Good west coast swings span various genres, artists, timbres, and eras.  Some can be fast and funky, some can be slow and sultry.  In other words, good WCS songs are all across the map.  So I&#039;m just wondering how well Bayesian Sets can pick up on detecting/finding good WCS songs, even with 10 to 20 examples.

Here are some examples of various songs and people dancing to them.  Hopefully there is no Vevo in these:

http://www.youtube.com/watch?v=5wAnbmurwtE

http://www.youtube.com/watch?v=V3ZxiPKmacg

And for Daniel, another Coldplay:

http://www.youtube.com/watch?v=43SrQLFiE84

:-)</description>
		<content:encoded><![CDATA[<p>Ok, I understand the difference between ML estimation and Bayesian methods.  But the distinction between &#8220;his method is relevance feedback&#8221; and &#8220;our method is examples&#8221; is something I still don&#8217;t quite get.  Relevance feedback is essentially the idea of getting the user to examine a set of documents, and say &#8220;yes&#8221; to some, and &#8220;no&#8221; to others.  Relevance feedback *is* example-based.  Don&#8217;t let the word &#8220;relevance&#8221; fool you.  What&#8217;s happening is that the user is doing a by-example query, by saying &#8220;yes&#8221; to some and &#8220;no&#8221; to others.</p>
<p>I found another example song.. too bad you can&#8217;t see it:</p>
<p><a href="http://www.youtube.com/watch?v=kF0IVXXg240" rel="nofollow">http://www.youtube.com/watch?v=kF0IVXXg240</a></p>
<p>It&#8217;s &#8220;If you go away&#8221; by Cyndi Lauper, covering a 1959 French song by Jacques Brel.  80s, 50s, cover songs, etc.  That&#8217;s why I really am curious about how well the concept is detectable.</p>
<p>FWIW, the concept I was after was &#8220;songs that are all good to dance a &#8216;west coast swing&#8217; to&#8221;.  Good west coast swings span various genres, artists, timbres, and eras.  Some can be fast and funky, some can be slow and sultry.  In other words, good WCS songs are all across the map.  So I&#8217;m just wondering how well Bayesian Sets can pick up on detecting/finding good WCS songs, even with 10 to 20 examples.</p>
<p>Here are some examples of various songs and people dancing to them.  Hopefully there is no Vevo in these:</p>
<p><a href="http://www.youtube.com/watch?v=5wAnbmurwtE" rel="nofollow">http://www.youtube.com/watch?v=5wAnbmurwtE</a></p>
<p><a href="http://www.youtube.com/watch?v=V3ZxiPKmacg" rel="nofollow">http://www.youtube.com/watch?v=V3ZxiPKmacg</a></p>
<p>And for Daniel, another Coldplay:</p>
<p><a href="http://www.youtube.com/watch?v=43SrQLFiE84" rel="nofollow">http://www.youtube.com/watch?v=43SrQLFiE84</a></p>
<p> <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5793</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Fri, 09 Apr 2010 11:23:54 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5793</guid>
		<description>Hi Jeremy,

So I&#039;ve only looked over that paper briefly, but it seems to me that he&#039;s using a likelihood ratio score (which can be seen as comparing 2 different hypotheses) to do term reweighting from relevance feedback. As far as I can tell, empirical frequencies are always used to compute the likelihoods, which is analogous to saying that there&#039;s a multivariate Bernoulli generative model, and performing maximum likelihood estimation.  He runs into some overfitting troubles due to the ML estimation which he tries to compensate for in his discussion on binning.

We also use probabilistic models (in what we&#039;ve presented here, also multivariate Bernoullis, though that doesn&#039;t need to be the case) to compare two different hypotheses, which also have a flavor of relevant and non-relevant to the query. However, obviously our retrieval situation is different because our query is a bunch of examples, and his model is specifically for performing relevance feedback, so the exact hypotheses that we&#039;re comparing are different. We also don&#039;t perform ML estimation, but take a Bayesian approach instead (you can look up Bayes factors if you&#039;re interested). Bayesian methods avoid the kind of overfitting situation in the binning section, and in fact allow us to compare the hypotheses that we do, since our &quot;non-relevant&quot; hypothesis has more parameters than our &quot;relevant&quot; hypothesis, and would therefore always be found to be more likely using maximum likelihood.

BTW I can&#039;t see your music links because I&#039;m in a different country: &quot;This video contains content from Vevo, who has blocked it in your country on copyright grounds.&quot;</description>
		<content:encoded><![CDATA[<p>Hi Jeremy,</p>
<p>So I&#8217;ve only looked over that paper briefly, but it seems to me that he&#8217;s using a likelihood ratio score (which can be seen as comparing 2 different hypotheses) to do term reweighting from relevance feedback. As far as I can tell, empirical frequencies are always used to compute the likelihoods, which is analogous to saying that there&#8217;s a multivariate Bernoulli generative model, and performing maximum likelihood estimation.  He runs into some overfitting troubles due to the ML estimation which he tries to compensate for in his discussion on binning.</p>
<p>We also use probabilistic models (in what we&#8217;ve presented here, also multivariate Bernoullis, though that doesn&#8217;t need to be the case) to compare two different hypotheses, which also have a flavor of relevant and non-relevant to the query. However, obviously our retrieval situation is different because our query is a bunch of examples, and his model is specifically for performing relevance feedback, so the exact hypotheses that we&#8217;re comparing are different. We also don&#8217;t perform ML estimation, but take a Bayesian approach instead (you can look up Bayes factors if you&#8217;re interested). Bayesian methods avoid the kind of overfitting situation in the binning section, and in fact allow us to compare the hypotheses that we do, since our &#8220;non-relevant&#8221; hypothesis has more parameters than our &#8220;relevant&#8221; hypothesis, and would therefore always be found to be more likely using maximum likelihood.</p>
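<p>The point about maximum likelihood can be checked with a small sketch for a single binary feature. It compares the hypothesis that a new item shares the query&#8217;s Bernoulli parameter against the hypothesis that it has its own, once by plugging in ML estimates and once by integrating the parameter out. The function names and the Beta(1, 1) prior are assumptions chosen for illustration, not something specified in the comment above.</p>
<pre>
```python
import math


def max_log_lik(ones, n):
    """Log-likelihood of n Bernoulli draws with `ones` successes at the ML estimate ones / n."""
    zeros = n - ones
    ll = 0.0
    if ones:
        ll += ones * math.log(ones / n)
    if zeros:
        ll += zeros * math.log(zeros / n)
    return ll


def log_marginal(ones, n, alpha=1.0, beta=1.0):
    """Log marginal likelihood of the same draws under a Beta(alpha, beta) prior (assumed prior)."""
    def log_beta_fn(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_beta_fn(alpha + ones, beta + n - ones) - log_beta_fn(alpha, beta)


def compare(query_ones, query_n, x):
    """Return (ML log-ratio, log Bayes factor) for 'same parameter' versus 'separate parameters'.

    query_ones / query_n describe the query items for one feature; x is the new item's value (0 or 1).
    """
    ml = max_log_lik(query_ones + x, query_n + 1) - (
        max_log_lik(query_ones, query_n) + max_log_lik(x, 1)
    )
    bayes = log_marginal(query_ones + x, query_n + 1) - (
        log_marginal(query_ones, query_n) + log_marginal(x, 1)
    )
    return ml, bayes
```
</pre>
<p>With a query that has the feature on in 4 of 5 items, <code>compare(4, 5, 1)</code> gives an ML log-ratio below zero and a positive log Bayes factor, while <code>compare(4, 5, 0)</code> gives two negative values. The ML ratio can never favor the shared-parameter hypothesis, whereas the Bayes factor can go either way.</p>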
<p>BTW I can&#8217;t see your music links because I&#8217;m in a different country: &#8220;This video contains content from Vevo, who has blocked it in your country on copyright grounds.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Biweekly Links &#8211; 04-09-2010 &#171; God, Your Book Is Great !!</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5792</link>
		<dc:creator>Biweekly Links &#8211; 04-09-2010 &#171; God, Your Book Is Great !!</dc:creator>
		<pubDate>Fri, 09 Apr 2010 06:36:46 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5792</guid>
		<description>[...] Information Retrieval using a Bayesian Model of Learning and Generalization A discussion about Bayesian Sets and using them to find related stuff. The idea looks promising [...]</description>
		<content:encoded><![CDATA[<p>[...] Information Retrieval using a Bayesian Model of Learning and Generalization A discussion about Bayesian Sets and using them to find related stuff. The idea looks promising [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5785</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Thu, 08 Apr 2010 20:16:16 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5785</guid>
		<description>I had to look up the reference; I didn&#039;t know who that person was.  No, that&#039;s not Concept X.

I need to correct what I said in comment #33: there is NOT a concept that connects all those artists.  There is a concept that connects just those particular songs, in comments #3 and #27, by those artists.  But yes, this concept does exist.</description>
		<content:encoded><![CDATA[<p>I had to look up the reference; I didn&#8217;t know who that person was.  No, that&#8217;s not Concept X.</p>
<p>I need to correct what I said in comment #33: there is NOT a concept that connects all those artists.  There is a concept that connects just those particular songs, in comments #3 and #27, by those artists.  But yes, this concept does exist.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5784</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Thu, 08 Apr 2010 20:07:02 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5784</guid>
		<description>@Jeremy is ConceptX Mutt Lange?</description>
		<content:encoded><![CDATA[<p>@Jeremy is ConceptX Mutt Lange?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5783</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Thu, 08 Apr 2010 19:04:33 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5783</guid>
		<description>BTW, it&#039;s interesting that no one has even asked what &quot;Concept X&quot; is.  I&#039;m not making it up.  There really is a concept that connects Shania Twain, Kraftwerk, and the Eurythmics.  I really would be curious if Bayesian Sets could pick up on it.</description>
		<content:encoded><![CDATA[<p>BTW, it&#8217;s interesting that no one has even asked what &#8220;Concept X&#8221; is.  I&#8217;m not making it up.  There really is a concept that connects Shania Twain, Kraftwerk, and the Eurythmics.  I really would be curious if Bayesian Sets could pick up on it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5782</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Thu, 08 Apr 2010 16:04:14 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5782</guid>
		<description>@Daniel :-)

@Katherine: So would you say there is any kind of relationship between what you&#039;ve done, and some of the issues that Warren Greiff has explored?

&quot;A Theory of Term Weighting for Exploratory Data Analysis&quot;

http://ciir.cs.umass.edu/pubfiles/ir-122.pdf

In that paper, Greiff also sets up a ratio between generative models, i.e. p(occurrence&#124;relevant)/p(occurrence&#124;nonrelevant).  This ratio is calculated on a per-term, per-feature basis.  Those features with the highest ratio can then be selected and used to get even more &quot;relevant&quot; documents.</description>
		<content:encoded><![CDATA[<p>@Daniel <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>@Katherine: So would you say there is any kind of relationship between what you&#8217;ve done, and some of the issues that Warren Greiff has explored?</p>
<p>&#8220;A Theory of Term Weighting for Exploratory Data Analysis&#8221;</p>
<p><a href="http://ciir.cs.umass.edu/pubfiles/ir-122.pdf" rel="nofollow">http://ciir.cs.umass.edu/pubfiles/ir-122.pdf</a></p>
<p>In that paper, Greiff also sets up a ratio between generative models, i.e. p(occurrence|relevant)/p(occurrence|nonrelevant).  This ratio is calculated on a per-term, per-feature basis.  Those features with the highest ratio can then be selected and used to get even more &#8220;relevant&#8221; documents.</p>
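<p>As a rough illustration of the ratio as stated above (not the paper&#8217;s exact formulation), the sketch below estimates both probabilities from raw document frequencies. The function name and the representation of each document as a set of terms are assumptions made for the example.</p>
<pre>
```python
def term_ratios(relevant_docs, nonrelevant_docs):
    """Per-term ratio p(occurrence | relevant) / p(occurrence | nonrelevant).

    Each document is a set of terms; both lists are assumed to be non-empty.
    Probabilities are raw document frequencies, with no smoothing.
    """
    vocab = set().union(*relevant_docs, *nonrelevant_docs)
    ratios = {}
    for term in vocab:
        p_rel = sum(term in d for d in relevant_docs) / len(relevant_docs)
        p_non = sum(term in d for d in nonrelevant_docs) / len(nonrelevant_docs)
        # A term never seen in the non-relevant documents has an unbounded ratio.
        ratios[term] = p_rel / p_non if p_non else float("inf")
    return ratios
```
</pre>
<p>Sorting the terms by this ratio gives a candidate list of features to select. The unbounded ratio for terms missing from the non-relevant documents shows how unsmoothed frequency estimates can overfit on small samples.</p>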
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5775</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Thu, 08 Apr 2010 09:56:10 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5775</guid>
		<description>Yes, in that what you describe is generative probabilistic modeling in general. Generative probabilistic models are used for lots of things including language modeling, information retrieval, computer vision, computational biology, etc.

What we compute is the probability that an item and the query items were drawn from the same (versus different) probability distribution, although we don&#039;t know exactly what that distribution is (i.e., the parameters).</description>
		<content:encoded><![CDATA[<p>Yes, in that what you describe is generative probabilistic modeling in general. Generative probabilistic models are used for lots of things including language modeling, information retrieval, computer vision, computational biology, etc.</p>
<p>What we compute is the probability that an item and the query items were drawn from the same (versus different) probability distribution, although we don&#8217;t know exactly what that distribution is (i.e., the parameters).</p>
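<p>For binary features this comparison has a closed form. The following is a minimal sketch, assuming independent Bernoulli features with a shared Beta(alpha, beta) prior on each; the function name, the default hyperparameters, and the list-of-lists data layout are illustrative assumptions.</p>
<pre>
```python
import math


def bayesian_sets_log_score(item, query, alpha=1.0, beta=1.0):
    """log [ p(item, query) / (p(item) * p(query)) ] for binary feature vectors.

    Each feature is modelled as Bernoulli with its parameter integrated out
    under a Beta(alpha, beta) prior, so the ratio reduces to the posterior
    predictive of the item given the query divided by its prior predictive.
    """
    n = len(query)
    total = 0.0
    for j, x in enumerate(item):
        ones = sum(q[j] for q in query)
        a_post = alpha + ones      # Beta posterior after seeing the query
        b_post = beta + n - ones
        if x == 1:
            total += math.log(a_post / (a_post + b_post)) - math.log(alpha / (alpha + beta))
        else:
            total += math.log(b_post / (a_post + b_post)) - math.log(beta / (alpha + beta))
    return total
```
</pre>
<p>Ranking every item in the collection by this score puts items whose features agree with the query near the top. A positive value means the item is better explained as coming from the same distribution as the query than from a separate one.</p>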
]]></content:encoded>
	</item>
	<item>
		<title>By: Gene Golovchinsky</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5769</link>
		<dc:creator>Gene Golovchinsky</dc:creator>
		<pubDate>Thu, 08 Apr 2010 03:19:07 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5769</guid>
		<description>This seems to be related to language-modeling as well, in the sense that your approach computes probabilities that an item is drawn from a particular distribution. That&#039;s sort of what language models do as well, isn&#039;t it? Yes, the data structure might be different, but the basic principle seems related.</description>
		<content:encoded><![CDATA[<p>This seems to be related to language-modeling as well, in the sense that your approach computes probabilities that an item is drawn from a particular distribution. That&#8217;s sort of what language models do as well, isn&#8217;t it? Yes, the data structure might be different, but the basic principle seems related.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5766</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Wed, 07 Apr 2010 22:56:21 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5766</guid>
		<description>Jeremy: I thought Coldplay only drew hooks from Joe Satriani!

&lt;object width=&quot;480&quot; height=&quot;385&quot;&gt;&lt;param name=&quot;movie&quot; value=&quot;http://www.youtube.com/v/OEGGFJLpbu4&amp;hl=en_US&amp;fs=1&amp;&quot;&gt;&lt;/param&gt;&lt;param name=&quot;allowFullScreen&quot; value=&quot;true&quot;&gt;&lt;/param&gt;&lt;param name=&quot;allowscriptaccess&quot; value=&quot;always&quot;&gt;&lt;/param&gt;&lt;embed src=&quot;http://www.youtube.com/v/OEGGFJLpbu4&amp;hl=en_US&amp;fs=1&amp;&quot; type=&quot;application/x-shockwave-flash&quot; allowscriptaccess=&quot;always&quot; allowfullscreen=&quot;true&quot; width=&quot;480&quot; height=&quot;385&quot;&gt;&lt;/embed&gt;&lt;/object&gt;</description>
		<content:encoded><![CDATA[<p>Jeremy: I thought Coldplay only drew hooks from Joe Satriani!</p>
<p><object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/OEGGFJLpbu4&#038;hl=en_US&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/OEGGFJLpbu4&#038;hl=en_US&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5763</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Wed, 07 Apr 2010 18:03:44 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5763</guid>
		<description>Hi Jeremy,

The algorithm can&#039;t read your mind :) As I said before, the same query could be given by someone who is genuinely interested in the 80s aspect. The algorithm needs more information to be able to distinguish you from them. This information could come in a variety of ways. 1) You could provide examples of non-80s concept X music. 2) You could potentially incorporate other external information about what you like to listen to, this is similar to 1, but perhaps more implicit. 3) Incorporate relevance feedback. 4) Provide clustered results, which I talked about before. These are the alternatives that come to mind straight off...

In terms of relevance feedback, it does seem that it&#039;s a problem which is a natural fit for Bayesian Sets to address, and one that we&#039;re working on. The problem of relevance feedback is different from that of querying with examples if you think about it. When a user performs a query with examples, they select a few positive examples only from a very large collection of items. When a user performs (standard) relevance feedback, there has already been some stab at what the user wants, and a limited number of &quot;close&quot; items have been returned. From this small number of items, you&#039;re able to get the user to tell you which ones are positive examples, but not only that, the ones they haven&#039;t selected are also (likely) negative examples. You can use this &quot;close but negative&quot; information to refine your results as well. This would be good, for example, in the case that you&#039;ve been talking about with 80s and concept X songs. The downside is that the items being labelled by the user are from a very small pool which is biased by the original retrieval process, so relevance feedback makes most sense as a refinement technique.

In terms of methodology, the relevance feedback methods I tend to hear a lot about are vector space methods (which I&#039;ve already addressed). I don&#039;t know the details of the ones you&#039;re talking about, but they sound quite different than how one might naturally think of doing relevance feedback in the Bayesian Sets paradigm.</description>
		<content:encoded><![CDATA[<p>Hi Jeremy,</p>
<p>The algorithm can&#8217;t read your mind <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  As I said before, the same query could be given by someone who is genuinely interested in the 80s aspect. The algorithm needs more information to be able to distinguish you from them. This information could come in a variety of ways. 1) You could provide examples of non-80s concept X music. 2) You could potentially incorporate other external information about what you like to listen to, this is similar to 1, but perhaps more implicit. 3) Incorporate relevance feedback. 4) Provide clustered results, which I talked about before. These are the alternatives that come to mind straight off&#8230;</p>
<p>In terms of relevance feedback, it does seem that it&#8217;s a problem which is a natural fit for Bayesian Sets to address, and one that we&#8217;re working on. The problem of relevance feedback is different from that of querying with examples if you think about it. When a user performs a query with examples, they select a few positive examples only from a very large collection of items. When a user performs (standard) relevance feedback, there has already been some stab at what the user wants, and a limited number of &#8220;close&#8221; items have been returned. From this small number of items, you&#8217;re able to get the user to tell you which ones are positive examples, but not only that, the ones they haven&#8217;t selected are also (likely) negative examples. You can use this &#8220;close but negative&#8221; information to refine your results as well. This would be good, for example, in the case that you&#8217;ve been talking about with 80s and concept X songs. The downside is that the items being labelled by the user are from a very small pool which is biased by the original retrieval process, so relevance feedback makes most sense as a refinement technique.</p>
<p>In terms of methodology, the relevance feedback methods I tend to hear a lot about are vector space methods (which I&#8217;ve already addressed). I don&#8217;t know the details of the ones you&#8217;re talking about, but they sound quite different than how one might naturally think of doing relevance feedback in the Bayesian Sets paradigm.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5762</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Wed, 07 Apr 2010 15:06:11 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5762</guid>
		<description>@Daniel: Yes, you joke about Coldplay, but I did some trawling and was able to find one of their songs that I believe does borderline-fit the &quot;Concept X&quot; from which the other 6 songs are also drawn.  Here&#039;s that Coldplay song:

http://www.youtube.com/watch?v=_N9rH2x5KUw

Then again, while listening to it, I realized I&#039;d heard that hook before.  I grew up listening to Kraftwerk; check out their song &quot;Computer Love&quot;

http://www.youtube.com/watch?v=EEBPzD3MPWE

Eh?  Eh?  Same exact hook as Coldplay!  Seriously, give them both a listen!

The irony here is that not only are both this Coldplay piece AND the Kraftwerk piece drawn from this &quot;Concept X&quot; that I&#039;m trying to find more of, but the Kraftwerk piece was also released in.. you guessed it, 1981.  More 80s music.  

So now I&#039;m starting to wonder about precision vs. recall of Bayesian sets.  Katherine, your explanation was quite helpful, thank you.  But what if I start to get frustrated with all the 80s music, as I&#039;m trying to find more and more Concept X?  

Finally, one more question for Dinesh: How is this different from Relevance Feedback?  In classic relevance feedback, a (text) document is described in terms of (term and phrase) feature vectors.  The user looks at a set of n documents, and picks k &lt; n of those documents as relevant.  Then, the probability distributions of the feature vectors in those selected documents and of the feature vectors in the collection as a whole are compared, using things like Kullback-Leibler divergence or Bose-Einstein statistics.  Those features that best discriminate the current set from the rest of the collection are weighted higher, while those that are not good discriminators are weighted lower, and then those feature weightings are used to pull more similar documents from the collection.

Now, there is nothing in relevance feedback that says the document has to be text.  You could also have an image &quot;document&quot;, and your feature vector for that image could be color histograms, edges, etc.  And mechanically, you&#039;d still perform relevance feedback the same way.  

So are Bayesian Sets essentially a new way of doing relevance feedback?</description>
		<content:encoded><![CDATA[<p>@Daniel: Yes, you joke about Coldplay, but I did some trawling and was able to find one of their songs that I believe does borderline-fit the &#8220;Concept X&#8221; from which the other 6 songs are also drawn.  Here&#8217;s that Coldplay song:</p>
<p><a href="http://www.youtube.com/watch?v=_N9rH2x5KUw" rel="nofollow">http://www.youtube.com/watch?v=_N9rH2x5KUw</a></p>
<p>Then again, while listening to it, I realized I&#8217;d heard that hook before.  I grew up listening to Kraftwerk; check out their song &#8220;Computer Love&#8221;</p>
<p><a href="http://www.youtube.com/watch?v=EEBPzD3MPWE" rel="nofollow">http://www.youtube.com/watch?v=EEBPzD3MPWE</a></p>
<p>Eh?  Eh?  Same exact hook as Coldplay!  Seriously, give them both a listen!</p>
<p>The irony here is that not only are both this Coldplay piece AND the Kraftwerk piece drawn from this &#8220;Concept X&#8221; that I&#8217;m trying to find more of, but the Kraftwerk piece was also released in.. you guessed it, 1981.  More 80s music.  </p>
<p>So now I&#8217;m starting to wonder about precision vs. recall of Bayesian sets.  Katherine, your explanation was quite helpful, thank you.  But what if I start to get frustrated with all the 80s music, as I&#8217;m trying to find more and more Concept X?  </p>
<p>Finally, one more question for Dinesh: How is this different from Relevance Feedback?  In classic relevance feedback, a (text) document is described in terms of (term and phrase) feature vectors.  The user looks at a set of n documents, and picks k &lt; n of those documents as relevant.  Then, the probability distributions of the feature vectors in those selected documents and of the feature vectors in the collection as a whole are compared, using things like Kullback-Leibler divergence or Bose-Einstein statistics.  Those features that best discriminate the current set from the rest of the collection are weighted higher, while those that are not good discriminators are weighted lower, and then those feature weightings are used to pull more similar documents from the collection.</p>
<p>Now, there is nothing in relevance feedback that says the document has to be text.  You could also have an image &quot;document&quot;, and your feature vector for that image could be color histograms, edges, etc.  And mechanically, you&#039;d still perform relevance feedback the same way.  </p>
<p>So are Bayesian Sets essentially a new way of doing relevance feedback?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5761</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Wed, 07 Apr 2010 13:28:58 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5761</guid>
		<description>Ah, yes, you&#039;re right, it&#039;s the same code across media types. Thanks!! :)</description>
		<content:encoded><![CDATA[<p>Ah, yes, you&#8217;re right, it&#8217;s the same code across media types. Thanks!! <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5760</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Wed, 07 Apr 2010 13:28:44 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5760</guid>
		<description>@ daniel

We believe this is best addressed through the interactive search box, where individual items can be turned on and off.  Additionally, as Katherine alluded to earlier, we are working on an explicit relevance feedback mechanism so that the user can quickly focus on their concept of interest.</description>
		<content:encoded><![CDATA[<p>@ daniel</p>
<p>We believe this is best addressed through the interactive search box, where individual items can be turned on and off.  Additionally, as Katherine alluded to earlier, we are working on an explicit relevance feedback mechanism so that the user can quickly focus on their concept of interest.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5759</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Wed, 07 Apr 2010 13:12:25 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5759</guid>
		<description>@christopher

The image search question is a good time to briefly talk about feature vectors and feature engineering with respect to Xyggy and Bayesian Sets.  Xyggy can operate on all data types ranging from pure text to non-text and everything in between, where each item type is defined by a specific feature vector (schema).  A useful aspect is that developers are free to define the most suitable feature vectors for their application.  For example, we built two image search prototypes: in the first one, the feature vectors were obtained by processing each image with standard texture (Gabor &amp; Tamura) and color (HSV) filters to produce a 240-element feature vector.  The second prototype consists of ~700K Flickr images that were pre-processed with color histogram, color auto correlogram, edge direction histogram, wavelet texture, color moments and bag of visual words filters to deliver a feature vector with 1,134 features.  In general, a feature vector with standard texture and color features should be sufficient for consumer image search on the web, with the option to add additional features if desired.  If image textual data such as tags, labels, user comments and so on are available then these can be added to the feature vector too.

The feature vector of an item type is defined beforehand during the data processing phase and is mostly a matter of common sense.  There doesn’t need to be a huge amount of feature engineering as the relevant information simply needs to be clearly present in the feature vectors. For example, if you want to search for movies, it is useful to have information about actors if you expect your system to find movies with the same actor.  If you represent images only with texture features, it won&#039;t find images with the same colors and vice-versa.  The main advantage is that it doesn’t really hurt to have too many features, if they are at least plausibly relevant to search.  

Depending on the search application some clever and sophisticated features can also be created.  For example, feature vectors can be defined for web pages and text documents that include word count occurrences for words as well as for phrases, semantic concepts and relationships, numbers, geo details, urls, patterns, tags and annotations and so on.</description>
		<content:encoded><![CDATA[<p>@christopher</p>
<p>The image search question is a good time to briefly talk about feature vectors and feature engineering with respect to Xyggy and Bayesian Sets.  Xyggy can operate on all data types ranging from pure text to non-text and everything in between, where each item type is defined by a specific feature vector (schema).  A useful aspect is that developers are free to define the most suitable feature vectors for their application.  For example, we built two image search prototypes: in the first one, the feature vectors were obtained by processing each image with standard texture (Gabor &amp; Tamura) and color (HSV) filters to produce a 240-element feature vector.  The second prototype consists of ~700K Flickr images that were pre-processed with color histogram, color auto correlogram, edge direction histogram, wavelet texture, color moments and bag of visual words filters to deliver a feature vector with 1,134 features.  In general, a feature vector with standard texture and color features should be sufficient for consumer image search on the web, with the option to add additional features if desired.  If image textual data such as tags, labels, user comments and so on are available then these can be added to the feature vector too.</p>
<p>The feature vector of an item type is defined beforehand during the data processing phase and is mostly a matter of common sense.  There doesn’t need to be a huge amount of feature engineering as the relevant information simply needs to be clearly present in the feature vectors. For example, if you want to search for movies, it is useful to have information about actors if you expect your system to find movies with the same actor.  If you represent images only with texture features, it won&#8217;t find images with the same colors and vice-versa.  The main advantage is that it doesn’t really hurt to have too many features, if they are at least plausibly relevant to search.  </p>
<p>Depending on the search application some clever and sophisticated features can also be created.  For example, feature vectors can be defined for web pages and text documents that include word count occurrences for words as well as for phrases, semantic concepts and relationships, numbers, geo details, urls, patterns, tags and annotations and so on.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5758</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Wed, 07 Apr 2010 12:48:47 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5758</guid>
		<description>Hi Katherine,

A good overview of Bayesian Sets in a Nutshell, thank you for taking the time.

2 things:

1. By same code I was referring to Bayesian Sets using the same Bayesian Sets code across different media types, not that they and LDA are code equivalents.

2. I now understand how you are taking into account the query and how that goes well beyond a basic LDA topic model. I do believe a similar tack can be taken in LDA but with a bounded set of items (the topics) that can be evaluated, but I get how your method is likely superior in not only coverage but performance.

In this light Bayesian Sets do sound very useful. :)</description>
		<content:encoded><![CDATA[<p>Hi Katherine,</p>
<p>A good overview of Bayesian Sets in a Nutshell, thank you for taking the time.</p>
<p>2 things:</p>
<p>1. By same code I was referring to Bayesian Sets using the same Bayesian Sets code across different media types, not that they and LDA are code equivalents.</p>
<p>2. I now understand how you are taking into account the query and how that goes well beyond a basic LDA topic model. I do believe a similar tack can be taken in LDA but with a bounded set of items (the topics) that can be evaluated, but I get how your method is likely superior in not only coverage but performance.</p>
<p>In this light Bayesian Sets do sound very useful. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5756</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Wed, 07 Apr 2010 09:29:56 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5756</guid>
		<description>Hi Christopher,

Bayesian Sets is not the same code or algorithm as LDA. Let me try to explain:

In LDA &quot;topics&quot; are probability distributions over words (and are not &quot;items&quot; themselves), and the words might be our &quot;features&quot;. Documents, which are then our &quot;items&quot; (they&#039;re the objects we&#039;d like retrieved), belong to multiple &quot;topics&quot;. You could therefore think that one way to create a &quot;set&quot; is to choose a particular topic and then rank all the documents in terms of their membership to that &quot;topic&quot;.

But where is the query? LDA comes up with a fixed set of topics for the entire corpus, and a single set of assignments of documents to &quot;topics&quot;, which are responsible for explaining all of the words in the document. 

Bayesian Sets on the other hand, takes in a user query, and then finds the set which is defined by this query. In theory there are an unbounded number of queries that could be specified/ sets that can be retrieved. Bayesian Sets also doesn&#039;t care about clustering the whole corpus like LDA, only about that which is related to the query. Lastly, the concept exemplified by the query is not responsible for explaining all the features of the items, only those relevant to the query.</description>
		<content:encoded><![CDATA[<p>Hi Christopher,</p>
<p>Bayesian Sets is not the same code or algorithm as LDA. Let me try to explain:</p>
<p>In LDA &#8220;topics&#8221; are probability distributions over words (and are not &#8220;items&#8221; themselves), and the words might be our &#8220;features&#8221;. Documents, which are then our &#8220;items&#8221; (they&#8217;re the objects we&#8217;d like retrieved), belong to multiple &#8220;topics&#8221;. You could therefore think that one way to create a &#8220;set&#8221; is to choose a particular topic and then rank all the documents in terms of their membership to that &#8220;topic&#8221;.</p>
<p>But where is the query? LDA comes up with a fixed set of topics for the entire corpus, and a single set of assignments of documents to &#8220;topics&#8221;, which are responsible for explaining all of the words in the document. </p>
<p>Bayesian Sets on the other hand, takes in a user query, and then finds the set which is defined by this query. In theory there are an unbounded number of queries that could be specified/ sets that can be retrieved. Bayesian Sets also doesn&#8217;t care about clustering the whole corpus like LDA, only about that which is related to the query. Lastly, the concept exemplified by the query is not responsible for explaining all the features of the items, only those relevant to the query.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5751</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Wed, 07 Apr 2010 02:21:55 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5751</guid>
		<description>Thanks. :)</description>
		<content:encoded><![CDATA[<p>Thanks. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5750</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Wed, 07 Apr 2010 02:18:30 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5750</guid>
		<description>Fixed! Thanks for the heads up.</description>
		<content:encoded><![CDATA[<p>Fixed! Thanks for the heads up.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5749</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Wed, 07 Apr 2010 02:13:50 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5749</guid>
		<description>Daniel, your blog no longer seems to have a notify-of-new-comments feature?</description>
		<content:encoded><![CDATA[<p>Daniel, your blog no longer seems to have a notify-of-new-comments feature?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5748</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Wed, 07 Apr 2010 02:05:05 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5748</guid>
		<description>Dinesh, my concern is simply that users be able to understand--and ideally manipulate--the basis for similarity. That would address the problem of an algorithm making the wrong generalization from insufficient input.</description>
		<content:encoded><![CDATA[<p>Dinesh, my concern is simply that users be able to understand&#8211;and ideally manipulate&#8211;the basis for similarity. That would address the problem of an algorithm making the wrong generalization from insufficient input.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: renaissance chambara &#124; Ged Carroll - Links of the day</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5747</link>
		<dc:creator>renaissance chambara &#124; Ged Carroll - Links of the day</dc:creator>
		<pubDate>Tue, 06 Apr 2010 23:04:49 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5747</guid>
		<description>[...] Guest Post: Information Retrieval using a Bayesian Model of Learning and Generalization - interesting article on increasing search relevance. I know that Microsoft had done a lot of work on Bayesian mathematics for search [...]</description>
		<content:encoded><![CDATA[<p>[...] Guest Post: Information Retrieval using a Bayesian Model of Learning and Generalization &#8211; interesting article on increasing search relevance. I know that Microsoft had done a lot of work on Bayesian mathematics for search [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5746</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Tue, 06 Apr 2010 21:23:21 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5746</guid>
		<description>Hi Dinesh,

If it&#039;s about the novel algorithm then that is cool &amp; useful and I congratulate you on creating a &quot;real&quot; patent! :)

So what I seem to be gathering is I&#039;m not crazy and yes, I can do something similar using LDA in the text realm, but your method moves beyond text to allow the SAME (at the code level) algorithm to work across media types. 

That is definitely highly useful &amp; needed in a lot of verticals! I&#039;m especially excited about how it works on unlabeled images.

Oh and I&#039;m now interested. ;)</description>
		<content:encoded><![CDATA[<p>Hi Dinesh,</p>
<p>If it&#8217;s about the novel algorithm then that is cool &amp; useful and I congratulate you on creating a &#8220;real&#8221; patent! <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>So what I seem to be gathering is I&#8217;m not crazy and yes, I can do something similar using LDA in the text realm, but your method moves beyond text to allow the SAME (at the code level) algorithm to work across media types. </p>
<p>That is definitely highly useful &amp; needed in a lot of verticals! I&#8217;m especially excited about how it works on unlabeled images.</p>
<p>Oh and I&#8217;m now interested. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5745</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Tue, 06 Apr 2010 20:48:21 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5745</guid>
		<description>Christopher:

The patent is about the novel Bayesian Sets algorithm and we also agree that it is pretty cool!  

It is a new search tool and working through it you&#039;ll discover the countless new uses that are not possible with text-search.

It is easy to see how item-search with the interactive search box fits in almost naturally with touch devices such as the iPod and iPad.  Imagine Apps from media companies where items such as articles, images, music, movies, sounds and ads can be dragged in and out of the search box to find other relevant items.   

With text-search, you are essentially looking for some combination of the query text in a corpus of documents (or web pages).  With item-search, the item can be any data type, and a query finds other relevant items based on how similar they are to the query items and delivered in ranked order.</description>
		<content:encoded><![CDATA[<p>Christopher:</p>
<p>The patent is about the novel Bayesian Sets algorithm and we also agree that it is pretty cool!  </p>
<p>It is a new search tool and working through it you&#8217;ll discover the countless new uses that are not possible with text-search.</p>
<p>It is easy to see how item-search with the interactive search box fits in almost naturally with touch devices such as the iPod and iPad.  Imagine Apps from media companies where items such as articles, images, music, movies, sounds and ads can be dragged in and out of the search box to find other relevant items.   </p>
<p>With text-search, you are essentially looking for some combination of the query text in a corpus of documents (or web pages).  With item-search, the item can be any data type, and a query finds other relevant items based on how similar they are to the query items and delivered in ranked order.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5744</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Tue, 06 Apr 2010 20:34:35 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5744</guid>
		<description>Hi Katherine, 

At this time I guess I&#039;ll have to take your word for it, but are Latent Topics not essentially equivalent to items (aka  similar topics) in a &quot;set&quot;?

If I am describing this correctly I think you can see why I think using LDA to deal with &quot;topics&quot; can return a similar &quot;set&quot; as described in this post which can then use inputs to query for related &quot;topics&quot; (aka set items) across the input corpus or set.

By leveraging LDA one can (obviously) apply the topic model to different tasks but maybe I&#039;m missing something key here - and I acknowledge I may well be - I&#039;m always willing to learn, and being proven wrong is sometimes a requirement for that. :) 

Let me know but I will also re-read your paper.</description>
		<content:encoded><![CDATA[<p>Hi Katherine, </p>
<p>At this time I guess I&#8217;ll have to take your word for it, but are Latent Topics not essentially equivalent to items (aka  similar topics) in a &#8220;set&#8221;?</p>
<p>If I am describing this correctly I think you can see why I think using LDA to deal with &#8220;topics&#8221; can return a similar &#8220;set&#8221; as described in this post which can then use inputs to query for related &#8220;topics&#8221; (aka set items) across the input corpus or set.</p>
<p>By leveraging LDA one can (obviously) apply the topic model to different tasks but maybe I&#8217;m missing something key here &#8211; and I acknowledge I may well be &#8211; I&#8217;m always willing to learn, and being proven wrong is sometimes a requirement for that. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  </p>
<p>Let me know but I will also re-read your paper.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5743</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Tue, 06 Apr 2010 19:59:43 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5743</guid>
		<description>This is nothing like LDA. LDA tries to find latent &quot;topics&quot; in a corpus of documents, where it is assumed that each word is generated from a topic, and each document is generated from a mixture of topics. Bayesian Sets provides a totally different model for addressing a different problem than that which LDA addresses.</description>
		<content:encoded><![CDATA[<p>This is nothing like LDA. LDA tries to find latent &#8220;topics&#8221; in a corpus of documents, where it is assumed that each word is generated from a topic, and each document is generated from a mixture of topics. Bayesian Sets provides a totally different model for addressing a different problem than that which LDA addresses.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5741</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Tue, 06 Apr 2010 17:49:32 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5741</guid>
		<description>So this is nothing like applying LDA to a set of data then? Sure sounds like it...</description>
		<content:encoded><![CDATA[<p>So this is nothing like applying LDA to a set of data then? Sure sounds like it&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5740</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Tue, 06 Apr 2010 17:46:48 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5740</guid>
		<description>My aversion to &quot;software&quot; patents is also practical... I may be jumping the gun, as I have not looked at the patent in question, and if I am I apologize upfront, BUT it matters whether this patent covers the concept of applying vector models to sets of data for &quot;filtering&quot; (NOT a valid patent in my view) vs. the specific and novel application of an algorithmic state change (the algorithm making the change can be patented but NOT the outcome, as there are unlimited ways to do the same).

I&#039;d argue the original (there are many of the former type of patents now out there) RSA and LSI patents fall into the latter category, and if this one does too I&#039;m interested, otherwise I am not.

I&#039;ll concede Jeremy&#039;s point tho that the concept of it being interesting in general is not the right use of language by me.

Again, if the patent covers a specific / novel transform and not the other thing, ignore everything I&#039;ve said, and congratulations on some cool work. :)</description>
		<content:encoded><![CDATA[<p>My aversion to &#8220;software&#8221; patents is also practical&#8230; I may be jumping the gun, as I have not looked at the patent in question, and if I am I apologize upfront, BUT it matters whether this patent covers the concept of applying vector models to sets of data for &#8220;filtering&#8221; (NOT a valid patent in my view) vs. the specific and novel application of an algorithmic state change (the algorithm making the change can be patented but NOT the outcome, as there are unlimited ways to do the same).</p>
<p>I&#8217;d argue the original (there are many of the former type of patents now out there) RSA and LSI patents fall into the latter category, and if this one does too I&#8217;m interested, otherwise I am not.</p>
<p>I&#8217;ll concede Jeremy&#8217;s point tho that the concept of it being interesting in general is not the right use of language by me.</p>
<p>Again, if the patent covers a specific / novel transform and not the other thing, ignore everything I&#8217;ve said, and congratulations on some cool work. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5739</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Tue, 06 Apr 2010 17:34:48 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5739</guid>
		<description>Daniel:

The Bayesian Sets implementation (code) remains the same between different item-search services operating on different data types (e.g. text documents and unlabelled images).  What is different is the feature vector representation of the item type between different services.

Does this address your concern about &quot;black box similarity models&quot; or is it something else?</description>
		<content:encoded><![CDATA[<p>Daniel:</p>
<p>The Bayesian Sets implementation (code) remains the same between different item-search services operating on different data types (e.g. text documents and unlabelled images).  What is different is the feature vector representation of the item type between different services.</p>
<p>Does this address your concern about &#8220;black box similarity models&#8221; or is it something else?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Katherine Heller</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5738</link>
		<dc:creator>Katherine Heller</dc:creator>
		<pubDate>Tue, 06 Apr 2010 17:22:03 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5738</guid>
		<description>Jeremy:

That&#039;s a great question. It depends in part on the features being used of course, but in general (i.e. if these concepts are captured by the features) if the query is the top 3 songs, the algorithm will prefer the songs which are both &quot;cheesy 80s&quot; AND &quot;concept X&quot; (someone else with the exact same query might genuinely be interested in the cheesy 80s aspect, so I think this makes sense). After those songs, there will likely be a mix of cheesy 80s OR concept X. Once the query includes examples of non-cheesy 80s, concept X songs, songs that belong to concept X will then be top ranked, regardless of their cheesy 80s-ness.

One of the extensions to Bayesian Sets that we&#039;re currently working on is to detect clusters within the retrieved results. For example, if the top ranked results largely fall into either the &quot;cheesy 80s&quot; category, or the &quot;concept X&quot; category, then we should be able to cluster the results into these two categories, based on their features, and possibly use this to prompt the user to determine which category they are interested in.</description>
		<content:encoded><![CDATA[<p>Jeremy:</p>
<p>That&#8217;s a great question. It depends in part on the features being used of course, but in general (i.e. if these concepts are captured by the features) if the query is the top 3 songs, the algorithm will prefer the songs which are both &#8220;cheesy 80s&#8221; AND &#8220;concept X&#8221; (someone else with the exact same query might genuinely be interested in the cheesy 80s aspect, so I think this makes sense). After those songs, there will likely be a mix of cheesy 80s OR concept X. Once the query includes examples of non-cheesy 80s, concept X songs, songs that belong to concept X will then be top ranked, regardless of their cheesy 80s-ness.</p>
<p>One of the extensions to Bayesian Sets that we&#8217;re currently working on is to detect clusters within the retrieved results. For example, if the top ranked results largely fall into either the &#8220;cheesy 80s&#8221; category, or the &#8220;concept X&#8221; category, then we should be able to cluster the results into these two categories, based on their features, and possibly use this to prompt the user to determine which category they are interested in.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Revisiting Xyggy &#171; The Intellogist Blog</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5737</link>
		<dc:creator>Revisiting Xyggy &#171; The Intellogist Blog</dc:creator>
		<pubDate>Tue, 06 Apr 2010 17:17:52 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5737</guid>
		<description>[...] very informative and interesting article about the method behind &#8220;item search&#8221; over at The Noisy Channel if you wish to dig into the nitty gritty of Xyggy.  Feel free to dive into the comments section [...]</description>
		<content:encoded><![CDATA[<p>[...] very informative and interesting article about the method behind &#8220;item search&#8221; over at The Noisy Channel if you wish to dig into the nitty gritty of Xyggy.  Feel free to dive into the comments section [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zoubin Ghahramani</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5736</link>
		<dc:creator>Zoubin Ghahramani</dc:creator>
		<pubDate>Tue, 06 Apr 2010 17:00:16 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5736</guid>
		<description>christopher, 

it&#039;s not a mashup of Google Sets and existing vector space models (like tf-idf) at all. You can look at the paper if you&#039;re interested in how it works. The basic idea is that it builds a probabilistic model of the set of query items and scores how well new items fit into that set. We do use vectors to represent items (they are a pretty general representation), and one can relate it to vector space models, but existing vector space models don&#039;t take into account properties of a *set* of query items.

Re patents and interestingness: we hope people find it interesting and useful -- we are trying to be open about what we&#039;ve done. 

jeremy:

that&#039;s an excellent question. It really does depend on the data the algorithm has accessible to it -- in other words how the music is represented. We can represent music in many ways: low-level audio features, text in lyrics, tags given by users, attributes from the Music Genome Project, patterns of user-music preferences as used in traditional recommender systems, etc. Given just three items, and the representation of the music, Bayesian Sets will do as well as it can inferring the underlying concept. It&#039;s obviously a difficult problem and of course it won&#039;t always &quot;guess&quot; what&#039;s in your mind. But one of the nice things about Bayesian Sets is that as you give it more items and more features in the representation the method will generally work better and better.</description>
		<content:encoded><![CDATA[<p>christopher, </p>
<p>it&#8217;s not a mashup of Google Sets and existing vector space models (like tf-idf) at all. You can look at the paper if you&#8217;re interested in how it works. The basic idea is that it builds a probabilistic model of the set of query items and scores how well new items fit into that set. We do use vectors to represent items (they are a pretty general representation), and one can relate it to vector space models, but existing vector space models don&#8217;t take into account properties of a *set* of query items.</p>
<p>Re patents and interestingness: we hope people find it interesting and useful &#8212; we are trying to be open about what we&#8217;ve done. </p>
<p>jeremy:</p>
<p>that&#8217;s an excellent question. It really does depend on the data the algorithm has accessible to it &#8212; in other words how the music is represented. We can represent music in many ways: low-level audio features, text in lyrics, tags given by users, attributes from the Music Genome Project, patterns of user-music preferences as used in traditional recommender systems, etc. Given just three items, and the representation of the music, Bayesian Sets will do as well as it can inferring the underlying concept. It&#8217;s obviously a difficult problem and of course it won&#8217;t always &#8220;guess&#8221; what&#8217;s in your mind. But one of the nice things about Bayesian Sets is that as you give it more items and more features in the representation the method will generally work better and better.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Daniel Tunkelang</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5735</link>
		<dc:creator>Daniel Tunkelang</dc:creator>
		<pubDate>Tue, 06 Apr 2010 15:24:44 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5735</guid>
		<description>Jeremy: may I suggest &lt;a href=&quot;http://thenoisychannel.com/2009/02/24/how-recommendation-engines-quash-diversity/&quot; rel=&quot;nofollow&quot;&gt;Coldplay&lt;/a&gt;? :-) Seriously, I share your concern about black-box similarity models. Curious to hear Dinesh&#039;s response.

Re patents and interestingness: I think most readers know that I&#039;m not thrilled with the state of software patents in the US. My qualms are practical rather than philosophical. Regardless, I agree with Jeremy that an approach being patented (or, as per the post, patent-pending) does not make it less interesting. Latent semantic indexing and RSA were (and still are) both pretty interesting!</description>
		<content:encoded><![CDATA[<p>Jeremy: may I suggest <a href="http://thenoisychannel.com/2009/02/24/how-recommendation-engines-quash-diversity/" rel="nofollow">Coldplay</a>? <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  Seriously, I share your concern about black-box similarity models. Curious to hear Dinesh&#8217;s response.</p>
<p>Re patents and interestingness: I think most readers know that I&#8217;m not thrilled with the state of software patents in the US. My qualms are practical rather than philosophical. Regardless, I agree with Jeremy that an approach being patented (or, as per the post, patent-pending) does not make it less interesting. Latent semantic indexing and RSA were (and still are) both pretty interesting!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5734</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 06 Apr 2010 14:52:13 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5734</guid>
		<description>christopher,

I don&#039;t buy your argument about &quot;being patented&quot; == &quot;limits interestingness&quot;

PageRank is patented.  Did that also limit its interestingness?

(http://en.wikipedia.org/wiki/PageRank)</description>
		<content:encoded><![CDATA[<p>christopher,</p>
<p>I don&#8217;t buy your argument about &#8220;being patented&#8221; == &#8220;limits interestingness&#8221;</p>
<p>PageRank is patented.  Did that also limit its interestingness?</p>
<p>(<a href="http://en.wikipedia.org/wiki/PageRank" rel="nofollow">http://en.wikipedia.org/wiki/PageRank</a>)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: christopher</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5728</link>
		<dc:creator>christopher</dc:creator>
		<pubDate>Tue, 06 Apr 2010 06:09:02 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5728</guid>
		<description>Sounds like a mashup of Google Sets and existing vector models. In fact, beyond saying it&#039;s based on a human cognitive model, it sounds very similar to existing vector models. Am I incorrect?

Too bad it&#039;s going to be patented, because in practice that limits how interesting it can actually be.</description>
		<content:encoded><![CDATA[<p>Sounds like a mashup of Google Sets and existing vector models. In fact, beyond saying it&#8217;s based on a human cognitive model, it sounds very similar to existing vector models. Am I incorrect?</p>
<p>Too bad it&#8217;s going to be patented, because in practice that limits how interesting it can actually be.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: jeremy</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5727</link>
		<dc:creator>jeremy</dc:creator>
		<pubDate>Tue, 06 Apr 2010 06:07:39 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5727</guid>
		<description>I like that you&#039;ve got this framework for increased interactivity in the retrieval process, and I like the real-time, interactive updating of the results in your patent search example.

I wonder, how well would this approach work with the music example that I gave you in London a few years ago, where any given song (or even set of songs) means dozens of different things?  For example, if I gave you these three examples...

http://www.youtube.com/watch?v=HzZ_urpj4As

http://www.youtube.com/watch?v=0-Q3cp3cp88

http://www.youtube.com/watch?v=o7aShcmEksw

You might conclude that I wanted more cheesy 80s mainstream pop music.  

But what if what I was really after were these (types of) songs?

http://www.youtube.com/watch?v=iZRA-Dwv86E

http://www.youtube.com/watch?v=4uLrbodN-9A

http://www.youtube.com/watch?v=DaMYSvy3ths

There is a concept or theme tying all six of these songs together.  But figuring out what that is might be very difficult, as the first three examples are actually consistent with more than one model (i.e. the &quot;Cheesy 80s&quot; model as well as the &quot;Concept X&quot; model).  Can Bayesian Sets handle things like this?</description>
		<content:encoded><![CDATA[<p>I like that you&#8217;ve got this framework for increased interactivity in the retrieval process, and I like the real-time, interactive updating of the results in your patent search example.</p>
<p>I wonder, how well would this approach work with the music example that I gave you in London a few years ago, where any given song (or even set of songs) means dozens of different things?  For example, if I gave you these three examples&#8230;</p>
<p><a href="http://www.youtube.com/watch?v=HzZ_urpj4As" rel="nofollow">http://www.youtube.com/watch?v=HzZ_urpj4As</a></p>
<p><a href="http://www.youtube.com/watch?v=0-Q3cp3cp88" rel="nofollow">http://www.youtube.com/watch?v=0-Q3cp3cp88</a></p>
<p><a href="http://www.youtube.com/watch?v=o7aShcmEksw" rel="nofollow">http://www.youtube.com/watch?v=o7aShcmEksw</a></p>
<p>You might conclude that I wanted more cheesy 80s mainstream pop music.  </p>
<p>But what if what I was really after were these (types of) songs?</p>
<p><a href="http://www.youtube.com/watch?v=iZRA-Dwv86E" rel="nofollow">http://www.youtube.com/watch?v=iZRA-Dwv86E</a></p>
<p><a href="http://www.youtube.com/watch?v=4uLrbodN-9A" rel="nofollow">http://www.youtube.com/watch?v=4uLrbodN-9A</a></p>
<p><a href="http://www.youtube.com/watch?v=DaMYSvy3ths" rel="nofollow">http://www.youtube.com/watch?v=DaMYSvy3ths</a></p>
<p>There is a concept or theme tying all six of these songs together.  But figuring out what that is might be very difficult, as the first three examples are actually consistent with more than one model (i.e. the &#8220;Cheesy 80s&#8221; model as well as the &#8220;Concept X&#8221; model).  Can Bayesian Sets handle things like this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dinesh Vadhia</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5715</link>
		<dc:creator>Dinesh Vadhia</dc:creator>
		<pubDate>Mon, 05 Apr 2010 18:55:14 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5715</guid>
		<description>Google Sets (GS) was part of our inspiration for Bayesian Sets and our Bayesian Sets paper discusses this.  GS addresses the same kind of &quot;query by examples&quot; problem, but specifically for retrieval of list items from a limited amount of list data off the web.  

Unlike GS, Bayesian Sets is applicable to any kind of data you want to perform retrieval on, and is not tied to a particular list item data set.  The way retrieval is done in GS, and also the results, are therefore very different from Bayesian Sets.  Also, unlike GS, as we&#039;ve discussed in the post, Bayesian Sets is modeled after our understanding of human learning and generalization.</description>
		<content:encoded><![CDATA[<p>Google Sets (GS) was part of our inspiration for Bayesian Sets and our Bayesian Sets paper discusses this.  GS addresses the same kind of &#8220;query by examples&#8221; problem, but specifically for retrieval of list items from a limited amount of list data off the web.  </p>
<p>Unlike GS, Bayesian Sets is applicable to any kind of data you want to perform retrieval on, and is not tied to a particular list item data set.  The way retrieval is done in GS, and also the results, are therefore very different from Bayesian Sets.  Also, unlike GS, as we&#8217;ve discussed in the post, Bayesian Sets is modeled after our understanding of human learning and generalization.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: John Tantalo</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/comment-page-1/#comment-5714</link>
		<dc:creator>John Tantalo</dc:creator>
		<pubDate>Mon, 05 Apr 2010 15:56:58 +0000</pubDate>
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061#comment-5714</guid>
		<description>Isn&#039;t this the same as Google Sets (2002)?</description>
		<content:encoded><![CDATA[<p>Isn&#8217;t this the same as Google Sets (2002)?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

