<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Noisy Channel</title>
	<atom:link href="http://thenoisychannel.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com</link>
	<description></description>
	<lastBuildDate>Fri, 27 Aug 2010 12:18:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>HCIR 2010: Bigger and Better than Ever!</title>
		<link>http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/</link>
		<comments>http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 05:24:13 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3275</guid>
		<description><![CDATA[Last Sunday was HCIR 2010, the Fourth Annual Workshop on Human-Computer Interaction and Information Retrieval, held at Rutgers University in New Brunswick, collocated with the Information Interaction in Context Symposium (IIiX 2010). With 70 registered attendees, it was the biggest HCIR workshop we have held. Rutgers was a gracious host, providing space not only for [...]]]></description>
			<content:encoded><![CDATA[<p>Last Sunday was <a href="http://www.hcir2010.org/">HCIR 2010</a>, the Fourth Annual Workshop on Human-Computer Interaction and Information Retrieval, held at Rutgers University in New Brunswick, collocated with the Information Interaction in Context Symposium (<a href="http://www.iiix2010.org/">IIiX 2010</a>).</p>
<p>With 70 registered attendees, it was the biggest HCIR workshop we have held. Rutgers was a gracious host, providing space not only for the all-day workshop but also for a welcome reception the night before.</p>
<p>And, based on an informal survey of participants, I can say with some semblance of objectivity that this was the best HCIR workshop to date.</p>
<p>The opening &#8220;poster boaster&#8221; session was particularly energetic. There was no award for best boaster, but <a href="http://www.bobcatsss2008.org/programme/speakers/136.en.html">Cathal Hoare</a> won an ovation by delivering his boaster as a poem:</p>
<blockquote>
<div id="_mcePaste">If a picture is worth a thousand words</div>
<div id="_mcePaste">Surely to query formulation a photo affords</div>
<div id="_mcePaste">The ability to ask ‘what is that’ in ways that are many</div>
<div id="_mcePaste">But for years we have asked how can-we</div>
<div id="_mcePaste">Narrow the search space so that in reasonable time</div>
<div id="_mcePaste">We can use images to answer questions that are yours and mine</div>
<div id="_mcePaste">In my humble poster I will describe</div>
<div id="_mcePaste">How recent technology and users prescribe</div>
<div id="_mcePaste">A solution that allows me to point and click</div>
<div id="_mcePaste">And get answers so that I don’t feel so thick</div>
<div id="_mcePaste">About my location and my environment</div>
<div id="_mcePaste">And to my touristic explorations bring some enjoyment</div>
<div id="_mcePaste">Now if after all that you feel rather dazed</div>
<div id="_mcePaste">Please come by my poster and see if you are amazed&#8230;.</div>
</blockquote>
<p>As in past years, we enlisted a rock-star keynote speaker&#8211;this time, Google UX researcher <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a>. His slides hardly do justice to his talk&#8211;especially without the audio and video&#8211;but I&#8217;ve embedded them here so that you can get a flavor for his presentation on how we need to do more to improve the searcher.</p>
<p><object id="__sse5065727" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir-keynote-talk-russell-aug-22-2010-100827000301-phpapp01&amp;stripped_title=dan-russell-search-quality-and-user-happiness" /><param name="name" value="__sse5065727" /><param name="allowfullscreen" value="true" /><embed id="__sse5065727" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir-keynote-talk-russell-aug-22-2010-100827000301-phpapp01&amp;stripped_title=dan-russell-search-quality-and-user-happiness" name="__sse5065727" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>We accepted six papers for the presentation sessions&#8211;sadly, one of the presenters could not make it because of visa issues. The five presentations covered a variety of topics relating to tools, models, and evaluation for HCIR. The most intriguing of these (to me, at least) was a presentation by <a href="http://www.cs.swan.ac.uk/~csmax/">Max Wilson</a> about &#8220;casual-leisure searching&#8221;&#8211;which he argues breaks our current models of exploratory search. Check out the slides below, as well as Erica Naone&#8217;s article in <em>Technology Review</em> on &#8220;<a href="http://www.technologyreview.com/communications/26135/">Searching for Fun</a>&#8221;.</p>
<p><object id="__sse5045602" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir2010pres-100824083643-phpapp02&amp;stripped_title=hcir2010-casualleisure-search" /><param name="name" value="__sse5045602" /><param name="allowfullscreen" value="true" /><embed id="__sse5045602" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir2010pres-100824083643-phpapp02&amp;stripped_title=hcir2010-casualleisure-search" name="__sse5045602" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="padding: 5px 0 12px;">As always, the poster session was the most interactive. Part of the energy came from <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> participants showing off their systems in advance of the final session that would decide which of them would win. In any case, I felt like a heel having to walk through the hall of posters three times in order to herd people back to their seats.</div>
<div style="padding: 5px 0 12px;">Which brings us to the Challenge. When I first suggested the idea of a competition or challenge to my co-organizers back in February, I wasn&#8217;t sure we could pull it off. Indeed,  even after we managed to obtain the use of the <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">New York Times Annotated Corpus</a> (thank you, <a href="http://www.ldc.upenn.edu/">LDC</a>!) and a volunteer to set up a baseline system in <a href="http://lucene.apache.org/solr/">Solr</a> (thank you, <a href="http://tommy.chheng.com/">Tommy</a>!), I still worried that we&#8217;d have a party and no one would come. So I was delighted to see six very credible entries competing for the &#8220;people&#8217;s choice&#8221; award.</div>
<div>
<p>All of the participants offered interesting ideas: custom facets, visualization of the associations between relevant terms, multi-document summarization to catch up on a topic, and combining topic modeling with sentiment analysis to analyze competing perspectives on a controversial issue. The winning entry, presented by Michael Matthews of Yahoo! Labs Barcelona, was the <a href="http://fbmya01.barcelonamedia.org:8080/future/">Time Explorer</a>. As its name suggests, it allows users to see the evolution of a topic over time. A cool feature is that it parses absolute and relative dates from article text&#8211;in some cases references to past or future times outside the publication span of the collection. Moreover, the temporal visualization of topics allows users to discover unexpected relationships between entities at particular points in time, e.g., between <a href="http://fbmya01.barcelonamedia.org:8080/future/results.jsp?query=yugoslavia&amp;s=0&amp;rc=10&amp;facet.filter=per:Slobodan%20Milosevic&amp;facet.filter=per:Saddam%20Hussein">Slobodan Milosevic and Saddam Hussein</a>. You can read more about it in Tom Simonite&#8217;s <em>Technology Review</em> article, &#8220;<a href="http://www.technologyreview.com/computing/26113/">A Search Service that Can Peer into the Future</a>&#8221;.</p>
<p>In short, HCIR 2010 will be a tough act to follow. But we&#8217;re already working on it. Watch this space&#8230;</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Exploring Nuggetize</title>
		<link>http://thenoisychannel.com/2010/08/15/exploring-nuggetize/</link>
		<comments>http://thenoisychannel.com/2010/08/15/exploring-nuggetize/#comments</comments>
		<pubDate>Sun, 15 Aug 2010 23:12:36 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3261</guid>
		<description><![CDATA[I&#8217;ve been exchanging emails with Dhiti co-founder Bharath Mohan about Nuggetize, an intriguing interface that surfaces &#8220;nuggets&#8221; from a site to reduce the user&#8217;s cost of exploring a document collection. Specifically Nuggetize targets research scenarios where users are likely to assemble a substantial reading list before diving into it. You can try Nuggetize on the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3265" title="The Noisy Channel - Nuggetized" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/08/The-Noisy-Channel-Nuggetized.png" alt="" width="500" height="280" /></p>
<p>I&#8217;ve been exchanging emails with <a href="http://www.dhiti.com/">Dhiti</a> co-founder <a href="http://in.linkedin.com/in/bharathkumarmohan">Bharath Mohan</a> about <a href="http://www.nuggetize.com/">Nuggetize</a>, an intriguing interface that surfaces &#8220;nuggets&#8221; from a site to reduce the user&#8217;s cost of exploring a document collection. Specifically, Nuggetize targets research scenarios where users are likely to assemble a substantial reading list before diving into it. You can try Nuggetize on the general web or on a particular site that has been &#8220;nuggetized&#8221;, e.g., a blog like <a href="http://nuggetize.com/thenoisychannel">this one</a> or <a href="http://nuggetize.com/cdixon-org">Chris Dixon&#8217;s</a>.</p>
<p>I&#8217;m always happy to see people building systems that explicitly support <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> (and am looking forward to seeing the <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> entries in a week!). Regular readers may recall my coverage of <a href="http://thenoisychannel.com/?s=cuil">Cuil</a>, <a href="http://thenoisychannel.com/?s=kosmix">Kosmix</a>, and <a href="http://thenoisychannel.com/?s=duck+duck+go">Duck Duck Go</a>. And of course I helped build a few of my own at <a href="http://endeca.com/">Endeca</a>. So what&#8217;s special about Nuggetize?</p>
<p>Mohan <a href="http://bharathruminates.blogspot.com/2010/05/nuggetize-faceted-search-for-web.html">describes</a> it as a <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> interface for the web. I&#8217;ll quibble here&#8211;the interface offers grouped refinement options, but the groups don&#8217;t really strike me as <a href="http://en.wikipedia.org/wiki/Faceted_classification">facets</a>. Moreover, the interface isn&#8217;t really designed to explore intersections of the refinement options&#8211;rather, at any given time, you see the intersection of the initial search and a currently selected refinement. But it is certainly an interface that supports query refinement and exploration.</p>
<p>The more interesting features are the nuggets and the support for <a href="http://en.wikipedia.org/wiki/Relevance_feedback">relevance feedback</a>.</p>
<p>The nuggets are full sentences, and thus feel quite different from conventional search-engine snippets. Conventional snippets serve primarily to provide <a href="http://en.wikipedia.org/wiki/Information_foraging#Information_scent">information scent</a>, helping users quickly determine the utility of a search result without the cost of clicking through to it and reading it. In contrast, the nuggets are document fragments that are sufficiently self-contained to communicate a coherent thought. The experience suggests passage retrieval rather than document retrieval.</p>
<p>The relevance feedback is explicit: users can thumbs-up or thumbs-down results. After supplying feedback, users can refresh their results (which re-ranks them) and are also presented with suggested categories to use for feedback (both positive and negative). Unfortunately, the research on relevance feedback tells us that, helpful as it could be for improving the user experience, users don&#8217;t bite. But perhaps users in research scenarios will give it a chance&#8211;especially with the added expressiveness and transparency of combining document and category feedback.</p>
<p>Overall it is a slick interface, and it&#8217;s nice seeing the various ideas Mohan and his colleagues put together. There&#8217;s certainly room for improvement&#8211;particularly in the quality of the categories, which sometimes feel like victims of <a href="http://en.wikipedia.org/wiki/Polysemy">polysemy</a>. Open-domain information extraction is hard! Some would even call it a <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">grand challenge</a>.</p>
<p>Mohan reads this blog (he reached out to me a few months ago via a <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/#comment-5977">comment</a>), and I&#8217;m sure he&#8217;d be happy to answer questions here.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/15/exploring-nuggetize/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Taking Blekko out for a Spin</title>
		<link>http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/</link>
		<comments>http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/#comments</comments>
		<pubDate>Sat, 07 Aug 2010 02:57:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3254</guid>
		<description><![CDATA[If you&#8217;re a search engine junkie like me, you&#8217;ve probably heard about Blekko, a search engine that has been percolating for over two years and recently launched a private beta. If not, I encourage you to watch the TechCrunch video I&#8217;ve embedded above. You can join the beta by following them on Twitter. I did [...]]]></description>
			<content:encoded><![CDATA[<p><script src="http://player.ooyala.com/player.js?embedCode=90cmtrMTom9vae2YoUwJrngW3UCgI2Zu&amp;deepLinkEmbedCode=90cmtrMTom9vae2YoUwJrngW3UCgI2Zu"></script></p>
<p>If you&#8217;re a search engine junkie like me, you&#8217;ve probably heard about <a href="http://blekko.com/">Blekko</a>, a search engine that has been percolating for <a href="http://blekko.com/">over two years</a> and recently <a href="http://searchengineland.com/blekko-a-new-search-engine-that-lets-you-spin-the-web-47215">launched</a> a private beta. If not, I encourage you to watch the TechCrunch video I&#8217;ve embedded above. You can join the beta by following them <a href="http://www.twitter.com/blekko">on Twitter</a>. I did that earlier this week, and my invitation arrived via a direct message the next day.</p>
<p>Blekko&#8217;s main differentiating feature is that it supports &#8220;slashtags&#8221;. These aren&#8217;t the same as the <a href="http://en.wikipedia.org/wiki/Slashtag">Twitter microsyntax</a> proposed by <a href="http://factoryjoe.com/blog/2009/11/08/slashtags/">Chris Messina</a> and named by <a href="http://unthinkingly.com/2009/11/09/slashtags-for-citizen-editors/">Chris Blow</a>. Rather, they are a way for users to &#8220;spin&#8221; their search results using a variety of filters. For example, [climate /liberal] and [climate /conservative] return very different results, because they are restricted to different sets of sites.</p>
<p>In addition to providing a set of curated slashtags, Blekko allows users to define their own slashtags by specifying the sets of sites to be included. There&#8217;s a social aspect here too: you can use (and follow) other users&#8217; slashtags. Blekko also has some special slashtags that don&#8217;t act as site filters, e.g., /date shows recent results and /seo offers indexing information about web sites.</p>
<p>Blekko emphasizes two characteristics that I find very appealing: transparency and user control. While they do not disclose their relevance ranking algorithm, they do expose some of the information they use to compute it. More significantly, their emphasis on slashtags de-emphasizes default ranking and instead encourages users to take more responsibility in the information seeking process. Very <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>!</p>
<p>I like the concept. But I&#8217;m not sure how I feel about the execution. I have three main concerns.</p>
<p>First, the set of slashtags is somewhat haphazard&#8211;to be expected in a beta, but I&#8217;m not sure how it will evolve. I&#8217;d love to see a vocabulary collectively (and transparently) curated like Wikipedia, but I fear it will look more like social tagging site <a href="http://delicious.com/">Delicious</a>, which is a case study in the &#8220;<a href="http://furnas.people.si.umich.edu/Papers/vocab.paper.pdf">vocabulary problem</a>&#8221;. As any information scientist can tell you, managing vocabularies is hard!</p>
<p>Second, I&#8217;m not sure if site filters are the right model. What happens to sites with heterogeneous content? Or to sites that have one-hit wonders and therefore are unlikely to show up in any slashtags? I&#8217;d prefer to see the sites used as seeds to train classifiers that could then be applied to the entire index. Something a bit more like what <a href="http://people.lis.illinois.edu/~mefron/">Miles Efron</a> implemented in <a href="http://people.lis.illinois.edu/~mefron/papers/efron-libmedia.pdf">this research</a>&#8211;only on a much larger scale and applied at a page rather than site level.</p>
<p>Third, I think there&#8217;s a third ingredient that is essential to complement transparency and user control: guidance. As a user, I need to know what slashtags would lead me to interesting results, and ideally I&#8217;d want some kind of preview to make exploration as low-cost as possible.</p>
<p>I know I&#8217;m asking for a lot&#8211;especially from an ambitious startup that has just launched its private beta. But I think the stakes are high in this space, and going easy on a newcomer is no favor. I offer the tough love of a critic who would really like to see this kind of vision succeed.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>HCIR 2010 Accepted Papers</title>
		<link>http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/</link>
		<comments>http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 01:55:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3246</guid>
		<description><![CDATA[The 4th Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2010) is coming up on August 22 in New Brunswick, NJ, taking place immediately after the Information Interaction in Context conference (IIiX 2010). That&#8217;s just a few weeks away! If you are interested in attending and haven&#8217;t already registered, please let me know as [...]]]></description>
			<content:encoded><![CDATA[<p>The 4th Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://www.hcir2010.org/">HCIR 2010</a>) is coming up on August 22 in New Brunswick, NJ, taking place immediately after the Information Interaction in Context conference (<a href="http://www.iiix2010.org/">IIiX 2010</a>). That&#8217;s just a few weeks away!</p>
<p>If you are interested in attending and haven&#8217;t already registered, please let me know as soon as possible via <a href="mailto:dtunkelang@gmail.com">email</a> or <a href="http://twitter.com/dtunkelang">Twitter</a> (speaking of which, follow the <a href="http://twitter.com/#search?q=%23hcir10">#hcir2010</a> hash tag). We&#8217;re making the remaining slots available to the community on a first-come, first-served basis.</p>
<p>Google user experience researcher <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a> will be delivering this year&#8217;s keynote on &#8220;<a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/keynote.html">Why is search sometimes easy and sometimes hard? Understanding serendipity and expertise in the mind of the searcher</a>&#8221;.</p>
<p>Here is the list of <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/presentations.html">accepted papers</a>:</p>
<p>Oral Presentations</p>
<ul>
<li>VISTO: for Web Information Gathering and Organization<br />
<em>Anwar Alhenshiri, Carolyn Watters, and Michael Shepherd (Dalhousie University)</em></li>
<li>Time-based Exploration of News Archives<br />
<em>Omar Alonso (Microsoft Corporation), </em><em>Klaus Berberich (Max-Planck Institute for Informatics), </em><em>Srikanta Bedathur (Max-Planck Institute for Informatics), and </em><em>Gerhard Weikum (Max-Planck Institute for Informatics)</em></li>
<li>Combining Computational Analyses and Interactive Visualization to Enhance Information Retrieval<br />
<em>Carsten Goerg, Jaeyeon Kihm, Jaegul Choo, Zhicheng Liu, Sivasailam Muthiah, Haesun Park, and John Stasko (Georgia Institute of Technology)</em></li>
<li>Impact of Retrieval Precision on Perceived Difficulty and Other User Measures<br />
<em>Mark Smucker and Chandra Prakash Jethani (University of Waterloo)</em></li>
<li>Exploratory Searching As Conceptual Exploration<br />
<em>Pertti Vakkari (University of Tampere)</em></li>
<li>Casual-leisure Searching: The Exploratory Search Scenarios that Break our Current Models<br />
<em>Max L. Wilson (Swansea University) and David Elsweiler (University of Erlangen)</em></li>
</ul>
<p><a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> Reports</p>
<ul>
<li>Search for Journalists: New York Times Challenge Report<br />
<em>Corrado Boscarino, Arjen P. de Vries, and Wouter Alink </em><em>(Centrum Wiskunde and Informatica)</em></li>
<li>Exploring the New York Times Corpus with NewsClub<br />
<em>Christian Kohlschütter (Leibniz Universität Hannover)</em></li>
<li>Searching Through Time in the New York Times<br />
<em>Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)</em></li>
<li>News Sync: Three Reasons to Visualize News Better<br />
<em>V.G. Vinod Vydiswaran (University of Illinois), </em><em>Jeroen van den Eijkhof (University of Washington), </em><em>Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research)</em></li>
<li>Custom Dimensions for Text Corpus Navigation<br />
<em>Vladimir Zelevinsky (Endeca Technologies)</em></li>
<li>A Retrieval System Based on Sentiment Analysis<br />
<em>Wei Zheng and Hui Fang (University of Delaware)</em></li>
</ul>
<p>Research Posters</p>
<ul>
<li>Improving Web Search for Information Gathering: Visualization in Effect<br />
<em>Anwar Alhenshiri, Carolyn Watters, and Michael Shepherd (Dalhousie University)</em></li>
<li>User-oriented and Eye-Tracking-based Evaluation of an Interactive Search System<br />
<em>Thomas Beckers and Norbert Fuhr (University of Duisburg-Essen)</em></li>
<li>Exploring Combinations of Sources for Interaction Features for Document Re-ranking<br />
<em>Emanuele Di Buccio (University of Padua), Massimo Melucci (University of Padua), and Dawei Song (The Robert Gordon University)</em></li>
<li>Extracting Expertise to Facilitate Exploratory Search and Information Discovery: Combining Information Retrieval Techniques with a Computational Cognitive Model<br />
<em>Wai-Tat Fu and Wei Dong (University of Illinois at Urbana-Champaign)</em></li>
<li>An Architecture for Real-time Textual Query Term Extraction from Images<br />
<em>Cathal Hoare and Humphrey Sorensen (University College Cork)</em></li>
<li>Transaction Log Analysis of User Actions in a Faceted Library Catalog Interface<br />
<em>Bill Kules (The Catholic University of America), </em><em>Robert Capra (University of North Carolina at Chapel Hill), and </em><em>Joseph Ryan (North Carolina State University Libraries)</em></li>
<li>Context in Health Information Retrieval: What and Where<br />
<em>Carla Lopes and Cristina Ribeiro (University of Porto)</em></li>
<li>Tactics for Information Search in a Public and an Academic Library Catalog with Faceted Interfaces<br />
<em>Xi Niu and Bradley M. Hemminger (University of North Carolina at Chapel Hill)</em></li>
</ul>
<p>Position Papers</p>
<ul>
<li>Understanding Information Seeking in the Patent Domain and its Impact on the Interface Design of IR Systems<br />
<em>Daniela Becks, Matthias Görtz, and </em><em>Christa Womser-Hacker (University of Hildesheim)</em></li>
<li>Better Search Applications Through Domain Specific Context Descriptions<br />
<em>Corrado Boscarino, Arjen P. de Vries, and Jacco van Ossenbruggen </em><em>(Centrum Wiskunde and Informatica)</em></li>
<li>Layered, Adaptive Results: Interaction Concepts for Large, Heterogeneous Data Sets<br />
<em>Duane Degler (Design for Context)</em></li>
<li>Revisiting Exploratory Search from the HCI Perspective<br />
<em>Abdigani Diriye (University College London), Max L. Wilson (Swansea University), </em><em>Ann Blandford (University College London), and </em><em>Anastasios Tombros (Queen Mary University London)</em></li>
<li>Supporting Task with Information Appliances: Taxonomy of Needs<br />
<em>Sarah Gilbert, Lori McCay-Peet, and Elaine Toms (Dalhousie University)</em></li>
<li>A Proposal for Measuring and Implementing Group’s Affective Relevance in Collaborative Information Seeking<br />
<em>Roberto González-Ibáñez and Chirag Shah (Rutgers University)</em></li>
<li>Evaluation of Music Information Retrieval: Towards a User-Centered Approach<br />
<em>Xiao Hu (University of Illinois at Urbana Champaign) and </em><em>Jingjing Liu (Rutgers University)</em></li>
<li>Information Derivatives: A New Way to Examine Information Propagation<br />
<em>Chirag Shah (Rutgers University)</em></li>
<li>Implicit Factors in Networked Information Feeds<br />
<em>Fred Stutzman (University of North Carolina at Chapel Hill)</em></li>
<li>Improving the Online News Experience<br />
<em>V. G. Vinod Vydiswaran (University of Illinois) and </em><em>Raman Chandrasekar (Microsoft Research)</em></li>
<li>Breaking Down the Assumptions of Faceted Search<br />
<em>Vladimir Zelevinsky (Endeca Technologies)</em></li>
<li>A Survey of User Interfaces in Content-based Image Search Engines on the Web<br />
<em>Danyang Zhang (The City University of New York)</em></li>
</ul>
<p>You can also download the full proceedings <a href="http://www.hcir2010.org/docs/HCIR2010Proceedings.pdf">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Overcoming Spammers in Twitter</title>
		<link>http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/</link>
		<comments>http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 03:43:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3243</guid>
		<description><![CDATA[As I blogged a few months ago, University of Oviedo professor Daniel Gayo-Avello published a research paper entitled “Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms”, in which he concluded that TunkRank was the best of the measures he studied for ranking Twitter users. I recently discovered that he and David Brenes posted slides from [...]]]></description>
			<content:encoded><![CDATA[<p><object id="__sse4504913" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=ceri2010-gayobrenes-imagenes-100615061415-phpapp02&amp;stripped_title=overcoming-spammers-in-twitter-a-tale-of-five-algorithms" /><param name="name" value="__sse4504913" /><param name="allowfullscreen" value="true" /><embed id="__sse4504913" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=ceri2010-gayobrenes-imagenes-100615061415-phpapp02&amp;stripped_title=overcoming-spammers-in-twitter-a-tale-of-five-algorithms" name="__sse4504913" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div id="__ss_4504913" style="width: 425px;">
<p>As I blogged <a href="http://thenoisychannel.com/2010/04/07/go-tunkrank/">a few months ago</a>, University of Oviedo professor <a href="http://www.di.uniovi.es/~dani/">Daniel Gayo-Avello</a> published a research paper entitled “<a href="http://arxiv.org/abs/1004.0816">Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms</a>”, in which he concluded that <a href="http://tunkrank.com/">TunkRank</a> was the best of the measures he studied for ranking Twitter users. I recently discovered that he and <a href="http://es.linkedin.com/in/brenes">David Brenes</a> posted slides from their presentation at <a href="http://ir.ii.uam.es/ceri2010/">CERI 2010</a> on &#8220;Overcoming Spammers in Twitter&#8221;. Enjoy!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Questions. But Why?</title>
		<link>http://thenoisychannel.com/2010/08/01/questions-but-why/</link>
		<comments>http://thenoisychannel.com/2010/08/01/questions-but-why/#comments</comments>
		<pubDate>Sun, 01 Aug 2010 18:41:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3231</guid>
		<description><![CDATA[Yahoo! Answers and Answers.com have been around since 2005. But community question answering (as distinct from question answering using natural language processing) has witnessed a resurgence of popularity&#8211;at least in the blogosphere and among investors. Quora and Hunch are two of the hottest startups on the web, and Aardvark was acquired by Google earlier this year. Most recently, Ask.com [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://answers.yahoo.com/">Yahoo! Answers</a> and <a href="http://www.answers.com/">Answers.com</a> have been around since 2005. But community question answering (as distinct from <a href="http://en.wikipedia.org/wiki/Question_answering">question answering using natural language processing</a>) has witnessed a resurgence of popularity&#8211;at least in the blogosphere and among investors. <a href="http://www.quora.com/">Quora</a> and <a href="http://hunch.com/">Hunch</a> are two of the hottest startups on the web, and <a href="http://vark.com/">Aardvark</a> was acquired by Google earlier this year. Most recently, <a href="http://www.ask.com/">Ask.com</a> relaunched with a return to its question-answering roots and Facebook began rolling out <a href="http://blog.facebook.com/blog.php?post=411795942130">Facebook Questions</a>.</p>
<p>So there&#8217;s no question that community question answering is hot. The question is why? In particular, is community question answering a step forward or backward relative to today&#8217;s search engines, or is it something different?</p>
<p>Regarding Facebook Questions, Jason Kincaid writes in <a href="http://techcrunch.com/2010/07/28/facebook-qa-service-questions-begins-rolling-out-could-be-massive/">TechCrunch</a>:</p>
<blockquote><p>Given its size, it won’t take long for Facebook to build up a massive amount of data — if that data is consistently reliable, Questions could turn into a viable alternative to Google for many queries.</p></blockquote>
<p>That&#8217;s a big if.  But I think the bigger caveat is the vague quantifier &#8220;many&#8221;. The success of community question answering services will depend on how these services position themselves relative to users&#8217; information needs. Anyone arguing that these services can or should replace today&#8217;s web search engines might want to consider the following examples of information needs that are typical of current search engine use:</p>
<ul>
<li><a href="http://www.google.com/search?q=how+do+i+get+an+iphone+case">How do I get an iPhone case?</a></li>
<li><a href="http://www.google.com/search?q=who+sings+the+choco+latte+song">Who sings the &#8220;choco latte&#8221; song?</a></li>
<li><a href="http://www.google.com/search?q=movies+near+11201">What movies are playing in my neighborhood?</a></li>
<li><a href="http://www.google.com/search?q=how+do+i+get+to+boston+from+new+york">How do I get to Boston from New York?</a></li>
<li><a href="http://www.google.com/search?q=best+selling+netbook">What is the best selling netbook?</a></li>
<li><a href="http://www.google.com/search?q=best+cell+phone+reception+in+new+york">Who offers the best cell phone reception in New York?</a></li>
<li><a href="http://www.google.com/search?q=what+was+the+score+in+the+north+korea+portugal+game">What was the score in the North Korea &#8211; Portugal game?</a></li>
</ul>
<p>I hope I don&#8217;t have to keep going to convince you that web search engines have earned their popularity by serving a broad class of information needs (i.e., answering lots of questions)&#8211;and that&#8217;s without even using the wide variety of personalized and social features that web search engines are rapidly developing.</p>
<p>The common thread in the above questions is that they focus on objective information. In general, such questions are effectively and efficiently answered by search engines based on indexed, published content (including &#8220;<a href="http://en.wikipedia.org/wiki/Deep_Web">deep web</a>&#8221; content made available to search engines via APIs). There&#8217;s a lot of work we can do to improve search engines, particularly in the area of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">supporting query formulation</a>. But it seems silly and wasteful to route such questions to other people&#8211;human beings should not be reduced to performing tasks at which machines excel.</p>
<p>That said, I agree with Kincaid that there are many information needs that are well addressed by  community question answering. In particular:</p>
<ul>
<li><strong>Questions for which point of view is a feature, not a bug.</strong> Review sites succeed when they provide sincere, informed personal reactions to products and services. Similarly, routing questions to people makes sense when we care about the answerer&#8217;s point of view. For some questions, I want the opinion of someone who shares my taste (which is what Hunch is pursuing with its &#8220;<a href="http://www.businessinsider.com/heres-what-comes-after-the-social-graph-2010-7">taste graph</a>&#8221;). For others, I want a diversity of expert opinions&#8211;for which I might turn to Aardvark (which tries to route questions to topic experts), Quora (where people follow particular topics), or <a href="http://www.linkedin.com/answers/">LinkedIn Answers</a>. Over time, the answers to many such questions can be published and indexed&#8211;and indeed some answers sites receive a <a href="http://twitter.com/Hitwise_US/status/19919086878">large share of their traffic</a> from search engines.</li>
<li><strong>Niche topics.</strong> As much as web search has improved <a href="http://thenoisychannel.com/2008/04/22/accessibility-in-information-retrieval/">information accessibility</a> for the &#8220;long tail&#8221; of published information, the effectiveness of web search can be highly variable for the most obscure information needs. Moreover, this effectiveness depends significantly on the user: some people are better at searching than others, especially in their areas of domain expertise. Social search can help level the playing field. Much as Wikipedia has surfaced much of the expertise at the head of the information distribution, community question answering can help out in the tail.</li>
<li><strong>Community for its own sake.</strong> Even in cases where search engines are more effective and efficient than community question answering services, some people prefer to participate in a social exchange rather than to conduct a transaction with an impersonal algorithm. Indeed, <a href="http://vark.com/aardvarkFinalWWW2010.pdf">researchers at Aardvark</a> found that many of the questions posed through their service (pre-acquisition) could be answered successfully using Google. I&#8217;ll go out on a limb and assume that Aardvark&#8217;s users were early technology adopters who are quite conversant with search engines&#8211;but in some cases chose to use a social alternative simply because they wanted to be social.</li>
</ul>
<p>Conclusions? Community question answering may be overhyped right now, but it isn&#8217;t a fad. There are broad classes of subjective information needs that require a point of view, if not a diversity of views. And even if much of the use of community question answering sites is mediated by search engines indexing their archives, there will always be a need for fresh content. I also believe that social search will continue to be valuable for niche topics, since neither search engines nor searchers will ever be perfect.</p>
<p>But I think the biggest open question is whether people will favor community question answering simply to be social. I conjecture that, by very publicly integrating community question answering into its social networking platform, Facebook is testing the hypothesis that it can turn information seeking from a utilitarian individual task into an entertaining social destination. Given Facebook&#8217;s <a href="http://mashable.com/2009/09/17/facebook-google-time-spent/">highly engaged</a> user population, we won&#8217;t have to wait long to find out.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/01/questions-but-why/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 3 Industry Track Afternoon Sessions</title>
		<link>http://thenoisychannel.com/2010/07/27/sigir-2010-day-3-industry-track-afternoon-sessions/</link>
		<comments>http://thenoisychannel.com/2010/07/27/sigir-2010-day-3-industry-track-afternoon-sessions/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 01:46:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3226</guid>
		<description><![CDATA[While the SIGIR 2010 Industry Track keynotes had the highest-profile speakers, the rest of the day assembled an impressive line-up: The new frontiers of Web search: going beyond the 10 blue links Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, Yahoo! Labs Cross-Language Information Retrieval in the Legal Domain Samir Abdou and Thomas Arni, [...]]]></description>
			<content:encoded><![CDATA[<p>While the <a href="http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/">SIGIR 2010 Industry Track keynotes</a> had the highest-profile speakers, the rest of the day assembled an impressive line-up:</p>
<ul>
<li>The new frontiers of Web search: going beyond the 10 blue links<br />
<em>Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, <strong>Yahoo! Labs<br />
</strong></em></li>
<li>Cross-Language Information Retrieval in the Legal Domain<br />
<em>Samir Abdou and Thomas Arni, <strong>Eurospider</strong></em></li>
<li>Building and Configuring a Real-Time Indexing System<br />
<em>Garret Swart, Ravi Palakodety, Mohammad Faisal, Wesley Lin, <strong>Oracle<br />
</strong></em></li>
<li>Lessons and Challenges from Product Search<br />
<em>Daniel E. Rose, <strong>A9.com (Amazon)<br />
</strong></em></li>
<li>Being Social: Research in Context-aware and Personalized Information Access @ Telefonica<br />
<em>Xavier Amatriain, Karen Church and Josep M. Pujol, <strong>Telefónica</strong><br />
</em></li>
<li>Searching and Finding in a Long Tail Marketplace<br />
<em>Neel Sundaresan, </em><strong><em>eBay</em></strong><em><br />
</em></li>
<li>When No Clicks are Good News<br />
<em>Carlos Castillo, Aris Gionis, Ronny Lempel, and Yoelle Maarek, <strong>Yahoo! Research</strong></em></li>
</ul>
<p>I missed the Eurospider and Oracle talks, but otherwise I spent the afternoon enjoying these sessions. The slides, along with all of the keynote slides, are available <a href="http://www.eurospider.com/acm-sigir-industry-track-2010.html">here</a>.</p>
<p>Some highlights from the talks I attended:</p>
<ul>
<li>Andrei Broder, a pioneer of Web IR and author of the highly cited &#8220;<a href="http://www.sigir.org/forum/F2002/broder.pdf">Taxonomy of Web Search</a>&#8221;, enumerated a half-dozen challenges for web search to move from its current state to one that not only accomplishes semantic analysis but also supports task completion. Naturally, the one that appeals to me is the need for search engines to move beyond query suggestion and truly engage the user in a dialog.</li>
<li>Dan Rose talked about the challenges of product search, and in particular the blessing and curse of implementing search applications for structured data (something that I&#8217;m very familiar with from my previous role at <a href="http://www.endeca.com/">Endeca</a>). He also warned of the dangers of over-interpreting behavioral data, e.g., a site change that increases revenue does not necessarily imply a better user experience (it could just be favoring higher-priced inventory), and may ultimately alienate customers.</li>
<li>Xavier Amatriain focused on social search, and talked about how, as we&#8217;ve turned to context to help mitigate information overload, we find ourselves confronted with the new problem of context overload. Specifically, he cited the research questioning the wisdom of the crowd, and proposed the <a href="http://www.nuriaoliver.com/RecSys/wisdomFew_sigir09.pdf">wisdom of the (expert) few</a> as a better alternative.</li>
<li>Neel Sundaresan offered an interesting tour of <a href="http://labs.ebay.com/">eBay Research Labs</a> prototypes, including the <a href="http://labs.ebay.com/erl/demoto/to">BayEstimate</a> that helps sellers improve listing titles by discovering the keywords that are both representative of the item and used in buyers&#8217; queries.</li>
<li>Finally, Carlos Castillo offered a nice approach to discovering when search engine abandonment is &#8220;<a href="http://research.google.com/pubs/pub35486.html">good abandonment</a>&#8221;: identify a subset of &#8220;tenacious&#8221; users who almost never abandon searches and measure <em>their</em> abandonment&#8211;since it is almost certain to be the good kind.</li>
</ul>
<p>All in all, I was very impressed with the quality of the Industry Track, and gratified to see how it had improved on the program I put together last year. Given the key role that industry plays in information retrieval, I think it is important that the top-tier IR conference promote the best that industry has to offer.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/27/sigir-2010-day-3-industry-track-afternoon-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 3 Industry Track Keynotes</title>
		<link>http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/</link>
		<comments>http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/#comments</comments>
		<pubDate>Sun, 25 Jul 2010 22:45:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3219</guid>
		<description><![CDATA[When I organized the SIGIR 2009 Industry Track last year, my goal was to meet the standard set by the CIKM 2008 Industry Event: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the [...]]]></description>
			<content:encoded><![CDATA[<p>When I organized the <a href="http://www.sigir2009.org/Program/industry">SIGIR 2009 Industry Track</a> last year, my goal was to meet the standard set by the <a href="http://www.cikm2008.org/industry_event.php">CIKM 2008 Industry Event</a>: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the field in which they are working. I was mostly happy with the results last year, and the popularity of the industry track relative to the parallel technical sessions suggest that my assessment is not simply from personal bias.</p>
<p>But this year the <a href="http://www.sigir2010.org/doku.php?id=industry:program">SIGIR 2010 Industry Track</a> broke new ground. The keynotes were from some of the most senior technologists at the world&#8217;s largest web search engines:</p>
<ul>
<li><a title="http://ir.baidu.com/phoenix.zhtml?c=188488&amp;p=irol-govBio&amp;ID=161381" rel="nofollow" href="http://ir.baidu.com/phoenix.zhtml?c=188488&amp;p=irol-govBio&amp;ID=161381" target="extern">William Chang</a>, Chief Scientist at Baidu</li>
<li><a title="https://docs.google.com/Doc?id=dhbn99z4_529xnxc2hh" rel="nofollow" href="https://docs.google.com/Doc?id=dhbn99z4_529xnxc2hh" target="extern">Yossi Matias</a>, Head of Google&#8217;s Israel R&amp;D Center</li>
<li><a title="http://www.jopedersen.com/jopedersen/Home.html" rel="nofollow" href="http://www.jopedersen.com/jopedersen/Home.html" target="extern">Jan Pedersen</a>, Chief Scientist for Core Search at Microsoft (Bing)</li>
<li><a title="http://company.yandex.com/general_info/management_team.xml" rel="nofollow" href="http://company.yandex.com/general_info/management_team.xml" target="extern">Ilya Segalovich</a>, CTO and Co-Founder of Yandex</li>
</ul>
<p>I won&#8217;t attempt to provide much detail about these presentations, first because <del datetime="2010-07-26T16:42:30+00:00">I&#8217;m hoping they will all be</del> <a href="http://www.eurospider.com/acm-sigir-industry-track-2010.html">they have all been posted online</a> and second because Jeff Dalton has already done an excellent job of posting <a href="http://www.searchenginecaffe.com/search?q=sigir+industry+day">live-blogged notes</a>. Rather, I&#8217;ll offer a few reactions.</p>
<p>William&#8217;s presentation on &#8220;Future Search: From Information Retrieval to Information Enabled Commerce&#8221; unsurprisingly focused on the Chinese search-related market. While the topic of <a href="http://googleblog.blogspot.com/2010/06/update-on-china.html">Google in China</a> was an elephant in the room, it did not surface even obliquely in the presentation&#8211;and I commend William for taking the high road. As for Baidu itself, its most interesting innovation is <a href="http://open.baidu.com/">Aladdin</a>, an open search platform that allows participating webmasters to submit query-content pairs.</p>
<p>Yossi&#8217;s presentation on &#8220;Search Flavours at Google&#8221; was a tour de force of Google&#8217;s recent innovations in the search and data mining space. The search examples mostly focused on the challenges of incorporating context into query understanding&#8211;where context might involve geography, time, social network, etc. But some of the more impressive examples showed off using the power of data to <a href="http://googleresearch.blogspot.com/2009/04/predicting-present-with-google-trends.html">predict the present</a>. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.</p>
<p>Jan talked about &#8220;Query Understanding at Bing&#8221;. I really hope he makes these slides available, since they do a really nice job of describing a machine learning based architecture for processing search queries. To get an idea of this topic, check out <a href="http://thenoisychannel.com/2009/08/02/sigir-2009-day-3-industry-track-nick-craswell/">Nick Craswell&#8217;s presentation</a> from last year&#8217;s SIGIR.</p>
<p>Finally, Ilya talked about &#8220;Machine Learning in Search Quality at Yandex&#8221;, the largest search engine in Russia. He described the main challenge in Russia as handling the local aspects of search: he gave as an example that, if you&#8217;re in a small town in Russia, then local results in Moscow may as well be on the moon. Local search is a topic close to my heart, not least because it is my day job! Ilya&#8217;s talk focused largely on Yandex&#8217;s <a href="http://company.yandex.com/general_info/technologies.xml">MatrixNet</a> implementation of <a href="http://en.wikipedia.org/wiki/Learning_to_rank">learning to rank</a>. What I&#8217;m surprised he didn&#8217;t mention are the challenges of data acquisition&#8211;in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.</p>
<p>All in all, the four keynotes collectively offered an excellent state-of-the-search-engine address.  As with last year, the industry track talks were the most popular morning sessions, and the speakers delivered the goods.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 2 Technical Sessions</title>
		<link>http://thenoisychannel.com/2010/07/23/sigir-2010-day-2-technical-sessions/</link>
		<comments>http://thenoisychannel.com/2010/07/23/sigir-2010-day-2-technical-sessions/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 06:49:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3215</guid>
		<description><![CDATA[On the second day of the SIGIR 2010 conference, I did start shuttling between sessions to attend particular talks. In the morning session, I attended three talks. The first, &#8220;Geometric Representations for Multiple Documents&#8221; by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides [...]]]></description>
			<content:encoded><![CDATA[<p>On the second day of the <a href="http://www.sigir2010.org/">SIGIR 2010</a> conference, I did start shuttling between sessions to attend particular talks.</p>
<p>In the morning session, I attended three talks. The first, &#8220;Geometric Representations for Multiple Documents&#8221; by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides both theoretical and experimental evidence that geometric means work better than arithmetic means for representing such combinations. The second, &#8220;Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction&#8221; by Anna Shtok, Oren Kurland, and David Carmel, shows the efficacy of a utility estimation framework comprised of relevance models, measures like query clarity to estimate the representativeness of relevance models, and similarity measures to estimate the similarity or correlation between two ranked lists.  The authors demonstrated significant improvements from the framework over simply using the representativeness measures for performance prediction. The third paper, &#8220;Evaluating Verbose Query Processing Techniques&#8221; by Samuel Huston and Bruce Croft, showed that removing &#8220;stop structures&#8221;, a generalization of stop words, could significantly improve performance on long queries. Interestingly, the authors evaluated their approach on &#8220;black box&#8221; commercial search engines Yahoo and Bing without knowledge of their retrieval models.</p>
<p>In the session after lunch, I mostly attended talks from the session on user feedback and user models. The first, &#8220;Incorporating Post-Click Behaviors Into a Click Model&#8221; by Feimin Zhong, Dong Wang, Gang Wang, Weizhu Chen, Yuchen Zhang, Zheng Chen, and Haixun Wang, proposed and experimentally validated a click model to infer document relevance from post-click behavior like dwell time that can be derived from logs. The second, &#8220;Interactive Retrieval Based on Faceted Feedback&#8221; by Lanbo Zhang and Yi Zhang, described an approach using facet values for relevance and pseudo-relevance feedback. It&#8217;s interesting work, but I think the authors should look at work my colleagues and I presented at <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2008/">HCIR 2008</a> on distinguishing whether facet values are useful for summarization or for refinement. The third, &#8220;Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time&#8221; by Chao Liu, Ryen White, and Susan Dumais, offered an elegant model of dwell time and used it to predict dwell time distribution from page-level features. Finally, I attended one talk from the session on retrieval models and ranking: &#8220;Finding Support Sentences for Entities&#8221; by Roi Blanco and Hugo Zaragoza. They present a novel approach of generalizing snippets to interfaces that offer named entities (e.g., people) as supplements to the search results. I am excited to see research that could make richer interfaces more explainable to users.</p>
<p>I spent the last session of the day listening to a couple of talks about users and interactive IR. The first was &#8220;Studying Trailfinding Algorithms for Enhanced Web Search&#8221; by Adish Singla, Ryen White, and Jeff Huang<del datetime="2010-07-28T22:14:41+00:00">, turned out to be the best-paper winner</del>. This work extends <a href="http://research.microsoft.com/en-us/um/people/ryenw/publications.html">previous work</a> that Ryen and colleagues have done on search trails and showed results of various trailfinding algorithms that outperform the trails users follow on their own. The second, &#8220;Context-Aware Ranking in Web Search&#8221; by Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, and Hang Li, analyzes requerying behavior as reformulation, specialization, generalization, or general association, and demonstrates that knowing or inferring which the user is doing significantly improves ranking of the second query&#8217;s results.</p>
<p>The day wrapped up with a luxurious banquet at the <a href="http://www.ichotelsgroup.com/intercontinental/en/gb/locations/overview/gvaha">Hotel Intercontinental</a>, near the <a href="http://www.google.com/images?q=place+des+nations+geneva">Nations Plaza</a>. After sweating through conference sessions without air conditioning, it was a welcome surprise to enjoy great food in such an elegant setting.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/23/sigir-2010-day-2-technical-sessions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 2 Keynote</title>
		<link>http://thenoisychannel.com/2010/07/22/sigir-2010-day-2-keynote/</link>
		<comments>http://thenoisychannel.com/2010/07/22/sigir-2010-day-2-keynote/#comments</comments>
		<pubDate>Thu, 22 Jul 2010 12:32:36 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3212</guid>
		<description><![CDATA[The second day of the SIGIR 2010 conference kicked off with a keynote by TREC pioneer Donna Harman entitled &#8220;Is the Cranfield Paradigm Outdated?&#8221;. If you are at all familiar with Donna&#8217;s work on TREC, you&#8217;ll hardly be surprised that her answer was a resounding &#8220;NO!&#8221;. But of course she did a lot more than [...]]]></description>
			<content:encoded><![CDATA[<p>The second day of the <a href="http://www.sigir2010.org/">SIGIR 2010</a> conference kicked off with a keynote by <a href="http://trec.nist.gov/">TREC</a> pioneer Donna Harman entitled &#8220;Is the Cranfield Paradigm Outdated?&#8221;. If you are at all familiar with Donna&#8217;s work on TREC, you&#8217;ll hardly be surprised that her answer was a resounding &#8220;NO!&#8221;.</p>
<p>But of course she did a lot more than defend <a href="http://www.iva.dk/bh/core%20concepts%20in%20lis/articles%20a-z/cranfield_experiments.htm">Cranfield</a>. She offered a comprehensive and fascinating history of the Cranfield paradigm, starting with the Cranfield 1 experiments in the late 1950s which evaluated manual indexing systems.</p>
<p>Most importantly, she defined the Cranfield paradigm as defining a metric that reflects a real user model and building the collection before the experiments to prevent human bias and enable reusability. As she noted, this model does not say anything about only returning a ranked list of ten blue links&#8211;which is what most people (myself included) associate with the Cranfield model. Indeed, she urged us to think outside this mindset.</p>
<p>I loved the presentation and found the history enlightening (though <a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a> corrected a few minor details). Still, I wondered if she was defining the Cranfield paradigm so broadly as to co-opt all of its critics.  But I think the clear dividing line between Cranfield and non-Cranfield is whether user effects are something to avoid or embrace. I perceive the success of Cranfield as coming in large part from its reduction of user effects. But I think that much of the <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> community sees user effects as precisely what we need to be evaluating for information seeking support systems.</p>
<p>In any case, it was a great keynote, and Donna promises me she will make the slides available. Of course I&#8217;ll post them here. In the meantime, check out <a href="http://www.searchenginecaffe.com/2010/07/sigir-2010-keynote-donna-harmon-on.html">Jeff Dalton&#8217;s notes</a> on his great blog and the tweets at <a href="http://search.twitter.com/search?q=%23sigir2010">#sigir2010</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/22/sigir-2010-day-2-keynote/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 1 Posters</title>
		<link>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-posters/</link>
		<comments>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-posters/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 07:19:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3208</guid>
		<description><![CDATA[The first day of SIGIR 2010 ended with a monster poster session&#8211;over 100 posters to see in 2 hours in a hall without air conditioning! I managed to see a handful: &#8220;Query Quality: User Ratings and System Predictions&#8221; by Claudia Hauff, Franciska de Jong, Diane Kelly, and Leif Azzopardi offered the startling (to me at [...]]]></description>
			<content:encoded><![CDATA[<p>The first day of <a href="http://www.sigir2010.org/">SIGIR 2010</a> ended with a monster <a href="http://www.sigir2010.org/doku.php?id=program:posters">poster session</a>&#8211;over 100 posters to see in 2 hours in a hall without air conditioning! I managed to see a handful:</p>
<ul>
<li>&#8220;Query Quality: User Ratings and System Predictions&#8221; by Claudia Hauff, Franciska de Jong, Diane Kelly, and Leif Azzopardi offered the startling (to me at least) result that human prediction of query difficulty did not correlate (or at best correlated weakly) to post-retrieval <a href="http://thenoisychannel.com/2010/05/23/estimating-the-query-difficulty-for-information-retrieval/">query performance prediction</a> (QPP) measures like query clarity. I talked with Diane about it, and I wonder how strongly the human prediction, which was pre-retrieval, would correlate to human assessments of the results. I also don&#8217;t know how well the QPP measures she used apply to web search contexts.</li>
<li>Which leads me to the next poster I saw, &#8220;Predicting Query Performance on the Web&#8221; by Niranjan Balasubramanian, Giridhar Kumaran, and Vitor Carvalho. They offered what I saw as a much more encouraging result&#8211;namely that QPP is highly reliable when it returns low scores. In other words, a search engine may wrongly believe that it did well on a query, but it is almost certainly right when it thinks it failed. This certainty on the negative side is exactly the opening that <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> advocates need to offer richer interaction for queries where a conventional ranking approach recognizes its own failure. While some of the specifics of the authors&#8217; approach are proprietary (they perform regression on features used by Bing), the approach seems broadly applicable.</li>
<li>Next I saw &#8220;Hashtag Retrieval in a Microblogging Environment&#8221; by Miles Efron. He provided evidence that hashtags could be an effective foundation for query expansion of Twitter search queries, using a <a href="http://en.wikipedia.org/wiki/Language_model">language model</a> approach. The approach may generalize beyond hashtags, but hashtags do have the advantage of being highly topical and relatively unambiguous by convention.</li>
<li>&#8220;The Power of Naive Query Segmentation&#8221; by Matthias Hagen, Martin Potthast, Benno Stein, and Christof Brautigam suggested a simple approach for segmenting long queries into quoted phrases: consider all segmentations and, for a given segmentation, compute a weighted sum of the <a href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">Google ngram counts</a> for each quoted phrase, the weight of a phrase of length <em>s</em> being s^s. I don&#8217;t find the weighting particularly intuitive, but the accuracy numbers they present look quite nice relative to more sophisticated approaches.</li>
<li>&#8220;Investigating the Suboptimality and Instability of Pseudo-Relevance Feedback&#8221; by Raghavendra Udupa and Abhijit Bhole showed that an oracle with knowledge of a few high-scoring non-relevant documents could vastly improve the performance of <a href="http://en.wikipedia.org/wiki/Relevance_feedback#Blind_feedback">pseudo-relevance feedback</a>. While this information does not lead directly to any applications, it does suggest that obtaining a very small amount of feedback from the user might go a long way. I&#8217;m curious how much is possible from even a single negative-feedback input.</li>
<li>&#8220;Short Text Classification in Twitter to Improve Information Seeking&#8221; by Bharath Sriram, David Fuhry, Engin Demir, Hakan Ferhatosmanoglu, and Murat Demirbas challenged the conventional wisdom that tweets are too short for traditional classification methods. They achieved nice results, but on the relatively simple problem of classifying tweets as news, events, opinions, deals, and private messages. I was offered promises of future work, but I think the more general classification problem is much harder.</li>
<li>&#8220;Metrics for Assessing Sets of Subtopics&#8221; by Filip Radlinski, Martin Szummer, and Nick Craswell proposed an evaluation framework for result diversity based on coherence, distinctness, plausibility, and completeness. I suggested that this framework would apply nicely to <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> interfaces, and that I&#8217;d love to see it demonstrated on production systems&#8211;especially since I think that might be easier to achieve than convincing the SIGIR community to embrace it.</li>
<li>Which leads me nicely to the last poster I saw, &#8220;Machine Learned Ranking of Entity Facets&#8221; by Roelof van Zwol, Lluis Garcia Pueyo, Mridul Muralidharan, and Borkur Sigurbjornsson. They found that they could accurately predict click-through rates on named entity facets (people, places) by learning from click logs. It&#8217;s worth noting that their entity facets are extremely clean, since they are derived from sources like Wikipedia, IMDB, GeoPlanet, and Freebase. It&#8217;s not clear to me how well their approach would work for noisier facets extracted from open-domain data.</li>
</ul>
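<p>For the query segmentation poster above, the scoring rule is compact enough to sketch in code. This is only a rough illustration of the rule as summarized in the list, not the authors&#8217; implementation: the n-gram counts are invented, the function names are hypothetical, and I assume single-word segments stay unquoted and contribute nothing to the score.</p>
<pre>
```python
from itertools import combinations


def segmentations(words):
    """Yield every split of words into contiguous segments (tuples of words)."""
    n = len(words)
    for k in range(n):
        # Choose k cut positions among the n - 1 gaps between words.
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [tuple(words[a:b]) for a, b in zip(bounds, bounds[1:])]


def score(segmentation, ngram_counts):
    """Weighted sum of n-gram counts; a phrase of length s gets weight s**s.

    Assumption: one-word segments are not quoted and add nothing.
    """
    return sum(
        len(seg) ** len(seg) * ngram_counts.get(seg, 0)
        for seg in segmentation
        if len(seg) != 1
    )


def best_segmentation(query, ngram_counts):
    """Return the highest-scoring segmentation of a whitespace-split query."""
    words = query.split()
    return max(segmentations(words), key=lambda s: score(s, ngram_counts))


# Hypothetical counts, invented purely for illustration.
TOY_COUNTS = {
    ("new", "york"): 1000,
    ("york", "times"): 300,
    ("times", "square"): 400,
    ("new", "york", "times"): 200,
    ("new", "york", "times", "square"): 1,
}

print(best_segmentation("new york times square", TOY_COUNTS))
# prints [('new', 'york'), ('times', 'square')]
```
</pre>
<p>With these toy counts, quoting &#8220;new york&#8221; and &#8220;times square&#8221; scores 4*1000 + 4*400 = 5600, narrowly beating &#8220;new york times&#8221; at 27*200 = 5400. The sketch enumerates all 2^(n-1) segmentations, so it is only sensible for queries of modest length.</p>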
<p>As I said, there were over a hundred posters, and I&#8217;d meant to see far more of them. Hopefully other people will blog about some of them! Or perhaps tweet about them at <a href="http://search.twitter.com/search?q=%23sigir2010">#sigir2010</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-posters/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 1 Technical Sessions</title>
		<link>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-technical-sessions/</link>
		<comments>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-technical-sessions/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 07:08:27 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3204</guid>
		<description><![CDATA[I&#8217;ve always felt that parallel conference sessions are designed to optimize for anticipated regret, and SIGIR 2010 is no exception. I decided that I&#8217;d try to attend whole sessions rather than shuttle between them. I started by attending the descriptively titled &#8220;Applications I&#8221; session. Jinyoung Kim of UMass presented joint work with Bruce Croft on [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve always felt that parallel conference sessions are designed to optimize for <a href="http://researchstories.asu.edu/2007/10/anticipated_regret_takes_out_t.html">anticipated regret</a>, and <a href="http://www.sigir2010.org/">SIGIR 2010</a> is no exception. I decided that I&#8217;d try to attend whole sessions rather than shuttle between them. I started by attending the descriptively titled &#8220;Applications I&#8221; session.</p>
<p>Jinyoung Kim of UMass presented joint work with Bruce Croft on &#8220;Ranking using Multiple Document Types in Desktop Search&#8221; in which they showed that type prediction can significantly improve known-item search performance in simulated desktop settings. I like the approach and result, but I&#8217;d be very interested to see how well it applied to more recall-oriented tasks.</p>
<p>Then came work by Googlers Enrique Alfonseca, Marius Pasca, and Enrique Robledo-Arnuncio on &#8220;Acquisition of Instance Attributes via Labeled and Related Instances&#8221; that overcomes the data sparseness of open-domain attribute extraction by computing relationships among instances and injecting this relatedness data into the instance-attribute graph so that attributes can be propagated to more instances. This is a nice enhancement to <a href="http://research.google.com/pubs/author107.html">earlier work by Pasca and others</a> on obtaining these instance-attribute graphs.</p>
<p>The session ended with an intriguing paper on &#8220;Relevance and Ranking in Online Dating Systems&#8221; by Yahoo researchers Fernando Diaz, Donald Metzler, and Sihem Amer-Yahia that formulated a two-way relevance model for matchmaking systems but unfortunately found that it did no better than query-independent ranking in the context of a production personals system. I would be very interested to see how the model applied to other matchmaking scenarios, such as matching job seekers to employers.</p>
<p>After a wonderful lunch hosted by <a href="http://www.morganclaypool.com/">Morgan &amp; Claypool</a> for <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">authors</a>, I attended a session on Filtering and Recommendation.</p>
<p>It started with a paper on &#8220;Social Media Recommendation Based on People and Tags&#8221; by IBM researchers Ido Guy, Naama Zwerdling, Inbal Ronen, David Carmel, and Erel Uziel. They analyzed item recommendation in an enterprise setting and found that a hybrid approach combining algorithmic tag-based recommendations with people-based recommendations achieves better performance at delivering interesting recommendations than either approach alone. I&#8217;m curious how well these results generalize outside of enterprise settings&#8211;or even how well they apply across the large variation in enterprises.</p>
<p>Then came work by Nikolaos Nanas, Manolis Vavalis, and Anne De Roeck on &#8220;A Network-Based Model for High-Dimensional Information Filtering&#8221;. The authors propose to overcome the &#8220;curse of dimensionality&#8221; of vector space representations of profiles by instead modeling keyword dependencies in a directed graph and applying a non-iterative activation model to it. The presentation was excellent, but I&#8217;m not entirely convinced by the baseline they used for their comparisons.</p>
<p>After that was a paper by Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain on &#8220;Temporal Diversity in Recommender Systems&#8221;. They focused on the problem that users get bored and frustrated by recommender systems that keep recommending the same items over time. They provided evidence that users prefer temporal diversity of recommendations and suggested some methods to promote it. I like the research, but I still think that <a href="http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/">recommendation engines cry out for transparency</a>, and that transparency can also help address the diversity problem&#8211;e.g., pick a random movie the user watched and propose recommendations explicitly based on that movie.</p>
<p>Unfortunately I missed the last paper of the session, in which Noriaki Kawamae talked about &#8220;Serendipitous Recommendations via Innovators&#8221;.</p>
<p>Reminder: also check out the tweet stream with hash tag <a href="http://search.twitter.com/search?q=%23sigir2010">#sigir2010</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-technical-sessions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 1 Keynote</title>
		<link>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-keynote/</link>
		<comments>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-keynote/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 06:44:32 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3201</guid>
		<description><![CDATA[As promised, here are some highlights of the SIGIR 2010 conference thus far. Also check out the tweet stream with hash tag #sigir2010. I arrived here on Monday, too jet-lagged to even imagine attending the tutorials, but fortunately I recovered enough to go to the welcome reception in the Parc des Bastions that evening. Then [...]]]></description>
			<content:encoded><![CDATA[<p>As promised, here are some highlights of the <a href="http://www.sigir2010.org/">SIGIR 2010</a> conference thus far. Also check out the tweet stream with hash tag <a href="http://search.twitter.com/search?q=%23sigir2010">#sigir2010</a>.</p>
<p>I arrived here on Monday, too jet-lagged to even imagine attending the <a href="http://www.sigir2010.org/doku.php?id=program:tutorials">tutorials</a>, but fortunately I recovered enough to go to the welcome reception in the <a href="http://www.bastions.ch/">Parc des Bastions</a> that evening. Then a night of sleep and on to the main event.</p>
<p>Tuesday morning kicked off with a keynote by Microsoft Live Labs director <a href="http://flakenstein.net/">Gary Flake</a> entitled &#8220;Zoomable UIs, Information Retrieval, and the Uncanny Valley&#8221;. Flake&#8217;s premise is that information retrieval is stuck in the &#8220;<a href="http://en.wikipedia.org/wiki/Uncanny_valley">uncanny valley</a>&#8221;, a metaphor he borrows from the robotics community. According to Wikipedia:</p>
<blockquote><p>The theory holds that when robots and other facsimiles of humans look and act almost like actual humans, it causes a response of revulsion among human observers. The &#8220;valley&#8221; in question is a dip in a proposed graph of the positivity of human reaction as a function of a robot&#8217;s lifelikeness.</p></blockquote>
<p>Flake offered <a href="http://en.wikipedia.org/wiki/Grokker">Grokker</a> (R.I.P.) as an example of a search interface that emphasized visual clustering and got stuck in the uncanny valley. He called it &#8220;the sexiest search experience that no one was going to use&#8221;. Flake then went on to propose that moving beyond the uncanny valley would require replacing our current discrete interactions with search engines with a mode of continuous, fluid interaction in which the whole of the data is greater than the sum of its parts. He offered some demos, emphasizing the recently released <a href="http://www.getpivot.com/">Pivot</a> client, that he felt provided a vision to overcome the uncanny valley.</p>
<p>As became clear in the question and answer period, many people (myself included) felt that this rich visual approach might work well for browsing images but was not as clear a fit for text-oriented information needs&#8211;despite Flake offering a demo based on the collection of Wikipedia documents. In fairness, it may be too early to assess a proof of concept.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-keynote/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Off to Geneva for SIGIR</title>
		<link>http://thenoisychannel.com/2010/07/18/off-to-geneva-for-sigir/</link>
		<comments>http://thenoisychannel.com/2010/07/18/off-to-geneva-for-sigir/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 12:30:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3198</guid>
		<description><![CDATA[I&#8217;m flying to Geneva tonight to attend SIGIR. Hope to see some of you there! I&#8217;ll be back in a week and will post highlights and personal reactions.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m flying to Geneva tonight to attend <a href="http://www.sigir2010.org/">SIGIR</a>. Hope to see some of you there! I&#8217;ll be back in a week and will post highlights and personal reactions.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/18/off-to-geneva-for-sigir/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The War on Attention Poverty: Measuring Twitter Authority</title>
		<link>http://thenoisychannel.com/2010/07/13/the-war-on-attention-poverty-measuring-twitter-authority/</link>
		<comments>http://thenoisychannel.com/2010/07/13/the-war-on-attention-poverty-measuring-twitter-authority/#comments</comments>
		<pubDate>Wed, 14 Jul 2010 03:13:45 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3195</guid>
		<description><![CDATA[I gave this presentation today at AT&#38;T Labs, hosted by Stephen North of Graphviz fame. The talk was recorded, but I don&#8217;t know when the video will be available. In the meantime, here are the slides. The audience was very engaged and questioned just about all of the TunkRank model&#8217;s assumptions. I&#8217;m hopeful that as [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_4749609" style="width: 425px;"><object id="__sse4749609" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=waronattentionpoverty-100713213804-phpapp01&amp;stripped_title=the-war-on-attention-poverty-measuring-twitter-authority" /><param name="name" value="__sse4749609" /><param name="allowfullscreen" value="true" /><embed id="__sse4749609" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=waronattentionpoverty-100713213804-phpapp01&amp;stripped_title=the-war-on-attention-poverty-measuring-twitter-authority" name="__sse4749609" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="padding: 5px 0 12px;">
<p>I gave this presentation today at <a href="http://www.research.att.com/editions/201005_home.html">AT&amp;T Labs</a>, hosted by <a href="http://www.research.att.com/people/North_Stephen_C">Stephen North</a> of <a href="http://www.graphviz.org/">Graphviz</a> fame. The talk was recorded, but I don&#8217;t know when the video will be available. In the meantime, here are the slides.</p>
<p>The audience was very engaged and questioned just about all of the TunkRank model&#8217;s assumptions. I&#8217;m hopeful that as <a href="http://mendicantbug.com/about/">Jason Adams</a> and <a href="http://www.linkedin.com/in/israelkloss">Israel Kloss</a> work on making a business out of <a href="http://tunkrank.com/">TunkRank</a>, they&#8217;ll bridge some of the gap between simplicity and realism.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/13/the-war-on-attention-poverty-measuring-twitter-authority/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Recruiting and a Lesson in Attention Scarcity</title>
		<link>http://thenoisychannel.com/2010/07/11/recruiting-and-a-lesson-in-attention-scarcity/</link>
		<comments>http://thenoisychannel.com/2010/07/11/recruiting-and-a-lesson-in-attention-scarcity/#comments</comments>
		<pubDate>Mon, 12 Jul 2010 00:59:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3191</guid>
		<description><![CDATA[Several people have asked me recently for advice on how to recruit for their tech startups. I&#8217;ve responded by digging out the following email that someone sent me last year. I reproduce it in full here, minus the company name: Subject: we just got Beatles Rock Band for the office and are looking for a [...]]]></description>
			<content:encoded><![CDATA[<p>Several people have asked me recently for advice on how to recruit for their tech startups. I&#8217;ve responded by digging out the following email that someone sent me last year. I reproduce it in full here, minus the company name:</p>
<blockquote><p>Subject: we just got Beatles Rock Band for the office and are looking for a vocalist !!</p>
<p>Good Afternoon,</p>
<p>I hope you don&#8217;t mind me reaching out to you, but came across your LinkedIn page and my interest is peaked, to say the least. I hope after reading this you feel the same.</p>
<p>If you&#8217;re unfamiliar with XXXXXX, we are a distinct small and agile team that functions as an incubated start-up funded by a larger organization. What we are working on is still kind of a secret but I can tell you that it’s focused on completely changing the way we find, consume, share, and manage content on the web today. We are focused on the growing importance of the real-time web and the concurrent need to reduce the noise. We are driven by a strong desire to deliver a better overall experience with a lot less effort required from our users.</p>
<p>Our office is extremely open and collegiate, and we are committed to letting ideas thrive above all else. We’re a very eclectic bunch of characters, but we all share a common commitment to taking whatever we do, fun or work, to the max. Some words that have been used to describe us are: passionate, fun, funny, innovative, contrarian, automagical, brilliant, academic, whimsical, and most importantly respectful. If you fit 3 or more of those descriptions, you might just have some of that magic we’re looking for.</p>
<p>If you&#8217;re interested in exploring this opportunity, please email me your resume and I&#8217;ll follow up with you ASAP, and have you come by meet the team some time soon.</p>
<p>Either way, I hope to hear from you!</p>
<p>Have a great weekend,</p>
<p>We too are BIG karaoke fans ( I read your website) , and as I said above we just got Beatles Rock Band for the office and are looking for a vocalist !!</p>
<p>Cheers,<br />
XXXXX</p></blockquote>
<p>I see this email as a poster child for how a startup should recruit. It&#8217;s well-written, funny, and shares enough about the opportunity to be an effective hook. Most importantly, it&#8217;s <em>personal</em>. Starting from the subject line that made a great first impression, the email showed proof that the sender&#8211;a complete stranger&#8211;had taken the time to learn about me.</p>
<p>This is a strategy that does not scale arbitrarily&#8211;and that is the whole point. A startup that is building a small team needs to choose its prospective employees carefully and then go after those prospects with full force. If you really want to earn someone&#8217;s attention, you have to show that you&#8217;ve invested attention yourself. There&#8217;s no free lunch&#8211;if you want to send out a hundred emails like this one, you&#8217;ve got your work cut out for you! But no startup should be recruiting on such a massive scale, and the increase in yield justifies the additional per-candidate investment.</p>
<p>Of course, this principle applies beyond the narrow context of recruiting. Indeed, it is much like an <a href="http://www.itu.int/osg/spu/spam/contributions/Spam%20economics-faq.pdf">attention bond mechanism</a>: prove to me that you&#8217;ve invested in targeting me personally, and I&#8217;ll be more inclined to invest my attention in reading your message. Search advertising follows a similar principle. I still maintain that <a href="http://thenoisychannel.com/2008/10/09/search-is-not-advertising/">search is not advertising</a>, but perhaps this aspect of negotiating a shared interest between messenger and messengee is a common thread.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/11/recruiting-and-a-lesson-in-attention-scarcity/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Paul Adams&#8217;s Presentation on Social Networking</title>
		<link>http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/</link>
		<comments>http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/#comments</comments>
		<pubDate>Thu, 08 Jul 2010 04:29:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3188</guid>
		<description><![CDATA[This presentation by Paul Adams, lead for User Research for Social at Google, has been making the rounds in the blogosphere. It&#8217;s long (over 200 slides!) but well worth the time to read it, even if you&#8217;re already familiar with the ethnography of online social behavior. It touches on all things online and social, from [...]]]></description>
			<content:encoded><![CDATA[<p><img style="visibility: hidden; width: 0px; height: 0px;" src="http://counters.gigya.com/wildfire/IMP/CXNID=2000002.0NXC/bT*xJmx*PTEyNzg1NjI5MDYyMzUmcHQ9MTI3ODU2MjkxODIxMiZwPTEwMTkxJmQ9V*ZfZW1iZWRfZG9jdW1lbnQmZz*yJm89OGFl/YzIxMGM5MmE2NGVlOGI4MDc4YmU3MjAzMDRkODcmb2Y9MA==.gif" border="0" alt="" width="0" height="0" /></p>
<div id="__ss_4656436" style="width: 477px;"><object id="__sse4656436" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="477" height="510" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="FlashVars" value="gig_lt=1278562906235&amp;gig_pt=1278562918212&amp;gig_g=2" /><param name="src" value="http://static.slidesharecdn.com/swf/doc_player.swf?doc=vtm2010-100701010846-phpapp01&amp;stripped_title=the-real-life-social-network-v2" /><param name="name" value="__sse4656436" /><param name="flashvars" value="gig_lt=1278562906235&amp;gig_pt=1278562918212&amp;gig_g=2" /><param name="allowfullscreen" value="true" /><embed id="__sse4656436" type="application/x-shockwave-flash" width="477" height="510" src="http://static.slidesharecdn.com/swf/doc_player.swf?doc=vtm2010-100701010846-phpapp01&amp;stripped_title=the-real-life-social-network-v2" name="__sse4656436" flashvars="gig_lt=1278562906235&amp;gig_pt=1278562918212&amp;gig_g=2" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="padding: 5px 0 12px;"></div>
<div style="padding: 5px 0 12px;">This presentation by <a href="http://www.thinkoutsidein.com/blog/about-paul-adams/">Paul Adams</a>, lead for User Research for Social at Google, has been making the rounds in the blogosphere. It&#8217;s long (over 200 slides!) but well worth the time to read it, even if you&#8217;re already familiar with the ethnography of online social behavior. It touches on all things online and social, from the theory of strong and weak ties to social influence to privacy. Enjoy!</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Beyond Social Currency</title>
		<link>http://thenoisychannel.com/2010/07/06/beyond-social-currency/</link>
		<comments>http://thenoisychannel.com/2010/07/06/beyond-social-currency/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 20:52:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3184</guid>
		<description><![CDATA[A research study I like enough to have blogged about it a few times is Princeton sociologist Matt Salganik&#8217;s dissertation work on music preferences and social contagion. For those unfamiliar with this work, here is the abstract of his Science article &#8220;Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market&#8221; (co-authored with Peter Dodds [...]]]></description>
			<content:encoded><![CDATA[<p>A research study I like enough to have blogged about it a few times is Princeton sociologist <a href="http://www.princeton.edu/~mjs3/">Matt Salganik</a>&#8217;s dissertation work on music preferences and social contagion. For those unfamiliar with this work, here is the abstract of his <em>Science</em> article &#8220;<a href="http://www.sciencemag.org/cgi/content/abstract/311/5762/854">Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market</a>&#8221; (co-authored with <a href="http://www.uvm.edu/~pdodds/">Peter Dodds</a> and <a href="http://research.yahoo.com/Duncan_Watts">Duncan Watts</a>):</p>
<blockquote><p>Hit songs, books, and movies are many times more successful than average, suggesting that &#8220;the best&#8221; alternatives are qualitatively different from &#8220;the rest&#8221;; yet experts routinely fail to predict which products will succeed. We investigated this paradox experimentally, by creating an artificial &#8220;music market&#8221; in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants&#8217; choices. Increasing the strength of social influence increased both inequality and unpredictability of success. Success was also only partly determined by quality: The best songs rarely did poorly, and the worst rarely did well, but any other result was possible.</p></blockquote>
<p>The result is hardly surprising to anyone familiar with the history of pop music. But I&#8217;m intrigued by the possibility that technology is simultaneously pulling music as a social phenomenon in two opposite directions.</p>
<p>On one hand, YouTube and social networks may actually be amplifying the positive feedback of music popularity. The recent story of YouTube sensation <a href="http://en.wikipedia.org/wiki/Greyson_Chance">Greyson Chance</a> (yes, a 13-year-old with his own Wikipedia entry) becoming a national phenomenon in a couple of weeks attests to the power of social contagion. I don&#8217;t mean to take anything away from Chance&#8217;s talent, but I feel safe asserting that his talent was necessary but hardly sufficient to achieve his popular success.</p>
<p>On the other hand, Internet radio services like <a href="http://en.wikipedia.org/wiki/Pandora_Radio">Pandora</a> and <a href="http://en.wikipedia.org/wiki/Last.fm">Last.fm</a>, despite their social features, offer the possibility of drastically reducing the effect of social influence. Both of these services require users to provide some representation of their musical tastes as initial inputs, whether by selecting preset stations or using particular artists or songs as seeds. Presumably those tastes are in large part the product of social influence. But the subsequent interaction between users and these services is relatively buffered from social influence. Users hear songs while listening privately through headphones&#8211;in many cases at work or while commuting. No one else is around when those users decide how to rate what they are listening to.</p>
<p>Granted, social context will always seep in&#8211;I don&#8217;t think I could give a thumbs-up to a <a href="http://en.wikipedia.org/wiki/Justin_Bieber">Justin Bieber</a> song even in the privacy of my own Pandora profile. But much of the music I discover is from artists I&#8217;ve never heard of&#8211;and thus evaluate without the explicit social influence of preconceptions about those artists.</p>
<p>As it turns out, I often discover after the fact that a number of the artists I like have achieved popular success. I can&#8217;t tell whether that reflects on their objective music quality, my own conformity of musical taste, or skew on the part of the recommendation system (cf. <a href="http://thenoisychannel.com/2009/02/24/how-recommendation-engines-quash-diversity/">does everything sound like Coldplay?</a>). Still, I&#8217;m quite sure that I&#8217;m not favoring music based on prior knowledge of its popularity&#8211;for the most part, I don&#8217;t have that information at the time that I decide whether I like a song. Indeed, I hear new music almost exclusively through Pandora.</p>
<p>I don&#8217;t know how exceptional I am as a media consumer, but I suspect my case is increasingly common. Perhaps we are heading into a world where there will be a split between musical taste as social currency vs. musical taste as purely personal pleasure. It&#8217;s harder for me to imagine books or feature-length movies becoming so divorced from social context, if only because consuming them is a much larger and more concentrated investment.</p>
<p>Still, I think it&#8217;s a big deal that this is happening in music. It&#8217;s a welcome counterpoint to the <a href="http://www.amazon.com/Winner-Take-All-Society-Much-More-Than/dp/0140259953">winner-take-all</a> dynamic that has dominated the past decades of pop music. I can&#8217;t say that it will make the music industry more of a meritocracy&#8211;or that I even know what that would mean. But I think it&#8217;s a welcome step away from the caricature of conformity demonstrated by Salganik&#8217;s research.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/06/beyond-social-currency/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010 and SimInt 2010</title>
		<link>http://thenoisychannel.com/2010/06/27/sigir-2010-and-simint-2010/</link>
		<comments>http://thenoisychannel.com/2010/06/27/sigir-2010-and-simint-2010/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 00:21:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3178</guid>
		<description><![CDATA[I&#8217;m looking forward to attending SIGIR 2010 in a few weeks and particularly to the SimInt 2010 Workshop on the Automated Evaluation of Interactive Information Retrieval. I hope I get to see a little bit of the city of Geneva, but mostly I&#8217;m excited to spend the greater part of a week immersed in the [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m looking forward to attending <a href="http://www.sigir2010.org/">SIGIR 2010</a> in a few weeks and particularly to the <a href="http://www.dcs.gla.ac.uk/access/simint/">SimInt 2010</a> Workshop on the Automated Evaluation of Interactive Information Retrieval. I hope I get to see a little bit of the city of <a href="http://www.ville-geneve.ch/index.php?id=6675">Geneva</a>, but mostly I&#8217;m excited to spend the greater part of a week immersed in the global information retrieval community.</p>
<p>Of course I&#8217;ll blog about the conference, though I can&#8217;t promise it will be at quite the level of detail I managed <a href="http://thenoisychannel.com/2009/07/21/sigir-2009-day-1/">last year</a>. Also, I&#8217;m glad that SIGIR is continuing to have an <a href="http://www.sigir2010.org/doku.php?id=industry:program">industry track</a>, and I am impressed with the program that <a href="http://www.linkedin.com/in/davidjohnharper">David Harper</a> and Peter Schäuble have put together. Needless to say, I&#8217;m glad to not have the stress of being an organizer this year! Though I&#8217;ll put in an early plug for <a href="http://www.cikm2011.org/">CIKM 2011</a> in Glasgow, where I&#8217;ll be organizing the industry track with former co-worker <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Some SIGIR papers that caught my attention in the <a href="http://www.sigir2010.org/doku.php?id=program:sessions">program</a>:</p>
<ul>
<li><strong>Predicting Search Frustration<br />
</strong>Henry Feild, James Allan (University of Massachusetts Amherst), Rosie Jones (Yahoo! Labs)<br />
(looks like a follow-up to the first two authors&#8217; <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> paper on <a href="http://maroo.cs.umass.edu/pub/web/getpdf.php?id=897">Modeling Searcher Frustration</a>)</li>
<li><strong>Relevance and Ranking in Online Dating Systems<br />
</strong>Fernando Diaz, Donald Metzler, Sihem Amer-Yahia (Yahoo! Labs)</li>
<li><strong>On Statistical Analysis and Optimization of Information Retrieval Effectiveness Metrics<br />
</strong>Jun Wang, Jianhan Zhu (University College London)</li>
<li><strong>Is the Cranfield Paradigm Outdated? (keynote)<br />
</strong>Donna Harman (NIST)</li>
<li><strong>Interactive Retrieval Based on Faceted Feedback<br />
</strong>Lanbo Zhang, Yi Zhang (University of California at Santa Cruz)</li>
<li><strong>Do User Preferences and Evaluation Measures Line Up?<br />
</strong>Mark Sanderson, Monica Lestari Paramita, Paul Clough, Evangelos Kanoulas (University of Sheffield)</li>
<li><strong>Human Performance and Retrieval Precision Revisited<br />
</strong>Mark D. Smucker, Chandra Prakash Jethani (University of Waterloo)</li>
</ul>
<p>As for the <a href="http://www.dcs.gla.ac.uk/access/simint/">SimInt</a> workshop, it aims &#8220;to explore the use of Simulation of Interactions to enable automated evaluation of Interactive Information Retrieval Systems and Applications.&#8221; I&#8217;m very excited about this attempt to bridge the gap between <a href="http://trec.nist.gov/">TREC</a>/<a href="http://en.wikipedia.org/wiki/Cranfield_Experiments">Cranfield</a> and <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.3085&amp;rep=rep1&amp;type=pdf">IIR</a>/<a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> through simulation. Props to <a href="http://www.dcs.gla.ac.uk/~leif">Leif Azzopardi</a>, <a href="http://www.uta.fi/~likaja/">Kal Järvelin</a>, <a href="http://staff.science.uva.nl/~kamps/">Jaap Kamps</a>, and <a href="http://www.mansci.uwaterloo.ca/~msmucker/">Mark Smucker</a> for organizing it!</p>
<p>If you&#8217;re planning to attend SIGIR, please give me a shout! I plan to be there for the entire conference, and you&#8217;ll probably find me at the Google booth during some of the coffee breaks.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/06/27/sigir-2010-and-simint-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gridworks and Needlebase</title>
		<link>http://thenoisychannel.com/2010/06/20/gridworks-and-needlebase/</link>
		<comments>http://thenoisychannel.com/2010/06/20/gridworks-and-needlebase/#comments</comments>
		<pubDate>Sun, 20 Jun 2010 22:25:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3173</guid>
		<description><![CDATA[One of the big challenges of working with heterogeneous data is curating it. Below are introductions to two tools for doing so: Gridworks, developed by David Huynh, Stefano Mazzocchi, and their colleagues at Metaweb, the company behind Freebase. Needlebase, developed by Justin Boyan and colleagues at ITA Software, the company powering travel search for Kayak, Orbitz, and [...]]]></description>
			<content:encoded><![CDATA[<p>One of the big challenges of working with heterogeneous data is curating it. Below are introductions to two tools for doing so:</p>
<ul>
<li><a href="http://code.google.com/p/freebase-gridworks/">Gridworks</a>, developed by <a href="http://davidhuynh.net/">David Huynh</a>, <a href="http://www.betaversion.org/~stefano/">Stefano Mazzocchi</a>, and their colleagues at <a href="http://www.metaweb.com/">Metaweb</a>, the company behind <a href="http://www.freebase.com/">Freebase</a>.</li>
<li><a href="http://www.needlebase.com/">Needlebase</a>, developed by <a href="http://www.itasoftware.com/about_us/management.html#286">Justin Boyan</a> and colleagues at <a href="http://www.itasoftware.com/">ITA Software</a>, the company powering travel search for <a href="http://www.kayak.com/">Kayak</a>, <a href="http://www.orbitz.com/">Orbitz</a>, and others.</li>
</ul>
<p>If you&#8217;re concerned with building and maintaining collections of semi-structured data, or building your own technology for this purpose, I suggest you check out these state-of-the-art tools.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="360" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=10081183&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="480" height="360" src="http://vimeo.com/moogaloop.swf?clip_id=10081183&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/58Gzlq4zSDk&amp;hl=en_US&amp;fs=1&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/58Gzlq4zSDk&amp;hl=en_US&amp;fs=1&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/06/20/gridworks-and-needlebase/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Why Can&#8217;t We Just Use Prediction Markets?</title>
		<link>http://thenoisychannel.com/2010/06/09/why-cant-we-just-use-prediction-markets/</link>
		<comments>http://thenoisychannel.com/2010/06/09/why-cant-we-just-use-prediction-markets/#comments</comments>
		<pubDate>Wed, 09 Jun 2010 12:11:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3169</guid>
		<description><![CDATA[Prediction markets were all the rage a few years ago, two of the most notable being the Iowa Electronic Market forecasting electoral results and the now defunct Tradesports offering a similar platform for betting on sports events. There was even a proposal to have the US government run a prediction market for terrorist attacks. In [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Prediction_market">Prediction markets</a> were all the rage a few years ago, two of the most notable being the <a href="http://www.biz.uiowa.edu/iem/index.cfm">Iowa Electronic Market</a> forecasting electoral results and the now defunct <a href="http://en.wikipedia.org/wiki/TradeSports">Tradesports</a> offering a similar platform for betting on sports events. There was even a proposal to have the US government run a <a href="http://en.wikipedia.org/wiki/Information_Awareness_Office#Futures_Markets_Applied_to_Prediction_.28FutureMAP.29">prediction market for terrorist attacks</a>.</p>
<p>In a prediction market, any event with a quantifiable (e.g., binary) outcome can be converted into an asset. At any given time, the asset value corresponds to the market prediction of the probability of the outcome. Just as in any security market, participants determine the value through their buying and selling actions. In principle, this framework allows any event with a quantifiable outcome to be predicted by a marketplace.</p>
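<p>To make the price-as-probability idea concrete, here is a minimal sketch in Python. It assumes a hypothetical binary contract that pays a fixed amount (1.0 by default) if the event occurs and nothing otherwise; the function names are illustrative and do not come from any real market&#8217;s API.</p>

```python
def implied_probability(price, payout=1.0):
    """Probability implied by the price of a binary contract.

    Assumption: the contract pays `payout` if the event occurs, else 0.
    """
    if 0.0 >= payout:
        raise ValueError("payout must be positive")
    if price > payout or 0.0 > price:
        raise ValueError("price must lie between 0 and the payout")
    return price / payout


def expected_profit(price, believed_probability, payout=1.0):
    """Expected profit per contract for a buyer who holds the given belief."""
    return believed_probability * payout - price


# A contract trading at 0.62 implies the market puts the outcome at 62%.
market_view = implied_probability(0.62)

# A trader who believes the true probability is 70% expects to gain by buying;
# one who believes 50% expects to lose, and so has a reason to sell instead.
optimist_edge = expected_profit(0.62, 0.70)
pessimist_edge = expected_profit(0.62, 0.50)
```

<p>Under these assumptions, it is the buying and selling by traders with different beliefs that moves the price toward a consensus estimate.</p>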
<p>But, at least from my vantage point, prediction markets have not had a broad impact on decision making, despite all of the &#8220;anys&#8221; in the previous paragraph. Apart from political forecasting and sports gambling (and of course finance itself), I&#8217;m not aware of any groups outside of academia that invest significantly in the use of prediction markets. Sure, there&#8217;s the <a href="http://www.hsx.com/">Hollywood Stock Exchange</a> that applies the fantasy sports concept to the movie industry and even startup <a href="http://www.empireavenue.com/">Empire Avenue</a> that aspires to generalize this idea even further into an &#8220;online influence stock exchange&#8221;. Still, I think it&#8217;s safe to say that prediction markets have had limited traction to date.</p>
<p>Many people do, however, believe that we can harness the wisdom of crowds. In particular, we as consumers rely on reviews and recommendations to inform our decisions about what to buy, read, etc. Because those decisions have financial implications for sellers, the world of online reviews has an adversarial element, where review systems face manipulation by those who would shill their own products or services. As a result, it is never clear how much we as consumers should trust the reviews we read to be sincere, let alone useful.</p>
<p>Which brings me back to prediction markets. Unlike most venues for soliciting collective opinion, prediction markets offer a strong incentive for accuracy. Betting on whether readers will like a book is quite different than simply offering a review that asserts an opinion without any risk to the person making the assertion. It is possible to manipulate a prediction market (e.g., by flooding it with high bets), but <a href="http://freakonomics.blogs.nytimes.com/2008/10/02/manipulation-in-political-prediction-markets/">research</a> suggests that such manipulations are short-lived and in fact expose the manipulator to significant financial risk when the price re-stabilizes.</p>
<p>So why don&#8217;t we use prediction markets instead of relying on reviews and recommendations? Perhaps we should, and it&#8217;s just a matter of time until entrepreneurs build successful businesses around this idea. But I suspect that much of the value of user-generated content today comes from contributors not thinking in market terms. While using prediction markets could solve the problem of shill reviews, it might also scare off the altruists.</p>
<p>Still, it seems to me that we should look for more opportunities to incent accuracy. Even altruistic reviewers have an interest in establishing their credibility, at least if that credibility determines the propagation of the opinions they share (perhaps I&#8217;m conflating altruism with egotism). The challenge may be to implement a marketplace that deals in the social currency of reputation rather than the hard currency of cash&#8211;while avoiding the sort of virtual currency that many people see as meaningless.</p>
<p>Can we obtain the benefits of market dynamics and still take advantage of the less rational motivations that drive some of the best online reviews today? I hope there are people who feel incented to work on this problem!</p>
<p>Some previous posts for further reading:</p>
<ul>
<li><a title="Permanent Link to Thoughts About Online Reputation" href="http://thenoisychannel.com/2010/05/02/thoughts-about-online-reputation/">Thoughts About Online Reputation</a></li>
<li><a href="http://thenoisychannel.com/2009/08/22/payola-theres-an-app-for-that/">Payola? There&#8217;s An App For That!</a></li>
<li><a title="Are Links A Distraction?" href="http://thenoisychannel.com/2010/05/31/are-links-a-distraction/">Are Links A Distraction?</a> (please let me know if you like my attempt to keep links outside the main flow)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/06/09/why-cant-we-just-use-prediction-markets/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
		<item>
		<title>Are Links A Distraction?</title>
		<link>http://thenoisychannel.com/2010/05/31/are-links-a-distraction/</link>
		<comments>http://thenoisychannel.com/2010/05/31/are-links-a-distraction/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 01:15:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3166</guid>
		<description><![CDATA[Eric Andersen called my attention to a post by Nick Carr entitled &#8220;Experiments in delinkification&#8221;, in which Carr argues that links embedded in text are distracting, and that we&#8217;re better off treating them like the footnotes they evolved from and putting them in a block at the end of the text. It&#8217;s an interesting piece, [...]]]></description>
			<content:encoded><![CDATA[<p>Eric Andersen <a href="http://twitter.com/eric_andersen/status/15140732425">called my attention</a> to a post by Nick Carr entitled &#8220;<a href="http://www.roughtype.com/archives/2010/05/experiments_in.php">Experiments in delinkification</a>&#8221;, in which Carr argues that links embedded in text are distracting, and that we&#8217;re better off treating them like the footnotes they evolved from and putting them in a block at the end of the text. It&#8217;s an interesting piece, and I see the merits of his argument. Indeed, I remember trying to read a heavily annotated edition of Nabokov&#8217;s <em><a href="http://books.google.com/books?id=UJznorXbTuYC">Lolita</a></em>, and it was extremely hard to maintain the flow of reading the novel while turning every few seconds to read about every last <a href="http://en.wikipedia.org/wiki/Vladimir_Nabokov#Entomology">entomology</a> reference in the text.</p>
<p>Nonetheless, I feel that links supply context, and I&#8217;m a fan of keeping context nearby. Indeed, I find that clicking on a link incurs a much lower cognitive cost than flipping to the back of the book, searching for an endnote. I&#8217;ve had readers specifically thank me for including links to Wikipedia entries for technical terms. I assume those readers are fully capable of finding those Wikipedia entries themselves, but that they appreciate the convenience of the links.</p>
<p>Some of the commenters on Carr&#8217;s post suggest that we use technology to address this tension between preserving the reader&#8217;s focus and supplying nearby context. Specifically, we can use <a href="http://en.wikipedia.org/wiki/Cascading_Style_Sheets">CSS</a> and have a <a href="http://en.wikipedia.org/wiki/JavaScript">JavaScript</a> button that toggles the link style between visible and invisible. I like the idea of handing readers control of the presentation style, though I still think it&#8217;s important to pick a sensible default. At the very least, a document should be self-contained so that a reader can choose if and when to look at the material it cites. The document should also give credit where it&#8217;s due, linking to the material it cites in a way that is visible to people and search engines. Beyond that, I think it&#8217;s really a matter of author style.</p>
<p>Still, I&#8217;m curious what folks here&#8211;especially long-time readers&#8211;think. Do I link so heavily that it&#8217;s distracting? Would it be easier to read my posts if the links were in a block at the end? I write for you, so please let me know how I can make this blog better. I don&#8217;t have the resources to conduct <a href="http://en.wikipedia.org/wiki/Cognitive_load">cognitive load</a> experiments, but I&#8217;m very receptive to comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/31/are-links-a-distraction/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>HCIR 2010 Submission Deadlines Approaching</title>
		<link>http://thenoisychannel.com/2010/05/28/hcir-2010-submission-deadlines-approaching/</link>
		<comments>http://thenoisychannel.com/2010/05/28/hcir-2010-submission-deadlines-approaching/#comments</comments>
		<pubDate>Fri, 28 May 2010 18:05:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3162</guid>
		<description><![CDATA[Just a reminder to all of you HCIR people out there that the submission deadline for the HCIR 2010 Workshop on Human-Computer Interaction and Information Retrieval is rapidly approaching! Research papers and position papers are due on Monday, June 14th, and HCIR Challenge reports are due on Monday, July 9th. We&#8217;re looking forward to an exciting workshop [...]]]></description>
			<content:encoded><![CDATA[<p>Just a reminder to all of you <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> people out there that the submission deadline for the <a href="http://www.hcir2010.org/">HCIR 2010</a> Workshop on Human-Computer Interaction and Information Retrieval is rapidly approaching! Research papers and position papers are due on Monday, June 14th, and <a href="challenge.html">HCIR Challenge</a> reports are due on Monday, July 9th. We&#8217;re looking forward to an exciting workshop co-located with the Information Interaction in Context Symposium (<a href="http://www.iiix2010.org/">IIiX 2010</a>).</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/28/hcir-2010-submission-deadlines-approaching/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Estimating the Query Difficulty for Information Retrieval</title>
		<link>http://thenoisychannel.com/2010/05/23/estimating-the-query-difficulty-for-information-retrieval/</link>
		<comments>http://thenoisychannel.com/2010/05/23/estimating-the-query-difficulty-for-information-retrieval/#comments</comments>
		<pubDate>Mon, 24 May 2010 01:59:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3156</guid>
		<description><![CDATA[The other day, I received a surprise package in the mail: a copy of IBM researchers David Carmel and Elad Yom-Tov&#8217;s newly published lecture on &#8220;Estimating the Query Difficulty for Information Retrieval&#8221;. I wasn&#8217;t even aware that this book was being written, so I&#8217;m especially appreciative of the publisher&#8217;s kindness in sending me a copy. If you [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.amazon.com/Estimating-Difficulty-Information-Retrieval-Synthesis/dp/160845357X/"><img class="alignnone" title="Estimating the Query Difficulty for Information Retrieval" src="http://ecx.images-amazon.com/images/I/51llDG%2BRb6L._SL500_AA300_.jpg" alt="" width="300" height="300" /></a></p>
<p>The other day, I received a surprise package in the mail: a copy of IBM researchers <a href="http://domino.research.ibm.com/comm/research_people.nsf/pages/carmel.index.html">David Carmel</a> and <a href="https://researcher.ibm.com/researcher/view.php?person=il-YOMTOV">Elad Yom-Tov</a>&#8217;s newly published lecture on &#8220;<a href="http://www.morganclaypool.com/doi/abs/10.2200/S00235ED1V01Y201004ICR015">Estimating the Query Difficulty for Information Retrieval</a>&#8221;. I wasn&#8217;t even aware that this book was being written, so I&#8217;m especially appreciative of the publisher&#8217;s kindness in sending me a copy.</p>
<p>If you liked <a href="http://www.vf.utwente.nl/~hauffc/">Claudia Hauff</a>&#8217;s recent dissertation on &#8220;<a href="http://eprints.eemcs.utwente.nl/17338/">Predicting the Effectiveness of Queries and Retrieval Systems</a>&#8221; (cf. my blog post on how &#8220;<a href="http://thenoisychannel.com/2010/03/07/not-all-queries-are-created-equal/">Not All Queries Are Created Equal</a>&#8221;), then you&#8217;ll love this compact lecture that reviews the work on pre-retrieval and post-retrieval prediction of query performance. It covers <a href="http://ciir.cs.umass.edu/pubfiles/ir-241.pdf">query clarity</a>, <a href="http://maroo.cs.umass.edu/pdf/IR-532.pdf">ranking robustness</a>, <a href="http://www.science.uva.nl/~mdr/Publications/Files/ecir2008-coherence.pdf">query coherence</a>, and much more.</p>
<p>I&#8217;m a big fan of the <a href="http://www.morganclaypool.com/">Morgan &amp; Claypool</a> series of <a href="http://www.morganclaypool.com/toc/icr/1/1">Synthesis Lectures on Information Concepts, Retrieval, and Services</a>, though I&#8217;m admittedly <a href="http://thenoisychannel.com/faceted-search-the-book/">biased</a>. Still, I think these books are an excellent way to get an overview of a subject, and Carmel and Yom-Tov&#8217;s book delivers wonderfully. For those not lucky enough to receive free copies in the mail, I recommend <a href="http://www.amazon.com/Estimating-Difficulty-Information-Retrieval-Synthesis/dp/160845357X/">Amazon</a>, which is selling it for less than $24.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/23/estimating-the-query-difficulty-for-information-retrieval/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Elastic Lists for Faceted Search &#8212; Now Open Source!</title>
		<link>http://thenoisychannel.com/2010/05/20/elastic-lists-for-faceted-search-now-open-source/</link>
		<comments>http://thenoisychannel.com/2010/05/20/elastic-lists-for-faceted-search-now-open-source/#comments</comments>
		<pubDate>Thu, 20 May 2010 23:10:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3151</guid>
		<description><![CDATA[If you like faceted search and are interested in design patterns for it, I encourage you to check out Moritz Stefaner&#8217;s work on elastic lists. Here is his description: Elastic lists allow to navigate large, multi-dimensional info spaces with just a few clicks, never letting you run into situations with zero results. They enhance traditional [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://moritz.stefaner.eu/projects/elastic-lists/"><img class="alignnone" title="Elastic Lists" src="http://moritz.stefaner.eu/public/images/peace.gif" alt="" width="467" height="172" /></a></p>
<p>If you like <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> and are interested in <a href="http://www.alistapart.com/articles/design-patterns-faceted-navigation/">design patterns</a> for it, I encourage you to check out <a href="http://moritz.stefaner.eu/">Moritz Stefaner</a>&#8217;s work on <a href="http://moritz.stefaner.eu/projects/elastic-lists/">elastic lists</a>. Here is his description:</p>
<blockquote><p>Elastic lists allow to navigate large, multi-dimensional info spaces with just a few clicks, never letting you run into situations with zero results. They enhance traditional UI approaches for facet browsers by visualizing weight proportions, animated transitions, emphasis of characteristic values and sparkline visualizations.</p></blockquote>
<p>And the good news is that elastic lists are now an <a href="http://github.com/MoritzStefaner/Elastic-Lists">open source project</a>, available under an <a href="http://www.apache.org/licenses/LICENSE-2.0.html">Apache 2.0 license</a>. Also available for free is a <a href="http://thenoisychannel.com/2009/09/25/free-chapter-on-faceted-search-user-interface-design/">book chapter</a> on faceted search user interface design that Stefaner co-authored last year.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/20/elastic-lists-for-faceted-search-now-open-source/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>People You May Know &#8212; Now With Faceted Search!</title>
		<link>http://thenoisychannel.com/2010/05/15/people-you-may-know-now-with-faceted-search/</link>
		<comments>http://thenoisychannel.com/2010/05/15/people-you-may-know-now-with-faceted-search/#comments</comments>
		<pubDate>Sat, 15 May 2010 20:50:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3148</guid>
		<description><![CDATA[I was just looking at LinkedIn and found myself pleasantly surprised by a minor UI improvement in the &#8220;People You May Know&#8221; widget: as you delete people you don&#8217;t know, the widget now updates without your having to go to another page or refresh the home page. Curious, I looked to see if LinkedIn had [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="512" height="308" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/rDCYg1loAxQ&amp;color1=0xb1b1b1&amp;color2=0xd0d0d0&amp;hl=en_US&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="512" height="308" src="http://www.youtube.com/v/rDCYg1loAxQ&amp;color1=0xb1b1b1&amp;color2=0xd0d0d0&amp;hl=en_US&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>I was just looking at <a href="http://linkedin.com/">LinkedIn</a> and found myself pleasantly surprised by a minor UI improvement in the &#8220;People You May Know&#8221; widget: as you delete people you don&#8217;t know, the widget now updates without your having to go to another page or refresh the home page. Curious, I looked to see if LinkedIn had blogged about it.</p>
<p>What I discovered was an even nicer surprise: LinkedIn now connects the People You May Know feature to its <a href="http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/">faceted search</a> interface. Indeed, they <a href="http://blog.linkedin.com/2010/05/12/linkedin-pymk/">blogged about it</a> earlier this week. Props to LinkedIn for continuing to advance the state of the art in people search!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/15/people-you-may-know-now-with-faceted-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Google Job Experiment</title>
		<link>http://thenoisychannel.com/2010/05/15/the-google-job-experiment/</link>
		<comments>http://thenoisychannel.com/2010/05/15/the-google-job-experiment/#comments</comments>
		<pubDate>Sat, 15 May 2010 19:19:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3144</guid>
		<description><![CDATA[This is just so brilliant that I had to post it here. I&#8217;ve blogged in the past about alerting spam, but this guy took the idea to a new level, with great return on investment. Perhaps the news about this story will make the tactic more popular and thus less effective through dilution. Still, it&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p><object width="512" height="308"><param name="movie" value="http://www.youtube.com/v/7FRwCs99DWg&#038;hl=en_US&#038;fs=1&#038;rel=0"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/7FRwCs99DWg&#038;hl=en_US&#038;fs=1&#038;rel=0" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="512" height="308"></embed></object></p>
<p>This is just so brilliant that I had to post it here. I&#8217;ve blogged in the past about <a href="http://thenoisychannel.com/2008/10/13/alerting-push-or-pull/">alerting spam</a>, but this guy took the idea to a new level, with great return on investment. Perhaps the <a href="http://www.google.com/search?tbs=nws:1&#038;q=%22google+job+experiment%22">news</a> about this story will make the tactic more popular and thus less effective through dilution. Still, it&#8217;s fun to see how people exploit inefficiencies in attention markets.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/15/the-google-job-experiment/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Slides from Enterprise Search Summit Keynotes</title>
		<link>http://thenoisychannel.com/2010/05/12/slides-from-enterprise-search-summit-keynotes/</link>
		<comments>http://thenoisychannel.com/2010/05/12/slides-from-enterprise-search-summit-keynotes/#comments</comments>
		<pubDate>Thu, 13 May 2010 03:00:58 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3140</guid>
		<description><![CDATA[Here are the slides from Marti Hearst&#8217;s and Peter Morville&#8217;s keynote presentations at the Enterprise Search Summit: Designing Search For Humans Search &#38; Discovery Patterns]]></description>
			<content:encoded><![CDATA[<p>Here are the slides from <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>&#8217;s and <a href="http://semanticstudios.com/about/">Peter Morville</a>&#8217;s keynote presentations at the <a href="http://www.enterprisesearchsummit.com/2010/">Enterprise Search Summit</a>:</p>
<div id="__ss_4075656" style="width: 425px;"><strong><a title="Designing Search For Humans" href="http://www.slideshare.net/marti_hearst/designing-search-for-humans">Designing Search For Humans</a></strong><object id="__sse4075656" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=enterprisesearch2010-100512210512-phpapp02&amp;stripped_title=designing-search-for-humans" /><param name="name" value="__sse4075656" /><param name="allowfullscreen" value="true" /><embed id="__sse4075656" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=enterprisesearch2010-100512210512-phpapp02&amp;stripped_title=designing-search-for-humans" name="__sse4075656" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<div style="padding: 5px 0 12px;"></div>
<div style="width: 425px;"><strong><a title="Search &amp; Discovery Patterns" href="http://www.slideshare.net/morville/search-discovery-patterns">Search &amp; Discovery Patterns</a></strong><object id="__sse4040702" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=searchpatterns-100510134608-phpapp01&amp;stripped_title=search-discovery-patterns" /><param name="name" value="__sse4040702" /><param name="allowfullscreen" value="true" /><embed id="__sse4040702" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=searchpatterns-100510134608-phpapp01&amp;stripped_title=search-discovery-patterns" name="__sse4040702" allowscriptaccess="always" allowfullscreen="true"></embed></object>
<div style="padding: 5px 0 12px;"></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/12/slides-from-enterprise-search-summit-keynotes/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Something Different from Google New York</title>
		<link>http://thenoisychannel.com/2010/05/12/something-different-from-google-new-york/</link>
		<comments>http://thenoisychannel.com/2010/05/12/something-different-from-google-new-york/#comments</comments>
		<pubDate>Wed, 12 May 2010 20:33:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3137</guid>
		<description><![CDATA[Earlier this week, I mentioned that my colleagues here at Google New York were working on cool stuff. Today some of them officially blogged about it! Check out today&#8217;s official Google blog post about &#8220;Understanding the web to find short answers and &#8216;something different&#8217;&#8221; by engineer John Provine, which talks about Google&#8217;s latest work in question [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thenoisychannel.com/2010/05/09/celebrating-six-months-at-google-new-york/">Earlier this week</a>, I mentioned that my colleagues here at Google New York were working on cool stuff. Today some of them officially blogged about it! Check out today&#8217;s official Google blog post about &#8220;<a href="http://googleblog.blogspot.com/2010/05/understanding-web-to-find-short-answers.html">Understanding the web to find short answers and &#8216;something different</a>&#8217;&#8221; by engineer John Provine, which talks about Google&#8217;s latest work in <a href="http://en.wikipedia.org/wiki/Question_answering">question answering</a> and <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>.</p>
<p>Examples:</p>
<ul>
<li>A question like [<a href="http://www.google.com/search?hl=en&amp;q=gdp+of+usa">gdp of usa</a>] returns a chart derived from <a href="http://www.google.com/publicdata?ds=wb-wdi&amp;met=ny_gdp_mktp_cd&amp;idim=country:USA&amp;dl=en&amp;hl=en&amp;q=gdp+of+usa">public data</a>.</li>
<li>Searching for [<a href="http://www.google.com/search?hl=en&amp;q=dora">dora</a>] (yes, I have a <a href="http://www.flickr.com/photos/24264445@N05/">2-year-old</a>!) suggests elmo, mickey mouse, barney, scooby doo, and bratz as &#8220;something different&#8221;.</li>
</ul>
<p>I&#8217;m thrilled to see my colleagues&#8217; work getting more visibility.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/12/something-different-from-google-new-york/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Peter Morville&#8217;s Keynote at Enterprise Search Summit</title>
		<link>http://thenoisychannel.com/2010/05/12/peter-morvilles-keynote-at-enterprise-search-summit/</link>
		<comments>http://thenoisychannel.com/2010/05/12/peter-morvilles-keynote-at-enterprise-search-summit/#comments</comments>
		<pubDate>Wed, 12 May 2010 16:47:41 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3132</guid>
		<description><![CDATA[This morning&#8217;s Enterprise Search Summit keynote was by Peter Morville, who has written a number of best-selling books about information architecture. I&#8217;ve known Peter for a while and had the pleasure of serving as a reviewer for his latest book, Search Patterns, but had never seen him present this material live. As you can see [...]]]></description>
			<content:encoded><![CDATA[<p>This morning&#8217;s <a href="http://www.enterprisesearchsummit.com/2010/">Enterprise Search Summit</a> keynote was by <a href="http://semanticstudios.com/about/">Peter Morville</a>, who has written a number of best-selling books about <a href="http://en.wikipedia.org/wiki/Information_architecture">information architecture</a>. I&#8217;ve known Peter for a while and had the pleasure of serving as a reviewer for his latest book, <em><a href="http://searchpatterns.org/">Search Patterns</a></em>, but had never seen him present this material live. As you can see from his <a href="http://www.slideshare.net/morville/search-discovery-patterns">slides</a>, Peter&#8217;s presentation style is incredibly visual&#8211;almost all of his slides are screenshots or illustrations explaining his concepts. It makes for a great presentation, but a difficult text summary!</p>
<p>The focus of his talk, naturally, was patterns. Specifically, he advocated that we take the behavior patterns of information seekers that library and information scientists have been studying for years, and use them to inform design patterns for search user interfaces.</p>
<p>One point he raised that deserves a deeper dive: a number of media environments (mobile, kiosk, TV) push people to browse, partly because of limitations of the medium but also to take advantage of the novelty and relative lack of user habits. Unfortunately, browsing doesn&#8217;t always scale in those environments, so search is usually available as a contingency.</p>
<p>Interestingly, while Peter promotes rich interfaces in many of his patterns, he noted that great results ranking plus speedy response (he uses Google &#8220;classic&#8221; as his example) does allow users to rapidly reformulate their queries while staying in the flow of the information seeking experience. He returned to Google later in his talk, noting that the new interface goes beyond ranking to support a richer user interaction.</p>
<p>And, like me and <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a> (yesterday&#8217;s <a href="http://thenoisychannel.com/2010/05/11/marti-hearsts-keynote-at-enterprise-search-summit/">keynote</a>), Peter advocates <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted navigation</a> (I won&#8217;t quibble on whether to call it navigation or search) as his favorite search design pattern. He uses the <a href="http://www.lib.ncsu.edu/">NCSU library</a> as an example not only of a great implementation but also of an organization that continues to experiment with incremental design changes. He also showed faceted search examples from other domains, including <a href="http://amazon.com/">Amazon</a> and <a href="http://buzzillions.com/">Buzzillions</a>.</p>
<p>Other patterns he discussed included <a href="http://en.wikipedia.org/wiki/Question_answering">question answering</a> (his example being <a href="http://thenoisychannel.com/2009/03/31/wolfram-alpha-first-hand-impressions/">Wolfram Alpha</a>) and decision making (his example being <a href="http://hunch.com/">Hunch</a>). He didn&#8217;t go deep on these, but rather invited the audience to consider a broad palette of strategies for supporting information seeking. Indeed, when I asked him about question answering, he conceded that he was a skeptic and preferred a conversational (i.e., <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>) approach akin to a librarian&#8217;s <a href="http://en.wikipedia.org/wiki/Reference_interview">reference interview</a>.</p>
<p>His closing note was about bridging the gap between physical and digital information, where he offered a potpourri of examples (from <a href="http://www.redbox.com/">Redbox</a> to a <a href="http://www.botanicalls.com/kits/">tweeting plant</a>). I <a href="http://thenoisychannel.com/2010/05/09/celebrating-six-months-at-google-new-york/">work in local search</a>, so in my case he&#8217;s preaching to the converted. But I think he&#8217;s right that everything is only recently coming together&#8211;specifically, the ubiquity of digital data on the internet and of mobile devices in the physical world that can both consume and produce that data. Many of us take these developments for granted, but it&#8217;s important that we adapt our approach to search to address what is a very recent phenomenon.</p>
<p>Fun stuff! I didn&#8217;t get to attend the rest of the summit, but I encourage you to check out the tweet stream at <a href="http://search.twitter.com/search?q=%23ESS10">#ESS10</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/12/peter-morvilles-keynote-at-enterprise-search-summit/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Marti Hearst&#8217;s Keynote at Enterprise Search Summit</title>
		<link>http://thenoisychannel.com/2010/05/11/marti-hearsts-keynote-at-enterprise-search-summit/</link>
		<comments>http://thenoisychannel.com/2010/05/11/marti-hearsts-keynote-at-enterprise-search-summit/#comments</comments>
		<pubDate>Tue, 11 May 2010 20:01:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3127</guid>
		<description><![CDATA[The Enterprise Search Summit is taking place in New York this week, and I was lucky to be able to attend Marti Hearst&#8217;s opening keynote this morning about designing search for humans. If you&#8217;ve read her book or heard her present its material, then you&#8217;re probably familiar with the pitch she made. Still, it&#8217;s great [...]]]></description>
			<content:encoded><![CDATA[<p>The Enterprise Search Summit is taking place in New York this week, and I was lucky to be able to attend Marti Hearst&#8217;s opening keynote this morning about designing search for humans. If you&#8217;ve read her <a href="http://searchuserinterfaces.com/book/">book</a> or heard her present its material, then you&#8217;re probably familiar with the pitch she made. Still, it&#8217;s great to hear her present it live to a very non-academic audience.</p>
<p>Her major take-aways:</p>
<ul>
<li>The user&#8217;s emotional response is a key aspect of the <a href="http://en.wikipedia.org/wiki/Information_seeking">information seeking</a> experience.</li>
<li>There is a double vocabulary problem: there are different ways to express the same concept (cf. <a href="http://furnas.people.si.umich.edu/Papers/vocab.paper.pdf">Furnas et al.</a>), and users stubbornly anchor on initial query terms (cf. <a href="http://en.wikipedia.org/wiki/Anchoring">Kahneman, Tversky, et al.</a>).</li>
<li><a href="http://en.wikipedia.org/wiki/Recognition_memory">Recognition</a> is easier than <a href="http://en.wikipedia.org/wiki/Recall_(memory)">recall</a>, so interfaces need to support the recognition process.</li>
<li>Don&#8217;t <a href="http://en.wikipedia.org/wiki/Personalization">personalize</a> search, <a href="http://en.wikipedia.org/wiki/Social_search">socialize</a> it!</li>
</ul>
<p>She peppered her talk with concrete examples and scholarly references. Given that her <a href="http://searchuserinterfaces.com/book/">book</a> is available online for free, I won&#8217;t try to replicate them all here! Still, I&#8217;ll single out two <a href="http://thenoisychannel.com/the-noisy-community/">Noisy Community</a> members: FXPAL researchers <a href="http://fxpal.com/?p=jeremy">Jeremy Pickens</a> and <a href="http://fxpal.com/?p=gene">Gene Golovchinsky</a> (for their SIGIR 2008 work on <a href="http://fxpal.com/publications/FXPAL-PR-08-460.pdf">collaborative exploratory search</a>) and user experience designer <a href="http://www.designcaffeine.com/about/">Greg Nudelman</a> for his proposal of <a href="http://www.boxesandarrows.com/view/faceted-finding-with">faceted breadcrumbs</a> as a search user interface.</p>
<p>If you missed her live, you can find a video of a <a href="http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/">tech talk</a> she gave at Google a few months ago. You can also check out the conference tweet-stream at <a href="http://search.twitter.com/search?q=%23ESS10">#ESS10</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/11/marti-hearsts-keynote-at-enterprise-search-summit/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Celebrating Six Months at Google New York</title>
		<link>http://thenoisychannel.com/2010/05/09/celebrating-six-months-at-google-new-york/</link>
		<comments>http://thenoisychannel.com/2010/05/09/celebrating-six-months-at-google-new-york/#comments</comments>
		<pubDate>Mon, 10 May 2010 01:17:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3119</guid>
		<description><![CDATA[Today I celebrate six months of working at Google. I&#8217;m having a great time, and I wanted to take a moment to share a bit about my experience thus far. First, a bit about Google&#8217;s New York office. It is a major office&#8211;in fact, Google&#8217;s second largest office and an R&#38;D powerhouse. New York Googlers [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.eileensfortcollins.com/images/1066_-_Crayons_-_Half_Birthday.JPG"><img class="aligncenter" title="Happy Half-Birthday!" src="http://www.eileensfortcollins.com/images/1066_-_Crayons_-_Half_Birthday.JPG" alt="" width="253" height="263" /></a></p>
<p>Today I celebrate six months of working at Google. I&#8217;m having a great time, and I wanted to take a moment to share a bit about my experience thus far.</p>
<p>First, a bit about Google&#8217;s New York office. It is a major office&#8211;in fact, Google&#8217;s second-largest office and an R&amp;D powerhouse. New York Googlers played key roles in two of Google&#8217;s recent developments: <a href="http://googleblog.blogspot.com/2009/12/relevance-meets-real-time-web.html">real-time search</a> and the <a href="http://googleblog.blogspot.com/2010/05/spring-metamorphosis-googles-new-look.html">results page re-design</a>. Less visibly but no less importantly, engineers at Google New York <a href="http://www.google.com/support/jobs/bin/static.py?page=why-ny-ny.html#showcasing">contribute to every major aspect of Google&#8217;s technology</a>.</p>
<p>My own contributions have been toward improving local search. Local search represents an increasingly large and important fraction of people&#8217;s online information seeking. At first glance, it might seem to be an easier problem than general web search, since there are only <a href="http://googleblog.blogspot.com/2010/04/introducing-google-places.html">tens of millions of places</a>, compared to <a href="http://en.wikipedia.org/wiki/World_Wide_Web#Statistics">tens of billions of web pages</a>. But local search poses unique challenges&#8211;from data quality to ranking to supporting <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>.</p>
<p>Another area that I&#8217;m especially excited about is the work on structured data. The <a href="http://www.google.com/squared/search?q=magpie+team">Magpie team</a> is based in New York. You may be familiar with them as the team that developed <a href="http://www.google.com/squared">Google Squared</a>&#8211;which powers the &#8220;<a href="http://www.google.com/support/websearch/bin/answer.py?hl=en&amp;answer=180739">something different</a>&#8221; links for web search.</p>
<p>Of course, there is far more going on at Google New York than I could hope to summarize in a blog post&#8211;including lots that I can&#8217;t talk about yet. But I hope this at least gives you a taste of what it&#8217;s like to work for the world&#8217;s best search company in the world&#8217;s best city (yes, I know I&#8217;ll take some flak for at least one of those superlatives).</p>
<p>If you&#8217;re interested in learning more, please don&#8217;t hesitate to reach out to me. I may have drunk the kool-aid, but I promise to be candid and as open as I can about what it&#8217;s like to go through the hiring process and what awaits you at the end of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/09/celebrating-six-months-at-google-new-york/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>TunkRank scores added to FluidDB</title>
		<link>http://thenoisychannel.com/2010/05/05/tunkrank-scores-added-to-fluiddb/</link>
		<comments>http://thenoisychannel.com/2010/05/05/tunkrank-scores-added-to-fluiddb/#comments</comments>
		<pubDate>Thu, 06 May 2010 01:39:42 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3116</guid>
		<description><![CDATA[For those keeping track of TunkRank, I encourage you to check out FluidDB, which just added TunkRank scores to its feature set. That lets you do cool things like find out which users I follow have a TunkRank score over 40. You can also read what Jason Adams has to say about it here. Speaking [...]]]></description>
			<content:encoded><![CDATA[<p>For those keeping track of <a href="http://tunkrank.com/">TunkRank</a>, I encourage you to check out <a href="http://fluidinfo.com/fluiddb">FluidDB</a>, which just <a href="http://blogs.fluidinfo.com/fluidDB/2010/05/06/tunkrank-scores-added-to-fluiddb/">added TunkRank scores</a> to its feature set. That lets you do cool things like find out <a href="http://tickery.net/?query=has%20twitter.com/friends/dtunkelang%20and%20tunkrank.com/score%20%3E%2040&amp;sort=screen_name&amp;icon=medium&amp;tab=advanced">which users I follow have a TunkRank score over 40</a>. You can also read what <a href="http://mendicantbug.com/">Jason Adams</a> has to say about it <a href="http://mendicantbug.com/2010/05/05/tunkrank-meet-tickery/">here</a>.</p>
<p>Speaking of Jason, check out the latest improvements he&#8217;s made to the <a href="http://tunkrank.com/">TunkRank</a> interface. Pretty slick! To learn more about the TunkRank measure of Twitter influence / authority, check out <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">this post</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/05/tunkrank-scores-added-to-fluiddb/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s New Look</title>
		<link>http://thenoisychannel.com/2010/05/05/googles-new-look/</link>
		<comments>http://thenoisychannel.com/2010/05/05/googles-new-look/#comments</comments>
		<pubDate>Wed, 05 May 2010 18:59:25 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3111</guid>
		<description><![CDATA[I wish I could take even a gram of credit for this! I&#8217;m really proud of my colleagues for rolling out this new design that encourages and facilitates exploratory search. Go HCIR!]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="423" height="254" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/C-rnxNFRAQA&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="423" height="254" src="http://www.youtube.com/v/C-rnxNFRAQA&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>I wish I could take even a gram of credit for this! I&#8217;m really proud of my colleagues for rolling out this <a href="http://googleblog.blogspot.com/2010/05/spring-metamorphosis-googles-new-look.html">new design</a> that encourages and facilitates <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>. Go <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/05/googles-new-look/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Thoughts About Online Reputation</title>
		<link>http://thenoisychannel.com/2010/05/02/thoughts-about-online-reputation/</link>
		<comments>http://thenoisychannel.com/2010/05/02/thoughts-about-online-reputation/#comments</comments>
		<pubDate>Sun, 02 May 2010 05:07:03 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3102</guid>
		<description><![CDATA[Sorry for the long delay between posts. Fortunately the blogosphere has been providing ample reading material about the saga of the lost iPhone and the war of words between Apple and Adobe. I&#8217;ve been doing some reading myself. Specifically, I just read F. Randall (&#8220;Randy&#8221;) Farmer and Bryce Glass&#8217;s recent book on Building Web Reputation Systems. [...]]]></description>
			<content:encoded><![CDATA[<p>Sorry for the long delay between posts. Fortunately the blogosphere has been providing ample reading material about the saga of the <a href="http://gizmodo.com/5520471/the-tale-of-apples-next-iphone">lost iPhone</a> and the war of words between <a href="http://www.apple.com/hotnews/thoughts-on-flash/">Apple</a> and <a href="http://blogs.adobe.com/conversations/2010/04/moving_forward.html">Adobe</a>.</p>
<p>I&#8217;ve been doing some reading myself. Specifically, I just read <a href="http://www.oreillynet.com/pub/au/3900">F. Randall (&#8220;Randy&#8221;) Farmer</a> and <a href="http://www.oreillynet.com/pub/au/3901">Bryce Glass</a>&#8217;s recent book on <em><a href="http://oreilly.com/catalog/9780596159795/">Building Web Reputation Systems</a></em>. Given that I&#8217;ve been thinking a lot about online reviews and reputation systems (e.g., this recent <a href="http://thenoisychannel.com/2010/03/20/can-we-build-a-distributed-trust-network/">post</a>), I wanted to hear what the experts had to say.</p>
<p>In the book, Farmer and Glass categorize the motivations for user participation as altruistic, commercial, and egocentric. Commercial motives are clearly the most problematic: a review site loses credibility if commercially motivated reviews are disguised to mask their commercial motives. Most review site scandals arise from this kind of deception (e.g., <a href="http://hnn.us/articles/125694.html">this one</a>, <a href="http://thenoisychannel.com/2009/01/17/sell-your-integrity-for-065/">this one</a>, and <a href="http://www.mobilecrunch.com/2009/08/22/cheating-the-app-store-pr-firm-has-interns-post-positive-reviews-for-clients/">this one</a>).</p>
<p>Sincerity is a necessary but insufficient condition for a review to be valuable to the person who reads it. There is still the &#8220;people like me&#8221; problem: sincere reviewers may still be uninformed, unreasonably biased, or may simply not share our tastes. User-generated content is an inherently subjective medium.</p>
<p>Given these challenges, it&#8217;s a wonder that online review sites work at all! And yet there are real success stories. My personal favorite is <a href="http://www.amazon.com/">Amazon.com</a>. While it has had its <a href="http://techcrunch.com/2010/03/22/im-not-kidding-do-it-now/">hiccups</a>, Amazon nonetheless serves as a poster child for creating value by aggregating user opinions about products.</p>
<p>Amazon has a well-designed <a href="http://www.amazon.com/gp/help/customer/display.html/ref=help_search_1-1?nodeId=16465311">review policy</a> that gets many key elements right:</p>
<ul>
<li>Reviewers have identities tied to purchasing history. That encourages disclosure (people use their real names) and discourages abuse.</li>
<li>The reviews themselves&#8211;and even comments in discussion threads about individual reviews&#8211;are themselves reviewed as helpful or not. That may seem overly meta, but it does a lot to mitigate information overload.</li>
<li>Grounding in objective information (product content, sales rank) reduces the ability to manipulate product perception through reviews.</li>
</ul>
<p>The system isn&#8217;t perfect, but it&#8217;s good enough to be very useful.</p>
<p>But products aren&#8217;t the only reputable entities, to use Farmer and Glass&#8217;s term. What about service businesses, such as restaurants and gyms? Or people?</p>
<p>If Amazon exemplifies online product reviews, then <a href="http://yelp.com/">Yelp</a> is the canonical example of a review site for service businesses. And, despite its own share of <a href="http://www.wired.com/threatlevel/2010/02/yelp-sued-for-alleged-extortion/">controversy</a>, it is quite successful. But I dare say not quite as successful as Amazon. Part of the problem is that its demographics are less representative of the general online population (here&#8217;s what Quantcast says about <a href="http://www.quantcast.com/yelp.com">Yelp</a> and <a href="http://www.quantcast.com/amazon.com">Amazon</a> demographics for their US users). Also, there&#8217;s more variance in experiencing a service than in experiencing a product.</p>
<p>But Yelp has also had a credibility problem regarding which reviews they allow to be published. Perhaps the root of this problem is that Yelp&#8217;s business model depends on paid advertising from the businesses reviewed on the site, while businesses would much rather have unpaid positive reviews. In contrast, Amazon makes its money by selling products&#8211;which at least makes it appear more evenhanded.</p>
<p>But neither Amazon nor Yelp has touched the third rail of online reputation: people. <a href="http://linkedin.com/">LinkedIn</a> dabbles in this space by allowing its members to review one another, but reviewees have veto power over reviews&#8211;making the review graph more of a mutual admiration society.</p>
<p>A recent startup, <a href="http://thenoisychannel.com/2010/04/01/get-unvarnished/">Unvarnished</a>, is trying to create a review site with teeth. Farmer argues on his <a href="http://buildingreputation.com/writings/2010/04/dont_display_negative_karma_re.html">blog</a> that Unvarnished is breaking some major rules:</p>
<ul>
<li>It displays negative karma&#8211;that is, it allows people to write negative reviews of one another and displays those reviews.</li>
<li>The reviews are not clearly tied to context (e.g., were the reviewer and reviewee co-workers?).</li>
<li>The anonymity of reviewers does not incent altruistic or even egocentric behavior, and is thus a recipe for abuse.</li>
</ul>
<p>I&#8217;m not as down on Unvarnished as Farmer, but I agree it will have an uphill battle to succeed. Ironically, for all of the public concern about Unvarnished becoming a trollfest, the reviews skew strongly positive. This is probably an artifact of how Unvarnished is growing its membership: current users ask friends to review them.</p>
<p>I agree most with Farmer that Unvarnished&#8217;s incentive structure seems problematic. A person&#8217;s friends will probably be inclined to write positive reviews, and may even be annoyed at having to write them anonymously. A person&#8217;s enemies may be inclined to write negative reviews as a form of attack or revenge. But it&#8217;s less clear what will incent people to write accurate reviews&#8211;or what will signal to readers that a review is trustworthy.</p>
<p>All in all, I think that these are early days in the online reputation space, and that there is ample room for innovation. Facebook&#8217;s recent release of &#8220;<a href="http://developers.facebook.com/docs/reference/plugins/like">like buttons</a>&#8221; is an ambitious attempt to boil the ocean of &#8220;social objects&#8221;. A best poster award at the recent <a href="http://www2010.org/www/">WWW 2010</a> conference went to Paul Dütting, Monika Henzinger, and Ingmar Weber&#8217;s &#8220;How much is your Personal Recommendation Worth&#8221;.  Hopefully all of these attempts to research and innovate will lead to a world where we can derive real value from others&#8217; opinions and feel incented to contribute our own.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/05/02/thoughts-about-online-reputation/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Qui, Quae, Quora</title>
		<link>http://thenoisychannel.com/2010/04/19/qui-quae-quora/</link>
		<comments>http://thenoisychannel.com/2010/04/19/qui-quae-quora/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 22:03:32 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3098</guid>
		<description><![CDATA[A friend of mine at Quora invited me into their private beta a couple of weeks ago, and by now I suspect that many of you are using it&#8211;especially since I&#8217;ve somehow managed to be the top hit for [quora invite]. Speaking of which, I appreciate that those of you with spare invites have continued [...]]]></description>
			<content:encoded><![CDATA[<p>A friend of mine at <a href="http://www.quora.com/">Quora</a> invited me into their private beta a couple of weeks ago, and by now I suspect that many of you are using it&#8211;especially since I&#8217;ve somehow managed to be the <a href="http://thenoisychannel.com/2010/03/28/want-a-quora-invite/">top hit</a> for [<a href="http://www.google.com/search?q=quora+invite">quora invite</a>]. Speaking of which, I appreciate that those of you with spare invites have continued sharing them with the stream of folks requesting them.</p>
<p>Anyway, if you haven&#8217;t heard about Quora yet, here&#8217;s a summary from the site:</p>
<blockquote><p>Quora is a continually improving collection of questions and answers created, edited, and organized by everyone who uses it.</p></blockquote>
<p>For those of you who studied Latin, the title of this post hopefully triggers at least a faint memory of <a href="http://en.wikipedia.org/wiki/Latin_declension#Relative_pronouns">relative pronouns and declensions</a>. It&#8217;s been suggested that &#8220;quora&#8221; is a faux-Latin plural of quorum, which in turn is the genitive plural of qui. A less arcane possibility is that quora is intended to evoke the modern-day meaning of &#8220;quorum&#8221;: a gathering of the minimum number of people of an organization to conduct business. Or perhaps &#8220;quora&#8221; is a contraction of <strong>qu</strong>estion <strong>or</strong> <strong>a</strong>nswer, befitting a question-and-answer site.</p>
<p>How did I come up with all of these possibilities? Well, I did study Latin (semper ubi sub ubi!), but I found all of the above from the Quora entry entitled &#8220;<a href="http://www.quora.com/What-does-Quora-mean">What does Quora mean?</a>&#8221; (membership required to view).  Indeed, Quora is a great place to learn about Quora, as well as about <a href="http://vark.com/">Aardvark</a>, <a href="http://hunch.com/">Hunch</a>, and other question-and-answer startups. Because it&#8217;s been launched as a private beta and virally marketed among friends, the community&#8211;and thus its interests&#8211;are highly skewed towards tech startups. Indeed, people seem more inclined to compare it to programmer-oriented site <a href="http://stackoverflow.com/">Stack Overflow</a> than to <a href="http://answers.yahoo.com/">Yahoo! Answers</a>&#8211;which speaks volumes about the current user base.</p>
<p>All that said, is Quora a useful site? It certainly offers useful information, but that&#8217;s a pretty low bar&#8211;after all, the open web already offers lots of useful information. The better question is what Quora offers that the open web does not.</p>
<p>Indeed, the closed nature of the site puts it at a disadvantage to the open web: no links, search engine optimization, etc. That said, I also haven&#8217;t seen spam or any of the other abuse endemic to the open web.</p>
<p>In any case, I don&#8217;t see Quora as a knowledge base of first resort&#8211;except possibly to learn more about software startups. Whether by design or by virtue of its early membership, the site has a very narrow scope.</p>
<p>The more interesting value proposition of Quora is the community it is creating. Quora facilitates conversation, much like a members-only blog where everyone uses their real names. It&#8217;s a well-designed social site, and I like that it revolves around substantive topics.</p>
<p>But I worry that Quora faces a catch-22. If the focus stays narrow, I can&#8217;t imagine it creating enough utility to justify its <a href="http://techcrunch.com/2010/03/28/quora-has-the-magic-benchmark-invests-at-86-million-valuation/">$86M valuation</a>. But it&#8217;s not clear that Quora can scale up to a broader scope. Given what I&#8217;ve seen of question-and-answer sites, I&#8217;m skeptical.</p>
<p>What Quora does have going for it is an all-star team, and I&#8217;m sure they have big plans for the site. I&#8217;m very curious to see what those plans are, and how they play out.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/19/qui-quae-quora/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Deadline to Register for HCIR Challenge</title>
		<link>http://thenoisychannel.com/2010/04/14/deadline-to-register-for-hcir-challenge/</link>
		<comments>http://thenoisychannel.com/2010/04/14/deadline-to-register-for-hcir-challenge/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 01:42:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3095</guid>
		<description><![CDATA[If you are interested in participating in the HCIR Challenge, please let me know as soon as possible&#8211;and in any case by April 30th. The New York Times and the LDC are graciously providing access to The New York Times Annotated Corpus for free (waiving the usual $300 fee), but we need to let the [...]]]></description>
			<content:encoded><![CDATA[<p>If you are interested in participating in the <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a>, please let me know as soon as possible&#8211;and in any case by <strong>April 30th</strong>. The New York Times and the <a href="http://www.ldc.upenn.edu/">LDC</a> are graciously providing access to <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">The New York Times Annotated Corpus</a> for free (waiving the usual $300 fee), but we need to let the LDC know who will be participating. Hope to see lots of you presenting your systems at <a href="http://www.hcir2010.org/">HCIR 2010</a>!</p>
<p>Also, participants building their systems in <a href="http://lucene.apache.org/solr/">Solr</a> can take advantage of the <a href="http://github.com/tc/nytimes-solr-indexer">scripts</a> that <a href="http://tommy.chheng.com/">Tommy Chheng</a> has prepared. Of course, you are welcome to use your own system. Commercial software companies are especially encouraged to show off their <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> wares!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/14/deadline-to-register-for-hcir-challenge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Follow Finder</title>
		<link>http://thenoisychannel.com/2010/04/14/google-follow-finder/</link>
		<comments>http://thenoisychannel.com/2010/04/14/google-follow-finder/#comments</comments>
		<pubDate>Thu, 15 Apr 2010 00:15:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3089</guid>
		<description><![CDATA[I know there&#8217;s lots of interesting stuff coming out at the Chirp Twitter developer conference this week, and I&#8217;m still catching up on it all. But I am happy to point folks to a Google Labs application that was announced this morning: Follow Finder. It&#8217;s not the first application to suggest Twitter followers based on analysis [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.followfinder.googlelabs.com/"><img class="alignnone" title="Google Follow Finder" src="http://2.bp.blogspot.com/_7ZYqYi4xigk/S8YmZNs5CXI/AAAAAAAAF10/eq616058q34/s1600/screenfinal.png" alt="" width="496" height="201" /></a></p>
<p>I know there&#8217;s lots of interesting stuff coming out at the <a href="http://chirp.twitter.com/">Chirp</a> Twitter developer conference this week, and I&#8217;m still catching up on it all. But I am happy to point folks to a Google Labs application that was <a href="http://googleblog.blogspot.com/2010/04/google-follow-finder-find-some-sweet.html">announced</a> this morning: <a href="http://www.followfinder.googlelabs.com/">Follow Finder</a>.</p>
<p>It&#8217;s not the first application to suggest Twitter followers based on analysis of the social graph, but I&#8217;ve actually found its suggestions to be quite plausible. For example, it suggests @<a href="http://twitter.com/fredwilson">fredwilson</a>, @<a href="http://twitter.com/cshirky">cshirky</a>, @<a href="http://twitter.com/mattcutts">mattcutts</a>, @<a href="http://twitter.com/peteskomoroch">peteskomoroch</a>, and @<a href="http://twitter.com/msftresearch">msftresearch</a> as &#8220;tweeps&#8221; I should follow, and suggests that the following users have similar followers to mine: @<a href="http://twitter.com/endeca">endeca</a>, @<a href="http://twitter.com/lemire">lemire</a>, @<a href="http://twitter.com/yahooresearch">yahooresearch</a>, @<a href="http://twitter.com/googleresearch">googleresearch</a>, and @<a href="http://twitter.com/mattcutts">mattcutts</a>.</p>
<p>There&#8217;s a bit of an &#8220;<a href="http://thenoisychannel.com/2009/02/24/how-recommendation-engines-quash-diversity/">everything sounds like Coldplay</a>&#8221; effect (e.g., @<a href="http://twitter.com/fredwilson">fredwilson</a> shows up in a lot of the searches I tried), but overall I&#8217;m impressed with the quality, especially compared to the other suggestion tools I&#8217;ve tried.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/14/google-follow-finder/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>HCIR 2010: An Update</title>
		<link>http://thenoisychannel.com/2010/04/10/hcir-2010-an-update/</link>
		<comments>http://thenoisychannel.com/2010/04/10/hcir-2010-an-update/#comments</comments>
		<pubDate>Sat, 10 Apr 2010 13:45:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3086</guid>
		<description><![CDATA[I hope a number of you are planning to participate in the HCIR 2010 workshop! Here is a quick update: Program Committee chair Rob Capra has assembled a program committee that includes Susan Dumais, Gary Marchionini, and many more. Folks who want to participate in the HCIR Challenge should contact me for details! You&#8217;ll have [...]]]></description>
			<content:encoded><![CDATA[<p>I hope a number of you are planning to participate in the <a href="http://www.hcir2010.org/">HCIR 2010</a> workshop! Here is a quick update:</p>
<ul>
<li>Program Committee chair <a href="http://www.ils.unc.edu/~rcapra/">Rob Capra</a> has assembled a <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/committee.html">program committee</a> that includes <a href="http://research.microsoft.com/en-us/um/people/sdumais/">Susan Dumais</a>, <a href="http://ils.unc.edu/~march/">Gary Marchionini</a>, and many more.</li>
</ul>
<ul>
<li>Folks who want to participate in the <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> should <a href="mailto:dtunkelang@gmail.com">contact me</a> for details! You&#8217;ll have to obtain your own copy of <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">The New York Times Annotated Corpus</a> from the <a href="http://www.ldc.upenn.edu/">Linguistic Data Consortium</a> (thanks to them and to The New York Times for making this data set freely available!). After that, you can roll your own application, or you can bootstrap on the <a href="http://lucene.apache.org/solr/">Solr</a> <a href="http://github.com/tc/nytimes-solr-indexer">indexer</a> that <a href="http://tommy.chheng.com/">Tommy Chheng</a> has graciously put together.</li>
</ul>
<p>More details&#8211;particularly about the Challenge&#8211;are forthcoming and will be posted on the <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge page</a>. Meanwhile, feel free to ask me questions, either publicly here or by email, and I&#8217;ll be happy to answer them. Looking forward to seeing many of you this August!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/10/hcir-2010-an-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fernanda Viégas and Martin Wattenberg Start a New Company: Flowing Media</title>
		<link>http://thenoisychannel.com/2010/04/08/fernanda-viegas-and-martin-wattenberg-start-a-new-company-flowing-media/</link>
		<comments>http://thenoisychannel.com/2010/04/08/fernanda-viegas-and-martin-wattenberg-start-a-new-company-flowing-media/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 00:46:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3083</guid>
		<description><![CDATA[This just in: Fernanda Viégas and Martin Wattenberg, two of the biggest rock stars in the world of data visualization (their long list of accomplishments includes Many Eyes and the Baby Name Voyager), have left IBM to form their own company, Flowing Media, headquartered in Cambridge, MA. As they wrote me, &#8220;if you know of anyone who [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingmedia.com/"><img class="alignnone" title="Flowing Media" src="http://flowingmedia.com/banner.jpg" alt="" width="500" height="35" /></a></p>
<p>This just in: <a href="http://fernandaviegas.com/">Fernanda Viégas</a> and <a href="http://bewitched.com/">Martin Wattenberg</a>, two of the biggest rock stars in the world of data visualization (their long list of accomplishments includes <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">Many Eyes</a> and the <a href="http://www.babynamewizard.com/voyager">Baby Name Voyager</a>), have left IBM to form their own company, <a href="http://flowingmedia.com/">Flowing Media</a>, headquartered in Cambridge, MA. As they wrote me, &#8220;if you know of anyone who has interesting data and would like help bringing it to life, spread the word.&#8221;</p>
<p>I am very excited for them, and am eagerly anticipating the work they will produce as free agents.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/08/fernanda-viegas-and-martin-wattenberg-start-a-new-company-flowing-media/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Go TunkRank!</title>
		<link>http://thenoisychannel.com/2010/04/07/go-tunkrank/</link>
		<comments>http://thenoisychannel.com/2010/04/07/go-tunkrank/#comments</comments>
		<pubDate>Thu, 08 Apr 2010 00:48:42 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3079</guid>
		<description><![CDATA[I haven&#8217;t talked much about TunkRank in the past months, largely because Jason Adams, who stepped up to the TunkRank Implementation Challenge last year, has been leading the charge. Indeed, all I did, beyond lending my first syllable to its name, was to propose the measure and get it implemented &#8220;Tom Sawyer&#8221; style. Since then: [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://tunkrank.com/"><img class="alignnone" title="TunkRank" src="http://tunkrank.com/images/TunkRank.png" alt="" width="328" height="93" /></a></p>
<p>I haven&#8217;t talked much about <a href="http://tunkrank.com/">TunkRank</a> in the past months, largely because <a href="http://mendicantbug.com/">Jason Adams</a>, who stepped up to the <a href="http://thenoisychannel.com/2009/01/16/the-tunkrank-implementation-challenge/">TunkRank Implementation Challenge</a> last year, has been leading the charge. Indeed, all I did, beyond lending my first syllable to its name, was to <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">propose the measure</a> and get it implemented &#8220;<a href="http://www.pbs.org/marktwain/learnmore/writings_tom.html">Tom Sawyer</a>&#8221; style.</p>
<p>Since then:</p>
<ul>
<li>TunkRank was cited in a <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a> paper entitled &#8220;<a href="http://www.mysmu.edu/staff/jsweng/papers/TwitterRank_WSDM.pdf">TwitterRank: finding topic-sensitive influential twitterers</a>&#8221;</li>
<li><a href="http://www.gamocracy.com/profile/incolas/">Nicolas Cerrato</a> implemented an influence measure for gamer site <a href="http://www.gamocracy.com/">Gamocracy</a> based on TunkRank.</li>
</ul>
<p>And, most recently:</p>
<ul>
<li>University of Oviedo professor <a href="http://www.di.uniovi.es/~dani/">Daniel Gayo-Avello</a> published a research paper entitled &#8220;<a href="http://arxiv.org/abs/1004.0816">Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms</a>&#8221;, based on a follower graph of 1.8M Twitter users, in which he reports:<br />
<em><br />
Lastly, there are one method clearly outperforming PageRank with respect to penalization of abusive users while still inducing plausible rankings: TunkRank. It is certainly similar to PageRank but it makes a much better job when confronted with &#8220;cheating&#8221;: aggressive marketers are almost indistinguishable from common users –which is, of course, desirable; and spammers just manage to grab a much smaller amount of the global available prestige and reach lower positions –although they still manage to be better positioned than average users. In addition to that, the ranking induced by TunkRank certainly agrees with that of PageRank, specially at the very top of the list, meaning that many users achieving good positions with PageRank should also get good positions with TunkRank. Thus, TunkRank is a highly recommendable ranking method to apply to social networks: it is simple, it induces plausible rankings, and severely penalizes spammers when compared to PageRank.<br />
</em><br />
You can read a summary version in his blog post, descriptively titled &#8220;<a href="http://www.di.uniovi.es/~dani/PFCblog/index.php?entry=entry100407-093639">Research on a 1.8M Twitter user graph. Conclusion: TunkRank is your best option.</a>&#8221;</li>
</ul>
<p>I&#8217;m excited that an idea I came up with on a whim (or perhaps out of <a href="http://www.texttechnologies.com/2009/01/02/daniel-tunkelang-idealizes-twitter/">excessive idealism</a>) has taken on such a life of its own. And hey, I do work for a company that is into <a href="http://googleblog.blogspot.com/2009/12/relevance-meets-real-time-web.html">real-time search</a> and that knows a thing or two about <a href="http://en.wikipedia.org/wiki/Adversarial_information_retrieval">adversarial information retrieval</a>. Hopefully I&#8217;ll find a way to apply TunkRank&#8211;or at least its intuition&#8211;in my own work. In the meantime, I offer those who have already done so my congratulations and gratitude.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/07/go-tunkrank/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Build Your Own NYT Linked Data Application</title>
		<link>http://thenoisychannel.com/2010/04/06/build-your-own-nyt-linked-data-application/</link>
		<comments>http://thenoisychannel.com/2010/04/06/build-your-own-nyt-linked-data-application/#comments</comments>
		<pubDate>Wed, 07 Apr 2010 02:48:01 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3074</guid>
		<description><![CDATA[Regular readers may recall hearing about the New York Times Annotated Corpus (which is the basis for the HCIR Challenge), and the Times&#8217;s decision to publish its tags as Linked Open Data. Given that linked data applications are still a bit exotic, NYT semantic technologist and Noisy Community regular Evan Sandhaus published a tutorial and example application to help [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/"><img class="alignnone" title="Build Your Own NYT Linked Data Application" src="http://graphics8.nytimes.com/images/blogs/open/whowentwhere.png" alt="" width="480" height="325" /></a></p>
<p>Regular readers may recall hearing about the <a href="http://thenoisychannel.com/2008/10/31/all-the-news-thats-fit-to-text-mine/">New York Times Annotated Corpus</a> (which is the basis for the <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a>), and the Times&#8217;s decision to publish its <a href="http://open.blogs.nytimes.com/2010/01/13/more-tags-released-to-the-linked-data-cloud/">tags</a> as <a href="http://esw.w3.org/LinkedData">Linked Open Data</a>. Given that linked data applications are still a bit exotic, NYT semantic technologist and <a href="http://thenoisychannel.com/the-noisy-community/">Noisy Community</a> regular <a href="http://evansandhaus.com/">Evan Sandhaus</a> published a <a href="http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/">tutorial</a> and example application to help you build your own. If you&#8217;d like to get your feet wet in the <a href="http://www.w3.org/2001/sw/">semantic web</a> (and can forgive the mixed metaphor), this is an excellent opportunity.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/06/build-your-own-nyt-linked-data-application/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Guest Post: Information Retrieval using a Bayesian Model of Learning and Generalization</title>
		<link>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/</link>
		<comments>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/#comments</comments>
		<pubDate>Sun, 04 Apr 2010 21:28:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Guest Post]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3061</guid>
		<description><![CDATA[Dinesh Vadhia, CEO and founder of &#8220;item search&#8221; company Xyggy, has been an active member of the Noisy Community for at least a year, and it is with pleasure that I publish this guest post by him, University of Cambridge / CMU Professor Zoubin Ghahramani, and University of Cambridge / Gatsby Computational Neuroscience Unit researcher [...]]]></description>
			<content:encoded><![CDATA[<p><em>Dinesh Vadhia, CEO and founder of &#8220;item search&#8221; company </em><a href="http://xyggy.com/"><em>Xyggy</em></a><em>, has been an active member of the <a href="http://thenoisychannel.com/the-noisy-community/">Noisy Community</a></em><em> for at least a year, and it is with pleasure that I publish this guest post by him, University of Cambridge / CMU Professor </em><a href="http://learning.eng.cam.ac.uk/zoubin/"><em>Zoubin Ghahramani</em></a><em>, and University of Cambridge / <a href="http://www.gatsby.ucl.ac.uk">Gatsby Computational Neuroscience Unit</a> researcher </em><a href="http://www.gatsby.ucl.ac.uk/~heller/"><em>Katherine Heller</em></a><em>. I&#8217;ve annotated the post with Wikipedia links in the hope of making it more accessible to readers without a background in statistics or machine learning.</em></p>
<p>People are very good at learning new concepts after observing just a few examples.  For instance, a child will confidently point out which animals are &#8220;dogs&#8221; after having seen only a couple of examples of dogs before in their lives.  This ability to learn concepts from examples and to generalize to new items is one of the cornerstones of intelligence.  By contrast, search services currently on the internet exhibit little or no learning and generalization.</p>
<p><a href="http://www.gatsby.ucl.ac.uk/~heller/bsets.pdf">Bayesian Sets</a> is a new framework for <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> based on how humans learn new concepts and generalize.  In this framework a query consists of a set of items which are examples of some concept.  Bayesian Sets automatically infers which other items belong to that concept and retrieves them.  As an example, for the query with the two animated movies, “<a href="http://www.imdb.com/title/tt0275847/">Lilo &amp; Stitch</a>” and “<a href="http://www.imdb.com/title/tt1049413/">Up</a>”, Bayesian Sets would return other similar animated movies, like &#8220;<a href="http://www.imdb.com/title/tt0114709/">Toy Story</a>&#8221;.</p>
<p>How does this work?  Human generalization has been intensely studied in cognitive science and various models have been proposed based on some measure of similarity and <a href="http://en.wikipedia.org/wiki/Features_(pattern_recognition)">feature</a> relevance.  Recently, <a href="http://en.wikipedia.org/wiki/Bayesian">Bayesian</a> methods have emerged both as models of human cognition and as the basis of <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> systems.</p>
<p><strong>Bayesian Sets &#8211; a novel framework for information retrieval</strong></p>
<p>Consider a universe of items, where the items could be web pages, documents, images, ads, social and professional profiles, publications, audio, articles, video, investments, patents, resumes, medical records, or any other class of items we may want to query.</p>
<p>An individual item is represented by a <a href="http://en.wikipedia.org/wiki/Feature_vector">vector of features</a> of that item.  For example, for text documents, the features could be counts of word occurrences, while for images the features could be the amounts of different color and texture elements.</p>
<p>Given a query consisting of a small set of items (e.g. a few images of buildings) the task is to retrieve other items (e.g. other images) that belong to the concept exemplified by the query.  To achieve the task, we need a measure, or score, of how well an available item fits in with the query items.</p>
<p>A concept can be characterized by using a statistical model, which defines the <a href="http://en.wikipedia.org/wiki/Generative_model">generative process</a> for the features of items belonging to the concept.  Parameters control specific statistical properties of the features of items.  For example, a <a href="http://en.wikipedia.org/wiki/Normal_distribution">Gaussian distribution</a> has parameters which control the mean and variance of each feature.  Generally these parameters are not known, but a <a href="http://en.wikipedia.org/wiki/Prior_probability">prior distribution</a> can represent our beliefs about plausible parameter values.</p>
<p><strong>The score</strong></p>
<p>The score used for ranking the relevance of each item x given the set of query items Q compares the probabilities of two hypotheses.  The first hypothesis is that the item x came from the same concept as the query items Q.  For this hypothesis, compute the probability that the feature vectors representing all the items in Q and the item x were generated from the same model with the same, though unknown, model parameters.  The <a href="http://en.wikipedia.org/wiki/Alternative_hypothesis">alternative hypothesis</a> is that the item x does not belong to the same concept as the query examples Q.  Under this alternative hypothesis, compute the probability that the features in item x were generated from different model parameters than those that generated the query examples Q.   The ratio of the probabilities of these two hypotheses is the Bayesian score at the heart of Bayesian Sets, and can be computed efficiently for any item x to see how well it “fits into” the set Q.</p>
<p>This approach to scoring items can be used with any probabilistic generative model for the data, making it applicable to any problem domain for which a probabilistic model of data can be defined.  In many instances, items can be represented by a vector of features, where each feature can either be present or absent in the item.  For example, in the case of documents the features may be words in some vocabulary, and a document can be represented by a binary vector x where element j of this vector represents the presence or absence of vocabulary word j in the document.  For such binary data, a multivariate <a href="http://en.wikipedia.org/wiki/Bernoulli_distribution">Bernoulli distribution</a> can be used to model the feature vectors of items, where the jth parameter in the distribution represents the frequency of feature j.  Using the <a href="http://en.wikipedia.org/wiki/Beta_distribution">beta distribution</a> as the natural (conjugate) prior, the score can be computed extremely efficiently.</p>
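<p>For readers who want to see the binary case written out, here is one way to express the score. The symbols N (the number of query items), s_j (how many query items have feature j), and the prior parameters alpha_j and beta_j are notation introduced here for illustration and do not appear in the post itself. Because the beta prior is conjugate to the Bernoulli model, the log of the score reduces to a constant plus a weighted sum over the features present in x:</p>

```latex
% Score as a ratio of the two hypotheses (same concept vs. independent):
\[
  \mathrm{score}(x) \;=\; \frac{p(x, Q)}{p(x)\,p(Q)} \;=\; \frac{p(x \mid Q)}{p(x)}
\]

% Posterior counts after observing the N query items, with s_j = \sum_{i \in Q} x_{ij}:
\[
  \tilde{\alpha}_j = \alpha_j + s_j, \qquad
  \tilde{\beta}_j = \beta_j + N - s_j
\]

% Log score is linear in the binary features x_j:
\[
  \log \mathrm{score}(x) \;=\; c + \sum_j q_j\, x_j
\]
\[
  c = \sum_j \Bigl[ \log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N)
      + \log \tilde{\beta}_j - \log \beta_j \Bigr]
\]
\[
  q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j
\]
```

<p>Since c and the weights q_j depend only on the query, scoring every item in a collection amounts to a single sparse matrix-vector product, which is one way to see why the computation can be so fast.</p>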
<p><strong>Automatically learns</strong></p>
<p>An important aspect of Bayesian Sets is that it automatically learns which features are relevant from queries consisting of two or more items. For example, a movie query consisting of “<a href="http://www.imdb.com/title/tt0088247/">The Terminator</a>” and “<a href="http://www.imdb.com/title/tt0120338/">Titanic</a>” suggests that the concept of interest is movies directed by <a href="http://www.imdb.com/name/nm0000116/">James Cameron</a>, and therefore Bayesian Sets is likely to return other movies by Cameron.  We feel that the power of queries consisting of multiple example items is unexploited in most search engines.  Searching using examples is natural and intuitive for many situations in which the standard text search box is too limited to express the user&#8217;s information need, or infeasible for the type of data being queried.</p>
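<p>To make the idea concrete, here is a minimal sketch of the Beta-Bernoulli score in Python, using the ratio p(x | Q) / p(x) for binary feature vectors. Everything specific in it is an assumption made for illustration: the function name, the tiny hand-made feature table, and the uniform Beta(1, 1) prior are hypothetical and are not taken from the Xyggy implementation.</p>

```python
import math


def bayesian_sets_log_scores(items, query_indices, alpha, beta):
    """Return log[p(x | Q) / p(x)] for every item x (Beta-Bernoulli model)."""
    n = len(query_indices)
    num_features = len(alpha)
    # How many query items have each feature switched on.
    counts = [sum(items[i][j] for i in query_indices) for j in range(num_features)]
    scores = []
    for x in items:
        total = 0.0
        for j in range(num_features):
            a, b = alpha[j], beta[j]
            prior_on = a / (a + b)                   # p(x_j = 1) before seeing Q
            post_on = (a + counts[j]) / (a + b + n)  # p(x_j = 1 | Q)
            if x[j] == 1:
                total += math.log(post_on / prior_on)
            else:
                total += math.log((1.0 - post_on) / (1.0 - prior_on))
        scores.append(total)
    return scores


# Hypothetical hand-made features: [animated, directed_by_cameron, sci_fi, romance]
titles = ["The Terminator", "Titanic", "Aliens", "Toy Story", "Up"]
items = [
    [0, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
alpha = [1.0] * 4  # assumed uniform Beta(1, 1) prior on every feature
beta = [1.0] * 4

# Query with the first two items: "The Terminator" and "Titanic".
scores = bayesian_sets_log_scores(items, [0, 1], alpha, beta)
ranking = sorted(zip(titles, scores), key=lambda pair: pair[1], reverse=True)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

<p>In this toy run the two query items agree on &#8220;directed by Cameron&#8221; and on &#8220;not animated&#8221;, so those features drive the ranking and &#8220;Aliens&#8221; scores above the animated films. The query items disagree on sci-fi and romance, so under this particular prior those features contribute nothing to the score.</p>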
<p><strong>Uses</strong></p>
<p>The Bayesian Sets method has been applied to diverse problem domains including: unlabelled image search using low-level features such as color, texture and visual <a href="http://en.wikipedia.org/wiki/Bag_of_words_model">bag-of-words</a>; movie suggestions using the <a href="http://www.grouplens.org/node/73">MovieLens</a> and <a href="http://www.netflixprize.com/">Netflix</a> ratings data; music suggestions using <a href="http://www.last.fm/">last.fm</a> play count and user tag data; finding researchers working on similar topics using a conference paper database; searching the <a href="http://www.uniprot.org/">UniProt</a> protein database with features that include annotations, sequence and structure information; searching scientific literature for similar papers; and finding similar legal cases, New York Times articles and patents.</p>
<p>Apart from web and document search, Bayesian Sets can also be used for ad retrieval through content matching, building suggestion systems (“if you liked this you will also like these”, which is about understanding the user&#8217;s mindset instead of the traditional “people who liked your choice also liked these”) and finding similar people based on profiles (e.g. for social networks, online dating, recruitment and security).  All these applications illustrate the wide range of problems for which the patent-pending <a href="http://learning.eng.cam.ac.uk/zoubin/papers/bsets-nips05.pdf">Bayesian Sets</a> provides a powerful new approach to finding relevant information.  Specific details of engineering features for particular applications can be provided in a separate post (or comments).</p>
<p><strong>Interactive search box</strong></p>
<p>An important aspect of our approach is that the search box accepts text queries as well as items, which can be dragged in and out of the search box.  An implementation using patent data is at <a href="http://www.xyggy.com/patent.php">http://www.xyggy.com/patent.php</a>.  Enter keywords (e.g., &#8220;earthquake sensor&#8221;) and items relevant to the keywords are displayed.  Drag an item of interest from the results into the search box and the relevance changes.  When two or more items are added into the search box, the system discovers what they have in common and returns better results.  Items can be toggled in/out of the search by clicking the +/- symbol and items can be completely removed by dragging them out of the search box.  Each change to an item in the search box automatically retrieves new relevant results.  A future version will allow for explicit <a href="http://en.wikipedia.org/wiki/Relevance_feedback">relevance feedback</a>.  Certain data sets also lend themselves to a <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> interface and we are working on a novel implementation in this area.</p>
<p>In our current implementation, items are dragged into the search box from the results list, but it is easy to see how they could be dragged from anywhere on the web or intranet.  For example, a New York Times reader could drag an article or image of interest into the search box to find other items of relevance.  There is a natural affinity between an interactive search box as described and the new generation of touch devices.</p>
<p><strong>Summary</strong></p>
<p>Bayesian Sets demonstrates that intelligent information retrieval is possible, using a Bayesian statistical model of human learning and generalization.  This approach, based on sets of items, encapsulates several novel principles.  First, retrieving items based on a query can be seen as a cognitive learning problem, where we have used our understanding of human generalization to design the probabilistic framework.  Second, retrieving items from large corpora requires fast algorithms, and the exact computations for the Bayesian scoring function are extremely fast.  Finally, the example-based paradigm for finding coherent sets of items is a powerful new alternative and complement to traditional query-based search.</p>
<p>Finding relevant information from vast repositories of data has become ubiquitous in modern life.  We believe that our approach, based on cognitive principles and sound Bayesian statistics, will find many uses in business, science and society.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/04/guest-post-information-retrieval-using-a-bayesian-model-of-learning-and-generalization/feed/</wfw:commentRss>
		<slash:comments>67</slash:comments>
		</item>
		<item>
		<title>Get Unvarnished!</title>
		<link>http://thenoisychannel.com/2010/04/01/get-unvarnished/</link>
		<comments>http://thenoisychannel.com/2010/04/01/get-unvarnished/#comments</comments>
		<pubDate>Fri, 02 Apr 2010 00:08:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3053</guid>
		<description><![CDATA[Earlier this week, I read about Unvarnished on TechCrunch and was extremely curious about this &#8220;Yelp for LinkedIn&#8221; making a bold play in the online reputation space. My curiosity should be no surprise to folks who have read my recent posts about distributed trust networks and solicited reviews. Anyway, I decided to go straight to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://techcrunch.com/2010/03/30/unvarnished-a-clean-well-lighted-place-for-defamation/"><img class="alignnone" title="Unvarnished.com (via TechCrunch)" src="http://tctechcrunch.files.wordpress.com/2010/03/donna_unvarnished.jpg" alt="" width="368" height="332" /></a></p>
<p>Earlier this week, I read about <a href="http://www.getunvarnished.com/">Unvarnished</a> on <a href="http://techcrunch.com/2010/03/30/unvarnished-a-clean-well-lighted-place-for-defamation/">TechCrunch</a> and was extremely curious about this &#8220;Yelp for LinkedIn&#8221; making a bold play in the online reputation space. My curiosity should be no surprise to folks who have read my recent posts about <a href="http://thenoisychannel.com/2010/03/20/can-we-build-a-distributed-trust-network/">distributed trust networks</a> and <a href="http://thenoisychannel.com/2010/03/14/is-spontaneity-overrated/">solicited reviews</a>. Anyway, I decided to go straight to the source and persuaded Unvarnished CEO <a href="http://www.linkedin.com/in/kazanjy">Peter Kazanjy</a> to invite me to the beta.</p>
<p>My impression so far: they&#8217;ve done a great job of collecting profiles, but the reviews themselves are pretty sparse. Moreover, most of the reviews I&#8217;ve seen so far are positive&#8211;hardly the bloodbath that the blogosphere has been predicting. Membership is relatively non-anonymous (you need to sign in through Facebook Connect), but your actual reviews are posted anonymously.</p>
<p>Since Unvarnished is trying to collect reviews, it&#8217;s not surprising that the way to join the beta&#8230;is to review someone. If you want to try it out and don&#8217;t mind reviewing me (anonymously) as the price of entry, let me know, and I&#8217;ll send you an invite (we&#8217;ll have to connect on Facebook first).</p>
<p>p.s. No, this is not an April Fool&#8217;s Joke. At least not on my part!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/04/01/get-unvarnished/feed/</wfw:commentRss>
		<slash:comments>44</slash:comments>
		</item>
		<item>
		<title>CFP: HCIR 2010</title>
		<link>http://thenoisychannel.com/2010/03/29/cfp-hcir-2010/</link>
		<comments>http://thenoisychannel.com/2010/03/29/cfp-hcir-2010/#comments</comments>
		<pubDate>Tue, 30 Mar 2010 02:20:18 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3049</guid>
		<description><![CDATA[The 4th Annual Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2010) will be held in conjunction with the IIiX 2010 conference in New Brunswick, New Jersey on August 22, 2010. We&#8217;re pleased to announce that our keynote speaker will be Dan Russell from Google. New for this year, we will also be running an HCIR Challenge based on [...]]]></description>
			<content:encoded><![CDATA[<p>The 4th Annual Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://www.hcir2010.org/">HCIR 2010</a>) will be held in conjunction with the <a href="http://www.iiix2010.org/">IIiX 2010</a> conference in New Brunswick, New Jersey on August 22, 2010. We&#8217;re pleased to announce that our keynote speaker will be <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a> from Google. New for this year, we will also be running an HCIR Challenge based on <a href="https://groups.google.com/group/nytnlp/web/%20Corpus%20Overview%20Page">The New York Times Annotated Corpus</a>!</p>
<p><strong>Web Site</strong></p>
<ul>
<li><a href="http://www.hcir2010.org/">http://www.hcir2010.org/</a></li>
</ul>
<p><strong>Workshop Chairs</strong></p>
<ul>
<li><a href="http://faculty.cua.edu/kules/">Bill Kules</a>, The Catholic University of America</li>
<li><a href="http://www.cs.cmu.edu/~quixote/">Daniel Tunkelang</a>, Google</li>
<li><a href="http://research.microsoft.com/en-us/um/people/ryenw/">Ryen White</a>, Microsoft Research</li>
</ul>
<p><strong>Program Chair</strong></p>
<ul>
<li><a href="http://www.ils.unc.edu/~rcapra/">Rob Capra</a>, University of North Carolina at Chapel Hill</li>
</ul>
<p><strong>Local Arrangements Chair</strong></p>
<ul>
<li><a href="http://www.catherinelsmith.com/">Catherine Smith</a>, Rutgers University</li>
</ul>
<p><strong>Sponsors</strong></p>
<ul>
<li><a href="http://www.endeca.com/">Endeca</a></li>
<li><a href="http://www.google.com/">Google</a></li>
<li><a href="http://research.microsoft.com/">Microsoft Research</a></li>
</ul>
<p><strong>Background</strong></p>
<p><a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> combines research from the fields of human-computer interaction (HCI) and information retrieval (IR), placing an emphasis on human involvement in search activities. The HCIR workshop has run annually since 2007. The workshop unites academic researchers and industrial practitioners working at the intersection of HCI and IR to develop more sophisticated models, tools, and evaluation metrics to support activities such as interactive information retrieval and exploratory search. It provides an opportunity for attendees to informally share ideas via posters, small group discussions and selected short talks.</p>
<p><strong>New for 2010: the HCIR Challenge!</strong></p>
<p>New this year, we will be running the HCIR Challenge! The aim of the challenge is to encourage HCIR researchers and practitioners to build and demonstrate effective information access systems. Challenge participants will have no-cost access to a large <a href="https://groups.google.com/group/nytnlp/web/%20Corpus%20Overview%20Page">collection of almost two million newspaper articles</a> with rich metadata generously provided for use in this challenge by The New York Times. The focus of participation is building systems (or using existing ones) to help people search the collection interactively. Entries will be judged by an expert panel based on HCIR criteria (specifically: effectiveness, efficiency, control, transparency, guidance, fun) and also judged by workshop attendees at the event. More information on the challenge will be made available on the workshop website.</p>
<p><strong>Format</strong></p>
<p>We invite 4-page papers that will be reviewed by an international program committee. Papers fall into two categories: position papers describing an idea, an opinion, or early-stage research, and research papers describing a conducted research study, an implemented system, or a review of prior research. Papers will be judged based on relevance to HCIR. Idea diversity across all submissions may also be considered. The revised versions will be published on the workshop website. The workshop time will be used for what participants have told us that they found most valuable in previous events: posters and directed group discussions.</p>
<p>We will select 4-6 papers for presentation in a workshop panel. All other attendees are strongly encouraged to present posters during the morning &#8220;poster boaster&#8221; session.  Selected HCIR challenge papers will also have an opportunity to present their work orally at the event.</p>
<p>Our target is to have 50-75 participants.</p>
<p>Possible topics for discussion and presentation at the workshop include, but are not limited to:</p>
<ul>
<li>Novel interaction techniques for information retrieval.</li>
<li>Modeling and evaluation of interactive information retrieval.</li>
<li>Exploratory search and information discovery.</li>
<li>Information visualization and visual analytics.</li>
<li>Applications of HCI techniques to information retrieval needs in specific domains.</li>
<li>Ethnography and user studies relevant to information retrieval and access.</li>
<li>Scale and efficiency considerations for interactive information retrieval systems.</li>
<li>Relevance feedback and active learning approaches for information retrieval.</li>
</ul>
<p>Demonstrations of systems and prototypes are particularly welcome.</p>
<p><strong>Important Dates</strong></p>
<ul>
<li>Mon June 14, 2010: Submission deadline for position papers (midnight Pacific Time)</li>
<li>Fri July 16, 2010: Decisions sent to authors</li>
<li>Fri July 23, 2010: Deadline for accepted participants to register</li>
<li>Fri July 30, 2010: Submission deadline for camera-ready copies</li>
</ul>
<p><strong>Workshop Fees and Travel Support</strong></p>
<p>After careful consideration, we have decided to implement a $75 workshop fee. This will help offset the workshop costs and allow us to defray expenses for 1-2 graduate students.</p>
<p>We also appreciate the support of our corporate sponsors. It is our hope that the continuing success of this workshop will attract additional funding in future years. If your company or organization is interested in sponsoring travel scholarships, please let us know as soon as possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/29/cfp-hcir-2010/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Want a Quora Invite?</title>
		<link>http://thenoisychannel.com/2010/03/28/want-a-quora-invite/</link>
		<comments>http://thenoisychannel.com/2010/03/28/want-a-quora-invite/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 00:43:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3042</guid>
		<description><![CDATA[I had 10 invites for Quora, a social search site launched earlier this year by a bunch of ex-Facebookers (including former CTO Adam D’Angelo and Charlie Cheever, who previously led Facebook Platform and Facebook Connect)&#8211;and which was just funded at an $86M valuation. Put your email address in a comment or communicate it to [...]]]></description>
			<content:encoded><![CDATA[<p>I <del datetime="2010-03-29T05:22:04+00:00"><strong>have</strong></del> <strong>had</strong> 10 invites for <a href="www.quora.com/">Quora</a>, a social search site launched earlier this year by a bunch of ex-Facebookers (including former CTO <a href="http://www.crunchbase.com/person/adam-d-angelo">Adam D’Angelo</a> and <a href="http://www.crunchbase.com/person/charlie-cheever">Charlie Cheever</a>, who previously led Facebook Platform and Facebook Connect)<strong>&#8211;and which was just <a href="http://techcrunch.com/2010/03/28/quora-has-the-magic-benchmark-invests-at-86-million-valuation/">funded at an $86M valuation</a></strong>. Put your email address in a comment or communicate it to me by some other means if you&#8217;d like one. I&#8217;m pretty sure all new users get 10 invites, so please share the love if I run out. Also, I believe you need to have a Facebook account in order to register (using Facebook Connect).</p>
<p>I&#8217;ll write more about Quora when I&#8217;ve had a chance to play with it myself.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/28/want-a-quora-invite/feed/</wfw:commentRss>
		<slash:comments>46</slash:comments>
		</item>
		<item>
		<title>The Evolution of Social Search</title>
		<link>http://thenoisychannel.com/2010/03/27/the-evolution-of-social-search/</link>
		<comments>http://thenoisychannel.com/2010/03/27/the-evolution-of-social-search/#comments</comments>
		<pubDate>Sat, 27 Mar 2010 21:00:07 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3022</guid>
		<description><![CDATA[Earlier this week, I had the good fortune to attend the New York Semantic Web Meetup, which featured three excellent presentations. I&#8217;ll confess that I primarily attended the event in order to learn more about open data platform Factual, particularly to see how it compares to Freebase and Google Squared. But I found the other [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_3572677" style="width: 490px;"><strong> </strong><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="490" height="410" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=socialsearch-nyc-semanticweb-meetup-mar2010-100327135947-phpapp02&amp;stripped_title=the-evolution-of-social-search" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="490" height="410" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=socialsearch-nyc-semanticweb-meetup-mar2010-100327135947-phpapp02&amp;stripped_title=the-evolution-of-social-search" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>Earlier this week, I had the good fortune to attend the <a href="http://www.swnyc.org/index.php?title=New_York_Semantic_Web_Meetup">New York Semantic Web Meetup</a>, which featured three excellent presentations. I&#8217;ll confess that I primarily attended the <a href="http://semweb.meetup.com/25/calendar/12194716/">event</a> in order to learn more about open data platform <a href="http://www.factual.com/">Factual</a>, particularly to see how it compares to <a href="http://www.freebase.com/">Freebase</a> and <a href="http://www.google.com/squared">Google Squared</a>.</p>
<p>But I found the other two presentations, &#8220;<a href="http://www.slideshare.net/nitya/the-evolution-of-social-search">The Evolution of Social Search</a>&#8221; by <a href="http://www.google.com/profiles/114543493432407488859">Nitya Narasimhan</a> and &#8220;<a href="http://www.designforcontext.com/publications/semweb-ui-pres/">User Interfaces for the Semantic Web</a>&#8221; by <a href="http://www.ipgems.com/ddegler/">Duane Degler</a>, even more compelling. Hopefully I&#8217;m not just biased because they both mentioned me in their slides!</p>
<p>I really wish Nitya&#8217;s session had been recorded as a video, but the slides embedded above will have to suffice for those who couldn&#8217;t see it live. Hopefully they communicate Nitya&#8217;s framing of the social search space. She does a great job of weaving together the various strands of social search: people as sensors promoting &#8220;real-time&#8221; content, social filters to reflect personalized notions of trust, and routers to leverage the collective intelligence of crowds. She gives tons of examples and links for further reading. Hopefully I&#8217;ll get her to give this talk again&#8211;with less stringent time constraints&#8211;and record it for online viewing.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/27/the-evolution-of-social-search/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>New Toys from Hunch</title>
		<link>http://thenoisychannel.com/2010/03/27/new-toys-from-hunch/</link>
		<comments>http://thenoisychannel.com/2010/03/27/new-toys-from-hunch/#comments</comments>
		<pubDate>Sat, 27 Mar 2010 20:02:33 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3016</guid>
		<description><![CDATA[I&#8217;ve been following Hunch for a while, and my impression has evolved from the initial skepticism with which I greeted it a year ago (to the day!). Given the track records of co-founders Caterina Fake and Chris Dixon, perhaps I should have expected their success at obtaining traffic and funding.  But what interests me more is that [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><img class="size-full wp-image-3018 aligncenter" style="border: 1px solid black;" title="Hunch" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/03/Picture-1.png" alt="" width="343" height="115" /></p>
<p>I&#8217;ve been following <a href="http://hunch.com/">Hunch</a> for a while, and my impression has evolved from the <a href="http://thenoisychannel.com/2009/03/27/i-gotta-hunch-youll-wanna-check-this-out/">initial skepticism</a> with which I greeted it a year ago (to the day!). Given the track records of co-founders <a href="http://www.caterina.net/about.html">Caterina Fake</a> and <a href="http://cdixon.org/about.html">Chris Dixon</a>, perhaps I should have expected their success at obtaining <a href="http://siteanalytics.compete.com/hunch.com/">traffic</a> and <a href="http://techcrunch.com/2010/03/14/hunch-takes-12-million-from-khosla-ventures-adds-former-facebook-cfo-to-board-of-directors/">funding</a>.  But what interests me more is that they are doing interesting things with data mining and putting a new twist on social media analytics.</p>
<p>For those unfamiliar with Hunch, it is a decision engine (cf. [<a href="http://www.google.com/search?q=real+decision+engine">real decision engine</a>] vs. [<a href="http://www.google.com/search?q=decision+engine">decision engine</a>]). For example, it can help you decide <a href="http://hunch.com/should-i-buy-the-apple-ipad/">whether to buy an iPad</a> or <a href="http://hunch.com/baby-girl-names/">how to name your baby</a>. While it&#8217;s not clear to me how much people are using Hunch for utility vs. entertainment, Hunch is certainly accumulating users&#8211;as well as the <a href="http://hunch.com/teach-hunch-about-you/">data</a> that those users volunteer.</p>
<p>Hunch recently released two applications that mash up that data with the Twitter follower graph. The first is a &#8220;<a href="http://hunch.com/games/twitter-predictor/">Twitter Predictor Game</a>&#8221; that attempts to calculate your taste profile from your Twitter id and then predict how you&#8217;ll answer Hunch&#8217;s taste questions. Just to keep the game honest, you can look at Hunch&#8217;s guess either before or after you provide your answer. The second is called &#8220;<a href="http://hunch.com/twitter-followers/">Twitter Follower Stats</a>&#8221;: given a Twitter user, it reports the salient information it has inferred about that user&#8217;s followers (e.g., <a href="http://hunch.com/twitter-followers/maddow/">@maddow</a> vs. <a href="http://hunch.com/twitter-followers/karlrove/">@karlrove</a>).</p>
<p>I think this stuff is neat, and a great testament to the &#8220;<a href="http://thenoisychannel.com/2009/03/31/the-unreasonable-effectiveness-of-data/">unreasonable effectiveness of data</a>&#8221;. The question-answer data still feels a bit sparse for my taste, and I suspect there&#8217;s still room for more <a href="http://en.wikipedia.org/wiki/Dimension_reduction">dimensionality reduction</a>. I&#8217;m sure Hunch CTO <a href="http://mattgattis.com/about/about.html">Matt Gattis</a> and colleagues are working on it! Also, it would be neat to direct the follower analytics rather than simply see the ones that Hunch deems most salient.</p>
<p>In summary, Hunch is keeping it interesting. Definitely a startup to watch and learn from.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/27/new-toys-from-hunch/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Can We Build a Distributed Trust Network?</title>
		<link>http://thenoisychannel.com/2010/03/20/can-we-build-a-distributed-trust-network/</link>
		<comments>http://thenoisychannel.com/2010/03/20/can-we-build-a-distributed-trust-network/#comments</comments>
		<pubDate>Sun, 21 Mar 2010 02:38:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3005</guid>
		<description><![CDATA[Mathew Ingram posted an interview with Craig Newmark (the Craig of craigslist fame) in which the latter argued that what the web needs is a “distributed trust network” to manage our online reputations. As it happens, this is an idea that has occupied me for several years. So I figured it was about time that I shared my thoughts [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="390" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://blip.tv/play/AYHOqGwC" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="390" src="http://blip.tv/play/AYHOqGwC" allowfullscreen="true"></embed></object></p>
<p>Mathew Ingram posted an <a href="http://gigaom.com/2010/03/18/craig-newmark-on-the-webs-next-big-problem/">interview</a> with Craig Newmark (the <a href="http://www.craigslist.org/about/craig_newmark">Craig</a> of <a href="http://www.craigslist.org/">craigslist</a> fame) in which the latter argued that what the web needs is a “distributed trust network” to manage our online reputations. As it happens, this is an idea that has occupied me for several years. So I figured it was about time that I shared my thoughts on the subject.</p>
<p>When we think of how trust works online, two of the most prominent examples are Google&#8217;s <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a> measure and eBay&#8217;s <a href="http://pages.ebay.com/help/feedback/scores-reputation.html">feedback scores</a>. But neither of these measures addresses what I think Craig has in mind. PageRank is a great way of using citation analysis to determine the most authoritative citations, but the trust in a page should consider its out-links (i.e., can we trust the page not to point us to untrustworthy ones?) and not just its in-links. eBay&#8217;s feedback scores have a different problem: they count positive and negative ratings without considering the social network of buyers and sellers&#8211;an approach that is <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.72.138&amp;rep=rep1&amp;type=pdf">vulnerable to fraud through shill ratings</a>. Incidentally, <a href="http://linkedin.custhelp.com/cgi-bin/linkedin.cfg/php/enduser/std_adp.php?p_faqid=99">LinkedIn recommendations</a> have a similar weakness if viewed in strictly quantitative terms, but the potential for abuse is mitigated by the endorsements being signed&#8211;and by their being more than just binary or numerical ratings. And here&#8217;s a <a href="http://endorser.org/">site</a> you can use if you&#8217;re too lazy to actually write the recommendations yourself.</p>
<p>But I digress. Propagation of trust does seem like the perfect application to build on top of social networks. Consider any problem that involves getting advice to inform a decision. If we regularly solicit advice from our first-degree connections, then we should be able to learn over time whose advice we can trust. We can then vouch for these connections, which offers the connections who trust us a basis for trusting their second-degree connections through us. And so forth through our social network. Of course, trust is not irrevocable: loss of trust should propagate similarly.</p>
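<p>To make the propagation idea concrete, here is a minimal sketch, not a description of any existing system. It assumes each person assigns a direct trust weight between 0 and 1 to the connections they vouch for, that trust along a chain is the product of the weights, and that we keep the best chain found within a few hops. The names, weights, and hop limit are all hypothetical.</p>

```python
def propagate_trust(direct_trust, source, max_hops=3):
    """Infer how much `source` can trust each person reachable via vouching.

    direct_trust maps a person to {connection: weight}, weights in [0, 1].
    Trust along a chain is the product of its weights (an assumption of
    this sketch); the best chain of at most max_hops links wins.
    """
    best = {source: 1.0}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        next_frontier = {}
        for person, trust_so_far in frontier.items():
            for connection, weight in direct_trust.get(person, {}).items():
                candidate = trust_so_far * weight
                if candidate > best.get(connection, 0.0):
                    best[connection] = candidate
                    next_frontier[connection] = candidate
        frontier = next_frontier
    del best[source]
    return best


# Hypothetical network: I trust Alice a lot and Bob somewhat.
network = {
    "me": {"alice": 0.9, "bob": 0.5},
    "alice": {"carol": 0.8},
    "bob": {"carol": 0.9},
    "carol": {"dave": 0.5},
}
inferred = propagate_trust(network, "me")

# Loss of trust propagates the same way: lower one weight and recompute.
network["me"]["alice"] = 0.1
after_revocation = propagate_trust(network, "me")
```

<p>In this toy network my trust in Carol initially flows through Alice. Once I stop trusting Alice, the recomputed value falls back to the weaker chain through Bob, and Dave's score drops with it.</p>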
<p>I&#8217;ve talked about this problem with two of the leading experts on social networks, <a href="http://www.cs.cornell.edu/home/kleinber/">Jon Kleinberg</a> and <a href="http://research.yahoo.com/Prabhakar_Raghavan">Prabhakar Raghavan</a>, and as far as I know no one has built a system along these principles. In economic terms, I envision a system where a person&#8217;s reputation truly is his or her coin. One person might think of bribing another to exploit the latter&#8217;s established reputation, but a rational person with a strong reputation would demand an exorbitant bribe to put that reputation at risk.</p>
<p>Of course, a lot of information would have to propagate throughout the social network&#8211;and be stored&#8211;for this system to work. Regardless of how the information is abstracted, such a reputation index would raise thorny privacy issues. Indeed, I don&#8217;t know if we can build a reputation system that is entirely privacy-preserving&#8211;since reputation is an inherently public mechanism. In addition, any such system would have to consider the implications of <a href="http://en.wikipedia.org/wiki/Defamation">defamation</a> laws. These are some major hurdles!</p>
<p>Nonetheless, I agree wholeheartedly with Craig that a distributed trust network could be “the killingest of killer apps&#8221;. I just hope we can find a way to build and use it!</p>
<p><em>Note: <a href="http://twitter.com/communicating">Chris Rines</a> suggested I look at <a href="http://www.advogato.org/trust-metric.html">Advogato&#8217;s Trust Metric</a>, and a quick investigation led me to the Wikipedia entry for <a href="http://en.wikipedia.org/wiki/Trust_metric">trust metric</a>. Looks like I have some homework to do!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/20/can-we-build-a-distributed-trust-network/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Are Ashton Kutcher and Puff Daddy the Most Influential Twitter Users?</title>
		<link>http://thenoisychannel.com/2010/03/20/are-ashton-kutcher-and-puff-daddy-the-most-influential-twitter-users/</link>
		<comments>http://thenoisychannel.com/2010/03/20/are-ashton-kutcher-and-puff-daddy-the-most-influential-twitter-users/#comments</comments>
		<pubDate>Sat, 20 Mar 2010 23:32:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3001</guid>
		<description><![CDATA[In a post on ReadWriteWeb, Sarah Perez summarizes &#8220;Measuring User Influence in Twitter: The Million Follower Fallacy&#8221;, a recent research paper by Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna Gummadi. The punch line should hardly be surprising to regular readers here given my variety of rants on the subject: follower count isn&#8217;t a great measure [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://an.kaist.ac.kr/~mycha/docs/icwsm2010_cha.pdf"><img class="alignnone" title="Cha et al, &quot;Measuring User Influence in Twitter: The Million Follower Fallacy&quot;" src="http://www.readwriteweb.com/images/top_100_influentials_on_twitter_chart.png" alt="" width="439" height="274" /></a></p>
<p>In a post on <a href="http://www.readwriteweb.com/archives/the_million_follower_fallacy_audience_size_doesnt_prove_influence_on_twitter.php">ReadWriteWeb</a>, Sarah Perez summarizes &#8220;<a href="http://an.kaist.ac.kr/~mycha/docs/icwsm2010_cha.pdf">Measuring User Influence in Twitter: The Million Follower Fallacy</a>&#8221;, a recent research paper by Meeyoung Cha, Hamed Haddadi, Fabricio Benevenuto, and Krishna Gummadi. The punch line should hardly be surprising to regular readers here given my variety of <a href="http://thenoisychannel.com/?s=twitter+number+of+followers">rants</a> on the subject: follower count isn&#8217;t a great measure of influence.</p>
<p>The authors focus on measuring three quantities: followers (which they call indegree), retweets, and mentions. Their main result is that, while the number of followers is strongly correlated with the numbers of retweets and mentions for the general user population, the correlation is much weaker for the users with high follower counts, e.g., in the top 10%. Indeed, the authors believe that the correlation for the general population is &#8220;an artifact of the tied ranks among the least influential users, e.g., many of the least connected users also received zero retweet and mention.&#8221;</p>
<p>The authors further note that:</p>
<blockquote><p>Across all three measures, the top influentials were generally recognizable public figures and websites. Interestingly, we saw marginal overlap in these three top lists. These top-20 lists only had 2 users in common: Ashton Kutcher and Puff Daddy. The top-100 lists also showed marginal overlap, as shown in Figure 1, indicating that the three measures capture different types of influence.</p></blockquote>
<p>The authors ultimately conclude that:</p>
<ul>
<li>Follower count represents a user’s popularity, but is not related to notions of influence such as engaging audience, i.e., retweets and mentions.</li>
<li>Retweets are driven by the content value of a tweet, favoring mainstream news organizations.</li>
<li>Mentions are driven by the name value of the user, favoring celebrities.</li>
</ul>
<p>I can&#8217;t argue with any of the above, but I do wonder if any of them are ideal measures of influence. All three measures are easy to game&#8211;and none of them model the scarcity of user attention, which is the motivating principle of <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a>. Nor do they ground &#8220;influence&#8221; in any outcome external to Twitter.</p>
<p>Still, it&#8217;s an interesting negative result. If nothing else, it helps reinforce the argument that follower count isn&#8217;t a useful measure&#8211;at least once you get beyond the very low end of the range.</p>
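<p>For readers curious how the scarcity of attention can be modeled, here is a minimal sketch of a TunkRank-style computation. It assumes the recurrence from the linked TunkRank post, in which each follower Y contributes (1 + p * Influence(Y)) / |Following(Y)| to the people Y follows. The value of p, the iteration count, and the toy graph are illustrative assumptions.</p>

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iteratively estimate influence from a 'who follows whom' map.

    following maps each user to the set of accounts that user follows.
    A follower's attention is split evenly across everyone they follow,
    and p is an assumed constant probability of passing a tweet along.
    """
    users = set(following)
    for followed in following.values():
        users.update(followed)

    influence = {user: 0.0 for user in users}
    for _ in range(iterations):
        updated = {user: 0.0 for user in users}
        for follower, followed in following.items():
            if not followed:
                continue
            # The more accounts a follower tracks, the less each one gets.
            share = (1.0 + p * influence[follower]) / len(followed)
            for user in followed:
                updated[user] += share
        influence = updated
    return influence


# Hypothetical graph: a, b, and d all follow c; b also follows d.
graph = {"a": {"c"}, "b": {"c", "d"}, "c": set(), "d": {"c"}}
influence = tunkrank(graph)
```

<p>Unlike a raw follower count, a follower who follows thousands of accounts contributes very little here, which makes the measure harder to inflate through indiscriminate follow-backs.</p>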
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/20/are-ashton-kutcher-and-puff-daddy-the-most-influential-twitter-users/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Is Spontaneity Overrated?</title>
		<link>http://thenoisychannel.com/2010/03/14/is-spontaneity-overrated/</link>
		<comments>http://thenoisychannel.com/2010/03/14/is-spontaneity-overrated/#comments</comments>
		<pubDate>Sun, 14 Mar 2010 22:38:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2996</guid>
		<description><![CDATA[It used to be a surprise when people remembered our birthdays, but in the twenty-first century Facebook ensures that we will never forget a birthday. Does that make the happy birthday wishes any less sincere? Or is technology simply providing us with a cognitive assist and helping us express our sincere feelings? A related question is how [...]]]></description>
			<content:encoded><![CDATA[<div>
<p>It used to be a surprise when people remembered our birthdays, but in the twenty-first century Facebook ensures that we will <a href="http://blog.facebook.com/blog.php?post=38780477130 ">never forget a birthday</a>. Does that make the happy birthday wishes any less sincere? Or is technology simply providing us with a cognitive assist and helping us express our sincere feelings?</p>
<p>A related question is how we should react to solicited reviews&#8211;a topic that was the subject of a recent interview on <a href="http://blumenthals.com/blog/2010/03/12/asking-for-reviews-umovefree-finds-the-groove/ ">Mike Blumenthal&#8217;s blog</a>. To be clear: I&#8217;m not talking about businesses offering incentives to reviewers&#8211;most folks seem to agree that incented reviews are a bad idea. And let&#8217;s not get started on hiring <a href="http://thenoisychannel.com/2009/08/22/payola-theres-an-app-for-that/ ">interns</a> or <a href="http://thenoisychannel.com/2009/01/17/sell-your-integrity-for-065/ ">Turkers</a> to write them! Rather, the question is whether a review is less meaningful because it was solicited by a business rather than spontaneously volunteered by the reviewer.</p>
<p>I&#8217;m ambivalent. I don&#8217;t think the content of a solicited review is inherently insincere&#8211;after all, the reviewers have no reason to lie. In fact, soliciting a review from a disgruntled customer may annoy that customer enough to elicit one that is scathingly sincere!</p>
<p>Nonetheless, it&#8217;s hard to imagine why a business wouldn&#8217;t target a solicitation campaign at a sympathetic set of reviewers, given the opportunity to do so. Given that consumers put a fair amount of trust in aggregated reviews (as documented in the Forrester study about the &#8220;<a href="http://blogs.hbr.org/groundswell/2008/04/managing-online-reviews.html ">groundswell effect</a>&#8221;), skewing the population of reviewers can significantly stack the deck.</p>
<p>And even a uniform campaign to solicit reviews raises concerns. Research by <a href="http://www.u.arizona.edu/~yoliu/ ">Yong Liu</a> supports the adage that all buzz is good buzz&#8211;though in fairness I don&#8217;t know if he observed causality or just correlation. But I can extrapolate from personal experience that the number of reviews signals the popularity of a product or service. And I doubt I&#8217;m alone, given that <a href="http://www.yelp.com/ ">Yelp</a>, <a href="http://www.menupages.com/ ">MenuPages</a>, and other review sites let you filter or sort by the number of reviews. A successful campaign to solicit reviews, even if it doesn&#8217;t skew the polarity of the reviews, will at least inflate their quantity.</p>
<p>Still, where&#8217;s the harm? There&#8217;s nothing unethical in a business soliciting private or public feedback. And, back to the birthday example, I haven&#8217;t seen anyone upset by Facebook-prodded birthday greetings. Perhaps the online solicitation of reviews serves a similar &#8220;reminder&#8221; purpose, and we should simply accept it as part of our twenty-first century reality.</p>
<p>But consumers will need to re-calibrate their trust in reviews&#8211;or at least in what the numbers signal&#8211;if it turns out that a significant fraction of them are solicited. As Yelp CEO Jeremy Stoppelman pointed out in a <a href="http://officialblog.yelp.com/2010/03/different-day-different-lawyer-same-meritless-claim-a-classic-race-to-the-courthouse.html ">blog post</a> defending his company against recent legal action:</p>
<blockquote><p>If a business could garner a top rating on Yelp simply by soliciting 5-star reviews from friends, family, and favored customers, how useful would such a site be?</p></blockquote>
<p>While I don&#8217;t know enough to comment on the legal merits of the lawsuits (or the history of <a href="http://www.eastbayexpress.com/eastbay/yelp-and-the-business-of-extortion-20/Content?oid=1176635 ">allegations</a> that Yelp extorts advertising from businesses), I can understand how a proprietary <a href="http://officialblog.yelp.com/2009/10/why-yelp-has-a-review-filter.html ">review filter</a> is controversial and invites skepticism from businesses whose positive reviews are filtered or demoted&#8211;especially given that relevance ranking raises similar concerns. But I can also understand how making such a filter completely transparent could defeat its stated purpose: &#8220;to protect consumers and business owners from fake, shill or malicious reviews&#8221;. And Yelp does at least <a href="http://www.yelp.com/business/common_questions ">disclose</a> that it considers users&#8217; activity level as a signal in its filter.</p>
<p>But let&#8217;s face it: it&#8217;s hard to draw a clear distinction between solicited responses and spontaneous ones. Review sites have never claimed to conduct scientific polls, and consumers should be sophisticated enough to expect some degree of sample bias. Moreover, the process does not have to be perfect in order to be useful to consumers&#8211;we learn to approach review sites with a calibrated level of cynicism.</p>
<p>Still, my hope is that consumers will start placing less stock in the aggregated opinions of anonymous strangers and shift their trust to people who are more transparent about their identities and motives. The more that reviewers stand behind their opinion and put their own integrity on the line, the less it will matter whether those opinions are solicited or spontaneously expressed. We&#8217;ll see how the opinion marketplace sorts this out.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/14/is-spontaneity-overrated/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Google and Transparency</title>
		<link>http://thenoisychannel.com/2010/03/07/google-and-transparency/</link>
		<comments>http://thenoisychannel.com/2010/03/07/google-and-transparency/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 21:34:12 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2990</guid>
		<description><![CDATA[Let me preface this post with a clear disclaimer: I work at Google, but the views I express on this blog are my own personal views. Last week, Google head of webspam Matt Cutts posted a full-throated defense of Google&#8217;s transparency on Google&#8217;s European Policy Blog in response to complaints that a few companies raised [...]]]></description>
			<content:encoded><![CDATA[<p>Let me preface this post with a clear disclaimer: I work at Google, but the views I express on this blog are my own personal views.</p>
<p>Last week, Google head of webspam <a href="http://www.mattcutts.com/">Matt Cutts</a> posted a full-throated <a href="http://googlepolicyeurope.blogspot.com/2010/03/google-transparency-and-our-not-so.html">defense of Google&#8217;s transparency</a> on Google&#8217;s <a href="http://googlepolicyeurope.blogspot.com/">European Policy Blog</a> in response to complaints that a few companies raised to the European Commission. Long-time readers of my blog know that I&#8217;m a big fan of search engine transparency and have made my own calls on this blog for Google to be more transparent. The fact that I <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">work at Google</a> now doesn&#8217;t change my values. But being on the inside has informed my perspective.</p>
<p>In particular, as Matt elaborates in his post, Google deserves more credit for transparency than it often gets from its critics. For example, Google has published:</p>
<ul>
<li>&#8220;<a href="http://ilpubs.stanford.edu:8090/361">The Anatomy of a Large-Scale Hypertextual Web Search Engine</a>&#8221;, which not only details the formula for <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a>, but also mentions other signals that Google uses to rank search results: <a href="http://en.wikipedia.org/wiki/Anchor_text">anchor text</a>, location of query terms within documents, proximity of query terms, etc.</li>
<li>details of its key infrastructure innovations: <a href="http://labs.google.com/papers/mapreduce.html">MapReduce</a>, the <a href="http://labs.google.com/papers/gfs.html">Google File System</a>, <a href="http://labs.google.com/papers/bigtable.html">Bigtable</a>, and <a href="http://code.google.com/p/protobuf/">Protocol Buffers</a></li>
<li>hundreds of <a href="http://research.google.com/pubs/papers.html">research papers</a> by Googlers in diverse areas of computer science</li>
</ul>
<p>He goes on to describe the various webmaster tools and social media resources that Google has made available. The popularity of these tools is a testament to their utility.</p>
<p>Still, as Matt points out:</p>
<blockquote><p>we don&#8217;t think it&#8217;s unreasonable for any business to have some trade secrets, not least because we don’t want to help spammers and crackers game our system. If people who are trying to game search rankings knew every single detail about how we rank sites, it would be easier for them to &#8216;spam&#8217; our results with pages that are not relevant and are frustrating to users &#8212; including porn and malware sites.</p></blockquote>
<p>As I blogged <a href="http://thenoisychannel.com/2008/04/08/qa-with-amit-singhal/">back in 2008</a>, I still hope that someday we won&#8217;t need to rely on a relevance analog of <a href="http://en.wikipedia.org/wiki/Security_through_obscurity">security through obscurity</a> in order to deter spam and abusive <a href="http://en.wikipedia.org/wiki/Search_engine_optimization">SEO</a> practices. But I recognize that we haven&#8217;t developed such an analog, and hence that complete transparency today for web search ranking algorithms would have a far greater downside than upside for ordinary users.</p>
<p>I suspect that a prerequisite for complete transparency in search is moving from a ranking-based retrieval approach to a <a href="http://thenoisychannel.com/2008/08/24/set-retrieval-vs-ranked-retrieval/">set-based approach</a>. For many web search information needs (e.g., <a href="http://en.wikipedia.org/wiki/Web_search_query">navigational queries</a>), it&#8217;s hard to see how users would benefit from such a radical change. For queries that represent more <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory</a> information needs, a set-based approach would be (at least in my view) far preferable to one based on ranking. But there&#8217;s <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/">a lot of work</a> to do on the content side before such exploratory interfaces for the web are usable.</p>
<p>In summary, I&#8217;m happy to see Matt taking a public stand in Google&#8217;s defense. I don&#8217;t always agree with my employer&#8217;s decisions, but I do believe that my colleagues act in good faith and with good intentions. I understand how many people&#8211;especially site owners&#8211;fixate on whatever Google keeps secret. In a world where so many people compete for attention, information is power. Google tries to provide maximum quality to users while keeping the playing field level for site owners. As Google Fellow <a href="http://singhal.info/">Amit Singhal</a> points out, &#8220;<a href="http://googlepolicyeurope.blogspot.com/2010/02/this-stuff-is-tough.html">this stuff is tough</a>&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/07/google-and-transparency/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Not All Queries Are Created Equal</title>
		<link>http://thenoisychannel.com/2010/03/07/not-all-queries-are-created-equal/</link>
		<comments>http://thenoisychannel.com/2010/03/07/not-all-queries-are-created-equal/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 15:55:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2987</guid>
		<description><![CDATA[A topic with which I developed an obsession in my last few years at Endeca is understanding how to predict query difficulty and performance&#8211;performance in the information retrieval sense meaning results quality, not computational efficiency. If only we knew how well a search engine would do&#8211;or did&#8211;in meeting the user&#8217;s information need, we might adapt [...]]]></description>
			<content:encoded><![CDATA[<p>A topic with which I developed an obsession in my last few years at <a href="http://endeca.com/">Endeca</a> is understanding how to predict query difficulty and performance&#8211;performance in the <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> sense meaning results quality, not computational efficiency. If only we knew how well a search engine would do&#8211;or did&#8211;in meeting the user&#8217;s information need, we might adapt the user experience to reflect our degree of confidence.</p>
<p>I was particularly interested in work related to the query clarity score initially proposed by Steve Cronen-Townsend, Yun Zhou, and <a href="http://ciir.cs.umass.edu/personnel/croft.html">Bruce Croft</a> in a 2002 paper entitled &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.4506&amp;rep=rep1&amp;type=pdf">Predicting Query Performance</a>&#8221;. But there is a wide variety of work in this area, including methods to predict performance either before or after results retrieval.</p>
<p>Happily, <a href="http://www.vf.utwente.nl/~hauffc/">Claudia Hauff</a> just published a dissertation on this topic, entitled &#8220;<a href="http://eprints.eemcs.utwente.nl/17338/">Predicting the Effectiveness of Queries and Retrieval Systems</a>&#8221;. It is very well written, and I recommend it to anyone interested in learning more about this subject. She presents not only her own original research, but also a comprehensive analysis of others&#8217; efforts.</p>
<p>Here is an excerpt from the abstract:</p>
<blockquote><p>In this thesis we consider users&#8217; attempts to express their information needs through queries, or search requests and try to predict whether those requests will be of high or low quality. Intuitively, a query&#8217;s quality is determined by the outcome of the query, that is, whether the retrieved search results meet the user&#8217;s expectations. The second type of prediction methods under investigation are those which attempt to predict the quality of search systems themselves. Given a number of search systems to consider, these methods estimate how well or how poorly the systems will perform in comparison to each other.</p></blockquote>
<p>I look forward to seeing researchers continue to build on these results, and I am excited for the day when search engines are more reflective about their own strengths and weaknesses.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/07/not-all-queries-are-created-equal/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>HCIR 2010: A Pre-Announcement</title>
		<link>http://thenoisychannel.com/2010/03/07/hcir-2010-a-pre-announcement/</link>
		<comments>http://thenoisychannel.com/2010/03/07/hcir-2010-a-pre-announcement/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 05:41:12 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2983</guid>
		<description><![CDATA[We&#8217;re gearing up to officially announce the HCIR 2010 workshop, but I wanted to give folks here a heads up, as well as to put out a call for a volunteer. The Fourth Workshop on Human-Computer Interaction and Information Retrieval will take place on August 22nd at Rutgers University in New Brunswick, NJ. It will [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;re gearing up to officially announce the <a href="http://www.hcir2010.org/">HCIR 2010</a> workshop, but I wanted to give folks here a heads up, as well as to put out a call for a volunteer.</p>
<p>The Fourth Workshop on Human-Computer Interaction and Information Retrieval will take place on August 22nd at Rutgers University in New Brunswick, NJ. It will be an independent workshop as in previous years, but this year we are co-locating it with the Third Information Interaction in Context conference (<a href="http://www.iiix2010.org/">IIiX 2010</a>).</p>
<p>We&#8217;ve already lined up Google&#8217;s <a href="http://sites.google.com/site/dmrussell/ ">Dan Russell</a> as a keynote speaker and are close to circulating a call for participation. We&#8217;re also planning to introduce something new to the workshop this year: an HCIR challenge! Participants will build applications around a specific data set that demonstrate the use of HCIR techniques. We&#8217;ll announce the data set and the challenge details as soon as we&#8217;ve confirmed the licensing details.</p>
<p>Meanwhile, we&#8217;re looking for a volunteer to help us build a baseline index for the challenge data set. Participants will be allowed&#8211;but not required&#8211;to use this index as a starting point for their entries. The volunteer should be comfortable using open-source packages <a href="http://lucene.apache.org/java/docs/">Lucene</a> or <a href="http://lucene.apache.org/solr/">Solr</a>. If you are interested in being that volunteer, please let me know, and I&#8217;ll be happy to share more details.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/03/07/hcir-2010-a-pre-announcement/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>You Can&#8217;t Hurry Relevance</title>
		<link>http://thenoisychannel.com/2010/02/28/you-cant-hurry-relevance/</link>
		<comments>http://thenoisychannel.com/2010/02/28/you-cant-hurry-relevance/#comments</comments>
		<pubDate>Sun, 28 Feb 2010 20:07:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2975</guid>
		<description><![CDATA[Lately, I&#8217;ve been musing about the Herb Simon quote that launched&#8211;or at least popularized&#8211;the concepts of information overload and attention economics: in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of [...]]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">Lately, I&#8217;ve been musing about the <a href="http://en.wikipedia.org/wiki/Herbert_Simon">Herb Simon</a> quote that launched&#8211;or at least popularized&#8211;the concepts of information overload and <a href="http://en.wikipedia.org/wiki/Attention_economy">attention economics</a>:</div>
<blockquote>
<div id="_mcePaste">in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it (Simon, 1971)</div>
</blockquote>
<p>I hope everyone agrees that attention is a scarce good. But I&#8217;m curious how people measure it. After all, if we&#8217;re going to talk about an economic good being scarce, we ought to quantify it!</p>
<p>One approach is to measure attention at a specific moment in time, measuring how much of our instantaneous <a href="http://en.wikipedia.org/wiki/Cognitive_load">cognitive capacity</a> we devote to a task. This approach is useful for evaluating a user interface&#8211;in particular, for determining how users allocate their attention among the various interface elements. Another approach is to measure attention in units of time, e.g., how many of our waking hours we devote to a particular activity. This latter approach strikes me as more of what Herb Simon had in mind.</p>
<p>We can interpret the two definitions as equivalent&#8211;after all, cumulative attention devoted to a task is simply the sum (or integral) of instantaneous attention over time. But thinking this way misses a key consideration: we pay a significant price for <a href="http://www.joelonsoftware.com/articles/fog0000000022.html">context switching</a>.</p>
<p>A familiar example is email. The total time we spend reading email is a productivity concern, but the larger concern for many of us is the frequency with which email causes us to interrupt our workflow. Knowing this, I made a <a href="http://thenoisychannel.com/2008/12/06/overwhelmed-by-email/ ">brief attempt</a> in 2008 to check email only once a day. Unfortunately, this approach would have violated too many of my peers&#8217; expectations. I returned to status quo, reading my email (or at least scanning headers) as it arrives. Other messaging tools, such as instant messaging and Twitter, only add to the challenge of managing our personal communication flow.</p>
<p>Of course, what I really want is for my messaging tools to distinguish urgent messages from non-urgent ones, and to only interrupt my workflow for the former. I know that no system, whether based on manual filtering or algorithmic analysis, can make this subjective classification with 100% accuracy, but I&#8217;d certainly accept a handful of false positives in exchange for far fewer interruptions. I suspect I&#8217;m not alone.</p>
<p>Moreover, this approach extends beyond personal communications to more public ones, such as social media platforms and even web search. On one hand, the passing of time offers an opportunity to accumulate reliable content analysis; on the other hand, we don&#8217;t want to miss time-sensitive content just because the system waited too long to determine the content&#8217;s relevance to our information needs. Still, the low <a href="http://en.wikipedia.org/wiki/Signal-to-noise_ratio">signal-to-noise ratio</a> on social media platforms suggests to me that many information consumers would be amenable to a different tradeoff than the one we experience today.</p>
<p>What I&#8217;d really like to see is systems take advantage of the differences in users&#8217; personal senses of urgency. Some examples:</p>
<ul>
<li>A widely broadcast email isn&#8217;t delivered all at once, but first goes to users with higher urgency settings. If those users mark it as spam, the email arrives already marked as spam for users with lower urgency settings. Conversely, if enough high-urgency users mark it as important, then it may be sent to lower-urgency users sooner.</li>
<li>High-urgency users frequently check news sites and blogs. If an article attracts a threshold level of engagement from high-urgency users, then low-urgency users are notified. This approach could apply to general news or to news in a specific topic that the user follows.</li>
<li>Same as above, but applied to activity feeds and based on engagement within your social network. But again, high-urgency users lead the way, seeing updates sooner but at the price of experiencing a noisier stream.</li>
</ul>
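<p>The first example above can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of any existing mail system: the function name, the vote labels, and both thresholds are hypothetical choices made up for the sketch.</p>

```python
def decide_delivery(high_urgency_votes, spam_threshold=0.5, important_threshold=0.5):
    """Decide how low-urgency users should receive a broadcast message.

    high_urgency_votes: labels from high-urgency users who already saw the
    message, each one of "spam", "important", or "none" (hypothetical labels).
    """
    if not high_urgency_votes:
        return "hold"  # no signal yet from the high-urgency users

    total = len(high_urgency_votes)
    spam_share = high_urgency_votes.count("spam") / total
    important_share = high_urgency_votes.count("important") / total

    if spam_share >= spam_threshold:
        return "deliver_as_spam"  # arrives already marked as spam
    if important_share >= important_threshold:
        return "deliver_now"      # important enough to send sooner
    return "deliver_later"        # default: wait for the normal schedule
```

<p>For instance, <code>decide_delivery(["spam", "spam", "none"])</code> yields <code>"deliver_as_spam"</code>. The news and activity-feed examples would follow the same pattern, with engagement counts in place of spam votes.</p>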
<p>To some extent, our existing systems already approximate this approach. Mechanisms like favoriting and re-tweeting propagate signal from information scouts to their followers, as do algorithms that rank real-time information based on engagement. Still, as an information consumer, I&#8217;d appreciate an interface that explicitly and transparently adapts to my priorities, and that manages interruption of my workflow accordingly.</p>
<p>What do folks here think? Is information delayed tantamount to information denied? Or is time on our side, potentially offering us a better tradeoff than the one we experience today?</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/28/you-cant-hurry-relevance/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Holding Back the Rise of the Machines?</title>
		<link>http://thenoisychannel.com/2010/02/20/holding-back-the-rise-of-the-machines/</link>
		<comments>http://thenoisychannel.com/2010/02/20/holding-back-the-rise-of-the-machines/#comments</comments>
		<pubDate>Sat, 20 Feb 2010 20:22:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2969</guid>
		<description><![CDATA[Amazon&#8217;s Mechanical Turk is one of my favorite examples of leveraging the internet for innovation: Amazon Mechanical Turk is a marketplace for work that requires human intelligence. The Mechanical Turk web service enables companies to programmatically access this marketplace and a diverse, on-demand workforce. Developers can leverage this service to build human intelligence directly into [...]]]></description>
			<content:encoded><![CDATA[<p>Amazon&#8217;s <a href="http://aws.amazon.com/mturk/">Mechanical Turk</a> is one of my favorite examples of leveraging the internet for innovation:</p>
<blockquote><p>Amazon Mechanical Turk is a marketplace for work that requires human intelligence. The Mechanical Turk web service enables companies to programmatically access this marketplace and a diverse, on-demand workforce. Developers can leverage this service to build human intelligence directly into their applications.</p></blockquote>
<p>But, in my view, Mechanical Turk does not take its vision far enough. In the <a href="https://www.mturk.com/mturk/conditionsofuse ">conditions of use</a>, Amazon makes it clear that only human participation need apply: &#8220;you will not use robots, scripts or other automated methods to complete the Services&#8221;.</p>
<p>On one hand, I can understand that Amazon&#8217;s vision for Mechanical Turk, like <a href="http://www.cs.cmu.edu/~biglou/">Luis Von Ahn</a>&#8217;s &#8220;<a href="http://www.gwap.com/gwap/">games with a purpose</a>&#8221;, explicitly aims to apply human intelligence to tasks where automated methods seem inadequate. On the other hand, what are automated methods but encapsulations of human methods? It seems odd for Amazon to be so particular about the human / machine distinction, especially given that the terms of service impose practically no other constraints on execution (beyond the obvious legal ones). Moreover, Mechanical Turk offers developers a variety of ways to assure quality (redundancy, qualification tests, etc.).</p>
<p>Granted, there are some important concerns that would have to be addressed if Amazon were to relax the &#8220;humans-only&#8221; constraint. For example, a developer today can reasonably assume that two different human &#8220;Providers&#8221; execute tasks independently. With automated participation, there&#8217;s a far greater risk of dependence&#8211;e.g., from multiple programmers applying the same algorithms. This possibility would have to be taken into account in quality assurance.</p>
<p>Still, the benefits of allowing automated participants would seem to far outweigh the risks. At pennies a task, Mechanical Turk has a limited appeal to the human labor force&#8211;indeed, research by <a href="http://pages.stern.nyu.edu/~panos/">Panos Ipeirotis</a> suggests that Amazon&#8217;s revenue from the service may be so low that it <a href="http://behind-the-enemy-lines.blogspot.com/2009/03/mechanical-turk-profitable-or-not.html  ">doesn&#8217;t even cover the costs of a single dedicated developer</a>!</p>
<p>In contrast, there&#8217;s evidence that programmers would take an interest in participation, were it an option. Marketplaces like <a href="http://www.topcoder.com/ ">TopCoder</a> and competitions like the <a href="http://www.netflixprize.com/ ">Netflix Prize</a> suggest that computer scientists take an interest in proving their mettle in many of the kinds of <a href="http://aws.amazon.com/mturk/#bus-case  ">tasks</a> for which organizations already use Mechanical Turk.</p>
<p>So, why not give algorithms a chance? Surely we&#8217;re not that afraid of <a href="http://en.wikipedia.org/wiki/Skynet_(Terminator) ">Skynet</a> or the &#8220;<a href="http://en.wikipedia.org/wiki/Technological_singularity">technological singularity</a>&#8221;. Let&#8217;s give machines&#8211;and their programmers&#8211;a chance to show off the best of both worlds!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/20/holding-back-the-rise-of-the-machines/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Guest Demo: Eric Iverson&#8217;s Itty Bitty Search</title>
		<link>http://thenoisychannel.com/2010/02/16/guest-demo-eric-iversons-itty-bitty-search/</link>
		<comments>http://thenoisychannel.com/2010/02/16/guest-demo-eric-iversons-itty-bitty-search/#comments</comments>
		<pubDate>Tue, 16 Feb 2010 05:09:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2960</guid>
		<description><![CDATA[I&#8217;m back from vacation, and still digging my way out of everything that&#8217;s piled up while I&#8217;ve been offline. While I catch up, I thought I&#8217;d share with you a demo that Eric Iverson was gracious enough to share with me. It uses Yahoo! BOSS to support an exploratory search experience on top of a general [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m back from vacation, and still digging my way out of everything that&#8217;s piled up while I&#8217;ve been offline.</p>
<p>While I catch up, I thought I&#8217;d share with you a demo that <a href="http://www.linkedin.com/in/newledge">Eric Iverson</a> was gracious enough to share with me. It uses <a href="http://developer.yahoo.com/search/boss/">Yahoo! BOSS</a> to support an <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> experience on top of a general web search engine.</p>
<p>When you perform a query, the application retrieves a set of related term candidates using Yahoo&#8217;s <a href="http://thenoisychannel.com/2008/11/18/yahoo-boss-now-with-key-terms/">key terms</a> API. It then scores each term by dividing its occurrence count within the result set by its global occurrence count&#8211;a relevance measure similar to one my former colleagues and I used at <a href="http://endeca.com/">Endeca</a> in enterprise contexts.</p>
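<p>To make the scoring step concrete, here is a minimal Python sketch of that ratio. It is not Eric&#8217;s code: the function name and input shapes are assumptions, and it counts each term once per result document, which is only one way to read &#8220;occurrence count&#8221;.</p>

```python
from collections import Counter


def score_terms(result_docs, global_counts):
    """Rank candidate terms by (count in result set) / (global count).

    result_docs: list of documents, each a list of candidate key terms
    (hypothetical shape). global_counts: dict mapping each term to its
    occurrence count across the whole collection (assumed to be available).
    """
    # Count each term once per result document that mentions it.
    local_counts = Counter(term for doc in result_docs for term in set(doc))

    scores = {}
    for term, local in local_counts.items():
        total = global_counts.get(term, 0)
        if total:  # skip terms with no known global count
            scores[term] = local / total

    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

<p>The ratio favors terms that are concentrated in the result set over terms that are common everywhere, which is why a phrase like &#8220;interactive information retrieval&#8221; can outrank a generic word that happens to appear in as many results.</p>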
<p>You can try out the demo yourself at <a href="http://www.ittybittysearch.com/">http://www.ittybittysearch.com/</a>. While it has rough edges, it produces nice results&#8211;especially considering the simplicity of the approach.</p>
<p>Here&#8217;s an example of how I used the application to explore and learn something new. I started with [<a href="http://www.ittybittysearch.com/?term=%22information+retrieval%22&amp;submit=Search">"information retrieval"</a>]. I noticed &#8220;interactive information retrieval&#8221; as a top term, so I used it to <a href="http://www.ittybittysearch.com/?term=%22information+retrieval%22+%22interactive+information+retrieval%22">refine</a>. Most of the refinement suggestions looked familiar to me&#8211;but an unfamiliar name caught my attention: &#8220;Anton Leuski&#8221;. Following my curiosity, I <a href="http://www.ittybittysearch.com/?term=%22information+retrieval%22+%22interactive+information+retrieval%22+%22anton+leuski%22">refined</a> again. Looking at the results, I immediately saw that Leuski had done work on <a href="http://lrd.yahooapis.com/_ylc=X3oDMTU4NzFrZ29uBF9TAzIwMjMxNTI3MDIEYXBwaWQDVTV1Q3VsZlYzNEVFeHBQMHhHakdvVmt5bDVwWXFUS1d0TnVXZjVrVFU4TGQ4WkdOWHRPaHNrdk5kRlFmS2dVLQRjbGllbnQDYm9zcwRzZXJ2aWNlA0JPU1MEc2xrA3RpdGxlBHNyY3B2aWQDRVE5aW1XS0ljcnJ2MWlMZXl0RmZVWGdUcmNrYXFFdDZKUXNBQzdzMQ--/SIG=123bqc174/**http%3A//parnec.nuaa.edu.cn/xtan/IIR/readings/cikmLeuski2001.pdf">evaluating document clustering for interactive information retrieval</a>. Further exploration made it clear this is someone whose work I should get to know&#8211;check out his <a href="http://people.ict.usc.edu/~leuski/">home page</a>!</p>
<p>I can&#8217;t promise that you&#8217;ll have as productive an experience as I did, but I encourage you to try Eric&#8217;s demo. It&#8217;s simple examples like these that remind me of the value of pursuing <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> for the open web.</p>
<p>Speaking of which, <a href="http://www.hcir2010.org/">HCIR 2010</a> is in the works. We&#8217;ll flesh out the details over the next few weeks, and of course I&#8217;ll share them here.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/16/guest-demo-eric-iversons-itty-bitty-search/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Vacation</title>
		<link>http://thenoisychannel.com/2010/02/06/vacation/</link>
		<comments>http://thenoisychannel.com/2010/02/06/vacation/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 22:54:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2956</guid>
		<description><![CDATA[Just letting readers know that I&#8217;ll be on vacation for the next week. If you are starved for reading materials, check out some of the blogs I read.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.culebrabeachrental.com/index.htm"><img class="alignnone" title="Culebra, Puerto Rico" src="http://www.culebrabeachrental.com/images/fb_19.jpg" alt="" width="360" height="270" /></a></p>
<p>Just letting readers know that I&#8217;ll be on vacation for the next week. If you are starved for reading materials, check out some of the <a href="http://thenoisychannel.com/category/blogs-i-read/">blogs I read</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/06/vacation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WSDM 2010: Day 3</title>
		<link>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-3/</link>
		<comments>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-3/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 22:24:36 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2952</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM. Today is the last day of WSDM 2010, and I unfortunately spent it at home drinking chicken soup. But I&#8217;ve been following the conference via the proceedings and tweets. The day started with a short session on temporal interaction. Topics included clustering social media documents (e.g., Flickr photos) based on their association [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: this post is cross-posted at <a href="http://cacm.acm.org/blogs/blog-cacm/72149-wsdm-2010-day-3/fulltext">BLOG@CACM</a>.<br />
</em></p>
<p>Today is the last day of <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>, and I unfortunately spent it at home drinking chicken soup. But I&#8217;ve been following the conference via the <a href="http://www.wsdm-conference.org/2010/proceedings/ ">proceedings</a> and <a href="http://search.twitter.com/search?q=%23wsdm2010 ">tweets</a>.</p>
<p>The day started with a short session on temporal interaction. Topics included clustering social media documents (e.g., <a href="http://www.flickr.com/">Flickr</a> photos) based on their association with events, statistical tests for early identification of popular social media content, and analysis of answers sites (like <a href="http://answers.yahoo.com/">Yahoo! Answers</a>) as evolving two-sided economic markets.</p>
<p>The next session focused on advertising. Two papers focused on click prediction: one proposing a <a href="http://www.scholarpedia.org/article/Bayesian_statistics">Bayesian</a> inference model to better predict click-throughs in the tail of the ad distribution; the other presenting a framework for personalized click models. Another paper addressed the closely related problem of predicting ad relevance. The remaining papers discussed other aspects of search advertising: one on estimating the value per click for channels like <a href="http://www.google.com/services/adsense_tour/index.html">Google AdSense</a>, where ad inventory is supplied by a third party; the other proposing an algorithmic approach to automate online ad campaigns based on <a href="http://en.wikipedia.org/wiki/Landing_page">landing page</a> content.</p>
<p>The following session was on systems and efficiency, a popular topic given the immense data and traffic associated with web search. Two papers proposed approaches to help short-circuit ranking computations: one by optimizing the organization of <a href="http://en.wikipedia.org/wiki/Inverted_index">inverted index</a> entries to consider both the static ranks of documents and the upper bounds of term scores for all terms contained in each document; the other using early-exit strategies to optimize <a href="http://en.wikipedia.org/wiki/Ensemble_learning">ensemble-based machine learning</a> algorithms. Another used machine learning to mine rules for de-duplicating web pages based on URL string patterns. Another focused on compression, showing that web content is at least an order of magnitude more compressible than what can be achieved by <a href="http://en.wikipedia.org/wiki/Gzip">gzip</a>. The last paper proposed a method to perform efficient distance queries on graphs (e.g., web graphs or social graphs) by pre-computing a collection of node-centered subgraphs.</p>
<p>The last session of the conference discussed various topics in web mining. One presented a system for identifying distributed search bot attacks. Another proposed an image search method using a combination of entity information and visual similarity. The final paper showed that shallow text features can be used for low-cost detection of boilerplate text in web documents.</p>
<p>All in all, WSDM 2010 was an excellent conference, and I&#8217;m sad not to have been able to attend more of it in person. I&#8217;m delighted to see an even mix of academic and industry representatives sharing ideas and working to make the web a better place for information access.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-3/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WSDM 2010: Day 2</title>
		<link>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-2/</link>
		<comments>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-2/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 04:00:53 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2949</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM. Unfortunately, I woke up this morning rather under the weather, so I&#8217;m having to resort to remotely reporting on the second day of WSDM 2010 conference, based on the published proceedings and the tweet stream. The day started with a keynote from Harvard economist Susan Athey. Her research focuses on the [...]]]></description>
			<content:encoded><![CDATA[<p><i>Note: this post is cross-posted at <a href="http://cacm.acm.org/blogs/blog-cacm/71927-wsdm-2010-day-2/fulltext">BLOG@CACM</a>.</i></p>
<p>Unfortunately, I woke up this morning rather under the weather, so I&#8217;m having to resort to remotely reporting on the second day of the <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a> conference, based on the published proceedings and the <a href="http://twitter.com/#search?q=%23wsdm2010">tweet stream</a>.</p>
<p>The day started with a keynote from Harvard economist <a href="http://kuznets.fas.harvard.edu/~athey/">Susan Athey</a>. Her research focuses on the design of auction-based markets, a topic core to the business of search which largely relies on auction-based advertising models (cf. <a href="http://en.wikipedia.org/wiki/AdWords">Google AdWords</a>). Then came a session focused on learning and optimization. One paper proposed a method to learn ranking functions and query categorization simultaneously, reflecting that different categories of queries lead users to have different expectations about ranking. Another combined traditional list-based ranking with pair-wise comparisons between results to separate the results into tiers reflecting grades of relevance. An intriguing approach to query recommendation treated it as an optimization problem, perturbing users’ query-reformulation path to maximize the expected value of a utility function over the search session. Another paper looked not at ranking per se, but rather at improving the quality of training data for using machine learning for ranking. The final paper of the session, which earned a best-paper nomination, modeled document relevance based not on click-through behavior, but rather on post-click user behavior.</p>
<p>The next session was about users and measurement. It opened with another best-paper nominee: an analysis of over a hundred million users to understand how they re-find web content. Another offered a rigorous analysis of the often sloppily presented &#8220;<a href="http://en.wikipedia.org/wiki/Long_Tail">long-tail</a>&#8221; hypothesis: it found that light users disproportionately prefer content at the head of the distribution while heavy users disproportionately prefer the tail. Another log-analysis paper analyzed search logs using a partially observable Markov model, a variant of the <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model">hidden Markov model</a> in which not all of the hidden state transitions emit observable events&#8211;and compared the latent variables with eye-tracking studies. An intriguing study demonstrated that user behavior models are more predictive of goal success than models based on document relevance. The final paper of the session proposed methods for quantifying the reusability of the test collections that lie at the heart of information retrieval evaluation.</p>
<p>The last session of the day focused on social aspects of search. Two of the papers were concerned with modeling authority and influence in social networks, a problem in which I take a deep <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">personal interest</a>. Another inferred attributes of social network users based on those of other users in their communities (cf. MIT&#8217;s <a href="http://www.boston.com/bostonglobe/ideas/articles/2009/09/20/project_gaydar_an_mit_experiment_raises_new_questions_about_online_privacy/">Project Gaydar</a>). Another analyzed <a href="http://www.flickr.com/">Flickr</a> and <a href="http://www.last.fm/">Last.fm</a> user logs to show that users&#8217; semantic similarity based on their tagging behavior is predictive of social links. The final paper tackled the sparsity of social media tags by inferring latent topics from shared tags and spatial information.</p>
<p>Not surprisingly, a disproportionate number of contributors to the conference work at major web search companies, who have both the motivation to improve results and the access to data that is needed for such research. One of the ongoing research challenges for the field is to find ways to make this data available to others while respecting the business concerns of search engine companies and the privacy concerns of their users.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WSDM 2010: Day 1</title>
		<link>http://thenoisychannel.com/2010/02/05/wsdm-2010-day-1-2/</link>
		<comments>http://thenoisychannel.com/2010/02/05/wsdm-2010-day-1-2/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 05:52:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2946</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM. Today was the first day of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), held at the Polytechnic Institute of NYU in Brooklyn, NY. WSDM is a young conference that has already become a top-tier publication venue for research in these areas. In [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: this post is cross-posted at </em><a href="http://cacm.acm.org/blogs/blog-cacm"><em>BLOG@CACM</em></a><em>.</em></p>
<p>Today was the first day of the Third ACM International Conference on Web Search and Data Mining (<a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>), held at the Polytechnic Institute of NYU in Brooklyn, NY. WSDM is a young conference that has already become a top-tier publication venue for research in these areas. In contrast to some of the larger conferences, WSDM is single-track and feels more intimate and coherent&#8211;even with over 200 attendees.</p>
<p>The day started with an ambitious keynote by <a href="http://www.cse.iitb.ac.in/~soumen/">Soumen Chakrabarti</a> (IIT Bombay): &#8220;Bridging the Structured Un-Structured Gap&#8221;. He described a soup-to-nuts architecture to annotate web documents and perform complex reasoning on them using a structured query language. But perhaps this ambitious approach is a practical one: it uses the web we have&#8211;as opposed to waiting for the semantic web to emerge&#8211;and there is a prototype using half a billion documents.</p>
<p>The first paper session focused on web search. Of the five papers, two emphasized temporal aspects of content, one considered social media recommendation, and one focused on identifying concepts in multi-word queries. The last paper of the session proposed using anchor text as a more widely available input than query logs to support the query reformulation process. It also attracted the most audience attention&#8211;while <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">interaction</a> is often a niche at information retrieval conferences, it always elicits strong interest and opinions.</p>
<p>The following session focused on tags and recommendations. Some take-aways: users produce tags similar to the topics designed by experts; individual &#8220;personomies&#8221; can be translated into aggregated folksonomies; matrix factorization methods can produce interpretable recommendations.</p>
<p>The last session of the day covered information extraction. One of the papers used pattern-based information extraction approaches, demonstrating how far we&#8217;ve come since <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>&#8217;s <a href="http://people.ischool.berkeley.edu/~hearst/papers/coling92.pdf">seminal work</a> on the subject. Another offered a SQL-like system for typed-entity search, complete with a live, publicly accessible prototype. The final paper addressed an issue that came up repeatedly at the <a href="http://cacm.acm.org/blogs/blog-cacm/71444-third-workshop-on-search-and-social-media-ssm-2010/fulltext">SSM workshop</a>: the problem of distilling the truth from a collection of inconsistent sources.</p>
<div>After a full day of talks, we headed to <a href="http://www.theparknyc.com/">The Park</a> for an excellent banquet. I&#8217;m looking forward to another two days of great sessions.</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/05/wsdm-2010-day-1-2/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Report on the Third Workshop on Search and Social Media (SSM 2010)</title>
		<link>http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/</link>
		<comments>http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 08:25:01 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2936</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM. It is my pleasure to report on the 3rd Annual Workshop on Search in Social Media (SSM 2010), a gathering of information retrieval and social media researchers and practitioners in an area that has captured the interest of computer scientists, social scientists, and even the broader public. The one-day [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: this post is cross-posted at <a href="http://cacm.acm.org/blogs/blog-cacm/71444-third-workshop-on-search-and-social-media-ssm-2010/fulltext">BLOG@CACM</a>.</em></p>
<p>It is my pleasure to report on the 3rd Annual Workshop on Search in Social Media (<a href="http://ir.mathcs.emory.edu/SSM2010/" target="_blank">SSM 2010</a>), a gathering of information retrieval and social media researchers and practitioners in an area that has captured the interest of computer scientists, social scientists, and even the broader public. The one-day workshop took place at the Polytechnic Institute of NYU in Brooklyn, NY, co-located with the ACM Conference on Web Search and Data Mining (<a href="http://www.wsdm-conference.org/2010/" target="_blank">WSDM 2010</a>). The quality of the presenters, the overbooked registration, and the hundreds of live tweets with the <a href="http://search.twitter.com/search?q=%23ssm2010" target="_blank">#ssm2010</a> hashtag all attest to the success of this event.</p>
<p>The workshop opened with a warm welcome from <a href="http://www.csee.umbc.edu/~ian/" target="_blank">Ian Soboroff</a> (NIST), immediately followed by a keynote from <a href="http://www.jopedersen.com/jopedersen/Home.html" target="_blank">Jan Pedersen</a>, Chief Scientist of Bing Search. Jan established a clear business case for search in social media: the opportunity to deliver content that is fresh, local, and under-served by general web search. He drilled into particular types of content where social media search is most useful: expert opinions, breaking news, and tail content. The benefits of social media search include trust and personal interaction (as compared to web content that is often soulless and of uncertain provenance), low latency (though perhaps at the cost of accuracy), and access to niche or ephemeral information that web search rarely surfaces. But delivering social media results to searchers creates its own variety of challenges, such as weighing freshness against accuracy and relevance, coping with loss of social content&#8217;s conversational context, managing low update latency when search engines have not been optimized for it, and fighting new kinds of spam. Despite these challenges, it is clear that the major web search engines have embraced the brave new world of real-time social content.</p>
<p><a href="http://www.mathcs.emory.edu/~eugene/" target="_blank">Eugene Agichtein</a> (Emory University) then moderated a panel representing the world&#8217;s leading search engines: <a href="http://www.google.com/profiles/jhylton" target="_blank">Jeremy Hylton</a> (Google), <a href="http://datamining.typepad.com/" target="_blank">Matthew Hurst</a> (Microsoft), <a href="http://research.yahoo.com/user/78" target="_blank">Sihem Amer-Yahia</a> (Yahoo!), and <a href="http://ir.baidu.com/phoenix.zhtml?c=188488&amp;p=irol-govBio&amp;ID=161381" target="_blank">William Chang</a> (Baidu). Jeremy justified the universal interface approach, pointing out that users don&#8217;t want to have to figure out what kind of search site to use for their queries, and that they expect a familiar interface. He also noted that Google has made great strides on update latency: it can index the Twitter firehose in the same amount of time as serving a query. Matthew offered various analyses of the social search problem, based on whether the information signal resides in content (e.g., web) or attention (e.g., Twitter), or whether the information need is expressed in an explicit search query or inferred from the user&#8217;s context. Sihem offered a counter-point to Jeremy, arguing that social media search queries often represent broad or vague information needs, and thus call for a more browsing-oriented interface than web search, which is optimized for highly specific needs. William noted that the biggest competitive threat he sees for web search engines comes from social media players&#8211;and he credits much of Baidu&#8217;s success to its surfacing of social media content.</p>
<p>Then came a flurry of questions, perhaps the most interesting of which was how to address identity management. William argued that people prefer interacting with real-named (or pseudonymous) people to whom they are directly connected. Sihem offered the counter-example of obtaining recommendations through community aggregation. Matthew noted the incongruity of there being no economic relationship between social network companies that maintain proprietary social graphs and people whose identities and relationships those graphs represent. Jeremy pointed out that users benefit if the data is as open as possible.</p>
<p>Given the almost even split between academic and industry participation in the workshop, the panelists were also asked to present research challenges to academia. Jeremy posed the problem of determining when social media results are actually true. Matthew wants to see more interdisciplinary work between computer scientists and social scientists. Sihem offered two challenge problems: scalable community discovery and evaluation of collaborative recommendation systems. William wants to see a rigorous axiomatization of social media search behavior.</p>
<p>After lunch, <a href="http://www.fxpal.com/?p=jeremy" target="_blank">Jeremy Pickens</a> (FXPAL) moderated a panel representing social media / networking companies: <a href="http://www.hilarymason.com/" target="_blank">Hilary Mason</a> (bit.ly), <a href="http://www.linkedin.com/in/igorperisic" target="_blank">Igor Perisic</a> (LinkedIn), and <a href="http://www.myspace.com/myspacedave" target="_blank">David Hendi</a> (MySpace). Hilary noted that, while bit.ly does not have access to an explicit social graph, it captures implicit connections from user behavior that may not be represented in the graph. Jeremy asked the panelists how much a person&#8217;s extended network matters; David and Igor pointed out research indicating correlations of mood and even medical conditions between people and their third-degree connections. Again, the audience was full of questions, especially for Igor. As a fan of <a href="http://en.wikipedia.org/wiki/Faceted_search" target="_blank">faceted search</a>, I was glad to see him touting LinkedIn&#8217;s success in making faceted search the primary means of performing people search on the site. For an in-depth view, I recommend &#8220;<a href="http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/" target="_blank">LinkedIn Search: A Look Beneath the Hood</a>&#8221;.</p>
<p>The afternoon continued with a poster / demo session emphasizing work in progress: tools, interfaces, research studies, and position papers. I particularly enjoyed listening to the stream of interaction between academic researchers and industry practitioners.</p>
<p>The final panel session assembled academic researchers to discuss their views of the challenges in social media. <a href="http://www.fxpal.com/?p=gene" target="_blank">Gene Golovchinsky</a> (FXPAL) moderated a panel comprised of <a href="http://knoesis.wright.edu/researchers/meena/homepage/" target="_blank">Meena Nagarajan</a> (Wright State University), <a href="http://www.lehigh.edu/~lih307/" target="_blank">Liangjie Hong</a> (Lehigh University), <a href="http://www.dcs.gla.ac.uk/~richardm/" target="_blank">Richard McCreadie</a> (University of Glasgow), <a href="http://www.cs.cmu.edu/~jelsas/" target="_blank">Jonathan Elsas</a> (CMU), and <a href="http://comminfo.rutgers.edu/~mor/" target="_blank">Mor Naaman</a> (Rutgers University). Meena highlighted the need to build up meta-data to describe the context around social utterances. Liangjie took a position similar to William Chang&#8217;s, calling for a framework to model the tasks and behavior of users who interact with social media. Richard focused on the intersection of social media and news search, and noted that some of the most useful information is private and proprietary (e.g., search and chat logs). Jonathan offered a variety of challenges: determining the right retrieval granularity, managing multiple axes of organization, aggregating author behavior, and multidimensional indexing of social media content. Finally, Mor noted that we&#8217;re moving from a world of email to a &#8220;social awareness stream&#8221;, in which we direct content at a group and have lower expectations of readership than with email. As with all of the panels, there were countless questions from the moderator and audience, particularly about determining the truthfulness of social media content and delivering social content in an effective user interface.</p>
<p>The final session was a full-group discussion that dived into the various topics addressed throughout the day. But Gene Golovchinsky provided the &#8220;one more thing&#8221; at the end, showing us a glimpse of a faceted search interface to explore a Twitter stream. It was an elegant finish to a day filled with informative and engaging discussion, and I look forward to seeing many of the participants in the WSDM conference over the next few days.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Blogging SSM 2010 and WSDM 2010</title>
		<link>http://thenoisychannel.com/2010/02/03/blogging-ssm-2010-and-wsdm-2010/</link>
		<comments>http://thenoisychannel.com/2010/02/03/blogging-ssm-2010-and-wsdm-2010/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 05:07:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2932</guid>
		<description><![CDATA[I&#8217;m delighted to report that I&#8217;ll be blogging about the Search and Social Media Workshop (SSM 2010) and the Web Search and Data Mining Conference (WSDM 2010) for Communications of the ACM. Of course, I&#8217;ll cross-post here. I also encourage folks to follow the live tweet streams at #ssm2010 and #wsdm2010, as well as Gene and [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m delighted to report that I&#8217;ll be blogging about the Search and Social Media Workshop (<a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>) and the Web Search and Data Mining Conference (<a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>) for <a href="http://cacm.acm.org/blogs/blog-cacm/">Communications of the ACM</a>.</p>
<p>Of course, I&#8217;ll cross-post here. I also encourage folks to follow the live tweet streams at <a href="http://search.twitter.com/search?q=%23ssm2010">#ssm2010</a> and <a href="http://search.twitter.com/search?q=%23wsdm2010">#wsdm2010</a>, as well as Gene and Jeremy&#8217;s posts at the <a href="http://palblog.fxpal.com/?tag=ssm2010">FXPAL blog</a>.</p>
<p>To those attending: see you all tomorrow through Saturday! To everyone else: I will try my best to communicate the substance and spirit of the conference.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/03/blogging-ssm-2010-and-wsdm-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: Search Facets</title>
		<link>http://thenoisychannel.com/2010/01/31/blogs-i-read-search-facets/</link>
		<comments>http://thenoisychannel.com/2010/01/31/blogs-i-read-search-facets/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 19:54:18 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2928</guid>
		<description><![CDATA[A couple of years ago, I started The Noisy Channel as a personal blog. Since my then-employer Endeca didn&#8217;t have a corporate blog, I became the company&#8217;s ambassador to the blogosphere, despite my protests that this was not a corporate blog. But I&#8217;m pleased to report that Endeca now has its own blog, aptly [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of years ago, I started The Noisy Channel as a personal blog. Since my then-employer <a href="http://endeca.com/">Endeca</a> didn&#8217;t have a corporate blog, I became the company&#8217;s ambassador to the blogosphere, despite my protests that this was <a href="http://thenoisychannel.com/2008/12/10/this-is-not-a-corporate-blog/">not a corporate blog</a>.</p>
<p>But I&#8217;m pleased to report that Endeca now has its own blog, aptly entitled <a href="http://facets.endeca.com/">Search Facets</a>. I&#8217;m not usually a fan of corporate blogs, but I like the approach Endeca is taking to this one. The folks who have posted so far are Adam Ferrari (CTO), Vladimir Zelevinsky (Research Scientist), and Pete Bell (Co-Founder)&#8211;an indication that the blog will contain substance, rather than warmed-over press releases.</p>
<p>Indeed, the posts so far are nice and meaty. I particularly like Adam&#8217;s post about &#8220;<a href="http://facets.endeca.com/2010/01/vertical-stores-for-vertical-web-search/">Vertical stores for vertical web search?</a>&#8221;&#8211;it&#8217;s nice to read intelligent analysis from someone who understands the strengths of both <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> and <a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS">column-oriented relational databases</a>.</p>
<p>Anyway, I&#8217;m delighted that my former co-workers have taken to the blogosphere, and I look forward to reading what they have to say!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/31/blogs-i-read-search-facets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LinkedIn Search: A Look Beneath the Hood</title>
		<link>http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/</link>
		<comments>http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 18:22:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2924</guid>
		<description><![CDATA[Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn introducing faceted search and celebrating the interface from a user perspective. John&#8217;s presentation at the SDForum took a developer&#8217;s perspective, discussing the challenges of combining faceted search and [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://docs.google.com/present/embed?id=d7qvbkn_28cgpvm96r" frameborder="0" width="410" height="342"></iframe></p>
<p>Last week, I had the good fortune to attend a presentation by <a href="http://www.linkedin.com/in/javasoze">John Wang</a>, search architect at <a href="http://linkedin.com/">LinkedIn</a>. You may have read my <a href="http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/">earlier posts</a> about LinkedIn introducing <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> and celebrating the interface from a user perspective. John&#8217;s presentation at the <a href="http://www.sdforum.org/index.cfm?fuseaction=Calendar.eventDetail&amp;eventID=13601">SDForum</a> took a developer&#8217;s perspective, discussing the challenges of combining faceted search and social networking at scale.</p>
<p>John was kind enough to publish his slides, and I&#8217;ve embedded them above. Unfortunately, there&#8217;s no recording of the extensive Q&#038;A (which included various attempts to get John to reveal the precise details of LinkedIn&#8217;s data volume), but the slides are quite meaty.</p>
<p>Personally, I learned two surprising things from the talk.</p>
<p>First, I was surprised that LinkedIn dismisses index/cache warming as &#8220;cheating&#8221;, instead computing almost everything in real time. Specifically, I would have expected LinkedIn to cache information like a user&#8217;s set of degree-two connections: these are expensive to compute at query time, especially when the <a href="http://en.wikipedia.org/wiki/Social_graph">social graph</a> is distributed and <a href="http://en.wikipedia.org/wiki/Shard_%28database_architecture%29">sharded</a> by user. I did ask John whether LinkedIn recomputes a user&#8217;s degree-two network during a session, and he admitted that LinkedIn is sensible enough to &#8220;cheat&#8221; and not perform this expensive but almost useless re-computation.</p>
<p>Second, I learned about <a href="http://www.linkedin.com/rs?trk=msitesearch">reference search</a>, a feature I may have missed because it is only available for premium LinkedIn accounts. It&#8217;s a nice feature, allowing you to search against company + date range pairs. People who are familiar with implementing faceted search may recognize the preservation of such associations between facet values as a gnarly implementation challenge.</p>
<p>All in all, it was a treat to get this look under the hood, as well as to finally meet John in person. I also ran into <a href="http://palblog.fxpal.com/?p=2806">Gene Golovchinsky</a> there&#8211;so much for my spending a few days on the west coast incognito!</p>
<p>In any case, I&#8217;m looking forward to seeing Gene, some of John&#8217;s colleagues, and many more interesting people at the Search and Social Media Workshop (<a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>) on Wednesday. My apologies to those who aren&#8217;t able to attend this oversubscribed event. I promise to blog about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Workshop on Search and Social Media (SSM 2010)</title>
		<link>http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/</link>
		<comments>http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 14:24:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2918</guid>
		<description><![CDATA[The 3rd Annual Workshop on Search in Social Media (SSM 2010) will be held on Wednesday, February 3rd at the Polytechnic Institute in Brooklyn, NY. It&#8217;s co-located with the WSDM 2010 conference on Web Search and Data Mining. As a co-organizer, I&#8217;m proud to announce that the workshop program is now online. It features a [...]]]></description>
			<content:encoded><![CDATA[<p>The 3rd Annual Workshop on Search in Social Media (<a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>) will be held on Wednesday, February 3rd at the Polytechnic Institute in Brooklyn, NY. It&#8217;s co-located with the <a href="http://www.wsdm2010.org/">WSDM 2010</a> conference on Web Search and Data Mining. As a co-organizer, I&#8217;m proud to announce that the <a href="http://ir.mathcs.emory.edu/SSM2010/program.html#schedule">workshop program</a> is now online.</p>
<p>It features a keynote from <a href="http://www.jopedersen.com/jopedersen/Home.html">Jan Pedersen</a>, Chief Scientist for Core Search at Microsoft, as well as an impressive set of posters and panel sessions. Other participants include:</p>
<ul>
<li>Sihem Amer-Yahia, Yahoo!</li>
<li>Jon Elsas, CMU</li>
<li>Gene Golovchinsky, FXPAL</li>
<li>David Hendi, MySpace</li>
<li>LiangJie Hong, Lehigh U.</li>
<li>Jeremy Hylton, Google</li>
<li>Matthew Hurst, Microsoft</li>
<li>Hilary Mason, bit.ly</li>
<li>Richard McCreadie, U. of Glasgow</li>
<li>Mor Naaman, Rutgers U.</li>
<li>Meena Nagarajan, Wright State U.</li>
<li>Igor Perisic, LinkedIn</li>
<li>Jeremy Pickens, FXPAL</li>
</ul>
<p>There&#8217;s still time to <a href="http://www.wsdm-conference.org/2010/register.html">register</a> if you&#8217;re interested!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Real Time Search Is Personal</title>
		<link>http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/</link>
		<comments>http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 19:42:13 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2911</guid>
		<description><![CDATA[The other day, I promised in a comment thread that I&#8217;d write about what I see as real use cases for real-time search. As it happens, I&#8217;m experiencing one right now. As my wife, daughter, and I were walking home from a playground, we noticed a large number of fire trucks congregating a block away [...]]]></description>
			<content:encoded><![CDATA[<p>The other day, I promised in a <a href="http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/#comments">comment thread</a> that I&#8217;d write about what I see as real use cases for real-time search. As it happens, I&#8217;m experiencing one right now.</p>
<p>As my wife, daughter, and I were walking home from a playground, we noticed a large number of fire trucks congregating a block away from our house. A quick search on Twitter <a href="http://search.twitter.com/search?q=+near%3A11201+within%3A15mi+explosion">explained</a> what was going on, particularly by pointing us to this <a href="http://gothamist.com/2010/01/18/buildings_and_subway_stations_in_do.php">post</a> on Gothamist&#8211;which as of this writing seems to be the only reporting about this incident.</p>
<p>I think this example tells us a lot about the utility of real-time search. Most of us don&#8217;t need real-time search to tell us about the <a href="http://news.google.com/news/search?q=haiti">news in Haiti</a>, since a critical mass of major news providers is covering the story around the clock. Where real-time search matters most is at the personal level&#8211;specifically, when our personal urgency to obtain information is higher than that of the general population. In such situations, we&#8217;re willing to accept less polished&#8211;and even risk less accurate&#8211;information, particularly if the alternative is to wait until news providers cover the story, if they ever do. At least to some extent, urgency trumps authority.</p>
<p>Yes, there are other use cases for conversational media like Facebook and Twitter, such as sharing the experience of watching a live event, or simply chatting with friends and strangers about arbitrary topics. But I wouldn&#8217;t consider such use of these media to be search. Real-time search, in my view, is about helping users obtain the latest information available&#8211;in accordance with their personal needs. Twitter and <a href="http://www.google.com/search?&amp;output=search&amp;q=brooklyn%20heights%20explosion&amp;tbs=rltm:1">Google</a> served me well today, and I&#8217;m grateful that real-time search gave me real-time peace of mind.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>When Is Faceted Search Appropriate?</title>
		<link>http://thenoisychannel.com/2010/01/15/when-is-faceted-search-appropriate/</link>
		<comments>http://thenoisychannel.com/2010/01/15/when-is-faceted-search-appropriate/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 06:31:27 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2900</guid>
		<description><![CDATA[Earlier this week, Peter Morville and Mark Burrell presented a UIE virtual seminar on &#8220;Leveraging Search &#38; Discovery Patterns For Great Online Experiences&#8221;. It sold out! And I thought Pete Bell and I had done well with our seminar on faceted search! But I&#8217;m hardly surprised. Although I wasn&#8217;t able to attend it myself, I [...]]]></description>
			<content:encoded><![CDATA[
<div id="__ss_2692450" style="width: 425px; text-align: left;"><object style="margin: 0px;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uiedesignpatternstrailermerged-091210133302-phpapp01&amp;stripped_title=search-discovery-patterns-a-uie-virtual-seminar" /><param name="allowfullscreen" value="true" /><embed style="margin: 0px;" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uiedesignpatternstrailermerged-091210133302-phpapp01&amp;stripped_title=search-discovery-patterns-a-uie-virtual-seminar" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>Earlier this week, <a href="http://www.findability.org/">Peter Morville</a> and Mark Burrell presented a <a href="http://uie.com/">UIE</a> virtual seminar on &#8220;<a href="http://www.uie.com/events/virtual_seminars/search_patterns/">Leveraging Search &amp; Discovery Patterns For Great Online Experiences</a>&#8221;. It <a href="http://facets.endeca.com/2010/01/how-to-sell-out-a-virtual-seminar/">sold out</a>! And I thought Pete Bell and I had done well with our <a href="http://www.uie.com/events/virtual_seminars/facets/">seminar</a> on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>!</p>
<p>But I&#8217;m hardly surprised. Although I wasn&#8217;t able to attend it myself, I gather from <a href="http://search.twitter.com/search?q=%23uievs">Twitter</a> and the <a href="http://strottrot.com/2010/01/14/looking-forward-to-interaction10/">blogosphere</a> that it was a great presentation. I enjoyed serving as a reviewer for Peter&#8217;s new book on <a href="http://searchpatterns.org/">Search Patterns</a>, and I contributed a bit to Endeca&#8217;s <a href="http://www.endeca.com/resource-center-ui-pattern-library.htm">UI Design Pattern Library</a> while I was there and Mark&#8217;s team was developing it.</p>
<p>In reading reactions to the seminar, I was particularly intrigued by a post entitled &#8220;<a href="http://livlab.com/thinkia/2010/01/search-and-browse/">Search and Browse</a>&#8221; by Livia Labate on her fantastically named blog, &#8220;<a href="http://livlab.com/thinkia/">I think, therefore IA</a>&#8221;. She raised a question that I think needs to be asked more often: when is (or isn&#8217;t) faceted search appropriate?</p>
<p>Her conversation with readers in a comment thread offered some possible answers:</p>
<ul>
<li>Faceted search helps users who think in terms of attribute specifications as filtering criteria.</li>
<li>Faceted search supports search by exclusion, as opposed to by discovery.</li>
<li>Faceted search requires a set of useful facets that is neither too small nor too large.</li>
</ul>
<p>I&#8217;d like to propose my own answers. Here are the conditions for which I see faceted search being most useful:</p>
<ul>
<li>Faceted search supports <a href="http://thenoisychannel.com/2008/06/24/what-is-not-exploratory-search/">exploratory</a> use cases, in contrast to <a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item search</a>. For known-item search, users are better served by a search box to specify an item by name, or a non-faceted hierarchy to locate it. In contrast, faceted search optimizes for cases where users are either unsure of what they want or of how to specify it.</li>
<li>Faceted search helps users who need or want to learn about the search space as they execute the search process. Facets educate users about different ways to characterize items in a collection. If users do not need or want this education, they may be frustrated by an interface that makes them do more work.</li>
<li>The search space is classified using accurate, understandable facets that relate to the users&#8217; information needs. As I&#8217;ve discussed before, <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/">data quality is often the bottleneck in designing search interfaces</a>. Offering users facets that are either unreliable or unrelated to their needs is worse than providing no facets at all.</li>
</ul>
<p>Given the above criteria, it&#8217;s not surprising that faceted search has been a huge success in online retail: shopping is often an exploratory learning experience, and retailers tend to have good data.</p>
<p>But the success of faceted search in retail overshadows other domains where faceted search may be even more valuable. My favorite example is faceted people search, most recently demonstrated by <a href="http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/">LinkedIn</a>. I would love to see other entities (locations, businesses, etc.) receive similar treatment, at least in contexts where exploration is a common use case.</p>
<p>I think Livia is right to be skeptical about any interface that introduces complexity&#8211;and facets do introduce complexity. I hope that my guidelines help answer her question as to when that complexity is worthwhile and perhaps even necessary to help users satisfy their information needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/15/when-is-faceted-search-appropriate/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Can You &#8220;Near Me Now&#8221;?</title>
		<link>http://thenoisychannel.com/2010/01/09/can-you-near-me-now/</link>
		<comments>http://thenoisychannel.com/2010/01/09/can-you-near-me-now/#comments</comments>
		<pubDate>Sat, 09 Jan 2010 05:12:58 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2896</guid>
		<description><![CDATA[Weren&#8217;t we just talking about what&#8217;s different about mobile search use cases and about how to make web search more exploratory? I may be biased, but I think that Google&#8217;s recently launched &#8220;near me now&#8221; button is a step in the right direction (no pun intended!) on both of these fronts. I&#8217;m curious to hear [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="240" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/ETbTqjjzDLg&amp;hl=en_US&amp;fs=1&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="240" src="http://www.youtube.com/v/ETbTqjjzDLg&amp;hl=en_US&amp;fs=1&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Weren&#8217;t we <a href="http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/">just talking</a> about what&#8217;s different about mobile search use cases and about how to make web search more exploratory? I may be biased, but I think that Google&#8217;s recently launched &#8220;<a href="http://googlemobile.blogspot.com/2010/01/finding-places-near-me-now-is-easier.html">near me now</a>&#8221; button is a step in the right direction (no pun intended!) on both of these fronts.</p>
<p>I&#8217;m curious to hear unbiased feedback from iPhone and Android users who have gotten to play with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/09/can-you-near-me-now/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Search Questions for 2010: What&#8217;s On My Mind</title>
		<link>http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/</link>
		<comments>http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/#comments</comments>
		<pubDate>Sun, 03 Jan 2010 23:09:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2891</guid>
		<description><![CDATA[Happy New Year to the Noisy Community and everyone else in virtual earshot! I hope everyone is entering 2010 well-rested and ready for great things. And I don&#8217;t just mean shiny new gadgets. For me, 2009 marked the end of a decade-long run at Endeca, where I focused on bringing HCIR to enterprises. I&#8217;m particularly [...]]]></description>
			<content:encoded><![CDATA[<p>Happy New Year to the Noisy Community and everyone else in virtual earshot! I hope everyone is entering 2010 well-rested and ready for great things. And I don&#8217;t just mean shiny <a href="http://en.wikipedia.org/wiki/Nexus_One">new</a> <a href="http://en.wikipedia.org/wiki/ISlate">gadgets</a>.</p>
<p>For me, 2009 marked the end of a decade-long run at <a href="http://endeca.com/">Endeca</a>, where I focused on bringing <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> to enterprises. I&#8217;m particularly proud of two professional accomplishments: writing a <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">book</a> on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, and organizing the <a href="http://sigir2009.org/">SIGIR 2009</a> <a href="http://sigir2009.org/Program/industry">Industry Track</a>.</p>
<p>But past is prologue. I spent the last several weeks of 2009 as a <a href="http://www.flickr.com/photos/albill/429691222/">Noogler</a>, and I launch into 2010 living and breathing search on the open web.</p>
<p>What&#8217;s on my mind? Here are some top-of-mind questions to which I hope to have better answers by this time next year:</p>
<ul>
<li><strong>Exploratory Search</strong>: how should we determine that users want a more <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> experience, rather than one that minimizes time to a best-effort result? How should we respond to queries that clearly don&#8217;t have a single best answer, such as queries of the form [category] or [category location]?</li>
</ul>
<ul>
<li><strong>Mobile Search</strong>: should it be just like non-mobile search with a few tweaks to accommodate the device form factor? Or does / should mobile search fundamentally change the way we interact with information?</li>
</ul>
<ul>
<li><strong>Real-Time Search</strong>: is it more than real-time indexing plus emphasizing recency as a query-independent relevance factor? What are the use cases, and how should we be addressing them?</li>
</ul>
<ul>
<li><strong>Social / Collaborative Search</strong>: should we be looking to <a href="http://en.wikipedia.org/wiki/Microblogging">microblogging</a> or other social media signals to augment (or even supplant!) link-based citations as authority cues? Should we be supporting mediated search by linking people to people, rather than directly to information?</li>
</ul>
<ul>
<li><strong>Transparency</strong>: is it possible to offer more <a href="http://thenoisychannel.com/2008/04/08/qa-with-amit-singhal/">transparency in relevance ranking</a> without losing ground in the battle against spam and black-hat SEO?</li>
</ul>
<p>To be clear, these are simply the questions that are on my mind&#8211;I&#8217;m speaking as an individual and not as a Google employee. That said, a great thing about being at Google is that there are people working on all of these areas. So I expect 2010 to be an exciting year!</p>
<p>Curious to hear what problems are on other people&#8217;s minds as we enter the new year. Comment away!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Forget Real-Time, Give Us Over Time!</title>
		<link>http://thenoisychannel.com/2009/12/30/forget-real-time-give-us-over-time/</link>
		<comments>http://thenoisychannel.com/2009/12/30/forget-real-time-give-us-over-time/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 14:56:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2883</guid>
		<description><![CDATA[In a recent announcement, Twitter Platform / API Product Manager Ryan Sarver tells us that Twitter is: committed to providing a framework for any company big or small, rich or poor to do a deal with us to get access to the Firehose in the same way we did deals with Google and Microsoft. We want [...]]]></description>
			<content:encoded><![CDATA[<p>In a recent <a href="https://groups.google.com/group/twitter-development-talk/browse_thread/thread/a1076d83d70d0450?pli=1">announcement</a>, Twitter Platform / API Product Manager <a href="http://sarver.org/about/">Ryan Sarver</a> tells us that Twitter is:</p>
<blockquote><p>committed to providing a framework for any company big or small, rich or poor to do a deal with us to get access to the Firehose in the same way we did deals with Google and Microsoft. We want everyone to have the opportunity &#8212; terms will vary based on a number of variables but we want a two-person startup in a  garage to have the same opportunity to build great things with the full feed that someone with a billion dollar market cap does. There are still a lot of details to be fleshed out and communicated, but this a top priority for us and we look forward to what types of companies and products get built on top of this unique and rich stream.</p></blockquote>
<p>That and some other details, like raising the API rate limit from 150 requests per hour to 1500, may well bring on what Marshall Kirkpatrick of ReadWriteWeb calls &#8220;<a href="http://www.readwriteweb.com/archives/twitter_20_api_rate_change_could_lead_to_a_world_o.php">Twitter 2.0</a>&#8221;. But it was something else in Kirkpatrick&#8217;s write-up that caught my attention&#8211;this quote from <a href="http://wow.ly/">Wow.ly</a> co-founder Kevin Marshall:</p>
<blockquote><p>The more I do with and around social data, the less interested I seem to become in &#8216;realtime&#8217; and the more interested I become in &#8216;over time.&#8217; When I first started hacking on Twitter (and Facebook) apps, I was in love with the idea of parsing and analyzing data in real-time and I was very link/content focused. But the more I build and use these tools, the more I see the value in the history and the trails of the data set.</p></blockquote>
<p>I couldn&#8217;t have said it better! Not that I haven&#8217;t tried: if you look back at my post about <a href="http://thenoisychannel.com/2009/05/27/topsy-tippling-the-stream-of-conversations/">Topsy</a>, you&#8217;ll see where real-time and over time meet. Recency matters, but the signal is far too sparse without some way to aggregate and analyze over time.</p>
<p>I&#8217;m thrilled that Twitter plans to open up its platform in a way that could enable analysis over semantic, social, and temporal dimensions. Now I&#8217;m curious to see what that access will look like, and what everyone who has been clamoring for that access will do with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/30/forget-real-time-give-us-over-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Faceted Web Search?</title>
		<link>http://thenoisychannel.com/2009/12/27/faceted-web-search/</link>
		<comments>http://thenoisychannel.com/2009/12/27/faceted-web-search/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 21:18:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2878</guid>
		<description><![CDATA[Researchers from Microsoft say it&#8217;s very challenging. Google is trying, but there&#8217;s a long way to go. And Eric Iverson just wrote me to describe his own preliminary efforts to build faceted search on top of Yahoo! BOSS. I believe there&#8217;s a clearly established business case for faceted search inside the enterprise, for site search [...]]]></description>
			<content:encoded><![CDATA[<p>Researchers from Microsoft say it&#8217;s <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">very challenging</a>. Google is <a href="http://www.google.com/squared">trying</a>, but there&#8217;s a long way to go. And <a href="http://www.linkedin.com/in/newledge">Eric Iverson</a> just wrote me to describe his own preliminary efforts to build faceted search on top of <a href="http://developer.yahoo.com/search/boss/">Yahoo! BOSS</a>.</p>
<p>I believe there&#8217;s a clearly established business case for <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> inside the <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise</a>, for site search (especially for retail and media / publishing sites), even for <a href="http://en.wikipedia.org/wiki/Vertical_search">vertical search</a> on the open web. In all of these cases, relevance-ranked results are insufficient to meet a large subset of users&#8217; more <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory</a> information needs, and <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a> approaches like faceted search are an easy sell.</p>
<p>But it seems much harder to make this case for general web search. The track record of startups in this space isn&#8217;t very encouraging. That could be because no one has done it right, but Clayton Christensen&#8217;s theory of <a href="http://en.wikipedia.org/wiki/Disruptive_technology">disruptive innovation</a> would suggest that a successful entrant wouldn&#8217;t have to have parity across the board, but would simply need to win on an underserved market segment. Perhaps the increasing use of faceted search for vertical search is how this process is playing out, and faceted search for general web search may end up being a slow agglomeration of verticals.</p>
<p>I&#8217;m curious if others have been pursuing efforts like Eric&#8217;s. Are the available APIs powerful enough to prototype your own faceted web search engine? If they aren&#8217;t, then is this a potential business opportunity for one of the major (or non-major) search engines to promote innovation by offering an <a href="http://googlepublicpolicy.blogspot.com/2009/12/meaning-of-open.html">open system</a>? Or, if Yahoo! BOSS already offers such an open system, what should we make of the scale of its impact?</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/27/faceted-web-search/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>R.I.P. Modista</title>
		<link>http://thenoisychannel.com/2009/12/26/r-i-p-modista/</link>
		<comments>http://thenoisychannel.com/2009/12/26/r-i-p-modista/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 21:03:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2873</guid>
		<description><![CDATA[Long-time readers may recall my post about visual search startup Modista last November, or this guest post by one of its principals. Unfortunately, the story has a sad ending. I hope that both this technology and its developers find a good home.]]></description>
			<content:encoded><![CDATA[<p><a href="http://modista.com/"><img class="alignnone size-full wp-image-2874" title="Modista RIP" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/12/Modista-RIP.png" alt="" width="235" height="236" /></a></p>
<p>Long-time readers may recall my <a href="http://thenoisychannel.com/2008/11/05/modista-similarity-browsingfor-shoes/">post</a> about visual search startup <a href="http://modista.com/">Modista</a> last November, or this <a href="http://thenoisychannel.com/2009/04/10/guest-post-exploring-visual-similarity-with-modista/">guest post</a> by one of its principals. Unfortunately, the story has a <a href="http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/">sad ending</a>. I hope that both this technology and its developers find a good home.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/26/r-i-p-modista/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Recovering From Being Hacked</title>
		<link>http://thenoisychannel.com/2009/12/24/recovering-from-being-hacked/</link>
		<comments>http://thenoisychannel.com/2009/12/24/recovering-from-being-hacked/#comments</comments>
		<pubDate>Thu, 24 Dec 2009 22:48:59 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2866</guid>
		<description><![CDATA[I discovered today that I&#8217;d been hacked earlier this week by a spam link injection attack. I&#8217;m still not sure how it happened, but I believe I&#8217;ve cleaned out all of the offending PHP from my WordPress installation. I&#8217;ve also removed most of my plug-ins in the process, and I may have broken some things [...]]]></description>
			<content:encoded><![CDATA[<p>I discovered today that I&#8217;d been hacked earlier this week by a spam link injection attack. I&#8217;m still not sure how it happened, but I believe I&#8217;ve cleaned out all of the offending PHP from my WordPress installation. I&#8217;ve also removed most of my plug-ins in the process, and I may have broken some things in my zeal to clean up the site. My apologies for any inconveniences, and my thanks to <a href="http://twitter.com/awaisathar">@awaisathar</a> and <a href="http://twitter.com/gsingers">@gsingers</a> for helping me resolve this quickly.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/24/recovering-from-being-hacked/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: UXmatters</title>
		<link>http://thenoisychannel.com/2009/12/20/blogs-i-read-uxmatters/</link>
		<comments>http://thenoisychannel.com/2009/12/20/blogs-i-read-uxmatters/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 00:19:03 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2858</guid>
		<description><![CDATA[According to Wikipedia, user experience is &#8220;the overarching experience a person has as a result of their interactions with a particular product or service, its delivery, and related artifacts, according to their design.&#8221; While I&#8217;ve never labeled myself a designer, I have always cared deeply about user experience, even back before my information retrieval days, [...]]]></description>
			<content:encoded><![CDATA[<p>According to <a href="http://en.wikipedia.org/wiki/User_experience_design">Wikipedia</a>, user experience is &#8220;the overarching experience a person has as a result of their interactions with a particular product or service, its delivery, and related artifacts, according to their design.&#8221; While I&#8217;ve never labeled myself a designer, I have always cared deeply about user experience, even back before my <a href="http://en.wikipedia.org/w/index.php?title=Information_retrieval">information retrieval</a> days, when I was working on <a href="http://en.wikipedia.org/wiki/Graph_drawing">graph drawing</a>. Indeed, user experience is the defining problem for <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>.</p>
<p>One of my favorite resources for learning about user experience is the <a href="http://uxmatters.com/index.php">UXmatters</a> blog. This group blog boasts a set of <a href="http://uxmatters.com/authors/">authors</a> that represent a diverse collection of industry practitioners (and <a href="http://uxmatters.com/authors/archives/2005/12/david_heller_malouf.php">one academic</a>) and offer concrete case studies and recommendations.</p>
<p>For example, in &#8220;<a href="http://www.uxmatters.com/mt/archives/2009/09/best-practices-for-designing-faceted-search-filters.php">Best Practices for Designing Faceted Search Filters</a>&#8221;, Greg Nudelman offers a constructive critique of the <a href="http://www.officedepot.com/">Office Depot</a> search user interface. Some of his material will be familiar to those who have read my <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">faceted search book</a> (particularly the chapter on <a href="http://www.uie.com/events/virtual_seminars/facets/Faceted%20Search%20-%20Chapter%207.pdf">front-end concerns</a>), but the focus on a single example makes for a compelling read. I also liked Greg&#8217;s most recent post, entitled &#8220;<a href="http://www.uxmatters.com/mt/archives/2009/12/cameras-music-and-mattresses-designing-query-disambiguation-solutions-for-the-real-world.php">Cameras, Music, and Mattresses: Designing Query Disambiguation Solutions for the Real World</a>&#8221;. I was amused that he and I use the same &#8220;<em>canon</em>ical&#8221; example for the need to offer <a href="http://thenoisychannel.com/2008/06/02/clarification-vs-refinement/">clarification before refinement</a>. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Here are a few more posts from other authors to give you a taste for the blog:</p>
<ul>
<li>&#8220;<a href="http://www.uxmatters.com/mt/archives/2009/11/first-do-no-harm.php">First, Do No Harm</a>&#8221; by Pabini Gabriel-Petit</li>
<li>&#8220;<a href="http://www.uxmatters.com/mt/archives/2007/11/the-five-competencies-of-user-experience-design.php">The Five Competencies of User Experience Design</a>&#8221; by Steve Psomas</li>
<li>&#8220;<a href="http://www.uxmatters.com/mt/archives/2009/01/beyond-usability-designing-web-sites-for-persuasion-emotion-and-trust.php">Beyond Usability: Designing Web Sites for Persuasion, Emotion, and Trust</a>&#8221; by Eric Schaffer</li>
</ul>
<p>If you are a user experience professional, in name or in deed, then you should be reading the <a href="http://uxmatters.com/index.php">UXmatters</a> blog &#8212; or perhaps even <a href="http://www.uxmatters.com/aboutus/writing-for-uxmatters.php">contributing</a> to it. Of course, you&#8217;re always welcome to contribute a <a href="http://thenoisychannel.com/category/guest-post/">guest post</a> here too.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/20/blogs-i-read-uxmatters/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>LinkedIn Faceted Search Now Out Of Beta</title>
		<link>http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/</link>
		<comments>http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 04:20:25 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2853</guid>
		<description><![CDATA[LinkedIn started rolling out a beta version of faceted people search back in July. Now it&#8217;s officially out of beta, as announced on their blog. I&#8217;ve re-posted the video above in case you missed it in July. Interestingly, LinkedIn developed its own tool to support the combination of faceted search with social network search: Bobo-Browse [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="243" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/unLo7maOgT4&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="243" src="http://www.youtube.com/v/unLo7maOgT4&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>LinkedIn started rolling out a beta version of <a href="http://thenoisychannel.com/2009/07/15/linkedin-rolling-out-faceted-search/">faceted people search</a> back in July. Now it&#8217;s officially out of beta, as <a href="http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/">announced on their blog</a>. I&#8217;ve re-posted the video above in case you missed it in July.</p>
<p>Interestingly, LinkedIn developed its own tool to support the combination of <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> with social network search: <a href="http://code.google.com/p/bobo-browse/">Bobo-Browse</a> (Otis mentioned it in our recent <a href="http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/">presentation</a> to the New York CTO Club). I helped develop similar functionality when I was at <a href="http://endeca.com/">Endeca</a>, so I know how hard this problem is. LinkedIn has done an impressive job&#8211;and has applied it to one of the most valuable data sets on the web. Bravo!</p>
<p>But I can&#8217;t help asking for just one more thing. LinkedIn has great semi-structured data about its 50+ million members. I&#8217;d love to be able to explore that data using more facets&#8211;in particular, facets relating to people&#8217;s job skills and expertise. I hope that&#8217;s something they&#8217;re working on. Perhaps a good topic of conversation at the upcoming <a href="http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/">Workshop on Search in Social Media</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Karaoke: A Hotbed for Micro-IR?</title>
		<link>http://thenoisychannel.com/2009/12/13/karaoke-a-hotbed-for-micro-ir/</link>
		<comments>http://thenoisychannel.com/2009/12/13/karaoke-a-hotbed-for-micro-ir/#comments</comments>
		<pubDate>Sun, 13 Dec 2009 22:08:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2847</guid>
		<description><![CDATA[I&#8217;m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I&#8217;m sure I&#8217;m not the only person who encounters this micro-IR problem, and it occurred to me that there [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I&#8217;m sure I&#8217;m not the only person who encounters this <a href="http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/">micro-IR</a> problem, and it occurred to me that there might be better technical solutions to it.</p>
<p>Most karaoke venues provide printed song books, typically sorted by title and by artist. This approach is certainly adequate for very limited selections, but it doesn&#8217;t scale gracefully. Indeed, one of my favorite karaoke bars, the <a href="http://www.courtsidekaraoke.com/">Courtside</a> in Cambridge, MA, has a fantastic song selection that is only accessible through printed books. Kinda frustrating for a search guy, even though the <a href="http://www.courtsidekaraoke.com/asktheshark.htm">staff</a> is very helpful!</p>
<p>My regular karaoke venue in New York, <a href="http://www.2ndon2nd.com/">Second on Second</a>, is a bit more technologically advanced: it provides computers with dedicated software that allows patrons to search through their song catalog. Aside from being faster than thumbing through books, the software makes it possible to find songs when you only remember words that are in the middle of song or artist names.</p>
<p>But even such a system only addresses <a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item search</a>&#8211;in this case, looking for a song or artist by name when you know precisely what you are looking for. There&#8217;s room for incremental improvement here, e.g., searching for songs based on the lyrics you remember. For example, many people remember a famous David Bowie song based on its protagonist &#8220;<a href="http://www.google.com/search?q=major+tom">Major Tom</a>&#8221; rather than its title &#8220;<a href="http://en.wikipedia.org/wiki/Space_Oddity">Space Oddity</a>&#8221;; fortunately, tools like Google&#8217;s <a href="http://googleblog.blogspot.com/2009/10/making-search-more-musical.html">music search</a> are happy to make such connections.</p>
<p>But none of the karaoke search technology I&#8217;ve seen to date supports <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploration</a>. Specifically, I&#8217;d love to go into a karaoke bar and have a procedure for finding songs I know that is better than trial and error. For example, I&#8217;d like to be able to see my options for hard rock 80s songs with male vocals. Or to find out which <a href="http://en.wikipedia.org/wiki/Downtempo">downtempo</a> bands, if any, are on the menu. A little <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> would go a long way towards making the song-finding experience more pleasant and efficient.</p>
<p>But why stop there? I&#8217;d really like a system that suggests songs based on what it knows about me. For example, knowing that I like to sing <a href="http://www.pandora.com/music/artist/scorpions">Scorpions</a> songs is a reasonable basis to suggest similar artists like <a href="http://www.pandora.com/music/artist/def+leppard">Def Leppard</a> and <a href="http://www.pandora.com/music/artist/guns+n+roses">Guns N&#8217; Roses</a>. Or perhaps to suggest 80s songs in general&#8211;after all, <a href="http://www.karaokeholics.com/home.cfm?dir_cat=17160">karaoke roulette</a> notwithstanding, most people sing songs they know (or at least think they know), and their song knowledge tends to have some temporal locality. I&#8217;m sure you can imagine far more sophisticated personalization&#8211;and such personalization could be accomplished with complete <a href="http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/">transparency</a> to the user.</p>
<p>Even if you aren&#8217;t into karaoke (and yet have managed to read this far!), I hope you can appreciate the universality of the information needs I&#8217;m describing. <a href="http://en.wikipedia.org/wiki/Exploratory_search">Exploratory search</a> is everywhere. But I think it&#8217;s easiest to demonstrate its practical importance by working through concrete use cases. As an <a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval">HCIR</a> advocate, I&#8217;ve repeatedly learned the lesson that such demonstrations are critical in order to successfully evangelize this worldview.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/13/karaoke-a-hotbed-for-micro-ir/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Faceted Search Presentation at New York CTO Club</title>
		<link>http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/</link>
		<comments>http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 14:53:08 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2843</guid>
		<description><![CDATA[Otis Gospodnetic and I recently gave a talk at the New York CTO Club on faceted search. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my book or attended the UIE virtual seminar [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_2690072" style="width: 425px; text-align: left;"><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=facetedsearchnyctotalk-091210081555-phpapp01&amp;stripped_title=faceted-search-nycto-talk" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=facetedsearchnyctotalk-091210081555-phpapp01&amp;stripped_title=faceted-search-nycto-talk" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p><a href="http://www.jroller.com/otis/entry/faceted_search_by_daniel_tunkelang">Otis Gospodnetic</a> and I recently gave a talk at the New York CTO Club on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">book</a> or attended the <a href="http://www.uie.com/events/virtual_seminars/facets/">UIE virtual seminar</a> a few months ago that I gave with Pete Bell (whom I worked with for 10 years at <a href="http://endeca.com/">Endeca</a>) might recognize some of my material. Otis focused on the specifics of implementing faceted search using the open-source <a href="http://lucene.apache.org/solr/">Solr</a> platform.</p>
<p>Here were the major take-aways:</p>
<ul>
<li>Think about what users are trying to do, not just how they search.</li>
<li>Facets get polluted with bad result sets, so offer <a href="http://thenoisychannel.com/2008/06/02/clarification-vs-refinement/">clarification before refinement</a>.</li>
<li>Don&#8217;t just move the information overload problem to the facets! Show less, not more.</li>
<li>Look at the potential data facets you already have; you will be surprised.</li>
<li>Facets can come from new data, e.g. sentiment.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: Living La Vida Local</title>
		<link>http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/</link>
		<comments>http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 21:44:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2836</guid>
		<description><![CDATA[My new role at Google (yes, it still feels new after not quite a month!) has given me a professional interest in local search. I&#8217;ve adjusted my reading materials accordingly, and I&#8217;ve started reading blogs that focus on local. Here are a handful that I&#8217;ve discovered so far: BIA / Kelsey Blog By The Kelsey [...]]]></description>
			<content:encoded><![CDATA[<p>My new role at Google (yes, it still feels new after not quite a month!) has given me a professional interest in <a href="http://en.wikipedia.org/wiki/Local_search_%28Internet%29">local search</a>. I&#8217;ve adjusted my reading materials accordingly, and I&#8217;ve started reading blogs that focus on local. Here are a handful that I&#8217;ve discovered so far:</p>
<ul>
<li><a href="http://blog.kelseygroup.com/">BIA / Kelsey Blog</a>
<ul>
<li>By <a href="http://kelseygroup.com/">The Kelsey Group</a>, a division of <a href="http://www.bia.com/" target="_blank">BIA Advisory Services</a> that provides data and analysis on directories and local media.</li>
</ul>
</li>
<li><a href="http://blog.telemapics.com/">Exploring Local</a>
<ul>
<li> By <a href="http://www.glgroup.com/Council-Member/Michael-Dobson-178033.html">Mike Dobson</a>, President of <a href="http://telemapics.com/">TeleMapics</a>, a company that provides consulting services focused on local search.</li>
</ul>
</li>
<li><a href="http://www.localseoguide.com/">Local SEO Guide</a>
<ul>
<li><a href="http://www.localseoguide.com/about-me?PHPSESSID=7509db638c808d8ac60e49cc596c99fc">Andrew Shotland</a>&#8216;s blog on local search optimization, small business marketing &amp; search engine optimization strategy.</li>
</ul>
</li>
<li><a href="http://www.localsearchdatabase.com/">Localsearchdatabase</a>
<ul>
<li>By <a href="http://twitter.com/golander59">Gib Olander</a>, Director of Business Development for <a href="http://www.localeze.com/">Localeze</a>, an online content management company serving businesses, local search engines and consumers.</li>
</ul>
</li>
<li><a href="http://www.davidmihm.com/blog/">Mihmorandum</a>
<ul>
<li><a href="http://www.davidmihm.com/">David Mihm</a>&#8216;s blog on local search engine optimization and marketing.</li>
</ul>
</li>
<li><a href="http://gesterling.wordpress.com/">Screenwerk</a>
<ul>
<li><a href="http://gesterling.wordpress.com/about/">Greg Sterling</a>&#8216;s thoughts on online and offline media. Sterling used to run The Kelsey Group’s Interactive Local Media program.</li>
</ul>
</li>
<li><a href="http://www.solaswebdesign.net/wordpress/">SEO Igloo Blog</a>
<ul>
<li>By <a href="http://www.solaswebdesign.net/">Solas Web Design</a>, which specializes in web design and search engine optimization for small businesses.</li>
</ul>
</li>
<li><a href="http://blumenthals.com/blog/">Understanding Google Maps &amp; Local Search</a>
<ul>
<li>By <a href="http://www.blumenthals.com/index.php?MikeBlumenthal">Mike Blumenthal</a>, whose company offers consulting services and market research advice relating to maps and local search.</li>
</ul>
</li>
</ul>
<p>Not surprisingly, these blogs offer me a critical perspective on how Google and other search engines serve the local space. Granted, everyone has their own motives&#8211;and it&#8217;s hard to avoid some tension in a space with the competitive dynamics of local search. But now that I&#8217;m no longer an outsider myself, I appreciate having others to help keep me honest as I work to make local search better for users and businesses.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search User Interfaces and Data Quality</title>
		<link>http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/</link>
		<comments>http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 04:49:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2831</guid>
		<description><![CDATA[One of the many things I&#8217;ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about HCIR. Indeed, some of the folks working on &#8220;more and better search refinements&#8221; are just steps away from my desk. Very cool! But [...]]]></description>
			<content:encoded><![CDATA[<p>One of the many things I&#8217;ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>. Indeed, some of the folks working on &#8220;<a href="http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html">more and better search refinements</a>&#8221; are just steps away from my desk. Very cool!</p>
<p>But working on the inside has also helped me appreciate what <a href="http://www.linkedin.com/in/bobwyman">Bob Wyman</a> tried to <a href="http://thenoisychannel.com/2009/02/05/what-would-google-do-what-does-google-do/">tell me</a> months ago&#8211;that Google has no philosophical predilection towards black box approaches, but rather is only limited by what technology makes possible and what its engineers can implement. I&#8217;d qualify that slightly by saying that I perceive an additional constraint: Google does have a strong predilection towards data-driven decisions. Some folks have found that approach <a href="http://stopdesign.com/archive/2009/03/20/goodbye-google.html">objectionable</a> in the context of interface design.</p>
<p>Anyway, if you&#8217;re a regular here, then you&#8217;re probably predisposed towards HCIR and <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>. In that case, I&#8217;d like to take a moment to help you appreciate the challenge I face on a day-to-day basis.</p>
<p>Which one of these two statements do you most agree with?</p>
<ol>
<li>We need better data quality in order to support richer search user interfaces.</li>
<li>Richer search user interfaces allow us to overcome data quality limitations.</li>
</ol>
<p>On one hand, consider two search engines whose interfaces are designed to support exploratory search: <a href="http://www.cuil.com/">Cuil</a> and <a href="http://www.kosmix.com/">Kosmix</a>. Sometimes they&#8217;re great, e.g., [<a href="http://www.cuil.com/search?q=michael+jackson">michael jackson</a>] on Cuil and [<a href="http://www.kosmix.com/topic/iraq">iraq</a>] on Kosmix. But look what can happen for queries that are further out in the tail, e.g., [<a href="http://www.cuil.com/search?q=faceted+search">faceted search</a>] on Cuil and [<a href="http://www.kosmix.com/topic/real_time_search">real time search</a>] on Kosmix. Yes, the kinds of queries I make. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  I don&#8217;t mean to knock these guys&#8211;they&#8217;re trying, and their efforts are admirable. Moreover, both generally return respectable search results on the first pages (in Kosmix&#8217;s case, through federation). But the search refinements can be way off, and that undermines the overall experience. I strongly suspect that the problem is one of data quality, along the lines of what <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">others have argued</a>.</p>
<p>On the other hand, some of the work that I did with colleagues at Endeca (e.g., work presented at <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2008/">HCIR 2008</a> on &#8220;Supporting Exploratory Search for the ACM Digital Library&#8221;) at least dangles the possibility that the second statement holds&#8211;namely, a richer user interface could help overcome data quality limitations. Interaction draws more of the information need out of the user, and the process may be able to mask imperfection in the data. For example, it&#8217;s clear to users&#8211;and clear from the search refinements&#8211;that [<a href="http://www.google.com/search?q=michael+jackson+beer">michael jackson beer</a>] and [<a href="http://www.google.com/search?q=michael+jackson+-beer">michael jackson -beer</a>] are about different people. If we can just get that incremental information from the user, we don&#8217;t have to achieve perfection in named entity recognition and disambiguation.</p>
<p>I think there&#8217;s some truth in both arguments. Data quality is a major bottleneck for effectively delivering an exploratory search experience, and data quantity, <a href="http://thenoisychannel.com/2009/03/31/the-unreasonable-effectiveness-of-data/">much as it helps</a>, is not a guarantee of quality. Richer interfaces offer the enticing possibility of leveraging <a href="http://en.wikipedia.org/wiki/Human-based_computation">human computation</a>, but they also introduce the risk of disappointing and alienating users. Even for an HCIR zealot like me, the constraints of reality are sobering.</p>
<p>And yes, speed and computational cost matter too. But hey, it wouldn&#8217;t be a <a href="http://thenoisychannel.com/2008/04/06/nick-belkin-at-ecir-08/">grand challenge</a> if it were easy!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/feed/</wfw:commentRss>
		<slash:comments>29</slash:comments>
		</item>
		<item>
		<title>Fun with Google, Bing, and Yahoo</title>
		<link>http://thenoisychannel.com/2009/11/29/fun-with-google-bing-and-yahoo/</link>
		<comments>http://thenoisychannel.com/2009/11/29/fun-with-google-bing-and-yahoo/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 19:45:06 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2809</guid>
		<description><![CDATA[Web search is a fiercely competitive space&#8211;as Google points out, &#8220;competition is just one click away&#8221;. In practice, I take that claim with a grain of salt&#8211;but I do think the switching costs are much lower than in most competitive markets. With that in mind, it&#8217;s interesting to look at what happens if you search [...]]]></description>
			<content:encoded><![CDATA[<p>Web search is a fiercely competitive space&#8211;as Google points out, &#8220;<a href="http://googlepublicpolicy.blogspot.com/2009/05/googles-approach-to-competition.html">competition is just one click away</a>&#8221;. In practice, I take that claim with a grain of salt&#8211;but I do think the switching costs are much lower than in most competitive markets. With that in mind, it&#8217;s interesting to look at what happens if you search for the name of one of the major search engines on one of its competitor&#8217;s sites.</p>
<p>Google returns standard results for such searches:</p>
<p><a href="http://www.google.com/search?q=bing"><img class="alignnone size-full wp-image-2814" title="[bing] on Google" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/google-bing2.png" alt="[bing] on Google" width="414" height="180" /></a></p>
<p><a href="http://www.google.com/search?q=yahoo"><img class="alignnone size-full wp-image-2815" title="[yahoo] on Google" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/google-yahoo1.png" alt="[yahoo] on Google" width="414" height="266" /></a></p>
<p>Bing is generous to a fault, saving you a click if you choose to use one of its leading competitors:</p>
<p><a href="http://www.bing.com/search?q=google"><img class="alignnone size-full wp-image-2818" title="[google] on Bing" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/bing-google.png" alt="[google] on Bing" width="426" height="247" /></a></p>
<p><a href="http://www.bing.com/search?q=yahoo"><img class="alignnone size-full wp-image-2819" title="[yahoo] on Bing" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/bing-yahoo.png" alt="[yahoo] on Bing" width="426" height="262" /></a></p>
<p>Finally Yahoo, whose CEO claims &#8220;<a href="http://bits.blogs.nytimes.com/2009/08/07/yahoo-ceo-we-have-never-been-a-search-company/">we have never been a search company</a>,&#8221; seems quite eager to keep searchers from going elsewhere:</p>
<p><a href="http://search.yahoo.com/search?p=bing"><img class="alignnone size-full wp-image-2826" title="[bing] on Yahoo" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/yahoo-bing1.png" alt="[bing] on Yahoo" width="515" height="276" /></a></p>
<p><a href="http://search.yahoo.com/search?p=google"><img class="alignnone size-full wp-image-2827" title="[google] on Yahoo" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/yahoo-google1.png" alt="[google] on Yahoo" width="515" height="246" /></a></p>
<p>It&#8217;s easy to dismiss these queries as corner cases, but the logs show that they really happen. And, as browsers increasingly blur the line between an address bar and a search box, it&#8217;s not unreasonable to consider that switches between search engines are likely to commence with such queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/29/fun-with-google-bing-and-yahoo/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Marti Hearst: Tech Talk on Search User Interfaces</title>
		<link>http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/</link>
		<comments>http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 23:50:48 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2805</guid>
		<description><![CDATA[Earlier this week, Marti Hearst gave a Tech Talk at Google about her recently published book, Search User Interfaces. Fortunately for those of us who missed it (myself included!), it is now available on YouTube. Enjoy! (via Jon Elsas)]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/BpBAg4Ndi9w&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/BpBAg4Ndi9w&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Earlier this week, <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a> gave a Tech Talk at Google about her recently published book, <a href="http://searchuserinterfaces.com/"><em>Search User Interfaces</em></a>. Fortunately for those of us who missed it (myself included!), it is now available on <a href="http://www.youtube.com/watch?v=BpBAg4Ndi9w">YouTube</a>. Enjoy! (via <a href="http://windowoffice.tumblr.com/post/257316365/search-user-interfaces-the-movie">Jon Elsas</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Can We Learn From Anti-Social Users?</title>
		<link>http://thenoisychannel.com/2009/11/21/can-we-learn-from-anti-social-users/</link>
		<comments>http://thenoisychannel.com/2009/11/21/can-we-learn-from-anti-social-users/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 21:54:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2798</guid>
		<description><![CDATA[One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise&#8211;be they links, re-tweets, or any number of other ways that online consumers (or more correctly &#8220;prosumers&#8221;) actively and passively [...]]]></description>
			<content:encoded><![CDATA[<p>One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise&#8211;be they links, re-tweets, or any number of other ways that online consumers (or more correctly &#8220;prosumers&#8221;) actively and passively communicate value judgments about information. On the other hand, our reliance on these social signals makes us vulnerable to positive feedback and spammers.</p>
<p>Consider <a href="http://www.princeton.edu/~mjs3/musiclab.shtml">MusicLab</a>, an &#8220;<a href="http://www.princeton.edu/%7Emjs3/salganik_watts08.pdf" target="_blank">experimental study of self-fulfilling prophecies in an artificial cultural market</a>&#8221;. In this study, sociologists <a href="http://www.princeton.edu/~mjs3/index.shtml">Matt Salganik</a>, <a href="http://www.uvm.edu/~pdodds/home.html">Peter Dodds</a>, and <a href="http://en.wikipedia.org/wiki/Duncan_J._Watts">Duncan Watts</a> manipulated the social information available to consumers (specifically teens) regarding their peers&#8217; musical tastes. The experimenters&#8217; goal was to empirically validate a quantitative model of social contagion.</p>
<p>But we can look at this study another way: by isolating the social factors that influence musical taste, the experimenters were also isolating the non-social signal&#8211;in theory, how popular a song would be in the absence of social signaling. Indeed, they found that, if they measured a song&#8217;s quality by isolating out the social factor, &#8220;the best songs never do very badly, and the worst songs never do extremely well, but almost any other result is possible&#8221;.</p>
<p>It&#8217;s interesting&#8211;interesting to me, at least!&#8211;to ask if search engines can do the same for search. One of the frequent objections to link-based authority measures like <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a> is that they make the rich get richer. &#8220;Real-time&#8221; variants like re-tweet frequency (and even <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a>) suffer from the same weakness. Unchecked, these measures can cause the authority / influence market to resemble a <a href="http://ingrimayne.com/econ/resouceProblems/WinnerTakeIt.html">winner-take-all</a> market.</p>
<p>It strikes me as interesting to learn from cases where searchers swim upstream against the social signals to find information. Of course, you may already see the contradiction&#8211;this is just another kind of social signaling! Still, it seems like it might be a way to hedge our bets against the weaknesses of positive feedback and spammers. In a similar vein, we might look at how users find information that suffers from poor <a href="http://thenoisychannel.com/2008/04/22/accessibility-in-information-retrieval/">accessibility</a> or <a href="http://thenoisychannel.com/2009/09/26/information-retrievability/">retrievability</a>.</p>
<p>I don&#8217;t have answers about how to pursue such an approach, or whether it would even be feasible to do so. But I hope you agree with me that it&#8217;s an interesting question.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/21/can-we-learn-from-anti-social-users/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Exploring Exploratory Search</title>
		<link>http://thenoisychannel.com/2009/11/18/exploring-explortatory-search/</link>
		<comments>http://thenoisychannel.com/2009/11/18/exploring-explortatory-search/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 13:44:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2792</guid>
		<description><![CDATA[Google&#8217;s recently released Image Swirl is slick. But I&#8217;ve been struggling to figure out whether it&#8217;s useful or simply a showcase for cool technology. And that&#8217;s prompted me to think about the overloaded term &#8220;exploratory search&#8221;. A while back, I tried to define exploratory search based on what it is not. This time, let me [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://googleblog.blogspot.com/2009/11/explore-images-with-google-image-swirl.html"><img class="aligncenter" title="Google Image Swirl" src="http://2.bp.blogspot.com/_7ZYqYi4xigk/SwLfp7ciT2I/AAAAAAAAE8Y/NpojWXrDCb0/s1600/washington+monument.png" alt="" width="377" height="237" /></a></p>
<p>Google&#8217;s recently released <a href="http://googleblog.blogspot.com/2009/11/explore-images-with-google-image-swirl.html">Image Swirl</a> is slick. But I&#8217;ve been struggling to figure out whether it&#8217;s useful or simply a showcase for cool technology.</p>
<p>And that&#8217;s prompted me to think about the overloaded term &#8220;<a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>&#8221;. A while back, I tried to define exploratory search based on <a href="http://thenoisychannel.com/2008/06/24/what-is-not-exploratory-search/">what it is not</a>. This time, let me aim to positively characterize what I see as its two primary use cases:</p>
<ol>
<li>I know what I want, but I don&#8217;t know how to describe it.</li>
<li>I don&#8217;t know what I want, but I hope to figure it out once I see what&#8217;s out there.</li>
</ol>
<p>The first use case cries out for tools that support query refinement or elaboration. Existing tools span a range from suggesting spelling corrections (aka &#8220;did you mean&#8221;) to offering semantically or statistically related searches that hopefully provide the user with at least a step in the right direction. One of my favorite approaches, <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, is primarily used to support query refinement through progressive narrowing of an initial search query.</p>
<p>The second &#8220;I don&#8217;t know what I want&#8221; use case is fuzzier. In the language of <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a>, this use case is <a href="http://en.wikipedia.org/wiki/Unsupervised_learning">unsupervised</a>, while the previous one is <a href="http://en.wikipedia.org/wiki/Supervised_learning">supervised</a>. In general, it&#8217;s a lot harder to define or evaluate outcomes for unsupervised scenarios. Indeed, <a href="http://nlpers.blogspot.com/2006/04/unsupervised-learning-why.html">Hal Daume has argued</a> that we should only do unsupervised learning if we do not have a trustworthy automatic evaluation metric. That&#8217;s a strong position, and you can see some of the counterarguments in his comment thread. But, going back to our scenario, it&#8217;s really hard to judge the effectiveness of tools like <a href="http://maroo.cs.umass.edu/pub/web/getpdf.php?id=614">similarity browsing</a> when they support exploration in the absence of any concrete goal.</p>
<p>With that in mind, I&#8217;ll reserve judgment on the utility of tools like Image Swirl. To the extent that it aims at the first use case, clustering images for a particular search, I&#8217;m ambivalent. I&#8217;d prefer a more transparent interface, in which I have more of a sense of control over the navigational experience. I suspect it is more aimed at the second use case, offering a compact visualization of what is out there.</p>
<p>Besides, as some folks have brought up at the <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> workshops, it&#8217;s important that we make information seeking fun. And Swirl certainly scores on that front.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/18/exploring-explortatory-search/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>An Ad-Supported Model With Teeth?</title>
		<link>http://thenoisychannel.com/2009/11/15/an-ad-supported-model-with-teeth/</link>
		<comments>http://thenoisychannel.com/2009/11/15/an-ad-supported-model-with-teeth/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 13:29:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2788</guid>
		<description><![CDATA[A computer-implemented method for operating a device, the method comprising: disabling a function of an operating system in a device; presenting an advertisement in the device while the function is disabled; and enabling the function in response to the advertisement ending. So reads the first claim from a patent application that Apple recently filed (with [...]]]></description>
			<content:encoded><![CDATA[<p><em>A computer-implemented method for operating a device, the method comprising:<br />
disabling a function of an operating system in a device;<br />
presenting an advertisement in the device while the function is disabled;<br />
and enabling the function in response to the advertisement ending.</em></p>
<p>So reads the first claim from a <a href="http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;Sect2=HITOFF&amp;d=PG01&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&amp;r=1&amp;f=G&amp;l=50&amp;s1=%2220090265214%22.PGNR.&amp;OS=DN/20090265214&amp;RS=DN/20090265214">patent application</a> that Apple recently filed (with Steve Jobs as first inventor, no less!) for technology to deliver a rather compelling ad-supported business model. Or perhaps the better word is compulsory. You can read an analysis by Randall Stross in the <a href="http://www.nytimes.com/2009/11/15/business/15digi.html">New York Times</a>.</p>
<p>I agree with Stross that it&#8217;s hard to imagine Apple ever implementing the technology described by the patent application&#8211;indeed, Apple has been one of the few success stories for paid digital content models. That said, the approach does feel like at least one endpoint for the ad-supported model&#8211;it guarantees the advertisers the attention that they are paying for by subsidizing content or services.</p>
<p>The advertising business is a bit more top of mind for me, now that it pays my salary. Google&#8217;s approach, however, follows the aphorism that honey catches more flies than vinegar: it tries to target ads well enough that users want to click on them, rather than to simply endure them as a cost of subsidizing free services. Google&#8217;s revenue (and the popularity of <a href="http://en.wikipedia.org/wiki/Pay_per_click">PPC</a> models in general) is a testament to the success of this approach, my occasional <a href="http://thenoisychannel.com/2008/10/09/search-is-not-advertising/">rant</a> notwithstanding.</p>
<p>In general, the industry seems to have found a compromise in how aggressively to push ads at users. Users can safely ignore (or even block) sponsored links, but few people do. Pre-roll ads on video sites (i.e., advertising before a video starts) are more invasive, but a number of sites let users skip them. You can read why the YouTube folks are <a href="http://ytbizblog.blogspot.com/2009/11/skip-skip-skip-to-my-video.html">testing</a> this approach. Advertisers&#8211;or at least ad-supported services&#8211;seem to recognize that they can&#8217;t cross the line between pursuing users&#8217; attention and annoying users to the point of alienation.</p>
<p>Still, technology like that described in Apple&#8217;s patent application shows that it is possible for the ad-supported model to take a more aggressive approach. Part of me wonders if more aggressive ad-supported models would revitalize paid content models, as users would stop perceiving the former as free. But I suspect that the gentler ad-supported model is here to stay, and that it will continue to strive toward the point of optimal effectiveness.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/15/an-ad-supported-model-with-teeth/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Call for Speakers: Enterprise Search Summit 2010</title>
		<link>http://thenoisychannel.com/2009/11/13/call-for-speakers-enterprise-search-summit-2010/</link>
		<comments>http://thenoisychannel.com/2009/11/13/call-for-speakers-enterprise-search-summit-2010/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 23:07:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2780</guid>
		<description><![CDATA[I&#8217;m no longer in the enterprise search business, but I know that many readers here are. If you are one of those readers, then I strongly encourage you to consider participating in the Enterprise Search Summit, which will take place next May in New York. I presented there last year and enjoyed the opportunity to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m no longer in the <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise search</a> business, but I know that many readers here are. If you are one of those readers, then I strongly encourage you to consider participating in the <a href="http://www.enterprisesearchsummit.com/2010/">Enterprise Search Summit</a>, which will take place next May in New York. I presented there last year and enjoyed the opportunity to meet fellow presenters and attendees. You can read my <a href="http://thenoisychannel.com/2009/05/13/reprising-the-enterprise-search-summit/">recap</a> of the event.</p>
<p>The deadline for <a href="https://secure.infotoday.com/forms/default.aspx?form=ess2010speakers">proposal submission</a> is November 30th&#8211;you only have to submit a 250-word abstract.</p>
<p>Here is the <a href="http://www.enterprisesearchsummit.com/2010/CallForSpeakers.shtml">call for proposals</a>:</p>
<blockquote>
<p style="font-size: 12px;">We seek dynamic speakers who can talk knowledgeably about detailed aspects of how to implement and maximize search within an organization. Search can no longer be viewed as a stand-alone application. It is increasingly part of everything we do and has become the <em>de facto</em> gateway to information in the enterprise. This year’s Summit will examine the ways to leverage search tools, information architecture, classification, and other strategies and technologies to deliver meaningful results—not just in terms of information, but to the bottom line.</p>
<p style="font-size: 12px;">Ours is a well-informed, tech-savvy audience, so proposals should be specific and detailed. Consider topics such as:</p>
<ul style="font-size: 12px;">
<li>Integrating search into enterprise systems and workflow</li>
<li>Customizing your search solution/ Task-specific search</li>
<li>Compliance, records management, and eDiscovery with effective search</li>
<li>Migrating your search engine</li>
<li>Social search and social tagging strategies &amp; solutions</li>
<li>Search-enabled decision making</li>
<li>Business intelligence, data mining</li>
<li>Search as the gateway to enterprise information</li>
<li>Optimizing the interface and user experience</li>
<li>Navigational tools—context, facets, entity extraction, clustering, and visualization</li>
<li>Emerging trends, the future of search</li>
<li>Overcoming information overload</li>
<li>Categorization techniques</li>
<li>Semantic Search</li>
<li>Query Federation &amp; Federated Search</li>
<li>Enhancing an existing solution</li>
</ul>
<p style="font-size: 12px;">If you represent a company that has an enterprise search software product, your best bet to be on our program is to collaborate with a customer to submit a case study to be presented by them, following the guidelines above.</p>
</blockquote>
<p>If you need more information&#8211;or more time&#8211;I encourage you to reach out directly to <a href="mailto:michelle.manafy@infotoday.com">Michelle Manafy</a>, the conference chair.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/13/call-for-speakers-enterprise-search-summit-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Week 1 at Google: Information Overload!</title>
		<link>http://thenoisychannel.com/2009/11/13/week-1-at-google-information-overload/</link>
		<comments>http://thenoisychannel.com/2009/11/13/week-1-at-google-information-overload/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 13:12:52 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2777</guid>
		<description><![CDATA[As you might imagine, it&#8217;s quite a switch to go from criticizing Google from the outside to being on the inside. Jeff Jarvis, who was gracious enough not to make fun of me in public, nonetheless admitted to me privately that the news had made him chuckle. As I finish my first week, I can [...]]]></description>
			<content:encoded><![CDATA[<p>As you might imagine, it&#8217;s quite a switch to go from criticizing<br />
Google from the outside to being on the inside. <a href="http://www.buzzmachine.com/">Jeff Jarvis</a>, who was<br />
gracious enough not to make fun of me in public, nonetheless admitted<br />
to me privately that the news had made him chuckle.</p>
<p>As I finish my first week, I can sum up the experience in a word:<br />
overwhelming. The tools for accessing internal information are better<br />
than I expected, but both the volume of baseline knowledge&#8211;technical<br />
and cultural&#8211;and the relentlessness of the update stream are<br />
daunting.</p>
<p>Indeed, the internal ecosystem is so rich that it&#8217;s easy to forget<br />
there is a world outside it&#8211;ironic given Google&#8217;s enormous role in the world outside it! Then again, this is just my first week&#8211;it will take me<br />
some time to pop up the stack from the <a href="http://en.wikipedia.org/wiki/Build_system">build system</a> to the surface.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/13/week-1-at-google-information-overload/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The Noisy Noogler: A Quick FAQ</title>
		<link>http://thenoisychannel.com/2009/11/10/the-noisy-noogler-a-quick-faq/</link>
		<comments>http://thenoisychannel.com/2009/11/10/the-noisy-noogler-a-quick-faq/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 13:43:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2774</guid>
		<description><![CDATA[I&#8217;m barely 24 hours into my new life as a Googler, and I&#8217;ve already gotten lots of questions! Here are the answers to a few of them: Will I continue blogging at The Noisy Channel? Absolutely! I&#8217;m committed to posting at least weekly, and I&#8217;ll try to do better than that once I&#8217;m settled into [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m barely 24 hours into my new life as a Googler, and I&#8217;ve already gotten lots of questions! Here are the answers to a few of them:</p>
<p><strong>Will I continue blogging at The Noisy Channel?</strong></p>
<p>Absolutely! I&#8217;m committed to posting at least weekly, and I&#8217;ll try to do better than that once I&#8217;m settled into my new environment.</p>
<p><strong>Will I participate in scholarly conferences and workshops?</strong></p>
<p>Of course! I&#8217;m co-organizing <a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>, which will be held in conjunction with <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a> in February, and of course <a href="http://iiix2010.org/hcir.php">HCIR 2010</a>, which will be held in conjunction with <a href="http://iiix2010.org/">IIiX 2010</a> in August. You probably won&#8217;t see me at vendor fests, but I do hope to continue bringing industry practitioners and academic researchers together.</p>
<p><strong>Will I blog about Google?</strong></p>
<p>I certainly won&#8217;t disclose any confidential information&#8211;people get <a href="http://blogoscoped.com/archive/2005-02-08-n55.html">fired</a> for that&#8211;or worse. And, given how much access I will have to such information, I will err on the side of caution, only discussing information that I&#8217;m sure Google has released to the general public. Beyond that, I&#8217;ll exercise common sense. I don&#8217;t want either to come across as a shill for my employer or to spar with my new colleagues in public. Subject to those constraints, however, I can and will blog about Google.</p>
<p><strong>Can I get you a job at Google?</strong></p>
<p>I can advise you and connect you to a recruiter, but that&#8217;s the limit of my power. The hiring process here is specifically designed to prevent any individual from manipulating it&#8211;even me!</p>
<p><strong>Will I talk about what I&#8217;m working on?</strong></p>
<p>See above regarding confidential information. I&#8217;ll be delighted to talk about anything I&#8217;m working on that Google has decided to disclose publicly.</p>
<p><strong>Does Google know about my karaoke habit?</strong></p>
<p>Too late, they&#8217;ve already signed the offer letter. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/10/the-noisy-noogler-a-quick-faq/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Apologies for Slow Response Times</title>
		<link>http://thenoisychannel.com/2009/11/06/apologies-for-slow-response-times/</link>
		<comments>http://thenoisychannel.com/2009/11/06/apologies-for-slow-response-times/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 23:11:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2770</guid>
		<description><![CDATA[I am without my own laptop for a few days as I manage a transition between jobs. So I apologize in advance if I am slow to respond to email, comments, or tweets over the weekend. I&#8217;ll be back at full steam early next week.]]></description>
			<content:encoded><![CDATA[<p>I am without my own laptop for a few days as I manage a <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">transition between jobs</a>. So I apologize in advance if I am slow to respond to email, comments, or tweets over the weekend. I&#8217;ll be back at full steam early next week.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/06/apologies-for-slow-response-times/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Going (to) Google</title>
		<link>http://thenoisychannel.com/2009/11/06/going-to-google/</link>
		<comments>http://thenoisychannel.com/2009/11/06/going-to-google/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 21:39:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2763</guid>
		<description><![CDATA[This is my last week at Endeca. The decision to leave has been a heart-wrenching one: not only have the past ten years been the best of my life, but my experiences at Endeca have defined me professionally. Moreover, Endeca is riding a wave of success with recent advances in our products, new relationships with [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.slideshare.net/dtunkelang/google-tech-talk-reconsidering-relevance-presentation"><img class="size-medium wp-image-2764 aligncenter" title="McGoogle" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/McGoogle-300x107.jpg" alt="McGoogle" width="300" height="107" /></a></p>
<p>This is my last week at <a href="http://endeca.com/">Endeca</a>. The decision to leave has been a heart-wrenching one: not only have the past ten years been the best of my life, but my experiences at Endeca have defined me professionally. Moreover, Endeca is riding a wave of success with recent advances in our products, new relationships with key partners, and fascinating new deployments.  (You can read Endeca’s latest announcements in our <a href="http://www.endeca.com/news-and-events-press-releases.htm">newsroom</a>).</p>
<p>Ironically, it is this very success that compels me to move on. In the past several years, I have developed an increasing passion for search on the open web&#8211;an interest only furthered by the explosion of social media.</p>
<p>That is why I&#8217;ve decided to accept an opportunity at Google&#8217;s New York office. Readers here know that I&#8217;ve been a very public critic of Google&#8217;s simplistic approach to user interaction on the open web. I&#8217;m being offered an opportunity to help fix that approach&#8211;and it is an offer I can&#8217;t refuse. My mission is to apply my passion for <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">human-computer information retrieval</a> (HCIR), an approach that Endeca has pioneered in the enterprise, to the world&#8217;s largest information problems&#8211;and where better to do that than at the company that aspires to organize the world&#8217;s information?</p>
<p>This moment is bittersweet: I am excited about the new experiences that await me, but I have a heavy heart as I turn in my badge and part with a world-class team that has succeeded against incredible odds.</p>
<p>Given my role and tenure at Endeca, I want to say explicitly that this move is about my personal ambition. My passion for web search and social media, which has grown exponentially over the past couple of years, simply doesn&#8217;t align with Endeca&#8217;s focus in the enterprise.</p>
<p>Also, I want to make clear: Google hired me because of my values, and not in spite of them. I know that some folks will find it difficult to reconcile my criticisms of Google with my decision to join. That&#8217;s why there&#8217;s an <a href="http://www.theonion.com/content/video/google_opt_out_feature_lets_users">opt-out village</a>! Seriously, though, I take my values with me. Google is offering me the opportunity to channel my passion for HCIR into action, on the world&#8217;s largest stage. I&#8217;m well aware of the magnitude of the challenge, but hey, I&#8217;m feeling lucky.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/06/going-to-google/feed/</wfw:commentRss>
		<slash:comments>63</slash:comments>
		</item>
		<item>
		<title>Twitter Lists as an Influence Measure?</title>
		<link>http://thenoisychannel.com/2009/11/01/twitter-lists-as-an-influence-measure/</link>
		<comments>http://thenoisychannel.com/2009/11/01/twitter-lists-as-an-influence-measure/#comments</comments>
		<pubDate>Sun, 01 Nov 2009 05:40:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2757</guid>
		<description><![CDATA[In &#8220;Using Twitter Lists To Judge Influence&#8221;, Todd Zeigler of the Bivings Report writes: I think Twitter Lists will end up helping separate the men from the boys when it comes to influence.  In addition to seeing a Twitter user&#8217;s follower count, we can now see the number of other Twitter users who have added [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.amazon.com/Influence-Mary-Kate-Olsen/dp/159514210X"><img class="alignnone size-full wp-image-2758" title="Influence" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/influence.jpg" alt="Influence" width="179" height="220" /></a></p>
<p>In &#8220;<a href="http://www.bivingsreport.com/2009/using-twitter-lists-to-judge-influence/">Using Twitter Lists To Judge Influence</a>&#8221;, Todd Zeigler of the <a href="http://www.bivingsreport.com/">Bivings Report</a> writes:</p>
<blockquote><p>I think Twitter Lists will end up helping separate the men from the boys when it comes to influence.  In addition to seeing a Twitter user&#8217;s follower count, we can now see the number of other Twitter users who have added them to lists (example to the right).  I would argue that getting added to a list is a bigger deal than simply getting someone to follow you.</p></blockquote>
<p>I&#8217;m certainly intrigued by <a href="http://blog.twitter.com/2009/10/theres-list-for-that.html">Twitter Lists</a>, but I&#8217;m skeptical that counting how many lists someone is on will prove that much more useful than follower count. For example, <a href="http://twitter.com/dtunkelang">I</a> currently have <a href="http://twitter.com/dtunkelang/followers">1159 followers</a>, am on <a href="http://twitter.com/dtunkelang/lists/memberships">33 lists</a>, and have a <a href="http://twitter.com/dtunkelang/followers">TunkRank of 24.1</a>. For grins, here&#8217;s a handful of people who have similar stats:</p>
<ul>
<li><a href="http://twitter.com/kansandhaus">Evan Sandhaus</a>: 796 followers, 21 lists, TunkRank = 17.2</li>
<li><a href="http://twitter.com/jny2">Josh Young</a>: 801 followers, 25 lists, TunkRank = 14.3</li>
<li><a href="http://twitter.com/cjahearn">Chris Ahearn</a>: 1108 followers, 14 lists, TunkRank = 30.1</li>
<li><a href="http://twitter.com/brynn">Brynn Evans</a>: 1303 followers, 33 lists, TunkRank = 18.9</li>
<li><a href="http://twitter.com/eric_andersen">Eric Andersen</a>: 1543 followers, 37 lists, TunkRank = 3.1</li>
</ul>
<p>While I can&#8217;t generalize from a few arbitrarily selected data points (though Gladwell seems to have no trouble doing so in <a href="http://en.wikipedia.org/wiki/Outliers_%28book%29"><em>Outliers</em></a>), my suspicion is that list count will be highly correlated with follower count&#8211;and may actually be a noisier signal because the numbers are so much smaller.</p>
<p>Of course, there&#8217;s no reason we should use raw list counts&#8211;any more than we should use raw follower counts. Just as <a href="http://tunkrank.com/">TunkRank</a> aspires to <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">model attention scarcity</a> and recognizes that not all followers are created equal, an effective measure of how lists contribute to influence must recognize that not all list memberships are created equal either.</p>
<p>I&#8217;ve been chatting with <a href="http://twitter.com/chl">Chris Langreiter</a>, who is working on <a href="http://etherpad.com/HoPv2hJ4GB">enhancements to TunkRank</a> to address some of the oversimplifications of its model, as well as with <a href="http://twitter.com/jonathanglick">Jonathan Glick</a> and <a href="http://twitter.com/kenreisman">Ken Reisman</a> at <a href="http://www.tlists.com/">TLists</a>. I&#8217;d like to see online influence&#8211;on Twitter and in general&#8211;measured more effectively. It will be great if lists can help, but we can&#8217;t make the same naive mistakes as those who were quick to embrace <a href="http://thenoisychannel.com/2008/12/27/loic-le-meur-misses-the-point-of-twitter/">follower count as a measure of authority</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/01/twitter-lists-as-an-influence-measure/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Tuning in to Google Music Search</title>
		<link>http://thenoisychannel.com/2009/10/29/tuning-in-to-google-music-search/</link>
		<comments>http://thenoisychannel.com/2009/10/29/tuning-in-to-google-music-search/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 17:09:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2751</guid>
		<description><![CDATA[With all of the activity around e-books last week, you might think that the online world wasn&#8217;t paying attention to the media category most transformed by the Internet: music. But a week is a lifetime in the ADD-addled technology press, and today&#8217;s top story is that Google is &#8220;making search more musical&#8221;. From the official [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/DV24RBmy-2I&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/DV24RBmy-2I&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>With all of the activity around <a href="http://thenoisychannel.com/2009/10/20/books-books-books/">e-books</a> last week, you might think that the online world wasn&#8217;t paying attention to the media category most transformed by the Internet: music. But a week is a lifetime in the <a href="http://en.wikipedia.org/wiki/Attention_deficit_disorder">ADD</a>-addled technology press, and today&#8217;s top story is that Google is &#8220;<a href="http://googleblog.blogspot.com/2009/10/making-search-more-musical.html">making search more musical</a>&#8221;. From the official blog post:</p>
<blockquote><p>Now, when you enter a music-related query — like the name of a song, artist or album — your search results will include links to an audio preview of those songs provided by our music search partners <a href="http://www.myspace.com/">MySpace</a> (which just acquired <a href="http://www.ilike.com/">iLike</a>) or <a href="http://www.lala.com/">Lala</a>. When you click the result you&#8217;ll be able to listen to an audio preview of the song directly from one of those partners.</p></blockquote>
<p>As with most Google features, this one is being rolled out gradually. If you&#8217;re impatient (like me), you can try it directly from <a href="http://www.google.com/landing/music/">this page</a>. Or you can watch the video above.</p>
<p>My first impression: this is a great feature to improve <a href="http://www.db.dk/bh/Core%20Concepts%20in%20LIS/articles%20a-z/known_item_search.htm">known-item search</a>, and it&#8217;s nice that they&#8217;ve partnered with folks that often let you hear whole songs, rather than 30-second snippets. The selection seems limited, but it could be that my tastes are a bit obscure. I&#8217;m curious if others share my sense that the catalog is much smaller than the ones on iTunes or Amazon.</p>
<p>But, as music IR specialist and fellow <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> advocate <a href="http://www.fxpal.com/?p=jeremy">Jeremy Pickens</a> points out, Google is &#8220;<a href="http://irgupf.com/2009/10/28/doing-to-music-what-they-did-to-the-web/">doing to music what they did to the web</a>&#8221;. I&#8217;m not as concerned as Jeremy is about the prospect of musical tastes being homogenized through the &#8220;rich get richer&#8221; effect of ranking&#8211;perhaps because we&#8217;re already there. Not only is pop music self-perpetuating (see this great <a href="http://www.princeton.edu/~mjs3/salganik_watts09.pdf">study</a> by my friend (and Princeton sociologist) <a href="http://www.princeton.edu/~mjs3/">Matt Salganik</a> and his former advisor <a href="http://en.wikipedia.org/wiki/Duncan_J._Watts">Duncan Watts</a>), but even <a href="http://thenoisychannel.com/2009/02/24/how-recommendation-engines-quash-diversity/">recommendation engines quash diversity</a>. Google really can&#8217;t make things that much worse.</p>
<p>Besides, much as Google&#8217;s default search leads many searchers to Wikipedia, a great starting point for <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>, the new music search leads users to <a href="http://www.pandora.com/">Pandora</a>, which i<span style="text-decoration: line-through;">s probably the leading engine for exploratory music search</span> offers users a more exploratory user experience (though it would be great if they also linked to <a href="http://www.last.fm/">last.fm</a>) <strong><em>(thanks Jeremy!)</em></strong>. OK, maybe &#8220;leads&#8221; is a strong word for a &#8220;listen on&#8221; link below the search result, but it&#8217;s there for people in the know.</p>
<p>I&#8217;d love to see Google embrace HCIR. But I appreciate the improvements to known-item search too, especially if they can delegate the HCIR functionality to others that focus on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/29/tuning-in-to-google-music-search/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Ben Shneiderman&#8217;s HCIR 2009 Keynote: The Future of Information Discovery</title>
		<link>http://thenoisychannel.com/2009/10/27/ben-shneidermans-hcir-2009-keynote-the-future-of-information-discovery/</link>
		<comments>http://thenoisychannel.com/2009/10/27/ben-shneidermans-hcir-2009-keynote-the-future-of-information-discovery/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 17:10:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2739</guid>
		<description><![CDATA[The slides for Ben Shneiderman&#8217;s HCIR 2009 keynote on &#8220;The Future of Information Discovery&#8221; are now available on the workshop web site. I&#8217;ve also taken the liberty of uploading them to SlideShare and embedding them here. The slides don&#8217;t do justice to Ben&#8217;s presentation style, but hopefully they at least communicate a taste of the [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_2358772" style="width: 477px; text-align: left;"><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="477" height="340" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayerd.swf?doc=hcir2009-futureinfodiscovery3-091027115639-phpapp01&amp;stripped_title=the-future-of-information-discovery" /><param name="allowfullscreen" value="true" /><embed style="margin:-0px" type="application/x-shockwave-flash" width="477" height="340" src="http://static.slidesharecdn.com/swf/ssplayerd.swf?doc=hcir2009-futureinfodiscovery3-091027115639-phpapp01&amp;stripped_title=the-future-of-information-discovery" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>The slides for <a href="http://www.cs.umd.edu/~ben/">Ben Shneiderman</a>&#8217;s <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> keynote on &#8220;<a href="http://cuaslis.org/hcir2009/HCIR2009-FutureInfoDiscovery3.pdf">The Future of Information Discovery</a>&#8221; are now available on the <a href="http://cuaslis.org/hcir2009/">workshop web site</a>. I&#8217;ve also taken the liberty of uploading them to SlideShare and embedding them here. The slides don&#8217;t do justice to Ben&#8217;s presentation style, but hopefully they at least communicate a taste of the material he covered and his vision of where <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a> needs to go as a field and community.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/27/ben-shneidermans-hcir-2009-keynote-the-future-of-information-discovery/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Experimenting with Social Search</title>
		<link>http://thenoisychannel.com/2009/10/26/google-experimenting-with-social-search/</link>
		<comments>http://thenoisychannel.com/2009/10/26/google-experimenting-with-social-search/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 20:31:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2734</guid>
		<description><![CDATA[Google may be an also-ran in the social networking market with its Brazil-centric Orkut service, but that hasn&#8217;t stopped the search giant from adding social features to its products. A post at the (unofficial) Google Operating System blog recounts the history of Google Reader&#8217;s social evolution, up to but not including its latest update last [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="560" height="272" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/ZqWJxgp-_mU&amp;hl=en&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="560" height="272" src="http://www.youtube.com/v/ZqWJxgp-_mU&amp;hl=en&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Google may be an also-ran in the <a href="http://en.wikipedia.org/wiki/List_of_social_networking_websites">social networking market</a> with its Brazil-centric <a href="http://en.wikipedia.org/wiki/Orkut">Orkut</a> service, but that hasn&#8217;t stopped the search giant from adding social features to its products. A post at the (unofficial) Google Operating System blog recounts the history of <a href="http://googlesystem.blogspot.com/2009/07/google-readers-social-evolution.html">Google Reader&#8217;s social evolution</a>, up to but not including its <a href="http://googleblog.blogspot.com/2009/10/reading-gets-personal-with-popular.html">latest update</a> last week. <a href="http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html">SearchWiki</a>, though not a social search feature per se, allows users to share personal annotations of their search results, as does the more recently introduced <a href="http://www.google.com/sidewiki/intl/en/index.html">Sidewiki</a>. And, <a href="http://blog.twitter.com/2009/10/bing-goes-dynamite.html">like Bing</a>, Google has established a <a href="http://blog.twitter.com/2009/10/google-nice.html">partnership with Twitter</a> in order to surface &#8220;social&#8221; results.</p>
<p>But the feature announced today, which Google is actually calling &#8220;<a href="http://googleblog.blogspot.com/2009/10/introducing-google-social-search-i.html">Social Search</a>&#8221;, is a much bigger step, even if it is tucked away as an <a href="http://www.google.com/experimental/">experiment on Google Labs</a>. From the official blog post:</p>
<blockquote><p>With Social Search, Google finds relevant public content from your friends and contacts and highlights it for you at the bottom of your search results. When I do a simple query for [new york], Google Social Search includes my friend&#8217;s blog on the results page under the heading &#8220;Results from people in your social circle for New York.&#8221; I can also filter my results to see only content from my social circle by clicking &#8220;Show options&#8221; on the results page and clicking &#8220;Social.&#8221;</p></blockquote>
<p>I gave it a whirl, searching for <a href="http://www.google.com/search?q=&quot;noisy+channel&quot;">&#8220;noisy channel&#8221;</a> and then restricting the search to content from what Google considers my social circle. The results are as promised, and I could further refine the results by author name, selecting from a familiar list of Neal Richter, Jason Adams, Daniel Lemire, Ken Ellis, and Joshua Young (<span style="text-decoration: line-through;">though for some reason Josh&#8217;s link didn&#8217;t work</span>). Cool! Except that there are a lot of names missing (check out the bloggers in <a href="http://thenoisychannel.com/the-noisy-community/">The Noisy Community</a>) and, more importantly, I can&#8217;t further refine or even sort the search results. Indeed, the ordering of search results seems quite arbitrary&#8211;a phenomenon I&#8217;ve noticed more generally for search engine ranking of social media content.</p>
<p>In short, Google Social Search is a welcome initiative, but there&#8217;s a lot more work to do before I would find a productive use for it. Given the mismatch between social search and black-box relevance ranking, a little bit of <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a> would go a long way towards making this feature practically useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/26/google-experimenting-with-social-search/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>HCIR 2009: Human-Human Interaction</title>
		<link>http://thenoisychannel.com/2009/10/26/hcir-2009-human-human-interaction/</link>
		<comments>http://thenoisychannel.com/2009/10/26/hcir-2009-human-human-interaction/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 14:24:19 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2727</guid>
		<description><![CDATA[On Friday, I had the privilege of seeing just how much the annual Workshop on Human-Computer Information Retrieval has grown up since I conceived it in the summer of 2007. Back then, my co-conspirators and I worried about attracting a critical mass of participants&#8211;indeed, Endeca employees easily accounted for a quarter of the attendees (and [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday, I had the privilege of seeing just how much the annual <a href="http://cuaslis.org/hcir2009/">Workshop on Human-Computer Information Retrieval</a> has grown up since I conceived it in the summer of 2007. Back then, my co-conspirators and I worried about attracting a critical mass of participants&#8211;indeed, <a href="http://endeca.com/">Endeca</a> employees easily accounted for a quarter of the attendees (and submissions) at the <a href="http://projects.csail.mit.edu/hcir/">first HCIR workshop</a>. And even <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2008/">last year</a>, host and co-sponsor <a href="http://research.microsoft.com/">Microsoft Research</a> supplied a disproportionate share of the attendees.</p>
<p>But this year was different. We were overloaded with strong submissions from all corners, and we had to turn people away for lack of capacity! While we didn&#8217;t relish saying no to prospective participants, these are great problems to have! And, thanks to Nick Belkin and Diane Kelly, we&#8217;ve arranged to greatly increase that capacity at <a href="http://iiix2010.org/hcir.php">HCIR 2010</a>&#8211;more on that in a moment.</p>
<p><a href="http://www.cs.swan.ac.uk/~csmax/">Max Wilson</a> has already written up an <a href="http://www.cs.swan.ac.uk/~csmax/blog/2009/10/hcir09-redux/">excellent summary</a> of the workshop, which I encourage you to read. You can also see the live tweet stream at <a href="http://search.twitter.com/search?q=%23hcir09">#hcir09</a>. Rather than duplicate these efforts, let me add my personal reflections as an organizer and participant.</p>
<p><a href="http://www.cs.umd.edu/~ben/">Ben Shneiderman</a>&#8216;s keynote address was sweeping and inspiring. I expected him to talk about <a href="http://en.wikipedia.org/wiki/Information_visualization">information visualization</a>, the area where he is most known for his contributions. He did present some examples of his group&#8217;s work on <a href="http://www.cs.umd.edu/hcil/lifelines2/">visualization-centric interfaces to support medical research</a>, but his overall presentation took the much more ambitious approach of discussing the past, present, and possible future of <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a>. Specifically, he urged us to link our work to societal goals, such as the <a href="http://www.un.org/millenniumgoals/">United Nations Millennium Development Goals</a>. His challenge may seem impossibly idealistic, but I agree with his assertion that it is a practical one: we will do our best research by grounding ourselves firmly in the real and pressing problems of our age. <a href="http://research.microsoft.com/en-us/um/people/sdumais/">Last year&#8217;s keynote speaker</a> went on to win the <a href="http://www.sigir.org/awards/awards.html">Gerard Salton Award</a>; I can only hope that Ben receives comparable accolades for his past accomplishments and future contributions to HCIR.</p>
<p>A new feature for this year&#8217;s workshop was having a &#8220;poster boaster&#8221; session, in which each of the presenters in the poster session had one minute to pitch his or her work. For those of you unfamiliar with this format, I highly recommend it. The compressed format forces presenters to distill the essence of their contributions&#8211;a useful exercise in general. And the audience doesn&#8217;t get bored: if you decide halfway into a presentation that you aren&#8217;t interested, then you only have to wait 30 seconds until the next one! Not that we had that problem: the posters were consistently interesting, as the submissions were unusually strong this year. You can download the full workshop proceedings <a href="http://cuaslis.org/hcir2009/HCIR2009.pdf">here</a>.</p>
<p>Even the full presentations weren&#8217;t that long. The five speakers were each allotted ten minutes, with a healthy amount of time reserved for a panel-style Q&amp;A session. The papers in this session were, by design, some of the more controversial ones. In particular, <a href="http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/v/Voorhees:Ellen_M=.html">Ellen Voorhees</a> delivered a full-throated defense of <a href="http://en.wikipedia.org/wiki/Cranfield_Experiments">Cranfield</a> / <a href="http://en.wikipedia.org/wiki/Text_Retrieval_Conference">TREC</a>-style evaluation: &#8220;I Come Not to Bury Cranfield, but to Praise It&#8221; (similar to her <a href="http://www.dcs.gla.ac.uk/workshops/air/slides/EllenVoorhees-TestCollectionsforAIR.pdf">presentation</a> at the <a href="http://www.dcs.gla.ac.uk/workshops/air/">2006 Workshop on Adaptive Information Retrieval</a> that I <a href="http://thenoisychannel.com/2008/04/17/ellen-voorhees-defends-cranfield/">discussed</a> on this blog last year). Her reminder of HCIR&#8217;s challenges on the evaluation front surely ruffled some feathers, but all of us HCIR advocates need to address these challenges if we want researchers (and practitioners) outside our community to drink our kool-aid.</p>
<p>The above format was already quite interactive (as befits a workshop about interaction), but the second half of the day was explicitly designed to facilitate discussion. We had lunch on site, followed by a one-hour poster session.  We then had two one-hour guided discussion sessions to address the theoretical and practical concerns of HCIR. As organizers, we seeded both sessions with questions, but we also incorporated concerns that had come up during earlier discussions.</p>
<p>Finally, I am grateful to our sponsors. <a href="http://slis.cua.edu/">Catholic University</a> was a gracious host and sponsor, providing the workshop with a great space and very helpful student volunteers. Between that and the financial contributions of <a href="http://endeca.com/">Endeca</a> and <a href="http://research.microsoft.com/">Microsoft Research</a>, we were able to continue our tradition of not charging attendees for the workshop. I can&#8217;t promise that will continue indefinitely, but I am glad that our insistence on emphasizing substance over frivolous amenities has helped us deliver what I believe to be some of the best bang-for-buck in the scholarly community.</p>
<p>I&#8217;m already excited about <a href="http://iiix2010.org/hcir.php">HCIR 2010</a>. Unlike the past three workshops, which have been held as independent events, next year&#8217;s workshop will be co-located with the <a href="http://iiix2010.org/">Information Interaction in Context Symposium (IIiX’10)</a> in New Brunswick, New Jersey. The workshop will take place on August 22nd, breaking our unintended tradition of holding the workshop on October 23rd. <a href="http://comminfo.rutgers.edu/~belkin/belkin.html">Nick Belkin</a> assures us that there will be lots of space, so hopefully we&#8217;ll be able to accommodate everyone who is interested. We&#8217;ll also be soliciting sponsors for both the workshop and the broader symposium.</p>
<p>But there&#8217;s more to HCIR than enjoying each other&#8217;s company at workshops. We must spend the remaining 364 days of the year fleshing out our vision, and relating that vision not only to the disciplines HCIR explicitly integrates, but to pressing social concerns. It is up to us all to make our work relevant.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/26/hcir-2009-human-human-interaction/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Off To DC</title>
		<link>http://thenoisychannel.com/2009/10/20/off-to-dc/</link>
		<comments>http://thenoisychannel.com/2009/10/20/off-to-dc/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 01:30:57 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2724</guid>
		<description><![CDATA[I&#8217;m heading to Washington, DC tomorrow morning, a couple of days before the HCIR &#8217;09 workshop. I&#8217;m not sure I&#8217;ll have any opportunities to blog while I&#8217;m in the nation&#8217;s capital, but of course I&#8217;ll post a write-up about the workshop when I&#8217;m back! Meanwhile, if you need your blog fix, I encourage you to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m heading to Washington, DC tomorrow morning, a couple of days before the <a href="http://cuaslis.org/hcir2009/">HCIR &#8217;09</a> workshop. I&#8217;m not sure I&#8217;ll have any opportunities to blog while I&#8217;m in the nation&#8217;s capital, but of course I&#8217;ll post a write-up about the workshop when I&#8217;m back! Meanwhile, if you need your blog fix, I encourage you to check out some of the <a href="http://thenoisychannel.com/category/blogs-i-read/">blogs I read</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/20/off-to-dc/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Books! Books! Books!</title>
		<link>http://thenoisychannel.com/2009/10/20/books-books-books/</link>
		<comments>http://thenoisychannel.com/2009/10/20/books-books-books/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 04:26:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2717</guid>
		<description><![CDATA[When my daughter was born almost two years ago, I wondered if she&#8217;d grow up reading books. After all, I do most of my reading online, and increasingly find myself reading short articles rather than whole books. Needless to say, she&#8217;s loved books so far, even if she&#8217;s shredded a few. But the bigger surprise [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.thesharksfoundation.com/reading/index.asp"><img class="alignnone" title="Reading Is Cool" src="http://www.thesharksfoundation.com/images/reading/ric_logo.gif" alt="" width="400" height="161" /></a></p>
<p>When my <a href="http://www.flickr.com/photos/24264445@N05/">daughter</a> was born almost two years ago, I wondered if she&#8217;d grow up reading books. After all, I do most of my reading online, and increasingly find myself reading short articles rather than whole books. Needless to say, she&#8217;s loved books so far, even if she&#8217;s shredded a few.</p>
<p>But the bigger surprise for me is that books&#8211;specifically e-books&#8211;have become such a hot industry. When I briefly worked for a consulting firm after grad school in 1999, my first assignment was to evaluate the e-book market. The readers then consisted of the <a href="http://www.answers.com/topic/rocketbook">Rocket ebook</a> and <a href="http://www.ideo.com/work/item/softbook-reader/">SoftBook Reader</a>. Needless to say, I correctly predicted at the time that the e-book market wasn&#8217;t ready for prime time.</p>
<p>But fast forward to the present. Amazon has given the e-book market some credibility: Citigroup says they sold <a href="http://mediamemo.allthingsd.com/20090203/citi-says-amazon-sold-500000-kindles-last-year-12-billion-business-next-year/">500K Kindles in 2008</a>, and Forrester predicted they will sell <a href="http://mediamemo.allthingsd.com/20091007/the-coming-kindle-boom-sales-could-double-in-2010/">1.8M units this year</a>.</p>
<p>But the last few days (and even the last 24 hours!) of news show that the e-book market is only starting to open up:</p>
<ul>
<li>In May, Sony, whose e-reader sales have lagged behind the Kindle, announced a <a href="http://www.nytimes.com/2009/03/19/technology/19sony.html">partnership with Google</a> in order to make copyright-free books available for free.</li>
<li>Google just announced a service called <a href="http://www.google.com/hostednews/ap/article/ALeqM5gr_qJI9KI8h7PBC-AEeknD3ezkegD9BBHAT80">Editions</a> that it plans to launch in 2010 (by when it will have presumably settled the <a href="http://books.google.com/googlebooks/agreement/">Google Books Settlement Agreement</a>).</li>
<li>The Internet Archive just announced the <a href="http://www.archive.org/bookserver">Bookserver</a> project as &#8220;a growing open architecture for vending and lending digital books over the Internet&#8221;.</li>
<li>Spring Design just announced <a href="http://www.springdesign.com/resource/jsp/">Alex</a>, an e-book reader based on Google&#8217;s Android operating system.</li>
<li> Barnes &amp; Noble is expected to <a href="http://www.engadget.com/2009/10/19/barnes-and-noble-nook-color-e-reader-out-tuesday-for-259-says/">announce an e-reader</a> that competes directly with the Kindle and has generated lots of buzz through <a href="http://gizmodo.com/5380942/exclusive-first-photos-of-barnes--nobles-double-screen-e+reader">leaked photos</a>.</li>
</ul>
<p>I grew up on books, and I&#8217;m excited to see that, a decade after the initial market failures, e-books (like touchscreens) are a mainstream reality. I still worry about <a href="http://thenoisychannel.com/2009/10/19/who-will-buy/">who will buy</a> them, especially considering that the marginal cost of distributing a typical e-book is even less than that of distributing a 5-minute song. A quick scan of a popular file-sharing site reveals that the pdf version of bestseller <em>The Lost Symbol</em> takes up less than 3MB.</p>
<p>Still, I&#8217;ll take a moment to celebrate the progress of technology. I&#8217;ve always known that reading was cool, but now we have the gadgets to prove it!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/20/books-books-books/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Who Will Buy?</title>
		<link>http://thenoisychannel.com/2009/10/19/who-will-buy/</link>
		<comments>http://thenoisychannel.com/2009/10/19/who-will-buy/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 18:12:46 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2711</guid>
		<description><![CDATA[As some of you know, I&#8217;m a karaoke junkie. But it&#8217;s my wife who has the classier repertoire, including &#8220;Who Will Buy?&#8221; from the musical Oliver!: Who will buy this wonderful morning? Such a sky you never did see! Who will tie it up with a ribbon And put it in a box for me? [...]]]></description>
			<content:encoded><![CDATA[<p>As some of you know, I&#8217;m a karaoke junkie. But it&#8217;s my wife who has the classier repertoire, including &#8220;Who Will Buy?&#8221; from the musical <a href="http://en.wikipedia.org/wiki/Oliver!"><em>Oliver!</em></a>:</p>
<p><em>Who will buy this wonderful morning?<br />
Such a sky you never did see!<br />
Who will tie it up with a ribbon<br />
And put it in a box for me?</em></p>
<p>Of course, the trope that the best things in life are free predates musical theater, let alone the web. But recent years have witnessed dramatic changes in our price sensitivities in every genre of digital (or digitizable) content, and I&#8217;m curious (sometimes morbidly so) about where it goes from here.</p>
<p>I won&#8217;t make you suffer through a rant about the malaise of the music and news industries&#8211;those topics, important as they are, have been overplayed in the blogosphere. If you need a refresher, I suggest <a href="http://www.free-culture.cc/">Lawrence Lessig</a> and the <a href="http://www.niemanlab.org/">Nieman Journalism Lab</a> as some of the more rational voices contributing to the discussion.</p>
<p>But it&#8217;s not just news and music that are experiencing the effects of the &#8220;<a href="http://en.wikipedia.org/wiki/Information_wants_to_be_free">information wants to be free</a>&#8221; movement. Consider these industries:</p>
<ul>
<li><strong>Books</strong>. Many publishers worry that the Kindle has been setting a consumer expectation that a book <a href="http://www.wired.com/gadgetlab/2009/04/kindle-readers/">should only cost $10</a>. Indeed, a recent <a href="http://online.wsj.com/article/SB125565024634288895.html">price war between Amazon and Wal-Mart</a> drove some of those prices down to $8.99. Is this a boon for consumers, or a body blow to the publishing industry? It&#8217;s easy to evoke the $0.99-per-song expectation set by iTunes&#8211;but that change was more about disaggregating albums than about changing the per-unit cost. Besides, books have not yet had to confront the scale of unauthorized distribution that we see in the music industry. Legal or not, free is a potent source of price pressure.</li>
<li><strong>Software</strong>. <a href="http://www.wolframalpha.com/">Wolfram Alpha</a> just made headlines by releasing a <a href="http://www.techcrunch.com/2009/10/18/wolfram-alpha-miscalculates-what-its-iphone-app-should-cost/">$50 iPhone app</a>. Many have reacted that such a high price is outrageous and will doom the application to failure. They may be right on that latter point&#8211;the market will vote with its clicks soon enough. But I&#8217;m old enough to remember $50 as being in the ballpark of what it cost to purchase a new consumer software application. Even then, unauthorized distribution was an issue&#8211;remember the &#8220;<a href="http://en.wikipedia.org/wiki/Don%27t_Copy_That_Floppy">don&#8217;t copy that floppy</a>&#8221; campaign? Today, my impression is that few people consciously purchase consumer software&#8211;a trend that I at least date to Microsoft&#8217;s strategy of bundling its software into PC purchases. The most noted exceptions are console games (which are impressive holdouts in the consumer software space) and iPhone apps&#8211;with the caveat that only a tiny minority of apps make enough money for the creators to live on. <em>(Update: just saw <a href="http://uk.games.ign.com/articles/103/1036254p1.html">this note</a> about how EA Sports President Peter Moore sees the current console game business model of cartridges and discs as a &#8220;burning platform&#8221;.)</em></li>
<li><strong>Television</strong>. Between <a href="http://charlie-federman.blogspot.com/2009/02/is-boxee-cables-napster.html">Boxee</a> and <a href="http://www.wired.com/techbiz/it/magazine/17-10/ff_netflix">Netflix</a>, there is a real chance that digital content&#8217;s cash cow, cable television, will see its regional monopolies disrupted. I can&#8217;t imagine that anyone will shed a tear for the cable companies. And yet I can&#8217;t help but wonder what happens as the notion of premium content is subsumed by an expectation that video content should be free. Are we heading towards a proliferation of cheaply produced reality TV, contests, and game shows&#8211;all sponsored by rampant product placement?</li>
</ul>
<p>If we are to believe <a href="http://www.techdirt.com/articles/20090701/0422125421.shtml">Mike Masnick</a>, then the price of content is driven to its marginal cost. It&#8217;s pretty clear that the marginal cost of distributing most digital content is, while not free, close enough to be a rounding error. Should we be looking forward to a world where no one can charge consumers for content? Folks like <a href="http://www.buzzmachine.com/2009/07/08/what-google-would-do/">Jeff Jarvis</a> and <a href="http://www.wired.com/techbiz/it/magazine/16-03/ff_free">Chris Anderson</a> are cheerleading such a world as not only inevitable but a good thing&#8211;though both of them have had the sense to make some money on non-free books while the going is good.</p>
<p>Yes, there are and will always be business models to support content creators. In particular, one-time content (live events, consulting services) has  some degree of insulation from the inexorable trend toward free. But what an inefficient turn of events, if people are rewarded for creating one-time content but not for creating far more valuable content that is useful to a broad audience of consumers!</p>
<p>I know that there are non-financial incentives that drive scholars, open-source developers, and activists to create free content. Indeed, I personally write this blog without any direct financial incentive. Perhaps these incentives will be the driving forces for content creation in the 21st century. One way or another, I hope we find a way to fund the things we value, rather than devolving into a locally optimal rut where value creation isn&#8217;t economic for the creators.</p>
<p>p.s. You can find the lyrics to Oliver for free <a href="http://users.bestweb.net/~foosie/oliver.htm">online</a>, and you can easily view a free (unauthorized) copy of a performance of &#8220;Who Will Buy?&#8221; on YouTube. Or you can buy the song for <a href="http://www.amazon.com/Who-Will-Oliver-2003-Remastered/dp/B001BKPSWS">$0.99</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/19/who-will-buy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Third Annual Workshop on Search in Social Media (SSM 2010)</title>
		<link>http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/</link>
		<comments>http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 04:34:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2708</guid>
		<description><![CDATA[I&#8217;m proud to announce that Eugene Agichtein, Marti Hearst, and Ian Soboroff have invited me to help organize the upcoming Workshop on Search in Social Media (SSM 2010). The workshop will take place in conjunction with the ACM  Conference on Web Search and Data Mining (WSDM 2010), a young conference that has quickly become a [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m proud to announce that <a href="http://www.mathcs.emory.edu/%7Eeugene/">Eugene Agichtein</a>, <a href="http://people.ischool.berkeley.edu/%7Ehearst/"> Marti Hearst</a>, and <a href="http://trec.nist.gov/">Ian Soboroff</a> have invited me to help organize the upcoming <a href="http://ir.mathcs.emory.edu/SSM2010/">Workshop on Search in Social Media (SSM 2010)</a>. The workshop will take place in conjunction with the <a href="http://www.wsdm2010.org/">ACM  Conference on Web Search  and Data Mining (WSDM 2010)</a>, a young conference that has quickly become a top-tier forum for work in these areas. The conference and workshop will take place in my home town of New York&#8211;Brooklyn, to be precise!</p>
<p>Here&#8217;s the key information from the workshop web site:</p>
<blockquote>
<h3>Overview</h3>
<p>Social applications are the fastest growing segment of the web. They establish new forums for content creation, allow people to connect to each other and share information, and permit novel applications at the intersection of people and information. However, to date, social media has been primarily popular for connecting people, not for finding information. While there has been progress on searching particular kinds of social media, such as blogs, search in others (e.g., Facebook, Myspace, or Flickr) is not as well understood.</p>
<p>The purpose of the 3rd Annual Workshop on Search in Social Media (SSM 2010) is to bring together information retrieval and social media researchers to consider the following questions: How should we search in social media? What are the needs of users, and models of those needs, specific to social media search? What models make the most sense? How does search interact with existing uses of social media? How can social media search complement traditional web search? What new search paradigms for information finding can be facilitated by social media?</p>
<p><strong>SSM 2010</strong> follows up on the highly successful <strong> <a href="http://ir.mathcs.emory.edu/SSM2009">SSM 2009</a></strong> and <strong><a href="http://ir.mathcs.emory.edu/SSM2008">SSM 2008</a></strong> workshops held at SIGIR 2009 and CIKM 2008 respectively. We are looking forward to an equally exciting workshop at <strong> <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a></strong> in New York!</p>
<h3>Format and Topics</h3>
<p>We are planning for a full-day workshop consisting of invited speakers, organized in both plenary and panel sessions, and a contributed poster/demo session.</p>
<p><strong>We solicit short (under 2 pages) position papers, posters or demo proposals </strong>to be presented as part of a <strong>poster session</strong>, describing late-breaking and novel research results or demonstrations of prototypes or working systems. All topics at the intersection of information finding and social media are of interest, including, but not limited to:</p>
<ul>
<li>Searching blogs, tweets, and other textual social media.</li>
<li>Searching within social networks, including expert finding.</li>
<li>Searching Wikipedia discussions and revision histories.</li>
<li>Searching online discussions, mailing lists, forums, and community question answering sites.</li>
<li>The role of human-powered and community question answering.</li>
<li>Novel models of information finding and new search applications for social media.</li>
<li>The role of timeliness, authority, and accuracy in social media search.</li>
<li>Interaction between traditional web search and social media search.</li>
<li>User needs assessments and task analysis for social media search.</li>
<li>Interactions between searching and browsing in social media.</li>
<li>Searching and exploiting folksonomies, tags, and tagged data.</li>
<li>Spam and adversarial interactions in social media.</li>
</ul>
<p>Ideal papers may include late-breaking and novel research results, position and vision papers discussing the role of search in social media, and demonstrations of prototypes or working systems. Note that the workshop proceedings will not be  archived or considered as formal publication, to encourage the informal  atmosphere and to allow the authors to publish expanded versions of the work  elsewhere.</p>
<p>The poster/demo proposals should be in standard ACM SIG format; more details will be posted soon.</p></blockquote>
<p>Submissions are due on <strong>December 15th</strong>. I hope to see some of you there! Meanwhile, feel free to suggest ideas for invited speakers who have done interesting work at the intersection of social media and search, and I&#8217;ll share your suggestions with my co-organizers.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Innovation at Huffington Post: Data-Driven Headlines</title>
		<link>http://thenoisychannel.com/2009/10/15/innovation-at-huffington-post-data-driven-headlines/</link>
		<comments>http://thenoisychannel.com/2009/10/15/innovation-at-huffington-post-data-driven-headlines/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 18:17:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2705</guid>
		<description><![CDATA[The other day, I was suggesting to one of my colleagues that Endeca&#8217;s software could help authors write better (translation: more SEO-friendly) headlines. The details of that discussion are proprietary, but I&#8217;m sure you can imagine the gist. But we all wondered whether authors would be willing to stomach such a left-brain infringement on their [...]]]></description>
			<content:encoded><![CDATA[<p>The other day, I was suggesting to one of my colleagues that <a href="http://endeca.com/">Endeca</a>&#8217;s software could help authors write better (translation: more <a href="http://en.wikipedia.org/wiki/Search_engine_optimization">SEO</a>-friendly) headlines. The details of that discussion are proprietary, but I&#8217;m sure you can imagine the gist. But we all wondered whether authors would be willing to stomach such a left-brain infringement on their right-brain creativity.</p>
<p>But apparently the <a href="http://www.huffingtonpost.com/">Huffington Post</a> is blazing new trails in this area. The <a href="http://www.niemanlab.org/2009/10/how-the-huffington-post-uses-real-time-testing-to-write-better-headlines/">Nieman Journalism Lab</a> reports that:</p>
<blockquote><p><strong>The Huffington Post applies A/B testing to some of its headlines.</strong> Readers are randomly shown one of two headlines for the same story. After five minutes, which is enough time for such a high-traffic site, the version with the most clicks becomes the <a href="http://www.google.com/search?q=site%3Aobserver.com+%22wood+war%22">wood</a> that everyone sees.</p></blockquote>
<p>NJL also reports that Huffington Post social media editor&#8211;and long-time Noisy Channel reader&#8211;<a href="http://networkednews.wordpress.com/">Josh Young</a> uses Twitter to help crowd-source  better headlines.</p>
<p>I&#8217;m sure this approach must rattle some old-school journalists. And there is a real danger of optimizing for the wrong outcome. For example, including the word &#8220;sex&#8221; in this message might improve its traffic (the popularity of <a href="http://thenoisychannel.com/2008/12/12/the-noisy-channel-now-better-than-sex/">this post</a> attests to that), but to what end?</p>
<p>Still, I don&#8217;t see this use of technology as cramping anyone&#8217;s style. Most of us write to be read&#8211;especially those in the media industry who are trying to monetize their audiences. Measurable success matters, and there&#8217;s no harm in trying to maximize it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/15/innovation-at-huffington-post-data-driven-headlines/feed/</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
	</channel>
</rss>
