<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Noisy Channel</title>
	<atom:link href="http://thenoisychannel.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com</link>
	<description></description>
	<lastBuildDate>Sat, 06 Feb 2010 22:54:20 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Vacation</title>
		<link>http://thenoisychannel.com/2010/02/06/vacation/</link>
		<comments>http://thenoisychannel.com/2010/02/06/vacation/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 22:54:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2956</guid>
		<description><![CDATA[
Just letting readers know that I&#8217;ll be on vacation for the next week. If you are starved for reading materials, check out some of the blogs I read.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.culebrabeachrental.com/index.htm"><img class="alignnone" title="Culebra, Puerto Rico" src="http://www.culebrabeachrental.com/images/fb_19.jpg" alt="" width="360" height="270" /></a></p>
<p>Just letting readers know that I&#8217;ll be on vacation for the next week. If you are starved for reading materials, check out some of the <a href="http://thenoisychannel.com/category/blogs-i-read/">blogs I read</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/06/vacation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WSDM 2010: Day 3</title>
		<link>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-3/</link>
		<comments>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-3/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 22:24:36 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2952</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM.

Today is the last day of WSDM 2010, and I unfortunately spent it at home drinking chicken soup. But I&#8217;ve been following the conference via the proceedings and tweets.
The day started with a short session on temporal interaction. Topics included clustering social media documents (e.g., Flickr photos) based on their association with events, [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: this post is cross-posted at <a href="http://cacm.acm.org/blogs/blog-cacm/72149-wsdm-2010-day-3/fulltext">BLOG@CACM</a>.<br />
</em></p>
<p>Today is the last day of <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>, and I unfortunately spent it at home drinking chicken soup. But I&#8217;ve been following the conference via the <a href="http://www.wsdm-conference.org/2010/proceedings/ ">proceedings</a> and <a href="http://search.twitter.com/search?q=%23wsdm2010 ">tweets</a>.</p>
<p>The day started with a short session on temporal interaction. Topics included clustering social media documents (e.g., <a href="http://www.flickr.com/">Flickr</a> photos) based on their association with events, statistical tests for early identification of popular social media content, and analysis of answers sites (like <a href="http://answers.yahoo.com/">Yahoo! Answers</a>) as evolving two-sided economic markets.</p>
<p>The next session focused on advertising. Two papers focused on click prediction: one proposing a <a href="http://www.scholarpedia.org/article/Bayesian_statistics">Bayesian</a> inference model to better predict click-throughs in the tail of the ad distribution; the other presenting a framework for personalized click models. Another paper addressed the closely related problem of predicting ad relevance. The remaining papers discussed other aspects of search advertising: one on estimating the value per click for channels like <a href="http://www.google.com/services/adsense_tour/index.html">Google AdSense</a>, where ad inventory is supplied by a third party; the other proposing an algorithmic approach to automate online ad campaigns based on <a href="http://en.wikipedia.org/wiki/Landing_page">landing page</a> content.</p>
<p>The following session was on systems and efficiency, a popular topic given the immense data and traffic associated with web search. Two papers proposed approaches to help short-circuit ranking computations: one by optimizing the organization of <a href="http://en.wikipedia.org/wiki/Inverted_index">inverted index</a> entries to consider both the static ranks of documents and the upper bounds of term scores for all terms contained in each document; the other using early-exit strategies to optimize <a href="http://en.wikipedia.org/wiki/Ensemble_learning">ensemble-based machine learning</a> algorithms. Another used machine learning to mine rules for de-duplicating web pages based on URL string patterns. Another focused on compression, showing that web content is at least an order of magnitude more compressible than what can be achieved by <a href="http://en.wikipedia.org/wiki/Gzip">gzip</a>. The last paper proposed a method to perform efficient distance queries on graphs (e.g., web graphs or social graphs) by pre-computing a collection of node-centered subgraphs.</p>
<p>The last session of the conference discussed various topics in web mining. One presented a system for identifying distributed search bot attacks. Another proposed an image search method using a combination of entity information and visual similarity. The final paper showed that shallow text features can be used for low-cost detection of boilerplate text in web documents.</p>
<p>All in all, WSDM 2010 was an excellent conference, and I&#8217;m sad not to have been able to attend more of it in person. I&#8217;m delighted to see an even mix of academic and industry representatives sharing ideas and working to make the web a better place for information access.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-3/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>WSDM 2010: Day 2</title>
		<link>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-2/</link>
		<comments>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-2/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 04:00:53 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2949</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM.
Unfortunately, I woke up this morning rather under the weather, so I&#8217;m having to resort to remotely reporting on the second day of the WSDM 2010 conference, based on the published proceedings and the tweet stream.
The day started with a keynote from Harvard economist Susan Athey. Her research focuses on the design of [...]]]></description>
			<content:encoded><![CDATA[<p><i>Note: this post is cross-posted at <a href="http://cacm.acm.org/blogs/blog-cacm/71927-wsdm-2010-day-2/fulltext">BLOG@CACM</a>.</i></p>
<p>Unfortunately, I woke up this morning rather under the weather, so I&#8217;m having to resort to remotely reporting on the second day of the <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a> conference, based on the published proceedings and the <a href="http://twitter.com/#search?q=%23wsdm2010">tweet stream</a>.</p>
<p>The day started with a keynote from Harvard economist <a href="http://kuznets.fas.harvard.edu/~athey/">Susan Athey</a>. Her research focuses on the design of auction-based markets, a topic core to the business of search, which largely relies on auction-based advertising models (cf. <a href="http://en.wikipedia.org/wiki/AdWords">Google AdWords</a>). Then came a session focused on learning and optimization. One paper proposed a method to learn ranking functions and query categorization simultaneously, reflecting that different categories of queries lead users to have different expectations about ranking. Another combined traditional list-based ranking with pair-wise comparisons between results to separate the results into tiers reflecting grades of relevance. An intriguing approach to query recommendation treated it as an optimization problem, perturbing users’ query-reformulation path to maximize the expected value of a utility function over the search session. Another paper looked not at ranking per se, but rather at improving the quality of the training data used in machine-learned ranking. The final paper of the session, which earned a best-paper nomination, modeled document relevance based not on click-through behavior, but rather on post-click user behavior.</p>
<p>The next session was about users and measurement. It opened with another best-paper nominee: an analysis of over a hundred million users to understand how they re-find web content. Another offered a rigorous analysis of the often sloppily presented &#8220;<a href="http://en.wikipedia.org/wiki/Long_Tail">long-tail</a>&#8221; hypothesis: it found that light users disproportionately prefer content at the head of the distribution while heavy users disproportionately prefer the tail. Another log-analysis paper analyzed search logs using a partially observable Markov model (a variant of the <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model">hidden Markov model</a> in which not all of the hidden state transitions emit observable events) and compared the latent variables with eye-tracking studies. An intriguing study demonstrated that user behavior models are more predictive of goal success than models based on document relevance. The final paper of the session proposed methods for quantifying the reusability of the test collections that lie at the heart of information retrieval evaluation.</p>
<p>The last session of the day focused on social aspects of search. Two of the papers were concerned with modeling authority and influence in social networks, a problem in which I take a deep <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">personal interest</a>. Another inferred attributes of social network users based on those of other users in their communities (cf. MIT&#8217;s <a href="http://www.boston.com/bostonglobe/ideas/articles/2009/09/20/project_gaydar_an_mit_experiment_raises_new_questions_about_online_privacy/">Project Gaydar</a>). Another analyzed <a href="http://www.flickr.com/">Flickr</a> and <a href="http://www.last.fm/">Last.fm</a> user logs to show that users&#8217; semantic similarity based on their tagging behavior is predictive of social links. The final paper tackled the sparsity of social media tags by inferring latent topics from shared tags and spatial information.</p>
<p>Not surprisingly, a disproportionate number of contributors to the conference work at major web search companies, who have both the motivation to improve results and the access to data that is needed for such research. One of the ongoing research challenges for the field is to find ways to make this data available to others while respecting the business concerns of search engine companies and the privacy concerns of their users.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/06/wsdm-2010-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>WSDM 2010: Day 1</title>
		<link>http://thenoisychannel.com/2010/02/05/wsdm-2010-day-1-2/</link>
		<comments>http://thenoisychannel.com/2010/02/05/wsdm-2010-day-1-2/#comments</comments>
		<pubDate>Fri, 05 Feb 2010 05:52:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2946</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM.
Today was the first day of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), held at the Polytechnic Institute of NYU in Brooklyn, NY. WSDM is a young conference that has already become a top-tier publication venue for research in these areas. In contrast [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: this post is cross-posted at </em><a href="http://cacm.acm.org/blogs/blog-cacm"><em>BLOG@CACM</em></a><em>.</em></p>
<p>Today was the first day of the Third ACM International Conference on Web Search and Data Mining (<a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>), held at the Polytechnic Institute of NYU in Brooklyn, NY. WSDM is a young conference that has already become a top-tier publication venue for research in these areas. In contrast to some of the larger conferences, WSDM is single-track and feels more intimate and coherent&#8211;even with over 200 attendees.</p>
<p>The day started with an ambitious keynote by <a href="http://www.cse.iitb.ac.in/~soumen/">Soumen Chakrabarti</a> (IIT Bombay): &#8220;Bridging the Structured Un-Structured Gap&#8221;. He described a soup-to-nuts architecture to annotate web documents and perform complex reasoning on them using a structured query language. But perhaps this ambitious approach is a practical one: it uses the web we have&#8211;as opposed to waiting for the semantic web to emerge&#8211;and there is a prototype using half a billion documents.</p>
<p>The first paper session focused on web search. Of the five papers, two emphasized temporal aspects of content, one considered social media recommendation, and one focused on identifying concepts in multi-word queries. The last paper of the session proposed using anchor text as a more widely available input than query logs to support the query reformulation process. It also attracted the most audience attention&#8211;while <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">interaction</a> is often a niche at information retrieval conferences, it always elicits strong interest and opinions.</p>
<p>The following session focused on tags and recommendations. Some take-aways: users produce tags similar to the topics designed by experts; individual &#8220;personomies&#8221; can be translated into aggregated folksonomies; matrix factorization methods can produce interpretable recommendations.</p>
<p>The last session of the day covered information extraction. One of the papers used pattern-based information extraction approaches, demonstrating how far we&#8217;ve come since <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>&#8217;s <a href="http://people.ischool.berkeley.edu/~hearst/papers/coling92.pdf">seminal work</a> on the subject. Another offered a SQL-like system for typed-entity search, complete with a live, publicly accessible prototype. The final paper addressed an issue that came up repeatedly at the <a href="http://cacm.acm.org/blogs/blog-cacm/71444-third-workshop-on-search-and-social-media-ssm-2010/fulltext">SSM workshop</a>: the problem of distilling the truth from a collection of inconsistent sources.</p>
<div>After a full day of talks, we headed to <a href="http://www.theparknyc.com/">The Park</a> for an excellent banquet. I&#8217;m looking forward to another two days of great sessions.</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/05/wsdm-2010-day-1-2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Report on the Third Workshop on Search and Social Media (SSM 2010)</title>
		<link>http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/</link>
		<comments>http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/#comments</comments>
		<pubDate>Thu, 04 Feb 2010 08:25:01 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2936</guid>
		<description><![CDATA[Note: this post is cross-posted at BLOG@CACM.
It is my pleasure to report on the 3rd Annual Workshop on Search in Social Media (SSM 2010), a gathering of information retrieval and social media researchers and practitioners in an area that has captured the interest of computer scientists, social scientists, and even the broader public. The one-day workshop [...]]]></description>
			<content:encoded><![CDATA[<p><em>Note: this post is cross-posted at <a href="http://cacm.acm.org/blogs/blog-cacm/71444-third-workshop-on-search-and-social-media-ssm-2010/fulltext">BLOG@CACM</a>.</em></p>
<p>It is my pleasure to report on the 3rd Annual Workshop on Search in Social Media (<a href="http://ir.mathcs.emory.edu/SSM2010/" target="_blank">SSM 2010</a>), a gathering of information retrieval and social media researchers and practitioners in an area that has captured the interest of computer scientists, social scientists, and even the broader public. The one-day workshop took place at the Polytechnic Institute of NYU in Brooklyn, NY, co-located with the ACM Conference on Web Search and Data Mining (<a href="http://www.wsdm-conference.org/2010/" target="_blank">WSDM 2010</a>). The quality of the presenters, the overbooked registration, and the hundreds of live tweets with the <a href="http://search.twitter.com/search?q=%23ssm2010" target="_blank">#ssm2010</a> hashtag all attest to the success of this event.</p>
<p>The workshop opened with a warm welcome from <a href="http://www.csee.umbc.edu/~ian/" target="_blank">Ian Soboroff</a> (NIST), immediately followed by a keynote from <a href="http://www.jopedersen.com/jopedersen/Home.html" target="_blank">Jan Pedersen</a>, Chief Scientist of Bing Search. Jan established a clear business case for search in social media: the opportunity to deliver content that is fresh, local, and under-served by general web search. He drilled into particular types of content where social media search is most useful: expert opinions, breaking news, and tail content. The benefits of social media search include trust and personal interaction (as compared to web content that is often soulless and of uncertain provenance), low latency (though perhaps at the cost of accuracy), and access to niche or ephemeral information that web search rarely surfaces. But delivering social media results to searchers creates its own variety of challenges, such as weighing freshness against accuracy and relevance, coping with loss of social content&#8217;s conversational context, managing low update latency when search engines have not been optimized for it, and fighting new kinds of spam. Despite these challenges, it is clear that the major web search engines have embraced the brave new world of real-time social content.</p>
<p><a href="http://www.mathcs.emory.edu/~eugene/" target="_blank">Eugene Agichtein</a> (Emory University) then moderated a panel representing the world&#8217;s leading search engines: <a href="http://www.google.com/profiles/jhylton" target="_blank">Jeremy Hylton</a> (Google), <a href="http://datamining.typepad.com/" target="_blank">Matthew Hurst</a> (Microsoft), <a href="http://research.yahoo.com/user/78" target="_blank">Sihem Amer-Yahia</a> (Yahoo!), and <a href="http://ir.baidu.com/phoenix.zhtml?c=188488&amp;p=irol-govBio&amp;ID=161381" target="_blank">William Chang</a> (Baidu). Jeremy justified the universal interface approach, pointing out that users don&#8217;t want to have to figure out what kind of search site to use for their queries, and that they expect a familiar interface. He also noted that Google has made great strides on update latency: it can index the Twitter firehose in the same amount of time as serving a query. Matthew offered various analyses of the social search problem, based on whether the information signal resides in content (e.g., web) or attention (e.g., Twitter), or whether the information need is expressed in an explicit search query or inferred from the user&#8217;s context. Sihem offered a counter-point to Jeremy, arguing that social media search queries often represent broad or vague information needs, and thus call for a more browsing-oriented interface than web search, which is optimized for highly specific needs. William noted that the biggest competitive threat he sees for web search engines comes from social media players&#8211;and he credits much of Baidu&#8217;s success to its surfacing of social media content.</p>
<p>Then came a flurry of questions, perhaps the most interesting of which was how to address identity management. William argued that people prefer interacting with real-named (or pseudonymous) people to whom they are directly connected. Sihem offered the counter-example of obtaining recommendations through community aggregation. Matthew noted the incongruity of there being no economic relationship between social network companies that maintain proprietary social graphs and people whose identities and relationships those graphs represent. Jeremy pointed out that users benefit if the data is as open as possible.</p>
<p>Given the almost even split between academic and industry participation in the workshop, the panelists were also asked to present research challenges to academia. Jeremy posed the problem of determining when social media results are actually true. Matthew wants to see more interdisciplinary work between computer scientists and social scientists. Sihem offered two challenge problems: scalable community discovery and evaluation of collaborative recommendation systems. William wants to see a rigorous axiomatization of social media search behavior.</p>
<p>After lunch, <a href="http://www.fxpal.com/?p=jeremy" target="_blank">Jeremy Pickens</a> (FXPAL) moderated a panel representing social media / networking companies: <a href="http://www.hilarymason.com/" target="_blank">Hilary Mason</a> (bit.ly), <a href="http://www.linkedin.com/in/igorperisic" target="_blank">Igor Perisic</a> (LinkedIn), and <a href="http://www.myspace.com/myspacedave" target="_blank">David Hendi</a> (MySpace). Hilary noted that, while bit.ly does not have access to an explicit social graph, it captures implicit connections from user behavior that may not be represented in the graph. Jeremy asked the panelists how much a person&#8217;s extended network matters; David and Igor pointed out research indicating correlations of mood and even medical conditions between people and their third-degree connections. Again, the audience was full of questions, especially for Igor. As a fan of <a href="http://en.wikipedia.org/wiki/Faceted_search" target="_blank">faceted search</a>, I was glad to see him touting LinkedIn&#8217;s success in making faceted search the primary means of performing people search on the site. For an in-depth view, I recommend &#8220;<a href="http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/" target="_blank">LinkedIn Search: A Look Beneath the Hood</a>&#8221;.</p>
<p>The afternoon continued with a poster / demo session emphasizing work in progress: tools, interfaces, research studies, and position papers. I particularly enjoyed listening to the stream of interaction between academic researchers and industry practitioners.</p>
<p>The final panel session assembled academic researchers to discuss their views of the challenges in social media. <a href="http://www.fxpal.com/?p=gene" target="_blank">Gene Golovchinsky</a> (FXPAL) moderated a panel composed of <a href="http://knoesis.wright.edu/researchers/meena/homepage/" target="_blank">Meena Nagarajan</a> (Wright State University), <a href="http://www.lehigh.edu/~lih307/" target="_blank">Liangjie Hong</a> (Lehigh University), <a href="http://www.dcs.gla.ac.uk/~richardm/" target="_blank">Richard McCreadie</a> (University of Glasgow), <a href="http://www.cs.cmu.edu/~jelsas/" target="_blank">Jonathan Elsas</a> (CMU), and <a href="http://comminfo.rutgers.edu/~mor/" target="_blank">Mor Naaman</a> (Rutgers University). Meena highlighted the need to build up meta-data to describe the context around social utterances. Liangjie took a position similar to William Chang&#8217;s, calling for a framework to model the tasks and behavior of users who interact with social media. Richard focused on the intersection of social media and news search, and noted that some of the most useful information is private and proprietary (e.g., search and chat logs). Jonathan offered a variety of challenges: determining the right retrieval granularity, managing multiple axes of organization, aggregating author behavior, and multidimensional indexing of social media content. Finally, Mor noted that we&#8217;re moving from a world of email to a &#8220;social awareness stream&#8221;, in which we direct content at a group and have lower expectations of readership than with email. As with all of the panels, there were countless questions from the moderator and audience, particularly about determining the truthfulness of social media content and delivering social content in an effective user interface.</p>
<p>The final session was a full-group discussion that dived into the various topics addressed throughout the day. But Gene Golovchinsky provided the &#8220;one more thing&#8221; at the end, showing us a glimpse of a faceted search interface to explore a Twitter stream. It was an elegant finish to a day filled with informative and engaging discussion, and I look forward to seeing many of the participants in the WSDM conference over the next few days.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Blogging SSM 2010 and WSDM 2010</title>
		<link>http://thenoisychannel.com/2010/02/03/blogging-ssm-2010-and-wsdm-2010/</link>
		<comments>http://thenoisychannel.com/2010/02/03/blogging-ssm-2010-and-wsdm-2010/#comments</comments>
		<pubDate>Wed, 03 Feb 2010 05:07:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2932</guid>
		<description><![CDATA[I&#8217;m delighted to report that I&#8217;ll be blogging about the Search and Social Media Workshop (SSM 2010) and the Web Search and Data Mining Conference (WSDM 2010) for Communications of the ACM.
Of course, I&#8217;ll cross-post here. I also encourage folks to follow the live tweet streams at #ssm2010 and #wsdm2010, as well as Gene and Jeremy&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m delighted to report that I&#8217;ll be blogging about the Search and Social Media Workshop (<a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>) and the Web Search and Data Mining Conference (<a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>) for <a href="http://cacm.acm.org/blogs/blog-cacm/">Communications of the ACM</a>.</p>
<p>Of course, I&#8217;ll cross-post here. I also encourage folks to follow the live tweet streams at <a href="http://search.twitter.com/search?q=%23ssm2010">#ssm2010</a> and <a href="http://search.twitter.com/search?q=%23wsdm2010">#wsdm2010</a>, as well as Gene and Jeremy&#8217;s posts at the <a href="http://palblog.fxpal.com/?tag=ssm2010">FXPAL blog</a>.</p>
<p>To those attending: see you all tomorrow through Saturday! To everyone else: I will try my best to communicate the substance and spirit of the conference.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/02/03/blogging-ssm-2010-and-wsdm-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: Search Facets</title>
		<link>http://thenoisychannel.com/2010/01/31/blogs-i-read-search-facets/</link>
		<comments>http://thenoisychannel.com/2010/01/31/blogs-i-read-search-facets/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 19:54:18 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2928</guid>
		<description><![CDATA[A couple of years ago, I started The Noisy Channel as a personal blog. Since my then-employer Endeca didn&#8217;t have a corporate blog, I became the company&#8217;s ambassador to the blogosphere, despite my protests that this was not a corporate blog.
But I&#8217;m pleased to report that Endeca now has its own blog, aptly entitled [...]]]></description>
			<content:encoded><![CDATA[<p>A couple of years ago, I started The Noisy Channel as a personal blog. Since my then-employer <a href="http://endeca.com/">Endeca</a> didn&#8217;t have a corporate blog, I became the company&#8217;s ambassador to the blogosphere, despite my protests that this was <a href="http://thenoisychannel.com/2008/12/10/this-is-not-a-corporate-blog/">not a corporate blog</a>.</p>
<p>But I&#8217;m pleased to report that Endeca now has its own blog, aptly entitled <a href="http://facets.endeca.com/">Search Facets</a>. I&#8217;m not usually a fan of corporate blogs, but I like the approach Endeca is taking to this one. The folks who have posted so far are Adam Ferrari (CTO), Vladimir Zelevinsky (Research Scientist), and Pete Bell (Co-Founder)&#8211;an indication that the blog will contain substance, rather than warmed-over press releases.</p>
<p>Indeed, the posts so far are nice and meaty. I particularly like Adam&#8217;s post about &#8220;<a href="http://facets.endeca.com/2010/01/vertical-stores-for-vertical-web-search/">Vertical stores for vertical web search?</a>&#8221;&#8211;it&#8217;s nice to read intelligent analysis from someone who understands the strengths of both <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> and <a href="http://en.wikipedia.org/wiki/Column-oriented_DBMS">column-oriented relational databases</a>.</p>
<p>Anyway, I&#8217;m delighted that my former co-workers have taken to the blogosphere, and I look forward to reading what they have to say!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/31/blogs-i-read-search-facets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LinkedIn Search: A Look Beneath the Hood</title>
		<link>http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/</link>
		<comments>http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/#comments</comments>
		<pubDate>Sun, 31 Jan 2010 18:22:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2924</guid>
		<description><![CDATA[
Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn introducing faceted search and celebrating the interface from a user perspective. John&#8217;s presentation at the SDForum took a developer&#8217;s perspective, discussing the challenges of combining faceted search and [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://docs.google.com/present/embed?id=d7qvbkn_28cgpvm96r" frameborder="0" width="410" height="342"></iframe></p>
<p>Last week, I had the good fortune to attend a presentation by <a href="http://www.linkedin.com/in/javasoze">John Wang</a>, search architect at <a href="http://linkedin.com/">LinkedIn</a>. You may have read my <a href="http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/">earlier posts</a> about LinkedIn introducing <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> and celebrating the interface from a user perspective. John&#8217;s presentation at the <a href="http://www.sdforum.org/index.cfm?fuseaction=Calendar.eventDetail&amp;eventID=13601">SDForum</a> took a developer&#8217;s perspective, discussing the challenges of combining faceted search and social networking at scale.</p>
<p>John was kind enough to publish his slides, and I&#8217;ve embedded them above. Unfortunately, there&#8217;s no recording of the extensive Q&#038;A (which included various attempts to get John to reveal the precise details of LinkedIn&#8217;s data volume), but the slides are quite meaty.</p>
<p>Personally, I learned two surprising things from the talk.</p>
<p>First, I was surprised that LinkedIn dismisses index/cache warming as &#8220;cheating&#8221;, instead computing almost everything in real time. Specifically, I would have expected LinkedIn to cache information like a user&#8217;s set of degree-two connections: these are expensive to compute at query time, especially when the <a href="http://en.wikipedia.org/wiki/Social_graph">social graph</a> is distributed and <a href="http://en.wikipedia.org/wiki/Shard_%28database_architecture%29">sharded</a> by user. I did ask John whether LinkedIn recomputes a user&#8217;s degree-two network during a session, and he admitted that LinkedIn is sensible enough to &#8220;cheat&#8221; and not perform this expensive but almost useless re-computation.</p>
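<p>To illustrate why this computation is worth caching for the length of a session, here is a minimal Python sketch. It is my own illustration and not LinkedIn&#8217;s implementation: the <code>GRAPH</code> data, the function names, and the session identifier are all hypothetical, and a plain in-memory dictionary stands in for a graph that would really be sharded across machines.</p>

```python
from functools import lru_cache

# Hypothetical adjacency data; in a sharded graph each lookup in the
# loop below could be a call to a different shard.
GRAPH = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "eve"},
    "dan": {"bob"},
    "eve": {"cat"},
}


def degree_two(user):
    """Friends of friends, excluding the user and direct connections."""
    direct = GRAPH.get(user, set())
    result = set()
    for friend in direct:  # one (possibly remote) lookup per connection
        result |= GRAPH.get(friend, set())
    return frozenset(result - direct - {user})


@lru_cache(maxsize=1024)
def degree_two_for_session(session_id, user):
    """Compute once per (session, user) pair, then reuse the result."""
    return degree_two(user)


print(sorted(degree_two_for_session("s1", "ann")))  # ['dan', 'eve']
```

<p>Keying the cache on the session means a new session triggers a fresh computation, while repeated queries within a session reuse the earlier result.</p>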
<p>Second, I learned about <a href="http://www.linkedin.com/rs?trk=msitesearch">reference search</a>, a feature I may have missed because it is only available for premium LinkedIn accounts. It&#8217;s a nice feature, allowing you to search against company + date range pairs. People who are familiar with implementing faceted search may recognize the preservation of such associations between facet values as a gnarly implementation challenge.</p>
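<p>To make that challenge concrete, here is a minimal Python sketch. It is my own illustration and not how LinkedIn implements reference search: the <code>Position</code> record, the sample profile, and both function names are hypothetical. It contrasts treating company and date range as independent multi-valued facets with matching them as associated pairs.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    company: str
    start_year: int
    end_year: int


# Hypothetical profile: two positions at different companies.
profile = [
    Position("Acme", 2001, 2004),
    Position("Globex", 2005, 2009),
]


def matches_flattened(positions, company, year):
    """Treat company and years as independent multi-valued facets."""
    companies = {p.company for p in positions}
    years = {y for p in positions for y in range(p.start_year, p.end_year + 1)}
    return company in companies and year in years


def matches_paired(positions, company, year):
    """Require company and year to match within the same position."""
    return any(
        p.company == company and year in range(p.start_year, p.end_year + 1)
        for p in positions
    )


# Was this person at Acme in 2007? Flattening produces a false positive.
print(matches_flattened(profile, "Acme", 2007))  # True
print(matches_paired(profile, "Acme", 2007))     # False
```

<p>The flattened version is what you get when each facet is indexed as a separate bag of values per record; keeping the pairs intact requires the index to remember which values belong together.</p>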
<p>All in all, it was a treat to get this look under the hood, as well as to finally meet John in person. I also ran into <a href="http://palblog.fxpal.com/?p=2806">Gene Golovchinsky</a> there&#8211;so much for my spending a few days on the west coast incognito!</p>
<p>In any case, I&#8217;m looking forward to seeing Gene, some of John&#8217;s colleagues, and many more interesting people at the Search and Social Media Workshop (<a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>) on Wednesday. My apologies to those who aren&#8217;t able to attend this oversubscribed event. I promise to blog about it!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Workshop on Search and Social Media (SSM 2010)</title>
		<link>http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/</link>
		<comments>http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 14:24:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2918</guid>
		<description><![CDATA[The 3rd Annual Workshop on Search in Social Media (SSM 2010) will be held on Wednesday, February 3rd at the Polytechnic Institute in Brooklyn, NY. It&#8217;s co-located with the WSDM 2010 conference on Web Search and Data Mining. As a co-organizer, I&#8217;m proud to announce that the workshop program is now online.
It features a keynote [...]]]></description>
			<content:encoded><![CDATA[<p>The 3rd Annual Workshop on Search in Social Media (<a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>) will be held on Wednesday, February 3rd at the Polytechnic Institute in Brooklyn, NY. It&#8217;s co-located with the <a href="http://www.wsdm2010.org/">WSDM 2010</a> conference on Web Search and Data Mining. As a co-organizer, I&#8217;m proud to announce that the <a href="http://ir.mathcs.emory.edu/SSM2010/program.html#schedule">workshop program</a> is now online.</p>
<p>It features a keynote from <a href="http://www.jopedersen.com/jopedersen/Home.html">Jan Pedersen</a>, Chief Scientist for Core Search at Microsoft, as well as an impressive set of posters and panel sessions. Other participants include:</p>
<ul>
<li>Sihem Amer-Yahia, Yahoo!</li>
<li>Jon Elsas, CMU</li>
<li>Gene Golovchinsky, FXPAL</li>
<li>David Hendi, MySpace</li>
<li>LiangJie Hong, Lehigh U.</li>
<li>Jeremy Hylton, Google</li>
<li>Matthew Hurst, Microsoft</li>
<li>Hilary Mason, bit.ly</li>
<li>Richard McCreadie, U. of Glasgow</li>
<li>Mor Naaman, Rutgers U.</li>
<li>Meena Nagarajan, Wright State U.</li>
<li>Igor Perisic, LinkedIn</li>
<li>Jeremy Pickens, FXPAL</li>
</ul>
<p>There&#8217;s still time to <a href="http://www.wsdm-conference.org/2010/register.html">register</a> if you&#8217;re interested!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Real Time Search Is Personal</title>
		<link>http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/</link>
		<comments>http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/#comments</comments>
		<pubDate>Mon, 18 Jan 2010 19:42:13 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2911</guid>
		<description><![CDATA[The other day, I promised in a comment thread that I&#8217;d write about what I see as real use cases for real-time search. As it happens, I&#8217;m experiencing one right now.
As my wife, daughter, and I were walking home from a playground, we noticed a large number of fire trucks congregating a block away from [...]]]></description>
			<content:encoded><![CDATA[<p>The other day, I promised in a <a href="http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/#comments">comment thread</a> that I&#8217;d write about what I see as real use cases for real-time search. As it happens, I&#8217;m experiencing one right now.</p>
<p>As my wife, daughter, and I were walking home from a playground, we noticed a large number of fire trucks congregating a block away from our house. A quick search on Twitter <a href="http://search.twitter.com/search?q=+near%3A11201+within%3A15mi+explosion">explained</a> what was going on, particularly by pointing us to this <a href="http://gothamist.com/2010/01/18/buildings_and_subway_stations_in_do.php">post</a> on Gothamist&#8211;which as of this writing seems to be the only reporting about this incident.</p>
<p>I think this example tells us a lot about the utility of real-time search. Most of us don&#8217;t need real-time search to tell us about the <a href="http://news.google.com/news/search?q=haiti">news in Haiti</a>, since a critical mass of major news providers is covering the story around the clock. Where real-time search matters most is at the personal level&#8211;specifically, when our personal urgency to obtain information is higher than that of the general population. In such situations, we&#8217;re willing to accept less polished&#8211;and even risk less accurate&#8211;information, particularly if the alternative is to wait until news providers cover the story, if they ever do. At least to some extent, urgency trumps authority.</p>
<p>Yes, there are other use cases for conversational media like Facebook and Twitter, such as sharing the experience of watching a live event, or simply chatting with friends and strangers about arbitrary topics. But I wouldn&#8217;t consider such use of these media to be search. Real-time search, in my view, is about helping users obtain the latest information available&#8211;in accordance with their personal needs. Twitter and <a href="http://www.google.com/search?&amp;output=search&amp;q=brooklyn%20heights%20explosion&amp;tbs=rltm:1">Google</a> served me well today, and I&#8217;m grateful that real-time search gave me real-time peace of mind.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>When Is Faceted Search Appropriate?</title>
		<link>http://thenoisychannel.com/2010/01/15/when-is-faceted-search-appropriate/</link>
		<comments>http://thenoisychannel.com/2010/01/15/when-is-faceted-search-appropriate/#comments</comments>
		<pubDate>Fri, 15 Jan 2010 06:31:27 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2900</guid>
		<description><![CDATA[

Earlier this week, Peter Morville and Mark Burrell presented a UIE virtual seminar on &#8220;Leveraging Search &#38; Discovery Patterns For Great Online Experiences&#8221;. It sold out! And I thought Pete Bell and I had done well with our seminar on faceted search!
But I&#8217;m hardly surprised. Although I wasn&#8217;t able to attend it myself, I gather [...]]]></description>
			<content:encoded><![CDATA[
<div id="__ss_2692450" style="width: 425px; text-align: left;"><object style="margin: 0px;" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uiedesignpatternstrailermerged-091210133302-phpapp01&amp;stripped_title=search-discovery-patterns-a-uie-virtual-seminar" /><param name="allowfullscreen" value="true" /><embed style="margin: 0px;" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=uiedesignpatternstrailermerged-091210133302-phpapp01&amp;stripped_title=search-discovery-patterns-a-uie-virtual-seminar" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>Earlier this week, <a href="http://www.findability.org/">Peter Morville</a> and Mark Burrell presented a <a href="http://uie.com/">UIE</a> virtual seminar on &#8220;<a href="http://www.uie.com/events/virtual_seminars/search_patterns/">Leveraging Search &amp; Discovery Patterns For Great Online Experiences</a>&#8221;. It <a href="http://facets.endeca.com/2010/01/how-to-sell-out-a-virtual-seminar/">sold out</a>! And I thought Pete Bell and I had done well with our <a href="http://www.uie.com/events/virtual_seminars/facets/">seminar</a> on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>!</p>
<p>But I&#8217;m hardly surprised. Although I wasn&#8217;t able to attend it myself, I gather from <a href="http://search.twitter.com/search?q=%23uievs">Twitter</a> and the <a href="http://strottrot.com/2010/01/14/looking-forward-to-interaction10/">blogosphere</a> that it was a great presentation. I enjoyed serving as a reviewer for Peter&#8217;s new book on <a href="http://searchpatterns.org/">Search Patterns</a>, and I contributed a bit to Endeca&#8217;s <a href="http://www.endeca.com/resource-center-ui-pattern-library.htm">UI Design Pattern Library</a> while I was there and Mark&#8217;s team was developing it.</p>
<p>In reading reactions to the seminar, I was particularly intrigued by a post entitled &#8220;<a href="http://livlab.com/thinkia/2010/01/search-and-browse/">Search and Browse</a>&#8221; by Livia Labate on her fantastically named blog, &#8220;<a href="http://livlab.com/thinkia/">I think, therefore IA</a>&#8221;. She raised a question that I think needs to be asked more often: when is (or isn&#8217;t) faceted search appropriate?</p>
<p>Her conversation with readers in a comment thread offered some possible answers:</p>
<ul>
<li>Faceted search helps users who think in terms of attribute specifications as filtering criteria.</li>
<li>Faceted search supports search by exclusion, as opposed to by discovery.</li>
<li>Faceted search requires a set of useful facets that is neither too small nor too large.</li>
</ul>
<p>I&#8217;d like to propose my own answers. Here are the conditions under which I see faceted search being most useful:</p>
<ul>
<li>Faceted search supports <a href="http://thenoisychannel.com/2008/06/24/what-is-not-exploratory-search/">exploratory</a> use cases, in contrast to <a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item search</a>. For known-item search, users are better served by a search box to specify an item by name, or a non-faceted hierarchy to locate it. In contrast, faceted search optimizes for cases where users are unsure either of what they want or of how to specify it.</li>
<li>Faceted search helps users who need or want to learn about the search space as they execute the search process. Facets educate users about different ways to characterize items in a collection. If users do not need or want this education, they may be frustrated by an interface that makes them do more work.</li>
<li>The search space is classified using accurate, understandable facets that relate to the users&#8217; information needs. As I&#8217;ve discussed before, <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/">data quality is often the bottleneck in designing search interfaces</a>. Offering users facets that are either unreliable or unrelated to their needs is worse than providing no facets at all.</li>
</ul>
<p>Given the above criteria, it&#8217;s not surprising that faceted search has been a huge success in online retail: shopping is often an exploratory learning experience, and retailers tend to have good data.</p>
<p>But the success of faceted search in retail overshadows other domains where faceted search may be even more valuable. My favorite example is faceted people search, most recently demonstrated by <a href="http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/">LinkedIn</a>. I would love to see other entities (locations, businesses, etc.) receive similar treatment, at least in contexts where exploration is a common use case.</p>
<p>I think Livia is right to be skeptical about any interface that introduces complexity&#8211;and facets do introduce complexity. I hope that my guidelines help answer her question as to when that complexity is worthwhile and perhaps even necessary to help users satisfy their information needs.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/15/when-is-faceted-search-appropriate/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Can You &#8220;Near Me Now&#8221;?</title>
		<link>http://thenoisychannel.com/2010/01/09/can-you-near-me-now/</link>
		<comments>http://thenoisychannel.com/2010/01/09/can-you-near-me-now/#comments</comments>
		<pubDate>Sat, 09 Jan 2010 05:12:58 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2896</guid>
		<description><![CDATA[
Weren&#8217;t we just talking about what&#8217;s different about mobile search use cases and about how to make web search more exploratory? I may be biased, but I think that Google&#8217;s recently launched &#8220;near me now&#8221; button is a step in the right direction (no pun intended!) on both of these fronts.
I&#8217;m curious to hear unbiased [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="240" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/ETbTqjjzDLg&amp;hl=en_US&amp;fs=1&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="240" src="http://www.youtube.com/v/ETbTqjjzDLg&amp;hl=en_US&amp;fs=1&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Weren&#8217;t we <a href="http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/">just talking</a> about what&#8217;s different about mobile search use cases and about how to make web search more exploratory? I may be biased, but I think that Google&#8217;s recently launched &#8220;<a href="http://googlemobile.blogspot.com/2010/01/finding-places-near-me-now-is-easier.html">near me now</a>&#8221; button is a step in the right direction (no pun intended!) on both of these fronts.</p>
<p>I&#8217;m curious to hear unbiased feedback from iPhone and Android users who have gotten to play with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/09/can-you-near-me-now/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Search Questions for 2010: What&#8217;s On My Mind</title>
		<link>http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/</link>
		<comments>http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/#comments</comments>
		<pubDate>Sun, 03 Jan 2010 23:09:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2891</guid>
		<description><![CDATA[Happy New Year to the Noisy Community and everyone else in virtual earshot! I hope everyone is entering 2010 well-rested and ready for great things. And I don&#8217;t just mean shiny new gadgets.
For me, 2009 marked the end of a decade-long run at Endeca, where I focused on bringing HCIR to enterprises. I&#8217;m particularly proud [...]]]></description>
			<content:encoded><![CDATA[<p>Happy New Year to the Noisy Community and everyone else in virtual earshot! I hope everyone is entering 2010 well-rested and ready for great things. And I don&#8217;t just mean shiny <a href="http://en.wikipedia.org/wiki/Nexus_One">new</a> <a href="http://en.wikipedia.org/wiki/ISlate">gadgets</a>.</p>
<p>For me, 2009 marked the end of a decade-long run at <a href="http://endeca.com/">Endeca</a>, where I focused on bringing <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> to enterprises. I&#8217;m particularly proud of two professional accomplishments: writing a <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">book</a> on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, and organizing the <a href="http://sigir2009.org/">SIGIR 2009</a> <a href="http://sigir2009.org/Program/industry">Industry Track</a>.</p>
<p>But past is prologue. I spent the last several weeks of 2009 as a <a href="http://www.flickr.com/photos/albill/429691222/">Noogler</a>, and I launch into 2010 living and breathing search on the open web.</p>
<p>What&#8217;s on my mind? Here are some top-of-mind questions to which I hope to have better answers by this time next year:</p>
<ul>
<li><strong>Exploratory Search</strong>: how should we determine that users want a more <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> experience, rather than one that minimizes time to a best-effort result? How should we respond to queries that clearly don&#8217;t have a single best answer, such as queries of the form [category] or [category location]?</li>
</ul>
<ul>
<li><strong>Mobile Search</strong>: should it be just like non-mobile search with a few tweaks to accommodate the device form factor? Or does / should mobile search fundamentally change the way we interact with information?</li>
</ul>
<ul>
<li><strong>Real-Time Search</strong>: is it more than real-time indexing plus emphasizing recency as a query-independent relevance factor? What are the use cases, and how should we be addressing them?</li>
</ul>
<ul>
<li><strong>Social / Collaborative Search</strong>: should we be looking to <a href="http://en.wikipedia.org/wiki/Microblogging">microblogging</a> or other social media signals to augment (or even supplant!) link-based citations as authority cues? Should we be supporting mediated search by linking people to people, rather than directly to information?</li>
</ul>
<ul>
<li><strong>Transparency</strong>: is it possible to offer more <a href="http://thenoisychannel.com/2008/04/08/qa-with-amit-singhal/">transparency in relevance ranking</a> without losing ground in the battle against spam and black-hat SEO?</li>
</ul>
<p>To be clear, these are simply the questions that are on my mind&#8211;I&#8217;m speaking as an individual and not as a Google employee. That said, a great thing about being at Google is that there are people working on all of these areas. So I expect 2010 to be an exciting year!</p>
<p>Curious to hear what problems are on other people&#8217;s minds as we enter the new year. Comment away!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Forget Real-Time, Give Us Over Time!</title>
		<link>http://thenoisychannel.com/2009/12/30/forget-real-time-give-us-over-time/</link>
		<comments>http://thenoisychannel.com/2009/12/30/forget-real-time-give-us-over-time/#comments</comments>
		<pubDate>Wed, 30 Dec 2009 14:56:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2883</guid>
		<description><![CDATA[In a recent announcement, Twitter Platform / API Product Manager Ryan Sarver tells us that Twitter is:
committed to providing a framework for any company big or small, rich or poor to do a deal with us to get access to the Firehose in the same way we did deals with Google and Microsoft. We want everyone [...]]]></description>
			<content:encoded><![CDATA[<p>In a recent <a href="https://groups.google.com/group/twitter-development-talk/browse_thread/thread/a1076d83d70d0450?pli=1">announcement</a>, Twitter Platform / API Product Manager <a href="http://sarver.org/about/">Ryan Sarver</a> tells us that Twitter is:</p>
<blockquote><p>committed to providing a framework for any company big or small, rich or poor to do a deal with us to get access to the Firehose in the same way we did deals with Google and Microsoft. We want everyone to have the opportunity &#8212; terms will vary based on a number of variables but we want a two-person startup in a  garage to have the same opportunity to build great things with the full feed that someone with a billion dollar market cap does. There are still a lot of details to be fleshed out and communicated, but this a top priority for us and we look forward to what types of companies and products get built on top of this unique and rich stream.</p></blockquote>
<p>That and some other details, like raising the API rate limit from 150 requests per hour to 1500, may well bring on what Marshall Kirkpatrick of ReadWriteWeb calls &#8220;<a href="http://www.readwriteweb.com/archives/twitter_20_api_rate_change_could_lead_to_a_world_o.php">Twitter 2.0</a>&#8221;. But it was something else in Kirkpatrick&#8217;s write-up that caught my attention&#8211;this quote from <a href="http://wow.ly/">Wow.ly</a> co-founder Kevin Marshall:</p>
<blockquote><p>The more I do with and around social data, the less interested I seem to become in &#8216;realtime&#8217; and the more interested I become in &#8216;over time.&#8217; When I first started hacking on Twitter (and Facebook) apps, I was in love with the idea of parsing and analyzing data in real-time and I was very link/content focused. But the more I build and use these tools, the more I see the value in the history and the trails of the data set.</p></blockquote>
<p>I couldn&#8217;t have said it better! Not that I haven&#8217;t tried: if you look back at my post about <a href="http://thenoisychannel.com/2009/05/27/topsy-tippling-the-stream-of-conversations/">Topsy</a>, you&#8217;ll see where real-time and over time meet. Recency matters, but the signal is far too sparse without some way to aggregate and analyze over time.</p>
<p>I&#8217;m thrilled that Twitter plans to open up its platform in a way that could enable analysis over semantic, social, and temporal dimensions. Now I&#8217;m curious to see what that access will look like, and what everyone who has been clamoring for that access will do with it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/30/forget-real-time-give-us-over-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Faceted Web Search?</title>
		<link>http://thenoisychannel.com/2009/12/27/faceted-web-search/</link>
		<comments>http://thenoisychannel.com/2009/12/27/faceted-web-search/#comments</comments>
		<pubDate>Sun, 27 Dec 2009 21:18:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2878</guid>
		<description><![CDATA[Researchers from Microsoft say it&#8217;s very challenging. Google is trying, but there&#8217;s a long way to go. And Eric Iverson just wrote me to describe his own preliminary efforts to build faceted search on top of Yahoo! BOSS.
I believe there&#8217;s a clearly established business case for faceted search inside the enterprise, for site search (especially [...]]]></description>
			<content:encoded><![CDATA[<p>Researchers from Microsoft say it&#8217;s <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">very challenging</a>. Google is <a href="http://www.google.com/squared">trying</a>, but there&#8217;s a long way to go. And <a href="http://www.linkedin.com/in/newledge">Eric Iverson</a> just wrote me to describe his own preliminary efforts to build faceted search on top of <a href="http://developer.yahoo.com/search/boss/">Yahoo! BOSS</a>.</p>
<p>I believe there&#8217;s a clearly established business case for <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> inside the <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise</a>, for site search (especially for retail and media / publishing sites), and even for <a href="http://en.wikipedia.org/wiki/Vertical_search">vertical search</a> on the open web. In all of these cases, relevance-ranked results are insufficient to meet a large subset of users&#8217; more <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory</a> information needs, and <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a> approaches like faceted search are an easy sell.</p>
<p>But it seems much harder to make this case for general web search. The track record of startups in this space isn&#8217;t very encouraging. That could be because no one has done it right, but Clayton Christensen&#8217;s theory of <a href="http://en.wikipedia.org/wiki/Disruptive_technology">disruptive innovation</a> would suggest that a successful entrant wouldn&#8217;t have to have parity across the board, but would simply need to win on an underserved market segment. Perhaps the increasing use of faceted search for vertical search is how this process is playing out, and faceted search for general web search may end up being a slow agglomeration of verticals.</p>
<p>I&#8217;m curious if others have been pursuing efforts like Eric&#8217;s. Are the available APIs powerful enough to prototype your own faceted web search engine? If they aren&#8217;t, then is this a potential business opportunity for one of the major (or non-major) search engines to promote innovation by offering an <a href="http://googlepublicpolicy.blogspot.com/2009/12/meaning-of-open.html">open system</a>? Or, if Yahoo! BOSS already offers such an open system, what should we make of the scale of its impact?</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/27/faceted-web-search/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>R.I.P. Modista</title>
		<link>http://thenoisychannel.com/2009/12/26/r-i-p-modista/</link>
		<comments>http://thenoisychannel.com/2009/12/26/r-i-p-modista/#comments</comments>
		<pubDate>Sat, 26 Dec 2009 21:03:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2873</guid>
		<description><![CDATA[
Long-time readers may recall my November 2008 post about visual search startup Modista, or this guest post by one of its principals. Unfortunately, the story has a sad ending. I hope that both this technology and its developers find a good home.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://modista.com/"><img class="alignnone size-full wp-image-2874" title="Modista RIP" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/12/Modista-RIP.png" alt="" width="235" height="236" /></a></p>
<p>Long-time readers may recall my November 2008 <a href="http://thenoisychannel.com/2008/11/05/modista-similarity-browsingfor-shoes/">post</a> about visual search startup <a href="http://modista.com/">Modista</a>, or this <a href="http://thenoisychannel.com/2009/04/10/guest-post-exploring-visual-similarity-with-modista/">guest post</a> by one of its principals. Unfortunately, the story has a <a href="http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/">sad ending</a>. I hope that both this technology and its developers find a good home.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/26/r-i-p-modista/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Recovering From Being Hacked</title>
		<link>http://thenoisychannel.com/2009/12/24/recovering-from-being-hacked/</link>
		<comments>http://thenoisychannel.com/2009/12/24/recovering-from-being-hacked/#comments</comments>
		<pubDate>Thu, 24 Dec 2009 22:48:59 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2866</guid>
		<description><![CDATA[I discovered today that I&#8217;d been hacked earlier this week by a spam link injection attack. I&#8217;m still not sure how it happened, but I believe I&#8217;ve cleaned out all of the offending PHP from my WordPress installation. I&#8217;ve also removed most of my plug-ins in the process, and I may have broken some things [...]]]></description>
			<content:encoded><![CDATA[<p>I discovered today that I&#8217;d been hacked earlier this week by a spam link injection attack. I&#8217;m still not sure how it happened, but I believe I&#8217;ve cleaned out all of the offending PHP from my WordPress installation. I&#8217;ve also removed most of my plug-ins in the process, and I may have broken some things in my zeal to clean up the site. My apologies for any inconvenience, and my thanks to <a href="http://twitter.com/awaisathar">@awaisathar</a> and <a href="http://twitter.com/gsingers">@gsingers</a> for helping me resolve this quickly.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/24/recovering-from-being-hacked/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: UXmatters</title>
		<link>http://thenoisychannel.com/2009/12/20/blogs-i-read-uxmatters/</link>
		<comments>http://thenoisychannel.com/2009/12/20/blogs-i-read-uxmatters/#comments</comments>
		<pubDate>Mon, 21 Dec 2009 00:19:03 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2858</guid>
		<description><![CDATA[According to Wikipedia, user experience is &#8220;the overarching experience a person has as a result of their interactions with a particular product or service, its delivery, and related artifacts, according to their design.&#8221; While I&#8217;ve never labeled myself a designer, I have always cared deeply about user experience, even back before my information retrieval days, [...]]]></description>
			<content:encoded><![CDATA[<p>According to <a href="http://en.wikipedia.org/wiki/User_experience_design">Wikipedia</a>, user experience is &#8220;the overarching experience a person has as a result of their interactions with a particular product or service, its delivery, and related artifacts, according to their design.&#8221; While I&#8217;ve never labeled myself a designer, I have always cared deeply about user experience, even back before my <a href="http://en.wikipedia.org/w/index.php?title=Information_retrieval">information retrieval</a> days, when I was working on <a href="http://en.wikipedia.org/wiki/Graph_drawing">graph drawing</a>. Indeed, user experience is the defining problem for <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>.</p>
<p>One of my favorite resources for learning about user experience is the <a href="http://uxmatters.com/index.php">UXmatters</a> blog. This group blog boasts a set of <a href="http://uxmatters.com/authors/">authors</a> that represent a diverse collection of industry practitioners (and <a href="http://uxmatters.com/authors/archives/2005/12/david_heller_malouf.php">one academic</a>) and offer concrete case studies and recommendations.</p>
<p>For example, in &#8220;<a href="http://www.uxmatters.com/mt/archives/2009/09/best-practices-for-designing-faceted-search-filters.php">Best Practices for Designing Faceted Search Filters</a>&#8221;, Greg Nudelman offers a constructive critique of the <a href="http://www.officedepot.com/">Office Depot</a> search user interface. Some of his material will be familiar to those who have read my <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">faceted search book</a> (particularly the chapter on <a href="http://www.uie.com/events/virtual_seminars/facets/Faceted%20Search%20-%20Chapter%207.pdf">front-end concerns</a>), but the focus on a single example makes for a compelling read. I also liked Greg&#8217;s most recent post, entitled &#8220;<a href="http://www.uxmatters.com/mt/archives/2009/12/cameras-music-and-mattresses-designing-query-disambiguation-solutions-for-the-real-world.php">Cameras, Music, and Mattresses: Designing Query Disambiguation Solutions for the Real World</a>&#8221;. I was amused that he and I use the same &#8220;<em>canon</em>ical&#8221; example for the need to offer <a href="http://thenoisychannel.com/2008/06/02/clarification-vs-refinement/">clarification before refinement</a>. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Here are a few more posts from other authors to give you a taste for the blog:</p>
<ul>
<li>&#8220;<a href="http://www.uxmatters.com/mt/archives/2009/11/first-do-no-harm.php">First, Do No Harm</a>&#8221; by Pabini Gabriel-Petit</li>
<li>&#8220;<a href="http://www.uxmatters.com/mt/archives/2007/11/the-five-competencies-of-user-experience-design.php">The Five Competencies of User Experience Design</a>&#8221; by Steve Psomas</li>
<li>&#8220;<a href="http://www.uxmatters.com/mt/archives/2009/01/beyond-usability-designing-web-sites-for-persuasion-emotion-and-trust.php">Beyond Usability: Designing Web Sites for Persuasion, Emotion, and Trust</a>&#8221; by Eric Schaffer</li>
</ul>
<p>If you are a user experience professional, in name or in deed, then you should be reading the <a href="http://uxmatters.com/index.php">UXmatters</a> blog &#8212; or perhaps even <a href="http://www.uxmatters.com/aboutus/writing-for-uxmatters.php">contributing</a> to it. Of course, you&#8217;re always welcome to contribute a <a href="http://thenoisychannel.com/category/guest-post/">guest post</a> here too.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/20/blogs-i-read-uxmatters/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>LinkedIn Faceted Search Now Out Of Beta</title>
		<link>http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/</link>
		<comments>http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/#comments</comments>
		<pubDate>Tue, 15 Dec 2009 04:20:25 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2853</guid>
		<description><![CDATA[
LinkedIn started rolling out a beta version of faceted people search back in July. Now it&#8217;s officially out of beta, as announced on their blog. I&#8217;ve re-posted the video above in case you missed it in July.
Interestingly, LinkedIn developed its own tool to support the combination of faceted search with social network search: Bobo-Browse (Otis [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="243" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/unLo7maOgT4&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="243" src="http://www.youtube.com/v/unLo7maOgT4&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>LinkedIn started rolling out a beta version of <a href="http://thenoisychannel.com/2009/07/15/linkedin-rolling-out-faceted-search/">faceted people search</a> back in July. Now it&#8217;s officially out of beta, as <a href="http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/">announced on their blog</a>. I&#8217;ve re-posted the video above in case you missed it in July.</p>
<p>Interestingly, LinkedIn developed its own tool to support the combination of <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> with social network search: <a href="http://code.google.com/p/bobo-browse/">Bobo-Browse</a> (Otis mentioned it in our recent <a href="http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/">presentation</a> to the New York CTO Club). I helped develop similar functionality when I was at <a href="http://endeca.com/">Endeca</a>, so I know how hard this problem is. LinkedIn has done an impressive job&#8211;and has applied it to one of the most valuable data sets on the web. Bravo!</p>
<p>But I can&#8217;t help asking for just one more thing. LinkedIn has great semi-structured data about its 50+ million members. I&#8217;d love to be able to explore that data using more facets&#8211;in particular, facets relating to people&#8217;s job skills and expertise. I hope that&#8217;s something they&#8217;re working on. Perhaps a good topic of conversation at the upcoming <a href="http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/">Workshop on Search in Social Media</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Karaoke: A Hotbed for Micro-IR?</title>
		<link>http://thenoisychannel.com/2009/12/13/karaoke-a-hotbed-for-micro-ir/</link>
		<comments>http://thenoisychannel.com/2009/12/13/karaoke-a-hotbed-for-micro-ir/#comments</comments>
		<pubDate>Sun, 13 Dec 2009 22:08:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2847</guid>
		<description><![CDATA[I&#8217;m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I&#8217;m sure I&#8217;m not the only person who encounters this micro-IR problem, and it occurred to me that there [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a karaoke junkie and proud to admit it. But one of the challenges I regularly face, especially when I go to an unfamiliar karaoke joint, is finding a song I know well enough to sing. I&#8217;m sure I&#8217;m not the only person who encounters this <a href="http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/">micro-IR</a> problem, and it occurred to me that there might be better technical solutions to it.</p>
<p>Most karaoke venues provide printed song books, typically sorted by title and by artist. This approach is certainly adequate for very limited selections, but it doesn&#8217;t scale gracefully. Indeed, one of my favorite karaoke bars, the <a href="http://www.courtsidekaraoke.com/">Courtside</a> in Cambridge, MA, has a fantastic song selection that is only accessible through printed books. Kinda frustrating for a search guy, even though the <a href="http://www.courtsidekaraoke.com/asktheshark.htm">staff</a> is very helpful!</p>
<p>My regular karaoke venue in New York, <a href="http://www.2ndon2nd.com/">Second on Second</a>, is a bit more technologically advanced: it provides computers with dedicated software that allows patrons to search through their song catalog. Aside from being faster than thumbing through books, the software makes it possible to find songs when you only remember words that are in the middle of song or artist names.</p>
<p>But even such a system only addresses <a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item search</a>&#8211;in this case, looking for a song or artist by name when you know precisely what you are looking for. There&#8217;s room for incremental improvement here, e.g., searching for songs based on the lyrics you remember. For example, many people remember a famous David Bowie song based on its protagonist &#8220;<a href="http://www.google.com/search?q=major+tom">Major Tom</a>&#8221; rather than its title &#8220;<a href="http://en.wikipedia.org/wiki/Space_Oddity">Space Oddity</a>&#8221;; fortunately, tools like Google&#8217;s <a href="http://googleblog.blogspot.com/2009/10/making-search-more-musical.html">music search</a> are happy to make such connections.</p>
<p>But none of the karaoke search technology I&#8217;ve seen to date supports <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploration</a>. Specifically, I&#8217;d love to go into a karaoke bar and have a procedure for finding songs I know that is better than trial and error. For example, I&#8217;d like to be able to see my options for hard rock 80s songs with male vocals. Or to find out which <a href="http://en.wikipedia.org/wiki/Downtempo">downtempo</a> bands, if any, are on the menu. A little <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> would go a long way towards making the song-finding experience more pleasant and efficient.</p>
<p>But why stop there? I&#8217;d really like a system that suggests songs based on what it knows about me. For example, knowing that I like to sing <a href="http://www.pandora.com/music/artist/scorpions">Scorpions</a> songs is a reasonable basis to suggest similar artists like <a href="http://www.pandora.com/music/artist/def+leppard">Def Leppard</a> and <a href="http://www.pandora.com/music/artist/guns+n+roses">Guns N&#8217; Roses</a>. Or perhaps to suggest 80s songs in general&#8211;after all, <a href="http://www.karaokeholics.com/home.cfm?dir_cat=17160">karaoke roulette</a> notwithstanding, most people sing songs they know (or at least think they know), and their song knowledge tends to have some temporal locality. I&#8217;m sure you can imagine far more sophisticated personalization&#8211;and such personalization could be accomplished with complete <a href="http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/">transparency</a> to the user.</p>
<p>Even if you aren&#8217;t into karaoke (and yet have managed to read this far!), I hope you can appreciate the universality of the information needs I&#8217;m describing. <a href="http://en.wikipedia.org/wiki/Exploratory_search">Exploratory search</a> is everywhere. But I think it&#8217;s easiest to demonstrate its practical importance by working through concrete use cases. As an <a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval">HCIR</a> advocate, I&#8217;ve repeatedly learned the lesson that such demonstrations are critical in order to successfully evangelize this worldview.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/13/karaoke-a-hotbed-for-micro-ir/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Faceted Search Presentation at New York CTO Club</title>
		<link>http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/</link>
		<comments>http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 14:53:08 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2843</guid>
		<description><![CDATA[
Otis Gospodnetic and I recently gave a talk at the New York CTO Club on faceted search. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my book or attended the UIE virtual seminar [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_2690072" style="width: 425px; text-align: left;"><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=facetedsearchnyctotalk-091210081555-phpapp01&amp;stripped_title=faceted-search-nycto-talk" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=facetedsearchnyctotalk-091210081555-phpapp01&amp;stripped_title=faceted-search-nycto-talk" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p><a href="http://www.jroller.com/otis/entry/faceted_search_by_daniel_tunkelang">Otis Gospodnetic</a> and I recently gave a talk at the New York CTO Club on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>. The club is a group of senior technologists who meet monthly in midtown Manhattan to host breakfast presentations and to share ideas and expertise. Those of you who have read my <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">book</a> or attended the <a href="http://www.uie.com/events/virtual_seminars/facets/">UIE virtual seminar</a> a few months ago that I gave with Pete Bell (whom I worked with for 10 years at <a href="http://endeca.com/">Endeca</a>) might recognize some of my material. Otis focused on the specifics of implementing faceted search using the open-source <a href="http://lucene.apache.org/solr/">Solr</a> platform.</p>
<p>Here were the major take-aways:</p>
<ul>
<li>Think about what users are trying to do, not just how they search.</li>
<li>Facets get polluted with bad result sets, so offer <a href="http://thenoisychannel.com/2008/06/02/clarification-vs-refinement/">clarification before refinement</a>.</li>
<li>Don&#8217;t just move the information overload problem to the facets! Show less, not more.</li>
<li>Look at the potential data facets you already have; you will be surprised.</li>
<li>Facets can come from new data, e.g. sentiment.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/10/faceted-search-presentation-at-new-york-cto-club/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: Living La Vida Local</title>
		<link>http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/</link>
		<comments>http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 21:44:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2836</guid>
		<description><![CDATA[My new role at Google (yes, it still feels new after not quite a month!) has given me a professional interest in local search. I&#8217;ve adjusted my reading materials accordingly, and I&#8217;ve started reading blogs that focus on local. Here are a handful that I&#8217;ve discovered so far:

BIA / Kelsey Blog

By The Kelsey Group, a [...]]]></description>
			<content:encoded><![CDATA[<p>My new role at Google (yes, it still feels new after not quite a month!) has given me a professional interest in <a href="http://en.wikipedia.org/wiki/Local_search_%28Internet%29">local search</a>. I&#8217;ve adjusted my reading materials accordingly, and I&#8217;ve started reading blogs that focus on local. Here are a handful that I&#8217;ve discovered so far:</p>
<ul>
<li><a href="http://blog.kelseygroup.com/">BIA / Kelsey Blog</a>
<ul>
<li>By <a href="http://kelseygroup.com/">The Kelsey Group</a>, a division of <a href="http://www.bia.com/" target="_blank">BIA Advisory Services</a> that provides data and analysis on directories and local media.</li>
</ul>
</li>
<li><a href="http://blog.telemapics.com/">Exploring Local</a>
<ul>
<li> By <a href="http://www.glgroup.com/Council-Member/Michael-Dobson-178033.html">Mike Dobson</a>, President of <a href="http://telemapics.com/">TeleMapics</a>, a company that provides consulting services focused on local search.</li>
</ul>
</li>
<li><a href="http://www.localseoguide.com/">Local SEO Guide</a>
<ul>
<li><a href="http://www.localseoguide.com/about-me?PHPSESSID=7509db638c808d8ac60e49cc596c99fc">Andrew Shotland</a>&#8217;s blog on local search optimization, small business marketing &amp; search engine optimization strategy.</li>
</ul>
</li>
<li><a href="http://www.localsearchdatabase.com/">Localsearchdatabase</a>
<ul>
<li>By <a href="http://twitter.com/golander59">Gib Olander</a>, Director of Business Development for <a href="http://www.localeze.com/">Localeze</a>, an online content management company serving businesses, local search engines and consumers.</li>
</ul>
</li>
<li><a href="http://www.davidmihm.com/blog/">Mihmorandum</a>
<ul>
<li><a href="http://www.davidmihm.com/">David Mihm</a>&#8217;s blog on local search engine optimization and marketing.</li>
</ul>
</li>
<li><a href="http://gesterling.wordpress.com/">Screenwerk</a>
<ul>
<li><a href="http://gesterling.wordpress.com/about/">Greg Sterling</a>&#8217;s thoughts on online and offline media. Sterling used to run The Kelsey Group’s Interactive Local Media program.</li>
</ul>
</li>
<li><a href="http://www.solaswebdesign.net/wordpress/">SEO Igloo Blog</a>
<ul>
<li>By <a href="http://www.solaswebdesign.net/">Solas Web Design</a>, which specializes in web design and search engine optimization for small businesses.</li>
</ul>
</li>
<li><a href="http://blumenthals.com/blog/">Understanding Google Maps &amp; Local Search</a>
<ul>
<li>By <a href="http://www.blumenthals.com/index.php?MikeBlumenthal">Mike Blumenthal</a>, whose company offers consulting services and market research advice relating to maps and local search.</li>
</ul>
</li>
</ul>
<p>Not surprisingly, these blogs offer me a critical perspective on how Google and other search engines serve the local space. Granted, everyone has their own motives&#8211;and it&#8217;s hard to avoid some tension in a space with the competitive dynamics of local search. But now that I&#8217;m no longer an outsider myself, I appreciate having others to help keep me honest as I work to make local search better for users and businesses.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Search User Interfaces and Data Quality</title>
		<link>http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/</link>
		<comments>http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 04:49:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2831</guid>
		<description><![CDATA[One of the many things I&#8217;ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about HCIR. Indeed, some of the folks working on &#8220;more and better search refinements&#8221; are just steps away from my desk. Very cool!
But working [...]]]></description>
			<content:encoded><![CDATA[<p>One of the many things I&#8217;ve enjoyed in my first few weeks of working at Google is the opportunity to talk with many people who care about user interfaces and think about <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>. Indeed, some of the folks working on &#8220;<a href="http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html">more and better search refinements</a>&#8221; are just steps away from my desk. Very cool!</p>
<p>But working on the inside has also helped me appreciate what <a href="http://www.linkedin.com/in/bobwyman">Bob Wyman</a> tried to <a href="http://thenoisychannel.com/2009/02/05/what-would-google-do-what-does-google-do/">tell me</a> months ago&#8211;that Google has no philosophical predilection towards black box approaches, but rather is only limited by what technology makes possible and what its engineers can implement. I&#8217;d qualify that slightly by saying that I perceive an additional constraint: Google does have a strong predilection towards data-driven decisions. Some folks have found that approach <a href="http://stopdesign.com/archive/2009/03/20/goodbye-google.html">objectionable</a> in the context of interface design.</p>
<p>Anyway, if you&#8217;re a regular here, then you&#8217;re probably predisposed towards HCIR and <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>. In that case, I&#8217;d like to take a moment to help you appreciate the challenge I face on a day-to-day basis.</p>
<p>Which one of these two statements do you most agree with?</p>
<ol>
<li>We need better data quality in order to support richer search user interfaces.</li>
<li>Richer search user interfaces allow us to overcome data quality limitations.</li>
</ol>
<p>On one hand, consider two search engines whose interfaces are designed to support exploratory search: <a href="http://www.cuil.com/">Cuil</a> and <a href="http://www.kosmix.com/">Kosmix</a>. Sometimes they&#8217;re great, e.g., [<a href="http://www.cuil.com/search?q=michael+jackson">michael jackson</a>] on Cuil and [<a href="http://www.kosmix.com/topic/iraq">iraq</a>] on Kosmix. But look what can happen for queries that are further out in the tail, e.g., [<a href="http://www.cuil.com/search?q=faceted+search">faceted search</a>] on Cuil and [<a href="http://www.kosmix.com/topic/real_time_search">real time search</a>] on Kosmix. Yes, the kinds of queries I make. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  I don&#8217;t mean to knock these guys&#8211;they&#8217;re trying, and their efforts are admirable. Moreover, both generally return respectable search results on the first pages (in Kosmix&#8217;s case, through federation). But the search refinements can be way off, and that undermines the overall experience. I strongly suspect that the problem is one of data quality, along the lines of what <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">others have argued</a>.</p>
<p>On the other hand, some of the work that I did with colleagues at Endeca (e.g., work presented at <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2008/">HCIR 2008</a> on &#8220;Supporting Exploratory Search for the ACM Digital Library&#8221;) at least dangles the possibility that the second statement holds&#8211;namely, a richer user interface could help overcome data quality limitations. Interaction draws more of the information need out of the user, and the process may be able to mask imperfection in the data. For example, it&#8217;s clear to users&#8211;and clear from the search refinements&#8211;that [<a href="http://www.google.com/search?q=michael+jackson+beer">michael jackson beer</a>] and [<a href="http://www.google.com/search?q=michael+jackson+-beer">michael jackson -beer</a>] are about different people. If we can just get that incremental information from the user, we don&#8217;t have to achieve perfection in named entity recognition and disambiguation.</p>
<p>I think there&#8217;s some truth in both arguments. Data quality is a major bottleneck for effectively delivering an exploratory search experience, and data quantity, <a href="http://thenoisychannel.com/2009/03/31/the-unreasonable-effectiveness-of-data/">much as it helps</a>, is not a guarantee of quality. Richer interfaces offer the enticing possibility of leveraging <a href="http://en.wikipedia.org/wiki/Human-based_computation">human computation</a>, but they also introduce the risk of disappointing and alienating users. Even for an HCIR zealot like me, the constraints of reality are sobering.</p>
<p>And yes, speed and computational cost matter too. But hey, it wouldn&#8217;t be a <a href="http://thenoisychannel.com/2008/04/06/nick-belkin-at-ecir-08/">grand challenge</a> if it were easy!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/feed/</wfw:commentRss>
		<slash:comments>27</slash:comments>
		</item>
		<item>
		<title>Fun with Google, Bing, and Yahoo</title>
		<link>http://thenoisychannel.com/2009/11/29/fun-with-google-bing-and-yahoo/</link>
		<comments>http://thenoisychannel.com/2009/11/29/fun-with-google-bing-and-yahoo/#comments</comments>
		<pubDate>Sun, 29 Nov 2009 19:45:06 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2809</guid>
		<description><![CDATA[Web search is a fiercely competitive space&#8211;as Google points out, &#8220;competition is just one click away&#8221;. In practice, I take that claim with a grain of salt&#8211;but I do think the switching costs are much lower than in most competitive markets. With that in mind, it&#8217;s interesting to look at what happens if you search [...]]]></description>
			<content:encoded><![CDATA[<p>Web search is a fiercely competitive space&#8211;as Google points out, &#8220;<a href="http://googlepublicpolicy.blogspot.com/2009/05/googles-approach-to-competition.html">competition is just one click away</a>&#8221;. In practice, I take that claim with a grain of salt&#8211;but I do think the switching costs are much lower than in most competitive markets. With that in mind, it&#8217;s interesting to look at what happens if you search for the name of one of the major search engines on one of its competitors&#8217; sites.</p>
<p>Google returns standard results for such searches:</p>
<p><a href="http://www.google.com/search?q=bing"><img class="alignnone size-full wp-image-2814" title="[bing] on Google" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/google-bing2.png" alt="[bing] on Google" width="414" height="180" /></a></p>
<p><a href="http://www.google.com/search?q=yahoo"><img class="alignnone size-full wp-image-2815" title="[yahoo] on Google" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/google-yahoo1.png" alt="[yahoo] on Google" width="414" height="266" /></a></p>
<p>Bing is generous to a fault, saving you a click if you choose to use one of its leading competitors:</p>
<p><a href="http://www.bing.com/search?q=google"><img class="alignnone size-full wp-image-2818" title="[google] on Bing" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/bing-google.png" alt="[google] on Bing" width="426" height="247" /></a></p>
<p><a href="http://www.bing.com/search?q=yahoo"><img class="alignnone size-full wp-image-2819" title="[yahoo] on Bing" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/bing-yahoo.png" alt="[yahoo] on Bing" width="426" height="262" /></a></p>
<p>Finally Yahoo, whose CEO claims &#8220;<a href="http://bits.blogs.nytimes.com/2009/08/07/yahoo-ceo-we-have-never-been-a-search-company/">we have never been a search company</a>,&#8221; seems quite eager to keep searchers from going elsewhere:</p>
<p><a href="http://search.yahoo.com/search?p=bing"><img class="alignnone size-full wp-image-2826" title="[bing] on Yahoo" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/yahoo-bing1.png" alt="[bing] on Yahoo" width="515" height="276" /></a></p>
<p><a href="http://search.yahoo.com/search?p=google"><img class="alignnone size-full wp-image-2827" title="[google] on Yahoo" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/yahoo-google1.png" alt="[google] on Yahoo" width="515" height="246" /></a></p>
<p>It&#8217;s easy to dismiss these queries as corner cases, but the logs show that they really happen. And, as browsers increasingly blur the line between an address bar and a search box, it&#8217;s not unreasonable to consider that switches between search engines are likely to commence with such queries.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/29/fun-with-google-bing-and-yahoo/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Marti Hearst: Tech Talk on Search User Interfaces</title>
		<link>http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/</link>
		<comments>http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/#comments</comments>
		<pubDate>Wed, 25 Nov 2009 23:50:48 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2805</guid>
		<description><![CDATA[
Earlier this week, Marti Hearst gave a Tech Talk at Google about her recently published book, Search User Interfaces. Fortunately for those of us who missed it (myself included!), it is now available on YouTube. Enjoy! (via Jon Elsas)
]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/BpBAg4Ndi9w&amp;hl=en_US&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/BpBAg4Ndi9w&amp;hl=en_US&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Earlier this week, <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a> gave a Tech Talk at Google about her recently published book, <a href="http://searchuserinterfaces.com/"><em>Search User Interfaces</em></a>. Fortunately for those of us who missed it (myself included!), it is now available on <a href="http://www.youtube.com/watch?v=BpBAg4Ndi9w">YouTube</a>. Enjoy! (via <a href="http://windowoffice.tumblr.com/post/257316365/search-user-interfaces-the-movie">Jon Elsas</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/25/marti-hearst-tech-talk-on-search-user-interfaces/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Can We Learn From Anti-Social Users?</title>
		<link>http://thenoisychannel.com/2009/11/21/can-we-learn-from-anti-social-users/</link>
		<comments>http://thenoisychannel.com/2009/11/21/can-we-learn-from-anti-social-users/#comments</comments>
		<pubDate>Sat, 21 Nov 2009 21:54:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2798</guid>
		<description><![CDATA[One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise&#8211;be they links, re-tweets, or any number of other ways that online consumers (or more correctly &#8220;prosumers&#8221;) actively and passively [...]]]></description>
			<content:encoded><![CDATA[<p>One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise&#8211;be they links, re-tweets, or any number of other ways that online consumers (or more correctly &#8220;prosumers&#8221;) actively and passively communicate value judgments about information. On the other hand, our reliance on these social signals makes us vulnerable to positive feedback and spammers.</p>
<p>Consider <a href="http://www.princeton.edu/~mjs3/musiclab.shtml">MusicLab</a>, an &#8220;<a href="http://www.princeton.edu/%7Emjs3/salganik_watts08.pdf" target="_blank">experimental study of self-fulfilling prophecies in an artificial cultural market</a>&#8221;. In this study, sociologists <a href="http://www.princeton.edu/~mjs3/index.shtml">Matt Salganik</a>, <a href="http://www.uvm.edu/~pdodds/home.html">Peter Dodds</a>, and <a href="http://en.wikipedia.org/wiki/Duncan_J._Watts">Duncan Watts</a> manipulated the social information available to consumers (specifically teens) regarding their peers&#8217; musical tastes. The experimenters&#8217; goal was to empirically validate a quantitative model of social contagion.</p>
<p>But we can look at this study another way: by isolating the social factors that influence musical taste, the experimenters were also isolating the non-social signal&#8211;in theory, how popular a song would be in the absence of social signaling. Indeed, they found that, if they measured a song&#8217;s quality by isolating out the social factor, &#8220;the best songs never do very badly, and the worst songs never do extremely well, but almost any other result is possible&#8221;.</p>
<p>It&#8217;s interesting&#8211;interesting to me, at least!&#8211;to ask if search engines can do the same for search. One of the frequent objections to link-based authority measures like <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a> is that they make the rich get richer. &#8220;Real-time&#8221; variants like re-tweet frequency (and even <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a>) suffer from the same weakness. Unchecked, these measures can cause the authority / influence market to resemble a <a href="http://ingrimayne.com/econ/resouceProblems/WinnerTakeIt.html">winner-take-all</a> market.</p>
<p>It strikes me as interesting to learn from cases where searchers swim upstream against the social signals to find information. Of course, you may already see the contradiction&#8211;this is just another kind of social signaling! Still, it seems like it might be a way to hedge our bets against the weaknesses of positive feedback and spammers. In a similar vein, we might look at how users find information that suffers from poor <a href="http://thenoisychannel.com/2008/04/22/accessibility-in-information-retrieval/">accessibility</a> or <a href="http://thenoisychannel.com/2009/09/26/information-retrievability/">retrievability</a>.</p>
<p>I don&#8217;t have answers about how to pursue such an approach, or whether it would even be feasible to do so. But I hope you agree with me that it&#8217;s an interesting question.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/21/can-we-learn-from-anti-social-users/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Exploring Exploratory Search</title>
		<link>http://thenoisychannel.com/2009/11/18/exploring-explortatory-search/</link>
		<comments>http://thenoisychannel.com/2009/11/18/exploring-explortatory-search/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 13:44:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2792</guid>
		<description><![CDATA[
Google&#8217;s recently released Image Swirl is slick. But I&#8217;ve been struggling to figure out whether it&#8217;s useful or simply a showcase for cool technology.
And that&#8217;s prompted me to think about the overloaded term &#8220;exploratory search&#8221;. A while back, I tried to define exploratory search based on what it is not. This time, let me aim [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://googleblog.blogspot.com/2009/11/explore-images-with-google-image-swirl.html"><img class="aligncenter" title="Google Image Swirl" src="http://2.bp.blogspot.com/_7ZYqYi4xigk/SwLfp7ciT2I/AAAAAAAAE8Y/NpojWXrDCb0/s1600/washington+monument.png" alt="" width="377" height="237" /></a></p>
<p>Google&#8217;s recently released <a href="http://googleblog.blogspot.com/2009/11/explore-images-with-google-image-swirl.html">Image Swirl</a> is slick. But I&#8217;ve been struggling to figure out whether it&#8217;s useful or simply a showcase for cool technology.</p>
<p>And that&#8217;s prompted me to think about the overloaded term &#8220;<a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>&#8221;. A while back, I tried to define exploratory search based on <a href="http://thenoisychannel.com/2008/06/24/what-is-not-exploratory-search/">what it is not</a>. This time, let me aim to positively characterize what I see as its two primary use cases:</p>
<ol>
<li>I know what I want, but I don&#8217;t know how to describe it.</li>
<li>I don&#8217;t know what I want, but I hope to figure it out once I see what&#8217;s out there.</li>
</ol>
<p>The first use case cries out for tools that support query refinement or elaboration. Existing tools span a range from suggesting spelling corrections (aka &#8220;did you mean&#8221;) to offering semantically or statistically related searches that hopefully provide the user with at least a step in the right direction. One of my favorite approaches, <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, is primarily used to support query refinement through progressive narrowing of an initial search query.</p>
<p>The second &#8220;I don&#8217;t know what I want&#8221; use case is fuzzier. In the language of <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a>, this use case is <a href="http://en.wikipedia.org/wiki/Unsupervised_learning">unsupervised</a>, while the previous one is <a href="http://en.wikipedia.org/wiki/Supervised_learning">supervised</a>. In general, it&#8217;s a lot harder to define or evaluate outcomes for unsupervised scenarios. Indeed, <a href="http://nlpers.blogspot.com/2006/04/unsupervised-learning-why.html">Hal Daume has argued</a> that we should only do unsupervised learning if we do not have a trustworthy automatic evaluation metric. That&#8217;s a strong position, and you can see some of the counterarguments in his comment thread. But, going back to our scenario, it&#8217;s really hard to judge the effectiveness of tools like <a href="http://maroo.cs.umass.edu/pub/web/getpdf.php?id=614">similarity browsing</a> when they support exploration in the absence of any concrete goal.</p>
<p>With that in mind, I&#8217;ll reserve judgment on the utility of tools like Image Swirl. To the extent that it aims at the first use case, clustering images for a particular search, I&#8217;m ambivalent. I&#8217;d prefer a more transparent interface, in which I have more of a sense of control over the navigational experience. I suspect it is more aimed at the second use case, offering a compact visualization of what is out there.</p>
<p>Besides, as some folks have brought up at the <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> workshops, it&#8217;s important that we make information seeking fun. And Swirl certainly scores on that front.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/18/exploring-explortatory-search/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>An Ad-Supported Model With Teeth?</title>
		<link>http://thenoisychannel.com/2009/11/15/an-ad-supported-model-with-teeth/</link>
		<comments>http://thenoisychannel.com/2009/11/15/an-ad-supported-model-with-teeth/#comments</comments>
		<pubDate>Sun, 15 Nov 2009 13:29:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2788</guid>
		<description><![CDATA[A computer-implemented method for operating a device, the method      comprising:
disabling a function of an operating system in a      device;
presenting an advertisement in the device while the function is      disabled;
and enabling the function in response to the advertisement    [...]]]></description>
			<content:encoded><![CDATA[<p><em>A computer-implemented method for operating a device, the method      comprising:<br />
disabling a function of an operating system in a      device;<br />
presenting an advertisement in the device while the function is      disabled;<br />
and enabling the function in response to the advertisement      ending.</em></p>
<p>So reads the first claim from a <a href="http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&amp;Sect2=HITOFF&amp;d=PG01&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsrchnum.html&amp;r=1&amp;f=G&amp;l=50&amp;s1=%2220090265214%22.PGNR.&amp;OS=DN/20090265214&amp;RS=DN/20090265214">patent application</a> that Apple recently filed (with Steve Jobs as first inventor, no less!) for technology to deliver a rather compelling ad-supported business model. Or perhaps the better word is compulsory. You can read an analysis by Randall Stross in the <a href="http://www.nytimes.com/2009/11/15/business/15digi.html">New York Times</a>.</p>
<p>I agree with Stross that it&#8217;s hard to imagine Apple ever implementing the technology described by the patent application&#8211;indeed, Apple has been one of the few success stories for paid digital content models. That said, the approach does feel like at least one endpoint for the ad-supported model&#8211;it guarantees the advertisers the attention that they are paying for by subsidizing content or services.</p>
<p>The advertising business is a bit more top of mind for me, now that it pays my salary. Google&#8217;s approach, however, follows the aphorism that honey catches more flies than vinegar: it tries to target ads well enough that users want to click on them, rather than to simply endure them as a cost of subsidizing free services. Google&#8217;s revenue (and the popularity of <a href="http://en.wikipedia.org/wiki/Pay_per_click">PPC</a> models in general) is a testament to the success of this approach, my occasional <a href="http://thenoisychannel.com/2008/10/09/search-is-not-advertising/">rant</a> notwithstanding.</p>
<p>In general, the industry seems to have found a compromise in how aggressively to push ads at users. Users can safely ignore (or even block) sponsored links, but few people do.  Pre-roll ads on video sites (i.e., advertising before a video starts)  are more invasive, but a number of sites let users skip them. You can read why the YouTube folks are <a href="http://ytbizblog.blogspot.com/2009/11/skip-skip-skip-to-my-video.html">testing</a> this approach. Advertisers&#8211;or at least ad-supported services&#8211;seem to recognize that they can&#8217;t cross the line between pursuing users&#8217; attention and annoying users to the point of alienation.</p>
<p>Still, technology like that described in Apple&#8217;s patent application shows that it is possible for the ad-supported model to take a more aggressive approach. Part of me wonders if more aggressive ad-supported models would revitalize paid content models, as users would stop perceiving the former as free. But I suspect that the gentler ad-supported model is here to stay, and that it will continue to strive toward the point of optimal effectiveness.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/15/an-ad-supported-model-with-teeth/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Call for Speakers: Enterprise Search Summit 2010</title>
		<link>http://thenoisychannel.com/2009/11/13/call-for-speakers-enterprise-search-summit-2010/</link>
		<comments>http://thenoisychannel.com/2009/11/13/call-for-speakers-enterprise-search-summit-2010/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 23:07:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2780</guid>
		<description><![CDATA[I&#8217;m no longer in the enterprise search business, but I know that many readers here are. If you are one of those readers, then I strongly encourage you to consider participating in the Enterprise Search Summit, which will take place next May in New York. I presented there last year and enjoyed the opportunity to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m no longer in the <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise search</a> business, but I know that many readers here are. If you are one of those readers, then I strongly encourage you to consider participating in the <a href="http://www.enterprisesearchsummit.com/2010/">Enterprise Search Summit</a>, which will take place next May in New York. I presented there last year and enjoyed the opportunity to meet fellow presenters and attendees. You can read my <a href="http://thenoisychannel.com/2009/05/13/reprising-the-enterprise-search-summit/">recap</a> of the event.</p>
<p>The deadline for <a href="https://secure.infotoday.com/forms/default.aspx?form=ess2010speakers">proposal submission</a> is November 30th&#8211;you only have to submit a 250-word abstract.</p>
<p>Here is the <a href="http://www.enterprisesearchsummit.com/2010/CallForSpeakers.shtml">call for proposals</a>:</p>
<blockquote>
<p style="font-size: 12px;">We seek dynamic speakers who can talk knowledgeably about detailed aspects of how to implement and maximize search within an organization. Search can no longer be viewed as a stand-alone application. It is increasingly part of everything we do and has become the <em>de facto</em> gateway to information in the enterprise. This year’s Summit will examine the ways to leverage search tools, information architecture, classification, and other strategies and technologies to deliver meaningful results—not just in terms of information, but to the bottom line.</p>
<p style="font-size: 12px;">Ours is a well-informed, tech-savvy audience, so proposals should be specific and detailed. Consider topics such as:</p>
<ul style="font-size: 12px;">
<li>Integrating search into enterprise systems and workflow</li>
<li>Customizing your search solution/ Task-specific search</li>
<li>Compliance, records management, and eDiscovery with effective search</li>
<li>Migrating your search engine</li>
<li>Social search and social tagging strategies &amp; solutions</li>
<li>Search-enabled decision making</li>
<li>Business intelligence, data mining</li>
<li>Search as the gateway to enterprise information</li>
<li>Optimizing the interface and user experience</li>
<li>Navigational tools—context, facets, entity extraction, clustering, and visualization</li>
<li>Emerging trends, the future of search</li>
<li>Overcoming information overload</li>
<li>Categorization techniques</li>
<li>Semantic Search</li>
<li>Query Federation &amp; Federated Search</li>
<li>Enhancing an existing solution</li>
</ul>
<p style="font-size: 12px;">If you represent a company that has an enterprise search software product, your best bet to be on our program is to collaborate with a customer to submit a case study to be presented by them, following the guidelines above.</p>
</blockquote>
<p>If you need more information&#8211;or more time&#8211;I encourage you to reach out directly to <a href="mailto:michelle.manafy@infotoday.com">Michelle Manafy</a>, the conference chair.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/13/call-for-speakers-enterprise-search-summit-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Week 1 at Google: Information Overload!</title>
		<link>http://thenoisychannel.com/2009/11/13/week-1-at-google-information-overload/</link>
		<comments>http://thenoisychannel.com/2009/11/13/week-1-at-google-information-overload/#comments</comments>
		<pubDate>Fri, 13 Nov 2009 13:12:52 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2777</guid>
		<description><![CDATA[As you might imagine, it&#8217;s quite a switch to go from criticizing
Google from the outside to being on the inside. Jeff Jarvis, who was
gracious enough not to make fun of me in public, nonetheless admitted
to me privately that the news had made him chuckle.
As I finish my first week, I can sum up the experience in [...]]]></description>
			<content:encoded><![CDATA[<p>As you might imagine, it&#8217;s quite a switch to go from criticizing<br />
Google from the outside to being on the inside. <a href="http://www.buzzmachine.com/">Jeff Jarvis</a>, who was<br />
gracious enough not to make fun of me in public, nonetheless admitted<br />
to me privately that the news had made him chuckle.</p>
<p>As I finish my first week, I can sum up the experience in a word:<br />
overwhelming. The tools for accessing internal information are better<br />
than I expected, but both the volume of baseline knowledge&#8211;technical<br />
and cultural&#8211;and the relentlessness of the update stream are<br />
daunting.</p>
<p>Indeed, the internal ecosystem is so rich that it&#8217;s easy to forget<br />
there is a world outside it&#8211;ironic given Google&#8217;s enormous role in the world outside it! Then again, this is just my first week&#8211;it will take me<br />
some time to pop up the stack from the <a href="http://en.wikipedia.org/wiki/Build_system">build system</a> to the surface.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/13/week-1-at-google-information-overload/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>The Noisy Noogler: A Quick FAQ</title>
		<link>http://thenoisychannel.com/2009/11/10/the-noisy-noogler-a-quick-faq/</link>
		<comments>http://thenoisychannel.com/2009/11/10/the-noisy-noogler-a-quick-faq/#comments</comments>
		<pubDate>Tue, 10 Nov 2009 13:43:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2774</guid>
		<description><![CDATA[I&#8217;m barely 24 hours into my new life as a Googler, and I&#8217;ve already gotten lots of questions! Here are the answers to a few of them:
Will I continue blogging at The Noisy Channel?
Absolutely! I&#8217;m committed to posting at least weekly, and I&#8217;ll try to do better than that once I&#8217;m settled into my new [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m barely 24 hours into my new life as a Googler, and I&#8217;ve already gotten lots of questions! Here are the answers to a few of them:</p>
<p><strong>Will I continue blogging at The Noisy Channel?</strong></p>
<p>Absolutely! I&#8217;m committed to posting at least weekly, and I&#8217;ll try to do better than that once I&#8217;m settled into my new environment.</p>
<p><strong>Will I participate in scholarly conferences and workshops?</strong></p>
<p>Of course! I&#8217;m co-organizing <a href="http://ir.mathcs.emory.edu/SSM2010/">SSM 2010</a>, which will be held in conjunction with <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a> in February, and of course <a href="http://iiix2010.org/hcir.php">HCIR 2010</a>, which will be held in conjunction with <a href="http://iiix2010.org/">IIiX 2010</a> in August. You probably won&#8217;t see me at vendor fests, but I do hope to continue bringing industry practitioners and academic researchers together.</p>
<p><strong>Will I blog about Google?</strong></p>
<p>I certainly won&#8217;t disclose any confidential information&#8211;people get <a href="http://blogoscoped.com/archive/2005-02-08-n55.html">fired</a> for that&#8211;or worse. And, given how much access I will have to such information, I will err on the side of caution, only discussing information that I&#8217;m sure Google has released to the general public. Beyond that, I&#8217;ll exercise common sense. I don&#8217;t want either to come across as a shill for my employer or to spar with my new colleagues in public. Subject to those constraints, however, I can and will blog about Google.</p>
<p><strong>Can I get you a job at Google?</strong></p>
<p>I can advise you and connect you to a recruiter, but that&#8217;s the limit of my power. The hiring process here is specifically designed to prevent any individual from manipulating it&#8211;even me!</p>
<p><strong>Will I talk about what I&#8217;m working on?</strong></p>
<p>See above regarding confidential information. I&#8217;ll be delighted to talk about anything I&#8217;m working on that Google has decided to disclose publicly.</p>
<p><strong>Does Google know about my karaoke habit?</strong></p>
<p>Too late, they&#8217;ve already signed the offer letter. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/10/the-noisy-noogler-a-quick-faq/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Apologies for Slow Response Times</title>
		<link>http://thenoisychannel.com/2009/11/06/apologies-for-slow-response-times/</link>
		<comments>http://thenoisychannel.com/2009/11/06/apologies-for-slow-response-times/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 23:11:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2770</guid>
		<description><![CDATA[I am without my own laptop for a few days as I manage a transition between jobs. So I apologize in advance if I am slow to respond to email, comments, or tweets over the weekend. I&#8217;ll be back at full steam early next week.
]]></description>
			<content:encoded><![CDATA[<p>I am without my own laptop for a few days as I manage a <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">transition between jobs</a>. So I apologize in advance if I am slow to respond to email, comments, or tweets over the weekend. I&#8217;ll be back at full steam early next week.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/06/apologies-for-slow-response-times/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Going (to) Google</title>
		<link>http://thenoisychannel.com/2009/11/06/going-to-google/</link>
		<comments>http://thenoisychannel.com/2009/11/06/going-to-google/#comments</comments>
		<pubDate>Fri, 06 Nov 2009 21:39:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2763</guid>
		<description><![CDATA[
This is my last week at Endeca. The decision to leave has been a heart-wrenching one: not only have the past ten years been the best of my life, but my experiences at Endeca have defined me professionally. Moreover, Endeca is riding a wave of success with recent advances in our products, new relationships with [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://www.slideshare.net/dtunkelang/google-tech-talk-reconsidering-relevance-presentation"><img class="size-medium wp-image-2764 aligncenter" title="McGoogle" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/McGoogle-300x107.jpg" alt="McGoogle" width="300" height="107" /></a></p>
<p>This is my last week at <a href="http://endeca.com/">Endeca</a>. The decision to leave has been a heart-wrenching one: not only have the past ten years been the best of my life, but my experiences at Endeca have defined me professionally. Moreover, Endeca is riding a wave of success with recent advances in our products, new relationships with key partners, and fascinating new deployments.  (You can read Endeca’s latest announcements in our <a href="http://www.endeca.com/news-and-events-press-releases.htm">newsroom</a>).</p>
<p>Ironically, it is this very success that compels me to move on. In the past several years, I have developed an increasing passion for search on the open web&#8211;an interest only furthered by the explosion of social media.</p>
<p>That is why I&#8217;ve decided to accept an opportunity at Google&#8217;s New York office. Readers here know that I&#8217;ve been a very public critic of Google&#8217;s simplistic approach to user interaction on the open web. I&#8217;m being offered an opportunity to help fix that approach&#8211;and it is an offer I can&#8217;t refuse. My mission is to apply my passion for <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">human-computer information retrieval</a> (HCIR), an approach that Endeca has pioneered in the enterprise, to the world&#8217;s largest information problems&#8211;and where better to do that than at the company that aspires to organize the world&#8217;s information?</p>
<p>This moment is bittersweet: I am excited about the new experiences that await me, but I have a heavy heart as I turn in my badge and part with a world-class team that has succeeded against incredible odds.</p>
<p>Given my role and tenure at Endeca, I want to say explicitly that this move is about my personal ambition. My passion for web search and social media, which has grown exponentially over the past couple of years, simply doesn&#8217;t align with Endeca&#8217;s focus in the enterprise.</p>
<p>Also, I want to make clear: Google hired me because of my values, and not in spite of them. I know that some folks will find it difficult to reconcile my criticisms of Google with my decision to join. That&#8217;s why there&#8217;s an <a href="http://www.theonion.com/content/video/google_opt_out_feature_lets_users">opt-out village</a>! Seriously, though, I take my values with me. Google is offering me the opportunity to channel my passion for HCIR into action, on the world&#8217;s largest stage. I&#8217;m well aware of the magnitude of the challenge, but hey, I&#8217;m feeling lucky.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/06/going-to-google/feed/</wfw:commentRss>
		<slash:comments>62</slash:comments>
		</item>
		<item>
		<title>Twitter Lists as an Influence Measure?</title>
		<link>http://thenoisychannel.com/2009/11/01/twitter-lists-as-an-influence-measure/</link>
		<comments>http://thenoisychannel.com/2009/11/01/twitter-lists-as-an-influence-measure/#comments</comments>
		<pubDate>Sun, 01 Nov 2009 05:40:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2757</guid>
		<description><![CDATA[
In &#8220;Using Twitter Lists To Judge Influence&#8221;, Todd Zeigler of the Bivings Report writes:
I think Twitter Lists will end up helping separate the men from the boys when it comes to influence.  In addition to seeing a Twitter users follower count, we can now see the number of other Twitter users who have added them [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.amazon.com/Influence-Mary-Kate-Olsen/dp/159514210X"><img class="alignnone size-full wp-image-2758" title="Influence" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/11/influence.jpg" alt="Influence" width="179" height="220" /></a></p>
<p>In &#8220;<a href="http://www.bivingsreport.com/2009/using-twitter-lists-to-judge-influence/">Using Twitter Lists To Judge Influence</a>&#8221;, Todd Zeigler of the <a href="http://www.bivingsreport.com/">Bivings Report</a> writes:</p>
<blockquote><p>I think Twitter Lists will end up helping separate the men from the boys when it comes to influence.  In addition to seeing a Twitter users follower count, we can now see the number of other Twitter users who have added them to lists (example to the right).  I would argue that getting added to a list is a bigger deal than simply getting someone to follow you.</p></blockquote>
<p>I&#8217;m certainly intrigued by <a href="http://blog.twitter.com/2009/10/theres-list-for-that.html">Twitter Lists</a>, but I&#8217;m skeptical that counting how many lists someone is on will prove that much more useful than follower count. For example, <a href="http://twitter.com/dtunkelang">I</a> currently have <a href="http://twitter.com/dtunkelang/followers">1159 followers</a>, am on <a href="http://twitter.com/dtunkelang/lists/memberships">33 lists</a>, and have a <a href="http://twitter.com/dtunkelang/followers">TunkRank of 24.1</a>. For grins, here&#8217;s a handful of people who have similar stats:</p>
<ul>
<li><a href="http://twitter.com/kansandhaus">Evan Sandhaus</a>: 796 followers, 21 lists, TunkRank = 17.2</li>
<li><a href="http://twitter.com/jny2">Josh Young</a>: 801 followers, 25 lists, TunkRank = 14.3</li>
<li><span><a href="http://twitter.com/cjahearn">Chris Ahearn</a>: 1108 followers, 14 lists, TunkRank = </span>30.1</li>
<li><a href="http://twitter.com/brynn">Brynn Evans</a>: 1303 followers, 33 lists, TunkRank = 18.9</li>
<li><a href="http://twitter.com/eric_andersen">Eric Andersen</a>: 1543 followers, 37 lists, TunkRank = 3.1</li>
</ul>
<p>While I can&#8217;t generalize from a few arbitrarily selected data points (though Gladwell seems to have no trouble doing so in <a href="http://en.wikipedia.org/wiki/Outliers_%28book%29"><em>Outliers</em></a>), my suspicion is that list count will be highly correlated with follower count&#8211;and may actually be a noisier signal because the numbers are so much smaller.</p>
<p>Of course, there&#8217;s no reason we should use raw list counts&#8211;any more than we should use raw follower counts. Just as <a href="http://tunkrank.com/">TunkRank</a> aspires to <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">model attention scarcity</a> and recognizes that not all followers are created equal, an effective measure of how lists contribute to influence must recognize that not all list memberships are created equal either.</p>
<p>I&#8217;ve been chatting with <a href="http://twitter.com/chl">Chris Langreiter</a>, who is working on <a href="http://etherpad.com/HoPv2hJ4GB">enhancements to TunkRank</a> to address some of the oversimplifications of its model, as well as with <a href="http://twitter.com/jonathanglick">Jonathan Glick</a> and <a href="http://twitter.com/kenreisman">Ken Reisman</a> at <a href="http://www.tlists.com/">TLists</a>. I&#8217;d like to see online influence&#8211;on Twitter and in general&#8211;measured more effectively. It will be great if lists can help, but we can&#8217;t make the same naive mistakes as those who were quick to embrace <a href="http://thenoisychannel.com/2008/12/27/loic-le-meur-misses-the-point-of-twitter/">follower count as a measure of authority</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/11/01/twitter-lists-as-an-influence-measure/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Tuning in to Google Music Search</title>
		<link>http://thenoisychannel.com/2009/10/29/tuning-in-to-google-music-search/</link>
		<comments>http://thenoisychannel.com/2009/10/29/tuning-in-to-google-music-search/#comments</comments>
		<pubDate>Thu, 29 Oct 2009 17:09:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2751</guid>
		<description><![CDATA[
With all of the activity around e-books last week, you might think that the online world wasn&#8217;t paying attention to the media category most transformed by the Internet: music. But a week is a lifetime in the ADD-addled technology press, and today&#8217;s top story is that Google is &#8220;making search more musical&#8221;. From the official [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/DV24RBmy-2I&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/DV24RBmy-2I&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;hl=en&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>With all of the activity around <a href="http://thenoisychannel.com/2009/10/20/books-books-books/">e-books</a> last week, you might think that the online world wasn&#8217;t paying attention to the media category most transformed by the Internet: music. But a week is a lifetime in the <a href="http://en.wikipedia.org/wiki/Attention_deficit_disorder">ADD</a>-addled technology press, and today&#8217;s top story is that Google is &#8220;<a href="http://googleblog.blogspot.com/2009/10/making-search-more-musical.html">making search more musical</a>&#8221;. From the official blog post:</p>
<blockquote><p>Now, when you enter a music-related query — like the name of a song, artist or album — your search results will include links to an audio preview of those songs provided by our music search partners <a href="http://www.myspace.com/">MySpace</a> (which just acquired <a href="http://www.ilike.com/">iLike</a>) or <a href="http://www.lala.com/">Lala</a>. When you click the result you&#8217;ll be able to listen to an audio preview of the song directly from one of those partners.</p></blockquote>
<p>As with most Google features, this one is being rolled out gradually. If you&#8217;re impatient (like me), you can try it directly from <a href="http://www.google.com/landing/music/">this page</a>. Or you can watch the video above.</p>
<p>My first impression: this is a great feature to improve <a href="http://www.db.dk/bh/Core%20Concepts%20in%20LIS/articles%20a-z/known_item_search.htm">known-item search</a>, and it&#8217;s nice that they&#8217;ve partnered with folks that often let you hear whole songs, rather than 30-second snippets. The selection seems limited, but it could be that my tastes are a bit obscure. I&#8217;m curious if others share my sense that the catalog is much smaller than the ones on iTunes or Amazon.</p>
<p>But, as music IR specialist and fellow <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> advocate <a href="http://www.fxpal.com/?p=jeremy">Jeremy Pickens</a> points out, Google is &#8220;<a href="http://irgupf.com/2009/10/28/doing-to-music-what-they-did-to-the-web/">doing to music what they did to the web</a>&#8221;. I&#8217;m not as concerned as Jeremy is about the prospect of musical tastes being homogenized through the &#8220;rich get richer&#8221; effect of ranking&#8211;perhaps because we&#8217;re already there. Not only is pop music self-perpetuating (see this great <a href="http://www.princeton.edu/~mjs3/salganik_watts09.pdf">study</a> by my friend (and Princeton sociologist) <a href="http://www.princeton.edu/~mjs3/">Matt Salganik</a> and his former advisor <a href="http://en.wikipedia.org/wiki/Duncan_J._Watts">Duncan Watts</a>), but even <a href="http://thenoisychannel.com/2009/02/24/how-recommendation-engines-quash-diversity/">recommendation engines quash diversity</a>. Google really can&#8217;t make things that much worse.</p>
<p>Besides, much as Google&#8217;s default search leads many searchers to Wikipedia, a great starting point for <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>, the new music search leads users to <a href="http://www.pandora.com/">Pandora</a>, which <span style="text-decoration: line-through;">is probably the leading engine for exploratory music search</span> offers users a more exploratory user experience (though it would be great if they also linked to <a href="http://www.last.fm/">last.fm</a>) <strong><em>(thanks Jeremy!)</em></strong>. OK, maybe &#8220;leads&#8221; is a strong word for a &#8220;listen on&#8221; link below the search result, but it&#8217;s there for people in the know.</p>
<p>I&#8217;d love to see Google embrace HCIR. But I appreciate the improvements to known-item search too, especially if they can delegate the HCIR functionality to others that focus on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/29/tuning-in-to-google-music-search/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Ben Shneiderman&#8217;s HCIR 2009 Keynote: The Future of Information Discovery</title>
		<link>http://thenoisychannel.com/2009/10/27/ben-shneidermans-hcir-2009-keynote-the-future-of-information-discovery/</link>
		<comments>http://thenoisychannel.com/2009/10/27/ben-shneidermans-hcir-2009-keynote-the-future-of-information-discovery/#comments</comments>
		<pubDate>Tue, 27 Oct 2009 17:10:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2739</guid>
		<description><![CDATA[
The slides for Ben Shneiderman&#8217;s HCIR 2009 keynote on &#8220;The Future of Information Discovery&#8221; are now available on the workshop web site. I&#8217;ve also taken the liberty to upload them to SlideShare and embed them here. The slides don&#8217;t do justice to Ben&#8217;s presentation style, but hopefully they at least communicate a taste of the [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_2358772" style="width: 477px; text-align: left;"><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="477" height="340" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayerd.swf?doc=hcir2009-futureinfodiscovery3-091027115639-phpapp01&amp;stripped_title=the-future-of-information-discovery" /><param name="allowfullscreen" value="true" /><embed style="margin:-0px" type="application/x-shockwave-flash" width="477" height="340" src="http://static.slidesharecdn.com/swf/ssplayerd.swf?doc=hcir2009-futureinfodiscovery3-091027115639-phpapp01&amp;stripped_title=the-future-of-information-discovery" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>The slides for <a href="http://www.cs.umd.edu/~ben/">Ben Shneiderman</a>&#8217;s <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> keynote on &#8220;<a href="http://cuaslis.org/hcir2009/HCIR2009-FutureInfoDiscovery3.pdf">The Future of Information Discovery</a>&#8221; are now available on the <a href="http://cuaslis.org/hcir2009/">workshop web site</a>. I&#8217;ve also taken the liberty to upload them to SlideShare and embed them here. The slides don&#8217;t do justice to Ben&#8217;s presentation style, but hopefully they at least communicate a taste of the material he covered and his vision of where <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a> needs to go as a field and community.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/27/ben-shneidermans-hcir-2009-keynote-the-future-of-information-discovery/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Experimenting with Social Search</title>
		<link>http://thenoisychannel.com/2009/10/26/google-experimenting-with-social-search/</link>
		<comments>http://thenoisychannel.com/2009/10/26/google-experimenting-with-social-search/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 20:31:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2734</guid>
		<description><![CDATA[
Google may be an also-ran in the social networking market with its Brazil-centric Orkut service, but that hasn&#8217;t stopped the search giant from adding social features to its products. A post at the (unofficial) Google Operating System blog recounts the history of Google Reader&#8217;s social evolution, up to but not including its latest update last [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="560" height="272" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/ZqWJxgp-_mU&amp;hl=en&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="560" height="272" src="http://www.youtube.com/v/ZqWJxgp-_mU&amp;hl=en&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>Google may be an also-ran in the <a href="http://en.wikipedia.org/wiki/List_of_social_networking_websites">social networking market</a> with its Brazil-centric <a href="http://en.wikipedia.org/wiki/Orkut">Orkut</a> service, but that hasn&#8217;t stopped the search giant from adding social features to its products. A post at the (unofficial) Google Operating System blog recounts the history of <a href="http://googlesystem.blogspot.com/2009/07/google-readers-social-evolution.html">Google Reader&#8217;s social evolution</a>, up to but not including its <a href="http://googleblog.blogspot.com/2009/10/reading-gets-personal-with-popular.html">latest update</a> last week. <a href="http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html">SearchWiki</a>, though not a social search feature per se, allows users to share personal annotations of their search results, as does the more recently introduced <a href="http://www.google.com/sidewiki/intl/en/index.html">Sidewiki</a>. And, <a href="http://blog.twitter.com/2009/10/bing-goes-dynamite.html">like Bing</a>, Google has established a <a href="http://blog.twitter.com/2009/10/google-nice.html">partnership with Twitter</a> in order to surface &#8220;social&#8221; results.</p>
<p>But the feature announced today, which Google is actually calling &#8220;<a href="http://googleblog.blogspot.com/2009/10/introducing-google-social-search-i.html">Social Search</a>&#8221;, is a much bigger step, even if it is tucked away as an <a href="http://www.google.com/experimental/">experiment on Google Labs</a>. From the official blog post:</p>
<blockquote><p>With Social Search, Google finds relevant public content from your friends and contacts and highlights it for you at the bottom of your search results. When I do a simple query for [new york], Google Social Search includes my friend&#8217;s blog on the results page under the heading &#8220;Results from people in your social circle for New York.&#8221; I can also filter my results to see only content from my social circle by clicking &#8220;Show options&#8221; on the results page and clicking &#8220;Social.&#8221;</p></blockquote>
<p>I gave it a whirl, searching for <a href="http://www.google.com/search?q=&quot;noisy+channel&quot;">&#8220;noisy channel&#8221;</a> and then restricting the search to content from what Google considers my social circle. The results are as promised, and I could further refine the results by author name, selecting from a familiar list of Neal Richter, Jason Adams, Daniel Lemire, Ken Ellis, and Joshua Young (<span style="text-decoration: line-through;">though for some reason Josh&#8217;s link didn&#8217;t work</span>). Cool! Except that there are a lot of names missing (check out the bloggers in <a href="http://thenoisychannel.com/the-noisy-community/">The Noisy Community</a>) and, more importantly, I can&#8217;t refine any further or even sort the search results. Indeed, the ordering of search results seems quite arbitrary&#8211;a phenomenon I&#8217;ve noticed more generally for search engine ranking of social media content.</p>
<p>In short, Google Social Search is a welcome initiative, but there&#8217;s a lot more work to do before I would find a productive use for it. Given the mismatch between social search and black-box relevance ranking, a little bit of <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a> would go a long way towards making this feature practically useful.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/26/google-experimenting-with-social-search/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>HCIR 2009: Human-Human Interaction</title>
		<link>http://thenoisychannel.com/2009/10/26/hcir-2009-human-human-interaction/</link>
		<comments>http://thenoisychannel.com/2009/10/26/hcir-2009-human-human-interaction/#comments</comments>
		<pubDate>Mon, 26 Oct 2009 14:24:19 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2727</guid>
		<description><![CDATA[On Friday, I had the privilege of seeing just how much the annual Workshop on Human-Computer Information Retrieval has grown up since I conceived it in the summer of 2007. Back then, my co-conspirators and I worried about attracting a critical mass of participants&#8211;indeed, Endeca employees easily accounted for a quarter of the attendees (and [...]]]></description>
			<content:encoded><![CDATA[<p>On Friday, I had the privilege of seeing just how much the annual <a href="http://cuaslis.org/hcir2009/">Workshop on Human-Computer Information Retrieval</a> has grown up since I conceived it in the summer of 2007. Back then, my co-conspirators and I worried about attracting a critical mass of participants&#8211;indeed, <a href="http://endeca.com/">Endeca</a> employees easily accounted for a quarter of the attendees (and submissions) at the <a href="http://projects.csail.mit.edu/hcir/">first HCIR workshop</a>. And even <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2008/">last year</a>, host and co-sponsor <a href="http://research.microsoft.com/">Microsoft Research</a> supplied a disproportionate share of the attendees.</p>
<p>But this year was different. We were overloaded with strong submissions from all corners, and we had to turn people away for lack of capacity! While we didn&#8217;t relish saying no to prospective participants, these are great problems to have! And, thanks to Nick Belkin and Diane Kelly, we&#8217;ve arranged to greatly increase that capacity at <a href="http://iiix2010.org/hcir.php">HCIR 2010</a>&#8211;more on that in a moment.</p>
<p><a href="http://www.cs.swan.ac.uk/~csmax/">Max Wilson</a> has already written up an <a href="http://www.cs.swan.ac.uk/~csmax/blog/2009/10/hcir09-redux/">excellent summary</a> of the workshop, which I encourage you to read. You can also see the live tweet stream at <a href="http://search.twitter.com/search?q=%23hcir09">#hcir09</a>. Rather than duplicate these efforts, let me add my personal reflections as an organizer and participant.</p>
<p><a href="http://www.cs.umd.edu/~ben/">Ben Shneiderman</a>&#8217;s keynote address was sweeping and inspiring. I expected him to talk about <a href="http://en.wikipedia.org/wiki/Information_visualization">information visualization</a>, the area where he is most known for his contributions. He did present some examples of his group&#8217;s work on <a href="http://www.cs.umd.edu/hcil/lifelines2/">visualization-centric interfaces to support medical research</a>, but his overall presentation took the much more ambitious approach of discussing the past, present, and possible future of <a href="http://en.wikipedia.org/wiki/Human–computer_information_retrieval">HCIR</a>. Specifically, he urged us to link our work to societal goals, such as the <a href="http://www.un.org/millenniumgoals/">United Nations Millennium Development Goals</a>. His challenge may seem impossibly idealistic, but I agree with his assertion that it is a practical one: we will do our best research by grounding ourselves firmly in the real and pressing problems of our age. <a href="http://research.microsoft.com/en-us/um/people/sdumais/">Last year&#8217;s keynote speaker</a> went on to win the <a href="http://www.sigir.org/awards/awards.html">Gerard Salton Award</a>; I can only hope that Ben receives comparable accolades for his past accomplishments and future contributions to HCIR.</p>
<p>A new feature for this year&#8217;s workshop was having a &#8220;poster boaster&#8221; session, in which each of the presenters in the poster session had one minute to pitch his or her work. For those of you unfamiliar with this format, I highly recommend it. The compressed format forces presenters to distill the essence of their contributions&#8211;a useful exercise in general. And the audience doesn&#8217;t get bored: if you decide halfway into a presentation that you aren&#8217;t interested, then you only have to wait 30 seconds until the next one! Not that we had that problem: the posters were consistently interesting, as the submissions were unusually strong this year. You can download the full workshop proceedings <a href="http://cuaslis.org/hcir2009/HCIR2009.pdf">here</a>.</p>
<p>Even the full presentations weren&#8217;t that long. The five speakers were each allotted ten minutes, with a healthy amount of time reserved for a panel-style Q&amp;A session. The papers in this session were, by design, some of the more controversial ones. In particular, <a href="http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/v/Voorhees:Ellen_M=.html">Ellen Voorhees</a> delivered a full-throated defense of <a href="http://en.wikipedia.org/wiki/Cranfield_Experiments">Cranfield</a> / <a href="http://en.wikipedia.org/wiki/Text_Retrieval_Conference">TREC</a>-style evaluation: &#8220;I Come Not to Bury Cranfield, but to Praise It&#8221; (similar to her <a href="http://www.dcs.gla.ac.uk/workshops/air/slides/EllenVoorhees-TestCollectionsforAIR.pdf">presentation</a> at the <a href="http://www.dcs.gla.ac.uk/workshops/air/">2006 Workshop on Adaptive Information Retrieval</a> that I <a href="http://thenoisychannel.com/2008/04/17/ellen-voorhees-defends-cranfield/">discussed</a> on this blog last year). Her reminder of HCIR&#8217;s challenges on the evaluation front surely ruffled some feathers, but all of us HCIR advocates need to address these challenges if we want researchers (and practitioners) outside our community to drink our kool-aid.</p>
<p>The above format was already quite interactive (as befits a workshop about interaction), but the second half of the day was explicitly designed to facilitate discussion. We had lunch on site, followed by a one-hour poster session.  We then had two one-hour guided discussion sessions to address the theoretical and practical concerns of HCIR. As organizers, we seeded both sessions with questions, but we also incorporated concerns that had come up during earlier discussions.</p>
<p>Finally, I am grateful to our sponsors. <a href="http://slis.cua.edu/">Catholic University</a> was a gracious host and sponsor, providing the workshop with a great space and very helpful student volunteers. Between that and the financial contributions of <a href="http://endeca.com/">Endeca</a> and <a href="http://research.microsoft.com/">Microsoft Research</a>, we were able to continue our tradition of not charging attendees for the workshop. I can&#8217;t promise that will continue indefinitely, but I am glad that our insistence on emphasizing substance over frivolous amenities has helped us deliver what I believe to be some of the best bang-for-buck in the scholarly community.</p>
<p>I&#8217;m already excited about <a href="http://iiix2010.org/hcir.php">HCIR 2010</a>. Unlike the past three workshops, which have been held as independent events, next year&#8217;s workshop will be co-located with the <a href="http://iiix2010.org/">Information Interaction in Context Symposium (IIiX’10)</a> in New Brunswick, New Jersey. The workshop will take place on August 22nd, breaking our unintended tradition of holding the workshop on October 23rd. <a href="http://comminfo.rutgers.edu/~belkin/belkin.html">Nick Belkin</a> assures us that there will be lots of space, so hopefully we&#8217;ll be able to accommodate everyone who is interested. We&#8217;ll also be soliciting sponsors for both the workshop and the broader symposium.</p>
<p>But there&#8217;s more to HCIR than enjoying each other&#8217;s company at workshops. We must spend the remaining 364 days of the year fleshing out our vision, and relating that vision not only to the disciplines HCIR explicitly integrates, but to pressing social concerns. It is up to us all to make our work relevant.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/26/hcir-2009-human-human-interaction/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Off To DC</title>
		<link>http://thenoisychannel.com/2009/10/20/off-to-dc/</link>
		<comments>http://thenoisychannel.com/2009/10/20/off-to-dc/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 01:30:57 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2724</guid>
		<description><![CDATA[I&#8217;m heading to Washington, DC tomorrow morning, a couple of days before the HCIR &#8217;09 workshop. I&#8217;m not sure I&#8217;ll have any opportunities to blog while I&#8217;m in the nation&#8217;s capital, but of course I&#8217;ll post a write-up about the workshop when I&#8217;m back! Meanwhile, if you need your blog fix, I encourage you to [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m heading to Washington, DC tomorrow morning, a couple of days before the <a href="http://cuaslis.org/hcir2009/">HCIR &#8217;09</a> workshop. I&#8217;m not sure I&#8217;ll have any opportunities to blog while I&#8217;m in the nation&#8217;s capital, but of course I&#8217;ll post a write-up about the workshop when I&#8217;m back! Meanwhile, if you need your blog fix, I encourage you to check out some of the <a href="http://thenoisychannel.com/category/blogs-i-read/">blogs I read</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/20/off-to-dc/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Books! Books! Books!</title>
		<link>http://thenoisychannel.com/2009/10/20/books-books-books/</link>
		<comments>http://thenoisychannel.com/2009/10/20/books-books-books/#comments</comments>
		<pubDate>Tue, 20 Oct 2009 04:26:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2717</guid>
		<description><![CDATA[
When my daughter was born almost two years ago, I wondered if she&#8217;d grow up reading books. After all, I do most of my reading online, and increasingly find myself reading short articles rather than whole books. Needless to say, she&#8217;s loved books so far, even if she&#8217;s shredded a few.
But the bigger surprise for [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.thesharksfoundation.com/reading/index.asp"><img class="alignnone" title="Reading Is Cool" src="http://www.thesharksfoundation.com/images/reading/ric_logo.gif" alt="" width="400" height="161" /></a></p>
<p>When my <a href="http://www.flickr.com/photos/24264445@N05/">daughter</a> was born almost two years ago, I wondered if she&#8217;d grow up reading books. After all, I do most of my reading online, and increasingly find myself reading short articles rather than whole books. Needless to say, she&#8217;s loved books so far, even if she&#8217;s shredded a few.</p>
<p>But the bigger surprise for me is that books&#8211;specifically e-books&#8211;have become such a hot industry. When I briefly worked for a consulting firm after grad school in 1999, my first assignment was to evaluate the e-book market. The readers then consisted of the <a href="http://www.answers.com/topic/rocketbook">Rocket ebook</a> and <a href="http://www.ideo.com/work/item/softbook-reader/">SoftBook Reader</a>. Needless to say, I correctly predicted at the time that the e-book market wasn&#8217;t ready for prime time.</p>
<p>But fast forward to the present. Amazon has given the e-book market some credibility: Citigroup says they sold <a href="http://mediamemo.allthingsd.com/20090203/citi-says-amazon-sold-500000-kindles-last-year-12-billion-business-next-year/">500K Kindles in 2008</a>, and Forrester predicted they will sell <a href="http://mediamemo.allthingsd.com/20091007/the-coming-kindle-boom-sales-could-double-in-2010/">1.8M units this year</a>.</p>
<p>But the last few days (and even the last 24 hours!) of news show that the e-book market is only starting to open up:</p>
<ul>
<li>In May, Sony, whose e-reader sales have lagged behind the Kindle, announced a <a href="http://www.nytimes.com/2009/03/19/technology/19sony.html">partnership with Google</a> in order to make copyright-free books available for free.</li>
<li>Google just announced a service called <a href="http://www.google.com/hostednews/ap/article/ALeqM5gr_qJI9KI8h7PBC-AEeknD3ezkegD9BBHAT80">Editions</a> that it plans to launch in 2010 (by when it will have presumably settled the <a href="http://books.google.com/googlebooks/agreement/">Google Books Settlement Agreement</a>).</li>
<li>The Internet Archive just announced the <a href="http://www.archive.org/bookserver">Bookserver</a> project as &#8220;a growing open architecture for vending and lending digital books over the Internet&#8221;.</li>
<li>Spring Design just announced <a href="http://www.springdesign.com/resource/jsp/">Alex</a>, an e-book reader based on Google&#8217;s Android operating system.</li>
<li>Barnes &amp; Noble is expected to <a href="http://www.engadget.com/2009/10/19/barnes-and-noble-nook-color-e-reader-out-tuesday-for-259-says/">announce an e-reader</a> that competes directly with the Kindle and has generated lots of buzz through <a href="http://gizmodo.com/5380942/exclusive-first-photos-of-barnes--nobles-double-screen-e+reader">leaked photos</a>.</li>
</ul>
<p>I grew up on books, and I&#8217;m excited to see that, a decade after the initial market failures, e-books (like touchscreens) are a mainstream reality. I still worry about <a href="http://thenoisychannel.com/2009/10/19/who-will-buy/">who will buy</a> them, especially considering that the marginal cost of distributing a typical e-book is even less than that of distributing a 5-minute song. A quick scan of a popular file-sharing site reveals that the pdf version of bestseller <em>The Lost Symbol</em> takes up less than 3MB.</p>
<p>Still, I&#8217;ll take a moment to celebrate the progress of technology. I&#8217;ve always known that reading was cool, but now we have the gadgets to prove it!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/20/books-books-books/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Who Will Buy?</title>
		<link>http://thenoisychannel.com/2009/10/19/who-will-buy/</link>
		<comments>http://thenoisychannel.com/2009/10/19/who-will-buy/#comments</comments>
		<pubDate>Mon, 19 Oct 2009 18:12:46 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2711</guid>
		<description><![CDATA[As some of you know, I&#8217;m a karaoke junkie. But it&#8217;s my wife who has the classier repertoire, including &#8220;Who Will Buy?&#8221; from the musical Oliver!:
Who will buy this wonderful morning?
Such a sky you never did see!
Who will tie it up with a ribbon
And put it in a box for me?
Of course, the trope that [...]]]></description>
			<content:encoded><![CDATA[<p>As some of you know, I&#8217;m a karaoke junkie. But it&#8217;s my wife who has the classier repertoire, including &#8220;Who Will Buy?&#8221; from the musical <a href="http://en.wikipedia.org/wiki/Oliver!"><em>Oliver!</em></a>:</p>
<p><em>Who will buy this wonderful morning?<br />
Such a sky you never did see!<br />
Who will tie it up with a ribbon<br />
And put it in a box for me?</em></p>
<p>Of course, the trope that the best things in life are free predates musical theater, let alone the web. But recent years have witnessed dramatic changes in our price sensitivities in every genre of digital (or digitizable) content, and I&#8217;m curious (sometimes morbidly so) about where it goes from here.</p>
<p>I won&#8217;t make you suffer through a rant about the malaise of the music and news industries&#8211;those topics, important as they are, have been overplayed in the blogosphere. If you need a refresher, I suggest <a href="http://www.free-culture.cc/">Lawrence Lessig</a> and the <a href="http://www.niemanlab.org/">Nieman Journalism Lab</a> as some of the more rational voices contributing to the discussion.</p>
<p>But it&#8217;s not just news and music that are experiencing the effects of the &#8220;<a href="http://en.wikipedia.org/wiki/Information_wants_to_be_free">information wants to be free</a>&#8221; movement. Consider these industries:</p>
<ul>
<li><strong>Books</strong>. Many publishers worry that the Kindle has been setting a consumer expectation that a book <a href="http://www.wired.com/gadgetlab/2009/04/kindle-readers/">should only cost $10</a>. Indeed, a recent <a href="http://online.wsj.com/article/SB125565024634288895.html">price war between Amazon and Wal-Mart</a> drove some of those prices down to $8.99. Is this a boon for consumers, or a body blow to the publishing industry? It&#8217;s easy to invoke the $0.99-per-song expectation set by iTunes&#8211;but that change was more about disaggregating albums than about changing the per-unit cost. Besides, books have not yet had to confront the scale of unauthorized distribution that we see in the music industry. Legal or not, free is a potent source of price pressure.</li>
<li><strong>Software</strong>. <a href="http://www.wolframalpha.com/">Wolfram Alpha</a> just made headlines by releasing a <a href="http://www.techcrunch.com/2009/10/18/wolfram-alpha-miscalculates-what-its-iphone-app-should-cost/">$50 iPhone app</a>. Many have reacted that such a high price is outrageous and will doom the application to failure. They may be right on that latter point&#8211;the market will vote with its clicks soon enough. But I&#8217;m old enough to remember $50 as being in the ballpark of what it cost to purchase a new consumer software application. Even then, unauthorized distribution was an issue&#8211;remember the &#8220;<a href="http://en.wikipedia.org/wiki/Don%27t_Copy_That_Floppy">don&#8217;t copy that floppy</a>&#8221; campaign? Today, my impression is that few people consciously purchase consumer software&#8211;a trend that I at least date to Microsoft&#8217;s strategy of bundling its software into PC purchases. The most noted exceptions are console games (which are impressive holdouts in the consumer software space) and iPhone apps&#8211;with the caveat that only a tiny minority of apps make enough money for the creators to live on. <em>(Update: just saw <a href="http://uk.games.ign.com/articles/103/1036254p1.html">this note</a> about how EA Sports President Peter Moore sees the current console game business model of cartridges and discs as a &#8220;burning platform&#8221;.)</em></li>
<li><strong>Television</strong>. Between <a href="http://charlie-federman.blogspot.com/2009/02/is-boxee-cables-napster.html">Boxee</a> and <a href="http://www.wired.com/techbiz/it/magazine/17-10/ff_netflix">Netflix</a>, there is a real chance that digital content&#8217;s cash cow, cable television, will see its regional monopolies disrupted. I can&#8217;t imagine that anyone will shed a tear for the cable companies. And yet I can&#8217;t help but wonder what happens as the notion of premium content is subsumed by an expectation that video content should be free. Are we heading towards a proliferation of cheaply produced reality TV, contests, and game shows&#8211;all sponsored by rampant product placement?</li>
</ul>
<p>If we are to believe <a href="http://www.techdirt.com/articles/20090701/0422125421.shtml">Mike Masnick</a>, then the price of content is driven to its marginal cost. It&#8217;s pretty clear that the marginal cost of distributing most digital content is, while not free, close enough to be a rounding error. Should we be looking forward to a world where no one can charge consumers for content? Folks like <a href="http://www.buzzmachine.com/2009/07/08/what-google-would-do/">Jeff Jarvis</a> and <a href="http://www.wired.com/techbiz/it/magazine/16-03/ff_free">Chris Anderson</a> are cheerleading such a world as not only inevitable but a good thing&#8211;though both of them have had the sense to make some money on non-free books while the going is good.</p>
<p>Yes, there are and will always be business models to support content creators. In particular, one-time content (live events, consulting services) has  some degree of insulation from the inexorable trend toward free. But what an inefficient turn of events, if people are rewarded for creating one-time content but not for creating far more valuable content that is useful to a broad audience of consumers!</p>
<p>I know that there are non-financial incentives that drive scholars, open-source developers, and activists to create free content. Indeed, I personally write this blog without any direct financial incentive. Perhaps these incentives will be the driving forces for content creation in the 21st century. One way or another, I hope we find a way to fund the things we value, rather than devolving into a locally optimal rut where value creation isn&#8217;t economic for the creators.</p>
<p>p.s. You can find the lyrics to Oliver for free <a href="http://users.bestweb.net/~foosie/oliver.htm">online</a>, and you can easily view a free (unauthorized) copy of a performance of &#8220;Who Will Buy?&#8221; on YouTube. Or you can buy the song for <a href="http://www.amazon.com/Who-Will-Oliver-2003-Remastered/dp/B001BKPSWS">$0.99</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/19/who-will-buy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Third Annual Workshop on Search in Social Media (SSM 2010)</title>
		<link>http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/</link>
		<comments>http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/#comments</comments>
		<pubDate>Fri, 16 Oct 2009 04:34:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2708</guid>
		<description><![CDATA[I&#8217;m proud to announce that Eugene Agichtein,  Marti Hearst, and Ian Soboroff have invited me to help organize the upcoming Workshop on Search in Social Media (SSM 2010). The workshop will take place in conjunction with the ACM  Conference on Web Search  and Data Mining (WSDM 2010), a young conference that has quickly [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m proud to announce that <a href="http://www.mathcs.emory.edu/%7Eeugene/">Eugene Agichtein</a>, <a href="http://people.ischool.berkeley.edu/%7Ehearst/"> Marti Hearst</a>, and <a href="http://trec.nist.gov/">Ian Soboroff</a> have invited me to help organize the upcoming <a href="http://ir.mathcs.emory.edu/SSM2010/">Workshop on Search in Social Media (SSM 2010)</a>. The workshop will take place in conjunction with the <a href="http://www.wsdm2010.org/">ACM  Conference on Web Search  and Data Mining (WSDM 2010)</a>, a young conference that has quickly become a top-tier forum for work in these areas. The conference and workshop will take place in my home town of New York&#8211;Brooklyn, to be precise!</p>
<p>Here&#8217;s the key information from the workshop web site:</p>
<blockquote>
<h3>Overview</h3>
<p>Social applications are the fastest growing segment of the web. They establish new forums for content creation, allow people to connect to each other and share information, and permit novel applications at the intersection of people and information. However, to date, social media has been primarily popular for connecting people, not for finding information. While there has been progress on searching particular kinds of social media, such as blogs, search in others (e.g., Facebook, Myspace, or Flickr) is not as well understood.</p>
<p>The purpose of the 3rd Annual Workshop on Search in Social Media (SSM 2010), is to bring together information retrieval and social media researchers to consider the following questions: How should we search in social media? What are the needs of users, and models of those needs, specific to social media search? What models make the most sense? How does search interact with existing uses of social media? How can social media search complement traditional web search?  What new search paradigms for information finding can be facilitated by social media?</p>
<p><strong>SSM 2010</strong> follows up on the highly successful <strong> <a href="http://ir.mathcs.emory.edu/SSM2009">SSM 2009</a></strong> and <strong><a href="http://ir.mathcs.emory.edu/SSM2008">SSM 2008</a></strong> workshops held at SIGIR 2009 and CIKM 2008 respectively. We are looking forward to an equally exciting workshop at <strong> <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a></strong> in New York!</p>
<h3>Format and Topics</h3>
<p>We are planning for a full-day workshop consisting of invited speakers, organized in both plenary and panel sessions, and a contributed poster/demo session.</p>
<p><strong>We solicit short (under 2 pages) position papers, posters or demo proposals </strong>to be presented as part of a <strong>poster session</strong>, describing late-breaking and novel research results or demonstrations of prototypes or working systems. All topics at the intersection of information finding and social media are of interest, including, but not limited to:</p>
<ul>
<li>Searching blogs, tweets, and other textual social media.</li>
<li>Searching within social networks, including expert finding.</li>
<li>Searching Wikipedia discussions and revision histories.</li>
<li>Searching online discussions, mailing lists, forums, and community question answering sites.</li>
<li>The role of human-powered and community question answering.</li>
<li>Novel models of information finding and new search applications for social media.</li>
<li>The role of timeliness, authority, and accuracy in social media search.</li>
<li>Interaction between traditional web search and social media search.</li>
<li>User needs assessments and task analysis for social media search.</li>
<li>Interactions between searching and browsing in social media.</li>
<li>Searching and exploiting folksonomies, tags, and tagged data.</li>
<li>Spam and adversarial interactions in social media.</li>
</ul>
<p>Ideal papers may include late-breaking and novel research results, position and vision papers discussing the role of search in social media, and demonstrations of prototypes or working systems. Note that the workshop proceedings will not be  archived or considered as formal publication, to encourage the informal  atmosphere and to allow the authors to publish expanded versions of the work  elsewhere.</p>
<p>The poster/demo proposals should be in standard ACM SIG format, more details to be posted soon.</p></blockquote>
<p>Submissions are due on <strong>December 15th</strong>. I hope to see some of you there! Meanwhile, feel free to suggest ideas for invited speakers who have done interesting work at the intersection of social media and search, and I&#8217;ll share your suggestions with my co-organizers.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/16/third-annual-workshop-on-search-in-social-media-ssm-2010/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Innovation at Huffington Post: Data-Driven Headlines</title>
		<link>http://thenoisychannel.com/2009/10/15/innovation-at-huffington-post-data-driven-headlines/</link>
		<comments>http://thenoisychannel.com/2009/10/15/innovation-at-huffington-post-data-driven-headlines/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 18:17:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2705</guid>
		<description><![CDATA[The other day, I was suggesting to one of my colleagues that Endeca&#8217;s software could help authors write better (translation: more SEO-friendly) headlines. The details of that discussion are proprietary, but I&#8217;m sure you can imagine the gist. But we all wondered whether authors would be willing to stomach such a left-brain infringement on their [...]]]></description>
			<content:encoded><![CDATA[<p>The other day, I was suggesting to one of my colleagues that <a href="http://endeca.com/">Endeca</a>&#8217;s software could help authors write better (translation: more <a href="http://en.wikipedia.org/wiki/Search_engine_optimization">SEO</a>-friendly) headlines. The details of that discussion are proprietary, but I&#8217;m sure you can imagine the gist. But we all wondered whether authors would be willing to stomach such a left-brain infringement on their right-brain creativity.</p>
<p>But apparently the <a href="http://www.huffingtonpost.com/">Huffington Post</a> is blazing new trails in this area. The <a href="http://www.niemanlab.org/2009/10/how-the-huffington-post-uses-real-time-testing-to-write-better-headlines/">Nieman Journalism Lab</a> reports that:</p>
<blockquote><p><strong>The Huffington Post applies A/B testing to some of its headlines.</strong> Readers are randomly shown one of two headlines for the same story. After five minutes, which is enough time for such a high-traffic site, the version with the most clicks becomes the <a href="http://www.google.com/search?q=site%3Aobserver.com+%22wood+war%22">wood</a> that everyone sees.</p></blockquote>
<p>NJL also reports that Huffington Post social media editor&#8211;and long-time Noisy Channel reader&#8211;<a href="http://networkednews.wordpress.com/">Josh Young</a> uses Twitter to help crowd-source  better headlines.</p>
<p>I&#8217;m sure this approach must rattle some old-school journalists. And there is a real danger of optimizing for the wrong outcome. For example, including the word &#8220;sex&#8221; in this message might improve its traffic (the popularity of <a href="http://thenoisychannel.com/2008/12/12/the-noisy-channel-now-better-than-sex/">this post</a> attests to that), but to what end?</p>
<p>Still, I don&#8217;t see this use of technology as cramping anyone&#8217;s style. Most of us write to be read&#8211;especially those in the media industry who are trying to monetize their audiences. Measurable success matters, and there&#8217;s no harm in trying to maximize it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/15/innovation-at-huffington-post-data-driven-headlines/feed/</wfw:commentRss>
		<slash:comments>31</slash:comments>
		</item>
		<item>
		<title>Are Duplicate Tweets Spam?</title>
		<link>http://thenoisychannel.com/2009/10/15/are-duplicate-tweets-spam/</link>
		<comments>http://thenoisychannel.com/2009/10/15/are-duplicate-tweets-spam/#comments</comments>
		<pubDate>Thu, 15 Oct 2009 16:43:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2701</guid>
		<description><![CDATA[The Twitterverse is all a-twitter with a new controversy: Twitter has rolled out a new feature that blocks duplicate tweets. They reported to the SocialOomph blog that:
Recurring Tweets are a violation no matter how they are done, including whether or not someone pays you to have a special privilege. We don’t want to see any [...]]]></description>
			<content:encoded><![CDATA[<p>The Twitterverse is all a-twitter with a new controversy: Twitter has rolled out a new feature that <a href="http://www.techcrunch.com/2009/10/14/cleaning-up-the-stream-twitter-kills-duplicate-tweets/">blocks duplicate tweets</a>. They reported to the <a href="http://www.socialoomphblog.com/recurring-tweets/">SocialOomph</a> blog that:</p>
<blockquote><p>Recurring Tweets are a violation no matter how they are done, including whether or not someone pays you to have a special privilege. We don’t want to see any duplicate tweets whatsoever- They pollute Twitter, and tools shouldn’t be given to enable people to break the rules. Spinnable text seems to just be a way to bypass the rules against duplicate updates and essentially provides the same problems.</p>
<p>Hence, from Thursday, October 15th, 2009, 00:00 AM CST we will prevent the entry of recurring tweets on Twitter accounts within the SocialOomph system. Existing recurring tweets on Twitter accounts will all be placed in paused state at that time, so that the content of the tweet text is still accessible to you, but no publishing to Twitter of those tweets will take place.</p></blockquote>
<p>Not everyone is thrilled with this new feature. My friend (and Noisy Channel reader) <a href="http://twitter.com/eric_andersen">Eric Andersen</a> notes: &#8220;<span title="processed"><span>this doesn&#8217;t make a lot of sense to me &#8211; many highly regarded Twitter users (e.g. @<a href="http://twitter.com/GuyKawasaki">GuyKawasaki</a>) regularly re-post tweets&#8230;</span></span><span title="processed"><span>primarily because of the &#8220;dip&#8221; model: re-posting the same tweet means more people will see, especially with an int&#8217;l audience.&#8221;</span></span></p>
<p><span title="processed"><span>On one hand, I loathe inefficient communication, and I see repeated tweets as exposing the inefficiency of the dip model. We won&#8217;t get into my <a href="http://thenoisychannel.com/2009/04/06/guy-kawasaki-ill-say-it/">differences of opinion</a> with Guy Kawasaki. If Twitter offered better search and control to users, then I think it would make sense for them to consider </span></span><span title="processed"><span>duplicate tweets as a spam issue.</span></span></p>
<p><span title="processed"><span>On the other hand, Twitter search is <a href="http://thenoisychannel.com/2009/05/09/the-twouble-with-twitter-search/">crude</a>. And the dip model, much as it may raise my <a href="http://thenoisychannel.com/2009/01/02/an-attention-ponzi-scheme/">personal hackles</a>, is, in fact, what many users embrace. Twitter takes pride in letting users drive innovation, and I think they should be cautious about being too autocratic. Surely many of the people who post duplicate tweets do so with unspammy intentions.</span></span></p>
<p><span title="processed"><span>Let&#8217;s face it: Twitter is going through growing pains, even if it just inherited the <a href="http://pacific.bizjournals.com/pacific/stories/2009/10/05/daily60.html">mother of all trust funds</a>. They really do have to address <a href="http://thenoisychannel.com/2009/06/27/are-spammers-taking-over-twitter/">spam</a>. But they might consider doing so in a less heavy-handed way. I suspect that duplicate tweets are mainly a problem because they affect the statistics for Trending Topics&#8211;a problem they could easily address without prohibiting the tweets themselves. Better search would make it users to take charge of the user experience&#8211;a small dose of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> would go a long way.</span></span></p>
<p><span title="processed"><span>I think Twitter has the best of intentions, and that it is confronting a real problem. I hope they work harder to find the right solution.<br />
</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/15/are-duplicate-tweets-spam/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Go Shopping, Be Social</title>
		<link>http://thenoisychannel.com/2009/10/14/go-shopping-be-social/</link>
		<comments>http://thenoisychannel.com/2009/10/14/go-shopping-be-social/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 20:42:08 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2694</guid>
		<description><![CDATA[ 
If you&#8217;re into search startups, then today&#8217;s a great day to check out what a couple of them are up to.
TheFind just launched (or relaunched?) a &#8220;buying engine&#8221; that aspires &#8220;to help every shopper find exactly what they want to buy, and to help every merchant, large and small, to reach those shoppers.&#8221; It [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.thefind.com/"><img class="alignleft" title="The Find" src="http://www.thefind.com/images/cobrands/thefind/logos/thefind_frontpage.png" alt="" width="126" height="48" /></a> <a href="http://vark.com/"><img class="size-full wp-image-2696 alignright" title="Aardvark" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/10/Aardvark.jpg" alt="Aardvark" width="240" height="48" /></a></p>
<p>If you&#8217;re into search startups, then today&#8217;s a great day to check out what a couple of them are up to.</p>
<p><a href="http://www.thefind.com/">TheFind</a> just <a href="http://digital.venturebeat.com/2009/10/14/thefind-launches-buying-engine-product-search-from-500000-stores/">launched</a> (or <a href="http://venturebeat.com/2007/03/29/the-courtesan-sisters-of-comparison-search-and-thefind/">relaunched</a>?) a &#8220;buying engine&#8221; that aspires &#8220;to help every shopper find exactly what they want to buy, and to help every merchant, large and small, to reach those shoppers.&#8221; It has some nice interface elements, but I can&#8217;t say I&#8217;m sold on the overall user experience.</p>
<p>Meanwhile, <a href="http://vark.com/">Aardvark</a> just <a href="http://blog.vark.com/?p=229">launched</a> a web-based version of its social search application. The site urges users to &#8220;ask any question in plain English, and Aardvark will discover the perfect person in your network to answer&#8230;in under 5 minutes!&#8221; As I&#8217;ve <a href="http://thenoisychannel.com/2009/06/27/aardvark-burrows-out-of-beta/">commented</a> before, I think they need to embrace the philosophy of &#8220;<a href="../../2008/11/27/when-in-doubt-make-it-public/">when in doubt, make it public</a>&#8221;. But hey, they made <a href="http://www.time.com/time/specials/packages/article/0,28804,1918031_1918016_1917993,00.html">Time&#8217;s top 50 websites for 2009</a>, so perhaps they are right to ignore my advice.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/14/go-shopping-be-social/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Structured Search Is On The Table</title>
		<link>http://thenoisychannel.com/2009/10/13/structured-search-is-on-the-table/</link>
		<comments>http://thenoisychannel.com/2009/10/13/structured-search-is-on-the-table/#comments</comments>
		<pubDate>Tue, 13 Oct 2009 16:32:59 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2689</guid>
		<description><![CDATA[
Freebase. Wolfram Alpha. Google Squared. I hesitate to declare a trend, but there does seem to be a growing interest in more structured approaches to information seeking.
The latest entry is Factual, launched today by Gil Elbaz. Elbaz is no slouch: in 1998, he and Adam Weissman co-founded Applied Semantics (originally known as Oingo) and built [...]]]></description>
			<content:encoded><![CDATA[<p><!-- Factual Table --><iframe frameborder="0" marginwidth="0" marginheight="0" border="0" style="border:0;margin:0;width:500px;height:350px;" src="http://www.factual.com/s/4tkDnI/American_Idol_Finalists_and_Performances?pkhbg=2c2b2c&#038;pkcbg=5f6162&#038;pkabg=909292&#038;fhbg=c7c9cb&#038;fcbg=ffffff&#038;fabg=e7e8e9" scrolling="no" allowtransparency="true"></iframe></p>
<p><a href="http://www.freebase.com/">Freebase</a>. <a href="http://www.wolframalpha.com/">Wolfram Alpha</a>. <a href="http://www.google.com/squared">Google Squared</a>. I hesitate to declare a trend, but there does seem to be a growing interest in more structured approaches to information seeking.</p>
<p>The latest entry is <a href="http://www.factual.com/">Factual</a>, launched today by <a href="http://www.linkedin.com/in/gilelbaz">Gil Elbaz</a>. Elbaz is no slouch: in 1998, he and Adam Weissman co-founded <a href="http://www.appliedsemantics.com/">Applied Semantics</a> (originally known as Oingo) and built a <a href="http://en.wikipedia.org/wiki/Word_sense_disambiguation">word sense disambiguation</a> engine based on <a href="http://wordnet.princeton.edu/">WordNet</a>. In 2003, they sold the company to Google for $102M, where it became the basis of the very lucrative <a href="http://en.wikipedia.org/wiki/AdSense">AdSense</a> offering.</p>
<p>According to Factual&#8217;s website:</p>
<blockquote><p>Factual is a platform where anyone can share and mash open data on any subject.  For example, you might find a comprehensive directory of restaurants along with dozens of searchable attributes, a huge database of published books, or a list of every video game and their cheat codes.  We provide smart tools to help the community build and maintain a trusted source of structured data.</p>
<p>Factual&#8217;s key product, the Factual Table, provides a unique way to view and work with structured data.  Information in Factual Tables comes from the wisdom of the community and from our powerful data mining tools, and the result is rich, dynamic, and transparent data.</p></blockquote>
<p>You can read more detailed coverage in <a href="http://searchengineland.com/factual-parting-the-curtains-of-the-invisible-web-27608">Search Engine Land</a>, <a href="http://www.techcrunch.com/2009/10/13/factual-applied-semantics-co-founder-launches-a-repository-for-open-data/">TechCrunch</a>, <a href="http://www.readwriteweb.com/archives/factual_makes_publishing_open_data_easy.php">ReadWriteWeb</a>, <a href="http://gigaom.com/2009/10/13/meet-factual-a-start-up-pushing-open-data/">GigaOM</a>, and <a href="http://digital.venturebeat.com/2009/10/13/factual-wants-to-be-the-center-of-the-webs-open-data/">VentureBeat</a>.</p>
<p>To me, Factual sounds like a hybrid between Freebase and <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">Many Eyes</a>. And, like both, it&#8217;s free (as in <a href="http://en.wikipedia.org/wiki/Gratis_versus_Libre">free beer</a>). Free cuts both ways: the Factual site states clearly that &#8220;There is currently no way for us to help you monetize these tables.&#8221; As with many companies at this stage, the business model is TBD.</p>
<p>I have mixed feelings. I like the increasing interest by startups in  structured search. It&#8217;s a step in the right direction, since structure is a key enabler for <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">interaction</a>. But we already have one Freebase (and even <a href="http://www.google.com/base/">Google Base</a>), and it&#8217;s not clear that we need yet another company to enable crowd-sourced submission of structured data. Perhaps what we need is a way to incent the sort of behavior that has made Wikipedia so successful. As my colleague Rob Gonzalez (who is rumored to have a blog in the works) is always happy to point out, structured data repositories are a public good that no one is ever willing to pay for. The current best hope seems to be the <a href="http://linkeddata.org/">Linked Data</a> initiative, which sounds great in theory&#8211;though I think the jury is still out on whether it will succeed in practice.</p>
<p>My ambivalence aside, I am excited that some of the greatest minds in computer science are focused on bringing more structure to the information seeking process. Even if some of these efforts prove to be false starts, we&#8217;re going in the right direction. Structured search is on the table.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/13/structured-search-is-on-the-table/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Faceted Search Book: Now At Half Price!</title>
		<link>http://thenoisychannel.com/2009/10/10/faceted-search-book-now-at-half-price/</link>
		<comments>http://thenoisychannel.com/2009/10/10/faceted-search-book-now-at-half-price/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 20:53:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2682</guid>
		<description><![CDATA[Not sure when (or why) this happened, but I just noticed that my Faceted Search book is now almost half off at Amazon, selling for $12.94. Not that it was ever that extravagant a purchase, but at this price you have 48% fewer excuses not to buy your own copy! And, speaking of Amazon, I [...]]]></description>
			<content:encoded><![CDATA[<p>Not sure when (or why) this happened, but I just noticed that my <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999"><em>Faceted Search</em></a> book is now almost half off at Amazon, selling for $12.94. Not that it was ever that extravagant a purchase, but at this price you have 48% fewer excuses not to buy your own copy! And, speaking of Amazon, I would appreciate it if folks who have read the book could take a moment to post a review there.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/10/faceted-search-book-now-at-half-price/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Google Is Sharpening Its Squares</title>
		<link>http://thenoisychannel.com/2009/10/09/google-is-sharpening-its-squares/</link>
		<comments>http://thenoisychannel.com/2009/10/09/google-is-sharpening-its-squares/#comments</comments>
		<pubDate>Sat, 10 Oct 2009 02:03:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2674</guid>
		<description><![CDATA[
As some of you may remember, I&#8217;m excited about Google Squared, a project I see as a great first step toward exploratory search at a web scale. Yes, I know that Duck Duck Go, Kosmix and others are already taking on this challenge, but it makes a difference to see Google throw its weight behind [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://googleblog.blogspot.com/2009/10/new-in-google-squared-quality.html"><img class="alignnone" title="Google Squared" src="http://4.bp.blogspot.com/_7ZYqYi4xigk/Ss93LUkJaWI/AAAAAAAAEtw/jBZr6QPybQA/s400/dg6t87s8_9hqhzc3d4_b.png" alt="" width="354" height="400" /></a></p>
<p>As some of you may remember, I&#8217;m excited about <a href="http://www.google.com/squared">Google Squared</a>, a project I see as a <a href="http://thenoisychannel.com/2009/06/04/google-squared-a-great-first-step/">great first step</a> toward <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> at a web scale. Yes, I know that <a href="http://duckduckgo.com/">Duck Duck Go</a>, <a href="http://kosmix.com/">Kosmix</a> and others are already taking on this challenge, but it makes a difference to see Google throw its weight behind such an ungoogley initiative. Plus Google Squared is ambitious, to say the least&#8211;the input is free-form text and the output is highly structured.</p>
<p>Since I&#8217;ve beaten up <a href="http://wolframalpha.com/">Wolfram Alpha</a> for its <a href="http://thenoisychannel.com/2009/05/07/playing-with-wolfram-alpha/">overreliance on NLP</a>, I can&#8217;t give Google a free pass. It would be nice to be able to give Google Squared more structured guidance (yes, I&#8217;m still an <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> fanatic). But Google Squared seems to achieve far more robust query interpretation than Wolfram Alpha&#8211;perhaps because supporting exploratory search is less brittle than <a href="http://en.wikipedia.org/wiki/Question_answering">question answering</a>.</p>
<p>The quality of the tables that Google Squared produces as results is still spotty, but it is a major improvement over the initial release. To those who wrote off Google Squared in June, I suggest you take a second look.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/09/google-is-sharpening-its-squares/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Is Twitter Planning To Monetize The Firehose?</title>
		<link>http://thenoisychannel.com/2009/10/08/is-twitter-planning-to-monetize-the-firehose/</link>
		<comments>http://thenoisychannel.com/2009/10/08/is-twitter-planning-to-monetize-the-firehose/#comments</comments>
		<pubDate>Thu, 08 Oct 2009 13:05:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2671</guid>
		<description><![CDATA[A few months ago, I wrote in &#8220;The Twouble with Twitter Search&#8221;:
But the trickle that Twitter returns is hardly enough.
I believe this limitation is by design–that Twitter knows the value of such access and isn’t about to give it away. I just hope Twitter will figure out a way to provide this access for a [...]]]></description>
			<content:encoded><![CDATA[<p>A few months ago, I wrote in &#8220;<a href="http://thenoisychannel.com/2009/05/09/the-twouble-with-twitter-search/">The Twouble with Twitter Search</a>&#8221;:</p>
<blockquote><p>But the trickle that Twitter returns is hardly enough.</p>
<p>I believe this limitation is by design–that Twitter knows the value of such access and isn’t about to give it away. I just hope Twitter will figure out a way to provide this access for a price, and that an ecology of information access providers develops around it. Of course, if Google or Microsoft buys Twitter first, that probably won’t happen.</p></blockquote>
<p>Now that Twitter has raised $100M at a valuation of $1B, I doubt any acquisition will happen anytime soon. But, according to <a href="http://kara.allthingsd.com/20091008/twitter-talking-separately-to-microsoft-and-also-google-about-big-data-mining-deals/">Kara Swisher&#8217;s unnamed sources</a>:</p>
<blockquote><p>Twitter is in advanced talks with Microsoft and Google separately about striking data-mining deals, in which the companies would license a full feed from the microblogging service that could then be integrated into the results of their competing search engines.</p></blockquote>
<p>If so, then it&#8217;s about time! How much either Microsoft or Google would pay for this feed is an interesting question. It&#8217;s probably not a coincidence that Twitter raised its last round of funding before pursuing this path&#8211;the revenue they obtain this way could be significant, but is unlikely to justify a $1B valuation.</p>
<p>In any case, I&#8217;m excited as a consumer that Twitter may finally allow Google and Microsoft to better expose the value of its content. But I&#8217;m also curious what my friends on the Twitter Search team think of the potential competition from the web search titans. Until now, no one has been able to compete effectively with Twitter&#8217;s native search because no one else has had access to the firehose. Having such access would give Google and Microsoft more than a fighting chance. Given the centrality of search to Twitter&#8217;s user experience, it&#8217;s an interesting corporate strategy.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/08/is-twitter-planning-to-monetize-the-firehose/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Google Meets The Press</title>
		<link>http://thenoisychannel.com/2009/10/07/google-meets-the-press/</link>
		<comments>http://thenoisychannel.com/2009/10/07/google-meets-the-press/#comments</comments>
		<pubDate>Wed, 07 Oct 2009 19:06:10 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2665</guid>
		<description><![CDATA[I enjoyed my proverbial fifteen seconds of fame on CNN yesterday, and I even enjoyed lunch at the New York Times cafeteria today. But for a prime-time media show check out the live blogging of a chat that Google co-founder Sergey Brin and CEO Eric Schmidt are having with reporters at the Google New York [...]]]></description>
			<content:encoded><![CDATA[<p>I enjoyed my proverbial <a href="http://thenoisychannel.com/2009/10/06/the-noisy-channel-live-on-cnn/">fifteen seconds of fame on CNN</a> yesterday, and I even enjoyed lunch at the New York Times cafeteria today. But for a prime-time media show, check out the live blogging of a chat that Google co-founder Sergey Brin and CEO Eric Schmidt are having with reporters at the Google New York office.</p>
<p>Here&#8217;s an excerpt (via <a href="http://www.techcrunch.com/2009/10/07/a-conversation-with-sergey-brin/">TechCrunch</a>) to pique your interest:</p>
<blockquote><p><strong>Q</strong>: Do you think Bing is something different or a rebranding?</p>
<p><strong>Sergey Brin</strong>: I don’t want to speak about our competitors.</p>
<p><strong>Eric Schmidt</strong>: Better for you to judge.  We like to focus on our customers.</p></blockquote>
<p>More coverage at:</p>
<ul>
<li><a href="http://searchengineland.com/live-blogging-sergey-brin-eric-schmidt-talking-search-with-the-press-27380">Danny Sullivan (Search Engine Land)</a></li>
<li><a href="http://mediamemo.allthingsd.com/20091007/live-from-new-york-google-cofounder-sergey-brin-meets-the-press/">Peter Kafka (AllThingsD)</a></li>
<li><a href="http://www.techcrunch.com/2009/10/07/a-conversation-with-sergey-brin/">Erick Schonfeld (TechCrunch)</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/07/google-meets-the-press/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>The Noisy Channel, Live On CNN!</title>
		<link>http://thenoisychannel.com/2009/10/06/the-noisy-channel-live-on-cnn/</link>
		<comments>http://thenoisychannel.com/2009/10/06/the-noisy-channel-live-on-cnn/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 20:18:04 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2662</guid>
		<description><![CDATA[
For anyone who&#8217;s ever wondered what it would be like to see me live on CNN, this is your chance! Sorry that it isn&#8217;t my most telegenic moment. Still, it was a nice opportunity to share my perspective on the new FTC regulations facing bloggers.
]]></description>
			<content:encoded><![CDATA[<p><script src="http://i.cdn.turner.com/cnn/.element/js/2.0/video/evp/module.js?loc=dom&amp;vid=/video/business/2009/10/06/dcl.blog.ftc.blogs.cnn" type="text/javascript"></script></p>
<p>For anyone who&#8217;s ever wondered what it would be like to see me live on CNN, this is your chance! Sorry that it isn&#8217;t my most telegenic moment. Still, it was a nice opportunity to share my perspective on the <a href="http://thenoisychannel.com/2009/10/05/jeff-jarvis-and-matt-cutts-on-the-new-ftc-blog-regulations/">new FTC regulations</a> facing bloggers.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/06/the-noisy-channel-live-on-cnn/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>In the ASIS&amp;T Bulletin: Reconsidering Relevance and Embracing Interaction</title>
		<link>http://thenoisychannel.com/2009/10/05/in-the-asist-bulletin-reconsidering-relevance-and-embracing-interaction/</link>
		<comments>http://thenoisychannel.com/2009/10/05/in-the-asist-bulletin-reconsidering-relevance-and-embracing-interaction/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 03:48:57 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2659</guid>
		<description><![CDATA[Just thought I&#8217;d alert readers to an article I just published in the current issue of the ASIS&#38;T Bulletin entitled &#8220;Reconsidering Relevance and Embracing Interaction&#8221;. Of course, it&#8217;s all about trying to usher in a brave new world of human-computer information retrieval. If you&#8217;re not already sick of reading about HCIR, check it out!
]]></description>
			<content:encoded><![CDATA[<p>Just thought I&#8217;d alert readers to an article I just published in the current issue of the <a href="http://www.asis.org/bulletin.html">ASIS&amp;T Bulletin</a> entitled &#8220;<a href="http://www.asis.org/Bulletin/Oct-09/OctNov09_Tunkelang.html">Reconsidering Relevance and Embracing Interaction</a>&#8221;. Of course, it&#8217;s all about trying to usher in a brave new world of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a>. If you&#8217;re not already sick of reading about HCIR, check it out!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/05/in-the-asist-bulletin-reconsidering-relevance-and-embracing-interaction/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>HCIR 2009 Proceedings Now Available</title>
		<link>http://thenoisychannel.com/2009/10/05/hcir-2009-proceedings-now-available/</link>
		<comments>http://thenoisychannel.com/2009/10/05/hcir-2009-proceedings-now-available/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 03:29:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2657</guid>
		<description><![CDATA[The HCIR 2009 proceedings are now available on the workshop web site. We&#8217;re planning to  save trees and money by asking attendees to download the proceedings rather than printing them out. And, of course, we&#8217;re delighted to circulate the proceedings to those who won&#8217;t be fortunate enough to spend the day at the workshop.
]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> proceedings are now available on the workshop <a href="http://cuaslis.org/hcir2009/">web site</a>. We&#8217;re planning to  save trees and money by asking attendees to download the proceedings rather than printing them out. And, of course, we&#8217;re delighted to circulate the proceedings to those who won&#8217;t be fortunate enough to spend the day at the workshop.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/05/hcir-2009-proceedings-now-available/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Jeff Jarvis and Matt Cutts on the New FTC Blog Regulations</title>
		<link>http://thenoisychannel.com/2009/10/05/jeff-jarvis-and-matt-cutts-on-the-new-ftc-blog-regulations/</link>
		<comments>http://thenoisychannel.com/2009/10/05/jeff-jarvis-and-matt-cutts-on-the-new-ftc-blog-regulations/#comments</comments>
		<pubDate>Tue, 06 Oct 2009 03:19:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2653</guid>
		<description><![CDATA[As has been anticipated for a while&#8211;and discussed during the Ethics of Blogging panel&#8211;the United States Federal Trade Commission (FTC) has published explicit guidelines regarding how bloggers (at least within its jurisdiction) must disclose any &#8220;material connections&#8221; they have to the companies they endorse.  The full details are available here.
There have been a number of [...]]]></description>
			<content:encoded><![CDATA[<p>As has been anticipated for a while&#8211;and discussed during the <a href="http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/">Ethics of Blogging</a> panel&#8211;the United States Federal Trade Commission (FTC) has <a href="http://www.ftc.gov/opa/2009/10/endortest.shtm">published explicit guidelines</a> regarding how bloggers (at least within its jurisdiction) must disclose any &#8220;material connections&#8221; they have to the companies they endorse.  The full details are available <a href="http://www.ftc.gov/os/2009/10/091005endorsementguidesfnnotice.pdf">here</a>.</p>
<p>There have been a number of reactions across the blogosphere, but I&#8217;d like to home in on two opposing views: those of <em><a href="http://www.buzzmachine.com/what-would-google-do/">What Would Google Do</a></em> author (and blogger) <a href="http://www.buzzmachine.com/about-me/">Jeff Jarvis</a> and Googler <a href="http://www.mattcutts.com/blog/about-me/">Matt Cutts</a>.</p>
<p>Jarvis <a href="http://www.buzzmachine.com/2009/10/05/ftc-regulates-our-speech/">describes</a> the regulations as &#8220;a monument to unintended consequence, hidden dangers, and dangerous assumptions&#8230;the greatest myth embedded within the FTC’s rules [is] that the government can and should sanitize the internet for our protection.&#8221;</p>
<p><a href="http://www.buzzmachine.com/2009/10/05/ftc-regulates-our-speech/#comment-402517">Commenting</a> on Jarvis&#8217;s post, Cutts replies:</p>
<blockquote><p>As a Google engineer who has seen the damage done by fake blogs, sock puppets, and endless scams on the internet, I’m happy to take the opposite position: I think the FTC guidelines will make the web more useful and more trustworthy for consumers. Consumers don’t want to be shilled and they don’t want payola; they want a web that they can trust. The FTC guidelines just say that material connections should be disclosed. From having dealt with these issues over several years, I believe that will be a good thing for the web.</p></blockquote>
<p>It&#8217;s a fascinating debate, and I can see merit in both sides. Like the folks at <a href="http://reason.com/blog/2009/10/05/shut-your-mouth-if-your-experi">Reason</a>, I lean libertarian (at least on issues of freedom of expression) and am not eager to see more government regulation of online speech. That said, I see the value of laws requiring truth in advertising, and I don&#8217;t see why pay-for-play bloggers should get a free pass if they are acting as advertisers. Interestingly, Jarvis&#8217;s <a href="http://www.buzzmachine.com/2009/10/05/ftc-regulates-our-speech/#comment-402522">response</a> to Cutts is: &#8220;I trust you to regulate spam more than the FTC. You are better at it and have more impact.&#8221; That&#8217;s probably true today, but I wouldn&#8217;t want to invest that responsibility in a company that makes 99% of its revenue from advertising.</p>
<p>Everyone in this discussion sees the value of  transparency&#8211;the question is whether it should be a legal norm enforced through FTC regulation or a social norm enforced by the marketplace. Despite my general skepticism about regulation of expression, I temper my libertarianism with a dose of pragmatism. For example, I&#8217;m glad that the Food and Drug Administration (FDA) at least tries to regulate health claims&#8211;its efforts may not eliminate quackery, but they surely reduce the problem.</p>
<p>Do we need FTC regulation in order to tame the <a href="http://en.wikipedia.org/wiki/The_Jungle">jungle</a> of social media? For that matter, will regulations have a positive effect, or will <a href="http://en.wikipedia.org/wiki/Spam_blog">sploggers</a> and other scammers simply ignore them&#8211;and perhaps even move offshore? I share Jarvis&#8217;s fear that the regulation will cause more harm than good&#8211;perhaps even having chilling effects on would-be bloggers. Certainly the FTC will have to use its new power wisely&#8211;both to avoid trampling the existing blogosphere and to avoid scaring off newcomers. Still, if the FTC shows that it is only out to get true scammers, it may help establish, in Cutts&#8217;s words, a web we can trust.</p>
<p>I&#8217;m Daniel Tunkelang, and I endorse this blog post.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/05/jeff-jarvis-and-matt-cutts-on-the-new-ftc-blog-regulations/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Software Patents: A Personal Story</title>
		<link>http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/</link>
		<comments>http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/#comments</comments>
		<pubDate>Sat, 03 Oct 2009 17:19:06 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2648</guid>
		<description><![CDATA[Given the radioactive nature of this post&#8217;s subject matter, I feel the need to remind readers that this is not a corporate blog, and that the opinions expressed within are my personal opinions, not those of my employer. Also, please understand that I cannot comment on any intellectual property issues specifically related to my employer.
With [...]]]></description>
			<content:encoded><![CDATA[<p>Given the radioactive nature of this post&#8217;s subject matter, I feel the need to remind readers that <a href="http://thenoisychannel.com/2008/12/10/this-is-not-a-corporate-blog/">this is not a corporate blog</a>, and that the opinions expressed within are my personal opinions, not those of my employer. Also, please understand that I cannot comment on any intellectual property issues specifically related to my employer.</p>
<p>With that preamble out of the way, let me tell you a true story. The other day, I received a phone call from a friend who has been building a kick-ass startup. That friend had been contacted by a much larger competitor with what amounted to an ultimatum: shut down and come work for us, or we&#8217;ll crush you with a <a href="http://en.wikipedia.org/wiki/Patent_infringement">patent infringement</a> suit. My friend&#8217;s startup didn&#8217;t cave in&#8211;in fact, my friend even went to the trouble of sharing a pile of incontrovertible <a href="http://en.wikipedia.org/wiki/Prior_art">prior art</a> with the competitor. The competitor was unimpressed, and my friend&#8217;s startup is now facing a potentially ruinous lawsuit.</p>
<p>If you know any of the characters in this story, I beg you to keep that information to yourself&#8211;at least for now. I&#8217;d like my friend to have a chance of getting his company out of this predicament, and premature publicity might hurt his case.</p>
<p>But back to the case: let me give you an idea of how a story like this can play out. At a high level, the startup can choose to fight or not fight.</p>
<p>Not fighting means the entrepreneurs writing off their startup, but it allows them to move on and try something new. It might be the best career move for the entrepreneurs, but it means that the world loses a promising startup, and the surrender rewards bad behavior, reinforcing a regime where innovators can&#8217;t afford to compete with more established players.</p>
<p>Fighting means  mounting a non-infringement defense, an invalidation defense, or both.</p>
<p>A non-infringement argument asserts that, regardless of the validity of the patent, its claims don&#8217;t cover what the startup is doing. Since patents carry a <a href="http://www.uspto.gov/web/offices/pac/mpep/documents/appxl_35_U_S_C_282.htm">presumption of validity</a>, the non-infringement route is appealing&#8211;there&#8217;s no need to slog through the much longer invalidation process. Leaving a bad patent alive may be a worse outcome for the rest of the world, but entrepreneurs don&#8217;t have the luxury of taking the weight of the world onto their own shoulders.</p>
<p>Unfortunately, the very characteristics of a bad patent make it hard for an accused infringer to succeed in a non-infringement argument. If a patent is overly broad, then it&#8217;s more likely that the infringement argument will be valid (<a href="http://wiki.answers.com/Q/What_is_the_difference_between_valid_and_sound_argument">but not sound</a>, since the patent itself is&#8211;or should be&#8211;invalid). Vaguely worded claims are also a problem&#8211;while a patent examiner may have granted a patent based on one interpretation of the claim language, the patent holder may now be asserting infringement under a different (and typically broader) interpretation of that same language.</p>
<p>As a result, a non-infringement argument often depends almost entirely on the result of a <a href="http://en.wikipedia.org/wiki/Markman_hearing">Markman hearing</a>, more formally known as a claim construction hearing. In such a hearing, a judge decides how to interpret any language in the claim whose meaning is contested by the opposing parties in the suit. Such a hearing is often a crap shoot for the accused infringer. An unfavorable result which supports the infringement accusation may ultimately help invalidate the patent, but the results are likely to come too late&#8211;justice delayed for a startup is often an extreme case of justice denied.</p>
<p>Which brings us to the invalidation route. In theory, invalidation is the right approach to take when confronted with an invalid patent. Ideally, the accused infringer presents prior art to the patent office to <a href="http://en.wikipedia.org/wiki/Reexamination">reexamine</a> the patent, resulting in the patent either being invalidated or rewritten to have a much narrower scope. In practice, however, this approach requires significant effort, time,  money&#8211;especially if you depend on lawyers to do the heavy lifting&#8211;and luck. The best hope is to rapidly request and obtain a reexamination, and then to request and obtain a stay of the infringement suit pending reexamination. Needless to say, the patent holder will fight tooth and nail to avoid this outcome.</p>
<p>I don&#8217;t know how my friend&#8217;s story will end. But, as the above analysis should make clear, he&#8217;s between a rock and a hard place. Whether or not you believe that there should be software patents&#8211;and there is room for reasonable people to <a href="http://en.wikipedia.org/wiki/Software_patent_debate">debate</a> this question&#8211;I hope you agree that the situation my friend is facing amounts to legalized extortion. I understand that no system is perfect, and that our legal system requires compromises that have inevitable casualties.</p>
<p>Nonetheless, my friend&#8217;s story does not feel like an isolated incident, but rather evidence of a systemic problem. There are a lot of software patents floating around right now of dubious validity, many of them granted to companies that have since folded and have unloaded their assets in fire sales. It would be sad for this supply of ersatz intellectual property to impede the real innovation that the patent system was intended to protect.</p>
<p><strong><em>Update: this post has been picked up by Y Combinator&#8217;s <a href="http://news.ycombinator.com/item?id=859432">Hacker News</a>.</em></strong></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/feed/</wfw:commentRss>
		<slash:comments>56</slash:comments>
		</item>
		<item>
		<title>Google Updates Search Refinement Options</title>
		<link>http://thenoisychannel.com/2009/10/01/google-updates-search-refinement-options/</link>
		<comments>http://thenoisychannel.com/2009/10/01/google-updates-search-refinement-options/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 17:41:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2644</guid>
		<description><![CDATA[
Google announced today that its Search Options feature, which allows users to progressively refine search results, now includes new refinement options: past hour, specific date range, more shopping sites, fewer shopping sites, visited pages, not yet visited, books, blogs and news. Of course, you could do some of this already with clever hackery. In any [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://googleblog.blogspot.com/2009/10/refine-your-search-results-with-new.html"><img class="alignnone" title="Google Search Options" src="http://2.bp.blogspot.com/_7ZYqYi4xigk/SsTLA7PQUvI/AAAAAAAAErA/2-TG4rKLOoo/s400/SO1.jpg" alt="" width="400" height="270" /></a></p>
<p>Google <a href="http://googleblog.blogspot.com/2009/10/refine-your-search-results-with-new.html">announced</a> today that its <a href="http://googleblog.blogspot.com/2009/05/more-search-options-and-other-updates.html">Search Options</a> feature, which allows users to progressively refine search results, now includes new refinement options: past hour, specific date range, more shopping sites, fewer shopping sites, visited pages, not yet visited, books, blogs and news. Of course, you could do some of this already with <a href="http://blog.omgili.com/?p=108">clever hackery</a>. In any case, it&#8217;s great to see Google slouching towards <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> on its most visible search property. Perhaps I was too quick to <a href="http://thenoisychannel.com/2009/08/14/why-does-google-hold-back-on-faceted-search/">write off</a> their interest in <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>. Meanwhile, I&#8217;m staying tuned for <a href="http://blogs.zdnet.com/microsoft/?p=3906">Bing 2.0</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/01/google-updates-search-refinement-options/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Guest Post: A Plan For Abusiveness</title>
		<link>http://thenoisychannel.com/2009/10/01/guest-post-a-plan-for-abusiveness/</link>
		<comments>http://thenoisychannel.com/2009/10/01/guest-post-a-plan-for-abusiveness/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 14:07:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Guest Post]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2639</guid>
		<description><![CDATA[The following is a guest post by Jeff Revesz and Elena Haliczer, co-founders of Adaptive Semantics. Adaptive Semantics specializes in sentiment analysis, in particular using machine learning to help automate comment moderation. They&#8217;ve been quite successful at the Huffington Post, which is also an investor. Intrigued by their approach, I reached out to them to [...]]]></description>
			<content:encoded><![CDATA[<p><em>The following is a guest post by Jeff Revesz and Elena Haliczer, co-founders of <a href="http://adaptivesemantics.com/">Adaptive Semantics</a>. Adaptive Semantics specializes in sentiment analysis, in particular using machine learning to help automate comment moderation. They&#8217;ve been <a href="http://www.webmetricsguru.com/archives/2009/07/adaptive-semantics-at-the-huffington-post/">quite successful</a> at the <a href="http://www.huffingtonpost.com/">Huffington Post</a>, which is also an <a href="http://semanticweb.com/news/article.php/12161_3829541_2/Huffington-Post-Invests-in-Slice-of-Semantics">investor</a>. Intrigued by their approach, I reached out to them to solicit this post. I encourage you to respond publicly in the comment thread, or to contact them personally (first name @ adaptivesemantics.com).</em></p>
<p>Seven years ago, <a href="http://en.wikipedia.org/wiki/Paul_Graham">Paul Graham</a> famously stated:</p>
<p>“I think it&#8217;s possible to stop spam, and that content-based filters are the way to do it.”</p>
<p>Well, seven years of innovation and research have brought about some great advances in the field of text classification, so perhaps it’s time to raise the stakes a little. In short, we think it’s possible to stop abusiveness in user-generated content, and that content-based filters are the way to do it.</p>
<p><span style="text-decoration: underline;">The Problem with UGC</span></p>
<p>Publishers these days are in a tight spot with user-generated content (UGC). The promise of UGC in terms of engagement and overall stickiness is hard to pass up, but along with the benefits come some headaches as well. Comment spam is less of an issue than it once was, thanks to services such as <a href="http://akismet.com/">Akismet</a>, but the problem of <a href="http://en.wikipedia.org/wiki/Troll_%28Internet%29">trolling</a> and outright abuse is as bad as it ever was. Any publisher venturing into UGC is stuck with the question of how to keep comments in line with their editorial standards while at the same time avoiding accusations of censorship. The solution employed thus far has mainly been a combination of keyword filters and human moderators. Unfortunately for publishers, there are serious problems with both of those, so let’s look at that more closely.</p>
<p>The main problem with human moderators is the cost involved. They’re expensive, hard to outsource, and they don’t scale. The average human moderator has a maximum capacity of about 250 comments per hour, which is a generous estimate. At minimum wage this works out to about $0.03 per comment, which seems reasonable until you consider that a typical online publisher like the Huffington Post receives about 2 million comments per month site-wide. Add in overhead costs like hiring, training, auditing, etc., and it quickly starts to get out of control. On top of this is the issue of moderator bias. Is it possible that your Democratic moderator is simply deleting every post that disagrees with President Obama, regardless of content?</p>
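<p>As a rough check on those figures, here is a minimal back-of-the-envelope sketch. It assumes the 2009 US federal minimum wage of $7.25 per hour, which the text above does not specify, and it ignores the overhead costs just mentioned. The function names are illustrative only.</p>

```python
# Back-of-the-envelope moderation cost, using the figures quoted above.
# ASSUMPTION: "minimum wage" is taken to be the 2009 US federal rate of
# $7.25/hour; the post itself does not name a rate.
HOURLY_WAGE_USD = 7.25
COMMENTS_PER_HOUR = 250          # the post's (generous) throughput estimate
COMMENTS_PER_MONTH = 2_000_000   # the post's site-wide volume figure


def cost_per_comment(hourly_wage: float, comments_per_hour: int) -> float:
    """Labor cost of moderating one comment."""
    return hourly_wage / comments_per_hour


def monthly_labor_cost(hourly_wage: float, comments_per_hour: int,
                       comments_per_month: int) -> float:
    """Total monthly labor cost, before hiring, training, and auditing."""
    return hourly_wage * comments_per_month / comments_per_hour


def moderator_hours(comments_per_hour: int, comments_per_month: int) -> float:
    """Moderator-hours needed per month at the given throughput."""
    return comments_per_month / comments_per_hour


if __name__ == "__main__":
    per_comment = cost_per_comment(HOURLY_WAGE_USD, COMMENTS_PER_HOUR)
    total = monthly_labor_cost(HOURLY_WAGE_USD, COMMENTS_PER_HOUR,
                               COMMENTS_PER_MONTH)
    hours = moderator_hours(COMMENTS_PER_HOUR, COMMENTS_PER_MONTH)
    print(f"per comment: ${per_comment:.3f}")
    print(f"per month:   ${total:,.0f} for {hours:,.0f} moderator-hours")
```

<p>Under these assumptions, the cost is about $0.03 per comment, or roughly $58,000 and 8,000 moderator-hours per month, before any overhead.</p>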
<p>To mitigate the costs involved, many publishers add in a layer of non-human filtering, such as a keyword list. While this may seem like a good idea at first, all it really does is offer you the worst of both worlds. Now you have an expensive, non-scalable solution that also gives bad results. Keyword lists can be easily beaten by the simplest <a href="http://en.wikipedia.org/wiki/Obfuscation">obfuscation</a>, such as breaking up bad words or simply replacing a letter with a symbol. In addition, it is impossible for keyword filters to catch anything but the crudest type of abusiveness. A great example is the recent Facebook poll “<a href="http://www.huffingtonpost.com/2009/09/28/obama-facebook-poll-asks_n_301860.html">Should Obama Be Killed?</a>” which would likely pass right through a keyword filter but is quite obviously abusive content.</p>
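<p>To make the weakness concrete, here is a minimal sketch of a naive keyword filter. The blocklist and the function name are hypothetical, chosen only for illustration; real keyword lists are longer, but they fail in the same ways.</p>

```python
import re

# HYPOTHETICAL blocklist, for illustration only.
BLOCKLIST = {"idiot", "moron"}


def keyword_filter_flags(comment: str) -> bool:
    """Return True if any blocklisted word appears as a whole word."""
    # Lowercase the comment and split it into runs of letters.
    words = re.findall(r"[a-z]+", comment.lower())
    return any(word in BLOCKLIST for word in words)


if __name__ == "__main__":
    samples = [
        "You are an idiot",          # plain insult
        "You are an id iot",         # bad word broken up
        "You are an idi0t",          # letter replaced with a digit
        "Should Obama Be Killed?",   # abusive, but no blocklisted word
    ]
    for text in samples:
        print(keyword_filter_flags(text), "-", text)
```

<p>Only the first sample is flagged. The two obfuscated insults and the poll title all pass, and adding a word like “killed” to the list would start flagging harmless comments instead.</p>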
<p><span style="text-decoration: underline;">The Solution:  Sentiment Classifiers</span></p>
<p>The idea of using a machine-learning classifier to identify text-based semantics is not a new one. <a href="http://en.wikipedia.org/wiki/Vladimir_Vapnik">Vladimir Vapnik</a> introduced the original <a href="http://www.springerlink.com/content/k238jx04hm87j80g/fulltext.pdf">theory</a> of <a href="http://en.wikipedia.org/wiki/Support_vector_machine">support vector machines</a> (SVMs) in 1995, and in 1998 <a href="http://www.cs.cornell.edu/People/tj/">Thorsten Joachims</a> argued that the algorithm was perfectly <a href="http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=726D5D1A64E4A3C3580328C4744A0C43?doi=10.1.1.11.6124&amp;rep=rep1&amp;type=pdf">suited for textual data</a>. Finally, in 2002 <a href="http://www.cs.cornell.edu/home/llee/">Lillian Lee</a> and colleagues showed that not only are SVMs well suited for <a href="http://www.cs.cornell.edu/home/llee/papers/sentiment.pdf">identifying sentiment</a>, but they also dominate keyword-based filters consistently. When applied to the problem of comment moderation, SVMs can mimic human moderation decisions with an accuracy of about 85%. That raises the question, is 85% good enough? How can we push the accuracy higher?</p>
<p>We have some proprietary answers to that question over at <a href="http://adaptivesemantics.com/">Adaptive Semantics</a>, but a less controversial one arises from a well-documented property of classifier output known as the <em>hyperplane distance </em>(labeled v<sup>k</sup> in the diagram below).</p>
<p><a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/10/optimal-margin-classifier.jpg"><img class="alignnone size-full wp-image-2640" title="optimal-margin-classifier" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/10/optimal-margin-classifier.jpg" alt="optimal-margin-classifier" width="489" height="312" /></a></p>
<p>If the separating hyperplane can be viewed as the dividing line between abusive and non-abusive content, the hyperplane distance of any individual test comment can be interpreted as the classifier’s confidence in its own answer. If a test comment turns out to be very far from the dividing line, we can say that it lies deep in “abusive space” (or in “non-abusive space” depending on the polarity). Now let’s imagine an SVM pre-filter that only makes auto-publish or auto-delete decisions on comments that have a large hyperplane distance, and sends all other comments to the human moderator staff. Because it acts only on its most confident predictions, such a classifier should be more accurate than 85% on the comments it decides automatically, and it would progressively reduce the reliance on human moderators as it is re-trained over time. Even a conservatively tuned model can reduce the human moderation load by about 50% while keeping comment quality roughly the same. That’s a pretty good start.</p>
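<p>The routing idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production system: the weights, bias, and threshold below are invented, and a real model would learn its weights from thousands of labeled comments.</p>

```python
import math

# Hypothetical weights of an already-trained linear classifier.
# Positive weights push a comment toward "abusive".
WEIGHTS = {"idiot": 2.0, "stupid": 1.5, "thanks": -1.5, "agree": -1.0}
BIAS = -0.25
# Assumed minimum distance required for an automatic decision.
THRESHOLD = 0.5


def hyperplane_distance(comment):
    """Signed distance of the comment's bag-of-words vector from the hyperplane."""
    tokens = comment.lower().split()
    score = BIAS + sum(WEIGHTS.get(token, 0.0) for token in tokens)
    norm = math.sqrt(sum(w * w for w in WEIGHTS.values()))
    return score / norm


def route(comment):
    """Decide automatically only when the comment is far from the hyperplane."""
    distance = hyperplane_distance(comment)
    if distance >= THRESHOLD:
        return "auto-delete"
    if -distance >= THRESHOLD:
        return "auto-publish"
    return "human-review"


decisions = [
    route("you stupid idiot"),
    route("thanks i agree"),
    route("i agree you are stupid"),
]
```

<p>With these made-up numbers, the first comment is deleted automatically, the second is published automatically, and the third, which mixes signals and lands close to the dividing line, goes to a human moderator. Raising the threshold sends more comments to humans; lowering it automates more decisions at the cost of more mistakes.</p>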
<p>In addition to high accuracy, a content-based classifier does not have the same limitation as a keyword filter in terms of vocabulary. Since the classifier is trained by feeding it thousands of real-world examples, it will learn to identify all of the typical types of obfuscation such as broken words, netspeak, slang, euphemisms, etc. And since the entire content of the comment is used as an input, the classifier will implicitly take account of context. So the comment “Should Obama Be Killed?” would likely be flagged for deletion, but a comment like “A defeat on healthcare may kill Obama’s chances at re-election.” would be left alone.</p>
<p>So is the abusiveness problem licked? Not quite yet, but the use of linear classifiers would be a huge step in the right direction. You could imagine further advances such as aggregating comment scores by user to quickly identify trolls, and maybe even using those scores as input for another classifier. Or how about training more classifiers to identify quality submissions and pick out the experts in your community? The possibilities are definitely exciting, and they raise another question: why are publishers not using these techniques? That one we don’t have a good answer for, so we founded a <a href="http://adaptivesemantics.com/">company</a> in response.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/10/01/guest-post-a-plan-for-abusiveness/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Museum of Mathematics</title>
		<link>http://thenoisychannel.com/2009/09/30/a-museum-of-mathematics/</link>
		<comments>http://thenoisychannel.com/2009/09/30/a-museum-of-mathematics/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 13:05:07 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Community]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2632</guid>
		<description><![CDATA[
Mathematics illuminates the patterns that abound in our world. The Math Factory strives to enhance public understanding and perception of mathematics. Its dynamic exhibits and programs will stimulate inquiry, spark curiosity, and reveal the wonders of mathematics. The museum’s activities will lead a broad and diverse audience to understand the evolving, creative, human, and aesthetic [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="500" height="304" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/ddeYnQRZz78&amp;hl=en&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="500" height="304" src="http://www.youtube.com/v/ddeYnQRZz78&amp;hl=en&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p><em>Mathematics illuminates the patterns that abound in our world. The Math Factory strives to enhance public understanding and perception of mathematics. Its dynamic exhibits and programs will stimulate inquiry, spark curiosity, and reveal the wonders of mathematics. The museum’s activities will lead a broad and diverse audience to understand the evolving, creative, human, and aesthetic nature of mathematics.</em></p>
<p>The above is the mission statement of <a href="http://www.mathfactory.org/">The Math Factory</a>, an organization headed by former <a href="http://en.wikipedia.org/wiki/Renaissance_Technologies">Renaissance Technologies</a> analyst (and <a href="http://cty.jhu.edu/">CTY</a> alumnus) <a href="http://cty.jhu.edu/alumni/Newsletter/glenwhitney">Glen Whitney</a> that aspires to build a national museum of mathematics in New York. The effort is well underway&#8211;the organization has raised $4M to date, attracted an impressive group of <a href="http://www.mathfactory.org/tiki-index.php?page=Board+of+Trustees">trustees</a> and <a href="http://www.mathfactory.org/tiki-index.php?page=Advisory+Council">advisors</a>, and has obtained quite a bit of enthusiastic <a href="http://www.mathmidway.org/math-midway-press.php">press coverage</a>. No wonder&#8211;the <a href="http://www.mathmidway.org/math-midway-gallery.php">Math Midway</a> it exhibited at the <a href="http://www.worldsciencefestival.com/">World Science Festival</a> this past June was a wild success. I&#8217;d gone there to offer moral support, only to find that I was lucky to get close enough to see the exhibits!</p>
<p>Last night, I was fortunate enough to attend a gala at the <a href="http://urbanacademy.org/">Urban Academy</a> and actually play with the exhibits&#8211;from riding a tricycle with square wheels to walking through a maze without making left turns. It was a blast! And, while I&#8217;ll admit to being favorably predisposed towards math, the exhibits hardly required such a predisposition&#8211;any more than the <a href="http://www.exploratorium.edu/">Exploratorium</a> in San Francisco requires a predisposition towards science. Rather, experiences like these create excitement, overcoming the negative preconceptions that too many children (and adults!) have about these subjects.</p>
<p>While I suspect that many Noisy Channel readers are already sold on both the enjoyment and core societal value of mathematics, I encourage you to think about how much better a world we would have if this appreciation were more widely shared. For those who have to think about large numbers just to manage their assets, I encourage you to think of The Math Factory as worthy of your philanthropy. I encourage everyone to contribute your ideas and endorsements to this visionary effort.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/30/a-museum-of-mathematics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Privacy, Pseudonymity, and Copyright</title>
		<link>http://thenoisychannel.com/2009/09/29/privacy-pseudonymity-and-copyright/</link>
		<comments>http://thenoisychannel.com/2009/09/29/privacy-pseudonymity-and-copyright/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 20:49:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2627</guid>
		<description><![CDATA[A lunch conversation during the Transparent Text symposium about transparency in social media (also a hot topic in the Ethics of Blogging panel) led me to watch the following presentation from Lawrence Lessig on &#8220;Privacy 2.0&#8221;:

Another topic in that conversation was pseudonymity. Someone pointed to a 2000 USENIX paper entitled &#8220;Can Pseudonymity Really Guarantee Privacy?&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>A lunch conversation during the <a href="http://thenoisychannel.com/2009/09/21/transparent-text-symposium-day-1/">Transparent</a> <a href="http://thenoisychannel.com/2009/09/23/transparent-text-symposium-day-2/">Text</a> symposium about transparency in social media (also a hot topic in the <a href="http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/">Ethics of Blogging</a> panel) led me to watch the following presentation from <a href="http://www.lessig.org/">Lawrence Lessig</a> on &#8220;<a href="http://lessig.blip.tv/file/2016591/">Privacy 2.0</a>&#8221;:</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="390" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://blip.tv/play/lG372wMC" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="390" src="http://blip.tv/play/lG372wMC" allowfullscreen="true"></embed></object></p>
<p>Another topic in that conversation was <a href="http://en.wikipedia.org/wiki/Pseudonymity">pseudonymity</a>. Someone pointed to a 2000 <a href="http://www.usenix.org/">USENIX</a> paper entitled &#8220;<a href="http://www.usenix.org/events/sec2000/full_papers/rao/rao_html/index.html">Can Pseudonymity Really Guarantee Privacy?</a>&#8221; The challenges of implementing pseudonymity have, of course, received lots of attention in the past few years. The most notorious example is the <a href="http://en.wikipedia.org/wiki/AOL_search_data_scandal">AOL search data scandal</a>, which made the <a href="http://www.nytimes.com/2006/08/09/technology/09aol.html">front page of the New York Times</a>. But there&#8217;s also the work co-authored by my friend <a href="http://www.cs.utexas.edu/users/shmat/">Vitaly Shmatikov</a> on <a href="http://www.cs.utexas.edu/users/shmat/shmat_oak08netflix.pdf">de-anonymizing Netflix data</a>. Indeed, some have expressed concern that the new Netflix competition is a <a href="http://www.freedom-to-tinker.com/blog/paul/netflixs-impending-still-avoidable-multi-million-dollar-privacy-blunder">privacy lawsuit waiting to happen</a>.</p>
<p>Finally, <a href="http://www.danah.org/">danah boyd</a>&#8217;s master&#8217;s thesis on &#8220;<a href="http://smg.media.mit.edu/papers/danah/danahThesis.pdf">faceted id/entity: managing representation in a digital world</a>&#8221; also came up&#8211;and I recently discovered by way of <a href="http://scobleizer.com/2009/09/26/youre-not-on-twitters-suggested-user-list-but-you-are-in-good-company/">Robert Scoble</a> that she&#8217;ll be <a href="http://sxsw.com/node/3432">keynoting at SXSW</a> next year. Now I feel even more proud that I convinced her to speak at the <a href="http://thenoisychannel.com/2009/07/29/sigir-2009-day-3-industry-track-danah-boyd/">SIGIR Industry Track</a> this year. But I digress.</p>
<p>What does any of this have to do with copyright? Watch Lessig&#8217;s presentation&#8211;it&#8217;s long, but I promise you it&#8217;s worthwhile and entertaining to boot. Besides, I&#8217;ve made it easy by embedding it for you! He makes an analogy&#8211;rather, he makes fair use of <a href="http://cyber.law.harvard.edu/people/jzittrain">Jonathan Zittrain</a>&#8217;s analogy&#8211;between privacy rights and copyright.</p>
<p>The executive (and overgeneralized) summary is that both privacy-holders (&#8220;consumers&#8221;) and copyright-holders (&#8220;industry&#8221;) have complained that technology has undermined their rights, and both have sought out legal remedies. Consumers push back on industry, frustrated with legal strategies to enforce copyright at the expense of consumer freedom, preferring instead to let technology dictate policy; industry pushes back on consumers, frustrated with their legal strategies to enforce privacy rights at the expense of industry freedom, in this case preferring instead to let technology dictate policy. The analogy may not be perfect, but it is close enough to be compelling.</p>
<p>But I&#8217;d like to stretch the analogy further than Lessig and Zittrain to consider pseudonymity and <a href="http://en.wikipedia.org/wiki/Derivative_work">derivative works</a>. The pseudonymity challenge (e.g., the recent reports about <a href="http://thenoisychannel.com/2009/09/20/project-gaydar-a-reminder-that-privacy-isnt-binary/">Project Gaydar</a>) reminds us that privacy isn&#8217;t binary, and that we have to accept at least some loss of privacy if we are going to live in a social world. Similarly, provisions like <a href="http://en.wikipedia.org/wiki/Fair_use">fair use</a> exist because copyright is an inherent trade-off between protecting creators&#8217; rights and embracing the value of creation in a social context.</p>
<p>As I said, I find Zittrain&#8217;s analogy and Lessig&#8217;s presentation compelling. While they may not answer any of society&#8217;s urgent questions about privacy and copyright, they may at least further the conversation. At the very least, I hope the topic is intellectually stimulating.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/29/privacy-pseudonymity-and-copyright/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ethics of Blogging: Webcast Now Available</title>
		<link>http://thenoisychannel.com/2009/09/28/ethics-of-blogging-webcast-now-available/</link>
		<comments>http://thenoisychannel.com/2009/09/28/ethics-of-blogging-webcast-now-available/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 03:27:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2623</guid>
		<description><![CDATA[Thanks to Robin Fray Carey for posting the webcast of the Ethics of Blogging panel on the Social Media Today site. You can also catch the tweet stream at #SMTWebcast while it&#8217;s still indexed.
]]></description>
			<content:encoded><![CDATA[<p>Thanks to Robin Fray Carey for posting the <a href="http://www.socialmediatoday.com/SMC/127920">webcast</a> of the <a href="http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/">Ethics of Blogging</a> panel on the Social Media Today site. You can also catch the tweet stream at <a href="http://search.twitter.com/search?q=%23SMTWebcast">#SMTWebcast</a> while it&#8217;s still indexed.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/28/ethics-of-blogging-webcast-now-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Human-Computer Information Retrieval in Layman&#8217;s Terms</title>
		<link>http://thenoisychannel.com/2009/09/27/human-computer-information-retrieval-in-laymans-terms/</link>
		<comments>http://thenoisychannel.com/2009/09/27/human-computer-information-retrieval-in-laymans-terms/#comments</comments>
		<pubDate>Mon, 28 Sep 2009 02:11:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2618</guid>
		<description><![CDATA[One of the great benefits of practicing, as Daniel Lemire calls it, open scholarship is that I have many opportunities to see how ideas translate across the research / practice divide. In particular, I obtain invaluable feedback on the accuracy and effectiveness of that translation process.
A few days ago, I was exchanging email with serial [...]]]></description>
			<content:encoded><![CDATA[<p>One of the great benefits of practicing, as <a href="http://www.daniel-lemire.com/">Daniel Lemire</a> calls it, <a href="http://thenoisychannel.com/2009/09/17/blogs-i-read-the-haystack-blog/#comment-4426">open scholarship</a> is that I have many opportunities to see how ideas translate across the research / practice divide. In particular, I obtain invaluable feedback on the accuracy and effectiveness of that translation process.</p>
<p>A few days ago, I was exchanging email with serial entrepreneur <a href="http://www.cdixon.org/">Chris Dixon</a> about <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a> (HCIR). He&#8217;d just looked through the <a href="http://thenoisychannel.com/2009/09/23/hcir-2009-accepted-submissions/">accepted submissions</a> list for <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> and said, if I may paraphrase: this is great stuff, but it needs to be better communicated for broader consumption. I quickly shot back a reaction that I&#8217;ll excerpt here (<a href="http://thenoisychannel.com/2008/11/27/when-in-doubt-make-it-public/">when in doubt, make it public!</a>):</p>
<blockquote><p>At some level it&#8217;s blindingly obvious: to err is human, to really screw up takes a computer. The <a href="http://thenoisychannel.com/2009/09/04/hcir-better-than-magic/">HealthBase fiasco</a> isn&#8217;t a shocker: lots of people are skeptical of pure <a href="http://en.wikipedia.org/wiki/Artificial_intelligence">AI</a> approaches.</p>
<p>What people don&#8217;t get is that you can work to optimize the division of labor. I&#8217;m evangelizing it in places like <a href="http://www.technologyreview.com/web/22848/">Technology Review</a>&#8211;a bit more mainstream than my blog. But ultimately the message has to resonate with entrepreneurs and investors who will make that vision a reality. <a href="http://endeca.com/">Endeca</a> is all about HCIR. <a href="http://bing.com/">Bing</a> is a step in the right direction for the open web. But there&#8217;s a long way to go.</p></blockquote>
<p>His response: that&#8217;s a lot more consumable than any other description of HCIR he&#8217;d seen to date (and he&#8217;s a regular reader here!). Having just finished reading Steve Blank&#8217;s <a href="http://www.cafepress.com/kandsranch"><em>Four Steps to the Epiphany</em></a>, I appreciate his point: in a new market, the most critical priority is educating the potential customers.</p>
<p>As a number of us prepare for the <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> workshop, that&#8217;s something to keep in mind. There&#8217;s a natural tension between rigorous scholarship and mass communication, but some of the greatest scholars (e.g., <a href="http://research.microsoft.com/apps/tools/tuva/">Richard Feynman</a> and <a href="http://lpi.oregonstate.edu/lpbio/lpbio2.html">Linus Pauling</a>) have shown the way for us mere mortals. Indeed, in a field as cross-disciplinary as HCIR, we would do well to make our work and vision as broadly consumable as possible, albeit without oversimplifying it to the point that it is vapid or even misleading.</p>
<p>Generally speaking, I blog in order to convince people that some of the esoteric ideas I encounter&#8211;and the occasional ideas I am fortunate enough to conceive&#8211;are worthy of broader consideration. I <a href="http://thenoisychannel.com/2008/04/06/nick-belkin-at-ecir-08/">started blogging</a> in order to bring greater visibility to HCIR&#8211;to convince people that the choice between human and machine responsibility is a false dichotomy in almost every aspect of the information seeking process.</p>
<p>In grade school, I learned that <a href="http://en.wikipedia.org/wiki/Division_of_labour">division of labor</a> is the cornerstone of civilization&#8211;and perhaps our adaptive process of allocating effort is our greatest achievement as a species. As machines play an increasingly important role in our lives&#8211;and serve as the lenses through which we seek and consume almost all information&#8211;it is key that we not forget our roots. Let us be neither <a href="http://en.wikipedia.org/wiki/Luddite">Luddites</a> nor passive participants, but rather let us help computers help us.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/27/human-computer-information-retrieval-in-laymans-terms/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Information Retrievability</title>
		<link>http://thenoisychannel.com/2009/09/26/information-retrievability/</link>
		<comments>http://thenoisychannel.com/2009/09/26/information-retrievability/#comments</comments>
		<pubDate>Sat, 26 Sep 2009 18:07:04 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2610</guid>
		<description><![CDATA[Last year, I wrote a post about Leif Azzopardi and Vishwa Vinay&#8217;s work on information accessibility:
Instead of an actual physical space, in IR, we are predominately concerned with accessing information within a collection of documents (i.e., information space), and instead of a transportation system, we have an Information Access System (i.e., a means by which [...]]]></description>
			<content:encoded><![CDATA[<p>Last year, I wrote a post about <a href="http://ir.dcs.gla.ac.uk/%7Eleif/">Leif Azzopardi</a> and <a href="http://research.microsoft.com/en-us/people/vvinay/">Vishwa Vinay</a>&#8217;s work on <a href="http://thenoisychannel.com/2008/04/22/accessibility-in-information-retrieval/">information accessibility</a>:</p>
<blockquote><p>Instead of an actual physical space, in IR, we are predominately concerned with accessing information within a collection of documents (i.e., information space), and instead of a transportation system, we have an Information Access System (i.e., a means by which we can access the information in the collection, like a query mechanism, a browsing mechanism, etc). The accessibility of a document is indicative of the likelihood or opportunity of it being retrieved by the user in this information space given such a mechanism.</p></blockquote>
<p>After reading a pre-print of my <a href="http://thenoisychannel.com/2009/09/23/hcir-2009-accepted-submissions/">HCIR 2009 position paper</a> about the information availability problem, Vinay pointed me at follow-up work he&#8217;d done with Leif on <a href="http://www.dcs.gla.ac.uk/publications/PAPERS/8984/fp0120-azzopardi.pdf">information retrievability</a>. I agree with his observation that, while I look at information availability from a user-centric perspective, they consider retrievability from a document- or system-centric perspective. The approaches are complementary, and both add to a growing body of work that advocates a holistic model of how users access information, rather than a narrow focus on reductionist measures like <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a> and <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a> at the level of individual queries.</p>
<p>To be clear, those reductionist measures still have their place. In fact, I&#8217;m looking forward to <a href="http://trec.nist.gov/">NIST</a>&#8217;s Ellen Voorhees <a href="http://thenoisychannel.com/2008/04/17/ellen-voorhees-defends-cranfield/">defending Cranfield</a> next month to an <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> crowd that is, for the most part, deeply suspicious of it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/26/information-retrievability/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Free Chapter on Faceted Search User Interface Design</title>
		<link>http://thenoisychannel.com/2009/09/25/free-chapter-on-faceted-search-user-interface-design/</link>
		<comments>http://thenoisychannel.com/2009/09/25/free-chapter-on-faceted-search-user-interface-design/#comments</comments>
		<pubDate>Fri, 25 Sep 2009 12:28:21 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2607</guid>
		<description><![CDATA[If you are interested in user interface design for faceted search&#8211;and I know that&#8217;s a hot topic for many Noisy Channel readers&#8211;then be sure to check out this free book chapter by Moritz Stefaner, Sébastian Ferré, Saverio Perugini, Jonathan Koren, and Yi Zhang.
By the way, a chapter of my own book on faceted search [...]]]></description>
			<content:encoded><![CDATA[<p>If you are interested in user interface design for <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>&#8211;and I know that&#8217;s a hot topic for many Noisy Channel readers&#8211;then be sure to check out this free <a href="http://moritz.stefaner.eu/downloads/papers/DynTax_Ch_UI.pdf">book chapter</a> by <a href="http://moritz.stefaner.eu/">Moritz Stefaner</a>, <a href="http://www.irisa.fr/LIS/ferre/">Sébastian Ferré</a>, <a href="http://academic.udayton.edu/SaverioPerugini/">Saverio Perugini</a>, <a href="http://users.soe.ucsc.edu/~jonathan/">Jonathan Koren</a>, and <a href="http://users.soe.ucsc.edu/~yiz/">Yi Zhang</a>.</p>
<p>By the way, a <a href="http://www.uie.com/events/virtual_seminars/facets/Faceted%20Search%20-%20Chapter%207.pdf">chapter</a> of my own <a href="http://www.amazon.com/Synthesis-Lectures-Information-Concepts-Retrieval/dp/1598299999">book on faceted search</a> is also available for free online, as is <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>&#8217;s entire book on <a href="http://searchuserinterfaces.com/">search user interfaces</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/25/free-chapter-on-faceted-search-user-interface-design/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Ethics of Blogging Panel Today</title>
		<link>http://thenoisychannel.com/2009/09/24/ethics-of-blogging-panel-today/</link>
		<comments>http://thenoisychannel.com/2009/09/24/ethics-of-blogging-panel-today/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 11:56:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2604</guid>
		<description><![CDATA[Just a reminder that I&#8217;m participating in an online panel today (at 1pm EST) to discuss the Ethics of Blogging.
Maggie Fox, founder and CEO of Social Media Group, will moderate a panel composed of Augie Ray, who blogs at Experience: The Blog and is Managing Director of Experiential Marketing at interactive and social media agency [...]]]></description>
			<content:encoded><![CDATA[<p>Just a reminder that I&#8217;m participating in an online panel today (at 1pm EST) to discuss the <a href="http://www.socialmediatoday.com/submitform/smtwebinar092409/?reference=smt_dtunkelang">Ethics of Blogging</a>.</p>
<p><a href="http://socialmediagroup.com/about/">Maggie Fox</a>, founder and CEO of <a href="http://socialmediagroup.com/">Social Media Group</a>, will moderate a panel composed of <a href="https://twitter.com/augieray">Augie Ray</a>, who blogs at <a href="http://www.experiencetheblog.com/">Experience: The Blog</a> and is Managing Director of Experiential Marketing at interactive and social media agency <a href="http://www.fullhouseinteractive.com/">Fullhouse</a>; <a href="http://johnjantsch.com/">John Jantsch</a>, who blogs at <a href="http://www.ducttapemarketing.com/blog/">Duct Tape Marketing</a> and is a marketing and digital technology coach; and yours truly. It’s free to attend; just register <a href="http://www.socialmediatoday.com/submitform/smtwebinar092409/?reference=smt_dtunkelang">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/24/ethics-of-blogging-panel-today/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>HCIR 2009 Accepted Submissions</title>
		<link>http://thenoisychannel.com/2009/09/23/hcir-2009-accepted-submissions/</link>
		<comments>http://thenoisychannel.com/2009/09/23/hcir-2009-accepted-submissions/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 16:56:06 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2600</guid>
		<description><![CDATA[The agenda for HCIR 2009 is now online! As previously announced, Ben Shneiderman from the University of Maryland will be the keynote speaker. The accepted submissions are as follows:
Panel Presentations

Usefulness as the Criterion for Evaluation of Interactive Information Retrieval
 Michael Cole, Jingjing Liu, Nicholas Belkin, Ralf Bierig, Jacek Gwizdka, Chang Liu, Jun Zhang and Xiangmin [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://cuaslis.org/hcir2009/agenda.html">agenda</a> for <a href="http://cuaslis.org/hcir2009/">HCIR 2009</a> is now online! As previously announced, <a href="http://www.cs.umd.edu/~ben/">Ben Shneiderman</a> from the University of Maryland will be the keynote speaker. The accepted submissions are as follows:</p>
<p><strong>Panel Presentations</strong></p>
<ul>
<li>Usefulness as the Criterion for Evaluation of Interactive Information Retrieval<br />
<em> Michael Cole, Jingjing Liu, Nicholas Belkin, Ralf Bierig, Jacek Gwizdka, Chang Liu, Jun Zhang and Xiangmin Zhang (Rutgers University)</em></li>
<li>Modeling Searcher Frustration<br />
<em> Henry Feild and James Allan (University of Massachusetts Amherst)</em></li>
<li>Query Suggestions as Idea Tactics for Information Search<br />
<em> Diane Kelly (University of North Carolina at Chapel Hill)</em></li>
<li>I Come Not to Bury Cranfield, but to Praise It<br />
<em> Ellen Voorhees (National Institute of Standards and Technology)</em></li>
<li>Search Tasks and Their Role in Studies of Search Behaviors<br />
<em> Barbara Wildemuth (University of North Carolina at Chapel Hill) and Luanne Freund (University of British Columbia)</em></li>
</ul>
<p><strong>Posters and Demonstrations</strong></p>
<ul>
<li>Visual Interaction for Personalized Information Retrieval<br />
<em> Jae-wook Ahn and Peter Brusilovsky (University of Pittsburgh)</em></li>
<li>PuppyIR: Designing an Open Source Framework for Interactive Information Services for Children<br />
<em> Leif Azzopardi (University of Glasgow), Richard Glassey (University of Glasgow), Mounia Lalmas (University of Glasgow), Tamara Polajnar (University of Glasgow) and Ian Ruthven (University of Strathclyde)</em></li>
<li>Designing an Interactive Automatic Document Classification System<br />
<em> Kirk Baker (Collexis)</em></li>
<li>The HCI Browser Tool for Studying Web Search Behavior<br />
<em> Robert Capra (University of North Carolina at Chapel Hill)</em></li>
<li>A Graphic User Interface for Content and Structure Queries in XML Retrieval<br />
<em> Juan M. Fernández-Luna, Luis M. de Campos, Juan F. Huete and Carlos J. Martin-Dancausa (University of Granada)</em></li>
<li>Improving Search-Driven Development with Collaborative Information Retrieval Techniques<br />
<em> Juan M. Fernández-Luna (University of Granada), Juan F. Huete (University of Granada), Ramiro Pérez-Vázquez (Universidad Central de Las Villas) and Julio C. Rodríguez-Cano (Universidad de Holguín)</em></li>
<li>A visualization interface for interactive search refinement<br />
<em> Fernando Figueira Filho (State University of Campinas), João Porto de Albuquerque (University of Sao Paulo), André Resende (State University of Campinas), Paulo Lício de Geus (State University of Campinas) and Gary Olson (University of California, Irvine)</em></li>
<li>Cognitive Dimensions Analysis of Interfaces for Information Seeking<br />
<em> Gene Golovchinsky (FX Palo Alto Laboratory, Inc.)</em></li>
<li>Cognitive Load and Web Search Tasks<br />
<em> Jacek Gwizdka (Rutgers University)</em></li>
<li>Visualising Digital Video Libraries for TV Broadcasting Industry: A User-Centred Approach<br />
<em> Mieke Haesen, Jan Meskens and Karin Coninx (Hasselt University)</em></li>
<li>Log Based Analysis of How Faceted and Text Based Searching Interact in a Library Catalog Interface<br />
<em> Bradley Hemminger (University of North Carolina), Xi Niu (University of North Carolina) and Cory Lown (NC State Libraries)</em></li>
<li>Freebase Cubed: Text-based Collection Queries for Large, Richly Interconnected Data Sets<br />
<em> David Huynh (Metaweb Technologies, Inc.)</em></li>
<li>System Controlled Assistance for Improving Search Performance<br />
<em> Bernard Jansen (Pennsylvania State University)</em></li>
<li>Designing for Enterprise Search in a Global Organization<br />
<em> Maria Johansson and Lina Westerling (Findwise AB)</em></li>
<li>Cultural Differences in Information Behavior<br />
<em> Anita Komlodi (University of Maryland Baltimore County) and Karoly Hercegfi (Budapest University of Technology and Economics)</em></li>
<li>Adapting an Information Visualization Tool for Mobile Information Retrieval<br />
<em> Sherry Koshman and Jae-wook Ahn (University of Pittsburgh)</em></li>
<li>A Theoretical Framework for Subjective Relevance<br />
<em> Katrina Muller and Diane Kelly (University of North Carolina)</em></li>
<li>Query Reuse in Exploratory Search Tasks<br />
<em> Chirag Shah and Gary Marchionini (University of North Carolina at Chapel Hill)</em></li>
<li>Augmenting Cranfield-Style Evaluation with GOMS to Obtain Timed Predictions of User Performance<br />
<em> Mark Smucker (Waterloo University)</em></li>
<li>Text-To-Query: Suggesting Structured Analytics to Illustrate Textual Content<br />
<em> Raphael Thollot (SAP Business Objects) and Marie-Aude Aufaure (Ecole Centrale Paris)</em></li>
<li>The Information Availability Problem<br />
<em> Daniel Tunkelang (Endeca)</em></li>
<li>Exploratory Search Over Temporal Event Sequences: Novel Requirements, Operations, and a Process Model<br />
<em> Taowei Wang, Krist Wongsuphasawat, Catherine Plaisant and Ben Shneiderman (University of Maryland)</em></li>
<li>Keyword Search: Quite Exploratory Actually<br />
<em> Max Wilson (Swansea University)</em></li>
<li>Using Twitter to Assess Information Needs: Early Results<br />
<em> Max Wilson (Swansea University)</em></li>
<li>Integrating User-generated Content Description to Search Interface Design<br />
<em> Kyunghye Yoon (SUNY Oswego)</em></li>
<li>Ambiguity and Context-Aware Query Reformulation<br />
<em> Hui Zhang (Indiana University)</em></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/23/hcir-2009-accepted-submissions/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Goby Goes Deep</title>
		<link>http://thenoisychannel.com/2009/09/23/goby-goes-deep/</link>
		<comments>http://thenoisychannel.com/2009/09/23/goby-goes-deep/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 11:17:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2597</guid>
		<description><![CDATA[At  the first HCIR workshop in 2007, Michael Stonebraker stood up in the middle of an open discussion session and told all assembled that we needed to be thinking about the deep web.
I don&#8217;t know how much the audience took heed of his call, but he certainly followed his own advice. He and Endeca alum [...]]]></description>
			<content:encoded><![CDATA[<p>At  the first <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> workshop in <a href="http://projects.csail.mit.edu/hcir/web/">2007</a>, <a href="http://en.wikipedia.org/wiki/Michael_Stonebraker">Michael Stonebraker</a> stood up in the middle of an open discussion session and told all assembled that we needed to be thinking about the <a href="http://en.wikipedia.org/wiki/Deep_Web">deep web</a>.</p>
<p>I don&#8217;t know how much the audience took heed of his call, but he certainly followed his own advice. He and Endeca alum <a href="http://twitter.com/viking2917">Mark Watkins</a> just launched <a href="http://www.goby.com/">Goby</a>, a vertical search engine that exhorts you to &#8220;create your own adventure&#8221;.  It&#8217;s fun&#8211;a sort of <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> for explorers. And it uses a deep web crawl to populate its index with semi-structured data.</p>
<p>Anyway, try it out! I&#8217;ve been in the private beta, but haven&#8217;t had the chance to see what they&#8217;ve been up to in the final stretch leading to the launch. You can also read more on <a href="http://searchengineland.com/what-where-when-travel-local-search-combine-goby-com-26395">Search Engine Land</a> or <a href="http://news.cnet.com/8301-27076_3-10359329-248.html">CNET</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/23/goby-goes-deep/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Transparent Text Symposium: Day 2</title>
		<link>http://thenoisychannel.com/2009/09/23/transparent-text-symposium-day-2/</link>
		<comments>http://thenoisychannel.com/2009/09/23/transparent-text-symposium-day-2/#comments</comments>
		<pubDate>Wed, 23 Sep 2009 04:32:29 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2590</guid>
		<description><![CDATA[Given how intense yesterday was at the Transparent Text symposium, I couldn&#8217;t imagine that today would match it. But it did!
The morning kicked off with a series of 18 lightning talks in 90 minutes&#8211;that was 5 minutes apiece, with a ruthless gong for anyone who went overtime. The presentations were consistently intense, and I had [...]]]></description>
			<content:encoded><![CDATA[<p>Given how intense <a href="http://thenoisychannel.com/2009/09/21/transparent-text-symposium-day-1/">yesterday</a> was at the <a href="http://www.research.ibm.com/social/transparent_text/">Transparent Text</a> symposium, I couldn&#8217;t imagine that today would match it. But it did!</p>
<p>The morning kicked off with a series of 18 lightning talks in 90 minutes&#8211;that was 5 minutes apiece, with a ruthless <a href="http://twitpic.com/ip345">gong</a> for anyone who went overtime. The presentations were consistently intense, and I had the misfortune to follow one of the best talks&#8211;a very passionate presentation about crowd-sourced translation by IBM&#8217;s Uyi Stewart. Other notable presenters included design ninja <a href="http://www.alexislloyd.com/">Alexis Lloyd</a> from the New York Times R&amp;D Lab, Karrie Karahalios from the University of Illinois talking about the experimental <a href="http://wemeddle.com/">WeMeddle</a> Twitter client, <span id="msgtxt4172392610">MIT Media Lab professor and Berkman Fellow <a href="http://smg.media.mit.edu/people/Judith/">Judith Donath</a> showing a stunning gallery of &#8220;data portraits&#8221;, and Dragon Systems co-founder <a href="http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking#History">Janet Baker</a> explaining how the brain recognizes speech&#8211;with a skull as a prop! The session was incredible, and I hope other conferences adopt this model.</span></p>
<p><span>After the coffee break, there was a session on Text Analysis in the Large, featuring </span><span><a href="http://www.almaden.ibm.com/cs/people/dgruhl/">Dan Gruhl</a> (IBM), </span><span><a href="http://gking.harvard.edu/">Gary King</a> (Harvard), and <a href="http://money.cnn.com/2009/06/26/technology/ibm_jeopardy_watson_computer/?postversion=2009062616">David Ferrucci</a> (IBM). Dan Gruhl talked about web-scale text analysis&#8211;a topic right up his alley, considering his role in architecting the IBM <a href="http://en.wikipedia.org/wiki/IBM_WebFountain">WebFountain</a> project. Gary King gave a fascinating talk about using</span><span id="msgtxt4174004598"> ensemble methods to improve on existing clustering methods&#8211;the idea is to synthesize a collection of derived clusterings and place them in an explorable metric space. You can read the full paper <a href="http://gking.harvard.edu/files/discov.pdf">here</a>. But the winner for this session was definitely David Ferrucci, who described the work IBM Research is doing to develop a <a href="http://thenoisychannel.com/2009/04/27/who-wants-to-play-jeopardy/">machine Jeopardy player</a>. He spent much of the talk building a case for the difficulty of the problem&#8211;and then delivered the </span><span id="msgtxt4175579219">punchline: In less than three years of research, they&#8217;ve developed a machine player whose performance is comparable to that of Jeopardy winners. Hopefully they&#8217;ll be competing on live television by next year!<br />
</span></p>
<p>After lunch, there was a session on Investigation, featuring <a href="http://maplight.org/">MAPLight</a> Research Director <a name="Emily_Calhoun" href="http://maplight.org/staff">Emily Calhoun</a>, UC Berkeley law professor <a name="Kevin_Quinn" href="http://www.law.berkeley.edu/kevinmquinn.htm">Kevin Quinn</a>, and <span id="msgtxt4177751966">Guardian news editor <a href="http://www.guardian.co.uk/profile/simonrogers">Simon Rogers</a>. </span>Emily Calhoun showed how MAPLight illuminates the connections between money and politics&#8211;it was great seeing <span id="msgtxt4295316262">data to correlate who supports and opposes bills with the associated campaign </span><span id="msgtxt4295316262">contributions from</span><span id="msgtxt4295316262"> interest groups. Kevin Quinn&#8217;s presentation was a bit more technical, but his <a href="http://www.law.berkeley.edu/5957.htm">work</a> reminds me a lot of Miles Efron&#8217;s work on <a href="http://people.lis.illinois.edu/~mefron/papers/efron-libmedia.pdf">estimating political orientation in web documents</a>&#8211;but Quinn&#8217;s work is more general and goes beyond co-citation analysis to analyze the actual language of the documents. Great application of topic modeling! But my favorite presentation in this session was the one from Simon Rogers: he told the story of how the Guardian successfully crowd-sourced a project to <a href="http://mps-expenses.guardian.co.uk/">investigate the expenses of UK Parliament members</a>.</span></p>
<p><span>The final session was a panel discussion about how visualization might elevate or advance the debate over health care policy. The panelists were </span><span id="msgtxt4297562492"><a href="http://benfry.com/">Ben Fry</a>, <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>, <a href="http://gking.harvard.edu/">Gary King</a>, and </span><span id="msgtxt4177751966"><a href="http://www.guardian.co.uk/profile/simonrogers">Simon Rogers</a></span><span id="msgtxt4297562492">; <a href="http://fernandaviegas.com/">Fernanda Vi</a></span><a href="http://fernandaviegas.com/">é</a><span id="msgtxt4297562492"><a href="http://fernandaviegas.com/">gas</a> and <a href="http://www.bewitched.com/">Martin Wattenberg</a> moderated. Unfortunately, the overwhelming sentiment from the panel was pessimism that anything we could do might actually lead to improved outcomes. Nonetheless, it&#8217;s clear that a lot of people are going to try.</span></p>
<p><span>Again, I want to thank Fernanda, Martin, <a href="http://domino.watson.ibm.com/cambridge/research.nsf/pages/irene_greif.html">Irene Greif</a>, and everyone at IBM for organizing this fantastic event&#8211;and for inviting me to attend! I am impressed that anyone could manage to assemble such an impressive set of speakers in one place, and I appreciate the effort that everyone put into making the past two days so worthwhile. I look forward to seeing the videos available online, and I hope those who weren&#8217;t able to attend take the opportunity to watch some of them. I also encourage you to check out the live Twitter stream at <a href="http://search.twitter.com/search?q=%23tt09">#tt09</a> while it&#8217;s still available.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/23/transparent-text-symposium-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transparent Text Symposium: Day 1</title>
		<link>http://thenoisychannel.com/2009/09/21/transparent-text-symposium-day-1/</link>
		<comments>http://thenoisychannel.com/2009/09/21/transparent-text-symposium-day-1/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 03:37:25 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2584</guid>
		<description><![CDATA[Wow, what an intense day at the Transparent Text symposium! I won&#8217;t try to give detailed summaries of the talks&#8211;videos will be posted after the conference, and you can get a pretty good picture from the live tweet stream at #tt09. Instead, I&#8217;ll try to capture my personal highlights and reactions.
I&#8217;ll start with Deputy U.S. [...]]]></description>
			<content:encoded><![CDATA[<p>Wow, what an intense day at the <a href="http://www.research.ibm.com/social/transparent_text/">Transparent Text</a> symposium! I won&#8217;t try to give detailed summaries of the talks&#8211;videos will be posted after the conference, and you can get a pretty good picture from the live tweet stream at <a href="http://search.twitter.com/search?q=%23tt09">#tt09</a>. Instead, I&#8217;ll try to capture my personal highlights and reactions.</p>
<p>I&#8217;ll start with Deputy U.S. CTO <a href="http://www.nyls.edu/faculty/faculty_profiles/beth_simone_noveck">Beth Noveck</a>&#8217;s keynote about the <a href="http://www.whitehouse.gov/open/">Open Government Initiative</a>. First, the very existence of such an initiative is incredible, given the culture of secrecy traditionally associated with Washington. Second, I like the top priority of releasing raw data so that other people can work on analyzing it, visualizing it, and generally making it more accessible either to the general public or to particular interest groups. This is very much what I had in mind in January when I posted &#8220;<a href="http://thenoisychannel.com/2009/01/20/information-sharing-we-can-believe-in/">Information Sharing We Can Believe In</a>&#8221; and I&#8217;m glad to see tangible progress. I was never a big fan of faith-based initiatives. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>The next session was a group of talks about watchdogs and accountability&#8211;people looking at how to ensure government transparency from the outside. New York Times editor <a href="http://topics.nytimes.com/topics/reference/timestopics/people/p/aron_pilhofer/index.html">Aron Pilhofer</a> and software developer <a href="http://ashkenas.com/">Jeremy Ashkenas</a> talked about <a href="http://www.documentcloud.org/">DocumentCloud</a>, an ambitious project to enable <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> for news documents on the open web. <a href="http://www.sunlightfoundation.com/">Sunlight Foundation</a> co-founder and executive director <a href="http://www.sunlightfoundation.com/people/emiller/">Ellen Miller</a> offered a particularly compelling example of the power of visualization: a graph correlating the campaign contributions and earmarks associated with a congressman under investigation. But my favorite presenter in this session was <a href="http://www.propublica.org/">ProPublica</a>&#8217;s <a href="http://www.propublica.org/site/author/amanda_michel">Amanda Michel</a>, whose thoughts about a &#8220;human test of transparency&#8221; are worth a talk in themselves. For now, I recommend you look at the two projects she discussed: <a href="http://projects.propublica.org/spotcheck/">Stimulus Spot Check</a> and <a href="http://www.huffingtonpost.com/off-the-bus-reporter/a-new-era-begins_b_141197.html">Off the Bus</a>.</p>
<p>After lunch, we shifted gears from government transparency to more of a focus on text. The first of the two afternoon sessions was entitled &#8220;Analyzing the Written Record&#8221; and featured <a href="http://matthew.gray.org/">Matthew Gray</a> from <a href="http://books.google.com/">Google Books</a>, <a href="http://www.opencalais.com/users/tom">Tom Tague</a> from <a href="http://www.opencalais.com/">Open Calais</a> (a free text annotation service that almost all of the previous speakers raved about), and <a href="http://ethanzuckerman.com/">Ethan Zuckerman</a> from Harvard&#8217;s <a href="http://cyber.law.harvard.edu/">Berkman Center</a>. All of the talks were solid, but Ethan&#8217;s was outstanding. I <a href="http://thenoisychannel.com/2009/03/11/media-cloud-watch-analyze-learn/">blogged</a> about his <a href="http://www.mediacloud.org/">Media Cloud</a> project back in March, but it&#8217;s come a long way in the past six months and is doing something I&#8217;ve been waiting years to see someone do: comparing how different news organizations select and cover news.</p>
<p>The final session was about visualization.  <a href="http://davidsmall.com/">David Small</a> offered a presentation about literally transparent text that was, in the words of <a href="http://twitter.com/nrchtct/status/4154937460"><span>Marian Dörk</span></a>, &#8220;<span id="msgtxt4154937460">refreshingly non-utilitarian and visually stimulating&#8221;. <a href="http://benfry.com/">Ben Fry</a> showed the power of visualizing changes in a document over time&#8211;specifically, a project called &#8220;<a href="http://www.benfry.com/traces/">the preservation of favoured traces</a>&#8221; that illustrates  the evolution of Darwin&#8217;s <a href="http://en.wikipedia.org/wiki/On_the_Origin_of_Species"><em>On the Origin of Species</em></a>. But, as expected, IBM&#8217;s <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">Many Eyes</a> researchers </span><a href="http://fernandaviegas.com/">Fernanda Viégas</a> and <a href="http://www.bewitched.com/">Martin Wattenberg</a> stole the show with an incredibly informative and entertaining presentation about the visualization of repetition in text. No summary can do it justice, so I urge you to watch the video when it is available.</p>
<p>After all that, we enjoyed a nice reception at the <a href="http://www.research.ibm.com/social/"> IBM Center for Social Software</a>. I&#8217;m incredibly grateful to IBM for organizing and sponsoring this event, and to Martin Wattenberg for being so kind as to invite me. I&#8217;ll try to earn my keep in my 5 minutes at the &#8220;Ignite-style&#8221; session tomorrow morning.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/21/transparent-text-symposium-day-1/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Follow-Up Podcast for UIE Seminar on Faceted Search (Free!)</title>
		<link>http://thenoisychannel.com/2009/09/21/follow-up-podcast-for-uie-seminar-on-faceted-search-free/</link>
		<comments>http://thenoisychannel.com/2009/09/21/follow-up-podcast-for-uie-seminar-on-faceted-search-free/#comments</comments>
		<pubDate>Tue, 22 Sep 2009 02:21:42 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2582</guid>
		<description><![CDATA[Last month, Pete Bell and I presented a virtual seminar on faceted search for Jared Spool&#8217;s User Interface Engineering (UIE). Whether or not you attended the seminar, you can listen to a free podcast in which we answer some of the questions we didn&#8217;t get to during the seminar. If you still have an unanswered [...]]]></description>
			<content:encoded><![CDATA[<p>Last month, Pete Bell and I presented a <a href="http://thenoisychannel.com/2009/08/20/uie-virtual-seminar-on-faceted-search-a-great-experience/">virtual seminar on faceted search</a> for Jared Spool&#8217;s <a href="http://www.uie.com/">User Interface Engineering</a> (UIE). Whether or not you attended the seminar, you can listen to a <a href="http://www.uie.com/brainsparks/2009/09/21/spoolcast-designing-for-facets-followup/">free podcast</a> in which we answer some of the questions we didn&#8217;t get to during the seminar. If you still have an unanswered question, I encourage you to ask it in the comment thread, and I&#8217;ll do my best to answer it!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/21/follow-up-podcast-for-uie-seminar-on-faceted-search-free/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Live Tweeting from Transparent Text Symposium</title>
		<link>http://thenoisychannel.com/2009/09/21/live-tweeting-from-transparent-text-symposium/</link>
		<comments>http://thenoisychannel.com/2009/09/21/live-tweeting-from-transparent-text-symposium/#comments</comments>
		<pubDate>Mon, 21 Sep 2009 15:43:21 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2580</guid>
		<description><![CDATA[As promised, I&#8217;ll blog about the two-day Transparent Text symposium when it&#8217;s over and I have a chance to collect and express my thoughts. But for now you can follow the live Twitter stream at #tt09.
]]></description>
			<content:encoded><![CDATA[<p>As promised, I&#8217;ll blog about the two-day <a href="http://www.research.ibm.com/social/transparent_text/">Transparent Text</a> symposium when it&#8217;s over and I have a chance to collect and express my thoughts. But for now you can follow the live Twitter stream at <a href="http://search.twitter.com/search?q=%23tt09">#tt09</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/21/live-tweeting-from-transparent-text-symposium/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Project Gaydar: A Reminder That Privacy Isn&#8217;t Binary</title>
		<link>http://thenoisychannel.com/2009/09/20/project-gaydar-a-reminder-that-privacy-isnt-binary/</link>
		<comments>http://thenoisychannel.com/2009/09/20/project-gaydar-a-reminder-that-privacy-isnt-binary/#comments</comments>
		<pubDate>Sun, 20 Sep 2009 18:48:46 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2577</guid>
		<description><![CDATA[There&#8217;s a nice article in the Boston Globe about &#8220;Project Gaydar&#8221;, a project to predict who is gay based on statistically analyzing their Facebook networks. They&#8217;ve only done ad hoc validation of their predictions, but claim that their results seem accurate. The involvement of distinguished MIT professor Hal Abelson (at least to the point where [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a nice article in the Boston Globe about &#8220;<a href="http://www.boston.com/bostonglobe/ideas/articles/2009/09/20/project_gaydar_an_mit_experiment_raises_new_questions_about_online_privacy/">Project Gaydar</a>&#8221;, a project to predict who is gay based on statistically analyzing their Facebook networks. They&#8217;ve only done ad hoc validation of their predictions, but claim that their results seem accurate. The involvement of distinguished MIT professor <a href="http://groups.csail.mit.edu/mac/users/hal/">Hal Abelson</a> (at least to the point where he&#8217;s quoted in the article) lends credibility to their effort.</p>
<p>I&#8217;m glad to finally see a real-world example of the issues I blogged about last year in a post entitled &#8220;<a href="http://thenoisychannel.com/2008/04/15/privacy-and-information-theory/">Privacy and Information Theory</a>&#8221;:</p>
<blockquote><p>The mainstream debates treat information privacy as binary. Even when people discuss gradations of privacy, they tend to think in terms of each particular disclosure (e.g., age, favorite flavor of ice cream) as binary. But, if we take an information-theoretic look at disclosure, we immediately see that this binary view of disclosure is illusory.</p></blockquote>
<p>I&#8217;m curious to see if this project advances the conversation. At the very least, I&#8217;m gratified to see my abstract ramblings validated by a real-world example!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/20/project-gaydar-a-reminder-that-privacy-isnt-binary/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>T2: Judgment Day for Twine?</title>
		<link>http://thenoisychannel.com/2009/09/19/t2-judgment-day-for-twine/</link>
		<comments>http://thenoisychannel.com/2009/09/19/t2-judgment-day-for-twine/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 16:43:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2572</guid>
		<description><![CDATA[
Nova Spivack, CEO and founder of Radar Networks, just released a preview (see above) announcing Twine 2.0, a semantic search engine to be released later this year. As Erick Schonfeld points out on TechCrunch, Twine hasn&#8217;t managed to attract broad adoption. I tried it briefly when it came out, and I have to confess that [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/jWF3m14i7Vk&amp;hl=en&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="425" height="344" src="http://www.youtube.com/v/jWF3m14i7Vk&amp;hl=en&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object><br />
<a href="http://www.twine.com/team">Nova Spivack</a>, CEO and founder of <a href="http://www.twine.com/about-radar">Radar Networks</a>, just released a preview (see above) announcing Twine 2.0, a semantic search engine to be released later this year. As Erick Schonfeld points out on <a href="http://www.techcrunch.com/2009/09/18/sneak-peak-at-t2-twines-semantic-search-engine/">TechCrunch</a>, Twine hasn&#8217;t managed to attract broad adoption. I tried it briefly when it came out, and I have to confess that I never understood it.</p>
<p>But I can certainly see the appeal of delivering <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> for the web to support <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory</a> information seeking. It&#8217;s the dream that&#8217;s been driving <a href="http://bing.com/">Bing</a> and <a href="http://freebase.com/">Freebase</a>, not to mention smaller efforts like <a href="http://kosmix.com/">Kosmix</a>. It&#8217;s <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">hard</a>, to be sure. But, as Sarah Lacy tells us, <a href="http://www.techcrunch.com/2009/09/17/memo-to-start-ups-you%E2%80%99re-supposed-to-be-changing-the-world-remember/">startups are supposed to be changing the world</a>&#8211;and established companies can play too.</p>
<p>The demo video is appealing, but I&#8217;ll believe it when I can off-road on it&#8211;and on more than just recipes and restaurants, two highly structured domains that are already well covered by sites like <a href="http://www.foodnetwork.com/">Food Network</a> and <a href="http://www.yelp.com/">Yelp</a>. Twine doesn&#8217;t necessarily have to cover all domains to be useful&#8211;perhaps a &#8220;<a href="http://thenoisychannel.com/2009/09/14/is-bing-optimizing-for-the-short-snout/">short snout</a>&#8221; approach like Bing&#8217;s will be good enough to drive adoption.</p>
<p>In any case, I&#8217;m impressed with Twine&#8217;s ambition. But ambition isn&#8217;t enough&#8211;especially given the increasing number of people and companies who share it. If Nova really wants to build a &#8220;<a href="http://novaspivack.typepad.com/nova_spivacks_weblog/2005/10/towards_a_world.html">World Wide Database</a>&#8221;, then he&#8217;ll have to do more than swing for the fences and miss. I&#8217;ll be waiting for a beta invite, and I&#8217;ll let you know what I find out.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/19/t2-judgment-day-for-twine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Transparent Text Symposium</title>
		<link>http://thenoisychannel.com/2009/09/19/transparent-text-symposium/</link>
		<comments>http://thenoisychannel.com/2009/09/19/transparent-text-symposium/#comments</comments>
		<pubDate>Sat, 19 Sep 2009 12:22:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2568</guid>
		<description><![CDATA[One of the unexpected benefits of accepting an invitation to speak at SIGMOD 2009 was an invitation from fellow participant Martin Wattenberg to attend the upcoming Transparent Text symposium at the IBM Center for Social Software:
The Transparent Text symposium is a free event that will focus on ways to make large collections of [...]]]></description>
			<content:encoded><![CDATA[<p>One of the unexpected benefits of accepting an invitation to <a href="http://thenoisychannel.com/2009/07/02/the-wild-world-of-sigmod/">speak at SIGMOD 2009</a> was an invitation from fellow participant <a href="http://www.bewitched.com/">Martin Wattenberg</a> to attend the upcoming <a href="http://www.research.ibm.com/social/transparent_text/">Transparent Text</a> symposium at the <a href="http://www.research.ibm.com/social/">IBM Center for Social Software</a>:</p>
<blockquote><p>The Transparent Text symposium is a free event that will focus on ways to make large collections of documents understandable to laypeople and experts alike. We are interested in approaches that shed light on unstructured text, ranging from novel statistical techniques to web-based crowdsourcing.</p></blockquote>
<p>The <a href="http://www.research.ibm.com/social/transparent_text/participants.html">speaker list</a> is impressive, ranging from familiar (at least to me) interface experts <a href="http://benfry.com/">Ben Fry</a> and <a href="http://people.ischool.berkeley.edu/%7Ehearst/">Marti Hearst</a> to social scientist <a href="http://gking.harvard.edu/">Gary King</a> and <a href="http://www.sunlightfoundation.com/">Sunlight Foundation</a> Executive Director <a href="http://www.sunlightfoundation.com/people/emiller/">Ellen Miller</a>. IBM also contributed some of its own researchers to the program, including <a href="http://money.cnn.com/2009/06/26/technology/ibm_jeopardy_watson_computer/?postversion=2009062616">David Ferrucci</a>, who has been leading the <a href="http://thenoisychannel.com/2009/04/27/who-wants-to-play-jeopardy/">Jeopardy</a> project. There&#8217;s even an &#8220;<span>Ignite-style&#8221; session where all attendees will have the opportunity to give five-minute presentations.</span></p>
<p><span>I&#8217;m looking forward to the eclectic mix of speakers and attendees. As <a href="http://www.cdixon.org/?p=989">Chris Dixon</a> recently reminded us, it&#8217;s important to introduce some randomization into our intellectual diets so that we don&#8217;t get stuck in a rut of local optimization. While an event with a theme of transparency and interacting with textual information is hardly a detour for me, I am excited about the opportunity to hear a diversity of new perspectives on this topic. There will be videos of the speakers posted after the event, as well as a  Twitter stream at</span><span id="msgtxt4105774644"> <a href="http://search.twitter.com/search?q=%23tt09">#tt09</a>.<br />
</span></p>
<p><span>Of course, I&#8217;ll blog about what I learn and recycle it in the discussion activities at the <a href="http://cuaslis.org/hcir2009/">HCIR workshop</a> next month.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/19/transparent-text-symposium/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: The Haystack Blog</title>
		<link>http://thenoisychannel.com/2009/09/17/blogs-i-read-the-haystack-blog/</link>
		<comments>http://thenoisychannel.com/2009/09/17/blogs-i-read-the-haystack-blog/#comments</comments>
		<pubDate>Thu, 17 Sep 2009 13:19:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2564</guid>
		<description><![CDATA[It&#8217;s been quite the week in tech business news, with Adobe acquiring Omniture, Google acquiring reCAPTCHA and being rumored (falsely) to acquire Brightcove, Facebook announcing that it has over 300M users and is cash-flow positive, and Twitter closing a new round of funding at a $1B valuation. Recession? What recession?
But sometimes I like to get [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s been quite the week in tech business news, with <a href="http://www.adobe.com/aboutadobe/invrelations/adobeandomniture.html">Adobe acquiring Omniture</a>, <a href="http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html">Google acquiring reCAPTCHA</a> and being <a href="http://www.businessinsider.com/google-to-buy-brightcove-2009-9">rumored</a> (falsely) to acquire Brightcove, Facebook announcing that it has <a href="http://blog.facebook.com/blog.php?post=136782277130">over 300M users and is cash-flow positive</a>, and Twitter closing a new round of funding at a <a href="http://www.techcrunch.com/2009/09/16/twitter-closing-new-venture-round-with-1-billion-valuation/">$1B valuation</a>. Recession? What recession?</p>
<p>But sometimes I like to get away from all that and turn back to my roots inside the ivory tower. And that leads me to one of my favorite university blogs: the <a href="http://groups.csail.mit.edu/haystack/blog/">Haystack Blog</a>.</p>
<p>The Haystack Blog is published by faculty and grad students in the <a href="http://www.csail.mit.edu/">MIT Computer Science and AI Lab (CSAIL)</a>&#8211;specifically those in the <a href="http://groups.csail.mit.edu/haystack/">Haystack</a> group. Principal Investigator (and occasional <a href="http://stellar.mit.edu/S/pe/2009q5/0304.1/index.html">dance instructor</a>) <a href="http://people.csail.mit.edu/karger/">David Karger</a> is its most prolific blogger&#8211;you might have read some of his <a href="http://groups.csail.mit.edu/haystack/blog/?s=sigir09">SIGIR 2009 posts</a> or his debate with <a href="http://www.betaversion.org/~stefano/linotype/">Stefano Mazzocchi</a> about <a href="http://groups.csail.mit.edu/haystack/blog/?s=Stefano+RDF">how to properly use RDF</a>. But other people&#8217;s posts are just as interesting&#8211;check out the most recent post by <a href="http://www.mit.edu/~ebakke/">Eirik Bakke</a> about <a href="http://groups.csail.mit.edu/haystack/blog/2009/09/16/spreadsheets-vs-relational-databases-bridging-the-gap/">bridging the gap between spreadsheets and relational databases</a>.</p>
<p>I wish that more universities and departments would encourage their faculty and students to blog. As <a href="http://www.daniel-lemire.com/blog/">Daniel Lemire</a> has pointed out, it&#8217;s a great way for academic researchers to get their ideas out and build up their reputations and networks. He should know&#8211;he leads by example. Likewise, Haystack is setting a great example for university blogs, and is a credit to MIT and CSAIL.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/17/blogs-i-read-the-haystack-blog/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Udorse: Give Product Placement a Chance</title>
		<link>http://thenoisychannel.com/2009/09/15/udorse-give-product-placement-a-chance/</link>
		<comments>http://thenoisychannel.com/2009/09/15/udorse-give-product-placement-a-chance/#comments</comments>
		<pubDate>Tue, 15 Sep 2009 14:38:53 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2555</guid>
		<description><![CDATA[
Those of you who don&#8217;t live and breathe the software startup scene might be unaware that a substantial fraction of Silicon Valley is following TechCrunch50, an annual competition hosted by TechCrunch. As if it weren&#8217;t enough to have A-list judges like Marissa Mayer and Paul Graham, there&#8217;s even the fortuitous timing of Intuit acquiring 2007 [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/09/udorse.gif"><img class="alignnone size-full wp-image-2557" title="udorse" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/09/udorse.gif" alt="udorse" width="384" height="217" /></a></p>
<p>Those of you who don&#8217;t live and breathe the software startup scene might be unaware that a substantial fraction of Silicon Valley is following <a href="http://www.techcrunch50.com/">TechCrunch50</a>, an annual competition hosted by TechCrunch. As if it weren&#8217;t enough to have A-list judges like <a href="http://en.wikipedia.org/wiki/Marissa_Mayer">Marissa Mayer</a> and <a href="http://en.wikipedia.org/wiki/Paul_Graham">Paul Graham</a>, there&#8217;s even the fortuitous timing of <a href="http://www.techcrunch.com/2009/09/13/intuit-to-acquire-former-techcrunch50-winner-mint-for-170-million/">Intuit acquiring 2007 TC50 winner Mint</a> for a respectable $170M.</p>
<p>Here in New York, I have to confess that I haven&#8217;t had my eyes glued to the proceedings. But I have been looking at some of the <a href="http://www.techcrunch50.com/2009/companies/">entries</a>, and one that at least stands out as distinctive is <a href="http://udorse.com/">Udorse</a> (and no, I&#8217;m not just biased because they&#8217;re <a href="http://foursquare.com/venue/68818">local</a>). Their premise is simple: democratize product placement through &#8220;visual endorsement&#8221;. Everyone who shares photos can embed a &#8220;udorsement&#8221; and can either pocket the advertising revenue or donate it to charity. More details from <a href="http://www.techcrunch.com/2009/09/14/tc50-udorse-leverages-facebook-photos-for-social-product-endorsements/">TechCrunch</a> (naturally) and <a href="http://digital.venturebeat.com/2009/09/14/tc50-udorse-lets-you-tag-your-photos-with-product-endorsements/">VentureBeat</a>.</p>
<p>Perhaps your reaction is like mine, uncertain whether to be awed or horrified by this simple concept. Indeed, given my penchant for <a href="http://thenoisychannel.com/2009/08/07/will-browsers-ship-with-ad-blockers/">using ad blockers</a>, you might think I&#8217;d be ideologically against product placement.</p>
<p>But I&#8217;m not, as long as it&#8217;s transparent&#8211;and, as far as I can tell, Udorse passes that test. In theory, this is advertising done right: content creators monetizing their own content by advertising goods and services they believe in&#8211;and putting their own credibility on the line to do so.</p>
<p>Of course, it might turn out very differently in practice. Any way of making money online brings out the worst in people, and I&#8217;m sure we&#8217;ll see lots of people try to game this service if it takes off. Meanwhile, people like me will probably block the &#8220;udorsements&#8221; like any other ads.</p>
<p>Or maybe not. I certainly don&#8217;t block emails from friends recommending the products they like, and I actually wish it were easier to benefit from their sincere opinions. If Udorse succeeds in a way that feels like word-of-mouth marketing, I&#8217;ll be thrilled. I think it&#8217;s a long shot, but I&#8217;m at least intrigued by their approach.</p>
<p>ps. No, I wasn&#8217;t paid to write this post, nor do I have any stake in Udorse. I at least have to keep my record clean for the <a href="http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/">Ethics of Blogging panel</a> next week!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/15/udorse-give-product-placement-a-chance/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Bing Visual Search Beta</title>
		<link>http://thenoisychannel.com/2009/09/14/bing-visual-search-beta/</link>
		<comments>http://thenoisychannel.com/2009/09/14/bing-visual-search-beta/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 19:48:39 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2549</guid>
		<description><![CDATA[
Bing launched a Visual Search beta today that is fun to play with. The name may be a bit misleading&#8211;this isn&#8217;t an image search engine, let alone one that allows you to find images based on visual similarity. Rather, it&#8217;s a graphically intensive (don&#8217;t forget to install Silverlight!) way to explore a small data collection.
I [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/09/Periodic-Table.gif"><img class="alignnone size-full wp-image-2551" title="Periodic Table" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2009/09/Periodic-Table.gif" alt="Periodic Table" width="538" height="338" /></a></p>
<p>Bing launched a <a href="http://www.bing.com/visualsearch">Visual Search</a> beta today that is fun to play with. The name may be a bit misleading&#8211;this isn&#8217;t an image search engine, let alone one that allows you to find images based on visual similarity. Rather, it&#8217;s a graphically intensive (don&#8217;t forget to install <a href="http://silverlight.net/">Silverlight</a>!) way to explore a small data collection.</p>
<p>I agree with <a href="http://searchengineland.com/bing-2-0-unveiled-visual-search-25703">Elisabeth Osmeloski</a> at Search Engine Land that the galleries included with this beta launch emphasize novelty over utility. Still, it&#8217;s nice to see a visual <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> application for exploring the <a href="http://www.bing.com/visualsearch?q=Periodic+table&amp;g=periodic_table_of_elements&amp;FORM=Z9GE54">periodic table</a>. And it&#8217;s an interesting example of <a href="http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/">micro-IR</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/14/bing-visual-search-beta/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Is Bing Optimizing for the Short Snout?</title>
		<link>http://thenoisychannel.com/2009/09/14/is-bing-optimizing-for-the-short-snout/</link>
		<comments>http://thenoisychannel.com/2009/09/14/is-bing-optimizing-for-the-short-snout/#comments</comments>
		<pubDate>Mon, 14 Sep 2009 18:55:06 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2542</guid>
		<description><![CDATA[In a post about Bing on CNET today, Rafe Needleman comments that &#8220;it makes business sense to pour resources into popular searches. Optimizing for the short snout pays.&#8221;
First, it&#8217;s an interesting counterpoint to the conventional wisdom that search (if not the future of business as we know it) is all about the &#8220;long tail&#8220;. But [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a href="http://news.cnet.com/8301-19882_3-10351491-250.html">post about Bing</a> on CNET today, Rafe Needleman comments that &#8220;it makes business sense to pour resources into popular searches. Optimizing for the short snout pays.&#8221;</p>
<p>First, it&#8217;s an interesting counterpoint to the conventional wisdom that search (if not the future of business as we know it) is all about the &#8220;<a href="http://en.wikipedia.org/wiki/The_Long_Tail">long tail</a>&#8220;. But second and more importantly, it&#8217;s an intriguing claim about Bing&#8217;s strategy for differentiating itself from Google.</p>
<p>Needleman goes on to say:</p>
<blockquote><p>I&#8217;d wager that this is how Bing is making its gains in market share. Latest Nielsen data says Bing gained 22 percent month-over-month in August, taking it to 10.7 percent of all U.S. searches. People probably try Bing for a travel or product search (where there&#8217;s also a cash-back financial kicker) and remember their good experience, and then they try it for more obscure searches and find it good enough. It highlights, I believe, an important flaw in Google&#8217;s historic strategy of indexing the entire Web equally well and making the user interface fast and consistent above all, as opposed to specializing as dictated by the query.</p></blockquote>
<p>While I&#8217;ve never heard this claim about Bing before, it is consistent with something I&#8217;ve noticed&#8211;and which <a href="http://research.microsoft.com/en-us/people/nickcr/">Nick Craswell</a> said when he <a href="http://thenoisychannel.com/2009/08/02/sigir-2009-day-3-industry-track-nick-craswell/">talked about Bing at SIGIR 2009</a>. In the upper left area that Bing calls the table of contents (TOC), Bing selectively presents a refinement interface based on the entity type it infers for the search query. For example, a search for <a href="http://www.bing.com/search?q=argentina">Argentina</a> returns options that include Argentina Map,  Argentina Tourism, and  Argentina Culture; while a search for <a href="http://www.bing.com/search?q=abraham+lincoln">Abraham Lincoln</a> returns options that include Abraham Lincoln Speeches and  Abraham Lincoln Facts.</p>
<p>It&#8217;s a nifty feature, even if marketers and reporters have <a href="http://thenoisychannel.com/2009/06/26/search-innovation-why-cant-we-all-just-get-along/">struggled to label it</a>. But, as Needleman says, it does indeed focus on the short snout. For example, there are no TOC options when you search for <a href="http://www.bing.com/search?q=faceted+search">faceted search</a>, since the technical term doesn&#8217;t match a recognized entity type. Searches for names of auto companies, such as <a href="http://www.bing.com/search?q=toyota">Toyota</a>, yield a rich set of options, while  those for scooter companies like <a href="http://www.bing.com/search?q=vespa">Vespa</a> do not. Similarly, searches for <a href="http://www.bing.com/search?q=beyonce">celebrities</a> receive VIP treatment, as compared to  searches for <a href="http://www.bing.com/search?q=daniel+tunkelang">ordinary people</a> that just return a list of search results.</p>
<p>All in all, I&#8217;m inclined to agree with Needleman that Bing is focusing on the short snout&#8211;and I love that phrase to describe it. The open question is whether he&#8217;s right that users &#8220;remember their good experience, and then they try it for more obscure searches and find it good enough&#8221;. It would be great to see data to confirm or refute that hypothesis.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/14/is-bing-optimizing-for-the-short-snout/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Micro vs. Macro Information Retrieval</title>
		<link>http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/</link>
		<comments>http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/#comments</comments>
		<pubDate>Sat, 12 Sep 2009 18:25:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2537</guid>
		<description><![CDATA[The Probably Irrelevant blog has been quiet for a while, but I was happy to see a new post there by Miles Efron about &#8220;micro-IR&#8220;. He characterizes micro-IR, as distinct from macro or general IR, as follows:

In ad hoc (text) IR a principal intellectual challenge lies in modeling ‘aboutness.’  In micro-IR settings, the creativity comes [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://probablyirrelevant.org/">Probably Irrelevant</a> blog has been quiet for a while, but I was happy to see a new post there by <a href="http://people.lis.illinois.edu/~mefron/">Miles Efron</a> about &#8220;<a href="http://probablyirrelevant.org/2009/09/micro-ir/">micro-IR</a>&#8220;. He characterizes micro-IR, as distinct from macro or general IR, as follows:</p>
<ol>
<li>In ad hoc (text) IR a principal intellectual challenge lies in modeling ‘aboutness.’  In micro-IR settings, the creativity comes into play in posing a useful (and tractable) question to answer.  The engineering comes easily after that.</li>
<li>The constrained nature of micro-IR applications leads to a lightweight articulation of information need.  There is a tight coupling here between task, query, and the unit of retrieval, a dynamic that I think is compelling.  Pushing this a bit farther, we might consider the simple act of choosing to use a particular application from those apps on a user’s palette as part of the information need expression.</li>
<li>The tight coupling of task to data to ‘query’ enables a strong contextual element to inform the interaction.  Context constitutes the foreground of the micro-IR interaction.</li>
</ol>
<p>He then asks: &#8220;is micro-IR something at all?  Is it actually related to IR?&#8221; <a href="http://ciir.cs.umass.edu/~fdiaz/">Fernando Diaz</a> answers that &#8220;the only difference between micro and macro IR is text.&#8221; <a href="http://lifidea.wordpress.com/">Jinyoung Kim</a> adds that in micro-IR &#8220;the context (searcher goal) is known, with domain-specific notion of relevance (goodness) and similarity measures.&#8221;</p>
<p>I hadn&#8217;t thought of making this particular distinction, but I like it. While I prefer to think about distinguishing the needs of information seekers&#8211;rather than the characteristics of search applications&#8211;I would be the first to argue that a well-designed search application caters to particular user needs. Indeed, I think the definition of a good micro-IR application implies that it addresses a highly constrained space of information needs. Just as importantly, micro-IR applications can often assume that their users are highly familiar with the information space the applications address, and thus that those users need less of the basic <a href="http://people.csail.mit.edu/teevan/work/publications/papers/chi04.pdf">orienteering</a> support that can be critical for success using macro-IR systems. That said, micro-IR users have (or should have) higher expectations of support for more sophisticated information seeking.</p>
<p>The other day, I speculated about <a href="http://thenoisychannel.com/2009/08/14/why-does-google-hold-back-on-faceted-search/">why Google holds back on faceted search</a>. I feel that the distinction between macro- and micro-IR is in the same vein: micro-IR settings (e.g., site search, enterprise search, vertical search) drive needs for richer interfaces and support for interaction, while macro-IR application developers (e.g., general web search) worry mostly about producing a reasonable answer for the query&#8211;and often lead users to micro-IR destinations that offer their own support for information seeking within their constrained domains.</p>
<p>In short, it&#8217;s a nice way to think about the IR application space, and it&#8217;s increasingly relevant (no pun intended!) as we see a proliferation of micro-IR applications. And it&#8217;s great to see activity on the Probably Irrelevant blog after all these months of radio silence!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/12/micro-vs-macro-information-retrieval/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Yahoo on Key Scientific Challenges in Search and Machine Learning</title>
		<link>http://thenoisychannel.com/2009/09/11/yahoo-on-key-scientific-challenges-in-search-and-machine-learning/</link>
		<comments>http://thenoisychannel.com/2009/09/11/yahoo-on-key-scientific-challenges-in-search-and-machine-learning/#comments</comments>
		<pubDate>Fri, 11 Sep 2009 21:00:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2533</guid>
		<description><![CDATA[Like many folks, I&#8217;ve assumed that Yahoo&#8217;s partnership with Bing&#8211;assuming it is approved&#8211;offers the best chance of validating CEO Carol Bartz&#8217;s claim that Yahoo has &#8220;never been a search company&#8220;. She may not be able to change the past, but she certainly is making up for lost time. To be clear, I agree with her [...]]]></description>
			<content:encoded><![CDATA[<p>Like many folks, I&#8217;ve assumed that Yahoo&#8217;s partnership with Bing&#8211;assuming it is approved&#8211;offers the best chance of validating CEO Carol Bartz&#8217;s claim that Yahoo has &#8220;<a href="http://bits.blogs.nytimes.com/2009/08/07/yahoo-ceo-we-have-never-been-a-search-company/">never been a search company</a>&#8220;. She may not be able to change the past, but she certainly is making up for lost time. To be clear, I agree with her 100% that Yahoo should have accepted Microsoft&#8217;s $40B acquisition offer last year&#8211;in her words, “<a href="http://www.techflash.com/seattle/2009/09/yahoo_boss_bartz_says_it_was_stupid_to_turn_down_microsoft.html">Sure, do you think I’m stupid?</a>” But I&#8217;m still struggling to understand the rationale behind the deal Yahoo did accept.</p>
<p>In any case, Yahoo researchers haven&#8217;t stopped thinking about search. As <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> reports, Yahoo recently issued a press release about its <a href="http://research.yahoo.com/node/2896">Key Scientific Challenges Summit</a>. Jeff was kind enough to post <a href="http://ciir.cs.umass.edu/~hfeild/">Henry Feild</a>&#8217;s notes about the presentations by <a href="http://research.yahoo.com/Andrew_Tomkins">Andrew Tomkins</a> on <a href="http://www.searchenginecaffe.com/2009/09/yahoo-key-scientific-challenges.html">search</a> and by <a href="http://research.yahoo.com/Sathiya_Keerthi_Selvaraj">Sathiya Keerthi Selvaraj</a> on <a href="http://www.searchenginecaffe.com/2009/09/yahoo-key-scientific-challenges-summit.html">machine learning</a>. I&#8217;d love to hear more detail about how they perceive (and hope to address) the search challenges of optimizing task-aware relevance and measuring, predicting, and generating user engagement.</p>
<p>Regardless of Yahoo&#8217;s fate, I&#8217;m certainly glad that there are still people at Yahoo working on these big problems. I hope they find a way to develop solutions and bring those solutions to the users who need them.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/11/yahoo-on-key-scientific-challenges-in-search-and-machine-learning/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>The Ethics of Blogging</title>
		<link>http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/</link>
		<comments>http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/#comments</comments>
		<pubDate>Thu, 10 Sep 2009 15:53:42 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2527</guid>
		<description><![CDATA[A few people have commented that the events I advertise here tend to be expensive&#8211;or, worse, require a lot of work to get into! So I&#8217;m glad to announce a freebie that I hope will be as much fun for me as for attendees.
I&#8217;ve been invited to participate in a webinar on the ethics of [...]]]></description>
			<content:encoded><![CDATA[<p>A few people have commented that the events I advertise here tend to be expensive&#8211;or, worse, require a lot of work to get into! So I&#8217;m glad to announce a freebie that I hope will be as much fun for me as for attendees.</p>
<p>I&#8217;ve been invited to participate in a <a href="http://www.socialmediatoday.com/submitform/smtwebinar092409/?reference=smt_dtunkelang">webinar on the ethics of blogging</a> that will take place Thursday, September 24th at 1 PM EST. It&#8217;s free to attend; just register online <a href="http://www.socialmediatoday.com/submitform/smtwebinar092409/?reference=smt_dtunkelang">here</a>.</p>
<p><a href="http://socialmediagroup.com/about/">Maggie Fox</a>, founder and CEO of <a href="http://socialmediagroup.com/">Social Media Group</a>, will moderate. My two co-panelists are <a href="https://twitter.com/augieray">Augie Ray</a>, who blogs at <a href="http://www.experiencetheblog.com/">Experience: The Blog</a> and is Managing Director of Experiential Marketing at interactive and social media agency <a href="http://www.fullhouseinteractive.com/">Fullhouse</a>, and <a href="http://johnjantsch.com/">John Jantsch</a>, who blogs at <a href="http://www.ducttapemarketing.com/blog/">Duct Tape Marketing</a> and is a marketing and digital technology coach.</p>
<p>Among the topics to be discussed:</p>
<ul>
<li>Transparency: How and when should a blogger reveal revenue sources?</li>
<li>Pay for play: Blog posts, tweets, and more as marketing tools</li>
<li>Online privacy</li>
<li>Astroturfing: Organizations creating artificial &#8220;grassroots&#8221; campaigns</li>
<li>Compliance and Legal: What should a corporate blog policy look like? What are a blogger&#8217;s legal obligations?</li>
</ul>
<p>I hope some of you will be able to attend! Regardless, please use the comment thread here to make suggestions about topics you&#8217;d like me to cover or concerns you&#8217;d like to see me address. I know that a lot of you have thought hard about these issues, and I&#8217;d like to ethically exploit your collective wisdom.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/10/the-ethics-of-blogging/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>CIKM 2009 Accepted Papers</title>
		<link>http://thenoisychannel.com/2009/09/09/cikm-2009-accepted-papers/</link>
		<comments>http://thenoisychannel.com/2009/09/09/cikm-2009-accepted-papers/#comments</comments>
		<pubDate>Wed, 09 Sep 2009 15:50:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2523</guid>
		<description><![CDATA[The two biggest academic conferences for information retrieval are SIGIR and CIKM (a site which, sadly, is still hacked). Hopefully some of you enjoyed my coverage of SIGIR 2009&#8211;or, better yet, attended and experienced it for yourselves.
Anyway, thanks to Jeff Dalton for alerting me that the CIKM 2009 accepted papers list is now available. I [...]]]></description>
			<content:encoded><![CDATA[<p>The two biggest academic conferences for information retrieval are <a href="http://sigir.org/">SIGIR</a> and <a href="http://cikm.org/">CIKM</a> (a site which, sadly, is <a href="http://thenoisychannel.com/2008/10/27/cikm-08-attendees-please-blog/">still hacked</a>). Hopefully some of you enjoyed my <a href="http://thenoisychannel.com/?s=%22sigir+2009%22">coverage</a> of <a href="http://www.sigir2009.org/">SIGIR 2009</a>&#8211;or, better yet, attended and experienced it for yourselves.</p>
<p>Anyway, thanks to <a href="http://www.searchenginecaffe.com/2009/09/cikm-2009-papers.html">Jeff Dalton</a> for alerting me that the <a href="http://www.comp.polyu.edu.hk/conference/cikm2009">CIKM 2009</a> <a href="http://www.comp.polyu.edu.hk/conference/cikm2009/program/accepted_papers.htm">accepted papers</a> list is now available. I don&#8217;t plan to make it to Hong Kong this November, but I hope that those who do are kind enough to blog about it!</p>
<p>Also, I see mention of an industry track, but not of an Industry Event like the widely acclaimed one held at <a href="http://www.cikm2008.org/industry_event.php">CIKM 2008</a>&#8211;which inspired my own organization of the  <a href="http://www.sigir2009.org/Program/industry">SIGIR 2009 Industry Track</a>. I&#8217;m curious whether such industry events will prove to be one-time phenomena or will become a staple of these  conferences. I hope for the latter, but I am admittedly biased, given my industry-centric perspective.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/09/cikm-2009-accepted-papers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Not All Google Critics Are Bigots</title>
		<link>http://thenoisychannel.com/2009/09/05/not-all-google-critics-are-bigots/</link>
		<comments>http://thenoisychannel.com/2009/09/05/not-all-google-critics-are-bigots/#comments</comments>
		<pubDate>Sat, 05 Sep 2009 22:21:40 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2520</guid>
		<description><![CDATA[Jeff Jarvis wrote a post today entitled &#8220;Google bigotry&#8220;, in which he asserts that:
Google has an image problem – not a PR problem (that is, not with the public) but a press problem (with whining old media people).
He then goes on to launch a tirade against a Le Monde journalist whose offense was to say [...]]]></description>
			<content:encoded><![CDATA[<p>Jeff Jarvis wrote a post today entitled &#8220;<a href="http://www.buzzmachine.com/2009/09/05/google-bigotry/">Google bigotry</a>&#8220;, in which he asserts that:</p>
<blockquote><p>Google has an image problem – not a PR problem (that is, not with the public) but a press problem (with whining old media people).</p></blockquote>
<p>He then goes on to launch a tirade against a Le Monde journalist whose offense was to say she was writing &#8220;an article about Google facing a rising tide of discontent concerning privacy and monopoly.&#8221; He proceeds to stereotype the French as having &#8220;national insanity&#8221; of Google bigotry. I&#8217;ll leave analysis of irony as an exercise to the reader.</p>
<p>But the true irony is that Jarvis has a point. While I haven&#8217;t done a rigorous analysis, my impression is that there has been a sensationalist press overreaction against Google, singling out Google for behavior for which all other companies get a pass. As even one of the <a href="http://www.google-watch.org/gmail.html">most vocal Google critics</a> admits, &#8220;Google&#8217;s [privacy] policies are essentially no different than the policies of Microsoft, Yahoo, Alexa and Amazon.&#8221; Moreover, some of the newspapers criticizing Google as parasitic are the same ones who once turned&#8211;and still turn&#8211;to Google with open arms as a source of traffic&#8211;when they could easily cut Google off by configuring <a href="http://en.wikipedia.org/wiki/Robots_exclusion_standard">robots.txt</a>. Granted, the newspapers are now locked into a <a href="http://thenoisychannel.com/2009/04/20/mathew-ingram-google-helps-newspapers/">prisoner&#8217;s dilemma</a>, but they should at least take some responsibility for putting themselves in that position.</p>
<p>That said, there are lots of legitimate reasons to criticize Google, specifically concerning privacy and monopoly. While Google may not have engaged in any illegal or unethical practices to get there, it now holds a position as the primary gatekeeper to the internet for a substantial majority of Americans, as well as much of the western world. On the content creation side, site owners don&#8217;t ask &#8220;<a href="http://www.amazon.com/What-Would-Google-Jeff-Jarvis/dp/0061709719">What Would Google Do?</a>&#8220;&#8211;rather they ask how Google will index their sites. Meanwhile, on the consumption side, the broadening scope of Google&#8217;s role in ordinary people&#8217;s lives is legitimate cause for concern about privacy. It&#8217;s not insane or bigoted to raise these issues.</p>
<p>Moreover, Google claims to hold itself to a <a href="http://www.google.com/corporate/tenthings.html">higher standard</a> than other companies, so it&#8217;s not that surprising that people actually do hold it to that standard and criticize it when it falls short. Still, that&#8217;s no excuse for exaggeration or outright hallucination.</p>
<p>As I <a href="http://www.buzzmachine.com/2009/09/05/google-bigotry/#comment-400868">commented</a> on Jarvis&#8217;s blog, I don&#8217;t think he&#8217;s the most credible judge of Google&#8217;s critics. He responded in kind. Touché. I accept that exchanging personal attacks doesn&#8217;t advance the argument. Perhaps more detached voices can chime in.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/05/not-all-google-critics-are-bigots/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>John Battelle: &#8220;I don’t know what to ask about&#8221;</title>
		<link>http://thenoisychannel.com/2009/09/05/john-battelle-i-don%e2%80%99t-know-what-to-ask-about/</link>
		<comments>http://thenoisychannel.com/2009/09/05/john-battelle-i-don%e2%80%99t-know-what-to-ask-about/#comments</comments>
		<pubDate>Sat, 05 Sep 2009 13:28:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2517</guid>
		<description><![CDATA[John Battelle has a pair of posts on BingTweets (yes, I know, horrible name) entitled &#8220;Decisions Are Never Easy &#8211; So Far&#8221;. In his second post, he sums up the problem with conventional search engines in a nutshell: &#8220;I don’t know what to ask about&#8221;. His describing the need for a &#8220;decision engine&#8221; is a [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://battellemedia.com/">John Battelle</a> has a pair of posts on BingTweets (yes, I know, horrible name) entitled &#8220;<a href="http://bingtweets.com/ideas/decisions-are-never-easy-so-far/">Decisions Are Never Easy &#8211; So Far</a>&#8221;. In his <a href="http://bingtweets.com/ideas/decisions-are-never-easy-so-far-part-2/">second post</a>, he sums up the problem with conventional search engines in a nutshell: &#8220;I don’t know what to ask about&#8221;. His describing the need for a &#8220;decision engine&#8221; is a bit too obvious a nod to his sponsor, but he is nonetheless right in calling for information seeking support tools based on <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/05/john-battelle-i-don%e2%80%99t-know-what-to-ask-about/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HCIR: Better Than Magic!</title>
		<link>http://thenoisychannel.com/2009/09/04/hcir-better-than-magic/</link>
		<comments>http://thenoisychannel.com/2009/09/04/hcir-better-than-magic/#comments</comments>
		<pubDate>Fri, 04 Sep 2009 14:41:43 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2513</guid>
		<description><![CDATA[I&#8217;m a big fan of using machine learning and automated information extraction to improve search performance and generally support information seeking. I&#8217;ve had some very good experiences with both supervised (e.g., classification) and unsupervised (e.g., terminology extraction) learning approaches, and I think that anyone today who is developing an application to help people access text [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m a big fan of using <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> and automated <a href="http://en.wikipedia.org/wiki/Information_extraction">information extraction</a> to improve search performance and generally support information seeking. I&#8217;ve had some very good experiences with both <a href="http://en.wikipedia.org/wiki/Supervised_learning">supervised</a> (e.g., <a href="http://en.wikipedia.org/wiki/Statistical_classification">classification</a>) and unsupervised (e.g., <a href="http://en.wikipedia.org/wiki/Terminology_extraction">terminology extraction</a>) learning approaches, and I think that anyone today who is developing an application to help people access text documents should at least give serious consideration to both kinds of algorithmic approaches. Sometimes automatic techniques work like magic!</p>
<p>But sometimes they don&#8217;t. <a href="http://netbase.com/">Netbase</a>&#8217;s recent experience with <a href="http://healthbase.netbase.com/">HealthBase</a> is, unfortunately, a case study in why you shouldn&#8217;t have too much faith in magic. As <a href="http://www.searchenginecaffe.com/2009/09/netbase-launches-healthbase-another.html">Jeff Dalton</a> noted, the &#8220;semantic search&#8221; is hit-or-miss. The hits are great, but it&#8217;s the misses that generate headlines like this one in TechCrunch: &#8220;<a href="http://www.techcrunch.com/2009/09/02/netbase-thinks-you-can-get-rid-of-jews-with-alcohol-and-salt/">Netbase Thinks You Can Get Rid Of Jews With Alcohol And Salt&#8221;</a>. Ouch.</p>
<p>It seems unfair to single out Netbase for a problem endemic to fully automated approaches, but they did <a href="http://netbase.com/press-releases/101">invite</a> the publicity. It would be easy to dig up a host of other purely automated approaches that are just as embarrassing, if less publicized.</p>
<p><a href="http://marklogic.blogspot.com/2009/09/netbase-tragicomedy-perils-of-magic-and.html">Dave Kellogg</a> put it well (if a bit melodramatically) when he characterized this experience as a &#8220;tragicomedy&#8221; that reveals the perils of magic. His argument, in a nutshell, is that you don&#8217;t want to be completely dependent on an approach for which 80% accuracy is considered good enough. As he says, the problem with magic is that it can fail in truly spectacular ways.</p>
<p>Granted, there&#8217;s a lot more nuance to using automated content enrichment approaches. Some techniques (or implementations of general techniques) optimize for <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a> (i.e., minimizing false positives), while others optimize for <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a> (i.e., minimizing false negatives). Supervised techniques are generally more conservative than unsupervised ones: you might incorrectly assert that a document is about disease, but that&#8217;s less dramatic a failure than adding the word &#8220;Jews&#8221; to an automatically extracted medical vocabulary. In general, the more human input into the process, the more opportunity to improve the effectiveness and avoid embarrassing mistakes.</p>
<p>Of course, the whole point of automation is to reduce the need for human input. Human labor is a lot more expensive than machine labor! But there&#8217;s a big difference between the mirage of eliminating human labor and the realistic aspiration to make its use more efficient and effective. That&#8217;s what <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval (HCIR)</a> is all about, and all of the evidence I&#8217;ve encountered confirms that it&#8217;s the right way to crack this nut. Look for yourselves at the proceedings of <a href="http://projects.csail.mit.edu/hcir/">HCIR &#8217;07</a> and <a href="http://research.microsoft.com/~ryenw/hcir2008/">&#8217;08</a>. Having just read through all of the submissions to <a href="http://cuaslis.org/hcir2009/">HCIR &#8217;09</a>, I can tell you that the state of the art keeps getting better.</p>
<p>Interestingly, even Google CEO Eric Schmidt may be getting around to drinking the kool-aid. In an <a href="http://www.techcrunch.com/2009/09/03/google-ceo-eric-schmidt-on-the-future-of-search-connect-it-straight-to-your-brain/">interview</a> published today in TechCrunch, he says: &#8220;We have to get from the sort of casual use of asking, querying&#8230;to &#8216;what did you mean?&#8217;.&#8221; Unfortunately, he then goes into science-fiction-AI land and seems to end up suggesting a natural language question-answering approach like <a href="http://www.wolframalpha.com/">Wolfram Alpha</a>. Still, at least his heart is in the right place.</p>
<p>Anyway, as they say, experience is the best teacher. Hopefully Netbase can recover from what could generously be called a public relations hiccup. But, as the aphorism continues, it is only the fool that can learn from no other. Let&#8217;s not be fools&#8211;and instead take away the moral of this story: instead of trying to automate everything, optimize the division of labor between human and machine. <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/04/hcir-better-than-magic/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Another Project to Measure Twitter Influence</title>
		<link>http://thenoisychannel.com/2009/09/03/another-project-to-measure-twitter-influence/</link>
		<comments>http://thenoisychannel.com/2009/09/03/another-project-to-measure-twitter-influence/#comments</comments>
		<pubDate>Thu, 03 Sep 2009 21:23:04 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2510</guid>
		<description><![CDATA[Just noticed that the Web Ecology Project has published &#8220;The Influentials: New Approaches for Analyzing Influence on Twitter&#8221;. The blog post includes a link to their full report.
Their approach strikes me as a generalization of measuring retweets, but perhaps I&#8217;m giving it too cursory a read. I did compare their results to TunkRank: we at [...]]]></description>
			<content:encoded><![CDATA[<p>Just noticed that the Web Ecology Project has published &#8220;<a href="http://www.webecologyproject.org/2009/09/analyzing-influence-on-twitter/">The Influentials: New Approaches for Analyzing Influence on Twitter</a>&#8221;. The blog post includes a link to their <a href="http://www.webecologyproject.org/wp-content/uploads/2009/09/influence-report-final.pdf">full report</a>.</p>
<p>Their approach strikes me as a generalization of measuring retweets, but perhaps I&#8217;m giving it too cursory a read. I did compare their results to <a href="http://tunkrank.com/">TunkRank</a>: we at least agree that <a href="http://tunkrank.com/score/mashable">mashable</a> is more influential than <a href="http://tunkrank.com/score/cnn">CNN</a>&#8211;though even as simple a measure as follower count would confirm that judgment.</p>
<p>Anyway, I am delighted to see serious researchers looking at this problem. I&#8217;m still hoping to investigate hypotheses regarding TunkRank and friend:follower ratios.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/03/another-project-to-measure-twitter-influence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Great Series of Posts on Medical Literature Search</title>
		<link>http://thenoisychannel.com/2009/09/02/great-series-of-posts-on-medical-literature-search/</link>
		<comments>http://thenoisychannel.com/2009/09/02/great-series-of-posts-on-medical-literature-search/#comments</comments>
		<pubDate>Wed, 02 Sep 2009 18:04:00 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2507</guid>
		<description><![CDATA[Gene Golovchinsky at FXPAL has written a great series of posts on medical literature search, specifically looking at how MeSH (Medical Subject Headings) has been used to augment conventional text search, and whether its use improves the overall effectiveness of information seeking.
Here are the posts:

What a tangled MeSH we weave
Open-source queries
Have queries, want answers

Even if [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://palblog.fxpal.com/?author=4">Gene Golovchinsky</a> at <a href="http://palblog.fxpal.com/">FXPAL</a> has written a great series of posts on medical literature search, specifically looking at how <a title="Medical Subject Headings | NCBI" href="http://www.nlm.nih.gov/mesh/meshhome.html" target="_blank">MeSH</a> (<strong>Me</strong>dical <strong>S</strong>ubject <strong>H</strong>eadings) has been used to augment conventional text search, and whether its use improves the overall effectiveness of information seeking.</p>
<p>Here are the posts:</p>
<ul>
<li><a href="http://palblog.fxpal.com/?p=1666">What a tangled MeSH we weave</a></li>
<li><a href="http://palblog.fxpal.com/?p=1710">Open-source queries</a></li>
<li><a href="http://palblog.fxpal.com/?p=1716">Have queries, want answers</a></li>
</ul>
<p>Even if you&#8217;re not specifically interested in medical literature search, I recommend you check these posts out. Much of the interesting work on information seeking is taking place in specialized domains like this one, where the value of getting it right offers far more promising returns than incremental improvements to general web search.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/09/02/great-series-of-posts-on-medical-literature-search/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Finding, Locating, Discovering</title>
		<link>http://thenoisychannel.com/2009/08/31/finding-locating-discovering/</link>
		<comments>http://thenoisychannel.com/2009/08/31/finding-locating-discovering/#comments</comments>
		<pubDate>Mon, 31 Aug 2009 16:10:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2503</guid>
		<description><![CDATA[Thanks to Tony Hollingsworth for alerting me to a post by Alex Campbell entitled &#8220;Stark realisation: I no longer depend on Google to find stuff&#8221;. The title is provocative link bait, but the take-away is very down to earth: Google is more useful for locating information than for discovering it.
Library scientists make a distinction between [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to <a href="http://twitter.com/hollingsworth/statuses/3661598935">Tony Hollingsworth</a> for alerting me to a post by Alex Campbell entitled &#8220;<a href="http://www.alexjcampbell.com/post/175271559/stark-realisation-i-no-longer-depend-on-google-to-find">Stark realisation: I no longer depend on Google to find stuff</a>&#8221;. The title is provocative link bait, but the take-away is very down to earth: Google is more useful for locating information than for discovering it.</p>
<p>Library scientists make a distinction between <a href="http://www.db.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item</a> and <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory</a> search. The former is about locating information: as an information seeker, you know the information exists, and you can even characterize it unambiguously; but the challenge is to convert that description into a location that allows you to retrieve the information. The latter is about discovery: you don&#8217;t know that the information you seek exists, and you may not be sure of how to characterize what you are looking for&#8211;or even know what exactly you want until you&#8217;ve learned something about what is available.</p>
<p>These are extreme points on the information seeking spectrum, and most real-world tasks are in the middle, or combine subtasks of both types. For example, in physical libraries (yes, I&#8217;m that old!), I remember finding a book in the stacks and then browsing the nearby books in the hopes of serendipitous discovery. These days, I&#8217;d be more likely to scan its bibliography&#8211;or to look at the books and articles citing it. A known item can be an excellent entry point for exploration. Conversely, exploration can lead you to discover the existence of information that you then simply need to retrieve.</p>
<p>In common use, words like searching and finding cover this entire spectrum of information seeking activity. This breadth of meaning causes a lot of confusion. I&#8217;ve blogged about this before in &#8220;<a href="http://thenoisychannel.com/2008/12/02/what-is-not-search/">What is (Not) Search?</a>&#8221;:</p>
<blockquote><p>At the very least, I propose that we distinguish “search” as a problem from “search” as a solution. By the former, I mean the problem of <a href="http://en.wikipedia.org/wiki/Information_seeking">information seeking</a>, which is traditionally the domain of <a href="http://en.wikipedia.org/wiki/Library_science">library</a> and <a href="http://en.wikipedia.org/wiki/Information_science">information</a> scientists. By the latter, I mean the approach most commonly associated with <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a>, in which a user enters a query into the system (typically as free text) and the system returns a set of objects that match the query, perhaps with different degrees of relevancy.</p></blockquote>
<p>Back to Campbell&#8217;s article. His main points:</p>
<ul>
<li>Social networks have dramatically expanded our network of contacts.</li>
<li>Search engine optimization (SEO) experts have killed their own game.</li>
<li>The flow of information has changed: information now comes to us, rather than us having to go out and find it.</li>
</ul>
<p>I like the spirit of the post, but I think he overstates his case. <a href="http://en.wikipedia.org/wiki/Search_engine_optimization">SEO</a> isn&#8217;t all bad&#8211;in fact, it&#8217;s probably a key factor in Google&#8217;s effectiveness. And, while social networks enable social search in theory, and information does come to us, we are experiencing <a href="http://web2expo.blip.tv/file/1277460/">filter failure</a> (Clay Shirky&#8217;s term) in a big way.</p>
<p>My conclusion: I agree with him about Google&#8217;s limitations&#8211;Google is primarily a locating tool, not a discovery tool. Unfortunately, I&#8217;m not persuaded that social networks and our theoretical ability to construct an ideal in-flow of information have actually delivered on the promise of more efficient information access. But I&#8217;m optimistic that we&#8217;ll eventually get there.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/31/finding-locating-discovering/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Blogs I Read: Chris Dixon (cdixon.org)</title>
		<link>http://thenoisychannel.com/2009/08/30/blogs-i-read-chris-dixon-cdixon-org/</link>
		<comments>http://thenoisychannel.com/2009/08/30/blogs-i-read-chris-dixon-cdixon-org/#comments</comments>
		<pubDate>Sun, 30 Aug 2009 15:15:46 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Blogs I Read]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2500</guid>
		<description><![CDATA[I&#8217;ve started reading a few different blogs in the past months, and one that I particularly like is Chris Dixon&#8217;s, which has the simple (if uncreative) title cdixon.org.
Chris has an interesting history that includes heading R&#38;D at a hedge fund, co-founding SiteAdvisor, investing in a number of technology companies (including Skype and Postini), and most [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve started reading a few different blogs in the past months, and one that I particularly like is Chris Dixon&#8217;s, which has the simple (if uncreative) title <a href="http://cdixon.org/">cdixon.org</a>.</p>
<p>Chris has an interesting <a href="http://cdixon.org/about.html">history</a> that includes heading R&amp;D at a hedge fund, co-founding <a href="http://www.siteadvisor.com/">SiteAdvisor</a>, investing in a number of technology companies (including <a href="http://skype.com/">Skype</a> and <a href="http://postini.com/">Postini</a>), and most recently co-founding <a href="http://www.hunch.com/">Hunch</a> (which I&#8217;ve blogged about <a href="http://thenoisychannel.com/?s=hunch">here</a> a few times). As a karaoke junkie, I can&#8217;t help noting that he developed the software that became <a href="http://ksolo.myspace.com/">MySpace Karaoke</a>.</p>
<p>Not surprisingly, Chris brings the combined perspective of an investor and a technologist to his blog. Here are some examples of recent posts that illustrate his range.</p>
<p>Thoughts on machine learning:</p>
<ul>
<li><a href="http://www.cdixon.org/?p=340">To make smarter systems, it’s all about the data</a></li>
<li><a href="http://www.cdixon.org/?p=342"> Machine learning is really good at partially solving just about any problem</a></li>
</ul>
<p>Career advice for entrepreneurs:</p>
<ul>
<li><a href="http://www.cdixon.org/?p=363">The worst time to join a startup is right after it gets initial VC financing</a></li>
<li><a href="http://www.cdixon.org/?p=338">Why you shouldn’t keep your startup idea secret</a></li>
</ul>
<p>And of course he occasionally <a href="http://www.google.com/search?q=site%3Acdixon.org+hunch">blogs about Hunch</a>, his current venture.</p>
<p>Chris has a strong personality that comes through as a blogger. I think that&#8217;s critical for making a blog both informative and entertaining, and I try to channel my own personality (which I&#8217;m told, for better or worse, is quite distinctive) through this blog.</p>
<p>In short, check out <a href="http://cdixon.org/">cdixon.org</a> if you&#8217;re interested in the perspective of a practical (and successful) technologist-entrepreneur.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/30/blogs-i-read-chris-dixon-cdixon-org/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Free as in Freebase</title>
		<link>http://thenoisychannel.com/2009/08/29/free-as-in-freebase/</link>
		<comments>http://thenoisychannel.com/2009/08/29/free-as-in-freebase/#comments</comments>
		<pubDate>Sat, 29 Aug 2009 18:25:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2488</guid>
		<description><![CDATA[



It&#8217;s been a while since I&#8217;ve blogged about Freebase, the semantic web database maintained by Metaweb. But I recently had the chance to meet Freebasers Robert Cook and Jamie Taylor and hear them present to the New York Semantic Web Meetup on &#8220;Content, Identifiers and Freebase&#8221; (slides embedded above).
It was a fun and informative presentation. [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_1921800" style="width: 425px; text-align: left;"><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=nycmeetup-jt-aug09-090828181330-phpapp02&amp;stripped_title=nyc-semantic-web-meetup-aug-2009" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=nycmeetup-jt-aug09-090828181330-phpapp02&amp;stripped_title=nyc-semantic-web-meetup-aug-2009" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<div style="width: 425px; text-align: left;"></div>
<div id="__ss_1921800" style="width: 425px; text-align: left;">
<p>
It&#8217;s been a while since I&#8217;ve blogged about <a href="http://www.freebase.com/">Freebase</a>, the <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web</a> database maintained by <a href="http://www.metaweb.com/">Metaweb</a>. But I recently had the chance to meet Freebasers <a href="http://www.freebase.com/view/en/robert_cook">Robert Cook</a> and <a href="http://www.freebase.com/view/en/jamie_taylor">Jamie Taylor</a> and hear them present to the <a href="http://semweb.meetup.com/25/">New York Semantic Web Meetup</a> on &#8220;<a href="http://semweb.meetup.com/25/calendar/10966857/">Content, Identifiers and Freebase</a>&#8221; (slides embedded above).</p>
<p>It was a fun and informative presentation. Perhaps the most surprising revelation about Freebase was that all of their data fits in RAM on a 32G box (yes, some of you caught me <a href="http://twitter.com/dtunkelang/status/3590696944">live-tweeting</a> that during the presentation). Their biggest challenge is collecting good data that lends itself to the <a href="http://blog.freebase.com/2008/05/13/new-api-service-reconciliation/">reconciliation</a> needed to make Freebase useful as a data repository. Despite the lack of a near-term revenue model, the Freebasers are bullish about their approach: strong identifiers, strong semantics, open data. On the last point, almost all of Freebase is available under the <a href="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution License (CC-BY)</a>&#8211;which, as far as I can tell, makes anyone free to develop a mirror of Freebase. Indeed, many people are using this data, including <a href="http://newstimeline.googlelabs.com/">Google</a> and <a href="http://blog.freebase.com/2009/07/13/bing-structured-search-results-powered-by-freebase/">Bing</a>.</p>
<p>You might wonder whether Freebase is a business or a non-profit foundation&#8211;and the question did come up. The answer is that Freebase eventually expects to make money by providing services, e.g., helping advertisers. They see their <a href="http://en.wikipedia.org/wiki/Triplestore">graph store</a> as a competitive advantage&#8211;but they freely admit that this advantage will erode over time. Indeed, the surprisingly small size of their graph makes me wonder how much speed and scalability matter, compared to the challenge of data scarcity.</p>
<p>I&#8217;d like to see Freebase succeed. I&#8217;m particularly a fan of the work <a href="http://davidhuynh.net/">David Huynh</a> has done there on interfaces for semantic web browsing. Clearly their investors are true believers&#8211;Metaweb has raised a <a href="http://www.crunchbase.com/company/metawebtechnologies">total of $57M in funding</a>. I don&#8217;t quite get it, but I&#8217;m happy we can all benefit from the results.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/29/free-as-in-freebase/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Social Networking: Theory and Practice</title>
		<link>http://thenoisychannel.com/2009/08/25/social-networking-theory-and-practice/</link>
		<comments>http://thenoisychannel.com/2009/08/25/social-networking-theory-and-practice/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 13:45:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2483</guid>
		<description><![CDATA[I&#8217;ve been a student of social network theory for years, enjoying the work of Duncan Watts, Albert-László Barabási, Jon Kleinberg, and a number of other researchers investigating this field. It should be no surprise that a topic that is so core to our humanity has attracted attention from some of our best and brightest.
And I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been a student of social network theory for years, enjoying the work of <a href="http://en.wikipedia.org/wiki/Duncan_J._Watts">Duncan Watts</a>, <a href="http://www.nd.edu/~alb/">Albert-László Barabási</a>, <a href="http://www.cs.cornell.edu/home/kleinber/">Jon Kleinberg</a>, and a number of other researchers investigating this field. It should be no surprise that a topic that is so core to our humanity has attracted attention from some of our best and brightest.</p>
<p>And I&#8217;ve dabbled a bit on the theoretical side myself. The <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a> measure (I&#8217;m indebted to <a href="http://twitter.com/ealdent">Jason Adams</a> for his <a href="http://tunkrank.com/">implementing it on a live site</a>!) attempts to take the most basic assumption about our social behavior&#8211;the constraint that we have a finite attention budget&#8211;and explore its implications for influence over social networks. I have a few unexplored hypotheses queued up for when I can find the spare time to try to validate them empirically!</p>
<p>But why settle for theory? We live in an age where social networks compete with web search (and perhaps complement search) as the hottest online technologies. If we&#8217;re not reading about Google vs. Bing, we&#8217;re reading about Facebook vs. Twitter, with LinkedIn offering a third way that seems to co-exist with its more storied peers. In this post, I&#8217;d like to focus on LinkedIn.</p>
<p>LinkedIn, despite its feature creep, is still fairly old-school: its raison d&#8217;être is for users to build, maintain, and exploit their professional networks. In theory, connections on LinkedIn represent present or past working relationships that become the basis for referrals&#8211;whether the goal is employment, sales, or partnership. LinkedIn is not the only professionally oriented social network, but at this point it&#8217;s certainly the dominant one.</p>
<p>But I&#8217;ve found at least two additional ways to use LinkedIn that I&#8217;d like to share:</p>
<p><strong>Intelligence gathering</strong>. For reasons I don&#8217;t yet claim to understand, people share far more information about themselves&#8211;and in a much cleaner, structured form&#8211;on LinkedIn than in perhaps any other online medium. Most people&#8217;s resumes are not available online, but their LinkedIn profiles are tantamount to resumes. Moreover, their structured format makes it possible for LinkedIn to assemble aggregate profiles of companies, revealing composite pictures that must drive some of those companies&#8217; legal and HR departments batty! At a higher level, LinkedIn also works well as a discovery tool&#8211;much more so now they&#8217;ve enabled faceted search. It&#8217;s still a bit tricky to explore people and companies by topic, but far more effective using LinkedIn than using any other tool I&#8217;m aware of.</p>
<p><strong>Meeting new people</strong>. Cold-calling, spamming&#8211;pick your poison. In short, LinkedIn doesn&#8217;t have to only be about connecting with people you already know. But there&#8217;s an art to sending unsolicited messages: you have to pass the moral equivalent of a <a href="http://en.wikipedia.org/wiki/CAPTCHA">CAPTCHA</a> by proving that your communication strategy isn&#8217;t indiscriminate. Let me use a personal example (that Maisha Walker was nice enough to write up in her <a href="http://blog.inc.com/e-commerce/2009/08/linkedin_small_business_success.html">Inc. magazine column</a>). I decided that I wanted to find everyone on LinkedIn who might be interested in <a href="http://cuaslis.org/hcir2009/">HCIR &#8217;09</a>. So I searched for everyone whose profiles indicated interests in both <a href="http://en.wikipedia.org/wiki/Information_retrieval">IR</a> and <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_interaction">HCI</a> and sent out a targeted message (in fact, an invite with a personalized message&#8211;a feature I recently <a href="http://thenoisychannel.com/2009/08/18/linkedin-no-longer-allowing-invite-messages/">feared they&#8217;d killed</a>). The results were overwhelmingly positive. I&#8217;m not sure how many of the people I contacted will attend, but I raised awareness without inflicting annoyance. Better yet, one of the people I contacted then discovered I <a href="http://thenoisychannel.com/2009/04/17/booking-it-to-the-finish-line/">was looking for volunteers</a> to review the draft of my <a href="http://thenoisychannel.com/faceted-search-the-book/">book</a>&#8211;and I thus obtained hours of help from someone who, just a day before, had never heard of me!</p>
<p>What intrigues me about LinkedIn (and other social networks) is the extent to which I am exploiting attention market inefficiencies (as LinkedIn may be doing as well). For example, LinkedIn makes it easy to send unsolicited invitations to anyone. Granted, you can lose this privilege by having even a couple of people respond to invitations with &#8220;I don&#8217;t know this person&#8221;. There&#8217;s also the question of why people&#8217;s social norms around disclosure are so different on LinkedIn than anywhere else&#8211;people not only post the content of their resumes, but go through the effort of providing it to LinkedIn in a structured form! Meanwhile, LinkedIn keeps tightfisted control over the information it aggregates&#8211;understandably, they recognize that this content is their most valuable asset.</p>
<p>People are still getting used to the idea of social networks. It will be interesting to see how their use evolves, particularly in terms of information and attention market efficiency.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/25/social-networking-theory-and-practice/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Payola? There&#8217;s An App For That!</title>
		<link>http://thenoisychannel.com/2009/08/22/payola-theres-an-app-for-that/</link>
		<comments>http://thenoisychannel.com/2009/08/22/payola-theres-an-app-for-that/#comments</comments>
		<pubDate>Sat, 22 Aug 2009 21:52:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2479</guid>
		<description><![CDATA[Remember a few months ago when there was a scandal about a Belkin employee paying people $0.65 per review to post 5-star reviews to Amazon?
Well, that was child&#8217;s play compared to what PR firm Reverb Communications has allegedly been doing for its clients. According to Gagan Biyani at TechCrunch, Reverb hired interns to post positive [...]]]></description>
			<content:encoded><![CDATA[<p>Remember a few months ago when there was a scandal about a Belkin employee paying people <a href="http://thenoisychannel.com/2009/01/17/sell-your-integrity-for-065/">$0.65 per review</a> to post 5-star reviews to Amazon?</p>
<p>Well, that was child&#8217;s play compared to what PR firm Reverb Communications has allegedly been doing for its clients. According to Gagan Biyani at <a href="http://www.mobilecrunch.com/2009/08/22/cheating-the-app-store-pr-firm-has-interns-post-positive-reviews-for-clients/">TechCrunch</a>, Reverb hired interns to post positive reviews to Apple&#8217;s App Store for clients. Indeed, TechCrunch posted documentation obtained through an anonymous tipster, including the following:</p>
<blockquote><p>Reverb employs a small team of interns who are focused on managing online message boards, writing influential game reviews, and keeping a gauge on the online communities. Reverb uses the interns as a sounding board to understand the new mediums where consumers are learning about products, hearing about hot new games and listen to the thoughts of our targeted audience. Reverb will use these interns on Developer Y products to post game reviews (written by Reverb staff members) ensuring the majority of the reviews will have the key messaging and talking points developed by the Reverb PR/marketing team.</p></blockquote>
<p>What makes this story especially newsworthy is that Reverb&#8217;s client list includes some big names, such as Harmonix (i.e., Guitar Hero and Rock Band) and MTV Games.</p>
<p>Apparently the reviewer system isn&#8217;t entirely anonymous, so Biyani was able to look for patterns:</p>
<blockquote><p>iTunes allows you to see other reviews posted by the same reviewer. So, we clicked on the reviewer “Vegas Bound” (<a href="http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewUsersUserReviews?dsid=173638052">iTunes link</a>) and started to look at his reviews. He reviewed 7 applications, and gave each one of them 5 stars. Each review was short and sweet, and extremely positive. These reviews represented 6 different developers. A quick Google search revealed an infuriating truth: every <em>single one of these developers</em> was a client of one PR firm: Reverb Communications.</p></blockquote>
<p>I can only hope that scandals like these will cause people to be more skeptical of reviews (or opinions in general) that come from anonymous or obfuscated sources. While most reviews are probably sincere, it doesn&#8217;t take much to erode public trust. Moreover, a few shill reviews can attract attention to a product, thus leading legitimate reviews to follow afterward. Where&#8217;s the harm? Products without those shill reviews are starved of the attention they might deserve. Money substitutes for authentic endorsement.</p>
<p>Our brave new world of social media makes it possible to truly democratize the sharing of knowledge and opinions. But gaming the system like this erodes the trust that is essential for this process to work&#8211;and thus devalues all of the information available to us online. The key enabler of such gaming is anonymity. Fortunately the miscreants do get caught on occasion. Hopefully we will learn from this experience and build more robust systems that aren&#8217;t so easily gamed. <a href="http://thenoisychannel.com/2009/04/13/transparency-or-fail/">Transparency or FAIL</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/22/payola-theres-an-app-for-that/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UIE Virtual Seminar on Faceted Search: A Great Experience!</title>
		<link>http://thenoisychannel.com/2009/08/20/uie-virtual-seminar-on-faceted-search-a-great-experience/</link>
		<comments>http://thenoisychannel.com/2009/08/20/uie-virtual-seminar-on-faceted-search-a-great-experience/#comments</comments>
		<pubDate>Fri, 21 Aug 2009 00:21:33 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2475</guid>
		<description><![CDATA[Pete Bell and I delivered the seminar today, and it was a blast! We had over 150 registered listeners&#8211;and I found out that at least one of those registrations corresponded to a roomful of 20 people at an online retailer that is a thought leader in web usability and design!
Since we didn&#8217;t manage to get [...]]]></description>
			<content:encoded><![CDATA[<p>Pete Bell and I delivered the <a href="http://www.uie.com/events/virtual_seminars/facets/">seminar</a> today, and it was a blast! We had over 150 registered listeners&#8211;and I found out that at least one of those registrations corresponded to a roomful of 20 people at an online retailer that is a thought leader in web usability and design!</p>
<p>Since we didn&#8217;t manage to get to all of the questions (over 40&#8211;possibly over 50 counting the activity on <a href="http://search.twitter.com/search?q=%23uievs">Twitter</a>!), we&#8217;re going to do a follow-up podcast that will be available even to people who didn&#8217;t attend the seminar. And, since even that might not be enough, I&#8217;m saving all of the questions as blog fodder.</p>
<p>To all who attended&#8211;and to Jared, Adam, and all the folks of UIE&#8211;thanks from me and Pete for giving us this great opportunity to connect with folks interested in faceted search and user experience.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/20/uie-virtual-seminar-on-faceted-search-a-great-experience/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google Search Appliance: Now Without HCIR!</title>
		<link>http://thenoisychannel.com/2009/08/20/google-search-appliance-now-without-hcir/</link>
		<comments>http://thenoisychannel.com/2009/08/20/google-search-appliance-now-without-hcir/#comments</comments>
		<pubDate>Thu, 20 Aug 2009 23:52:19 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2468</guid>
		<description><![CDATA[In an earlier post, I speculated about why Google is holding back on faceted search. Of course, I was talking about their web search properties, not their enterprise offerings. I thought that they&#8217;d seen the light by now that faceted search&#8211;and HCIR in general&#8211;is especially important in the enterprise, where you can&#8217;t rely on PageRank, [...]]]></description>
			<content:encoded><![CDATA[<p>In an earlier post, I speculated about <a href="http://thenoisychannel.com/2009/08/14/why-does-google-hold-back-on-faceted-search/">why Google is holding back on faceted search</a>. Of course, I was talking about their web search properties, not their enterprise offerings. I thought that they&#8217;d seen the light by now that <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>&#8211;and <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> in general&#8211;is especially important in the enterprise, where you can&#8217;t rely on <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a>, <a href="http://en.wikipedia.org/wiki/Anchor_text">anchor text</a>, and <a href="http://en.wikipedia.org/wiki/Search_engine_optimization">SEO</a>&#8211;not to mention the large fraction of navigational and straight-to-Wikipedia queries.</p>
<p>But I was wrong. Don&#8217;t take it from me&#8211;watch the video below (or read this <a href="http://googleenterprise.blogspot.com/2009/08/compare-enterprise-search-relevance.html">blog post</a>) and listen to what Cyrus Mistry, the product manager for the Google Search Appliance, has to say. I might give him a pass on his dubious conflation of all features other than ranked retrieval with &#8220;advanced search&#8221;. But here&#8217;s a direct quote: &#8220;users care about one thing: the right result coming to the top&#8221;.</p>
<p>Sigh. I don&#8217;t dismiss the value of relevance ranking. Some search queries are easy and clearly point to single documents as answers&#8211;and any search engine should do well on them. But lots of queries in site search and enterprise search environments (more so than on the web) don&#8217;t have a single best answer. That&#8217;s why we have faceted search and interfaces that offer useful <a href="http://en.wikipedia.org/wiki/Information_foraging">information scent</a> to users.</p>
<p>I understand that Google is, on the whole, HCIR-averse. But I expect more from their enterprise division. To be clear, the &#8220;side by side&#8221; feature that Mistry touts is nice. It reminds me of <a href="http://blindsearch.fejus.com/">Blind Search</a> (built by a Microsoft employee in his spare time), and of a relevance ranking evaluator that <a href="http://endeca.com/">Endeca</a> customers have been using for years.</p>
<p>But there&#8217;s more to search results than ten blue links. Even the Google web folks seem to be slouching towards <a href="http://thenoisychannel.com/2009/05/12/is-google-diving-head-first-into-hcir/">accepting the importance of interaction</a>. Their enterprise team should be leading, not lagging.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="451" height="275" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/XeG3-n9u-3c&amp;hl=en&amp;fs=1&amp;" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="451" height="275" src="http://www.youtube.com/v/XeG3-n9u-3c&amp;hl=en&amp;fs=1&amp;" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/20/google-search-appliance-now-without-hcir/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>LinkedIn No Longer Allowing Invite Messages?</title>
		<link>http://thenoisychannel.com/2009/08/18/linkedin-no-longer-allowing-invite-messages/</link>
		<comments>http://thenoisychannel.com/2009/08/18/linkedin-no-longer-allowing-invite-messages/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 19:30:13 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2461</guid>
		<description><![CDATA[I noticed recently that, when I sent out an invitation to connect to someone on LinkedIn, there wasn&#8217;t the usual slot for including a free-text note with the invitation. I thought it might be a glitch&#8211;and I even considered the possibility that this was only happening to my account because I&#8217;m a bit of a [...]]]></description>
			<content:encoded><![CDATA[<p>I noticed recently that, when I sent out an invitation to connect to someone on LinkedIn, there wasn&#8217;t the usual slot for including a free-text note with the invitation. I thought it might be a glitch&#8211;and I even considered the possibility that this was only happening to my account because I&#8217;m a bit of a networking junkie.</p>
<p>But I noticed <a href="http://twitter.com/Mr_Linkedin/statuses/3360784769">on Twitter today</a> that Mark Williams (aka <a href="http://twitter.com/Mr_Linkedin">@Mr_LinkedIn</a>) had noticed the same change and followed up on it with LinkedIn&#8217;s customer service department. I never assume any site behavior on a freely provided service is permanent, but it is starting to look like this is a deliberate decision and not a transient bug.</p>
<p>If so, it&#8217;s an annoying change, though I can see the merits. I&#8217;ve made heavy use of the connection message, especially when inviting someone I don&#8217;t know all that well&#8211;or don&#8217;t know at all. A personal message can be what distinguishes a welcome cold call from spam. But I&#8217;m guessing that others have abused that capability, filling it with spam or worse. Still, I feel like LinkedIn may be throwing the baby out with the bathwater. Will follow up if / when I hear more.</p>
<p><strong>UPDATE: Just saw this message on the <a href="http://linkedin.custhelp.com/cgi-bin/linkedin.cfg/php/enduser/std_adp.php?p_faqid=2162">LinkedIn</a> site via <a href="http://twitter.com/LinkedIn/status/3368903249">Twitter</a>:</strong></p>
<blockquote><p><strong>Unable to Personalize Invitation Message</strong></p>
<div id="questiontext">
<div id="desc">Why can&#8217;t I personalize the message in my Invitation?</div>
</div>
<p>We are aware of an issue preventing some members from customizing their Invitation messages. There is no need to contact Customer Service as our team is reviewing the issue to determine the best overall solution.</p>
<p>As a temporary workaround, the following message (with your name in the signature) is being sent when you click on the &#8216;Send Invitation&#8217; button: &#8216;I&#8217;d like to add you to my professional network on LinkedIn.&#8217;</p>
<p>As long as you approve of this message, you may continue to take advantage of this feature. If you prefer a more customized message to be sent, you may delay sending your Invitations until the functionality has been restored.</p></blockquote>
<p><strong>UPDATE #2: Looks like the problem is resolved.</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/18/linkedin-no-longer-allowing-invite-messages/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Prediction Is Hard, Especially About The Future</title>
		<link>http://thenoisychannel.com/2009/08/18/prediction-is-hard-especially-about-the-future/</link>
		<comments>http://thenoisychannel.com/2009/08/18/prediction-is-hard-especially-about-the-future/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 15:05:10 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2457</guid>
		<description><![CDATA[That Niels Bohr certainly knew what he was talking about! But that hasn&#8217;t discouraged folks in any number of industries from trying to make predictions.
Google in particular has been researching the predictability of search trends (just to be fair and balanced, so have Bing and Yahoo). Yossi Matias, Niv Efron, and Yair Shimshoni at Google [...]]]></description>
			<content:encoded><![CDATA[<p>That <a href="http://www.brainyquote.com/quotes/quotes/n/nielsbohr130288.html">Niels Bohr</a> certainly knew what he was talking about! But that hasn&#8217;t discouraged folks in any number of industries from trying to make predictions.</p>
<p>Google in particular has been researching the <a href="http://googleresearch.blogspot.com/2009/08/on-predictability-of-search-trends.html">predictability of search trends</a> (just to be fair and balanced, so have <a href="http://thenoisychannel.com/2009/08/02/sigir-2009-day-3-industry-track-nick-craswell/">Bing</a> and <a href="http://www.dsi.uniroma1.it/~tfws07/program/talks/plachouras.pdf">Yahoo</a>). Yossi Matias, Niv Efron, and Yair Shimshoni at Google Labs Israel have made some fascinating observations based on Google Trends, including the following:</p>
<ul>
<li>Over half of the most popular Google search queries are predictable in a 12 month ahead forecast, with a mean absolute prediction error of about 12%.</li>
<li>Nearly half of the most popular queries are not predictable (with respect to the model we have used).</li>
<li>Some categories have particularly high fraction of predictable queries; for instance, Health (74%), Food &amp; Drink (67%) and Travel (65%).</li>
<li>Some categories have particularly low fraction of predictable queries; for instance, Entertainment (35%) and Social Networks &amp; Online Communities (27%).</li>
<li>The trends of aggregated queries per categories are much more predictable: 88% of the aggregated category search trends of over 600 categories in Insights for Search are predictable, with a mean absolute prediction error of less than 6%.</li>
</ul>
<p>You can read their full 32-page paper <a href="http://research.google.com/archive/google_trends_predictability.pdf">here</a>.</p>
<p>I&#8217;m not surprised at the predictability of human search behavior, especially for stable topics or even for unstable ones viewed as aggregates&#8211;one could argue the celebrities and scandals du jour are unpredictable but interchangeable. What I&#8217;m curious about is what we can do with this predictability.</p>
<p>In the <a href="http://thenoisychannel.com/2009/07/26/sigir-2009-day-2-interactive-search-session/">SIGIR &#8217;09 session on Interactive Search</a>, Peter Bailey talked about “<a href="http://research.microsoft.com/en-us/um/people/ryenw/papers/whitesigir2009.pdf">Predicting User Interests from Contextual Information</a>”, analyzing the predictive performance of contextual information sources (interaction, task, collection, social, historic) for different temporal durations. Max Van Kleek wrote a nice <a href="http://groups.csail.mit.edu/haystack/blog/2009/07/24/sigir09-predicting-user-interests-from-contextual-information/">summary</a> of the talk at the Haystack blog. The paper doesn&#8217;t investigate seasonality (perhaps because they only looked at four months of data), but I&#8217;d imagine they would subsume it under the broader categories of historic and social context. But they do set a clear goal:</p>
<blockquote><p>Postquery navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity&#8230;Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.</p></blockquote>
<p>I hope Google is following a similar agenda. If you&#8217;re going to go through the trouble of predicting the future, then help make it a better one for users!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/18/prediction-is-hard-especially-about-the-future/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Last Chance to Register for UIE Virtual Seminar on Faceted Search!</title>
		<link>http://thenoisychannel.com/2009/08/18/last-chance-to-register-for-uie-virtual-seminar-on-faceted-search/</link>
		<comments>http://thenoisychannel.com/2009/08/18/last-chance-to-register-for-uie-virtual-seminar-on-faceted-search/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 14:15:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2453</guid>
		<description><![CDATA[My colleague, Endeca co-founder Pete Bell, and I are giving a virtual seminar on faceted search for User Interface Engineering (UIE) this Thursday, August 20th at 1:30PM EST. We&#8217;ve heard that there are over a hundred sign-ups already&#8211;which may actually correspond to more people, since a sign-up may mean a group of people watching in [...]]]></description>
			<content:encoded><![CDATA[<p>My colleague, Endeca co-founder Pete Bell, and I are giving a <a href="http://www.uie.com/events/virtual_seminars/facets/">virtual seminar on faceted search</a> for User Interface Engineering (UIE) this Thursday, August 20th at 1:30PM EST. We&#8217;ve heard that there are over a hundred sign-ups already&#8211;which may actually correspond to more people, since a sign-up may mean a group of people watching in a conference room. We&#8217;re very excited about the opportunity to share our insights on a topic that draws such interest.</p>
<p><span><a href="http://www.uie.com/about/">Jared Spool</a>, who invited us to give this seminar, will be moderating. Independent of the seminar, you should check out his work </span><span>(and t</span><span>he <a href="http://www.uie.com/">UIE</a> site</span><span>) if you are interested in web usability.</span></p>
<p>The regular price is $129, but Noisy Channel readers who are interested in attending can get a $30 discount by using <span id="msgtxt3181097536">TUNKELANG (yes, all caps) as a promo code. Attendees also receive a free copy of my book, <a href="http://www.amazon.com/exec/obidos/ASIN/1598299999/"><em>Faceted Search</em></a>. That&#8217;s a total value of over $150 for just $99! And it slices and dices!</span></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/18/last-chance-to-register-for-uie-virtual-seminar-on-faceted-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Raging Debate Over The Link Economy</title>
		<link>http://thenoisychannel.com/2009/08/16/the-raging-debate-over-the-link-economy/</link>
		<comments>http://thenoisychannel.com/2009/08/16/the-raging-debate-over-the-link-economy/#comments</comments>
		<pubDate>Sun, 16 Aug 2009 20:16:03 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2450</guid>
		<description><![CDATA[Arnon Mishkin wrote a post last Thursday on paidContent called &#8220;The Fallacy Of The Link Economy&#8221; that has been generating a lot of discussion, so I figured I&#8217;d join in the free-for-all. First, let me try to reduce each person&#8217;s argument to a direct quote that best sums up his position.
Arnon Mishkin:
The vast majority of [...]]]></description>
			<content:encoded><![CDATA[<p>Arnon Mishkin wrote a post last Thursday on paidContent called &#8220;<a href="http://paidcontent.org/article/419-the-fallacy-of-the-link-economy/">The Fallacy Of The Link Economy</a>&#8221; that has been generating a lot of discussion, so I figured I&#8217;d join in the free-for-all. First, let me try to reduce each person&#8217;s argument to a direct quote that best sums up his position.</p>
<p><a href="http://paidcontent.org/article/419-the-fallacy-of-the-link-economy/">Arnon Mishkin</a>:</p>
<blockquote><p>The vast majority of the value gets captured by aggregators linking and scraping rather than by the news organizations that get linked and scraped.</p></blockquote>
<p><a href="http://www.buzzmachine.com/2009/08/14/on-the-link-economy/">Jeff Jarvis</a>:</p>
<blockquote><p>Links are worth what the recipient makes of them.</p></blockquote>
<p><a href="http://techdirt.com/articles/20090813/1841085874.shtml">Mike Masnick</a>:</p>
<blockquote><p>It&#8217;s not the link alone that has value or the story alone that has value, but the overall process of building a community.</p></blockquote>
<p><a href="http://www.techcrunch.com/2009/08/16/the-media-bundle-is-dead-long-live-the-news-aggregators/">Erick Schonfeld</a>:</p>
<blockquote><p>If a news site or a blog can say enough interesting things enough times that news aggregators (or other sites) keep linking to them, then they can build up their brand and reader loyalty.</p></blockquote>
<p>Sigh. I thought the health care debate was bad enough, but I suppose that almost all impassioned debates come down to opposing sides exchanging half-truths.</p>
<p>In Mishkin&#8217;s defense: news organizations are in a <a href="http://en.wikipedia.org/wiki/Catch-22_%28logic%29">catch-22</a>. Many have suggested that if a news organization doesn&#8217;t want its content showing up on aggregators&#8217; sites, it simply has to modify <a href="http://en.wikipedia.org/wiki/Robots_exclusion_standard">robots.txt</a> accordingly. But news organizations can only do so individually&#8211;which puts them in a <a href="http://en.wikipedia.org/wiki/Prisoner%27s_dilemma">prisoner&#8217;s dilemma</a>. Anti-trust law prevents news organizations from collectively bargaining with those who aggregate their content. For all intents and purposes, they are forced to abide by the status quo.</p>
<p>In Jarvis&#8217;s defense (yes, I&#8217;m actually defending Jeff Jarvis!): there isn&#8217;t much point in producing content for which most of the value is captured in a teaser so small as to be covered under fair use rights. As he&#8217;s said <a href="http://www.buzzmachine.com/2009/05/12/getting-past-the-past/">elsewhere</a>, newspapers are inefficient, and the industry will have to shrink a lot to be healthy.</p>
<p>In Masnick&#8217;s defense: I cite my own blog post (also inspired by one of his posts) about monetizing community because <a href="http://thenoisychannel.com/2009/03/03/community-copy-protection/">participation is inherently uncopiable</a>. It&#8217;s hard for me to agree with him more strongly than that!</p>
<p>In Schonfeld&#8217;s defense: his argument sounds a lot like the &#8220;<a href="http://en.wikipedia.org/wiki/Freemium">freemium</a>&#8221; strategy, which has a respectable track record. In order to build a loyal customer base, you often need to give away free trials as teasers&#8211;and that&#8217;s effectively what happens when media sites make some of their content available through aggregators. And, as in the freemium model, the actual product has to be significantly more interesting than the free teaser to earn the consumer&#8217;s investment&#8211;whether that investment is in the form of money, attention, or loyalty.</p>
<p>So, do I agree with them all? Not exactly. Mishkin&#8217;s first prescription to news organizations should probably be to cut investment in undifferentiated content. Jarvis should acknowledge that the inability of news organizations to collectively bargain is unfair to them. Masnick&#8211;well, I basically do agree with him on the limited point he&#8217;s making. I suppose the strongest objection would be that not all media sites should be forced to become communities just because they&#8217;re hobbled in their ability to negotiate the monetization of the content they produce. And Schonfeld&#8217;s argument assumes the current link economy as a given&#8211;and one of the biggest points of contention is whether news organizations should be allowed to try to change that economy.</p>
<p>Sadly, I don&#8217;t see any of these guys giving the others an inch, which is why this discussion will probably continue unchanged for the foreseeable future. Hopefully the passion of the debate helps sell, um, papers.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/16/the-raging-debate-over-the-link-economy/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Why Does Google Hold Back On Faceted Search?</title>
		<link>http://thenoisychannel.com/2009/08/14/why-does-google-hold-back-on-faceted-search/</link>
		<comments>http://thenoisychannel.com/2009/08/14/why-does-google-hold-back-on-faceted-search/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 21:24:00 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2446</guid>
		<description><![CDATA[Sometimes the response to a comment is worthy of an entire post, and this is one of those times. In response to my recent post about Able Grape, a wine search engine developed by Doug Cook (now Director of Twitter Search), Lee asked:
Let&#8217;s say I know almost nothing about wines/digital cameras/cars and a search site [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes the response to a comment is worthy of an entire post, and this is one of those times. In response to my recent <a href="http://thenoisychannel.com/2009/08/13/an-able-grape-at-the-helm-of-twitter-search/">post about Able Grape</a>, a wine search engine developed by Doug Cook (now Director of Twitter Search), <a href="http://thenoisychannel.com/2009/08/13/an-able-grape-at-the-helm-of-twitter-search/#comment-4174">Lee asked</a>:</p>
<blockquote><p>Let&#8217;s say I know almost nothing about wines/digital cameras/cars and a search site offers me &#8220;options&#8221; to drill down. However, I can&#8217;t use those effectively and eventually it comes down to availability and price for me. My questions are what are your thoughts on these kinds of situations and is there a scientific explanation/theory on this case?</p>
<p>This may be why Google does not endorse faceted search except for experimental projects.</p></blockquote>
<p>It&#8217;s a great question. There&#8217;s been a lot of research on how people make decisions when they have to manage trade-offs among multiple attributes, and the increasing interest in <a href="http://en.wikipedia.org/wiki/Behavioral_economics">behavioral economics</a> since <a href="http://en.wikipedia.org/wiki/Daniel_Kahneman">Daniel Kahneman</a> won the Nobel Prize in 2002 has helped some of that research percolate into the mainstream, thanks to bestsellers like <a href="http://en.wikipedia.org/wiki/Freakonomics"><em>Freakonomics</em></a> and Dan Ariely&#8217;s <a href="http://www.predictablyirrational.com/"><em>Predictably Irrational</em></a>.</p>
<p>The short answer is that there&#8217;s no point in offering users options that they can&#8217;t (or won&#8217;t) use effectively. <a href="http://en.wikipedia.org/wiki/The_Paradox_of_Choice:_Why_More_Is_Less">Choice overload</a> is certainly a problem, and our reaction to it is to <a href="http://en.wikipedia.org/wiki/Satisficing">satisfice</a>, typically resorting to &#8220;<a href="http://fastandfrugal.com/">fast and frugal</a>&#8221; heuristics that throw out most of the potential decision criteria and instead focus on one or two attributes, e.g., price and availability.</p>
<p>But that&#8217;s no reason to dumb down the data we make available to decision makers. We make hard choices all the time, and fast and frugal can be horrendously suboptimal. We don&#8217;t hire employees based solely on their price and availability&#8211;or at least good employers don&#8217;t! For that matter, I don&#8217;t think most people pick wines that way, given that even Trader Joe&#8217;s has to diversify beyond &#8220;<a href="http://en.wikipedia.org/wiki/Charles_Shaw_wine">Two Buck Chuck</a>&#8221;. And, while there&#8217;s probably more of a market for cheap cameras and cars, I&#8217;m pretty sure you&#8217;re an extreme outlier if you completely ignore other criteria.</p>
<p>That said, there are some caveats about exposing options to users. <a href="http://en.wikipedia.org/wiki/Faceted_search">Faceted search</a> is hard, especially on the open web. Take it <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">from the folks at Microsoft Research</a>&#8211;but I&#8217;m sure Googlers would be the first to agree, especially given their experience with projects like <a href="http://thenoisychannel.com/2009/06/04/google-squared-a-great-first-step/">Google Squared</a> that, while promising, are nowhere near ready for prime time.</p>
<p>I appreciate that Google is conservative about embracing faceted search&#8211;and <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> in general. I&#8217;m actually impressed by the steadily improving quality of their related terms for search queries&#8211;even if they do hide them behind two clicks (show options -&gt; related searches). Perhaps they&#8217;re <a href="http://thenoisychannel.com/2009/06/17/google-markets-itself/">feeling some pressure</a> from Bing. But I think they&#8217;re largely following the dictum of &#8220;if it ain&#8217;t broke, don&#8217;t fix it&#8221;. Google is an extremely successful company. And, as <a href="http://en.wikipedia.org/wiki/Clayton_M._Christensen">Clayton Christensen</a> argues, successful companies are great at incremental innovation and bad at <a href="http://en.wikipedia.org/wiki/Disruptive_technology">disruptive innovation</a>. As far as I can tell, faceted search is very disruptive to their model.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/14/why-does-google-hold-back-on-faceted-search/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>Google&#8217;s Chief Economist Hal Varian Talks Stats 101</title>
		<link>http://thenoisychannel.com/2009/08/14/googles-chief-economist-hal-varian-talks-stats-101/</link>
		<comments>http://thenoisychannel.com/2009/08/14/googles-chief-economist-hal-varian-talks-stats-101/#comments</comments>
		<pubDate>Fri, 14 Aug 2009 18:42:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Quick Bites]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2440</guid>
		<description><![CDATA[In an interview with CNET&#8217;s Tom Krazit, Google Chief Economist Hal Varian made a nice argument regarding the relative advantages of scale to a search engine:
On this data issue, people keep talking about how more data gives you a bigger advantage. But when you look at data, there&#8217;s a small statistical point that the accuracy [...]]]></description>
			<content:encoded><![CDATA[<p>In an <a href="http://news.cnet.com/8301-30684_3-10309375-265.html">interview</a> with CNET&#8217;s Tom Krazit, Google Chief Economist <a href="http://people.ischool.berkeley.edu/~hal/">Hal Varian</a> made a nice argument regarding the relative advantages of scale to a search engine:</p>
<blockquote><p>On this data issue, people keep talking about how more data gives you a bigger advantage. But when you look at data, there&#8217;s a small statistical point that the accuracy with which you can measure things as they go up is the square root of the sample size. So there&#8217;s a kind of natural diminishing returns to scale just because of statistics: you have to have four times as big a sample to get twice as good an estimate.</p>
<p>Another point that I think is very important to remember&#8230;query traffic is growing at over 40 percent a year. If you have something that is growing at 40 percent a year, that means it doubles in two years.</p>
<p>So the amount of traffic that Yahoo, say, has now is about what Google had two years ago. So where&#8217;s this scale business? I mean, this is kind of crazy.</p>
<p>The other thing is, when we do improvements at Google, everything we do essentially is tested on a 1 percent or 0.5 percent experiment to see whether it&#8217;s really offering an improvement. So, if you&#8217;re half the size, well, you run a 2 percent experiment.</p></blockquote>
<p>For those unfamiliar with statistics, I encourage you to look at the Wikipedia entry on <a href="http://en.wikipedia.org/wiki/Standard_deviation">standard deviation</a>. Varian is obviously reducing the argument to a sound bite, but the sound bite rings true. More is better, but there&#8217;s a dramatically diminishing return at the scale of either Microsoft or Google.</p>
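<p>As a rough sketch of the arithmetic behind the quote (not anything taken from the interview itself): if we assume independent samples with a fixed standard deviation, the standard error of a sample mean is sigma / sqrt(n), so quadrupling the sample halves the error. The function names, the value of sigma, and the sample sizes below are illustrative assumptions.</p>

```python
import math


def standard_error(sigma, n):
    """Standard error of a sample mean, assuming n independent samples."""
    return sigma / math.sqrt(n)


def doubling_time(annual_growth):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


sigma = 1.0  # assumed population standard deviation (illustrative only)

# Each step multiplies the sample size by four.
for n in (10_000, 40_000, 160_000):
    print(n, standard_error(sigma, n))

# Growth of 40 percent a year, as in the quote.
print(round(doubling_time(0.40), 2))
```

<p>Each fourfold increase in the sample size only halves the standard error, and 40 percent annual growth works out to a doubling time of just over two years, which is consistent with the rounding in the quote.</p>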
<p>However, I do think there&#8217;s a big difference when you start talking about running lots of experiments on small subsets of your users. The ability to run twice as many simultaneous tests without noticeably disrupting overall user experience is a major competitive advantage. But even there quality trumps quantity&#8211;how you choose what to test matters a lot more than how many tests you run.</p>
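<p>To make that concrete, here is a minimal sketch under some loudly stated assumptions: the traffic figures are hypothetical, experiments are assumed not to overlap, and the 10 percent exposure budget (the share of users we are willing to put into experiments at any one time) is invented for illustration.</p>

```python
def users_per_experiment(daily_users, experiment_percent):
    """Users exposed to one experiment sampling experiment_percent of traffic."""
    return daily_users * experiment_percent // 100


def max_simultaneous_experiments(budget_percent, experiment_percent):
    """Non-overlapping experiments that fit within a total exposure budget."""
    return budget_percent // experiment_percent


large_site = 100_000_000  # hypothetical daily users
small_site = 50_000_000   # hypothetical site with half the traffic

# Same sample size per experiment: 1 percent of the large site
# versus 2 percent of the small one.
print(users_per_experiment(large_site, 1))
print(users_per_experiment(small_site, 2))

# With the same 10 percent exposure budget, the larger site fits
# twice as many experiments of equal statistical power.
print(max_simultaneous_experiments(10, 1))
print(max_simultaneous_experiments(10, 2))
```

<p>Under these assumptions both sites get the same sample per experiment, but the larger one can run ten such experiments at once while the smaller one can run only five.</p>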
<p>What does strike me as ironic is that the moral here is a great counterpoint to Varian&#8217;s colleagues&#8217; arguments about the &#8220;<a href="http://thenoisychannel.com/2009/03/31/the-unreasonable-effectiveness-of-data/">unreasonable effectiveness of data</a>&#8221;. Granted, it&#8217;s apples and oranges&#8211;Alon Halevy, Peter Norvig, and Fernando Pereira are talking about data scale, not user scale. Still, the same arguments apply. Sampling is sampling.</p>
<p>P.S. Also check out Nick Carr&#8217;s commentary <a href="http://www.roughtype.com/archives/2009/08/the_diminishing.php">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/14/googles-chief-economist-hal-varian-talks-stats-101/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>UIE Virtual Seminar on Faceted Search</title>
		<link>http://thenoisychannel.com/2009/08/13/uie-virtual-seminar-on-faceted-search/</link>
		<comments>http://thenoisychannel.com/2009/08/13/uie-virtual-seminar-on-faceted-search/#comments</comments>
		<pubDate>Thu, 13 Aug 2009 15:08:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[Noise]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=2437</guid>
		<description><![CDATA[My colleague, Endeca co-founder Pete Bell, and I are giving a virtual seminar on faceted search next week for User Interface Engineering (UIE). It&#8217;s on Thursday, August 20th at 1:30PM EST. The regular price is $129, but Noisy Channel readers who are interested in attending can get a $30 discount by using TUNKELANG (yes, all [...]]]></description>
			<content:encoded><![CDATA[<p>My colleague, Endeca co-founder Pete Bell, and I are giving a <a href="http://www.uie.com/events/virtual_seminars/facets/">virtual seminar on faceted search</a> next week for User Interface Engineering (UIE). It&#8217;s on Thursday, August 20th at 1:30PM EST. The regular price is $129, but Noisy Channel readers who are interested in attending can get a $30 discount by using <span id="msgtxt3181097536">TUNKELANG (yes, all caps) as a promo code. Attendees also receive a free copy of my book, <a href="http://www.amazon.com/exec/obidos/ASIN/1598299999/"><em>Faceted Search</em></a>.</span></p>
<p><span>Whether or not you can attend, I do encourage you to check out the <a href="http://www.uie.com/">UIE</a> site. It&#8217;s got a lot of free, useful content, and <a href="http://www.uie.com/about/">Jared Spool</a> is definitely someone worth following if you are interested in web usability.<br />
</span></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2009/08/13/uie-virtual-seminar-on-faceted-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
