<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Noisy Channel</title>
	<atom:link href="http://thenoisychannel.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com</link>
	<description></description>
	<lastBuildDate>Wed, 13 Mar 2013 18:08:41 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>The Noisy Channel Has Moved To LinkedIn</title>
		<link>http://thenoisychannel.com/2013/03/07/the-noisy-channel-has-moved-to-linkedin/</link>
		<comments>http://thenoisychannel.com/2013/03/07/the-noisy-channel-has-moved-to-linkedin/#comments</comments>
		<pubDate>Fri, 08 Mar 2013 04:23:00 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4501</guid>
		<description><![CDATA[If you&#8217;re reading this, then you&#8217;re probably interested in my current writing. You can find my latest posts on my LinkedIn author page.]]></description>
				<content:encoded><![CDATA[<p>If you&#8217;re reading this, then you&#8217;re probably interested in my current writing. You can find my latest posts on my <a href="https://www.linkedin.com/today/post/articles/50510">LinkedIn author page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2013/03/07/the-noisy-channel-has-moved-to-linkedin/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Onward, Upward, and [In]ward</title>
		<link>http://thenoisychannel.com/2012/12/29/onward-upward-and-inward/</link>
		<comments>http://thenoisychannel.com/2012/12/29/onward-upward-and-inward/#comments</comments>
		<pubDate>Sun, 30 Dec 2012 00:05:53 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4487</guid>
		<description><![CDATA[It&#8217;s that time &#8212; the end of another great year. It&#8217;s been a phenomenal year for LinkedIn, for my amazing team of data scientists, and for me personally. The end of year is also an exciting time of transition. I enter 2013 thinking about my daughter starting kindergarten, the house awaiting me after a lifetime [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/today/post/articles/50510"><img class="alignnone  wp-image-4492" alt="Follow me on LinkedIn!" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/12/follow-me-on-linkedin1.png" width="511" height="385" /></a></p>
<p>It&#8217;s that time &#8212; the end of another great year. It&#8217;s been a phenomenal year for LinkedIn, for my amazing team of data scientists, and for me personally.</p>
<p>The end of year is also an exciting time of transition. I enter 2013 thinking about my daughter starting kindergarten, the house awaiting me after a lifetime of apartment dwelling, and of course all the great things my team is setting out to accomplish.</p>
<p>One of those transitions concerns this blog. About a month ago, I started posting on LinkedIn as an &#8220;<a href="http://blog.linkedin.com/2012/10/02/follow-people/">influencer</a>&#8221; &#8212; one of a couple of hundred people privileged to use LinkedIn as a native publishing platform. The thousands of people following me there and engaging with my content have convinced me to go all in.</p>
<p>I will keep this site up with the hundreds of posts I&#8217;ve published over the past 4+ years, but I&#8217;ll put The Noisy Channel on mute for the foreseeable future. So, whether you&#8217;re a long-time reader or someone who just got here, I hope you&#8217;ll <a href="http://www.linkedin.com/today/post/articles/50510">follow me</a> on LinkedIn and read my posts there.</p>
<p>I&#8217;m excited about the upcoming year. There&#8217;s lots of great stuff on my team&#8217;s roadmap, as well as fun opportunities to participate in conferences. Hopefully I&#8217;ll see some of you at the <a href="https://iriss.stanford.edu/css/conference-agenda-2013">Stanford Conference on Computational Social Science</a> and at the <a href="http://strataconf.com/strata2013/public/schedule/detail/27320">O&#8217;Reilly Strata Conference</a>. But most of all I hope to continue engaging with all of you offline and online.</p>
<p>Happy New Year, and see you on LinkedIn!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/12/29/onward-upward-and-inward/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Lily 5.0, LinkedIn 2.0</title>
		<link>http://thenoisychannel.com/2012/12/05/lily-5-0-linkedin-2-0/</link>
		<comments>http://thenoisychannel.com/2012/12/05/lily-5-0-linkedin-2-0/#comments</comments>
		<pubDate>Thu, 06 Dec 2012 07:45:40 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4472</guid>
		<description><![CDATA[What a wonderful week! On Tuesday, my daughter Lily turned 5. We celebrated by taking her out to her first karaoke night, where she had a blast (check her out singing Kimbra&#8217;s part in &#8220;Somebody That I Used To Know&#8221;) as did we! I&#8217;ll post a video soon. And [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone  wp-image-4473" title="Lily Bear" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/12/Lily-Bear.png" alt="" width="110" height="178" />           <img class="alignnone  wp-image-4475" title="LinkedIn 2.0" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/12/LinkedIn-2.01.jpg" alt="" width="326" height="178" /></p>
<p>What a wonderful week! On Tuesday, my daughter Lily turned 5. We celebrated by taking her out to her first karaoke night, where she had a blast (<a href="http://www.youtube.com/watch?v=Fw5c-OP2yyU">check her out singing Kimbra&#8217;s part in &#8220;Somebody That I Used To Know</a>&#8221;) as did we! I&#8217;ll post a video soon. And no, she didn&#8217;t sing in a bear outfit &#8212; but she did dance in that costume.</p>
<p>This same week, I&#8217;m celebrating my second anniversary at LinkedIn. It&#8217;s amazing how quickly two years have gone by, and how the team has grown in numbers and accomplishments. My <a href="http://thenoisychannel.com/2012/05/17/data-science-at-linkedin-my-team/">team post from May</a> already feels so dated! I&#8217;ll update it in the next few weeks.</p>
<p>And I just passed <a href="http://www.linkedin.com/today/post/articles/50510">5,000 followers on LinkedIn</a>. I&#8217;m gratified that people enjoy my writing, and also thrilled to be playing a part in LinkedIn&#8217;s content ecosystem.</p>
<p>I hope all of you are celebrating your own milestones as we approach the holidays and the end of 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/12/05/lily-5-0-linkedin-2-0/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Continuing to Post on LinkedIn</title>
		<link>http://thenoisychannel.com/2012/12/04/continuing-to-post-on-linkedin/</link>
		<comments>http://thenoisychannel.com/2012/12/04/continuing-to-post-on-linkedin/#comments</comments>
		<pubDate>Tue, 04 Dec 2012 08:40:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4468</guid>
		<description><![CDATA[I just posted another pair of articles on LinkedIn: Understanding Relevance Relevance is Complicated I&#8217;ll continue to use The Noisy Channel as a personal blog, but I&#8217;ll be focusing my publishing efforts on LinkedIn. Hope you&#8217;ll follow me there!]]></description>
				<content:encoded><![CDATA[<p>I just posted another pair of articles on LinkedIn:</p>
<ul>
<li><a href="http://www.linkedin.com/today/post/article/20121130160500-50510-understanding-relevance">Understanding Relevance</a></li>
<li><a href="http://www.linkedin.com/today/post/article/20121204080955-50510-relevance-is-complicated">Relevance is Complicated</a></li>
</ul>
<p>I&#8217;ll continue to use The Noisy Channel as a personal blog, but I&#8217;ll be focusing my publishing efforts on LinkedIn. Hope you&#8217;ll <a href="http://www.linkedin.com/today/post/articles/50510">follow me</a> there!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/12/04/continuing-to-post-on-linkedin/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Follow Me on LinkedIn!</title>
		<link>http://thenoisychannel.com/2012/11/20/follow-me-on-linkedin/</link>
		<comments>http://thenoisychannel.com/2012/11/20/follow-me-on-linkedin/#comments</comments>
		<pubDate>Tue, 20 Nov 2012 19:01:57 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4464</guid>
		<description><![CDATA[I&#8217;ve recently started posting on LinkedIn as part of LinkedIn&#8217;s thought leadership program. My first two posts as an &#8220;influencer&#8221; are a bit meta: Influence and the Attention Economy Measuring Influence I&#8217;m curious to hear what you think &#8212; both of the content and of the platform. I have no immediate intention to migrate the [...]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve recently started posting on LinkedIn as part of LinkedIn&#8217;s <a href="http://blog.linkedin.com/2012/10/02/follow-people/">thought leadership program</a>.</p>
<p>My first two posts as an &#8220;influencer&#8221; are a bit meta:</p>
<ul>
<li><a href="http://www.linkedin.com/today/post/article/20121116153051-50510-influence-and-the-attention-economy?trk=mp-author-card">Influence and the Attention Economy</a></li>
<li><a href="http://www.linkedin.com/today/post/article/20121120165029-50510-measuring-influence?trk=mp-author-card">Measuring Influence</a></li>
</ul>
<p>I&#8217;m curious to hear what you think &#8212; both of the content and of the platform. I have no immediate intention to migrate The Noisy Channel to LinkedIn, but I&#8217;m very excited by the initial feedback I&#8217;ve gotten for these posts.</p>
<p>Meanwhile, I encourage you to <a href="http://www.linkedin.com/today/post/articles/50510">follow me on LinkedIn</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/11/20/follow-me-on-linkedin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2012: Notes from a Conference in Paradise</title>
		<link>http://thenoisychannel.com/2012/11/12/cikm-2012-notes-from-a-conference-in-paradise/</link>
		<comments>http://thenoisychannel.com/2012/11/12/cikm-2012-notes-from-a-conference-in-paradise/#comments</comments>
		<pubDate>Mon, 12 Nov 2012 15:00:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4422</guid>
		<description><![CDATA[The moment I learned that CIKM 2012 would be held in Maui, I knew I had to be there. Having co-organized the CIKM 2011 industry event, I had enough karma to be invited as part of this year&#8217;s industry event, representing LinkedIn. PRE-CONFERENCE: PSEUNAMI WARNING AND A WORKSHOP I arrived in Maui on Sunday, October [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.cikm2012.org/conference_venue.php"><img class="alignnone" title="Sheraton Maui" src="http://www.cikm2012.org/gallery/galleries/conference_venue/sheraton%20from%20beach.jpg" alt="" width="583" height="174" /></a></p>
<p>The moment I learned that <a href="http://www.cikm2012.org/">CIKM 2012</a> would be held in <a href="http://www.gohawaii.com/maui">Maui</a>, I knew I had to be there. Having co-organized the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 industry event</a>, I had enough karma to be invited as part of <a href="http://www.cikm2012.org/industry_event.php">this year&#8217;s industry event</a>, representing LinkedIn.</p>
<p><strong>PRE-CONFERENCE: PSEUNAMI WARNING AND A WORKSHOP</strong></p>
<div>
<p>I arrived in Maui on Sunday, October 28th, fortunate to miss the &#8220;<a href="http://www.reuters.com/article/2012/10/28/us-hawaii-tsunami-idUSBRE89R02M20121028">pseunami</a>&#8221; warnings prompted by an earthquake off the Canadian coast. And even more fortunate to be thousands of miles away from <a href="http://en.wikipedia.org/wiki/Hurricane_Sandy">Hurricane Sandy</a>.</p>
<p>Monday, I attended the <a href="https://sites.google.com/site/umsocial2012/">Workshop on Data-Driven User Behavioral Modeling and Mining from Social Media</a>. The topics within this area were diverse: they included Pinterest users, resume-job matching (unfortunately without the benefit of LinkedIn data), and street harassment stories reported via <a href="http://www.ihollaback.org/">Project Hollaback</a>.</p>
</div>
<p>But ironically in this workshop &#8212; and throughout the conference &#8212; there was more use of Twitter data than of Twitter itself. Most of the tweets using the <a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=1641">#cikm2012</a> hashtag were my own.</p>
<p><strong>DAY 1: USER ENGAGEMENT, EVALUATION BIAS</strong></p>
<p>Tuesday opened with a welcome that included statistics showing how far CIKM has come as a top-tier international conference. There were 1,088 submissions this year! But the highlight of the opening was program co-chair <a href="http://www.cc.gatech.edu/~lebanon/">Guy Lebanon</a> demoing <a href="https://plus.google.com/100871861816702740643/posts/Q62AdR5wsgw">software to &#8220;improve&#8221; paper reviews</a>. It was hilarious, if a bit close to home: the automatically generated reviews looked a lot like those generated by allegedly human reviewers.</p>
<p>We then proceeded to a keynote by Yahoo! Research VP <a href="http://research.yahoo.com/Ricardo_Baeza-Yates/">Ricardo Baeza-Yates</a> entitled &#8220;<a href="http://www.cikm2012.org/keynote.php">User Engagement: The Network Effect Matters!</a>&#8221; The title was a bit confusing: it wasn&#8217;t about the conventional &#8220;<a href="http://en.wikipedia.org/wiki/Network_effect">network effect</a>&#8221;, but rather about user engagement across a network of sites like those <a href="http://en.wikipedia.org/wiki/List_of_Yahoo!-owned_sites_and_services">owned by Yahoo</a>. He talked about different ways to measure user engagement, and noted that off-site (or, rather, off-network) links ultimately improve users&#8217; downstream engagement. He also observed that style attributes outperform content attributes as predictors of user engagement. Lots of fascinating observations, but I&#8217;m curious how well they generalize beyond Yahoo.</p>
<p>I spent the rest of the day making tough choices among the various parallel sessions, starting with the morning <a href="http://www.cikm2012.org/schedules.php#session3">session on information retrieval evaluation</a>. Some nuggets from that session: captions and other surface features introduce significant evaluation bias; assessors have poor agreement when evaluating relevance in <a href="http://en.wikipedia.org/wiki/Electronic_discovery">eDiscovery</a> contexts; and system evaluation improves when it models user differences.</p>
<p>After lunch, I attended a <a href="http://www.cikm2012.org/schedules.php#sessions4">session on web search</a>. Some themes from that session: neighborhood-based methods are effective, whether the neighborhoods are based on document or user similarity; entities and <a href="http://thenoisychannel.com/2012/06/07/scale-structure-and-semantics/">structure</a> are increasingly important for web search. After the coffee break, I went to the <a href="http://www.cikm2012.org/schedules.php#session10">social networks session</a>. Topics there included <a href="http://poptech.org/e1_duncan_watts">social contagion</a>, online question answering, and social network data anonymization. The talks wrapped up just in time for us to watch the daily <a href="http://www.sheraton-maui.com/property/cliffdiveceremony">cliff diving ceremony</a> before heading to the <a href="http://www.cikm2012.org/posters.php">poster session</a>.</p>
<p><strong>DAY 2: QUERY PERFORMANCE PREDICTION, ABANDONMENT</strong></p>
<p>Wednesday opened with a keynote by CMU professor <a href="http://www.cs.cmu.edu/~wcohen/">William Cohen</a> on &#8220;<a href="http://www.cikm2012.org/keynote.php#william">Learning Similarity Measures based on Random Walks in Graphs</a>&#8221;. He described the framework and techniques that he and his colleagues used to build <a href="http://rtw.ml.cmu.edu/rtw/kbbrowser/">NELL (&#8220;Never-Ending Language Learning&#8221;)</a>. The keynote was pretty dense, but there are lots of papers available on the <a href="http://rtw.ml.cmu.edu/rtw/publications">NELL publications page</a>.</p>
<p>Then back to choosing among parallel sessions. Although I was tempted by the <a href="http://www.cikm2012.org/schedules.php#session15">recommender systems session</a> featuring presentations by my LinkedIn colleagues <a href="http://www.linkedin.com/in/mitultiwari">Mitul Tiwari</a> and <a href="http://www.linkedin.com/pub/bee-chung-chen/5/a8b/877">Bee-Chung Chen</a>, I instead attended the <a href="http://www.cikm2012.org/schedules.php#session13">session on ads and products</a>. Two take-aways from that session: ad targeting benefits from explicit identification of user interests; influence maximization can be modeled adversarially as a two-player game.</p>
<p>After lunch, I attended the <a href="http://www.cikm2012.org/schedules.php#session18">session on formal retrieval models and learning to rank</a>. I most enjoyed the two talks by <a href="http://iew3.technion.ac.il/~kurland/">Oren Kurland</a> that focused on <a href="http://thenoisychannel.com/2010/05/23/estimating-the-query-difficulty-for-information-retrieval/">query performance prediction</a>. In particular, he offered a <a href="http://iew3.technion.ac.il/~kurland/probQPP.pdf">comprehensive probabilistic prediction framework</a> that unifies most of the previously proposed prediction methods using a common formal basis. The session also included a deep dive into aspects of the IBM <a href="http://en.wikipedia.org/wiki/Watson_(computer)">Watson</a> question-answering system.</p>
<p>After the coffee break, I headed to another <a href="http://www.cikm2012.org/schedules.php#session22">session on web search</a> &#8211; one of my favorite sessions of the conference. There was a talk on query segmentation, a topic responsible for my <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/">most popular blog post</a>. Also a great talk on identifying good abandonment, a problem I&#8217;ve been interested in ever since hearing about it at <a href="http://www.slideshare.net/ChaToX/when-no-clicks-are-good-news">SIGIR 2010</a>. Another talk about learning from search logs: generalizing from <a href="http://glinden.blogspot.com/2007/08/effectiveness-of-personalized-search.html">click entropy</a> to &#8220;click pattern entropy&#8221; to analyze query ambiguity. And a talk on modeling domain-dependent query reformulation as machine translation using a pseudo-parallel corpus. All in all, a great session packed with practical content.</p>
<p>Then came a purely social evening. The conference reception was a <a href="http://en.wikipedia.org/wiki/Luau">luau</a>, complete with <a href="http://en.wikipedia.org/wiki/Kalua">kalua pig</a>, <a href="http://en.wikipedia.org/wiki/Mai_Tai">mai tais</a>, <a href="http://en.wikipedia.org/wiki/Hula">hula</a>, and of course <a href="http://en.wikipedia.org/wiki/Poi_(food)">poi</a>. Certainly my most memorable conference banquet. I didn&#8217;t take pictures, but I recommend <a href="http://www.linkedin.com/pub/craig-stanfill/10/1b9/9a0">Craig Stanfill</a>&#8217;s photos on <a href="http://www.flickr.com/photos/photo_fiend/sets/72157631926886096/">Flickr</a>.</p>
<p>And then some of us went to downtown <a href="http://en.wikipedia.org/wiki/Lahaina,_Hawaii">Lahaina</a> to see how the locals (and tourists) <a href="http://www.facebook.com/lahaina.maui.halloween/photos_stream">celebrate Halloween</a>. Much as I missed spending <a href="http://sphotos-a.xx.fbcdn.net/hphotos-ash3/535560_4706231851012_1850678174_n.jpg">Halloween with my family</a>, I had a blast!</p>
<p><strong>DAY 3: INDUSTRY EVENT</strong></p>
<p>Thursday began with the last conference keynote: University of Kansas provost <a href="http://www.provost.ku.edu/jsv">Jeffrey Vitter</a> on &#8220;<a href="http://www.cikm2012.org/keynote.php#jeffrey">Compressed Data Structures with Relevance</a>&#8221;. Like the previous keynotes, it was fairly dense, and I suggest you read the papers cited in his <a href="http://www.ittc.ku.edu/~jsv/Papers/Vit12.CIKMkeynote.pdf">abstract</a> if you&#8217;re interested in the technical details of how to search for query patterns in massive document collections.</p>
<p>Then came my main reason for attending the conference: the <a href="http://www.cikm2012.org/industry_event.php">industry event</a>. As seems to have become a <a href="http://technocalifornia.blogspot.com/2009/08/umap-and-sigir-09.html">pattern at information retrieval conferences</a>, the industry event dominated the other parallel sessions, drawing a standing-room-only crowd.</p>
<p>The event started with <strong>eBay</strong> VP of Research <a href="http://labs.ebay.com/eric-brill.html">Eric Brill</a> talking about &#8220;Having A Great Career in Research&#8221;. Unusual in a conference talk, he offered personal and practical advice to students on how to focus their passion and effort towards a happy and successful career. It reminded me of my blog post about <a href="http://thenoisychannel.com/2011/08/21/dream-fit-passion/">dream, fit, and passion</a>, and I hope students took it to heart.</p>
<p><strong>IBM</strong> researcher <a href="http://researcher.watson.ibm.com/researcher/view.php?person=il-CARMEL">David Carmel</a> gave a talk entitled &#8220;Is This Entity Relevant to Your Needs?&#8221;. Noting that 71% of web search queries contain named entities (people, places, organizations), he advocated a probabilistic ranking approach to entity-oriented search that ranks retrieved entities according to amount and quality of supporting evidence.</p>
<p><strong>Microsoft</strong> Technical Fellow (and former Yahoo! Fellow) <a href="http://pages.cs.wisc.edu/~raghu/">Raghu Ramakrishnan</a> talked about &#8220;The Future of Information Discovery and Search: Content Optimization, Interactivity, Semantics, and Social Networks&#8221;. He packed in a lot of nice material, most of which was from his tenure at Yahoo. He included a nice explanation of <a href="http://en.wikipedia.org/wiki/Multi-armed_bandit">explore/exploit</a>, which was also a reminder of how lucky we are at LinkedIn to have hired his former Yahoo colleague <a href="http://www.linkedin.com/in/dipu1025">Deepak Agarwal</a>.</p>
<p>After lunch, <strong>WalmartLabs</strong> Chief Scientist <a href="http://pages.cs.wisc.edu/~anhai/">AnHai Doan</a> gave a talk entitled &#8220;Social Media, Data Integration, and Human Computation&#8221;, in which he described constructing a &#8220;social genome&#8221; by mining social data, connecting it to web data, and representing the combined information in a knowledge base. If you&#8217;re interested in more details, he&#8217;ll be giving an extended version of that talk <a href="http://events.linkedin.com/social-media-data-integration-1168301">at LinkedIn on November 29th</a>!</p>
<p><strong>Tencent</strong> Research Director <a href="http://www.linkedin.com/pub/chao-liu/42/a8a/aa0">Chao Liu</a> talked about &#8220;Question Answering through Tencent Open Platform&#8221;. Beyond giving a great overview of one of the <a href="http://en.wikipedia.org/wiki/Tencent_Holdings">world&#8217;s largest internet platforms</a>, he delivered great self-deprecating lines like &#8220;The name is ten cents, and the search engine is <a href="http://www.soso.com/">soso</a>&#8221;.</p>
<p><a href="http://www.linkedin.com/in/dtunkelang">I</a> spoke next about <strong>LinkedIn</strong>&#8217;s &#8220;<a href="http://thenoisychannel.com/2012/11/11/data-by-the-people-for-the-people/">Data By The People, For The People</a>&#8221;. Given that the talk was right after Halloween and just before the presidential elections, I thought it appropriate to choose a title that would have appealed to one of America&#8217;s most distinguished <a href="http://en.wikipedia.org/wiki/Gettysburg_Address">presidents</a> and <a href="http://www.imdb.com/title/tt1611224/">vampire hunters</a>. If you&#8217;re curious to learn more about data science and engineering at LinkedIn (including the publications I cited in my talk), check out <a href="http://data.linkedin.com/">http://data.linkedin.com/</a>.</p>
<p><strong>Groupon</strong> Director of Research <a href="http://www.cs.iastate.edu/~parekh/">Rajesh Parekh</a> talked about &#8220;Leveraging Data to Power Local Commerce&#8221;. He focused on a key problem Groupon faces: determining an optimal category mix for each local market. He described how they approach this problem using <a href="http://en.wikipedia.org/wiki/Modern_portfolio_theory">portfolio theory</a>.</p>
<p>After a coffee break, <strong>Adobe</strong> Chief Software Architect <a href="http://www.adobe.com/aboutadobe/pressroom/executivebios/tommalloy.html">Tom Malloy</a> talked about &#8220;Revolutionizing Digital Marketing with Big Data Analytics&#8221;. <a href="http://pig.apache.org/">Apache Pig</a> co-creator &#8212; and now Google researcher &#8212; <a href="http://infolab.stanford.edu/~olston/">Christopher Olston</a> talked about work he did at Yahoo on &#8220;Programming and Debugging Large-Scale Data Processing Workflows&#8221;. Finally, Microsoft Distinguished Engineer <a href="http://www.linkedin.com/in/xdhuang">Xuedong (&#8220;XD&#8221;) Huang</a> gave a talk entitled &#8220;From HyperText to HyperTEC&#8221;, in which he woke up the audience by having us all participate in the &#8220;<a href="http://www.bingiton.com/">Bing it On</a>&#8221; challenge.</p>
<p><strong>FINAL THOUGHTS</strong></p>
<p>All in all, CIKM 2012 was a great conference in an idyllic setting. Holding the conference in Maui might have been a bit distracting, but the desirability of the location also ensured a high-quality program.</p>
<p>My main complaint is that I don&#8217;t like parallel sessions &#8212; especially when the topics overlap significantly (e.g., web search sessions competed with those on ranking and recommendations). I&#8217;m also not convinced that talks have to be 25 minutes long. Perhaps the conference could move to a format of shorter talks and at least reduce the number of parallel sessions. It would also be great to see more opportunity for interaction &#8212; the coffee breaks always felt too short. For more of my thoughts on reforming academic conferences, see my <a href="http://thenoisychannel.com/2009/08/02/are-academic-conferences-broken-can-we-fix-them/">2009 blog post</a> on the subject.</p>
<p>I also wish more attendees would embrace social media. It&#8217;s ironic that researchers who depend so heavily on social media data (especially Twitter) don&#8217;t engage in it personally. While I&#8217;m honored to have been the conference&#8217;s unofficial tweeter (see this <a href="http://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=1641">visualization of the #cikm2012 tweets</a>), I would have liked to see more attendees engage in a public online conversation. Hopefully others will at least blog about the conference.</p>
<div>But these are quibbles. CIKM continues to be an outstanding conference, and I&#8217;m very excited it&#8217;s coming to the Bay Area next year. See you at <a href="http://www.cikm2013.org/">CIKM 2013</a>!</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/11/12/cikm-2012-notes-from-a-conference-in-paradise/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data By The People, For The People</title>
		<link>http://thenoisychannel.com/2012/11/11/data-by-the-people-for-the-people/</link>
		<comments>http://thenoisychannel.com/2012/11/11/data-by-the-people-for-the-people/#comments</comments>
		<pubDate>Mon, 12 Nov 2012 03:41:12 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4433</guid>
		<description><![CDATA[I was fortunate this year not only to be able to attend the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012) in Maui, but also to be invited as part of this year&#8217;s industry event, representing LinkedIn. Above are the slides I presented on &#8220;Data By The People, For The People&#8221;. Enjoy!]]></description>
				<content:encoded><![CDATA[<p><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/14991347?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"></iframe></p>
<div style="margin-bottom: 5px;">
<p>I was fortunate this year not only to be able to attend the 21st ACM International Conference on Information and Knowledge Management (<a href="http://www.cikm2012.org/">CIKM 2012</a>) in <a href="http://www.gohawaii.com/maui">Maui</a>, but also to be invited as part of <a href="http://www.cikm2012.org/industry_event.php">this year&#8217;s industry event</a>, representing LinkedIn.</p>
<p>Above are the slides I presented on &#8220;<a href="http://www.slideshare.net/dtunkelang/data-by-the-people-for-the-people">Data By The People, For The People</a>&#8221;. Enjoy!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/11/11/data-by-the-people-for-the-people/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>LinkedIn at CIKM 2012</title>
		<link>http://thenoisychannel.com/2012/10/26/linkedin-at-cikm-2012/</link>
		<comments>http://thenoisychannel.com/2012/10/26/linkedin-at-cikm-2012/#comments</comments>
		<pubDate>Sat, 27 Oct 2012 05:14:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4405</guid>
		<description><![CDATA[Last year, I had the pleasure of co-organizing the CIKM 2011 Industry Event in Glasgow. This year, I&#8217;m honored to be part of the CIKM 2012 Industry Event program, along with top industry researchers from Adobe, eBay, Google, Groupon, IBM, Microsoft, Tencent, and Walmart Labs. I&#8217;ll be giving a talk on &#8220;Data By The People, [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://data.linkedin.com/admin/publications"><img class="alignnone  wp-image-4406" title="LinkedIn at CIKM 2012" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/10/IN_CIKM2012.jpeg" alt="" width="489" height="165" /></a></p>
<p>Last year, I had the pleasure of co-organizing the <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/">CIKM 2011 Industry Event</a> in Glasgow. This year, I&#8217;m honored to be part of the <a href="http://www.cikm2012.org/industry_event.php">CIKM 2012 Industry Event</a> program, along with top industry researchers from Adobe, eBay, Google, Groupon, IBM, Microsoft, Tencent, and Walmart Labs. I&#8217;ll be giving a talk on &#8220;<a href="http://www.cikm2012.org/industry_event.php#Daniel">Data By The People, For The People</a>&#8221;.</p>
<p>I&#8217;m also thrilled to be joined by my colleague <a href="http://www.linkedin.com/in/mitultiwari">Mitul Tiwari</a>, who will be presenting a paper on &#8220;<a href="http://mitultiwari.posterous.com/related-searches-at-linkedin">Metaphor: A System for Related Search Recommendations</a>&#8221;, work that he did with <a href="http://www.linkedin.com/in/azariasreda">Azarias Reda</a>, <a href="http://www.linkedin.com/pub/yubin-park/25/559/310">Yubin Park</a>, <a href="http://www.linkedin.com/in/christianposse">Christian Posse</a>, and <a href="http://www.linkedin.com/in/shahsam">Sam Shah</a>.</p>
<p>Finally, we&#8217;ll be representing <a href="http://www.linkedin.com/in/manuelgomezrodriguez">Manuel Gomez-Rodriguez</a> and <a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a> at the poster session, presenting their work on &#8220;<a href="http://www.stanford.edu/~manuelgr/pubs/events-cikm12.pdf">Bridging Offline and Online Social Graph Dynamics</a>&#8221;.</p>
<p>If you&#8217;re attending CIKM, please make sure to say hi to me and Mitul. We&#8217;d be delighted to talk to you about the work that we and our colleagues are doing. Our data team also has a <a href="http://data.linkedin.com/">site</a> showcasing our <a href="http://data.linkedin.com/team">team</a>, our <a href="http://data.linkedin.com/publications">publications</a>, and some of the <a href="http://data.linkedin.com/projects">projects</a> we can discuss publicly. And, of course, we&#8217;re <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=3210547">hiring</a>!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/10/26/linkedin-at-cikm-2012/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/10/26/linkedin-at-cikm-2012/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>HCIR 2012: A Personal Report</title>
		<link>http://thenoisychannel.com/2012/10/08/hcir-2012-a-personal-report/</link>
		<comments>http://thenoisychannel.com/2012/10/08/hcir-2012-a-personal-report/#comments</comments>
		<pubDate>Mon, 08 Oct 2012 07:34:46 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4383</guid>
		<description><![CDATA[Human-computer information retrieval (HCIR) is the study of information retrieval techniques that integrate human intelligence and algorithmic search to help people explore, understand, and use information. Since 2007, the HCIR Symposium (previously known as the HCIR Workshop) has provided a venue for the theoretical and practical study of HCIR. We even inspired an EuroHCIR workshop across the [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone  wp-image-4368" title="Charles River, as seen from Cambridge, MA" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/09/Cambridge.png" alt="" width="480" height="234" /></p>
<p><a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">Human-computer information retrieval</a> (HCIR) is the study of information retrieval techniques that integrate human intelligence and algorithmic search to help people explore, understand, and use information. Since 2007, the <a href="http://hcir.info/">HCIR Symposium</a> (previously known as the HCIR Workshop) has provided a venue for the theoretical and practical study of HCIR. We even inspired an <a href="http://fitlab.eu/euroHCIR2012/">EuroHCIR</a> workshop across the pond that started in 2011 and is going strong.</p>
<p><strong>Overview</strong></p>
<p>The Sixth Symposium on Human-Computer Interaction and Information Retrieval (<a href="http://hcir.info/hcir-2012/">HCIR 2012</a>) took place on October 4th and 5th at <a href="http://domino.research.ibm.com/cambridge/research.nsf/pages/index.html" rel="nofollow">IBM Research</a> in Cambridge, MA. The 75 attendees represented a cross-section of HCIR research and practice. Over a third of the attendees were from industry &#8212; including startups and large technology firms. We had a similar diversity of sponsors, benefiting from the generosity of <a href="http://www.fxpal.com/">FXPAL</a>, <a href="http://research.ibm.com/">IBM Research</a>, <a href="http://data.linkedin.com/">LinkedIn</a>, <a href="http://www.mendeley.com/">Mendeley</a>, <a href="http://research.microsoft.com/">Microsoft Research</a>, <a href="http://www.csail.mit.edu/">MIT CSAIL</a>, and <a href="http://www.oracle.com/">Oracle</a>. And we had participants from 6 countries: Canada, Germany, Israel, New Zealand, Switzerland, and the United States.</p>
<p><strong>Keynote</strong></p>
<p><a href="http://people.ischool.berkeley.edu/~hearst/"><img class="alignnone" title="Marti Hearst" src="http://www.eecs.berkeley.edu/Faculty/Photos/Homepages/hearst.jpg" alt="" width="150" height="210" /></a></p>
<p>We started the Symposium with a keynote from UC Berkeley professor <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>, a pioneer in the area of <a href="http://searchuserinterfaces.com/">search user interfaces</a>, as well as a prominent researcher of information visualization, natural language processing, and social media analysis. Marti set the tone for the symposium with a visionary keynote that she entitled her &#8220;Halloween Cauldron of Ideas for Research&#8221;.</p>
<p>She started by talking about the unaddressed seams of <a href="http://en.wikipedia.org/wiki/Sensemaking">sensemaking</a>, reminding us that <a href="http://en.wikipedia.org/wiki/Information_seeking">information seeking</a> is only one part of an overall sensemaking process. She used the challenge of saving and personally organizing search results as an example of a neglected but crucial part of a search interface.</p>
<p>She then challenged us to think about how audio could be used in search interfaces. She cited a study showing that programmers comment their code better when the commenting interface uses speech rather than the keyboard, and asked us to consider how auditory notification or feedback could enhance the search experience.</p>
<p>Finally, she presented the idea of &#8220;radical collaboration&#8221;, offering as an example the use of <a href="http://en.wikipedia.org/wiki/Amazon_Mechanical_Turk">Mechanical Turk</a> to crowdsource vacation planning. The plans were tested by real tourists, who were delighted with the results.</p>
<p>Marti&#8217;s keynote was not only insightful and entertaining (one of her slides featured <a href="http://boingboing.net/2009/08/12/brain-cupcakes.html">brain cupcakes</a>!), but notable in how much she engaged all of us in discussion throughout her presentation. This approach was especially appropriate for an HCIR Symposium, given our emphasis on human interaction. For more detail about the keynote, I recommend <a href="http://palblog.fxpal.com/?p=5482">Gene Golovchinsky&#8217;s summary</a>.</p>
<p><strong>Short Paper Presentations</strong></p>
<p>After a coffee break, we had a session devoted to 5 short papers. Each presenter had 10 minutes: 5 minutes to present and 5 minutes for discussion.</p>
<ul>
<li>We started off with UXLabs director <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a> presenting &#8220;<a href="http://ils.unc.edu/hcir2012/hcir2012_submission_8.pdf">Designing for Consumer Search Behaviour</a>&#8221;, joint work with University College London researcher <a href="http://www.ucl.ac.uk/uclic/people/s_makri">Stephann Makri</a>. Tony could not attend in person, so he submitted a video. He presented a framework for describing consumer search behavior along with concrete examples &#8212; many of them familiar from the time that Tony and I both worked at <a href="http://endeca.com/">Endeca</a>. Most of all, he emphasized the need to close the gap between information science research and industry practice.</li>
<li>Then MIT professor (and <a href="http://groups.csail.mit.edu/haystack/">Haystack</a> principal investigator) <a href="http://people.csail.mit.edu/karger/">David Karger</a> talked about &#8220;<a href="http://ils.unc.edu/hcir2012/hcir2012_submission_29.pdf">Standards Opportunities around Data-Bearing Web Pages</a>&#8221;. He argued that there is a small set of standard user interface patterns for authoring structured data: text search, sorting by properties, presenting items in a template, and <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted browsing</a>. He then advocated that these primitives (which have already been implemented in the popular <a href="http://www.simile-widgets.org/exhibit/">Exhibit</a> framework) be incorporated into a <a href="http://www.w3.org/">W3C</a> standard so that content authors can use them with the expectation that all modern browsers support them.</li>
<li>Next, Harvard student <a href="http://people.seas.harvard.edu/~eagapie/">Elena Agapie</a> presented joint work that she did at FXPAL with <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a> and <a href="http://www.fxpal.com/?p=pernilla">Pernilla Qvarfordt</a>, entitled &#8220;<a href="http://ils.unc.edu/hcir2012/hcir2012_submission_21.pdf">Encouraging Behavior: A Foray into Persuasive Computing</a>&#8221;. Information retrieval researchers and practitioners have often argued that longer queries lead to better retrieval performance. But how do we get users to enter longer queries? Elena and colleagues found that the best way was not to explicitly tell them that longer queries are better, but rather to present a halo around the search box that changes color as the query gets longer. A very interesting approach to applying <a href="http://en.wikipedia.org/wiki/Persuasive_technology">persuasive technology</a> to search!</li>
<li>Then Rutgers student <a href="http://comminfo.rutgers.edu/~rgonzal/">Roberto González-Ibáñez</a> presented joint work with <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah</a> and <a href="http://research.microsoft.com/en-us/um/people/ryenw/">Ryen White</a> on &#8220;<a href="http://ils.unc.edu/hcir2012/hcir2012_submission_9.pdf">Pseudo-Collaboration as a Method to Perform Selective Algorithmic Mediation in Collaborative IR Systems</a>&#8221;. He presented a novel approach that identified when a user should be aided by a collaborator, and to what extent such help could enhance the user&#8217;s search success. An interesting way to achieve the benefits of both user-mediated and system-mediated collaboration.</li>
<li>Finally, University of Washington student <a href="http://jeffhuang.com/">Jeff Huang</a> presented joint work with <a href="http://www.ucl.ac.uk/uclic/people/a_diriye">Abdigani Diriye</a> on &#8220;<a href="http://jeffhuang.com/Final_TouchTracking_HCIR12.pdf">Web User Interaction Mining from Touch-Enabled Mobile Devices</a>&#8221;. He focused on the practical concerns of instrumenting interaction with search engines in mobile environments. Specifically, he suggested tracking the viewport coordinates &#8212; that is, the visible portion of the page at any given time.</li>
</ul>
<p>The short presentation format was extremely effective, encouraging presenters to communicate their ideas efficiently and leaving ample time for discussion.</p>
<p><strong>Posters and Demos</strong></p>
<p>As in previous years, we followed lunch with a vibrant session for posters and demos. Some of the more popular poster themes included question answering, task difficulty, and collaborative information seeking. Here is the full list of poster / demo presentations:</p>
<ul>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_6.pdf">Developing a Typology of Online Q&amp;A Models and Recommending the Right Model for Each Question Type</a><br />
<a href="http://comminfo.rutgers.edu/directory/swchoi/index.html#.UHJqpfl25eI">Erik Choi</a>, <a href="http://eden.rutgers.edu/~vkitzie/librarydiary/">Vanessa Kitzie</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_10.pdf">Investigating Positive and Negative Affects in Collaborative Information Seeking: A Pilot Study Report</a><br />
<a href="http://comminfo.rutgers.edu/~rgonzal/">Roberto González-Ibáñez</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_7.pdf">To Ask or Not to Ask, That is The Question: Investigating Methods and Motivations for Online Q&amp;A</a><br />
<a href="http://eden.rutgers.edu/~vkitzie/librarydiary/">Vanessa Kitzie</a>, <a href="http://comminfo.rutgers.edu/directory/swchoi/index.html#.UHJqpfl25eI">Erik Choi</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_4.pdf">Information Seeking Tasks: Why Do Searchers Feel Difficult?</a><br />
<a href="http://www.libsci.sc.edu/fsd/liu/liu.html">Jingjing Liu</a>, <a href="https://www.southernct.edu/ils/changsukkimphd/">Chang Suk Kim<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_27.pdf">Finding Literary Themes with Relevance Feedback</a><br />
<a href="http://www.eecs.berkeley.edu/~aditi/">Aditi Muralidharan</a>, <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_30.pdf">InFrame-Browsing: Enhancing Standard Web Search</a><br />
<a href="http://www.findke.ovgu.de/en/nitsche.html">Marcus Nitsche</a>, <a href="http://wwwiti.cs.uni-magdeburg.de/~nuernb/">Andreas Nürnberger<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_32.pdf">Trailblazer: Towards the Design of an Exploratory Search User Interface</a><br />
<a href="http://www.findke.ovgu.de/en/nitsche.html">Marcus Nitsche</a>, <a href="http://wwwiti.cs.uni-magdeburg.de/~nuernb/">Andreas Nürnberger<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_11.pdf">min: A Multi-Modal Web Interface for Math Search</a><br />
<a href="https://plus.google.com/113988956382546153422/posts">Christopher Sasarak</a>, <a href="http://www.cs.rit.edu/~dprl/Members.html">Kevin Hart</a>, <a href="http://www.facebook.com/siyuzhu">Siyu Zhu</a>, <a href="http://www.pospeselr.com/">Richard Pospesel</a>, <a href="https://plus.google.com/114623935330432864362/posts">David Stalnaker</a>, <a href="http://people.rit.edu/lxh9868/">Lei Hu</a>, <a href="https://plus.google.com/117625652769752552934/posts">Robert Livolsi</a>, <a href="http://www.cs.rit.edu/~rlaz/">Richard Zanibbi<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_23.pdf">Search Tactics in Collaborative Exploratory Web Search</a><br />
<a href="http://www.sis.pitt.edu/~zyue/">Zhen Yue</a>, <a href="http://www.pitt.edu/~shh69/">Shuguang Han</a>, <a href="http://www.sis.pitt.edu/~daqing/">Daqing He<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_14.pdf">Developing a Dual-Process Information-Seeking Model for Exploratory Search</a><br />
<a href="http://mikezarro.com/">Michael Zarro<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_13.pdf">Interactive Data Mining at the Speed of Thought</a><br />
<a href="http://www.linkedin.com/pub/vladimir-zelevinsky/0/742/a91">Vladimir Zelevinsky<br />
</a></li>
<li><a href="http://ils.unc.edu/hcir2012/hcir2012_submission_18.pdf">Do Users with Different Domain Knowledge Select Different Sets of Documents?</a><br />
<a href="http://comminfo.rutgers.edu/~xzhang/">Xiangmin Zhang</a>, <a href="http://www.libsci.sc.edu/fsd/liu/liu.html">Jingjing Liu</a>, <a href="http://www.linkedin.com/pub/xiaojun-yuan/41/86b/132">Xiaojun Yuan</a>, <a href="http://www.linkedin.com/pub/michael-cole/5/b19/5a9">Michael Cole</a>, <a href="http://comminfo.rutgers.edu/~belkin/belkin.html">Nicholas Belkin</a>, <a href="http://comminfo.rutgers.edu/directory/changl/index.html#.UHJyA_l25eI">Chang Liu<br />
</a></li>
<li>Predicting Task Difficulty from a User&#8217;s Moment to Moment Cognitive Effort During Information Seeking<br />
<a href="http://www.linkedin.com/pub/michael-cole/5/b19/5a9">Michael Cole</a>, <a href="http://comminfo.rutgers.edu/~jacekg/">Jacek Gwizdka</a>, <a href="http://comminfo.rutgers.edu/directory/changl/index.html#.UHJyA_l25eI">Chang Liu</a>, <a href="http://comminfo.rutgers.edu/~belkin/belkin.html">Nicholas Belkin<br />
</a></li>
<li>Effects of Domain Knowledge on User Task Performance in a Knowledge Domain Visualization System<br />
<a href="http://www.linkedin.com/pub/xiaojun-yuan/41/86b/132">Xiaojun Yuan</a>, <a href="http://www.pages.drexel.edu/~cc345/">Chaomei Chen</a>, <a href="http://comminfo.rutgers.edu/~xzhang/">Xiangmin Zhang</a>, <a href="http://www.linkedin.com/pub/joshua-avery/11/3b8/170">Joshua Avery</a>, <a href="http://www.linkedin.com/pub/tao-xu/27/338/940">Tao Xu<br />
</a></li>
<li>Investigating the Effect of Visualization on User Performance of Information Systems<br />
<a href="http://www.linkedin.com/pub/xiaojun-yuan/41/86b/132">Xiaojun Yuan</a></li>
</ul>
<p><strong>Full Paper Presentations</strong></p>
<p>The full paper presentations were split into two sessions, the first held on the 4th and the second held on the 5th. Each presentation slot was 30 minutes. The full papers will be made available soon through the <a href="http://dl.acm.org/">ACM Digital Library</a>.</p>
<ul>
<li>University of Magdeburg student <a href="http://www.findke.ovgu.de/en/nitsche.html">Marcus Nitsche</a> presented &#8220;Knowledge Journey: A Web Search Interface for Young Users&#8221;, joint work with <a href="http://www.findke.ovgu.de/Mitarbeiter/Wissenschaftliche+Mitarbeiter/Tatiana+Gossen.html">Tatiana Gossen</a> and <a href="http://wwwiti.cs.uni-magdeburg.de/~nuernb/">Andreas Nürnberger</a>. The authors performed a study in which they found that children liked having personalized avatars that offer guidance, a wheel-shaped browsing menu, and a coverflow-style results presentation. It will be interesting to see how their study holds up in larger-scale user studies, and whether adults like some of these interface elements too.</li>
<li>Oregon State University professor <a href="http://eecs.oregonstate.edu/people/jensen">Carlos Jensen</a> presented &#8220;Leyline: Provenance-Based Search Using a Graphical Sketchpad&#8221;, joint work with <a href="http://people.oregonstate.edu/~ghorashs/HomepageTestPage.html">Seyedsoroush Ghorashi</a>. I was intrigued to see a search approach focused entirely on <a href="http://en.wikipedia.org/wiki/Provenance">provenance</a> &#8212; that is, the history of a document&#8217;s ownership and transformations. I&#8217;m particularly curious about this area, since I&#8217;m a committee member for <a href="http://users.soe.ucsc.edu/~aleatha/">Aleatha Parker-Wood</a>, who is pursuing a dissertation on &#8220;<a href="http://www.cris.soe.ucsc.edu/pub/parkerwood-ssrctr-12-01.html">Making Sense of File Systems Through Provenance and Rich Metadata</a>&#8220;.</li>
<li>University of Waterloo professor <a href="http://www.mansci.uwaterloo.ca/~msmucker/">Mark Smucker</a> presented joint work with <a href="http://plg.uwaterloo.ca/~claclark/">Charlie Clarke</a> on &#8220;Modeling User Variance in Time-Biased Gain&#8221;. Their simulation-based approach produced distributions of gain that agree with distributions produced by real users. By emphasizing the effect size of differences, their approach could help uncover how much the performance differences among systems matter to real users.</li>
<li>Finally, University of North Carolina at Chapel Hill professor <a href="http://www.ils.unc.edu/~wildem/wildemuth.html">Barbara Wildemuth</a> and University of British Columbia professor <a href="http://faculty.arts.ubc.ca/lfreund/">Luanne Freund</a> delivered a highly interactive presentation on &#8220;Assigning Search Tasks Designed to Elicit Exploratory Search Behaviors&#8221;. They performed an extensive survey of information exploration literature to identify concepts that authors have used to characterize <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> tasks. They tested examples on the audience to see how well we agreed with their characterization and with one another.</li>
</ul>
<p><strong>HCIR Challenge</strong></p>
<p>With Friday morning came the most anticipated event of the Symposium: the HCIR Challenge. The Challenge is now in its third year: the <a href="http://hcir.info/hcir-2010/challenge">2010 Challenge</a> focused on historical exploration of news using the <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">New York Times Annotated Corpus</a>; the <a href="http://hcir.info/hcir-2011/challenge">2011 Challenge</a> focused on the problem of information availability using the <a href="http://citeseer.ist.psu.edu/index">CiteSeer</a> digital library of scientific literature.</p>
<p>This year, we turned to the problem of people and expertise finding, a topic of obvious <a href="http://www.linkedin.com/in/dtunkelang">personal</a> interest. We are grateful to <a href="http://www.mendeley.com/">Mendeley</a> for providing this year&#8217;s corpus: a database of over a million researcher profiles with associated metadata including published papers, academic status, disciplines, awards, and more taken from Mendeley&#8217;s network of 1.6M+ researchers and 180M+ academic documents.</p>
<p>We asked participants to build systems that could perform three kinds of tasks:</p>
<ol>
<li><strong>Hiring. </strong>Given a job description, produce a set of suitable candidates for the position.</li>
<li><strong>Assembling a Conference Program.</strong> Given a conference&#8217;s past history, produce a set of suitable candidates for keynotes, program committee members, etc. for the conference.</li>
<li><strong>Finding People to deliver Patent Research or Expert Testimony.</strong> Given a patent, produce a set of suitable candidates who could deliver relevant research or expert testimony for use in a trial. These people can be further segmented, e.g., students and other practitioners might be good at the research, while more senior experts might be more credible in high-stakes litigation.</li>
</ol>
<p>Each of the 5 teams was given 30 minutes to present.</p>
<ul>
<li>École Polytechnique Fédérale de Lausanne student <a href="http://people.epfl.ch/na.li">Na Li</a> presented &#8220;<a href="https://docs.google.com/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MmIxODMzOWYzMjQyMTVhYw">Magnifico: A Platform For Expert Mining Using Metadata</a>&#8221;, joint work with <a href="http://www.linkedin.com/pub/lei-zhou/38/29b/528">Lei Zhou</a> and <a href="http://people.epfl.ch/denis.gillet">Denis Gillet</a>. Magnifico used a modified <a href="http://en.wikipedia.org/wiki/Tf%E2%80%93idf">TF-IDF</a> approach &#8212; where the IDF is an inverse <em>discipline</em> frequency &#8212; to match search queries to topic experts. It also assigned a multi-disciplinary reputation metric based on the expertise distribution of an author&#8217;s readers.</li>
<li>Ben-Gurion University student <a href="http://www.linkedin.com/pub/dima-kagan/4b/995/b28">Dima Kagan</a> presented &#8220;<a href="https://docs.google.com/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NjJjNWU4YjM5ZjlkYjM2Ng">Social Network Based Search for Experts</a>&#8221;, joint work with <a href="http://www.linkedin.com/pub/yehonatan-bitton/34/b9a/7a5">Yehonatan Bitton</a>, <a href="http://www.linkedin.com/pub/michael-fire/22/218/772">Michael Fire</a>, <a href="http://www.linkedin.com/pub/bracha-shapira/6/61b/b4">Bracha Shapira</a>, <a href="http://www.ise.bgu.ac.il/faculty/liorr/">Lior Rokach</a>, and <a href="http://www.biu.ac.il/faculty/Judit/">Judit Bar-Ilan</a>. Their system made excellent use of additional publicly available data, cross-referencing the Mendeley user profiles with data from <a href="http://academia.edu/">Academia.edu</a> and using <a href="http://academic.research.microsoft.com/">Microsoft Academic Search</a> to categorize publications and journals. You can try out their application <a href="http://proj.ise.bgu.ac.il/15/ExpertRecommendation/index.html">here</a>.</li>
<li>University of Pittsburgh student <a href="http://www.pitt.edu/~shh69/">Shuguang Han</a> presented &#8220;<a href="https://docs.google.com/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NjQ1MDA0Zjc3ZWQ1ZGM5NQ">IRIS-IPS: An Interactive People Search System for HCIR Challenge</a>&#8221;, joint work with <a href="http://www.sis.pitt.edu/~daqing/">Daqing He</a>, <a href="http://www.sis.pitt.edu/~zyue/">Zhen Yue</a>, <a href="http://www.sis.pitt.edu/~jjiang/">Jiepu Jiang</a>, and <a href="http://www.pitt.edu/~wej9/">Wei Jeng</a>. The system used three different types of evidence to suggest candidates: expertise relevance, authority based on a <a href="http://en.wikipedia.org/wiki/PageRank">PageRank</a> algorithm applied to the co-authorship network, and social similarity using the <a href="http://en.wikipedia.org/wiki/Jaccard_index">Jaccard similarity</a> between co-authors.</li>
<li><a href="http://faculty.arts.ubc.ca/lfreund/">Luanne Freund</a> and <a href="http://www.linkedin.com/in/kristofkessler">Kristof Kessler</a>, both from the University of British Columbia, presented &#8220;<a href="https://docs.google.com/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6N2M5ZGJjOGRkNDU4MjA3OA">Exposing and exploring academic expertise with Virtu</a>&#8221;, joint work with <a href="http://www.linkedin.com/pub/michael-huggett/1/ab9/301">Michael Huggett</a> and <a href="http://www.linkedin.com/pub/edie-rasmussen/9/1a5/677">Edie Rasmussen</a>. Virtu takes a task-based approach to expertise, exposing and giving the user control over dimensions of expertise that are more or less desirable depending on the type of expert-finding task. The search interface supports information interaction and exploration through a number of browsing and filtering tools, including facets and sliders. You can try out their application <a href="http://www.diigubc.ca/virtu">here</a>.</li>
<li>UCLA student <a href="http://www.linkedin.com/in/feiliux">Fei Liu</a> presented the &#8220;<a href="https://docs.google.com/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NjI4YzU4OGI1ZGI5MmM2MQ">&#8216;iF&#8217; People Search System</a>&#8221;, an impressive solo effort. Also unique among the entries, iF is a mobile application, designed for the iPad and supporting swipe and multi-touch gestures. A very slick application, iF offered a novel approach to exploring the corpus of documents and people using the analysis of their reputations and social network relationships.</li>
</ul>
<p><strong>THE WINNER</strong>: <a href="http://www.diigubc.ca/virtu">Virtu</a>! The competition was fierce, but Virtu stood out for the compelling approach it took to offering users control over the expert-finding process. Congratulations to Luanne, Kristof, and their colleagues for their outstanding work and well-deserved honor.</p>
<p><strong>Reception</strong></p>
<p>After we wrapped up the first day of the Symposium, we walked over to the nearby <a href="http://www.techniquerestaurant.com/locations/boston.html">Technique</a>, a restaurant in the <a href="http://en.wikipedia.org/wiki/Athenaeum_Press">Athenaeum Press</a> building (home to two of Endeca&#8217;s offices in our early years) where students of <a href="http://www.chefs.edu/">Le Cordon Bleu</a> practice their culinary skills. I&#8217;m no master chef, but I certainly hope these students earned excellent grades for their performance. We enjoyed a delightful sampling of wines, appetizers, main courses, and desserts.</p>
<p><strong>Conclusion</strong></p>
<p>HCIR has been getting better every year, and this year was no exception. Many attendees in previous years had felt that the one-day format made the event feel rushed, and expanding to a second day relieved much of the time pressure. We had ample opportunity for discussion, during the presentations as well as at the coffee breaks and reception. Finally, the Challenge was our best yet, eliciting extraordinary results from the five participating teams.</p>
<p>I&#8217;m proud of how far we&#8217;ve taken HCIR in these six years, and especially grateful to co-organizers <span style="color: #000000;"><a href="http://www.ils.unc.edu/~rcapra/" rel="nofollow" target="_blank">Robert Capra</a>, <a href="http://www.fxpal.com/?p=gene" rel="nofollow" target="_blank">Gene Golovchinsky</a>, <a href="http://faculty.cua.edu/kules/" rel="nofollow">Bill Kules</a>, </span><span style="color: #000000;"><a href="http://www.catherinelsmith.com/" rel="nofollow" target="_blank">Catherine Smith</a>, and <a href="http://research.microsoft.com/en-us/um/people/ryenw/" rel="nofollow">Ryen White</a>.</span></p>
<p><span style="color: #000000;">Time to start thinking about HCIR 2013!</span></p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/10/08/hcir-2012-a-personal-report/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/10/08/hcir-2012-a-personal-report/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Office Hours at Cambridge Brewing Company</title>
		<link>http://thenoisychannel.com/2012/10/01/office-hours-at-cambridge-brewing-company/</link>
		<comments>http://thenoisychannel.com/2012/10/01/office-hours-at-cambridge-brewing-company/#comments</comments>
		<pubDate>Mon, 01 Oct 2012 13:57:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4377</guid>
		<description><![CDATA[I&#8217;ll be in Cambridge, MA this Thursday and Friday for the Sixth Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2012). Hope to see many of you there! But I&#8217;ll also have a few hours on Wednesday evening to meet people informally. If you&#8217;re interested in learning more about LinkedIn, data science or anything else, then hop [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.cambrew.com/"><img class="alignnone" title="Cambridge Brewing Company" src="http://media.cnbc.com/i/CNBC/Sections/Small_Business/_SLIDESHOWS/UniqueBrewPubs/cambridge-brewing-co.jpg" alt="" width="480" height="320" /></a></p>
<p>I&#8217;ll be in Cambridge, MA this Thursday and Friday for the Sixth Symposium on Human-Computer Interaction and Information Retrieval (<a href="http://www.hcir.info/hcir-2012">HCIR 2012</a>). Hope to see many of you there!</p>
<p>But I&#8217;ll also have a few hours on Wednesday evening to meet people informally. If you&#8217;re interested in learning more about LinkedIn, data science or anything else, then hop over to the <a href="http://www.cambrew.com/">Cambridge Brewing Company</a> on Wednesday, October 3rd. I should be there by 5pm, assuming my flight arrives on time, and I&#8217;ll plan to stay there through dinner. I&#8217;m pretty easy to <a href="http://www.linkedin.com/in/dtunkelang">contact</a>, so feel free to reach out to me through the usual channels.</p>
<p>MIT and Harvard students and faculty are especially welcome!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/10/01/office-hours-at-cambridge-brewing-company/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/10/01/office-hours-at-cambridge-brewing-company/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>HCIR 2012 Symposium: Oct 4-5 in Cambridge, MA</title>
		<link>http://thenoisychannel.com/2012/09/21/hcir-2012-symposium-oct-4-5-in-cambridge-ma/</link>
		<comments>http://thenoisychannel.com/2012/09/21/hcir-2012-symposium-oct-4-5-in-cambridge-ma/#comments</comments>
		<pubDate>Fri, 21 Sep 2012 13:00:40 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4362</guid>
		<description><![CDATA[It&#8217;s the event you&#8217;ve been waiting for: the Sixth Symposium on Human-Computer Interaction and Information Retrieval! HCIR 2012 will take place October 4th and 5th at IBM Research in Cambridge, MA. Who should attend? Researchers, practitioners, and anyone else interested in the exciting work at the intersection of HCI and IR. Areas like interactive information retrieval, exploratory search, and information visualization. [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone  wp-image-4368" title="Charles River, as seen from Cambridge, MA" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/09/Cambridge.png" alt="" width="480" height="234" /></p>
<p>It&#8217;s the event you&#8217;ve been waiting for: the Sixth Symposium on Human-Computer Interaction and Information Retrieval! <a href="http://www.hcir.info/hcir-2012">HCIR 2012</a> will take place October 4th and 5th at <a href="http://domino.research.ibm.com/cambridge/research.nsf/pages/index.html" rel="nofollow">IBM Research</a> in Cambridge, MA.</p>
<p><strong>Who should attend?</strong></p>
<p>Researchers, practitioners, and anyone else interested in the exciting work at the intersection of HCI and IR. Areas like interactive information retrieval, exploratory search, and information visualization.</p>
<p><strong>Why attend?</strong></p>
<p>You&#8217;ll enjoy a highly interactive day and a half of learning from HCIR leaders and pioneers. People like keynote speaker <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>, who literally wrote the book on <a href="http://searchuserinterfaces.com/">search user interfaces</a>. Folks from top universities and industry labs who are developing new methods and models for information seeking. And you&#8217;ll get to see the five teams competing to win the third annual <a href="http://www.hcir.info/hcir-2012">HCIR Challenge</a>, focused this year on people and expertise finding.</p>
<p><strong>How to register?</strong></p>
<p>Just click <a href="http://www.regonline.com/Register/Checkin.aspx?EventId=1138765">here</a> and fill out the information requested. The $150 registration fee includes all sessions on both days, all meeting materials, and a reception on October 4th at <a href="http://www.techniquerestaurant.com/locations/boston.html">Technique</a>. We&#8217;re grateful to our sponsors &#8212; FXPAL, IBM Research, LinkedIn, Microsoft Research, MIT CSAIL, and Oracle &#8212; for helping us keep the costs so low.</p>
<p>Capacity is limited, so please register as soon as possible to ensure your attendance.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/09/21/hcir-2012-symposium-oct-4-5-in-cambridge-ma/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/09/21/hcir-2012-symposium-oct-4-5-in-cambridge-ma/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>LinkedIn Presentations at RecSys 2012</title>
		<link>http://thenoisychannel.com/2012/09/16/linkedin-presentations-at-recsys-2012/</link>
		<comments>http://thenoisychannel.com/2012/09/16/linkedin-presentations-at-recsys-2012/#comments</comments>
		<pubDate>Sun, 16 Sep 2012 19:56:36 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4352</guid>
		<description><![CDATA[LinkedIn showed up in force at the 6th ACM International Conference on Recommender Systems (RecSys 2012)! Here are the slides from all of our presentations. Daniel Tunkelang: Content, Connections, and Context &#160; Mario Rodriguez, Christian Posse, and Ethan Zhang: Multiple Objective Optimization in Recommender Systems &#160; Anmol Bhasin: Beyond Ratings and Followers &#160; Mohammad Amin, Baoshi Yan, Sripad [...]]]></description>
				<content:encoded><![CDATA[<p>LinkedIn showed up in force at the 6th ACM International Conference on Recommender Systems (<a href="http://recsys.acm.org/2012/">RecSys 2012</a>)! Here are the slides from all of our presentations.</p>
<p><a href="http://www.linkedin.com/in/dtunkelang">Daniel Tunkelang</a>: Content, Connections, and Context<br />
<iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/14223028?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"></iframe></p>
<p>&nbsp;</p>
<p><a href="http://www.linkedin.com/in/mechanistician">Mario Rodriguez</a>, <a href="http://www.linkedin.com/in/christianposse">Christian Posse</a>, and <a href="http://www.linkedin.com/in/ethanzhang">Ethan Zhang</a>: Multiple Objective Optimization in Recommender Systems<br />
<iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/14268936?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"></iframe></p>
<p>&nbsp;</p>
<p><a href="http://www.linkedin.com/in/abhasin">Anmol Bhasin</a>: Beyond Ratings and Followers<br />
<iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/14272678?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"></iframe></p>
<p>&nbsp;</p>
<p><a href="http://www.linkedin.com/in/shafkatamin">Mohammad Amin</a>, <a href="http://www.linkedin.com/in/baoshiyan">Baoshi Yan</a>, <a href="http://www.linkedin.com/in/sripadsriram">Sripad Sriram</a>, <a href="http://www.linkedin.com/in/abhasin">Anmol Bhasin</a>, and <a href="http://www.linkedin.com/in/christianposse">Christian Posse</a>: Social Referral: Leveraging Network Connections to Deliver Recommendations</p>
<p><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/14305251" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="479" height="511"></iframe></p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/09/16/linkedin-presentations-at-recsys-2012/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/09/16/linkedin-presentations-at-recsys-2012/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>RecSys 2012: Beyond Five Stars</title>
		<link>http://thenoisychannel.com/2012/09/14/recsys-2012-beyond-five-stars/</link>
		<comments>http://thenoisychannel.com/2012/09/14/recsys-2012-beyond-five-stars/#comments</comments>
		<pubDate>Fri, 14 Sep 2012 16:28:25 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4346</guid>
		<description><![CDATA[I spent the past week in Dublin attending the 6th ACM International Conference on Recommender Systems (RecSys 2012). This young conference has become the premier global forum for discussing the state of the art in recommender systems, and I&#8217;m thrilled to have had the opportunity to participate. Sunday: Workshops The conference began on Sunday with [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://nodexlgraphgallery.org/Pages/Graph.aspx?graphID=1190"><img class="alignnone  wp-image-4347" title="#recsys2012 Activity Graph" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/09/recsys2012-Activity-Graph.png" alt="" width="512" height="353" /></a></p>
<p>I spent the past week in Dublin attending the 6th ACM International Conference on Recommender Systems (<a href="http://recsys.acm.org/2012/">RecSys 2012</a>). This young conference has become the premier global forum for discussing the state of the art in recommender systems, and I&#8217;m thrilled to have had the opportunity to participate.</p>
<p><strong>Sunday: Workshops</strong></p>
<p>The conference began on Sunday with a day of parallel <a href="http://recsys.acm.org/2012/program.html#pre-workshops">workshops</a>.</p>
<p>I attended the <a href="http://ls13-www.cs.uni-dortmund.de/homepage/rsweb2012/index.shtml">Workshop on Recommender Systems and the Social Web</a>, where I presented a keynote entitled &#8220;<a href="http://thenoisychannel.com/2012/09/09/content-connections-and-context/">Content, Connections, and Context</a>&#8221;. Major workshop themes included <a href="http://en.wikipedia.org/wiki/Folksonomy">folksonomies</a>, trust, and pinning down what we mean by &#8220;social&#8221; and &#8220;context&#8221;. The most interesting presentation was &#8220;<a href="http://userpages.uni-koblenz.de/~kunegis/paper/kunegis-online-dating-recommender-systems-the-split-complex-number-approach.pdf">Online Dating Recommender Systems: The Split-complex Number Approach</a>&#8221;, in which <a href="http://www.uni-koblenz-landau.de/campus-koblenz/fb4/west/staff/Kunegis">Jérôme Kunegis</a> modeled the dating recommendation problem (specifically, the interaction of &#8220;like&#8221; and &#8220;is-similar&#8221; relationships) using a variation of quaternions introduced in the 19th century! The full workshop program, including slides of all the presentations, is available <a href="http://ls13-www.cs.uni-dortmund.de/homepage/rsweb2012/program.shtml">here</a>.</p>
<p>Unfortunately, I was not able to attend the other workshops that day, which focused on <a href="http://recex.ist.tugraz.at/RecSysWorkshop">Human Decision Making in Recommender Systems</a>, <a href="http://cars-workshop.org/">Context-Aware Recommender Systems (CARS)</a>, and <a href="http://ir.ii.uam.es/rue2012/">Recommendation Utility Evaluation (RUE)</a>. But I did hear that <a href="http://www.linkedin.com/in/carlosgomezuribe">Carlos Gomez-Uribe</a> delivered an excellent <a href="http://ir.ii.uam.es/rue2012/keynote.html">keynote</a> at the RUE workshop on the challenges of offline and online evaluation of Netflix&#8217;s recommender systems.</p>
<p><strong>Monday: Experiments, Evaluations, and Pints All Around</strong></p>
<p>Monday started with parallel <a href="http://recsys.acm.org/2012/tutorials.html">tutorial</a> sessions. I attended <a href="http://www.usabart.nl/portfolio/#home.html">Bart Knijnenburg</a>&#8217;s tutorial on &#8220;<a href="http://www.usabart.nl/portfolio/recsys2012tutorial.pdf">Conducting User Experiments in Recommender Systems</a>&#8221;. Bart is an outstanding lecturer, and he delivered an excellent overview of the evaluation landscape. My only complaint is that there was too much material for even a 90-minute session. Fortunately, his <a href="http://www.usabart.nl/portfolio/tutorialhandouts.pdf">slides</a> are online, and perhaps he&#8217;ll be persuaded to expand them into book form. Unfortunately, I missed <a href="http://emotion-research.net/Members/MariaNunes">Maria Augusta Nunes</a> and <a href="http://hci.epfl.ch/members/rong/">Rong Hu</a>&#8217;s parallel tutorial on <a href="http://recsys.acm.org/2012/tutorials.html#personality">personality-based recommender systems</a>.</p>
<p>Then came a rousing research keynote by <a href="http://cs.stanford.edu/people/jure/">Jure Leskovec</a> on &#8220;<a href="http://i.stanford.edu/~jure/pub/talks/evals-recsys-sep12.pdf">How Users Evaluate Things and Each Other in Social Media</a>&#8221;. I won&#8217;t try to summarize the keynote here &#8212; the slides of this and other presentations are available online. But the point Jure made that attracted the most interest was that voting is so predictable that results are determined mostly by turnout. Aside from the immediate applications of this observation to the <a href="http://fivethirtyeight.blogs.nytimes.com/">US presidential elections</a>, there are many research and practical questions about how to obtain or incent a representative participant pool &#8212; <a href="http://thenoisychannel.com/2010/10/10/pluralistic-ignorance-and-bayesian-truth-serum/">a topic I&#8217;ve been passionate about</a> for a long time.</p>
<p>The program continued with research presentations on <a href="http://recsys.acm.org/2012/program.html#multi">multi-objective recommendation</a> and <a href="http://recsys.acm.org/2012/program.html#social">social recommendations</a>. I may be biased, but my favorite presentation was the work that my colleague <a href="http://www.linkedin.com/in/mechanistician">Mario Rodriguez</a> presented on multiple-objective optimization in LinkedIn&#8217;s recommendation systems. I&#8217;ll post the slides and paper here as soon as they are available.</p>
<p>Monday night, we went to the <a href="http://www.guinness-storehouse.com/en/Index.aspx">Guinness Storehouse</a> for a tour that culminated with fresh pints of Guinness in the <a href="http://www.guinness-storehouse.com/en/EventSpaces_GravityBar.aspx">Gravity Bar</a> overlooking the city. We&#8217;re all grateful to <a href="http://en.wikipedia.org/wiki/William_Sealy_Gosset">William Gosset</a>, a chemist working for the Guinness brewery when he introduced the now ubiquitous <a href="http://en.wikipedia.org/wiki/Student's_t-test">t-test</a> in 1908 as a way to monitor the quality of his product. A toast to statistics and to great beer!</p>
<p><strong>Tuesday: Math, Posters, and Dancing</strong></p>
<p>Tuesday started with another pair of parallel <a href="http://recsys.acm.org/2012/tutorials.html">tutorial</a> sessions. I attended <a href="http://xavier.amatriain.net/">Xavier Amatriain</a>&#8217;s tutorial on &#8220;<a href="http://recsys.acm.org/2012/tutorials.html#building">Building Industrial-scale Real-world Recommender Systems</a>&#8221; at Netflix. It was an excellent presentation, especially considering that Xavier had just come from a transatlantic flight! A major theme in his presentation was that Netflix is moving beyond the emphasis on user ratings to make the interaction with the user more transparent and <a href="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/">conversational</a>. Unfortunately, I had to miss the parallel tutorial on &#8220;<a href="http://recsys.acm.org/2012/tutorials.html#best">The Challenge of Recommender Systems Challenges</a>&#8221; by <a href="http://www.dai-labor.de/team/alan.said">Alan Said</a>, <a href="http://www.gravityrd.com/page/11-management?lang=en">Domonkos Tikk</a>, and <a href="http://www.is.informatik.uni-wuerzburg.de/staff/hotho/">Andreas Hotho</a>.</p>
<p>Tuesday continued with research papers on <a href="http://recsys.acm.org/2012/program.html#implicit">implicit feedback</a> and <a href="http://recsys.acm.org/2012/program.html#contextual">context-aware recommendations</a>. One that drew particular interest was Daniel Kluver&#8217;s information-theoretical work to quantify the preference information contained in ratings and predictions, measured in preference bits per second (paper available <a href="http://dl.acm.org/citation.cfm?id=2365974">here</a> for ACM DL subscribers). And <a href="http://www.sze.hu/~gtakacs/">Gabor Takacs</a> had the day&#8217;s best line with &#8220;if you don&#8217;t like math, leave the room.&#8221; <a href="http://twitter.yfrog.com/z/h7f8texj">He wasn&#8217;t kidding!</a></p>
<p>Then came the <a href="http://recsys.acm.org/2012/posters_and_demos.html">posters and demos</a> &#8212; first a &#8220;slam&#8221; session where each author could make a 60-second pitch, and then two hours for everyone to interact with the authors while enjoying <a href="http://data.linkedin.com/">LinkedIn</a>-sponsored drinks. There were lots of great posters, but my favorite was <a href="http://www.linkedin.com/in/elehack">Michael Ekstrand</a>&#8217;s &#8220;<a href="http://grouplens.org/recsys2012-error-analysis">When Recommenders Fail: Predicting Recommender Failure for Algorithm Selection and Combination</a>&#8221;.</p>
<p>Tuesday night we had a delightful banquet capped by a performance of traditional Irish step dancing. The dancers, girls ranging from 4 to 18 years old, were extraordinary. I&#8217;m sorry I didn&#8217;t capture any of the performance on camera, and I&#8217;m hoping someone else did.</p>
<p><strong>Wednesday: Industry Track and a Grand Finale</strong></p>
<p>Wednesday morning we had the <a href="http://recsys.acm.org/2012/industry_track.html">industry track</a>. I&#8217;m biased as a <a href="http://recsys.acm.org/2012/organisation.html">co-organizer</a>, but I heard <a href="https://twitter.com/zenogantner/status/245816006793125888">resounding</a> <a href="https://twitter.com/phelo/status/245816081451712512">feedback</a> that the industry track was the highlight of the conference. I was very impressed with the presentations by senior technologists at Facebook, Yahoo, StumbleUpon, LinkedIn, Microsoft, and Echo Nest. And <a href="https://www.linkedin.com/in/ronnyk">Ronny Kohavi</a>&#8217;s keynote on &#8220;<a href="http://www.exp-platform.com/Pages/2012RecSys.aspx">Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics</a>&#8221; was a masterpiece. I encourage you to look at the <a href="http://www.slideshare.net/tag/recsys2012">slides</a> for all of these excellent presentations.</p>
<p>Afterward came the last two research sessions, which included the <a href="http://recsys.acm.org/2012/awards.html">best-paper</a> awardee &#8220;<a href="http://www.ci.tuwien.ac.at/~alexis/Publications_files/climf-recsys12.pdf">CLiMF: Learning to Maximize Reciprocal Rank with Collaborative Less-is-More Filtering</a>&#8221;. I&#8217;ve been a fan of &#8220;less is more&#8221; ever since seeing Harr Chen present <a href="http://people.csail.mit.edu/harr/papers/sigir2006.ppt">a paper with that title</a> at SIGIR 2006, and I&#8217;m delighted to see these concepts making their way to the RecSys community. In fact, I saw some other ideas, like <a href="http://en.wikipedia.org/wiki/Learning_to_rank">learning to rank</a>, crossing over from IR to RecSys, and I believe this cross-pollination benefits both fields. Finally, I really enjoyed the last research presentation of the conference, in which <a href="http://www.linkedin.com/in/smritibhagat">Smriti Bhagat</a> talked about <a href="http://paloalto.thlab.net/uploads/papers/paper_1.pdf">inferring and obfuscating user demographics</a> based on ratings. The technical and ethical facets of inferring private data are topics <a href="http://jimadler.me/post/18618791545/strata-2012-is-privacy-a-big-data-prison">close to my heart</a>.</p>
<p>Finally, next year&#8217;s hosts exhorted this year&#8217;s participants to come to Hong Kong for <a href="http://recsys.hosting.acm.org/recsys13/">RecSys 2013</a>, and we heard the final conference presentation: Neal Lathia&#8217;s 100-euro-winning entry in the <a href="http://acmrecsys.wordpress.com/2011/10/26/the-recsys-2012-limerick-challenge/">RecSys Limerick Challenge</a>.</p>
<p><strong>Thursday: Flying Home</strong></p>
<p>Sadly, I missed the last day of conference-related activity: the <a href="http://recsys.acm.org/2012/doctoral_symposium.html">doctoral symposium</a>, the <a href="http://2012.recsyschallenge.com/">RecSys Data Challenge</a>, and additional <a href="http://recsys.acm.org/2012/program.html#post-workshops">workshops</a>. I&#8217;m looking forward to seeing discussion of these online, as well as reviewing the very active <a href="https://twitter.com/i/#!/search/%23recsys2012">#recsys2012</a> tweet stream.</p>
<p>All in all, it was an excellent conference. LinkedIn, Netflix, and other industry participants comprised about a third of attendees, and there was a strong conversation bridging the gap between academic research and industry practice. I appreciate the focus on the nuances of evaluation, particularly the challenges of combining offline evaluation with online testing, and ensuring that the participant pool is robust. The one topic where I would have liked to see more discussion was that of creating robust incentives for people to participate in recommender systems. Maybe next year in Hong Kong?</p>
<p>Oh, and we&#8217;re <a href="http://data.linkedin.com/team">hiring</a>!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/09/14/recsys-2012-beyond-five-stars/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/09/14/recsys-2012-beyond-five-stars/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Content, Connections, and Context</title>
		<link>http://thenoisychannel.com/2012/09/09/content-connections-and-context/</link>
		<comments>http://thenoisychannel.com/2012/09/09/content-connections-and-context/#comments</comments>
		<pubDate>Sun, 09 Sep 2012 14:52:48 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4341</guid>
		<description><![CDATA[This is the keynote presentation I delivered at the Workshop on Recommender Systems and the Social Web, held as part of the 6th ACM International Conference on Recommender Systems (RecSys 2012): Content, Connections, and Context  Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context. Content [...]]]></description>
				<content:encoded><![CDATA[<p><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/14223028?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"></iframe></p>
<p>This is the keynote presentation I delivered at the <a href="http://ls13-www.cs.uni-dortmund.de/homepage/rsweb2012/index.shtml">Workshop on Recommender Systems and the Social Web</a>, held as part of the 6th ACM International Conference on Recommender Systems (<a href="http://recsys.acm.org/2012/">RecSys 2012</a>):</p>
<p style="padding-left: 30px;"><strong>Content, Connections, and Context </strong></p>
<p style="padding-left: 30px;">Recommender systems for the social web combine three kinds of signals to relate the subject and object of recommendations: content, connections, and context.</p>
<p style="padding-left: 30px;">Content comes first &#8211; we need to understand what we are recommending and to whom we are recommending it in order to decide whether the recommendation is relevant. Connections supply a social dimension, both as inputs to improve relevance and as social proof to explain the recommendations. Finally, context determines where and when a recommendation is appropriate.</p>
<p style="padding-left: 30px;">I&#8217;ll talk about how we use these three kinds of signals in LinkedIn&#8217;s recommender systems, as well as the challenges we see in delivering social recommendations and measuring their relevance.</p>
<p>When I&#8217;m back from Dublin, I promise to blog about my impressions and reflections from the conference. In the meantime, I hope you enjoy the slides!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/09/09/content-connections-and-context/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/09/09/content-connections-and-context/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>LinkedIn at RecSys 2012</title>
		<link>http://thenoisychannel.com/2012/09/04/linkedin-at-recsys-2012/</link>
		<comments>http://thenoisychannel.com/2012/09/04/linkedin-at-recsys-2012/#comments</comments>
		<pubDate>Tue, 04 Sep 2012 13:45:45 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4331</guid>
		<description><![CDATA[LinkedIn is an industry leader in the area of recommender systems &#8211; a place where big data meets clever algorithms and content meets social. If you&#8217;re one of the 175M+ people using LinkedIn, you&#8217;ve probably noticed some of our recommendation products, such as People You May Know, Jobs You Might Be Interested In, and LinkedIn Today. So it&#8217;s [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://recsys.acm.org/2012/"><img class="alignnone" title="RecSys 2012" src="http://recsys.acm.org/2012/images/header.png" alt="" width="484" height="158" /></a></p>
<p>LinkedIn is an industry leader in the area of <a href="http://en.wikipedia.org/wiki/Recommender_system">recommender systems</a> &#8211; a place where big data meets clever algorithms and content meets social. If you&#8217;re one of the 175M+ people using LinkedIn, you&#8217;ve probably noticed some of our recommendation products, such as <a href="http://www.linkedin.com/people/pymk">People You May Know</a>, <a href="http://www.linkedin.com/jsearch/rec">Jobs You Might Be Interested In</a>, and <a href="http://www.linkedin.com/today/">LinkedIn Today</a>.</p>
<p>So it&#8217;s no surprise we&#8217;re participating in the 6th ACM International Conference on Recommender Systems (<a href="http://recsys.acm.org/2012/">RecSys 2012</a>), which will take place in Dublin next week.</p>
<p>Here&#8217;s a preview:</p>
<ul>
<li>Sunday, September 9: I&#8217;ll be presenting a keynote at the <a href="http://ls13-www.cs.uni-dortmund.de/homepage/rsweb2012/index.shtml">Workshop on Recommender Systems and the Social Web</a> entitled &#8220;<a href="http://ls13-www.cs.uni-dortmund.de/homepage/rsweb2012/invitedtalk.shtml">Content, Connections, and Context</a>&#8221;. <a href="http://www.linkedin.com/in/abhasin">Anmol Bhasin</a> will be a panelist at the <a href="http://cars-workshop.org/">Workshop on Context-Aware Recommender Systems</a>.</li>
<li>Monday, September 10: <a href="http://www.linkedin.com/in/mechanistician">Mario Rodriguez</a> will present work he&#8217;s done with <a href="http://www.linkedin.com/in/christianposse">Christian Posse</a> and <a href="http://www.linkedin.com/in/ethanzhang">Ethan Zhang</a> on &#8220;<a href="http://recsys.acm.org/2012/program.html#multi">Multiple Objective Optimization in Recommendation Systems</a>&#8221;.</li>
<li>Tuesday, September 11: LinkedIn is sponsoring the <a href="http://recsys.acm.org/2012/posters_and_demos.html">poster/demo session</a>. Check out the session and stop by our booth!</li>
<li>Wednesday, September 12: <a href="http://www.linkedin.com/in/abhasin">Anmol Bhasin</a> will deliver an invited talk on &#8220;<a href="http://recsys.acm.org/2012/industry_track.html#anmol" target="_self">Recommender Systems &amp; The Social Web</a>&#8221; as part of the <a href="http://thenoisychannel.com/2012/06/28/recsys-2012-industry-track/">industry track</a> that <a href="http://www.linkedin.com/pub/yehuda-koren/7/614/856">Yehuda Koren</a> and I are co-chairing.</li>
</ul>
<p>I hope to see many of you at the conference, especially if you&#8217;re interested in learning about opportunities to work on recommendation systems and related areas at LinkedIn. And perhaps you can provide your own recommendations &#8212; specifically, local pubs where we can take in the local spirit.</p>
<p>See you in Dublin. Sláinte!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/09/04/linkedin-at-recsys-2012/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/09/04/linkedin-at-recsys-2012/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Panos Ipeirotis talking at LinkedIn about Crowdsourcing!</title>
		<link>http://thenoisychannel.com/2012/08/30/panos-ipeirotis-talking-at-linkedin-about-crowdsourcing/</link>
		<comments>http://thenoisychannel.com/2012/08/30/panos-ipeirotis-talking-at-linkedin-about-crowdsourcing/#comments</comments>
		<pubDate>Thu, 30 Aug 2012 15:34:52 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4323</guid>
		<description><![CDATA[Sharing knowledge is part of our core culture at LinkedIn, whether it’s through hackdays or contributions to open-source projects. We actively participate in academic conferences, such as KDD, SIGIR, RecSys, and CIKM, as well as industry conferences like QCON and Strata. Beyond sharing our own knowledge, we provide a platform for researchers and practitioners to share their insights with the technical community. We [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://people.stern.nyu.edu/panos/"><img class="alignnone" title="Panos Ipeirotis" src="http://pages.stern.nyu.edu/~panos/panos.jpg" alt="" width="220" height="255" /></a></p>
<p>Sharing knowledge is part of our core culture at LinkedIn, whether it’s through <a href="http://engineering.linkedin.com/tags/hackday" target="_blank">hackdays</a> or contributions to <a href="https://engineering.linkedin.com/tags/open-source">open-source projects</a>. We actively participate in academic conferences, such as <a href="http://kdd2012.sigkdd.org/" target="_blank">KDD</a>, <a href="http://www.sigir.org/sigir2012/" target="_blank">SIGIR</a>, <a href="http://recsys.acm.org/2012/" target="_blank">RecSys</a>, and <a href="http://www.cikm2012.org/">CIKM</a>, as well as industry conferences like <a href="http://qconlondon.com/london-2012/" target="_blank">QCON</a> and <a href="http://strataconf.com/public/content/home" target="_blank">Strata</a>.</p>
<p>Beyond sharing our own knowledge, we provide a platform for researchers and practitioners to share their insights with the technical community. We host a <a href="http://www.youtube.com/linkedintechtalks" target="_blank">Tech Talk series</a> at our Mountain View headquarters that we open up to the general public. Some of our recent speakers include Coursera founders <a href="http://blog.linkedin.com/2012/06/14/coursera-tech-talk/" target="_blank">Daphne Koller and Andrew Ng</a>, UC-Berkeley professor <a href="http://events.linkedin.com/programming-for-distributed-consistency-1010974" target="_blank">Joe Hellerstein</a>, and Hadapt Chief Scientist <a href="http://engineering.linkedin.com/hadoop/recap-improving-hadoop-performance-1000x" target="_blank">Daniel Abadi</a>. It&#8217;s an excellent opportunity for people with shared professional interests to reconnect with people they know, as well as make new connections. For those who cannot attend, we offer a <a href="http://www.ustream.tv/channel/linkedin-techtalks" target="_blank">live stream</a>.</p>
<p>Our next talk will be by <a href="http://people.stern.nyu.edu/panos/" target="_blank">Panos Ipeirotis</a>, a professor at NYU and one of the world&#8217;s top experts on crowdsourcing. Here is a full description:</p>
<p style="padding-left: 30px;"><strong>Crowdsourcing: Achieving Data Quality with Imperfect Humans</strong><br />
Friday, September 7, 2012 at 3:00 PM<br />
LinkedIn (<a href="https://maps.google.com/maps?q=2029+Stierlin+Ct,+Mountain+View,+CA+94043">map</a>)</p>
<p style="padding-left: 30px;">Crowdsourcing is a great tool to collect data and support machine learning &#8212; it is the ultimate form of outsourcing. But crowdsourcing introduces budget and quality challenges that must be addressed to realize its benefits.</p>
<p style="padding-left: 30px;">In this talk, I will discuss the use of crowdsourcing for building robust machine learning models quickly and under budget constraints. I&#8217;ll operate under the realistic assumption that we are processing imperfect labels that reflect random and systematic error on the part of human workers. I will also describe our &#8220;beat the machine&#8221; system, which engages humans to improve a machine learning system by discovering cases where the machine fails, and fails while confident of being correct. I&#8217;ll use classification problems that arise in online advertising.</p>
<p style="padding-left: 30px;">Finally, I&#8217;ll discuss our latest results showing that mice and Mechanical Turk workers are not that different after all.</p>
<p style="padding-left: 30px;">Panos Ipeirotis is an Associate Professor and George A. Kellner Faculty Fellow at the Department of Information, Operations, and Management Sciences at Leonard N. Stern School of Business of New York University. His recent research interests focus on crowdsourcing and on mining user-generated content on the Internet. He received his Ph.D. degree in Computer Science from Columbia University in 2004, with distinction. He has received three “Best Paper” awards (IEEE ICDE 2005, ACM SIGMOD 2006, WWW 2011), two “Best Paper Runner Up” awards (JCDL 2002, ACM KDD 2008), and is also a recipient of a CAREER award from the National Science Foundation.</p>
<p>If you&#8217;re in the Bay Area, I encourage you to attend in person &#8212; Panos is a great speaker, and it&#8217;s also a great opportunity to <a href="http://www.linkedin.com/" target="_blank">network</a> with other attendees. If not, then you can follow on the <a href="http://www.ustream.tv/channel/linkedin-techtalks" target="_blank">live stream</a>.</p>
<p>The event is free, but please sign up on the <a href="http://events.linkedin.com/crowdsourcing-achieving-data-quality-1070763" target="_blank">event page</a>. See you next week!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/08/30/panos-ipeirotis-talking-at-linkedin-about-crowdsourcing/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/08/30/panos-ipeirotis-talking-at-linkedin-about-crowdsourcing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Data Werewolves</title>
		<link>http://thenoisychannel.com/2012/08/23/data-werewolves/</link>
		<comments>http://thenoisychannel.com/2012/08/23/data-werewolves/#comments</comments>
		<pubDate>Fri, 24 Aug 2012 04:40:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4311</guid>
		<description><![CDATA[Thank you Scott Adams for the free advertising. Of course, LinkedIn is the place to find data werewolves. Want to find more data werewolves? Check out my team! Don&#8217;t worry, they only bite when they&#8217;re hungry.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.dilbert.com/strips/comic/2012-08-23/"><img class="alignnone size-full wp-image-4312" title="Data Werewolves of LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/08/DataWerewolves.jpeg" alt="" width="500" height="155" /></a></p>
<p>Thank you <a href="http://dilbert.com/">Scott Adams</a> for the free advertising. Of course, <a href="http://www.linkedin.com/search/fpsearch?type=people&amp;keywords=data+werewolf">LinkedIn</a> is the place to find data werewolves.</p>
<p><a href="http://www.linkedin.com/in/joyce"><img class="alignnone  wp-image-4314" style="border: 1px solid black;" title="Joyce Wang, Data Werewolf" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/08/Joyce.png" alt="" width="225" height="79" /></a> <a href="http://www.linkedin.com/in/suryasev"><img class="alignnone  wp-image-4315" style="border: 1px solid black;" title="Sal Uryasev, Data Werewolf" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/08/Sal.png" alt="" width="225" height="79" /></a></p>
<p>Want to find more data werewolves? Check out my <a href="http://thenoisychannel.com/2012/05/17/data-science-at-linkedin-my-team/">team</a>! Don&#8217;t worry, they only bite when they&#8217;re hungry.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/08/23/data-werewolves/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/08/23/data-werewolves/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Matt Lease: Recent Adventures in Crowdsourcing and Human Computation</title>
		<link>http://thenoisychannel.com/2012/08/20/matt-lease-recent-adventures-in-crowdsourcing-and-human-computation/</link>
		<comments>http://thenoisychannel.com/2012/08/20/matt-lease-recent-adventures-in-crowdsourcing-and-human-computation/#comments</comments>
		<pubDate>Tue, 21 Aug 2012 04:01:45 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4304</guid>
		<description><![CDATA[Today we (specifically, my colleague Daria Sorokina) had the pleasure of hosting UT-Austin professor Matt Lease at LinkedIn to give a talk on his &#8220;Recent Adventures in Crowdsourcing and Human Computation&#8221;. It was a great talk, and the slides above are full of references to research that he and his colleagues have done in this [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/14022481" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe></p>
<p>Today we (specifically, my colleague <a href="http://www.linkedin.com/in/dariasorokina">Daria Sorokina</a>) had the pleasure of hosting UT-Austin professor <a href="http://www.ischool.utexas.edu/~ml/">Matt Lease</a> at LinkedIn to give a talk on his &#8220;<a href="http://www.slideshare.net/mattlease/recent-adventures-in-crowdsourcing-and-human-computation">Recent Adventures in Crowdsourcing and Human Computation</a>&#8221;. It was a great talk, and the slides above are full of references to research that he and his colleagues have done in this area. A great resource for people interested in the theory and practice of crowdsourcing!</p>
<p>If you are interested in learning more about crowdsourcing, then sign up for an upcoming LinkedIn tech talk by NYU professor <a href="http://people.stern.nyu.edu/panos/">Panos Ipeirotis</a> on &#8220;<a href="http://events.linkedin.com/crowdsourcing-achieving-data-quality-1070763">Crowdsourcing: Achieving Data Quality with Imperfect Humans</a>&#8221;.</p>
<p>And if you&#8217;re already an expert, then perhaps you&#8217;d like to <a href="http://www.linkedin.com/jobs?viewJob=&#038;jobId=3559257">work on crowdsourcing at LinkedIn</a>!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/08/20/matt-lease-recent-adventures-in-crowdsourcing-and-human-computation/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/08/20/matt-lease-recent-adventures-in-crowdsourcing-and-human-computation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WTF! @ k: Measuring Ineffectiveness</title>
		<link>http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/</link>
		<comments>http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/#comments</comments>
		<pubDate>Mon, 20 Aug 2012 13:00:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4277</guid>
		<description><![CDATA[At SIGIR 2004, Ellen Voorhees presented a paper entitled &#8220;Measuring Ineffectiveness&#8221; in which she asserted: Using average values of traditional evaluation measures [for information retrieval systems] is not an appropriate methodology because it emphasizes effective topics: poorly performing topics’ scores are by definition small, and they are therefore difficult to distinguish from the noise inherent [...]]]></description>
				<content:encoded><![CDATA[<p>At <a href="http://www.sigir.org/sigir2004/">SIGIR 2004</a>, <a href="http://www.linkedin.com/pub/ellen-voorhees/6/115/3b8">Ellen Voorhees</a> presented a paper entitled &#8220;<a href="http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirVoorhees2004.pdf">Measuring Ineffectiveness</a>&#8221; in which she asserted:</p>
<blockquote><p>Using average values of traditional evaluation measures [for information retrieval systems] is not an appropriate methodology because it emphasizes effective topics: poorly performing topics’ scores are by definition small, and they are therefore difficult to distinguish from the noise inherent in retrieval evaluation.</p></blockquote>
<p>Ellen is one of the world&#8217;s top researchers in the field of information retrieval evaluation. And for those not familiar with <a href="http://trec.nist.gov/">TREC</a> terminology, &#8220;topics&#8221; are the queries used to evaluate information retrieval systems. So what she&#8217;s saying above is that, in order to evaluate systems effectively, we need to focus more on failures than on successes.</p>
<p>Specifically, she proposed that we judge information retrieval system performance by measuring the percentage of topics (i.e., queries) with no relevant results in the top 10 retrieved (%no), a measure that was then adopted by the <a href="http://trec.nist.gov/data/robust.html">TREC robust retrieval track</a>.</p>
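<p>To make the %no measure concrete, here is a minimal Python sketch. It assumes relevance judgments are already available as one ranked list of booleans per topic; the function name <code>percent_no</code> and the sample data are hypothetical illustrations, not anything taken from the paper or from TREC tooling.</p>

```python
def percent_no(judged_runs, k=10):
    """Percentage of topics with no relevant result in the top k.

    judged_runs maps each topic (query) to its results in ranked order,
    where True means the result was judged relevant. This input format
    is an assumption made for illustration.
    """
    if not judged_runs:
        raise ValueError("need at least one topic")
    # A topic counts as a failure when none of its top-k results is relevant.
    failures = sum(1 for ranked in judged_runs.values() if not any(ranked[:k]))
    return 100.0 * failures / len(judged_runs)


# Hypothetical judgments for four topics.
runs = {
    "nlp": [False, True, False],
    "peter kim": [False, False, False],
    "crowdsourcing": [True],
    "hcir": [False] * 12,
}
print(percent_no(runs))  # two of the four topics fail, so this prints 50.0
```

<p>Note that a topic with a relevant result only below rank k still counts as a failure, which is exactly what makes the measure sensitive to the worst-performing queries.</p>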
<p><strong>Information Retrieval in the Wild</strong></p>
<p><a href="http://en.wikipedia.org/wiki/Information_retrieval">Information retrieval</a> (aka search) in the wild is a bit different from information retrieval in the lab. We don&#8217;t have a gold standard of human <a href="http://trec.nist.gov/data/reljudge_eng.html">relevance judgements</a> against which we can compare search engine results. And even if we can assemble a representative collection of test queries, it isn&#8217;t economically plausible to assemble this gold standard for a large document corpus where each query can have thousands &#8212; even millions &#8212; of relevant results.</p>
<p>Moreover, the massive growth of the internet and the advent of social networks have changed the landscape of information retrieval. The idea that the relationship between a document and a search query would be sufficient to determine relevance was always a <a href="http://thenoisychannel.com/2009/01/08/google-tech-talk-reconsidering-relevance/">crude approximation</a>, but now the diversity of a global user base makes this approximation even cruder.</p>
<p>For example, consider this query on Google for [<a href="https://www.google.com/search?q=nlp">nlp</a>]:</p>
<p><a href="https://www.google.com/search?q=nlp"><img class="alignnone  wp-image-4285" title="Google search for [nlp]" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/08/nlp1.png" alt="" width="500" height="268" /></a></p>
<p>Hopefully Google&#8217;s <a href="http://searchengineland.com/schmidt-listing-googles-200-ranking-factors-would-reveal-business-secrets-51065">hundreds of ranking factors</a> &#8212; and all of you &#8212;  know me well enough to know that, when I say NLP, I&#8217;m probably referring to <a href="http://en.wikipedia.org/wiki/Natural_language_processing">natural language processing</a> rather than <a href="http://en.wikipedia.org/wiki/Neuro-linguistic_programming">neuro-linguistic programming</a>. Still, it&#8217;s an understandable mistake &#8212; the latter NLP sells a lot more <a href="http://www.amazon.com/s/field-keywords=nlp">books</a>.</p>
<p>And search in the context of a social network makes the user&#8217;s identity and task context key factors for determining relevance &#8212; factors that are uniquely available to each user. For example, if I search on LinkedIn for [peter kim], the search engine cannot know for certain whether I&#8217;m looking for my former co-worker, a celebrity I&#8217;m connected to, a current co-worker who is a 2nd-degree connection, or someone else entirely.</p>
<p><a href="http://www.linkedin.com/search/fpsearch?type=people&#038;keywords=peter+kim"><img class="alignnone size-full wp-image-4290" title="LinkedIn search for [peter kim]" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/08/peter-kim.png" alt="" width="500" height="322" /></a></p>
<p>In short, we cannot rely on human relevance judgments to determine if we are delivering users the most relevant results.</p>
<p><strong>From %no to WTF! @ k</strong></p>
<p>But human judgments can still provide enormous value for evaluating search engine and recommender system performance. Even if we can&#8217;t use them to distinguish the most relevant results, we can identify situations where we are delivering glaringly irrelevant results. Situations where the user&#8217;s natural reaction is &#8220;<a href="http://www.urbandictionary.com/define.php?term=wtf">WTF!</a>&#8221;.</p>
<p>People understand that search engines and recommender systems aren&#8217;t mind readers. We humans recognize that computers make mistakes, much as other people do. To err, after all, is human.</p>
<p>What we don&#8217;t forgive &#8212; especially from computers &#8212; are seemingly inexplicable mistakes that any reasonable person would be able to recognize.</p>
<p>I&#8217;m not going to single out any sites to provide examples. I&#8217;m sure you are familiar with the experience of a search engine or recommender system returning a result that makes you want to scream &#8220;WTF!&#8221;. I may even bear some responsibility, in which case I apologize. Besides, everyone is entitled to the occasional mistake.</p>
<p>But I&#8217;m hard-pressed to come up with a better measure to optimize (i.e., minimize) than WTF! @ k &#8212; that is, the number of top-k results that elicit a WTF! reaction. The value of k depends on the application. For a search engine, k = 10 could correspond to the first page of results. For a recommender system, k is probably smaller, e.g., 3.</p>
<p>Also, the system can substantially mitigate the risk of WTF! results by providing explanations for results and making the information-seeking process more of a <a href="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/">conversation with the user</a>.</p>
<p><strong>Measuring WTF! @ k</strong></p>
<p>Hopefully you agree that we should strive to minimize WTF! @ k. But, as <a href="http://en.wikipedia.org/wiki/William_Thomson,_1st_Baron_Kelvin">Lord Kelvin</a> tells us, if you can&#8217;t measure it, then you can&#8217;t improve it. How do we measure WTF! @ k?</p>
<p>On one hand, we cannot rely on click behavior to measure it implicitly. All non-clicks look the same, and we can&#8217;t tell which ones were WTF! results. In fact, egregiously irrelevant results may inspire clicks out of sheer curiosity. One of the phenomena that search engines watch out for is an unusually high click-through rate &#8212; those clicks often signal something other than relevance, like a racy or offensive result.</p>
<p>On the other hand, we can measure WTF! @ k with human judgments. A rater does not need to have the personal and task context of a user to evaluate whether a result is at least plausibly relevant. WTF! @ k is thus a measure that is amenable to <a href="http://en.wikipedia.org/wiki/Crowdsourcing">crowdsourcing</a>, a technique that both <a href="https://plus.google.com/109412257237874861202/posts/PtwJXhJaVD5">Google</a> and <a href="http://searchengineland.com/bing-search-quality-rating-guidelines-130592">Bing</a> use to improve search quality. As does LinkedIn, and we are hiring a <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=3559257">program manager for crowdsourcing</a>.</p>
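<p>As a rough illustration of how such judgments could be turned into a number, here is a minimal Python sketch of WTF! @ k. It assumes each result already carries a single boolean flag from a rater; the function names, the per-query averaging, and the sample data are my own hypothetical choices rather than a description of any production system.</p>

```python
def wtf_at_k(wtf_flags, k):
    """Number of top-k results that a rater flagged as a WTF! result.

    wtf_flags lists the results in ranked order, where True means the
    rater judged the result glaringly irrelevant (assumed input format).
    """
    return sum(1 for flag in wtf_flags[:k] if flag)


def mean_wtf_at_k(flags_by_query, k):
    """Average WTF! @ k over a set of queries (averaging is an assumption)."""
    if not flags_by_query:
        raise ValueError("need at least one query")
    total = sum(wtf_at_k(flags, k) for flags in flags_by_query.values())
    return total / len(flags_by_query)


# Hypothetical rater flags for two queries.
judgments = {
    "nlp": [False, False, True, False],
    "peter kim": [False, True, True, False],
}
print(wtf_at_k(judgments["nlp"], 3))   # one flagged result in the top 3
print(mean_wtf_at_k(judgments, 3))     # (1 + 2) / 2 = 1.5
```

<p>With several raters per result, the flags could be aggregated first (for example by majority vote) before counting; that step is left out here to keep the sketch short.</p>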
<p><strong>Conclusion</strong></p>
<p>As information retrieval systems become increasingly personalized and task-centric, I hope we will see more people using measures like WTF! @ k to evaluate their performance, as well as working to make results more explainable. After all, no one likes <a href="http://thenoisychannel.com/2010/09/07/slouching-toward-creepiness/">hurting their computer&#8217;s feelings</a> by screaming WTF! at it.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/08/20/wtf-k-measuring-ineffectiveness/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>Hiring: Taking It Personally</title>
		<link>http://thenoisychannel.com/2012/08/01/hiring-taking-it-personally/</link>
		<comments>http://thenoisychannel.com/2012/08/01/hiring-taking-it-personally/#comments</comments>
		<pubDate>Thu, 02 Aug 2012 06:09:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4272</guid>
		<description><![CDATA[As a manager, I&#8217;ve found that I mostly have two jobs: bringing great people onto the team, and creating the conditions for their success. The second job is the reason I became a manager &#8212; there&#8217;s nothing more satisfying than seeing people achieve greatness in both the value they create and their own professional development. [...]]]></description>
				<content:encoded><![CDATA[<p>As a manager, I&#8217;ve found that I mostly have two jobs: bringing great people onto the team, and creating the conditions for their success. The second job is the reason I became a manager &#8212; there&#8217;s nothing more satisfying than seeing people achieve greatness in both the value they create and their own professional development.</p>
<p>But the first step is getting those people on your team. And hiring great people is hard, even when you and your colleagues are building the <a href="http://www.linkedin.com/hiring">world&#8217;s best hiring solutions</a>! By definition, the best people are scarce and highly sought after.</p>
<p>At the risk of giving away my competitive edge, I&#8217;d like to offer a word of advice to hiring managers: take it personally. That is, make the hiring process all about the people you&#8217;re trying to hire and the people on your team.</p>
<p>How does that work in practice? It means that everyone on the team participates in every part of the hiring process &#8212; from sourcing to interviewing to closing. A candidate interviews with the team he or she will work with, so everyone is invested in the process. The interview questions reflect the real problems the candidate would work on. And interviews communicate culture in both directions &#8212; by the end of the interviews, it&#8217;s clear to both the interviewers and the candidate whether they would enjoy working together.</p>
<p>I&#8217;ve seen and been part of impersonal hiring processes. And I  understand how the desire to build a scalable process can lead to a bureaucratic, assembly-line approach. But I wholeheartedly reject it. Hiring is fundamentally about people, and that means making the process a human one for everyone involved.</p>
<p>And taking it personally extends to sourcing. Earlier this week, the LinkedIn data science <a href="http://thenoisychannel.com/2012/05/17/data-science-at-linkedin-my-team/">team</a> hosted a happy hour for folks interested in learning more about us and our work. Of course we used our own technology to identify amazing candidates, but I emailed everyone personally, and the whole point of the event was to get to know one another in an informal atmosphere. It was a great time for everyone, and I can&#8217;t imagine a better way to convey the unique team culture we have built.</p>
<p>I&#8217;m all for technology and process that offers efficiency and scalability. But sometimes your most effective tool is your own humanity. When it comes to hiring, take it personally.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/08/01/hiring-taking-it-personally/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/08/01/hiring-taking-it-personally/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Upcoming Conferences: RecSys, HCIR, CIKM</title>
		<link>http://thenoisychannel.com/2012/07/20/upcoming-conferences-recsys-hcir-cikm/</link>
		<comments>http://thenoisychannel.com/2012/07/20/upcoming-conferences-recsys-hcir-cikm/#comments</comments>
		<pubDate>Fri, 20 Jul 2012 21:00:32 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4262</guid>
		<description><![CDATA[Long-time readers know that I have strong opinions about academic conferences. I find the main value of conferences and workshops to be facilitating face-to-face interaction among researchers and practitioners who share professional interests. An offline version of LinkedIn, if you will. This year, I&#8217;m focusing my attention on three conferences: RecSys, HCIR, and CIKM. Regrettably I [...]]]></description>
				<content:encoded><![CDATA[<p>Long-time readers know that I have <a href="http://thenoisychannel.com/2009/08/02/are-academic-conferences-broken-can-we-fix-them/">strong opinions</a> about academic conferences. I find the main value of conferences and workshops to be facilitating face-to-face interaction among researchers and practitioners who share professional interests. An offline version of LinkedIn, if you will.</p>
<p>This year, I&#8217;m focusing my attention on three conferences: <a href="http://recsys.acm.org/2012/">RecSys</a>, <a href="http://www.hcir.info/hcir-2012">HCIR</a>, and <a href="http://www.cikm2012.org/">CIKM</a>. Regrettably I won&#8217;t be able to attend <a href="http://www.sigir.org/sigir2012/">SIGIR</a>, <a href="http://strataconf.com/stratany2012">Strata NY</a>, or <a href="http://www.acm.org/uist/uist2012/">UIST</a>. But fortunately my colleagues are attending the first two, and hopefully some UIST attendees will be able to arrive a few days early and attend HCIR. Perhaps we can steal a page from <a href="http://wsdm2012.org/">WSDM</a> and <a href="http://cscw2012.org/">CSCW</a> and arrange a <a href="http://research.microsoft.com/en-us/events/sss2012/">cross-conference social</a> in Cambridge.</p>
<p><strong>6th ACM Conference on Recommender Systems (RecSys 2012)</strong></p>
<p>At RecSys, which will take place September 9-13 in Dublin, I&#8217;m co-organizing the <a href="http://recsys.acm.org/2012/industry_track.html">Industry Track</a> with <a href="http://www.linkedin.com/pub/yehuda-koren/7/614/856">Yehuda Koren</a>. The program features technology leaders from Facebook, Microsoft, StumbleUpon, The Echo Nest, Yahoo, and of course LinkedIn. I&#8217;m also delivering a keynote at the <a href="http://ls13-www.cs.uni-dortmund.de/homepage/rsweb2012/index.shtml">Workshop on Recommender Systems and the Social Web</a>. I hope to see you there, along with several of my colleagues who will be presenting their work on recommender systems at LinkedIn.</p>
<p><strong>6th Symposium on Human-Computer Interaction and Information Retrieval (HCIR 2012)</strong></p>
<p>The 6th HCIR represents a milestone &#8212; we&#8217;ve upgraded from a 1-day workshop to a 2-day symposium. We are continuing two great traditions: strong keynotes (<a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>) and the <a href="http://www.hcir.info/hcir-2012/challenge">HCIR Challenge</a> (focused on people search). The symposium will take place October 4-5 in Cambridge, MA. Hope to see many of you there. And, if you&#8217;re still working on your submissions and challenge entries, good luck wrapping them up by the July 29 deadline!</p>
<p><strong>21st ACM International Conference on Information and Knowledge Management (CIKM 2012)</strong></p>
<p>Finally, you can&#8217;t miss CIKM in Hawaii! This year&#8217;s conference will take place October 29 &#8211; November 2 in Maui. After co-organizing <a href="http://www.cikm2011.org/industryevent">last year&#8217;s industry track</a> in Glasgow, I&#8217;m delighted to be a speaker in <a href="http://www.cikm2012.org/industry_event.php">this year&#8217;s track</a>, which also includes researchers and practitioners from Adobe, eBay, Google, Groupon, IBM, Microsoft, Tencent, Walmart Labs, and Yahoo. A great program in one of the world&#8217;s most beautiful settings, how can you resist?</p>
<p>I hope to see many of you at one &#8212; hopefully all! &#8212; of these great events! But, if you can&#8217;t make it, be reassured that I&#8217;ll blog about them here.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/07/20/upcoming-conferences-recsys-hcir-cikm/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/07/20/upcoming-conferences-recsys-hcir-cikm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HCIR 2012 Challenge: People Search</title>
		<link>http://thenoisychannel.com/2012/07/08/hcir-2012-challenge-people-search/</link>
		<comments>http://thenoisychannel.com/2012/07/08/hcir-2012-challenge-people-search/#comments</comments>
		<pubDate>Sun, 08 Jul 2012 23:22:52 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4256</guid>
		<description><![CDATA[As we get ready for the Sixth Symposium on Human-Computer Interaction and Information Retrieval this October in Cambridge, MA, people around the world are working on their entries for the third HCIR Challenge. Our first HCIR Challenge in 2010 focused on exploratory search of a news archive. Thanks to the generosity of the Linguistic Data Consortium (LDC), we [...]]]></description>
				<content:encoded><![CDATA[<p>As we get ready for the <a href="http://hcir.info/hcir-2012/">Sixth Symposium on Human-Computer Interaction and Information Retrieval</a> this October in Cambridge, MA, people around the world are working on their entries for the <a href="http://hcir.info/hcir-2012/challenge">third HCIR Challenge</a>.</p>
<p>Our <a href="http://hcir.info/hcir-2010/challenge">first HCIR Challenge in 2010</a> focused on exploratory search of a news archive. Thanks to the generosity of the <a href="http://www.ldc.upenn.edu/">Linguistic Data Consortium (LDC)</a>, we were able to provide participants with access to the <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">New York Times (NYT) Annotated Corpus</a> free of charge. Six teams presented their entries:</p>
<p style="padding-left: 30px;"><a href="https://sites.google.com/site/hcirworkshop/hcir-2010/proceedings/Boscarino_cr36.pdf?attredirects=0">Search for Journalists: New York Times Challenge Report<br />
</a><em>Corrado Boscarino, Arjen P. de Vries, and Wouter Alink (Centrum Wiskunde and Informatica)</em></p>
<p style="padding-left: 30px;"><a href="https://sites.google.com/site/hcirworkshop/hcir-2010/proceedings/Kohlsch%C3%BCtter_cr38.pdf?attredirects=0">Exploring the New York Times Corpus with NewsClub<br />
</a><em>Christian Kohlschütter (Leibniz Universität Hannover)</em></p>
<p style="padding-left: 30px;"><strong><a href="https://sites.google.com/site/hcirworkshop/hcir-2010/proceedings/Matthews_cr32.pdf?attredirects=0">Searching Through Time in the New York Times</a> (WINNER)</strong><br />
<strong><em>Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)<br />
</em>(covered in <em>Technology Review</em>: &#8220;<a href="http://www.technologyreview.com/news/420424/a-search-service-that-can-peer-into-the-future/">A Search Service that Can Peer into the Future</a>&#8221;)</strong></p>
<p style="padding-left: 30px;"><a href="https://sites.google.com/site/hcirworkshop/hcir-2010/proceedings/Vydiswaran_cr34.pdf?attredirects=0">News Sync: Three Reasons to Visualize News Better<br />
</a><em>V.G. Vinod Vydiswaran (University of Illinois), Jeroen van den Eijkhof (University of Washington), Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research) </em></p>
<p style="padding-left: 30px;"><a href="https://sites.google.com/site/hcirworkshop/hcir-2010/proceedings/Zelevinsky_cr33.pdf?attredirects=0">Custom Dimensions for Text Corpus Navigation<br />
</a><em>Vladimir Zelevinsky (Endeca Technologies)</em></p>
<p style="padding-left: 30px;"><a href="https://sites.google.com/site/hcirworkshop/hcir-2010/proceedings/Zheng_cr35.pdf?attredirects=0">A Retrieval System Based on Sentiment Analysis<br />
</a><em>Wei Zheng and Hui Fang (University of Delaware)</em></p>
<p>In 2011, we continued with a <a href="http://hcir.info/hcir-2011/challenge">Challenge focused on the problem of information availability</a>. Four teams presented their systems to address this particularly difficult area of information retrieval:</p>
<p style="padding-left: 30px;"><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NzE1YmM2YzE4ODBhYzRjZA" target="_blank">FreeSearch – Literature Search in a Natural Way<br />
</a><em>Claudiu S. Firan, Wolfgang Nejdl, Mihai Georgescu (University of Hanover), and Xinyun Sun (DEKE Lab MOE, Renmin)</em></p>
<p style="padding-left: 30px;"><strong><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MmZmM2Y5Yzg5OTM4NGI5NQ" target="_blank">Session-based search with Querium</a></strong><strong> (WINNER)<br />
</strong><strong><em>Gene Golovchinsky (FX Palo Alto Lab) and Abdigani Diriye (University College London)</em></strong></p>
<p style="padding-left: 30px;"><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NWI1NTc5NWNmNDlmZDUyZg" target="_blank">GisterPro<br />
</a><em>David L. Ostby and Edmond Brian (Visual Purple)</em></p>
<p style="padding-left: 30px;"><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjIwOWNlOWY4YTQzMDRmZA" target="_blank">Query Analytics Workbench<br />
</a><em>Antony Scerri, Matthew Corkum, Keith Gutfreund, Ron Daniel Jr., Michael Taylor (Elsevier Labs)</em></p>
<p><a href="http://hcir.info/hcir-2012/challenge">This year&#8217;s Challenge</a> focuses on people search &#8212; that is, on the problem of people and expertise finding.</p>
<p>Here are examples of the kinds of tasks we will publish after the systems are frozen at the end of August:</p>
<ul>
<li><strong>Hiring</strong><br />
Given a job description, produce a set of suitable candidates for the position. An example of a job description: <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=3004979">http://www.linkedin.com/jobs?viewJob=&amp;jobId=3004979</a>.</li>
<li><strong>Assembling a Conference Program</strong><br />
Given a conference&#8217;s past history, produce a set of suitable candidates for keynotes, program committee members, etc. for the conference. An example conference could be HCIR 2013, where past conferences are described at <a href="http://hcir.info/">http://hcir.info/</a>.</li>
<li><strong>Finding People to Deliver Patent Research or Expert Testimony</strong><br />
Given a patent, produce a set of suitable candidates who could deliver relevant research or expert testimony for use in a trial. These people can be further segmented, e.g., students and other practitioners might be good at the research, while more senior experts might be more credible in high-stakes litigation. An example task would be to find people for <a href="http://www.articleonepartners.com/study/index/1658-system-and-method-for-providing-consumer-rewards">http://www.articleonepartners.com/study/index/1658-system-and-method-for-providing-consumer-rewards</a>.</li>
</ul>
<p>For all of the tasks there is a dual goal of obtaining a set of candidates (ideally organized or ranked) and producing a repeatable and extensible search strategy.</p>
<p>Best of luck to this year&#8217;s HCIR Challenge participants &#8212; I&#8217;m excited to see the systems that they present this October at the Symposium!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/07/08/hcir-2012-challenge-people-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RecSys 2012 Industry Track</title>
		<link>http://thenoisychannel.com/2012/06/28/recsys-2012-industry-track/</link>
		<comments>http://thenoisychannel.com/2012/06/28/recsys-2012-industry-track/#comments</comments>
		<pubDate>Fri, 29 Jun 2012 05:41:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4245</guid>
		<description><![CDATA[I&#8217;m proud to be co-organizing the RecSys 2012 Industry Track with Yehuda Koren. Check out the line-up: Ronny Kohavi (Microsoft), Keynote Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics Ralf Herbrich (Facebook) Distributed, Real-Time Bayesian Learning in Online Services Ronny Lempel (Yahoo! Research) Recommendation Challenges in Web Media Settings Sumanth Kolar (StumbleUpon) Recommendations and Discovery [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://recsys.acm.org/2012/"><img class="alignnone" title="RecSys 2012" src="http://recsys.acm.org/wp-content/uploads/2012/04/dublin-small.jpg" alt="" width="500" height="90" /></a></p>
<p>I&#8217;m proud to be co-organizing the <a href="http://recsys.acm.org/2012/industry_track.html">RecSys 2012 Industry Track</a> with <a href="http://www.linkedin.com/pub/yehuda-koren/7/614/856">Yehuda Koren</a>.</p>
<p>Check out the line-up:</p>
<ul>
<li><a href="http://www.linkedin.com/in/ronnyk">Ronny Kohavi</a> (Microsoft), Keynote<br />
Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics</li>
</ul>
<ul>
<li><a href="http://www.linkedin.com/pub/ralf-herbrich/4/832/28a">Ralf Herbrich</a> (Facebook)<br />
Distributed, Real-Time Bayesian Learning in Online Services</li>
</ul>
<ul>
<li><a href="http://www.linkedin.com/pub/ronny-lempel/5/567/83a">Ronny Lempel</a> (Yahoo! Research)<br />
Recommendation Challenges in Web Media Settings</li>
</ul>
<ul>
<li><a href="http://www.linkedin.com/in/sumanthkolar">Sumanth Kolar</a> (StumbleUpon)<br />
Recommendations and Discovery at StumbleUpon</li>
</ul>
<ul>
<li><a href="http://www.linkedin.com/in/abhasin">Anmol Bhasin</a> (LinkedIn)<br />
Recommender Systems &amp; The Social Web</li>
</ul>
<ul>
<li><a href="http://www.linkedin.com/in/thoregraepel">Thore Graepel</a> (Microsoft Research)<br />
Towards Personality-Based Personalization</li>
</ul>
<ul>
<li><a href="http://www.linkedin.com/pub/paul-lamere/1/204/10">Paul Lamere</a> (The Echo Nest)<br />
I&#8217;ve got 10 million songs in my pocket. Now what?</li>
</ul>
<p>Hope to see you at RecSys this September in Dublin! <a href="http://www.regonline.co.uk/Register/Checkin.aspx?EventID=1075698">Registration</a> is open now.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/06/28/recsys-2012-industry-track/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Information Cascades, Revisited</title>
		<link>http://thenoisychannel.com/2012/06/12/information-cascades-revisited/</link>
		<comments>http://thenoisychannel.com/2012/06/12/information-cascades-revisited/#comments</comments>
		<pubDate>Tue, 12 Jun 2012 13:54:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4234</guid>
		<description><![CDATA[A couple of years ago, I blogged about an information cascade problem I&#8217;d read about in David Easley and Jon Kleinberg&#8217;s textbook on Networks, Crowds, and Markets. To recall the problem (which they themselves borrowed from Lisa Anderson and Charles Holt): The experimenter puts an urn at the front of the room with three marbles in it; she announces that there is a [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://whc.unesco.org/en/list/303"><img class="alignnone" title="Iguazu Falls" src="http://upload.wikimedia.org/wikipedia/commons/2/2c/Iguazu_D%C3%A9cembre_2007_-_Panorama_7.jpg" alt="" width="512" height="111" /></a></p>
<p>A couple of years ago, I blogged about an <a href="http://thenoisychannel.com/2010/11/17/an-information-cascade/">information cascade</a> problem I&#8217;d read about in <a href="http://www.arts.cornell.edu/econ/deasley/">David Easley</a> and <a href="http://www.cs.cornell.edu/home/kleinber/">Jon Kleinberg</a>&#8217;s textbook on <em><a href="http://www.cs.cornell.edu/home/kleinber/networks-book/">Networks, Crowds, and Markets</a></em>. To recall the problem (which they themselves borrowed from <a href="http://lrande.people.wm.edu/">Lisa Anderson</a> and <a href="http://people.virginia.edu/~cah2k">Charles Holt</a>):</p>
<blockquote><p>The experimenter puts an urn at the front of the room with three marbles in it; she announces that there is a 50% chance that the urn contains two red marbles and one blue marble, and a 50% chance that the urn contains two blue marbles and one red marble…one by one, each student comes to the front of the room and draws a marble from the urn; he looks at the color and then places it back in the urn without showing it to the rest of the class. The student then guesses whether the urn is majority-red or majority-blue and publicly announces this guess to the class.</p></blockquote>
<p>The fascinating result is that the sequence of guesses locks in on a single color as soon as two consecutive students agree. For example, if the first two marbles drawn are blue, then all subsequent students will guess blue. If the urn is majority-red, then it turns out there is a 16/21 probability that the sequence will converge to red and a 5/21 probability that it will converge to blue.</p>
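<p>As a sanity check on the 16/21 figure, here is a minimal Python sketch. It assumes the simplified rule described above: each student announces the color of his or her own draw until two consecutive guesses agree, after which everyone repeats that color. The function name <code>lock_in_probability</code> is made up for illustration.</p>

```python
from fractions import Fraction


def lock_in_probability(p):
    """Probability that the guesses lock in on the correct color.

    Assumption: students guess their own draw until two consecutive
    guesses agree; p is the chance that one draw matches the urn's
    majority color.
    """
    q = 1 - p
    # a = P(correct lock-in | last guess was correct, not yet locked in)
    # b = P(correct lock-in | last guess was wrong, not yet locked in)
    # They satisfy a = p + q * b and b = p * a, so a = p / (1 - p * q).
    a = p / (1 - p * q)
    b = p * a
    # The first guess is correct with probability p, wrong with q.
    return p * a + q * b


print(lock_in_probability(Fraction(2, 3)))  # 16/21
```

<p>With p = 2/3 this gives 16/21 for locking in on the correct color, which leaves 5/21 for the wrong one.</p>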
<p>Let me explain why I find this problem so fascinating.</p>
<p>Consider a scenario where you are among a group of people faced with a single binary decision (let&#8217;s say, choosing red or blue) and where each of you is independently tasked with recommending the best decision given your own judgement and all available information. Assume further that each of you is perfectly rational and that each of your prior decisions (i.e., without knowing what anyone else thinks) is based on <a href="http://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">independent and identically distributed random variables</a>. Let&#8217;s follow the example above, in which each participant in the decision process has a prior corresponding to a <a href="http://en.wikipedia.org/wiki/Bernoulli_distribution">Bernoulli random variable</a> with probability p = 2/3.</p>
<p>If each of you makes a decision independently, then the expected fraction of participants who make the right decision is 2/3.</p>
<p>But you could do better if you have a chance to observe others&#8217; independent decision making first. For example, if you get to witness 100 independent decisions, then you have a very low probability of going wrong by voting the majority. If you&#8217;d like the gory details, review the <a href="http://en.wikipedia.org/wiki/Binomial_random_variable#Cumulative_distribution_function">cumulative distribution function of binomial random variables</a>.</p>
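<p>To put a number on that, here is a small Python sketch of the binomial tail. It is only an illustration: the function name <code>majority_correct</code> is invented here, and it assumes an odd number of votes (for example, 100 observed decisions plus your own draw) so that ties cannot occur.</p>

```python
from fractions import Fraction
from math import comb


def majority_correct(n, p):
    """Chance that more than half of n independent guesses are correct.

    Each guess is correct with probability p. Assumption: n is odd,
    so a tie is impossible.
    """
    q = 1 - p
    # Sum the binomial probabilities for every count k above n / 2.
    return sum(comb(n, k) * p**k * q**(n - k)
               for k in range(n // 2 + 1, n + 1))


p = Fraction(2, 3)
print(majority_correct(3, p))            # 20/27, easy to verify by hand
print(float(majority_correct(101, p)))   # 100 observed decisions plus your own
```

<p>Using exact fractions keeps the small cases easy to check by hand; the final line converts to a float only for readability.</p>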
<p>On the other hand, if the decisions happen sequentially and every person has access to all of the previous decisions, then we see an information cascade. Rationally, it makes sense to let previous decisions influence your own, and indeed 16/21 &gt; 2/3. But a 16/21 chance of being right still leaves almost a one in four chance of making the wrong decision, even after you witness 100 previous decisions. We are wasting a lot of independent input because of how participants are incented.</p>
<p>I can&#8217;t help wondering how changing the incentives could affect the outcome of this process. What would happen if participants were rewarded based, in whole or in part, on the accuracy of the participants who guess after them?</p>
<p>Consider as an extreme case rewarding all participants based solely on the accuracy of the final participant&#8217;s guess. In that case, the optimal strategy for all but the last participant is to ignore previous participants&#8217; guesses and vote based solely on their own independent judgements. Then the final participant combines these judgements with his or her own and votes based on the majority. The result makes optimal use of all participants&#8217; independent judgements, despite the sequential decision process.</p>
<p>But what if individuals are rewarded based on a combination of individual and collective success? Consider the 3rd participant in our example who draws a red marble after the previous participants guess blue. Let&#8217;s say that there are 5 participants in total. If the reward is entirely based on individual success, the 3rd participant will vote blue, yielding an expected reward of 2/3. If the reward is entirely based on group success, the 3rd participant will vote red, yielding an expected reward of 20/27 (details left as an exercise for the reader). If we make the reward evenly split between individual success and group success, the 3rd participant will still vote blue, because the benefit from helping the group will not be enough to overcome the cost to the individual reward.</p>
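<p>The 2/3 figure for the individual reward can be checked with a short Bayes computation. The Python sketch below is an illustration under a stated assumption: the two earlier blue guesses are read as two blue draws, which are then combined with the 3rd participant&#8217;s own red draw. The function name <code>posterior_majority_blue</code> is hypothetical.</p>

```python
from fractions import Fraction


def posterior_majority_blue(blue_draws, red_draws, p=Fraction(2, 3)):
    """Posterior that the urn is majority-blue, starting from a 50/50 prior.

    Each draw matches the urn's majority color with probability p.
    """
    like_blue = p**blue_draws * (1 - p)**red_draws   # if urn is majority-blue
    like_red = (1 - p)**blue_draws * p**red_draws    # if urn is majority-red
    # The equal priors cancel, leaving a ratio of likelihoods.
    return like_blue / (like_blue + like_red)


# Two earlier blue guesses (assumed to reveal blue draws) plus one red draw.
print(posterior_majority_blue(2, 1))  # 2/3
```

<p>Two blue draws against one red leave a 2/3 posterior for majority-blue, which is why voting blue yields an expected individual reward of 2/3.</p>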
<p>There&#8217;s a lot more math in the details of this problem, e.g. &#8220;<a href="http://www.saet.illinois.edu/papers_and_talks/event-07/nolearn270312.pdf">The Mathematics of Bayesian Learning Traps</a>&#8221;, by Simon Loertscher and Andrew McLennan. But there&#8217;s a simple take-away: incentives are crucial in determining how we best exploit our collective wisdom. Something to think about the next time you&#8217;re on a committee.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/06/12/information-cascades-revisited/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Scale, Structure, and Semantics</title>
		<link>http://thenoisychannel.com/2012/06/07/scale-structure-and-semantics/</link>
		<comments>http://thenoisychannel.com/2012/06/07/scale-structure-and-semantics/#comments</comments>
		<pubDate>Thu, 07 Jun 2012 23:49:33 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4227</guid>
		<description><![CDATA[This morning I had the pleasure to present a keynote address at the Semantic Technology &#38; Business Conference (SemTechBiz). I&#8217;ve had a long and warm relationship with the semantic technology community &#8212; especially with Marco Neumann and the New York Semantic Web Meetup. But I&#8217;m not exactly a fanboy of the semantic web, and I [...]]]></description>
				<content:encoded><![CDATA[<p><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0;" src="http://www.slideshare.net/slideshow/embed_code/13242356?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This morning I had the pleasure to present a keynote address at the <a href="http://semtechbizsf2012.semanticweb.com/">Semantic Technology &amp; Business Conference (SemTechBiz)</a>. I&#8217;ve had a long and warm relationship with the semantic technology community &#8212; especially with <a href="http://www.marconeumann.org/">Marco Neumann</a> and the <a href="http://www.meetup.com/semweb-25/">New York Semantic Web Meetup</a>.</p>
<p>But I&#8217;m not exactly a fanboy of the semantic web, and I wasn&#8217;t sure how the audience would respond to some of my more provocative assertions. Fortunately the reception was very positive. Several people approached me afterwards to thank me for presenting a balanced argument for combining big data with structured representations and for raising <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> issues.</p>
<p>A couple of people felt that <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> was old news. I&#8217;m delighted that faceted search is becoming increasingly common, but there is still a lot of opportunity to use it more often and more effectively. And I was pleasantly surprised at the interest in discussing extensions of faceted search to address relationships between entities, as well as other nuances. I&#8217;ll have to dive into those in future posts.</p>
<p>For now, I hope you enjoy the slides, and I encourage you to ask questions in the comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/06/07/scale-structure-and-semantics/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Visual Search Startup Modista Is Back!</title>
		<link>http://thenoisychannel.com/2012/06/02/visual-search-startup-modista-is-back/</link>
		<comments>http://thenoisychannel.com/2012/06/02/visual-search-startup-modista-is-back/#comments</comments>
		<pubDate>Sat, 02 Jun 2012 22:40:45 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4215</guid>
		<description><![CDATA[Long-time readers know that I&#8217;m a great fan of visual search startup Modista, which was a victim of software patent abuse. To my delight, Modista is back from the dead. Check it out! Also see my previous coverage of Modista.]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.modista.com/"><img class=" wp-image-4216 alignleft" title="Modista is back!" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/06/modista.png" alt="" width="486" height="313" /></a></p>
<p>Long-time readers know that I&#8217;m a great fan of visual search startup <a href="http://www.modista.com/">Modista</a>, which was a <a href="http://thenoisychannel.com/2009/12/26/r-i-p-modista/">victim</a> of <a href="http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/">software patent abuse</a>. To my delight, Modista is back from the dead. Check it out!</p>
<p>Also see my <a href="http://thenoisychannel.com/?s=modista">previous coverage of Modista</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/06/02/visual-search-startup-modista-is-back/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HCIR 2012: Call for Participation</title>
		<link>http://thenoisychannel.com/2012/05/29/hcir-2012-call-for-participation/</link>
		<comments>http://thenoisychannel.com/2012/05/29/hcir-2012-call-for-participation/#comments</comments>
		<pubDate>Tue, 29 May 2012 15:08:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4207</guid>
		<description><![CDATA[Human-computer Information Retrieval (HCIR) combines research from the fields of human-computer interaction (HCI) and information retrieval (IR), placing an emphasis on human involvement in search activities. The HCIR Symposium (formerly known as the HCIR Workshop) has run annually since 2007. The event unites academic researchers and industrial practitioners working at the intersection of HCI and [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">Human-computer Information Retrieval (HCIR)</a> combines research from the fields of human-computer interaction (HCI) and information retrieval (IR), placing an emphasis on human involvement in search activities.</p>
<p>The <a href="http://www.hcir.info/">HCIR Symposium</a> (formerly known as the HCIR Workshop) has run annually since 2007. The event unites academic researchers and industrial practitioners working at the intersection of HCI and IR to develop more sophisticated models, tools, and evaluation metrics to support activities such as interactive information retrieval and exploratory search. It provides an opportunity for attendees to informally share ideas via posters, small group discussions and selected short talks.</p>
<p>The <a href="http://www.hcir.info/hcir-2012">Sixth Symposium on Human-Computer Interaction and Information Retrieval</a> will be held as a two-day event on October 4 and 5, 2012 at <a href="http://domino.research.ibm.com/cambridge/research.nsf/pages/index.html" rel="nofollow">IBM Research</a> in Cambridge, Massachusetts. We are delighted to bring the event back to its birthplace (<a href="http://hcir.info/hcir-2007">HCIR 2007</a> took place at MIT), and even more pleased to announce that our keynote speaker for the symposium this year will be UC Berkeley professor and <a href="http://searchuserinterfaces.com/">search user interfaces</a> pioneer <a href="http://people.ischool.berkeley.edu/~hearst/">Marti Hearst</a>.</p>
<p>Topics for discussion and presentation at the symposium include, but are not limited to:</p>
<ul>
<li>Novel interaction techniques for information retrieval.</li>
<li>Modeling and evaluation of interactive information retrieval.</li>
<li>Exploratory search and information discovery.</li>
<li>Information visualization and visual analytics.</li>
<li>Applications of HCI techniques to information retrieval needs in specific domains.</li>
<li>Ethnography and user studies relevant to information retrieval and access.</li>
<li>Scale and efficiency considerations for interactive information retrieval systems.</li>
<li>Relevance feedback and active learning approaches for information retrieval.</li>
</ul>
<p>Demonstrations of systems and prototypes are particularly welcome.</p>
<p>We are also excited to continue the <a href="http://www.hcir.info/hcir-2012/challenge">HCIR Challenge</a>, this year focusing on the problem of people and expertise finding. We are grateful to <a href="http://www.mendeley.com/">Mendeley</a> for providing this year&#8217;s corpus: a database based on Mendeley&#8217;s network of 1.6M+ researchers and 180M+ academic documents. Participants will build systems to enable efficient discovery of experts or expertise for applications such as collaborative research, team building, and competitive analysis.</p>
<p>In addition to the Challenge and a small number of research presentations, we will leave plenty of time for what participants have consistently told us that they find extremely valuable: informal discussions, posters and directed group discussions. Finally, we are extending our previous format to include a few full-length, fully-refereed archival quality papers that will be indexed in the <a href="http://dl.acm.org/">ACM Digital Library</a>.</p>
<p>We have extended the event to a second day to accommodate more presentations (including the full papers), and to leave plenty of time for discussion and for interaction around the poster session.  There will be a reception on Thursday evening.</p>
<p>Please consult the symposium web site,  <a href="http://www.hcir.info/hcir-2012">http://www.hcir.info/hcir-2012</a>, for full details. But here are some important dates to keep in mind:</p>
<ul>
<li>Deadline to request access to HCIR Challenge corpus: Friday, June 15</li>
<li>Submission deadline for position and research papers: Sunday, July 29</li>
</ul>
<p>This event would not be possible without generous support from industry and academia. This year&#8217;s supporters are <a href="http://www.research.ibm.com/">IBM Research</a>, <a href="http://www.linkedin.com/">LinkedIn</a>, <a href="http://research.microsoft.com/">Microsoft Research</a>, <a href="http://www.csail.mit.edu/">MIT CSAIL</a>, and <a href="http://www.oracle.com/">Oracle</a>. Microsoft Research is also providing funds for a limited number of student travel awards. Information about these awards is available at  <a href="http://hcir.info/hcir-2012/student-travel">http://hcir.info/hcir-2012/student-travel</a>.</p>
<p>Looking forward to seeing you in October!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/05/29/hcir-2012-call-for-participation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Science at LinkedIn: My Team</title>
		<link>http://thenoisychannel.com/2012/05/17/data-science-at-linkedin-my-team/</link>
		<comments>http://thenoisychannel.com/2012/05/17/data-science-at-linkedin-my-team/#comments</comments>
		<pubDate>Fri, 18 May 2012 06:48:44 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4197</guid>
		<description><![CDATA[Lots of people ask me what it&#8217;s like to be a data scientist at LinkedIn. The short answer: it&#8217;s awesome. Folks like Pete Skomoroch and team are building data products related to identity and reputation, such as Skills and InMaps. Yael Garten is leading the effort to understand and increase mobile engagement. And other folks work [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/in/josephadler"><img class="alignnone" title="Joe Adler" alt="" src="http://m4.licdn.com/media/p/3/000/062/21e/2d219a0.jpg" width="150" height="150" /></a><a href="http://www.linkedin.com/pub/ahmet-bugdayci/24/853/758"><img class="alignnone" title="Ahmet Bugdayci" alt="" src="http://m3.licdn.com/media/p/3/000/06e/2ca/22bd425.jpg" width="150" height="150" /></a><a href="http://www.linkedin.com/in/heyningcheng"><img class="alignnone" title="Heyning Cheng" alt="" src="http://m4.licdn.com/media/p/3/000/079/0ab/0c3e96d.jpg" width="150" height="150" /></a><br />
<a href="http://www.linkedin.com/in/abhilad"><img class="alignnone" title="Abhi Lad" alt="" src="http://m3.licdn.com/media/p/1/000/0fd/07d/08ecf54.jpg" width="150" height="150" /></a><a href="http://www.linkedin.com/in/gloriatlau"><img class="alignnone" title="Gloria Lau" alt="" src="http://m4.licdn.com/media/p/1/000/09b/214/1a76ad5.jpg" width="150" height="150" /></a><a href="http://www.linkedin.com/in/mrogati"><img class="alignnone" title="Monica Rogati" alt="" src="http://m3.licdn.com/media/p/2/000/0b0/337/26cfec3.jpg" width="150" height="150" /></a><br />
<a href="http://www.linkedin.com/pub/daria-sorokina/19/302/a6b"><img class="alignnone" title="Daria Sorokina" alt="" src="http://m.c.lnkd.licdn.com/media/p/6/000/1c6/046/0ab4bb3.jpg" width="150" height="150" /></a><a href="http://www.linkedin.com/in/rameshsubramonian"><img class="alignnone" title="Ramesh Subramonian" alt="" src="http://m3.licdn.com/media/p/1/000/03e/2b0/11381d8.jpg" width="150" height="150" /></a><a href="http://www.linkedin.com/in/joyce"><img class="alignnone" title="Joyce Wang" alt="" src="http://m1-s.licdn.com/mpr/mpr/shrink_200_200/p/7/000/1dd/146/0a4e0c4.jpg" width="150" height="150" /></a></p>
<p>Lots of people ask me what it&#8217;s like to be a data scientist at LinkedIn. The short answer: it&#8217;s awesome. Folks like <a href="http://www.linkedin.com/in/peterskomoroch">Pete Skomoroch</a> and team are building data products related to identity and reputation, such as <a href="http://www.linkedin.com/skills/">Skills</a> and <a href="http://inmaps.linkedinlabs.com/">InMaps</a>. <a href="http://www.linkedin.com/in/yaelgarten">Yael Garten</a> is leading the effort to understand and increase mobile engagement. And other folks work on everything from open-source infrastructure to fraud detection. Amazing people helping our 160M+ members by deriving valuable insights from big data.</p>
<p>I wanted to take a moment to showcase my own team. As a team, we straddle the boundary between science and engineering. We work closely with several engineering teams to deliver products that our members use every day.</p>
<p><a href="http://www.linkedin.com/in/josephadler">Joseph Adler</a> is a name you might recognize from your bookshelf: he wrote <em><a href="http://oreilly.com/catalog/9780596009427/ ">Baseball Hacks</a></em> and <em><a href="http://oreilly.com/catalog/9780596801717/">R in a Nutshell</a></em>, both published by O&#8217;Reilly. At LinkedIn, he is a data hacker extraordinaire, currently focused on improving the network update stream.</p>
<p><a href="http://www.linkedin.com/pub/ahmet-bugdayci/24/853/758">Ahmet Bugdayci</a> just joined LinkedIn this year, and he&#8217;s already on a tear. He&#8217;s working on a better approach to representing job titles, one of the most fundamental facets of our members&#8217; professional identity. And he&#8217;s a polyglot.</p>
<p><a href="http://www.linkedin.com/in/heyningcheng">Heyning Cheng</a> is our innovator in chief. He envisions data products and does whatever it takes to hack them together. Our recruiters are especially happy to be his beta testers, and we&#8217;re working to turn those prototypes into shipped product.</p>
<p><a href="http://www.linkedin.com/in/abhilad">Abhimanyu Lad</a> is working on the next generation of LinkedIn search. He&#8217;s already improved spelling correction and <a href="http://www.linkedin.com/search-fe/group_search">group search</a>, as well as building better ways to <a href="http://www.slideshare.net/abhimanyulad/is-it-time-to-abandon-abandonment">measure search effectiveness</a>. But stay tuned &#8212; the best is yet to come!</p>
<p><a href="http://www.linkedin.com/in/gloriatlau">Gloria Lau</a> leads all things data for the student initiative. Check out <a href="http://linkedin.com/alumni">LinkedIn Alumni</a> to see what she&#8217;s been up to. Students are the future, and we&#8217;re excited to be making LinkedIn a great tool for students, alumni, and universities.</p>
<p><a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a> spearheaded many of LinkedIn&#8217;s key products: the Talent Match system that matches jobs to candidates; the first machine learning model for People You May Know; and the first version of Groups You May Like. When she&#8217;s not working on our products, she gives awesome <a href="http://www.slideshare.net/mrogati">presentations</a>.</p>
<p><a href="http://www.linkedin.com/pub/daria-sorokina/19/302/a6b">Daria Sorokina</a> recently joined us and is working on search quality. She&#8217;s a hard-core machine learning researcher and developer: check out her open-source code for <a href="http://additivegroves.net/">additive groves</a>.</p>
<p><a href="http://www.linkedin.com/in/rameshsubramonian">Ramesh Subramonian</a> has been focused on data efforts for our international expansion. Over 60% of our members live outside the United States, and his efforts ensure that LinkedIn&#8217;s value proposition is a global one.</p>
<p><a href="http://www.linkedin.com/in/joyce">Joyce Wang</a> is a data science generalist. She is part of the search team, but she&#8217;s built great tools for log analysis and human evaluation that are finding great use across the company.</p>
<p>I hope that gives you a flavor of what it&#8217;s like to be a data scientist at LinkedIn &#8212; and on my team in particular.</p>
<p>Do you possess that rare combination of computer science background, technical skill, creative problem-solving ability, and product sense? If so, then I&#8217;d love to talk with you about opportunities to work on challenging problems with amazing people!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/05/17/data-science-at-linkedin-my-team/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Science as a Strategy</title>
		<link>http://thenoisychannel.com/2012/04/25/science-as-a-strategy/</link>
		<comments>http://thenoisychannel.com/2012/04/25/science-as-a-strategy/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 16:25:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4178</guid>
		<description><![CDATA[Last night, I had the pleasure to deliver the keynote address at the CIO Summit US. It was an honor to address an assembly of CIOs, CTOs, and technology executives from the nation&#8217;s top organizations. My theme was &#8220;Science as a Strategy&#8221;. To set the stage, I told the story of TunkRank: how, back in [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;"><a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/04/CIOnew.png"><img class="size-full wp-image-4190" title="CIO Summit US" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/04/CIOnew.png" alt="" width="318" height="67" /></a></p>
<p><iframe src="http://www.youtube.com/embed/dftt6Yqgnuw?rel=0" frameborder="0" width="480" height="272"></iframe></p>
<p>Last night, I had the pleasure of delivering the keynote address at the <a href="http://www.ciosummitna.com/">CIO Summit US</a>. It was an honor to address an assembly of CIOs, CTOs, and technology executives from the nation&#8217;s top organizations. My theme was &#8220;Science as a Strategy&#8221;.</p>
<p>To set the stage, I told the story of <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a>: how, back in 2009, I proposed a Twitter influence measure based on an explicit model of attention scarcity which <a href="http://thenoisychannel.com/2010/04/07/go-tunkrank/">proved</a> better than the intuitive but flawed approach of counting followers. The point of the story was not self-promotion, but rather to introduce my core message:</p>
<p><strong>Science is the difference between instinct and strategy.</strong></p>
<p>Given the audience, I didn&#8217;t expect this message to be particularly controversial. But we all know that belief is not the same as action, and science is not always popular in the C-Suite. Thus, I offered three suggestions to overcome the HIPPO (Highest Paid Person&#8217;s Opinion):</p>
<ul>
<li>Ask the right questions.</li>
<li>Practice good data hygiene.</li>
<li>Don’t argue when you can experiment!</li>
</ul>
<p><strong>Asking the Right Questions</strong></p>
<p>Asking the right questions seems obvious &#8212; after all, our answers can only be as good as the questions we ask. But science is littered with examples of people asking the wrong questions &#8212; from 19th-century <a href="http://en.wikipedia.org/wiki/Phrenology">phrenologists</a> measuring the sizes of people&#8217;s skulls to evaluate intelligence to IT executives measuring lines of code to evaluate programmer productivity. It&#8217;s easy for us (today) to recognize these approaches as pseudoscience, but we have to make sure we ask the right questions in our own organizations.</p>
<p>As an example, I turned to the challenge of improving the hiring process. One approach I&#8217;ve seen tried at both Google and LinkedIn is to measure the accuracy of interviewers &#8212; that is, to see how well the hire / no-hire recommendations of individual interviewers predict the final decisions. But this turns out to be the wrong question &#8212; in large part because negative recommendations (especially early ones) weigh much more heavily in the decision than positive ones.</p>
<p>What we found instead was that we should focus on efficiency as an optimization problem. More specifically, there is a trade-off: short-circuiting the process as early as possible (e.g., after the candidate performs poorly on the first phone screen) reduces the average time per candidate, but it also reduces the number of good candidates who make it through the process. To optimize overall throughput (while keeping our high bar), we&#8217;ve had to calibrate the upstream filters. How to optimize that upstream filter turns out to be the right question to ask &#8212; and one we still continue to iterate on.</p>
<p>More generally, I talked about how, when we hire <a href="http://www.quora.com/If-I-want-to-do-Data-Science-would-LinkedIn-or-Twitter-be-a-better-place-to-start-work/answer/Daniel-Tunkelang">data scientists at LinkedIn</a>, we look for not only strong analytical skills but also the product and business sense to pick the right questions to ask – questions whose answers create value for users and drive key business decisions. Asking the right questions is the foundation of good science.</p>
<p><strong>Practicing Good Data Hygiene</strong></p>
<p>Data mining is amazing, but we have to watch out for its pejorative meaning of discovering spurious patterns. I used the <a href="http://www.investopedia.com/terms/s/superbowlindicator.asp">Super Bowl Indicator</a> as an example of data mining gone wrong &#8212; with 80% accuracy, the conference (AFC vs. NFC) of the Super Bowl champion predicts the coming year&#8217;s stock market performance. Indeed, the NFC won this year (<a href="http://en.wikipedia.org/wiki/Super_Bowl_XLVI">go Giants!</a>) and subsequent market gains have been consistent with this indicator (so far).</p>
<p>We can all laugh at these misguided investors, but we make these mistakes all the time. Despite what researchers have called the &#8220;<a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/35179.pdf">unreasonable effectiveness of data</a>”, we still need the scientific method of first hypothesizing and then experimenting in order to obtain valid and useful conclusions. Without data hygiene, our desires, preconceptions, and other human frailties infect our rational analysis.</p>
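<p>A small simulation makes the danger concrete. The sketch below is an illustration, not something from the talk: it assumes coin-flip indicators and a coin-flip market, and the function name <code>count_lucky_indicators</code> and its parameters are hypothetical. It counts how many meaningless indicators match the market in at least 80% of years purely by chance.</p>

```python
import random

def count_lucky_indicators(n_indicators, n_years, threshold, seed=0):
    """Count random yes/no indicators that 'predict' a random market
    direction in at least `threshold` (a fraction) of the years."""
    rng = random.Random(seed)
    # Assumption: the market goes up or down each year like a fair coin.
    market = [rng.random() > 0.5 for _ in range(n_years)]
    lucky = 0
    for _ in range(n_indicators):
        # Each indicator is pure noise, unrelated to the market.
        indicator = [rng.random() > 0.5 for _ in range(n_years)]
        hits = sum(1 for m, i in zip(market, indicator) if m == i)
        if hits / n_years >= threshold:
            lucky += 1
    return lucky

lucky = count_lucky_indicators(n_indicators=10_000, n_years=20, threshold=0.8)
print(f"{lucky} of 10000 random indicators reach 80% accuracy by chance")
```

<p>If we search enough candidate indicators, some will clear the bar without any predictive power, which is why the hypothesis has to come before the test.</p>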
<p>A very different example is using click-through data to measure the effectiveness of relevance ranking. This approach isn&#8217;t completely wrong, but it suffers from several flaws. And the fundamental flaw relates to data hygiene: how we present information to users infects their perception of relevance. Users assume that top-ranked results are more relevant than lower-ranked results. Also, they can only click on the results presented to them. To paraphrase <a href="http://en.wikipedia.org/wiki/There_are_known_knowns">Donald Rumsfeld</a>: they don&#8217;t know what they don&#8217;t know. If we aren&#8217;t careful, a click-based evaluation of relevance creates positive feedback and only reinforces our initial assumptions – which certainly isn&#8217;t the point of evaluation!</p>
<p>Fortunately, there are ways to avoid these biases. We can pay people to rate results presented to them in random order. We can use the <a href="http://en.wikipedia.org/wiki/Multi-armed_bandit">explore / exploit</a> technique to hedge against the ranking algorithm’s preconceived bias. And so on.</p>
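<p>One simple form of explore / exploit is an epsilon-greedy rule. The sketch below is a minimal illustration under stated assumptions: the per-result click and impression counts are made up, and <code>choose_top_result</code> is a hypothetical helper, not any production ranking code.</p>

```python
import random

def choose_top_result(stats, epsilon, rng):
    """Pick the result to show in the top slot.

    stats maps a result id to (clicks, impressions); both are hypothetical logs.
    With probability epsilon we explore a random result; otherwise we exploit
    the result with the best observed click-through rate.
    """
    candidates = sorted(stats)
    if epsilon > rng.random():
        return rng.choice(candidates)  # explore: ignore the current ranking

    def click_rate(result_id):
        clicks, impressions = stats[result_id]
        return clicks / impressions if impressions else 0.0

    return max(candidates, key=click_rate)  # exploit: trust the data so far

rng = random.Random(0)
stats = {"a": (5, 100), "b": (9, 100), "c": (0, 0)}
shown = [choose_top_result(stats, epsilon=0.1, rng=rng) for _ in range(1000)]
print({r: shown.count(r) for r in sorted(stats)})
```

<p>The occasional random pick gives results like the never-shown &#8220;c&#8221; a chance to earn clicks, so the evaluation does not merely confirm the ranking that produced the data.</p>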
<p>But the key take-away is that we have to practice good data hygiene, splitting our projects into the two distinct activities of hypothesis generation (i.e., exploratory analysis) and hypothesis testing using withheld data.</p>
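<p>In code, that separation can be as simple as setting data aside before any exploration begins. This is a minimal sketch, assuming a flat list of records and a hypothetical helper named <code>split_for_hygiene</code>:</p>

```python
import random

def split_for_hygiene(records, holdout_fraction=0.25, seed=0):
    """Shuffle once, then reserve a withheld set that exploratory
    analysis never touches."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    exploration = shuffled[n_holdout:]  # generate hypotheses here
    withheld = shuffled[:n_holdout]     # test them here, once
    return exploration, withheld

exploration, withheld = split_for_hygiene(range(100))
print(len(exploration), "records to explore,", len(withheld), "withheld for testing")
```

<p>The fixed seed keeps the split reproducible, so a hypothesis found in the exploration set cannot quietly leak into the withheld set on a later run.</p>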
<p><strong>Don’t Argue when you can Experiment</strong></p>
<p>I couldn&#8217;t resist the opportunity to cite Nobel laureate <a href="http://en.wikipedia.org/wiki/Daniel_Kahneman">Daniel Kahneman</a>&#8217;s seminal work on understanding human irrationality. I also threw in Mercier and Sperber&#8217;s recent work on <a href="http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/">reasoning as argumentative</a>. The summary: don&#8217;t trust anyone&#8217;s theories, not even mine!</p>
<p>Then what can you trust? The results of a well-run experiment. Rather than debating data-free assertions, subject your hypotheses to the ultimate test: controlled experiments. Not every hypothesis can be tested using a controlled experiment, but most can be.</p>
<p>I recounted the story of how <a href="http://glinden.blogspot.com/">Greg Linden</a> persuaded his colleagues at Amazon to implement shopping-cart recommendations through <a href="http://en.wikipedia.org/wiki/A/B_testing">A/B testing</a>, despite objections from a marketing SVP. Indeed, his work &#8212; and Amazon&#8217;s generally &#8212; has strongly advanced the practice of A/B testing in online settings.</p>
<p>Of course, A/B testing is fundamental to all of our work at LinkedIn. Every feature we release, whether it&#8217;s the <a href="http://blog.linkedin.com/2012/03/27/new-people-you-may-know/">new People You May Know interface</a> or <a href="http://blog.linkedin.com/2012/04/03/new-group-search/">improvements to Group Search relevance</a>, starts with an A/B test. And sometimes A/B testing causes us to not launch &#8212; we listen to the data.</p>
<p>Don&#8217;t argue when you can experiment. Decisions about how to improve products and processes should not be settled by an Oxford-style debate. Rather, those decisions should be informed by data.</p>
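<p>To illustrate how an A/B test can stand in for a debate, here is a minimal sketch of a two-proportion z-test. It is a textbook method, not a description of LinkedIn&#8217;s or Amazon&#8217;s actual tooling, and the conversion counts are made up.</p>

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns the z statistic and a two-sided p-value under the null
    hypothesis that both variants convert at the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 1000 users per arm.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

<p>A small p-value says the difference is unlikely to be noise. A large one is also an answer: it is the data telling us not to launch on the strength of opinion alone.</p>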
<p><strong>Conclusion: Even Steve Jobs Made Mistakes</strong></p>
<p>Some of you may think that this is all good advice, but that science is no match for an inspired leader. Indeed, some pundits have seen Apple&#8217;s success relative to Google as an indictment of data-driven decision making in favor of an approach that follows a leader&#8217;s gut instinct. Are they right? Should we throw out all of our data and follow our CEOs&#8217; instincts?</p>
<p>Let&#8217;s go back a decade. In 2002, Apple faced a pivotal decision – perhaps the most important decision in its history. The iPod was clearly a breakthrough product, but it was only compatible with the Mac. Remember that, back in 2002, Apple had only a 3.5% market share in the PC business. Apple&#8217;s top executives did their analysis and predicted that they could drive the massive success of the iPod by making it compatible with Windows, the dominant operating system with over 95% market share.</p>
<p>Steve Jobs resisted. At one point he said that Windows users would get to use the iPod &#8220;over [his] dead body&#8221;. After continued convincing, Jobs gave up. According to authorized biographer <a href="http://www.amazon.com/Steve-Jobs-Walter-Isaacson/dp/1451648537">Walter Isaacson</a>, Steve&#8217;s exact words were: “Screw it. I’m sick of listening to you assholes. Go do whatever the hell you want.” Luckily for Steve, Apple, and the consumer public, they did, and the rest is history.</p>
<p>It isn’t easy being one of those assholes. But that’s our job, much as it was theirs. It’s up to us to turn data into gold, to apply science and technology to create value for our organizations. Because without data, we are gambling on our leaders&#8217; gut feelings. And our leaders, however inspired, have fallible instincts.</p>
<p>Science is the difference between instinct and strategy.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/04/25/science-as-a-strategy/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/25/science-as-a-strategy/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Semantic Link and Internet Evolution</title>
		<link>http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/</link>
		<comments>http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 22:56:32 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4170</guid>
		<description><![CDATA[    Recently I had a couple of great opportunities to share my thoughts publicly, and I wanted to make sure readers here were aware of them. The first was a special guest appearance on The Semantic Link, a program hosted by Paul Miller with regular panelists Peter Brown, Christine Connors, Eric Franzon, Eric Hoffer, Bernadette [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://semanticweb.com/the-semantic-link-with-guest-daniel-tunkelang-%E2%80%93-april-2012_b28246"><img class="alignnone" title="The Semantic Link" src="http://www.commoncrawl.org/wp-content/uploads/2012/01/semanticweb.com-logo.jpg" alt="" width="126" height="126" /></a>   <a href="http://www.internetevolution.com/radio.asp?doc_id=240580"><img class="alignnone" style="margin-top: 50px; margin-bottom: 50px;" title="Internet Evolution" src="http://img.deusm.com/internetevolution/intevol_logo_top_new.gif" alt="" width="308" height="30" /></a></p>
<p>Recently I had a couple of great opportunities to share my thoughts publicly, and I wanted to make sure readers here were aware of them.</p>
<p>The first was a special guest appearance on <a href="http://semanticweb.com/the-semantic-link-with-guest-daniel-tunkelang-%E2%80%93-april-2012_b28246">The Semantic Link</a>, a program hosted by <a href="http://www.linkedin.com/in/pau1mi11er">Paul Miller</a> with regular panelists <a href="http://pensivepeter.wordpress.com/">Peter Brown</a>, <a href="http://www.linkedin.com/in/cjmconnors">Christine Connors</a>, <a href="http://www.linkedin.com/in/ericfranzon">Eric Franzon</a>, <a href="http://www.linkedin.com/in/erichoffer">Eric Hoffer</a>, <a href="http://www.linkedin.com/in/bhyland">Bernadette Hyland</a>, and <a href="http://www.linkedin.com/in/andraz">Andraz Tori</a>. It was a lot of fun, and a great warm-up for the keynote I&#8217;ll be delivering on &#8220;<a href="http://semtechbizsf2012.semanticweb.com/sessionPop.cfm?confid=65&amp;proposalid=4800">Scale, Structure, and Semantics</a>&#8221; at the upcoming <a href="http://semtechbizsf2012.semanticweb.com/">Semantic Tech &amp; Business Conference (SemTechBiz)</a>, which will take place in San Francisco in June.</p>
<p>The second was a live interview on <a href="http://www.internetevolution.com/radio.asp?doc_id=240580">Internet Evolution</a>, hosted by <a href="http://www.linkedin.com/pub/mary-jander/7/b5/300">Mary Jander</a> and <a href="http://www.linkedin.com/pub/nicole-ferraro/4/921/a29">Nicole Ferraro</a>. They clearly did their homework, scouring my blog posts and web commentary for everything controversial I&#8217;d ever said &#8212; and then some! If that&#8217;s enough to pique your interest, then I encourage you to listen to the recorded interview and read the chat transcript at <a href="http://www.internetevolution.com/radio.asp?doc_id=240580">Internet Evolution</a>.</p>
<p>Happy to answer questions based on either of these sessions &#8212; comment away!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Noah Iliinsky: Tech Talk on Designing Data Visualizations</title>
		<link>http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/</link>
		<comments>http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 14:13:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4160</guid>
		<description><![CDATA[Note: This post was written by Yael Garten, a Senior Data Scientist at LinkedIn. Yael joined LinkedIn in 2011, where she leads our mobile analytics team. She previously worked at Stanford on text mining, personalized medicine, and biomedical informatics. We live in an era of Big Data. But how do we use all of that [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/R-oiKt7bUU8?rel=0" frameborder="0" width="462" height="260"></iframe><br />
<em></em></p>
<p><em>Note: This post was written by <a href="http://www.linkedin.com/in/yaelgarten">Yael Garten</a>, a Senior Data Scientist at LinkedIn. Yael joined LinkedIn in 2011, where she leads our mobile analytics team. She previously worked at Stanford on text mining, personalized medicine, and biomedical informatics.<br />
</em></p>
<p>We live in an era of Big Data. But how do we use all of that data to answer questions and communicate those answers effectively?</p>
<p>My colleagues and I at LinkedIn were fortunate enough to hear answers from <a href="http://www.linkedin.com/in/iliinsky">Noah Iliinsky</a>, who literally wrote the <a href="http://amzn.to/HJFDMe">book on designing data visualization</a>.</p>
<p>Earlier this month, we hosted Noah at LinkedIn to give a tech talk on &#8220;<a href="http://linkd.in/HaKPwk">Designing Effective Data Visualizations</a>&#8221;. We are proud to make these <a href="http://www.youtube.com/linkedintechtalks">tech talks</a> open to the public, and enjoyed a great mix of attendees from local companies and universities. If you couldn&#8217;t attend the talk in person or remotely, I encourage you to watch the recording, embedded above.</p>
<p>Why do we visualize data? As Noah tells us, visualization makes data accessible. It gives us faster access to actionable insights and allows access to huge amounts of data. Visualization enables both data exploration (when you are still trying to discover the story) and data explanation (when you have a story to tell). Noah reviewed some great examples (watch the talk!), with an emphasis on the dos and don&#8217;ts of data visualization.</p>
<p>In particular, he provided a step-by-step framework for traversing the path from question to answer:</p>
<p>Phase 1: Decide what to visualize.</p>
<ul>
<li>Understand the question your audience wants to answer.</li>
<li>Understand the actions they are hoping the answer will drive.</li>
<li>Consider who is consuming this data &#8212; their needs, biases, etc.</li>
<li>Decide what data to use &#8212; and what data <em>not</em> to use &#8212; and what relationships you are interested in.</li>
<li>Explore the data and construct a storyline.</li>
</ul>
<p>Phase 2: Decide how to visualize it.</p>
<ul>
<li>Use appropriate visual encodings for data and relationships (cf. <a href="http://complexdiagrams.com/properties">http://complexdiagrams.com/properties</a>).</li>
<li>Limit the data you include.</li>
<li>Use position for your most important relationship.</li>
<li>Try different axes.</li>
<li>Show your visualization to different people, without explanations. Show an expert, show a layman.</li>
<li>Iterate, iterate, iterate!</li>
</ul>
<p>Noah also shared his thoughts on how to visualize social networks. He recommended useful tools for data visualization, including <a href="http://www.tableausoftware.com/">Tableau</a>, <a href="http://spotfire.tibco.com/">Spotfire</a>, <a href="http://mbostock.github.com/d3/">D3</a>, <a href="http://processing.org/">Processing</a>, <a href="http://had.co.nz/ggplot2/">ggplot2</a>, <a href="http://www.omnigroup.com/products/omnigraffle/">Omnigraffle</a>, and <a href="http://www.omnigroup.com/products/omnigraphsketcher/">OmnigraphSketcher</a>.</p>
<div>Finally, he left us with key lessons to take home:</div>
<ul>
<li>You are not your audience. This is a huge lesson that all of us must internalize to be great at what we do. Consider what you need to communicate to marketers, investors, members of the general public, etc.</li>
<li>Do user research! Understand your users&#8217; hopes, dreams, and favorite flavors! Understand their identity, their jargon, culture, etc.</li>
<li>Remember that your success is defined by your customer’s success. If you can’t satisfy your customer&#8217;s needs, you have failed &#8212; no matter how insightful your analysis.</li>
</ul>
<p>You can enjoy the talk by watching the embedded video above. And you can find more LinkedIn tech talks on our <a href="http://www.youtube.com/linkedintechtalks">YouTube channel</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data, Algorithms, and People</title>
		<link>http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/</link>
		<comments>http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/#comments</comments>
		<pubDate>Sat, 14 Apr 2012 20:49:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4155</guid>
		<description><![CDATA[One of the highlights of the recent Data 2.0 Summit was a panel featuring: Alexander Gray, CTO of SkyTree Anthony Goldbloom, CEO of Kaggle Josh Wills, Director of Data Science at Cloudera The focus of the panel was supposed to be about &#8220;Data Science and Predicting the Future&#8221;, but the most contentious topic was whether data, algorithms or people (that [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://data2summit.com/"><img class="wp-image-4143 alignleft" title="Data 2.0 Summit" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/03/data20summit.png" alt="" width="480" height="88" /></a></p>
<p>One of the highlights of the recent <a href="http://thenoisychannel.com/2012/03/30/data-2-0-summit/">Data 2.0 Summit</a> was a panel featuring:</p>
<ul>
<li><a href="http://www.linkedin.com/pub/alexander-gray/4/4b6/b55">Alexander Gray</a>, CTO of <a href="http://www.skytreecorp.com/">SkyTree</a></li>
<li><a href="http://www.linkedin.com/in/anthonygoldbloom">Anthony Goldbloom</a>, CEO of <a href="http://www.kaggle.com/">Kaggle</a></li>
<li><a href="http://www.linkedin.com/pub/josh-wills/0/82b/138">Josh Wills</a>, Director of Data Science at <a href="http://www.cloudera.com/">Cloudera</a></li>
</ul>
<p>The panel was supposed to focus on &#8220;Data Science and Predicting the Future&#8221;, but the most contentious topic was whether data, algorithms, or people (that is, the data scientists themselves) were the most important factor in the practice and success of data science.</p>
<p>Yes, we one-upped the <a href="http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine">debate</a> that my colleague <a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a> instigated at this year&#8217;s <a href="http://strataconf.com/strata2012/">Strata</a> conference. In fact, Josh cited the &#8220;better data beats more data beats clever algorithms&#8221; argument that Monica made in <a href="http://strataconf.com/strata2012/public/schedule/detail/22538">her own Strata presentation</a>. And, just like at Strata, there was a healthy dose of audience participation.</p>
<p>Of course, I came down on the side of data &#8212; which I believe won the debate hands down.</p>
<p>I&#8217;m a fan of clever algorithms, which Alexander had to defend given that Skytree&#8217;s core value proposition is better machine learning algorithms delivered at scale. But <a href="http://thenoisychannel.com/2009/03/31/the-unreasonable-effectiveness-of-data/">I&#8217;m with Peter Norvig et al.</a> on the dominance of data over algorithms.</p>
<p>Favoring data over people was a harder choice. Anthony naturally made the case for people (Kaggle&#8217;s claim to fame is assembling many of the world&#8217;s best data scientists by organizing competitions). Hopefully <a href="http://www.quora.com/If-I-want-to-do-Data-Science-would-LinkedIn-or-Twitter-be-a-better-place-to-start-work/answer/Daniel-Tunkelang">my team</a> won&#8217;t quit en masse when they read this blog post! But I think they&#8217;ll agree with me that, without the incredible data we work with at LinkedIn, they&#8217;d be unable to deliver the awesomeness that I&#8217;ve come to expect from them.</p>
<p>There&#8217;s a saying that we all cook from the same cookbooks, so that it&#8217;s the ingredients that make all the difference. To take the metaphor further, you can also try to <a href="http://blog.topprospect.com/2011/06/the-biggest-talent-losers-and-winners/">poach your rival&#8217;s chefs</a>. But data is the biggest entry barrier &#8212; and the most sustainable competitive advantage.</p>
<p>Of course, we should have the best people apply the best algorithms to work with the best data. But data comes first. The best meal starts with the best ingredients.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Video of Strata 2012 Talk on Humans, Machines, and the Dimensions of Microwork</title>
		<link>http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/</link>
		<comments>http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/#comments</comments>
		<pubDate>Sat, 31 Mar 2012 20:39:21 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4151</guid>
		<description><![CDATA[&#160; The video of the presentation that Claire Hunsaker and I delivered on &#8220;Humans, Machines, and the Dimensions of Microwork&#8221; at Strata 2012 is now available as part of the complete video compilation. I&#8217;ve taken the liberty to upload it to YouTube &#8212; feel free to watch the embedded video above.]]></description>
				<content:encoded><![CDATA[<p><iframe width="462" height="260" src="http://www.youtube.com/embed/nc5YZYG1p_w?rel=0" frameborder="0" allowfullscreen></iframe><br />
&nbsp;</p>
<p>The video of the presentation that <a href="http://www.linkedin.com/in/clairehunsaker">Claire Hunsaker</a> and I delivered on &#8220;<a href="http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/">Humans, Machines, and the Dimensions of Microwork</a>&#8221; at Strata 2012 is now available as part of the <a href="http://shop.oreilly.com/product/0636920025412.do">complete video compilation</a>. I&#8217;ve taken the liberty to upload it to YouTube &#8212; feel free to watch the embedded video above.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data 2.0 Summit</title>
		<link>http://thenoisychannel.com/2012/03/30/data-2-0-summit/</link>
		<comments>http://thenoisychannel.com/2012/03/30/data-2-0-summit/#comments</comments>
		<pubDate>Fri, 30 Mar 2012 15:48:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4142</guid>
		<description><![CDATA[I&#8217;ll be participating in the Data 2.0 Summit on Tuesday, April 3rd, and I hope to see some of you there. Last year, my colleague (and fellow LinkedIn data scientist) Scott Nicholson attended and wrote this guest post about it. This year, I&#8217;m not only attending but participating on a panel about social data, moderated by AllthingsD Senior [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://data2summit.com/"><img class="wp-image-4143 alignleft" title="Data 2.0 Summit" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/03/data20summit.png" alt="" width="480" height="88" /></a></p>
<p>I&#8217;ll be participating in the <a href="http://data2summit.com/">Data 2.0 Summit</a> on Tuesday, April 3rd, and I hope to see some of you there. Last year, my colleague (and fellow <a href="http://www.quora.com/If-I-want-to-do-Data-Science-would-LinkedIn-or-Twitter-be-a-better-place-to-start-work/answer/Daniel-Tunkelang">LinkedIn data scientist</a>) <a href="http://www.linkedin.com/in/scottnicholsonphd">Scott Nicholson</a> attended and wrote this <a href="http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/">guest post</a> about it. This year, I&#8217;m not only attending but participating on a panel about social data, moderated by <a href="http://allthingsd.com/">AllthingsD</a> Senior Editor <a href="http://allthingsd.com/author/lizg/">Liz Gannes</a>.</p>
<p>There&#8217;s a great line-up of <a href="http://data2summit.com/speakers">speakers</a> for the day, including:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Bram_Cohen">Bram Cohen</a>, the founder, chief scientist, and inventor of <a href="http://www.bittorrent.com/">BitTorrent</a>, the leading peer-to-peer file sharing protocol for sharing large files on the Internet.</li>
<li><a href="http://www.linkedin.com/in/medriscoll">Michael Driscoll</a>, CTO and co-founder of the <a href="http://www.metamarketsgroup.com/">Metamarkets Group</a>. He moderated a fantastic <a href="http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine">debate</a> at the recent <a href="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/">Strata conference</a> about the relative importance of domain expertise or machine learning for data scientists.</li>
<li><a href="http://www.linkedin.com/in/gilelbaz">Gil Elbaz</a>, the founder and CEO of <a href="http://www.factual.com/">Factual</a>, an information marketplace. He is also the co-founder of Applied Semantics, which Google acquired in 2003 for $102M and turned into the foundation for AdSense (now a $10B business).</li>
<li><a href="http://www.linkedin.com/in/anthonygoldbloom">Anthony Goldbloom</a>, co-founder and CEO of <a href="http://www.kaggle.com/">Kaggle</a>,  a platform for data science competitions that generated a lot of discussion at Strata.</li>
<li><a href="http://www.linkedin.com/pub/stefan-weitz/0/9b3/299">Stefan Weitz</a>, director of search at Bing. He&#8217;ll be on my panel. Also see the discussion I had with him in the comment thread for a post on &#8220;<a href="http://thenoisychannel.com/2009/03/17/why-are-people-so-clueless-about-search/">Why Are People So Clueless About Search?</a>&#8221;.</li>
</ul>
<p>And lots more, but you get the idea. I&#8217;m thrilled to be part of such a talent-heavy program and looking forward to insightful discussions with fellow panelists and attendees. Also a great excuse to spend a day in the city (note for my <a href="https://sites.google.com/site/245henry/">former townspeople</a> &#8212; that&#8217;s what they call San Francisco around here).</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/30/data-2-0-summit/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/30/data-2-0-summit/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Claudia Perlich: Tech Talk on Real-Time Bidding Optimization</title>
		<link>http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/</link>
		<comments>http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 03:45:01 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4138</guid>
		<description><![CDATA[Conventional wisdom holds that physical compliments are counter-productive as pick-up lines. Indeed, a dating site did some analysis showing a negative correlation between such compliments and the probability of a positive response. But, as m6d Chief Scientist and 3-time KDD Cup winner Claudia Perlich explained in her recent talk at LinkedIn, we have to watch [...]]]></description>
				<content:encoded><![CDATA[<p><iframe width="504" height="285" src="http://www.youtube.com/embed/5DSahEbJ4KY?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>Conventional wisdom holds that physical compliments are counter-productive as pick-up lines. Indeed, a <a href="http://blog.okcupid.com/index.php/online-dating-advice-exactly-what-to-say-in-a-first-message/">dating site</a> did some analysis showing a negative correlation between such compliments and the probability of a positive response.</p>
<p>But, as <a href="http://m6d.com/">m6d</a> Chief Scientist and 3-time <a href="http://www.sigkdd.org/kddcup/">KDD Cup</a> winner <a href="http://people.stern.nyu.edu/cperlich/">Claudia Perlich</a> explained in her recent talk at LinkedIn, we have to watch out for confounding variables. In the dating scenario above, beauty is a confounding variable: it affects both the probability of getting a positive response and the probability of a suitor offering physical compliments. Hence, we need to control for the actual beauty, or it can appear that making compliments is a bad idea.</p>
<p>Perlich does not work on online dating, but rather in the data-driven world of online advertising. Specifically, she and her team work on real-time bidding optimization.</p>
<p>Perlich described a variety of design choices that have general applicability to data science problems. For example, her team used hashed tokens of previously visited URLs, rather than the URLs themselves, as features for their machine learning models. They avoided the use of personally identifying information (PII) or even demographic information about their users. These decisions were counterintuitive — typically, more data leads to better results. But Perlich found that these restrictions did not sacrifice accuracy, and had the further benefit of keeping their approach general rather than application- or customer-specific.</p>
<p>Perlich also described several technical challenges that her team had to overcome. For example, they found they could not sample users, so they instead sampled events &#8212; that is, visits, impressions, and conversions. They also found that their <a href="http://en.wikipedia.org/wiki/Linear_model">linear models</a> tended to suffer from <a href="http://en.wikipedia.org/wiki/Overfitting">overfitting</a> in their top predictions &#8212; a problem they resolved by introducing a <a href="http://en.wikipedia.org/wiki/Spline_(mathematics)">spline</a> model.</p>
<p>The talk was deeply technical and yet very relevant and accessible to a broad audience of data scientists and engineers. There&#8217;s much more content than fits in this small summary, so I encourage you to watch the video! And you can <a href="http://www.youtube.com/linkedintechtalks">watch more LinkedIn tech talks here</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Facing Prosopagnosia</title>
		<link>http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/</link>
		<comments>http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 05:57:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4131</guid>
		<description><![CDATA[       In the past few years, prosopagnosia, also known as &#8220;face blindness&#8221;, has received a fair amount of attention from researchers, as well as from the popular press. My first exposure to the topic was Joshua Davis&#8217;s article entitled &#8220;Face Blind&#8220;, which appeared in Wired in November 2006. I was intrigued, especially since [...]]]></description>
				<content:encoded><![CDATA[<p><object width="212" height="140" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" /><param name="scale" value="noscale" /><param name="salign" value="lt" /><param name="background" value="#333333" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="flashvars" value="si=254&amp;&amp;contentValue=50121783&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121783n&amp;tag=contentMain;contentAux" /><embed width="212" height="140" type="application/x-shockwave-flash" src="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" scale="noscale" salign="lt" background="#333333" allowfullscreen="true" allowscriptaccess="always" flashvars="si=254&amp;&amp;contentValue=50121783&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121783n&amp;tag=contentMain;contentAux" /></object>      <object width="212" height="140" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" /><param name="scale" value="noscale" /><param name="salign" value="lt" /><param name="background" value="#333333" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="flashvars" value="si=254&amp;&amp;contentValue=50121784&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121784n&amp;tag=contentMain;contentAux" /><embed width="212" height="140" type="application/x-shockwave-flash" src="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" scale="noscale" salign="lt" background="#333333" allowfullscreen="true" 
allowscriptaccess="always" flashvars="si=254&amp;&amp;contentValue=50121784&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121784n&amp;tag=contentMain;contentAux" /></object></p>
<p>In the past few years, <a href="http://en.wikipedia.org/wiki/Prosopagnosia">prosopagnosia</a>, also known as &#8220;face blindness&#8221;, has received a fair amount of attention from researchers, as well as from the popular press.</p>
<p>My first exposure to the topic was Joshua Davis&#8217;s article entitled &#8220;<a href="http://www.wired.com/wired/archive/14.11/blind.html">Face Blind</a>&#8221;, which appeared in Wired in November 2006. I was intrigued, especially since I&#8217;d long recognized that I had difficulty recognizing people by face. Perhaps the person who has done most to raise awareness of prosopagnosia is neurologist <a href="http://en.wikipedia.org/wiki/Oliver_Sacks">Oliver Sacks</a>, who has prosopagnosia himself.</p>
<p>The Wired article inspired me to explore the subject. I discovered <a href="http://faceblind.org/">faceblind.org</a> and found quizzes that tested for prosopagnosia. On one of these, where random guessing would have earned a score of 50%, I scored in the low 60s. My initial reaction was that my score wasn&#8217;t so bad &#8212; it was a hard test! Then my wife took the test and scored in the high 90s. That&#8217;s when I realized that I didn&#8217;t just have difficulty recognizing faces &#8212; I was almost incapable of it.</p>
<p>Faced with this realization, I had to decide whether to share it with my friends and family, let alone with my broader set of social and professional acquaintances. It was tempting not to &#8212; after all, why tell the world that I wasn&#8217;t &#8220;normal&#8221;?</p>
<p>But eventually I realized that it would be better for people around me to know than not know. The biggest downside to prosopagnosia isn&#8217;t the momentary embarrassment of not recognizing someone &#8212; it&#8217;s the constant fear of offending people who may think you don&#8217;t value them enough to recognize or acknowledge them.</p>
<p>Hence, I spread the word through my colleagues, ensuring that most of the people with whom I interacted regularly would find out without any big announcements. Some of my co-workers were surprised, since I do a pretty good job of recognizing people using non-facial clues &#8212; height, hair, clothing, where I run into them, etc. I have a great memory, and I have no problems with voice recognition. In other words, I have lots of work-arounds.</p>
<p>Fortunately, I work with a lot of people who understand <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> &#8212; which is a great framework for understanding how I recognize people. I simply work with a different set of <a href="http://en.wikipedia.org/wiki/Feature_selection">features</a> than most people, but fortunately I achieve sufficient <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision and recall</a> to pass as &#8220;normal&#8221; most of the time.</p>
<p>Anyway, if you didn&#8217;t already know that I had prosopagnosia, welcome to the inner circle! And if you ever felt that I walked by you without recognizing or acknowledging you, please accept my belated apology.</p>
<p>Finally, if you&#8217;re curious to learn more about prosopagnosia, I encourage you to watch the 60 Minutes segments above.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Making Love with Data: Avinash Kaushik&#8217;s Strata 2012 Keynote</title>
		<link>http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/</link>
		<comments>http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/#comments</comments>
		<pubDate>Thu, 08 Mar 2012 07:23:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4121</guid>
		<description><![CDATA[Just watch the presentation, which stole the show at Strata 2012. The written word cannot do justice to Avinash&#8217;s passion and his extraordinary ability to communicate it.]]></description>
				<content:encoded><![CDATA[<p><iframe width="454" height="257" src="http://www.youtube.com/embed/CrSX97elHDA" frameborder="0" allowfullscreen></iframe></p>
<p>Just watch the presentation, which stole the show at <a href="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/">Strata 2012</a>. The written word cannot do justice to Avinash&#8217;s passion and his extraordinary ability to communicate it.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Humans, Machines &amp; the Dimensions of Microwork</title>
		<link>http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/</link>
		<comments>http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/#comments</comments>
		<pubDate>Mon, 05 Mar 2012 05:06:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4116</guid>
		<description><![CDATA[As per my previous post, I had a great time at the O’Reilly Strata Conference. It was a delight to participate in such a fantastic gathering of folks who work with big data. For those who missed my session, I&#8217;ve attached the slides that Claire and I presented. Some of the slides don&#8217;t make sense without [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/11863457" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<div style="padding: 5px 0 12px;">
<p>As per my <a href="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/">previous post</a>, I had a great time at the <a href="http://strataconf.com/strata2012">O’Reilly Strata Conference</a>. It was a delight to participate in such a fantastic gathering of folks who work with big data. For those who missed my session, I&#8217;ve attached the slides that Claire and I presented. Some of the slides don&#8217;t make sense without the voice-over, but hopefully there is enough self-contained content in them to be useful.</p>
<p>The presentation was recorded and will be available as part of the <a href="http://strataconf.com/strata2012/public/sv/q/385">Strata 2012 Video Compilation</a>.</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Strata 2012: Big Data is Bigger than Ever!</title>
		<link>http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/</link>
		<comments>http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 08:57:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4101</guid>
		<description><![CDATA[I spent the last three days at the O&#8217;Reilly Strata Conference, an assembly of over 2500 people focused on data science and its applications. While I&#8217;m wary of industry conferences from attending vendor-fests in my past life in the enterprise software world, Strata is an exceptionally good conference. The speakers were a who&#8217;s who of data [...]]]></description>
				<content:encoded><![CDATA[<p style="text-align: left;"><a href="http://strataconf.com/strata2012"><img class="wp-image-4102" title="Strata 2012" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/03/Strata-banner.png" alt="" width="481" height="99" /></a></p>
<p>I spent the last three days at the <a href="http://strataconf.com/strata2012">O&#8217;Reilly Strata Conference</a>, an assembly of <del>two thousand</del> over 2500 people focused on data science and its applications. While I&#8217;m wary of industry conferences from attending vendor-fests in my past life in the enterprise software world, Strata is an exceptionally good conference. The <a href="http://strataconf.com/strata2012/public/schedule/speakers">speakers</a> were a who&#8217;s who of data science, including Lucene and Hadoop creator <a href="http://strataconf.com/strata2012/public/schedule/speaker/103766">Doug Cutting</a>, search user interface pioneer <a href="http://strataconf.com/strata2012/public/schedule/speaker/66363">Marti Hearst</a>, and Google chief economist <a href="http://strataconf.com/strata2012/public/schedule/speaker/63098">Hal Varian</a>. You can find the tweet stream for the conference at hash tag <a href="https://twitter.com/#!/search/%23stratconf">#strataconf</a>.</p>
<p><strong>Tuesday</strong></p>
<p>I spent Tuesday in the <a href="http://strataconf.com/strata2012/public/schedule/detail/22903">Deep Data</a> session, billed as a no-holds-barred program for data scientists. My two favorite talks:</p>
<ul>
<li><a href="http://strataconf.com/strata2012/public/schedule/speaker/103739">Claudia Perlich</a>, winner of three <a href="http://www.sigkdd.org/kddcup/">KDD cups</a>, talked about using information to pick the right action and to influence people such that they behave in a way that is better for them, better for us, and possibly better for society in general.</li>
<li><a href="http://strataconf.com/strata2012/public/schedule/speaker/109152">Monica Rogati</a>, my colleague at LinkedIn and the epitome of a data scientist, delivered a fantastic talk about machine learning models and training data in the real world, extending <a href="http://norvig.com/">Peter Norvig</a>&#8217;s point about the &#8220;<a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/35179.pdf">unreasonable effectiveness of data</a>&#8221; to observe that more data beats clever algorithms but better data beats more data.</li>
</ul>
<p>But the most fun that day was the Oxford-style debate featuring <a href="http://www.drewconway.com/Drew_Conway/About.html">Drew Conway</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/76203">Pete Skomoroch</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/33953">Mike Driscoll</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/101103">DJ Patil</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/135062">Amy Heineike</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/104290">Pete Warden</a>, and <a href="http://strataconf.com/strata2012/public/schedule/speaker/1956">Toby Segaran</a>. The question proposed was absurdly <a href="http://dictionary.reference.com/wordoftheday/archive/2010/06/04.html">Manichean</a>: if you had to hire your first data scientist and could only hire one, would you pick a domain expert or a machine learning expert? After the moderator suppressed some initial attempts to hedge (&#8220;both&#8221;, &#8220;it depends&#8221;, etc.), the debaters ripped into the question by taking extreme positions and defending them with gusto. It was a lot of fun, with enthusiastic audience participation and the debaters exploiting their inside knowledge of their opponents&#8217; work histories. In the end, the machine learning side won by a small margin.</p>
<p>I then had the good fortune to grab dinner with Marti Hearst and Hal Varian at <a href="http://xanhrestaurant.com/">Xanh</a> &#8211; a wonderful mix of great food and conversation.</p>
<p><strong>Wednesday</strong></p>
<p>The Wednesday morning keynote session offered some gems:</p>
<ul>
<li>Cloudera CEO <a href="http://strataconf.com/strata2012/public/schedule/speaker/5259">Mike Olson</a> urged big data practitioners to focus on guns, drugs, and oil.</li>
<li>Doctor and data geek <a href="http://strataconf.com/strata2012/public/schedule/speaker/128471">Ben Goldacre</a> delivered a mesmerizing and disturbing talk about the suppression of inconvenient medical trial results and analytical tools to discover it.</li>
</ul>
<p>But the person who stole the show was Google&#8217;s <a href="http://strataconf.com/strata2012/public/schedule/speaker/43798">Avinash Kaushik</a>, who talked about making love with data to find orgasm-inducing actions to change the world and make more money. Unfortunately this was the one talk that was not recorded, but you can read the summary on <a href="https://plus.google.com/105279625231358353479/posts/CLwYzJM48L2">Avinash&#8217;s Google+ page</a>.</p>
<p>As a speaker, I held &#8220;office hours&#8221; on Wednesday. It was supposed to be a 40-minute slot for conference attendees to come and ask me questions. But somehow those 40 minutes extended into three hours of conversation about everything from normalized <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">KL divergence</a> to <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/">interview problems</a> &#8212; and segued into a reception with specialty big-data cocktails. By the time I got back to my apartment, my voice, brain, and liver were spent.</p>
<p><strong>Thursday</strong></p>
<p>I spent most of Thursday morning in the speaker lounge, recovering from the previous evening and putting the finishing touches on my presentation. But I couldn&#8217;t resist attending a two-part session on privacy. Indeed, this session was distinctive enough to merit its own hash tag: <a href="https://twitter.com/#!/search/%23strataprivacy">#strataprivacy</a>.</p>
<p>The first part featured O&#8217;Reilly&#8217;s <a href="http://strataconf.com/strata2012/public/schedule/speaker/89224">Alex Howard</a> moderating Intelius Chief Privacy Officer <a href="http://strataconf.com/strata2012/public/schedule/speaker/41727">Jim Adler</a> and NYU PhD student <a href="http://strataconf.com/strata2012/public/schedule/speaker/122944">Solon Barocas</a> on a panel provocatively titled &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22613">If Data Wants to Be Free, is Privacy a Prison?</a>&#8221; It was a great discussion, and I enjoyed the opportunity to offer my own provocative question through Twitter. Since the panelists were arguing that it was unethical to infer private facts from public data, I asked if they were trying to establish a new form of <a href="http://en.wikipedia.org/wiki/Thoughtcrime">thoughtcrime</a>.</p>
<p>The second panel, entitled &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22300">Pretty Simple Data Privacy</a>&#8221;, featured <a href="http://strataconf.com/strata2012/public/schedule/speaker/105140">Kaitlin Thaney</a> from Digital Science, <a href="http://strataconf.com/strata2012/public/schedule/speaker/124127">Betsy Masiello</a> from Google, and <a href="http://strataconf.com/strata2012/public/schedule/speaker/44389">John Wilbanks</a> from the Kauffman Foundation for Entrepreneurship. Given that today was the first day of <a href="http://www.google.com/hostednews/afp/article/ALeqM5ip-cz4mF_UePtGrJ-0Wq8wZ9ykPw">Google&#8217;s new privacy policy</a>, there was no avoiding focus on the associated controversy. I did try to get Betsy to address my charge that Google doesn&#8217;t think users own their search history (cf. &#8220;<a href="http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/">Google vs. Bing: A Tweetle Beetle Battle Muddle</a>&#8221;), but she said she was unfamiliar with the details of that event. I do wish that someone at Google with more familiarity would respond publicly.</p>
<p>Back to the speaker room after lunch, until my own talk with Samasource&#8217;s <a href="http://strataconf.com/strata2012/public/schedule/speaker/125659">Claire Hunsaker</a> on &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22363">Humans, Machines, and the Dimensions of Microwork</a>&#8221;. I&#8217;ll post the slides (and there will be a video on the conference site), but the sound bite is that you need to keep crowdsourcing tasks simple, manage the trade-off between task value and difficulty, and watch out for systematic bias.</p>
<p>I wrapped up the conference by hearing <a href="http://strataconf.com/strata2012/public/schedule/speaker/107550">William Gunn</a> talk about how <a href="http://www.mendeley.com/">Mendeley</a> is disrupting <a href="http://nihlibrary.nih.gov/ResearchTools/Pages/bibliometrics.aspx">bibliometrics</a> and perhaps the entire academic publishing and reputation ecosystem. I laud his ambition and wish him and Mendeley luck in this quest.</p>
<p>&nbsp;</p>
<p>In summary, three days of great talks, conversations, and general enjoyment. My thanks to Strata organizers <a href="http://strataconf.com/strata2012/public/schedule/speaker/1">Edd Dumbill</a> and <a href="http://strataconf.com/strata2012/public/schedule/speaker/17816">Alistair Croll</a> for putting together such an outstanding event and for giving me the opportunity to participate.</p>
<p>&nbsp;</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Enjoying Seattle&#8217;s Best: UW, WSDM, and SSS</title>
		<link>http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/</link>
		<comments>http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/#comments</comments>
		<pubDate>Sun, 12 Feb 2012 20:44:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4095</guid>
		<description><![CDATA[My excursion to Seattle was delightful, and I thought I&#8217;d share some details with readers. I spent most of Friday at the University of Washington, meeting with graduating PhD students.  I&#8217;ve always known that UW is a top school, but I was particularly impressed with this batch. I was pleasantly surprised to see folks like [...]]]></description>
				<content:encoded><![CDATA[<p><img class="size-full wp-image-4096" title="Seattle's Best" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/02/seattles-best.png" alt="" width="293" height="95" /></p>
<p>My excursion to Seattle was delightful, and I thought I&#8217;d share some details with readers.</p>
<p>I spent most of Friday at the University of Washington, meeting with <a href="http://www.cs.washington.edu/education/grad/phdcandidates/">graduating PhD students</a>.  I&#8217;ve always known that UW is a top school, but I was particularly impressed with this batch. I was pleasantly surprised to see folks like <a href="http://www.cs.washington.edu/homes/nodira/Nodira_Khoussainova.html">Nodira Khoussainova</a> and <a href="http://www.cs.washington.edu/homes/kayur/">Kayur Patel</a> working to bring together the often disparate worlds of databases, machine learning, and HCI in order to make people more effective at solving &#8220;big data&#8221; problems. I realize that I&#8217;m aiding and abetting other employers with whom I compete for top talent, but it would be wrong not to encourage everyone to find worthy challenges for these budding scientists.</p>
<p>I then went to the <a href="http://spaceneedle.com/">Space Needle</a> to meet up with the <a href="http://wsdm2012.org/">WSDM 2012</a> crowd. <a href="http://research.microsoft.com/en-us/um/people/teevan/">Jaime Teevan</a> and <a href="http://www.cond.org/">Eytan Adar</a> outdid themselves, providing a great setting for folks to mingle, imbibe, and enjoy a spectacular view of Seattle.</p>
<p>Saturday I attended the &#8220;social&#8221; day of the WSDM conference.</p>
<p><a href="http://www.linkedin.com/pub/andrew-tomkins/0/87/713">Andrew Tomkins</a> chaired the first morning session, which included <a href="http://www.cs.columbia.edu/~hila/">Hila Becker</a>&#8217;s latest work on identifying event content in social media and <a href="http://cs-people.bu.edu/zg/">Georgios Zervas</a> presenting the work on analyzing the reputational effects of Groupon that triggered quite a controversy <a href="http://www.technologyreview.com/blog/arxiv/27150/">last</a> <a href="http://articles.businessinsider.com/2011-09-12/research/30155506_1_daily-deal-business-insider-post-reviews">September</a>. After the break came the spotlight section &#8212; a great sequence of 5-minute presentations in which researchers both summarized their contributions and lured attendees to visit their posters. I hope that more conferences adopt this format, which optimizes for communicating ideas and discourages long-winded expositions.</p>
<p>I then had the pleasure of having lunch with <a href="http://www.jopedersen.com/jopedersen/Home.html">Jan Pedersen</a> and friends at <a href="http://blueacreseafood.com/">Blueacre Seafood</a> &#8212; great food and even better conversation. We both noted the irony that, even though we are practically neighbors, we only seem to meet up at events like these.</p>
<p>I made it back to the conference in time to hear the two best-paper awardees: <a href="http://www.cs.rochester.edu/~sadilek/">Adam Sadilek</a> on &#8220;<a href="http://hci.cs.rochester.edu/pubs/pdfs/following-friends.pdf">Finding Your Friends and Following Them to Where You Are</a>&#8221; and <a href="http://www.cs.berkeley.edu/~yaron/">Yaron Singer</a> on &#8220;<a href="http://www.cs.berkeley.edu/~yaron/papers/HowToWinFriendsAndInfluencePeople.pdf">How to Win Friends and Influence People, Truthfully: Influence Maximization Mechanisms for Social Networks</a>&#8221;. I highly recommend both papers, especially if you are interested in either social network prediction or the underlying economics of influence.</p>
<p>Another coffee break, and then the keynote: <a href="http://www.hilarymason.com/">Hilary Mason</a> on &#8220;The Secret Life of Social Links&#8221;. Hilary is a great speaker &#8212; I first met her when I invited her to the Workshop on Search and Social Media (<a href="http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/">SSM 2010</a>) at <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>. She didn&#8217;t disappoint, and it&#8217;s great to see practitioners like her crossing the aisle to engage the academic community. Not to mention infusing their slides with <a href="http://en.wikipedia.org/wiki/Lolcat">lolcats</a>.</p>
<p>The conference wrapped up at 5pm, but then we bussed over to Microsoft Research for the <a href="http://research.microsoft.com/en-us/events/sss2012/default.aspx">Social Search Social</a>. That was a fun event designed to cross-pollinate the WSDM and CSCW communities. <a href="http://research.microsoft.com/en-us/um/people/merrie/">Meredith Ringel Morris</a>, <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a>, <a href="http://twitter.com/#!/jerepick">Jeremy Pickens</a>, <a href="http://faculty.ist.psu.edu/reddy/">Madhu Reddy</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah</a>, and <a href="http://people.lis.illinois.edu/~twidale/">Michael Twidale</a> put together a great program of 45-second madness presentations and &#8220;speed-dating&#8221; to pair up WSDM and CSCW attendees. It was far too short, but a lot of fun. And some of us kept up the social spirit by grabbing dinner afterward at <a href="http://www.bluecsushi.com/">Blue C Sushi</a>.</p>
<p>To everyone I met in the last couple of days: thanks for the great company and conversation! Keep sharing ideas and making data and science social.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Social Wisdom in Seattle</title>
		<link>http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/</link>
		<comments>http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/#comments</comments>
		<pubDate>Sat, 04 Feb 2012 19:24:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4089</guid>
		<description><![CDATA[      First, I wanted to give readers a heads up that I&#8217;ll be in Seattle this Friday and Saturday. I&#8217;ll spend Friday afternoon at the University of Washington, meeting with some of their outstanding computer science doctoral students. My schedule filled up with unexpected haste! But if you&#8217;re on campus and urgently want [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.cs.washington.edu/"><img class="alignnone" style="margin-left: -10px; margin-right: -10px;" title="University of Washington Computer Science &amp; Engineering" src="http://www.cs.washington.edu/images/cse_logo_80x133.gif" alt="" width="106" height="64" /></a><a href="http://wsdm2012.org/"><img class="alignnone" title="WSDM 2012" src="http://wsdm2012.org/img/topheader.png?1312770667" alt="" width="274" height="72" /></a>     <a href="http://brynnevans.com/blog/wp-content/uploads/2010/03/social-search.png"><img class="alignnone" title="Social Search" src="http://brynnevans.com/blog/wp-content/uploads/2010/03/social-search.png" alt="" width="85" height="63" /></a></p>
<p>First, I wanted to give readers a heads up that I&#8217;ll be in Seattle this Friday and Saturday. I&#8217;ll spend Friday afternoon at the <a href="http://www.cs.washington.edu/">University of Washington</a>, meeting with some of their outstanding computer science doctoral students. My schedule filled up with unexpected haste! But if you&#8217;re on campus and urgently want to meet, let me know and I&#8217;ll see what I can do.</p>
<p>Saturday I&#8217;ll be attending the social track of <a href="http://wsdm2012.org/">WSDM 2012</a>, the premier international ACM conference covering research in the areas of search and data mining on the Web. I&#8217;m excited about the program, as well as the opportunity to catch up with friends and make new ones. Back in 2010, I had the pleasure of co-organizing the Workshop on Search and Social Media (<a href="http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/">SSM 2010</a>) and being the official ACM blogger for <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>. You can read my posts <a href="http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/">here</a>.</p>
<p>Then, on Saturday evening, I&#8217;ll be heading to Microsoft Research to attend the Social Search Social (<a href="http://research.microsoft.com/en-us/events/sss2012/">SSS 2012</a>). Hats off to organizers <a href="http://research.microsoft.com/en-us/um/people/merrie/">Meredith Ringel Morris</a>, <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a>, <a href="http://twitter.com/#!/jerepick">Jeremy Pickens</a>, <a href="http://faculty.ist.psu.edu/reddy/">Madhu Reddy</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah</a>, and <a href="http://people.lis.illinois.edu/~twidale/">Michael Twidale</a> for creating what looks to be a fun (and very social!) event. I&#8217;m especially looking forward to the 45-second &#8220;madness&#8221; presentations (in which I&#8217;m participating) and the &#8220;speed dating&#8221; to help cross-pollinate the WSDM and <a href="http://en.wikipedia.org/wiki/Computer-supported_cooperative_work">CSCW</a> communities.</p>
<p>Hope to see some of you there, and of course will share what I learn here at The Noisy Channel. I also encourage you to follow the tweet streams for <a href="https://twitter.com/#!/search?q=%23wsdm2012">#wsdm2012</a> and <a href="https://twitter.com/#!/search?q=%23sss2012">#sss2012</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LinkedIn @ CMU</title>
		<link>http://thenoisychannel.com/2012/01/26/linkedin-cmu/</link>
		<comments>http://thenoisychannel.com/2012/01/26/linkedin-cmu/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 18:47:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4083</guid>
		<description><![CDATA[As regular readers know, I have a deep affection for Carnegie Mellon University, where I did my graduate work. I&#8217;m happy to announce that two of my colleagues (both fellow CMU PhDs) will be giving talks at CMU in a couple of weeks, and I hope that some of you will have the opportunities to [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://engineering.linkedin.com"><img title="LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/in-logo.jpeg" alt="" width="205" height="205" /></a><a href="http://www.cs.cmu.edu/"><img title="CMU School of Computer Science" src="http://www.cs.cmu.edu/~ref/naacl/logos/bronze/dragon-small.jpeg" alt="" width="277" height="241" /></a></p>
<p>As regular readers know, I have a deep affection for Carnegie Mellon University, where I did my graduate work. I&#8217;m happy to announce that two of my colleagues (both fellow CMU PhDs) will be giving talks at CMU in a couple of weeks, and I hope that some of you will have the opportunity to attend.</p>
<p>On Tuesday, February 7th, <a href="http://www.linkedin.com/in/abhilad">Abhimanyu Lad</a> will be hosting an information session at 6pm in Scaife Hall, Room 214. Abhi is a rock star on our data science team, and he&#8217;s been working on the next generation of LinkedIn search. You can get a taste of his work from his recent <a href="http://hcir.info/hcir-2011">HCIR 2011</a> presentation, &#8220;<a href="http://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MWVlMGNhZWY5NTA3MzQ2ZA">Is it Time to Abandon Abandonment?</a>&#8221;. Abhi will talk about a variety of technical challenges that data scientists and engineers are working on at LinkedIn.</p>
<p>On Thursday, February 9th, <a href="http://www.linkedin.com/in/paulogilvie">Paul Ogilvie</a> will talk about &#8220;<a href="http://www.lti.cs.cmu.edu/LinkedInPaulOgilvie.pdf">Where Big Data Meets Real-Time: Efficiently Indexing and Ranking News using Activity</a>&#8221; at 3:30pm in GHC 6115. Paul is responsible for article relevance infrastructure and algorithms on <a href="http://www.linkedin.com/today/">LinkedIn Today</a>, a great example of <a href="http://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjEzNDNjZTk5NGYyYWQwOA">social navigation</a> &#8211; not to mention a <a href="http://techcrunch.com/2011/06/30/linkedin-traffic-twitter/">great success for users</a>. Paul will talk about the technical details that make LinkedIn Today possible, including a novel use of inverted lists to efficiently index and support real-time updates to document representations.</p>
<p>And, even if you can&#8217;t make it to the talks, I encourage you to visit the LinkedIn booth at the <a href="http://www.studentaffairs.cmu.edu/career/job-fairs/eoc/index.html">EOC</a> fair on Wednesday, February 8th. We&#8217;re looking for great software engineers and data scientists, and we&#8217;re especially interested in interns.</p>
<div>I hope that CMU students and faculty will take the time to meet Abhi, Paul, and their colleagues when they visit in a couple of weeks.</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/26/linkedin-cmu/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/26/linkedin-cmu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts about Job Performance</title>
		<link>http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/</link>
		<comments>http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 19:24:22 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4074</guid>
		<description><![CDATA[This is the season of annual reviews, at least at LinkedIn. Performance reviews can be daunting for both employees and managers &#8212; at least everywhere that I&#8217;ve worked. Not only are we as human beings terrible at delivering feedback, but we also receive bad advice as managers. For example, many of us have learned the [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://dilbert.com/strips/comic/2009-08-26/"><img class="alignnone" title="Dilbert: Performance Review" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/5000/600/65675/65675.strip.gif" alt="" width="500" height="155" /></a></p>
<p>This is the season of annual reviews, at least at LinkedIn. Performance reviews can be daunting for both employees and managers &#8212; at least <a href="http://www.linkedin.com/in/dtunkelang">everywhere that I&#8217;ve worked</a>. Not only are we as human beings terrible at delivering feedback, but we also receive bad advice as managers.</p>
<p>For example, many of us have learned the &#8220;feedback sandwich&#8221; method, a technique that doesn&#8217;t hold up to scientific validation. Watch the video below to see what Stanford professor <a href="http://www.stanford.edu/~nass/">Clifford Nass</a> has learned from his experiments (see my review of his book <a href="http://thenoisychannel.com/2010/09/07/slouching-toward-creepiness/">here</a>).</p>
<p><iframe src="http://www.youtube.com/embed/W2dGxE7E48I" frameborder="0" width="500" height="281"></iframe></p>
<p>Here is what I suggest as a format for performance feedback, whether for writing your own self-assessment or delivering feedback to reports or peers on their performance:</p>
<p style="padding-left: 30px;"><strong>1) What is your day job?</strong></p>
<p style="padding-left: 30px;">Everyone needs a day job &#8212; a mission with a crisp set of responsibilities and deliverables. If you don&#8217;t know what you&#8217;re responsible for delivering, you can&#8217;t assess how well you are delivering it. You should know and articulate your top priorities &#8212; at most three, with a clear #1. For further reading, I suggest the <a href="http://www.quora.com/OKRs-Objectives-and-Key-Results">Quora discussion on OKRs</a> (Objectives and Key Results), an idea pioneered by Intel and now used at top technology companies (including LinkedIn and Google).</p>
<p style="padding-left: 30px;"><strong>2) How are you performing in your day job?</strong></p>
<p style="padding-left: 30px;">Hopefully you make more contributions than you can count. But make sure that your day job comes first. If you find that a disproportionate fraction of your contribution is outside your day job, then consider changing your day job. Your top priority is to meet (hopefully exceed!) the expectations for your day job &#8212; expectations you should set early and revisit regularly. Performance reviews are a great opportunity to brag.</p>
<p style="padding-left: 30px;"><strong>3) What do you do beyond your day job?</strong></p>
<p style="padding-left: 30px;">Your day job should be strongly aligned with your team and company&#8217;s top priorities. But great employees contribute beyond their day job towards other team and company priorities. For example, <a href="http://engineering.linkedin.com/team">talent</a> is our top priority at LinkedIn, so we particularly value contributions to hiring and growing our talent. And, at least in every environment I&#8217;ve experienced, the best employees are those who help make others successful.</p>
<p style="padding-left: 30px;"><strong>4) How do you want to grow?</strong></p>
<p style="padding-left: 30px;">This is really a two-part question. First, what do you want to do next? That could mean getting better at your day job, evolving your current responsibilities, or taking on a different role. Second, what are you doing to get there? You are ultimately responsible for your own professional development. But one of your manager&#8217;s top responsibilities is to help you identify and advance along the path that is best for you. And performance reviews are a great opportunity to make you think about the future.</p>
<p>Regardless of how your company manages performance, these are the key questions you should think about. Performance feedback is a great opportunity to focus on professional development &#8212; your own and that of the people you work with every day. Make the most of it!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Are You Hitched?</title>
		<link>http://thenoisychannel.com/2012/01/20/are-you-hitched/</link>
		<comments>http://thenoisychannel.com/2012/01/20/are-you-hitched/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 03:27:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4070</guid>
		<description><![CDATA[Let me preface this post by saying that this is my personal blog, and that my opinions here are not necessarily those of my employer. With that out of the way, I love the premise of Hitch.me: a dating site for professionals based on LinkedIn. I won&#8217;t confirm or deny the number of my colleagues [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/EnX_kEKe-3o?rel=0" frameborder="0" width="504" height="284"></iframe></p>
<p>Let me preface this post by saying that this is my personal blog, and that my opinions here are not necessarily those of my employer.</p>
<p>With that out of the way, I love the premise of <a href="http://www.hitch.me/">Hitch.me</a>: a dating site for professionals based on LinkedIn. I won&#8217;t confirm or deny the number of my colleagues who have thought about building a dating site based on our data, but it&#8217;s great to see someone using our <a href="http://developer.linkedin.com/apis">APIs</a> to do so. And the marketing video, while not exactly politically correct, is brilliant.</p>
<p>Yet another reason to work as a data scientist at LinkedIn!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/20/are-you-hitched/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/20/are-you-hitched/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Guided Exploration = Faceted Search, Backwards</title>
		<link>http://thenoisychannel.com/2012/01/17/guided-exploration/</link>
		<comments>http://thenoisychannel.com/2012/01/17/guided-exploration/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 14:00:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4041</guid>
		<description><![CDATA[Information Scent In the early 1990s, PARC researchers Peter Pirolli and Stuart Card developed the theory of information scent (more generally, information foraging) to evaluate user interfaces in terms of how well users can predict which paths will lead them to useful information. Like many HCIR researchers and practitioners, I&#8217;ve found this model to be [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.amazon.com/Blow-Up-Other-Stories-Julio-Cortazar/dp/0394728815"><img class="alignnone size-full wp-image-4043" title="Blow-Up by Julio Cortazar" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/01/blow-up.png" alt="" width="274" height="192" /></a></p>
<p><strong>Information Scent</strong></p>
<p>In the early 1990s, PARC researchers <a href="http://web.mac.com/peter.pirolli/Professional/About_Me.html">Peter Pirolli</a> and <a href="http://www2.parc.com/istl/groups/uir/people/stuart/stuart.htm">Stuart Card</a> developed the theory of information scent (more generally, <a href="http://en.wikipedia.org/wiki/Information_foraging">information foraging</a>) to evaluate user interfaces in terms of how well users can predict which paths will lead them to useful information. Like many <a href="http://hcir.info/">HCIR</a> researchers and practitioners, I&#8217;ve found this model to be a useful way to think about interactive information seeking systems.</p>
<p>Specifically, <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> is an exemplary application of the theory of information scent. Faceted search allows users to express an information need as a keyword search, providing them with a series of opportunities to improve the precision of the initial result set by restricting it to results associated with particular facet values.</p>
<p>For example, if I&#8217;m looking for folks to <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-PL-2350202">hire for my team</a>, I can start my search on LinkedIn with the keywords <em>[information retrieval]</em>, restrict my results to<em> Location: San Francisco Bay Area</em>, and then further restrict to <em>School: CMU</em>.</p>
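<p>Here is a minimal sketch of that narrowing flow, assuming a toy in-memory collection. The records, the field names, and the <code>search</code> and <code>refine</code> helpers are hypothetical illustrations, not LinkedIn&#8217;s actual search API:</p>

```python
# Toy profiles; all data and field names are made up for illustration.
PROFILES = [
    {"name": "A", "text": "information retrieval and ranking",
     "location": "San Francisco Bay Area", "school": "CMU"},
    {"name": "B", "text": "information retrieval research",
     "location": "San Francisco Bay Area", "school": "MIT"},
    {"name": "C", "text": "information retrieval systems",
     "location": "Greater New York City Area", "school": "CMU"},
    {"name": "D", "text": "compiler design",
     "location": "San Francisco Bay Area", "school": "CMU"},
]


def search(profiles, keywords):
    """Keyword search: keep profiles whose text contains every keyword."""
    words = keywords.lower().split()
    return [p for p in profiles
            if all(w in p["text"].lower().split() for w in words)]


def refine(results, facet, value):
    """Faceted refinement: restrict results to one value of one facet."""
    return [p for p in results if p[facet] == value]


results = search(PROFILES, "information retrieval")               # A, B, C
results = refine(results, "location", "San Francisco Bay Area")   # A, B
results = refine(results, "school", "CMU")                        # A
print([p["name"] for p in results])
```

<p>Each refinement can only shrink the result set, which is why this flow improves precision at the possible expense of recall.</p>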
<p><strong>Precision / Recall Asymmetry</strong></p>
<p>Faceted search is a great tool for information seeking systems. But it offers a flow that is asymmetric with respect to <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision and recall</a>.</p>
<p>Let&#8217;s invert the flow of faceted search. Rather than starting from a large, imprecise result set and progressively narrowing it, let&#8217;s start from a small, precise result set and progressively expand it. Since faceted search is often called &#8220;guided navigation&#8221; (a term <a href="http://www.linkedin.com/in/knabe">Fritz Knabe</a> and I coined at <a href="http://endeca.com/">Endeca</a>), let&#8217;s call this approach &#8220;guided exploration&#8221; (which has a nicer ring than &#8220;guided expansion&#8221;).</p>
<p>Guided exploration exchanges the roles of precision and recall. Faceted search starts with high recall and helps users increase precision while preserving as much recall as possible. In contrast, guided exploration starts with high precision and helps users increase recall while preserving as much precision as possible.</p>
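<p>To make the exchange of roles concrete, here is a small sketch using the standard set-based definitions of precision and recall. The document IDs and the relevant set are hypothetical:</p>

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Fraction of relevant items that are retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


relevant = {1, 2, 3, 4}              # hypothetical relevant documents

broad = {1, 2, 3, 4, 5, 6, 7, 8}     # where faceted search starts
narrow = {1, 2}                      # where guided exploration starts

print(precision(broad, relevant), recall(broad, relevant))    # 0.5 1.0
print(precision(narrow, relevant), recall(narrow, relevant))  # 1.0 0.5
```

<p>Faceted search tries to move from the first line toward higher precision; guided exploration tries to move from the second line toward higher recall.</p>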
<p>That sounds great in theory, but how can we implement guided exploration in practice?</p>
<p>Let&#8217;s remind ourselves why faceted search works so well. Faceted search offers the user information scent: the facet values help the user identify regions of higher precision relative to his or her information need. By selecting a sequence of facet values, the user arrives at a non-empty set that consists entirely or mostly of relevant results.</p>
<p><strong>How to Expand a Result Set</strong></p>
<p>How do we invert this flow? Just as enlarging an image is more complicated than reducing one, increasing recall is more complicated than increasing precision.</p>
<p>If our initial set is the result of selecting multiple facet values, then we may be able to increase recall by de-selecting facet values (e.g., de-selecting <em>San Francisco Bay Area</em> and <em>CMU</em> in my previous example). If we are using hierarchical facets, then rather than de-selecting a facet value, we may be able to replace it with a parent value (e.g., replacing <em>San Francisco Bay Area</em> with <em>California</em>). We can also remove one or more search keywords to broaden the results (e.g., <em>information</em> or <em>retrieval</em>).</p>
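<p>A rough sketch of how these relaxations could be enumerated, assuming a query is represented as a list of keywords plus a mapping from facet to selected value. The <code>PARENT</code> hierarchy and the <code>relaxations</code> function are hypothetical names introduced for illustration:</p>

```python
# Hypothetical facet hierarchy: child value -> parent value.
PARENT = {"San Francisco Bay Area": "California"}


def relaxations(keywords, facets):
    """Yield (description, keywords, facets) for each one-step relaxation."""
    for facet, value in facets.items():
        # De-select the facet value entirely.
        rest = {f: v for f, v in facets.items() if f != facet}
        yield (f"de-select {facet}: {value}", keywords, rest)
        # Replace the facet value with its parent, if one is known.
        if value in PARENT:
            broader = {**facets, facet: PARENT[value]}
            yield (f"broaden {facet} to {PARENT[value]}", keywords, broader)
    # Remove one keyword at a time. Keeping at least one keyword is an
    # assumption of this sketch, not something the approach requires.
    if len(keywords) > 1:
        for i, word in enumerate(keywords):
            yield (f"remove keyword {word}",
                   keywords[:i] + keywords[i + 1:], facets)


query_keywords = ["information", "retrieval"]
query_facets = {"Location": "San Francisco Bay Area", "School": "CMU"}
for description, _, _ in relaxations(query_keywords, query_facets):
    print(description)
```

<p>For the running example this yields five one-step relaxations: two de-selections, one parent replacement, and two keyword removals.</p>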
<p>Those are straightforward query relaxations. But there are more interesting ways to expand our results:</p>
<ul>
<li>We can replace a facet value with the union of that value and similar values (e.g., replacing <em>CMU</em> with <em>CMU </em>OR<em> MIT</em>).</li>
<li>We can replace the entire query (or any subquery) with a union of that query and the results for selecting a single facet value (e.g., (<em>[information retrieval] </em>AND<em> Location: San Francisco Bay Area </em>AND<em> School: CMU) </em>OR<em> Company: Google</em>).</li>
<li>We can replace the entire query (or any subquery) with a union of that query and the results for a keyword search (e.g., (<em>[information retrieval] </em>AND<em> Location: San Francisco Bay Area </em>AND<em> School: CMU) </em>OR<em> [faceted search]</em>).</li>
</ul>
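<p>The three union-based expansions above can be sketched with plain set operations. The <code>INDEX</code> below is a hypothetical mapping from each query term to the IDs of matching results; the numbers are made up:</p>

```python
# Hypothetical posting lists: each query term maps to a set of result IDs.
INDEX = {
    "[information retrieval]": {1, 2, 3, 4, 5},
    "[faceted search]": {5, 6},
    "Location: San Francisco Bay Area": {1, 2, 3, 6, 7},
    "School: CMU": {1, 2, 8},
    "School: MIT": {3, 9},
    "Company: Google": {2, 7, 10},
}

base = INDEX["[information retrieval]"] & INDEX["Location: San Francisco Bay Area"]
current = base & INDEX["School: CMU"]                 # {1, 2}

# 1) Replace a facet value with the union of it and a similar value.
similar_value = base & (INDEX["School: CMU"] | INDEX["School: MIT"])  # {1, 2, 3}

# 2) Union of the whole query with a single facet value selection.
plus_facet = current | INDEX["Company: Google"]       # {1, 2, 7, 10}

# 3) Union of the whole query with another keyword search.
plus_keywords = current | INDEX["[faceted search]"]   # {1, 2, 5, 6}

for expanded in (similar_value, plus_facet, plus_keywords):
    assert current <= expanded   # an expansion never loses results
    print(sorted(expanded))
```

<p>Every expansion is a superset of the current results, so recall can only go up. What varies is how much precision each one costs.</p>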
<p>As we can see, there are many ways to progressively refine a query in a way that expands the result set. The question is how we provide users with options that increase recall while preserving as much precision as possible.</p>
<p><strong>Frequency : Recall :: Similarity : Precision</strong></p>
<p>Developers of faceted search systems don&#8217;t necessarily invest much thought into deciding which faceted refinement options to present to users. Some systems simply avoid dead ends, offering users all refinement options that lead to a non-empty result set. This approach breaks down when there are too many options, in which case most systems offer users the most frequent facet values. A <a href="http://www.uie.com/events/virtual_seminars/facets/Faceted%20Search%20-%20Chapter%207.pdf">chapter</a> in my <a href="http://thenoisychannel.com/faceted-search-the-book/">faceted search book</a> discusses some other options.</p>
<p>Unfortunately, the number of options for guided exploration (at least if we go beyond the very limited basic options) is too vast to apply such a naive approach. Unions never lead to dead ends, and we don&#8217;t have a simple measure like frequency to rank our options.</p>
<p>Or perhaps we do. A good reason to favor frequent values as faceted refinement options is that they tend to preserve recall. What we need is a measure that tends to preserve precision when we expand a result set.</p>
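<p>As a rough illustration of the frequency-based approach, here is a sketch that offers the most frequent values of a facet within the current results. The <code>top_refinements</code> helper and the sample data are hypothetical:</p>

```python
from collections import Counter


def top_refinements(results, facet, limit=3):
    """Return the most frequent values of a facet within the current results.

    Counting only values that occur in the results avoids dead ends: every
    offered value leads to a non-empty result set.
    """
    counts = Counter(r[facet] for r in results if r.get(facet) is not None)
    return counts.most_common(limit)


# Hypothetical result set.
results = [
    {"company": "Google"}, {"company": "LinkedIn"}, {"company": "Google"},
    {"company": "Microsoft"}, {"company": "Google"}, {"company": "LinkedIn"},
    {"company": "Yahoo"},
]
print(top_refinements(results, "company", limit=2))
# [('Google', 3), ('LinkedIn', 2)]
```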
<p>That measure is set similarity. More specifically, it is the asymmetric similarity between a set and a superset containing it, which we can think of as the former&#8217;s representativeness of the latter. If we are working with facets, we can measure this similarity in terms of differences between distributions of the facet values. If the current set has high precision, we should favor supersets that are similar to it in order to preserve precision.</p>
<p>I&#8217;ll spare readers the math, but I encourage you to read about <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a> and <a href="http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence">Jensen-Shannon divergence</a> if you are not familiar with measures of similarity between probability distributions. I&#8217;m also glossing over key implementation details, such as how to model distributions of facet values as probability distributions, and how to handle smoothing and normalization for set size. I&#8217;ll try to cover these in future posts. But for now, let&#8217;s assume that we can measure the similarity between a set and a superset.</p>
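<p>For readers who prefer code to math, here is one possible sketch of comparing facet value distributions with Jensen-Shannon divergence. The add-alpha smoothing and the sample data are assumptions made purely for illustration; they are one way to fill in the details glossed over above, not a recommendation:</p>

```python
import math
from collections import Counter


def distribution(values, vocabulary, alpha=1.0):
    """Turn facet values into a probability distribution over `vocabulary`.

    Add-alpha smoothing is an assumption made here for illustration.
    """
    counts = Counter(values)
    total = len(values) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (asymmetric)."""
    return sum(p[v] * math.log2(p[v] / q[v]) for v in p if p[v] > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, between 0 and 1 bit."""
    m = {v: 0.5 * (p[v] + q[v]) for v in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical "current company" facet values for a set and two supersets.
current = ["Google", "Google", "LinkedIn"]
similar = current + ["Google", "LinkedIn", "Microsoft"]
different = current + ["Acme", "Acme", "Acme", "Acme", "Acme"]

vocab = sorted(set(similar) | set(different))
p = distribution(current, vocab)
print(js_divergence(p, distribution(similar, vocab)))
print(js_divergence(p, distribution(different, vocab)))
```

<p>The first superset keeps roughly the same mix of companies, so its divergence from the current set is lower than that of the second superset, which is dominated by a company absent from the current set.</p>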
<p><strong>Guided Exploration: A General Framework</strong></p>
<p>We now have the elements to put together a general framework for guided exploration:</p>
<ul>
<li>Generate a set of candidate expansion options from the current search query using operations such as the following:</li>
<ul>
<li>De-select a facet value.</li>
<li>Replace a facet value with its parent.</li>
<li>Replace a facet value with the union of it and other values from that facet.</li>
<li>Remove a search keyword.</li>
<li>Replace a search keyword with the union of it and related keywords.</li>
<li>Replace the entire query with the union of it and a related facet value selection.</li>
<li>Replace the entire query with the union of it and a related keyword search.</li>
</ul>
<li>Evaluate each expansion option based on the similarity of the resulting set to the current one.</li>
<li>Present the most similar sets to the user as expansion options.</li>
</ul>
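<p>The steps above can be sketched end to end as follows. This is a toy under stated assumptions: the <code>COMPANY</code> data, the candidate sets, the add-alpha smoothing, and the choice of Jensen-Shannon divergence over a single facet are all illustrative rather than a definitive implementation:</p>

```python
import math
from collections import Counter

# Hypothetical data: result ID -> value of one facet (current company).
COMPANY = {1: "Google", 2: "Google", 3: "LinkedIn", 4: "Google",
           5: "LinkedIn", 6: "Microsoft", 7: "Acme", 8: "Acme",
           9: "Acme", 10: "Acme"}


def facet_distribution(result_ids, alpha=1.0):
    """Smoothed distribution of facet values (add-alpha is an assumption)."""
    counts = Counter(COMPANY[r] for r in result_ids)
    vocabulary = sorted(set(COMPANY.values()))
    total = len(result_ids) + alpha * len(vocabulary)
    return [(counts[v] + alpha) / total for v in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two aligned distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [0.5 * (x + y) for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_expansions(current, candidates, limit=2):
    """Score each candidate superset and keep the most similar ones."""
    p = facet_distribution(current)
    scored = [
        (js_divergence(p, facet_distribution(ids)), name)
        for name, ids in candidates.items()
        if current < ids                 # keep only strict expansions
    ]
    return [name for _, name in sorted(scored)[:limit]]


current = {1, 2, 3}
# Candidate expansions, as produced by operations like those listed above.
candidates = {
    "School: (CMU OR MIT)": {1, 2, 3, 4, 5, 6},
    "OR Company: Acme": {1, 2, 3, 7, 8, 9, 10},
    "remove keyword": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
}
print(rank_expansions(current, candidates))
```

<p>Lower divergence means the expanded set looks more like the current one, so those expansions are presented first.</p>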
<p><strong>Visualizing Drift</strong></p>
<p>It&#8217;s one thing to tell a user that two sets are distributionally similar based on an information-theoretic measure, and another to communicate that similarity in a language the user can understand. Here is an example of visualizing the similarity between <em>[information retrieval]</em><em> </em>AND<em> School: CMU</em> and <em>[information retrieval]</em><em> </em>AND<em> School: (CMU OR MIT)</em>:</p>
<p><img class="alignnone  wp-image-4047" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="[information retrieval] AND School: CMU" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/01/Information-Retrieval-AND-CMU.png" alt="" width="510" height="226" /></p>
<p style="text-align: center;"><img class="aligncenter" title="down arrow" src="http://upload.wikimedia.org/wikipedia/commons/a/a3/Down_arrow.svg" alt="" width="130" height="121" /></p>
<p><img class="alignnone  wp-image-4048" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="[information retrieval] AND School: (CMU OR MIT)" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/01/Information-Retrieval-AND-CMU-OR-MIT.png" alt="" width="539" height="226" /></p>
<p>As we can see from even this basic visualization, replacing <em>CMU</em> with (<em>CMU</em> OR<em> MIT)</em> increases the number of results by 70% while keeping a similar distribution of current companies &#8212; the notable exception being people who work for their almae matres.</p>
<p><strong>Conclusion</strong></p>
<p>Faceted search offers some of the most convincing evidence in favor of <a href="http://ils.unc.edu/~march/">Gary Marchionini</a>&#8217;s <a href="http://www.asis.org/Bulletin/Jun-06/marchionini.html">advocacy</a> that we &#8220;empower people to explore large-scale information bases but demand that people also take responsibility for this control&#8221;. Guided exploration aims to generalize the value proposition of faceted search by inverting the roles of precision and recall. Given the <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">importance of recall</a>, I hope to see progress in this direction. If this is a topic that interests you, give me a shout. Especially if you&#8217;re a student looking for an <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=2350247">internship</a> this summer!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/17/guided-exploration/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/17/guided-exploration/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Next Play!</title>
		<link>http://thenoisychannel.com/2012/01/01/next-play/</link>
		<comments>http://thenoisychannel.com/2012/01/01/next-play/#comments</comments>
		<pubDate>Sun, 01 Jan 2012 21:30:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4031</guid>
		<description><![CDATA[Every year brings its own adventures, but for me 2011 will be a tough act to follow. A year ago, I&#8217;d just started working at LinkedIn, and my biggest concern was selling our apartment in Brooklyn so that my family could join me in California. Little did I imagine that my new manager, who had just recruited me [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://blog.linkedin.com/2011/11/23/inday-culture-from-across-the-globe/"><img class="alignnone" title="Next Play: LinkedIn's garage band" src="http://blog.linkedin.com/wp-content/uploads/2011/11/pic-13-next-play.jpg" alt="" width="500" height="333" /></a></p>
<p>Every year brings its own adventures, but for me 2011 will be a tough act to follow.</p>
<p>A year ago, I&#8217;d just started <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">working at LinkedIn</a>, and my biggest concern was selling our <a href="http://sites.google.com/site/245henry/">apartment</a> in Brooklyn so that my family could join me in California.</p>
<p><a href="https://sites.google.com/site/245henry/"><img class="alignnone" title="Bye bye, Brooklyn!" src="http://sites.google.com/site/245henry/_/rsrc/1292981937914/home/View.jpg" alt="" width="200" height="150" /></a></p>
<p>Little did I imagine that my new manager, who had just recruited me from Google to LinkedIn (and persuaded my family to change coasts!), would leave three months later for a startup. Welcome to Silicon Valley! At the time, I felt unready for the abrupt transition into the product executive team. In retrospect, I&#8217;m thankful for the kick in the pants that helped me transform my role and brought the best out of a great team.</p>
<p><a href="http://genelu.com/2011/06/beware-of-infauxgraphics/"><img class="alignnone" title="Talent War Infographic (Credit: Gene Lu)" src="http://genelu.com/wp-content/uploads/2011/06/talent-war-infographic-redo.jpg" alt="" width="200" height="205" /></a></p>
<p>Summer brought the excitement of LinkedIn&#8217;s <a href="http://thenoisychannel.com/2011/05/19/going-public/">IPO</a>. The process was exhilarating, especially to someone who had worked for over a decade at a pre-IPO company.</p>
<p><a href="http://blog.linkedin.com/2011/05/19/lnkd-bell-ringing/"><img class="alignnone" title="LinkedIn IPO" src="http://farm6.staticflickr.com/5190/5737441522_2cd62b4e3f_b.jpg" alt="" width="200" height="122" /></a></p>
<p>Nonetheless, we didn&#8217;t let the IPO distract us from our mission. In March, we celebrated our <a href="http://blog.linkedin.com/2011/03/22/linkedin-100-million/">100 millionth member</a>; by November, we passed 135 million. And <a href="http://press.linkedin.com/about">lots more</a>. We released new data products like <a href="http://blog.linkedin.com/2011/02/03/linkedin-skills/">Skills</a> and <a href="http://blog.linkedin.com/2011/10/19/linkedin-alumni/">Alumni</a>. We won the <a href="http://www.oscon.com/oscon2011/public/schedule/detail/21349">OSCON Data Innovation Award</a> for contributions to the open source software for big data. We also acquired a few companies, including search engine startup <a href="http://k9ventures.com/blog/2011/10/11/congratulations-indextank/">IndexTank</a>. In short, we heeded the two short words on the back of our commemorative IPO t-shirts: &#8220;next play&#8221;.</p>
<p><a href="http://www.cmu.edu/homepage/computing/2011/fall/third-times-a-charm.shtml"><img class="alignnone" title="Celebrating the IndexTank acquisition with Diego Basch and Manu Kumar" src="http://www.cmu.edu/homepage/images/2011/third_time_charm_2_201x201.jpg" alt="" width="200" height="200" /></a></p>
<p>Fall was an intense season of conferences. Between <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/">CIKM</a>, <a href="http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/">HCIR</a>, <a href="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/">RecSys</a>, <a href="http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/">Strata</a>, and a talk at <a href="http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/">CMU</a>, it was a great opportunity to connect and reconnect with researchers and practitioners around the world. I am particularly proud of the success of this year&#8217;s HCIR workshop, which showed how much the workshop (now to become a 2-day symposium!) has grown up in five years.</p>
<p><a href="http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/"><img class="alignnone" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/Keeping-It-Professional.png" alt="" width="200" height="150" /></a></p>
<p>But what capped my year off was seeing <a href="http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/">Endeca</a>, the company I helped start in 1999, become one of Oracle&#8217;s largest <a href="http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/">acquisitions</a>. Even though it&#8217;s been two years since I left, Endeca will always be a core facet of my professional identity. I look forward to great things from all the folks I worked with.</p>
<p><a href="http://www.tbkconsult.com/blog/2011/11/02/oracle-acquires-endeca/"><img class="alignnone" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="Oracle acquires Endeca" src="http://www.tbkconsult.com/blog/wp-content/uploads/2011/11/Endeca-to-Oracle.jpg" alt="" width="200" height="120" /></a></p>
<p>That brings us to 2012, ready to start a new year of adventures. Tough or not, our job is to make every new year more amazing than the previous ones. I&#8217;m ready for the challenge, and I hope you are too.</p>
<p>Here&#8217;s a teaser of what I have planned:</p>
<ul>
<li>My team at LinkedIn is launching into 2012 with a strong focus on derived data quality and relevance. As regular readers know, I see data quality and richer interfaces for information seeking as <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/">inseparable concerns</a>. And, speaking of quality, we&#8217;re <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1827722">hiring</a>!</li>
<li>I&#8217;ll be speaking at <a href="http://strataconf.com/strata2012">Strata</a> in a couple of months with <a href="http://www.linkedin.com/in/clairehunsaker">Claire Hunsaker</a> of <a href="http://blog.linkedin.com/2011/10/20/linkedin-samasource/">Samasource</a> about &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22363">Humans, Machines, and the Dimensions of Microwork</a>&#8221;. I&#8217;m very excited to talk about the intersection of crowdsourcing and data science. And I&#8217;ll be joined by three of LinkedIn&#8217;s top data scientists: <a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a>, <a href="http://www.linkedin.com/in/shahsam">Sam Shah</a>, and <a href="http://www.linkedin.com/in/peterskomoroch">Pete Skomoroch</a>.</li>
<li>I&#8217;ll be co-chairing the RecSys Industry Track this fall with <a href="http://research.yahoo.com/Yehuda_Koren">Yehuda Koren</a>. I&#8217;m honored to have the opportunity to work with Yehuda, who was part of the <a href="http://www2.research.att.com/~volinsky/netflix/bpc.html">Netflix Grand Prize team</a> and won <a href="http://labs.yahoo.com/node/639">best paper</a> at RecSys 2011. We&#8217;re still putting together the program, but you can look at <a href="http://recsys.acm.org/2011/industry_track.shtml">last year&#8217;s program</a> to get an idea of what&#8217;s in store.</li>
<li>I&#8217;ll be at the CIKM Industry Event, this time as an invited speaker. CIKM will take place in <a href="http://www.gohawaii.com/maui">Maui</a> this fall, and I&#8217;m excited about the program that <a href="http://www.cs.technion.ac.il/~gabr/">Evgeniy Gabrilovich</a> is putting together for the Industry Event. It will be an all-invited program, just like <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/">last year</a>.</li>
</ul>
<p>I hope you&#8217;re also starting 2012 with a fresh sense of purpose. Let&#8217;s take a last moment to reflect on a great <a href="http://blog.linkedin.com/2011/12/23/linkedin-blog-2011/">2011</a>, and then…NEXT PLAY!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/01/next-play/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>HCIR 2011: Now on YouTube!</title>
		<link>http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/</link>
		<comments>http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/#comments</comments>
		<pubDate>Sat, 17 Dec 2011 16:56:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3983</guid>
		<description><![CDATA[The Fifth Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011), held on October 20th at Google&#8217;s main campus in Mountain View, California, was a resounding success. We had almost a hundred people, presenting a wide array of papers, posters, and challenge entries. You can read my summary of the event in an earlier blog post: &#8220;HCIR 2011: We [...]]]></description>
				<content:encoded><![CDATA[<p>The Fifth Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://hcir.info/hcir-2011">HCIR 2011</a>), held on October 20th at <a href="http://maps.google.com/?q=Google%20Inc.@37.423156,-122.084917&amp;hl=en">Google&#8217;s main campus</a> in Mountain View, California, was a resounding success. We had almost a hundred people, presenting a wide array of <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/schedule/presentations">papers</a>, <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/posters">posters</a>, and <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/challenge">challenge entries</a>. You can read my summary of the event in an earlier blog post: &#8220;<a href="http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/">HCIR 2011: We Have Arrived!</a>&#8221;.</p>
<p>Better yet, you can now, for the first time in the workshop&#8217;s history, watch videos of the presentations. Embedded below are videos of Gary Marchionini&#8217;s keynote address and of the two paper presentation sessions. Thanks again to Google for being such a gracious host &#8212; now online as well as offline!</p>
<h3>Keynote</h3>
<p><iframe src="http://www.youtube.com/embed/jj5Q3FmPVl0" frameborder="0" width="504" height="284"></iframe></p>
<h3>Morning Presentations</h3>
<p><iframe src="http://www.youtube.com/embed/2112ylDx7zs" frameborder="0" width="504" height="284"></iframe></p>
<h3>Afternoon Presentations</h3>
<p><iframe src="http://www.youtube.com/embed/AAgKfvbH7ds" frameborder="0" width="504" height="284"></iframe></p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Jim Adler: The Accidental Chief Privacy Officer</title>
		<link>http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/</link>
		<comments>http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 22:49:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3974</guid>
		<description><![CDATA[Privacy is the third rail of the cloud. On one hand, the ease of sharing information and the power of analytics have produced extraordinary value for consumers, as well as great business models for companies that serve those consumers. On the other hand, people have good reason to worry about the unintended consequences of over-sharing. [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/Q9UqtRvPOVY" frameborder="0" width="504" height="284"></iframe></p>
<p>Privacy is the third rail of the cloud. On one hand, the ease of sharing information and the power of analytics have produced extraordinary value for consumers, as well as great business models for companies that serve those consumers. On the other hand, people have good reason to worry about the unintended consequences of over-sharing.</p>
<p>When I attended the <a href="http://strataconf.com/stratany2011">O&#8217;Reilly Strata New York Conference</a> in September, I had the pleasure of meeting Intelius&#8217;s <a href="http://jimadler.me/">Jim Adler</a> and hearing him talk about being his company&#8217;s &#8220;accidental chief privacy officer&#8221;. <a href="http://www.intelius.com/">Intelius</a>&#8217;s main product is people search &#8212; an area that naturally raises privacy concerns, especially since Intelius aggregates and publishes information about people from databases of public records, eroding a history of &#8220;<a href="http://thenoisychannel.com/2008/05/01/privacy-through-difficulty/">privacy through difficulty</a>&#8221;. Impressed with Jim&#8217;s talk at Strata, I persuaded him to deliver a similar talk at <a href="http://www.youtube.com/linkedintechtalks">LinkedIn</a>, the <a href="http://www.youtube.com/watch?v=Q9UqtRvPOVY">video</a> of which you can find above. You can also find his slides on <a href="http://www.slideshare.net/jim-adler/20111116-linked-in1">SlideShare</a>.</p>
<p>Jim brings nuance to the discussion of privacy &#8212; nuance that discussions of online privacy often lack. For example, he responded to the recent controversy about social networks&#8217; &#8220;real names&#8221; policy with a measured post entitled &#8220;<a href="http://jimadler.me/post/9294501184/nyms-pseudonyms-or-anonyms-all-of-the-above">Nyms, Pseudonyms, or Anonyms? All of the Above</a>&#8221;.</p>
<p>Jim appropriately opened his talk by disclosing a personal example. He shares his name with a more prominent <a href="http://www.jimadler.com/">personal injury lawyer</a> who dominates search results for that name, raising the potential of taint by association. Intelius&#8217;s core technical problem is to cluster inputs from the sources it aggregates, thus mapping each person to exactly one record in its database.</p>
<p>Jim went on to note that we are at a stage in the privacy debate where we are likely to see more regulation. He makes a few key observations:</p>
<ul>
<li>Social norms, which form the basis of our laws and regulations (the notion of a &#8220;reasonable expectation of privacy&#8221;), have changed suddenly, leading to a &#8220;privacy vertigo&#8221; in which the whole world now feels like a small town.</li>
<li>Sharing is a gateway from private to public, which often leads to violation of expectations. This problem is not new, but the efficiency of online sharing dramatically amplifies the unintended consequences of sharing. It is crucial that the parties involved in sharing data also have shared expectations around how that data will be used or disclosed.</li>
<li>We need to distinguish between data use and data access, and not to try to regulate data use with data access regulations. He cites the <a href="http://en.wikipedia.org/wiki/Fair_Credit_Reporting_Act">Fair Credit Reporting Act</a> as one of the most inspired laws of the last 40 years to regulate data use. If you don&#8217;t have time to listen to the whole talk, I recommend you jump to <a href="http://www.youtube.com/watch?v=Q9UqtRvPOVY#t=25m12s">25:12</a>, where he discusses this law in detail.</li>
</ul>
<p>There&#8217;s a lot more in the talk, so I&#8217;m not going to try to summarize it all here. I strongly encourage you to check out the video (which includes lengthy <a href="http://www.youtube.com/watch?v=Q9UqtRvPOVY#t=53m00s">Q&amp;A</a>) and the slides. Better yet, let&#8217;s use the comments to discuss!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Slides and Summaries</title>
		<link>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/</link>
		<comments>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 06:23:39 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3967</guid>
		<description><![CDATA[I&#8217;ve posted slides and summaries for all ten CIKM 2011 Industry Event presentations: Stephen Robertson (Microsoft Research): Why Recall Matters John Giannandrea (Google): Freebase &#8211; A Rosetta Stone for Entities Jeff Hammerbacher (Cloudera): Experiences Evolving a New Analytical Platform: What Works and What&#8217;s Missing Khalid Al-Kofahi (Thomson Reuters): Combining Advanced Technology and Human Expertise in [...]]]></description>
				<content:encoded><![CDATA[<p><a><img class="alignnone" title="CIKM 2011 Industry Event" src="http://webdam.inria.fr/PIKM2011/cikm2011logo.jpg" alt="" width="466" height="74" /></a></p>
<p>I&#8217;ve posted slides and summaries for all ten CIKM 2011 Industry Event presentations:</p>
<ul>
<li>Stephen Robertson (Microsoft Research): <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">Why Recall Matters</a></li>
<li>John Giannandrea (Google): <a href="http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/">Freebase &#8211; A Rosetta Stone for Entities</a></li>
<li>Jeff Hammerbacher (Cloudera): <a href="http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/">Experiences Evolving a New Analytical Platform: What Works and What&#8217;s Missing</a></li>
<li>Khalid Al-Kofahi (Thomson Reuters): <a href="http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/">Combining Advanced Technology and Human Expertise in Legal Research</a></li>
<li>Chavdar Botev (LinkedIn): <a href="http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/">Databus: A System for Timeline-Consistent Low-Latency Change Capture</a></li>
<li>Ben Greene (SAP): <a href="http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/">Large Memory Computers for In-Memory Enterprise Applications</a></li>
<li>David Hawking (Funnelback): <a href="http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/">Search Problems and Solutions in Higher Education</a></li>
<li>Ed Chi (Google): <a href="http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/">Model-Driven Research in Social Computing</a></li>
<li>Vanja Josifovski (Yahoo! Research): <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/">Toward Deep Understanding of User Behavior on the Web</a></li>
<li>Ilya Segalovich (Yandex): <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/">Improving Search Quality at Yandex: Current Challenges and Solutions</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Ilya Segalovich on Improving Search Quality at Yandex</title>
		<link>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/</link>
		<comments>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 06:10:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3963</guid>
		<description><![CDATA[This post is the last in a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The final talk of the CIKM 2011 Industry Event came from Yandex co-founder and CTO Ilya Segalovich, on &#8220;Improving Search Quality at Yandex: Current Challenges and Solutions&#8221;. Yandex is the world&#8217;s #5 search engine. It dominates [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10357517?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is the last in a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The final talk of the CIKM 2011 Industry Event came from <a href="http://www.yandex.com/">Yandex</a> co-founder and CTO <a href="http://company.yandex.com/corporate_governance/board_of_directors/ilya_segalovich.xml">Ilya Segalovich</a>, on &#8220;<a href="http://www.cikm2011.org/industryevent#is">Improving Search Quality at Yandex: Current Challenges and Solutions</a>&#8221;.</p>
<p>Yandex is the world&#8217;s #5 search engine. It dominates the Russian search market, where it has over 64% market share. Ilya focused on three challenges facing Yandex: result diversification, recency-specific ranking, and cross-lingual search.</p>
<p>For result diversification, Ilya focused on queries containing entities without any additional indicators of intent. He asserted that entities offer a strong but incomplete signal of query intent, and in particular that entities often call for suggested query reformulations. The first step in processing such a query is entity categorization. Ilya said that Yandex achieved almost 90% precision using machine learning, and over 95% precision by incorporating manually tuned heuristics. The second step is enumerating possible search intents for the identified category in order to optimize for intent-aware <a href="http://www.isi.edu/~metzler/papers/metzler-cikm09.pdf">expected reciprocal rank</a>. By diversifying entity queries, Yandex reduced abandonment on popular queries, increased click-through rates, and was able to highlight possible intents in result snippets.</p>
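<p>To make the metric concrete, here is a minimal sketch of intent-aware expected reciprocal rank as it is usually defined in the literature: compute ERR separately for each intent, then average weighted by the probability of that intent. This is not Yandex&#8217;s implementation. The 0-to-4 grade scale, the intent probabilities, and the &#8220;jaguar&#8221; example are all assumptions made for illustration.</p>

```python
def expected_reciprocal_rank(grades, max_grade=4):
    """ERR for one ranked list of graded relevance labels (0..max_grade)."""
    err = 0.0
    p_not_satisfied = 1.0  # chance the user reaches this rank still unsatisfied
    for rank, grade in enumerate(grades, start=1):
        # Standard mapping from a relevance grade to a satisfaction probability.
        p_satisfy = (2 ** grade - 1) / (2 ** max_grade)
        err += p_not_satisfied * p_satisfy / rank
        p_not_satisfied *= 1.0 - p_satisfy
    return err


def intent_aware_err(grades_by_intent, intent_probs, max_grade=4):
    """Average the per-intent ERR, weighted by each intent's probability."""
    return sum(
        intent_probs[intent] * expected_reciprocal_rank(grades, max_grade)
        for intent, grades in grades_by_intent.items()
    )


# Hypothetical query "jaguar" with two assumed intents.
intent_probs = {"car": 0.6, "animal": 0.4}

# Each list grades the same three result slots from one intent's viewpoint.
clustered = {"car": [4, 4, 0], "animal": [0, 0, 4]}    # car, car, animal
diversified = {"car": [4, 0, 4], "animal": [0, 4, 0]}  # car, animal, car

print(intent_aware_err(clustered, intent_probs))
print(intent_aware_err(diversified, intent_probs))
```

<p>Under these assumed numbers the diversified ordering scores higher, because moving the animal result up to the second slot helps the minority intent more than it costs the majority one.</p>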
<p>Ilya then talked about the problem of balancing recency and relevance in handling queries about current events. He sees recency ranking as a diversification problem, since a desire for recent content is a kind of query intent. A challenge in managing recency-specific ranking is predicting the recency sensitivity of the user for a given query. Yandex considers factors such as the fraction of results found that are at most 3 days old, the number of news results, spikes in the query stream, lexical cues (e.g., searches for &#8220;explosion&#8221; or &#8220;fire&#8221;), and Twitter trending topics. He also referred to a WWW 2006 paper he co-authored on <a href="http://www2006.org/programme/files/pdf/p71.pdf">extracting news-related queries from web query logs</a>. The results of these efforts led to measurable improvements in click-based metrics of user happiness.</p>
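<p>As a rough illustration of what a few of these signals might look like as features, here is a small sketch. The function names, the feature dictionary, and the cue list (limited to the two words mentioned in the talk) are hypothetical. The talk did not describe how Yandex combines the signals, so this stops at feature extraction.</p>

```python
from datetime import datetime, timedelta

# Hypothetical cue list; only the two examples mentioned in the talk.
RECENCY_CUES = {"explosion", "fire"}


def fresh_fraction(result_dates, now, max_age_days=3):
    """Fraction of retrieved results published within the last max_age_days."""
    if not result_dates:
        return 0.0
    cutoff = now - timedelta(days=max_age_days)
    fresh = sum(1 for published in result_dates if published >= cutoff)
    return fresh / len(result_dates)


def recency_features(query, result_dates, news_result_count, now):
    """Bundle a few recency signals for one query into a feature dict."""
    tokens = query.lower().split()
    return {
        "fresh_fraction": fresh_fraction(result_dates, now),
        "news_results": news_result_count,
        "has_lexical_cue": any(token in RECENCY_CUES for token in tokens),
    }


now = datetime(2011, 11, 28)
dates = [now - timedelta(days=age) for age in (1, 2, 10, 400)]
print(recency_features("fire in moscow", dates, news_result_count=7, now=now))
```

<p>A downstream model would take features like these as input. Spikes in the query stream and trending topics are left out because they need historical data that this sketch does not model.</p>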
<p>Ilya talked about a variety of efforts to support cross-lingual search. Russian users enter a significant fraction (about 15%) of non-Russian queries, but many still prefer Russian-language results. For example, a search for a company name returns that company&#8217;s Russian-language home page if one is available. Yandex implements language personalization by learning a user&#8217;s language knowledge and using it as a factor in relevance computation. Yandex also uses machine translation to serve results for Russian-language queries when there are no relevant Russian-language results.</p>
<p>Ilya concluded by pitching the efforts that Yandex is making to participate in and support the broader information retrieval community, including running (and releasing data for) a <a href="http://imat-relpred.yandex.ru/en">relevance prediction challenge</a>. It&#8217;s great to see a reminder that there is more to web search than Google vs. Bing, and refreshing to see how much Yandex shares its methodology and results with the IR community.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Vanja Josifovski on Toward Deep Understanding of User Behavior on the Web</title>
		<link>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/</link>
		<comments>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/#comments</comments>
		<pubDate>Sun, 27 Nov 2011 21:14:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3959</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. Those of you who attended the SIGIR 2009 Industry Track had the opportunity to hear Yahoo researcher Vanja Josifovski make an eloquent case for ad retrieval as a new frontier of information retrieval. At the CIKM 2011 Industry Event, [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10353828?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Those of you who attended the <a href="http://sigir2009.org/Program/industry">SIGIR 2009 Industry Track</a> had the opportunity to hear Yahoo researcher <a href="http://research.yahoo.com/Vanja_Josifovski">Vanja Josifovski</a> make an eloquent case for <a href="http://thenoisychannel.com/2009/07/31/sigir-2009-day-3-industry-track-vanja-josifovski/">ad retrieval as a new frontier of information retrieval</a>. At the CIKM 2011 Industry Event, Vanja delivered an equally compelling presentation entitled &#8220;<a href="http://www.cikm2011.org/industryevent#vj">Toward Deep Understanding of User Behavior: A Biased View of a Practitioner</a>&#8221;.</p>
<p>Vanja first offered a vision in which the web of the future will be your life partner, delivering a lifelong, pervasive, personalized experience. Everything will be personalized, and that personalization will span your entire online life &#8212; from your laptop to your web-enabled toaster.</p>
<p>He then brought us back to the state of personalization today. For search personalization, the low <a href="http://en.wikipedia.org/wiki/Entropy_(information_theory)">entropy</a> of query intent makes it difficult &#8212; or too risky &#8212; to significantly outperform the baseline of non-personalized search. In his view, the action today is in content recommendation and ad targeting, where there is high entropy of intent and lots of room for improvement over today&#8217;s crude techniques.</p>
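<p>A quick way to see the entropy argument is to compute the Shannon entropy of two made-up intent distributions. The probabilities below are invented for illustration and do not come from the talk. One is a search query where a single intent dominates, and the other is a recommendation setting where eight intents are equally likely.</p>

```python
import math


def intent_entropy(intent_probs):
    """Shannon entropy, in bits, of a probability distribution over intents."""
    return -sum(p * math.log2(p) for p in intent_probs if p > 0)


# Assumed distributions, chosen only to contrast the two regimes.
search_query = [0.9, 0.05, 0.05]  # one intent dominates
recommendation = [1 / 8] * 8      # eight equally likely intents

print(intent_entropy(search_query))
print(intent_entropy(recommendation))
```

<p>The skewed distribution carries well under one bit of uncertainty, so a non-personalized ranker that serves the dominant intent is already right most of the time. The uniform one carries three bits, which is where knowing something about the user has room to help.</p>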
<p>How do we achieve these improvements? We need more data, larger scale, and better methods for reasoning about data. In particular, Vanja noted the data we have today &#8212; searches, page views, connections, messages, purchases &#8212; represents the user&#8217;s thin observable state. In contrast, we lack data about the user&#8217;s internal state, e.g., is the user jet-lagged or worried about government debt. Vanja said that the only way to get more data is to motivate users by creating value for them with it &#8212; i.e., <a href="http://thenoisychannel.com/2011/04/14/social-utility-25/">social is give to get</a>.</p>
<p>Of course, we can&#8217;t talk about users&#8217; hidden data without thinking about privacy. Vanja asserts that privacy is not dead, but that it&#8217;s in hibernation. So far, he argued, we&#8217;ve managed with a model of industry self-governance with relatively minor impact from data leaks &#8212; specifically as compared to the offline world. But he is apprehensive at the prospect of a major privacy breach inducing legislation that sets back personalization efforts for decades.</p>
<p>Vanja then talked about current personalization methods, including <a href="http://en.wikipedia.org/wiki/Machine_learning">learning</a> relationships among features, <a href="http://en.wikipedia.org/wiki/Dimensionality_reduction">dimensionality reduction</a>, and <a href="http://en.wikipedia.org/wiki/Smoothing">smoothing</a> using external data. He argues that many of the models are mathematically very similar to one another, and it is difficult to analyze the relative merits of the models as opposed to other implementation details of the systems that use them.</p>
<p>Finally, Vanja touched on scale issues. He noted that the <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> framework imposes significant restrictions on algorithms used for personalization, and that we need the right abstractions for modeling in parallel environments.</p>
<p>Vanja concluded his talk by citing the role of CIKM as a conference in bringing together the communities that research deep user understanding, information retrieval, and databases. Given the exciting <a href="http://www.gohawaii.com/maui">venue</a> for next year&#8217;s conference, I&#8217;m sure we&#8217;ll continue to see CIKM play this role!</p>
<p>ps. My thanks to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging his <a href="http://www.searchenginecaffe.com/2011/10/cikm-2011-industry-toward-deep.html">notes</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Ed Chi on Model-Driven Research in Social Computing</title>
		<link>http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/</link>
		<comments>http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/#comments</comments>
		<pubDate>Fri, 25 Nov 2011 21:23:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3951</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. Given the extraordinary ascent of all things social in today&#8217;s online world, we could hardly neglect this theme at the CIKM 2011 Industry Event. We were lucky to have Ed Chi, who recently left the PARC [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10164910" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Given the extraordinary ascent of all things social in today&#8217;s online world, we could hardly neglect this theme at the CIKM 2011 Industry Event. We were lucky to have <a href="http://www-users.cs.umn.edu/~echi/">Ed Chi</a>, who recently left the <a href="http://www.parc.com/">PARC</a> Augmented Social Cognition Group to work on Google+, presenting &#8220;<a href="http://www.cikm2011.org/industryevent#ec">Model-Driven Research in Social Computing</a>&#8221;.</p>
<p>Ed warned us at the beginning of the talk that his focus would be on work he&#8217;d done prior to joining Google. Nonetheless, he offered an interesting collection of public statistics about social activity associated with Google properties: 360M words per day being published on Blogger, 150 years of YouTube video being watched every day on Facebook, and 40M+ people using Google+. Regardless of how Google has fared in the competition for social networking mindshare, Google is clearly no stranger to online social behavior.</p>
<p>Ed then dove into recent research that he and colleagues have done on Twitter activity. Since all of the papers he discussed are available online, I will only touch on highlights. I encourage you to read the full papers:</p>
<ul>
<li><a href="http://www-users.cs.umn.edu/~echi/papers/2010-socialcom/2010-06-25-retweetability-cameraready-v3.pdf">Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network</a></li>
<li><a href="http://www.parc.com/content/attachments/tweets-from-justin.pdf">Tweets from Justin Bieber&#8217;s Heart: the Dynamics of the &#8220;Location&#8221; Field in User Profiles</a></li>
<li><a href="http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2813/3225">Is Twitter a Good Place for Asking Questions? A Characterization Study</a></li>
<li><a href="http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2856/3250">Language Matters in Twitter: A Large Scale Study</a></li>
<li><a href="http://www-users.cs.umn.edu/~echi/papers/2010-UIST/eddi-uist2010.pdf">Eddi: Interactive Topic-based Browsing of Social Status Streams</a></li>
<li><a href="http://www-users.cs.umn.edu/~echi/papers/2010-CHI/Zerozero88-tweet-recommender-ASC-PARC.pdf">Short and Tweet: Experiments on Recommending Content from Information Streams</a></li>
<li><a href="http://www.grouplens.org/system/files/p217-chen.pdf">Speak Little and Well: Recommending Conversations in Online Social Streams</a></li>
</ul>
<p>Ed talked at some length about language-dependent behavior on Twitter. For example, tweets in French are more likely to contain URLs than those in English, while tweets in Japanese are less likely (perhaps because the language is more compact relative to Twitter&#8217;s 140-character limit?). Tweets in Korean are far more likely to be conversational (i.e., explicitly mentioning or replying to other users) than those in English. These differences remind us to be cautious in generalizing our understanding of online social behavior from the behavior of English-speaking users. Ed also talked about cross-language &#8220;brokers&#8221; who tweet in multiple languages: he sees these as indicating connection strength between languages, as well as giving us insight to improve cross-language communication.</p>
<p>Ed then talked about ways to reduce information overload in social streams. These included <a href="http://www-users.cs.umn.edu/~echi/papers/2010-UIST/eddi-uist2010.pdf">Eddi</a>, a tool for summarizing social streams, and <a href="https://twitter.com/#!/zerozero88">zerozero88</a>, a closed experiment to produce a personal newspaper from a tweet stream. In analyzing the results of the zerozero88 experiment, Ed and his colleagues found that the most successful recommendation strategy combined users&#8217; self-voting with social voting by their friends of friends. They also found that users wanted both relevance and serendipity &#8212; a challenge since the two criteria often compete with one another.</p>
<p>Ed concluded by offering the following design rule: since interaction costs determine the number of people who participate in social activity, get more people into the system by reducing interaction costs. He asserted that this is a key design principle for Google+.</p>
<p>My skepticism about Google&#8217;s social efforts is a matter of public record (cf. <a href="http://thenoisychannel.com/2011/04/14/social-utility-25/">Social Utility, +/- 25%</a>; <a href="http://thenoisychannel.com/2011/07/04/google%C2%B1/">Google±?</a>). But hiring Ed Chi was a real coup for Google, and I&#8217;m optimistic about what he&#8217;ll bring to the Google+ effort.</p>
<p>P.S. My thanks to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging his <a href="http://www.searchenginecaffe.com/2011/10/cikm-2011-industry-model-driven.html">notes</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: David Hawking on Search Problems and Solutions in Higher Education</title>
		<link>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/</link>
		<comments>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 01:45:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3946</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. One of the recurring themes at the CIKM 2011 Industry Event was that not all search is web search. Stephen Robertson, in advocating why recall matters, noted that web search was exceptional rather than typical as [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10280109?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>One of the recurring themes at the CIKM 2011 Industry Event was that not all search is web search. Stephen Robertson, in advocating <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">why recall matters</a>, noted that web search was exceptional rather than typical as an information retrieval domain. Khalid Al-Kofahi spoke about the challenges of <a href="http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/">legal search</a>. Focusing on a different vertical, <a href="http://www.funnelback.com/">Funnelback</a> Chief Scientist <a href="http://david-hawking.net/">David Hawking</a> spoke about &#8220;<a href="http://www.cikm2011.org/industryevent#dh">Search Problems and Solutions in Higher Education</a>&#8221;.</p>
<p>David spent most of the presentation focusing on work that Funnelback did for the <a href="http://www.anu.edu.au/">Australian National University</a>. Funnelback was originally developed by <a href="http://www.csiro.au/">CSIRO</a> and the ANU under the name <a href="http://homepages.cwi.nl/~arjen/wird04/presentations/irdbxml.pdf">Panoptic</a>.</p>
<p>The ANU has a substantial web presence, comprising hundreds of sites and over a million pages. Like many large sites, it suffers from propagation delay: the most important pages are fresh, but material on the outposts can be stale. Moreover, there is broad diversity of authorship.</p>
<p>The university also has a strong editorial stance for ranking search results: the search engine needs to identify and favor official content. Given the proliferation of unofficial content, it can be a challenge to identify official sites based on signals like incoming link count, click counts, and the use of official style templates.</p>
<p>David described a particular application that Funnelback developed for ANU: a university course finder. The problem is similar to that of ecommerce search and calls for similar solutions, e.g., <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, auto-complete, and suggestions of related queries. And, just as in ecommerce, we can evaluate performance in terms of <a href="http://en.wikipedia.org/wiki/Conversion_rate">conversion rate</a>.</p>
<p>David ended his talk by touching on expertise finding (a problem I think about a lot as a <a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">LinkedIn data scientist</a>!) and showing demos. And, while I no longer work in <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise search</a> myself, I still appreciate its unique challenges. I&#8217;m glad that David and his colleagues are working to overcome those challenges, especially in a domain as important as education.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Ben Greene on Large Memory Computers for In-Memory Enterprise Applications</title>
		<link>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/</link>
		<comments>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 17:39:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3941</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. Large-scale computation was, not surprisingly, a major theme at the CIKM 2011 Industry Event. Ben Greene, Director of SAP Research Belfast, delivered a presentation on &#8220;Large Memory Computers for In-Memory Enterprise Applications&#8221;. Ben started by defining in-memory [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10274812?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Large-scale computation was, not surprisingly, a major theme at the CIKM 2011 Industry Event. <a href="http://www.linkedin.com/pub/ben-greene/1/93/785">Ben Greene</a>, Director of SAP Research Belfast, delivered a presentation on &#8220;<a href="http://www.cikm2011.org/industryevent#bg">Large Memory Computers for In-Memory Enterprise Applications</a>&#8221;.</p>
<p>Ben started by defining in-memory computing as &#8220;technology that allows the processing of massive quantities of real time data in the main memory of the server to provide immediate results from analyses and transactions&#8221;. He then asked whether the cloud enables real-time computing, since there is a clear market hunger for cloud computing to solve the problems of our current enterprise systems.</p>
<p>Not surprisingly, he advocated in-memory computing as the solution for those problems. Like <a href="http://www.stanford.edu/~ouster/">John Ousterhout</a> and the <a href="https://ramcloud.stanford.edu/">RAMCloud</a> team, he sees the need to scale <a href="http://en.wikipedia.org/wiki/Dynamic_random-access_memory">DRAM</a> independently from physical boxes. He proposed a model of coherent shared memory, using high-speed low-latency networks and moving the data transport and cache layers into a separate tier below the operating system. The goal: no server-side application caches, DRAM-like latency for physically distributed databases, and in fact no separation between the application server and the database server.</p>
<p>Ben argued that coherent shared memory can dramatically lower the cost of in-memory computing while minimizing the pain for application developers. He also offered some benchmarks for SAP&#8217;s <a href="http://ra.ziti.uni-heidelberg.de/coeht/pages/events/20110208/presentations/ike.pdf">BigIron</a> system to demonstrate the performance improvements.</p>
<p>In short, Ben offered a vision of in-memory computing as a reincarnation of the mainframe. It was an interesting and provocative presentation, and my only regret is that we couldn&#8217;t stage a debate between him and <a href="http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/">Jeff Hammerbacher</a> over the future of large-scale enterprise computing.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Chavdar Botev on Databus: A System for Timeline-Consistent Low-Latency Change Capture</title>
		<link>http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/</link>
		<comments>http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 03:23:12 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3934</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. I&#8217;m of course delighted that one of my colleagues at LinkedIn was able to participate in the CIKM 2011 Industry Event. Principal software engineer Chavdar Botev delivered a presentation on &#8220;Databus: A System for Timeline-Consistent Low-Latency Change Capture&#8221;. [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10244215?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>I&#8217;m of course delighted that one of my colleagues at LinkedIn was able to participate in the CIKM 2011 Industry Event. Principal software engineer <a href="http://www.linkedin.com/in/chavdarbotev">Chavdar Botev</a> delivered a presentation on &#8220;<a href="http://www.cikm2011.org/industryevent#cb">Databus: A System for Timeline-Consistent Low-Latency Change Capture</a>&#8221;.</p>
<p>LinkedIn processes a massive amount of member data and activity. It has over 135M members and is growing faster than two new members per second. Based on recent measurements, those members are on track to perform more than four billion searches on the LinkedIn platform in 2011. All of this activity requires a data change capture mechanism that allows external systems, such as its graph index and real-time full-text search index <a href="http://javasoze.github.com/zoie/">Zoie</a>, to act as subscribers in user space and stay up to date with constantly changing data in the primary stores.</p>
<p>LinkedIn has built the Databus system to meet these needs. Databus meets four key requirements: timeline consistency, guaranteed delivery, low latency, and user-space visibility. For example, edits to member profile fields, such as companies and job titles, need to be <a href="http://www.cs.umass.edu/~ronb/papers/kdd2011.pdf">standardized</a>. Also, in order to let recruiters act quickly on feedback to their job postings, we need to be able to propagate the changes to the job description in near-real-time.</p>
<p>Databus propagates data changes throughout LinkedIn&#8217;s architecture. When there is a change in a primary store (e.g., member profiles or connections), the changes are buffered in the Databus Relay through a push or pull interface. The relay can also capture the transactional semantics of updates. Clients poll for changes in the relay. If a client falls behind the stream of change events in the relay, it is redirected to a Bootstrap database that delivers a compressed delta of the changes since the last event seen by the client.</p>
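<p>The relay-and-bootstrap flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not LinkedIn&#8217;s implementation: the class names, the <code>poll</code> helper, the integer sequence numbers, and the reading of &#8220;compressed delta&#8221; as &#8220;latest value per key&#8221; are all hypothetical.</p>

```python
from collections import deque


class Relay:
    """Hypothetical bounded in-memory buffer of change events, oldest first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest events are evicted first
        self.next_scn = 1                     # made-up increasing sequence number

    def publish(self, key, value):
        event = (self.next_scn, key, value)
        self.buffer.append(event)
        self.next_scn += 1
        return event

    def events_since(self, last_scn):
        """Return events after last_scn, or None if the buffer no longer covers them."""
        oldest_scn = self.buffer[0][0] if self.buffer else self.next_scn
        if last_scn + 1 >= oldest_scn:
            return [e for e in self.buffer if e[0] > last_scn]
        return None  # the client has fallen behind the relay


class Bootstrap:
    """Hypothetical long-term store that keeps every event and serves compacted deltas."""

    def __init__(self):
        self.log = []

    def record(self, event):
        self.log.append(event)

    def delta_since(self, last_scn):
        latest = {}
        for scn, key, value in self.log:
            if scn > last_scn:
                latest[key] = (scn, key, value)  # later events overwrite earlier ones per key
        return sorted(latest.values())


def poll(relay, bootstrap, last_scn):
    """Client-side poll: try the relay first, fall back to the bootstrap store."""
    events = relay.events_since(last_scn)
    if events is None:
        events = bootstrap.delta_since(last_scn)
    return events


relay, bootstrap = Relay(capacity=2), Bootstrap()
changes = [("alice", "Engineer"), ("bob", "Analyst"), ("alice", "Manager"), ("carol", "Recruiter")]
for key, value in changes:
    bootstrap.record(relay.publish(key, value))

print(poll(relay, bootstrap, last_scn=3))  # recent client: served from the relay buffer
print(poll(relay, bootstrap, last_scn=0))  # stale client: served a compacted delta
```

<p>In this toy version the stale client receives one event per key rather than the full history, which is the point of routing slow consumers to a separate store instead of growing the relay buffer.</p>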
<p>In contrast to generic message systems (including the <a href="http://incubator.apache.org/kafka/index.html">Kafka</a> system that LinkedIn has open-sourced through Apache), Databus has more insight into the structure of the messages and can thus do better than just guaranteeing message-level integrity and transactional semantics for communication sessions.</p>
<p>I tend to live a few levels above core infrastructure, but I&#8217;m grateful that Chavdar and his colleagues build the core platform that makes all of our large-scale data collection possible. After all, without data we have no <a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">data science</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Khalid Al-Kofahi on Combining Advanced Search Technology and Human Expertise in Legal Research</title>
		<link>http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/</link>
		<comments>http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/#comments</comments>
		<pubDate>Sat, 19 Nov 2011 23:01:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3930</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The original program for the CIKM 2011 Industry Event featured Peter Jackson, who was chief scientist at Thomson Reuters and author of numerous books and papers on natural language processing. Sadly, Peter died on August 3, 2011. Thomson Reuters R&#38;D VP of Research Khalid [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10236335?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The original program for the CIKM 2011 Industry Event featured <a href="http://www.jacksonpeter.com/">Peter Jackson</a>, who was chief scientist at <a type="1" href="http://thomsonreuters.com" target="_blank">Thomson Reuters</a> and author of numerous books and papers on <a href="http://www.amazon.com/Natural-Language-Processing-Online-Applications/dp/9027249938/">natural language processing</a>. Sadly, Peter <a href="http://blog.thomsonreuters.com/index.php/remembering-our-colleague-peter-jackson/">died on August 3, 2011</a>. Thomson Reuters R&amp;D VP of Research <a href="http://www.linkedin.com/pub/khalid-al-kofahi/0/5a3/b7b">Khalid Al-Kofahi</a> graciously agreed to speak in his place, delivering a presentation on &#8220;<a href="http://www.cikm2011.org/industryevent#kak">Combining Advanced Search Technology and Human Expertise in Legal Research</a>&#8221;.</p>
<p>Khalid began by giving an &#8220;83-second&#8221; overview of the US legal system, laying out the roles of the law, the courts, and the legislature. He did so to provide the context for the domain that Thomson Reuters serves: legal information. Legal information providers curate legal information, enhance it editorially and algorithmically, and work to make it findable and explainable in particular task contexts. He then worked through an example of how a case law document (specifically, <em><a href="http://en.wikipedia.org/wiki/Burger_King_v._Rudzewicz">Burger King v. Rudzewicz</a></em>) appears in <a href="http://store.westlaw.com/westlawnext/">WestlawNext</a>, with annotations that include headnotes, topic codes, citation data, and historical context.</p>
<p>Channelling <a href="http://www.slideshare.net/dtunkelang/google-tech-talk-reconsidering-relevance-presentation/13">William Goffman</a>, Khalid asserted that a document&#8217;s content (words, phrases, metadata) is not sufficient to determine its aboutness and importance. Rather, we also have to consider what other people say about the document and how they interact with it. This is especially true in the legal domain because of the <a href="http://en.wikipedia.org/wiki/Precedent">precedential</a> nature of law. He then framed legal search in terms of information retrieval metrics, stating the requirements as completeness (<a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a>), accuracy (<a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a>), and authority. Not surprisingly, Khalid agreed with <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">Stephen Robertson&#8217;s emphasis on the importance of recall</a>.</p>
<p>Speaking more generally, Khalid noted that <a href="http://en.wikipedia.org/wiki/Vertical_search">vertical search</a> is not just about search. Rather, it&#8217;s about findability, which includes navigation, recommendations, clustering, <a href="http://en.wikipedia.org/wiki/Faceted_classification">faceted classification</a>, collaboration, etc. Most importantly, it&#8217;s about satisfying a set of well-understood tasks. And, particularly in the legal domain, customers demand explainable models. Beyond this demand, explainability serves an additional purpose: it enables the human searcher to add value to the process (cf. <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a>).</p>
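<p>For readers who want the two metrics spelled out, here is a small Python sketch of the standard set-based definitions. The case identifiers are made up for illustration, and nothing here reflects how Thomson Reuters actually evaluates its systems.</p>

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved.intersection(relevant))
    # Precision: what fraction of the returned documents are relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents were returned?
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments for one legal query.
retrieved = ["case_a", "case_b", "case_c", "case_d"]
relevant = ["case_a", "case_c", "case_e"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

<p>Here two of the four returned cases are relevant, but one relevant case is missed entirely, which is exactly the kind of gap that matters when a missed precedent can change the outcome of the research.</p>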
<p>It is sad to lose a great researcher like Peter Jackson from our ranks, but I am grateful that Khalid was able to honor his memory by presenting their joint work at CIKM. If you&#8217;d like to learn more, I encourage you to read the publications on the <a href="http://labs.thomsonreuters.com/">Thomson Reuters Labs</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Jeff Hammerbacher on Experiences Evolving a New Analytical Platform</title>
		<link>http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/</link>
		<comments>http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 16:51:04 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3925</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The third speaker in the program was Cloudera co-founder and Chief Scientist Jeff Hammerbacher. Jeff, recently hailed by Tim O&#8217;Reilly as one of the world&#8217;s most powerful data scientists, built the Facebook Data Team, which is most [...]]]></description>
				<content:encoded><![CDATA[<p><object id="__sse10188294" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20111027cikm-111116104111-phpapp02&amp;rel=0&amp;stripped_title=jeff-10188294&amp;userName=dtunkelang" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse10188294" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20111027cikm-111116104111-phpapp02&amp;rel=0&amp;stripped_title=jeff-10188294&amp;userName=dtunkelang" allowFullScreen="true" allowScriptAccess="always" wmode="transparent" allowscriptaccess="always" allowfullscreen="true" /> </object></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The third speaker in the program was <a href="http://www.cloudera.com/">Cloudera</a> co-founder and Chief Scientist <a href="http://www.linkedin.com/in/jhammerb">Jeff Hammerbacher</a>. Jeff, recently hailed by <a href="http://tim.oreilly.com/">Tim O&#8217;Reilly</a> as one of the <a href="http://www.forbes.com/pictures/lmm45emkh/2-jeff-hammerbacher-chief-scientist-cloudera-and-dj-patil-entrepreneur-in-residence-greylock-ventures">world&#8217;s most powerful data scientists</a>, built the <a href="http://www.facebook.com/data">Facebook Data Team</a>, which is best known for open-source contributions that include <a href="http://en.wikipedia.org/wiki/Apache_Hive">Hive</a> and <a href="http://en.wikipedia.org/wiki/Apache_Cassandra">Cassandra</a>. Jeff&#8217;s talk was entitled &#8220;<a href="http://www.cikm2011.org/industryevent#jh">Experiences Evolving a New Analytical Platform: What Works and What&#8217;s Missing</a>&#8221;. I am thankful to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging a <a href="http://www.searchenginecaffe.com/2011/10/cikm-industry-talk-jeff-hammerbacher-on.html">summary</a>.</p>
<p>Jeff&#8217;s talk was a whirlwind tour through the philosophy and technology for delivering large-scale analytics (aka &#8220;big data&#8221;) to the world:</p>
<p>1) Philosophy</p>
<p>The true challenges in the task of data mining are creating a data set with the relevant and accurate information and determining the appropriate analysis techniques. While in the past it made sense to plan data storage and structure around the intended use of the data, the economics of storage and the availability of open-source analytics platforms argue for the reverse: data first, ask questions later; store first, establish structure later. The goal is to enable everyone &#8212; developers, analysts, business users &#8212; to &#8220;party on the data&#8221;, providing infrastructure that keeps them from clobbering one another or starving each other of resources.</p>
<p>2) Defining the Platform</p>
<p>No one just uses a relational database anymore. For example, consider <a href="http://en.wikipedia.org/wiki/Microsoft_SQL_Server">Microsoft SQL Server</a>. It is actually part of a unified suite that includes <a href="http://en.wikipedia.org/wiki/Microsoft_SharePoint">SharePoint</a> for collaboration, <a href="http://en.wikipedia.org/wiki/PowerPivot">PowerPivot</a> for <a href="http://en.wikipedia.org/wiki/Online_analytical_processing">OLAP</a>, <a href="http://msdn.microsoft.com/en-us/library/ee362541.aspx">StreamInsight</a> for <a href="http://en.wikipedia.org/wiki/Complex_event_processing">complex event processing</a> (CEP), etc. As with the <a href="http://en.wikipedia.org/wiki/LAMP_(software_bundle)">LAMP</a> stack, there is a coherent framework for analytical data management, which we can call an analytical data platform.</p>
<p>3) Cloudera&#8217;s Platform</p>
<p>Cloudera starts with a substrate architecture of <a href="http://opencompute.org/">Open Compute</a> commodity Linux servers configured using <a href="http://projects.puppetlabs.com/projects/puppet">Puppet</a> and <a href="http://www.opscode.com/chef/">Chef</a> and coordinated using <a href="http://zookeeper.apache.org/">ZooKeeper</a>. Naturally, this entire stack is open-source. They use <a href="http://hadoop.apache.org/hdfs/">HDFS</a> and <a href="http://ceph.newdream.net/">Ceph</a> to provide distributed, <a href="http://en.wikipedia.org/wiki/NoSQL">schema-less</a> storage. They offer append-only table storage and metadata using <a href="http://avro.apache.org/">Avro</a>, <a href="http://hive.apache.org/docs/r0.4.0/api/org/apache/hadoop/hive/ql/io/RCFile.html">RCFile</a>, and <a href="http://incubator.apache.org/hcatalog/">HCatalog</a>; and mutable table storage and metadata using <a href="http://hbase.apache.org/">HBase</a>. For computation, they offer <a href="http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a> (inter-job scheduling, like <a href="http://gridengine.org/blog/">Grid Engine</a>, for data-intensive computing) and <a href="http://www.mesosproject.org/">Mesos</a> for cluster resource management; <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>, <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2911">Hamster</a> (<a href="http://www.open-mpi.org/">MPI</a>), <a href="http://www.spark-project.org/">Spark</a>, <a href="http://research.microsoft.com/en-us/projects/dryad/">Dryad</a> / <a href="http://research.microsoft.com/en-us/projects/dryadLINQ/">DryadLINQ</a>, <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Pregel</a> (<a href="http://incubator.apache.org/giraph/">Giraph</a>), and <a href="http://research.google.com/pubs/pub36632.html">Dremel</a> as processing frameworks; and <a href="https://github.com/cloudera/crunch#readme">Crunch</a> (like Google&#8217;s <a href="http://dl.acm.org/citation.cfm?id=1806596.1806638">FlumeJava</a>), <a href="http://pig.apache.org/">PigLatin</a>, <a href="http://hive.apache.org/">HiveQL</a>, and <a href="http://yahoo.github.com/oozie/">Oozie</a> as high-level interfaces. Finally, Cloudera offers tool access through <a href="http://fuse.sourceforge.net/">FUSE</a>, <a href="http://en.wikipedia.org/wiki/Java_Database_Connectivity">JDBC</a>, and <a href="http://en.wikipedia.org/wiki/Open_Database_Connectivity">ODBC</a>; and data ingest through <a href="http://www.cloudera.com/blog/2009/06/introducing-sqoop/">Sqoop</a> and <a href="https://cwiki.apache.org/FLUME/">Flume</a>.</p>
<p>4) What&#8217;s Next?</p>
<p>For the substrate, we can expect support for fat servers with fat pipes, operating system support for isolation, and improved local filesystems (e.g., <a href="http://en.wikipedia.org/wiki/Btrfs">btrfs</a>). Storage improvements will give us a unified file format, compression, better performance and availability, richer metadata, distributed snapshots, replication across data centers, native client access, and separation of namespace and block management. We will see stabilization of our existing compute tools and better variety, as well as improved fault tolerance, isolation and workload management, low-latency job scheduling, and a unified execution backend for workflow. And we will see better integration through REST API access to all platform components, better document ingest, maintenance of source catalog and provenance information, and integration with analytics tools that goes beyond ODBC. We will also see tools that facilitate the transition from unstructured to structured data (e.g., <a href="http://cloudera.github.com/RecordBreaker/">RecordBreaker</a>).</p>
<p>Jeff&#8217;s talk was as information-dense as this post suggests, and I hope the mostly-academic CIKM audience was not too shell-shocked. It&#8217;s fantastic to see practitioners not only building essential tools for research in information and knowledge management, but reaching out to the research community to build bridges. I saw lots of intense conversation after his talk, and I hope the results realize the two-fold mission of the Industry Event, which is to give  researchers an opportunity to learn about the problems most relevant to industry practitioners, and to offer practitioners an opportunity to deepen their understanding of the field in which they are working.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: John Giannandrea on Freebase &#8211; A Rosetta Stone for Entities</title>
		<link>http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/</link>
		<comments>http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 08:26:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3916</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The second speaker in the program was Metaweb co-founder John Giannandrea. Google acquired Metaweb last year and has kept its promise to maintain Freebase as a free and open database for the world (including for rival [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.cikm2011.org/industryevent#jg"><img class="alignnone" title="Freebase - A Rosetta Stone for Entities" src="http://upload.wikimedia.org/wikipedia/en/8/86/Freebase-logo.png" alt="" width="129" height="20" /></a></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The second speaker in the program was <a href="http://en.wikipedia.org/wiki/Metaweb">Metaweb</a> co-founder <a href="http://24.about.apps.freebase.dev.freebaseapps.com/team">John Giannandrea</a>. Google <a href="http://googleblog.blogspot.com/2010/07/deeper-understanding-with-metaweb.html">acquired Metaweb</a> last year and has kept its promise to maintain <a href="http://en.wikipedia.org/wiki/Freebase">Freebase</a> as a free and open database for the world (including for rival search engine <a href="http://blog.freebase.com/2009/07/13/bing-structured-search-results-powered-by-freebase/">Bing</a> &#8211; though I&#8217;m not sure if Bing is still using Freebase). John&#8217;s talk was entitled &#8220;<a href="http://www.cikm2011.org/industryevent#jg">Freebase &#8211; A Rosetta Stone for Entities</a>&#8221;. I am thankful to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging a <a href="http://www.searchenginecaffe.com/2011/10/cikm-2011-industry-freebase-rosetta.html">summary</a>.</p>
<p>John started by introducing Freebase as a representation of structured objects corresponding to real-world entities and connected by a directed graph of relationships. In other words, a <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web</a>. While it isn&#8217;t quite web-scale, Freebase is a large and growing knowledge base consisting of 25 million entities and 500 million connections &#8212; and doubling annually. The core concept in Freebase is a type, and an entity can have many types. For example, <a href="http://www.freebase.com/view/en/arnold_schwarzenegger">Arnold Schwarzenegger</a> is a <a href="http://www.freebase.com/view/en/politician">politician</a> and an <a href="http://www.freebase.com/view/en/actor">actor</a>. John emphasized the messiness of the real world. For example, most actors are people, but what about the <a href="http://www.freebase.com/view/m/05tf4t">dog</a> who played <a href="http://www.freebase.com/view/en/lassie">Lassie</a>? It&#8217;s important to support exceptions.</p>
<p>The main technical challenge for Freebase is <a href="http://wiki.freebase.com/wiki/Reconciliation">reconciliation</a> &#8212; that is, determining how similar a set of data is to existing Freebase topics. John pointed out how critical it is for Freebase to avoid duplication of content, since the utility of Freebase depends on unique nodes in its graph corresponding to unique objects in the world. Freebase obtains many of its entities by reconciling large, open-source knowledge bases &#8212; including Wikipedia, <a href="http://wordnet.princeton.edu/">WordNet</a>, <a href="http://authorities.loc.gov/">Library of Congress Authorities</a>,  and metadata from the <a href="http://blog.freebase.com/2010/06/11/stanford-university-library-catalog/">Stanford Library</a>. Freebase uses a variety of tools to implement reconciliation, including <a href="http://code.google.com/p/google-refine/?redir=1">Google Refine</a> (formerly known as Freebase Gridworks) and <a href="http://wiki.freebase.com/wiki/Matchmaker">Matchmaker</a>, a tool for gathering human judgments. While reconciliation is a hard technical problem, it is made possible by making inferences across the web of relationships that link entities to one another.</p>
<p>John then presented Freebase as a <a href="http://en.wikipedia.org/wiki/Rosetta_Stone">Rosetta Stone</a> for entities on the web. Since an entity is simply a collection of keys (one of which is its name), Freebase&#8217;s job is to reverse engineer the key-value store that is distributed among the entity&#8217;s web references, e.g., the structured databases backing web sites and encoding keys in URL parameters. He noted that Freebase itself is schema-less (it is a <a href="http://en.wikipedia.org/wiki/Graph_database">graph database</a>), and that even the concept of a <a href="http://www.freebase.com/view/type/type">type</a> is itself an entity (&#8220;Type type is the only type that is an instance of itself&#8221;). Google makes Freebase available through an <a href="http://wiki.freebase.com/wiki/New_Freebase_API">API</a> and the Metaweb Query Language (<a href="http://wiki.freebase.com/wiki/MQL">MQL</a>).</p>
<p>Freebase does have its challenges. The requirement to keep out duplicates is an onerous one, as they discovered when importing a portion of the <a href="http://openlibrary.org/">Open Library</a> catalog. Maintaining quality calls for significant manual curation, and quality varies across the knowledge base. John asserted that Freebase provides 99% accuracy at the 95th percentile, though it&#8217;s not clear to me what that means <em>(update: see Bill&#8217;s <a href="http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/comment-page-1/#comment-10650">comment</a> below)</em>.</p>
<p>While I still have concerns about Freebase&#8217;s robustness as a structured knowledge base (see my post on &#8220;<a href="http://thenoisychannel.com/2011/05/15/in-search-of-structure/">In Search Of Structure</a>&#8221;), I&#8217;m excited to see Google investing in structured representations of knowledge. To hear more about Google&#8217;s efforts in this space, check out the Strata New York panel I moderated on <a href="http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/">Entities, Relationships, and Semantics</a> &#8212; the panelists included <a href="http://secondthought.org/">Andrew Hogue</a>, who leads Google&#8217;s structured data and information extraction group and managed me during my year at Google New York.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Stephen Robertson on Why Recall Matters</title>
		<link>http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/</link>
		<comments>http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 16:02:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3903</guid>
		<description><![CDATA[On October 27th, I had the pleasure to chair the CIKM 2011 Industry Event with former Endeca colleague Tony Russell-Rose. It is my pleasure to report that the program, held in parallel with the main conference sessions, was a resounding success. Since not everyone was able to make it to Glasgow for this event, I&#8217;ll [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10155675" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<div style="padding: 5px 0 12px;">
<p>On October 27th, I had the pleasure to chair the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a> with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>. It is my pleasure to report that the program, held in parallel with the main conference sessions, was a resounding success. Since not everyone was able to make it to Glasgow for this event, I&#8217;ll use this and subsequent posts to summarize the presentations and offer commentary. I&#8217;ll also share any slides that presenters made available to me.</p>
<p>Microsoft researcher <a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a>, who may well be the world&#8217;s preeminent living researcher in the area of <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a>, opened the program with a talk on &#8220;<a href="http://www.cikm2011.org/industryevent#ser">Why Recall Matters</a>&#8221;. For the record, I didn&#8217;t put him up to this, despite my <a href="http://thenoisychannel.com/2009/07/17/in-defense-of-recall/">strong opinions</a> on the subject.</p>
<p>Stephen started by reminding us of ancient times (i.e., before the web), when at least some IR researchers thought in terms of <a href="http://thenoisychannel.com/2008/08/24/set-retrieval-vs-ranked-retrieval/">set retrieval</a> rather than ranked retrieval. He reminded us of the precision and recall &#8220;devices&#8221; that he&#8217;d described in his <a href="http://www.soi.city.ac.uk/~ser/papers/salton_lecture_web.pdf">Salton Award Lecture</a> &#8212; an idea he attributed to the late <a href="http://www.iva.dk/bh/core%20concepts%20in%20lis/articles%20a-z/cranfield_experiments.htm">Cranfield</a> pioneer <a href="http://en.wikipedia.org/wiki/Cyril_Cleverdon">Cyril Cleverdon</a>. He noted that, while set retrieval uses distinct precision and recall devices, ranking conflates both into a single decision about where to truncate a ranked result list. He also pointed out an interesting asymmetry in the conventional notion of the <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision-recall</a> tradeoff: while returning more results can only increase recall, there is no certainty that the additional results will decrease precision. Rather, this decrease is a hypothesis that we associate with systems designed to implement the <a href="http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/jdRobertson1977.pdf">probability ranking principle</a>, returning results in decreasing order of probability of relevance.</p>
<p>He went on to remind us that there is information retrieval beyond web search. He hauled out the usual examples of recall-oriented tasks: <a href="http://www.haxel.com/icic/archive/2009/programme/oct20/dummy-presentation/at_download/attachfile">e-discovery</a>, <a href="http://hcir.info/hcir-2011/challenge">prior art search</a>, and <a href="http://en.wikipedia.org/wiki/Evidence-based_medicine">evidence-based medicine</a>. But he then made the case that not only is the web not the only problem in information retrieval, but that &#8220;it&#8217;s the web that&#8217;s strange&#8221; relative to the rest of the information retrieval landscape in so strongly favoring precision over recall. He enumerated some of the peculiarities of the web, including its size (there&#8217;s only one web!), the extreme variation in authorship and quality, the lack of any content standardization (efforts like <a href="http://schema.org/">schema.org</a> notwithstanding), and the advertising-based monetization model that creates an unusual and sometimes adversarial relationship between content owners and search engines. In particular, he cited <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise search</a> as an information retrieval domain that violates the assumptions of web search and calls for more emphasis on recall.</p>
<p>Stephen suggested that, rather than thinking in terms of the precision-recall curve, we consider the recall-fallout curve. <a href="http://en.wikipedia.org/wiki/Information_retrieval#Fall-Out">Fallout</a> is a relatively unknown measure that represents the probability that a non-relevant document is retrieved by the query. He noted that fallout offered little practical use in IR, given that the corpus is populated almost entirely by non-relevant documents. Still, he made the case that the recall-fallout trade-off might be more conceptually appropriate than the precision-recall curve in order to understand the value of recall.</p>
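<p>To make these measures concrete, here is a minimal sketch (an illustration, not taken from the talk) that computes precision, recall, and fallout at every cutoff of a ranked list. The names <code>metrics_at_cutoffs</code> and <code>CutoffMetrics</code>, and the toy corpus of 4 relevant and 1,000 non-relevant documents, are assumptions made up for the example.</p>

```python
from dataclasses import dataclass


@dataclass
class CutoffMetrics:
    """Metrics for the top-k results of a ranked list (hypothetical helper type)."""
    cutoff: int
    precision: float
    recall: float
    fallout: float


def metrics_at_cutoffs(ranked_relevance, total_relevant, corpus_size):
    """Return precision, recall, and fallout after each rank position.

    ranked_relevance: booleans in rank order, True where the result is relevant.
    total_relevant:   number of relevant documents in the whole corpus.
    corpus_size:      total number of documents in the corpus.
    """
    total_non_relevant = corpus_size - total_relevant
    if total_relevant == 0 or total_non_relevant == 0:
        raise ValueError("need at least one relevant and one non-relevant document")

    rows = []
    relevant_seen = 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        relevant_seen += int(is_relevant)
        non_relevant_seen = k - relevant_seen
        rows.append(
            CutoffMetrics(
                cutoff=k,
                precision=relevant_seen / k,
                recall=relevant_seen / total_relevant,
                # Fallout: share of all non-relevant documents that were retrieved.
                fallout=non_relevant_seen / total_non_relevant,
            )
        )
    return rows


if __name__ == "__main__":
    # Toy ranking: 3 of the 4 relevant documents appear in the top 6 results.
    toy_ranking = [True, True, False, True, False, False]
    for row in metrics_at_cutoffs(toy_ranking, total_relevant=4, corpus_size=1004):
        print(f"k={row.cutoff}  P={row.precision:.3f}  "
              f"R={row.recall:.3f}  F={row.fallout:.4f}")
```

<p>In this toy ranking, precision actually rises between the third and fourth results, while recall never goes down as the cutoff grows. Fallout stays tiny throughout because the corpus is dominated by non-relevant documents, which is the practical limitation Stephen mentioned.</p>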
<p>In particular, we can generalize the traditional inverse precision-recall relationship to the hypothesis that the recall-fallout curve is convex (details in &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.138.2609&amp;rep=rep1&amp;type=pdf">On score distributions and relevance</a>&#8221;). We can then calculate instantaneous precision at any point in the result list from the gradient of the recall-fallout curve. Going back to the notion of devices, we can now replace precision devices with fallout devices.</p>
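<p>One way to see the link between the gradient and precision, sketched in notation that is assumed here rather than taken from the talk or the paper: let R and N be the numbers of relevant and non-relevant documents in the corpus, and let r and n be the numbers of each retrieved up to a given point in the ranking.</p>

```latex
\[ \text{recall: } \rho = \frac{r}{R}, \qquad \text{fallout: } \phi = \frac{n}{N} \]
\[ g = \frac{d\rho}{d\phi} = \frac{N}{R} \cdot \frac{dr}{dn} \]
\[ p_{\text{inst}} = \frac{dr}{dr + dn} = \frac{gR}{gR + N} \]
```

<p>Under these assumptions, instantaneous precision is an increasing function of the gradient g. A recall-fallout curve whose gradient shrinks as we move down the ranking therefore corresponds to instantaneous precision that falls as we move down the ranking.</p>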
<p>Stephen wrapped up his talk by emphasizing the user of information retrieval systems &#8212; an aspect of IR that is too often neglected outside <a href="http://hcir.info/">HCIR</a> circles. He advocated that systems provide users with evidence of recall, guidance on how far to go down the ranked results, and a prediction of the recall at any given stopping point.</p>
<p>It was an extraordinary privilege to have Stephen Robertson present at the CIKM Industry Event, and even better to have him make a full-throated argument in favor of recall. I can only hope that researchers and practitioners take him up on it.</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Entities, Relationships, and Semantics: Strata NY Panel on the State of Structured Search</title>
		<link>http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/</link>
		<comments>http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/#comments</comments>
		<pubDate>Sun, 06 Nov 2011 03:56:45 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3896</guid>
		<description><![CDATA[Earlier this year, I had the privilege to moderate a panel at Strata New York 2011 on Entities, Relationships, and Semantics: the State of Structured Search. The four panelists are people I&#8217;ve had the pleasure to work with over the years: Andrew Hogue (Google), Breck Baldwin (Alias-i), Evan Sandhaus (New York Times), Wlodek Zadrozny (IBM Research). They work on some of the world’s largest [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/vr1blOJxXfQ" frameborder="0" width="500" height="281"></iframe></p>
<p>Earlier this year, I had the privilege to moderate a panel at <a href="http://strataconf.com/stratany2011">Strata New York 2011</a> on <a href="http://strataconf.com/stratany2011/public/schedule/detail/21413">Entities, Relationships, and Semantics: the State of Structured Search</a>. The four panelists are people I&#8217;ve had the pleasure to work with over the years: <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124000">Andrew Hogue</a> (Google), <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124001">Breck Baldwin</a> (Alias-i), <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124002">Evan Sandhaus</a> (New York Times), <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124003">Wlodek Zadrozny</a> (IBM Research). They work on some of the world’s largest structured search problems &#8212; from offering users structured search on Google’s web corpus to building a computing system that defeated Jeopardy! champions in an extreme test of natural language understanding.</p>
<p>O&#8217;Reilly has compiled the nearly 50 hours of video from the conference and made the collection available for <a href="http://shop.oreilly.com/product/0636920022985.do">purchase</a>. I was lucky to attend all of the keynotes and many of the breakout sessions, and I highly recommend them. In the meantime, you can see a recording of the panel I moderated.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Interview in Forbes: What is a Data Scientist?</title>
		<link>http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/</link>
		<comments>http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 06:24:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3884</guid>
		<description><![CDATA[Dan Woods has been interviewing a variety of folks to answer the question: &#8220;What is a data scientist?&#8221;, and I had the honor to participate in his series. Here is a teaser of my interview: Above all, a data scientist needs to be able to derive robust conclusions from data. But a data scientist also [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/"><img class="alignnone" title="The Data Science Venn Diagram (Drew Conway)" src="http://www.drewconway.com/zia/wp-content/uploads/2010/09/Data_Science_VD.png" alt="" width="407" height="388" /></a></p>
<p><a href="http://blogs.forbes.com/people/danwoods/">Dan Woods</a> has been interviewing a variety of folks to answer the question: &#8220;<a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">What is a data scientist?</a>&#8221;, and I had the honor to participate in his series.</p>
<p>Here is a teaser of my interview:</p>
<blockquote><p>Above all, a data scientist needs to be able to derive robust conclusions from data. But a data scientist also needs to possess creativity and strong communication skills. Creativity drives the process of hypothesis generation, i.e., picking the right problems to solve that will create value for users and drive business decisions.</p></blockquote>
<p>Read the rest on <a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">Forbes.com</a>. And thanks to Drew Conway for the awesome <a href="http://www.drewconway.com/zia/?p=2378">data science Venn diagram</a> above.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RecSys 2011 Tutorial: Recommendations as a Conversation with the User</title>
		<link>http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/</link>
		<comments>http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 23:39:21 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3880</guid>
		<description><![CDATA[&#160; Last week, I had the privilege to present a tutorial at the 5th ACM International Conference on Recommender Systems (RecSys 2011). Given my passion for HCIR and my advocacy for transparency in recommender systems, it shouldn&#8217;t surprise regular readers that I focused on both. Unfortunately the tutorial was not recorded, but I hope the [...]]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/9967220?rel=0" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe><br />
&nbsp;</p>
<p>Last week, I had the privilege to present a tutorial at the 5th ACM International Conference on Recommender Systems (<a href="http://recsys.acm.org/2011/index.shtml">RecSys 2011</a>). Given my passion for <a href="http://hcir.info/">HCIR</a> and my advocacy for <a href="http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/">transparency in recommender systems</a>, it shouldn&#8217;t surprise regular readers that I focused on both. Unfortunately the tutorial was not recorded, but I hope the slides above prove useful. I also encourage you to take a look at the other <a href="http://recsys.acm.org/2011/tutorials.shtml">tutorials</a>, whose slides are posted on the conference site.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>HCIR 2011: We Have Arrived!</title>
		<link>http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/</link>
		<comments>http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/#comments</comments>
		<pubDate>Fri, 21 Oct 2011 09:08:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3873</guid>
		<description><![CDATA[If you followed the #hcir2011 tweet stream, then you already know what I have to say: the Fifth Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011) was an extraordinary success. We had about 100 people attending, 14 paper presentations, 28 posters, and 4 challenge entries, all packed into one intense day at Google&#8217;s beautiful [...]]]></description>
				<content:encoded><![CDATA[<p>If you followed the <a href="http://twitter.com/#!/search/%23hcir2011">#hcir2011</a> tweet stream, then you already know what I have to say: the Fifth Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://hcir.info/hcir-2011">HCIR 2011</a>) was an extraordinary success. We had about 100 people attending, 14 <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/schedule/presentations">paper presentations</a>, 28 <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/posters">posters</a>, and 4 <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/challenge">challenge</a> entries, all packed into one intense day at Google&#8217;s beautiful Mountain View headquarters.</p>
<p>Wednesday evening before the workshop, we were treated to a welcome reception, the first of a few meals provided by Google&#8217;s excellent chefs. It was a great opportunity to reconnect with old friends and meet many first-time HCIR attendees.</p>
<p>Thursday started with a scrumptious breakfast that included chilaquiles, coconut fritters, and bacon. Last year&#8217;s <a href="https://sites.google.com/site/hcirworkshop/hcir-2010/keynote">keynote</a> and this year&#8217;s local host <a href="https://sites.google.com/site/dmrussell/">Dan Russell</a> pulled out all the stops &#8212; apparently <a href="http://www.yelp.com/biz/bigtable-cafe-mountain-view">BigTable</a> is the only Google cafe that serves bacon for breakfast! We then proceeded to a poster boaster session in which each poster presenter had a minute to pitch his or her poster. This session set the tone for the rest of the workshop: concentrated ideas and intense audience engagement.</p>
<p>Then came this year&#8217;s keynote, <a href="http://ils.unc.edu/~march/">Gary Marchionini</a>. It was a particular treat to have Gary as a keynote, since his lecture on &#8220;<a href="http://www.asis.org/Bulletin/Jun-06/marchionini.html">Toward Human-Computer Information Retrieval</a>&#8221; inspired me to conceive the HCIR workshop back in 2007. And Gary delivered the goods. He started with a review of the history of HCIR, including some lesser-known figures like <a href="http://www.linkedin.com/pub/donald-hawkins/10/a59/77">Don Hawkins</a> (who was in the audience), <a href="http://www.ideals.illinois.edu/handle/2142/14100">Pauline Cochrane</a>, <a href="http://stuff.mit.edu/people/rmarcus/home.html">Richard Marcus</a>, and <a href="http://www3.fis.utoronto.ca/faculty/meadow/">Charles Meadow</a>. He brought a few chuckles by citing <a href="http://comminfo.rutgers.edu/~belkin/belkin.html">Nick Belkin</a> (who was present) and <a href="http://research.microsoft.com/en-us/um/people/sdumais/">Sue Dumais</a> (who was not) as the father and mother of HCIR. Naturally he described some of his own work at the University of North Carolina, including the <a href="http://www.open-video.org/">Open Video</a>, <a href="http://ils.unc.edu/relationbrowser/">Relation Browser</a>, and <a href="http://ils.unc.edu/resultsspace/">ResultsSpace</a> projects. But the highlight of his talk was a graph he presented showing two paths to the same user end-state, one of the paths being a smooth progression and the other being a roller-coaster of ups and downs. The question of which one was better drew a wide variety of responses, my favorite being <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a> observing that learning is the friction of the information-seeking process.</p>
<p>We broke for coffee and then came back to the first session of paper presentations. <a href="http://www.athenikos.com/">Sofia Athenikos</a> presented a semantic search engine that outperformed IMDB in a user study. <a href="http://comminfo.rutgers.edu/directory/changl/index.html">Chang Liu</a> explored the effect of task difficulty and domain knowledge on dwell times, finding counterintuitive results (at least for me) regarding the correlation of expertise to dwell time. <a href="https://sites.google.com/site/jliujingjing/">Jingjing Liu</a> presented research on knowledge examination in multi-session tasks. Then came the lightning talks: <a href="http://www.mansci.uwaterloo.ca/~msmucker/">Mark Smucker</a> on how users examine and process ranked document lists; <a href="http://www.cs.umass.edu/~jykim/">Jin Kim</a> on simulating associative browsing; <a href="http://faculty.cua.edu/kules/">Bill Kules</a> on visualizing the stages of exploratory search; and <a href="http://comminfo.rutgers.edu/directory/mjcole/index.html">Michael Cole</a> on user domain knowledge and eye movement patterns during search. Way too much goodness to summarize here &#8212; I suggest you read the full papers on the <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/schedule/presentations">workshop site</a>.</p>
<p>Then came lunch &#8212; again in BigTable, but this time with outdoor seating &#8212; and the poster session. As always, this is the most interactive part of the day: two hours of non-stop discussion that start over food and end with prying people away from discussions about posters. I was especially proud of LinkedIn&#8217;s contributions to the <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/posters">poster session</a>, which covered <a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NmIxNzEzZjE3ZTVhZTAyYw&amp;pli=1">faceted search log analysis</a>, <a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjEzNDNjZTk5NGYyYWQwOA">social navigation</a>, and <a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MWVlMGNhZWY5NTA3MzQ2ZA">whether it is time to abandon abandonment</a>.</p>
<p>Then back to the second session of paper presentations. <a href="http://faculty.arts.ubc.ca/lfreund/">Luanne Freund</a> talked about document usefulness and genre, finding that genre, besides being hard for users to reliably identify, only matters for tasks that involve doing, deciding, or learning, but not for those that involve fact finding or problem solving. <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a> presented work on designing for collaboration in information seeking, previewing the system he used for his challenge entry. <a href="http://www.medelyan.com/">Alyona Medelyan</a> used the <a href="http://www.pingar.com/">Pingar</a> search engine to evaluate how search interface features affect performance on biosciences tasks. Then more lightning talks: <a href="http://www.ils.unc.edu/~rcapra/">Rob Capra</a> analyzing faceted search on mobile devices; <a href="http://www.linkedin.com/pub/keith-bagley/0/657/124">Keith Bagley</a> on conceptual mile markers for exploratory search; <a href="http://ils.unc.edu/~wildem/ASIST2008/Yuan-CV.pdf">Xiaojun Yuan</a> on how cognitive styles affect user performance; and <a href="http://mikezarro.com/">Mike Zarro</a> on using social tags and controlled vocabularies as search filters.</p>
<p>Last but not least came the <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/challenge">HCIR Challenge</a>:</p>
<blockquote><p>The HCIR 2011 Challenge focuses on the case where recall is everything – namely, the problem of information availability. The information availability problem arises when the seeker faces uncertainty as to whether the information of interest is available at all. Instances of this problem include some of the highest-value information tasks, such as those facing national security and legal/patent professionals, who might spend hours or days searching to determine whether the desired information exists.</p>
<p>The corpus we will use for the HCIR 2011 Challenge is the CiteSeer digital library of scientific literature. The CiteSeer corpus contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.</p></blockquote>
<p>There were four entries:</p>
<ul>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NzE1YmM2YzE4ODBhYzRjZA" target="_blank">FreeSearch – Literature Search in a Natural Way<br />
</a><em>Claudiu S. Firan, Wolfgang Nejdl, Mihai Georgescu (University of Hanover), and Xinyun Sun (DEKE Lab MOE, Renmin)<br />
</em></li>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MmZmM2Y5Yzg5OTM4NGI5NQ" target="_blank">Session-based search with Querium<br />
</a><em>Gene Golovchinsky (FX Palo Alto Lab) and Abdigani Diriye (University College London)<br />
</em></li>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NWI1NTc5NWNmNDlmZDUyZg" target="_blank">GisterPro<br />
</a><em>David L. Ostby and Edmond Brian (Visual Purple)<br />
</em></li>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjIwOWNlOWY4YTQzMDRmZA" target="_blank">Query Analytics Workbench<br />
</a><em>Antony Scerri, Matthew Corkum, Keith Gutfreund, Ron Daniel Jr., Michael Taylor (Elsevier Labs)</em></li>
</ul>
<p>The competition was fierce. Claudiu showed off the <a href="http://dblp.l3s.de/">Faceted DBLP</a> interface, which is well suited to the information availability task on CiteSeer data. Ed showed how GisterPro uses visualization to support the information seeking process. But it came down to a close call between the Query Analytics Workbench and Querium. Despite the Elsevier team&#8217;s impressive functionality and animated presentation, Gene&#8217;s simpler interface and application of <a href="http://www.fxpal.com/publications/FXPAL-PR-08-467.pdf">ranked fusion</a> won the day. Congratulations to Gene and Abdigani, this year&#8217;s HCIR Challenge winners!</p>
<p>We wrapped up the evening at the <a href="http://tiedhouse.com/">Tied House</a>, a local microbrewery. And of course the discussion turned to where, when, and how we will hold next year&#8217;s workshop. Watch this space. In the meantime, my heartfelt thanks to everyone who made this year&#8217;s workshop such a success &#8212; and especially to our sponsors. Thank you Endeca, Kent State, Microsoft, and Google!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oracle Acquires Endeca!</title>
		<link>http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/</link>
		<comments>http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 16:20:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3860</guid>
		<description><![CDATA[    Today is a wonderful day for Endeca and Oracle! Oracle has announced that it has entered into an agreement to acquire Endeca, bringing together two of the powerhouses of information access. Quoting from the announcement: &#8220;The combination of Oracle and Endeca is expected to create a comprehensive technology platform to process, store, manage, [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.endeca.com/"><img class="alignnone" title="Endeca" src="http://allinio.com/wp-content/uploads/2008/10/Endeca-Logo.gif" alt="" width="116" height="60" /></a>   <img class="alignnone" title="acquired by" src="http://www.freestockphotos.biz/pictures/2/2987/arrow.png" alt="" width="60" height="60" /><a href="http://www.oracle.com/"><img class="alignnone" title="Oracle" src="http://www.logostage.com/logos/Oracle.jpg" alt="" width="311" height="60" /></a></p>
<p>Today is a wonderful day for <a href="http://www.endeca.com/">Endeca</a> and <a href="http://www.oracle.com/">Oracle</a>! Oracle has <a href="http://www.oracle.com/us/corporate/press/517791">announced</a> that it has entered into an agreement to acquire Endeca, bringing together two of the powerhouses of information access. Quoting from the announcement: &#8220;The combination of Oracle and Endeca is expected to create a comprehensive technology platform to process, store, manage, search and analyze structured and unstructured information together.&#8221;</p>
<p>As part of Endeca&#8217;s founding team, I am very proud to see this day. My ten years at Endeca were a formative experience that established my professional identity and inspired my passion to pursue the vision of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a> (by happy coincidence, the 5th annual <a href="http://hcir.info/">HCIR workshop</a> takes place on Thursday). Reading Oracle&#8217;s <a href="http://www.oracle.com/us/corporate/Acquisitions/endeca/general-presentation-517133.pdf">presentation</a> about the acquisition, I&#8217;m excited to see how Endeca&#8217;s technology will play a key role in unifying structured and unstructured data management and analysis for Oracle&#8217;s customers.</p>
<p>I take pride in my contributions to Endeca &#8212; I still slip sometimes and refer to Endeca as &#8220;we&#8221;. But the real heroes here are the folks &#8212; and especially the <a href="http://www.endeca.com/en/about-us/leadership-team.html">leadership</a> &#8212;  who have seen this journey through from start to finish. In particular, I am grateful to Steve Papa, Pete Bell, Adam Ferrari, Jack Walter, Keith Johnson, Nik Bates-Haus, and Jason Purcell for everything they have done to bring about this extraordinary outcome.</p>
<p>Finally, excited as I am about this event, it is only the beginning. I am excited to see Endeca&#8217;s people and technology powering one of the world&#8217;s largest enterprise software companies. Looking forward to the next play!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn</title>
		<link>http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/</link>
		<comments>http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/#comments</comments>
		<pubDate>Sat, 01 Oct 2011 04:39:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3850</guid>
		<description><![CDATA[Last week, I delivered the following presentation at the CMU Intelligence Seminar: &#160; I had a great audience, including the department head! Of course that meant fielding tough questions, but that&#8217;s what makes it fun to present at my alma mater. Now that it&#8217;s been over a decade since my defense, I can handle the [...]]]></description>
				<content:encoded><![CDATA[<p>Last week, I delivered the following presentation at the <a href="http://www.cs.cmu.edu/~iseminar/">CMU Intelligence Seminar</a>:</p>
<div id="__ss_9494640" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"></strong><object id="__sse9494640" width="467" height="390" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=keepingitprofessional-110930233231-phpapp02&amp;stripped_title=keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin&amp;userName=dtunkelang" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse9494640" width="467" height="390" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=keepingitprofessional-110930233231-phpapp02&amp;stripped_title=keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin&amp;userName=dtunkelang" allowFullScreen="true" allowScriptAccess="always" allowscriptaccess="always" allowfullscreen="true" /></object></div>
<p>&nbsp;</p>
<p>I had a great audience, including the department head! Of course that meant fielding tough questions, but that&#8217;s what makes it fun to present at my alma mater. Now that it&#8217;s been over a decade since my defense, I can handle the tough questions. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Unfortunately there is no video, but hopefully the slides are reasonably self-explanatory. If you have questions, please ask them in the comments.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Visiting the East Coast: CMU and Strata New York</title>
		<link>http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/</link>
		<comments>http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/#comments</comments>
		<pubDate>Sun, 18 Sep 2011 16:01:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3831</guid>
		<description><![CDATA[              &#160; Tonight I&#8217;m taking a red-eye to Pittsburgh so that I can spend three days at my (doctoral) alma mater, CMU. In addition to spending time with lots of great students and faculty, my goal is to communicate a taste of the hard computer science problems we are [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://cs.cmu.edu/"><img class="alignnone" title="CMU School of Computer Science" src="http://www.cs.cmu.edu/~ref/naacl/logos/bronze/dragon-small.jpeg" alt="" width="178" height="154" /></a>             <a href="http://strataconf.com/stratany2011/"><img class="alignnone" title="O'Reilly Strata New York: Making Data Work" src="http://assets.en.oreilly.com/1/eventseries/23/strata_franchise_logo_strata.gif" alt="" width="245" height="79" /></a></p>
<p>&nbsp;</p>
<p>Tonight I&#8217;m taking a red-eye to Pittsburgh so that I can spend three days at my (doctoral) alma mater, <a href="http://www.cs.cmu.edu/~quixote/">CMU</a>. In addition to spending time with lots of great students and faculty, my goal is to communicate a taste of the hard computer science problems we are solving (or trying to solve!) at <a href="http://engineering.linkedin.com/">LinkedIn</a>. I&#8217;m giving a <a href="http://www.cs.cmu.edu/~iseminar/">tech talk</a> Tuesday afternoon, joining my colleagues for an info session Tuesday evening, and participating in the <a href="http://toc.web.cmu.edu/">Technical Opportunities Conference</a> (TOC) Wednesday.</p>
<p>Here&#8217;s a teaser for my tech talk:</p>
<p><a href="http://www.cs.cmu.edu/~iseminar/"><img class="alignnone size-full wp-image-3832" style="border-width: 5px; border-color: black; border-style: solid;" title="Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/Keeping-It-Professional.png" alt="" width="475" height="356" /></a></p>
<p>You can find more details about LinkedIn&#8217;s visits to CMU and other campuses at <a href="http://studentcareers.linkedin.com/">http://studentcareers.linkedin.com/</a>.</p>
<p>Hopefully some of you are attending the O&#8217;Reilly <a href="http://strataconf.com/stratany2011/">Strata Conference</a> in New York this Thursday and Friday. If so, I encourage you to attend my panel session on &#8220;<a href="http://strataconf.com/stratany2011/public/schedule/detail/21413">Entities, Relationships, and Semantics: the State of Structured Search</a>&#8220;:</p>
<blockquote><p>Structured search improves the search experience through the identification of entities and their relationships in documents and queries. This panel will explore the current state of structured and semi-structured search, as well as exploring the open problems in an area that promises to revolutionize information seeking.</p></blockquote>
<p>The four panelists work on some of the world’s largest structured search problems, from offering users structured search on Google’s web corpus to building a computing system that defeated <em>Jeopardy!</em> champions in an extreme test of natural language understanding. They work on the data, tools, and research that are driving this field. They are all excellent researchers and presenters, promising to offer an informative and engaging panel discussion, for which I will act as moderator.</p>
<p>Panelists:</p>
<ul>
<li><strong>Andrew Hogue</strong> is a Senior Staff Engineer and Engineering Manager in the Search Quality group at Google New York. He has worked on a wide array of projects including question answering, Google Squared, sentiment analysis, local and product search, and Google Goggles. He is interested in the areas of structured data, information extraction, and machine learning, and their applications to search and search interfaces. Prior to Google, he earned an M.Eng. and B.S. in Computer Science from MIT.</li>
</ul>
<ul>
<li><strong>Breck Baldwin</strong> is the President of Alias-i, creators of the popular LingPipe computational linguistics toolkit. He received his Ph.D. in computer science in 1995 from the University of Pennsylvania. In the time between his thesis on coreference resolution and evaluation and founding Alias-i in 1999, Breck worked on DARPA-funded projects through the University of Pennsylvania.</li>
</ul>
<ul>
<li><strong>Evan Sandhaus</strong> works as the Semantic Technologist in The New York Times Research and Development Labs. He is spearheading The New York Times Linked Open Data Strategy and overseeing the release of 1.8 million documents to the computer science research community. Previously, Evan helped to put The New York Times on Google Earth, collaborated with New York University to explore new directions in News Search, and worked to bring The New York Times to Facebook.</li>
</ul>
<ul>
<li><strong>Wlodek Zadrozny</strong> is an IBM Researcher working on natural language applications. Most recently he worked on text sources for Watson (IBM’s Jeopardy champion) and on applying related DeepQA technology to business problems. His previous work ranged from language processing research to product development and technical planning; in particular, he led the development of interaction systems that used speech, natural language and focused search. Wlodek Zadrozny received a Ph.D. in Mathematics from the Polish Academy of Science.</li>
</ul>
<p><strong>And one more thing.</strong> Karaoke at <a href="http://www.2ndon2nd.com/">Second on Second</a> in the East Village on Friday night. It&#8217;s an unofficial Strata after-party, so come join us Big Data folks for some Big Fun.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Different Anniversary: Happy Birthday, Endeca!</title>
		<link>http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/</link>
		<comments>http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 03:43:59 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3819</guid>
		<description><![CDATA[             I grew up in New York City. On September 11th, 2001, I was in Cambridge, Massachusetts, desperately trying to get through to my parents by all means of communication at my disposal. My dad worked at 40 Worth Street, only a few blocks away from the World Trade Center. [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone" title="Happy Birthday!" src="http://images.pictureshunt.com/pics/h/happy_birthday-2004.jpg" alt="" width="182" height="134" />            <a href="http://www.endeca.com/"><img class="alignnone" title="Endeca" src="http://www.f9systems.com/site/sites/default/files/endeca_logo.gif" alt="" width="243" height="134" /></a></p>
<p>I grew up in New York City. On September 11th, 2001, I was in Cambridge, Massachusetts, desperately trying to get through to my parents by all means of communication at my disposal. My dad worked at <a href="http://maps.google.com/maps?q=40+Worth+Street+New+York+NY">40 Worth Street</a>, only a few blocks away from the World Trade Center. Thankfully none of my family or friends were harmed that day, but that fateful event ten years ago left a mark on the world that no one of my generation will ever forget.</p>
<p>Fortunately I have happier associations with this anniversary.</p>
<p>On September 11th, 1999, I boarded an Amtrak from New York to Boston to join Steve Papa, Pete Bell, Dave Gourley, Fritz Knabe, Jack Walter, and Phil Braden to start the company that would eventually be named <a href="http://www.endeca.com/">Endeca</a>. I had no way of knowing whether we would persuade VCs to fund us beyond our six months of seed investment, let alone that we would develop a technology that would revolutionize the search experience of millions of users around the world. Our modest ambition was to build a better way to find stuff on eBay. That goal remains unfulfilled, but <a href="http://www.endeca.com/en/solutions/Customer-Experience-Management/b2c-ecommerce.html">44 of the top 100 online retailers use Endeca</a>, which isn&#8217;t too shabby. Especially considering that Endeca has expanded well beyond online retail into domains like manufacturing, business intelligence, and government.</p>
<p>On September 11th, 2002, I gathered the Endeca founding team for a dinner to celebrate the company&#8217;s 3rd birthday. Given my reputation for general irreverence, I feared that my colleagues would think this was a stunt to mock the memory of the more familiar 9/11. But it was quite the opposite. September 11th, 1999 was a turning point in my professional life, and no terrorist was going to take that happiness away from me. To this day I am grateful that my colleagues recognized my sincerity and joined me in this celebration.</p>
<p>The dinner that night was an emotional one: 2002 had been a <a href="http://en.wikipedia.org/wiki/Dot-com_bubble#The_bubble_bursts">tough year</a> for the software industry &#8212; one in which we saw many of our peer companies fold. Fortunately it was the beginning of much better times for us: from 2003 to 2006, Endeca was the <a href="http://www.endeca.com/en/news-and-events/press-releases/2007/endeca-named-massachusetts-fastest-growing-private-company-in-boston-business-journal-annual-ranking.html">fastest growing private company in Massachusetts</a>. No IPO yet, but the <a href="http://www.bizjournals.com/boston/print-edition/2011/07/01/endeca-gears-up-for-likely-ipo-bid.html">rumors</a> are encouraging.</p>
<p>I left Endeca almost two years ago, going to <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">Google</a> and then <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">LinkedIn</a>. But I will always have fond memories of the decade I spent at Endeca &#8212; an experience that established much of the <a href="http://thenoisychannel.com/2011/08/21/dream-fit-passion/">passion</a> that drives me today. I am very proud to have been part of the founding team of such a great company, even if now I can only follow from a distance.</p>
<p>Happy birthday, Endeca, and many more to come!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Attention CMU Students!</title>
		<link>http://thenoisychannel.com/2011/09/07/attention-cmu-students/</link>
		<comments>http://thenoisychannel.com/2011/09/07/attention-cmu-students/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 02:04:18 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3812</guid>
		<description><![CDATA[As many of you know, I&#8217;m a proud alumnus of the CMU School of Computer Science (yes, I also attended the CMU of Massachusetts). I&#8217;m delighted to have the opportunity to spend a few days on campus this month, and I hope that I&#8217;ll have a chance to meet with lots of students and faculty while [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://engineering.linkedin.com"><img class="alignnone size-full wp-image-3814" title="LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/in-logo.jpeg" alt="" width="205" height="205" /></a><a href="http://www.cs.cmu.edu/"><img class="alignnone" title="CMU School of Computer Science" src="http://www.cs.cmu.edu/~ref/naacl/logos/bronze/dragon-small.jpeg" alt="" width="277" height="241" /></a></p>
<p>As many of you know, I&#8217;m a proud alumnus of the <a href="http://www.cs.cmu.edu/~quixote/">CMU School of Computer Science</a> (yes, I also attended the <a href="http://www.eecs.mit.edu/">CMU of Massachusetts</a>). I&#8217;m delighted to have the opportunity to spend a few days on campus this month, and I hope that I&#8217;ll have a chance to meet with lots of students and faculty while I&#8217;m there.</p>
<p>Specifically, I&#8217;ll be giving a talk at Eugene Fink&#8217;s <a href="http://www.cs.cmu.edu/~iseminar/">Intelligence Seminar</a> on Tuesday, September 20th at 3:30pm in Gates-Hillman 4303:</p>
<blockquote><p><strong>Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn</strong></p>
<p>LinkedIn operates the world&#8217;s largest professional network on the Internet with more than 120 million members in over 200 countries. In order to connect its users to the people, opportunities, and content that best advance their careers, LinkedIn has developed a variety of algorithms that surface relevant content, offer personalized recommendations, and establish topic-sensitive reputation &#8212; all at a massive scale. In this talk, I will discuss some of the most challenging technical problems we face at LinkedIn, and the approaches we are taking to address them.</p></blockquote>
<p>I hope to see all of you there! My colleagues and I will also be hosting an information session that same Tuesday at 6pm in Porter Hall, Room 125B, as well as participating in the <a href="http://toc.web.cmu.edu/">Technical Opportunities Conference</a> Tuesday and Wednesday. And of course LinkedIn will be conducting on-campus <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/">interviews</a>: those will take place all day on Thursday, September 22nd.</p>
<p>If you are a CMU student interested in <a href="http://engineering.linkedin.com/">opportunities at LinkedIn</a>, please <a href="http://www.studentaffairs.cmu.edu/career/tartantrak/tartantrakstudentlogin.html">apply through TartanTrak</a> (yes, I wish you could just <a href="http://blog.linkedin.com/2011/07/24/apply-with-linkedin/">apply with LinkedIn</a> &#8211; we&#8217;ll get there!). Of course, feel free to reach out to me personally at <a href="mailto:dtunkelang@linkedin.com">dtunkelang@linkedin.com</a>. We already have more applicants than slots, but I promise that every application will be considered. I&#8217;m very excited to recruit CMU students to strengthen our growing team of software engineers and data scientists.</p>
<p>See you soon, and let&#8217;s go Tartans!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/07/attention-cmu-students/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/07/attention-cmu-students/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dream. Fit. Passion.</title>
		<link>http://thenoisychannel.com/2011/08/21/dream-fit-passion/</link>
		<comments>http://thenoisychannel.com/2011/08/21/dream-fit-passion/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 01:14:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3804</guid>
		<description><![CDATA[A few days ago, our CEO Jeff Weiner led a session at LinkedIn on how to &#8220;close&#8221; candidates &#8212; that is, how to persuade candidates to join your team once you have found and interviewed them. Since not everyone has the opportunity to work at LinkedIn and experience Jeff&#8217;s leadership first-hand, I thought I&#8217;d share [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/File:Franz_Marc_-_Der_Traum_-_Google_Art_Project.jpg"><img class="alignnone" title="Franz Marc - The Dream" src="http://upload.wikimedia.org/wikipedia/commons/thumb/a/a1/Franz_Marc_-_Der_Traum_-_Google_Art_Project.jpg/500px-Franz_Marc_-_Der_Traum_-_Google_Art_Project.jpg" alt="" width="500" height="368" /></a></p>
<p>A few days ago, our CEO <a href="http://www.linkedin.com/in/jeffweiner08">Jeff Weiner</a> led a session at LinkedIn on how to &#8220;close&#8221; candidates &#8212; that is, how to persuade candidates to join your team once you have <a href="http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/ ">found</a> and <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/ ">interviewed</a> them. Since not everyone has the opportunity to <a href="http://engineering.linkedin.com/">work at LinkedIn</a> and experience Jeff&#8217;s leadership first-hand, I thought I&#8217;d share some of his wisdom here.</p>
<p>The key take-away  was that closing a candidate is not about selling the job or company to the candidate, but rather working with the candidate to figure out what the candidate wants and whether the job will help him or her achieve that desire. As an employer, you need to do three things to close a candidate:</p>
<p>1) Figure out what is the candidate&#8217;s dream.<br />
2) Determine if job and candidate are the right fit.<br />
3) Communicate your own passion.</p>
<p>Let&#8217;s take these one at a time.</p>
<p><strong>Dream.</strong></p>
<p>As I&#8217;ve written here in the past, we have to <a href="http://thenoisychannel.com/2011/01/17/dare-to-dream/">dare to dream</a>. Most of us rely on jobs to sustain us and our loved ones &#8212; and for some a job is nothing more than that. There&#8217;s no shame in having a dream that is unrelated to a job &#8212; Franz Kafka famously worked in a variety of &#8220;<a href="http://en.wikipedia.org/wiki/Franz_Kafka#Employment">bread jobs</a>&#8221; in order to pay the bills while he wrote novels. Others find their calling as humanitarians, activists, or care givers. It&#8217;s easy for many of us to forget that life isn&#8217;t always about work.</p>
<p>But the great thing about working in technology is that you can get paid to fulfill your own dream. Look at <a href="http://www.wired.com/wired/archive/13.08/battelle.html">Larry and Sergey</a>, who set out to organize the world&#8217;s information. Or <a href="http://books.simonandschuster.com/Steve-Jobs/Walter-Isaacson/9781442346956">Steve Jobs</a>, whose dream has been to create innovative products. Not everyone is as specific in their dreams or as successful in realizing them, but, as the saying goes, you have to be in it to win it.</p>
<p>Convincing a person to accept a job offer works best when that job brings the person closer to fulfilling his or her dream. My own decisions to go to <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">Google</a> and then <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/ ">LinkedIn</a> are good examples. Working at <a href="http://www.endeca.com/">Endeca</a> drove me to pursue a vision of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> &#8212; to optimize the way people and machines work together to solve information seeking and exploration tasks. At Google, I hoped to bring exploratory search to the open web. I&#8217;ll concede that I did not make much headway, but I&#8217;m glad that I tried.</p>
<p>And at LinkedIn, I work on problems that not only stretch the boundaries of information science, but whose solutions help millions of other people achieve their dreams by making them more successful professionally. My dream is to truly reduce HCIR to practice so that people can lead better and more productive lives. Once the folks at LinkedIn understood my dream, closing me was just a matter of offering me the keys to make that dream a reality.</p>
<p>If you want someone to work at your company, get to know that person&#8217;s dreams. If the job you are offering can&#8217;t help him or her realize those dreams, be honest about it. It&#8217;s better for both of you, and for a world that is better off with people devoting their lives&#8217; work to fulfilling their dreams.</p>
<p><strong>Fit.</strong></p>
<p>Fit is a two-way street: the candidate should be right for the job, and the job should be right for the candidate. The interviewing process typically focuses on establishing the former, but we often forget that the candidate&#8217;s decision focuses on the latter. Just because someone is capable of doing a job doesn&#8217;t mean it&#8217;s the right job for that person.</p>
<p>For me, fit means many things. A work environment where people work hard and take the company&#8217;s success personally. Incentives that allow everyone to win, rather than a zero-sum game where people compete for scarce opportunities. Openness, since I&#8217;m someone who lives most of my life <a href="http://www.forbes.com/2008/10/13/cio-mesh-collaboration-tech-cio-cx_dw_1014mesh.html">in public</a>. I could go on &#8212; but I hope you get the general idea. Fit is the set of functional and non-functional requirements that determine whether someone will enjoy a job. And people who enjoy their jobs tend to be productive and stay a while.</p>
<p>If you are trying to persuade someone to accept a job offer, you have to see the decision from that person&#8217;s point of view. In other words, ask yourself &#8212; and convincingly answer &#8212; why the job is the right fit for the candidate. That means accepting the possibility that it isn&#8217;t the right fit, and doing right by the candidate even if that means backing off.</p>
<p><strong>Passion.</strong></p>
<p>Choosing a job is one of the most important life decisions that people make. It&#8217;s not quite up there with getting married or having a child, but it&#8217;s a decision that most people take (and should take) very seriously. Some people create spreadsheets of the pros and cons to compare opportunities and try to frame their decision as an <a href="http://www.decisionmaking.org/career_decisionmaking.html">optimization problem</a>. Others go with their gut.</p>
<p>Those who know me personally &#8212; whether from face-to-face or online interaction &#8212; know that I wear my passion on my sleeve. I can&#8217;t understand how someone could get up in the morning and go to work without being passionate about his or her job. I know that many people don&#8217;t have a choice in the matter, and I pity them. In a country where most people take subsistence for granted, having a job you love strikes me as a necessity, rather than a luxury.</p>
<p>But what is clear is that if you, as an employer, are not passionate about what you do, you have no business expecting a candidate to take such a big leap of faith with you. Moreover, passion is hard to fake. As it should be &#8212; I&#8217;m not suggesting that employers should pretend to be excited about their jobs. Rather, your own sincere excitement is a baseline for those you hope to attract to your team. Passion is contagious, and passion is the raw material for making dreams come true.</p>
<p><strong>Dream. Fit. Passion.</strong></p>
<p>There you have it: dream, fit, passion. And remember, closing isn&#8217;t selling. Do right by the people you try to hire. After all, jobs are short, but careers are long. Celebrate everyone&#8217;s professional success, and take your losses in stride. I can tell you from experience that it all works out for the best.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/08/21/dream-fit-passion/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>Retiring a Great Interview Problem</title>
		<link>http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/</link>
		<comments>http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 07:27:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3767</guid>
		<description><![CDATA[Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find candidates who can write code. The tech press sporadically publishes &#8220;best&#8221; interview questions that make me cringe &#8212; though I love the IKEA question. Startups like Codility and Interview Street see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://search.dilbert.com/comic/Job%20Interview"><img class="alignnone" title="Job Interview on Dilbert.com" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/200/1222/1222.strip.gif" alt="" width="640" height="199" /></a></p>
<p>Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">candidates who can write code</a>. The tech press sporadically publishes <a href="http://royal.pingdom.com/2008/08/15/the-best-job-interview-questions-from-microsoft-google%E2%80%A6-and-ikea/">&#8220;best&#8221; interview questions</a> that make me cringe &#8212; though I love the <a href="http://farm4.static.flickr.com/3147/2765642306_fb64f6c3d7_o.jpg">IKEA question</a>. Startups like <a href="http://codility.com/">Codility</a> and <a href="http://www.interviewstreet.com/recruit/home/">Interview Street</a> see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop subjecting candidates to <a href="http://blog.indextank.com/1030/interviewing-engineers-enough-with-the-whiteboard-coding/">whiteboard coding exercises</a>.</p>
<p>I don&#8217;t have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to assess software engineering candidates. At best, they test only one desirable attribute; at worst, they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the key insight. Coding questions are a much better tool for assessing people whose day job will be coding, but conventional interviews &#8212; whether by phone or in person &#8212; are a suboptimal way to test coding strength. Also, it&#8217;s not clear whether a coding question should assess problem-solving, pure translation of a solution into working code, or both.</p>
<p>In the face of all of these challenges, I came up with an interview problem that has served me and others well for a few years at <a href="http://www.endeca.com/en/about-us/jobs.html">Endeca</a>, <a href="http://www.google.com/intl/ln/jobs/uslocations/new-york/swe/index.html">Google</a>, and <a href="http://engineering.linkedin.com/">LinkedIn</a>. It is with a heavy heart that I retire it, for reasons I&#8217;ll discuss at the end of the post. But first let me describe the problem and explain why it has been so effective.</p>
<p><strong>The Problem</strong></p>
<p>I call it the &#8220;word break&#8221; problem and describe it as follows:</p>
<pre>Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.</pre>
<p>Note that I&#8217;ve deliberately left some aspects of this problem vague or underspecified, giving the candidate an opportunity to flesh them out. Here are examples of questions a candidate might ask, and how I would answer them:</p>
<pre>Q: What if the input string is already a word in the
   dictionary?
A: A single word is a special case of a space-separated
   sequence of words.

Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.

Q: What if the input string cannot be segmented into a
   sequence of words in the dictionary?
A: Then return null or something equivalent.

Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence
   of exact words in the dictionary.

Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.

Q: I'm thinking of implementing the dictionary as a
   <a href="http://en.wikipedia.org/wiki/Trie">trie</a>, <a href="http://en.wikipedia.org/wiki/Suffix_tree">suffix tree</a>, <a href="http://en.wikipedia.org/wiki/Fibonacci_heap">Fibonacci heap</a>, ...
A: You don't need to implement the dictionary. Just
   assume access to a reasonable implementation.

Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.

Q: How big is the dictionary?
A: Assume it's much bigger than the input string,
   but that it fits in memory.</pre>
<p>Seeing how a candidate negotiates these details is instructive: it offers you a sense of the candidate&#8217;s communication skills and attention to detail, not to mention the candidate&#8217;s basic understanding of data structures and algorithms.</p>
<p><strong>A FizzBuzz Solution</strong></p>
<p>Enough with the problem specification and on to the solution. Some candidates start with the simplified version of the problem that only considers segmentations into two words. I consider this a <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">FizzBuzz</a> problem, and I expect any competent software engineer to produce the equivalent of the following in their programming language of choice. I&#8217;ll use Java in my example solutions.</p>
<pre>String SegmentString(String input, Set&lt;String&gt; dict) {
  int len = input.length();
  for (int i = 1; i &lt; len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      if (dict.contains(suffix)) {
        return prefix + " " + suffix;
      }
    }
  }
  return null;
}</pre>
<p>I have interviewed candidates who could not produce the above &#8212; including candidates who had passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great way to keep interviewers from wasting their time interviewing programmers who can&#8217;t program.</p>
<p><strong>A General Solution</strong></p>
<p>Of course, the more interesting problem is the general case, where the input string may be segmented into any number of dictionary words. There are a number of ways to approach this problem, but the most straightforward is <a href="http://en.wikipedia.org/wiki/Backtracking">recursive backtracking</a>. Here is a typical solution that builds on the previous one:</p>
<pre>String SegmentString(String input, Set&lt;String&gt; dict) {
  if (dict.contains(input)) return input;
  int len = input.length();
  for (int i = 1; i &lt; len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  return null;
}</pre>
<p>Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit <a href="http://www.cprogramming.com/tutorial/computersciencetheory/stack.html">stack</a>) in half an hour. I&#8217;m sure that many of them are competent and productive. But I would not hire them to work on <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> or <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> problems, especially at a company that delivers search functionality on a massive scale.</p>
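<p>For readers curious about the explicit-stack variant mentioned above, here is a minimal sketch of one way it might look. It is not a solution from any actual interview: the class name <code>WordBreakExplicitStack</code>, the helper <code>join</code>, and the choice to return null for an empty input are all assumptions made for illustration. The stack holds the start index of each word chosen so far, and popping it plays the role of returning from a failed recursive call.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: backtracking with an explicit stack instead of recursion.
final class WordBreakExplicitStack {

  static String segment(String input, Set<String> dict) {
    int n = input.length();
    if (n == 0) return null; // assumption: an empty input has no segmentation

    // Start indices of the words chosen so far; the top is the start of the current word.
    Deque<Integer> starts = new ArrayDeque<>();
    starts.push(0);
    int end = 1; // candidate end (exclusive) of the current word

    while (!starts.isEmpty()) {
      int start = starts.peek();
      // Grow the current word until it is a dictionary word or we run off the input.
      while (end <= n && !dict.contains(input.substring(start, end))) {
        end++;
      }
      if (end > n) {
        // Dead end: drop this word start and try a longer previous word.
        starts.pop();
        end = start + 1;
      } else if (end == n) {
        // The current word reaches the end of the input, so we have a segmentation.
        return join(input, starts, n);
      } else {
        // Accept the word input[start, end) and move on to the next one.
        starts.push(end);
        end++;
      }
    }
    return null; // every choice for the first word failed
  }

  // Rebuilds the space-separated words from the stacked start indices.
  private static String join(String input, Deque<Integer> starts, int n) {
    List<String> words = new ArrayList<>();
    int wordEnd = n;
    for (int wordStart : starts) { // iterates from the most recent start back to 0
      words.add(0, input.substring(wordStart, wordEnd));
      wordEnd = wordStart;
    }
    return String.join(" ", words);
  }

  public static void main(String[] args) {
    Set<String> dict = new HashSet<>(Arrays.asList("apple", "pie"));
    System.out.println(segment("applepie", dict));
  }
}
```

<p>This sketch tries the shortest candidate word first, so when several segmentations exist it may return a different one than the recursive version. That is fine under the rule that any valid segmentation is acceptable. Its worst-case running time is no better than that of the recursive solution.</p>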
<p><strong>Analyzing the Running Time</strong></p>
<p>But wait, there&#8217;s more! When a candidate does arrive at a solution like the above, I ask for a <a href="http://en.wikipedia.org/wiki/Big_O_notation">big O</a> analysis of its worst-case running time as a function of n, the length of the input string. I&#8217;ve heard candidates respond with everything from O(n) to O(n!).</p>
<p>I typically offer the following hint:</p>
<pre>Consider a pathological dictionary containing the words
"a", "aa", "aaa", ..., i.e., words composed solely of
the letter 'a'. What happens when the input string is a
sequence of n-1 'a's followed by a 'b'?</pre>
<p>Hopefully the candidate can figure out that the recursive backtracking solution will explore every possible segmentation of this input string, which reduces the analysis to determining the number of possible segmentations. I leave it as an exercise to the reader (with this <a href="http://en.wikipedia.org/wiki/Power_set">hint</a>) to determine that this number is O(2<sup>n</sup>).</p>
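<p>If you would rather check the growth empirically before working through the exercise, the following sketch instruments the recursive solution with a call counter and runs it on the pathological input from the hint. The class name <code>WordBreakCallCounter</code>, the <code>calls</code> field, and the <code>countCalls</code> helper are hypothetical names introduced only for this illustration.</p>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: count recursive calls on the pathological input.
final class WordBreakCallCounter {

  static long calls; // number of times segment() has been invoked

  // The recursive backtracking solution, with one extra line to count calls.
  static String segment(String input, Set<String> dict) {
    calls++;
    if (dict.contains(input)) return input;
    int len = input.length();
    for (int i = 1; i < len; i++) {
      String prefix = input.substring(0, i);
      if (dict.contains(prefix)) {
        String segSuffix = segment(input.substring(i, len), dict);
        if (segSuffix != null) {
          return prefix + " " + segSuffix;
        }
      }
    }
    return null;
  }

  // Builds the dictionary {"a", "aa", ...} and an input of n-1 'a's followed by 'b',
  // then returns how many calls the search makes before giving up.
  static long countCalls(int n) {
    Set<String> dict = new HashSet<>();
    StringBuilder run = new StringBuilder();
    for (int k = 1; k < n; k++) {
      run.append('a');
      dict.add(run.toString());
    }
    String input = run + "b"; // n-1 'a's followed by a 'b'
    calls = 0;
    segment(input, dict);
    return calls;
  }

  public static void main(String[] args) {
    for (int n = 1; n <= 16; n++) {
      System.out.println(n + " -> " + countCalls(n));
    }
  }
}
```

<p>On this input the count doubles each time n grows by one, because a call on a string of length m makes one recursive call for every shorter suffix.</p>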
<p><strong>An Efficient Solution</strong></p>
<p>If a candidate gets this far, I ask if it is possible to do better than O(2<sup>n</sup>). Most candidates realize this is a loaded question, and strong ones recognize the opportunity to apply <a href="http://20bits.com/articles/introduction-to-dynamic-programming/">dynamic programming</a> or <a href="http://en.wikipedia.org/wiki/Memoization">memoization</a>. Here is a solution using memoization:</p>
<pre>Map&lt;String, String&gt; memoized = new HashMap&lt;String, String&gt;();

String SegmentString(String input, Set&lt;String&gt; dict) {
  if (dict.contains(input)) return input;
  // Reuse the cached result if this suffix has already failed.
  if (memoized.containsKey(input)) {
    return memoized.get(input);
  }
  int len = input.length();
  for (int i = 1; i &lt; len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        <del datetime="2012-10-18T05:09:38+00:00">memoized.put(input, prefix + " " + segSuffix);</del>
        return prefix + " " + segSuffix;
      }
    }
  }
  // Only failures are cached: a success returns all the way up the call chain.
  memoized.put(input, null);
  return null;
}</pre>
<p>Again the candidate should be able to perform the worst-case analysis. The key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes. I leave it as an exercise to the reader to determine that the worst-case running time of the memoized solution above is O(n<sup>2</sup>), assuming that the substring operation only requires constant time (a discussion which itself makes for an <a href="http://stackoverflow.com/questions/4679746/time-complexity-of-javas-substring">interesting tangent</a>).</p>
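<p>The memoized solution works top-down. For comparison, here is a minimal sketch of what a bottom-up dynamic programming version might look like. The class name <code>WordBreakDp</code> and the array <code>lastStart</code> are hypothetical names chosen for this illustration, and returning null for an empty input is an assumption. The table records, for each prefix of the input, where the last word of one valid segmentation begins.</p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: bottom-up dynamic programming for the word break problem.
final class WordBreakDp {

  static String segment(String input, Set<String> dict) {
    int n = input.length();
    if (n == 0) return null; // assumption: an empty input has no segmentation

    // lastStart[j] is the start index of a dictionary word ending at j whose
    // preceding text input[0, lastStart[j]) is itself segmentable;
    // -1 means the prefix input[0, j) cannot be segmented.
    int[] lastStart = new int[n + 1];
    Arrays.fill(lastStart, -1);
    lastStart[0] = 0; // the empty prefix is trivially segmentable

    for (int j = 1; j <= n; j++) {
      for (int i = 0; i < j; i++) {
        if (lastStart[i] >= 0 && dict.contains(input.substring(i, j))) {
          lastStart[j] = i;
          break; // any valid segmentation will do
        }
      }
    }
    if (lastStart[n] < 0) return null;

    // Walk the table backwards from the end to recover the words in order.
    Deque<String> words = new ArrayDeque<>();
    for (int j = n; j > 0; j = lastStart[j]) {
      words.addFirst(input.substring(lastStart[j], j));
    }
    return String.join(" ", words);
  }

  public static void main(String[] args) {
    Set<String> dict = new HashSet<>(Arrays.asList("apple", "pie"));
    System.out.println(segment("applepie", dict));
  }
}
```

<p>The two nested loops perform O(n<sup>2</sup>) dictionary lookups, matching the memoized version under the same assumption about the cost of substring. The iterative form also avoids deep recursion on long inputs.</p>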
<p><strong>Why I Love This Problem</strong></p>
<p>There are lots of reasons I love this problem. I'll enumerate a few:</p>
<ul>
<li>It is a real problem that came up in the course of developing production software. I developed Endeca's original implementation for rewriting search queries, and this problem came up in the context of spelling correction and thesaurus expansion.</li>
<li>It does not require any specialized knowledge -- just strings, sets, maps, recursion, and a simple application of dynamic programming / memoization. Basics that are covered in a first- or second-year undergraduate course in computer science.</li>
<li>The code is non-trivial but compact enough to use under the tight conditions of a 45-minute interview, whether in person or over the phone using a tool like <a href="http://collabedit.com/">Collabedit</a>.</li>
<li>The problem is challenging, but it isn't a gotcha problem. Rather, it requires a methodical analysis of the problem and the application of basic computer science tools.</li>
<li>The candidate's performance on the problem isn't binary. The worst candidates don't even manage to implement the FizzBuzz solution in 45 minutes. The best implement a memoized solution in 10 minutes, allowing you to make the problem even more interesting, e.g., asking how they would handle a dictionary too large to fit in main memory. Most candidates perform somewhere in the middle.</li>
</ul>
<p><strong>Happy Retirement</strong></p>
<p>Unfortunately, all good things come to an end. I recently discovered that a candidate posted this problem on <a href="http://www.glassdoor.com/">Glassdoor</a>. The solution posted there hardly goes into the level of detail I've provided in this post, but I decided that a problem this good deserved to retire in style.</p>
<p>It's hard to come up with good interview problems, and it's also hard to keep secrets. <a href="http://thenoisychannel.com/2010/12/22/the-secret-may-be-to-keep-fewer-secrets/">The secret may be to keep fewer secrets.</a> An ideal interview question is one for which advance knowledge has limited value. I'm working with my colleagues on such an approach. Naturally, I'll share more if and when we deploy it.</p>
<p>In the meantime, I hope that everyone who experienced the word break problem appreciated it as a worthy test of their skills. No problem is perfect, nor can performance on a single interview question ever be a perfect predictor of how well a candidate will perform as an engineer. Still, this one was pretty good, and I know that a bunch of us will miss it.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/feed/</wfw:commentRss>
		<slash:comments>105</slash:comments>
		</item>
		<item>
		<title>Upcoming Information Retrieval Conferences</title>
		<link>http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/</link>
		<comments>http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 21:11:19 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3748</guid>
		<description><![CDATA[I hope everyone who attended the recent SIGIR 2011 in Beijing had an excellent experience. I didn&#8217;t manage to make it to that side of the globe myself, but I&#8217;m looking forward to hearing back from my LinkedIn colleagues who were there &#8212; particularly Paul Ogilvie, who gave an invited talk at the first Workshop on Entity-Oriented [...]]]></description>
				<content:encoded><![CDATA[<p>I hope everyone who attended the recent <a href="http://www.sigir2011.org/">SIGIR 2011</a> in Beijing had an excellent experience. I didn&#8217;t manage to make it to that side of the globe myself, but I&#8217;m looking forward to hearing back from my <a href="http://engineering.linkedin.com/">LinkedIn</a> colleagues who were there &#8212; particularly <a href="http://www.linkedin.com/in/paulogilvie">Paul Ogilvie</a>, who gave an invited talk at the first Workshop on Entity-Oriented Search (EOS) on &#8220;Anchoring Relevance with Entities&#8221;.</p>
<p>There are four outstanding information retrieval conferences coming up, and I will have the pleasure of participating in three of them. I&#8217;d like to make sure readers here are aware of all of them.</p>
<p><a href="http://www.kdd.org/kdd2011/"><img class="alignnone" title="KDD 2011" src="http://www.kdd.org/kdd2011/images/KDD_Banner_10_Jan.jpg" alt="" width="757" height="86" /></a></p>
<p>The first is <a href="http://www.kdd.org/kdd2011/">KDD 2011</a>, which will take place August 21-24, 2011 in San Diego, CA. The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition, and will bring hundreds of practitioners and academic data miners together in one location.</p>
<p>I will not be attending KDD myself, but several of my colleagues will be there. In particular, <a href="http://www.linkedin.com/in/bekkerman">Ron Bekkerman</a> will be presenting a paper on &#8220;High-Precision Phrase-Based Document Classification on a Modern Scale&#8221;, as well as offering a tutorial on &#8220;Scaling Up Machine Learning: Parallel and Distributed Approaches&#8221;.</p>
<p><strong><a href="http://hcir.info/hcir-2011/">Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011)</a> - Mountain View, CA &#8211; October 20, 2011</strong></p>
<p>The second is <a href="http://hcir.info/hcir-2011/">HCIR 2011</a>, the fifth annual HCIR workshop, which I am co-organizing. It will be held all day on Thursday, October 20th, 2011 at Google&#8217;s main campus in Mountain View, California. There will be a reception on Wednesday evening before the workshop. Our keynote speaker this year will be Gary Marchionini, Dean of the School of Information and Library Science, University of North Carolina at Chapel Hill. We are also excited to continue the HCIR Challenge, this year focusing on the problem of information availability, where the seeker faces uncertainty as to whether the information of interest is available at all. The corpus will be the CiteSeer digital library of scientific literature, which contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.</p>
<p>Thanks to generous contributions made by Google, Microsoft Research, and Endeca, there will be no registration fee for HCIR this year. Information about how to register will be sent to authors of accepted position papers, research papers, and challenge reports. Note that the submission deadline has been <strong>extended by two weeks to Sunday, August 14th</strong>. I strongly encourage you to submit in one of these categories if you are working in this field.</p>
<p><a href="http://recsys.acm.org/2011/"><img class="alignnone size-full wp-image-3753" title="RecSys 2011" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/recsys11.png" alt="" width="586" height="178" /></a></p>
<p>The third is <a href="http://recsys.acm.org/2011/">RecSys 2011</a>, the 5th ACM International Conference on Recommender Systems. RecSys 2011 builds on the success of the Recommenders 06 Summer School in Bilbao, Spain and the series of four successful conference events from 2007 to 2010 in Minneapolis (2007), Lausanne (2008), New York (2009) and Barcelona (2010). In these events many members of the practitioner and research communities valued the rich exchange of ideas made possible by the shared plenary sessions. The 5th International conference will promote the same close interaction among practitioners and researchers.</p>
<p>I will be giving a tutorial at RecSys 2011 on &#8220;Recommendations as a Conversation with the User&#8221;.</p>
<p><a href="http://www.cikm2011.org/"><img class="alignnone" title="CIKM 2011" src="http://www.cikm2011.org/sites/default/files/cikm2011_craigm_v1_logo.jpg" alt="" width="491" height="78" /></a></p>
<p>The fourth is <a href="http://www.cikm2011.org/">CIKM 2011</a>, the 20th ACM Conference on Information and Knowledge Management. It will take place in Glasgow, Scotland, UK, 24th-28th October 2011. Since 1992, CIKM has successfully brought together leading researchers and developers from the database, information retrieval, and knowledge management communities. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future research directions through the publication of high quality, applied and theoretical research findings. CIKM 2011 will continue the tradition of promoting collaboration across databases, information retrieval, and knowledge management.</p>
<p>I am proud to be organizing the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which will feature such industry heavyweights as <a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a> (Microsoft Research), <a href="http://www.freebase.com/view/en/john_giannandrea">John Giannandrea</a> (Google), <a href="http://research.yahoo.com/Vanja_Josifovski">Vanja Josifovski</a> (Yahoo! Research), <a href="http://company.yandex.com/corporate_governance/board_of_directors/ilya_segalovich.xml">Ilya Segalovich</a> (Yandex), <a href="http://jeffhammerbacher.com/">Jeff Hammerbacher</a> (Cloudera), and <a href="http://www.linkedin.com/in/chavdarbotev">Chavdar Botev</a> (LinkedIn).</p>
<p>I&#8217;m very excited about all four of these opportunities to exchange ideas about information retrieval and related areas, and I am grateful to LinkedIn for supporting my participation, as well as that of my colleagues. I hope to see some of you at these events!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Attention vs. Privacy</title>
		<link>http://thenoisychannel.com/2011/07/24/attention-vs-privacy/</link>
		<comments>http://thenoisychannel.com/2011/07/24/attention-vs-privacy/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 06:22:22 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3740</guid>
		<description><![CDATA[A major feature of the recently released Google+ is Circles, which allows you to &#8220;share relevant content with the right people, and follow content posted by people you find interesting.&#8221; Most people seem to look at Circles as a privacy feature &#8212; and indeed Google&#8217;s official description gives the impression that Circles exist to manage [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3741" title="Attention" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/attention.jpg" alt="" width="256" height="256" /></p>
<p>A major feature of the recently released <a href="https://plus.google.com/">Google+</a> is <a href="http://www.google.com/support/+/bin/static.py?hl=en&amp;page=guide.cs&amp;guide=1257347&amp;rd=1">Circles</a>, which allows you to &#8220;share relevant content with the right people, and follow content posted by people you find interesting.&#8221;</p>
<p>Most people seem to look at Circles as a privacy feature &#8212; and indeed Google&#8217;s official description gives the impression that Circles exist to manage privacy based on real-life social contexts. Of course, re-sharing can result in unintended consequences, and Google even offers a <a href="http://www.google.com/support/+/bin/static.py?hl=en&amp;page=guide.cs&amp;guide=1358057&amp;answer=1297219&amp;rd=1">warning</a> that:</p>
<blockquote><p>Unless you disable reshares, anything you share (either publicly or with your circles) can be reshared beyond the original people you shared the content with. This could happen either through reshares or through mentions in comments.</p></blockquote>
<p>Privacy is a big deal, <a href="http://ftc.gov/opa/2011/03/google.shtm">especially for Google</a> &#8212; and particularly in the context of rolling out a new social network. Still, I&#8217;m not persuaded that privacy is the only or even the primary concern motivating the concept of <a href="http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/">social circles</a>.</p>
<p>Sharing content with someone is not just about giving that person permission to see it. Sharing content with someone asserts a claim on that person&#8217;s <a href="http://thenoisychannel.com/2008/12/17/the-macroeconomics-of-information-and-attention-how-people-make-decisions/">attention</a>. While it may be a privilege for me to have access to your content, it may be even more of a privilege for you that I allocate my scarce attention to consume it.</p>
<p>What if we focus on routing content to the people who would find it most interesting? Such an approach works best if all of the shared content is <a href="http://thenoisychannel.com/2008/11/27/when-in-doubt-make-it-public/">public</a> with respect to permissions &#8212; that is, people post it without any expectation of privacy. Twitter demonstrates that many people are comfortable with such a sharing model. Imagine if they could learn to trust a system that optimizes (or at least attempts to optimize) the allocation of everyone&#8217;s attention. This is not an easy problem by any means, nor is it one that is likely to be solved by algorithms alone. It will take a strong dose of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> to get it right. But, at least in my view, optimizing the allocation of human attention is the grand challenge that everyone working with information retrieval or social networks should be striving to address.</p>
<p>Privacy is important, and social networks should offer simple, robust privacy controls that users understand. We all have experienced the problem of <a href="http://thenoisychannel.com/2008/09/23/quick-bites-filter-failure/">filter failure</a>. But sharing isn&#8217;t just about privacy. Our attention is our most precious cognitive asset, both as individuals and as a society. Moreover, our attention faces ever-increasing demands as our social lives evolve in an online world relatively free of physical constraints. Social network developers would do well to pay attention&#8230;to attention.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/24/attention-vs-privacy/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Guest Post: Diego Basch on The Need for Speed</title>
		<link>http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/</link>
		<comments>http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/#comments</comments>
		<pubDate>Mon, 18 Jul 2011 00:05:00 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3718</guid>
		<description><![CDATA[Diego Basch is the CEO and founder of IndexTank, a hosted search service that powers major web sites such as Reddit, Twitvid, blip.tv, as well as providing a WordPress plug-in for blogs (like this one). Diego gained his search experience working with Inktomi, where he wrote some of the world&#8217;s first web-scale link analysis algorithms. He [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3720" title="Diego Basch" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/dbasch.jpg" alt="" width="180" height="180" /></p>
<p><a href="http://indextank.com/"><img class="alignnone" title="IndexTank" src="http://indextank.com/_static/common/images/logo.gif" alt="" width="149" height="37" /></a></p>
<p><em>Diego Basch is the CEO and founder of <a href="http://indextank.com/">IndexTank</a>, a hosted search service that powers major web sites such as Reddit, Twitvid, blip.tv, as well as providing a WordPress plug-in for blogs (like this one). Diego gained his search experience working with Inktomi, where he wrote some of the world&#8217;s first web-scale link analysis algorithms. He is on a mission to make every search box blazing fast and useful.</em></p>
<p>So much brainpower is spent solving the wrong problems. The world is filled with solutions looking for problems that nobody has &#8212; as illustrated by a Google query for [<a href="http://www.google.com/search?sourceid=chrome&amp;ie=UTF-8&amp;q=stupidest+inventions+ever">stupidest inventions ever</a>]. More often, people focus narrowly on a particular approach when they should focus on the problem the approach is intended to solve. Or they take a solution for one problem and assume it will apply to another.</p>
<p>Consider the emphasis that search engine developers place on relevance ranking. It is not hard to understand why web-scale search engines emphasize relevance. For example, a search on Google for [<a href="http://www.google.com/search?aq=f&amp;sourceid=chrome&amp;ie=UTF-8&amp;q=emergency+locksmith">emergency locksmith</a>] returns tens of billions of web pages, among which there are only a handful of results that you want. Google must filter out the growing number of <a href="http://www.nytimes.com/2011/07/10/your-money/lead-gen-sites-pose-challenge-to-google-the-haggler.html?_r=2">lead generation companies</a> that spend a ton of money trying to game its results.</p>
<p>Most web and application developers are familiar with the concept of relevance, so they naturally assume that it should be the primary concern when they add search to their own sites or apps. When I talk to people who want full-text search for their 40,000 book titles or 100k classified ads, they ask me about all the ways they can tune relevance. But often they are focusing on a solution, rather than their fundamental problem.</p>
<p>Developers are (or should be!) trying to improve the user experience of their application search. Too often they wrongly assume that relevance is the single most important factor for optimizing this user experience. Let&#8217;s surface this confusion in a concrete example.</p>
<p>As a rock climber, once in a while I feel the aches and pains caused by the sport. As the years go by it&#8217;s very important to keep your tendons healthy if you do not want to take forced breaks (or type with one hand!). <a href="http://rockclimbing.com/" target="_blank">Rockclimbing.com</a> is one of the most popular climbing sites, and I know some medical professionals who occasionally answer health-related questions there. Let&#8217;s search there for [<a href="http://www.rockclimbing.com/cgi-bin/forum/gforum.cgi?do=search_results&amp;search_forum=all&amp;search_string=tendon%20injury%20prevention&amp;sb=score&amp;mh=25" target="_blank">tendon injury prevention</a>].</p>
<p><img class="alignnone size-full wp-image-3736" title="tendon injury prevention" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/tendoninjuryprevention.png" alt="" width="806" height="626" /></p>
<p>In the above example, part of the problem is that the search results do not have contextual snippets. Maybe there is relevant information hiding behind a click, but the user has no way of knowing. More generally, there&#8217;s no hint as to which results could be better. Information such as the score of the answer (which is available) or the author&#8217;s bio (e.g. &#8220;climber, physical therapist&#8221;) would make the decision easier. If you need to click and scroll, search within the page, go back and try something else, then the search engine is wasting your time.</p>
<p>Which brings us to the broader point: when users search, they want to spend the least amount of time possible getting to the information they want. Relevance is a means to this end. In particular, clicks and typing cost users time. That time can come from page load, rendering, repeated use of the back button, and of course typing (and re-typing) search queries.</p>
<p>Some application search engines really nail the user experience. Let’s say we’re looking for the movie Koyaanits-however-you-spell-it. Go to the <a href="http://imdb.com/">Internet Movie Database</a> (IMDB) and start typing k-o-y-e &#8212; and there it is, as the second result. Notice that there is a ton of irrelevant stuff around it but it doesn’t matter. I see what I want very quickly.</p>
<p><img src="https://lh5.googleusercontent.com/FBZy2xxDowcimmRbJbtwqFixN375kw6a5JM5UJmin_m1IWrdKGSGwSRzvDfCj6esLW4pXBD5K-SA8JLdmi54xxnmOoliII8u66KeLKuC59ZL7VhcdcQ" alt="" width="382px;" height="348px;" /></p>
<p>Hopefully these two examples serve to illustrate the broader point: search engines should not focus on relevance as an end in itself, but rather on whatever helps users find the information they want as quickly as possible. That means offering contextual snippets, instant feedback, and of course snappy response times. Give users speed, and you will make them happy.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Google±?</title>
		<link>http://thenoisychannel.com/2011/07/04/google%c2%b1/</link>
		<comments>http://thenoisychannel.com/2011/07/04/google%c2%b1/#comments</comments>
		<pubDate>Mon, 04 Jul 2011 22:55:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3703</guid>
		<description><![CDATA[When I left Google last December, it was an open secret that Google was developing a social networking product. Now that Google has released Google+, I am at liberty to share my personal impressions. Let&#8217;s start with the clear wins. Impressive launch. Google has certainly learned its lesson from the past launches of Wave and Buzz. [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://plus.google.com/"><img class="alignnone size-full wp-image-3707" title="Google+" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/Google+.png" alt="" width="500" height="477" /></a></p>
<p>When I <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">left Google</a> last December, it was an <a href="http://techcrunch.com/2010/12/01/google-social-emerald-sea/">open secret</a> that Google was developing a social networking product. Now that Google has released <a href="http://plus.google.com/">Google+</a>, I am at liberty to share my personal impressions.</p>
<p>Let&#8217;s start with the clear wins.</p>
<ul>
<li><strong>Impressive launch.</strong> Google has certainly learned its lesson from the past launches of <a href="http://mashable.com/2010/08/04/rip-google-wave/">Wave</a> and <a href="http://www.quora.com/Why-did-Google-Buzz-fail">Buzz</a>. Google+ is unambiguously opt-in &#8212; no one is going to complain about being <a href="http://techcrunch.com/2011/03/30/reid-hoffman-data-ambush/">ambushed</a>. People have been begging for invites. But Google is wisely releasing invites quickly enough to build critical mass. I&#8217;d say that Google has at least picked up the <a href="http://www.quora.com/">Quora</a> crowd of early adopters in Silicon Valley.</li>
</ul>
<ul>
<li><strong>Clean design.</strong> Design lead <a href="http://techcrunch.com/2011/06/28/google-plus-design-andy-hertzfeld/">Andy Hertzfeld</a> (of Macintosh fame) has nailed it, leading bloggers to comment that this looks too well designed to be a Google product. Comparing Google+ to Facebook now, I&#8217;m reminded at least a little of comparisons between Facebook and Myspace. Great move for Google here.</li>
</ul>
<p>Now let&#8217;s talk about Google&#8217;s three big features here: Circles, Sparks, and Hangouts.</p>
<ul>
<li><strong>Circles.</strong> Straight out of Paul Adams&#8217;s <a href="http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/">presentation on social networking</a> (which he created before he <a href="http://techcrunch.com/2011/07/01/paul-adams-seeing-google-in-public-is-like-bumping-into-an-ex-girlfriend/">left Google for Facebook</a>), the idea is simple: a person doesn&#8217;t have a single group of friends, but rather several groups that tend to be mostly disjoint. Through Circles, Google+ makes this soft partitioning of the social space a core design principle. You add people to one or more circles, follow the stream of activity from a circle, and share with circles. It&#8217;s great in theory. But in practice it creates friction, especially for people trained on Facebook. There&#8217;s a trade-off between simplicity and expressive power, and Google is placing a strong bet on how users will make this trade-off. I&#8217;m inclined to agree with <a href="http://www.quora.com/Yishan-Wong/How-Google+-Shows-That-Google-Still-Doesnt-Understand-Social">Yishan Wong</a> that &#8220;the sorting of friends into buckets (friend lists) is something that only nerds do&#8221;. Given Google&#8217;s deep expertise in machine learning, I&#8217;m expecting Google to reduce this friction by giving users intelligent suggestions. <em>Full disclosure: my colleagues at LinkedIn built <a href="http://blog.linkedin.com/2011/01/24/linkedin-inmaps/">InMaps</a>, which infers communities from your social network.</em></li>
</ul>
<ul>
<li><strong>Sparks.</strong> The tagline for Sparks is &#8220;For nerding out. Together.&#8221; It feels like a positioning designed by Googlers for Googlers &#8212; you can see promotional videos <a href="http://www.youtube.com/watch?v=MRkAdTflltcgoo">here</a> and <a href="http://www.youtube.com/watch?v=0DoAl4JXhQo">here</a>. I haven&#8217;t seen much talk about Sparks, and what little commentary I&#8217;ve seen is less than gushing. I&#8217;ve experimented with it a bit from a consumption side, and I confess I&#8217;m underwhelmed. Perhaps it&#8217;s a chicken-and-egg problem &#8212; Sparks will only be useful if users populate their profiles with interests, but right now users have no incentive to do so. If Sparks is Google&#8217;s attempt to make <a href="http://en.wikipedia.org/wiki/Google_Reader">Reader</a> more social, there&#8217;s still a ways to go. <em>Full disclosure: LinkedIn has its own approach to social news, <a href="http://blog.linkedin.com/2011/03/10/linkedin-today/">LinkedIn Today</a>, which seems to be <a href="http://techcrunch.com/2011/06/30/linkedin-traffic-twitter/">doing something right</a>. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  </em></li>
</ul>
<ul>
<li><strong>Hangouts.</strong> In plain English, Hangouts are group video chat embedded in a social network. Which sounds a lot like what Facebook is <a href="http://techcrunch.com/2011/07/01/facebook-will-launch-in-browser-video-chat-next-week-in-partnership-with-skype/">rumored</a> to be releasing this week through a partnership with Skype. Which in turn was just <a href="http://www.microsoft.com/presspass/press/2011/may11/05-10corpnewspr.mspx">acquired by Microsoft</a>. Will Apple join the party too by implementing group chat in <a href="http://www.apple.com/mac/facetime/">FaceTime</a>? Competitive dynamics aside, this is a very cool feature that hopefully won&#8217;t devolve into <a href="http://en.wikipedia.org/wiki/Chatroulette">Chatroulette</a>. Nothing to, um, disclose here.</li>
</ul>
<p>But the $64B question is whether all this will matter. Can Google+ sustainably co-exist with Facebook? Will people use both services &#8212; and, if so, how will they allocate their attention between them? Or is the success of Google+ predicated on displacing Facebook? Or Twitter? Either of those would certainly qualify as a <a href="http://en.wikipedia.org/wiki/Big_Hairy_Audacious_Goal">Big Hairy Audacious Goal</a>.</p>
<p>Like <a href="http://www.avc.com/a_vc/2011/07/why-im-rooting-for-google.html">Fred Wilson</a>, I&#8217;m rooting for Google+ to succeed &#8212; but even Fred <a href="http://www.avc.com/a_vc/2011/07/why-im-rooting-for-google.html#comment-240598057">notes</a> that he would not be able to get his family on Google+, as they are already happy with Facebook. It&#8217;s not clear to me what I can get *today* from Google+ that I can&#8217;t get from Facebook.</p>
<p>Granted, I&#8217;m not a heavy Facebook user, so I&#8217;m not the best person to ask this question. So readers, I ask you: why will or won&#8217;t you use Google+?</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/07/04/google%c2%b1/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/04/google%c2%b1/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>InSecret: A LinkedIn Hackday Master Tries Something Different</title>
		<link>http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/</link>
		<comments>http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/#comments</comments>
		<pubDate>Sat, 25 Jun 2011 04:48:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3692</guid>
		<description><![CDATA[LinkedIn Hackdays are an awesome opportunity for innovation &#8212; learn more about them here. But first check out this unusual entry by Hackday master Dhananjay Ragade, whose previous hacks include the LinkedIn Year in Review: Don&#8217;t worry, it&#8217;s safe for work. Well, unless you work for Linden Lab.]]></description>
				<content:encoded><![CDATA[<p>LinkedIn Hackdays are an awesome opportunity for innovation &#8212; learn more about them <a href="http://blog.linkedin.com/category/linkedin-hackdays/">here</a>. But first check out this unusual entry by Hackday master <a href="http://www.linkedin.com/in/dragade">Dhananjay Ragade</a>, whose previous hacks include the <a href="http://blog.linkedin.com/2011/02/22/linkedin-year-in-review/">LinkedIn Year in Review</a>:</p>
<p><embed width="504" height="312" src="http://www.xtranormal.com/site_media/players/jw_player_v54/player.swf" flashvars="&amp;author=drr&amp;autostart=false&amp;backcolor=0x000000&amp;date=June%2013%2C%202011&amp;description=Sheldon%20wants%20to%20know%20Jane's%20secret%20to%20her%20success.&amp;fbit.height=283&amp;fbit.visible=true&amp;fbit.width=504&amp;fbit.x=0&amp;fbit.y=0&amp;file=http%3A%2F%2Ffarmprod.content.xtranormal.com%2F2011-06-18%2Fpublish%2Fe8937f92-99f5-11e0-aece-123138070614.mp4&amp;frontcolor=0xeeeeee&amp;gapro.accountid=UA-5134028-2&amp;gapro.height=283&amp;gapro.visible=true&amp;gapro.width=504&amp;gapro.x=0&amp;gapro.y=0&amp;image=http%3A%2F%2Ffarmprod.content.xtranormal.com%2F2011-06-18%2Fpublish%2Fe8937f92-99f5-11e0-aece-123138070614.png&amp;lightcolor=0xeeeeee&amp;link=http%3A%2F%2Fwww.xtranormal.com%2Fwatch%2F12209260%2Finsecret&amp;plugins=fbit-1%2Ctweetit-1%2Cviral-2%2Cgapro&amp;screencolor=0x000000&amp;skin=http%3A%2F%2Fwww.xtranormal.com%2Fsite_media%2Fplayers%2Fjw_player_v54%2Fxn.xml&amp;title=InSecret&amp;tweetit.height=283&amp;tweetit.visible=true&amp;tweetit.width=504&amp;tweetit.x=0&amp;tweetit.y=0" allowfullscreen="true" allowscriptaccess="always" bgcolor="0x000000"></embed></p>
<p>Don&#8217;t worry, it&#8217;s safe for work. Well, unless you work for Linden Lab. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>It Just Works</title>
		<link>http://thenoisychannel.com/2011/06/23/it-just-works/</link>
		<comments>http://thenoisychannel.com/2011/06/23/it-just-works/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 06:20:43 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3682</guid>
		<description><![CDATA[Given that I work for the world&#8217;s largest professional network, I take work very personally. I&#8217;m also deeply involved in LinkedIn&#8217;s hiring process, which gives me opportunities to see how people make career decisions. I thought I&#8217;d share my own perspective here. For me there are three things that matter to me about my work: [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Maslow's_hierarchy_of_needs"><img class="alignnone" title="Maslow's Hierarchy of Needs" src="http://upload.wikimedia.org/wikipedia/commons/6/60/Maslow%27s_Hierarchy_of_Needs.svg" alt="" width="491" height="369" /></a></p>
<p>Given that I work for the <a href="http://www.linkedin.com/">world&#8217;s largest professional network</a>, I take work very personally. I&#8217;m also deeply involved in LinkedIn&#8217;s hiring process, which gives me opportunities to see how people make career decisions. I thought I&#8217;d share my own perspective here.</p>
<p>There are three things that matter to me about my work:</p>
<ol>
<li><strong>Do I love the work I do? </strong>Does work feel like play, stimulating me intellectually and emotionally? Am I excited about the people I work with? Is work a grind, or is it something I do for fun?</li>
<li><strong>Is the work I do of value to my employer?</strong> Am I justifying my employer&#8217;s investment in me, or am I a freeloader lost in the inefficiency of corporate bureaucracy?</li>
<li><strong>Is my work making the world a better place?</strong> Specifically, is the work I do making the world more like the world I want to live in?</li>
</ol>
<p>Not everyone may share my above values, and in any case not every job can address all of these values. But I am fortunate to have found one that does, and I&#8217;m loving it. To borrow a phrase, it just works.</p>
<p>If you haven&#8217;t seen this video by Dan Pink on what motivates people, I urge you to watch it. It&#8217;s a great reminder that there is more to motivation than economic incentives.</p>
<p><iframe width="490" height="305" src="http://www.youtube.com/embed/u6XAPnuFjJc?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>Finally, I hope that you are doing work that fulfills you. As I work to grow my <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">great team at LinkedIn</a>, my mission is not only to bring great people to LinkedIn, but also to bring great work and fulfillment to great people. Whatever you do, be amazing.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/23/it-just-works/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/23/it-just-works/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Foo for Thought</title>
		<link>http://thenoisychannel.com/2011/06/18/foo-for-thought/</link>
		<comments>http://thenoisychannel.com/2011/06/18/foo-for-thought/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 20:29:29 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3678</guid>
		<description><![CDATA[Last weekend I had the extraordinary privilege to attend Foo Camp, an annual gathering of about 250 Friends Of O&#8217;Reilly (aka Foo). Tim O&#8217;Reilly, Sara Winge, and their colleagues have amazing friends, as you can see if you scan this unofficial list of attendees working on big data, open government, computer security, and more generally [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone" title="Foo Camp (photo by Jeremy Zawodny)" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/06/foo-camp.jpg" alt="" width="500" height="366" /></p>
<p>Last weekend I had the extraordinary privilege to attend Foo Camp, an annual gathering of about 250 Friends Of O&#8217;Reilly (aka Foo). <a href="http://radar.oreilly.com/tim/">Tim O&#8217;Reilly</a>, <a href="http://radar.oreilly.com/sara/">Sara Winge</a>, and their colleagues have amazing friends, as you can see if you scan this <a href="http://twitter.com/#!/mrflip/foocamp/members">unofficial list of attendees</a> working on big data, open government, computer security, and more generally on the cutting edge of technology and culture (especially where the two overlap).</p>
<p>Foo Camp is an <a href="http://en.wikipedia.org/wiki/Unconference">unconference</a>, which merits some elaboration. No fees, no conference hotel (many attendees literally set up camp in the space O&#8217;Reilly provided), and no advance program aside from some preselected 5-minute <a href="http://ignite.oreilly.com/">Ignite</a> presentations. Attendees proposed and organized sessions, merging and re-arranging them to optimize for participation. It was a bit chaotic (especially the mad rush after dinner to secure session slots), but very effective.</p>
<p>The minimalist format brought out the best in participants.</p>
<p>For example, I am passionate about (i.e., against) software patents, so I organized a session about them. I did a double-take when I realized that one of the participants was <a href="http://people.ischool.berkeley.edu/~pam/">Pamela Samuelson</a>, perhaps the world&#8217;s top expert on intellectual property law. I braced myself to be schooled &#8212; as I was. But she did it gently and constructively. Specifically, she pointed me to work that her colleagues <a href="http://www.law.berkeley.edu/4457.htm">Jason Schultz</a> and <a href="http://www.law.berkeley.edu/9959.htm">Jennifer Urban</a> were doing on a defensive patent strategy for open-source software (including a <a href="http://events.stanford.edu/events/276/27687/">proposed license</a>), as well as reminding me of the <a href="http://radar.oreilly.com/2010/07/why-software-startups-decide-t.html">Berkeley Patent Survey</a> supporting the argument that software entrepreneurs only file for patents because of real or perceived pressure from their investors. I also heard war stories from lawyers who have done pro bono work against patent trolls, reinforcing my own resolve and also reassuring me that the examples I&#8217;ve seen <a href="http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/">at close range</a> are not isolated.</p>
<p>Another session asked whether we are too data driven in our work. What was notable is that this session included participants from some of the largest internet companies debating some of the most fundamental ways in which we work, e.g., do we actually learn from data or do we engage in assault by data to defend preconceived positions (cf. <a href="http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/">argumentative theory</a>). Like all of the conference, the discussion was under &#8220;frieNDA&#8221;, so I&#8217;m being intentionally vague on the specifics. But it was refreshing to see a candid admission that all of us know and have experienced the dangers of manipulating an audience with data, and that there are no algorithms to enforce common sense and good faith.</p>
<p>I won&#8217;t even try to enumerate the sessions and side conversations that excited me &#8212; topics included privacy, the future of publishing, a critical analysis of geek culture, and irrational user behavior. I missed the session on data-driven parenting, though others have pointed out to me that you can only learn so much if you don&#8217;t have twins and perform <a href="http://en.wikipedia.org/wiki/A/B_testing">A/B tests</a>. The best summary is intellectual diversity and overstimulation. If you&#8217;d like to get a general sense of the discussion, check out the <a href="http://twitter.com/#!/search/%23foocamp">#foocamp</a> tweet stream. I also recommend Scott Berkun&#8217;s post on &#8220;<a href="http://www.scottberkun.com/blog/2011/what-i-learned-at-foo-camp-11/">What I learned at FOO Camp</a>&#8221;.</p>
<p>As someone who organizes the <a href="http://hcir.info/hcir-2011/">occasional</a> <a href="http://www.cikm2011.org/industryevent">event</a>, I&#8217;m intrigued by the unconference approach &#8212; especially now that I&#8217;ve experienced it first-hand. Moreover, I feel strongly that <a href="http://thenoisychannel.com/2009/08/02/are-academic-conferences-broken-can-we-fix-them/">the academic conference model needs an upgrade</a>. But I also know that open-ended, free-form discussion sessions are not a viable alternative &#8212; indeed, a big part of Foo Camp&#8217;s success was how it inspired participants to organize sessions &#8212; and to vote with their feet to attend the worthwhile ones. And of course part of that success came from inviting active, engaged participants rather than passive spectators.</p>
<p>Many of you also organize events, and I&#8217;m sure that all of you attend them. I&#8217;m curious to hear your thoughts about how to make them better, and happy to share more of what I learned at Foo Camp. After all, Foo is for (inspiring) thought.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/18/foo-for-thought/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/18/foo-for-thought/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Christos Faloutsos: Mining Billion-Node Graphs</title>
		<link>http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/</link>
		<comments>http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/#comments</comments>
		<pubDate>Thu, 09 Jun 2011 03:00:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3667</guid>
		<description><![CDATA[As promised, here is a video of CMU professor Christos Faloutsos&#8217;s recent tech talk at LinkedIn on &#8220;Mining Billion-Node Graphs&#8221;. Enjoy! And check out next week&#8217;s open tech talk by Sreenivas Gollapudi of Microsoft Research on &#8220;A Framework for Result Diversification in Search&#8221;. ps. If you like these topics, then please talk to me [...]]]></description>
				<content:encoded><![CDATA[<p><iframe width="500" height="312" src="http://www.youtube.com/embed/GBzoNgqF-gQ?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>As promised, here is a video of CMU professor <a href="http://www.cs.cmu.edu/~christos/">Christos Faloutsos</a>&#8217;s recent tech talk at LinkedIn on &#8220;<a href="http://events.linkedin.com/Mining-Billion-Node-Graphs-LinkedIn-Tech/pub/660176">Mining Billion-Node Graphs</a>&#8221;. Enjoy!</p>
<p>And check out next week&#8217;s open tech talk by <a href="http://www.sreenivasgollapudi.com/">Sreenivas Gollapudi</a> of Microsoft Research on &#8220;<a href="http://events.linkedin.com/Framework-Result-Diversification-Search/pub/691171">A Framework for Result Diversification in Search</a>&#8221;.</p>
<p>ps. If you like these topics, then please talk to me about opportunities at LinkedIn! My group is <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">hiring</a>, as are <a href="http://www.linkedin.com/jsearch?keywords=engineering+OR+scientist+OR+research+OR+data&#038;searchLocationType=Y&#038;keepFacets=keepFacets&#038;page_num=1&#038;facet_COMPANY=1337&#038;pplSearchOrigin=MDYS&#038;sortCriteria=R">many others</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Winning the War for Software Engineering Talent</title>
		<link>http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/</link>
		<comments>http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/#comments</comments>
		<pubDate>Mon, 06 Jun 2011 01:07:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3647</guid>
		<description><![CDATA[The war for talent. It&#8217;s the latest metaphor for the challenge that tech companies face as excitement is building in Silicon Valley again. Well, not really &#8212; McKinsey coined the phrase in 1997 and used it as the title of a book published four years later. But anyone who has been trying to hire great [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone" style="margin-left: 10px; margin-right: 10px;" src="http://siliconvalley.sla.org/wp-content/uploads/2010/11/i_want_you_poster.jpg" alt="" width="206" height="230" /><img class="size-full wp-image-3648 aligncenter" title="Real Genius" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/06/realgenius.jpg" alt="" width="182" height="230" /></p>
<p>The war for talent. It&#8217;s the latest metaphor for the challenge that tech companies face as excitement is building in Silicon Valley again. Well, not really &#8212; McKinsey coined the phrase in 1997 and used it as the title of a <a href="http://www.amazon.com/War-Talent-Ed-Michaels/dp/1578514592">book</a> published four years later.</p>
<p>But anyone who has been trying to hire great software engineers in recent months knows how hard it is to do so. Particularly for folks like me who are trying to hire <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">data scientists</a> &#8212; apparently there&#8217;s a <a href="http://www.mckinsey.com/mgi/publications/big_data/index.asp">national shortage</a>. This is nothing new &#8212; as Joel Spolsky noted in a 2006 <a href="http://www.joelonsoftware.com/articles/FindingGreatDevelopers.html">post</a>, &#8220;the great software developers, indeed, the best people in every field, are quite simply never on the market.&#8221;</p>
<p>I&#8217;m not an expert (or <a href="http://blog.linkedin.com/2010/04/08/linkedin-ninja-job-title/">ninja</a>) on the subject of recruiting or employer branding in general, but I&#8217;ve seen enough of how companies go about hiring software engineers to know that we can do better. I&#8217;d like to share some of my thoughts and experiences, and I hope that you will reciprocate and share your thoughts in the comments. I&#8217;m especially interested in hearing from folks who are at universities (aka hunting grounds) or who are involved in organizing academic conferences.</p>
<p>First, let&#8217;s talk about how we measure success. As <a href="http://en.wikipedia.org/wiki/William_Thomson,_1st_Baron_Kelvin">Lord Kelvin</a> famously said, &#8220;If you can&#8217;t measure it, you can&#8217;t improve it.&#8221; I&#8217;m not going to talk about how to handle active candidates &#8212; that&#8217;s a filtering problem which, in my opinion, is much more tractable. For example, see what Joel has to say about <a href="http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html">interviewing developers</a>. Rather, I&#8217;m concerned with the challenge of discovering qualified passive candidates and converting them into active ones. Hence, I propose we make our metric the number of qualified applicants.</p>
<p>The baseline strategy is sourcing, i.e. have sourcers or hiring managers scour the world for qualified candidates (there&#8217;s an <a href="http://www.linkedin.com/hiring">app</a> for that), entice them with your best recruiting pitch, and then go hog wild on the folks who respond. The success of this strategy depends mainly on the rate at which you, your sourcers, or your hiring managers find qualified candidates &#8212; which in turn may split into the two subtasks of finding candidates and filtering them &#8212; and the conversion rate for the qualified candidates you find. Since the best candidates are often happy in their current positions, sourcing passive candidates requires a lot of work and a thick skin for rejection.</p>
<p>What are other ways to attract qualified passive candidates? Here are a few, with examples from my experience at LinkedIn:</p>
<ul>
<li><strong>Hosting events.</strong> Last week at LinkedIn, we hosted CMU professor <a href="http://www.cs.cmu.edu/~christos/">Christos Faloutsos</a>, who delivered a fantastic talk on &#8220;<a href="http://events.linkedin.com/Mining-Billion-Node-Graphs-LinkedIn-Tech/pub/660176">Mining Billion Node Graphs</a>&#8221; &#8212; a topic we thought interesting enough to justify opening up the talk to the general public. We had a few hundred guests, many of whom are precisely the kinds of folks we are trying to hire. Even more people watched the live stream online or will watch the video when we post it to YouTube (coming soon &#8212; stay tuned!). While this was not a recruiting event (we did not even announce that we are hiring), it was a great opportunity to associate LinkedIn with the hard computer science problems we solve on a daily basis.</li>
<li><strong>Sponsoring events.</strong> Sponsorship is tricky &#8212; if you&#8217;re not careful, you spend a lot of money for a glorified display ad. Sometimes sponsorship offers speaking slots as part of the package, but audiences are rightfully skeptical of speakers who have paid for their slots &#8212; especially at conferences that charge hefty fees for attendance. But sometimes sponsorship works. For example, LinkedIn was a sponsor of the <a href="http://strataconf.com/strata2011">O&#8217;Reilly Strata Conference</a>, and the perks of sponsorship complemented our earned speaker slots, helping us bring enormous visibility to our data scientist team and its recent innovations like <a href="http://blog.linkedin.com/2011/01/24/linkedin-inmaps/">InMaps</a> (we had a booth there to print attendees&#8217; InMaps) and <a href="http://thenoisychannel.com/2011/02/04/got-skills/">Skills</a> (which launched during the conference). While Strata generated few direct leads, it left a lasting impression in the <a href="http://en.wikipedia.org/wiki/Big_data">big data</a> community, and I regularly hear candidates refer to it.</li>
<li><strong>Participating in events.</strong> As the Beatles tell us, money <a href="http://en.wikipedia.org/wiki/Can't_Buy_Me_Love">can&#8217;t buy you love</a>. If you want to make a (positive) impression at a conference, you have to contribute people and ideas. This is especially true at academic conferences, where attendees quickly throw out the extra weight in their tote bags and focus on the conference&#8217;s content and professional networking opportunities. It&#8217;s great if you are Microsoft with a team of close to a thousand researchers and can <a href="http://research.microsoft.com/en-us/news/features/sigir2010-071910.aspx">dominate</a> a conference like <a href="http://sigir.org/">SIGIR</a>. But smaller companies can still make a strong impression on researchers &#8212; and especially on students who may be looking for internships or full-time positions &#8212; by taking an active role at conferences. The traditional approach is to submit papers to the main conference track &#8212; but other avenues include <a href="http://www.kdd.org/kdd2011/tutorials.shtml">tutorials</a>, <a href="http://hcir.info/hcir-2011/">workshops</a>, and <a href="http://www.cikm2011.org/industryevent">industry events</a>. Such participation is often invited, but such invitations are in turn earned by cultivating relationships with researchers &#8212; especially the ones who find themselves on organizing committees.</li>
<li><strong>Contributing to open source projects.</strong> The Search, Network, and Analytics (SNA) team at LinkedIn contributes frequently to open-source projects and publicizes some of its work at <a href="http://sna-projects.com/">http://sna-projects.com/</a>. Open source projects are a great way to earn the respect of engineers who value source over PowerPoint, especially when your employees include <a href="http://www.linkedin.com/in/allenwittenauer">committers</a> to key technologies like Hadoop. Moreover, open-source projects are social communities, so contributing to them offers opportunities for employees to interact with potential hires.</li>
<li><strong>Social media.</strong> By now, I&#8217;d like to think that marketers understand social media to simply be another set of marketing channels. But I think the territory is still pretty new for employers. Here is a simple suggestion: encourage (but do not try to force) employees to express themselves professionally online. Enforce the standard non-disclosure rules, of course, but don&#8217;t try to manage their voices. Authenticity speaks for itself &#8212; for example, look at what <a href="http://www.linkedin.com/in/adamnash">Adam Nash</a> says about LinkedIn on his <a href="http://blog.adamnash.com/?s=linkedin">personal blog</a>. Or my own posts <a href="http://thenoisychannel.com/?s=linkedin">here</a>. Engineers don&#8217;t read press releases or corporate blogs, but they do pay attention to their peers. And there&#8217;s nothing unique about blogs &#8212; the same principle applies to platforms like Twitter, Facebook, Quora, and of course LinkedIn. Not all employees enjoy being online extroverts, but those who do so not only act as brand ambassadors but are also likely to eventually strike up conversations with passive candidates about employment opportunities.</li>
</ul>
<p>Finally, don&#8217;t forget to measure the results of these efforts! Some activities generate leads directly, in which case you can make an apples-to-apples comparison of their results and costs with the baseline strategy of sourcing. It&#8217;s harder to measure the longer-term effect of efforts to raise visibility, but you can at least ask candidates if they are aware of those efforts &#8212; after all, efforts to raise visibility should be visible to candidates! You can also ask candidates if those efforts were a factor in their decision to apply. These measures aren&#8217;t perfect, but they are a lot better than nothing, especially when you&#8217;re trying to decide how best to invest limited resources.</p>
<p>Of course, even an optimal strategy can&#8217;t substitute for offering a combination of interesting work, competitive compensation, and a work hard / play hard <a href="http://www.youtube.com/watch?v=PUwEEOhcK3s">culture</a>. As with all marketing efforts, you need to start with a great product. But great products don&#8217;t sell themselves: you need to invest in a combination of outbound and inbound marketing to have a fighting chance in the war for talent. Good luck! And, in case you didn&#8217;t notice, <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">we&#8217;re hiring</a>!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>I&#8217;d Like To Have An Argument Please</title>
		<link>http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/</link>
		<comments>http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/#comments</comments>
		<pubDate>Tue, 31 May 2011 03:19:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3639</guid>
		<description><![CDATA[If you Google [relevance theory], you&#8217;ll discover this Wikipedia entry about a theory proposed by Dan Sperber and Deirdre Wilson arguing that, in any given communication situation, the listener will stop processing as soon as he or she has found meaning that fits his or her expectation of relevance. The Wikipedia entry offers the following example of [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/The_Argument_Sketch"><img class="alignnone" title="Monty Python: The Argument Sketch" src="http://upload.wikimedia.org/wikipedia/en/8/85/Argument_Clinic.png" alt="" width="400" height="312" /></a></p>
<p>If you Google [<a href="http://www.google.com/search?q=relevance+theory">relevance theory</a>], you&#8217;ll discover this <a href="http://en.wikipedia.org/wiki/Relevance_theory">Wikipedia entry</a> about a theory proposed by Dan Sperber and Deirdre Wilson arguing that, in any given communication situation, the listener will stop processing as soon as he or she has found meaning that fits his or her expectation of relevance. The Wikipedia entry offers the following example of this principle:</p>
<blockquote><p>Mary: Would you like to come for a run?</p>
<p>Bill: I&#8217;m resting today.</p>
<p>We understand from this example that Bill does not want to go for a run. But that is not what he said. He only said enough for Mary to add the context-mediated information: i.e. someone who is resting doesn&#8217;t usually go for a run. The implication is that Bill doesn&#8217;t want to go for a run today.</p></blockquote>
<p>This theory may call to mind the <a href="http://en.wikipedia.org/wiki/Gricean_maxims">Gricean Maxims</a> &#8212; indeed, Sperber and Wilson borrow heavily from Grice&#8217;s work.</p>
<p>But I mainly bring up relevance theory to introduce Sperber to those unfamiliar with him. My friend (and <a href="http://www.endeca.com/">Endeca</a> co-founder) <a href="http://facets.endeca.com/authors-2/">Pete Bell</a> recently called to my attention an article by neuroscientist <a href="http://en.wikipedia.org/wiki/Jonah_Lehrer">Jonah Lehrer</a> entitled &#8220;<a href="http://www.wired.com/wiredscience/2011/05/the-sad-reason-we-reason/">The Reason We Reason</a>&#8221;. The article reviews the <a href="http://www.fallacyfiles.org/hothandf.html">&#8220;hot hand&#8221; fallacy</a> and then proceeds to cite a new theory by Sperber and <a href="http://sites.google.com/site/hugomercier/">Hugo Mercier</a>:</p>
<blockquote><p>Reasoning is generally seen as a means to improve knowledge and make better decisions. Much evidence, however, shows that reasoning often leads to epistemic distortions and poor decisions. This suggests rethinking the function of reasoning. Our hypothesis is that the function of reasoning is argumentative. It is to devise and evaluate arguments intended to persuade.</p></blockquote>
<p>The full article by Mercier and Sperber runs over 17K words and is entitled &#8220;<a href="http://www.dan.sperber.fr/wp-content/uploads/2009/10/MercierSperberWhydohumansreason.pdf">Why do humans reason? Arguments for an argumentative theory</a>&#8221;.</p>
<p>As someone who has spent most of his professional life thinking about information retrieval in practical contexts, I automatically relate relevance theory to <a href="http://en.wikipedia.org/wiki/Relevance_(Information_Retrieval)">relevance in the context of information retrieval</a>. Relevance has been a subject of intense debate in the information science community (<a href="http://thenoisychannel.com/2008/05/05/saracevic-on-relevance-and-interaction/">Tefko Saracevic</a> tells the story wonderfully). Indeed, a key reason that I created the <a href="http://hcir.info/">HCIR workshop</a> was the belief that information retrieval researchers and practitioners (i.e., search engine developers) were placing too much emphasis on an objective notion of topical relevance, and not enough focus on the user.</p>
<p>Mercier and Sperber&#8217;s theory offers an interesting challenge to information retrieval researchers: perhaps a user&#8217;s information need is less about arriving at the truth and more about finding confirmatory evidence to support a preconceived conclusion. If so, should we adjust our notions of relevance accordingly? Also, if we evaluate or inform search quality based on observed user behavior (such as click-through behavior), then are we already inadvertently conflating topical relevance with users&#8217; confirmatory bias?</p>
<p>Many people have noted that personalization gives us the truth we want: recent examples include Robin Sloan and Matt Thompson&#8217;s <em><a href="http://www.robinsloan.com/epic/">EPIC 2014</a></em> and Eli Pariser&#8217;s <em><a href="http://www.thefilterbubble.com/">The Filter Bubble</a></em>. Despite the consensus that over-fitting information access to our personal tastes is a bad thing (perhaps even dystopian), technology seems to relentlessly push us in this direction. Moreover, some degree of personalization is clearly useful &#8212; such as prioritizing information that relates to our personal and professional interests.</p>
<p>Nonetheless, anyone working in the area of information seeking systems should be concerned with the question of the user&#8217;s goal in using that system. Many of us take for granted that the user&#8217;s main goal is truth seeking, and we design our systems accordingly. What can or should we do differently if the user&#8217;s main goal is not informative but persuasive? Is the user looking for an answer&#8230;or an <a href="http://www.youtube.com/watch?v=teMlv3ripSM">argument</a>?</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Going Public</title>
		<link>http://thenoisychannel.com/2011/05/19/going-public/</link>
		<comments>http://thenoisychannel.com/2011/05/19/going-public/#comments</comments>
		<pubDate>Fri, 20 May 2011 03:53:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3629</guid>
		<description><![CDATA[What a day! I&#8217;ve been excited about LinkedIn from the moment I joined &#8212; and for several years before that &#8212; but today has been a unique experience. I hope our celebration extends beyond LinkedIn&#8217;s employees and investors &#8212; this is a great day for Silicon Valley, for the data scientists who are building its [...]]]></description>
				<content:encoded><![CDATA[<p><iframe width="475" height="296" src="http://www.youtube.com/embed/mCrYkEVygIs?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>What a day! I&#8217;ve been excited about LinkedIn from the moment I <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">joined</a> &#8212; and for several years before that &#8212; but today has been a unique experience. I hope our celebration extends beyond LinkedIn&#8217;s employees and investors &#8212; this is a great day for Silicon Valley, for the <a href="http://thenoisychannel.com/2011/01/04/so-you-like-big-data/">data scientists</a> who are building its most valuable companies, and for the users who are benefiting from it all. I am proud and deeply grateful to be a part of this extraordinary adventure. My thanks to my hundreds of incredible colleagues and to the <a href="http://blog.linkedin.com/2011/03/22/linkedin-100-million/">100M users</a> who have made it possible.</p>
<p>ps. Yes, we are still <a href="http://www.linkedin.com/jobs?viewJob=&#038;jobId=1544636">hiring</a>, so please contact me if you&#8217;re the kind of person who loves turning data into gold. And if you are local, check out Christos Faloutsos&#8217;s upcoming tech talk on <a href="http://bit.ly/graphmine">Mining Billion Node Graphs</a>, which will take place at LinkedIn on June 2 and is open to the public.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/19/going-public/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/19/going-public/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>In Search Of Structure</title>
		<link>http://thenoisychannel.com/2011/05/15/in-search-of-structure/</link>
		<comments>http://thenoisychannel.com/2011/05/15/in-search-of-structure/#comments</comments>
		<pubDate>Sun, 15 May 2011 19:24:53 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3607</guid>
		<description><![CDATA[A couple of weeks ago, I participated in a summit that Greylock Partners organized for its portfolio companies at LinkedIn to discuss the power of data. Invited participants represented some of the most interesting &#8220;big data&#8221; companies in Silicon Valley, including Google, Facebook, Pandora, Cloudera, and Zynga. Discussion took place under the Chatham House Rule, [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.iws.org/images/search.gif"><img class="alignnone" title="In Search Of" src="http://www.iws.org/images/search.gif" alt="" width="185" height="182" /></a><a href="http://www.geo.arizona.edu/xtal/geos306/Image1.gif"><img class="alignnone" title="Structure" src="http://www.geo.arizona.edu/xtal/geos306/Image1.gif" alt="" width="243" height="182" /></a></p>
<p style="text-align: left;">A couple of weeks ago, I participated in a summit that <a href="http://www.greylock.com/">Greylock Partners</a> organized for its <a href="http://www.greylock.com/portfolio/portfolio/">portfolio</a> companies at <a href="http://www.linkedin.com/">LinkedIn</a> to discuss the power of data. Invited participants represented some of the most interesting &#8220;<a href="http://en.wikipedia.org/wiki/Big_data">big data</a>&#8221; companies in Silicon Valley, including Google, Facebook, Pandora, Cloudera, and Zynga. Discussion took place under the <a href="http://en.wikipedia.org/wiki/Chatham_House_Rule">Chatham House Rule</a>, so I&#8217;m not at liberty to share much detail. But I can say that there were energetic conversations about metrics, tools, and (of course) <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1544636">hiring</a>.</p>
<p>One of the participants was Google researcher <a href="http://www.cs.washington.edu/homes/alon/">Alon Halevy</a>, who generously shared his presentation on <a href="http://www.google.com/fusiontables/public/tour/index.html">Fusion Tables</a> with me with permission to re-share it <a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/05/fusionTablesMay5.pdf">here</a>.</p>
<p>Fusion Tables allow the general public to upload, visualize, and share structured data. They are particularly useful for journalists who want to distill compelling stories from data &#8212; indeed, <em>The Guardian</em>&#8217;s <a href="http://www.guardian.co.uk/profile/simonrogers">Simon Rogers</a> has used Fusion Tables to visualize and interpret everything from <a href="http://www.guardian.co.uk/news/datablog/2011/mar/14/nuclear-power-plant-accidents-list-rank">nuclear power plant accidents</a> to <a href="http://www.guardian.co.uk/news/datablog/2010/nov/29/wikileaks-cables-data">Wikileaks</a>.</p>
<p>After his presentation, I asked Alon for his thoughts on why we haven&#8217;t seen an encyclopedic structured data repository comparable in scope and scale to Wikipedia. Alon offered that structured data is brittle &#8212; its value tends to depend more on context than the value of the unstructured content that populates Wikipedia. I agree in part &#8212; for example, consider this <a href="http://www.nypost.com/p/news/local/posh_nabes_get_bus_ted_LS1oa34dj4q4SoJPkY9FZK">map</a> of Brooklyn bus stops that were slated for elimination last summer. Such data is useful in a narrow context, but hardly encyclopedic.</p>
<p>But what about <a href="http://wiki.freebase.com/wiki/What_is_Freebase%3F">Freebase</a> and <a href="http://dbpedia.org/About">DBpedia</a>? Freebase is an open repository of structured data associated with about 20 million topics. DBpedia describes itself as &#8220;a community effort to extract structured information from Wikipedia and to make this information available on the Web.&#8221; While these tools have seen some use by developers (especially in the <a href="http://semanticweb.org/">semantic web community</a>), they have not achieved mainstream adoption. Perhaps data marketplaces like <a href="http://www.factual.com/">Factual</a> and <a href="http://www.infochimps.com/">Infochimps</a> will be successful as for-profit businesses, but the question remains why we don&#8217;t have a Wikipedia-scale success story for public structured data.</p>
<p>I think the problem is easiest to frame in <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> terms. Wikipedia is all about <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a>, but not so much about <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a>. Let me elaborate.</p>
<p>Wikipedia represents a collective attempt to achieve precision at the level of individual entries. Contributor / editors correct mistakes and argue over the details of content and tone. But coverage is a much lower priority. When in doubt, the Wikipedia collective assumes that information is not notable enough to justify inclusion. Thus Wikipedia errs on the side of precision rather than recall when it comes to meeting the information needs of its users.</p>
<p>This arrangement works well for a typical web user who seeks out information by using Google web search as an interface to discover Wikipedia articles. But structured data is about sets, not just individuals. It does me no good to see aggregate statistics about a set of entities if the set is erratically populated (e.g., Wikipedia&#8217;s list of <a href="http://en.wikipedia.org/wiki/Category:Companies_established_in_1999">companies established in 1999</a> or Freebase&#8217;s list of those <a href="http://www.freebase.com/view/user/masouras/default_domain/views/companies_founded_after_2000">founded after 2000</a>).</p>
<p>In the June 2009 SIGIR Forum, University of Melbourne researchers Justin Zobel, Alistair Moffat, and Laurence Park argued &#8220;<a href="http://www.sigir.org/forum/2009J/2009j-sigirforum-zobel.pdf">against recall</a>&#8221;, concluding that they could find &#8220;no justification for implicit or explicit use of recall as a measure of search satisfaction.&#8221; I posted a rebuttal entitled &#8220;<a href="http://thenoisychannel.com/2009/07/17/in-defense-of-recall/">In Defense of Recall</a>&#8221;, arguing that recall is much more useful as a measure for set retrieval than for ranked retrieval. Revisiting this argument two years later, I can see that it holds even more strongly if we are interested in structured data where we want to reason about aggregate properties of sets.</p>
<p>Back when we both worked at <a href="http://www.endeca.com/">Endeca</a>, my colleague <a href="http://www.linkedin.com/in/robgonzalez">Rob Gonzalez</a> described structured data repositories as a public good that no one is ever willing to pay for. I&#8217;m an optimist by nature, but in this case I fear he has a point. It takes a lot of work to build something useful, and no one seems to have addressed the challenge of incenting people to contribute this work for either economic or altruistic motives.</p>
<p>Or perhaps we&#8217;ll just have to wait for the holy grail of information extraction algorithms to structure the world&#8217;s information for us? Ironically, that&#8217;s not even included on Wikipedia&#8217;s list of <a href="http://en.wikipedia.org/wiki/AI-complete">AI-complete</a> problems.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/15/in-search-of-structure/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/15/in-search-of-structure/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Announcing HCIR 2011!</title>
		<link>http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/</link>
		<comments>http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/#comments</comments>
		<pubDate>Sun, 08 May 2011 04:24:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3598</guid>
		<description><![CDATA[As regular readers know, I&#8217;ve been co-organizing annual workshops on Human-Computer Interaction and Information Retrieval since creating the first HCIR workshop in 2007. These have been a huge success, not only bridging the gap between IR and HCI, but also bringing together researchers and practitioners to address concerns shared by both communities. Past keynote speakers have included [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://isquared.files.wordpress.com/2011/03/wordle.jpg"><img class="alignnone" title="HCIR Wordle (via Tony Russell-Rose)" src="http://isquared.files.wordpress.com/2011/03/wordle.jpg" alt="" width="524" height="254" /></a></p>
<p>As regular readers know, I&#8217;ve been co-organizing annual workshops on <a href="http://hcir.info/">Human-Computer Interaction and Information Retrieval</a> since creating the first HCIR workshop in <a href="http://projects.csail.mit.edu/hcir/">2007</a>. These have been a huge success, not only bridging the gap between <a href="http://en.wikipedia.org/wiki/Information_retrieval">IR</a> and <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_interaction">HCI</a>, but also bringing together researchers and practitioners to address concerns shared by both communities. Past keynote speakers have included such information science luminaries as <a href="http://en.wikipedia.org/wiki/Susan_Dumais">Susan Dumais</a>, <a href="http://en.wikipedia.org/wiki/Ben_Shneiderman">Ben Shneiderman</a>, and <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a>.</p>
<p>Every workshop has improved on the previous year&#8217;s, and <a href="http://hcir.info/hcir-2011/">HCIR 2011</a>, which will take place on Thursday, October 20, will be no exception.</p>
<p>Our venue will be <a href="http://maps.google.com/maps/place?cid=1017478923201951099">Google&#8217;s headquarters</a> in Mountain View, California. We could hardly imagine a more appropriate venue: Google has done more than any other company to contribute to everyday information access. Google has been extremely generous as a host and sponsor (other sponsors include <a href="http://www.endeca.com/">Endeca</a> and <a href="http://research.microsoft.com/">Microsoft Research</a>), and its location in the heart of Silicon Valley is ideal for attracting researchers and practitioners building the future of HCIR.</p>
<p>Our keynote speaker will be <a rel="nofollow" href="http://ils.unc.edu/~march/" target="_blank">Gary Marchionini</a>, Dean of the School of Information and Library Science at the University of North Carolina at Chapel Hill. Gary coined the phrase &#8220;human–computer information retrieval&#8221; in a lecture entitled &#8220;<a href="http://www.asis.org/Bulletin/Jun-06/marchionini.html">Toward Human-Computer Information Retrieval</a>&#8221;, in which he asserted that &#8220;HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.&#8221; We are honored to have Gary deliver this year&#8217;s keynote.</p>
<p>But of course the main attraction is the contribution of participants. This year we invite three types of papers: position papers, research papers and challenge reports. Possible topics for discussion and presentation at the workshop include, but are not limited to:</p>
<ul>
<li><span style="color: #000000;">Novel interaction techniques for information retrieval.</span></li>
<li><span style="color: #000000;">Modeling and evaluation of interactive information retrieval.</span></li>
<li><span style="color: #000000;">Exploratory search and information discovery.</span></li>
<li><span style="color: #000000;">Information visualization and visual analytics.</span></li>
<li><span style="color: #000000;">Applications of HCI techniques to information retrieval needs in specific domains.</span></li>
<li><span style="color: #000000;">Ethnography and user studies relevant to information retrieval and access.</span></li>
<li><span style="color: #000000;">Scale and efficiency considerations for interactive information retrieval systems.</span></li>
<li><span style="color: #000000;">Relevance feedback and active learning approaches for information retrieval.</span></li>
</ul>
<p><span style="color: #000000;">Demonstrations of systems and prototypes are particularly welcome.</span></p>
<p>Building on the success of <a href="http://hcir.info/hcir-2010/challenge">last year&#8217;s HCIR Challenge</a> to address historical exploration of a news archive, <a href="http://hcir.info/hcir-2011/challenge">this year&#8217;s HCIR Challenge</a> will focus on the problem of information availability. The corpus for the Challenge will be the <a href="http://citeseerx.ist.psu.edu/">CiteSeer</a> digital library of scientific literature.</p>
<p>For more information about the workshop, including how to submit papers or participate in the challenge, please visit the <a href="http://hcir.info/hcir-2011/">HCIR 2011 website</a>.</p>
<p>Here are the key dates for submitting position and research papers:</p>
<ul>
<li>Submission deadline (position and research papers): <strong>July 31</strong></li>
<li>Notification of acceptance decision: <strong>September 8</strong></li>
<li>Presentations and poster session at workshop:<strong> October 20</strong></li>
</ul>
<p>Key dates for Challenge participants:</p>
<ul>
<li>Deadline to request access to the corpus (contact <a href="mailto:dtunkelang@gmail.com">me</a>): <strong>June 19</strong></li>
<li>Freeze system and submit brief description: <strong>September 25</strong></li>
<li>Submit videos or screenshots demonstrating systems on example tasks: <strong>October 9</strong></li>
<li>Live demonstrations at workshop: <strong>October 20</strong></li>
</ul>
<p>I&#8217;m looking forward to this year&#8217;s submissions, and to a great workshop in October. I hope to see many of you there!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>CFP: CIKM 2011 Industry Event</title>
		<link>http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/</link>
		<comments>http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/#comments</comments>
		<pubDate>Sun, 01 May 2011 00:43:10 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3594</guid>
		<description><![CDATA[As I posted a few months ago, I&#8217;m organizing the Industry Event at CIKM 2011 with Tony Russell-Rose. We have a great set of keynotes lined up: Stephen Robertson (Microsoft Research) John Giannandrea (Google) Jeff Hammerbacher (Cloudera) Peter Jackson (Thomson Reuters) We&#8217;re also looking for submissions from industry researchers and practitioners. The submission deadline is June 21. Here is [...]]]></description>
				<content:encoded><![CDATA[<div>
<p><a href="http://www.cikm2011.org/node/20"><img title="CIKM 2011" src="http://www.cikm2011.org/sites/default/files/cikm2011_craigm_v1_logo.jpg" alt="" width="576" height="91" /></a></p>
<p>As I posted a few months ago, I&#8217;m organizing the <a href="http://www.cikm2011.org/node/20">Industry Event</a> at <a href="http://www.cikm2011.org/">CIKM 2011</a> with <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>. We have a great set of keynotes lined up:</p>
<ul>
<li><a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a> (Microsoft Research)</li>
<li><a href="http://www.freebase.com/view/en/john_giannandrea">John Giannandrea</a> (Google)</li>
<li><a href="http://jeffhammerbacher.com/">Jeff Hammerbacher</a> (Cloudera)</li>
<li><a href="http://www.jacksonpeter.com/">Peter Jackson</a> (Thomson Reuters)</li>
</ul>
<p>We&#8217;re also looking for submissions from industry researchers and practitioners. The submission deadline is <strong>June 21</strong>.</p>
<p>Here is a copy of the <a href="http://www.cikm2011.org/node/20">call for papers</a>:</p>
<p>This year’s CIKM conference will include an Industry Event, which will be held during the regular conference program in parallel with the technical tracks.</p>
<p>The Industry Event&#8217;s objectives are twofold. The first objective is to present the state-of-the-art in information retrieval, knowledge management, databases, and data mining, delivered as keynote talks by influential technical leaders who work in industry. The second objective is to present interesting, novel and innovative industry developments in these areas.</p>
<p>Industry authors are invited to prepare proposals for presenting interesting, novel and innovative ideas, and submit these to <a href="mailto:industry@cikm2011.org">industry@cikm2011.org</a> by June 21st 2011. The proposals should contain (with respective lengths):</p>
<ul>
<li>Short company portrait (125 words)</li>
<li>Short CV of the presenter (125 words)</li>
<li>Title and abstract of the presentation (250 words)</li>
<li>Reasons why the presentation should be interesting to the CIKM audience</li>
</ul>
<p>When submitting a proposal, please bear in mind the following:</p>
<ul>
<li>Ensure the presentation is relevant to the CIKM audience (the Call for Papers gives a good idea of the conference scope).</li>
<li>Try to highlight interesting R&amp;D challenges in the work you present. Please do not present a sales pitch.</li>
<li>All slides will be made public (no confidential information on the slides; you will be expected to ensure your slides are approved by your company before being presented).</li>
<li>Presenters may opt to have their presentation videoed and made public, and if so, the presenter will be asked to sign a release form.</li>
</ul>
<p>We look forward to receiving your submissions, and welcoming you to the CIKM 2011 Conference and Industry Event.</p>
<p><strong>Important dates:</strong><br />
21 June 2011: Industry Event paper proposals due<br />
19 July 2011: Notifications sent<br />
27 October 2011: Industry Event<br />
24-28 October 2011: CIKM conference</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CFP: IEEE Internet Computing Special Issue on Context-Aware Computing</title>
		<link>http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/</link>
		<comments>http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/#comments</comments>
		<pubDate>Sun, 01 May 2011 00:17:22 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3585</guid>
		<description><![CDATA[Pankaj Mehra and I are guest editors for an upcoming special issue of IEEE Internet Computing with the topic &#8220;Beyond Search: Context-Aware Computing&#8221;. Here is a copy of the call for papers: Context is the unstated actor in human communications, actions, and situations. It makes our communication efficient, our commands actionable, and our situations understandable to [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.computer.org/portal/web/internet/home"><img class="alignnone" title="IEEE Internet Computing" src="http://www.computer.org/portal/image/image_gallery?uuid=9a783503-bfa2-47ca-aa63-6d02ca87c662&amp;groupId=889131&amp;t=1256675935719" alt="" width="481" height="55" /></a></p>
<p><a href="http://www.linkedin.com/in/pankajmehra">Pankaj Mehra</a> and I are guest editors for an upcoming special issue of <em><a href="http://www.computer.org/portal/web/internet/home">IEEE Internet Computing</a></em> with the topic &#8220;<a href="http://www.computer.org/portal/web/computingnow/iccfp2">Beyond Search: Context-Aware Computing</a>&#8221;.</p>
<p>Here is a copy of the call for papers:</p>
<p>Context is the unstated actor in human communications, actions, and situations. It makes our communication efficient, our commands actionable, and our situations understandable to the people, organizations, and devices that provide us with content or services. The increased embedding of technology into our personal and social environments drives a need for context-aware computing.</p>
<p>Context-aware computing offers mobile Internet users an experience that goes beyond user-initiated search and location-based services. Context awareness sharpens relevance when responding to user-initiated actions (such as product search and support calls). It also enables proactive communications through analysis of a user’s behavior and environment, thereby forming the basis for key business imperatives targeting customer-engagement systems. Even greater opportunity arises from context use in systems that can make sense of and engage in customer dialogs and forums.</p>
<p>This special issue seeks original articles that support and illustrate context use in creating enhanced user experiences. Sample topics include</p>
<ul>
<li>proactive, contextualized delivery of information, alerts, and advertisements;</li>
<li>context-mediated Web service orchestration, yielding actionable interpretation of spoken high-level commands;</li>
<li>system architecture, economics, and ecosystems for comprehensive capture, representation, communication, gathering, and brokering the larger user context;</li>
<li>systems of engagement that treat discourse as text plus context and process textual communication as an event in which linguistic, cognitive, and social actions converge; and</li>
<li>reasoning and knowledge representation mechanisms that use context in selecting the body of knowledge to use, the level of detail to model, and the point of view with which to communicate and interpret text and data.</li>
</ul>
<p>All submissions must be original manuscripts of fewer than 5,000 words, focused on Internet technologies and implementations. All manuscripts are subject to peer review on both technical merit and relevance to <em>IC</em>’s international readership—primarily system and software design engineers. We do not accept white papers, and we discourage strictly theoretical or mathematical papers. To submit a manuscript, please log on to <a href="https://mc.manuscriptcentral.com/ic-cs" target="_blank">ScholarOne (https://mc.manuscriptcentral.com:443/ic-cs)</a> to create or access an account, which you can use to log on to <a href="http://www.computer.org/portal/web/peerreviewmagazines/acinternet"><em>IC</em>’s Author Center</a> and upload your submission.</p>
<p>I hope some of you will submit articles in time for the <strong>June 15</strong> deadline, and Pankaj and I look forward to reviewing them.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Identifying Influencers on Twitter</title>
		<link>http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/</link>
		<comments>http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/#comments</comments>
		<pubDate>Sun, 17 Apr 2011 02:52:43 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3567</guid>
		<description><![CDATA[One of the perks of working at LinkedIn is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a WSDM 2011 paper on &#8220;Identifying &#8216;Influencers&#8217; on Twitter&#8221; by Eytan Bakshy, Jake Hofman, Winter Mason, and Duncan Watts. It&#8217;s great to see the folks at [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://darmano.typepad.com/logic_emotion/2006/08/levels_of_influ.html"><img class="alignnone size-full wp-image-3569" title="Levels of Influence (David Armano)" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/04/levels-of-influence.gif" alt="" width="418" height="418" /></a><br />
One of the perks of <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1544636">working at LinkedIn</a> is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a <a href="http://www.wsdm2011.org/">WSDM 2011</a> paper on &#8220;<a href="http://research.yahoo.com/files/bakshy_wsdm.pdf">Identifying &#8216;Influencers&#8217; on Twitter</a>&#8221; by <a href="http://www-personal.umich.edu/~ebakshy">Eytan Bakshy</a>, <a href="http://research.yahoo.com/Jake_Hofman">Jake Hofman</a>, <a href="http://research.yahoo.com/Winter_Mason">Winter Mason</a>, and <a href="http://research.yahoo.com/Duncan_Watts">Duncan Watts</a>. It&#8217;s great to see the folks at Yahoo! Research doing cutting-edge work in this space.</p>
<p>I thought I&#8217;d prepare for the discussion by sharing my thoughts here. Perhaps some of you will even be kind enough to add your own ideas, which I promise to share with the reading group.</p>
<p>I encourage you to read the paper, but here&#8217;s a summary of its results:</p>
<ul>
<li>A user&#8217;s influence on Twitter is the extent to which that user can cause the diffusion of a posted URL, as measured by reposts propagated through follower edges in Twitter&#8217;s directed social graph.</li>
<li>The best predictors of future total influence are follower count and past local influence, where local influence refers to the average number of reposts by that user’s immediate followers, and total influence refers to average total cascade size.</li>
<li>The content features of individual posts do not have identifiable predictive value.</li>
<li>Barring a high per-influencer acquisition cost, the most cost-effective strategy for buying influence is to target users of average influence.</li>
</ul>
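<p>To make the two measures concrete, here is a minimal sketch of how local and total influence could be computed from cascade data. The cascade format (one dict per posted URL, mapping each reposting user to the user they reposted from) and the function name are my own assumptions for illustration, not the paper&#8217;s data model; I also assume that cascade size counts reposts and excludes the original post.</p>

```python
from statistics import mean


def influence_scores(seed, cascades):
    """Return (local_influence, total_influence) for the user `seed`.

    `cascades` holds one dict per URL posted by `seed`. Each dict maps a
    reposting user to the user they reposted from (a hypothetical format).
    """
    if not cascades:
        return 0.0, 0.0
    # Local influence: average number of reposts made directly from the seed.
    local = mean([sum(1 for parent in c.values() if parent == seed) for c in cascades])
    # Total influence: average cascade size (assumed here to count reposts only).
    total = mean([len(c) for c in cascades])
    return local, total


# Three URLs posted by "alice": one reaches depth 2, one depth 1, one is never reposted.
cascades = [
    {"bob": "alice", "carol": "alice", "dave": "bob"},
    {"bob": "alice"},
    {},
]
local, total = influence_scores("alice", cascades)
```

<p>In this toy example the two measures differ only because of the single depth-2 repost, which is the kind of rare event the discussion below turns on.</p>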
<p>Let&#8217;s dive in a bit deeper.</p>
<p>The definitions of influence and influencers are, by the authors&#8217; own admission, narrow and arbitrary. There are many ways one could define influence, even within the context of Twitter use. But I agree with the authors that these definitions have enough <a href="http://en.wiktionary.org/wiki/verisimilitude">verisimilitude</a> to be useful, and their simplicity facilitates quantitative analysis.</p>
<p>It&#8217;s hardly surprising that past influence is a strong predictor of future influence. But it might seem counterintuitive that, for predicting future total influence, past local influence is more informative than past total influence. The authors suggest the explanation that most non-trivial cascades are of depth 1 &#8212; i.e., total influence is mostly local influence. But at most that would make the two features equally informative, and total influence should still be a mildly better predictor.</p>
<p>I suspect that another factor is in play &#8212; namely, that the difference between local influence and total influence reflects the unpredictable and rare virality of the content (e.g., <a href="http://networkeffect.allthingsd.com/20110415/random-facebook-users-question-gets-four-million-votes/">a random Facebook Question generated 4M votes</a>). If this hypothesis is correct, then past local influence filters out this unpredictable component and is thus a better predictor of both future local influence and future total influence.</p>
<p>I&#8217;m a bit surprised that follower count supplies additional informative value beyond the past local influence; after all, local influence should already reflect the extent to which the followers are being influenced. It&#8217;s possible that past influence lags the follower count, since it does not sufficiently weigh the potential contributions of more recent followers. But another possibility is one analogous to the predictive value of past local vs. global influence: past local influence may include an unpredictable content factor which follower count factors out.</p>
<p>Of course, I can&#8217;t help suggesting that <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a> might be a more useful indicator than follower count. Unfortunately the authors don&#8217;t seem to be aware of the TunkRank work &#8212; or perhaps they preferred to restrict their attention to basic features.</p>
<p>I&#8217;m not surprised by the inability to exploit content features to predict influence. If it were easy to generate viral content, <a href="http://en.wikipedia.org/wiki/Get-rich-quick_scheme">everyone would do it</a>. Granted, a deeper analysis might squeeze out a few features (like those suggested in the <a href="http://www.buddymedia.com/newsroom/?p=9335">Buddy Media report</a>), but I don&#8217;t think there are any silver bullets here.</p>
<p>Finally, the authors consider the question of designing a cost-effective strategy to buy influence. The authors assume that the cost of buying influence can be modeled in terms of two parameters: a per-influencer acquisition cost (which is the same for each influencer) and a per-follower cost for each influencer. They conclude that, until the acquisition cost is extremely high (i.e., over 10,000 times the per-follower cost), the most cost-efficient influencers are those of average influence. In other words, there&#8217;s no reason to target the <a href="http://www.amazon.com/Influentials-American-Tells-Other-Where/dp/0743227298">small number of highly influential users</a>.</p>
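<p>Here is a rough sketch of that two-parameter cost model, to show why the acquisition cost drives the conclusion. The candidate names, follower counts, and expected influence values are numbers I invented for illustration; none of them come from the paper, and the function names are hypothetical.</p>

```python
def cost(followers, acquisition_cost, per_follower_cost):
    """Cost of one influencer: a fixed fee plus a fee per follower."""
    return acquisition_cost + per_follower_cost * followers


def most_cost_effective(candidates, acquisition_cost, per_follower_cost):
    """Return the candidate with the highest expected influence per unit cost.

    `candidates` maps a name to (follower_count, expected_influence).
    """
    def efficiency(name):
        followers, influence = candidates[name]
        return influence / cost(followers, acquisition_cost, per_follower_cost)

    return max(candidates, key=efficiency)


# Invented numbers: the per-follower return of the celebrity is lower.
candidates = {
    "average": (200, 0.3),
    "celebrity": (1_000_000, 500.0),
}
cheap = most_cost_effective(candidates, acquisition_cost=0.0, per_follower_cost=0.01)
pricey = most_cost_effective(candidates, acquisition_cost=1000.0, per_follower_cost=0.01)
```

<p>With no acquisition cost the average user wins on influence per unit cost; once the fixed fee dominates the per-follower fee, the high-follower user wins instead. The crossover point depends entirely on the made-up numbers above.</p>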
<p>The authors may be arriving at the right conclusion (Watts&#8217;s <a href="http://research.yahoo.com/files/w_d_JCR.pdf">earlier work</a> with <a href="http://www.uvm.edu/~pdodds/">Peter Dodds</a>, which the paper cites, questions the &#8220;influentials&#8221; hypothesis), but I&#8217;m not convinced by their economic model of an influence market. It may be the case that professional influencers are trying to peddle their followers&#8217; attention on a per-follower basis &#8212; there are <a href="http://www.buytwitterfollowers.org/">sites</a> <a href="http://twitter1k.com/">that</a> <a href="http://www.socialkik.com/twitter_promo.html">offer</a> <a href="http://www.twitterfollowersshop.com/">this</a> <a href="http://usocial.net/twitter_marketing/">model</a>.</p>
<p>But why should anyone believe that an influencer&#8217;s value is proportional to his or her number of followers? The authors&#8217; own work suggests that past local influence is a more valuable predictor than follower count, and again they might want to look at TunkRank.</p>
<p>Regardless, I&#8217;m not surprised that a fixed per-follower cost makes users with high follower counts less cost-effective, as I subscribe to its corollary: as a user&#8217;s follower count goes up, the per-follower value diminishes. I haven&#8217;t done the analysis, but I believe that the ratio of a user&#8217;s TunkRank to the user&#8217;s follower count tends to go down as a user&#8217;s follower count goes up. A more interesting research (and practical) question would be to establish a correctly calibrated model of influencer value and then explore portfolio strategies.</p>
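<p>For anyone who wants to try that analysis, here is a minimal sketch of an iterative TunkRank computation. It assumes the recursive definition from the TunkRank post linked above, in which a user&#8217;s influence is the sum over followers Y of (1 + p * Influence(Y)) / |Following(Y)|, where p is the probability of a retweet. The graph format, the default value of p, and the fixed iteration count are my own assumptions.</p>

```python
def tunkrank(following, p=0.05, iterations=50):
    """Approximate TunkRank by fixed-point iteration.

    `following` maps each user to the set of users they follow.
    `p` is the assumed probability that a follower retweets.
    """
    users = set(following) | {u for followed in following.values() for u in followed}
    # Invert the graph: followers[x] is the set of users who follow x.
    followers = {u: set() for u in users}
    for y, followed in following.items():
        for x in followed:
            followers[x].add(y)
    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        # Each follower y splits their attention evenly across everyone they follow.
        influence = {
            x: sum((1 + p * influence[y]) / len(following[y]) for y in followers[x])
            for x in users
        }
    return influence


# Toy graph: a follows b; c follows b and d.
following = {"a": {"b"}, "c": {"b", "d"}}
scores = tunkrank(following)
ratio_b = scores["b"] / 2  # b has two followers
```

<p>Dividing each score by the user&#8217;s follower count gives the ratio discussed above; testing the hypothesis would require running this over a real follower graph.</p>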
<p>In any case, it&#8217;s an interesting paper, and I look forward to discussing it with my colleagues next week. Of course, I&#8217;m happy to discuss it here in the meantime. If you&#8217;re in my reading group, feel free to chime in. And if you&#8217;re not in my reading group, consider joining. We do have <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1544636">openings</a>. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
		</item>
		<item>
		<title>Social Utility, +/- 25%</title>
		<link>http://thenoisychannel.com/2011/04/14/social-utility-25/</link>
		<comments>http://thenoisychannel.com/2011/04/14/social-utility-25/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 04:19:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3550</guid>
		<description><![CDATA[I like Google&#8230; I&#8217;ve been a regular Google user since the day I first discovered its existence in 1999. Indeed, I&#8217;ve consistently found Google to be the most useful service on the web. That&#8217;s not love, but it&#8217;s a very strong +1. Moreover, I&#8217;d say that my preference for Google is an informed one. I&#8217;ve [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.businessinsider.com/heres-the-memo-telling-all-google-employees-their-2011-pay-depends-on-google-sucking-less-at-social-2011-4"><img class="alignnone" title="FAQ for Google employees about the &quot;social&quot; bonus (via Business Insider)" src="http://static3.businessinsider.com/image/4d9e557eccd1d599390f0000-915-581/multiplier-faq.jpg" alt="" width="514" height="326" /></a></p>
<h3>I like Google&#8230;</h3>
<p>I&#8217;ve been a regular Google user since the day I first discovered its existence in 1999. Indeed, I&#8217;ve consistently found Google to be the most useful service on the web. That&#8217;s not love, but it&#8217;s a very strong <a href="http://www.google.com/+1/button/">+1</a>.</p>
<p>Moreover, I&#8217;d say that my preference for Google is an informed one. I&#8217;ve given all of the major search engines a <a href="http://thenoisychannel.com/2009/06/01/banging-on-bing-a-bummer/">fair chance</a>, and even tried a fair number of <a href="http://thenoisychannel.com/2008/10/16/duck-duck-go/">obscure</a> <a href="http://thenoisychannel.com/2009/03/15/kosmix-im-impressed/">ones</a>. They all have their strengths, but none have delivered enough utility to me to justify the cognitive load of using more than one search engine for the open web.</p>
<h3>&#8230;but I don&#8217;t need Google.</h3>
<p>Nonetheless, I know that, if Google disappeared tomorrow or became <a href="http://www.mobilecrunch.com/2010/09/09/verizon-to-bing-i-choose-you/">inconvenient to access</a>, I&#8217;d be content with one of its competitors. I have no particular investment in Google beyond brand loyalty.</p>
<p>Actually, that&#8217;s not entirely true. I could easily walk away from Google search, but I&#8217;d be apoplectic if I suddenly lost access to my Gmail account &#8212; much as if I lost access to my LinkedIn or Twitter accounts. Indeed, Gmail is the only way in which Google has me locked in, but I don&#8217;t see my Gmail account as entangled with my access to Google&#8217;s other services.</p>
<p>Perhaps that&#8217;s not a bug but a feature: after all, Google trumpets the virtues of <a href="http://googleblog.blogspot.com/2009/12/meaning-of-open.html">&#8220;open&#8221;</a> and the portability of user data (including Gmail) through the <a href="http://www.dataliberation.org/">Data Liberation Front</a>. Nonetheless, it&#8217;s no secret that Google has a major case of <a href="http://abclocal.go.com/kabc/story?section=news/consumer&amp;id=8072533">Facebook envy</a>. And if <a href="http://www.businessinsider.com/heres-the-memo-telling-all-google-employees-their-2011-pay-depends-on-google-sucking-less-at-social-2011-4">rumors</a> hold, Google is now making the success of its social strategy a major component in all employee compensation.</p>
<h3>Social is Give to Get.</h3>
<p>Google critics often assert that <a href="http://www.google.com/search?q=%22google+doesn't+get+social%22">Google doesn&#8217;t get social</a>. But I think the problem isn&#8217;t so much with what Google gets as what it gives. When it comes to social, you have to give to get. That is, to get data and engagement, you have to provide social utility.</p>
<p>To start off, Google would love to know <strong>who you are</strong>. That&#8217;s why it developed <a href="http://www.google.com/support/accounts/bin/answer.py?answer=97703">Google Profiles</a> in 2007. People are more than willing to provide data about who they are, as proven by the hundreds of millions of people who create profiles on Facebook and LinkedIn. Perhaps Google was a little bit late to the game. More likely, people didn&#8217;t see enough utility in creating Google profiles. Facebook, on the other hand, helps people be found by their friends and family in a context designed for social interaction. LinkedIn offers people the opportunity to be found by people who can help them professionally: colleagues, classmates, potential employers, etc. Google didn&#8217;t give people much reason to invest effort &#8212; in fact it seems to treat Profiles as a dumping ground populated by Google&#8217;s other products, rather than a valuable piece of online real estate embedded in a living social context. Not surprisingly, users invest their efforts elsewhere.</p>
<p>Google would also love to know <strong>where you are</strong> and <strong>where you&#8217;ve been</strong> &#8212; that&#8217;s why Google created <a href="http://techcrunch.com/2009/02/04/broadcast-your-location-to-friends-with-google-latitude/">Latitude</a> in 2009. Moreover, Google developed this pioneering location-based service as a complement to Google Maps, perhaps the best product Google has produced outside of search. Given its dominance in mapping services, directions, and local search, Google should be the leader of all things local. And yet, while Latitude has flopped, Foursquare &#8212; which launched in the same year as a tiny startup after Google acquired and shut down its <a href="http://en.wikipedia.org/wiki/Dodgeball_(service)">previous incarnation</a> &#8212; succeeded in defining location-based services as a category. Before Foursquare, the idea of a service tracking your location was one that most of us associated with <a href="http://www.lojack.com/">Lo-Jack</a> and <a href="http://en.wikipedia.org/wiki/Nineteen_Eighty-Four">Big Brother</a> &#8212; if not with modern totalitarian regimes. Yet, by making a game out of &#8220;checking in&#8221; to venues, Foursquare inspired its users to willingly &#8212; and eagerly! &#8212; share and publish their whereabouts. It&#8217;s unclear whether this model will create sustained interest (cf. Mark Watkins&#8217;s analysis at <a href="http://www.readwriteweb.com/archives/2011_the_year_the_check-in_died.php">ReadWriteWeb</a>), but Foursquare&#8217;s success thus far is predicated on offering social utility in exchange for data and attention.</p>
<p>Of course, Google also wants to know <strong>what you like</strong>. That&#8217;s why Google developed <a href="http://thenoisychannel.com/2008/11/21/google-searchwiki-an-interesting-take-on-pim/">SearchWiki</a> (RIP), <a href="http://google-latlong.blogspot.com/2010/11/discover-yours-local-recommendations.html">Hotpot</a> (now <a href="http://googleblog.blogspot.com/2011/04/hotpot-is-going-places.html">merged into Places</a>), and most recently <a href="http://www.google.com/+1/button/">+1</a>. As Amazon, Facebook, Netflix, and Yelp have demonstrated, people aren&#8217;t shy about sharing their opinions publicly, given the right social context and utility. Unfortunately, Google seems to struggle with that last part. Google embedded SearchWiki in the non-social context of search &#8212; and has launched +1 the same way. It&#8217;s not at all clear what users would gain by going out of their flow to annotate search results. Hotpot may simply be a case of too little, too late &#8212; people are already trained to go to Yelp and Facebook Fan pages for subjective information about service businesses. Overall, Google has not given users a reason to believe there is significant return on their investment in sharing opinions.</p>
<h3>Collecting Data Doesn&#8217;t Count.</h3>
<p>Of course Google is able to collect a significant amount of data about users&#8217; identities through their search history, cookies, browser toolbars, and purchase history (if they use Google Checkout). Indeed, it is Google&#8217;s inference of user intent in search queries that has allowed Google to become the poster child of online advertising.</p>
<p>But collecting data is not the same as having the user volunteer it. Most users have a transactional relationship with Google, tolerating data collection and advertising in exchange for a free service. Google wants more &#8212; it wants users to invest in identities associated with their Google accounts. But Google doesn&#8217;t seem to understand that users don&#8217;t make these investments unless they receive some social or professional utility in return.</p>
<p>If it&#8217;s true that Larry Page is making &#8220;social&#8221; Google&#8217;s top <a href="http://dondodge.typepad.com/the_next_big_thing/2010/01/how-google-sets-goals-and-measures-success.html">OKR</a>, then I hope for the sake of my former colleagues that Google has learned from its past experiments.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/14/social-utility-25/feed/</wfw:commentRss>
		<slash:comments>39</slash:comments>
		</item>
		<item>
		<title>Guest Blog: Data 2.0 Conference Report</title>
		<link>http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/</link>
		<comments>http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 15:26:41 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3546</guid>
		<description><![CDATA[Note: This post was written by Scott Nicholson, a Senior Data Scientist at LinkedIn. Scott is a data and modeling geek with a passion for startups, product and user experience. His work at LinkedIn focuses on analyzing and improving user engagement and monetization. I’m happy to report back on my experience at the Data 2.0 conference, an [...]]]></description>
				<content:encoded><![CDATA[<p><object width="400" height="300"><param name="flashvars" value="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fgroups%2Fdata2con%2Fpool%2Fshow%2F&amp;page_show_back_url=%2Fgroups%2Fdata2con%2Fpool%2F&amp;group_id=1614380@N25&amp;jump_to=&amp;start_index=" /><param name="movie" value="http://www.flickr.com/apps/slideshow/show.swf?v=71649" /><param name="allowFullScreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://www.flickr.com/apps/slideshow/show.swf?v=71649" flashvars="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fgroups%2Fdata2con%2Fpool%2Fshow%2F&amp;page_show_back_url=%2Fgroups%2Fdata2con%2Fpool%2F&amp;group_id=1614380@N25&amp;jump_to=&amp;start_index=" allowfullscreen="true"></embed></object></p>
<p><em>Note: This post was written by <a href="http://www.linkedin.com/in/scottnicholsonphd">Scott Nicholson</a>, a Senior Data Scientist at LinkedIn. Scott is data and modeling geek with a passion for startups, product and user experience. His work at LinkedIn focuses on analyzing and improving user engagement and monetization.</em></p>
<p>I’m happy to report back on my experience at the <a href="http://data2con.com/">Data 2.0 conference</a>, an event organized by <a href="http://midventures.com/">midVentures</a> and targeted at entrepreneurs building products to leverage the dramatic increase in publicly and privately collected data. The conference has four main themes: what data is available, how to obtain data, how to store and access data, and how to create value from data products. For data nerds or hackers, the conference offered a delightful stream of  “you know what would be cool&#8230;” ideas.</p>
<p>The morning started off on a strong foot with a talk by <a href="http://wadhwa.com/">Vivek Wadhwa</a> on how data is going to define the next generation of successful startups in a new information age. He observed the increasing online access to data that has previously been restricted to offline access (or no access at all). He also emphasized the importance of  new sources of data, such as medical records and genome data. We need to think of social use of data beyond Twitter, Facebook and LinkedIn: for example, genome data will allow us to connect to each other in ways that helps us better understand our similarities and differences. Meanwhile, some existing data sources will become increasingly open and available to all. Wadhwa stressed the importance of leveraging the open sources of federal, state and local government data to come up with solutions to the existing closed and clunky legacy systems that governments used to generate data reports (<em>a pity that <a href="http://data.gov/">data.gov</a> and related programs may be <a href="http://www.guardian.co.uk/news/datablog/2011/apr/05/data-gov-crisis-obama">defunded</a> &#8212; DT</em>).</p>
<p>The morning keynote segued nicely into the <a href="http://data2con.com/schedule/topics-2/#WhyOpenData">panel</a> on open data sources. <a href="http://www.jaynath.com/">Jay Nath</a>, Director of CRM for the city of San Francisco, noted that, while many applications are using government data and APIs, they mostly address consumer convenience (e.g., public transit apps) rather than government efficiency. Panelists agreed that government employees have few incentives to take risks by using new technology: legacy systems might be expensive, inflexible and inefficient, but they do perform their limited function. Alluding to Eric Ries&#8217;s idea of a &#8220;<a href="http://theleanstartup.com/">lean startup</a>&#8221;, Nath suggested the concept of a &#8220;lean government&#8221; that lowered costs, sped up its operations, and avoided procurement processes by using open source technology &#8212; all in the context of providing services to its citizens.</p>
<p>The inspiring mid-day keynote by former Amazon Chief Scientist <a href="http://www.weigend.com/">Andreas Weigend</a> took a different perspective from the morning sessions: he focused on how data sharing can provide tangible value to end-users, even resulting in significant behavior change. He cited products like <a href="http://www.withings.com/en/bodyscale">tweeting weight scales</a>, <a href="http://www.fitbit.com/">FitBit</a>, and <a href="http://www.apple.com/ipod/nike/">Nike +</a> that allow people to share data about their fitness efforts, thus leading to social reinforcement for positive behaviors. I personally see this area as a great example of where data scientists and engineers can create enormous economic value and increase people’s welfare.</p>
<p>The day also featured various product launches and presentations. Here are a few that caught my attention:</p>
<ul>
<li><a href="http://micello.net/">Micello</a>: Google maps for indoors. They won the startup competition that was held in conjunction with the conference.</li>
<li><a href="https://www.tropo.com/home.jsp">Tropo</a>: API for voice calls and SMS</li>
<li><a href="http://www.datastax.com/products/brisk">DataStax Brisk</a>: Technology unifying <a href="http://hadoop.apache.org/">Hadoop</a>, <a href="http://wiki.apache.org/hadoop/Hive">Hive</a> &amp; <a href="http://cassandra.apache.org/">Cassandra</a>. A new Hadoop distribution powered by Cassandra.</li>
<li><a href="http://www.neerlife.com/">Neer</a>: always-on location awareness app from Qualcomm. Privately share location with groups and families.</li>
<li><a href="http://www.heritagehealthprize.com/c/hhp">Heritage Health Prize</a>: $3MM prize for predictive modeling around who will require hospitalization (a follow-up on their announcing the prize at <a href="http://strataconf.com/strata2011">Strata</a>)</li>
</ul>
<p>Overall, it was great to see hundreds of people exploring innovations and opportunities to use data to improve business, technology and society.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Steal These Ideas!</title>
		<link>http://thenoisychannel.com/2011/03/27/steal-these-ideas/</link>
		<comments>http://thenoisychannel.com/2011/03/27/steal-these-ideas/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 01:05:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3541</guid>
		<description><![CDATA[Talk is cheap, as the saying goes. That&#8217;s a good thing, since I am always overflowing with ideas that I have neither the time (I love my day job!) nor the money to advance. What I do have is a blog that I hope inspires readers to turn some of these ideas into reality. My [...]]]></description>
				<content:encoded><![CDATA[<p>Talk is cheap, as the saying goes. That&#8217;s a good thing, since I am always overflowing with ideas that I have neither the time (I love my <a href="http://www.linkedin.com/in/dtunkelang">day job</a>!) nor the money to advance. What I do have is a blog that I hope inspires readers to turn some of these ideas into reality.</p>
<p>My ideas are somewhat predictable, in that they all address user-centric <a href="http://en.wikipedia.org/wiki/Information_seeking">information-seeking</a> problems. Working for over a decade in this space has focused my intellectual curiosity somewhat &#8212; and of course I work on a number of these problems at <a href="http://www.linkedin.com/">LinkedIn</a>. But there are many information-seeking problems that are outside of my present or foreseeable scope.</p>
<p>Here are two ideas that I&#8217;m hoping someone will execute on so I don&#8217;t have to:</p>
<p style="padding-left: 30px;"><strong>1. Shopping: Help Me Figure Out What I Want</strong></p>
<p style="padding-left: 30px;">We&#8217;ve come a long way to improve the shopping experience, at least for utilitarian shoppers like yours truly. If I know exactly what I want, I usually find it by using Google as a gateway to Amazon, taking a bit more time if I&#8217;m feeling price-sensitive. I&#8217;d happily install a browser extension that could automatically detect product search queries and take them to my preferred shopping sites, bypassing the search results page, but that&#8217;s a minor detail of convenience (though probably not such a minor detail for the search engine companies). In any case, <a href="http://www.iva.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item search</a> for online shopping is hardly inspiring as an open problem.</p>
<p style="padding-left: 30px;"><a href="http://en.wikipedia.org/wiki/Exploratory_search">Exploratory search</a> is another story entirely. For all the work that&#8217;s been done on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, it is used almost exclusively to help people narrow search results. Progressive narrowing is great if you have a pre-established information need, but it is not the best interface if you&#8217;re hoping to evolve your information need through exploration. Instead of just &#8220;help me find what I&#8217;m looking for&#8221;, I&#8217;d also like to see more &#8220;help me figure out what I want&#8221;. I&#8217;d like to see an innovator applying faceted search to broaden queries, not just to narrow them, as well as going beyond <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a> and &#8220;related items&#8221; to create a compelling browsing experience based on semantic and <a href="http://thenoisychannel.com/2008/04/27/social-navigation/">social navigation</a>.</p>
<p style="padding-left: 30px;"><strong>2. Organizing the World&#8217;s Information: Beyond Wikipedia and Navigational Queries</strong></p>
<p style="padding-left: 30px;">If shopping online often reduces to using Google to find product pages on Amazon, then <a href="http://en.wikipedia.org/wiki/Web_search_query">informational queries</a> similarly reduce to using Google to find Wikipedia entries. Nothing against Wikipedia &#8212; I think it is one of the most extraordinary achievements of our generation &#8212; but I think of the web as a library and Wikipedia as its encyclopedia section. Google&#8217;s <a href="http://www.google.com/corporate/">mission statement</a> notwithstanding, web search engines do a poor job of organizing the rest of the world&#8217;s information, instead choosing to optimize for known-item search.</p>
<p style="padding-left: 30px;">There are countless opportunities for improvement here. Imagine if there were an interface for books, scholarly articles, patents, music, or videos that supported browsing and exploration of their content and meta-data. We&#8217;ve seen the beginnings of such an approach for individual libraries (e.g., the <a href="http://search.trln.org/">Triangle Research Libraries Network</a>), but there is so much more to do in this space. Perhaps it&#8217;s a space that is hard to monetize, but even then I&#8217;d expect philanthropists to take an interest in making the world&#8217;s knowledge and creative artifacts more accessible.</p>
<p>If you are pursuing either of these areas, I&#8217;d love to hear about it. I&#8217;m sure readers here would too. I&#8217;m also curious to learn more about innovation in the travel and personals spaces, as those are both areas that could benefit from supporting exploratory search. And if you have work in progress, please present it at the <a href="http://hcir.info/hcir-2011">HCIR workshop</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/03/27/steal-these-ideas/feed/</wfw:commentRss>
		<slash:comments>38</slash:comments>
		</item>
		<item>
		<title>LinkedIn: HCIR for Fun and Profit</title>
		<link>http://thenoisychannel.com/2011/03/19/linkedin-hcir-for-fun-and-profit/</link>
		<comments>http://thenoisychannel.com/2011/03/19/linkedin-hcir-for-fun-and-profit/#comments</comments>
		<pubDate>Sun, 20 Mar 2011 05:10:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3524</guid>
		<description><![CDATA[This afternoon, I met with a couple of Stanford seniors to advise them on a startup they&#8217;ve been developing and targeting towards mid-sized online retailers. I&#8217;d expected to spend most of the time talking about their technology and customer development strategy &#8212; and we did indeed talk about these things. But we spent most of [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/"><img class="alignnone size-full wp-image-3527" title="LinkedIn -- not just the world's best recruiting tool!" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/03/Screen-shot-2011-03-19-at-9.14.52-PM1.png" alt="" width="532" height="534" /></a></p>
<p>This afternoon, I met with a couple of Stanford seniors to advise them on a startup they&#8217;ve been developing and targeting towards mid-sized online retailers. I&#8217;d expected to spend most of the time talking about their technology and <a href="http://www.amazon.com/Four-Steps-Epiphany-Steven-Blank/dp/0976470705">customer development</a> strategy &#8212; and we did indeed talk about these things. But we spent most of the time brainstorming <em>whom</em> I knew that could best help them achieve the key milestone of landing a first customer.</p>
<p>Not surprisingly, my first step was to open up my laptop and head straight to <a href="http://www.linkedin.com/">LinkedIn</a> (I&#8217;m not only a <a href="http://www.linkedin.com/search/fpsearch?title=data+scientist&amp;currentTitle=CP&amp;searchLocationType=I&amp;countryCode=us&amp;keepFacets=keepFacets&amp;page_num=1&amp;facet_CC=1337">data scientist</a> &#8212; I&#8217;m also a <a href="http://www.linkedin.com/in/dtunkelang">member</a>!) to see who in my network might be most helpful to them at this critical stage. The students were openly impressed: despite being sharp, energetic, and remarkably business-savvy for a couple of guys not old enough to legally buy beer, they had never seen someone use LinkedIn the way I was doing in front of them &#8212; not for hiring or recruiting, but as an <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> tool to find useful professional connections.</p>
<p>I started with a search for online retail, then restricted to directors and VPs, narrowing down further to first-degree and second-degree connections. I vetted second-degree connections by looking at my paths to them, determining who would be likely to be most helpful either because they owed me a favor or because they might have their own interest in the startup&#8217;s success.</p>
<p>We then browsed through the list of <a href="http://www.internetretailer.com/top500/list/">top online retailers</a>, identifying plausible companies for them to target and then looking for my first-degree and second-degree connections not only at those companies but also at other companies in the same space. We spent over an hour fluidly going back and forth between talking and exploring on LinkedIn. In the course of this exploration, we not only produced a list of people to contact, but also arrived at a better understanding of the business strategy.</p>
<p>I&#8217;m always happy to help young entrepreneurs who represent the future of our economy, and even happier to do so using the tools my colleagues and I are constantly working to improve. But I&#8217;m surprised and a bit disheartened that the methods I used are not common knowledge, especially among people who stand to gain the most benefit from them. Perhaps, as someone who has been using LinkedIn since 2004, I take for granted that people know how to take advantage of it for professional networking. I hope that the company&#8217;s increasing visibility will make more people aware that LinkedIn is not <em>just</em> the best thing that has ever happened to recruiting.</p>
<p>Also, as an <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> advocate, I&#8217;d like to see these kinds of information-seeking tasks receive more attention from researchers and practitioners. I&#8217;ve been saying for a while that these and similar tasks are neglected by the <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> community and <a href="http://thenoisychannel.com/2008/08/07/where-google-isnt-good-enough/">not adequately addressed by Google</a>. For example, while there has been significant research effort in the area of <a href="http://scholar.google.com/scholar?hl=en&amp;q=expert+finding&amp;btnG=Search&amp;as_sdt=1%2C5&amp;as_ylo=&amp;as_vis=0">expert finding</a>, I&#8217;d like to see more effort devoted to improving the interactive process of finding experts and expertise. And <a href="http://thenoisychannel.com/2011/02/04/got-skills/">not just from LinkedIn</a>!</p>
<p>If you are doing work in this space, I hope you&#8217;ll participate in the upcoming <a href="http://hcir.info/">HCIR workshop</a> and show off your stuff.</p>
<p>In the meantime, I hope you make the most of LinkedIn, for fun and for profit. As a mentor of mine told me in my first job, it&#8217;s &#8220;network or not work&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/03/19/linkedin-hcir-for-fun-and-profit/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Practical Rant about Software Patents</title>
		<link>http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/</link>
		<comments>http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/#comments</comments>
		<pubDate>Mon, 07 Mar 2011 08:21:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3496</guid>
		<description><![CDATA[Given the controversial content of this post, I&#8217;d like to remind readers upfront that this post, like all of the contents of this blog, represents my personal opinions, and in particular does not represent the opinions of my present or former employers. I am not a lawyer, nor do I claim to have read any [...]]]></description>
				<content:encoded><![CDATA[<p><em>Given the controversial content of this post, I&#8217;d like to remind readers upfront that this post, like all of the contents of this blog, represents my personal opinions, and in particular does not represent the opinions of my present or former employers. I am not a lawyer, nor do I claim to have read any of the patents to which I directly or indirectly allude in this post. None of the below should be construed as legal advice. Finally, the material is US-centric &#8212; your national software patent policy may vary.</em></p>
<p>My feelings about software patents are a matter of public record (e.g., this <a href="http://thenoisychannel.com/2010/09/25/an-open-letter-to-the-uspto/">open letter to the USPTO</a>). As things stand today, software patents act as an innovation tax rather than as a catalyst for innovation. It may be possible to resolve the problems of software patents through aggressive reform, but it would be better to abolish software patents than to maintain the status quo.</p>
<p>My personal feelings notwithstanding, I acknowledge the reality that today&#8217;s software companies need to have defensive patent strategies. In a previous job, one of my key accomplishments was to hire a director of intellectual property. It was a difficult hire, but it happened just in time to defend against a particularly noxious patent troll. I am not at liberty to spell out the details, but I can say that we responded with a long, expensive fight that effectively quashed the patent and the lawsuit.</p>
<h3><strong>Beware Of Trolls</strong></h3>
<h2><strong><a href="http://en.wikipedia.org/wiki/Patent_troll"><img class="alignnone" title="Beware of Trolls" src="http://upload.wikimedia.org/wikipedia/commons/2/27/BewareOfTrolls.svg" alt="" width="97" height="85" /></a><br />
</strong></h2>
<p>Patent trolls, known less pejoratively as non-practicing entities (NPEs) because they do not actually sell products or services that implement the systems or methods in the patents they own, take advantage of asymmetric risk. On one hand, an NPE does not need much money to bankroll (or at least initiate) a patent infringement suit &#8212; in fact, there are law firms that will take such cases on contingency. On the other hand, the company being sued faces potentially ruinous costs. Moreover, even if a company feels certain that a lawsuit against it is baseless, the company cannot count on the imperfect and inefficient legal system to reach a fair outcome. As a result, the company has to choose between spending heavily in its own defense and settling with the NPE. Most companies opt for the less risky route and negotiate settlements, providing funds that the NPEs use to sue more companies.</p>
<p>Some people have a name for this style of asymmetric warfare &#8212; namely, <a href="http://en.wikipedia.org/wiki/Asymmetric_warfare#Asymmetric_warfare_and_terrorism">terrorism</a>. I suppose that the word terrorist is loaded enough without increasing its breadth to include patent trolls &#8212; not to mention that trolls have their <a href="http://en.wikipedia.org/wiki/Patent_troll#Criticism_of_the_term">defenders</a>. But the metaphor is a useful one. A terrorist attack inflicts an amount of damage that is much greater than the absolute cost to the terrorist, e.g., a suicide bomber who inflicts mass murder. Moreover, the threat of terrorism forces the object of that threat to choose between settling (aka negotiating with terrorists) and spending heavily on counter-terrorism efforts. As Peter Neumann notes in a <em>Foreign Affairs</em> <a href="http://www.foreignaffairs.com/articles/62276/peter-r-neumann/negotiating-with-terrorists">article</a>:</p>
<blockquote><p>The argument against negotiating with terrorists is simple: Democracies must never give in to violence, and terrorists must never be rewarded for using it. Negotiations give legitimacy to terrorists and their methods…</p>
<p>Yet in practice, democratic governments often negotiate with terrorists.</p></blockquote>
<p>There have been various attempts to address the threat of patent trolls.</p>
<p>Google litigation director Catherine Lacavera has gone <a href="http://www.bloomberg.com/apps/news?pid=newsarchive&amp;sid=ar3V._UIg9CM">on record</a> saying that Google intends to fight rather than settle patent infringement lawsuits in order to deter patent trolls. We&#8217;ll see if Google can sustain this &#8220;we don&#8217;t negotiate with terrorists&#8221; approach; I admire the resolve, but like Neumann I&#8217;m skeptical.</p>
<p><a href="http://www.articleonepartners.com/">Article One Partners</a> has built a business around crowd-sourcing patent invalidation. Clients pay for research to invalidate patents, and Article One offers bounties to anyone who contributes valuable evidence. In theory, companies can request validity analysis of their own patents to test them for robustness, but I assume that the primary application of this service is the invalidation of patents that a company sees as threats.</p>
<p><a href="http://www.rpxcorp.com/">Rational Patent (RPX)</a> has created a defensive patent pool, purchasing a large portfolio of patents and then licensing them to its member companies. Some have questioned whether this approach is &#8220;<a href="http://techcrunch.com/2008/11/24/is-rpxs-defensive-patent-aggregation-simply-patent-extortion-by-another-name/">patent extortion by another name</a>&#8221;, and indeed paying RPX for a blanket license does feel a bit like preemptively settling in bulk. But I&#8217;d be more concerned that the &#8220;over 1,500 US and international patent assets&#8221; that RPX claims to have acquired are a drop in the bucket compared to the vast number of patents that the USPTO has granted, many of dubious merit.</p>
<p>Meanwhile, patent trolldom is serious business. Former Microsoft CTO <a href="http://en.wikipedia.org/wiki/Nathan_Myhrvold">Nathan Myhrvold</a> created <a href="http://www.intellectualventures.com/">Intellectual Ventures</a> to &#8220;invest both expertise and capital in the development and monetization of inventions and patent portfolios.&#8221; The company has only filed one <a href="http://bits.blogs.nytimes.com/2010/12/08/intellectual-ventures-goes-to-court/">lawsuit</a> so far, but Mike Masnick <a href="http://www.techdirt.com/articles/20100217/1853298215.shtml">claims</a> that it has used over a thousand shell companies to conduct stealth lawsuits.</p>
<p>Unfortunately, the proliferation of lawsuits by software patent trolls suggests that the economic incentives encourage such suits. If every company could sustain a &#8220;<a href="http://en.wikiquote.org/wiki/Galaxy_Quest">Never give up, never surrender!</a>&#8221; approach, patent trolls would eventually go away, but it is unlikely that companies would be willing to assume the short-term risks that such an approach entails.</p>
<p>Moreover, this approach only works if everyone participates, requiring that every company forgo the competitive advantage it could enjoy from being the only company among its competitors to appease the trolls. This is a classic <a href="http://en.wikipedia.org/wiki/Tragedy_of_the_commons">tragedy of the commons</a>. I&#8217;m hopeful that we&#8217;ll eventually implement sensible patent reform in the United States, but I expect it will take a long time to overcome the entrenched interests that support the status quo.</p>
<h3><strong>It&#8217;s Not Just The Trolls</strong></h3>
<p><a href="http://bits.blogs.nytimes.com/2010/03/04/an-explosion-of-mobile-patent-lawsuits/"><img class="alignnone" title="An Explosion of Mobile Patent Lawsuits" src="http://graphics8.nytimes.com/images/2010/03/03/technology/bits-suepatent2/bits-suepatent2-blogSpan.jpg" alt="" width="138" height="184" /></a></p>
<p>But NPEs are not the only cause for concern. Many established companies, including some technology leaders, are not averse to using patent lawsuits as part of their business strategy. The mobile device and software space is a particularly <a href="http://bits.blogs.nytimes.com/2010/03/04/an-explosion-of-mobile-patent-lawsuits/">popular arena</a> for patent litigation, the most notable being <a href="http://www.scribd.com/doc/35810897/Oracle-Google-Complaint">Oracle&#8217;s lawsuit against Google</a> claiming that Android infringes on patents related to Java. The stakes are extraordinary, dwarfing even the <a href="http://en.wikipedia.org/wiki/NTP,_Inc.#RIM_patent_infringement_litigation">$612.5M that RIM paid NTP</a> in order to avoid a complete shutdown of the BlackBerry service (ironically, at least some of the patents involved have since been rejected by the patent office after re-examination).</p>
<p>Patent lawsuits can also be a way for larger companies to bully smaller ones. For example, a couple of entrepreneurs at visual search startup <a href="http://thenoisychannel.com/2009/12/26/r-i-p-modista/">Modista</a> were forced to shut down their company because of a lawsuit by Like.com, a more established player in the space which was ultimately <a href="http://www.rev2.org/2010/08/16/google-acquires-like-com-visual-search/">acquired by Google</a>. Note: although I was at Google at the time, I have no inside knowledge of the acquisition, nor whether there is any truth to the speculation that Google acquired the company for its patents.</p>
<h3>Defensive Patenting</h3>
<p><a href="http://en.wikipedia.org/wiki/The_Art_of_War"><img class="alignnone" title="Sun Tzu - The Art of War" src="http://upload.wikimedia.org/wikipedia/commons/9/94/Bamboo_book_-_binding_-_UCR.jpg" alt="" width="145" height="168" /></a></p>
<p>Moral considerations aside, the above stories make it clear that defensive patent strategy isn&#8217;t just about NPEs. In fact, the approach many software companies take to defensive patenting is to assemble a trove of patents that are useful for countersuits and thus serve as a deterrent. Back to military metaphors, it&#8217;s similar to countries developing nuclear weapons (a popular metaphor for patents in general) in accordance with the doctrine of <a href="http://en.wikipedia.org/wiki/Mutual_assured_destruction">mutual assured destruction</a>.</p>
<p>Companies that follow a defensive patent strategy typically implement a process for capturing intellectual property. Scientists and engineers file invention disclosures, a committee reviews these for patentability, and a law firm translates the invention disclosures into patent filings. The filings then go through the meat grinder of patent prosecution and eventually are extruded as patents.</p>
<p>It all sounds great in theory &#8212; in fact, I have seen executives whose main worry is educating scientists and engineers about patents and providing the right incentives to encourage them to write and submit invention disclosures. Admittedly, it can be difficult to integrate intellectual property capture into the process and culture of a software company. But I think there are two much bigger issues.</p>
<p>First, it takes several years to obtain a patent. Indeed, the <a href="http://www.uspto.gov/dashboards/patents/main.dashxml">USPTO dashboard</a> shows that it takes two years just to get an initial response from the patent office. Thus a defensive patenting strategy requires significant advance planning: any patents filed today are unlikely to be useful deterrents until at least 2014. Given the rapid pace of the software industry, this delay is very significant. Moreover, startups are especially vulnerable in their first few years.</p>
<p>Second, intellectual property capture processes are inherently optimized for offensive (i.e., don&#8217;t copy my invention or I&#8217;ll sue you) rather than defensive (i.e., don&#8217;t sue me) patent strategy. Consider Google&#8217;s defensive position with respect to Oracle in the aforementioned lawsuit. Google has a relatively small patent portfolio, but it has obtained patents for some of its major innovations, such as <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>. Let&#8217;s put aside <a href="http://www.dbms2.com/2010/02/11/google-mapreduce-patent/">questions about the validity of the MapReduce patent</a> &#8212; especially since patents enjoy the <a href="http://www.uspto.gov/web/offices/pac/mpep/documents/appxl_35_U_S_C_282.htm">presumption of validity</a>. The bigger question is to whom such a patent serves as a deterrent against patent lawsuits. It may very well deter <a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a> users, which include Google arch-rival <a href="http://www.facebook.com/notes/facebook-engineering/looking-at-the-code-behind-our-three-uses-of-apache-hadoop/468211193919">Facebook</a>. But, as far as I know, Oracle is not vulnerable on this front. FOSS Patents blogger Florian Mueller did an <a href="http://fosspatents.blogspot.com/2011/01/google-is-patently-too-weak-to-protect.html">analysis</a> and concluded that Google&#8217;s patents are not an effective deterrent. Indeed, the fact that Google has not counter-sued Oracle using its own patents is at least consistent with this analysis.</p>
<p>What if Google were to invest in obtaining (i.e., purchasing) a collection of broad patents that had to do with relational databases? Such patents could have nothing to do with Google&#8217;s areas of innovation and nonetheless serve as an effective deterrent against lawsuits from relational database companies like Oracle. Even if the patents were not robust, they would still have some value as deterrents because of their presumption of validity and the aforementioned inefficiency of the legal system.</p>
<p>In general, the most valuable defensive patents are those that you believe your competitors (or anyone else who might have an interest in suing you) are already infringing. Even if those patents would be unlikely to survive re-examination, the re-examination process is long and expensive, and even the most outrageous of patents enjoys the presumption of validity.</p>
<h3>Everybody Into The Pool</h3>
<p><a href="http://www.youtube.com/watch?v=kB2Vuc2W0U0"><img class="alignnone" title="Flintstones - Everybody into the Pool" src="http://i4.ytimg.com/vi/kB2Vuc2W0U0/default.jpg" alt="" width="120" height="90" /></a></p>
<p>A <a href="http://en.wikipedia.org/wiki/Patent_pool">patent pool</a> is a consortium of at least two companies that agree to cross-license each other&#8217;s patents &#8212; a sort of mutual non-aggression pact. But perhaps companies that only believe in the defensive use of patents should take a more aggressive approach to patent pooling. Following the example of <a href="http://en.wikipedia.org/wiki/NATO">NATO</a>, they could create an alliance in which they agree to mutual defense in response to an attack by any external party. I don&#8217;t know if such an approach would be viewed as anti-competitive, but it does strike me as a cost-effective alternative to the current approach for defensive patenting.</p>
<p>And, as with most ideas, this one is hardly original. In 1993, Autodesk founder <a href="http://en.wikipedia.org/wiki/John_Walker_(programmer)">John Walker</a> published &#8220;<a href="http://www.fourmilab.ch/autofile/www/chapter2_105.html">PATO: Collective Security In the Age of Software Patents</a>&#8221;, in which he proposed:</p>
<blockquote><p>The basic principle of NATO is that an attack on any member is considered an attack on all members. In PATO it works like this&#8211;if any member of PATO is alleged with infringement of a software patent by a non-member, then that member may counter-sue the attacker based on infringement of any patent in the PATO cross-licensing pool, regardless of what member contributed it. Once a load of companies and patents are in the pool, this will be a deterrent equivalent to a couple thousand MIRVs in silos&#8211;odds are that any potential plaintiff will be more vulnerable to 10 or 20 PATO patents than the PATO member is to one patent from the aggressor. Perhaps the suit will just be dropped and the bad guy will decide to join PATO&#8230;.</p>
<p>Since PATO is chartered to promote the free exchange and licensing of software patents, members do not seek revenue from their software patents&#8211;only mutual security. Thus, anybody can join PATO, even individual programmers who do not have a patent to contribute to the pool&#8211;they need only pay the nominal yearly dues and adhere to the treaty&#8211;that any software patents they are granted will go in the pool and that they will not sue any other PATO member for infringement of a software patent.</p></blockquote>
<p>It&#8217;s been almost two decades, but perhaps PATO is an idea whose time has come. And, even if a collective effort fails, individual companies might do well to focus less on intellectual property capture and more on collecting the kinds of nuisance patents currently favored by trolls. After all, the best defense is the credible threat of a good offense.</p>
<h3>Conclusion</h3>
<p>Even if you hate software patents, you can&#8217;t afford to ignore them if you are in the software industry. And I&#8217;m well aware that not everyone shares my view of software patents. But I hope those who do find useful advice in the above discussion. I&#8217;d love to see the software industry move beyond this innovation tax.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/feed/</wfw:commentRss>
		<slash:comments>34</slash:comments>
		</item>
		<item>
		<title>Life, the Universe, and SEO Revisited</title>
		<link>http://thenoisychannel.com/2011/02/26/life-the-universe-and-seo-revisited/</link>
		<comments>http://thenoisychannel.com/2011/02/26/life-the-universe-and-seo-revisited/#comments</comments>
		<pubDate>Sat, 26 Feb 2011 19:46:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3488</guid>
		<description><![CDATA[A couple of years ago, I wrote a post entitled &#8220;Life, the Universe, and SEO&#8221; in which I considered Google&#8217;s relationship with the search engine optimization (SEO) industry. Specifically, I compared it to the relationship that Deep Thought, the computer in Douglas Adams&#8217;s Hitchhikers Guide to the Galaxy, has with the Amalgamated Union of Philosophers, [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.naderlibrary.com/hitchhiker.screen2.htm"><img class="alignnone" title=" The Amalgamated Union of Philosophers, Sages, Luminaries and Other Thinking Persons" src="http://www.naderlibrary.com/ahitchscreen81.jpg" alt="" width="512" height="384" /></a></p>
<p>A couple of years ago, I wrote a post entitled &#8220;<a href="http://thenoisychannel.com/2008/11/24/life-the-universe-and-seo/">Life, the Universe, and SEO</a>&#8221; in which I considered Google&#8217;s relationship with the search engine optimization (SEO) industry. Specifically, I compared it to the relationship that <a href="http://en.wikipedia.org/wiki/Minor_characters_from_The_Hitchhiker's_Guide_to_the_Galaxy#Deep_Thought">Deep Thought</a>, the computer in Douglas Adams&#8217;s <em><a href="http://en.wikipedia.org/wiki/The_Hitchhiker's_Guide_to_the_Galaxy">The Hitchhiker&#8217;s Guide to the Galaxy</a></em>, has with the Amalgamated Union of Philosophers, Sages, Luminaries and Other Thinking Persons.</p>
<p>Interestingly, both SEO and union protests have been front-page news of late. I&#8217;ll focus on the former.</p>
<p>Three recent incidents brought mainstream attention to the SEO industry:</p>
<ul>
<li>Two weeks ago, Google head of web spam Matt Cutts <a href="http://www.nytimes.com/2011/02/13/business/13search.html">told the New York Times</a> that Google was engaging in a &#8220;corrective action&#8221; that penalized retailer J. C. Penney&#8217;s search results because the company had engaged in SEO practices that violated Google&#8217;s guidelines. For months before the action (which included the holiday season), J. C. Penney was performing exceptionally well in a broad collection of Google searches, including such queries as [<a href="http://www.google.com/search?q=dresses">dresses</a>], [<a href="http://www.google.com/search?q=bedding">bedding</a>], [<a href="http://www.google.com/search?q=area+rugs">area rugs</a>], [<a href="http://www.google.com/search?q=skinny+jeans">skinny jeans</a>], [<a href="http://www.google.com/search?q=home+decor">home decor</a>], [<a href="http://www.google.com/search?q=comforter+sets">comforter sets</a>], [<a href="http://www.google.com/search?q=furniture">furniture</a>], [<a href="http://www.google.com/search?q=tablecloths">tablecloths</a>], and [<a href="http://www.google.com/search?q=grommet+top+curtains">grommet top curtains</a>]. As I write this blog post, I do not see results from <a href="http://www.jcpenney.com/">jcpenney.com</a> on the first result page for any of these search queries.</li>
<li>This past Thursday, online retailer Overstock.com <a href="http://online.wsj.com/article/SB10001424052748704520504576162753779521700.html">reported to the Wall Street Journal</a> that Google was penalizing them because of Overstock&#8217;s now discontinued practice of rewarding students and faculty members with discounts in exchange for linking to Overstock pages from their university web pages. Before the penalty, these links were helping Overstock show up at the top of result sets for queries like [<a href="http://www.google.com/search?q=bunk+beds">bunk beds</a>] and [<a href="http://www.google.com/search?q=gift+baskets">gift baskets</a>]. As I write this blog post, I do not see results from <a href="http://www.overstock.com/">overstock.com</a> on the first result page for either of these search queries.</li>
<li>That same day, Google announced, via an <a href="http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html">official blog post</a> by Amit Singhal (Google&#8217;s head of core ranking) and Matt Cutts, a change that, according to their analysis, noticeably impacts 11.8% of Google search queries. In their words: &#8220;This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.&#8221;</li>
</ul>
<p>Of course, Google is always working to improve search quality and stay at least one step ahead of those who attempt to reverse-engineer and game its ranking of results. But it&#8217;s quite unusual to see so much public discussion of ranking changes in such a short time period.</p>
<p>Granted, there is a growing chorus in the blogosphere bemoaning the <a href="http://dashes.com/anil/2011/01/threes-a-trend-the-decline-of-google-search-quality.html">decline of Google&#8217;s search quality</a>. Much of it focuses on &#8220;<a href="http://en.wikipedia.org/wiki/Content_farm">content farms</a>&#8221; that seem to be the target of Google&#8217;s latest update. Perhaps Google&#8217;s new public assertiveness is a reaction to what it sees as unfair press. Indeed, Google&#8217;s recent <a href="http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/">public spat with Bing</a> would be consistent with a more assertive PR stance.</p>
<p>But what I find most encouraging is Google&#8217;s recent release of a Chrome browser extension that allows users to <a href="http://googleblog.blogspot.com/2011/02/new-chrome-extension-block-sites-from.html">create personal site blocklists</a> that are reported to Google. Some may see this as a reincarnation of <a href="http://thenoisychannel.com/2008/11/21/google-searchwiki-an-interesting-take-on-pim/">SearchWiki</a>, an ill-conceived and short-lived feature that allowed searchers to annotate and re-order results. But filtering out entire sites for all searches offers users a much greater return on investment than demoting individual results for specific searches.</p>
<p>Of course, I&#8217;d love to see user control taken <a href="http://thenoisychannel.com/2009/01/08/google-tech-talk-reconsidering-relevance/">much further</a>. And I wonder if efforts like personal blocklists are the beginning of Amit offering me a more positive answer to the <a href="http://thenoisychannel.com/2008/04/08/qa-with-amit-singhal-2/">question</a> I asked him back in 2008 about relevance approaches that relied on transparent design rather than obscurity.</p>
<p>I&#8217;m a realist: I recognize that many site owners are competing for users&#8217; attention, that most users are lazy, and that Google wants to optimize search quality subject to these constraints. I also don&#8217;t think that anyone today threatens Google with the promise of better search quality (and yes, I&#8217;ve tried <a href="http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/">Blekko</a>).</p>
<p>Perhaps the day is in sight when <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a> (HCIR) offers a better alternative to the organization of web search results than the black-box ranking that fuels the SEO industry. But I&#8217;ve been waiting for that long enough to not be holding my breath. Instead, I&#8217;m encouraged to see a growing recognition that today&#8217;s approaches are an endless game of <a href="http://en.wikipedia.org/wiki/Whac-A-Mole">Whac-A-Mole</a>, and I&#8217;m delighted that at least one of the improvements on the table takes a realistic approach to putting more power in the hands of users.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/26/life-the-universe-and-seo-revisited/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Life&#8217;s a Beach</title>
		<link>http://thenoisychannel.com/2011/02/14/lifes-a-beach/</link>
		<comments>http://thenoisychannel.com/2011/02/14/lifes-a-beach/#comments</comments>
		<pubDate>Tue, 15 Feb 2011 00:48:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3484</guid>
		<description><![CDATA[Heading to Punta Cana for a week. Feel free to keep writing great comments &#8212; will catch up when I get back!]]></description>
				<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/B%C3%A1varo"><img class="alignnone" title="Punta Cana - Bávaro Beach" src="http://upload.wikimedia.org/wikipedia/commons/7/78/Bavaro.jpg" alt="" width="500" height="375" /></a></p>
<p>Heading to <a href="http://en.wikipedia.org/wiki/B%C3%A1varo">Punta Cana</a> for a week. Feel free to keep writing great <a href="http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/#comments">comments</a> &#8212; will catch up when I get back!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/14/lifes-a-beach/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google vs. Bing: A Tweetle Beetle Battle Muddle</title>
		<link>http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/</link>
		<comments>http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/#comments</comments>
		<pubDate>Sun, 06 Feb 2011 00:26:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3474</guid>
		<description><![CDATA[Unless you&#8217;ve been living in a cone of silence, you&#8217;ve probably heard about the epic war of words between Google and Bing. But just in case, here&#8217;s a quick summary: Amit Singhal, Google Fellow: &#8220;Microsoft’s Bing uses Google search results—and denies it&#8220;: Bing is using some combination of: Internet Explorer 8, which can send data [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://fc09.deviantart.net/fs4/i/2004/218/b/e/tweetle_beetle_battle.jpg"><img class="alignnone" title="tweetle beetle bottle puddle battle muddle" src="http://fc09.deviantart.net/fs4/i/2004/218/b/e/tweetle_beetle_battle.jpg" alt="" width="528" height="408" /></a></p>
<p>Unless you&#8217;ve been living in a <a href="http://en.wikipedia.org/wiki/Cone_of_Silence">cone of silence</a>, you&#8217;ve probably heard about the epic war of words between Google and Bing. But just in case, here&#8217;s a quick summary:</p>
<p>Amit Singhal, Google Fellow: &#8220;<a href="http://googleblog.blogspot.com/2011/02/microsofts-bing-uses-google-search.html ">Microsoft’s Bing uses Google search results—and denies it</a>&#8221;:</p>
<blockquote><p>Bing is using some combination of:</p>
<ul>
<li>Internet Explorer 8, which can send data to Microsoft via its <a href="http://www.microsoft.com/windows/internet-explorer/privacy.aspx">Suggested Sites</a> feature</li>
<li>the Bing Toolbar, which can send data via Microsoft’s <a href="http://www.microsoft.com/products/ceip/EN-US/default.mspx">Customer Experience Improvement Program</a></li>
</ul>
<p>or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.</p></blockquote>
<p>Harry Shum, Corporate Vice President, Bing: &#8220;<a href="http://www.bing.com/community/site_blogs/b/search/archive/2011/02/01/thoughts-on-search-quality.aspx">Thoughts on search quality</a>&#8221;:</p>
<blockquote><p>We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.</p></blockquote>
<p>Yusuf Mehdi, Senior Vice President, Online Services Division, Bing: &#8220;<a href="http://www.bing.com/community/site_blogs/b/search/archive/2011/02/02/setting-the-record-straight.aspx">Setting the record straight</a>&#8221;:</p>
<blockquote><p>Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results.  What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.</p></blockquote>
<p>Matt Cutts, Head of Webspam, Google: &#8220;<a href="http://www.mattcutts.com/blog/google-bing/ ">My thoughts on this week’s debate</a>&#8221;:</p>
<blockquote><p>Something I’ve heard smart people say is that this could be due to generalized clickstream processing rather than code that targets Google specifically. I’d love if Microsoft would clarify that, but <a href="http://news.ycombinator.com/item?id=2168332">at least one example has surfaced</a> in which Microsoft was targeting Google’s urls specifically. The paper is titled <a href="http://aclweb.org/anthology/P/P10/P10-1028.pdf">Learning Phrase-Based Spelling Error Models from Clickthrough Data</a> and here’s some of the relevant parts:</p>
<p style="padding-left: 30px;">The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser <em>[I assume this is Internet Explorer. --Matt]</em> …. In our experiments, we “reverse-engineer” the parameters from the URLs of these [query formulation] sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs.”</p>
<p>This paper very much sounds like Microsoft reverse engineered which specific url parameters on Google corresponded to a spelling correction. Figure 1 of that paper looks like Microsoft used specific Google url parameters such as “&amp;spell=1” to extract spell corrections from Google. Targeting Google deliberately is quite different than using lots of clicks from different places.</p></blockquote>
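<p>To make the mechanism in Matt&#8217;s quote concrete, here is a minimal Python sketch of how query-correction pairs could be pulled out of a sequence of search URLs. It is an illustration under stated assumptions, not the method used in the paper or by Bing: the session data is invented, the helper name <code>correction_pairs</code> is hypothetical, and the parameter names <code>q</code> and <code>spell</code> are taken from the URLs quoted in this post rather than from any documented interface.</p>

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical session: the user typed a misspelled query, then clicked
# the engine's spelling suggestion. The parameter names "q" and "spell"
# are assumptions for illustration, not a documented API.
BASE = "http://www.google.com/search?"
session = [
    BASE + urlencode({"q": "grommet top curtians"}),
    BASE + urlencode({"q": "grommet top curtains", "spell": "1"}),
]


def correction_pairs(urls):
    """Return (typed, corrected) pairs from consecutive search URLs."""
    pairs = []
    previous_query = None
    for url in urls:
        params = parse_qs(urlsplit(url).query)
        query = params.get("q", [None])[0]
        # spell=1 is assumed to mark a click on the suggested spelling.
        if params.get("spell") == ["1"] and previous_query and query:
            pairs.append((previous_query, query))
        previous_query = query
    return pairs


print(correction_pairs(session))
```

<p>For the two-URL session above, the sketch yields a single pair: the misspelled query followed by its correction. The point is only that a few lines of URL parsing are enough to turn one engine&#8217;s clickstream into training data for another.</p>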
<p>Let me start by saying that these are very serious words from very serious people.</p>
<p>Amit and Matt, both of whom I know personally, are not just two of the most prominent Google employees &#8212; they have a deep personal investment in Google&#8217;s search quality. Amit is personally responsible for much of Google&#8217;s web search ranking algorithm, and Matt is surely the person whom spammers (and many SEO consultants) most love to hate. There is no question in my mind that the emotion both of them are expressing is sincere.</p>
<p>I haven&#8217;t met Harry or Yusuf, but I have no reason to doubt their own sincerity &#8212; especially since everything they are saying seems consistent with the facts &#8212; in fact, consistent with the substantive parts of Google&#8217;s allegations. Indeed, the facts don&#8217;t really seem to be in dispute. And more generally, I&#8217;ve met some of the folks who lead the Bing team (like <a href="http://www.jopedersen.com/">Jan Pedersen</a>), and, like Matt, I believe they are thoughtful and sincere and are devoted to building a great search engine of their own.</p>
<p>The debate is not about the facts. Rather, it&#8217;s about what is right and wrong. I will try to summarize the two sides&#8217; positions without editorializing.</p>
<p>Bing is claiming that:</p>
<ul>
<li>Users have a right to do as they please with their own clickthrough data, which includes data from Google search sessions.</li>
<li>Bing toolbar users opted in to share this clickthrough data with Bing.</li>
<li>By using this clickthrough data, Bing creates value for users.</li>
</ul>
<p>Google is claiming that:</p>
<ul>
<li>Bing&#8217;s specific targeting of Google clickthrough data amounts to copying Google and is wrong.</li>
<li>Bing toolbar users are not necessarily aware that they are complicit in this behavior.</li>
<li>Bing is disingenuous in understating how much it benefits from Google as a signal.</li>
</ul>
<p>What do I think?</p>
<p>I agree with Bing that users have the right to do as they please with clickthrough data. I&#8217;d think Google would agree too, given that Google wrote the sermon on &#8220;<a href="http://googleblog.blogspot.com/2009/12/meaning-of-open.html">the meaning of open</a>&#8221;:</p>
<blockquote><p>Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.</p></blockquote>
<p>I agree with all three of the points I listed as Google&#8217;s claims, except for the assertion that Bing&#8217;s behavior is wrong. It&#8217;s up to users if they want to help Bing compete with Google. Do users know that they&#8217;re doing so? Probably not. But would they stop doing so if they did? I doubt it. I can&#8217;t see why most users would have a dog in this fight &#8212; and in fact, it may be in users&#8217; interest to help Bing be more competitive.</p>
<p>I do think Bing should be forthright about what it is doing &#8212; and how much this user-provided data from Google search sessions is contributing to its own quality improvements. Bing can, of course, keep this information secret, but I&#8217;d think that Bing would want to defend its reputation as an innovator &#8212; especially as the David in a David vs. Goliath fight.</p>
<p>But I also think that Google should be careful with its accusations. Accusing Bing of not being innovative is one thing, and that accusation, backed by concrete examples, is probably enough to score points. But implying that Google owns its users&#8217; clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.</p>
<p>I&#8217;m curious to hear what others here think. It&#8217;s been a while since I could freely express opinions about Google and Bing, so I&#8217;m delighted to have such a hot controversy to incite discussion. Because everyone enjoys a <a href="http://en.wikipedia.org/wiki/Fox_in_Socks">muddle puddle tweetle poodle beetle noodle bottle paddle battle</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/feed/</wfw:commentRss>
		<slash:comments>52</slash:comments>
		</item>
		<item>
		<title>Got Skills?</title>
		<link>http://thenoisychannel.com/2011/02/04/got-skills/</link>
		<comments>http://thenoisychannel.com/2011/02/04/got-skills/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 05:53:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3463</guid>
		<description><![CDATA[Last October, a certain blogger said: LinkedIn needs to implement some kind of concept extraction to provide a useful topic facet (something I’d also love to see for their regular people search). This is a challenging information extraction problem, especially for the open web, but I also know from experience that it is tractable within a domain. [...]]]></description>
				<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/skills/skill/Information_Retrieval"><img class="alignnone size-full wp-image-3466" title="LinkedIn Skills: Information Retrieval" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-9.10.05-PM1.png" alt="" width="507" height="589" /></a></p>
<p>Last October, <a href="http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/">a certain blogger said</a>:</p>
<blockquote><p>LinkedIn needs to implement some kind of concept extraction to provide a useful topic facet (something I’d also love to see for their regular people search). This is a challenging information extraction problem, especially for the open web, but I also know from <a href="http://www.endeca.com/">experience</a> that it is tractable within a domain. Given LinkedIn’s professional focus, I believe this is a problem they can and should tackle.</p></blockquote>
<p>Shortly after writing that post, I interviewed at LinkedIn and met <a href="http://www.linkedin.com/in/peterskomoroch">Pete Skomoroch</a>, who showed me an early preview of the work his team was doing to make skills a <a href="http://en.wikipedia.org/wiki/Faceted_search">facet</a> for exploring the space of LinkedIn member profiles. That demo made a strong impression on me, giving me a taste of the great products LinkedIn&#8217;s data scientists were working on in the lab.</p>
<p>And now I&#8217;m delighted that everyone can try out the beta launch of <a href="http://www.linkedin.com/skills/">LinkedIn Skills</a>, which was announced today at O&#8217;Reilly&#8217;s <a href="http://strataconf.com/strata2011">Strata 2011</a> conference on Big Data.</p>
<p>As Pete says in his <a href="http://blog.linkedin.com/2011/02/03/linkedin-skills/">blog post</a>:</p>
<blockquote><p>If you search for a particular skill, we’ll surface key people within that community, show you the top locations, related companies, relevant jobs, and groups where you can interact with like minded professionals.  You’ll also be able to explore similar skills and compare their growth relative to each other.</p></blockquote>
<p>I encourage you to check it out &#8212; whether you&#8217;re looking for experts on <a href="http://www.linkedin.com/skills/skill/Hadoop">Hadoop</a>, <a href="http://www.linkedin.com/skills/skill/Cheese">cheese</a>, or anything else! It&#8217;s a beta, so I&#8217;m sure you&#8217;ll find rough edges; but I hope it gives you a sense of how LinkedIn&#8217;s data can enable an incredibly powerful and useful <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> experience.</p>
<p><a href="http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/">No forward-looking statements</a>, except to say that it only gets better from here!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/04/got-skills/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Be Vewy Vewy Quiet</title>
		<link>http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/</link>
		<comments>http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 18:52:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3456</guid>
		<description><![CDATA[While my blog has always been and will always be a personal one, I do operate under certain constraints as someone whose subject matter relates strongly to his professional interests. I deeply appreciate how long-time readers have respected the balancing act I sometimes have to make as both an independent individual and an employee. Right [...]]]></description>
				<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-3457" title="&quot;Be vewy vewy quiet, I'm hunting wabbits.&quot;" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/01/elmer-221x300.jpg" alt="" width="155" height="210" /></p>
<p>While my blog has always been and will always be a <a href="http://thenoisychannel.com/2008/12/10/this-is-not-a-corporate-blog/">personal</a> one, I do operate under certain constraints as someone whose subject matter relates strongly to his professional interests. I deeply appreciate how long-time readers have respected the balancing act I sometimes have to make as both an independent individual and an employee.</p>
<p>Right now, that means I must respect the conditions of my employer&#8217;s <a href="http://www.sec.gov/answers/quiet.htm">quiet period</a> &#8212; and I will do so very conservatively (e.g., no Playboy interviews). I apologize if the content of this blog suffers in the interim, but I hope you understand my need to be cautious.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Internship Opportunities at LinkedIn</title>
		<link>http://thenoisychannel.com/2011/01/19/internship-opportunities-at-linkedin/</link>
		<comments>http://thenoisychannel.com/2011/01/19/internship-opportunities-at-linkedin/#comments</comments>
		<pubDate>Wed, 19 Jan 2011 04:47:03 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3450</guid>
		<description><![CDATA[Do you love big data? Do you enjoy applying your skills in data mining, machine learning, information retrieval and data visualization? Are you a hands-on implementer who can turn your ideas into reality, whether in Java or Python? Are you turned on by NoSQL technologies like Hadoop, Pig, and Voldemort? And one last question&#8230;are you looking for [...]]]></description>
				<content:encoded><![CDATA[<p>Do you love <a href="http://thenoisychannel.com/2011/01/04/so-you-like-big-data/">big data</a>? Do you enjoy applying your skills in data mining, machine learning, information retrieval and data visualization? Are you a hands-on implementer who can turn your ideas into reality, whether in Java or Python? Are you turned on by <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a> technologies like <a href="http://hadoop.apache.org/">Hadoop</a>, <a href="http://pig.apache.org/">Pig</a>, and <a href="http://project-voldemort.com/">Voldemort</a>?</p>
<p>And one last question&#8230;are you looking for an exciting internship opportunity this summer? Then you&#8217;ve come to the right place at the right time: <a href="http://www.linkedin.com/">LinkedIn</a> is looking for a few good interns for summer 2011! You can find more details <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1354912">here</a> or go directly to the <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1354912">application form</a>.</p>
<p>If you are interested, I encourage you to act quickly, since we are already interviewing candidates.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/19/internship-opportunities-at-linkedin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dare To Dream</title>
		<link>http://thenoisychannel.com/2011/01/17/dare-to-dream/</link>
		<comments>http://thenoisychannel.com/2011/01/17/dare-to-dream/#comments</comments>
		<pubDate>Mon, 17 Jan 2011 23:20:33 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3444</guid>
		<description><![CDATA[&#8220;If a man hasn&#8217;t discovered something that he will die for, he isn&#8217;t fit to live.&#8221; Martin Luther King, Jr. said these words at a speech in Detroit on June 23, 1963. Less than five years later, he died for the cause to which he devoted his life: the advancement of civil rights in the [...]]]></description>
				<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/PbUtL_0vAJk?fs=1&amp;hl=en_US&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/PbUtL_0vAJk?fs=1&amp;hl=en_US&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p><em>&#8220;If a man hasn&#8217;t discovered something that he will die for, he isn&#8217;t fit to live.&#8221;</em></p>
<p><a href="http://en.wikipedia.org/wiki/Martin_Luther_King,_Jr.">Martin Luther King, Jr.</a> said these words at a <a href="http://www.mlkonline.net/detroit.html">speech in Detroit</a> on June 23, 1963. Less than five years later, he died for the cause to which he devoted his life: the advancement of civil rights in the United States and around the world through civil disobedience and other nonviolent resistance.</p>
<p>Today, as Americans commemorate Dr. King&#8217;s birthday, there are many ways we can honor his memory and build on his legacy. As much as King advanced the civil rights movement, there is still much to be done to fulfill his dream.</p>
<p>But I&#8217;d like to go back to the quote from his speech in Detroit. King&#8217;s words reveal a truth even deeper than his struggle for civil rights. They demand that we approach life with passion, that we live to do something more than pass the time.</p>
<p>In the face of pressing day-to-day responsibilities, it is easy to fall into a reactive rhythm, doing what we have to do and then using what time is left to escape into oblivion. For many of us, passion may feel like a nice-to-have, something to think about after we&#8217;ve cleared out our queues and gotten a full night of sleep &#8212; only to wake up and find that the queue is full again. It is easy to go through life like <a href="http://en.wikipedia.org/wiki/Sisyphus">Sisyphus</a>, sweating profusely as we roll our boulders but lacking the intellectual ambition to question why we make those efforts.</p>
<p>Today, the least we can do to honor the memory of Martin Luther King, Jr. is to reflect on his personal passion to leave the world better than he found it. Hopefully none of us will ever have to make the sacrifice that he made to realize his dream. But if we do not dare to dream at all &#8212; if we are not passionate and ambitious about what we do &#8212; then, indeed, we are not fit to live.</p>
<p>Dare to dream &#8212; and live to make that dream a reality.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/17/dare-to-dream/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/17/dare-to-dream/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quo Vadis, Quora?</title>
		<link>http://thenoisychannel.com/2011/01/09/quo-vadis-quora/</link>
		<comments>http://thenoisychannel.com/2011/01/09/quo-vadis-quora/#comments</comments>
		<pubDate>Sun, 09 Jan 2011 22:14:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3433</guid>
		<description><![CDATA[I know, everyone is sick of hearing about Quora, the community question answering site that is the darling of the blogosphere, and perhaps you fled here from TechCrunch hoping for something different. If so, I apologize. And if you want to read something else, I encourage you to use either the random post widget I [...]]]></description>
				<content:encoded><![CDATA[<p>I know, everyone is sick of hearing about <a href="http://www.quora.com/">Quora</a>, the community question answering site that is the darling of the blogosphere, and perhaps you fled here from <a href="http://techcrunch.com/tag/quora/">TechCrunch</a> hoping for something different. If so, I apologize. And if you want to read something else, I encourage you to use either the random post widget I recently added to the right-hand sidebar or the <a href="http://thenoisychannel.com/2011/01/07/enabling-exploratory-search-with-dhiti/">exploration widget</a> at the bottom of this post.</p>
<p>But I have personal reasons to be interested in Quora. One of their lead engineers, <a href="http://xng.cc/">Albert Sheu</a>, was a <a href="http://www.linkedin.com/profile/recommendations?id=9903676">star intern</a> of mine at <a href="http://www.endeca.com/">Endeca</a>. And Quora raises lots of interesting questions about search, user experience, knowledge management, and <a href="http://thenoisychannel.com/2010/05/02/thoughts-about-online-reputation/">online reputation</a>. How could I resist?</p>
<p>I see three potential reasons to use Quora:</p>
<ol>
<li>Objective question answering.</li>
<li>Subjective question answering.</li>
<li>Community participation.</li>
</ol>
<p>Let&#8217;s consider how Quora fares today on each of these, and where it might go.</p>
<p><strong>1. Objective question answering.</strong></p>
<p>When I <a href="http://thenoisychannel.com/2010/04/19/qui-quae-quora/">blogged about Quora</a> early last year, I said that &#8220;I don’t see Quora as a knowledge base of first resort–except possibly to learn more about software startups.&#8221; Despite Quora&#8217;s recently <a href="http://www.quora.com/Quora-Growth-Surge-Dec-2010-Jan-2011">growth surge</a>, I am not ready to change my answer significantly &#8212; I find that Quora&#8217;s topics are pretty sparse when I stray from its Silicon Valley focus.</p>
<p>Within that focus, Quora is nailing it. For example, I was curious to learn whether someone who signed a non-compete agreement outside of California was still subject to it if he or she moved to California, where such contracts are legally unenforceable. Not surprisingly, <a href="http://www.quora.com/Non-Compete-Agreements">non-compete agreements</a> are a topic on Quora, and I quickly found a <a href="http://www.quora.com/If-I-have-a-non-compete-agreement-with-a-company-in-NY-and-move-to-CA-is-the-non-compete-agreement-unenforcable">useful answer</a> from a lawyer.</p>
<p>But for most objective questions, I&#8217;m still turning to Google and Wikipedia &#8212; or to <a href="http://twitter.com/#!/dtunkelang">Twitter</a> if both of those fail and I am willing to ask a favor of my followers (who <a href="http://thenoisychannel.com/2009/03/14/challenge-blog-twitter-vs-aardvark/">kick ass</a>!). Sometimes Google will take me to Quora, but I can&#8217;t imagine Quora will succeed through this flow in the long term.</p>
<p><strong>2. Subjective question answering.</strong></p>
<p>I see subjective question answering as Quora&#8217;s strongest suit. A good subjective question on Quora &#8212; often a &#8220;why&#8221; question &#8212; generates a diverse collection of interesting and informed perspectives. A couple of good examples are &#8220;<a href="http://www.quora.com/Why-did-Google-Wave-fail-to-get-significant-user-adoption">Why did Google Wave fail to get significant user adoption?</a>&#8221; and &#8220;<a href="http://www.quora.com/Social-Networks/What-is-lacking-in-social-networking-now">What is lacking in social networking now?</a>&#8221;.</p>
<p>Again, these questions are well within the Silicon Valley focus, but I could see Quora extending this value proposition to other verticals if it can grow the communities successfully. And I certainly don&#8217;t see myself going to Google or even Twitter to get useful answers to subjective questions. The closest is <a href="http://thenoisychannel.com/2009/05/27/topsy-tippling-the-stream-of-conversations/">Topsy</a>, and Quora has the advantage of being explicitly organized around questions and topics.</p>
<p><strong>3. Community participation.</strong></p>
<p>Is Quora a question answering site or a social network? Quora users and employees have tried to answer that question (<a href="http://www.quora.com/Is-Quora-a-social-network">on Quora</a>, natch), but I&#8217;m not sure Quora&#8217;s converged enough for anyone to know. What is clear is that Quora emphasizes conversation, making it more like a blog or wiki than an answers site.</p>
<p>Conversation certainly engages its participants. But it also raises the cost of participation. One of the things I love about Google is that it gives me information without unnecessary overhead. When I want conversation, I go to social venues like Twitter.</p>
<p>Perhaps Quora can be both a question answering site and a social network. But I suspect it will need to choose. Most people don&#8217;t have the time or patience to participate in additional communities, so question answering is the easier sell to a mass audience. But the participation is what makes Quora especially distinctive today. Perhaps it&#8217;s a question of quality vs. quantity.</p>
<p>So, <em><a href="http://en.wikipedia.org/wiki/Quo_vadis">quo vadis</a></em>, Quora? I suppose I&#8217;ll have to <a href="http://www.quora.com/Quora-Quality/Quora-is-a-curated-community-of-early-adopters-now-its-nice-but-how-can-it-scale">check Quora</a> (or <a href="http://www.cwora.com/">Cwora</a>) to find the answers.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/09/quo-vadis-quora/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/09/quo-vadis-quora/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>No More Quora Invites</title>
		<link>http://thenoisychannel.com/2011/01/07/no-more-quora-invites/</link>
		<comments>http://thenoisychannel.com/2011/01/07/no-more-quora-invites/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 14:54:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3430</guid>
		<description><![CDATA[Over the past few days, I have been inundated with requests for Quora invites. I realize that I brought this upon myself by making my blog the top hit on Google for [quora invite] &#8212; though it seems I&#8217;m at least down to the #2 slot now. In any case, I have sent out over a [...]]]></description>
				<content:encoded><![CDATA[<p>Over the past few days, I have been inundated with requests for <a href="http://www.quora.com/">Quora</a> invites. I realize that I brought this upon myself by making my blog the top hit on Google for [<a href="http://www.google.com/search?q=quora+invite">quora invite</a>] &#8212; though it seems I&#8217;m at least down to the #2 slot now. In any case, I have sent out over a hundred invites and need to stop fulfilling requests so that I can focus on my day job!</p>
<p>I hope everyone I&#8217;ve invited is enjoying Quora. But I also hope you take it upon yourselves to circulate more invitations to those who want them. Any Quora user can send out invites &#8212; that&#8217;s how these viral sites work. If you&#8217;re still looking for an invite, I urge you to use Twitter or some other broadcast mechanism to request it. As of today, I will stop responding to Quora invite requests through my blog or email, and I will also delete comments requesting them. I am sorry if this is a bit harsh, but I hope folks understand.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/07/no-more-quora-invites/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/07/no-more-quora-invites/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
