<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Noisy Channel</title>
	<atom:link href="http://thenoisychannel.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://thenoisychannel.com</link>
	<description></description>
	<lastBuildDate>Thu, 03 May 2012 02:44:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Science as a Strategy</title>
		<link>http://thenoisychannel.com/2012/04/25/science-as-a-strategy/</link>
		<comments>http://thenoisychannel.com/2012/04/25/science-as-a-strategy/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 16:25:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4178</guid>
		<description><![CDATA[Last night, I had the pleasure of delivering the keynote address at the CIO Summit US. It was an honor to address an assembly of CIOs, CTOs, and technology executives from the nation&#8217;s top organizations. My theme was &#8220;Science as a Strategy&#8221;. To set the stage, I told the story of TunkRank: how, back in [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/04/CIOnew.png"><img class="size-full wp-image-4190" title="CIO Summit US" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/04/CIOnew.png" alt="" width="318" height="67" /></a></p>
<p><iframe src="http://www.youtube.com/embed/dftt6Yqgnuw?rel=0" frameborder="0" width="480" height="272"></iframe></p>
<p>Last night, I had the pleasure of delivering the keynote address at the <a href="http://www.ciosummitna.com/">CIO Summit US</a>. It was an honor to address an assembly of CIOs, CTOs, and technology executives from the nation&#8217;s top organizations. My theme was &#8220;Science as a Strategy&#8221;.</p>
<p>To set the stage, I told the story of <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a>: how, back in 2009, I proposed a Twitter influence measure based on an explicit model of attention scarcity which <a href="http://thenoisychannel.com/2010/04/07/go-tunkrank/">proved</a> better than the intuitive but flawed approach of counting followers. The point of the story was not self-promotion, but rather to introduce my core message:</p>
<p><strong>Science is the difference between instinct and strategy.</strong></p>
<p>Given the audience, I didn&#8217;t expect this message to be particularly controversial. But we all know that belief is not the same as action, and science is not always popular in the C-Suite. Thus, I offered three suggestions to overcome the HIPPO (Highest Paid Person&#8217;s Opinion):</p>
<ul>
<li>Ask the right questions.</li>
<li>Practice good data hygiene.</li>
<li>Don’t argue when you can experiment!</li>
</ul>
<p><strong>Asking the Right Questions</strong></p>
<p>Asking the right questions seems obvious &#8212; after all, our answers can only be as good as the questions we ask. But science is littered with examples of people asking the wrong questions &#8212; from 19th-century <a href="http://en.wikipedia.org/wiki/Phrenology">phrenologists</a> measuring the sizes of people&#8217;s skulls to evaluate intelligence to IT executives measuring lines of code to evaluate programmer productivity. It&#8217;s easy for us (today) to recognize these approaches as pseudoscience, but we have to make sure we ask the right questions in our own organizations.</p>
<p>As an example, I turned to the challenge of improving the hiring process. One approach I&#8217;ve seen tried at both Google and LinkedIn is to measure the accuracy of interviewers &#8212; that is, to see how well the hire / no-hire recommendations of individual interviewers predict the final decisions. But this turns out to be the wrong question &#8212; in large part because negative recommendations (especially early ones) weigh much more heavily in the decision than positive ones.</p>
<p>What we found instead was that we should focus on efficiency as an optimization problem. More specifically, there is a trade-off: short-circuiting the process as early as possible (e.g., after the candidate performs poorly on the first phone screen) reduces the average time per candidate, but it also reduces the number of good candidates who make it through the process. To optimize overall throughput (while keeping our high bar), we&#8217;ve had to calibrate the upstream filters. How to optimize that upstream filter turns out to be the right question to ask &#8212; and one we continue to iterate on.</p>
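<p>To make the trade-off concrete, here is a minimal Python sketch of a two-stage funnel. Everything in it is a hypothetical assumption for illustration: the uniform candidate-quality distribution, the noisy phone screen, the cost in hours of each stage, the hiring bar of 0.8, and the function name simulate_funnel. It is not the analysis we actually ran.</p>

```python
import random

N_CANDIDATES = 10000

def simulate_funnel(threshold, n_candidates=N_CANDIDATES, seed=0):
    """Simulate a phone screen followed by an onsite interview.

    Hypothetical model: each candidate has a true quality in [0, 1].
    The phone screen sees quality plus noise and rejects anyone who
    scores below `threshold`. The onsite is assumed to be accurate and
    hires candidates whose true quality is at least 0.8.
    """
    rng = random.Random(seed)
    screen_hours, onsite_hours = 1.0, 6.0  # assumed cost per candidate
    hours = 0.0
    hires = 0
    for _ in range(n_candidates):
        quality = rng.random()
        screen_score = quality + rng.gauss(0, 0.15)  # noisy early signal
        hours += screen_hours
        if screen_score >= threshold:
            hours += onsite_hours
            if quality >= 0.8:
                hires += 1
    return hires, hours

# Sweep the upstream filter: stricter screens save time but lose good hires.
for threshold in (0.0, 0.5, 0.7, 0.9):
    hires, hours = simulate_funnel(threshold)
    print(f"threshold={threshold:.1f} hires={hires} "
          f"hours_per_candidate={hours / N_CANDIDATES:.2f} "
          f"hours_per_hire={hours / hires:.1f}")
```

<p>Under these assumptions, raising the threshold lowers the hours spent per candidate and also lowers the number of hires. Calibrating the filter means choosing the point on that curve that best balances the two.</p>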
<p>More generally, I talked about how, when we hire <a href="http://www.quora.com/If-I-want-to-do-Data-Science-would-LinkedIn-or-Twitter-be-a-better-place-to-start-work/answer/Daniel-Tunkelang">data scientists at LinkedIn</a>, we look for not only strong analytical skills but also the product and business sense to pick the right questions to ask – questions whose answers create value for users and drive key business decisions. Asking the right questions is the foundation of good science.</p>
<p><strong>Practicing Good Data Hygiene</strong></p>
<p>Data mining is amazing, but we have to watch out for its pejorative meaning of discovering spurious patterns. I used the <a href="http://www.investopedia.com/terms/s/superbowlindicator.asp">Super Bowl Indicator</a> as an example of data mining gone wrong &#8212; with 80% accuracy, the conference (AFC vs. NFC) of the Super Bowl champion predicts the coming year&#8217;s stock market performance. Indeed, the NFC won this year (<a href="http://en.wikipedia.org/wiki/Super_Bowl_XLVI">go Giants!</a>) and subsequent market gains have been consistent with this indicator (so far).</p>
<p>We can all laugh at these misguided investors, but we make these mistakes all the time. Despite what researchers have called the &#8220;<a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/35179.pdf">unreasonable effectiveness of data</a>”, we still need the scientific method of first hypothesizing and then experimenting in order to obtain valid and useful conclusions. Without data hygiene, our desires, preconceptions, and other human frailties infect our rational analysis.</p>
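<p>A small simulation shows how easily such patterns arise by chance. The sketch below is illustrative only: the 40-year history, the 1,000 candidate indicators, and the function name best_random_indicator are assumptions, and every indicator is a coin flip with no relationship to the market.</p>

```python
import random

def best_random_indicator(n_years=40, n_indicators=1000, seed=1):
    """Return the best historical accuracy among purely random indicators."""
    rng = random.Random(seed)
    # Simulated market direction per year: True means an up year.
    market = [rng.random() >= 0.5 for _ in range(n_years)]
    best = 0.0
    for _ in range(n_indicators):
        # Each indicator is an independent sequence of coin flips.
        indicator = [rng.random() >= 0.5 for _ in range(n_years)]
        hits = sum(1 for m, i in zip(market, indicator) if m == i)
        best = max(best, hits / n_years)
    return best

print(f"best accuracy among random indicators: {best_random_indicator():.0%}")
```

<p>Search through enough candidate indicators and the best one will look far better than a coin flip on historical data, even though none of them carries any signal.</p>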
<p>A very different example is using click-through data to measure the effectiveness of relevance ranking. This approach isn&#8217;t completely wrong, but it suffers from several flaws. And the fundamental flaw relates to data hygiene: how we present information to users infects their perception of relevance. Users assume that top-ranked results are more relevant than lower-ranked results. Also, they can only click on the results presented to them. To paraphrase <a href="http://en.wikipedia.org/wiki/There_are_known_knowns">Donald Rumsfeld</a>: they don&#8217;t know what they don&#8217;t know. If we aren&#8217;t careful, a click-based evaluation of relevance creates positive feedback and only reinforces our initial assumptions – which certainly isn&#8217;t the point of evaluation!</p>
<p>Fortunately, there are ways to avoid these biases. We can pay people to rate results presented to them in random order. We can use the <a href="http://en.wikipedia.org/wiki/Multi-armed_bandit">explore / exploit</a> technique to hedge against the ranking algorithm’s preconceived bias. And so on.</p>
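<p>One simple explore / exploit policy is epsilon-greedy. The Python sketch below is an illustration of that idea rather than a description of any production ranker; the function name epsilon_greedy_ranking, the document ids, and the scores are all hypothetical.</p>

```python
import random

def epsilon_greedy_ranking(results, scores, epsilon=0.1, rng=random):
    """Rank results by score, but occasionally promote a random result.

    With probability `epsilon`, a uniformly chosen result is moved to the
    top position (explore); otherwise the ranker's own order is returned
    unchanged (exploit).
    """
    ranked = [r for _, r in sorted(zip(scores, results), reverse=True)]
    explore = epsilon > rng.random()
    if not explore:
        return ranked
    promoted = rng.choice(ranked)
    return [promoted] + [r for r in ranked if r != promoted]

# Hypothetical documents and the ranker's current relevance estimates.
rng = random.Random(42)
docs = ["a", "b", "c", "d"]
scores = [0.9, 0.5, 0.7, 0.1]
top_counts = {}
for _ in range(1000):
    top = epsilon_greedy_ranking(docs, scores, epsilon=0.2, rng=rng)[0]
    top_counts[top] = top_counts.get(top, 0) + 1
print(top_counts)
```

<p>Because lower-ranked results occasionally appear in the top position, their click-through rates can be measured without the position bias that the ranker itself introduces.</p>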
<p>But the key take-away is that we have to practice good data hygiene, splitting our projects into the two distinct activities of hypothesis generation (i.e., exploratory analysis) and hypothesis testing using withheld data.</p>
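<p>A minimal sketch of that split, assuming the data fits in memory as a list of records. The 30% holdout fraction and the function name split_explore_holdout are illustrative choices, not a recommendation.</p>

```python
import random

def split_explore_holdout(records, holdout_fraction=0.3, seed=0):
    """Shuffle the records once and split them into two disjoint sets.

    The first set is for exploratory analysis (hypothesis generation);
    the second is withheld and used only for hypothesis testing.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

explore, holdout = split_explore_holdout(range(100))
print(len(explore), len(holdout))  # 70 30
```

<p>The discipline matters more than the code: the withheld set stays untouched until the hypotheses are fixed.</p>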
<p><strong>Don’t Argue when you can Experiment</strong></p>
<p>I couldn&#8217;t resist the opportunity to cite Nobel laureate <a href="http://en.wikipedia.org/wiki/Daniel_Kahneman">Daniel Kahneman</a>&#8217;s seminal work on understanding human irrationality. I also threw in Mercier and Sperber&#8217;s recent work on <a href="http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/">reasoning as argumentative</a>. The summary: don&#8217;t trust anyone&#8217;s theories, not even mine!</p>
<p>Then what can you trust? The results of a well-run experiment. Rather than debating data-free assertions, subject your hypotheses to the ultimate test: controlled experiments. Not every hypothesis can be tested using a controlled experiment, but most can be.</p>
<p>I recounted the story of how <a href="http://glinden.blogspot.com/">Greg Linden</a> persuaded his colleagues at Amazon to implement shopping-cart recommendations through <a href="http://en.wikipedia.org/wiki/A/B_testing">A/B testing</a>, despite objections from a marketing SVP. Indeed, his work &#8212; and Amazon&#8217;s generally &#8212; has strongly advanced the practice of A/B testing in online settings.</p>
<p>Of course, A/B testing is fundamental to all of our work at LinkedIn. Every feature we release, whether it&#8217;s the <a href="http://blog.linkedin.com/2012/03/27/new-people-you-may-know/">new People You May Know interface</a> or <a href="http://blog.linkedin.com/2012/04/03/new-group-search/">improvements to Group Search relevance</a>, starts with an A/B test. And sometimes A/B testing causes us to not launch &#8212; we listen to the data.</p>
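<p>For readers who want to see what listening to the data can look like, here is a minimal sketch of one common way to analyze an A/B test: a two-proportion z-test on conversion counts. The counts are made up, the function name two_proportion_z_test is hypothetical, and this is not a description of LinkedIn&#8217;s experimentation platform.</p>

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Uses the pooled-variance normal approximation, which is reasonable
    only when both groups are large.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical counts: control (A) versus the new feature (B).
z, p = two_proportion_z_test(conv_a=200, n_a=10000, conv_b=250, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

<p>A small p-value suggests the difference is unlikely to be noise. A large one is a reason to hold the launch or gather more data.</p>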
<p>Don&#8217;t argue when you can experiment. Decisions about how to improve products and processes should not be made by an Oxford-style debate. Rather, those decisions should be informed by data.</p>
<p><strong>Conclusion: Even Steve Jobs Made Mistakes</strong></p>
<p>Some of you may think that this is all good advice, but that science is no match for an inspired leader. Indeed, some pundits have seen Apple&#8217;s success relative to Google as an indictment of data-driven decision making in favor of an approach that follows a leader&#8217;s gut instinct. Are they right? Should we throw out all of our data and follow our CEOs&#8217; instincts?</p>
<p>Let&#8217;s go back a decade. In 2002, Apple faced a pivotal decision – perhaps the most important decision in its history. The iPod was clearly a breakthrough product, but it was only compatible with the Mac. Remember that, back in 2002, Apple had only a 3.5% market share in the PC business. Apple&#8217;s top executives did their analysis and predicted that they could drive the massive success of the iPod by making it compatible with Windows, the dominant operating system with over 95% market share.</p>
<p>Steve Jobs resisted. At one point he said that Windows users would get to use the iPod &#8220;over [his] dead body&#8221;. After continued convincing, Jobs gave up. According to authorized biographer <a href="http://www.amazon.com/Steve-Jobs-Walter-Isaacson/dp/1451648537">Walter Isaacson</a>, Steve&#8217;s exact words were: “Screw it. I’m sick of listening to you assholes. Go do whatever the hell you want.” Luckily for Steve, Apple, and the consumer public, they did, and the rest is history.</p>
<p>It isn’t easy being one of those assholes. But that’s our job, much as it was theirs. It’s up to us to turn data into gold, to apply science and technology to create value for our organizations. Because without data, we are gambling on our leaders&#8217; gut feelings. And our leaders, however inspired, have fallible instincts.</p>
<p>Science is the difference between instinct and strategy.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/25/science-as-a-strategy/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Semantic Link and Internet Evolution</title>
		<link>http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/</link>
		<comments>http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 22:56:32 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4170</guid>
		<description><![CDATA[Recently I had a couple of great opportunities to share my thoughts publicly, and I wanted to make sure readers here were aware of them. The first was a special guest appearance on The Semantic Link, a program hosted by Paul Miller with regular panelists Peter Brown, Christine Connors, Eric Franzon, Eric Hoffer, Bernadette [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://semanticweb.com/the-semantic-link-with-guest-daniel-tunkelang-%E2%80%93-april-2012_b28246"><img class="alignnone" title="The Semantic Link" src="http://www.commoncrawl.org/wp-content/uploads/2012/01/semanticweb.com-logo.jpg" alt="" width="126" height="126" /></a>   <a href="http://www.internetevolution.com/radio.asp?doc_id=240580"><img class="alignnone" style="margin-top: 50px; margin-bottom: 50px;" title="Internet Evolution" src="http://img.deusm.com/internetevolution/intevol_logo_top_new.gif" alt="" width="308" height="30" /></a></p>
<p>Recently I had a couple of great opportunities to share my thoughts publicly, and I wanted to make sure readers here were aware of them.</p>
<p>The first was a special guest appearance on <a href="http://semanticweb.com/the-semantic-link-with-guest-daniel-tunkelang-%E2%80%93-april-2012_b28246">The Semantic Link</a>, a program hosted by <a href="http://www.linkedin.com/in/pau1mi11er">Paul Miller</a> with regular panelists <a href="http://pensivepeter.wordpress.com/">Peter Brown</a>, <a href="http://www.linkedin.com/in/cjmconnors">Christine Connors</a>, <a href="http://www.linkedin.com/in/ericfranzon">Eric Franzon</a>, <a href="http://www.linkedin.com/in/erichoffer">Eric Hoffer</a>, <a href="http://www.linkedin.com/in/bhyland">Bernadette Hyland</a>, and <a href="http://www.linkedin.com/in/andraz">Andraz Tori</a>. It was a lot of fun, and a great warm-up for the keynote I&#8217;ll be delivering on &#8220;<a href="http://semtechbizsf2012.semanticweb.com/sessionPop.cfm?confid=65&amp;proposalid=4800">Scale, Structure, and Semantics</a>&#8221; at the upcoming <a href="http://semtechbizsf2012.semanticweb.com/">Semantic Tech &amp; Business Conference (SemTechBiz)</a>, which will take place in San Francisco in June.</p>
<p>The second was a live interview on <a href="http://www.internetevolution.com/radio.asp?doc_id=240580">Internet Evolution</a>, hosted by <a href="http://www.linkedin.com/pub/mary-jander/7/b5/300">Mary Jander</a> and <a href="http://www.linkedin.com/pub/nicole-ferraro/4/921/a29">Nicole Ferraro</a>. They clearly did their homework, scouring my blog posts and web commentary for everything controversial I&#8217;d ever said &#8212; and then some! If that&#8217;s enough to pique your interest, then I encourage you to listen to the recorded interview and read the chat transcript at <a href="http://www.internetevolution.com/radio.asp?doc_id=240580">Internet Evolution</a>.</p>
<p>Happy to answer questions based on either of these sessions &#8212; comment away!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/19/semantic-link-and-internet-evolution/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Noah Iliinsky: Tech Talk on Designing Data Visualizations</title>
		<link>http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/</link>
		<comments>http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/#comments</comments>
		<pubDate>Wed, 18 Apr 2012 14:13:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4160</guid>
		<description><![CDATA[Note: This post was written by Yael Garten, a Senior Data Scientist at LinkedIn. Yael joined LinkedIn in 2011, where she leads our mobile analytics team. She previously worked at Stanford on text mining, personalized medicine, and biomedical informatics. We live in an era of Big Data. But how do we use all of that [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/R-oiKt7bUU8?rel=0" frameborder="0" width="462" height="260"></iframe><br />
</p>
<p><em>Note: This post was written by <a href="http://www.linkedin.com/in/yaelgarten">Yael Garten</a>, a Senior Data Scientist at LinkedIn. Yael joined LinkedIn in 2011, where she leads our mobile analytics team. She previously worked at Stanford on text mining, personalized medicine, and biomedical informatics.<br />
</em></p>
<p>We live in an era of Big Data. But how do we use all of that data to answer questions and communicate those answers effectively?</p>
<p>My colleagues and I at LinkedIn were fortunate enough to hear answers from <a href="http://www.linkedin.com/in/iliinsky">Noah Iliinsky</a>, who literally wrote the <a href="http://amzn.to/HJFDMe">book on designing data visualization</a>.</p>
<p>Earlier this month, we hosted Noah at LinkedIn to give a tech talk on &#8220;<a href="http://linkd.in/HaKPwk">Designing Effective Data Visualizations</a>&#8221;. We are proud to make these <a href="http://www.youtube.com/linkedintechtalks">tech talks</a> open to the public, and enjoyed a great mix of attendees from local companies and universities. If you couldn&#8217;t attend the talk in person or remotely, I encourage you to watch the recording, embedded above.</p>
<p>Why do we visualize data? As Noah tells us, visualization makes data accessible. It gives us faster access to actionable insights and allows access to huge amounts of data. Visualization enables both data exploration (when you are still trying to discover the story) and data explanation (when you have a story to tell). Noah reviewed some great examples (watch the talk!), with an emphasis on the dos and don&#8217;ts of data visualization.</p>
<p>In particular, he provided a step-by-step framework for traversing the path from question to answer:</p>
<p>Phase 1: Decide what to visualize.</p>
<ul>
<li>Understand the question your audience wants to answer.</li>
<li>Understand the actions they are hoping the answer will drive.</li>
<li>Consider who is consuming this data &#8212; their needs, biases, etc.</li>
<li>Decide what data to use &#8212; and what data <em>not</em> to use &#8212; and what relationships you are interested in.</li>
<li>Explore the data and construct a storyline.</li>
</ul>
<p>Phase 2: Decide how to visualize it.</p>
<ul>
<li>Use appropriate visual encodings for data and relationships (cf. <a href="http://complexdiagrams.com/properties">http://complexdiagrams.com/properties</a>).</li>
<li>Limit the data you include.</li>
<li>Use position for your most important relationship.</li>
<li>Try different axes.</li>
<li>Show your visualization to different people, without explanations. Show an expert, show a layman.</li>
<li>Iterate, iterate, iterate!</li>
</ul>
<p>Noah also shared his thoughts on how to visualize social networks. He recommended useful tools for data visualization, including <a href="http://www.tableausoftware.com/">Tableau</a>, <a href="http://spotfire.tibco.com/">Spotfire</a>, <a href="http://mbostock.github.com/d3/">D3</a>, <a href="http://processing.org/">Processing</a>, <a href="http://had.co.nz/ggplot2/">ggplot2</a>, <a href="http://www.omnigroup.com/products/omnigraffle/">Omnigraffle</a>, and <a href="http://www.omnigroup.com/products/omnigraphsketcher/">OmnigraphSketcher</a>.</p>
<p>Finally, he left us with key lessons to take home:</p>
<ul>
<li>You are not your audience. This is a huge lesson that all of us must internalize to be great at what we do. Consider what you need to communicate to marketers, investors, members of the general public, etc.</li>
<li>Do user research! Understand your users&#8217; hopes, dreams, and favorite flavors! Understand their identity, their jargon, culture, etc.</li>
<li>Remember that your success is defined by your customer’s success. If you can’t satisfy your customer&#8217;s needs, you have failed &#8212; no matter how insightful your analysis.</li>
</ul>
<p>You can enjoy the talk by watching the embedded video above. And you can find more LinkedIn tech talks on our <a href="http://www.youtube.com/linkedintechtalks">YouTube channel</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/18/noah-iliinsky-tech-talk-on-designing-data-visualizations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data, Algorithms, and People</title>
		<link>http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/</link>
		<comments>http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/#comments</comments>
		<pubDate>Sat, 14 Apr 2012 20:49:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4155</guid>
		<description><![CDATA[One of the highlights of the recent Data 2.0 Summit was a panel featuring: Alexander Gray, CTO of SkyTree Anthony Goldbloom, CEO of Kaggle Josh Wills, Director of Data Science at Cloudera The panel was supposed to focus on &#8220;Data Science and Predicting the Future&#8221;, but the most contentious topic was whether data, algorithms, or people (that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://data2summit.com/"><img class="wp-image-4143 alignleft" title="Data 2.0 Summit" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/03/data20summit.png" alt="" width="480" height="88" /></a></p>
<p>One of the highlights of the recent <a href="http://thenoisychannel.com/2012/03/30/data-2-0-summit/">Data 2.0 Summit</a> was a panel featuring:</p>
<ul>
<li><a href="http://www.linkedin.com/pub/alexander-gray/4/4b6/b55">Alexander Gray</a>, CTO of <a href="http://www.skytreecorp.com/">SkyTree</a></li>
<li><a href="http://www.linkedin.com/in/anthonygoldbloom">Anthony Goldbloom</a>, CEO of <a href="http://www.kaggle.com/">Kaggle</a></li>
<li><a href="http://www.linkedin.com/pub/josh-wills/0/82b/138">Josh Wills</a>, Director of Data Science at <a href="http://www.cloudera.com/">Cloudera</a></li>
</ul>
<p>The panel was supposed to focus on &#8220;Data Science and Predicting the Future&#8221;, but the most contentious topic was whether data, algorithms, or people (that is, the data scientists themselves) were the most important factor in the practice and success of data science.</p>
<p>Yes, we one-upped the <a href="http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine">debate</a> that my colleague <a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a> instigated at this year&#8217;s <a href="http://strataconf.com/strata2012/">Strata</a> conference. In fact, Josh cited the &#8220;better data beats more data beats clever algorithms&#8221; argument that Monica made in <a href="http://strataconf.com/strata2012/public/schedule/detail/22538">her own Strata presentation</a>. And, just like at Strata, there was a healthy dose of audience participation.</p>
<p>Of course, I came down on the side of data &#8212; which I believe won the debate hands down.</p>
<p>I&#8217;m a fan of clever algorithms, which Alexander had to defend given that Skytree&#8217;s core value proposition is better machine learning algorithms delivered at scale. But <a href="http://thenoisychannel.com/2009/03/31/the-unreasonable-effectiveness-of-data/">I&#8217;m with Peter Norvig et al.</a> on the dominance of data over algorithms.</p>
<p>Favoring data over people was a harder choice. Anthony naturally made the case for people (Kaggle&#8217;s claim to fame is assembling many of the world&#8217;s best data scientists by organizing competitions). Hopefully <a href="http://www.quora.com/If-I-want-to-do-Data-Science-would-LinkedIn-or-Twitter-be-a-better-place-to-start-work/answer/Daniel-Tunkelang">my team</a> won&#8217;t quit en masse when they read this blog post! But I think they&#8217;ll agree with me that, without the incredible data we work with at LinkedIn, they&#8217;d be unable to deliver the awesomeness that I&#8217;ve come to expect from them.</p>
<p>There&#8217;s a saying that we all cook from the same cookbooks, so that it&#8217;s the ingredients that make all the difference. To take the metaphor further, you can also try to <a href="http://blog.topprospect.com/2011/06/the-biggest-talent-losers-and-winners/">poach your rival&#8217;s chefs</a>. But data is the biggest entry barrier &#8212; and the most sustainable competitive advantage.</p>
<p>Of course, we should have the best people apply the best algorithms to work with the best data. But data comes first. The best meal starts with the best ingredients.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/04/14/data-algorithms-and-people/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Video of Strata 2012 Talk on Humans, Machines, and the Dimensions of Microwork</title>
		<link>http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/</link>
		<comments>http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/#comments</comments>
		<pubDate>Sat, 31 Mar 2012 20:39:21 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4151</guid>
		<description><![CDATA[The video of the presentation that Claire Hunsaker and I delivered on &#8220;Humans, Machines, and the Dimensions of Microwork&#8221; at Strata 2012 is now available as part of the complete video compilation. I&#8217;ve taken the liberty of uploading it to YouTube &#8212; feel free to watch the embedded video above.]]></description>
			<content:encoded><![CDATA[<p><iframe width="462" height="260" src="http://www.youtube.com/embed/nc5YZYG1p_w?rel=0" frameborder="0" allowfullscreen></iframe><br />
&nbsp;</p>
<p>The video of the presentation that <a href="http://www.linkedin.com/in/clairehunsaker">Claire Hunsaker</a> and I delivered on &#8220;<a href="http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/">Humans, Machines, and the Dimensions of Microwork</a>&#8221; at Strata 2012 is now available as part of the <a href="http://shop.oreilly.com/product/0636920025412.do">complete video compilation</a>. I&#8217;ve taken the liberty of uploading it to YouTube &#8212; feel free to watch the embedded video above.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/31/video-of-strata-2012-talk-on-humans-machines-and-the-dimensions-of-microwork/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data 2.0 Summit</title>
		<link>http://thenoisychannel.com/2012/03/30/data-2-0-summit/</link>
		<comments>http://thenoisychannel.com/2012/03/30/data-2-0-summit/#comments</comments>
		<pubDate>Fri, 30 Mar 2012 15:48:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4142</guid>
		<description><![CDATA[I&#8217;ll be participating in the Data 2.0 Summit on Tuesday, April 3rd, and I hope to see some of you there. Last year, my colleague (and fellow LinkedIn data scientist) Scott Nicholson attended and wrote this guest post about it. This year, I&#8217;m not only attending but participating on a panel about social data, moderated by AllthingsD Senior [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://data2summit.com/"><img class="wp-image-4143 alignleft" title="Data 2.0 Summit" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/03/data20summit.png" alt="" width="480" height="88" /></a></p>
<p>I&#8217;ll be participating in the <a href="http://data2summit.com/">Data 2.0 Summit</a> on Tuesday, April 3rd, and I hope to see some of you there. Last year, my colleague (and fellow <a href="http://www.quora.com/If-I-want-to-do-Data-Science-would-LinkedIn-or-Twitter-be-a-better-place-to-start-work/answer/Daniel-Tunkelang">LinkedIn data scientist</a>) <a href="http://www.linkedin.com/in/scottnicholsonphd">Scott Nicholson</a> attended and wrote this <a href="http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/">guest post</a> about it. This year, I&#8217;m not only attending but participating on a panel about social data, moderated by <a href="http://allthingsd.com/">AllthingsD</a> Senior Editor <a href="http://allthingsd.com/author/lizg/">Liz Gannes</a>.</p>
<p>There&#8217;s a great line-up of <a href="http://data2summit.com/speakers">speakers</a> for the day, including:</p>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Bram_Cohen">Bram Cohen</a>, the founder, chief scientist, and inventor of <a href="http://www.bittorrent.com/">BitTorrent</a>, the leading peer-to-peer file sharing protocol for sharing large files on the Internet.</li>
<li><a href="http://www.linkedin.com/in/medriscoll">Michael Driscoll</a>, CTO and co-founder of the <a href="http://www.metamarketsgroup.com/">Metamarkets Group</a>. He moderated a fantastic <a href="http://medriscoll.com/post/18784448854/the-data-science-debate-domain-expertise-or-machine">debate</a> at the recent <a href="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/">Strata conference</a> about the relative importance of domain expertise versus machine learning for data scientists.</li>
<li><a href="http://www.linkedin.com/in/gilelbaz">Gil Elbaz</a>, the founder and CEO of <a href="http://www.factual.com/">Factual</a>, an information marketplace. He is also the co-founder of Applied Semantics, which Google acquired in 2003 for $102M and turned into the foundation for AdSense (now a $10B business).</li>
<li><a href="http://www.linkedin.com/in/anthonygoldbloom">Anthony Goldbloom</a>, co-founder and CEO of <a href="http://www.kaggle.com/">Kaggle</a>,  a platform for data science competitions that generated a lot of discussion at Strata.</li>
<li><a href="http://www.linkedin.com/pub/stefan-weitz/0/9b3/299">Stefan Weitz</a>, director of search at Bing. He&#8217;ll be on my panel. Also see the discussion I had with him in the comment thread for a post on &#8220;<a href="http://thenoisychannel.com/2009/03/17/why-are-people-so-clueless-about-search/">Why Are People So Clueless About Search?</a>&#8221;.</li>
</ul>
<p>And lots more, but you get the idea. I&#8217;m thrilled to be part of such a talent-heavy program and looking forward to insightful discussions with fellow panelists and attendees. It&#8217;s also a great excuse to spend a day in the city (note for my <a href="https://sites.google.com/site/245henry/">former townspeople</a> &#8212; that&#8217;s what they call San Francisco around here).</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/30/data-2-0-summit/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/30/data-2-0-summit/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Claudia Perlich: Tech Talk on Real-Time Bidding Optimization</title>
		<link>http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/</link>
		<comments>http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/#comments</comments>
		<pubDate>Fri, 23 Mar 2012 03:45:01 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4138</guid>
		<description><![CDATA[Conventional wisdom holds that physical compliments are counter-productive as pick-up lines. Indeed, a dating site did some analysis showing a negative correlation between such compliments and the probability of a positive response. But, as m6d Chief Scientist and 3-time KDD Cup winner Claudia Perlich explained in her recent talk at LinkedIn, we have to watch [...]]]></description>
			<content:encoded><![CDATA[<p><iframe width="504" height="285" src="http://www.youtube.com/embed/5DSahEbJ4KY?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>Conventional wisdom holds that physical compliments are counter-productive as pick-up lines. Indeed, a <a href="http://blog.okcupid.com/index.php/online-dating-advice-exactly-what-to-say-in-a-first-message/">dating site</a> did some analysis showing a negative correlation between such compliments and the probability of a positive response.</p>
<p>But, as <a href="http://m6d.com/">m6d</a> Chief Scientist and 3-time <a href="http://www.sigkdd.org/kddcup/">KDD Cup</a> winner <a href="http://people.stern.nyu.edu/cperlich/">Claudia Perlich</a> explained in her recent talk at LinkedIn, we have to watch out for confounding variables. In the dating scenario above, beauty is a confounding variable: it determines both the probability of getting a positive response and the probability of a suitor offering physical compliments. Hence, we need to control for the actual beauty; otherwise it can appear that making compliments is a bad idea.</p>
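<p>To make the confounding argument concrete, here is a minimal Python sketch. The counts are invented purely for illustration (they come neither from the talk nor from the dating site&#8217;s analysis), and the two attractiveness groups are an assumed simplification. In the pooled data, compliments look harmful, yet within each group they do slightly better than plain messages.</p>
<pre>
```python
# Hypothetical counts, invented for illustration only (not data from the talk).
# Key: (recipient attractiveness, message contained a compliment)
# Value: (messages sent, positive replies)
COUNTS = {
    ("high", True): (800, 80),
    ("high", False): (200, 16),
    ("average", True): (200, 80),
    ("average", False): (800, 304),
}


def reply_rate(complimented, stratum=None):
    """Reply rate for complimented or plain messages, optionally within one stratum."""
    sent = replies = 0
    for (group, flag), (n_sent, n_replies) in COUNTS.items():
        if flag == complimented and stratum in (None, group):
            sent += n_sent
            replies += n_replies
    return replies / sent


# Pooled comparison: (rate with compliment, rate without compliment)
pooled = (reply_rate(True), reply_rate(False))

# The same comparison after controlling for the confounder
by_stratum = {
    group: (reply_rate(True, group), reply_rate(False, group))
    for group in ("high", "average")
}

print("pooled:", pooled)
print("by stratum:", by_stratum)
```
</pre>
<p>The reversal happens because, in these made-up numbers, most compliments go to the group that rarely replies to anyone, which is exactly the role the confounder plays in the argument above.</p>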
<p>Perlich does not work on online dating, but rather in the data-driven world of online advertising. Specifically, she and her team work on real-time bidding optimization.</p>
<p>Perlich described a variety of design choices that have general applicability to data science problems. For example, her team used hashed tokens of previously visited URLs, rather than the URLs themselves, as features for their machine learning models. They avoided the use of personally identifying information (PII) or even demographic information about their users. These decisions were counterintuitive — typically, more data leads to better results. But Perlich found that these restrictions did not sacrifice accuracy, and had the further benefit of keeping their approach general rather than application- or customer-specific.</p>
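<p>As a rough illustration of what using hashed tokens as features can look like, here is a minimal Python sketch. It is not m6d&#8217;s pipeline: the choice of SHA-256, the bucket count, treating each whole URL as one token, and the example URLs are all my own assumptions.</p>
<pre>
```python
import hashlib

NUM_BUCKETS = 2 ** 20  # hypothetical feature-space size, chosen arbitrarily


def hashed_url_features(urls, num_buckets=NUM_BUCKETS):
    """Map each visited URL to a bucket index; return the sorted active buckets."""
    buckets = set()
    for url in urls:
        # Hash the URL so the model only ever sees an opaque bucket index.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        buckets.add(int.from_bytes(digest[:8], "big") % num_buckets)
    return sorted(buckets)


# Hypothetical browsing history, used only to exercise the function.
history = ["http://example.com/shoes", "http://example.com/checkout"]
features = hashed_url_features(history)
print(features)
```
</pre>
<p>The resulting indices can feed a sparse linear model without the raw URLs ever being stored as features, at the cost of occasional collisions between unrelated URLs.</p>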
<p>Perlich also described several technical challenges that her team had to overcome. For example, they found they could not sample users, so they instead sampled events &#8212; that is, visits, impressions, and conversions. They also found that their <a href="http://en.wikipedia.org/wiki/Linear_model">linear models</a> tended to suffer from <a href="http://en.wikipedia.org/wiki/Overfitting">overfitting</a> in their top predictions &#8212; a problem they resolved by introducing a <a href="http://en.wikipedia.org/wiki/Spline_(mathematics)">spline</a> model.</p>
<p>The talk was deeply technical and yet very relevant and accessible to a broad audience of data scientists and engineers. There&#8217;s much more content than fits in this small summary, so I encourage you to watch the video! And you can <a href="http://www.youtube.com/linkedintechtalks">watch more LinkedIn tech talks here</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/22/claudia-perlich-tech-talk-on-real-time-bidding-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Facing Prosopagnosia</title>
		<link>http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/</link>
		<comments>http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 05:57:34 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4131</guid>
		<description><![CDATA[In the past few years, prosopagnosia, also known as &#8220;face blindness&#8221;, has received a fair amount of attention from researchers, as well as from the popular press. My first exposure to the topic was Joshua Davis&#8217;s article entitled &#8220;Face Blind&#8221;, which appeared in Wired in November 2006. I was intrigued, especially since [...]]]></description>
			<content:encoded><![CDATA[<p><object width="212" height="140" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" /><param name="scale" value="noscale" /><param name="salign" value="lt" /><param name="background" value="#333333" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="flashvars" value="si=254&amp;&amp;contentValue=50121783&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121783n&amp;tag=contentMain;contentAux" /><embed width="212" height="140" type="application/x-shockwave-flash" src="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" scale="noscale" salign="lt" background="#333333" allowfullscreen="true" allowscriptaccess="always" flashvars="si=254&amp;&amp;contentValue=50121783&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121783n&amp;tag=contentMain;contentAux" /></object>      <object width="212" height="140" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" /><param name="scale" value="noscale" /><param name="salign" value="lt" /><param name="background" value="#333333" /><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="flashvars" value="si=254&amp;&amp;contentValue=50121784&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121784n&amp;tag=contentMain;contentAux" /><embed width="212" height="140" type="application/x-shockwave-flash" src="http://cnettv.cnet.com/av/video/cbsnews/atlantis2/cbsnews_player_embed.swf" scale="noscale" salign="lt" background="#333333" allowfullscreen="true" 
allowscriptaccess="always" flashvars="si=254&amp;&amp;contentValue=50121784&amp;shareUrl=http://www.cbsnews.com/video/watch/?id=50121784n&amp;tag=contentMain;contentAux" /></object></p>
<p>In the past few years, <a href="http://en.wikipedia.org/wiki/Prosopagnosia">prosopagnosia</a>, also known as &#8220;face blindness&#8221;, has received a fair amount of attention from researchers, as well as from the popular press.</p>
<p>My first exposure to the topic was Joshua Davis&#8217;s article entitled &#8220;<a href="http://www.wired.com/wired/archive/14.11/blind.html">Face Blind</a>&#8221;, which appeared in Wired in November 2006. I was intrigued, especially since I&#8217;d long known that I had difficulty recognizing people by face. Perhaps the person who has done the most to raise awareness of prosopagnosia is neurologist <a href="http://en.wikipedia.org/wiki/Oliver_Sacks">Oliver Sacks</a>, who has prosopagnosia himself.</p>
<p>The Wired article inspired me to explore the subject. I discovered <a href="http://faceblind.org/">faceblind.org</a> and found quizzes that tested for prosopagnosia. On one of these, where random guessing would have earned a score of 50%, I scored in the low 60s. My initial reaction was that my score wasn&#8217;t so bad &#8212; it was a hard test! Then my wife took the test and scored in the high 90s. That&#8217;s when I realized that I didn&#8217;t just have difficulty recognizing faces &#8212; I was almost incapable of it.</p>
<p>Faced with this realization, I had to decide whether to share it with my friends and family, let alone with my broader set of social and professional acquaintances. It was tempting not to &#8212; after all, why tell the world that I wasn&#8217;t &#8220;normal&#8221;?</p>
<p>But eventually I realized that it would be better for people around me to know than not know. The biggest downside to prosopagnosia isn&#8217;t the momentary embarrassment of not recognizing someone &#8212; it&#8217;s the constant fear of offending people who may think you don&#8217;t value them enough to recognize or acknowledge them.</p>
<p>Hence, I spread the word through my colleagues, ensuring that most of the people with whom I interacted regularly would find out without any big announcements. Some of my co-workers were surprised, since I do a pretty good job of recognizing people using non-facial clues &#8212; height, hair, clothing, where I run into them, etc. I have a great memory, and I have no problems with voice recognition. In other words, I have lots of work-arounds.</p>
<p>Fortunately, I work with a lot of people who understand <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> &#8212; which is a great framework for understanding how I recognize people. I simply work with a different set of <a href="http://en.wikipedia.org/wiki/Feature_selection">features</a> than most people, but fortunately I achieve sufficient <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision and recall</a> to pass as &#8220;normal&#8221; most of the time.</p>
<p>Anyway, if you didn&#8217;t already know that I had prosopagnosia, welcome to the inner circle! And if you ever felt that I walked by you without recognizing or acknowledging you, please accept my belated apology.</p>
<p>Finally, if you&#8217;re curious to learn more about prosopagnosia, I encourage you to watch the 60 Minutes segments above.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/18/facing-prosopagnosia/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Making Love with Data: Avinash Kaushik&#8217;s Strata 2012 Keynote</title>
		<link>http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/</link>
		<comments>http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/#comments</comments>
		<pubDate>Thu, 08 Mar 2012 07:23:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4121</guid>
		<description><![CDATA[Just watch the presentation, which stole the show at Strata 2012. The written word cannot do justice to Avinash&#8217;s passion and his extraordinary ability to communicate it.]]></description>
			<content:encoded><![CDATA[<p><iframe width="454" height="257" src="http://www.youtube.com/embed/CrSX97elHDA" frameborder="0" allowfullscreen></iframe></p>
<p>Just watch the presentation, which stole the show at <a href="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/">Strata 2012</a>. The written word cannot do justice to Avinash&#8217;s passion and his extraordinary ability to communicate it.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/07/making-love-with-data-avinash-kaushiks-strata-2012-keynote/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Humans, Machines &amp; the Dimensions of Microwork</title>
		<link>http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/</link>
		<comments>http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/#comments</comments>
		<pubDate>Mon, 05 Mar 2012 05:06:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4116</guid>
		<description><![CDATA[As per my previous post, I had a great time at the O’Reilly Strata Conference. It was a delight to participate in such a fantastic gathering of folks who work with big data. For those who missed my session, I&#8217;ve attached the slides that Claire and I presented. Some of the slides don&#8217;t make sense without [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/11863457" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<div style="padding: 5px 0 12px;">
<p>As per my <a href="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/">previous post</a>, I had a great time at the <a href="http://strataconf.com/strata2012">O’Reilly Strata Conference</a>. It was a delight to participate in such a fantastic gathering of folks who work with big data. For those who missed my session, I&#8217;ve attached the slides that Claire and I presented. Some of the slides don&#8217;t make sense without the voice-over, but hopefully there is enough self-contained content in them to be useful.</p>
<p>The presentation was recorded and will be available as part of the <a href="http://strataconf.com/strata2012/public/sv/q/385">Strata 2012 Video Compilation</a>.</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/04/humans-machines-and-the-dimensions-of-microwork/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Strata 2012: Big Data is Bigger than Ever!</title>
		<link>http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/</link>
		<comments>http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 08:57:23 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4101</guid>
		<description><![CDATA[I spent the last three days at the O&#8217;Reilly Strata Conference, an assembly of over 2500 people focused on data science and its applications. While I&#8217;m wary of industry conferences after attending vendor-fests in my past life in the enterprise software world, Strata is an exceptionally good conference. The speakers were a who&#8217;s who of data [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><a href="http://strataconf.com/strata2012"><img class="wp-image-4102" title="Strata 2012" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/03/Strata-banner.png" alt="" width="481" height="99" /></a></p>
<p>I spent the last three days at the <a href="http://strataconf.com/strata2012">O&#8217;Reilly Strata Conference</a>, an assembly of <del>two thousand</del> over 2500 people focused on data science and its applications. While I&#8217;m wary of industry conferences after attending vendor-fests in my past life in the enterprise software world, Strata is an exceptionally good conference. The <a href="http://strataconf.com/strata2012/public/schedule/speakers">speakers</a> were a who&#8217;s who of data science, including Lucene and Hadoop creator <a href="http://strataconf.com/strata2012/public/schedule/speaker/103766">Doug Cutting</a>, search user interface pioneer <a href="http://strataconf.com/strata2012/public/schedule/speaker/66363">Marti Hearst</a>, and Google chief economist <a href="http://strataconf.com/strata2012/public/schedule/speaker/63098">Hal Varian</a>. You can find the tweet stream for the conference at hash tag <a href="https://twitter.com/#!/search/%23stratconf">#strataconf</a>.</p>
<p><strong>Tuesday</strong></p>
<p>I spent Tuesday in the <a href="http://strataconf.com/strata2012/public/schedule/detail/22903">Deep Data</a> session, billed as a no-holds-barred program for data scientists. My two favorite talks:</p>
<ul>
<li><a href="http://strataconf.com/strata2012/public/schedule/speaker/103739">Claudia Perlich</a>, winner of three <a href="http://www.sigkdd.org/kddcup/">KDD cups</a>, talked about using information to pick the right action and to influence people such that they behave in a way that is better for them, better for us, and possibly better for society in general.</li>
<li><a href="http://strataconf.com/strata2012/public/schedule/speaker/109152">Monica Rogati</a>, my colleague at LinkedIn and the epitome of a data scientist, delivered a fantastic talk about machine learning models and training data in the real world, extending <a href="http://norvig.com/">Peter Norvig</a>&#8217;s point about the &#8220;<a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/pubs/archive/35179.pdf">unreasonable effectiveness of data</a>&#8221; to observe that more data beats clever algorithms but better data beats more data.</li>
</ul>
<p>But the most fun that day was the Oxford-style debate featuring <a href="http://www.drewconway.com/Drew_Conway/About.html">Drew Conway</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/76203">Pete Skomoroch</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/33953">Mike Driscoll</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/101103">DJ Patil</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/135062">Amy Heineike</a>, <a href="http://strataconf.com/strata2012/public/schedule/speaker/104290">Pete Warden</a>, and <a href="http://strataconf.com/strata2012/public/schedule/speaker/1956">Toby Segaran</a>. The question proposed was absurdly <a href="http://dictionary.reference.com/wordoftheday/archive/2010/06/04.html">Manichean</a>: if you had to hire your first data scientist and could only hire one, would you pick a domain expert or a machine learning expert? After the moderator suppressed some initial attempts to hedge (&#8220;both&#8221;, &#8220;it depends&#8221;, etc.), the debaters ripped into the question by taking extreme positions and defending them with gusto. It was a lot of fun, with enthusiastic audience participation and the debaters exploiting their inside knowledge of their opponents&#8217; work histories. In the end, the machine learning side won by a small margin.</p>
<p>I then had the good fortune to grab dinner with Marti Hearst and Hal Varian at <a href="http://xanhrestaurant.com/">Xanh</a> &#8211; a wonderful mix of great food and conversation.</p>
<p><strong>Wednesday</strong></p>
<p>The Wednesday morning keynote session offered some gems:</p>
<ul>
<li>Cloudera CEO <a href="http://strataconf.com/strata2012/public/schedule/speaker/5259">Mike Olson</a> urged big data practitioners to focus on guns, drugs, and oil.</li>
<li>Doctor and data geek <a href="http://strataconf.com/strata2012/public/schedule/speaker/128471">Ben Goldacre</a> delivered a mesmerizing and disturbing talk about the suppression of inconvenient medical trial results and analytical tools to discover it.</li>
</ul>
<p>But the person who stole the show was Google&#8217;s <a href="http://strataconf.com/strata2012/public/schedule/speaker/43798">Avinash Kaushik</a>, who talked about making love with data to find orgasm-inducing actions to change the world and make more money. Unfortunately this was the one talk that was not recorded, but you can read the summary on <a href="https://plus.google.com/105279625231358353479/posts/CLwYzJM48L2">Avinash&#8217;s Google+ page</a>.</p>
<p>As a speaker, I held &#8220;office hours&#8221; on Wednesday. It was supposed to be a 40-minute slot for conference attendees to come and ask me questions. But somehow those 40 minutes extended into three hours of conversation about everything from normalized <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">KL divergence</a> to <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/">interview problems</a> &#8212; and segued into a reception with specialty big-data cocktails. By the time I got back to my apartment, my voice, brain, and liver were spent.</p>
<p><strong>Thursday</strong></p>
<p>I spent most of Thursday morning in the speaker lounge, recovering from the previous evening and putting the finishing touches on my presentation. But I couldn&#8217;t resist attending a two-part session on privacy. Indeed, this session was distinctive enough to merit its own hash tag: <a href="https://twitter.com/#!/search/%23strataprivacy">#strataprivacy</a>.</p>
<p>The first part featured O&#8217;Reilly&#8217;s <a href="http://strataconf.com/strata2012/public/schedule/speaker/89224">Alex Howard</a> moderating Intelius Chief Privacy Officer <a href="http://strataconf.com/strata2012/public/schedule/speaker/41727">Jim Adler</a> and NYU PhD student <a href="http://strataconf.com/strata2012/public/schedule/speaker/122944">Solon Barocas</a> on a panel provocatively titled &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22613">If Data Wants to Be Free, is Privacy a Prison?</a>&#8221; It was a great discussion, and I enjoyed the opportunity to offer my own provocative question through Twitter. Since the panelists were arguing that it was unethical to infer private facts from public data, I asked if they were trying to establish a new form of <a href="http://en.wikipedia.org/wiki/Thoughtcrime">thoughtcrime</a>.</p>
<p>The second panel, entitled &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22300">Pretty Simple Data Privacy</a>&#8221;, featured <a href="http://strataconf.com/strata2012/public/schedule/speaker/105140">Kaitlin Thaney</a> from Digital Science, <a href="http://strataconf.com/strata2012/public/schedule/speaker/124127">Betsy Masiello</a> from Google, and <a href="http://strataconf.com/strata2012/public/schedule/speaker/44389">John Wilbanks</a> from the Kauffman Foundation for Entrepreneurship. Given that today was the first day of <a href="http://www.google.com/hostednews/afp/article/ALeqM5ip-cz4mF_UePtGrJ-0Wq8wZ9ykPw">Google&#8217;s new privacy policy</a>, there was no avoiding focus on the associated controversy. I did try to get Betsy to address my charge that Google doesn&#8217;t think users own their search history (cf. &#8220;<a href="http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/">Google vs. Bing: A Tweetle Beetle Battle Muddle</a>&#8221;), but she said she was unfamiliar with the details of that event. I do wish that someone at Google with more familiarity would respond publicly.</p>
<p>Back to the speaker room after lunch, until my own talk with Samasource&#8217;s <a href="http://strataconf.com/strata2012/public/schedule/speaker/125659">Claire Hunsaker</a> on &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22363">Humans, Machines, and the Dimensions of Microwork</a>&#8221;. I&#8217;ll post the slides (and there will be a video on the conference site), but the sound bite is that you need to keep crowdsourcing tasks simple, manage the trade-off between task value and difficulty, and watch out for systematic bias.</p>
<p>I wrapped up the conference by hearing <a href="http://strataconf.com/strata2012/public/schedule/speaker/107550">William Gunn</a> talk about how <a href="http://www.mendeley.com/">Mendeley</a> is disrupting <a href="http://nihlibrary.nih.gov/ResearchTools/Pages/bibliometrics.aspx">bibliometrics</a> and perhaps the entire academic publishing and reputation ecosystem. I laud his ambition and wish him and Mendeley luck in this quest.</p>
<p>&nbsp;</p>
<p>In summary, three days of great talks, conversations, and general enjoyment. My thanks to Strata organizers <a href="http://strataconf.com/strata2012/public/schedule/speaker/1">Edd Dumbill</a> and <a href="http://strataconf.com/strata2012/public/schedule/speaker/17816">Alistair Croll</a> for putting together such an outstanding event and for giving me the opportunity to participate.</p>
<p>&nbsp;</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/03/02/strata-2012-big-data-is-bigger-than-ever/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>Enjoying Seattle&#8217;s Best: UW, WSDM, and SSS</title>
		<link>http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/</link>
		<comments>http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/#comments</comments>
		<pubDate>Sun, 12 Feb 2012 20:44:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4095</guid>
		<description><![CDATA[My excursion to Seattle was delightful, and I thought I&#8217;d share some details with readers. I spent most of Friday at the University of Washington, meeting with graduating PhD students.  I&#8217;ve always known that UW is a top school, but I was particularly impressed with this batch. I was pleasantly surprised to see folks like [...]]]></description>
			<content:encoded><![CDATA[<p><img class="size-full wp-image-4096" title="Seattle's Best" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/02/seattles-best.png" alt="" width="293" height="95" /></p>
<p>My excursion to Seattle was delightful, and I thought I&#8217;d share some details with readers.</p>
<p>I spent most of Friday at the University of Washington, meeting with <a href="http://www.cs.washington.edu/education/grad/phdcandidates/">graduating PhD students</a>.  I&#8217;ve always known that UW is a top school, but I was particularly impressed with this batch. I was pleasantly surprised to see folks like <a href="http://www.cs.washington.edu/homes/nodira/Nodira_Khoussainova.html">Nodira Khoussainova</a> and <a href="http://www.cs.washington.edu/homes/kayur/">Kayur Patel</a> working to bring together the often disparate worlds of databases, machine learning, and HCI in order to make people more effective at solving &#8220;big data&#8221; problems. I realize that I&#8217;m aiding and abetting other employers with whom I compete for top talent, but it would be wrong not to encourage everyone to find worthy challenges for these budding scientists.</p>
<p>I then went to the <a href="http://spaceneedle.com/">Space Needle</a> to meet up with the <a href="http://wsdm2012.org/">WSDM 2012</a> crowd. <a href="http://research.microsoft.com/en-us/um/people/teevan/">Jaime Teevan</a> and <a href="http://www.cond.org/">Eytan Adar</a> outdid themselves, providing a great setting for folks to mingle, imbibe, and enjoy a spectacular view of Seattle.</p>
<p>Saturday I attended the &#8220;social&#8221; day of the WSDM conference.</p>
<p><a href="http://www.linkedin.com/pub/andrew-tomkins/0/87/713">Andrew Tomkins</a> chaired the first morning session, which included <a href="http://www.cs.columbia.edu/~hila/">Hila Becker</a>&#8217;s latest work on identifying event content in social media and <a href="http://cs-people.bu.edu/zg/">Georgios Zervas</a> presenting the work analyzing the reputational effects of Groupon that triggered quite a controversy <a href="http://www.technologyreview.com/blog/arxiv/27150/">last</a> <a href="http://articles.businessinsider.com/2011-09-12/research/30155506_1_daily-deal-business-insider-post-reviews">September</a>. After the break came the spotlight section &#8212; a great sequence of 5-minute presentations in which researchers both summarized their contributions and lured attendees to visit their posters. I hope that more conferences adopt this format, which optimizes for communicating ideas and discourages long-winded expositions.</p>
<p>I then had the pleasure of having lunch with <a href="http://www.jopedersen.com/jopedersen/Home.html">Jan Pedersen</a> and friends at <a href="http://blueacreseafood.com/">Blueacre Seafood</a> &#8212; great food and even better conversation. We both noted the irony that, even though we are practically neighbors, we only seem to meet up at events like these.</p>
<p>I made it back to the conference in time to hear the two best-paper awardees: <a href="http://www.cs.rochester.edu/~sadilek/">Adam Sadilek</a> on &#8220;<a href="http://hci.cs.rochester.edu/pubs/pdfs/following-friends.pdf">Finding Your Friends and Following Them to Where You Are</a>&#8221; and <a href="http://www.cs.berkeley.edu/~yaron/">Yaron Singer</a> on &#8220;<a href="http://www.cs.berkeley.edu/~yaron/papers/HowToWinFriendsAndInfluencePeople.pdf">How to Win Friends and Influence People, Truthfully: Influence Maximization Mechanisms for Social Networks</a>&#8221;. I highly recommend both papers, especially if you are interested in either social network prediction or the underlying economics of influence.</p>
<p>Another coffee break, and then the keynote: <a href="http://www.hilarymason.com/">Hilary Mason</a> on &#8220;The Secret Life of Social Links&#8221;. Hilary is a great speaker &#8212; I first met her when I invited her to the Workshop on Search and Social Media (<a href="http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/">SSM 2010</a>) at <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>. She didn&#8217;t disappoint, and it&#8217;s great to see practitioners like her crossing the aisle to engage the academic community. Not to mention infusing their slides with <a href="http://en.wikipedia.org/wiki/Lolcat">lolcats</a>.</p>
<p>The conference wrapped up at 5pm, but then we bussed over to Microsoft Research for the <a href="http://research.microsoft.com/en-us/events/sss2012/default.aspx">Social Search Social</a>. That was a fun event designed to cross-pollinate the WSDM and CSCW communities. <a href="http://research.microsoft.com/en-us/um/people/merrie/">Meredith Ringel Morris</a>, <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a>, <a href="http://twitter.com/#!/jerepick">Jeremy Pickens</a>, <a href="http://faculty.ist.psu.edu/reddy/">Madhu Reddy</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah</a>, and <a href="http://people.lis.illinois.edu/~twidale/">Michael Twidale</a> put together a great program of 45-second madness presentations and &#8220;speed-dating&#8221; to pair up WSDM and CSCW attendees. It was far too short, but a lot of fun. And some of us kept up the social spirit by grabbing dinner afterward at <a href="http://www.bluecsushi.com/">Blue C Sushi</a>.</p>
<p>To everyone I met in the last couple of days: thanks for the great company and conversation! Keep sharing ideas and making data and science social.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/02/12/enjoying-seattles-best-uw-wsdm-and-sss/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Social Wisdom in Seattle</title>
		<link>http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/</link>
		<comments>http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/#comments</comments>
		<pubDate>Sat, 04 Feb 2012 19:24:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4089</guid>
		<description><![CDATA[      First, I wanted to give readers a heads up that I&#8217;ll be in Seattle this Friday and Saturday. I&#8217;ll spend Friday afternoon at the University of Washington, meeting with some of their outstanding computer science doctoral students. My schedule filled up with unexpected haste! But if you&#8217;re on campus and urgently want [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cs.washington.edu/"><img class="alignnone" style="margin-left: -10px; margin-right: -10px;" title="University of Washington Computer Science &amp; Engineering" src="http://www.cs.washington.edu/images/cse_logo_80x133.gif" alt="" width="106" height="64" /></a><a href="http://wsdm2012.org/"><img class="alignnone" title="WSDM 2012" src="http://wsdm2012.org/img/topheader.png?1312770667" alt="" width="274" height="72" /></a>     <a href="http://brynnevans.com/blog/wp-content/uploads/2010/03/social-search.png"><img class="alignnone" title="Social Search" src="http://brynnevans.com/blog/wp-content/uploads/2010/03/social-search.png" alt="" width="85" height="63" /></a></p>
<p>First, I wanted to give readers a heads up that I&#8217;ll be in Seattle this Friday and Saturday. I&#8217;ll spend Friday afternoon at the <a href="http://www.cs.washington.edu/">University of Washington</a>, meeting with some of their outstanding computer science doctoral students. My schedule filled up with unexpected haste! But if you&#8217;re on campus and urgently want to meet, let me know and I&#8217;ll see what I can do.</p>
<p>Saturday I&#8217;ll be attending the social track of <a href="http://wsdm2012.org/">WSDM 2012</a>, the premier international ACM conference covering research in the areas of search and data mining on the Web. I&#8217;m excited about the program, as well as the opportunity to catch up with friends and make new ones. Back in 2010, I had the pleasure of co-organizing the Workshop on Search and Social Media (<a href="http://thenoisychannel.com/2010/01/25/workshop-on-search-and-social-media-ssm-2010/">SSM 2010</a>) and being the official ACM blogger for <a href="http://www.wsdm-conference.org/2010/">WSDM 2010</a>. You can read my posts <a href="http://thenoisychannel.com/2010/02/04/report-on-the-third-workshop-on-search-and-social-media-ssm-2010/">here</a>.</p>
<p>Then, on Saturday evening, I&#8217;ll be heading to Microsoft Research to attend the Social Search Social (<a href="http://research.microsoft.com/en-us/events/sss2012/">SSS 2012</a>). Hats off to organizers <a href="http://research.microsoft.com/en-us/um/people/merrie/">Meredith Ringel Morris</a>, <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a>, <a href="http://twitter.com/#!/jerepick">Jeremy Pickens</a>, <a href="http://faculty.ist.psu.edu/reddy/">Madhu Reddy</a>, <a href="http://comminfo.rutgers.edu/~chirags/">Chirag Shah</a>, and <a href="http://people.lis.illinois.edu/~twidale/">Michael Twidale</a> for creating what looks to be a fun (and very social!) event. I&#8217;m especially looking forward to the 45-second &#8220;madness&#8221; presentations (in which I&#8217;m participating) and the &#8220;speed dating&#8221; to help cross-pollinate the WSDM and <a href="http://en.wikipedia.org/wiki/Computer-supported_cooperative_work">CSCW</a> communities.</p>
<p>Hope to see some of you there, and of course will share what I learn here at The Noisy Channel. I also encourage you to follow the tweet streams for <a href="https://twitter.com/#!/search?q=%23wsdm2012">#wsdm2012</a> and <a href="https://twitter.com/#!/search?q=%23sss2012">#sss2012</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/02/04/social-wisdom-in-seattle/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LinkedIn @ CMU</title>
		<link>http://thenoisychannel.com/2012/01/26/linkedin-cmu/</link>
		<comments>http://thenoisychannel.com/2012/01/26/linkedin-cmu/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 18:47:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4083</guid>
		<description><![CDATA[As regular readers know, I have a deep affection for Carnegie Mellon University, where I did my graduate work. I&#8217;m happy to announce that two of my colleagues (both fellow CMU PhDs) will be giving talks at CMU in a couple of weeks, and I hope that some of you will have the opportunities to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://engineering.linkedin.com"><img title="LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/in-logo.jpeg" alt="" width="205" height="205" /></a><a href="http://www.cs.cmu.edu/"><img title="CMU School of Computer Science" src="http://www.cs.cmu.edu/~ref/naacl/logos/bronze/dragon-small.jpeg" alt="" width="277" height="241" /></a></p>
<p>As regular readers know, I have a deep affection for Carnegie Mellon University, where I did my graduate work. I&#8217;m happy to announce that two of my colleagues (both fellow CMU PhDs) will be giving talks at CMU in a couple of weeks, and I hope that some of you will have the opportunity to attend.</p>
<p>On Tuesday, February 7th, <a href="http://www.linkedin.com/in/abhilad">Abhimanyu Lad</a> will be hosting an information session at 6pm in Scaife Hall, Room 214. Abhi is a rock star on our data science team, and he&#8217;s been working on the next generation of LinkedIn search. You can get a taste of his work from his recent <a href="http://hcir.info/hcir-2011">HCIR 2011</a> presentation, &#8220;<a href="http://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MWVlMGNhZWY5NTA3MzQ2ZA">Is it Time to Abandon Abandonment?</a>&#8221;. Abhi will talk about a variety of technical challenges that data scientists and engineers are working on at LinkedIn.</p>
<p>On Thursday, February 9th, <a href="http://www.linkedin.com/in/paulogilvie">Paul Ogilvie</a> will talk about &#8220;<a href="http://www.lti.cs.cmu.edu/LinkedInPaulOgilvie.pdf">Where Big Data Meets Real-Time: Efficiently Indexing and Ranking News using Activity</a>&#8221; at 3:30pm in GHC 6115. Paul is responsible for article relevance infrastructure and algorithms on <a href="http://www.linkedin.com/today/">LinkedIn Today</a>, a great example of <a href="http://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjEzNDNjZTk5NGYyYWQwOA">social navigation</a> &#8211; not to mention a <a href="http://techcrunch.com/2011/06/30/linkedin-traffic-twitter/">great success for users</a>. Paul will talk about the technical details that make LinkedIn Today possible, including a novel use of inverted lists to efficiently index and support real-time updates to document representations.</p>
<p>And, even if you can&#8217;t make it to the talks, I encourage you to visit the LinkedIn booth at the <a href="http://www.studentaffairs.cmu.edu/career/job-fairs/eoc/index.html">EOC</a> fair on Wednesday, February 8th. We&#8217;re looking for great software engineers and data scientists, and we&#8217;re especially interested in interns.</p>
<div>I hope that CMU students and faculty will take the time to meet Abhi, Paul, and their colleagues when they visit in a couple of weeks.</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/26/linkedin-cmu/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/26/linkedin-cmu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Thoughts about Job Performance</title>
		<link>http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/</link>
		<comments>http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/#comments</comments>
		<pubDate>Sun, 22 Jan 2012 19:24:22 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4074</guid>
		<description><![CDATA[This is the season of annual reviews, at least at LinkedIn. Performance reviews can be daunting for both employees and managers &#8212; at least everywhere that I&#8217;ve worked. Not only are we as human beings terrible at delivering feedback, but we also receive bad advice as managers. For example, many of us have learned the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://dilbert.com/strips/comic/2009-08-26/"><img class="alignnone" title="Dilbert: Performance Review" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/60000/5000/600/65675/65675.strip.gif" alt="" width="500" height="155" /></a></p>
<p>This is the season of annual reviews, at least at LinkedIn. Performance reviews can be daunting for both employees and managers &#8212; at least <a href="http://www.linkedin.com/in/dtunkelang">everywhere that I&#8217;ve worked</a>. Not only are we as human beings terrible at delivering feedback, but we also receive bad advice as managers.</p>
<p>For example, many of us have learned the &#8220;feedback sandwich&#8221; method, a technique that doesn&#8217;t hold up to scientific validation. Watch the video below to see what Stanford professor <a href="http://www.stanford.edu/~nass/">Clifford Nass</a> has learned from his experiments (see my review of his book <a href="http://thenoisychannel.com/2010/09/07/slouching-toward-creepiness/">here</a>).</p>
<p><iframe src="http://www.youtube.com/embed/W2dGxE7E48I" frameborder="0" width="500" height="281"></iframe></p>
<p>Here is what I suggest as a format for performance feedback, whether for writing your own self-assessment or delivering feedback to reports or peers on their performance:</p>
<p style="padding-left: 30px;"><strong>1) What is your day job?</strong></p>
<p style="padding-left: 30px;">Everyone needs a day job &#8212; a mission with a crisp set of responsibilities and deliverables. If you don&#8217;t know what you&#8217;re responsible for delivering, you can&#8217;t assess how well you are delivering it. You should know and articulate your top priorities &#8212; at most three, with a clear #1. For further reading, I suggest the <a href="http://www.quora.com/OKRs-Objectives-and-Key-Results">Quora discussion on OKRs</a> (Objectives and Key Results), an idea pioneered by Intel and now used at top technology companies (including LinkedIn and Google).</p>
<p style="padding-left: 30px;"><strong>2) How are you performing in your day job?</strong></p>
<p style="padding-left: 30px;">Hopefully you make more contributions than you can count. But make sure that your day job comes first. If you find that a disproportionate fraction of your contribution is outside your day job, then consider changing your day job. Your top priority is to meet (hopefully exceed!) the expectations for your day job &#8212; expectations you should set early and revisit regularly. Performance reviews are a great opportunity to brag.</p>
<p style="padding-left: 30px;"><strong>3) What do you do beyond your day job?</strong></p>
<p style="padding-left: 30px;">Your day job should be strongly aligned with your team and company&#8217;s top priorities. But great employees contribute beyond their day job towards other team and company priorities. For example, <a href="http://engineering.linkedin.com/team">talent</a> is our top priority at LinkedIn, so we particularly value contributions to hiring and growing our talent. And, at least in every environment I&#8217;ve experienced, the best employees are those who help make others successful.</p>
<p style="padding-left: 30px;"><strong>4) How do you want to grow?</strong></p>
<p style="padding-left: 30px;">This is really a two-part question. First, what do you want to do next? That could mean getting better at your day job, evolving your current responsibilities, or taking on a different role. Second, what are you doing to get there? You are ultimately responsible for your own professional development. But one of your manager&#8217;s top responsibilities is to help you identify and advance along the path that is best for you. And performance reviews are a great opportunity to make you think about the future.</p>
<p>Regardless of how your company manages performance, these are the key questions you should think about. Performance feedback is a great opportunity to focus on professional development &#8212; your own and that of the people you work with every day. Make the most of it!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/22/thoughts-about-job-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Are You Hitched?</title>
		<link>http://thenoisychannel.com/2012/01/20/are-you-hitched/</link>
		<comments>http://thenoisychannel.com/2012/01/20/are-you-hitched/#comments</comments>
		<pubDate>Sat, 21 Jan 2012 03:27:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4070</guid>
		<description><![CDATA[Let me preface this post by saying that this is my personal blog, and that my opinions here are not necessarily those of my employer. With that out of the way, I love the premise of Hitch.me: a dating site for professionals based on LinkedIn. I won&#8217;t confirm or deny the number of my colleagues [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/EnX_kEKe-3o?rel=0" frameborder="0" width="504" height="284"></iframe></p>
<p>Let me preface this post by saying that this is my personal blog, and that my opinions here are not necessarily those of my employer.</p>
<p>With that out of the way, I love the premise of <a href="http://www.hitch.me/">Hitch.me</a>: a dating site for professionals based on LinkedIn. I won&#8217;t confirm or deny the number of my colleagues who have thought about building a dating site based on our data, but it&#8217;s great to see someone using our <a href="http://developer.linkedin.com/apis">APIs</a> to do so. And the marketing video, while not exactly politically correct, is brilliant.</p>
<p>Yet another reason to work as a data scientist at LinkedIn!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/20/are-you-hitched/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/20/are-you-hitched/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Guided Exploration = Faceted Search, Backwards</title>
		<link>http://thenoisychannel.com/2012/01/17/guided-exploration/</link>
		<comments>http://thenoisychannel.com/2012/01/17/guided-exploration/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 14:00:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4041</guid>
		<description><![CDATA[Information Scent In the early 1990s, PARC researchers Peter Pirolli and Stuart Card developed the theory of information scent (more generally, information foraging) to evaluate user interfaces in terms of how well users can predict which paths will lead them to useful information. Like many HCIR researchers and practitioners, I&#8217;ve found this model to be [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.amazon.com/Blow-Up-Other-Stories-Julio-Cortazar/dp/0394728815"><img class="alignnone size-full wp-image-4043" title="Blow-Up by Julio Cortazar" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/01/blow-up.png" alt="" width="274" height="192" /></a></p>
<p><strong>Information Scent</strong></p>
<p>In the early 1990s, PARC researchers <a href="http://web.mac.com/peter.pirolli/Professional/About_Me.html">Peter Pirolli</a> and <a href="http://www2.parc.com/istl/groups/uir/people/stuart/stuart.htm">Stuart Card</a> developed the theory of information scent (more generally, <a href="http://en.wikipedia.org/wiki/Information_foraging">information foraging</a>) to evaluate user interfaces in terms of how well users can predict which paths will lead them to useful information. Like many <a href="http://hcir.info/">HCIR</a> researchers and practitioners, I&#8217;ve found this model to be a useful way to think about interactive information seeking systems.</p>
<p>Specifically, <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> is an exemplary application of the theory of information scent. Faceted search allows users to express an information need as a keyword search, providing them with a series of opportunities to improve the precision of the initial result set by restricting it to results associated with particular facet values.</p>
<p>For example, if I&#8217;m looking for folks to <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-PL-2350202">hire for my team</a>, I can start my search on LinkedIn with the keywords <em>[information retrieval]</em>, restrict my results to<em> Location: San Francisco Bay Area</em>, and then further restrict to <em>School: CMU</em>.</p>
<p><strong>Precision / Recall Asymmetry</strong></p>
<p>Faceted search is a great tool for information seeking systems. But it offers a flow that is asymmetric with respect to <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision and recall</a>.</p>
<p>Let&#8217;s invert the flow of faceted search. Rather than starting from a large, imprecise result set and progressively narrowing it, let&#8217;s start from a small, precise result set and progressively expand it. Since faceted search is often called &#8220;guided navigation&#8221; (a term <a href="http://www.linkedin.com/in/knabe">Fritz Knabe</a> and I coined at <a href="http://endeca.com/">Endeca</a>), let&#8217;s call this approach &#8220;guided exploration&#8221; (which has a nicer ring than &#8220;guided expansion&#8221;).</p>
<p>Guided exploration exchanges the roles of precision and recall. Faceted search starts with high recall and helps users increase precision while preserving as much recall as possible. In contrast, guided exploration starts with high precision and helps users increase recall while preserving as much precision as possible.</p>
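<p>To make the exchange of roles concrete, here is a minimal sketch of how precision and recall behave at the two starting points. The names and the toy relevance judgments are made up for illustration; nothing here comes from a real search system.</p>

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved.intersection(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical relevance judgments for a single information need.
relevant = {"ann", "bob", "cat", "dan"}

# Faceted search starts broad: everything relevant is there, plus noise.
broad = {"ann", "bob", "cat", "dan", "eve", "fay", "gus", "hal"}
print(precision_recall(broad, relevant))   # (0.5, 1.0)

# Guided exploration starts narrow: everything shown is relevant, but much is missing.
narrow = {"ann", "bob"}
print(precision_recall(narrow, relevant))  # (1.0, 0.5)
```

<p>Faceted refinement moves the first pair toward higher precision; guided exploration moves the second pair toward higher recall.</p>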
<p>That sounds great in theory, but how can we implement guided exploration in practice?</p>
<p>Let&#8217;s remind ourselves why faceted search works so well. Faceted search offers the user information scent: the facet values help the user identify regions of higher precision relative to his or her information need. By selecting a sequence of facet values, the user arrives at a non-empty set that consists entirely or mostly of relevant results.</p>
<p><strong>How to Expand a Result Set</strong></p>
<p>How do we invert this flow? Just as enlarging an image is more complicated than reducing one, increasing recall is more complicated than increasing precision.</p>
<p>If our initial set is the result of selecting multiple facet values, then we may be able to increase recall by de-selecting facet values (e.g., de-selecting <em>San Francisco Bay Area</em> and <em>CMU</em> in my previous example). If we are using hierarchical facets, then rather than de-selecting a facet value, we may be able to replace it with a parent value (e.g., replacing <em>San Francisco Bay Area</em> with <em>California</em>). We can also remove one or more search keywords to broaden the results (e.g., <em>information</em> or <em>retrieval</em>).</p>
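<p>As a rough sketch, these one-step relaxations can be enumerated mechanically. The query representation (a tuple of keywords plus a dictionary of facet selections) and the <code>PARENT</code> hierarchy are assumptions made for this example, not a description of any particular search engine.</p>

```python
# Hypothetical facet hierarchy: child value -> parent value.
PARENT = {"San Francisco Bay Area": "California"}

def relaxations(keywords, facets):
    """Return (description, keywords, facets) for each one-step relaxation."""
    options = []
    for name, value in facets.items():
        # De-select the facet value entirely.
        remaining = {k: v for k, v in facets.items() if k != name}
        options.append((f"de-select {name}: {value}", keywords, remaining))
        # Replace the facet value with its parent, when the hierarchy has one.
        if value in PARENT:
            broader = dict(facets)
            broader[name] = PARENT[value]
            options.append((f"broaden {name} to {PARENT[value]}", keywords, broader))
    if len(keywords) > 1:  # never drop the last remaining keyword
        for i, word in enumerate(keywords):
            options.append((f"remove keyword: {word}", keywords[:i] + keywords[i + 1:], facets))
    return options

query_keywords = ("information", "retrieval")
query_facets = {"Location": "San Francisco Bay Area", "School": "CMU"}
for description, _, _ in relaxations(query_keywords, query_facets):
    print(description)
```

<p>Each option leaves the original query untouched and returns a new, broader one, so the options can be evaluated side by side.</p>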
<p>Those are straightforward query relaxations. But there are more interesting ways to expand our results:</p>
<ul>
<li>We can replace a facet value with the union of that value and similar values (e.g., replacing <em>CMU</em> with <em>CMU </em>OR<em> MIT</em>).</li>
<li>We can replace the entire query (or any subquery) with a union of that query and the results for selecting a single facet value (e.g., (<em>[information retrieval] </em>AND<em> Location: San Francisco Bay Area </em>AND<em> School: CMU) </em>OR<em> Company: Google</em>).</li>
<li>We can replace the entire query (or any subquery) with a union of that query and the results for a keyword search (e.g., (<em>[information retrieval] </em>AND<em> Location: San Francisco Bay Area </em>AND<em> School: CMU) </em>OR<em> [faceted search]</em>).</li>
</ul>
<p>As we can see, there are many ways to progressively refine a query in a way that expands the result set. The question is how we provide users with options that increase recall while preserving as much precision as possible.</p>
<p><strong>Frequency : Recall :: Similarity : Precision</strong></p>
<p>Developers of faceted search systems don&#8217;t necessarily invest much thought into deciding which faceted refinement options to present to users. Some systems simply avoid dead ends, offering users all refinement options that lead to a non-empty result set. This approach breaks down when there are too many options, in which case most systems offer users the most frequent facet values. A <a href="http://www.uie.com/events/virtual_seminars/facets/Faceted%20Search%20-%20Chapter%207.pdf">chapter</a> in my <a href="http://thenoisychannel.com/faceted-search-the-book/">faceted search book</a> discusses some other options.</p>
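<p>The frequency heuristic is simple enough to sketch in a few lines. The records and the <code>refinement_options</code> helper below are hypothetical; the point is only that counting values within the current result set avoids dead ends and ranks options by frequency in a single pass.</p>

```python
from collections import Counter

def refinement_options(results, facet, selected=(), limit=3):
    """Most frequent values of `facet` in the current results, skipping selected ones.

    Only values that occur in `results` are counted, so every option
    leads to a non-empty result set.
    """
    counts = Counter(doc[facet] for doc in results if facet in doc)
    options = [(value, n) for value, n in counts.most_common() if value not in selected]
    return options[:limit]

# Made-up result set for illustration.
results = [
    {"School": "CMU", "Company": "Google"},
    {"School": "CMU", "Company": "LinkedIn"},
    {"School": "MIT", "Company": "Google"},
    {"School": "CMU", "Company": "Google"},
    {"School": "Stanford", "Company": "LinkedIn"},
    {"School": "MIT", "Company": "Endeca"},
]
print(refinement_options(results, "School", limit=2))
print(refinement_options(results, "Company", selected=("Google",)))
```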
<p>Unfortunately, the number of options for guided exploration (at least if we go beyond the very limited basic options) is too vast for such a naive approach. Unions never lead to dead ends, and we don&#8217;t have a simple measure like frequency to rank our options.</p>
<p>Or perhaps we do. A good reason to favor frequent values as faceted refinement options is that they tend to preserve recall. What we need is a measure that tends to preserve precision when we expand a result set.</p>
<p>That measure is set similarity. More specifically, it is the asymmetric similarity between a set and a superset containing it, which we can think of as the former&#8217;s representativeness of the latter. If we are working with facets, we can measure this similarity in terms of differences between distributions of the facet values. If the current set has high precision, we should favor supersets that are similar to it in order to preserve precision.</p>
<p>I&#8217;ll spare readers the math, but I encourage you to read about <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">Kullback-Leibler divergence</a> and <a href="http://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence">Jensen-Shannon divergence</a> if you are not familiar with measures of similarity between probability distributions. I&#8217;m also glossing over key implementation details &#8211; such as how to model distributions of facet values as probability distributions, and how to handle smoothing and normalization for set size. I&#8217;ll try to cover these in future posts. But for now, let&#8217;s assume that we can measure the similarity between a set and a superset.</p>
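<p>For readers who would rather see code than formulas, here is one way these divergences might be computed over facet value distributions. The add-alpha smoothing and the toy company counts are assumptions chosen for illustration; they are one reasonable choice among several, not a settled answer to the implementation questions above.</p>

```python
import math
from collections import Counter

def distribution(results, facet, vocabulary, alpha=1.0):
    """Additively smoothed distribution of `facet` values over a fixed vocabulary."""
    counts = Counter(doc[facet] for doc in results)
    total = sum(counts[v] for v in vocabulary) + alpha * len(vocabulary)
    return {v: (counts[v] + alpha) / total for v in vocabulary}

def kl_divergence(p, q):
    """KL(p || q) in bits; asymmetric, and requires q[v] to be positive wherever p[v] is."""
    return sum(p[v] * math.log2(p[v] / q[v]) for v in p if p[v] > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded between 0 and 1."""
    m = {v: 0.5 * (p[v] + q[v]) for v in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Made-up current-company values for a set and a superset containing it.
cmu = [{"Company": c} for c in ["Google"] * 3 + ["LinkedIn"] * 2 + ["CMU"]]
cmu_or_mit = cmu + [{"Company": c} for c in ["Google"] * 2 + ["LinkedIn", "MIT"]]
vocabulary = ["Google", "LinkedIn", "CMU", "MIT"]

p = distribution(cmu, "Company", vocabulary)
q = distribution(cmu_or_mit, "Company", vocabulary)
print("KL(set || superset):", kl_divergence(p, q))
print("JS(set, superset):  ", js_divergence(p, q))
```

<p>Lower divergence means the superset looks more like the set we started from, which is the property we want an expansion to have.</p>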
<p><strong>Guided Exploration: A General Framework</strong></p>
<p>We now have the elements to put together a general framework for guided exploration:</p>
<ul>
<li>Generate a set of candidate expansion options from the current search query using operations such as the following:</li>
<ul>
<li>De-select a facet value.</li>
<li>Replace a facet value with its parent.</li>
<li>Replace a facet value with the union of it and other values from that facet.</li>
<li>Remove a search keyword.</li>
<li>Replace a search keyword with the union of it and related keywords.</li>
<li>Replace the entire query with the union of it and a related facet value selection.</li>
<li>Replace the entire query with the union of it and a related keyword search.</li>
</ul>
<li>Evaluate each expansion option based on the similarity of the resulting set to the current one.</li>
<li>Present the most similar sets to the user as expansion options.</li>
</ul>
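<p>The evaluation and presentation steps of this framework might look like the following sketch. It assumes the candidate expansions have already been generated and executed, so that each one is a label paired with its result set. Jensen-Shannon divergence over a single facet stands in for the similarity measure, and the data, labels, and function names are all invented for the example.</p>

```python
import math
from collections import Counter

def facet_distribution(docs, facet, values, alpha=0.5):
    """Smoothed distribution of `facet` over a shared, ordered list of values."""
    counts = Counter(d[facet] for d in docs)
    total = sum(counts[v] for v in values) + alpha * len(values)
    return [(counts[v] + alpha) / total for v in values]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two aligned distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2

def rank_expansions(current, candidates, facet, top_k=3):
    """Return the labels of the candidate supersets most similar to `current`."""
    values = sorted({d[facet] for docs in candidates.values() for d in docs}
                    | {d[facet] for d in current})
    p = facet_distribution(current, facet, values)
    scored = [(js_divergence(p, facet_distribution(docs, facet, values)), label)
              for label, docs in candidates.items()]
    return [label for _, label in sorted(scored)[:top_k]]

def docs(*companies):
    return [{"Company": c} for c in companies]

current = docs("Google", "Google", "Google", "LinkedIn")
candidates = {
    "School: CMU OR MIT": current + docs("Google", "Google", "LinkedIn"),
    "OR Company: Endeca": current + docs(*["Endeca"] * 6),
}
print(rank_expansions(current, candidates, "Company"))
```

<p>In this toy example the school expansion ranks first because it keeps roughly the same mix of companies, while the company expansion floods the set with results that look nothing like the original.</p>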
<p><strong>Visualizing Drift</strong></p>
<p>It&#8217;s one thing to tell a user that two sets are distributionally similar based on an information-theoretic measure, and another to communicate that similarity in a language the user can understand. Here is an example of visualizing the similarity between <em>[information retrieval]</em><em> </em>AND<em> School: CMU</em> and <em>[information retrieval]</em><em> </em>AND<em> School: (CMU OR MIT)</em>:</p>
<p><img class="alignnone  wp-image-4047" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="[information retrieval] AND School: CMU" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/01/Information-Retrieval-AND-CMU.png" alt="" width="510" height="226" /></p>
<p style="text-align: center;"><img class="aligncenter" title="down arrow" src="http://upload.wikimedia.org/wikipedia/commons/a/a3/Down_arrow.svg" alt="" width="130" height="121" /></p>
<p><img class="alignnone  wp-image-4048" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="[information retrieval] AND School: (CMU OR MIT)" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2012/01/Information-Retrieval-AND-CMU-OR-MIT.png" alt="" width="539" height="226" /></p>
<p>As we can see from even this basic visualization, replacing <em>CMU</em> with (<em>CMU</em> OR<em> MIT)</em> increases the number of results by 70% while keeping a similar distribution of current companies &#8212; the notable exception being people who work for their almae matres.</p>
<p><strong>Conclusion</strong></p>
<p>Faceted search offers some of the most convincing evidence in favor of <a href="http://ils.unc.edu/~march/">Gary Marchionini</a>&#8217;s <a href="http://www.asis.org/Bulletin/Jun-06/marchionini.html">advocacy</a> that we &#8220;empower people to explore large-scale information bases but demand that people also take responsibility for this control&#8221;. Guided exploration aims to generalize the value proposition of faceted search by inverting the roles of precision and recall. Given the <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">importance of recall</a>, I hope to see progress in this direction. If this is a topic that interests you, give me a shout. Especially if you&#8217;re a student looking for an <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=2350247">internship</a> this summer!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/17/guided-exploration/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/17/guided-exploration/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Next Play!</title>
		<link>http://thenoisychannel.com/2012/01/01/next-play/</link>
		<comments>http://thenoisychannel.com/2012/01/01/next-play/#comments</comments>
		<pubDate>Sun, 01 Jan 2012 21:30:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=4031</guid>
		<description><![CDATA[Every year brings its own adventures, but for me 2011 will be a tough act to follow. A year ago, I&#8217;d just started working at LinkedIn, and my biggest concern was selling our apartment in Brooklyn so that my family could join me in California. Little did I imagine that my new manager, who had just recruited me [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.linkedin.com/2011/11/23/inday-culture-from-across-the-globe/"><img class="alignnone" title="Next Play: LinkedIn's garage band" src="http://blog.linkedin.com/wp-content/uploads/2011/11/pic-13-next-play.jpg" alt="" width="500" height="333" /></a></p>
<p>Every year brings its own adventures, but for me 2011 will be a tough act to follow.</p>
<p>A year ago, I&#8217;d just started <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">working at LinkedIn</a>, and my biggest concern was selling our <a href="http://sites.google.com/site/245henry/">apartment</a> in Brooklyn so that my family could join me in California.</p>
<p><a href="https://sites.google.com/site/245henry/"><img class="alignnone" title="Bye bye, Brooklyn!" src="http://sites.google.com/site/245henry/_/rsrc/1292981937914/home/View.jpg" alt="" width="200" height="150" /></a></p>
<p>Little did I imagine that my new manager, who had just recruited me from Google to LinkedIn (and persuaded my family to change coasts!), would leave three months later for a startup. Welcome to Silicon Valley! At the time, I felt unready for the abrupt transition into the product executive team. In retrospect, I&#8217;m thankful for the kick in the pants that helped me transform my role and brought the best out of a great team.</p>
<p><a href="http://genelu.com/2011/06/beware-of-infauxgraphics/"><img class="alignnone" title="Talent War Infographic (Credit: Gene Lu)" src="http://genelu.com/wp-content/uploads/2011/06/talent-war-infographic-redo.jpg" alt="" width="200" height="205" /></a></p>
<p>Summer brought the excitement of LinkedIn&#8217;s <a href="http://thenoisychannel.com/2011/05/19/going-public/">IPO</a>. The process was exhilarating, especially to someone who had worked for over a decade at a pre-IPO company.</p>
<p><a href="http://blog.linkedin.com/2011/05/19/lnkd-bell-ringing/"><img class="alignnone" title="LinkedIn IPO" src="http://farm6.staticflickr.com/5190/5737441522_2cd62b4e3f_b.jpg" alt="" width="200" height="122" /></a></p>
<p>Nonetheless, we didn&#8217;t let the IPO distract us from our mission. In March, we celebrated our <a href="http://blog.linkedin.com/2011/03/22/linkedin-100-million/">100 millionth member</a>; by November, we passed 135 million. And <a href="http://press.linkedin.com/about">lots more</a>. We released new data products like <a href="http://blog.linkedin.com/2011/02/03/linkedin-skills/">Skills</a> and <a href="http://blog.linkedin.com/2011/10/19/linkedin-alumni/">Alumni</a>. We won the <a href="http://www.oscon.com/oscon2011/public/schedule/detail/21349">OSCON Data Innovation Award</a> for contributions to the open source software for big data. We also acquired a few companies, including search engine startup <a href="http://k9ventures.com/blog/2011/10/11/congratulations-indextank/">IndexTank</a>. In short, we heeded the two short words on the back of our commemorative IPO t-shirts: &#8220;next play&#8221;.</p>
<p><a href="http://www.cmu.edu/homepage/computing/2011/fall/third-times-a-charm.shtml"><img class="alignnone" title="Celebrating the IndexTank acquisition with Diego Basch and Manu Kumar" src="http://www.cmu.edu/homepage/images/2011/third_time_charm_2_201x201.jpg" alt="" width="200" height="200" /></a></p>
<p>Fall was an intense season of conferences. Between <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/">CIKM</a>, <a href="http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/">HCIR</a>, <a href="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/">RecSys</a>, <a href="http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/">Strata</a>, and a talk at <a href="http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/">CMU</a>, it was a great opportunity to connect and reconnect with researchers and practitioners around the world. I am particularly proud of the success of this year&#8217;s HCIR workshop, which showed how much the workshop (now to become a 2-day symposium!) has grown up in five years.</p>
<p><a href="http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/"><img class="alignnone" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/Keeping-It-Professional.png" alt="" width="200" height="150" /></a></p>
<p>But what capped my year off was seeing <a href="http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/">Endeca</a>, the company I helped start in 1999, become one of Oracle&#8217;s largest <a href="http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/">acquisitions</a>. Even though it&#8217;s been two years since I left, Endeca will always be a core facet of my professional identity. I look forward to great things from all the folks I worked with.</p>
<p><a href="http://www.tbkconsult.com/blog/2011/11/02/oracle-acquires-endeca/"><img class="alignnone" style="border-image: initial; border-width: 1px; border-color: black; border-style: solid;" title="Oracle acquires Endeca" src="http://www.tbkconsult.com/blog/wp-content/uploads/2011/11/Endeca-to-Oracle.jpg" alt="" width="200" height="120" /></a></p>
<p>That brings us to 2012, ready to start a new year of adventures. Tough or not, our job is to make every new year more amazing than the previous ones. I&#8217;m ready for the challenge, and I hope you are too.</p>
<p>Here&#8217;s a teaser of what I have planned:</p>
<ul>
<li>My team at LinkedIn is launching into 2012 with a strong focus on derived data quality and relevance. As regular readers know, I see data quality and richer interfaces for information seeking as <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/">inseparable concerns</a>. And, speaking of quality, we&#8217;re <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1827722">hiring</a>!</li>
<li>I&#8217;ll be speaking at <a href="http://strataconf.com/strata2012">Strata</a> in a couple of months with <a href="http://www.linkedin.com/in/clairehunsaker">Claire Hunsaker</a> of <a href="http://blog.linkedin.com/2011/10/20/linkedin-samasource/">Samasource</a> about &#8220;<a href="http://strataconf.com/strata2012/public/schedule/detail/22363">Humans, Machines, and the Dimensions of Microwork</a>&#8221;. I&#8217;m very excited to talk about the intersection of crowdsourcing and data science. And I&#8217;ll be joined by three of LinkedIn&#8217;s top data scientists: <a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a>, <a href="http://www.linkedin.com/in/shahsam">Sam Shah</a>, and <a href="http://www.linkedin.com/in/peterskomoroch">Pete Skomoroch</a>.</li>
<li>I&#8217;ll be co-chairing the RecSys Industry Track this fall with <a href="http://research.yahoo.com/Yehuda_Koren">Yehuda Koren</a>. I&#8217;m honored to have the opportunity to work with Yehuda, who was part of the <a href="http://www2.research.att.com/~volinsky/netflix/bpc.html">Netflix Grand Prize team</a> and won <a href="http://labs.yahoo.com/node/639">best paper</a> at RecSys 2011. We&#8217;re still putting together the program, but you can look at <a href="http://recsys.acm.org/2011/industry_track.shtml">last year&#8217;s program</a> to get an idea of what&#8217;s in store.</li>
<li>I&#8217;ll be at the CIKM Industry Event, this time as an invited speaker. CIKM will take place in <a href="http://www.gohawaii.com/maui">Maui</a> this fall and I&#8217;m excited about the program that <a href="http://www.cs.technion.ac.il/~gabr/">Evgeniy Gabrilovich</a> is putting together for the Industry Event. It will be an all-invited program, just like <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/">last year</a>.</li>
</ul>
<p>I hope you&#8217;re also starting 2012 with a fresh sense of purpose. Let&#8217;s take a last moment to reflect on a great <a href="http://blog.linkedin.com/2011/12/23/linkedin-blog-2011/">2011</a>, and then…NEXT PLAY!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2012/01/01/next-play/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2012/01/01/next-play/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>HCIR 2011: Now on YouTube!</title>
		<link>http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/</link>
		<comments>http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/#comments</comments>
		<pubDate>Sat, 17 Dec 2011 16:56:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3983</guid>
		<description><![CDATA[The Fifth Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011), held on October 20th at Google&#8217;s main campus in Mountain View, California, was a resounding success. We had almost a hundred people, presenting a wide array of papers, posters, and challenge entries. You can read my summary of the event in an earlier blog post: &#8220;HCIR 2011: We [...]]]></description>
			<content:encoded><![CDATA[<p>The Fifth Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://hcir.info/hcir-2011">HCIR 2011</a>), held on October 20th at <a href="http://maps.google.com/?q=Google%20Inc.@37.423156,-122.084917&amp;hl=en">Google&#8217;s main campus</a> in Mountain View, California, was a resounding success. We had almost a hundred people, presenting a wide array of <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/schedule/presentations">papers</a>, <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/posters">posters</a>, and <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/challenge">challenge entries</a>. You can read my summary of the event in an earlier blog post: &#8220;<a href="http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/">HCIR 2011: We Have Arrived!</a>&#8221;.</p>
<p>Better yet, you can now, for the first time in the workshop&#8217;s history, watch videos of the presentations. Embedded below are videos of Gary Marchionini&#8217;s keynote address and of the two paper presentation sessions. Thanks again to Google for being such a gracious host &#8212; now online as well as offline!</p>
<h3>Keynote</h3>
<p><iframe src="http://www.youtube.com/embed/jj5Q3FmPVl0" frameborder="0" width="504" height="284"></iframe></p>
<h3>Morning Presentations</h3>
<p><iframe src="http://www.youtube.com/embed/2112ylDx7zs" frameborder="0" width="504" height="284"></iframe></p>
<h3>Afternoon Presentations</h3>
<p><iframe src="http://www.youtube.com/embed/AAgKfvbH7ds" frameborder="0" width="504" height="284"></iframe></p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/12/17/hcir-2011-now-on-youtube/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Jim Adler: The Accidental Chief Privacy Officer</title>
		<link>http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/</link>
		<comments>http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/#comments</comments>
		<pubDate>Sun, 04 Dec 2011 22:49:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3974</guid>
		<description><![CDATA[Privacy is the third rail of the cloud. On one hand, the ease of sharing information and the power of analytics have produced extraordinary value for consumers, as well as great business models for companies that serve those consumers. On the other hand, people have good reason to worry about the unintended consequences of over-sharing. [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/Q9UqtRvPOVY" frameborder="0" width="504" height="284"></iframe></p>
<p>Privacy is the third rail of the cloud. On one hand, the ease of sharing information and the power of analytics have produced extraordinary value for consumers, as well as great business models for companies that serve those consumers. On the other hand, people have good reason to worry about the unintended consequences of over-sharing.</p>
<p>When I attended the <a href="http://strataconf.com/stratany2011">O&#8217;Reilly Strata New York Conference</a> in September, I had the pleasure of meeting Intelius&#8217;s <a href="http://jimadler.me/">Jim Adler</a> and hearing him talk about being his company&#8217;s &#8220;accidental chief privacy officer&#8221;. <a href="http://www.intelius.com/">Intelius</a>&#8217;s main product is people search &#8212; an area that naturally brings up privacy concerns, especially since Intelius aggregates and publishes information about people from databases of public records, eroding a history of &#8220;<a href="http://thenoisychannel.com/2008/05/01/privacy-through-difficulty/">privacy through difficulty</a>&#8221;. Impressed with Jim&#8217;s talk at Strata, I persuaded him to deliver a similar talk at <a href="http://www.youtube.com/linkedintechtalks">LinkedIn</a>, the <a href="http://www.youtube.com/watch?v=Q9UqtRvPOVY">video</a> of which you can find above. You can also find his slides on <a href="http://www.slideshare.net/jim-adler/20111116-linked-in1">SlideShare</a>.</p>
<p>Jim brings nuance to the discussion of privacy &#8212; nuance that discussions of online privacy often lack. For example, he responded to the recent controversy about social networks&#8217; &#8220;real names&#8221; policy with a measured post entitled &#8220;<a href="http://jimadler.me/post/9294501184/nyms-pseudonyms-or-anonyms-all-of-the-above">Nyms, Pseudonyms, or Anonyms? All of the Above</a>&#8221;.</p>
<p>Jim appropriately opened his talk by disclosing a personal example. He shares his name with a more prominent <a href="http://www.jimadler.com/">personal injury lawyer</a> who dominates search results for that name, raising the potential of taint by association. Intelius&#8217;s core technical problem is to cluster inputs from the sources it aggregates, thus mapping each person to exactly one record in its database.</p>
<p>Jim went on to note that we are at a stage in the privacy debate where we are likely to see more regulation. He makes a few key observations:</p>
<ul>
<li>Social norms, which form the basis of our laws and regulations (the notion of a &#8220;reasonable expectation of privacy&#8221;), have changed suddenly, leading to a &#8220;privacy vertigo&#8221; where the whole world now feels like a small town.</li>
<li>Sharing is a gateway from private to public, which often leads to violation of expectations. This problem is not new, but the efficiency of online sharing dramatically amplifies the unintended consequences of sharing. It is crucial that the parties involved in sharing data also have shared expectations around how that data will be used or disclosed.</li>
<li>We need to distinguish between data use and data access, and not try to regulate data use with data access regulations. He cites the <a href="http://en.wikipedia.org/wiki/Fair_Credit_Reporting_Act">Fair Credit Reporting Act</a> as one of the most inspired laws of the last 40 years to regulate data use. If you don&#8217;t have time to listen to the whole talk, I recommend you jump to <a href="http://www.youtube.com/watch?v=Q9UqtRvPOVY#t=25m12s">25:12</a>, where he discusses this law in detail.</li>
</ul>
<p>There&#8217;s a lot more in the talk, so I&#8217;m not going to try to summarize it all here. I strongly encourage you to check out the video (which includes lengthy <a href="http://www.youtube.com/watch?v=Q9UqtRvPOVY#t=53m00s">Q&amp;A</a>) and the slides. Better yet, let&#8217;s use the comments to discuss!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/12/04/jim-adler-the-accidental-chief-privacy-officer/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Slides and Summaries</title>
		<link>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/</link>
		<comments>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 06:23:39 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3967</guid>
		<description><![CDATA[I&#8217;ve posted slides and summaries for all ten CIKM 2011 Industry Event presentations: Stephen Robertson (Microsoft Research): Why Recall Matters John Giannandrea (Google): Freebase &#8211; A Rosetta Stone for Entities Jeff Hammerbacher (Cloudera): Experiences Evolving a New Analytical Platform: What Works and What&#8217;s Missing Khalid Al-Kofahi (Thomson Reuters): Combining Advanced Technology and Human Expertise in [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cikm2011.org/node/20"><img class="alignnone" title="CIKM 2011 Industry Event" src="http://www.cikm2011.org/sites/default/files/cikm2011_craigm_v1_logo.jpg" alt="" width="466" height="74" /></a></p>
<p>I&#8217;ve posted slides and summaries for all ten <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a> presentations:</p>
<ul>
<li>Stephen Robertson (Microsoft Research): <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">Why Recall Matters</a></li>
<li>John Giannandrea (Google): <a href="http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/">Freebase &#8211; A Rosetta Stone for Entities</a></li>
<li>Jeff Hammerbacher (Cloudera): <a href="http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/">Experiences Evolving a New Analytical Platform: What Works and What&#8217;s Missing</a></li>
<li>Khalid Al-Kofahi (Thomson Reuters): <a href="http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/">Combining Advanced Technology and Human Expertise in Legal Research</a></li>
<li>Chavdar Botev (LinkedIn): <a href="http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/">Databus: A System for Timeline-Consistent Low-Latency Change Capture</a></li>
<li>Ben Greene (SAP): <a href="http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/">Large Memory Computers for In-Memory Enterprise Applications</a></li>
<li>David Hawking (Funnelback): <a href="http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/">Search Problems and Solutions in Higher Education</a></li>
<li>Ed Chi (Google): <a href="http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/">Model-Driven Research in Social Computing</a></li>
<li>Vanja Josifovski (Yahoo! Research): <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/">Toward Deep Understanding of User Behavior on the Web</a></li>
<li>Ilya Segalovich (Yandex): <a href="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/">Improving Search Quality at Yandex: Current Challenges and Solutions</a></li>
</ul>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-slides-and-summaries/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Ilya Segalovich on Improving Search Quality at Yandex</title>
		<link>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/</link>
		<comments>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/#comments</comments>
		<pubDate>Mon, 28 Nov 2011 06:10:37 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3963</guid>
		<description><![CDATA[This post is the last in a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The final talk of the CIKM 2011 Industry Event came from Yandex co-founder and CTO Ilya Segalovich on &#8220;Improving Search Quality at Yandex: Current Challenges and Solutions&#8221;. Yandex is the world&#8217;s #5 search engine. It dominates [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10357517?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is the last in a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The final talk of the CIKM 2011 Industry Event came from <a href="http://www.yandex.com/">Yandex</a> co-founder and CTO <a href="http://company.yandex.com/corporate_governance/board_of_directors/ilya_segalovich.xml">Ilya Segalovich</a> on &#8220;<a href="http://www.cikm2011.org/industryevent#is">Improving Search Quality at Yandex: Current Challenges and Solutions</a>&#8221;.</p>
<p>Yandex is the world&#8217;s #5 search engine. It dominates the Russian search market, where it has over 64% market share. Ilya focused on three challenges facing Yandex: result diversification, recency-specific ranking, and cross-lingual search.</p>
<p>For result diversification, Ilya focused on queries containing entities without any additional indicators of intent. He asserted that entities offer a strong but incomplete signal of query intent, and in particular that entities often call for suggested query reformulations. The first step in processing such a query is entity categorization. Ilya said that Yandex achieved almost 90% precision using machine learning, and over 95% precision by incorporating manually tuned heuristics. The second step is enumerating possible search intents for the identified category in order to optimize for intent-aware <a href="http://www.isi.edu/~metzler/papers/metzler-cikm09.pdf">expected reciprocal rank</a>. By diversifying entity queries, Yandex reduced abandonment on popular queries, increased click-through rates, and was able to highlight possible intents in result snippets.</p>
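<p>To make the metric concrete, here is a minimal sketch of intent-aware expected reciprocal rank. It assumes the usual cascade formulation of ERR, in which a result with grade g satisfies the user with probability (2^g - 1) / 2^max_grade. The function names, the example query, the intent probabilities, and the relevance grades are all made up for illustration; nothing here reflects how Yandex actually computes or optimizes the metric.</p>

```python
def err(grades, max_grade=4):
    """Expected reciprocal rank of one ranked list of graded relevance labels."""
    score = 0.0
    p_reach = 1.0  # probability that the user gets as far as the current rank
    for rank, grade in enumerate(grades, start=1):
        p_satisfied = (2 ** grade - 1) / 2 ** max_grade
        score += p_reach * p_satisfied / rank
        p_reach *= 1.0 - p_satisfied
    return score


def intent_aware_err(intent_probs, grades_by_intent, max_grade=4):
    """Average the per-intent ERR, weighted by the probability of each intent."""
    return sum(
        prob * err(grades_by_intent[intent], max_grade)
        for intent, prob in intent_probs.items()
    )


# Hypothetical entity query "jaguar" with two possible intents.
intents = {"car": 0.6, "animal": 0.4}

# Grades of the same three-result ranking, judged separately under each intent.
car_only = {"car": [4, 4, 2], "animal": [0, 0, 0]}
diversified = {"car": [4, 0, 2], "animal": [0, 4, 0]}

print(intent_aware_err(intents, car_only))
print(intent_aware_err(intents, diversified))
```

<p>In this toy example the diversified ranking scores higher, because placing a strong result for the minority intent in second position costs the majority intent very little.</p>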
<p>Ilya then talked about the problem of balancing recency and relevance in handling queries about current events. He sees recency ranking as a diversification problem, since a desire for recent content is a kind of query intent. A challenge in managing recency-specific ranking is predicting the user&#8217;s recency sensitivity for a given query. Yandex considers factors such as the fraction of results found that are at most 3 days old, the number of news results, spikes in the query stream, lexical cues (e.g., searches for &#8220;explosion&#8221; or &#8220;fire&#8221;), and Twitter trending topics. He also referred to a WWW 2006 paper he co-authored on <a href="http://www2006.org/programme/files/pdf/p71.pdf">extracting news-related queries from web query logs</a>. These efforts led to measurable improvements in click-based metrics of user happiness.</p>
<p>Ilya talked about a variety of efforts to support cross-lingual search. Russian users enter a significant fraction (about 15%) of non-Russian queries, but many still prefer Russian-language results. For example, a search for a company name should return that company&#8217;s Russian-language home page if one is available. Yandex implements language personalization by learning a user&#8217;s language knowledge and using it as a factor in relevance computation. Yandex also uses machine translation to serve results for Russian-language queries when there are no relevant Russian-language results.</p>
<p>Ilya concluded by pitching the efforts that Yandex is making to participate in and support the broader information retrieval community, including running (and releasing data for) a <a href="http://imat-relpred.yandex.ru/en">relevance prediction challenge</a>. It&#8217;s great to see a reminder that there is more to web search than Google vs. Bing, and refreshing to see how much Yandex shares its methodology and results with the IR community.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-ilya-segalovich-on-improving-search-quality-at-yandex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Vanja Josifovski on Toward Deep Understanding of User Behavior on the Web</title>
		<link>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/</link>
		<comments>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/#comments</comments>
		<pubDate>Sun, 27 Nov 2011 21:14:54 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3959</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. Those of you who attended the SIGIR 2009 Industry Track had the opportunity to hear Yahoo researcher Vanja Josifovski make an eloquent case for ad retrieval as a new frontier of information retrieval. At the CIKM 2011 Industry Event, [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10353828?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Those of you who attended the <a href="http://sigir2009.org/Program/industry">SIGIR 2009 Industry Track</a> had the opportunity to hear Yahoo researcher <a href="http://research.yahoo.com/Vanja_Josifovski">Vanja Josifovski</a> make an eloquent case for <a href="http://thenoisychannel.com/2009/07/31/sigir-2009-day-3-industry-track-vanja-josifovski/">ad retrieval as a new frontier of information retrieval</a>. At the CIKM 2011 Industry Event, Vanja delivered an equally compelling presentation entitled &#8220;<a href="http://www.cikm2011.org/industryevent#vj">Toward Deep Understanding of User Behavior: A Biased View of a Practitioner</a>&#8221;.</p>
<p>Vanja first offered a vision in which the web of the future will be your life partner, delivering a lifelong, pervasive, personalized experience. Everything will be personalized, and that personalization will extend across your entire online life &#8212; from your laptop to your web-enabled toaster.</p>
<p>He then brought us back to the state of personalization today. For search personalization, the low <a href="http://en.wikipedia.org/wiki/Entropy_(information_theory)">entropy</a> of query intent makes it difficult &#8212; or too risky &#8212; to significantly outperform the baseline of non-personalized search. In his view, the action today is in content recommendation and ad targeting, where there is high entropy of intent and lots of room for improvement over today&#8217;s crude techniques.</p>
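<p>A small worked example may help with the entropy argument. The sketch below computes the Shannon entropy of two hypothetical intent distributions; the numbers are invented for illustration and do not come from Vanja&#8217;s talk. When one intent dominates, as is assumed here for a search query, the entropy is low and a non-personalized baseline is already right most of the time; when intents are spread evenly, as is assumed here for content recommendation, there is more uncertainty for personalization to resolve.</p>

```python
import math


def intent_entropy(intent_probs):
    """Shannon entropy, in bits, of a probability distribution over intents."""
    # Zero-probability intents contribute nothing, so they are skipped.
    return -sum(p * math.log2(p) for p in intent_probs if p)


# Hypothetical distributions over four possible user intents.
search_query = [0.9, 0.05, 0.05, 0.0]       # one dominant intent
recommendation = [0.25, 0.25, 0.25, 0.25]   # intents spread evenly

print(intent_entropy(search_query))
print(intent_entropy(recommendation))
```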
<p>How do we achieve these improvements? We need more data, larger scale, and better methods for reasoning about data. In particular, Vanja noted that the data we have today &#8212; searches, page views, connections, messages, purchases &#8212; represents the user&#8217;s thin observable state. In contrast, we lack data about the user&#8217;s internal state, e.g., whether the user is jet-lagged or worried about government debt. Vanja said that the only way to get more data is to motivate users by creating value for them with it &#8212; i.e., <a href="http://thenoisychannel.com/2011/04/14/social-utility-25/">social is give to get</a>.</p>
<p>Of course, we can&#8217;t talk about users&#8217; hidden data without thinking about privacy. Vanja asserts that privacy is not dead, but that it&#8217;s in hibernation. So far, he argued, we&#8217;ve managed with a model of industry self-governance with relatively minor impact from data leaks &#8212; specifically as compared to the offline world. But he is apprehensive at the prospect of a major privacy breach inducing legislation that sets back personalization efforts for decades.</p>
<p>Vanja then talked about current personalization methods, including <a href="http://en.wikipedia.org/wiki/Machine_learning">learning</a> relationships among features, <a href="http://en.wikipedia.org/wiki/Dimensionality_reduction">dimensionality reduction</a>, and <a href="http://en.wikipedia.org/wiki/Smoothing">smoothing</a> using external data. He argues that many of the models are mathematically very similar to one another, and it is difficult to analyze the relative merits of the models as opposed to other implementation details of the systems that use them.</p>
<p>Finally, Vanja touched on scale issues. He noted that the <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a> framework imposes significant restrictions on algorithms used for personalization, and that we need the right abstractions for modeling in parallel environments.</p>
<p>Vanja concluded his talk by citing the role of CIKM as a conference in bringing together the communities that research deep user understanding, information retrieval, and databases. Given the exciting <a href="http://www.gohawaii.com/maui">venue</a> for next year&#8217;s conference, I&#8217;m sure we&#8217;ll continue to see CIKM play this role!</p>
<p>ps. My thanks to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging his <a href="http://www.searchenginecaffe.com/2011/10/cikm-2011-industry-toward-deep.html">notes</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/27/cikm-2011-industry-event-vanja-josifovski-on-toward-deep-understanding-of-user-behavior-on-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Ed Chi on Model-Driven Research in Social Computing</title>
		<link>http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/</link>
		<comments>http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/#comments</comments>
		<pubDate>Fri, 25 Nov 2011 21:23:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3951</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. Given the extraordinary ascent of all things social in today&#8217;s online world, we could hardly neglect this theme at the CIKM 2011 Industry Event. We were lucky to have Ed Chi, who recently left the PARC [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10164910" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Given the extraordinary ascent of all things social in today&#8217;s online world, we could hardly neglect this theme at the CIKM 2011 Industry Event. We were lucky to have <a href="http://www-users.cs.umn.edu/~echi/">Ed Chi</a>, who recently left the <a href="http://www.parc.com/">PARC</a> Augmented Social Cognition Group to work on Google+, presenting &#8220;<a href="http://www.cikm2011.org/industryevent#ec">Model-Driven Research in Social Computing</a>&#8221;.</p>
<p>Ed warned us at the beginning of the talk that his focus would be on work he&#8217;d done prior to joining Google. Nonetheless, he offered an interesting collection of public statistics about social activity associated with Google properties: 360M words per day being published on Blogger, 150 years of YouTube video being watched every day on Facebook, and 40M+ people using Google+. Regardless of how Google has fared in the competition for social networking mindshare, Google is clearly no stranger to online social behavior.</p>
<p>Ed then dove into recent research that he and colleagues have done on Twitter activity. Since all of the papers he discussed are available online, I will only touch on highlights. I encourage you to read the full papers:</p>
<ul>
<li><a href="http://www-users.cs.umn.edu/~echi/papers/2010-socialcom/2010-06-25-retweetability-cameraready-v3.pdf">Want to be Retweeted? Large Scale Analytics on Factors Impacting Retweet in Twitter Network</a></li>
<li><a href="http://www.parc.com/content/attachments/tweets-from-justin.pdf">Tweets from Justin Bieber&#8217;s Heart: the Dynamics of the &#8220;Location&#8221; Field in User Profiles</a></li>
<li><a href="http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2813/3225">Is Twitter a Good Place for Asking Questions? A Characterization Study</a></li>
<li><a href="http://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2856/3250">Language Matters in Twitter: A Large Scale Study</a></li>
<li><a href="http://www-users.cs.umn.edu/~echi/papers/2010-UIST/eddi-uist2010.pdf">Eddi: Interactive Topic-based Browsing of Social Status Streams</a></li>
<li><a href="http://www-users.cs.umn.edu/~echi/papers/2010-CHI/Zerozero88-tweet-recommender-ASC-PARC.pdf">Short and Tweet: Experiments on Recommending Content from Information Streams</a></li>
<li><a href="http://www.grouplens.org/system/files/p217-chen.pdf">Speak Little and Well: Recommending Conversations in Online Social Streams</a></li>
</ul>
<p>Ed talked at some length about language-dependent behavior on Twitter. For example, tweets in French are more likely to contain URLs than those in English, while tweets in Japanese are less likely (perhaps because the language is more compact relative to Twitter&#8217;s 140-character limit?). Tweets in Korean are far more likely to be conversational (i.e., explicitly mentioning or replying to other users) than those in English. These differences remind us to be cautious in generalizing our understanding of online social behavior from the behavior of English-speaking users. Ed also talked about cross-language &#8220;brokers&#8221; who tweet in multiple languages: he sees these as indicating connection strength between languages, as well as giving us insight to improve cross-language communication.</p>
<p>Ed then talked about ways to reduce information overload in social streams. These included <a href="http://www-users.cs.umn.edu/~echi/papers/2010-UIST/eddi-uist2010.pdf">Eddi</a>, a tool for summarizing social streams, and <a href="https://twitter.com/#!/zerozero88">zerozero88</a>, a closed experiment to produce a personal newspaper from a tweet stream. In analyzing the results of the zerozero88 experiment, Ed and his colleagues found that the most successful recommendation strategy combined users&#8217; self-voting with social voting by their friends of friends. They also found that users wanted both relevance and serendipity &#8212; a challenge since the two criteria often compete with one another.</p>
<p>Ed concluded by offering the following design rule: since interaction costs determine the number of people who participate in social activity, get more people into the system by reducing interaction cost. He asserted that this is a key design principle for Google+.</p>
<p>My skepticism about Google&#8217;s social efforts is a matter of public record (cf. <a href="http://thenoisychannel.com/2011/04/14/social-utility-25/">Social Utility, +/- 25%</a>; <a href="http://thenoisychannel.com/2011/07/04/google%C2%B1/">Google±?</a>). But hiring Ed Chi was a real coup for Google, and I&#8217;m optimistic about what he&#8217;ll bring to the Google+ effort.</p>
<p>ps. My thanks to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging his <a href="http://www.searchenginecaffe.com/2011/10/cikm-2011-industry-model-driven.html">notes</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/25/cikm-2011-industry-event-ed-chi-on-model-driven-research-in-social-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: David Hawking on Search Problems and Solutions in Higher Education</title>
		<link>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/</link>
		<comments>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 01:45:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3946</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. One of the recurring themes at the CIKM 2011 Industry Event was that not all search is web search. Stephen Robertson, in advocating why recall matters, noted that web search was exceptional rather than typical as [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10280109?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>One of the recurring themes at the CIKM 2011 Industry Event was that not all search is web search. Stephen Robertson, in advocating <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">why recall matters</a>, noted that web search was exceptional rather than typical as an information retrieval domain. Khalid Al-Kofahi spoke about the challenges of <a href="http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/">legal search</a>. Focusing on a different vertical, <a href="http://www.funnelback.com/">Funnelback</a> Chief Scientist <a href="http://david-hawking.net/">David Hawking</a> spoke about &#8220;<a href="http://www.cikm2011.org/industryevent#dh">Search Problems and Solutions in Higher Education</a>&#8221;.</p>
<p>David spent most of the presentation focusing on work that Funnelback did for the <a href="http://www.anu.edu.au/">Australian National University</a>. Funnelback was originally developed by <a href="http://www.csiro.au/">CSIRO</a> and the ANU under the name <a href="http://homepages.cwi.nl/~arjen/wird04/presentations/irdbxml.pdf">Panoptic</a>.</p>
<p>The ANU has a substantial web presence, comprised of hundreds of sites and over a million pages. Like many large sites, it suffers from propagation delay: the most important pages are fresh, but material on the outposts can be stale. Moreover, there is broad diversity of authorship.</p>
<p>The university also has a strong editorial stance for ranking search results: the search engine needs to identify and favor official content. Given the proliferation of unofficial content, it can be a challenge to identify official sites based on signals like incoming link count, click counts, and the use of official style templates.</p>
<p>David described a particular application that Funnelback developed for ANU: a university course finder. The problem is similar to that of ecommerce search and calls for similar solutions, e.g., <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, auto-complete, and suggestions of related queries. And, just as in ecommerce, we can evaluate performance in terms of <a href="http://en.wikipedia.org/wiki/Conversion_rate">conversion rate</a>.</p>
<p>David ended his talk by touching on expertise finding (a problem I think about a lot as a <a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">LinkedIn data scientist</a>!) and showing demos. And, while I no longer work in <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise search</a> myself, I still appreciate its unique challenges. I&#8217;m glad that David and his colleagues are working to overcome those challenges, especially in a domain as important as education.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-david-hawking-on-search-problems-and-solutions-in-higher-education/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Ben Greene on Large Memory Computers for In-Memory Enterprise Applications</title>
		<link>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/</link>
		<comments>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/#comments</comments>
		<pubDate>Tue, 22 Nov 2011 17:39:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3941</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. Large-scale computation was, not surprisingly, a major theme at the CIKM 2011 Industry Event. Ben Greene, Director of SAP Research Belfast, delivered a presentation on &#8220;Large Memory Computers for In-Memory Enterprise Applications&#8220;. Ben started by defining in-memory [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10274812?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>Large-scale computation was, not surprisingly, a major theme at the CIKM 2011 Industry Event. <a href="http://www.linkedin.com/pub/ben-greene/1/93/785">Ben Greene</a>, Director of SAP Research Belfast, delivered a presentation on &#8220;<a href="http://www.cikm2011.org/industryevent#bg">Large Memory Computers for In-Memory Enterprise Applications</a>&#8221;.</p>
<p>Ben started by defining in-memory computing as &#8220;technology that allows the processing of massive quantities of real time data in the main memory of the server to provide immediate results from analyses and transactions&#8221;. He then asked whether the cloud enables real-time computing, since there is a clear market hunger for cloud computing to solve the problems of our current enterprise systems.</p>
<p>Not surprisingly, he advocated in-memory computing as the solution for those problems. Like <a href="http://www.stanford.edu/~ouster/">John Ousterhout</a> and the <a href="https://ramcloud.stanford.edu/">RAMCloud</a> team, he sees the need to scale <a href="http://en.wikipedia.org/wiki/Dynamic_random-access_memory">DRAM</a> memory independently from physical boxes. He proposed a model of coherent shared memory, using high-speed low-latency networks and separating the data transport and cache layers into a separate tier below the operating system. The goal: no server-side application caches, DRAM-like latency for physically distributed databases, and in fact no separation between the application server and the database server.</p>
<p>Ben argued that coherent shared memory can dramatically lower the cost of in-memory computing while minimizing the pain for application developers. He also offered some benchmarks for SAP&#8217;s <a href="http://ra.ziti.uni-heidelberg.de/coeht/pages/events/20110208/presentations/ike.pdf">BigIron</a> system to demonstrate the performance improvements.</p>
<p>In short, Ben offered a vision of in-memory computing as a reincarnation of the mainframe. It was an interesting and provocative presentation, and my only regret is that we couldn&#8217;t stage a debate between him and <a href="http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/">Jeff Hammerbacher</a> over the future of large-scale enterprise computing.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/22/cikm-2011-industry-event-ben-greene-on-large-memory-computers-for-in-memory-enterprise-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Chavdar Botev on Databus: A System for Timeline-Consistent Low-Latency Change Capture</title>
		<link>http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/</link>
		<comments>http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 03:23:12 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3934</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. I&#8217;m of course delighted that one of my colleagues at LinkedIn was able to participate in the CIKM 2011 Industry Event. Principal software engineer Chavdar Botev delivered a presentation on &#8220;Databus: A System for Timeline-Consistent Low-Latency Change Capture&#8220;. [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10244215?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>I&#8217;m of course delighted that one of my colleagues at LinkedIn was able to participate in the CIKM 2011 Industry Event. Principal software engineer <a href="http://www.linkedin.com/in/chavdarbotev">Chavdar Botev</a> delivered a presentation on &#8220;<a href="http://www.cikm2011.org/industryevent#cb">Databus: A System for Timeline-Consistent Low-Latency Change Capture</a>&#8221;.</p>
<p>LinkedIn processes a massive amount of member data and activity. It has over 135M members and is growing faster than two new members per second. Based on recent measurements, those members are on track to perform more than four billion searches on the LinkedIn platform in 2011. All of this activity requires a data change capture mechanism that allows external systems, such as its graph index and real-time full-text search index <a href="http://javasoze.github.com/zoie/">Zoie</a>, to act as subscribers in user space and stay up to date with constantly changing data in the primary stores.</p>
<p>LinkedIn has built the Databus system to meet these needs. Databus meets four key requirements: timeline consistency, guaranteed delivery, low latency, and user-space visibility. For example, edits to member profile fields, such as companies and job titles, need to be <a href="http://www.cs.umass.edu/~ronb/papers/kdd2011.pdf">standardized</a>. Also, in order to let recruiters act quickly on feedback to their job postings, we need to be able to propagate the changes to the job description in near-real-time.</p>
<p>Databus propagates data changes throughout LinkedIn&#8217;s architecture. When there is a change in a primary store (e.g., member profiles or connections), the changes are buffered in the Databus Relay through a push or pull interface. The relay can also capture the transactional semantics of updates. Clients poll for changes in the relay. If a client falls behind the stream of change events in the relay, it is redirected to a Bootstrap database that delivers a compressed delta of the changes since the last event seen by the client.</p>
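<p>The relay-plus-bootstrap flow described above can be sketched roughly as follows. This is a toy, in-memory illustration and not LinkedIn&#8217;s implementation: the class names, the integer sequence numbers, and the choice to compact the delta down to the latest value per key are all assumptions made for the example.</p>

```python
from collections import deque


class Relay:
    """Hypothetical relay that buffers only the most recent change events."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest events fall off the front
        self.next_seq = 1

    def append(self, key, value):
        """Buffer a change from the primary store and return the stored event."""
        event = (self.next_seq, key, value)
        self.buffer.append(event)
        self.next_seq += 1
        return event

    def poll(self, last_seen):
        """Return events newer than last_seen, or None if the client fell behind."""
        oldest = self.buffer[0][0] if self.buffer else self.next_seq
        if last_seen + 1 >= oldest:  # the next event the client needs is still buffered
            return [e for e in self.buffer if e[0] > last_seen]
        return None


class Bootstrap:
    """Hypothetical long-term store that can serve a compacted delta."""

    def __init__(self):
        self.log = []

    def record(self, event):
        self.log.append(event)

    def delta_since(self, last_seen):
        """Keep only the latest change per key among events after last_seen."""
        latest = {}
        for seq, key, value in self.log:
            if seq > last_seen:
                latest[key] = (seq, key, value)
        return sorted(latest.values())  # sequence numbers are unique, so this orders by time


def catch_up(relay, bootstrap, last_seen):
    """Client-side poll: use the relay when possible, else fall back to the bootstrap."""
    events = relay.poll(last_seen)
    if events is None:
        events = bootstrap.delta_since(last_seen)
    return events


relay = Relay(capacity=2)
bootstrap = Bootstrap()
changes = [("alice", "Engineer"), ("bob", "Recruiter"),
           ("alice", "Senior Engineer"), ("carol", "Analyst")]
for key, value in changes:
    bootstrap.record(relay.append(key, value))

recent = catch_up(relay, bootstrap, last_seen=3)  # served from the relay buffer
stale = catch_up(relay, bootstrap, last_seen=0)   # too far behind: served by the bootstrap
```

<p>In this sketch the stale client receives three events rather than four, because the first edit to the hypothetical key "alice" is superseded by a later one. That is the sense in which the delta is smaller than the full change stream.</p>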
<p>In contrast to generic message systems (including the <a href="http://incubator.apache.org/kafka/index.html">Kafka</a> system that LinkedIn has open-sourced through Apache), Databus has more insight into the structure of the messages and can thus do better than just guaranteeing message-level integrity and transactional semantics for communication sessions.</p>
<p>I tend to live a few levels above core infrastructure, but I&#8217;m grateful that Chavdar and his colleagues build the core platform that makes all of our large-scale data collection possible. After all, without data we have no <a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">data science</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/20/cikm-2011-industry-event-chavdar-botev-on-databus-a-system-for-timeline-consistent-low-latency-change-capture/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Khalid Al-Kofahi on Combining Advanced Search Technology and Human Expertise in Legal Research</title>
		<link>http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/</link>
		<comments>http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/#comments</comments>
		<pubDate>Sat, 19 Nov 2011 23:01:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3930</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The original program for the CIKM 2011 Industry Event featured Peter Jackson, who was chief scientist at Thomson Reuters and author of numerous books and papers on natural language processing. Sadly, Peter died on August 3,2011. Thomson Reuters R&#38;D VP of Research Khalid [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10236335?rel=0" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The original program for the CIKM 2011 Industry Event featured <a href="http://www.jacksonpeter.com/">Peter Jackson</a>, who was chief scientist at <a type="1" href="http://thomsonreuters.com" target="_blank">Thomson Reuters</a> and author of numerous books and papers on <a href="http://www.amazon.com/Natural-Language-Processing-Online-Applications/dp/9027249938/">natural language processing</a>. Sadly, Peter <a href="http://blog.thomsonreuters.com/index.php/remembering-our-colleague-peter-jackson/">died on August 3, 2011</a>. Thomson Reuters R&amp;D VP of Research <a href="http://www.linkedin.com/pub/khalid-al-kofahi/0/5a3/b7b">Khalid Al-Kofahi</a> graciously agreed to speak in his place, delivering a presentation on &#8220;<a href="http://www.cikm2011.org/industryevent#kak">Combining Advanced Search Technology and Human Expertise in Legal Research</a>&#8221;.</p>
<p>Khalid began by giving an &#8220;83-second&#8221; overview of the US legal system, laying out the roles of the law, the courts, and the legislature. He did so to provide the context for the domain that Thomson Reuters serves &#8212; namely, legal information. Legal information providers curate legal information, enhance it editorially and algorithmically, and work to make legal information findable and explainable in particular task contexts. He then worked through an example of how a case law document (specifically, <em><a href="http://en.wikipedia.org/wiki/Burger_King_v._Rudzewicz">Burger King v. Rudzewicz</a></em>) appears in <a href="http://store.westlaw.com/westlawnext/">WestlawNext</a>, with annotations that include headnotes, topic codes, citation data, and historical context.</p>
<p>Channelling <a href="http://www.slideshare.net/dtunkelang/google-tech-talk-reconsidering-relevance-presentation/13">William Goffman</a>, Khalid asserted that a document&#8217;s content (words, phrases, metadata) is not sufficient to determine its aboutness and importance. Rather, we also have to consider what other people say about the document and how they interact with it. This is especially true in the legal domain because of the <a href="http://en.wikipedia.org/wiki/Precedent">precedential</a> nature of law. He then framed legal search in terms of information retrieval metrics, stating the requirements as completeness (<a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a>), accuracy (<a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a>), and authority. Not surprisingly, Khalid agreed with <a href="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/">Stephen Robertson&#8217;s emphasis on the importance of recall</a>.</p>
<p>Speaking more generally, Khalid noted that <a href="http://en.wikipedia.org/wiki/Vertical_search">vertical search</a> is not just about search. Rather, it’s about findability, which includes navigation, recommendations, clustering, <a href="http://en.wikipedia.org/wiki/Faceted_classification">faceted classification</a>, collaboration, etc. Most importantly, it&#8217;s about satisfying a set of well-understood tasks. And, particularly in the legal domain, customers demand explainable models. Beyond this demand, explainability serves an additional purpose: it enables the human searcher to add value to the process (cf. <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a>).</p>
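<p>For readers who want the two metrics spelled out, here is a minimal sketch of the standard set-based definitions. The function name and the document identifiers are made up for illustration, and nothing here reflects how Thomson Reuters actually evaluates its systems.</p>

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved.intersection(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0  # share of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0       # share of relevant documents found
    return precision, recall


# Hypothetical query: four documents returned, three relevant overall, two in common.
retrieved = ["case_a", "case_b", "case_c", "case_d"]
relevant = ["case_a", "case_c", "case_e"]
precision, recall = precision_recall(retrieved, relevant)
```

<p>Here precision is 2/4 and recall is 2/3: a searcher who stops at the result list never sees the hypothetical "case_e", which is exactly the kind of miss that matters when precedent is at stake.</p>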
<p>It is sad to lose a great researcher like Peter Jackson from our ranks, but I am grateful that Khalid was able to honor his memory by presenting their joint work at CIKM. If you&#8217;d like to learn more, I encourage you to read the publications on the <a href="http://labs.thomsonreuters.com/">Thomson Reuters Labs</a> page.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/19/cikm-2011-industry-event-khalid-al-kofahi-on-combining-advanced-search-technology-and-human-expertise-in-legal-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Jeff Hammerbacher on Experiences Evolving a New Analytical Platform</title>
		<link>http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/</link>
		<comments>http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 16:51:04 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3925</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The third speaker in the program was Cloudera co-founder and Chief Scientist Jeff Hammerbacher. Jeff, recently hailed by Tim O&#8217;Reilly as one of the world&#8217;s most powerful data scientists, built the Facebook Data Team, which is most [...]]]></description>
			<content:encoded><![CDATA[<p><object id="__sse10188294" width="425" height="355" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="wmode" value="transparent" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20111027cikm-111116104111-phpapp02&amp;rel=0&amp;stripped_title=jeff-10188294&amp;userName=dtunkelang" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse10188294" width="425" height="355" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=20111027cikm-111116104111-phpapp02&amp;rel=0&amp;stripped_title=jeff-10188294&amp;userName=dtunkelang" allowFullScreen="true" allowScriptAccess="always" wmode="transparent" allowscriptaccess="always" allowfullscreen="true" /> </object></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The third speaker in the program was <a href="http://www.cloudera.com/">Cloudera</a> co-founder and Chief Scientist <a href="http://www.linkedin.com/in/jhammerb">Jeff Hammerbacher</a>. Jeff, recently hailed by <a href="http://tim.oreilly.com/">Tim O&#8217;Reilly</a> as one of the <a href="http://www.forbes.com/pictures/lmm45emkh/2-jeff-hammerbacher-chief-scientist-cloudera-and-dj-patil-entrepreneur-in-residence-greylock-ventures">world&#8217;s most powerful data scientists</a>, built the <a href="http://www.facebook.com/data">Facebook Data Team</a>, which is best known for open-source contributions that include <a href="http://en.wikipedia.org/wiki/Apache_Hive">Hive</a> and <a href="http://en.wikipedia.org/wiki/Apache_Cassandra">Cassandra</a>. Jeff&#8217;s talk was entitled “<a href="http://www.cikm2011.org/industryevent#jh">Experiences Evolving a New Analytical Platform: What Works and What&#8217;s Missing</a>”. I am thankful to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging a <a href="http://www.searchenginecaffe.com/2011/10/cikm-industry-talk-jeff-hammerbacher-on.html">summary</a>.</p>
<p>Jeff&#8217;s talk was a whirlwind tour through the philosophy and technology for delivering large-scale analytics (aka &#8220;big data&#8221;) to the world:</p>
<p>1) Philosophy</p>
<p>The true challenges in the task of data mining are creating a data set with the relevant and accurate information and determining the appropriate analysis techniques. While in the past it made sense to plan data storage and structure around the intended use of the data, the economics of storage and the availability of open-source analytics platforms argue for the reverse: data first, ask questions later; store first, establish structure later. The goal is to enable everyone &#8212; developers, analysts, business users &#8212; to &#8220;party on the data&#8221;, providing infrastructure that keeps them from clobbering one another or starving each other of resources.</p>
<p>2) Defining the Platform</p>
<p>No one just uses a relational database anymore. For example, consider <a href="http://en.wikipedia.org/wiki/Microsoft_SQL_Server">Microsoft SQL Server</a>. It is actually part of a unified suite that includes <a href="http://en.wikipedia.org/wiki/Microsoft_SharePoint">SharePoint</a> for collaboration, <a href="http://en.wikipedia.org/wiki/PowerPivot">PowerPivot</a> for <a href="http://en.wikipedia.org/wiki/Online_analytical_processing">OLAP</a>, <a href="http://msdn.microsoft.com/en-us/library/ee362541.aspx">StreamInsight</a> for <a href="http://en.wikipedia.org/wiki/Complex_event_processing">complex event processing</a> (CEP), etc. As with the <a href="http://en.wikipedia.org/wiki/LAMP_(software_bundle)">LAMP</a> stack, there is a coherent framework for analytical data management, which we can call an analytical data platform.</p>
<p>3) Cloudera&#8217;s Platform</p>
<p>Cloudera starts with a substrate architecture of <a href="http://opencompute.org/">Open Compute</a> commodity Linux servers configured using <a href="http://projects.puppetlabs.com/projects/puppet">Puppet</a> and <a href="http://www.opscode.com/chef/">Chef</a> and coordinated using <a href="http://zookeeper.apache.org/">ZooKeeper</a>. Naturally, this entire stack is open-source. They use <a href="http://hadoop.apache.org/hdfs/">HDFS</a> and <a href="http://ceph.newdream.net/">Ceph</a> to provide distributed, <a href="http://en.wikipedia.org/wiki/NoSQL">schema-less</a> storage. They offer append-only table storage and metadata using <a href="http://avro.apache.org/">Avro</a>, <a href="http://hive.apache.org/docs/r0.4.0/api/org/apache/hadoop/hive/ql/io/RCFile.html">RCFile</a>, and <a href="http://incubator.apache.org/hcatalog/">HCatalog</a>; and mutable table storage and metadata using <a href="http://hbase.apache.org/">HBase</a>. For computation, they offer <a href="http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html">YARN</a> (inter-job scheduling, like <a href="http://gridengine.org/blog/">Grid Engine</a>, for data-intensive computing) and <a href="http://www.mesosproject.org/">Mesos</a> for cluster resource management; <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>, <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2911">Hamster</a> (<a href="http://www.open-mpi.org/">MPI</a>), <a href="http://www.spark-project.org/">Spark</a>, <a href="http://research.microsoft.com/en-us/projects/dryad/">Dryad</a> / <a href="http://research.microsoft.com/en-us/projects/dryadLINQ/">DryadLINQ</a>, <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Pregel</a> (<a href="http://incubator.apache.org/giraph/">Giraph</a>), and <a href="http://research.google.com/pubs/pub36632.html">Dremel</a> as processing frameworks; and <a href="https://github.com/cloudera/crunch#readme">Crunch</a> (like Google&#8217;s <a href="http://dl.acm.org/citation.cfm?id=1806596.1806638">FlumeJava</a>), <a href="http://pig.apache.org/">PigLatin</a>, <a href="http://hive.apache.org/">HiveQL</a>, and <a href="http://yahoo.github.com/oozie/">Oozie</a> as high-level interfaces. Finally, Cloudera offers tool access through <a href="http://fuse.sourceforge.net/">FUSE</a>, <a href="http://en.wikipedia.org/wiki/Java_Database_Connectivity">JDBC</a>, and <a href="http://en.wikipedia.org/wiki/Open_Database_Connectivity">ODBC</a>; and data ingest through <a href="http://www.cloudera.com/blog/2009/06/introducing-sqoop/">Sqoop</a> and <a href="https://cwiki.apache.org/FLUME/">Flume</a>.</p>
<p>4) What&#8217;s Next?</p>
<p>For the substrate, we can expect support for fat servers with fat pipes, operating system support for isolation, and improved local filesystems (e.g., <a href="http://en.wikipedia.org/wiki/Btrfs">btrfs</a>). Storage improvements will give us a unified file format, compression, better performance and availability, richer metadata, distributed snapshots, replication across data centers, native client access, and separation of namespace and block management. We will see stabilization of our existing compute tools and better variety, as well as improved fault tolerance, isolation and workload management, low-latency job scheduling, and a unified execution backend for workflow. And we will see better integration through REST API access to all platform components, better document ingest, maintenance of source catalog and provenance information, and integration with analytics tools that goes beyond ODBC. We will also see tools that facilitate the transition from unstructured to structured data (e.g., <a href="http://cloudera.github.com/RecordBreaker/">RecordBreaker</a>).</p>
<p>Jeff&#8217;s talk was as information-dense as this post suggests, and I hope the mostly-academic CIKM audience was not too shell-shocked. It&#8217;s fantastic to see practitioners not only building essential tools for research in information and knowledge management, but reaching out to the research community to build bridges. I saw lots of intense conversation after his talk, and I hope the results realize the two-fold mission of the Industry Event, which is to give  researchers an opportunity to learn about the problems most relevant to industry practitioners, and to offer practitioners an opportunity to deepen their understanding of the field in which they are working.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/16/cikm-2011-industry-event-jeff-hammerbacher-on-experiences-evolving-a-new-analytical-platform/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: John Giannandrea on Freebase &#8211; A Rosetta Stone for Entities</title>
		<link>http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/</link>
		<comments>http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/#comments</comments>
		<pubDate>Tue, 15 Nov 2011 08:26:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3916</guid>
		<description><![CDATA[This post is part of a series summarizing the presentations at the CIKM 2011 Industry Event, which I chaired with former Endeca colleague Tony Russell-Rose. The second speaker in the program was Metaweb co-founder John Giannandrea. Google acquired Metaweb last year and has kept its promise to maintain Freebase as a free and open database for the world (including for rival [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cikm2011.org/industryevent#jg"><img class="alignnone" title="Freebase - A Rosetta Stone for Entities" src="http://upload.wikimedia.org/wikipedia/en/8/86/Freebase-logo.png" alt="" width="129" height="20" /></a></p>
<p>This post is part of a series summarizing the presentations at the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which I chaired with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>.</p>
<p>The second speaker in the program was <a href="http://en.wikipedia.org/wiki/Metaweb">Metaweb</a> co-founder <a href="http://24.about.apps.freebase.dev.freebaseapps.com/team">John Giannandrea</a>. Google <a href="http://googleblog.blogspot.com/2010/07/deeper-understanding-with-metaweb.html">acquired Metaweb</a> last year and has kept its promise to maintain <a href="http://en.wikipedia.org/wiki/Freebase">Freebase</a> as a free and open database for the world (including for rival search engine <a href="http://blog.freebase.com/2009/07/13/bing-structured-search-results-powered-by-freebase/">Bing</a> &#8211; though I&#8217;m not sure if Bing is still using Freebase). John&#8217;s talk was entitled &#8220;<a href="http://www.cikm2011.org/industryevent#jg">Freebase &#8211; A Rosetta Stone for Entities</a>&#8221;. I am thankful to <a href="http://www.searchenginecaffe.com/">Jeff Dalton</a> for live-blogging a <a href="http://www.searchenginecaffe.com/2011/10/cikm-2011-industry-freebase-rosetta.html">summary</a>.</p>
<p>John started by introducing Freebase as a representation of structured objects corresponding to real-world entities and connected by a directed graph of relationships. In other words, a <a href="http://en.wikipedia.org/wiki/Semantic_Web">semantic web</a>. While it isn&#8217;t quite web-scale, Freebase is a large and growing knowledge base consisting of 25 million entities and 500 million connections &#8212; and doubling annually. The core concept in Freebase is a type, and an entity can have many types. For example, <a href="http://www.freebase.com/view/en/arnold_schwarzenegger">Arnold Schwarzenegger</a> is a <a href="http://www.freebase.com/view/en/politician">politician</a> and an <a href="http://www.freebase.com/view/en/actor">actor</a>. John emphasized the messiness of the real world. For example, most actors are people, but what about the <a href="http://www.freebase.com/view/m/05tf4t">dog</a> who played <a href="http://www.freebase.com/view/en/lassie">Lassie</a>? It&#8217;s important to support exceptions.</p>
<p>The main technical challenge for Freebase is <a href="http://wiki.freebase.com/wiki/Reconciliation">reconciliation</a> &#8212; that is, determining how similar a set of data is to existing Freebase topics. John pointed out how critical it is for Freebase to avoid duplication of content, since the utility of Freebase depends on unique nodes in its graph corresponding to unique objects in the world. Freebase obtains many of its entities by reconciling large, open-source knowledge bases &#8212; including Wikipedia, <a href="http://wordnet.princeton.edu/">WordNet</a>, <a href="http://authorities.loc.gov/">Library of Congress Authorities</a>,  and metadata from the <a href="http://blog.freebase.com/2010/06/11/stanford-university-library-catalog/">Stanford Library</a>. Freebase uses a variety of tools to implement reconciliation, including <a href="http://code.google.com/p/google-refine/?redir=1">Google Refine</a> (formerly known as Freebase Gridworks) and <a href="http://wiki.freebase.com/wiki/Matchmaker">Matchmaker</a>, a tool for gathering human judgments. While reconciliation is a hard technical problem, it is made possible by making inferences across the web of relationships that link entities to one another.</p>
<p>John then presented Freebase as a <a href="http://en.wikipedia.org/wiki/Rosetta_Stone">Rosetta Stone</a> for entities on the web. Since an entity is simply a collection of keys (one of which is its name), Freebase&#8217;s job is to reverse engineer the key-value store that is distributed among the entity&#8217;s web references, e.g., the structured databases backing web sites and encoding keys in URL parameters. He noted that Freebase itself is schema-less (it is a <a href="http://en.wikipedia.org/wiki/Graph_database">graph database</a>), and that even the concept of a <a href="http://www.freebase.com/view/type/type">type</a> is itself an entity (&#8220;Type type is the only type that is an instance of itself&#8221;). Google makes Freebase available through an <a href="http://wiki.freebase.com/wiki/New_Freebase_API">API</a> and the Metaweb Query Language (<a href="http://wiki.freebase.com/wiki/MQL">MQL</a>).</p>
<p>Freebase does have its challenges. The requirement to keep out duplicates is an onerous one, as they discovered when importing a portion of the <a href="http://openlibrary.org/">Open Library</a> catalog. Maintaining quality calls for significant manual curation, and quality varies across the knowledge base. John asserted that Freebase provides 99% accuracy at the 95th percentile, though it&#8217;s not clear to me what that means <em>(update: see Bill&#8217;s <a href="http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/comment-page-1/#comment-10650">comment</a> below)</em>.</p>
<p>While I still have concerns about Freebase&#8217;s robustness as a structured knowledge base (see my post on &#8220;<a href="http://thenoisychannel.com/2011/05/15/in-search-of-structure/">In Search Of Structure</a>&#8221;), I&#8217;m excited to see Google investing in structured representations of knowledge. To hear more about Google&#8217;s efforts in this space, check out the Strata New York panel I moderated on <a href="http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/">Entities, Relationships, and Semantics</a> &#8212; the panelists included <a href="http://secondthought.org/">Andrew Hogue</a>, who leads Google&#8217;s structured data and information extraction group and managed me during my year at Google New York.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/15/cikm-2011-industry-event-john-giannandrea-on-freebase-a-rosetta-stone-for-entities/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event: Stephen Robertson on Why Recall Matters</title>
		<link>http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/</link>
		<comments>http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 16:02:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3903</guid>
		<description><![CDATA[On October 27th, I had the pleasure to chair the CIKM 2011 Industry Event with former Endeca colleague Tony Russell-Rose. It is my pleasure to report that the program, held in parallel with the main conference sessions, was a resounding success. Since not everyone was able to make it to Glasgow for this event, I&#8217;ll [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/10155675" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="425" height="355"></iframe></p>
<div style="padding: 5px 0 12px;">
<p>On October 27th, I had the pleasure to chair the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a> with former <a href="http://www.endeca.com/">Endeca</a> colleague <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>. It is my pleasure to report that the program, held in parallel with the main conference sessions, was a resounding success. Since not everyone was able to make it to Glasgow for this event, I&#8217;ll use this and subsequent posts to summarize the presentations and offer commentary. I&#8217;ll also share any slides that presenters made available to me.</p>
<p>Microsoft researcher <a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a>, who may well be the world&#8217;s preeminent living researcher in the area of <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a>, opened the program with a talk on &#8220;<a href="http://www.cikm2011.org/industryevent#ser">Why Recall Matters</a>&#8221;. For the record, I didn&#8217;t put him up to this, despite my <a href="http://thenoisychannel.com/2009/07/17/in-defense-of-recall/">strong opinions</a> on the subject.</p>
<p>Stephen started by reminding us of ancient times (i.e., before the web), when at least some IR researchers thought in terms of <a href="http://thenoisychannel.com/2008/08/24/set-retrieval-vs-ranked-retrieval/">set retrieval</a> rather than ranked retrieval. He reminded us of the precision and recall &#8220;devices&#8221; that he&#8217;d described in his <a href="http://www.soi.city.ac.uk/~ser/papers/salton_lecture_web.pdf">Salton Award Lecture</a> &#8212; an idea he attributed to the late <a href="http://www.iva.dk/bh/core%20concepts%20in%20lis/articles%20a-z/cranfield_experiments.htm">Cranfield</a> pioneer <a href="http://en.wikipedia.org/wiki/Cyril_Cleverdon">Cyril Cleverdon</a>. He noted that, while set retrieval uses distinct precision and recall devices, ranking conflates both into a single decision about where to truncate a ranked result list. He also pointed out an interesting asymmetry in the conventional notion of the <a href="http://en.wikipedia.org/wiki/Precision_and_recall">precision-recall</a> tradeoff: while returning more results can only increase recall, there is no certainty that the additional results will decrease precision. Rather, this decrease is a hypothesis that we associate with systems designed to implement the <a href="http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/jdRobertson1977.pdf">probability ranking principle</a>, returning results in decreasing order of probability of relevance.</p>
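<p>As a minimal sketch of that asymmetry (an illustration only, not something from Stephen&#8217;s talk; the relevance judgments below are made up), the following Python computes precision and recall at each truncation point of a ranked list. Recall never decreases as the list grows, while precision can move in either direction.</p>

```python
def precision_recall_at_cutoffs(ranked_relevance, total_relevant):
    """Return (precision, recall) pairs for each cutoff k = 1..n.

    ranked_relevance: list of booleans, True if the result at that rank is relevant.
    total_relevant: number of relevant documents in the whole corpus.
    """
    pairs = []
    hits = 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        pairs.append((hits / k, hits / total_relevant))
    return pairs


# Hypothetical judgments for a five-result ranking; the corpus holds 3 relevant documents.
ranking = [False, True, True, False, True]
curve = precision_recall_at_cutoffs(ranking, total_relevant=3)
for k, (precision, recall) in enumerate(curve, start=1):
    print(f"k={k} precision={precision:.2f} recall={recall:.2f}")
```

<p>In this made-up ranking, precision rises over the first three results, dips at the fourth, and rises again at the fifth, whereas recall only ever climbs.</p>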
<p>He went on to remind us that there is information retrieval beyond web search. He hauled out the usual examples of recall-oriented tasks: <a href="http://www.haxel.com/icic/archive/2009/programme/oct20/dummy-presentation/at_download/attachfile">e-discovery</a>, <a href="http://hcir.info/hcir-2011/challenge">prior art search</a>, and <a href="http://en.wikipedia.org/wiki/Evidence-based_medicine">evidence-based medicine</a>. But he then made the case that not only is the web not the only problem in information retrieval, but that &#8220;it&#8217;s the web that&#8217;s strange&#8221; relative to the rest of the information retrieval landscape in so strongly favoring precision over recall. He enumerated some of the peculiarities of the web, including its size (there&#8217;s only one web!), the extreme variation in authorship and quality, the lack of any content standardization (efforts like <a href="http://schema.org/">schema.org</a> notwithstanding), and the advertising-based monetization model that creates an unusual and sometimes adversarial relationship between content owners and search engines. In particular, he cited <a href="http://en.wikipedia.org/wiki/Enterprise_search">enterprise search</a> as an information retrieval domain that violates the assumptions of web search and calls for more emphasis on recall.</p>
<p>Stephen suggested that, rather than thinking in terms of the precision-recall curve, we consider the recall-fallout curve. <a href="http://en.wikipedia.org/wiki/Information_retrieval#Fall-Out">Fallout</a> is a relatively unknown measure that represents the probability that a non-relevant document is retrieved by the query. He noted that fallout offered little practical use in IR, given that the corpus is populated almost entirely by non-relevant documents. Still, he made the case that the recall-fallout trade-off might be more conceptually appropriate than the precision-recall curve in order to understand the value of recall.</p>
<p>In particular, we can generalize the traditional inverse precision-recall relationship to the hypothesis that the recall-fallout curve is convex (details in &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.138.2609&amp;rep=rep1&amp;type=pdf">On score distributions and relevance</a>&#8221;). We can then calculate instantaneous precision at any point in the result list as the gradient of the recall-fallout curve. Going back to the notion of devices, we can now replace precision devices with fallout devices.</p>
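<p>To make the gradient relationship concrete, here is a rough sketch (an illustration under stated assumptions, not taken from the talk or the paper) using hypothetical corpus sizes. If a stretch of the ranked list has local precision p, then each retrieved result adds p/R to recall and (1-p)/N to fallout, where R and N are the numbers of relevant and non-relevant documents in the corpus. The slope of the recall-fallout curve is therefore s = pN / ((1-p)R), which can be inverted to recover p = sR / (N + sR).</p>

```python
def slope_from_precision(p, num_relevant, num_nonrelevant):
    """Slope of the recall-fallout curve where local precision is p (requires p != 1)."""
    recall_gain = p / num_relevant            # recall added per retrieved result
    fallout_gain = (1 - p) / num_nonrelevant  # fallout added per retrieved result
    return recall_gain / fallout_gain


def precision_from_slope(slope, num_relevant, num_nonrelevant):
    """Invert the relationship: local precision implied by a given slope."""
    return slope * num_relevant / (num_nonrelevant + slope * num_relevant)


# Hypothetical corpus: 100 relevant documents among 1,000,000.
R, N = 100, 999_900
for p in (0.9, 0.5, 0.1):
    s = slope_from_precision(p, R, N)
    recovered = precision_from_slope(s, R, N)
    print(f"precision={p:.1f} slope={s:.1f} recovered={recovered:.3f}")
```

<p>Higher local precision means a steeper curve, so the gradient at any point carries the same information as the instantaneous precision once R and N are fixed.</p>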
<p>Stephen wrapped up his talk by emphasizing the user of information retrieval systems &#8212; an aspect of IR that is too often neglected outside <a href="http://hcir.info/">HCIR</a> circles. He advocated that systems provide users with evidence of recall, guidance on how far to go down the ranked results, and a prediction of the recall at any given stopping point.</p>
<p>It was an extraordinary privilege to have Stephen Robertson present at the CIKM Industry Event, and even better to have him make a full-throated argument in favor of recall. I can only hope that researchers and practitioners take him up on it.</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/14/cikm-2011-industry-event-stephen-robertson-on-why-recall-matters/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Entities, Relationships, and Semantics: Strata NY Panel on the State of Structured Search</title>
		<link>http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/</link>
		<comments>http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/#comments</comments>
		<pubDate>Sun, 06 Nov 2011 03:56:45 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3896</guid>
		<description><![CDATA[Earlier this year, I had the privilege to moderate a panel at Strata New York 2011 on Entities, Relationships, and Semantics: the State of Structured Search. The four panelists are people I&#8217;ve had the pleasure to work with over the years: Andrew Hogue (Google), Breck Baldwin (Alias-i), Evan Sandhaus (New York Times), Wlodek Zadrozny (IBM Research). They work on some of the world’s largest [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.youtube.com/embed/vr1blOJxXfQ" frameborder="0" width="500" height="281"></iframe></p>
<p>Earlier this year, I had the privilege to moderate a panel at <a href="http://strataconf.com/stratany2011">Strata New York 2011</a> on <a href="http://strataconf.com/stratany2011/public/schedule/detail/21413">Entities, Relationships, and Semantics: the State of Structured Search</a>. The four panelists are people I&#8217;ve had the pleasure to work with over the years: <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124000">Andrew Hogue</a> (Google), <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124001">Breck Baldwin</a> (Alias-i), <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124002">Evan Sandhaus</a> (New York Times), <a href="http://strataconf.com/stratany2011/public/schedule/speaker/124003">Wlodek Zadrozny</a> (IBM Research). They work on some of the world’s largest structured search problems &#8212; from offering users structured search on Google’s web corpus to building a computing system that defeated Jeopardy! champions in an extreme test of natural language understanding.</p>
<p>O&#8217;Reilly has compiled the nearly 50 hours of video from the conference and made the collection available for <a href="http://shop.oreilly.com/product/0636920022985.do">purchase</a>. I was lucky to attend all of the keynotes and many of the breakout sessions, and I highly recommend them. In the meantime, you can see a recording of the panel I moderated.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/05/entities-relationships-and-semantics-strata-ny-panel-on-the-state-of-structured-search/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Interview in Forbes: What is a Data Scientist?</title>
		<link>http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/</link>
		<comments>http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/#comments</comments>
		<pubDate>Wed, 02 Nov 2011 06:24:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3884</guid>
		<description><![CDATA[Dan Woods has been interviewing a variety of folks to answer the question: &#8220;What is a data scientist?&#8221;, and I had the honor to participate in his series. Here is a teaser of my interview: Above all, a data scientist needs to be able to derive robust conclusions from data. But a data scientist also [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/"><img class="alignnone" title="The Data Science Venn Diagram (Drew Conway)" src="http://www.drewconway.com/zia/wp-content/uploads/2010/09/Data_Science_VD.png" alt="" width="407" height="388" /></a></p>
<p><a href="http://blogs.forbes.com/people/danwoods/">Dan Woods</a> has been interviewing a variety of folks to answer the question: &#8220;<a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">What is a data scientist?</a>&#8221;, and I had the honor to participate in his series.</p>
<p>Here is a teaser of my interview:</p>
<blockquote><p>Above all, a data scientist needs to be able to derive robust conclusions from data. But a data scientist also needs to possess creativity and strong communication skills. Creativity drives the process of hypothesis generation, i.e., picking the right problems to solve that will create value for users and drive business decisions.</p></blockquote>
<p>Read the rest on <a href="http://www.forbes.com/sites/danwoods/2011/10/24/linkedins-daniel-tunkelang-on-what-is-a-data-scientist/">Forbes.com</a>. And thanks to Drew Conway for the awesome <a href="http://www.drewconway.com/zia/?p=2378">data science Venn diagram</a> above.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/11/01/interview-in-forbes-what-is-a-data-scientist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RecSys 2011 Tutorial: Recommendations as a Conversation with the User</title>
		<link>http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/</link>
		<comments>http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/#comments</comments>
		<pubDate>Mon, 31 Oct 2011 23:39:21 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3880</guid>
		<description><![CDATA[&#160; Last week, I had the privilege to present a tutorial at the 5th ACM International Conference on Recommender Systems (RecSys 2011). Given my passion for HCIR and my advocacy for transparency in recommender systems, it shouldn&#8217;t surprise regular readers that I focused on both. Unfortunately the tutorial was not recorded, but I hope the [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://www.slideshare.net/slideshow/embed_code/9967220?rel=0" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe><br />
&nbsp;</p>
<p>Last week, I had the privilege to present a tutorial at the 5th ACM International Conference on Recommender Systems (<a href="http://recsys.acm.org/2011/index.shtml">RecSys 2011</a>). Given my passion for <a href="http://hcir.info/">HCIR</a> and my advocacy for <a href="http://thenoisychannel.com/2008/11/21/the-napoleon-dynamite-problem/">transparency in recommender systems</a>, it shouldn&#8217;t surprise regular readers that I focused on both. Unfortunately the tutorial was not recorded, but I hope the slides above prove useful. I also encourage you to take a look at the other <a href="http://recsys.acm.org/2011/tutorials.shtml">tutorials</a>, whose slides are posted on the conference site.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/10/31/recsys-2011-tutorial-recommendations-as-a-conversation-with-the-user/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>HCIR 2011: We Have Arrived!</title>
		<link>http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/</link>
		<comments>http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/#comments</comments>
		<pubDate>Fri, 21 Oct 2011 09:08:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3873</guid>
		<description><![CDATA[If you followed the #hcir2011 tweet stream, then you already know what I have to say: the Fifth Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011) was an extraordinary success. We had about 100 people attending, 14 paper presentations, 28 posters, and 4 challenge entries, all packed into one intense day at Google&#8217;s beautiful [...]]]></description>
			<content:encoded><![CDATA[<p>If you followed the <a href="http://twitter.com/#!/search/%23hcir2011">#hcir2011</a> tweet stream, then you already know what I have to say: the Fifth Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://hcir.info/hcir-2011">HCIR 2011</a>) was an extraordinary success. We had about 100 people attending, 14 <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/schedule/presentations">paper presentations</a>, 28 <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/posters">posters</a>, and 4 <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/challenge">challenge</a> entries, all packed into one intense day at Google&#8217;s beautiful Mountain View headquarters.</p>
<p>Wednesday evening before the workshop, we were treated to a welcome reception, the first of a few meals provided by Google&#8217;s excellent chefs. It was a great opportunity to reconnect with old friends and meet many first-time HCIR attendees.</p>
<p>Thursday started with a scrumptious breakfast that included chilaquiles, coconut fritters, and bacon. Last year&#8217;s <a href="https://sites.google.com/site/hcirworkshop/hcir-2010/keynote">keynote</a> speaker and this year&#8217;s local host <a href="https://sites.google.com/site/dmrussell/">Dan Russell</a> pulled out all the stops &#8212; apparently <a href="http://www.yelp.com/biz/bigtable-cafe-mountain-view">BigTable</a> is the only Google cafe that serves bacon for breakfast! We then proceeded to a poster boaster session in which each poster presenter had a minute to pitch his or her poster. This session set the tone for the rest of the workshop: concentrated ideas and intense audience engagement.</p>
<p>Then came this year&#8217;s keynote, <a href="http://ils.unc.edu/~march/">Gary Marchionini</a>. It was a particular treat to have Gary as a keynote, since his lecture on &#8220;<a href="http://www.asis.org/Bulletin/Jun-06/marchionini.html">Toward Human-Computer Information Retrieval</a>&#8221; inspired me to conceive the HCIR workshop back in 2007. And Gary delivered the goods. He started with a review of the history of HCIR, including some lesser-known figures like <a href="http://www.linkedin.com/pub/donald-hawkins/10/a59/77">Don Hawkins</a> (who was in the audience), <a href="http://www.ideals.illinois.edu/handle/2142/14100">Pauline Cochrane</a>, <a href="http://stuff.mit.edu/people/rmarcus/home.html">Richard Marcus</a>, and <a href="http://www3.fis.utoronto.ca/faculty/meadow/">Charles Meadow</a>. He brought a few chuckles by citing <a href="http://comminfo.rutgers.edu/~belkin/belkin.html">Nick Belkin</a> (who was present) and <a href="http://research.microsoft.com/en-us/um/people/sdumais/">Sue Dumais</a> (who was not) as the father and mother of HCIR. Naturally he described some of his own work at the University of North Carolina, including the <a href="http://www.open-video.org/">Open Video</a>, <a href="http://ils.unc.edu/relationbrowser/">Relation Browser</a>, and <a href="http://ils.unc.edu/resultsspace/">ResultsSpace</a> projects. But the highlight of his talk was a graph he presented showing two paths to the same user end-state, one of the paths being a smooth progression and the other being a roller-coaster of ups and downs. The question of which one was better drew a wide variety of responses, my favorite being <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a> observing that learning is the friction of the information-seeking process.</p>
<p>We broke for coffee and then came back to the first session of paper presentations. <a href="http://www.athenikos.com/">Sofia Athenikos</a> presented a semantic search engine that outperformed IMDB in a user study. <a href="http://comminfo.rutgers.edu/directory/changl/index.html">Chang Liu</a> explored the effect of task difficulty and domain knowledge on dwell times, finding counterintuitive results (at least for me) regarding the correlation of expertise to dwell time. <a href="https://sites.google.com/site/jliujingjing/">Jingjing Liu</a> presented research on knowledge examination in multi-session tasks. Then came the lightning talks: <a href="http://www.mansci.uwaterloo.ca/~msmucker/">Mark Smucker</a> on how users examine and process ranked document lists; <a href="http://www.cs.umass.edu/~jykim/">Jin Kim</a> on simulating associative browsing; <a href="http://faculty.cua.edu/kules/">Bill Kules</a> on visualizing the stages of exploratory search; and <a href="http://comminfo.rutgers.edu/directory/mjcole/index.html">Michael Cole</a> on user domain knowledge and eye movement patterns during search. Way too much goodness to summarize here &#8212; I suggest you read the full papers on the <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/schedule/presentations">workshop site</a>.</p>
<p>Then came lunch &#8212; again in BigTable, but this time with outdoor seating &#8212; and the poster session. As always, this is the most interactive part of the day: two hours of non-stop discussion that start over food and end with prying people away from discussions about posters. I was especially proud of LinkedIn&#8217;s contributions to the <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/posters">poster session</a>, which covered <a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NmIxNzEzZjE3ZTVhZTAyYw&amp;pli=1">faceted search log analysis</a>, <a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjEzNDNjZTk5NGYyYWQwOA">social navigation</a>, and <a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MWVlMGNhZWY5NTA3MzQ2ZA">whether it is time to abandon abandonment</a>.</p>
<p>Then back to the second session of  paper presentations. <a href="http://faculty.arts.ubc.ca/lfreund/">Luanne Freund</a> talked about document usefulness and genre, finding that genre, besides being hard for users to reliably identify, only matters for tasks that involve doing, deciding, learning; but not for those that involve fact finding or problem solving. <a href="http://www.fxpal.com/?p=gene">Gene Golovchinsky</a> presented work on designing for collaboration in information seeking, previewing the system he used for his challenge entry.  <a href="http://www.medelyan.com/">Alyona Medelyan</a> used the <a href="http://www.pingar.com/">Pingar</a> search engine to evaluate how search interface features affect performance on biosciences tasks. Then more lightning talks: <a href="http://www.ils.unc.edu/~rcapra/">Rob Capra</a> analyzing faceted search on mobile devices; <a href="http://www.linkedin.com/pub/keith-bagley/0/657/124">Keith Bagley</a> on conceptual mile markers for exploratory search; <a href="http://ils.unc.edu/~wildem/ASIST2008/Yuan-CV.pdf">Xiaojun Yuan</a> on how cognitive styles affect user performance; and <a href="http://mikezarro.com/">Mike Zarro</a> on using social tags and controlled vocabularies as search filters.</p>
<p>Last but not least came the <a href="https://sites.google.com/site/hcirworkshop/hcir-2011/challenge">HCIR Challenge</a>:</p>
<blockquote><p>The HCIR 2011 Challenge focuses on the case where recall is everything – namely, the problem of information availability. The information availability problem arises when the seeker faces uncertainty as to whether the information of interest is available at all. Instances of this problem include some of the highest-value information tasks, such as those facing national security and legal/patent professionals, who might spend hours or days searching to determine whether the desired information exists.</p>
<p>The corpus we will use for the HCIR 2011 Challenge is the CiteSeer digital library of scientific literature. The CiteSeer corpus contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.</p></blockquote>
<p>There were four entries:</p>
<ul>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NzE1YmM2YzE4ODBhYzRjZA" target="_blank">FreeSearch – Literature Search in a Natural Way<br />
</a><em>Claudiu S. Firan, Wolfgang Nejdl, Mihai Georgescu (University of Hanover), and Xinyun Sun (DEKE Lab MOE, Renmin)<br />
</em></li>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MmZmM2Y5Yzg5OTM4NGI5NQ" target="_blank">Session-based search with Querium<br />
</a><em>Gene Golovchinsky (FX Palo Alto Lab) and Abdigani Diriye (University College London)<br />
</em></li>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6NWI1NTc5NWNmNDlmZDUyZg" target="_blank">GisterPro<br />
</a><em>David L. Ostby and Edmond Brian (Visual Purple)<br />
</em></li>
<li><a href="https://docs.google.com/a/kent.edu/viewer?a=v&amp;pid=sites&amp;srcid=ZGVmYXVsdGRvbWFpbnxoY2lyd29ya3Nob3B8Z3g6MjIwOWNlOWY4YTQzMDRmZA" target="_blank">Query Analytics Workbench<br />
</a><em>Antony Scerri, Matthew Corkum, Keith Gutfreund, Ron Daniel Jr., Michael Taylor (Elsevier Labs)</em></li>
</ul>
<p>The competition was fierce. Claudiu showed off the <a href="http://dblp.l3s.de/">Faceted DBLP</a> interface, which is well suited to the information availability task on CiteSeer data. Ed showed how GisterPro uses visualization to support the information seeking process. But it came down to a close call between the Query Analytics Workbench and Querium. Despite the Elsevier team&#8217;s impressive functionality and animated presentation, Gene&#8217;s simpler interface and application of <a href="http://www.fxpal.com/publications/FXPAL-PR-08-467.pdf">ranked fusion</a> won the day. Congratulations to Gene and Abdigani, this year&#8217;s HCIR Challenge winners!</p>
<p>We wrapped up the evening at the <a href="http://tiedhouse.com/">Tied House</a>, a local microbrewery. And of course the discussion turned to where, when, and how we will hold next year&#8217;s workshop. Watch this space. In the meantime, my heartfelt thanks to everyone who made this year&#8217;s workshop such a success &#8212; and especially to our sponsors. Thank you Endeca, Kent State, Microsoft, and Google!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/10/21/hcir-2011-we-have-arrived/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Oracle Acquires Endeca!</title>
		<link>http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/</link>
		<comments>http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 16:20:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3860</guid>
		<description><![CDATA[    Today is a wonderful day for Endeca and Oracle! Oracle has announced that it has entered into an agreement to acquire Endeca, bringing together two of the powerhouses of information access. Quoting from the announcement: &#8220;The combination of Oracle and Endeca is expected to create a comprehensive technology platform to process, store, manage, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.endeca.com/"><img class="alignnone" title="Endeca" src="http://allinio.com/wp-content/uploads/2008/10/Endeca-Logo.gif" alt="" width="116" height="60" /></a>   <img class="alignnone" title="acquired by" src="http://www.freestockphotos.biz/pictures/2/2987/arrow.png" alt="" width="60" height="60" /><a href="http://www.oracle.com/"><img class="alignnone" title="Oracle" src="http://www.logostage.com/logos/Oracle.jpg" alt="" width="311" height="60" /></a></p>
<p>Today is a wonderful day for <a href="http://www.endeca.com/">Endeca</a> and <a href="http://www.oracle.com/">Oracle</a>! Oracle has <a href="http://www.oracle.com/us/corporate/press/517791">announced</a> that it has entered into an agreement to acquire Endeca, bringing together two of the powerhouses of information access. Quoting from the announcement: &#8220;The combination of Oracle and Endeca is expected to create a comprehensive technology platform to process, store, manage, search and analyze structured and unstructured information together. &#8221;</p>
<p>As part of Endeca&#8217;s founding team, I am very proud to see this day. My ten years at Endeca were a formative experience that established my professional identity and inspired my passion to pursue the vision of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a> (by happy coincidence, the 5th annual <a href="http://hcir.info/">HCIR workshop</a> takes place on Thursday). Reading Oracle&#8217;s <a href="http://www.oracle.com/us/corporate/Acquisitions/endeca/general-presentation-517133.pdf">presentation</a> about the acquisition, I&#8217;m excited to see how Endeca&#8217;s technology will play a key role in unifying structured and unstructured data management and analysis for Oracle&#8217;s customers.</p>
<p>I take pride in my contributions to Endeca &#8212; I still slip sometimes and refer to Endeca as &#8220;we&#8221;. But the real heroes here are the folks &#8212; and especially the <a href="http://www.endeca.com/en/about-us/leadership-team.html">leadership</a> &#8212;  who have seen this journey through from start to finish. In particular, I am grateful to Steve Papa, Pete Bell, Adam Ferrari, Jack Walter, Keith Johnson, Nik Bates-Haus, and Jason Purcell for everything they have done to bring about this extraordinary outcome.</p>
<p>Finally, excited as I am about this event, it is only the beginning. I am excited to see Endeca&#8217;s people and technology powering one of the world&#8217;s largest enterprise software companies. Looking forward to the next play!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/10/18/oracle-acquires-endeca/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn</title>
		<link>http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/</link>
		<comments>http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/#comments</comments>
		<pubDate>Sat, 01 Oct 2011 04:39:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3850</guid>
		<description><![CDATA[Last week, I delivered the following presentation at the CMU Intelligence Seminar: &#160; I had a great audience, including the department head! Of course that meant fielding tough questions, but that&#8217;s what makes it fun to present at my alma mater. Now that it&#8217;s been over a decade since my defense, I can handle the [...]]]></description>
			<content:encoded><![CDATA[<p>Last week, I delivered the following presentation at the <a href="http://www.cs.cmu.edu/~iseminar/">CMU Intelligence Seminar</a>:</p>
<div id="__ss_9494640" style="width: 425px;"><strong style="display: block; margin: 12px 0 4px;"></strong><object id="__sse9494640" width="467" height="390" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=keepingitprofessional-110930233231-phpapp02&amp;stripped_title=keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin&amp;userName=dtunkelang" /><param name="allowscriptaccess" value="always" /><param name="allowfullscreen" value="true" /><embed id="__sse9494640" width="467" height="390" type="application/x-shockwave-flash" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=keepingitprofessional-110930233231-phpapp02&amp;stripped_title=keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin&amp;userName=dtunkelang" allowFullScreen="true" allowScriptAccess="always" allowscriptaccess="always" allowfullscreen="true" /></object></div>
<p>&nbsp;</p>
<p>I had a great audience, including the department head! Of course that meant fielding tough questions, but that&#8217;s what makes it fun to present at my alma mater. Now that it&#8217;s been over a decade since my defense, I can handle the tough questions. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Unfortunately there is no video, but hopefully the slides are reasonably self-explanatory. If you have questions, please ask them in the comments.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/30/keeping-it-professional-relevance-recommendations-and-reputation-at-linkedin/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Visiting the East Coast: CMU and Strata New York</title>
		<link>http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/</link>
		<comments>http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/#comments</comments>
		<pubDate>Sun, 18 Sep 2011 16:01:02 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3831</guid>
		<description><![CDATA[              &#160; Tonight I&#8217;m taking a red-eye to Pittsburgh so that I can spend three days at my (doctoral) alma mater, CMU. In addition to spending time with lots of great students and faculty, my goal is to communicate a taste of the hard computer science problems we are [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cs.cmu.edu/"><img class="alignnone" title="CMU School of Computer Science" src="http://www.cs.cmu.edu/~ref/naacl/logos/bronze/dragon-small.jpeg" alt="" width="178" height="154" /></a>             <a href="http://strataconf.com/stratany2011/"><img class="alignnone" title="O'Reilly Strata New York: Making Data Work" src="http://assets.en.oreilly.com/1/eventseries/23/strata_franchise_logo_strata.gif" alt="" width="245" height="79" /></a></p>
<p>&nbsp;</p>
<p>Tonight I&#8217;m taking a red-eye to Pittsburgh so that I can spend three days at my (doctoral) alma mater, <a href="http://www.cs.cmu.edu/~quixote/">CMU</a>. In addition to spending time with lots of great students and faculty, my goal is to communicate a taste of the hard computer science problems we are solving (or trying to solve!) at <a href="http://engineering.linkedin.com/">LinkedIn</a>. I&#8217;m giving a <a href="http://www.cs.cmu.edu/~iseminar/">tech talk</a> Tuesday afternoon, joining my colleagues for an info session Tuesday evening, and participating in the <a href="http://toc.web.cmu.edu/">Technical Opportunities Conference</a> (TOC) Wednesday.</p>
<p>Here&#8217;s a teaser for my tech talk:</p>
<p><a href="http://www.cs.cmu.edu/~iseminar/"><img class="alignnone size-full wp-image-3832" style="border-width: 5px; border-color: black; border-style: solid;" title="Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/Keeping-It-Professional.png" alt="" width="475" height="356" /></a></p>
<p>You can find more details about LinkedIn&#8217;s visits to CMU and other campuses at <a href="http://studentcareers.linkedin.com/">http://studentcareers.linkedin.com/</a>.</p>
<p>Hopefully some of you are attending the O&#8217;Reilly <a href="http://strataconf.com/stratany2011/">Strata Conference</a> in New York this Thursday and Friday. If so, I encourage you to attend my panel session on &#8220;<a href="http://strataconf.com/stratany2011/public/schedule/detail/21413">Entities, Relationships, and Semantics: the State of Structured Search</a>&#8220;:</p>
<blockquote><p>Structured search improves the search experience through the identification of entities and their relationships in documents and queries. This panel will explore the current state of structured and semi-structured search, as well as exploring the open problems in an area that promises to revolutionize information seeking.</p></blockquote>
<p>The four panelists work on some of the world’s largest structured search problems, from offering users structured search on Google’s web corpus to building a computing system that defeated <em>Jeopardy!</em> champions in an extreme test of natural language understanding. They work on the data, tools, and research that are driving this field. They are all excellent researchers and presenters, promising to offer an informative and engaging panel discussion, for which I will act as moderator.</p>
<p>Panelists:</p>
<ul>
<li><strong>Andrew Hogue</strong> is a Senior Staff Engineer and Engineering Manager in the Search Quality group at Google New York. He has worked on a wide array of projects including question answering, Google Squared, sentiment analysis, local and product search, and Google Goggles. He is interested in the areas of structured data, information extraction, and machine learning, and their applications to search and search interfaces. Prior to Google, he earned an M.Eng. and B.S. in Computer Science from MIT.</li>
</ul>
<ul>
<li><strong>Breck Baldwin</strong> is the President of Alias-i, creators of the popular LingPipe computational linguistics toolkit. He received his Ph.D. in computer science in 1995 from the University of Pennsylvania. In the time between his thesis on coreference resolution and evaluation and founding Alias-i in 1999, Breck worked on DARPA-funded projects through the University of Pennsylvania.</li>
</ul>
<ul>
<li><strong>Evan Sandhaus</strong> works as the Semantic Technologist in The New York Times Research and Development Labs. He is spearheading The New York Times Linked Open Data Strategy and overseeing the release of 1.8 million documents to the computer science research community. Previously, Evan helped to put The New York Times on Google Earth, collaborated with New York University to explore new directions in News Search, and worked to bring The New York Times to Facebook.</li>
</ul>
<ul>
<li><strong>Wlodek Zadrozny</strong> is an IBM Researcher working on natural language applications. Most recently he worked on text sources for Watson (IBM’s Jeopardy champion) and applying related DeepQA technology to business problems. His previous work ranged from language processing research to product development and technical planning; in particular, he led the development of interaction systems that used speech, natural language and focused search. Wlodek Zadrozny received a Ph.D. in Mathematics from the Polish Academy of Sciences.</li>
</ul>
<p><strong>And one more thing.</strong> Karaoke at <a href="http://www.2ndon2nd.com/">Second on Second</a> in the East Village on Friday night. It&#8217;s an unofficial Strata after-party, so come join us Big Data folks for some Big Fun.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/18/visiting-the-east-coast-cmu-and-strata-new-york/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Different Anniversary: Happy Birthday, Endeca!</title>
		<link>http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/</link>
		<comments>http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/#comments</comments>
		<pubDate>Mon, 12 Sep 2011 03:43:59 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3819</guid>
		<description><![CDATA[             I grew up in New York City. On September 11th, 2001, I was in Cambridge, Massachusetts, desperately trying to get through to my parents by all means of communication at my disposal. My dad worked at 40 Worth Street, only a few blocks away from the World Trade Center. [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone" title="Happy Birthday!" src="http://images.pictureshunt.com/pics/h/happy_birthday-2004.jpg" alt="" width="182" height="134" />            <a href="http://www.endeca.com/"><img class="alignnone" title="Endeca" src="http://www.f9systems.com/site/sites/default/files/endeca_logo.gif" alt="" width="243" height="134" /></a></p>
<p>I grew up in New York City. On September 11th, 2001, I was in Cambridge, Massachusetts, desperately trying to get through to my parents by all means of communication at my disposal. My dad worked at <a href="http://maps.google.com/maps?q=40+Worth+Street+New+York+NY">40 Worth Street</a>, only a few blocks away from the World Trade Center. Thankfully none of my family or friends were harmed that day, but that fateful event ten years ago left a mark on the world that no one of my generation will ever forget.</p>
<p>Fortunately I have happier associations with this anniversary.</p>
<p>On September 11th, 1999, I boarded an Amtrak from New York to Boston to join Steve Papa, Pete Bell, Dave Gourley, Fritz Knabe, Jack Walter, and Phil Braden to start the company that would eventually be named <a href="http://www.endeca.com/">Endeca</a>. I had no way of knowing whether we would persuade VCs to fund us beyond our six months of seed investment, let alone that we would develop a technology that would revolutionize the search experience of millions of users around the world. Our modest ambition was to build a better way to find stuff on eBay. That goal remains unfulfilled, but <a href="http://www.endeca.com/en/solutions/Customer-Experience-Management/b2c-ecommerce.html">44 of the top 100 online retailers use Endeca</a>, which isn&#8217;t too shabby. Especially considering that Endeca has expanded well beyond online retail into domains like manufacturing, business intelligence, and government.</p>
<p>On September 11th, 2002, I gathered the Endeca founding team for a dinner to celebrate the company&#8217;s 3rd birthday. Given my reputation for general irreverence, I feared that my colleagues would think this was a stunt to mock the memory of the more familiar 9/11. But it was quite the opposite. September 11th, 1999 was a turning point in my professional life, and no terrorist was going to take that happiness away from me. To this day I am grateful that my colleagues recognized my sincerity and joined me in this celebration.</p>
<p>The dinner that night was an emotional one: 2002 had been a <a href="http://en.wikipedia.org/wiki/Dot-com_bubble#The_bubble_bursts">tough year</a> for the software industry &#8212; one in which we saw many of our peer companies fold. Fortunately it was the beginning of much better times for us: from 2003 to 2006, Endeca was the <a href="http://www.endeca.com/en/news-and-events/press-releases/2007/endeca-named-massachusetts-fastest-growing-private-company-in-boston-business-journal-annual-ranking.html">fastest growing private company in Massachusetts</a>. No IPO yet, but the <a href="http://www.bizjournals.com/boston/print-edition/2011/07/01/endeca-gears-up-for-likely-ipo-bid.html">rumors</a> are encouraging.</p>
<p>I left Endeca almost two years ago, going to <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">Google</a> and then <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">LinkedIn</a>. But I will always have fond memories of the decade I spent at Endeca &#8212; an experience that established much of the <a href="http://thenoisychannel.com/2011/08/21/dream-fit-passion/">passion</a> that drives me today. I am very proud to have been part of the founding team of such a great company, even if now I can only follow from a distance.</p>
<p>Happy birthday, Endeca, and many more to come!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/11/a-different-anniversary-happy-birthday-endeca/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Attention CMU Students!</title>
		<link>http://thenoisychannel.com/2011/09/07/attention-cmu-students/</link>
		<comments>http://thenoisychannel.com/2011/09/07/attention-cmu-students/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 02:04:18 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3812</guid>
		<description><![CDATA[As many of you know, I&#8217;m a proud alumnus of the CMU School of Computer Science (yes, I also attended the CMU of Massachusetts). I&#8217;m delighted to have the opportunity to spend a few days on campus this month, and I hope that I&#8217;ll have a chance to meet with lots of students and faculty while [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://engineering.linkedin.com"><img class="alignnone size-full wp-image-3814" title="LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/09/in-logo.jpeg" alt="" width="205" height="205" /></a><a href="http://www.cs.cmu.edu/"><img class="alignnone" title="CMU School of Computer Science" src="http://www.cs.cmu.edu/~ref/naacl/logos/bronze/dragon-small.jpeg" alt="" width="277" height="241" /></a></p>
<p>As many of you know, I&#8217;m a proud alumnus of the <a href="http://www.cs.cmu.edu/~quixote/">CMU School of Computer Science</a> (yes, I also attended the <a href="http://www.eecs.mit.edu/">CMU of Massachusetts</a>). I&#8217;m delighted to have the opportunity to spend a few days on campus this month, and I hope that I&#8217;ll have a chance to meet with lots of students and faculty while I&#8217;m there.</p>
<p>Specifically, I&#8217;ll be giving a talk at Eugene Fink&#8217;s <a href="http://www.cs.cmu.edu/~iseminar/">Intelligence Seminar</a> on Tuesday, September 20th at 3:30pm in Gates-Hillman 4303:</p>
<blockquote><p><strong>Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn</strong></p>
<p>LinkedIn operates the world&#8217;s largest professional network on the Internet with more than 120 million members in over 200 countries. In order to connect its users to the people, opportunities, and content that best advance their careers, LinkedIn has developed a variety of algorithms that surface relevant content, offer personalized recommendations, and establish topic-sensitive reputation &#8212; all at a massive scale. In this talk, I will discuss some of the most challenging technical problems we face at LinkedIn, and the approaches we are taking to address them.</p></blockquote>
<p>I hope to see all of you there! My colleagues and I will also be hosting an information session that same Tuesday at 6pm in Porter Hall, Room 125B, as well as participating in the <a href="http://toc.web.cmu.edu/">Technical Opportunities Conference</a> Tuesday and Wednesday. And of course LinkedIn will be conducting on-campus <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/">interviews</a>: those will take place all day on Thursday, September 22nd.</p>
<p>If you are a CMU student interested in <a href="http://engineering.linkedin.com/">opportunities at LinkedIn</a>, please <a href="http://www.studentaffairs.cmu.edu/career/tartantrak/tartantrakstudentlogin.html">apply through TartanTrak</a> (yes, I wish you could just <a href="http://blog.linkedin.com/2011/07/24/apply-with-linkedin/">apply with LinkedIn</a> &#8211; we&#8217;ll get there!). Of course, feel free to reach out to me personally at <a href="mailto:dtunkelang@linkedin.com">dtunkelang@linkedin.com</a>. We already have more applicants than slots, but I promise that every application will be considered. I&#8217;m very excited to recruit CMU students to strengthen our growing team of software engineers and data scientists.</p>
<p>See you soon, and let&#8217;s go Tartans!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/09/07/attention-cmu-students/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/09/07/attention-cmu-students/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dream. Fit. Passion.</title>
		<link>http://thenoisychannel.com/2011/08/21/dream-fit-passion/</link>
		<comments>http://thenoisychannel.com/2011/08/21/dream-fit-passion/#comments</comments>
		<pubDate>Mon, 22 Aug 2011 01:14:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3804</guid>
		<description><![CDATA[A few days ago, our CEO Jeff Weiner led a session at LinkedIn on how to &#8220;close&#8221; candidates &#8212; that is, how to persuade candidates to join your team once you have found and interviewed them. Since not everyone has the opportunity to work at LinkedIn and experience Jeff&#8217;s leadership first-hand, I thought I&#8217;d share [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/File:Franz_Marc_-_Der_Traum_-_Google_Art_Project.jpg"><img class="alignnone" title="Franz Marc - The Dream" src="http://upload.wikimedia.org/wikipedia/commons/thumb/a/a1/Franz_Marc_-_Der_Traum_-_Google_Art_Project.jpg/500px-Franz_Marc_-_Der_Traum_-_Google_Art_Project.jpg" alt="" width="500" height="368" /></a></p>
<p>A few days ago, our CEO <a href="http://www.linkedin.com/in/jeffweiner08">Jeff Weiner</a> led a session at LinkedIn on how to &#8220;close&#8221; candidates &#8212; that is, how to persuade candidates to join your team once you have <a href="http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/ ">found</a> and <a href="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/ ">interviewed</a> them. Since not everyone has the opportunity to <a href="http://engineering.linkedin.com/">work at LinkedIn</a> and experience Jeff&#8217;s leadership first-hand, I thought I&#8217;d share some of his wisdom here.</p>
<p>The key take-away was that closing a candidate is not about selling the job or company to the candidate, but rather about working with the candidate to figure out what he or she wants and whether the job will help achieve it. As an employer, you need to do three things to close a candidate:</p>
<p>1) Figure out what is the candidate&#8217;s dream.<br />
2) Determine if job and candidate are the right fit.<br />
3) Communicate your own passion.</p>
<p>Let&#8217;s take these one at a time.</p>
<p><strong>Dream.</strong></p>
<p>As I&#8217;ve written here in the past, we have to <a href="http://thenoisychannel.com/2011/01/17/dare-to-dream/">dare to dream</a>. Most of us rely on jobs to sustain us and our loved ones &#8212; and for some a job is nothing more than that. There&#8217;s no shame in having a dream that is unrelated to a job &#8212; Franz Kafka famously worked in a variety of &#8220;<a href="http://en.wikipedia.org/wiki/Franz_Kafka#Employment">bread jobs</a>&#8221; in order to pay the bills while he wrote novels. Others find their calling as humanitarians, activists, or care givers. It&#8217;s easy for many of us to forget that life isn&#8217;t always about work.</p>
<p>But the great thing about working in technology is that you can get paid to fulfill your own dream. Look at <a href="http://www.wired.com/wired/archive/13.08/battelle.html">Larry and Sergey</a>, who set out to organize the world&#8217;s information. Or <a href="http://books.simonandschuster.com/Steve-Jobs/Walter-Isaacson/9781442346956">Steve Jobs</a>, whose dream has been to create innovative products. Not everyone is as specific in their dreams or as successful in realizing them, but, as the saying goes, you have to be in it to win it.</p>
<p>Convincing a person to accept a job offer works best when that job brings the person closer to fulfilling his or her dream. My own decisions to go to <a href="http://thenoisychannel.com/2009/11/06/going-to-google/">Google</a> and then <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/ ">LinkedIn</a> are good examples. Working at <a href="http://www.endeca.com/">Endeca</a> drove me to pursue a vision of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> &#8212; to optimize the way people and machines work together to solve information seeking and exploration tasks. At Google, I hoped to bring exploratory search to the open web. I&#8217;ll concede that I did not make much headway, but I&#8217;m glad that I tried.</p>
<p>And at LinkedIn, I work on problems that not only stretch the boundaries of information science, but whose solutions help millions of other people achieve their dreams by making them more successful professionally. My dream is to truly reduce HCIR to practice so that people can lead better and more productive lives. Once the folks at LinkedIn understood my dream, closing me was just a matter of offering me the keys to make that dream a reality.</p>
<p>If you want someone to work at your company, get to know that person&#8217;s dreams. If the job you are offering can&#8217;t help him or her realize those dreams, be honest about it. It&#8217;s better for both of you, and for a world that is better off with people devoting their lives&#8217; work to fulfilling their dreams.</p>
<p><strong>Fit.</strong></p>
<p>Fit is a two-way street: the candidate should be right for the job, and the job should be right for the candidate. The interviewing process typically focuses on establishing the former, but we often forget that the candidate&#8217;s decision focuses on the latter. Just because someone is capable of doing a job doesn&#8217;t mean it&#8217;s the right job for that person.</p>
<p>For me, fit means many things. A work environment where people work hard and take the company&#8217;s success personally. Incentives that allow everyone to win, rather than a zero-sum game where people compete for scarce opportunities. Openness, since I&#8217;m someone who lives most of my life <a href="http://www.forbes.com/2008/10/13/cio-mesh-collaboration-tech-cio-cx_dw_1014mesh.html">in public</a>. I could go on &#8212; but I hope you get the general idea. Fit is the set of functional and non-functional requirements that determine whether someone will enjoy a job. And people who enjoy their jobs tend to be productive and stay a while.</p>
<p>If you are trying to persuade someone to accept a job offer, you have to see the decision from that person&#8217;s point of view. In other words, ask yourself &#8212; and convincingly answer &#8212; why the job is the right fit for the candidate. That means accepting the possibility that it isn&#8217;t the right fit, and doing right by the candidate even if that means backing off.</p>
<p><strong>Passion.</strong></p>
<p>Choosing a job is one of the most important life decisions that people make. It&#8217;s not quite up there with getting married or having a child, but it&#8217;s a decision that most people take (and should take) very seriously. Some people create spreadsheets of the pros and cons to compare opportunities and try to frame their decision as an <a href="http://www.decisionmaking.org/career_decisionmaking.html">optimization problem</a>. Others go with their gut.</p>
<p>Those who know me personally &#8212; whether from face-to-face or online interaction &#8212; know that I wear my passion on my sleeve. I can&#8217;t understand how someone could get up in the morning and go to work without being passionate about his or her job. I know that many people don&#8217;t have a choice in the matter, and I pity them. In a country where most people take subsistence for granted, having a job you love strikes me as a necessity, rather than a luxury.</p>
<p>But what is clear is that if you, as an employer, are not passionate about what you do, you have no business expecting a candidate to take such a big leap of faith with you. Moreover, passion is hard to fake. As it should be &#8212; I&#8217;m not suggesting that employers should pretend to be excited about their jobs. Rather, your own sincere excitement is a baseline for those you hope to attract to your team. Passion is contagious, and passion is the raw material for making dreams come true.</p>
<p><strong>Dream. Fit. Passion.</strong></p>
<p>There you have it: dream, fit, passion. And remember, closing isn&#8217;t selling. Do right by the people you try to hire. After all, jobs are short, but careers are long. Celebrate everyone&#8217;s professional success, and take your losses in stride. I can tell you from experience that it all works out for the best.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/08/21/dream-fit-passion/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/08/21/dream-fit-passion/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Retiring a Great Interview Problem</title>
		<link>http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/</link>
		<comments>http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 07:27:50 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3767</guid>
		<description><![CDATA[Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find candidates who can write code. The tech press sporadically publishes &#8220;best&#8221; interview questions that make me cringe &#8212; though I love the IKEA question. Startups like Codility and Interview Street see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://search.dilbert.com/comic/Job%20Interview"><img class="alignnone" title="Job Interview on Dilbert.com" src="http://dilbert.com/dyn/str_strip/000000000/00000000/0000000/000000/00000/1000/200/1222/1222.strip.gif" alt="" width="640" height="199" /></a></p>
<p>Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">candidates who can write code</a>. The tech press sporadically publishes <a href="http://royal.pingdom.com/2008/08/15/the-best-job-interview-questions-from-microsoft-google%E2%80%A6-and-ikea/">&#8220;best&#8221; interview questions</a> that make me cringe &#8212; though I love the <a href="http://farm4.static.flickr.com/3147/2765642306_fb64f6c3d7_o.jpg">IKEA question</a>. Startups like <a href="http://codility.com/">Codility</a> and <a href="http://www.interviewstreet.com/recruit/home/">Interview Street</a> see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop subjecting candidates to <a href="http://blog.indextank.com/1030/interviewing-engineers-enough-with-the-whiteboard-coding/">whiteboard coding exercises</a>.</p>
<p>I don&#8217;t have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to assess software engineering candidates. At best, they test only one desirable attribute; at worst, they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the key insight. Coding questions are a much better tool for assessing people whose day job will be coding, but conventional interviews &#8212; whether by phone or in person &#8212; are a suboptimal way to test coding strength. Also, it&#8217;s not clear whether a coding question should assess problem-solving, pure translation of a solution into working code, or both.</p>
<p>In the face of all of these challenges, I came up with an interview problem that has served me and others well for a few years at <a href="http://www.endeca.com/en/about-us/jobs.html">Endeca</a>, <a href="http://www.google.com/intl/ln/jobs/uslocations/new-york/swe/index.html">Google</a>, and <a href="http://engineering.linkedin.com/">LinkedIn</a>. It is with a heavy heart that I retire it, for reasons I&#8217;ll discuss at the end of the post. But first let me describe the problem and explain why it has been so effective.</p>
<p><strong>The Problem</strong></p>
<p>I call it the &#8220;word break&#8221; problem and describe it as follows:</p>
<pre>Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.</pre>
<p>Note that I&#8217;ve deliberately left some aspects of this problem vague or underspecified, giving the candidate an opportunity to flesh them out. Here are examples of questions a candidate might ask, and how I would answer them:</p>
<pre>Q: What if the input string is already a word in the
   dictionary?
A: A single word is a special case of a space-separated
   sequence of words.

Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.

Q: What if the input string cannot be segmented into a
   sequence of words in the dictionary?
A: Then return null or something equivalent.

Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence
   of exact words in the dictionary.

Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.

Q: I'm thinking of implementing the dictionary as a
   <a href="http://en.wikipedia.org/wiki/Trie">trie</a>, <a href="http://en.wikipedia.org/wiki/Suffix_tree">suffix tree</a>, <a href="http://en.wikipedia.org/wiki/Fibonacci_heap">Fibonacci heap</a>, ...
A: You don't need to implement the dictionary. Just
   assume access to a reasonable implementation.

Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.

Q: How big is the dictionary?
A: Assume it's much bigger than the input string,
   but that it fits in memory.</pre>
<p>Seeing how a candidate negotiates these details is instructive: it offers you a sense of the candidate&#8217;s communication skills and attention to detail, not to mention the candidate&#8217;s basic understanding of data structures and algorithms.</p>
<p><strong>A FizzBuzz Solution</strong></p>
<p>Enough with the problem specification and on to the solution. Some candidates start with the simplified version of the problem that only considers segmentations into two words. I consider this a <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">FizzBuzz</a> problem, and I expect any competent software engineer to produce the equivalent of the following in their programming language of choice. I&#8217;ll use Java in my example solutions.</p>
<pre>String SegmentString(String input, Set&lt;String&gt; dict) {
  int len = input.length();
  for (int i = 1; i &lt; len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      if (dict.contains(suffix)) {
        return prefix + " " + suffix;
      }
    }
  }
  return null;
}</pre>
<p>I have interviewed candidates who could not produce the above &#8212; including candidates who had passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great way to keep interviewers from wasting their time interviewing programmers who can&#8217;t program.</p>
<p><strong>A General Solution</strong></p>
<p>Of course, the more interesting problem is the general case, where the input string may be segmented into any number of dictionary words. There are a number of ways to approach this problem, but the most straightforward is <a href="http://en.wikipedia.org/wiki/Backtracking">recursive backtracking</a>. Here is a typical solution that builds on the previous one:</p>
<pre>String SegmentString(String input, Set&lt;String&gt; dict) {
  if (dict.contains(input)) return input;
  int len = input.length();
  for (int i = 1; i &lt; len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  return null;
}</pre>
<p>Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit <a href="http://www.cprogramming.com/tutorial/computersciencetheory/stack.html">stack</a>) in half an hour. I&#8217;m sure that many of them are competent and productive. But I would not hire them to work on <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> or <a href="http://en.wikipedia.org/wiki/Machine_learning">machine learning</a> problems, especially at a company that delivers search functionality on a massive scale.</p>
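<p>For readers curious about the explicit-stack alternative mentioned above, here is a minimal sketch of how it might look. The class name <code>StackWordBreak</code>, the <code>Frame</code> helper, and the method name <code>segment</code> are hypothetical names chosen for illustration. Each stack entry records the words chosen so far and the index where the next word must start. Because any valid segmentation is acceptable, this version may return a different answer than the recursive one when several exist.</p>

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

class StackWordBreak {
  // One pending unit of work: the words chosen so far and where the next word starts.
  private static final class Frame {
    final String words;
    final int start;

    Frame(String words, int start) {
      this.words = words;
      this.start = start;
    }
  }

  static String segment(String input, Set<String> dict) {
    Deque<Frame> stack = new ArrayDeque<>();
    stack.push(new Frame("", 0));
    while (!stack.isEmpty()) {
      Frame frame = stack.pop();
      // The whole input has been consumed by dictionary words: done.
      if (frame.start > 0 && frame.start == input.length()) {
        return frame.words;
      }
      // Push longer candidates first so that shorter words are explored first.
      for (int end = input.length(); end > frame.start; end--) {
        String word = input.substring(frame.start, end);
        if (dict.contains(word)) {
          String words = frame.words.isEmpty() ? word : frame.words + " " + word;
          stack.push(new Frame(words, end));
        }
      }
    }
    return null; // no segmentation exists
  }

  public static void main(String[] args) {
    Set<String> dict = new HashSet<>(Arrays.asList("apple", "pie"));
    System.out.println(segment("applepie", dict)); // prints "apple pie"
  }
}
```

<p>Like the recursive version, this sketch still explores an exponential number of states in the worst case; it only replaces the call stack with an explicit one.</p>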
<p><strong>Analyzing the Running Time</strong></p>
<p>But wait, there&#8217;s more! When a candidate does arrive at a solution like the above, I ask for a <a href="http://en.wikipedia.org/wiki/Big_O_notation">big O</a> analysis of its worst-case running time as a function of n, the length of the input string. I&#8217;ve heard candidates respond with everything from O(n) to O(n!).</p>
<p>I typically offer the following hint:</p>
<pre>Consider a pathological dictionary containing the words
"a", "aa", "aaa", ..., i.e., words composed solely of
the letter 'a'. What happens when the input string is a
sequence of n-1 'a's followed by a 'b'?</pre>
<p>Hopefully the candidate can figure out that the recursive backtracking solution will explore every possible segmentation of this input string, which reduces the analysis to determining the number of possible segmentations. I leave it as an exercise to the reader (with this <a href="http://en.wikipedia.org/wiki/Power_set">hint</a>) to determine that this number is O(2<sup>n</sup>).</p>
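<p>One way to check the exponential growth empirically is to instrument the backtracking solution with a call counter and run it on the pathological input from the hint. The sketch below does that; the class <code>BacktrackingCallCounter</code> and the helpers <code>repeat</code> and <code>countCalls</code> are hypothetical names introduced for illustration. On this input a call on a string of length m makes one recursive call for every shorter suffix, so the count satisfies T(m) = 1 + T(1) + ... + T(m-1) with T(1) = 1, which works out to 2<sup>n-1</sup> calls for an input of length n.</p>

```java
import java.util.HashSet;
import java.util.Set;

class BacktrackingCallCounter {
  static long calls = 0;

  // The recursive backtracking solution, plus a counter on entry.
  static String segment(String input, Set<String> dict) {
    calls++;
    if (dict.contains(input)) return input;
    int len = input.length();
    for (int i = 1; i < len; i++) {
      String prefix = input.substring(0, i);
      if (dict.contains(prefix)) {
        String segSuffix = segment(input.substring(i, len), dict);
        if (segSuffix != null) {
          return prefix + " " + segSuffix;
        }
      }
    }
    return null;
  }

  // Builds a string consisting of 'count' copies of the character c.
  static String repeat(char c, int count) {
    StringBuilder sb = new StringBuilder();
    for (int k = 0; k < count; k++) sb.append(c);
    return sb.toString();
  }

  // Counts calls for the input of n-1 'a's followed by 'b',
  // with the dictionary "a", "aa", ..., up to n 'a's.
  static long countCalls(int n) {
    Set<String> dict = new HashSet<>();
    for (int k = 1; k <= n; k++) dict.add(repeat('a', k));
    calls = 0;
    segment(repeat('a', n - 1) + "b", dict);
    return calls;
  }

  public static void main(String[] args) {
    for (int n = 5; n <= 15; n += 5) {
      // Each printed count equals 2^(n-1): 16, 512, 16384.
      System.out.println("n=" + n + " calls=" + countCalls(n));
    }
  }
}
```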
<p><strong>An Efficient Solution</strong></p>
<p>If a candidate gets this far, I ask if it is possible to do better than O(2<sup>n</sup>). Most candidates realize this is a loaded question, and strong ones recognize the opportunity to apply <a href="http://20bits.com/articles/introduction-to-dynamic-programming/">dynamic programming</a> or <a href="http://en.wikipedia.org/wiki/Memoization">memoization</a>. Here is a solution using memoization:</p>
<pre>Map&lt;String, String&gt; memoized = new HashMap&lt;String, String&gt;();

String SegmentString(String input, Set&lt;String&gt; dict) {
  if (dict.contains(input)) return input;
  // A cached null means this suffix is known to be unsegmentable.
  if (memoized.containsKey(input)) {
    return memoized.get(input);
  }
  int len = input.length();
  for (int i = 1; i &lt; len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        memoized.put(input, prefix + " " + segSuffix);
        return prefix + " " + segSuffix;
      }
    }
  }
  memoized.put(input, null);
  return null;
}</pre>
<p>Again the candidate should be able to perform the worst-case analysis. The key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes. I leave it as an exercise to the reader to determine that the worst-case running time of the memoized solution above is O(n<sup>2</sup>), assuming that the substring operation only requires constant time (a discussion which itself makes for an <a href="http://stackoverflow.com/questions/4679746/time-complexity-of-javas-substring">interesting tangent</a>).</p>
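<p>For comparison, here is a minimal sketch of the bottom-up dynamic programming alternative, which is not spelled out in this post. The class name <code>WordBreakDp</code> and the <code>next</code> array are hypothetical names chosen for illustration. The table is indexed by suffix start position rather than by suffix string: <code>next[i]</code> holds the end of a dictionary word starting at <code>i</code> whose remainder is itself segmentable, or -1 if there is none. It performs O(n<sup>2</sup>) dictionary lookups, under the same assumption about the cost of substring.</p>

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WordBreakDp {
  static String segment(String input, Set<String> dict) {
    int n = input.length();
    // next[i] = end index of a word starting at i such that input[end..n)
    // is segmentable, or -1 if the suffix starting at i is not segmentable.
    int[] next = new int[n + 1];
    Arrays.fill(next, -1);
    next[n] = n; // sentinel: the empty suffix needs no further words
    for (int start = n - 1; start >= 0; start--) {
      for (int end = start + 1; end <= n; end++) {
        if (next[end] != -1 && dict.contains(input.substring(start, end))) {
          next[start] = end;
          break; // any valid segmentation will do
        }
      }
    }
    if (n == 0 || next[0] == -1) return null;
    // Follow the recorded word boundaries to rebuild the segmentation.
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < n; i = next[i]) {
      if (i > 0) out.append(' ');
      out.append(input, i, next[i]);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    Set<String> dict = new HashSet<>(Arrays.asList("apple", "pie"));
    System.out.println(segment("applepie", dict)); // prints "apple pie"
  }
}
```

<p>Filling the table from the end of the string backwards avoids recursion altogether, so very long inputs cannot overflow the call stack.</p>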
<p><strong>Why I Love This Problem</strong></p>
<p>There are lots of reasons I love this problem. I&#8217;ll enumerate a few:</p>
<ul>
<li>It is a real problem that came up in the course of developing production software. I developed Endeca&#8217;s original implementation for rewriting search queries, and this problem came up in the context of spelling correction and thesaurus expansion.</li>
<li>It does not require any specialized knowledge &#8212; just strings, sets, maps, recursion, and a simple application of dynamic programming / memoization. Basics that are covered in a first- or second-year undergraduate course in computer science.</li>
<li>The code is non-trivial but compact enough to use under the tight conditions of a 45-minute interview, whether in person or over the phone using a tool like <a href="http://collabedit.com/">Collabedit</a>.</li>
<li>The problem is challenging, but it isn&#8217;t a gotcha problem. Rather, it requires a methodical analysis of the problem and the application of basic computer science tools.</li>
<li>The candidate&#8217;s performance on the problem isn&#8217;t binary. The worst candidates don&#8217;t even manage to implement the FizzBuzz solution in 45 minutes. The best implement a memoized solution in 10 minutes, allowing you to make the problem even more interesting, e.g., asking how they would handle a dictionary too large to fit in main memory. Most candidates perform somewhere in the middle.</li>
</ul>
<p><strong>Happy Retirement</strong></p>
<p>Unfortunately, all good things come to an end. I recently discovered that a candidate posted this problem on <a href="http://www.glassdoor.com/">Glassdoor</a>. The solution posted there hardly goes into the level of detail I&#8217;ve provided in this post, but I decided that a problem this good deserved to retire in style.</p>
<p>It&#8217;s hard to come up with good interview problems, and it&#8217;s also hard to keep secrets. <a href="http://thenoisychannel.com/2010/12/22/the-secret-may-be-to-keep-fewer-secrets/">The secret may be to keep fewer secrets.</a> An ideal interview question is one for which advance knowledge has limited value. I&#8217;m working with my colleagues on such an approach. Naturally, I&#8217;ll share more if and when we deploy it.</p>
<p>In the meantime, I hope that everyone who experienced the word break problem appreciated it as a worthy test of their skills. No problem is perfect, nor can performance on a single interview question ever be a perfect predictor of how well a candidate will perform as an engineer. Still, this one was pretty good, and I know that a bunch of us will miss it.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/feed/</wfw:commentRss>
		<slash:comments>84</slash:comments>
		</item>
		<item>
		<title>Upcoming Information Retrieval Conferences</title>
		<link>http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/</link>
		<comments>http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/#comments</comments>
		<pubDate>Sun, 31 Jul 2011 21:11:19 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3748</guid>
		<description><![CDATA[I hope everyone who attended the recent SIGIR 2011 in Beijing had an excellent experience. I didn&#8217;t manage to make it to that side of the globe myself, but I&#8217;m looking forward to hearing back from my LinkedIn colleagues who were there &#8212; particularly Paul Ogilvie, who gave an invited talk at the first Workshop on Entity-Oriented [...]]]></description>
			<content:encoded><![CDATA[<p>I hope everyone who attended the recent <a href="http://www.sigir2011.org/">SIGIR 2011</a> in Beijing had an excellent experience. I didn&#8217;t manage to make it to that side of the globe myself, but I&#8217;m looking forward to hearing back from my <a href="http://engineering.linkedin.com/">LinkedIn</a> colleagues who were there &#8212; particularly <a href="http://www.linkedin.com/in/paulogilvie">Paul Ogilvie</a>, who gave an invited talk at the first Workshop on Entity-Oriented Search (EOS) on &#8220;Anchoring Relevance with Entities&#8221;.</p>
<p>There are four outstanding information retrieval conferences coming up, and I will have the pleasure of participating in three of them. I&#8217;d like to make sure readers here are aware of all of them.</p>
<p><a href="http://www.kdd.org/kdd2011/"><img class="alignnone" title="KDD 2011" src="http://www.kdd.org/kdd2011/images/KDD_Banner_10_Jan.jpg" alt="" width="757" height="86" /></a></p>
<p>The first is <a href="http://www.kdd.org/kdd2011/">KDD 2011</a>, which will take place August 21-24, 2011 in San Diego, CA. The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition, and will bring together hundreds of practitioners and academic data miners in one location.</p>
<p>I will not be attending KDD myself, but several of my colleagues will be there. In particular, <a href="http://www.linkedin.com/in/bekkerman">Ron Bekkerman</a> will be presenting a paper on &#8220;High-Precision Phrase-Based Document Classification on a Modern Scale&#8221;, as well as offering a tutorial on &#8220;Scaling Up Machine Learning: Parallel and Distributed Approaches&#8221;.</p>
<p><strong><a href="http://hcir.info/hcir-2011/">Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2011)</a> - Mountain View, CA &#8211; October 20, 2011</strong></p>
<p>The second is <a href="http://hcir.info/hcir-2011/">HCIR 2011</a>, the fifth annual HCIR workshop, which I am co-organizing. It will be held all day on Thursday, October 20th, 2011 at Google&#8217;s main campus in Mountain View, California. There will be a reception on Wednesday evening before the workshop. Our keynote speaker this year will be Gary Marchionini, Dean of the School of Information and Library Science, University of North Carolina at Chapel Hill. We are also excited to continue the HCIR Challenge, this year focusing on the problem of information availability, where the seeker faces uncertainty as to whether the information of interest is available at all. The corpus will be the CiteSeer digital library of scientific literature, which contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.</p>
<p>Thanks to generous contributions made by Google, Microsoft Research, and Endeca, there will be no registration fee for HCIR this year. Information about how to register will be sent to authors of accepted position papers, research papers, and challenge reports. Note that the submission deadline has been <strong>extended by two weeks to Sunday, August 14th</strong>. I strongly encourage you to submit in one of these categories if you are working in this field.</p>
<p><a href="http://recsys.acm.org/2011/"><img class="alignnone size-full wp-image-3753" title="RecSys 2011" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/recsys11.png" alt="" width="586" height="178" /></a></p>
<p>The third is <a href="http://recsys.acm.org/2011/">RecSys 2011</a>, the 5th ACM International Conference on Recommender Systems. RecSys 2011 builds on the success of the Recommenders 06 Summer School in Bilbao, Spain and the series of four successful conference events from 2007 to 2010 in Minneapolis (2007), Lausanne (2008), New York (2009) and Barcelona (2010). In these events many members of the practitioner and research communities valued the rich exchange of ideas made possible by the shared plenary sessions. The 5th International conference will promote the same close interaction among practitioners and researchers.</p>
<p>I will be giving a tutorial at RecSys 2011 on &#8220;Recommendations as a Conversation with the User&#8221;.</p>
<p><a href="http://www.cikm2011.org/"><img class="alignnone" title="CIKM 2011" src="http://www.cikm2011.org/sites/default/files/cikm2011_craigm_v1_logo.jpg" alt="" width="491" height="78" /></a></p>
<p>The fourth is <a href="http://www.cikm2011.org/">CIKM 2011</a>, the 20th ACM Conference on Information and Knowledge Management. It will take place in Glasgow, Scotland, UK, 24th-28th October 2011. Since 1992, the CIKM has successfully brought together leading researchers and developers from the database, information retrieval, and knowledge management communities. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future research directions through the publication of high quality, applied and theoretical research findings. CIKM 2011 will continue the tradition of promoting collaboration among multiple areas in the general areas of databases, information retrieval, and knowledge management.</p>
<p>I am proud to be organizing the <a href="http://www.cikm2011.org/industryevent">CIKM 2011 Industry Event</a>, which will feature such industry heavyweights as <a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a> (Microsoft Research), <a href="http://www.freebase.com/view/en/john_giannandrea">John Giannandrea</a> (Google), <a href="http://research.yahoo.com/Vanja_Josifovski">Vanja Josifovski</a> (Yahoo! Research), <a href="http://company.yandex.com/corporate_governance/board_of_directors/ilya_segalovich.xml">Ilya Segalovich</a> (Yandex), <a href="http://jeffhammerbacher.com/">Jeff Hammerbacher</a> (Cloudera), and <a href="http://www.linkedin.com/in/chavdarbotev">Chavdar Botev</a> (LinkedIn).</p>
<p>I&#8217;m very excited about all four of these opportunities to exchange ideas about information retrieval and related areas, and I am grateful to LinkedIn for supporting my participation, as well as that of my colleagues. I hope to see some of you at these events!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/31/upcoming-information-retrieval-conferences/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Attention vs. Privacy</title>
		<link>http://thenoisychannel.com/2011/07/24/attention-vs-privacy/</link>
		<comments>http://thenoisychannel.com/2011/07/24/attention-vs-privacy/#comments</comments>
		<pubDate>Mon, 25 Jul 2011 06:22:22 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3740</guid>
		<description><![CDATA[A major feature of the recently released Google+ is Circles, which allows you to &#8220;share relevant content with the right people, and follow content posted by people you find interesting.&#8221; Most people seem to look at Circles as a privacy feature &#8212; and indeed Google&#8217;s official description gives the impression that Circles exist to manage [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3741" title="Attention" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/attention.jpg" alt="" width="256" height="256" /></p>
<p>A major feature of the recently released <a href="https://plus.google.com/">Google+</a> is <a href="http://www.google.com/support/+/bin/static.py?hl=en&amp;page=guide.cs&amp;guide=1257347&amp;rd=1">Circles</a>, which allows you to &#8220;share relevant content with the right people, and follow content posted by people you find interesting.&#8221;</p>
<p>Most people seem to look at Circles as a privacy feature &#8212; and indeed Google&#8217;s official description gives the impression that Circles exist to manage privacy based on real-life social contexts. Of course, re-sharing can result in unintended consequences, and Google even offers a <a href="http://www.google.com/support/+/bin/static.py?hl=en&amp;page=guide.cs&amp;guide=1358057&amp;answer=1297219&amp;rd=1">warning</a> that:</p>
<blockquote><p>Unless you disable reshares, anything you share (either publicly or with your circles) can be reshared beyond the original people you shared the content with. This could happen either through reshares or through mentions in comments.</p></blockquote>
<p>Privacy is a big deal, <a href="http://ftc.gov/opa/2011/03/google.shtm">especially for Google</a> &#8212; and particularly in the context of rolling out a new social network. Still, I&#8217;m not persuaded that privacy is the only or even the primary concern motivating the concept of <a href="http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/">social circles</a>.</p>
<p>Sharing content with someone is not just about giving that person permission to see it. Sharing content with someone asserts a claim on that person&#8217;s <a href="http://thenoisychannel.com/2008/12/17/the-macroeconomics-of-information-and-attention-how-people-make-decisions/">attention</a>. While it may be a privilege for me to have access to your content, it may be even more of a privilege for you that I allocate my scarce attention to consume it.</p>
<p>What if we focus on routing content to the people who would find it most interesting? Such an approach works best if all of the shared content is <a href="http://thenoisychannel.com/2008/11/27/when-in-doubt-make-it-public/">public</a> with respect to permissions &#8212; that is, people post it without any expectation of privacy. Twitter demonstrates that many people are comfortable with such a sharing model. Imagine if they could learn to trust a system that optimizes (or at least attempts to optimize) the allocation of everyone&#8217;s attention. This is not an easy problem by any means, nor is it one that is likely to be solved by algorithms alone. It will take a strong dose of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> to get it right. But, at least in my view, optimizing the allocation of human attention is the grand challenge that everyone working with information retrieval or social networks should be striving to address.</p>
<p>Privacy is important, and social networks should offer simple, robust privacy controls that users understand. We all have experienced the problem of <a href="http://thenoisychannel.com/2008/09/23/quick-bites-filter-failure/">filter failure</a>. But sharing isn&#8217;t just about privacy. Our attention is our most precious cognitive asset, both as individuals and as a society. Moreover, our attention faces ever-increasing demands as our social lives evolve in an online world relatively free of physical constraints. Social network developers would do well to pay attention&#8230;to attention.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/07/24/attention-vs-privacy/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/24/attention-vs-privacy/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Guest Post: Diego Basch on The Need for Speed</title>
		<link>http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/</link>
		<comments>http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/#comments</comments>
		<pubDate>Mon, 18 Jul 2011 00:05:00 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3718</guid>
		<description><![CDATA[Diego Basch is the CEO and founder of IndexTank, a hosted search service that powers major web sites such as Reddit, Twitvid, blip.tv, as well as providing a WordPress plug-in for blogs (like this one). Diego gained his search experience working with Inktomi, where he wrote some of the world&#8217;s first web-scale link analysis algorithms. He [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3720" title="Diego Basch" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/dbasch.jpg" alt="" width="180" height="180" /></p>
<p><a href="http://indextank.com/"><img class="alignnone" title="IndexTank" src="http://indextank.com/_static/common/images/logo.gif" alt="" width="149" height="37" /></a></p>
<p><em>Diego Basch is the CEO and founder of <a href="http://indextank.com/">IndexTank</a>, a hosted search service that powers major web sites such as Reddit, Twitvid, blip.tv, as well as providing a WordPress plug-in for blogs (like this one). Diego gained his search experience working with Inktomi, where he wrote some of the world&#8217;s first web-scale link analysis algorithms. He is on a mission to make every search box blazing fast and useful.</em></p>
<p>So much brainpower is spent solving the wrong problems. The world is filled with solutions looking for problems that nobody has &#8212; as illustrated by a Google query for [<a href="http://www.google.com/search?sourceid=chrome&amp;ie=UTF-8&amp;q=stupidest+inventions+ever">stupidest inventions ever</a>]. More often, people focus narrowly on a particular approach when they should focus on the problem the approach is intended to solve. Or they take a solution for one problem and assume it will apply to another.</p>
<p>Consider the emphasis that search engine developers place on relevance ranking. It is not hard to understand why web-scale search engines emphasize relevance. For example, a search on Google for [<a href="http://www.google.com/search?aq=f&amp;sourceid=chrome&amp;ie=UTF-8&amp;q=emergency+locksmith">emergency locksmith</a>] returns tens of billions of web pages, among which there are only a handful of results that you want. Google must filter out the growing number of <a href="http://www.nytimes.com/2011/07/10/your-money/lead-gen-sites-pose-challenge-to-google-the-haggler.html?_r=2">lead generation companies</a> that spend a ton of money trying to game its results.</p>
<p>Most web and application developers are familiar with the concept of relevance, so they naturally assume that it should be the primary concern when they add search to their own sites or apps. When I talk to people who want full-text search for their 40,000 book titles or 100k classified ads, they ask me about all the ways they can tune relevance. But often they are focusing on a solution, rather than their fundamental problem.</p>
<p>Developers are (or should be!) trying to improve the user experience of their application search. Too often they wrongly assume that relevance is the single most important factor for optimizing this user experience. Let&#8217;s surface this confusion in a concrete example.</p>
<p>As a rock climber, once in a while I feel the aches and pains caused by the sport. As the years go by it&#8217;s very important to keep your tendons healthy if you do not want to take forced breaks (or type with one hand!). <a href="http://rockclimbing.com/" target="_blank">Rockclimbing.com</a> is one of the most popular climbing sites, and I know some medical professionals who occasionally answer health-related questions there. Let&#8217;s search there for [<a href="http://www.rockclimbing.com/cgi-bin/forum/gforum.cgi?do=search_results&amp;search_forum=all&amp;search_string=tendon%20injury%20prevention&amp;sb=score&amp;mh=25" target="_blank">tendon injury prevention</a>].</p>
<p><img class="alignnone size-full wp-image-3736" title="tendon injury prevention" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/tendoninjuryprevention.png" alt="" width="806" height="626" /></p>
<p>In the above example, part of the problem is that the search results do not have contextual snippets. Maybe there is relevant information hiding behind a click, but the user has no way of knowing. More generally, there&#8217;s no hint as to which results could be better. Information such as the score of the answer (which is available) or the author&#8217;s bio (e.g. &#8220;climber, physical therapist&#8221;) would make the decision easier. If you need to click and scroll, search within the page, go back and try something else, then the search engine is wasting your time.</p>
<p>Which brings us to the broader point: when users search, they want to spend the least amount of time possible getting to the information they want. Relevance is a means to this end. In particular, clicks and typing cost users time. That time can come from page load, rendering, repeated use of the back button, and of course typing (and re-typing) search queries.</p>
<p>Some application search engines really nail the user experience. Let’s say we’re looking for the movie Koyaanits-however-you-spell-it. Go to the <a href="http://imdb.com/">Internet Movie Database</a> (IMDB) and start typing k-o-y-e &#8212; and there it is, as the second result. Notice that there is a ton of irrelevant stuff around it but it doesn’t matter. I see what I want very quickly.</p>
<p><img src="https://lh5.googleusercontent.com/FBZy2xxDowcimmRbJbtwqFixN375kw6a5JM5UJmin_m1IWrdKGSGwSRzvDfCj6esLW4pXBD5K-SA8JLdmi54xxnmOoliII8u66KeLKuC59ZL7VhcdcQ" alt="" width="382px;" height="348px;" /></p>
<p>Hopefully these two examples serve to illustrate the broader point: search engines should not focus on relevance as an end in itself, but rather on whatever helps users find the information they want as quickly as possible. That means offering contextual snippets, instant feedback, and of course snappy response times. Give users speed, and you will make them happy.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/17/guest-post-diego-bsch-on-the-need-for-speed/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Google±?</title>
		<link>http://thenoisychannel.com/2011/07/04/google%c2%b1/</link>
		<comments>http://thenoisychannel.com/2011/07/04/google%c2%b1/#comments</comments>
		<pubDate>Mon, 04 Jul 2011 22:55:09 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3703</guid>
		<description><![CDATA[When I left Google last December, it was an open secret that Google was developing a social networking product. Now that Google has released Google+, I am at liberty to share my personal impressions. Let&#8217;s start with the clear wins. Impressive launch. Google has certainly learned its lesson from the past launches of Wave and Buzz. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://plus.google.com/"><img class="alignnone size-full wp-image-3707" title="Google+" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/07/Google+.png" alt="" width="500" height="477" /></a></p>
<p>When I <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">left Google</a> last December, it was an <a href="http://techcrunch.com/2010/12/01/google-social-emerald-sea/">open secret</a> that Google was developing a social networking product. Now that Google has released <a href="http://plus.google.com/">Google+</a>, I am at liberty to share my personal impressions.</p>
<p>Let&#8217;s start with the clear wins.</p>
<ul>
<li><strong>Impressive launch.</strong> Google has certainly learned its lesson from the past launches of <a href="http://mashable.com/2010/08/04/rip-google-wave/">Wave</a> and <a href="http://www.quora.com/Why-did-Google-Buzz-fail">Buzz</a>. Google+ is unambiguously opt-in &#8212; no one is going to complain about being <a href="http://techcrunch.com/2011/03/30/reid-hoffman-data-ambush/">ambushed</a>. People have been begging for invites. But Google is wisely releasing invites quickly enough to build critical mass. I&#8217;d say that Google has at least picked up the <a href="http://www.quora.com/">Quora</a> crowd of early adopters in Silicon Valley.</li>
</ul>
<ul>
<li><strong>Clean design.</strong> Design lead <a href="http://techcrunch.com/2011/06/28/google-plus-design-andy-hertzfeld/">Andy Hertzfeld</a> (of Macintosh fame) has nailed it, leading bloggers to comment that this looks too well designed to be a Google product. Comparing Google+ to Facebook now, I&#8217;m reminded at least a little of comparisons between Facebook and Myspace. Great move for Google here.</li>
</ul>
<p>Now let&#8217;s talk about Google&#8217;s three big features here: Circles, Sparks, and Hangouts.</p>
<ul>
<li><strong>Circles.</strong> Straight out of Paul Adams&#8217;s <a href="http://thenoisychannel.com/2010/07/08/paul-adamss-presentation-on-social-networking/">presentation of social networking</a> (which he created before he <a href="http://techcrunch.com/2011/07/01/paul-adams-seeing-google-in-public-is-like-bumping-into-an-ex-girlfriend/">left Google for Facebook</a>), the idea is simple: a person doesn&#8217;t have a single group of friends, but rather several groups that tend to be mostly disjoint. Through Circles, Google+ makes this soft partitioning of the social space a core design principle. You add people to one or more circles, follow the stream of activity from a circle, and share with circles. It&#8217;s great in theory. But in practice it creates friction, especially for people trained on Facebook. There&#8217;s a trade-off between simplicity and expressive power, and Google is placing a strong bet on how users will make this trade-off. I&#8217;m inclined to agree with <a href="http://www.quora.com/Yishan-Wong/How-Google+-Shows-That-Google-Still-Doesnt-Understand-Social">Yishan Wong</a> that &#8220;the sorting of friends into buckets (friend lists) is something that only nerds do&#8221;. Given Google&#8217;s deep expertise in machine learning, I&#8217;m expecting Google to reduce this friction by giving users intelligent suggestions. <em>Full disclosure: my colleagues at LinkedIn built <a href="http://blog.linkedin.com/2011/01/24/linkedin-inmaps/">InMaps</a>, which infers communities from your social network.</em></li>
</ul>
<ul>
<li><strong>Sparks.</strong> The tagline for Sparks is &#8220;For nerding out. Together.&#8221; It feels like a positioning designed by Googlers for Googlers &#8212; you can see promotional videos <a href="http://www.youtube.com/watch?v=MRkAdTflltcgoo">here</a> and <a href="http://www.youtube.com/watch?v=0DoAl4JXhQo">here</a>. I haven&#8217;t seen much talk about Sparks, and what little commentary I&#8217;ve seen is less than gushing. I&#8217;ve experimented with it a bit from a consumption side, and I confess I&#8217;m underwhelmed. Perhaps it&#8217;s a chicken-and-egg problem &#8212; Sparks will only be useful if users populate their profiles with interests, but right now users have no incentive to do so. If Sparks is Google&#8217;s attempt to make <a href="http://en.wikipedia.org/wiki/Google_Reader">Reader</a> more social, there&#8217;s still a ways to go. <em>Full disclosure: LinkedIn has its own approach to social news, <a href="http://blog.linkedin.com/2011/03/10/linkedin-today/">LinkedIn Today</a>, which seems to be <a href="http://techcrunch.com/2011/06/30/linkedin-traffic-twitter/">doing something right</a>. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />  </em></li>
</ul>
<ul>
<li><strong>Hangouts.</strong> In plain English, Hangouts are group video chat embedded in a social network. Which sounds a lot like what Facebook is <a href="http://techcrunch.com/2011/07/01/facebook-will-launch-in-browser-video-chat-next-week-in-partnership-with-skype/">rumored</a> to be releasing this week through a partnership with Skype. Which in turn was just <a href="http://www.microsoft.com/presspass/press/2011/may11/05-10corpnewspr.mspx">acquired by Microsoft</a>. Will Apple join the party too by implementing group chat in <a href="http://www.apple.com/mac/facetime/">FaceTime</a>? Competitive dynamics aside, this is a very cool feature that hopefully won&#8217;t devolve into <a href="http://en.wikipedia.org/wiki/Chatroulette">Chatroulette</a>. Nothing to, um, disclose here.</li>
</ul>
<p>But the $64B question is whether all this will matter. Can Google+ sustainably co-exist with Facebook? Will people use both services &#8212; and, if so, how will they allocate their attention between them? Or is the success of Google+ predicated on displacing Facebook? Or Twitter? Either of those would certainly qualify as a <a href="http://en.wikipedia.org/wiki/Big_Hairy_Audacious_Goal">Big Hairy Audacious Goal</a>.</p>
<p>Like <a href="http://www.avc.com/a_vc/2011/07/why-im-rooting-for-google.html">Fred Wilson</a>, I&#8217;m rooting for Google+ to succeed &#8212; but even Fred <a href="http://www.avc.com/a_vc/2011/07/why-im-rooting-for-google.html#comment-240598057">notes</a> that he would not be able to get his family on Google+, as they are already happy with Facebook. It&#8217;s not clear to me what I can get *today* from Google+ that I can&#8217;t get from Facebook.</p>
<p>Granted, I&#8217;m not a heavy Facebook user, so I&#8217;m not the best person to ask this question. So readers, I ask you: why will or won&#8217;t you use Google+?</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/07/04/google%c2%b1/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/07/04/google%c2%b1/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>InSecret: A LinkedIn Hackday Master Tries Something Different</title>
		<link>http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/</link>
		<comments>http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/#comments</comments>
		<pubDate>Sat, 25 Jun 2011 04:48:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3692</guid>
		<description><![CDATA[LinkedIn Hackdays are an awesome opportunity for innovation &#8212; learn more about them here. But first check out this unusual entry by Hackday master Dhananjay Ragade, whose previous hacks include the LinkedIn Year in Review: Don&#8217;t worry, it&#8217;s safe for work. Well, unless you work for Linden Lab.]]></description>
			<content:encoded><![CDATA[<p>LinkedIn Hackdays are an awesome opportunity for innovation &#8212; learn more about them <a href="http://blog.linkedin.com/category/linkedin-hackdays/">here</a>. But first check out this unusual entry by Hackday master <a href="http://www.linkedin.com/in/dragade">Dhananjay Ragade</a>, whose previous hacks include the <a href="http://blog.linkedin.com/2011/02/22/linkedin-year-in-review/">LinkedIn Year in Review</a>:</p>
<p><embed width="504" height="312" src="http://www.xtranormal.com/site_media/players/jw_player_v54/player.swf" flashvars="&amp;author=drr&amp;autostart=false&amp;backcolor=0x000000&amp;date=June%2013%2C%202011&amp;description=Sheldon%20wants%20to%20know%20Jane's%20secret%20to%20her%20success.&amp;fbit.height=283&amp;fbit.visible=true&amp;fbit.width=504&amp;fbit.x=0&amp;fbit.y=0&amp;file=http%3A%2F%2Ffarmprod.content.xtranormal.com%2F2011-06-18%2Fpublish%2Fe8937f92-99f5-11e0-aece-123138070614.mp4&amp;frontcolor=0xeeeeee&amp;gapro.accountid=UA-5134028-2&amp;gapro.height=283&amp;gapro.visible=true&amp;gapro.width=504&amp;gapro.x=0&amp;gapro.y=0&amp;image=http%3A%2F%2Ffarmprod.content.xtranormal.com%2F2011-06-18%2Fpublish%2Fe8937f92-99f5-11e0-aece-123138070614.png&amp;lightcolor=0xeeeeee&amp;link=http%3A%2F%2Fwww.xtranormal.com%2Fwatch%2F12209260%2Finsecret&amp;plugins=fbit-1%2Ctweetit-1%2Cviral-2%2Cgapro&amp;screencolor=0x000000&amp;skin=http%3A%2F%2Fwww.xtranormal.com%2Fsite_media%2Fplayers%2Fjw_player_v54%2Fxn.xml&amp;title=InSecret&amp;tweetit.height=283&amp;tweetit.visible=true&amp;tweetit.width=504&amp;tweetit.x=0&amp;tweetit.y=0" allowfullscreen="true" allowscriptaccess="always" bgcolor="0x000000"></embed></p>
<p>Don&#8217;t worry, it&#8217;s safe for work. Well, unless you work for Linden Lab. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/24/insecret-a-linkedin-hackday-master-tries-something-different/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>It Just Works</title>
		<link>http://thenoisychannel.com/2011/06/23/it-just-works/</link>
		<comments>http://thenoisychannel.com/2011/06/23/it-just-works/#comments</comments>
		<pubDate>Fri, 24 Jun 2011 06:20:43 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3682</guid>
		<description><![CDATA[Given that I work for the world&#8217;s largest professional network, I take work very personally. I&#8217;m also deeply involved in LinkedIn&#8217;s hiring process, which gives me opportunities to see how people make career decisions. I thought I&#8217;d share my own perspective here. For me there are three things that matter to me about my work: [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Maslow's_hierarchy_of_needs"><img class="alignnone" title="Maslow's Hierarchy of Needs" src="http://upload.wikimedia.org/wikipedia/commons/6/60/Maslow%27s_Hierarchy_of_Needs.svg" alt="" width="491" height="369" /></a></p>
<p>Given that I work for the <a href="http://www.linkedin.com/">world&#8217;s largest professional network</a>, I take work very personally. I&#8217;m also deeply involved in LinkedIn&#8217;s hiring process, which gives me opportunities to see how people make career decisions. I thought I&#8217;d share my own perspective here.</p>
<p>Three things matter to me about my work:</p>
<ol>
<li><strong>Do I love the work I do? </strong>Does work feel like play, stimulating me intellectually and emotionally? Am I excited about the people I work with? Is work a grind, or is it something I do for fun?</li>
<li><strong>Is the work I do of value to my employer?</strong> Am I justifying my employer&#8217;s investment in me, or am I a freeloader lost in the inefficiency of corporate bureaucracy?</li>
<li><strong>Is my work making the world a better place?</strong> Specifically, is the work I do making the world more like the world I want to live in?</li>
</ol>
<p>Not everyone may share my above values, and in any case not every job can address all of these values. But I am fortunate to have found one that does, and I&#8217;m loving it. To borrow a phrase, it just works.</p>
<p>If you haven&#8217;t seen this video by Dan Pink on what motivates people, I urge you to watch it. It&#8217;s a great reminder that there is more to motivation than economic incentives.</p>
<p><iframe width="490" height="305" src="http://www.youtube.com/embed/u6XAPnuFjJc?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>Finally, I hope that you are doing work that fulfills you. As I work to grow my <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">great team at LinkedIn</a>, my mission is not only to bring great people to LinkedIn, but also to bring great work and fulfillment to great people. Whatever you do, be amazing.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/23/it-just-works/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/23/it-just-works/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Foo for Thought</title>
		<link>http://thenoisychannel.com/2011/06/18/foo-for-thought/</link>
		<comments>http://thenoisychannel.com/2011/06/18/foo-for-thought/#comments</comments>
		<pubDate>Sat, 18 Jun 2011 20:29:29 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3678</guid>
		<description><![CDATA[Last weekend I had the extraordinary privilege to attend Foo Camp, an annual gathering of about 250 Friends Of O&#8217;Reilly (aka Foo). Tim O&#8217;Reilly, Sara Winge, and their colleagues have amazing friends, as you can see if you scan this unofficial list of attendees working on big data, open government, computer security, and more generally [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone" title="Foo Camp (photo by Jeremy Zawodny)" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/06/foo-camp.jpg" alt="" width="500" height="366" /></p>
<p>Last weekend I had the extraordinary privilege to attend Foo Camp, an annual gathering of about 250 Friends Of O&#8217;Reilly (aka Foo). <a href="http://radar.oreilly.com/tim/">Tim O&#8217;Reilly</a>, <a href="http://radar.oreilly.com/sara/">Sara Winge</a>, and their colleagues have amazing friends, as you can see if you scan this <a href="http://twitter.com/#!/mrflip/foocamp/members">unofficial list of attendees</a> working on big data, open government, computer security, and more generally on the cutting edge of technology and culture (especially where the two overlap).</p>
<p>Foo Camp is an <a href="http://en.wikipedia.org/wiki/Unconference">unconference</a>, which merits some elaboration. No fees, no conference hotel (many attendees literally set up camp in the space O&#8217;Reilly provided), and no advance program aside from some preselected 5-minute <a href="http://ignite.oreilly.com/">Ignite</a> presentations. Attendees proposed and organized sessions, merging and re-arranging them to optimize for participation. It was a bit chaotic (especially the mad rush after dinner to secure session slots), but very effective.</p>
<p>The minimalist format brought out the best in participants.</p>
<p>For example, I am passionate about (i.e., against) software patents, so I organized a session about them. I did a double-take when I realized that one of the participants was <a href="http://people.ischool.berkeley.edu/~pam/">Pamela Samuelson</a>, perhaps the world&#8217;s top expert on intellectual property law. I braced myself to be schooled &#8212; as I was. But she did it gently and constructively. Specifically, she pointed me to work that her colleagues <a href="http://www.law.berkeley.edu/4457.htm">Jason Schultz</a> and <a href="http://www.law.berkeley.edu/9959.htm">Jennifer Urban</a> were doing on a defensive patent strategy for open-source software (including a <a href="http://events.stanford.edu/events/276/27687/">proposed license</a>), as well as reminding me of the <a href="http://radar.oreilly.com/2010/07/why-software-startups-decide-t.html">Berkeley Patent Survey</a> supporting the argument that software entrepreneurs only file for patents because of real or perceived pressure from their investors. I also heard war stories from lawyers who have done pro bono work against patent trolls, reinforcing my own resolve and also reassuring me that the examples I&#8217;ve seen <a href="http://thenoisychannel.com/2009/10/03/software-patents-a-personal-story/">at close range</a> are not isolated.</p>
<p>Another session asked whether we are too data driven in our work. What was notable is that this session included participants from some of the largest internet companies debating some of the most fundamental ways in which we work, e.g., do we actually learn from data or do we engage in assault by data to defend preconceived positions (cf. <a href="http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/">argumentative theory</a>). Like all of the conference, the discussion was under &#8220;frieNDA&#8221;, so I&#8217;m being intentionally vague on the specifics. But it was refreshing to see candid admission that all of us know and have experienced the dangers of manipulating an audience with data, and that there are no algorithms to enforce common sense and good faith.</p>
<p>I won&#8217;t even try to enumerate the sessions and side conversations that excited me &#8212; topics included privacy, the future of publishing, a critical analysis of geek culture, and irrational user behavior. I missed the session on data-driven parenting, though others have pointed out to me that you can only learn so much if you don&#8217;t have twins and perform <a href="http://en.wikipedia.org/wiki/A/B_testing">A/B tests</a>. The best summary is intellectual diversity and overstimulation. If you&#8217;d like to get a general sense of the discussion, check out the <a href="http://twitter.com/#!/search/%23foocamp">#foocamp</a> tweet stream. I also recommend Scott Berkun&#8217;s post on &#8220;<a href="http://www.scottberkun.com/blog/2011/what-i-learned-at-foo-camp-11/">What I learned at FOO Camp</a>&#8221;.</p>
<p>As someone who organizes the <a href="http://hcir.info/hcir-2011/">occasional</a> <a href="http://www.cikm2011.org/industryevent">event</a>, I&#8217;m intrigued by the unconference approach &#8212; especially now that I&#8217;ve experienced it first-hand. Moreover, I feel strongly that <a href="http://thenoisychannel.com/2009/08/02/are-academic-conferences-broken-can-we-fix-them/">the academic conference model needs an upgrade</a>. But I also know that open-ended, free-form discussion sessions are not a viable alternative &#8212; indeed, a big part of Foo Camp&#8217;s success was how it inspired participants to organize sessions &#8212; and to vote with their feet to attend the worthwhile ones. And of course part of that success came from inviting active, engaged participants rather than passive spectators.</p>
<p>Many of you also organize events, and I&#8217;m sure that all of you attend them. I&#8217;m curious to hear your thoughts about how to make them better, and happy to share more of what I learned at Foo Camp. After all, Foo is for (inspiring) thought.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/18/foo-for-thought/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/18/foo-for-thought/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Christos Faloutsos: Mining Billion-Node Graphs</title>
		<link>http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/</link>
		<comments>http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/#comments</comments>
		<pubDate>Thu, 09 Jun 2011 03:00:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3667</guid>
		<description><![CDATA[As promised, here is a video of CMU professor Christos Faloutsos&#8217;s recent tech talk at LinkedIn on &#8220;Mining Billion-Node Graphs&#8221;. Enjoy! And check out our next week&#8217;s open tech talk by Sreenivas Gollapudi of Microsoft Research on &#8220;A Framework for Result Diversification in Search&#8221;. ps. If you like these topics, then please talk to me [...]]]></description>
			<content:encoded><![CDATA[<p><iframe width="500" height="312" src="http://www.youtube.com/embed/GBzoNgqF-gQ?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>As promised, here is a video of CMU professor <a href="http://www.cs.cmu.edu/~christos/">Christos Faloutsos</a>&#8217;s recent tech talk at LinkedIn on &#8220;<a href="http://events.linkedin.com/Mining-Billion-Node-Graphs-LinkedIn-Tech/pub/660176">Mining Billion-Node Graphs</a>&#8221;. Enjoy!</p>
<p>And check out next week&#8217;s open tech talk by <a href="http://www.sreenivasgollapudi.com/">Sreenivas Gollapudi</a> of Microsoft Research on &#8220;<a href="http://events.linkedin.com/Framework-Result-Diversification-Search/pub/691171">A Framework for Result Diversification in Search</a>&#8221;.</p>
<p>ps. If you like these topics, then please talk to me about opportunities at LinkedIn! My group is <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">hiring</a>, as are <a href="http://www.linkedin.com/jsearch?keywords=engineering+OR+scientist+OR+research+OR+data&#038;searchLocationType=Y&#038;keepFacets=keepFacets&#038;page_num=1&#038;facet_COMPANY=1337&#038;pplSearchOrigin=MDYS&#038;sortCriteria=R">many others</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/08/christos-faloutsos-mining-billion-node-graphs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Winning the War for Software Engineering Talent</title>
		<link>http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/</link>
		<comments>http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/#comments</comments>
		<pubDate>Mon, 06 Jun 2011 01:07:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3647</guid>
		<description><![CDATA[The war for talent. It&#8217;s the latest metaphor for the challenge that tech companies face as excitement is building in Silicon Valley again. Well, not really &#8212; McKinsey coined the phrase in 1997 and used it as the title of a book published four years later. But anyone who has been trying to hire great [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone" style="margin-left: 10px; margin-right: 10px;" src="http://siliconvalley.sla.org/wp-content/uploads/2010/11/i_want_you_poster.jpg" alt="" width="206" height="230" /><img class="size-full wp-image-3648 aligncenter" title="Real Genius" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/06/realgenius.jpg" alt="" width="182" height="230" /></p>
<p>The war for talent. It&#8217;s the latest metaphor for the challenge that tech companies face as excitement is building in Silicon Valley again. Well, not really &#8212; McKinsey coined the phrase in 1997 and used it as the title of a <a href="http://www.amazon.com/War-Talent-Ed-Michaels/dp/1578514592">book</a> published four years later.</p>
<p>But anyone who has been trying to hire great software engineers in recent months knows how hard it is to do so. Particularly for folks like me who are trying to hire <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">data scientists</a> &#8212; apparently there&#8217;s a <a href="http://www.mckinsey.com/mgi/publications/big_data/index.asp">national shortage</a>. This is nothing new &#8212; as Joel Spolsky noted in a 2006 <a href="http://www.joelonsoftware.com/articles/FindingGreatDevelopers.html">post</a>, &#8220;the great software developers, indeed, the best people in every field, are quite simply never on the market.&#8221;</p>
<p>I&#8217;m not an expert (or <a href="http://blog.linkedin.com/2010/04/08/linkedin-ninja-job-title/">ninja</a>) on the subject of recruiting or employer branding in general, but I&#8217;ve seen enough of how companies go about hiring software engineers to know that we can do better. I&#8217;d like to share some of my thoughts and experiences, and I hope that you will reciprocate and share your thoughts in the comments. I&#8217;m especially interested in hearing from folks who are at universities (aka hunting grounds) or who are involved in organizing academic conferences.</p>
<p>First, let&#8217;s talk about how we measure success. As <a href="http://en.wikipedia.org/wiki/William_Thomson,_1st_Baron_Kelvin">Lord Kelvin</a> famously said, &#8220;If you can&#8217;t measure it, you can&#8217;t improve it.&#8221; I&#8217;m not going to talk about how to handle active candidates &#8212; that&#8217;s a filtering problem which, in my opinion, is much more tractable. For example, see what Joel has to say about <a href="http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html">interviewing developers</a>. Rather, I&#8217;m concerned with the challenge of discovering qualified passive candidates and converting them into active ones. Hence, I propose we make our metric the number of qualified applicants.</p>
<p>The baseline strategy is sourcing, i.e. have sourcers or hiring managers scour the world for qualified candidates (there&#8217;s an <a href="http://www.linkedin.com/hiring">app</a> for that), entice them with your best recruiting pitch, and then go hog wild on the folks who respond. The success of this strategy depends mainly on the rate at which you, your sourcers, or your hiring managers find qualified candidates &#8212; which in turn may split into the two subtasks of finding candidates and filtering them &#8212; and the conversion rate for the qualified candidates you find. Since the best candidates are often happy in their current positions, sourcing passive candidates requires a lot of work and a thick skin for rejection.</p>
<p>What are other ways to attract qualified passive candidates? Here are a few, with examples from my experience at LinkedIn:</p>
<ul>
<li><strong>Hosting events.</strong> Last week at LinkedIn, we hosted CMU professor <a href="http://www.cs.cmu.edu/~christos/">Christos Faloutsos</a>, who delivered a fantastic talk on &#8220;<a href="http://events.linkedin.com/Mining-Billion-Node-Graphs-LinkedIn-Tech/pub/660176">Mining Billion Node Graphs</a>&#8221; &#8212; a topic we thought interesting enough to justify opening up the talk to the general public. We had a few hundred guests, many of whom are precisely the kinds of folks we are trying to hire. Even more people watched the live stream online or will watch the video when we post it to YouTube (coming soon &#8212; stay tuned!). While this was not a recruiting event (we did not even announce that we are hiring), it was a great opportunity to associate LinkedIn with the hard computer science problems we solve on a daily basis.</li>
<li><strong>Sponsoring events.</strong> Sponsorship is tricky &#8212; if you&#8217;re not careful, you spend a lot of money for a glorified display ad. Sometimes sponsorship offers speaking slots as part of the package, but audiences are rightfully skeptical of speakers who have paid for their slots &#8212; especially at conferences that charge hefty fees for attendance. But sometimes sponsorship works. For example, LinkedIn was a sponsor of the <a href="http://strataconf.com/strata2011">O&#8217;Reilly Strata Conference</a>, and the perks of sponsorship complemented our earned speaker slots, helping us bring enormous visibility to our data scientist team and its recent innovations like <a href="http://blog.linkedin.com/2011/01/24/linkedin-inmaps/">InMaps</a> (we had a booth there to print attendees&#8217; InMaps) and <a href="http://thenoisychannel.com/2011/02/04/got-skills/">Skills</a> (which launched during the conference). While Strata generated few direct leads, it left a lasting impression in the <a href="http://en.wikipedia.org/wiki/Big_data">big data</a> community, and I regularly hear candidates refer to it.</li>
<li><strong>Participating in events.</strong> As the Beatles tell us, money <a href="http://en.wikipedia.org/wiki/Can't_Buy_Me_Love">can&#8217;t buy you love</a>. If you want to make a (positive) impression at a conference, you have to contribute people and ideas. This is especially true at academic conferences, where attendees quickly throw out the extra weight in their tote bags and focus on the conference&#8217;s content and professional networking opportunities. It&#8217;s great if you are Microsoft with a team of close to a thousand researchers and can <a href="http://research.microsoft.com/en-us/news/features/sigir2010-071910.aspx">dominate</a> a conference like <a href="http://sigir.org/">SIGIR</a>. But smaller companies can still make a strong impression on researchers &#8212; and especially on students who may be looking for internships or full-time positions &#8212; by taking an active role at conferences. The traditional approach is to submit papers to the main conference track &#8212; but other avenues include <a href="http://www.kdd.org/kdd2011/tutorials.shtml">tutorials</a>, <a href="http://hcir.info/hcir-2011/">workshops</a>, and <a href="http://www.cikm2011.org/industryevent">industry events</a>. Such participation is often invited, but such invitations are in turn earned by cultivating relationships with researchers &#8212; especially the ones who find themselves on organizing committees.</li>
<li><strong>Contributing to open source projects.</strong> The Search, Network, and Analytics (SNA) team at LinkedIn contributes frequently to open-source projects and publicizes some of its work at <a href="http://sna-projects.com/">http://sna-projects.com/</a>. Open source projects are a great way to earn the respect of engineers who value source over PowerPoint, especially when your employees include <a href="http://www.linkedin.com/in/allenwittenauer">committers</a> to key technologies like Hadoop. Moreover, open-source projects are social communities, so contributing to them offers opportunities for employees to interact with potential hires.</li>
<li><strong>Social media.</strong> By now, I&#8217;d like to think that marketers understand social media to simply be another set of marketing channels. But I think the territory is still pretty new for employers. Here is a simple suggestion: encourage (but do not try to force) employees to express themselves professionally online. Enforce the standard non-disclosure rules, of course, but don&#8217;t try to manage their voices. Authenticity speaks for itself &#8212; for example, look at what <a href="http://www.linkedin.com/in/adamnash">Adam Nash</a> says about LinkedIn on his <a href="http://blog.adamnash.com/?s=linkedin">personal blog</a>. Or my own posts <a href="http://thenoisychannel.com/?s=linkedin">here</a>. Engineers don&#8217;t read press releases or corporate blogs, but they do pay attention to their peers. And there&#8217;s nothing unique about blogs &#8212; the same principle applies to platforms like Twitter, Facebook, Quora, and of course LinkedIn. Not all employees enjoy being online extroverts, but those who do are not only brand ambassadors; they are also likely to eventually strike up conversations with passive candidates about employment opportunities.</li>
</ul>
<p>Finally, don&#8217;t forget to measure the results of these efforts! Some activities generate leads directly, in which case you can make an apples-to-apples comparison of their results and costs with the baseline strategy of sourcing. It&#8217;s harder to measure the longer-term effect of efforts to raise visibility, but you can at least ask candidates if they are aware of those efforts &#8212; after all, efforts to raise visibility should be visible to candidates! You can also ask candidates if those efforts were a factor in their decision to apply. These measures aren&#8217;t perfect, but they are a lot better than nothing, especially when you&#8217;re trying to decide how best to invest limited resources.</p>
<p>Of course, even an optimal strategy can&#8217;t substitute for offering a combination of interesting work, competitive compensation, and a work hard / play hard <a href="http://www.youtube.com/watch?v=PUwEEOhcK3s">culture</a>. As with all marketing efforts, you need to start with a great product. But great products don&#8217;t sell themselves: you need to invest in a combination of outbound and inbound marketing to have a fighting chance in the war for talent. Good luck! And, in case you didn&#8217;t notice, <a href="http://www.linkedin.com/jobs/jobs-Data-Scientist-1544636">we&#8217;re hiring</a>!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/06/05/winning-the-war-for-software-engineering-talent/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>I&#8217;d Like To Have An Argument Please</title>
		<link>http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/</link>
		<comments>http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/#comments</comments>
		<pubDate>Tue, 31 May 2011 03:19:17 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3639</guid>
		<description><![CDATA[If you Google [relevance theory], you&#8217;ll discover this Wikipedia entry about a theory proposed by Dan Sperber and Deirdre Wilson arguing that, in any given communication situation, the listener will stop processing as soon as he or she has found meaning that fits his or her expectation of relevance. The Wikipedia entry offers the following example of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/The_Argument_Sketch"><img class="alignnone" title="Monty Python: The Argument Sketch" src="http://upload.wikimedia.org/wikipedia/en/8/85/Argument_Clinic.png" alt="" width="400" height="312" /></a></p>
<p>If you Google [<a href="http://www.google.com/search?q=relevance+theory">relevance theory</a>], you&#8217;ll discover this <a href="http://en.wikipedia.org/wiki/Relevance_theory">Wikipedia entry</a> about a theory proposed by Dan Sperber and Deirdre Wilson arguing that, in any given communication situation, the listener will stop processing as soon as he or she has found meaning that fits his or her expectation of relevance. The Wikipedia entry offers the following example of this principle:</p>
<blockquote><p>Mary: Would you like to come for a run?</p>
<p>Bill: I&#8217;m resting today.</p>
<p>We understand from this example that Bill does not want to go for a run. But that is not what he said. He only said enough for Mary to add the context-mediated information: i.e. someone who is resting doesn&#8217;t usually go for a run. The implication is that Bill doesn&#8217;t want to go for a run today.</p></blockquote>
<p>This theory may call to mind the <a href="http://en.wikipedia.org/wiki/Gricean_maxims">Gricean Maxims</a> &#8212; indeed, Sperber and Wilson borrow heavily from Grice&#8217;s work.</p>
<p>But I mainly bring up relevance theory to introduce Sperber to those unfamiliar with him. My friend (and <a href="http://www.endeca.com/">Endeca</a> co-founder) <a href="http://facets.endeca.com/authors-2/">Pete Bell</a> recently called to my attention an article by neuroscientist <a href="http://en.wikipedia.org/wiki/Jonah_Lehrer">Jonah Lehrer</a> entitled &#8220;<a href="http://www.wired.com/wiredscience/2011/05/the-sad-reason-we-reason/">The Reason We Reason</a>&#8221;. The article reviews the <a href="http://www.fallacyfiles.org/hothandf.html">&#8220;hot hand&#8221; fallacy</a> and then proceeds to cite a new theory by Sperber and <a href="http://sites.google.com/site/hugomercier/">Hugo Mercier</a>:</p>
<blockquote><p>Reasoning is generally seen as a means to improve knowledge and make better decisions. Much evidence, however, shows that reasoning often leads to epistemic distortions and poor decisions. This suggests rethinking the function of reasoning. Our hypothesis is that the function of reasoning is argumentative. It is to devise and evaluate arguments intended to persuade.</p></blockquote>
<p>The full article by Mercier and Sperber runs over 17K words and is entitled &#8220;<a href="http://www.dan.sperber.fr/wp-content/uploads/2009/10/MercierSperberWhydohumansreason.pdf">Why do humans reason? Arguments for an argumentative theory</a>&#8221;.</p>
<p>As someone who has spent most of his professional life thinking about information retrieval in practical contexts, I automatically relate relevance theory to <a href="http://en.wikipedia.org/wiki/Relevance_(Information_Retrieval)">relevance in the context of information retrieval</a>. Relevance has been a subject of intense debate in the information science community (<a href="http://thenoisychannel.com/2008/05/05/saracevic-on-relevance-and-interaction/">Tefko Saracevic</a> tells the story wonderfully). Indeed, a key reason that I created the <a href="http://hcir.info/">HCIR workshop</a> was the belief that information retrieval researchers and practitioners (i.e., search engine developers) were placing too much emphasis on an objective notion of topical relevance, and not enough focus on the user.</p>
<p>Mercier and Sperber&#8217;s theory offers an interesting challenge to information retrieval researchers: perhaps a user&#8217;s information need is less about arriving at the truth and more about finding confirmatory evidence to support a preconceived conclusion. If so, should we adjust our notions of relevance accordingly? Also, if we evaluate or inform search quality based on observed user behavior (such as click-through behavior), then are we already inadvertently conflating topical relevance with users&#8217; confirmatory bias?</p>
<p>Many people have noted that personalization gives us the truth we want: recent examples include Robin Sloan and Matt Thompson&#8217;s <em><a href="http://www.robinsloan.com/epic/">EPIC 2014</a></em> and Eli Pariser&#8217;s <em><a href="http://www.thefilterbubble.com/">The Filter Bubble</a></em>. Despite the consensus that over-fitting information access to our personal tastes is a bad thing (perhaps even dystopian), technology seems to relentlessly push us in this direction. Moreover, some degree of personalization is clearly useful &#8212; such as prioritizing information that relates to our personal and professional interests.</p>
<p>Nonetheless, anyone working in the area of information seeking systems should be concerned with the question of the user&#8217;s goal in using that system. Many of us take for granted that the user&#8217;s main goal is truth seeking, and we design our systems accordingly. What can or should we do differently if the user&#8217;s main goal is not informative but persuasive? Is the user looking for an answer&#8230;or an <a href="http://www.youtube.com/watch?v=teMlv3ripSM">argument</a>?</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/30/id-like-to-have-an-argument-please/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Going Public</title>
		<link>http://thenoisychannel.com/2011/05/19/going-public/</link>
		<comments>http://thenoisychannel.com/2011/05/19/going-public/#comments</comments>
		<pubDate>Fri, 20 May 2011 03:53:49 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3629</guid>
		<description><![CDATA[What a day! I&#8217;ve been excited about LinkedIn from the moment I joined &#8212; and for several years before that &#8212; but today has been a unique experience. I hope our celebration extends beyond LinkedIn&#8217;s employees and investors &#8212; this is a great day for Silicon Valley, for the data scientists who are building its [...]]]></description>
			<content:encoded><![CDATA[<p><iframe width="475" height="296" src="http://www.youtube.com/embed/mCrYkEVygIs?rel=0" frameborder="0" allowfullscreen></iframe></p>
<p>What a day! I&#8217;ve been excited about LinkedIn from the moment I <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">joined</a> &#8212; and for several years before that &#8212; but today has been a unique experience. I hope our celebration extends beyond LinkedIn&#8217;s employees and investors &#8212; this is a great day for Silicon Valley, for the <a href="http://thenoisychannel.com/2011/01/04/so-you-like-big-data/">data scientists</a> who are building its most valuable companies, and for the users who are benefiting from it all. I am proud and deeply grateful to be a part of this extraordinary adventure. My thanks to my hundreds of incredible colleagues and to the <a href="http://blog.linkedin.com/2011/03/22/linkedin-100-million/">100M users</a> who have made it possible.</p>
<p>ps. Yes, we are still <a href="http://www.linkedin.com/jobs?viewJob=&#038;jobId=1544636">hiring</a>, so please contact me if you&#8217;re the kind of person who loves turning data into gold. And if you are local, check out Christos Faloutsos&#8217;s upcoming tech talk on <a href="http://bit.ly/graphmine">Mining Billion Node Graphs</a>, which will take place at LinkedIn on June 2 and is open to the public.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/19/going-public/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/19/going-public/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>In Search Of Structure</title>
		<link>http://thenoisychannel.com/2011/05/15/in-search-of-structure/</link>
		<comments>http://thenoisychannel.com/2011/05/15/in-search-of-structure/#comments</comments>
		<pubDate>Sun, 15 May 2011 19:24:53 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3607</guid>
		<description><![CDATA[A couple of weeks ago, I participated in a summit that Greylock Partners organized for its portfolio companies at LinkedIn to discuss the power of data. Invited participants represented some of the most interesting &#8220;big data&#8221; companies in Silicon Valley, including Google, Facebook, Pandora, Cloudera, and Zynga. Discussion took place under the Chatham House Rule, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.iws.org/images/search.gif"><img class="alignnone" title="In Search Of" src="http://www.iws.org/images/search.gif" alt="" width="185" height="182" /></a><a href="http://www.geo.arizona.edu/xtal/geos306/Image1.gif"><img class="alignnone" title="Structure" src="http://www.geo.arizona.edu/xtal/geos306/Image1.gif" alt="" width="243" height="182" /></a></p>
<p style="text-align: left;">A couple of weeks ago, I participated in a summit that <a href="http://www.greylock.com/">Greylock Partners</a> organized for its <a href="http://www.greylock.com/portfolio/portfolio/">portfolio</a> companies at <a href="http://www.linkedin.com/">LinkedIn</a> to discuss the power of data. Invited participants represented some of the most interesting &#8220;<a href="http://en.wikipedia.org/wiki/Big_data">big data</a>&#8221; companies in Silicon Valley, including Google, Facebook, Pandora, Cloudera, and Zynga. Discussion took place under the <a href="http://en.wikipedia.org/wiki/Chatham_House_Rule">Chatham House Rule</a>, so I&#8217;m not at liberty to share much detail. But I can say that there were energetic conversations about metrics, tools, and (of course) <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1544636">hiring</a>.</p>
<p>One of the participants was Google researcher <a href="http://www.cs.washington.edu/homes/alon/">Alon Halevy</a>, who generously shared his presentation on <a href="http://www.google.com/fusiontables/public/tour/index.html">Fusion Tables</a> with me with permission to re-share it <a href="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/05/fusionTablesMay5.pdf">here</a>.</p>
<p>Fusion Tables allow the general public to upload, visualize, and share structured data. They are particularly useful for journalists who want to distill compelling stories from data &#8212; indeed, <em>The Guardian</em>&#8217;s <a href="http://www.guardian.co.uk/profile/simonrogers">Simon Rogers</a> has used Fusion Tables to visualize and interpret everything from <a href="http://www.guardian.co.uk/news/datablog/2011/mar/14/nuclear-power-plant-accidents-list-rank">nuclear power plant accidents</a> to <a href="http://www.guardian.co.uk/news/datablog/2010/nov/29/wikileaks-cables-data">Wikileaks</a>.</p>
<p>After his presentation, I asked Alon for his thoughts on why we haven&#8217;t seen an encyclopedic structured data repository comparable in scope and scale to Wikipedia. Alon offered that structured data is brittle &#8212; its value tends to depend more on context than does the unstructured content that populates Wikipedia. I agree in part &#8212; for example, consider this <a href="http://www.nypost.com/p/news/local/posh_nabes_get_bus_ted_LS1oa34dj4q4SoJPkY9FZK">map</a> of Brooklyn bus stops that were slated for elimination last summer. Such data is useful in a narrow context, but hardly encyclopedic.</p>
<p>But what about <a href="http://wiki.freebase.com/wiki/What_is_Freebase%3F">Freebase</a> and <a href="http://dbpedia.org/About">DBpedia</a>? Freebase is an open repository of structured data associated with about 20 million topics. DBpedia describes itself as &#8220;a community effort to extract structured information from Wikipedia and to make this information available on the Web.&#8221; While these tools have seen some use by developers (especially in the <a href="http://semanticweb.org/">semantic web community</a>), they have not achieved mainstream adoption. Perhaps data marketplaces like <a href="http://www.factual.com/">Factual</a> and <a href="http://www.infochimps.com/">Infochimps</a> will be successful as for-profit businesses, but the question remains why we don&#8217;t have a Wikipedia-scale success story for public structured data.</p>
<p>I think the problem is easiest to frame in <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> terms. Wikipedia is all about <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a>, but not so much about <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a>. Let me elaborate.</p>
<p>Wikipedia represents a collective attempt to achieve precision at the level of individual entries. Contributor / editors correct mistakes and argue over the details of content and tone. But coverage is a much lower priority. When in doubt, the Wikipedia collective assumes that information is not notable enough to justify inclusion. Thus Wikipedia errs on the side of precision rather than recall when it comes to meeting the information needs of its users.</p>
<p>This arrangement works well for a typical web user who seeks out information by using Google web search as an interface to discover Wikipedia articles. But structured data is about sets, not just individuals. It does me no good to see aggregate statistics about a set of entities if the set is erratically populated (e.g., Wikipedia&#8217;s list of <a href="http://en.wikipedia.org/wiki/Category:Companies_established_in_1999">companies established in 1999</a> or Freebase&#8217;s list of those <a href="http://www.freebase.com/view/user/masouras/default_domain/views/companies_founded_after_2000">founded after 2000</a>).</p>
<p>In the June 2009 SIGIR Forum, University of Melbourne researchers Justin Zobel, Alistair Moffat, and Laurence Park argued &#8220;<a href="http://www.sigir.org/forum/2009J/2009j-sigirforum-zobel.pdf">against recall</a>&#8221;, concluding that they could find &#8220;no justification for implicit or explicit use of recall as a measure of search satisfaction.&#8221; I posted a rebuttal entitled &#8220;<a href="http://thenoisychannel.com/2009/07/17/in-defense-of-recall/">In Defense of Recall</a>&#8221;, arguing that recall is much more useful as a measure for set retrieval than for ranked retrieval. Revisiting this argument two years later, I can see that it holds even more strongly if we are interested in structured data where we want to reason about aggregate properties of sets.</p>
<p>Back when we both worked at <a href="http://www.endeca.com/">Endeca</a>, my colleague <a href="http://www.linkedin.com/in/robgonzalez">Rob Gonzalez</a> described structured data repositories as a public good that no one is ever willing to pay for. I&#8217;m an optimist by nature, but in this case I fear he has a point. It takes a lot of work to build something useful, and no one seems to have addressed the challenge of incenting people to contribute this work for either economic or altruistic motives.</p>
<p>Or perhaps we&#8217;ll just have to wait for the holy grail of information extraction algorithms to structure the world&#8217;s information for us? Ironically, that&#8217;s not even included on Wikipedia&#8217;s list of <a href="http://en.wikipedia.org/wiki/AI-complete">AI-complete</a> problems.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/15/in-search-of-structure/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/15/in-search-of-structure/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
		<item>
		<title>Announcing HCIR 2011!</title>
		<link>http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/</link>
		<comments>http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/#comments</comments>
		<pubDate>Sun, 08 May 2011 04:24:24 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3598</guid>
		<description><![CDATA[As regular readers know, I&#8217;ve been co-organizing annual workshops on Human-Computer Interaction and Information Retrieval since creating the first HCIR workshop in 2007. These have been a huge success, not only bridging the gap between IR and HCI, but also bringing together researchers and practitioners to address concerns shared by both communities. Past keynote speakers have included [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://isquared.files.wordpress.com/2011/03/wordle.jpg"><img class="alignnone" title="HCIR Wordle (via Tony Russell-Rose)" src="http://isquared.files.wordpress.com/2011/03/wordle.jpg" alt="" width="524" height="254" /></a></p>
<p>As regular readers know, I&#8217;ve been co-organizing annual workshops on <a href="http://hcir.info/">Human-Computer Interaction and Information Retrieval</a> since creating the first HCIR workshop in <a href="http://projects.csail.mit.edu/hcir/">2007</a>. These have been a huge success, not only bridging the gap between <a href="http://en.wikipedia.org/wiki/Information_retrieval">IR</a> and <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_interaction">HCI</a>, but also bringing together researchers and practitioners to address concerns shared by both communities. Past keynote speakers have included such information science luminaries as <a href="http://en.wikipedia.org/wiki/Susan_Dumais">Susan Dumais</a>, <a href="http://en.wikipedia.org/wiki/Ben_Shneiderman">Ben Shneiderman</a>, and <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a>.</p>
<p>Every workshop has improved on the previous year&#8217;s, and <a href="http://hcir.info/hcir-2011/">HCIR 2011</a>, which will take place on Thursday, October 20, will be no exception.</p>
<p>Our venue will be <a href="http://maps.google.com/maps/place?cid=1017478923201951099">Google&#8217;s headquarters</a> in Mountain View, California. We could hardly imagine a more appropriate venue: Google has done more than any other company to contribute to everyday information access. Google has been extremely generous as a host and sponsor (other sponsors include <a href="http://www.endeca.com/">Endeca</a> and <a href="http://research.microsoft.com/">Microsoft Research</a>), and its location in the heart of Silicon Valley is ideal for attracting researchers and practitioners building the future of HCIR.</p>
<p>Our keynote speaker will be <a rel="nofollow" href="http://ils.unc.edu/~march/" target="_blank">Gary Marchionini</a>, Dean of the School of Information and Library Science at the University of North Carolina at Chapel Hill. Gary coined the phrase &#8220;human–computer information retrieval&#8221; in a lecture entitled &#8220;<a href="http://www.asis.org/Bulletin/Jun-06/marchionini.html">Toward Human-Computer Information Retrieval</a>&#8221;, in which he asserted that &#8220;HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.&#8221; We are honored to have Gary deliver this year&#8217;s keynote.</p>
<p>But of course the main attraction is the contribution of participants. This year we invite three types of papers: position papers, research papers and challenge reports. Possible topics for discussion and presentation at the workshop include, but are not limited to:</p>
<ul>
<li><span style="color: #000000;">Novel interaction techniques for information retrieval.</span></li>
<li><span style="color: #000000;">Modeling and evaluation of interactive information retrieval.</span></li>
<li><span style="color: #000000;">Exploratory search and information discovery.</span></li>
<li><span style="color: #000000;">Information visualization and visual analytics.</span></li>
<li><span style="color: #000000;">Applications of HCI techniques to information retrieval needs in specific domains.</span></li>
<li><span style="color: #000000;">Ethnography and user studies relevant to information retrieval and access.</span></li>
<li><span style="color: #000000;">Scale and efficiency considerations for interactive information retrieval systems.</span></li>
<li><span style="color: #000000;">Relevance feedback and active learning approaches for information retrieval.</span></li>
</ul>
<p><span style="color: #000000;">Demonstrations of systems and prototypes are particularly welcome.</span></p>
<p>Building on the success of <a href="http://hcir.info/hcir-2010/challenge">last year&#8217;s HCIR Challenge</a> to address historical exploration of a news archive, <a href="http://hcir.info/hcir-2011/challenge">this year&#8217;s HCIR Challenge</a> will focus on the problem of information availability. The corpus for the Challenge will be the <a href="http://citeseerx.ist.psu.edu/">CiteSeer</a> digital library of scientific literature.</p>
<p>For more information about the workshop, including how to submit papers or participate in the challenge, please visit the <a href="http://hcir.info/hcir-2011/">HCIR 2011 website</a>.</p>
<p>Here are the key dates for submitting position and research papers:</p>
<ul>
<li>Submission deadline (position and research papers): <strong>July 31</strong></li>
<li>Notification of acceptance decision: <strong>September 8</strong></li>
<li>Presentations and poster session at workshop:<strong> October 20</strong></li>
</ul>
<p>Key dates for Challenge participants:</p>
<ul>
<li>Request access to corpus (contact <a href="mailto:dtunkelang@gmail.com">me</a>) deadline: <strong>June 19</strong></li>
<li>Freeze system and submit brief description: <strong>September 25</strong></li>
<li>Submit videos or screenshots demonstrating systems on example tasks: <strong>October 9</strong></li>
<li>Live demonstrations at workshop: <strong>October 20</strong></li>
</ul>
<p>I&#8217;m looking forward to this year&#8217;s submissions, and to a great workshop in October. I hope to see many of you there!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/05/07/announcing-hcir-2011/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>CFP: CIKM 2011 Industry Event</title>
		<link>http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/</link>
		<comments>http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/#comments</comments>
		<pubDate>Sun, 01 May 2011 00:43:10 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3594</guid>
		<description><![CDATA[As I posted a few months ago, I&#8217;m organizing the Industry Event at CIKM 2011 with Tony Russell-Rose. We have a great set of keynotes lined up: Stephen Robertson (Microsoft Research) John Giannandrea (Google) Jeff Hammerbacher (Cloudera) Peter Jackson (Thomson Reuters) We&#8217;re also looking for submissions from industry researchers and practitioners. The submission deadline is June 21. Here is [...]]]></description>
			<content:encoded><![CDATA[<div>
<p><a href="http://www.cikm2011.org/node/20"><img title="CIKM 2011" src="http://www.cikm2011.org/sites/default/files/cikm2011_craigm_v1_logo.jpg" alt="" width="576" height="91" /></a></p>
<p>As I posted a few months ago, I&#8217;m organizing the <a href="http://www.cikm2011.org/node/20">Industry Event</a> at <a href="http://www.cikm2011.org/">CIKM 2011</a> with <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>. We have a great set of keynotes lined up:</p>
<ul>
<li><a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a> (Microsoft Research)</li>
<li><a href="http://www.freebase.com/view/en/john_giannandrea">John Giannandrea</a> (Google)</li>
<li><a href="http://jeffhammerbacher.com/">Jeff Hammerbacher</a> (Cloudera)</li>
<li><a href="http://www.jacksonpeter.com/">Peter Jackson</a> (Thomson Reuters)</li>
</ul>
<p>We&#8217;re also looking for submissions from industry researchers and practitioners. The submission deadline is <strong>June 21</strong>.</p>
<p>Here is a copy of the <a href="http://www.cikm2011.org/node/20">call for papers</a>:</p>
<p>This year’s CIKM conference will include an Industry Event, which will be held during the regular conference program in parallel with the technical tracks.</p>
<p>The Industry Event&#8217;s objectives are twofold. The first objective is to present the state-of-the-art in information retrieval, knowledge management, databases, and data mining, delivered as keynote talks by influential technical leaders who work in industry. The second objective is to present interesting, novel and innovative industry developments in these areas.</p>
<p>Industry authors are invited to prepare proposals for presenting interesting, novel and innovative ideas, and submit these to <a href="mailto:industry@cikm2011.org">industry@cikm2011.org</a> by June 21st 2011. The proposals should contain (with respective lengths):</p>
<ul>
<li>Short company portrait (125 words)</li>
<li>Short CV of the presenter (125 words)</li>
<li>Title and abstract of the presentation (250 words)</li>
<li>Reasons why the presentation should be interesting to the CIKM audience</li>
</ul>
<p>When submitting a proposal, please bear in mind the following:</p>
<ul>
<li>Ensure the presentation is relevant to the CIKM audience (the Call for Papers gives a good idea of the conference scope).</li>
<li>Try to highlight interesting R&amp;D challenges in the work you present. Please do not present a sales pitch.</li>
<li>All slides will be made public (no confidential information on the slides; you will be expected to ensure your slides are approved by your company before being presented).</li>
<li>Presenters may opt to have their presentation videoed and made public, and if so, the presenter will be asked to sign a release form.</li>
</ul>
<p>We look forward to receiving your submissions, and welcoming you to the CIKM 2011 Conference and Industry Event.</p>
<p><strong>Important dates:</strong><br />
21 June 2011:	 Industry Event paper proposals due<br />
19 July 2011:	 Notifications sent<br />
27 October 2011:	Industry Event<br />
24-28 October 2011:	CIKM conference</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/30/cfp-cikm-2011-industry-event/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CFP: IEEE Internet Computing Special Issue on Context-Aware Computing</title>
		<link>http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/</link>
		<comments>http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/#comments</comments>
		<pubDate>Sun, 01 May 2011 00:17:22 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3585</guid>
		<description><![CDATA[Pankaj Mehra and I are guest editors for an upcoming special issue of IEEE Internet Computing with the topic &#8220;Beyond Search: Context-Aware Computing&#8221;. Here is a copy of the call for papers: Context is the unstated actor in human communications, actions, and situations. It makes our communication efficient, our commands actionable, and our situations understandable to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.computer.org/portal/web/internet/home"><img class="alignnone" title="IEEE Internet Computing" src="http://www.computer.org/portal/image/image_gallery?uuid=9a783503-bfa2-47ca-aa63-6d02ca87c662&amp;groupId=889131&amp;t=1256675935719" alt="" width="481" height="55" /></a></p>
<p><a href="http://www.linkedin.com/in/pankajmehra">Pankaj Mehra</a> and I are guest editors for an upcoming special issue of <em><a href="http://www.computer.org/portal/web/internet/home">IEEE Internet Computing</a></em> with the topic &#8220;<a href="http://www.computer.org/portal/web/computingnow/iccfp2">Beyond Search: Context-Aware Computing</a>&#8221;.</p>
<p>Here is a copy of the call for papers:</p>
<p>Context is the unstated actor in human communications, actions, and situations. It makes our communication efficient, our commands actionable, and our situations understandable to the people, organizations, and devices that provide us with content or services. The increased embedding of technology into our personal and social environments drives a need for context-aware computing.</p>
<p>Context-aware computing offers mobile Internet users an experience that goes beyond user-initiated search and location-based services. Context awareness sharpens relevance when responding to user-initiated actions (such as product search and support calls). It also enables proactive communications through analysis of a user’s behavior and environment, thereby forming the basis for key business imperatives targeting customer-engagement systems. Even greater opportunity arises from context use in systems that can make sense of and engage in customer dialogs and forums.</p>
<p>This special issue seeks original articles that support and illustrate context use in creating enhanced user experiences. Sample topics include</p>
<ul>
<li>proactive, contextualized delivery of information, alerts, and advertisements;</li>
<li>context-mediated Web service orchestration, yielding actionable interpretation of spoken high-level commands;</li>
<li>system architecture, economics, and ecosystems for comprehensive capture, representation, communication, gathering, and brokering the larger user context;</li>
<li>systems of engagement that treat discourse as text plus context and process textual communication as an event in which linguistic, cognitive, and social actions converge; and</li>
<li>reasoning and knowledge representation mechanisms that use context in selecting the body of knowledge to use, the level of detail to model, and the point of view with which to communicate and interpret text and data.</li>
</ul>
<p>All submissions must be original manuscripts of fewer than 5,000 words, focused on Internet technologies and implementations. All manuscripts are subject to peer review on both technical merit and relevance to <em>IC</em>’s international readership—primarily system and software design engineers. We do not accept white papers, and we discourage strictly theoretical or mathematical papers. To submit a manuscript, please log on to <a href="https://mc.manuscriptcentral.com/ic-cs" target="_blank">ScholarOne (https://mc.manuscriptcentral.com:443/ic-cs)</a> to create or access an account, which you can use to log on to <a href="http://www.computer.org/portal/web/peerreviewmagazines/acinternet"><em>IC</em>’s Author Center</a> and upload your submission.</p>
<p>I hope some of you will submit articles in time for the <strong>June 15</strong> deadline, and Pankaj and I look forward to reviewing them.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/30/cfp-ieee-internet-computing-special-issue-on-context-aware-computing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Identifying Influencers on Twitter</title>
		<link>http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/</link>
		<comments>http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/#comments</comments>
		<pubDate>Sun, 17 Apr 2011 02:52:43 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3567</guid>
		<description><![CDATA[One of the perks of working at LinkedIn is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a WSDM 2011 paper on &#8220;Identifying &#8216;Influencers&#8217; on Twitter&#8221; by Eytan Bakshy, Jake Hofman, Winter Mason, and Duncan Watts. It&#8217;s great to see the folks at [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://darmano.typepad.com/logic_emotion/2006/08/levels_of_influ.html"><img class="alignnone size-full wp-image-3569" title="Levels of Influence (David Armano)" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/04/levels-of-influence.gif" alt="" width="418" height="418" /></a><br />
One of the perks of <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1544636">working at LinkedIn</a> is being surrounded by intellectually curious colleagues. I recently joined a reading group and signed up to lead our discussion of a <a href="http://www.wsdm2011.org/">WSDM 2011</a> paper on &#8220;<a href="http://research.yahoo.com/files/bakshy_wsdm.pdf">Identifying &#8216;Influencers&#8217; on Twitter</a>&#8221; by <a href="http://www-personal.umich.edu/~ebakshy">Eytan Bakshy</a>, <a href="http://research.yahoo.com/Jake_Hofman">Jake Hofman</a>, <a href="http://research.yahoo.com/Winter_Mason">Winter Mason</a>, and <a href="http://research.yahoo.com/Duncan_Watts">Duncan Watts</a>. It&#8217;s great to see the folks at Yahoo! Research doing cutting-edge work in this space.</p>
<p>I thought I&#8217;d prepare for the discussion by sharing my thoughts here. Perhaps some of you will even be kind enough to add your own ideas, which I promise to share with the reading group.</p>
<p>I encourage you to read the paper, but here&#8217;s a summary of its results:</p>
<ul>
<li>A user&#8217;s influence on Twitter is the extent to which that user can cause diffusion of a posted URL, as measured by reposts propagated through follower edges in Twitter&#8217;s directed social graph.</li>
<li>The best predictors of future total influence are follower count and past local influence, where local influence refers to the average number of reposts by that user’s immediate followers, and total influence refers to average total cascade size.</li>
<li>The content features of individual posts do not have identifiable predictive value.</li>
<li>Barring a high per-influencer acquisition cost, the most cost-effective strategy for buying influence is to target users of average influence.</li>
</ul>
<p>Let&#8217;s dive in a bit deeper.</p>
<p>The definitions of influence and influencers are, by the authors&#8217; own admission, narrow and arbitrary. There are many ways one could define influence, even within the context of Twitter use. But I agree with the authors that these definitions have enough <a href="http://en.wiktionary.org/wiki/verisimilitude">verisimilitude</a> to be useful, and their simplicity facilitates quantitative analysis.</p>
<p>It&#8217;s hardly surprising that past influence is a strong predictor of future influence. But it might seem counterintuitive that, for predicting future total influence, past local influence is more informative than past total influence. The authors suggest the explanation that most non-trivial cascades are of depth 1 &#8212; i.e., total influence is mostly local influence. But at most that would make the two features equally informative, and total influence should still be a mildly better predictor.</p>
<p>I suspect that another factor is in play &#8212; namely, that the difference between local influence and total influence reflects the unpredictable and rare virality of the content (e.g., <a href="http://networkeffect.allthingsd.com/20110415/random-facebook-users-question-gets-four-million-votes/">a random Facebook Question generated 4M votes</a>). If this hypothesis is correct, then past local influence factors out this unpredictable factor and is thus a better predictor of both future local influence and future total influence.</p>
<p>I&#8217;m a bit surprised that follower count supplies additional informative value beyond the past local influence; after all, local influence should already reflect the extent to which the followers are being influenced. It&#8217;s possible that past influence lags the follower count, since it does not sufficiently weigh the potential contributions of more recent followers. But another possibility is one analogous to the predictive value of past local vs. global influence: past local influence may include an unpredictable content factor which follower count factors out.</p>
<p>Of course, I can&#8217;t help suggesting that <a href="http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/">TunkRank</a> might be a more useful indicator than follower count. Unfortunately the authors don&#8217;t seem to be aware of the TunkRank work &#8212; or perhaps they preferred to restrict their attention to basic features.</p>
<p>I&#8217;m not surprised by the inability to exploit content features to predict influence. If it were easy to generate viral content, <a href="http://en.wikipedia.org/wiki/Get-rich-quick_scheme">everyone would do it</a>. Granted, a deeper analysis might squeeze out a few features (like those suggested in the <a href="http://www.buddymedia.com/newsroom/?p=9335">Buddy Media report</a>), but I don&#8217;t think there are any silver bullets here.</p>
<p>Finally, the authors consider the question of designing a cost-effective strategy to buy influence. The authors assume that the cost of buying influence can be modeled in terms of two parameters: a per-influencer acquisition cost (which is the same for each influencer) and a per-follower cost for each influencer. They conclude that, until the acquisition cost is extremely high (i.e., over 10,000 times the per-follower cost), the most cost-efficient influencers are those of average influence. In other words, there&#8217;s no reason to target the <a href="http://www.amazon.com/Influentials-American-Tells-Other-Where/dp/0743227298">small number of highly influential users</a>.</p>
<p>The authors may be arriving at the right conclusion (Watts&#8217;s <a href="http://research.yahoo.com/files/w_d_JCR.pdf">earlier work</a> with <a href="http://www.uvm.edu/~pdodds/">Peter Dodds</a>, which the paper cites, questions the &#8220;influentials&#8221; hypothesis), but I&#8217;m not convinced by their economic model of an influence market. It may be the case that professional influencers are trying to peddle their followers&#8217; attention on a per-follower basis &#8212; there are <a href="http://www.buytwitterfollowers.org/">sites</a> <a href="http://twitter1k.com/">that</a> <a href="http://www.socialkik.com/twitter_promo.html">offer</a> <a href="http://www.twitterfollowersshop.com/">this</a> <a href="http://usocial.net/twitter_marketing/">model</a>.</p>
<p>But why should anyone believe that an influencer&#8217;s value is proportional to his or her number of followers? The authors&#8217; own work suggests that past local influence is a more valuable predictor than follower count, and again they might want to look at TunkRank.</p>
<p>Regardless, I&#8217;m not surprised that a fixed per-follower cost makes users with high follower counts less cost-effective, as I subscribe to its corollary: as a user&#8217;s follower count goes up, the per-follower value diminishes. I haven&#8217;t done the analysis, but I believe that the ratio of a user&#8217;s TunkRank to the user&#8217;s follower count tends to go down as a user&#8217;s follower count goes up. A more interesting research (and practical) question would be to establish a correctly calibrated model of influencer value and then explore portfolio strategies.</p>
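<p>To make the paper&#8217;s cost model concrete, here is a minimal Python sketch. It assumes only what is described above: a fixed acquisition cost per influencer plus a cost proportional to follower count. The <code>Influencer</code> class, the function names, and every number are hypothetical illustrations of mine, not data or code from the paper.</p>

```python
from dataclasses import dataclass


@dataclass
class Influencer:
    name: str
    followers: int
    expected_influence: float  # hypothetical: average cascade size per post


def sponsorship_cost(followers, acquisition_cost, per_follower_cost):
    """Two-parameter cost: a fixed fee plus a fee per follower."""
    return acquisition_cost + per_follower_cost * followers


def influence_per_dollar(influencer, acquisition_cost, per_follower_cost):
    """Expected influence bought per unit of cost for one influencer."""
    cost = sponsorship_cost(influencer.followers, acquisition_cost, per_follower_cost)
    return influencer.expected_influence / cost


def best_target(candidates, acquisition_cost, per_follower_cost):
    """Return the candidate with the highest influence per unit of cost."""
    return max(
        candidates,
        key=lambda c: influence_per_dollar(c, acquisition_cost, per_follower_cost),
    )


# Made-up candidates: the celebrity has 5,000 times the followers of the
# ordinary user but only 1,000 times the expected influence.
ordinary = Influencer("ordinary", followers=200, expected_influence=0.5)
celebrity = Influencer("celebrity", followers=1_000_000, expected_influence=500.0)

if __name__ == "__main__":
    per_follower_cost = 0.01
    # Try a free acquisition and one costing 100,000 times the per-follower fee.
    for acquisition_cost in (0.0, 1000.0):
        winner = best_target([ordinary, celebrity], acquisition_cost, per_follower_cost)
        print(acquisition_cost, winner.name)
```

<p>With these made-up numbers, the ordinary user is the better buy when acquisition is free, and the celebrity only wins once the fixed cost dwarfs the per-follower cost. The sketch takes expected influence as an input, so it says nothing about how to estimate it, which is where I think the real modeling question lies.</p>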
<p>In any case, it&#8217;s an interesting paper, and I look forward to discussing it with my colleagues next week. Of course, I&#8217;m happy to discuss it here in the meantime. If you&#8217;re in my reading group, feel free to chime in. And if you&#8217;re not in my reading group, consider joining. We do have <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1544636">openings</a>. <img src='http://thenoisychannel.com/wordpress/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/16/identifying-influencers-on-twitter/feed/</wfw:commentRss>
		<slash:comments>25</slash:comments>
		</item>
		<item>
		<title>Social Utility, +/- 25%</title>
		<link>http://thenoisychannel.com/2011/04/14/social-utility-25/</link>
		<comments>http://thenoisychannel.com/2011/04/14/social-utility-25/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 04:19:28 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3550</guid>
		<description><![CDATA[I like Google&#8230; I&#8217;ve been a regular Google user since the day I first discovered its existence in 1999. Indeed, I&#8217;ve consistently found Google to be the most useful service on the web. That&#8217;s not love, but it&#8217;s a very strong +1. Moreover, I&#8217;d say that my preference for Google is an informed one. I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.businessinsider.com/heres-the-memo-telling-all-google-employees-their-2011-pay-depends-on-google-sucking-less-at-social-2011-4"><img class="alignnone" title="FAQ for Google employees about the &quot;social&quot; bonus (via Business Insider)" src="http://static3.businessinsider.com/image/4d9e557eccd1d599390f0000-915-581/multiplier-faq.jpg" alt="" width="514" height="326" /></a></p>
<h3>I like Google&#8230;</h3>
<p>I&#8217;ve been a regular Google user since the day I first discovered its existence in 1999. Indeed, I&#8217;ve consistently found Google to be the most useful service on the web. That&#8217;s not love, but it&#8217;s a very strong <a href="http://www.google.com/+1/button/">+1</a>.</p>
<p>Moreover, I&#8217;d say that my preference for Google is an informed one. I&#8217;ve given all of the major search engines a <a href="http://thenoisychannel.com/2009/06/01/banging-on-bing-a-bummer/">fair chance</a>, and even tried a fair number of <a href="http://thenoisychannel.com/2008/10/16/duck-duck-go/">obscure</a> <a href="http://thenoisychannel.com/2009/03/15/kosmix-im-impressed/">ones</a>. They all have their strengths, but none have delivered enough utility to me to justify the cognitive load of using more than one search engine for the open web.</p>
<h3>&#8230;but I don&#8217;t need Google.</h3>
<p>Nonetheless, I know that, if Google disappeared tomorrow or became <a href="http://www.mobilecrunch.com/2010/09/09/verizon-to-bing-i-choose-you/">inconvenient to access</a>, I&#8217;d be content with one of its competitors. I have no particular investment in Google beyond brand loyalty.</p>
<p>Actually, that&#8217;s not entirely true. I could easily walk away from Google search, but I&#8217;d be apoplectic if I suddenly lost access to my Gmail account &#8212; much as if I lost access to my LinkedIn or Twitter accounts. Indeed, Gmail is the only way in which Google has me locked in, but I don&#8217;t see my Gmail account as entangled with my access to Google&#8217;s other services.</p>
<p>Perhaps that&#8217;s not a bug but a feature: after all, Google trumpets the virtues of <a href="http://googleblog.blogspot.com/2009/12/meaning-of-open.html">&#8220;open&#8221;</a> and the portability of user data (including Gmail) through the <a href="http://www.dataliberation.org/">Data Liberation Front</a>. Nonetheless, it&#8217;s no secret that Google has a major case of <a href="http://abclocal.go.com/kabc/story?section=news/consumer&amp;id=8072533">Facebook envy</a>. And if <a href="http://www.businessinsider.com/heres-the-memo-telling-all-google-employees-their-2011-pay-depends-on-google-sucking-less-at-social-2011-4">rumors</a> hold, Google is now making the success of its social strategy a major component in all employee compensation.</p>
<h3>Social is Give to Get.</h3>
<p>Google critics often assert that <a href="http://www.google.com/search?q=%22google+doesn't+get+social%22">Google doesn&#8217;t get social</a>. But I think the problem isn&#8217;t so much with what Google gets as what it gives. When it comes to social, you have to give to get. That is, to get data and engagement, you have to provide social utility.</p>
<p>To start off, Google would love to know <strong>who you are</strong>. That&#8217;s why it developed <a href="http://www.google.com/support/accounts/bin/answer.py?answer=97703">Google Profiles</a> in 2007. People are more than willing to provide data about who they are, as proven by the hundreds of millions of people who create profiles on Facebook and LinkedIn. Perhaps Google was a little bit late to the game. More likely, people didn&#8217;t see enough utility in creating Google profiles. Facebook, on the other hand, helps people be found by their friends and family in a context designed for social interaction. LinkedIn offers people the opportunity to be found by those who can help them professionally: colleagues, classmates, potential employers, etc. Google didn&#8217;t give people much reason to invest effort &#8212; in fact it seems to treat Profiles as a dumping ground populated by Google&#8217;s other products, rather than a valuable piece of online real estate embedded in a living social context. Not surprisingly, users invest their efforts elsewhere.</p>
<p>Google would also love to know <strong>where you are</strong> and <strong>where you&#8217;ve been</strong> &#8212; that&#8217;s why Google created <a href="http://techcrunch.com/2009/02/04/broadcast-your-location-to-friends-with-google-latitude/">Latitude</a> in 2009. Moreover, Google developed this pioneering location-based service as a complement to Google Maps, perhaps the best product Google has produced outside of search. Given its dominance in mapping services, directions, and local search, Google should be the leader of all things local. And yet, while Latitude has flopped, Foursquare &#8212; which launched in the same year as a tiny startup after Google acquired and shut down its <a href="http://en.wikipedia.org/wiki/Dodgeball_(service)">previous incarnation</a> &#8212; succeeded in defining location-based services as a category. Before Foursquare, the idea of a service tracking your location was one that most of us associated with <a href="http://www.lojack.com/">Lo-Jack</a> and <a href="http://en.wikipedia.org/wiki/Nineteen_Eighty-Four">Big Brother</a> &#8212; if not with modern totalitarian regimes. Yet, by making a game out of &#8220;checking in&#8221; to venues, Foursquare inspired its users to willingly &#8212; and eagerly! &#8212; share and publish their whereabouts. It&#8217;s unclear whether this model will create sustained interest (cf. Mark Watkins&#8217;s analysis at <a href="http://www.readwriteweb.com/archives/2011_the_year_the_check-in_died.php">ReadWriteWeb</a>), but Foursquare&#8217;s success thus far is predicated on offering social utility in exchange for data and attention.</p>
<p>Of course, Google also wants to know <strong>what you like</strong>. That&#8217;s why Google developed <a href="http://thenoisychannel.com/2008/11/21/google-searchwiki-an-interesting-take-on-pim/">SearchWiki</a> (RIP), <a href="http://google-latlong.blogspot.com/2010/11/discover-yours-local-recommendations.html">Hotpot</a> (now <a href="http://googleblog.blogspot.com/2011/04/hotpot-is-going-places.html">merged into Places</a>), and most recently <a href="http://www.google.com/+1/button/">+1</a>. As Amazon, Facebook, Netflix, and Yelp have demonstrated, people aren&#8217;t shy about sharing their opinions publicly, given the right social context and utility. Unfortunately, Google seems to struggle with that last part. Google embedded SearchWiki in the non-social context of search &#8212; and has launched +1 the same way. It&#8217;s not at all clear what users would gain by going out of their flow to annotate search results. Hotpot may simply be a case of too little, too late &#8212; people are already trained to go to Yelp and Facebook Fan pages for subjective information about service businesses. Overall, Google has not given users a reason to believe there is significant return on their investment in sharing opinions.</p>
<h3>Collecting Data Doesn&#8217;t Count.</h3>
<p>Of course, Google is able to collect a significant amount of data about users&#8217; identities through their search history, cookies, browser toolbars, and purchase history (if they use Google Checkout). Indeed, it is Google&#8217;s inference of user intent in search queries that has allowed Google to become the poster child of online advertising.</p>
<p>But collecting data is not the same as having the user volunteer it. Most users have a transactional relationship with Google, tolerating data collection and advertising in exchange for a free service. Google wants more &#8212; it wants users to invest in identities associated with their Google accounts. But Google doesn&#8217;t seem to understand that users don&#8217;t make these investments unless they receive some social or professional utility in return.</p>
<p>If it&#8217;s true that Larry Page is making &#8220;social&#8221; Google&#8217;s top <a href="http://dondodge.typepad.com/the_next_big_thing/2010/01/how-google-sets-goals-and-measures-success.html">OKR</a>, then I hope for the sake of my former colleagues that Google has learned from its past experiments.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/04/14/social-utility-25/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/14/social-utility-25/feed/</wfw:commentRss>
		<slash:comments>39</slash:comments>
		</item>
		<item>
		<title>Guest Blog: Data 2.0 Conference Report</title>
		<link>http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/</link>
		<comments>http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 15:26:41 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3546</guid>
		<description><![CDATA[Note: This post was written by Scott Nicholson, a Senior Data Scientist at LinkedIn. Scott is a data and modeling geek with a passion for startups, product and user experience. His work at LinkedIn focuses on analyzing and improving user engagement and monetization. I’m happy to report back on my experience at the Data 2.0 conference, an [...]]]></description>
			<content:encoded><![CDATA[<p><object width="400" height="300"><param name="flashvars" value="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fgroups%2Fdata2con%2Fpool%2Fshow%2F&amp;page_show_back_url=%2Fgroups%2Fdata2con%2Fpool%2F&amp;group_id=1614380@N25&amp;jump_to=&amp;start_index=" /><param name="movie" value="http://www.flickr.com/apps/slideshow/show.swf?v=71649" /><param name="allowFullScreen" value="true" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://www.flickr.com/apps/slideshow/show.swf?v=71649" flashvars="offsite=true&amp;lang=en-us&amp;page_show_url=%2Fgroups%2Fdata2con%2Fpool%2Fshow%2F&amp;page_show_back_url=%2Fgroups%2Fdata2con%2Fpool%2F&amp;group_id=1614380@N25&amp;jump_to=&amp;start_index=" allowfullscreen="true"></embed></object></p>
<p><em>Note: This post was written by <a href="http://www.linkedin.com/in/scottnicholsonphd">Scott Nicholson</a>, a Senior Data Scientist at LinkedIn. Scott is a data and modeling geek with a passion for startups, product and user experience. His work at LinkedIn focuses on analyzing and improving user engagement and monetization.</em></p>
<p>I’m happy to report back on my experience at the <a href="http://data2con.com/">Data 2.0 conference</a>, an event organized by <a href="http://midventures.com/">midVentures</a> and targeted at entrepreneurs building products to leverage the dramatic increase in publicly and privately collected data. The conference had four main themes: what data is available, how to obtain data, how to store and access data, and how to create value from data products. For data nerds or hackers, the conference offered a delightful stream of “you know what would be cool&#8230;” ideas.</p>
<p>The morning got off to a strong start with a talk by <a href="http://wadhwa.com/">Vivek Wadhwa</a> on how data is going to define the next generation of successful startups in a new information age. He observed the increasing online access to data that has previously been restricted to offline access (or no access at all). He also emphasized the importance of new sources of data, such as medical records and genome data. We need to think of social use of data beyond Twitter, Facebook and LinkedIn: for example, genome data will allow us to connect to each other in ways that help us better understand our similarities and differences. Meanwhile, some existing data sources will become increasingly open and available to all. Wadhwa stressed the importance of leveraging the open sources of federal, state and local government data to come up with solutions to the existing closed and clunky legacy systems that governments use to generate data reports (<em>a pity that <a href="http://data.gov/">data.gov</a> and related programs may be <a href="http://www.guardian.co.uk/news/datablog/2011/apr/05/data-gov-crisis-obama">defunded</a> &#8212; DT</em>).</p>
<p>The morning keynote segued nicely into the <a href="http://data2con.com/schedule/topics-2/#WhyOpenData">panel</a> on open data sources. <a href="http://www.jaynath.com/">Jay Nath</a>, Director of CRM for the city of San Francisco, noted that, while many applications are using government data and APIs, they mostly address consumer convenience (e.g., public transit apps) rather than government efficiency. Panelists agreed that government employees have few incentives to take risks by using new technology: legacy systems might be expensive, inflexible and inefficient, but they do perform their limited function. Alluding to Eric Ries&#8217;s idea of a &#8220;<a href="http://theleanstartup.com/">lean startup</a>&#8221;, Nath suggested the concept of a &#8220;lean government&#8221; that lowered costs, sped up its operations, and avoided procurement processes by using open source technology &#8212; all in the context of providing services to its citizens.</p>
<p>The inspiring mid-day keynote by former Amazon Chief Scientist <a href="http://www.weigend.com/">Andreas Weigend</a> took a different perspective from the morning sessions: he focused on how data sharing can provide tangible value to end-users, even resulting in significant behavior change. He cited products like <a href="http://www.withings.com/en/bodyscale">tweeting weight scales</a>, <a href="http://www.fitbit.com/">FitBit</a>, and <a href="http://www.apple.com/ipod/nike/">Nike +</a> that allow people to share data about their fitness efforts, thus leading to social reinforcement for positive behaviors. I personally see this area as a great example of where data scientists and engineers can create enormous economic value and increase people’s welfare.</p>
<p>The day also featured various product launches and presentations. Here are a few that caught my attention:</p>
<ul>
<li><a href="http://micello.net/">Micello</a>: Google maps for indoors. They won the startup competition that was held in conjunction with the conference.</li>
<li><a href="https://www.tropo.com/home.jsp">Tropo</a>: API for voice calls and SMS</li>
<li><a href="http://www.datastax.com/products/brisk">DataStax Brisk</a>: Technology unifying <a href="http://hadoop.apache.org/">Hadoop</a>, <a href="http://wiki.apache.org/hadoop/Hive">Hive</a> &amp; <a href="http://cassandra.apache.org/">Cassandra</a>. A new Hadoop distribution powered by Cassandra.</li>
<li><a href="http://www.neerlife.com/">Neer</a>: always-on location awareness app from Qualcomm. Privately share location with groups and families.</li>
<li><a href="http://www.heritagehealthprize.com/c/hhp">Heritage Health Prize</a>: $3MM prize for predictive modeling around who will require hospitalization (a follow-up on their announcing the prize at <a href="http://strataconf.com/strata2011">Strata</a>)</li>
</ul>
<p>Overall, it was great to see hundreds of people exploring innovations and opportunities to use data to improve business, technology and society.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/04/07/guest-blog-data-2-0-conference-report/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Steal These Ideas!</title>
		<link>http://thenoisychannel.com/2011/03/27/steal-these-ideas/</link>
		<comments>http://thenoisychannel.com/2011/03/27/steal-these-ideas/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 01:05:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3541</guid>
		<description><![CDATA[Talk is cheap, as the saying goes. That&#8217;s a good thing, since I am always overflowing with ideas that I have neither the time (I love my day job!) nor the money to advance. What I do have is a blog that I hope inspires readers to turn some of these ideas into reality. My [...]]]></description>
			<content:encoded><![CDATA[<p>Talk is cheap, as the saying goes. That&#8217;s a good thing, since I am always overflowing with ideas that I have neither the time (I love my <a href="http://www.linkedin.com/in/dtunkelang">day job</a>!) nor the money to advance. What I do have is a blog that I hope inspires readers to turn some of these ideas into reality.</p>
<p>My ideas are somewhat predictable, in that they all address user-centric <a href="http://en.wikipedia.org/wiki/Information_seeking">information-seeking</a> problems. Working for over a decade in this space has focused my intellectual curiosity somewhat &#8212; and of course I work on a number of these problems at <a href="http://www.linkedin.com/">LinkedIn</a>. But there are many information-seeking problems that are outside of my present or foreseeable scope.</p>
<p>Here are two ideas that I&#8217;m hoping someone will execute on so I don&#8217;t have to:</p>
<p style="padding-left: 30px;"><strong>1. Shopping: Help Me Figure Out What I Want</strong></p>
<p style="padding-left: 30px;">We&#8217;ve come a long way in improving the shopping experience, at least for utilitarian shoppers like yours truly. If I know exactly what I want, I usually find it by using Google as a gateway to Amazon, taking a bit more time if I&#8217;m feeling price-sensitive. I&#8217;d happily install a browser extension that could automatically detect product search queries and take them to my preferred shopping sites, bypassing the search results page, but that&#8217;s a minor detail of convenience (though probably not such a minor detail for the search engine companies). In any case, <a href="http://www.iva.dk/bh/core%20concepts%20in%20lis/articles%20a-z/known_item_search.htm">known-item search</a> for online shopping is hardly inspiring as an open problem.</p>
<p style="padding-left: 30px;"><a href="http://en.wikipedia.org/wiki/Exploratory_search">Exploratory search</a> is another story entirely. For all the work that&#8217;s been done on <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, it is used almost exclusively to help people narrow search results. Progressive narrowing is great if you have a pre-established information need, but it is not the best interface if you&#8217;re hoping to evolve your information need through exploration. Instead of just &#8220;help me find what I&#8217;m looking for&#8221;, I&#8217;d also like to see more &#8220;help me figure out what I want&#8221;. I&#8217;d like to see an innovator applying faceted search to broaden queries, not just to narrow them, as well as going beyond <a href="http://en.wikipedia.org/wiki/Collaborative_filtering">collaborative filtering</a> and &#8220;related items&#8221; to create a compelling browsing experience based on semantic and <a href="http://thenoisychannel.com/2008/04/27/social-navigation/">social navigation</a>.</p>
<p style="padding-left: 30px;"><strong>2. Organizing the World&#8217;s Information: Beyond Wikipedia and Navigational Queries</strong></p>
<p style="padding-left: 30px;">If shopping online often reduces to using Google to find product pages on Amazon, then <a href="http://en.wikipedia.org/wiki/Web_search_query">informational queries</a> similarly reduce to using Google to find Wikipedia entries. Nothing against Wikipedia &#8212; I think it is one of the most extraordinary achievements of our generation &#8212; but I think of the web as a library and Wikipedia as its encyclopedia section. Google&#8217;s <a href="http://www.google.com/corporate/">mission statement</a> notwithstanding, web search engines do a poor job of organizing the rest of the world&#8217;s information, instead choosing to optimize for known-item search.</p>
<p style="padding-left: 30px;">There are countless opportunities for improvement here. Imagine if there were an interface for books, scholarly articles, patents, music, or videos that supported browsing and exploration of their content and meta-data. We&#8217;ve seen the beginnings of such an approach for individual libraries (e.g., the <a href="http://search.trln.org/">Triangle Research Libraries Network</a>), but there is so much more to do in this space. Perhaps it&#8217;s a space that is hard to monetize, but even then I&#8217;d expect philanthropists to take an interest in making the world&#8217;s knowledge and creative artifacts more accessible.</p>
<p>If you are pursuing either of these areas, I&#8217;d love to hear about it. I&#8217;m sure readers here would too. I&#8217;m also curious to learn more about innovation in the travel and personals spaces, as those are both areas that could benefit from supporting exploratory search. And if you have work in progress, please present it at the <a href="http://hcir.info/hcir-2011">HCIR workshop</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/03/27/steal-these-ideas/feed/</wfw:commentRss>
		<slash:comments>38</slash:comments>
		</item>
		<item>
		<title>LinkedIn: HCIR for Fun and Profit</title>
		<link>http://thenoisychannel.com/2011/03/19/linkedin-hcir-for-fun-and-profit/</link>
		<comments>http://thenoisychannel.com/2011/03/19/linkedin-hcir-for-fun-and-profit/#comments</comments>
		<pubDate>Sun, 20 Mar 2011 05:10:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3524</guid>
		<description><![CDATA[This afternoon, I met with a couple of Stanford seniors to advise them on a startup they&#8217;ve been developing and targeting towards mid-sized online retailers. I&#8217;d expected to spend most of the time talking about their technology and customer development strategy &#8212; and we did indeed talk about these things. But we spent most of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/"><img class="alignnone size-full wp-image-3527" title="LinkedIn -- not just the world's best recruiting tool!" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/03/Screen-shot-2011-03-19-at-9.14.52-PM1.png" alt="" width="532" height="534" /></a></p>
<p>This afternoon, I met with a couple of Stanford seniors to advise them on a startup they&#8217;ve been developing and targeting towards mid-sized online retailers. I&#8217;d expected to spend most of the time talking about their technology and <a href="http://www.amazon.com/Four-Steps-Epiphany-Steven-Blank/dp/0976470705">customer development</a> strategy &#8212; and we did indeed talk about these things. But we spent most of the time brainstorming <em>whom</em> I knew that could best help them achieve the key milestone of landing a first customer.</p>
<p>Not surprisingly, my first step was to open up my laptop and head straight to <a href="http://www.linkedin.com/">LinkedIn</a> (I&#8217;m not only a <a href="http://www.linkedin.com/search/fpsearch?title=data+scientist&amp;currentTitle=CP&amp;searchLocationType=I&amp;countryCode=us&amp;keepFacets=keepFacets&amp;page_num=1&amp;facet_CC=1337">data scientist</a> &#8212; I&#8217;m also a <a href="http://www.linkedin.com/in/dtunkelang">member</a>!) to see who in my network might be most helpful to them at this critical stage. The students were openly impressed: despite being sharp, energetic, and remarkably business-savvy for a couple of guys not old enough to legally buy beer, they had never seen someone use LinkedIn the way I was doing in front of them &#8212; not for hiring or recruiting, but as an <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> tool to find useful professional connections.</p>
<p>I started with a search for online retail, then restricted to directors and VPs, narrowing down further to first-degree and second-degree connections. I vetted second-degree connections by looking at my paths to them, determining who would be likely to be most helpful either because they owed me a favor or because they might have their own interest in the startup&#8217;s success.</p>
<p>We then browsed through the list of <a href="http://www.internetretailer.com/top500/list/">top online retailers</a>, identifying plausible companies for them to target and then looking for my first-degree and second-degree connections not only at those companies but also at other companies in the same space. We spent over an hour fluidly going back and forth between talking and exploring on LinkedIn. In the course of this exploration, we not only produced a list of people to contact, but also arrived at a better understanding of the business strategy.</p>
<p>I&#8217;m always happy to help young entrepreneurs who represent the future of our economy, and even happier to do so using the tools my colleagues and I are constantly working to improve. But I&#8217;m surprised and a bit disheartened that the methods I used are not common knowledge, especially among people who stand to gain the most benefit from them. Perhaps, as someone who has been using LinkedIn since 2004, I take for granted that people know how to take advantage of it for professional networking. I hope that the company&#8217;s increasing visibility will make more people aware that LinkedIn is not *just* the best thing that has ever happened to recruiting.</p>
<p>Also, as an <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> advocate, I&#8217;d like to see these kinds of information-seeking tasks receive more attention from researchers and practitioners. I&#8217;ve been saying for a while that these and similar tasks are neglected by the <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a> community and <a href="http://thenoisychannel.com/2008/08/07/where-google-isnt-good-enough/">not adequately addressed by Google</a>. For example, while there has been significant research effort in the area of <a href="http://scholar.google.com/scholar?hl=en&amp;q=expert+finding&amp;btnG=Search&amp;as_sdt=1%2C5&amp;as_ylo=&amp;as_vis=0">expert finding</a>, I&#8217;d like to see more effort devoted to improving the interactive process of finding experts and expertise. And <a href="http://thenoisychannel.com/2011/02/04/got-skills/">not just from LinkedIn</a>!</p>
<p>If you are doing work in this space, I hope you&#8217;ll participate in the upcoming <a href="http://hcir.info/">HCIR workshop</a> and show off your stuff.</p>
<p>In the meantime, I hope you make the most of LinkedIn, for fun and for profit. As a mentor of mine told me in my first job, it&#8217;s &#8220;network or not work&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/03/19/linkedin-hcir-for-fun-and-profit/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A Practical Rant about Software Patents</title>
		<link>http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/</link>
		<comments>http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/#comments</comments>
		<pubDate>Mon, 07 Mar 2011 08:21:47 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3496</guid>
		<description><![CDATA[Given the controversial content of this post, I&#8217;d like to remind readers upfront that this post, like all of the contents of this blog, represents my personal opinions, and in particular does not represent the opinions of my present or former employers. I am not a lawyer, nor do I claim to have read any [...]]]></description>
			<content:encoded><![CDATA[<p><em>Given the controversial content of this post, I&#8217;d like to remind readers upfront that this post, like all of the contents of this blog, represents my personal opinions, and in particular does not represent the opinions of my present or former employers. I am not a lawyer, nor do I claim to have read any of the patents to which I directly or indirectly allude in this post. None of the below should be construed as legal advice. Finally, the material is US-centric &#8212; your national software patent policy may vary.</em></p>
<p>My feelings about software patents are a matter of public record (e.g., this <a href="http://thenoisychannel.com/2010/09/25/an-open-letter-to-the-uspto/">open letter to the USPTO</a>). As things stand today, software patents act as an innovation tax rather than as a catalyst for innovation. It may be possible to resolve the problems of software patents through aggressive reform, but it would be better to abolish software patents than to maintain the status quo.</p>
<p>My personal feelings notwithstanding, I acknowledge the reality that today&#8217;s software companies need to have defensive patent strategies. In a previous job, one of my key accomplishments was to hire a director of intellectual property. It was a difficult hire, but it happened just in time to defend against a particularly noxious patent troll. I am not at liberty to spell out the details, but I can say that we responded with a long, expensive fight that effectively quashed the patent and the lawsuit.</p>
<h3><strong>Beware Of Trolls</strong></h3>
<h2><strong><a href="http://en.wikipedia.org/wiki/Patent_troll"><img class="alignnone" title="Beware of Trolls" src="http://upload.wikimedia.org/wikipedia/commons/2/27/BewareOfTrolls.svg" alt="" width="97" height="85" /></a><br />
</strong></h2>
<p>Patent trolls, known less pejoratively as non-practicing entities (NPEs) because they do not actually sell products or services that implement the systems or methods in the patents they own, take advantage of asymmetric risk. On one hand, an NPE does not need much money to bankroll (or at least initiate) a patent infringement suit &#8212; in fact, there are law firms that will take such cases on contingency. On the other hand, the company being sued faces potentially ruinous costs. Moreover, even if a company feels certain that a lawsuit against it is baseless, the company cannot count on the imperfect and inefficient legal system to reach a fair outcome. As a result, the company has to choose between spending heavily in its own defense and settling with the NPE. Most companies opt for the less risky route and negotiate settlements, providing funds that the NPEs use to sue more companies.</p>
<p>Some people have a name for this style of asymmetric warfare &#8212; namely, <a href="http://en.wikipedia.org/wiki/Asymmetric_warfare#Asymmetric_warfare_and_terrorism">terrorism</a>. I suppose that the word terrorist is loaded enough without increasing its breadth to include patent trolls &#8212; not to mention that trolls have their <a href="http://en.wikipedia.org/wiki/Patent_troll#Criticism_of_the_term">defenders</a>. But the metaphor is a useful one. A terrorist attack inflicts an amount of damage that is much greater than the absolute cost to the terrorist, e.g., a suicide bomber who inflicts mass murder. Moreover, the threat of terrorism forces the object of that threat to choose between settling (aka negotiating with terrorists) and spending heavily on counter-terrorism efforts. As Peter Neumann notes in a <em>Foreign Affairs</em> <a href="http://www.foreignaffairs.com/articles/62276/peter-r-neumann/negotiating-with-terrorists">article</a>:</p>
<blockquote><p>The argument against negotiating with terrorists is simple: Democracies must never give in to violence, and terrorists must never be rewarded for using it. Negotiations give legitimacy to terrorists and their methods…</p>
<p>Yet in practice, democratic governments often negotiate with terrorists.</p></blockquote>
<p>There have been various attempts to address the threat of patent trolls.</p>
<p>Google litigation director Catherine Lacavera has gone <a href="http://www.bloomberg.com/apps/news?pid=newsarchive&amp;sid=ar3V._UIg9CM">on record</a> saying that Google intends to fight rather than settle patent infringement lawsuits in order to deter patent trolls. We&#8217;ll see if Google can sustain this &#8220;we don&#8217;t negotiate with terrorists&#8221; approach; I admire the resolve, but like Neumann I&#8217;m skeptical.</p>
<p><a href="http://www.articleonepartners.com/">Article One Partners</a> has built a business around crowd-sourcing patent invalidation. Clients pay for research to invalidate patents, and Article One offers bounties to anyone who contributes valuable evidence. In theory, companies can request validity analysis of their own patents to test them for robustness, but I assume that the primary application of this service is the invalidation of patents that a company sees as threats.</p>
<p><a href="http://www.rpxcorp.com/">Rational Patent (RPX)</a> has created a defensive patent pool, purchasing a large portfolio of patents and then licensing them to its member companies. Some have questioned whether this approach is &#8220;<a href="http://techcrunch.com/2008/11/24/is-rpxs-defensive-patent-aggregation-simply-patent-extortion-by-another-name/">patent extortion by another name</a>&#8221;, and indeed paying RPX for a blanket license does feel a bit like preemptively settling in bulk. But I&#8217;d be more concerned that the &#8220;over 1,500 US and international patent assets&#8221; that RPX claims to have acquired are a drop in the bucket compared to the vast number of patents that the USPTO has granted, many of dubious merit.</p>
<p>Meanwhile, patent trolldom is serious business. Former Microsoft CTO <a href="http://en.wikipedia.org/wiki/Nathan_Myhrvold">Nathan Myhrvold</a> created <a href="http://www.intellectualventures.com/">Intellectual Ventures</a> to &#8220;invest both expertise and capital in the development and monetization of inventions and patent portfolios.&#8221; The company has only filed one <a href="http://bits.blogs.nytimes.com/2010/12/08/intellectual-ventures-goes-to-court/">lawsuit</a> so far, but Mike Masnick <a href="http://www.techdirt.com/articles/20100217/1853298215.shtml">claims</a> that it has used over a thousand shell companies to conduct stealth lawsuits.</p>
<p>Unfortunately, the proliferation of lawsuits by software patent trolls suggests that the economic incentives encourage such suits. If every company could sustain a &#8220;<a href="http://en.wikiquote.org/wiki/Galaxy_Quest">Never give up, never surrender!</a>&#8221; approach, patent trolls would eventually go away, but it is unlikely that companies would be willing to assume the short-term risks that such an approach entails.</p>
<p>Moreover, this approach only works if everyone participates, requiring that every company forgo the competitive advantage it could enjoy from being the only company among its competitors to appease the trolls. This is a classic <a href="http://en.wikipedia.org/wiki/Tragedy_of_the_commons">tragedy of the commons</a>. I&#8217;m hopeful that we&#8217;ll eventually implement sensible patent reform in the United States, but I expect it will take a long time to overcome the entrenched interests that support the status quo.</p>
<h3><strong>It&#8217;s Not Just The Trolls</strong></h3>
<p><a href="http://bits.blogs.nytimes.com/2010/03/04/an-explosion-of-mobile-patent-lawsuits/"><img class="alignnone" title="An Explosion of Mobile Patent Lawsuits" src="http://graphics8.nytimes.com/images/2010/03/03/technology/bits-suepatent2/bits-suepatent2-blogSpan.jpg" alt="" width="138" height="184" /></a></p>
<p>But NPEs are not the only cause for concern. Many established companies, including some technology leaders, are not averse to using patent lawsuits as part of their business strategy. The mobile device and software space is a particularly <a href="http://bits.blogs.nytimes.com/2010/03/04/an-explosion-of-mobile-patent-lawsuits/">popular arena</a> for patent litigation, the most notable being <a href="http://www.scribd.com/doc/35810897/Oracle-Google-Complaint">Oracle&#8217;s lawsuit against Google</a> claiming that Android infringes on patents related to Java. The stakes are extraordinary, dwarfing even the <a href="http://en.wikipedia.org/wiki/NTP,_Inc.#RIM_patent_infringement_litigation">$612.5M that RIM paid NTP</a> in order to avoid a complete shutdown of the BlackBerry service (ironically, at least some of the patents involved have since been rejected by the patent office after re-examination).</p>
<p>Patent lawsuits can also be a way for larger companies to bully smaller ones. For example, a couple of entrepreneurs at visual search startup <a href="http://thenoisychannel.com/2009/12/26/r-i-p-modista/">Modista</a> were forced to shut down their company because of a lawsuit by Like.com, a more established player in the space which was ultimately <a href="http://www.rev2.org/2010/08/16/google-acquires-like-com-visual-search/">acquired by Google</a>. Note: although I was at Google at the time, I have no inside knowledge of the acquisition, nor whether there is any truth to the speculation that Google acquired the company for its patents.</p>
<h3>Defensive Patenting</h3>
<p><a href="http://en.wikipedia.org/wiki/The_Art_of_War"><img class="alignnone" title="Sun Tzu - The Art of War" src="http://upload.wikimedia.org/wikipedia/commons/9/94/Bamboo_book_-_binding_-_UCR.jpg" alt="" width="145" height="168" /></a></p>
<p>Moral considerations aside, the above stories make it clear that defensive patent strategy isn&#8217;t just about NPEs. In fact, many software companies approach defensive patenting by assembling a trove of patents that are useful for countersuits and thus serve as a deterrent. Back to military metaphors, it&#8217;s similar to countries developing nuclear weapons (a popular metaphor for patents in general) in accordance with the doctrine of <a href="http://en.wikipedia.org/wiki/Mutual_assured_destruction">mutual assured destruction</a>.</p>
<p>Companies that follow a defensive patent strategy typically implement a process for capturing intellectual property. Scientists and engineers file invention disclosures, a committee reviews these for patentability, and a law firm translates the invention disclosures into patent filings. The filings then go through the meat grinder of patent prosecution and eventually are extruded as patents.</p>
<p>It all sounds great in theory &#8212; indeed, I have seen executives who mostly worry about educating scientists and engineers about patents and providing the right incentives to encourage them to write and submit invention disclosures. Indeed, it can be difficult to integrate intellectual property capture into the process and culture of a software company. But I think there are two much bigger issues.</p>
<p>First, it takes several years to obtain a patent. Indeed, the <a href="http://www.uspto.gov/dashboards/patents/main.dashxml">USPTO dashboard</a> shows that it takes two years just to get an initial response from the patent office. Thus a defensive patenting strategy requires significant advance planning: any patents filed today are unlikely to be useful deterrents until at least 2014. Given the rapid pace of the software industry, this delay is very significant. Moreover, startups are especially vulnerable in their first few years.</p>
<p>Second, intellectual property capture processes are inherently optimized for offensive (i.e., don&#8217;t copy my invention or I&#8217;ll sue you) rather than defensive (i.e., don&#8217;t sue me) patent strategy. Consider Google&#8217;s defensive position with respect to Oracle in the aforementioned lawsuit. Google has a relatively small patent portfolio, but it has obtained patents for some of its major innovations, such as <a href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>. Let&#8217;s put aside <a href="http://www.dbms2.com/2010/02/11/google-mapreduce-patent/">questions about the validity of the MapReduce patent</a> &#8212; especially since patents enjoy the <a href="http://www.uspto.gov/web/offices/pac/mpep/documents/appxl_35_U_S_C_282.htm">presumption of validity</a>. The bigger question is to whom such a patent serves as a deterrent against patent lawsuits. It may very well deter <a href="http://en.wikipedia.org/wiki/Hadoop">Hadoop</a> users, which include Google arch-rival <a href="http://www.facebook.com/notes/facebook-engineering/looking-at-the-code-behind-our-three-uses-of-apache-hadoop/468211193919">Facebook</a>. But, as far as I know, Oracle is not vulnerable on this front. FOSS Patents blogger Florian Mueller did an <a href="http://fosspatents.blogspot.com/2011/01/google-is-patently-too-weak-to-protect.html">analysis</a> and concluded that Google&#8217;s patents are not an effective deterrent. Indeed, the fact that Google has not counter-sued Oracle using its own patents is at least consistent with this analysis.</p>
<p>What if Google were to invest in obtaining (i.e., purchasing) a collection of broad patents that had to do with relational databases? Such patents could have nothing to do with Google&#8217;s areas of innovation and nonetheless serve as an effective deterrent against lawsuits from relational database companies like Oracle. Even if the patents were not robust, they would still have some value as deterrents because of their presumption of validity and the aforementioned inefficiency of the legal system.</p>
<p>In general, the most valuable defensive patents are those that you believe your competitors (or anyone else who might have an interest in suing you) are already infringing. Even if those patents would be unlikely to survive re-examination, the re-examination process is long and expensive, and even the most outrageous of patents enjoys the presumption of validity.</p>
<h3>Everybody Into The Pool</h3>
<p><a href="http://www.youtube.com/watch?v=kB2Vuc2W0U0"><img class="alignnone" title="Flintstones - Everybody into the Pool" src="http://i4.ytimg.com/vi/kB2Vuc2W0U0/default.jpg" alt="" width="120" height="90" /></a></p>
<p>A <a href="http://en.wikipedia.org/wiki/Patent_pool">patent pool</a> is a consortium of at least two companies that agree to cross-license each other&#8217;s patents &#8212; a sort of mutual non-aggression pact. But perhaps companies that only believe in the defensive use of patents should take a more aggressive approach to patent pooling. Following the example of <a href="http://en.wikipedia.org/wiki/NATO">NATO</a>, they could create an alliance in which they agree to mutual defense in response to an attack by any external party. I don&#8217;t know if such an approach would be viewed as anti-competitive, but it does strike me as a cost-effective alternative to the current approach for defensive patenting.</p>
<p>And, as with most ideas, this one is hardly original. In 1993, Autodesk founder <a href="http://en.wikipedia.org/wiki/John_Walker_(programmer)">John Walker</a> published &#8220;<a href="http://www.fourmilab.ch/autofile/www/chapter2_105.html">PATO: Collective Security In the Age of Software Patents</a>&#8221;, in which he proposed:</p>
<blockquote><p>The basic principle of NATO is that an attack on any member is considered an attack on all members. In PATO it works like this&#8211;if any member of PATO is alleged with infringement of a software patent by a non-member, then that member may counter-sue the attacker based on infringement of any patent in the PATO cross-licensing pool, regardless of what member contributed it. Once a load of companies and patents are in the pool, this will be a deterrent equivalent to a couple thousand MIRVs in silos&#8211;odds are that any potential plaintiff will be more vulnerable to 10 or 20 PATO patents than the PATO member is to one patent from the aggressor. Perhaps the suit will just be dropped and the bad guy will decide to join PATO&#8230;.</p>
<p>Since PATO is chartered to promote the free exchange and licensing of software patents, members do not seek revenue from their software patents&#8211;only mutual security. Thus, anybody can join PATO, even individual programmers who do not have a patent to contribute to the pool&#8211;they need only pay the nominal yearly dues and adhere to the treaty&#8211;that any software patents they are granted will go in the pool and that they will not sue any other PATO member for infringement of a software patent.</p></blockquote>
<p>It&#8217;s been almost two decades, but perhaps PATO is an idea whose time has come. And, even if a collective effort fails, individual companies might do well to focus less on intellectual property capture and more on collecting the kinds of nuisance patents currently favored by trolls. After all, the best defense is the credible threat of a good offense.</p>
<h3>Conclusion</h3>
<p>Even if you hate software patents, you can&#8217;t afford to ignore them if you are in the software industry. And I&#8217;m well aware that not everyone shares my view of software patents. But I hope those who do find useful advice in the above discussion. I&#8217;d love to see the software industry move beyond this innovation tax.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/03/07/a-practical-rant-about-software-patents/feed/</wfw:commentRss>
		<slash:comments>33</slash:comments>
		</item>
		<item>
		<title>Life, the Universe, and SEO Revisited</title>
		<link>http://thenoisychannel.com/2011/02/26/life-the-universe-and-seo-revisited/</link>
		<comments>http://thenoisychannel.com/2011/02/26/life-the-universe-and-seo-revisited/#comments</comments>
		<pubDate>Sat, 26 Feb 2011 19:46:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3488</guid>
		<description><![CDATA[A couple of years ago, I wrote a post entitled &#8220;Life, the Universe, and SEO&#8221; in which I considered Google&#8217;s relationship with the search engine optimization (SEO) industry. Specifically, I compared it to the relationship that Deep Thought, the computer in Douglas Adams&#8217;s Hitchhiker&#8217;s Guide to the Galaxy, has with the Amalgamated Union of Philosophers, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.naderlibrary.com/hitchhiker.screen2.htm"><img class="alignnone" title=" The Amalgamated Union of Philosophers, Sages, Luminaries and Other Thinking Persons" src="http://www.naderlibrary.com/ahitchscreen81.jpg" alt="" width="512" height="384" /></a></p>
<p>A couple of years ago, I wrote a post entitled &#8220;<a href="http://thenoisychannel.com/2008/11/24/life-the-universe-and-seo/">Life, the Universe, and SEO</a>&#8221; in which I considered Google&#8217;s relationship with the search engine optimization (SEO) industry. Specifically, I compared it to the relationship that <a href="http://en.wikipedia.org/wiki/Minor_characters_from_The_Hitchhiker's_Guide_to_the_Galaxy#Deep_Thought">Deep Thought</a>, the computer in Douglas Adams&#8217;s <em><a href="http://en.wikipedia.org/wiki/The_Hitchhiker's_Guide_to_the_Galaxy">Hitchhiker&#8217;s Guide to the Galaxy</a></em>, has with the Amalgamated Union of Philosophers, Sages, Luminaries and Other Thinking Persons.</p>
<p>Interestingly, both SEO and union protests have been front-page news of late. I&#8217;ll focus on the former.</p>
<p>Three recent incidents brought mainstream attention to the SEO industry:</p>
<ul>
<li>Two weeks ago, Google head of web spam Matt Cutts <a href="http://www.nytimes.com/2011/02/13/business/13search.html">told the New York Times</a> that Google was engaging in a &#8220;corrective action&#8221; that penalized retailer J. C. Penney&#8217;s search results because the company had engaged in SEO practices that violated Google&#8217;s guidelines. For months before the action (which included the holiday season), J. C. Penney was performing exceptionally well in a broad collection of Google searches, including such queries as [<a href="http://www.google.com/search?q=dresses">dresses</a>], [<a href="http://www.google.com/search?q=bedding">bedding</a>], [<a href="http://www.google.com/search?q=area+rugs">area rugs</a>], [<a href="http://www.google.com/search?q=skinny+jeans">skinny jeans</a>], [<a href="http://www.google.com/search?q=home+decor">home decor</a>], [<a href="http://www.google.com/search?q=comforter+sets">comforter sets</a>], [<a href="http://www.google.com/search?q=furniture">furniture</a>], [<a href="http://www.google.com/search?q=tablecloths">tablecloths</a>], and [<a href="http://www.google.com/search?q=grommet+top+curtains">grommet top curtains</a>]. As I write this blog post, I do not see results from <a href="http://www.jcpenney.com/">jcpenney.com</a> on the first result page for any of these search queries.</li>
<li>This past Thursday, online retailer Overstock.com <a href="http://online.wsj.com/article/SB10001424052748704520504576162753779521700.html">reported to the Wall Street Journal</a> that Google was penalizing them because of Overstock&#8217;s now discontinued practice of rewarding students and faculties with discounts in exchange for linking to Overstock pages from their university web pages. Before the penalty, these links were helping Overstock show up at the top of result sets for queries like [<a href="http://www.google.com/search?q=bunk+beds">bunk beds</a>] and [<a href="http://www.google.com/search?q=gift+baskets">gift baskets</a>]. As I write this blog post, I do not see results from <a href="http://www.overstock.com/">overstock.com</a> on the first result page for either of these search queries.</li>
<li>That same day, Google announced, via an <a href="http://googleblog.blogspot.com/2011/02/finding-more-high-quality-sites-in.html">official blog post</a> by Amit Singhal (Google&#8217;s head of core ranking) and Matt Cutts, a change that, according to their analysis, noticeably impacts 11.8% of Google search queries. In their words: &#8220;This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on.&#8221;</li>
</ul>
<p>Of course, Google is always working to improve search quality and stay at least one step ahead of those who attempt to reverse-engineer and game its ranking of results. But it&#8217;s quite unusual to see so much public discussion of ranking changes in such a short time period.</p>
<p>Granted, there is a growing chorus in the blogosphere bemoaning the <a href="http://dashes.com/anil/2011/01/threes-a-trend-the-decline-of-google-search-quality.html">decline of Google&#8217;s search quality</a>. Much of it focused on &#8220;<a href="http://en.wikipedia.org/wiki/Content_farm">content farms</a>&#8221; that seem to be the target of Google&#8217;s latest update. Perhaps Google&#8217;s new public assertiveness is a reaction to what it sees as unfair press. Indeed, Google&#8217;s recent <a href="http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/">public spat with Bing</a> would be consistent with a more assertive PR stance.</p>
<p>But what I find most encouraging is Google&#8217;s recent release of a Chrome browser extension that allows users to <a href="http://googleblog.blogspot.com/2011/02/new-chrome-extension-block-sites-from.html">create personal site blocklists</a> that are reported to Google. Some may see this as a reincarnation of <a href="http://thenoisychannel.com/2008/11/21/google-searchwiki-an-interesting-take-on-pim/">SearchWiki</a>, an ill-conceived and short-lived feature that allowed searchers to annotate and re-order results. But filtering out entire sites for all searches offers users a much greater return on investment than demoting individual results for specific searches.</p>
<p>Of course, I&#8217;d love to see user control taken <a href="http://thenoisychannel.com/2009/01/08/google-tech-talk-reconsidering-relevance/">much further</a>. And I wonder if efforts like personal blocklists are the beginning of Amit offering me a more positive answer to the <a href="http://thenoisychannel.com/2008/04/08/qa-with-amit-singhal-2/">question</a> I asked him back in 2008 about relevance approaches that relied on transparent design rather than obscurity.</p>
<p>I&#8217;m a realist: I recognize that many site owners are competing for users&#8217; attention, that most users are lazy, and that Google wants to optimize search quality subject to these constraints. I also don&#8217;t think that anyone today threatens Google with the promise of better search quality (and yes, I&#8217;ve tried <a href="http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/">Blekko</a>).</p>
<p>Perhaps the day is in sight when <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">human-computer information retrieval</a> (HCIR) offers a better alternative to the organization of web search results than the black-box ranking that fuels the SEO industry. But I&#8217;ve been waiting for that long enough to not be holding my breath. Instead, I&#8217;m encouraged to see a growing recognition that today&#8217;s approaches are an endless game of <a href="http://en.wikipedia.org/wiki/Whac-A-Mole">Whac-A-Mole</a>, and I&#8217;m delighted that at least one of the improvements on the table takes a realistic approach to putting more power in the hands of users.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/26/life-the-universe-and-seo-revisited/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
		</item>
		<item>
		<title>Life&#8217;s a Beach</title>
		<link>http://thenoisychannel.com/2011/02/14/lifes-a-beach/</link>
		<comments>http://thenoisychannel.com/2011/02/14/lifes-a-beach/#comments</comments>
		<pubDate>Tue, 15 Feb 2011 00:48:31 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3484</guid>
		<description><![CDATA[Heading to Punta Cana for a week. Feel free to keep writing great comments &#8212; will catch up when I get back!]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/B%C3%A1varo"><img class="alignnone" title="Punta Cana - Bávaro Beach" src="http://upload.wikimedia.org/wikipedia/commons/7/78/Bavaro.jpg" alt="" width="500" height="375" /></a></p>
<p>Heading to <a href="http://en.wikipedia.org/wiki/B%C3%A1varo">Punta Cana</a> for a week. Feel free to keep writing great <a href="http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/#comments">comments</a> &#8212; will catch up when I get back!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/14/lifes-a-beach/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Google vs. Bing: A Tweetle Beetle Battle Muddle</title>
		<link>http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/</link>
		<comments>http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/#comments</comments>
		<pubDate>Sun, 06 Feb 2011 00:26:26 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3474</guid>
		<description><![CDATA[Unless you&#8217;ve been living in a cone of silence, you&#8217;ve probably heard about the epic war of words between Google and Bing. But just in case, here&#8217;s a quick summary: Amit Singhal, Google Fellow: &#8220;Microsoft’s Bing uses Google search results—and denies it&#8221;: Bing is using some combination of: Internet Explorer 8, which can send data [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://fc09.deviantart.net/fs4/i/2004/218/b/e/tweetle_beetle_battle.jpg"><img class="alignnone" title="tweetle beetle bottle puddle battle muddle" src="http://fc09.deviantart.net/fs4/i/2004/218/b/e/tweetle_beetle_battle.jpg" alt="" width="528" height="408" /></a></p>
<p>Unless you&#8217;ve been living in a <a href="http://en.wikipedia.org/wiki/Cone_of_Silence">cone of silence</a>, you&#8217;ve probably heard about the epic war of words between Google and Bing. But just in case, here&#8217;s a quick summary:</p>
<p>Amit Singhal, Google Fellow: &#8220;<a href="http://googleblog.blogspot.com/2011/02/microsofts-bing-uses-google-search.html ">Microsoft’s Bing uses Google search results—and denies it</a>&#8221;:</p>
<blockquote><p>Bing is using some combination of:</p>
<ul>
<li>Internet Explorer 8, which can send data to Microsoft via its <a href="http://www.microsoft.com/windows/internet-explorer/privacy.aspx">Suggested Sites</a> feature</li>
<li>the Bing Toolbar, which can send data via Microsoft’s <a href="http://www.microsoft.com/products/ceip/EN-US/default.mspx">Customer Experience Improvement Program</a></li>
</ul>
<p>or possibly some other means to send data to Bing on what people search for on Google and the Google search results they click. Those results from Google are then more likely to show up on Bing. Put another way, some Bing results increasingly look like an incomplete, stale version of Google results—a cheap imitation.</p></blockquote>
<p>Harry Shum, Corporate Vice President, Bing: &#8220;<a href="http://www.bing.com/community/site_blogs/b/search/archive/2011/02/01/thoughts-on-search-quality.aspx">Thoughts on search quality</a>&#8221;:</p>
<blockquote><p>We use over 1,000 different signals and features in our ranking algorithm. A small piece of that is clickstream data we get from some of our customers, who opt-in to sharing anonymous data as they navigate the web in order to help us improve the experience for all users.</p></blockquote>
<p>Yusuf Mehdi, Senior Vice President, Online Services Division, Bing: &#8220;<a href="http://www.bing.com/community/site_blogs/b/search/archive/2011/02/02/setting-the-record-straight.aspx">Setting the record straight</a>&#8221;:</p>
<blockquote><p>Google engaged in a “honeypot” attack to trick Bing. In simple terms, Google’s “experiment” was rigged to manipulate Bing search results through a type of attack also known as “click fraud.” That’s right, the same type of attack employed by spammers on the web to trick consumers and produce bogus search results.  What does all this cloak and dagger click fraud prove? Nothing anyone in the industry doesn’t already know. As we have said before and again in this post, we use click stream optionally provided by consumers in an anonymous fashion as one of 1,000 signals to try and determine whether a site might make sense to be in our index.</p></blockquote>
<p>Matt Cutts, Head of Webspam, Google: &#8220;<a href="http://www.mattcutts.com/blog/google-bing/ ">My thoughts on this week’s debate</a>&#8221;:</p>
<blockquote><p>Something I’ve heard smart people say is that this could be due to generalized clickstream processing rather than code that targets Google specifically. I’d love if Microsoft would clarify that, but <a href="http://news.ycombinator.com/item?id=2168332">at least one example has surfaced</a> in which Microsoft was targeting Google’s urls specifically. The paper is titled <a href="http://aclweb.org/anthology/P/P10/P10-1028.pdf">Learning Phrase-Based Spelling Error Models from Clickthrough Data</a> and here’s some of the relevant parts:</p>
<p style="padding-left: 30px;">The clickthrough data of the second type consists of a set of query reformulation sessions extracted from 3 months of log files from a commercial Web browser <em>[I assume this is Internet Explorer. --Matt]</em> …. In our experiments, we “reverse-engineer” the parameters from the URLs of these [query formulation] sessions, and deduce how each search engine encodes both a query and the fact that a user arrived at a URL by clicking on the spelling suggestion of the query – an important indication that the spelling suggestion is desired. From these three months of query reformulation sessions, we extracted about 3 million query-correction pairs.”</p>
<p>This paper very much sounds like Microsoft reverse engineered which specific url parameters on Google corresponded to a spelling correction. Figure 1 of that paper looks like Microsoft used specific Google url parameters such as “&amp;spell=1” to extract spell corrections from Google. Targeting Google deliberately is quite different than using lots of clicks from different places.</p></blockquote>
<p>Let me start by saying that these are very serious words from very serious people.</p>
<p>Amit and Matt, both of whom I know personally, are not just two of the most prominent Google employees &#8212; they have a deep personal investment in Google&#8217;s search quality. Amit is personally responsible for much of Google&#8217;s web search ranking algorithm, and Matt is surely the person whom spammers (and many SEO consultants) most love to hate. There is no question in my mind that the emotion both of them are expressing is sincere.</p>
<p>I haven&#8217;t met Harry or Yusuf, but I have no reason to doubt their own sincerity &#8212; especially since everything they are saying seems consistent with the facts &#8212; in fact, consistent with the substantive parts of Google&#8217;s allegations. Indeed, the facts don&#8217;t really seem to be in dispute. And more generally, I&#8217;ve met some of the folks who lead the Bing team (like <a href="http://www.jopedersen.com/">Jan Pedersen</a>), and, like Matt, I believe they are thoughtful and sincere and are devoted to building a great search engine of their own.</p>
<p>The debate is not about the facts. Rather, it&#8217;s about what is right and wrong. I will try to summarize the two sides&#8217; positions without editorializing.</p>
<p>Bing is claiming that:</p>
<ul>
<li>Users have a right to do as they please with their own clickthrough data, which includes data from Google search sessions.</li>
<li>Bing toolbar users opted in to share this clickthrough data with Bing.</li>
<li>By using this clickthrough data, Bing creates value for users.</li>
</ul>
<p>Google is claiming that:</p>
<ul>
<li>Bing&#8217;s specific targeting of Google clickthrough data amounts to copying Google and is wrong.</li>
<li>Bing toolbar users are not necessarily aware that they are complicit in this behavior.</li>
<li>Bing is disingenuous in understating how much it benefits from Google as a signal.</li>
</ul>
<p>What do I think?</p>
<p>I agree with Bing that users have the right to do as they please with clickthrough data. I&#8217;d think Google would agree too, given that Google wrote the sermon on &#8220;<a href="http://googleblog.blogspot.com/2009/12/meaning-of-open.html">the meaning of open</a>&#8221;:</p>
<blockquote><p>Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.</p></blockquote>
<p>I agree with all of the three points I listed as Google&#8217;s claims except for the part that Bing&#8217;s behavior is wrong. It&#8217;s up to users if they want to help Bing compete with Google. Do users know that they&#8217;re doing so? Probably not. But would they stop doing so if they did? I doubt it. I can&#8217;t see why most users would have a dog in this fight &#8212; and in fact, it may be in users&#8217; interest to help Bing be more competitive.</p>
<p>I do think Bing should be forthright about what it is doing &#8212; and how much this user-provided data from Google search sessions is contributing to its own quality improvements. Bing can, of course, keep this information secret, but I&#8217;d think that Bing would want to defend its reputation as an innovator &#8212; especially as the David in a David vs. Goliath fight.</p>
<p>But I also think that Google should be careful with its accusations. Accusing Bing of not being innovative is one thing, and that accusation, backed by concrete examples, is probably enough to score points. But implying that Google owns its users&#8217; clickthrough data and that Bing has no right to solicit that data from users is another thing entirely.</p>
<p>I&#8217;m curious to hear what others here think. It&#8217;s been a while since I could freely express opinions about Google and Bing, so I&#8217;m delighted to have such a hot controversy to incite discussion. Because everyone enjoys a <a href="http://en.wikipedia.org/wiki/Fox_in_Socks">muddle puddle tweetle poodle beetle noodle bottle paddle battle</a>!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/05/google-vs-bing-a-tweetle-beetle-battle-muddle/feed/</wfw:commentRss>
		<slash:comments>52</slash:comments>
		</item>
		<item>
		<title>Got Skills?</title>
		<link>http://thenoisychannel.com/2011/02/04/got-skills/</link>
		<comments>http://thenoisychannel.com/2011/02/04/got-skills/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 05:53:38 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3463</guid>
		<description><![CDATA[Last October, a certain blogger said: LinkedIn needs to implement some kind of concept extraction to provide a useful topic facet (something I’d also love to see for their regular people search). This is a challenging information extraction problem, especially for the open web, but I also know from experience that it is tractable within a domain. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/skills/skill/Information_Retrieval"><img class="alignnone size-full wp-image-3466" title="LinkedIn Skills: Information Retrieval" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/02/Screen-shot-2011-02-03-at-9.10.05-PM1.png" alt="" width="507" height="589" /></a></p>
<p>Last October, <a href="http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/">a certain blogger said</a>:</p>
<blockquote><p>LinkedIn needs to implement some kind of concept extraction to provide a useful topic facet (something I’d also love to see for their regular people search). This is a challenging information extraction problem, especially for the open web, but I also know from <a href="http://www.endeca.com/">experience</a> that it is tractable within a domain. Given LinkedIn’s professional focus, I believe this is a problem they can and should tackle.</p></blockquote>
<p>Shortly after writing that post, I interviewed at LinkedIn and met <a href="http://www.linkedin.com/in/peterskomoroch">Pete Skomoroch</a>, who showed me an early preview of the work his team was doing to make skills a <a href="http://en.wikipedia.org/wiki/Faceted_search">facet</a> for exploring the space of LinkedIn member profiles. That demo made a strong impression on me, giving me a taste of the great products LinkedIn&#8217;s data scientists were working on in the lab.</p>
<p>And now I&#8217;m delighted that everyone can try out the beta launch of <a href="http://www.linkedin.com/skills/">LinkedIn Skills</a> which was announced today at O&#8217;Reilly&#8217;s <a href="http://strataconf.com/strata2011">Strata 2011</a> conference on Big Data.</p>
<p>As Pete says in his <a href="http://blog.linkedin.com/2011/02/03/linkedin-skills/">blog post</a>:</p>
<blockquote><p>If you search for a particular skill, we’ll surface key people within that community, show you the top locations, related companies, relevant jobs, and groups where you can interact with like minded professionals.  You’ll also be able to explore similar skills and compare their growth relative to each other.</p></blockquote>
<p>I encourage you to check it out &#8212; whether you&#8217;re looking for experts on <a href="http://www.linkedin.com/skills/skill/Hadoop">Hadoop</a>, <a href="http://www.linkedin.com/skills/skill/Cheese">cheese</a>, or anything else! It&#8217;s a beta, so I&#8217;m sure you&#8217;ll find rough edges; but I hope it gives you a sense of how LinkedIn&#8217;s data can enable an incredibly powerful and useful <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> experience.</p>
<p><a href="http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/">No forward-looking statements</a>, except to say that it only gets better from here!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/02/04/got-skills/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Be Vewy Vewy Quiet</title>
		<link>http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/</link>
		<comments>http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 18:52:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3456</guid>
		<description><![CDATA[While my blog has always been and will always be a personal one, I do operate under certain constraints as someone whose subject matter relates strongly to his professional interests. I deeply appreciate how long-time readers have respected the balancing act I sometimes have to make as both an independent individual and an employee. Right [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-medium wp-image-3457" title="&quot;Be vewy vewy quiet, I'm hunting wabbits.&quot;" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2011/01/elmer-221x300.jpg" alt="" width="155" height="210" /></p>
<p>While my blog has always been and will always be a <a href="http://thenoisychannel.com/2008/12/10/this-is-not-a-corporate-blog/">personal</a> one, I do operate under certain constraints as someone whose subject matter relates strongly to his professional interests. I deeply appreciate how long-time readers have respected the balancing act I sometimes have to make as both an independent individual and an employee.</p>
<p>Right now, that means I must respect the conditions of my employer&#8217;s <a href="http://www.sec.gov/answers/quiet.htm">quiet period</a> &#8212; and I will do so very conservatively (e.g., no Playboy interviews). I apologize if the content of this blog suffers in the interim, but I hope you understand my need to be cautious.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/29/be-vewy-vewy-quiet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Internship Opportunities at LinkedIn</title>
		<link>http://thenoisychannel.com/2011/01/19/internship-opportunities-at-linkedin/</link>
		<comments>http://thenoisychannel.com/2011/01/19/internship-opportunities-at-linkedin/#comments</comments>
		<pubDate>Wed, 19 Jan 2011 04:47:03 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3450</guid>
		<description><![CDATA[Do you love big data? Do you enjoy applying your skills in data mining, machine learning, information retrieval and data visualization? Are you a hands-on implementer who can turn your ideas into reality, whether in Java or Python? Are you turned on by NoSQL technologies like Hadoop, Pig, and Voldemort? And one last question&#8230;are you looking for [...]]]></description>
			<content:encoded><![CDATA[<p>Do you love <a href="http://thenoisychannel.com/2011/01/04/so-you-like-big-data/">big data</a>? Do you enjoy applying your skills in data mining, machine learning, information retrieval and data visualization? Are you a hands-on implementer who can turn your ideas into reality, whether in Java or Python? Are you turned on by <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a> technologies like <a href="http://hadoop.apache.org/">Hadoop</a>, <a href="http://pig.apache.org/">Pig</a>, and <a href="http://project-voldemort.com/">Voldemort</a>?</p>
<p>And one last question&#8230;are you looking for an exciting internship opportunity this summer? Then you&#8217;ve come to the right place at the right time: <a href="http://www.linkedin.com/">LinkedIn</a> is looking for a few good interns for summer 2011! You can find more details <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1354912">here</a> or go directly to the <a href="http://www.linkedin.com/jobs?viewJob=&amp;jobId=1354912">application form</a>.</p>
<p>If you are interested, I encourage you to act quickly, since we are already interviewing candidates.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/19/internship-opportunities-at-linkedin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dare To Dream</title>
		<link>http://thenoisychannel.com/2011/01/17/dare-to-dream/</link>
		<comments>http://thenoisychannel.com/2011/01/17/dare-to-dream/#comments</comments>
		<pubDate>Mon, 17 Jan 2011 23:20:33 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3444</guid>
		<description><![CDATA[&#8220;If a man hasn&#8217;t discovered something that he will die for, he isn&#8217;t fit to live.&#8221; Martin Luther King, Jr. said these words at a speech in Detroit on June 23, 1963. Less than five years later, he died for the cause to which he devoted his life: the advancement of civil rights in the [...]]]></description>
			<content:encoded><![CDATA[<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="480" height="385" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/PbUtL_0vAJk?fs=1&amp;hl=en_US&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="480" height="385" src="http://www.youtube.com/v/PbUtL_0vAJk?fs=1&amp;hl=en_US&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p><em>&#8220;If a man hasn&#8217;t discovered something that he will die for, he isn&#8217;t fit to live.&#8221;</em></p>
<p><a href="http://en.wikipedia.org/wiki/Martin_Luther_King,_Jr.">Martin Luther King, Jr.</a> said these words at a <a href="http://www.mlkonline.net/detroit.html">speech in Detroit</a> on June 23, 1963. Less than five years later, he died for the cause to which he devoted his life: the advancement of civil rights in the United States and around the world through civil disobedience and other nonviolent resistance.</p>
<p>Today, as Americans commemorate Dr. King&#8217;s birthday, there are many ways we can honor his memory and build on his legacy. As much as King advanced the civil rights movement, there is still much to be done to fulfill his dream.</p>
<p>But I&#8217;d like to go back to the quote from his speech in Detroit. King&#8217;s words reveal a truth even deeper than his struggle for civil rights. They demand that we approach life with passion, that we live to do something more than pass the time.</p>
<p>In the face of pressing day-to-day responsibilities, it is easy to fall into a reactive rhythm, doing what we have to do and then using what time is left to escape into oblivion. For many of us, passion may feel like a nice-to-have, something to think about after we&#8217;ve cleared out our queues and gotten a full night of sleep &#8212; only to wake up and find that the queue is full again. It is easy to go through life like <a href="http://en.wikipedia.org/wiki/Sisyphus">Sisyphus</a>, sweating profusely as we roll our boulders but lacking the intellectual ambition to question why we make those efforts.</p>
<p>Today, the least we can do to honor the memory of Martin Luther King, Jr. is to reflect on his personal passion to leave the world better than he found it. Hopefully none of us will ever have to make the sacrifice that he made to realize his dream. But if we do not dare to dream at all &#8212; if we are not passionate and ambitious about what we do &#8212; then, indeed, we are not fit to live.</p>
<p>Dare to dream &#8212; and live to make that dream a reality.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/17/dare-to-dream/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/17/dare-to-dream/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Quo Vadis, Quora?</title>
		<link>http://thenoisychannel.com/2011/01/09/quo-vadis-quora/</link>
		<comments>http://thenoisychannel.com/2011/01/09/quo-vadis-quora/#comments</comments>
		<pubDate>Sun, 09 Jan 2011 22:14:56 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3433</guid>
		<description><![CDATA[I know, everyone is sick about hearing about Quora, the community question answering site that is the darling of the blogosphere, and perhaps you fled here from TechCrunch hoping for something different. If so, I apologize. And if you want to read something else, I encourage you to use either the random post widget I [...]]]></description>
			<content:encoded><![CDATA[<p>I know, everyone is sick of hearing about <a href="http://www.quora.com/">Quora</a>, the community question answering site that is the darling of the blogosphere, and perhaps you fled here from <a href="http://techcrunch.com/tag/quora/">TechCrunch</a> hoping for something different. If so, I apologize. And if you want to read something else, I encourage you to use either the random post widget I recently added to the right-hand sidebar or the <a href="http://thenoisychannel.com/2011/01/07/enabling-exploratory-search-with-dhiti/">exploration widget</a> at the bottom of this post.</p>
<p>But I have personal reasons to be interested in Quora. One of their lead engineers, <a href="http://xng.cc/">Albert Sheu</a>, was a <a href="http://www.linkedin.com/profile/recommendations?id=9903676">star intern</a> of mine at <a href="http://www.endeca.com/">Endeca</a>. And Quora raises lots of interesting questions about search, user experience, knowledge management, and <a href="http://thenoisychannel.com/2010/05/02/thoughts-about-online-reputation/">online reputation</a>. How could I resist?</p>
<p>I see three potential reasons to use Quora:</p>
<ol>
<li>Objective question answering.</li>
<li>Subjective question answering.</li>
<li>Community participation.</li>
</ol>
<p>Let&#8217;s consider how Quora fares today on each of these, and where it might go.</p>
<p><strong>1. Objective question answering.</strong></p>
<p>When I <a href="http://thenoisychannel.com/2010/04/19/qui-quae-quora/">blogged about Quora</a> early last year, I said that &#8220;I don’t see Quora as a knowledge base of first resort–except possibly to learn more about software startups.&#8221; Despite Quora&#8217;s recent <a href="http://www.quora.com/Quora-Growth-Surge-Dec-2010-Jan-2011">growth surge</a>, I am not ready to change my answer significantly &#8212; I find that Quora&#8217;s topics are pretty sparse when I stray from its Silicon Valley focus.</p>
<p>Within that focus, Quora is nailing it. For example, I was curious to learn whether someone who signed a non-compete agreement outside of California was still subject to it if he or she moved to California, where such contracts are legally unenforceable. Not surprisingly, <a href="http://www.quora.com/Non-Compete-Agreements">non-compete agreements</a> are a topic on Quora, and I quickly found a <a href="http://www.quora.com/If-I-have-a-non-compete-agreement-with-a-company-in-NY-and-move-to-CA-is-the-non-compete-agreement-unenforcable">useful answer</a> from a lawyer.</p>
<p>But for most objective questions, I&#8217;m still turning to Google and Wikipedia &#8212; or to <a href="http://twitter.com/#!/dtunkelang">Twitter</a> if both of those fail and I am willing to ask a favor of my followers (who <a href="http://thenoisychannel.com/2009/03/14/challenge-blog-twitter-vs-aardvark/">kick ass</a>!). Sometimes Google will take me to Quora, but I can&#8217;t imagine Quora will succeed through this flow in the long term.</p>
<p><strong>2. Subjective question answering.</strong></p>
<p>I see subjective question answering as Quora&#8217;s strongest suit. A good subjective question on Quora &#8212; often a &#8220;why&#8221; question &#8212; generates a diverse collection of interesting and informed perspectives. A couple of good examples are &#8220;<a href="http://www.quora.com/Why-did-Google-Wave-fail-to-get-significant-user-adoption">Why did Google Wave fail to get significant user adoption?</a>&#8221; and &#8220;<a href="http://www.quora.com/Social-Networks/What-is-lacking-in-social-networking-now">What is lacking in social networking now?</a>&#8221;.</p>
<p>Again, these questions are well within the Silicon Valley focus, but I could see Quora extending this value proposition to other verticals if it can grow the communities successfully. And I certainly don&#8217;t see myself going to Google or even Twitter to get useful answers to subjective questions. The closest is <a href="http://thenoisychannel.com/2009/05/27/topsy-tippling-the-stream-of-conversations/">Topsy</a>, and Quora has the advantage of being explicitly organized around questions and topics.</p>
<p><strong>3. Community participation.</strong></p>
<p>Is Quora a question answering site or a social network? Quora users and employees have tried to answer that question (<a href="http://www.quora.com/Is-Quora-a-social-network">on Quora</a>, natch), but I&#8217;m not sure Quora&#8217;s converged enough for anyone to know. What is clear is that Quora emphasizes conversation, making it more like a blog or wiki than an answers site.</p>
<p>Conversation certainly engages its participants. But it also raises the cost of participation. One of the things I love about Google is that it gives me information without unnecessary overhead. When I want conversation, I go to social venues like Twitter.</p>
<p>Perhaps Quora can be both a question answering site and a social network. But I suspect it will need to choose. Most people don&#8217;t have the time or patience to participate in additional communities, so question answering is the easier sell to a mass audience. But the participation is what makes Quora especially distinctive today. Perhaps it&#8217;s a question of quality vs. quantity.</p>
<p>So, <em><a href="http://en.wikipedia.org/wiki/Quo_vadis">quo vadis</a></em>, Quora? I suppose I&#8217;ll have to <a href="http://www.quora.com/Quora-Quality/Quora-is-a-curated-community-of-early-adopters-now-its-nice-but-how-can-it-scale">check Quora</a> (or <a href="http://www.cwora.com/">Cwora</a>) to find the answers.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/09/quo-vadis-quora/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/09/quo-vadis-quora/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>No More Quora Invites</title>
		<link>http://thenoisychannel.com/2011/01/07/no-more-quora-invites/</link>
		<comments>http://thenoisychannel.com/2011/01/07/no-more-quora-invites/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 14:54:55 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3430</guid>
		<description><![CDATA[Over the past few days, I have been inundated with requests for Quora invites. I realize that I brought this upon myself by making my blog the top hit on Google for [quora invite] &#8212; though it seems I&#8217;m at least down to the #2 slot now. In any case, I have sent out over a [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past few days, I have been inundated with requests for <a href="http://www.quora.com/">Quora</a> invites. I realize that I brought this upon myself by making my blog the top hit on Google for [<a href="http://www.google.com/search?q=quora+invite">quora invite</a>] &#8212; though it seems I&#8217;m at least down to the #2 slot now. In any case, I have sent out over a hundred invites and need to stop fulfilling requests so that I can focus on my day job!</p>
<p>I hope everyone I&#8217;ve invited is enjoying Quora. But I also hope you take it upon yourselves to circulate more invitations to those who want them. Any Quora user can send out invites &#8212; that&#8217;s how these viral sites work. If you&#8217;re still looking for an invite, I urge you to use Twitter or some other broadcast mechanism to request it. As of today, I will stop responding to Quora invite requests through my blog or email, and I will also delete comments requesting them. I am sorry if this is a bit harsh, but I hope folks understand.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/07/no-more-quora-invites/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/07/no-more-quora-invites/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Enabling Exploratory Search with Dhiti</title>
		<link>http://thenoisychannel.com/2011/01/07/enabling-exploratory-search-with-dhiti/</link>
		<comments>http://thenoisychannel.com/2011/01/07/enabling-exploratory-search-with-dhiti/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 06:29:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3426</guid>
		<description><![CDATA[Last August, I wrote &#8220;Exploring Nuggetize&#8220;, in which I described an interface that Dhiti co-founder Bharath Mohan developed to surface “nuggets” from a site and reduce the user’s cost of exploring a document collection. As an experiment, I&#8217;m now including a Dhiti widget here on The Noisy Channel. If you look at any single post in a browser, [...]]]></description>
			<content:encoded><![CDATA[<p>Last August, I wrote &#8220;<a href="http://thenoisychannel.com/2010/08/15/exploring-nuggetize/">Exploring Nuggetize</a>&#8220;, in which I described an interface that <a href="http://www.dhiti.com/">Dhiti</a> co-founder <a href="http://in.linkedin.com/in/bharathkumarmohan">Bharath Mohan</a> developed to surface “nuggets” from a site and reduce the user’s cost of exploring a document collection. As an experiment, I&#8217;m now including a Dhiti widget here on The Noisy Channel. If you look at any single post in a browser, you&#8217;ll see a widget at the end of the post (before the comments) that attempts to use the post as a starting point for further exploration.</p>
<p>Please use the comments here to provide feedback. Bharath is eager to improve his product, and I&#8217;m eager to improve the experience for all of you!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/07/enabling-exploratory-search-with-dhiti/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/07/enabling-exploratory-search-with-dhiti/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Two Changes</title>
		<link>http://thenoisychannel.com/2011/01/06/two-changes/</link>
		<comments>http://thenoisychannel.com/2011/01/06/two-changes/#comments</comments>
		<pubDate>Thu, 06 Jan 2011 04:09:16 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3422</guid>
		<description><![CDATA[I just wanted to let regular readers know that I&#8217;ve made two changes to this blog. The first is that I&#8217;ve eliminated the use of categories on posts. I found that I was categorizing almost all posts as &#8220;general&#8221;, and that there was almost no value in maintaining such a low-entropy field. Yes, I&#8217;m well [...]]]></description>
			<content:encoded><![CDATA[<p>I just wanted to let regular readers know that I&#8217;ve made two changes to this blog.</p>
<p>The first is that I&#8217;ve eliminated the use of categories on posts. I found that I was categorizing almost all posts as &#8220;general&#8221;, and that there was almost no value in maintaining such a low-<a href="http://en.wikipedia.org/wiki/Entropy_(information_theory)">entropy</a> field. Yes, I&#8217;m well aware of the irony that, despite being an advocate of <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a>, I&#8217;m not providing any meta-data to annotate my posts. But I take what I believe to be a user-driven approach, and I can see from my logs that readers primarily visit to read my most recent posts. For those who like to browse, I encourage you to take advantage of the related posts feature, which is powered by the <a href="http://mitcho.com/code/yarpp/">Yet Another Related Posts Plugin</a>.</p>
<p>The second is that I&#8217;ve retired <a href="http://thenoisychannel.com/2009/01/15/the-noisy-community/">The Noisy Community</a>. Two years ago, I created this directory of regular readers and commenters in order to foster a sense of community. I believe it was a success, but that it has outlived its usefulness. Again based on my logs, I can see that it has been neglected for a while. So, rather than continuing to invest in maintaining it manually, I have given it an honorable discharge.</p>
<p>My apologies to any readers whom I&#8217;ve offended with these changes. As always, I encourage you to make suggestions &#8212; especially if you&#8217;re willing to help implement them!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/06/two-changes/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/06/two-changes/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>So You Like Big Data&#8230;</title>
		<link>http://thenoisychannel.com/2011/01/04/so-you-like-big-data/</link>
		<comments>http://thenoisychannel.com/2011/01/04/so-you-like-big-data/#comments</comments>
		<pubDate>Tue, 04 Jan 2011 04:43:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3408</guid>
		<description><![CDATA[The increasing volume of data that we generate as a species is a story so overplayed as to have become trite. Indeed, a vast amount of this data is in the public domain, including data from the full text and common ngrams of books, genome research, the  United States census, and much more. There is [...]]]></description>
			<content:encoded><![CDATA[<p>The increasing volume of data that we generate as a species is a story so overplayed as to have become trite. Indeed, a vast amount of this data is in the public domain, including data from the <a href="http://www.gutenberg.org/">full text</a> and common <a href="http://ngrams.googlelabs.com/datasets">ngrams</a> of books, <a href="http://www.ncbi.nlm.nih.gov/guide/data-software/">genome research</a>, the <a href="http://www.census.gov/">United States census</a>, and much more. There is also open-source software not only to <a href="http://nutch.apache.org/">crawl</a> the web, but also to <a href="http://lucene.apache.org/">search</a> the data you crawl. So, if you&#8217;re an aspiring data scientist and just want to get your hands on data, there&#8217;s no excuse&#8211;go out and get it!</p>
<p>But perhaps you&#8217;d like to make a career out of your jones for big data. Luckily for you, some of the hottest companies around are hiring data scientists!</p>
<p>Of course, those jobs aren&#8217;t for everyone. To get an idea of the requisite math and computer science skills, I suggest you read the answers on Quora for &#8220;<a href="http://www.quora.com/How-do-I-become-a-data-scientist">How do I become a data scientist?</a>&#8221;. I&#8217;m also a fan of <a href="http://www.hilarymason.com/">Hilary Mason</a>&#8217;s definition, which was cited in Ryan Kim&#8217;s &#8220;<a href="http://gigaom.com/2010/12/16/wanted-data-scientists-to-turn-information-into-gold/">Wanted: Data Scientists to Turn Information Into Gold</a>&#8221;: a data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. You can see Hilary&#8217;s full explanation in a blog post she co-authored with <a href="http://www.columbia.edu/~chw2/">Chris Wiggins</a>, entitled &#8220;<a href="http://www.dataists.com/2010/09/a-taxonomy-of-data-science/">A Taxonomy of Data Science</a>&#8221;.</p>
<p>If the qualifications haven&#8217;t scared you off, then it&#8217;s just a question of where you can best apply your data scientist skills. The good news is that there are a lot of different ways to make a career out of working with big data. Here are some suggestions for what to work on. I apologize in advance for taking a US-centric perspective &#8212; if you&#8217;re outside the US, I can only hope that the examples have local analogs.</p>
<p><strong>1) Web search.</strong></p>
<p>Google, Yahoo, and Bing all collect an enormous amount of data from people&#8217;s web search activity. Google is, of course, the 800-pound gorilla, but don&#8217;t dismiss the others &#8212; even a single-digit market share is enough to derive extremely valuable insights from user activity. And, since every major search engine makes the bulk of its revenue from advertising, they all present the big-data challenges associated with <a href="http://thenoisychannel.com/2009/07/31/sigir-2009-day-3-industry-track-vanja-josifovski/">computational advertising</a>. Search is, in my view, the web&#8217;s killer app, so you can&#8217;t go wrong working on it. But temper your expectations &#8212; despite heroic efforts from various parties, it seems difficult to deliver revolutionary improvements to this field.</p>
<p><strong>2) Social networking.</strong></p>
<p>Here the biggest players are Facebook and Twitter, but you can find a more comprehensive <a href="http://en.wikipedia.org/wiki/List_of_social_networking_websites">list</a> on Wikipedia. Many consider LinkedIn to be a social network, but I&#8217;ll take the liberty of discussing it in its own section. Social networks attract an outsized share of users&#8217; attention: Facebook alone accounts for a <a href="http://weblogs.hitwise.com/heather-dougherty/2010/11/facebookcom_generates_nearly_1_1.html">quarter of US page views</a> on the web! All of this user activity means a lot of data to crunch, so it&#8217;s not surprising that LinkedIn, Facebook, and Twitter are recognized as having the <a href="http://www.quora.com/Which-companies-have-the-best-data-science-teams">best data science teams</a>. How much you&#8217;ll enjoy working at these companies will in part reflect the value (and values) you perceive in their offerings, but they are all playgrounds for data scientists.</p>
<p><strong>3) Electronic commerce.</strong></p>
<p>While ad-supported web search may be the killer app of the web, what opens up people&#8217;s wallets is e-commerce. Led by Amazon and eBay, e-commerce sites deserve much of the credit for turning the web from an esoteric research project into a mainstream staple. And, <a href="http://www.prenhall.com/divisions/bp/app/alter/student/useful/ch1walmart.html">like their offline counterparts</a>, e-commerce sites generate vast amounts of data from how users view and purchase products. This data drives user recommendations, merchandising campaigns, pricing strategy, and much more. If you&#8217;d like to pursue data-driven capitalism, then e-commerce may be for you. A word of caution: if you are one of a crowd of merchants selling the same products as everyone else (as opposed to a site like <a href="http://www.etsy.com/">Etsy</a> selling unique products), make sure you have a sustainable competitive advantage. Data science is necessary for success in e-commerce, but it may not be sufficient.</p>
<p><strong>4) Digital content.</strong></p>
<p>Whether it&#8217;s books, music, video, or apps, the long-prophesied digital convergence has arrived: almost every newly created piece of digital content is now distributed in electronic form. Here the biggest players are Amazon, Apple, and Google (particularly its YouTube subsidiary), but there is still a lot of flux as new hardware, software, and business models compete for dominance. Digital content poses two daunting challenges: the volume of published content far exceeds people&#8217;s available attention, and digital media products are <a href="http://en.wikipedia.org/wiki/Experience_good">experience goods</a> that people can only evaluate after consuming them. For both of these reasons, the digital content industry depends on data scientists to help people find and discover what they like. The catch: from its advent, the digital content industry has struggled with unauthorized distribution (aka piracy), and the results of this struggle will determine which business models are viable.</p>
<p><strong>5) Finance.</strong></p>
<p><a href="http://www.youtube.com/watch?v=ETxmCCsMoD0">Money, money, money.</a> Working in finance has always been a data-intensive business, but advances in technology have only increased the industry&#8217;s reliance on data scientists. <a href="http://en.wikipedia.org/wiki/Algorithmic_trading">Algorithmic trading</a> &#8212; and <a href="http://en.wikipedia.org/wiki/High-frequency_trading">high-frequency trading</a> in particular &#8212; means that those who can most effectively and efficiently mine financial data can derive enormous financial benefits. Finance isn&#8217;t for everyone &#8212; the hours are long, the stress is high, and the compensation is highly variable. That said, the financial upside can be quite compelling, and some even enjoy the lifestyle.</p>
<p><strong>6) Public sector.</strong></p>
<p>Given the libertarian leanings of the software industry, the public sector might not seem like an obvious career choice. But some of the largest repositories of data reside there&#8211;from public repositories like <a href="http://www.census.gov/">census</a> data to highly classified repositories restricted to the <a href="http://www.urbandictionary.com/define.php?term=Three-letter+Agencies">TLAs</a>. Better understanding of this data can improve public policy, national security, and much more. Not everyone has the temperament to deal with government bureaucracy, but those who do have the opportunity to turn big data into big public good.</p>
<p><strong>7) LinkedIn.</strong></p>
<p>OK, I&#8217;m being self-serving, but after all this is my blog! LinkedIn is widely recognized as having one of the top data science teams on the planet. But LinkedIn has more than just talent &#8212; it has what Pete Warden of ReadWriteWeb described in &#8220;<a href="http://www.readwriteweb.com/hack/2010/11/secrets-of-the-linkedin-data-scientists.php">Secrets of the LinkedIn Data Scientists</a>&#8221; as &#8220;detailed information on millions of people who are motivated to keep their profiles up-to-date, collect a rich network of connections and have a strong desire from their users for more tools to help them in their professional lives.&#8221; Indeed, I don&#8217;t know of anyone who has a dataset that competes with the combined quantity, quality, and utility of LinkedIn&#8217;s data. Moreover, working as a data scientist at LinkedIn means helping make people more professionally successful by connecting them to opportunities, information, and of course other people. It&#8217;s a wonderful way to create value, and it doesn&#8217;t hurt to do so in the context of a <a href="http://www.businessinsider.com/linkedin-looks-to-almost-double-headcount-in-2010-2010-6">profitable, rapidly growing company</a>.</p>
<p>And LinkedIn recognizes the extraordinary value of data science. Don&#8217;t take my word for it &#8212; listen to LinkedIn CEO Jeff Weiner&#8217;s <a href="http://www.youtube.com/v/unnQOEuAG8o">interview</a> at the 2010 Web 2.0 Summit.</p>
<p>To wrap up, data science is more than just an opportunity to have fun and make the world a better place &#8212; it might even be how you make an honest living!</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2011/01/04/so-you-like-big-data/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2011/01/04/so-you-like-big-data/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Reflecting on 2010: Searching for Answers</title>
		<link>http://thenoisychannel.com/2010/12/30/reflecting-on-2010-searching-for-answers/</link>
		<comments>http://thenoisychannel.com/2010/12/30/reflecting-on-2010-searching-for-answers/#comments</comments>
		<pubDate>Fri, 31 Dec 2010 01:29:29 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3397</guid>
		<description><![CDATA[Yes, it&#8217;s that time of year when we take a moment to reflect on the past year&#8217;s accomplishments and muse about what the next year will bring. Other than milder weather! I began this year as a Noogler and leave it as a Xoogler. I hope I left Google better than I found it &#8212; [...]]]></description>
			<content:encoded><![CDATA[<p>Yes, it&#8217;s that time of year when we take a moment to reflect on the past year&#8217;s accomplishments and muse about what the next year will bring. Other than <a href="http://www.wunderground.com/US/CA/Mountain_View.html">milder weather</a>!</p>
<p>I began this year as a <a href="http://www.urbandictionary.com/define.php?term=noogler">Noogler</a> and leave it as a <a href="http://www.linkedin.com/groups?home=&amp;gid=73619">Xoogler</a>. I hope I left Google better than I found it &#8212; I&#8217;m certainly proud of the improvements my team made to the quality of <a href="http://www.seobythesea.com/?p=245">local authority pages</a>. I also tried to infuse Google with some of the scrappy start-up culture I&#8217;d picked up at <a href="http://www.endeca.com/">Endeca</a>, particularly focusing on the hiring process. In information retrieval terms, I&#8217;d say that Google&#8217;s hiring process does extremely well when it comes to <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Precision">precision</a>, but could use improvement in the areas of <a href="http://en.wikipedia.org/wiki/Precision_and_recall#Recall">recall</a> and efficiency. Still, I&#8217;m impressed at how well Google has maintained its quality standards as the company has grown. Finally, I couldn&#8217;t help being an extrovert: I developed warm relationships with the lead <a href="http://thenoisychannel.com/2009/12/05/blogs-i-read-living-la-vida-local/">bloggers covering local search</a>, including <a href="http://www.localseoguide.com/about-me/">Andrew Shotland</a>, <a href="http://www.davidmihm.com/">David Mihm</a>, <a href="http://twitter.com/#!/golander59">Gib Olander</a>, <a href="http://gesterling.wordpress.com/about/">Greg Sterling</a>, and <a href="http://www.blumenthals.com/index.php?MikeBlumenthal">Mike Blumenthal</a>. Indeed, when I announced my departure, Mike wrote a <a href="http://blumenthals.com/blog/2010/12/03/daniel-tunkelang-leaving-google-maps-to-join-linkedin/">really nice post</a> about the friendship we cultivated over the past year. I hope that he continues to have such relationships with my former co-workers.</p>
<p>Looking back at what was <a href="http://thenoisychannel.com/2010/01/03/search-questions-for-2010-whats-on-my-mind/">on my mind when this year began</a>, I had lots of questions around exploratory, mobile, real-time, and social/collaborative search. I also wondered whether it was possible to offer more transparency in relevance ranking without losing ground in the battle against spam and black-hat SEO.</p>
<p>I&#8217;m as bullish as ever on the value of <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a>:  part of why I <a href="http://thenoisychannel.com/2010/12/03/follow-the-data/">joined LinkedIn</a> is that a significant fraction of the site&#8217;s value comes from supporting users&#8217; exploratory search needs. I also published a position paper at the <a href="http://www.mansci.uwaterloo.ca/~msmucker/publications/simint10proceedings.pdf">SIGIR 2010 Workshop on Simulation of Interaction</a> proposing the use of <a href="http://thenoisychannel.com/2010/05/23/estimating-the-query-difficulty-for-information-retrieval/">query performance prediction</a> to model the fidelity of communication between user and system, thus helping <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> researchers to simulate query refinement with standard test collections. And of course exploratory search was a major theme at the <a href="http://sites.google.com/site/hcirworkshop/hcir-2010">HCIR 2010</a> workshop, not only providing the basis for the first <a href="http://sites.google.com/site/hcirworkshop/hcir-2010/challenge">HCIR Challenge</a>, but even extending to new territory with Max Wilson and David Elsweiler&#8217;s work on <a href="http://www.slideshare.net/gingdotslideshare/hcir2010-casualleisure-search">casual leisure searching</a>.</p>
<p>As for mobile search, I&#8217;d say that 2010 has been the year of &#8220;<a href="http://www.avc.com/a_vc/2010/09/mobile-first-web-second.html">mobile first</a>&#8220;. Thanks to a <a href="http://googlemobile.blogspot.com/2009/12/android-dogfood-diet-for-holidays.html">generous gift</a> from my former employer, I&#8217;ve become a regular user of the mobile web&#8211;and of search in particular. To my surprise, the communication bottleneck has not been screen real estate, but rather the difficulty of entering text. And innovative approaches like <a href="http://www.youtube.com/watch?v=laOlkD8LmZw">voice search</a> and <a href="http://www.swypeinc.com/">Swype</a> go a long way to mitigate that difficulty.</p>
<p>On to real-time search. Not surprisingly, my favorite innovation in this space is <a href="http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/">LinkedIn Signal</a>, which offers exploratory search for Twitter. I still struggle to find <a href="http://thenoisychannel.com/2010/01/18/real-time-search-is-personal/">use cases</a> that emphasize the &#8220;real-time&#8221; aspect of Twitter and other microblogging services, but I am convinced that the path to utility lies in tools that support organization, analysis, and exploration.</p>
<p>On the social/collaborative front, I&#8217;m happy to work for a company whose charter includes &#8220;supporting mediated search by linking people to people, rather than directly to information&#8221;.  While the biggest event in this space in 2010 was Facebook&#8217;s introduction of the <a href="http://developers.facebook.com/docs/reference/plugins/like">Like button</a>, I&#8217;m not convinced that &#8220;likes&#8221; have supplanted links. I&#8217;m still looking to niche players like <a href="http://thenoisychannel.com/2009/05/27/topsy-tippling-the-stream-of-conversations/">Topsy</a> and <a href="http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/">Blekko</a> to push innovation in this space.</p>
<p>Speaking of Blekko, they&#8217;ve made an impressive attempt to increase the transparency of relevance ranking. But, <a href="http://thenoisychannel.com/2010/03/07/google-and-transparency/">as I blogged earlier this year</a>, I think that, at least for the time being, Google is making the right decision to keep some of its details secret. Now that web search is essentially a <a href="http://www.bing.com/community/site_blogs/b/search/archive/2009/07/29/exciting-times-for-bing-and-yahoo.aspx">duopoly</a> (at least in the US), I believe the real test of the value of transparency to users will be whether one of the two parties employs it as a competitive differentiator.</p>
<p>What&#8217;s in store for 2011? LinkedIn CEO <a href="http://www.linkedin.com/in/jeffweiner08">Jeff Weiner</a> has a vision of using data science to provide a &#8220;<a href="http://blogs.wsj.com/venturecapital/2010/06/10/after-first-year-as-linkedins-ceo-jeff-weiner-talks-shop/">Pandora for people</a>&#8221;, and that&#8217;s a vision I&#8217;m eager to help realize. Not surprisingly, when I blogged in 2008 about <a href="http://thenoisychannel.com/2008/08/07/where-google-isnt-good-enough/">where Google wasn&#8217;t good enough</a>, two of the four areas I cited were finding jobs and finding employees. Even then I recognized that LinkedIn was the best at both. But LinkedIn can be so much more, and I am looking forward to working with an incredible team and incredible data on a delightful set of information science challenges.</p>
<p>Happy New Year! I hope that 2011 brings you great answers &#8212; and great questions!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/12/30/reflecting-on-2010-searching-for-answers/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The Secret May Be To Keep Fewer Secrets</title>
		<link>http://thenoisychannel.com/2010/12/22/the-secret-may-be-to-keep-fewer-secrets/</link>
		<comments>http://thenoisychannel.com/2010/12/22/the-secret-may-be-to-keep-fewer-secrets/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 05:38:25 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3392</guid>
		<description><![CDATA[In light of the recent WikiLeaks saga and the various leaks that have plagued my former employer, I was musing the other day about whether leaks are inevitable as an organization grows. I started off by considering a model where each individual in an organization leaks a particular piece of sensitive information with a constant [...]]]></description>
			<content:encoded><![CDATA[<p>In light of the recent <a href="http://en.wikipedia.org/wiki/WikiLeaks#US_diplomatic_cables_leak">WikiLeaks saga</a> and the various leaks that have plagued my former employer, I was musing the other day about whether leaks are inevitable as an organization grows.</p>
<p>I started off by considering a model where each individual in an organization leaks a particular piece of sensitive information with a constant probability p, and where acts of leakage are independent and identically distributed events. Now let&#8217;s consider what value of p leads to a 99% probability of leakage in an organization of n = 20,000 people. It&#8217;s <a href="http://www.google.com/search?q=1+/+(1+-+(.01+^+(1/20000)))">less than 1/4000</a>. In other words, even if each person in an organization can keep a secret with 99.98% reliability, almost all secrets will be leaked.</p>
<p>Using this same value of p with n = 900 (roughly the size of my current employer) yields less than a 20% chance of leakage &#8212; certainly not a zero probability, but much closer to zero than to one. And at n = 90 &#8212; the upper end of what I&#8217;d consider a startup &#8212; the probability of leakage drops to 2%. Based on this crude analysis, the ability to keep secrets drops very rapidly as organizations enjoy the growth that comes with success.</p>
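<p>For anyone who wants to check the arithmetic, here is a minimal sketch of the model above. It assumes exactly what the post assumes: each of n people leaks independently with the same probability p. The function names are mine and purely illustrative.</p>

```python
def per_person_leak_probability(n, target):
    """Solve 1 - (1 - p)**n = target for p (independent, identical leakers)."""
    return 1.0 - (1.0 - target) ** (1.0 / n)


def leak_probability(p, n):
    """Probability that at least one of n independent people leaks."""
    return 1.0 - (1.0 - p) ** n


# Value of p that gives a 99% chance of a leak among 20,000 people.
p = per_person_leak_probability(20_000, 0.99)
print("1/p is roughly", round(1 / p))

# Reuse that same p for smaller organizations.
for n in (90, 900, 20_000):
    print(n, round(leak_probability(p, n), 3))
```

<p>Holding p fixed and varying only n is the whole experiment; any correlation between p and n is deliberately left out of this sketch.</p>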
<p>Moreover, p is likely to be positively correlated with n &#8212; that is, individuals in larger organizations are more likely to leak sensitive information. Many people in larger organizations have a smaller actual and perceived stake in the organization&#8217;s success than those in smaller ones. Also, it is difficult to sustain grueling hiring standards &#8212; particularly cultural ones &#8212; as an organization grows.</p>
<p>So what is an organization to do? If the above model is even close to accurate, then I can see four options:</p>
<p><strong>1) Don&#8217;t grow.</strong></p>
<p>Yes, I&#8217;m serious. Not every idea inspires a billion-dollar business, and not every company should grow beyond a hundred people. Growth has costs that offset its benefits, and the inability to keep secrets may be a significant cost for organizations whose competitive advantage depends on proprietary intellectual property. The <a href="http://en.wikipedia.org/wiki/Hedge_fund#Notable_hedge_fund_firms">largest hedge funds</a> each have about 1,000 employees, and most are much smaller. Secrecy is not the only consideration, but it&#8217;s certainly a consideration.</p>
<p><strong>2) Share less with your employees.</strong></p>
<p>If you can&#8217;t reduce p, you can at least reduce n by sharing secrets less widely. Traditional organizations only share sensitive information within a tight inner circle. Even Google, known for sharing almost everything with its employees, keeps tighter control over the details of search result ranking. This approach, however, comes at a cost: it signals to employees that they cannot be trusted. Moreover, if employees discover secret information through rumor, they may feel less responsible for maintaining secrecy than if they had been entrusted with that information.</p>
<p><strong>3) Investigate leaks and punish leakers.</strong></p>
<p>Some organizations succeed better than others at rooting out leakers and punishing them. In economic terms, it makes sense to discourage undesirable behavior through strong disincentives. Note, however, that leakers rarely gain anything tangible in exchange for their leaks and indeed are often acting irrationally in strictly economic terms. People in general have been known to act <a href="http://www.amazon.com/gp/product/B002C949KE">irrationally</a>. So I&#8217;d caution against any approach that assumes human rationality. A better approach may be to detect or prevent leaks through technology (e.g., <a href="http://en.wikipedia.org/wiki/Packet_analyzer">packet analyzers</a>), but see the previous comment about making employees feel they cannot be trusted.</p>
<p><strong>4) Keep fewer secrets.</strong></p>
<p>A prominent CEO recently said &#8220;<a href="http://www.cnbc.com/id/15840232?play=1&amp;video=1372176413">If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place</a>&#8221;. Yes, I&#8217;m taking the quotation out of context, but I&#8217;d like to offer a variant: if your organization&#8217;s success depends on something that you don’t want anyone to know, maybe you should reconsider your business model. Less glibly, you should avoid unnecessary dependence on secrecy, and you should avoid labeling all corporate information as secret, since that desensitizes employees to the risks of disclosure.</p>
<p>Conclusion? As Ben Franklin said, “Three may keep a secret, if two of them are dead.” Organizations can and do manage to keep secrets. But it&#8217;s hard to fight human nature, and better not to rely on winning that fight.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/12/22/the-secret-may-be-to-keep-fewer-secrets/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>CIKM 2011 Industry Event</title>
		<link>http://thenoisychannel.com/2010/12/17/cikm-2011-industry-event/</link>
		<comments>http://thenoisychannel.com/2010/12/17/cikm-2011-industry-event/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 17:10:15 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3386</guid>
		<description><![CDATA[CIKM 2011 is nearly a year away, but I wanted to give folks a heads up about the Industry Event there that I am organizing with Tony Russell-Rose. These events have become an increasingly important part of the annual CIKM and SIGIR conferences, and I believe they are helping to bridge the gap between scholarship [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cikm2011.org/node/20"><img class="alignnone" title="CIKM 2011" src="http://www.cikm2011.org/sites/default/files/cikm2011_craigm_v1_logo.jpg" alt="" width="576" height="91" /></a></p>
<p><a href="http://www.cikm2011.org/">CIKM 2011</a> is nearly a year away, but I wanted to give folks a heads up about the <a href="http://www.cikm2011.org/node/20">Industry Event</a> there that I am organizing with <a href="http://isquared.wordpress.com/about/">Tony Russell-Rose</a>. These events have become an increasingly important part of the annual CIKM and <a href="http://www.sigir.org/">SIGIR</a> conferences, and I believe they are helping to bridge the gap between scholarship and practice. When I organized the <a href="http://www.sigir2009.org/Program/industry">SIGIR 2009 Industry Event</a>, it was almost too popular &#8212; I felt bad for the parallel research presentations that had to compete with <a href="http://www.mattcutts.com/blog/about-me/">Matt Cutts</a> and <a href="http://www.danah.org/">danah boyd</a> for attendees!</p>
<p>But not so bad that I wouldn&#8217;t do it again! We have an outstanding line-up of invited talks for the CIKM 2011 Industry Event, featuring:</p>
<ul>
<li><a href="http://research.microsoft.com/en-us/people/robertson/">Stephen Robertson</a> (Microsoft Research)</li>
<li><a href="http://www.freebase.com/view/en/john_giannandrea">John Giannandrea</a> (Google)</li>
<li><a href="http://jeffhammerbacher.com/">Jeff Hammerbacher</a> (Cloudera)</li>
<li><a href="http://www.jacksonpeter.com/">Peter Jackson</a> (Thomson Reuters)</li>
</ul>
<p>For those not familiar with industry luminaries, that list includes one of the world&#8217;s most prominent <a href="http://en.wikipedia.org/wiki/Information_retrieval#Major_figures">information retrieval researchers</a>, the founder of Metaweb (which created <a href="http://www.freebase.com/">Freebase</a>), the person who built Facebook&#8217;s data team (which developed <a href="http://wiki.apache.org/hadoop/Hive">Hive</a> and <a href="http://cassandra.apache.org/">Cassandra</a>), and one of the leading industrial researchers on <a href="http://www.jacksonpeter.com/nlp">natural language processing</a>. To borrow a sports metaphor, these were our first-round draft picks, and we are delighted that they all agreed to participate.</p>
<p>And those are just the keynotes! We&#8217;re also going to put out a call for participation soon, so watch this space!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/12/17/cikm-2011-industry-event/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First Week</title>
		<link>http://thenoisychannel.com/2010/12/11/first-week/</link>
		<comments>http://thenoisychannel.com/2010/12/11/first-week/#comments</comments>
		<pubDate>Sat, 11 Dec 2010 20:35:43 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3382</guid>
		<description><![CDATA[It&#8217;s hardly surprising, at least in retrospect, that location-based social networking company Foursquare was founded (twice!) in New York City. Where else (at least in the United States) are there so many people with so many places to go and so many ways to get there? I&#8217;m not a social or environmental determinist, but clearly [...]]]></description>
			<content:encoded><![CDATA[<p>It&#8217;s hardly surprising, at least in retrospect, that location-based social networking company <a href="http://en.wikipedia.org/wiki/Foursquare_(social_networking) ">Foursquare</a> was founded (<a href="http://en.wikipedia.org/wiki/Dodgeball_(service) ">twice</a>!) in New York City. Where else (at least in the United States) are there so many people with so many places to go and so many ways to get there? I&#8217;m not a social or environmental determinist, but clearly a startup needs hospitable conditions to thrive.</p>
<p>Having just started my new life as a citizen of Silicon Valley, I&#8217;ve quickly comprehended how it is the perfect birthplace for LinkedIn. Every introduction has been an exercise of <a href="http://en.wikipedia.org/wiki/Triadic_closure">triadic closure</a>. Indeed, while most people know that the Bay Area is the world&#8217;s leading hub for technology startups, perhaps not everyone realizes that the foundation for this environment is the professional network that binds it. I&#8217;ve only been here for a week, and yet my world seems smaller by the day as I keep discovering new connections among my colleagues. It&#8217;s a lot of fun, if a bit overwhelming!</p>
<p>And fun but overwhelming is a great way to describe LinkedIn itself. It&#8217;s only been a few days since I updated my <a href="http://www.linkedin.com/in/dtunkelang">profile</a>, but I already feel immersed in LinkedIn&#8217;s vibrant culture. I sit in an open office, surrounded by people I work with &#8212; data scientists, software engineers, product managers, designers, and more. And I&#8217;m already interviewing folks I might be working with soon &#8212; in a company <a href="http://www.linkedin.com/company/linkedin/statistics">growing as quickly as LinkedIn</a>, it is everyone&#8217;s job to grow the team. I&#8217;ve joked to friends that moving west gave me three more hours to get work done &#8212; but I&#8217;m using them all and they&#8217;re not enough!</p>
<p>But despite this explosive growth, LinkedIn&#8217;s vision is shared and tight. We all know that our goal is to connect the world’s professionals to make them more productive and successful. Having such a clear-cut mission enables us to directly relate all of our efforts and ambitions to the concrete value they create. It&#8217;s a great feeling, and it helps me keep my sanity as I observe the size of my ever-increasing to-do list.</p>
<p>To say that I&#8217;m still adjusting is an understatement. I haven&#8217;t made a change like this in over a decade, and this adventure feels even more immersive. But a big difference between now and 1999 is that I arrive in my new world with a network of people there to welcome me. I have LinkedIn to thank for helping me develop that network, and it&#8217;s great to finally have the opportunity to give back.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/12/11/first-week/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Follow The Data</title>
		<link>http://thenoisychannel.com/2010/12/03/follow-the-data/</link>
		<comments>http://thenoisychannel.com/2010/12/03/follow-the-data/#comments</comments>
		<pubDate>Fri, 03 Dec 2010 05:00:06 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3372</guid>
		<description><![CDATA[Today is my last day at Google. I have enjoyed an incredible year there, during which I&#8217;ve had the privilege to work with some of the smartest engineers on the planet. Working at Google taught me how much impact a handful of dedicated people can have on the lives of billions of users. Not that [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.linkedin.com/"><img class="alignnone size-medium wp-image-3374" title="LinkedIn" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/12/linkedin-logo1-300x84.png" alt="" width="300" height="84" /></a></p>
<p>Today is my last day at Google. I have enjoyed an incredible year there, during which I&#8217;ve had the privilege to work with some of the smartest engineers on the planet. Working at Google taught me how much impact a handful of dedicated people can have on the lives of billions of users. Not that long ago, I <a href="http://thenoisychannel.com/2009/01/08/google-tech-talk-reconsidering-relevance/">compared Google to McDonald&#8217;s</a>. Having spent time on the inside, I can attest that Google is a marvel of <a href="http://books.google.com/books?id=ebKsbjM6h5AC&amp;lpg=PA35&amp;ots=wYUlW5xGlI&amp;pg=PA20">scale orchestration</a>. Moreover, the <a href="http://www.google.com/intl/en/jobs/uslocations/new-york/more/">Google New York</a> office represents an impressive concentration of Google&#8217;s talent in the greatest city of the world.</p>
<p>But I am leaving Google to pursue the opportunity of a lifetime. On Monday, I will start a new chapter of my life. I am joining the <a href="http://www.quora.com/Which-companies-have-the-best-data-science-teams">data scientist</a> team at <a href="http://www.linkedin.com/">LinkedIn</a>, where I&#8217;ll be working with <a href="http://www.linkedin.com/in/dpatil">DJ Patil</a> and his world-class team to build products and discover insights from a data collection that I have coveted for years. I&#8217;ll get to work with folks like <a href="http://www.linkedin.com/in/peterskomoroch">Pete Skomoroch</a> and <a href="http://www.linkedin.com/in/mrogati">Monica Rogati</a>. And I&#8217;ll get to tackle challenges in my favorite areas of computer science: information extraction, matching, recommendation, social network analysis, and network visualization. Not to mention working with one of the largest <a href="http://sna-projects.com/blog/2010/07/linkedin-faceted-search/">faceted search</a> deployments on the web!</p>
<p>It was an agonizing decision to leave Google and New York City. But, when LinkedIn reached out to me a couple of months ago, I was reminded of a fateful email from <a href="http://www.endeca.com/about-us-leadership-team.htm">Steve Papa</a> in July 1999 that led me to pack a bag two months later and begin the adventure that is now <a href="http://www.endeca.com/">Endeca</a>. LinkedIn is hardly a startup &#8212; it has <a href="http://press.linkedin.com/faq">over 600 employees and over 80 million members</a>. But I see boundless opportunities to create new value from the great data and talent that LinkedIn has assembled. So, when I received that note from LinkedIn, I didn&#8217;t really have a choice.</p>
<p>This Monday, I begin a new adventure. Data, here I come!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/12/03/follow-the-data/feed/</wfw:commentRss>
		<slash:comments>62</slash:comments>
		</item>
		<item>
		<title>Giving Thanks as an Information Scientist</title>
		<link>http://thenoisychannel.com/2010/11/25/giving-thanks-as-an-information-scientist/</link>
		<comments>http://thenoisychannel.com/2010/11/25/giving-thanks-as-an-information-scientist/#comments</comments>
		<pubDate>Fri, 26 Nov 2010 02:20:33 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3367</guid>
		<description><![CDATA[As a first-generation American who is married to a card-carrying Native American, I celebrate Thanksgiving the traditional way: a day of gluttony followed by yummy leftovers. But, trite as it may be, I do like to take the time to reflect on the countless things for which I am thankful. A wonderful family, of course, [...]]]></description>
			<content:encoded><![CDATA[<p>As a first-generation American who is married to a <a href="http://shop.cafepress.com/yurok">card-carrying Native American</a>, I celebrate Thanksgiving the traditional way: a day of <a href="http://www.youtube.com/watch?v=Rp4yWTLIPaE">gluttony</a> followed by yummy leftovers. But, trite as it may be, I do like to take the time to reflect on the countless things for which I am thankful. A wonderful <a href="http://www.flickr.com/photos/24264445@N05/">family</a>, of course, but also the great fortune to live in an age where some of the subjects that I find most intellectually stimulating have become highly relevant to our practical daily lives.</p>
<p>Consider <a href="http://en.wikipedia.org/wiki/Information_retrieval">information retrieval</a>. Perhaps I&#8217;m dating myself, but as an undergraduate computer science major, I hardly imagined that information retrieval would have much significance outside of academia. Sure, there were commercial IR systems being built in the 1980s, but it wasn&#8217;t until the late 1990s that web search brought IR to the mainstream. Today, it&#8217;s hard to imagine studying computer science without learning about IR. Sure, my <a href="http://www.linkedin.com/in/dtunkelang">career</a> makes me a tad biased, but it is undeniable that information retrieval is one of the defining problems of our generation.</p>
<p>And then there are <a href="http://en.wikipedia.org/wiki/Social_network">social networks</a>. When I studied <a href="http://en.wikipedia.org/wiki/Graph_drawing">graph drawing</a> in the 1990s, the canonical example of a social network was &#8220;<a href="http://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon">Six Degrees of Kevin Bacon</a>&#8221;. Sure, many of my peers would talk about their <a href="http://en.wikipedia.org/wiki/Erd%C5%91s_number">Erdős numbers</a> (they were more discreet about their placement in the <a href="http://shand.pagesperso-orange.fr/memoires/tarjan.html">Tarjan graph</a>), but the study of social networks was surely an academic pursuit. Who would have imagined that, barely a decade later, a movie entitled <em><a href="http://www.imdb.com/title/tt1285016/">The Social Network</a></em> would be a blockbuster grossing <a href="http://boxofficemojo.com/movies/?id=socialnetwork.htm">$175M</a>? Leaving aside Hollywood, social networks have become a significant part of our daily lives. Not only do Facebook, Twitter, and LinkedIn account for a <a href="http://blog.nielsen.com/nielsenwire/online_mobile/what-americans-do-online-social-media-and-games-dominate-activity/">large fraction of our time online</a>, but they also affect our offline personal and professional lives.</p>
<p>From childhood, I&#8217;ve been interested in mathematics, computer science, and psychology. Living in an age of information retrieval and social networks means that I can apply these interests in my daily work. Today I give thanks for being born at the right place and right time, blessed with a lifetime of interesting and practical problems to solve. Happy Thanksgiving to all, and enjoy the leftovers!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/11/25/giving-thanks-as-an-information-scientist/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>An Information Cascade</title>
		<link>http://thenoisychannel.com/2010/11/17/an-information-cascade/</link>
		<comments>http://thenoisychannel.com/2010/11/17/an-information-cascade/#comments</comments>
		<pubDate>Wed, 17 Nov 2010 12:38:46 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3360</guid>
		<description><![CDATA[I&#8217;ve been reading Networks, Crowds, and Markets, a great textbook by David Easley and Jon Kleinberg. I&#8217;m very grateful to Cambridge University Press for surprising me with an unsolicited review copy. I&#8217;m more than halfway through its 700+ pages. Much of the material is familiar in this &#8220;interdisciplinary look at economics, sociology, computing and information [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been reading <em><a href="http://www.cs.cornell.edu/home/kleinber/networks-book/">Networks, Crowds, and Markets</a></em>, a great textbook by <a href="http://www.arts.cornell.edu/econ/deasley/">David Easley</a> and <a href="http://www.cs.cornell.edu/home/kleinber/">Jon Kleinberg</a>. I&#8217;m very grateful to Cambridge University Press for surprising me with an unsolicited review copy. I&#8217;m more than halfway through its 700+ pages. Much of the material is familiar in this &#8220;interdisciplinary look at economics, sociology, computing and information science, and applied mathematics to understand networks and behavior&#8221;. But I&#8217;m delighted by much that is new to me, including a particularly elegant description of an <a href="http://en.wikipedia.org/wiki/Information_cascade">information cascade</a>.</p>
<p>I excerpt the following example from section 16.2, which the authors in turn  borrow from <a href="http://lrande.people.wm.edu/">Lisa Anderson</a> and <a href="http://people.virginia.edu/~cah2k">Charles Holt</a>:</p>
<blockquote><p>The experimenter puts an urn at the front of the room with three marbles in it; she announces that there is a 50% chance that the urn contains two red marbles and one blue marble, and  a 50% chance that the urn contains two blue marbles and one red marble&#8230;one by one, each student comes to the front of the room and draws a marble from the urn; he looks at the color and then places it back in the urn without showing it to the rest of the class. The student then guesses whether the urn is majority-red or majority-blue and publicly announces this guess to the class.</p></blockquote>
<p>Let&#8217;s simulate how a set of rational students would perform in this experiment.</p>
<p>The first student has it easy: if he selects a blue marble, he guesses blue; if he selects a red marble, he guesses red. Either way, his guess publicly discloses the first marble&#8217;s color.</p>
<p>Thus the second student knows exactly the colors of the first two selected marbles. If he selects the same color as the first student, he will make the same guess. If, however, the second student selects the other color, he has no reason to prefer one color over the other. Let&#8217;s assume that, when the odds are 50/50, an indifferent student breaks symmetry by guessing the color in his hand. That way, we guarantee that the second student discloses the color of the marble he selects.</p>
<p>Things get interesting with the third student&#8217;s selection. What happens if the first two students have both guessed red, but the third student selects a blue marble? Rationally, the third student will guess red, since he knows that two of the first three selected marbles were red. In fact, if the first two students select red marbles, <em>every</em> subsequent student will ignore his own selection and guess red. Of course, analogous reasoning applies if we reverse the colors.</p>
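<p>The third student&#8217;s reasoning is just Bayes&#8217; rule. As a minimal sketch, assuming the 50/50 prior from the experiment and draws with replacement (so a majority-red urn yields red with probability 2/3), the posterior can be computed exactly. The function name is illustrative only.</p>

```python
from fractions import Fraction


def posterior_majority_red(reds, blues):
    """P(urn is majority-red | observed draws), with a 50/50 prior."""
    # Likelihood of the observed draws under each hypothesis about the urn.
    like_red_urn = Fraction(2, 3) ** reds * Fraction(1, 3) ** blues
    like_blue_urn = Fraction(1, 3) ** reds * Fraction(2, 3) ** blues
    # The equal priors cancel, leaving a ratio of likelihoods.
    return like_red_urn / (like_red_urn + like_blue_urn)


print(posterior_majority_red(2, 1))  # 2/3: two reds outweigh one blue
print(posterior_majority_red(1, 1))  # 1/2: one of each is a coin flip
```

<p>With two reds and one blue the posterior still favors red, which is why the third student&#8217;s own blue marble does not change his guess.</p>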
<p>Generalizing from this case, we can see that the sequence of guesses locks in on a single color as soon as two consecutive students agree. I leave it as an exercise to the reader to determine that, if the urn is majority-red, there is a 16/21 probability that the sequence will converge to red and a 5/21 probability that it will converge to blue.</p>
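<p>For readers who would rather check the exercise than do it, here is a rough sketch. It takes the lock-in rule stated above at face value: each guess mirrors the student&#8217;s own marble until two consecutive guesses agree, and that color then wins. A different lock-in rule would give different numbers. The function names are mine, not from the book.</p>

```python
from fractions import Fraction
import random


def p_lock_red(r):
    """Exact probability that the guesses lock in on red, where r = P(red draw).

    Assumed rule: guesses mirror draws until two consecutive guesses agree.
    """
    after_red = r / (1 - r * (1 - r))  # last guess was red, not yet locked
    after_blue = r * after_red         # last guess was blue, not yet locked
    return r * after_red + (1 - r) * after_blue


def simulate_lock_red(trials, seed=0):
    """Monte Carlo check of the same rule for a majority-red urn."""
    rng = random.Random(seed)
    red_locks = 0
    for _ in range(trials):
        previous = None
        while True:
            draw = rng.choice("RRB")  # two red marbles, one blue
            if draw == previous:      # two consecutive agreeing guesses
                red_locks += draw == "R"
                break
            previous = draw
    return red_locks / trials


print(p_lock_red(Fraction(2, 3)))      # 16/21
print(1 - p_lock_red(Fraction(2, 3)))  # 5/21
print(simulate_lock_red(20_000))
```

<p>The exact value comes from a two-state recursion on the color of the most recent guess; the simulation is only a sanity check on that algebra.</p>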
<p>A 5/21 probability of arriving at the wrong answer may not seem so bad. But imagine if you could see the actual marbles sampled and not just the guesses (i.e., each student provides an independent signal). The <a href="http://en.wikipedia.org/wiki/Law_of_large_numbers">law of large numbers</a> kicks in quickly, and the probability of the sample majority color being different from the true majority converges to 0.</p>
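<p>To put a number on &#8220;quickly&#8221;, here is a small sketch, assuming n independent draws with replacement from a majority-red urn and an odd n so that ties cannot occur. It computes the exact binomial probability that blue is the sample majority. The function name is illustrative.</p>

```python
from fractions import Fraction
from math import comb


def p_wrong_majority(n, r=Fraction(2, 3)):
    """P(blue is the majority among n independent draws), n odd, r = P(red)."""
    # Sum the binomial probabilities over every count k of blue draws
    # that is a strict majority of the n draws.
    return sum(
        comb(n, k) * (1 - r) ** k * r ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


for n in (1, 3, 11, 51, 101):
    print(n, float(p_wrong_majority(n)))
```

<p>Unlike the cascade, where the error probability stays fixed no matter how many students guess, this one keeps shrinking as n grows.</p>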
<p>This example of an information cascade is unrealistically simple, but is eerily suggestive of the way many sequential decision processes work. I hope we all see it as a cautionary tale. The <a href="http://en.wikipedia.org/wiki/Wisdom_of_the_crowd">wisdom of the crowd</a> breaks down when we throw away the independent signals of its participants.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/11/17/an-information-cascade/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
		<item>
		<title>The Element of Surprise</title>
		<link>http://thenoisychannel.com/2010/11/07/the-element-of-surprise/</link>
		<comments>http://thenoisychannel.com/2010/11/07/the-element-of-surprise/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 23:09:19 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3353</guid>
		<description><![CDATA[Surprise is not a word that user interface designers typically like to hear. Indeed, the principle of least surprise (also called the principle of least astonishment) is that systems should always strive to act in a way that least surprises the user. Like many interface design principles, the principle of least surprise reflects the premise [...]]]></description>
			<content:encoded><![CDATA[<p>Surprise is not a word that user interface designers typically like to hear. Indeed, the principle of least surprise (also called the <a href="http://en.wikipedia.org/wiki/Principle_of_least_astonishment">principle of least astonishment</a>) is that systems should always strive to act in a way that least surprises the user.</p>
<p>Like many interface design principles, the principle of least surprise reflects the premise that software applications exist to be useful. In utility-oriented applications, surprise means distraction and delay &#8212; negatives that good designers work to avoid.</p>
<p>But we increasingly see applications whose main value to the user is not utility, but entertainment. Indeed, a recent <a href="http://blog.nielsen.com/nielsenwire/online_mobile/what-americans-do-online-social-media-and-games-dominate-activity/">Nielsen report</a> claims that the top two online activities for Americans are social networks / blogs and games. I take the report with a grain of salt, but it seems safe to argue that people have come to expect the internet to be at least as fun as it is useful.</p>
<p>Even search, which would seem to be the poster child for the utility of online services, is being pressed into the service of entertainment. <a href="http://www.cs.swan.ac.uk/~csmax/index.php">Max Wilson</a> and <a href="http://twitter.com/#!/delsweil">David Elsweiler</a> argued as much in their <a href="http://sites.google.com/site/hcirworkshop/hcir-2010">HCIR 2010</a> presentation about &#8220;<a href="http://www.slideshare.net/gingdotslideshare/hcir2010-casualleisure-search">casual leisure searching</a>&#8221;. They mined Twitter to analyze a variety of scenarios where search isn&#8217;t about the user finding something, but rather about enjoying the experience. Indeed, their controversial definition of search is broad enough to include the possibility that the user does not have an information need.</p>
<p>Like the businessman in Antoine de St. Exupery&#8217;s <em><a href="http://gutenberg.net.au/ebooks03/0300771h.html">Le Petit Prince</a></em>, I&#8217;ve long felt that, as &#8220;un homme sérieux&#8221;, my job is delivering utility to users. Users already have lots of ways to waste time; I focus on making their productivity-oriented time more effective and efficient. I&#8217;m glad there are folks who devote their lives to making the rest of us have more fun (especially all the computer scientists who left academia for <a href="http://www.pixar.com/">Pixar</a>), but entertainment simply isn&#8217;t a vocation for me.</p>
<p>However, I&#8217;ve been coming around to the realization that fun and utility are not mutually exclusive. For example, news serves the utilitarian ideal of informing the citizenry, but many (most?) of us read news as a pleasant way to pass the time. Social networks are another example serving a similar function&#8211;perhaps with a balance that tilts more toward the entertainment end of the spectrum, but still providing genuine social utility.</p>
<p>A common feature of both of these examples is that users regularly return to the same site expecting the unexpected. The transient nature of news and social news feeds promises an endless supply of fresh content, produced more quickly than users can consume it. This situation is in stark contrast to those of <a href="http://en.wikipedia.org/wiki/Web_search_query">typical web search queries</a>, for which the results are expected to be largely static. Indeed, we may set up alerts to inform us of novel search results, but we are unlikely to regularly visit a bookmarked search results page the way we regularly visit a news or social network site.</p>
<p>Is novelty the only source of surprise? Novelty certainly helps, but it is not a necessity. An alternative source is randomness. I&#8217;ve known people to use Wikipedia&#8217;s &#8220;<a href="http://en.wikipedia.org/wiki/Special:Random">random article</a>&#8221; feature. But a more plausible place to introduce randomness is in recommendations &#8212; whether for products or content. Since recommendations are good guesses at best, a bit of randomness can help ensure that the guesses are interesting. Indeed, a SIGIR 2010 paper by <a href="http://www.cs.ucl.ac.uk/staff/n.lathia/">Neal Lathia</a>, <a href="http://www.cs.ucl.ac.uk/staff/s.hailes/">Stephen Hailes</a>, <a href="http://www.cs.ucl.ac.uk/staff/l.capra/">Licia Capra</a>, and <a href="http://xavier.amatriain.net/">Xavier Amatriain</a> on &#8220;<a href="http://mobblog.cs.ucl.ac.uk/2010/05/20/temporal-diversity-in-recommender-systems/">Temporal Diversity in Recommender Systems</a>&#8221; explored the use of randomness to induce diversity in recommendations and arrived at the conclusion that people don’t like being recommended the same things over and over again.</p>
<p>Can we generalize from these examples? I think so. For utility-oriented information needs, it is important to provide users with accurate, predictable, and efficient tools. But we can&#8217;t dismiss everything else as frivolous. Sometimes we just need to offer our users a little bit of surprise to keep it interesting.</p>
<p>Or, as <a href="http://www.imdb.com/title/tt0058331/quotes">Mary Poppins</a> tells us: &#8220;In every job that must be done, there is an element of fun. You find the fun, and &#8211; SNAP &#8211; the job&#8217;s a game!&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/11/07/the-element-of-surprise/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>A Question of User Expectations</title>
		<link>http://thenoisychannel.com/2010/10/25/a-question-of-user-expectations/</link>
		<comments>http://thenoisychannel.com/2010/10/25/a-question-of-user-expectations/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 01:47:01 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3349</guid>
		<description><![CDATA[Ideally, a search engine would read the user&#8217;s mind. Shy of that, a search engine should provide the user with an efficient process for expressing an information need and then provide the user with results relevant to that need. From an information scientist&#8217;s perspective, these are two distinct problems to solve in the information [...]]]></description>
			<content:encoded><![CDATA[<p>Ideally, a search engine would read the user&#8217;s mind. Shy of that, a search engine should provide the user with an efficient process for expressing an information need and then provide the user with results relevant to that need.</p>
<p>From an information scientist&#8217;s perspective, these are two distinct problems to solve in the information seeking process: establishing the user&#8217;s information need (query elaboration) and retrieving relevant information (information retrieval).</p>
<p>When open-domain search engines  (i.e., web search engines) went mainstream in the late 1990s, they did so by glossing over the problem of query elaboration and focusing almost entirely on information retrieval. More precisely, they addressed the query elaboration problem by requiring users to provide reasonable queries and search engines to infer information needs from those queries. In recent years, there has been more explicit support for query elaboration&#8211;most notably in the form of type-ahead query suggestions (e.g., <a href="http://www.google.com/instant/">Google Instant</a>). There have also been a variety of efforts to offer related queries as refinements.</p>
<p>But even with such support, query elaboration typically yields an informal, free-text string. All <a href="http://furnas.people.si.umich.edu/Research.Past.html#Vocab">vocabularies</a> have their flaws, but search engines compound the inherent imprecision of language by not even trying to guide users to a common standard. At best, query suggestion nudges users towards more popular&#8211;and hopefully more effective&#8211;queries.</p>
<p>In contrast, consider closed-domain search engines that operate on curated collections, e.g., the catalog search for an ecommerce site. These search engines often provide users with the opportunity to express precise queries, e.g., <a href="http://www.bhphotovideo.com/c/search?ci=9811&amp;N=4291645412+4293918168+4294956965">black digital cameras for under $250</a>. Moreover, well-designed sites offer users <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> interfaces that support progressive query elaboration through guided refinements.</p>
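<p>To make guided refinement concrete, here is a minimal sketch in Python. The catalog records, field names, and helper functions are all hypothetical, invented for illustration rather than taken from any real site: each refinement narrows the result set, and the counts over the remaining items are what the interface would offer as the next set of choices.</p>

```python
from collections import Counter

# Hypothetical catalog records; names, fields, and prices are illustrative only.
CATALOG = [
    {"name": "Camera A", "category": "digital camera", "color": "black", "price": 199.0},
    {"name": "Camera B", "category": "digital camera", "color": "silver", "price": 229.0},
    {"name": "Camera C", "category": "digital camera", "color": "black", "price": 349.0},
    {"name": "Tripod D", "category": "tripod", "color": "black", "price": 49.0},
]

def refine(items, **selections):
    """Keep items that match every selected facet value exactly."""
    return [it for it in items
            if all(it.get(facet) == value for facet, value in selections.items())]

def under(items, max_price):
    """Keep items priced strictly below max_price."""
    return [it for it in items if max_price > it["price"]]

def facet_counts(items, facet):
    """Count the remaining items per facet value, to offer as guided refinements."""
    return Counter(it[facet] for it in items)

# Progressive elaboration: pick a category, inspect the color options, then narrow.
cameras = refine(CATALOG, category="digital camera")
color_options = facet_counts(cameras, "color")
results = under(refine(cameras, color="black"), 250)
```

<p>Here <code>results</code> is the structured form of &#8220;black digital cameras for under $250&#8221;. Every step selects from values that actually occur in the collection, so the user never has to guess at vocabulary.</p>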
<p>Many (though not all) closed-domain search engines have an advantage over their open-domain counterparts: they can rely on manually curated metadata. The scale and heterogeneity of the open web defies human curation. Perhaps we&#8217;ll reach a point when automatic information extraction offers quality competitive with curation, but we&#8217;re not there yet. Indeed, the lack of good, automatically generated metadata has been cited as the top <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">challenge</a> facing those who would implement faceted search for the open web.</p>
<p>What can we do in the meantime? Here is a simple idea: use a closed-domain search engine to guide users to precise queries, and then apply the resulting queries to the open web. In other words, mash up the closed and open collections.</p>
<p>Of course, this is easier said than done. It is not at all clear if or how we can apply a query like &#8220;black digital cameras for under $250&#8221; to a collection that is not annotated with the necessary metadata. But we can certainly try. And our ability to perform information retrieval from structured queries will improve over time&#8211;in fact, it may even improve more quickly if we can start to assume that users are being guided to precise, unambiguous queries.</p>
<p>Even though result quality would be variable, such an approach would at least eliminate a source of uncertainty in the information seeking process: the user would be certain of having a query that accurately represented his or her information need. That is no small victory!</p>
<p>I fear, however, that users might not respond positively to such an interface. Given the certainty that a query accurately represents his or her information need, a user is likely to have higher expectations of result quality than without that certainty. Retrieval errors are harder to forgive when the query elaboration process eliminates almost any chance of misunderstanding. Even if the results were more accurate, they might not be accurate enough to satisfy user expectations.</p>
<p>As an <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> evangelist, I am saddened by this prospect. Reducing uncertainty in any part of the information seeking process seems like it should always be a good thing for the user. I&#8217;m curious to hear what folks here think of this idea.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/10/25/a-question-of-user-expectations/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Pluralistic Ignorance and Bayesian Truth Serum</title>
		<link>http://thenoisychannel.com/2010/10/10/pluralistic-ignorance-and-bayesian-truth-serum/</link>
		<comments>http://thenoisychannel.com/2010/10/10/pluralistic-ignorance-and-bayesian-truth-serum/#comments</comments>
		<pubDate>Sun, 10 Oct 2010 18:39:40 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3337</guid>
		<description><![CDATA[Last week, I had the pleasure of talking with CMU professor George Loewenstein, one of the top researchers in the area of behavioral economics. I mentioned my idea of using prediction markets to address the weaknesses of online review systems and reputation systems, and he offered two insightful pointers. The first pointer was to the [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: center;"><a href="http://capewest.ca/Cartoon_Bayesian_analysis.jpg"><img class="aligncenter" title="Bayesian Analysis" src="http://capewest.ca/Cartoon_Bayesian_analysis.jpg" alt="" width="243" height="259" /></a></p>
<p>Last week, I had the pleasure of talking with CMU professor <a href="http://sds.hss.cmu.edu/src/faculty/loewenstein.php">George Loewenstein</a>, one of the top researchers in the area of <a href="http://en.wikipedia.org/wiki/Behavioral_economics">behavioral economics</a>. I mentioned my idea of <a href="http://thenoisychannel.com/2010/06/09/why-cant-we-just-use-prediction-markets/">using prediction markets</a> to address the weaknesses of online review systems and reputation systems, and he offered two insightful pointers.</p>
<p>The first pointer was to the notion of <a href="http://en.wikipedia.org/wiki/Pluralistic_ignorance">pluralistic ignorance</a>. As summarized on Wikipedia:</p>
<blockquote><p>In social psychology, pluralistic ignorance, a term coined by Daniel Katz and Floyd H. Allport in 1931, describes &#8220;a situation where a majority of group members privately reject a norm, but assume (incorrectly) that most others accept it&#8230;It is, in Krech and Crutchfield’s (1948, pp. 388–89) words, the situation where &#8216;no one believes, but everyone thinks that everyone believes&#8217;&#8221;. This, in turn, provides support for a norm that may be, in fact, disliked by most people.</p></blockquote>
<p>It had not occurred to me that pluralistic ignorance could wreak havoc on the prediction market approach I proposed. Specifically, there is a risk that, even though the majority of participants in the market hold a particular opinion, they suppress their individual opinions and instead vote based on mistaken assumptions about the collective opinion of others. Ironically, these participants are pursuing an optimal strategy, given their pluralistic ignorance. Yet the results of such a market would not necessarily reflect the true collective opinion of participants. Clearly there is a need to incorporate people&#8217;s true opinions into the equation, and not just their beliefs about others&#8217; opinions.</p>
<p>Which leads me to the second resource to which Loewenstein pointed me: a paper by fellow behavioral economist and MIT professor <a href="http://mitsloan.mit.edu/faculty/detail.php?in_spseqno=106">Drazen Prelec</a> entitled &#8220;<a href="http://www.eecs.harvard.edu/cs286r/papers/Prelec04.pdf">A Bayesian Truth Serum for Subjective Data</a>&#8221;. As per the abstract:</p>
<blockquote><p>Subjective judgments, an essential information source for science and policy, are problematic because there are no public criteria for assessing judgmental truthfulness. I present a scoring method for eliciting truthful subjective data in situations where objective truth is unknowable. The method assigns high scores not to the most common answers but to the answers that are more common than collectively predicted, with predictions drawn from the same population. This simple adjustment in the scoring criterion removes all bias in favor of consensus: Truthful answers maximize expected score even for respondents who believe that their answer represents a minority view.</p></blockquote>
<p>Most of the paper is devoted to proving, subject to a few assumptions, that the optimal strategy for players in this game is to tell what they believe to be the truth&#8211;that is, the truth-telling strategy is the optimal <a href="http://en.wikipedia.org/wiki/Bayesian_game#Bayesian_Nash_equilibrium">Bayesian Nash equilibrium</a> for all players.</p>
<p>The assumptions are as follows:</p>
<ol>
<li>The sample of respondents is sufficiently large that a single answer cannot appreciably affect the overall results.</li>
<li>Respondents believe that others sharing their opinion will draw the same inferences about population frequencies.</li>
<li>All players assume that other players are responding truthfully&#8211;which follows if they are rational players.</li>
</ol>
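<p>For readers who want to see the scoring rule from the abstract in action, here is a minimal Python sketch. It assumes the usual statement of Prelec&#8217;s score: each respondent earns an information score, log(actual frequency / geometric mean of predicted frequencies) for the answer they endorse, plus a prediction score weighted by a constant alpha. The function name, data layout, and sample numbers are illustrative assumptions, not anything from the paper.</p>

```python
import math

def bts_scores(answers, predictions, alpha=1.0):
    """Sketch of Bayesian truth serum scoring (hypothetical helper).

    answers[r]        -- index of the option respondent r endorses
    predictions[r][k] -- respondent r's predicted fraction of the population
                         endorsing option k (assumed strictly positive)
    Returns one score per respondent.
    """
    n = len(answers)
    m = len(predictions[0])
    # Actual endorsement frequency of each option.
    x_bar = [sum(1 for a in answers if a == k) / n for k in range(m)]
    # Log of the geometric mean of the predicted frequencies for each option.
    log_y_bar = [sum(math.log(p[k]) for p in predictions) / n for k in range(m)]
    scores = []
    for a, p in zip(answers, predictions):
        # Information score: high when the chosen answer is more common
        # than collectively predicted.
        info = math.log(x_bar[a]) - log_y_bar[a]
        # Prediction score: penalizes predictions that are far from the
        # actual frequencies (zero-frequency options contribute nothing).
        pred = alpha * sum(x_bar[k] * math.log(p[k] / x_bar[k])
                           for k in range(m) if x_bar[k] > 0)
        scores.append(info + pred)
    return scores

# Illustrative data: three respondents pick option 0 and expect near-consensus;
# one picks option 1 and expects a sizable minority.
answers = [0, 0, 0, 1]
predictions = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.6, 0.4]]
scores = bts_scores(answers, predictions)
```

<p>In this toy data the minority answer turns out to be more common than the group predicted (a quarter of respondents, against a geometric-mean prediction of about 14%), so the lone dissenter outscores the majority. With only four respondents the first assumption above clearly fails, so treat this as an illustration of the arithmetic and not of the incentive guarantee.</p>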
<p>Prelec sums up his results as follows:</p>
<blockquote><p>In the absence of reality checks, it is tempting to grant special status to the prevailing consensus. The benefit of explicit scoring is precisely to counteract informal pressures to agree (or perhaps to &#8220;stand out&#8221; and disagree). Indeed, the mere existence of a truth-inducing scoring system provides methodological reassurance for social science, showing that subjective data can, if needed, be elicited by means of a process that is neither faith-based (&#8220;all answers are equally good&#8221;) nor biased against the exceptional view.</p></blockquote>
<p>Unfortunately, I don&#8217;t think that Prelec&#8217;s assumptions hold for most online review systems and reputation systems. In typical applications (e.g., product and service reviews on sites like Amazon and Yelp), the input is too sparse to even approximate the first assumption, and the other two assumptions probably ascribe too much rationality to the participants.</p>
<p>Still, Bayesian truth serum is a step in the right direction, and perhaps the approach (or some simple variant of it) applies to a useful subset of real-world prediction scenarios. Certainly it gives me hope that we&#8217;ll succeed in the quest to mine &#8220;subjective truth&#8221; from crowds.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/10/10/pluralistic-ignorance-and-bayesian-truth-serum/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>LinkedIn Signal = Exploratory Search for Twitter</title>
		<link>http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/</link>
		<comments>http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/#comments</comments>
		<pubDate>Sat, 02 Oct 2010 23:54:58 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3322</guid>
		<description><![CDATA[I like Twitter. Yes, I know that a lot of its content is noise. But I&#8217;ve found Twitter to be a useful professional tool for both publishing and consuming information. Publishing to Twitter is the easy part: I publish links to my blog posts and occasionally engage in public conversations. Consuming information from Twitter is more [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://learn.linkedin.com/twitter/"><img class="alignnone" title="You put your LinkedIn in my Twitter!" src="http://learn.linkedin.com/wp-content/uploads/2009/11/pbandc.jpg" alt="" width="101" height="122" /></a></p>
<p>I like Twitter. Yes, I know that a lot of its content is <a href="http://www.youtube.com/watch?v=PN2HAroA12w">noise</a>. But I&#8217;ve found Twitter to be a useful professional tool for both publishing and consuming information. Publishing to Twitter is the easy part: I publish <a href="http://twitter.com/#!/search/%23thenoisychannel">links</a> to my blog posts and occasionally <a href="http://twitter.com/#!/dtunkelang">engage</a> in public conversations.</p>
<p>Consuming information from Twitter is more of a challenge. I follow <a href="http://twitter.com/#!/dtunkelang/following">100 people</a>, which is about the limit of my <a href="http://thenoisychannel.com/2009/02/27/dunbar-lives/">attention budget</a>. I use saved searches to track long-term interests (much as I use web and news alerts), and I perform ad hoc searches when I am interested in finding out what people are saying about a particular topic.</p>
<p>But Twitter search is not a great fit for analysis or <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploration</a>&#8211;unless you count trending topics as analysis. <a href="http://thenoisychannel.com/2009/03/05/twitter-is-not-a-search-engine/">Originally</a>, the search results were simply the matching tweets in order of recency. The current system sometimes promotes a few &#8220;<a href="http://twitter.com/#!/TopTweets">top tweets</a>&#8221; to the top of the results. Still, if you&#8217;d like to get a summary view, slice and dice the results, or perform any other sort of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a> task, you&#8217;re out of luck.</p>
<p>Until now.</p>
<p>The LinkedIn <a href="http://sna-projects.com/sna/">Search, Network, and Analytics</a> team&#8211;the same folks that built LinkedIn&#8217;s <a href="http://thenoisychannel.com/2009/12/15/linkedin-faceted-search-now-out-of-beta/">faceted search</a> system and developed open-source search tools <a href="http://sna-projects.com/zoie/">Zoie</a> and <a href="http://sna-projects.com/bobo/">Bobo</a>&#8211;just introduced a service called <a href="http://blog.linkedin.com/2010/09/29/linkedin-signal/">Signal</a> that is squarely aimed at folks like me who use Twitter as a professional tool. It is still in its infancy (in private beta, in fact), but I think it has the potential to dramatically change how people like me use Twitter. You can learn more about its architecture and implementation details <a href="http://sna-projects.com/blog/2010/10/linkedin-signal-a-look-under-the-hood/">here</a>.</p>
<p>Signal joins the often cacophonous Twitter stream to the high-quality structured data that LinkedIn knows about its own users. For example, when I post a tweet, LinkedIn knows that I am in the software industry, work at Google, and live in New York. LinkedIn can only make this connection for people who include Twitter ids in their LinkedIn profiles, but that&#8217;s a substantial and growing population.</p>
<p>Signal then lets you use this structured information to satisfy analytic and exploratory information needs. For example, I can see which companies&#8217; employees are tweeting about software patents (top two are Google and Red Hat).</p>
<p><a href="http://www.linkedin.com/signal/home#software patents?"><img class="alignnone size-full wp-image-3323" title="LinkedIn Signal: software patents" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/10/software-patents.png" alt="" width="596" height="425" /></a></p>
<p>Or compare what Microsoft employees are saying about Android&#8230;</p>
<p><a href="http://www.linkedin.com/signal/home#android?company=00000000000000001035"><img class="alignnone size-full wp-image-3324" title="LinkedIn Signal: android, Company = Microsoft" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/10/android-microsoft.png" alt="" width="594" height="346" /></a></p>
<p>&#8230;to what Google employees are saying about Android.</p>
<p><a href="http://www.linkedin.com/signal/home#android?company=00000000000000001441"><img class="alignnone size-full wp-image-3325" title="LinkedIn Signal: android, Company = Google" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/10/android-google.png" alt="" width="593" height="344" /></a></p>
<p>As you can see on the right-hand side, Signal also mines shared links to identify popular ones relative to a given search&#8211;and allows you to see who has shared a particular link. This functionality is similar to <a href="http://thenoisychannel.com/2009/05/27/topsy-tippling-the-stream-of-conversations/">Topsy</a>, but with the advantage of allowing structured searches. Like Topsy, it wrangles the mass of retweeted links into a useful and user-friendly summary.</p>
<p>Signal is still very much in beta. An amusing bug that I encountered earlier today was that, due to some legacy issues in how LinkedIn standardized institution names, the system decided that I was an alumnus of the <a href="http://www.longy.edu/">Longy School of Music</a> rather than of <a href="http://www.mit.edu/">MIT</a>. Fortunately, that&#8217;s fixed now (thanks, John!)&#8211;I love karaoke, but I&#8217;m not ready to quit my day job!</p>
<p>Also, Signal only exposes a handful of LinkedIn&#8217;s facets, which limits the breadth of analysis and exploration. I&#8217;d love to see it add a past company facet, making it possible to drill down into what a company&#8217;s ex-employees are saying about a particular topic (e.g., their ex-employer).</p>
<p>Finally, while Signal offers Twitter hashtags as a facet, these are hardly a substitute for a topic facet. In order to provide a useful topic facet, LinkedIn needs to implement some kind of concept extraction (something I&#8217;d also love to see for their regular people search). This is a challenging information extraction problem, especially for the open web, but I also know from <a href="http://www.endeca.com/">experience</a> that it is tractable within a domain. Given LinkedIn&#8217;s professional focus, I believe this is a problem they can and should tackle.</p>
<p>Of course, LinkedIn also needs to convince more of its users to join their LinkedIn accounts to their Twitter accounts&#8211;since that is their input source. But I suspect it&#8217;s mostly a matter of time and education&#8211;and hopefully the buzz around Signal will help raise awareness.</p>
<p>All in all, I see LinkedIn Signal as a great innovation and a big step forward for exploratory search and for Twitter. Congratulations to <a href="http://www.linkedin.com/in/javasoze">John Wang</a>, <a href="http://www.linkedin.com/in/igorperisic">Igor Perisic</a>, and the rest of the LinkedIn search team on the launch!</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/10/02/linkedin-signal-exploratory-search-for-twitter/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>An Open Letter to the USPTO</title>
		<link>http://thenoisychannel.com/2010/09/25/an-open-letter-to-the-uspto/</link>
		<comments>http://thenoisychannel.com/2010/09/25/an-open-letter-to-the-uspto/#comments</comments>
		<pubDate>Sun, 26 Sep 2010 01:02:14 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
		
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3314</guid>
		<description><![CDATA[Following the Supreme Court&#8217;s decision in Bilski v. Kappos, the United States Patent and Trademark Office (USPTO) plans to release new guidance as to which patent applications will be accepted, and which will not. As part of this process, they are seeking input from the public about how that guidance should be structured. The following [...]]]></description>
			<content:encoded><![CDATA[<p><em>Following the Supreme Court&#8217;s decision in Bilski v. Kappos, the United States Patent and Trademark Office (USPTO) plans to release new guidance as to which patent applications will be accepted, and which will not. As part of this process, they are seeking input from the public about how that guidance should be structured. The following is an open letter that I have sent to the USPTO at <a href="mailto:Bilski_Guidance@uspto.gov">Bilski_Guidance@uspto.gov</a>. More information is available at <a href="http://en.swpat.org/wiki/USPTO_2010_consultation_-_deadline_27_sept">http://en.swpat.org/wiki/USPTO_2010_consultation_-_deadline_27_sept</a> and <a href="http://www.fsf.org/news/uspto-bilski-guidance">http://www.fsf.org/news/uspto-bilski-guidance</a>. As with all of my posts, the following represents my personal opinion and is not the opinion or policy of my employer.</em></p>
<p>To whom it may concern at the United States Patent Office:</p>
<p>Since completing my undergraduate studies in mathematics and computer science at the Massachusetts Institute of Technology (MIT) and my doctorate in computer science at Carnegie Mellon University (CMU), I have spent my entire professional life in software research and development. I have worked at large software companies, such as IBM, AT&amp;T, and Google, and I also was a founding employee at Endeca, an enterprise software company where I served as Chief Scientist. I am a named inventor on <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=/netahtml/PTO/search-bool.html&amp;r=0&amp;f=S&amp;l=50&amp;TERM1=tunkelang&amp;FIELD1=INNM&amp;co1=AND&amp;TERM2=&amp;FIELD2=&amp;d=PTXT">eight United States patents</a>, as well as on <a href="http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=/netahtml/PTO/search-bool.html&amp;r=0&amp;f=S&amp;l=50&amp;TERM1=tunkelang&amp;FIELD1=IN&amp;co1=AND&amp;TERM2=&amp;FIELD2=&amp;d=PG01">eighteen pending United States patent applications</a>. I played an active role in drafting and prosecuting most of these patents. I have also been involved in defensive patent litigation, which in one case resulted in the re-examination of a patent and a final rejection of most of its claims.</p>
<p>As such, I believe my experience gives me a balanced perspective on the pros and cons of software patents.</p>
<p>As someone who has developed innovative technology, I appreciate the desire of innovators to reap the benefits of their investments. As a founding employee of  a venture-backed startup, I understand how venture capitalists and other investors value companies whose innovations are hard to copy. And I recognize how, in theory, software patents address both of these concerns.</p>
<p>But I have also seen how, in practice, software patents are at best a nuisance and innovation tax and at worst a threat to the survival of early-stage companies. In particular, I have witnessed the proliferation of software patents of dubious validity that has given rise to a &#8220;vulture capitalist&#8221; industry of non-practicing entities (NPEs), colloquially known as patent trolls, who aggressively enforce these patents in order to obtain extortionary settlements. Meanwhile, the software companies where I have worked follow a practice of accumulating patent portfolios primarily in order to use them as deterrents against infringement suits by companies that follow the same strategy.</p>
<p>My experience leads me to conclude that the only beneficiaries of the current regime are patent attorneys and NPEs. All other parties would benefit if software were excluded from patent eligibility. In particular, I don&#8217;t believe that software patents achieve either of the two outcomes intended by the patent system: incenting inventors to disclose (i.e., teach) trade secrets, and encouraging investment in innovation.</p>
<p>First, let us consider the incentive to disclose trade secrets. In my experience, software patents fall into two categories. The first category focuses on interfaces or processes, avoiding narrowing the scope to any non-obvious system implementation details. Perhaps the most famous example of a patent in this category is Amazon&#8217;s &#8220;one-click&#8221; patent. The second category focuses on algorithm or infrastructure innovations that are typically implemented inside proprietary closed-source software. An example in this category is the patent on latent semantic indexing, an algorithmic approach used in search and data mining applications. For the first category, patents are hardly necessary to incent disclosure, as the invention must be disclosed to realize its value. Disclosure is meaningful for patents in the second category, but in my experience most companies do not file such patents because they are difficult to enforce. Without access to a company&#8217;s proprietary source code, it is difficult to prove that said source code is infringing on a patent. For this reason, software companies typically focus on the first category of patents, rather than the second. And, as noted, this category of innovation requires no incentive for disclosure.</p>
<p>Second, let us ask whether software patents encourage investment in innovation. Specifically, do patents influence decisions by companies, individual entrepreneurs, or investors to invest time, effort, or money in innovation?</p>
<p>My experience suggests that they do not. Companies and entrepreneurs innovate in order to further their business goals and then file patents as an afterthought. Investors expect companies to file patents, but only because everyone else is doing it, and thus patents offer a limited deterrent value as cited above. In fact, venture capitalists investing in software companies are some of the strongest voices in favor of abolishing software patents. Here are some examples:</p>
<p><a href="http://cdixon.org/about.html">Chris Dixon</a>, co-founder of software companies <a href="http://www.siteadvisor.com/">SiteAdvisor</a> and <a href="http://www.hunch.com/">Hunch</a> and of seed-stage venture capital fund <a href="http://foundercollective.com/">Founder Collective</a>, says:</p>
<blockquote><p>Perhaps patents are necessary in the pharmaceutical industry.  I know very little about that industry but it would seem that some sort of temporary grants of monopoly are necessary to compel companies to spend billions of dollars of upfront R&amp;D.</p>
<p>What I do know about is the software/internet/hardware industry. And I am absolutely sure that if we got rid of patents tomorrow innovation wouldn’t be reduced at all, and the only losers would be lawyers and patent trolls.</p>
<p>Ask any experienced software/internet/hardware entrepreneur if she wouldn’t have started her company if patent law didn’t exist.  Ask any experienced venture investor if the non-existence of patent law would have changed their views on investments they made.  The answer will invariably be no (unless their company was a patent troll or something related).</p>
<p><a href="http://cdixon.org/2009/09/24/software-patents-should-be-abolished/">http://cdixon.org/2009/09/24/software-patents-should-be-abolished/</a></p></blockquote>
<p><a href="http://www.feld.com/wp/about">Brad Feld</a>, co-founder of early-stage venture capital firms <a href="http://www.foundrygroup.com/">Foundry Group</a>, <a href="http://www.mobiusvc.com/">Mobius Venture Capital</a> and <a href="http://www.techstars.org/">TechStars</a>, says:</p>
<blockquote><p>I personally think software patents are an abomination.  My simple suggestion on the panel was to simply abolish them entirely.  There was a lot of discussion around patent reform and whether we should consider having different patent rules for different industries.  We all agreed this was impossible – it was already hard enough to manage a single standard in the US – even if we could get all the various lobbyists to shut up for a while and let the government figure out a set of rules.  However, everyone agreed that the fundamental notion of a patent – that the invention needed to be novel and non-obvious – was at the root of the problem in software.</p>
<p>I’ve skimmed hundreds of software patents in the last decade (and have read a number of them in detail.)  I’ve been involved in four patent lawsuits and a number of “threats” by other parties.  I’ve had many patents granted to companies I’ve been an investor in.  I’ve been involved in patent discussions in every M&amp;A transaction I’ve ever been involved in.  I’ve spent more time than I care to on conference calls with lawyers talking about patent issues.  I’ve always wanted to take a shower after I finished thinking about, discussing, or deciding how to deal with something with regard to a software patent.</p>
<p>I’ll pause for a second, take a deep breath, and remind you that I’m only talking about software patents.  I don’t feel qualified to talk about non-software patents.  However, we you consider the thought that a patent has to be both novel AND non-obvious (e.g. “the claimed subject matter cannot be obvious to someone else skilled in the technical field of invention”), 99% of all software patents should be denied immediately.  I’ve been in several situations where either I or my business partner at the time (Dave Jilk) had created prior art a decade earlier that – if the patent that I was defending against ever went anywhere – would have been used to invalidate the patent.</p>
<p><a href="http://www.feld.com/wp/archives/2006/04/abolish-software-patents.html">http://www.feld.com/wp/archives/2006/04/abolish-software-patents.html</a></p></blockquote>
<p><a href="http://www.avc.com/a_vc/about.html">Fred Wilson</a>, managing partner of venture-capital firm <a href="http://www.unionsquareventures.com/index.php">Union Square Ventures</a>:</p>
<blockquote><p>Even the average reader of the Harvard Business Review has a gut appreciation for the fundamental unfairness of software patents. Software is not the same as a drug compound. It is not a variable speed windshield wiper. It does not cost millions of dollars to develop or require an expensive approval process to get into the market. When it is patented, the &#8220;invention&#8221; is abstracted in the hope of covering the largest possible swath of the market. When software patents are prosecuted, it is very often against young companies that independently invented their technology with no prior knowledge of the patent.</p>
<p><a href="http://www.unionsquareventures.com/2010/02/software-patents-are-the-problem-not-the-answer.php">http://www.unionsquareventures.com/2010/02/software-patents-are-the-problem-not-the-answer.php</a></p></blockquote>
<p>In summary, software patents act as an innovation tax rather than a catalyst for innovation. Perhaps it is possible to resolve the problems of software patents through aggressive reform. But it would be better to abolish software patents than to maintain the status quo.</p>
<p>Sincerely,</p>
<p>Daniel Tunkelang</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/09/25/an-open-letter-to-the-uspto/feed/</wfw:commentRss>
		<slash:comments>111</slash:comments>
		</item>
		<item>
		<title>Search at the Speed of Thought</title>
		<link>http://thenoisychannel.com/2010/09/20/search-at-the-speed-of-thought/</link>
		<comments>http://thenoisychannel.com/2010/09/20/search-at-the-speed-of-thought/#comments</comments>
		<pubDate>Mon, 20 Sep 2010 04:08:30 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3308</guid>
		<description><![CDATA[A guiding principle in information technology has been to enable people to perform tasks at the &#8220;speed of thought&#8221;. The goal is not just to make us more efficient in our use of technology, but to remove the delays and distractions that make us focus on the technology rather than the tasks themselves. For example, [...]]]></description>
			<content:encoded><![CDATA[<p>A guiding principle in information technology has been to enable people to perform tasks at the &#8220;speed of thought&#8221;. The goal is not just to make us more efficient in our use of technology, but to remove the delays and distractions that make us focus on the technology rather than the tasks themselves.</p>
<p>For example, the principal motivation for the <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> work I did at <a href="http://www.endeca.com/">Endeca</a> was to eliminate hurdles that discourage people from exploring information spaces. Most sites already offered users the ability to perform this exploration through advanced or <a href="http://www.usabilityfirst.com/glossary/parametric-search/">parametric</a> search interfaces&#8211;indeed, I recall some critics of faceted search objecting that it was nothing new. But there&#8217;s a reason that most of today&#8217;s consumer-facing sites place faceted search front and center while still relegating advanced search interfaces to an obscure page for power users. Faceted search offers users the fluidity and instant feedback that make exploration natural. Once you&#8217;re used to it, it&#8217;s hard to live without it, whether you&#8217;re looking for real estate (compare <a href="http://www.zillow.com/">Zillow.com</a> to housing search on <a href="http://newyork.craigslist.org/hhh/">craigslist</a>), library books (compare the <a href="http://search.trln.org/">Triangle Research Libraries Network</a> to the <a href="http://catalog.loc.gov/">Library of Congress</a>), or art (compare <a href="http://www.art.com/">art.com</a> to <a href="http://www.artnet.com/">artnet</a>).</p>
<p>Why is faceted search such a significant improvement over advanced or parametric search interfaces? Because it supports exploration at the speed of thought. If it takes you several seconds&#8211;rather than a single click&#8211;to refine a query, and if you have to repeatedly back off from pages with no results (aka dead ends), your motivation to explore a document collection fades quickly. But when that experience is fluid, you explore without even thinking about it. That is the promise (admittedly not always fulfilled) of faceted search.</p>
<p>Microsoft Live Labs director Gary Flake offered a similar message in his <a href="http://thenoisychannel.com/2010/07/21/sigir-2010-day-1-keynote/">SIGIR 2010 keynote</a>. He argued that we need to move from our current discrete interactions with search engines to a mode of continuous, fluid interaction in which the whole of the data is greater than the sum of its parts. While he offered Microsoft&#8217;s <a href="http://www.getpivot.com/">Pivot</a> client as an example of this vision, he could also have invoked the title of a book that Bill Gates wrote in 1999: <a href="http://www.amazon.com/Business-Speed-Thought-Digital-Nervous/dp/0446525685"><em>Business @ the Speed of Thought</em></a>. Indeed, anyone who has ever worked on data analysis understands that you ask fewer questions when you know you&#8217;ll have to wait for answers. Speed changes the way you interact with information.</p>
<p>And at Google, speed has been an obsession since day one. It makes the top 3 on the &#8220;<a href="http://www.google.com/corporate/tenthings.html">Ten things we know to be true</a>&#8221; list:</p>
<blockquote><p>3. Fast is better than slow.</p>
<p>We know your time is valuable, so when you&#8217;re seeking an answer on the web you want it right away – and we aim to please. We may be the only people in the world who can say our goal is to have people leave our website as quickly as possible. By shaving excess bits and bytes from our pages and increasing the efficiency of our serving environment, we&#8217;ve broken our own speed records many times over, so that the average response time on a search result is a fraction of a second. We keep speed in mind with each new product we release, whether it&#8217;s a mobile application or Google Chrome, a browser designed to be fast enough for the modern web. And we continue to work on making it all go even faster.</p></blockquote>
<p>People have made much of Google VP Marissa Mayer&#8217;s estimate that <a href="http://www.google.com/instant/">Google Instant</a> will save <a href="http://techcrunch.com/2010/09/08/instant-time/">350 million hours</a> of users&#8217; time per year by shaving <a href="http://googleblog.blogspot.com/2010/09/search-now-faster-than-speed-of-type.html">two to five seconds per search</a>. That&#8217;s an impressive number, but I personally think it understates the impact of this interface change. Rather, I&#8217;m inclined to focus on a phrase I&#8217;ve seen repeatedly associated with Google Instant: &#8220;search at the speed of thought&#8221;.</p>
<p>What does that mean in practice? I see two major wins from Google Instant:</p>
<p>1) Typing speed and spelling accuracy don&#8217;t get in the way. For example, by the time you&#8217;ve typed [m n], you see results for <a href="http://en.wikipedia.org/wiki/M._Night_Shyamalan">M. Night Shyamalan</a>, a name whose length and spelling might frustrate even his fans. A search for [marc z] offers results for Facebook CEO <a href="http://en.wikipedia.org/wiki/Mark_Zuckerberg">Mark Zuckerberg</a>. Admittedly, the pre-Instant type-ahead suggestions already got us most of the way there, but the feedback of actual results offers not just guidance but certainty.</p>
<p>2) Users spend less time&#8211;and hopefully none&#8211;in a limbo where they don&#8217;t know if the system has understood the information-seeking intent they have expressed as a query. For example, if I&#8217;m interested in learning more about the Bob Dylan song &#8220;<a href="http://en.wikipedia.org/wiki/Forever_Young_(Bob_Dylan_song)">Forever Young</a>&#8221;, I might enter [forever young] as a search query&#8211;indeed, the suggestion shows up as soon as I&#8217;ve typed in &#8220;fore&#8221;. But a glance at the first few instant results for [forever young] makes it clear that there are lots of songs by this title (including those by <a href="http://en.wikipedia.org/wiki/Forever_Young_(Rod_Stewart_song)">Rod Stewart</a> and <a href="http://en.wikipedia.org/wiki/Forever_Young_(Alphaville_song)">Alphaville</a>&#8211;as well as the recent Jay Z song &#8220;<a href="http://en.wikipedia.org/wiki/Young_Forever">Young Forever</a>&#8221; that reworks the latter). Realizing that my query is ambiguous, I type the single letter &#8220;d&#8221; and instantly see results for the Dylan song. Yes, I could have backed out from an unsuccessful query and then tried again, but instant feedback means far less frustration.</p>
<p>Google Instant also makes it a little easier for users to explore the space of queries related to their information need, but exploration through instant suggestions is very limited compared to using <a href="http://www.google.com/search?q=renaissance%20polyphony&amp;tbs=clue:1">related searches</a> or the <a href="http://www.google.com/search?hl=en&amp;tbs=ww:1&amp;q=real+time+search&amp;btnG=Search">wonder wheel</a>&#8211;let alone what we might be able to do with <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">faceted web search</a>. I&#8217;d love to see this sort of exploration become more fluid, but I recognize the imperative to maintain the simplicity of the search box. Good for us <a href="http://hcir.info/">HCIR</a> folks to know that there&#8217;s still lots of work to do on search interface innovation!</p>
<p>But, in short, speed matters. Instant communication has transformed the way we interact with one another&#8211;both personally and professionally. Instant search is more subtle, but I think it will transform the way we interact with information on the web. I am very proud of my colleagues&#8217; <a href="http://googleblog.blogspot.com/2010/09/google-instant-behind-scenes.html">collective effort</a> to make it possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/09/20/search-at-the-speed-of-thought/feed/</wfw:commentRss>
		<slash:comments>74</slash:comments>
		</item>
		<item>
		<title>New Web Site for HCIR Workshop</title>
		<link>http://thenoisychannel.com/2010/09/11/new-web-site-for-hcir-workshop/</link>
		<comments>http://thenoisychannel.com/2010/09/11/new-web-site-for-hcir-workshop/#comments</comments>
		<pubDate>Sat, 11 Sep 2010 15:44:57 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
		
		<guid isPermaLink="false">http://thenoisychannel.com/?p=3302</guid>
		<description><![CDATA[In 2007, I persuaded MIT graduate students Michael Bernstein and Robin Stewart (who was interning at Endeca that summer) to help organize the first Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2007), which we held at MIT and Endeca. Its success convinced us to keep going, and we enjoyed record attendance at this year&#8217;s HCIR [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://hcir.info/"><img class="alignnone size-full wp-image-3303" style="border: 1px solid black;" title="http://hcir.info/" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/09/HCIR-Screenshot.png" alt="" width="529" height="358" /></a></p>
<p>In 2007, I persuaded MIT graduate students <a href="http://people.csail.mit.edu/msbernst/">Michael Bernstein</a> and <a href="http://www.robinstewart.com/">Robin Stewart</a> (who was interning at Endeca that summer) to help organize the first Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://hcir.info/hcir-2007">HCIR 2007</a>), which we held at MIT and Endeca. Its success convinced us to keep going, and we enjoyed record attendance at this year&#8217;s <a href="http://hcir.info/hcir-2010">HCIR 2010</a>, held at Rutgers University.</p>
<p>As the workshop has grown, we as organizers have realized that we need to invest a little in its online presence. A first step in that direction is a new site for the workshop: <a href="http://hcir.info/">http://hcir.info/</a>. The site contains all of the proceedings from the four annual workshops in one place. It is powered by <a href="http://sites.google.com/">Google Sites</a>, which will make it easy for a bunch of us (and perhaps some of you) to collaboratively maintain it.</p>
<p>I hope everyone here finds the new site useful. Please feel free to come forward with ideas for improving it! But be warned&#8211;if you have a great idea, I might ask you to implement it yourself.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/09/11/new-web-site-for-hcir-workshop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>David Petrou Presents Google Goggles at NY Tech Meetup</title>
		<link>http://thenoisychannel.com/2010/09/09/david-petrou-presents-google-goggles-at-ny-tech-meetup/</link>
		<comments>http://thenoisychannel.com/2010/09/09/david-petrou-presents-google-goggles-at-ny-tech-meetup/#comments</comments>
		<pubDate>Thu, 09 Sep 2010 05:13:59 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3290</guid>
		<description><![CDATA[Image recognition is one of those problems that has presented long-standing challenges to computer scientists, despite being taken for granted by science fiction writers. Google Goggles represents one of the most audacious efforts to implement image recognition on a massive scale. Tonight, I had the pleasure of watching my colleague, David Petrou, present a [...]]]></description>
			<content:encoded><![CDATA[<div id="__ss_5160806" style="width: 425px;"><object id="__sse5160806" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=nytmsearchbysightgooglegoggles-100908232259-phpapp01&amp;rel=0&amp;stripped_title=search-by-sight-google-goggles" /><param name="name" value="__sse5160806" /><param name="allowfullscreen" value="true" /><embed id="__sse5160806" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=nytmsearchbysightgooglegoggles-100908232259-phpapp01&amp;rel=0&amp;stripped_title=search-by-sight-google-goggles" name="__sse5160806" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
<p>Image recognition is one of those problems that has presented long-standing challenges to computer scientists, despite being taken for granted by science fiction writers. <a href="http://www.google.com/mobile/goggles/">Google Goggles</a> represents one of the most audacious efforts to implement image recognition on a massive scale.</p>
<p>Tonight, I had the pleasure of watching my colleague, <a href="http://www.google.com/profiles/dpetrou">David Petrou</a>, present a live demo of Goggles to about 800 people who filled the <a href="http://www.skirballcenter.nyu.edu/">NYU Skirball Center</a> to attend the <a href="http://nytm.org/">NY Tech Meetup</a>. Many thanks to <a href="http://innonate.com/">Nate Westheimer</a> and <a href="http://www.meetup.com/ny-tech/members/10615613/">Brandon Diamond</a> for giving Google the opportunity to present this cool technology to a very engaged audience and in particular to show off some of the technology that Googlers are building <a href="http://www.google.com/intl/en/jobs/uslocations/new-york/">here in New York City</a>.</p>
<p>You can&#8217;t see the live demo in the slides, so I encourage you to view a recording of the presentation <a href="http://livestre.am/mUH4">here</a>.</p>
<p>Also, if you&#8217;re in the New York area and interested in hearing about upcoming Google NYC events, please sign up at <a href="http://bit.ly/googlenycevents">http://bit.ly/googlenycevents</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/09/09/david-petrou-presents-google-goggles-at-ny-tech-meetup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Slouching Towards Creepiness</title>
		<link>http://thenoisychannel.com/2010/09/07/slouching-toward-creepiness/</link>
		<comments>http://thenoisychannel.com/2010/09/07/slouching-toward-creepiness/#comments</comments>
		<pubDate>Tue, 07 Sep 2010 11:11:41 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3281</guid>
		<description><![CDATA[One of the perks of blogging is that publishers sometimes send me review copies of new books. I couldn&#8217;t help but be curious about a book entitled &#8220;The Man Who Lied to His Laptop: What Machines Teach Us About Human Relationships&#8221;&#8211;especially when principal author Clifford Nass is the director of the Communications between Humans and Interactive [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.cliffordnass.com/"><img class="alignnone" title="The Man Who Lied to His Laptop: What Machines Teach Us About Human Relationships" src="http://www.cliffordnass.com/images/hero-home.jpg" alt="" width="395" height="219" /></a></p>
<p>One of the perks of blogging is that publishers sometimes send me review copies of new books. I couldn&#8217;t help but be curious about a book entitled &#8220;<a href="http://www.cliffordnass.com/">The Man Who Lied to His Laptop: What Machines Teach Us About Human Relationships</a>&#8221;&#8211;especially when principal author <a href="http://www.stanford.edu/~nass/">Clifford Nass</a> is the director of the Communications between Humans and Interactive Media (<a href="http://chime.stanford.edu/index.html">CHIMe</a>) Lab at Stanford. He wrote the book with <a href="http://dschoolserver.stanford.edu/people/team_corina_yen.php">Corina Yen</a>, the editor-in-chief of <a href="http://ambidextrousmag.org/index.php"><em>Ambidextrous</em></a>, Stanford&#8217;s journal of design.</p>
<p>They start the book by reviewing evidence that people treat computers as social actors. Nass writes:</p>
<blockquote><p>to make a discovery, I would find any conclusion by a social science researcher and change the sentence &#8220;People will do X when interacting with other people&#8221; to &#8220;People will do X when interacting with a computer&#8221;</p></blockquote>
<p>They then apply this principle by using computers as confederates in social science experiments and generalizing conclusions about human-computer interaction to human-human interaction. It&#8217;s an interesting approach, and they present results about how people respond to praise and criticism, similar/opposite personalities, etc. You can get a taste of Nass&#8217;s writing from an article he published in the Wall Street Journal entitled &#8220;<a href="http://online.wsj.com/article/SB10001424052748703959704575453411132636080.html">Sweet Talking Your Computer</a>&#8221;.</p>
<p>The book is interesting and entertaining, and I won&#8217;t try to summarize all of its findings here. Rather, I&#8217;d like to explore its implications.</p>
<p>Applying the &#8220;<a href="http://www.vhml.org/theses/wijayat/sources/writings/papers/p72-nass.pdf">computers are social actors</a>&#8221; principle, they cite a variety of computer-aided experiments that explore people&#8217;s social behaviors. For example, they cite a Stanford study on how &#8220;<a href="http://poq.oxfordjournals.org/content/72/5/935.full.pdf">Facial Similarity Between Voters and Candidates Causes Influence</a>&#8221;, in which secretly morphing a photo of a candidate&#8217;s face to resemble the voter&#8217;s face induces a significantly positive effect on the voter&#8217;s preference. They also cite another experiment on similarity attraction that varies a computer&#8217;s &#8220;personality&#8221; to be either similar or opposite to that of the experimental subject. A similar personality draws a more positive response than an opposite one, but the most positive response comes when the computer starts off with an opposite personality and then adapts to conform to the personality of the subject. Imitation is flattery, and&#8211;as yet another of their studies shows&#8211;flattery works.</p>
<p>It&#8217;s hard for me to read results like these and not see creepy implications for personalized user interfaces. When I think about the upside of personalization, I envision a happy world where we see improvement in both effectiveness and user satisfaction. But clearly there&#8217;s a dark side where personalization takes advantage of knowledge about users to manipulate their emotional response. While such manipulation may not be in the users&#8217; best interests, it may leave them feeling more satisfied. Where do we draw the line between user satisfaction and manipulation?</p>
<p>I&#8217;m not aware of anyone using personalization this way, but I think it&#8217;s a matter of time before we see people try. It&#8217;s not hard to learn about users&#8217; personalities (especially when so many like taking quizzes!), and apparently it&#8217;s easy to vary the personality traits that machines project in generated text, audio, and video. How long will it be before people put these together? Perhaps we are already there.</p>
<p>O brave new world that has such people and machines in it. Shakespeare had no idea.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/09/07/slouching-toward-creepiness/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>HCIR 2010: Bigger and Better than Ever!</title>
		<link>http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/</link>
		<comments>http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 05:24:13 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3275</guid>
		<description><![CDATA[Last Sunday was HCIR 2010, the Fourth Annual Workshop on Human-Computer Interaction and Information Retrieval, held at Rutgers University in New Brunswick, collocated with the Information Interaction in Context Symposium (IIiX 2010). With 70 registered attendees, it was the biggest HCIR workshop we have held. Rutgers was a gracious host, providing space not only for [...]]]></description>
			<content:encoded><![CDATA[<p>Last Sunday was <a href="http://www.hcir2010.org/">HCIR 2010</a>, the Fourth Annual Workshop on Human-Computer Interaction and Information Retrieval, held at Rutgers University in New Brunswick, collocated with the Information Interaction in Context Symposium (<a href="http://www.iiix2010.org/">IIiX 2010</a>).</p>
<p>With 70 registered attendees, it was the biggest HCIR workshop we have held. Rutgers was a gracious host, providing space not only for the all-day workshop but also for a welcome reception the night before.</p>
<p>And, based on an informal survey of participants, I can say with some semblance of objectivity that this was the best HCIR workshop to date.</p>
<p>The opening &#8220;poster boaster&#8221; session was particularly energetic. There was no award for best boaster, but <a href="http://www.bobcatsss2008.org/programme/speakers/136.en.html">Cathal Hoare</a> won an ovation by delivering his boaster as a poem:</p>
<blockquote>
<div id="_mcePaste">If a picture is worth a thousand words</div>
<div id="_mcePaste">Surely to query formulation a photo affords</div>
<div id="_mcePaste">The ability to ask ‘what is that’ in ways that are many</div>
<div id="_mcePaste">But for years we have asked how can-we</div>
<div id="_mcePaste">Narrow the search space so that in reasonable time</div>
<div id="_mcePaste">We can use images to answer questions that are yours and mine</div>
<div id="_mcePaste">In my humble poster I will describe</div>
<div id="_mcePaste">How recent technology and users prescribe</div>
<div id="_mcePaste">A solution that allows me to point and click</div>
<div id="_mcePaste">And get answers so that I don’t feel so thick</div>
<div id="_mcePaste">About my location and my environment</div>
<div id="_mcePaste">And to my touristic explorations bring some enjoyment</div>
<div id="_mcePaste">Now if after all that you feel rather dazed</div>
<div id="_mcePaste">Please come by my poster and see if you are amazed&#8230;.</div>
</blockquote>
<p>As in past years, we enlisted a rock-star keynote speaker&#8211;this time, Google UX researcher <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a>. His slides hardly do justice to his talk&#8211;especially without the audio and video&#8211;but I&#8217;ve embedded them here so that you can get a flavor for his presentation on how we need to do more to improve the searcher.</p>
<p><object id="__sse5065727" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir-keynote-talk-russell-aug-22-2010-100827000301-phpapp01&amp;stripped_title=dan-russell-search-quality-and-user-happiness" /><param name="name" value="__sse5065727" /><param name="allowfullscreen" value="true" /><embed id="__sse5065727" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir-keynote-talk-russell-aug-22-2010-100827000301-phpapp01&amp;stripped_title=dan-russell-search-quality-and-user-happiness" name="__sse5065727" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<p>We accepted six papers for the presentation sessions&#8211;sadly, one of the presenters could not make it because of visa issues. The five presentations covered a variety of topics relating to tools, models, and evaluation for HCIR. The most intriguing of these (to me, at least) was a presentation by <a href="http://www.cs.swan.ac.uk/~csmax/">Max Wilson</a> about &#8220;casual-leisure searching&#8221;&#8211;which he argues breaks our current models of exploratory search. Check out the slides below, as well as Erica Naone&#8217;s article in <em>Technology Review</em> on &#8220;<a href="http://www.technologyreview.com/communications/26135/">Searching for Fun</a>&#8221;.</p>
<p><object id="__sse5045602" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir2010pres-100824083643-phpapp02&amp;stripped_title=hcir2010-casualleisure-search" /><param name="name" value="__sse5045602" /><param name="allowfullscreen" value="true" /><embed id="__sse5045602" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=hcir2010pres-100824083643-phpapp02&amp;stripped_title=hcir2010-casualleisure-search" name="__sse5045602" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div style="padding: 5px 0 12px;">As always, the poster session was the most interactive. Part of the energy came from <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> participants showing off their systems in advance of the final session that would decide which of them would win. In any case, I felt like a heel having to walk through the hall of posters three times in order to herd people back to their seats.</div>
<div style="padding: 5px 0 12px;">Which brings us to the Challenge. When I first suggested the idea of a competition or challenge to my co-organizers back in February, I wasn&#8217;t sure we could pull it off. Indeed,  even after we managed to obtain the use of the <a href="http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2008T19">New York Times Annotated Corpus</a> (thank you, <a href="http://www.ldc.upenn.edu/">LDC</a>!) and a volunteer to set up a baseline system in <a href="http://lucene.apache.org/solr/">Solr</a> (thank you, <a href="http://tommy.chheng.com/">Tommy</a>!), I still worried that we&#8217;d have a party and no one would come. So I was delighted to see six very credible entries competing for the &#8220;people&#8217;s choice&#8221; award.</div>
<div>
<p>All of the participants offered interesting ideas: custom facets, visualization of the associations between relevant terms, multi-document summarization to catch up on a topic, and combining topic modeling with sentiment analysis to analyze competing perspectives on a controversial issue. The winning entry, presented by Michael Matthews of Yahoo! Labs Barcelona, was the <a href="http://fbmya01.barcelonamedia.org:8080/future/">Time Explorer</a>. As its name suggests, it allows users to see the evolution of a topic over time. A cool feature is that it parses absolute and relative dates from article text&#8211;in some cases references to past or future times outside the publication span of the collection. Moreover, the temporal visualization of topics allows users to discover unexpected relationships between entities at particular points in time, e.g., between <a href="http://fbmya01.barcelonamedia.org:8080/future/results.jsp?query=yugoslavia&amp;s=0&amp;rc=10&amp;facet.filter=per:Slobodan%20Milosevic&amp;facet.filter=per:Saddam%20Hussein">Slobodan Milosevic and Saddam Hussein</a>. You can read more about it in Tom Simonite&#8217;s <em>Technology Review</em> article, &#8220;<a href="http://www.technologyreview.com/computing/26113/">A Search Service that Can Peer into the Future</a>&#8221;.</p>
<p>In short, HCIR 2010 will be a tough act to follow. But we&#8217;re already working on it. Watch this space&#8230;</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/27/hcir-2010-bigger-and-better-than-ever/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Exploring Nuggetize</title>
		<link>http://thenoisychannel.com/2010/08/15/exploring-nuggetize/</link>
		<comments>http://thenoisychannel.com/2010/08/15/exploring-nuggetize/#comments</comments>
		<pubDate>Sun, 15 Aug 2010 23:12:36 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3261</guid>
		<description><![CDATA[I&#8217;ve been exchanging emails with Dhiti co-founder Bharath Mohan about Nuggetize, an intriguing interface that surfaces &#8220;nuggets&#8221; from a site to reduce the user&#8217;s cost of exploring a document collection. Specifically Nuggetize targets research scenarios where users are likely to assemble a substantial reading list before diving into it. You can try Nuggetize on the [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignnone size-full wp-image-3265" title="The Noisy Channel - Nuggetized" src="http://thenoisychannel.com/wordpress/wp-content/uploads/2010/08/The-Noisy-Channel-Nuggetized.png" alt="" width="500" height="280" /></p>
<p>I&#8217;ve been exchanging emails with <a href="http://www.dhiti.com/">Dhiti</a> co-founder <a href="http://in.linkedin.com/in/bharathkumarmohan">Bharath Mohan</a> about <a href="http://www.nuggetize.com/">Nuggetize</a>, an intriguing interface that surfaces &#8220;nuggets&#8221; from a site to reduce the user&#8217;s cost of exploring a document collection. Specifically Nuggetize targets research scenarios where users are likely to assemble a substantial reading list before diving into it. You can try Nuggetize on the general web or on a particular site that has been &#8220;nuggetized&#8221;, e.g., a blog like <a href="http://nuggetize.com/thenoisychannel">this one</a> or <a href="http://nuggetize.com/cdixon-org">Chris Dixon&#8217;s</a>.</p>
<p>I&#8217;m always happy to see people building systems that explicitly support <a href="http://en.wikipedia.org/wiki/Exploratory_search">exploratory search</a> (and am looking forward to seeing the <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> entries in a week!). Regular readers may recall my coverage of <a href="http://thenoisychannel.com/?s=cuil">Cuil</a>, <a href="http://thenoisychannel.com/?s=kosmix">Kosmix</a>, and <a href="http://thenoisychannel.com/?s=duck+duck+go">Duck Duck Go</a>. And of course I helped build a few of my own at <a href="http://endeca.com/">Endeca</a>. So what&#8217;s special about Nuggetize?</p>
<p>Mohan <a href="http://bharathruminates.blogspot.com/2010/05/nuggetize-faceted-search-for-web.html">describes</a> it as a <a href="http://en.wikipedia.org/wiki/Faceted_search">faceted search</a> interface for the web. I&#8217;ll quibble here&#8211;the interface offers grouped refinement options, but the groups don&#8217;t really strike me as <a href="http://en.wikipedia.org/wiki/Faceted_classification">facets</a>. Moreover, the interface isn&#8217;t really designed to explore intersections of the refinement options&#8211;rather, at any given time, you see the intersection of the initial search and a currently selected refinement. But it is certainly an interface that supports query refinement and exploration.</p>
<p>The more interesting features are the nuggets and the support for <a href="http://en.wikipedia.org/wiki/Relevance_feedback">relevance feedback</a>.</p>
<p>The nuggets are full sentences, and thus feel quite different from conventional search-engine snippets. Conventional snippets serve primarily to provide <a href="http://en.wikipedia.org/wiki/Information_foraging#Information_scent">information scent</a>, helping users quickly determine the utility of a search result without the cost of clicking through to it and reading it. In contrast, the nuggets are document fragments that are sufficiently self-contained to communicate a coherent thought. The experience suggests passage retrieval rather than document retrieval.</p>
<p>The relevance feedback is explicit: users can thumbs-up or thumbs-down results. After supplying feedback, users can refresh their results (which re-ranks them) and are also presented with suggested categories to use for feedback (both positive and negative). Unfortunately, the research on relevance feedback tells us that, helpful as it could be in improving user experience, users don&#8217;t bite. But perhaps users in research scenarios will give it a chance&#8211;especially with the added expressiveness and transparency of combining document and category feedback.</p>
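<p>To make the re-ranking step concrete, here is a minimal sketch of explicit relevance feedback using the textbook Rocchio approach over bag-of-words vectors. I don&#8217;t know how Nuggetize actually implements its feedback, so treat this as an illustration only: the function names and the alpha, beta, and gamma weights are my own assumptions.</p>

```python
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())

def rocchio(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward liked documents and away from disliked ones.

    The default weights are conventional illustrative values, not tuned ones.
    """
    weights = Counter()
    for term, count in vectorize(query).items():
        weights[term] += alpha * count
    for doc in liked:
        for term, count in vectorize(doc).items():
            weights[term] += beta * count / len(liked)
    for doc in disliked:
        for term, count in vectorize(doc).items():
            weights[term] -= gamma * count / len(disliked)
    return weights

def rerank(docs, weights):
    """Order documents by dot product with the adjusted query vector."""
    def score(doc):
        return sum(weights[term] * count for term, count in vectorize(doc).items())
    return sorted(docs, key=score, reverse=True)
```

<p>A thumbs-up on a document about jaguar cats and a thumbs-down on one about Jaguar cars would, under these assumptions, push cat-related results above car-related ones on refresh.</p>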
<p>Overall it is a slick interface, and it&#8217;s nice seeing the various ideas Mohan and his colleagues put together. There&#8217;s certainly room for improvement&#8211;particularly in the quality of the categories, which sometimes feel like victims of <a href="http://en.wikipedia.org/wiki/Polysemy">polysemy</a>. Open-domain information extraction is hard! Some would even call it a <a href="http://thenoisychannel.com/2008/11/18/faceted-search-for-the-web-a-grand-challenge/">grand challenge</a>.</p>
<p>Mohan reads this blog (he reached out to me a few months ago via a <a href="http://thenoisychannel.com/2009/12/03/search-user-interfaces-and-data-quality/#comment-5977">comment</a>), and I&#8217;m sure he&#8217;d be happy to answer questions here.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2010/08/15/exploring-nuggetize/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/15/exploring-nuggetize/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Taking Blekko out for a Spin</title>
		<link>http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/</link>
		<comments>http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/#comments</comments>
		<pubDate>Sat, 07 Aug 2010 02:57:05 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3254</guid>
		<description><![CDATA[If you&#8217;re a search engine junkie like me, you&#8217;ve probably heard about Blekko, a search engine that has been percolating for over two years and recently launched a private beta. If not, I encourage you to watch the TechCrunch video I&#8217;ve embedded above. You can join the beta by following them on Twitter. I did [...]]]></description>
			<content:encoded><![CDATA[<p><script src="http://player.ooyala.com/player.js?embedCode=90cmtrMTom9vae2YoUwJrngW3UCgI2Zu&amp;deepLinkEmbedCode=90cmtrMTom9vae2YoUwJrngW3UCgI2Zu"></script></p>
<p>If you&#8217;re a search engine junkie like me, you&#8217;ve probably heard about <a href="http://blekko.com/">Blekko</a>, a search engine that has been percolating for <a href="http://blekko.com/">over two years</a> and recently <a href="http://searchengineland.com/blekko-a-new-search-engine-that-lets-you-spin-the-web-47215">launched</a> a private beta. If not, I encourage you to watch the TechCrunch video I&#8217;ve embedded above. You can join the beta by following them <a href="http://www.twitter.com/blekko">on Twitter</a>. I did that earlier this week, and my invitation arrived via a direct message the next day.</p>
<p>Blekko&#8217;s main differentiating feature is that it supports &#8220;slashtags&#8221;. These aren&#8217;t the same as the <a href="http://en.wikipedia.org/wiki/Slashtag">Twitter microsyntax</a> proposed by <a href="http://factoryjoe.com/blog/2009/11/08/slashtags/">Chris Messina</a> and named by <a href="http://unthinkingly.com/2009/11/09/slashtags-for-citizen-editors/">Chris Blow</a>. Rather, they are a way for users to &#8220;spin&#8221; their search results using a variety of filters. For example, [climate /liberal] and [climate /conservative] return very different results, because they are restricted to different sets of sites.</p>
<p>In addition to providing a set of curated slashtags, Blekko allows users to define their own slashtags by specifying the sets of sites to be included. There&#8217;s a social aspect here too: you can use (and follow) other users&#8217; slashtags. Blekko also has some special slashtags that don&#8217;t act as site filters, e.g., /date shows recent results and /seo offers indexing information about web sites.</p>
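<p>For the site-filter slashtags, the idea can be sketched in a few lines. This is only my illustration of the concept, not Blekko&#8217;s implementation: the slashtag table, the site names, and both function names are hypothetical.</p>

```python
from urllib.parse import urlparse

# Hypothetical slashtag definitions: each tag maps to a set of sites.
# These site lists are invented for illustration, not Blekko's actual data.
SLASHTAGS = {
    "liberal": {"example-left.org", "progressive.example.com"},
    "conservative": {"example-right.org"},
}

def parse_query(query):
    """Split a query like 'climate /liberal' into search terms and slashtags."""
    terms, tags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            tags.append(token[1:])
        else:
            terms.append(token)
    return terms, tags

def apply_slashtags(results, tags):
    """Keep only result URLs whose host belongs to every requested slashtag."""
    kept = []
    for url in results:
        host = urlparse(url).netloc.lower()
        if all(host in SLASHTAGS.get(tag, set()) for tag in tags):
            kept.append(url)
    return kept
```

<p>In this sketch an unknown slashtag matches no sites, so it filters out every result, and a query with no slashtags passes all results through unchanged.</p>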
<p>Blekko emphasizes two characteristics that I find very appealing: transparency and user control. While they do not disclose their relevance ranking algorithm, they do expose some of the information they use to compute it. More significantly, their emphasis on slashtags de-emphasizes default ranking and instead encourages users to take more responsibility in the information seeking process. Very <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">HCIR</a>!</p>
<p>I like the concept. But I&#8217;m not sure how I feel about the execution. I have three main concerns.</p>
<p>First, the set of slashtags is somewhat haphazard&#8211;to be expected in a beta, but I&#8217;m not sure how it will evolve. I&#8217;d love to see a vocabulary collectively (and transparently) curated like Wikipedia, but I fear it will look more like social tagging site <a href="http://delicious.com/">Delicious</a>, which is a case study in the &#8220;<a href="http://furnas.people.si.umich.edu/Papers/vocab.paper.pdf">vocabulary problem</a>&#8221;. As any information scientist can tell you, managing vocabularies is hard!</p>
<p>Second, I&#8217;m not sure if site filters are the right model. What happens to sites with heterogeneous content? Or to sites that have one-hit wonders and therefore are unlikely to show up in any slashtags? I&#8217;d prefer to see the sites used as seeds to train classifiers that could then be applied to the entire index. Something a bit more like what <a href="http://people.lis.illinois.edu/~mefron/">Miles Efron</a> implemented in <a href="http://people.lis.illinois.edu/~mefron/papers/efron-libmedia.pdf">this research</a>&#8211;only on a much larger scale and applied at a page rather than site level.</p>
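<p>Here is a rough sketch of what I mean by using sites as seeds. It assumes each page crawled from a seed site inherits that site&#8217;s slashtag as a training label, and it uses a tiny multinomial naive Bayes model with add-one smoothing as a stand-in classifier. The function names and the choice of classifier are my assumptions, not anything Blekko or Efron has described.</p>

```python
import math
from collections import Counter, defaultdict

def train(labeled_pages):
    """Train a tiny multinomial naive Bayes model.

    labeled_pages: list of (label, text) pairs, where the label is the
    slashtag of the seed site that the page came from (an assumption).
    """
    term_counts = defaultdict(Counter)
    page_counts = Counter()
    vocabulary = set()
    for label, text in labeled_pages:
        tokens = text.lower().split()
        term_counts[label].update(tokens)
        page_counts[label] += 1
        vocabulary.update(tokens)
    return term_counts, page_counts, vocabulary

def classify(model, text):
    """Return the most likely label for a single page, using add-one smoothing."""
    term_counts, page_counts, vocabulary = model
    total_pages = sum(page_counts.values())
    best_label, best_score = None, -math.inf
    for label in page_counts:
        total_terms = sum(term_counts[label].values())
        score = math.log(page_counts[label] / total_pages)
        for token in text.lower().split():
            count = term_counts[label][token] + 1
            score += math.log(count / (total_terms + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

<p>Because the model scores individual pages, a one-hit-wonder page on an otherwise unrelated site can still be assigned to a slashtag, which a pure site filter cannot do.</p>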
<p>Third, I think there&#8217;s a third ingredient that is essential to complement transparency and user control: guidance. As a user, I need to know what slashtags would lead me to interesting results, and ideally I&#8217;d want some kind of preview to make exploration as low-cost as possible.</p>
<p>I know I&#8217;m asking for a lot&#8211;especially from an ambitious startup that has just launched its private beta. But I think the stakes are high in this space, and going easy on a newcomer is no favor. I offer the tough love of a critic who would really like to see this kind of vision succeed.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/06/taking-blekko-out-for-a-spin/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>HCIR 2010 Accepted Papers</title>
		<link>http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/</link>
		<comments>http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 01:55:35 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3246</guid>
		<description><![CDATA[The 4th Workshop on Human-Computer Interaction and Information Retrieval (HCIR 2010) is coming up on August 22 in New Brunswick, NJ, taking place immediately after the Information Interaction in Context conference (IIiX 2010). That&#8217;s just a few weeks away! If you are interested in attending and haven&#8217;t already registered, please let me know as [...]]]></description>
			<content:encoded><![CDATA[<p>The 4th Workshop on Human-Computer Interaction and Information Retrieval (<a href="http://www.hcir2010.org/">HCIR 2010</a>) is coming up on August 22 in New Brunswick, NJ, taking place immediately after the Information Interaction in Context conference (<a href="http://www.iiix2010.org/">IIiX 2010</a>). That&#8217;s just a few weeks away!</p>
<p>If you are interested in attending and haven&#8217;t already registered, please let me know as soon as possible via <a href="mailto:dtunkelang@gmail.com">email</a> or <a href="http://twitter.com/dtunkelang">Twitter</a> (speaking of which, follow the <a href="http://twitter.com/#search?q=%23hcir10">#hcir2010</a> hash tag). We&#8217;re making the remaining slots available to the community on a first-come, first-served basis.</p>
<p>Google user experience researcher <a href="http://sites.google.com/site/dmrussell/">Dan Russell</a> will be delivering this year&#8217;s keynote on &#8220;<a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/keynote.html">Why is search sometimes easy and sometimes hard? Understanding serendipity and expertise in the mind of the searcher</a>&#8221;.</p>
<p>Here is the list of <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/presentations.html">accepted papers</a>:</p>
<p>Oral Presentations</p>
<ul>
<li>VISTO: for Web Information Gathering and Organization<br />
<em>Anwar Alhenshiri, Carolyn Watters, and Michael Shepherd (Dalhousie University)</em></li>
<li>Time-based Exploration of News Archives<br />
<em>Omar Alonso (Microsoft Corporation), </em><em>Klaus Berberich (Max-Planck Institute for Informatics), </em><em>Srikanta Bedathur (Max-Planck Institute for Informatics), and </em><em>Gerhard Weikum (Max-Planck Institute for Informatics)</em></li>
<li>Combining Computational Analyses and Interactive Visualization to Enhance Information Retrieval<br />
<em>Carsten Goerg, Jaeyeon Kihm, Jaegul Choo, Zhicheng Liu, Sivasailam Muthiah, Haesun Park, and John Stasko (Georgia Institute of Technology)</em></li>
<li>Impact of Retrieval Precision on Perceived Difficulty and Other User Measures<br />
<em>Mark Smucker and Chandra Prakash Jethani (University of Waterloo)</em></li>
<li>Exploratory Searching As Conceptual Exploration<br />
<em>Pertti Vakkari (University of Tampere)</em></li>
<li>Casual-leisure Searching: The Exploratory Search Scenarios that Break our Current Models<br />
<em>Max L. Wilson (Swansea University) and David Elsweiler (University of Erlangen)</em></li>
</ul>
<p><a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2010/challenge.html">HCIR Challenge</a> Reports</p>
<ul>
<li>Search for Journalists: New York Times Challenge Report<br />
<em>Corrado Boscarino, Arjen P. de Vries, and Wouter Alink </em><em>(Centrum Wiskunde and Informatica)</em></li>
<li>Exploring the New York Times Corpus with NewsClub<br />
<em>Christian Kohlschütter (Leibniz Universität Hannover)</em></li>
<li>Searching Through Time in the New York Times<br />
<em>Michael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and Hugo Zaragoza (Yahoo! Labs)</em></li>
<li>News Sync: Three Reasons to Visualize News Better<br />
<em>V.G. Vinod Vydiswaran (University of Illinois), </em><em>Jeroen van den Eijkhof (University of Washington), </em><em>Raman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research), and Jim St. George (Microsoft Research)</em></li>
<li>Custom Dimensions for Text Corpus Navigation<br />
<em>Vladimir Zelevinsky (Endeca Technologies)</em></li>
<li>A Retrieval System Based on Sentiment Analysis<br />
<em>Wei Zheng and Hui Fang (University of Delaware)</em></li>
</ul>
<p>Research Posters</p>
<ul>
<li>Improving Web Search for Information Gathering: Visualization in Effect<br />
<em>Anwar Alhenshiri, Carolyn Watters, and Michael Shepherd (Dalhousie University)</em></li>
<li>User-oriented and Eye-Tracking-based Evaluation of an Interactive Search System<br />
<em>Thomas Beckers and Norbert Fuhr (University of Duisburg-Essen)</em></li>
<li>Exploring Combinations of Sources for Interaction Features for Document Re-ranking<br />
<em>Emanuele Di Buccio (University of Padua), Massimo Melucci (University of Padua), and Dawei Song (The Robert Gordon University)</em></li>
<li>Extracting Expertise to Facilitate Exploratory Search and Information Discovery: Combining Information Retrieval Techniques with a Computational Cognitive Model<br />
<em>Wai-Tat Fu and Wei Dong (University of Illinois at Urbana-Champaign)</em></li>
<li>An Architecture for Real-time Textual Query Term Extraction from Images<br />
<em>Cathal Hoare and Humphrey Sorensen (University College Cork)</em></li>
<li>Transaction Log Analysis of User Actions in a Faceted Library Catalog Interface<br />
<em>Bill Kules (The Catholic University of America), </em><em>Robert Capra (University of North Carolina at Chapel Hill), and </em><em>Joseph Ryan (North Carolina State University Libraries)</em></li>
<li>Context in Health Information Retrieval: What and Where<br />
<em>Carla Lopes and Cristina Ribeiro (University of Porto)</em></li>
<li>Tactics for Information Search in a Public and an Academic Library Catalog with Faceted Interfaces<br />
<em>Xi Niu and Bradley M. Hemminger (University of North Carolina at Chapel Hill)</em></li>
</ul>
<p>Position Papers</p>
<ul>
<li>Understanding Information Seeking in the Patent Domain and its Impact on the Interface Design of IR Systems<br />
<em>Daniela Becks, Matthias Görtz, and </em><em>Christa Womser-Hacker (University of Hildesheim)</em></li>
<li>Better Search Applications Through Domain Specific Context Descriptions<br />
<em>Corrado Boscarino, Arjen P. de Vries, and Jacco van Ossenbruggen </em><em>(Centrum Wiskunde and Informatica)</em></li>
<li>Layered, Adaptive Results: Interaction Concepts for Large, Heterogeneous Data Sets<br />
<em>Duane Degler (Design for Context)</em></li>
<li>Revisiting Exploratory Search from the HCI Perspective<br />
<em>Abdigani Diriye (University College London), Max L. Wilson (Swansea University), </em><em>Ann Blandford (University College London), and </em><em>Anastasios Tombros (Queen Mary University London)</em></li>
<li>Supporting Task with Information Appliances: Taxonomy of Needs<br />
<em>Sarah Gilbert, Lori McCay-Peet, and Elaine Toms (Dalhousie University)</em></li>
<li>A Proposal for Measuring and Implementing Group’s Affective Relevance in Collaborative Information Seeking<br />
<em>Roberto González-Ibáñez and Chirag Shah (Rutgers University)</em></li>
<li>Evaluation of Music Information Retrieval: Towards a User-Centered Approach<br />
<em>Xiao Hu (University of Illinois at Urbana Champaign) and </em><em>Jingjing Liu (Rutgers University)</em></li>
<li>Information Derivatives: A New Way to Examine Information Propagation<br />
<em>Chirag Shah (Rutgers University)</em></li>
<li>Implicit Factors in Networked Information Feeds<br />
<em>Fred Stutzman (University of North Carolina at Chapel Hill)</em></li>
<li>Improving the Online News Experience<br />
<em>V. G. Vinod Vydiswaran (University of Illinois) and </em><em>Raman Chandrasekar (Microsoft Research)</em></li>
<li>Breaking Down the Assumptions of Faceted Search<br />
<em>Vladimir Zelevinsky (Endeca Technologies)</em></li>
<li>A Survey of User Interfaces in Content-based Image Search Engines on the Web<br />
<em>Danyang Zhang (The City University of New York) </em></li>
</ul>
<p>You can also download the full proceedings <a href="http://www.hcir2010.org/docs/HCIR2010Proceedings.pdf">here</a>.</p>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/03/hcir-2010-accepted-papers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Overcoming Spammers in Twitter</title>
		<link>http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/</link>
		<comments>http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 03:43:20 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3243</guid>
		<description><![CDATA[As I blogged a few months ago, University of Oviedo professor Daniel Gayo-Avello published a research paper entitled “Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms“, in which he concluded that TunkRank was the best of the measures he studied for ranking Twitter users. I recently discovered that he and David Brenes posted slides from [...]]]></description>
			<content:encoded><![CDATA[<p><object id="__sse4504913" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=ceri2010-gayobrenes-imagenes-100615061415-phpapp02&amp;stripped_title=overcoming-spammers-in-twitter-a-tale-of-five-algorithms" /><param name="name" value="__sse4504913" /><param name="allowfullscreen" value="true" /><embed id="__sse4504913" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=ceri2010-gayobrenes-imagenes-100615061415-phpapp02&amp;stripped_title=overcoming-spammers-in-twitter-a-tale-of-five-algorithms" name="__sse4504913" allowscriptaccess="always" allowfullscreen="true"></embed></object></p>
<div id="__ss_4504913" style="width: 425px;">
<p>As I blogged <a href="http://thenoisychannel.com/2010/04/07/go-tunkrank/">a few months ago</a>, University of Oviedo professor <a href="http://www.di.uniovi.es/~dani/">Daniel Gayo-Avello</a> published a research paper entitled “<a href="http://arxiv.org/abs/1004.0816">Nepotistic Relationships in Twitter and their Impact on Rank Prestige Algorithms</a>”, in which he concluded that <a href="http://tunkrank.com/">TunkRank</a> was the best of the measures he studied for ranking Twitter users. I recently discovered that he and <a href="http://es.linkedin.com/in/brenes">David Brenes</a> posted slides from their presentation at <a href="http://ir.ii.uam.es/ceri2010/">CERI 2010</a> on &#8220;Overcoming Spammers in Twitter&#8221;. Enjoy!</p>
</div>
<script type="text/javascript" src="http://platform.linkedin.com/in.js"></script><script type="in/share" data-url="http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/"></script>]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/02/overcoming-spammers-in-twitter/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Questions. But Why?</title>
		<link>http://thenoisychannel.com/2010/08/01/questions-but-why/</link>
		<comments>http://thenoisychannel.com/2010/08/01/questions-but-why/#comments</comments>
		<pubDate>Sun, 01 Aug 2010 18:41:51 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3231</guid>
		<description><![CDATA[Yahoo! Answers and Answers.com have been around since 2005. But community question answering (as distinct from question answering using natural language processing) has witnessed a resurgence of popularity&#8211;at least in the blogosphere and among investors. Quora and Hunch are two of the hottest startups on the web, and Aardvark was acquired by Google earlier this year. Most recently, Ask.com [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://answers.yahoo.com/">Yahoo! Answers</a> and <a href="http://www.answers.com/">Answers.com</a> have been around since 2005. But community question answering (as distinct from <a href="http://en.wikipedia.org/wiki/Question_answering">question answering using natural language processing</a>) has witnessed a resurgence of popularity&#8211;at least in the blogosphere and among investors. <a href="http://www.quora.com/">Quora</a> and <a href="http://hunch.com/">Hunch</a> are two of the hottest startups on the web, and <a href="http://vark.com/">Aardvark</a> was acquired by Google earlier this year. Most recently, <a href="http://www.ask.com/">Ask.com</a> relaunched with a return to its question-answering roots and Facebook began rolling out <a href="http://blog.facebook.com/blog.php?post=411795942130">Facebook Questions</a>.</p>
<p>So there&#8217;s no question that community question answering is hot. The question is why? In particular, is community question answering a step forward or backward relative to today&#8217;s search engines, or is it something different?</p>
<p>Regarding Facebook Questions, Jason Kincaid writes in <a href="http://techcrunch.com/2010/07/28/facebook-qa-service-questions-begins-rolling-out-could-be-massive/">TechCrunch</a>:</p>
<blockquote><p>Given its size, it won’t take long for Facebook to build up a massive amount of data — if that data is consistently reliable, Questions could turn into a viable alternative to Google for many queries.</p></blockquote>
<p>That&#8217;s a big if.  But I think the bigger caveat is the vague quantifier &#8220;many&#8221;. The success of community question answering services will depend on how these services position themselves relative to users&#8217; information needs. Anyone arguing that these services can or should replace today&#8217;s web search engines might want to consider the following examples of information needs that are typical of current search engine use:</p>
<ul>
<li><a href="http://www.google.com/search?q=how+do+i+get+an+iphone+case">How do I get an iPhone case?</a></li>
<li><a href="http://www.google.com/search?q=who+sings+the+choco+latte+song">Who sings the &#8220;choco latte&#8221; song?</a></li>
<li><a href="http://www.google.com/search?q=movies+near+11201">What movies are playing in my neighborhood?</a></li>
<li><a href="http://www.google.com/search?q=how+do+i+get+to+boston+from+new+york">How do I get to Boston from New York?</a></li>
<li><a href="http://www.google.com/search?q=best+selling+netbook">What is the best selling netbook?</a></li>
<li><a href="http://www.google.com/search?q=best+cell+phone+reception+in+new+york">Who offers the best cell phone reception in New York?</a></li>
<li><a href="http://www.google.com/search?q=what+was+the+score+in+the+north+korea+portugal+game">What was the score in the North Korea &#8211; Portugal game?</a></li>
</ul>
<p>I hope I don&#8217;t have to keep going to convince you that web search engines have earned their popularity by serving a broad class of information needs (i.e., answer lots of questions)&#8211;and that&#8217;s without even using the wide variety of personalized and social features that web search engines are rapidly developing.</p>
<p>The common thread in the above questions is that they focus on objective information. In general, such questions are effectively and efficiently answered by search engines based on indexed, published content (including &#8220;<a href="http://en.wikipedia.org/wiki/Deep_Web">deep web</a>&#8221; content made available to search engines via APIs). There&#8217;s a lot of work we can do to improve search engines, particularly in the area of <a href="http://en.wikipedia.org/wiki/Human%E2%80%93computer_information_retrieval">supporting query formulation</a>. But it seems silly and wasteful to route such questions to other people&#8211;human beings should not be reduced to performing tasks at which machines excel.</p>
<p>That said, I agree with Kincaid that there are many information needs that are well addressed by  community question answering. In particular:</p>
<ul>
<li><strong>Questions for which point of view is a feature, not a bug.</strong> Review sites succeed when they provide sincere, informed personal reactions to products and services. Similarly, routing questions to people makes sense when we care about the answerer&#8217;s point of view. For some questions, I want the opinion of someone who shares my taste (which is what Hunch is pursuing with its &#8220;<a href="http://www.businessinsider.com/heres-what-comes-after-the-social-graph-2010-7">taste graph</a>&#8221;). For others, I want a diversity of expert opinions&#8211;for which I might turn to Aardvark (which tries to route questions to topic experts), Quora (where people follow particular topics), or <a href="http://www.linkedin.com/answers/">LinkedIn Answers</a>. Over time, the answers to many such questions can be published and indexed&#8211;and indeed some answers sites receive a <a href="http://twitter.com/Hitwise_US/status/19919086878">large share of their traffic</a> from search engines.</li>
<li><strong>Niche topics.</strong> As much as web search has improved <a href="http://thenoisychannel.com/2008/04/22/accessibility-in-information-retrieval/">information accessibility</a> for the &#8220;long tail&#8221; of published information, the effectiveness of web search can be highly variable for the most obscure information needs. Moreover, this effectiveness depends significantly on the user: some people are better at searching than others, especially in their areas of domain expertise. Social search can help level the playing field. Much as Wikipedia has surfaced much of the expertise at the head of the information distribution, community question answering can help out in the tail.</li>
<li><strong>Community for its own sake.</strong> Even in cases where search engines are more effective and efficient than community question answering services, some people prefer to participate in a social exchange rather than to conduct a transaction with an impersonal algorithm. Indeed, <a href="http://vark.com/aardvarkFinalWWW2010.pdf">researchers at Aardvark</a> found that many of the questions posed through their service (pre-acquisition) could be answered successfully using Google. I&#8217;ll go out on a limb and assume that Aardvark&#8217;s users were early technology adopters who are quite conversant with search engines&#8211;but in some cases chose to use a social alternative simply because they wanted to be social.</li>
</ul>
<p>Conclusions? Community question answering may be overhyped right now, but it isn&#8217;t a fad. There are broad classes of subjective information needs that require a point of view, if not a diversity of views. And even if much of the use of community question answering sites is mediated by search engines indexing their archives, there will always be a need for fresh content. I also believe that social search will continue to be valuable for niche topics, since neither search engines nor searchers will ever be perfect.</p>
<p>But I think the biggest open question is whether people will favor community question answering simply to be social. I conjecture that, by very publicly integrating community question answering into its social networking platform, Facebook is testing the hypothesis that it can turn information seeking from a utilitarian individual task into an entertaining social destination. Given Facebook&#8217;s <a href="http://mashable.com/2009/09/17/facebook-google-time-spent/">highly engaged</a> user population, we won&#8217;t have to wait long to find out.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/08/01/questions-but-why/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 3 Industry Track Afternoon Sessions</title>
		<link>http://thenoisychannel.com/2010/07/27/sigir-2010-day-3-industry-track-afternoon-sessions/</link>
		<comments>http://thenoisychannel.com/2010/07/27/sigir-2010-day-3-industry-track-afternoon-sessions/#comments</comments>
		<pubDate>Wed, 28 Jul 2010 01:46:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3226</guid>
		<description><![CDATA[While the SIGIR 2010 Industry Track keynotes had the highest-profile speakers, the rest of the day assembled an impressive line-up: The new frontiers of Web search: going beyond the 10 blue links Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, Yahoo! Labs Cross-Language Information Retrieval in the Legal Domain Samir Abdou and Thomas Arni, [...]]]></description>
			<content:encoded><![CDATA[<p>While the <a href="http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/">SIGIR 2010 Industry Track keynotes</a> had the highest-profile speakers, the rest of the day assembled an impressive line-up:</p>
<ul>
<li>The new frontiers of Web search: going beyond the 10 blue links<br />
<em>Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, <strong>Yahoo! Labs<br />
</strong></em></li>
<li>Cross-Language Information Retrieval in the Legal Domain<br />
<em>Samir Abdou and Thomas Arni, <strong>Eurospider</strong></em></li>
<li>Building and Configuring a Real-Time Indexing System<br />
<em>Garret Swart, Ravi Palakodety, Mohammad Faisal, Wesley Lin, <strong>Oracle<br />
</strong></em></li>
<li>Lessons and Challenges from Product Search<br />
<em>Daniel E. Rose, <strong>A9.com (Amazon)<br />
</strong></em></li>
<li>Being Social: Research in Context-aware and Personalized Information Access @ Telefonica<br />
<em>Xavier Amatriain, Karen Church and Josep M. Pujol, <strong>Telefónica</strong><br />
</em></li>
<li>Searching and Finding in a Long Tail Marketplace<br />
<em>Neel Sundaresan, </em><strong><em>eBay</em></strong><em><br />
</em></li>
<li>When No Clicks are Good News<br />
<em>Carlos Castillo, Aris Gionis, Ronny Lempel, and Yoelle Maarek, <strong>Yahoo! Research</strong></em></li>
</ul>
<p>I missed the Eurospider and Oracle talks, but otherwise I spent the afternoon enjoying these sessions. The slides, along with all of the keynote slides, are available <a href="http://www.eurospider.com/acm-sigir-industry-track-2010.html">here</a>.</p>
<p>Some highlights from the talks I attended:</p>
<ul>
<li>Andrei Broder, a pioneer of Web IR and author of the highly cited &#8220;<a href="http://www.sigir.org/forum/F2002/broder.pdf">Taxonomy of Web Search</a>&#8221;, enumerated a half-dozen challenges for web search to move from its current state to one that not only accomplishes semantic analysis but also supports task completion. Naturally, the one that appeals to me is the need for search engines to move beyond query suggestion and truly engage the user in a dialog.</li>
<li>Dan Rose talked about the challenges of product search, and in particular the blessing and curse of implementing search applications for structured data (something that I&#8217;m very familiar with from my previous role at <a href="http://www.endeca.com/">Endeca</a>). He also warned of the dangers of over-interpreting behavioral data, e.g., a site change that increases revenue does not necessarily imply a better user experience (it could just be favoring higher-priced inventory), and may ultimately alienate customers.</li>
<li>Xavier Amatriain focused on social search, and talked about how, as we&#8217;ve turned to context to help mitigate information overload, we find ourselves confronted with the new problem of context overload. Specifically, he cited the research questioning the wisdom of the crowd, and proposed the <a href="http://www.nuriaoliver.com/RecSys/wisdomFew_sigir09.pdf">wisdom of the (expert) few</a> as a better alternative.</li>
<li>Neel Sundaresan offered an interesting tour of <a href="http://labs.ebay.com/">eBay Research Labs</a> prototypes, including the <a href="http://labs.ebay.com/erl/demoto/to">BayEstimate</a> that helps sellers improve listing titles by discovering the keywords that are both representative of the item and used in buyers&#8217; queries.</li>
<li>Finally, Carlos Castillo offered a nice approach to discovering when search engine abandonment is &#8220;<a href="http://research.google.com/pubs/pub35486.html">good abandonment</a>&#8221;: identify a subset of &#8220;tenacious&#8221; users who almost never abandon searches and measure <em>their</em> abandonment&#8211;since it is almost certain to be the good kind.</li>
</ul>
<p>All in all, I was very impressed with the quality of the Industry Track, and gratified to see how it had improved on the program I put together last year. Given the key role that industry plays in information retrieval, I think it is important that the top-tier IR conference promote the best that industry has to offer.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/27/sigir-2010-day-3-industry-track-afternoon-sessions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 3 Industry Track Keynotes</title>
		<link>http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/</link>
		<comments>http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/#comments</comments>
		<pubDate>Sun, 25 Jul 2010 22:45:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3219</guid>
		<description><![CDATA[When I organized the SIGIR 2009 Industry Track last year, my goal was to meet the standard set by the CIKM 2008 Industry Event: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the [...]]]></description>
			<content:encoded><![CDATA[<p>When I organized the <a href="http://www.sigir2009.org/Program/industry">SIGIR 2009 Industry Track</a> last year, my goal was to meet the standard set by the <a href="http://www.cikm2008.org/industry_event.php">CIKM 2008 Industry Event</a>: a compelling set of presentations that would give researchers an opportunity to learn about the problems most relevant to industry practitioners, and offer practitioners an opportunity to deepen their understanding of the field in which they are working. I was mostly happy with the results last year, and the popularity of the industry track relative to the parallel technical sessions suggests that my assessment does not simply reflect personal bias.</p>
<p>But this year the <a href="http://www.sigir2010.org/doku.php?id=industry:program">SIGIR 2010 Industry Track</a> broke new ground. The keynotes were from some of the most senior technologists at the world&#8217;s largest web search engines:</p>
<ul>
<li><a title="http://ir.baidu.com/phoenix.zhtml?c=188488&amp;p=irol-govBio&amp;ID=161381" rel="nofollow" href="http://ir.baidu.com/phoenix.zhtml?c=188488&amp;p=irol-govBio&amp;ID=161381" target="extern">William Chang</a>, Chief Scientist at Baidu</li>
<li><a title="https://docs.google.com/Doc?id=dhbn99z4_529xnxc2hh" rel="nofollow" href="https://docs.google.com/Doc?id=dhbn99z4_529xnxc2hh" target="extern">Yossi Matias</a>, Head of Google&#8217;s Israel R&amp;D Center</li>
<li><a title="http://www.jopedersen.com/jopedersen/Home.html" rel="nofollow" href="http://www.jopedersen.com/jopedersen/Home.html" target="extern">Jan Pedersen</a>, Chief Scientist for Core Search at Microsoft (Bing)</li>
<li><a title="http://company.yandex.com/general_info/management_team.xml" rel="nofollow" href="http://company.yandex.com/general_info/management_team.xml" target="extern">Ilya Segalovich</a>, CTO and Co-Founder of Yandex</li>
</ul>
<p>I won&#8217;t attempt to provide much detail about these presentations, first because <del datetime="2010-07-26T16:42:30+00:00">I&#8217;m hoping they will all be</del> <a href="http://www.eurospider.com/acm-sigir-industry-track-2010.html">they have all been posted online</a> and second because Jeff Dalton has already done an excellent job of posting <a href="http://www.searchenginecaffe.com/search?q=sigir+industry+day">live-blogged notes</a>. Rather, I&#8217;ll offer a few reactions.</p>
<p>William&#8217;s presentation on &#8220;Future Search: From Information Retrieval to Information Enabled Commerce&#8221; unsurprisingly focused on the Chinese search-related market. While the topic of <a href="http://googleblog.blogspot.com/2010/06/update-on-china.html">Google in China</a> was an elephant in the room, it did not surface even obliquely in the presentation&#8211;and I commend William for taking the high road. As for Baidu itself, its most interesting innovation is <a href="http://open.baidu.com/">Aladdin</a>, an open search platform that allows participating webmasters to submit query-content pairs.</p>
<p>Yossi&#8217;s presentation on &#8220;Search Flavours at Google&#8221; was a tour de force of Google&#8217;s recent innovations in the search and data mining space. The search examples mostly focused on the challenges of incorporating context into query understanding&#8211;where context might involve geography, time, social network, etc. But some of the more impressive examples showed off using the power of data to <a href="http://googleresearch.blogspot.com/2009/04/predicting-present-with-google-trends.html">predict the present</a>. More than anything, his presentation made clear that Google is doing a lot more than returning the traditional ten blue links.</p>
<p>Jan talked about &#8220;Query Understanding at Bing&#8221;. I really hope he makes these slides available, since they do a really nice job of describing a machine learning based architecture for processing search queries. To get an idea of this topic, check out <a href="http://thenoisychannel.com/2009/08/02/sigir-2009-day-3-industry-track-nick-craswell/">Nick Craswell&#8217;s presentation</a> from last year&#8217;s SIGIR.</p>
<p>Finally, Ilya talked about &#8220;Machine Learning in Search Quality at Yandex&#8221;, the largest search engine in Russia. He described the main challenge in Russia as handling the local aspects of search: he gave as an example that, if you&#8217;re in a small town in Russia, then local results in Moscow may as well be on the moon. Local search is a topic close to my heart, not least because it is my day job! Ilya&#8217;s talk focused largely on Yandex&#8217;s <a href="http://company.yandex.com/general_info/technologies.xml">MatrixNet</a> implementation of <a href="http://en.wikipedia.org/wiki/Learning_to_rank">learning to rank</a>. What I&#8217;m surprised he didn&#8217;t mention is the challenge of data acquisition&#8211;in general, for domains beyond the web, obtaining high-quality data is often a much bigger challenge than filtering and ranking it.</p>
<p>All in all, the four keynotes collectively offered an excellent state-of-the-search-engine address.  As with last year, the industry track talks were the most popular morning sessions, and the speakers delivered the goods.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/25/sigir-2010-day-3-industry-track-keynotes/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>SIGIR 2010: Day 2 Technical Sessions</title>
		<link>http://thenoisychannel.com/2010/07/23/sigir-2010-day-2-technical-sessions/</link>
		<comments>http://thenoisychannel.com/2010/07/23/sigir-2010-day-2-technical-sessions/#comments</comments>
		<pubDate>Fri, 23 Jul 2010 06:49:11 +0000</pubDate>
		<dc:creator>Daniel Tunkelang</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://thenoisychannel.com/?p=3215</guid>
		<description><![CDATA[On the second day of the SIGIR 2010 conference, I did start shuttling between sessions to attend particular talks. In the morning session, I attended three talks. The first, &#8220;Geometric Representations for Multiple Documents&#8221; by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides [...]]]></description>
			<content:encoded><![CDATA[<p>On the second day of the <a href="http://www.sigir2010.org/">SIGIR 2010</a> conference, I did start shuttling between sessions to attend particular talks.</p>
<p>In the morning session, I attended three talks. The first, &#8220;Geometric Representations for Multiple Documents&#8221; by Jangwon Seo and Bruce Croft, looks at the problem of representing combinations of documents in a query model. It provides both theoretical and experimental evidence that geometric means work better than arithmetic means for representing such combinations. The second, &#8220;Using Statistical Decision Theory and Relevance Models for Query-Performance Prediction&#8221; by Anna Shtok, Oren Kurland, and David Carmel, shows the efficacy of a utility estimation framework comprised of relevance models, measures like query clarity to estimate the representativeness of relevance models, and similarity measures to estimate the similarity or correlation between two ranked lists.  The authors demonstrated significant improvements from the framework over simply using the representativeness measures for performance prediction. The third paper, &#8220;Evaluating Verbose Query Processing Techniques&#8221; by Samuel Huston and Bruce Croft, showed that removing &#8220;stop structures&#8221;, a generalization of stop words, could significantly improve performance on long queries. Interestingly, the authors evaluated their approach on &#8220;black box&#8221; commercial search engines Yahoo and Bing without knowledge of their retrieval models.</p>
<p>In the session after lunch, I mostly attended talks from the session on user feedback and user models. The first, &#8220;Incorporating Post-Click Behaviors Into a Click Model&#8221; by Feimin Zhong, Dong Wang, Gang Wang, Weizhu Chen, Yuchen Zhang, Zheng Chen, and Haixun Wang, proposed and experimentally validated a click model to infer document relevance from post-click behavior like dwell time that can be derived from logs. The second, &#8220;Interactive Retrieval Based on Faceted Feedback&#8221; by Lanbo Zhang and Yi Zhang, described an approach using facet values for relevance and pseudo-relevance feedback. It&#8217;s interesting work, but I think the authors should look at work my colleagues and I presented at <a href="http://research.microsoft.com/en-us/um/people/ryenw/hcir2008/">HCIR 2008</a> on distinguishing whether facet values are useful for summarization or for refinement. The third, &#8220;Understanding Web Browsing Behaviors through Weibull Analysis of Dwell Time&#8221; by Chao Liu, Ryen White, and Susan Dumais, offered an elegant model of dwell time and used it to predict dwell time distribution from page-level features. Finally, I attended one talk from the session on retrieval models and ranking: &#8220;Finding Support Sentences for Entities&#8221; by Roi Blanco and Hugo Zaragoza. They present a novel approach of generalizing snippets to interfaces that offer named entities (e.g., people) as supplements to the search results. I am excited to see research that could make richer interfaces more explainable to users.</p>
<p>I spent the last session of the day listening to a couple of talks about users and interactive IR. The first was &#8220;Studying Trailfinding Algorithms for Enhanced Web Search&#8221; by Adish Singla, Ryen White, and Jeff Huang<del datetime="2010-07-28T22:14:41+00:00">, turned out to be the best-paper winner</del>. This work extends <a href="http://research.microsoft.com/en-us/um/people/ryenw/publications.html">previous work</a> that Ryen and colleagues have done on search trails and showed results of various trailfinding algorithms that outperform the trails users follow on their own. The second, &#8220;Context-Aware Ranking in Web Search&#8221; by Biao Xiang, Daxin Jiang, Jian Pei, Xiaohui Sun, Enhong Chen, and Hang Li, analyzes requerying behavior as reformulation, specialization, generalization, or general association, and demonstrates that knowing or inferring which the user is doing significantly improves ranking of the second query&#8217;s results.</p>
<p>The day wrapped up with a luxurious banquet at the <a href="http://www.ichotelsgroup.com/intercontinental/en/gb/locations/overview/gvaha">Hotel Intercontinental</a>, near the <a href="http://www.google.com/images?q=place+des+nations+geneva">Nations Plaza</a>. After sweating through conference sessions without air conditioning, it was a welcome surprise to enjoy great food in such an elegant setting.</p>
]]></content:encoded>
			<wfw:commentRss>http://thenoisychannel.com/2010/07/23/sigir-2010-day-2-technical-sessions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

