Categories
General

Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn

Last week, I delivered the following presentation at the CMU Intelligence Seminar:

I had a great audience, including the department head! Of course that meant fielding tough questions, but that’s what makes it fun to present at my alma mater. Now that it’s been over a decade since my defense, I can handle the tough questions. 🙂

Unfortunately there is no video, but hopefully the slides are reasonably self-explanatory. If you have questions, please ask them in the comments.

Visiting the East Coast: CMU and Strata New York

Tonight I’m taking a red-eye to Pittsburgh so that I can spend three days at my (doctoral) alma mater, CMU. In addition to spending time with lots of great students and faculty, my goal is to communicate a taste of the hard computer science problems we are solving (or trying to solve!) at LinkedIn. I’m giving a tech talk Tuesday afternoon, joining my colleagues for an info session Tuesday evening, and participating in the Technical Opportunities Conference (TOC) Wednesday.

Here’s a teaser for my tech talk:

You can find more details about LinkedIn’s visits to CMU and other campuses at http://studentcareers.linkedin.com/.

Hopefully some of you are attending the O’Reilly Strata Conference in New York this Thursday and Friday. If so, I encourage you to attend my panel session on “Entities, Relationships, and Semantics: the State of Structured Search”:

Structured search improves the search experience through the identification of entities and their relationships in documents and queries. This panel will explore the current state of structured and semi-structured search, as well as the open problems in an area that promises to revolutionize information seeking.

The four panelists work on some of the world’s largest structured search problems, from offering users structured search on Google’s web corpus to building a computing system that defeated Jeopardy! champions in an extreme test of natural language understanding. They work on the data, tools, and research that are driving this field. They are all excellent researchers and presenters, promising to offer an informative and engaging panel discussion, for which I will act as moderator.

Panelists:

  • Andrew Hogue is a Senior Staff Engineer and Engineering Manager in the Search Quality group at Google New York. He has worked on a wide array of projects including question answering, Google Squared, sentiment analysis, local and product search, and Google Goggles. He is interested in the areas of structured data, information extraction, and machine learning, and their applications to search and search interfaces. Prior to Google, he earned an M.Eng. and B.S. in Computer Science from MIT.
  • Breck Baldwin is the President of Alias-i, creators of the popular LingPipe computational linguistics toolkit. He received his Ph.D. in computer science in 1995 from the University of Pennsylvania. In the time between his thesis on coreference resolution and evaluation and founding Alias-i in 1999, Breck worked on DARPA-funded projects through the University of Pennsylvania.
  • Evan Sandhaus works as the Semantic Technologist in The New York Times Research and Development Labs. He is spearheading The New York Times Linked Open Data Strategy and overseeing the release of 1.8 million documents to the computer science research community. Previously, Evan helped to put The New York Times on Google Earth, collaborated with New York University to explore new directions in News Search, and worked to bring The New York Times to Facebook.
  • Wlodek Zadrozny is an IBM Researcher working on natural language applications. Most recently he worked on text sources for Watson (IBM’s Jeopardy! champion) and on applying related DeepQA technology to business problems. His previous work ranged from language processing research to product development and technical planning; in particular, he led the development of interaction systems that used speech, natural language, and focused search. Wlodek Zadrozny received a Ph.D. in Mathematics from the Polish Academy of Sciences.

And one more thing. Karaoke at Second on Second in the East Village on Friday night. It’s an unofficial Strata after-party, so come join us Big Data folks for some Big Fun.

A Different Anniversary: Happy Birthday, Endeca!

I grew up in New York City. On September 11th, 2001, I was in Cambridge, Massachusetts, desperately trying to get through to my parents by all means of communication at my disposal. My dad worked at 40 Worth Street, only a few blocks away from the World Trade Center. Thankfully none of my family or friends were harmed that day, but that fateful event ten years ago left a mark on the world that no one of my generation will ever forget.

Fortunately I have happier associations with this anniversary.

On September 11th, 1999, I boarded an Amtrak train from New York to Boston to join Steve Papa, Pete Bell, Dave Gourley, Fritz Knabe, Jack Walter, and Phil Braden to start the company that would eventually be named Endeca. I had no way of knowing whether we would persuade VCs to fund us beyond our six months of seed investment, let alone that we would develop a technology that would revolutionize the search experience of millions of users around the world. Our modest ambition was to build a better way to find stuff on eBay. That goal remains unfulfilled, but 44 of the top 100 online retailers use Endeca, which isn’t too shabby. Especially considering that Endeca has expanded well beyond online retail into domains like manufacturing, business intelligence, and government.

On September 11th, 2002, I gathered the Endeca founding team for a dinner to celebrate the company’s 3rd birthday. Given my reputation for general irreverence, I feared that my colleagues would think this was a stunt to mock the memory of the more familiar 9/11. But it was quite the opposite. September 11th, 1999 was a turning point in my professional life, and no terrorist was going to take that happiness away from me. To this day I am grateful that my colleagues recognized my sincerity and joined me in this celebration.

The dinner that night was an emotional one: 2002 had been a tough year for the software industry — one in which we saw many of our peer companies fold. Fortunately it was the beginning of much better times for us: from 2003 to 2006, Endeca was the fastest growing private company in Massachusetts. No IPO yet, but the rumors are encouraging.

I left Endeca almost two years ago, going to Google and then LinkedIn. But I will always have fond memories of the decade I spent at Endeca — an experience that established much of the passion that drives me today. I am very proud to have been part of the founding team of such a great company, even if now I can only follow from a distance.

Happy birthday, Endeca, and many more to come!

Attention CMU Students!

As many of you know, I’m a proud alumnus of the CMU School of Computer Science (yes, I also attended the CMU of Massachusetts). I’m delighted to have the opportunity to spend a few days on campus this month, and I hope that I’ll have a chance to meet with lots of students and faculty while I’m there.

Specifically, I’ll be giving a talk at Eugene Fink’s Intelligence Seminar on Tuesday, September 20th at 3:30pm in Gates-Hillman 4303:

Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn

LinkedIn operates the world’s largest professional network on the Internet with more than 120 million members in over 200 countries. In order to connect its users to the people, opportunities, and content that best advance their careers, LinkedIn has developed a variety of algorithms that surface relevant content, offer personalized recommendations, and establish topic-sensitive reputation — all at a massive scale. In this talk, I will discuss some of the most challenging technical problems we face at LinkedIn, and the approaches we are taking to address them.

I hope to see all of you there! My colleagues and I will also be hosting an information session that same Tuesday at 6pm in Porter Hall, Room 125B, as well as participating in the Technical Opportunities Conference Tuesday and Wednesday. And of course LinkedIn will be conducting on-campus interviews: those will take place all day on Thursday, September 22nd.

If you are a CMU student interested in opportunities at LinkedIn, please apply through TartanTrak (yes, I wish you could just apply with LinkedIn — we’ll get there!). Of course, feel free to reach out to me personally at dtunkelang@linkedin.com. We already have more applicants than slots, but I promise that every application will be considered. I’m very excited to recruit CMU students to strengthen our growing team of software engineers and data scientists.

See you soon, and let’s go Tartans!

Dream. Fit. Passion.

A few days ago, our CEO Jeff Weiner led a session at LinkedIn on how to “close” candidates — that is, how to persuade candidates to join your team once you have found and interviewed them. Since not everyone has the opportunity to work at LinkedIn and experience Jeff’s leadership first-hand, I thought I’d share some of his wisdom here.

The key take-away was that closing a candidate is not about selling the job or company to the candidate, but rather about working with the candidate to figure out what the candidate wants and whether the job will help him or her achieve that desire. As an employer, you need to do three things to close a candidate:

1) Figure out the candidate’s dream.
2) Determine whether the job and candidate are the right fit.
3) Communicate your own passion.

Let’s take these one at a time.

Dream.

As I’ve written here in the past, we have to dare to dream. Most of us rely on jobs to sustain us and our loved ones — and for some a job is nothing more than that. There’s no shame in having a dream that is unrelated to a job — Franz Kafka famously worked in a variety of “bread jobs” in order to pay the bills while he wrote novels. Others find their calling as humanitarians, activists, or caregivers. It’s easy for many of us to forget that life isn’t always about work.

But the great thing about working in technology is that you can get paid to fulfill your own dream. Look at Larry and Sergey, who set out to organize the world’s information. Or Steve Jobs, whose dream has been to create innovative products. Not everyone is as specific in their dreams or as successful in realizing them, but, as the saying goes, you have to be in it to win it.

Convincing a person to accept a job offer works best when that job brings the person closer to fulfilling his or her dream. My own decisions to go to Google and then LinkedIn are good examples. Working at Endeca drove me to pursue a vision of HCIR — to optimize the way people and machines work together to solve information seeking and exploration tasks. At Google, I hoped to bring exploratory search to the open web. I’ll concede that I did not make much headway, but I’m glad that I tried.

And at LinkedIn, I work on problems that not only stretch the boundaries of information science, but whose solutions help millions of other people achieve their dreams by making them more successful professionally. My dream is to truly reduce HCIR to practice so that people can lead better and more productive lives. Once the folks at LinkedIn understood my dream, closing me was just a matter of offering me the keys to make that dream a reality.

If you want someone to work at your company, get to know that person’s dreams. If the job you are offering can’t help him or her realize those dreams, be honest about it. It’s better for both of you, and for a world that is better off with people devoting their lives’ work to fulfilling their dreams.

Fit.

Fit is a two-way street: the candidate should be right for the job, and the job should be right for the candidate. The interviewing process typically focuses on establishing the former, but we often forget that the candidate’s decision focuses on the latter. Just because someone is capable of doing a job doesn’t mean it’s the right job for that person.

For me, fit means many things. A work environment where people work hard and take the company’s success personally. Incentives that allow everyone to win, rather than a zero-sum game where people compete for scarce opportunities. Openness, since I’m someone who lives most of my life in public. I could go on — but I hope you get the general idea. Fit is the set of functional and non-functional requirements that determine whether someone will enjoy a job. And people who enjoy their jobs tend to be productive and stay a while.

If you are trying to persuade someone to accept a job offer, you have to see the decision from that person’s point of view. In other words, ask yourself — and convincingly answer — why the job is the right fit for the candidate. That means accepting the possibility that it isn’t the right fit, and doing right by the candidate even if that means backing off.

Passion.

Choosing a job is one of the most important life decisions that people make. It’s not quite up there with getting married or having a child, but it’s a decision that most people take (and should take) very seriously. Some people create spreadsheets of the pros and cons to compare opportunities and try to frame their decision as an optimization problem. Others go with their gut.

Those who know me personally — whether from face-to-face or online interaction — know that I wear my passion on my sleeve. I can’t understand how someone could get up in the morning and go to work without being passionate about his or her job. I know that many people don’t have a choice in the matter, and I pity them. In a country where most people take subsistence for granted, having a job you love strikes me as a necessity, rather than a luxury.

But what is clear is that if you, as an employer, are not passionate about what you do, you have no business expecting a candidate to take such a big leap of faith with you. Moreover, passion is hard to fake. As it should be — I’m not suggesting that employers should pretend to be excited about their jobs. Rather, your own sincere excitement is a baseline for those you hope to attract to your team. Passion is contagious, and passion is the raw material for making dreams come true.

Dream. Fit. Passion.

There you have it: dream, fit, passion. And remember, closing isn’t selling. Do right by the people you try to hire. After all, jobs are short, but careers are long. Celebrate everyone’s professional success, and take your losses in stride. I can tell you from experience that it all works out for the best.

Retiring a Great Interview Problem

Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find candidates who can write code. The tech press sporadically publishes “best” interview questions that make me cringe — though I love the IKEA question. Startups like Codility and Interview Street see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop subjecting candidates to whiteboard coding exercises.

I don’t have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to assess software engineering candidates. At best, they test only one desirable attribute; at worst, they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the key insight. Coding questions are a much better tool for assessing people whose day job will be coding, but conventional interviews — whether by phone or in person — are a suboptimal way to test coding strength. Also, it’s not clear whether a coding question should assess problem-solving, pure translation of a solution into working code, or both.

In the face of all of these challenges, I came up with an interview problem that has served me and others well for a few years at Endeca, Google, and LinkedIn. It is with a heavy heart that I retire it, for reasons I’ll discuss at the end of the post. But first let me describe the problem and explain why it has been so effective.

The Problem

I call it the “word break” problem and describe it as follows:

Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and the
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.

Note that I’ve deliberately left some aspects of this problem vague or underspecified, giving the candidate an opportunity to flesh them out. Here are examples of questions a candidate might ask, and how I would answer them:

Q: What if the input string is already a word in the
   dictionary?
A: A single word is a special case of a space-separated
   sequence of words.

Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.

Q: What if the input string cannot be segmented into a
   sequence of words in the dictionary?
A: Then return null or something equivalent.

Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence
   of exact words in the dictionary.

Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.

Q: I'm thinking of implementing the dictionary as a
   trie, suffix tree, Fibonacci heap, ...
A: You don't need to implement the dictionary. Just
   assume access to a reasonable implementation.

Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.

Q: How big is the dictionary?
A: Assume it's much bigger than the input string,
   but that it fits in memory.

Seeing how a candidate negotiates these details is instructive: it offers you a sense of the candidate’s communication skills and attention to detail, not to mention the candidate’s basic understanding of data structures and algorithms.

A FizzBuzz Solution

Enough with the problem specification and on to the solution. Some candidates start with the simplified version of the problem that only considers segmentations into two words. I consider this a FizzBuzz problem, and I expect any competent software engineer to produce the equivalent of the following in their programming language of choice. I’ll use Java in my example solutions.

String SegmentString(String input, Set<String> dict) {
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      if (dict.contains(suffix)) {
        return prefix + " " + suffix;
      }
    }
  }
  return null;
}

I have interviewed candidates who could not produce the above — including candidates who had passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great way to keep interviewers from wasting their time interviewing programmers who can’t program.

A General Solution

Of course, the more interesting problem is the general case, where the input string may be segmented into any number of dictionary words. There are a number of ways to approach this problem, but the most straightforward is recursive backtracking. Here is a typical solution that builds on the previous one:

String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  return null;
}

Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I’m sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale.

Analyzing the Running Time

But wait, there’s more! When a candidate does arrive at a solution like the above, I ask for a big-O analysis of its worst-case running time as a function of n, the length of the input string. I’ve heard candidates respond with everything from O(n) to O(n!).

I typically offer the following hint:

Consider a pathological dictionary containing the words
"a", "aa", "aaa", ..., i.e., words composed solely of
the letter 'a'. What happens when the input string is a
sequence of n-1 'a's followed by a 'b'?

Hopefully the candidate can figure out that the recursive backtracking solution will explore every possible segmentation of this input string, which reduces the analysis to determining the number of possible segmentations. I leave it as an exercise to the reader (with this hint) to determine that this number is O(2^n).
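
To make that blow-up concrete, here is a small instrumented sketch (my own illustration, not part of the interview question) that runs the backtracking solution on the pathological input from the hint and counts recursive calls:

```java
import java.util.HashSet;
import java.util.Set;

class BacktrackCounter {
  static long calls;

  // The plain recursive backtracking solution, instrumented to count
  // how many times it is invoked.
  static String segment(String input, Set<String> dict) {
    calls++;
    if (dict.contains(input)) return input;
    for (int i = 1; i < input.length(); i++) {
      if (dict.contains(input.substring(0, i))) {
        String seg = segment(input.substring(i), dict);
        if (seg != null) return input.substring(0, i) + " " + seg;
      }
    }
    return null;
  }

  // The pathological case from the hint: dictionary {"a", "aa", ..., n 'a's},
  // input of n-1 'a's followed by a 'b'. Returns the number of calls made.
  static long countCalls(int n) {
    Set<String> dict = new HashSet<>();
    StringBuilder word = new StringBuilder();
    for (int i = 0; i < n; i++) {
      word.append('a');
      dict.add(word.toString());
    }
    String input = word.substring(0, n - 1) + "b";
    calls = 0;
    segment(input, dict);
    return calls;
  }
}
```

Each additional character doubles the count — the recursion makes exactly 2^(n-1) calls on this input, confirming the exponential analysis.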

An Efficient Solution

If a candidate gets this far, I ask if it is possible to do better than O(2^n). Most candidates realize this is a loaded question, and strong ones recognize the opportunity to apply dynamic programming or memoization. Here is a solution using memoization:

Map<String, String> memoized = new HashMap<String, String>();

String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  if (memoized.containsKey(input)) {
    return memoized.get(input);
  }
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  memoized.put(input, null);
  return null;
}

Again the candidate should be able to perform the worst-case analysis. The key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes. I leave it as an exercise to the reader to determine that the worst-case running time of the memoized solution above is O(n^2), assuming that the substring operation only requires constant time (a discussion which itself makes for an interesting tangent).
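
For completeness, the same idea can be expressed bottom-up. The following is my own sketch (not the code I used in interviews): it fills a table of segmentations for each suffix, right to left, doing the same O(n^2) work without recursion:

```java
import java.util.Set;

class WordBreakDP {
  // Bottom-up counterpart of the memoized recursion: best[i] holds a valid
  // segmentation of the suffix starting at position i, or null if none
  // exists. Filling the table from the end of the string backwards visits
  // each of the O(n) suffixes once, trying at most n split points each.
  static String segment(String input, Set<String> dict) {
    int n = input.length();
    String[] best = new String[n + 1];
    best[n] = "";  // the empty suffix segments trivially
    for (int i = n - 1; i >= 0; i--) {
      for (int j = i + 1; j <= n && best[i] == null; j++) {
        String word = input.substring(i, j);
        if (dict.contains(word) && best[j] != null) {
          best[i] = best[j].isEmpty() ? word : word + " " + best[j];
        }
      }
    }
    return best[0];
  }
}
```

Here best[i] plays exactly the role that the memoization map plays for the suffix starting at position i, so the worst-case analysis is the same.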

Why I Love This Problem

There are lots of reasons I love this problem. I’ll enumerate a few:

  • It is a real problem that came up in the course of developing production software. I developed Endeca’s original implementation for rewriting search queries, and this problem came up in the context of spelling correction and thesaurus expansion.
  • It does not require any specialized knowledge — just strings, sets, maps, recursion, and a simple application of dynamic programming / memoization. Basics that are covered in a first- or second-year undergraduate course in computer science.
  • The code is non-trivial but compact enough to use under the tight conditions of a 45-minute interview, whether in person or over the phone using a tool like Collabedit.
  • The problem is challenging, but it isn’t a gotcha problem. Rather, it requires a methodical analysis of the problem and the application of basic computer science tools.
  • The candidate’s performance on the problem isn’t binary. The worst candidates don’t even manage to implement the FizzBuzz solution in 45 minutes. The best implement a memoized solution in 10 minutes, allowing you to make the problem even more interesting, e.g., asking how they would handle a dictionary too large to fit in main memory. Most candidates perform somewhere in the middle.

Happy Retirement

Unfortunately, all good things come to an end. I recently discovered that a candidate posted this problem on Glassdoor. The solution posted there hardly goes into the level of detail I’ve provided in this post, but I decided that a problem this good deserved to retire in style.

It’s hard to come up with good interview problems, and it’s also hard to keep secrets. The secret may be to keep fewer secrets. An ideal interview question is one for which advance knowledge has limited value. I’m working with my colleagues on such an approach. Naturally, I’ll share more if and when we deploy it.

In the meantime, I hope that everyone who experienced the word break problem appreciated it as a worthy test of their skills. No problem is perfect, nor can performance on a single interview question ever be a perfect predictor of how well a candidate will perform as an engineer. Still, this one was pretty good, and I know that a bunch of us will miss it.

P.S. Check out this post by Behdad Esfahbod that does the problem justice! He also notes that looking up a string in a dictionary isn’t O(1) but costs time proportional to the string’s length, which motivates storing the strings in a trie rather than a hash-based dictionary.
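
To illustrate that point, here is a toy trie (my own sketch, not code from his post or mine): a single downward walk from position i reports every dictionary word that starts there, rather than hashing each candidate substring separately:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TrieDict {
  // A minimal trie node: children keyed by character, plus a flag marking
  // the end of a dictionary word.
  Map<Character, TrieDict> children = new HashMap<>();
  boolean isWord = false;

  void add(String word) {
    TrieDict node = this;
    for (char c : word.toCharArray()) {
      node = node.children.computeIfAbsent(c, k -> new TrieDict());
    }
    node.isWord = true;
  }

  // End indices (exclusive) of every dictionary word starting at input[i],
  // found in one O(remaining-length) walk down the trie.
  List<Integer> wordEndsFrom(String input, int i) {
    List<Integer> ends = new ArrayList<>();
    TrieDict node = this;
    for (int j = i; j < input.length(); j++) {
      node = node.children.get(input.charAt(j));
      if (node == null) break;
      if (node.isWord) ends.add(j + 1);
    }
    return ends;
  }
}
```

In the word-break loop, wordEndsFrom(input, i) yields all valid split points after position i in one scan, replacing a separate dictionary lookup per prefix.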

Upcoming Information Retrieval Conferences

I hope everyone who attended the recent SIGIR 2011 in Beijing had an excellent experience. I didn’t manage to make it to that side of the globe myself, but I’m looking forward to hearing back from my LinkedIn colleagues who were there — particularly Paul Ogilvie, who gave an invited talk at the first Workshop on Entity-Oriented Search (EOS) on “Anchoring Relevance with Entities”.

There are four outstanding information retrieval conferences coming up, and I will have the pleasure of participating in three of them. I’d like to make sure readers here are aware of all of them.

The first is KDD 2011, which will take place August 21-24, 2011 in San Diego, CA. The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results, and experiences. KDD 2011 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels, exhibits, demonstrations, and the KDD Cup competition.

I will not be attending KDD myself, but several of my colleagues will be there. In particular, Ron Bekkerman will be presenting a paper on “High-Precision Phrase-Based Document Classification on a Modern Scale”, as well as offering a tutorial on “Scaling Up Machine Learning: Parallel and Distributed Approaches”.

The second is HCIR 2011, the fifth annual HCIR workshop, which I am co-organizing. It will be held all day on Thursday, October 20th, 2011 at Google’s main campus in Mountain View, California. There will be a reception on Wednesday evening before the workshop. Our keynote speaker this year will be Gary Marchionini, Dean of the School of Information and Library Science, University of North Carolina at Chapel Hill. We are also excited to continue the HCIR Challenge, this year focusing on the problem of information availability, where the seeker faces uncertainty as to whether the information of interest is available at all. The corpus will be the CiteSeer digital library of scientific literature, which contains over 750,000 documents and provides rich meta-data about documents, authors, and citations.

Thanks to generous contributions made by Google, Microsoft Research, and Endeca, there will be no registration fee for HCIR this year. Information about how to register will be sent to authors of accepted position papers, research papers, and challenge reports. Note that the submission deadline has been extended by two weeks to Sunday, August 14th. I strongly encourage you to submit in one of these categories if you are working in this field.

The third is RecSys 2011, the 5th ACM International Conference on Recommender Systems. RecSys 2011 builds on the success of the Recommenders 06 Summer School in Bilbao, Spain and the four conferences held in Minneapolis (2007), Lausanne (2008), New York (2009), and Barcelona (2010). At those events, many members of the practitioner and research communities valued the rich exchange of ideas made possible by the shared plenary sessions, and the 5th conference will promote the same close interaction among practitioners and researchers.

I will be giving a tutorial at RecSys 2011 on “Recommendations as a Conversation with the User”.

The fourth is CIKM 2011, the 20th ACM Conference on Information and Knowledge Management. It will take place in Glasgow, Scotland, UK, October 24th-28th, 2011. Since 1992, CIKM has successfully brought together leading researchers and developers from the database, information retrieval, and knowledge management communities. The purpose of the conference is to identify challenging problems facing the development of future knowledge and information systems, and to shape future research directions through the publication of high-quality applied and theoretical research findings. CIKM 2011 will continue the tradition of promoting collaboration across these communities.

I am proud to be organizing the CIKM 2011 Industry Event, which will feature such industry heavyweights as Stephen Robertson (Microsoft Research), John Giannandrea (Google), Vanja Josifovski (Yahoo! Research), Ilya Segalovich (Yandex), Jeff Hammerbacher (Cloudera), and Chavdar Botev (LinkedIn).

I’m very excited about all four of these opportunities to exchange ideas about information retrieval and related areas, and I am grateful to LinkedIn for supporting my participation, as well as that of my colleagues. I hope to see some of you at these events!

Attention vs. Privacy

A major feature of the recently released Google+ is Circles, which allows you to “share relevant content with the right people, and follow content posted by people you find interesting.”

Most people seem to look at Circles as a privacy feature — and indeed Google’s official description gives the impression that Circles exist to manage privacy based on real-life social contexts. Of course, re-sharing can result in unintended consequences, and Google even offers a warning that:

Unless you disable reshares, anything you share (either publicly or with your circles) can be reshared beyond the original people you shared the content with. This could happen either through reshares or through mentions in comments.

Privacy is a big deal, especially for Google — and particularly in the context of rolling out a new social network. Still, I’m not persuaded that privacy is the only or even the primary concern motivating the concept of social circles.

Sharing content with someone is not just about giving that person permission to see it. Sharing content with someone asserts a claim on that person’s attention. While it may be a privilege for me to have access to your content, it may be even more of a privilege for you that I allocate my scarce attention to consume it.

What if we focus on routing content to the people who would find it most interesting? Such an approach works best if all of the shared content is public with respect to permissions — that is, people post it without any expectation of privacy. Twitter demonstrates that many people are comfortable with such a sharing model. Imagine if they could learn to trust a system that optimizes (or at least attempts to optimize) the allocation of everyone’s attention. This is not an easy problem by any means, nor is it one that is likely to be solved by algorithms alone. It will take a strong dose of HCIR to get it right. But, at least in my view, optimizing the allocation of human attention is the grand challenge that everyone working with information retrieval or social networks should be striving to address.

Privacy is important, and social networks should offer simple, robust privacy controls that users understand. We all have experienced the problem of filter failure. But sharing isn’t just about privacy. Our attention is our most precious cognitive asset, both as individuals and as a society. Moreover, our attention faces ever-increasing demands as our social lives evolve in an online world relatively free of physical constraints. Social network developers would do well to pay attention…to attention.

Categories
General

Guest Post: Diego Basch on The Need for Speed

Diego Basch is the CEO and founder of IndexTank, a hosted search service that powers major web sites such as Reddit, Twitvid, and blip.tv, and that provides a WordPress plug-in for blogs (like this one). Diego gained his search experience working at Inktomi, where he wrote some of the world’s first web-scale link analysis algorithms. He is on a mission to make every search box blazing fast and useful.

So much brainpower is spent solving the wrong problems. The world is filled with solutions looking for problems that nobody has — as illustrated by a Google query for [stupidest inventions ever]. More often, people focus narrowly on a particular approach when they should focus on the problem the approach is intended to solve. Or they take a solution for one problem and assume it will apply to another.

Consider the emphasis that search engine developers place on relevance ranking. It is not hard to understand why web-scale search engines emphasize relevance. For example, a search on Google for [emergency locksmith] returns millions of web pages, among which there are only a handful of results that you want. Google must filter out the growing number of lead generation companies that spend a ton of money trying to game its results.

Most web and application developers are familiar with the concept of relevance, so they naturally assume that it should be the primary concern when they add search to their own sites or apps. When I talk to people who want full-text search for their 40,000 book titles or 100,000 classified ads, they ask me about all the ways they can tune relevance. But often they are focusing on a solution, rather than their fundamental problem.

Developers are (or should be!) trying to improve the user experience of their application search. Too often they wrongly assume that relevance is the single most important factor for optimizing this user experience. Let’s surface this confusion in a concrete example.

As a rock climber, once in a while I feel the aches and pains caused by the sport. As the years go by, it’s very important to keep your tendons healthy if you do not want to take forced breaks (or type with one hand!). Rockclimbing.com is one of the most popular climbing sites, and I know some medical professionals who occasionally answer health-related questions there. Let’s search there for [tendon injury prevention].

In the above example, part of the problem is that the search results do not have contextual snippets. Maybe there is relevant information hiding behind a click, but the user has no way of knowing. More generally, there’s no hint as to which results could be better. Information such as the score of the answer (which is available) or the author’s bio (e.g., “climber, physical therapist”) would make the decision easier. If you need to click and scroll, search within the page, go back and try something else, then the search engine is wasting your time.

Which brings us to the broader point: when users search, they want to spend the least amount of time possible getting to the information they want. Relevance is a means to this end. In particular, clicks and typing cost users time. That time can come from page load, rendering, repeated use of the back button, and of course typing (and re-typing) search queries.

Some application search engines really nail the user experience. Let’s say we’re looking for the movie Koyaanits-however-you-spell-it. Go to the Internet Movie Database (IMDB) and start typing k-o-y-e — and there it is, as the second result. Notice that there is a ton of irrelevant stuff around it, but it doesn’t matter: I see what I want very quickly.
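The instant-results experience IMDB offers can be approximated with a simple prefix index. Here is a minimal sketch in Python — the title catalog and ranking are illustrative, not IMDB’s actual implementation, and a production system would add fuzzy matching (to handle misspellings like “koye”) and popularity-based ranking:

```python
# Minimal sketch of instant prefix search over a small title catalog.
# A real system would use a trie or n-gram index plus ranking signals;
# this just maps short lowercase prefixes to matching titles.

def build_prefix_index(titles, max_prefix_len=4):
    """Map each lowercase prefix (up to max_prefix_len) to matching titles."""
    index = {}
    for title in titles:
        key = title.lower()
        for n in range(1, min(max_prefix_len, len(key)) + 1):
            index.setdefault(key[:n], []).append(title)
    return index

def instant_search(index, query, limit=5):
    """Return up to `limit` titles as the user types, matching by prefix."""
    return index.get(query.lower(), [])[:limit]

titles = ["Koyaanisqatsi", "King Kong", "Kontroll", "Casablanca"]
index = build_prefix_index(titles)
print(instant_search(index, "koy"))   # ['Koyaanisqatsi']
print(instant_search(index, "k"))     # all titles starting with "k"
```

The point of the sketch is that results can be computed per keystroke from a precomputed index, so the user never waits for a full search round-trip.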

Hopefully these two examples serve to illustrate the broader point: search engines should not focus on relevance as an end in itself, but rather on whatever helps users find the information they want as quickly as possible. That means offering contextual snippets, instant feedback, and of course snappy response times. Give users speed, and you will make them happy.

Categories
General

Google±?

When I left Google last December, it was an open secret that Google was developing a social networking product. Now that Google has released Google+, I am at liberty to share my personal impressions.

Let’s start with the clear wins.

  • Impressive launch. Google has certainly learned its lesson from the past launches of Wave and Buzz. Google+ is unambiguously opt-in — no one is going to complain about being ambushed. People have been begging for invites. But Google is wisely releasing invites quickly enough to build critical mass. I’d say that Google has at least picked up the Quora crowd of early adopters in Silicon Valley.
  • Clean design. Design lead Andy Hertzfeld (of Macintosh fame) has nailed it, leading bloggers to comment that this looks too well designed to be a Google product. Comparing Google+ to Facebook now, I’m reminded at least a little of comparisons between Facebook and Myspace. Great move for Google here.

Now let’s talk about Google’s three big features here: Circles, Sparks, and Hangouts.

  • Circles. Straight out of Paul Adams’s presentation on social networking (which he created before he left Google for Facebook), the idea is simple: a person doesn’t have a single group of friends, but rather several groups that tend to be mostly disjoint. Through Circles, Google+ makes this soft partitioning of the social space a core design principle. You add people to one or more circles, follow the stream of activity from a circle, and share with circles. It’s great in theory. But in practice it creates friction, especially for people trained on Facebook. There’s a trade-off between simplicity and expressive power, and Google is placing a strong bet on how users will make this trade-off. I’m inclined to agree with Yishan Wong that “the sorting of friends into buckets (friend lists) is something that only nerds do”. Given Google’s deep expertise in machine learning, I’m expecting Google to reduce this friction by giving users intelligent suggestions. Full disclosure: my colleagues at LinkedIn built InMaps, which infers communities from your social network.
  • Sparks. The tagline for Sparks is “For nerding out. Together.” It feels like a positioning designed by Googlers for Googlers; you can see promotional videos here and here. I haven’t seen much talk about Sparks, and what little commentary I’ve seen is less than gushing. I’ve experimented with it a bit from the consumption side, and I confess I’m underwhelmed. Perhaps it’s a chicken-and-egg problem — Sparks will only be useful if users populate their profiles with interests, but right now users have no incentive to do so. If Sparks is Google’s attempt to make Reader more social, there’s still a ways to go. Full disclosure: LinkedIn has its own approach to social news, LinkedIn Today, which seems to be doing something right. 🙂
  • Hangouts. In plain English, Hangouts are group video chat embedded in a social network. Which sounds a lot like what Facebook is rumored to be releasing this week through a partnership with Skype. Which in turn was just acquired by Microsoft. Will Apple join the party too by implementing group chat in FaceTime? Competitive dynamics aside, this is a very cool feature that hopefully won’t devolve into Chatroulette. Nothing to, um, disclose here.

But the $64B question is whether all this will matter. Can Google+ sustainably co-exist with Facebook? Will people use both services — and, if so, how will they allocate their attention between them? Or is the success of Google+ predicated on displacing Facebook? Or Twitter? Either of those would certainly qualify as a Big Hairy Audacious Goal.

Like Fred Wilson, I’m rooting for Google+ to succeed — but even Fred notes that he would not be able to get his family on Google+, as they are already happy with Facebook. It’s not clear to me what I can get *today* from Google+ that I can’t get from Facebook.

Granted, I’m not a heavy Facebook user, so I’m not the best person to ask this question. So readers, I ask you: why will or won’t you use Google+?