Categories
General

InSecret: A LinkedIn Hackday Master Tries Something Different

LinkedIn Hackdays are an awesome opportunity for innovation — learn more about them here. But first check out this unusual entry by Hackday master Dhananjay Ragade, whose previous hacks include the LinkedIn Year in Review:

http://www.xtranormal.com/site_media/players/jw_player_v54/player.swf

Don’t worry, it’s safe for work. Well, unless you work for Linden Lab. 🙂


It Just Works

Given that I work for the world’s largest professional network, I take work very personally. I’m also deeply involved in LinkedIn’s hiring process, which gives me opportunities to see how people make career decisions. I thought I’d share my own perspective here.

There are three things that matter to me about my work:

  1. Do I love the work I do? Does work feel like play, stimulating me intellectually and emotionally? Am I excited about the people I work with? Is work a grind, or is it something I do for fun?
  2. Is the work I do of value to my employer? Am I justifying my employer’s investment in me, or am I a freeloader lost in the inefficiency of corporate bureaucracy?
  3. Is my work making the world a better place? Specifically, is the work I do making the world more like the world I want to live in?

Not everyone may share these values, and in any case not every job can address all of them. But I am fortunate to have found one that does, and I’m loving it. To borrow a phrase, it just works.

If you haven’t seen this video by Dan Pink on what motivates people, I urge you to watch it. It’s a great reminder that there is more to motivation than economic incentives.

Finally, I hope that you are doing work that fulfills you. As I work to grow my great team at LinkedIn, my mission is not only to bring great people to LinkedIn, but to bring great work and fulfillment to great people. Whatever you do, be amazing.


Foo for Thought

Last weekend I had the extraordinary privilege to attend Foo Camp, an annual gathering of about 250 Friends Of O’Reilly (aka Foo). Tim O’Reilly, Sara Winge, and their colleagues have amazing friends, as you can see if you scan this unofficial list of attendees working on big data, open government, computer security, and more generally on the cutting edge of technology and culture (especially where the two overlap).

Foo Camp is an unconference, which merits some elaboration. No fees, no conference hotel (many attendees literally set up camp in the space O’Reilly provided), and no advance program aside from some preselected 5-minute Ignite presentations. Attendees proposed and organized sessions, merging and re-arranging them to optimize for participation. It was a bit chaotic (especially the mad rush after dinner to secure session slots), but very effective.

The minimalist format brought out the best in participants.

For example, I am passionate about (i.e., against) software patents, so I organized a session about them. I did a double-take when I realized that one of the participants was Pamela Samuelson, perhaps the world’s top expert on intellectual property law. I braced myself to be schooled — as I was. But she did it gently and constructively. Specifically, she pointed me to work that her colleagues Jason Schultz and Jennifer Urban were doing on a defensive patent strategy for open-source software (including a proposed license), as well as reminding me of the Berkeley Patent Survey supporting the argument that software entrepreneurs only file for patents because of real or perceived pressure from their investors. I also heard war stories from lawyers who have done pro bono work against patent trolls, reinforcing my own resolve and reassuring me that the examples I’ve seen at close range are not isolated.

Another session asked whether we are too data-driven in our work. What was notable is that this session included participants from some of the largest internet companies debating some of the most fundamental ways in which we work, e.g., do we actually learn from data or do we engage in assault by data to defend preconceived positions (cf. argumentative theory). Like all of the conference, the discussion was under “frieNDA,” so I’m being intentionally vague on the specifics. But it was refreshing to see candid admission that all of us know and have experienced the dangers of manipulating an audience with data, and that there are no algorithms to enforce common sense and good faith.

I won’t even try to enumerate the sessions and side conversations that excited me — topics included privacy, the future of publishing, a critical analysis of geek culture, and irrational user behavior. I missed the session on data-driven parenting, though others have pointed out to me that you can only learn so much if you don’t have twins and perform A/B tests. The best summary is intellectual diversity and overstimulation. If you’d like to get a general sense of the discussion, check out the #foocamp tweet stream. I also recommend Scott Berkun’s post on “What I learned at FOO Camp”.

As someone who organizes the occasional event, I’m intrigued by the unconference approach — especially now that I’ve experienced it first-hand. Moreover, I feel strongly that the academic conference model needs an upgrade. But I also know that open-ended, free-form discussion sessions are not a viable alternative — indeed, a big part of Foo Camp’s success was how it inspired participants to organize sessions — and to vote with their feet to attend the worthwhile ones. And of course part of that success came from inviting active, engaged participants rather than passive spectators.

Many of you also organize events, and I’m sure that all of you attend them. I’m curious to hear your thoughts about how to make them better, and happy to share more of what I learned at Foo Camp. After all, Foo is for (inspiring) thought.


Christos Faloutsos: Mining Billion-Node Graphs

As promised, here is a video of CMU professor Christos Faloutsos’s recent tech talk at LinkedIn on “Mining Billion-Node Graphs”. Enjoy!

And check out next week’s open tech talk by Sreenivas Gollapudi of Microsoft Research on “A Framework for Result Diversification in Search”.

ps. If you like these topics, then please talk to me about opportunities at LinkedIn! My group is hiring, as are many others.


Winning the War for Software Engineering Talent

The war for talent. It’s the latest metaphor for the challenge that tech companies face as excitement is building in Silicon Valley again. Well, not really — McKinsey coined the phrase in 1997 and used it as the title of a book published four years later.

But anyone who has been trying to hire great software engineers in recent months knows how hard it is to do so. Particularly for folks like me who are trying to hire data scientists — apparently there’s a national shortage. This is nothing new — as Joel Spolsky noted in a 2006 post, “the great software developers, indeed, the best people in every field, are quite simply never on the market.”

I’m not an expert (or ninja) on the subject of recruiting or employer branding in general, but I’ve seen enough of how companies go about hiring software engineers to know that we can do better. I’d like to share some of my thoughts and experiences, and I hope that you will reciprocate and share your thoughts in the comments. I’m especially interested in hearing from folks who are at universities (aka hunting grounds) or who are involved in organizing academic conferences.

First, let’s talk about how we measure success. As Lord Kelvin famously said, “If you can’t measure it, you can’t improve it.” I’m not going to talk about how to handle active candidates — that’s a filtering problem which, in my opinion, is much more tractable. For example, see what Joel has to say about interviewing developers. Rather, I’m concerned with the challenge of discovering qualified passive candidates and converting them into active ones. Hence, I propose we make our metric the number of qualified applicants.

The baseline strategy is sourcing, i.e. have sourcers or hiring managers scour the world for qualified candidates (there’s an app for that), entice them with your best recruiting pitch, and then go hog wild on the folks who respond. The success of this strategy depends mainly on the rate at which you, your sourcers, or your hiring managers find qualified candidates — which in turn may split into the two subtasks of finding candidates and filtering them — and the conversion rate for the qualified candidates you find. Since the best candidates are often happy in their current positions, sourcing passive candidates requires a lot of work and a thick skin for rejection.
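
The funnel described above lends itself to a back-of-the-envelope calculation. Here is a minimal sketch; the function name and all of the rates below are hypothetical placeholders for illustration, not measured figures:

```python
# A toy model of the sourcing funnel: candidates found, the fraction who
# turn out to be qualified, and the fraction of those who convert to
# active applicants. All numbers here are invented for illustration.

def expected_qualified_applicants(candidates_found: int,
                                  qualification_rate: float,
                                  conversion_rate: float) -> float:
    """Expected applicants = found x fraction qualified x fraction converted."""
    return candidates_found * qualification_rate * conversion_rate

# e.g., 500 candidates sourced, 10% qualified, 5% of those convert
print(f"{expected_qualified_applicants(500, 0.10, 0.05):.1f} expected qualified applicants")
```

Even a crude model like this makes the leverage points visible: doubling the conversion rate is worth exactly as much as doubling the rate at which you find candidates, and it is usually cheaper.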

What are other ways to attract qualified passive candidates? Here are a few, with examples from my experience at LinkedIn:

  • Hosting events. Last week at LinkedIn, we hosted CMU professor Christos Faloutsos, who delivered a fantastic talk on “Mining Billion Node Graphs” — a topic we thought interesting enough to justify opening up the talk to the general public. We had a few hundred guests, many of whom are precisely the kinds of folks we are trying to hire. Even more people watched the live stream online or will watch the video when we post it to YouTube (coming soon — stay tuned!). While this was not a recruiting event (we did not even announce that we are hiring), it was a great opportunity to associate LinkedIn with the hard computer science problems we solve on a daily basis.
  • Sponsoring events. Sponsorship is tricky — if you’re not careful, you spend a lot of money for a glorified display ad. Sometimes sponsorship offers speaking slots as part of the package, but audiences are rightfully skeptical of speakers who have paid for their slots — especially at conferences that charge hefty fees for attendance. But sometimes sponsorship works. For example, LinkedIn was a sponsor of the O’Reilly Strata Conference, and the perks of sponsorship complemented our earned speaker slots, helping us bring enormous visibility to our data science team and its recent innovations like InMaps (we had a booth there to print attendees’ InMaps) and Skills (which launched during the conference). While Strata generated few direct leads, it left a lasting impression on the big data community, and I regularly hear candidates refer to it.
  • Participating in events. As the Beatles tell us, money can’t buy you love. If you want to make a (positive) impression at a conference, you have to contribute people and ideas. This is especially true at academic conferences, where attendees quickly throw out the extra weight in their tote bags and focus on the conference’s content and professional networking opportunities. It’s great if you are Microsoft with a team of close to a thousand researchers and can dominate a conference like SIGIR. But smaller companies can still make a strong impression on researchers — and especially on students who may be looking for internships or full-time positions — by taking an active role at conferences. The traditional approach is to submit papers to the main conference track — but other avenues include tutorials, workshops, and industry events. Such participation is often invited, but such invitations are in turn earned by cultivating relationships with researchers — especially the ones who find themselves on organizing committees.
  • Contributing to open-source projects. The Search, Network, and Analytics (SNA) team at LinkedIn contributes frequently to open-source projects and publicizes some of its work at http://sna-projects.com/. Open-source projects are a great way to earn the respect of engineers who value source over PowerPoint — especially when your employees include committers to key technologies like Hadoop. Moreover, open-source projects are social communities, so contributing to them offers opportunities for employees to interact with potential hires.
  • Engaging on social media. By now, I’d like to think that marketers understand social media to be simply another set of marketing channels. But I think the territory is still pretty new for employers. Here is a simple suggestion: encourage (but do not try to force) employees to express themselves professionally online. Enforce the standard non-disclosure rules, of course, but don’t try to manage their voices. Authenticity speaks for itself — for example, look at what Adam Nash says about LinkedIn on his personal blog. Or my own posts here. Engineers don’t read press releases or corporate blogs, but they do pay attention to their peers. And there’s nothing unique about blogs — the same principle applies to platforms like Twitter, Facebook, Quora, and of course LinkedIn. Not all employees enjoy being online extroverts, but those who do not only act as brand ambassadors, but are also likely to eventually strike up conversations with passive candidates about employment opportunities.

Finally, don’t forget to measure the results of these efforts! Some activities generate leads directly, in which case you can make an apples-to-apples comparison of their results and costs with the baseline strategy of sourcing. It’s harder to measure the longer-term effect of efforts to raise visibility, but you can at least ask candidates if they are aware of those efforts — after all, efforts to raise visibility should be visible to candidates! You can also ask candidates if those efforts were a factor in their decision to apply. These measures aren’t perfect, but they are a lot better than nothing, especially when you’re trying to decide how best to invest limited resources.
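
The apples-to-apples comparison for lead-generating activities can be as simple as cost per qualified applicant. A minimal sketch, with all figures invented purely for illustration:

```python
# Comparing channels by cost per qualified applicant.
# The channel names and all dollar/headcount figures are hypothetical.
channels = {
    "sourcing":      {"cost": 50_000, "qualified_applicants": 20},
    "hosted_events": {"cost": 15_000, "qualified_applicants": 10},
}

for name, data in channels.items():
    cpqa = data["cost"] / data["qualified_applicants"]
    print(f"{name}: ${cpqa:.0f} per qualified applicant")
```

The harder-to-measure visibility efforts don’t fit this table directly, which is exactly why asking candidates about awareness matters.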

Of course, even an optimal strategy can’t substitute for offering a combination of interesting work, competitive compensation, and a work hard / play hard culture. As with all marketing efforts, you need to start with a great product. But great products don’t sell themselves: you need to invest in a combination of outbound and inbound marketing to have a fighting chance in the war for talent. Good luck! And, in case you didn’t notice, we’re hiring!


I’d Like To Have An Argument Please

If you Google [relevance theory], you’ll discover this Wikipedia entry about a theory proposed by Dan Sperber and Deirdre Wilson arguing that, in any given communication situation, the listener will stop processing as soon as he or she has found meaning that fits his or her expectation of relevance. The Wikipedia entry offers the following example of this principle:

Mary: Would you like to come for a run?

Bill: I’m resting today.

We understand from this example that Bill does not want to go for a run. But that is not what he said. He only said enough for Mary to supply the context-mediated inference that someone who is resting doesn’t usually go for a run. The implication is that Bill doesn’t want to go for a run today.

This theory may call to mind the Gricean Maxims — indeed, Sperber and Wilson borrow heavily from Grice’s work.

But I mainly bring up relevance theory to introduce Sperber to those unfamiliar with him. My friend (and Endeca co-founder) Pete Bell recently called my attention to an article by neuroscientist Jonah Lehrer entitled “The Reason We Reason”. The article reviews the “hot hand” fallacy and then proceeds to cite a new theory by Sperber and Hugo Mercier:

Reasoning is generally seen as a means to improve knowledge and make better decisions. Much evidence, however, shows that reasoning often leads to epistemic distortions and poor decisions. This suggests rethinking the function of reasoning. Our hypothesis is that the function of reasoning is argumentative. It is to devise and evaluate arguments intended to persuade.

The full article by Mercier and Sperber runs over 17K words and is entitled “Why do humans reason? Arguments for an argumentative theory”.

As someone who has spent most of his professional life thinking about information retrieval in practical contexts, I automatically relate relevance theory to relevance in the context of information retrieval. Relevance has been a subject of intense debate in the information science community (Tefko Saracevic tells the story wonderfully). Indeed, a key reason that I created the HCIR workshop was the belief that information retrieval researchers and practitioners (i.e., search engine developers) were placing too much emphasis on an objective notion of topical relevance, and not enough focus on the user.

Mercier and Sperber’s theory offers an interesting challenge to information retrieval researchers: perhaps a user’s information need is less about arriving at the truth and more about finding confirmatory evidence to support a preconceived conclusion. If so, should we adjust our notions of relevance accordingly? Also, if we evaluate or inform search quality based on observed user behavior (such as click-through behavior), then are we already inadvertently conflating topical relevance with users’ confirmatory bias?

Many people have noted that personalization gives us the truth we want: recent examples include Robin Sloan and Matt Thompson’s EPIC 2014 and Eli Pariser’s The Filter Bubble. Despite the consensus that over-fitting information access to our personal tastes is a bad thing (perhaps even dystopian), technology seems to relentlessly push us in this direction. Moreover, some degree of personalization is clearly useful — such as prioritizing information that relates to our personal and professional interests.

Nonetheless, anyone working in the area of information seeking systems should be concerned with the question of the user’s goal in using that system. Many of us take for granted that the user’s main goal is truth seeking, and we design our systems accordingly. What can or should we do differently if the user’s main goal is not informative but persuasive? Is the user looking for an answer…or an argument?


Going Public

What a day! I’ve been excited about LinkedIn from the moment I joined — and for several years before that — but today has been a unique experience. I hope our celebration extends beyond LinkedIn’s employees and investors — this is a great day for Silicon Valley, for the data scientists who are building its most valuable companies, and for the users who are benefiting from it all. I am proud and deeply grateful to be a part of this extraordinary adventure. My thanks to my hundreds of incredible colleagues and to the 100M users who have made it possible.

ps. Yes, we are still hiring, so please contact me if you’re the kind of person who loves turning data into gold. And if you are local, check out Christos Faloutsos’s upcoming tech talk on Mining Billion Node Graphs, which will take place at LinkedIn on June 2 and is open to the public.


In Search Of Structure

A couple of weeks ago, I participated in a summit that Greylock Partners organized for its portfolio companies at LinkedIn to discuss the power of data. Invited participants represented some of the most interesting “big data” companies in Silicon Valley, including Google, Facebook, Pandora, Cloudera, and Zynga. Discussion took place under the Chatham House Rule, so I’m not at liberty to share much detail. But I can say that there were energetic conversations about metrics, tools, and (of course) hiring.

One of the participants was Google researcher Alon Halevy, who generously shared his presentation on Fusion Tables with me with permission to re-share it here.

Fusion Tables allow the general public to upload, visualize, and share structured data. They are particularly useful for journalists who want to distill compelling stories from data — indeed, The Guardian‘s Simon Rogers has used Fusion Tables to visualize and interpret everything from nuclear power plant accidents to Wikileaks.

After his presentation, I asked Alon for his thoughts on why we haven’t seen an encyclopedic structured data repository comparable in scope and scale to Wikipedia. Alon offered that structured data is brittle — its value tends to depend more on context than the unstructured content that populates Wikipedia. I agree in part — for example, consider this map of Brooklyn bus stops that were slated for elimination last summer. Such data is useful in a narrow context, but hardly encyclopedic.

But what about Freebase and DBpedia? Freebase is an open repository of structured data associated with about 20 million topics. DBpedia describes itself as “a community effort to extract structured information from Wikipedia and to make this information available on the Web.” While these tools have seen some use by developers (especially in the semantic web community), they have not achieved mainstream adoption. Perhaps data marketplaces like Factual and Infochimps will be successful as for-profit businesses, but the question remains why we don’t have a Wikipedia-scale success story for public structured data.

I think the problem is easiest to frame in information retrieval terms. Wikipedia is all about precision, but not so much about recall. Let me elaborate.

Wikipedia represents a collective attempt to achieve precision at the level of individual entries. Contributors and editors correct mistakes and argue over the details of content and tone. But coverage is a much lower priority. When in doubt, the Wikipedia collective assumes that information is not notable enough to justify inclusion. Thus Wikipedia errs on the side of precision rather than recall when it comes to meeting the information needs of its users.

This arrangement works well for a typical web user who seeks out information by using Google web search as an interface to discover Wikipedia articles. But structured data is about sets, not just individuals. It does me no good to see aggregate statistics about a set of entities if the set is erratically populated (e.g., Wikipedia’s list of companies established in 1999 or Freebase’s list of those founded after 2000).

In the June 2009 SIGIR Forum, University of Melbourne researchers Justin Zobel, Alistair Moffat, and Laurence Park argued “against recall”, concluding that they could find “no justification for implicit or explicit use of recall as a measure of search satisfaction.” I posted a rebuttal entitled “In Defense of Recall”, arguing that recall is much more useful as a measure for set retrieval than for ranked retrieval. Revisiting this argument two years later, I can see that it holds even more strongly if we are interested in structured data where we want to reason about aggregate properties of sets.
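
To make the set-retrieval argument concrete, here is a minimal sketch of precision and recall computed over whole sets. The entity sets are hypothetical, standing in for something like a list of companies founded in a given year:

```python
# Precision and recall for set retrieval: how much of what we retrieved is
# correct, and how much of the true set we actually covered.

def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for a retrieved set against a relevant set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Suppose the true set of companies has 6 members, but the repository's
# list is erratically populated: 3 correct entries plus 1 spurious one.
relevant = {"A", "B", "C", "D", "E", "F"}
retrieved = {"A", "B", "C", "X"}

print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

The asymmetry is the point: this hypothetical repository looks respectable on precision (0.75) while missing half the set, and any aggregate statistic computed over it inherits that 50% blind spot.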

Back when we both worked at Endeca, my colleague Rob Gonzalez described structured data repositories as a public good that no one is ever willing to pay for. I’m an optimist by nature, but in this case I fear he has a point. It takes a lot of work to build something useful, and no one seems to have addressed the challenge of incentivizing people to contribute this work for either economic or altruistic motives.

Or perhaps we’ll just have to wait for the holy grail of information extraction algorithms to structure the world’s information for us? Ironically, that’s not even included on Wikipedia’s list of AI-complete problems.


Announcing HCIR 2011!

As regular readers know, I’ve been co-organizing annual workshops on Human-Computer Interaction and Information Retrieval since creating the first HCIR workshop in 2007. These have been a huge success, not only bridging the gap between IR and HCI, but also bringing together researchers and practitioners to address concerns shared by both communities. Past keynote speakers have included such information science luminaries as Susan Dumais, Ben Shneiderman, and Dan Russell.

Every workshop has improved on the previous year’s, and HCIR 2011, which will take place on Thursday, October 20, will be no exception.

Our venue will be Google’s headquarters in Mountain View, California. We could hardly imagine a more appropriate venue: Google has done more than any other company to contribute to everyday information access. Google has been extremely generous as a host and sponsor (other sponsors include Endeca and Microsoft Research), and its location in the heart of Silicon Valley is ideal for attracting researchers and practitioners building the future of HCIR.

Our keynote speaker will be Gary Marchionini, Dean of the School of Information and Library Science at the University of North Carolina at Chapel Hill. Gary coined the phrase “human–computer information retrieval” in a lecture entitled “Toward Human-Computer Information Retrieval”, in which he asserted that “HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.” We are honored to have Gary deliver this year’s keynote.

But of course the main attraction is the contribution of participants. This year we invite three types of papers: position papers, research papers and challenge reports. Possible topics for discussion and presentation at the workshop include, but are not limited to:

  • Novel interaction techniques for information retrieval.
  • Modeling and evaluation of interactive information retrieval.
  • Exploratory search and information discovery.
  • Information visualization and visual analytics.
  • Applications of HCI techniques to information retrieval needs in specific domains.
  • Ethnography and user studies relevant to information retrieval and access.
  • Scale and efficiency considerations for interactive information retrieval systems.
  • Relevance feedback and active learning approaches for information retrieval.

Demonstrations of systems and prototypes are particularly welcome.

Building on the success of last year’s HCIR Challenge to address historical exploration of a news archive, this year’s HCIR Challenge will focus on the problem of information availability. The corpus for the Challenge will be the CiteSeer digital library of scientific literature.

For more information about the workshop, including how to submit papers or participate in the challenge, please visit the HCIR 2011 website.

Here are the key dates for submitting position and research papers:

  • Submission deadline (position and research papers): July 31
  • Notification of acceptance decision: September 8
  • Presentations and poster session at workshop: October 20

Key dates for Challenge participants:

  • Request access to corpus (contact me) deadline: June 19
  • Freeze system and submit brief description: September 25
  • Submit videos or screenshots demonstrating systems on example tasks: October 9
  • Live demonstrations at workshop: October 20

I’m looking forward to this year’s submissions, and to a great workshop in October. I hope to see many of you there!


CFP: CIKM 2011 Industry Event

As I posted a few months ago, I’m organizing the Industry Event at CIKM 2011 with Tony Russell-Rose. We have a great set of keynotes lined up:

We’re also looking for submissions from industry researchers and practitioners. The submission deadline is June 21.

Here is a copy of the call for papers:

This year’s CIKM conference will include an Industry Event, which will be held during the regular conference program in parallel with the technical tracks.

The Industry Event’s objectives are twofold. The first objective is to present the state-of-the-art in information retrieval, knowledge management, databases, and data mining, delivered as keynote talks by influential technical leaders who work in industry. The second objective is to present interesting, novel and innovative industry developments in these areas.

Industry authors are invited to prepare proposals for presenting interesting, novel and innovative ideas, and submit these to industry@cikm2011.org by June 21st 2011. The proposals should contain (with respective lengths):

  • Short company portrait (125 words)
  • Short CV of the presenter (125 words)
  • Title and abstract of the presentation (250 words)
  • Reasons why the presentation should be interesting to the CIKM audience

When submitting a proposal, please bear in mind the following:

  • Ensure the presentation is relevant to the CIKM audience (the Call for Papers gives a good idea of the conference scope).
  • Try to highlight interesting R&D challenges in the work you present. Please do not present a sales pitch.
  • All slides will be made public (no confidential information on the slides; you will be expected to ensure your slides are approved by your company before being presented).
  • Presenters may opt to have their presentation videoed and made public, and if so, the presenter will be asked to sign a release form.

We look forward to receiving your submissions, and welcoming you to the CIKM 2011 Conference and Industry Event.

Important dates:
21 June 2011: Industry Event paper proposals due
19 July 2011: Notifications sent
27 October 2011: Industry Event
24-28 October 2011: CIKM conference