The Noisy Channel

 

LinkedIn Search: A Look Beneath the Hood

January 31st, 2010 · 13 Comments · General

Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn introducing faceted search and celebrating the interface from a user perspective. John’s presentation at the SDForum took a developer’s perspective, discussing the challenges of combining faceted search and social networking at scale.

John was kind enough to publish his slides, and I’ve embedded them above. Unfortunately, there’s no recording of the extensive Q&A (which included various attempts to get John to reveal the precise details of LinkedIn’s data volume), but the slides are quite meaty.

Personally, I learned two surprising things from the talk.

First, I was surprised that LinkedIn dismisses index/cache warming as “cheating”, instead computing almost everything in real time. Specifically, I would have expected LinkedIn to cache information like a user’s set of degree-two connections: these are expensive to compute at query time, especially when the social graph is distributed and sharded by user. I did ask John whether LinkedIn recomputes a user’s degree-two network during a session, and he admitted that LinkedIn is sensible enough to “cheat” and not perform this expensive but almost useless re-computation.
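To make the cost concrete, here is a minimal sketch of a degree-two computation with a per-session cache. It is not LinkedIn's implementation, which John did not describe at this level. The in-memory toy graph, the function names, and the `Session` class are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (member, member) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_two(graph, user):
    """Return members exactly two hops away from `user`.

    Every first-degree connection's adjacency set must be visited. In a
    graph sharded by user (an assumption here), each of those lookups
    could land on a different shard.
    """
    first = graph.get(user, set())
    second = set()
    for friend in first:
        second |= graph.get(friend, set())
    # Exclude the user and their direct connections.
    return second - first - {user}


class Session:
    """Hypothetical session object that computes the set once and reuses it."""

    def __init__(self, graph, user):
        self.graph = graph
        self.user = user
        self._degree_two = None
        self.computations = 0  # counts how often the expensive step ran

    def degree_two(self):
        if self._degree_two is None:
            self._degree_two = degree_two(self.graph, self.user)
            self.computations += 1
        return self._degree_two


graph = build_graph([
    ("ann", "bob"), ("bob", "cat"), ("cat", "dan"),
    ("ann", "eve"), ("eve", "cat"),
])
session = Session(graph, "ann")
session.degree_two()
session.degree_two()  # served from the session cache
```

The second call does no graph traversal at all, which is the kind of "cheating" that seems hard to argue against: a degree-two network rarely changes meaningfully within a single session.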

Second, I learned about reference search, a feature I may have missed because it is only available for premium LinkedIn accounts. It’s a nice feature, allowing you to search against company + date range pairs. People who are familiar with implementing faceted search may recognize the preservation of such associations between facet values as a gnarly implementation challenge.
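For readers who have not hit this problem, here is a small sketch of why the association matters. The profile data, field layout, and function names are hypothetical, and this is not how LinkedIn implements reference search. It only contrasts treating company and date range as independent facets with keeping them paired per position.

```python
from datetime import date

# Hypothetical profiles: each position keeps (company, start, end) together.
PROFILES = {
    "ann": [
        ("Acme", date(2001, 1, 1), date(2004, 12, 31)),
        ("Initech", date(2005, 1, 1), date(2009, 12, 31)),
    ],
    "bob": [
        ("Initech", date(2002, 1, 1), date(2003, 12, 31)),
        ("Acme", date(2006, 1, 1), date(2008, 12, 31)),
    ],
}


def overlaps(start, end, q_start, q_end):
    """True when the closed ranges [start, end] and [q_start, q_end] intersect."""
    return start <= q_end and q_start <= end


def flattened_search(profiles, company, q_start, q_end):
    """Naive approach: company and dates are independent multi-valued facets."""
    hits = []
    for name, positions in profiles.items():
        has_company = any(c == company for c, _, _ in positions)
        has_dates = any(overlaps(s, e, q_start, q_end) for _, s, e in positions)
        if has_company and has_dates:
            hits.append(name)
    return sorted(hits)


def paired_search(profiles, company, q_start, q_end):
    """Company and date range must match on the same position."""
    hits = []
    for name, positions in profiles.items():
        if any(
            c == company and overlaps(s, e, q_start, q_end)
            for c, s, e in positions
        ):
            hits.append(name)
    return sorted(hits)


query = ("Acme", date(2002, 1, 1), date(2003, 12, 31))
flat = flattened_search(PROFILES, *query)
paired = paired_search(PROFILES, *query)
```

In this toy data, the flattened search also returns bob, who worked somewhere during 2002 and 2003 and worked at Acme at some point, but never both at once. Only the paired search correctly limits the result to ann.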

All in all, it was a treat to get this look under the hood, as well as to finally meet John in person. I also ran into Gene Golovchinsky there. So much for my spending a few days on the West Coast incognito!

In any case, I’m looking forward to seeing Gene, some of John’s colleagues, and many more interesting people at the Search and Social Media Workshop (SSM 2010) on Wednesday. My apologies to those who aren’t able to attend this oversubscribed event. I promise to blog about it!

13 responses so far ↓

  • 1 Gene Golovchinsky // Jan 31, 2010 at 11:40 pm

    You might have gotten away without being detected but for all the questions you asked! So much for attempts at stealth!

  • 2 Greg Linden // Feb 2, 2010 at 3:14 pm

    Great slides, thanks for posting them.

    This idea of minimizing the caching layer and instead allocating those resources to keep as much of the database in memory as possible is seriously underdone. I know memcached is the trendy thing these days, but some of these memcached layers out there are as big as the database layers and add all the complexity and inefficiency of dealing with cache consistency. Often it would be better to keep as much of the database as possible in memory and maximize the performance of the database (with careful indexing, partitioning, replication, and materialized views).

    Seems obvious that the solution to a slow database should be fixing the slow database, but the much more common solution is to put a lot of stuff in front of that slow database that tries to hide the shame.

  • 3 Daniel Tunkelang // Feb 2, 2010 at 3:26 pm

    If LinkedIn weren’t using degree of connection as a facet, then that would be a no-brainer. While John was a bit coy about the data volume, it’s pretty clear that the non-text data can fit in memory, given the amount of hardware they’re using. I’m surprised at how they chose to handle the degree-two filter without caching, since that seems like an easy way to spend a lot of work per query on a lot of queries. I could have asked John for aggregate network statistics for the LinkedIn user base, but I don’t imagine they publish those numbers.

  • 4 SearchCap: The Day In Search, February 3, 2010 // Feb 3, 2010 at 6:51 pm

    […] LinkedIn Search: A Look Beneath the Hood, thenoisychannel.com […]

  • 5 OS // Feb 3, 2010 at 9:18 pm

    Ever heard of Verity’s parametric search? Sounds very familiar.

  • 6 Report on the Third Workshop on Search and Social Media (SSM 2010) // Feb 4, 2010 at 4:25 am

    […] After lunch, Jeremy Pickens (FXPAL) moderated a panel representing social media / networking companies: Hilary Mason (bit.ly), Igor Perisic (LinkedIn), and David Hendi (MySpace). Hilary noted that, while bit.ly does not have access to an explicit social graph, it captures implicit connections from user behavior that may not be represented in the graph. Jeremy asked the panelists how much a person’s extended network matters; David and Igor pointed out research indicating correlations of mood and even medical conditions between people and their third-degree connections. Again, the audience was full of questions, especially for Igor. As a fan of faceted search, I was glad to see him touting LinkedIn’s success in making faceted search the primary means of performing people search on the site. For an in-depth view, I recommend “LinkedIn Search: A Look Beneath the Hood“. […]

  • 7 Display Ads Are Back!; Microsoft Display Ads Takes Facebook Punch; Broadcast Execs See More Video Ads Online // Feb 8, 2010 at 1:07 am

    […] Linden points to a recent presentation by LinkedIn engineers (who are likely "LinkedIn") which shows the importance of delivering […]

  • 8 kafka0102的边城客栈 » Blog Archive » 一周技术文档分享 // Feb 11, 2010 at 5:14 am

    […] http://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/# […]

  • 9 Enlaces rápidos (16-02-2010) | Sentido Web // Feb 16, 2010 at 6:05 pm

    […] LinkedIn usa Lucene en su arquitectura […]

  • 10 New details on LinkedIn architecture | IP Address Visitor // Feb 28, 2010 at 6:46 pm

    […] Posted by ariefew March 1, 2010 Googler Daniel Tunkelang recently wrote a post, “LinkedIn Search: A Look Beneath the Hood“, that has slides from a talk by LinkedIn engineers along with some commentary on […]

  • 11 People You May Know — Now With Faceted Search! // May 15, 2010 at 4:50 pm

    […] discovered was an even nicer surprise: LinkedIn now connects the People You May Know feature to its faceted search interface. Indeed, they blogged about it earlier this week. Props to LinkedIn for continuing to […]

  • 12 Rita's Blog » LinkedIn Search: A Look Beneath the Hood // Apr 23, 2011 at 7:21 am

    […] LinkedIn Search: A Look Beneath the Hood. Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn introducing faceted search and celebrating the interface from a user perspective. John’s presentation at the SDForum took a developer’s perspective, discussing the challenges of combining faceted search and social networking at scale. […]

  • 13 Faceted Search by LinkedIn // Oct 16, 2012 at 10:32 am

    […] For those of you interested in the technology behind the new LinkedIn search I recommend “LinkedIn search a look beneath the hood”, by Daniel Tunkelang where he links to a presentation by John Wang search architect at LinkedIn. […]
