LinkedIn Search: A Look Beneath the Hood

By Daniel Tunkelang
January 31, 2010
13 Comments on LinkedIn Search: A Look Beneath the Hood

Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn's introduction of faceted search, which celebrated the interface from a user's perspective. John's presentation at SDForum took a developer's perspective, discussing the challenges of combining faceted search and social networking at scale.

John was kind enough to publish his slides, and I’ve embedded them above. Unfortunately, there’s no recording of the extensive Q&A (which included various attempts to get John to reveal the precise details of LinkedIn’s data volume), but the slides are quite meaty.

Personally, I learned two surprising things from the talk.

First, I was surprised that LinkedIn dismisses index/cache warming as “cheating”, instead computing almost everything in real time. Specifically, I would have expected LinkedIn to cache information like a user’s set of degree-two connections: these are expensive to compute at query time, especially when the social graph is distributed and sharded by user. I did ask John whether LinkedIn recomputes a user’s degree-two network during a session, and he admitted that LinkedIn is sensible enough to “cheat” and not perform this expensive but almost useless re-computation.
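To see why the degree-two set is expensive, and why recomputing it mid-session would be wasteful, here is a minimal Python sketch. It assumes the whole social graph fits in one in-memory dict mapping each member ID to the set of their direct connections. `degree_two_network` and `SessionCache` are hypothetical names for illustration, not LinkedIn's code; a real system would have to gather these sets across machines.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical sketch: one in-memory graph, member ID -> direct connections.
Graph = Dict[int, Set[int]]


def degree_two_network(graph: Graph, user: int) -> Set[int]:
    """Return members exactly two hops from `user`: connections of connections
    who are neither the user nor a direct connection."""
    direct = graph.get(user, set())
    result: Set[int] = set()
    for friend in direct:
        # Each first-degree connection contributes all of its own connections.
        result |= graph.get(friend, set())
    # Drop the first-degree connections and the user themselves.
    result -= direct
    result.discard(user)
    return result


class SessionCache:
    """Hypothetical per-session memo: compute the degree-two set once per
    session instead of once per query. Assumes one user per session ID."""

    def __init__(self, graph: Graph) -> None:
        self.graph = graph
        self._cache: Dict[str, FrozenSet[int]] = {}
        self.computations = 0  # counts expensive recomputations, for illustration

    def degree_two(self, session_id: str, user: int) -> FrozenSet[int]:
        if session_id not in self._cache:
            self.computations += 1
            # Store an immutable copy so callers cannot corrupt the cache.
            self._cache[session_id] = frozenset(degree_two_network(self.graph, user))
        return self._cache[session_id]

    def end_session(self, session_id: str) -> None:
        # Forget the cached set; the next session recomputes from scratch.
        self._cache.pop(session_id, None)
```

The cost of `degree_two_network` grows with the total connection counts of the user's first-degree connections, so a well-connected member can touch a large slice of the graph. The cache turns that into a one-time cost per session, at the price of missing connections made mid-session.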

Second, I learned about reference search, a feature I may have missed because it is only available for premium LinkedIn accounts. It’s a nice feature, allowing you to search against company + date range pairs. People who are familiar with implementing faceted search may recognize the preservation of such associations between facet values as a gnarly implementation challenge.
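The association problem is easiest to see in a small example. If a profile's companies and date ranges are treated as independent facet values, a query for company A during some period can match someone who worked at A at a different time and was merely employed elsewhere during that period. Checking the company and the date range against the same position avoids the false match. The `Position` record and both matching functions below are hypothetical illustrations of the problem, not how LinkedIn implements reference search.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Position:
    company: str
    start_year: int
    end_year: int  # inclusive


def overlaps(p: Position, from_year: int, to_year: int) -> bool:
    """True if the position's years intersect the query range (inclusive)."""
    return p.start_year <= to_year and from_year <= p.end_year


def matches_independent(positions: List[Position], company: str,
                        from_year: int, to_year: int) -> bool:
    """Naive approach: company and dates are separate facets checked independently,
    so the two conditions may be satisfied by different positions."""
    has_company = any(p.company == company for p in positions)
    has_dates = any(overlaps(p, from_year, to_year) for p in positions)
    return has_company and has_dates


def matches_paired(positions: List[Position], company: str,
                   from_year: int, to_year: int) -> bool:
    """Pair-preserving approach: both conditions must hold for the same position."""
    return any(p.company == company and overlaps(p, from_year, to_year)
               for p in positions)
```

For a member with `[Position("Acme", 2000, 2003), Position("Initech", 2005, 2008)]`, a search for Acme during 2005 to 2006 matches under `matches_independent` (the Initech stint supplies the dates) but not under `matches_paired`. In an inverted index, one option is to index each pair as a single composite value, at the cost of a larger facet vocabulary and harder range queries.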

All in all, it was a treat to get this look under the hood, as well as to finally meet John in person. I also ran into Gene Golovchinsky there. So much for my spending a few days on the West Coast incognito!

In any case, I’m looking forward to seeing Gene, some of John’s colleagues, and many more interesting people at the Search and Social Media Workshop (SSM 2010) on Wednesday. My apologies to those who aren’t able to attend this oversubscribed event. I promise to blog about it!

By Daniel Tunkelang

High-Class Consultant.

13 replies on “LinkedIn Search: A Look Beneath the Hood”

You might have gotten away without being detected but for all the questions you asked! So much for attempts at stealth!

Great slides, thanks for posting them.

This idea of minimizing the caching layer and instead allocating those resources to keep as much of the database in memory as possible is seriously underdone. I know memcached is the trendy thing these days, but some of these memcached layers out there are as big as the database layers and add all the complexity and inefficiency of dealing with cache consistency. Often it would be better to keep as much of the database as possible in memory and maximize the performance of the database (with careful indexing, partitioning, replication, and materialized views).

Seems obvious that the solution to a slow database should be fixing the slow database, but the much more common solution is to put a lot of stuff in front of that slow database that tries to hide the shame.
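The cache-consistency cost mentioned in this comment shows up even in a toy cache-aside setup: if any write path updates the database but forgets to invalidate the cache, readers keep seeing the old value. In this minimal Python sketch, two dicts stand in for a database and a memcached-style layer; `CacheAsideStore` and its methods are hypothetical.

```python
from typing import Dict, Optional


class CacheAsideStore:
    """Toy cache-aside setup: `db` stands in for the database and `cache`
    for a memcached-style layer in front of it (hypothetical names)."""

    def __init__(self) -> None:
        self.db: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        # Try the cache first; on a miss, read the database and populate the cache.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write(self, key: str, value: str, invalidate: bool = True) -> None:
        # Update the source of truth. Invalidation is a separate step that
        # every write path has to remember.
        self.db[key] = value
        if invalidate:
            self.cache.pop(key, None)
```

Writing `"Engineer"`, reading it, then writing `"Architect"` with `invalidate=False` leaves `read` returning the stale `"Engineer"`. Even with invalidation on every write, a concurrent reader that misses before a write and populates the cache after it can reinstall the old value, which is the kind of complexity that keeping the database itself in memory avoids.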

If LinkedIn weren't using degree of connection as a facet, then that would be a no-brainer. While John was a bit coy about the data volume, it's pretty clear that the non-text data can fit in memory, given the amount of hardware they're using. I'm surprised that they chose to handle the degree-two filter without caching, since that means doing a lot of work per query on a lot of queries. I could have asked John for aggregate network statistics for the LinkedIn user base, but I don't imagine they publish those numbers.
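Using degree of connection as a facet means every result set needs a count per degree bucket, which requires the searcher's first- and second-degree sets at query time. A minimal Python sketch, assuming those sets are already in memory; the bucket labels and the function names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

BUCKETS = ("1st", "2nd", "other")  # hypothetical facet labels


def degree_of(member: int, first: Set[int], second: Set[int]) -> str:
    """Classify a search result relative to the searcher's network."""
    if member in first:
        return "1st"
    if member in second:
        return "2nd"
    return "other"  # third degree and beyond, or out of network


def degree_facet_counts(results: Iterable[int], first: Set[int],
                        second: Set[int]) -> Dict[str, int]:
    """Count results per degree bucket, as a facet sidebar would display them."""
    counts = Counter(degree_of(m, first, second) for m in results)
    # Report every bucket in a fixed order, including empty ones.
    return {bucket: counts.get(bucket, 0) for bucket in BUCKETS}


def filter_by_degree(results: Iterable[int], bucket: str, first: Set[int],
                     second: Set[int]) -> List[int]:
    """Apply a degree facet as a filter, preserving result order."""
    return [m for m in results if degree_of(m, first, second) == bucket]
```

Each lookup is a cheap set-membership test, so the per-query cost is dominated by producing `second` in the first place, which is why caching it is tempting.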

[…] LinkedIn Search: A Look Beneath the Hood, thenoisychannel.com […]

Ever heard of Verity's Parametric Search? Sounds very familiar.

[…] After lunch, Jeremy Pickens (FXPAL) moderated a panel representing social media / networking companies: Hilary Mason (bit.ly), Igor Perisic (LinkedIn), and David Hendi (MySpace). Hilary noted that, while bit.ly does not have access to an explicit social graph, it captures implicit connections from user behavior that may not be represented in the graph. Jeremy asked the panelists how much a person's extended network matters; David and Igor pointed out research indicating correlations of mood and even medical conditions between people and their third-degree connections. Again, the audience was full of questions, especially for Igor. As a fan of faceted search, I was glad to see him touting LinkedIn's success in making faceted search the primary means of performing people search on the site. For an in-depth view, I recommend “LinkedIn Search: A Look Beneath the Hood”. […]

[…] Linden points to a recent presentation by LinkedIn engineers (who are likely "LinkedIn") which shows the importance of delivering […]

[…] https://thenoisychannel.com/2010/01/31/linkedin-search-a-look-beneath-the-hood/# […]

[…] LinkedIn uses Lucene in its architecture […]

[…] Posted by ariefew, March 1, 2010. Googler Daniel Tunkelang recently wrote a post, “LinkedIn Search: A Look Beneath the Hood”, that has slides from a talk by LinkedIn engineers along with some commentary on […]

[…] discovered was an even nicer surprise: LinkedIn now connects the People You May Know feature to its faceted search interface. Indeed, they blogged about it earlier this week. Props to LinkedIn for continuing to […]

[…] LinkedIn Search: A Look Beneath the Hood […]

[…] For those of you interested in the technology behind the new LinkedIn search, I recommend “LinkedIn Search: A Look Beneath the Hood” by Daniel Tunkelang, where he links to a presentation by John Wang, search architect at LinkedIn. […]
