LinkedIn Search: A Look Beneath the Hood

Last week, I had the good fortune to attend a presentation by John Wang, search architect at LinkedIn. You may have read my earlier posts about LinkedIn introducing faceted search and celebrating the interface from a user perspective. John’s presentation at the SDForum took a developer’s perspective, discussing the challenges of combining faceted search and social networking at scale.

John was kind enough to publish his slides, and I’ve embedded them above. Unfortunately, there’s no recording of the extensive Q&A (which included various attempts to get John to reveal the precise details of LinkedIn’s data volume), but the slides are quite meaty.

Personally, I learned two surprising things from the talk.

First, I was surprised that LinkedIn dismisses index/cache warming as “cheating”, instead computing almost everything in real time. Specifically, I would have expected LinkedIn to cache information like a user’s set of degree-two connections: these are expensive to compute at query time, especially when the social graph is distributed and sharded by user. I did ask John whether LinkedIn recomputes a user’s degree-two network during a session, and he admitted that LinkedIn is sensible enough to “cheat” and not perform this expensive but almost useless re-computation.
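To make the cost concrete, here is a minimal Python sketch of a degree-two lookup over a graph sharded by user, with the result memoized for the length of a session. This is not LinkedIn's implementation: `ShardedGraph`, `Session`, the integer user ids, and the modulo sharding rule are all assumptions made up for illustration. The `lookups` counter stands in for the cross-shard reads that make the computation expensive.

```python
from collections import defaultdict


class ShardedGraph:
    """Toy social graph, partitioned by user id (hypothetical layout)."""

    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.shards = [defaultdict(set) for _ in range(num_shards)]
        self.lookups = 0  # number of adjacency-list reads performed

    def _shard(self, user):
        # Assumed sharding rule: user id modulo the shard count.
        return self.shards[user % self.num_shards]

    def connect(self, a, b):
        # Connections are symmetric, so store the edge on both users.
        self._shard(a)[a].add(b)
        self._shard(b)[b].add(a)

    def neighbors(self, user):
        self.lookups += 1
        return self._shard(user).get(user, set())


def degree_two(graph, user):
    """Friends of friends, excluding the user and direct connections."""
    first = graph.neighbors(user)
    second = set()
    for friend in first:
        # One read per direct connection, possibly on a different shard.
        second |= graph.neighbors(friend)
    return second - first - {user}


class Session:
    """Computes the degree-two set once and reuses it within the session."""

    def __init__(self, graph, user):
        self.graph = graph
        self.user = user
        self._degree_two = None

    def degree_two(self):
        if self._degree_two is None:
            self._degree_two = degree_two(self.graph, self.user)
        return self._degree_two


graph = ShardedGraph(num_shards=2)
for a, b in [(1, 2), (1, 3), (2, 4), (3, 5), (4, 6)]:
    graph.connect(a, b)

session = Session(graph, user=1)
first_result = session.degree_two()
reads_after_first_call = graph.lookups
second_result = session.degree_two()  # served from the session, no new reads
```

The first call costs one read for the user plus one per direct connection; later calls in the same session cost nothing, at the price of missing connections made mid-session.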

Second, I learned about reference search, a feature I may have missed because it is only available for premium LinkedIn accounts. It’s a nice feature, allowing you to search against company + date range pairs. People who are familiar with implementing faceted search may recognize the preservation of such associations between facet values as a gnarly implementation challenge.
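A small Python sketch shows why keeping those associations matters. It is only an illustration under assumed names: the `Position` record, the two matching functions, and the companies "Acme" and "Globex" are hypothetical, and nothing here reflects how LinkedIn actually indexes profiles. If company and date range are treated as independent facets, a profile can match on a company from one job and a date range from another.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    company: str
    start_year: int
    end_year: int


def overlaps(position, start, end):
    """True if the position's years intersect the inclusive range [start, end]."""
    return position.start_year <= end and start <= position.end_year


def matches_independent(positions, company, start, end):
    """Naive facets: company and date range are checked separately."""
    has_company = any(p.company == company for p in positions)
    has_dates = any(overlaps(p, start, end) for p in positions)
    return has_company and has_dates


def matches_paired(positions, company, start, end):
    """Paired facets: the same position must satisfy both conditions."""
    return any(p.company == company and overlaps(p, start, end) for p in positions)


profile = [
    Position("Acme", 2001, 2003),
    Position("Globex", 2006, 2009),
]

# Looking for someone who was at Acme during 2007-2008.
naive_hit = matches_independent(profile, "Acme", 2007, 2008)
paired_hit = matches_paired(profile, "Acme", 2007, 2008)
```

Here the naive check reports a match because the person worked at Acme at some point and worked somewhere in 2007-2008, while the paired check correctly rejects the profile. Preserving that pairing is what makes the feature harder to build on top of a conventional faceted index.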

All in all, it was a treat to get this look under the hood, as well as to finally meet John in person. I also ran into Gene Golovchinsky there. So much for my spending a few days on the West Coast incognito!

In any case, I’m looking forward to seeing Gene, some of John’s colleagues, and many more interesting people at the Search and Social Media Workshop (SSM 2010) on Wednesday. My apologies to those who aren’t able to attend this oversubscribed event. I promise to blog about it!

By Daniel Tunkelang

High-Class Consultant.

13 replies on “LinkedIn Search: A Look Beneath the Hood”

Great slides, thanks for posting them.

This idea of minimizing the caching layer and instead allocating those resources to keep as much of the database in memory as possible is seriously underdone. I know memcached is the trendy thing these days, but some of these memcached layers out there are as big as the database layers and add all the complexity and inefficiency of dealing with cache consistency. Often it would be better to keep as much of the database as possible in memory and maximize the performance of the database (with careful indexing, partitioning, replication, and materialized views).

Seems obvious that the solution to a slow database should be fixing the slow database, but the much more common solution is to put a lot of stuff in front of that slow database that tries to hide the shame.

If LinkedIn weren’t using degree of connection as a facet, then that would be a no-brainer. While John was a bit coy about the data volume, it’s pretty clear that the non-text data can fit in memory, given the amount of hardware they’re using. I’m surprised at how they chose to handle the degree-two filter without caching, since that seems like an easy way to spend a lot of work per query on a lot of queries. I could have asked John for aggregate network statistics for the LinkedIn user base, but I don’t imagine they publish those numbers.

[…] After lunch, Jeremy Pickens (FXPAL) moderated a panel representing social media / networking companies: Hilary Mason (bit.ly), Igor Perisic (LinkedIn), and David Hendi (MySpace). Hilary noted that, while bit.ly does not have access to an explicit social graph, it captures implicit connections from user behavior that may not be represented in the graph. Jeremy asked the panelists how much a person’s extended network matters; David and Igor pointed out research indicating correlations of mood and even medical conditions between people and their third-degree connections. Again, the audience was full of questions, especially for Igor. As a fan of faceted search, I was glad to see him touting LinkedIn’s success in making faceted search the primary means of performing people search on the site. For an in-depth view, I recommend “LinkedIn Search: A Look Beneath the Hood”. […]

[…] For those of you interested in the technology behind the new LinkedIn search, I recommend “LinkedIn Search: A Look Beneath the Hood” by Daniel Tunkelang, where he links to a presentation by John Wang, search architect at LinkedIn. […]
