Categories
General

Reflecting on 2010: Searching for Answers

Yes, it’s that time of year when we take a moment to reflect on the past year’s accomplishments and muse about what the next year will bring. Other than milder weather!

I began this year as a Noogler and leave it as a Xoogler. I hope I left Google better than I found it — I’m certainly proud of the improvements my team made to the quality of local authority pages. I also tried to infuse Google with some of the scrappy start-up culture I’d picked up at Endeca, particularly focusing on the hiring process. In information retrieval terms, I’d say that Google’s hiring process does extremely well when it comes to precision, but could use improvement in the areas of recall and efficiency. Still, I’m impressed at how well Google has maintained its quality standards as the company has grown. Finally, I couldn’t help being an extrovert: I developed warm relationships with the lead bloggers covering local search, including Andrew Shotland, David Mihm, Gib Olander, Greg Sterling, and Mike Blumenthal. Indeed, when I announced my departure, Mike wrote a really nice post about the friendship we cultivated over the past year. I hope that he continues to have such relationships with my former co-workers.

Looking back at what was on my mind when this year began, I had lots of questions around exploratory, mobile, real-time, and social/collaborative search. I also wondered whether it was possible to offer more transparency in relevance ranking without losing ground in the battle against spam and black-hat SEO.

I’m as bullish as ever on the value of exploratory search: part of why I joined LinkedIn is that a significant fraction of the site’s value comes from supporting users’ exploratory search needs. I also published a position paper at the SIGIR 2010 Workshop on Simulation of Interaction proposing the use of query performance prediction to model the fidelity of communication between user and system, thus helping HCIR researchers to simulate query refinement with standard test collections. And of course exploratory search was a major theme at the HCIR 2010 workshop, not only providing the basis for the first HCIR Challenge, but even extending to new territory with Max Wilson and David Elsweiler’s work on casual leisure searching.

As for mobile search, I’d say that 2010 has been the year of “mobile first”. Thanks to a generous gift from my former employer, I’ve become a regular user of the mobile web, and of mobile search in particular. To my surprise, the communication bottleneck has not been screen real estate, but rather the difficulty of entering text. Innovative approaches like voice search and Swype go a long way toward mitigating that difficulty.

On to real-time search. Not surprisingly, my favorite innovation in this space is LinkedIn Signal, which offers exploratory search for Twitter. I still struggle to find use cases that emphasize the “real-time” aspect of Twitter and other microblogging services, but I am convinced that the path to utility lies in tools that support organization, analysis, and exploration.

On the social/collaborative front, I’m happy to work for a company whose charter includes “supporting mediated search by linking people to people, rather than directly to information”. While the biggest event in this space in 2010 was Facebook’s introduction of the Like button, I’m not convinced that “likes” have supplanted links. I’m still looking to niche players like Topsy and Blekko to push innovation in this space.

Speaking of Blekko, they’ve made an impressive attempt to increase the transparency of relevance ranking. But, as I blogged earlier this year, I think that, at least for the time being, Google is making the right decision to keep some of its details secret. Now that web search is essentially a duopoly (at least in the US), I believe the real test of the value of transparency to users will be whether one of the two parties employs it as a competitive differentiator.

What’s in store for 2011? LinkedIn CEO Jeff Weiner has a vision of using data science to provide a “Pandora for people”, and that’s a vision I’m eager to help realize. Not surprisingly, when I blogged in 2008 about where Google wasn’t good enough, two of the four areas I cited were finding jobs and finding employees. Even then I recognized that LinkedIn was the best at both. But LinkedIn can be so much more, and I am looking forward to working with an incredible team and incredible data on a delightful set of information science challenges.

Happy New Year! I hope that 2011 brings you great answers — and great questions!

The Secret May Be To Keep Fewer Secrets

In light of the recent WikiLeaks saga and the various leaks that have plagued my former employer, I was musing the other day about whether leaks are inevitable as an organization grows.

I started off by considering a model where each individual in an organization leaks a particular piece of sensitive information with a constant probability p, and where acts of leakage are independent and identically distributed events. Now let’s consider what value of p leads to a 99% probability of leakage in an organization of n = 20,000 people. It’s less than 1/4000. In other words, even if each person in an organization can keep a secret with 99.98% reliability, almost all secrets will be leaked.

Using this same value of p with n = 900 (roughly the size of my current employer) yields less than a 20% chance of leakage — certainly not a zero probability, but much closer to zero than to one. And at n = 90 — the upper end of what I’d consider a startup — the probability of leakage drops to 2%. Based on this crude analysis, the ability to keep secrets drops very rapidly as organizations enjoy the growth that comes with success.
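
The arithmetic behind these figures can be checked with a few lines of Python. This is only a minimal sketch of the crude independence model described above, where the chance of at least one leak among n people is 1 - (1 - p)^n. The function names are illustrative and are not taken from any library.

```python
def per_person_leak_probability(n: int, target: float) -> float:
    """Solve 1 - (1 - p)**n = target for p (hypothetical helper)."""
    return 1.0 - (1.0 - target) ** (1.0 / n)


def organization_leak_probability(n: int, p: float) -> float:
    """Chance that at least one of n independent people leaks the secret."""
    return 1.0 - (1.0 - p) ** n


# Value of p that gives a 99% chance of leakage among 20,000 people.
p = per_person_leak_probability(20_000, 0.99)
print(f"p = {p:.6f}, i.e. about 1 in {1 / p:.0f}")

# Reuse the same p for the smaller organizations discussed in the text.
for n in (20_000, 900, 90):
    chance = organization_leak_probability(n, p)
    print(f"n = {n:>6}: probability of leakage = {chance:.1%}")
```

Holding p fixed and varying only n is the simplifying assumption that drives the result: the exponent n does all the work, which is why the leakage probability falls so quickly for small organizations.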

Moreover, p is likely to be positively correlated with n; that is, individuals in larger organizations are more likely to leak sensitive information. Many people in larger organizations have less actual and perceived stake in the organization’s success than those in smaller ones. Also, it is difficult to sustain grueling hiring standards, particularly cultural ones, as an organization grows.

So what is an organization to do? If the above model is even close to accurate, then I can see four options:

1) Don’t grow.

Yes, I’m serious. Not every idea inspires a billion-dollar business, and not every company should grow beyond a hundred people. Growth has costs that offset its benefits, and the inability to keep secrets may be a significant cost for organizations whose competitive advantage depends on proprietary intellectual property. The largest hedge funds each have about 1,000 employees, and most are much smaller. Secrecy is not the only consideration, but it’s certainly a consideration.

2) Share less with your employees.

If you can’t reduce p, you can at least reduce n by sharing secrets less widely. Traditional organizations only share sensitive information within a tight inner circle. Even Google, known for sharing almost everything with its employees, keeps tighter control over the details of search result ranking. This approach, however, comes at a cost: it signals to employees that they cannot be trusted. Moreover, if employees discover secret information through rumor, they may feel less responsible for maintaining secrecy than if they had been entrusted with that information.

3) Investigate leaks and punish leakers.

Some organizations succeed better than others at rooting out leakers and punishing them. In economic terms, it makes sense to discourage undesirable behavior through strong disincentives. Note, however, that leakers rarely gain anything tangible in exchange for their leaks and indeed are often acting irrationally in strictly economic terms. People in general have been known to act irrationally. So I’d caution against any approach that assumes human rationality. A better approach may be to detect or prevent leaks through technology (e.g., packet analyzers), but see the previous comment about making employees feel they cannot be trusted.

4) Keep fewer secrets.

A prominent CEO recently said “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place”. Yes, I’m taking the quotation out of context, but I’d like to offer a variant: if your organization’s success depends on something that you don’t want anyone to know, maybe you should reconsider your business model. Less glibly, you should avoid unnecessary dependence on secrecy, and you should avoid labeling all corporate information as secret, since that desensitizes employees to the risks of disclosure.

Conclusion? As Ben Franklin said, “Three may keep a secret, if two of them are dead.” Organizations can and do manage to keep secrets. But it’s hard to fight human nature, and better not to rely on winning that fight.

CIKM 2011 Industry Event

CIKM 2011 is nearly a year away, but I wanted to give folks a heads up about the Industry Event there that I am organizing with Tony Russell-Rose. These events have become an increasingly important part of the annual CIKM and SIGIR conferences, and I believe they are helping to bridge the gap between scholarship and practice. When I organized the SIGIR 2009 Industry Event, it was almost too popular: I felt bad for the parallel research presentations that had to compete with Matt Cutts and danah boyd for attendees!

But not so bad that I wouldn’t do it again! We have an outstanding line-up of invited talks for the CIKM 2011 Industry Event, featuring:

For those not familiar with industry luminaries, that list includes one of the world’s most prominent information retrieval researchers, the founder of Metaweb (which created Freebase), the person who built Facebook’s data team (which developed Hive and Cassandra), and one of the leading industrial researchers on natural language processing. To borrow a sports metaphor, these were our first-round draft picks, and we are delighted that they all agreed to participate.

And those are just the keynotes! We’re also going to put out a call for participation soon, so watch this space!

First Week

It’s hardly surprising, at least in retrospect, that location-based social networking company Foursquare was founded (twice!) in New York City. Where else (at least in the United States) are there so many people with so many places to go and so many ways to get there? I’m not a social or environmental determinist, but clearly a startup needs hospitable conditions to thrive.

Having just started my new life as a citizen of Silicon Valley, I’ve quickly comprehended why it is the perfect birthplace for LinkedIn. Every introduction has been an exercise in triadic closure. Indeed, while most people know that the Bay Area is the world’s leading hub for technology startups, perhaps not everyone realizes that the foundation for this environment is the professional network that binds it. I’ve only been here for a week, and yet my world seems smaller by the day as I keep discovering new connections among my colleagues. It’s a lot of fun, if a bit overwhelming!

And fun but overwhelming is a great way to describe LinkedIn itself. It’s only been a few days since I updated my profile, but I already feel immersed in LinkedIn’s vibrant culture. I sit in an open office, surrounded by people I work with — data scientists, software engineers, product managers, designers, and more. And I’m already interviewing folks I might be working with soon — in a company growing as quickly as LinkedIn, it is everyone’s job to grow the team. I’ve joked to friends that moving west gave me three more hours to get work done — but I’m using them all and they’re not enough!

But despite this explosive growth, LinkedIn’s vision is shared and tight. We all know that our goal is to connect the world’s professionals to make them more productive and successful. Having such a clear-cut mission enables us to directly relate all of our efforts and ambitions to the concrete value they create. It’s a great feeling, and it helps me keep my sanity as I observe the size of my ever-increasing to-do list.

To say that I’m still adjusting is an understatement. I haven’t made a change like this in over a decade, and this adventure feels even more immersive. But a big difference between now and 1999 is that I arrive in my new world with a network of people there to welcome me. I have LinkedIn to thank for helping me develop that network, and it’s great to finally have the opportunity to give back.

Follow The Data

Today is my last day at Google. I have enjoyed an incredible year there, during which I’ve had the privilege to work with some of the smartest engineers on the planet. Working at Google taught me how much impact a handful of dedicated people can have on the lives of billions of users. Not that long ago, I compared Google to McDonald’s. Having spent time on the inside, I can attest that Google is a marvel of scale orchestration. Moreover, the Google New York office represents an impressive concentration of Google’s talent in the greatest city in the world.

But I am leaving Google to pursue the opportunity of a lifetime. On Monday, I will start a new chapter of my life. I am joining the data scientist team at LinkedIn, where I’ll be working with DJ Patil and his world-class team to build products and discover insights from a data collection that I have coveted for years. I’ll get to work with folks like Pete Skomoroch and Monica Rogati. And I’ll get to tackle challenges in my favorite areas of computer science: information extraction, matching, recommendation, social network analysis, and network visualization. Not to mention working with one of the largest faceted search deployments on the web!

It was an agonizing decision to leave Google and New York City. But, when LinkedIn reached out to me a couple of months ago, I was reminded of a fateful email from Steve Papa in July 1999 that led me to pack a bag two months later and begin the adventure that is now Endeca. LinkedIn is hardly a startup — it has over 600 employees and over 80 million members. But I see boundless opportunities to create new value from the great data and talent that LinkedIn has assembled. So, when I received that note from LinkedIn, I didn’t really have a choice.

This Monday, I begin a new adventure. Data, here I come!