Categories
General

Got Skills?

Last October, a certain blogger said:

LinkedIn needs to implement some kind of concept extraction to provide a useful topic facet (something I’d also love to see for their regular people search). This is a challenging information extraction problem, especially for the open web, but I also know from experience that it is tractable within a domain. Given LinkedIn’s professional focus, I believe this is a problem they can and should tackle.

Shortly after writing that post, I interviewed at LinkedIn and met Pete Skomoroch, who showed me an early preview of the work his team was doing to make skills a facet for exploring the space of LinkedIn member profiles. That demo made a strong impression on me, giving me a taste of the great products LinkedIn’s data scientists were working on in the lab.

And now I’m delighted that everyone can try out the beta launch of LinkedIn Skills, which was announced today at O’Reilly’s Strata 2011 conference on Big Data.

As Pete says in his blog post:

If you search for a particular skill, we’ll surface key people within that community, show you the top locations, related companies, relevant jobs, and groups where you can interact with like minded professionals.  You’ll also be able to explore similar skills and compare their growth relative to each other.

I encourage you to check it out — whether you’re looking for experts on Hadoop, cheese, or anything else! It’s a beta, so I’m sure you’ll find rough edges; but I hope it gives you a sense of how LinkedIn’s data can enable an incredibly powerful and useful exploratory search experience.

No forward-looking statements, except to say that it only gets better from here!

Be Vewy Vewy Quiet

While my blog has always been and will always be a personal one, I do operate under certain constraints as someone whose subject matter relates strongly to his professional interests. I deeply appreciate how long-time readers have respected the balancing act I sometimes have to make as both an independent individual and an employee.

Right now, that means I must respect the conditions of my employer’s quiet period — and I will do so very conservatively (e.g., no Playboy interviews). I apologize if the content of this blog suffers in the interim, but I hope you understand my need to be cautious.

Internship Opportunities at LinkedIn

Do you love big data? Do you enjoy applying your skills in data mining, machine learning, information retrieval and data visualization? Are you a hands-on implementer who can turn your ideas into reality, whether in Java or Python? Are you turned on by NoSQL technologies like Hadoop, Pig, and Voldemort?

And one last question…are you looking for an exciting internship opportunity this summer? Then you’ve come to the right place at the right time: LinkedIn is looking for a few good interns for summer 2011! You can find more details here or go directly to the application form.

If you are interested, I encourage you to act quickly, since we are already interviewing candidates.

Dare To Dream

“If a man hasn’t discovered something that he will die for, he isn’t fit to live.”

Martin Luther King, Jr. said these words in a speech in Detroit on June 23, 1963. Less than five years later, he died for the cause to which he devoted his life: the advancement of civil rights in the United States and around the world through civil disobedience and other nonviolent resistance.

Today, as Americans commemorate Dr. King’s birthday, there are many ways we can honor his memory and build on his legacy. As much as King advanced the civil rights movement, there is still much to be done to fulfill his dream.

But I’d like to go back to the quote from his speech in Detroit. King’s words reveal a truth even deeper than his struggle for civil rights. They demand that we approach life with passion, that we live to do something more than pass the time.

In the face of pressing day-to-day responsibilities, it is easy to fall into a reactive rhythm, doing what we have to do and then using what time is left to escape into oblivion. For many of us, passion may feel like a nice-to-have, something to think about after we’ve cleared out our queues and gotten a full night of sleep — only to wake up and find that the queue is full again. It is easy to go through life like Sisyphus, sweating profusely as we roll our boulders but lacking the intellectual ambition to question why we make those efforts.

Today, the least we can do to honor the memory of Martin Luther King, Jr. is to reflect on his personal passion to leave the world better than he found it. Hopefully none of us will ever have to make the sacrifice that he made to realize his dream. But if we do not dare to dream at all — if we are not passionate and ambitious about what we do — then, indeed, we are not fit to live.

Dare to dream — and live to make that dream a reality.

Quo Vadis, Quora?

I know, everyone is sick of hearing about Quora, the community question answering site that is the darling of the blogosphere, and perhaps you fled here from TechCrunch hoping for something different. If so, I apologize. And if you want to read something else, I encourage you to use either the random post widget I recently added to the right-hand sidebar or the exploration widget at the bottom of this post.

But I have personal reasons to be interested in Quora. One of their lead engineers, Albert Sheu, was a star intern of mine at Endeca. And Quora raises lots of interesting questions about search, user experience, knowledge management, and online reputation. How could I resist?

I see three potential reasons to use Quora:

  1. Objective question answering.
  2. Subjective question answering.
  3. Community participation.

Let’s consider how Quora fares today on each of these, and where it might go.

1. Objective question answering.

When I blogged about Quora early last year, I said that “I don’t see Quora as a knowledge base of first resort–except possibly to learn more about software startups.” Despite Quora’s recent growth surge, I am not ready to change my answer significantly — I find that Quora’s topics are pretty sparse when I stray from its Silicon Valley focus.

Within that focus, Quora is nailing it. For example, I was curious to learn whether someone who signed a non-compete agreement outside of California was still subject to it if he or she moved to California, where such contracts are legally unenforceable. Not surprisingly, non-compete agreements are a topic on Quora, and I quickly found a useful answer from a lawyer.

But for most objective questions, I’m still turning to Google and Wikipedia — or to Twitter if both of those fail and I am willing to ask a favor of my followers (who kick ass!). Sometimes Google will take me to Quora, but I can’t imagine Quora will succeed through this flow in the long term.

2. Subjective question answering.

I see subjective question answering as Quora’s strongest suit. A good subjective question on Quora — often a “why” question — generates a diverse collection of interesting and informed perspectives. A couple of good examples are “Why did Google Wave fail to get significant user adoption?” and “What is lacking in social networking now?”.

Again, these questions are well within the Silicon Valley focus, but I could see Quora extending this value proposition to other verticals if it can grow the communities successfully. And I certainly don’t see myself going to Google or even Twitter to get useful answers to subjective questions. The closest is Topsy, and Quora has the advantage of being explicitly organized around questions and topics.

3. Community participation.

Is Quora a question answering site or a social network? Quora users and employees have tried to answer that question (on Quora, natch), but I’m not sure Quora’s converged enough for anyone to know. What is clear is that Quora emphasizes conversation, making it more like a blog or wiki than an answers site.

Conversation certainly engages its participants. But it also raises the cost of participation. One of the things I love about Google is that it gives me information without unnecessary overhead. When I want conversation, I go to social venues like Twitter.

Perhaps Quora can be both a question answering site and a social network. But I suspect it will need to choose. Most people don’t have the time or patience to participate in additional communities, so question answering is the easier sell to a mass audience. But the participation is what makes Quora especially distinctive today. Perhaps it’s a question of quality vs. quantity.

So, quo vadis, Quora? I suppose I’ll have to check Quora (or Cwora) to find the answers.

No More Quora Invites

Over the past few days, I have been inundated with requests for Quora invites. I realize that I brought this upon myself by making my blog the top hit on Google for [quora invite] — though it seems I’m at least down to the #2 slot now. In any case, I have sent out over a hundred invites and need to stop fulfilling requests so that I can focus on my day job!

I hope everyone I’ve invited is enjoying Quora. But I also hope you take it upon yourselves to circulate more invitations to those who want them. Any Quora user can send out invites — that’s how these viral sites work. If you’re still looking for an invite, I urge you to use Twitter or some other broadcast mechanism to request it. As of today, I will stop responding to Quora invite requests through my blog or email, and I will also delete comments requesting them. I am sorry if this is a bit harsh, but I hope folks understand.

Enabling Exploratory Search with Dhiti

Last August, I wrote “Exploring Nuggetize”, in which I described an interface that Dhiti co-founder Bharath Mohan developed to surface “nuggets” from a site and reduce the user’s cost of exploring a document collection. As an experiment, I’m now including a Dhiti widget here on The Noisy Channel. If you look at any single post in a browser, you’ll see a widget at the end of the post (before the comments) that attempts to use the post as a starting point for further exploration.

Please use the comments here to provide feedback. Bharath is eager to improve his product, and I’m eager to improve the experience for all of you!

Two Changes

I just wanted to let regular readers know that I’ve made two changes to this blog.

The first is that I’ve eliminated the use of categories on posts. I found that I was categorizing almost all posts as “general”, and that there was almost no value in maintaining such a low-entropy field. Yes, I’m well aware of the irony that, despite being an advocate of faceted search, I’m not providing any meta-data to annotate my posts. But I take what I believe to be a user-driven approach, and I can see from my logs that readers primarily visit to read my most recent posts. For those who like to browse, I encourage you to take advantage of the related posts feature, which is powered by the Yet Another Related Posts Plugin.
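To make the “low-entropy field” remark concrete, here is a minimal sketch that computes the Shannon entropy of a category field. The label counts are invented purely for illustration (they are not taken from this blog’s actual data), and the function name is my own.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of a list of category labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical label distributions, invented for illustration only.
skewed = ["general"] * 98 + ["search"] * 2      # almost everything is "general"
balanced = ["general"] * 50 + ["search"] * 50   # labels split evenly

print("skewed:  ", round(shannon_entropy(skewed), 3), "bits")
print("balanced:", round(shannon_entropy(balanced), 3), "bits")
```

An evenly split two-label field carries one full bit per post, while the skewed field carries only a small fraction of a bit. That is the sense in which a field that is nearly always “general” tells a reader almost nothing.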

The second is that I’ve retired The Noisy Community. Two years ago, I created this directory of regular readers and commenters in order to foster a sense of community. I believe it was a success, but that it has outlived its usefulness. Again based on my logs, I can see that it has been neglected for a while. So, rather than continuing to invest in maintaining it manually, I have given it an honorable discharge.

My apologies to any readers whom I’ve offended with these changes. As always, I encourage you to make suggestions — especially if you’re willing to help implement them!

So You Like Big Data…

The increasing volume of data that we generate as a species is a story so overplayed as to have become trite. Indeed, a vast amount of this data is in the public domain, including data from the full text and common ngrams of books, genome research, the United States census, and much more. There is also open-source software not only to crawl the web, but also to search the data you crawl. So, if you’re an aspiring data scientist and just want to get your hands on data, there’s no excuse–go out and get it!

But perhaps you’d like to make a career out of your jones for big data. Luckily for you, some of the hottest companies around are hiring data scientists!

Of course, those jobs aren’t for everyone. To get an idea of the necessary qualifications, including the requisite math and computer science skills, I suggest you read the answers on Quora for “How do I become a data scientist?” I’m also a fan of Hilary Mason’s definition, which was cited in Ryan Kim’s “Wanted: Data Scientists to Turn Information Into Gold”: a data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. You can see Hilary’s full explanation in a blog post she co-authored with Chris Wiggins, entitled “A Taxonomy of Data Science”.

If the qualifications haven’t scared you off, then it’s just a question of where you can best apply your data scientist skills. The good news is that there are a lot of different ways to make a career out of working with big data. Here are some suggestions for what to work on. I apologize in advance for taking a US-centric perspective — if you’re outside the US, I can only hope that the examples have local analogs.

1) Web search.

Google, Yahoo, and Bing all collect an enormous amount of data from people’s web search activity. Google is, of course, the 800-pound gorilla, but don’t dismiss the others — even a single-digit market share is enough to derive extremely valuable insights from user activity. And, since every major search engine makes the bulk of its revenue from advertising, they all present the big-data challenges associated with computational advertising. Search is, in my view, the web’s killer app, so you can’t go wrong working on it. But temper your expectations — despite heroic efforts from various parties, it seems difficult to deliver revolutionary improvements to this field.

2) Social networking.

Here the biggest players are Facebook and Twitter, but you can find a more comprehensive list on Wikipedia. Many consider LinkedIn to be a social network, but I’ll take the liberty of discussing it in its own section. Social networks attract an outsized share of users’ attention: Facebook alone accounts for a quarter of US page views on the web! All of this user activity means a lot of data to crunch, so it’s not surprising that LinkedIn, Facebook, and Twitter are recognized as having the best data science teams. How much you’ll enjoy working at these companies will in part reflect the value (and values) you perceive in their offerings, but they are all playgrounds for data scientists.

3) Electronic commerce.

While ad-supported web search may be the killer app of the web, what opens up people’s wallets is e-commerce. Led by Amazon and eBay, e-commerce sites deserve much of the credit for turning the web from an esoteric research project into a mainstream staple. And, like their offline counterparts, e-commerce sites generate vast amounts of data from how users view and purchase products. This data drives user recommendations, merchandising campaigns, pricing strategy, and much more. If you’d like to pursue data-driven capitalism, then e-commerce may be for you. A word of caution: if you are one of a crowd of merchants selling the same products as everyone else (as opposed to a site like Etsy selling unique products), make sure you have a sustainable competitive advantage. Data science is necessary for success in e-commerce, but it may not be sufficient.

4) Digital content.

Whether it’s books, music, video, or apps, the long-prophesied digital convergence has arrived: almost every newly created piece of digital content is now distributed in electronic form. Here the biggest players are Amazon, Apple, and Google (particularly its YouTube subsidiary), but there is still a lot of flux as new hardware, software, and business models compete for dominance. Digital content poses two daunting challenges: the volume of published content far exceeds people’s available attention, and digital media products are experience goods that people can only evaluate after consuming them. For both of these reasons, the digital content industry depends on data scientists to help people find and discover what they like. The catch: from its advent, the digital content industry has struggled with unauthorized distribution (aka piracy), and the results of this struggle will determine which business models are viable.

5) Finance.

Money, money, money. Working in finance has always been a data-intensive business, but advances in technology have only increased the industry’s reliance on data scientists. Algorithmic trading — and high-frequency trading in particular — means that those who can most effectively and efficiently mine financial data can derive enormous financial benefits. Finance isn’t for everyone — the hours are long, the stress is high, and the compensation is highly variable. That said, the financial upside can be quite compelling, and some even enjoy the lifestyle.

6) Public sector.

Given the libertarian leanings of the software industry, the public sector might not seem like an obvious career choice. But some of the largest repositories of data reside there–from public repositories like census data to highly classified repositories restricted to the TLAs. Better understanding of this data can improve public policy, national security, and much more. Not everyone has the temperament to deal with government bureaucracy, but those who do have the opportunity to turn big data into big public good.

7) LinkedIn.

OK, I’m being self-serving, but after all this is my blog! LinkedIn is widely recognized as having one of the top data science teams on the planet. But LinkedIn has more than just talent — it has what Pete Warden of ReadWriteWeb described in “Secrets of the LinkedIn Data Scientists” as “detailed information on millions of people who are motivated to keep their profiles up-to-date, collect a rich network of connections and have a strong desire from their users for more tools to help them in their professional lives.” Indeed, I don’t know of anyone who has a dataset that competes with the combined quantity, quality, and utility of LinkedIn’s data. Moreover, working as a data scientist at LinkedIn means helping make people more professionally successful by connecting them to opportunities, information, and of course other people. It’s a wonderful way to create value, and it doesn’t hurt to do so in the context of a profitable, rapidly growing company.

And LinkedIn recognizes the extraordinary value of data science. Don’t take my word for it — listen to LinkedIn CEO Jeff Weiner’s interview at the 2010 Web 2.0 Summit.

To wrap up, data science is more than just an opportunity to have fun and make the world a better place — it might even be how you make an honest living!

Reflecting on 2010: Searching for Answers

Yes, it’s that time of year when we take a moment to reflect on the past year’s accomplishments and muse about what the next year will bring. Other than milder weather!

I began this year as a Noogler and leave it as a Xoogler. I hope I left Google better than I found it — I’m certainly proud of the improvements my team made to the quality of local authority pages. I also tried to infuse Google with some of the scrappy start-up culture I’d picked up at Endeca, particularly focusing on the hiring process. In information retrieval terms, I’d say that Google’s hiring process does extremely well when it comes to precision, but could use improvement in the areas of recall and efficiency. Still, I’m impressed at how well Google has maintained its quality standards as the company has grown. Finally, I couldn’t help being an extrovert: I developed warm relationships with the lead bloggers covering local search, including Andrew Shotland, David Mihm, Gib Olander, Greg Sterling, and Mike Blumenthal. Indeed, when I announced my departure, Mike wrote a really nice post about the friendship we cultivated over the past year. I hope that he continues to have such relationships with my former co-workers.
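For readers less familiar with the information retrieval terms used above, here is a minimal sketch of precision and recall applied to a hiring process. The candidate names and counts are entirely hypothetical and say nothing about any real company’s numbers. The point is only to show how a process can score perfectly on precision while scoring poorly on recall.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) for a selected set against a relevant set."""
    selected, relevant = set(selected), set(relevant)
    hits = selected & relevant
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical applicant pool, invented for illustration only:
# eight strong candidates apply, and the process hires two of them.
strong_candidates = {"ana", "ben", "chen", "dev", "eli", "fay", "gus", "hua"}
hired = {"ana", "ben"}

precision, recall = precision_recall(hired, strong_candidates)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy setup every hire is a strong candidate, so precision is 1.00, but only two of the eight strong candidates were hired, so recall is 0.25.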

Looking back at what was on my mind when this year began, I had lots of questions around exploratory, mobile, real-time, and social/collaborative search. I also wondered whether it was possible to offer more transparency in relevance ranking without losing ground in the battle against spam and black-hat SEO.

I’m as bullish as ever on the value of exploratory search: part of why I joined LinkedIn is that a significant fraction of the site’s value comes from supporting users’ exploratory search needs. I also published a position paper at the SIGIR 2010 Workshop on Simulation of Interaction proposing the use of query performance prediction to model the fidelity of communication between user and system, thus helping HCIR researchers to simulate query refinement with standard test collections. And of course exploratory search was a major theme at the HCIR 2010 workshop, not only providing the basis for the first HCIR Challenge, but even extending to new territory with Max Wilson and David Elsweiler’s work on casual leisure searching.

As for mobile search, I’d say that 2010 has been the year of “mobile first”. Thanks to a generous gift from my former employer, I’ve become a regular user of the mobile web–and of search in particular. To my surprise, the communication bottleneck has not been screen real estate, but rather the difficulty of entering text. And innovative approaches like voice search and Swype go a long way to mitigate that difficulty.

On to real-time search. Not surprisingly, my favorite innovation in this space is LinkedIn Signal, which offers exploratory search for Twitter. I still struggle to find use cases that emphasize the “real-time” aspect of Twitter and other microblogging services, but I am convinced that the path to utility lies in tools that support organization, analysis, and exploration.

On the social/collaborative front, I’m happy to work for a company whose charter includes “supporting mediated search by linking people to people, rather than directly to information”. While the biggest event in this space in 2010 was Facebook’s introduction of the Like button, I’m not convinced that “likes” have supplanted links. I’m still looking to niche players like Topsy and Blekko to push innovation in this space.

Speaking of Blekko, they’ve made an impressive attempt to increase the transparency of relevance ranking. But, as I blogged earlier this year, I think that, at least for the time being, Google is making the right decision to keep some of its details secret. Now that web search is essentially a duopoly (at least in the US), I believe the real test of the value of transparency to users will be whether one of the two parties employs it as a competitive differentiator.

What’s in store for 2011? LinkedIn CEO Jeff Weiner has a vision of using data science to provide a “Pandora for people”, and that’s a vision I’m eager to help realize. Not surprisingly, when I blogged in 2008 about where Google wasn’t good enough, two of the four areas I cited were finding jobs and finding employees. Even then I recognized that LinkedIn was the best at both. But LinkedIn can be so much more, and I am looking forward to working with an incredible team and incredible data on a delightful set of information science challenges.

Happy New Year! I hope that 2011 brings you great answers — and great questions!