Categories
General

Finding, Locating, Discovering

Thanks to Tony Hollingsworth for alerting me to a post by Alex Campbell entitled “Stark realisation: I no longer depend on Google to find stuff”. The title is provocative link bait, but the take-away is very down to earth: Google is more useful for locating information than for discovering it.

Library scientists make a distinction between known-item and exploratory search. The former is about locating information: as an information seeker, you know the information exists, and you can even characterize it unambiguously; but the challenge is to convert that description into a location that allows you to retrieve the information. The latter is about discovery: you don’t know that the information you seek exists, and you may not be sure of how to characterize what you are looking for–or even know what exactly you want until you’ve learned something about what is available.

These are extreme points on the information seeking spectrum, and most real-world tasks are in the middle, or combine subtasks of both types. For example, in physical libraries (yes, I’m that old!), I remember finding a book in the stacks and then browsing the nearby books in the hopes of serendipitous discovery. These days, I’d be more likely to scan its bibliography–or to look at the books and articles citing it. A known item can be an excellent entry point for exploration. Conversely, exploration can lead you to discover the existence of information that you then simply need to retrieve.

In common use, words like searching and finding cover this entire spectrum of information seeking activity. This breadth of meaning causes a lot of confusion. I’ve blogged about this before in “What is (Not) Search?”:

At the very least, I propose that we distinguish “search” as a problem from “search” as a solution. By the former, I mean the problem of information seeking, which is traditionally the domain of library and information scientists. By the latter, I mean the approach most commonly associated with information retrieval, in which a user enters a query into the system (typically as free text) and the system returns a set of objects that match the query, perhaps with different degrees of relevancy.

Back to Campbell’s article. His main points:

  • Social networks have dramatically expanded our network of contacts.
  • Search engine optimization (SEO) experts have killed their own game.
  • The flow of information has changed: information now comes to us, rather than us having to go out and find it.

I like the spirit of the post, but I think he overstates his case. SEO isn’t all bad–in fact, it’s probably a key factor in Google’s effectiveness. And, while social networks enable social search in theory, and information does come to us, we are experiencing filter failure (Clay Shirky’s term) in a big way.

My conclusion: I agree with him about Google’s limitations–Google is primarily a locating tool, not a discovery tool. Unfortunately, I’m not persuaded that social networks and our theoretical ability to construct an ideal in-flow of information have actually delivered on the promise of more efficient information access. But I’m optimistic that we’ll eventually get there.

Categories
Uncategorized

Blogs I Read: Chris Dixon (cdixon.org)

I’ve started reading a few different blogs in the past months, and one that I particularly like is Chris Dixon’s, which has the simple (if uncreative) title cdixon.org.

Chris has an interesting history that includes heading R&D at a hedge fund, co-founding SiteAdvisor, investing in a number of technology companies (including Skype and Postini), and most recently co-founding Hunch (which I’ve blogged about here a few times). As a karaoke junkie, I can’t help noting that he developed the software that became MySpace Karaoke.

Not surprisingly, Chris brings the combined perspective of an investor and a technologist to his blog. Here are some examples of recent posts that illustrate his range.

Thoughts on machine learning:

Career advice for entrepreneurs:

And of course he occasionally blogs about Hunch, his current venture.

Chris has a strong personality that comes through as a blogger. I think that’s critical for making a blog both informative and entertaining, and I try to channel my own personality (which I’m told, for better or worse, is quite distinctive) through this blog.

In short, check out cdixon.org if you’re interested in the perspective of a practical (and successful) technologist-entrepreneur.

Categories
General

Free as in Freebase

It’s been a while since I’ve blogged about Freebase, the semantic web database maintained by Metaweb. But I recently had the chance to meet Freebasers Robert Cook and Jamie Taylor and hear them present to the New York Semantic Web Meetup on “Content, Identifiers and Freebase” (slides embedded above).

It was a fun and informative presentation. Perhaps the most surprising revelation about Freebase was that all of their data fits in RAM on a 32G box (yes, some of you caught me live-tweeting that during the presentation). Their biggest challenge is collecting good data that lends itself to the reconciliation needed to make Freebase useful as a data repository. Despite the lack of a near-term revenue model, the Freebasers are bullish about their approach: strong identifiers, strong semantics, open data. On the last point, almost all of Freebase is available under the Creative Commons Attribution License (CC-BY)–which, as far as I can tell, makes anyone free to develop a mirror of Freebase. Indeed, many people are using this data, including Google and Bing.

You might wonder whether Freebase is a business or a non-profit foundation–and the question did come up. The answer is that Freebase eventually expects to make money by providing services, e.g., helping advertisers. They see their graph store as a competitive advantage–but they freely admit that this advantage will erode over time. Indeed, the surprisingly small size of their graph makes me wonder how much speed and scalability matter, compared to the challenge of data scarcity.

I’d like to see Freebase succeed. I’m particularly a fan of the work David Huynh has done there on interfaces for semantic web browsing. Clearly their investors are true believers–Metaweb has raised a total of $57M in funding. I don’t quite get it, but I’m happy we can all benefit from the results.

Categories
General

Social Networking: Theory and Practice

I’ve been a student of social network theory for years, enjoying the work of Duncan Watts, Albert-László Barabási, Jon Kleinberg, and a number of other researchers investigating this field. It should be no surprise that a topic that is so core to our humanity has attracted attention from some of our best and brightest.

And I’ve dabbled a bit on the theoretical side myself. The TunkRank measure (I’m indebted to Jason Adams for implementing it on a live site!) attempts to take the most basic assumption about our social behavior–the constraint that we have a finite attention budget–and explore its implications for influence over social networks. I have a few unexplored hypotheses queued up for when I can find the spare time to try to validate them empirically!
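For readers who want something concrete, here is a minimal sketch of how a finite attention budget can turn into an influence score. This post doesn't spell out the TunkRank recurrence, so the code assumes the commonly cited form: a user's influence is the sum, over each follower, of (1 + p × that follower's influence) divided by the number of accounts that follower follows. The division is the attention budget at work. The value of p (the chance a follower passes a message along), the fixed iteration count, and the toy graph are all illustrative assumptions, not details from the live implementation.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Approximate TunkRank-style influence scores by fixed-point iteration.

    `following` maps each user to the set of users they follow.
    `p` is an assumed probability that a follower passes a message along.
    """
    # Collect every user mentioned anywhere in the graph.
    users = set(following)
    for followed in following.values():
        users.update(followed)

    # Invert the graph: followers[x] is the set of users who follow x.
    followers = {user: set() for user in users}
    for follower, followed in following.items():
        for target in followed:
            followers[target].add(follower)

    influence = {user: 0.0 for user in users}
    for _ in range(iterations):
        # Each follower y splits attention evenly across everyone y follows.
        influence = {
            x: sum(
                (1.0 + p * influence[y]) / len(following[y])
                for y in followers[x]
            )
            for x in users
        }
    return influence


# Toy graph (hypothetical): bob follows alice and carol, alice follows carol.
scores = tunkrank({
    "alice": {"carol"},
    "bob": {"carol", "alice"},
    "carol": set(),
})
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In the toy graph, bob's attention is split between two accounts, so he contributes only half as much to each of them as alice, who follows a single account, contributes to carol.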

But why settle for theory? We live in an age where social networks compete with web search (and perhaps complement search) as the hottest online technologies. If we’re not reading about Google vs. Bing, we’re reading about Facebook vs. Twitter, with LinkedIn offering a third way that seems to co-exist with its more storied peers. In this post, I’d like to focus on LinkedIn.

LinkedIn, despite its feature creep, is still fairly old-school: its raison d’être is for users to build, maintain, and exploit their professional networks. In theory, connections on LinkedIn represent present or past working relationships that become the basis for referrals–whether the goal is employment, sales, or partnership. LinkedIn is not the only professionally oriented social network, but at this point it’s certainly the dominant one.

But I’ve found at least two additional ways to use LinkedIn that I’d like to share:

Intelligence gathering. For reasons I don’t yet claim to understand, people share far more information about themselves–and in a much cleaner, structured form–on LinkedIn than in perhaps any other online medium. Most people’s resumes are not available online, but their LinkedIn profiles are tantamount to resumes. Moreover, their structured format makes it possible for LinkedIn to assemble aggregate profiles of companies, revealing composite pictures that must drive some of those companies’ legal and HR departments batty! At a higher level, LinkedIn also works well as a discovery tool–much more so now they’ve enabled faceted search. It’s still a bit tricky to explore people and companies by topic, but far more effective using LinkedIn than using any other tool I’m aware of.

Meeting new people. Cold-calling, spamming–pick your poison. In short, LinkedIn doesn’t have to only be about connecting with people you already know. But there’s an art to sending unsolicited messages: you have to pass the moral equivalent of a CAPTCHA by proving that your communication strategy isn’t indiscriminate. Let me use a personal example (that Maisha Walker was nice enough to write up in her Inc. magazine column). I decided that I wanted to find everyone on LinkedIn who might be interested in HCIR ’09. So I searched for everyone whose profiles indicated interests in both IR and HCI and sent out a targeted message (in fact, an invite with a personalized message–a feature I recently feared they’d killed). The results were overwhelmingly positive. I’m not sure how many of the people I contacted will attend, but I raised awareness without inflicting annoyance. Better yet, one of the people I contacted then discovered I was looking for volunteers to review the draft of my book–and I thus obtained hours of help from someone who, just a day before, had never heard of me!

What intrigues me about LinkedIn (and other social networks) is the extent to which I am exploiting attention market inefficiencies (as LinkedIn may be doing as well). For example, LinkedIn makes it easy to send unsolicited invitations to anyone. Granted, you can lose this privilege by having even a couple of people respond to invitations with “I don’t know this person”. There’s also the question of why people’s social norms around disclosure are so different on LinkedIn than anywhere else–people not only post the content of their resumes, but go through the effort of providing it to LinkedIn in a structured form! Meanwhile, LinkedIn keeps tightfisted control over the information it aggregates–understandably, they recognize that this content is their most valuable asset.

People are still getting used to the idea of social networks. It will be interesting to see how their use evolves, particularly in terms of information and attention market efficiency.

Categories
General

Payola? There’s An App For That!

Remember a few months ago when there was a scandal about a Belkin employee paying people $0.65 per review to post 5-star reviews to Amazon?

Well, that was child’s play compared to what PR firm Reverb Communications has allegedly been doing for its clients. According to Gagan Biyani at TechCrunch, Reverb hired interns to post positive reviews to Apple’s App Store for clients. Indeed, TechCrunch posted documentation obtained through an anonymous tipster, including the following:

Reverb employs a small team of interns who are focused on managing online message boards, writing influential game reviews, and keeping a gauge on the online communities. Reverb uses the interns as a sounding board to understand the new mediums where consumers are learning about products, hearing about hot new games and listen to the thoughts of our targeted audience. Reverb will use these interns on Developer Y products to post game reviews (written by Reverb staff members) ensuring the majority of the reviews will have the key messaging and talking points developed by the Reverb PR/marketing team.

What makes this story especially newsworthy is that Reverb’s client list includes some big names, such as Harmonix (i.e., Guitar Hero and Rock Band) and MTV Games.

Apparently the reviewer system isn’t entirely anonymous, so Biyani was able to look for patterns:

iTunes allows you to see other reviews posted by the same reviewer. So, we clicked on the reviewer “Vegas Bound” (iTunes link) and started to look at his reviews. He reviewed 7 applications, and gave each one of them 5 stars. Each review was short and sweet, and extremely positive. These reviews represented 6 different developers. A quick Google search revealed an infuriating truth: every single one of these developers was a client of one PR firm: Reverb Communications.

I can only hope that scandals like these will cause people to be more skeptical of reviews (or opinions in general) that come from anonymous or obfuscated sources. While most reviews are probably sincere, it doesn’t take much to erode public trust. Moreover, a few shill reviews can attract attention to a product, thus leading legitimate reviews to follow afterward. Where’s the harm? Products without those shill reviews are starved of the attention they might deserve. Money substitutes for authentic endorsement.

Our brave new world of social media makes it possible to truly democratize the sharing of knowledge and opinions. But gaming the system like this erodes the trust that is essential for this process to work–and thus devalues all of the information available to us online. The key enabler of such gaming is anonymity. Fortunately the miscreants do get caught on occasion. Hopefully we will learn from this experience and build more robust systems that aren’t so easily gamed. Transparency or FAIL.

Categories
Uncategorized

UIE Virtual Seminar on Faceted Search: A Great Experience!

Pete Bell and I delivered the seminar today, and it was a blast! We had over 150 registered listeners–and I found out that at least one of those registrations corresponded to a roomful of 20 people at an online retailer that is a thought leader in web usability and design!

Since we didn’t manage to get to all of the questions (over 40–possibly over 50 counting the activity on Twitter!), we’re going to do a follow-up podcast that will be available even to people who didn’t attend the seminar. And, since even that might not be enough, I’m saving all of the questions as blog fodder.

To all who attended–and to Jared, Adam, and all the folks of UIE–thanks from me and Pete for giving us this great opportunity to connect with folks interested in faceted search and user experience.

Categories
General

Google Search Appliance: Now Without HCIR!

In an earlier post, I speculated about why Google is holding back on faceted search. Of course, I was talking about their web search properties, not their enterprise offerings. I thought that they’d seen the light by now that faceted search–and HCIR in general–is especially important in the enterprise, where you can’t rely on PageRank, anchor text, and SEO–not to mention the large fraction of navigational and straight-to-Wikipedia queries.

But I was wrong. Don’t take it from me–watch the video below (or read this blog post) and listen to what Cyrus Mistry, the product manager for the Google Search Appliance, has to say. I might give him a pass on his dubious conflation of all features other than ranked retrieval with “advanced search”. But here’s a direct quote: “users care about one thing: the right result coming to the top”.

Sigh. I don’t dismiss the value of relevance ranking. Some search queries are easy and clearly point to single documents as answers–and any search engine should do well on them. But lots of queries in site search and enterprise search environments (more so than on the web) don’t have a single best answer. That’s why we have faceted search and interfaces that offer useful information scent to users.
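To make the contrast with a single ranked list concrete, here is a minimal sketch of the core of faceted search: summarize the whole result set by attribute, then let the user narrow it. The function names, the field names, and the sample documents are all hypothetical, and a real engine would compute these counts from an index rather than by scanning a list.

```python
from collections import Counter


def facet_counts(documents, facet_fields):
    """Count how many matching documents carry each value of each facet."""
    counts = {field: Counter() for field in facet_fields}
    for doc in documents:
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def refine(documents, field, value):
    """Narrow the result set to documents with the chosen facet value."""
    return [doc for doc in documents if doc.get(field) == value]


# Hypothetical matches for an enterprise query like "policy":
# no single document is "the" right answer.
results = [
    {"title": "Travel policy", "department": "HR", "type": "pdf"},
    {"title": "Expense policy", "department": "Finance", "type": "pdf"},
    {"title": "Security policy", "department": "IT", "type": "wiki"},
    {"title": "Leave policy", "department": "HR", "type": "wiki"},
]

counts = facet_counts(results, ["department", "type"])
for field, counter in counts.items():
    print(field, dict(counter))

hr_results = refine(results, "department", "HR")
print([doc["title"] for doc in hr_results])
```

The counts are the information scent: they tell the user what is in the result set before any refinement is chosen, which a top-ten list cannot do.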

I understand that Google is, on the whole, HCIR-averse. But I expect more from their enterprise division. To be clear, the “side by side” feature that Mistry touts is nice. It reminds me of Blind Search (built by a Microsoft employee in his spare time), and of a relevance ranking evaluator that Endeca customers have been using for years.

But there’s more to search results than ten blue links. Even the Google web folks seem to be slouching towards accepting the importance of interaction. Their enterprise team should be leading, not lagging.

Categories
Uncategorized

LinkedIn No Longer Allowing Invite Messages?

I noticed recently that, when I sent out an invitation to connect to someone on LinkedIn, there wasn’t the usual slot for including a free-text note with the invitation. I thought it might be a glitch–and I even considered the possibility that this was only happening to my account because I’m a bit of a networking junkie.

But I noticed on Twitter today that Mark Williams (aka @Mr_LinkedIn) had noticed the same change and followed up on it with LinkedIn’s customer service department. I never assume any site behavior on a freely provided service is permanent, but it is starting to look like this is a deliberate decision and not a transient bug.

If so, it’s an annoying change, though I can see the merits. I’ve made heavy use of the connection message, especially when inviting someone I don’t know all that well–or don’t know at all. A personal message can be what distinguishes a welcome cold call from spam. But I’m guessing that others have abused that capability, filling it with spam or worse. Still, I feel like LinkedIn may be throwing the baby out with the bathwater. Will follow up if / when I hear more.

UPDATE: Just saw this message on the LinkedIn site via Twitter:

Unable to Personalize Invitation Message

Why can’t I personalize the message in my Invitation?

We are aware of an issue preventing some members from customizing their Invitation messages. There is no need to contact Customer Service as our team is reviewing the issue to determine the best overall solution.

As a temporary workaround, the following message (with your name in the signature) is being sent when you click on the ‘Send Invitation’ button: ‘I’d like to add you to my professional network on LinkedIn.’

As long as you approve of this message, you may continue to take advantage of this feature. If you prefer a more customized message to be sent, you may delay sending your Invitations until the functionality has been restored.

UPDATE #2: Looks like the problem is resolved.

Categories
General

Prediction Is Hard, Especially About The Future

That Niels Bohr certainly knew what he was talking about! But that hasn’t discouraged folks in any number of industries from trying to make predictions.

Google in particular has been researching the predictability of search trends (just to be fair and balanced, so have Bing and Yahoo). Yossi Matias, Niv Efron, and Yair Shimshoni at Google Labs Israel have made some fascinating observations based on Google Trends, including the following:

  • Over half of the most popular Google search queries are predictable in a 12 month ahead forecast, with a mean absolute prediction error of about 12%.
  • Nearly half of the most popular queries are not predictable (with respect to the model we have used).
  • Some categories have particularly high fraction of predictable queries; for instance, Health (74%), Food & Drink (67%) and Travel (65%).
  • Some categories have particularly low fraction of predictable queries; for instance, Entertainment (35%) and Social Networks & Online Communities (27%).
  • The trends of aggregated queries per categories are much more predictable: 88% of the aggregated category search trends of over 600 categories in Insights for Search are predictable, with a mean absolute prediction error of less than 6%.
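As a rough illustration of what a figure like "mean absolute prediction error of about 12%" measures, here is a minimal sketch. The summary above doesn't describe the researchers' forecasting model, so the code swaps in a seasonal-naive baseline (each month is predicted to repeat the value from twelve months earlier) and assumes the error is the average of |actual − forecast| / actual. The monthly volumes are made up for the example.

```python
def seasonal_naive_forecast(history, season=12, horizon=12):
    """Forecast each future period as the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    extended = list(history)
    for _ in range(horizon):
        # The value `season` steps back from the period being forecast.
        extended.append(extended[-season])
    return extended[len(history):]


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over all periods."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


# Made-up monthly query volumes with a summer peak.
last_year = [100, 90, 80, 85, 95, 110, 130, 140, 120, 105, 95, 100]
# Suppose interest grew uniformly by 25% the following year.
this_year = [volume * 1.25 for volume in last_year]

forecast = seasonal_naive_forecast(last_year)
error = mean_absolute_percentage_error(this_year, forecast)
print(f"mean absolute prediction error: {error:.0%}")
```

A query with a stable seasonal shape scores well even under this crude baseline; a query driven by one-off events would not, which fits the gap between categories like Health and Entertainment in the list above.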

You can read their full 32-page paper here.

I’m not surprised at the predictability of human search behavior, especially for stable topics or even for unstable ones viewed as aggregates–one could argue the celebrities and scandals du jour are unpredictable but interchangeable. What I’m curious about is what we can do with this predictability.

In the SIGIR ’09 session on Interactive Search, Peter Bailey talked about “Predicting User Interests from Contextual Information”, analyzing the predictive performance of contextual information sources (interaction, task, collection, social, historic) for different temporal durations. Max Van Kleek wrote a nice summary of the talk at the Haystack blog. The paper doesn’t investigate seasonality (perhaps because they only looked at four months of data), but I’d imagine they would subsume it under the broader categories of historic and social context. But they do set a clear goal:

Postquery navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity…Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.

I hope Google is following a similar agenda. If you’re going to go through the trouble of predicting the future, then help make it a better one for users!

Categories
Uncategorized

Last Chance to Register for UIE Virtual Seminar on Faceted Search!

My colleague, Endeca co-founder Pete Bell, and I are giving a virtual seminar on faceted search for User Interface Engineering (UIE) this Thursday, August 20th at 1:30PM EST. We’ve heard that there are over a hundred sign-ups already–which may actually correspond to more people, since a sign-up may mean a group of people watching in a conference room. We’re very excited about the opportunity to share our insights on a topic that draws such interest.

Jared Spool, who invited us to give this seminar, will be moderating. Independent of the seminar, you should check out his work (and the UIE site) if you are interested in web usability.

The regular price is $129, but Noisy Channel readers who are interested in attending can get a $30 discount by using TUNKELANG (yes, all caps) as a promo code. Attendees also receive a free copy of my book, Faceted Search. That’s a total value of over $150 for just $99! And it slices and dices!