Categories
General

Free as in Freebase

It’s been a while since I’ve blogged about Freebase, the semantic web database maintained by Metaweb. But I recently had the chance to meet Freebasers Robert Cook and Jamie Taylor and hear them present to the New York Semantic Web Meetup on “Content, Identifiers and Freebase” (slides embedded above).

It was a fun and informative presentation. Perhaps the most surprising revelation about Freebase was that all of their data fits in RAM on a 32G box (yes, some of you caught me live-tweeting that during the presentation). Their biggest challenge is collecting good data that lends itself to the reconciliation needed to make Freebase useful as a data repository. Despite the lack of a near-term revenue model, the Freebasers are bullish about their approach: strong identifiers, strong semantics, open data. On the last point, almost all of Freebase is available under the Creative Commons Attribution License (CC-BY)–which, as far as I can tell, makes anyone free to develop a mirror of Freebase. Indeed, many people are using this data, including Google and Bing.

You might wonder whether Freebase is a business or a non-profit foundation–and the question did come up. The answer is that Freebase eventually expects to make money by providing services, e.g., helping advertisers. They see their graph store as a competitive advantage–but they freely admit that this advantage will erode over time. Indeed, the surprisingly small size of their graph makes me wonder how much speed and scalability matter, compared to the challenge of data scarcity.

I’d like to see Freebase succeed. I’m particularly a fan of the work David Huynh has done there on interfaces for semantic web browsing. Clearly their investors are true believers–Metaweb has raised a total of $57M in funding. I don’t quite get it, but I’m happy we can all benefit from the results.

Categories
General

Social Networking: Theory and Practice

I’ve been a student of social network theory for years, enjoying the work of Duncan Watts, Albert-László Barabási, Jon Kleinberg, and a number of other researchers investigating this field. It should be no surprise that a topic that is so core to our humanity has attracted attention from some of our best and brightest.

And I’ve dabbled a bit on the theoretical side myself. The TunkRank measure (I’m indebted to Jason Adams for implementing it on a live site!) attempts to take the most basic assumption about our social behavior–the constraint that we have a finite attention budget–and explore its implications for influence over social networks. I have a few unexplored hypotheses queued up for when I can find the spare time to try to validate them empirically!
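
For readers who want to see the finite attention budget idea in concrete form, here is a minimal sketch. It assumes the commonly cited formulation of TunkRank, in which each follower splits one unit of attention evenly among the accounts they follow and passes along their own influence with a constant probability p. The function name, the toy graph format, the default value of p, and the fixed iteration count are all illustrative assumptions, not the code running on the live site.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Approximate TunkRank-style influence scores by fixed-point iteration.

    `following` maps each user to the set of users they follow.
    Assumed update rule (not taken from this post):
        influence(x) = sum over followers y of (1 + p * influence(y)) / |following(y)|
    """
    # Collect every user that appears as a follower or as someone followed.
    users = set(following)
    for followed in following.values():
        users |= set(followed)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        updated = {u: 0.0 for u in users}
        for follower, followed in following.items():
            if not followed:
                continue  # a user who follows nobody gives attention to nobody
            # The follower's attention is divided evenly among everyone they follow.
            share = (1.0 + p * influence[follower]) / len(followed)
            for u in followed:
                updated[u] += share
        influence = updated
    return influence


# Hypothetical three-user chain: a follows b, b follows c.
scores = tunkrank({"a": {"b"}, "b": {"c"}}, p=0.5)
```

In this toy chain, c benefits not only from b's direct attention but also, at a discount of p, from the attention that a pays to b.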

But why settle for theory? We live in an age where social networks compete with web search (and perhaps complement search) as the hottest online technologies. If we’re not reading about Google vs. Bing, we’re reading about Facebook vs. Twitter, with LinkedIn offering a third way that seems to co-exist with its more storied peers. In this post, I’d like to focus on LinkedIn.

LinkedIn, despite its feature creep, is still fairly old-school: its raison d’être is for users to build, maintain, and exploit their professional networks. In theory, connections on LinkedIn represent present or past working relationships that become the basis for referrals–whether the goal is employment, sales, or partnership. LinkedIn is not the only professionally oriented social network, but at this point it’s certainly the dominant one.

But I’ve found at least two additional ways to use LinkedIn that I’d like to share:

Intelligence gathering. For reasons I don’t yet claim to understand, people share far more information about themselves–and in a much cleaner, structured form–on LinkedIn than in perhaps any other online medium. Most people’s resumes are not available online, but their LinkedIn profiles are tantamount to resumes. Moreover, their structured format makes it possible for LinkedIn to assemble aggregate profiles of companies, revealing composite pictures that must drive some of those companies’ legal and HR departments batty! At a higher level, LinkedIn also works well as a discovery tool–much more so now they’ve enabled faceted search. It’s still a bit tricky to explore people and companies by topic, but far more effective using LinkedIn than using any other tool I’m aware of.

Meeting new people. Cold-calling, spamming–pick your poison. In short, LinkedIn doesn’t have to only be about connecting with people you already know. But there’s an art to sending unsolicited messages: you have to pass the moral equivalent of a CAPTCHA by proving that your communication strategy isn’t indiscriminate. Let me use a personal example (that Maisha Walker was nice enough to write up in her Inc. magazine column). I decided that I wanted to find everyone on LinkedIn who might be interested in HCIR ’09. So I searched for everyone whose profiles indicated interests in both IR and HCI and sent out a targeted message (in fact, an invite with a personalized message–a feature I recently feared they’d killed). The results were overwhelmingly positive. I’m not sure how many of the people I contacted will attend, but I raised awareness without inflicting annoyance. Better yet, one of the people I contacted then discovered I was looking for volunteers to review the draft of my book–and I thus obtained hours of help from someone who, just a day before, had never heard of me!

What intrigues me about LinkedIn (and other social networks) is the extent to which I am exploiting attention market inefficiencies (as LinkedIn may be doing as well). For example, LinkedIn makes it easy to send unsolicited invitations to anyone. Granted, you can lose this privilege if even a couple of people respond to your invitations with “I don’t know this person”. There’s also the question of why people’s social norms around disclosure are so different on LinkedIn than anywhere else–people not only post the content of their resumes, but go through the effort of providing it to LinkedIn in a structured form! Meanwhile, LinkedIn keeps tightfisted control over the information it aggregates–understandably, they recognize that this content is their most valuable asset.

People are still getting used to the idea of social networks. It will be interesting to see how their use evolves, particularly in terms of information and attention market efficiency.

Categories
General

Payola? There’s An App For That!

Remember a few months ago when there was a scandal about a Belkin employee paying people $0.65 per review to post 5-star reviews to Amazon?

Well, that was child’s play compared to what PR firm Reverb Communications has allegedly been doing for its clients. According to Gagan Biyani at TechCrunch, Reverb hired interns to post positive reviews to Apple’s App Store for clients. Indeed, TechCrunch posted documentation obtained through an anonymous tipster, including the following:

Reverb employs a small team of interns who are focused on managing online message boards, writing influential game reviews, and keeping a gauge on the online communities. Reverb uses the interns as a sounding board to understand the new mediums where consumers are learning about products, hearing about hot new games and listen to the thoughts of our targeted audience. Reverb will use these interns on Developer Y products to post game reviews (written by Reverb staff members) ensuring the majority of the reviews will have the key messaging and talking points developed by the Reverb PR/marketing team.

What makes this story especially newsworthy is that Reverb’s client list includes some big names, such as Harmonix (i.e., Guitar Hero and Rock Band) and MTV Games.

Apparently the reviewer system isn’t entirely anonymous, so Biyani was able to look for patterns:

iTunes allows you to see other reviews posted by the same reviewer. So, we clicked on the reviewer “Vegas Bound” (iTunes link) and started to look at his reviews. He reviewed 7 applications, and gave each one of them 5 stars. Each review was short and sweet, and extremely positive. These reviews represented 6 different developers. A quick Google search revealed an infuriating truth: every single one of these developers was a client of one PR firm: Reverb Communications.

I can only hope that scandals like these will cause people to be more skeptical of reviews (or opinions in general) that come from anonymous or obfuscated sources. While most reviews are probably sincere, it doesn’t take much to erode public trust. Moreover, a few shill reviews can attract attention to a product, thus leading legitimate reviews to follow afterward. Where’s the harm? Products without those shill reviews are starved of the attention they might deserve. Money substitutes for authentic endorsement.

Our brave new world of social media makes it possible to truly democratize the sharing of knowledge and opinions. But gaming the system like this erodes the trust that is essential for this process to work–and thus devalues all of the information available to us online. The key enabler of such gaming is anonymity. Fortunately the miscreants do get caught on occasion. Hopefully we will learn from this experience and build more robust systems that aren’t so easily gamed. Transparency or FAIL.

Categories
Uncategorized

UIE Virtual Seminar on Faceted Search: A Great Experience!

Pete Bell and I delivered the seminar today, and it was a blast! We had over 150 registered listeners–and I found out that at least one of those registrations corresponded to a roomful of 20 people at an online retailer that is a thought leader in web usability and design!

Since we didn’t manage to get to all of the questions (over 40–possibly over 50 counting the activity on Twitter!), we’re going to do a follow-up podcast that will be available even to people who didn’t attend the seminar. And, since even that might not be enough, I’m saving all of the questions as blog fodder.

To all who attended–and to Jared, Adam, and all the folks of UIE–thanks from me and Pete for giving us this great opportunity to connect with folks interested in faceted search and user experience.

Categories
General

Google Search Appliance: Now Without HCIR!

In an earlier post, I speculated about why Google is holding back on faceted search. Of course, I was talking about their web search properties, not their enterprise offerings. I thought that they’d seen the light by now that faceted search–and HCIR in general–is especially important in the enterprise, where you can’t rely on PageRank, anchor text, and SEO–not to mention the large fraction of navigational and straight-to-Wikipedia queries.

But I was wrong. Don’t take it from me–watch the video below (or read this blog post) and listen to what Cyrus Mistry, the product manager for the Google Search Appliance, has to say. I might give him a pass on his dubious conflation of all features other than ranked retrieval with “advanced search”. But here’s a direct quote: “users care about one thing: the right result coming to the top”.

Sigh. I don’t dismiss the value of relevance ranking. Some search queries are easy and clearly point to single documents as answers–and any search engine should do well on them. But lots of queries in site search and enterprise search environments (more so than on the web) don’t have a single best answer. That’s why we have faceted search and interfaces that offer useful information scent to users.
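
To make the contrast with a single ranked list concrete, here is a minimal sketch of the two operations at the heart of a faceted interface: summarizing a result set by facet value, and narrowing it when the user picks one. The document fields, the sample records, and the function names are hypothetical and are not drawn from any particular product.

```python
from collections import Counter


def facet_counts(results, facets):
    """Count how many results carry each value of each facet."""
    return {
        facet: Counter(doc[facet] for doc in results if facet in doc)
        for facet in facets
    }


def refine(results, **selected):
    """Keep only the results that match every selected facet value."""
    return [
        doc for doc in results
        if all(doc.get(facet) == value for facet, value in selected.items())
    ]


# Hypothetical enterprise search results for an ambiguous query.
results = [
    {"title": "Q3 travel policy", "type": "policy", "dept": "finance"},
    {"title": "Travel expense form", "type": "form", "dept": "finance"},
    {"title": "Visa guidance", "type": "policy", "dept": "legal"},
]

counts = facet_counts(results, ["type", "dept"])
finance_policies = refine(results, type="policy", dept="finance")
```

The counts are the information scent: they show the user how the results break down before any choice is made, which matters most when no single document is the obvious answer.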

I understand that Google is, on the whole, HCIR-averse. But I expect more from their enterprise division. To be clear, the “side by side” feature that Mistry touts is nice. It reminds me of Blind Search (built by a Microsoft employee in his spare time), and of a relevance ranking evaluator that Endeca customers have been using for years.

But there’s more to search results than ten blue links. Even the Google web folks seem to be slouching towards accepting the importance of interaction. Their enterprise team should be leading, not lagging.

Categories
Uncategorized

LinkedIn No Longer Allowing Invite Messages?

I noticed recently that, when I sent out an invitation to connect to someone on LinkedIn, there wasn’t the usual slot for including a free-text note with the invitation. I thought it might be a glitch–and I even considered the possibility that this was only happening to my account because I’m a bit of a networking junkie.

But I noticed on Twitter today that Mark Williams (aka @Mr_LinkedIn) had noticed the same change and followed up on it with LinkedIn’s customer service department. I never assume any site behavior on a freely provided service is permanent, but it is starting to look like this is a deliberate decision and not a transient bug.

If so, it’s an annoying change, though I can see the merits. I’ve made heavy use of the connection message, especially when inviting someone I don’t know all that well–or don’t know at all. A personal message can be what distinguishes a welcome cold call from spam. But I’m guessing that others have abused that capability, filling it with spam or worse. Still, I feel like LinkedIn may be throwing the baby out with the bathwater. Will follow up if / when I hear more.

UPDATE: Just saw this message on the LinkedIn site via Twitter:

Unable to Personalize Invitation Message

Why can’t I personalize the message in my Invitation?

We are aware of an issue preventing some members from customizing their Invitation messages. There is no need to contact Customer Service as our team is reviewing the issue to determine the best overall solution.

As a temporary workaround, the following message (with your name in the signature) is being sent when you click on the ‘Send Invitation’ button: ‘I’d like to add you to my professional network on LinkedIn.’

As long as you approve of this message, you may continue to take advantage of this feature. If you prefer a more customized message to be sent, you may delay sending your Invitations until the functionality has been restored.

UPDATE #2: Looks like the problem is resolved.

Categories
General

Prediction Is Hard, Especially About The Future

That Niels Bohr certainly knew what he was talking about! But that hasn’t discouraged folks in any number of industries from trying to make predictions.

Google in particular has been researching the predictability of search trends (just to be fair and balanced, so have Bing and Yahoo). Yossi Matias, Niv Efron, and Yair Shimshoni at Google Labs Israel have made some fascinating observations based on Google Trends, including the following:

  • Over half of the most popular Google search queries are predictable in a 12 month ahead forecast, with a mean absolute prediction error of about 12%.
  • Nearly half of the most popular queries are not predictable (with respect to the model we have used).
  • Some categories have particularly high fraction of predictable queries; for instance, Health (74%), Food & Drink (67%) and Travel (65%).
  • Some categories have particularly low fraction of predictable queries; for instance, Entertainment (35%) and Social Networks & Online Communities (27%).
  • The trends of aggregated queries per categories are much more predictable: 88% of the aggregated category search trends of over 600 categories in Insights for Search are predictable, with a mean absolute prediction error of less than 6%.
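
To give a feel for what a figure like "a mean absolute prediction error of about 12%" measures, here is a minimal sketch. It assumes the error is a mean absolute percentage error relative to the actual values, and it uses a seasonal naive forecast (repeat the same month from a year earlier) purely as a stand-in. The function names and the choice of forecasting model are illustrative assumptions and not necessarily what the paper's authors used.

```python
def seasonal_naive_forecast(history, horizon=12, season=12):
    """Forecast each future month as the value seen one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    start = len(history) - season
    return [history[start + (h % season)] for h in range(horizon)]


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over all points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical monthly query volumes: two years of history with a yearly cycle.
history = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
           2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
forecast = seasonal_naive_forecast(history)
error = mean_absolute_percentage_error([100, 200], [90, 220])
```

Under this reading, a query counts as predictable when a forecast made 12 months ahead stays within a small average percentage of what actually happened.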

You can read their full 32-page paper here.

I’m not surprised at the predictability of human search behavior, especially for stable topics or even for unstable ones viewed as aggregates–one could argue the celebrities and scandals du jour are unpredictable but interchangeable. What I’m curious about is what we can do with this predictability.

In the SIGIR ’09 session on Interactive Search, Peter Bailey talked about “Predicting User Interests from Contextual Information”, analyzing the predictive performance of contextual information sources (interaction, task, collection, social, historic) for different temporal durations. Max Van Kleek wrote a nice summary of the talk at the Haystack blog. The paper doesn’t investigate seasonality (perhaps because they only looked at four months of data), but I’d imagine they would subsume it under the broader categories of historic and social context. But they do set a clear goal:

Postquery navigation and general browsing behaviors far outweigh direct search engine interaction as an information-gathering activity…Designers of Website suggestion systems can use our findings to provide improved support for post-query navigation and general browsing behaviors.

I hope Google is following a similar agenda. If you’re going to go through the trouble of predicting the future, then help make it a better one for users!

Categories
Uncategorized

Last Chance to Register for UIE Virtual Seminar on Faceted Search!

My colleague, Endeca co-founder Pete Bell, and I are giving a virtual seminar on faceted search for User Interface Engineering (UIE) this Thursday, August 20th at 1:30PM EST. We’ve heard that there are over a hundred sign-ups already–which may actually correspond to more people, since a sign-up may mean a group of people watching in a conference room. We’re very excited about the opportunity to share our insights on a topic that draws such interest.

Jared Spool, who invited us to give this seminar, will be moderating. Independent of the seminar, you should check out his work (and the UIE site) if you are interested in web usability.

The regular price is $129, but Noisy Channel readers who are interested in attending can get a $30 discount by using TUNKELANG (yes, all caps) as a promo code. Attendees also receive a free copy of my book, Faceted Search. That’s a total value of over $150 for just $99! And it slices and dices!

Categories
General

The Raging Debate Over The Link Economy

Arnon Mishkin wrote a post last Thursday on paidContent called “The Fallacy Of The Link Economy” that has been generating a lot of discussion, so I figured I’d join in the free-for-all. First, let me try to reduce each person’s argument to a direct quote that best sums up his position.

Arnon Mishkin:

The vast majority of the value gets captured by aggregators linking and scraping rather than by the news organizations that get linked and scraped.

Jeff Jarvis:

Links are worth what the recipient makes of them.

Mike Masnick:

It’s not the link alone that has value or the story alone that has value, but the overall process of building a community.

Erick Schonfeld:

If a news site or a blog can say enough interesting things enough times that news aggregators (or other sites) keep linking to them, then they can build up their brand and reader loyalty.

Sigh. I thought the health care debate was bad enough, but I suppose that almost all impassioned debates come down to opposing sides exchanging half-truths.

In Mishkin’s defense: news organizations are in a catch-22. Many have suggested that if a news organization doesn’t want its content showing up on aggregators’ sites, it simply has to modify robots.txt accordingly. But news organizations can only do so individually–which puts them in a prisoner’s dilemma. Anti-trust law prevents news organizations from collectively bargaining with those who aggregate their content. For all intents and purposes, they are forced to abide by the status quo.
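
For anyone unfamiliar with the mechanism, here is a minimal sketch of what opting out via robots.txt looks like from a well-behaved crawler's point of view, using Python's standard urllib.robotparser module. The crawler name ExampleAggregatorBot, the news.example.com URL, and the helper function are made up for illustration; a real site would have to name the actual user agents it wants to exclude.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one aggregator's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAggregatorBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


story = "https://news.example.com/politics/story.html"
aggregator_allowed = may_crawl(ROBOTS_TXT, "ExampleAggregatorBot", story)
other_allowed = may_crawl(ROBOTS_TXT, "SomeOtherBot", story)
```

Note that the file only expresses one site's preference to crawlers that choose to honor it. It does nothing to coordinate publishers, which is exactly the prisoner's dilemma described above.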

In Jarvis’s defense (yes, I’m actually defending Jeff Jarvis!): there isn’t much point in producing content for which most of the value is captured in a teaser so small as to be covered under fair use rights. As he’s said elsewhere, newspapers are inefficient, and the industry will have to shrink a lot to be healthy.

In Masnick’s defense: I cite my own blog post (also inspired by one of his posts) about monetizing community because participation is inherently uncopiable. It’s hard for me to agree with him more strongly than that!

In Schonfeld’s defense: his argument sounds a lot like the “freemium” strategy, which has a respectable track record. In order to build a loyal customer base, you often need to give away free trials as teasers–and that’s effectively what happens when media sites make some of their content available through aggregators. And, as in the freemium model, the actual product has to be significantly more interesting than the free teaser to earn the consumer’s investment–whether that investment is in the form of money, attention, or loyalty.

So, do I agree with them all? Not exactly. Mishkin’s first prescription to news organizations should probably be to cut investment in undifferentiated content. Jarvis should acknowledge that the inability of news organizations to collectively bargain is unfair to them. Masnick–well, I basically do agree with him on the limited point he’s making. I suppose the strongest objection would be that not all media sites should be forced to become communities just because they’re hobbled in their ability to negotiate the monetization of the content they produce. And Schonfeld’s argument assumes the current link economy as a given–and one of the biggest points of contention is whether news organizations should be allowed to try to change that economy.

Sadly, I don’t see any of these guys giving the others an inch, which is why this discussion will probably continue unchanged for the foreseeable future. Hopefully the passion of the debate helps sell, um, papers.

Categories
General

Why Does Google Hold Back On Faceted Search?

Sometimes the response to a comment is worthy of an entire post, and this is one of those times. In response to my recent post about Able Grape, a wine search engine developed by Doug Cook (now Director of Twitter Search), Lee asked:

Let’s say I know almost nothing about wines/digital cameras/cars and a search site offers me “options” to drill down. However, I can’t use those effectively and eventually it comes down to availability and price for me. My questions are what are your thoughts on these kinds of situations and is there a scientific explanation/theory on this case?

This may be why Google does not endorse faceted search except for experimental projects.

It’s a great question. There’s been a lot of research on how people make decisions when they have to manage trade-offs among multiple attributes. With the increasing interest in behavioral economics since Daniel Kahneman won the Nobel Prize in 2002, some of that research has even percolated into the mainstream thanks to bestsellers like Freakonomics and Dan Ariely’s Predictably Irrational.

The short answer is that there’s no point in offering users options that they can’t (or won’t) use effectively. Choice overload is certainly a problem, and our reaction to it is to satisfice, typically resorting to “fast and frugal” heuristics that throw out most of the potential decision criteria and instead focus on one or two attributes, e.g., price and availability.

But that’s no reason to dumb down the data we make available to decision makers. We make hard choices all the time, and fast and frugal can be horrendously suboptimal. We don’t hire employees based solely on their price and availability–or at least good employers don’t! For that matter, I don’t think most people pick wines that way, given that even Trader Joe’s has to diversify beyond “Two Buck Chuck”. And, while there’s probably more of a market for cheap cameras and cars, I’m pretty sure you’re an extreme outlier if you completely ignore other criteria.

That said, there are some caveats about exposing options to users. Faceted search is hard, especially on the open web. Take it from the folks at Microsoft Research–but I’m sure Googlers would be the first to agree, especially given their experience with projects like Google Squared that, while promising, are nowhere near ready for prime time.

I appreciate that Google is conservative about embracing faceted search–and HCIR in general. I’m actually impressed by the steadily improving quality of their related terms for search queries–even if they do hide them behind two clicks (show options -> related searches). Perhaps they’re feeling some pressure from Bing. But I think they’re largely following the dictum of “if it ain’t broke, don’t fix it”. Google is an extremely successful company. And, as Clayton Christensen argues, successful companies are great at incremental innovation and bad at disruptive innovation. As far as I can tell, faceted search is very disruptive to their model.