Categories
Uncategorized

Visualizing Political Bias

Just saw a post from waxy.org by Andy Baio entitled “Memeorandum Colors: Visualizing Political Bias with Greasemonkey”. Here’s a quick excerpt:

With the help of del.icio.us founder Joshua Schachter, we used a recommendation algorithm to score every blog on Memeorandum based on their linking activity in the last three months. Then I wrote a Greasemonkey script to pull that information out of Google Spreadsheets, and colorize Memeorandum on-the-fly. Left-leaning blogs are blue and right-leaning blogs are red, with darker colors representing strong biases.

To install it, you’ll need Firefox (not a problem for 61% of you, according to my analytics) and the Greasemonkey extension.

After it’s installed, go to any page on Memeorandum and wait a second for the coloring to appear. For details of how they used Singular Value Decomposition (SVD) to score the blogs, check out the post.
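
To make the SVD idea concrete, here is a minimal sketch, not necessarily how Baio and Schachter did it. It assumes a tiny, made-up blog-by-story link matrix and uses power iteration to find only the leading singular vector, standing in for a full SVD. The blog names, the data, and the function bias_scores are all hypothetical.

```python
import math

def bias_scores(link_matrix, reference_left_row=0, iterations=100):
    """Score each row (blog) along the leading singular direction.

    link_matrix[i][j] is 1 if blog i linked to story j, else 0.
    Power iteration stands in for a full SVD here: it finds only the
    leading left singular vector of the column-centered matrix.
    """
    n_rows, n_cols = len(link_matrix), len(link_matrix[0])

    # Center each column so the leading direction captures how blogs
    # differ from one another, not how active they are overall.
    means = [sum(row[j] for row in link_matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in link_matrix]

    def project_to_stories(u):
        # Computes C^T u, where C is the centered matrix.
        return [sum(centered[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]

    u = [1.0] + [0.0] * (n_rows - 1)  # arbitrary nonzero starting vector
    for _ in range(iterations):
        # v = C^T u, then u = C v; repeated, u converges to the leading
        # left singular vector.
        v = project_to_stories(u)
        u = [sum(centered[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]

    # Singular value = length of C^T u for the converged unit vector u.
    sigma = math.sqrt(sum(x * x for x in project_to_stories(u)))
    scores = [sigma * x for x in u]

    # The sign of a singular vector is arbitrary; orient it so that a blog
    # we already believe to be left-leaning gets a negative score.
    if scores[reference_left_row] > 0:
        scores = [-s for s in scores]
    return scores

# Hypothetical data: rows are blogs, columns are stories they linked to.
blogs = ["left_a", "left_b", "right_a", "right_b"]
links = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
scores = dict(zip(blogs, bias_scores(links)))
for name, score in scores.items():
    print(f"{name}: {score:+.2f}")
```

On this toy data the two left-leaning rows come out negative and the two right-leaning rows positive, and the magnitude could drive how dark the color is. Real link data is far sparser and noisier, so treat this only as an illustration of the idea.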

It’s a nice application, and it reminds me of a 2004 CIKM paper by Miles Efron entitled “The Liberal Media and Right-Wing Conspiracies: Using Cocitation Information to Estimate Political Orientation in Web Documents”.

Categories
General

Twitter’s Twist on the Attention Economy

I am a long-time LinkedIn user, and over time I’ve accumulated more than 1,000 connections. Most of them are people I actually know or at least have interacted with online beyond “connecting”.

You might think that’s a large number of people to have as connections, and that I could afford to have a more selective velvet rope. And, as you may have noted, I know only most of my connections; some of them are link spammers whose connection requests I nonetheless accepted.

But, you see, there’s no incentive for an individual to reject a spammy connection request. Link spammers do reduce the relative value of legitimate links, and as a result devalue the LinkedIn network as a whole. But it’s a classic tragedy of the commons. Why should I personally sacrifice the reach of my network if I gain nothing? As far as I can tell, this problem applies just as much to Facebook and other social networking platforms.

Twitter is a different beast. Granted, Twitter and LinkedIn may not even see each other as competitors, but that is beside the point. They are competing for people’s social networking cycles, and all of today’s social networking platforms / applications are surely keeping their options open as to what positions they will ultimately stake out.

In any case, what most differentiates Twitter from LinkedIn is their attention economics. On LinkedIn, you derive a benefit–at no apparent cost–from the size of your network, up to degree 3. In contrast, all that matters in the Twitter “social graph” are your immediate links. You don’t get any direct benefit from connections at distance greater than 1. Moreover, the connections are asymmetric, as are their costs and benefits. Following people is an investment of your attention, where the return is access to information (in a broad sense). Being followed is an investment of their attention, and hence an opportunity to exert influence. The asymmetry of Twitter connections is most evident for celebrity influencers, who have far more followers than followees.
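
The contrast can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the LinkedIn-style graph is undirected and "reach" means everyone within three hops, while the Twitter-style graph is directed and only direct follows count. The names and both toy graphs are made up.

```python
from collections import deque

def undirected(edges):
    """Build a symmetric adjacency map from a list of (a, b) connections."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def reach_within(graph, start, max_degree=3):
    """Count the people within max_degree hops of start (breadth-first search)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == max_degree:
            continue  # do not expand beyond the degree limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, distance + 1))
    return len(seen) - 1  # exclude start itself

# LinkedIn-style: accepting the spammer extends my reach at no cost to me.
selective = undirected([("me", "friend")])
permissive = undirected([
    ("me", "friend"), ("me", "spammer"),
    ("spammer", "s1"), ("s1", "s2"), ("s2", "s3"),
])
reach_selective = reach_within(selective, "me")
reach_permissive = reach_within(permissive, "me")

# Twitter-style: follows[x] is the set of accounts x follows (directed).
follows = {"me": {"friend"}, "fan1": {"me"}, "fan2": {"me"}}
followees = len(follows["me"])                                # attention I spend
followers = sum("me" in targets for targets in follows.values())  # attention I receive

print(reach_selective, reach_permissive, followees, followers)
```

In the toy LinkedIn graph, one spammy connection raises my three-degree reach from 1 to 4. In the Twitter graph, nothing beyond distance 1 enters the tally, and the follower and followee counts are free to differ.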

While Twitter, at least in my view, is a work in progress, I think they have done well to align their model with attention scarcity. I’m most keenly aware of this scarcity as I decide whom to follow. Accepting a connection from a LinkedIn spammer costs me nothing, while following someone on Twitter who updates on every inhale and exhale would render the service completely worthless.

As a result, connections in Twitter reflect real value. They correspond to investments of attention. Someone with many followers is much like an author with many readers. While I’m sure this metric can be gamed (e.g., by creating bogus Twitter accounts and having them follow you), at least Twitter has the model right in principle.

Speaking of which, if you’re interested in following my tweets, you can find them here.

Categories
Uncategorized

People Ask Lousy Questions

I just saw Michelle Manafy’s notes in EContent about the recent Enterprise Search Summit West.

A great quote from IDC analyst Sue Feldman: “One of the problems we have with search is that people ask such lousy questions…anytime tools hand people clues, it helps.” Sue has been pushing conversational interfaces for a while, and I agree with her that, as an industry, we need to keep working on the tools to support query elaboration and interaction in general.

I do take issue with Stephen Arnold’s advice at the same conference to vendors to get on the Google-enhancement gravy train and “build solutions that sit on top of Google and make it work better.” Dare I say that the writer of Beyond Search is being a bit reactive?

Categories
General

Search is Not Advertising

Thanks to Greg Linden (who in turn thanks John Battelle) for calling my attention to a post by Google VP of Product Management Susan Wojcicki entitled “Ad Perfect”.

We can distill Wojcicki’s post to three principles, each a direct quote:

  1. “advertising should deliver the right information to the right person at the right time”
  2. “help you learn about something you didn’t know you wanted”
  3. “it needs to be very easy and quick for anyone to create good ads, to show them only to people for whom they are useful, and to measure how effective they are”

While Wojcicki does call out the similarity between Google’s mission in advertising and its mission in search, she fails to see a key difference–a difference that exposes a fundamental problem with web search today.

Search is all about the user. If you can help me, the user, find what I’m looking for, or find something I didn’t know I wanted, then I’m all ears (or eyes). Of course, I’d like to understand your motives if you’re offering to help me make decisions, especially if they involve my money or even my health.

Advertising is about selling the user’s attention to the highest bidder. Google has done more than anyone to make that bidding process economically efficient. But any utility that advertising provides to users is a means to an end. Advertising is all about the advertisers, and the advertisers only care about providing value to users insofar as their interests are aligned. Absent alignment, advertisers naturally look out for themselves.

This dynamic is hardly unique to search; it applies to any situation where we allow someone or something to influence our decisions. Indeed, persuasion and critical thinking have been locked in an arms race for millennia. The use of advertising to subsidize content dates back to the early 1800s. Wikipedia offers a nice history of the subject.

But supporting search through advertising is a tricky business. Google insists that it maintains a wall between its search and advertising businesses. But Wojcicki’s post–which is on Google’s official blog–suggests otherwise, at least in spirit. If Google believes that both search and advertising aim to “offer relevant content” and “deliver the right information to the right person at the right time”, then why put up a wall at all?

In any case, it is at best misguided and at worst intellectually dishonest to claim that the main goal of advertising is to inform or help the user. The goal of advertising is to influence the user, a goal whose achievement requires delivering a message to which the user is receptive. But influencing is not the same as informing. I hope we all have the critical thinking skills to appreciate the difference.

Categories
Uncategorized

Sales Pitch for the Semantic Web

Thanks to Marco Neumann, who runs the New York Semantic Web Meetup, for alerting me to this presentation by Nova Spivack, whom Marco aptly describes as Chief Director of Sales of the Semantic Web. Enjoy!

http://vimeo.com/moogaloop.swf?clip_id=1062481&server=vimeo.com&show_title=1&show_byline=1&show_portrait=0&color=&fullscreen=1
Nova Spivack at The Next Web Conference 2008

Categories
Uncategorized

Reblogging: NRC Report – Data Mining Won’t Find the Terrorists

For the benefit of readers using RSS, I just wanted to point people to the great discussion going on in the comment thread for this post.

Categories
Uncategorized

Kevin McDonald on Endeca on Freebase

This post from Kevin McDonald triggered all of my web alerts, so I thought I’d share it. It’s an interesting thought.

Categories
General

NRC Report: Data Mining won’t find the Terrorists

According to Declan McCullagh, a just-released U.S. National Research Council report entitled Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment concludes that automated identification of terrorists through data mining or any other mechanism “is neither feasible as an objective nor desirable as a goal of technology development efforts.”

I haven’t had the time to read through the 352-page report. The committee that wrote the report includes Stanford professor William Perry, former MIT president Charles Vest, and Microsoft researcher Cynthia Dwork. Such a crew undoubtedly realizes that any data mining technique yields false positives. The big questions are whether the data mining techniques are more effective than the alternatives, and whether using them is consistent with law and policy.
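
To see why false positives loom so large when the thing you are looking for is rare, here is a back-of-the-envelope sketch. Every number in it is hypothetical and chosen only to keep the arithmetic easy; none of it comes from the report.

```python
def screening_outcome(population, actual_targets, true_positive_rate, false_positive_rate):
    """Return (true positives, false positives, precision) for one screening pass."""
    true_positives = actual_targets * true_positive_rate
    false_positives = (population - actual_targets) * false_positive_rate
    # Precision: of everyone flagged, what fraction is a real target?
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision

# Hypothetical numbers: 10 real targets among a million people, a method
# that catches 90% of targets and wrongly flags 1% of everyone else.
tp, fp, precision = screening_outcome(
    population=1_000_000,
    actual_targets=10,
    true_positive_rate=0.9,
    false_positive_rate=0.01,
)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}, precision: {precision:.4%}")
```

With these made-up rates, roughly 10,000 innocent people are flagged alongside 9 real targets. The same arithmetic applies to any screening method, so the useful comparison is between the rates that competing approaches actually achieve.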

Based on McCullagh’s summary, the report seems to mainly call for oversight and objective evaluation. Nothing controversial there. And, as he wryly notes, Americans may have watched too many episodes of 24 to have a realistic sense of what data mining can and can’t do.

Still, I think we’d be naive to give up entirely on machine learning approaches to fight crime and improve national security. As with all science, we need to subject hypotheses to rigorous, objective testing. But remember, low-tech approaches have false positives too. There is no moral superiority in being a Luddite.

Categories
General

The Data Cloud?

As Paul Miller notes, “the Cloud” is increasingly prevalent in tech conversation these days. As if “cloud computing” weren’t a fuzzy enough term, now we have the “data cloud” which, if I understand Paul correctly, may just be a rebranding of the “semantic web” (itself a bit fuzzy for my tastes). It’s not clear to me from the article, though, to what extent the “data cloud” represents a commodified data repository vs. a common framework to link everyone’s data using open standards.

I suppose I’ve been in technology long enough that I shouldn’t be making fun of buzzwords, especially when the movement to the cloud represents a real and positive phenomenon. But the semantic web needs more than rebranding. A quick search turned up this post from last year that lists what Nova Spivack identified as barriers to the adoption of the semantic web:

  1. A lack of tools
  2. Scaling challenges (what if you want to store a trillion+ triples?)
  3. Vision issues (how can we define a practical vision, for the low-hanging fruit?)
  4. Inadequate Content (not enough semantic data available)
  5. No killer apps
  6. Market education

One year later, I’m not sure we’re that much farther along.

Categories
Uncategorized

Enabled Permalinks

On a friend’s advice, I enabled permalinks for posts here. The good news is that the links will be more SEO-friendly and attract oodles of traffic. The bad news is that all posts may appear in your reader as unread. Sorry.