Categories
General

Is People-Powered Search Overrated?

I recently read an article by Matthew Shaer in the Christian Science Monitor entitled “The future of search: Do you ask Google or the gaggle?” and subtitled “To improve results, new search engines rely on users instead of computers.” The article goes on to talk about Google’s SearchWiki, Jimmy Wales’s Wikia Search, and a number of “people-powered” search tools.

I agree strongly with Wales on the value of transparency and that “because search is so secretive, and so proprietary, there are fewer checks and balances”. But I agree just as strongly with Shaer that “handing over control to a community could engender a flood of spam, or devolve into a mess of internecine backbiting among users”, both of which he’s observed on the Yahoo Answers site.

Wales ultimately sees the question as not whether humans make the decisions, but rather by what process, i.e., democratically vs. top-down. His Wikia Search effort is an attempt to repeat Wikipedia’s success for general web search.

But, while I like democracy as a political system (as Churchill said, it’s the worst form of government except all the others that have been tried), I’m not sold on Wikia Search or any of the crop of people-powered search engines.

Perhaps the problem is that, much as in electoral democracy, we need to be vigilant about attempts to game the system. The anonymity of web users is as much a problem as the secrecy of search ranking algorithms, since it allows people to game “people-powered” systems with impunity.

Would a transparent people-powered search system work? Perhaps, assuming it could address the privacy concerns of users. I’m all for transparent social navigation.

But let’s not forget the other part of people power: giving users meaningful control. Crowdsourcing might improve on the current crop of ranking algorithms, but what I really want is a search engine that provides me with transparency, control, and guidance. Let me get under the hood.

Categories
Uncategorized

The Evolution of Search Results: An SEO Perspective

Today’s post on SEOmozBlog explores how the evolution of search results is changing the landscape for search marketers, aka SEO professionals. Here’s a teaser:

I still believe we’re years (3-5) away from an SEO economy where links don’t play the primary role (and I doubt we can ever get away from keywords – that’s search at its most basic), but I do agree that we’re plodding slowly down that path.

The post also links to and excerpts a “thought paper” called “New Signals to Search Engines” by search marketing firm Acronym Media. It’s an interesting–and refreshingly intellectual–take on how the search engines of the future may move to a less link-centric approach.

Categories
Uncategorized

Guerilla Marketing Gone Wild

The Sunday before Festivus is surely a slow news day, but today’s top tech story is a doozie. Evidently College Prowler, a publishing company for guidebooks on top colleges and universities in the United States, was creating hundreds of “Class of 2013” groups on Facebook, using sock puppet accounts, for the purposes of self-promotion. Brad Ward, a recruiter for Butler University, sleuthed out this marketing strategy and posted an exposé at his blog, SquaredPeg. The story has spread like wildfire, including to the Chronicle of Higher Education.

The story is still evolving, but it looks pretty bad for College Prowler. Social networks, whether offline or online, are built on trust, and, as we’ve learned recently from the Madoff scandal, networks of trust are vulnerable. Perhaps universities should have been more proactive in establishing their own Class of 2013 Facebook groups, though that feels like blaming the victim.

In my view, this incident argues in favor of discouraging online anonymity, at least in contexts where we need to build trust. This is the one aspect in which Knol got it right.

Categories
General

Enterprise Search: Beset by Marketing and Hype

Given my role at Endeca, I am hardly objective about the competitive landscape of enterprise search. But, while reading an article about enterprise search in the latest issue of Information Age, I was pleasantly surprised to find myself agreeing with Autonomy CEO Mike Lynch that the enterprise search industry is beset by “marketing and hype”, and that the technologies available are far from equal.

Not surprisingly, there are a variety of perspectives among the major enterprise search vendors about how best to address the challenges of enterprise search:

  • Autonomy promotes “meaning-based computing”, its branding of its information extraction and text mining techniques.
  • Dave Armstrong, a head of products and marketing for Google’s Enterprise division, questions the feasibility of structuring content and emphasizes the importance of search for unstructured data.
  • Martyn Christian, IBM’s VP of enterprise content management, asserts that search should not be used to address problems better served by classification and metadata.
  • Endeca (not mentioned in the article) emphasizes an interaction-centric “guided summarization” approach that readers here will recognize as human-computer information retrieval.
  • Microsoft’s FAST is mentioned, but the only quotation cited is from a disgruntled former customer.

Note that I am trying to convert vendor slogans into vendor-independent terms that have some traction in the information retrieval research community. My hope is that, through neutral forums like the SIGIR Industry Track, we can do a better job as vendors of keeping ourselves honest, as well as engaging academic researchers to help connect their work to the real world.

Above all, let’s strive to compete on technology and ideas, rather than on obfuscation through marketing.

Categories
General

Fair Use and SEO

The Huffington Post, one of the most prominent political blogs on the web, usually courts political controversy for its unapologetically liberal perspective. But now it finds itself in a different sort of controversy over the way it aggregates content from other sites.

It started with a complaint from Whet Moser at the Chicago Reader:

The Huffington Post’s local “aggregation” wing straight stole our entire Bon Iver Critic’s Choice–they didn’t ask permission (“read the whole article”? that is the whole article, dumbass),

This isn’t an isolated incident. As Henry Blodget puts it:

The Huffington Post’s news aggregation business drives enormous traffic to the third-party sites its editors link to (including, occasionally, this one). The Huffington Post also often excerpts liberally from third-party sites’ stories and uses this content to drive significant traffic to itself.

Ryan Singel presents both sides of the story at Wired, including Huffington Post co-founder Jonah Peretti’s contention that the excerpts drive traffic to the original sites from which they were aggregated.

What fascinates me is that, while the legal and ethical arguments are about what constitutes fair use, the driving concern is search engine optimization (SEO). In many cases, The Huffington Post is excerpting stories without adding any new content, but is then drawing a significant amount of search traffic to its site that, presumably, would have otherwise gone directly to the original articles. In other words, they’re putting themselves in the middle and taking a cut through the resulting advertising revenue.

I can certainly see how this behavior drives online news providers up the wall. Even if The Huffington Post is acting within the legal constraints of fair use, its actions certainly seem parasitical. Unless they are driving traffic to the sites they aggregate that would not have otherwise gone there directly, they are simply profiting from being better at the SEO game.

I see this scenario as a cautionary tale for our excess dependence on traffic from search engines that promote an adversarial model. This is the dark side of SEO–a no-holds-barred fight for a piece of people’s scarce attention.

Categories
Uncategorized

Google Image Search Gets Style

Google announced today that its image search now supports search-by-style. As someone who regularly uses Google’s image search to find fodder for my presentations, I am excited about this enhancement. Moreover, I think it’s a clever application of the various image analysis algorithms Google has been developing.

They now include a drop-down that allows you to restrict searches to images from news content, faces, clip art, line drawings, and photo content. It’s not 100% accurate, but it’s not bad.

What is unfortunate is that the interface, whether you’d like to explore images by style or by size, doesn’t give you any sort of preview of the content in each category. I at least find it annoying to have to keep clicking to explore the space. But this is at least a baby step towards supporting exploratory search, in a domain that cries out for it.

Categories
Uncategorized

How do people arrive at The Noisy Channel?

Like most bloggers, I diligently analyze my logs to see how readers are responding to my rambling. I use Clicky, which I’ve found quite nice even if it isn’t free (but it does provide real-time updates).

Here are the stats for the past month:

  • 48%: directly or through bookmarks
  • 25%: links from other sites
  • 15%: RSS readers and social media
  • 12%: (non-paid) search results

Note that I don’t find out who is reading the blog through RSS readers; I only see log entries for people who click through the readers, e.g., to read or post comments.

The searches are certainly the most entertaining bits in the log. Here are a few I found particularly amusing:

  • channel for inspired people
  • english sex channel
  • how to make pipe quick
  • “keep yourself on the gravy train for life”
  • psychology of noisy people

I would be curious to know more about who is reading the blog through RSS readers. Anyone here have advice on how or whether it is feasible to do so?

Note: my eulogies for privacy notwithstanding, I respect the anonymity of my readers and will only disclose log data in forms like the above, which do not disclose any even remotely personally identifying information.

Categories
Uncategorized

Networks of Trust are Vulnerable

The offline story of how former Nasdaq chairman Bernard Madoff evidently took $50B from investors in a massive Ponzi scheme has been a staple in the press since the story broke last week. But what caught my eye today was an article in the Wall Street Journal. The title, “Madoff Exploited the Jews”, strikes me as a bit glib, but it’s the subtitle that struck me:

Networks of trust are vulnerable. No law can change that.

Diving deeper into the article:

His contacts and connections, his religion and affiliations, his public and private positions, all worked to make his funds look legitimate and exclusive. And he knew how to play his prospects, when to turn potential clients down, when to give something extra.

And finally the closer:

The violation of trust at the heart of that story — of trust by those with the greatest reason to trust — cries out for sympathy. It illustrates the limits of law, not the need for more of it.

The stories of con men (and the occasional con woman) go back centuries, and perhaps there’s nothing new to see here. But I think this story should serve as a wake-up call to those of us who see trust as the foundation of building value in online social networks.

A common criticism of online social networks is that they are less robust than offline ones because there is no substitute for the trust we build through offline interactions. But perhaps the real problem is that we have never learned how to reliably calibrate trust, offline or online. The efficiency of online communication, ideally suited to keeping us more informed, can also propagate disinformation at unprecedented rates (cf. information accountability). We need to learn how to manage our trust more rationally. Perhaps technology can help.

Categories
General

The Macroeconomics of Information and Attention: How People Interact

In my previous posts, I discussed applying Mankiw’s Brief Principles of Macroeconomics to the attention economy postulated by Herb Simon and went through the first seven of ten economic principles, which concern how people make decisions and how the economy works as a whole. In this final post of the series, I’ll consider the last three principles, which concern how people interact.

8. A Country’s Standard of Living Depends on Its Ability to Produce Goods and Services.

Over the past two decades, we’ve increasingly heard that we live in an information economy. In the United States, the information economy has been estimated to represent 63% of the total GNP–and that estimate is over a decade old! Even allowing for the inherent challenge in defining the information economy, it’s clear that a large fraction of the goods and services produced in the United States represent information goods, and–present economic concerns notwithstanding–have contributed to the steadily advancing standard of living in this country.

But what if we restrict our attention to the information and attention markets, rather than overall standard of living? Can we still derive insight from Mankiw’s principle?

I think we can best answer that question by looking at asymmetries in the global information market. Information providers, in which I’ll include everything from traditional media companies to web search engines, are heavily concentrated in the United States. As a result, far more attention flows into the United States than out of it. In global economic terms, the United States has an attention trade surplus.

9. Prices Rise When the Government Prints Too Much Money.

Of course, there’s no government printing a liquid currency specific to information or attention. Nonetheless, we can see effects akin to inflation when larger amounts of information become available to people without a corresponding increase in the value that information represents. This information glut is the root cause of information overload, and the result is that all information becomes perceived as less valuable.

On the other side, there can be no inflation in the attention market, since people’s attention represents real, rather than nominal, value. If everyone were to have their 15 minutes of fame, then the fame wouldn’t be worth much.

10. Society Faces a Short-Run Tradeoff Between Inflation and Unemployment.

Here I have to admit that it’s a bit of a stretch to apply this principle of macroeconomics to information markets. But this is the last of the ten principles, so I at least owe it a try.

Reducing inflation, in the sense described by the previous principle, requires reducing (or slowing the growth of) the amount of information available for consumption. Naturally, information producers resist such a reduction, as they would like to use this information to gain the attention of information consumers. But it’s a tragedy of the commons: if all of the information producers attempt to optimize for their self-interest independently, the result will be a devaluation of everyone’s information.

This is the hard choice we face as a society when we attempt to remove friction from information and attention markets. It is tempting to reduce the cost of publication to essentially nothing and to optimize the liquidity of attention markets through auction models like those used for search advertising. We can introduce friction to reduce inflation, but only at a cost.

To sum up: information and attention may not be traditional economic goods, but they nonetheless follow general economic principles. And technologists who work with information would do well to learn from those principles.

I’d like to close this series with a story I heard from a colleague at Yahoo Research (either Prabhakar Raghavan or Usama Fayyad) about economics and information. Yahoo runs an online personals site, and encountered a problem common to such sites: women complaining about being inundated by email from men. Yahoo’s engineers saw this as a technical problem and brainstormed technical solutions, such as automatically detecting and filtering out messages that might draw complaints.

But an economist on the staff quickly identified the problem: the lack of scarcity in the system’s attention market. He proposed a simple solution: give men a limited supply of “digital roses” to hand out to women. Then the invisible hand of market economics solves the problem on its own.
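For readers who think in code, here is a minimal sketch of how such a scarcity mechanism might look. It is purely illustrative and not a description of anything Yahoo built: the `RoseLedger` class, the `OutOfRoses` exception, and the default allowance of three roses are all hypothetical names and numbers I made up for the example.

```python
from dataclasses import dataclass, field


class OutOfRoses(Exception):
    """Raised when a sender has no roses left to attach to a message."""


@dataclass
class RoseLedger:
    # Hypothetical allowance: how many roses each sender starts with.
    allowance: int = 3
    # Roses left per sender; senders not yet seen still hold the full allowance.
    remaining: dict = field(default_factory=dict)
    # Delivered messages per recipient, as (sender, message) pairs.
    inbox: dict = field(default_factory=dict)

    def send(self, sender: str, recipient: str, message: str) -> int:
        """Deliver a message with a rose; return the sender's remaining roses."""
        left = self.remaining.get(sender, self.allowance)
        if left <= 0:
            # Scarcity is enforced here: no rose, no message.
            raise OutOfRoses(f"{sender} has no roses left")
        self.remaining[sender] = left - 1
        self.inbox.setdefault(recipient, []).append((sender, message))
        return left - 1
```

With `RoseLedger(allowance=2)`, a sender's first two calls to `send` go through and the third raises `OutOfRoses`. The point of the design is that nothing filters messages by content: the limited supply alone forces senders to choose whose attention they ask for.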

I don’t know whether or how Yahoo ultimately implemented this approach, or whether they considered its applicability to other gender pairings. But, as information and attention become increasingly important economic goods, we would do well to learn from their example.

Categories
General

Who Invented Attention Economics?

My recent posts about the macroeconomics of information and attention triggered an unexpected controversy about who deserves the credit for the concept of attention economics. The Wikipedia entry for attention economy credits Herbert Simon, and I had always thought he came up with the idea. Perhaps I’m biased because of the five years I spent at CMU.

But Michael Goldhaber posted a comment in which he made a case that he deserved credit for introducing the idea. Unfortunately, I couldn’t ask the late Herb Simon to respond.

But I did find an explanation on the WorkingCogs blog that I thought might satisfy all parties:

Herbert Simon is often credited with being the first person to describe what attention economics is – that a wealth of information leads to a dearth of attention due to the fact that there is so much information out there and only so much attention that can be given to information, and the idea behind rationalizing how much attention any one information source receives.

Goldhaber’s (1997) seminal paper (in an online peer-reviewed journal) is however the crucial turning point for this idea. This article presents the strong hypothesis and its consequences. In what follows we will try to introduce the idea of attention economy, mostly from Goldhaber’s point of view and how some popular pages implemented attention technology. Goldhaber has been preparing a book for the last 10 years, and he blogs prolifically.

I hope this explanation offers an equitable allocation of credit and resolves the unintended controversy.