The Noisy Channel


Is People-Powered Search Overrated?

December 23rd, 2008 · 7 Comments · General

I recently read an article by Matthew Shaer in the Christian Science Monitor entitled “The future of search: Do you ask Google or the gaggle?” and subtitled “To improve results, new search engines rely on users instead of computers.” The article goes on to talk about Google’s SearchWiki, Jimmy Wales’s Wikia Search, and a number of “people-powered” search tools.

I agree strongly with Wales on the value of transparency and that “because search is so secretive, and so proprietary, there are fewer checks and balances”. But I agree just as strongly with Shaer that “handing over control to a community could engender a flood of spam, or devolve into a mess of internecine backbiting among users”, both of which he’s observed on the Yahoo Answers site.

Wales ultimately sees the question as not whether humans make the decisions, but rather by what process, i.e., democratically vs. top-down. His Wikia Search effort is an attempt to repeat Wikipedia’s success for general web search.

But, while I like democracy as a political system (as Churchill said, it’s the worst form of government except all the others that have been tried), I’m not sold on Wikia Search or any of the crop of people-powered search engines.

Perhaps the problem is that, much as in electoral democracy, we need to be vigilant about attempts to game the system. The anonymity of web users is as much a problem as the secrecy of search ranking algorithms, since it allows people to game “people-powered” systems with impunity.

Would a transparent people-powered search system work? Perhaps, assuming it could address the privacy concerns of users. I’m all for transparent social navigation.

But let’s not forget the other part of people power: giving users meaningful control. Crowdsourcing might improve on the current crop of ranking algorithms, but what I really want is a search engine that provides me with transparency, control, and guidance. Let me get under the hood.

7 responses so far ↓

  • 1 Zach Heller // Dec 23, 2008 at 11:46 am

    Interesting theories both ways. In some respects I think Google has always paid attention to user behavior when revamping their search formula. I think for the most part, users will define pretty good search results, and with the amount of Googlers, I don’t think spam results would be an issue. Social bookmarking sites have proven for the most part that internet users are good at finding and sharing information.

  • 2 Daniel Tunkelang // Dec 23, 2008 at 12:18 pm

    Given Google’s secrecy, all we can do is speculate about how they leverage user behavior. But we know there have been concerns about click fraud in their ad business, as well as spamdexing to game organic results.

    Given the enormous value of placement in unpaid search results, I would expect attacks on any point of vulnerability. Perhaps that’s why Google is so secretive, but I’ve never been a believer in security through obscurity.

    p.s. Not trying to single out Google here. Yahoo and Microsoft have the same issues, only at a smaller scale.

  • 3 Gene Golovchinsky // Dec 23, 2008 at 12:55 pm

    There is also the issue of breadth: the wisdom of crowds works only when (among other criteria) people’s opinions are independent of each other. Otherwise, you will re-find the same thing that everyone else found, rather than things that match your specific info need. One possible way to increase transparency in search results is to give users control over the contribution of content vs. opinion on ranking.

  • 4 jeremy // Dec 23, 2008 at 1:59 pm

    But isn’t the link structure of the web itself, and the PageRank formula that makes use of it, already a form of people-powered search? So is the concern here one of degree, rather than concept?

    I do think Gene has a really good point, too. One of the facts that is too often overlooked in all of these social media spaces is that the independence assumption often doesn’t hold. And without that, you’ve lost the foundational basis for “wisdom of crowds”.

  • 5 Daniel Tunkelang // Dec 23, 2008 at 3:39 pm

    The aggregated opinions don’t necessarily have to be independent, but it’s certainly a problem if their dependence isn’t correctly modeled. And positive feedback loops are a more general problem with any kind of recommendation / user-driven relevance system–including PageRank.

    Jeremy, you’re right that using PageRank as a query-independent relevance factor is a form of people-powered search. But, even in its original form, PageRank was designed to be harder to game than, say, popularity based on number of clicks. And, as PageRank has been gamed through link farms and other forms of spamdexing, Google has made proprietary (and secret) modifications to that original approach.

  • 6 Len // Jan 4, 2009 at 12:19 pm

    I am a little late for this discussion, and in general blog discussions tend to be one day long, which in itself is a very telling fact about the nature of Web information. But anyway, I write this comment if only to let out my feelings about the entire state of media and information.
    To begin with, I never believed in media as a reliable source of information when considered independently from other sources. The Web is no exception to that, as we can see the same kind of “pollution” as in TV and paper media.

    People-driven vs. algorithms (if one can ever be separated from the other) is not the real choice. Our options are limited by the very nature of the information available on the Web, which is merely a byproduct of the media business. No one gets paid for producing correct information or even for a genuine effort to do so. Likewise, no one buys this information as anything worthy of investing time in beyond obtaining quick and dirty input into the real (and private) quest for truth.
    Even writing this response is probably a waste of time, as no real value is at stake here – just words.

    So it follows that the best search technology is the one that produces the best raw material – not the best answers, which is impossible in principle until the Web becomes a place where real values are traded (like a stock exchange).

    Therefore the real measure of a search engine is what I would call coverage or all-inclusiveness of results, more so than focus or selectiveness of any kind. Google gives you simply the best coverage, which is why I use it most.

  • 7 Daniel Tunkelang // Jan 4, 2009 at 12:42 pm

    Len, fortunately this is a blog that doesn’t close off discussion after 24 hours.

    I agree with you on some of your points, specifically that all media have their share of misinformation and that much of the online content is derivative of offline publications.

    But I disagree with the rest:

    People do get paid to produce information: even in this economy, there are paid reporters, analysts, and consultants.

    People buy information–not all publications are published for free, and even free publications sell that information for their readers’ attention, which they in turn sell to advertisers.

    There’s more to search technology than returning the “best raw material”–and even you suggest that you don’t mean best but most when you argue that search engines should be measured solely based on their breadth of coverage. Google didn’t win the search wars by indexing the most documents; rather, it offered the best relevance ranking of the documents it indexed. You can debate what “best” means for something as subjective as relevance, but users acted on this judgment en masse.

    Surely the best search technology is one that helps you find what you are looking for as effectively and efficiently as possible. Google is certainly good enough for a significant subset of people’s information needs, and most people don’t think beyond that subset. Or, if they do, they don’t expect a search engine to address it.

    Finally, if “just words” are a waste of time, why bother learning how to read and write? Or if only facts matter, why learn how to think and analyze? There’s more to discussion than catharsis.
