The Noisy Channel

 

Can We Learn From Anti-Social Users?

November 21st, 2009 · 7 Comments · General

One of the interesting challenges we face as both developers and consumers of search technology is that social signals are a double-edged sword. On one hand, social signals have proven essential in distinguishing signal from noise–be they links, re-tweets, or any number of other ways that online consumers (or more correctly “prosumers”) actively and passively communicate value judgments about information. On the other hand, our reliance on these social signals makes us vulnerable to positive feedback and spammers.

Consider MusicLab, an “experimental study of self-fulfilling prophecies in an artificial cultural market”. In this study, sociologists Matt Salganik, Peter Dodds, and Duncan Watts manipulated the social information available to consumers (specifically teens) regarding their peers’ musical tastes. The experimenters’ goal was to empirically validate a quantitative model of social contagion.

But we can look at this study another way: by isolating the social factors that influence musical taste, the experimenters were also isolating the non-social signal–in theory, how popular a song would be in the absence of social signaling. Indeed, they found that, if they measured a song’s quality by isolating out the social factor, “the best songs never do very badly, and the worst songs never do extremely well, but almost any other result is possible”.

It’s interesting–interesting to me, at least!–to ask if search engines can do the same for search. One of the frequent objections to link-based authority measures like PageRank is that they make the rich get richer. “Real-time” variants like re-tweet frequency (and even TunkRank) suffer from the same weakness. Unchecked, these measures can cause the authority / influence market to resemble a winner-take-all market.
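To make the rich-get-richer worry concrete, here is a minimal sketch of a toy market. It is not the MusicLab model or PageRank itself. It assumes every item has identical intrinsic quality, so any inequality in the outcome comes purely from feedback. The function names (`gini`, `simulate`) and all parameters are hypothetical, chosen only for illustration. In the "social" condition each pick is weighted by how often an item has already been picked; in the "independent" condition every pick is uniform.

```python
import random


def gini(counts):
    """Gini coefficient of a list of non-negative counts.

    0 means every item has the same share; values near 1 mean
    a single item holds almost everything.
    """
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def simulate(num_items, num_picks, social, seed):
    """Simulate num_picks choices among num_items equally good items.

    Assumption (not from the post): with social=True, an item is chosen
    with probability proportional to (times already chosen + 1), a simple
    stand-in for popularity feedback. With social=False, choices are uniform.
    """
    rng = random.Random(seed)
    counts = [0] * num_items
    for _ in range(num_picks):
        if social:
            weights = [c + 1 for c in counts]
        else:
            weights = [1] * num_items
        choice = rng.choices(range(num_items), weights=weights, k=1)[0]
        counts[choice] += 1
    return counts


if __name__ == "__main__":
    with_feedback = simulate(20, 5000, social=True, seed=0)
    without_feedback = simulate(20, 5000, social=False, seed=0)
    print("Gini with social feedback:   ", round(gini(with_feedback), 3))
    print("Gini without social feedback:", round(gini(without_feedback), 3))
```

Because the items are identical by construction, the gap between the two Gini values is one rough way to see how much of the final ranking is produced by the feedback loop rather than by anything intrinsic to the items.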

It strikes me as interesting to learn from cases where searchers swim upstream against the social signals to find information. Of course, you may already see the contradiction–this is just another kind of social signaling! Still, it seems like it might be a way to hedge our bets against the weaknesses of positive feedback and spammers. In a similar vein, we might look at how users find information that suffers from poor accessibility or retrievability.

I don’t have answers about how to pursue such an approach, or whether it would even be feasible to do so. But I hope you agree with me that it’s an interesting question.

7 responses so far ↓

  • 1 jeremy // Nov 22, 2009 at 1:56 am

    Oh, I completely agree that it’s an interesting question! In particular, you write:

    It strikes me as interesting to learn from cases where searchers swim upstream against the social signals to find information. Of course, you may already see the contradiction–this is just another kind of social signaling!

    I don’t know if you remember at SIGIR this year, but right after you asked your question of Barabasi, after his keynote, I asked him a question quite similar to your quote above. I wanted to know if scale free networks locked you in to PageRank-like algorithms, or whether you could use those same link signals to get just the opposite.. the hidden nuggets.

    I essentially want to know, and still want to know, if you can use pagerank against itself.. some sort of link reversal or something like that. So as to do exactly what you’re saying.. swim upstream.

    Let me point back to something I wrote in August, called “Information retrieval jujitsu”:

    http://irgupf.com/2009/08/19/information-retrieval-jujitsu/

    This is actually also one of the goals that we’ve had in our research on (explicitly) collaborative information seeking. Social search is to PageRank as (explicitly) Collaborative Search is to… Upstream PageRank. Or whatever you want to call it. Gene gives a pithy summary here, and we talk about it a lot in our papers:

    http://palblog.fxpal.com/?p=1494

  • 2 Daniel Tunkelang // Nov 22, 2009 at 10:45 am

    I do remember, and I figured you’d appreciate this question. I was also tempted to use a charged term like affirmative action to describe what such an approach might look like, but thought better of it. I prefer jujitsu as a less loaded metaphor.

    Thanks for the link to Gene’s post–it hadn’t occurred to me that positive feedback is just as much of a problem in collaborative search, though it’s immediately obvious why that would be the case. I suppose it shows that we always have to think adversarially about our own approaches.

  • 3 Vinay // Nov 23, 2009 at 8:19 am

    Related publications, in case you haven’t come across them before:
    http://oak.cs.ucla.edu/~cho/papers/cho-quality-long.pdf
    http://oak.cs.ucla.edu/~cho/papers/cho-shuffle.pdf
    http://arxiv.org/abs/cs.CY/0511005

  • 4 Daniel Tunkelang // Nov 23, 2009 at 9:22 am

    Vinay, thanks! Interesting to see that the question of whether search engines help the “rich” or the “poor” is actually a subject of debate. That last link to “The egalitarian effect of search engines” is interesting–I’ll have to read it more carefully to see whether I accept its claims.

  • 5 CJ // Nov 24, 2009 at 12:26 am

    It’s a really interesting question. Below is sort of a short brain dump of my thoughts about ranking and collaborative search and so on.

    I think that a possible solution is to get more “semantic” about things. Instead of taking a document and calculating a PageRank score (or similar) for it we should be able to understand how it fits in the world around it and in the context of a user query. I’m excited about the semantic web because I can begin to see how such things become possible. It’s not about document x being more relevant than document y in a linear way but rather document x having the potential to invite the user into the correct “information sphere” as it were. The semantic web has a way to go yet and maybe it isn’t the solution but it is important as an idea.

    A more exploratory approach makes it very expensive to spam or game search engines because they would have to create a massive network of documents and maintain them properly. It also becomes highly complex.

    I think that maybe trying to score documents as “most relevant to least relevant” isn’t the solution. I see the web a bit like space. If you’ve played the game Mass Effect (or even Spore), the way that you navigate between planets and clusters is sort of how I visualize it. I’m not saying the user interface has to be like that, but that our approach has to be more multi-dimensional, our vision has to be more all encompassing.

    Collaborative search definitely begins to work along these lines. I’ve done some “idle” research and gathered a load of instances from Twitter where people are asking for information, and also collected the responses. It would be interesting to see how far they correlate with a search engine type response. (I’m sure there’s research on this somewhere already). Anyway people get an answer to their question or get pointed in the right direction. There is no #1 result, but rather a whole world of potential or a clean answer.

    Search engines shouldn’t help the “rich” or the “poor”. A ranking algorithm always means something has to be at the top and something at the bottom. I’m excited about us not using ranking algorithms anymore at some point and coming up with something more intuitive and adapted :)

  • 6 Daniel Tunkelang // Nov 24, 2009 at 6:23 pm

    Point taken–and I’m certainly not a philosophical advocate for scalar relevance! But even in a model that exposes many facets to users, it seems inevitable that at least some of them will reflect social signaling. And I suspect that users will gravitate to those signals, e.g., bestselling products and popular songs. We are only human, and as humans we like fast and frugal heuristics.

  • 7 Phil Simon // Nov 27, 2009 at 8:50 am

    I’d be shocked if we don’t see more of these experiments given the ascent of social networking sites. I just finished The Long Tail by Chris Anderson and he addresses some of the same concerns. If PageRank and other related tools favor the “rich”, then perhaps niche blogs can introduce readers to obscure bands. Also, recommendations from sites such as Pandora often introduce listeners to newer bands.
