The Information Triumvirate

Nicholas Carr (of “Does IT Matter?” fame) wrote a post a couple of days ago entitled “All hail the information triumvirate!”

Here is his argument in a nutshell:

what we seem to have here is evidence of a fundamental failure of the Web as an information-delivery service. Three things have happened, in a blink of history’s eye: (1) a single medium, the Web, has come to dominate the storage and supply of information, (2) a single search engine, Google, has come to dominate the navigation of that medium, and (3) a single information source, Wikipedia, has come to dominate the results served up by that search engine. Even if you adore the Web, Google, and Wikipedia – and I admit there’s much to adore – you have to wonder if the transformation of the Net from a radically heterogeneous information source to a radically homogeneous one is a good thing. Is culture best served by an information triumvirate?

Carr discloses that he is on the Encyclopedia Britannica’s board of editorial advisors, but I don’t think he’s writing this as a hit piece against Wikipedia. Nor does he recycle the usual pablum about Wikipedia’s inaccuracies; he has surely read the research claiming that Wikipedia is just as accurate as Britannica. [Note: he has read it and criticized the study.]

I agree with Carr that our current use of Google and Wikipedia impoverishes our experience of the information that the web has to offer. In fact, that was a major subtext of my recent presentation on reconsidering relevance. I’m not sure that the dominance of the web is itself a problem, but that’s because I assume that the web is relentlessly assimilating all of the world’s information.

But Carr’s advice isn’t constructive, nor are most of the comments in response to it. Indeed, people wrongly focus their ire on Wikipedia rather than Google. It’s Google that promotes a winner-take-all information economy through its relevance-centric paradigm; if anything, Wikipedia mitigates this effect because its results are collaboratively edited.

I think that, if we constrain ourselves to a relevance-centric information seeking system, our current Google + Wikipedia model is close to optimal. To do better, we need web-scale tools that support exploratory search. Long live the HCIR revolution!

By Daniel Tunkelang

High-Class Consultant.

17 replies on “The Information Triumvirate”

Indeed, people wrongly focus their ire on Wikipedia rather than Google. It’s Google that promotes a winner-take-all information economy…

I agree.

…through its relevance-centric paradigm;

I disagree.

It’s not because Google’s paradigm is “relevance-centric”, imho. It is because Google’s definition of relevance is limited to a popularity-centric interpretation of relevance.

I was having the other half of this argument with Jeff Dalton a few months ago. He was criticizing the TREC evaluation suites, talking about how TREC really falls short, because they *only* focus on relevance, and not on other important things like timeliness and so on.

My own personal feeling about this is that it’s not that TREC only focuses on relevance, and not on anything else. It’s that TREC only focuses on one aspect of relevance. Topical similarity is that one aspect. Timeliness, while not a part of typical TREC data, is not orthogonal to relevance. It is but another aspect of relevance.

So my feeling is that you can criticize TREC for not having a more robust definition of relevance. But even if you add timeliness metrics to the evaluation, you’re still focusing on relevance. Similarly, you can (and should) criticize Google for also not having a more robust definition of relevance, and not giving users the means to re-sort and explore information in a way that they themselves prefer.

But users sorting and exploring information in a way that they prefer is, by definition, still “relevance”. At its core, relevance is really nothing more than “what I want”. And if TREC only evaluates topical relevance and not timeliness relevance, then it is only focusing on those users for whom topicality is what they want. Similarly, if Google does nothing more than “pagerank” its way into popular relevance, then it also fails to address the more interesting (and I would argue more ubiquitous) exploratory aspects of relevance. That’s where HCIR comes in, to address those relevance aspects.

But whether it’s timeliness, topicality, popularity, or exploration, it’s still all, fundamentally, “relevance”.

imho.

Jeremy, we’re quibbling over vocabulary. Most people use the word “relevance” today to denote some objective, scalar function that expresses a relationship between documents and queries. Someone like Tefko Saracevic or Stefano Mizzaro would take a broader view, but sadly these folks are in the minority–even within academia.

So I think we are trying to say the same thing, only using different words. You’d like to restore the broader sense of relevance, while I’ve written off that possibility and reject the focus on relevance in its narrower sense.

Perry, thanks for the link. Interestingly, when I went to SIAM Data Mining 2002, Sergey Brin delivered a keynote, and I asked him whether Google’s PageRank algorithm (which hadn’t been tweaked as much back then) was causing the “rich get richer” positive feedback phenomenon that Gary describes. He said he hadn’t thought about it. Maybe he hadn’t back in 2002, but surely by now he realizes that he bears some of the responsibility for it.

I agree. I think we are quibbling over vocabulary. I understand what you mean. And you understand that I understand what you mean. And we both know that we both have goals that are very common in spirit, and that we both reject the Googlified focus on relevance in its narrower sense.

But vocabulary is very important. Whoever controls the vocabulary succeeds in framing the discussion. That’s why you have two sides of very heated issues, such as abortion, using words such as pro-choice vs. anti-life, and anti-choice vs. pro-life. The pro-choice side rejects the anti-life vocabulary, and the pro-life side rejects the anti-choice vocabulary.

So right now, Google is winning the vocabulary war over “relevance”. Or “relevancy” as they often annoyingly call it. They’re winning it because they’ve succeeded in conditioning most folks to believe that what Google does is what relevance is. What Google’s competitors do is not what relevance is.

And the problem with just ceding the definition of that term to Google is that, from a marketing standpoint, people don’t want anything other than relevance. They’ve been sold on this idea that the most important feature of a search engine is “relevance” (the same way they’ve been sold, falsely I might add, on the idea that the most important feature for determining image quality in a digital camera is its megapixels).

And so if we try to sell them on the idea of going “beyond” relevance, they just kinda look at you funny, the same way a meat-lover might look at you when you offer them a deliciously prepared, 5-star tofu cutlet. “But it’s not meat! I don’t want it!” is the reaction you’ll get.

I think it’s very important not to give up control of the vocabulary war. I think it’s important to make very clear that “Google Relevance != Information Need Relevance”. Because relevance is what the public wants. And relevance, true relevance and not google-relevance, is what the public should get.

So yes, you’re absolutely correct; it is just a vocabulary quibble. And we do, I think, very much agree on the underlying concepts. I support the cause for which you are fighting.

But words are important.

Point taken. And you can even see that, for my talk, I chose the title “Reconsidering Relevance” rather than “Rejecting” or “Beyond”. Though I did then go on to criticize the “relevance-centric” paradigm.

I suppose I should be consistent. But it’s hard when I am trying to bridge the gap between two world views. It’s not a question of ceding the definition to Google (and, for that matter, to most of the IR community), but rather executing a long-term strategy to reclaim it.

Gregory, I hadn’t read his criticism of the Nature study. I’ve corrected my mistake in the post.

But even if Britannica turns out to have better accuracy, its coverage is so poor relative to Wikipedia’s that it’s worthless to me as a resource.

And the great thing about Wikipedia is that it is a great resource for researching this very question:

http://en.wikipedia.org/wiki/Reliability_of_Wikipedia

Given your vitriol and apparent concern for accuracy, I do find it intriguing that you launched a company that proclaims:

MyWikiBiz is a new directory where you can author your legacy on the Internet. We think you are notable, even if Wikipedia has rejected an article about you or your enterprise as being “non-notable”.

MyWikiBiz does not claim, nor has it ever claimed, to be an encyclopedia or a “sum of human knowledge”. I also have never attempted to expense a visit to a Moscow massage parlor to my non-profit organization, as did the co-founder of Wikipedia.

Sorry, but garbage like that makes the vitriol flow in me.

If you want a nice, narrative story about my personal experience with Wikipedia, here you go:

http://www.mywikibiz.com/Directory_talk:MyWikiBiz

The link provided above for “Reliability of Wikipedia” produces an interesting conundrum. In the second paragraph of that article, we find the name of reporter Jim Giles incorrectly written as “Gile”, and we see the Encyclopedia Britannica referred to as “Britannica Encyclopedia”.

So, only two paragraphs deep into an article discussing the reliability of Wikipedia, we have an unreliable Wikipedia!

Gregory, I think you’re splitting hairs, no hairline puns intended. 🙂

But I reiterate my point–Wikipedia makes up in coverage for its imperfections in accuracy. The Encyclopedia Britannica gave me a free premium subscription, and I gave up on it within days after discovering it didn’t have anything I was looking for.

Moreover, Wikipedia’s collaborative editing process seems to work most of the time, your personal experience notwithstanding. That may be less true for controversial subjects, but most scholarship isn’t especially controversial.

In fact, readers here have made some nice contributions to Wikipedia entries, e.g., https://thenoisychannel.com/2008/05/31/your-input-really-is-relevant/

it seems that google appeared because the size of the web was growing so fast. did wikipedia appear because the amount indexed by google grew so large? xmillion results per query… what will happen when the knowledge in wikipedia reaches a certain size? will something be created to distill that? or, as you say, has wikipedia got the optimal model for being the distilled centre on the internet?

I think Wikipedia is self-distilling–people already split entries, merge them, etc. Perhaps the better question is whether Wikipedia will collapse under its own weight if its model fails to scale gracefully. So far, I’m impressed at its resilience. I really wish they would insist on non-anonymous authorship, but I cannot argue with its overall success.

I agree that Wikipedia is an incredible phenomenon, and I too have done a little side-by-side comparison of available content:

http://www.mywikibiz.com/Wikipedia_versus_Encyclopedia_Britannica

Note that my experiment found there to be 14 times more subjects on Wikipedia than Britannica.

I’m glad you said something about non-anonymous authorship, though. Most of my beef with Wikipedia is that it is directed by lying, sleazy, careless individuals who tend to take no responsibility for its editors, who more often than not are pseudonymous and hide their identity from the public.

The product could be so much better at this point, but the leadership just doesn’t give a crap.

Taner Akcam spent a few hours in detention at an airport, thanks to anonymous libel on his Wikipedia biography. That is inexcusable, and something should have been done long ago to prevent a repeat of that incident.

I am surprised to see how little debate there is around power and who and what constitutes authoritative knowledge.

The problem is not rooted in the production of knowledge (whether wikipedia – where I have got a login and can edit knowledge – or google or any other search engine, offline or online) but in the fact that knowledge provided by sources that have gained a certain status is no longer questioned and is thus taken for granted. Without questioning contemporary discourses embedded in cultural contexts, which are no doubt reflected in wikipedia entries, users become nothing but passive consumers – mind you, this argument is not exactly brand new either, as The Frankfurt School with Adorno et al. were arguing similarly about media in the late 1930s.

Britta, I’m not sure I follow your argument.

There’s no question that any claims of knowledge reflect the cultural contexts of those who make those claims. And not just the cultural contexts, but also the interests of the individuals, who may represent commercial, political, or activist institutions.

The problem we face as a society is that, in the face of information overload, people need tools to organize, filter, and prioritize the available information. The Google approach focuses on rank-ordering it–and, not surprisingly, that leads to a promotion of Wikipedia.

My objection to this approach is that, rather than organizing the world’s information, it referees it. That’s why I advocate for exploratory search tools. But it’s also why I focus my critique on Google rather than Wikipedia.
