The Noisy Channel

 

SIGIR 2009: Day 3, Industry Track: Vanja Josifovski

July 31st, 2009 · 37 Comments · General

After the conference banquet at JFK Library and Museum, a few of us went to Bukowski for beers. At one point in the conversation, a friend of mine railed against computational advertising as a research topic. I didn’t quite have the nerve to reply that it was one of the topics I’d picked for the SIGIR 2009 Industry Track that would take place the following day.

Finding a speaker for this subject was relatively straightforward. I hadn’t yet recruited anyone from Yahoo!, and I knew that Yahoo! was the place to look for computational advertising experts. So I emailed Prabhakar Raghavan, and he suggested Vanja Josifovski. I’d never met Vanja or heard him speak, but a quick look at his publications and experience was more than enough to convince me. I was delighted when Vanja agreed to participate, presenting “Ad Retrieval – A New Frontier of Information Retrieval“.

I was even more delighted by the actual presentation, which you can download here. Perhaps more than any of the other speakers, Vanja embodied the spirit of the Industry Track, which was to bring together the worlds of research and practice in information retrieval.

He started by making the case for textual advertising as an area worthy of study. He pointed out that, while advertising supports much of our access to search engines and online content, most users perceive ads as less relevant than the other content they access. In other words, there is a significant opportunity for those in the advertising business to broadly improve the online user experience while making money.

He then proceeded to explain the anatomy of a textual ad. If you’re not familiar with the details, I encourage you to look at his presentation. But I’ll reproduce what I feel was his most important slide here, slide #15, titled “Ads as Information”:

  • Treat the ads as documents in IR
    • [Ribeiro-Neto et al. SIGIR 2005] [Broder et al. SIGIR 2007] [Broder et al. CIKM 2008]
  • Retrieve the ads by evaluating the query over the ad corpus
  • Use multiple features of the query and the ad
  • How does Ad retrieval relate to Web search?
    • Web search:
      • Large corpus
      • Reorder the pages that contain all the query terms
    • Ad retrieval:
      • Smaller corpus
      • Similarity search rather than conjunction of the query terms: recall in the first phase is important
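
To make the contrast on that slide concrete, here is a minimal Python sketch of the two first-phase strategies. It assumes a hypothetical four-ad corpus and uses TF-IDF cosine similarity as a stand-in for "similarity search"; the slide does not say which scoring function Yahoo! used, and real ad retrieval draws on many more features of the query and the ad.

```python
import math
from collections import Counter

# Hypothetical toy ad corpus (illustrative only). Real ads carry titles,
# descriptions, bid phrases and landing-page text as separate features.
ADS = {
    "ad1": "guided tours of italy rome florence venice",
    "ad2": "cheap flights to rome",
    "ad3": "italian cooking classes in tuscany",
    "ad4": "running shoes sale",
}


def tokenize(text):
    return text.lower().split()


def conjunctive_match(query, docs):
    """Web-search-style first phase: keep only docs containing ALL query terms."""
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items() if terms <= set(tokenize(text))]


def idf_table(docs):
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    # Smoothed IDF: a term present in every doc still gets weight 1, not 0.
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    tf = Counter(tokenize(text))
    # Terms absent from the corpus get weight 0; they cannot match any ad.
    return {t: count * idf.get(t, 0.0) for t, count in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_search(query, docs, k=3):
    """Ad-retrieval-style first phase: rank every doc by similarity (recall first)."""
    idf = idf_table(docs)
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, tfidf_vector(text, idf)), doc_id) for doc_id, text in docs.items()]
    scored = [(s, d) for s, d in scored if s > 0]  # drop ads sharing no weighted term
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

With the query "rome vacation tours", conjunctive matching returns nothing because no ad contains "vacation", while similarity search still surfaces ad1 and then ad2. That is the recall-first behavior the slide argues a small ad corpus needs.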

There’s a lot more to the talk, but hopefully that slide conveys how well Vanja posed ad retrieval as a distinctive information retrieval problem worthy of researchers’ attention.

Ironically, I’m not a big fan of advertising, and I see the dominance of the ad-supported model as a bug, rather than a feature, of our current online ecosystem. But I’m realistic enough to know that this dominance is a fact of life for the foreseeable future, and I appreciate that better targeted advertising is a win/win for both advertisers and their audiences.

More importantly, I expect that efforts to improve advertising will result in advances in information retrieval that have broader applications. Vanja’s presentation advertised those benefits brilliantly.

37 responses so far

  • 1 jeremy // Jul 31, 2009 at 10:48 am

    Treat the ads as documents in IR

    Already at stage 1 of the process I run into philosophical difficulties with computational advertising. Why? Because in search engine advertising, you’re not just seeing an ad. You’re seeing a hyperlink to a larger web page, which the ad is attempting to get you to click through to.

    So the ultimate goal of the ad, the ultimate “relevance” to the user, is the web page, not the ad. And that’s where it becomes reasonable to point out that the problem with computational advertising is that, if the relevance algorithm for the main, organic results were already doing its job, it would have retrieved that ad landing page naturally, already. Without an ad having to be placed.

    So the ad gets in the way of organic relevance. If organic relevance were to actually do its job, the ad would have no reason to exist, because people would find that relevant information naturally, anyway.

    Is that what you mean by computational advertising being a “bug”?

    Bug or not, I find the attempts over the past decade to justify computational advertising via an appeal-to-relevance argument completely untenable. I just don’t buy the argument.

  • 2 Daniel Tunkelang // Jul 31, 2009 at 12:02 pm

    To clarify my bug vs. feature comment: my frustration is that online content has been relegated to the ghetto of the ad-supported model. I still cling to the old-fashioned idea that you get what you pay for.

    In any case, let’s distinguish the problem of optimizing advertising effectiveness from the problem of optimizing the retrieval of relevant documents. The goals of the advertiser and the advertisee are not entirely aligned, but not entirely adversarial either. The former explains why people distrust even “relevant” ads, while the latter explains why advertisers see relevance as a factor (albeit one of several competing factors) of successful targeting.

    In short, I’m not trying to justify computational advertising via an appeal-to-relevance argument–nor do I think that’s what Vanja did. Rather, I think he made the successful case that ad retrieval plays a key role in the real-world ecosystem of online content and at the same time offers an interesting intellectual challenge to IR researchers.

  • 3 Josh Young // Jul 31, 2009 at 1:14 pm

    Why don’t those who serve the ads charge based on the difference between the hypothetical level of relevance that would have implied an organic result and the actual level of relevance on the web page? On the one hand, it seems odd to profit from serving less relevant ads. But on the other, as both Daniels point out in different ways, the business of serving ads is a business that derives value from sources other than relevance. And this would incent advertisers to increase the relevance–or decrease the irrelevance–of their web pages.

  • 4 Daniel Tunkelang // Jul 31, 2009 at 1:37 pm

    I thought that’s sort of what happens already. To a first approximation, ads are ranked by their estimated profitability, which is bid * click-through rate. If you accept click-through rate as a proxy for relevance, then the required bid price for a given placement is inversely correlated with relevance.
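
    That first approximation can be sketched in a few lines of Python. The ads, bids and CTR values are hypothetical, and the sketch deliberately ignores what real auctions add (reserve prices, second-price charging, other quality signals); it only shows ranking by bid * CTR and why the bid needed to reach a given score falls as CTR rises.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float  # advertiser's maximum cost per click, in dollars (hypothetical)
    ctr: float  # estimated click-through rate, a probability in (0, 1]

    @property
    def score(self) -> float:
        # Expected revenue per impression under this simple model: bid * CTR.
        return self.bid * self.ctr


def rank_ads(ads):
    """Order ads by estimated profitability (bid * CTR), highest first."""
    return sorted(ads, key=lambda ad: ad.score, reverse=True)


def bid_to_match(rival: Ad, ctr: float) -> float:
    """Bid an ad with the given CTR needs to equal the rival's score.

    Since score = bid * ctr, the required bid is rival.score / ctr: the higher
    the CTR (the rough relevance proxy), the lower the bid needed.
    """
    return rival.score / ctr
```

    For example, with ads A (bid $2.00, CTR 1%), B ($0.50, 5%) and C ($1.20, 2%), rank_ads orders them B, C, A, and an ad with a 1% CTR would need to bid about $2.50 to match B's score, versus $0.50 at a 5% CTR.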

  • 5 Neal // Jul 31, 2009 at 1:40 pm

    TV is financed by advertising, as is radio and every other form of content not siloed behind a paywall. It’s not a bug. If neither people nor advertisers are willing to finance the content, then this is a good indication that the importance and value of that content are low.

    If a user types “tours of italy” into a search engine, why are the advertisement links any less relevant than the main content links? I found them indistinguishable in Google/Yahoo/Bing as far as relevance goes. Granted, if you are using a search engine for typical geeky activities, perhaps adverts are not very relevant.

    Here’s the simple plug for computational advertising as a research topic and technical pursuit: increased relevance of the advertising is directly measurable on an economic basis. More clicks and ‘better’ clicks (those leading to purchase activity after the click) lead to a clear and measurable benefit to all involved.

    ir+optimization+economics. What’s not to like? It finances all major search engines and thus /their/ research into IR. Many fields of study in AI and CS are quite narrow.. how many of those are worth $50+ billion annually worldwide?

  • 6 Daniel Tunkelang // Jul 31, 2009 at 1:44 pm

    Almost by definition, content that isn’t behind a pay wall is financed by advertising. So let me further clarify my frustration: it’s that there’s a market preconception that content should be financed by advertising, while most other goods and services are financed by their direct consumers. I think that’s an inefficient allocation of people’s money and attention. But it’s what the market yields, and I do live in the real world, much as I may find some of its quirks frustrating.

    As for your plug, I’m 100% with you. I hope I conveyed some of that in the post.

  • 7 dinesh vadhia // Jul 31, 2009 at 1:49 pm

    Agree with comment #5.

    Wrt the last point on slide#24 re: “Can the industry come up with a corpus of ads?” … It would be much easier if Yahoo! et al provided the corpus of ads so that IR researchers have a data set to work with from the get go. They can also provide a corpus that has been sanitized wrt privacy etc.

  • 8 jeremy // Jul 31, 2009 at 4:37 pm

    TV is financed by advertising, as is radio and every other form of content not siloed behind a paywall. It’s not a bug.

    Apples and oranges, Neal. When you’re watching television, let me ask you: Do you have an explicitly formed information need? Are you seeking information on what new car to buy, while you are watching Lost? No. You don’t. Therefore, while television is ad-supported, the ads are truly and completely and logically separate from your main activity: being entertained.

    Now, let’s go to your other example:

    If a user types in “tours of italy” into a search engine, why are the advertisement links any less relevant than the main content links?

    Aha! Now we’re in a different scenario. The user has an explicit need. And is seeking information on that need. And from the user’s perspective, all the user wants is information that is relevant to that need.

    And all an IR system designer should be designing for is to present all the most relevant information, in the best order possible, to that user.

    But this isn’t what is happening. What is happening is that some pieces of information are being split onto the left hand side, and some pieces of information are being split onto the right hand side. Why? If they are truly equally relevant pieces of information, shouldn’t all the information appear on the left? And if the left and the right are not equally relevant, shouldn’t the side that is less relevant be demoted in position, relative to the other side? By promoting items equally, on both sides, the search engines therefore fail not only to present the most relevant information in the best possible order, they also succeed in confusing in the mind of the user what information should be examined, and in what order.

    I therefore maintain my position that there is no such thing as computational advertising as a separate IR topic. It is either IR and relevance, or it is not.

  • 9 Daniel Tunkelang // Jul 31, 2009 at 4:42 pm

    What if we assume, as a given, that a search engine or online media site will be ad-supported? In that case, the question is not whether to subject the user to advertising, but how to make the best of the requirement to do so in order to sustain the revenue model. Surely we can agree that this is a challenging problem that fits into the intellectual landscape of information retrieval, no?

  • 10 jeremy // Jul 31, 2009 at 4:45 pm

    @Daniel: In any case, let’s distinguish the problem of optimizing advertising effectiveness from the problem of optimizing the retrieval of relevant documents.

    My point is that this is exactly what we should not be doing. If the information hyperlinked to by the ad is relevant, then that information should be found organically, by the search engine.

    There is no difference in optimization if your main consideration is relevance to the user.

    If your main consideration is making money on the other hand, then yes, you can distinguish the two. But you are doing so at a cost to the user: muddied and mixed message relevance. Two ranked lists, rather than one.

    And don’t even get me started on the double relevance optimization standard and diverging definitions of relevance that Google openly admits to when it allows ads for [adidas] to appear when I search for [reebok]. I wrote about this the other day, but there is certainly much more that could be written about this topic:

    http://irgupf.com/2009/07/13/will-all-relevance-be-googly-aka-googles-microsoft-moment/

  • 11 jeremy // Jul 31, 2009 at 4:46 pm

    There is no difference in optimization if your main consideration is relevance to the user.

    Let me rephrase that:

    There should be no difference in optimization if your main consideration is relevance to the user. If there is a difference, the difference is not due to relevance. The difference is due to $$$.

  • 12 Daniel Tunkelang // Jul 31, 2009 at 4:54 pm

    But that’s my point, the main consideration for advertising is not relevance to the user, no matter what Google or anyone else says, but rather making money. Indeed, Vanja didn’t make any such disingenuous claim. Rather, his message was consistent with what Yahoo says on its computational advertising page:

    Computational advertising is a new scientific sub-discipline, at the intersection of information retrieval, machine learning, optimization, and microeconomics. Its central challenge is to find the best ad to present to a user engaged in a given context, such as querying a search engine (“sponsored search”), reading a web page (“content match”), watching a movie, and IM-ing.

    Some amount of relevance to the user happens to be a means to that end. The end goals are to make money and sustain an ad-supported ecosystem.

  • 13 jeremy // Jul 31, 2009 at 5:34 pm

    What if we assume, as a given, that a search engine or online media site will be ad-supported? In that case, the question is not whether to subject the user to advertising, but how to make the best of the requirement to do so in order to sustain the revenue model. Surely we can agree that this is a challenging problem that fits into the intellectual landscape of information retrieval, no?

    It’s a pretty big pill to swallow. But sure, if we just assume that this is the way things are, and don’t care about the user-oriented relevance limitations that arise out of this business model architecture, don’t care about providing a consistent user relevance experience, and are therefore content to provide the user with a suboptimal experience as a result of our inability to make a product that people find valuable enough to pay for on its own, then sure. Let’s go for it.

    But we have to be clear, and everyone has to understand, what choice we are making. We can’t just sweep it all under the happy-carpet with “oh yes but it’s all relevance” generalizations. We have to be conscious of the fact that having two kinds of relevance, separate-but-supposedly-equal relevance, is a suboptimal decision with not only algorithmic consequences, but user-interface and, ultimately, user task and user satisfaction consequences.

    Are we conscious of that fact? Sometimes I’m not sure that we (collectively) even are.

  • 14 Neal // Jul 31, 2009 at 5:56 pm

    @Daniel Agreed. Advertising that is not important to the user is not relevant.

    @Jeremy We’ll have to agree to disagree here. I’m having trouble judging your motivations. Either you dislike advertising and the way it’s presented on the web and in web search (i.e., you find it implicitly deceptive), or this is a very narrow point about when a community gets to decide that it is a sub-discipline of X.

    It is a relevance game. If you have a choice between an ad that is relevant and pays lower versus a less relevant ad that pays higher.. the ad-matcher needs to weigh the balance and decide. If more relevance leads to more clicks then you win. If the ad-matcher system says an ad is relevant and there are not enough clicks.. then it was wrong.

    What’s the real difference between biasing a std IR relevance model (vector space model) with the stationary distribution of the link graph (pagerank) versus the market driven attributes of bid rate and expected # of clicks? It’s just a different metric to bias against with the hypothesis that relevance improves.

    Perhaps this is your point that it’s just IR. If so I agree.

    Besides, who is to say that biasing via economic value of keywords isn’t a better idea than tf-idf alone? Where is this oracle of correct relevance that says the methods of comp-advert are ‘wrong’?

    On a previous point: Note that if you read the comp-advert lit you will see them pulling text from the landing pages to improve ad ranking.

  • 15 jeremy // Jul 31, 2009 at 8:10 pm

    But that’s my point, the main consideration for advertising is not relevance to the user, no matter what Google or anyone else says, but rather making money. Indeed, Vanja didn’t make any such disingenuous claim. Rather, his message was consistent with what Yahoo says on its computational advertising page:

    Daniel: Let me apologize if any of what I said came across as specifically critical of Vanja. That was not my intent.

    You are correct in pointing out that he is not inconsistent.

    What I disagree with is the overall assumption that we’re (collectively) making about the necessity of “advertisarial” retrieval and relevance being distinct from retrieval and relevance. As you write:

    He started by making the case for textual advertising as an area worthy of study.

    If we’re going to start creating and defining new subfields of IR, then let me propose another area worthy of study: Relevance Reintegration.

    I define this as the process of taking information that has been scattered and separated and segregated into left- and right-hand side column (thereby forcing the user to wonder which column is more or less relevant, forcing the user to wonder whether item #4 in the left column is more relevant than item #2 in the right column, or vice versa) and reintegrating it into a holistic relevance-ranked totality.

    Essentially, this field of study would be attempting to undo what the other field of study has done, to make it easier for the user to find relevant information and stop having to schizophrenically bounce back and forth between two columns on the page, wondering about the relative value of each column.

    Sound like a good intellectual challenge? I think so too ;-)

  • 16 jeremy // Jul 31, 2009 at 8:35 pm

    Neal, we can agree to disagree if you want, but I have to respectfully say that you’re still not gettin’ it. I’ll accept full responsibility for that, and try one more time to be a little clearer. But let’s defer actual disagreement, until I’ve been able to clearly explain my position.

    It’s not that I believe ads shouldn’t be ranked by relevance. It’s not that I think ads are (necessarily) deceptive. It’s not that I think the search engines don’t use landing page information to assist their ranking scores (I of course know that they do). It’s not that I even (necessarily) think that some sort of “money” feature is, by itself, incapable of improving relevance. (That last statement is going to sneak up and bite me at some point in the future, but hopefully it won’t be taken out of context.)

    Rather, my point is that if the information in the landing page from the ad is relevant to the user’s information need, then that information (that landing page) should appear ranked highly in the organic results. From the user’s perspective an ad does not have to exist in order for the landing page to be relevant to a user’s information need. A good search engine should be able to find that landing page, organically, just as it finds other pages.

    The very existence of two separate ranked lists on the results page (left and right columns) is, from the perspective of the user trying to satisfy his or her own information need, a useless inefficiency. The relevant information is the relevant information, and should be presented first. If it is not presented first, the result is non-optimal and it negatively impacts the user.

    Does that make sense? Do you have enough to disagree with, now? ;-) Be my guest ;-)

  • 17 jeremy // Jul 31, 2009 at 8:52 pm

    The very existence of two separate ranked lists on the results page (left and right columns) is, from the perspective of the user trying to satisfy his or her own information need, a useless inefficiency.

    ..and just to drive the point home, the corollary to this statement is that, when the search engine finds those relevant landing pages organically, anyway, the necessity for advertising ceases to exist.

    So it’s not that I don’t like how the ads are presented, or whatever.

    It’s that the very existence of the ads, in any form, represents a serious search engine inefficiency and non-optimality.

    So why should we be studying something that only increases search engine inefficiency, and our dependence on that inefficiency? We should instead be studying how to get rid of this inefficiency, and thereby improve the overall user experience.

    That’s the core of my objection.

  • 18 jeremy // Jul 31, 2009 at 9:15 pm

    @Daniel: Some amount of relevance to the user happens to be a means to that end. The end goals are to make money and sustain an ad-supported ecosystem.

    I absolutely agree with that. What we as researchers need to be aware of, though, is that the existence of this separate ecosystem, which none-the-less gets mixed into the overall user experience of someone seeking relevant information, naturally conflicts with that overall user experience.

    I’m again not saying it’s deceptive, because the ads are clearly labeled. But labeling isn’t the only factor impacting user experience. The existence of ads in a separate column makes it confusing to the user as to which order he or she should look for relevant information. Should he or she scan down to Result #10 in the left-column, then go back up to Ad #1 in the right column? Or should the user alternate.. Result#1, Ad#1, Result #2, Ad#2? What’s the real best order? Not knowing what to do is confusing and stultifying.

    Similarly, when I click to Page #2 of the results, often I see the same ads from Page #1, but different organic results. This further confuses me.. are the ads not a linearly-scrolling list, like the organic results? Do they cycle? If they cycle, what is their cycle frequency? Should I look through 4 pages of organic results before I start looking at ad results? Should I look at one page of organic results, then try to see how many ads I can get to cycle, to see if I can find relevant information that way?

    It’s a totally confusing, muddied, imbalanced user experience. Not the organic results alone, not the ads alone. But the page as a whole, and all the mixed motives present on that page.

    I’ll say it again: We shouldn’t be proposing Computational Advertising as a new field of study. We should be proposing Relevance Reintegration as the new field, as a way of overcoming all this relevance muddiness.

  • 19 Daniel Tunkelang // Aug 1, 2009 at 11:49 am

    Wow, it’s nice having a heated comment thread again–feels like it’s been weeks!

    Jeremy, I think we mostly agree. I don’t think ads directly improve the user experience, and I’d say they usually impede it. That’s why I block ads personally. But, recognizing the role ads serve in an ecosystem I otherwise enjoy, I see why it’s worth making them better.

    And perhaps there are better ways to combine ads and non-sponsored results to improve user experience (assuming the former as a given)–it’s actually a problem I haven’t thought about. If anything, you’re saying that, in terms of thinking about advertising, there’s a situation analogous to that with non-sponsored retrieval–we have to think beyond filtering and ranking, and in terms of an overall user-centered experience. HCIR for advertisers? :-)

  • 20 Neal // Aug 2, 2009 at 12:08 am

    @jeremy You make your points well, yet they rest on a few key (partially unstated) assumptions. The foremost unstated assumption is that there should be one and only one list of relevant results. The second is that organic relevance is the only kind of relevance. The third is that the types of links and landing pages in adverts are traditional webpages that are not terribly dynamic in content.

    Content within newspapers can generally be grouped into four categories: news, analysis, opinion and advertising.. and in that medium their subjective relevance relative to ‘facts’ is approximately in that order. These types of content are clearly marked as such in a newspaper. Why should we desire a universal relevance ranking on the WWW with more than four kinds of content? Even phone books divide themselves into residential, govt, business and yellowpages.

    How should pages be organically ranked for a query such as “rome, italy” where the user might be looking for informational pages, reviews, history, business listings or offers for travel/tours for that destination? I dispute that results should be ranked as if there is one universal ranking… ala your ‘relevance reintegration’!

    Note that many of the landing pages from advert links are terribly dynamic in nature. They are designed to sell something to a potential customer, and savvy marketers vary the content, layout etc. to attempt to find a set of optimal paths to acquiring a customer. Companies like AdChemy and others are in business to optimize these pages automagically. As far as I am aware, webpages that look like SEM doorways are punished in ‘organic’ search engines.

    Organic results seem to implicitly prefer medium depth or deep links to content for queries and mostly spurn links to top level navigation/directory pages (this is overly broad yet mostly accurate). Doesn’t this present a challenge to your relevance reintegration?

    As for comp advert being not sufficiently differentiated to be a sub-discipline.. isn’t the community self organized and the concept of birds-of-a-feather meetings a well established tradition? Color me confused.

    I guess my summary is that your (well stated) arguments smell a little too abstract to me.. as a practical matter advert links are separate and will remain so until something disruptive happens. IMO they should remain separate as most attempts to merge them have resulted in failed search engines.

    As an aside I love your idea of relevance reintegration in general. The need is particularly acute in ‘federated search’. I worked in this area for a while and the questions arising from the how/where of ranking documents from disparate data sources were not easily solved. For some random query how should a so called enterprise federated engine decide ranking for product marketing, customer service, user forum and other self-similar documents? Most attempts seem to punt and federated search devolves into a single search box serving many result lists.. pretty sub-optimal for a good user experience in general. I’m all for your ethos for this and similar problems.

    Great debate here!

  • 21 jeremy // Aug 3, 2009 at 2:18 pm

    First let me say: I totally agree with you, Neal, that we have a great debate going on. This, to me, is what the web should be about. Too often it turns into Slashdottesque flame wars, or simply peters out with no discussion at all. Rather, I like the “back of the pub” feeling that we’re engaged in.

    That said, you’re wrong! wrong! wrong! :-)

    But seriously, you do have a point when you say that perhaps my arguments are a little too abstract. I would argue, however, that the reason they’re abstract is because what I propose doesn’t exist yet. That abstractness should be a call-to-arms, a rallying cry to improve. Not a weakness of the arguments themselves.

    The main problem I have with sticking to the status quo is that it does not respect the information needs of the user. It’s not user-driven.

    All the problems you mention are overcome-able, but only if we (as an industry) have the will to try and overcome them. The problem is that we do not possess that will.

    Let me address each of your three main points:

    (1) The foremost unstated assumption is that there should be one and only one list of relevant results.

    That’s not exactly what I am saying / assuming. I am not against the factoring or faceting of results into multiple camps of information. What I am saying, however, is that those factors or facets should be user information-need driven. Not advertiser- or search engine-driven.

    Let me give a concrete example: Suppose I, the user, search for [reebok shoes]. Right now, I get organic results on the left, sponsored results on the right. Two columns, two facets. However, those facets are not driven by my information need. My information need doesn’t particularly care about the pathway to the information, i.e. whether it was an algorithm or money that led me to a result. Rather, what is important to me in my information need is whether I can find those shoes locally, at a nearby store, or remotely, from an online retailer.

    So you can still have two columns of information, but one column should be the local organic results, and the second column should be the online retailer organic results. And sponsored links, instead of appearing only in one column or the other, could/should be scattered across the columns, too — sponsored results from the local stores as well as from the online stores.

    By factoring the results into purely “organic” and “sponsored”, the search engine therefore does not respect the user’s information need. We need relevance reintegration.

    This is just one example, but does illustrate the point.

    (2) The second is that organic relevance is the only kind of relevance.

    So maybe I should state my position slightly differently: There is only one kind of relevance: User information need satisfaction relevance. Again, the user does not care about the pathway taken to get to the information that satisfies his or her need. The user only cares about satisfying his or her need.

    My point about how ad landing pages should be found organically was not to say that organic relevance is the only kind of relevance. It was to say that user-need-satisfaction relevance is the only kind of relevance. And a good search engine should be able to find the information that satisfies a user information need, if that information exists on the web. An ad doesn’t actually help the user; it only helps the search engine.

    (3) The third is that the types of links and landing pages in adverts are traditional webpages that are not terribly dynamic in content.

    No, I do not make this assumption. There is no reason why landing pages in adverts have to be static, in order for search engine to be able to find them organically. Instead, I think that is an assumption that you are making — that they can only be organically findable if they are static — not me.

    Search engines index dynamic content all the time. They even have web crawlers and indexers that are specifically geared toward non-uniform crawl policies, based on the relatively dynamic nature of web pages. Pages that update frequently get crawled more often than pages that do not. And the search engines have also recently committed to real-time search, with the advent of Twitter. This is something that they’re already in the process of doing.

    So I completely disagree; just because an ad landing page is not static doesn’t mean a search engine can punt on its main task of finding user-relevant information.

    In summary, my point is that the user’s information need is often orthogonal to the split of information into dual organic/sponsored columns. And this results in a schizophrenic user experience, with the user jumping back and forth between the columns, wondering where the more relevant information is to be found. Advertising, therefore, directly interferes with a good user experience. That unnatural split is what needs to go away.

    And you don’t necessarily have to create a single ranked list. You can still use facets. But the facets should be natural facets, information need-driven facets. Not artificial “organic relevance vs. ad relevance” facets.

  • 22 jeremy // Aug 3, 2009 at 2:34 pm

    @Daniel:
    I think we mostly agree. I don’t think ads directly improve the user experience, and I’d say they usually impede it. That’s why I block ads personally. But, recognizing the role ads serve in an ecosystem I otherwise enjoy, I see why it’s worth making them better.

    But doesn’t that only perpetuate a flawed ecosystem? Isn’t that like saying, “Yes, I see that our fossil-fuel-based economy is destroying our planet. But, recognizing the role that fossil fuels serve in our economic ecosystem, I see why it’s worth making them better,” i.e., let’s build more “clean” coal plants rather than develop wind or solar?

    My feeling is that we would be better off figuring out how to wean ourselves from fossil fuels, rather than investing more time and research in figuring out how to become even more dependent on fossil fuels by making them more effective.

    Same, by analogy, to advertising in the information seeking process.

    And perhaps there are better ways to combine ads and non-sponsored results to improve user experience (assuming the former as a given)–it’s actually a problem I haven’t thought about. If anything, you’re saying that, in terms of thinking about advertising, there’s a situation analogous to that with non-sponsored retrieval–we have to think beyond filtering and ranking, and in terms of an overall user-centered experience. HCIR for advertisers? :-)

    Yes, again, I don’t think the former is a given. But assuming that we do accept it as a given, then yes: I am saying that our current search experience is not very user-driven. HCIR should encompass all the information on a results page — ads included.

    Am I really so novel in even suggesting this?

  • 23 Daniel Tunkelang // Aug 3, 2009 at 2:41 pm

    I don’t think that fighting the system and working to improve the system are mutually exclusive. I’d rather see fewer single-occupant cars on the road, but I still am happy to see efforts to improve fuel efficiency. Picking your battles doesn’t mean giving up on the war.

    Anyway, I haven’t seen anyone apply HCIR ideas to advertising. I think it is novel.

  • 24 jeremy // Aug 3, 2009 at 2:48 pm

    @Neal: As for comp advert being not sufficiently differentiated to be a sub-discipline.. isn’t the community self organized and the concept of birds-of-a-feather meetings a well established tradition? Color me confused.

    Maybe I again did not express myself clearly: I’m not saying that such a community doesn’t exist. Obviously it does. I am saying that such a community shouldn’t exist. Because by factoring itself off in that manner, the community is actually working against the overall user experience.

    I’m saying that they’re solving the wrong sub-problem: that they’re (by analogy) trying to make cleaner coal when they should instead be figuring out how to make solar work.

    Does that clear up your confusion?

  • 25 jeremy // Aug 3, 2009 at 3:16 pm

    I don’t think that fighting the system and working to improve the system are mutually exclusive.

    Simple counterargument:

    Economics is the (dismal) science of how limited resources are allocated. Time and money are limited resources. Time and money spent in optimizing coal is time and money lost in improving solar. QED ;-)

    I also would like to see fewer single-occupancy cars on the road. But money spent promoting that is money not spent on building a light rail.

    If you’re just saying that you can spend some partial amount of money on each, then sure; they’re not completely mutually exclusive. But any money spent on one is money not spent on the other.

  • 26 Daniel Tunkelang // Aug 3, 2009 at 3:33 pm

    That works if all of the resources are yours to allocate. The art of picking your battles is deciding where you can have the most effect, given that most of the resources are outside your control or even your sphere of influence.

    And they call me an idealist! :-)

  • 27 Neal // Aug 3, 2009 at 3:42 pm

    @jeremy Arguing that a self-organized community should not exist is VERY abstract. Besides, these periods of self-organization, reformation, and splintering happen continuously in human organizations; consider the rise and fall of nations. If and when there is another community for a given one to join, it will.

    CompAdvert has split off to meet a need for idea exchange and because the addition of Economics into IR+Optimization gives it a bad fit to any existing community.

    You are correct in the abstract that dynamic pages can now be indexed reasonably well. This is a recent development. Yet I’d still contend that if they even smell like SEM pages, the major engines will kill them from the rankings.

    At a high level I agree with your now less opaque ideas, but they are still too abstract from an industry perspective. The necessary condition for what you desire to come true is that either a major engine will have to abandon its business model (and the traditional separation of advert from content) or a completely new one must be established that does-things-differently.

    There are two ways to proceed: either start a company or write a seminal paper. When should I expect news on this from you? ;-) I’d advocate that you start a company fixing enterprise federated search with relevance reintegration… plenty of opportunity there.

    Note that when we switch the subject to display or contextual advert on webpages outside of the SERP, ‘relevance reintegration’ breaks down… yet it is still CompAdvert and still needs IR+Optimization+Economics. We’re not presenting a list of items anymore. It’s mostly trying to present a bit of content that is relevant to the main content, the person, or both.

  • 28 jeremy // Aug 3, 2009 at 3:50 pm

    And they call me an idealist!

    Heh. Yeah, I do have that (positive? negative?) streak.

    Perhaps I’m not trying to pick any battles. Perhaps my goal is to pick a meta-battle, by simply reminding people that the battles that they are picking (1) aren’t the only battles out there, and (2) come at a cost, i.e. that there are tradeoffs.

    The conventional wisdom these days, the belief that it seems almost everyone subscribes to, is that advertising is completely neutral w/r/t its impact on the user experience. The rationale says that because ads are separate and clearly labeled, there is no impact, and relevance can be optimized independently.

    My meta-battle is to point out that this is simply not true. What we do about it is the battle. Knowing that it’s even an issue is the meta-battle.

  • 29 jeremy // Aug 3, 2009 at 6:37 pm

    @Neal: You are correct in the abstract that dynamic pages can now be indexed reasonably well. This is a recent development. Yet I’d still contend that if they even smell like SEM pages, the major engines will kill them from the rankings.

    Wait.. what? You’re telling me that if there is relevant information out there on the web, search engines will prevent a user from finding certain types of relevant information (by removing it from their index) if the search engine doesn’t get a cut of the action?

    Shouldn’t the user be the one making that relevance decision, not the search engine? This feels totally screwed up to me.

    This is probably why I am increasingly leery of the formation and promotion of the whole Economic IR subfield. To me, its very premise is anti-user.

    CompAdvert has split off to meet a need for idea exchange and because the addition of Economics into IR+Optimization gives it a bad fit to any existing community.

    But they’re only optimizing the ads, not the whole user experience. (And removing things that should be ads from their index, so that certain people are forced to pay in order for their relevant content to be found?) As we well know in machine learning, steps that appear locally optimal quite often lead to globally suboptimal outcomes.

    Maybe I can’t do anything to change it. But the CompAdvertising subfield needs to at least be aware of what it is they’re proposing and doing. Are they? Do you have insight into this?

    Note that when we switch the subject to display or contextual advert on webpages outside of the SERP, ‘relevance reintegration’ breaks down… yet it is still CompAdvert and still needs IR+Optimization+Economics.

    Apples and oranges. Go back to our comments #5 and #8, above. If you just want to place advertisements next to content, I have no problem with that. Be my guest. Doing so is basically just like television advertising. You look at the content+context of a television program, and then match similar advertisements based on that content+context similarity. That’s why you have toy commercials, not Viagra commercials, placed in SpongeBob SquarePants programs. It’s an old story.

    But that content-placement advertising process is fundamentally, qualitatively different than when a user has an information need and is actively seeking information relevant to that need. Everything I’ve said above only applies to the latter, not the former. You’re absolutely correct in saying that “relevance reintegration” doesn’t work in the former, because there is no relevance. There is no relevance because there is no explicit user information need. There is no explicit user information need because the user is not seeking.

  • 30 jeremy // Aug 3, 2009 at 10:05 pm

    By the way, I’ll say it again: Thank you both for the lively and spirited discussion! Quite a pleasure.

  • 31 Daniel Tunkelang // Aug 3, 2009 at 10:09 pm

    Likewise–good to see comments outnumbering blog posts by an order of magnitude again!

  • 32 Stavros Macrakis // Aug 19, 2009 at 10:35 am

    Jeremy says:

    If the information hyperlinked to by the ad is
    relevant, then that information should be
    found organically, by the search engine.

    The relevance metric for a main result is something like “user will find content interesting”, while the relevance metric for an ad result is “user might be interested in a commercial transaction”. (With the usual market assumption that commercial transactions are *mutually* beneficial.)

    I suppose you could achieve the same effect by having a “prefer commercial results” UI (I think both Lycos and Yahoo experimented with sliders for this), but this has various problems: it complicates the UI, and requires that the user decide in advance that s/he’s potentially interested in a commercial response to his/her query. I have myself been surprised that for some queries, ad results have been *more* helpful to me than main search.

    -s

    PS I am of course completely leaving aside the question of whether ads are a good way to finance Web content.

  • 33 Daniel Tunkelang // Aug 19, 2009 at 1:34 pm

    There’s no question that Google and others use two different relevance metrics. I’ll assume that the main one is their best effort to model “user will find content interesting”.

    But the relevance metric for an ad result is, to a first approximation, “maximum expected revenue generated from click-throughs”–based on the product of how much the advertiser pays per click and the expected clickthrough rate. It’s not quite the same as “user might be interested in a commercial transaction”.
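    The first-approximation ad ranking described here can be made concrete with a small Python sketch. It assumes each ad carries a per-click bid and an estimated clickthrough rate (both made-up numbers) and ranks ads by their product, the expected revenue per impression. Real ad auctions add pricing rules and quality signals that this deliberately leaves out.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    cpc_bid: float       # what the advertiser pays per click (hypothetical units)
    expected_ctr: float  # estimated probability that an impression gets clicked


def expected_revenue(ad: Ad) -> float:
    # Expected revenue per impression = price per click * probability of a click.
    return ad.cpc_bid * ad.expected_ctr


def rank_ads(ads: list[Ad]) -> list[Ad]:
    # Highest expected revenue first; user interest enters only through the CTR estimate.
    return sorted(ads, key=expected_revenue, reverse=True)


if __name__ == "__main__":
    ads = [Ad("a", 2.00, 0.01), Ad("b", 0.50, 0.05), Ad("c", 1.00, 0.015)]
    print([ad.name for ad in rank_ads(ads)])
```

    With these numbers, the highest bidder (a) does not rank first: b’s higher expected clickthrough rate gives it the larger product (0.025 vs. 0.02), so the printed order is b, a, c.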

    Commercial transactions should be mutually beneficial. But advertising often involves jockeying among competitors in a zero-sum game to procure consumers’ scarce attention budgets. That’s not necessarily a win/win scenario.

  • 34 jeremy // Aug 19, 2009 at 3:30 pm

    @Stavros:

    I’ve written a little more here about my feelings of having two different relevance metrics that are being simultaneously (and therefore imho confusingly) optimized. Give that a read and we can chat more.

    But even if we do accept the premise that a single results page can have two different types of relevance, I have the additional issue that the user is not given control over which of those metrics is used, for which types of result. Why is it the search engine that gets to decide, and not the user?

  • 35 Nick Trendov // Mar 2, 2010 at 1:14 pm

    re: “…So the ad gets in the way of organic relevance.”

    Your perception may change if you consider an ad as a vehicle that carries the viewer to the web page or as a vehicle that carries the ad to the viewer.

    Consider a teeter-totter in a playground with a searcher on one side and a web page on the other. The ad is just the pivot.

    Once the ad takes the searcher to the web page or the web page to the searcher then it is useless.

    Cheers,
    Nick
    http://www.neuropersona.com

  • 36 So You Like Big Data… // Jan 4, 2011 at 12:44 am

    [...] the bulk of its revenue from advertising, they all present the big-data challenges associated with computational advertising. Search is, in my view, the web’s killer app, so you can’t go wrong working on it. But [...]

  • 37 CIKM 2011 Industry Event: Vanja Josifovski on Toward Deep Understanding of User Behavior on the Web // Nov 27, 2011 at 1:15 pm

    [...] Track had the opportunity to hear Yahoo researcher Vanja Josifovski make an eloquent case for ad retrieval as a new frontier of information retrieval. At the CIKM 2011 Industry Event, Vanja delivered an equally compelling presentation entitled [...]
