RecSys 2012 Industry Track

I’m proud to be co-organizing the RecSys 2012 Industry Track with Yehuda Koren.

Check out the line-up:

  • Ronny Kohavi (Microsoft), Keynote
    Online Controlled Experiments: Introduction, Learnings, and Humbling Statistics
  • Ralf Herbrich (Facebook)
    Distributed, Real-Time Bayesian Learning in Online Services
  • Ronny Lempel (Yahoo! Research)
    Recommendation Challenges in Web Media Settings
  • Sumanth Kolar (StumbleUpon)
    Recommendations and Discovery at StumbleUpon
  • Anmol Bhasin (LinkedIn)
    Recommender Systems & The Social Web
  • Thore Graepel (Microsoft Research)
    Towards Personality-Based Personalization
  • Paul Lamere (The Echo Nest)
    I’ve got 10 million songs in my pocket. Now what?

Hope to see you at RecSys this September in Dublin! Registration is open now.

By Daniel Tunkelang

High-Class Consultant.

25 replies on “RecSys 2012 Industry Track”

I’ve wondered for some time now, Daniel: What is your personal (research) philosophy on the relative importance/strengths/etc. of RecSys vs. HCIR?

RecSys seems to have attached to it a philosophy of doing everything for you. The algorithm (system) is in control.

Whereas HCIR has the philosophy of structuring the interaction so as to extract more explicit information or knowledge from the user, allowing the user to expend his or her own cognitive ability to make the process better.

These two approaches seem, to me at least, to be fundamentally at odds.

Do you have a philosophy that combines them, since you seem to be active in both communities?


..and of course it’s not like recsys doesn’t make use of ongoing input from users. And it’s not like hcir doesn’t use algorithms on the backend to enhance the captured user cognitive effort. There is a little of each in each.

But the core of how a problem is structured.. where all the effort gets put.. is completely different in recsys vs hcir, methinks. Do you see it in some other way?


As Emerson said, a foolish consistency is the hobgoblin of little minds. 🙂

Seriously, I don’t see the approaches as being incompatible. Systems should do something for the user, while also allowing the user to leverage knowledge and expend effort to make the process better.

You’re right that the communities have different focuses. But the RecSys community seems increasingly interested in interaction and transparency. And the HCIR community has attracted its share of algorithms-focused folks (like me). I think the two communities are at their best when they overlap. So I’m doing my small part to help build that bridge.


Oh, I totally agree that there should be an overlap as well. I was asking about your philosophy about that overlap. Already, both hcir and recsys overlap.. like I was saying “of course it’s not like recsys doesn’t make use of ongoing input from users. And it’s not like hcir doesn’t use algorithms on the backend to enhance the captured user cognitive effort. There is a little of each in each.”

But when they meet, one usually takes the steering wheel, and the other rides shotgun. Do you have a philosophy for not only letting them both ride on the same wagon, but having them both share the driver’s seat? That’s what I’d be interested in hearing more about, even at just a high level.


Some applications, like advertising, favor recommender systems — users aren’t particularly interested in expending cognitive effort. Others, like search, favor HCIR since the user is pursuing an information need. I’d say that the wagon is most shared when the user wants to control the process but needs insightful guidance on where to take it, i.e., recommendations for meaningful next steps in a rich enough search space to require algorithmic guidance.


So that tells me when the driver’s seat should be shared. What I’m asking about is conceptual philosophical insight as to how you would share the driver’s seat, when it is to be shared.


I guess I don’t really see a dichotomy when it’s done right. Recommender systems should be following HCIR principles of transparency and interactivity whenever possible. HCIR systems should be using recommender systems to compute the best paths to show to users. The “driver’s seat” is an artifact of framing.


I think I pretty much agree with your vision. The only slight misalignment that I see is with an HCIR system using recommender systems to compute the best paths to show users.

The thing is, recommender systems (typically) use big data, aggregated behavioral patterns of the masses, to compute their best paths. Sure, there is work on “personalizing” those best paths. But I liken it to the interstate highway system. Big data recsys have already carved out the deep and wide (big) Interstate 80. That’s the main path through which all recommendations flow, the main thoroughfare of the big-data driven algorithm.

Personalization or HCIR-ization of that big data recsys is like having a little assistant that tells you to get off I-80 in Lincoln, Nebraska to have lunch rather than Omaha, because Lincoln has a nice vegetarian restaurant that you might like. But what it doesn’t do is take you off the (big data) interstate altogether, and compute a completely different route.. even an algorithmic route.. that puts you on State Route 2 through Thedford, Nebraska. When it’s actually the Thedford route that you should have taken, based on your HCIR interactions.

Unless the recommender system explicitly divorces itself from big data in its training, it seems to me that you’re just going to have HCIR in the passenger seat, asking the driver to “personalizedly” pull off the road in Lincoln. Rather than both HCIR and RecSys in the driver’s seat, discovering Route 2 through Thedford together.

Unfortunately, I think too many, if not all, commercial recsys are big data driven. Which is why I think we need an explicit philosophy or approach to removing the dichotomy and getting things done right.


I remember having this discussion with Paul Lamere, one of your panelists, back in 2007. My suggestion then for music recommendation, and how to make it mutually compatible (driver’s seat-sharable) with hcir was to not rely on big data, aggregated user listening habits, at all. But instead to only rely on the small data of the song itself.. do content-based musicological feature extraction. Analyze and pull apart each song with respect to its tempo, rhythmic structure, harmonies, timbres, etc. And then base your hcir-ing on algorithmic transformations of that low-level, small data. Rather than on the big data Interstate-80 of what all your friends, or even the world as a whole, is listening to.

Will it then require more cognitive effort to find those small town Thedfords? Of course. But that’s the whole point.

Or do you think that big data still has a place in the hcir-recsys hybrid system? Do you disagree with my fundamental premise?


“But instead to only rely on the small data of the song itself.. do content-based musicological feature extraction. Analyze and pull apart each song with respect to its tempo, rhythmic structure, harmonies, timbres, etc.”

…and furthermore not even to rely on “folksonomies”, or mass aggregations of user tags, of this information. Remember, folksonomies were the buzzword half a decade ago. But the problem with using folksonomic labels of tempo or timbre or rhythm, rather than the raw content data itself, is that you’re still relying on big data aggregation of what the world as a whole thinks of a song. The folksonomy still leads to the impersonal carving out of I-80, which you can then personalize by exiting in Lincoln rather than Omaha.. but which you fundamentally cannot get away from. You’re still on I-80. No, the only way to do it dichotomy-lessly is to not make big data a part of the algorithm. Is my philosophy, at least.
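To make that concrete, here’s roughly what I mean.. a minimal sketch of purely content-based, small-data music similarity, assuming the open-source librosa library (the file names and the 60-second window are my illustration, not any real system):

```python
# A minimal sketch of content-based ("small data") music similarity,
# assuming the librosa library and local audio files; no usage data involved.
import numpy as np
import librosa

def song_features(path):
    """Extract per-song musicological features from the raw audio alone."""
    y, sr = librosa.load(path, mono=True, duration=60.0)  # first minute is enough for a sketch
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)        # rhythmic structure (BPM estimate)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)       # harmonic content
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # timbre
    # Summarize time-varying features by their means: one fixed-length vector per song.
    return np.concatenate([np.atleast_1d(tempo), chroma.mean(axis=1), mfcc.mean(axis=1)])

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical catalog: similarity is computed song-to-song, never from listening logs.
catalog = {name: song_features(f"{name}.mp3") for name in ["song_a", "song_b", "song_c"]}
query = catalog["song_a"]
ranked = sorted(catalog, key=lambda n: cosine(query, catalog[n]), reverse=True)
print(ranked)
```

Nothing in that pipeline ever touches a listening log; every feature is a transformation of the raw audio itself.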


Wow, it’s great to have you back as my commenter-in-chief!

Of course I see more personal value in big data. For me, folksonomies aren’t impersonal at all — rather, they are a social mechanism that allows people to share information. Yes, at some point personal information exists only at the individual level, but I’m grateful to the rest of the world for establishing a vocabulary I can use to organize my own perspective.


Sure, I’m not disputing that you personally, or any one individual, might find a lot of value in someone else paving the road.

But that’s not the question. The question is whether that is really a dichotomy-free balance between recsys and hcir.

Again, it’s great that you get to “hcir” your way into getting off at the Lincoln vs. the Omaha exit. But you’re still on I-80, and 99% of the country is left unexplored.

Let me step back for a second and think about this from another perspective: measurement. I’m following the maxim that if one cannot measure something, one cannot improve it. So how do we measure how successful we’ve been in putting both recsys and hcir in the driver’s seat? How do we measure whether, properly done, we’ve removed the dichotomy?

And let me suggest that a great measurement of that is Leif Azzopardi’s 2008 CIKM paper on “retrievability”. How easy is it, using the system, to get to everything available in the system? Not how easy is it to get to the big data popular items. But how easy is it to get to anything and everything?
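In code, the cumulative variant of Leif’s measure goes something like this.. my own paraphrase of the idea, not his exact formulation, with a standard Gini coefficient to summarize the inequality of access across the collection:

```python
# A sketch of cumulative retrievability, assuming `rankings` maps each
# query to the system's ranked list of document ids. (My paraphrase of
# Azzopardi & Vinay's idea, not their exact formulation.)
from collections import Counter

def retrievability(rankings, cutoff=10):
    """r(d) = number of queries for which document d appears in the top `cutoff`."""
    r = Counter()
    for ranked_docs in rankings.values():
        for d in ranked_docs[:cutoff]:
            r[d] += 1
    return r

def gini(values):
    """Inequality of retrievability: 0 = everything equally reachable,
    1 = a few documents soak up all the access."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * cum) / (n * total) - (n + 1.0) / n

# Toy example: two queries over a three-document collection; doc "c" is never retrievable.
rankings = {"q1": ["a", "b", "c"], "q2": ["a", "b"]}
r = retrievability(rankings, cutoff=2)
print(r, gini([r.get(d, 0) for d in ["a", "b", "c"]]))
```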

I’m reminded of Oscar Celma’s 2007(?) dissertation, in which he studied a similar problem.. the breadth of recsys coverage. What he found, if I may summarize, is that using big data recsys, the horizon of any one individual was expanded. But the horizon of all users together shrank. As a community, the users became less diverse, even if any one individual’s perspective was slightly embiggened.

To me, a system that does that is an hcir failure. Well, failure is a strong word. Let me put it another way: A system that does that has hcir in the passenger seat, and recsys in the driver’s seat.

A true hcir+recsys combo should simultaneously enlarge both the individual perspective, as well as the combined group perspective.

So again, that’s fine if folksonomies help organize the world for you. It allows you to find an exit off the interstate that you might otherwise not have known about. But a five minute business loop later, you’re right back on the interstate, with no way of ever discovering that Thedford, Nebraska even exists. Much less discovering that Sarah Lou on the outskirts of Thedford makes a mean strawberry smoothie at her farmstand.

So back to my original statement: “Unfortunately, I think too many, if not all, commercial recsys are big data driven. Which is why I think we need an explicit philosophy or approach to removing the dichotomy and getting things done right.”

I’ve given one attempt to put a measurement on the outcomes of such a philosophy, using an Azzopardian “retrievability” metric, or a Celmian “group radius” metric. But I still lack the core philosophy itself. So I’m curious if you have any thoughts in that direction, or whether you really don’t see it as a problem, the same way I do.
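For concreteness, here’s a toy version of the two horizons I keep contrasting (my own names and toy data, not Leif’s or Oscar’s actual formulations):

```python
# A sketch contrasting individual and collective horizons, given `recs`
# mapping each user to the set of items recommended to them.

def individual_horizon(recs):
    """Average number of distinct items any one user is exposed to."""
    return sum(len(items) for items in recs.values()) / len(recs)

def collective_horizon(recs, catalog_size):
    """Fraction of the whole catalog reached by anyone at all (aggregate coverage)."""
    reached = set().union(*recs.values())
    return len(reached) / catalog_size

# Toy example: each user sees 3 items, but everyone sees the same 3 of 100.
recs = {"u1": {"a", "b", "c"}, "u2": {"a", "b", "c"}, "u3": {"a", "b", "c"}}
print(individual_horizon(recs))       # 3.0 -- individual horizons look fine
print(collective_horizon(recs, 100))  # 0.03 -- the community explores 3% of the catalog
```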


“Wow, it’s great to have you back as my commenter-in-chief!”

Heh. Well, let me say again, as I’ve said before: Please don’t ever hesitate to let me know if I start turning into pedant-in-chief. I comment because I’m unsatisfied with things as they are, and am seeking dialogue on how to make ’em better. But there’s a fine line between that, and just complaining.


Nah, you’re no pedant. Besides, one person’s pedantry is another person’s scholarship.

As for my twining the roads of RecSys and HCIR, perhaps the best example is Pandora. Pandora has, through its input and particularly its feedback mechanisms, introduced me to music I would never have discovered otherwise. Using Leif’s language, it has improved the accessibility / retrievability of music. I can’t say whether it has reduced the collective horizon — that’s certainly possible. But it has more than slightly embiggened mine.


Right, but Pandora doesn’t make use of big data. Or at least they didn’t originally.. I heard some whisperings at one point that they’d perhaps started doing so, but I cannot confirm those whisperings.

No, instead of using big data, Pandora does exactly what I recommend: Use strict content-based features plus hcir feedback.

Don’t let the fact that humans manually label Pandora’s audio feature sets — which include not only things like gender of the vocalist (or whether there is a vocalist at all, i.e. instrumental), but how much vibrato the vocalist has in his or her voice — throw you off. It’s not a folksonomy.. it’s a controlled vocabulary. And the features aren’t big data features.. they are per-song extractions of raw information. That the extraction is done by trained musicians sitting in an office in Emeryville, rather than by an ISMIR algorithmicist, is orthogonal to the hcir vs. recsys issue. It’s still a raw audio feature, or direct transformation thereof, without any big data interference.
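Schematically, the setup I’m describing is something like this.. where the attribute names, scales, and update rule are my own invention for illustration (Pandora’s actual genome and feedback machinery are proprietary):

```python
# A sketch of a controlled-vocabulary, per-song setup with HCIR-style thumbs
# adjusting a taste vector. Attributes and weights are illustrative only,
# not Pandora's actual music genome.
import numpy as np

VOCAB = ["tempo", "vibrato", "male_vocal", "instrumental", "acoustic"]  # hypothetical

songs = {  # expert annotations on a 0..1 scale, one vector per song; no usage data
    "song_a": np.array([0.8, 0.1, 1.0, 0.0, 0.2]),
    "song_b": np.array([0.3, 0.7, 0.0, 0.0, 0.9]),
    "song_c": np.array([0.4, 0.6, 0.0, 1.0, 0.8]),
}

taste = np.zeros(len(VOCAB))  # the listener's evolving taste vector

def feedback(song, thumbs_up, rate=0.5):
    """Rocchio-style update: move toward liked songs, away from disliked ones."""
    global taste
    taste += rate * songs[song] if thumbs_up else -rate * songs[song]

def next_song(played):
    """Rank unplayed songs by dot product with the current taste vector."""
    candidates = [s for s in songs if s not in played]
    return max(candidates, key=lambda s: float(np.dot(taste, songs[s])))

feedback("song_b", thumbs_up=True)  # listener likes song_b
print(next_song({"song_b"}))        # song_c: closest in the annotated feature space
```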

So if that’s your best twining roads example, then perhaps I’m not so off-base in my wariness of big data? Given that Pandora’s approach mirrors the one I favor?

Again, however, I can’t draw any conclusions from the fact that you’ve discovered more music. Because the metric is not about any one individual, it’s about all individuals, with all the queries (or possible queries) that they can ask. Even Leif doesn’t couch it in terms of a single person. Leif couches it in terms of the system as a whole, and how retrievable every atomic unit of information is by the system as a whole. Not by any one individual. Both Leif and Oscar are saying the same thing: Measure the collective horizon, not the individual horizon.



I’m pretty sure that Pandora combines content-based and social signals, but I don’t have a reference.

And I concede your point — recommendation systems may be reducing collective diversity. I do recall Oscar’s dissertation — I even blogged about it. 🙂


Pandora started from the “music genome” project. At its core it is a function of the music signal itself, not the social signal. I did see one announcement that Pandora had added a social layer, so that you could browse, in an hcir fashion, other people who had listened to a song or artist (http://techcrunch.com/2006/12/19/pandora-goes-social/).

But this does not appear to be an integration of social information into the ranking signal itself. But again, I don’t know for sure, either.

I also didn’t necessarily expect you to concede any point on recommender systems reducing collective diversity. I’m more curious about whether — if they do — it interferes with your sense of a proper, “done right” recsys-hcir intertwining?

I’ve proposed collective diversity reduction (aka retrievability reduction) as a measure for unsuccessful intertwining (higher reduction = less success)… as a way of telling that recsys really is in the driver’s seat and hcir is only chirping out suggestions from the passenger seat.

Do you agree with that metric?

You might not. Perhaps you have the feeling that it doesn’t matter what happens collectively in the system or the community as a whole, so long as any one individual feels like they get a little more than they otherwise would have. In which case any hcir input as part of the recsys process is a confirmation of the co-drivering non-dichotomy. And I’m getting all worked up about nothing. Yes? No?


“I also didn’t necessarily expect you to concede any point on recommender systems reducing collective diversity.”

…because I’m not trying to win any argument, so much as I’m trying to jointly tease out a philosophy. As you wrote: “Recommender systems should be following HCIR principles of transparency and interactivity whenever possible. HCIR systems should be using recommender systems to compute the best paths to show to users.”

And I fully agree with that. But I’d like to now figure out how to measure it. (Measurement exposes philosophical underpinnings.) And if it turns out that the hcir best paths, aka the recsys transparent paths, end up reducing diversity, that collectively people end up listening to fewer songs or artists, rather than more, then is that really a transparent, best path system?

Or does diversity not matter? Is the transparency itself its own reward? That you don’t need to measure how well the transparency helps the user do anything.. you just need to measure how transparent the transparency is?

Hmm.. this feels like I’m trying to get you to write a position paper with me.


Re Pandora: I’m not certain how their recommendations work. If it is entirely content-driven based on expert annotation, I’m impressed.

As for whether reducing collective diversity is bad in principle, it really depends.

For example, I’d love to reduce the collective diversity of opinions about evolution. I don’t advocate doing so by removing pseudoscience articles from search indexes, but I have no particular concern about their retrievability relative to articles reflecting scientific consensus. Yes, I prefer giving the user more power in principle, but I also have no problem with a system exercising some degree of curation. I do realize it’s a slippery slope.

So perhaps the seeming dichotomy boils down to a question of how much curation, if any, a system should do. As a user, I appreciate when systems get rid of bad stuff, but I get annoyed when systems start throwing the baby out with the bathwater because they are trying to be too clever. There’s a right amount of cleverness, and in general I prefer for systems to at least be humble enough to let me override the controls.

And, in any case, my perspective is entirely egocentric. As long as I, as a user, find what I need, I’m happy. Same for all values of “I”. It doesn’t bother me if the result is reduced collective diversity — hopefully that just means that some answers are more equal than others.


Re: Pandora: Entirely content-driven based on expert annotation is indeed how the music genome project started (out of which Pandora grew) in 1999 or 2000, and as of the last time I looked into it in 2005, that was still what Pandora was doing. If they’ve changed since then, and added other signals, then I don’t know about it. But it did exist as a service for many years in exactly — and only — that expert-annotated, content-driven manner.

Re: Retrievability: I hear what you’re saying about not worrying about the global warming pseudoscience not being retrievable. But what if your goal/information need is to gather (do a set-based retrieval of) all that pseudo-science, so as to be able to analyze it to show systematic bias, regular patterns of incorrectness, or otherwise expose its pseudoness? Finding one or two pseudoscience articles wouldn’t be enough to make the case. Finding the whole set is necessary. So if the system is systematically biased against helping you find all that pseudoscience, wouldn’t that be a problem?

I know you acknowledged the slippery-slope caution in allowing the system to exercise some degree of curation.. the baby and the bathwater.. the balance between too clever and not clever enough. But your way out of it is: “in general I prefer for systems to at least be humble enough to let me override the controls.”

Doesn’t that statement, by itself, mean that, until you override those controls and tell the system somehow that you actually do want to find all the pseudoscience, recsys and not hcir is in the driver’s seat? The system may be humble enough to then let you kick recsys out, and install hcir in. Which might be the best possible compromise in system design. But, at best, that’s a system that allows a quick swap of recsys and hcir in and out of driving. It doesn’t really allow them to both share the driver’s seat at the exact same time, eh?

Maybe it’s like calculus.. take the limit as n->infinity, and you can get a pretty good measure of the area under the curve. By analogy, maybe the dichotomy is resolved not by ever having recsys and hcir in the driver’s seat at the exact same time. But if you let the amount of time that each spends in the driver’s seat be 1/n, and then swap them in and out so quickly that n approaches infinity, then even though it’s still strictly sequential, in the limit you get the effect of both being there at the same time?
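Put concretely, the limiting case of that rapid alternation is just a fixed blend of the two signals on every single interaction. Something like this, where lam is a made-up mixing weight:

```python
# Toy reading of the limit argument: alternating who "drives" with time slice
# 1/n converges, as n grows, to scoring every candidate with a fixed blend of
# the two signals. lam is an illustrative parameter, not from any real system.

def blended_score(item, recsys_score, hcir_match, lam=0.5):
    """Both in the driver's seat at once: a convex combination of the two signals."""
    return lam * recsys_score(item) + (1 - lam) * hcir_match(item)

# Example: the system's popularity-ish score vs. the user's explicit constraints.
recsys_score = {"a": 0.9, "b": 0.4, "c": 0.2}.get
hcir_match = {"a": 0.1, "b": 0.8, "c": 0.9}.get
items = ["a", "b", "c"]
print(max(items, key=lambda i: blended_score(i, recsys_score, hcir_match)))  # "b"
```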

Now I’m just rambling.



“And, in any case, my perspective is entirely egocentric. As long as I, as a user, find what I need, I’m happy. Same for all values of “I”. It doesn’t bother me if the result is reduced collective diversity — hopefully that just means that some answers are more equal than others.”

And that’s a completely valid, acceptable answer. I’ve just.. found myself wondering lately.. oh how shall I say this..? It’s like buying a car. I’ve never bought a new car in my entire life. Either I’ve lived in big cities where I didn’t need a car, or I’ve bought a used car. Financially, I don’t think buying new cars really makes that much sense. So it’s good for me, personally, to buy only used cars.

But then if everyone behaved like I did (or there was a car-buying hcir-recsys that helped persuade others into behaviors like mine), and nobody ever bought a new car…well then you can see the problems that would cause. The used car market only continues to exist because the new car market exists. I kinda view global vs. personal diversity similarly. Personal diversity is like being able to enter the market of used cars. It very much satisfies my transportation needs, and even gets me a wider choice of vehicles than I otherwise would be willing to spend money on. But if everyone only self-optimized, the market for used cars would dry up… the global diversity of the new car market would disappear.

You know what would be interesting? If we could measure long term effects of optimizing for personal vs. global diversity. Even if via simulation. Celma’s dissertation showed that personal diversity increased. But that was only in the short term, correct? Over the long term, if global diversity has also decreased, that might mean that the personal diversity algorithms have less fodder on which to base their recommendations.. which then might decrease the amount of personal diversity, over time, too. Unintended consequences?
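A toy simulation of that feedback loop might go something like this.. all parameters invented, the point being only the qualitative dynamic of recommendations becoming their own training data:

```python
# A toy feedback-loop simulation: each round the system recommends mostly
# what is already popular, recommendations become the next round's training
# data, and popularity concentrates. All parameters are made up.
import random

random.seed(0)
CATALOG, USERS, K, EXPLORE, ROUNDS = 1000, 200, 10, 0.1, 30
plays = [1] * CATALOG  # uniform prior popularity

for t in range(ROUNDS):
    recommended = set()
    by_popularity = sorted(range(CATALOG), key=lambda i: plays[i], reverse=True)
    for _ in range(USERS):
        if random.random() < EXPLORE:
            recs = random.sample(range(CATALOG), K)  # occasional exploration
        else:
            recs = by_popularity[:K]                 # big-data head of the distribution
        recommended.update(recs)
        for i in recs:
            plays[i] += 1                            # recommendations become training data
    head_share = sum(sorted(plays, reverse=True)[:K]) / sum(plays)
    if t % 10 == 0:
        print(f"round {t:2d}: coverage={len(recommended)/CATALOG:.1%}, head share={head_share:.1%}")
```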

If I were smarter I would attempt an extension of your fast food analogy, to the long term food ecosystem. Postwar industrialization of food production allowed for a short term increase in personal diversity. Americans got access in the short term to a wider range of foodstuffs than they previously had…oranges from Florida, artichokes from California, etc. But when everyone followed their personal gradient, it led to global loss of diversity.. today’s fast food, gas station corn dog food ecosystem.

Or something like that.

