Perhaps Amazon has had this feature for a while, but today, for the first time, I noticed a section labeled “Customers Who Bought Related Items Also Bought,” as seen in the screenshot above. I was looking at an unreleased book, which might explain why they couldn’t show me information based on customers who actually bought the item.
Has anyone else noticed this? Am I just late to the party? I tried to find more information online, but nothing showed up. I assume they are using some item similarity measure to assemble a set of related items, and then are basing collaborative filtering on the purchase history associated with that set.
I’m very curious to hear more from anyone who is familiar with this functionality.
14 replies on “Amazon: Customers Who Bought Related Items Also Bought”
From what I’ve read of this, it’s using item-item collaborative filtering. Most collaborative filtering has a matrix of items-users, trying to figure out what items a particular user would want. This one is using items as both the rows and columns of the matrix, trying to figure out what other items are popular given an interest in a starting item.
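To make the item-item idea concrete, here is a minimal sketch that counts how often pairs of items share a buyer. The customer names, item names, and both function names are made up for illustration; nothing here reflects how Amazon actually computes this.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(baskets):
    """For each pair of items, count how many customers bought both."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        # Deduplicate so a repeat purchase by one customer counts once.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_bought(item, counts, top_n=3):
    """Rank other items by how often they were bought alongside `item`."""
    neighbors = counts.get(item, {})
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical purchase histories.
baskets = {
    "alice": ["book_a", "book_b", "book_c"],
    "bob": ["book_a", "book_b"],
    "carol": ["book_b", "book_c"],
    "dave": ["book_a", "book_d"],
}
counts = cooccurrence_counts(baskets)
print(also_bought("book_a", counts))      # ['book_b', 'book_c', 'book_d']
print(also_bought("unreleased", counts))  # [] because there is no purchase history
```

The empty result for an item nobody has bought is exactly the gap a “related items” feature would need to fill.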
But in this case, the item it’s using has no purchase history. So they must be taking a semi-supervised approach, using some rule-based or statistical similarity measure to identify related items that do have a purchase history, and then combining the collaborative filtering results obtained from those items. I’ve seen this idea described in research papers on data mining, but this is the first time I’ve seen it implemented.
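A rough sketch of that two-step idea, purely as a guess at the shape of it: pick proxy items by tag overlap (Jaccard similarity is my assumption, as are all the item names, tags, and counts), then pool the co-purchase counts of those proxies with equal weight.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of tags."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_items(target_tags, catalog_tags, k=2):
    """Return up to k catalog items whose tags overlap the target's tags."""
    scored = [(jaccard(target_tags, tags), item) for item, tags in catalog_tags.items()]
    scored = [(sim, item) for sim, item in scored if sim > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:k]]


def related_also_bought(target_tags, catalog_tags, cooccurrence, k=2, top_n=3):
    """Pool the co-purchase counts of the related items, weighting each equally."""
    pooled = {}
    for proxy in related_items(target_tags, catalog_tags, k):
        for other, count in cooccurrence.get(proxy, {}).items():
            pooled[other] = pooled.get(other, 0) + count
    return sorted(pooled, key=lambda item: (-pooled[item], item))[:top_n]


# Hypothetical catalog tags and co-purchase counts for released items.
catalog_tags = {
    "search_book": ["search", "information retrieval"],
    "hci_book": ["hci", "search"],
    "cooking_book": ["cooking"],
}
cooccurrence = {
    "search_book": {"ir_textbook": 5, "hci_book": 2},
    "hci_book": {"design_book": 4, "ir_textbook": 1},
    "cooking_book": {"baking_book": 9},
}
unreleased_tags = ["search", "hci", "exploratory search"]

print(related_items(unreleased_tags, catalog_tags))
print(related_also_bought(unreleased_tags, catalog_tags, cooccurrence))
```

The cooking book never contributes, since it shares no tags with the unreleased item.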
Couldn’t they also be looking at pre-orders or wishlists for the product to seed the correlation? For products that are not yet released, that seems like an actionable substitute to drive results.
I’ve been using the feature you described for years to help me find music and films to download and books to read.
Stefan, have you seen this feature enabled for pre-release items not yet available for sale? That’s the part that struck me as new.
Bryan, that’s possible, but in this case I doubt it. I happen to like the book I used as an example (I reviewed it for the publisher), but I doubt enough people know about it to have pre-ordered it. Maybe more people know now, since this post made it onto Techmeme. 🙂
This has been around for years. They also look at your IP’s history, the Amazon browsing history of your account, geo stats, etc.
Amazon attributes a huge chunk of their profits to recommendations. It works. There are plenty of white papers on it out there to dig through.
Jakrose, I know Amazon’s been using collaborative filtering for years, as well as other forms of personalization. Do your research on me–you’ll see I’m hardly a stranger to this space.
But have you seen this specific feature before–customers who bought *related items* also bought? I haven’t, and I found no leads when I searched. That’s what I thought was newsworthy.
I wonder if they’re using the same notion of related content available through the Amazon Associates Web Service. The public API defines a RelatedItems response group. The user can (is required to?) submit a RelationshipType value in order to get one of these groups back. Sample values include Episode, Season, Tracks, and Variation (you can guess how that might apply to the various lines of products Amazon sells). See this link for more information: http://docs.amazonwebservices.com/AWSECommerceService/2008-08-19/DG/index.html?CHAP_OrganizationofItemsforSaleonAmazon.html
The other thought I had was library classification systems for published books. “Related items” might mean items close to the item in question in the library stacks.
I don’t think either of these suggestions would require very much oversight. Other than that… could the Google Image Labeler game be applied to item similarity somehow? I’d like to play that.
Chris, I did see that page when I tried to research the feature. But it left me with two questions:
1) What RelationshipType values or combination thereof do they use? This is content-driven similarity, not the collaborative filtering for which Amazon is famous. That’s not to say they can’t do both–they obviously do. But I’m curious if they’ve published anything about it.
2) How do they then use the set of content-driven related items as inputs to their collaborative filtering engine? Do they assign weights based on some measure of similarity? How do they account for the diversity of results, which might confound a vector-based approach?
OK, that’s more than two questions. But it gives you an idea of why I find this so interesting. And why I’m surprised not to find anything about it on the web. For all I know, this feature has been available for a while, but no one else seems to have taken the time to notice it.
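To make the weighting question in (2) concrete, here is one hypothetical way to parameterize it: each related item contributes its co-purchase counts scaled by similarity raised to a power. The function name, the `power` parameter, and the numbers are all assumptions for illustration, not a description of Amazon’s engine.

```python
def combine(proxy_sims, cooccurrence, power=1.0):
    """Rank items by co-purchase counts summed over proxies, each weighted by sim ** power.

    power=0 gives every proxy equal weight, power=1 is linear in similarity,
    and power>1 lets the closest proxies dominate.
    """
    scores = {}
    for proxy, sim in proxy_sims.items():
        weight = sim ** power
        for other, count in cooccurrence.get(proxy, {}).items():
            scores[other] = scores.get(other, 0.0) + weight * count
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical similarities of two released items to the unreleased one,
# and hypothetical co-purchase counts for those two items.
proxy_sims = {"close_match": 0.9, "loose_match": 0.3}
cooccurrence = {"close_match": {"x": 2}, "loose_match": {"y": 5}}

print(combine(proxy_sims, cooccurrence, power=0))  # ['y', 'x']
print(combine(proxy_sims, cooccurrence, power=1))  # ['x', 'y']
```

Even in this toy case the ranking flips between equal and linear weighting, which is why the choice seems worth asking about.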
I am not sure how they might be pulling off the “related items” concept, but I’d be interested in any insights you gain on it, Daniel (so hopefully, you’ll share what you find in a future post). I’ve tried to work out a way to do this with information about employees in an enterprise – I have built a simple solution but not one I’m happy with. I’ve described the work on my blog but haven’t made any progress since that write-up.
Lee, I’ll share what I learn. I think the interesting question is what approach they take to content-based similarity, especially given that their products have nominal rather than numerical attributes. I looked at this problem several years ago; you can find my SIAM Data Mining 2002 paper here:
NearestNeighbor.pdf
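For readers unfamiliar with the nominal-attribute problem, here is a minimal sketch of the simplest baseline, simple matching (overlap) similarity, where two values either agree or they don’t. This is not the method from the paper, and the attribute names and values are invented for illustration.

```python
def overlap_similarity(a, b):
    """Fraction of shared attributes on which two records have the same value."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if a[key] == b[key])
    return matches / len(shared)


def nearest(target, catalog, k=1):
    """Return the k catalog entries most similar to the target record."""
    ranked = sorted(
        catalog,
        key=lambda name: (-overlap_similarity(target, catalog[name]), name),
    )
    return ranked[:k]


# Hypothetical product records with purely nominal attributes.
catalog = {
    "item_1": {"binding": "paperback", "subject": "information retrieval", "publisher": "pub_x"},
    "item_2": {"binding": "hardcover", "subject": "cooking", "publisher": "pub_y"},
}
target = {"binding": "paperback", "subject": "information retrieval", "publisher": "pub_y"}

print(nearest(target, catalog))  # ['item_1']
```

The obvious weakness is that every attribute counts equally and every mismatch is equally bad, so “paperback vs. hardcover” weighs as much as “cooking vs. information retrieval.”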
I’m looking at another unpublished item (http://www.amazon.co.uk/Exploratory-Search-Query-response-Synthesis-Information/dp/159829783X/ref=sr_1_2), which doesn’t have any of these “customers who bought related items” recommendations. However, it does have three indications of the notion of related items: subjects, categories, and tags.
Surely this system is just picking the most similar books and giving you their standard collaborative filtering results?
Ian Ruthven gave a great keynote at IIiX2008 in London a while back on context, and discussed the full range of Amazon’s similarity assessments, particularly focusing on books (http://irsg.bcs.org/iiix2008/presentations/Ruthven.ppt).
“Surely this system is just picking the most similar books and giving you their standard collaborative filtering results?”
I don’t doubt it, but that’s a bit underspecified. How many “most similar” books? Do they contribute equal weights, or are they weighted based on the degree of similarity? For that matter, is the weighting linear? Do they do anything to address diversity within the set, i.e., books that are similar to the unpublished item but very different from one another? And how “standard” is their collaborative filtering in the first place?
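On the diversity point, one option they could be using (pure speculation on my part) is a greedy pick in the style of maximal marginal relevance: take the most similar book first, then penalize candidates that resemble books already chosen. The function name, the `trade_off` parameter, and the similarity numbers below are all hypothetical.

```python
def pick_diverse(candidates, sim_to_target, pairwise_sim, k=2, trade_off=0.5):
    """Greedily pick k items that are close to the target but not to each other."""
    chosen = []
    remaining = set(candidates)
    while remaining and len(chosen) < k:

        def score(candidate):
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max((pairwise_sim(candidate, c) for c in chosen), default=0.0)
            return trade_off * sim_to_target[candidate] - (1 - trade_off) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical similarities: a and b are near-duplicates, c is a different kind of book.
sim_to_target = {"a": 0.9, "b": 0.85, "c": 0.6}
pairs = {
    frozenset(("a", "b")): 0.95,
    frozenset(("a", "c")): 0.1,
    frozenset(("b", "c")): 0.1,
}


def pairwise(x, y):
    return pairs[frozenset((x, y))]


print(pick_diverse(["a", "b", "c"], sim_to_target, pairwise))  # ['a', 'c']
```

A plain top-2 by similarity would have returned a and b, so whether they diversify changes which purchase histories feed the recommendations.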
Regardless, thanks for the pointer to Ruthven’s presentation. Interesting stuff, even if it doesn’t answer the above questions.
And, on an unrelated note, I hope you like the subtitle of that book you were looking at. I’ll vouch for the quality of the contents; my own contributions as a reviewer were cosmetic.