A Sound Approach to Exploratory Music Search?

I just noticed an article today in CNET, “Mufin Player organizes songs by sound“, that describes mufin:

mufin’s music recommendations are based on the sound of the music itself. Only the similarity in sound decides whether a track is recommended or not.

Check it out: you don’t need to download any player in order to explore the 5M+ songs they’ve already indexed. I found the recommendations to be a bit erratic, but I’m intrigued by the concept, especially after being underwhelmed by Apple’s Genius recommendation engine. So far my preferred music exploration tool is Pandora, about which my only complaint is its limited repertoire.

I know we have music information retrieval experts in the house, and I’m sure we have lots of music consumers. I’m curious to hear what folks think of mufin. Tempting but half-baked?

By Daniel Tunkelang

High-Class Consultant.

27 replies on “A Sound Approach to Exploratory Music Search?”

Are you asking for a full breakdown/analysis of the music search/retrieval space — how all the various services work and what they’re trying to accomplish? Or do you want the specifics on Mufin?

For Mufin, see this Nov. 2008 post:

(and of course all the comments below it, mine included 🙂)

For the former question — that would take a separate blog post.

I think music is a great space for exploratory search, though.


Jeremy, thanks for the links. Not surprisingly, the moral of this story for me is that recommendation engines need to offer transparency–especially in such a highly multidimensional space, where the relative importance of features is so subjective.


Side note: I’ve been a passive reader of this blog for some time now (it’s only because of syndication that I very, very rarely comment). Keep up the good work Daniel!

Recently I’ve been developing a project that focuses on tracking information’s life cycle in particular environments, and I was considering including music, but I had to desist. Here’s why:

Meaning is in the eye of the beholder; this is true for any type of communicable information. With speech (and language for that matter), the ambiguity generated by trying to communicate an idea between dissonant agents is overcome by sheer brute force.

That is, we understand what others have to say, despite our unique perceptions of reality, simply because we talk a lot, all of us, and we accompany many of the things that surround us, and the things we do, with words. More importantly, we are protective of the conventions achieved by the act of talking with others all day, which is to say that we are more likely to stick to generally accepted conventions of what certain words or phrases mean, simply to increase the possibility that future communications will be effective in transmitting an idea.

But we approach music differently: we’re not protective when it comes to trying to tie the different elements that make up a musical composition to an idea, to information. Maybe because music is not exclusive, and we can sing words while playing an instrument, who knows? But the fact is that searching for information within music is pointless, because of the way we approach music.

Basically, if I’m listening to a song I may be listening to the lyrics, or the beat, or to particular instrumental arrangements, all of which (except lyrics) aren’t as frequently cross-checked with others as words are. And we must keep in mind that, just as with words, when I’m listening to a song my perception of it is directly linked to other songs I’ve heard, and to whatever within each song caught my attention.

So… A good recommendation engine for music is one that knows what I’ve heard before, and simply suggests that to me and to people with similar playlists. Period. Eventually it will display something I won’t like, and that’s OK, because anything other than that is just speculation, very broad speculation.

Think of it this way: what would a search engine be like if it tried to display the results I _liked_? If I searched for ‘car’, for example, it shouldn’t display car manufacturers and the corresponding page in Wikipedia or what have you, but the actual car I’m most likely to want to buy. Yep, it would kind of miss the point. Now imagine the same search engine, but querying for sounds instead of text. That’s the state of music recommendation engines.

Long story short: maybe what we want are more music stations (which, by the way, is what Pandora is trying to be), and fewer search engines for music we might like.


Guillermo, you had me until your conclusion. I’ve gotten great recommendations from means other than what you suggest–for example, artists whose styles are similar to or even influenced by ones I like. What is key is that recommendation systems can’t be opaque: rather, they need to explain to me *why* they are recommending something, in language that I can understand.

It’s a good idea for friends to do that too when they offer recommendations (for music or anything else). Friends have the advantage of much more common context, which makes the communication easier and more efficient. But, at heart, it’s the same problem.


You’re right, my conclusion wasn’t very good.

This is where I was trying to go: searching _within_ particular songs for the properties that identify musical taste in unique individuals is not the best solution, because of the complexity involved in trying to understand each individual’s interpretation of it.

Hence, music stations (which could be any social recommendation system for music, *but* are they _really_ search engines?). If people you trust recommend music to you, it’s possible you’ll hear it, and only then will you apply your own musical criteria.

That’s how I understand how all roads lead to Radiohead: it’s not their music that creates those connections.


So… A good recommendation engine for music is one that knows what I’ve heard before, and simply suggests that to me and to people with similar playlists. Period.

I strongly disagree. When I was younger, I used to teach social dance at a studio. Cha cha, rumba, waltz, samba, nightclub two step, swing/jive, west coast swing, salsa, etc. And if you went to a music store and bought a “ballroom dance music” CD, what you all-too-often got was a bunch of really bad, cheesy music that no one wanted to dance to.

Instead, most people wanted to dance to popular music. So the best things to play were things like “Open Arms” by Journey as your waltz music. Or “Lady Marmalade” for your cha cha. Or “Never Tear Us Apart” by INXS for your nightclub two step.

It was because of the information contained in that song that I needed a search engine to help me find other good songs with the same information: a cha-cha rhythm, a waltz time signature and tempo, etc.

Furthermore, I needed to have a transparent, open search engine, so that I could tell the engine exactly what it was about those songs that I liked. Not the timbre. Not the lyrics. But the tempo, the rhythmic structure, etc.

Is every single information need like this? No. Some needs probably could be solved by playlists alone. But many cannot. If I had another hour, I would list a bunch more scenarios. But the point is, just having more music stations isn’t the one single solution.


Hence, music stations (which could be any social recommendation system for music, *but* are they _really_ search engines?).

I think we need to get away from using the word “search”. Because search, unfortunately, has come to mean “known-item, precision-oriented, answer-oriented finding”. A.k.a. Google.

Take a step back in history, and let’s use the word “information retrieval”, instead. It’s a synonym for search. Information retrieval includes not only google-like precision-oriented search. It also includes recall-oriented search. It also includes TDT (topic detection and tracking). And routing and filtering.

With routing and filtering, the idea is that you set up a profile, and there are a stream of documents coming off the wire. The goal of the “routing and filtering information retrieval engine” is to put into your bucket all the documents that match your profile.

Substitute “documents” with “songs”, and you’ll see that it’s very much the scope of an “information retrieval engine” to do what we’re talking about. Right?
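The routing-and-filtering idea above can be sketched in a few lines. This is a hypothetical illustration, not any real engine’s API; the song fields (`tempo_bpm`, `rhythm`) and the tempo range are invented for the ballroom-dance scenario from the earlier comment:

```python
# A standing profile is matched against each item in an incoming stream,
# and matching items are routed into the user's bucket.

def make_profile(min_tempo, max_tempo, rhythm):
    """A standing query: a tempo range plus a rhythmic pattern."""
    def matches(song):
        return (min_tempo <= song["tempo_bpm"] <= max_tempo
                and song["rhythm"] == rhythm)
    return matches

def route(stream, profile):
    """Filter the incoming stream into the user's bucket."""
    return [song for song in stream if profile(song)]

incoming = [
    {"title": "Song A", "tempo_bpm": 124, "rhythm": "cha-cha"},
    {"title": "Song B", "tempo_bpm": 90,  "rhythm": "waltz"},
    {"title": "Song C", "tempo_bpm": 128, "rhythm": "cha-cha"},
]

cha_cha_profile = make_profile(118, 132, "cha-cha")
bucket = route(incoming, cha_cha_profile)  # Song A and Song C
```

The same `route` call works whether the stream holds documents or songs, which is the substitution being made here.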

And even though “search” really is “information retrieval”, unfortunately Google has skewed the public perception of the word. So yes, it really is the job of a “search engine” to do routing and filtering.


@jeremy: You’re right.

If what you want is to determine a beat, a tempo, or a particular pattern within a song, a transparent search engine will do.

And also yes, _information retrieval_ is a better term.

The problem I was presenting is akin to finding poems I may like. We could create a search engine that identifies meter or rhythm, or what have you, so I could query for a haiku and it could give me hundreds. The underlying problem would be identifying those poems I may like. To do that we could take a look at poems I have already read, we could identify authors and maybe themes… but the core problem would still be that I cannot identify the poems you may like because I am unaware of your environment, your history, and your current conditions. So all I could really do is come up with clever ways to present you poems without having to know what makes you, well…, you.

In that sense poems and songs are similar: the way I interpret them is not only unique to my upbringing, my environment, my current conditions, etc., but also differs enough from other people’s interpretations that finding general consensus among a group of individuals isn’t a trivial task.

Because of that, what I was proposing was that in order to find the music you may like, you simply need to hear more music, and _trust_ could be a way to address this issue.


The underlying problem would be identifying those poems I may like. To do that we could take a look at poems I have already read, we could identify authors and maybe themes… but the core problem would still be that I cannot identify the poems you may like because I am unaware of your environment, your history, and your current conditions.

But how is this any different than web search (ahem, I mean, web information retrieval) today?

Suppose I type in the query “bahamas vacation”. How does the search engine know which of the various vacation packages I am going to like?

Currently, what the search engine does is use popularity to make that determination. If a lot of other people like it, by pointing to it, then I may like it, too.

The problem, however, is that because of your environment, your history, and your current conditions, that recommendation might not (and probably does not) match what you are looking for, what you would like.

(See also what we wrote in paragraph 1 of section 2 of this paper.)

So I agree with you that these problems are hard problems. But you drew a distinction, a line in the sand, between music retrieval and web retrieval. (Quote: “But the fact is searching for information within music is pointless, because of the way we approach music.”)

And I’m just saying that line does not exist. Web search has the same fundamental problems that music search has, and vice versa.

I tend to agree with Daniel that the best way we can solve those problems, at least right now, is not to have a black-box search engine like Google that gives you no insight into why and how it ranks things the way it does. Rather, we need transparent search engines that merge both content and social context, and furthermore give us the ability to clearly steer those engines based on our own interactive feedback to the engines. Music engines need this, but web engines desperately need this, also.


I’m jumping into the discussion to provide my two cents…

There are a few fundamental problems with content-based music information retrieval that are hard to overcome.

One of them is bridging the semantic gap, the difference between what our algorithms can observe from the audio waveform (our set of features), and our subjective interpretation and the emotions the music creates inside us when we listen. I believe this is what @Guillermo is concerned with.

It’s almost impossible to predict a subjective sensation of, e.g., melancholy using only features extracted from a mix of musical sounds. Even genre is hard to determine objectively. There seems to be some kind of glass ceiling around 70-80% classification accuracy.

Classification is of course a different problem than recommendation and similarity search. Similarity is easier to work with than fixed decision boundaries (this or that).

Data sparsity will often produce weird recommendations, since two data points (songs) that are far apart may turn out to be the closest ones when no other songs fill the space between them. The quality of content-based recommendations generally increases with index size.
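A toy illustration of the sparsity point (not mufin’s actual algorithm, whose internals aren’t public; the feature vectors are invented): with only two songs indexed at opposite extremes of the feature space, a query from the empty middle still gets one of them as its “nearest” match, even though it is far away:

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, index):
    """Return the (title, vector) pair closest to the query."""
    return min(index, key=lambda item: distance(query, item[1]))

# A sparsely populated index: only two songs, at opposite extremes.
sparse_index = [
    ("mellow ballad", (0.1, 0.2)),
    ("speed metal",   (0.9, 0.95)),
]

# A query from the empty middle of the feature space.
mid_query = (0.5, 0.6)
title, vec = nearest(mid_query, sparse_index)
print(title, round(distance(mid_query, vec), 2))  # speed metal 0.53
```

The “closest” song is over half the diameter of the space away, which is exactly the erratic-recommendation effect described above; adding more songs shrinks that gap.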

Do I want transparency in a CB recommender? Does it help me to know that two songs have matching BPM or tonal profiles? Probably not (although it makes for good trivia). Audio features are even less useful as query parameters.

I think that the two most viable use cases for CB music information retrieval are recommendations, and similarity search paired with metadata text search. It’s not unusual to think of a song or an artist representing the music experience you’re looking for (some kind of prototype), and that is a natural starting point for a keyword search.

I did some research into semantic audio descriptors a few years ago. I found evidence that it’s possible to predict (with a certain accuracy) a subjective experience of intensity in music, using an objective model trained on a set of pre-classified examples.

I also experimented with a system that allowed you to train your own personal, highly subjective classification models using examples from your own music collection. You could manually tag 20 soft songs, and have the system suggest additional soft songs for you. Nothing conclusive came out of these experiments, but it seemed like we were able to make the data sparsity and semantic gap work for us. We made idiosyncratic models for idiosyncratic people 🙂
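A minimal sketch of that “train your own model” idea, assuming a nearest-centroid classifier over invented 2-D features (the actual features and model of the system described above aren’t specified here):

```python
# Build a centroid from the user's tagged "soft" examples, then suggest
# library songs whose features fall close to that centroid.

def centroid(vectors):
    """Mean vector of the user's tagged examples."""
    n = len(vectors)
    return tuple(sum(dim) / n for dim in zip(*vectors))

def suggest(collection, tagged_examples, threshold=0.15):
    """Return titles within `threshold` of the tagged centroid."""
    c = centroid(tagged_examples)
    def dist(v):
        return sum((x - y) ** 2 for x, y in zip(v, c)) ** 0.5
    return [title for title, vec in collection if dist(vec) <= threshold]

# Invented 2-D features (say, loudness and tempo, rescaled to [0, 1]).
tagged_soft = [(0.20, 0.10), (0.25, 0.15), (0.18, 0.12)]
library = [("quiet song", (0.22, 0.13)), ("loud song", (0.90, 0.80))]

suggestions = suggest(library, tagged_soft)  # ["quiet song"]
```

Because the model is trained only on one user’s tags, it is idiosyncratic by construction, which is the point being made.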



Yes, Brian Whitman over at the Echo Nest has been doing combination audio-content plus web-mined textual information since at least 2002 (at least, that’s the first time I saw him present that kind of work at ISMIR).

The point about transparency in the search engine is not that you (necessarily) expose the user to the amount of kurtosis in the FFT of the raw audio signal. Instead, transparency means that if you’re using the kurtosis and whatever other features, you come up with algorithms that use that information in a certain way. You use that information to classify by mood, or by genre, or whatever.

So for transparency, you simply give the user some way of weighting or ignoring or boosting or whatever, his or her relative interest in songs with that label.
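That weighting idea can be sketched as follows. This is a hypothetical scoring function, not Pandora’s or any real engine’s implementation; the labels and scores are invented:

```python
def score(song_scores, weights):
    """Weighted sum over labelled feature scores; unweighted labels count as 1.0."""
    return sum(weights.get(label, 1.0) * value
               for label, value in song_scores.items())

# Per-label scores the engine would show alongside a recommendation.
candidate = {"mood: melancholy": 0.8, "genre: ballroom": 0.3, "tempo match": 0.9}

default_score = score(candidate, {})                      # every label weighs 1.0
tuned_score = score(candidate, {"genre: ballroom": 0.0,   # "ignore genre"
                                "tempo match": 2.0})      # "boost tempo"
```

Transparency here means the user sees the labelled scores and can zero out or boost any of them, rather than seeing the raw signal features.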

Even Pandora does this. Pandora tells you why a song is being recommended (it has lots of male vibrato, etc.) and then lets you say whether you like that aspect/feature.

Even if the classification models are subjective, if a lot of a certain type of classification is appearing, and you also do not like all the songs that belong to that classification, a transparent engine should give you a way of saying “no”.


I wasn’t making claim to the work done by the people of Echo Nest. Hope it didn’t come out that way. I just think they are doing interesting things.

Ok, so transparency can also mean being able to control and correct the system (more like scrutability). I can see how that is useful.

I’m not a user of Pandora, but I believe it is possible to “starve” the recommender by providing lots of explicit feedback (like/dislike). The recommender may get locked into a very narrow space of the music universe, leading to very predictable (and perhaps boring) recommendations.

The point I want to make is this: would transparency be necessary if the recommender were flawless? Sure, technology isn’t perfect, but I believe that transparency and exploration are not core features of the recommendation user experience. They’re more of a hack addressing a flaw.

I wrote a separate post about that earlier tonight. I’d love to hear your thoughts 🙂


Sure, you can starve the recommender if you treat each user action as gospel law. Instead, if you use it as an opportunity to nudge the recommender, you may do better.

I just read your blogpost. Excellent!

So maybe the key is to strike a balance.. not too many choices, not too few choices.

I personally think that the too few/too many choice problem is not a line, with end-points. Rather, it is a wrap-around circle. Too few choices BECOMES too many choices.

For example, take Google. You enter your query terms, and you have too few choices.. too few methods or ways of altering the results that you get back. So what you are faced with is the prospect of going through all 750,000 results one at a time, to get the information that you need. 750,000 is too MANY choices. So what you’re left with, in Google’s case, is the Paradox of Choice.

Hmmm.. I should write a blogpost about that..


I’m curious: have there been studies that motivate the convention of returning 10 results per page? I imagine the number comes from a time when latency might have been a factor (today returning 100 results doesn’t add appreciable latency), but I also am curious if 10 is low enough to avoid choice overload.


@vegard: Precisely. I was trying to point out the difference between text-driven search and sound-driven search, and how treating sound-driven search like text-driven search won’t lead to good results. And finally, that all we could do seems to be a hack, because music, although it does carry pieces of information that could appear to be transferred between individuals, tends to be highly dissonant.

That being said, sure, these are all abstractions and I was drawing an imaginary line between these particular two (text and music) only because of the complexity of the issue as a whole. I can imagine scenarios where the complexity could be broken down to easily digestible chunks.

@daniel: About the convention of the 10 results: it seems to me that it’s simply a historical convention, but I’ve also seen the analysis Google has done regarding the different areas users tend to look at on its page, the so-called heat maps, or something of the like. Which might have led them to leave that convention as it was.


About the convention of the 10 results: it seems to me that it’s simply a historical convention, but I’ve also seen the analysis Google has done regarding the different areas users tend to look at on its page, the so-called heat maps, or something of the like. Which might have led them to leave that convention as it was.

But that is a self-fulfilling prophecy. That’s like saying that you’re going to build tracks for trains to run on.. observing that the trains only run on tracks.. and then concluding that your decision to build tracks was good, because that’s where the trains run.

I’ve sent a few pings out to some of my HCIR friends. I’ll try and find out whether there has been research done on this “10 links” issue.


I think, based on my own understanding and some articles I’ve read (I’m trying to track those down again), that we should increase the 10-results-per-page limit to probably 50.

We are not making optimal decisions when we’re scanning the list of search results, trying to decide whether to open up a page or not. Instead we’re satisficing, making a large number of sub-optimal choices. Satisficing takes less cognitive effort, so we’re able to handle more options than if our task was to carefully consider the pros and cons of clicking on every page link.

It also ties in with the infinite-scroll design pattern, like you know it from the iPhone. A page that just keeps going, and going, and going and…

Keeping everything on one page is good, because it makes it a lot easier to compare items than if they were located on separate pages. It’s the same reason you shouldn’t place things that need to be compared in separate tabs.

A design that I would like to see in the wild is one that helps me open multiple pages from one set of search results. Like many others, I typically ALT-click the interesting links, forcing Firefox to open them in background tabs. Then I go through each one before returning to the search page if I didn’t find what I was looking for.


I waffled between having Google show 10 and 100 results, before switching to Duck Duck Go. I hate paging, and I figured 100 was better than 10. That said, I don’t like choice overload either. And I hate the infinite scroll, because the decision to stop scrolling is a constant battle with anticipatory regret. I really want a search engine to return a well-defined set of results, not an endless slippery slope.


That’s what I mean about the paradox of choice. Having an undifferentiated list of 750,000 results, whether that list is broken up into chunks of 10, 50, 100, or infinitely-scrolled, is the ultimate in crazy-making choice overload. You are constantly, always faced with the decision of whether or not you should keep going in the list (choice, choice, choice) or whether you should stop and try another query.

What I really want is a search engine that tells me *why* things were retrieved, and not just what was retrieved. To me, that’s exploratory search. Being able to see the “why” then lets me make my decisions about whether to keep looking, or to give up.


Comments are closed.