Music, language and recommender systems

Recommendations play an important role in navigating our online worlds, as the vast amount of content available makes it impossible to rely on conventional navigation aids like hierarchical menus and search features alone.

Recommender systems are a family of content filters that use recommendations to highlight content that might interest the user and hide content that does not. There are many ways to generate recommendations, and systems can rely on many different types of data.

I have worked on recommender systems in a number of domains (movies, concerts, music). In this post, I will introduce two families of recommender systems. Afterwards, I will discuss some of the problems that arise when making musical recommendations.

Recommending music

Music recommendation has long been an integral part of our music culture. Listeners have relied on recommendations from, for example, music journalists when discovering new music. Schumann famously recommended Chopin to the readers of Allgemeine Musikalische Zeitung with the remark: “Hats off, gentlemen, a genius”.


But with the amount of music available online today, new ways to recommend music automatically are being developed. This has led to recommender systems being built into popular services like Spotify and iTunes.

Collaborative filtering

One way to make recommendations is to compare the listening habits of users. This approach is called collaborative filtering and is based upon establishing similarity between users. The assumption is that users that have behaved alike in the past will probably have comparable taste in music. If user A listens to many of the same tracks as user B, any music unique to B’s library can be recommended to user A.
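The idea can be sketched in a few lines of Python. This is a minimal illustration, not how any particular service works: the user names, the track identifiers and the choice of Jaccard overlap as the similarity measure are all assumptions made for the example.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two libraries: shared tracks / all tracks."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, libraries: dict) -> list:
    """Suggest tracks from the most similar other user's library."""
    mine = libraries[target]
    others = {user: lib for user, lib in libraries.items() if user != target}
    # Pick the user whose library overlaps most with the target's.
    nearest = max(others, key=lambda user: jaccard(mine, others[user]))
    # Anything the neighbour has that the target lacks is a candidate.
    return sorted(others[nearest] - mine)


# Hypothetical listening data: user -> set of track identifiers.
libraries = {
    "A": {"t1", "t2", "t3"},
    "B": {"t1", "t2", "t3", "t4"},
    "C": {"t3", "t9"},
}

recommendations = recommend("A", libraries)
```

With this toy data, user B is the nearest neighbour of user A (an overlap of 0.75), so the only recommendation is the track unique to B's library, t4. A real system would look at many neighbours and weight their contributions rather than trusting a single one.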

Similarity can be established in a number of ways by combining different data sources that feed into a model of the user. Some data, like behavioral data, can be collected implicitly without the user’s knowledge, but a system can also explicitly ask the user to describe his or her musical taste, for example by selecting favorite genres and musical idols or by rating music.

Implicit and explicit data collection

Some years ago, I maintained a movie recommender system that used both implicit and explicit data collection. The system explicitly asked the user about favorite genres and actors when the user signed up. This allowed recommendations to be made even when no behavioral data was available. As the user bought tickets through the system, the explicitly collected data was augmented with usage data.
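One way such a blend could work is sketched below. It is a simplified illustration rather than the system described above: the function name, the `prior_weight` parameter and the pseudo-count scheme are all assumptions made for the example.

```python
from collections import Counter


def genre_profile(explicit_genres, purchased_genres, prior_weight=3.0):
    """Blend sign-up answers with observed purchases into genre weights.

    Each explicitly chosen genre counts as `prior_weight` pseudo-purchases,
    so the stated taste dominates at first and fades as usage data grows.
    """
    counts = Counter(purchased_genres)
    for genre in explicit_genres:
        counts[genre] += prior_weight
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()} if total else {}


# New user: only the explicit answers are available.
new_user = genre_profile({"comedy"}, [])

# Later: the same user has bought nine thriller tickets.
regular = genre_profile({"comedy"}, ["thriller"] * 9)
```

For the new user, the profile consists entirely of the stated preference. After nine thriller purchases, the stated comedy preference still counts, but only for a quarter of the profile.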

Collaborative recommenders are used by music services like Spotify and iTunes, where your playlists and listening habits are compared to those of other users. It is basically an automatic version of comparing your record collection with your friends’ collections to find new music, but with the whole world as potential friends.


Such comparison systems are relatively easy to implement because the recommendations are based on easily quantifiable user data like number of plays and playlist positions. The system needs no knowledge of the music it recommends. It is content-agnostic and can be used to recommend any type of item, as long as it can compare the usage patterns of the individual users.

Another point of interest is that collaborative filtering can recommend tracks that are musically diverse. This makes it possible to recommend clusters of tracks that have little musical common ground, but are liked by the same kind of users due to some extra-musical characteristics.

As a 38-year-old Dane, I have an interest in Danish music and in music from the 90s, when my musical taste was formed. It would be hard for a recommender system to establish a typical Danish or 90s sound, if such a sound even exists. Yet categories like these clearly play a role in musical taste. By comparing me to other Danish users, a collaborative filter can recommend music based upon biographical data which does not translate into sound in any simple way.

Content-based recommendations

Rather than comparing users, some recommender systems, such as Pandora, compare feature sets (called feature vectors) extracted from the content items themselves. The assumption behind such content-oriented recommendation is that similar tracks appeal to the same users. If user A likes a track and another track has a similar feature vector, chances are that user A will like the second track as well.
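A minimal sketch of this comparison follows. The track names, the three features and their values are invented for the example, and cosine similarity is just one common way of comparing vectors. It is not a description of how Pandora or any other service actually does it.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(liked, catalog):
    """Return the catalog track whose features are closest to `liked`."""
    candidates = {name: vec for name, vec in catalog.items() if name != liked}
    return max(candidates, key=lambda name: cosine(catalog[liked], candidates[name]))


# Hypothetical features, each scaled to 0..1: (tempo, distortion, acousticness)
catalog = {
    "grunge_track": (0.6, 0.9, 0.1),
    "metal_track": (0.8, 1.0, 0.0),
    "folk_track": (0.3, 0.0, 1.0),
}

suggestion = most_similar("grunge_track", catalog)
```

With these made-up numbers, the metal track is the closest match to the grunge track, while the folk track ends up far away. Note that the result depends entirely on which features are chosen and how they are scaled.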

The challenge is to select a feature vector with predictive potential. It can include characteristics like genre, instrumentation, tempo or tonality, which can be extracted either manually (like Pandora) or automatically via machine learning (like Syntonetics Moodagent). The latter belongs to the field of Music Information Retrieval, an area of much research.

The advantage of a content-oriented recommender is that it can recommend brand-new music without any history of use, which avoids the so-called cold-start problem.


Recommendations based on user behavior also raise privacy concerns. A famous example is an American jeweler that sent Christmas letters to its most loyal customers. Unfortunately, this led to a number of angry wives who had never seen any of the jewels.

Yet content filtering can be problematic because of the heterogeneity of much music, both within a single track and between the tracks on an album. If a user likes a track, which of its features exactly does the user like? A track’s features can vary enormously over time, which calls for some way of finding the most salient point in a song and then fingerprinting the track at that point.

But the biggest downside to content-oriented filtering is that it can only be used to recommend similar music. When considering intra-musical features, one must ask whether these features have much value when recommending music.

I like the band Soundgarden, regardless of features like the timbre and mood of the individual song. On the other hand, a band like Megadeth, which may share many of the same features, holds no interest for me. These preferences could probably be established more clearly using collaborative filtering techniques.

Interacting with the recommender

Some systems simply recommend music, giving the user minimal control and little explanation of why a specific track has been recommended. Other systems give the user detailed control over the recommendation parameters. An example is Syntonetics Moodagent, which lets the user interact with the recommendation criteria using a set of sliders.
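A slider interface of this kind can be thought of as a query against the feature vectors of the tracks. The sketch below is a guess at the general principle, not a description of how Moodagent works: the mood dimensions, the scores and the use of squared distance are all assumptions made for the example.

```python
def rank_by_sliders(sliders, catalog):
    """Order tracks by squared distance between their moods and the sliders."""
    def distance(features):
        return sum((features[name] - value) ** 2 for name, value in sliders.items())
    return sorted(catalog, key=lambda track: distance(catalog[track]))


# Hypothetical mood scores per track, each between 0 and 1.
catalog = {
    "track_a": {"happy": 0.9, "tender": 0.2, "angry": 0.1},
    "track_b": {"happy": 0.1, "tender": 0.1, "angry": 0.9},
    "track_c": {"happy": 0.5, "tender": 0.8, "angry": 0.0},
}

# The user drags "happy" up and "angry" down; "tender" is left untouched.
playlist = rank_by_sliders({"happy": 1.0, "angry": 0.0}, catalog)
```

Only the sliders the user has actually set take part in the distance, so an untouched dimension does not influence the ranking. Here the happiest, least angry track comes first and the angry one last.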

As users interact with the latter kind of system, they face a semantic gap between how music is experienced and how music is described using everyday language. This is a problem as old as music itself: how to describe music using words.

When we describe musical experiences, we often resort to metaphors. Music can be wild, heavy, sad or happy, and a vocal can be sweet, haunting or troubled, and so on. Such metaphors describe musical features by comparing them to experiences from outside the world of music.

Descriptive vs. prescriptive metaphors

In my master’s thesis, I worked with the ramifications of metaphors in descriptions of music. I identified a number of metaphor types: production-oriented metaphors tend to describe music by the way it is produced, while reception-oriented metaphors describe how the music is listened to. Descriptive metaphors describe the way music actually sounds, while prescriptive metaphors describe how music should be listened to.

An interactive recommender system must allow users to communicate using metaphors. As metaphors are based upon knowledge from non-musical domains, they may appear overly complicated. But metaphors are integral to the way we conceptualize the musical experience, and this must be taken into account when designing recommenders.

Music professionals have a more elaborate vocabulary, with non-metaphorical descriptions like those used in musical analysis. While a specialized language might yield more precise descriptions, it also risks alienating the average user.


Concluding remarks

In the real world, recommendations are often made using a combination of collaborative and content-oriented filtering. This makes sense, as our musical taste is not a firm set of objective auditory criteria but a mesh of cultural habits, grounded in a lot more than what meets the ear.

By combining different approaches, a recommender system can model the user’s preferences using both intra- and extra-musical features and recommend similar as well as dissimilar music.
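The simplest possible combination is a weighted average of the scores from the two kinds of recommender. The sketch below assumes that each recommender has already produced a score between 0 and 1 per track. The track names, the scores and the `alpha` parameter are invented for the example, and real hybrid systems are usually considerably more elaborate.

```python
def hybrid_scores(collab, content, alpha=0.5):
    """Weighted blend of two score dictionaries (track -> score in 0..1).

    alpha = 1.0 trusts only the collaborative scores, alpha = 0.0 only the
    content scores. A track missing from one source scores 0.0 there.
    """
    tracks = set(collab) | set(content)
    return {
        t: alpha * collab.get(t, 0.0) + (1 - alpha) * content.get(t, 0.0)
        for t in tracks
    }


# Hypothetical scores for one user from the two kinds of recommender.
collab = {"danish_90s_hit": 0.9, "similar_sounding": 0.2}
content = {"similar_sounding": 0.8, "brand_new_release": 0.7}

scores = hybrid_scores(collab, content, alpha=0.5)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example, the brand-new release has no listening history and the Danish 90s hit has no matching content score, yet both still appear in the ranking because each is picked up by one of the two sources.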

Posted in C#.
