Machine Learning for Music Discovery
Erik Schmidt · Oriol Nieto · Fabien Gouyon · Katherine Kinnaird · Gert Lanckriet

Sat Jun 15 08:30 AM -- 06:00 PM (PDT) @ 204

The ever-increasing size and accessibility of vast music libraries have created a greater demand than ever for artificial systems that are capable of understanding, organizing, or even generating such complex data. While this topic has received relatively marginal attention within the machine learning community, it has been an area of intense focus within the Music Information Retrieval (MIR) community. Although significant progress has been made, these problems remain far from solved.

Furthermore, the recommender systems community has made great advances in terms of collaborative feedback recommenders, but these approaches suffer strongly from the cold-start problem. As such, recommendation techniques often fall back on content-based machine learning systems, but defining musical similarity is extremely challenging, as myriad features all play some role (e.g., cultural, emotional, timbral, rhythmic). Thus, machines must actually understand music to achieve an expert level of music recommendation.

On the other side of this problem sits the recent explosion of work in the area of machine creativity. Relevant examples include Google Magenta and the startup Jukedeck, both of which seek to develop algorithms capable of composing and performing completely original (and compelling) works of music. These algorithms require a similarly deep understanding of music and present challenging new problems for the machine learning and AI community at large.

This workshop proposal is timely in that it will bridge these separate pockets of otherwise closely related research. In addition to making progress on the challenges above, we hope to engage the wider AI and machine learning community with our nebulous problem space, and connect them with the many available datasets the MIR community has to offer (e.g., Audio Set, AcousticBrainz, Million Song Dataset), which offer near commercial scale to the academic research community.

Sat 9:00 a.m. - 10:00 a.m.

In this talk, I'll be discussing a few key differences between recommending music and recommending movies or TV shows, and how these differences can lead to vastly different designs, approaches, and algorithms to find the best possible recommendation for a user. On the other hand, I'll also discuss some common challenges and some of our recent research on these topics, such as better understanding the impact of a recommendation, enabling better offline metrics, or optimizing for longer-term outcomes. Most importantly, I'll try to leave a lot of time for questions and discussions.

Yves Raimond
Sat 10:00 a.m. - 11:00 a.m.
Poster Presentations (Part 1) (Poster Session)
Ruchit Agrawal, Jeong Choi, Siddharth Gururani, Nima Hamidi, Harsh Jhamtani, Radha Manisha Kopparti, Ben Krause, Jongpil Lee, Ashis Pati, Fedor Zhdanov
Sat 11:00 a.m. - 11:20 a.m.

Many tasks in audio-based music analysis require building mappings between complex representational spaces, such as the input audio signal (or spectral representation), and structured, time-varying output such as pitch, harmony, instrumentation, rhythm, or structure. These mappings encode musical domain knowledge, and involve processing and integrating knowledge at multiple scales simultaneously. It typically takes humans years of training and practice to master these concepts, and as a result, data collection for sophisticated musical analysis tasks is often costly and time-consuming. With little reliably annotated data available, it can be difficult to build robust models to automate music annotation by computational means. However, musical problems often exhibit a great deal of structure, either in the input or output representations, or even between related tasks, which can be effectively leveraged to reduce data requirements. In this talk, I will survey several recent manifestations of this phenomenon across different music and audio analysis problems, drawing on recent work from the NYU Music and Audio Research Lab.

Brian McFee
Sat 11:20 a.m. - 11:40 a.m.
Two-level Explanations in Music Emotion Recognition (Accepted Talk)
Verena Haunschmid
Sat 11:40 a.m. - 12:00 p.m.

We seek to identify musical correlates of real-world discovery behavior by analyzing users' audio identification queries from the Shazam service. Recent research has shown that such queries are not uniformly distributed over the course of a song, but rather form clusters that may implicate musically salient events. Using a publicly available dataset of Shazam queries, we extend this research and examine candidate musical features driving increases in query likelihood. Our findings suggest a relationship between musical novelty -- including but not limited to structural segmentation boundaries -- and ensuing peaks in discovery-based musical engagement.

Blair Kaneshiro
Sat 12:00 p.m. - 12:20 p.m.
NPR: Neural Personalised Ranking for Song Selection (Accepted Talk)
Mark Levy
Sat 2:00 p.m. - 2:20 p.m.

I’ll give an overview of some of the projects that we are working on to make Amazon Music more personalized for our customers. Projects include personalized speech and language understanding for voice search, personalizing “Play Music” requests on Alexa, and work with traditional recommender models as building blocks for many customer experiences.

Kat Ellis
Sat 2:20 p.m. - 2:40 p.m.
A Model-Driven Exploration of Accent Within the Amateur Singing Voice (Accepted Talk)
Camille Noufi
Sat 2:40 p.m. - 3:00 p.m.

Melody extraction has been an active topic of research in Music Information Retrieval for decades now. And yet, what is a melody? As a community we still (mostly) shy away from this question, resorting to definition-by-annotation. How well do past/present/future algorithms perform? Despite known limitations with existing datasets and metrics, we still (mostly) stick to the same ones. And last but not least, why do melody extraction at all? Despite great promise (e.g. query-by-humming, large-scale musicological analyses, etc.), melody extraction has seen limited application outside of MIR research. In this talk I will present three problems that are common to several research areas in music informatics: the challenge of trying to model ambiguous musical concepts by training models with somewhat arbitrary reference annotations, the lack of model generalization in the face of small, low-variance training sets, and the possible disconnect between parts of the music informatics research community and the potential users of the technologies it produces.

Justin Salamon
Sat 3:00 p.m. - 4:30 p.m.
Poster Presentations (Part 2) (Poster Session)
Ruchit Agrawal, Jeong Choi, Siddharth Gururani, Nima Hamidi, Harsh Jhamtani, Radha Manisha Kopparti, Ben Krause, Jongpil Lee, Ashis Pati, Fedor Zhdanov
Sat 4:30 p.m. - 4:50 p.m.

Musical statements can be interpreted in performance with a wide variety of stylistic and expressive inflections. We explore how different musical characters are performed based on an extension of the basis function models, a data-driven framework for expressive performance. In this framework, expressive dimensions such as tempo, dynamics and articulation are modeled as a function of score features, i.e. numerical encodings of specific aspects of a musical score, using neural networks. By allowing the user to weight the contribution of the input score features, we show that predictions of expressive dimensions can be used to express different musical characters.

Zhengshan Shi
Sat 4:50 p.m. - 5:10 p.m.
Interactive Neural Audio Synthesis (Accepted Talk)
Hanoi Hantrakul
Sat 5:10 p.m. - 5:30 p.m.
Visualizing and Understanding Self-attention based Music Tagging (Accepted Talk)
Minz Won
Sat 5:30 p.m. - 5:50 p.m.
A CycleGAN for style transfer between drum & bass subgenres (Accepted Talk)
Len Vande Veire

Author Information

Erik Schmidt (Pandora)
Oriol Nieto (Pandora)
Fabien Gouyon (Pandora)
Katherine Kinnaird (Smith College)
Gert Lanckriet (Amazon/UCSD)
