Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling

Tan, Hao Hao; Herremans, Dorien

(Submitted on 29 Jul 2020)

Abstract: High-level musical qualities (such as emotion) are often abstract, subjective, and hard to quantify. Given these difficulties, it is not easy to learn good feature representations with supervised learning techniques, either because of the insufficiency of labels, or the subjectiveness (and hence large variance) in human-annotated labels. In this paper, we present a framework that can learn high-level feature representations with a limited amount of data, by first modelling their corresponding quantifiable low-level attributes. We refer to our proposed framework as Music FaderNets, which is inspired by the fact that low-level attributes can be continuously manipulated by separate "sliding faders" through feature disentanglement and latent regularization techniques. High-level features are then inferred from the low-level representations through semi-supervised clustering using Gaussian Mixture Variational Autoencoders (GM-VAEs). Using arousal as an example of a high-level feature, we show that the "faders" of our model are disentangled and change linearly w.r.t. the modelled low-level attributes of the generated output music. Furthermore, we demonstrate that the model successfully learns the intrinsic relationship between arousal and its corresponding low-level attributes (rhythm and note density), with only 1% of the training set being labelled. Finally, using the learnt high-level feature representations, we explore the application of our framework in style transfer tasks across different arousal states. The effectiveness of this approach is verified through a subjective listening test.
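
As an illustration of the "sliding fader" idea described in the abstract, the sketch below sweeps one latent dimension over a range of values while holding the others fixed. It is not the authors' implementation: the latent size, the choice of dimension 0 as a note-density fader, and the function name `fader_sweep` are assumptions made for this example, and the decoder that would turn each latent vector into music is omitted.

```python
def fader_sweep(z, dim, values):
    """Return one copy of latent vector z per value, with z[dim] set to that value.

    All other dimensions are left unchanged, which mimics moving a single
    fader while the remaining controls stay where they are.
    """
    sweep = []
    for v in values:
        moved = list(z)   # copy so the original latent vector is not modified
        moved[dim] = v    # slide only the chosen "fader"
        sweep.append(moved)
    return sweep


# Hypothetical 4-dimensional latent code; dimension 0 stands in for a
# note-density fader (an assumption made for this example).
z = [0.3, -1.2, 0.5, 0.0]
positions = [-2.0, -1.0, 0.0, 1.0, 2.0]
batch = fader_sweep(z, dim=0, values=positions)
```

In a disentangled model of the kind the abstract describes, decoding each vector in `batch` would be expected to change only the attribute tied to the swept dimension.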

Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
Journal reference:	Proc. of 21st International Society of Music Information Retrieval Conference, ISMIR 2020
Cite as:	arXiv:2007.15474 [eess.AS]
	(or arXiv:2007.15474v1 [eess.AS] for this version)

Submission history

From: Hao Hao Tan
[v1] Wed, 29 Jul 2020 16:01:45 GMT (15114kb,D)
