AMAAI Webinar - Disentangled representation learning by GM-VAEs by Yin-Jyun Luo

AMAAI Webinar http://dorienherremans.com/webinars

Webinar on disentangled representation learning with GM-VAEs by Yin-Jyun Luo, SUTD

Title: Disentangled Representation Learning Using Gaussian-Mixture Variational Auto-encoders: Applications for Synthesis and Conversion of Musical Signals

Abstract: Disentangled representation learning aims to uncover the generative factors of variation of data. This could enable analysis of interpretable features and synthesis of novel data. In the context of deep learning, variational auto-encoders (VAEs) are one of the most popular frameworks for learning disentangled representations. VAEs describe a data-generating process that first samples a latent variable from a prior distribution, and then samples an observation from a distribution conditioned on the latent variable. Training a VAE thus captures disentangled representations in the latent variable. In this talk, we present a VAE that learns significant factors of variation for either isolated musical instrument sounds or expressive singing voices [1, 2]. In particular, we exploit a Gaussian-mixture prior distribution for the latent variables of interest, thereby capturing the multi-modality of the data. We verify and demonstrate the model's capability for controllable attribute synthesis and conversion.

[1] Yin-Jyun Luo, Kat Agres, Dorien Herremans. "Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders," ISMIR 2019. https://dorienherremans.com/sites/default/files/1912.02613_0.pdf
[2] Yin-Jyun Luo, Chin-Cheng Hsu, Kat Agres, Dorien Herremans. "Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders," ICASSP 2020. https://dorienherremans.com/sites/default/files/jyun-ismir.pdf
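For readers unfamiliar with the setup described in the abstract, below is a minimal, illustrative PyTorch sketch of a VAE with a Gaussian-mixture prior, where each mixture component is tied to a discrete attribute class (e.g. pitch or instrument). It is not the authors' implementation from [1, 2]; the input dimensions, network sizes, and unit-variance prior are simplifying assumptions, and all names are hypothetical.

```python
# Minimal GM-VAE sketch (illustrative only, not the implementation from [1, 2]).
# Assumption: x is a flattened spectrogram frame of size INPUT_DIM, and each
# mixture component corresponds to one attribute class (e.g. pitch or instrument).
import torch
import torch.nn as nn
import torch.nn.functional as F

INPUT_DIM, LATENT_DIM, N_COMPONENTS, HIDDEN = 512, 16, 10, 256

class GMVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder q(z|x): diagonal-Gaussian posterior over the latent variable.
        self.enc = nn.Sequential(nn.Linear(INPUT_DIM, HIDDEN), nn.ReLU())
        self.mu = nn.Linear(HIDDEN, LATENT_DIM)
        self.logvar = nn.Linear(HIDDEN, LATENT_DIM)
        # Gaussian-mixture prior p(z|y): one learnable mean per component
        # (unit variance is assumed here for simplicity).
        self.prior_mu = nn.Parameter(torch.randn(N_COMPONENTS, LATENT_DIM))
        # Decoder p(x|z).
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, HIDDEN), nn.ReLU(),
                                 nn.Linear(HIDDEN, INPUT_DIM))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z from q(z|x).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

    def loss(self, x, y):
        """Negative ELBO, with the prior component selected by the class label y."""
        x_hat, mu, logvar = self(x)
        recon = F.mse_loss(x_hat, x, reduction="sum") / x.size(0)
        # Closed-form KL( N(mu, sigma^2) || N(prior_mu[y], I) ) per dimension.
        prior_mu = self.prior_mu[y]
        kl = 0.5 * (torch.exp(logvar) + (mu - prior_mu) ** 2 - 1.0 - logvar)
        return recon + kl.sum(dim=1).mean()

# Usage sketch:
#   x = torch.randn(8, INPUT_DIM)
#   y = torch.randint(0, N_COMPONENTS, (8,))
#   model = GMVAE()
#   model.loss(x, y).backward()
```

Conditioning the prior mean on the attribute label is what makes attribute synthesis and conversion controllable: swapping the component mean (e.g. to a different pitch or singer class) and decoding yields a converted sample, which is the idea the talk demonstrates.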