Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms

Jiralerspong, Marco; Gidel, Gauthier

(Submitted on 25 Jun 2022)

Abstract: We describe our approach for the generative emotional vocal burst task (ExVo Generate) of the ICML Expressive Vocalizations Competition. We train a conditional StyleGAN2 architecture on mel-spectrograms of preprocessed versions of the audio samples. The mel-spectrograms generated by the model are then inverted back to the audio domain. Our generated samples substantially improve upon the competition baseline, both qualitatively and quantitatively, for all emotions. More precisely, even for our worst-performing emotion (awe), we obtain a Fréchet Audio Distance (FAD) of 1.76 compared to the baseline's 4.81 (for reference, the FAD between the train and validation sets for awe is 0.776).

Comments:	To be published at the ICML Expressive Vocalizations Workshop and Competition (ExVo Generate) held in conjunction with the 39th International Conference on Machine Learning
Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2206.12563 [cs.SD]
	(or arXiv:2206.12563v1 [cs.SD] for this version)

Submission history

From: Marco Jiralerspong
[v1] Sat, 25 Jun 2022 05:39:52 GMT (274kb,D)
