Self-supervised speech unit discovery from articulatory and acoustic features using VQ-VAE

Georges, Marc-Antoine; Schwartz, Jean-Luc; Hueber, Thomas

(Submitted on 17 Jun 2022)

Abstract: The human perception system is often assumed to recruit motor knowledge when processing auditory speech inputs. Using articulatory modeling and deep learning, this study examines how this articulatory information can be used for discovering speech units in a self-supervised setting. We used vector-quantized variational autoencoders (VQ-VAE) to learn discrete representations from articulatory and acoustic speech data. In line with the zero-resource paradigm, an ABX test was then used to investigate how the extracted representations encode phonetically relevant properties. Experiments were conducted on three different corpora in English and French. We found that articulatory information tends to organise the latent representations in terms of place of articulation, whereas the speech acoustics mainly structure the latent space in terms of manner of articulation. We show that an optimal fusion of the two modalities can lead to a joint representation of these phonetic dimensions that is more accurate than either modality considered individually. Since articulatory information is usually not available in practical situations, we finally investigate the benefit it provides when inferred from the speech acoustics in a self-supervised manner.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2206.08790 [cs.CL]
	(or arXiv:2206.08790v1 [cs.CL] for this version)

Submission history

From: Marc-Antoine Georges
[v1] Fri, 17 Jun 2022 14:04:24 GMT (340kb,D)
