Computer Science > Computation and Language
Title: Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models
(Submitted on 1 Nov 2018 (v1), last revised 15 Apr 2019 (this version, v2))
Abstract: We investigate unsupervised models that can map a variable-duration speech segment to a fixed-dimensional representation. In settings where unlabelled speech is the only available resource, such acoustic word embeddings can form the basis for "zero-resource" speech search, discovery and indexing systems. Most existing unsupervised embedding methods still use some supervision, such as word or phoneme boundaries. Here we propose the encoder-decoder correspondence autoencoder (EncDec-CAE), which, instead of true word segments, uses automatically discovered segments: an unsupervised term discovery system finds pairs of words of the same unknown type, and the EncDec-CAE is trained to reconstruct one word given the other as input. We compare it to a standard encoder-decoder autoencoder (AE), a variational AE with a prior over its latent embedding, and downsampling. EncDec-CAE outperforms its closest competitor by 24% relative in average precision on two languages in a word discrimination task.
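As a rough illustration of the simplest baseline named in the abstract, downsampling, the sketch below maps segments of different durations to vectors of the same dimension. It assumes the segment is a list of per-frame feature vectors, that frames are picked at evenly spaced positions, and that the picked frames are concatenated. The function name `downsample_embedding`, the `n_samples` parameter and the toy feature values are hypothetical and not taken from the paper, whose exact procedure may differ.

```python
def downsample_embedding(segment, n_samples=10):
    """Map a variable-length list of frames to one fixed-length vector.

    segment: list of frames, each a list of floats of equal length.
    n_samples: number of frames to keep (hypothetical default).
    """
    n_frames = len(segment)
    if n_samples == 1:
        indices = [0]
    else:
        # Evenly spaced positions from the first to the last frame;
        # short segments simply repeat some frames.
        step = (n_frames - 1) / (n_samples - 1)
        indices = [min(n_frames - 1, round(i * step)) for i in range(n_samples)]
    # Concatenate the selected frames into a single flat vector.
    return [value for i in indices for value in segment[i]]


# Two toy segments with different durations and 13 features per frame.
short_segment = [[float(t)] * 13 for t in range(25)]
long_segment = [[float(t)] * 13 for t in range(80)]

print(len(downsample_embedding(short_segment)), len(downsample_embedding(long_segment)))
# prints: 130 130
```

Both outputs have 10 x 13 = 130 dimensions regardless of segment duration, which is what allows segments to be compared with a simple vector distance.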
Submission history
From: Herman Kamper
[v1] Thu, 1 Nov 2018 14:17:01 GMT (190kb,D)
[v2] Mon, 15 Apr 2019 14:28:07 GMT (190kb,D)