Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR

Authors: Leda Sarı, Niko Moritz, Takaaki Hori, Jonathan Le Roux

(Submitted on 14 Feb 2020)

Abstract: We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR). The proposed model contains a memory block that holds speaker i-vectors extracted from the training data and reads relevant i-vectors from the memory through an attention mechanism. The resulting memory vector (M-vector) is concatenated to the acoustic features or to the hidden layer activations of an E2E neural network model. The E2E ASR system is based on the joint connectionist temporal classification and attention-based encoder-decoder architecture. M-vectors and i-vectors are compared when inserted at different layers of the encoder neural network, using the WSJ and TED-LIUM2 ASR benchmarks. We show that M-vectors, which do not require an auxiliary speaker embedding extraction system at test time, achieve word error rates (WERs) similar to those of i-vectors for single-speaker utterances and significantly lower WERs for utterances that contain speaker changes.
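The memory read described in the abstract can be illustrated with a minimal sketch. It assumes a per-frame query vector that has already been projected into the i-vector space (how the paper computes the query is not described in the abstract) and uses scaled dot-product scoring with a softmax; the paper's actual scoring function may differ (neural Turing machines, for example, typically use cosine similarity). The function names `read_m_vector` and `augment_frames` are hypothetical. Using one query per frame lets the M-vector change within an utterance, which is one plausible way speaker changes could be tracked.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_m_vector(query: Vector, memory: List[Vector]) -> Vector:
    """Attention read over a memory of training-speaker i-vectors (hypothetical sketch).

    Assumption: scaled dot-product scoring between the query and each stored
    i-vector; the abstract does not specify the scoring function.
    """
    d = len(query)
    scores = [sum(q * v for q, v in zip(query, slot)) / math.sqrt(d) for slot in memory]
    weights = softmax(scores)
    # M-vector: attention-weighted sum of the stored i-vectors.
    return [sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(d)]


def augment_frames(frames: List[Vector], queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    # Concatenate a per-frame M-vector to each frame; the frames may be acoustic
    # features or hidden-layer activations, as the abstract describes.
    return [frame + read_m_vector(q, memory) for frame, q in zip(frames, queries)]
```

With a two-slot memory `[[1.0, 0.0], [0.0, 1.0]]`, a zero query attends equally to both slots and yields the M-vector `[0.5, 0.5]`, while a query aligned with the first slot puts nearly all of the attention weight on that speaker's i-vector. No speaker embedding extractor is run at inference in this sketch; only the stored training i-vectors are read.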

Comments:	To appear in Proc. ICASSP 2020
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2002.06165 [eess.AS]
	(or arXiv:2002.06165v1 [eess.AS] for this version)

Submission history

From: Niko Moritz
[v1] Fri, 14 Feb 2020 18:31:31 GMT (75kb,D)
