LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

Miao, Xiaoxiao; McLoughlin, Ian

(Submitted on 19 Dec 2019)

Abstract: This paper presents a novel Dialect Identification (DID) system developed for the Fine-grained Arabic Dialect Identification task of the fifth edition of the Multi-Genre Broadcast challenge (MGB-5 ADI Challenge). The system improves upon the performance of a traditional DNN x-vector system by employing a Convolutional and Long Short-Term Memory-Recurrent (CLSTM) architecture. This architecture combines a convolutional neural network front-end for feature extraction with a recurrent neural network back-end that captures longer temporal dependencies. Furthermore, we investigate intensive augmentation of one low-resource dialect in the highly unbalanced training set using time-scale modification (TSM). TSM converts an utterance into several time-stretched or time-compressed versions, which are then used to train the CLSTM system without drawing on any other corpus. We also investigate conventional speech augmentation using the MUSAN and RIR datasets to increase the quantity and diversity of the existing training data. Results show, firstly, that the CLSTM architecture outperforms a traditional DNN x-vector implementation; secondly, that TSM-based speed perturbation yields a small performance improvement on the unbalanced data; and finally, that traditional data augmentation techniques yield further benefit, in line with evidence from related speaker and language recognition tasks. Our system ranked 2nd out of 15 entries in the MGB-5 ADI challenge, presented at ASRU 2019.
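The abstract describes TSM only at a high level. As an illustration of how one utterance can be turned into several time-stretched or time-compressed copies, here is a minimal NumPy sketch of plain overlap-add (OLA) time-scale modification, one of the simplest TSM algorithms. It is not necessarily the TSM method used in the paper. The function name `ola_time_stretch`, the frame length (512 samples), hop size (128 samples), 16 kHz sampling rate and stretch factors (0.9, 1.1) are all hypothetical choices made for this example.

```python
import numpy as np


def ola_time_stretch(x, rate, frame_len=512, hop_out=128):
    """Plain overlap-add (OLA) time-scale modification of a mono signal.

    Hypothetical helper for illustration only. rate > 1 time-compresses
    (shorter output) and rate < 1 time-stretches (longer output). Analysis
    frames are read every rate * hop_out input samples and overlap-added
    every hop_out output samples, so the duration scales by roughly 1 / rate
    while the samples inside each frame (and hence their pitch) are unchanged.
    """
    x = np.asarray(x, dtype=np.float64)
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hanning(frame_len)
    hop_in = rate * hop_out
    # Number of analysis frames that lie entirely inside the input.
    n_frames = int((len(x) - frame_len) // hop_in) + 1
    out_len = (n_frames - 1) * hop_out + frame_len

    y = np.zeros(out_len)
    win_sum = np.zeros(out_len)
    for k in range(n_frames):
        # Read position in the input, clamped so the slice is always full length.
        start_in = min(int(k * hop_in), len(x) - frame_len)
        start_out = k * hop_out  # write position in the output
        frame = x[start_in:start_in + frame_len]
        y[start_out:start_out + frame_len] += window * frame
        win_sum[start_out:start_out + frame_len] += window

    # Divide by the summed window so overlapping frames keep the original level.
    nonzero = win_sum > 1e-8
    y[nonzero] /= win_sum[nonzero]
    return y


# Usage: random noise stands in for 1 s of 16 kHz audio (assumed rate).
rng = np.random.default_rng(0)
utterance = rng.standard_normal(16000)
# Hypothetical stretch factors; the abstract does not state which were used.
versions = {r: ola_time_stretch(utterance, r) for r in (0.9, 1.1)}
```

Each copy in `versions` would then be added to the training set for the under-represented dialect. Plain OLA is chosen here for brevity; it can introduce phase artifacts on voiced speech, which waveform-similarity overlap-add (WSOLA) reduces by aligning successive frames through cross-correlation.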

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:1912.09003 [eess.AS]
	(or arXiv:1912.09003v1 [eess.AS] for this version)

Submission history

From: Xiaoxiao Miao
[v1] Thu, 19 Dec 2019 03:20:33 GMT (287kb)
