Title: Attention-based multi-task learning for speech-enhancement and speaker-identification in multi-speaker dialogue scenario

Abstract: Multi-task learning (MTL) and attention mechanisms have been shown to extract robust acoustic features for various speech-related tasks in noisy environments. In this study, we propose an attention-based MTL (ATM) approach that combines MTL with an attention-weighting mechanism to build a multi-model learning structure that performs speech enhancement (SE) and speaker identification (SI) simultaneously. The proposed ATM system consists of three parts: SE, SI, and an attention network (AttNet). The SE part is a long short-term memory (LSTM) model, and the SI and AttNet parts are deep neural network (DNN) models. The ATM system first extracts representative features, then enhances the speech signal in LSTM-SE and identifies the speaker in DNN-SI. The AttNet computes weights based on DNN-SI to prepare better representative features for LSTM-SE. We tested the proposed ATM system on Taiwan Mandarin hearing in noise test sentences. The evaluation results confirmed that the proposed system effectively enhances the speech quality and intelligibility of a given noisy input. Moreover, the proposed ATM system notably improves SI accuracy.
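
The data flow described in the abstract can be illustrated with a minimal single-frame sketch in plain Python. It is not the authors' implementation. The abstract does not specify layer sizes, the form of the attention weighting, or the training loss, so the following are all assumptions made for illustration: the sigmoid gate applied element-wise to the input features, the single dense layer standing in for the LSTM-SE model, the joint loss (mean squared error plus `alpha` times cross-entropy), and every name and dimension in the code.

```python
import math
import random


def dense(x, w, b):
    """Affine layer: each row of w produces one output value."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in v]
    s = sum(e)
    return [z / s for z in e]


def init(rows, cols, rng):
    """Small random weight matrix with `rows` outputs and `cols` inputs."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def atm_forward(feat, p):
    # SI branch (assumed two-layer DNN): hidden representation, then speaker posteriors.
    si_hidden = [max(0.0, z) for z in dense(feat, p["si_w1"], p["si_b1"])]
    speaker_probs = softmax(dense(si_hidden, p["si_w2"], p["si_b2"]))
    # AttNet (assumed form): weights in (0, 1) computed from the SI hidden layer.
    att = sigmoid(dense(si_hidden, p["att_w"], p["att_b"]))
    # Attention-weighted features are passed to the SE branch.
    weighted = [a * f for a, f in zip(att, feat)]
    # SE branch: a dense layer used here as a stand-in for the LSTM model.
    enhanced = dense(weighted, p["se_w"], p["se_b"])
    return enhanced, speaker_probs, att


def multitask_loss(enhanced, clean, speaker_probs, speaker_id, alpha=0.5):
    # Assumed joint objective: SE reconstruction error plus weighted SI cross-entropy.
    mse = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    ce = -math.log(speaker_probs[speaker_id])
    return mse + alpha * ce


rng = random.Random(0)
FEAT_DIM, HIDDEN, N_SPEAKERS = 8, 6, 3  # illustrative sizes only
params = {
    "si_w1": init(HIDDEN, FEAT_DIM, rng), "si_b1": [0.0] * HIDDEN,
    "si_w2": init(N_SPEAKERS, HIDDEN, rng), "si_b2": [0.0] * N_SPEAKERS,
    "att_w": init(FEAT_DIM, HIDDEN, rng), "att_b": [0.0] * FEAT_DIM,
    "se_w": init(FEAT_DIM, FEAT_DIM, rng), "se_b": [0.0] * FEAT_DIM,
}
noisy = [rng.random() for _ in range(FEAT_DIM)]  # synthetic noisy feature frame
clean = [rng.random() for _ in range(FEAT_DIM)]  # synthetic clean target frame
enhanced, speaker_probs, att = atm_forward(noisy, params)
loss = multitask_loss(enhanced, clean, speaker_probs, speaker_id=1)
```

In this sketch, sharing the SI hidden layer between the speaker classifier and the AttNet is what would let speaker information shape the features seen by the SE branch. A real system would process sequences of frames and train all parts jointly with an automatic-differentiation framework.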
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Journal reference: IEEE International Symposium on Circuits and Systems 2021
Cite as: arXiv:2101.02550 [eess.AS]
  (or arXiv:2101.02550v2 [eess.AS] for this version)

Submission history

From: SyuSiang Wang
[v1] Thu, 7 Jan 2021 14:27:00 GMT (351kb)
[v2] Sun, 21 Feb 2021 22:47:59 GMT (771kb)
