Title: Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

Abstract: The intelligibility of speech severely degrades in the presence of environmental noise and reverberation. In this paper, we propose a novel deep-learning-based system for modifying the speech signal to increase its intelligibility under the equal-power constraint, i.e., signal power before and after modification must be the same. To achieve this, we use generative adversarial networks (GANs) to obtain time-frequency dependent amplification factors, which are then applied to the input raw speech to reallocate the speech energy. Instead of optimizing only a single, simple metric, we train a deep neural network (DNN) model to simultaneously optimize multiple advanced speech metrics, including both intelligibility- and quality-related ones, which results in notable improvements in performance and robustness. Our system can work not only in non-real-time mode for offline audio playback but also in practical real-time speech applications. Experimental results using both objective measurements and subjective listening tests indicate that the proposed system significantly outperforms state-of-the-art baseline systems under various noisy and reverberant listening conditions.
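
The equal-power constraint mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `apply_gains_equal_power`, the random test data, and the choice to measure power directly on the time-frequency representation are all assumptions made for illustration. In the described system the amplification factors would come from the trained GAN generator; here they are just random positive numbers.

```python
import numpy as np

def apply_gains_equal_power(spec, gains):
    """Scale a complex spectrogram by real-valued gains, then rescale
    globally so that the total power equals that of the input.

    Hypothetical helper for illustration only; power is measured on the
    time-frequency bins, which is an assumption of this sketch.
    """
    spec = np.asarray(spec, dtype=np.complex128)
    gains = np.asarray(gains, dtype=np.float64)
    modified = spec * gains                      # reallocate energy per bin
    power_in = np.sum(np.abs(spec) ** 2)
    power_out = np.sum(np.abs(modified) ** 2)
    if power_out == 0.0:                         # avoid division by zero
        return spec.copy()
    return modified * np.sqrt(power_in / power_out)

# Toy data standing in for a spectrogram (frequency bins x frames)
# and for model-predicted amplification factors.
rng = np.random.default_rng(0)
spec = rng.standard_normal((5, 8)) + 1j * rng.standard_normal((5, 8))
gains = rng.uniform(0.5, 2.0, size=(5, 8))

out = apply_gains_equal_power(spec, gains)
power_before = np.sum(np.abs(spec) ** 2)
power_after = np.sum(np.abs(out) ** 2)
```

After the global rescaling, `power_before` and `power_after` agree up to floating-point error, while the relative distribution of energy across bins follows `gains`. With an overlapping short-time transform, equal power in the time-frequency domain matches equal power in the waveform only approximately, so a practical system might instead normalize after resynthesis.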
Comments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as: arXiv:2104.08499 [eess.AS]
  (or arXiv:2104.08499v2 [eess.AS] for this version)

Submission history

From: Haoyu Li
[v1] Sat, 17 Apr 2021 09:48:27 GMT (4184kb,D)
[v2] Thu, 16 Sep 2021 12:06:02 GMT (3850kb,D)