We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:

References & Citations


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Scene-aware Far-field Automatic Speech Recognition

Abstract: We propose a novel method for generating scene-aware training data for far-field automatic speech recognition. We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples. We model the acoustic characteristics of a scene with its reverberation time and represent it using a multivariate Gaussian distribution. We use this distribution to select acoustic impulse responses from a large real-world dataset for augmenting speech data. The speech recognition system trained on our scene-aware data consistently outperforms the system trained using many more random acoustic impulse responses on the REVERB and the AMI far-field benchmarks. In practice, we obtain 2.64% absolute improvement in word error rate compared with using training data of the same size with uniformly distributed reverberation times.
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as: arXiv:2104.10757 [eess.AS]
  (or arXiv:2104.10757v1 [eess.AS] for this version)

Submission history

From: Zhenyu Tang [view email]
[v1] Wed, 21 Apr 2021 20:58:30 GMT (642kb,D)

Link back to: arXiv, form interface, contact.