Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition

Agrawal, Purvi; Ganapathy, Sriram


Electrical Engineering and Systems Science > Audio and Speech Processing

Title: Interpretable Filter Learning Using Soft Self-attention For Raw Waveform Speech Recognition

Authors: Purvi Agrawal, Sriram Ganapathy

(Submitted on 20 Jan 2020)

Abstract: Speech recognition from the raw waveform involves learning the spectral decomposition of the signal in the first layer of the neural acoustic model using a convolution layer. In this work, we propose a raw waveform convolutional filter learning approach using soft self-attention. The acoustic filter bank in the proposed model is implemented as a parametric cosine-modulated Gaussian filter bank whose parameters are learned. A network-in-network architecture provides self-attention to generate attention weights over the sub-band filters. The attention-weighted log filter bank energies are fed to the acoustic model for the task of speech recognition. Experiments are conducted on the Aurora-4 (additive noise with channel artifact) and CHiME-3 (additive noise with reverberation) databases. In these experiments, the attention-based filter learning approach provides considerable improvements in ASR performance over the baseline mel filter-bank features and other robust front-ends (average relative improvement of 7% in word error rate over baseline features on the Aurora-4 dataset, and 5% on the CHiME-3 database). Using the self-attention weights, we also present an analysis of the interpretability of the filters for the ASR task.
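
The front-end described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the exact kernel parameterization (a center frequency plus a separate Gaussian width `sigma`), the frame and hop sizes, the tiny two-layer scoring network, and all function names are assumptions made here for illustration, and the parameters are fixed or random rather than learned.

```python
import numpy as np

def cos_gauss_filterbank(center_hz, sigma=20.0, sample_rate=16000, kernel_len=129):
    """Cosine-modulated Gaussian kernels, one row per sub-band.

    Assumption: each kernel is cos(2*pi*f*n/fs) * exp(-n^2 / (2*sigma^2));
    in a trainable model, center_hz (and possibly sigma) would be learned.
    """
    n = np.arange(kernel_len) - kernel_len // 2            # symmetric sample index
    f = np.asarray(center_hz, dtype=float)[:, None]        # (bands, 1)
    return np.cos(2 * np.pi * f * n / sample_rate) * np.exp(-0.5 * (n / sigma) ** 2)

def log_subband_energies(wave, kernels, frame_len=400, hop=160):
    """Filter the waveform, then take the log of the mean energy per frame."""
    sub = np.stack([np.convolve(wave, k, mode="same") for k in kernels])  # (bands, T)
    n_frames = 1 + (sub.shape[1] - frame_len) // hop
    frames = np.stack(
        [sub[:, i * hop:i * hop + frame_len] for i in range(n_frames)], axis=1
    )                                                       # (bands, frames, frame_len)
    return np.log(np.mean(frames ** 2, axis=2) + 1e-8)      # (bands, frames)

def soft_attention(log_energies, w1, w2):
    """Hypothetical two-layer scorer followed by a softmax over the band axis."""
    hidden = np.tanh(w1 @ log_energies)                     # (hidden, frames)
    scores = w2 @ hidden                                    # (bands, frames)
    scores = scores - scores.max(axis=0, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)                           # 1 s of noise at 16 kHz
kernels = cos_gauss_filterbank(np.linspace(100, 7000, 8))
feats = log_subband_energies(wave, kernels)
w1 = 0.1 * rng.standard_normal((16, 8))                     # random stand-ins for learned weights
w2 = 0.1 * rng.standard_normal((8, 16))
attention = soft_attention(feats, w1, w2)
weighted_feats = attention * feats                          # input to the acoustic model
```

Because the softmax is taken over the band axis, the weights in each frame sum to one, which is what makes it possible to read them as the relative importance of each sub-band filter.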

Subjects:	Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2001.07067 [eess.AS]
	(or arXiv:2001.07067v1 [eess.AS] for this version)

Submission history

From: Purvi Agrawal [view email]
[v1] Mon, 20 Jan 2020 11:39:44 GMT (240kb,D)
