Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition

Kim, Taejun; Nam, Juhan

(Submitted on 30 Oct 2019 (v1), last revised 18 Sep 2022 (this version, v3))

Abstract: End-to-end learning models using raw waveforms as input have shown superior performance in many audio recognition tasks. However, most model architectures are based on convolutional neural networks (CNNs), which were mainly developed for visual recognition tasks. In this paper, we propose an extension of squeeze-and-excitation networks (SENets) that adds temporal feedback control from the top-layer features to channel-wise feature activations in lower layers using a recurrent module. This is analogous to the adaptive gain control mechanism of outer hair cells in the human auditory system. We apply the proposed model to speech command recognition and show that it slightly outperforms SENets and other CNN-based models. We also investigate the details of the performance improvement by conducting failure analysis and visualizing the channel-wise feature scaling induced by the temporal feedback.
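
The abstract does not give architectural details, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes a standard squeeze-and-excitation gate (global average pooling over time, a bottleneck layer, and a sigmoid) and contrasts it with a hypothetical feedback variant in which a simple tanh recurrent state, driven by a top-layer feature vector, produces the per-channel gains for a lower layer. All function names, weight shapes, and the choice of recurrent cell are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def se_gains(features, W1, W2):
    # Standard SE gate. features is [channels][time].
    # Squeeze: average each channel over time.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excite: bottleneck with ReLU, then one sigmoid gain per channel.
    hidden = [max(0.0, h) for h in matvec(W1, squeezed)]
    return [sigmoid(g) for g in matvec(W2, hidden)]

def feedback_gains(top_feature, state, W_in, W_rec, W_out):
    # Hypothetical feedback gate (an assumption, not the paper's exact module):
    # update a recurrent state from the top-layer feature, then map the
    # state to one gain per channel of a lower layer.
    pre = [a + b for a, b in zip(matvec(W_in, top_feature), matvec(W_rec, state))]
    new_state = [math.tanh(p) for p in pre]
    gains = [sigmoid(g) for g in matvec(W_out, new_state)]
    return gains, new_state

def scale_channels(features, gains):
    # Channel-wise scaling: every time step of a channel shares one gain.
    return [[g * x for x in ch] for ch, g in zip(features, gains)]
```

In this sketch the only difference between the two gates is where the gains come from: `se_gains` uses the current layer's own pooled activations, while `feedback_gains` uses a state that carries information from higher-layer features at earlier time steps.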

Comments:	This paper is accepted to APSIPA ASC 2022
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
Cite as:	arXiv:1911.01803 [eess.AS]
	(or arXiv:1911.01803v3 [eess.AS] for this version)

Submission history

From: Taejun Kim
[v1] Wed, 30 Oct 2019 04:11:29 GMT (262kb,D)
[v2] Tue, 13 Sep 2022 06:08:32 GMT (1015kb,D)
[v3] Sun, 18 Sep 2022 07:12:41 GMT (1016kb,D)