SELD-TCN: Sound Event Localization & Detection via Temporal Convolutional Networks

Guirguis, Karim; Schorn, Christoph; Guntoro, Andre; Abdulatif, Sherif; Yang, Bin

doi:10.23919/Eusipco47968.2020.9287716



(Submitted on 3 Mar 2020)

Abstract: Understanding the surrounding environment plays a critical role in autonomous robotic systems, such as self-driving cars. Extensive research has been carried out concerning visual perception. Yet, to obtain a more complete perception of the environment, autonomous systems of the future should also take acoustic information into account. Recent sound event localization and detection (SELD) frameworks utilize convolutional recurrent neural networks (CRNNs). However, the recurrent nature of CRNNs makes them challenging to implement efficiently on embedded hardware. Not only are their computations difficult to parallelize, but they also require high memory bandwidth and large memory buffers. In this work, we develop a more robust and hardware-friendly novel architecture based on a temporal convolutional network (TCN). The proposed framework (SELD-TCN) outperforms the state-of-the-art SELDnet on four different datasets. Moreover, SELD-TCN achieves 4x faster training time per epoch and 40x faster inference time on an ordinary graphics processing unit (GPU).

Comments:	5 pages, 3 tables, 2 figures. Submitted to EUSIPCO 2020
Subjects:	Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
DOI:	10.23919/Eusipco47968.2020.9287716
Cite as:	arXiv:2003.01609 [eess.AS]
	(or arXiv:2003.01609v1 [eess.AS] for this version)

Submission history

From: Karim Guirguis
[v1] Tue, 3 Mar 2020 15:48:57 GMT
