CAT: Causal Audio Transformer for Audio Classification

Liu, Xiaoyu; Lu, Hanlin; Yuan, Jianbo; Li, Xinyu

(Submitted on 14 Mar 2023)

Abstract: Attention-based Transformers have been increasingly applied to audio classification because of their global receptive field and ability to handle long-term dependencies. However, existing frameworks, which are mainly extended from Vision Transformers, are not perfectly compatible with audio signals. In this paper, we introduce a Causal Audio Transformer (CAT) consisting of Multi-Resolution Multi-Feature (MRMF) feature extraction with an acoustic attention block, for improved audio modeling. In addition, we propose a causal module that alleviates overfitting, helps with knowledge transfer, and improves interpretability. CAT achieves classification performance higher than or comparable to the state of the art on the ESC50, AudioSet, and UrbanSound8K datasets, and can be easily generalized to other Transformer-based models.

Comments:	Accepted to ICASSP 2023
Subjects:	Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2303.07626 [cs.SD]
	(or arXiv:2303.07626v1 [cs.SD] for this version)

Submission history

From: Xiaoyu Liu
[v1] Tue, 14 Mar 2023 04:50:52 GMT (1180kb,D)
