Language-based Audio Retrieval Task in DCASE 2022 Challenge

Xie, Huang; Lipping, Samuel; Virtanen, Tuomas

(Submitted on 20 Sep 2022 (v1), last revised 4 Oct 2022 (this version, v3))

Abstract: Language-based audio retrieval is a task in which natural-language captions are used as queries to retrieve audio signals from a dataset. It was first introduced in the DCASE 2022 Challenge as Subtask 6B of Task 6, which aims at developing computational systems that model the relationships between audio signals and free-form textual descriptions. Compared with audio captioning (Subtask 6A), which generates captions for audio signals, language-based audio retrieval (Subtask 6B) focuses on ranking audio signals according to their relevance to natural-language captions. In the DCASE 2022 Challenge, submitted systems significantly outperformed the provided baseline for Subtask 6B, with the top system reaching 0.276 in mAP@10. This paper presents the outcome of Subtask 6B in terms of the submitted systems' performance and analysis.
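
The abstract reports results in mAP@10 without defining the metric. As a minimal sketch, assuming the common definition (average precision truncated at rank 10, averaged over all caption queries), it could be computed roughly as below. The function names, identifiers and toy data are hypothetical and are not taken from the challenge's evaluation code.

```python
from typing import Dict, Sequence, Set


def average_precision_at_k(ranked_ids: Sequence[str],
                           relevant_ids: Set[str],
                           k: int = 10) -> float:
    """AP@k for one caption query (assumed definition, see lead-in)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    # Walk the top-k ranked audio items; ranks are 1-based.
    for rank, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Normalise by the number of relevant items that could fit in the top k.
    return precision_sum / min(len(relevant_ids), k)


def mean_average_precision_at_k(rankings: Dict[str, Sequence[str]],
                                relevance: Dict[str, Set[str]],
                                k: int = 10) -> float:
    """Mean of AP@k over all caption queries."""
    if not rankings:
        return 0.0
    scores = [
        average_precision_at_k(ranked, relevance.get(query, set()), k)
        for query, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


# Toy data: three caption queries, each with one relevant audio clip.
rankings = {
    "caption_1": ["a", "b", "c"],  # relevant clip ranked 1st -> AP = 1
    "caption_2": ["a", "b", "c"],  # relevant clip ranked 3rd -> AP = 1/3
    "caption_3": ["a", "b", "c"],  # relevant clip not retrieved -> AP = 0
}
relevance = {
    "caption_1": {"a"},
    "caption_2": {"c"},
    "caption_3": {"z"},
}

score = mean_average_precision_at_k(rankings, relevance, k=10)
print(round(score, 3))  # 0.444
```

When every caption has exactly one relevant audio clip, as in the toy data, this definition reduces to the reciprocal rank of that clip, set to zero when the clip falls outside the top 10.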

Comments:	Update for arXiv:2206.06108 mistakenly submitted as a new article
Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as:	arXiv:2209.09967 [eess.AS]
	(or arXiv:2209.09967v3 [eess.AS] for this version)

Submission history

From: Huang Xie
[v1] Tue, 20 Sep 2022 19:51:53 GMT (74kb,D)
[v2] Tue, 27 Sep 2022 18:10:10 GMT (0kb,I)
[v3] Tue, 4 Oct 2022 13:50:20 GMT (0kb,I)
