Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix

Fan, Jianyu; Nichols, Eric; Tompkins, Daniel; Mendez, Ana Elisa Mendez; Elizalde, Benjamin; Pasquier, Philippe

(Submitted on 20 Feb 2020)

Abstract: Realistic recordings of soundscapes often contain multiple co-occurring sound events, such as car horns, engines, and human voices. Sound event retrieval is a type of content-based search that aims to find audio samples similar to an audio query based on their acoustic or semantic content. State-of-the-art sound event retrieval models have focused on single-label audio recordings, in which only one sound event occurs, rather than on multi-label audio recordings (i.e., recordings in which multiple sound events occur). To address this latter problem, we propose different deep learning architectures with a Siamese structure and a Pairwise Presence Matrix. The networks are trained and evaluated using the SONYC-UST dataset, which contains both single- and multi-label soundscape recordings. The performance results show the effectiveness of our proposed model.

Comments:	Paper accepted for the 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects:	Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2002.09026 [eess.AS]
	(or arXiv:2002.09026v1 [eess.AS] for this version)

Submission history

From: Jianyu Fan
[v1] Thu, 20 Feb 2020 21:33:07 GMT (2067kb)