Title: Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

Abstract: Sound event detection (SED) is the task of detecting sound events in an audio recording. One challenge of SED is that many datasets, such as the Detection and Classification of Acoustic Scenes and Events (DCASE) datasets, are weakly labelled: each audio clip has only audio tags, without the onset and offset times of the sound events. To address the weakly labelled SED problem, we investigate segment-wise and clip-wise training methods. The proposed systems are based on variants of convolutional neural networks (CNNs), including convolutional recurrent neural networks and our proposed CNN-transformers, for audio tagging and sound event detection. Another challenge of SED is that systems predict only the presence probabilities of sound events, so thresholds are required to decide whether each sound event is present or absent. Previous work set these thresholds empirically, which is not an optimal solution. To solve this problem, we propose an automatic threshold optimization method. The first stage optimizes the system with respect to metrics that do not depend on the thresholds, such as mean average precision (mAP). The second stage optimizes the thresholds with respect to the metric that depends on those thresholds. The proposed automatic threshold optimization system achieved state-of-the-art audio tagging and SED F1 scores of 0.646 and 0.584, outperforming the best manually selected thresholds, which gave 0.629 and 0.564, respectively.
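The second stage described in the abstract can be illustrated with a minimal sketch. It assumes the model from the first stage is already trained and only its predicted probabilities are available. The per-class grid search below is a simple stand-in chosen for illustration and is not claimed to be the optimizer used in the paper; the function names (f1_binary, optimize_thresholds, macro_f1), the threshold grid, and the synthetic data are all hypothetical.

```python
import numpy as np


def f1_binary(y_true, y_pred):
    """F1 score for one class, given 0/1 targets and 0/1 predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when there are no positives at all.
    return float(2 * tp / denom) if denom > 0 else 0.0


def optimize_thresholds(y_true, y_prob, grid=None):
    """Pick, for each class, the grid threshold that maximizes that class's F1.

    y_true: (n_clips, n_classes) array of 0/1 tags.
    y_prob: (n_clips, n_classes) array of predicted presence probabilities.
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # assumed search grid
    n_classes = y_true.shape[1]
    thresholds = np.full(n_classes, 0.5)
    for k in range(n_classes):
        scores = [
            f1_binary(y_true[:, k], (y_prob[:, k] >= t).astype(int))
            for t in grid
        ]
        thresholds[k] = grid[int(np.argmax(scores))]
    return thresholds


def macro_f1(y_true, y_prob, thresholds):
    """Mean of per-class F1 after binarizing with per-class thresholds."""
    y_pred = (y_prob >= thresholds).astype(int)
    per_class = [
        f1_binary(y_true[:, k], y_pred[:, k]) for k in range(y_true.shape[1])
    ]
    return float(np.mean(per_class))


# Synthetic stand-in for validation-set targets and model outputs.
rng = np.random.default_rng(0)
targets = (rng.random((500, 3)) < 0.3).astype(int)
probs = np.clip(0.4 * targets + 0.6 * rng.random((500, 3)), 0.0, 1.0)

tuned = optimize_thresholds(targets, probs)
print("tuned thresholds:", tuned)
print("macro F1, fixed 0.5:", macro_f1(targets, probs, np.full(3, 0.5)))
print("macro F1, tuned:", macro_f1(targets, probs, tuned))
```

Because macro F1 is an average of per-class scores, each class can be searched independently in this sketch. In practice the thresholds would be tuned on held-out validation data and then applied unchanged to the evaluation set; tuning and scoring on the same clips, as the demo does, overstates the gain.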
Comments: 11 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:1912.04761 [cs.SD]
  (or arXiv:1912.04761v1 [cs.SD] for this version)

Submission history

From: Qiuqiang Kong
[v1] Tue, 10 Dec 2019 15:25:37 GMT (828kb,D)
[v2] Sun, 23 Aug 2020 10:30:01 GMT (813kb,D)