Title: Sound Event Detection Using Duration Robust Loss Function

Abstract: Many machine-learning-based methods for sound event detection (SED) treat each segmented time frame as one data sample for model training. However, the durations of sound events vary greatly by sound event class; e.g., the sound event "fan" lasts a long time, while the sound event "mouse clicking" is instantaneous. This difference in duration between sound event classes causes a serious data imbalance problem in SED. In this paper, we propose an SED method using a duration robust loss function, which focuses model training on sound events of short duration. The proposed method builds on the relationship between the duration of a sound event and how easy or difficult its model is to train. In particular, many sound events of long duration (e.g., "fan") are stationary sounds with little variation in their acoustic features, so their models are easy to train. Meanwhile, some sound events of short duration (e.g., "object impact") contain more than one audio pattern, such as attack, decay, and release parts. We therefore apply class-wise reweighting to the binary cross-entropy loss function according to the ease or difficulty of model training. Evaluation experiments conducted using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets show that the proposed method improves the detection performance of sound events by 3.15 and 4.37 percentage points in macro- and micro-F-scores, respectively, compared with a conventional method using the binary cross-entropy loss function.
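
The abstract does not give the exact weighting formula, so the following is only a minimal sketch of class-wise reweighted binary cross-entropy for frame-level multi-label SED, not the authors' duration robust loss. As a hypothetical stand-in for "ease/difficulty of model training", it weights each class by the inverse of its active-frame ratio (normalized to a mean of 1), so rarely active, short events contribute more to the loss. The function names `class_weights_from_activity` and `weighted_bce` are illustrative assumptions.

```python
import math
from typing import List, Sequence


def class_weights_from_activity(labels: Sequence[Sequence[int]]) -> List[float]:
    """Hypothetical weighting (an assumption, not the paper's formula):
    inverse of each class's active-frame ratio, normalized to average 1.
    labels[t][c] is 1 if class c is active in frame t, else 0."""
    n_frames = len(labels)
    n_classes = len(labels[0])
    inv = []
    for c in range(n_classes):
        active = sum(frame[c] for frame in labels)
        # Guard against classes that never occur, to avoid division by zero.
        ratio = max(active, 1) / n_frames
        inv.append(1.0 / ratio)
    mean_inv = sum(inv) / n_classes
    return [w / mean_inv for w in inv]


def weighted_bce(probs: Sequence[Sequence[float]],
                 labels: Sequence[Sequence[int]],
                 weights: Sequence[float],
                 eps: float = 1e-7) -> float:
    """Binary cross-entropy with a per-class weight, averaged over
    all frames and classes. With all weights equal to 1 this reduces
    to the ordinary (conventional) binary cross-entropy."""
    total = 0.0
    count = 0
    for p_frame, y_frame in zip(probs, labels):
        for c, (p, y) in enumerate(zip(p_frame, y_frame)):
            p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
            total += -weights[c] * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


# Toy example: class 0 ("fan"-like) is active in every frame,
# class 1 ("mouse clicking"-like) is active in only one frame.
labels = [[1, 0], [1, 1], [1, 0], [1, 0]]
weights = class_weights_from_activity(labels)  # approximately [0.4, 1.6]

# A model that never detects the short event is penalized more
# under the reweighted loss than under plain BCE.
probs = [[0.9, 0.1]] * 4
plain = weighted_bce(probs, labels, [1.0, 1.0])
reweighted = weighted_bce(probs, labels, weights)
```

Normalizing the weights to a mean of 1 keeps the overall loss scale comparable to plain binary cross-entropy, so the same learning rate can be reused; only the balance between classes changes.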
Comments: Submitted to DCASE2020 Workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2006.15253 [cs.SD]
  (or arXiv:2006.15253v1 [cs.SD] for this version)

Submission history

From: Keisuke Imoto
[v1] Sat, 27 Jun 2020 01:49:25 GMT (4095kb)
