Audio Concept Classification with Hierarchical Deep Neural Networks

Ravanelli, Mirco; Elizalde, Benjamin; Ni, Karl; Friedland, Gerald


(Submitted on 11 Oct 2017)

Abstract: Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian mixture models (GMMs) have had some success in classifying a reduced set of audio concepts. However, multi-class classification can benefit from context window analysis and the discriminating power of deeper architectures. Although deep learning has shown promise in applications such as speech and object recognition, it has not yet met expectations in other fields such as audio concept classification. This paper explores, for the first time, the potential of deep learning for classifying audio concepts in User-Generated Content videos. The proposed system comprises two cascaded neural networks in a hierarchical configuration that analyze short- and long-term context information. Our system outperforms a GMM approach by a relative 54%, a Neural Network by 33%, and a Deep Neural Network by 12% on the TRECVID-MED database.
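
The cascade described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the features, layer sizes, context widths, or training procedure, so every number and name below (13-dimensional features, 10 concepts, windows of ±5 and ±25 frames, randomly initialized weights, the helper functions themselves) is an assumption chosen only to show the data flow. A first network maps a short window of acoustic frames to per-frame concept posteriors, and a second network classifies a longer window of those posteriors.

```python
import numpy as np

def stack_context(frames, left, right):
    """Concatenate each frame with `left` past and `right` future frames.

    Edges are padded by repeating the first/last frame, so the output
    keeps one row per input frame.
    """
    n_frames = frames.shape[0]
    padded = np.pad(frames, ((left, right), (0, 0)), mode="edge")
    windows = [padded[i:i + n_frames] for i in range(left + right + 1)]
    return np.concatenate(windows, axis=1)

def init_mlp(sizes, rng):
    """Random (untrained) weights for a fully connected network."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_posteriors(x, params):
    """ReLU hidden layers followed by a softmax output layer."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    logits = x @ w + b
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def hierarchical_forward(features, net1, net2, short_ctx, long_ctx):
    """Two cascaded networks: short-term context first, long-term second."""
    short_in = stack_context(features, short_ctx, short_ctx)
    frame_post = mlp_posteriors(short_in, net1)       # per-frame posteriors
    long_in = stack_context(frame_post, long_ctx, long_ctx)
    return mlp_posteriors(long_in, net2)              # refined posteriors

# Assumed, illustrative dimensions (not taken from the paper).
rng = np.random.default_rng(0)
n_frames, n_feats, n_concepts = 200, 13, 10
short_ctx, long_ctx = 5, 25

features = rng.normal(size=(n_frames, n_feats))       # stand-in for real audio features
net1 = init_mlp([n_feats * (2 * short_ctx + 1), 64, n_concepts], rng)
net2 = init_mlp([n_concepts * (2 * long_ctx + 1), 64, n_concepts], rng)

posteriors = hierarchical_forward(features, net1, net2, short_ctx, long_ctx)
clip_prediction = int(posteriors.mean(axis=0).argmax())  # one label per clip
```

Averaging the frame-level posteriors to obtain a single clip-level label is likewise just one simple pooling choice made for the sketch; the abstract does not say how clip-level decisions are produced.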

Subjects:	Audio and Speech Processing (eess.AS); Sound (cs.SD)
Journal reference:	EUSIPCO 2014
Cite as:	arXiv:1710.04288 [eess.AS]
	(or arXiv:1710.04288v1 [eess.AS] for this version)

Submission history

From: Mirco Ravanelli
[v1] Wed, 11 Oct 2017 20:07:28 GMT (253kb,D)
