We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

eess.AS

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Sound

Title: Supervised Initialization of LSTM Networks for Fundamental Frequency Detection in Noisy Speech Signals

Abstract: Fundamental frequency is one of the most important parameters of human speech, of importance for the classification of accent, gender, speaking styles, speaker identification, age, among others. The proper detection of this parameter remains as an important challenge for severely degraded signals. In previous references for detecting fundamental frequency in noisy speech using deep learning, the networks, such as Long Short-term Memory (LSTM) has been initialized with random weights, and then trained following a back-propagation through time algorithm. In this work, a proposal for a more efficient initialization, based on a supervised training using an Auto-associative network, is presented. This initialization is a better starting point for the detection of fundamental frequency in noisy speech. The advantages of this initialization are noticeable using objective measures for the accuracy of the detection and for the training of the networks, under the presence of additive white noise at different signal-to-noise levels.
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as: arXiv:1911.04580 [cs.SD]
  (or arXiv:1911.04580v1 [cs.SD] for this version)

Submission history

From: Marvin Coto Mr. [view email]
[v1] Mon, 11 Nov 2019 21:57:29 GMT (305kb)

Link back to: arXiv, form interface, contact.