Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder

Hsu, Chin-Cheng; Hwang, Hsin-Te; Wu, Yi-Chiao; Tsao, Yu; Wang, Hsin-Min



(Submitted on 13 Oct 2016)

Abstract: We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence, either to learn conversion functions or to synthesize a target spectrum with the aid of alignments. However, these requirements severely limit the practical applications of SC because parallel corpora are scarce or even unavailable. We propose an SC framework based on a variational auto-encoder, which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora.
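
The encoder/decoder split described in the abstract can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the abstract does not specify layer types, sizes, or training details, so the linear maps, the dimensions (`FRAME_DIM`, `LATENT_DIM`, `NUM_SPEAKERS`), the one-hot speaker code, the use of the posterior mean at conversion time, and the class name `ToyVaeConverter` are all assumptions made for illustration. The weights are random and untrained, so the output only shows how a source frame and a target-speaker identity would be combined.

```python
import math
import random

# Hypothetical sizes; the abstract does not state any dimensions.
FRAME_DIM = 8      # size of one spectral frame
LATENT_DIM = 3     # size of the latent (phonetic) code
NUM_SPEAKERS = 2   # number of speakers in the non-parallel corpus


def init_matrix(rows, cols, rng):
    """Small random weights; a trained system would learn these."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyVaeConverter:
    """Linear stand-in for the encoder/decoder pair (assumed structure)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.w_mu = init_matrix(LATENT_DIM, FRAME_DIM, self.rng)
        self.w_logvar = init_matrix(LATENT_DIM, FRAME_DIM, self.rng)
        # The decoder sees the latent code concatenated with the speaker code.
        self.w_dec = init_matrix(FRAME_DIM, LATENT_DIM + NUM_SPEAKERS, self.rng)

    def encode(self, frame):
        """Map a frame to the mean and log-variance of q(z | x)."""
        return matvec(self.w_mu, frame), matvec(self.w_logvar, frame)

    def sample(self, mu, logvar):
        """Reparameterization used during training: z = mu + sigma * eps."""
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, speaker):
        """Reconstruct a frame from the latent code and a speaker identity."""
        return matvec(self.w_dec, z + one_hot(speaker, NUM_SPEAKERS))

    def convert(self, frame, target_speaker):
        """Encode the source frame, then decode it as the target speaker."""
        mu, _ = self.encode(frame)  # posterior mean only (an assumption)
        return self.decode(mu, target_speaker)


def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian, the usual VAE regularizer."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))
```

Because the speaker identity enters only at the decoder, training in a setup like this needs each frame and the label of the speaker who produced it, not a matching frame from another speaker, which is why no parallel corpus or alignment is required.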

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Sound (cs.SD)
Cite as: arXiv:1610.04019 [stat.ML] (or arXiv:1610.04019v1 [stat.ML] for this version)

Submission history

From: Chin-Cheng Hsu
[v1] Thu, 13 Oct 2016 10:52:25 GMT (338kb,D)
