We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Electrical Engineering and Systems Science > Audio and Speech Processing

Title: AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

Abstract: This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic posteriorgrams based voice conversion system that can perform any-to-many voice conversion while having only 57.5 ms future look-ahead. The complete system is composed of three neural networks trained separately with non-parallel data. While most of the current voice conversion systems focus primarily on quality irrespective of algorithmic latency, this work elaborates on designing a method using a minimal amount of future context thus allowing a future real-time implementation. According to a subjective listening test organized in this work, the proposed AC-VC system achieves parity with the non-causal ASR-TTS baseline of the Voice Conversion Challenge 2020 in naturalness with a MOS of 3.5. In contrast, the results indicate that missing future context impacts speaker similarity. Obtained similarity percentage of 65% is lower than the similarity of current best voice conversion systems.
Comments: ASRU 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as: arXiv:2111.06601 [eess.AS]
  (or arXiv:2111.06601v1 [eess.AS] for this version)

Submission history

From: Milos Cernak [view email]
[v1] Fri, 12 Nov 2021 08:21:54 GMT (179kb,D)

Link back to: arXiv, form interface, contact.