We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

q-bio

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Sound

Title: Towards Identity Preserving Normal to Dysarthric Voice Conversion

Abstract: We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision making processes and alleviation of patient stress, (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker's voice was limited and requires further improvements.
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
Cite as: arXiv:2110.08213 [cs.SD]
  (or arXiv:2110.08213v1 [cs.SD] for this version)

Submission history

From: Wen-Chin Huang [view email]
[v1] Fri, 15 Oct 2021 17:18:02 GMT (203kb,D)

Link back to: arXiv, form interface, contact.