Speech Synthesis with Mixed Emotions

Zhou, Kun; Sisman, Berrak; Rana, Rajib; Schuller, B. W.; Li, Haizhou

Computer Science > Computation and Language

(Submitted on 11 Aug 2022 (v1), last revised 28 Dec 2022 (this version, v3))

Abstract: Emotional speech synthesis aims to synthesize human voices with various emotional effects. Current studies mostly focus on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture of emotions at run-time. We propose a novel formulation that measures the relative difference between speech samples of different emotions. We then incorporate this formulation into a sequence-to-sequence emotional text-to-speech framework. During training, the framework not only explicitly characterizes emotion styles but also explores the ordinal nature of emotions by quantifying their differences from other emotions. At run-time, we control the model to produce the desired emotion mixture by manually defining an emotion attribute vector. Objective and subjective evaluations validate the effectiveness of the proposed framework. To the best of our knowledge, this research is the first study on modelling, synthesizing, and evaluating mixed emotions in speech.
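The abstract does not say how the relative difference between emotions is computed or exactly what the emotion attribute vector contains. One common technique for this kind of ordinal comparison is a relative-attribute ranking function: a linear scorer trained so that samples showing more of an attribute score higher than samples showing less. The Python sketch below assumes that technique, with a linear ranker trained by subgradient steps on a pairwise hinge loss, toy 2-D features, and a hypothetical four-emotion label set (`EMOTIONS`). The names `train_ranker`, `normalize`, and `attribute_vector` are illustrative only and are not the authors' code.

```python
"""Hypothetical sketch: relative-attribute ranking plus a run-time emotion
attribute vector. Not the authors' implementation.

Assumptions: features are fixed-length float lists, the ranker is linear
(score = w . x), and it is trained so that emotional samples outrank
neutral ones by a margin.
"""
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical label set; the abstract does not list the emotions used.
EMOTIONS = ("angry", "happy", "sad", "surprise")


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product used as the linear ranking score."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ranker(pairs: List[Tuple[Vector, Vector]], dim: int,
                 lr: float = 0.1, epochs: int = 100,
                 margin: float = 1.0) -> Vector:
    """Learn w so that dot(w, more) >= dot(w, less) + margin for every
    (more, less) pair, via subgradient steps on a pairwise hinge loss
    (a RankSVM-style objective without the regularization term)."""
    w = [0.0] * dim
    for _ in range(epochs):
        for more, less in pairs:
            if dot(w, more) - dot(w, less) < margin:
                # Hinge is active: step w along (more - less).
                w = [wi + lr * (m - l) for wi, m, l in zip(w, more, less)]
    return w


def normalize(score: float, lo: float, hi: float) -> float:
    """Min-max scale a raw ranking score into [0, 1], clipping outliers."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def attribute_vector(mix: Dict[str, float]) -> Vector:
    """One intensity in [0, 1] per emotion in EMOTIONS; emotions absent
    from the requested mixture get 0."""
    for name, value in mix.items():
        if name not in EMOTIONS:
            raise ValueError(f"unknown emotion: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"intensity out of range for {name}: {value}")
    return [mix.get(name, 0.0) for name in EMOTIONS]


if __name__ == "__main__":
    # Toy 2-D features: "happy" samples sit further along the first axis.
    neutral = [[0.0, 0.1], [0.1, 0.0]]
    happy = [[1.0, 0.2], [0.9, 0.1]]
    w = train_ranker([(h, n) for h in happy for n in neutral], dim=2)
    scores = [dot(w, x) for x in neutral + happy]
    # Relative "happiness" of an in-between sample, scaled into [0, 1].
    print(normalize(dot(w, [0.5, 0.1]), min(scores), max(scores)))
    # A 70% happy / 30% surprise mixture request.
    print(attribute_vector({"happy": 0.7, "surprise": 0.3}))  # prints [0.0, 0.7, 0.0, 0.3]
```

Keeping every intensity in [0, 1] lets a mixture such as "mostly happy with some surprise" be written directly as a vector. How such a vector conditions the sequence-to-sequence decoder is deliberately left out, since the abstract does not describe it.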

Comments:	Accepted to IEEE Transactions on Affective Computing
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2208.05890 [cs.CL]
	(or arXiv:2208.05890v3 [cs.CL] for this version)

Submission history

From: Kun Zhou
[v1] Thu, 11 Aug 2022 15:45:58 GMT (2852kb,D)
[v2] Thu, 22 Dec 2022 06:29:36 GMT (2969kb,D)
[v3] Wed, 28 Dec 2022 21:38:06 GMT (2969kb,D)