Title: The ZevoMOS entry to VoiceMOS Challenge 2022

Authors: Adriana Stan
Abstract: This paper introduces the ZevoMOS entry to the main track of the VoiceMOS Challenge 2022. The ZevoMOS submission is based on a two-step finetuning of pretrained self-supervised learning (SSL) speech models. In the first step, the models are finetuned to classify natural versus synthetic speech; in the second step, they are finetuned to predict the mean opinion score (MOS) associated with each training sample. The results of the finetuning process are then combined with the confidence scores extracted from an automatic speech recognition model, as well as the raw embeddings of the training samples obtained from a wav2vec SSL speech model.
The team ID assigned to the ZevoMOS system within the VoiceMOS Challenge is T01. The submission ranked 14th with respect to the system-level SRCC and 9th with respect to the utterance-level MSE. The paper also presents additional evaluations of the intermediate results.
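
The two ranking metrics named in the abstract can be illustrated with a minimal sketch. It assumes that SRCC denotes the Spearman rank correlation coefficient, that MSE denotes the mean squared error, and that system-level scores are obtained by averaging utterance-level scores per system. That averaging convention, the function names, and the toy data are illustrative assumptions, not the challenge's official evaluation script.

```python
from collections import defaultdict
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def utterance_mse(true_mos, pred_mos):
    """Mean squared error over individual utterances."""
    return mean((t - p) ** 2 for t, p in zip(true_mos, pred_mos))


def system_means(system_ids, scores):
    """Average the utterance scores of each system (assumed convention)."""
    grouped = defaultdict(list)
    for sid, score in zip(system_ids, scores):
        grouped[sid].append(score)
    return {sid: mean(vals) for sid, vals in grouped.items()}


def system_srcc(system_ids, true_mos, pred_mos):
    """Spearman correlation between per-system mean true and predicted MOS."""
    true_by_sys = system_means(system_ids, true_mos)
    pred_by_sys = system_means(system_ids, pred_mos)
    systems = sorted(true_by_sys)
    return spearman([true_by_sys[s] for s in systems],
                    [pred_by_sys[s] for s in systems])


# Hypothetical toy data: three systems with two utterances each.
toy_systems = ["A", "A", "B", "B", "C", "C"]
toy_true = [4.0, 4.5, 2.0, 2.5, 3.0, 3.5]
toy_pred = [4.2, 4.1, 2.4, 2.0, 3.1, 3.3]
toy_srcc = system_srcc(toy_systems, toy_true, toy_pred)
toy_mse = utterance_mse(toy_true, toy_pred)
```

A rank-based metric such as SRCC rewards ordering the systems correctly regardless of the absolute predicted values, whereas MSE penalises the size of each per-utterance error, so a submission can rank differently under the two criteria.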
Comments: Accepted at Interspeech 2022 - VoiceMOS Challenge; 5 pages, 2 figures, 2 tables
Subjects: Audio and Speech Processing (eess.AS)
Cite as: arXiv:2206.07448 [eess.AS]
  (or arXiv:2206.07448v1 [eess.AS] for this version)

Submission history

From: Adriana Stan PhD
[v1] Wed, 15 Jun 2022 10:53:22 GMT (697kb,D)