Title: Investigating self-supervised learning for lyrics recognition

Abstract: Lyrics recognition is an important task in music processing. Although many traditional approaches, such as the hybrid HMM-TDNN model, achieve good performance, few studies have applied end-to-end models and self-supervised learning (SSL) to this task. In this paper, we first establish an end-to-end baseline for lyrics recognition and then explore the performance of SSL models. We evaluate four upstream SSL models, grouped by training method: masked reconstruction, masked prediction, autoregressive reconstruction, and contrastive learning. With SSL features, our best system improves performance by 5.23% on the dev set and 2.4% on the test set over the previous state-of-the-art baseline, even without a language model trained on a large corpus. Moreover, since these SSL models were not trained on music datasets, we study how well their features generalize to lyrics recognition.
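The abstract names masked reconstruction as one of the four SSL training methods. As a rough illustration only, the sketch below shows the core idea under simplifying assumptions: frames of an acoustic feature sequence are zeroed at random, and the training loss is the mean L1 error computed only over the masked frames. The helper names `mask_frames` and `masked_l1_loss` and the independent per-frame masking scheme are hypothetical; this is not the paper's implementation, and real upstream models in this family typically mask contiguous spans and use a neural network to produce the reconstruction.

```python
import random
from typing import List, Sequence, Set, Tuple

Frame = List[float]  # one feature vector (e.g. a filterbank frame); assumption for illustration


def mask_frames(
    frames: Sequence[Sequence[float]], mask_prob: float, rng: random.Random
) -> Tuple[List[Frame], Set[int]]:
    """Hypothetical helper: zero out each frame independently with probability mask_prob.

    Returns the masked copy of the sequence and the set of masked frame indices.
    """
    masked_idx = {t for t in range(len(frames)) if rng.random() < mask_prob}
    masked = [
        [0.0] * len(f) if t in masked_idx else list(f)
        for t, f in enumerate(frames)
    ]
    return masked, masked_idx


def masked_l1_loss(
    target: Sequence[Sequence[float]],
    prediction: Sequence[Sequence[float]],
    masked_idx: Set[int],
) -> float:
    """Hypothetical helper: mean absolute error over masked frames only.

    Unmasked frames do not contribute, so the model is trained to
    reconstruct only what it could not see.
    """
    if not masked_idx:
        return 0.0
    total, count = 0.0, 0
    for t in masked_idx:
        for x, y in zip(target[t], prediction[t]):
            total += abs(x - y)
            count += 1
    return total / count
```

For example, with targets `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]`, predictions `[[1.0, 2.0], [2.0, 4.0], [5.0, 8.0]]`, and masked indices `{1, 2}`, the loss is (1 + 0 + 0 + 2) / 4 = 0.75; any error on the unmasked first frame would be ignored.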
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Cite as: arXiv:2209.12702 [eess.AS]
  (or arXiv:2209.12702v1 [eess.AS] for this version)

Submission history

From: Shuyue Stella Li
[v1] Mon, 26 Sep 2022 13:51:56 GMT (215kb,D)
[v2] Wed, 28 Sep 2022 01:34:37 GMT (215kb,D)
[v3] Mon, 24 Oct 2022 00:57:34 GMT (336kb,D)
[v4] Thu, 27 Oct 2022 03:04:01 GMT (336kb,D)
