Title: Music-robust Automatic Lyrics Transcription of Polyphonic Music

Abstract: Lyrics transcription of polyphonic music is challenging because the singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to background music, we propose combining two sets of features: music-removed features, which are computed from the extracted singing vocals and therefore emphasize the vocals, and music-present features, which capture the singing vocals together with the background music. We show that the two sets of features complement each other and that their combination performs better than either set alone, thus improving the robustness of the acoustic model to background music. Furthermore, interpolating a general-purpose language model with an in-domain, lyrics-specific language model further improves transcription results. Our experiments show that the proposed strategy outperforms existing lyrics transcription systems for polyphonic music. Moreover, we find that the proposed music-robust features especially improve lyrics transcription performance for songs in the metal genre, where the background music is loud and dominant.
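
The abstract does not say how the general-purpose and lyrics-specific language models are interpolated. A common choice is linear interpolation, P(w) = lam * P_general(w) + (1 - lam) * P_lyrics(w), and the sketch below illustrates that idea only. The unigram tables, the weight `lam`, and the function name `interpolate` are illustrative assumptions, not taken from the paper.

```python
def interpolate(p_general, p_lyrics, lam):
    """Linearly interpolate two word-probability tables.

    Assumption: unigram models stored as dicts, used here for brevity.
    A real system would interpolate n-gram (or neural) models instead.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_general) | set(p_lyrics)
    return {
        w: lam * p_general.get(w, 0.0) + (1.0 - lam) * p_lyrics.get(w, 0.0)
        for w in vocab
    }


# Toy probabilities, made up for illustration.
p_general = {"the": 0.5, "love": 0.1, "market": 0.4}
p_lyrics = {"the": 0.3, "love": 0.6, "baby": 0.1}

mixed = interpolate(p_general, p_lyrics, lam=0.5)
for word in sorted(mixed):
    print(f"{word}: {mixed[word]:.3f}")
```

Because both inputs are probability distributions and the weights sum to one, the mixture is again a distribution. In practice the weight would typically be tuned on held-out lyrics.
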
Comments: 7 pages, 2 figures, accepted by 2022 Sound and Music Computing
Subjects: Audio and Speech Processing (eess.AS)
Cite as: arXiv:2204.03306 [eess.AS]
  (or arXiv:2204.03306v2 [eess.AS] for this version)

Submission history

From: Xiaoxue Gao
[v1] Thu, 7 Apr 2022 09:14:58 GMT (118kb,D)
[v2] Fri, 22 Apr 2022 12:06:57 GMT (118kb,D)
