We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.CV

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Computer Vision and Pattern Recognition

Title: Sheet Music Transformer ++: End-to-End Full-Page Optical Music Recognition for Pianoform Sheet Music

Abstract: Optical Music Recognition is a field that has progressed significantly, bringing accurate systems that transcribe effectively music scores into digital formats. Despite this, there are still several limitations that hinder OMR from achieving its full potential. Specifically, state of the art OMR still depends on multi-stage pipelines for performing full-page transcription, as well as it has only been demonstrated in monophonic cases, leaving behind very relevant engravings. In this work, we present the Sheet Music Transformer++, an end-to-end model that is able to transcribe full-page polyphonic music scores without the need of a previous Layout Analysis step. This is done thanks to an extensive curriculum learning-based pretraining with synthetic data generation. We conduct several experiments on a full-page extension of a public polyphonic transcription dataset. The experimental outcomes confirm that the model is competent at transcribing full-page pianoform scores, marking a noteworthy milestone in end-to-end OMR transcription.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2405.12105 [cs.CV]
  (or arXiv:2405.12105v2 [cs.CV] for this version)

Submission history

From: Antonio Ríos-Vila [view email]
[v1] Mon, 20 May 2024 15:21:48 GMT (10265kb,D)
[v2] Tue, 21 May 2024 08:16:00 GMT (10265kb,D)

Link back to: arXiv, form interface, contact.