Title: Capitalization and Punctuation Restoration: a Survey

Abstract: Ensuring proper punctuation and letter casing is a key pre-processing step towards applying complex natural language processing algorithms. This is especially significant for textual sources where punctuation and casing are missing, such as the raw output of automatic speech recognition systems. Additionally, short text messages and micro-blogging platforms often contain unreliable or incorrect punctuation and casing. This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing. Furthermore, current challenges and research directions are highlighted.
Comments: An improved version of this paper was published in Artificial Intelligence Review. This is the version of the article prior to any reviewer comments and improvements.
Subjects: Computation and Language (cs.CL)
Journal reference: Păiș, V., Tufiș, D. Capitalization and punctuation restoration: a survey. Artif Intell Rev (2021)
DOI: 10.1007/s10462-021-10051-x
Cite as: arXiv:2111.10746 [cs.CL]
  (or arXiv:2111.10746v1 [cs.CL] for this version)

Submission history

From: Vasile Păiș
[v1] Sun, 21 Nov 2021 05:48:32 GMT (501kb)
