Title: OCR Error Correction Using Character Correction and Feature-Based Word Classification

Abstract: This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.
Comments: Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS2016), Santorini, Greece, April 11-14, 2016
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Journal reference: Proceedings of the 12th IAPR International Workshop on Document Analysis Systems (DAS 2016), Santorini, Greece, pp. 198-203 (2016)
Cite as: arXiv:1604.06225 [cs.IR]
  (or arXiv:1604.06225v1 [cs.IR] for this version)

Submission history

From: Ido Kissos
[v1] Thu, 21 Apr 2016 09:25:11 GMT (85kb)
