Faster DAN: Multi-target Queries with Document Positional Encoding for End-to-end Handwritten Document Recognition

Coquenet, Denis; Chatelain, Clément; Paquet, Thierry

doi:10.1007/978-3-031-41685-9_12

(Submitted on 25 Jan 2023)

Abstract: Recent advances in handwritten text recognition have made it possible to recognize whole documents end-to-end: the Document Attention Network (DAN) recognizes the characters one after the other through an attention-based prediction process until it reaches the end of the document. However, this autoregressive process means that inference cannot benefit from any parallelization. In this paper, we propose Faster DAN, a two-step strategy to speed up the recognition process at prediction time: the model first predicts the first character of each text line in the document, and then completes all the text lines in parallel through multi-target queries and a specific document positional encoding scheme. Faster DAN achieves results competitive with the standard DAN, while being at least 4 times faster on whole single-page and double-page images of the RIMES 2009, READ 2016 and MAURDOR datasets. Source code and trained model weights are available at this https URL
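
The two-step strategy described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `predict` function is a hypothetical stand-in for the recognition model (it simply reads characters from a known transcription), the `EOL` and `EOD` markers are assumed names, and the multi-target queries and document positional encoding are not modelled. The sketch only counts decoding iterations, assuming that one iteration of the second pass can extend every unfinished line at once.

```python
from typing import Callable, List, Tuple

EOL = "<eol>"  # hypothetical end-of-line marker
EOD = "<eod>"  # hypothetical end-of-document marker

Predict = Callable[[int, int], str]


def make_toy_model(lines: List[str]) -> Predict:
    """Hypothetical stand-in for the recognizer: looks up a known transcription."""
    def predict(line_idx: int, pos: int) -> str:
        if line_idx >= len(lines):
            return EOD
        text = lines[line_idx]
        return text[pos] if pos < len(text) else EOL
    return predict


def standard_decode(predict: Predict) -> Tuple[List[str], int]:
    """Fully sequential decoding: one token per iteration, in reading order."""
    out: List[str] = []
    current: List[str] = []
    line, pos, iters = 0, 0, 0
    while True:
        iters += 1
        ch = predict(line, pos)
        if ch == EOD:
            break
        if ch == EOL:
            out.append("".join(current))
            current, line, pos = [], line + 1, 0
        else:
            current.append(ch)
            pos += 1
    return out, iters


def two_pass_decode(predict: Predict) -> Tuple[List[str], int]:
    """Pass 1 finds the first character of each line; pass 2 extends all lines together."""
    iters = 0
    firsts: List[str] = []
    # Pass 1: sequential, one iteration per line plus one for end-of-document.
    while True:
        iters += 1
        ch = predict(len(firsts), 0)
        if ch == EOD:
            break
        firsts.append(ch)
    lines = [[c] if c != EOL else [] for c in firsts]
    active = {i for i, c in enumerate(firsts) if c != EOL}
    pos = 1
    # Pass 2: each iteration adds one character to every unfinished line.
    # The inner loop stands in for what would be a single batched model call.
    while active:
        iters += 1
        for i in sorted(active):
            ch = predict(i, pos)
            if ch == EOL:
                active.discard(i)
            else:
                lines[i].append(ch)
        pos += 1
    return ["".join(chars) for chars in lines], iters


if __name__ == "__main__":
    model = make_toy_model(["hello", "hi", "world!"])
    print(standard_decode(model))
    print(two_pass_decode(model))
```

In this toy setting the iteration count of the second pass is bounded by the longest line rather than by the total number of characters, which is the intuition behind the speed-up; the actual gain reported in the abstract depends on the real model and datasets.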

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Journal reference:	International Conference on Document Analysis and Recognition - ICDAR 2023
DOI:	10.1007/978-3-031-41685-9_12
Cite as:	arXiv:2301.10593 [cs.CV]
	(or arXiv:2301.10593v1 [cs.CV] for this version)

Submission history

From: Denis Coquenet
[v1] Wed, 25 Jan 2023 13:55:14 GMT (5526kb,D)
