Computer Science > Computer Vision and Pattern Recognition
Title: Test-Time Adaptation for Visual Document Understanding
(Submitted on 15 Jun 2022 (v1), last revised 23 Aug 2023 (this version, v2))
Abstract: For visual document understanding (VDU), self-supervised pretraining has been shown to produce transferable representations, yet effective adaptation of such representations to distribution shifts at test time remains unexplored. We propose DocTTA, a novel test-time adaptation method for documents that performs source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling, to adapt models learned on a source domain to an unlabeled target domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. On these tasks, DocTTA improves over the source model's performance by up to 1.89% (F1 score), 3.43% (F1 score), and 17.68% (ANLS score), respectively. Our benchmark datasets are available at this https URL.
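For readers unfamiliar with the ANLS score reported for document visual question answering, the following is a minimal sketch of the metric as it is commonly defined (Average Normalized Levenshtein Similarity with a threshold of 0.5). It is not taken from the paper: the function names `levenshtein` and `anls`, the lowercasing and whitespace stripping, and the default threshold are assumptions made for illustration, and the authors' evaluation code may differ.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions: List[str],
         references: List[List[str]],
         threshold: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity.

    predictions: one predicted answer per question.
    references:  a list of acceptable answers per question.
    threshold:   normalized distances at or above this value score 0
                 (0.5 is the commonly used value; assumed here).
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            nl = levenshtein(p, r) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)  # best match over all references
        total += best
    return total / len(predictions)
```

A prediction that differs from a reference by one character out of four contributes 0.75, while a prediction with no characters in common contributes 0, so the score rewards near-miss answers (for example, OCR slips) without crediting unrelated ones.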
Submission history
From: Sayna Ebrahimi
[v1] Wed, 15 Jun 2022 01:57:12 GMT (2715kb,D)
[v2] Wed, 23 Aug 2023 22:54:40 GMT (9959kb,D)