Title: PAWLS: PDF Annotation With Labels and Structure

Abstract: Adobe's Portable Document Format (PDF) is a popular way of distributing view-only documents with a rich visual markup. This presents a challenge to NLP practitioners who wish to use the information contained within PDF documents for training models or data analysis, because annotating these documents is difficult. In this paper, we present PDF Annotation with Labels and Structure (PAWLS), a new annotation tool designed specifically for the PDF document format. PAWLS is particularly suited for mixed-mode annotation and scenarios in which annotators require extended context to annotate accurately. PAWLS supports span-based textual annotation, N-ary relations and freeform, non-textual bounding boxes, all of which can be exported in convenient formats for training multi-modal machine learning models. A read-only PAWLS server is available at this https URL and the source code is available at this https URL
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2101.10281 [cs.CL]
  (or arXiv:2101.10281v1 [cs.CL] for this version)

Submission history

From: Mark Neumann
[v1] Mon, 25 Jan 2021 18:02:43 GMT (8932kb,D)
