Title: Investigating Text Simplification Evaluation

Abstract: Modern text simplification (TS) relies heavily on the availability of gold-standard data to build machine learning models. However, existing studies show that parallel TS corpora contain inaccurate simplifications and incorrect alignments. Additionally, evaluation is usually performed with metrics such as BLEU or SARI that compare system output to the gold standard. A major limitation is that these metrics do not match human judgements, and their performance varies greatly across datasets and linguistic phenomena. Furthermore, our research shows that the test and training subsets of parallel datasets differ significantly. In this work, we investigate existing TS corpora and provide new insights that will motivate the improvement of state-of-the-art TS evaluation methods. Our contributions include an analysis of TS corpora based on the modifications used for simplification and an empirical study of TS model performance using better-distributed datasets. We demonstrate that by improving the distribution of TS datasets, we can build more robust TS models.
Comments: 7 pages, 3 figures, 1 table
Subjects: Computation and Language (cs.CL)
Journal reference: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 876-882
Cite as: arXiv:2107.13662 [cs.CL]
  (or arXiv:2107.13662v1 [cs.CL] for this version)

Submission history

From: Laura Vásquez-Rodríguez
[v1] Wed, 28 Jul 2021 22:49:32 GMT (380kb,D)