Computer Science > Computation and Language
Title: Understanding the Properties of Generated Corpora
(Submitted on 22 Jun 2022 (v1), last revised 27 Oct 2022 (this version, v2))
Abstract: Text generation models have become central to many research tasks, especially the generation of sentence corpora. However, understanding the properties of an automatically generated text corpus remains challenging. We propose a set of tools that examine the properties of generated text corpora. Applying these tools to various generated corpora gave us new insights into the properties of the generative models. As part of our characterization process, we found remarkable differences between the corpora generated by two leading generative technologies.
Submission history
From: Naama Zwerdling
[v1] Wed, 22 Jun 2022 17:13:52 GMT (7712kb,D)
[v2] Thu, 27 Oct 2022 10:58:08 GMT (7713kb,D)