New submissions for Thu, 29 Oct 20

[1]  arXiv:2010.14588 [pdf]
Title: A Comprehensive Dictionary and Term Variation Analysis for COVID-19 and SARS-CoV-2
Comments: Accepted EMNLP NLP-COVID Workshop
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL)

The number of unique terms in the scientific literature used to refer to either SARS-CoV-2 or COVID-19 is remarkably large and has continued to increase rapidly despite well-established standardized terms. This high degree of term variation makes high-recall identification of these important entities difficult. In this manuscript, we present an extensive dictionary of terms used in the literature to refer to SARS-CoV-2 and COVID-19. We use a rule-based approach to iteratively generate new term variants, then locate these variants in a large text corpus. We compare our dictionary to an extensive collection of terminological resources, demonstrating that our resource provides a substantial number of additional terms. We use our dictionary to analyze the usage of SARS-CoV-2 and COVID-19 terms over time and show that the number of unique terms continues to grow rapidly. Our dictionary is freely available at https://github.com/ncbi-nlp/CovidTermVar.

[2]  arXiv:2010.14640 [pdf, other]
Title: Improving Text Relationship Modeling with Artificial Data
Comments: 9 pages, 3 figures
Subjects: Digital Libraries (cs.DL); Machine Learning (cs.LG)

Data augmentation uses artificially created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for limited availability of labelled data. We apply and evaluate a synthetic data approach to relationship classification in digital libraries, generating artificial books with relationships that are common in digital libraries but not easily inferred from existing metadata. We find that for classification of whole-part relationships between books, synthetic data improves a deep neural network classifier by 91%. Further, we consider whether a useful new text relationship class can be learned from fully artificial training data.

Replacements for Thu, 29 Oct 20

[3]  arXiv:1811.01120 (replaced) [pdf]
Title: Exploring Direct Citations between Citing Publications
Subjects: Digital Libraries (cs.DL)

[4]  arXiv:2010.12294 (replaced) [pdf, other]
Title: Topic Space Trajectories: A case study on machine learning literature
Comments: 36 pages, 8 figures
Subjects: Machine Learning (cs.LG); Digital Libraries (cs.DL)
