Title: Reducing a Set of Regular Expressions and Analyzing Differences of Domain-specific Statistic Reporting

Abstract: Due to the large number of scientific papers published every day, it is impossible to review each one manually. Therefore, automatic extraction of key information is desirable. In this paper, we examine STEREO, a tool for extracting statistics from scientific papers using regular expressions. By adapting an existing regular expression inclusion algorithm to our use case, we reduce the number of regular expressions used in STEREO by about 33.8%. We reveal common patterns in the condensed rule set that can be used to create new rules. We also apply STEREO, which was previously trained on the life-sciences and medical domain, to a new scientific domain, namely Human-Computer Interaction (HCI), and re-evaluate it. According to our research, statistics reported in the HCI domain are similar to those in the medical domain, although a higher percentage of APA-conformant statistics was found in the HCI domain. Additionally, we compare extraction from PDF and LaTeX source files and find LaTeX sources to be more reliable for extraction.
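To make "extracting statistics using regular expressions" concrete, here is a minimal sketch of one such rule. The pattern `APA_T_TEST`, the `TTest` record, and `extract_t_tests` are hypothetical illustrations, not STEREO's actual rules or API; the sketch only assumes APA-style t-test reports of the form "t(df) = value, p = value".

```python
import re
from typing import List, NamedTuple

# Hypothetical rule (NOT taken from STEREO): matches APA-style t-test
# reports such as "t(28) = 2.14, p = .041" or "t(15) = -3.02, p < .01".
APA_T_TEST = re.compile(
    r"t\((?P<df>\d+(?:\.\d+)?)\)\s*=\s*(?P<t>-?\d*\.?\d+),\s*"
    r"p\s*(?P<op>[<>=])\s*(?P<p>0?\.\d+)"
)


class TTest(NamedTuple):
    df: float    # degrees of freedom
    t: float     # t statistic
    p_op: str    # comparison operator reported for p ("=", "<", ">")
    p: float     # reported p-value


def extract_t_tests(text: str) -> List[TTest]:
    """Return every APA-style t-test report found in `text`."""
    return [
        TTest(float(m["df"]), float(m["t"]), m["op"], float(m["p"]))
        for m in APA_T_TEST.finditer(text)
    ]


sample = ("The effect was significant, t(28) = 2.14, p = .041, and a second "
          "test gave t(15) = -3.02, p < .01.")
for stat in extract_t_tests(sample):
    print(stat)
```

A non-APA report such as "t = 2.14 (df = 28)" is not matched by this single rule; covering such variants is what leads to many overlapping rules, which is why a rule set like STEREO's can benefit from removing expressions whose matches are already covered by others.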
Subjects: Digital Libraries (cs.DL)
Cite as: arXiv:2211.13632 [cs.DL]
  (or arXiv:2211.13632v2 [cs.DL] for this version)

Submission history

From: Ansgar Scherp
[v1] Thu, 24 Nov 2022 14:29:20 GMT (734kb,D)
[v2] Sat, 25 Mar 2023 09:47:27 GMT (756kb,D)