Title: The similarity index of scientific publications with equations and formulas, identification of self-plagiarism, and testing of the iThenticate system

Abstract: The problems of estimating the similarity index of mathematical and other scientific publications containing equations and formulas are discussed for the first time. It is shown that the presence of equations and formulas (as well as figures, drawings, and tables) significantly complicates the analysis of such texts. It is shown that determining the similarity index of publications from individual mathematical symbols and fragments of equations and formulas is ineffective and can lead to erroneous and even completely absurd conclusions. The capabilities of iThenticate, the most popular software system currently used by scientific journals for detecting plagiarism and self-plagiarism, are investigated. Results are presented of iThenticate's processing of specific examples and special test problems containing equations (PDEs and ODEs), exact solutions, and some formulas. It is established that, when analyzing inhomogeneous texts, this software system is often unable to distinguish self-plagiarism from pseudo-self-plagiarism (false self-plagiarism). A complex model situation is considered in which identifying self-plagiarism requires highly qualified specialists in a narrow field. Various ways of improving software systems for comparing inhomogeneous texts are proposed. This article will be useful to researchers and university teachers in mathematics, physics, and engineering sciences, to programmers working on image recognition and digital image processing, and to a wide range of readers interested in issues of plagiarism and self-plagiarism.
Comments: 23 pages, 3 figures, 2 photos
Subjects: Digital Libraries (cs.DL); Information Retrieval (cs.IR)
Journal reference: Mathematical Modeling and Computational Methods, 2021, No. 2, pp. 96-116 https://mmcm.bmstu.ru/articles/253
DOI: 10.18698/2309-3684-2021-2-96116
Cite as: arXiv:2201.09062 [cs.DL]
  (or arXiv:2201.09062v1 [cs.DL] for this version)

Submission history

From: Inna K. Shingareva Dr. Prof.
[v1] Tue, 21 Dec 2021 07:23:22 GMT (1834kb,D)
