Title: GORC: A large contextual citation graph of academic papers

Abstract: We introduce the Semantic Scholar Graph of References in Context (GORC), a large contextual citation graph of 81.1M academic publications, including parsed full text for 8.1M open access papers, across broad domains of science. Each paper is represented with rich paper metadata (title, authors, abstract, etc.), and where available: cleaned full text, section headers, figure and table captions, and parsed bibliography entries. In-line citation mentions in full text are linked to their corresponding bibliography entries, which are in turn linked to in-corpus cited papers, forming the edges of a contextual citation graph. To our knowledge, this is the largest publicly available contextual citation graph; the full text alone is the largest parsed academic text corpus publicly available. We demonstrate the ability to identify similar papers using these citation contexts and propose several applications for language modeling and citation-related tasks.
Comments: 12 pages, 2 figures, 5 appendices
Subjects: Computation and Language (cs.CL); Digital Libraries (cs.DL)
Cite as: arXiv:1911.02782 [cs.CL]
  (or arXiv:1911.02782v1 [cs.CL] for this version)
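
The linking scheme described in the abstract (in-line citation mention, then bibliography entry, then in-corpus cited paper) can be illustrated with a minimal Python sketch. All class names, field names, and the `contextual_edges` helper are hypothetical and chosen for illustration only; they are not the released GORC schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BibEntry:
    """A parsed bibliography entry (hypothetical fields)."""
    ref_id: str
    raw_text: str
    # ID of the cited paper if it was resolved to a paper in the corpus, else None.
    linked_paper_id: Optional[str] = None


@dataclass
class CitationMention:
    """An in-line citation mention found in a paper's full text."""
    ref_id: str   # points at a BibEntry of the same paper
    context: str  # the text surrounding the mention


@dataclass
class Paper:
    paper_id: str
    title: str
    bib_entries: Dict[str, BibEntry] = field(default_factory=dict)
    mentions: List[CitationMention] = field(default_factory=list)


def contextual_edges(papers: Dict[str, Paper]) -> List[Tuple[str, str, str]]:
    """Return (citing_id, cited_id, context) for every resolvable mention."""
    edges = []
    for paper in papers.values():
        for mention in paper.mentions:
            entry = paper.bib_entries.get(mention.ref_id)
            # Skip mentions whose bibliography entry is missing or unresolved.
            if entry is None or entry.linked_paper_id is None:
                continue
            # Keep only edges whose target is itself in the corpus.
            if entry.linked_paper_id not in papers:
                continue
            edges.append((paper.paper_id, entry.linked_paper_id, mention.context))
    return edges
```

In this sketch, a mention contributes an edge only when its bibliography entry resolves to a paper that is also in the corpus, which mirrors the abstract's description of edges between in-corpus papers.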

Submission history

From: Lucy Wang
[v1] Thu, 7 Nov 2019 07:34:43 GMT (692kb,D)
