We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Digital Libraries

Title: Gextext: Disease Network Extraction from Biomedical Literature

Authors: Robert O'Shea
Abstract: PURPOSE: We propose a fully unsupervised method to learn latent disease networks directly from unstructured biomedical text corpora. This method addresses current challenges in unsupervised knowledge extraction, such as the detection of long-range dependencies and requirements for large training corpora. METHODS: Let C be a corpus of n text chunks. Let V be a set of p disease terms occurring in the corpus. Let X indicate the occurrence of V in C. Gextext identifies disease similarities by positively correlated occurrence patterns. This information is combined to generate a graph on which geodesic distance describes dissimilarity. Diseasomes were learned by Gextext and GloVE on corpora of 100-1000 PubMed abstracts. Similarity matrix estimates were validated against biomedical semantic similarity metrics and gene profile similarity. RESULTS: Geodesic distance on Gextext-inferred diseasomes correlated inversely with external measures of semantic similarity. Gene profile similarity also correlated significant with proximity on the inferred graph. Gextext outperformed GloVE in our experiments. The information contained on the Gextext graph exceeded the explicit information content within the text. CONCLUSIONS: Gextext extracts latent relationships from unstructured text, enabling fully unsupervised modelling of diseasome graphs from PubMed abstracts.
Subjects: Digital Libraries (cs.DL); Computation and Language (cs.CL)
Cite as: arXiv:1911.02562 [cs.DL]
  (or arXiv:1911.02562v2 [cs.DL] for this version)

Submission history

From: Robert O'Shea [view email]
[v1] Wed, 6 Nov 2019 10:57:38 GMT (516kb)
[v2] Tue, 17 Dec 2019 10:15:39 GMT (379kb)

Link back to: arXiv, form interface, contact.