Title: Partial Product Aware Machine Learning on DNA-Encoded Libraries

Abstract: DNA-encoded libraries (DELs) are used for rapid, large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest that are dissimilar from those in the original DEL. Machine learning approaches to chemical property prediction rely on the assumption that the property of interest is linked to a single chemical structure. In the context of DNA-encoded libraries, this is equivalent to assuming that every chemical reaction fully yields the desired product. In practice, however, multi-step chemical synthesis sometimes generates partial molecules, so each unique DNA tag in a DEL corresponds to a set of possible molecules. Here, we leverage reaction yield data to enumerate the set of possible molecules corresponding to a given DNA tag. We demonstrate that training a custom graph neural network (GNN) on this richer dataset improves accuracy and generalization performance.
Comments: 8 pages, 5 figures; Published at the MLDD workshop, ICLR 2022
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as: arXiv:2205.08020 [cs.LG]
  (or arXiv:2205.08020v1 [cs.LG] for this version)
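
The enumeration step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `enumerate_partial_products`, the building-block labels, and the yield values are hypothetical, and the sketch assumes a deliberately simple model in which each synthesis cycle succeeds independently with probability equal to its reaction yield and a failed cycle simply leaves its building block out.

```python
from itertools import product


def enumerate_partial_products(building_blocks, yields):
    """Map each possible (partial) product of one DNA tag to its probability.

    building_blocks: one building-block label per synthesis cycle.
    yields: per-cycle reaction yields, each in [0, 1].

    Assumption (illustrative only): cycles succeed or fail independently,
    and a failed cycle omits its building block from the final molecule.
    """
    if len(building_blocks) != len(yields):
        raise ValueError("need exactly one yield per building block")

    outcomes = {}
    # Each mask marks which cycles succeeded (True) or failed (False).
    for mask in product([True, False], repeat=len(building_blocks)):
        prob = 1.0
        for succeeded, y in zip(mask, yields):
            prob *= y if succeeded else (1.0 - y)
        if prob == 0.0:
            continue  # skip outcomes that cannot occur under these yields
        molecule = tuple(
            bb for bb, succeeded in zip(building_blocks, mask) if succeeded
        )
        outcomes[molecule] = outcomes.get(molecule, 0.0) + prob
    return outcomes


if __name__ == "__main__":
    # Hypothetical three-cycle tag with made-up yields.
    tag_products = enumerate_partial_products(["A", "B", "C"], [0.9, 0.5, 1.0])
    for molecule, prob in sorted(tag_products.items(), key=lambda kv: -kv[1]):
        print("-".join(molecule), round(prob, 3))
```

Under this simplified model, one DNA tag maps to a weighted set of candidate structures rather than a single molecule, which is the kind of richer training input the abstract refers to. A real pipeline would also need chemistry-aware rules for which partial products can actually form.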

Submission history

From: Polina Binder
[v1] Mon, 16 May 2022 23:18:02 GMT (1817kb,D)