Title: Transformer-CNN: Fast and Reliable tool for QSAR

Abstract: We present SMILES embeddings derived from the internal encoder state of a Transformer[1] model trained to canonicalize SMILES as a Seq2Seq problem. Applying a CharNN[2] architecture to the embeddings yields higher-quality QSAR/QSPR models on diverse benchmark datasets, including regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prediction rests on an internal consensus. Both the augmentation and the embedding-based transfer learning allow the method to provide good results for small datasets. We discuss the reasons for this effectiveness and outline future directions for the development of the method. The source code and the embeddings are available on this https URL, whereas the OCHEM[3] environment (this https URL) hosts its online implementation.
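The idea of a consensus over augmented SMILES at inference time can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the consensus is formed, so an arithmetic mean is assumed here, and `consensus_predict` and `toy_model` are hypothetical names introduced only for illustration. The SMILES variants are written by hand; in practice they would come from a cheminformatics toolkit such as RDKit.

```python
from statistics import mean
from typing import Callable, Iterable, List


def consensus_predict(
    smiles_variants: Iterable[str],
    predict_one: Callable[[str], float],
) -> float:
    """Average a per-string prediction over several SMILES of one molecule.

    Assumption: the consensus is a plain arithmetic mean.
    """
    predictions: List[float] = [predict_one(s) for s in smiles_variants]
    if not predictions:
        raise ValueError("need at least one SMILES variant")
    return mean(predictions)


# Three valid SMILES spellings of the same molecule (ethanol),
# written by hand for illustration.
ethanol_variants = ["CCO", "OCC", "C(O)C"]


def toy_model(smiles: str) -> float:
    """Hypothetical stand-in for a trained model.

    Its output depends on how the string is written, not on the molecule,
    which is the kind of variation a consensus is meant to smooth out.
    """
    return float(smiles.index("O"))


consensus = consensus_predict(ethanol_variants, toy_model)
print(f"per-variant: {[toy_model(s) for s in ethanol_variants]}")
print(f"consensus:   {consensus:.4f}")
```

Because the toy model gives different values for different spellings of ethanol, the averaged value is less sensitive to any single spelling than a prediction made from one string alone.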
Subjects: Quantitative Methods (q-bio.QM); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:1911.06603 [q-bio.QM]
  (or arXiv:1911.06603v1 [q-bio.QM] for this version)

Submission history

From: Pavel Karpov Dr
[v1] Mon, 21 Oct 2019 12:49:55 GMT (811kb)
[v2] Tue, 25 Feb 2020 13:35:29 GMT (460kb,D)
[v3] Wed, 26 Feb 2020 14:43:18 GMT (458kb,D)
