Title: RITA: a Study on Scaling Up Generative Protein Sequence Models

Abstract: In this work, we introduce RITA, a suite of autoregressive generative models for protein sequences with up to 1.2 billion parameters, trained on over 280 million protein sequences from the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models on next amino acid prediction, zero-shot fitness prediction, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG)
Cite as: arXiv:2205.05789 [q-bio.QM]
  (or arXiv:2205.05789v2 [q-bio.QM] for this version)

Submission history

From: Daniel Hesslow
[v1] Wed, 11 May 2022 22:06:03 GMT (564kb,D)
[v2] Thu, 14 Jul 2022 21:46:47 GMT (568kb,D)