Title: Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

Abstract: Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a comprehensive instruction dataset designed for the biomolecular domain. Mol-Instructions encompasses three key components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions. Each component aims to improve the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on LLMs, we demonstrate the effectiveness of Mol-Instructions in enhancing large models' performance in the intricate realm of biomolecular studies, thus fostering progress in the biomolecular research community. Mol-Instructions is publicly available for ongoing research and will undergo regular updates to enhance its applicability.
Comments: ICLR 2024. Project homepage: this https URL
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as: arXiv:2306.08018 [q-bio.QM]
  (or arXiv:2306.08018v5 [q-bio.QM] for this version)

Submission history

From: Ningyu Zhang
[v1] Tue, 13 Jun 2023 14:35:34 GMT (9396kb,D)
[v2] Tue, 29 Aug 2023 17:13:05 GMT (14371kb,D)
[v3] Mon, 2 Oct 2023 15:27:20 GMT (16686kb,D)
[v4] Thu, 30 Nov 2023 15:29:58 GMT (18066kb,D)
[v5] Mon, 4 Mar 2024 12:49:31 GMT (17523kb,D)