
Title: Croissant: A Metadata Format for ML-Ready Datasets

Abstract: Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.
Comments: Preprint. Contributors listed in alphabetical order
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Databases (cs.DB); Information Retrieval (cs.IR)
Cite as: arXiv:2403.19546 [cs.LG]
  (or arXiv:2403.19546v1 [cs.LG] for this version)

Submission history

From: Luis Oala
[v1] Thu, 28 Mar 2024 16:27:26 GMT (1268kb,D)
