Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Warstadt, Alex; Choshen, Leshem; Mueller, Aaron; Williams, Adina; Wilcox, Ethan; Zhuang, Chengxu

(Submitted on 27 Jan 2023)

Abstract: We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small-scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children. The task has three tracks, two of which restrict the training data to pre-released datasets of 10M and 100M words and are dedicated to explorations of approaches such as architectural variations, self-supervised objectives, or curriculum learning. The final track restricts only the amount of text used, allowing innovation in the choice of the data, its domain, and even its modality (i.e., data from sources other than text is welcome). We will release a shared evaluation pipeline that scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2301.11796 [cs.CL]
	(or arXiv:2301.11796v1 [cs.CL] for this version)

Submission history

From: Ethan Wilcox
[v1] Fri, 27 Jan 2023 15:52:50 GMT (156kb,D)
