CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos

Han, Seungju; Hessel, Jack; Dziri, Nouha; Choi, Yejin; Yu, Youngjae

(Submitted on 17 Mar 2023 (v1), last revised 16 Aug 2023 (this version, v2))

Abstract: Visual information is central to conversation: body gestures and physical behaviour, for example, contribute to meaning that transcends words alone. To date, however, most neural conversational models are limited to just text. We introduce CHAMPAGNE, a generative model of conversations that can account for visual contexts. To train CHAMPAGNE, we collect and release YTD-18M, a large-scale corpus of 18M video-based dialogues. YTD-18M is constructed from web videos: crucial to our data collection pipeline is a pretrained language model that converts error-prone automatic transcripts to a cleaner dialogue format while maintaining meaning. Human evaluation reveals that YTD-18M is more sensible and specific than prior resources (MMDialog, 1M dialogues), while maintaining visual-groundedness. Experiments demonstrate that 1) CHAMPAGNE learns to conduct conversation from YTD-18M; and 2) when fine-tuned, it achieves state-of-the-art results on four vision-language tasks focused on real-world conversations. We release data, models, and code.

Comments:	ICCV 2023, Project page: this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.09713 [cs.CL]
	(or arXiv:2303.09713v2 [cs.CL] for this version)

Submission history

From: Seungju Han
[v1] Fri, 17 Mar 2023 01:10:33 GMT (8798kb,D)
[v2] Wed, 16 Aug 2023 08:17:02 GMT (8802kb,D)
