Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data

Vaska, Nathan; Helus, Victoria

(Submitted on 1 Jun 2023)

Abstract: The impressive advances and applications of large language models and joint language-and-vision understanding models have led to an increased need for methods of probing their potential reasoning capabilities. However, the difficulty of gathering naturally occurring data for complex multi-modal reasoning tasks limits the evaluation of AI methods on tasks that are not already covered by an academic dataset. In this work, we leverage recent advances in high-resolution text-to-image generation to develop a framework for generating evaluation data for multi-modal reasoning tasks. We apply this framework to generate context-dependent anomaly data, creating a synthetic dataset for a challenging task that is not well covered by existing datasets. We benchmark the performance of a state-of-the-art visual question answering (VQA) model on data generated with this method and demonstrate that, while the task is tractable, the model performs significantly worse on context-dependent anomaly detection than on standard VQA tasks.

Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2306.01144 [cs.LG]
	(or arXiv:2306.01144v1 [cs.LG] for this version)

Submission history

From: Victoria Helus
[v1] Thu, 1 Jun 2023 20:56:34 GMT (9275kb,D)