Overconfidence is Key: Verbalized Uncertainty Evaluation in Large Language and Vision-Language Models

Groot, Tobias; Valdenegro-Toro, Matias

(Submitted on 5 May 2024)

Abstract: Language and Vision-Language Models (LLMs/VLMs) have revolutionized the field of AI through their ability to generate human-like text and understand images, but ensuring their reliability is crucial. This paper aims to evaluate the ability of LLMs (GPT4, GPT-3.5, LLaMA2, and PaLM 2) and VLMs (GPT4V and Gemini Pro Vision) to estimate their verbalized uncertainty via prompting. We propose the new Japanese Uncertain Scenes (JUS) dataset, aimed at testing VLM capabilities via difficult queries and object counting, and the Net Calibration Error (NCE) to measure the direction of miscalibration. Results show that both LLMs and VLMs have a high calibration error and are overconfident most of the time, indicating a poor capability for uncertainty estimation. Additionally, we develop prompts for regression tasks, and we show that VLMs have poor calibration when producing mean/standard deviation and 95% confidence intervals.
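
The abstract names a calibration error and a Net Calibration Error (NCE) that captures the direction of miscalibration, but gives no formulas. The sketch below is one plausible reading and is not taken from the paper: it assumes verbalized confidences have already been parsed into numbers in [0, 1], uses the standard binned expected calibration error (ECE) for magnitude, and takes the signed, bin-weighted gap (accuracy minus confidence) as a stand-in for NCE. The function name `calibration_errors` and the parameter `n_bins` are hypothetical.

```python
from typing import List, Sequence, Tuple


def calibration_errors(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> Tuple[float, float]:
    """Return (ece, net_error) over equal-width confidence bins.

    ece is the bin-weighted mean of |accuracy - confidence|.
    net_error is the same sum without the absolute value (an ASSUMED
    definition for illustration): negative values indicate
    overconfidence, positive values indicate underconfidence.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(c < 0.0 or c > 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")

    # Group (confidence, outcome) pairs into equal-width bins.
    bins: List[List[Tuple[float, float]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    net_error = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        gap = accuracy - avg_conf
        weight = len(members) / n
        ece += weight * abs(gap)
        net_error += weight * gap
    return ece, net_error


if __name__ == "__main__":
    # Four answers stated at 90% confidence, only one of them correct.
    ece, net = calibration_errors([0.9, 0.9, 0.9, 0.9], [True, False, False, False])
    print(f"ECE={ece:.2f} net={net:.2f}")
```

Under this assumed definition the two numbers answer different questions: the unsigned error says how far stated confidence is from observed accuracy, while the signed one says on which side. A model that is overconfident on some answers and underconfident on others can have a large unsigned error and a signed error near zero.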

Comments:	8 pages, with appendix. To appear in TrustNLP workshop @ NAACL 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2405.02917 [cs.CV]
	(or arXiv:2405.02917v1 [cs.CV] for this version)

Submission history

From: Matias Valdenegro-Toro
[v1] Sun, 5 May 2024 12:51:38 GMT
