Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

Mündler, Niels; He, Jingxuan; Jenko, Slobodan; Vechev, Martin

Computer Science > Computation and Language

(Submitted on 25 May 2023 (v1), last revised 15 Mar 2024 (this version, v3))

Abstract: Large language models (large LMs) are susceptible to producing text that contains hallucinated content. An important instance of this problem is self-contradiction, where the LM generates two contradictory sentences within the same context. In this work, we present a comprehensive investigation into self-contradiction for various instruction-tuned LMs, covering evaluation, detection, and mitigation. Our primary evaluation task is open-domain text generation, but we also demonstrate the applicability of our approach to shorter question answering. Our analysis reveals the prevalence of self-contradictions, e.g., in 17.7% of all sentences produced by ChatGPT. We then propose a novel prompting-based framework designed to effectively detect and mitigate self-contradictions. Our detector achieves high accuracy, e.g., around 80% F1 score when prompting ChatGPT. The mitigation algorithm iteratively refines the generated text to remove contradictory information while preserving text fluency and informativeness. Importantly, our entire framework is applicable to black-box LMs and does not require retrieval of external knowledge. Rather, our method complements retrieval-based methods, as a large portion of self-contradictions (e.g., 35.2% for ChatGPT) cannot be verified using online text. Our approach is practically effective and has been released as a push-button tool to benefit the public.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2305.15852 [cs.CL]
	(or arXiv:2305.15852v3 [cs.CL] for this version)

Submission history

From: Niels Mündler
[v1] Thu, 25 May 2023 08:43:46 GMT (122kb,D)
[v2] Sun, 1 Oct 2023 07:22:39 GMT (97kb,D)
[v3] Fri, 15 Mar 2024 21:04:34 GMT (993kb,D)
