Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning

Wang, Xiao; Chen, Tianze; Yang, Xianjun; Zhang, Qi; Zhao, Xun; Lin, Dahua

Computer Science > Computation and Language

(Submitted on 16 Apr 2024)

Abstract: The open-sourcing of large language models (LLMs) accelerates application development, innovation, and scientific progress. This includes both base models, which are pre-trained on extensive datasets without alignment, and aligned models, deliberately designed to align with ethical standards and human values. Contrary to the prevalent assumption that the inherent instruction-following limitations of base LLMs serve as a safeguard against misuse, our investigation exposes a critical oversight in this belief. By deploying carefully designed demonstrations, our research demonstrates that base LLMs could effectively interpret and execute malicious instructions. To systematically assess these risks, we introduce a novel set of risk evaluation metrics. Empirical results reveal that the outputs from base LLMs can exhibit risk levels on par with those of models fine-tuned for malicious purposes. This vulnerability, requiring neither specialized knowledge nor training, can be manipulated by almost anyone, highlighting the substantial risk and the critical need for immediate attention to the base LLMs' security protocols.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.10552 [cs.CL]
	(or arXiv:2404.10552v1 [cs.CL] for this version)

Submission history

From: Xiao Wang
[v1] Tue, 16 Apr 2024 13:22:54 GMT (8306kb,D)
