Are Large Pre-Trained Language Models Leaking Your Personal Information?

Huang, Jie; Shao, Hanyin; Chang, Kevin Chen-Chuan

(Submitted on 25 May 2022 (v1), last revised 20 Oct 2022 (this version, v2))

Abstract: Are Large Pre-Trained Language Models Leaking Your Personal Information? In this paper, we analyze whether Pre-Trained Language Models (PLMs) are prone to leaking personal information. Specifically, we query PLMs for email addresses with contexts of the email address or prompts containing the owner's name. We find that PLMs do leak personal information due to memorization. However, since the models are weak at association, the risk of specific personal information being extracted by attackers is low. We hope this work could help the community to better understand the privacy risk of PLMs and bring new insights to make PLMs safe.

Comments:	Accepted to Findings of EMNLP 2022
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2205.12628 [cs.CL]
	(or arXiv:2205.12628v2 [cs.CL] for this version)

Submission history

From: Jie Huang
[v1] Wed, 25 May 2022 10:08:45 GMT (40kb,D)
[v2] Thu, 20 Oct 2022 05:30:43 GMT (414kb,D)
