AI-enabled Automation for Completeness Checking of Privacy Policies

Amaral, Orlando; Abualhaija, Sallam; Torre, Damiano; Sabetzadeh, Mehrdad; Briand, Lionel C.

Computer Science > Cryptography and Security

(Submitted on 10 Jun 2021 (v1), last revised 5 Oct 2021 (this version, v2))

Abstract: Technological advances in information sharing have raised concerns about data protection. Privacy policies contain privacy-related requirements about how the personal data of individuals will be handled by an organization or a software system (e.g., a web service or an app). In Europe, privacy policies are subject to compliance with the General Data Protection Regulation (GDPR). A prerequisite for GDPR compliance checking is to verify whether the content of a privacy policy is complete according to the provisions of GDPR. Incomplete privacy policies might result in large fines for the violating organization as well as in incomplete privacy-related software specifications. Manual completeness checking is both time-consuming and error-prone. In this paper, we propose AI-based automation for the completeness checking of privacy policies. Through systematic qualitative methods, we first build two artifacts to characterize the privacy-related provisions of GDPR, namely a conceptual model and a set of completeness criteria. Then, we develop an automated solution on top of these artifacts by leveraging a combination of natural language processing and supervised machine learning. Specifically, we identify the GDPR-relevant information content in privacy policies and subsequently check it against the completeness criteria. To evaluate our approach, we collected 234 real privacy policies from the fund industry. Over a set of 48 unseen privacy policies, our approach correctly detected 300 of the 334 violations of completeness criteria, while producing 23 false positives. The approach thus has a precision of 92.9% and a recall of 89.8%. Compared to a baseline that applies keyword search only, our approach results in an improvement of 24.5% in precision and 38% in recall.
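The reported precision and recall follow directly from the counts given in the abstract. The short sketch below reproduces the arithmetic; the function and variable names are illustrative and are not taken from the paper.

```python
def precision_recall(true_positives, false_positives, total_actual):
    """Return (precision, recall) from raw detection counts."""
    # Precision: share of reported violations that are real violations.
    precision = true_positives / (true_positives + false_positives)
    # Recall: share of real violations that were reported.
    recall = true_positives / total_actual
    return precision, recall


# Counts from the abstract: 300 of 334 violations detected, 23 false positives.
precision, recall = precision_recall(300, 23, 334)
print(f"precision = {precision:.1%}, recall = {recall:.1%}")
# prints: precision = 92.9%, recall = 89.8%
```

The 34 violations that were missed (334 - 300) are the false negatives that lower the recall, while the 23 false positives lower the precision.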

Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Cite as:	arXiv:2106.05688 [cs.CR]
	(or arXiv:2106.05688v2 [cs.CR] for this version)

Submission history

From: Sallam Abualhaija [view email]
[v1] Thu, 10 Jun 2021 12:10:51 GMT (9830kb,D)
[v2] Tue, 5 Oct 2021 16:33:20 GMT (11430kb,D)
