Analysis and Text Classification of Privacy Policies From Rogue and Top-100 Fortune Global Companies

Martin Boldt, Kaavya Rekanar
Copyright: © 2019 | Volume: 13 | Issue: 2 | Pages: 20
ISSN: 1930-1650 | EISSN: 1930-1669 | EISBN13: 9781522564614 | DOI: 10.4018/IJISP.2019040104
MLA

Boldt, Martin, and Kaavya Rekanar. "Analysis and Text Classification of Privacy Policies From Rogue and Top-100 Fortune Global Companies." IJISP vol.13, no.2 2019: pp.47-66. http://doi.org/10.4018/IJISP.2019040104

APA

Boldt, M. & Rekanar, K. (2019). Analysis and Text Classification of Privacy Policies From Rogue and Top-100 Fortune Global Companies. International Journal of Information Security and Privacy (IJISP), 13(2), 47-66. http://doi.org/10.4018/IJISP.2019040104

Chicago

Boldt, Martin, and Kaavya Rekanar. "Analysis and Text Classification of Privacy Policies From Rogue and Top-100 Fortune Global Companies," International Journal of Information Security and Privacy (IJISP) 13, no.2: 47-66. http://doi.org/10.4018/IJISP.2019040104

Abstract

In this article, the authors investigate the extent to which supervised binary classification can distinguish between legitimate and rogue privacy policies posted on web pages. Fifteen classification algorithms are evaluated on a data set of 100 privacy policies from legitimate websites (belonging to companies that top the Fortune Global 500 list) and 67 policies from rogue websites. A manual analysis of all policy content was performed, and clear statistical differences were found in both length and adherence to seven general privacy principles. Privacy policies from legitimate companies show 98% adherence to the seven privacy principles, significantly higher than the 45% found for rogue companies. Of the 15 evaluated classification algorithms, Naïve Bayes Multinomial is the most suitable candidate for the problem at hand: its models show the best performance, with an AUC of 0.90 (0.08), outperforming most of the other candidates in the statistical tests used.
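
To make the classification setup concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier scored with AUC, written from scratch in Python. It is not the authors' implementation: the abstract does not say which toolkit, features, or preprocessing were used. The whitespace tokenizer, add-one smoothing, class and function names, and the tiny policy snippets are all assumptions invented for illustration and are not drawn from the study's data.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for real preprocessing.
    return text.lower().split()


class MultinomialNB:
    """Binary multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_like[c] = {
                w: math.log((counts[w] + self.alpha) / total) for w in self.vocab
            }
        return self

    def score(self, doc, positive):
        # Log-odds of the positive class; words unseen in training are ignored.
        def joint(c):
            return self.log_prior[c] + sum(
                self.log_like[c][w] for w in tokenize(doc) if w in self.vocab
            )
        negative = next(c for c in self.classes if c != positive)
        return joint(positive) - joint(negative)


def auc(scores, labels, positive):
    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half a correct pair.
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy snippets, NOT taken from the study's 167 policies.
train_docs = [
    "we collect data with your consent and let you access or delete it",
    "you may opt out and we protect your data with security safeguards",
    "we share your information with any third parties without notice",
    "by using this site you agree we may sell your information",
]
train_labels = ["legitimate", "legitimate", "rogue", "rogue"]
test_docs = [
    "you can access and delete your data",
    "we may sell your information to third parties",
]
test_labels = ["legitimate", "rogue"]

model = MultinomialNB().fit(train_docs, train_labels)
scores = [model.score(d, "rogue") for d in test_docs]
test_auc = auc(scores, test_labels, "rogue")
print(f"AUC on toy hold-out set: {test_auc:.2f}")
```

With only a handful of invented snippets the resulting AUC says nothing about real policies. A faithful replication would need the full data set and a resampling scheme such as cross-validation, so that both a central AUC value and its spread can be reported.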