Abstract
Both lay and expert users often have difficulty understanding privacy policy texts. The time needed to read and comprehend a policy is a burden on users, who rarely pay attention to what they are agreeing to. Given this scenario, this paper aims to simplify the presentation of privacy policy terms regarding data collection and sharing by introducing a new format called Privacy Label. Using natural language processing techniques, we built a model that extracts information about data collection from privacy policies and presents it to the user in an automated, easy-to-understand way. To validate the model, we used a precision assessment that measures the accuracy of the extracted information. The model achieved a precision of 0.685 (about 69%) when retrieving information about data handling, making it possible for the end user to understand which data is being collected without reading the whole policy. The PPMark architecture can facilitate notice-and-choice by presenting privacy policy information to online users in an alternative way.
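The abstract does not spell out how the extraction step works, so the following is only an illustrative sketch of two ingredients it and the chapter title name: Rabin-Karp string matching and a precision score. The function names, the hash parameters (`base`, `mod`), the sample policy text, and the term list are all assumptions made for this example and are not taken from the PPMark implementation; the TF-IDF weighting stage is not shown.

```python
def rabin_karp_find(text: str, pattern: str,
                    base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare the actual strings only when the hashes agree,
        # which rules out false positives from hash collisions.
        if p_hash == w_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


def precision(extracted: set[str], relevant: set[str]) -> float:
    """Fraction of extracted items that are actually relevant."""
    if not extracted:
        return 0.0
    return len(extracted & relevant) / len(extracted)


# Hypothetical policy text and candidate data terms, for illustration only.
policy = ("we collect your email address and location. "
          "we share your email address with partners.")
terms = ["email address", "location", "phone number"]

found = {t for t in terms if rabin_karp_find(policy, t)}
score = precision(found, {"email address"})
```

Here `found` holds the terms that occur in the sample text, and `score` is the share of those terms that a hypothetical human annotator also marked as relevant. A precision of 0.685, as reported in the abstract, would mean roughly 69 of every 100 extracted items were correct.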
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Gonçalves de Pontes, D.R., Zorzo, S.D. (2016). PPMark: An Architecture to Generate Privacy Labels Using TF-IDF Techniques and the Rabin Karp Algorithm. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_89
DOI: https://doi.org/10.1007/978-3-319-32467-8_89
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32466-1
Online ISBN: 978-3-319-32467-8
eBook Packages: Engineering, Engineering (R0)