PPMark: An Architecture to Generate Privacy Labels Using TF-IDF Techniques and the Rabin Karp Algorithm

Gonçalves de Pontes, Diego Roberto; Zorzo, Sergio Donizetti

doi:10.1007/978-3-319-32467-8_89

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 448)

Abstract

Both lay and expert users often have difficulty understanding privacy policy texts. The time needed to read and comprehend a policy is a burden on users, who rarely pay attention to what they are agreeing to. Given this scenario, this paper aims to simplify the presentation of privacy policy terms regarding data collection and sharing by introducing a new format called the Privacy Label. Using natural language processing techniques, we built a model that extracts information about data collection from privacy policies and presents it to the user in an automated, easy-to-understand way. To validate the model, we used a precision assessment method that measures the accuracy of the extracted information. The model achieved a precision of 0.685 (69%) when retrieving information about data handling, making it possible for the end user to understand which data is being collected without reading the whole policy. The PPMark architecture can facilitate notice-and-choice by presenting privacy policy information to online users in an alternative way.
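The abstract does not describe the pipeline in detail, so the following is only a minimal Python sketch of the three techniques it names: TF-IDF term weighting, Rabin-Karp string matching, and a precision measure. The function names (`tf_idf`, `rabin_karp`, `precision`), the hash parameters, and the sample policy text are assumptions made for illustration and are not taken from the PPMark implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Uses relative term frequency and idf = log(N / df); this is one
    common TF-IDF variant, assumed here for illustration.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Compare the strings on a hash match to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


def precision(retrieved, relevant):
    """Fraction of retrieved items that are actually relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical policy sentence and terms, for illustration only.
policy = "we collect your email and share your email address"
positions = rabin_karp(policy, "email")
weights = tf_idf([policy.split(), "we use cookies".split()])
score = precision({"email", "name", "cookies"}, {"email", "cookies"})
```

In this sketch, TF-IDF would rank candidate terms within a policy, Rabin-Karp would locate known data-collection terms in the text, and `precision` would compare the extracted terms against a manually labeled set. How PPMark actually combines these steps is not stated in the abstract.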

Author information

Authors and Affiliations

Department of Computer Science, Federal University of São Carlos (UFSCar), São Carlos, SP, Brazil
Diego Roberto Gonçalves de Pontes & Sergio Donizetti Zorzo

Corresponding author

Correspondence to Diego Roberto Gonçalves de Pontes.

Editor information

Editors and Affiliations

PHASE, Las Vegas, Nevada, USA
Shahram Latifi

About this paper

Cite this paper

Gonçalves de Pontes, D.R., Zorzo, S.D. (2016). PPMark: An Architecture to Generate Privacy Labels Using TF-IDF Techniques and the Rabin Karp Algorithm. In: Latifi, S. (eds) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_89

DOI: https://doi.org/10.1007/978-3-319-32467-8_89
Published: 29 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32466-1
Online ISBN: 978-3-319-32467-8
eBook Packages: Engineering, Engineering (R0)
