Skip to main content

PPMark: An Architecture to Generate Privacy Labels Using TF-IDF Techniques and the Rabin Karp Algorithm

  • Conference paper
Information Technology: New Generations

Abstract

Lay and expert users alike often have difficulty understanding privacy policy texts. The time needed to read and comprehend a policy is a burden for users, who rarely pay attention to what they are agreeing to. To address this, this paper aims to simplify the presentation of privacy policy terms concerning data collection and sharing by introducing a new format called the Privacy Label. Using natural language processing techniques, we built a model that automatically extracts information about data collection from privacy policies and presents it to the user in an easy-to-understand way. To validate the model, we used a precision assessment that measured how much of the extracted information was correct. Our model achieved a precision of 0.685 (about 69%) when retrieving information on data handling, which lets end users see which data is being collected without reading the whole policy. The PPMark architecture can support the notice-and-choice approach by presenting privacy policy information to online users in an alternative format.
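The title names two techniques, TF-IDF term weighting and the Rabin-Karp string-matching algorithm, and the abstract reports precision as the evaluation metric. The following is a minimal sketch of how these pieces could fit together. It is not the authors' PPMark implementation. The hash base and modulus, the whitespace tokenization, the toy policy sentences, and the function names `rabin_karp`, `tf_idf` and `precision` are all hypothetical choices made for this illustration.

```python
import math
from collections import Counter

# Illustrative Rabin-Karp parameters (assumed, not taken from the paper).
BASE = 256
MOD = 1_000_000_007


def rabin_karp(text: str, pattern: str) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(BASE, m - 1, MOD)  # weight of the leftmost character in a window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * BASE + ord(pattern[i])) % MOD
        t_hash = (t_hash * BASE + ord(text[i])) % MOD
    matches = []
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the hash: remove text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return matches


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term t in document d by tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def precision(extracted: set[str], relevant: set[str]) -> float:
    """Fraction of extracted items that are correct (the metric reported above)."""
    return len(extracted & relevant) / len(extracted) if extracted else 0.0


# Hypothetical policy snippets used only for this demonstration.
POLICIES = [
    "we collect your email address and location to improve our services",
    "we share your email address with partners",
    "cookies help us remember your preferences",
]

if __name__ == "__main__":
    weights = tf_idf([p.split() for p in POLICIES])
    print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
    print(rabin_karp(POLICIES[1], "email address"))
    print(precision({"email", "location", "cookies"}, {"email", "location"}))
```

In this toy corpus, "location" occurs in only one policy and therefore outweighs "email", which occurs in two, while "your" appears in every policy and receives a TF-IDF weight of zero. The rolling hash lets the matcher scan a policy for a known term in expected linear time, falling back to a direct character comparison only when the hashes agree.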

Author information

Corresponding author

Correspondence to Diego Roberto Gonçalves de Pontes.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gonçalves de Pontes, D.R., Zorzo, S.D. (2016). PPMark: An Architecture to Generate Privacy Labels Using TF-IDF Techniques and the Rabin Karp Algorithm. In: Latifi, S. (ed.) Information Technology: New Generations. Advances in Intelligent Systems and Computing, vol 448. Springer, Cham. https://doi.org/10.1007/978-3-319-32467-8_89

  • DOI: https://doi.org/10.1007/978-3-319-32467-8_89

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32466-1

  • Online ISBN: 978-3-319-32467-8

  • eBook Packages: Engineering, Engineering (R0)