Generalization-Based k-Anonymization

  • Conference paper
Modeling Decisions for Artificial Intelligence (MDAI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9321)

Abstract

Microaggregation is an anonymization technique that partitions the data into clusters of at least k elements and then replaces each cluster by its prototypical representative. Most microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new generalization-based microaggregation method that produces a k-anonymous masked file for categorical microdata. The goal is to build a generalized description satisfied by at least k domain objects and to replace these domain objects by the description. The generalization is constructed in a way similar to the one used in growing decision trees. Records that cannot be generalized satisfactorily are discarded, so some information is lost. Our experiments show that the new approach gives good results.
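
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names `k_anonymize` and `describe`, the `*` wildcard, and the split heuristic (pick the attribute that keeps the most records in groups of size at least k) are all assumptions made for illustration. Records are split attribute by attribute, as when growing a decision tree. Each group of at least k records is replaced by a shared description, and records that fit no such group are discarded.

```python
from collections import defaultdict

WILDCARD = "*"  # assumed notation for "any value" in a generalized description


def describe(fixed, n_attrs):
    """Build a description tuple: fixed values where known, wildcard elsewhere."""
    return tuple(fixed.get(i, WILDCARD) for i in range(n_attrs))


def k_anonymize(records, k, fixed=None):
    """Return (masked, discarded) for a list of equal-length categorical tuples.

    `fixed` maps attribute index -> value already chosen on the path from the root.
    The order of `masked` does not follow the input order.
    """
    fixed = fixed or {}
    if len(records) < k:
        return [], list(records)  # too few records to share any description
    n_attrs = len(records[0])
    free = [a for a in range(n_attrs) if a not in fixed]

    # Hypothetical heuristic: split on the free attribute that keeps
    # the most records inside groups of size >= k.
    best_attr, best_groups, best_kept = None, None, 0
    for a in free:
        groups = defaultdict(list)
        for r in records:
            groups[r[a]].append(r)
        kept = sum(len(g) for g in groups.values() if len(g) >= k)
        if kept > best_kept:
            best_attr, best_groups, best_kept = a, groups, kept

    if best_attr is None:
        # No useful split: every record here is replaced by the same description.
        return [describe(fixed, n_attrs)] * len(records), []

    masked, leftover = [], []
    for value, group in best_groups.items():
        if len(group) >= k:
            m, d = k_anonymize(group, k, {**fixed, best_attr: value})
            masked += m
            leftover += d
        else:
            leftover += group

    # Leftover records fall back to this node's description if there are enough
    # of them; otherwise they are passed up and are discarded at the root.
    if len(leftover) >= k:
        masked += [describe(fixed, n_attrs)] * len(leftover)
        leftover = []
    return masked, leftover


records = [("a", "x"), ("a", "x"), ("a", "y"), ("a", "y"),
           ("b", "x"), ("b", "z"), ("c", "z")]
masked, discarded = k_anonymize(records, k=2)
```

In this toy run with k = 2, the two records with value `z` share the description `("*", "z")`, and the single record `("b", "x")` is discarded because no group of two or more records accommodates it. Every description in `masked` therefore occurs at least k times.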

Acknowledgments

This research is partially funded by the Spanish MICINN projects COGNITIO (TIN-2012-38450-C03-03), EdeTRI (TIN2012-39348-C02-01) and COPRIVACY (TIN2011-27076-C03-03), the grant 2009-SGR-1434 from the Generalitat de Catalunya, and the European Project DwB (Grant Agreement Number 262608).

Author information

Corresponding author

Correspondence to Vicenç Torra.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Armengol, E., Torra, V. (2015). Generalization-Based k-Anonymization. In: Torra, V., Narukawa, T. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2015. Lecture Notes in Computer Science, vol 9321. Springer, Cham. https://doi.org/10.1007/978-3-319-23240-9_17

  • DOI: https://doi.org/10.1007/978-3-319-23240-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23239-3

  • Online ISBN: 978-3-319-23240-9

  • eBook Packages: Computer Science, Computer Science (R0)
