Generalization-Based k-Anonymization

Armengol, Eva; Torra, Vicenç

doi:10.1007/978-3-319-23240-9_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9321)

Included in the following conference series:

International Conference on Modeling Decisions for Artificial Intelligence

Abstract

Microaggregation is an anonymization technique that partitions the data into clusters of at least k elements and then replaces each cluster with its prototypical representative. Most microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new microaggregation method, based on generalization, for producing a compliant k-anonymous masked file for categorical microdata. The goal is to build a generalized description satisfied by at least k domain objects and to replace these domain objects with the description. The generalization is constructed in a way similar to the one used when growing decision trees. Records that cannot be generalized satisfactorily are discarded, so some information is lost. The experiments we performed show that the new approach gives good results.
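
The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm, only one plausible reading of it under stated assumptions: records are dictionaries of categorical values, the most general description maps every attribute to a wildcard, and a description is specialized one attribute at a time (as when growing a decision tree) by greedily choosing the attribute whose split keeps the most records in blocks of at least k. The function name `k_anonymize`, the `WILDCARD` symbol, the greedy criterion and the handling of small blocks are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

WILDCARD = "*"  # assumed symbol for a fully generalized attribute value


def k_anonymize(records, attributes, k):
    """Return (masked, discarded) for a list of categorical records (dicts)."""
    masked, discarded = [], []

    def grow(group, description, remaining):
        # Greedy choice: the attribute whose split keeps the most records
        # inside blocks of size >= k (an assumption, not the paper's criterion).
        best_attr, best_blocks, best_kept = None, None, 0
        for attr in remaining:
            blocks = defaultdict(list)
            for rec in group:
                blocks[rec[attr]].append(rec)
            kept = sum(len(b) for b in blocks.values() if len(b) >= k)
            if kept > best_kept:
                best_attr, best_blocks, best_kept = attr, blocks, kept
        if best_attr is None:
            # No further specialization is possible: every record in the
            # group is replaced by the shared generalized description.
            masked.extend(dict(description) for _ in group)
            return
        rest = [a for a in remaining if a != best_attr]
        leftover = []
        for value, block in best_blocks.items():
            if len(block) >= k:
                grow(block, {**description, best_attr: value}, rest)
            else:
                leftover.extend(block)
        if len(leftover) >= k:
            # Small blocks together still satisfy k: keep the parent description.
            masked.extend(dict(description) for _ in leftover)
        else:
            discarded.extend(leftover)  # information loss

    if len(records) < k:
        return [], list(records)
    grow(list(records), {a: WILDCARD for a in attributes}, list(attributes))
    return masked, discarded


# Toy data, invented for this example only.
data = [
    {"job": "teacher", "city": "Paris"},
    {"job": "teacher", "city": "Paris"},
    {"job": "teacher", "city": "Lyon"},
    {"job": "teacher", "city": "Lyon"},
    {"job": "nurse", "city": "Paris"},
    {"job": "nurse", "city": "Lyon"},
    {"job": "pilot", "city": "Nice"},
]
masked, discarded = k_anonymize(data, ["job", "city"], k=2)
counts = Counter(tuple(sorted(m.items())) for m in masked)
```

On this toy data with k = 2, the four teachers keep both attribute values, the two nurses share a description in which the city is generalized to the wildcard, and the single pilot record is discarded. Every distinct description in `masked` is shared by at least k records, which is the k-anonymity property the sketch aims for.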

Acknowledgments

This research is partially funded by the Spanish MICINN projects COGNITIO (TIN-2012-38450-C03-03), EdeTRI (TIN2012-39348-C02-01) and COPRIVACY (TIN2011-27076-C03-03), the grant 2009-SGR-1434 from the Generalitat de Catalunya, and the European Project DwB (Grant Agreement Number 262608).

Author information

Authors and Affiliations

CSIC - Spanish Council for Scientific Research, IIIA - Artificial Intelligence Research Institute, Campus UAB, 08193, Bellaterra, Catalonia, Spain
Eva Armengol
University of Skövde, Skövde, Sweden
Vicenç Torra

Corresponding author

Correspondence to Vicenç Torra.

Editor information

Editors and Affiliations

University of Skövde, Skövde, Sweden
Vicenç Torra
Toho Gakuen, Tokyo, Japan
Yasuo Narukawa

About this paper

Cite this paper

Armengol, E., Torra, V. (2015). Generalization-Based k-Anonymization. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2015. Lecture Notes in Computer Science (LNAI), vol 9321. Springer, Cham. https://doi.org/10.1007/978-3-319-23240-9_17

DOI: https://doi.org/10.1007/978-3-319-23240-9_17
Published: 01 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23239-3
Online ISBN: 978-3-319-23240-9
eBook Packages: Computer Science, Computer Science (R0)
