Differentially Private Projected Histograms: Construction and Use for Prediction

Vinterbo, Staal A.

doi:10.1007/978-3-642-33486-3_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7524)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Privacy concerns are among the major barriers to efficient secondary use of information and data on humans. Differential privacy is a relatively recent measure that has received much attention in machine learning as it quantifies individual risk using a strong cryptographically motivated notion of privacy. At the core of differential privacy lies the concept of information dissemination through a randomized process. One way of adding the needed randomness to any process is to pre-randomize the input. This can yield lower quality results than other more specialized approaches, but can be an attractive alternative when (i) no specialized differentially private alternative exists, or (ii) multiple processes applied in parallel can use the same pre-randomized input.

A simple way to do input randomization is to compute perturbed histograms, which essentially are noisy multiset membership functions. Unfortunately, computation of perturbed histograms is only efficient when the data stems from a low-dimensional discrete space. The restriction to discrete spaces can be mitigated by discretization; in 2011, Lei presented an analysis of discretization in the context of M-estimators. Here we address the restriction regarding the dimensionality of the data. In particular, we present a differentially private approximation algorithm for selecting features that preserve conditional frequency densities, and use this to project data prior to computing differentially private histograms. The resulting projected histograms can be used as machine learning input and include the necessary randomness for differential privacy. We empirically validate the use of differentially private projected histograms for learning binary and multinomial logistic regression models using four real-world data sets.
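As a rough illustration of the perturbed histogram idea, the sketch below projects records onto a given subset of features and adds Laplace noise to the count of every cell in the projected domain. It is not the chapter's algorithm: the feature subset `keep` is simply assumed to be given (the differentially private feature selection step is not shown), the names `projected_histogram`, `laplace_noise`, `keep` and `domains` are hypothetical, and the noise scale 1/epsilon assumes that neighboring data sets differ by adding or removing one record.

```python
import itertools
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def projected_histogram(records, keep, domains, epsilon, rng):
    """Return noisy counts for every cell spanned by the kept features.

    records: sequence of tuples over discrete features
    keep:    indices of the features to project onto (assumed given)
    domains: domains[i] lists the possible values of feature i
    epsilon: privacy parameter; noise scale is 1/epsilon (assumption:
             neighbors differ by adding or removing one record)
    """
    counts = Counter(tuple(r[i] for i in keep) for r in records)
    cells = itertools.product(*(domains[i] for i in keep))
    scale = 1.0 / epsilon
    # Empty cells are perturbed too; skipping them would reveal
    # which cells are unoccupied in the original data.
    return {cell: counts[cell] + laplace_noise(scale, rng) for cell in cells}


records = [(0, 1, "a"), (1, 1, "b"), (0, 1, "a"), (1, 0, "a")]
domains = [(0, 1), (0, 1), ("a", "b")]
hist = projected_histogram(records, keep=(0, 2), domains=domains,
                           epsilon=1.0, rng=random.Random(0))
```

The number of cells grows as the product of the kept domain sizes, which is why projecting onto fewer features before perturbing keeps the histogram tractable.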

References

Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, accuracy, and consistency too: a holistic solution to contingency table release. In: PODS, pp. 273–282 (2007)
Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, Displace, and Compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009)
Chaudhuri, K., Monteleoni, C., Sarwate, A.: Differentially private empirical risk minimization. JMLR 12, 1069–1109 (2011)
Dwork, C.: Differential privacy: A survey of results. Theory and Applications of Models of Computation, pp. 1–19 (2008)
Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006)
Dwork, C., Smith, A.: Differential privacy for statistics: What we know and what we want to learn. J. Privacy and Confidentiality 1(2), 135–154 (2008)
Dwork, C.: Differential Privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006)
Fisher, M., Nemhauser, G., Wolsey, L.: An analysis of approximations for maximizing submodular set functions II. Polyhedral Combinatorics, 73–87 (1978)
Frank, A., Asuncion, A.: UCI machine learning repository (2010)
Gupta, A., Ligett, K., McSherry, F., Roth, A., Talwar, K.: Differentially private combinatorial optimization. In: Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1106–1125. Society for Industrial and Applied Mathematics (2010)
Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186 (2001), doi:10.1023/A:1010920819831
Hay, M., Rastogi, V., Miklau, G., Suciu, D.: Boosting the accuracy of differentially private histograms through consistency. Proceedings of the VLDB Endowment 3(1-2), 1021–1032 (2010)
Jagadish, H., Koudas, N., Muthukrishnan, S., Poosala, V., Sevcik, K., Suel, T.: Optimal histograms with quality guarantees. In: Proceedings of the International Conference on Very Large Data Bases, pp. 275–286. Institute of Electrical & Electronics Engineers (1998)
Kennedy, R.L., Burton, A.M., Fraser, H.S., McStay, L.N., Harrison, R.F.: Early diagnosis of acute myocardial infarction using clinical and electrocardiographic data at presentation: Derivation and evaluation of logistic regression models. European Heart Journal 17, 1181–1191 (1996)
Lei, J.: Differentially private M-estimators. In: NIPS, pp. 361–369 (2011)
McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: FOCS, pp. 94–103 (2007)
Mohammed, N., Chen, R., Fung, B., Yu, P.: Differentially private data release for data mining. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–501. ACM (2011)
Pawlak, Z.: Rough Sets, Theoretical Aspects of Reasoning about Data, Series D: System Theory, Knowledge Engineering and Problem Solving, vol. 9. Kluwer Academic Publishers (1991)
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2011) ISBN 3-900051-07-0
Ullman, J., Vadhan, S.: PCPs and the hardness of generating synthetic data. In: ECCC, vol. 17, p. 17 (2010)
Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S, 4th edn. Springer, New York (2002) ISBN 0-387-95457-0
Vinterbo, S., Øhrn, A.: Minimal approximate hitting sets and rule templates. International Journal of Approximate Reasoning 25(2), 123–143 (2000)
Vinterbo, S.A., Kim, E.Y., Ohno-Machado, L.: Small, fuzzy and interpretable gene expression based classifiers. Bioinformatics 21(9), 1964–1970 (2005)
Vitter, J.S.: An efficient algorithm for sequential random sampling. ACM Trans. Math. Softw. 13(1), 58–67 (1987)
Xiao, Y., Xiong, L., Yuan, C.: Differentially Private Data Release through Multidimensional Partitioning. In: Jonker, W., Petković, M. (eds.) SDM 2010. LNCS, vol. 6358, pp. 150–168. Springer, Heidelberg (2010)
Xu, J., Zhang, Z., Xiao, X., Yang, Y., Yu, G.: Differentially private histogram publication. In: Proceedings of the IEEE International Conference on Data Engineering (2012)

Author information

Authors and Affiliations

Division of Biomedical Informatics, University of California San Diego, San Diego, CA, USA
Staal A. Vinterbo

Editor information

Editors and Affiliations

Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK
Peter A. Flach, Tijl De Bie & Nello Cristianini

About this paper

Cite this paper

Vinterbo, S.A. (2012). Differentially Private Projected Histograms: Construction and Use for Prediction. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science, vol 7524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33486-3_2

DOI: https://doi.org/10.1007/978-3-642-33486-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33485-6
Online ISBN: 978-3-642-33486-3
eBook Packages: Computer Science, Computer Science (R0)
