“Secure” Log-Linear and Logistic Regression Analysis of Distributed Databases

  • Conference paper
Privacy in Statistical Databases (PSD 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4302)

Abstract

The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases when there are barriers to simply combining the databases. This paper discusses techniques that can be used to perform statistical analyses of categorical data, especially log-linear analysis and logistic regression, over partitioned databases while limiting confidentiality concerns. We show how ideas from the current literature on “secure” summation and secure regression analysis can be adapted or generalized to the categorical data setting.
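The abstract names two ingredients, secure summation and regression over partitioned databases, without showing how they fit together. The following is a minimal Python sketch, not the authors' implementation, assuming a horizontally partitioned database (each site holds different records on the same variables) and semi-honest sites that do not collude. A cell-by-cell secure sum of the sites' local contingency tables yields the pooled table from which log-linear models can be fitted, and because the Newton-Raphson score X'(y - p) and information X'WX for logistic regression are sums over records, each iteration needs only securely summed site-level quantities. All names (`MODULUS`, `SCALE`, `secure_sum`, `secure_logistic_regression`, and so on), the fixed-point encoding, and the toy data are hypothetical, and the protocol is simulated in a single process.

```python
import math
import random

# Hypothetical parameters for this sketch (not taken from the paper).
MODULUS = 2 ** 61 - 1  # every true total must lie strictly inside (-MODULUS/2, MODULUS/2)
SCALE = 10 ** 6        # fixed-point scale used to encode real-valued statistics as integers


def secure_sum(local_values, rng=None):
    """Simulate ring-based secure summation of one integer held by each site.

    Site 1 draws a random mask R, adds its value, and passes the total (mod
    MODULUS) to site 2; each later site adds its value and passes it on. Every
    intermediate total is uniformly distributed, so, assuming no collusion, no
    site learns another's value. Site 1 subtracts R from the final total.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(MODULUS)
    running = mask
    for value in local_values:  # one hop around the ring per site
        running = (running + value) % MODULUS
    total = (running - mask) % MODULUS
    # Decode the upper half of the ring as negative numbers.
    return total - MODULUS if total > MODULUS // 2 else total


def secure_sum_vector(local_vectors, scale=1):
    """Elementwise secure sum of equal-length vectors; scale > 1 encodes reals as fixed point."""
    totals = []
    for j in range(len(local_vectors[0])):
        encoded = [round(vec[j] * scale) for vec in local_vectors]
        total = secure_sum(encoded)
        totals.append(total if scale == 1 else total / scale)
    return totals


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def local_stats(rows, beta):
    """One site's score X'(y - p) followed by its information X'WX, flattened row-major."""
    d = len(beta)
    score = [0.0] * d
    info = [0.0] * (d * d)
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        for j in range(d):
            score[j] += x[j] * (y - p)
            for k in range(d):
                info[j * d + k] += p * (1.0 - p) * x[j] * x[k]
    return score + info


def secure_logistic_regression(sites, d, iters=25, tol=1e-6):
    """Newton-Raphson in which only securely summed site statistics are shared."""
    beta = [0.0] * d
    for _ in range(iters):
        pooled = secure_sum_vector([local_stats(rows, beta) for rows in sites], scale=SCALE)
        score, flat = pooled[:d], pooled[d:]
        step = solve([flat[j * d:(j + 1) * d] for j in range(d)], score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Log-linear analysis: each site's 2x2 table (flattened); the pooled table is their cellwise sum.
    table = secure_sum_vector([[12, 5, 7, 9], [20, 4, 6, 15]])
    print("pooled table:", table)
    # Logistic regression: rows are ((intercept, t), y); the data are made up for illustration.
    site_a = [((1, 0), 0), ((1, 1), 0), ((1, 2), 1), ((1, 3), 0)]
    site_b = [((1, 1), 1), ((1, 2), 0), ((1, 3), 1), ((1, 4), 1)]
    print("beta:", secure_logistic_regression([site_a, site_b], d=2))
```

Running it prints `pooled table: [32, 9, 13, 24]` followed by the fitted coefficients. Hierarchical log-linear models depend on the data only through marginal totals of the table, so sites could securely sum just the margins a given model needs rather than every cell. The pooled totals revealed at each Newton iteration are themselves a release of information, which is exactly the kind of disclosure the paper is concerned with limiting.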

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fienberg, S.E., Fulp, W.J., Slavkovic, A.B., Wrobel, T.A. (2006). “Secure” Log-Linear and Logistic Regression Analysis of Distributed Databases. In: Domingo-Ferrer, J., Franconi, L. (eds) Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol 4302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930242_24

  • DOI: https://doi.org/10.1007/11930242_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49330-3

  • Online ISBN: 978-3-540-49332-7

  • eBook Packages: Computer Science, Computer Science (R0)
