Abstract
The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply combining the databases. This paper discusses techniques for performing statistical analysis of categorical data, especially log-linear analysis and logistic regression, over partitioned databases while limiting confidentiality concerns. We show how ideas from the current literature that focus on “secure” summation and secure regression analysis can be adapted or generalized to the categorical data setting.
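As a rough illustration of the “secure” summation idea mentioned above, the following sketch simulates the standard ring-based protocol on cell counts of a horizontally partitioned contingency table. It is a minimal example under stated assumptions, not the chapter's own implementation: the function names `secure_sum` and `secure_table_sum`, the modulus, and the three agency tables are all hypothetical, and the parties are simulated inside a single process.

```python
import secrets

# Assumed public modulus; it must exceed any possible pooled cell count.
MODULUS = 2**61 - 1


def secure_sum(local_values, modulus=MODULUS):
    """Simulate ring-based secure summation (illustrative only).

    Party 0 adds a secret random mask to its value and passes the running
    total on; each later party adds its own value modulo `modulus`; party 0
    finally removes the mask. In a real deployment each step would run at a
    different site, so each party would only see a masked running total.
    """
    mask = secrets.randbelow(modulus)
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus
    return (running - mask) % modulus


def secure_table_sum(local_tables):
    """Apply secure_sum cell by cell to flattened contingency tables."""
    n_cells = len(local_tables[0])
    if any(len(table) != n_cells for table in local_tables):
        raise ValueError("all parties must use the same cell layout")
    return [secure_sum([table[i] for table in local_tables])
            for i in range(n_cells)]


# Hypothetical 2x2 tables (flattened row by row) held by three agencies.
agency_tables = [
    [12, 5, 7, 20],
    [3, 9, 14, 6],
    [8, 2, 1, 11],
]
pooled = secure_table_sum(agency_tables)
print(pooled)  # [23, 16, 22, 37]
```

For horizontally partitioned data, the pooled cell counts (or the margins derived from them) are what a log-linear model fit needs, so no agency has to reveal its individual table. This basic protocol assumes semi-honest parties and offers no protection if two neighbours in the ring collude.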
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fienberg, S.E., Fulp, W.J., Slavkovic, A.B., Wrobel, T.A. (2006). “Secure” Log-Linear and Logistic Regression Analysis of Distributed Databases. In: Domingo-Ferrer, J., Franconi, L. (eds) Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol 4302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11930242_24
DOI: https://doi.org/10.1007/11930242_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49330-3
Online ISBN: 978-3-540-49332-7
eBook Packages: Computer Science, Computer Science (R0)