Abstract
In many practical situations, it is important to store large amounts of data and to be able to process that data statistically. Much of this data is confidential, so while we welcome statistical processing, we do not want to reveal sensitive individual values. If we allow researchers to ask all kinds of statistical queries, this can lead to violations of people's privacy. A reliable way to avoid these violations is to store ranges of values (e.g., between 40 and 50 for age) instead of the actual values. This solves the privacy problem but creates a computational challenge: traditional statistical algorithms need exact data, whereas now the data is known only with interval uncertainty. In this paper, we describe new algorithms designed for processing such interval data.
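To make the idea concrete, here is a minimal Python sketch, not the algorithms of the paper. It assumes fixed-width bins (the width of 10 matches the age example above) and uses hypothetical helper names `to_interval`, `mean_bounds`, and `variance_upper_bound`. The mean is monotone in each value, so its range follows directly from the endpoints. For the population variance, the sketch relies on the fact that variance is a convex function of the data, so its maximum over a box of intervals is reached at a vertex; the brute-force search over all vertices is exponential and only meant for very small inputs.

```python
from itertools import product
from statistics import pvariance


def to_interval(value, width=10):
    """Replace an exact value by the [lo, hi] bin of the given width that contains it.

    The fixed-width binning is an assumption made for illustration.
    """
    lo = (value // width) * width
    return (lo, lo + width)


def mean_bounds(intervals):
    """Exact range of the sample mean when each value lies in its interval.

    The mean increases with every value, so the smallest mean uses all
    lower endpoints and the largest mean uses all upper endpoints.
    """
    n = len(intervals)
    lowest = sum(lo for lo, _ in intervals) / n
    highest = sum(hi for _, hi in intervals) / n
    return (lowest, highest)


def variance_upper_bound(intervals):
    """Largest possible population variance, found by brute force.

    Variance is convex in the data, so its maximum over the box of
    intervals is attained at a vertex (every value at an endpoint).
    This tries all 2**n vertices, so it is usable only for small n.
    """
    return max(pvariance(vertex) for vertex in product(*intervals))


if __name__ == "__main__":
    ages = [42, 47, 51, 38]                    # exact values, never stored
    stored = [to_interval(a) for a in ages]    # what the database keeps
    print(stored)
    print(mean_bounds(stored))                 # (40.0, 50.0)
    print(variance_upper_bound(stored))
```

The database would keep only `stored`; any statistic reported to a researcher is then a range rather than a single number.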
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Longpré, L., Xiang, G., Kreinovich, V., Freudenthal, E. (2007). Interval Approach to Preserving Privacy in Statistical Databases: Related Challenges and Algorithms of Computational Statistics. In: Gorodetsky, V., Kotenko, I., Skormin, V.A. (eds) Computer Network Security. MMM-ACNS 2007. Communications in Computer and Information Science, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73986-9_30
DOI: https://doi.org/10.1007/978-3-540-73986-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73985-2
Online ISBN: 978-3-540-73986-9
eBook Packages: Computer Science, Computer Science (R0)