Interval Approach to Preserving Privacy in Statistical Databases: Related Challenges and Algorithms of Computational Statistics

  • Conference paper
Computer Network Security (MMM-ACNS 2007)

Abstract

In many practical situations, it is important to store large amounts of data and to be able to process the data statistically. Much of this data is confidential, so while we welcome statistical data processing, we do not want to reveal sensitive individual data. If we allow researchers to ask all kinds of statistical queries, this can lead to violations of people’s privacy. A foolproof way to avoid these privacy violations is to store ranges of values (e.g., between 40 and 50 for age) instead of the actual values. This idea solves the privacy problem, but it leads to a computational challenge: traditional statistical algorithms need exact data, but now we only know the data with interval uncertainty. In this paper, we describe new algorithms designed for processing such interval data.
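
To make the idea of interval-valued records concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not one of the paper's algorithms: the fixed bin width of 10, the sample ages, and the helper names `to_interval` and `mean_bounds` are all hypothetical. It covers only the simplest statistic, the mean, whose exact range follows from the fact that the mean increases with each individual value.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def to_interval(value: float, width: float = 10.0) -> Interval:
    """Replace an exact value with the fixed-width range that contains it.

    Assumption for illustration: bins are [k*width, (k+1)*width).
    """
    lower = (value // width) * width
    return (lower, lower + width)


def mean_bounds(intervals: List[Interval]) -> Interval:
    """Exact range of the sample mean when each value lies in its interval.

    The mean is increasing in every value, so the smallest mean uses all
    lower endpoints and the largest mean uses all upper endpoints.
    """
    n = len(intervals)
    lowest = sum(lo for lo, _ in intervals) / n
    highest = sum(hi for _, hi in intervals) / n
    return (lowest, highest)


# Hypothetical exact ages; only the intervals would be stored.
ages = [43, 47, 58, 61]
stored = [to_interval(a) for a in ages]

print(stored)               # ranges kept in the database
print(mean_bounds(stored))  # guaranteed enclosure of the true mean
```

With these sample values the stored ranges are 40 to 50, 40 to 50, 50 to 60, and 60 to 70, and the mean is only known to lie between 47.5 and 57.5 (the true mean, 52.25, falls inside). Statistics that are not monotonic in each value, such as the variance, do not reduce to endpoint substitution in this way, which is where the computational challenge described in the abstract arises.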




Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Longpré, L., Xiang, G., Kreinovich, V., Freudenthal, E. (2007). Interval Approach to Preserving Privacy in Statistical Databases: Related Challenges and Algorithms of Computational Statistics. In: Gorodetsky, V., Kotenko, I., Skormin, V.A. (eds) Computer Network Security. MMM-ACNS 2007. Communications in Computer and Information Science, vol 1. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73986-9_30

  • DOI: https://doi.org/10.1007/978-3-540-73986-9_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73985-2

  • Online ISBN: 978-3-540-73986-9

  • eBook Packages: Computer Science (R0)
