DOI: 10.1145/1559845.1559863
research-article

Privacy preservation of aggregates in hidden databases: why and how?

Published: 29 June 2009

ABSTRACT

Many websites provide form-like interfaces that allow users to execute search queries on the underlying hidden databases. In this paper, we explain the importance of protecting sensitive aggregate information of hidden databases from being disclosed through individual tuples returned by the search queries. This stands in contrast to the traditional privacy problem, where individual tuples must be protected while ensuring access to aggregate information. We propose techniques to thwart bots from sampling the hidden database to infer aggregate information. We present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.

    • Published in

      SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
      June 2009
      1168 pages
      ISBN: 9781605585512
      DOI: 10.1145/1559845

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
