research-article

Privacy preservation of aggregates in hidden databases: why and how?

Authors:

Arjun Dasgupta, University of Texas at Arlington, Arlington, TX, USA

Nan Zhang, George Washington University, Washington D.C., DC, USA

Gautam Das, University of Texas at Arlington, Arlington, TX, USA

Surajit Chaudhuri, Microsoft Research, Redmond, WA, USA

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, June 2009, Pages 153–164, https://doi.org/10.1145/1559845.1559863

Published: 29 June 2009

ABSTRACT

Many websites provide form-like interfaces that let users run search queries against an underlying hidden database. In this paper, we explain why it is important to protect sensitive aggregate information about a hidden database from being disclosed through the individual tuples that search queries return. This stands in contrast to the traditional privacy problem, where individual tuples must be protected while access to aggregate information is preserved. We propose techniques that thwart bots from sampling the hidden database to infer aggregate information, and we present theoretical analysis and extensive experiments to illustrate the effectiveness of our approach.
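
To make the threat concrete, the following is a minimal Python sketch of how a bot might estimate an aggregate from individual tuples returned by a top-k search form. It is an illustration only and not the technique proposed in the paper: the simulated database, the function names (`top_k_search`, `sample_one`, `estimate_fraction`), the Boolean attributes, and the random drill-down strategy are all assumptions made for this example. The resulting estimate is generally skewed, because tuples reached by short drill-downs are not sampled uniformly.

```python
import random


def top_k_search(db, query, k):
    """Hypothetical form interface: return at most k matching tuples
    and a flag telling whether more than k tuples matched (overflow)."""
    matches = [row for row in db if all(row[a] == v for a, v in query.items())]
    return matches[:k], len(matches) > k


def sample_one(db, attrs, k, rng):
    """Random drill-down: add random predicates until the query stops
    overflowing, then pick one of the returned tuples at random."""
    query = {}
    order = list(attrs)
    rng.shuffle(order)
    for attr in order:
        results, overflow = top_k_search(db, query, k)
        if not results:
            return None  # dead end; a real bot would simply restart
        if not overflow:
            return rng.choice(results)
        query[attr] = rng.choice([0, 1])
    # Every attribute is fixed; take whatever the interface still returns.
    results, _ = top_k_search(db, query, k)
    return rng.choice(results) if results else None


def estimate_fraction(db, attrs, target, k, n_walks, rng):
    """Estimate the fraction of tuples with target == 1 from sampled tuples.
    Returns None if every drill-down hit a dead end."""
    walks = (sample_one(db, attrs, k, rng) for _ in range(n_walks))
    samples = [s for s in walks if s is not None]
    if not samples:
        return None
    return sum(s[target] for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    attrs = [f"a{i}" for i in range(8)]
    # Simulated hidden database; the bot only ever calls top_k_search on it.
    db = [{a: rng.randint(0, 1) for a in attrs} for _ in range(200)]
    estimate = estimate_fraction(db, attrs, "a0", k=10, n_walks=300, rng=rng)
    actual = sum(row["a0"] for row in db) / len(db)
    print("estimated fraction:", estimate, "actual fraction:", actual)
```

In this sketch the bot never sees the full table, only the tuples returned by each query, yet it can still form an estimate of a database-wide proportion. That is the kind of aggregate disclosure the abstract sets out to prevent.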

Index Terms

1. Information systems
  1. Data management systems
    1. Database administration

Published in
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN: 9781605585512
DOI: 10.1145/1559845
Editors: Carsten Binnig, Benoit Dageville
General Chairs: Uğur Çetintemel (Brown University, USA), Stan Zdonik (Brown University, USA)
Program Chair: Donald Kossmann (ETH Zurich, Switzerland)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
hidden databases
privacy preservation