research-article

On differentially private frequent itemset mining

Authors:
Chen Zeng, University of Wisconsin-Madison, Madison, WI
Jeffrey F. Naughton, University of Wisconsin-Madison, Madison, WI
Jin-Yi Cai, University of Wisconsin-Madison, Madison, WI

Proceedings of the VLDB Endowment, Volume 6, Issue 1, pp 25–36, https://doi.org/10.14778/2428536.2428539

Published: 01 November 2012

Abstract

We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items). Accordingly, we investigate an approach that begins by truncating long transactions, trading off errors introduced by the truncation against those introduced by the noise added to guarantee privacy. Experimental results over standard benchmark databases show that truncating is indeed effective. Our algorithm solves the "classical" frequent itemset mining problem, in which the goal is to find all itemsets whose support exceeds a threshold. Related work has proposed differentially private algorithms for the top-k itemset mining problem ("find the k most frequent itemsets"). An experimental comparison with those algorithms shows that our algorithm achieves a better F-score unless k is small.
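
The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the truncation rule (keep the smallest items in sorted order), the function names, and the use of a single privacy budget `epsilon` for one itemset size `k` are all assumptions made for illustration. The sketch relies on one standard fact: if every transaction holds at most `max_len` items, adding or removing a transaction changes the supports of at most C(`max_len`, `k`) itemsets of size `k`, each by 1, so Laplace noise can be scaled to that bound. It also includes the usual F-score (harmonic mean of precision and recall) used to compare a mined result against the true frequent itemsets.

```python
import random
from itertools import combinations
from math import comb


def truncate(transactions, max_len):
    """Keep at most max_len items per transaction.

    Hypothetical rule: keep the smallest items in sorted order.
    """
    return [frozenset(sorted(t)[:max_len]) for t in transactions]


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_supports(transactions, items, k, max_len, epsilon, rng):
    """Noisy support counts for every k-itemset over a public item universe.

    Assumes the transactions were already truncated to max_len items.
    """
    if not 1 <= k <= max_len:
        raise ValueError("k must satisfy 1 <= k <= max_len")
    counts = {frozenset(c): 0 for c in combinations(sorted(items), k)}
    for t in transactions:
        for c in combinations(sorted(t), k):
            key = frozenset(c)
            if key in counts:
                counts[key] += 1
    # One truncated transaction touches at most C(max_len, k) counts.
    scale = comb(max_len, k) / epsilon
    return {s: n + laplace(scale, rng) for s, n in counts.items()}


def f_score(found, truth):
    """Harmonic mean of precision and recall of the reported itemsets."""
    true_pos = len(found & truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(found)
    recall = true_pos / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    rng = random.Random(0)
    db = truncate([{1, 2, 3, 4}, {1, 2}, {2, 3}, {1, 2, 3}], max_len=3)
    noisy = noisy_supports(db, items={1, 2, 3, 4}, k=2, max_len=3,
                           epsilon=1.0, rng=rng)
    threshold = 2  # support threshold, chosen arbitrarily for the demo
    reported = {s for s, n in noisy.items() if n >= threshold}
    print(sorted(sorted(s) for s in reported))
```

A smaller `max_len` lowers the noise scale but discards more items from long transactions, while a larger `max_len` preserves the data and inflates the noise; that is the balance between truncation error and privacy noise that the abstract refers to.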

Index Terms

On differentially private frequent itemset mining
1. Information systems
  1. Data management systems
    1. Database management system engines
  2. Information systems applications
    1. Data mining

Published in

Proceedings of the VLDB Endowment Volume 6, Issue 1
November 2012
36 pages
ISSN:2150-8097
Publisher
VLDB Endowment