Abstract
We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items). Accordingly, we investigate an approach that begins by truncating long transactions, trading off errors introduced by the truncation against those introduced by the noise added to guarantee privacy. Experimental results over standard benchmark databases show that truncating is indeed effective. Our algorithm solves the "classical" frequent itemset mining problem, in which the goal is to find all itemsets whose support exceeds a threshold. Related work has proposed differentially private algorithms for the top-k itemset mining problem ("find the k most frequent itemsets"). An experimental comparison with those algorithms shows that our algorithm achieves a better F-score unless k is small.
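The truncate-then-perturb idea can be illustrated with a minimal Python sketch. This is not the algorithm evaluated in the paper; it is a plain Laplace-mechanism baseline written under stated assumptions. All function and parameter names (`truncate`, `sensitivity`, `noisy_frequent_itemsets`, `max_len`, `max_size`, `min_support_count`) are hypothetical. The sketch assumes that the item universe is public, that neighboring databases differ by adding or removing one transaction, and that truncation simply keeps the first `max_len` items in sorted order.

```python
import itertools
import math
import random


def truncate(transaction, max_len):
    """Keep at most max_len items (assumption: the first ones in sorted order)."""
    return frozenset(sorted(transaction)[:max_len])


def sensitivity(max_len, max_size):
    """Upper bound on how many itemsets of size 1..max_size a single
    truncated transaction can contain. Adding or removing one transaction
    changes each of those support counts by 1, so this is the L1 sensitivity."""
    return sum(math.comb(max_len, i) for i in range(1, max_size + 1))


def laplace(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_frequent_itemsets(transactions, universe, min_support_count,
                            epsilon, max_len, max_size, rng=None):
    """Return itemsets whose noisy support is at least min_support_count."""
    rng = rng or random.Random()
    truncated = [truncate(t, max_len) for t in transactions]
    scale = sensitivity(max_len, max_size) / epsilon
    result = {}
    # Candidates come from the public universe, never from the data itself,
    # so the set of perturbed counts does not depend on the database.
    for size in range(1, max_size + 1):
        for itemset in itertools.combinations(sorted(universe), size):
            items = frozenset(itemset)
            support = sum(1 for t in truncated if items <= t)
            noisy = support + laplace(scale, rng)
            if noisy >= min_support_count:
                result[itemset] = noisy
    return result
```

The noise scale grows with `sensitivity(max_len, max_size)`, which is where the trade-off described above shows up: a smaller `max_len` lowers the noise but discards more items from long transactions. The sketch enumerates every candidate itemset, so it is only practical for a small universe and small `max_size`.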