research-article

On differentially private frequent itemset mining

Published: 01 November 2012

Abstract

We consider differentially private frequent itemset mining. We begin by exploring the theoretical difficulty of simultaneously providing good utility and good privacy in this task. While our analysis proves that in general this is very difficult, it leaves a glimmer of hope in that our proof of difficulty relies on the existence of long transactions (that is, transactions containing many items). Accordingly, we investigate an approach that begins by truncating long transactions, trading off the errors introduced by the truncation against those introduced by the noise added to guarantee privacy. Experimental results over standard benchmark databases show that truncating is indeed effective. Our algorithm solves the "classical" frequent itemset mining problem, in which the goal is to find all itemsets whose support exceeds a threshold. Related work has proposed differentially private algorithms for the top-k itemset mining problem ("find the k most frequent itemsets"). An experimental comparison with those algorithms shows that our algorithm achieves a better F-score unless k is small.
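The trade-off between truncation error and noise error can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it covers only single-item supports, it assumes long transactions are truncated by uniform random sampling, and it assumes Laplace noise calibrated to the truncation length. The names `truncate`, `laplace`, `noisy_item_supports`, `frequent_items`, `max_len`, and `epsilon` are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def truncate(transaction, max_len, rng):
    """Keep at most max_len distinct items (assumed policy: uniform random sample)."""
    items = sorted(set(transaction))
    if len(items) <= max_len:
        return items
    return sorted(rng.sample(items, max_len))


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_item_supports(transactions, items, max_len, epsilon, rng):
    """Noisy support counts for every item in a public item domain.

    After truncation, adding or removing one transaction changes at most
    max_len counts by 1 each, so the noise scale is max_len / epsilon.
    """
    truncated = [truncate(t, max_len, rng) for t in transactions]
    counts = Counter(item for t in truncated for item in t)
    scale = max_len / epsilon
    # Iterate over the public domain, not only over the items observed.
    return {item: counts[item] + laplace(scale, rng) for item in items}


def frequent_items(transactions, items, max_len, epsilon, threshold, seed=0):
    """Items whose noisy support is at least the threshold."""
    rng = random.Random(seed)
    noisy = noisy_item_supports(transactions, items, max_len, epsilon, rng)
    return sorted(item for item, count in noisy.items() if count >= threshold)
```

In this sketch a smaller `max_len` lowers the noise scale but discards more items from long transactions, while a larger `max_len` preserves the data at the cost of more noise, which mirrors the trade-off described in the abstract.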


Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 1
November 2012
36 pages

Publisher: VLDB Endowment
