
Large-scale k-means clustering with user-centric privacy-preservation

  • Regular Paper
Knowledge and Information Systems

Abstract

We present a k-means clustering protocol built on a new privacy-preserving concept, user-centric privacy preservation. In this framework, users can conduct data mining using their private information while keeping it in their local storage. After the computation, they obtain only the mining result, without disclosing private information to others. Conventional privacy-preserving data mining has, in most cases, assumed that only two parties participate. Our framework assumes that a large number of parties join the protocol; therefore, not only scalability but also asynchronism and fault tolerance are important. Considering this, we propose a k-means algorithm combined with a decentralized cryptographic protocol and a gossip-based protocol. The computational complexity is O(log n) with respect to the number of parties n, and experimental results show that our protocol is scalable even with one million parties.
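To make the combination of k-means and gossip-based aggregation concrete, the following is a minimal sketch of one decentralized k-means update, not the protocol of the paper. It assumes that each party holds a single one-dimensional point, that all parties already share the current centroids, and that aggregation uses plain push-pull gossip averaging. The cryptographic layer described in the abstract is omitted entirely, so this sketch is not privacy-preserving as written. The names `gossip_average` and `kmeans_step` are hypothetical.

```python
import random


def gossip_average(values, rounds, rng):
    """Push-pull gossip averaging (illustrative, no encryption).

    In every round each party contacts one uniformly random peer and both
    replace their vectors with the element-wise mean. The sum over all
    parties is preserved, so every vector drifts toward the global average.
    """
    state = [list(v) for v in values]
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)
            if j == i:
                continue  # a party that draws itself skips this round
            mean = [(a + b) / 2 for a, b in zip(state[i], state[j])]
            state[i] = mean
            state[j] = list(mean)
    return state


def kmeans_step(points, centroids, rounds, rng):
    """One k-means update where party i holds only points[i].

    Each party builds a local vector [per-cluster sums | per-cluster counts]
    from its own point, and the gossip average of these vectors yields the
    new centroids as sum / count.
    """
    k = len(centroids)
    local = []
    for x in points:
        nearest = min(range(k), key=lambda c: (x - centroids[c]) ** 2)
        vec = [0.0] * (2 * k)
        vec[nearest] = x          # contribution to the cluster sum
        vec[k + nearest] = 1.0    # contribution to the cluster count
        local.append(vec)
    state = gossip_average(local, rounds, rng)
    est = state[0]  # after convergence every party holds nearly the same estimate
    return [
        est[c] / est[k + c] if est[k + c] > 0 else centroids[c]
        for c in range(k)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    points = [i * 0.1 for i in range(20)] + [10 + i * 0.1 for i in range(20)]
    print(kmeans_step(points, [0.0, 5.0], rounds=40, rng=rng))
```

Gossip averaging of this kind is known to reduce the disagreement between parties exponentially fast in the number of rounds, so the number of rounds needed for a fixed accuracy grows only logarithmically with the number of parties. This is consistent with the O(log n) complexity stated above, although the sketch makes no claim about how the paper's protocol achieves it.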



Author information


Corresponding author

Correspondence to Jun Sakuma.


About this article

Cite this article

Sakuma, J., Kobayashi, S. Large-scale k-means clustering with user-centric privacy-preservation. Knowl Inf Syst 25, 253–279 (2010). https://doi.org/10.1007/s10115-009-0243-x


  • DOI: https://doi.org/10.1007/s10115-009-0243-x
