Large-scale k-means clustering with user-centric privacy-preservation

Sakuma, Jun; Kobayashi, Shigenobu

doi:10.1007/s10115-009-0243-x

Regular Paper
Published: 27 August 2009

Volume 25, pages 253–279 (2010)

Knowledge and Information Systems

261 Accesses
26 Citations

Abstract

A k-means clustering algorithm based on a new privacy-preservation concept, user-centric privacy preservation, is presented. In this framework, users can conduct data mining over their private information while keeping it in their own local storage. After the computation, they obtain only the mining result, without disclosing their private information to others. In most conventional privacy-preserving data mining, the number of participating parties has been assumed to be only two. In our framework, we assume that a large number of parties join the protocol; therefore, not only scalability but also asynchronism and fault tolerance are important. With this in mind, we propose a k-means algorithm that combines a decentralized cryptographic protocol with a gossip-based protocol. The computational complexity is O(log n) with respect to the number of parties n, and experimental results show that our protocol remains scalable even with one million parties.
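
The abstract names two ingredients, k-means and a gossip-based protocol, but does not spell out the protocol itself. As a rough illustration only, the Python sketch below shows how the centroid-update step of k-means can be computed without a central server: each party holds one private point, encodes its contribution as a vector of per-cluster (sum, count) entries, and a simple pairwise-averaging gossip scheme drives every party's copy of that vector toward the network-wide mean, after which each party divides sums by counts to obtain centroids. This is an assumed stand-in, not the authors' protocol: the decentralized cryptographic layer that would hide each party's contribution is omitted entirely, so peers here see partial sums in the clear. The helper names `gossip_average`, `nearest`, and `gossip_kmeans` and the parameters `iterations`, `rounds`, and `seed` are hypothetical.

```python
import random


def gossip_average(vectors, rounds, rng):
    """Hypothetical helper: pairwise-averaging gossip.

    In each round every node i contacts a random peer j, and both replace
    their vectors with the pairwise mean. Each exchange leaves the
    network-wide sum unchanged, so all estimates approach the global average."""
    est = [list(v) for v in vectors]
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n)
            if i == j:
                continue
            mean = [(a + b) / 2 for a, b in zip(est[i], est[j])]
            est[i], est[j] = mean, list(mean)
    return est


def nearest(point, centroids):
    """Index of the closest centroid under squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
    )


def gossip_kmeans(points, init_centroids, iterations=10, rounds=50, seed=0):
    """Hypothetical sketch: party p holds only points[p] and its own centroid copy.

    No encryption is applied, so this shows the data flow only, not privacy."""
    rng = random.Random(seed)
    n, k, d = len(points), len(init_centroids), len(init_centroids[0])
    views = [[list(c) for c in init_centroids] for _ in range(n)]
    for _ in range(iterations):
        # Local step: each party writes (x, 1) into the slot of its nearest cluster.
        local = []
        for p, x in enumerate(points):
            c = nearest(x, views[p])
            vec = [0.0] * (k * (d + 1))
            base = c * (d + 1)
            vec[base:base + d] = list(x)
            vec[base + d] = 1.0
            local.append(vec)
        # Decentralized step: gossip estimates the mean of all local vectors,
        # i.e. (per-cluster sum / n, per-cluster count / n).
        est = gossip_average(local, rounds, rng)
        # Each party divides sum by count; the common 1/n factor cancels.
        for p in range(n):
            for c in range(k):
                base = c * (d + 1)
                count = est[p][base + d]
                if count > 1e-9:  # keep the old centroid if the cluster looks empty
                    views[p][c] = [s / count for s in est[p][base:base + d]]
    return views


if __name__ == "__main__":
    group_a = [(i % 5 * 0.1, i // 5 * 0.1) for i in range(20)]
    group_b = [(10 + x, 10 + y) for x, y in group_a]
    views = gossip_kmeans(group_a + group_b, [(1.0, 1.0), (9.0, 9.0)], iterations=5)
    print(views[0])  # party 0's copy of the centroids
```

With two well-separated groups, such as the 20 points averaging (0.2, 0.15) and the 20 points averaging (10.2, 10.15) in the usage block, and initial centroids (1, 1) and (9, 9), every party's copy of the centroids should end up close to the two group means. The fixed `rounds` value is an assumption of the sketch; a real deployment would choose it to reach a target accuracy and would also have to tolerate parties leaving and lost messages, which this sketch ignores.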

Author information

Authors and Affiliations

Department of Computer Science, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, 305-8577, Japan
Jun Sakuma
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan
Shigenobu Kobayashi

Corresponding author

Correspondence to Jun Sakuma.

About this article

Cite this article

Sakuma, J., Kobayashi, S. Large-scale k-means clustering with user-centric privacy-preservation. Knowl Inf Syst 25, 253–279 (2010). https://doi.org/10.1007/s10115-009-0243-x

Received: 30 September 2008
Revised: 25 June 2009
Accepted: 27 June 2009
Published: 27 August 2009
Issue Date: November 2010
DOI: https://doi.org/10.1007/s10115-009-0243-x

Similar content being viewed by others

Privacy Preserving Multi-server k-means Computation over Horizontally Partitioned Data

Oblivious Sampling with Applications to Two-Party k-Means Clustering

Privacy Aware K-Means Clustering with High Utility

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Large-scale k-means clustering with user-centric privacy-preservation

Abstract

Access this article

Similar content being viewed by others

Privacy Preserving Multi-server k-means Computation over Horizontally Partitioned Data

Oblivious Sampling with Applications to Two-Party k-Means Clustering

Privacy Aware K-Means Clustering with High Utility

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation