An update algorithm for restricted random walk clustering for dynamic data sets

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

In this article, we present a randomized dynamic clustering algorithm for large data sets. It is based on the restricted random walk cluster algorithm by Schöll and Schöll-Paschinger, which has given good results in past studies. We discuss different approaches to clustering dynamic data sets. In contrast to most of these methods, dynamic restricted random walk clustering is also efficient when only a small percentage of the data set changes. It has the additional advantage that the updates asymptotically produce the same clusters as a reclustering with the static variant, so no reclustering is ever needed. In addition, the method has a relatively low computational complexity, which enables it to cluster large data sets.
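
The abstract does not spell out the walk process, so the following is only a minimal sketch of the static restricted random walk idea, under stated assumptions: the similarity-based formulation in which each step must use an edge of strictly higher similarity than the previous one, and a component-style cluster construction that joins nodes connected by edges taken at or after a given step number. The function names, the toy graph, and the `level` parameter are illustrative and do not come from the article; the dynamic update algorithm itself is not shown.

```python
import random


def restricted_random_walk(graph, start, rng):
    """One walk on a similarity graph (dict: node -> {neighbor: similarity}).

    Assumption: each step must use an edge whose similarity is strictly
    higher than that of the previous step; the walk ends when no such
    edge is available at the current node.
    """
    walk = []
    current, last_sim = start, float("-inf")
    while True:
        admissible = [(n, s) for n, s in graph[current].items() if s > last_sim]
        if not admissible:
            return walk
        nxt, sim = rng.choice(admissible)  # uniform choice among admissible edges
        walk.append((current, nxt, sim))
        current, last_sim = nxt, sim


def component_clusters(walks, level):
    """Connected components of the edges taken at step >= level (1-based)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for walk in walks:
        for step, (u, v, _) in enumerate(walk, start=1):
            if step >= level:
                parent[find(u)] = find(v)
    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical toy data: two tight groups joined by one weak link (c-d).
graph = {
    "a": {"b": 5, "c": 4},
    "b": {"a": 5, "c": 6},
    "c": {"a": 4, "b": 6, "d": 1},
    "d": {"c": 1, "e": 5, "f": 4},
    "e": {"d": 5, "f": 6},
    "f": {"d": 4, "e": 6},
}
rng = random.Random(0)
walks = [restricted_random_walk(graph, s, rng) for s in graph for _ in range(5)]
clusters_at_level_2 = component_clusters(walks, level=2)
```

In this sketch, a higher `level` keeps only edges reached late in a walk, which are by construction the higher-similarity ones, so raising it tends to split the data into smaller, tighter clusters.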

References

  • Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Freytag JC, Lockemann PC, Abiteboul S, Carey MJ, Selinger PG, Heuer A (eds) VLDB 2003, Proceedings of 29th international conference on very large data bases. Morgan Kaufmann, San Francisco, pp 81–92

  • Banerjee J, Kim W, Kim SJ, Garza JF (1988) Clustering a DAG for CAD databases. IEEE Trans Softw Eng 14(11): 1684–1699

  • Barbará D (2002) Requirements for clustering data streams. SIGKDD Explor Newsl 3(2): 23–27

  • Basagni S (1999) Distributed clustering for ad hoc networks. In: Proceedings of the fourth international symposium on parallel architectures, algorithms, and networks (ISPAN ’99). IEEE Press, Piscataway, pp 310–315

  • Bock H (1974) Automatische Klassifikation. Vandenhoeck & Ruprecht, Göttingen

  • Bullat F, Schneider M (1996) Dynamic clustering in object databases exploiting effective use of relationships between objects. In: Cointe P (ed) ECOOP ’96: Object-Oriented Programming, 10th European Conference. Springer, Heidelberg, pp 344–365

  • Can F, Ozkarahan EA (1987) A dynamic cluster maintenance system for information retrieval. In: Yu CT, Rijsbergen CJV (eds) SIGIR ’87: Proceedings of the 10th annual international ACM SIGIR conference on research and development in information retrieval. ACM Press, New York, pp 123–131

  • Can F, Ozkarahan EA (1989) Dynamic cluster maintenance. Inf Process Manag 25(3): 275–291

  • Chakrabarti D, Kumar R, Tomkins A (2006) Evolutionary clustering. In: Proceedings of the 2006 ACM SIGKDD. ACM Press, New York

  • Charikar M, Chekuri C, Feder T, Motwani R (1997) Incremental clustering and dynamic information retrieval. In: Leighton FT, Shor P (eds) STOC ’97: Proceedings of the twenty-ninth annual ACM symposium on theory of computing. ACM Press, New York, pp 626–635

  • Charikar M, O’Callaghan L, Panigrahy R (2003) Better streaming algorithms for clustering problems. In: Larmore LL, Goemans MX (eds) STOC ’03: Proceedings of the thirty-fifth annual ACM symposium on theory of computing. ACM Press, New York, pp 30–39

  • Chaudhuri BB (1994) Dynamic clustering for time incremental data. Pattern Recognit Lett 15(1): 27–34

  • Czumaj A, Sohler C (2007) Sublinear-time approximation algorithms for clustering via random sampling. Random Struct Algorithms 30: 226–256

  • Darmont J, Fromantin C, Régnier S, Gruenwald L, Schneider M (2001) Dynamic clustering in object-oriented databases: an advocacy for simplicity. In: Dittrich KR, Guerrini G, Merlo I, Oliva M, Rodríguez E (eds) Objects and databases, international symposium, proceedings. Springer, Heidelberg, pp 71–85

  • Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley-Interscience, New York

  • Frahling G, Sohler C (2006) A fast k-means implementation using coresets. In: Proceedings of 22nd ACM symposium on computational geometry (SoCG), pp 135–143

  • Franke M (2007) An update algorithm for restricted random walk clusters. PhD thesis, Universität Karlsruhe (TH), Karlsruhe

  • Franke M, Geyer-Schulz A (2004) Automated indexing with restricted random walks on large document sets. In: Heery R, Lyon L (eds) Research and advanced technology for digital libraries—8th European conference, ECDL 2004. Springer, Heidelberg, pp 232–243

  • Franke M, Geyer-Schulz A (2005) Using restricted random walks for library recommendations. In: Uchyigit G (ed) Web personalization, recommender systems and intelligent user interfaces. INSTICC Press, Setúbal, pp 107–115

  • Franke M, Geyer-Schulz A (2007) A method for analyzing the asymptotic behavior of the walk process in restricted random walk cluster algorithm. In: Advances in data analysis. Proceedings of the 30th annual conference of the German Classification Society (GfKl). Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg, pp 51–58

  • Franke M, Geyer-Schulz A (2007) Using restricted random walks for library recommendations and knowledge space exploration. Int J Pattern Recognit Artif Intell 21(2): 355–373

  • Franke M, Thede A (2005) Clustering of large document sets with restricted random walks on usage histories. In: Weihs C, Gaul W (eds) Classification: the ubiquitous challenge. Proceedings of the 28th annual conference of the German Classification Society (GfKl). Studies in Classification, Data Analysis, and Knowledge Organization. Springer, pp 402–409

  • Franke M, Geyer-Schulz A, Neumann AW (2008) Recommender services in scientific digital libraries. In: Tsihrintzis GA, Jain L (eds) Multi-media services in intelligent environments. Springer, Heidelberg, pp 377–417

  • Gao J, Guibas LJ, Hershberger J, Zhang L, Zhu A (2003) Discrete mobile centers. Discrete Comput Geom 30(1): 45–65

  • Gerla M, Tsai JTC (1995) Multicluster, mobile, multimedia radio network. J Wirel Netw 1(3): 255–265

  • Gupta C, Grossman RL (2004) GenIc: a single-pass generalized incremental algorithm for clustering. In: Berry MW, Dayal U, Kamath C, Skillicorn DB (eds) Proceedings of the fourth SIAM international conference on data mining. SIAM, Philadelphia

  • Harel D, Koren Y (2001) On clustering using random walks. In: Hariharan R, Mukund M, Vinay V (eds) FST TCS 2001: foundations of software technology and theoretical computer science. Springer, Heidelberg, pp 18–41

  • Hudson SE, King R (1989) Cactis: a self-adaptive, concurrent implementation of an object-oriented database management system. ACM Trans Database Syst 14(3): 291–321

  • Krishna P, Vaidya NH, Chatterjee M, Pradhan DK (1997) A cluster-based approach for routing in dynamic networks. SIGCOMM Comput Commun Rev 27(2): 49–64

  • McIver WJ Jr, King R (1994) Self-adaptive, on-line reclustering of complex object data. In: Snodgrass RT, Winslett M (eds) Proceedings of the 1994 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 407–418

  • O’Callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th international conference on data engineering. IEEE Press, Piscataway, pp 685–694

  • Overmars MH (1983) The design of dynamic data structures. LNCS, vol 156. Springer, Berlin

  • Rand WM (1971) Objective criteria for the evaluation of clustering algorithms. J Am Stat Assoc 66(336): 846–850

  • Richa AW, Obraczka K, Sen A (2001) Application-oriented self-organizing hierarchical clustering in dynamic networks: a position paper. In: Proceedings of 1st ACM Workshop on principles of mobile computing (POMC). ACM Press, New York, pp 57–65

  • Salton G, Wong A (1978) Generation and search of clustered files. ACM Trans Database Syst 3(4): 321–346

  • Schöll J, Schöll-Paschinger E (2003) Classification by restricted random walks. Pattern Recognit 36(6): 1279–1290

  • Tarjan RE (1972) Depth-first search and linear graph algorithms. SIAM J Comput 1(2): 146–160

  • Trier M, Bobrik A (2009) Social search. IEEE Internet Comput 13(2): 51–59

  • Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301): 236–244

  • Wedel M, Kamakura W (2001) Market segmentation. International series in quantitative marketing. Kluwer Academic Publishers, Boston

Author information

Corresponding author

Correspondence to Markus Franke.

About this article

Cite this article

Franke, M., Geyer-Schulz, A. An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3, 63–92 (2009). https://doi.org/10.1007/s11634-009-0039-6

  • DOI: https://doi.org/10.1007/s11634-009-0039-6
