An update algorithm for restricted random walk clustering for dynamic data sets

Franke, Markus; Geyer-Schulz, Andreas

doi:10.1007/s11634-009-0039-6


Regular Article
Published: 18 June 2009

Volume 3, pages 63–92 (2009)

Advances in Data Analysis and Classification



Abstract

In this article, we present a randomized dynamic cluster algorithm for large data sets. It is based on the restricted random walk cluster algorithm by Schöll and Schöll-Paschinger that has given good results in past studies. We discuss different approaches for the clustering of dynamic data sets. In contrast to most of these methods, dynamic restricted random walk clustering is also efficient for a small percentage of changes in the data set and has the additional advantage that the updates asymptotically produce the same clusters as a reclustering with the static variant; there is thus no need for any reclustering ever. In addition, the method has a relatively low computational complexity which enables it to cluster large data sets.
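
To make the underlying idea concrete, the following is a minimal sketch of a single restricted random walk on a similarity graph. It assumes the commonly described rule that each step must cross an edge strictly more similar than the one crossed before, so every walk ends after finitely many steps. It is not the update algorithm presented in this article, and it leaves out the stage that builds clusters from the recorded walks. The function name `restricted_random_walk`, the dict-of-dicts graph layout, and the toy similarity values are illustrative assumptions.

```python
import random


def restricted_random_walk(sim, start, rng):
    """Perform one restricted random walk starting at `start`.

    sim:   dict mapping node -> {neighbour: similarity} (assumed symmetric).
    rng:   a random.Random instance, passed in so runs are reproducible.
    Returns the traversed edges as (from_node, to_node, similarity) tuples.
    """
    walk = []
    current = start
    last_sim = float("-inf")  # the first step may use any incident edge
    while True:
        # Restriction (assumed rule): only edges strictly more similar than
        # the previously crossed edge remain admissible.
        candidates = [
            (node, s) for node, s in sorted(sim[current].items()) if s > last_sim
        ]
        if not candidates:
            return walk  # no admissible edge is left, so the walk ends
        nxt, s = rng.choice(candidates)
        walk.append((current, nxt, s))
        current, last_sim = nxt, s


# Toy similarity graph with three objects (hypothetical values).
toy_sim = {
    "a": {"b": 1, "c": 3},
    "b": {"a": 1, "c": 2},
    "c": {"a": 3, "b": 2},
}
toy_walk = restricted_random_walk(toy_sim, "a", random.Random(0))
print(toy_walk)
```

Because the admissible similarities increase strictly, a walk can never cross the same edge twice in a row or cycle indefinitely, which is what keeps the cost of an individual walk small.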


References

Aggarwal CC, Han J, Wang J, Yu PS (2003) A framework for clustering evolving data streams. In: Freytag JC, Lockemann PC, Abiteboul S, Carey MJ, Selinger PG, Heuer A (eds) VLDB 2003, Proceedings of 29th international conference on very large data bases. Morgan Kaufmann, San Francisco, pp 81–92
Banerjee J, Kim W, Kim SJ, Garza JF (1988) Clustering a DAG for CAD databases. IEEE Trans Softw Eng 14(11): 1684–1699
Barbará D (2002) Requirements for clustering data streams. SIGKDD Explor Newsl 3(2): 23–27
Basagni S (1999) Distributed clustering for ad hoc networks. In: Proceedings of the fourth international symposium on parallel architectures, algorithms, and networks (ISPAN ’99). IEEE Press, Piscataway, pp 310–315
Bock H (1974) Automatische Klassifikation. Vandenhoeck & Ruprecht, Göttingen
Bullat F, Schneider M (1996) Dynamic clustering in object databases exploiting effective use of relationships between objects. In: Cointe P (ed) ECOOP’96 - Object-Oriented Programming, 10th European Conference. Springer, Heidelberg, pp 344–365
Can F, Ozkarahan EA (1987) A dynamic cluster maintenance system for information retrieval. In: Yu CT, Rijsbergen CJV (eds) SIGIR ’87: Proceedings of the 10th annual international ACM SIGIR conference on research and development in information retrieval. ACM Press, New York, pp 123–131
Can F, Ozkarahan EA (1989) Dynamic cluster maintenance. Inf Process Manag 25(3): 275–291
Chakrabarti D, Kumar R, Tomkins A (2006) Evolutionary clustering. In: Proceedings of the 2006 ACM SIGKDD. ACM Press, New York
Charikar M, Chekuri C, Feder T, Motwani R (1997) Incremental clustering and dynamic information retrieval. In: Leighton FT, Shor P (eds) STOC ’97: Proceedings of the twenty-ninth annual ACM symposium on theory of computing. ACM Press, New York, pp 626–635
Charikar M, O’Callaghan L, Panigrahy R (2003) Better streaming algorithms for clustering problems. In: Larmore LL, Goemans MX (eds) STOC ’03: Proceedings of the thirty-fifth annual ACM symposium on theory of computing. ACM Press, New York, pp 30–39
Chaudhuri BB (1994) Dynamic clustering for time incremental data. Pattern Recognit Lett 15(1): 27–34
Czumaj A, Sohler C (2007) Sublinear-time approximation algorithms for clustering via random sampling. Random Struct Algorithms 30: 226–256
Darmont J, Fromantin C, Régnier S, Gruenwald L, Schneider M (2001) Dynamic clustering in object-oriented databases: an advocacy for simplicity. In: Dittrich KR, Guerrini G, Merlo I, Oliva M, Rodríguez E (eds) Objects and databases, international symposium, proceedings. Springer, Heidelberg, pp 71–85
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley-Interscience, New York
Frahling G, Sohler C (2006) A fast k-means implementation using coresets. In: Proceedings of 22nd ACM symposium on computational geometry (SoCG), pp 135–143
Franke M (2007) An update algorithm for restricted random walk clusters. PhD thesis, Universität Karlsruhe (TH), Karlsruhe
Franke M, Geyer-Schulz A (2004) Automated indexing with restricted random walks on large document sets. In: Heery R, Lyon L (eds) Research and advanced technology for digital libraries—8th European conference, ECDL 2004. Springer, Heidelberg, pp 232–243
Franke M, Geyer-Schulz A (2005) Using restricted random walks for library recommendations. In: Uchyigit G (eds) Web personalization, recommender systems and intelligent user interfaces. INSTICC Press, Setúbal, pp 107–115
Franke M, Geyer-Schulz A (2007) A method for analyzing the asymptotic behavior of the walk process in restricted random walk cluster algorithm. In: Advances in data analysis. Proceedings of the 30th annual conference of the German Classification Society (GfKl). Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg, pp 51–58
Franke M, Geyer-Schulz A (2007) Using restricted random walks for library recommendations and knowledge space exploration. Int J Pattern Recognit Artif Intell 21(2): 355–373
Franke M, Thede A (2005) Clustering of large document sets with restricted random walks on usage histories. In: Weihs C, Gaul W (eds) Classification - the ubiquitous challenge: Proceedings of the 28th annual conference of the German Classification Society (GfKl). Studies in Classification, Data Analysis, and Knowledge Organization. Springer, pp 402–409
Franke M, Geyer-Schulz A, Neumann AW (2008) Recommender services in scientific digital libraries. In: Tsihrintzis GA, Jain L (eds) Multi-media services in intelligent environments. Springer, Heidelberg, pp 377–417
Gao J, Guibas LJ, Hershberger J, Zhang L, Zhu A (2003) Discrete mobile centers. Discrete Comput Geom 30(1): 45–65
Gerla M, Tsai JTC (1995) Multicluster, mobile, multimedia radio network. J Wirel Netw 1(3): 255–265
Gupta C, Grossman RL (2004) GenIc: a single-pass generalized incremental algorithm for clustering. In: Berry MW, Dayal U, Kamath C, Skillicorn DB (eds) Proceedings of the fourth SIAM international conference on data mining. SIAM, Philadelphia
Harel D, Koren Y (2001) On clustering using random walks. In: Hariharan R, Mukund M, Vinay V (eds) FST TCS 2001: foundations of software technology and theoretical computer science. Springer, Heidelberg, pp 18–41
Hudson SE, King R (1989) Cactis: a self-adaptive, concurrent implementation of an object-oriented database management system. ACM Trans Database Syst 14(3): 291–321
Krishna P, Vaidya NH, Chatterjee M, Pradhan DK (1997) A cluster-based approach for routing in dynamic networks. SIGCOMM Comput Commun Rev 27(2): 49–64
McIver WJ Jr, King R (1994) Self-adaptive, on-line reclustering of complex object data. In: Snodgrass RT, Winslett M (eds) Proceedings of the 1994 ACM SIGMOD international conference on management of data. ACM Press, New York, pp 407–418
O’Callaghan L, Mishra N, Meyerson A, Guha S, Motwani R (2002) Streaming-data algorithms for high-quality clustering. In: Proceedings of the 18th international conference on data engineering. IEEE Press, Piscataway, pp 685–694
Overmars MH (1983) The design of dynamic data structures. LNCS, vol 156. Springer, Berlin
Rand WM (1971) Objective criteria for the evaluation of clustering algorithms. J Am Stat Assoc 66(336): 846–850
Richa AW, Obraczka K, Sen A (2001) Application-oriented self-organizing hierarchical clustering in dynamic networks: a position paper. In: Proceedings of 1st ACM Workshop on principles of mobile computing (POMC). ACM Press, New York, pp 57–65
Salton G, Wong A (1978) Generation and search of clustered files. ACM Trans Database Syst 3(4): 321–346
Schöll J, Schöll-Paschinger E (2003) Classification by restricted random walks. Pattern Recognit 36(6): 1279–1290
Tarjan RE (1972) Depth-first search and linear graph algorithms. SIAM J Comput 1(2): 146–160
Trier M, Bobrik A (2009) Social search. IEEE Internet Comput 13(2): 51–59
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301): 236–244
Wedel M, Kamakura W (2001) Market segmentation. International series in quantitative marketing. Kluwer Academic Publishers, Boston


Author information

Authors and Affiliations

Information Services and Electronic Markets, Institute of Information Systems and Management, Universität Karlsruhe (TH), 76128, Karlsruhe, Germany
Markus Franke & Andreas Geyer-Schulz


Corresponding author

Correspondence to Markus Franke.


About this article

Cite this article

Franke, M., Geyer-Schulz, A. An update algorithm for restricted random walk clustering for dynamic data sets. Adv Data Anal Classif 3, 63–92 (2009). https://doi.org/10.1007/s11634-009-0039-6


Received: 01 April 2008
Revised: 22 April 2009
Accepted: 24 May 2009
Published: 18 June 2009
Issue Date: June 2009
DOI: https://doi.org/10.1007/s11634-009-0039-6
