Abstract
Data clustering is one of the most important methods for discovering naturally occurring structures in datasets. One of the most popular clustering algorithms is Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The algorithm can discover clusters of arbitrary shape and has therefore been widely used in many applications. However, DBSCAN requires two input parameters: the radius of the neighborhood (eps) and the minimum number of points required to form a dense region (MinPts). Choosing these two parameters correctly is a fundamental issue. In this paper, a new method is proposed to determine the radius parameter. The approach computes the distance between each element of the dataset and its k-th nearest neighbor, and then identifies abrupt changes in these distance values. The performance of the new approach has been demonstrated on several different datasets.
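The idea of reading eps off the k-th nearest neighbor distances can be illustrated with a minimal Python sketch. The abstract does not say how abrupt changes are identified, so the rule used here (take the value just before the largest jump between consecutive sorted distances) is only an assumed stand-in, not the paper's method. The function names `kth_neighbor_distances` and `eps_from_largest_jump` and the toy data are hypothetical.

```python
import math


def kth_neighbor_distances(points, k):
    """Return, in ascending order, each point's distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        # Brute-force distances from p to every other point (O(n^2) overall).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result)


def eps_from_largest_jump(sorted_kdist):
    """Assumed heuristic: eps is the value just before the largest consecutive jump."""
    if len(sorted_kdist) < 2:
        raise ValueError("need at least two distances")
    jumps = [b - a for a, b in zip(sorted_kdist, sorted_kdist[1:])]
    idx = max(range(len(jumps)), key=jumps.__getitem__)
    return sorted_kdist[idx]


# Toy data (hypothetical): two tight square clusters and one distant outlier.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (5, 30),
]
kdist = kth_neighbor_distances(points, k=2)
eps = eps_from_largest_jump(kdist)
print(kdist)
print(eps)
```

On this toy data the eight cluster points share the same small 2nd-neighbor distance, while the outlier's is far larger, so the largest jump separates them. Real datasets may show several jumps of comparable size, in which case a single-maximum rule like this one would likely be too crude.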
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Starczewski, A. (2021). A Novel Approach to Determining the Radius of the Neighborhood Required for the DBSCAN Algorithm. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2021. Lecture Notes in Computer Science, vol 12854. Springer, Cham. https://doi.org/10.1007/978-3-030-87986-0_32
DOI: https://doi.org/10.1007/978-3-030-87986-0_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87985-3
Online ISBN: 978-3-030-87986-0
eBook Packages: Computer Science (R0)