Abstract
We propose a framework, based on diffusion processes and other methodologies, for finding meaningful geometric descriptions of high-dimensional datasets. We show that the eigenfunctions of the underlying Markov matrices can be used to construct diffusion processes that yield efficient representations of complex geometric structures for high-dimensional data analysis. This is achieved through non-linear transformations that identify geometric patterns in these large datasets, and the connections among them, while projecting the data onto low-dimensional spaces. Our methods automatically classify and recognize network protocols. The core of the proposed methodology is to train the system to extract heterogeneous features that classify network protocols automatically, in an unsupervised manner. Once trained, the algorithms can classify and recognize incoming network data in real time. They cluster the data into manifolds embedded in a low-dimensional space, where the data can be analyzed and visualized. In addition, the methodology parameterizes the data in the low-dimensional space.
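As an illustration of the embedding step described above, the following is a minimal sketch of a basic diffusion-map computation, not the chapter's actual implementation. It assumes a Gaussian kernel with a hand-picked width `epsilon`, no density normalization, and purely numerical features; the function name `diffusion_map` and all of its parameters are hypothetical.

```python
import numpy as np

def diffusion_map(X, epsilon, n_components=2, t=1):
    """Embed the rows of X with a basic diffusion map (illustrative sketch).

    Assumptions (not taken from the chapter): Gaussian kernel of width
    `epsilon`, no density normalization, diffusion time `t`.
    """
    # Pairwise squared Euclidean distances between data points.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinity kernel and node degrees.
    K = np.exp(-d2 / epsilon)
    deg = K.sum(axis=1)

    # The Markov matrix is P = D^{-1} K. Its symmetric conjugate
    # A = D^{-1/2} K D^{-1/2} has the same eigenvalues and is safer to decompose.
    A = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)

    # eigh returns ascending eigenvalues; sort them in descending order.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are recovered as D^{-1/2} v.
    psi = vecs / np.sqrt(deg)[:, None]

    # Drop the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining coordinates by their eigenvalues raised to the power t.
    idx = slice(1, n_components + 1)
    return psi[:, idx] * vals[idx] ** t, vals
```

Calling `diffusion_map(X, epsilon=1.0)` on an array of feature vectors returns the low-dimensional coordinates together with the sorted eigenvalues; a clustering step can then be run on the returned coordinates. The choice of `epsilon` strongly affects the result and would need to be tuned to the data.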
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
David, G. (2015). Clustering-Based Protocol Classification via Dimensionality Reduction. In: Lehto, M., Neittaanmäki, P. (eds) Cyber Security: Analytics, Technology and Automation. Intelligent Systems, Control and Automation: Science and Engineering, vol 78. Springer, Cham. https://doi.org/10.1007/978-3-319-18302-2_10
DOI: https://doi.org/10.1007/978-3-319-18302-2_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18301-5
Online ISBN: 978-3-319-18302-2
eBook Packages: Engineering, Engineering (R0)