An efficient and straightforward online vector quantization method for a data stream through remove-birth updating

The growth of network-connected devices has led to an exponential increase in data generation, creating significant challenges for efficient data analysis. This data is generated continuously, creating a dynamic flow known as a data stream. The characteristics of a data stream may change dynamically, and this change is known as concept drift. Consequently, a method for handling data streams must efficiently reduce their volume while dynamically adapting to these changing characteristics. This article proposes a simple online vector quantization method for data streams with concept drift. The proposed method identifies and replaces units with low win probability through remove-birth updating, thus achieving rapid adaptation to concept drift. Furthermore, the results of this study show that the proposed method generates few dead units even in the presence of concept drift. This study also suggests that metrics calculated from the proposed method will be helpful for drift detection.


Introduction
In today's world, an enormous number of devices, from computers to IoT gadgets, are constantly connected to the Internet, sending continuous streams of data to servers. This vast amount of data, often referred to as big data, is too large to analyze efficiently in its raw form, so it must be reduced in size. A continuous flow of data that streams like a river into a server is called a data stream. Since the volume of an accumulated data stream is too large, preprocessing such as vector quantization is needed to handle it effectively. This paper proposes a new online vector quantization method that reduces the size of a data stream while preserving its features.
In many real-world scenarios, we cannot assume that a dataset is static (Ramírez-Gallego et al., 2017). Instances of a data stream will evolve (Zubaroglu and Atalay, 2021), meaning that not only is the data continuously generated, but its properties can also change. This phenomenon is known as concept drift. A quantization method for concept drift must be able to extract new features as the characteristics of the distribution change. Therefore, we need a method that can reduce the data size and automatically adapt to changes in the characteristics of the data stream.
Developing a machine learning method for data streams with concept drift is a new computational challenge (Sultan, 2022) because the method needs to fulfill four requirements: avoidance of storing data points, feature extraction, flexibility, and single-pass processing. The method should avoid storing whole data points (Beyer and Cimiano, 2013) because the accumulated data is enormous; instead, the method must extract and store the features of the data. The distribution of a data stream will evolve, which can degrade the model's performance if it does not adapt to these changes. Therefore, the method should be able to flexibly extract new features on the fly. In addition, since data streams are potentially unbounded and ordered sequences of instances (Zubaroglu and Atalay, 2021), the method should continuously update the extracted features or the model using the latest information. Batch algorithms are not ideal for dynamically changing data, so the algorithm must be capable of continuous online (single-pass) learning (Smith and Alahakoon, 2009). This paper proposes improved online vector quantization methods for data streams. Our methods are designed to quickly adapt to the evolution of a data stream with concept drift. Specifically, this study focuses on remove-birth (RB) updating to improve online quantization methods for concept drift.
The inspiration for the RB updating mechanism is derived from the death-birth updating concept originally proposed by Ohtsuki et al. (2006) for an evolutionary game on a graph. This concept revolves around the random selection and elimination of an individual from a node, creating an empty node. A new individual is then introduced to fill this node, inheriting the characteristics or strategies of a neighboring node.
RB updating consists of removing units far from the current data distribution and creating new units around the units on the distribution. This procedure results in a simple and efficient online vector quantization method for data streams. Using a reliable metric to determine whether a unit should be removed is critical for RB updating. In this study, the win probability of a unit is used as the metric. Notably, the win probability is not affected by the value range of the data. As a result, using the win probability as the metric allows us to efficiently decide whether to remove a unit even when the value range of the data changes due to concept drift.
This study proposes three quantization methods based on online k-means (MacQueen, 1967), Kohonen's self-organizing map (SOM) (Kohonen, 1982), and neural gas (NG) (Martinetz and Schulten, 1991). Online k-means is an online version of k-means. SOM is a competitive learning method used for cluster analysis, data visualization, and dimensionality reduction. NG, an alternative to SOM, is used for vector quantization and for representing data as a graph. All three methods are online (incremental) learning techniques suitable for data streams. However, online learning alone is not sufficient to handle data streams, which require mechanisms to forget outdated data and adapt to the latest state of nature (Gama, 2010). Online k-means, SOM, and NG need to be improved because their parameters decay with iterations, so they cannot adapt to the latest state of the data.
To address this issue, we develop three simple incremental vector quantization methods for data streams that apply RB updating to online k-means, SOM, and NG, named online k-means with RB updating (OKRB), SOM with RB updating (SOMRB), and NG with RB updating (NGRB), respectively. This study shows that OKRB, SOMRB, and NGRB exhibit satisfactory performance in vector quantization and, thanks to RB updating, can quickly adapt to concept drift.

Concept drift
In most real-world applications, processes are not strictly stationary, which means that the target concept may gradually change over time (Gama, 2010). This unanticipated evolution of the underlying patterns or distributions of a data stream, resulting in unexpected changes in its statistical properties, is known as concept drift (Zubaroglu and Atalay, 2021). Ramírez-Gallego et al. (2017) have classified concept drift into four types: sudden, gradual, incremental, and recurring. Sudden concept drift refers to abrupt changes in the statistical properties of a data stream. Gradual concept drift is characterized by an evolution in which the number of data points generated with the previous properties gradually decreases, while the number generated with the new properties gradually increases over time. Incremental concept drift involves a step-by-step transformation of the statistical properties of a data stream. Recurring concept drift involves the cyclical alternation of a data stream between two or more characteristics. More details about these types of concept drift can be found in Zubaroglu and Atalay (2021).

Related work
K-means (MacQueen, 1967; Lloyd, 1982) is the simplest and most well-known clustering method. K-means has gained widespread recognition and is considered one of the top ten algorithms used in data mining (Wu et al., 2007). Its popularity is due to its ease of implementation and efficient performance (Haykin, 2009). K-means is also a useful vector quantization method because it transforms a dataset into a set of centroids. The typical k-means algorithm is the Lloyd algorithm (Lloyd, 1982), a well-known batch-learning instance of k-means. Online k-means, also known as sequential k-means or MacQueen's k-means (MacQueen, 1967), uses online learning. In particular, online k-means can be applied to quantization for data streams because online learning frees it from limits on the size of the data.
The most famous and widely used self-organizing map algorithm is Kohonen's self-organizing map (SOM) (Kohonen, 1982). This method represents an input dataset as units, each with a weight vector called the reference vector. SOM can project multidimensional data onto a low-dimensional map (Vesanto and Alhoniemi, 2000). The low-dimensional map is typically a two-dimensional grid because such a grid is easy to visualize (Smith and Alahakoon, 2009). SOM essentially performs topological vector quantization of the input data, reducing its dimensionality while preserving as much of the spatial relationships within the data as possible (Smith and Alahakoon, 2009). Because of these capabilities, SOM is widely used in data mining, especially in unsupervised clustering (Smith and Alahakoon, 2009). The versatility of SOM is demonstrated by its use in a wide range of applications, such as color quantization (Chang et al., 2005; Rasti et al., 2011), data visualization (Heskes, 2001), and skeletonization (Singh et al., 2000).
Neural gas (NG), proposed by Martinetz and Schulten (1991), is an alternative to SOM. With the ability to quantize input data and generate reference vectors, NG constructs a network that reflects the manifold of the input data. One of its strengths is its independence from the value range of the data. However, NG has difficulties with data streams. The root of these problems lies in its time-decaying parameters. While this decay helps NG's network fit the input data more accurately, it reduces its flexibility over time. Thus, in scenarios where the characteristics of the data suddenly change during training, NG cannot adjust its network accordingly. This inherent inflexibility makes NG unsuitable for data stream applications. In addition, keeping the parameters of NG static results in the creation of dead units, that is, units that are nearest to no data point. In particular, NG with non-decaying parameters tends to create dead units when reference vectors are distant from the data. These characteristics pose significant barriers to the application of NG to data streams.
Growing neural gas (GNG) (Fritzke, 1994) is also a kind of SOM alternative (Fiser et al., 2013) and can find the topology of an input distribution (García-Rodríguez et al., 2012). GNG can also quantize input data and create reference vectors from data. GNG changes its network structure by increasing the number of neurons during training. This makes the GNG network remarkably flexible and reflective of the data structure. GNG has a wide range of applications, including topology learning, such as extraction of the two-dimensional outline of an image (Angelopoulou et al., 2011, 2018), reconstruction of 3D models (Holdstein and Fischer, 2008), landmark extraction (Fatemizadeh et al., 2003), object tracking (Frezza-Buet, 2008), anomaly detection (Sun et al., 2017), and cluster analysis (Canales and Chacón, 2007; Costa and Oliveira, 2007; Fujita, 2021).
The vector quantization methods discussed previously, including k-means, SOM, NG, and GNG, are effective for static data processing. However, they face challenges when applied to non-stationary data, such as data streams. To overcome this limitation, much research has been devoted to developing both clustering and vector quantization methods specifically designed for data streams.
Several stream k-means methods have been proposed. Incremental k-means (Ordonez, 2003) is a binary data clustering method based on k-means. StreamKM++ (Ackermann et al., 2012) is based on k-means++ (Arthur and Vassilvitskii, 2007), which improves the initial centroid problem of k-means. These methods can store sufficient statistics and extract features from the entire dataset, even though they use only a portion of the data at each step. However, these methods were designed to cluster large datasets on a computer with small memory, and they do not assume that the characteristics of the data stream change during training. Thus, they may not be able to capture the features of a data stream with new properties because they retain the statistical properties of the old data.
Several researchers have adapted SOM and GNG for data stream analysis. CPSOM (Smith and Alahakoon, 2009) is an online SOM algorithm suitable for large datasets. It incorporates a forgetting factor that allows the network to let go of old patterns and adapt to new ones as they emerge. Silva and Marques (2015) have introduced a variant of SOM for data streams, called ubiquitous SOM, which uses the average quantization error over time to estimate learning parameters. This approach allows the model to maintain continuous adaptability and handle concept drift in multidimensional streams. Ghesmoune et al. (2014, 2015) have proposed G-Stream, an adaptation of GNG designed for data stream clustering.
Data stream analysis often uses a two-step approach consisting of an online phase followed by an offline phase. In the online phase, data points are assigned to microclusters, which summarize the data. This is the data abstraction step. The offline phase produces the clustering result. An example of this approach is CVD-stream (Mousavi et al., 2020), an online-offline density-based clustering method for data streams.

Online quantization methods with remove-birth updating
In this study, remove-birth updating (RB updating) is applied to three different online quantization methods: online k-means, SOM, and NG. This approach allows the methods to quickly adapt to changes in the distribution of the data. Online k-means is a variant of k-means that uses online learning, SOM can project data onto a low-dimensional space, and NG can extract the topology of the input as a graph. By integrating RB updating with these methods, we achieve data quantization, dimensionality reduction, and topology extraction for a data stream.

Metric for RB updating
Concept drift results in a change in the characteristics of a data stream. As a result of this change, some units representing the data with the old characteristics may lie outside the data distribution with the new characteristics, as shown in Fig. 1 A and B. RB updating addresses this problem by removing units that are far from the current data distribution and creating new units around the units on the distribution, as shown in Fig. 1 B and C. Developing an effective metric for determining when and which unit to remove, and where to create a unit, is critical to RB updating.
Fritzke (1997) proposed a metric based on error, the difference (distance) between a data point and its corresponding reference vector. However, the effectiveness of this error-based metric depends on the value range of the dataset. Thus, this metric is problematic for data streams undergoing concept drift because there is no guarantee that the value range of a data stream is static.
This study introduces win probability as an alternative metric. A win of a unit means that the unit is closest to an input data point. Unlike error-based metrics, win probability is independent of the value range of the dataset, so it remains effective in RB updating even when the value range of the dataset changes due to concept drift. The metric M_n(t) of unit n is expressed as

M_n(t) = P_{win,n}(t) = c_n(t) / Σ_m c_m(t),

where P_{win,n}(t) refers to the win probability of unit n and c_n(t) refers to the number of wins of unit n. In this study, c_n(t) decays over iterations as

c_n(t+1) = c_n(t) − β c_n(t),

where β is a decay constant. This exponential decay mechanism limits the influence of old data characteristics, thus improving the overall effectiveness of the RB updating process.
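As a concrete illustration, the win-probability metric and its exponential decay can be computed with NumPy as follows. This is a minimal sketch; the count values and β used here are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical win counts c_n(t) for three units (illustrative values)
counts = np.array([5.0, 1.0, 14.0])
beta = 0.01  # decay constant (illustrative value)

# Win probability of each unit: P_win,n = c_n / sum_m c_m
win_prob = counts / counts.sum()   # fractions 0.25, 0.05, 0.70

# Exponential decay applied every iteration: c_n <- c_n - beta * c_n
counts = (1.0 - beta) * counts
```

Note that win_prob sums to one regardless of the scale of the counts, which reflects why the metric is insensitive to the value range of the data.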

Online k-means with RB updating
Online k-means with RB updating (OKRB) is a modification of the standard online k-means algorithm designed to quickly adapt to variations within a data stream. Online k-means, however, suffers from the dead unit problem. Dead units are units (centroids) to which no data points are assigned, often because their initial placement is far from the input dataset. When concept drift moves the data distribution away from some units, the units close to the new distribution gradually move toward it, while those farther away remain static and become dead units. OKRB mitigates this shortcoming through RB updating.
Consider an input data sequence denoted by X = {x_1, x_2, ..., x_t, ...}, where each x_t belongs to the D-dimensional real space R^D. The probability distribution generating X may change during data point generation due to concept drift.

[Figure 1: (A) units fitted to the old distribution; (B) after drift, a unit far from the distribution is removed; (C) a new unit is created near the most-wins unit.]

OKRB contains N units, where each unit represents a centroid. Each unit n is associated with a reference vector w_n ∈ R^D. To accommodate evolving data streams, OKRB iteratively adapts its reference vectors to the incoming data point x_t, eliminating less useful units through RB updating.
In each iteration, OKRB processes a single data point x_t. The algorithm identifies the winning unit n_1, the unit whose reference vector is closest to x_t. The reference vector of n_1 is updated by

w_{n_1} ← w_{n_1} + ε (x_t − w_{n_1}),

where ε is the learning rate. Unlike traditional online k-means, where the learning rate decays over time, in OKRB the learning rate remains static. This is because data streams are unbounded, making the end of the iteration indeterminable.
RB updating in OKRB prunes rarely winning units while introducing new units near frequently winning units. Specifically, at each iteration, the win count c_{n_1} of n_1 is incremented by one. The algorithm then identifies the unit with the maximum number of wins, n_max, and the unit with the minimum number of wins, n_min. If c_{n_min}/c_{n_max} falls below a threshold TH_RB, RB updating is triggered: the unit n_min is discarded and a new unit n_new is added near n_max. In this study, n_new reuses the index of n_min. The reference vector and the win count of n_new are determined by w_{n_new} = (w_{n_max} + w_f)/2 and c_{n_new} = (c_{n_max} + c_f)/2, respectively, where f is the unit closest to n_max. Finally, the win counts of all units decay exponentially according to

c_n ← c_n − β c_n,

with β as the decay rate.
The detailed procedure of the OKRB algorithm is described in Algorithm 1.
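A single OKRB iteration can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (the function name, the default parameter values, and the convention that RB updating fires when the win-count ratio falls below th_rb); it is not the authors' reference implementation.

```python
import numpy as np

def okrb_step(x, W, counts, eps=0.05, th_rb=0.01, beta=0.0001):
    """One OKRB iteration: winner update, RB updating, count decay.

    W: (N, D) reference vectors; counts: (N,) win counters (in-place update).
    """
    # Winner n1: unit whose reference vector is closest to x
    n1 = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Online k-means update with a static learning rate
    W[n1] += eps * (x - W[n1])
    counts[n1] += 1.0

    # RB updating: if the least-winning unit wins too rarely relative to the
    # most-winning unit, remove it and create a new unit near the winner
    n_max, n_min = int(np.argmax(counts)), int(np.argmin(counts))
    if counts[n_min] / counts[n_max] < th_rb:
        d = np.linalg.norm(W - W[n_max], axis=1)
        d[n_max] = np.inf
        f = int(np.argmin(d))                         # unit closest to n_max
        W[n_min] = (W[n_max] + W[f]) / 2.0            # n_min's slot is reused
        counts[n_min] = (counts[n_max] + counts[f]) / 2.0

    # Exponential decay of all win counts: c_n <- c_n - beta * c_n
    counts *= 1.0 - beta
    return n1
```

Feeding a stream of points from a drifting distribution into okrb_step moves the centroids toward the current distribution while units that stop winning are replaced.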

SOM with RB updating
Self-organizing map with RB updating (SOMRB) is based on the standard SOM, with improved adaptation to changes in the data stream through RB updating. SOMRB projects high-dimensional input data onto a low-dimensional map that can be used for clustering, visualization, and dimensionality reduction.
SOMRB contains N units, each with a reference vector w_n ∈ R^D. These units, positioned at p_n = (p_{n1}, p_{n2}) on a two-dimensional map, form a grid structure with p_{n1} and p_{n2} as integers.
[Algorithm 1: Online k-means with RB updating (OKRB).]

In each iteration, SOMRB processes a single data point. After receiving an input data point x_t, SOMRB adjusts its reference vectors. The winning unit n_1, whose reference vector is closest to x_t, is identified. Then all reference vectors are updated according to

w_n ← w_n + ε h(‖p_{n_1} − p_n‖)(x_t − w_n),

where ε is the learning rate and h(·) is the neighborhood function defined as h(d) = exp(−d²/(2σ²)). RB updating is used to remove units that rarely win (dead units) and add new units around units that frequently win. The win count of unit n_1, denoted by c_{n_1}, is incremented at each iteration. The units with the maximum count c_{n_max} and the minimum count c_{n_min} are identified; however, n_max must have neighboring empty vertices on the grid. When c_{n_min}/c_{n_max} falls below the threshold TH_RB, an RB update is performed. This involves removing the minimum winning unit n_min and adding a new unit n_new near the maximum winning unit n_max. In this study, n_new reuses the index of n_min. The new unit is placed on an empty vertex neighboring n_max, chosen randomly if there is more than one empty vertex. This new unit is connected to neighboring units on the grid, and its reference vector is the average of the reference vectors of the neighboring units if it has more than one neighbor. If the new unit has only one neighbor, its reference vector is the average of the reference vectors of its neighbor and the neighboring units of that neighbor.
All win counts decay exponentially according to

c_n ← c_n − β c_n,

where β is the decay rate.
The detailed SOMRB algorithm is given in Algorithm 2.
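The SOMRB reference-vector update can be sketched as follows. The RB step is analogous to OKRB's and is omitted for brevity; the grid layout, names, and parameter values are our own illustrative choices, not the authors' code.

```python
import numpy as np

L = 3                                                     # grid side length
N, D = L * L, 2
rng = np.random.default_rng(0)
W = rng.random((N, D))                                    # reference vectors
P = np.array([(n // L, n % L) for n in range(N)], float)  # grid positions p_n

def somrb_update(x, W, P, eps=0.1, sigma=1.0):
    # Winner n1: unit whose reference vector is closest to x
    n1 = int(np.argmin(np.linalg.norm(W - x, axis=1)))
    # Gaussian neighborhood h(d) = exp(-d^2 / (2 sigma^2)) on grid distance
    d = np.linalg.norm(P - P[n1], axis=1)
    h = np.exp(-d ** 2 / (2.0 * sigma ** 2))
    # Pull every reference vector toward x, weighted by grid proximity to n1
    W += eps * h[:, None] * (x - W)
    return n1

for _ in range(100):
    somrb_update(rng.random(2), W, P)
```

Because the neighborhood weight is computed on the fixed grid positions, units that are grid neighbors of the winner follow it in input space, which is what preserves the map topology.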

Neural gas with RB updating
Neural gas with remove-birth updating (NGRB) is an alternative to SOM for data streams based on NG. NGRB quantizes the input data and generates a network like NG. In addition, NGRB can reconfigure its network structure more quickly through RB updating.
The network generated by NGRB consists of N units, with edges connecting pairs of units. Each unit n has a reference vector w_n and a win counter c_n. Edges between units are neither weighted nor directed. An edge is represented by C_nm, which is 0 or 1; if C_nm = 1, unit n is connected to unit m, and vice versa. Each edge has an age variable, denoted by a_nm, which informs the decision to keep or discard the edge. In NGRB, data points are presented iteratively, and the reference vectors are refined at each iteration.

[Algorithm 2: Self-organizing map with RB updating (SOMRB).]
In each iteration, NGRB processes a single data point. After receiving an input data point x_t, NGRB refines its reference vectors and changes the network topology. The reference vector update procedure is identical to that of NG. First, the neighborhood ranking k_n of unit n is determined from the distance between w_n and x_t, yielding a sequence of units (n_0, n_1, ..., n_k, ..., n_{N−1}) ordered by

‖w_{n_0} − x_t‖ ≤ ‖w_{n_1} − x_t‖ ≤ ... ≤ ‖w_{n_{N−1}} − x_t‖.

Then the reference vectors of all units are updated according to

w_n ← w_n + ε e^{−k_n/λ} (x_t − w_n),

where e^{−k_n/λ} is a neighborhood function and λ determines the number of units that significantly change their reference vectors at each iteration.
At the same time, the network topology evolves in response to the input data. The adaptation procedure of the network topology is also identical to that of NG. The winning unit n_0 and the second nearest unit n_1 are identified. If C_{n_0 n_1} = 0, we set C_{n_0 n_1} = 1 and a_{n_0 n_1} = 0. If C_{n_0 n_1} = 1, we set a_{n_0 n_1} = 0. The age of all edges connected to n_0 is incremented by one, and edges exceeding the prescribed lifetime are removed.
In addition, NGRB dynamically restructures its network using RB updating, which involves removing infrequently winning units and introducing new units around those that win frequently. The win count of the winning unit, denoted by c_{n_0}, is incremented at each iteration. The algorithm then identifies the unit with the maximum wins, n_max, and the unit with the minimum wins, n_min. When RB updating is triggered, n_min is eliminated and a new unit n_new is added around n_max. In this study, n_new reuses the index of n_min. The reference vector and the win count of n_new are determined by w_{n_new} = (w_{n_max} + w_f)/2 and c_{n_new} = (c_{n_max} + c_f)/2, respectively, where f is the neighboring unit of n_max with the maximum wins. If n_max has no neighbors, the unit closest to n_max is taken as unit f. Consequently, n_new is connected to n_max and f, setting C_{n_new n_max} = 1 and C_{n_new f} = 1.
All win counters decay exponentially according to

c_n ← c_n − β c_n,

where β is the decay rate.
The complete NGRB algorithm is given in Algorithm 3.
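The rank-based reference-vector update shared by NG and NGRB can be sketched as follows. Edge aging and RB updating follow the description above and are omitted; the function name and parameter values are our own illustrative choices.

```python
import numpy as np

def ng_update(x, W, eps=0.1, lam=1.0):
    """Rank-based NG-style update of reference vectors W (in place)."""
    # Rank units by distance to x: k_n = 0 for the nearest unit, 1 next, ...
    dists = np.linalg.norm(W - x, axis=1)
    order = np.argsort(dists)                 # (n_0, n_1, ..., n_{N-1})
    k = np.empty(len(W))
    k[order] = np.arange(len(W))
    # w_n <- w_n + eps * exp(-k_n / lambda) * (x - w_n)
    W += eps * np.exp(-k / lam)[:, None] * (x - W)
    return int(order[0]), int(order[1])       # winner and second-nearest unit
```

The returned pair (n_0, n_1) is exactly what the topology-adaptation step uses: an edge C_{n_0 n_1} is created (or its age reset) between them.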
Experimental setup

Comparison methods
The proposed methods are compared with four other methods: online k-means, SOM, NG, and GNG. The online k-means algorithm is given in the appendix section labeled Online k-means.

[Algorithm 3: Neural gas with RB updating (NGRB).]

The detailed descriptions of SOM, NG, and GNG are covered in Fujita (2021). Notably, online k-means, SOM, and NG have parameters that decay over iterations, but in this study they were kept static in order to process data streams. For example, in the case of NG, the learning rate is kept constant so that ε = ε_i = ε_f, where ε_i and ε_f are the initial and final learning rates, respectively.

Initialization
The initialization for OKRB, SOMRB, and NGRB is described in Algorithms 1, 2, and 3, respectively. For all comparison methods except SOM, the elements of the reference vectors are initialized uniformly at random in the range [0, 1). For SOM, the reference vectors are initialized as w_n = (⌊n/L⌋/L, (n mod L)/L, 0, ..., 0), where L = ⌊√N⌋. The units of SOM on the 2D feature map are placed at p_n = (p_{n1}, p_{n2}) = (⌊n/L⌋, n mod L). This initialization strategy makes the network topology of SOM robust against map distortions. Traditionally, reference vectors are either randomly selected data points or derived using an efficient initialization algorithm such as k-means++, because random initialization often creates dead units. However, in the context of a data stream, data points are continuously fed into the system. Furthermore, their characteristics will change due to concept drift, requiring methods to adapt to the data regardless of the state of the reference vectors. Consequently, any method designed for a data stream must not only extract relevant features from the data, but also minimize the generation of dead units, regardless of how the initial reference vectors are set.
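For concreteness, the SOM initialization above can be written out as follows; N = 9 and D = 4 are illustrative values.

```python
import numpy as np

N, D = 9, 4                          # illustrative network and input sizes
L = int(np.floor(np.sqrt(N)))        # grid side length, here L = 3

# Reference vectors: w_n = (floor(n/L)/L, (n mod L)/L, 0, ..., 0)
W = np.zeros((N, D))
for n in range(N):
    W[n, 0] = (n // L) / L
    W[n, 1] = (n % L) / L

# Map positions on the 2D feature map: p_n = (floor(n/L), n mod L)
P = np.array([(n // L, n % L) for n in range(N)])
```

The first two coordinates of the reference vectors thus start on a regular grid matching the map layout, which is what protects the SOM against map distortions at the start of training.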

Evaluation metrics
In this study, the performance of vector quantization algorithms is evaluated using two metrics: the mean squared error (MSE) and the number of dead units N_dead. The MSE quantifies the average squared distance between each data point and its nearest unit:

MSE = (1/M) Σ_i Σ_n k_in ‖x_i − w_n‖²,

where M is the number of data points and k_in indicates the assignment of data points to units: k_is = 1 if s = arg min_n ‖x_i − w_n‖², and k_in = 0 otherwise. The unit n is considered a dead unit if |{i : k_in = 1}| = 0, where |·| is the number of elements in a set. To evaluate the topological properties of the networks generated by SOMRB and NGRB, the average degree and the average clustering coefficient are used. The average degree k̄ is the average number of edges per unit:

k̄ = 2L/N,

where L is the total number of edges and N is the number of units. The average clustering coefficient C is defined as

C = (1/N) Σ_n 2t_n / (e_n(e_n − 1)),

where e_n is the number of edges connected to unit n and t_n is the number of edges among the neighbors of unit n. The frequency of RB updating occurrences, N_RB, indicates how often units change dynamically due to concept drift and is a candidate metric for concept drift detection.
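The MSE and the dead-unit count can be computed directly with NumPy; the following helper is one straightforward realization (the function name is ours).

```python
import numpy as np

def mse_and_dead_units(X, W):
    """X: (M, D) data points; W: (N, D) reference vectors."""
    # Squared distances between every data point and every unit
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)   # shape (M, N)
    nearest = d2.argmin(axis=1)               # index s of the nearest unit
    mse = d2[np.arange(len(X)), nearest].mean()
    # A unit is dead if no data point is assigned to it
    n_dead = len(W) - len(np.unique(nearest))
    return mse, n_dead
```

The average degree and average clustering coefficient can be obtained from the generated network with NetworkX, which the Software section lists among the dependencies.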

Software
The implementation of OKRB, SOMRB, NGRB, online k-means, SOM, NG, and GNG was done in Python using several libraries, including NumPy, NetworkX, and scikit-learn. NumPy facilitated linear algebra computations, while NetworkX aided in network manipulation and the computation of network coefficients. Scikit-learn was used to generate synthetic data. The source code used in this study is publicly available at https://github.com/KazuhisaFujita/RemoveBirthUpdating.

Datasets
In this study, synthetic datasets and three real-world datasets are used to evaluate the proposed methods. The synthetic datasets include Blobs, Circles, Moons, Aggregation (Gionis et al., 2007), Compound, t7.8k (Karypis et al., 1999), and t8.8k (Karypis et al., 1999). Blobs, Circles, and Moons are generated using the make_blobs, make_circles, and make_moons functions from the scikit-learn library, respectively. Blobs is derived from three isotropic Gaussian distributions with default parameters for mean and standard deviation. Circles is composed of two concentric circles generated with the noise and scale parameters set to 0.05 and 0.5, respectively. Moons is composed of two moon-shaped distributions generated with the noise parameter set to 0.05. Aggregation, Compound, t4.8k, and t7.10k are representative synthetic datasets commonly used to evaluate clustering methods. In addition to the synthetic datasets, three real-world datasets from the UCI Machine Learning Repository are used: Iris, Wine, and Digits.

Results
The numerical experiments were performed to evaluate the performance and explore the features of OKRB, SOMRB, and NGRB. All experimental values are averages over 10 runs with random initial values. At each training step, an input vector is selected uniformly at random from a dataset.

Comparison of Network Structures from Synthetic Datasets
Figure 2 shows the unit distributions of OKRB, SOMRB, NGRB, online k-means, SOM, NG, and GNG for the synthetic datasets Blobs, Circles, Aggregation, Compound, t4.8k, and t8.8k. These visualizations provide insight into the structure of the reference vectors and the generated networks.
The figure shows that OKRB, SOMRB, NGRB, and GNG successfully extract the topologies of all datasets. While the SOM-generated network contains some dead units, most of its reference vectors accurately capture the topologies of the datasets. The SOM network topology is a two-dimensional lattice, which often leads to the creation of dead units between clusters. In contrast, online k-means and NG tend to generate dead units when the initial positions of the units are far from the dataset. While this dead unit problem could be mitigated by refining the initial values, such a solution is not feasible for data streams, given their unbounded nature and the potential for changing value ranges. This suggests that methods other than online k-means and NG may be more appropriate for dealing with data streams.

Evaluating Method Performances on Static Datasets
Figure 3 shows the evolution of the mean squared error (MSE) over time for a collection of static datasets: Blobs, Circles, Aggregation, Compound, t4.8k, t7.10k, Iris, Wine, and Digits. All four methods (OKRB, SOMRB, NGRB, and GNG) maintain low MSEs from about 10^4 iterations. In particular, the MSEs of OKRB and NGRB decay rapidly. The performance of OKRB and NGRB is equal to that of GNG and better than that of the other techniques. The MSEs of SOMRB are slightly larger than those of OKRB, NGRB, and GNG, but smaller than those of SOM. Despite its lack of parameter decay and its tendency to generate dead units, SOM shows respectable performance. On the other hand, both NG and online k-means perform poorly, mainly due to the lack of optimal initial reference vector values and of a decay mechanism. However, for the Circles dataset, NG and online k-means perform well because the initial values lie within the data distribution (the range of values for Circles is from −1 to 1).
Figure 4 shows the evolution of the number of dead units over iterations for the static datasets. OKRB and NGRB keep the number of dead units close to zero from about 10^4 iterations. SOMRB's rate of decrease in dead units is slower than that of OKRB and NGRB. Interestingly, despite the two-dimensional lattice bias and the lack of parameter decay, SOM generates a relatively limited number of dead units. However, compared to the methods that use RB updating, the number of dead units produced by SOM is higher. For the Blobs, Circles, Aggregation, t4.8k, t8.8k, and Digits datasets, the number of dead units generated by GNG is approximately zero across all iterations. Conversely, NG and online k-means have a higher number of dead units and a slower rate of decay in the number of dead units.
These results suggest that the proposed methods, namely OKRB, SOMRB, and NGRB, perform sufficiently well on static data. However, due to the constraints of the network topology, SOMRB and SOM show relatively inferior performance compared to OKRB and NGRB. Online k-means and NG, in the absence of parameter decay and efficient initialization algorithms, produce inferior results even on static data. This suggests that online k-means and NG without decaying parameters are also likely unsuitable for stream data.

Performance for data stream
In this subsection, the proposed methods with RB updating are evaluated on data streams. We consider three types of concept drift: sudden, gradual, and recurring, as shown in Fig. 5. The data stream consists of several different datasets. A concept drift event, which occurs every 100,000 iterations, causes a change from one dataset to another.
For sudden concept drift, the dataset responsible for generating the input undergoes an abrupt change.
For gradual concept drift, the dataset generating the data points gradually shifts from the old dataset to the new dataset. The generation probabilities of the old and new datasets are given by p_old = (T_driftstart + T_dur − t)/T_dur and p_new = 1 − p_old, where t is the number of iterations, T_dur is the duration of the drift, and T_driftstart is the iteration at which the drift starts. In this experiment, we set T_dur = 10,000 and the initial T_driftstart to 90,000. For recurring concept drift, two datasets alternately generate the input, switching every 100,000 iterations.
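The gradual drift mechanism above can be sketched as follows; the function names and the clipping of p_old to 1 before the drift and 0 after it are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

T_DUR = 10_000          # duration of the gradual drift (T_dur)
T_DRIFT_START = 90_000  # iteration at which the drift begins (T_driftstart)

def p_old(t):
    """Probability of sampling from the old dataset at iteration t."""
    if t < T_DRIFT_START:
        return 1.0
    if t >= T_DRIFT_START + T_DUR:
        return 0.0
    return (T_DRIFT_START + T_DUR - t) / T_DUR

def next_point(t, old_data, new_data):
    """Draw one input from the mixture defined by p_old / p_new."""
    source = old_data if rng.random() < p_old(t) else new_data
    return source[rng.integers(len(source))]
```

Halfway through the drift (t = 95,000), both datasets are equally likely to generate the next input.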
The experiments in this subsection compute the Mean Squared Error (MSE), the number of dead units, the number of edges, and the average clustering coefficient. These values are computed every 1000 iterations, using the data points collected during the period T_Start < t ≤ T_Start + 1000, where T_Start = 1000n, n = 0, 1, 2, 3, ....

Figure 5: Illustration of the data streams used in this experiment (panels: sudden, gradual, and recurring concept drift). Data1 through data6 correspond to the Aggregation, Blobs, Circles, Compound, t4.8k, and t7.10k datasets, respectively. For both sudden and recurring concept drift, the drifts occur every 100,000 iterations. In the case of gradual concept drift, a gradual transition from one dataset to another is implemented over the same iteration interval. During this drift, the probability of data generation gradually shifts.
Although GNG converges quickly to a low MSE value after the arrival of the first stream, it converges to a high MSE value after each concept drift. Figure 6C shows a rapid convergence of GNG's MSE between iterations 200,000 and 300,000. This rapid convergence is due to the fact that the third dataset, active during this interval, is identical to the first. GNG therefore retains the reference vectors it adapted to data1 even after the first concept drift. Examples of the reference vector distributions of OKRB, SOMRB, and NGRB during gradual concept drift can be found in Reference Vector Evolution in the Appendix.
Figures 6D, E, and F illustrate the progression of the number of dead units for OKRB, SOMRB, NGRB, SOM, and GNG. In all scenarios, the number of dead units of OKRB and NGRB quickly converges to zero after a concept drift. SOMRB's dead units decrease more slowly than those of OKRB and NGRB. SOM's dead units decrease more slowly than those of the proposed methods, and their number also converges to a higher value. For sudden and gradual concept drift, OKRB, SOMRB, NGRB, and SOM produce few dead units between data5 (t4.8k) and data6 (t7.10k), because data5 and data6 are widely distributed with noise. GNG has more dead units than the other methods after the first concept drift.
These observations suggest that OKRB, SOMRB, and NGRB deal with concept drift efficiently. SOM, although not augmented with RB updating, does not perform poorly. GNG, however, is inappropriate for data stream applications.
Figures 7A, B, and C show the evolution of the average degree of SOMRB and NGRB. For sudden and gradual concept drift, the average degree of SOMRB changes rapidly at each drift event, except for the transition from data5 to data6. It shows a continuous decay that does not stabilize within each period. The average degree of NGRB peaks at each drift event, except during the transition from data2 to data3 in the gradual concept drift. The average degree of NGRB then decays rapidly and stabilizes within each drift period. Furthermore, its convergence value differs for each dataset.
While the value ranges for data5 (t4.8k) and data6 (t7.10k) are similar, and the MSEs for these datasets are also similar, there are differences in the convergence values of the average degree and the average clustering coefficient. These results suggest that the two metrics can effectively identify shifts in data stream characteristics (concept drift occurrences) when using SOMRB and NGRB.
Figure 8 shows the evolution of the RB updating frequency in OKRB, SOMRB, and NGRB. A strong peak in update frequency is observed coinciding with the onset of each concept drift. However, the peaks for gradual concept drift are lower than those for the other drift types. In particular, the peak observed during the transition from data5 to data6 is significantly reduced. Figures 8D, E, and F show that the peak shapes for OKRB and NGRB are almost identical. Conversely, the peaks for SOMRB are lower and broader than those for OKRB and NGRB, and are delayed relative to the onset of drift. These findings suggest the potential of the RB updating frequency as an effective indicator for detecting the occurrence of concept drift.
Figures 9A, B, and C show the evolution of the MSEs for OKRB, SOMRB, and NGRB using the error-based metric. For details on RB updating with the error-based metric, see RB Updating Using an Error-Based Metric in the Appendix. The MSEs of the methods using the error-based metric converge rapidly. However, for gradual concept drift, the MSEs of OKRB and NGRB using the error-based metric are larger than those of OKRB and NGRB using the win-probability-based metric during the data4, data5, and data6 phases.
Figures 9D, E, and F show the evolution of the number of dead units of OKRB, SOMRB, and NGRB using the error-based metric. In all tested scenarios, the number of dead units fluctuates significantly. Furthermore, these methods show a larger number of dead units when using the error-based metric than when using the win-probability-based metric. Strikingly, NGRB shows an exceptionally high number of dead units for gradual concept drift. These results suggest that the proposed methods using the win-probability-based metric deal with concept drift proficiently while significantly mitigating the occurrence of dead units.

Conclusions and discussions
In this study, we proposed three improved vector quantization methods using RB updating for data streams (OKRB, SOMRB, and NGRB). These methods demonstrate fast adaptability to concept drift and provide efficient quantization of a dataset. In addition, both SOMRB and NGRB can generate a graph that reflects the topology of the dataset. However, the performance of SOMRB is slightly inferior to that of the other proposed methods. Therefore, OKRB is recommended when only vector quantization of a data stream is required. If the task requires not only quantization but also graph generation from a data stream, NGRB is the more suitable option.
Interestingly, SOM with static parameters performs satisfactorily, although its performance is lower than that of any of the proposed methods. SOM can only learn static datasets because once a map learns and stabilizes, it loses its ability to reshape itself as new structures manifest in the input data (Smith and Alahakoon, 2009). This stability is provided by the decay parameters: parameter decay gives the SOM the flexibility to adapt to a static dataset in the early stages of training while maintaining stability in the later stages.
Significantly, OKRB, SOMRB, and NGRB are not affected by changes in the value range of the data due to concept drift. This range independence is attributed not only to RB updating but also to a property of online k-means, SOM, and NG, namely the independence of their hyperparameters from the value range of the data. The learning rates of online k-means, SOM, and NG, as well as the parameters of the neighborhood functions of NG and SOM, do not need to be adjusted based on the value range of the data, even when the characteristics of a dataset change. For example, we typically do not change the learning rate whether the maximum value of the data is 100 or 1. In addition, the parameters of the neighborhood functions of SOM and NG depend on the locations of the units on the feature map and on the neighborhood ranking, respectively. In fact, in this study, the proposed methods, SOM, and NG achieve quantization of all the datasets without any dataset-specific parameter changes.
The proposed methods may also be useful for drift detection. As shown in Figure 6, both the Mean Squared Error (MSE) and the number of dead units increase significantly with concept drift. Similarly, Figure 7 shows significant changes in the average degree and clustering coefficient when concept drift occurs. Furthermore, these two measures take different values for each data characteristic. Figure 8 shows that the frequency of RB updating exhibits spike-like fluctuations in response to concept drift. Therefore, these measures could be effectively used to detect concept drift. Moreover, using several of these measures and the proposed methods in combination should enable even more accurate drift detection.
The proposed methods can be used effectively within a two-step approach, commonly known as an online-offline method for data streams. In this approach, the data is first transformed into sub-clusters. These smaller groups are then treated as individual objects and combined into larger clusters. This method is valuable when dealing with large datasets. In the first step of this approach, the reduction of dead units is critical, as dead units can become outliers. In data mining, outliers can negatively affect processing accuracy, making outlier detection a key aspect of the field (Zubaroglu and Atalay, 2021). When concept drift occurs, data points produced by the data stream with old properties, and the centroids derived from them, could become outliers.
The proposed methods quickly reduce the dead units caused by concept drift (i.e., they quickly adapt to the new properties of a data stream), thereby efficiently preprocessing the data stream with new properties.Therefore, using the proposed methods for the first step will improve the performance of the subsequent step.

Appendix A Online k-means
Online k-means is an online variant of k-means. This study uses an algorithm derived from the one described in Abernathy and Celebi (2022). For data streams, the online k-means used in this study does not consider time decay of the learning parameter. The pseudocode for implementing online k-means can be found in Algorithm 4.
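As a minimal illustration of the no-decay setting described above (a sketch, not the exact algorithm of Abernathy and Celebi (2022)), a single online k-means step moves the winning reference vector toward the input with a constant learning rate ε:

```python
import numpy as np

def online_kmeans_step(W, x, eps=0.05):
    """One online k-means update with a constant learning rate
    (no time decay, matching the setting used in this study).
    W: (N, D) reference vectors, x: (D,) input; W is updated in place."""
    s1 = np.argmin(np.linalg.norm(W - x, axis=1))  # winning (nearest) unit
    W[s1] += eps * (x - W[s1])                     # move the winner toward x
    return s1
```

With a constant ε, the reference vectors keep tracking the input distribution indefinitely instead of freezing, which is why a decay schedule is omitted for streams.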

B Parameter optimization
In this study, grid search is used for hyperparameter optimization. As a fundamental method for hyperparameter tuning, grid search is both easy to implement and widely accepted (Bergstra and Bengio, 2012; Feurer and Hutter, 2019).
For the grid search, the parameter sets for OKRB, SOMRB, NGRB, online k-means, SOM, NG, and GNG are detailed in Table 2. For each combination of parameters, these methods quantize three different datasets ten times each: Blobs, Circles, and Moons.
The goal of the grid search is to find the parameter set that minimizes the normalized mean squared error (NMSE), computed as

NMSE = (1/|M|) Σ_{m∈M} MSE_m / MSE_max,m,

where M is the set of datasets (Blobs, Circles, Moons). The MSE for a dataset m is calculated as

MSE_m = (1/N_m) Σ_{i=1}^{N_m} min_n ||x_i − w_n||²,

where N_m is the number of data points in dataset m and x_i is a data point in dataset m. MSE_max,m is the maximum MSE obtained over all parameter combinations for a given dataset m.
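Assuming the quantization MSE is the mean squared distance of each data point to its nearest reference vector, the NMSE computation can be sketched as follows (function names are illustrative):

```python
import numpy as np

def mse(W, X):
    """Quantization MSE of reference vectors W over dataset X:
    mean squared distance of each point to its nearest unit."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # (T, N)
    return float(np.mean(d.min(axis=1) ** 2))

def nmse(mse_per_dataset, mse_max_per_dataset):
    """Average of per-dataset MSEs, each normalized by the maximum MSE
    observed over all parameter combinations for that dataset."""
    ratios = [m / m_max for m, m_max in zip(mse_per_dataset, mse_max_per_dataset)]
    return sum(ratios) / len(ratios)
```

Normalizing by MSE_max,m puts the three datasets on a comparable scale before averaging, so no single dataset dominates the parameter selection.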

C Evolution of the reference vector
Figures 10, 11, and 12 show the evolution of the reference vectors in OKRB, SOMRB, and NGRB, respectively, along with the distribution of the generated data points during the gradual concept drift (transition from Aggregation to Blobs).This drift phase occurs within the time step range of t = 90000 to t = 100000.The plotted data points are generated within the time span of t − 1000 to t.
During this gradual concept drift, all three methods successfully capture the topological changes and integrate the old and new features of the data.In particular, after the drift, the reference vectors of OKRB and NGRB, representing the old dataset, quickly disappear.In contrast, for SOMRB, the reference vectors representing the old dataset decrease more slowly.

D RB updating using error-based metric
Each unit has an associated error term, E_n, initialized to 0. At each iteration, the error term of the winning unit, E_s1, is updated as

E_s1 ← E_s1 + ||x_t − w_s1||²,

where x_t is the input vector and w_s1 is the reference vector of the winning unit. The error term E_n of every unit then undergoes a decay process,

E_n ← (1 − β) E_n,

where β is the decay rate. In addition to the error term, each unit is also associated with a utility term, U_n, which is initialized to 0. At each iteration, the utility of the winning unit, U_s1, is updated according to

U_s1 ← U_s1 + ||x_t − w_s2||² − ||x_t − w_s1||², (16)

where s_2 is the second-nearest unit to the input vector x_t. Like the error term, the utility U_n decays at each step,

U_n ← (1 − β) U_n.

In RB updating with an error-based metric, the unit n with minimum utility U_n is removed and a new unit is created near the unit q with maximum error E_q if the following conditions are satisfied:
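A sketch of this error/utility bookkeeping follows; the multiplicative (1 − β) decay form is an assumption, since the text only states that E_n and U_n decay with rate β:

```python
import numpy as np

def update_error_utility(E, U, W, x, beta=0.0005):
    """One iteration of error/utility bookkeeping for the error-based
    RB metric. E, U: (N,) error and utility terms; W: (N, D) reference
    vectors; x: (D,) input. Arrays are updated in place."""
    d = np.linalg.norm(W - x, axis=1)
    s1, s2 = np.argsort(d)[:2]           # nearest and second-nearest units
    E[s1] += d[s1] ** 2                  # accumulate the winner's error
    U[s1] += d[s2] ** 2 - d[s1] ** 2     # how much s1 improves on s2
    E *= (1.0 - beta)                    # decay all error terms
    U *= (1.0 - beta)                    # decay all utility terms
    return s1
```

A unit with low utility is one whose removal would barely increase the quantization error, which is why it is the candidate for removal.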

Figure 1 :
Figure 1: This figure provides a schematic representation of the Remove-Birth (RB) updating process. Each dot represents a data point, while the open circles represent units. (A) shows the initial distribution of data points and their corresponding units. (B) shows the change in data distribution and the introduction of new data points, marked in red, resulting from concept drift. At the same time, the units retain their positions from the initial data distribution. (C) shows the removal of the least frequently winning unit, marked by the blue dotted circle. A new unit, marked by the red dotted circle, is then introduced near the most frequently winning unit, marked by the red-filled circle.
17: if c_nmin / c_nmax < TH_RB then
18:   Remove unit n_min
19:   Add a new unit n_min on the empty vertex neighboring unit n_max
20:   Establish edges between unit n_min and its neighboring units
21:   M_min = {n | n is a neighbor of n_min}
22:   if |M_min| > 1 then
23:     w_nmin = (1/|M_min|) Σ_{n∈M_min} w_n
24:     c_nmin = (1/|M_min|) Σ_{n∈M_min} c_n
25:   else ▷ |M_min| = 1 and unit n_min connects with only unit n_max
26:     M_max = {n | n is a neighbor of n_max, excluding n_min} ∪ {n_max}
27:     if |M_max| > 1 then
28:

Algorithm 3
Neural Gas with RB updating (NGRB)
Require: X = {x_1, ..., x_t, ...}, N
1: Initialize:
2:   w_n ← (ξ_1, ..., ξ_d, ..., ξ_D), n = 1, ..., N, where ξ_d ∈ [0, 1) is a uniform random value
3:   c_n ← 0, n = 1, ..., N
⋮
     w_n ← w_n + ε e^{−k_n/λ} (x_t − w_n), n = 1, ..., N
⋮    Increment the ages of the edges emerging from the unit n_1
17:  Remove the edge with a_nm > a_max
⋮    if c_nmin / c_nmax < TH_RB then
23:    Remove unit n_min
24:    Create a new unit at n_min
25:    M = {n | n is a neighbor of n_max}
⋮
where c_n is the clustering coefficient of the unit n. The clustering coefficient c_n is calculated as c_n = 2t_n / (k_n(k_n − 1)), where t_n is the number of triangles around the unit n and k_n is the number of edges formed by the unit n. If k_n < 2, then c_n = 0.
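The local clustering coefficient defined above can be computed directly from the network's adjacency structure; the dict-of-sets graph representation here is an illustrative choice:

```python
def clustering_coefficient(adj, n):
    """Local clustering coefficient c_n = 2*t_n / (k_n*(k_n - 1)),
    where t_n is the number of triangles through unit n and k_n its
    degree; c_n = 0 when k_n < 2.
    adj: dict mapping each unit to the set of its neighbors."""
    neighbors = adj[n]
    k = len(neighbors)
    if k < 2:
        return 0.0
    # Count edges among the neighbors of n; each such edge closes one triangle
    t = sum(1 for a in neighbors for b in neighbors
            if a < b and b in adj[a])
    return 2.0 * t / (k * (k - 1))
```

For example, a unit whose two neighbors are themselves connected forms one triangle and has a coefficient of 1, whereas a unit on a simple chain has a coefficient of 0.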

Figure 2 :Figure 3 :
Figure 2: 2D plots of reference vectors and distributions of data sets.The first, second, third, fourth, fifth, sixth, and seventh rows show reference vectors generated by OKRB, SOMRB, NGRB, online k-means (OKMEANS), SOM, NG, and GNG, respectively.The data points are represented by cyan dots, while the black dots denote units.The black lines symbolize the edges of the networks.

Figure 6
Figure 6A, B, and C show the MSE evolution of OKRB, SOMRB, NGRB, SOM, and GNG for all types of concept drift. OKRB, SOMRB, NGRB, and SOM quickly converge to low MSE values. However, the MSE of SOM is consistently higher than that of the proposed methods with RB updating.
Figure 6: A, B, and C show the evolution of the Mean Squared Errors (MSE) under sudden, gradual, and recurring concept drift scenarios, respectively. D, E, and F show the evolution of the number of dead units under the same scenarios: sudden, gradual, and recurring concept drift, respectively.

Figure 7 D
Figure 7D, E, and F show the evolution of the average clustering coefficient of SOMRB and NGRB. Since the units of SOMRB are placed on the vertices of a 2D grid, they cannot form triangular clusters, resulting in a clustering coefficient of zero. In contrast, NGRB's clustering coefficient shows a peak at each drift, except for the transition from data5 to data6 in the gradual concept drift. Similar to its degree, NGRB's clustering coefficient decays rapidly and stabilizes within each drift period. Furthermore, these convergence values differ for each dataset.
Figure 7: A, B, and C show the evolution of the average degree under sudden, gradual, and recurring concept drift scenarios, respectively. D, E, and F show the evolution of the average clustering coefficient under the same scenarios: sudden, gradual, and recurring concept drift, respectively.

Figure 8 :
Figure 8: A, B, and C show the evolution of the frequency of RB updating in OKRB, SOMRB, and NGRB under sudden, gradual, and recurring concept drift, respectively. D, E, and F provide close-up views of RB updating occurrences within the iteration range of 80,000 to 110,000. These close-up views reveal the detailed behavior of the RB updating occurrences during this interval.

Table 1 :
Characteristics of datasets