Density Gain-Rate Peaks for Spectral Clustering

Clustering is challenged by varying shapes of sample distributions, such as line and spiral shapes. Spectral clustering and density peak clustering are two feasible techniques for addressing this problem, and both have attracted much attention from the academic community. However, spectral clustering still cannot handle some shapes of sample distributions well in the space of extracted features, and density peak clustering suffers performance problems because it cannot mine the local structures of data or deal well with non-uniform distributions. To solve these problems, we propose density gain-rate peak clustering (DGPC), a new type of density peak clustering method, and then embed it in spectral clustering to improve performance. Firstly, to handle non-uniform sample distributions well, we propose the density gain-rate for density peak clustering. The density gain-rate is based on the assumption that the density around a cluster center grows as the radius shrinks. Even under non-uniform distributions, a cluster center in a low-density region will still have a significant density gain-rate and can thus be detected. We incorporate the density gain-rate into density peak clustering to construct the DGPC method. Then, in the framework of spectral clustering, we use our new density peak clustering to cluster the samples by features extracted from a similarity graph of these samples, such as a neighbor-based similarity graph or a self-expressiveness similarity graph. Compared with previous spectral clustering and density peak clustering methods, our method yields better clustering performance on varying shapes of sample distributions. The experiments measure the performance of our clustering method and existing clustering methods by NMI and ACC on seven real-world datasets to illustrate the effectiveness of our method.


I. INTRODUCTION
Clustering is widely used to extract potentially useful information in unsupervised learning environments. It aims to group samples so that samples within the same group exhibit higher similarity than those in different groups [14], [15], [30]. Traditional clustering methods, such as k-means and k-medoids [24], [28], only cluster well on samples with spherical distributions [36]; they often fail on nonspherical sample distributions.
Spectral clustering (SC) is a significant way to improve the performance of traditional clustering methods on varying shapes of sample distributions [1]. A group of SC methods have been proposed in recent years [11], [38], [40]. They first use a similarity graph to extract features of the samples, then use k-means to cluster these samples in the space of the extracted features [4]. In certain cases, samples with nonspherical distributions become spherically distributed after the feature extraction mines their local structures, so k-means can cluster them well. However, in real-world datasets, samples with nonspherical distributions often remain nonspherical after feature extraction [26], [47].
Density peak clustering (DPC) emerges as another essential way to handle varying shapes of sample distributions [36]. DPC can recognize clusters regardless of the shapes of their sample distributions, as it clusters the samples by their densities [7], [12]. Recently, a variety of approaches based on DPC have also succeeded in recognizing clusters under varying shapes of sample distributions. Jiang et al. proposed the GDPC method [23], which uses gravitation theory and nearby distance, together with an alternative decision graph, to identify cluster centers and outliers more accurately than DPC. Du et al. [10] contributed a density peak clustering based on k-nearest neighbors and PCA, which provides another option for local density computation by introducing the idea of k nearest neighbors (KNN) into DPC. Xu et al. proposed the Denpehc method, which directly generates clusters on each possible clustering layer and uses a grid granulation framework to cluster large-scale and high-dimensional datasets [44]. Xie et al. proposed the FKNN-DPC method, which computes the local density of each sample relative to its K nearest neighbors, independent of the cutoff distance for any size of dataset, and then assigns the remaining points to the most probable clusters by fuzzy weighted K-nearest neighbors [43]. Wu et al. proposed the DBG method, which clusters the samples rapidly by using only the distances between a much smaller number of grid nodes instead of calculating the Euclidean distances between all samples in the dataset [42].
Unfortunately, DPC and its variants encounter performance problems: DPC cannot capture the local structures of data well under non-uniform distributions. Some researchers proposed DP-SC [26], which uses DPC to cluster the samples after feature extraction in SC. Though DP-SC can mine the local structure of samples better than DPC, it still cannot handle non-uniform sample distributions well in the space of extracted features.
To solve the above problem, we propose density gain-rate peak clustering (DGPC), which is inspired by the information gain ratio [35], and then embed DGPC in spectral clustering. It is based on the assumption that the density around a cluster center grows as the radius shrinks. Firstly, for each sample, DGPC computes its density gain-rate, which represents the rate of gain in its density as the radius shrinks. Thus, even under non-uniform distributions, the cluster center of a low-density region will have a significant density gain-rate, just like a cluster center in a high-density region. Then DGPC clusters the samples under the following three assumptions.
• A cluster center should have a higher density gain-rate than that of its neighbors.
• There should be a long distance between a cluster center and any point with a higher density gain-rate than that of this center.
• Each sample point, except the cluster centers, should be in the same cluster as its nearest neighbor among the sample points with higher density gain-rates.
Based on DGPC, we propose two new DGPC-SC methods, which embed DGPC into SC with a neighbor-based similarity graph and a self-expressiveness similarity graph, respectively. Compared with previous SC and DPC methods, our DGPC-SC methods better cluster samples under nonspherical and non-uniform distributions. Fig. 1 illustrates the advantage of using DGPC to cluster non-uniform samples by extracted features in SC. Fig. 1 (a) shows the unbalance-spiral dataset, where the sample distributions are spiral-shaped; different colors indicate different classes. The red samples greatly outnumber the blue ones, which simulates a non-uniform sample distribution. The clustering results of different methods (k-means, DPC, SC, DP-SC, DGPC-SC) on this dataset are shown in Fig. 1 (b)-(f). For a fair comparison, SC, DP-SC and DGPC-SC use the same similarity graph (e.g., the K-NN graph, which is simple to build and well known) to extract features. As we can see from Fig. 1, k-means can hardly cluster this dataset well because it fails to handle nonspherical sample distributions. DPC struggles with the non-uniform samples, so even though most of the sample distributions obey its assumptions, it still cannot cluster this dataset well. SC and DP-SC obtain better results than k-means and DPC, but their performance is still not satisfactory. DGPC-SC yields an improved clustering performance on this dataset.
The key contributions of this paper are summarized as follows. Firstly, we propose density gain-rate to mine the structure of data under non-uniform distributions. Secondly, we combine density gain-rate into DPC method to construct our DGPC method. Finally, based on DGPC we propose a new SC method, DGPC-SC.
The rest of this paper is organized as follows. Section 2 reviews DPC and two SC methods, Ncut-SC and ℓ2-SC. In Section 3, we present our DGPC and propose two new SC methods, Ncut-DGPC and ℓ2-DGPC, by embedding DGPC into SC with the neighbor-based similarity graph and the self-expressiveness similarity graph, respectively. In Section 4, we connect DGPC with related works, including DPC and DENCLUE. Section 5 shows the experimental results of our clustering methods. We conclude in Section 6.

II. PRELIMINARIES
Our work is based on DPC and two SC methods, therefore we review them in this section. For convenience, we list the notations in Table 1.

A. DPC METHODS
DPC is able to recognize clusters regardless of their shapes [36]. It is based on the following three assumptions.
• A cluster center should have a higher local density than that of its neighbors.
• There should be a long distance between a cluster center and any point with a higher local density than that of this center.
• Each sample point, except the cluster centers, should be in the same cluster as its nearest neighbor among the sample points with higher local densities.
DPC first defines two variables, $\rho_i$ and $\delta_i$, for each $x_i$. $\rho_i$ is the local density of $x_i$, defined as
$$\rho_i = \sum_{j} \chi(d_{i,j} - d_c),$$
where $\chi(\cdot)$ is the sign function, with $\chi(x) = 1$ if $x < 0$ and $\chi(x) = 0$ otherwise, $d_c$ is a cutoff distance, and $d_{i,j}$ is the Euclidean distance between $x_i$ and $x_j$. A commonly used strategy to tune $d_c$ is as follows [36]:
$$d_c = d_{\lceil n^2 \times p/100 \rceil},$$
where $p \in [0, 100]$ and $d_1 \le d_2 \le \cdots \le d_{n^2}$ are the Euclidean distances between all pairs of samples, sorted in ascending order. $\delta_i$ is the minimum Euclidean distance between $x_i$ and any point with a higher local density than $x_i$:
$$\delta_i = \min_{j:\, \rho_j > \rho_i} d_{i,j}.$$
According to DPC, each cluster center should have both a high $\rho_i$ and a large $\delta_i$. Based on this guideline, there are two ways to select the cluster centers. One is to select them manually from a decision graph, which has $\rho$ and $\delta$ as its axes and visualizes which sample points can serve as cluster centers. The other is to select the points with large $\gamma_i$ as cluster centers [36], [43], where
$$\gamma_i = \rho_i \times \delta_i.$$
After the cluster centers are selected, DPC assigns each remaining sample point $x_i$ to the cluster of its nearest neighbor among the sample points with higher local densities than $x_i$.

B. SC METHODS
In an SC framework, one extracts features from a similarity graph of the samples and then uses k-means to cluster the samples in the space of the extracted features [6], [9]. Generally speaking, two kinds of similarity graphs are often used in SC methods: the neighbor-based and the self-expressiveness graphs [3], [11], [33].
Ncut-SC is a representative SC method based on the neighbor-based similarity graph [38]. Given a sample matrix $X = [x_1; x_2; \ldots; x_n]$, this method first constructs a neighbor-based similarity graph with a weighted adjacency matrix $W$. Each element $w_{i,j}$ of $W$ is defined as follows [34]:
$$w_{i,j} = \exp\!\left(-\frac{d_{i,j}^2}{2\sigma^2}\right) \ \text{if } x_j \in N_i, \quad w_{i,j} = 0 \ \text{otherwise},$$
where $N_i$ is the set of nearest neighbors of $x_i$ and $\sigma$ is a free parameter. The features that represent the manifold of this graph well can then be extracted by the following optimization problem:
$$\min_{y} \frac{y^T L y}{y^T D y},$$
where $D$ is a diagonal matrix with the $i$th diagonal element $d_{i,i} = \sum_j w_{i,j}$ and $L = D - W$ is the Laplacian matrix. The optimal solution $y$ to this problem is the normalized eigenvector corresponding to the smallest nonzero eigenvalue of $D^{-1}L$, with the normalization $y^T y = 1$ [38].
Although this eigenvector represents part of the manifold structure of the above graph, it alone is not enough for ideal clustering performance. The next normalized eigenvectors, which are suboptimal solutions to (7) corresponding to the next few small eigenvalues, also contain useful partitioning information. Thus Ncut-SC selects all of these normalized eigenvectors as the extracted features.
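The Ncut-SC feature extraction reviewed above can be sketched as follows. This is an illustrative implementation under our own naming (`ncut_embedding`, `n_feat`); it builds the Gaussian-weighted k-NN graph, forms $L = D - W$, and keeps the eigenvectors of $D^{-1}L$ for the smallest nonzero eigenvalues.

```python
import numpy as np

def ncut_embedding(X, k=5, sigma=1.0, n_feat=2):
    """Extract Ncut-SC spectral features: build a k-NN similarity graph with
    Gaussian weights, then take eigenvectors of D^{-1} L corresponding to the
    smallest nonzero eigenvalues. A sketch of the pipeline reviewed above."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    W = np.zeros((n, n))
    # Gaussian weights on the k nearest neighbours of each point
    nn = np.argsort(d, axis=1)[:, 1:k + 1]
    for i in range(n):
        W[i, nn[i]] = np.exp(-d[i, nn[i]] ** 2 / (2 * sigma ** 2))
    W = np.maximum(W, W.T)          # symmetrize the adjacency matrix
    D = np.diag(W.sum(axis=1))
    L = D - W                       # graph Laplacian
    # generalized problem L y = lambda D y  <=>  eigenvectors of D^{-1} L
    vals, vecs = np.linalg.eig(np.linalg.inv(D) @ L)
    order = np.argsort(vals.real)
    Y = vecs[:, order[1:n_feat + 1]].real   # skip the trivial eigenvector
    return Y / np.linalg.norm(Y, axis=0, keepdims=True)
```

In the full method, k-means (or, later in this paper, DGPC) is then run on the rows of the returned embedding.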
As another crucial SC method, ℓ2-SC is based on the self-expressiveness similarity graph [33]. This method first constructs a regression coefficient matrix $G$, where each column vector $g_i$ of $G$ is obtained by solving the optimization problem
$$\min_{g_i} \|x_i - X g_i\|_2^2 + \lambda \|g_i\|_2^2 \quad \text{s.t.} \quad e_i^T g_i = 0,$$
where $\lambda$ is a free parameter and $e_i$ is the $i$th standard orthogonal basis vector of $R^n$; the constraint rules out the trivial self-representation. The optimal solution has a closed form in terms of the sample matrix $X$, with $P = (XX^T + \lambda I)$ and $Q = X^T P$. After the samples are projected into the linear space, ℓ2-SC handles errors by applying a hard thresholding operator $H_k(\cdot)$ to each $g_i$, which keeps the $k$ largest elements of $g_i$ and zeroes the others [33]. Then ℓ2-SC computes the weighted adjacency matrix $W = G^T + G$ and uses this matrix to compute the Laplacian matrix $L$, as in Ncut-SC. Finally, this method selects the normalized eigenvectors corresponding to a few small nonzero eigenvalues of $L$ as the extracted features.
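The self-expressiveness graph construction above can be sketched as follows. Rather than the cited closed form (whose exact expression may differ in detail), this illustrative version solves each ridge regression directly, excludes the self-representation by dropping $x_i$ from the dictionary, applies the hard threshold $H_k$, and symmetrizes with absolute values to obtain a nonnegative adjacency; all names are ours.

```python
import numpy as np

def l2_graph(X, lam=0.5, k=3):
    """Build a self-expressiveness similarity graph: represent each sample as a
    ridge-regression combination of the other samples, keep the k largest
    coefficients (hard thresholding H_k), and symmetrize. A sketch of the
    l2-graph idea reviewed above; samples are the rows of X."""
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        idx = [j for j in range(n) if j != i]    # exclude self-representation
        A = X[idx].T                             # d x (n-1) dictionary
        # ridge solution of min ||x_i - A g||^2 + lam ||g||^2
        g = np.linalg.solve(A.T @ A + lam * np.eye(n - 1), A.T @ X[i])
        keep = set(np.argsort(np.abs(g))[-k:].tolist())
        for r, j in enumerate(idx):
            G[j, i] = g[r] if r in keep else 0.0
    return np.abs(G) + np.abs(G.T)               # nonnegative adjacency W
```

The returned matrix plays the role of $W$ above; its Laplacian is then processed exactly as in Ncut-SC.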
The above SC methods can be combined in a one-after-the-other way. To take advantage of multiple clustering methods, another strategy is ensemble clustering [41]. Recently, many effective ensemble clustering methods have been proposed. Huang et al. proposed robust ensemble clustering using probability trajectories, developed from sparse graph representation and probability trajectory analysis; they designed an elite neighbor selection strategy that identifies uncertain links with locally adaptive thresholds and builds a sparse graph from a small number of probably reliable links [18]. Furthermore, Huang et al. developed locally weighted ensemble clustering based on ensemble-driven cluster uncertainty estimation and a local weighting strategy, estimating the uncertainty of each cluster from the cluster labels of the entire ensemble by an entropic criterion [19]. Besides, Huang et al. presented enhanced ensemble clustering via fast propagation of cluster-wise similarities, which first builds a cluster similarity graph to generate a new cluster-wise similarity matrix and then obtains an enhanced co-association matrix by mapping this matrix from the cluster level to the object level [20].

III. DGPC AND DGPC-SC
We first propose the DGPC method to better deal with non-uniform sample distributions. Then, we construct our two DGPC-SC methods by embedding DGPC into SC with the neighbor-based similarity graph and the self-expressiveness similarity graph, respectively. Compared with previous SC methods, our two DGPC-SC algorithms can better cluster samples under non-spherical and non-uniform distributions.

A. DENSITY GAIN-RATE PEAK CLUSTERING
To solve DPC's problem with non-uniform distributions, we define the gain rate of density. It is based on the assumption that the density around a cluster center grows as the radius shrinks. Firstly, we define the density of $x_i$ at radius $d$, $\rho_i^{d}$, as the local density computed with cutoff distance $d$. Based on this renewed density, we define the density gain of $x_i$, $G_i$, from a large radius $d_q$ to a small radius $d_t$, where $d_t < d_q$. The gain rate of density of $x_i$, $GR_i$, is then defined from this gain (Eq. (12)).
FIGURE 2. An example of clustering non-uniform distribution data using DGPC.
In DGPC, we select the sample points with large $\gamma_i$ as cluster centers, where
$$\gamma_i = GR_i \times \delta_i,$$
and $\delta_i$ is the minimum Euclidean distance between $x_i$ and any point with a higher density gain-rate than $x_i$. Then, we assign each remaining sample point $x_i$ to the cluster of its nearest neighbor with a higher density gain-rate than that of $x_i$. The main algorithmic steps of DGPC are listed in Algorithm 1.
Fig. 2 shows the advantage of our density gain-rate on a synthetic dataset. As we can see, the two clusters in this dataset have non-uniform distributions. DPC cannot accurately cluster the samples of these two clusters, but DGPC can, with the assistance of the density gain-rate. Like DPC, DGPC can also discover outliers in the data. The difference is that DPC regards samples with low densities as outliers, whereas DGPC regards samples with low GRs as outliers. The outliers discovered by DGPC on the above synthetic dataset are shown as black points in Fig. 3. From this result, we can see that DGPC discovers the outliers in the data well.
Although the time complexity of DGPC is O(n^2 d), the same as that of DPC, the time complexity of DGPC-SC depends heavily on the corresponding SC method. Unfortunately, because of this time complexity, SC methods, including our DGPC-SC, are hard to apply to big data.
Algorithm 1 DGPC
1: Calculate GR_i of each x_i by (12);
2: Calculate δ_i of each x_i by (14);
3: Calculate γ_i of each x_i by (13);
4: Sort all the samples by γ = {γ_1, γ_2, . . . , γ_n} in descending order and select the first n_c points in the sorted list as cluster centers;
5: Assign each sample point x_i, except those selected as cluster centers, to the same cluster as its nearest neighbor x_j with GR_j > GR_i, thus forming the set of n_c clusters C = {C_1, C_2, . . . , C_{n_c}};
6: Return C;
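For concreteness, the steps of Algorithm 1 can be sketched as follows. The exact forms of Eqs. (12)-(14) are given by the paper's displayed equations, so the gain-rate used below is only one plausible reading, adopted purely for illustration: the density at radius $d$ counts the neighbours within $d$ per unit radius, and $GR_i$ is the ratio of the densities at $d_t$ and $d_q$. All function and variable names are ours.

```python
import numpy as np

def dgpc(X, d_t, d_q=None, n_c=2):
    """A sketch of DGPC following Algorithm 1, under an ASSUMED gain-rate
    formula (the paper's exact Eq. (12) may differ): density at radius d is
    the neighbour count within d divided by d, and GR is their ratio."""
    if d_q is None:
        d_q = 2 * d_t                       # default, as in our experiments
    n = X.shape[0]
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    rho_t = ((d <= d_t).sum(axis=1) - 1) / d_t   # density at small radius
    rho_q = ((d <= d_q).sum(axis=1) - 1) / d_q   # density at large radius
    GR = rho_t / np.maximum(rho_q, 1e-12)        # assumed density gain-rate
    # delta: distance to the nearest point with a strictly higher gain-rate
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(GR > GR[i])[0]
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    gamma = GR * delta
    centers = np.argsort(gamma)[-n_c:]           # step 4: top-gamma centers
    labels = -np.ones(n, dtype=int)
    labels[centers] = np.arange(n_c)
    # step 5: assign points in decreasing GR order, so each point's nearest
    # higher-GR neighbour is already labelled (assumes the top-GR points
    # were selected as centers, which holds for well-separated clusters)
    for i in sorted(range(n), key=lambda j: -GR[j]):
        if labels[i] == -1:
            higher = np.where(GR > GR[i])[0]
            labels[i] = labels[higher[np.argmin(d[i, higher])]]
    return labels, GR
```

Under this reading, a sparse cluster's center still attains a high $GR$ because its neighbour count concentrates at small radii relative to large ones, which is exactly the behaviour Fig. 2 illustrates.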

B. DGPC-SC METHODS
In SC, the spectral embedding serves as a better, low-dimensional representation of the original data. However, nonspherical and non-uniform sample distributions still exist in this representation. Most SC methods use k-means to cluster the samples in this representation, but k-means cannot handle nonspherical sample distributions well. DP-SC uses DPC instead; however, DPC cannot deal well with non-uniform sample distributions. We therefore use DGPC to handle the non-uniform and nonspherical sample distributions in this representation, which leads to better clustering results.
Based on our framework, we improve two typical SC methods, Ncut [38] and ℓ2-SC [33], to obtain our two DGPC-SC methods: Ncut-DGPC and ℓ2-DGPC. Furthermore, instead of the plain distance, an effective kernel, KROD, is used to measure the differences between samples in our methods [22]. The two new algorithms are presented in Algorithm 2 and Algorithm 3. Fig. 4 illustrates the advantage of using DGPC in SC. Fig. 4 (a) and (c) show the Lung dataset in the space of the extracted features under the Euclidean distance and KROD, respectively. Fig. 4 (b) and (d) show the corresponding clustering results of Ncut and Ncut-DGPC. As we can see from Fig. 4 (a) and (c), even in the space of the extracted features, many samples remain under non-spherical and non-uniform distributions. Ncut fails to handle these distributions well and cannot recognize the cluster with the smallest number of samples (marked in brown), while Ncut-DGPC clusters these samples better. From this example and the example in the above subsection, the reason Ncut-DGPC obtains a better clustering result is that it uses DGPC to cluster the samples by the extracted features.

IV. CONNECTED TO RELATED WORKS
In this section, we theoretically compare DGPC with DPC and DENCLUE to show the advantages of DGPC [17], [36]. In DENCLUE, the density gradient of $x$ with respect to a sample set $D = \{x_1, x_2, \ldots, x_k\}$ is defined as
$$\nabla f_B^D(x) = \sum_{i=1}^{k} f_B(x, x_i)\,(x_i - x),$$
where $f_B(x, x_i)$ is the basic influence function of sample $x_i$ on $x$; generally speaking, $f_B(x, x_i) > 0$. Then DENCLUE computes the density-attractor of each sample by hill climbing: a sample $x$ is density-attracted to $x^*$ if and only if $\exists k$, $d(x^k, x^*) \le \epsilon$, with
$$x^0 = x, \qquad x^{i+1} = x^i + \delta \cdot \frac{\nabla f_B^D(x^i)}{\|\nabla f_B^D(x^i)\|},$$
where $\epsilon$ and $\delta$ are two parameters. The iteration stops at step $k$ and takes $x^* = x^k$ as a new density-attractor. Finally, DENCLUE clusters the samples under nonspherical distributions by the following definition.
Definition 1: An arbitrary-shape cluster for a set of density-attractors $X$ is a subset $C \subseteq D$ such that:
1. $\forall x \in C$, $\exists x^* \in X$ such that $x$ is density-attracted to $x^*$;
2. $\forall x_i^*, x_j^* \in X$: $\exists$ a path $P$ from $x_i^*$ to $x_j^*$ whose density stays above a threshold $\zeta$ at every $p \in P$.
However, it is hard to tune the parameters $\epsilon$, $\delta$ and $\zeta$ for DENCLUE. DPC can be seen as a special case of DENCLUE in the clustering process that avoids tuning these parameters.
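The hill-climbing step above can be sketched in a few lines. This is an illustrative variant under our own naming: we use a Gaussian influence function and a plain (unnormalized) gradient step, which converges stably near a mode, whereas DENCLUE itself normalizes the gradient and uses the parameters $\delta$ and $\epsilon$ as reviewed above.

```python
import numpy as np

def density_attractor(x, D, sigma=1.0, step=0.1, eps=1e-3, max_iter=200):
    """Hill-climb toward a density-attractor in the spirit of DENCLUE,
    with Gaussian influence f_B(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2)).
    An illustrative sketch, not DENCLUE's exact update rule."""
    x = np.asarray(x, dtype=float)
    D = np.asarray(D, dtype=float)
    for _ in range(max_iter):
        diff = D - x                                  # x_i - x for all samples
        w = np.exp(-(diff ** 2).sum(axis=1) / (2 * sigma ** 2))
        grad = (w[:, None] * diff).sum(axis=0)        # density gradient
        if np.linalg.norm(grad) < eps:
            break                                     # x is (near) an attractor
        x = x + step * grad
    return x
```

Samples whose hill climbs end near the same attractor are grouped together; Definition 1 then merges attractors linked by high-density paths.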
Proposition 1: DPC is a special case of DENCLUE in the clustering process. In this case D = {y}, where y is the nearest neighbor of x with the higher local density.
It is easy to see that, in this case, the density gradient of $x$ is proportional to $(y - x)$, so $x$ will be clustered with $y$ without tuning any parameters. The disadvantage of DPC is that it is not robust to non-uniform sample distributions: samples in low-density regions are easily clustered into high-density regions. DGPC mitigates this problem with only one more parameter, the larger radius $d_q$. We also prove that DPC is a special case of DGPC.
Proof: When $d_q > \max\{d_1, d_2, \ldots, d_{n^2}\}$, the density of each sample $x_i$ at radius $d_q$, $\rho_i^{d_q}$, equals $N$. In this case, the density gain of each $x_i$, and hence its density gain-rate $GR_i$, becomes a linear function of $\rho_i^{d_t}$. Therefore, when $d_q > \max\{d_1, d_2, \ldots, d_{n^2}\}$, DPC has the same clustering results as DGPC after selecting the cluster centers.
In practice, DPC selects cluster centers by large $\gamma_i$ in (5). For any two non-uniform distribution clusters $C_1$ and $C_2$ (assume the samples in $C_1$ are denser than those in $C_2$), if $\exists x_i \in C_1$ with $\gamma_i > \gamma_{c_2}$, where $c_2$ is the center of $C_2$, then DPC will fail to select the cluster center of $C_2$, leading to an inaccurate clustering result, as shown in Fig. 2 (a). In this process, DPC directly compares the weighted $\rho_i$ with the weighted $\rho_{c_2}$, and thus ignores the local structure of $c_2$. Our DGPC exploits, through GR, the local structure that even in a sparse cluster the density around a cluster center grows as the radius shrinks, and thus obtains the correct clustering result, as shown in Fig. 2 (b). As described above, DGPC is an extension of DPC and can better mine clusters from the local structure under non-uniform sample distributions. Compared with DENCLUE, DGPC does not need to tune complex parameters.

V. EXPERIMENT AND ANALYSIS
In this section, we introduce the experimental settings in Subsection 5.1. The experimental results of different clustering methods are shown in Subsection 5.2. Finally, some visual results of different methods on the ORL dataset [37] are shown in Subsection 5.3.

A. EXPERIMENTAL SETTINGS
In our experiments, we use nine public datasets to test the performance of our clustering methods: Iris, Wine, Isolet [2], Lung [39], ORL [37], Jaffe [27], UMist [13], COIL20 [29], and Mnist-test [8]. The experimental results on the Lung dataset were shown in the examples of Section 3 to illustrate the motivation of this work; the results on the other datasets are shown in this section. All of the above datasets are described in Table 2. To verify the effectiveness of our clustering methods, we compare them with the eleven clustering methods listed below.
1) k-means. It works by iteratively computing the centers of the different clusters [28].
2) NMF-LP. It extracts features of samples by non-negative matrix factorization and then uses k-means to cluster these samples [5].
3) N-Cuts. It clusters samples by extracting spectral features from a similarity graph with the normalized cut criterion [38].
4) SSC. It is a famous subspace SC method with ℓ1-minimization and can handle samples near the intersections of subspaces well [11].
5) DPC. It works by merging the sparse samples into the dense samples [36].
6) GDPC. It uses gravitation theory to improve DPC [23].
7) PK-DPC. It improves DPC based on K-nearest neighbors and PCA [10].
8) DP-SC. It uses DPC to cluster the samples with extracted features in SC [26].
9) U-SENC. It interprets the sparse sub-matrix as a bipartite graph and then uses transfer cut to efficiently partition the graph [21].
10) CLR. It is the Constrained Laplacian Rank method, which learns a graph and clusters samples on this graph with an ℓ1-norm or ℓ2-norm constraint [31].
11) ℓ2-SC. It uses the ℓ2-graph to extract features in SC [33].
We use two clustering evaluation metrics, clustering accuracy (ACC) and normalized mutual information (NMI), to measure the clustering performance. Denote the clustering results by $B = \{b_1, b_2, \ldots, b_n\}$ and the true labels of $X$ by $A = \{a_1, a_2, \ldots, a_n\}$. ACC is defined as [46]
$$ACC = \frac{1}{n}\sum_{i=1}^{n} E(a_i, map(b_i)),$$
where $E(a, b) = 1$ if $a = b$ and $E(a, b) = 0$ otherwise, and $map(b_i)$ is the best mapping function that permutes clustering labels to match the given true labels, computed by the Kuhn-Munkres algorithm [25], [32]. A larger ACC indicates a better clustering result. NMI is defined as [16]
$$NMI(B, A) = \frac{MI(B, A)}{\sqrt{H(B)\,H(A)}},$$
where $H(B)$ and $H(A)$ are the entropies of the distributions $B$ and $A$, and $MI(B, A)$ is the mutual information of $B$ and $A$. The larger the NMI, the better the clustering result.
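The two metrics above can be sketched as follows. This illustrative version (names are ours) finds the best label mapping for ACC by brute-force permutation search, which is fine for the small numbers of clusters used here, whereas the paper uses the Kuhn-Munkres algorithm, which scales to many clusters; the NMI below normalizes by the arithmetic mean of the entropies, one common choice alongside the square-root normalization above.

```python
import math
from collections import Counter
from itertools import permutations

def clustering_acc(truth, pred):
    """ACC: fraction of correctly clustered samples under the best one-to-one
    relabelling of predicted clusters (brute-force search over mappings)."""
    labels = sorted(set(pred))
    best = 0
    for perm in permutations(sorted(set(truth)), len(labels)):
        mapping = dict(zip(labels, perm))
        hits = sum(1 for a, b in zip(truth, pred) if mapping[b] == a)
        best = max(best, hits)
    return best / len(truth)

def nmi(truth, pred):
    """NMI: mutual information normalized by the mean of the two entropies
    (an assumed normalization; sqrt(H(A) H(B)) is an equally common choice)."""
    n = len(truth)
    pa, pb = Counter(truth), Counter(pred)
    pab = Counter(zip(truth, pred))
    mi = sum(c / n * math.log((c / n) / (pa[a] / n * pb[b] / n))
             for (a, b), c in pab.items())
    ha = -sum(c / n * math.log(c / n) for c in pa.values())
    hb = -sum(c / n * math.log(c / n) for c in pb.values())
    return 2 * mi / (ha + hb) if ha + hb > 0 else 1.0
```

Both metrics equal 1 exactly when the predicted partition matches the true one up to a renaming of cluster labels, which is the invariance the mapping step exists to provide.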
For the original features of a dataset and the features extracted by our graphs, we use ℓ2-norm normalization, since it empirically improves the clustering performance [33], [45].

B. PERFORMANCE COMPARISON
Firstly, we compare our SC methods with the eleven clustering methods to demonstrate their advantages. Several hyper-parameters need to be set in our methods: k for k-nearest neighbors is chosen from {4, 5, 10, 15, 20, 25}, p from {2, 2.5, 3}, and λ from {0.05, 0.5, 5, 50, 500}; d_t is set to t and d_q to 2 × t. Furthermore, we incorporate the Gaussian kernel into the computation of density peaks, as in the original DPC [36]. Table 3 shows the NMIs of the different clustering methods, and Table 4 shows the ACCs. From Table 3 and Table 4, we can see that, in most cases, our clustering methods outperform the previous methods. Interestingly, we achieve a perfect result (NMI = 1 and ACC = 1) on the Jaffe dataset, which means that all samples of the same class are clustered into the same group. For the SC methods, we can see that DGPC obtains better clustering results for SC than k-means or DPC. Then, to demonstrate the advantages of DGPC, we compare DPC, GDPC and DGPC in Fig. 5 and Fig. 6. For a fair comparison, we only tune p, which is a parameter shared by DPC, GDPC and DGPC. As we can see, DGPC performs better than GDPC and DPC.

C. THE VISUAL RESULTS
As in many other DPC-based methods [36], [43], we show the visual clustering results on the ORL-20 dataset, which is composed of the 200 images of the first 20 subjects in the ORL dataset. Fig. 7 shows the visual results of the different methods on ORL-20, where different background colors index different clusters. As we can see, DGPC performs much better than DPC, and Ncut-DGPC and ℓ2-DGPC obtain the best clustering results in Fig. 7. In particular, ℓ2-DGPC clusters the samples of ORL-20 with only three mistakes.

VI. CONCLUSION
We first propose DGPC, which uses the density gain-rate to find cluster centers in low-density regions. Compared with previous DPC methods, DGPC can better handle non-uniform samples under non-spherical distributions. We then improve two typical SC methods, Ncut and ℓ2-SC, to obtain our two DGPC-SC methods: Ncut-DGPC and ℓ2-DGPC. Compared with previous clustering methods, the new methods achieve better performance on real-world datasets.
In future work, we will explore how to automatically recognize the number of clusters and the cluster centers with DGPC in SC methods. Furthermore, as mentioned above, because of the time complexity of DGPC, improving the efficiency of the SC methods is another direction for future work.