Clustering Preserving Projections for High-Dimensional Data

In this paper, a novel clustering preserving projection method is presented to retain the cluster (or group) structures hidden in the original data. The method involves two steps: (1) the well-known Fuzzy c-means clustering algorithm is used to discover the cluster distribution of the data; (2) that distribution is preserved by finding a linear embedding that minimizes the intra-cluster compactness in the low-dimensional space. The feasibility and effectiveness of the proposed method are demonstrated on the UCI Iris dataset and the USPS handwritten digits dataset.


Introduction
Dimensionality reduction is a classical problem in machine learning. It has proved important in many domains such as human visual perception, text categorization, and face recognition. The goal of dimensionality reduction is to project high-dimensional data into a low-dimensional space so that the result performs better in further processing such as clustering, classification, indexing, and searching.
The techniques for dimensionality reduction can be roughly divided into two main types [1]: (1) techniques that attempt to preserve global properties of the original data in the low-dimensional representation, and (2) techniques that attempt to preserve local properties of the original data in the low-dimensional representation.
The classical techniques of the first type include principal component analysis (PCA) [2] and linear discriminant analysis (LDA) [2], which are by far the most popular unsupervised and supervised linear techniques, respectively. Using the kernel trick, kernel principal component analysis (KPCA) [3] and kernel discriminant analysis (KDA) [4] were developed to extract groups of new nonlinear features. In recent years, further dimensionality reduction methods have been introduced, such as ISOMAP [5] and multidimensional scaling (MDS) [6]. All of these methods attempt to preserve only the global properties of the data [1] and pay no attention to its neighborhood or cluster structure.
In addition, some dimensionality reduction methods aim to preserve the local properties of the data. Commonly used methods include Locality Preserving Projections (LPP) [7], Locally Linear Embedding (LLE) [8], Neighborhood Preserving Embedding (NPE) [9], and Laplacian eigenmaps [10]. In these methods, the definition of the neighborhood plays an important role. Each of them performs well when the data belong to a single well-sampled cluster, but fails when the data are spread among multiple clusters: data points in one cluster may be unable to find neighbors in other clusters, so the neighborhood graph of one cluster does not connect to the neighborhood graphs of the others. As a result, these methods fail to project clustered data into a single low-dimensional coordinate system [11].
The above-mentioned techniques all neglect the cluster distribution of the original data; as a result, the cluster structures may be destroyed in the low-dimensional representation. In contrast to these existing algorithms, this paper proposes a new linear dimensionality reduction method called clustering preserving projections (CPP) that preserves the cluster (or group) structure hidden in the data. In CPP, the classical clustering analysis method Fuzzy c-means (FCM) is adopted to discover the cluster structure of the data, and a linear embedding is then computed to carry that cluster distribution over to the reduced space. CPP aims to project the data onto a lower-dimensional space in which the intra-cluster compactness is minimized. Compared to PCA and LPP, the proposed algorithm preserves the cluster distribution of the data in the lower-dimensional space more effectively. The feasibility and effectiveness of the proposed algorithm are demonstrated experimentally.

Related Work
The generic problem of linear dimensionality reduction can be described as follows. Given a set X = [x_1, x_2, ..., x_N] with x_k in R^D, find a transformation matrix W that maps these N points to a set of points Y = [y_1, y_2, ..., y_N], y_k in R^d (d << D), such that y_k = W^T x_k 'represents' x_k.
PCA is a technique that is widely used for dimensionality reduction. It is also known as the Karhunen-Loève transform. Its main idea is to project the data along the directions of maximal variance.
Assuming a projection onto a one-dimensional space, each data point x_i is projected onto a scalar w^T x_i, where w is in R^D. The objective function of PCA is formulated as

max_w w^T S w,  S = (1/N) sum_{i=1}^{N} (x_i - x̄)(x_i - x̄)^T,  (1)

where x̄ is the mean of the data and S is the data covariance matrix. The optimal linear projection, for which the variance of the projected data is maximized, is given by the d eigenvectors of the covariance matrix S corresponding to the largest eigenvalues.
LPP is a linear projection algorithm that preserves the neighborhood structure of the data. It builds a graph incorporating neighborhood information of the data set; using the notion of the graph Laplacian, a transformation matrix that maps the data points to a subspace is obtained.
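As an illustration, the maximal-variance projection of (1) can be sketched in a few lines of NumPy. This is a minimal sketch and not part of the original paper; the function name `pca_project` and the D x N column-wise data layout are our own choices:

```python
import numpy as np

def pca_project(X, d):
    """Project the columns of X (D x N) onto the top-d principal directions."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                       # center the data
    S = Xc @ Xc.T / X.shape[1]          # data covariance matrix, as in (1)
    vals, vecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :d]            # top-d eigenvectors (maximal variance)
    return W.T @ Xc                     # d x N low-dimensional representation
```

By construction, the variance of the first projected coordinate is the largest achievable by any unit-norm linear projection of the data.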
When x_i and x_j satisfy ||x_i - x_j||^2 < ε (parameter ε in R), or x_i is among the k nearest neighbors of x_j, the weight a_ij between x_i and x_j is calculated by the heat kernel

a_ij = exp(-||x_i - x_j||^2 / t);  (2)

otherwise a_ij is set to 0. Given a transformation vector w, LPP minimizes

sum_{ij} (w^T x_i - w^T x_j)^2 a_ij,  (3)

where a_ij incurs a heavy penalty if neighboring points x_i and x_j are mapped far apart.
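A compact sketch of LPP under the k-nearest-neighbor variant of (2)-(3) follows. This is our own illustration, not the reference implementation of [7]; the parameters `k` and `t` and the small ridge term added for numerical stability are assumptions:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, d, k=5, t=1.0):
    """Locality Preserving Projections: X is D x N; returns a D x d transform."""
    D_, N = X.shape
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared distances
    A = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(sq[i])[1:k + 1]        # k nearest neighbors of x_i (skip self)
        A[i, nbrs] = np.exp(-sq[i, nbrs] / t)    # heat-kernel weights, as in (2)
    A = np.maximum(A, A.T)                       # symmetrize the adjacency graph
    Dg = np.diag(A.sum(axis=1))                  # degree matrix
    L = Dg - A                                   # graph Laplacian
    # Minimizing (3) leads to the generalized eigenproblem
    # X L X^T w = lam X Dg X^T w; take the smallest eigenvalues.
    lam, W = eigh(X @ L @ X.T, X @ Dg @ X.T + 1e-9 * np.eye(D_))
    return W[:, :d]
```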

Clustering Preserving Projections
One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. Classical techniques such as PCA and LDA attend to the global structure of the data; on the other hand, many state-of-the-art algorithms such as LPP and NPE aim to preserve its local neighborhood structure. In contrast to these existing algorithms, this paper proposes a new dimensionality reduction method called clustering preserving projections (CPP) that preserves the cluster (or group) structure hidden in the data.

Clustering Analysis
In this paper, Fuzzy c-means (FCM) [2] is selected for two reasons: (1) its output allows a sample to belong to multiple clusters with varying degrees of membership, so FCM extracts much more information from the dataset than hard clustering methods; (2) it operates in an unsupervised manner (class labels are not used) and the number of prototypes in FCM is independent of the number of classes, so the resulting fuzzy partition of the data can more closely represent the underlying structures.
Specifically, FCM explores the structures in the data by partitioning the samples into groups (or clusters); its objective function can be formulated as

J_m(U, V) = sum_{j=1}^{c} sum_{i=1}^{N} u_ji^m ||x_i - v_j||^2,  (4)

where c is the number of clusters, V = [v_1, ..., v_c] are the cluster centers, and the fuzzy matrix U = (u_ji)_{c×N} is made up of the fuzzy memberships of each training sample x_i to each cluster v_j. The parameter m (1 ≤ m < ∞) is a weighting exponent on each fuzzy membership that determines the amount of fuzziness of the resulting classification; in the following experiments, m is set to 2. An iterative algorithm for minimizing (4) with respect to u_ji and v_j alternates the updates

u_ji = 1 / sum_{k=1}^{c} (||x_i - v_j|| / ||x_i - v_k||)^{2/(m-1)},   v_j = sum_{i=1}^{N} u_ji^m x_i / sum_{i=1}^{N} u_ji^m.  (5)
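The alternating updates in (5) can be sketched directly in NumPy. This is a minimal illustration under our own conventions (D x N data layout, random initialization, a fixed iteration count instead of a convergence test):

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means on X (D x N): returns memberships U (c x N) and centers V (D x c)."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # membership columns sum to one
    for _ in range(iters):
        Um = U ** m
        V = (X @ Um.T) / Um.sum(axis=1)      # weighted cluster centers, right half of (5)
        dist = ((X[:, None, :] - V[:, :, None]) ** 2).sum(axis=0)  # c x N squared distances
        dist = np.maximum(dist, 1e-12)       # guard against division by zero
        inv = dist ** (-1.0 / (m - 1.0))     # squared distances: exponent 1/(m-1)
        U = inv / inv.sum(axis=0)            # membership update, left half of (5)
    return U, V
```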

Clustering Preservation
In fact, the objective function of FCM minimizes the intra-cluster compactness in the original space. Motivated by this, CPP seeks to preserve the intra-cluster compactness in the reduced space. Consider a set X = [x_1, x_2, ..., x_N], and assume its fuzzy partition U and cluster centers V have been computed by FCM. The goal of CPP is to find a transformation matrix W that minimizes the following objective function:

sum_{j=1}^{c} sum_{i=1}^{N} u_ji^m ||W^T x_i - W^T v_j||^2.  (6)

Here the cluster centers v_j computed by (5) can be rewritten as

v_j = X ũ_j,   ũ_j = [u_j1^m / sum_k u_jk^m, ..., u_jN^m / sum_k u_jk^m]^T.  (7)

Using (7), (6) can be translated into

sum_{j=1}^{c} sum_{i=1}^{N} u_ji^m ||W^T x_i - W^T X ũ_j||^2.  (8)

To make the objective function concise, expanding (8) and collecting terms gives

tr(W^T X (D - S) X^T W),  (9)

where

S_ik = sum_{j=1}^{c} u_ji^m u_jk^m / sum_{l=1}^{N} u_jl^m,  (10)

and D is a diagonal matrix whose entries are the column (or row, since S is symmetric) sums of S, i.e.

D_ii = sum_k S_ik = sum_{j=1}^{c} u_ji^m.  (11)

Writing L = D - S,  (12)

the objective is simplified to tr(W^T X L X^T W), which is minimized by the eigenvectors of the generalized eigenproblem X L X^T w = λ X D X^T w associated with the d smallest eigenvalues.
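Given FCM memberships, the CPP transform in (10)-(12) is a few matrix operations followed by a generalized eigenproblem. The sketch below is our own illustration (the function name `cpp_embed`, the D x N layout, and the small ridge term for numerical stability are assumptions):

```python
import numpy as np
from scipy.linalg import eigh

def cpp_embed(X, U, d, m=2.0):
    """CPP: given data X (D x N) and FCM memberships U (c x N), return a D x d transform W."""
    P = U ** m                                   # u_ji^m
    q = P.sum(axis=1)                            # per-cluster normalizers sum_l u_jl^m
    S = (P.T / q) @ P                            # S_ik from (10)
    Dg = np.diag(P.sum(axis=0))                  # D_ii from (11): column sums of S
    L = Dg - S                                   # L from (12)
    # Minimize tr(W^T X L X^T W): generalized eigenproblem, smallest eigenvalues first.
    lam, W = eigh(X @ L @ X.T, X @ Dg @ X.T + 1e-9 * np.eye(X.shape[0]))
    return W[:, :d]
```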

Justification
It is very interesting to observe that the final objective function of our algorithm is similar to that of LPP in formulation. In (3) of LPP, the matrix A represents the nearness relationship among samples: when sample x_i is near x_j, the weight a_ij is large, and vice versa. Corresponding to the matrix A, there is the matrix S in our algorithm. Can we gain some insight from this matrix?
In fact, the elements of S measure the possibility of samples falling into the same cluster or group. For example, suppose the cluster membership of a given sample x_i is a hard value, that is, u_ji = 1 if x_i belongs to cluster j and u_ji = 0 otherwise. Then, by (10), S_ik is nonzero if and only if x_i and x_k fall into the same cluster, so minimizing tr(W^T X L X^T W) pulls the projections of same-cluster samples together. To summarize, our projection algorithm is described in Fig. 1.

Fig. 1. Algorithm of CPP.
Step 1: Perform FCM on the original space and calculate the partition matrix U according to (5).
Step 2: Compute the matrices S, D and L according to (10)-(12).
Step 3: Solve the generalized eigenproblem X L X^T w = λ X D X^T w and form W from the eigenvectors corresponding to the d smallest eigenvalues.
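The hard-membership interpretation of S can be checked numerically on a tiny hypothetical example (three samples, two clusters; the values below are our own toy data, not from the paper):

```python
import numpy as np

# Hard memberships: x_1 and x_2 in cluster 0, x_3 in cluster 1.
U = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
m = 2.0
P = U ** m
S = (P.T / P.sum(axis=1)) @ P   # S_ik = sum_j u_ji^m u_jk^m / sum_l u_jl^m, as in (10)
# S = [[0.5, 0.5, 0.0],
#      [0.5, 0.5, 0.0],
#      [0.0, 0.0, 1.0]]
# S_ik is nonzero exactly when x_i and x_k fall into the same cluster.
print(S)
```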

Experiment Setup
To test whether the presented method can effectively preserve the cluster structure of data, we use the typical clustering method K-means to evaluate the clustering accuracy in the original space and in the low-dimensional space, respectively. If the clustering result obtained in the low-dimensional space is better than that in the original space, the effectiveness of CPP is clearly demonstrated. To further investigate the merits of CPP, we compare it with the typical global technique PCA and the local technique LPP. All comparisons are made on two classical datasets. Iris is a real-life dataset from the UCI Machine Learning Repository; it consists of 150 samples (50 in each of three clusters), and each sample has four dimensions. USPS is a dataset of 16×16 handwritten digit images; the digits 1 and 2 are used in our experiments as the two clusters, with 1100 examples for each cluster, for a total of 2200. In our experiments, these two datasets are projected to two-dimensional spaces.
Table 1 gives the clustering accuracies obtained on the original datasets and on the clustering-preserving projected datasets, respectively. The clustering accuracies of 97.33% and 99.32% on the low-dimensional representations of the Iris and USPS datasets are higher than the respective accuracies of 88.00% and 98.59% on the original datasets. These results indicate that the cluster structure hidden in the data has been effectively preserved in the low-dimensional space, demonstrating the effectiveness of the presented dimension reduction method CPP. To further validate the method, we use PCA, LPP and CPP to project the Iris and USPS datasets to two-dimensional spaces, and then adopt K-means to quantitatively measure how well the cluster structure is retained in the projected 2D spaces. Table 2 lists the cluster-preservation performance achieved by PCA, LPP and CPP, respectively.
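The clustering accuracy reported here compares K-means cluster labels against the true classes; since cluster labels are arbitrary, accuracy is taken over the best matching of labels to classes. A small helper could look as follows (our own sketch; the name `clustering_accuracy` is hypothetical, and the brute-force permutation search assumes equal, small numbers of clusters and classes):

```python
import numpy as np
from itertools import permutations

def clustering_accuracy(labels, truth):
    """Best accuracy over all one-to-one matchings of cluster labels to true classes."""
    ks = np.unique(labels)
    best = 0.0
    for perm in permutations(np.unique(truth)):
        mapping = dict(zip(ks, perm))            # try this label-to-class assignment
        mapped = np.array([mapping[l] for l in labels])
        best = max(best, float((mapped == truth).mean()))
    return best
```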
From this table, it can be observed that CPP achieves much better projection results than PCA and LPP in preserving the cluster structure. The underlying reason is that PCA and LPP both pay no attention to the cluster distribution of the data, so the group information is unlikely to be preserved; in contrast, CPP successfully retains the local cluster distribution of the data.

Experiment Result
Fig. 2 shows the 2D projections of the Iris dataset achieved by PCA, LPP and CPP, respectively. The Iris dataset has three clusters: the class Iris Setosa is well separated from the other two, while the classes Iris Versicolor and Iris Virginica are not easily separated from each other. In these figures, the characters '+', '.' and '*' represent the data samples from the three different clusters, and the character 'O' denotes a misclustered sample. In Fig. 2(a) and (b), many samples falling into the overlap area between the Versicolor and Virginica clusters have been misclustered, indicating that PCA and LPP do not account for the cluster structure of the data; in contrast, in Fig. 2(c) only four samples have been misclustered, so it is clear that CPP effectively retains the cluster structure during dimension reduction.

Conclusion
In this paper, we propose a clustering preserving projection method for dimension reduction that retains the cluster (or group) structures hidden in the original data. The method employs Fuzzy c-means to discover the cluster structure of the data, and then calculates a linear embedding that minimizes the intra-cluster compactness in the low-dimensional space. Experiments on the UCI Iris dataset and the USPS handwritten digits dataset demonstrate the feasibility and effectiveness of the proposed method.