FMvC: Fast Multi-View Clustering

In multi-view clustering, an eigen-decomposition of the graph Laplacian matrix is usually required. This leads to a significant increase in time cost and also requires post-processing such as $k$-means. In addition, some methods require learning a unified graph matrix, which significantly increases time and memory costs on large-scale data. To address these problems, this paper proposes Fast Multi-view Clustering (FMvC). First, non-negative constraints are added to the objective function derived from the unified view of the relaxed normalized and ratio cuts. Then, graph reconstruction is performed on the similarity matrix using an indicator matrix to ensure that the obtained graph has strong intra-cluster and weak inter-cluster connectivity. In addition, the speed of the method is further improved by introducing a common label matrix. Finally, the problem is solved by an alternating optimization strategy. Experimental results on eight real-world datasets demonstrate the effectiveness of the proposed algorithm, which consistently outperforms eleven existing baseline algorithms.


I. INTRODUCTION
Clustering is known to be a powerful technique for unsupervised learning. Traditional clustering methods tend to perform the learning process from one view, which is called single-view learning. However, traditional methods cannot satisfy actual needs in practice because the correlation between different views is ignored. Datasets can be encoded by multiple views, different sources, and heterogeneous attributes [1]. Based on such datasets, a new clustering paradigm, called multi-view clustering, has arisen [2]. This type of method has been used in many fields such as automation [3], imaging [4], [5], and communication [6]. It learns complementary information from different views and outputs consistent results. For example, in natural language processing, documents can often be represented in multiple languages, with each language constituting a view; co-regularized spectral clustering (CoregSC), proposed by Kumar et al. [7], has become a common data processing method in this field.
Numerous scholars have thoroughly analyzed and discussed multi-view learning [8], [9]. Current multi-view clustering methods include collaborative training methods [10], [11], multiple kernel learning methods [12], graph-based methods [13], [14], and other types. Since graph-based clustering methods can effectively explore the latent information in the data by considering the diversity of knowledge and complementary information among different views, they often improve learning performance [15]. In this paper, we focus on graph-based multi-view clustering methods.
In recent years, numerous scholars have devoted themselves to extending single-view clustering to multi-view clustering. Zhou and Burges [16] proposed a multi-view clustering method by extending the normalized cut from a single view to multiple views. Cai et al. [17] built a model for each feature description and then unified these models to learn a Laplacian matrix. Liu et al. [18] proposed multi-view clustering via a joint non-negative matrix factorization approach (MultiNMF). To handle large-scale datasets, Li et al. [19] used bipartite graphs to construct
similarity matrices. Zhan et al. [20] proposed a new objective function that learns a unified graph by minimizing the differences between views and adding rank constraints to the Laplacian matrix. To obtain the clustering results directly, Cai et al. [21] proposed a robust multi-view k-means clustering method (MkC), which decomposes the data features of each view. Zhan et al. [20] proposed multi-view consistent graph clustering (MCGL) by adding rank constraints to the learned global Laplacian matrix. Xia et al. [22] proposed robust multi-view graph clustering (MSC) based on low-rank and sparse decomposition: a probability transition matrix is first constructed for each view, a shared low-rank transition matrix is then recovered, and the common matrix is finally fed to a Markov-chain method to obtain the clustering results. Zhu et al. [23] proposed a one-step multi-view graph clustering method that learns the affinity matrix from low-dimensional data. We classify the above methods into two categories based on whether they learn a consistent graph. 1) Methods that learn a consistent graph: these require post-processing of the consistent graph matrix with other methods, which often leads to suboptimal clustering results. 2) Methods that do not learn a consistent graph: the global similarity matrix is learned directly by adding rank constraints. These perform well, but their computational complexity is high because eigen-decomposition is required.
This paper focuses on a unified label matrix, the interpretability of clustering results, and time cost, and proposes a novel multi-view clustering model. As shown in Fig. 1, the method first assembles different types of data into multi-view data; second, the addresses of the anchor points are randomly selected, the data at those addresses in each view are used as anchors, and the anchors are embedded in the similarity matrix (the anchor address is the location of the anchor point in the matrix); finally, the variables are updated quickly and iteratively through a non-negative unified label matrix. In Fig. 1, $F$ and $G$ denote the indicator matrix and the label matrix, respectively, which form the model's balance term, and $K^{(v)}$ is the rotation matrix used to ensure the accuracy of the clustering results. This eliminates iterative post-processing (e.g., k-means) and improves computational efficiency.
The main contributions can be summarized as follows: • Model: We investigate an advanced multi-view clustering paradigm and provide a novel solution for fast multi-view clustering. Graph reconstruction is performed via orthogonal and non-negative constraints on multi-view data, resulting in a structured graph. The interpretability of the clustering is greatly enhanced by the non-negative conditions. Post-processing (e.g., k-means) is discarded when obtaining the clustering labels: the labels are obtained directly by indexing the column containing the largest element in each row of the label matrix, further reducing the time cost.
• Algorithm: A fast multi-view clustering method is proposed based on the unified view of normalized cuts and ratio cuts in traditional spectral clustering (SC). By adding a unified label matrix, the convergence speed of the objective function is further improved.
• Result: The proposed method shows good performance on eight real-world datasets and is compared with eleven related baseline methods.
The rest of the paper is organized as follows: Section II gives notation and briefly reviews related work on spectral clustering based on a cut view. Section III describes the proposed method and the optimization process in this paper. The experimental results and discussion about FMvC are given in Section IV. Conclusions are provided in Section V.

II. RELATED WORKS
In this section, we present the related work of this study from a unified view of spectral clustering algorithms.

A. NOTATIONS
We define some notations and symbols commonly used in spectral clustering from the graph-cut perspective. An undirected graph can be expressed as $G = (V, E)$, where $V$ denotes the set of data points (vertices) and $E$ the set of edges between pairs of data points. We define $\bar{A}$ as the complement of $A$, where $A$ is a set of vertices, and $|A|$ as the number of elements in the subset $A$. $d_{ii}$ denotes the $i$th diagonal element of the degree matrix $D$, and $\mathrm{vol}(A) = \sum_{i \in A} d_{ii}$ is the sum of the diagonal elements of $D$ over $A$. The graph cut between $A$ and $\bar{A}$ is $\mathrm{cut}(A, \bar{A}) = \sum_{i \in A, j \in \bar{A}} w_{ij}$, where $w_{ij}$ and $W$ denote the weight of an edge and the similarity matrix, respectively.

B. SPECTRAL CLUSTERING WITH SINGLE VIEW
The objective of clustering is to split $G$ into $c$ unconnected subgraphs, which is an optimal multi-way partitioning problem for undirected graphs and is NP-hard. It is easy to see that $A_i \cap A_j = \emptyset$ $(i, j = 1, \ldots, c)$ and $A_1 \cup \cdots \cup A_c = \{v_1, \ldots, v_n\}$. We can translate this problem into a spectral decomposition of the graph Laplacian matrix. Since the minimum cut often cuts off a single vertex, it leads to unbalanced clusters. Therefore, the ratio cut (R-cut) [24] and normalized cut (N-cut) [25] are usually used instead. Their objective functions are

$$\mathrm{Rcut}(A_1, \ldots, A_c) = \sum_{j=1}^{c} \frac{\mathrm{cut}(A_j, \bar{A}_j)}{|A_j|}, \tag{1}$$

$$\mathrm{Ncut}(A_1, \ldots, A_c) = \sum_{j=1}^{c} \frac{\mathrm{cut}(A_j, \bar{A}_j)}{\mathrm{vol}(A_j)}. \tag{2}$$

The subgraph sizes must be kept balanced when the graph is cut; both problems remain NP-hard, so we relax them. Let the indicator matrix be $H \in \mathbb{R}^{n \times c}$ with $j$th column vector $h_j$. For the N-cut, the $i$th entry of $h_j$ can be written as

$$h_{ij} = \begin{cases} 1/\sqrt{\mathrm{vol}(A_j)}, & v_i \in A_j, \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$

It is not difficult to verify that $\mathrm{Ncut}(A_1, \ldots, A_c) = \mathrm{Tr}(H^T L H)$ with $H^T D H = I_c$, where $L$ denotes the Laplacian matrix, $I_c$ the identity matrix of order $c$, and $D$ the degree matrix. Thus the N-cut problem of (2) can be written in matrix-trace form as

$$\min_{H} \mathrm{Tr}(H^T L H) \quad \text{s.t.} \quad H^T D H = I_c, \tag{4}$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix. In the relaxed N-cut of (4), the discreteness requirement is dropped and only the orthogonality constraint is retained. Setting $F = D^{1/2} H$ and substituting into (4) gives

$$\min_{F} \mathrm{Tr}(F^T \tilde{L} F) \quad \text{s.t.} \quad F^T F = I_c, \tag{5}$$

where $\tilde{L} = D^{-1/2} L D^{-1/2}$ is the normalized Laplacian. Relaxing the R-cut in the same way yields the objective of unnormalized spectral clustering,

$$\min_{H} \mathrm{Tr}(H^T L H) \quad \text{s.t.} \quad H^T H = I_c. \tag{6}$$

We can decompose $L$ or $\tilde{L}$ to obtain the optimal $H$ or $F$ from the eigenvectors corresponding to the $c$ smallest eigenvalues, and then run k-means on the row vectors to obtain the clustering labels.
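To make this pipeline concrete, the following is a minimal Python sketch of relaxed N-cut spectral clustering as in (4)-(5), assuming a precomputed dense similarity matrix $W$; function and variable names are ours, not the paper's, and k-means is the post-processing step that FMvC later removes.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def ncut_spectral_clustering(W, c):
    """Relaxed N-cut: eigen-decompose the normalized Laplacian,
    then post-process the continuous embedding with k-means."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                                  # unnormalized Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_tilde = D_inv_sqrt @ L @ D_inv_sqrt               # normalized Laplacian of (5)
    _, F = eigh(L_tilde, subset_by_index=[0, c - 1])    # c smallest eigenvectors
    return KMeans(n_clusters=c, n_init=10).fit_predict(F)  # post-processing FMvC avoids
```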

C. UNIFIED VIEWPOINT OF N-CUT AND R-CUT
If the similarity matrix $W$ is a doubly stochastic matrix, i.e., a matrix in which the elements of every row and every column of $W$ sum to 1, then the degree matrix is an identity matrix, so $L = I - W = \tilde{L}$ and $F = H$. At this point the graph Laplacian is already normalized, and (5) is exactly equivalent to (6). This is the conclusion of the unified view of N-cut and R-cut. In the subsequent sections, $W$ is by default a doubly stochastic matrix and also a structured graph.
To ensure that $W$ is doubly stochastic, this can be expressed as the condition

$$W \mathbf{1} = \mathbf{1}, \quad W = W^T, \quad W \geq 0, \tag{7}$$

where the bold $\mathbf{1}$ is a vector whose elements are all 1.
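As a quick numeric illustration of this unified view (ours, not the paper's), the following snippet checks that a doubly stochastic $W$ yields $D = I$ and $L = I - W = \tilde{L}$ on a toy matrix:

```python
import numpy as np

W = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])                    # symmetric, rows and columns sum to 1
assert np.allclose(W, W.T) and np.allclose(W.sum(1), 1) and np.allclose(W.sum(0), 1)
D = np.diag(W.sum(axis=1))                         # degree matrix = identity
L = D - W                                          # = I - W, already normalized
print(np.allclose(D, np.eye(3)), np.allclose(L, np.eye(3) - W))   # True True
```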
In the real world, single-view data do not provide a complete description of things, because they describe things from only a single point of view. Therefore, data are usually described by multiple heterogeneous sources to generate multi-view data [8]. From the unified view of spectral clustering, the corresponding multi-view spectral clustering objective function can be written as

$$\min_{F} \sum_{v=1}^{V} \mathrm{Tr}\left(F^T L^{(v)} F\right) \quad \text{s.t.} \quad F^T F = I_c, \tag{8}$$

where $L^{(v)}$ is the graph Laplacian matrix of the $v$th view and $V$ is the number of views.
Obviously, this objective function is problematic and cannot meet the needs of multi-view clustering. There are three main problems. First, (8) does not consider the relationship between different views: after several iterations, the indicator matrix $F$ cannot obtain a consistent embedding and thus cannot accurately form the multi-view indicator matrix. Second, it still does not remove the need for post-processing; since the k-means algorithm is sensitive to initialization, different initial points lead to different clustering results, so post-processing makes the clustering results non-robust. Third, the method requires eigen-decomposition, which leads to a significant increase in computational complexity and time cost. Numerous researchers have optimized this formulation by adding normalization factors and regularization terms, e.g., [2]. In this paper, we solve the above problems with a unified label matrix, as described in Section III.

III. THE PROPOSED METHOD

A. PROBLEM FORMULATION
In spectral clustering, the eigenvectors of the Laplacian matrix obtained by eigen-decomposition serve as the indicator vectors. This makes the clustering lose its interpretability and therefore rely heavily on post-processing, which can be understood as re-discretizing the continuous values. To get rid of post-processing, we must avoid excessive relaxation of the cut [26]. Therefore, non-negative constraints are added to (8) to obtain

$$\min_{F} \sum_{v=1}^{V} \mathrm{Tr}\left(F^T L^{(v)} F\right) \quad \text{s.t.} \quad F^T F = I_c, \ F \geq 0. \tag{9}$$

We explain the model in (9) in four ways. First, the model is closer to R-cut (N-cut) than spectral clustering, because in both (9) and R-cut (N-cut) the indicator matrix has non-negative elements, whereas spectral clustering relaxes them to arbitrary real numbers. Second, two samples in the same cluster may have different similarities, while under R-cut (N-cut) samples within the same cluster would have the same similarity. Third, the non-negative constraint gives (9) interpretability, i.e., the elements of $F$ reflect the closeness of the connection between data points and clusters. Fourth, (9) is a problem with orthogonal non-negative constraints, so in essence its solution already possesses discrete properties. As shown in Fig. 2, assume $F \in \mathbb{R}^{9 \times 3}$, indicating nine data points belonging to three different clusters. The white squares denote elements with a value of 0 and the other squares denote non-zero elements. The constraints force each row of $F$ to have exactly one non-zero element and the $\ell_2$ norm of each column to be 1, i.e., $F$ is a discrete, sparse, and physically meaningful matrix: the values of its elements directly represent the relationship between data points and clusters.
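The discrete structure of Fig. 2 can be illustrated with a small sketch (toy values of ours): a non-negative matrix with orthonormal columns has exactly one non-zero entry per row, so cluster labels can be read off by a row-wise argmax without k-means.

```python
import numpy as np

F = np.zeros((9, 3))                   # 9 points, 3 clusters, as in Fig. 2
F[0:4, 0] = 1 / np.sqrt(4)             # cluster 1: 4 points
F[4:7, 1] = 1 / np.sqrt(3)             # cluster 2: 3 points
F[7:9, 2] = 1 / np.sqrt(2)             # cluster 3: 2 points
assert np.allclose(F.T @ F, np.eye(3)) # F^T F = I_c and F >= 0 hold together
labels = F.argmax(axis=1)              # direct, interpretable labels
print(labels)                          # [0 0 0 0 1 1 1 2 2]
```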
Since the original graph is usually constructed from noisy raw data, it has no clear structure [27]. Therefore, graph reconstruction is an optimization process for learning structured graphs. Because (9) is difficult to solve directly, we use an approximate model, considering the difficulty and complexity of the computation. We can rewrite (9) as (10) by Proposition 1.

Proposition 1: The problem in (9) can be solved by solving the following problem:

$$\min_{F} \sum_{v=1}^{V} \left\| W^{(v)} - F F^T \right\|_F^2 \quad \text{s.t.} \quad F^T F = I_c, \ F \geq 0. \tag{10}$$

Proof: Let $W^{(v)}$ be a symmetric doubly stochastic matrix, so that $L^{(v)} = I - W^{(v)}$. Then $\mathrm{Tr}(F^T L^{(v)} F) = \mathrm{Tr}(F^T F) - \mathrm{Tr}(F^T W^{(v)} F) = c - \mathrm{Tr}(F^T W^{(v)} F)$. On the other hand, $\| W^{(v)} - F F^T \|_F^2 = \mathrm{Tr}((W^{(v)})^T W^{(v)}) - 2 \mathrm{Tr}(F^T W^{(v)} F) + \mathrm{Tr}(F F^T F F^T)$, where both $\mathrm{Tr}((W^{(v)})^T W^{(v)})$ and $\mathrm{Tr}(F F^T F F^T) = c$ are constant terms. Hence minimizing (10) is equivalent to minimizing (9). □

Equation (10) is a very important model in this paper. It can be interpreted as a graph reconstruction process. As shown in Fig. 3, each block in $M$ corresponds to a connected component, and a connected component belongs to the same cluster; in the usual case, a connected component is a cluster. The goal of (10) is to reconstruct the similarity matrix as a block-diagonal matrix [28]. The reconstructed matrix should be as similar as possible to the original graph $W$, i.e., a reconstructed graph with structural information approximates the original graph. This clear structure contains a great deal of accurate information about the clusters. In Fig. 3, the three block matrices of $M$ lying on the diagonal differ significantly from one another, yet the data within each block are not identical and retain similarities to each other. The model therefore ensures that both the cluster labels and the differences within clusters are attended to during clustering.
We can add the condition $W^{(v)} = B^{(v)} (B^{(v)})^T$ with $B^{(v)} \in \mathbb{R}^{n \times m}$ and $m \ll n$, which ensures that $W^{(v)}$ depends entirely on $B^{(v)}$. To present the model clearly, the detailed introduction and discussion of the matrix $B^{(v)}$ is arranged in Section III-D. Moreover, drawing inspiration from previous work [29], we can define a rotation matrix $K^{(v)} \in \mathbb{R}^{m \times c}$ satisfying $(K^{(v)})^T K^{(v)} = I_c$. Substituting $W^{(v)} = B^{(v)} (B^{(v)})^T$, (10) can be rewritten as

$$\min_{F} \sum_{v=1}^{V} \left\| B^{(v)} (B^{(v)})^T - F F^T \right\|_F^2 \quad \text{s.t.} \quad F^T F = I_c, \ F \geq 0. \tag{11}$$

We can obtain a good approximation of this problem via Proposition 2, where the matrix $K^{(v)}$ seeks the variables suitable for reconstruction. Proposition 2 implicitly performs the clustering task: the data to be clustered are represented as rows of $B^{(v)} K^{(v)}$, the columns of $K^{(v)}$ form a set of orthogonal bases that learn data features in order to find patterns and simplify the data representation, and the specific representation of the data in the latent space is given in $F$.
Proposition 2: The problem in (11) can be approximated by

$$\min_{F, K^{(v)}} \sum_{v=1}^{V} \left\| B^{(v)} K^{(v)} - F \right\|_F^2 \quad \text{s.t.} \quad F^T F = I_c, \ F \geq 0, \ (K^{(v)})^T K^{(v)} = I_c. \tag{12}$$

Proof: By the triangle inequality and the compatibility (submultiplicativity) of the matrix norm,

$$\left\| B^{(v)} (B^{(v)})^T - F F^T \right\|_F \leq \left\| B^{(v)} \left( (B^{(v)})^T - K^{(v)} F^T \right) \right\|_F + \left\| \left( B^{(v)} K^{(v)} - F \right) F^T \right\|_F \leq \left\| B^{(v)} \right\|_F \left\| B^{(v)} - F (K^{(v)})^T \right\|_F + \left\| F \right\|_F \left\| B^{(v)} K^{(v)} - F \right\|_F,$$

and both residual terms vanish when $B^{(v)} K^{(v)} = F$, so minimizing (12) drives down an upper bound of (11). □

To accelerate the convergence of the objective function, a unified label matrix is added to the implicit graph reconstruction of Proposition 2. This further simplifies the problem, and the final objective function is

$$\min_{F, G, K^{(v)}} \sum_{v=1}^{V} \left\| B^{(v)} K^{(v)} - F \right\|_F^2 + \lambda \left\| F - G \right\|_F^2 \quad \text{s.t.} \quad F^T F = I_c, \ (K^{(v)})^T K^{(v)} = I_c, \ G \geq 0, \tag{13}$$

where the first term is the reconstruction term, the second is the balance term, and $\lambda$ is a balance factor that keeps the two terms in balance, so that $F$ and $G$ become arbitrarily close when $\lambda$ is large enough, at which point (13) is equivalent to (12). $G$ is the label matrix that provides the final clustering labels. The relaxed $F$ has continuous values; however, unlike in spectral clustering, in the new model it is driven arbitrarily close to the non-negative matrix $G$, which makes (13) computationally cheaper and more interpretable than (12).

B. OPTIMIZATION
To solve the non-convex problem proposed in (13), we split it into subproblems and optimize by alternating iterations [30]. The procedure is summarized in Algorithm 1. Thanks to a good initialization, convergence is fast in the experiments and a good local optimum is obtained.
Step 1: Fix $F$ and $K^{(v)}$, update $G$. The problem reduces to

$$\min_{G \geq 0} \left\| F - G \right\|_F^2. \tag{14}$$

The optimal solution is easily found to be

$$G = (F)_{+}, \tag{15}$$

where $(\cdot)_{+}$ sets the negative elements of the matrix to zero.
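As code, the $G$-subproblem is a one-line element-wise projection; a minimal sketch (names ours):

```python
import numpy as np

def update_G(F):
    """Closed-form solution (15): clip the negative elements of F."""
    return np.maximum(F, 0.0)
```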
Step 2: Fix $F$ and $G$, update $K^{(v)}$. The problem reduces to

$$\min_{(K^{(v)})^T K^{(v)} = I_c} \left\| B^{(v)} K^{(v)} - F \right\|_F^2 \ \Longleftrightarrow \ \max_{(K^{(v)})^T K^{(v)} = I_c} \mathrm{Tr}\left( (K^{(v)})^T T_1 \right), \quad T_1 = (B^{(v)})^T F. \tag{16}$$

Performing SVD on $T_1$ gives $T_1 = U_1 \Sigma_1 V_1^T$, so the above problem can be further rewritten as

$$\mathrm{Tr}\left( (K^{(v)})^T U_1 \Sigma_1 V_1^T \right) = \mathrm{Tr}\left( Q^{(v)}_1 \Sigma^{(v)}_1 \right) = \sum_{i} q^{(v)}_{1,ii} \, \sigma^{(v)}_{1,ii} \leq \sum_{i} \sigma^{(v)}_{1,ii}, \tag{17}$$

where $Q^{(v)}_1 = V_1^T (K^{(v)})^T U_1$, and $\sigma^{(v)}_{1,ii}$ and $q^{(v)}_{1,ii}$ are the $i$th diagonal entries of $\Sigma^{(v)}_1$ and $Q^{(v)}_1$, respectively, i.e., the singular values and the diagonal of $Q^{(v)}_1$. Equality holds if and only if $Q^{(v)}_1 = I$. By the definition of $Q^{(v)}_1$, the optimal solution of (16) is

$$K^{(v)} = U_1 V_1^T. \tag{18}$$
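Step 2 is an orthogonal Procrustes problem; a minimal NumPy sketch under the constraint $(K^{(v)})^T K^{(v)} = I_c$ (names ours):

```python
import numpy as np

def update_K(B_v, F):
    """Solve min ||B^(v) K - F||_F^2 s.t. K^T K = I via the SVD of T1, eqs. (16)-(18)."""
    U1, _, V1t = np.linalg.svd(B_v.T @ F, full_matrices=False)   # T1 = (B^(v))^T F
    return U1 @ V1t                                              # K^(v) = U1 V1^T
```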

Algorithm 1 FMvC Algorithm
Input: Multi-view data $X^{(v)} = [x^{(v)}_1, \ldots, x^{(v)}_n] \in \mathbb{R}^{d_v \times n}$, cluster number $c$, anchor number $m$, nearest-neighbor number $k$, and regularization parameter $\lambda$.
Output: Cluster labels $Y$.
1: Randomly select the address indices of $m$ anchors and apply them to each view;
2: Compute the factor matrix $B^{(v)}$ by (23) and (24);
3: Initialize $F \in \mathbb{R}^{n \times c}$ as described in Section III-C;
4: repeat
5:   Update $G$ as described in Step 1;
6:   Update $K^{(v)}$ as described in Step 2;
7:   Update $F$ as described in Step 3;
8: until convergence
9: Obtain the clustering labels $Y$ by indexing the largest value in each row of $G$.
Step 3: Fix $K^{(v)}$ and $G$, update $F$. The problem reduces to

$$\min_{F^T F = I_c} \sum_{v=1}^{V} \left\| B^{(v)} K^{(v)} - F \right\|_F^2 + \lambda \left\| F - G \right\|_F^2 \ \Longleftrightarrow \ \max_{F^T F = I_c} \mathrm{Tr}\left( F^T T_2 \right). \tag{19}$$

Using the same method as in Step 2, we perform SVD on $T_2 = \sum_{v=1}^{V} B^{(v)} K^{(v)} + \lambda G$ to obtain $T_2 = U_2 \Sigma_2 V_2^T$. Then, letting $\tilde{U}_2$ denote the first $c$ column vectors of $U_2$, the optimal solution of (19) is

$$F = \tilde{U}_2 V_2^T. \tag{20}$$
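Putting the three steps together, the following Python sketch mirrors lines 4-9 of Algorithm 1; it assumes the anchor graphs $B^{(v)}$ and the initial $F$ are given, and the helper names and the simple stopping rule are ours, not the paper's.

```python
import numpy as np

def fmvc_iterate(Bs, F, lam=1.0, max_iter=50, tol=1e-6):
    """Alternating updates of G, K^(v), and F for objective (13)."""
    prev_obj = np.inf
    for _ in range(max_iter):
        G = np.maximum(F, 0.0)                                # Step 1: G = (F)_+
        Ks = []
        for B in Bs:                                          # Step 2: Procrustes per view
            U, _, Vt = np.linalg.svd(B.T @ F, full_matrices=False)
            Ks.append(U @ Vt)
        T2 = sum(B @ K for B, K in zip(Bs, Ks)) + lam * G     # Step 3: update F
        U2, _, V2t = np.linalg.svd(T2, full_matrices=False)
        F = U2 @ V2t
        obj = sum(np.linalg.norm(B @ K - F, 'fro') ** 2 for B, K in zip(Bs, Ks)) \
              + lam * np.linalg.norm(F - G, 'fro') ** 2       # objective (13)
        if abs(prev_obj - obj) < tol:                         # stopping rule (ours)
            break
        prev_obj = obj
    return np.maximum(F, 0.0).argmax(axis=1)                  # labels from rows of G
```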

C. INITIALIZATION
In the previous subsection, all variables have been updated. However, updating each variable requires the others, and $F$ is the variable most closely tied to the rest. Therefore $F$ needs to be initialized, and two initialization methods for $F$ are given in this paper. 1) Randomly generated $F$: this method must ensure that the randomly generated $F$ is a column-orthogonal matrix.
2) Singular value decomposition: $F$ can be initialized with the eigenvectors corresponding to the $c$ smallest eigenvalues of $\tilde{L}$, which are equivalent to the eigenvectors corresponding to the $c$ largest eigenvalues of $W$. Recall that problem (11) was constructed so that $W$ is low rank, with $W = B^{(v)} (B^{(v)})^T$. Thus the left singular vectors of the matrix $B^{(v)}$ are eigenvectors of $W$. In summary, the initial $F$ can be composed of the left singular vectors corresponding to the $c$ largest singular values of $B^{(v)}$.
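A minimal sketch of this SVD-based initialization (using a single view's $B^{(v)}$, which is our simplification):

```python
import numpy as np

def init_F(B_v, c):
    """Seed F with the left singular vectors of the c largest singular values."""
    U, _, _ = np.linalg.svd(B_v, full_matrices=False)   # singular values come sorted
    return U[:, :c]    # columns are orthonormal, so F^T F = I_c by construction
```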

D. ADAPTIVE ANCHOR-BASED GRAPH
In this section, we introduce the construction of the sparse matrix $B^{(v)}$, which represents the similarity between data points and anchors. Normally, a $k$-nearest-neighbor ($k$-NN) [31] strategy with the Gaussian kernel $K_\sigma(x_i, \alpha_j) = \exp(-\| x_i - \alpha_j \|_2^2 / 2\sigma^2)$ is used to compute similarities when constructing an anchor-based graph, where the original data matrix is $X^{(v)} \in \mathbb{R}^{d_v \times n}$, $\sigma$ is a hyperparameter representing the bandwidth, and $d_v$ is the feature dimension of the $v$th view. The similarity between data points and anchors can then be expressed as

$$\beta^{(v)}_{ij} = \frac{K_\sigma(x^{(v)}_i, \alpha^{(v)}_j)}{\sum_{j' \in \langle i \rangle} K_\sigma(x^{(v)}_i, \alpha^{(v)}_{j'})}, \quad j \in \langle i \rangle, \tag{21}$$

where the set $\langle i \rangle$ contains the indices of the $k$ nearest anchors of $x^{(v)}_i$, $i = 1, \ldots, n$, and $\beta^{(v)}_{ij} = 0$ for $j \notin \langle i \rangle$. The main drawback of this method is the hyperparameter $\sigma$: clustering performance is affected by this manually set parameter, even if a good $\sigma$ can be found empirically after several trials. Cosine similarity [32] is an efficient way to construct $B^{(v)}$, but it is usually used for text-type data and does not apply to the model in this paper.

Inspired by GMC [33], we use an adaptive approach to construct similarity graphs over the multiple views. Let $\bar{\beta}^{(v)}_{ij}$, the $(i, j)$th entry of $\bar{B}^{(v)}$, denote the similarity between $x^{(v)}_i$ and $\alpha^{(v)}_j$ in the $v$th view. It is then necessary to solve the minimization problem

$$\min_{\bar{\beta}^{(v)}_i} \sum_{j=1}^{m} \left( \delta_{ij} \bar{\beta}^{(v)}_{ij} + \gamma \left( \bar{\beta}^{(v)}_{ij} \right)^2 \right) \quad \text{s.t.} \quad (\bar{\beta}^{(v)}_i)^T \mathbf{1} = 1, \ \bar{\beta}^{(v)}_{ij} \geq 0. \tag{22}$$

Since the problem can be solved independently for each view, the $v$th view is used as an example in (22); by the same reasoning, it is also independent for different $i$. Here $\delta_{ij}$ denotes the distance between $x^{(v)}_i$ and $\alpha^{(v)}_j$; for convenience, we use the Euclidean distance. The quadratic term $\gamma (\bar{\beta}^{(v)}_{ij})^2$ effectively avoids the trivial solution in which the similarity of the nearest neighbor is 1 and all others are 0. The sparse solution of (22) can be obtained in closed form. We consider that each sample has exactly $k$ nearest anchors. Let $\delta_i$ denote the vector whose $j$th element is $\delta_{ij}$, with its elements arranged from smallest to largest, i.e., $\delta_{i1} \leq \cdots \leq \delta_{im}$. Then problem (22) has the closed-form solution

$$\bar{\beta}^{(v)}_{ij} = \begin{cases} \dfrac{\delta_{i,k+1} - \delta_{ij}}{k \delta_{i,k+1} - \sum_{j'=1}^{k} \delta_{ij'}}, & j \leq k, \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$

where the parameter $\gamma = \frac{k}{2} \delta_{i,k+1} - \frac{1}{2} \sum_{j'=1}^{k} \delta_{ij'}$ is adjusted adaptively according to the distances.

This adaptive approach has three distinct advantages: 1) the similarity in $\bar{B}^{(v)}$ is kept consistent with the distance, with smaller distances yielding larger similarities; 2) the method is scale invariant, i.e., scaling the data by an arbitrary scalar leaves the solution unchanged; 3) only basic operations of addition, subtraction, multiplication, and division are involved, which is highly efficient, and the learned $\bar{B}^{(v)}$ is sparse. Next, the $B^{(v)}$ required by FMvC is computed as

$$B^{(v)} = \bar{B}^{(v)} \left( \Lambda^{(v)} \right)^{-1/2}, \quad \Lambda^{(v)}_{jj} = \sum_{i=1}^{n} \bar{\beta}^{(v)}_{ij}. \tag{24}$$

This gives the $i$th diagonal element of the degree matrix $D$ as $d_{ii} = \sum_{j} w_{ij} = 1$. Therefore, $D$ is the identity matrix and, more importantly, $W$ is a doubly stochastic matrix.
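The whole construction can be summarized in a short Python sketch (our illustrative implementation of (22)-(24), assuming Euclidean distances and $k < m$):

```python
import numpy as np

def build_B(X, anchors, k):
    """X: n x d data of one view; anchors: m x d; keep k < m nearest anchors."""
    delta = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # squared distances, n x m
    idx = np.argsort(delta, axis=1)                               # sort anchors per point
    n, m = delta.shape
    B_bar = np.zeros((n, m))
    for i in range(n):
        d_sorted = delta[i, idx[i]]
        num = d_sorted[k] - d_sorted[:k]              # closed form (23), 0-indexed
        den = k * d_sorted[k] - d_sorted[:k].sum()
        B_bar[i, idx[i, :k]] = num / max(den, 1e-12)  # rows sum to 1, sparse
    # column rescaling of (24): makes W = B B^T doubly stochastic, so D = I
    return B_bar / np.sqrt(np.maximum(B_bar.sum(axis=0), 1e-12))
```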

E. COMPUTATIONAL COMPLEXITY
To further verify the running-time advantage of the proposed method, experiments testing the running time are included in the experimental section.
In this section, we analyze the computational complexity. Suppose the number of anchors is $m$, the number of samples is $n$, the number of features is $d$, the number of views is $v$, and the number of nearest neighbors is $k$. In line 1 of Algorithm 1, the computational complexity is $O(1)$ because anchor addresses are selected randomly. In line 2, constructing $B^{(v)}$ costs $O(mndv + nmv \log k)$ with the adaptive anchor-based graph construction method. In line 3, initializing $F$ costs $O(m^2 n + m^3)$ for the SVD of $B^{(v)}$. In lines 4 to 8, the complexity of the alternating iterations is at most $O((mnc + nc^2 + mc^2) v t)$, where $t$ denotes the number of iterations. We also need $O(nc)$ in line 9 to read the clustering labels from the label matrix; note that $m \ll n$, $c \ll n$, and $c \ll m$. In summary, the total cost of the method is $O(mndv + m^2 n + mnct)$, which is linear in the data size $n$.

IV. EXPERIMENTS
This section compares the proposed algorithm with other clustering algorithms on real-world datasets to illustrate its effectiveness. All code in the experiments was run in MATLAB R2020a on a machine with a 2.60 GHz Intel i7-10750H CPU, 16 GB of RAM, and Windows 10. All experiments were run in the same environment.

A. EXPERIMENTS ON REAL-WORLD DATASETS
To validate the clustering performance of FMvC, we compare it with eleven algorithms on eight datasets. We use the following datasets, commonly used in clustering-related papers, for our experiments.
• One-hundred plant species leaves data set (100leaves). There are 3 views with 1,600 data points from 100 plant species. Each object is described in the 3 views by shape descriptors, fine-scale margins, and texture histograms.
• 3 source data set (3source). The three views come from BBC, Reuters, and The Guardian. Each view has 169 news items that can be divided into 6 clusters.
• Amsterdam Library of Object Images (ALOI). This dataset consists of four views with four types of features: RGB, HSV, color similarity, and Haralick features. There are 11,025 images of 100 small objects.
• BBC data set (BBC). A collection of 685 documents from the BBC News website. Each document is divided into four segments and manually annotated with one of five topic labels.
• WebKB data set (WebKB). The 1,051 web documents are divided into two categories: 230 course pages and 821 non-course pages. Each instance has two representations: 2,949 full-text features describing the text content of the page, and 334 inlink features recording the anchor text of hyperlinks.
• Mfeat handwritten digit data set (Mfeat). Handwritten digits (0-9) from the UCI repository. A total of 2,000 samples, each represented by six types of features.
• Reuters data set (Reuters). A news dataset with 18,758 samples. Its 5 views correspond to different languages. Owing to its large amount of data, it is widely used for large-scale data experiments.
Table 1 summarizes the important parameters of each dataset.

B. EVALUATION METRICS
The clustering results are evaluated by comparing the label obtained for each instance with the label provided by the dataset. Three main metrics are used in the experiments to measure clustering performance: accuracy (ACC), normalized mutual information (NMI), and adjusted Rand index (ARI). To account for randomness, we run each algorithm ten times and report the average and standard deviation of the performance metrics.
ACC is the average clustering accuracy of matching the labels obtained by the method against the true labels. It is computed as

$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \psi\left( y_i, \mathrm{map}(p_i) \right), \tag{25}$$

where $p_i$ denotes the label generated for the $i$th sample by the algorithm and $y_i$ denotes the true label; $\psi(y_i, \mathrm{map}(p_i)) = 1$ when $y_i = \mathrm{map}(p_i)$ and 0 otherwise. $\mathrm{map}(\cdot)$ maps the predicted labels to the true labels, and the accuracy is computed with the best mapping. This metric lies in $[0, 1]$, with larger values indicating better performance. Letting $\xi$ be the correct cluster labels and $\xi'$ the cluster labels predicted by the algorithm, NMI is calculated as

$$\mathrm{NMI}(\xi, \xi') = \frac{\sum_{i,j} P(\xi_i, \xi'_j) \log \frac{P(\xi_i, \xi'_j)}{P(\xi_i) P(\xi'_j)}}{\sqrt{H(\Xi) \, H(\Xi')}}, \tag{26}$$

where $P(\xi_i)$ denotes the probability that a sample belongs to cluster $\xi_i$, $P(\cdot, \cdot)$ denotes the joint probability, $H(\cdot)$ denotes entropy, and $\Xi$ and $\Xi'$ denote the sets of $\xi$ and $\xi'$, respectively.
ARI is calculated as

$$\mathrm{ARI} = \frac{\sum_{i,j} C^2_{n_{ij}} - \left[ \sum_i C^2_{n_{r_i}} \sum_j C^2_{n_{\tau_j}} \right] / C^2_n}{\frac{1}{2} \left[ \sum_i C^2_{n_{r_i}} + \sum_j C^2_{n_{\tau_j}} \right] - \left[ \sum_i C^2_{n_{r_i}} \sum_j C^2_{n_{\tau_j}} \right] / C^2_n}, \tag{27}$$

where $n_{r_i}$ and $n_{\tau_i}$ denote the numbers of samples in the $i$th true cluster and the $i$th learned cluster, respectively, $n_{ij}$ is the number of samples shared by the two, and $C^m_n$ denotes the number of ways of selecting $m$ elements from a set of $n$. This metric lies in $[-1, 1]$, with larger values indicating better performance.
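For reference, a compact Python sketch of these metrics (our code, not the paper's evaluation scripts): ACC with the best mapping found by the Hungarian algorithm, and NMI/ARI taken from scikit-learn.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_acc(y_true, y_pred):
    """ACC with the optimal label mapping (the map(.) in the text)."""
    c = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((c, c), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[t, p] += 1                            # contingency table
    row, col = linear_sum_assignment(-cost)        # Hungarian: maximize matches
    return cost[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])              # permuted but perfect clustering
print(clustering_acc(y_true, y_pred),              # 1.0 after mapping
      normalized_mutual_info_score(y_true, y_pred),
      adjusted_rand_score(y_true, y_pred))
```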

C. COMPARISON ALGORITHMS
FMvC is compared with the following baseline methods. To distinguish them, we label single-view k-means and N-cut [25] as Sk-means and SNcut, respectively; both are single-view clustering algorithms. MSC [22] and CoregSC [7] are traditional multi-view clustering methods based on collaborative training. MkC [21] and MultiNMF [18] are multi-view clustering methods based on multi-view subspaces. ASMV [34], MGL [35], and MCGL [20] are weighting-based multi-view clustering methods. Since Sk-means and SNcut are only valid for single-view data, this paper runs them on each view separately and records the results of the best-performing view. The original code of each comparison algorithm was obtained from its authors' homepage and run in MATLAB with its default parameters; FMvC was also run in MATLAB to ensure a consistent environment.
In the parameter settings, $k$ is empirically set to 20 and the initial $\lambda$ is set to 1. $\lambda$ is adjusted adaptively for each dataset during clustering: in each iteration, $\lambda$ is doubled ($\lambda = 2\lambda$) if the number of connected components of the graph built from $B^{(v)}$ is smaller than the number of clusters $c$, and halved ($\lambda = \lambda / 2$) if it is larger.
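A sketch of this adaptive schedule (our illustration; treating the points and anchors as the two sides of a bipartite graph to count connected components is our assumption, not stated in the paper):

```python
import numpy as np
from scipy.sparse import csr_matrix, bmat
from scipy.sparse.csgraph import connected_components

def adapt_lambda(lam, B, c):
    """Double or halve lambda based on the components of the anchor graph."""
    A = bmat([[None, csr_matrix(B)], [csr_matrix(B.T), None]])   # bipartite graph
    n_comp, _ = connected_components(A, directed=False)
    if n_comp < c:
        return lam * 2.0      # too few components: strengthen the balance term
    if n_comp > c:
        return lam / 2.0      # too many components: weaken it
    return lam
```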

D. EXPERIMENTAL RESULTS
The comparison results are shown in Table 2, Table 3, and Table 4. Table 5 shows the results of the ablation experiments, which demonstrate the effectiveness of the non-negative constraint. Table 6 and Table 7 show the running times of the algorithms. Entries recorded as ''→ 0'' in the tables indicate results close to 0, not exactly 0. In this experiment, each method is run ten times; the best results are reported as percentages, with the standard deviation over the ten runs in parentheses. For ease of reading, the top two results on each dataset are bolded. Example plots of the graph reconstruction are given in Fig. 5, and comparative plots of the convergence speed of the objective function and of the balance term in Fig. 6, through the blue and yellow curves, respectively. Fig. 7 and Fig. 8 show the effect of different parameters on the performance of the algorithm. From these data, further analysis of FMvC leads to the following experimental results.

1) ANALYSIS OF IMPORTANT METRICS
FMvC achieves the highest clustering performance on most datasets for the listed metrics. For example, on 100leaves, the ACC of FMvC exceeds that of the second-best method, MCGL, by about 12%; on 3source, its NMI exceeds MGL's by about 5%; and on Mfeat, its ARI exceeds that of the second-best method, MCGL, by more than 5%. The superior performance of FMvC over the other methods is due to its avoidance of post-processing. MultiNMF has no results on the handwritten-digit datasets: such datasets usually contain negative feature values, and MultiNMF, being based on non-negative matrix factorization, cannot handle such data.
Compared with multi-view algorithms improved from spectral clustering, such as CoregSC and MSC, FMvC has a significant advantage. This may be because our method does not rely on learning a unified graph but instead uses the unified label matrix to obtain labels.
In addition, Fig. 5 gives example illustrations of the graph reconstruction of FMvC on 100leaves, Hdigit, and Mfeat. It can be seen visually that the graph reconstructed by $F F^T$ has a clear structure, with strong intra-cluster connections and weak inter-cluster connections. This structured graph reconstruction is what makes our method outperform the other methods in clustering performance. The random selection of anchor addresses could, in principle, make the clustering results inaccurate in some cases, but this did not happen in our experiments.

2) ABLATION STUDY
In this section, we further investigate the effect of the non-negative constraint on clustering performance. In Table 5, FMvC-A denotes the model without the non-negative constraint, i.e., $G = F$ in (15). The experimental results show that FMvC performs better than FMvC-A across the board, because the balance term (i.e., $\| F - G \|_F^2$) quickly converges to 0 under the non-negative constraint, which FMvC-A struggles to achieve. Similar conclusions regarding the non-negativity property can be found in [18].

3) RUNNING TIME
The running times of the algorithms are shown in Table 6; FMvC significantly outperforms the other methods and is the fastest on the 100leaves, 3source, Mfeat, WebKB, ALOI, and BBC datasets. FMvC's speed advantage over the other methods is also pronounced on the largest datasets. There are four main reasons. First, our method adaptively adjusts the parameters when solving the similarity matrix during clustering, rather than solving problem (10) directly. Second, it avoids the up-to-$O(n^3)$ time cost of eigen-decomposition by solving $F$ with SVD instead, reducing the time cost [7]. Third, the non-negative constraint greatly enhances interpretability, freeing FMvC from post-processing. Fourth, instead of a unified graph, this paper uses a unified label matrix, which converges quickly.
To further test the running-time efficiency of this method, we conducted comparison experiments with two other fast clustering algorithms, AWP [29] and FMCSE [36]. In the experiments, we ran each of them 20 times using the best-run parameters from their references and recorded the average values. Table 7 shows that the ACC and NMI of FMvC are not the best, mainly because FMvC randomly selects anchor addresses instead of using k-means. However, its running speed is far better than that of the comparison methods, which is the focus of this study. Because $B^{(v)}$ is constructed by randomly selecting anchor addresses, the algorithm runs fast; however, this also brings a drawback: excessive randomness may lead to poorly chosen anchors, which can degrade performance. This is an issue we will focus on in future research.

4) CONVERGENCE STUDY
Since FMvC is an iterative algorithm, its convergence must be examined, so experiments on the convergence speed of the algorithm were also conducted. In Fig. 6, experiments are performed mainly on the 100leaves, 3source, and Mfeat datasets, and the value of the objective function of FMvC is plotted as a function of the number of iterations. In each plot, the x-axis and y-axis indicate the number of iterations and the value of the objective function, respectively. FMvC usually converges within 10 iterations, mainly because we provide an optimal solution for each subproblem.
FMvC incorporates the balance term, i.e., the unified label matrix, into the objective function and sets the balance factor $\lambda$. Fig. 6 also plots the value of the balance term (i.e., $\| F - G \|_F^2$) against the number of iterations for FMvC on the 100leaves, 3source, and Mfeat datasets; the x-axis and y-axis indicate the number of iterations and the value of the balance term, respectively. $F$ and $G$ gradually approach each other at each iteration, indicating that the negative elements of $F$ gradually shrink or vanish; as can be seen from (15), $F - G$ is exactly the negative part of $F$. In Fig. 6, the value of the balance term converges to 0 within 10 iterations, which indicates that the balance term reaches convergence.
Fig. 7 and Fig. 8 show the relationship between clustering performance and the parameters. Although the number of anchors appears proportional to performance, there is no correlation between the two once there are sufficiently many anchors, so a reasonable number of anchors should be chosen. A balance factor of 1 appears to be a good choice. Selecting a reasonable $\lambda$ is largely empirical: in theory a sufficiently large $\lambda$ is preferable, while in practice it is best to balance the two terms of (13). The algorithm is not very sensitive to the number of neighbors; the best results are usually achieved with 9 nearest neighbors, and in some cases the effect of the number of nearest neighbors can be ignored.

V. CONCLUSION
In this paper, we propose a fast multi-view clustering method with the advantages of high interpretability, accuracy, and speed. FMvC exploits the unified view of spectral clustering: the similarity matrix is reconstructed into a structured graph through orthogonal and non-negative constraints, and such graphs have strong intra-cluster and weak inter-cluster connections. The non-negative constraints make the clustering easier to interpret, thus avoiding post-processing. In addition, a unified label matrix is added to accelerate convergence. Extensive experiments on real-world datasets show that FMvC has a significant advantage in time cost. In future work, we will explore more stable methods for selecting anchor points.