Article

Robust Low-Rank Graph Multi-View Clustering via Cauchy Norm Minimization

1 College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
2 Chongqing Key Laboratory of Nonlinear Circuits and Intelligent Information Processing, College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(13), 2940; https://doi.org/10.3390/math11132940
Submission received: 29 May 2023 / Revised: 25 June 2023 / Accepted: 28 June 2023 / Published: 30 June 2023

Abstract

Graph-based multi-view clustering methods aim to explore the partition patterns by utilizing a similarity graph. However, many existing methods construct a consensus similarity graph in the original multi-view space, which may lose information about the underlying low-dimensional space. Additionally, these methods often fail to effectively handle the noise present in the graph. To address these issues, a novel graph-based multi-view clustering method which combines spectral embedding, nonconvex low-rank approximation and noise processing into a unified framework is proposed. In detail, the proposed method constructs a tensor by stacking the inner products of the normalized spectral embedding matrices obtained from each similarity matrix. The obtained tensor is then decomposed into a low-rank tensor and a noise tensor. The low-rank tensor is constrained via a nonconvex low-rank tensor approximation, and a novel Cauchy norm with an upper bound is proposed to handle the noise. Finally, we derive the consensus similarity graph from the denoised low-rank tensor. Experiments on six real-world datasets demonstrate that the proposed method outperforms other state-of-the-art methods.

1. Introduction

Multi-view learning is a significant task in machine learning. Recently, a battery of multi-view learning methods have emerged, such as cross-view domain learning [1], multi-view classification [2,3], multi-view outlier detection [4] and multi-view clustering [5,6,7]. Multi-view data are ubiquitous in the real world; for example, images consist of texture, lighting and color, while videos are made up of audio and frames. Multiple views contain complementary and principal information, which can provide multi-view learning with more knowledge. Therefore, utilizing multi-view data for clustering tasks can yield more precise results compared to using only a single view.
The clustering method aims to explore the latent categorical information of multi-view data in an unsupervised manner. The performance of most clustering methods heavily depends on the learned similarity matrix. To improve the quality of the similarity matrix, numerous clustering approaches have been proposed; subspace learning, matrix decomposition [8,9,10,11] and graphs are widely used for multi-view clustering.
A graph is a powerful data structure that indicates the correlations among individuals and the structure of groups [12,13]. In this structure, nodes represent individuals, and the values of the edges indicate the strength of connections among individuals. The graph is usually represented as a similarity matrix, where elements determined by rows and columns denote the edges between individuals. Typically, the graph is constructed using the Euclidean distance with each element denoting the similarity between two samples.
Graph-based clustering methods typically obtain a consensus similarity matrix from multiple views. Subsequently, an additional clustering step, such as spectral clustering, is performed to obtain the clustering results. Although graph-based methods have achieved impressive performance, they still have some drawbacks. Most of the methods learn the similarity graph in the original data space, so the learned graph misses the structural information of the low-dimensional space. To address this limitation, the work in [14] constructs a similarity graph in the spectral embedding space, which captures both the low-dimensional information and the local structure. However, [14] only uses the spectral embedding and does not specifically process the noise present in the spectral embedding space. Furthermore, [14] uses a weighted tensor nuclear norm (assigning weights to each singular value) to impose a low-rank constraint on the tensor constructed from the inner products of the normalized spectral embedding matrices. Assigning optimal weights becomes a challenge when dealing with a large number of data views, because the weight assignment relies on a priori knowledge of the data, which is difficult to obtain directly.
To address these limitations, the proposed approach is as follows: (1) A tensor nuclear norm with a nonconvex approximation function [15,16,17] is utilized to constrain the low-rank tensor. This approach assigns weights to each singular value using a single parameter while constraining the contribution of each singular value to the nuclear norm to be close to 1. (2) Inspired by [18,19,20], a novel norm named the Cauchy norm with an upper bound is proposed to handle the noise in the tensor. Compared with the widely used $l_{2,1}$ norm, the Cauchy norm is more stable due to its asymptotic upper bound, which alleviates the influence of noise, especially in the case of large noise values. By using the Cauchy norm, we prevent large noise values from dominating the objective function. By combining these two strategies, we introduce a novel multi-view clustering method called Robust Low-rank Graph Multi-view Clustering (RLGMC) via Cauchy Norm Minimization. The proposed method can be optimized with heuristic algorithms such as particle swarm optimization [21], collaborative neurodynamic optimization [22] or the alternating direction method of multipliers (ADMM) [23]. The flowchart of the proposed method is illustrated in Figure 1. We unify spectral embedding, the low-rank constraint and the noise constraint into a unified framework. The tensor constructed from the spectral embedding space is decomposed into a robust low-rank tensor and a noise term. Subsequently, the consensus graph S is obtained from the robust low-rank tensor. Finally, the clustering results are obtained by performing spectral clustering on S. The main contributions of our work are as follows:
  • A novel multi-view clustering method referred to as RLGMC is proposed. In this method, we combine spectral embedding, low-rank tensor learning and noise constraints into a unified framework. By learning a robust tensor, the underlying structure implied in multiple views is effectively captured.
  • To enhance the tensor nuclear norm, a nonconvex low-rank approximation is employed. This approach assigns weights to each singular value via a nonconvex function, improving the performance of low-rank approximation while only requiring one hyperparameter.
  • We propose a norm called the Cauchy norm with an upper bound to handle the noise in the spectral embedding space. This norm suppresses the noise value, preventing large noise values from dominating the objective function.
  • An alternating optimization algorithm is designed to solve the proposed method. Additionally, experiments conducted on six real-world datasets demonstrate that the proposed method outperforms state-of-the-art methods.
The remainder of this paper is organized as follows. Section 2 gives a brief review of the related works. Section 3 presents the notations and preliminaries used throughout this paper. The details and the proposed method are presented in Section 4. In Section 5, an efficient iterative optimization algorithm is designed to optimize the proposed method. Section 6 verifies the effectiveness of our approach on six datasets. Finally, the conclusion is provided in Section 7.

2. Related Works

In this section, we introduce several popular clustering methods.

2.1. Deep Multi-View Clustering Methods

Deep multi-view clustering methods discover the clustering patterns via deep neural network architectures. The work in [24] mimics the attributes of ‘self-expression’ via the self-expressive layer. Wang et al. unified two schools of graph representation learning strategies via adversarial training in a minimax game [25]. To adapt graph neural networks to multi-view clustering tasks, Li et al. proposed a graph auto-encoder for clustering and constructed a weighted graph for the GAE via a generative graph representation model [26].

2.2. Subspace Clustering Methods

Subspace clustering methods learn a similarity matrix and then obtain clustering patterns via an additional clustering method. Sparse subspace clustering (SSC) and low-rank representation (LRR) are two representative methods, which explore sparse representation and low-rank representation, respectively [27,28]. However, both SSC and LRR are single-view methods and are limited by the inability to use complementary information from multiple views. To address this limitation, Zhang et al. explored the complementary information via a low-rank constraint [29]. To learn the higher-order information among multiple views, Xie et al. proposed a tensor nuclear norm based on tensor singular value decomposition to achieve a low-rank constraint on the rotated tensor [30]. The work in [31] learns an essential tensor for Markov chain-based spectral clustering.

2.3. Graph-Based Clustering Method

Graph-based methods typically try to find a consensus graph among multiple views. The work in [32] learns the similarity matrix by assigning optimal neighbors for each sample under a rank constraint on the Laplacian matrix. Nie et al. proposed a parameter-free method to learn optimal weights for each view [33]. Going a step further, to address the drawback of [33], Nie et al. proposed a novel self-weighted method that learns the consensus graph as the centroid of multiple views under a Laplacian rank constraint [34]. In [14], a framework for simultaneously learning spectral embeddings and low-rank tensor representations is proposed, achieving excellent performance.

3. Notations and Preliminaries

In this section, the notations, preliminaries and nomenclature declaration are given.
A summary of the primary notations is shown in Table 1. Calligraphic letters (e.g., $\mathcal{T}$), capital letters (e.g., $T$), bold lower-case letters (e.g., $\mathbf{t}$) and lower-case letters (e.g., $t$) denote tensors, matrices, vectors and scalars, respectively. The discrete fast Fourier transform (DFFT) of a tensor $\mathcal{T}$ along the third dimension and its inverse operation are defined as $\bar{\mathcal{T}} = \mathrm{fft}(\mathcal{T}, [\,], 3)$ and $\mathcal{T} = \mathrm{ifft}(\bar{\mathcal{T}}, [\,], 3)$, respectively.

3.1. Tensor Construction and Rotation Operations

The tensor construction operation is represented by $\Psi(\cdot)$. $\Psi(\cdot)$ constructs a tensor by stacking multiple matrices.
The tensor rotation operation is represented by $\Phi(\cdot)$. Given a tensor $\mathcal{Z} \in \mathbb{R}^{N \times M \times V}$, $\Phi(\cdot)$ rotates the size of $\mathcal{Z}$ to $N \times V \times M$.
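As a concrete illustration, the following minimal NumPy sketch shows one way the construction operation $\Psi(\cdot)$ and the rotation operation $\Phi(\cdot)$ could be realized; the function names construct_tensor and rotate_tensor are ours and not part of the original formulation.

```python
import numpy as np

def construct_tensor(matrices):
    """Psi(.): stack V matrices of size N x M into an N x M x V tensor."""
    return np.stack(matrices, axis=2)

def rotate_tensor(Z):
    """Phi(.): rotate an N x M x V tensor into an N x V x M tensor."""
    return np.transpose(Z, (0, 2, 1))

# Toy usage: three 5 x 4 matrices become a 5 x 4 x 3 tensor, rotated to 5 x 3 x 4.
mats = [np.random.rand(5, 4) for _ in range(3)]
T = construct_tensor(mats)
print(T.shape, rotate_tensor(T).shape)
```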

3.2. Tensor Nuclear Norm and Nonconvex Approximation

The tensor singular value decomposition (t-SVD) [35] of a tensor $\mathcal{A} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ can be formulated as follows:
$$\mathcal{A} = \mathcal{U} * \mathcal{S} * \mathcal{V}^{T}$$
where $\mathcal{U} \in \mathbb{R}^{n_1 \times n_1 \times n_3}$ and $\mathcal{V} \in \mathbb{R}^{n_2 \times n_2 \times n_3}$ are orthogonal tensors (a tensor $\mathcal{U}$ is orthogonal if $\mathcal{U} * \mathcal{U}^{T} = \mathcal{U}^{T} * \mathcal{U} = \mathcal{I}$), and $\mathcal{S} \in \mathbb{R}^{n_1 \times n_2 \times n_3}$ is an f-diagonal tensor (each of its frontal slices is a diagonal matrix). Furthermore, $*$ is the t-product defined in [35].
The tensor nuclear norm (TNN) proposed in [36] of a tensor $\mathcal{A}$ is as follows:
$$\|\mathcal{A}\|_{*} = \sum_{v=1}^{n_3} \sum_{j=1}^{r} \bar{\mathcal{S}}(j, j, v)$$
where $r = \min\{n_1, n_2\}$, and $\bar{\mathcal{S}}$ denotes the DFFT of $\mathcal{S}$, which contains the singular values of $\mathcal{A}$.
The nonconvex low-rank tensor approximation of the TNN has a tighter approximation of the original sparsity-regularized rank function than the TNN [15], and is represented as follows:
$$\|\mathcal{Z}\|_{*,\theta} = \sum_{v=1}^{V} \sum_{j=1}^{r} \phi\big(\sigma_j(\bar{S}^{(v)}), \theta\big) = \sum_{v=1}^{V} \sum_{j=1}^{r} \frac{(1+\theta)\,\sigma_j(\bar{S}^{(v)})}{\theta + \sigma_j(\bar{S}^{(v)})}$$
where $\phi$, $\sigma_j$ and $\theta$ denote the nonconvex approximation function, the $j$-th largest singular value and the function parameter, respectively.
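For intuition, the short NumPy sketch below evaluates both the TNN and its nonconvex approximation for a small random tensor by applying the DFFT along the third dimension and computing the singular values of each frontal slice; the helper name tnn_and_nonconvex is ours, and the small constant in the denominator is only a numerical safeguard.

```python
import numpy as np

def tnn_and_nonconvex(A, theta):
    """Sum of Fourier-domain singular values (TNN) and the nonconvex
    surrogate sum of (1+theta)*sigma / (theta+sigma) over all slices."""
    A_bar = np.fft.fft(A, axis=2)            # DFFT along the third dimension
    tnn, surrogate = 0.0, 0.0
    for v in range(A.shape[2]):
        sigma = np.linalg.svd(A_bar[:, :, v], compute_uv=False)
        tnn += sigma.sum()
        surrogate += ((1 + theta) * sigma / (theta + sigma + 1e-12)).sum()
    return tnn, surrogate

A = np.random.rand(6, 6, 3)
print(tnn_and_nonconvex(A, theta=2.0))
```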

3.3. Cauchy Norm with Upper Bound

The Cauchy norm with an upper bound of a tensor $\mathcal{E} \in \mathbb{R}^{N \times M \times V}$ can be formulated as follows:
$$\|\mathcal{E}\|_{Cauchy} = \sum_{i=1}^{M} \ln\big(1 + \gamma\,\|E(:, i)\|_2\big)$$
where $E$, obtained by $E = E_{(3)}$, is the third matricization of $\mathcal{E}$, and $\gamma$ is the scaling parameter. The value of $\gamma$ is positive.
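The following NumPy sketch computes the Cauchy norm with an upper bound and the $l_{2,1}$ norm for comparison; how the frontal slices are arranged into the matricization is an assumption made for illustration, as is the helper name cauchy_norm.

```python
import numpy as np

def matricize(E):
    """Stack the frontal slices of an N x M x V tensor vertically (assumed convention)."""
    return np.concatenate([E[:, :, v] for v in range(E.shape[2])], axis=0)

def cauchy_norm(E, gamma=0.2):
    """Sum over columns of log(1 + gamma * ||column||_2)."""
    return np.log1p(gamma * np.linalg.norm(matricize(E), axis=0)).sum()

def l21_norm(E):
    return np.linalg.norm(matricize(E), axis=0).sum()

E = np.random.rand(5, 8, 3)
# For gamma in (0, 1], the Cauchy norm never exceeds the l_{2,1} norm (Theorem 1).
print(cauchy_norm(E, gamma=0.2), l21_norm(E))
```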
Theorem 1.
Given a non-negative tensor $\mathcal{E} \in \mathbb{R}^{N \times M \times V}$, it is obvious that the Cauchy norm with an upper bound is less sensitive to large noise values compared to the $l_{2,1}$ norm, and the following inequality holds:
$$\|\mathcal{E}\|_{Cauchy} \leq \|\mathcal{E}\|_{2,1}$$
Proof. 
Clearly, $x$ is greater than $\ln(1 + \gamma x)$ if $x$ is larger than $1 - \frac{1}{\gamma}$. Since $\|\mathcal{E}\|_{Cauchy}$ equals $\sum_{i=1}^{M} \ln(1 + \gamma\|E(:,i)\|_2)$ and $\|\mathcal{E}\|_{2,1}$ equals $\sum_{i=1}^{M} \|E(:,i)\|_2$, where $E = E_{(3)}$, we can construct the following inequality:
$$\ln\big(1 + \gamma\,\|E(:, i)\|_2\big) \leq \|E(:, i)\|_2$$
The property shown in (6) helps prevent the noise values from dominating the objective function during optimization. Additionally, it is worth noting that each $\|E(:,i)\|_2$ must satisfy the condition $\|E(:,i)\|_2 \geq 1 - \frac{1}{\gamma}$. As a result, the value of $\gamma$ is confined within the range of 0–1 to ensure the validity of Inequality (6), since then $1 - \frac{1}{\gamma} \leq 0$ and the condition holds for all non-negative values.    □

3.4. Similarity Graph Construction and Spectral Embedding

There is a set of data matrices $\{X^{(v)}\}_{v=1}^{V}$, $X^{(v)} \in \mathbb{R}^{d_v \times N}$, where $N$ is the number of samples and $d_v$ is the dimension of the data in the $v$-th view. The work in [37] constructs the similarity graph by adaptively allocating weights according to the Euclidean distance between two points as follows:
$$\min_{S} \sum_{v=1}^{V} w_v \sum_{i,j=1}^{N} \|x_i^{v} - x_j^{v}\|_2^2\, S_{i,j} + \lambda \|S\|_F^2 \quad \mathrm{s.t.}\; S_{i,j} \geq 0,\; S_i \mathbf{1}^{T} = 1,\; \mathrm{rank}(L_S) = n - c$$
where $x_i^{v}$ and $x_j^{v}$ are the $i$-th and $j$-th data points in the $v$-th view, and $n$ and $c$ denote the number of samples and the number of clusters, respectively. $S_{i,j}$ denotes the consensus similarity graph and $\lambda$ is the balance parameter. The Laplacian matrix $L_S$ is calculated as $L_S = D - S$, where $D$ is the diagonal matrix with $D_{i,i} = \sum_{j=1}^{n} S_{i,j}$. The weight $w_v$ is equal to
$$w_v = \frac{1}{2\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n} \|x_i^{v} - x_j^{v}\|_2^2\, S_{i,j}}}.$$
Moreover, the work in [14] learns a low-dimensional embedding similarity graph to address the limitation that the graph learned from Equation (7) is inaccurate because of the presence of noise and redundancies. The low-dimensional spectral embedding graph can be obtained as follows:
$$\min_{S} \sum_{v=1}^{V} \sum_{i=1}^{n}\sum_{j=1}^{n} \|h_i^{(v)} - h_j^{(v)}\|_2^2\, S_{i,j} + \gamma \|S\|_F^2 \quad \mathrm{s.t.}\; S_{i,j} \geq 0,\; S_i \mathbf{1}^{T} = 1$$
where $h_i^{(v)}$ and $h_j^{(v)}$ are the $i$-th and $j$-th rows of the normalized spectral embedding matrix, which is obtained by normalizing each row of $H^{(v)}$ as follows:
$$h_i^{(v)} = \frac{h_i^{(v)}}{\sqrt{h_i^{(v)} h_i^{(v)T}}}$$
$H^{(v)} \in \mathbb{R}^{N \times c}$ is the spectral embedding matrix of the $v$-th view and can be obtained by the spectral clustering of the similarity graph of the $v$-th view:
$$\max_{H^{(v)}} \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I$$
where $G^{(v)} = D^{(v)-\frac{1}{2}} W^{(v)} D^{(v)-\frac{1}{2}}$ and $D^{(v)}$ is a diagonal matrix calculated by $D^{(v)}_{i,i} = \sum_{j=1}^{n} W^{(v)}_{i,j}$. The similarity graph $\{W^{(v)}\}_{v=1}^{V}$ is computed as
$$W^{(v)}_{i,j} = \frac{A^{(v)}_{i,k+1} - A^{(v)}_{i,j}}{k\, A^{(v)}_{i,k+1} - \sum_{m=1}^{k} A^{(v)}_{i,m}}$$
where $A^{(v)}_{i,j}$ is computed as $\|x_i^{(v)} - x_j^{(v)}\|_2^2$, the entries in each row of $A^{(v)}$ are then sorted in ascending order, and $k$ denotes the near-neighbor parameter.
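To make the construction of $W^{(v)}$ concrete, the sketch below builds the adaptive $k$-nearest-neighbor similarity graph of a single view following Equation (12); the final symmetrization step and the function name knn_similarity_graph are our own additions for illustration.

```python
import numpy as np

def knn_similarity_graph(X, k=10):
    """Build the similarity graph of one view (Equation (12)).
    X: d x n data matrix of the view, columns are samples."""
    n = X.shape[1]
    sq = np.sum(X ** 2, axis=0)
    A = sq[:, None] + sq[None, :] - 2 * X.T @ X        # squared Euclidean distances
    np.fill_diagonal(A, np.inf)                        # exclude self-distances
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(A[i])                         # neighbors ordered by distance
        a_sorted = A[i, idx]
        denom = k * a_sorted[k] - a_sorted[:k].sum() + 1e-12
        W[i, idx[:k]] = (a_sorted[k] - a_sorted[:k]) / denom
    return (W + W.T) / 2                               # symmetrize (common practice, an assumption)

X = np.random.rand(20, 100)                            # 20-dimensional features, 100 samples
W = knn_similarity_graph(X, k=10)
print(W.shape, W[0].sum())                             # each row sums to roughly 1
```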

4. The Proposed Method

According to Equation (9), the quality of the similarity graph $S$ depends on $D^{h(v)}_{i,j}$, where $D^{h(v)}_{i,j} = \|h_i^{(v)} - h_j^{(v)}\|_2^2 = 2 - 2\, h_i^{(v)} h_j^{(v)T}$. However, Equation (11) neglects the high-order information embedded in multi-view data and the principal information shared among multiple views. To overcome this limitation, a third-order tensor $\mathcal{B} \in \mathbb{R}^{n \times n \times V}$ is constructed by stacking each $H^{(v)} H^{(v)T}$. To better capture the correlations between different views, the dimension of $\mathcal{B}$ is rotated to $n \times V \times n$ [30,31,38]. It is expected that the third-order tensor $\mathcal{B}$ has a low-rank structure, since the sample correlations are consistent across views. Based upon this, the optimization problem can be formulated as follows:
$$\min_{H^{(v)}, \mathcal{B}} -\lambda \sum_{v=1}^{V} \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) + \tau \|\mathcal{B}\|_{*} \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I$$
where $\lambda$ is the balance parameter, $\|\cdot\|_{*}$ denotes the tensor nuclear norm, $I$ indicates the identity matrix and $\tau$ denotes the singular value threshold.
Despite the utilization of the tensor nuclear norm (TNN) in Equation (13) to capture low-rank information, it overlooks the presence of errors and noise in the spectral embedding space, leading to the learning of an inaccurate consensus graph. Additionally, solely relying on the TNN may not achieve a perfect tensor low-rank approximation. One potential solution is to employ the weighted TNN [39], but this introduces the challenge of determining appropriate weight values. To address these issues, inspired by [15,16], a nonconvex approximation TNN (see Section 3.2), which assigns different thresholds to different singular values, is used to constrain $\mathcal{B}$. Furthermore, in order to mitigate the influence of noise in the spectral embedding space and prevent large noise values from dominating the objective function, we propose a novel norm called the Cauchy norm with an upper bound (see Section 3.3). The overall formulation of the proposed method is as follows:
$$\begin{aligned} \min_{H^{(v)}, \mathcal{Z}, \mathcal{E}}\; & -\lambda \sum_{v=1}^{V} \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) + \|\mathcal{Z}\|_{*,\theta} + \alpha \|\mathcal{E}\|_{Cauchy} \\ \mathrm{s.t.}\; & H^{(v)T} H^{(v)} = I,\quad H^{(v)} H^{(v)T} = Z^{(v)} + E^{(v)}, \\ & \mathcal{Z} = \Phi\big(\Psi(\{Z^{(v)}\}_{v=1}^{V})\big),\quad \mathcal{E} = \Psi\big(\{E^{(v)}\}_{v=1}^{V}\big) \end{aligned}$$
where $G^{(v)}$ can be obtained by first calculating Equation (12) and then $D^{(v)}_{i,i} = \sum_{j=1}^{n} W^{(v)}_{i,j}$. $\mathcal{Z}$, $\mathcal{E}$ and $I$ denote the low-rank tensor, the noise tensor and the identity matrix, respectively. The $\Psi(\cdot)$ and $\Phi(\cdot)$ operations are the tensor construction and rotation operations, respectively (see Section 3.1). $\lambda$ and $\alpha$ are balance parameters, and $\theta$ is the hyperparameter of the nonconvex approximation TNN.

5. Optimization Algorithm

5.1. Optimization Steps

An efficient alternating optimization algorithm is designed to solve problem (14). The optimization problem is equivalent to the following subproblems.

5.1.1. H(v) Subproblem

With the other variables fixed, $H^{(v)}$ can be obtained by solving the following problem:
$$\arg\min_{H^{(v)}} -\lambda \sum_{v=1}^{V} \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I,\; H^{(v)} H^{(v)T} = Z^{(v)} + E^{(v)}$$
Optimizing the above problem is equivalent to optimizing each of the following subproblems:
$$\arg\max_{H^{(v)}} \lambda\, \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I,\; H^{(v)} H^{(v)T} = Z^{(v)} + E^{(v)}$$
With reference to the augmented Lagrangian function, problem (16) can be rewritten as
$$\arg\min_{H^{(v)}} -\lambda\, \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) + \frac{\mu}{2} \Big\| H^{(v)} H^{(v)T} - \Big(Z^{(v)} + E^{(v)} - \frac{Y^{(v)}}{\mu}\Big) \Big\|_F^2 \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I$$
where $Y^{(v)}$ is the multiplier and $\mu$ indicates the penalty parameter. Then, problem (17) is rewritten as
$$\arg\min_{H^{(v)}} -\lambda\, \mathrm{tr}\big(H^{(v)T} G^{(v)} H^{(v)}\big) + \frac{\mu}{2} \mathrm{tr}\big(H^{(v)} H^{(v)T} H^{(v)} H^{(v)T}\big) - \frac{\mu}{2} \mathrm{tr}\big(H^{(v)} H^{(v)T} (T^{(v)} + T^{(v)T})\big) \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I$$
where $T^{(v)} = Z^{(v)} + E^{(v)} - \frac{Y^{(v)}}{\mu}$. Similarly to [14], let $P^{(v)} \in \mathbb{R}^{n \times n}$ be a diagonal matrix. Its values on the diagonal can be calculated as follows:
$$P^{(v)}_{i,i} = \frac{1}{\sqrt{h_i^{(v)} h_i^{(v)T}}}$$
It is obvious that the normalized embedding matrix can be written as $P^{(v)} H^{(v)}$. Integrating Equation (19) into problem (18) yields
$$\arg\max_{H^{(v)}} \mathrm{tr}\big(H^{(v)T} Q^{(v)} H^{(v)}\big) \quad \mathrm{s.t.}\; H^{(v)T} H^{(v)} = I$$
where $Q^{(v)} = \lambda G^{(v)} + \frac{\mu}{2} P^{(v)} \big(T^{(v)} + T^{(v)T} - H^{(v)} H^{(v)T}\big) P^{(v)}$. According to [40], the optimal solution of (20) is obtained by taking the eigenvectors corresponding to the $c$ largest eigenvalues of $Q^{(v)}$.
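A minimal sketch of this update, assuming $Q^{(v)}$ has already been assembled: the orthogonality-constrained trace maximization is solved by an eigendecomposition, keeping the top-$c$ eigenvectors.

```python
import numpy as np

def update_H(Q, c):
    """Solve max tr(H^T Q H) s.t. H^T H = I by taking the eigenvectors of Q
    associated with its c largest eigenvalues."""
    Q = (Q + Q.T) / 2                      # guard against loss of symmetry from round-off
    _, eigvecs = np.linalg.eigh(Q)         # eigenvalues returned in ascending order
    return eigvecs[:, -c:]                 # columns spanning the top-c eigenspace

Q = np.random.rand(50, 50)
H = update_H(Q, c=5)
print(H.shape, np.allclose(H.T @ H, np.eye(5)))
```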

5.1.2. Z Subproblem

With the other variables fixed, $\mathcal{Z}$ is obtained by solving the following augmented Lagrangian function:
$$\arg\min_{\mathcal{Z}} \|\mathcal{Z}\|_{*,\theta} + \frac{\mu}{2} \|\mathcal{Z} - \mathcal{T}\|_F^2$$
where $T^{(v)} = H^{(v)} H^{(v)T} - E^{(v)} + \frac{Y^{(v)}}{\mu}$.
According to the proof in [16], the optimal solution of (21) is as follows:
$$\mathcal{Z} = \mathcal{U} * \mathrm{ifft}\big(S_{\phi\rho}(\bar{\Gamma})\big) * \mathcal{V}^{T}$$
where $\mathcal{T} = \mathcal{U} * \Gamma * \mathcal{V}^{T}$ is the t-SVD of $\mathcal{T}$, $\bar{\Gamma} = \mathrm{fft}(\Gamma, [\,], 3)$, $*$ denotes the t-product [35], $S_{\phi\rho}(\bar{\Gamma})^{(v)} = \mathrm{diag}\big\{ \max\big\{ \bar{\Gamma}^{(v)}_{i,i} - \frac{1}{\rho}\, \partial\phi\big(\sigma_i(\bar{\Gamma}^{(v)})\big),\; 0 \big\} \big\}$ and $\partial\phi$ indicates the first-order derivative of the nonconvex approximation function $\phi$.
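The sketch below illustrates this update in NumPy: the tensor is moved to the Fourier domain, the singular values of every frontal slice are shrunk using the derivative of the nonconvex surrogate, and the result is transformed back; treating $\rho$ as the penalty $\mu$ and evaluating the derivative at the current singular values is a simplifying assumption of this illustration.

```python
import numpy as np

def phi_derivative(sigma, theta):
    """First-order derivative of phi(sigma, theta) = (1+theta)*sigma / (theta+sigma)."""
    return theta * (1 + theta) / (theta + sigma) ** 2

def update_Z(T, theta, rho):
    """Shrink the Fourier-domain singular values of each frontal slice of T."""
    T_bar = np.fft.fft(T, axis=2)
    Z_bar = np.zeros_like(T_bar)
    for v in range(T.shape[2]):
        U, s, Vh = np.linalg.svd(T_bar[:, :, v], full_matrices=False)
        s_shrunk = np.maximum(s - phi_derivative(s, theta) / rho, 0)
        Z_bar[:, :, v] = (U * s_shrunk) @ Vh
    return np.real(np.fft.ifft(Z_bar, axis=2))

Z = update_Z(np.random.rand(30, 30, 4), theta=2.0, rho=1.0)
print(Z.shape)
```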

5.1.3. E Subproblem

With the other variables fixed, the $\mathcal{E}$ subproblem can be formulated as follows:
$$\arg\min_{\mathcal{E}} \alpha \|\mathcal{E}\|_{Cauchy} + \frac{\mu}{2} \|\mathcal{E} - \mathcal{A}\|_F^2$$
where $A^{(v)} = H^{(v)} H^{(v)T} - Z^{(v)} + \frac{Y^{(v)}}{\mu}$. We introduce a theorem to solve the above optimization problem.
Theorem 2.
Given a tensor $\mathcal{B} \in \mathbb{R}^{N \times M \times V}$, we give the optimization problem as
$$\arg\min_{\mathcal{W}} \lambda \|\mathcal{W}\|_{Cauchy} + \frac{\rho}{2} \|\mathcal{W} - \mathcal{B}\|_F^2$$
and the optimal solution of problem (24) is
$$W_i = \begin{cases} \dfrac{\|D_i\|_2 - \frac{\gamma\lambda}{\epsilon}}{\|D_i\|_2}\, D_i, & \|D_i\|_2 > \frac{\gamma\lambda}{\epsilon} \\ 0, & \|D_i\|_2 \leq \frac{\gamma\lambda}{\epsilon} \end{cases} \qquad \epsilon = \rho - \gamma^2\lambda,\quad D = \frac{\rho B}{\epsilon}$$
where $W = W_{(3)}$ and $B = B_{(3)}$ are the third matricizations of $\mathcal{W}$ and $\mathcal{B}$, respectively, $W_i$ and $D_i$ denote the $i$-th columns of $W$ and $D$, and $\gamma$ is the scaling parameter of the Cauchy norm.
Proof. 
Using the second-order Taylor expansion $\ln(1+x) = x - \frac{x^2}{2} + \Theta(x^3)$, where $\Theta(x^3)$ is the remainder term of the Taylor expansion, $\|\mathcal{W}\|_{Cauchy}$ can be expanded as follows:
$$\|\mathcal{W}\|_{Cauchy} = \sum_{i=1}^{n} \Big( \gamma \|W_i\|_2 - \frac{(\gamma \|W_i\|_2)^2}{2} + \Theta\big((\gamma \|W_i\|_2)^3\big) \Big)$$
There is a drawback to using the second-order Taylor expansion when $x$ is large, as it can result in inaccuracies. To limit the effect of this drawback, $\gamma$ is set to a small value, such as 0.3. As shown in Figure 2, choosing an appropriate scaling parameter (not so small that the information about the noise is lost entirely) satisfies the condition stated in Equation (26). Assuming an ideal value of $\gamma$ (which is often challenging to achieve in practice), substituting Equation (26) into problem (24) gives
$$\arg\min_{W} \sum_{i=1}^{n} \Big( \gamma\lambda \|W_i\|_2 - \frac{\lambda (\gamma \|W_i\|_2)^2}{2} + \frac{\rho}{2} \|W_i - B_i\|_2^2 \Big)$$
Let $\epsilon = \rho - \gamma^2\lambda$ and $D = \frac{\rho B}{\epsilon}$; then, Equation (27) can be simplified as follows:
$$\arg\min_{W} \gamma\lambda \|W\|_{2,1} + \frac{\epsilon}{2} \|W - D\|_F^2$$
According to [28], the optimal solution is
$$W_i = \begin{cases} \dfrac{\|D_i\|_2 - \frac{\gamma\lambda}{\epsilon}}{\|D_i\|_2}\, D_i, & \|D_i\|_2 > \frac{\gamma\lambda}{\epsilon} \\ 0, & \|D_i\|_2 \leq \frac{\gamma\lambda}{\epsilon} \end{cases}$$
   □
Based on the above theorem, the optimal solution of problem (23) is as follows.
$$E_i = \begin{cases} \dfrac{\|D_i\|_2 - \frac{\gamma\alpha}{\epsilon}}{\|D_i\|_2}\, D_i, & \|D_i\|_2 > \frac{\gamma\alpha}{\epsilon} \\ 0, & \|D_i\|_2 \leq \frac{\gamma\alpha}{\epsilon} \end{cases} \qquad \epsilon = \mu - \gamma^2\alpha,\quad D = \frac{\mu A}{\epsilon}$$
where $E = E_{(3)}$ and $A = A_{(3)}$ are the third matricizations of $\mathcal{E}$ and $\mathcal{A}$, respectively.
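A small sketch of this column-wise shrinkage, applied directly to the matricized tensor; the reconstruction of the $\epsilon$ term follows our reading of Theorem 2 and should be treated as an assumption, as should the function name update_E.

```python
import numpy as np

def update_E(A3, alpha, mu, gamma=0.2):
    """Column-wise shrinkage for the E-subproblem (Theorem 2 with lambda -> alpha, rho -> mu).
    A3 is the third matricization of the tensor A."""
    eps = mu - gamma ** 2 * alpha          # assumed form of the epsilon term
    D = mu * A3 / eps
    col_norms = np.linalg.norm(D, axis=0)
    scale = np.maximum(col_norms - gamma * alpha / eps, 0) / (col_norms + 1e-12)
    return D * scale                        # columns with small norm are zeroed out

E3 = update_E(np.random.randn(40, 60), alpha=10.0, mu=1.0, gamma=0.2)
print(E3.shape, int((np.linalg.norm(E3, axis=0) > 0).sum()))
```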

5.1.4. Remaining Steps

The Lagrange multipliers are updated as follows.
$$Y^{(v)} = Y^{(v)} + \mu\big(H^{(v)} H^{(v)T} - Z^{(v)} - E^{(v)}\big)$$
The entire process is summarized in Algorithm 1. Once the similarity matrix $S$ is obtained, the final clustering results are calculated by clustering $S$ with the spectral clustering method.
Algorithm 1: RLGMC for multi-view clustering
Require: Multi-view data $X^{(v)} \in \mathbb{R}^{n \times d_v}$, $v = 1, 2, \ldots, V$; parameters $\lambda$, $\alpha$, $\theta$ and $\gamma$.
1: Compute $W^{(v)}$ by Equation (12), calculate $D^{(v)}$ by $D^{(v)}_{i,i} = \sum_{j=1}^{n} W^{(v)}_{i,j}$, and finally calculate $G^{(v)} = D^{(v)-\frac{1}{2}} W^{(v)} D^{(v)-\frac{1}{2}}$;
2: Set $k = 0$, $\mu^{0} = 10^{-3}$, $\beta = 2$, $\mathcal{Z}^{0} = \mathcal{E}^{0} = \mathcal{Y}^{0} = H^{(v)} = 0$, $\mu_{max} = 10^{10}$ and $\varepsilon = 10^{-6}$;
3: while not converged do
4:     for each $v \in [1, V]$ do
5:         Update $H^{(v), k+1}$ by solving problem (20);
6:     end for
7:     Obtain $\mathcal{Z} = \Phi(\Psi(Z^{(1)}, Z^{(2)}, \ldots, Z^{(V)}))$;
8:     Update $\mathcal{Z}^{k+1}$ by Equation (22);
9:     Update $\mathcal{E}^{k+1}$ by Equation (30);
10:    for each $v \in [1, V]$ do
11:        Update $Y^{(v), k+1}$ by Equation (31);
12:    end for
13:    Update $\mu^{k+1}$ by $\mu^{k+1} = \min(\beta \mu^{k}, \mu_{max})$;
14:    Check the convergence condition: $\|H^{(v), k+1} H^{(v)T, k+1} - Z^{(v), k+1} - E^{(v), k+1}\|_{\infty} \leq \varepsilon$;
15:    $k = k + 1$;
16: end while
17: Construct the consensus graph $S$ by Equation (9);
Ensure: Robust consensus graph $S$.

5.2. Computational Complexity

The computational complexity of Algorithm 1 is analyzed as follows. Updating $H^{(v)}$ requires calculating the leading eigenvectors of $Q^{(v)}$, which takes $O(cn^2)$ on each view. Since the FFT and inverse FFT operations cost $O(Vn^2\log(n))$ in total, and computing the SVD of each frontal slice of $\bar{\Gamma}$ takes $O(V^2 n^2)$, the total complexity of updating $\mathcal{Z}$ is $O(Vn^2\log(n) + V^2 n^2)$. Updating $\mathcal{E}$ takes $O(n^2 V)$ per iteration, and the spectral clustering step costs $O(n^3)$. Therefore, the computational complexity of RLGMC for multi-view clustering is $O(K(Vcn^2 + Vn^2\log(n) + V^2 n^2 + n^2 V) + n^3) \approx O(KVn^2\log(n) + n^3)$, where $K$ denotes the number of iterations.

6. Experiments

6.1. Datasets

In this section, six multi-view datasets (ORL, 20newsgroups, COIL-20, 100leaves, UCI digits and Handwritten) are used to evaluate the performance of the proposed model. The information on all the datasets is summarized in Table 2.

6.2. Comparison Methods

The method proposed in this paper is compared with the following methods.
  • SSC [27]: Sparse subspace clustering explores the sparse representation. The parameters of the SSC are set to the default parameters given in the program. Since it is a single-view method, the result is obtained from the first view.
  • LRR [28]: Low-rank representation explores the low-rank subspace representation by nuclear norm. For LRR, the parameter is selected from [0.1, 0.5, ⋯, 4.5, 4.9]. Since it is a single-view method, the result is obtained from the first view.
  • LT-MSC [29]: LT-MSC explores the high-order correlation of the multi-view data using a low-rank constraint. For LT-MSC, the parameter is selected from [$10^{-3}$, $10^{-2}$, ⋯, $10^{1}$, $10^{2}$].
  • t-SVD-MSC [30]: t-SVD-MSC learns the low-rank subspace representation by the TNN and the tensor rotation operation. For t-SVD-MSC, the parameter is selected from [$10^{-2}$, $10^{-1}$, ⋯, $10^{1}$, $10^{2}$].
  • ETLMSC [31]: ETLMSC learns the essential tensor constrained by TNN from the transition probability matrices. Then, the final clustering results are obtained through the Markov chain-based spectral clustering method. The parameter of ETLMSC is tuned from 0.001 to 5.
  • MCGC [42]: MCGC reduces the disagreements among multiple views by a co-regularization term. The parameters of MCGC are set to 0.6.
  • GMC [43] (https://github.com/cshaowang/gmc, 15 January 2023): GMC proposes a joint framework consisting of the learning of the similarity-induced graph, the learning of the unified graph and clustering tasks. For GMC, the default parameter is 1.
  • DGF [44] (https://github.com/youweiliang/ConsistentGraphLearning, 15 January 2023): DGF learns the consistency and inconsistency among multiple views in a unified optimization model. For DGF, the two parameters are selected from [$10^{-6}$, $10^{-4}$] and [$10^{4}$, $10^{6}$], respectively.
  • CGL [14] (https://github.com/guanyuezhen/CGL, 15 January 2023): CGL simultaneously learns spectral embedding and explores the low-rank tensor representation. The parameters of CGL are selected from [1, 5, 10, 50, 100, 500, 1000, 5000].

6.3. Evaluation Metrics

To evaluate the performance of all methods, the following metrics are adopted.
Normalized mutual information (NMI) measures the similarity between the predicted partition and the ground-truth partition. The NMI is calculated as:
$$NMI(C, C') = \frac{\sum_{i=1}^{K} \sum_{j=1}^{S} |C_i \cap C'_j| \log \frac{N |C_i \cap C'_j|}{|C_i|\,|C'_j|}}{\sqrt{\Big(\sum_{i=1}^{K} |C_i| \log \frac{|C_i|}{N}\Big) \Big(\sum_{j=1}^{S} |C'_j| \log \frac{|C'_j|}{N}\Big)}}$$
where $C$ and $C'$ represent the predicted partition and the ground-truth partition, respectively. Accuracy (ACC) measures the percentage of samples that are correctly predicted. It can be defined as follows:
$$ACC = \frac{\sum_{i=1}^{N} \delta\big(t_i, map(p_i)\big)}{N}$$
where $t_i$ and $p_i$ denote the true label and the predicted label, respectively. The function $map(\cdot)$ denotes the best permutation mapping function [45]. $\delta$ is formulated as follows:
$$\delta(t, p) = \begin{cases} 1, & t = p \\ 0, & t \neq p \end{cases}$$
The remaining metrics, including precision, recall, F-score and the adjusted Rand index (AR), view clustering as a series of pairwise decisions and evaluate these decisions from different perspectives. The details of their definitions can be found in [46].
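As an illustration of how two of these metrics can be computed in practice, the sketch below obtains ACC by finding the best permutation mapping with the Hungarian algorithm and uses scikit-learn for NMI; labels are assumed to be integers starting from 0.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(true_labels, pred_labels):
    """ACC: best permutation mapping between predicted and true labels."""
    true_labels, pred_labels = np.asarray(true_labels), np.asarray(pred_labels)
    k = int(max(true_labels.max(), pred_labels.max())) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(true_labels, pred_labels):
        count[p, t] += 1                              # co-occurrence of predicted and true labels
    row, col = linear_sum_assignment(-count)          # maximize the number of matched samples
    return count[row, col].sum() / len(true_labels)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])
print(clustering_accuracy(y_true, y_pred), normalized_mutual_info_score(y_true, y_pred))
```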
The higher values of the above six metrics indicate better performance. All methods including the proposed method were performed ten times for clustering, and then the average and standard deviation of the metrics are obtained.

6.4. Parameter Settings

For the proposed method, λ is tuned in the range of [1, 5, 10, 15, 20, 50, 100, 500, 1000, 2000, 3000, 5000], α is tuned in the range of [1, 5, 10, 50, 100, 500, 1000], θ is tuned in the range of [0.5, 1, 1.2, 1.5, 2, 2.5, 3, 3.5, 4] and the scaling parameter γ is fixed as 0.2 .

6.5. Analysis of Clustering Results

As shown in Table 3, the superiority of the multi-view models compared to the single-view models indicates that complementary information in multiple views is necessary for clustering tasks.
Compared with the three subspace learning methods (LT-MSC, t-SVD-MSC, ETLMSC), the proposed method outperforms LT-MSC, t-SVD-MSC, and ETLMSC on the ORL, COIL-20, 100leaves, and UCI digits datasets. For example, the proposed method improves accuracy by 13.03%, 5.35% and 8.15% over ETLMSC, t-SVD-MSC and LT-MSC. On the Handwritten and 20newsgroups datasets, the proposed method is very close to t-SVD-MSC, with a slightly lower accuracy, which is 1.96% and 2.4% less than t-SVD-MSC in ACC on the two datasets, respectively. These results highlight the effectiveness of the proposed method in learning accurate similarity by using spectral embedding and low-rank tensor approximation.
Compared with other graph-based methods, the proposed method has obvious advantages. The proposed method improves the COIL-20 accuracy by 2.69%, 4.07%, 9.01% and 8.25% over DGF, CGL, GMC and MCGC. On the UCI digits dataset, the proposed method achieves 9.79%, 1.7%, 2.13%, 0.61%, 1.93% and 2.15% improvement over the second method in terms of ACC, NMI, precision, recall, F-score and AR. On the 20newsgroups dataset, although there is a slight disadvantage in ACC compared to CGL and GMC, the proposed method still provides improvements compared to the inspired method (CGL). On the ORL dataset, the proposed method has 2.54% and 0.26% improvement in terms of ACC and NMI compared to CGL. The results illustrate that utilizing a nonconvex function to achieve the low-rank tensor learning and Cauchy norm to handle noise can refine the clustering performance.
The visualizations of the learned similarity graphs on the ORL and Handwritten datasets are provided in Figure 3 and Figure 4. Compared with other methods, the block structure of the similarity graph learned by the proposed method is the clearest among all the methods, indicating its suitability for clustering tasks.

6.6. Parameter Sensitivity and Time Complexity Analysis

There are four hyperparameters in our method, i.e., two balance parameters λ and α , a nonconvex function parameter θ and a scale parameter γ . The parameters are analyzed on six datasets, λ is tuned from [5, 10, 15, 20, 50, 100, 500, 1000, 5000], α is tuned from [1, 5, 10, 50, 100, 500, 1000, 5000], the range of θ is [0.5, 1, 2, 3, 4] and γ is chosen from [0.001, 0.01, 0.1, 0.2, 0.25]. We only display the ACC values for different combinations of parameters.
First, we analyze λ and α with fixed θ = 2.8 and γ = 0.2 , and the results are shown in Figure 5. Then, θ and γ are analyzed with the optimal λ and α , and the results are shown in Figure 6. In Figure 5, the x and y axes denote the values of λ and α . In Figure 6, the x and y axes denote the values of θ and γ . In conclusion, the proposed method is more stable in the other datasets than in the 20newsgroups and 100leaves datasets.
We show the running time of different methods on each dataset in Table 4. Graph-based methods are more time-saving than subspace learning methods. Compared with CGL, the proposed method runs faster, indicating that the designed optimization algorithm is more efficient.

7. Conclusions

In this paper, we propose a novel multi-view graph-based method named RLGMC. Firstly, the embedding matrices are constructed using spectral embedding. Secondly, the Gram matrices obtained from the inner products of the normalized embedding matrices are reorganized into a tensor. Subsequently, we employ a nonconvex low-rank tensor approximation to capture the high-order principal information among multiple views. To handle the noise in the spectral embedding space, a novel Cauchy norm with an upper bound is introduced, which suppresses noise through the Cauchy function and the scaling parameter. By integrating spectral embedding, the nonconvex low-rank tensor approximation and the Cauchy norm into a unified framework, the robust low-rank tensor is obtained. Finally, the consensus similarity graph is obtained from the robust low-rank tensor. Experimental results on six datasets demonstrate that the proposed method achieves superior performance and represents an effective improvement over the method that inspired it (CGL).
Although the proposed method achieves excellent performance, it has two main drawbacks. Firstly, determining the value of the nonconvex function parameter $\theta$ in practical applications is challenging. Secondly, the optimization step of the nonconvex approximation TNN, which involves singular value decomposition, can be time-consuming. We aim to address these deficiencies in future research.

Author Contributions

Conceptualization, X.P. and H.C.; methodology, X.P. and B.P.; writing—original draft preparation, X.P.; writing—review and editing, H.C.; supervision, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Natural Science Foundation of Chongqing, China (Grant No. cstc2021jcyj-msxmX1169) and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJQN202200207).

Data Availability Statement

The data used to support the findings of the study are available from the first author upon request. The author’s email address is [email protected].

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. Tang, J.; Shu, X.; Li, Z.; Jiang, Y.G.; Tian, Q. Social Anchor-Unit Graph Regularized Tensor Completion for Large-Scale Image Retagging. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2027–2034.
  2. Han, Z.; Zhang, C.; Fu, H.; Zhou, J.T. Trusted Multi-View Classification with Dynamic Evidential Fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 2551–2566.
  3. Liu, B.; Che, Z.; Zhong, H.; Xiao, Y. A Ranking Based Multi-View Method for Positive and Unlabeled Graph Classification. IEEE Trans. Knowl. Data Eng. 2023, 35, 2220–2230.
  4. Zhao, H.; Liu, H.; Ding, Z.; Fu, Y. Consensus Regularized Multi-View Outlier Detection. IEEE Trans. Image Process. 2018, 27, 236–248.
  5. Hu, J.; Pan, Y.; Li, T.; Yang, Y. TW-Co-MFC: Two-level weighted collaborative fuzzy clustering based on maximum entropy for multi-view data. Tsinghua Sci. Technol. 2021, 26, 185–198.
  6. Zhang, X.; Zhang, X.; Liu, H.; Liu, X. Multi-Task Multi-View Clustering. IEEE Trans. Knowl. Data Eng. 2016, 28, 3324–3338.
  7. Yu, H.; Lian, Y.; Xu, X.; Zhao, X. Mixture Self-Paced Learning for Multi-view K-Means Clustering. In Proceedings of the 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Orlando, FL, USA, 6–10 November 2017; pp. 1210–1215.
  8. Che, H.; Wang, J.; Cichocki, A. Bicriteria Sparse Nonnegative Matrix Factorization via Two-Timescale Duplex Neurodynamic Optimization. IEEE Trans. Neural Netw. Learn. Syst. 2021, 1–11.
  9. Li, C.; Che, H.; Leung, M.F.; Liu, C.; Yan, Z. Robust multi-view non-negative matrix factorization with adaptive graph and diversity constraints. Inf. Sci. 2023, 634, 587–607.
  10. Che, H.; Wang, J. A nonnegative matrix factorization algorithm based on a discrete-time projection neural network. Neural Netw. 2018, 103, 63–71.
  11. Chen, K.; Che, H.; Li, X.; Leung, M.F. Graph non-negative matrix factorization with alternative smoothed l0 regularizations. Neural Comput. Appl. 2023, 35, 9995–10009.
  12. Yang, X.; Che, H.; Leung, M.F.; Liu, C. Adaptive graph nonnegative matrix factorization with the self-paced regularization. Appl. Intell. 2022, 53, 15818–15835.
  13. Liu, C.; Li, R.; Wu, S.; Che, H.; Jiang, D.; Yu, Z.; Wong, H.S. Self-Guided Partial Graph Propagation for Incomplete Multiview Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–14.
  14. Li, Z.; Tang, C.; Liu, X.; Zheng, X.; Zhang, W.; Zhu, E. Consensus Graph Learning for Multi-View Clustering. IEEE Trans. Multimed. 2022, 24, 2461–2472.
  15. Chen, Y.; Guo, Y.; Wang, Y.; Wang, D.; Peng, C.; He, G. Denoising of Hyperspectral Images Using Nonconvex Low Rank Matrix Approximation. IEEE Trans. Geosci. Remote Sens. 2017, 55, 5366–5380.
  16. Chen, Y.; Wang, S.; Peng, C.; Hua, Z.; Zhou, Y. Generalized Nonconvex Low-Rank Tensor Approximation for Multi-View Subspace Clustering. IEEE Trans. Image Process. 2021, 30, 4022–4035.
  17. Pan, B.; Li, C.; Che, H. Nonconvex low-rank tensor approximation with graph and consistent regularizations for multi-view subspace learning. Neural Netw. 2023, 161, 638–658.
  18. Li, X.; Lu, Q.; Dong, Y.; Tao, D. Robust Subspace Clustering by Cauchy Loss Function. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2067–2078.
  19. Guan, N.; Liu, T.; Zhang, Y.; Tao, D.; Davis, L.S. Truncated Cauchy Non-Negative Matrix Factorization. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 246–259.
  20. Li, J.; Che, H.; Liu, X. Circuit Design and Analysis of Smoothed l0 Norm Approximation for Sparse Signal Reconstruction. Circuits Syst. Signal Process. 2023, 42, 2321–2345.
  21. Vaze, R.; Deshmukh, N.; Kumar, R.; Saxena, A. Development and application of Quantum Entanglement inspired Particle Swarm Optimization. Knowl.-Based Syst. 2021, 219, 106859.
  22. Che, H.; Wang, J.; Cichocki, A. Sparse signal reconstruction via collaborative neurodynamic optimization. Neural Netw. 2022, 154, 255–269.
  23. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
  24. Ji, P.; Zhang, T.; Li, H.; Salzmann, M.; Reid, I. Deep Subspace Clustering Networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 4 December 2017; pp. 23–32.
  25. Wang, H.; Wang, J.; Wang, J.; Zhao, M.; Zhang, W.; Zhang, F.; Xie, X.; Guo, M. GraphGAN: Graph Representation Learning with Generative Adversarial Nets. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, New Orleans, LA, USA, 2 February 2018; AAAI Press: Palo Alto, CA, USA, 2018.
  26. Li, X.; Zhang, H.; Zhang, R. Adaptive Graph Auto-Encoder for General Data Clustering. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 9725–9732.
  27. Elhamifar, E.; Vidal, R. Sparse Subspace Clustering: Algorithm, Theory, and Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 2765–2781.
  28. Liu, G.; Lin, Z.; Yan, S.; Sun, J.; Yu, Y.; Ma, Y. Robust Recovery of Subspace Structures by Low-Rank Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184.
  29. Zhang, C.; Fu, H.; Liu, S.; Liu, G.; Cao, X. Low-Rank Tensor Constrained Multiview Subspace Clustering. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1582–1590.
  30. Xie, Y.; Tao, D.; Zhang, W.; Liu, Y.; Zhang, L.; Qu, Y. On Unifying Multi-View Self-Representations for Clustering by Tensor Multi-Rank Minimization. Int. J. Comput. Vis. 2018, 126, 1157–1179.
  31. Wu, J.; Lin, Z.; Zha, H. Essential Tensor Learning for Multi-View Spectral Clustering. IEEE Trans. Image Process. 2019, 28, 5910–5922.
  32. Nie, F.; Wang, X.; Huang, H. Clustering and Projected Clustering with Adaptive Neighbors. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 977–986.
  33. Nie, F.; Li, J.; Li, X. Parameter-Free Auto-Weighted Multiple Graph Learning: A Framework for Multiview Clustering and Semi-Supervised Classification. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 24 February 2017; AAAI Press: Palo Alto, CA, USA, 2016; pp. 1881–1887.
  34. Nie, F.; Li, J.; Li, X. Self-Weighted Multiview Clustering with Multiple Graphs. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; AAAI Press: Palo Alto, CA, USA, 2017; pp. 2564–2570.
  35. Kilmer, M.E.; Braman, K.S.; Hao, N.; Hoover, R.C. Third-Order Tensors as Operators on Matrices: A Theoretical and Computational Framework with Applications in Imaging. SIAM J. Matrix Anal. Appl. 2013, 34, 148–172.
  36. Kilmer, M.E.; Martin, C.D. Factorization strategies for third-order tensors. Linear Algebra Its Appl. 2011, 435, 641–658.
  37. Nie, F.; Cai, G.; Li, X. Multi-View Clustering and Semi-Supervised Classification with Adaptive Neighbours. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; AAAI Press: Palo Alto, CA, USA, 2017; pp. 2408–2414.
  38. Gao, Q.; Xia, W.; Wan, Z.; Deyan, X.; Zhang, P. Tensor-SVD Based Graph Learning for Multi-View Subspace Clustering. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3930–3937.
  39. Gu, S.; Xie, Q.; Meng, D.; Zuo, W.; Feng, X.; Zhang, L. Weighted Nuclear Norm Minimization and Its Applications to Low Level Vision. Int. J. Comput. Vision 2017, 121, 183–208.
  40. Li, Y.; Nie, F.; Huang, H.; Huang, J. Large-Scale Multi-View Spectral Clustering via Bipartite Graph. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; AAAI Press: Palo Alto, CA, USA, 2015; pp. 2750–2756.
  41. Xie, D.; Gao, Q.; Deng, S.; Yang, X.; Gao, X. Multiple graphs learning with a new weighted tensor nuclear norm. Neural Netw. 2021, 133, 57–68.
  42. Zhan, K.; Nie, F.; Wang, J.; Yang, Y. Multiview Consensus Graph Clustering. IEEE Trans. Image Process. 2019, 28, 1261–1270.
  43. Wang, H.; Yang, Y.; Liu, B. GMC: Graph-Based Multi-View Clustering. IEEE Trans. Knowl. Data Eng. 2020, 32, 1116–1129.
  44. Liang, Y.; Huang, D.; Wang, C.D. Consistency Meets Inconsistency: A Unified Graph Learning Framework for Multi-view Clustering. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 1204–1209.
  45. Cai, D.; He, X.; Han, J. Document clustering using locality preserving indexing. IEEE Trans. Knowl. Data Eng. 2005, 17, 1624–1637.
  46. Manning, C.D.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, MA, USA, 2008.
Figure 1. The flowchart of the RLGMC. Given multi-view data $\{X^{(v)} \in \mathbb{R}^{d_v \times n}\}_{v=1}^{V}$, where $V$ is the number of views, the similarity matrices $\{W^{(v)} \in \mathbb{R}^{n \times n}\}_{v=1}^{V}$ are constructed from $\{X^{(v)}\}_{v=1}^{V}$. Then, the multi-view embedding matrices $\{H^{(v)} \in \mathbb{R}^{n \times c}\}_{v=1}^{V}$, where $c$ denotes the number of clusters, are computed via spectral embedding, and each $H^{(v)}$ is row-normalized. The robust low-rank tensor and the noise tensor are decomposed from the tensor stacked by $\{H^{(v)} H^{(v)T}\}_{v=1}^{V}$. Finally, the consensus similarity matrix is constructed from the robust tensor.
Figure 2. Curves of the Taylor functions and the Cauchy parametrization with different scaling parameters. The value of the x axis is $\|X_i\|_2$, where $X$ is a random matrix of size 300 × 20 and $X_i$ denotes the $i$-th column of $X$. The four panels correspond to four different parameter values: (a) $\gamma = 1$; (b) $\gamma = 0.1$; (c) $\gamma = 0.5$; (d) $\gamma = 0.1$.
Figure 3. Visual comparisons of the similarity matrices on the ORL dataset.
Figure 4. Visual comparisons of the similarity matrices on the Handwritten dataset.
Figure 5. Performance in terms of ACC for $\lambda$ and $\alpha$ with fixed $\theta = 2.8$ and $\gamma = 0.2$ on six datasets.
Figure 6. Performance in terms of ACC for θ and γ with fixed λ and α on six datasets.
Table 1. Notations and nomenclature declaration.

Notation | Meaning
$z_i$ | The $i$-th element of vector $\mathbf{z}$
$Z_i$ | The $i$-th column of matrix $Z$
$Z^{T}$ | The transpose of matrix $Z$
$\|Z\|_F$ | $\|Z\|_F = \sqrt{\sum_{ij} Z_{ij}^2}$
$\|Z\|_{2,1}$ | $\|Z\|_{2,1} = \sum_{j} \|Z(:,j)\|_2$
$\|Z\|_{*}$ | Sum of the singular values of $Z$
$\mathcal{Z}_{ijk}$ | The $(i,j,k)$-th entry of $\mathcal{Z}$
$\mathcal{Z}(i,:,:)$ | The $i$-th horizontal slice of $\mathcal{Z}$
$\mathcal{Z}(:,j,:)$ | The $j$-th lateral slice of $\mathcal{Z}$
$\mathcal{Z}(:,:,k)$ | The $k$-th frontal slice of $\mathcal{Z}$
$\mathcal{Z}^{T}$ | The transpose of tensor $\mathcal{Z}$
$Z^{(i)}$ | $Z^{(i)} = \mathcal{Z}(:,:,i)$
$Z_{(i)}$ | The $i$-th matricization of $\mathcal{Z}$
$\|\mathcal{Z}\|_F$ | $\|\mathcal{Z}\|_F = \sqrt{\sum_{ijk} \mathcal{Z}_{ijk}^2}$
$\|\mathcal{Z}\|_{*}$ | Tensor nuclear norm
$l_{2,1}$ norm | $\|A\|_{2,1}$, where $A$ is a matrix or tensor
$l_2$ norm | $\|\mathbf{a}\|_2$, for measuring the Euclidean distance
$tr(\cdot)$ | The trace of a matrix
Tensor $\mathcal{Z}$ | A data structure made by stacking multiple matrices
Table 2. Information of the six datasets.

Dataset | Objective | Instances | Clusters | Views
ORL | Face | 400 | 40 | 3
20newsgroups | Document | 500 | 5 | 3
COIL-20 | Object | 1440 | 20 | 3
100leaves | Plant species | 1600 | 100 | 3
UCI digits | Digit | 2000 | 10 | 3
Handwritten | Digit | 2000 | 10 | 6
Table 3. The comparison results on six datasets.

Datasets Methods ACC NMI Precision Recall F-Score AR
ORL SSC 0.5803 ± 0.0172 0.7658 ± 0.0099 0.6442 ± 0.0159 0.3861 ± 0.0210 0.5323 ± 0.0229 0.4474 ± 0.0203
LRR 0.6407 ± 0.0198 0.7935 ± 0.0104 0.5518 ± 0.0274 0.5012 ± 0.0181 0.5252 ± 0.0219 0.5137 ± 0.0223
LT-MSC 0.7718 ± 0.0312 0.8992 ± 0.0090 0.6474 ± 0.0482 0.7722 ± 0.0193 0.7039 ± 0.0350 0.6964 ± 0.0361
t-SVD-MSC 0.7998 ± 0.0207 0.8967 ± 0.0098 0.7052 ± 0.0253 0.7712 ± 0.0205 0.7367 ± 0.0224 0.7303 ± 0.0230
ETLMSC 0.7230 ± 0.0433 0.8702 ± 0.0173 0.5984 ± 0.0509 0.7033 ± 0.0401 0.6463 ± 0.0456 0.6374 ± 0.0469
MCGC 0.6625 ± 0.0000 0.7888 ± 0.0000 0.2290 ± 0.0000 0.7428 ± 0.0000 0.3501 ± 0.0000 0.3269 ± 0.0000
GMC 0.6325 ± 0.0000 0.8035 ± 0.0000 0.2321 ± 0.0000 0.8011 ± 0.0000 0.3599 ± 0.0000 0.3367 ± 0.0000
CGL 0.8279 ± 0.0180 ̲ 0.9159 ± 0.0046 ̲ 0.7343 ± 0.0195 ̲ 0.8110 ± 0.0115 0.7707 ± 0.0149 ̲ 0.7651 ± 0.0153 ̲
DGF 0.6165 ± 0.0168 0.7976 ± 0.0056 0.4387 ± 0.0133 0.5927 ± 0.0109 0.5041 ± 0.0115 0.4909 ± 0.0118
Ours 0.8533 ± 0.0049 0.9185 ± 0.0024 0.8222 ± 0.0057 0.7657 ± 0.0056 ̲ 0.7929 ± 0.0066 0.7880 ± 0.0068
20newsgroups SSC 0.2248 ± 0.0010 0.0211 ± 0.0007 0.2268 ± 0.0010 0.1986 ± 0.0000 0.9221 ± 0.0028 0.3268 ± 0.0001
LRR 0.2080 ± 0.0000 0.0080 ± 0.0000 0.9840 ± 0.0000 0.1984 ± 0.0000 0.3302 ± 0.0000 0.0000 ± 0.0000
LT-MSC 0.2040 ± 0.0000 0.0155 ± 0.0000 0.1984 ± 0.0000 0.9841 ± 0.0000 0.3302 ± 0.0000 0.0000 ± 0.0000
t-SVD-MSC 0.9820 ± 0.0000 0.9391 ± 0.0000 0.9642 ± 0.0000 0.9643 ± 0.0000 0.9643 ± 0.0000 0.9554 ± 0.0000
ETLMSC 0.3830 ± 0.0327 0.2090 ± 0.0314 0.2896 ± 0.0196 0.3772 ± 0.0415 0.3273 ± 0.0264 0.1330 ± 0.0305
MCGC 0.2880 ± 0.0000 0.0998 ± 0.0000 0.2049 ± 0.0000 0.7595 ± 0.0000 0.3228 ± 0.0000 0.0150 ± 0.0000
GMC 0.9820 ± 0.0000 0.9392 ± 0.0000 0.9642 ± 0.0000 0.9643 ± 0.0000 0.9643 ± 0.0000 0.9554 ± 0.0000
CGL 0.9800 ± 0.0000 ̲ 0.9363 ± 0.0000 ̲ 0.9600 ± 0.0000 ̲ 0.9608 ± 0.0000 ̲ 0.9604 ± 0.0000 ̲ 0.9506 ± 0.0000 ̲
DGF 0.2924 ± 0.0013 0.0984 ± 0.0015 0.2219 ± 0.0000 0.6520 ± 0.0001 0.3312 ± 0.0001 0.0499 ± 0.0001
Ours 0.9624 ± 0.0008 0.8844 ± 0.0016 0.9265 ± 0.0015 0.9255 ± 0.0017 0.9260 ± 0.0016 0.9077 ± 0.0020
COIL-20 SSC 0.7726 ± 0.0004 0.9015 ± 0.0000 0.8431 ± 0.0000 0.6770 ± 0.0000 0.8832 ± 0.0001 0.7665 ± 0.0000
LRR 0.7088 ± 0.0153 0.8007 ± 0.0129 0.6933 ± 0.0194 0.6143 ± 0.0239 0.6513 ± 0.0202 0.6321 ± 0.0215
LT-MSC 0.7751 ± 0.0202 0.8709 ± 0.0098 0.7277 ± 0.0228 0.7853 ± 0.0194 0.7552 ± 0.0177 0.7420 ± 0.0187
t-SVD-MSC 0.7433 ± 0.0100 0.8163 ± 0.0076 0.6869 ± 0.0153 0.7048 ± 0.0136 0.6957 ± 0.0144 0.6797 ± 0.0152
ETLMSC 0.7353 ± 0.0227 0.8272 ± 0.0081 0.6862 ± 0.0258 0.7209 ± 0.0145 0.7029 ± 0.0166 0.6871 ± 0.0177
MCGC 0.7986 ± 0.0000 0.8839 ± 0.0000 0.7196 ± 0.0000 0.8393 ± 0.0000 0.7748 ± 0.0000 0.7622 ± 0.0000
GMC 0.7910 ± 0.0000 0.9189 ± 0.0000 0.6938 ± 0.0000 0.9287 ± 0.0000 ̲ 0.7943 ± 0.0000 0.7819 ± 0.0000
CGL 0.8404 ± 0.0065 0.9209 ± 0.0000 0.7958 ± 0.0059 ̲ 0.8868 ± 0.0000 0.8235 ± 0.0000 0.8137 ± 0.0000
DGF 0.8542 ± 0.0000 ̲ 0.9450 ± 0.0000 0.7574 ± 0.0000 0.9497 ± 0.0000 ̲ 0.8427 ± 0.0000 0.8336 ± 0.0000 ̲
Ours 0.8811 ± 0.0004 0.9452 ± 0.0000 0.8992 ± 0.0001 0.8557 ± 0.0001 0.8769 ± 0.0001 ̲ 0.8704 ± 0.0001
100leaves SSC 0.5748 ± 0.0090 0.7704 ± 0.0032 0.6089 ± 0.0054 0.4087 ± 0.0093 0.4693 ± 0.0109 0.4369 ± 0.0095
LRR 0.4933 ± 0.0182 0.7204 ± 0.0073 0.3664 ± 0.0150 0.3322 ± 0.0151 0.3484 ± 0.0150 0.3420 ± 0.0151
LT-MSC 0.7167 ± 0.0152 0.8646 ± 0.0068 0.5987 ± 0.0182 0.6606 ± 0.0179 0.6281 ± 0.0176 0.6244 ± 0.0178
t-SVD-MSC 0.7479 ± 0.0173 0.8744 ± 0.0079 0.6277 ± 0.0196 0.6995 ± 0.0214 0.6616 ± 0.0200 0.6582 ± 0.0202
ETLMSC 0.7274 ± 0.0145 0.8923 ± 0.0069 0.6191 ± 0.0166 0.7393 ± 0.0174 0.6738 ± 0.0158 0.6704 ± 0.0159
MCGC 0.7381 ± 0.0000 0.8353 ± 0.0000 0.2707 ± 0.0000 0.7391 ± 0.0000 0.3963 ± 0.0000 0.3879 ± 0.0000
GMC 0.8238 ± 0.0000 0.9025 ± 0.0000 0.3521 ± 0.0000 0.8874 ± 0.0000 0.5042 ± 0.0000 0.4974 ± 0.0000
CGL 0.9609 ± 0.0074 ̲ 0.9805 ± 0.0016 ̲ 0.9247 ± 0.0113 ̲ 0.9567 ± 0.0037 0.9404 ± 0.0072 ̲ 0.9398 ± 0.0073 ̲
DGF 0.7294 ± 0.0154 0.8823 ± 0.0023 0.6044 ± 0.0098 0.7479 ± 0.0060 0.6685 ± 0.0068 0.6650 ± 0.0069
Ours 0.9696 ± 0.0091 0.9831 ± 0.0024 0.9636 ± 0.0032 0.9396 ± 0.0163 0.9514 ± 0.0089 0.9509 ± 0.0090
UCI digits SSC 0.6697 ± 0.0007 0.7711 ± 0.0007 0.7594 ± 0.0006 0.6376 ± 0.0005 0.7954 ± 0.0009 0.7078 ± 0.0005
LRR 0.7794 ± 0.0002 0.7619 ± 0.0001 0.7585 ± 0.0001 0.7017 ± 0.0001 0.7290 ± 0.0001 0.6977 ± 0.0002
LT-MSC 0.8649 ± 0.0186 0.8223 ± 0.0018 0.7983 ± 0.0125 0.8198 ± 0.0021 0.8089 ± 0.0074 0.7874 ± 0.0084
t-SVD-MSC 0.9669 ± 0.0005 0.9343 ± 0.0007 0.9345 ± 0.0010 ̲ 0.9377 ± 0.0008 0.9361 ± 0.0009 ̲ 0.9290 ± 0.0010 ̲
ETLMSC 0.9088 ± 0.0758 0.9447 ± 0.0399 ̲ 0.8825 ± 0.0949 0.9413 ± 0.0459 0.9102 ± 0.0712 0.8996 ± 0.0796
MCGC 0.8570 ± 0.0000 0.8382 ± 0.0000 0.7613 ± 0.0000 0.9076 ± 0.0000 0.8280 ± 0.0000 0.8071 ± 0.0000
GMC 0.7505 ± 0.0000 0.8096 ± 0.0000 0.6814 ± 0.0000 0.8179 ± 0.0000 0.7434 ± 0.0000 0.7122 ± 0.0000
CGL 0.8796 ± 0.0011 ̲ 0.9154 ± 0.0077 0.8008 ± 0.0050 0.9490 ± 0.0126 ̲ 0.8686 ± 0.0082 0.8527 ± 0.0091
DGF 0.4647 ± 0.0004 0.4906 ± 0.0002 0.3806 ± 0.0003 0.3883 ± 0.0005 0.3844 ± 0.0004 0.3156 ± 0.0004
Ours 0.9775 ± 0.0000 0.9464 ± 0.0000 0.9558 ± 0.0000 0.9551 ± 0.0000 0.9554 ± 0.0000 0.9505 ± 0.0000
Handwritten SSC 0.7958 ± 0.0009 0.8117 ± 0.0012 0.8384 ± 0.0009 0.7146 ± 0.0016 0.8263 ± 0.0015 0.7664 ± 0.0015
LRR 0.7088 ± 0.0153 0.8007 ± 0.0129 0.6933 ± 0.0194 0.6143 ± 0.0239 0.6513 ± 0.0202 0.6321 ± 0.0215
LT-MSC 0.9380 ± 0.0002 0.8854 ± 0.0001 0.8832 ± 0.0002 0.8857 ± 0.0002 0.8844 ± 0.0002 0.8716 ± 0.0003
t-SVD-MSC 0.9995 ± 0.0000 0.9986 ± 0.0000 0.9990 ± 0.0000 0.9990 ± 0.0000 0.9990 ± 0.0000 0.9989 ± 0.0000
ETLMSC 0.9306 ± 0.0966 0.9749 ± 0.0346 ̲ 0.9162 ± 0.1144 0.9756 ± 0.0325 0.9430 ± 0.0779 0.9361 ± 0.0876
MCGC 0.9710 ± 0.0000 0.9330 ± 0.0000 0.9420 ± 0.0000 0.9435 ± 0.0000 0.9427 ± 0.0000 0.9364 ± 0.0000
GMC 0.8820 ± 0.0000 0.8932 ± 0.0000 0.8260 ± 0.0000 0.9085 ± 0.0000 0.8653 ± 0.0000 0.8496 ± 0.0000
CGL 0.9750 ± 0.0000 0.9455 ± 0.0000 0.9500 ± 0.0000 0.9510 ± 0.0000 0.9507 ± 0.0000 0.9452 ± 0.0000
DGF 0.8110 ± 0.0000 0.7832 ± 0.0000 0.7116 ± 0.0000 0.7490 ± 0.0000 0.7298 ± 0.0000 0.6991 ± 0.0000
Ours 0.9755 ± 0.0000 ̲ 0.9467 ± 0.0000 0.9521 ± 0.0000 ̲ 0.9511 ± 0.0000 ̲ 0.9516 ± 0.0000 ̲ 0.9463 ± 0.0000 ̲
The best clustering results are highlighted in bold and the second best results underlined.
Table 4. The running time comparisons on six datasets.

Time (s) | ORL | 20newsgroups | COIL-20 | 100leaves | UCI Digits | Handwritten
SSC | 7.51 | 7.03 | 29.85 | 22.30 | 62.12 | 71.74
LRR | 18.61 | 42.22 | 242.61 | 172.55 | 341.30 | 342.05
LT-MSC | 51.70 | 51.85 | 633.59 | 420.87 | 924.88 | 1083.70
t-SVD-MSC | 34.38 | 31.88 | 406.90 | 273.93 | 198.71 | 627.44
ETLMSC | 0.90 | 3.69 | 16.12 | 48.69 | 66.32 | 66.47
MCGC | 0.46 | 0.68 | 5.57 | 13.89 | 15.77 | 14.49
GMC | 0.49 | 0.56 | 7.19 | 5.01 | 28.22 | 12.72
CGL | 10.50 | 9.17 | 76.13 | 239.89 | 152.11 | 305.81
DGF | 0.30 | 0.28 | 3.65 | 0.62 | 1.65 | 2.14
Ours | 2.51 | 1.63 | 25.14 | 33.07 | 42.53 | 82.61
