Comparison of Matrix Decomposition in Null Space-Based LDA Method

Problems with small sample sizes and high dimensionality are common in pattern recognition. Almost all machine learning algorithms degrade on high-dimensional data, and in this setting the scatter matrices become singular, which is the main problem of the Linear Discriminant Analysis (LDA) technique. Null space-based LDA (NLDA) was conceived to address this singularity issue: it maximizes the distance between classes in the null space of the within-class scatter matrix. In the earliest research, the NLDA method was implemented by computing an eigenvalue decomposition and a singular value decomposition (SVD). That work led to several new implementations of the NLDA method using other matrix decompositions, including NLDA using Cholesky decomposition and NLDA using QR decomposition. This paper compares the performance of three NLDA methods using different matrix decompositions, namely SVD, Cholesky decomposition, and QR decomposition. Two datasets were used in experiments with the three NLDA algorithms, and the classification accuracy of the three methods was measured using the confusion matrix. The results show that the NLDA method using SVD has the best performance of the three, achieving 77.8% accuracy for the Colon dataset and 98.8% accuracy for the TKI-resistance dataset.


Introduction
Research on image recognition has been going strong for decades. A few examples of related applications include smart cards, surveillance systems, biometrics, information security, and access control [1]. Real-world datasets often have too many dimensions. Almost all machine learning algorithms degrade on high-dimensional data because such data is likely to contain noise, redundant (mutually correlated) variables, and features with small variances, which can cause a phenomenon called the curse of dimensionality [2]. Besides causing high computation time [3], high dimensionality with a much smaller number of samples than dimensions can also cause overfitting [4], thus reducing the performance of machine learning algorithms. Not all features in high-dimensional data are relevant to the problem at hand, so it is necessary to reduce them. Dimensionality reduction is useful for improving the performance of machine learning algorithms, memory efficiency, reducing computational costs, and visualization [5]. The objective is to reduce the number of dimensions of a high-dimensional dataset without sacrificing the useful information it contains [6].
Many dimension reduction techniques have been suggested in recent decades. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are statistical methods now widely utilized for dimensionality reduction. LDA is a class-based (supervised) dimensionality reduction method, while PCA is a dimensionality reduction method that is not class-based (unsupervised), so PCA cannot guarantee maximum separation between classes [6]. For classification, LDA is therefore generally more suitable than PCA [7].
If the dimension of the data is much greater than the number of training vectors, then the within-class scatter matrix and the total scatter matrix are singular. This weakness is considered the main problem of LDA, prevalently known as the singularity problem caused by Small Sample Size (SSS) [8]. Classical LDA cannot handle the SSS problem, so many variations of LDA have been proposed to overcome it. Among them are Direct LDA (DLDA) [9], Regularized LDA (RLDA) [10], Subspace LDA (SLDA) [11], Null space LDA (NLDA) [12], ILDA [13], GO-LDA [14], and many more.
Here, we focus on NLDA as it pertains to the SSS issue. There have been many variations of NLDA with new and faster implementations. In [15] and [12], the NLDA method is performed by calculating an eigenvalue decomposition and a singular value decomposition (SVD) to obtain the optimal transformation matrix. However, it turns out that this NLDA method has a high computational cost, so in [16] and [17] a new method is proposed that avoids the eigenvalue decomposition and SVD by using QR decomposition instead. For high-dimensionality and small-sample-size problems, [18] proposed a subspace method and introduced the notion of a tenuous null subspace and its associated projection operator. Another development of NLDA is discussed in [19], namely applying Cholesky decomposition to the within-class scatter matrix. A reference collection of variations of NLDA methods is discussed in [8].
The three types of NLDA methods discussed here are the methods that apply SVD, Cholesky decomposition, and QR decomposition, respectively, to the within-class scatter matrix. Previous research has not compared the performance of NLDA methods that use these three matrix decompositions. The research in [16] discusses two variations of the NLDA method using SVD and one using QR decomposition. The research in [17] also discusses two variations of the NLDA method using SVD and two others using QR decomposition, but experiments are only carried out on the NLDA method using QR decomposition.

Classical Linear Discriminant Analysis (LDA)
Finding the optimal projection matrix G is the goal of the classical LDA approach, which involves solving the optimization problem shown in Formula 3:

G = arg max_G trace( (G^T S_w G)^{-1} (G^T S_b G) ),    (3)

where trace(·) denotes the trace operator. The solution to Formula 3 is obtained by solving the associated generalized eigenvalue problem shown in Formula 4:

S_b x = λ S_w x,  λ ≠ 0,    (4)

where the columns of G are the eigenvectors corresponding to the k − 1 largest eigenvalues.
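When S_w is nonsingular, Formulas 3 and 4 can be sketched directly in NumPy: the generalized eigenvalue problem reduces to the ordinary eigenvalue problem for S_w^{-1} S_b. The snippet below is a minimal illustration under that assumption (the function and variable names are ours, not from the paper):

```python
import numpy as np

def classical_lda(X, y, k):
    """Classical LDA (Formulas 3-4): eigenvectors of S_w^{-1} S_b with the
    largest eigenvalues, assuming S_w is nonsingular (no SSS problem)."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                    # within-class scatter
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)   # between-class scatter
    # Generalized problem S_b x = lambda S_w x  ->  (S_w^{-1} S_b) x = lambda x
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:k - 1]].real                  # G: top k-1 eigenvectors
```

With d much smaller than n, as assumed here, S_w is invertible and the k − 1 returned columns span the discriminant subspace.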

Null Space LDA (NLDA) using Singular Value Decomposition (SVD)
The null space of S_w must be calculated to find the optimal matrix G. Because the data are high-dimensional, this null space may be large. [20] improved the efficiency of the algorithm in [21] by adding an initial step that removes the null space of the total scatter matrix S_t.
Let S_t = U Σ V^T be the SVD [22] of S_t, where S_t is defined by Formula 2, U and V are orthogonal, Σ ∈ ℝ^{d×d} is diagonal with its entries sorted in non-ascending order, and t = rank(S_t). Let U = (U_1, U_2) be the partition of U with U_1 ∈ ℝ^{d×t} and U_2 ∈ ℝ^{d×(d−t)}. The data may be projected onto the subspace spanned by the columns of U_1 to remove the null space of S_t. After removing the null space of S_t, the reduced scatter matrices are S̃_b, S̃_w, and S̃_t. With U_1 calculated, G = U_1 M gives the optimal transformation of NLDA, where M solves the following optimization problem [9], shown in Formula 5:

max_M trace(M^T S̃_b M)  subject to  M^T S̃_w M = 0;    (5)

that is, the columns of M are in the null space of S̃_w while maximizing trace(M^T S̃_b M).
Let W be a matrix whose columns span the null space of S̃_w, and write M = W V for a matrix V to be determined next. For every such V, the constraint in Formula 5 is satisfied, so we can calculate the optimal V by maximizing trace(V^T W^T S̃_b W V).
By requiring an orthogonality constraint on V [20], the optimal V is given by the eigenvectors of W^T S̃_b W associated with the nonzero eigenvalues. The matrix W is obtained from the eigendecomposition of S̃_w. The optimal transformation of NLDA is then G = U_1 W V; the NLDA algorithm using SVD is summarized in Table 1 [12].
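As a concrete sketch of these steps, the following NumPy illustration (with our own ad hoc rank tolerances and variable names, not the implementation from [12]) removes the null space of S_t, extracts the null space of the reduced S_w, and maximizes the reduced between-class scatter there:

```python
import numpy as np

def nlda_svd(X, y, tol=1e-10):
    """Null-space LDA via SVD: (1) remove the null space of S_t using its SVD,
    (2) find W spanning the null space of the reduced S_w, (3) keep the
    eigenvectors V of W^T S~_b W with nonzero eigenvalues, (4) G = U1 W V."""
    n, d = X.shape
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    St = Sw + Sb
    # Step 1: SVD of S_t; U1 spans its range, discarding the null space
    U, s, _ = np.linalg.svd(St)
    t = int(np.sum(s > tol * s[0]))
    U1 = U[:, :t]
    Sw_r, Sb_r = U1.T @ Sw @ U1, U1.T @ Sb @ U1
    # Step 2: W = eigenvectors of reduced S_w with (numerically) zero eigenvalues
    ew, Ew = np.linalg.eigh(Sw_r)
    W = Ew[:, ew < tol * max(ew.max(), 1.0)]
    # Step 3: eigenvectors of W^T Sb_r W with nonzero eigenvalues
    eb, V = np.linalg.eigh(W.T @ Sb_r @ W)
    V = V[:, eb > tol * max(eb.max(), 1.0)]
    # Step 4: assemble the NLDA transformation G = U1 W V
    return U1 @ W @ V
```

Any column of the returned G satisfies G^T S_w G ≈ 0 while keeping G^T S_b G nonzero, which is exactly the constraint and objective of Formula 5.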

Null Space LDA (NLDA) using Cholesky Decomposition
In this section, x_j^(i) is used to indicate that a sample x is the j-th sample of the i-th class. In this case, the within-class scatter matrix can be expressed as S_w = H_w H_w^T, where the columns of H_w are the centered class samples. Assuming the last sample of each class is removed from H_w, we get a reduced matrix H̄_w [19]. The range spaces of H_w and H̄_w are also proved to be equal in Formula 1, so the space spanned by H̄_w H̄_w^T is the range space of S_w.
To calculate the required eigenvectors, the following steps are used [19]: compute the Cholesky decomposition [23] of the Gram matrix formed from H̄_w, obtaining a lower triangular factor L; the optimal projection matrix in the null space of S_w may then be quickly determined from this factor. Based on the proof of the theorem in [19], the result coincides with the projection obtained via SVD; however, this method does not require computing the SVD to obtain it.
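The Cholesky step itself is easy to illustrate in isolation: a Gram matrix H^T H built from a matrix H of full column rank is symmetric positive definite, so it factors as L L^T with L lower triangular. The following is a generic NumPy sketch of that fact (not the code from [19]), including the failure mode noted above when the matrix is not positive definite:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 3))        # tall matrix with (generically) independent columns
Gram = H.T @ H                     # symmetric positive definite Gram matrix
L = np.linalg.cholesky(Gram)       # lower triangular factor with Gram = L @ L.T
assert np.allclose(L @ L.T, Gram)
assert np.allclose(L, np.tril(L))  # L really is lower triangular

# A matrix that is not positive definite makes the factorization fail:
try:
    np.linalg.cholesky(np.array([[1.0, 2.0], [2.0, 1.0]]))  # indefinite matrix
except np.linalg.LinAlgError:
    pass                           # expected: matrix is not positive definite
```

This is why the method falls back to SVD when the matrix involved turns out to be singular, as discussed later in the paper.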

Null Space LDA (NLDA) using QR Decomposition
The NLDA method suggested in [16] assumes that the training data vectors are linearly independent in order to simplify the computation. Under this assumption, the NLDA method requires only a single economical QR decomposition of a d × (n − 1) matrix [17].
With H_w and a difference matrix H̄_w defined as above, H_w and H̄_w share the same range space. Here H̄_w is built from columns of the form h_i = x^(i+1) − x^(1) for i = 1, …, n − 1, that is, by subtracting the first sample.
From the explanation above, it can be seen that if the n data points in the data matrix A are linearly independent, then the optimization problem may be solved using a one-step QR decomposition.
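Under the linear-independence assumption, the null space of S_w is simply the orthogonal complement of the range of the within-class difference matrix, which one full QR decomposition provides. The sketch below is our own NumPy illustration of this idea, not the algorithm from [16] or [17] verbatim:

```python
import numpy as np

def nlda_qr(X, y, tol=1e-10):
    """QR-based null-space LDA sketch: build a within-class difference matrix
    Hw (one column per sample beyond each class's first), take a full QR, and
    use the trailing columns of Q -- an orthonormal basis of null(S_w) -- to
    maximize the between-class scatter. Assumes linearly independent samples."""
    d = X.shape[1]
    blocks, mean = [], X.mean(axis=0)
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        blocks.append((Xc[1:] - Xc[0]).T)  # differences from the class's first sample
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    Hw = np.hstack(blocks)                 # d x (n - k); range(Hw) = range(S_w)
    Q, _ = np.linalg.qr(Hw, mode='complete')
    N = Q[:, Hw.shape[1]:]                 # orthonormal basis of null(S_w)
    # maximize the between-class scatter inside the null space
    eb, V = np.linalg.eigh(N.T @ Sb @ N)
    return N @ V[:, eb > tol * max(eb.max(), 1.0)]
```

The single call to `np.linalg.qr` replaces the SVD and eigendecompositions of the first method, which is where the computational saving comes from.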
In general, machine learning is more effective on bigger datasets for pattern recognition, so applying it to datasets with a small sample size is certain to cause problems. Machine learning's accuracy and resilience degrade with decreasing dataset size. Sparsity is an inherent property of high-dimensional spaces [25].
Information extraction from limited datasets, deep learning techniques for data augmentation, and dimensionality reduction in complicated big data analyses are some of the approaches that have been investigated in an effort to address these issues [26].

Evaluation Method
The evaluation is carried out by computing the accuracy of each NLDA model with the help of the confusion matrix. A confusion matrix compares the actual values against the model's predicted values [27]. It is used as a metric to analyze how machine learning classifiers perform on a dataset, making it possible to define a wide variety of performance metrics. Figure 1 is an overview of the confusion matrix. When the dataset contains more than two classes, the matrix grows accordingly; for example, with three classes the matrix is a 3 × 3 matrix. Whatever the size of the confusion matrix, the method for interpreting it is the same.
The accuracy of a prediction system is defined as the proportion of correct predictions to the total data [28]. Formula 6 is used to determine the accuracy value:

Accuracy = (number of correct predictions) / (total number of predictions).    (6)
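Formula 6 can be computed straight from a confusion matrix: correct predictions are the diagonal entries, so accuracy is the diagonal sum divided by the total. A minimal NumPy sketch (the helper names are ours, not the paper's code):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are actual classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Formula 6: correct predictions (the diagonal) divided by total data."""
    return np.trace(cm) / cm.sum()
```

The same two functions work unchanged for the binary Colon dataset and the three-class TKI-resistance dataset, since only `n_classes` changes.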

Results
Experimental studies are conducted and discussed in this section to evaluate the three NLDA methods. We compared the three NLDA methods by visualizing the results and showing their classification performance. Python was used in this experiment to model the NLDA methods, calculate the linear discriminants for each model, and evaluate the experimental results. The evaluation method in section 2.5 is used for all NLDA methods.
Experiments were conducted on two datasets obtained from the OpenML web and Orange software (https://www.openml.org/search?type=data&status=active&id=45087). The 2000-dimensional Colon Cancer dataset, consisting of 62 data samples and two classes, and the 467-dimensional TKI-resistance dataset, consisting of 280 data samples and three classes, were each divided into 70% for training and 30% for testing. The linear discriminant (LD) result is calculated by multiplying the testing data matrix A with the transformation matrix G, and is then used to create the data visualizations in Figure 2 and Figure 3; Excel was used to create these visual representations. Next, a confusion matrix is created for each model, as presented in Figure 4 and Figure 5, and the accuracy is computed using Formula 6. Meanwhile, Table 4 shows the accuracy of all NLDA models. Figure 2 shows that QR and SVD have good class separation between class 1 and class 2, although overfitting still occurs in both. This means that the best NLDA methods for the Colon dataset use SVD and QR decomposition, as evidenced by the accuracy results in Table 4: 77.8% for both. Meanwhile, Figure 3 shows that SVD has the best class separation between class 1, class 2, and class 3, meaning that the best NLDA method for the TKI-resistance dataset uses SVD, as shown by its 98.8% accuracy in Table 4.
Figure 4 and Figure 5 show the confusion matrices of the TKI-resistance dataset using the NLDA method with SVD and with Cholesky decomposition, respectively. The numbers 0, 1, and 2 on the left side and bottom indicate the class labels, while the colour indicates the amount of data. Figure 4 shows that almost all data are classified correctly. The difference in results among the three methods can occur because of the difference in treatment after obtaining the scatter matrix. In the first method, the total scatter matrix is used to calculate the SVD, thus obtaining the orthogonal factor used to calculate the transformation matrix G. The second method requires a Gram matrix built from the within-class samples to calculate the Cholesky decomposition, thus obtaining the lower triangular matrix L. However, the Cholesky decomposition can only be calculated if this matrix is non-singular and positive definite [23]. Since the matrix is singular here, SVD is still used in this method, and its result replaces the Cholesky factor to obtain the transformation matrix G. In the third method, the difference matrices are combined to calculate the QR decomposition, whose orthogonal factor Q is needed to find the transformation matrix G.
In the second method, the last sample of each class is removed from the within-class matrix. Similarly, in the third method, the first sample of each class is subtracted in forming the difference matrix. Neither treatment is applied in the first method. It turned out that these treatments did not give better results for the Colon and TKI-resistance datasets, which means that NLDA using SVD gives the best results for these datasets.

Conclusions
In this paper, we discuss three NLDA models using different matrix decompositions, i.e., SVD, Cholesky decomposition, and QR decomposition. In particular, we compare the steps taken after obtaining the scatter matrix. The scatter matrix in NLDA with Cholesky decomposition and with QR decomposition is treated in almost the same way.
Experiments on two datasets have shown the effectiveness of the three NLDA methods. The SVD approach outperforms the others in both accuracy and class separation in the visualization output, achieving 77.8% for the Colon dataset and 98.8% for the TKI-resistance dataset. Future research is expected to develop variations of the NLDA method to improve accuracy and reduce overfitting on the Colon and TKI-resistance datasets.

Figure 1. Confusion Matrix. The TP value is the number of correct positive predictions. The FP value means the example is actually negative, but the classifier marked it as positive. The FN value means the example marked negative by the classifier is actually positive. Finally, the TN value is the number of correct negative predictions.

Figure 2. Visualization of Three NLDA Models from the Colon Dataset
Figure 3. Visualization of Three NLDA Models from the TKI-resistance Dataset

Figure 4. Confusion Matrix of TKI-resistance Dataset with NLDA using SVD

Figure 5. Confusion Matrix of TKI-resistance Dataset with NLDA using Cholesky Decomposition

3.2 Discussions
Figure 2 is the linear projection diagram of the Colon dataset with the NLDA method, using SVD, Cholesky decomposition, and QR decomposition, respectively. Similarly, Figure 3 is the linear projection diagram of the TKI-resistance dataset with the NLDA method, using SVD, Cholesky decomposition, and QR decomposition, respectively.

Table 4. The Accuracy of Three NLDA Methods