Building a discriminatively ordered subspace on the generating matrix to classify high-dimensional spectral data

https://doi.org/10.1016/j.ins.2016.12.001

Abstract

Soft independent modelling of class analogy (SIMCA) is a widely-used subspace method for spectral data classification. However, since the class subspaces are built independently in SIMCA, the discriminative between-class information is neglected. An appealing remedy is to first project the original data to a more discriminative subspace. For this, the generalised difference subspace (GDS), which explores the information between class subspaces in the generating matrix, is a strong candidate. However, due to the difference between a class subspace (of infinite scale) and a class (of finite scale), the eigenvectors selected by GDS may not be discriminative for classifying the samples of the classes. Therefore, in this paper we propose a discriminatively ordered subspace (DOS): unlike GDS, DOS selects the eigenvectors with high discriminative ability between classes rather than between class subspaces. Experiments on three real spectral datasets demonstrate that DOS-preprocessed SIMCA outperforms its counterparts.

Introduction

High-dimensional spectral data, such as near infrared (NIR) spectroscopic data and mass spectrometry (MS) data, are widely used in a variety of fields, for example chemometrics, bioinformatics and hyperspectral image analysis. In the analysis of spectral data, classification is an omnipresent task [2], [4], [7], [9], [10], [13], which enables us to distinguish different species, identify the geographical origins of the products, or predict molecular substructure, to name a few.

Fig. 1 shows an example of NIR spectroscopic data of two classes: chicken meat samples and turkey meat samples. Each curve depicts the spectrum of a sample, which is usually represented by a high-dimensional feature vector. The classification task is to classify the spectra of new samples into the two classes based on the information provided by labelled training spectra. In this paper, we focus on two-class classification. Based on the two-class classification results, multi-class classification can be readily obtained by using the one-vs-one or one-vs-all strategy [3].

Soft independent modelling of class analogy (SIMCA) [12] is a subspace-based classification method that is widely used for the two-class classification of high-dimensional spectral data in chemometrics [2], [4], [10]. When SIMCA is used for two-class classification, two class subspaces are first built for the two classes separately using principal component analysis (PCA). Then an F-test, which tests whether the residual standard deviation of a new sample from the subspace of a class is statistically significantly different from the residual standard deviation of the training set of that class, is used to determine the class membership of the new sample. The PC subspace is considered a good class model for high-dimensional data because it condenses the most variable information in the data into a few PCs and gets rid of a large amount of redundant information in the original feature dimensions. SIMCA was originally designed for both outlier detection and classification. In this paper, we treat SIMCA as a simple classification method that assigns a new sample to the class with the smallest F-value, as suggested in [8].
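To make this procedure concrete, the following is a minimal Python sketch of two-class SIMCA classification under the above description. It is illustrative only: the PC subspaces are fitted with a plain SVD, a simplified residual-standard-deviation ratio is used in place of a formal F-test (degrees-of-freedom corrections and significance thresholds are omitted), and all function names and the toy data are our own.

    import numpy as np

    def fit_class_model(X, q):
        # Fit a q-dimensional PC subspace to one class (rows of X are samples).
        mean = X.mean(axis=0)
        Xc = X - mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # PCA via SVD
        P = Vt[:q].T                                         # p x q loading matrix
        R = Xc - Xc @ P @ P.T                                # training residuals
        s0 = np.sqrt((R ** 2).sum() / (R.shape[0] * (X.shape[1] - q)))  # simplified residual std
        return mean, P, s0

    def simca_f_value(x, model):
        # Simplified F-like statistic: squared ratio of the sample's residual
        # standard deviation to that of the class (degrees of freedom omitted).
        mean, P, s0 = model
        r = (x - mean) - (x - mean) @ P @ P.T
        s = np.sqrt((r ** 2).sum() / (len(x) - P.shape[1]))
        return (s / s0) ** 2

    def simca_classify(x, models):
        # Assign x to the class with the smallest F-value, as suggested in [8].
        return int(np.argmin([simca_f_value(x, m) for m in models]))

    # Toy usage with random data (two classes, p = 50 features).
    rng = np.random.default_rng(0)
    X1, X2 = rng.normal(0.0, 1.0, (30, 50)), rng.normal(0.5, 1.0, (30, 50))
    models = [fit_class_model(X1, q=5), fit_class_model(X2, q=5)]
    print(simca_classify(rng.normal(0.5, 1.0, 50), models))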

In spite of its wide use, SIMCA suffers from the problem that the class subspaces are built independently without considering between-class information. Therefore the F-value calculated independently for each class may not be discriminative enough to classify a new sample.

An appealing solution to this problem is to find a more discriminative subspace than the original feature space and project the data to this subspace before applying SIMCA. The projections of the samples to this discriminative subspace are expected to be more separated and can be more easily classified than those in the original feature space, as illustrated in Fig. 2. Also, as the new subspace contains more discriminative information for classification, the F-value calculated in this subspace is expected to be more discriminative. It is therefore the objective of our work in this paper to find such a discriminative subspace.

Recently, Fukui and Maki [6] proposed the generalised difference subspace (GDS) projection as a preprocessing method to improve a popular subspace-based classifier called the mutual subspace method (MSM) in image set-based object recognition. GDS aims to tackle an issue of MSM: the class subspaces are generated independently by PCA in a class-by-class manner, and thus may not be strongly discriminative for classification. This issue is the same as that of SIMCA. Hence, we believe the GDS projection can also be utilised as a preprocessing method for SIMCA to improve its classification performance.

GDS is a subspace containing the information about the difference between class subspaces, and thus is supposed to be more discriminative than the original feature space. GDS is generated on the basis of a generating matrix GD, which is calculated as the sum of the projection matrices of the two class subspaces and can provide between-class information. Fukui and Maki [6] show that the eigenvectors of GD with small eigenvalues contain the information about the difference between class subspaces, while those with large eigenvalues contain the information about the similarity between class subspaces. The GDS projection thus keeps only the last few eigenvectors with small eigenvalues and discards the first few eigenvectors with large eigenvalues, in order to make use of the difference information.
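As a concrete illustration of the construction just described, the short Python sketch below forms the generating matrix from two orthonormal class-subspace bases and keeps the eigenvectors with the smallest non-zero eigenvalues. It assumes the two class subspaces do not intersect (so the sum subspace has dimension q1 + q2); the function and variable names are ours.

    import numpy as np

    def gds_projection(U1, U2, n_keep):
        # U1 (p x q1) and U2 (p x q2): orthonormal bases of the two class subspaces.
        G = U1 @ U1.T + U2 @ U2.T                 # generating matrix GD
        eigvals, eigvecs = np.linalg.eigh(G)      # eigenvalues in ascending order
        rank = U1.shape[1] + U2.shape[1]          # dimension of the sum subspace (assumed)
        in_sum = eigvecs[:, -rank:]               # eigenvectors with non-zero eigenvalues
        # GDS keeps the directions with the smallest non-zero eigenvalues,
        # i.e. the difference information, and discards the largest ones.
        return in_sum[:, :n_keep]                 # p x n_keep projection basis

    # Preprocessing for SIMCA: project the data as X_proj = X @ W, with W = gds_projection(...).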

The GDS projection shows superior performance on face recognition and hand shape recognition problems. However, the GDS projection has a limitation: it discards the eigenvectors of GD with large eigenvalues because they contain similarity information between class subspaces and are thus assumed to be ineffective for classification. This assumption is, however, not always valid, due to the conceptual difference between a class subspace (of infinite scale) and a class (of finite scale). For example, two separable classes may span the same subspace. More technically, this assumption defines similarity information by using the eigenvector directions only, without considering the distribution of the projected samples along these directions. If the projected samples of different classes along the directions of similarity (i.e. the directions with large eigenvalues of GD) are still separable, then these directions can also be discriminative in separating classes (although not discriminative in separating class subspaces), and discarding them can be harmful for the classification of samples.

To illustrate the difference between a class subspace and a class, we show an intuitive example in Fig. 3. The infinite-scale subspace of class 1, L1, is spanned by v1 and v3, and the infinite-scale subspace of class 2, L2, is spanned by v1 and v2. The samples of the two classes lie in the two finite-scale ellipses in L1 and L2, respectively. Clearly, v1 is the intersection of L1 and L2; it represents the shared direction, i.e. the similarity information, between the class subspaces. The GDS projection discards v1 because it is the eigenvector of GD with the largest eigenvalue and contains similarity information between class subspaces. However, the samples of the two classes are separable along the direction of v1, which suggests that v1 contains discriminative information between classes. (We shall demonstrate another motivating example for this issue in Section 2.3.1 using a real spectral dataset.)

Moreover, here we illustrate that discarding the eigenvectors of GD with large eigenvalues can be harmful for classification, using three real spectral datasets: meat, Phenyl and fat. In Fig. 4, we plot the classification accuracies of SIMCA and the GDS-preprocessed SIMCA on the three datasets. We can clearly observe that preprocessing by GDS does not necessarily benefit the classification performance of SIMCA; it actually has a negative effect (lowering classification accuracy) on SIMCA for the Phenyl dataset and the fat dataset. A detailed discussion of this will be provided in Section 3.

To make use of the between-class information in GD and to overcome the above limitation of the GDS projection, we propose a discriminatively ordered subspace (DOS): DOS is spanned by the most discriminative eigenvectors of GD instead of the eigenvectors with small eigenvalues, and extracts the most discriminative information from the data. That is, we sort the eigenvectors in terms of their discriminative ability and select the top-ranked eigenvectors with high discriminative ability to generate the DOS projection. This discriminative ordering procedure during the generation of the subspace is the origin of the term 'discriminatively ordered' in DOS. As our objective is to develop DOS to tackle the issue of SIMCA, the discriminative ability of an eigenvector is measured by the classification accuracy of SIMCA on the samples projected onto this eigenvector: the higher the classification accuracy, the higher the discriminative ability. We choose this filter-type eigenvector selection scheme for high-dimensional spectral data in consideration of its simplicity and efficiency, as well as the uncorrelatedness and orthogonality of the candidate eigenvectors. The effectiveness of the DOS-preprocessed SIMCA will be demonstrated in Section 3.
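A sketch of this selection procedure in Python is given below. The paper measures the discriminative ability of each eigenvector by the classification accuracy of SIMCA on the 1-D projections; to keep the sketch self-contained, the default score here is a simple nearest-class-mean accuracy, a hypothetical stand-in, and a SIMCA-based score can be passed in via score_fn. All names are ours.

    import numpy as np

    def dos_projection(X1, X2, eigvecs, n_keep, score_fn=None):
        # eigvecs: candidate eigenvectors of the generating matrix GD (p x m).
        if score_fn is None:
            def score_fn(z1, z2):
                # Nearest-class-mean accuracy on the 1-D projections
                # (stand-in for the SIMCA-based score used in the paper).
                m1, m2 = z1.mean(), z2.mean()
                correct = (np.abs(z1 - m1) < np.abs(z1 - m2)).sum() \
                        + (np.abs(z2 - m2) < np.abs(z2 - m1)).sum()
                return correct / (len(z1) + len(z2))
        # Score every candidate direction by class separability of the projections.
        scores = [score_fn(X1 @ eigvecs[:, j], X2 @ eigvecs[:, j])
                  for j in range(eigvecs.shape[1])]
        order = np.argsort(scores)[::-1]          # most discriminative first
        return eigvecs[:, order[:n_keep]]         # p x n_keep DOS basis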

The rest of this paper is organised as follows. In Section 2, a discussion of the GDS projection and a detailed description of the DOS projection are provided. In Section 3, GDS and DOS are compared for the improvement of classification performance of SIMCA on real spectral datasets. Section 4 presents some concluding remarks.


SIMCA

In the training phase of SIMCA, suppose $X_k \in \mathbb{R}^{N_k \times p}$ is the training set of class $k$ ($k = 1, 2$), in which there are $N_k$ training instances and each instance is represented by a $p$-dimensional data vector (i.e. in the original $p$-dimensional feature space). To build the principal component (PC) subspace for each class, we apply eigendecomposition to the covariance matrix of the $k$th class: $$\mathrm{Cov}(X_k) = \frac{1}{N_k - 1} (X_k^{c})^{T} X_k^{c} = U_k \Sigma U_k^{T},$$ where $X_k^{c}$ is the column-centred $X_k$ and the columns of $U_k \in \mathbb{R}^{p \times q_k}$ denote the normalised eigenvectors corresponding to the $q_k$ largest eigenvalues.
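A small Python sketch of this training step, assuming the rows of $X_k$ are samples and using our own function name, might look as follows.

    import numpy as np

    def class_pc_subspace(Xk, qk):
        # Eigendecomposition of the class covariance, matching the equation above:
        # Cov(X_k) = (1 / (N_k - 1)) (X_k^c)^T X_k^c = U_k Sigma U_k^T.
        Xc = Xk - Xk.mean(axis=0)                 # column-centred X_k
        cov = Xc.T @ Xc / (Xk.shape[0] - 1)       # p x p covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues
        Uk = eigvecs[:, ::-1][:, :qk]             # top q_k normalised eigenvectors
        return Uk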

Experiments

In the following experiments, we compare the performance of the original SIMCA without preprocessing, the SIMCA preprocessed by the linear discriminant analysis (LDA) projection, the SIMCA preprocessed by the GDS projection, and the SIMCA preprocessed by the DOS projection. The LDA-preprocessed SIMCA is included in the comparison because LDA is a commonly used method to find a discriminative subspace. Three real datasets are used in the experiments: the fat dataset, the meat dataset, and the Phenyl dataset.

Conclusion

SIMCA is a widely-used subspace method for classifying two-class high-dimensional spectral datasets. It suffers from the problem that the class subspaces are built independently without considering between-class information. This problem can be tackled by projecting the data to a subspace more discriminative than the original feature space before applying SIMCA. We have proposed a new method, the DOS projection, to generate such a discriminative subspace by considering the between-class information.

