Introduction

Technological advances in data acquisition and storage have produced a large number of high-dimensional and ultra-high-dimensional datasets in various biomedical applications1, such as electroencephalogram (EEG), gene expression, and sonar imaging. Although such data are certainly desirable for modeling biological processes2, they can be redundant and lead to the curse of dimensionality or excessive computational cost when used for model construction and pattern recognition. This issue motivates the use of dimension reduction technologies such as principal component analysis (PCA), linear discriminant analysis (LDA), and locality preserving projections (LPP). In essence, these linear or nonlinear projection strategies can be unified in the framework of graph embedding (GE), which constructs a graph based on the affinity information between each pair of samples and represents each vertex of the graph with a low-dimensional vector that preserves the similarities between vertex pairs3. Since the establishment of this framework, there have been a variety of successful applications in biomedical signal analysis4,5,6.

However, a great challenge in data analysis, particularly in biomedical research, is the presence of inevitable artifacts in signal recordings7,8, which are one of the major factors accounting for reduced signal quality. Artifacts can be caused by various factors, such as clinical image artifacts from illumination variations and dust particles, measurement errors in biochemistry8, independent scatterers caused by biological tissues that are smaller than the acoustic wavelength9, and artifacts in EEG recordings due to blinks and eye movements, a large number of spontaneous brain activities, or spikes. Artifacts are usually several orders of magnitude larger than the signal of interest and cannot be described by the standard Gaussian distribution. Thus, when conventional L2 norm-based approaches such as maximum likelihood (ML) and least squares are used to estimate the noise variance or to extract features, the results are inevitably biased by the components that express the outliers9,10,11. To resist the noise influence, Cai et al.12 redefined kernel discriminant analysis with spectral graph analysis (SRKDA) and estimated the projections in a more efficient and robust way. Although a series of improvements13,14,15 have been proposed to widen the applications of GE extensions in recent years, few of them have focused on artifact restriction.

The easiest way to suppress the influence of artifacts for further analysis is to reject the contaminated samples. However, this is not favorable because it may lead to the loss of other useful information. A more favorable way is to detect the artifacts or outliers and then use approaches such as regression or blind source separation (BSS) for artifact removal. Although some methods have been proposed to find outliers, it is difficult to determine the extent to which the peculiarities should be treated as outliers8. Even if the artifacts or outliers are located, eliminating the artifact components with BSS methods such as independent component analysis (ICA) is not trivial and may result in the loss or distortion of useful information in the extracted features16. To avoid the loss of data while restricting the outlier influence during feature extraction, one of the most efficient ways is to extend the conventional L2 norm-based models to the Lp (p ≤ 1) norm space. For example, Kwak et al. replaced the L2 norm structure of conventional PCA with the L1 norm to restrict the outlier influence in dimensionality reduction17. Wang et al. proposed replacing the covariance matrix with an L1 norm structure in the Rayleigh quotient expression to improve the robustness of common spatial patterns (CSP) toward outliers for a motor imagery-based brain computer interface (BCI)11. In our previous conference work18, Zhou developed spectral regression (SR) in the L1 norm space to obtain robust parameters under outlier conditions while neglecting the possible outlier effects on the estimation of the response vectors in graph embedding. Similar to Zhou's work, Nie et al. measured the distance between any pair of projected vertices in the L1 norm space19. To facilitate the solution, they still constrained the denominator of the Laplacian embedding in the L2 norm space; accordingly, it may also be influenced by outliers. In this work, we propose a novel graph-embedding framework based on an L1 norm maximization strategy to restrict the outlier influence in dimensionality reduction, which is usually encountered in biological signals.

Results

In recent years, graph embedding and its extensions have gained increasing attention and contributed to a series of related reports. Among these, discriminant analysis-based extensions, such as RLDA, HLDA and NDA, and spectral regression-based extensions, such as SRDA, SRKDA and locality preserving projection and its extensions, are commonly mentioned and have been successfully applied to both engineering technology and neural science research12,20,21,22,23,24,25. These methods have their own strengths when dealing with different data problems, such as noise and heteroscedasticity, in dimension reduction3,26. In addition to these GE extensions, other L1 norm-based linear classifiers, such as Lasso SRDA, L1 SR and L1 LDA, have also been proposed in recent years to resist outlier influence. Thus, in this section, we investigate the performance difference between our proposed L1 GE algorithm and these popular classifiers in classification tasks. Classification accuracy is adopted as the performance index for evaluation. All of our experiments were performed on a Core i3 3.30-GHz Windows 7 machine with 8 GB of memory.

We mainly investigate the performance difference between L1 GE and 10 other popular classifiers (7 GE extensions and 3 L1 norm-based linear classifiers), i.e., LDA27, Regularized LDA (RLDA)28, Heteroscedastic LDA (HLDA)29, Non-parametric LDA (NDA)30, Spectral regression discriminant analysis (SRDA)31, LPP32, SRKDA12, Lasso SRDA31, L1 LDA33 and L1 SR18, for dimension reduction and feature extraction. In both the simulation and real-dataset experiments, we simply set the regularization parameters for RLDA and the SRDAs (i.e., SRDA, SRKDA and Lasso SRDA) to 1.0 and 0.7, respectively, according to31. As the type of kernel was not the focus of this work, we used the Gaussian kernel for SRKDA as reported in12. The kernel width parameter σ was tuned to achieve the best testing performance for SVM, and the same σ was then used for SRKDA. The weight estimation for LPP follows32. Three neighbors were used in NDA, as reported in a previous study30.

Simulation studies

In this section, two classes drawn from 2-D Gaussian distributions with different means are adopted34. Class 1 is drawn from a Gaussian distribution with mean (3.00, 3.00) and variance (0.5, 0.5), and class 2 is drawn from a Gaussian distribution with mean (1.85, 1.85) and variance (0.5, 0.5). Theoretically, the optimal hyperplane (i.e., decision boundary) for differentiating the two classes should be oriented along 135 degrees. If outliers are introduced, the corresponding classification hyperplane may be biased. During the simulation, the training set consisted of 220 samples (110 per class), and the testing set contained the same label distribution, i.e., 110 samples per class.
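
A minimal sketch of this data generation, assuming numpy and treating the stated variances as per-dimension variances of an isotropic Gaussian (the function name is our own), could look as follows:

```python
import numpy as np

def make_two_class_data(n_per_class=110, seed=0):
    """Draw the two 2-D Gaussian classes described above.

    Class 1: mean (3.00, 3.00), per-dimension variance 0.5.
    Class 2: mean (1.85, 1.85), per-dimension variance 0.5.
    Returns data X (n_samples x 2) and labels y in {1, 2}.
    """
    rng = np.random.default_rng(seed)
    std = np.sqrt(0.5)                      # variance 0.5 per dimension
    x1 = rng.normal(loc=3.00, scale=std, size=(n_per_class, 2))
    x2 = rng.normal(loc=1.85, scale=std, size=(n_per_class, 2))
    X = np.vstack([x1, x2])
    y = np.hstack([np.ones(n_per_class), 2 * np.ones(n_per_class)])
    return X, y

X_train, y_train = make_two_class_data()        # 220 training samples
X_test, y_test = make_two_class_data(seed=1)    # 110 samples per class
```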

Effect of outlier occurrence rate

This simulation generates datasets contaminated by different numbers of outliers. The outliers are generated from a Gaussian distribution with mean (13.65, 13.65) and variance (0.5, 0.5). The number of outliers is set to 0%, 3% and 8% of the sample number. In each outlier condition, the outliers are evenly assigned to the two classes, and the procedure is repeated 100 times. The mean accuracies are reported on the left side of Table 1. The corresponding mean effect of the outlier number on the hyperplane is shown visually in Fig. 1 for all the linear classifiers. As SRKDA constructs the hyperplane in the kernel space, which is essentially different from the original data space, its hyperplane is not presented in Fig. 1 for the different outlier occurrence rates.
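
The outlier injection at a given occurrence rate might be sketched as follows (a hedged illustration; the helper name and the tiling used for the even class assignment are our own choices):

```python
import numpy as np

def add_outliers(X, y, rate, mean=13.65, var=0.5, seed=0):
    """Append outliers drawn from N((mean, mean), var * I).

    `rate` is the outlier occurrence rate (e.g. 0.03 or 0.08) relative to
    the number of clean samples; the outliers are split evenly between
    the two classes.
    """
    rng = np.random.default_rng(seed)
    n_out = int(round(rate * len(X)))
    if n_out == 0:
        return X, y
    pts = rng.normal(loc=mean, scale=np.sqrt(var), size=(n_out, X.shape[1]))
    labels = np.tile([1, 2], n_out // 2 + 1)[:n_out]   # alternate class labels
    return np.vstack([X, pts]), np.hstack([y, labels])
```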

Table 1 Classification accuracy and the projection direction in different outlier conditions.
Figure 1

The effect of outliers on decision boundaries. (a) LDA. (b) RLDA. (c) HLDA. (d) NDA. (e) L1 GE. (f) SRDA. (g) LPP. (h) L1 LDA. (i) Lasso SRDA. (j) L1 SR.

Effect of outlier strength

This simulation generates datasets contaminated by outliers of different strengths. The outliers are generated from a Gaussian distribution with fixed variance (0.5, 0.5) and a mean varied from (6.00, 6.00) to (8.60, 8.60), so that the outlier strength (\(6.00\times \sqrt{2}\) ~ \(8.60\times \sqrt{2}\)) can be adjusted via the mean, aiming to reveal the effect of the outlier strength on the hyperplane. The number of outliers is fixed at 5% of the total sample number. The procedure is repeated 100 times, and the mean accuracies are reported on the right side of Table 1. The mean effect of the outlier strength on the hyperplane is also given in Fig. 1 for all the classifiers. As in the first simulation study, we do not report the corresponding hyperplane of SRKDA under the various outlier strengths.

Actual Dataset

Gene Datasets

In real applications, gene data are usually contaminated by outliers due to measurement errors and random fluctuations at various manufacturing stages. Thus, in the following experiments, we tested the classification performance of the eleven classifiers on two gene datasets: one is a colon cancer dataset35, and the other is a leukemia dataset36.

The first dataset, the colon cancer dataset, contains the expression levels of 2000 genes measured in 62 different samples, i.e., a 62 × 2000 data matrix. Each sample is labeled according to whether it was taken from a tumor biopsy: 40 samples are positive, and 22 samples are negative. In our experiment, a 5-fold CV is adopted and repeated 100 times. In each 5-fold CV run, approximately 49 samples are included in the training set, and the 2000-dimensional feature vector of each sample is reduced to 49 dimensions to serve as the input for classification, following previous studies37,38. The total time consumptions for the eleven classifiers to estimate the mapping information are 4.14 s, 1.08 s, 11.69 s, 35.30 s, 14.93 s, 0.90 s, 2.51 s, 0.91 s, 69.92 s, 2.17 s and 618.65 s. The mean accuracies over the 100 runs for the eleven classifiers are given in Fig. 2(a).
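
A sketch of the repeated 5-fold CV protocol is given below; the `fit_reduce` and `classify` callables are hypothetical placeholders standing in for any of the compared dimension-reduction methods and the subsequent classification step:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def repeated_cv_accuracy(X, y, fit_reduce, classify, n_repeats=100, seed=0):
    """Repeated 5-fold CV protocol used for the gene datasets (a sketch).

    `fit_reduce(X_tr, y_tr)` is assumed to return a function that maps samples
    into the reduced feature space; `classify(F_tr, y_tr, F_te)` is assumed to
    return predicted labels for the test-fold features.
    """
    accs = []
    for r in range(n_repeats):
        skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + r)
        for tr, te in skf.split(X, y):
            transform = fit_reduce(X[tr], y[tr])
            pred = classify(transform(X[tr]), y[tr], transform(X[te]))
            accs.append(np.mean(pred == y[te]))
    return np.mean(accs)
```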

Figure 2

The gene dataset classification based on the eleven classifiers. (a) Colon cancer data. (b) Leukemia data.

The second dataset, the leukemia dataset, is a collection of expression measurements reported by Golub et al. (1999). The dataset contains 72 samples divided into two variants of leukemia: 25 samples of acute myeloid leukemia (AML) and 47 samples of acute lymphoblastic leukemia (ALL). The gene expression measurements were taken from 63 bone marrow samples and 9 peripheral blood samples and were obtained using high-density oligonucleotide microarrays; the expression levels of 7129 genes are reported. A 5-fold CV is used to evaluate the classification performance of the eleven classifiers on this dataset. In each 5-fold CV run, approximately 58 samples are included in the training set, and the 7129-dimensional feature vector of each sample is reduced to 58 dimensions to serve as the input for classification. The times required for all the classifiers to estimate the mapping information are 5.25 s, 1.18 s, 14.84 s, 38.94 s, 17.91 s, 0.97 s, 3.12 s, 0.96 s, 62.05 s, 1.93 s and 637.33 s. The corresponding recognition accuracies are shown in Fig. 2(b). Both Fig. 2(a) and (b) consistently indicate that L1 GE, L1 SR, L1 LDA and RLDA have markedly better recognition performance than the other seven classifiers.

BCI Dataset

EEG is often contaminated by noise from eye blinks or head movement, which largely influences the performance of EEG-based brain computer interfaces. In this section, we compare the performance of the eleven classifiers on two motor imagery datasets: Dataset IVa of BCI Competition III and a dataset recorded in our lab.

The first dataset consists of EEG signals recorded from five subjects using 118 electrodes. In each trial, a visual cue indicating the motor imagery task is shown for 3.5 s. Three types of imagery are cued, i.e., left hand, right hand and right foot. The imagery tasks of the right hand and the left hand are adopted to evaluate the algorithm performance in the current work. For these two tasks, the total number of EEG trials for each subject is 280, and the data are bandpass filtered between 0.05 and 200 Hz and down-sampled to 100 Hz. Following the results reported in11,39,40, the time interval between 0.5 s and 2.5 s after trial onset is selected for task recognition, and the training and testing trial numbers used in previous studies41 are also adopted in the current work.

The second dataset is from the MI BCI system developed in our group and consists of EEG data from 19 subjects. During the online experiment, the subjects were required to sit in a comfortable armchair in front of a computer screen and were asked to perform motor imagery with the left or right hand according to the instructions appearing on the screen. Motor imagery lasted for 5 s, followed by 5 s of rest. Fifteen Ag/AgCl electrodes covering the sensorimotor area were used to record the EEG, and the signals were sampled at 1000 Hz and bandpass filtered between 0.5 Hz and 45 Hz. Four runs were recorded for each subject on the same day, with each run consisting of 50 trials (i.e., 25 trials per class), and there was a 3-minute break between two consecutive runs. The time interval between 0.5 s and 5 s after trial onset was selected for task recognition. The 50 trials in the first run were used to train the classifiers, and the remaining 150 trials were used to test their performance. The experiment was approved by the Ethical Committee of the University of Electronic Science and Technology of China (UESTC). Informed consent was obtained from all participants in accordance with the Declaration of Helsinki.
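
For illustration, a minimal preprocessing sketch, assuming a continuous recording `eeg` of shape (channels, samples) and a list of trial-onset sample indices, could band-pass filter and epoch the data as follows (function names and the filter order are our own assumptions, not the exact pipeline used here):

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1000  # sampling rate in Hz

def bandpass(eeg, low=0.5, high=45.0, fs=FS, order=4):
    """Zero-phase band-pass filter applied channel-wise (channels x samples)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=1)

def epoch(eeg, onsets, t_start=0.5, t_end=5.0, fs=FS):
    """Cut the 0.5-5 s post-onset window for every trial onset (in samples)."""
    s0, s1 = int(t_start * fs), int(t_end * fs)
    return np.stack([eeg[:, on + s0:on + s1] for on in onsets])
```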

For both datasets, the common spatial pattern (CSP)42 method is adopted to extract the MI rhythm-related features, and the 6 features corresponding to the 6 most discriminative CSP filters43 are used for task recognition. The corresponding classification accuracies for the 24 subjects are listed in Table 2, where L1 GE shows better performance than the other classifiers.
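
A compact sketch of one common CSP formulation (normalized trial covariances, a generalized eigenproblem, filters taken alternately from both ends of the spectrum, and log-variance features) is given below; this is our own illustration of the technique rather than the exact implementation of refs 42,43:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=6):
    """Return the n_filters most discriminative spatial filters (a sketch).

    trials_* : arrays of shape (n_trials, n_channels, n_samples), one per class.
    """
    def mean_cov(trials):
        covs = [t @ t.T / np.trace(t @ t.T) for t in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    # Generalized eigenproblem Ca w = lambda (Ca + Cb) w
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)                     # ascending eigenvalues
    picks = []
    for i in range(n_filters // 2):
        picks += [order[i], order[-(i + 1)]]     # take both spectrum extremes
    return vecs[:, picks].T                      # (n_filters, n_channels)

def csp_features(trial, W):
    """Log-variance features of the spatially filtered trial."""
    Z = W @ trial
    var = Z.var(axis=1)
    return np.log(var / var.sum())
```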

Table 2 Classification accuracy for the two BCI datasets.

UCI Dataset

To evaluate the generalization performance of L1 GE in other applications, we use nine binary-class datasets from the UCI machine learning repository; details of these nine datasets can be found in44. Specifically, five of these nine datasets are from a clinical environment, i.e., the heart disease, breast cancer, liver disorder, SPECT and thrombin data, while the remaining four datasets are from other applications.

For the purpose of our experiments, all of the features in each dataset are used for classification. We adopt the 5-fold CV proposed in17 for comparison, and the mean accuracies over 100 repetitions of the CV are reported in Table 3 for the eleven classification approaches. As shown in Table 3, L1 GE performs best on 5 of the 9 tested datasets, giving the highest accuracy (84%) and an improvement of approximately 1~5% compared to the other approaches.

Table 3 The classification for the UCI dataset.

Discussion

The simulation study quantitatively evaluates the possible influence of outliers on the performance of GE extensions when they are applied to pattern recognition. In the first experiment, we studied the influence of the ratio between samples and outliers on the hyperplane estimated with the eleven classifiers. As shown on the left side of Table 1, when the outlier occurrence rate is increased from 0% to 8%, the performances of the ten linear classifiers are all lowered. However, in terms of classification accuracy, L1 GE and the other L1 norm-based classifiers consistently show better performance than the L2 norm-based GE extensions. In essence, the accuracy difference among the eleven classifiers is determined by the hyperplane estimated for classification, and the hyperplane angles in Table 1 also confirm that the hyperplanes estimated in the L1 norm space are less influenced by the introduced outliers and that their angles are closest to the theoretical 135 degrees. Specifically, Fig. 1 visually reveals the different effects of the outlier occurrence rate on the eleven classifiers, where the hyperplanes of all the classifiers are deflected under noise conditions. However, the hyperplanes estimated in the L1 norm space are more robust to outlier influence, with less deflection than the other six linear classifiers. In the non-outlier case (i.e., 0% occurrence rate), the eleven classifiers, except for SRKDA, can find boundaries that discriminate the two classes well. However, if a dataset is contaminated with outliers, the corresponding hyperplanes estimated in the L2 norm space are obviously biased toward the outliers, resulting in the misclassification of some samples. Compared to the classifiers estimated in the L2 norm space, although L1 GE and the other L1 norm-based classifiers are also influenced by outliers, their hyperplanes can still provide good discrimination ability (i.e., remain close to the diagonal 135-degree direction) for the two classes.

In the second experiment, we studied the influence of various outlier strengths on the classification hyperplane. Similar to the results of the first experiment, all the linear classifiers used in this experiment were influenced by the outlier strength. As shown on the right side of Table 1, when the outlier strength varied from \(6.00\times \sqrt{2}\) to \(8.60\times \sqrt{2}\), their performances decreased. However, in terms of classification accuracy, the L1 norm-based classifiers consistently show better performance than the traditional GE extensions. Similar to the results on the left side, the hyperplane angles on the right side of Table 1 confirm that the L1 norm-based classifiers are robust toward the influence of the outlier strength and that the angles of their estimated hyperplanes are closest to the theoretical value of 135 degrees. Specifically, the hyperplanes estimated under the different outlier-strength conditions in Fig. 1 also reveal the robustness of the L1 norm-based classifiers. Another point to note is that even a small outlier, e.g., \(6.00\times \sqrt{2}\), will lead to a biased hyperplane estimated by the L2 norm-based GE extensions. Compared to the traditional GE extensions, although the L1 norm-based classifiers are also influenced by the outliers, their hyperplanes can still provide good discrimination ability (i.e., remain close to the diagonal 135-degree direction) for the two classes.

Considering the results in Fig. 1, we can see that the hyperplanes estimated by HLDA under outlier conditions deviate much more from the ideal boundary than those of the other nine classifiers, which accounts for the relatively lower accuracies of this classifier in the simulation studies. Among the existing LDA variants, HLDA aims to extend LDA to problems with distributions more complex than the homoscedastic case with equal variance29,30,45. The motivation of HLDA is to address heteroscedastic situations in which the classes may differ in both variance and mean, which means that this method needs to estimate both parameters from the training samples29. However, when the training samples are contaminated by outliers, both the mean and variance are estimated inaccurately to fit a false sample distribution, which ultimately induces a biased hyperplane. Similar to HLDA, NDA is also designed to address problems with unequal variance distributions. For NDA, the critical step is to estimate the distance between the samples in one class and their nearest neighbor centers from another class30. When the training samples are contaminated with outliers, the distance between the noisy samples and their neighbor centers is usually larger than that of the other, original samples, and it plays a dominant role in the between-class scatter estimation. Consequently, a larger number of outliers leads to a more biased hyperplane estimated by NDA. RLDA was developed to address the possible singularity of the covariance matrix31, and SRDA combined with regularization operators can decrease the time and memory consumption required by the eigendecomposition of dense matrices. In this simulation study, the covariance matrix has relatively larger coefficients along the diagonal, indicating that it is non-singular. Therefore, when small regularization parameters (α = 1.0 or α = 0.70) are added to the diagonal coefficients in RLDA and SRDA, they have little effect on the covariance properties, resulting in the similar results observed for LDA, RLDA and SRDA in the simulation study. Nevertheless, the effect of the different regularization strategies used in RLDA and SRDA on the classification performance can be revealed by the actual datasets, where the covariance matrix may be close to singular. As SRKDA estimates hyperplanes in the kernel space, whose dimension is usually greater than 2, they cannot be illustrated in Fig. 1 like those of the other linear GE extensions. However, Table 1 shows that SRKDA behaves similarly in resisting outlier influence. In essence, SRKDA and SRDA share the same parameter estimation strategy, which indicates that when the outlier influence cannot be restricted by the kernel space, SRKDA will also be influenced. It is worth noting that the Gaussian kernel seems to be less sensitive than the other LS-based GE extensions as both the strength and occurrence rate of the outliers increase, which may be attributed to the Gaussian space used for kernel construction. In fact, the kernel space reflects the interactions between different samples. For a Gaussian kernel, the weights of the outliers tend to be small values. When the outlier ratio is small, it will not influence the real neighbor relationships in the kernel space. However, when the outlier ratio increases, more kernel components will tend toward small values, which will greatly influence the stability of the kernel matrix.
In addition, the types and parameters of the kernels will also have an effect on the performance of related methods. Thus, it might be cumbersome to find proper kernels and the corresponding parameters for different applications. LPP is proposed to yield a nearest neighbor structure in the low-dimensional space similar to that in the high-dimensional space by preserving the local structure32. When the outlier strength is small, it has little influence on the main neighbor structure, resulting in a relatively robust hyperplane, as illustrated by the red line in Fig. 1(g). However, when the outliers are stronger, the neighbor structure measured by the Euclidean distance is destroyed, and biased mapping information is estimated, as illustrated in Fig. 1(g). Evidently, these linear GE extensions were not derived with the outlier problem in mind. As a result, they are all sensitive to the outliers in our simulation studies. It is worth noting that both the L1 norm-based classifiers and SRKDA consistently show a good ability to suppress the outlier effect. Compared with SRKDA, L1 GE, L1 LDA and L1 SR are nonparametric methods, which indicates their convenience in real applications.

Compared with the L2 norm-based GE extensions, all the L1 norm-based classifiers mentioned in this work performed better, as illustrated in Fig. 1(e),(h–j), which can be attributed to the robustness of the L1 norm space to outlier influence. In addition to the better performance, we also observe that differences in the objective function influence the hyperplanes, although the four L1 norm classifiers mentioned here estimate their parameters in the same norm space. From Fig. 1, we can see that the hyperplanes estimated by L1 SR and Lasso SRDA are almost the same, as shown in Fig. 1(i) and (j), which may be attributed to the spectral regression model utilized in their objective functions. Meanwhile, the hyperplanes estimated by L1 GE and L1 LDA also show some similarity in the different outlier conditions, as shown in Fig. 1(e) and (h). In fact, with the coding matrix H defined for supervised learning in the current work, (11) can be transformed into an L1 norm-based DA structure that is similar to L1 LDA, and this similarity can account for their close performance as outlier restriction methods in the simulation study. Although L1 GE and L1 LDA have a similar DA structure under supervised conditions, there are some differences in the definition of the scatter matrix in the DA denominator, which accounts for their different classification results when applied to the actual datasets. In essence, our proposed L1 GE can be flexibly transformed into other L1 norm-based GE extensions (i.e., L1 LDA, L1 LPP, L1 PCA, etc.) with a variety of coding matrices H. We will explore this in future work.

In biomedical engineering, gene data are usually measured for cancer diagnosis. However, gene expression datasets usually have a large number of variables but only a small number of samples46, which requires robust classifiers for the clinical diagnosis of diseases. RLDA is one of the popular methods for such situations20,28, and the classification results on these two datasets demonstrate the efficiency of RLDA for gene recognition. For gene datasets, specific problems are encountered, such as information redundancy and a low signal-to-noise ratio (i.e., strong noise artifacts)36,37,47. The comparison on the two gene datasets confirms that L1 GE may also have a stable ability to address these problems and has the performance closest to that of RLDA. On the gene datasets, we also observe that Lasso SRDA performs worse than the other L1 norm-based classifiers. This may be due to the objective function used in Lasso SRDA. In essence, the distance measurement of Lasso SRDA is still defined in the L2 norm space, although an L1 norm constraint is imposed on the parameters, which may indicate that when a dataset suffers from the curse of dimensionality and strong noise artifacts, Lasso SRDA will be influenced regardless of how strongly the parameters are constrained.

In BCI applications, artifacts are the main cause of reduced signal quality. Artifacts can be caused by various factors, such as electrode wire movement during signal recording and measurement errors in biochemistry. These artifacts usually have a strength several orders of magnitude larger than the signal of interest and appear as spike-like waveforms of very short duration. Such artifacts can usually be treated as outliers and must be eliminated before further study. In studies based on biomedical signals, the easiest way is to reject the contaminated samples. However, this is not preferable because it may lead to the loss of other useful information. Other favorable methods, such as BSS, can also be applied for artifact rejection. However, it is difficult to determine the extent to which the samples may be contaminated by outliers, and it is also not trivial to eliminate the artifact components with BSS methods. In addition, there are few reports on the application of graph embedding analysis to scalp EEG BCIs. To investigate the feasibility of L1 GE in EEG BCI applications, we evaluated its performance on two independent BCI datasets (24 subjects in total). Table 2 shows that 9 of the 24 subjects obtain their highest accuracy when L1 GE is used, which is the highest ratio among the eleven classifiers. Moreover, L1 GE shows a 1~9% accuracy improvement compared to the other ten classifiers. The results in Table 2 show the potential of L1 GE for actual BCI applications.

The presence of outliers is encountered not only in biomedical signal recordings or gene expression data but also in a variety of other clinical applications. To evaluate the generalization performance of L1 GE, we used 9 UCI datasets, 5 of which come from clinical applications. The results for the UCI datasets presented in Table 3 reveal that L1 GE has the highest recognition accuracy (84%). Table 3 also demonstrates that no classifier consistently shows the best performance across the 9 different UCI datasets and that L1 GE achieves the best performance on 5 of the 9 datasets, thus outperforming the ten other classifiers. In contrast to the conventional L2 norm-based GE and its extensions, the L1 norm-based classifiers use an iterative procedure, which results in relatively higher complexity and accounts for their greater time consumption.

The applications to various datasets demonstrate that the L1 norm-based GE extensions are robust in dealing with artifact-influenced classification. Compared to conventional graph embedding based on the L2 norm structure, the developed L1 norm GE maximizes the dispersion in the L1 norm space, which provides a more powerful ability to suppress artifact effects. In this work, we mainly evaluated the performance of L1 GE in supervised learning. For unsupervised learning, one crucial step is to estimate the symmetric factor H, which holds the structural information of the similarity matrix W48,49. In essence, our proposed L1 GE can be flexibly transformed into other L1 norm-based GE extensions with a variety of coding matrices H. In future work, we will explore the efficiency of L1 GE when it is transformed into other L1 norm-based GE extensions, and we will further analyze its efficiency in unsupervised and semi-supervised learning.

Methods

Graph embedding

Define \(m\) samples \(X={\{{x}_{i},{x}_{i}\in {R}^{n\times 1}\}}_{i=1}^{m}\subset {R}^{n\times m}\) from \(C\) classes, where n denotes the sample dimension. In general graph embedding, each sample point is treated as a vertex of an adjacency graph \(G\in {R}^{m\times m}\), and the corresponding edges of G represent the statistical relationship between each pair of sample points. The aim of graph embedding is to represent each vertex of G in a lower-dimensional space while preserving the original edge information between vertex pairs. Essentially, graph embedding estimates the response vector \(y\in {R}^{m\times 1}\) that maximizes the following function:

$$J(y)=\frac{{y}^{T}Wy}{{y}^{T}Dy}\,\,$$
(1)

where T denotes the transpose and \(W\in {R}^{m\times m}\) is a sparse symmetric matrix reflecting the weight of the joining edge between vertices i and j as

$${W}_{ij}=\{\begin{array}{c}1/{m}_{k},\,\,\,{\rm{if}}\,{x}_{i}\,{\rm{and}}\,{{\rm{x}}}_{j}\,{\rm{both}}\,\\ \,\,\,\,\,\,\,{\rm{belong}}\,{\rm{to}}\,{\rm{the}}\,k-\mathrm{th}\,\mathrm{class};\\ 0,\,\,\,\,\,\,\,\,{\rm{otherwise}}.\end{array}$$
(2)

and \({m}_{k}\) is the sample number of the k-th class. D is a diagonal matrix whose entries are the column (or row) sums of W31. Note that the scaling of the projection y has no effect on the objective value. Thus, maximizing J(y) is equivalent to the following constrained optimization problem:

$$\{\begin{array}{c}\mathop{\text{arg}\,\max }\limits_{y}\,\,\,{y}^{T}Wy\\ subject\,to\,{y}^{T}Dy=1\end{array}\,\,\,\,$$
(3)

By introducing the Lagrange multiplier, the objective function can be rewritten as

$$L(y;\lambda )={y}^{T}Wy-\lambda ({y}^{T}Dy-1).$$
(4)

Taking the derivative of (4) with respect to y and setting \(\partial L/\partial y=0\), the response vector y can be estimated from the generalized eigenvalue equation

$$Wy=\lambda Dy\,$$
(5)

where \(\lambda \) denotes the eigenvalue of the generalized eigenproblem, and y is the corresponding eigenvector. For multiple response vectors, the above equation (5) can be solved as

$${D}^{-1}WY={\rm{\Sigma }}Y$$
(6)

where Y is the matrix consisting of the eigenvectors of \({D}^{-1}W\), and \({\rm{\Sigma }}=diag({\lambda }_{1},{\lambda }_{2},\mathrm{...},{\lambda }_{m})\) is a diagonal matrix consisting of the eigenvalues of \({D}^{-1}W\). For classification purposes, only the \((C-1)\) eigenvectors corresponding to the largest \((C-1)\) eigenvalues are used. However, the response vectors \({\{{y}_{i}\}}_{i=1}^{C-1}\subset {R}^{m\times (C-1)}\) inferred from (6) only provide mapping information for the training set. To extend the mapping information to testing samples, a simple way is to estimate a projection between the response vector and the sample points. By replacing y with \({X}^{T}\alpha \), the objective function in (1) can be rewritten as

$$\mathop{\text{arg}\,\max \,}\limits_{\alpha }\,J(\alpha )=\mathop{\text{arg}\,\max }\limits_{\alpha }\frac{{\alpha }^{T}XW{X}^{T}\alpha }{{\alpha }^{T}XD{X}^{T}\alpha }$$
(7)

where \(\alpha \in {R}^{n\times 1}\) is the mapping projection between the defined graph and the samples. Therefore, the optimal solution of equation (7) is a mapping \(\alpha \) that transforms the samples X to Y while preserving the manifold structure defined in W as much as possible. More details about graph embedding can be found in Appendix B.
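
A minimal sketch of this supervised graph embedding, building W from Eq. (2), D from the row sums of W, and solving the generalized eigenproblem implied by Eq. (7), is given below (the small ridge term is our own addition for numerical stability; for C classes only C - 1 components would be kept):

```python
import numpy as np
from scipy.linalg import eig

def supervised_weight_matrix(labels):
    """W from Eq. (2): W_ij = 1/m_k if samples i and j share class k, else 0."""
    labels = np.asarray(labels)
    W = np.zeros((len(labels), len(labels)))
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        W[np.ix_(idx, idx)] = 1.0 / len(idx)
    return W

def graph_embedding_projections(X, labels, n_components):
    """L2 graph embedding sketch: solve (X W X^T) a = lambda (X D X^T) a.

    X : (n_features, n_samples); returns projections (n_features, n_components).
    """
    W = supervised_weight_matrix(labels)
    D = np.diag(W.sum(axis=1))
    Sw = X @ W @ X.T
    Sd = X @ D @ X.T + 1e-6 * np.eye(X.shape[0])   # ridge for numerical stability
    vals, vecs = eig(Sw, Sd)
    order = np.argsort(-vals.real)                 # largest eigenvalues first
    return vecs[:, order[:n_components]].real
```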

L1 Graph embedding

Noting that W is a symmetric matrix and D is a diagonal matrix, equation (7) can be reformulated as

$$\begin{array}{c}\mathop{\text{arg}\,\max }\limits_{\alpha }\,J(\alpha )=\mathop{\text{arg}\,\max }\limits_{\alpha }\frac{{\alpha }^{T}XW{X}^{T}\alpha }{{\alpha }^{T}XD{X}^{T}\alpha }\\ \,\,\,\,\,\,\,\,\,\,\,=\,\mathop{\text{arg}\,\max }\limits_{\alpha }\frac{{\alpha }^{T}XH{H}^{T}{X}^{T}\alpha }{{\alpha }^{T}X\sqrt{D}{\sqrt{D}}^{T}{X}^{T}\alpha }=\frac{{\Vert {\alpha }^{T}XH\Vert }_{2}^{2}}{{\Vert {\alpha }^{T}X\sqrt{D}\Vert }_{2}^{2}}\end{array}$$
(8)

where \({\Vert \bullet \Vert }_{2}\) denotes the L2 norm, \(W=H{H}^{T}\) and \({D}_{ii}={\sum }_{j=1}^{m}{W}_{ij}\). Because W is a symmetric matrix, there are several effective strategies for implementing the symmetric factorization, so H can be estimated easily48,49. As revealed in equation (8), graph embedding is essentially derived from the L2 norm structure. However, the L2 norm has been proven to be prone to the presence of outliers17, which indicates that in practical applications, outliers will have an unexpected effect on related analyses such as signal processing and feature extraction. To improve the robustness of the parameter estimation in this framework, schemes such as a sparse constraint with the L1 norm have been proposed to alleviate the outlier effect50,51. However, most of these schemes focus on imposing restrictions on the parameters but still leave the main structure of the objective function in the L2 norm space, which will inevitably exaggerate the outlier effect regardless of how strongly the parameters are constrained. Motivated by the merit of the L1 norm in suppressing the outlier effect, we propose estimating the mapping denoted in (7) and (8) in the L1 norm space instead of the L2 norm as

$$\mathop{\text{arg}\,\max \,}\limits_{\alpha }\,\tilde{J}(\alpha )=\mathop{\text{arg}\,\max }\limits_{\alpha }\frac{{\Vert {\alpha }^{T}XH\Vert }_{1}}{{\Vert {\alpha }^{T}X\sqrt{D}\Vert }_{1}}$$
(9)

where \({\Vert \bullet \Vert }_{1}\) denotes the L1 norm. We refer to (9) as L1 graph embedding (L1 GE). For pattern recognition, when all the training samples are labeled, we have \({\sum }_{k=1}^{C}{m}_{k}=m\). Suppose that all of the data points in X are ordered according to their labels as \(X=[{X}^{(1)},{X}^{(2)},\mathrm{...},{X}^{(C)}]\), where \({X}^{(k)}=[{x}_{1}^{(k)},{x}_{2}^{(k)},\mathrm{...},{x}_{{m}_{k}}^{(k)}]\) with \({x}_{i}^{(k)}\) being the ith feature vector of the kth class; then, the weight matrix W can be defined as a C-block diagonal matrix31, with each block being a symmetric matrix:

$$W=[\begin{array}{cccc}{W}^{(1)} & 0 & 0 & 0\\ 0 & {W}^{(2)} & 0 & 0\\ 0 & 0 & \ddots & 0\\ 0 & 0 & 0 & {W}^{(C)}\end{array}]$$
(10)

where \({W}^{(k)}\) is an \({m}_{k}\times {m}_{k}\) matrix with all of its elements equal to \(1/{m}_{k}\), and \(D\in {R}^{m\times m}\) is the identity matrix. Letting \(\tilde{X}=X-\bar{X}\), we can rewrite (9) as

$$\mathop{\text{arg}\,\max }\limits_{\alpha }\,\tilde{J}(\alpha )=\mathop{\text{arg}\,\max }\limits_{\alpha }\frac{{\Vert {\alpha }^{T}{{\rm{\Phi }}}_{b}\Vert }_{1}}{{\Vert {\alpha }^{T}\tilde{X}\Vert }_{1}}$$
(11)

where \(\bar{X}\) is the mean of \(X\), and \({{\rm{\Phi }}}_{b}=\tilde{X}H\). \(H\in {R}^{m\times c}\) is a coding matrix whose columns indicate the class types and whose rows correspond to the samples. The values in each column of H are defined as

$${H}_{k}={[\mathop{\underbrace{0,\mathrm{...},0}}\limits_{{\sum }_{i=1}^{k-1}{m}_{i}},\mathop{\underbrace{1/\sqrt{{m}_{k}},\mathrm{...},1/\sqrt{{m}_{k}}}}\limits_{{m}_{k}},\mathop{\underbrace{0,\mathrm{...},0}}\limits_{{\sum }_{i=k+1}^{c}{m}_{i}}]}^{T}.$$
(12)
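
For illustration, the coding matrix H of Eq. (12) can be constructed directly from the labels; in this sketch the samples need not be pre-ordered by class, and H H^T still reproduces the supervised weight matrix W of Eq. (2):

```python
import numpy as np

def coding_matrix(labels):
    """H from Eq. (12): one column per class, with entries 1/sqrt(m_k) on the
    rows belonging to class k and zeros elsewhere."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    H = np.zeros((len(labels), len(classes)))
    for j, k in enumerate(classes):
        idx = labels == k
        H[idx, j] = 1.0 / np.sqrt(idx.sum())
    return H
```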

In essence, by using a logarithmic transformation, equation (11) can be reformulated as

$$\mathop{\text{arg}\,\max }\limits_{\alpha }\,\tilde{J}(\alpha )=\mathop{\text{arg}\,\max }\limits_{\alpha }[\,\mathrm{ln}({\Vert {\alpha }^{T}{{\rm{\Phi }}}_{b}\Vert }_{1})-\,\mathrm{ln}({\Vert {\alpha }^{T}\tilde{X}\Vert }_{1})].$$
(13)

By introducing the sign functions R and Q, we define the iterative direction for (13) as

$$d(\alpha (t))=\frac{\sum _{i=1}^{C}{R}_{i}(t){{\rm{\Phi }}}_{b}(:,i)}{{\Vert \alpha {(t)}^{T}{{\rm{\Phi }}}_{b}\Vert }_{1}}-\frac{\sum _{j=1}^{m}{Q}_{j}(t)\tilde{X}(:,j)}{{\Vert \alpha {(t)}^{T}\tilde{X}\Vert }_{1}},$$
(14)

where R(t) and Q(t) are defined as

$$\begin{array}{c}{R}_{i}(t)=\{\begin{array}{c}1,\,\alpha {(t)}^{T}{{\rm{\Phi }}}_{b}(:,i)\ge 0\\ -1,\alpha {(t)}^{T}{{\rm{\Phi }}}_{b}(:,i) < 0\end{array}\\ {Q}_{j}(t)=\{\begin{array}{c}1,\,\alpha {(t)}^{T}\tilde{X}(:,j)\ge 0\\ -1,\alpha {(t)}^{T}\tilde{X}(:,j) < 0,\end{array}\end{array}$$
(15)

In the above equations, \({{\rm{\Phi }}}_{b}(:,i)\) and \(\tilde{X}(:,j)\) denote the ith column of \({{\rm{\Phi }}}_{b}\) and the jth column of \(\tilde{X}\), respectively.
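
The iterative direction of Eqs. (14)-(15) translates almost directly into code; the following sketch (variable names are our own) assumes alpha as a length-n vector, Phi_b of shape (n, C) and X_tilde of shape (n, m):

```python
import numpy as np

def l1ge_direction(alpha, Phi_b, X_tilde):
    """Ascent direction d(alpha(t)) from Eqs. (14)-(15).

    Phi_b   : (n_features, C)  -- between-class term, Phi_b = X_tilde @ H
    X_tilde : (n_features, m)  -- centered samples
    """
    pb = alpha @ Phi_b                      # alpha^T Phi_b, one value per class
    px = alpha @ X_tilde                    # alpha^T X_tilde, one value per sample
    R = np.where(pb >= 0, 1.0, -1.0)        # sign terms R_i(t), Eq. (15)
    Q = np.where(px >= 0, 1.0, -1.0)        # sign terms Q_j(t), Eq. (15)
    num = Phi_b @ R / np.abs(pb).sum()      # first term of Eq. (14)
    den = X_tilde @ Q / np.abs(px).sum()    # second term of Eq. (14)
    return num - den
```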

Based on the gradient in (14), the objective function in (11) can be solved by the iterations below:

  (1) Initialization. Set t = 0. Set the stopping tolerances \({\varepsilon }_{1}\) and \({\varepsilon }_{2}\) to small positive numbers. Initialize \(\alpha (0)\) with n random numbers and normalize it as \(\alpha (0)=\alpha (0)/{\Vert \alpha (0)\Vert }_{2}\).

  (2) Updating. Update the projection vector as \(\alpha (t+1)=\alpha (t)+\eta d(\alpha (t))\), where \(\eta \) is a small number accounting for the learning rate of the iteration.

  (3) Convergence criterion. If \(|J(\alpha (t+1))-J(\alpha (t))| < {\varepsilon }_{1}\) or \({\Vert \alpha (t+1)-\alpha (t)\Vert }_{2} < {\varepsilon }_{2}\), set \(\alpha =\alpha (t+1)/{\Vert \alpha (t+1)\Vert }_{2}\) and stop the iteration; otherwise, set t = t + 1 and go to Step (2).

Using the above steps, the optimal \(\alpha \) can be estimated. The computational complexity of the L1 GE parameter estimation for a single iteration is \(O(n\times (2m+2c+mc))\).
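
Putting Steps (1)-(3) together with the direction function sketched after Eq. (15) gives the following hedged implementation of the L1 GE iteration; the learning rate, tolerances and maximum iteration count are illustrative defaults rather than values prescribed by the text:

```python
import numpy as np

def l1_graph_embedding(Phi_b, X_tilde, eta=0.01, eps1=1e-6, eps2=1e-6,
                       max_iter=1000, seed=0):
    """Iterative estimation of the L1 GE projection alpha (Steps 1-3), a sketch.

    Maximizes J(alpha) = ln||alpha^T Phi_b||_1 - ln||alpha^T X_tilde||_1
    by gradient ascent using l1ge_direction (defined in the earlier sketch).
    """
    rng = np.random.default_rng(seed)
    n = Phi_b.shape[0]
    alpha = rng.normal(size=n)
    alpha /= np.linalg.norm(alpha)                       # Step (1)

    def J(a):
        return (np.log(np.abs(a @ Phi_b).sum())
                - np.log(np.abs(a @ X_tilde).sum()))

    for _ in range(max_iter):
        new_alpha = alpha + eta * l1ge_direction(alpha, Phi_b, X_tilde)  # Step (2)
        if (abs(J(new_alpha) - J(alpha)) < eps1
                or np.linalg.norm(new_alpha - alpha) < eps2):            # Step (3)
            alpha = new_alpha
            break
        alpha = new_alpha
    return alpha / np.linalg.norm(alpha)
```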

With the gradient in equation (14) and the above iteration steps, the proposed objective function J(α(t)) is non-decreasing at each iteration step t, as proven in Appendix A.