Neurocomputing

Volume 407, 24 September 2020, Pages 175-184

Group discriminative least square regression for multicategory classification

https://doi.org/10.1016/j.neucom.2020.05.016

Highlights

  • A new regularization of the classifier is given to guarantee group discrimination.

  • A group discriminative LSR model is given for multicategory classification.

  • Extensive experiments show the superior performance of the proposed method.

Abstract

The least square regression (LSR) is a popular framework for multicategory classification because it has a simple mathematical formulation and an efficient solution. The classification performance of LSR-based methods depends heavily on the discriminative capability of the label transformation. In this work, we aim to enhance the discriminative capability of the label transformation by imposing a new class-induced structure constraint. Specifically, we propose to regularize the label transformation matrix by the difference of the $\ell_{2,1}$-norm and the $\ell_{2,2}$-norm of the predicted labels of each class. The major advantage of the new regularizer is that it guarantees the ideal discriminative structure of the label transformation and makes the classification more stable. For better generalization capability, we adopt the existing ε-dragging technique to relax the binary labels. Leveraging the new regularization term and the label relaxation, we propose a group discriminative least square regression (GDLSR) training model that learns the label transformation for multicategory classification. To solve the proposed model, we present an ADMM-like iterative algorithm for which we can guarantee weak convergence. Experiments on several commonly used datasets show that our method outperforms both related LSR-based methods and some traditional methods.

Introduction

The least square regression (LSR) has wide applications in multi-category pattern classification because it is mathematically tractable and computationally efficient. In the past decades, many LSR-based methods, such as local LSR [1], weighted LSR [2], partial LSR [3], and the support vector machine (SVM) [4], have been proposed. In addition, several representation-based classification methods, such as sparse representation-based classification (SRC) [5] and linear-regression-based classification (LRC) [6], can also be regarded as LSR-based methods because they learn the representation coefficients by using the LSR technique. Among these methods, linear regression (LR) is widely used because it is simple and highly efficient.

LR-based [7], [8], [9] multi-category classification usually follows a two-stage procedure. In the training stage, under the supervision of the training data and their known labels, a label transformation matrix is learnt with an LR-based training model; in the test stage, the label transformation matrix is used to predict the class identity of a test sample. A common prediction rule maps a test sample into a label vector with the transformation matrix and takes the index of the maximum entry of the label vector as the class identity. The classification performance of such methods mainly depends on the discriminative capability and the generalization capability of the label transformation.
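As a concrete illustration of this two-stage procedure, the following minimal NumPy sketch trains a ridge-regularized LSR classifier on one-hot labels and predicts by the maximum entry of the mapped label vector. The ridge term and all variable names are our own illustrative choices; this is a generic LSR baseline, not the model proposed in this paper.

```python
import numpy as np

def train_lsr(X, y, n_classes, lam=0.1):
    """Learn a label transformation Q via ridge-regularized least squares:
    min_Q ||QX - Y||_F^2 + lam * ||Q||_F^2, with one-hot labels Y."""
    # X: (d, n) data matrix, one sample per column; y: (n,) integer labels.
    d, n = X.shape
    Y = np.zeros((n_classes, n))
    Y[y, np.arange(n)] = 1.0                       # one-hot label matrix
    # Closed-form solution: Q = Y X^T (X X^T + lam I)^{-1}
    return Y @ X.T @ np.linalg.inv(X @ X.T + lam * np.eye(d))

def predict_lsr(Q, X_test):
    """Map test samples to label vectors and take the index of the
    maximum entry as the predicted class identity."""
    return np.argmax(Q @ X_test, axis=0)
```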

Many LR-based methods have been developed to ensure the discriminative and generalization capability of the label transformation. The discriminative LSR (DLSR) [10] relaxes the one-hot labels by using the ε-dragging technique. The retargeted LSR (ReLSR) [11] enlarges the margin between the true and false classes more directly. Despite the great success of relaxing the labels, these methods often suffer from overfitting [12], [13], [14]. To alleviate this problem, various regularization techniques have been proposed to impose constraints on the transformation matrix of the LR-based training model. The regularized label relaxation linear regression (RLRR) [15] relaxes the strict binary label matrix into a slack variable matrix by introducing a nonnegative label relaxation matrix, and avoids overfitting by constructing a class compactness graph. The inter-class sparsity based discriminative least square regression (ICS_DLSR) [16] pursues a common sparsity structure shared by samples within each class by using an inter-class sparsity constraint, the $\ell_{2,1}$-norm. Once the transformation matrix is obtained in the training stage, the nearest neighbor classifier is applied to obtain the final classification results. The group low-rank representation-based discriminative linear regression (GLRRDLR) [17] introduces an inter-class low-rank constraint on the transformed data such that the intrinsic structure of the data can be naturally captured and exploited during regression.

By using label relaxation and various regularizations, the previous methods can ensure the discriminative and generalization capability to some extent. However, the structure regularization in these methods is not sufficient to guarantee the ideal structure or discriminative capability of the predicted labels. Ideally, the predicted binary labels of data from the same class should have the following property: each label vector has a single common entry valued 1 while all other entries are 0. Arranging all the label vectors of a class column-wise into a label matrix, the matrix has a single row of 1s while all other rows are zero. The analysis in our motivation section suggests that such a structure ensures good discriminative capability and stability of the classification. Motivated by this observation, we propose a new regularization on the transformation matrix to guarantee this structure of the predicted labels. Specifically, we regularize the label transformation matrix by minimizing the difference of the $\ell_{2,1}$-norm and the $\ell_{2,2}$-norm of the predicted label matrix of each class (see the sketch below). We also adopt the ε-dragging technique of DLSR to relax the binary labels for better generalization capability. We call the resulting model group discriminative least square regression (GDLSR). Experiments on several commonly used datasets show that our method outperforms some state-of-the-art LSR-based methods.
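Under our reading of this description (the full GDLSR objective is not reproduced in this excerpt), the class-wise regularization term could be evaluated as in the sketch below; `group_discriminative_penalty` is a hypothetical name used only for illustration.

```python
import numpy as np

def group_discriminative_penalty(Q, X, y):
    """Evaluate sum_i (||Q X_i||_{2,1} - ||Q X_i||_{2,2}), where X_i collects
    the columns of X from class i. The difference is nonnegative and vanishes
    exactly when Q X_i has at most one nonzero row, i.e. the ideal structure."""
    penalty = 0.0
    for c in np.unique(y):
        P = Q @ X[:, y == c]                   # predicted labels of class c
        l21 = np.linalg.norm(P, axis=1).sum()  # sum of row-wise l2-norms
        l22 = np.linalg.norm(P, 'fro')         # Frobenius (l_{2,2}) norm
        penalty += l21 - l22
    return penalty
```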

Section snippets

Related Works

We first introduce the notations used in this paper and then review some related work.

For a matrix $A=(A_{ij})$, $A_{ij}$ denotes the $(i,j)$ entry, and $A_{i,:}$ and $A_{:,i}$ denote the $i$th row and the $i$th column, respectively. $\|A\|_1=\sum_{ij}|A_{ij}|$, $\|A\|_F=\|A\|_{2,2}=\sqrt{\sum_{ij}A_{ij}^2}$, $\|A\|_{2,1}=\sum_i\sqrt{\sum_j A_{ij}^2}$, and $\|A\|_*=\sum_i\sigma_i$ denote the $\ell_1$-norm (as a vector), the Frobenius norm, the $\ell_{2,1}$-norm, and the nuclear norm of the matrix $A$, respectively, where the $\sigma_i$ denote the singular values of $A$. For a vector $v=(v_1,v_2,\ldots,v_n)^T$, $\|v\|_1=\sum_i|v_i|$ and $\|v\|_2=\sqrt{\sum_i v_i^2}$ are the $\ell_1$-norm and the $\ell_2$-norm, respectively.
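For concreteness, a small NumPy sketch of these norm definitions (our own illustration, not part of the paper):

```python
import numpy as np

A = np.array([[3.0, 4.0],
              [0.0, 5.0]])

l1  = np.abs(A).sum()                            # l1-norm: sum_ij |A_ij|
fro = np.sqrt((A ** 2).sum())                    # Frobenius / l_{2,2}-norm
l21 = np.linalg.norm(A, axis=1).sum()            # l_{2,1}: sum of row l2-norms
nuc = np.linalg.svd(A, compute_uv=False).sum()   # nuclear norm: sum of singular values
```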

Denote a set of

Motivation

Ideally, the predicted labels of the data $X_i$ from the $i$th class should be

$$QX_i=\begin{pmatrix}0&\cdots&0\\ \vdots& &\vdots\\ 1&\cdots&1\\ \vdots& &\vdots\\ 0&\cdots&0\end{pmatrix},$$

where the 1s are on the $i$th row. The most significant characteristic of this matrix is that it has a single nonzero row. The index of this nonzero row indicates the class identity of the data collection $X_i$. This embodies the discriminative capability of the predicted label indicators. Moreover, the predicted label indicators $QX_i$ of the data $X_i$ share the same index of nonzero entries, which further enforces
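The link between this single-nonzero-row structure and the $\ell_{2,1}$/$\ell_{2,2}$ difference used as the regularizer can be made explicit; the following short derivation is our own addition for clarity:

```latex
% Write a_i = \|M_{i,:}\|_2 \ge 0 for the row l2-norms of a matrix M. Then
\|M\|_{2,1} = \sum_i a_i \;\ge\; \Big(\sum_i a_i^2\Big)^{1/2} = \|M\|_{2,2},
% since (\sum_i a_i)^2 = \sum_i a_i^2 + 2\sum_{i<j} a_i a_j, and the cross
% terms vanish iff at most one a_i is nonzero, i.e. M has at most one
% nonzero row. Hence \|QX_i\|_{2,1} - \|QX_i\|_{2,2} \ge 0, with equality
% exactly at the ideal structure above, so minimizing this difference
% drives the predicted labels of each class toward a single nonzero row.
```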

Complexity analysis

The computationally intensive parts of Algorithm 1 are steps 1–4. The computation of step 1 is relatively heavy, with an $O(n^3)$ complexity for an $n\times n$ matrix. The complexity of steps 2 and 3 is the same, $O(nc)$, where $c$ is the number of classes. The complexity of step 4 is $O(n^3)$. Hence the total computational complexity of GDLSR is about $O(\tau n^3)$, where $\tau$ is the number of iterations.

Convergence properties

The convergence analysis of ADMM has been well studied when there are two blocks of variables [19], [20]. For more variable

Experimental Analysis

We evaluate our method for pattern classification and compare it with some related state-of-the-art methods: LRLR [9], DLSR [10], ReLSR [11], and ICS_DLSR [16]. Our model has two parameters, $\lambda_1$ and $\lambda_2$, which are selected by standard 5-fold cross validation within $\lambda_1,\lambda_2\in\{0.001, 0.01, 0.03, 0.05, 0.08, 0.1, 1\}$. The model parameters of the other methods are also carefully chosen for a fair comparison. We test all methods on six datasets, including four face datasets: Extended Yale B [27], CMU
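A minimal sketch of this selection procedure, assuming a generic `fit`/`score` interface as placeholders for the actual GDLSR solver (which is not reproduced in this excerpt):

```python
import numpy as np
from sklearn.model_selection import KFold

GRID = [0.001, 0.01, 0.03, 0.05, 0.08, 0.1, 1]

def select_params(X, y, fit, score, n_splits=5):
    """Standard 5-fold cross validation over (lam1, lam2).
    X: (n, d), one sample per row; fit(X_tr, y_tr, lam1, lam2) returns a
    trained model; score(model, X_va, y_va) returns validation accuracy."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    best, best_acc = None, -np.inf
    for lam1 in GRID:
        for lam2 in GRID:
            accs = [score(fit(X[tr], y[tr], lam1, lam2), X[va], y[va])
                    for tr, va in kf.split(X)]
            if np.mean(accs) > best_acc:
                best, best_acc = (lam1, lam2), float(np.mean(accs))
    return best
```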

Conclusion

We proposed the group discriminative least square regression (GDLSR) training model for multicategory classification. The novelty is the new regularization term, which can effectively ensure good discriminative capability and stability of the classification. In addition, the ε-dragging technique of DLSR is used to strengthen the generalization capability. To solve the proposed model, we presented an iterative algorithm based on ADMM. We proved weak convergence of the proposed algorithm, which

Acknowledgements

The authors would like to thank the editors and the anonymous reviewers for their constructive comments and suggestions. This work was supported by the National Natural Science Foundation of China (Grant Nos. 61972264, 61472303, and 61772389) and the Natural Science Foundation of Guangdong Province (Grant No. 2019A1515010894).

References (32)

  • H. Xue et al., Discriminatively regularized least squares classification, Pattern Recogn. (2009)
  • J. Wen et al., Inter-class sparsity based discriminative least square regression, Neural Networks (2018)
  • D. Ruppert et al., An effective bandwidth selector for local least squares regression, J. Amer. Statist. Assoc. (1995)
  • T. Strutz, Data fitting and uncertainty: a practical introduction to weighted least squares and beyond (2010)
  • H. Abdi, Partial least squares regression and projection on latent structure regression (PLS regression), Wiley Interdiscip. Rev. Comput. Stat. (2010)
  • C.C. Chang and C. Lin, LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and...
  • J. Wright et al., Robust face recognition via sparse representation, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • I. Naseem et al., Linear regression for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • Z. Lai et al., Human gait recognition via sparse discriminant projection learning, IEEE Trans. Circuits Syst. Video Technol. (2014)
  • C. Cai et al., On the equivalent of low-rank linear regressions and linear discriminant analysis based regressions
  • S.M. Xiang et al., Discriminative least squares regression for multiclass classification and feature selection, IEEE Trans. Neural Netw. Learn. Syst. (2012)
  • X.Y. Zhang et al., Retargeted least squares regression algorithm, IEEE Trans. Neural Netw. Learn. Syst. (2015)
  • X. Cai et al., Exact top-k feature selection via $\ell_{2,0}$-norm constraint
  • F. Bunea, Y. She, and M.H. Wegkamp, Optimal selection of reduced rank estimators of high-dimensional matrices, Ann....
  • S. Xiang, Y. Zhu, X. Shen, and J. Ye, Optimal exact least squares rank minimization, in Proceedings of the ACM SIGKDD...
  • X. Fang et al., Regularized label relaxation linear regression, IEEE Trans. Neural Netw. Learn. Syst. (2017)
Chunyu Yang received her B.S. degree from Henan Normal University, Xinxiang, China, in 2015, and her M.S. degree from the School of Mathematics and Statistics, Xidian University, in 2018. She is currently pursuing her Ph.D. degree at the School of Mathematics and Statistics, Xidian University. Her research interests include pattern recognition, subspace clustering, and semi-supervised learning.

Weiwei Wang received the B.S., M.S. and Ph.D. degrees from Xidian University, Xi'an, China, in 1993, 1998 and 2001, respectively. She is currently a Professor with the School of Mathematics and Statistics, Xidian University. Her research interests include supervised/unsupervised learning, deep learning, sparse representation, low-rank representation and their applications in image processing.

Xiangchu Feng received the B.S. degree in Computational Mathematics from Xi'an JiaoTong University, Xi'an, China, in 1984, and the M.S. and Ph.D. degrees in Applied Mathematics from Xidian University, Xi'an, in 1989 and 1999, respectively. He is currently a Professor with the School of Mathematics and Statistics, Xidian University. His research interests include numerical analysis, wavelets, and partial differential equations for image processing.

Ruiqiang He received the M.S. degree from Xi'an University of Architecture and Technology, Xi'an, China, in 2009. Currently he is pursuing his Ph.D. degree at the School of Mathematics and Statistics, Xidian University, Xi'an. His research interests include inverse problems in image processing, mathematical models and algorithms for image processing.
