Pattern Recognition

Volume 100, April 2020, 107118

Noise-robust dictionary learning with slack block-diagonal structure for face recognition

https://doi.org/10.1016/j.patcog.2019.107118

Highlights

  • We propose a slack block-diagonal (SBD) structure for representation in which the target structure matrix is dynamically updated instead of being fixed to the strict ‘0-1’ structure.

  • In order to depict the noise in face images more precisely, we also propose a robust dictionary learning algorithm based on a mixed-noise model that utilizes the above SBD structure (SBD2L).

  • Moreover, we add a low-rank constraint on the representation matrix to enhance the dictionary’s robustness to noise.

Abstract

The strict ‘0-1’ block-diagonal structure has been widely used for learning structured representations in face recognition problems. However, it is questionable and unreasonable to assume that the within-class representations are identical. To circumvent this problem, in this paper we propose a slack block-diagonal (SBD) structure for representation in which the target structure matrix is dynamically updated, yet its block-diagonal nature is preserved. Furthermore, in order to depict the noise in face images more precisely, we propose a robust dictionary learning algorithm based on a mixed-noise model that utilizes the above SBD structure (SBD2L). SBD2L considers that there exist two forms of noise in the data, drawn from Laplacian and Gaussian distributions, respectively. Moreover, SBD2L introduces a low-rank constraint on the representation matrix to enhance the dictionary’s robustness to noise. Extensive experiments on four benchmark databases show that the proposed SBD2L achieves better classification results than several state-of-the-art dictionary learning methods.

Introduction

Face recognition is one of the most popular topics in the field of biometrics thanks to its intuitiveness and unique advantages. Over the past few years, research on sparse and low-rank representation has attracted a great deal of attention because of its promising performance [1], [2], [3], [4], [5], [6]. Among these works, Du et al. [1] were the first to optimize sparsity and feature statistics simultaneously, formulating a hybrid sparsity- and statistics-based detector for high-dimensional hyperspectral image data. For face image recognition, the classical sparse representation based classification (SRC) [2] algorithm aims to find a sparse representation (with only a few non-zero elements) of a query sample over an over-complete dictionary. Based on the observation that the collaborative mechanism tends to play a more important role than sparsity in representing a sample, Zhang et al. [7] proposed the collaborative representation based classifier (CRC), which replaces the l1 norm in SRC with the l2 norm. Besides, linear regression based classification (LRC) [8] uses class-specific training samples to represent a test sample and then assigns it to the class yielding the minimum residual. In addition, Wang et al. [9] proposed a locality-constrained linear coding (LLC) algorithm, which holds that data locality can always promote the sparsity of the representation. However, sparsity-induced algorithms may be unable to capture the global structure of data because they learn the sparsest representation sample-wise, ignoring the relevance between samples. Fortunately, the theory of low-rank representation has been studied to solve this problem [5], [6], [10]. Low-rank based algorithms jointly explore the underlying structures and correlations between all samples in order to generate a representation that preserves the global structure of the data as much as possible. Zhang et al. [11] proposed a low-rank matrix factorization technique that simultaneously performs dimensionality reduction and data clustering for hyperspectral images. Robust principal component analysis (RPCA) [12] is the most classical matrix recovery method, introducing a low-rank constraint on the clean components. Wei et al. [13] introduced a structural incoherence constraint into RPCA and presented a method called low-rank matrix recovery with structural incoherence (LRSI). Based on LRSI, Yin et al. [14] presented a new method that corrects corrupted test images with a low-rank projection matrix. Liu et al. [5] proposed a low-rank representation based matrix recovery algorithm (LRR) that seeks a low-rank representation in terms of a given dictionary so that the noise can be separated from the original samples. LRR assumes the data lie in multiple subspaces rather than only one, which makes it more robust than RPCA in handling corrupted samples. Latent LRR (LatLRR) [6] is a feature extraction algorithm based on LRR which supposes that the observed samples can be represented by some underlying hidden samples.

It is worth noting that all the above algorithms use the original samples as the dictionary. Although their application to real-world face recognition tasks has achieved impressive results, collected face images often suffer from various contaminations in practical scenarios, e.g., gross occlusion, illumination changes and disguise, which disturb the process of data reconstruction. If we directly use the raw training samples as the dictionary, the classification performance may be degraded because the class structure of the subspace is destroyed by the noise. However, selecting only clean samples as the dictionary and strictly neglecting corrupted samples will also yield poor results, since corrupted samples may still contain useful discriminative information. Therefore, it is preferable to learn a compact, clean and discriminative dictionary from the original contaminated samples. Depending on the available label information, existing dictionary learning algorithms can be divided into two types: supervised and unsupervised methods.

KSVD [15] and the method of optimal directions (MOD) [16] are the classical unsupervised dictionary learning algorithms, in which the noise is assumed to be drawn from a Gaussian distribution. Moreover, Chen et al. [17] proposed a mixed-noise (Laplacian and Gaussian distribution) based dictionary learning algorithm. To address small sample size problems, Xu et al. [18] proposed a sample-diversity and representation-effectiveness based dictionary learning (SDRERDL) algorithm that exploits data augmentation by mirroring the original samples. More recently, Zhou et al. [19] proposed a double-dictionary learning algorithm in which two different dictionaries are learned to separate the original data into different subspaces.

For supervised dictionary learning algorithms, the label information of the original training samples is embedded during the learning procedure to capture more discriminative structure. Yang et al. [20] used the Fisher discrimination criterion to learn a Fisher discrimination dictionary (FDDL), in which the representations have both minimum within-class scatter and maximum inter-class scatter. FDDL is a classic supervised dictionary learning algorithm. Based on the KSVD model, Zhang et al. [21] proposed a discriminative KSVD (D-KSVD) algorithm that improves the discriminative ability of the learned dictionary by incorporating a classification error term into the objective function. Furthermore, Jiang et al. [22] proposed a label-consistent K-SVD (LC-KSVD) algorithm in which within-class samples are encouraged to have similar representations by introducing a sparse binary label matrix. Based on the assumption that samples from different classes may share certain common features, Wang et al. [23] proposed a classification-oriented dictionary learning model (COPAR) that exploits the particularity and commonality of information across all classes. Besides, DLSI [24], JDL [25] and CSDL [26] were also proposed to explore a shared dictionary. However, [23], [24], [25], [26] overlook an important point, i.e., that the shared dictionary should be low-rank. Inspired by this, Vu et al. [27] proposed a low-rank shared dictionary learning (LRSDL) framework, introducing a low-rank constraint on the shared dictionary to encourage its subspace to be low-dimensional and its corresponding representations to be similar. In addition to adding a low-rank constraint to the dictionary, there also exist a number of low-rank representation based dictionary learning algorithms [28], [29], [30]. Thanks to the self-expressiveness property, the obtained data representations should be block-diagonal for discriminability [31]. Building on the LRR model, structured LRR (SLRR) [28] constructs an ideal ‘0-1’ block-diagonal matrix to force the learned low-rank representation matrix to be block-diagonal. Gao et al. [30] proposed a robust and discriminative low-rank representation based dictionary learning (RDLRR) algorithm that exploits the low-rank property of both the representations and contiguous errors.

However, the ideal ‘0-1’ block-diagonal structure used in SLRR [28] and RDLRR [30] is unrealistic in practice, because the within-class representation coefficients over a dictionary are not identical. If we force all the within-class coefficients to equal ‘1’, the representations may lose useful structural information that is beneficial for classification. Recently, Zhang et al. [31] proposed a discriminative block-diagonal low-rank representation (BDLRR) algorithm that restrains the energy of off-block-diagonal elements to strengthen the contribution of block-diagonal elements. In BDLRR, the learned representations are block-diagonal but do not follow the strict ‘0-1’ structure. However, BDLRR takes no account of the correlations among within-class representations [32]. Moreover, the above low-rank representation based dictionary learning algorithms characterize the noise using a single distribution assumption, which is not robust to mixed noise. To address these problems, in this paper we first propose a slack block-diagonal (SBD) structure obtained by adding a row-sparse slack term to the ideal ‘0-1’ structure matrix. As a result, the new target matrix of the representations is dynamic, yet remains close to the block-diagonal structure. From [28], [29], [30], [31] we know that imposing a low-rank constraint on the coding coefficients captures the whole structure of the data and makes the learned dictionary more robust to noise. Thus, by integrating the mixed-noise based learning model [17] with the low-rank nature of the representations, we develop a novel noise-robust dictionary learning algorithm with slack block-diagonal structure (SBD2L). The main contributions of this paper can be summarized as follows:

  • (1)

    A low-rank representation based noise-robust dictionary learning model is proposed. In addition to learning low-rank representations for the dictionary, we use the l1 and l2 norms to describe noise drawn from a Laplacian and a Gaussian distribution, respectively. This model is more robust to complicated noise contaminations than models based on a single distribution.

  • (2)

    Based on the above mixed-noise learning model, we propose a slack block-diagonal (SBD) structure for the representations. The SBD structure is more flexible than the strict binary block-diagonal structure; as a result, it avoids the loss of structural information and contributes to learning a more discriminative dictionary (see the illustrative sketch after this list).

  • (3)

    We develop an effective optimization algorithm based on the alternating direction method of multipliers (ADMM) to solve the optimization problem of the proposed model. Its convergence is validated experimentally.
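
To make the slack structure concrete, the following minimal Python sketch (our illustration, not the authors' code; the function names and the threshold parameter tau are hypothetical) builds the strict ‘0-1’ block-diagonal target matrix and then relaxes it with a row-sparse slack term obtained by row-wise soft-thresholding:

import numpy as np

def build_block_diagonal_target(atoms_per_class, samples_per_class):
    # Strict '0-1' target Q: entry (k, j) is 1 iff atom k and sample j
    # belong to the same class (the ideal structure used in SLRR/RDLRR).
    K, n = sum(atoms_per_class), sum(samples_per_class)
    Q = np.zeros((K, n))
    r = c = 0
    for Ki, ni in zip(atoms_per_class, samples_per_class):
        Q[r:r + Ki, c:c + ni] = 1.0
        r, c = r + Ki, c + ni
    return Q

def slack_target(Q, A, tau):
    # Hypothetical slack update: move the target toward the current
    # representations A through a row-sparse slack term S (row-wise
    # shrinkage, i.e. the proximal operator of the l2,1 norm), so that
    # Q + S stays close to block-diagonal but is no longer binary.
    R = A - Q
    row_norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(row_norms, 1e-12), 0.0)
    return Q + scale * R

For instance, build_block_diagonal_target([2, 2, 2], [3, 3, 3]) yields a 6×9 target made of three 2×3 all-ones blocks; slack_target then perturbs it only in the rows where the learned codes deviate strongly from the ideal pattern.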

The remainder of the paper is organized as follows: Section 2 introduces the low-rank representation based noise-robust dictionary learning algorithm for removing mixed noise. Section 3 presents our SBD2L algorithm, developed by introducing the slack block-diagonal structure. In Section 4, we discuss the classification method. In Section 5, we validate our SBD2L approach in extensive experiments and compare it with state-of-the-art methods on four benchmark face databases. Finally, Section 6 concludes the paper.

Section snippets

Noise-robust dictionary learning by embedding low-rank characteristics

Let $X=[X_1,X_2,\ldots,X_c]\in\mathbb{R}^{d\times n}$ denote $n$ training samples with dimensionality $d$ drawn from $c$ classes. Each column of $X$, i.e., $x_i,\ i\in\{1,2,\ldots,n\}$, denotes a sample vector. $X_i\in\mathbb{R}^{d\times n_i}\ (i=1,2,\ldots,c)$ is the matrix of samples of the $i$th class and $\sum_{i=1}^{c}n_i=n$. Dictionary learning (DL) methods aim to learn a compact and discriminative dictionary $D=[D_1,D_2,\ldots,D_c]\in\mathbb{R}^{d\times K}$ from the original samples $X$ that is robust to the various types of noise in face images. $D_i\in\mathbb{R}^{d\times K_i}\ (i=1,2,\ldots,c)$ is the sub-dictionary of the $i$th class.
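
As a reference point for this section, a plausible form of such a mixed-noise, low-rank dictionary learning objective, reconstructed from the abstract's description (the paper's exact formulation, regularization weights and additional terms may differ), is

$$\min_{D,A,E_L,E_G}\ \|A\|_* + \lambda_1\|E_L\|_1 + \lambda_2\|E_G\|_F^2 \quad \text{s.t.}\quad X = DA + E_L + E_G,$$

where the nuclear norm $\|A\|_*$ encourages low-rank representations, the $\ell_1$ term models the Laplacian (sparse) noise component $E_L$, and the Frobenius term models the Gaussian noise component $E_G$.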

Noise-robust dictionary learning with slack block-diagonal structure

In this section, we first present our novel slack block-diagonal (SBD) structure for noise-robust dictionary learning (SBD2L). Then, we introduce the optimization method for the proposed SBD2L. Finally, we give a brief complexity analysis of the optimization method.
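
The full ADMM derivation appears in the paper; here we only illustrate one standard ingredient that any ADMM solver for a nuclear-norm-regularized subproblem relies on: singular value thresholding (SVT), the closed-form proximal operator of the nuclear norm. A minimal Python sketch, assuming the subproblem $\min_Z\ \tau\|Z\|_* + \frac{1}{2}\|Z-M\|_F^2$:

import numpy as np

def svt(M, tau):
    # Singular value thresholding: soft-threshold the singular values of M.
    # This is the exact minimizer of tau*||Z||_* + 0.5*||Z - M||_F^2.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

Within an ADMM iteration, M would collect the current estimates of the other variables plus the scaled dual variable, so the low-rank representation update reduces to a single call to svt.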

Classification

By running Algorithm 1, the mixed noise or corruptions in the training samples can be eliminated during the structured dictionary learning process, yielding an optimized dictionary $\hat{D}$ and a discriminative representation $\hat{A}$. For efficient classification, it is feasible to use the representation $\hat{A}$ and the labels of the training samples to learn a linear classifier $W$ [28]: $$\hat{W}=\arg\min_{W}\ \|H-W\hat{A}\|_F^2+\gamma\|W\|_F^2,$$ where $H=[h_1,h_2,\ldots,h_n]\in\mathbb{R}^{c\times n}$ is the binary label matrix of the training samples, whose column $h_j\ (j=1,2,\ldots,n)$ indicates the class of the $j$th training sample.
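
The problem above has the standard ridge-regression closed form $\hat{W}=H\hat{A}^{\top}(\hat{A}\hat{A}^{\top}+\gamma I)^{-1}$. A minimal Python sketch (our illustration; variable names are ours) trains the classifier and assigns a test representation to the class with the largest score:

import numpy as np

def train_linear_classifier(H, A_hat, gamma):
    # Closed-form solution of min_W ||H - W*A||_F^2 + gamma*||W||_F^2,
    # with H (c x n) the binary label matrix and A_hat (K x n) the codes.
    K = A_hat.shape[0]
    return H @ A_hat.T @ np.linalg.inv(A_hat @ A_hat.T + gamma * np.eye(K))

def classify(W, a):
    # Label of a test sample = index of the largest entry of its score W @ a.
    return int(np.argmax(W @ a))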

Experiments

In this section, we conduct experiments on four benchmark face databases, i.e., the AR [43], Extended Yale B [44], CMU PIE [45] and Labeled Faces in the Wild (LFW) [46] databases, to demonstrate the effectiveness of the proposed SBD2L algorithm. We compare the SBD2L algorithm with several state-of-the-art dictionary learning algorithms: LCKSVD (1 and 2) [22], FDDL [20], LCLE [47], SDRERDL [18], and several representation based classification algorithms: SRC [2], LRC [8], LLC [9]. For LCKSVD, SDRERDL,

Conclusion

This paper proposed a robust dictionary learning algorithm based on a mixed-noise model utilizing a slack block-diagonal structure (SBD2L). Specifically, the mixed-noise model is introduced into the dictionary learning procedure, which assumes that face images are simultaneously subject to Laplacian and Gaussian noise. The core innovation of the SBD2L algorithm is the use of a slack block-diagonal structure for the representations to alleviate the loss of structural information

Declaration of Competing Interest

The authors declare that they have no conflict of interest with this work.

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.


References (50)

  • B. Du et al.

    Beyond the sparsity-based target detector: a hybrid sparsity and statistics-based detector for hyperspectral images

    IEEE Trans. Image Process.

    (2016)
  • J. Wright et al.

    Robust face recognition via sparse representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2008)
  • Y. Zhang et al.

    Joint sparse representation and multitask learning for hyperspectral target detection

    IEEE Trans. Geosci. Remote Sens.

    (2016)
  • G. Liu et al.

    Robust recovery of subspace structures by low-rank representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • G. Liu et al.

    Latent low-rank representation for subspace segmentation and feature extraction

    Proceedings of the International Conference on Computer Vision

    (2011)
  • L. Zhang et al.

    Sparse representation or collaborative representation: Which helps face recognition?

    Proceedings of the International Conference on Computer Vision

    (2011)
  • I. Naseem et al.

    Linear regression for face recognition

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2010)
  • J. Wang et al.

    Locality-constrained linear coding for image classification

    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

    (2010)
  • Z. Chen et al.

    Robust low-rank recovery with a distance-measure structure for face recognition

    Proceedings of the Pacific Rim International Conference on Artificial Intelligence

    (2018)
  • E.J. Candès et al.

    Robust principal component analysis?

    J. ACM (JACM)

    (2011)
  • C.-P. Wei et al.

    Robust face recognition with structurally incoherent low-rank matrix decomposition

    IEEE Trans. Image Process.

    (2014)
  • H. Yin et al.

    Face recognition based on structural incoherence and low rank projection

    Proceedings of the International Conference on Intelligent Data Engineering and Automated Learning

    (2016)
  • M. Aharon et al.

    K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation

    IEEE Trans. Signal Process.

    (2006)
  • K. Engan et al.

    Frame based signal compression using the method of optimal directions (MOD)

    Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS)

    (1999)
  • Z. Chen et al.

    Robust dictionary learning by error source decomposition

    Proceedings of the IEEE International Conference on Computer Vision

    (2013)

    Zhe Chen received the B.S. degree in computer science and technology from Hefei University, Hefei, China, in 2014, and the M.S. degree from Jiangnan University, Wuxi, China, in 2018. Currently, he is a Ph.D. candidate in the School of IoT Engineering, Jiangnan University, Wuxi, China. His research interests include face recognition, dictionary learning and sparse low-rank representation.

    Xiao-Jun Wu received the B.Sc. degree in mathematics from Nanjing Normal University, Nanjing, China, in 1991, and the M.S. and Ph.D. degrees in pattern recognition and intelligent systems from the Nanjing University of Science and Technology, Nanjing, in 1996 and 2002, respectively. He is currently a Professor of Artificial Intelligence and Pattern Recognition with Jiangnan University, Wuxi, China. His current research interests include pattern recognition, computer vision, fuzzy systems, neural networks, and intelligent systems.

    He-Feng Yin received the B.S. degree from the School of Computer Science and Technology, Xuchang University, Xuchang, China, in 2011. Currently, he is a Ph.D. candidate in the School of IoT Engineering, Jiangnan University, Wuxi, China. His research interests include representation-based classification methods, dictionary learning and low-rank representation.

    Josef Kittler (M’74–LM’12) received the B.A., Ph.D., and D.Sc. degrees from the University of Cambridge, in 1971, 1974, and 1991, respectively. He is currently a Professor of Machine Intelligence with the Centre for Vision, Speech and Signal Processing, Department of Electronic Engineering, University of Surrey, Guildford, U.K. He conducts research on biometrics, video and image database retrieval, medical image analysis, and cognitive vision. He has authored a textbook entitled Pattern Recognition: A Statistical Approach (Englewood Cliffs, NJ, USA: Prentice-Hall, 1982) and over 600 scientific papers. He serves on the Editorial Board of several scientific journals in pattern recognition and computer vision.
