Signal Processing, Volume 120, March 2016, Pages 620-626
Fast communication
Discriminative separable nonnegative matrix factorization by structured sparse regularization

https://doi.org/10.1016/j.sigpro.2015.10.021

Highlights

  • Propose a discriminative separable non-negative matrix factorization (DS-NMF) model.

  • Derive an efficient first-order algorithm to learn DS-NMF.

  • Apply DS-NMF to face and scene image classification.

Abstract

Non-negative matrix factorization (NMF) is one of the most important models for learning compact representations of high-dimensional data. Under the separability condition, separable NMF further enjoys a globally optimal solution. However, separable NMF is unable to make use of data label information and is thus unfavourable for supervised learning problems. In this paper, we propose discriminative separable NMF (DS-NMF), which extends separable NMF by encoding data label information into data representations. Assuming that each conical basis vector under the separability condition contributes to representing data from only a few classes, DS-NMF exploits a structured sparse regularization to learn a sparse data representation and provides higher discrimination power than standard separable NMF. Empirical evaluations on face recognition and scene classification problems confirm the effectiveness of DS-NMF and its superiority to separable NMF.

Introduction

Given a data matrix $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}_+^{p \times n}$ containing $n$ nonnegative examples from a $p$-dimensional vector space, non-negative matrix factorization (NMF) [22], [23], [21] finds a pair of nonnegative matrices $A \in \mathbb{R}_+^{r \times n}$ and $B \in \mathbb{R}_+^{p \times r}$ such that
$$X \approx BA. \tag{1}$$
The columns of $B$ constitute a basis for the representation of $X$, while the columns of $A$ store the coefficients of each data example under this basis. In general, the column size $r$ of $B$, i.e., the rank of the basis, is much smaller than the original data dimensionality $p$. Therefore, NMF leads to compact data representations and data compression. In addition, the non-negativity of $A$ and $B$ generally gives rise to more natural and interpretable data representations than other matrix factorization methods [17], [9], which makes NMF a favourable model for a wide range of applications, including text topic modelling, signal separation, social network analysis, collaborative filtering, dimension reduction, sparse coding, and feature selection.

Different metrics can be used to measure the approximation residual between $X$ and $BA$, such as matrix norms or information-theoretic quantities (e.g., divergences), depending on the data properties one intends to model. In this paper, we use the Frobenius matrix norm to measure the approximation residual, i.e., we optimize $A$ and $B$ by
$$\min_{A \in \mathbb{R}_+^{r \times n},\, B \in \mathbb{R}_+^{p \times r}} \|X - BA\|_F^2. \tag{2}$$
However, it is worth emphasizing that the results of this study are readily extendable to NMFs with other metrics.
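
As a concrete illustration of the model (1)-(2), the following minimal sketch fits an NMF with a Frobenius-norm residual using scikit-learn. The matrix sizes and variable names are illustrative assumptions, and this generic library call is not the algorithm studied in this paper; note also that other beta-losses can be swapped in, mirroring the remark above about other metrics.

```python
# A minimal sketch of the NMF model (1)-(2) via scikit-learn; sizes are
# illustrative assumptions, not values from the paper.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
p, n, r = 100, 500, 10                    # data dimension, sample count, rank
X = rng.random((p, n))                    # nonnegative data matrix

# beta_loss='frobenius' matches the residual in (2); scikit-learn also
# supports 'kullback-leibler' and 'itakura-saito' divergences.
model = NMF(n_components=r, init='nndsvda', beta_loss='frobenius',
            solver='mu', max_iter=500, random_state=0)
B = model.fit_transform(X)                # basis, shape (p, r)
A = model.components_                     # coefficients, shape (r, n)

residual = np.linalg.norm(X - B @ A, 'fro') ** 2
print(f'||X - BA||_F^2 = {residual:.4f}')
```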

Although NMF provides a favourable approach for finding compact data representations, computing the globally optimal solution of (2), or of its variants with other residual metrics, is intractable. Most practical NMF algorithms [24], [18], [20], [8], [16] solve (2) to a local optimum by alternating minimization over $A$ and $B$ using different heuristics. It has been proved that NMF is NP-hard in general [31]. In addition, the non-convexity of NMF makes it difficult to find a unique and globally optimal factorization. To overcome these drawbacks, additional assumptions on the data matrix can be used to transform the original NMF into more amenable problems. In [9], the authors proposed the separability assumption on the data matrix $X$ and showed that it ensures NMF has a unique factorization. Separability assumes that there exists a subset of columns $X_I$ of $X$ such that the remaining columns can be represented by nonnegative combinations of $X_I$. NMF with the separability assumption therefore reduces to finding the index set $I$ and the coefficients representing $X$ under $X_I$.

Definition 1.1 Separability

A data matrix $X$ is called separable if there is an index set $I \subseteq [n]$, $|I| = k$, and a permutation matrix $\Pi$ such that
$$X\Pi = X_I A, \quad \text{with } A = [I_k, A'], \tag{3}$$
where $I_k$ is the identity matrix of size $k$ and $A' \in \mathbb{R}_+^{k \times (n-k)}$.
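
To make the definition concrete, the following toy sketch constructs a separable matrix exactly as in Definition 1.1 and verifies the identity $X\Pi = X_I[I_k, A']$. All sizes, the mixing weights, and the permutation are assumptions for illustration only.

```python
# A toy construction of a separable matrix per Definition 1.1.
import numpy as np

rng = np.random.default_rng(1)
p, n, k = 8, 12, 4
X_I = rng.random((p, k))                       # the k "generator" columns
A_prime = rng.random((k, n - k))               # nonnegative mixing weights
A = np.hstack([np.eye(k), A_prime])            # A = [I_k, A']

X_ordered = X_I @ A                            # columns: generators first
perm = rng.permutation(n)                      # hide the generators
X = X_ordered[:, perm]                         # observed (permuted) data

# Verify separability: X @ Pi recovers the ordered layout X_I @ A,
# where Pi is the permutation matrix with Pi[m, perm[m]] = 1.
Pi = np.zeros((n, n))
Pi[np.arange(n), perm] = 1.0
assert np.allclose(X @ Pi, X_I @ A)
```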

Geometrically, the separability of $X$ can be interpreted as follows: the data examples in $X_I$ generate a convex cone $\mathrm{cone}(X_I)$, and the other data examples in $X$ are located within $\mathrm{cone}(X_I)$. Since a finitely generated convex cone has a unique set of extreme rays among its generators, NMF under the separability assumption is unique. Further, if we allow approximate data representation or noise contamination, separable NMF can be formulated as
$$\begin{aligned} \min_{I, \Pi, A} \quad & \|X\Pi - X_I A\|_F^2 \\ \text{subject to} \quad & I \subseteq [n],\ |I| = k, \\ & \Pi \text{ is a permutation matrix}, \\ & A = [I_k, A'],\ A' \in \mathbb{R}_+^{k \times (n-k)}. \end{aligned} \tag{4}$$
In addition, under the mild regularity condition that $X_I$ cannot be represented by combinations of the remaining examples in $X$, the permutation matrix $\Pi$ can be eliminated by using the following equivalent form of (4):
$$\min_{I, A} \|X - X_I A\|_F^2 \quad \text{subject to} \quad I \subseteq [n],\ |I| = k,\ A \in \mathbb{R}_+^{k \times n}. \tag{5}$$
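
One useful consequence of (5): once the index set $I$ is fixed, the inner minimization decouples into one nonnegative least-squares problem per column of $X$. The sketch below illustrates this with SciPy's `nnls`; the data and the choice of $I$ are assumptions for illustration, not the output of any of the cited selection algorithms.

```python
# With I fixed, problem (5) decouples over the columns of X: each column
# x_j is fit by nonnegative least squares against X_I.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
p, n, k = 8, 12, 4
X = rng.random((p, n))
I = [0, 3, 7, 9]                      # assumed known index set, |I| = k
X_I = X[:, I]

A = np.zeros((k, n))
for j in range(n):
    A[:, j], _ = nnls(X_I, X[:, j])   # min_{a >= 0} ||X_I a - x_j||_2

residual = np.linalg.norm(X - X_I @ A, 'fro') ** 2
```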

Several algorithms have been developed to solve the separable NMF problems (4) and (5) [3], [5], [12], [14]; they are commonly motivated by the geometric interpretation of the separability of $X$. Specifically, these algorithms apply linear programming (LP) to detect the extreme rays, or generators, $X_I$ of the convex cone $\mathrm{cone}(X_I)$ and to find the combination weights of the remaining examples in $X$. Very efficient algorithms for separable NMF have also been proposed based on recursive projections [13] and randomized methods [33]. In addition, by using the idea of group sparsity, the separable NMF problem (5) can be relaxed into
$$\min_{W \geq 0} \|X - XW\|_F^2 + \varrho \sum_{i=1}^{n} \|W(i,:)\|_2, \tag{6}$$
from which the index set $I$ can be recovered as the nonzero rows of the optimal $W$, and the coefficient matrix $A$ can be obtained as $A = W(I,:)$, i.e., the nonzero rows. Such a formulation has been used for unmixing hyperspectral images in a blind and fully constrained manner [1].
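
The relaxation (6) is convex and can be solved, for example, by a projected proximal-gradient iteration: a gradient step on the Frobenius term, a clip to the nonnegative orthant, then row-wise group soft-thresholding. The following is a minimal sketch of that generic scheme under assumed penalty weight and iteration budget; it is not the solver used in [1].

```python
# A minimal projected proximal-gradient sketch for the relaxation (6).
import numpy as np

def separable_nmf_group_sparse(X, rho=1.0, n_iter=500):
    n = X.shape[1]
    W = np.zeros((n, n))
    L = 2.0 * np.linalg.norm(X, 2) ** 2         # Lipschitz const. of the gradient
    eta = 1.0 / L
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (X - X @ W)         # gradient of ||X - XW||_F^2
        V = np.maximum(W - eta * grad, 0.0)     # gradient step + nonneg. clip
        # Row-wise group soft-thresholding: shrink each row's l2 norm by eta*rho.
        norms = np.linalg.norm(V, axis=1, keepdims=True)
        scale = np.maximum(1.0 - eta * rho / np.maximum(norms, 1e-12), 0.0)
        W = V * scale
    return W

rng = np.random.default_rng(3)
X = rng.random((8, 12))
W = separable_nmf_group_sparse(X, rho=0.5)
I = np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-8)   # recovered index set
A = W[I, :]                                            # coefficients A = W(I,:)
```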

Throughout this paper, we use the following notation. An uppercase letter such as $A$ denotes a matrix. $A_I$ or $A(:, I)$ denotes a sub-matrix of $A$ composed of the columns of $A$ indexed by the index set $I$. $A(i,:)$ denotes the $i$-th row of $A$. A lowercase letter such as $a$ denotes a vector or a scalar. $a(I)$ denotes a sub-vector of $a$ indexed by $I$. $[n]$ denotes the set $\{1, 2, \ldots, n\}$. $\|A\|_F$ denotes the Frobenius norm of matrix $A$. $\|a\|_2$ denotes the $\ell_2$ norm of vector $a$. $\nabla L(\cdot)$ denotes the gradient of the loss function $L(\cdot)$. $I_1 \cap I_2$ denotes the intersection of index sets $I_1$ and $I_2$. $\bar{I}$ denotes the complement of index set $I$ with respect to $[n]$.

Section snippets

Discriminative separable nonnegative matrix factorization

In separable NMF (5), the coefficient matrix $A$ provides a compact representation of the original high-dimensional examples in the data matrix $X$. However, such a representation does not encode any discriminative information, even when the labels of the data examples are known. To address this limitation of separable NMF in the supervised setting, we propose to exploit structured sparse regularization to construct a discriminative separable NMF (DS-NMF), so that the obtained low-dimensional representation $A$…
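
The DS-NMF objective itself is cut off in this snippet. Based only on the abstract's description (each basis vector should serve few classes), one plausible form of the structured sparse penalty groups the entries of each row of $A$ by class label; the sketch below computes such a class-grouped $\ell_2$ penalty and is a speculative illustration, not necessarily the paper's actual regularizer.

```python
# Speculative illustration of a label-driven structured sparse penalty:
# group each row of A by class and sum the group l2 norms.
import numpy as np

def class_grouped_penalty(A, labels):
    """Sum over rows i and classes c of ||A(i, samples of class c)||_2."""
    total = 0.0
    for c in np.unique(labels):
        cols = np.flatnonzero(labels == c)
        total += np.linalg.norm(A[:, cols], axis=1).sum()
    return total

A = np.abs(np.random.default_rng(4).normal(size=(5, 12)))
labels = np.repeat([0, 1, 2], 4)         # toy labels, 3 classes x 4 samples
print(class_grouped_penalty(A, labels))
```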

On face recognition

We apply DS-NMF to face recognition. Four face image datasets, including Yale [4], ORL [29], UMIST [15], and FERET (a subset with 50 subjects) [28], are used in the evaluation. These datasets contain a variety of face images of individuals with varying poses, facial expressions, and ages, captured under different lighting conditions. On the FERET, ORL, and Yale datasets, we randomly select 80% of the data for training and use the remaining 20% for testing, while on the UMIST dataset we randomly select 40% of the data for…
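
The evaluation protocol described above (random split, then classification on the learned representations) can be sketched as follows. LIBSVM appears in the reference list, and scikit-learn's `SVC` (which wraps LIBSVM) stands in for it here; the feature matrix, labels, and SVM settings are placeholders rather than the paper's actual datasets and configuration.

```python
# Sketch of the described protocol: random 80/20 split, then an SVM on the
# learned representations. `A_features` and `y` are placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(5)
A_features = rng.random((200, 30))        # stand-in for learned coefficients A^T
y = rng.integers(0, 10, size=200)         # stand-in class labels

X_tr, X_te, y_tr, y_te = train_test_split(
    A_features, y, test_size=0.2, random_state=0)   # 80/20 as on FERET/ORL/Yale

clf = SVC(kernel='linear').fit(X_tr, y_tr)           # SVC wraps LIBSVM
print('test accuracy:', clf.score(X_te, y_te))
```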

Conclusion

In this paper, we have proposed a new model, the discriminative separable non-negative matrix factorization (DS-NMF), for learning discriminative features of non-negative data. DS-NMF improves on NMF, and especially on separable NMF, by encoding discrimination power into the feature learning. In particular, a structured sparsity regularization is exploited in DS-NMF to make the learning tractable. Empirical evaluations on face recognition and scene classification show that…

Acknowledgements

The authors would like to thank the anonymous reviewers and editors for their comments and suggestions. This research is supported by the NSFC of China (Nos. 51379121 and 61304230) and the Shanghai Key Technology Plan Project (Nos. 12510501800 and 13510501600).

References (33)

  • P.-Y. Chen et al.

    Translation-invariant shrinkage/thresholding of group sparse signals

    Signal Process.

    (2014)
  • Z. Gao et al.

    Multi-view discriminative and structured dictionary learning with group sparsity for human action recognition

    Signal Process.

    (2015)
  • R. Ammanouil et al.

    Blind and fully constrained unmixing of hyperspectral images

    IEEE Trans. Image Process.

    (2014)
  • A. Argyriou et al.

    Convex multi-task feature learning

    Mach. Learn. J.

    (2008)
  • S. Arora, R. Ge, R. Kannan, A. Moitra, Computing a nonnegative matrix factorization—provably, in: Symposium on Theory...
  • P.N. Belhumeur et al.

    Eigenfaces vs. fisherfaces: recognition using class specific linear projection

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1997)
  • V. Bittorf, B. Recht, C. Ré, J.A. Tropp, Factoring nonnegative matrices with linear programs, in: Advances in Neural...
  • C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intell. Syst. Technol. 2 (2011)...
  • C.H.Q. Ding et al.

    Convex and semi-nonnegative matrix factorizations

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2008)
  • D. Donoho, V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts? in:...
  • R.A. Fisher

    The use of multiple measurements in taxonomic problems

    Ann. Eugen.

    (1936)
  • N. Gillis et al.

    Robust near-separable nonnegative matrix factorization using linear optimization

    J. Mach. Learn. Res.

    (2014)
  • N. Gillis et al.

    Fast and robust recursive algorithms for separable nonnegative matrix factorization

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2014)
  • N. Gillis, S.A. Vavasis, Semidefinite programming based preconditioning for more robust near-separable nonnegative...
  • D.B. Graham, N.M. Allinson, Characterizing virtual eigensignatures for general purpose face recognition, in: H....
  • N. Guan, D. Tao, Z. Luo, J. Shawe-Taylor, MahNMF: Manhattan non-negative matrix factorization, 2012, CoRR...