Dependence maximization based label space dimension reduction for multi-label classification

https://doi.org/10.1016/j.engappai.2015.07.023

Abstract

High dimensionality of the label space poses a crucial challenge to efficient multi-label classification, so it is necessary to reduce the dimensionality of the label space. In this paper, we propose a new algorithm, called dependence maximization based label space reduction (DMLR), which maximizes the dependence between feature vectors and code vectors via the Hilbert–Schmidt independence criterion while minimizing the encoding loss of labels. Two different kinds of instance kernel are discussed: the global kernel used in DMLRG exploits global information, whereas the local kernel used in DMLRL exploits locality information. Experimental results over six categorization problems validate the superiority of the proposed algorithm to state-of-the-art label space dimension reduction methods, improving performance at only a small additional time cost.

Introduction

During the last decade, multi-label classification has aroused the interest of researchers from both engineering and academia because of its wide range of real-world applications. In the multi-label setting, a document may be associated with multiple categories (Ji et al., 2010, Ueda and Saito, 2003); an image may be annotated with several concepts (Boutell et al., 2004). This is rather different from traditional single-label (binary or multi-class) classification, where each document is allowed to be associated with only one category.

A lot of algorithms have been proposed for multi-label classification (Zhang and Zhou, 2014). The current consensus is that label correlations play an important role and should be utilized for performance improvement (Dembczyński et al., 2010, Zhang and Zhang, 2010, Zhang and Zhou, 2014). Most algorithms therefore build their classification models on some label-correlation assumption, such as the ensemble of classifier chains (ECC) (Read et al., 2011) and calibrated label ranking (CLR) (Fürnkranz et al., 2008).

Although these algorithms achieve satisfactory results, they suffer from computational inefficiency in both training and testing; this holds even for the most intuitive approach, binary relevance (BR) (Boutell et al., 2004), which decomposes a multi-label classification problem into several independent binary classification problems, one for each label, following the one-versus-all (OVA) strategy (Hastie et al., 2009). This problem poses a rather crucial challenge to classification, especially when there are a large number of possible labels. Therefore, it is necessary to explore ways that balance classification performance and computational effort.
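As a point of reference, the following is a minimal sketch of binary relevance with an off-the-shelf binary learner; the base classifier (logistic regression) and the helper names are illustrative choices of ours, not the ones used in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def br_fit(X, Y):
    """Fit one independent binary classifier per label column of Y (entries in {-1, +1})."""
    models = []
    for l in range(Y.shape[1]):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, Y[:, l])          # assumes each label column contains both classes
        models.append(clf)
    return models

def br_predict(models, X):
    """Stack the per-label predictions into an N x L label matrix."""
    return np.column_stack([m.predict(X) for m in models])
```

Because both training and prediction loop over all L labels, the cost grows linearly with the number of labels, which is exactly what makes LSDR attractive when L is large.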

Several algorithms for label space dimension reduction (LSDR) have been proposed along this avenue. They can be categorized into two groups: learning methods and reduction methods. The former group reduces the label space while jointly learning a classifier from the instances to the code vectors, for example multi-label prediction via compressed sensing (CS) (Hsu et al., 2009); the resulting classifier can then be used directly for prediction. However, in order to obtain a promising classifier, these methods often employ complicated algorithms in the learning part, which is again time-consuming. Therefore, the latter group is the mainstream in this avenue.

The latter group focuses on how to compress the label space efficiently and does not prescribe which learning algorithm to apply after compression. An exemplar is principal label space transformation (PLST) (Tai and Lin, 2012), which reduces the dimensionality of the label space simply by analyzing its principal components. A key problem for this group is how to utilize the instances, which remains an open question. Since the ultimate objective is classification, some methods use only a simple model from instances to code vectors, for instance, conditional PLST (CPLST) (Chen and Lin, 2012). Nevertheless, this strategy might be suboptimal because the simple model may over-fit, which negatively affects the subsequent learning stage.
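To make the reduction-method idea concrete, here is a rough sketch of PLST-style encoding and decoding via an SVD of the mean-shifted label matrix; variable names are ours, and the thresholding assumes labels in {-1, +1}.

```python
import numpy as np

def plst_encode(Y, k):
    """Encode an N x L label matrix into N x k code vectors (k < L)."""
    y_mean = Y.mean(axis=0)                       # shift by the mean label vector
    Z = Y - y_mean
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    Vk = Vt[:k].T                                 # L x k projection onto top-k principal directions
    return Z @ Vk, Vk, y_mean                     # code vectors, projection, shift

def plst_decode(T_pred, Vk, y_mean):
    """Round predicted code vectors back to a {-1, +1} label matrix."""
    Y_hat = T_pred @ Vk.T + y_mean
    return np.where(Y_hat >= 0, 1, -1)
```

A regressor is then trained from the instances to the k-dimensional code vectors, and its outputs are decoded back to label vectors at test time.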

In this paper, we propose a new LSDR method, called dependence maximization based label space reduction (DMLR), which belongs to the latter group of reduction methods. Different from previous reduction methods, it assumes that the objective function should consist of two components: an encoding loss and a dependence loss. The former measures the loss incurred by label compression, while the latter measures the dependence between instances and code vectors. Specifically, the encoding loss is the least-squares loss used in PLST, and the dependence loss is based on the Hilbert–Schmidt independence criterion (HSIC) (Gretton et al., 2005). Two different instance kernels are applied, yielding two methods: DMLRG, whose instance kernel exploits global information, and DMLRL, whose instance kernel exploits local information. Experimental results across six data sets from various application domains show that the two proposed algorithms outperform two state-of-the-art LSDR methods, PLST and CPLST, while saving considerable training and testing time compared with a simple representative multi-label classification method, BR. Moreover, DMLRL outperforms DMLRG in most cases and costs similar or less training time thanks to the sparsity of the instance kernel used in DMLRL.
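For readers unfamiliar with HSIC, the snippet below sketches the biased empirical HSIC estimate (Gretton et al., 2005) between an instance kernel K and a kernel on the code vectors; it illustrates only the dependence term, not the full DMLR optimization, and the choice of a linear kernel on the code matrix is an assumption on our part.

```python
import numpy as np

def hsic(K, T):
    """Biased empirical HSIC between an N x N instance kernel K and an N x k code matrix T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    L = T @ T.T                                   # linear kernel on the code vectors (assumption)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A larger value indicates stronger statistical dependence between the instances and the code vectors, which is the quantity DMLR seeks to maximize alongside minimizing the encoding loss.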

The rest of this paper is organized as follows. Section 2 presents a brief literature review of multi-label classification algorithms, with particular attention to LSDR methods and the HSIC. Section 3 describes the two proposed algorithms, DMLRG and DMLRL, in detail. Experimental results and discussion are given in Section 4. Finally, Section 5 concludes this paper and presents some directions for future work.

Section snippets

Related works

Since this paper focuses on LSDR methods, we present a brief literature review of multi-label classification in Section 2.1 and of existing LSDR methods in Section 2.2. Section 2.3 describes the dependence measurement criterion, HSIC, on which our proposed methods rely. For convenience of presentation, we first give the formulation of multi-label classification.

Let $D=\{(X_i, Y_i)\}_{i=1}^{N}$ be the training set with $N$ examples, where $X_i \in \mathbb{R}^d$ is the $i$th instance (or feature vector) and $Y_i \in \{-1, +1\}^L$ is the corresponding label vector.
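As a toy illustration of this notation (values made up), the instances stack into an $N \times d$ matrix and the label vectors into an $N \times L$ matrix with entries in {-1, +1}:

```python
import numpy as np

X = np.array([[0.2, 1.5, -0.3],
              [1.1, 0.0,  0.7]])                  # N = 2 instances, d = 3 features
Y = np.array([[+1, -1, -1, +1],
              [-1, +1, -1, +1]])                  # N = 2 label vectors, L = 4 labels
```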

DMLR: dependence maximization based label space reduction

In this section, we first present the DMLR method in detail in Section 3.1. Then, in Section 3.2, we discuss how to set the instance kernel K, which plays a key role in DMLR, for different purposes.
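The paper's exact kernel definitions appear in Section 3.2; as a hedged stand-in, the sketch below contrasts a dense global RBF kernel over all instance pairs with a sparse local kernel built from a k-nearest-neighbour graph. Function names and parameter defaults are ours.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.neighbors import kneighbors_graph

def global_instance_kernel(X, gamma=1.0):
    """Dense RBF kernel over all instance pairs (global information)."""
    return rbf_kernel(X, gamma=gamma)

def local_instance_kernel(X, k=10):
    """Sparse kernel keeping only k-nearest-neighbour entries (locality information)."""
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    W = 0.5 * (W + W.T)                           # symmetrize the neighbourhood graph
    return W.toarray()
```

The sparsity of the local kernel is what allows DMLRL to match or beat DMLRG's training time, as noted in the introduction.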

Experiments and discussion

In this section, we conduct experiments to validate the effectiveness of the proposed methods, DMLRG and DMLRL. Section 4.1 gives the experimental settings and the compared methods. Some details on the data sets are given in Section 4.2, and the experimental results and discussion are presented in Section 4.3.

Conclusion and future work

In this paper, we assumed that the objective function in multi-label label space dimension reduction should consist of two components: a compression loss that measures the quality of label compression and a dependence loss that measures the dependence between the instances and the code vectors. Based on this scheme, we proposed dependence maximization based label space reduction (DMLR). It utilizes the compression loss used in PLST and CPLST, and introduces the HSIC as the dependence measure between instances and code vectors.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant nos. 61472305, 61070143, 61303034), the Science and Technology Project of Shaanxi Province, China (Grant no. 2015GY027), and the Fundamental Research Funds for the Central Universities (Grant no. SMC1405).

References (45)

  • Chen, Y.-N., Lin, H.-T., 2012. Feature-aware label space dimension reduction for multi-label classification. In:...
  • Cheng, W., et al., 2009. Combining instance-based learning and logistic regression for multi-label classification. Mach. Learn.
  • Cortes, C., et al., 2012. Algorithms for learning kernels based on centered alignment. J. Mach. Learn. Res.
  • Dembczyński, K., Cheng, W., Hüllermeier, E., 2010. Bayes optimal multilabel classification via probabilistic classifier...
  • Elisseeff, A., Weston, J., 2001. A kernel method for multi-labelled classification. In: Proceedings of Advances in...
  • Fukumizu, K., et al., 2009. Kernel dimension reduction in regression. Ann. Stat.
  • Fürnkranz, J., et al., 2008. Multilabel classification via calibrated label ranking. Mach. Learn.
  • Gretton, A., Bousquet, O., Smola, A.J., Schölkopf, B., 2005. Measuring statistical dependence with Hilbert–Schmidt...
  • Hall, M., et al., 2009. The WEKA data mining software: an update. SIGKDD Explorations.
  • Hastie, T., et al., 2009. The Elements of Statistical Learning.
  • Hotelling, H., 1936. Relations between two sets of variates. Biometrika.
  • Hsu, D., Kakade, S.M., Langford, J., Zhang, T., 2009. Multi-label prediction via compressed sensing. In: Proceedings of...