
1 Introduction

Multi-Label Classification (MLC), which allows an instance to have more than one label at the same time, has recently received a surge of interest in a variety of fields and applications [10, 15]. The main task of MLC is to learn the relationship between an F-dimensional feature vector \(\varvec{x}\) and an L-dimensional binary vector \(\varvec{y}\) from N training instances \(\{(\varvec{x}^{(1)},\varvec{y}^{(1)}),\dots ,(\varvec{x}^{(N)},\varvec{y}^{(N)})\}\), and to predict a binary vector \(\hat{\varvec{y}} \in \{0,1\}^{L}\) for a test instance \(\varvec{x} \in \mathbb {R}^{F}\). To simplify the notation, we use a matrix \(\mathbf {X}=[\varvec{x}^{(1)},\varvec{x}^{(2)},\dots ,\varvec{x}^{(N)}]^{T} \in \mathbb {R}^{N \times F}\) and a matrix \(\mathbf {Y}=[\varvec{y}^{(1)},\varvec{y}^{(2)},\dots ,\varvec{y}^{(N)}]^{T} \in \{0,1\}^{N\times L}\) to express the training set.

A key issue of learning in MLC is how to utilize the dependency between labels [10]. However, an excessive treatment of label dependency causes over-fitting and increases complexity, sometimes to the point of intractability. Thus, many algorithms have been proposed to model the label dependency efficiently and effectively. Embedding is one such approach for MLC. This type of method utilizes label dependency through dimension reduction: the dependency is exploited by reducing the dimension of the label space from L to K (\(\ll \) L). Embedding methods in general learn the relationships between instances in the F-dimensional space and latent labels in a K-dimensional space, and then linearly transform these relationships back to those between instances in the F-dimensional space and the real labels in the L-dimensional space [4–6, 8, 12, 16].

In this paper, we propose a novel nonlinear embedding method. Usually, either the set of labels or the set of instances is embedded [4–6, 8, 16], but in our method, both are embedded at the same time. We realize a mapping into a low-dimensional Euclidean space that keeps three kinds of relationships, instance-instance, label-label and label-instance, as faithfully as possible. In addition, for classification, both a linear and a nonlinear mapping of a test instance are realized.

2 The Proposed Embedding

2.1 Objective Function

In contrast to traditional embedding methods, we explicitly embed both labels and instances into the same K-dimensional space (\(K < F\)) while preserving the relationships among labels and instances. To preserve such relationships, we use a manifold learning method called the Laplacian eigen map [1]. It keeps the distance or the degree of similarity between any pair of points or objects even in a low-dimensional space. For example, given a similarity measure \(\mathbf {W}_{ij}\) between two objects indexed by i and j, we find \(\varvec{z}^{(i)}\) and \(\varvec{z}^{(j)}\) in \(\mathbb {R}^{K}\) so as to minimize \(\sum _{i,j} \mathbf {W}_{ij} \Vert \varvec{z}^{(i)} -\varvec{z}^{(j)}\Vert ^{2}_{2}\) under an appropriate constraint for scaling.

Now we consider embedding both instances and labels at once. Let \(\varvec{g}^{(i)} \in \mathbb {R}^{K}\) be the low-dimensional representation of the ith instance \(\varvec{x}^{(i)}\) in the embedding space, and let \(\varvec{h}^{(l)} \in \mathbb {R}^{K}\) be the representation of the lth label in the same space. In this embedding, we consider three types of relationships: instance-label, instance-instance and label-label. We quantify these relationships by focusing on their localities. More specifically, we realize a mapping that preserves the following three kinds of properties of the training set:

  1. Instance-Label (IL) relationship: The explicit relationship given by (\(\varvec{x}^{(i)},\varvec{y}^{(i)}\)) (\(i=1,\dots ,N\)) should be kept in the embedding as closeness between \(\varvec{g}^{(i)}\) and \(\varvec{h}^{(l_{i})}\), where \(l_{i}\) is any label taking value one in \(\varvec{y}^{(i)}\).

  2. Label-Label (LL) relationship: Frequently co-occurring label pairs should be placed closer together in the embedded space \(\mathbb {R}^{K}\).

  3. Instance-Instance (II) relationship: Instances that are close in \(\mathbb {R}^{F}\) should also be placed close together in \(\mathbb {R}^{K}\).

Let us denote them by \(\mathbf {W}^{(IL)} \in \mathbb {R}^{N \times L}\), \(\mathbf {W}^{(LL)} \in \mathbb {R}^{L \times L}\) and \(\mathbf {W}^{(II)} \in \mathbb {R}^{N \times N}\), respectively. Then our objective function of \(\{\varvec{g}^{(i)},\varvec{h}^{(l)}\}\) becomes, with \(\alpha ,\beta \) (>0),

$$\begin{aligned} O&= 2O_{IL}+ \alpha O_{II}+ \beta O_{LL} \nonumber \\&= 2\sum _{i,l} \mathbf {W}_{il}^{(IL)}\Vert \varvec{g}^{(i)}-\varvec{h}^{(l)} \Vert _{2}^{2} + \alpha \sum _{i,j} \mathbf {W}_{ij}^{(II)}\Vert \varvec{g}^{(i)}-\varvec{g}^{(j)} \Vert _{2}^{2} + \beta \sum _{l,m} \mathbf {W}_{lm}^{(LL)}\Vert \varvec{h}^{(l)}-\varvec{h}^{(m)} \Vert _{2}^{2} \\&= \sum _{s,t} \mathbf {W}_{st} \Vert \varvec{e}^{(s)} - \varvec{e}^{(t)} \Vert ^{2}_{2} \qquad (s,t=1,2,\dots ,N+L), \nonumber \end{aligned}$$
(1)

where \(\varvec{e}^{(s)} = \varvec{g}^{(s)}\) for \(s \le N\) and \(\varvec{e}^{(s)} = \varvec{h}^{(s-N)}\) for \(s > N\), and \(\mathbf {W}_{st}\) is the corresponding entry of \(\mathbf {W}^{(IL)}\), \(\alpha \mathbf {W}^{(II)}\) or \(\beta \mathbf {W}^{(LL)}\), depending on the values of s and t. As their matrix representation, let us use \(\mathbf {G}=[ \varvec{g}^{(1)},\dots , \varvec{g}^{(N)}]^{T} \in \mathbb {R}^{N\times K}\) and \(\mathbf {H}=[ \varvec{h}^{(1)},\dots , \varvec{h}^{(L)}]^{T} \in \mathbb {R}^{L\times K}\). Then, using \(\mathbf {E}=[\mathbf {G}^{T},\mathbf {H}^{T}]^{T} \in \mathbb {R}^{(N+L)\times K}\), our objective function is rewritten as

$$\begin{aligned} O = \sum _{s,t} \mathbf {W}_{st} \Vert \varvec{e}^{(s)} - \varvec{e}^{(t)} \Vert ^{2}_{2} = 2 \text {Tr}(\mathbf {E}^{T}\mathbf {L}\mathbf {E}), \ \ \text {s.t.} \ \mathbf {E}^{T}\mathbf {DE}=\mathbf {I} \end{aligned}$$
(2)

where \(\mathbf {L}=\mathbf {D}-\mathbf {W}\) and \(\mathbf {D}\) is a diagonal matrix with elements \(\mathbf {D}_{ii}=\sum _{j}\mathbf {W}_{ij}\) [1]. The constraint \(\mathbf {E}^{T}\mathbf {DE}=\mathbf {I}\) is imposed to remove an arbitrary scaling factor in the embedding. This formulation is that of the Laplacian eigen map. Next, let us explain how to determine the similarity matrix \(\mathbf {W}\).
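Before detailing the three blocks of \(\mathbf {W}\), the following minimal sketch (in Python; the function and variable names are ours, not from the original implementation) shows how the joint matrix \(\mathbf {W}\), the degree matrix \(\mathbf {D}\) and the Laplacian \(\mathbf {L}\) could be assembled once \(\mathbf {W}^{(IL)}\), \(\mathbf {W}^{(II)}\) and \(\mathbf {W}^{(LL)}\) are available. Note that placing \(\mathbf {W}^{(IL)}\) in both off-diagonal blocks is what produces the factor 2 in front of \(O_{IL}\) in (1).

```python
import numpy as np

def build_laplacian(W_IL, W_II, W_LL, alpha, beta):
    """Assemble the joint (N+L) x (N+L) similarity matrix W, the degree
    matrix D and the graph Laplacian L = D - W (a sketch, not the
    authors' reference code)."""
    N, L = W_IL.shape
    W = np.zeros((N + L, N + L))
    W[:N, :N] = alpha * W_II      # instance-instance block
    W[:N, N:] = W_IL              # instance-label block
    W[N:, :N] = W_IL.T            # label-instance block (keeps W symmetric)
    W[N:, N:] = beta * W_LL       # label-label block
    D = np.diag(W.sum(axis=1))    # D_ii = sum_j W_ij
    return W, D, D - W
```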

Fig. 1. The result of the proposed embedding on the Scene dataset. Only 20 % of the instances are displayed. The numbers indicate the labels \(1,\dots ,6\), and small crosses show the instances. (Color figure online)

Instance-Label Relationship: For the instance-label relationship, we simply use \(\mathbf {W}^{(IL)}=\mathbf {Y}\), so \(\mathbf {W}^{(IL)}\) has elements of zero or one. The corresponding objective function for the instance-label relationship becomes:

$$\begin{aligned} O_{IL}= \sum _{\begin{array}{c} i=1,\dots ,N \\ l=1,\dots ,L \end{array}} \mathbf {W}^{(IL)}_{il}\Vert \varvec{g}^{(i)} -\varvec{h}^{(l)}\Vert ^{2}_{2}, \end{aligned}$$

where \(\mathbf {W}^{(IL)}_{il}=\mathbf {Y}_{il} \in \{0,1\}\).

Instance-Instance Relationship: We use the symmetric k-nearest neighbor relation in \(\mathbb {R}^{F}\) for constructing \(\mathbf {W}^{(II)}\) as seen in [3]. Thus, our second objective function becomes

$$\begin{aligned} O_{II}= \sum _{i,j=1,\dots , N} \mathbf {W}^{(II)}_{ij}\Vert \varvec{g}^{(i)} -\varvec{g}^{(j)}\Vert ^{2}_{2}, \end{aligned}$$

where

$$\begin{aligned} \mathbf {W}^{(II)}_{ij}= {\left\{ \begin{array}{ll} 1 &{} (i \in \mathcal{{N}}_{k}(\varvec{x}^{(j)}) \vee j \in \mathcal{{N}}_{k}(\varvec{x}^{(i)}) ),\\ 0 &{} \text {(otherwise),} \end{array}\right. } \end{aligned}$$

where \(\mathcal{{N}}_{k}(\varvec{x}^{(i)}) \) denotes the index set of k nearest neighbors of the ith instance. It is worth noting that we can construct \(\mathbf {W}^{(II)}\) on the basis of the similarity between \(\varvec{y}^{(i)}\) and \(\varvec{y}^{(j)}\) as seen in [3] instead of that between \(\varvec{x}^{(i)}\) and \(\varvec{x}^{(j)}\) above.
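As an illustration, a symmetric k-NN similarity matrix of this kind could be built as in the sketch below (using scikit-learn's NearestNeighbors; the helper name is ours and this is only one possible realization of the definition above).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def instance_instance_similarity(X, k):
    """Binary symmetric k-NN graph over the N rows of X (N x F)."""
    N = X.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own neighbor
    _, idx = nn.kneighbors(X)
    W = np.zeros((N, N))
    for i in range(N):
        W[i, idx[i, 1:]] = 1.0    # mark the k nearest neighbors of instance i
    return np.maximum(W, W.T)     # W_ij = 1 if i in N_k(x_j) or j in N_k(x_i)
```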

Label-Label Relationship: We construct \(\mathbf {W}^{(LL)}\) in such a way that \(\mathbf {W}^{(LL)}_{lm}\) takes a large value when labels l and m co-occur frequently in \(\mathbf {Y}\), and a small value otherwise. Here we again use the symmetric k-nearest neighbor relation, this time with respect to the co-occurrence frequency. The corresponding third objective function becomes

$$\begin{aligned} O_{LL}= \sum _{l,m=1,\dots ,L} \mathbf {W}^{(LL)}_{lm}\Vert \varvec{h}^{(l)} -\varvec{h}^{(m)}\Vert ^{2}_{2}, \end{aligned}$$

where

$$\begin{aligned} \mathbf {W}^{(LL)}_{lm}= {\left\{ \begin{array}{ll} 1 &{} \text {(if}\ l\ \text {is one of top-}k\ \text {co-occurrence labels of}\ m\ \text {and vice versa)},\\ 0 &{} (\text {otherwise}). \end{array}\right. } \end{aligned}$$

Note that \(\mathbf {W}^{(LL)}\) is symmetric, as is \(\mathbf {W}^{(II)}\). Their symmetry guarantees the existence of a solution of (2).
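A possible construction of \(\mathbf {W}^{(LL)}\) from the label co-occurrence counts \(\mathbf {Y}^{T}\mathbf {Y}\) is sketched below; the top-k selection and the symmetrization by a maximum (mirroring the construction of \(\mathbf {W}^{(II)}\)) are our reading of the definition above, and the function name is ours.

```python
import numpy as np

def label_label_similarity(Y, k):
    """Binary symmetric top-k co-occurrence graph over the L labels."""
    C = Y.T @ Y                          # C_lm = number of co-occurrences of labels l and m
    np.fill_diagonal(C, 0)               # ignore self co-occurrence
    L = C.shape[0]
    W = np.zeros((L, L))
    for l in range(L):
        top = np.argsort(C[l])[::-1][:k] # k labels co-occurring most frequently with l
        W[l, top] = 1.0
    return np.maximum(W, W.T)            # symmetrize
```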

The solution of (2) is obtained by solving the following generalized eigen problem:

$$\begin{aligned} \mathbf {L}\mathbf {E}=\lambda \mathbf {D}\mathbf {E}. \end{aligned}$$
(3)

Hence, the optimal solution \(\mathbf {E}\) of the objective function consists of the eigenvectors corresponding to the K smallest eigenvalues, excluding the trivial eigenvector with zero eigenvalue [1].
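A minimal sketch of this step, assuming the dense matrices from the earlier sketch, is given below: scipy's generalized symmetric eigensolver returns eigenvectors satisfying the scaling constraint \(\mathbf {E}^{T}\mathbf {DE}=\mathbf {I}\), the trivial zero-eigenvalue eigenvector is discarded, and the next K columns are split into the instance part \(\mathbf {G}\) and the label part \(\mathbf {H}\). The function name is ours.

```python
from scipy.linalg import eigh

def embed(Lap, D, N, K):
    """Solve Lap e = lambda D e; return G (N x K) and H (L x K)."""
    vals, vecs = eigh(Lap, D)     # ascending eigenvalues, D-orthonormal eigenvectors
    E = vecs[:, 1:K + 1]          # drop the trivial zero-eigenvalue eigenvector
    return E[:N], E[N:]           # instance embeddings G, label embeddings H
```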

An example of this embedding is shown in Fig. 1. This is the result of the mapping for the Scene dataset [11], where \(N=2407\), \(F=294\), \(L=6\) and \(K=2\). In Fig. 1, we can see that the instance-label, instance-instance and label-label relations are fairly well preserved. First, for the instance-label relationship, four instances that share the label subset \(\{3,4\}\) (large brown dots) are mapped between labels 3 and 4. Second, for the label-label relationship, the frequently co-occurring labels 1, 5 and 6 are mapped close together (highlighted by a circle). Finally, for the instance-instance relationship, an instance and its k nearest neighbors (\(k=2\)) in the original F-dimensional space (a blue square and 2 blue diamonds) are placed close together.

Algorithm 1 (pseudo-code)

2.2 Embedding Test Instances

To assign labels to a test instance, we need to embed it into the same low-dimensional space constructed from the training instances and their labels. Unfortunately, the above embedding is not realized as an explicit function, so we have no direct way of mapping a new instance. Therefore, we propose two different ways: a linear mapping and a nonlinear mapping.

In the linear mapping, we approximate the nonlinear mapping from \(\mathbf {X}\) to \(\mathbf {G}\) (the former part of \(\mathbf {E}\)) by a linear mapping \(\mathbf {V}\) such that \( \mathbf {G} \simeq \hat{\mathbf {G}}=\mathbf {XV}\). We use ridge regression to find such a \(\mathbf {V}\):

$$\begin{aligned} \min _{\mathbf {V}} \Vert \mathbf {X}\mathbf {V} - \mathbf {G} \Vert ^{2}_{2} + \lambda \Vert \mathbf {V}\Vert ^{2}_{2}, \end{aligned}$$

where \(\lambda \) is a regularization parameter. A test instance \(\varvec{x}\) is then mapped to \(\varvec{g}=\varvec{x}^{T}\mathbf {V}\).
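A sketch of this linear test-time mapping, using the closed-form ridge solution \(\mathbf {V}=(\mathbf {X}^{T}\mathbf {X}+\lambda \mathbf {I})^{-1}\mathbf {X}^{T}\mathbf {G}\) (function names are ours):

```python
import numpy as np

def fit_linear_map(X, G, lam):
    """Ridge regression: argmin_V ||X V - G||^2 + lam ||V||^2."""
    F = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(F), X.T @ G)

def embed_test_linear(x, V):
    """Map a test instance x (length F) to g = x^T V (length K)."""
    return x @ V
```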

Algorithm 2 (pseudo-code)

In the nonlinear mapping, we again use the k-nearest neighbor relation, this time for the test instance \(\varvec{x}\). We map \(\varvec{x}\) to \(\varvec{g}\), the mean of the embedded points of its k nearest neighbors among the training instances:

$$\begin{aligned} \varvec{g}=\mathop {\text {argmin}}\limits _{\varvec{g}'} \frac{1}{k}\sum _{i \in \mathcal{{N}}_{k}(\varvec{x})} \Vert \varvec{g}' - \varvec{g}^{(i)}\Vert ^{2}_{2}=\frac{1}{k}\sum _{i \in \mathcal{{N}}_{k}(\varvec{x})} \varvec{g}^{(i)}. \end{aligned}$$
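Correspondingly, a sketch of the nonlinear test-time mapping (again with scikit-learn's NearestNeighbors; names are ours):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def embed_test_nonlinear(x, X_train, G, k):
    """Map a test instance x to the mean embedding of its k nearest
    training instances."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x.reshape(1, -1))
    return G[idx[0]].mean(axis=0)
```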

Since the objective function (2) is solved by the Laplacian eigen map [1], we name the proposed method Multi-Label classification using Laplacian Eigen Map (MLLEM for short). The combined pseudo-code of MLLEM-L (linear mapping of a test instance) and MLLEM-NL (nonlinear mapping of a test instance) is described in Algorithms 1 and 2.

2.3 Computational Complexity

The training procedure of the proposed algorithm (Algorithm 1) can be divided into two parts. The first part constructs k-NN graphs for labels and instances (Steps 3 and 4), in \(O(NL^{2})\) for labels and \(O(FN^{2})\) for instances, respectively. The second part solves the generalized eigen problem (Step 6), which takes \(O((N+L)^{3})\). However, it is known that this cost can be largely reduced when the matrix \(\mathbf {W}\) is sparse and only a small number K of eigenvectors are needed [9]. Therefore, the complexity of the proposed algorithm can be estimated as \(O(NL^{2}+FN^{2})\). This complexity is the same as that of almost all embedding methods, including the methods compared in the experiments.
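As a rough illustration of the sparse case mentioned above, the generalized eigen problem could be solved with a sparse shift-invert eigensolver instead of a dense one (a sketch assuming \(\mathbf {W}\) is stored as a scipy sparse matrix; the small diagonal shift is only there to keep the factorization well-posed, since \(\mathbf {L}\) itself is singular):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def embed_sparse(W, N, K):
    """Smallest K nontrivial generalized eigenpairs of (L, D) for sparse W."""
    d = np.asarray(W.sum(axis=1)).ravel()
    D = sp.diags(d)
    Lap = D - W
    vals, vecs = eigsh(Lap + 1e-9 * sp.eye(W.shape[0]), k=K + 1, M=D,
                       sigma=0, which='LM')   # shift-invert around 0
    order = np.argsort(vals)
    E = vecs[:, order][:, 1:]                 # drop the trivial eigenvector
    return E[:N], E[N:]
```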

In the testing phase, the linear embedding needs \(O(F^{2}N)\) for the ridge regression. In contrast, the nonlinear embedding needs only O(FN) per test instance, which is faster than the linear embedding.

3 Related Work

Label embedding methods for MLC utilize label dependency via the low-rank structure of an embedding space. Recently, several methods based on traditional factorizations [4, 6, 8] and on regressions with various loss functions [12, 13] have been proposed. A method based on Canonical Correlation Analysis [16] is another example. It conceptually embeds both instances and labels at the same time, as the proposed MLLEM does. However, it conducts only a one-sided embedding in the actual classification process, because the linear regression applied after the embedding absorbs the embedding of the other side. Although all of these methods utilize a low-rank structure and succeed in improving classification accuracy, they are limited to linear transformations. In contrast, our MLLEM utilizes label dependency in a nonlinear way, so it is more flexible in its mapping. On the other hand, we have to be careful about overfitting when we use nonlinear mappings. In MLLEM, the nonlinear mapping relies only on the similarity matrices \(\mathbf {W}^{(IL)}\), \(\mathbf {W}^{(II)}\) and \(\mathbf {W}^{(LL)}\), so the risk of overfitting is limited to some extent.

Bhatia et al. proposed a linear embedding method for instances [3]. In their embedding, only the locality of instances in the label space is considered, and ML-KNN [14] is applied in the low-dimensional space. In the sense of using locality, the proposed MLLEM is close to theirs, but it differs in that the label-instance, label-label and instance-instance relationships are all taken into consideration at the same time.

4 Experiments

4.1 Setting

We conducted experiments on thirteen benchmark datasets [11] (Table 1). Each dataset was randomly split into 67 % training instances and 33 % test instances. On the large datasets (delicious, bookmarks and mediamill), we randomly sampled 5000 instances (4000 for training and 1000 for testing) according to [6].

Since all embedding methods return scores of labels rather than a label subset, we used the Area Under the ROC Curve (AUC) and top-k precision to evaluate the results [13]. AUC is a popular criterion for evaluating the ranking of all labels; we computed it from the ROC curve of the true-positive rate against the false-positive rate. We also report Top-1 precision [3].

Table 1. Datasets used in the experiments. \(F_{nnz}\) and \(L_{nnz}\) are the average numbers of non-zero features and labels per instance in the corresponding set, respectively.

We compared the following three state-of-the-art embedding methods with our proposals:

  1. Low-rank Empirical risk minimization for MLC (LEML) [13]

  2. Feature-aware Implicit Encoding (FaIE) [6]

  3. Sparse Local Embedding for Extreme Multi-label Classification (SLEEC) [3]

  4. Proposal with linear embedding (MLLEM-L) and nonlinear embedding (MLLEM-NL)

The proposed MLLEM has five parameters: the numbers of nearest neighbors \(k^{(I)}\) for instances and \(k^{(L)}\) for labels, the weighting parameters \(\alpha \) for \(\mathbf {W}^{(II)}\) and \(\beta \) for \(\mathbf {W}^{(LL)}\), and the dimension K of the embedding space. On all datasets, we used \(K=20\). All the other parameters were tuned by five-fold cross-validation on the training set. The code is available at the authors’ web site. For CPLST and FaIE, we set the dimension of the label space to \(80\,\%\) of the number of labels, following the setting in [3]; the other parameters were tuned as well. We used the implementations provided by the authors [13], [6]. For SLEEC, we set the number of dimensions to \(K=100\), following their setting [3]. We tuned the number of nearest neighbors for the embedding and the number of neighbors for ML-KNN after embedding; the other parameters, such as the smoothing parameter in their regression, were set to the defaults they used. We also used the implementation provided by the authors.

Table 2. Results
Table 3. Training time (\(K=20\)).

4.2 Results

Table 2 shows the average AUC and Top-1 precision on all thirteen benchmark datasets (the larger, the better). SLEEC was the best in AUC, followed by the proposed MLLEM. In Top-1 precision, MLLEM was the best, especially on relatively small-scale datasets. This difference can possibly be explained by the objective function of MLLEM: it ignores the distance between two objects that have no local similarity relation to each other. Therefore, such two objects are occasionally placed close together in the embedding space even though they are not similar. This hurts AUC, which takes the overall ranking into account, whereas for the top-k labels such an adverse effect seldom occurs.

MLLEM was superior to SLEEC, even in AUC, when the number N of instances is relatively small. This is probably because SLEEC considers only the locality of instances (the instance-instance relationship). When the number of available instances is limited, the instance-instance relationship alone is not enough to capture the relationship between features and labels. From this viewpoint, MLLEM is the best choice for small- to medium-sample-size problems (Table 3).

5 Discussion

Since the proposed MLLEM uses the Laplacian eigen map for the nonlinear embedding, there are several ways to increase its scalability, such as an incremental method [2], the Nyström approximation or column sampling [9], and efficient k-NN graph construction [7]. Note that the framework used in MLLEM is very general, owing to the freedom in the choice of the matrix \(\mathbf {W}\) (e.g., using a heat kernel). It can also handle categorical features. This generality is not shared by SLEEC [3].

6 Conclusion

In this paper, we have proposed an embedding-based approach for multi-label classification. The proposed algorithm takes three relationships into consideration: the label-instance, label-label and instance-instance relationships, and realizes a nonlinear mapping. All three relationships are preserved in the embedded low-dimensional space as closeness between instances and individual labels. We have shown that the algorithm is useful for visualizing instances and labels at the same time, which helps us to understand a given multi-label problem, especially how strongly the labels are related to each other. Linear and nonlinear mappings of test instances have also been proposed for classification. In the experiments, the proposed algorithm outperformed the other state-of-the-art methods on datasets with a small number of samples.