ABSTRACT
Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands, or even millions, of labels. Datasets in extreme classification exhibit a power-law distribution of label frequencies, i.e., a large fraction of labels have very few positive instances in the data distribution. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlations among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of extremely large, diverse, and power-law distributed label spaces, structural assumptions such as low rank can be easily violated.
In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers, coupled with explicit capacity control to limit model size. Unlike most state-of-the-art methods, DiSMEC does not make any low-rank assumptions on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity-control mechanism filters out spurious parameters, keeping the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
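To make the one-versus-rest training and capacity-control ideas concrete, below is a minimal single-machine sketch in Python. It is an illustration under our own assumptions, not the authors' distributed implementation: DiSMEC trains over many nodes, whereas this sketch uses scikit-learn's LinearSVC with joblib-based parallelism over labels, and the pruning threshold 0.01 is illustrative rather than taken from the paper.

```python
# Sketch of one-vs-rest training with post-hoc weight pruning, in the spirit
# of DiSMEC's explicit capacity control. NOT the authors' implementation:
# solver choice, threshold value, and joblib parallelism are assumptions.
import numpy as np
from joblib import Parallel, delayed
from scipy.sparse import csr_matrix, vstack
from sklearn.svm import LinearSVC

def train_one_label(X, y_col, prune_threshold=0.01):
    """Train a binary linear SVM for one label; assumes y_col has both classes."""
    clf = LinearSVC(C=1.0, loss="squared_hinge")
    clf.fit(X, y_col)
    w = clf.coef_.ravel().copy()
    # Capacity control: zero out near-zero (spurious) weights so the stored
    # model stays compact; return the pruned vector as a sparse row.
    w[np.abs(w) < prune_threshold] = 0.0
    return csr_matrix(w)

def train_ovr(X, Y, n_jobs=-1):
    """Learn one pruned weight vector per label, in parallel over labels.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary 0/1 label matrix (dense here for brevity).
    Returns a sparse (n_labels, n_features) weight matrix W.
    """
    rows = Parallel(n_jobs=n_jobs)(
        delayed(train_one_label)(X, Y[:, j]) for j in range(Y.shape[1])
    )
    return vstack(rows)

def predict_top_k(W, x, k=5):
    """Score all labels for one instance x and return the top-k label indices."""
    scores = np.asarray(W @ x).ravel()
    return np.argsort(-scores)[:k]
```

In the full system the parallelism is two-layered (the abstract's "double layer of parallelization"), with work split both across compute nodes and across cores within a node; the single joblib call above stands in for both layers on one machine.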
- R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In Proceedings of the International World Wide Web Conference, May 2013.
- R. Babbar, C. Metzig, I. Partalas, E. Gaussier, and M.-R. Amini. On power law distributions in large-scale taxonomies. ACM SIGKDD Explorations Newsletter, 16(1):47--56, 2014.
- R. Babbar, K. Muandet, and B. Schölkopf. TerseSVM: A scalable approach for learning compact models in large-scale classification. In SIAM International Conference on Data Mining (SDM 2016), 2016.
- R. Babbar, I. Partalas, E. Gaussier, and M.-R. Amini. Re-ranking approach to classification in large-scale power-law distributed category systems. In ACM SIGIR, 2014.
- R. Babbar, I. Partalas, E. Gaussier, M.-R. Amini, and C. Amblard. Learning taxonomy adaptation in large-scale classification. Journal of Machine Learning Research, 17(98):1--37, 2016.
- R. Babbar, I. Partalas, C. Metzig, E. Gaussier, and M.-R. Amini. Comparative classifier evaluation for web-scale taxonomies using power law. In Extended Semantic Web Conference, pages 310--311. Springer, 2013.
- S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In Neural Information Processing Systems, pages 163--171, 2010.
- K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. Sparse local embeddings for extreme multi-label classification. In Advances in Neural Information Processing Systems, pages 730--738, 2015.
- M. M. Cisse, N. Usunier, T. Artieres, and P. Gallinari. Robust bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems, pages 1851--1859, 2013.
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871--1874, 2008.
- S. Gopal and Y. Yang. Recursive regularization for large-scale classification with hierarchical and graphical dependencies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 257--265. ACM, 2013.
- S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135--1143, 2015.
- D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems, 2009.
- H. Jain, Y. Prabhu, and M. Varma. Extreme multi-label loss functions for recommendation, tagging, ranking and other missing label applications. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2016.
- K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. Extreme F-measure maximization using sparse probability estimates. In Proceedings of the 33rd International Conference on Machine Learning, pages 1435--1444, 2016.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097--1105, 2012.
- Z. Lin, G. Ding, M. Hu, and J. Wang. Multi-label classification via feature-aware implicit label space encoding. In Proceedings of the 31st International Conference on Machine Learning, pages 325--333, 2014.
- J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165--172. ACM, 2013.
- I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini, and P. Gallinari. LSHTC: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581, 2015.
- Y. Prabhu and M. Varma. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 263--272. ACM, 2014.
- R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101--141, 2004.
- A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
- F. Tai and H.-T. Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508--2542, 2012.
- J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence, 2011.
- J. Weston, A. Makadia, and H. Yee. Label partitioning for sublinear ranking. In Proceedings of the 30th International Conference on Machine Learning, pages 181--189, 2013.
- R. Wetzker, C. Zimmermann, and C. Bauckhage. Analyzing social bookmarking systems: A del.icio.us cookbook.
- C. Xu, D. Tao, and C. Xu. Robust extreme multi-label learning. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
- I. E. Yen, X. Huang, P. Ravikumar, K. Zhong, and I. S. Dhillon. PD-Sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In Proceedings of the 33rd International Conference on Machine Learning, 2016.
- H.-F. Yu, P. Jain, P. Kar, and I. Dhillon. Large-scale multi-label learning with missing labels. In Proceedings of the 31st International Conference on Machine Learning, pages 593--601, 2014.
- G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent advances of large-scale linear classification. Proceedings of the IEEE, 100(9):2584--2603, 2012.