
DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

Authors: Rohit Babbar, Bernhard Schölkopf

Published: 02 February 2017 · DOI: 10.1145/3018661.3018741

ABSTRACT

Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands, or even millions, of labels. Datasets in extreme classification exhibit a power-law distribution of label frequencies, i.e., a large fraction of labels have very few positive instances. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlation among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of power-law-distributed, extremely large and diverse label spaces, structural assumptions such as low rank can be easily violated.
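
To make the power-law point concrete, here is a minimal Python sketch, not from the paper, that checks the heavy-tailed behaviour of per-label frequencies; the Zipf-generated counts are a synthetic stand-in for the per-label positive-instance counts a real extreme-classification dataset would supply.

    # Synthetic check of power-law behaviour in label frequencies.
    # The Zipf sample stands in for real per-label positive counts.
    import numpy as np

    rng = np.random.default_rng(0)
    num_labels = 100_000
    freqs = rng.zipf(a=2.0, size=num_labels)  # heavy-tailed counts >= 1

    # Under a power law, log-frequency vs. log-rank is roughly linear.
    ranked = np.sort(freqs)[::-1]
    log_rank = np.log(np.arange(1, num_labels + 1))
    slope, _ = np.polyfit(log_rank, np.log(ranked), deg=1)
    print(f"log-log rank/frequency slope: {slope:.2f}")  # near -1 here

    # The tail dominates: most labels have very few positive instances.
    print(f"labels with <= 5 positives: {(freqs <= 5).mean():.1%}")

A near-linear log-log rank/frequency relationship of this kind is what casts doubt on low-rank structure: the many rare tail labels carry signal that a low-dimensional embedding tends to wash out.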

In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers, coupled with explicit capacity control of the model size. Unlike most state-of-the-art methods, DiSMEC does not make any low-rank assumptions on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity-control mechanism filters out spurious parameters, keeping the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
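
As a rough illustration of the training scheme described above, the following single-machine Python sketch trains independent one-versus-rest linear classifiers and then zeroes out near-zero weights. It is not the authors' DiSMEC implementation; scikit-learn's LinearSVC, the joblib parallelism, and the 0.01 pruning threshold are assumptions made for illustration.

    # Minimal single-machine sketch of one-versus-rest training with
    # weight pruning -- illustrative, NOT the authors' DiSMEC code.
    import numpy as np
    from joblib import Parallel, delayed
    from scipy.sparse import csr_matrix
    from sklearn.datasets import make_multilabel_classification
    from sklearn.svm import LinearSVC

    # Small synthetic stand-in for an extreme multi-label dataset.
    X, Y = make_multilabel_classification(
        n_samples=2000, n_features=100, n_classes=50, random_state=0
    )

    def train_one(label_idx, threshold=0.01):
        """Train one binary classifier and prune near-zero weights."""
        y = Y[:, label_idx]
        if y.min() == y.max():                 # degenerate label: skip
            return csr_matrix((1, X.shape[1]))
        clf = LinearSVC(C=1.0, max_iter=5000)  # liblinear-style SVM
        clf.fit(X, y)
        w = clf.coef_.ravel()
        w[np.abs(w) < threshold] = 0.0         # explicit capacity control
        return csr_matrix(w)                   # keep only non-zero weights

    # One layer of parallelism: independent label-wise training on cores.
    weights = Parallel(n_jobs=-1)(
        delayed(train_one)(j) for j in range(Y.shape[1])
    )
    nnz = sum(w.nnz for w in weights)
    print(f"non-zero weights after pruning: {nnz} of {Y.shape[1] * X.shape[1]}")

Storing the pruned weight vectors sparsely is what keeps the overall model compact. DiSMEC's second layer of parallelism distributes batches of labels across compute nodes, whereas this sketch only parallelizes across the cores of one machine.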

References

  1. R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In Proceedings of the International World Wide Web Conference, May 2013.
  2. R. Babbar, C. Metzig, I. Partalas, E. Gaussier, and M.-R. Amini. On power law distributions in large-scale taxonomies. ACM SIGKDD Explorations Newsletter, 16(1):47–56, 2014.
  3. R. Babbar, K. Muandet, and B. Schölkopf. TerseSVM: A scalable approach for learning compact models in large-scale classification. In SIAM International Conference on Data Mining (SDM 2016), 2016.
  4. R. Babbar, I. Partalas, E. Gaussier, and M.-R. Amini. Re-ranking approach to classification in large-scale power-law distributed category systems. In ACM SIGIR, 2014.
  5. R. Babbar, I. Partalas, E. Gaussier, M.-R. Amini, and C. Amblard. Learning taxonomy adaptation in large-scale classification. Journal of Machine Learning Research, 17(98):1–37, 2016.
  6. R. Babbar, I. Partalas, C. Metzig, E. Gaussier, and M.-R. Amini. Comparative classifier evaluation for web-scale taxonomies using power law. In Extended Semantic Web Conference, pages 310–311. Springer, 2013.
  7. S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In Neural Information Processing Systems, pages 163–171, 2010.
  8. K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. Sparse local embeddings for extreme multi-label classification. In Advances in Neural Information Processing Systems, pages 730–738, 2015.
  9. M. M. Cisse, N. Usunier, T. Artieres, and P. Gallinari. Robust bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems, pages 1851–1859, 2013.
  10. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.
  11. S. Gopal and Y. Yang. Recursive regularization for large-scale classification with hierarchical and graphical dependencies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 257–265. ACM, 2013.
  12. S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems, pages 1135–1143, 2015.
  13. D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems, 2009.
  14. H. Jain, Y. Prabhu, and M. Varma. Extreme multi-label loss functions for recommendation, tagging, ranking and other missing label applications. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2016.
  15. K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. Extreme F-measure maximization using sparse probability estimates. In Proceedings of the 33rd International Conference on Machine Learning, pages 1435–1444, 2016.
  16. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
  17. Z. Lin, G. Ding, M. Hu, and J. Wang. Multi-label classification via feature-aware implicit label space encoding. In Proceedings of the 31st International Conference on Machine Learning, pages 325–333, 2014.
  18. J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165–172. ACM, 2013.
  19. I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini, and P. Gallinari. LSHTC: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581, 2015.
  20. Y. Prabhu and M. Varma. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 263–272. ACM, August 2014.
  21. R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5(Jan):101–141, 2004.
  22. A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
  23. F. Tai and H.-T. Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508–2542, 2012.
  24. J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2011.
  25. J. Weston, A. Makadia, and H. Yee. Label partitioning for sublinear ranking. In Proceedings of the 30th International Conference on Machine Learning, pages 181–189, 2013.
  26. R. Wetzker, C. Zimmermann, and C. Bauckhage. Analyzing social bookmarking systems: A del.icio.us cookbook. In Proceedings of the ECAI 2008 Mining Social Data Workshop, 2008.
  27. C. Xu, D. Tao, and C. Xu. Robust extreme multi-label learning. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
  28. I. E. Yen, X. Huang, P. Ravikumar, K. Zhong, and I. S. Dhillon. PD-Sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In Proceedings of the 33rd International Conference on Machine Learning, 2016.
  29. H.-F. Yu, P. Jain, P. Kar, and I. Dhillon. Large-scale multi-label learning with missing labels. In Proceedings of the 31st International Conference on Machine Learning, pages 593–601, 2014.
  30. G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent advances of large-scale linear classification. Proceedings of the IEEE, 100(9):2584–2603, 2012.

Published in

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, February 2017, 868 pages
ISBN: 9781450346757
DOI: 10.1145/3018661
Copyright © 2017 ACM
Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

WSDM '17 paper acceptance rate: 80 of 505 submissions, 16%. Overall acceptance rate: 498 of 2,863 submissions, 17%.
