ABSTRACT
Extreme multi-label classification refers to supervised multi-label learning involving hundreds of thousands, or even millions, of labels. Datasets in extreme classification exhibit a power-law distribution of label frequencies, i.e., a large fraction of labels have very few positive instances in the data distribution. Most state-of-the-art approaches for extreme multi-label classification attempt to capture correlations among labels by embedding the label matrix into a low-dimensional linear subspace. However, in the presence of extremely large, diverse, and power-law distributed label spaces, structural assumptions such as low rank can be easily violated.
In this work, we present DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers, coupled with explicit capacity control to limit model size. Unlike most state-of-the-art methods, DiSMEC does not make any low-rank assumptions on the label matrix. Using a double layer of parallelization, DiSMEC can learn classifiers for datasets consisting of hundreds of thousands of labels within a few hours. The explicit capacity-control mechanism filters out spurious parameters, keeping the model compact without losing prediction accuracy. We conduct extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels. We compare DiSMEC with recent state-of-the-art approaches, including SLEEC, a leading approach for learning sparse local embeddings, and FastXML, a tree-based approach optimizing a ranking-based loss function. On some of the datasets, DiSMEC significantly boosts prediction accuracy: 10% better than SLEEC and 15% better than FastXML, in absolute terms.
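To make the one-versus-rest training and capacity-control ideas concrete, below is a minimal single-machine sketch in Python. It is an illustration under our own assumptions, not the authors' distributed implementation: DiSMEC trains over many nodes, whereas this sketch uses scikit-learn's LinearSVC with joblib-based parallelism over labels, and the pruning threshold 0.01 is illustrative rather than taken from the paper.

```python
# Sketch of one-vs-rest training with post-hoc weight pruning, in the spirit
# of DiSMEC's explicit capacity control. NOT the authors' implementation:
# solver choice, threshold value, and joblib parallelism are assumptions.
import numpy as np
from joblib import Parallel, delayed
from scipy.sparse import csr_matrix, vstack
from sklearn.svm import LinearSVC

def train_one_label(X, y_col, prune_threshold=0.01):
    """Train a binary linear SVM for one label; assumes y_col has both classes."""
    clf = LinearSVC(C=1.0, loss="squared_hinge")
    clf.fit(X, y_col)
    w = clf.coef_.ravel().copy()
    # Capacity control: zero out near-zero (spurious) weights so the stored
    # model stays compact; return the pruned vector as a sparse row.
    w[np.abs(w) < prune_threshold] = 0.0
    return csr_matrix(w)

def train_ovr(X, Y, n_jobs=-1):
    """Learn one pruned weight vector per label, in parallel over labels.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary 0/1 label matrix (dense here for brevity).
    Returns a sparse (n_labels, n_features) weight matrix W.
    """
    rows = Parallel(n_jobs=n_jobs)(
        delayed(train_one_label)(X, Y[:, j]) for j in range(Y.shape[1])
    )
    return vstack(rows)

def predict_top_k(W, x, k=5):
    """Score all labels for one instance x and return the top-k label indices."""
    scores = np.asarray(W @ x).ravel()
    return np.argsort(-scores)[:k]
```

In the full system the parallelism is two-layered (the abstract's "double layer of parallelization"), with work split both across compute nodes and across cores within a node; the single joblib call above stands in for both layers on one machine.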
- R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. Multi-label learning with millions of labels: Recommending advertiser bid phrases for web pages. In Proceedings of the International World Wide Web Conference, May 2013.
- R. Babbar, C. Metzig, I. Partalas, E. Gaussier, and M.-R. Amini. On power law distributions in large-scale taxonomies. ACM SIGKDD Explorations Newsletter, 16(1):47--56, 2014.
- R. Babbar, K. Muandet, and B. Schölkopf. TerseSVM: A scalable approach for learning compact models in large-scale classification. In SIAM International Conference on Data Mining (SDM 2016), 2016.
- R. Babbar, I. Partalas, E. Gaussier, and M.-R. Amini. Re-ranking approach to classification in large-scale power-law distributed category systems. In ACM SIGIR, 2014.
- R. Babbar, I. Partalas, E. Gaussier, M.-R. Amini, and C. Amblard. Learning taxonomy adaptation in large-scale classification. Journal of Machine Learning Research, 17(98):1--37, 2016.
- R. Babbar, I. Partalas, C. Metzig, E. Gaussier, and M.-R. Amini. Comparative classifier evaluation for web-scale taxonomies using power law. In Extended Semantic Web Conference, pages 310--311. Springer, 2013.
- S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In Neural Information Processing Systems, pages 163--171, 2010.
- K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. Sparse local embeddings for extreme multi-label classification. In Advances in Neural Information Processing Systems, pages 730--738, 2015.
- M. M. Cisse, N. Usunier, T. Artieres, and P. Gallinari. Robust bloom filters for large multilabel classification tasks. In Advances in Neural Information Processing Systems, pages 1851--1859, 2013.
- R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871--1874, 2008.
- S. Gopal and Y. Yang. Recursive regularization for large-scale classification with hierarchical and graphical dependencies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 257--265. ACM, 2013.
- S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in Neural Information Processing Systems, pages 1135--1143, 2015.
- D. Hsu, S. Kakade, J. Langford, and T. Zhang. Multi-label prediction via compressed sensing. In Advances in Neural Information Processing Systems, 2009.
- H. Jain, Y. Prabhu, and M. Varma. Extreme multi-label loss functions for recommendation, tagging, ranking and other missing label applications. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 2016.
- K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. Extreme F-measure maximization using sparse probability estimates. In Proceedings of the 33rd International Conference on Machine Learning, pages 1435--1444, 2016.
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097--1105, 2012.
- Z. Lin, G. Ding, M. Hu, and J. Wang. Multi-label classification via feature-aware implicit label space encoding. In Proceedings of the 31st International Conference on Machine Learning, pages 325--333, 2014.
- J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, pages 165--172. ACM, 2013.
- I. Partalas, A. Kosmopoulos, N. Baskiotis, T. Artieres, G. Paliouras, E. Gaussier, I. Androutsopoulos, M.-R. Amini, and P. Gallinari. LSHTC: A benchmark for large-scale text classification. arXiv preprint arXiv:1503.08581, 2015.
- Y. Prabhu and M. Varma. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 263--272. ACM, 2014.
- R. Rifkin and A. Klautau. In defense of one-vs-all classification. Journal of Machine Learning Research, 5:101--141, 2004.
- A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. FitNets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
- F. Tai and H.-T. Lin. Multilabel classification with principal label space transformation. Neural Computation, 24(9):2508--2542, 2012.
- J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence, 2011.
- J. Weston, A. Makadia, and H. Yee. Label partitioning for sublinear ranking. In Proceedings of the 30th International Conference on Machine Learning, pages 181--189, 2013.
- R. Wetzker, C. Zimmermann, and C. Bauckhage. Analyzing social bookmarking systems: A del.icio.us cookbook.
- C. Xu, D. Tao, and C. Xu. Robust extreme multi-label learning. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
- I. E. Yen, X. Huang, P. Ravikumar, K. Zhong, and I. S. Dhillon. PD-Sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In Proceedings of the 33rd International Conference on Machine Learning, 2016.
- H.-F. Yu, P. Jain, P. Kar, and I. Dhillon. Large-scale multi-label learning with missing labels. In Proceedings of the 31st International Conference on Machine Learning, pages 593--601, 2014.
- G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent advances of large-scale linear classification. Proceedings of the IEEE, 100(9):2584--2603, 2012.