Abstract
Extreme multilabel classification has emerged as an important big data research topic, with applications in the ranking and recommendation of products and items. This paper proposes a scalable hybrid distributed and shared memory implementation of extreme classification for large-scale ranking and recommendation. The implementation combines message passing across nodes using MPI with multithreading within each node using OpenMP. Expressions for communication latency and communication volume are derived, and the parallelism on shared memory architectures is analyzed using the work-span model. These results shed light on the expected scalability of similar extreme classification methods. Experiments show that the implementation is relatively fast to train and test on some large datasets, and in some cases the model size is relatively small.
Code: https://github.com/misterpawan/DXML
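The abstract refers to two standard cost models without stating them: the work-span model for shared memory parallelism and a latency/volume model for communication. The sketch below illustrates the generic textbook forms only, not the expressions derived in the paper. The class `WorkSpan`, the function `comm_cost`, and the parameters `alpha` (per-message latency) and `beta` (per-word transfer time) are hypothetical names chosen for illustration, and all numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class WorkSpan:
    """Generic work-span summary of a parallel computation (illustrative)."""
    work: float  # T_1: total operations, i.e. running time on one processor
    span: float  # T_inf: length of the critical path

    def parallelism(self) -> float:
        # Maximum useful speedup: work divided by span.
        return self.work / self.span

    def lower_bound(self, p: int) -> float:
        # Work law and span law: T_p >= max(T_1 / p, T_inf).
        return max(self.work / p, self.span)

    def greedy_upper_bound(self, p: int) -> float:
        # Bound for a greedy scheduler: T_p <= T_1 / p + T_inf.
        return self.work / p + self.span


def comm_cost(messages: int, words: int, alpha: float, beta: float) -> float:
    """Latency-bandwidth estimate: alpha per message plus beta per word sent."""
    return alpha * messages + beta * words


if __name__ == "__main__":
    ws = WorkSpan(work=1_000_000.0, span=1_000.0)  # made-up values
    for p in (1, 4, 16, 64):
        lo, hi = ws.lower_bound(p), ws.greedy_upper_bound(p)
        print(f"p={p}: running time between {lo} and {hi}")
    print("communication estimate:", comm_cost(8, 10_000, alpha=1e-6, beta=1e-9))
```

Under these assumptions, the ratio of work to span caps the speedup that adding threads on a node can deliver, while the message count and word count separate the latency-bound and bandwidth-bound parts of the cost of communicating across nodes.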
Acknowledgement
This work was done at IIIT, Hyderabad using an IIIT seed grant. The author acknowledges all the support provided by the institute. This project was partially supported by the RIPPLE center of excellence at IIIT, Hyderabad.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kumar, P. (2021). DXML: Distributed Extreme Multilabel Classification. In: Srirama, S.N., Lin, J.CW., Bhatnagar, R., Agarwal, S., Reddy, P.K. (eds) Big Data Analytics. BDA 2021. Lecture Notes in Computer Science, vol 13147. Springer, Cham. https://doi.org/10.1007/978-3-030-93620-4_22
DOI: https://doi.org/10.1007/978-3-030-93620-4_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-93619-8
Online ISBN: 978-3-030-93620-4
eBook Packages: Computer Science (R0)