DOI: 10.1145/3289600.3290979
research-article
Best Paper

Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches

Published: 30 January 2019

ABSTRACT

This paper reformulates the problem of recommending related queries on a search engine as an extreme multi-label learning task. Extreme multi-label learning aims to annotate each data point with the most relevant subset of labels from an extremely large label set. In the proposed reformulation, each of the top 100 million queries on Bing was treated as a separate label, and an extreme classifier was learnt that took the user's query as input and predicted the relevant subset of these 100 million queries as output. Unfortunately, state-of-the-art extreme classifiers have not been shown to scale beyond 10 million labels and have poor prediction accuracies for queries. This paper therefore develops the Slice algorithm, which can be accurately trained on the low-dimensional, dense deep learning features popularly used to represent queries and which efficiently scales to 100 million labels and 240 million training points. Slice achieves this through a novel negative sampling technique that reduces the training and prediction times from linear to logarithmic in the number of labels. This allows the proposed reformulation to address some of the limitations of traditional related searches approaches in terms of coverage, density and quality. Experiments on publicly available extreme classification datasets with low-dimensional dense features, as well as on related searches datasets mined from the Bing logs, revealed that Slice could be more accurate than leading extreme classifiers while also scaling to 100 million labels. Furthermore, Slice was found to improve the accuracy of recommendations by 10% as compared to state-of-the-art related searches techniques. Finally, when added to the ensemble in production in Bing, Slice was found to increase the trigger coverage by 52%, the suggestion density by 33%, the overall success rate by 2.6% and the success rate for tail queries by 12.6%. Slice's source code can be downloaded from [21].
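The negative sampling idea summarized above can be made concrete with a toy sketch. It assumes, as one plausible design rather than the authors' exact method, that each label is represented by the normalized mean feature vector of its positive training points, that each label's binary classifier is trained only on its positives plus the "hard" negatives whose shortlist retrieves that label, and that prediction scores only the shortlisted labels. The helper names (`label_centroids`, `shortlist`, `train_slice`, and so on) are hypothetical, and a brute-force search stands in for an approximate nearest-neighbour index such as HNSW [9, 31], so this toy version is still linear in the number of labels; the logarithmic cost would come from swapping `shortlist` for such an index. For scale: with 10^8 labels, one-vs-all prediction evaluates 10^8 classifiers per query, whereas a cost logarithmic in the label count grows only like log2(10^8) ≈ 27, up to constant factors.

```python
import math
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid dividing by zero for labels with no positives
    return [x / norm for x in v]


def label_centroids(X, Y, num_labels):
    """Hypothetical label representation: normalized mean of each label's positive points."""
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(num_labels)]
    for x, labels in zip(X, Y):
        for l in labels:
            for j, v in enumerate(x):
                sums[l][j] += v
    return [normalize(s) for s in sums]


def shortlist(x, centroids, k):
    """Brute-force stand-in for an ANN index (e.g. HNSW): the k labels whose
    centroids are most similar to x. A real index would make this sublinear."""
    return sorted(range(len(centroids)), key=lambda l: -dot(x, centroids[l]))[:k]


def train_label(pos, neg, dim, epochs=50, lr=0.5):
    """Binary logistic regression for one label, trained by plain SGD."""
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    for _ in range(epochs):
        for x, target in data:
            z = max(-30.0, min(30.0, dot(w, x) + b))  # clamp so exp() stays finite
            grad = 1.0 / (1.0 + math.exp(-z)) - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def train_slice(X, Y, num_labels, k=2):
    dim = len(X[0])
    centroids = label_centroids(X, Y, num_labels)
    pos, neg = defaultdict(list), defaultdict(list)
    for x, labels in zip(X, Y):
        for l in labels:
            pos[l].append(x)
        # Hard negatives: shortlisted labels that x is NOT tagged with, so each
        # label sees only a few confusable points instead of every training point.
        for l in shortlist(x, centroids, k):
            if l not in labels:
                neg[l].append(x)
    models = {l: train_label(pos[l], neg[l], dim) for l in range(num_labels)}
    return centroids, models


def predict(x, centroids, models, k=2):
    """Score only the k shortlisted labels, highest score first."""
    scored = [(l, dot(models[l][0], x) + models[l][1]) for l in shortlist(x, centroids, k)]
    return sorted(scored, key=lambda t: -t[1])


# Toy data: 2-D dense features, 3 labels.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
Y = [{0}, {0}, {1}, {1}, {2}]
centroids, models = train_slice(X, Y, num_labels=3)
print(shortlist([0.95, 0.05], centroids, k=2))  # [0, 2]
print(predict([0.95, 0.05], centroids, models))
```

Because the query [0.95, 0.05] lies far from label 1's centroid, label 1 is never shortlisted and its classifier is never evaluated; only labels 0 and 2 are scored.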

References

  1. R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. 2013. Multi-label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages. In WWW.
  2. A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt. 2015. Practical and optimal LSH for angular distance. In NIPS.
  3. R. Babbar and B. Schölkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In WSDM.
  4. R. Baeza-Yates, C. Hurtado, and M. Mendoza. 2004. Query recommendation using query logs in search engines. In ICDT.
  5. K. Bhatia, K. Dahiya, H. Jain, Y. Prabhu, and M. Varma. 2014. The Extreme Classification Repository. http://manikvarma.org/downloads/XC/XMLRepository.html.
  6. K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In NIPS.
  7. P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. 2008. The query-flow graph: model and applications. In CIKM.
  8. F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, and R. Venturini. 2012. Efficient query recommendations in the long tail via center-piece subgraphs. In SIGIR.
  9. L. Boytsov. 2016. Code for HNSW. https://github.com/searchivarius/nmslib.
  10. L. Cayton. 2008. Fast nearest neighbor retrieval for Bregman divergences. In ICML.
  11. Y. N. Chen and H. T. Lin. 2012. Feature-aware Label Space Dimension Reduction for Multi-label Classification. In NIPS.
  12. M. Cissé, N. Usunier, T. Artières, and P. Gallinari. 2013. Robust Bloom Filters for Large MultiLabel Classification Tasks. In NIPS.
  13. M. Dehghani, S. Rothe, E. Alfonseca, and P. Fleury. 2017. Learning to attend, copy, and generate for session-based query suggestion. In CIKM.
  14. F. Diaz, B. Mitra, and N. Craswell. 2016. Query expansion with locally-trained word embeddings. CoRR (2016). https://arxiv.org/abs/1605.07891
  15. R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR (2008).
  16. H. B. Hashemi, A. Asiaee, and R. Kraft. 2016. Query intent detection using convolutional neural networks. In WSDM.
  17. D. Hsu, S. Kakade, J. Langford, and T. Zhang. 2009. Multi-Label Prediction via Compressed Sensing. In NIPS.
  18. P. S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM.
  19. P. Indyk and R. Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality.
  20. A. Jain, U. Ozertem, and E. Velipasaoglu. 2011. Synthesizing high utility suggestions for rare web search queries. In SIGIR.
  21. H. Jain, V. Balasubramanian, B. Chunduri, and M. Varma. 2019. Code for Slice. http://manikvarma.org/code/Slice/download.html.
  22. H. Jain, Y. Prabhu, and M. Varma. 2016. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications. In KDD.
  23. K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. 2016. Extreme F-measure Maximization Using Sparse Probability Estimates. In ICML. 1435–1444.
  24. R. Jones, B. Rey, O. Madani, and W. Greiner. 2006. Generating query substitutions. In WWW.
  25. A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2017. Bag of tricks for efficient text classification. In EACL.
  26. E. Kushilevitz, R. Ostrovsky, and Y. Rabani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality.
  27. W. Li, Y. Zhang, Y. Sun, W. Wang, W. Zhang, and X. Lin. 2016. Approximate Nearest Neighbor Search on High Dimensional Data: Experiments, Analyses, and Improvement. CoRR (2016). https://arxiv.org/pdf/1610.02455.pdf
  28. Z. Lin, G. Ding, M. Hu, and J. Wang. 2014. Multi-label Classification via Feature-aware Implicit Label Space Encoding. In ICML.
  29. J. Liu, W. C. Chang, Y. Wu, and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR.
  30. Z. Lu, B. Savas, W. Tang, and I. S. Dhillon. 2010. Supervised link prediction using multiple sources.
  31. Y. Malkov and D. A. Yashunin. 2016. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. CoRR (2016).
  32. J. McAuley and J. Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys.
  33. Q. Mei, D. Zhou, and K. Church. 2008. Query Suggestion Using Hitting Time. In CIKM.
  34. E. L. Mencia and J. Fürnkranz. 2008. Efficient pairwise multilabel classification for large-scale problems in the legal domain. In SIGIR.
  35. A. K. Menon and C. Elkan. 2011. Link prediction via matrix factorization. In ECML-PKDD.
  36. T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
  37. P. Mineiro and N. Karampatziakis. 2015. Fast Label Embeddings for Extremely Large Output Spaces. In ECML.
  38. F. Mordelet and J. P. Vert. 2014. A bagging SVM to learn from positive and unlabeled examples.
  39. G. Navarro. 2002. Searching in metric spaces by spatial approximation.
  40. A. Ng and M. Jordan. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS.
  41. A. Niculescu-Mizil and E. Abbasnejad. 2017. Label Filters for Large Scale Multilabel Classification. In AISTATS.
  42. K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell. 2000. Text classification from labeled and unlabeled documents using EM.
  43. U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. 2012. Learning to suggest: a machine learning framework for ranking query suggestions. In SIGIR.
  44. J. Pennington, R. Socher, and C. Manning. 2014. GloVe: Global vectors for word representation.
  45. J. Platt. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers.
  46. Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal, and M. Varma. 2018a. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM.
  47. Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma. 2018b. Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising. In WWW.
  48. Y. Prabhu and M. Varma. 2014. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In KDD.
  49. R. Ramanath, G. Polatkan, L. Xu, H. Lee, B. Hu, and S. Zhou. 2018. Deploying Deep Ranking Models for Search Verticals. CoRR (2018). https://arxiv.org/abs/1806.02281
  50. S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In AUAI.
  51. E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. 2010. Clustering query refinements by user intent. In WWW.
  52. Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In WWW.
  53. S. Si, H. Zhang, S. S. Keerthi, D. Mahajan, I. S. Dhillon, and C. J. Hsieh. 2017. Gradient Boosted Decision Trees for High Dimensional Sparse Output. In ICML. 3182–3190.
  54. W. Siblini, F. Meyer, and P. Kuntz. 2018. CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. In ICML.
  55. A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Simonsen, and J. Y. Nie. 2015. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In CIKM.
  56. S. Sra. 2012. A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of I_s(x).
  57. Y. Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD.
  58. J. Uhlmann. 1991. Satisfying general proximity/similarity queries with metric trees.
  59. H. Vahabi, M. Ackerman, D. Loker, R. Baeza-Yates, and A. Lopez-Ortiz. 2013. Orthogonal query recommendation. In RecSys.
  60. J. Weston, S. Bengio, and N. Usunier. 2011. Wsabie: Scaling Up To Large Vocabulary Image Annotation. In IJCAI.
  61. J. Weston, A. Makadia, and H. Yee. 2013. Label Partitioning For Sublinear Ranking. In ICML.
  62. S. H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. 2011. Like like alike: joint friendship and interest propagation in social networks. In WWW.
  63. I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. Dhillon, and E. Xing. 2017. PPDsparse: A Parallel Primal-Dual Sparse Method for Extreme Classification. In KDD. 545–553.
  64. I. E. H. Yen, X. Huang, P. Ravikumar, K. Zhong, and I. S. Dhillon. 2016. PD-Sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In ICML.
  65. I. E. H. Yen, S. Kale, F. Yu, D. Holtmann-Rice, S. Kumar, and P. Ravikumar. 2018. Loss Decomposition for Fast Learning in Large Output Spaces. In ICML.
  66. P. N. Yianilos. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces.
  67. H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. 2014. Large-scale Multi-label Learning with Missing Labels. In ICML.
  68. W. Zhang, L. Wang, J. Yan, X. Wang, and H. Zha. 2017. Deep Extreme Multi-label Learning. CoRR (2017).
  69. Y. Zhang and J. G. Schneider. 2011. Multi-Label Output Codes using Canonical Correlation Analysis. In AISTATS.

    Published in

      ACM Conferences
      WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
      January 2019
      874 pages
      ISBN:9781450359405
      DOI:10.1145/3289600

      Copyright © 2019 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States




      Acceptance Rates

      WSDM '19 Paper Acceptance Rate: 84 of 511 submissions, 16%. Overall Acceptance Rate: 498 of 2,863 submissions, 17%.
