
Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches

Authors:
Himanshu Jain
Indian Institute of Technology Delhi, New Delhi, India

Venkatesh Balasubramanian
Microsoft AI & Research, Bangalore, India

Bhanu Chunduri
Microsoft AI & Research, Hyderabad, India

Manik Varma
Microsoft AI & Research, Bangalore, India
WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, January 2019, pages 528–536. https://doi.org/10.1145/3289600.3290979

Published: 30 January 2019


ABSTRACT

This paper reformulates the problem of recommending related queries on a search engine as an extreme multi-label learning task. Extreme multi-label learning aims to annotate each data point with the most relevant subset of labels from an extremely large label set. In the proposed reformulation, each of the top 100 million queries on Bing was treated as a separate label, and an extreme classifier was learnt that took the user's query as input and predicted the relevant subset of these 100 million queries as output. Unfortunately, state-of-the-art extreme classifiers have not been shown to scale beyond 10 million labels and have poor prediction accuracies for queries. This paper therefore develops the Slice algorithm, which can be accurately trained on the low-dimensional, dense deep learning features popularly used to represent queries, and which efficiently scales to 100 million labels and 240 million training points. Slice achieves this by using a novel negative sampling technique to reduce the training and prediction times from linear to logarithmic in the number of labels. This allows the proposed reformulation to address some of the limitations of traditional related-searches approaches in terms of coverage, density and quality. Experiments on publicly available extreme classification datasets with low-dimensional dense features, as well as on related-searches datasets mined from the Bing logs, revealed that Slice could be more accurate than leading extreme classifiers while also scaling to 100 million labels. Furthermore, Slice was found to improve the accuracy of recommendations by 10% compared to state-of-the-art related-searches techniques. Finally, when added to the ensemble in production in Bing, Slice was found to increase the trigger coverage by 52%, the suggestion density by 33%, the overall success rate by 2.6% and the success rate for tail queries by 12.6%. Slice's source code can be downloaded from [21].
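The abstract describes the negative-sampling idea only at a high level. The Python sketch below illustrates one way shortlist-based negative sampling can cut per-label training cost, under explicit assumptions that are not taken from the paper: each label is represented by the unit-normalised mean of its positive training points, a brute-force dot-product search stands in for an approximate nearest neighbour index such as HNSW [9, 31], and each label gets a logistic-loss linear classifier trained with plain SGD. Every function name and hyperparameter here is hypothetical, and this is not the released Slice code [21].

```python
import math

# Sketch under stated assumptions; not the released Slice implementation.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v)) or 1.0
    return [x / n for x in v]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def label_centroids(X, Y, num_labels):
    """Assumed label representation: unit-normalised mean of its positive points."""
    sums = [[0.0] * len(X[0]) for _ in range(num_labels)]
    for x, labels in zip(X, Y):
        for l in labels:
            sums[l] = [s + v for s, v in zip(sums[l], x)]
    return [normalize(s) for s in sums]

def shortlist(x, centroids, k):
    """Top-k labels by dot product with unit centroids (cosine order for a fixed x).
    Brute force is O(L); an ANNS index such as HNSW would make this sublinear."""
    return sorted(range(len(centroids)), key=lambda l: -dot(x, centroids[l]))[:k]

def sample_negatives(X, Y, centroids, k):
    """Each label keeps its positives plus only the points that shortlisted it wrongly."""
    pos = {l: [] for l in range(len(centroids))}
    neg = {l: [] for l in range(len(centroids))}
    for i, (x, labels) in enumerate(zip(X, Y)):
        for l in labels:
            pos[l].append(i)
        for l in shortlist(x, centroids, k):
            if l not in labels:
                neg[l].append(i)
    return pos, neg

def train_label(X, pos_idx, neg_idx, epochs=200, lr=0.5, reg=1e-3):
    """One linear one-vs-all classifier: logistic loss, plain SGD (an assumed choice)."""
    w, b = [0.0] * len(X[0]), 0.0
    data = [(i, 1.0) for i in pos_idx] + [(i, 0.0) for i in neg_idx]
    for _ in range(epochs):
        for i, y in data:
            g = sigmoid(dot(w, X[i]) + b) - y  # d(log-loss)/d(score)
            w = [wj - lr * (g * xj + reg * wj) for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b

def predict(x, centroids, models, k, top=1):
    """Evaluate only the k shortlisted classifiers instead of all L of them."""
    scored = [(sigmoid(dot(models[l][0], x) + models[l][1]), l)
              for l in shortlist(x, centroids, k)]
    return sorted(scored, reverse=True)[:top]

# Toy data: 3 labels, 2-D dense features, one label per point.
X = [[1.0, 0.1], [0.9, -0.1], [0.1, 1.0], [-0.1, 0.9], [-1.0, 0.1], [-0.9, -0.1]]
Y = [{0}, {0}, {1}, {1}, {2}, {2}]
centroids = label_centroids(X, Y, num_labels=3)
pos, neg = sample_negatives(X, Y, centroids, k=2)
models = {l: train_label(X, pos[l], neg[l]) for l in range(3)}
print(neg)  # 6 sampled negatives in total instead of 12 under full one-vs-all
print(predict([0.95, 0.0], centroids, models, k=2, top=2))
```

On this toy data the three labels together see 6 sampled negatives rather than the 12 that full one-vs-all training would use, and prediction evaluates only k = 2 of the 3 classifiers. The brute-force `shortlist` above is still linear in the number of labels; swapping it for a sublinear ANNS index is what would bring the per-point cost close to the logarithmic scaling the abstract describes.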

References

R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. 2013. Multi-label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages. In WWW.
A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt. 2015. Practical and optimal LSH for angular distance. In NIPS.
R. Babbar and B. Schölkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In WSDM.
R. Baeza-Yates, C. Hurtado, and M. Mendoza. 2004. Query recommendation using query logs in search engines. In ICDT.
K. Bhatia, K. Dahiya, H. Jain, Y. Prabhu, and M. Varma. 2014. The Extreme Classification Repository. http://manikvarma.org/downloads/XC/XMLRepository.html
K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In NIPS.
P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. 2008. The query-flow graph: model and applications. In CIKM.
F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, and R. Venturini. 2012. Efficient query recommendations in the long tail via center-piece subgraphs. In SIGIR.
L. Boytsov. 2016. Code for HNSW. https://github.com/searchivarius/nmslib
L. Cayton. 2008. Fast nearest neighbor retrieval for Bregman divergences. In ICML.
Y. N. Chen and H. T. Lin. 2012. Feature-aware Label Space Dimension Reduction for Multi-label Classification. In NIPS.
M. Cissé, N. Usunier, T. Artières, and P. Gallinari. 2013. Robust Bloom Filters for Large MultiLabel Classification Tasks. In NIPS.
M. Dehghani, S. Rothe, E. Alfonseca, and P. Fleury. 2017. Learning to attend, copy, and generate for session-based query suggestion. In CIKM.
F. Diaz, B. Mitra, and N. Craswell. 2016. Query expansion with locally-trained word embeddings. CoRR (2016). https://arxiv.org/abs/1605.07891
R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR (2008).
H. B. Hashemi, A. Asiaee, and R. Kraft. 2016. Query intent detection using convolutional neural networks. In WSDM.
D. Hsu, S. Kakade, J. Langford, and T. Zhang. 2009. Multi-Label Prediction via Compressed Sensing. In NIPS.
P. S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM.
P. Indyk and R. Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality.
A. Jain, U. Ozertem, and E. Velipasaoglu. 2011. Synthesizing high utility suggestions for rare web search queries. In SIGIR.
H. Jain, V. Balasubramanian, B. Chunduri, and M. Varma. 2019. Code for Slice. http://manikvarma.org/code/Slice/download.html
H. Jain, Y. Prabhu, and M. Varma. 2016. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications. In KDD.
K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. 2016. Extreme F-measure Maximization Using Sparse Probability Estimates. In ICML, 1435–1444.
R. Jones, B. Rey, O. Madani, and W. Greiner. 2006. Generating query substitutions. In WWW.
A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2017. Bag of tricks for efficient text classification. In EACL.
E. Kushilevitz, R. Ostrovsky, and Y. Rabani. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces.
W. Li, Y. Zhang, Y. Sun, W. Wang, W. Zhang, and X. Lin. 2016. Approximate Nearest Neighbor Search on High Dimensional Data: Experiments, Analyses, and Improvement. CoRR (2016). https://arxiv.org/pdf/1610.02455.pdf
Z. Lin, G. Ding, M. Hu, and J. Wang. 2014. Multi-label Classification via Feature-aware Implicit Label Space Encoding. In ICML.
J. Liu, W. C. Chang, Y. Wu, and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR.
Z. Lu, B. Savas, W. Tang, and I. S. Dhillon. 2010. Supervised link prediction using multiple sources.
Y. Malkov and D. A. Yashunin. 2016. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. CoRR (2016).
J. McAuley and J. Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys.
Q. Mei, D. Zhou, and K. Church. 2008. Query Suggestion Using Hitting Time. In CIKM.
E. L. Mencia and J. Fürnkranz. 2008. Efficient pairwise multilabel classification for large-scale problems in the legal domain. In SIGIR.
A. K. Menon and C. Elkan. 2011. Link prediction via matrix factorization. In ECML-PKDD.
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
P. Mineiro and N. Karampatziakis. 2015. Fast Label Embeddings for Extremely Large Output Spaces. In ECML.
F. Mordelet and J. P. Vert. 2014. A bagging SVM to learn from positive and unlabeled examples. (2014).
G. Navarro. 2002. Searching in metric spaces by spatial approximation.
A. Ng and M. Jordan. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS.
A. Niculescu-Mizil and E. Abbasnejad. 2017. Label Filters for Large Scale Multilabel Classification. In AISTATS.
K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. (2000).
U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. 2012. Learning to suggest: a machine learning framework for ranking query suggestions. In SIGIR.
J. Pennington, R. Socher, and C. Manning. 2014. GloVe: Global vectors for word representation.
J. Platt. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers.
Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal, and M. Varma. 2018a. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM.
Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma. 2018b. Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising. In WWW.
Y. Prabhu and M. Varma. 2014. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In KDD.
R. Ramanath, G. Polatkan, L. Xu, H. Lee, B. Hu, and S. Zhou. 2018. Deploying Deep Ranking Models for Search Verticals. CoRR (2018). https://arxiv.org/abs/1806.02281
S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In AUAI.
E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. 2010. Clustering query refinements by user intent. In WWW.
Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In WWW.
S. Si, H. Zhang, S. S. Keerthi, D. Mahajan, I. S. Dhillon, and C. J. Hsieh. 2017. Gradient Boosted Decision Trees for High Dimensional Sparse Output. In ICML, 3182–3190.
W. Siblini, F. Meyer, and P. Kuntz. 2018. CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. In ICML.
A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Simonsen, and J. Y. Nie. 2015. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In CIKM.
S. Sra. 2012. A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of Is(x). (2012).
Y. Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD.
J. Uhlmann. 1991. Satisfying general proximity similarity queries with metric trees. (1991).
H. Vahabi, M. Ackerman, D. Loker, R. Baeza-Yates, and A. Lopez-Ortiz. 2013. Orthogonal query recommendation. In RecSys.
J. Weston, S. Bengio, and N. Usunier. 2011. Wsabie: Scaling Up To Large Vocabulary Image Annotation. In IJCAI.
J. Weston, A. Makadia, and H. Yee. 2013. Label Partitioning For Sublinear Ranking. In ICML.
S. H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. 2011. Like like alike: joint friendship and interest propagation in social networks. In WWW.
I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. Dhillon, and E. Xing. 2017. PPDsparse: A Parallel Primal-Dual Sparse Method for Extreme Classification. In KDD, 545–553.
I. E. H. Yen, X. Huang, P. Ravikumar, K. Zhong, and I. S. Dhillon. 2016. PD-Sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In ICML.
I. E. H. Yen, S. Kale, F. Yu, D. Holtmann-Rice, S. Kumar, and P. Ravikumar. 2018. Loss Decomposition for Fast Learning in Large Output Spaces. In ICML.
P. N. Yianilos. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces.
H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. 2014. Large-scale Multi-label Learning with Missing Labels. In ICML.
W. Zhang, L. Wang, J. Yan, X. Wang, and H. Zha. 2017. Deep Extreme Multi-label Learning. CoRR (2017).
Y. Zhang and J. G. Schneider. 2011. Multi-Label Output Codes using Canonical Correlation Analysis. In AISTATS.

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification


Published in
WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
January 2019
874 pages
ISBN:9781450359405
DOI:10.1145/3289600
General Chairs:
J. Shane Culpepper, RMIT University
Alistair Moffat, The University of Melbourne
Program Chairs:
Paul N. Bennett, Microsoft
Kristina Lerman, University of Southern California
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Best Paper
Author Tags
extreme multi-label learning
large-scale learning
negative sampling
query recommendation

Acceptance Rates
WSDM '19 paper acceptance rate: 84 of 511 submissions, 16%
Overall acceptance rate: 498 of 2,863 submissions, 17%