ABSTRACT
This paper reformulates the problem of recommending related queries on a search engine as an extreme multi-label learning task. Extreme multi-label learning aims to annotate each data point with the most relevant subset of labels from an extremely large label set. In the proposed reformulation, each of the top 100 million queries on Bing was treated as a separate label, and an extreme classifier was learnt which took the user's query as input and predicted the relevant subset of the 100 million queries as output. Unfortunately, state-of-the-art extreme classifiers have not been shown to scale beyond 10 million labels and have poor prediction accuracies for queries. This paper therefore develops the Slice algorithm, which can be accurately trained on the low-dimensional, dense deep learning features popularly used to represent queries and which efficiently scales to 100 million labels and 240 million training points. Slice achieves this by using a novel negative sampling technique to reduce the training and prediction times from linear to logarithmic in the number of labels. This allows the proposed reformulation to address some of the limitations of traditional related searches approaches in terms of coverage, density and quality. Experiments on publicly available extreme classification datasets with low-dimensional dense features, as well as on related searches datasets mined from the Bing logs, revealed that Slice could be more accurate than leading extreme classifiers while also scaling to 100 million labels. Furthermore, Slice was found to improve the accuracy of recommendations by 10% as compared to state-of-the-art related searches techniques. Finally, when added to the ensemble in production in Bing, Slice was found to increase the trigger coverage by 52%, the suggestion density by 33%, the overall success rate by 2.6% and the success rate for tail queries by 12.6%. Slice's source code can be downloaded from [21].
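The abstract does not spell out the negative sampling technique, so the following is only a rough sketch of the general idea that a label's classifier can be trained on a small set of hard negatives instead of on every training point. It assumes, purely for illustration, that a label is summarised by the mean of its positives' unit-normalized dense features and that the k most similar non-positive points are kept as negatives. The scan below is brute force and therefore linear in the number of points; a logarithmic cost would need an approximate nearest neighbour index, which is omitted here. All function names and the toy data are hypothetical and are not taken from the Slice code.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def label_centroid(features, labels, label):
    """Mean of the unit-normalized features of the label's positive points.

    Assumes the label has at least one positive point.
    """
    positives = [normalize(x) for x, ys in zip(features, labels) if label in ys]
    dim = len(features[0])
    return [sum(p[d] for p in positives) / len(positives) for d in range(dim)]

def hard_negatives(features, labels, label, k):
    """Indices of the k non-positive points most similar to the label centroid."""
    centroid = label_centroid(features, labels, label)
    scored = []
    for i, (x, ys) in enumerate(zip(features, labels)):
        if label not in ys:  # only points that do not carry this label
            sim = sum(a * b for a, b in zip(normalize(x), centroid))
            scored.append((sim, i))
    scored.sort(reverse=True)  # most similar (hardest) negatives first
    return [i for _, i in scored[:k]]

# Hypothetical toy data: five points with 2-D dense features and two labels.
FEATURES = [[1.0, 0.0], [0.9, 0.1], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
LABELS = [{0}, {0}, {1}, {1}, {1}]

if __name__ == "__main__":
    # Label 0 would be trained on its 2 positives plus only 2 negatives.
    print(hard_negatives(FEATURES, LABELS, label=0, k=2))
```

With such a scheme the per-label training set has size (number of positives + k) rather than the full number of training points, which is where the savings would come from at the scale described above.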
- R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma. 2013. Multi-label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages. In WWW.
- A. Andoni, P. Indyk, T. Laarhoven, I. Razenshteyn, and L. Schmidt. 2015. Practical and optimal LSH for angular distance. In NIPS.
- R. Babbar and B. Schölkopf. 2017. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In WSDM.
- R. Baeza-Yates, C. Hurtado, and M. Mendoza. 2004. Query recommendation using query logs in search engines. In ICDT.
- K. Bhatia, K. Dahiya, H. Jain, Y. Prabhu, and M. Varma. 2014. The Extreme Classification Repository. http://manikvarma.org/downloads/XC/XMLRepository.html
- K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain. 2015. Sparse Local Embeddings for Extreme Multi-label Classification. In NIPS.
- P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, and S. Vigna. 2008. The query-flow graph: model and applications. In CIKM.
- F. Bonchi, R. Perego, F. Silvestri, H. Vahabi, and R. Venturini. 2012. Efficient query recommendations in the long tail via center-piece subgraphs. In SIGIR.
- L. Boytsov. 2016. Code for HNSW. https://github.com/searchivarius/nmslib
- L. Cayton. 2008. Fast nearest neighbor retrieval for Bregman divergences. In ICML.
- Y. N. Chen and H. T. Lin. 2012. Feature-aware Label Space Dimension Reduction for Multi-label Classification. In NIPS.
- M. Cissé, N. Usunier, T. Artières, and P. Gallinari. 2013. Robust Bloom Filters for Large MultiLabel Classification Tasks. In NIPS.
- M. Dehghani, S. Rothe, E. Alfonseca, and P. Fleury. 2017. Learning to attend, copy, and generate for session-based query suggestion. In CIKM.
- F. Diaz, B. Mitra, and N. Craswell. 2016. Query expansion with locally-trained word embeddings. CoRR (2016). https://arxiv.org/abs/1605.07891
- R. E. Fan, K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. 2008. LIBLINEAR: A library for large linear classification. JMLR (2008).
- H. B. Hashemi, A. Asiaee, and R. Kraft. 2016. Query intent detection using convolutional neural networks. In WSDM.
- D. Hsu, S. Kakade, J. Langford, and T. Zhang. 2009. Multi-Label Prediction via Compressed Sensing. In NIPS.
- P. S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In CIKM.
- P. Indyk and R. Motwani. 1998. Approximate nearest neighbors: towards removing the curse of dimensionality.
- A. Jain, U. Ozertem, and E. Velipasaoglu. 2011. Synthesizing high utility suggestions for rare web search queries. In SIGIR.
- H. Jain, V. Balasubramanian, B. Chunduri, and M. Varma. 2019. Code for Slice. http://manikvarma.org/code/Slice/download.html
- H. Jain, Y. Prabhu, and M. Varma. 2016. Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications. In KDD.
- K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx, and E. Hüllermeier. 2016. Extreme F-measure Maximization Using Sparse Probability Estimates. In ICML. 1435-1444.
- R. Jones, B. Rey, O. Madani, and W. Greiner. 2006. Generating query substitutions. In WWW.
- A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov. 2017. Bag of tricks for efficient text classification. In EACL.
- E. Kushilevitz, R. Ostrovsky, and Y. Rabani. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces.
- W. Li, Y. Zhang, Y. Sun, W. Wang, W. Zhang, and X. Lin. 2016. Approximate Nearest Neighbor Search on High Dimensional Data: Experiments, Analyses, and Improvement. CoRR (2016). https://arxiv.org/pdf/1610.02455.pdf
- Z. Lin, G. Ding, M. Hu, and J. Wang. 2014. Multi-label Classification via Feature-aware Implicit Label Space Encoding. In ICML.
- J. Liu, W. C. Chang, Y. Wu, and Y. Yang. 2017. Deep Learning for Extreme Multi-label Text Classification. In SIGIR.
- Z. Lu, B. Savas, W. Tang, and I. S. Dhillon. 2010. Supervised link prediction using multiple sources.
- Y. Malkov and D. A. Yashunin. 2016. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. CoRR (2016).
- J. McAuley and J. Leskovec. 2013. Hidden factors and hidden topics: understanding rating dimensions with review text. In RecSys.
- Q. Mei, D. Zhou, and K. Church. 2008. Query Suggestion Using Hitting Time. In CIKM.
- E. L. Mencia and J. Fürnkranz. 2008. Efficient pairwise multilabel classification for large-scale problems in the legal domain. In SIGIR.
- A. K. Menon and C. Elkan. 2011. Link prediction via matrix factorization. In ECML-PKDD.
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
- P. Mineiro and N. Karampatziakis. 2015. Fast Label Embeddings for Extremely Large Output Spaces. In ECML.
- F. Mordelet and J. P. Vert. 2014. A bagging SVM to learn from positive and unlabeled examples. (2014).
- G. Navarro. 2002. Searching in metric spaces by spatial approximation.
- A. Ng and M. Jordan. 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In NIPS.
- A. Niculescu-Mizil and E. Abbasnejad. 2017. Label Filters for Large Scale Multilabel Classification. In AISTATS.
- K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell. 2000. Text classification from labeled and unlabeled documents using EM. (2000).
- U. Ozertem, O. Chapelle, P. Donmez, and E. Velipasaoglu. 2012. Learning to suggest: a machine learning framework for ranking query suggestions. In SIGIR.
- J. Pennington, R. Socher, and C. Manning. 2014. GloVe: Global vectors for word representation.
- J. Platt. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in large margin classifiers.
- Y. Prabhu, A. Kag, S. Gopinath, K. Dahiya, S. Harsola, R. Agrawal, and M. Varma. 2018a. Extreme multi-label learning with label features for warm-start tagging, ranking and recommendation. In WSDM.
- Y. Prabhu, A. Kag, S. Harsola, R. Agrawal, and M. Varma. 2018b. Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising. In WWW.
- Y. Prabhu and M. Varma. 2014. FastXML: A fast, accurate and stable tree-classifier for extreme multi-label learning. In KDD.
- R. Ramanath, G. Polatkan, L. Xu, H. Lee, B. Hu, and S. Zhou. 2018. Deploying Deep Ranking Models for Search Verticals. CoRR (2018). https://arxiv.org/abs/1806.02281
- S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In AUAI.
- E. Sadikov, J. Madhavan, L. Wang, and A. Halevy. 2010. Clustering query refinements by user intent. In WWW.
- Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil. 2014. Learning semantic representations using convolutional neural networks for web search. In WWW.
- S. Si, H. Zhang, S. S. Keerthi, D. Mahajan, I. S. Dhillon, and C. J. Hsieh. 2017. Gradient Boosted Decision Trees for High Dimensional Sparse Output. In ICML. 3182-3190.
- W. Siblini, F. Meyer, and P. Kuntz. 2018. CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning. In ICML.
- A. Sordoni, Y. Bengio, H. Vahabi, C. Lioma, J. Grue Simonsen, and J. Y. Nie. 2015. A hierarchical recurrent encoder-decoder for generative context-aware query suggestion. In CIKM.
- S. Sra. 2012. A short note on parameter approximation for von Mises-Fisher distributions: and a fast implementation of Is(x). (2012).
- Y. Tagami. 2017. AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification. In KDD.
- J. Uhlmann. 1991. Satisfying general proximity similarity queries with metric trees. (1991).
- H. Vahabi, M. Ackerman, D. Loker, R. Baeza-Yates, and A. Lopez-Ortiz. 2013. Orthogonal query recommendation. In RecSys.
- J. Weston, S. Bengio, and N. Usunier. 2011. Wsabie: Scaling Up To Large Vocabulary Image Annotation. In IJCAI.
- J. Weston, A. Makadia, and H. Yee. 2013. Label Partitioning For Sublinear Ranking. In ICML.
- S. H. Yang, B. Long, A. Smola, N. Sadagopan, Z. Zheng, and H. Zha. 2011. Like like alike: joint friendship and interest propagation in social networks. In WWW.
- I. E. H. Yen, X. Huang, W. Dai, P. Ravikumar, I. Dhillon, and E. Xing. 2017. PPDsparse: A Parallel Primal-Dual Sparse Method for Extreme Classification. In KDD. 545-553.
- I. E. H. Yen, X. Huang, P. Ravikumar, K. Zhong, and I. S. Dhillon. 2016. PD-Sparse: A primal and dual sparse approach to extreme multiclass and multilabel classification. In ICML.
- I. E. H. Yen, S. Kale, F. Yu, D. Holtmann-Rice, S. Kumar, and P. Ravikumar. 2018. Loss Decomposition for Fast Learning in Large Output Spaces. In ICML.
- P. N. Yianilos. 1993. Data structures and algorithms for nearest neighbor search in general metric spaces.
- H. F. Yu, P. Jain, P. Kar, and I. S. Dhillon. 2014. Large-scale Multi-label Learning with Missing Labels. In ICML.
- W. Zhang, L. Wang, J. Yan, X. Wang, and H. Zha. 2017. Deep Extreme Multi-label Learning. CoRR (2017).
- Y. Zhang and J. G. Schneider. 2011. Multi-Label Output Codes using Canonical Correlation Analysis. In AISTATS.
Index Terms
- Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches