Abstract
Instance ranking problems aim to recover the ordering of the instances in a data set, with applications in scientific, social and financial contexts. In this work, we study the global robustness of parametric instance ranking problems in terms of the breakdown point, which measures the fraction of samples that need to be perturbed in order to drive the estimator to unreasonable values. Existing breakdown point notions do not yet cover ranking problems. We propose to define a breakdown of the estimator as a sign reversal of all of its components, which causes the predicted ranking to be potentially completely inverted; we therefore call it the order-inversal breakdown point (OIBDP). Based on a linear model, we study the OIBDP for several carefully distinguished ranking problems and provide least favorable outlier configurations, characterizations of the OIBDP, and sharp asymptotic upper bounds. We also compute empirical OIBDPs.
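As a purely illustrative companion to the abstract (not the paper's construction), the following Python sketch estimates an empirical order-inversal breakdown fraction for an ordinary-least-squares scoring estimator under a naive case-wise outlier placement. The names `empirical_oibdp` and `fit_linear_scorer`, the contamination scheme, and the toy data are all hypothetical assumptions; the returned fraction is only an upper bound on the smallest contamination fraction that flips every coefficient sign and is not the paper's least favorable configuration.

```python
# Illustrative sketch only: empirically probe when a sign reversal of ALL
# coefficients of a linear scoring estimator (the "order-inversal" breakdown
# event) can be triggered by replacing a fraction of the samples.
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_scorer(X, y):
    """Ordinary least squares as a stand-in parametric scoring estimator."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def empirical_oibdp(X, y, max_frac=0.5, magnitude=1e6):
    """Smallest tried contamination fraction whose outliers flip every
    coefficient sign (crude upper bound; assumes non-zero clean coefficients)."""
    n, p = X.shape
    signs = np.sign(fit_linear_scorer(X, y))
    for k in range(1, int(max_frac * n) + 1):
        Xc, yc = X.copy(), y.copy()
        for i in range(k):
            # Naive placement: one huge-leverage point per coordinate, cycling
            # through the coordinates, with a response that opposes its sign.
            j = i % p
            Xc[i] = 0.0
            Xc[i, j] = magnitude * signs[j]
            yc[i] = -magnitude
        if np.all(np.sign(fit_linear_scorer(Xc, yc)) == -signs):
            return k / n
    return np.nan  # no full sign reversal found within max_frac

# Toy continuous-ranking data: responses follow a linear score of X.
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)
print("empirical OIBDP upper bound:", empirical_oibdp(X, y))
```

Because a linear scorer orders instances by X @ beta, reversing the sign of every component can completely invert the predicted ranking, which is exactly the breakdown event the OIBDP counts.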
Cite this article
Werner, T. Quantitative robustness of instance ranking problems. Ann Inst Stat Math 75, 335–368 (2023). https://doi.org/10.1007/s10463-022-00847-1