Ordinal Quantification Through Regularization

Bunse, Mirko; Moreo, Alejandro; Sebastiani, Fabrizio; Senz, Martin

doi:10.1007/978-3-031-26419-1_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13717)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Quantification, i.e., the task of training predictors of the class prevalence values in sets of unlabelled data items, has received increased attention in recent years. However, most quantification research has concentrated on developing algorithms for binary and multiclass problems in which the classes are not ordered. Here, we study the ordinal case, i.e., the case in which a total order is defined on the set of \(n>2\) classes. We make three main contributions to this field. First, we create and make available two datasets for ordinal quantification (OQ) research that overcome the inadequacies of the previously available ones. Second, we experimentally compare the most important OQ algorithms proposed in the literature so far. To this end, we bring together algorithms proposed by authors from very different research fields, who were unaware of each other’s developments. Third, we propose three OQ algorithms, based on the idea of preventing ordinally implausible estimates through regularization. Our experiments show that these algorithms outperform the existing ones if the ordinal plausibility assumption holds.
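The idea of regularizing against ordinally implausible estimates can be illustrated with a small sketch. The code below is not the authors' implementation. It assumes a hypothetical setting in which a matrix M with entries M[i, j] = P(predicted class i | true class j) and a vector q of predicted-class frequencies are given, and it estimates the prevalence vector p by least squares with a Tikhonov-style penalty on the second differences of p across neighbouring classes. The function names, the penalty, and the clip-and-renormalize step are all assumptions made for illustration.

```python
import numpy as np


def second_difference_matrix(n):
    """Return the (n-2) x n matrix T whose rows are (1, -2, 1), so that
    T @ p measures how sharply p bends between neighbouring classes."""
    T = np.zeros((n - 2, n))
    for i in range(n - 2):
        T[i, i:i + 3] = [1.0, -2.0, 1.0]
    return T


def curvature(p):
    """Squared norm of the second differences of p (illustrative penalty)."""
    T = second_difference_matrix(len(p))
    return float(np.sum((T @ p) ** 2))


def regularized_prevalences(M, q, tau=0.0):
    """Minimize ||q - M p||^2 + tau * ||T p||^2 in closed form.

    The unconstrained minimizer is mapped back to a probability vector
    heuristically: negative entries are clipped, then p is renormalized.
    """
    n = M.shape[1]
    T = second_difference_matrix(n)
    A = M.T @ M + tau * (T.T @ T)      # normal equations with penalty
    p = np.linalg.solve(A, M.T @ q)
    p = np.clip(p, 0.0, None)
    return p / p.sum()


if __name__ == "__main__":
    # Hypothetical 5-class example: errors only hit adjacent classes.
    M = np.array([
        [0.8, 0.2, 0.0, 0.0, 0.0],
        [0.2, 0.6, 0.2, 0.0, 0.0],
        [0.0, 0.2, 0.6, 0.2, 0.0],
        [0.0, 0.0, 0.2, 0.6, 0.2],
        [0.0, 0.0, 0.0, 0.2, 0.8],
    ])
    p_true = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
    noise = np.array([0.01, -0.01, 0.01, -0.01, 0.0])
    q = M @ p_true + noise
    for tau in (0.0, 0.1, 10.0):
        p_hat = regularized_prevalences(M, q, tau)
        print(tau, np.round(p_hat, 3), round(curvature(p_hat), 4))
```

In this sketch, tau = 0 reduces to an unregularized least-squares correction, while larger values of tau trade fidelity to q for smoother estimates over the class order. Such a penalty only helps when the true prevalences are themselves smooth along the order, which mirrors the ordinal plausibility assumption stated in the abstract.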

Notes

1. Code and supplementary results: https://github.com/mirkobunse/ecml22
2. http://jmcauley.ucsd.edu/data/amazon/links.html
3. https://huggingface.co/docs/transformers/model_doc/roberta
4. https://factdata.app.tu-dortmund.de/
5. https://github.com/fact-project/open_crab_sample_analysis/

Acknowledgments

The work by M.B., A.M., and F.S. has been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 871042 (SoBigData++). M.B. and M.S. have further been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Data Analysis”, project C3, https://sfb876.tu-dortmund.de. A.M. and F.S. have further been supported by the AI4Media project, funded by the European Commission (Grant 951911) under the H2020 Programme ICT-48-2020. The authors’ opinions do not necessarily reflect those of the European Commission.

Author information

Authors and Affiliations

Department of Computer Science, TU Dortmund University, 44227, Dortmund, Germany
Mirko Bunse & Martin Senz
Istituto di Scienza e Tecnologie dell’Informazione, Consiglio Nazionale delle Ricerche, 56124, Pisa, Italy
Alejandro Moreo & Fabrizio Sebastiani

Corresponding author

Correspondence to Mirko Bunse.

Editor information

Editors and Affiliations

Grenoble Alpes University, Saint Martin d’Hères, France
Massih-Reza Amini
INSA Rouen Normandy, Saint Etienne du Rouvray, France
Stéphane Canu
Ruhr-Universität Bochum, Bochum, Germany
Asja Fischer
KU Leuven, Leuven, Belgium
Tias Guns
Central European University, Vienna, Austria
Petra Kralj Novak
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas

About this paper

Cite this paper

Bunse, M., Moreo, A., Sebastiani, F., Senz, M. (2023). Ordinal Quantification Through Regularization. In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science, vol 13717. Springer, Cham. https://doi.org/10.1007/978-3-031-26419-1_3

DOI: https://doi.org/10.1007/978-3-031-26419-1_3
Published: 17 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26418-4
Online ISBN: 978-3-031-26419-1
eBook Packages: Computer Science, Computer Science (R0)
