
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13633)

Abstract

In this article we introduce and describe scikit-weak, a Python library inspired by scikit-learn and developed to provide an easy-to-use framework for weakly supervised and imprecise-data learning problems which, despite their importance in real-world settings, are not easily handled by existing libraries. We first provide a rationale for the development of such a library; we then discuss its design and the currently implemented methods and classes, which encompass several state-of-the-art algorithms.
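
To make the problem setting concrete, the following is a minimal, self-contained sketch of the kind of weakly supervised task scikit-weak targets, together with the scikit-learn-style fit/predict conventions the library follows. It deliberately uses only scikit-learn and NumPy and does not reproduce scikit-weak's own classes; the superset-label simulation and the simple two-stage disambiguation heuristic below are illustrative assumptions, not code or algorithms taken from the paper.

    # Superset-label (partial-label) learning sketch: each training instance carries a
    # set of candidate labels rather than a single ground-truth label. Plain scikit-learn
    # and NumPy only; the disambiguation heuristic is an illustrative baseline.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    n_classes = len(np.unique(y))

    # Simulate weak labels: every instance keeps its true label as a candidate and,
    # with probability 0.3, also receives one random (possibly incorrect) candidate.
    candidates = [{int(label)} for label in y_train]
    for cand in candidates:
        if rng.random() < 0.3:
            cand.add(int(rng.integers(n_classes)))

    # Step 1: fit a base model only on the instances whose label set is a singleton.
    unambiguous = [i for i, cand in enumerate(candidates) if len(cand) == 1]
    base = RandomForestClassifier(random_state=0)
    base.fit(X_train[unambiguous], y_train[unambiguous])

    # Step 2: disambiguate each instance by picking, among its candidates, the class
    # to which the base model assigns the highest predicted probability.
    # Columns of predict_proba follow base.classes_; map class value -> column index.
    class_to_col = {c: j for j, c in enumerate(base.classes_)}
    proba = base.predict_proba(X_train)
    disambiguated = np.array([
        max(cand, key=lambda c: proba[i, class_to_col[c]])
        for i, cand in enumerate(candidates)
    ])

    # Step 3: refit on the (pseudo-)disambiguated labels with the usual fit/predict
    # API and evaluate against the precise test labels.
    final = RandomForestClassifier(random_state=0)
    final.fit(X_train, disambiguated)
    print("test accuracy:", final.score(X_test, y_test))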

Notes

  1. https://scikit-learn.org.

  2. Compared to the usual definition of a training set considered in the ML literature, the definition of a decision table in rough set theory distinguishes instances in U from their representation in terms of features.

  3. https://github.com/AndreaCampagner/scikit-weak.

  4. https://pypi.org/project/scikit-weak/.

  5. https://sphinx-doc.org/.

  6. https://scikit-weak.readthedocs.io.

  7. https://tensorflow.org.

Author information

Corresponding author

Correspondence to Andrea Campagner.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Campagner, A., Lienen, J., Hüllermeier, E., Ciucci, D. (2022). Scikit-Weak: A Python Library for Weakly Supervised Machine Learning. In: Yao, J., Fujita, H., Yue, X., Miao, D., Grzymala-Busse, J., Li, F. (eds.) Rough Sets. IJCRS 2022. Lecture Notes in Computer Science (LNAI), vol. 13633. Springer, Cham. https://doi.org/10.1007/978-3-031-21244-4_5

  • DOI: https://doi.org/10.1007/978-3-031-21244-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21243-7

  • Online ISBN: 978-3-031-21244-4
