
Scikit-Weak: A Python Library for Weakly Supervised Machine Learning

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13633)

Abstract

In this article we introduce and describe scikit-weak, a Python library inspired by scikit-learn and developed to provide an easy-to-use framework for weakly supervised and imprecise-data learning problems which, despite their importance in real-world settings, are not easily handled by existing libraries. We first provide a rationale for the development of such a library; we then discuss its design and the currently implemented methods and classes, which encompass several state-of-the-art algorithms.
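
To make the problem setting concrete, the following is a minimal, self-contained sketch of the kind of weakly supervised task scikit-weak targets, together with the scikit-learn-style fit/predict conventions the library follows. It deliberately uses only scikit-learn and NumPy and does not reproduce scikit-weak's own classes; the superset-label simulation and the simple two-stage disambiguation heuristic below are illustrative assumptions, not code or algorithms taken from the paper.

    # Superset-label (partial-label) learning sketch: each training instance carries a
    # set of candidate labels rather than a single ground-truth label. Plain scikit-learn
    # and NumPy only; the disambiguation heuristic is an illustrative baseline.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    n_classes = len(np.unique(y))

    # Simulate weak labels: every instance keeps its true label as a candidate and,
    # with probability 0.3, also receives one random (possibly incorrect) candidate.
    candidates = [{int(label)} for label in y_train]
    for cand in candidates:
        if rng.random() < 0.3:
            cand.add(int(rng.integers(n_classes)))

    # Step 1: fit a base model only on the instances whose label set is a singleton.
    unambiguous = [i for i, cand in enumerate(candidates) if len(cand) == 1]
    base = RandomForestClassifier(random_state=0)
    base.fit(X_train[unambiguous], y_train[unambiguous])

    # Step 2: disambiguate each instance by picking, among its candidates, the class
    # to which the base model assigns the highest predicted probability.
    # Columns of predict_proba follow base.classes_; map class value -> column index.
    class_to_col = {c: j for j, c in enumerate(base.classes_)}
    proba = base.predict_proba(X_train)
    disambiguated = np.array([
        max(cand, key=lambda c: proba[i, class_to_col[c]])
        for i, cand in enumerate(candidates)
    ])

    # Step 3: refit on the (pseudo-)disambiguated labels with the usual fit/predict
    # API and evaluate against the precise test labels.
    final = RandomForestClassifier(random_state=0)
    final.fit(X_train, disambiguated)
    print("test accuracy:", final.score(X_test, y_test))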

Notes

  1. https://scikit-learn.org.

  2. Compared to the usual definition of a training set considered in the ML literature, the definition of a decision table in rough set theory distinguishes instances in U from their representation in terms of features.

  3. https://github.com/AndreaCampagner/scikit-weak.

  4. https://pypi.org/project/scikit-weak/.

  5. https://sphinx-doc.org/.

  6. https://scikit-weak.readthedocs.io.

  7. https://tensorflow.org.

Author information

Corresponding author

Correspondence to Andrea Campagner.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Campagner, A., Lienen, J., Hüllermeier, E., Ciucci, D. (2022). Scikit-Weak: A Python Library for Weakly Supervised Machine Learning. In: Yao, J., Fujita, H., Yue, X., Miao, D., Grzymala-Busse, J., Li, F. (eds.) Rough Sets. IJCRS 2022. Lecture Notes in Computer Science (LNAI), vol. 13633. Springer, Cham. https://doi.org/10.1007/978-3-031-21244-4_5

  • DOI: https://doi.org/10.1007/978-3-031-21244-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21243-7

  • Online ISBN: 978-3-031-21244-4
