DOI: 10.1145/3510454.3517067
research-article
Open Access

Quality-driven machine learning-based data science pipeline realization: a software engineering approach

Published: 19 October 2022

ABSTRACT

The recent wide adoption of data science approaches to decision making in several application domains (such as health, business, and even education) opens new challenges in the engineering and implementation of these systems. Considering the big picture of data science, machine learning is the most widely used technique, and due to its characteristics, we believe that better engineering methodologies and tools are needed to realize innovative data-driven systems able to satisfy emerging quality attributes (such as debiasing and fairness, explainability, privacy and ethics, and sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines, and study their relationships; ii) define a new software engineering approach for data-science systems development that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines, starting from requirements engineering. Moreover, in this paper we also present some details of the project, showing how feature models and model-driven engineering can be leveraged to realize it.


        Published in

          ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings
          May 2022
          394 pages
          ISBN: 9781450392235
          DOI: 10.1145/3510454

          Copyright © 2022 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Acceptance Rates

          Overall Acceptance Rate: 276 of 1,856 submissions, 15%
