Quality-driven machine learning-based data science pipeline realization: a software engineering approach

Author:
Giordano d'Aloisio

University of L'Aquila, Italy


ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings, May 2022, Pages 291–293, https://doi.org/10.1145/3510454.3517067

Published: 19 October 2022


ABSTRACT

The recent wide adoption of data science approaches to decision making in several application domains (such as health, business, and even education) opens new challenges in the engineering and implementation of these systems. Considering the big picture of data science, machine learning is the most widely used technique and, due to its characteristics, we believe that better engineering methodologies and tools are needed to realize innovative data-driven systems able to satisfy the emerging quality attributes (such as debiasing and fairness, explainability, privacy and ethics, and sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines, and study their relationships; ii) define a new software engineering approach for data science systems development that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines, starting from requirements engineering. Moreover, in this paper we also present some details of the project, showing how feature models and model-driven engineering can be leveraged to realize it.


Index Terms

Quality-driven machine learning-based data science pipeline realization: a software engineering approach
1. Computing methodologies
  1. Machine learning
2. Software and its engineering
  1. Software creation and management
    1. Designing software
  2. Software organization and properties
    1. Extra-functional properties

Published in
ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings
May 2022
394 pages
ISBN:9781450392235
DOI:10.1145/3510454
General Chair:
Matthew B Dwyer
University of Virginia
Copyright © 2022 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
machine learning
model-driven
pipelines
product-line architecture
software quality
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
