ABSTRACT
The recently wide adoption of data science approaches to decision making in several application domains (such as health, business and even education) open new challenges in engineering and implementation of this systems. Considering the big picture of data science, Machine learning is the wider used technique and due to its characteristics, we believe that a better engineering methodology and tools are needed to realize innovative data-driven systems able to satisfy the emerging quality attributes (such as, debias and fariness, explainability, privacy and ethics, sustainability). This research project will explore the following three pillars: i) identify key quality attributes, formalize them in the context of data science pipelines and study their relationships; ii) define a new software engineering approach for data-science systems development that assures compliance with quality requirements; iii) implement tools that guide IT professionals and researchers in the realization of ML-based data science pipelines since the requirement engineering. Moreover, in this paper we also presents some details of the project showing how the feature models and model-driven engineering can be leveraged to realize our project.
- Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. 2019. Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 291--300.Google ScholarDigital Library
- Mohsen Asadi, Samaneh Soltani, Dragan Gasevic, Marek Hatala, and Ebrahim Bagheri. 2014. Toward automated feature model configuration with optimizing non-functional requirements. Information and Software Technology 56, 9 (Sept. 2014), 1144--1165. Google ScholarCross Ref
- Shelernaz Azimi and Claus Pahl. 2020. A Layered Quality Framework for Machine Learning-driven Data and Information Models.. In ICEIS (1). 579--587.Google Scholar
- Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. 2018. Fairness in Criminal Justice Risk Assessments: The State of the Art. 50, 1 (2018), 3--44. https://doi.org/10.1177/0049124118782533 Publisher: SAGE PublicationsSage CA: Los Angeles, CA. Google ScholarCross Ref
- Gurol Canbek. 2021. The need for a systematic machine-learning process: A proposal via a mobile malware classification case study. In 2021 International Conference on Information Security and Cryptology (ISCTURKEY). IEEE, Ankara, Turkey, 173--178. Google ScholarCross Ref
- Elizamary de Souza Nascimento, Iftekhar Ahmed, Edson Oliveira, Márcio Piedade Palheta, Igor Steinmacher, and Tayana Conte. 2019. Understanding development process of machine learning systems: Challenges and solutions. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM). IEEE, 1--6.Google ScholarCross Ref
- Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference (2012), 214--226. Google ScholarDigital Library
- Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and Removing Disparate Impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Sydney NSW Australia, 259--268. Google ScholarDigital Library
- Görkem Giray. 2021. A software engineering perspective on engineering machine learning systems: State of the art and challenges. Journal of Systems and Software (2021), 111031.Google Scholar
- Koichi Hamada, Fuyuki Ishikawa, Satoshi Masuda, Tomoyuki Myojin, Yasuharu Nishi, Hideto Ogawa, Takahiro Toku, Susumu Tokumoto, Kazunori Tsuchiya, Yasuhiro Ujita, et al. 2020. Guidelines for Quality Assurance of Machine Learning-based Artificial Intelligence.. In SEKE. 335--341.Google Scholar
- Fuyuki Ishikawa. 2018. Concepts in quality assessment for machine learning-from test data to arguments. In International Conference on Conceptual Modeling. Springer, 536--544.Google ScholarCross Ref
- Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems 33, 1 (Oct. 2012), 1--33. Google ScholarDigital Library
- Kyo C Kang, Sholom G Cohen, James A Hess, William E Novak, and A Spencer Peterson. 1990. Feature-oriented domain analysis (FODA) feasibility study. Technical Report. Carnegie-Mellon Univ Pittsburgh Pa Software Engineering Inst.Google Scholar
- Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counter-factual Fairness. In Advances in Neural Information Processing Systems (2017), Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2017/hash/a486cd07e4ac3d270571622f4f316ec5-Abstract.htmlGoogle Scholar
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. 2020. Explainable AI: A Review of Machine Learning Interpretability Methods. Entropy 23, 1 (Dec. 2020), 18. Google Scholar
- Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. 2021. A Survey on Bias and Fairness in Machine Learning. Comput. Surveys 54, 6 (July 2021), 1--35. Google ScholarDigital Library
- Julien Siebert, Lisa Joeckel, Jens Heidrich, Adam Trendowicz, Koji Nakamichi, Kyoko Ohashi, Isao Namba, Rieko Yamamoto, and Mikio Aoyama. 2021. Construction of a quality model for machine learning systems. Software Quality Journal (2021), 1--29.Google Scholar
- Stefan Studer, Thanh Binh Bui, Christian Drescher, Alexander Hanuschkin, Ludwig Winkler, Steven Peters, and Klaus-Robert Müller. 2021. Towards CRISP-ML (Q): a machine learning process model with quality assurance methodology. Machine Learning and Knowledge Extraction 3, 2 (2021), 392--413.Google ScholarCross Ref
- Thomas Thum, Christian Kastner, Sebastian Erdweg, and Norbert Siegmund. 2011. Abstract features in feature modeling. In 2011 15th International Software Product Line Conference. IEEE, 191--200.Google ScholarDigital Library
- Hugo Villamizar, Tatiana Escovedo, and Marcos Kalinowski. 2021. Requirements Engineering for Machine Learning: A Systematic Mapping Study. In SEAA. 29--36.Google Scholar
Index Terms
- Quality-driven machine learning-based data science pipeline realization: a software engineering approach
Recommendations
Construction of a quality model for machine learning systems
AbstractNowadays, systems containing components based on machine learning (ML) methods are becoming more widespread. In order to ensure the intended behavior of a software system, there are standards that define necessary qualities of the system and its ...
Factors affecting effective software quality management revisited
Developing a good software system is a very complex task. In order to produce a good software product, several measures for software quality attributes need to be taken into account. System complexity measurement plays a vital role in controlling and ...
Monitoring Software Quality Evolution for Defects
Quality control charts, especially c-charts, can help monitor software quality evolution for defects over time. c-charts of the Eclipse and Gnome systems showed that for systems experiencing active maintenance and updates, quality evolution is ...
Comments