ABSTRACT
Machine learning (ML) is becoming ubiquitous in modern software. Still, its use challenges software developers. So far, research has focused on ML libraries to find and mitigate these challenges. However, there is initial evidence that programming languages also contribute to them, visible in differing distributions of bugs in ML programs. To fill this research gap, we propose the first empirical study of the impact of programming languages on bugs in ML programs. We plan to analyze software from GitHub and related discussions in GitHub issues and on Stack Overflow for bug distributions in ML programs, aiming to identify correlations with the chosen programming language, its features, and the application domain. The study's results will enable better-targeted use of available programming language technology in ML programs, preventing bugs, reducing errors, and speeding up development.
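One way the proposed correlation analysis could be operationalized is a chi-square test of independence between programming language and bug category. The sketch below uses entirely hypothetical counts (the language names, bug categories, and numbers are illustrative assumptions, not study results) and computes the Pearson statistic by hand:

```python
# Hypothetical bug-category counts per language (illustrative only,
# NOT actual study data).
counts = {
    "Python":     {"API misuse": 40, "type error": 25, "data bug": 35},
    "JavaScript": {"API misuse": 30, "type error": 45, "data bug": 25},
}

categories = list(next(iter(counts.values())))
table = [[counts[lang][c] for c in categories] for lang in counts]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected[i][j] = row_total[i] * col_total[j] / grand_total.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(len(table))
    for j in range(len(categories))
)
dof = (len(table) - 1) * (len(categories) - 1)
print(f"chi2 = {chi2:.2f}, dof = {dof}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom would indicate that bug categories are not distributed identically across languages, which is the kind of signal the study sets out to detect.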
Index Terms
- Impact of programming languages on machine learning bugs