ABSTRACT
Machine learning (ML) is becoming ubiquitous in modern software. Still, its use challenges software developers. So far, research has focused on ML libraries to find and mitigate these challenges. However, there is initial evidence that programming languages also contribute to them, visible in differing distributions of bugs in ML programs. To fill this research gap, we propose the first empirical study of the impact of programming languages on bugs in ML programs. We plan to analyze software from GitHub and related discussions in GitHub issues and on Stack Overflow for bug distributions in ML programs, aiming to identify correlations with the chosen programming language, its features, and the application domain. The study's results will enable better-targeted use of available programming language technology in ML programs, preventing bugs, reducing errors, and speeding up development.
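One way the proposed correlation analysis could be operationalized is a chi-square test of independence between programming language and bug category. The sketch below uses entirely hypothetical counts (the language names, bug categories, and numbers are illustrative assumptions, not study results) and computes the Pearson statistic by hand:

```python
# Hypothetical bug-category counts per language (illustrative only,
# NOT actual study data).
counts = {
    "Python":     {"API misuse": 40, "type error": 25, "data bug": 35},
    "JavaScript": {"API misuse": 30, "type error": 45, "data bug": 25},
}

categories = list(next(iter(counts.values())))
table = [[counts[lang][c] for c in categories] for lang in counts]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Pearson chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected[i][j] = row_total[i] * col_total[j] / grand_total.
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(len(table))
    for j in range(len(categories))
)
dof = (len(table) - 1) * (len(categories) - 1)
print(f"chi2 = {chi2:.2f}, dof = {dof}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom would indicate that bug categories are not distributed identically across languages, which is the kind of signal the study sets out to detect.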
Index Terms
- Impact of programming languages on machine learning bugs