Authors:
Diana-Lucia Miholca and Zsuzsanna Oneţ-Marian
Affiliation:
Department of Computer Science, Babeş-Bolyai University, No. 1, Mihail Kogalniceanu street, Cluj-Napoca, Romania
Keyword(s):
Software Defect Prediction, Doc2Vec, Graph2Vec, LSI, Hyperparameter Tuning, Deep Learning.
Abstract:
Software defect prediction is an essential software development activity, a highly researched topic and yet still a difficult problem. One of the difficulties is that the most prevalent software metrics are insufficiently relevant for predicting defects. In this paper we propose the use of Graph2Vec embeddings, learnt from the source code in an unsupervised manner, as a basis for defect prediction. The reliability of the Graph2Vec embeddings is compared to that of alternative embeddings based on Doc2Vec and LSI through a study performed on 16 versions of Calcite, using three classification models: FastAI, as a deep learning model; Multilayer Perceptron, as an untuned conventional model; and Random Forests with hyperparameter tuning, as a tuned conventional model. The experimental results suggest that the Graph2Vec, Doc2Vec and LSI-based embeddings are complementary, their combination leading to the best performance for most software versions. When comparing the three classifiers, the empirical results highlight the superiority of the tuned Random Forests over FastAI and Multilayer Perceptron, which confirms the power of hyperparameter optimization.
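The combination of embeddings and the tuned Random Forests mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn and NumPy, uses random arrays as stand-ins for the Graph2Vec, Doc2Vec and LSI embeddings, and the embedding size, synthetic labels, parameter grid and ROC AUC scoring are all hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

rng = np.random.default_rng(0)
n_entities = 200  # hypothetical number of software entities in one version

# Stand-ins for the three embeddings (assumption: in a real study these
# would be learnt from the source code, not drawn at random).
graph2vec_emb = rng.normal(size=(n_entities, 16))
doc2vec_emb = rng.normal(size=(n_entities, 16))
lsi_emb = rng.normal(size=(n_entities, 16))

# Synthetic, imbalanced defect labels that depend on all three embeddings.
signal = graph2vec_emb[:, 0] + doc2vec_emb[:, 0] + lsi_emb[:, 0]
labels = (signal > 1.5).astype(int)

# Combine the embeddings by concatenating them feature-wise.
combined = np.hstack([graph2vec_emb, doc2vec_emb, lsi_emb])

# Tune a Random Forest with a small, illustrative grid search.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="roc_auc",  # assumption: the paper's metrics may differ
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(combined, labels)

print("Best parameters:", search.best_params_)
print("Best cross-validated ROC AUC:", round(search.best_score_, 3))
```

Concatenation is only one way of combining the representations; it is used here because it lets the classifier draw on features from all three embeddings at once.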