Software defect prediction based on non-linear manifold learning and hybrid deep learning techniques

Software defect prediction plays an important role in software quality assurance, aiming to inspect as many potentially defect-prone software modules as possible. However, the performance of a prediction model is susceptible to the high dimensionality of datasets that contain irrelevant and redundant features. In addition, the software metrics used for defect prediction are almost entirely traditional hand-crafted features, in contrast to the deep semantic feature representations learned by deep learning techniques. To address these two issues, this paper proposes two solutions: (1) We leverage a novel non-linear manifold learning method, SOINN Landmark Isomap (SL-Isomap), to extract representative features by automatically selecting a reasonable number and position of landmarks, which can reveal the complex intrinsic structure hidden behind the defect data. (2) We propose a novel defect prediction model named DLDD based on hybrid deep learning techniques, which leverages a denoising autoencoder to learn true input features that are not contaminated by noise, and utilizes a deep neural network to learn abstract deep semantic features. We combine the squared error loss function of the denoising autoencoder with the cross entropy loss function of the deep neural network, achieving the best prediction performance by adjusting a hyperparameter. We compare SL-Isomap with seven state-of-the-art feature extraction methods and compare the DLDD model with six baseline models across 20 open source software projects. The experimental results verify the superiority of SL-Isomap and DLDD on four evaluation indicators.

Software defect prediction aims to identify as many potentially defective software modules (such as components, files, classes) as possible before releasing a new software product [Hall, Beecham, Bowes et al. (2012)]. Nevertheless, a serious challenge that threatens the modeling process of defect prediction is the high dimensionality of defect datasets, i.e., datasets that contain excessive irrelevant and redundant features. To solve this issue, a few feature extraction methods have been proposed to alleviate irrelevant and redundant features by constructing new, combined features from the original features, but these have not been thoroughly investigated in software defect prediction [Kondo, Bezemer, Kamei et al. (2019)]. In this paper, we leverage a non-linear manifold learning method, SOINN Landmark Isomap (SL-Isomap) [Gan, Shen, Zhao et al. (2014)], to extract representative features from the original defect features by automatically selecting a reasonable number and position of landmarks, which can reveal the complex intrinsic structure hidden behind the defect data. SL-Isomap adopts the SOINN (Self-Organizing Incremental Neural Network) algorithm [Shen, Tomotaka and Osamu (2007)] to automatically select a reasonable number of landmarks, thus characterizing the topological structure of defect data in the high dimensional input space. In addition, SL-Isomap utilizes the L-Isomap (Landmark Isomap) algorithm to search for low dimensional manifolds in high dimensional defect data based on the selected landmarks. At present, deep learning techniques are a research hotspot in the field of artificial intelligence, and have been successfully used in many domains, such as image classification [Zhang, Wang, Lu et al. (2019)]. In this paper, in order to bridge this research gap, while taking into account the superior prediction performance of deep learning techniques, we leverage hybrid deep learning techniques, the denoising autoencoder (DAE) [Vincent, Larochelle, Bengio et al.
(2008)] and deep neural network (DNN), to construct a novel defect prediction model named DLDD by further processing the defect features extracted by SL-Isomap. The denoising autoencoder can remove noise through training to learn true input features that are not contaminated by noise, reconstructing a clean "repaired" input from the "corrupted" input and thus learning the reconstruction distribution by changing the reconstruction error term. The learned features not only provide a more robust feature representation, but also have stronger generalization capability. We then integrate the defect features processed by the denoising autoencoder into abstract deep semantic features with a deep neural network. A deep neural network trained on these deep semantic features has stronger discriminative capacity for different classes [Wang, Jiang, Luo et al. (2019); Zhou, Tan, Yu et al. (2019)]. For the loss function of the entire DLDD model, we combine the squared error loss function of the denoising autoencoder with the cross entropy loss function of the deep neural network to reinforce the learned defect feature representation by controlling a hyperparameter θ, thereby achieving the best defect prediction effect. The main contributions of this paper can be summarized as follows: (1) We utilize a novel non-linear manifold learning method, SOINN Landmark Isomap (SL-Isomap), to extract representative features from the original defect features by automatically selecting a reasonable number and position of landmarks, which can reveal the complex intrinsic structure hidden behind the defect data.
(2) Encouraged by the superior performance of deep learning techniques, we propose a novel defect prediction model called DLDD based on hybrid deep learning techniques, which leverages denoising autoencoder to learn the reconstructed distribution and more robust feature representation by changing the reconstruction error term, and utilizes deep neural network to learn the abstract deep semantic features.
(3) For the loss function of the entire DLDD model, we combine the squared error loss function of denoising autoencoder with the cross entropy loss function of deep neural network to achieve the best performance of defect prediction by adjusting a hyperparameter.
(4) To verify the performance of SL-Isomap and DLDD, we conduct extensive experiments for feature extraction and defect prediction across 20 software defect projects from large open source datasets. We compare SL-Isomap with seven state-of-the-art feature extraction methods, and compare the DLDD model with six baseline models, comprising five classic defect predictors and a deep neural network. The experimental results demonstrate the effectiveness of SL-Isomap and DLDD on four evaluation indicators.

Related work
Software defect prediction is a research hotspot in the software engineering domain. The majority of previous studies use different machine learning methods to construct defect prediction models. Li et al. [Li, Jing, Zhu et al. (2018)] leverage a new Two-Stage Ensemble Learning (TSEL) method to construct a software defect prediction model; the method includes two stages: an ensemble multi-kernel domain adaptation stage and an ensemble data sampling stage. Wang et al. [Wang, Zhang, Jing et al. (2016)] propose a SemiBoost defect prediction model called NSSB based on non-negative sparse graphs, which utilizes the AdaBoost algorithm to boost model performance. The experimental results demonstrate that the NSSB model can effectively address the issues of inadequate labeled instances and class imbalance. Chen et al. [Chen and Ma (2015)] use six regression models to conduct extensive empirical studies, and the experimental results show that decision tree regression achieves the best prediction performance. Lov et al. [Lov, Saikrishna, Ashish et al. (2018)] construct a defect prediction model based on the Least Squares Support Vector Machine (LSSVM) with linear, polynomial and radial basis function kernels. Different from previous studies, we leverage hybrid deep learning techniques, the denoising autoencoder and deep neural network, to construct a novel software defect prediction model in this paper.

Feature extraction based on SL-Isomap
We utilize a non-linear manifold learning technique, SOINN Landmark Isomap (SL-Isomap) [Gan, Shen, Zhao et al. (2014)], to extract representative features from the original defect features, which can reveal the complex intrinsic structure hidden behind the defect data. SL-Isomap is a variant of Isomap [Li, Zhang, Zhang et al. (2017)], which leverages the SOINN (Self-Organizing Incremental Neural Network) algorithm to automatically select a reasonable number and position of landmarks, so as to depict the topological structure of defect data in the high dimensional input space and lessen short-circuit errors.
In addition, the L-Isomap (Landmark Isomap) algorithm is adopted to search for low dimensional manifolds in high dimensional defect data based on the selected landmarks. The implementation process of SL-Isomap is as follows. The data points of each software project are defined as $D = \{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$.

1) Select the reasonable number and position of SOINN landmarks

We utilize the SOINN algorithm to select a reasonable number and position of landmarks automatically. We first initialize the following variables: the output nodes $O = \{c_1, c_2\}$, the numbers of local cumulative signals $M_1 = M_2 = 1$, the thresholds $T_1 = T_2 = d(c_1, c_2)$, the connection set $C = \emptyset$, and the connection age $age_{(1,2)} = 0$. We then find the winner $s_1$ and second winner (second-nearest node) $s_2$ by searching the output nodes for each input data point $x_i$ ($i \in [3, N]$) one by one, as shown in Eqs. (1) and (2):

$s_1 = \arg\min_{c \in O} \|x_i - W_c\|$ (1)

$s_2 = \arg\min_{c \in O \setminus \{s_1\}} \|x_i - W_c\|$ (2)

where $W_c$ denotes the weight vector of node $c$. The input $x_i$ is inserted as a new node, $O = O \cup \{x_i\}$, when $\|x_i - W_{s_1}\| > T_{s_1}$ or $\|x_i - W_{s_2}\| > T_{s_2}$, and we go back to find the winners again. If there is no connection between $s_1$ and $s_2$, we create the connection and reset the connection age $age_{(s_1, s_2)}$ to 0; we then increase by 1 the age of all edges emanating from $s_1$, increase the number of local cumulative signals $M_{s_1}$ by 1, adjust the winner $s_1$ towards the input data by a certain fraction, and delete invalid edges and connections. SOINN can thus automatically determine the number of landmarks $n$. After updating the thresholds and removing noise nodes, we obtain the node set $O = \{c_1, c_2, \ldots, c_n\}$; the nearest neighbors of the nodes of $O$ within $D$ form the landmark set $L = \{l_1, l_2, \ldots, l_n\}$, which can be expressed as shown in Eq. (3):

$l_j = \arg\min_{x \in D} \|x - W_{c_j}\|, \quad j = 1, 2, \ldots, n$ (3)

2) Apply MDS on SOINN landmarks

We utilize MDS (MultiDimensional Scaling) to construct the matrix $H_n$ based on the selected $n$ landmarks, as shown in Eq. (4):

$H_n = -\frac{1}{2} J_n \Delta_n J_n$ (4)

where $\Delta_n$ denotes the matrix of squared entries of $G$, $G$ represents the landmarks-only geodesic distance matrix, and $J_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is the centering matrix. Next, the $l$-dimensional coordinates of the $n$ landmarks are represented as the columns of the matrix $U$, as shown in Eq. (5):

$U = \left[\sqrt{\lambda_1}\,\vec{v}_1, \sqrt{\lambda_2}\,\vec{v}_2, \ldots, \sqrt{\lambda_l}\,\vec{v}_l\right]^T$ (5)

where $\lambda_i$ denotes the $i$th largest eigenvalue of $H_n$ and $\vec{v}_i$ represents the corresponding eigenvector.
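As an illustration, the classical MDS step on the landmark distance matrix can be sketched as follows. This is a minimal numpy sketch, not the original implementation; the function name `mds_on_landmarks` and the toy data below are our own.

```python
import numpy as np

def mds_on_landmarks(G, l):
    """Classical MDS on an n x n landmark geodesic-distance matrix G.

    Returns the l-dimensional landmark coordinates (rows of U are
    sqrt(lam_i) * v_i^T), plus the eigenpairs reused later by LMDS.
    """
    n = G.shape[0]
    Delta = G ** 2                       # matrix of squared distances
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    H = -0.5 * J @ Delta @ J             # double-centered matrix
    w, v = np.linalg.eigh(H)             # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:l]        # keep the l largest eigenpairs
    lam, vec = w[idx], v[:, idx]
    U = (vec * np.sqrt(lam)).T           # landmark coordinates, one per column
    return U, lam, vec
```

On exact Euclidean distances, the recovered coordinates reproduce the original pairwise distances up to rotation and translation, which is the property SL-Isomap relies on when embedding the landmarks.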

3) LMDS based on SOINN landmarks
We calculate embedding coordinates for the remaining data nodes according to their distances from the SOINN landmarks. First, we run the Dijkstra algorithm $n$ times to compute the single-source shortest path matrix $G'$ ($n \times N$), which gives the approximate geodesic distances between the landmarks and the remaining data nodes. Second, we leverage LMDS (Landmark MDS) to generate the low dimensional embedding; the embedding of a data node $x$ can be expressed as shown in Eq. (6):

$\vec{y}(x) = \frac{1}{2} U^{\#} (\bar{\delta} - \delta_x)$ (6)

where $\delta_x$ represents the column vector of squared distances between the data node $x$ and the $n$ landmarks (one column of the squared $G'$), $\bar{\delta}$ denotes the mean of the columns of $\Delta_n$, and $U^{\#}$ is the pseudoinverse transpose of $U$, whose $i$th row is $\vec{v}_i^T / \sqrt{\lambda_i}$. Finally, we utilize PCA to reorient the axes to reflect the overall distribution of $\{\vec{y}(x_1), \vec{y}(x_2), \ldots, \vec{y}(x_N)\}$, thus extracting the representative features $X_f$.
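The triangulation of a remaining data node follows the standard Landmark MDS formula. A minimal numpy sketch is given below; `lmds_embed` is a hypothetical name, and the eigenpairs are assumed to come from classical MDS on the landmark distance matrix.

```python
import numpy as np

def lmds_embed(x_sq_dists, Delta_landmarks, lam, vec):
    """Embed one data node from its squared distances to the n landmarks.

    x_sq_dists:      length-n vector of squared geodesic distances between
                     the node and the landmarks (one column of squared G').
    Delta_landmarks: n x n matrix of squared landmark-landmark distances.
    lam, vec:        the l largest eigenvalues/eigenvectors of the
                     double-centered landmark matrix from classical MDS.
    """
    delta_mean = Delta_landmarks.mean(axis=0)  # mean column, delta_bar
    U_pinv = vec / np.sqrt(lam)                # columns v_i / sqrt(lam_i)
    # y(x) = 1/2 * U^# (delta_bar - delta_x)
    return 0.5 * U_pinv.T @ (delta_mean - x_sq_dists)
```

For exact Euclidean inputs this triangulation recovers the node's coordinate in the same frame as the embedded landmarks, so distances to the landmarks are preserved.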

Robust feature representation based on denoising autoencoder
The denoising autoencoder can remove noise through training to learn true input features that are not contaminated by noise, thereby reconstructing a clean "repaired" input from the "corrupted" input. We utilize the denoising autoencoder to further process the defect features $X_f$ extracted by SL-Isomap, aiming to generate a more robust feature representation with stronger generalization capability. The denoising autoencoder takes the corrupted data as input and the predicted undamaged data as output, and can learn useful information by changing the reconstruction error term. The training process of the denoising autoencoder is shown in Fig. 1. The denoising autoencoder is trained to reconstruct a clean data point $x$ from its damaged version $\tilde{x}$, which is achieved by minimizing the loss $L = -\log P(x \mid h = f(\tilde{x}))$, where $\tilde{x}$ is the damaged version of each defect instance $x$ produced by the damage process $C(\tilde{x} \mid x)$. The denoising autoencoder learns the reconstruction distribution $P(x \mid \tilde{x})$ from the data pairs $(x, \tilde{x})$ according to the following training process. First, given a $d$-dimensional input vector $x \in \mathbb{R}^d$, we introduce a damage process $C(\tilde{x} \mid x)$ by adding Gaussian noise; this conditional distribution denotes the probability that the given defect instance $x$ generates the corrupted instance $\tilde{x}$. Gaussian noise is a type of noise whose probability density function obeys the Gaussian distribution (i.e., the normal distribution), as shown in Eq. (9):

$f(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right)$ (9)

where $\sigma$ represents the standard deviation and $\mu$ represents the expectation. Then, we leverage the training instances $(x, \tilde{x})$ to estimate the reconstruction distribution $P(x \mid \tilde{x}) = P(x \mid h)$, which involves two stages: an encoder and a decoder. In the encoder stage, the $d$-dimensional Gaussian-noised input $\tilde{x}$ is mapped to the $k$-dimensional hidden layer $h$, as shown in Eq. (10); in the decoder stage, the hidden layer $h$ is reconstructed into the $d$-dimensional output $r$, as shown in Eq. (11):

$h = f(W\tilde{x} + b_1)$ (10)

$r = g(W'h + b_2)$ (11)

where $f(\cdot)$ and $g(\cdot)$ denote the activation functions of the encoder and decoder, respectively, $W \in \mathbb{R}^{k \times d}$ and $W' \in \mathbb{R}^{d \times k}$ represent the weight matrices of the encoder and decoder, respectively, and $b_1 \in \mathbb{R}^k$ and $b_2 \in \mathbb{R}^d$ denote the biases of the hidden layer and output layer, respectively. The parameters of the denoising autoencoder are $\theta_{DAE} = (W, W', b_1, b_2)$, trained to minimize the reconstruction error, as shown in Eq. (12):

$\theta_{DAE}^{*} = \arg\min_{\theta_{DAE}} \frac{1}{N} \sum_{i=1}^{N} L(x_i, r_i)$ (12)

where $L(\cdot)$ denotes the squared error loss function and $N$ represents the total number of training instances. We adopt the squared error (the average reconstruction error) as the loss function of the denoising autoencoder; the smaller its value, the better the performance. The loss function $L_{DAE}$ of the denoising autoencoder is shown in Eq. (13):

$L_{DAE} = \frac{1}{N} \sum_{i=1}^{N} \|x_i - r_i\|^2$ (13)

where $\|\cdot\|$ denotes the norm used in the squared error.
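A minimal sketch of the damage process and one encoder/decoder pass with the squared error loss might look as follows. This is a numpy illustration under our own assumptions (tanh encoder, linear decoder, hypothetical function names), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt(X, sigma=0.1):
    """Damage process C(x_tilde | x): add zero-mean Gaussian noise."""
    return X + rng.normal(0.0, sigma, size=X.shape)

def dae_forward(X_tilde, W, b1, W_prime, b2):
    """One encoder/decoder pass: h = f(W x_tilde + b1), r = g(W' h + b2)."""
    h = np.tanh(X_tilde @ W.T + b1)   # encoder with tanh activation
    r = h @ W_prime.T + b2            # decoder with linear activation
    return h, r

def dae_loss(X, r):
    """Mean squared reconstruction error against the CLEAN input."""
    return np.mean(np.sum((X - r) ** 2, axis=1))
```

Note that the loss compares the reconstruction against the clean input, not the corrupted one; this is what forces the network to learn features robust to noise.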

Feature integration based on deep neural network
We integrate the defect features processed by the denoising autoencoder into abstract deep semantic features with a deep neural network (DNN). A deep neural network trained on these deep semantic features has stronger discriminative capacity for the different classes (defective or non-defective). We utilize the trained deep neural network to predict whether a module with an unknown label is defective or non-defective. According to the location of the different layers, the network layers of a deep neural network can be divided into three categories: input layer, hidden layers and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers. The neurons of adjacent network layers are fully connected, whereas the neurons within the same layer have no direct connections. Moreover, the numbers of neurons in the input and output layers are determined by the specific application, while the number of hidden layers and the number of neurons in each hidden layer are determined empirically. The network structure of the deep neural network used in this paper is shown in Fig. 2. The output of the first hidden layer can be expressed as shown in Eq. (14):

$h_k = g\left(\sum_{m} w_{km} x_m + b_k\right)$ (14)

where $x_m$ represents the $m$th input value, $w_{km}$ represents the input weight connecting the $m$th input node and the $k$th hidden node, $b_k$ denotes the bias of the $k$th hidden node, and $g(\cdot)$ denotes the nonlinear activation function. The output of the output layer is shown in Eq. (15):

$y_j = \mathrm{softmax}\left(\sum_{s} v_{js} h_s\right)$ (15)

where $v_{js}$ represents the output weight connecting the $j$th output node and the $s$th hidden node, and $h_s$ represents the output value of the $s$th hidden node. $y_j$ denotes the probability that a specific module belongs to the $j$th class. The training process of a deep neural network is mainly divided into the forward transmission of information and the backpropagation of the loss.
In the training process of the deep neural network, the loss is used to update the network parameters (weights and biases) by gradient descent, aiming to maximize the probability of the correct class label and minimize the probability of the incorrect class labels, in other words, to minimize the classification loss on the given training set. In this paper, the deep neural network adopts the cross entropy loss function to train the network parameters. From the perspective of classification, it measures how well the predicted class probabilities match the actual ones: the smaller the cross entropy, the more accurate the prediction. The cross entropy loss function is shown in Eq. (16):

$L_{DNN} = -\frac{1}{N} \sum_{t=1}^{N} \sum_{j=1}^{C} p_j(x_t) \log q_j(x_t)$ (16)

where $p_j(x_t)$ represents the actual probability that the $t$th module $x_t$ belongs to the $j$th class, $q_j(x_t)$ represents the probability, output by the deep neural network, that the $t$th module $x_t$ belongs to the $j$th class, and $C$ represents the number of defect classes.
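The cross entropy computation over softmax outputs can be sketched as follows; a minimal numpy illustration with hypothetical names, assuming one-hot actual probabilities.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p_true, q_pred, eps=1e-12):
    """Mean cross entropy over N modules and C classes.

    p_true: N x C actual class probabilities (one-hot labels).
    q_pred: N x C predicted probabilities from the output layer.
    """
    return -np.mean(np.sum(p_true * np.log(q_pred + eps), axis=1))
```

As the text notes, a confident correct prediction yields a small loss, while an uninformative (uniform) prediction yields a larger one.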

Hybrid loss function for the DLDD model
For the loss function of the entire DLDD model, we combine the squared error loss function of the denoising autoencoder with the cross entropy loss function of the deep neural network to further reinforce the learned defect feature representation. The hybrid loss function of the DLDD model is shown in Eq. (17):

$L_{DLDD} = \theta L_{DNN} + (1 - \theta) L_{DAE}$ (17)

For the hyperparameter $\theta \in [0, 1]$, we can adjust $\theta$ according to the experimental results. In this paper, we discuss the performance of the DLDD model when the hyperparameter $\theta$ = 0.25, 0.5, 0.75 and 1, respectively. The experimental part introduces the results for the different hyperparameter values in detail. We utilize the proposed DLDD model to learn deep semantic features with stronger discriminative capacity on the training set. After using the defect instances with known labels to train the proposed DLDD model, the weights and biases of the deep neural network no longer change. For the defect instances with unknown labels in the test set, we feed them to the DLDD model for prediction with the same mapping rule; the class label with the highest probability indicates the class (defective or non-defective) to which the defect instance belongs.
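Assuming the combination takes the common weighted-sum form (an assumption on our part, since the text only states that the two losses are combined via a hyperparameter θ in [0, 1]), the hybrid loss can be sketched as:

```python
def hybrid_loss(l_dnn, l_dae, theta):
    """Weighted combination of the two losses (assumed weighted-sum form).

    theta in [0, 1]; theta = 1 reduces to the pure cross entropy loss,
    theta = 0 to the pure reconstruction loss.
    """
    assert 0.0 <= theta <= 1.0
    return theta * l_dnn + (1.0 - theta) * l_dae
```

Under this reading, the reported sweep over θ = 0.25, 0.5, 0.75, 1 trades off classification accuracy against reconstruction robustness, with θ = 1 discarding the reconstruction term entirely.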

Experimental setup
In this section, we introduce the experimental setup, including benchmark datasets, evaluation indicators and baseline models.

Benchmark datasets
We conduct extensive experiments on 20 software projects, including 14 projects from the PROMISE data repository and 6 projects from the NASA data repository, which are publicly available and well-known datasets in software defect prediction research [Lov, Saikrishna, Ashish et al. (2018)]. Tab. 1 summarizes the basic information of the 14 projects (the first fourteen rows) from the PROMISE data repository and the 6 projects (the last six rows) from the NASA data repository, respectively. For all software projects, we adopt the SMOTE (Synthetic Minority Oversampling Technique) algorithm for class imbalance processing and the z-score method for data normalization. Moreover, we conduct 10 times 10-fold cross-validation to evaluate the performance of the models.

Evaluation indicators and baseline models
We adopt four commonly used evaluation indicators, F1, MCC (Matthews correlation coefficient), pf and G-measure [Zhu, Zhang, Ying et al. (2020)], to evaluate model performance. For feature extraction, we compare SL-Isomap with seven state-of-the-art methods: FA, PCA, SPE, SNE, NPE [Zhao, Zou and Gao (2013)], Generalized Discriminant Analysis-Gaussian (GDA-G) [Uddin and Hassan (2015)] and Isometric Mapping (Isomap) [Li, Zhang, Zhang et al. (2017)]. These feature extraction methods all use DLDD as the defect predictor. For defect prediction, we compare the DLDD model with five classic defect predictors: Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN), Decision Tree (DT) and Logistic Regression (LR). These defect predictors all use the features extracted by SL-Isomap.
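The four indicators can be computed from a binary confusion matrix using their standard definitions in the defect prediction literature; a minimal sketch (function name `indicators` is ours) follows.

```python
import math

def indicators(tp, fp, tn, fn):
    """F1, MCC, pf and G-measure from a binary confusion matrix.

    pf (probability of false alarm) = FP / (FP + TN); the G-measure is
    the harmonic mean of recall (pd) and 1 - pf, so a low false alarm
    rate is rewarded alongside a high detection rate.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # pd, probability of detection
    f1 = 2 * precision * recall / (precision + recall)
    pf = fp / (fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    g = 2 * recall * (1 - pf) / (recall + (1 - pf))
    return f1, mcc, pf, g
```

Unlike F1, MCC and G-measure (higher is better), pf is better when smaller, which matters when reading the box-plots later in the paper.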
In addition, we also compare DLDD with a Deep Neural Network (DNN) that is not combined with the denoising autoencoder (DAE); this DNN also uses the features extracted by SL-Isomap.

Experimental results
We detail the experimental results around the following three research questions (RQs) in this section.

RQ1: How about the feature extraction capability of the non-linear manifold learning method SL-Isomap compared with seven state-of-the-art feature extraction methods in software defect prediction?
To verify the effectiveness of the representative features extracted by the non-linear manifold learning method SL-Isomap, we compare SL-Isomap with seven state-of-the-art feature extraction methods using the same defect predictor, DLDD (θ=0.75): FA, PCA, SPE, SNE, NPE, GDA-G and Isomap. In RQ2, the experimental results demonstrate that the DLDD model achieves the best defect prediction performance when θ=0.75, so we choose DLDD with θ=0.75 in RQ1.
Tabs. 2-4 show the F1, MCC and G-measure of SL-Isomap and the seven state-of-the-art feature extraction methods across all 20 projects. Note that the highest value of each row is marked in bold. From Tabs. 2-4, we can observe that our method SL-Isomap achieves the best average performance in terms of F1, MCC and G-measure. More specifically, the average F1 (0.7957) of SL-Isomap gains improvements between 4.40% (for Isomap) and 18.46% (for NPE) with an average improvement of 8.96%; the average MCC (0.5714) yields improvements between 12.48% (for Isomap) and 89.46% (for NPE) with an average improvement of 34.83%; and the average G-measure (0.7820) achieves improvements between 4.78% (for PCA) and 23.15% (for NPE) with an average improvement of 11.35%. Fig. 3 shows the box-plots of the four indicators for our method SL-Isomap and the seven feature extraction methods across all 20 projects. From Figs. 3(a)-3(d), we can observe that the median values gained by SL-Isomap are higher than those gained by the seven feature extraction methods in terms of F1, MCC and G-measure, respectively, and the median value gained by SL-Isomap is lower in terms of pf, which fully demonstrates the superiority of our method. In addition, for F1, MCC and G-measure, the median values of SL-Isomap are higher than the maximum values of SPE and NPE, respectively. Compared with the other feature extraction methods, SL-Isomap achieves the best experimental results. This is because SL-Isomap utilizes the SOINN algorithm to automatically select a reasonable number of landmarks, thereby characterizing the topological structure of defect data in the high dimensional input space. Moreover, SL-Isomap leverages the L-Isomap algorithm to search for low dimensional manifolds in high dimensional defect data based on the selected landmarks.

Conclusion 1:
Our method SL-Isomap performs better than seven state-of-the-art feature extraction methods in terms of F1, MCC and G-measure, achieving average improvements of 8.96%, 34.83% and 11.35%, respectively, across all 20 projects. In terms of pf, the median value gained by SL-Isomap is lower than those gained by the other seven methods.

RQ2: How does the hyperparameter θ influence the performance of the DLDD model, and how does DLDD perform compared with five classic defect predictors?
In this paper, we combine the squared error loss function of the denoising autoencoder with the cross entropy loss function of the deep neural network to further reinforce the learned defect feature representation by controlling the hyperparameter θ ∈ [0, 1]. To investigate the influence of θ on the performance of the DLDD model, we discuss its performance when θ = 0.25, 0.5, 0.75 and 1, respectively. In addition, this question is also designed to evaluate the effectiveness of the DLDD model compared with five classic defect predictors using the same feature extraction method SL-Isomap: SVM, NB, KNN, DT and LR. Tabs. 5-7 show the F1, MCC and G-measure of the DLDD (θ=0.25, 0.5, 0.75, 1) model compared with those of the five classic predictors across all 20 projects, respectively. Note that the highest value of each row is marked in bold. From Tabs. 5-7, we can find that, compared with DLDD (θ=0.25, 0.5, 1), the DLDD model performs best in terms of F1, MCC and G-measure when the hyperparameter θ=0.75. Moreover, compared with SVM, NB, KNN, DT and LR, the proposed DLDD (θ=0.75) model also achieves the best average performance in terms of F1, MCC and G-measure.
More specifically, the average F1 (0.7957) of DLDD achieves improvements between 7.14% (for DT) and 19.19% (for NB) with an average improvement of 11.31%; the average MCC (0.5714) yields improvements between 19.97% (for LR) and 97.65% (for NB) with an average improvement of 46.55%; and the average G-measure (0.7820) achieves improvements between 8.04% (for LR) and 27.15% (for NB) with an average improvement of 15.08%. Fig. 4 shows the box-plots of the four indicators for the proposed DLDD (θ=0.25, 0.5, 0.75, 1) model and the five classic defect predictors across all 20 projects. From Figs. 4(a)-4(d), we can observe that the median values gained by DLDD (θ=0.25, 0.5, 0.75, 1) are higher than those gained by the five classic defect predictors in terms of F1, MCC and G-measure, respectively. We also find that the median values of DLDD (θ=0.75) are higher than the maximum values of NB in terms of F1, MCC and G-measure, and the median value of DLDD (θ=0.75) is lower than the minimum values of SVM, KNN and DT in terms of pf. In addition, the median values gained by DLDD (θ=0.75) are higher than those gained by DLDD (θ=0.25, 0.5, 1) in terms of MCC and G-measure, and the median value gained by DLDD (θ=0.75) is lower than those gained by DLDD (θ=0.25, 0.5, 1) in terms of pf (the smaller the pf, the better the performance). Compared with the five classic defect predictors, our DLDD model achieves the best prediction performance. This is because the DLDD model adopts the denoising autoencoder to learn the reconstruction distribution and a more robust feature representation by changing the reconstruction error term, and utilizes the deep neural network to learn abstract deep semantic features, which have stronger discriminative capacity for the different classes. In addition, the model performance is affected by the degree of noise added; the DLDD model achieves the best experimental performance when θ=0.75.
RQ3: Can the denoising autoencoder improve the prediction performance of the DLDD model?
The denoising autoencoder can remove noise through training to learn a more robust feature representation that is not contaminated by noise, reconstructing a clean "repaired" input from the "corrupted" input. To explore the influence of the denoising autoencoder on the prediction performance of the DLDD (θ=0.75) model, we compare the DLDD (θ=0.75) model (with the denoising autoencoder) against the deep neural network (without the denoising autoencoder) in this experiment.
Conclusion 3:
The DLDD (θ=0.75) model outperforms the single deep neural network that is not combined with the denoising autoencoder, and the experimental results prove that the denoising autoencoder can boost the prediction performance of the DLDD (θ=0.75) model.

Conclusion 2:
The proposed DLDD model achieves the best prediction performance in terms of F1, MCC, pf and G-measure when the hyperparameter θ=0.75. The DLDD (θ=0.75) achieves average performance improvements of 11.31%, 46.55% and 15.08% over the five classic defect predictors across all 20 projects in terms of F1, MCC and G-measure, respectively.

Conclusion
Software defect prediction can effectively guide the direction of software testing by reasonably allocating limited testing resources to high-risk modules before releasing a new software product. In this work, we construct an effective software defect prediction model based on a novel non-linear manifold learning feature extraction method and hybrid deep learning techniques. First, we leverage an advanced non-linear manifold learning method, SL-Isomap, to extract representative features from the original defect features. Second, we propose a novel defect prediction model called DLDD based on hybrid deep learning techniques, which leverages the denoising autoencoder to learn the reconstruction distribution and a more robust feature representation by changing the reconstruction error term, and utilizes the deep neural network to learn abstract deep semantic features. In addition, we combine the loss functions of the two deep learning techniques to achieve the best defect prediction performance by adjusting a hyperparameter. We conduct extensive experiments for feature extraction and defect prediction across 20 software defect projects from large open source datasets, and the experimental results demonstrate the effectiveness of SL-Isomap and DLDD.
In future work, in order to verify generalization capability and practicability of SL-Isomap and DLDD, we will evaluate them in more open source and commercial projects. Moreover, we also plan to extend SL-Isomap and DLDD to cross-project defect prediction.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.