Prediction and Comparative Analysis of Software Reliability Model Based on NHPP and Deep Learning

K.Y.S


Introduction
Over time, software has become increasingly important across industries. In the past, software supported simple tasks or small parts of each industry; it has since developed into a form that performs essential roles throughout entire systems. Because today's software is relied upon far more than in the past, failures caused by issues large and small, such as coding and system errors, can inflict significant damage across an entire industry. To address this problem, studies on software reliability, which measures how dependably software works, are continually being conducted. Software reliability is the probability that software does not fail for a specified period and indicates how long the software can be used without failure.
Studies on software reliability have been conducted based on software reliability models assuming the Markov model or the non-homogeneous Poisson process (NHPP) [1]. However, because software failures follow a Poisson distribution, most related studies have been based on NHPP software reliability models. Research in this area began with the NHPP-based software reliability model developed by Goel and Okumoto [2], which assumed that software failures occur independently and do not affect each other. Subsequently, S-shaped curve models, in which the cumulative number of software failures increases along an S curve, were studied [3-5]. Software reliability models assuming imperfect debugging, in which defects found in the test phase are not corrected or removed, were also investigated [6], and this work was extended to generalized imperfect-debugging fault detection rate models [7-11]. The fault detection rate of each proposed model takes the form of a function. In addition, because the operating environment differs for each piece of software, models are difficult to compare on equal terms; software reliability models have therefore been studied that account for uncertainty in the operating environment [12,13].
As software develops over time, it becomes composed of complex, intertwined structures; therefore, software reliability models are being studied under the assumption that software failures do not occur independently [14]. A software reliability model assuming that uncorrected previous errors continuously affect subsequent failures has been proposed, and a dependent software reliability model assuming an uncertain operating environment has also been studied [15,16].
As research progressed, the software reliability models studied under independence and dependence assumptions proliferated into numerous models suited to special cases, making generalization difficult. To address this, studies on nonparametric software reliability models using deep learning have been performed. Among these, software reliability models using deep neural networks have been studied [17], and a deep learning model trained on failure data from open-source software has been proposed [18]. In addition, because software failures form sequential data, software reliability models using the recurrent neural network (RNN) and long short-term memory (LSTM) have been studied to exploit this characteristic [19-21].
Software reliability models developed under the NHPP fit well only in special cases because they impose specific mathematical assumptions. Therefore, in this study, we target software reliability models that use deep learning, a machine learning approach that depends only on the given failure data. Because such a model is data-driven, it can be generalized to all failures occurring in software without relying on special-case assumptions about how failures arise. Whereas previous work applied existing deep learning methods directly, in this study we propose adding deeper hidden layers of a deep neural network to recurrent-type networks. A model trained on 100% of the data risks overfitting, fitting only the training data well; we therefore fit the models on 80% and 90% of each dataset and evaluate the resulting predictions, demonstrating superiority through comparisons on 11 criteria. Section 2 presents the theoretical background of software reliability and deep learning. Section 3 presents numerical examples, and Section 4 concludes.

Generalized Software Reliability Model
Software reliability refers to the probability that the software does not cause system failure for a certain period of time under specific conditions. Let the failure time (lifetime) of the software be a random variable T with pdf f(t). The reliability function used to evaluate this, which gives the probability of failure-free operation beyond time t, is

R(t) = P(T > t) = 1 − F(t). (1)

To calculate it, a distribution is assumed for T; here, T was assumed to follow an exponential distribution with parameter λ. Subsequently, we approach λ as an NHPP and propose a model in which λ is not a single constant value but a mean value function m(t) that changes over time [1]. N(t), the number of failures up to time t, follows a Poisson distribution with parameter m(t).
P{N(t) = n} = [m(t)]^n / n! · e^(−m(t)), n = 0, 1, 2, · · · , t ≥ 0 (2)

Here, m(t) is the mean value function from 0 to time t, obtained as the integral of the intensity function λ(t), which represents the instantaneous failure rate at time t:

m(t) = ∫₀ᵗ λ(s) ds. (3)
The mean value function m(t) is calculated from the relationship between the number of faults a(t) present at each time point and the fault detection rate b(t):

dm(t)/dt = b(t)[a(t) − m(t)]. (4)
Software reliability models are developed from this basic form of differential equation. Multiplying the right-hand side of Equation (4) by η yields a software reliability model that assumes an uncertain operating environment:

dm(t)/dt = η b(t)[N − m(t)], (5)

where η is a parameter for the uncertainty of the operating environment and N is the expected number of faults existing in the software before testing [12].
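The basic differential equation can be checked numerically. The sketch below integrates dm/dt = b(t)[a(t) − m(t)] with a simple Euler scheme and, for constant a(t) = a and b(t) = b, compares it against the closed-form Goel-Okumoto solution m(t) = a(1 − e^(−bt)); the parameter values are illustrative.

```python
import math

def solve_mean_value(a_fn, b_fn, t_end, dt=0.001):
    """Euler integration of dm/dt = b(t) * (a(t) - m(t)), with m(0) = 0."""
    m, t = 0.0, 0.0
    while t < t_end:
        m += b_fn(t) * (a_fn(t) - m) * dt
        t += dt
    return m

# With constant a(t) = a and b(t) = b, Equation (4) has the closed-form
# Goel-Okumoto solution m(t) = a(1 - e^{-bt}); a and b here are hypothetical.
a, b = 100.0, 0.1
numeric = solve_mean_value(lambda t: a, lambda t: b, t_end=10.0)
closed = a * (1.0 - math.exp(-b * 10.0))
print(numeric, closed)  # the two values agree closely
```

The same integrator applies unchanged to the η-scaled and m(t)-scaled variants below, since only the right-hand side of the differential equation changes.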
In addition, multiplying Equation (4) by m(t) again yields a software reliability model assuming dependent failures, in which software failures at the previous time point affect failures at the next time point [14]:

dm(t)/dt = b(t) m(t)[a(t) − m(t)]. (6)

Arranging each differential equation in terms of m(t) according to the conditions in Equations (4)-(6) yields a software reliability model satisfying each condition. The resulting models are presented in Table 1. Models 1 to 6 are software reliability models obtained from the most basic form of the differential equation; models 7 and 8 assume uncertain operating environments; model 9 assumes dependent failures; and model 10 assumes dependent failures occurring in an uncertain operating environment.

Deep Neural Network
An artificial neural network is based on the biological structure of the human brain and consists of nonlinear neurons. Just as humans think and judge based on information received through the five senses as it moves from one neuron to the next, the neurons (nodes) in each layer of an artificial neural network are its basic information-processing units. The network consists of three kinds of layers: input, output, and hidden. The input layer plays the role of the information humans obtain through their senses, whereas the output layer corresponds to the resulting thought or judgment. The hidden layers in between learn by receiving signals from the previous layer and passing them to the next layer through an activation function. A deep neural network (DNN) is an artificial neural network with many hidden layers [22,23]. When data enter at the input layer, each hidden layer combines the input values, weights, and bias, applies an activation function, and passes the result onward until it reaches the output layer.
Equation (7) expresses the pass from one layer to the next: the input to a layer is the sum of the previous layer's transformed values z multiplied by the weights, plus the bias,

z^(l+1) = W^(l) z^(l) + b^(l), (7)

where z^(l) is the activated output of layer l, W^(l) the weights, and b^(l) the biases.
For each layer, the activation function applied when moving to the next layer is typically the sigmoid, hyperbolic tangent, or ReLU function. Because the result passed to the final output layer is a continuous variable, an identity function was used as its activation.
The final predicted value is obtained through the above process. The difference between the predicted value y and the actual value t is measured by a loss function; to minimize this difference, each weight and bias is updated toward the minimum based on the derivative of the loss with respect to it.
Updating a weight means adjusting it by the learning rate (α) times the partial derivative of the loss function with respect to that weight, computed using the backpropagation algorithm. Through this series of processes, the deep neural network learns and outputs predicted values that approximate the actual values.
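The forward pass, loss, and backpropagation update described above can be sketched in a few lines of NumPy. The network size, learning rate, and synthetic saturating target curve (standing in for normalized cumulative failures) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: time indices as input, a saturating curve as target,
# standing in for normalized cumulative failure counts.
x = np.linspace(0, 1, 20).reshape(-1, 1)
t = 1.0 - np.exp(-3.0 * x)

W1, b1 = rng.normal(0, 0.5, (1, 8)), np.zeros(8)   # one hidden layer of 8 nodes
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)
alpha = 0.1                                        # learning rate

for _ in range(2000):
    # forward pass: combine inputs, weights, bias; ReLU hidden, identity output
    h = np.maximum(0.0, x @ W1 + b1)
    y = h @ W2 + b2
    # backpropagation of the squared-error loss
    dy = 2.0 * (y - t) / len(x)
    dW2, db2 = h.T @ dy, dy.sum(0)
    dh = (dy @ W2.T) * (h > 0)
    dW1, db1 = x.T @ dh, dh.sum(0)
    # gradient step: each parameter moves by -alpha times its partial derivative
    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= alpha * g

print(np.mean((y - t) ** 2))  # training loss after the updates
```

The loop is exactly the cycle described in the text: forward pass, loss, partial derivatives via backpropagation, and a learning-rate-scaled update.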

Recurrent Neural Network
The recurrent neural network is a model specialized for sequential data structures. It has cells that can remember the information of a previous point in the hidden layer of the most basic deep neural network. It is therefore suitable for time-series data, because it is designed to remember information from the past and let it influence future events. Figure 1a shows the RNN model, and its formula is given in Equation (10) [24].
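A single step of the vanilla RNN recurrence behind Equation (10) can be sketched as follows; the hidden state carries the memory of previous time points forward. The dimensions and weight values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """One recurrence of a vanilla RNN: h_t = tanh(x_t Wx + h_{t-1} Wh + b)."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

# Hypothetical sizes: 1 input feature, 4 hidden units.
Wx = rng.normal(0, 0.5, (1, 4))
Wh = rng.normal(0, 0.5, (4, 4))
b = np.zeros(4)

h = np.zeros(4)                                   # initial hidden state
for x_t in np.array([[0.1], [0.2], [0.3]]):       # a short failure-count sequence
    h = rnn_step(x_t, h, Wx, Wh, b)               # state carries past information
print(h.shape)
```

Each new input is combined with the previous hidden state, which is how past time points influence the prediction at the current one.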

The RNN's network structure reflects past time; however, it has a vanishing gradient problem that prevents it from accurately reflecting information from long ago as the sequence grows longer [25]. To address this problem, the LSTM, which has a structure that can learn long-term dependence, was built. It contains the same hidden layer as the RNN structure, but its internal structure differs. In the LSTM, a memory cell c exists, along with a forget gate that determines how much previous information is retained, an input gate that determines how much of the new information g is reflected, and an output gate that determines how much information is passed to the next layer. The activation functions used here are the sigmoid function and the hyperbolic tangent.
Information about the past is delivered by passing through the three gates and the memory cell, which resolves the vanishing gradient problem. Each gate follows Equation (11). Figure 1b shows the LSTM model [26].
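One LSTM step, with the forget, input, and output gates and the memory cell described above, can be sketched as follows; the layout of the stacked gate parameters and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b hold the f, i, g, o gate parameters stacked."""
    H = h_prev.size
    z = x_t @ W + h_prev @ U + b
    f = sigmoid(z[0:H])            # forget gate: how much of c_{t-1} to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information enters
    g = np.tanh(z[2 * H:3 * H])    # candidate new information
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory is exposed
    c = f * c_prev + i * g         # memory cell update
    h = o * np.tanh(c)             # hidden state passed to the next layer
    return h, c

rng = np.random.default_rng(2)
H = 4
W = rng.normal(0, 0.5, (1, 4 * H))
U = rng.normal(0, 0.5, (H, 4 * H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x_t in np.array([[0.1], [0.4], [0.2]]):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The memory cell c is updated additively (f·c + i·g), which is what lets gradients flow across long time spans.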
The LSTM can solve the vanishing gradient problem of the RNN; however, at each step the LSTM performs four operation processes where the RNN performs one. Processing big data may therefore be inefficient, because the number of computations increases fourfold. Complementing this, the gated recurrent unit (GRU) transmits a value to the next layer through three calculation processes per layer. The GRU integrates the forget and input gates of the LSTM into an update gate and replaces the output gate with a reset gate, which determines the amount of previous information to be forgotten; the update gate determines how long the previous information should be kept. The network learns while transferring values to the next layer via Equation (12). Figure 1c shows the GRU model [27].
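One GRU step, with the update and reset gates described above (the three-computation counterpart of the LSTM's four), can be sketched as follows; all parameter shapes and values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wg, Ug, bg, Wc, Uc, bc):
    """One GRU step (Equation (12)): update gate z, reset gate r,
    and a candidate state built from the reset-scaled previous state."""
    H = h_prev.size
    g = sigmoid(x_t @ Wg + h_prev @ Ug + bg)
    z, r = g[:H], g[H:]                              # update and reset gates
    h_tilde = np.tanh(x_t @ Wc + (r * h_prev) @ Uc + bc)
    return (1.0 - z) * h_prev + z * h_tilde          # blend old state and candidate

rng = np.random.default_rng(3)
H = 4
Wg, Ug, bg = rng.normal(0, 0.5, (1, 2 * H)), rng.normal(0, 0.5, (H, 2 * H)), np.zeros(2 * H)
Wc, Uc, bc = rng.normal(0, 0.5, (1, H)), rng.normal(0, 0.5, (H, H)), np.zeros(H)

h = np.zeros(H)
for x_t in np.array([[0.1], [0.4], [0.2]]):
    h = gru_step(x_t, h, Wg, Ug, bg, Wc, Uc, bc)
print(h.shape)
```

The update gate z interpolates between keeping the previous state and adopting the candidate, merging the roles of the LSTM's forget and input gates in a single computation.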
The software reliability model proposed in this study adds deep neural network layers to the prediction model, which reflects past time points through the hidden layer of the recurrent neural network, in order to add training depth. It is a learning model that passes the reflected past through deeper hidden layers, rather than merely reflecting past time as the RNN family does. This allows us to propose a generalizable software reliability model, overcoming the problem that previously developed models suit only special cases. Among the deep learning software reliability models, the DNN has three hidden layers and the recurrent models have two. The number of nodes in each hidden layer was set to the number of data points; given the data, the number of nodes should be limited in this way to avoid estimating too many parameters relative to the data.

The parameter optimization method used in the deep learning software reliability models is Adam, a deep learning optimization method that combines the momentum method, which uses the acceleration of previous training, and the RMSprop method, in which the learning rate changes with each training step instead of remaining constant. Let β1 and β2 be the exponential moving average decay rates of momentum and RMSprop. After obtaining the momentum and RMSprop parameters u and s, the weights are updated by the learning rate α. In this study, to optimize the hyper-parameters, we set a range of values and iterated to find the optimal results: the learning rate α was increased in 5-times steps from 0.0000001 to 0.01, with β1 and β2 set to 0.9 and 0.99, respectively. Figure 2 shows the software reliability model that combines recurrent neural network types and deep neural networks.
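The Adam update and the learning-rate grid described above can be sketched as follows. The quadratic toy objective is purely illustrative; β1 = 0.9 and β2 = 0.99 follow the text.

```python
import numpy as np

def adam_update(w, grad, u, s, t, alpha, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam step: u and s are exponential moving averages of the gradient
    (momentum) and squared gradient (RMSprop); t is the step count."""
    u = beta1 * u + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    u_hat = u / (1 - beta1 ** t)          # bias-corrected estimates
    s_hat = s / (1 - beta2 ** t)
    return w - alpha * u_hat / (np.sqrt(s_hat) + eps), u, s

# Toy objective (w - 3)^2, just to show the update converging.
w, u, s = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, u, s = adam_update(w, 2.0 * (w - 3.0), u, s, t, alpha=0.05)
print(w)  # approaches the minimizer 3

# Learning-rate grid as in the text: from 0.0000001 to 0.01 in 5-times steps.
rates = []
alpha = 1e-7
while alpha <= 1e-2:
    rates.append(alpha)
    alpha *= 5.0
print(len(rates))  # 8 candidate rates
```

In the paper's setting each candidate rate would be used to train the network and the best result kept; here the grid is only enumerated.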


Data Information
The datasets used in this study are the software failure data from PLC4X (https://plc4x.apache.org, accessed on 7 February 2022) and Apache Camel (http://camel.apache.org, accessed on 7 February 2022). Dataset 1 is from PLC4X, a set of libraries for communicating with industrial programmable logic controllers (PLCs) using a variety of protocols with a shared API; it records the cumulative number of failures occurring between December 2017 and January 2022. Dataset 2 is from Apache Camel, an open-source integration framework for quickly and easily integrating systems that consume or produce data; it records the cumulative number of failures between April 2011 and January 2022. Tables A1 and A2 show Datasets 1 and 2. The software uses Apache IoTDB (Database for the Internet of Things), a native IoT database with high performance for data management and analysis, deployable on the edge and in the cloud. Errors such as bugs, code modifications, and database management errors were recorded; the data log defects and failures caused by bugs, new features, improvements, and tasks during operation of the Apache IoTDB and are used to estimate and predict the software reliability models. For both Datasets 1 and 2, 80% and 90% of the data were used as training data for training and parameter estimation, respectively. The remaining 20% and 10% were used to compare predictions by substituting them into the trained and estimated models.
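The chronological 80%/90% split described above can be sketched as follows; the failure counts are hypothetical stand-ins for Tables A1 and A2.

```python
import numpy as np

# Hypothetical cumulative-failure series standing in for Tables A1/A2.
failures = np.array([1, 3, 4, 7, 9, 12, 15, 16, 19, 23,
                     25, 28, 30, 33, 36, 38, 41, 43, 47, 50])

def split(series, train_frac):
    """Chronological split: the first train_frac of points train the model;
    the rest are held out for prediction, as with the 80%/90% splits here."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

train80, test20 = split(failures, 0.8)
train90, test10 = split(failures, 0.9)
print(len(train80), len(test20), len(train90), len(test10))  # 16 4 18 2
```

The split is chronological rather than random because the held-out points must lie in the future of the training window for the prediction comparison to be meaningful.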

Criteria
The software reliability models based on NHPP and deep learning were compared using 11 criteria derived from the difference between the actual and predicted values. The mean squared error (MSE) and mean absolute error (MAE) are the sums of the squared distances and the absolute distances, respectively, between the estimated and actual values, divided by the difference between the number of observations and the number of parameters [28,29].
The predictive ratio risk (PRR) divides the distance between the actual and predicted values by the predicted value, and the predictive power (PP) divides it by the actual value [30].
R2 is the coefficient of determination of the regression and measures explanatory power while accounting for the number of parameters [31].
The predicted relative variation (PRV) is the standard deviation of the prediction bias, where the bias is ∑ⁿᵢ₌₁ (m(tᵢ) − yᵢ)/n [32]. The root mean square prediction error (RMSPE) estimates how closely the model predicts the observations [32]. The mean error of prediction (MEOP) sums the absolute deviations between the actual data and the estimated curve [29]. The Theil statistic (TS) is the average percentage of deviation over all periods with respect to the actual values [29]. Finally, the penalty criterion (PC) increases the penalty when a parameter is added to the model for a small sample [33].
The preSSE (predicted sum of squared errors) is the sum of the squared differences between the predicted and actual values and indicates the accuracy of estimation [34].
Based on the above criteria, we compared the software reliability models using NHPP with those using deep learning. For a fair comparison, the number of parameters appearing in the criteria was set to zero for the deep learning models. The closer R2 is to 1, the better the result; for the other 10 criteria, the closer to 0, the better. Goodness of fit was compared by computing the criteria using R and MATLAB R2018b.
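Several of the criteria above can be computed as follows. The formulas follow the standard definitions cited in the text [28-32]; the sample values are hypothetical, and criteria with paper-specific penalty forms (PC, preSSE) are omitted from the sketch.

```python
import numpy as np

def criteria(y, m, p=0):
    """Fit criteria from the Criteria section; p is the parameter count
    (set to 0 for the deep learning models, as in the text)."""
    y, m = np.asarray(y, dtype=float), np.asarray(m, dtype=float)
    n = len(y)
    e = m - y                                   # prediction error at each point
    out = {
        "MSE":  np.sum(e ** 2) / (n - p),
        "MAE":  np.sum(np.abs(e)) / (n - p),
        "PRR":  np.sum((e / m) ** 2),           # error scaled by predicted value
        "PP":   np.sum((e / y) ** 2),           # error scaled by actual value
        "R2":   1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2),
        "MEOP": np.sum(np.abs(e)) / (n - p + 1),
        "TS":   100 * np.sqrt(np.sum(e ** 2) / np.sum(y ** 2)),
    }
    bias = e.mean()
    out["PRV"] = np.sqrt(np.sum((e - bias) ** 2) / (n - 1))
    out["RMSPE"] = np.sqrt(out["PRV"] ** 2 + bias ** 2)
    return out

y = [2, 5, 9, 14, 20]          # hypothetical actual cumulative failures
m = [2.3, 4.6, 9.4, 13.5, 20.6]  # hypothetical model predictions
print(criteria(y, m))
```

A perfect prediction gives MSE, MAE, TS, PRV, and RMSPE of 0 and R2 of 1, matching the "closer to 0 / closer to 1" reading rule stated above.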

Results of Dataset 1
Using Dataset 1, Table 2 shows the parameter estimation results of the NHPP software reliability models and the structure of the software reliability models using deep learning. Dataset 1 was divided into 80% and 90% subsets, and parameter estimation and model fitting were performed on each. The deep learning models consisted of three hidden layers, with the number of nodes in each layer set according to the size of the training dataset. For the recurrent-type models, two deep neural network layers were added after the recurrent hidden layer. Adam was used as the optimization method, with the learning rate set to 0.005 for the DNN and 0.0001 for the RNN, LSTM, and GRU.

Table 4 lists the criterion values using 90% of Dataset 1. MSE, RMSPE, TS, and PC were lowest for GRU at 1.9961, 1.413, 4.1991, and 16.1831, respectively, and PRV was smallest for LSTM at 0.1251. PRR and PP were lowest for UDPF at 0.6386 and 0.9113, and MAE and MEOP were 1.3714 and 1.3409 for DPF, respectively. The R2 value was largest for GRU at 0.9948; overall, GRU showed the best results on 5 of the 10 criteria. In preSSE, the TC, DS, and Vtub models showed very small values of 6.367, 6.537, and 7.340, respectively; among the deep learning models, RNN showed the smallest value at 11.397, followed by LSTM at 11.755, so on this criterion the NHPP software reliability models showed the better results.

Using Dataset 2, Table 5 shows the parameter estimation results of the NHPP software reliability models and the structure of the deep learning models. As with Dataset 1, Dataset 2 was divided into 80% and 90% subsets, and parameter estimation and model fitting were performed on each.
The structure of the deep learning models consists of three hidden layers, as for Dataset 1, with the number of nodes in each layer set according to the size of the training dataset. The learning rates were set to 0.000001 for the DNN and 0.00001 for the RNN, LSTM, and GRU.



Confidence Interval
In this study, the GRU, the proposed model that showed the best results, was used to obtain confidence intervals, and the differences between the predicted and actual data values of the other models were compared. The confidence interval follows Equation (22). Since software failures follow a Poisson distribution, we take both the mean and the variance to be m(t). z_α is defined as the 100(1 − α)/2 percentile of the standard normal distribution [35], and a 95% confidence interval was used:

m̂(t) ± z_α √(m̂(t)). (22)

Tables A3 and A4 show the predicted values and corresponding confidence intervals for the remaining 20% and 10% of the data, based on the models estimated and trained on 80% and 90% of Dataset 1. The bolded values indicate actual data values that do not fall within the prediction intervals. For 80% of Dataset 1, the NHPP software reliability models other than PNZ, PZ, and Vtub, together with the software reliability models using deep learning, predicted well within the 95% confidence interval. For 90% of Dataset 1, the NHPP models other than PNZ, PZ, TP, and UDPF, together with the deep learning models, performed well within the 95% confidence interval.
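The Poisson-based interval of Equation (22) can be computed as follows; the predicted mean value m̂(t) = 50 is a hypothetical example.

```python
import math
from statistics import NormalDist

def poisson_ci(m_hat, alpha=0.05):
    """Equation (22): m(t) +/- z_alpha * sqrt(m(t)), using the Poisson
    property that the mean and variance are both m(t)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # 100(1 - alpha)/2 percentile
    half = z * math.sqrt(m_hat)
    return m_hat - half, m_hat + half

lo, hi = poisson_ci(50.0)                     # hypothetical predicted m(t)
print(round(lo, 2), round(hi, 2))
```

A predicted point is flagged (bolded in Tables A3-A6) whenever the actual cumulative failure count falls outside the interval (lo, hi) computed this way.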
Tables A5 and A6 show the predicted values and corresponding confidence intervals for the remaining 20% and 10% of the data, based on the models estimated and trained on 80% and 90% of Dataset 2. For 80% of Dataset 2, the GO, PNZ, and TP models never contained the actual data within the prediction confidence interval, whereas the DS model contained the actual data within the 95% prediction interval up to the 108th time point, the YID, TC, DPF, and UDPF models up to the 109th, and the PZ and Vtub models up to the 110th. For 90% of Dataset 2, among the NHPP models, the PZ, Vtub, DPF, and UDPF models contained the actual data within the 95% prediction interval up to the 118th time point; the remaining NHPP models never contained the actual data within the prediction interval. The software reliability models using deep learning performed well, with predictions within the 95% confidence interval. Apart from the cases in Tables A3-A6 that fall outside the 95% confidence intervals, the remaining models followed the data trend well, and the deep learning models showed good estimation and prediction results in following the data trend.

Conclusions
In this study, software reliability models using NHPP and deep learning were compared. The deep learning models comprised deep neural networks and recurrent networks (RNN, LSTM, and GRU) suited to time-series data, constructed by adding deep hidden layers after the recurrent hidden layer. Using Datasets 1 and 2, these models showed better estimation and predictive power than the existing software reliability models. Among them, the recurrent-type models showed better results, and when fitted to the 80% and 90% datasets, the model with deep neural network layers added to the GRU performed best. The recurrent models also showed good results when their predictions were compared against the held-out data. The NHPP software reliability models fit the data relatively well when the number of time points was small; however, their error increased significantly as the number of time points grew. In contrast, the models using deep learning followed the data trend well even when the number of time points was large.
However, software reliability models using deep learning always face the problem of overfitting. To address this, we plan to study software reliability models that graft mathematical and statistical assumptions onto the NHPP software reliability models. Through this, we aim to create a stable, data-dependent, and complementary model with a firmer mathematical basis.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data is available in a publicly accessible repository.