Modeling of software fault detection and correction processes with fault dependency

Software reliability modeling has undergone a continuous evolution over the past three decades to adapt to various and ever-changing software testing environments. In existing models, immediate fault removal and fault independency are two basic and commonly used assumptions. Recently, models combining the fault detection process (FDP) and the fault correction process (FCP) were proposed to alleviate the immediate fault removal assumption. In this paper, we extend such a methodology by proposing a modeling framework for the FDP and FCP incorporating fault dependency. Faults are classified as leading faults and dependent faults, and the FCPs for both types of faults are explicitly modeled. Several paired models considering different assumptions for debugging lags are proposed for the combined FDP and FCP. The applicability of the proposed models is illustrated using real testing data. In addition, the optimal software release policy under this framework is studied.


Introduction
Software today plays important roles in almost every sector of our society, and software reliability has been a major concern in many integrated systems [3]. With continuous debugging, analysis and correction, software reliability grows gradually with testing [33]. During the past three decades, numerous software reliability growth models (SRGMs) have been proposed [2,7,24,26,35,40,41]. Among these models, non-homogeneous Poisson process (NHPP) models are the most commonly accepted [20,30,36,39,50]. Although NHPP models are mathematically tractable, they are developed under some strong assumptions on the software testing process. Specifically, NHPP models assume immediate fault removal and fault independency. To adapt to different practical software testing environments, generalizations of traditional models that relax these assumptions have been proposed [5,9,17,23,28,29].
In practical software testing, each detected fault has to be reported, diagnosed, removed and verified before it can be noted as corrected. Consequently, the time spent on fault correction is not negligible. In fact, this debugging lag can be an important element in making decisions [16,49]. Therefore, it is necessary to incorporate the debugging lag into the modeling framework, i.e., to model both the fault detection process (FDP) and the fault correction process (FCP). The idea of modeling the FCP was first proposed in Schneidewind [34], where a constant lag was used to model the FCP after fault detection. Clearly, the constant correction time assumption is restrictive for various types of faults and different correction profiles. For instance, data collected from practical testing projects show that the correction time can be fitted by the exponential and log-normal distributions [27]. In addition, the correction time may show a growing trend during the whole testing cycle, as later detected faults can be more difficult to correct. Some extensions were made in Lo and Huang [25] and Xie, et al. [44] by incorporating other assumptions of debugging delay. Hu, et al. [8] studied a data-driven artificial neural network model for the prediction of FDP and FCP. Reference [37] used the fault detection/correction profile to quantify the maintainability of software. Some paired FDP and FCP models were proposed in Peng, et al. [31], where the testing effort function and fault introduction were included.
Traditional NHPP models assume statistical independence between successive software failures. This can hardly be true in practice, as some faults are not detectable until some other fault has been corrected because of logical dependency. Moreover, the common practice of mixing testing strategies can lead to the dependency of failures [6]. With a failure detected, there is a higher chance for another related failure or a cluster of failures to occur in the near future. From this point of view, faults can be classified into mutually independent and dependent types with respect to the path-based testing approach. This issue was addressed in [18], where an extended NHPP SRGM was proposed. Huang and Lin [11] studied the fault detection & correction process considering both fault dependency and debugging lags. Yang, et al. [46] discussed the statistical inference of the software reliability model with fault dependency. However, most of the studies focus only on the FDP, and only the FDP data are used for model parameter estimation. As a result, the information collected from the FCP is neglected, which can lead to deficiency in model estimation.
To remedy the problem, we incorporate fault dependency into the paired FDP and FCP model. Instead of assuming a single type of fault, this study classifies the faults in the testing process into leading faults and dependent faults. The leading faults occur independently following an NHPP, while the dependent faults are only detectable after the related leading faults are corrected. Different from Huang and Lin [11], which modeled the FDP and the FCP as a single, synthesized fault detection & correction process, we model the FDP and FCP for the leading faults and the dependent faults separately. Subsequently, the FDP&FCP model for the aggregated, observable faults can be readily obtained. With different formulations of debugging delays, we can derive various FDP&FCP models. Hence, the proposed models admit a wide applicability that can account for different software reliability growth schemes.
The rest of this paper is organized as follows. Section 2 formulates the general modeling framework of paired FDP and FCP with the incorporation of fault dependency. In Section 3, special paired FDP and FCP models are derived based on different assumptions for debugging lags. In Section 4, the proposed models are fitted to two real datasets to illustrate the application. Section 5 derives the optimal software release policy under the proposed framework. The conclusion is given in Section 6.

Notation
$a$: The total number of faults in the software
$a_1$: The number of leading faults in the software
$a_2$: The number of dependent faults in the software
$p$: The ratio of the number of leading faults to the total number of faults

The general framework
In this study, we formulate the fault-oriented software testing process as a paired fault detection and correction process. During the test, a fault can only be corrected after being detected. The faults embedded in the software system can be categorized into leading faults and dependent faults. The faults that can be detected and corrected independently are defined as leading faults or independent faults. Other faults that remain undetectable until the corresponding leading faults are removed are defined as dependent faults. Fig. 1 illustrates the relationship between leading faults and dependent faults.
Suppose the leading faults are detected and corrected independently. Then, for the leading faults, their detection process (FDP_L) and correction process (FCP_L) can be modeled by NHPP models, as in Xie, et al. [44]. For the dependent faults, their detection process (FDP_D) can be modeled as a delayed process of FCP_L, considering that they are only detectable after the corresponding leading faults are corrected. Consequently, the correction process for dependent faults (FCP_D) can be modeled as a delayed process of FDP_D. The modeling framework is characterized by the mean value function for each sub-process.

Modeling FDP_L
We assume that FDP_L follows an NHPP, and that the expected number of leading faults detected during $(t, t+\Delta t]$ is proportional to the number of undetected leading faults at time $t$. Thus we have:

$$\frac{dm_{d1}(t)}{dt} = b(t)\,[a_1 - m_{d1}(t)], \tag{1}$$

where $b(t)$ is the fault detection rate at time $t$ and $a_1$ is the number of leading faults at the beginning. With the initial condition $m_{d1}(0)=0$, it can be derived from (1) that:

$$m_{d1}(t) = a_1\left[1 - \exp\left(-\int_0^t b(s)\,ds\right)\right]. \tag{2}$$

Different $m_{d1}(t)$ can be obtained based on different $b(t)$. Specifically, when $b(t)=b$ is a constant, we have:

$$m_{d1}(t) = a_1\left(1 - e^{-bt}\right), \tag{3}$$

which is the G-O model [4]. When $b(t) = b^2 t/(1+bt)$, we have:

$$m_{d1}(t) = a_1\left[1 - (1+bt)\,e^{-bt}\right], \tag{4}$$

which has the same form as the Yamada delayed S-shaped model [45]. FCP_L can be regarded as a delayed process of FDP_L, and different models can be used to accommodate the debugging delay. Xie, et al. [44] pointed out that debugging lags could be assumed constant, time dependent or random. If the debugging lag is not random, FCP_L can be derived from FDP_L as $m_{r1}(t) = m_{d1}(t - \delta(t))$. In particular, if the debugging lag is assumed to be an exponentially distributed random variable, i.e., $\delta(t) \sim \mathrm{Exp}(c)$, we have:

$$m_{r1}(t) = E\left[m_{d1}(t-\delta)\right] = \int_0^t m_{d1}(t-x)\, c\,e^{-cx}\,dx. \tag{5}$$

Taking the derivatives of both sides with respect to $t$, we obtain:

$$\frac{dm_{r1}(t)}{dt} = c\,[m_{d1}(t) - m_{r1}(t)]. \tag{6}$$

This implies that the expected number of faults corrected during $(t,t+\Delta t]$ is proportional to the number of detected but uncorrected faults at time $t$. We call $c$ the fault correction rate.
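The leading-fault models can be checked numerically. The following is a minimal sketch, assuming a constant detection rate and the standard closed-form solution of the linear relation "correction rate = c × (detected − corrected)" with G-O detection; the function names and parameter values are illustrative and not taken from the paper.

```python
import math

def m_d1_go(t, a1, b):
    """G-O mean value function for leading-fault detection (constant b)."""
    return a1 * (1.0 - math.exp(-b * t))

def m_d1_delayed_s(t, a1, b):
    """Delayed S-shaped form, obtained with b(t) = b^2 t / (1 + b t)."""
    return a1 * (1.0 - (1.0 + b * t) * math.exp(-b * t))

def m_r1_exp_lag(t, a1, b, c):
    """Closed-form solution of m_r1' = c (m_d1 - m_r1), m_r1(0) = 0,
    for G-O detection and an Exp(c) debugging lag (requires b != c)."""
    return a1 * (1.0 - (c * math.exp(-b * t) - b * math.exp(-c * t)) / (c - b))

def m_r1_exp_lag_euler(t, a1, b, c, steps=20000):
    """Forward-Euler integration of the same ODE, used only as a cross-check."""
    h = t / steps
    m = 0.0
    for k in range(steps):
        m += h * c * (m_d1_go(k * h, a1, b) - m)
    return m

# Illustrative values only (assumptions, not fitted parameters).
a1, b, c = 100.0, 0.02, 0.05
closed = m_r1_exp_lag(60.0, a1, b, c)
numeric = m_r1_exp_lag_euler(60.0, a1, b, c)
print(closed, numeric)
```

The two values agree up to the discretization error of the Euler scheme, and the corrected count always stays below the detected count, as expected for a delayed process.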

Modeling FDP_D and FCP_D
The dependent faults can only be detected after the corresponding leading faults are removed. Hence, the proportion of detectable dependent faults among the dependent faults is equal to the proportion of corrected leading faults among the leading faults. Suppose the number of dependent faults is $a_2$. Then, the expected number of detectable dependent faults up to time $t$ is $a_2 m_{r1}(t)/a_1$. Furthermore, because leading faults and dependent faults are detected under the same testing environment, it is reasonable to assume that the fault detection rate for dependent faults is the same as that for leading faults. Therefore:

$$\frac{dm_{d2}(t)}{dt} = b(t)\left[\frac{a_2}{a_1}\,m_{r1}(t) - m_{d2}(t)\right]. \tag{7}$$

With the initial condition $m_{d2}(0)=0$, we can derive from (7), based on $m_{r1}(t)$ and $b(t)$, that:

$$m_{d2}(t) = \frac{a_2}{a_1}\,e^{-B(t)}\int_0^t b(s)\,m_{r1}(s)\,e^{B(s)}\,ds, \qquad B(t)=\int_0^t b(u)\,du. \tag{8}$$

In particular, when $b(t)=b$, we have:

$$m_{d2}(t) = \frac{a_2\,b}{a_1}\,e^{-bt}\int_0^t e^{bs}\,m_{r1}(s)\,ds. \tag{9}$$

Based on the detection process of dependent faults, the corresponding correction process can be obtained as a delayed process, as for leading faults. Thus, with different assumptions for the debugging delay, $m_{r2}(t)$ of FCP_D can be derived accordingly.
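For a constant detection rate, the dependent-fault detection curve is an exponentially weighted integral of the leading-fault correction curve, which is easy to evaluate numerically for any choice of lag model. The sketch below assumes that integral form and uses a trapezoidal rule; `m_d2_constant_b` and the zero-lag test case are hypothetical helpers introduced only for illustration.

```python
import math

def m_d2_constant_b(t, a1, a2, b, m_r1, steps=4000):
    """Evaluate (a2*b/a1) * integral_0^t exp(-b (t - s)) m_r1(s) ds
    with the composite trapezoidal rule. m_r1 is any callable giving
    the expected number of corrected leading faults at time s."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        # exp(-b (t - s)) equals exp(-b t) * exp(b s) but avoids overflow
        total += weight * math.exp(-b * (t - s)) * m_r1(s)
    return (a2 * b / a1) * h * total

# Sanity check with a zero debugging lag (an assumption made only here):
# then m_r1 = m_d1 and the integral has the delayed S-shaped closed form.
a1, a2, b = 40.0, 60.0, 0.02
zero_lag = lambda s: a1 * (1.0 - math.exp(-b * s))
value = m_d2_constant_b(100.0, a1, a2, b, zero_lag)
expected = a2 * (1.0 - (1.0 + b * 100.0) * math.exp(-b * 100.0))
print(value, expected)
```

Passing the correction curve as a callable keeps the routine independent of the particular debugging-lag assumption.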

Combined models
With the FDP and FCP models for both kinds of faults, the aggregated model for the paired FDP&FCP can be readily obtained:

$$m_d(t) = m_{d1}(t) + m_{d2}(t), \qquad m_r(t) = m_{r1}(t) + m_{r2}(t). \tag{12}$$

Specific models for dependent FDP and FCP
In this section, we consider the widely used constant fault detection rate function, i.e., $b(t)=b$ [10,22]. In this case, we have $m_{d1}(t) = pa\left(1 - e^{-bt}\right)$, where $p=a_1/a$ is the proportion of leading faults. As stated, different $m_{r1}(t)$ can be derived based on different assumptions on the debugging lag. Moreover, as long as $m_{r1}(t)$ is specified, $m_{d2}(t)$ can be obtained according to (9). In the following, we consider three different types of debugging lags, which have been observed in practical testing processes. Correspondingly, specific paired FDP&FCP models are derived.

Constant debugging lag
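We first consider the case where the correction of each fault takes the same time, i.e., $\delta(t)=\delta$. Then, the FCP model of leading faults is:

$$m_{r1}(t) = m_{d1}(t-\delta) = pa\left[1 - e^{-b(t-\delta)}\right], \quad t \ge \delta, \tag{14}$$

with $m_{r1}(t)=0$ for $t<\delta$. Consequently, the FDP model for dependent faults can be derived according to (9):

$$m_{d2}(t) = (1-p)a\left[1 - \big(1 + b(t-\delta)\big)e^{-b(t-\delta)}\right], \quad t \ge \delta.$$

Based on the FDP models of the leading faults and the dependent faults, $m_d(t)$ for the aggregated FDP is obtained as:

$$m_d(t) = pa\left(1 - e^{-bt}\right) + (1-p)a\left[1 - \big(1 + b(t-\delta)\big)e^{-b(t-\delta)}\right], \quad t \ge \delta.$$

Because the FCP models for both kinds of faults are modeled as delayed FDP, the aggregated FCP model is:

$$m_r(t) = m_d(t-\delta) = pa\left[1 - e^{-b(t-\delta)}\right] + (1-p)a\left[1 - \big(1 + b(t-2\delta)\big)e^{-b(t-2\delta)}\right], \quad t \ge 2\delta. \tag{16}$$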
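The constant-lag model can be written down in a few lines and checked against the defining relation for dependent faults. This is a sketch under the assumption of a constant detection rate, with the closed forms obtained by carrying out the integral for dependent-fault detection by hand; the parameter values are the ones the paper reports later for dataset 1, and the function names are illustrative.

```python
import math

def m_d1(t, a, b, p):
    """Leading-fault detection (G-O form with a1 = p * a)."""
    return p * a * (1.0 - math.exp(-b * t)) if t > 0 else 0.0

def m_r1(t, a, b, p, delta):
    """Leading-fault correction: detection curve shifted by the lag."""
    return m_d1(t - delta, a, b, p)

def m_d2(t, a, b, p, delta):
    """Dependent-fault detection; zero until the first correction at delta."""
    x = t - delta
    if x <= 0:
        return 0.0
    return (1.0 - p) * a * (1.0 - (1.0 + b * x) * math.exp(-b * x))

def m_d(t, a, b, p, delta):
    """Aggregated detection curve."""
    return m_d1(t, a, b, p) + m_d2(t, a, b, p, delta)

def m_r(t, a, b, p, delta):
    """Aggregated correction curve: aggregated detection shifted by the lag."""
    return m_d(t - delta, a, b, p, delta)

a, b, delta, p = 199.27, 0.00717, 24.78, 0.382
t, h = 200.0, 1e-3
# Finite-difference check of m_d2' = b * ((1 - p) / p * m_r1 - m_d2)
lhs = (m_d2(t + h, a, b, p, delta) - m_d2(t - h, a, b, p, delta)) / (2 * h)
rhs = b * ((1.0 - p) / p * m_r1(t, a, b, p, delta) - m_d2(t, a, b, p, delta))
print(lhs, rhs)
```

Note that the aggregated correction curve stays at zero before the first lag has elapsed, and the dependent-fault part of it only starts after two lags.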

Time-dependent Debugging Lag
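In practice, the faults discovered in the later phase of the testing process may be more difficult to correct. To model such a phenomenon, we assume the debugging lag is dependent on the testing time, $\delta(t) = \ln(1+\gamma t)/b$. Under this assumption, we have:

$$m_{r1}(t) = m_{d1}(t-\delta(t)) = pa\left[1 - (1+\gamma t)\,e^{-bt}\right], \tag{17}$$

which is a general form of the delayed S-shaped NHPP model [45].
Based on (9) and (17), $m_{d2}(t)$ can be derived. Then, $m_d(t)$ for the aggregated FDP is obtained as:

$$m_d(t) = pa\left(1 - e^{-bt}\right) + (1-p)a\left[1 - \left(1 + bt + \tfrac{1}{2}b\gamma t^2\right)e^{-bt}\right]. \tag{18}$$

Because $m_r(t) = m_d(t - \delta(t))$ with $\delta(t) = \ln(1+\gamma t)/b$, the model for the aggregated FCP can be derived as follows:

$$m_r(t) = pa\left[1 - (1+\gamma t)e^{-bt}\right] + (1-p)a\left[1 - \left(1 + b\tau + \tfrac{1}{2}b\gamma\tau^2\right)(1+\gamma t)\,e^{-bt}\right], \qquad \tau = t - \frac{\ln(1+\gamma t)}{b}.$$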
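A short numerical check helps confirm the time-dependent-lag formulas. The sketch below assumes the logarithmic lag ln(1 + γt)/b (the form that turns a G-O detection curve into a generalized delayed S-shaped correction curve) and γ ≤ b so that the lagged time is never negative; names and parameter values are illustrative.

```python
import math

def lag(t, b, gamma):
    """Assumed time-dependent debugging lag, growing slowly with t."""
    return math.log(1.0 + gamma * t) / b

def m_d1(t, a, b, p):
    return p * a * (1.0 - math.exp(-b * t))

def m_r1(t, a, b, p, gamma):
    """Leading-fault correction: generalized delayed S-shaped form."""
    return p * a * (1.0 - (1.0 + gamma * t) * math.exp(-b * t))

def m_d2(t, a, b, p, gamma):
    """Dependent-fault detection obtained by integrating m_r1."""
    poly = 1.0 + b * t + 0.5 * b * gamma * t * t
    return (1.0 - p) * a * (1.0 - poly * math.exp(-b * t))

def m_d(t, a, b, p, gamma):
    return m_d1(t, a, b, p) + m_d2(t, a, b, p, gamma)

def m_r(t, a, b, p, gamma):
    """Aggregated correction: aggregated detection at the lagged time."""
    return m_d(t - lag(t, b, gamma), a, b, p, gamma)

a, b, p, gamma = 150.0, 0.03, 0.4, 0.01   # illustrative, gamma <= b
t, h = 80.0, 1e-3
shifted = m_d1(t - lag(t, b, gamma), a, b, p)
lhs = (m_d2(t + h, a, b, p, gamma) - m_d2(t - h, a, b, p, gamma)) / (2 * h)
rhs = b * ((1.0 - p) / p * m_r1(t, a, b, p, gamma) - m_d2(t, a, b, p, gamma))
print(shifted, m_r1(t, a, b, p, gamma), lhs, rhs)
```

The first pair of printed values shows that shifting the detection curve by the assumed lag reproduces the closed-form correction curve; the second pair checks the dependent-fault relation by finite differences.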

Exponentially distributed random debugging lag
As obtained in Section 2.2, the number of faults corrected during the time interval $(t,t+\Delta t]$ in this case is proportional to the number of detected but uncorrected faults at time $t$. Based on (5), $m_{r1}(t)$ can be obtained as:

$$m_{r1}(t) = pa\left[1 - \frac{c\,e^{-bt} - b\,e^{-ct}}{c-b}\right], \quad b \ne c.$$

Then, $m_{d2}(t)$ can be derived based on $m_{r1}(t)$ according to (9).
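For the exponentially distributed lag, the dependent-fault detection curve can also be integrated in closed form. The sketch below is one such derivation, worked out here under the assumptions of a constant detection rate b, a correction rate c ≠ b, and the same correction rate for both fault types; the aggregated correction curve is obtained numerically from the linear relation between correction rate and detected-but-uncorrected faults. All names and values are illustrative.

```python
import math

def m_d1(t, a, b, p):
    return p * a * (1.0 - math.exp(-b * t))

def m_r1(t, a, b, c, p):
    """Leading-fault correction under an Exp(c) lag (b != c)."""
    return p * a * (1.0 - (c * math.exp(-b * t) - b * math.exp(-c * t)) / (c - b))

def m_d2(t, a, b, c, p):
    """Dependent-fault detection, integrating m_r1 with constant b."""
    k = c - b
    eb, ec = math.exp(-b * t), math.exp(-c * t)
    return (1.0 - p) * a * (1.0 - eb - (b * c * t / k) * eb - (b * b / (k * k)) * (ec - eb))

def m_d(t, a, b, c, p):
    return m_d1(t, a, b, p) + m_d2(t, a, b, c, p)

def m_r(t, a, b, c, p, steps=20000):
    """Aggregated correction via forward Euler on m_r' = c (m_d - m_r)."""
    h = t / steps
    m = 0.0
    for i in range(steps):
        m += h * c * (m_d(i * h, a, b, c, p) - m)
    return m

a, b, c, p = 150.0, 0.02, 0.05, 0.4   # illustrative values only
t, h = 100.0, 1e-3
lhs = (m_d2(t + h, a, b, c, p) - m_d2(t - h, a, b, c, p)) / (2 * h)
rhs = b * ((1.0 - p) / p * m_r1(t, a, b, c, p) - m_d2(t, a, b, c, p))
corrected = m_r(t, a, b, c, p)
print(lhs, rhs, corrected)
```

Because both sub-processes obey the same linear correction relation, integrating the aggregated curve directly is equivalent to summing the two correction curves.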

Numerical example
In this section, we illustrate the application of the proposed models to two real software testing datasets.

Description of the Datasets
The first dataset is from the System T1 data of the Rome Air Development Center (RADC) [27]. This dataset is widely used and it contains both fault detection data and fault correction data. The cumulative numbers of detected faults and corrected faults during the first 21 weeks are shown in Table 1. During the debugging, 300.1 hours of computer time were consumed and 136 faults were removed. The computer time spent in the testing process is used as the time scale for the FDP and FCP.
The second dataset is from the testing process of a middle-size software project [42,44].The cumulative numbers of detected faults and corrected faults during the first 17 weeks are listed in Table 2.

Performance analysis
To illustrate our models, we consider three paired FDP&FCP models: the model with constant debugging lag (M1), the model with time-dependent debugging lag (M2) and the model with exponentially distributed debugging lag (M3). We note that the models proposed in Xie, et al. [44] are special cases of the proposed FDP&FCP models without considering the dependent faults. For comparison purposes, we also fit the data by the three simplified models of M1-M3 with $p=1$, which are abbreviated as M1', M2' and M3', respectively.
The six models are fitted to the two datasets by the least squares method. The least squares method minimizes the mean squared error (MSE) between the estimated cumulative numbers of detected and corrected faults, $m_d(t_i)$ and $m_r(t_i)$, and the observed cumulative numbers of detected and corrected faults, $m_{d,i}$ and $m_{r,i}$, at times $t_i$, $i=1,\ldots,n$. The estimated model parameters for dataset 1 are given in Table 3.
As can be seen from Table 3, the estimates of the parameter $a$ (the total number of faults) in the three proposed models M1-M3 are close to each other. They are all close to 188, which is the number of detected faults after three years' testing, as reported in Kapur and Younes [18]. On the contrary, the models M1'-M3', which assume no dependent faults exist, produce quite large estimates of $a$. Therefore, ignoring the dependent faults in the model would result in an incorrect estimate of the total number of faults.
The MSE values and the point-wise squared errors $MSE_{d,i} + MSE_{r,i}$ in Fig. 2 show that the paired FDP&FCP model with exponentially distributed debugging lag fits the dataset best. On the other hand, the model M1, which assumes a constant debugging lag, also provides a competitive fit. The model assuming time-dependent debugging lags provides the least favorable fit. In fact, according to the estimated model M3, the expected length of the debugging lag is $1/c = 26.08$. This is close to the estimated debugging lag in M1. Thus, we can infer that there are significant debugging lags in the software testing process, and it takes about 25 hours for a detected fault to be corrected.
The estimation results of the six models for dataset 2 are presented in Table 4. Analogous to dataset 1, the proposed models considering both leading and dependent faults are superior to those considering only leading faults. In the fitting procedure, we restrict the total number of faults $a$ to be no smaller than the number of faults in the data. Consequently, the estimates of $a$ are all equal to 144, which is the total number of faults in dataset 2. Among the three models M1-M3, the constant debugging lag model provides the best fit. This can also be noted from the point-wise squared errors in Fig. 3. This indicates that the debugging lag is almost constant in the software testing process.
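The fitting step can be illustrated on synthetic data. The sketch below assumes a joint criterion that sums the squared detection and correction errors and divides by 2n (the normalization is an assumption and does not affect the minimizer), uses the simplified constant-lag model with p = 1, and replaces a proper optimizer with a coarse grid search; the data are generated from known parameters and are not the paper's datasets.

```python
import math

def model_m1_prime(t, a, b, delta):
    """Constant-lag paired model without dependent faults (p = 1)."""
    m_d = a * (1.0 - math.exp(-b * t))
    m_r = a * (1.0 - math.exp(-b * (t - delta))) if t > delta else 0.0
    return m_d, m_r

def mse(params, times, detected, corrected):
    """Joint squared error over both curves; the 1/(2n) factor is assumed."""
    total = 0.0
    for t, obs_d, obs_r in zip(times, detected, corrected):
        fit_d, fit_r = model_m1_prime(t, *params)
        total += (fit_d - obs_d) ** 2 + (fit_r - obs_r) ** 2
    return total / (2 * len(times))

# Synthetic, noise-free observations from known parameters (illustration only).
true_params = (120.0, 0.05, 2.0)
times = list(range(1, 21))
detected, corrected = zip(*(model_m1_prime(t, *true_params) for t in times))

# Coarse grid search standing in for a numerical least-squares routine.
grid_a = [100.0 + 5.0 * i for i in range(9)]
grid_b = [0.01 * i for i in range(1, 11)]
grid_delta = [0.5 * i for i in range(9)]
best = min(
    ((a, b, d) for a in grid_a for b in grid_b for d in grid_delta),
    key=lambda prm: mse(prm, times, detected, corrected),
)
print(best, mse(best, times, detected, corrected))
```

Fitting both curves at once is what lets the correction data inform the estimates; with real data a gradient-based or derivative-free optimizer would replace the grid.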

Software release policy
Based on an SRGM, useful information can be inferred to guide decision-making. For software projects, one critical decision is to determine the optimal release time [12]. Many studies have dealt with this problem [13,19,21,32]; see Jain and Priya [14] and Boland and Chuív [1] for an overview. As cost and reliability requirements are of great concern, they are often used as objectives for optimizing the testing time and release policy [15,38,47,48]. In this section, we study the optimal release policies based on the proposed models from the cost and reliability perspectives.

Software release policy based on reliability criterion
Software is usually released when a reliability target is achieved. It is reasonable to stop testing when a pre-specified proportion of faults has been removed. We use $T$ to denote the length of testing and consider the ratio of cumulative removed faults to the initial faults in the software system as the reliability criterion [11]:

$$R_1(T) = \frac{m_r(T)}{a}.$$

With a given reliability target $R_1$, the time to reach this reliability target is $\min_T\{T : R_1(T) \ge R_1\}$. Another criterion is the software reliability, which is defined as the probability that no failure occurs during the time interval $(T,T+\Delta T]$ given that the software is released at time $T$. Considering that the reliability status of software generally does not change in the operational phase, the reliability function is:

$$R_2(\Delta T \mid T) = \exp\left[-\lambda_d(T)\,\Delta T\right],$$

where $\lambda_d(T)$ is the instantaneous failure intensity at time $T$. For a given target $R_2$ for $R_2(\Delta T \mid T)$, the time for the software to reach $R_2$ can be solved as $\min_T\{T : R_2(\Delta T \mid T) \ge R_2\}$.
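Both reliability criteria can be evaluated by scanning a time grid. The sketch below assumes the constant-lag paired model with a constant detection rate, takes the first criterion as the corrected fraction of all faults and the second as exp(−λ_d(T)·ΔT); the parameter values and the helper `first_time_reaching` are hypothetical.

```python
import math

A, B, DELTA, P = 100.0, 0.02, 5.0, 0.4   # illustrative values only

def m_d(t):
    """Aggregated detection curve under a constant lag."""
    lead = P * A * (1.0 - math.exp(-B * t)) if t > 0 else 0.0
    x = t - DELTA
    dep = (1.0 - P) * A * (1.0 - (1.0 + B * x) * math.exp(-B * x)) if x > 0 else 0.0
    return lead + dep

def lam_d(t):
    """Failure intensity: derivative of the aggregated detection curve."""
    lead = P * A * B * math.exp(-B * t)
    x = t - DELTA
    dep = (1.0 - P) * A * B * B * x * math.exp(-B * x) if x > 0 else 0.0
    return lead + dep

def m_r(t):
    return m_d(t - DELTA)

def r1(t):
    """Fraction of all faults corrected by time t."""
    return m_r(t) / A

def r2(t, dt):
    """Probability of no failure in (t, t + dt] with frozen intensity."""
    return math.exp(-lam_d(t) * dt)

def first_time_reaching(criterion, target, t_max=2000.0, step=0.1):
    """Smallest grid time at which the criterion meets the target."""
    for k in range(int(t_max / step) + 1):
        t = k * step
        if criterion(t) >= target:
            return t
    return None

t_r1 = first_time_reaching(r1, 0.95)
t_r2 = first_time_reaching(lambda t: r2(t, 10.0), 0.95)
print(t_r1, t_r2)
```

A grid scan is used instead of bisection because the second criterion need not be monotone in the testing time.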

Software release policy based on cost criterion
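For a basic FDP model with mean value function $m(t)$, the following cost model is frequently used [43]:

$$C(T) = c_1\,m(T) + c_2\,[m(\infty) - m(T)] + c_3\,T,$$

where $c_1$ is the expected cost of removing a fault during testing, $c_2$ is the expected cost of removing a fault in the field and $c_3$ is the expected cost per unit time of testing. In practice, the cost of removing a fault in the field is generally greater than that during testing, thus we assume $c_2>c_1$.
When the correction process is incorporated, the following cost model can be constructed:

$$C(T) = c_1\,m_r(T) + c_2\,[m_d(\infty) - m_r(T)] + c_3\,T, \tag{28}$$

where $m_r(T)$ is the total number of corrected faults at the time of release $T$, and $m_d(\infty) - m_r(T)$ is the number of uncorrected faults, which includes both the undetected faults $m_d(\infty) - m_d(T)$ and the detected-but-not-corrected faults $m_d(T) - m_r(T)$. By minimizing the cost model with respect to $T$, the optimal release time $T_c$ under the proposed framework can be obtained.
Theorem 1: Under the proposed models in Section 3, the time $T_c$ which minimizes $C(T)$ exists. Specifically, there exist $2k$ ($k \ge 0$) non-negative stationary points $0 < z_1 \le z_2 \le \cdots \le z_{2k} < +\infty$ such that $C(T)$ increases on $[z_{2j}, z_{2j+1})$ for $j=0,\ldots,k$ and decreases on $[z_{2j+1}, z_{2j+2})$ for $j=0,\ldots,k-1$, with $z_0 = 0$ and $z_{2k+1}=+\infty$. The optimal $T_c$ is determined as

$$T_c = \arg\min_{T \in \{z_0, z_2, \ldots, z_{2k}\}} C(T).$$

Proof: We just need to prove that there exists a $T_s$ such that $C'(T)$ is positive for $T > T_s$. From (28), we have:

$$C'(T) = c_3 - (c_2 - c_1)\,\lambda_r(T).$$

Clearly, $C'(0)= c_3 >0$, indicating that $C(T)$ is increasing when $T$ is close to zero. We shall prove that $\lambda_r(T)$ approaches 0 (or $C'(T)$ approaches $c_3$) when $T$ approaches $+\infty$. If so, $C(T)$ is increasing when $T$ is close to 0 or approaches $+\infty$. Consequently, if $C(T)$ has any stationary point, it must have an even number of stationary points $0 < z_1 \le z_2 \le \cdots \le z_{2k} < +\infty$ such that $C(T)$ increases on $[z_{2j}, z_{2j+1})$ for $j=0,\ldots,k$ and decreases on $[z_{2j+1}, z_{2j+2})$ for $j=0,\ldots,k-1$, with $z_0=0$ and $z_{2k+1}=+\infty$. In the following, we shall show that $\lambda_r(T)$ approaches 0 when $T$ approaches $+\infty$ under the three proposed models.
If the paired model under the constant debugging lag assumption is used, from (16) we have, for $T \ge 2\delta$:

$$\lambda_r(T) = pab\,e^{-b(T-\delta)} + (1-p)ab^2\,(T-2\delta)\,e^{-b(T-2\delta)}.$$

When $T$ approaches $+\infty$, $\lambda_r(T)$ approaches 0. For the paired model with time-dependent debugging lags, according to (18) we have:

$$\lambda_d(T) = pab\,e^{-bT} + (1-p)ab\,T\,e^{-bT}\left(b - \gamma + \tfrac{1}{2}b\gamma T\right).$$

It can be seen that $\lambda_d(T)$ approaches 0 when $T$ approaches $+\infty$. Moreover, we have:

$$\lambda_r(T) = \left[1 - \delta'(T)\right]\lambda_d\big(T - \delta(T)\big), \qquad \delta'(T) = \frac{\gamma}{b(1+\gamma T)}.$$

Because $T - \delta(T)$ approaches $+\infty$ when $T$ approaches $+\infty$ for $b>c$, we can see that $\lambda_r(T)$ approaches 0 for $T \to +\infty$.
For the paired model under exponentially distributed random debugging lags, we have:

$$\lambda_r(T) = c\,[m_d(T) - m_r(T)].$$

Both $m_d(T)$ and $m_r(T)$ approach $a$ as $T$ approaches $+\infty$. Thus $\lambda_r(T)$ approaches 0 when $T$ approaches $+\infty$.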
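The shape of the cost curve described by the theorem (increasing near zero, then possibly decreasing, and finally increasing again) can be observed numerically. The sketch below assumes the cost form "testing-phase removal cost + field removal cost for uncorrected faults + time cost" and uses the simplified exponential-lag model without dependent faults (p = 1); all parameter values and the helper `local_minima` are illustrative assumptions.

```python
import math

A, B, C_RATE = 100.0, 0.02, 0.05          # illustrative model parameters
C1, C2, C3 = 300.0, 2000.0, 10.0          # illustrative cost parameters

def m_r(t):
    """Corrected faults under G-O detection and an Exp(C_RATE) lag."""
    return A * (1.0 - (C_RATE * math.exp(-B * t) - B * math.exp(-C_RATE * t)) / (C_RATE - B))

def cost(t):
    """Expected total cost when the software is released at time t."""
    return C1 * m_r(t) + C2 * (A - m_r(t)) + C3 * t

def local_minima(f, t_max, step):
    """Grid points where f stops decreasing; t = 0 counts if f rises from it."""
    n = int(round(t_max / step))
    ts = [k * step for k in range(n + 1)]
    fs = [f(t) for t in ts]
    found = [ts[0]] if fs[0] <= fs[1] else []
    for k in range(1, n):
        if fs[k] < fs[k - 1] and fs[k] <= fs[k + 1]:
            found.append(ts[k])
    return found

candidates = local_minima(cost, 1000.0, 0.01)
t_c = min(candidates, key=cost)
print(candidates, t_c, cost(t_c))
```

Only the left ends of the increasing intervals need to be compared, which is why the candidate list is short even on a fine grid.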

Software release policy based on mixed criterion
When both reliability requirements and the total cost are considered, we determine the optimal release time $T^*$ that minimizes the total cost under the reliability constraint. Accordingly, the problem can be formulated as:

$$\text{Minimize } C(T) \quad \text{subject to } R_1(T) \ge R_1 \ \text{ or } \ R_2(\Delta T \mid T) \ge R_2.$$

When the reliability constraint $R_1(T)$ is used, we can divide the time axis $[0,\infty)$ into two types of intervals such that $C(T)$ increases on type 1 intervals and decreases on type 2 intervals. The candidates for $T^*$ consist of the minimum $T$ on each type 1 interval that satisfies $R_1(T) \ge R_1$. Then, $T^*$ is the one among all the candidates that leads to the lowest cost.
When the reliability constraint $R_2(\Delta T \mid T)$ is used, we can split the time axis $[0,\infty)$ into four types of intervals such that both $R_2(\Delta T \mid T)$ and $C(T)$ increase on type 1 intervals, both $R_2(\Delta T \mid T)$ and $C(T)$ decrease on type 2 intervals, $R_2(\Delta T \mid T)$ increases while $C(T)$ decreases on type 3 intervals, and $R_2(\Delta T \mid T)$ decreases while $C(T)$ increases on type 4 intervals. The candidates for $T^*$ consist of the minimum $T$ in each type 1 interval that satisfies $R_2(\Delta T \mid T) \ge R_2$, the maximum $T$ in each type 2 interval that satisfies $R_2(\Delta T \mid T) \ge R_2$, the end points of type 3 intervals which satisfy $R_2(\Delta T \mid T) \ge R_2$, and the initial points of type 4 intervals which satisfy $R_2(\Delta T \mid T) \ge R_2$. The optimal release time $T^*$ is the one corresponding to the lowest cost.

Numerical examples
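For illustration, we consider the paired FDP&FCP model with constant debugging lag that fits dataset 1 in Section 4. The model parameters are $a=199.27$, $b=0.00717$, $\delta=24.78$ and $p=0.382$. In addition, we assume $c_1=\$300$, $c_2=\$2000$, $c_3=\$10$, $\Delta T=12$, $R_1=0.95$ and $R_2=0.95$. In the following, we present the optimal release time that minimizes the cost with specific reliability constraints.
1) Considering the cost criterion and reliability target $R_1$.
From (28), the testing cost under our parameter settings is:

$$C(T) = 300\,m_r(T) + 2000\,[199.27 - m_r(T)] + 10\,T. \tag{33}$$

On the other hand, the correction process model with the given parameters is, for $T \ge 49.56$:

$$m_r(T) = 76.12\left[1 - e^{-0.00717(T-24.78)}\right] + 123.15\left[1 - \big(1 + 0.00717(T-49.56)\big)e^{-0.00717(T-49.56)}\right].$$

Substituting $m_r(T)$ into (33), it can be derived that $C(T)$ increases on $[0, 24.78]$, decreases on $(24.78, 1030.45)$ and increases on $[1030.45, \infty)$. As can be verified, $R_1(0)<0.95$ and $R_1(1030.45)>0.95$. According to the analysis in the preceding section, the optimal release time is $T_1^* = 1030.45$. Correspondingly, the optimal software testing cost is $C(T_1^*)$.
2) Considering the cost criterion and reliability target $R_2$.
When $R_2(\Delta T \mid T)$ is used as the reliability constraint, we can derive the following detection rate according to the specified model parameters, for $T \ge 24.78$:

$$\lambda_d(T) = 0.5458\,e^{-0.00717T} + 0.006331\,(T-24.78)\,e^{-0.00717(T-24.78)}.$$

The resulting optimal release time $T_2^*$ is slightly larger than that in the last case. An illustration of the optimal release policies under the two scenarios is given in Fig. 4.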
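The numerical example can be reproduced with a brute-force search over a time grid. The sketch below uses the parameter values stated for the constant-lag model and the cost settings of the example; the closed forms for the aggregated curves are the constant-rate, constant-lag expressions, and the grid search is only a stand-in for the interval analysis described in the text.

```python
import math

A, B, DELTA, P = 199.27, 0.00717, 24.78, 0.382   # constant-lag model, dataset 1
C1, C2, C3 = 300.0, 2000.0, 10.0
DT, R1_TARGET, R2_TARGET = 12.0, 0.95, 0.95

def m_d(t):
    """Aggregated detection curve."""
    lead = P * A * (1.0 - math.exp(-B * t)) if t > 0 else 0.0
    x = t - DELTA
    dep = (1.0 - P) * A * (1.0 - (1.0 + B * x) * math.exp(-B * x)) if x > 0 else 0.0
    return lead + dep

def lam_d(t):
    """Failure intensity (derivative of the detection curve)."""
    lead = P * A * B * math.exp(-B * t)
    x = t - DELTA
    dep = (1.0 - P) * A * B * B * x * math.exp(-B * x) if x > 0 else 0.0
    return lead + dep

def m_r(t):
    """Aggregated correction curve: detection shifted by the lag."""
    return m_d(t - DELTA)

def cost(t):
    return C1 * m_r(t) + C2 * (A - m_r(t)) + C3 * t

def r1(t):
    return m_r(t) / A

def r2(t):
    return math.exp(-lam_d(t) * DT)

STEP = 0.01
grid = [k * STEP for k in range(int(3000 / STEP) + 1)]
# Cheapest release time among the grid points meeting each reliability target.
t1 = min((t for t in grid if r1(t) >= R1_TARGET), key=cost)
t2 = min((t for t in grid if r2(t) >= R2_TARGET), key=cost)
print(t1, cost(t1), t2, cost(t2))
```

Under the first constraint the search returns the unconstrained cost minimizer, because the reliability target is already met there; under the second constraint the target is binding, which pushes the release time slightly later.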

Conclusion
In this paper, we proposed a framework for software reliability growth modeling. The software testing process was considered as a paired fault detection and correction process, and the faults during the testing were classified into leading and dependent faults according to their detectability. The leading faults can be detected and corrected directly, whereas the dependent faults can only be detected after the corresponding leading faults are corrected. For both types of faults, the FCP was modeled as a delayed FDP. In addition, the FDP of dependent faults depended on the FCP of leading faults. Special paired FDP&FCP models were derived under the proposed framework with different assumptions on the debugging lag. The application to two real software testing datasets revealed the effectiveness and the superiority of the proposed models over existing ones. Under this framework, the optimal software release policy was investigated considering cost and reliability requirements.
As a direction for future studies, the proposed modeling framework can be extended to incorporate other information or adapt to other testing environments. For instance, Bayesian techniques can be used to incorporate prior information and update model parameters when more information is available. In addition, imperfect fault correction or the fault introduction phenomenon can be incorporated, as it is common for debuggers to make mistakes during fault correction.

The three paired FDP&FCP models considered in the performance analysis are: (1) the model with constant debugging lag (abbreviated as M1); (2) the model with time-dependent debugging lag (abbreviated as M2); (3) the model with exponentially distributed debugging lag (abbreviated as M3).

Fig. 2. Point-wise squared errors of the six fitted models for dataset 1.

Fig. 3. Point-wise squared errors of the six fitted models for dataset 2.

Theorem 1: Under the proposed models in Section 3, the time $T_c$ which minimizes $C(T)$ exists. Specifically, there exist $2k$ ($k \ge 0$) non-negative stationary points $0 < z_1 \le z_2 \le \cdots \le z_{2k} < +\infty$ such that $C(T)$ increases on $[z_{2j}, z_{2j+1})$ and decreases on $[z_{2j+1}, z_{2j+2})$, with $z_0 = 0$ and $z_{2k+1} = +\infty$.

Numerical example, reliability target $R_1$: substituting the correction process model $m_r(T)$ with the given parameters into (33), it can be derived that $C(T)$ increases on $[0, 24.78]$, decreases on $(24.78, 1030.45)$ and increases on $[1030.45, \infty)$. As can be verified, $R_1(0)<0.95$ and $R_1(1030.45)>0.95$. According to the analysis in the preceding section, the optimal release time is $T_1^* = 1030.45$.

Fig. 4. Variation of normalized total cost function and software reliability functions with testing time.

Table 1. The dataset from System T1.

Table 2. The dataset from a middle-size software project.

Table 3. The estimated model parameters for dataset 1.

Table 4. The estimated model parameters for dataset 2.