A Non-Parametric Approach for Survival Analysis of Component-Based Software

Reliability of a software system is the probability that the system performs its functions adequately for a stated period of time under specified environmental conditions. In component-based software development, reliability estimation is a crucial concern. Existing reliability estimation models fall into two broad categories: parametric and non-parametric. Parametric models estimate the model parameters based on assumptions about the underlying distributions. Non-parametric models estimate the parameters of software reliability growth models without such distributional assumptions. We propose a novel non-parametric approach for the survival analysis of components. We collect failure data and use it to calculate the failure rate and reliability of the software. The failure rate increases with time, whereas reliability decreases with time.


Introduction
Nowadays, software development organizations have become increasingly dependent on third parties for functionality. This is due to financial and time-to-market considerations. These third-party software components are then integrated to form complete software according to the customer's needs. Components are high-quality, pretested software entities. This methodology of software development is called component-based software engineering (CBSE) (Gayen and Misra, 2008).
CBSE plays an important role in the current era of software development. CBSE comprises application engineering and component engineering. One of the most serious problems for successful CBSE is reliability estimation. Analyzing the reliability of software is crucial for predicting software field failures (Tyagi and Sharma, 2012). Reliability can be defined as "the probability of a system performing its functions correctly for a specified period of time"; it is therefore measured with respect to time. Traditional methods for estimating reliability cannot be applied to component-based software (CBS) applications, and researchers have proposed various alternatives. These approaches to reliability estimation involve two steps (Goseva-Popstojanova and Trivedi, 2003): estimating the reliability of the individual components, and then estimating the reliability of the system. Nautiyal and Preeti (2016) proposed an evaluation process for the certification of component-based software, in which certification is performed at both the component and the system level. The authors used an unstructured weighting technique to certify the system or component. Gokhale (2007) presented an overview of existing research in the area of architecture-based software reliability analysis and critically examined it in the context of the growing size and complexity of software applications.
Reliability estimation models fall into three categories: state-based, path-based and additive models (Singh et al., 2001; Yacoub and Ammar, 2002). State-based models estimate reliability by observing the flow of control among components. These models assume that components fail independently and that the current behavior of a component does not depend on its earlier behavior. Failure is modeled as a non-homogeneous Poisson process (NHPP). A limitation of these models is that a component's failure probability cannot realistically be constant, because the failure rate may be higher for frequently used components; an assumption of a constant failure rate cannot accommodate this fact.
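To make the idea of a state-based model concrete, the following is a minimal Python sketch of the classic Cheung-style formulation, which is not the method proposed in this paper. It assumes that control transfer among components is a discrete-time Markov chain with transition probabilities `P[i][j]`, that component `i` executes correctly with probability `R[i]` independently of its earlier behavior, and that the last component is the exit. The function name `cheung_reliability` and its parameters are hypothetical, and the failure process itself (e.g. an NHPP) is not modeled here.

```python
def cheung_reliability(P, R, tol=1e-12, max_iter=100_000):
    """Probability that an execution starting at component 0 completes
    successfully at the exit component (assumed to be the last index).

    P -- P[i][j], probability that control moves from component i to j
    R -- R[i], probability that component i executes without failure
    """
    n = len(R)
    exit_state = n - 1
    # x[i]: probability of successful completion once control reaches i.
    x = [0.0] * n
    for _ in range(max_iter):
        new_x = []
        for i in range(n):
            if i == exit_state:
                # Finishing at the exit only requires the exit to succeed.
                new_x.append(R[i])
            else:
                # Component i must succeed, then control moves on to some j.
                new_x.append(R[i] * sum(P[i][j] * x[j] for j in range(n)))
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x[0]
```

For a purely sequential chain of components the result reduces to the product of their reliabilities. Fixed-point iteration is used instead of a matrix inverse only to keep the sketch free of dependencies.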
Path-based models take into account the possible execution paths when estimating system reliability. Paths can be obtained either experimentally or algorithmically. The reliability of a path is defined as a function of the reliabilities of the components along that path, and the reliability of the system is the average of the reliabilities of all paths. The third category is the additive model, which uses component failure data to estimate system reliability. Additive models study software reliability growth and do not explicitly take the software architecture into account.
The reliability of a system can be estimated from failure rate data using many techniques, which fall into two broad categories. Non-parametric methods are commonly used for estimating reliability characteristics and are simple to use; their limitation is that the results cannot be reliably generalized beyond the last reported failure. In parametric techniques, the failure rate is fitted to a statistical distribution (exponential, normal, Weibull or lognormal). The resulting model can then be used to calculate reliability parameters efficiently over the entire lifetime of the system.
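The path-based computation described above can be sketched in Python as follows, assuming (as many path-based models do, though the text does not specify the function) that a path's reliability is the product of the reliabilities of the components on it, i.e. that component failures are independent. The function names, component names, reliabilities and paths are hypothetical.

```python
from math import prod


def path_reliability(path, reliability):
    """Reliability of one execution path, assuming independent component
    failures, so the path succeeds only if every component on it succeeds."""
    return prod(reliability[c] for c in path)


def path_based_system_reliability(paths, reliability):
    """System reliability as the plain average of the path reliabilities."""
    return sum(path_reliability(p, reliability) for p in paths) / len(paths)


# Hypothetical component reliabilities and execution paths, for illustration only.
reliability = {"A": 0.99, "B": 0.95, "C": 0.90}
paths = [["A", "B"], ["A", "C"], ["A", "B", "C"]]
print(path_based_system_reliability(paths, reliability))
```

With these values the system reliability is about 0.893. Some path-based models weight each path by its execution probability instead of taking a plain average; the plain average here follows the description in the text.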
We propose a non-parametric additive model to estimate the reliability of CBS. In the proposed approach, reliability estimation is based on the failure data of the components: failure data of a CBS is collected and the reliability is computed from it. The probability of failure is used to represent the failure behavior. The remainder of the paper is organized as follows. The next section discusses related work in this area, Section 3 presents the proposed approach, and the final section concludes the paper.

Related Work
Software reliability models fall into two main categories: parametric and non-parametric (Lakshmanana and Ramasamy, 2015). Parametric models estimate the model parameters based on assumptions about the underlying distributions. These models can be further divided into three types: NHPP models, Markovian models (Whittaker et al., 2000) and Bayesian models. Non-parametric models enable parameter estimation of software reliability growth models (SRGMs) without such distributional assumptions. Non-parametric methods yield models with better accuracy than parametric models (Karunanithi et al., 1992).
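For contrast with the non-parametric approach proposed in this paper, the simplest parametric model assumes exponentially distributed times-to-failure, i.e. a constant failure rate. The Python sketch below fits that rate by maximum likelihood and extrapolates reliability from it. It assumes complete data (every component is observed until it fails); the function names and sample times are hypothetical.

```python
import math


def fit_exponential_rate(times_to_failure):
    """Maximum-likelihood estimate of a constant failure rate from complete
    (uncensored) times-to-failure: number of failures / total operating time."""
    return len(times_to_failure) / sum(times_to_failure)


def exponential_reliability(t, rate):
    """Reliability under the fitted exponential model, R(t) = exp(-rate * t)."""
    return math.exp(-rate * t)


# Hypothetical times-to-failure (arbitrary time units), for illustration only.
rate = fit_exponential_rate([1.0, 2.0, 3.0, 4.0])
print(rate, exponential_reliability(2.5, rate))
```

Because the fitted model is a closed-form distribution, it can be evaluated at any time, including beyond the last observed failure, whereas a non-parametric estimate is only supported up to the last recorded failure. With censored data (components still running when the test ends), their running times must also be added to the denominator of the rate estimate.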
Su et al. (2007) proposed a fuzzy-logic-based model to estimate the reliability of CBS. The authors consider four factors that affect reliability: reusability and operational profile, which determine component reliability, and component dependency and application complexity, which determine interface reliability. Zhang et al. (2009) introduced reliability estimation using an architecture-based model. This approach can be applied in the design phase and assumes that the overall reliability is related to the reliabilities of the individual components.
Isaac (1995) focused on two main points, risk assessment and risk control, where risk assessment helps a manager make judgments about the future and helps others overcome their errors. The paper also highlighted ten points that should be kept in mind when using risk management techniques. Bowers and Khorakian (2014) proposed a new method, similar to those of other projects, that takes failure rate into account and emphasizes creativity. Without risk management it is difficult to achieve success, but excessive risk can also hamper creativity. So, to be on the safe side, one should use risk management techniques.
Wang and Huang (2008) offered a reliability analysis based on the rewriting logic technique. The method is based on the analysis of the operational profile and specifications, and the rewriting logic language Maude is used to execute these specifications. The execution process is used to calculate transition probabilities and to statistically analyze the expected number of times each component will be visited. Critical components can also be identified by this algorithm. Weiss and Weyuker (1988) addressed the problem of selecting test cases from a specific input domain, since there were no strategies concerning the selection of test cases and the occurrence of operational errors. Gayen and Misra (2009) solved this problem by dividing the input domain into an operational error sub-domain and a logical sub-domain. A path-coverage-based testing methodology is used to select test cases and to predict the reliability in the logical sub-domain. To obtain the actual input-domain-based reliability, this value is multiplied by the probability of non-occurrence of errors in the operational error sub-domain. Yacoub et al. (2004) proposed a scenario-based reliability evaluation method. This approach introduces component dependency graphs that can be extended to complex distributed systems. The approach is built on scenarios that can be captured with sequence diagrams, which means that it can be automated. A disadvantage of this approach is that it does not take into account failure dependencies among the components. Gokhale et al. (1998) discussed an approach in which the application is assumed to be representable as a control flow graph. Component failures are randomly generated for simulation: a procedure returns the inter-failure arrival time for a particular component, and the simulation uses these failure and repair rates while executing the application to estimate its reliability. Component interface and link failures are not considered during the simulation. Lo (2010) proposed a software reliability estimation model based on a support vector machine (SVM) and a genetic algorithm (GA). An advantage of this model is that it does not depend heavily on failure data; the approach states that recent failure data alone is enough for estimating reliability. The parameters of the SVM are determined by the GA. Goswami and Acharya (2009) considered the component usage ratio (CUR) for reliability analysis of CBS, computing CUR with mathematical formulas. Owing to the flexibility of CUR, this technique may be used in real-time applications. Everett (1999) proposes a six-step process for software reliability: dividing the software into components, characterizing the components, defining component usage, modeling the reliability of the individual components, superimposing the reliabilities of the components, and analyzing the components through testing.

Proposed Approach
Let $t_1, t_2, t_3, \ldots$ denote the times at which components fail. Let $n_1, n_2, n_3, \ldots$ denote the number of components that fail at each of these times, and let $r_1, r_2, r_3, \ldots$ be the corresponding number of components still operating just before each of these times, so that $r_2 = r_1 - n_1$, $r_3 = r_2 - n_2$, and so on. Let $T$ denote the time to failure of a component. The probability of surviving beyond time $t_2$, $P(T > t_2)$, depends on the probability of surviving beyond time $t_1$, $P(T > t_1)$. Similarly, the probability of surviving beyond time $t_3$ depends on the probability of surviving beyond time $t_2$, and so on. We can use this recursive relation to iteratively build a numerical estimate $\hat{R}(t)$ of the true survival function $R(t)$.
For any time $t \in [0, t_1)$, the probability of surviving beyond time $t$, $P(T > t)$, is estimated as 1, because no failures have occurred yet. Therefore, we set $\hat{R}(t) = 1$ for all $t$ in this interval.

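Note: For any two events A and B, $P(A \text{ and } B) = P(A) \times P(B \mid A)$.
Let A = "survive to time $t_1$" and B = "survive from time $t_1$ to beyond some time $t$ before $t_2$". When both events occur, the combined event "A and B" is equivalent to "survive beyond a time $t$ before $t_2$", i.e. $T > t$. Estimating $P(A)$ by 1 (no failures occur before $t_1$) and $P(B \mid A)$ by the fraction of the $r_1$ components operating just before $t_1$ that survive it, the following holds:

$$P(T > t) = P(A) \times P(B \mid A) \quad\Rightarrow\quad \hat{R}(t) = 1 \times \frac{r_1 - n_1}{r_1} = 1 - \frac{n_1}{r_1}, \qquad t \in [t_1, t_2)$$
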
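In general, for $t \in [t_j, t_{j+1})$, $j = 1, 2, 3, \ldots$, we have

$$\hat{R}(t) = \prod_{i=1}^{j} \left(1 - \frac{n_i}{r_i}\right) = \prod_{i=1}^{j} \frac{r_i - n_i}{r_i} \qquad (1)$$

where $t_i$ is the time of the $i$-th distinct failure event, $n_i$ is the number of components that fail at $t_i$, $r_i$ is the number of components still operating just before $t_i$, i.e. $r_i = n - \sum_{k=1}^{i-1} n_k$, and $n = r_1$ is the total number of components.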
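As a purely hypothetical illustration of Eq. (1) (the numbers are not taken from Table 1), suppose $n = 10$ components are on test and 3, 2 and 1 of them fail at $t_1$, $t_2$ and $t_3$ respectively. Then $r_1 = 10$, $r_2 = 7$, $r_3 = 5$, and the estimate steps down at each failure time:

$$\begin{aligned}
\hat{R}(t) &= 1 - \tfrac{3}{10} = 0.7, && t \in [t_1, t_2),\\
\hat{R}(t) &= 0.7 \times \left(1 - \tfrac{2}{7}\right) = 0.5, && t \in [t_2, t_3),\\
\hat{R}(t) &= 0.5 \times \left(1 - \tfrac{1}{5}\right) = 0.4, && t \in [t_3, t_4).
\end{aligned}$$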

Steps for Survival Analysis of CBS
The proposed approach comprises four phases. Figure 1 shows a flow chart of the proposed approach. The four phases are as follows:
(i) Take a CBS and Test It: We coded a CBS comprising 30 components. These components do not perform any real function; each one only prints a statement on the screen. A component is considered to have failed if, at some time, it stops printing its statement on the screen. Each component runs as a thread of a Java program, and a failure is introduced by stopping the corresponding thread.
(ii) Collect Time-To-Failure and Number of Components Failed: Table 1 shows the failure data collected while testing this CBS.
(iii) Calculate Failure Rate: The third column of Table 2 gives the calculated failure rate. The failure rate vs. time graph (Figure 2) shows that the failure rate increases as time increases.
(iv) Calculate Reliability: The last column of Table 2 gives the reliability calculated using the proposed approach. The reliability vs. time graph (Figure 3) shows that reliability decreases as time increases.
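Phases (ii) to (iv) can be chained into one computation. The following Python sketch groups raw times-to-failure by distinct failure time, derives the number of components still operating, and applies Eq. (1). The text does not state the formula behind the failure-rate column of Table 2, so the sketch assumes the common discrete estimate $h_j = n_j / r_j$ (the estimated conditional probability that an operating component fails at $t_j$); a per-unit-time rate would additionally divide by the length of the interval. The function name `survival_table` and the sample failure times are hypothetical.

```python
from collections import Counter


def survival_table(times_to_failure, total):
    """Turn raw times-to-failure into Table-2-style rows.

    times_to_failure -- one observed failure time per failed component
    total            -- number of components on test at time 0 (r_1 = n)
    Returns (t_j, n_j, r_j, h_j, R_hat) tuples, one per distinct failure time.
    """
    if len(times_to_failure) > total:
        raise ValueError("more failures than components on test")
    counts = Counter(times_to_failure)   # phase (ii): failures per distinct time
    rows = []
    r_j = total                          # components operating just before t_j
    reliability = 1.0                    # R_hat(t) = 1 before the first failure
    for t_j in sorted(counts):
        n_j = counts[t_j]
        h_j = n_j / r_j                  # phase (iii): assumed failure-rate estimate
        reliability *= 1 - h_j           # phase (iv): factor (1 - n_j / r_j) of Eq. (1)
        rows.append((t_j, n_j, r_j, h_j, reliability))
        r_j -= n_j                       # r_{j+1} = r_j - n_j
    return rows


# Hypothetical failure times for 10 components (6 of them fail), for illustration only.
for row in survival_table([5, 2, 5, 9, 2, 2], total=10):
    print(row)
```

For this hypothetical input the reliability column comes out as 0.7, 0.5 and 0.4 (up to floating-point rounding). Under the assumed failure-rate definition, each reliability factor is simply $1 - h_j$.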

Reliability Analysis
Reliability of software is the ability of the software to perform the required function under some scenario or predefined condition for a stated period of time. It is usually defined as the probability of failure-free operation for a specified time, in a specified environment, for a specific purpose. It is an important attribute of software quality. Reliability is broadly divided into two categories: hardware reliability and software reliability. Hardware reliability concerns the probability that the hardware fails and how long it takes to repair the failed component. Software reliability is the probability that the software system will function properly without failure over a certain period of time. This section presents the reliability analysis of a system with 31 components.

Reliability
Column 7 of Table 2 gives the calculated reliability. Figures 2 and 3 show the growth of the failure rate and the decay of reliability, respectively. Figure 2 shows the failure rate vs. time graph based on the proposed approach; as can be seen, the failure rate increases with time. The reliability vs. time graph is shown in Figure 3, and it shows that reliability decreases as time increases. This decrease is expected: each factor in Eq. (1) lies between 0 and 1, so the product-limit estimate can never increase with time.

Conclusion
Reliability of a software system is the probability that the system performs its functions adequately for a stated period of time under specified environmental conditions. In component-based software development, reliability estimation is a crucial concern. Existing reliability estimation models fall into two broad categories: parametric and non-parametric. Parametric models estimate the model parameters based on assumptions about the underlying distributions, whereas non-parametric models estimate the parameters of software reliability growth models without such assumptions. We have proposed a novel non-parametric approach for the survival analysis of components. We collected failure data from a component-based software system and used it to calculate the failure rate and reliability of the software. The failure rate increases with time, whereas reliability decreases with time. Various authors have proposed parametric approaches for estimating the reliability of CBS; this work contributes a non-parametric approach.

Figure 1. Flow chart of the proposed approach

Table 1. Time-to-failure of CBS

Table 2. Calculated values of failure rate and reliability using the proposed approach