Application of multistage process control methodology for software quality management

Article history: Received: October 1, 2016 Received in revised format: November 16, 2016 Accepted: February 24, 2017 Available online: February 24, 2017
As the need for software has increased, so have the number of software firms and the competition among them. Software companies in developing countries such as India can no longer survive on cost advantage alone; they need to deliver competitively priced, quality software products on time. This can be achieved by quantitatively managing the different phases, or sub processes, of the software development process. However, quantitative management is not easy for a process consisting of interlinked sub processes or stages, where the output of one sub process influences the outputs of subsequent stages and the final output. The process performance models developed for quantitative management of the software development process often model the final outcome in terms of factors from all stages together, or focus only on managing a particular sub process independently. In manufacturing and other engineering industries, processes with multiple sub processes are monitored and controlled using multistage process control methodology. This paper applies multistage statistical process control to managing the software development process. The suggested methodology combines process performance models with control charts. It can be easily implemented for controlling various types of software projects, such as development projects, incremental development projects and testing projects. The methodology also gives the project manager the opportunity to tighten or relax control at individual sub processes, based on the project team's strengths, while still achieving the goal on the final outcome.
© 2017 Growing Science Ltd.


Introduction
Many organizations use information technology (IT) or automation to gain business advantage over their competitors (Adam et al., 2001; Samson & Terziovski, 1999; Asher & Kanji, 1999). As a result, the IT industry has grown rapidly in the recent past. As the number of software firms increased, the competition among them also increased. Software companies in countries like India can no longer survive, or satisfy their customers, on cost advantage alone. Organizations need to deliver quality software products on time and at a competitive cost. Studies have shown that software quality, development cycle time and effort are related (Harter et al., 2000). Software quality is also related to customer satisfaction (Prazinger & Nath, 2000), and the higher an organization's CMM level, the better its software quality tends to be (Herbsleb et al., 1997).
Defining software quality is not easy, and there is no single adequate measure of it. The ISO 9126 standard (1991) defines software quality as the totality of features and characteristics of a software product that bear on its ability to satisfy the stated and implied needs of the customer. The Capability Maturity Model (CMM) of the Software Engineering Institute (SEI) at Carnegie Mellon University classifies software processes into five maturity levels, namely initial, repeatable, defined, managed and optimizing (Paulik et al., 1994; Pressman, 2005). For quantifying and monitoring the software development process, software quality is often measured in terms of delivered defect density (Fenton & Bieman, 2014), that is, the number of delivered defects per unit of software size. The widely accepted approach to software quality management is to set a target or goal on delivered defect density and then manage the various phases, or sub processes, of the software development life cycle to achieve that goal. Project managers generally implement this approach using their industry experience, software engineering knowledge, process performance models (Tamura, 2009; Hao & Zhang, 2011; John & Kadadevaramath, 2015), defect prediction models, etc. A wide variety of defect prediction models is available in the literature. These models either predict the number of defects or classify a software module as defect prone or not. Most of them are based on either statistical learning techniques (Niel, 1992; Turhan & Bener, 2007) or machine learning techniques (Ceylan et al., 2006; Song et al., 2006). Defect prediction models often use static code attributes, such as code complexity, as predictors; these are factors that are difficult to change during a project. Hence such models are better suited to prediction than to process monitoring and control. Software quality also depends on people-related factors such as programmer skill, domain knowledge and experience (Antony & Fergusson, 2004).
Moreover, the defect density in each sub process or phase of the software development life cycle may be related to that of subsequent phases and to the delivered defect density. One way to achieve the goal on delivered defect density is therefore to link the delivered defect density with the phase wise defect densities, set intermediate goals on the phase wise defect densities, and control the different phases of the development process to meet them. This can be done using multistage statistical process control. The engineering and chemical industries have successfully used multistage statistical process control for monitoring multistage processes. The software development process can also be considered a multistage process, with phases such as design, coding and testing as its stages. This paper applies multistage statistical process control methodology to monitoring and controlling the software development process.
The remainder of this paper is organized as follows: Section 2 briefly describes multistage statistical process control, Section 3 presents the proposed methodology for controlling the software development life cycle process, Section 4 discusses the application of the proposed methodology to software quality management, and Section 5 gives the conclusions.

Multistage statistical process control
Many manufacturing and service delivery processes consist of multiple stages or sub processes; examples include semiconductor manufacturing, software development and automotive body assembly (Tsung et al., 2008). Such processes are called multistage processes. The output quality of a multistage process often depends on the quality at the intermediate stages. Hence, to achieve the output quality goal or target, it is necessary to link the final quality with the quality at the intermediate stages and to control those intermediate stages. Multistage process control techniques were developed for this purpose. Two approaches are widely used for multistage process control, namely engineering process control and statistical process control. Engineering process control uses a linear state space model based on engineering knowledge and physical laws (Jin & Shi, 1999; Ding et al., 2002; Djurdjanovic & Ni, 2001). The state space model provides an engineering tool for analyzing, modeling and controlling multistage processes.
The commonly used multistage statistical process control methods are the regression adjustment approach (Hawkins, 1993; Shu et al., 2004) and the cause selecting chart (Shu et al., 2003; Shu & Tsang, 2003; Shu et al., 2005). The logic of both methods is very similar to that of model-based control charts, which are commonly used for autocorrelated data: a suitable time series model is fitted to the quality characteristic and the residuals are then monitored using a control chart (Montgomery, 2007). In the regression adjustment approach, a regression model is fitted for the quality characteristic at each stage, with the control variables of that stage as predictor variables, and the residuals of each model are plotted on univariate control charts. In the cause selecting chart, the regression model for the quality characteristic at every stage takes the quality characteristic at the previous stage as the only predictor variable, and the residuals of the model are again plotted on a suitable control chart. Another suggestion for monitoring multistage processes is to fit regression models for the quality characteristics at different stages and then monitor the residuals of the models using CUSUM charts (Zantek et al., 2006).
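The residual-based logic shared by these methods can be illustrated with a short Python sketch of the cause selecting variant, in which the only predictor of a stage's quality characteristic is the quality characteristic of the previous stage. The function names are hypothetical, the fit is ordinary least squares with one predictor, and the limits are ±3 sample standard deviations of the residuals; this is a minimal sketch under those assumptions, not the exact procedure of the cited papers.

```python
from statistics import mean, stdev


def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cause_selecting_residuals(prev_stage, this_stage):
    """Residuals of this stage's quality after removing what the previous stage explains."""
    a, b = fit_simple_ols(prev_stage, this_stage)
    return [yi - (a + b * xi) for xi, yi in zip(prev_stage, this_stage)]


def three_sigma_limits(residuals):
    """Limits around zero (OLS residuals have mean zero) at +/- 3 sample SDs."""
    s = stdev(residuals)
    return -3 * s, 3 * s
```

For example, `cause_selecting_residuals(design_dd, code_dd)` (hypothetical column names) would give the part of code review defect density not explained by design review defect density, so a point outside the limits points to a problem arising in coding itself rather than one inherited from design.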
In this paper, the authors apply the regression adjustment approach to monitoring the quality of the software development process.

Methodology
The step by step details of the proposed multistage statistical process control methodology for managing quality during the software development process are given below. Quality is measured in terms of defect density.
Step 1: Classify the software development projects into homogeneous groups. The groups should be formed in such a way that projects within a group are similar to each other but dissimilar to projects in other groups. The domain, technology, account, etc. can be used as grouping variables.
Step 2: For every group, identify the different sub processes or phases in the software development process and shortlist the control factors at each phase. Preferably, choose as control factors parameters that the project manager can change without much difficulty, for example by altering the team composition.
Step 3: Collect data on control factors, phase wise and delivered defect densities from the projects in the group.
Step 4: Develop models for phase wise defect densities in terms of control factors using a suitable modeling technique.
Step 5: Predict the phase wise defect densities using the respective models and compute the residuals.
Step 6: Develop a model for delivered defect density in terms of the predicted phase wise defect densities.
Step 7: Construct suitable control charts to monitor the residuals of the models.
Step 8: To apply the methodology to a new project, use the model for delivered defect density to estimate the optimum phase wise defect densities needed to achieve the delivered defect density goal. The strengths and weaknesses of the project team can be treated as constraints while estimating these optimum phase wise defect densities.
Step 9: Using the models developed for predicting the phase wise defect densities, identify the team composition that would result in the values of the control factors needed to achieve the required phase wise defect densities.
Step 10: Execute the project and, at the end of each phase, compare the actual defect density with the predicted defect density and plot the residuals on the respective control charts. Whenever a chart indicates an out-of-control condition, carry out root cause analysis and take the necessary actions. If required, recalibrate the models.
The application of the methodology is demonstrated using a case study in the next section.
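Step 8 is easy to make concrete when the delivered defect density model is linear in the phase wise defect densities, as a ridge regression model is. The Python sketch below assumes that case and the simplest form of team constraint: the expected defect densities of all phases but one are fixed from what the team can realistically achieve, and the target for the remaining phase is solved from goal = intercept + sum of (coefficient × phase defect density). The function name, phase names and coefficients are hypothetical.

```python
def free_phase_target(goal, intercept, coefs, fixed):
    """Solve a linear delivered defect density model for the one phase not in `fixed`.

    goal      -- target delivered defect density
    intercept -- intercept of the linear delivered defect density model
    coefs     -- {phase: coefficient} for every phase in the model
    fixed     -- {phase: expected defect density} for phases pinned by team constraints
    """
    free = [phase for phase in coefs if phase not in fixed]
    if len(free) != 1:
        raise ValueError("exactly one phase must be left free")
    phase = free[0]
    if coefs[phase] == 0:
        raise ValueError(f"phase {phase!r} has no effect in the model")
    explained = intercept + sum(coefs[p] * d for p, d in fixed.items())
    target = (goal - explained) / coefs[phase]
    if target < 0:
        # A negative defect density is impossible: the fixed phases already overshoot the goal.
        raise ValueError("goal not reachable with the fixed phase densities")
    return phase, target


# Hypothetical coefficients, for illustration only.
coefs = {"design": 0.2, "coding": 0.3, "link_testing": 0.5}
print(free_phase_target(1.1, 0.1, coefs, {"design": 1.0, "coding": 1.0}))
```

With more than one free phase the goal defines a set of feasible combinations rather than a single value, and a further criterion (for example, the cost of tightening each phase) would be needed to choose among them.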

Case Study
This study is carried out on telecom domain projects. The critical sub processes or phases identified for the study are design, coding and link testing. The control factors at every stage are identified through discussions with project managers and software engineers; the list of phase wise control factors is given in Table 1. Data on the control factors and phase wise defect densities are collected from past projects in the telecom domain group, and models are developed for predicting the phase wise defect densities. Since some of the factors are categorical and the remaining are numeric, the models are developed using the classification and regression tree (CART) algorithm (Myatt, 2007; Crawley, 2007) with the R package (2016). The models are cross-validated at different tree sizes. Plots of the complexity parameter (cp) versus the cross-validation error (X-val relative error) are given in Fig. 1 to Fig. 3. The best models are obtained by pruning the trees at the cp value corresponding to the minimum cross-validation error (James et al., 2013) and are given in Fig. 4 to Fig. 6. The model diagnostic measures, namely mean square error (MSE) and root mean square error (RMSE), are given in Table 2. Table 3 shows that the Shapiro-Wilk test does not reject normality of the residuals for any of the three models (p-value ≥ 0.05). Hence the residuals can be monitored using a control chart (Jayathavaj & Pongpullponsak, 2014; Black et al., 2011).
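The tree-growing part of CART can be sketched in plain Python to show how a single tree handles numeric control factors (threshold splits) and categorical ones such as domain skill (category-group splits) side by side. This is an illustrative sketch, not the R implementation used in the study: each split minimizes the summed squared error of the two child nodes, growth stops at a hypothetical `max_depth` or `min_leaf`, and the cost-complexity pruning by cp with cross-validation described above is not implemented. The factor names and data in the example are hypothetical.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the node mean (regression-tree impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def candidate_splits(rows, ys, feature):
    """Yield (rule, predicate) pairs for one feature; True sends a row left."""
    values = [r[feature] for r in rows]
    if all(isinstance(v, (int, float)) for v in values):
        uniq = sorted(set(values))
        for lo, hi in zip(uniq, uniq[1:]):
            t = (lo + hi) / 2
            yield (feature, "<=", t), (lambda r, t=t: r[feature] <= t)
    else:
        # For a numeric response, ordering categories by mean response and
        # splitting along that order finds the best binary grouping.
        order = sorted(set(values), key=lambda c: mean(y for v, y in zip(values, ys) if v == c))
        for k in range(1, len(order)):
            left = frozenset(order[:k])
            yield (feature, "in", left), (lambda r, left=left: r[feature] in left)


def grow(rows, ys, depth=0, max_depth=3, min_leaf=2):
    """Greedily grow a regression tree; each leaf predicts its mean response."""
    node = {"value": mean(ys)}
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return node
    best = None
    for feature in rows[0]:
        for rule, pred in candidate_splits(rows, ys, feature):
            mask = [pred(r) for r in rows]
            left = [y for y, m in zip(ys, mask) if m]
            right = [y for y, m in zip(ys, mask) if not m]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, rule, pred, mask)
    if best is None or best[0] >= sse(ys):
        return node  # no split reduces the error
    _, rule, pred, mask = best
    node.update(rule=rule, pred=pred)
    for side, keep in (("left", True), ("right", False)):
        sub = [(r, y) for r, y, m in zip(rows, ys, mask) if m == keep]
        node[side] = grow([r for r, _ in sub], [y for _, y in sub], depth + 1, max_depth, min_leaf)
    return node


def predict(node, row):
    """Follow the split rules from the root down to a leaf."""
    while "pred" in node:
        node = node["left"] if node["pred"](row) else node["right"]
    return node["value"]


# Hypothetical design-phase data: numeric review coverage, categorical domain skill.
rows = [{"coverage": 60, "skill": "Learner"}, {"coverage": 80, "skill": "Learner"},
        {"coverage": 60, "skill": "Practised"}, {"coverage": 80, "skill": "Practised"}]
dd = [1.0, 1.0, 0.4, 0.4]
tree = grow(rows, dd)
print(tree["rule"], predict(tree, {"coverage": 70, "skill": "Learner"}))
```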
Finally, a model is developed for the delivered defect density in terms of the predicted phase wise defect densities, as follows. The correlation matrix of the variables is given in Table 4. Table 4 shows a good correlation between the dependent variable, delivered defect density, and the predictors, but the correlation among the predictors is also very high. Hence the presence of multicollinearity is checked by computing the variance inflation factor (VIF). The VIF values are given in Table 5, which shows that VIF > 5 for two of the predictors. Hence the model cannot reliably be developed using ordinary least squares regression. Multicollinearity can be tackled by dropping some of the correlated predictor variables or by using principal component regression, partial least squares regression, ridge regression, etc. Since all the phase wise defect densities are important, dropping predictor variables is not a good option in this scenario. Principal component and partial least squares regression first generate uncorrelated components, which are linear combinations of the predictor variables, and then develop the model using these components as predictors. Even though no predictor is dropped in these approaches, the predictors are not used directly in the model. Another option for tackling multicollinearity is ridge regression. Ridge regression is a shrinkage method that gives a simple model for the dependent variable directly in terms of the predictors, while shrinking the coefficients of the correlated predictor variables toward zero (Friedman et al., 2001). Hence ridge regression is used to develop the model, using R. The best value of the shrinkage parameter λ is obtained through cross-validation. The plot of mean square error versus log(λ) is given in Fig. 7.
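The VIF check can be reproduced without a statistics package: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors, and these values are also the diagonal elements of the inverse of the predictors' correlation matrix. The Python sketch below uses that identity; the function names are hypothetical, and the inversion is a plain Gauss-Jordan elimination suitable only for a handful of predictors.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for a few predictors)."""
    n = len(matrix)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vif(columns):
    """VIF of each predictor column: diagonal of the inverse correlation matrix."""
    corr = [[pearson(ci, cj) for cj in columns] for ci in columns]
    inv = invert(corr)
    return [inv[j][j] for j in range(len(columns))]


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(vif([x1, x2]))  # correlation 0.8, so both VIFs are 1 / (1 - 0.64), about 2.78
```

Applied to the three predicted phase wise defect density columns, the function would return one VIF per predictor; values above 5, the threshold used in this study, flag problematic collinearity.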

Fig. 7. MSE versus log(λ) of the ridge regression model
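The cross-validated choice of λ behind a plot like Fig. 7 can be sketched in plain Python. This is an illustrative sketch, not the R routine used in the study: predictors are standardized, the intercept is left unpenalized, coefficients come from the closed form (Z'Z + λI) b = Z'(y - ybar), and λ is chosen from a hypothetical grid by round-robin k-fold cross-validation. Because R packages may scale the loss differently, λ values from this sketch are not directly comparable to those in Table 6. An adjusted R² helper is included for the kind of check reported in Table 8. All function names are hypothetical.

```python
from statistics import mean, pstdev


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge on standardized predictors; returns (intercept, coefs) on the original scale."""
    p = len(X[0])
    mu = [mean(row[j] for row in X) for j in range(p)]
    sd = [pstdev([row[j] for row in X]) or 1.0 for j in range(p)]
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(p)] for row in X]
    ybar = mean(y)
    # Penalized normal equations: (Z'Z + lam * I) beta = Z'(y - ybar)
    A = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    rhs = [sum(z[i] * (yi - ybar) for z, yi in zip(Z, y)) for i in range(p)]
    beta = solve(A, rhs)
    coef = [beta[j] / sd[j] for j in range(p)]
    return ybar - sum(c * m for c, m in zip(coef, mu)), coef


def predict(model, row):
    intercept, coef = model
    return intercept + sum(c * x for c, x in zip(coef, row))


def cv_mse(X, y, lam, k=5):
    """k-fold cross-validated mean squared error, folds assigned round-robin."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        test = [i for i in range(len(y)) if i % k == fold]
        model = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
        errors += [(y[i] - predict(model, X[i])) ** 2 for i in test]
    return mean(errors)


def best_lambda(X, y, grid, k=5):
    """Grid value with the smallest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(X, y, lam, k))


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 7]]
y = [1 + 2 * a + 3 * b for a, b in X]  # noiseless toy data, so CV should favour lam = 0
lam = best_lambda(X, y, [0.0, 0.1, 1.0, 10.0], k=3)
print(lam, ridge_fit(X, y, lam))
```

On real, noisy and collinear phase wise defect densities a positive λ would typically win, trading a little bias for much more stable coefficients.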
The best values of λ and log(λ) obtained from Fig. 7 are given in Table 6. The model coefficients obtained with the best λ value are given in Table 7, and the model performance measures are given in Table 8. Table 8 shows that R² and adjusted R² are greater than 0.6; hence the model is reasonably good. The normality of the model residuals is checked through the Shapiro-Wilk test and a normal quantile-quantile (Q-Q) plot. The Q-Q plot is given in Fig. 8 and the Shapiro-Wilk test results are given in Table 9. Table 9 shows that the p-value > 0.05, and the points in Fig. 8 lie approximately on a straight line.
Hence the residuals of the ridge regression model for predicting delivered defect density are also consistent with a normal distribution. Since the residuals of all the models (Tables 3 and 9) are approximately normally distributed, individuals (X) control charts are constructed for monitoring the residuals and detecting out-of-control situations (Bag et al., 2012; Noghondarian & Ghobadi, 2012). The control limits of the individuals control charts are given in Table 10. For ease of implementation of the proposed methodology, a Microsoft Excel macro based template is created; a screenshot of the template is given in Fig. 9.
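An individuals chart of the model residuals needs only a center line and 3-sigma limits. The paper does not state how sigma was estimated for Table 10; the Python sketch below assumes the conventional Shewhart estimate, the average moving range of consecutive residuals divided by d2 = 1.128. Function names are hypothetical.

```python
from statistics import mean

D2 = 1.128  # d2 constant for moving ranges of two consecutive points


def individuals_limits(residuals):
    """(LCL, center line, UCL) of an individuals (X) chart.

    Sigma is estimated as the average moving range divided by d2.
    """
    center = mean(residuals)
    moving_ranges = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
    sigma = mean(moving_ranges) / D2
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control_points(residuals, limits):
    """Indices of residuals that fall outside the control limits."""
    lcl, _, ucl = limits
    return [i for i, r in enumerate(residuals) if r < lcl or r > ucl]


limits = individuals_limits([0.1, -0.1, 0.1, -0.1])
print(limits, out_of_control_points([0.0, 0.9, -0.05], limits))
```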

Fig. 9. MS Excel template for the implementation
Project managers or leaders can enter the values of the control variables in the Excel template (Fig. 9) and execute the macro by clicking the Run button. The template displays the predicted phase wise and delivered defect densities. If the predicted delivered defect density is not close to the goal or target, the managers can adjust the control variables in one or more phases until they identify a feasible combination of control variable values that gives the desired delivered defect density. The project is then executed with this feasible setting of the control variables. At the end of each phase, the actual defect density is measured and the residual is computed and plotted on the respective control chart. Whenever a control chart shows an out-of-control situation, root cause analysis is carried out and the necessary action is taken.
If necessary, the model can be recalibrated.
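The template's workflow can be expressed as two small functions: a what-if chain that feeds the control variables through the phase models and then through the delivered defect density model, and a phase-end residual check against the chart limits. The Python sketch below is a hypothetical stand-in for the Excel macro; the phase models, coefficients and limits in the usage example are made up, and the residual is assumed to be actual minus predicted.

```python
def what_if(controls, phase_models, delivered_model, target, tolerance):
    """Chain the phase models into the delivered model and check the gap to the target.

    phase_models    -- {phase: callable(controls) -> predicted phase defect density}
    delivered_model -- callable({phase: predicted density}) -> predicted delivered density
    """
    phase_dd = {phase: model(controls) for phase, model in phase_models.items()}
    delivered = delivered_model(phase_dd)
    return phase_dd, delivered, abs(delivered - target) <= tolerance


def phase_end_check(actual, predicted, limits):
    """Residual (actual - predicted) for one phase and whether it is within (LCL, UCL)."""
    residual = actual - predicted
    lcl, ucl = limits
    return residual, lcl <= residual <= ucl


# Made-up models and limits, for illustration only.
phase_models = {
    "design": lambda c: 1.0 if c["skill"] == "Learner" else 0.5,
    "coding": lambda c: 0.6,
}


def toy_delivered(d):
    return 0.1 + 0.4 * d["design"] + 0.5 * d["coding"]


print(what_if({"skill": "Practised"}, phase_models, toy_delivered, target=0.6, tolerance=0.05))
print(phase_end_check(actual=0.55, predicted=0.5, limits=(-0.2, 0.2)))
```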
The model is validated on seven projects that were not used for developing the models. The values of the control variables for these projects are given in Table 11. The predicted and actual phase wise defect densities, along with the predicted and actual delivered defect densities, are given in Table 12. Table 12 shows that the defect densities predicted using the methodology are reasonably close to the actual values. This indicates that the proposed methodology can be used for controlling the quality of the software development process.
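The closeness judged from Table 12 could also be summarized numerically. The Python sketch below computes the root mean square error and the mean absolute percentage error (MAPE) of hold-out predictions; these summary measures are an illustrative addition, not figures reported in the paper, the function name is hypothetical, and MAPE assumes that no actual defect density is zero.

```python
from math import sqrt


def validation_errors(actual, predicted):
    """RMSE and MAPE (in %) of hold-out predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [a - p for a, p in zip(actual, predicted)]
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / len(errors)
    return rmse, mape


# Hypothetical actual vs predicted delivered defect densities.
print(validation_errors([1.0, 2.0], [1.1, 1.8]))
```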
The methodology has been pilot implemented on three projects. The values of the control factors, the defect densities predicted using the macro tool and the targets set on delivered defect density are given in Table 13. Table 13 shows that the predicted delivered defect density is reasonably close to the target for projects 1 and 3; hence these projects were executed with the given settings. For project 2, the predicted delivered defect density was 0.789 against a target of 0.6. The project manager therefore decided to change the control factors slightly and, with the help of the macro tool, found that increasing the design review coverage from 60% to 80%, changing the development team composition so that the domain skill would move from the "Learner" to the "Practised" category, and increasing the number of link testing test cases from 90 to 100 would give a predicted delivered defect density of 0.637, close to the target of 0.6. Project 2 was therefore executed with the changed settings. The changed values of the control factors and the expected defect densities are also given in Table 13. The actual defect densities measured after execution of the projects and the corresponding residuals are given in Table 14. Table 14 shows that the actual values are very close to the predicted values and that the residuals are within the control limits of the respective control charts. The pilot implementation thus further confirms that the methodology can be used for software quality management and for achieving the target set on delivered defect density.
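The search the project manager performed for project 2 amounts to trying combinations of candidate control factor values and keeping the one whose predicted delivered defect density is closest to the target. The Python sketch below does this exhaustively with `itertools.product`, breaking ties in favour of fewer changes from the current plan. The predictive model in the usage example is a made-up linear stand-in for the tree and ridge models, so its numbers do not reproduce the 0.789 and 0.637 reported above; the factor names and function names are hypothetical.

```python
from itertools import product


def search_settings(base, options, predict_delivered, target):
    """Exhaustively try candidate control-factor values; return (settings, prediction).

    base    -- current plan, e.g. {"dr_coverage": 60, "skill": "Learner", "lt_cases": 90}
    options -- {factor: [candidate values]} for factors the manager is willing to change
    The best setting minimizes |prediction - target|; ties go to fewer changed factors.
    """
    names = list(options)
    best = None
    for values in product(*(options[name] for name in names)):
        settings = {**base, **dict(zip(names, values))}
        prediction = predict_delivered(settings)
        changes = sum(settings[name] != base[name] for name in names)
        key = (abs(prediction - target), changes)
        if best is None or key < best[0]:
            best = (key, settings, prediction)
    return best[1], best[2]


def toy_model(s):
    """Made-up linear stand-in for the tree + ridge prediction chain."""
    skill_effect = 0.1 if s["skill"] == "Practised" else 0.0
    return 1.2 - 0.005 * s["dr_coverage"] - skill_effect - 0.001 * s["lt_cases"]


base = {"dr_coverage": 60, "skill": "Learner", "lt_cases": 90}
options = {"dr_coverage": [60, 70, 80], "skill": ["Learner", "Practised"], "lt_cases": [90, 100]}
print(search_settings(base, options, toy_model, target=0.6))
```

Exhaustive search is adequate for the handful of factors and levels a manager would realistically consider; the number of combinations grows multiplicatively with each added factor.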

Conclusion
Quality, along with cost and schedule, is important for software firms both to retain customers and to win new projects from them. Software quality is generally expressed in terms of delivered defect density, and project managers need to quantitatively manage the software development process to achieve the goal on delivered defect density. The software development process consists of multiple interlinked phases or stages, with the defect density at each stage affecting that at subsequent stages and the delivered defect density. The process performance models available for quantitative project management often model delivered defect density in terms of factors from different phases together, or model the defect density of a particular phase only. Multistage process control techniques are better suited to quantitatively managing processes with multiple stages. In this paper, the authors suggest a multistage statistical process control methodology for monitoring and controlling the software development process. The proposed methodology combines process performance models with control charts.
A case study applying the suggested methodology to controlling telecom domain projects is also discussed in the paper. The design, coding and link testing phases are identified as the stages of the development process. Using data collected from past projects, models are developed for the phase wise defect densities, namely design review defect density, code review defect density and link testing defect density, with predictor variables taken from the respective phases.
Since the predictors are a mix of numeric and categorical variables, the models are developed using the classification and regression tree technique. A model is then developed for delivered defect density with the predicted phase wise defect densities as predictor variables. Since the phase wise defect densities are correlated and multicollinearity is present, this model is developed using ridge regression. Finally, control charts are constructed to monitor the residuals of the models. An Excel macro based template is developed for implementing the methodology: project managers enter the values of the phase wise predictor variables in the template and run the macro, which computes the predicted phase wise and delivered defect densities. If the predicted delivered defect density is not close to the required target, managers can use the template to identify a combination of predictor variable values that brings it close to the target. The methodology is validated on seven projects and pilot implemented on three projects, with encouraging results.
The main advantage of the proposed methodology is that it gives project managers the flexibility to tighten or relax control at different phases while still achieving the goal or target on delivered defect density. Although the case study is from the telecom domain, the same approach can be used to monitor and control projects in other domains, and projects with any number of stages or sub processes.

Fig. 1. cp versus cross-validation error plot for design review defect density (DR DD) model
Fig. 2. cp versus cross-validation error plot for code review defect density (CR DD) model
Fig. 3. cp versus cross-validation error plot for link testing defect density (LT DD) model

Fig. 4. Regression tree model for design review defect density
Fig. 5. Regression tree model for code review defect density
Fig. 6. Regression tree model for link testing defect density

Table 1
List of control factors at different phases

Table 2
MSE and RMSE values of the models
Table 2 shows that the RMSE values are reasonably close to zero. The residuals of the models are subjected to a normality test; the Shapiro-Wilk normality test results are given in Table 3.

Table 5
Variance Inflation Factor values

Table 6
Best λ value

Table 7
Model coefficients

Table 8
Model performance measures

Table 10
Control limits of charts constructed to monitor model residuals

Table 11
Control variable values used for validation