Evaluating the Impact of Critical Factors in Agile Continuous Delivery Process: A System Dynamics Approach

Abstract: Continuous Delivery is aimed at the frequent delivery of good quality software in a speedy, reliable and efficient fashion, with strong emphasis on automation and team collaboration. However, even with this new paradigm, repeatability of project outcome is still not guaranteed: project performance varies due to the various interacting and interrelated factors in the Continuous Delivery 'system'. This paper presents results from the investigation of the effect of various factors, in particular agile practices, on the quality of the developed software in the Continuous Delivery process. Results show that customer involvement and the cognitive ability of the QA have the most significant individual effects on the quality of software in continuous delivery.


I. INTRODUCTION
The Agile Manifesto places a high importance on the need for the frequent delivery of working software: "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software" [1]. This principle subtly indicates that not all developed software is actually made available to the customer for use, where it adds value to the customer's business. As Humble et al. point out: "It's hard enough for software developers to write code that works on their machine. But even when that's done, there's a long journey from there to software that's producing value, since software only produces value when it's in production" [2].
Software delivery is inhibited by a number of post-development issues: configuration management problems, insufficient testing in a production-like environment and poor collaboration among the various 'silos' in software projects are the major problems that cause software rejection at this stage [2]. A practical example of such a problem is the operations team realizing too late that they cannot support a version of the developed software because its architecture is incompatible with their available infrastructure. This is owed to the lack of involvement and collaboration of the operations team in the development process, and it results in delivery failure. Such post-development problems are the motivation for the Continuous Delivery (CD) initiative [2][4][10].
Test automation, strong team collaboration, effective configuration management, deployment automation and good team culture [2][10] are the major practices advocated in CD to boost the effectiveness of a frequent delivery process. However, these practices do not guarantee a smooth CD process: while there have been many reported successes with them, most notably at Flickr and IMVU with up to 50 deployments a day [4], there have also been numerous instances of failure [2][19]. This should not be surprising: outcomes in software projects are constrained by many limiting factors [5][6].
Various interacting and interconnected factors are present in software projects, and these account for the inconsistencies in the quality of software project results [7]. According to Brooks: "no one thing seems to cause the difficulty (in software projects)...but the accumulation of simultaneous and interacting factors..." [7].
The primary goal of this work is to investigate the dynamic causal relationships of the variables within the CD 'system' and develop a System Dynamics (SD) [8] model to evaluate the impact of these pertinent factors on the quality of software projects adopting CD. The model can be used as a tool to evaluate various managerial decisions and to introduce reliability, predictability and risk aversion in the CD process. Vensim [9], a free SD software package, is used for this research work.

A. Problem
Continuous integration, test automation, good culture and strong collaboration have been identified as the "prerequisites" for a successful CD process [2][3][4][10][19]. However, software projects are beset by several interrelated problems which make project outcomes unreliable [5][6], even with the adoption of the aforementioned "CD success prerequisites" [2]. The success of software delivery is impacted by a host of factors that interact continuously, creating feedback loops within software projects [5].
Refactoring of an automated acceptance test suite, as an example, is hypothesized to have a causal and dynamic effect on the CD process: as the acceptance test automation scripts grow linearly with project progress, the complexity, brittleness and coupling of the test suite increase, gradually introducing test smells into the automated acceptance test suite [11]. This is worsened by the presence of schedule pressure: developers take shortcuts by ignoring the test coding standards and ideals in order to keep up with the estimated work [6]. The test smell effect has a negative ripple effect on the maintenance effort of writing automated acceptance tests [10]. However, after refactoring the test suite, there is a significant reduction in the test suite maintenance effort due to the improved design of the test scripts [2][10]. Refactoring, of course, comes at a cost of extra effort [12].
www.ijacsa.thesai.org
Such causal effects of various practices determine the failure or success of CD, and there is a wide gap in academic research within this context [18]. Unless managers proactively anticipate the effects of various practices at various times in software projects, software delivery will continue to be uncontrollable, leading to many unpleasant surprises. A rigorous study of these variables, their dynamic effects and their impact on the quality of the developed software is vital to ensure repeatability and predictability of an efficient CD process.

B. Literature Review
CD is a relatively new paradigm; this explains the paucity of research work done in this field. At the time of writing this paper, there is no research work specifically on CD. However, some work has been done within a considerably similar context. Kajko-Mattsson [12] developed a preliminary process model incorporating the two parts of release management: the vendor side and the user side. Lahtela et al. [3] presented the challenges in the delivery of software by performing a full case study; the authors identified 7 different challenges encountered in the release of software. Van Der Hoek et al. [13] identified the problems of releasing software from a component-based software engineering approach and developed a tool to solve the identified problems. Krishnan [14] developed an economic model to optimize the delivery cycle for delivering good quality software.
These works adopt a big-bang, traditional waterfall approach to delivery and not a repetitive delivery process, as is the case in agile development. This casts major doubt on the relevance of their findings to agile software projects. Moreover, these works are empirical rather than simulation-based, which limits the control available over the identified factors. Though these works give an insight into some of the problems with delivering software, these problems are wholly generic and highly aggregated.
Abdel-Hamid [5] was the first researcher to leverage SD in software process simulations. He investigated the effect of various management policies on development cycle time, quality and effort. However, his work was based on the waterfall methodology, which confines the applicability of the results to waterfall projects. The actual delivery process in software projects is also beyond the scope of his work. Melis et al. [16] developed an SD model to investigate the impact of Test Driven Development (TDD) and Paired Programming (PP) on the cycle time, effort and quality of software projects. Cao [21] investigated the dynamics of agile software development and the impact of agile practices on cycle time and customer satisfaction using SD. Notwithstanding the contribution of these authors to agile software development, their works do not consider any post-development activities relevant to software delivery. Furthermore, they entirely omit the impact of the schedule pressure experienced by software project teams.
The authors of the paper assert that the successful conclusion of this research work is going to be a pioneering development in the field of CD and will create further insights in which new research interests can evolve.

II. RESEARCH SCOPE
This research aims to develop an SD model that delivery practitioners can adopt to gain control over delivery risk factors, particularly cost overruns and schedule flaws. Achieving this aim involves a full investigation to determine the pertinent factors impacting the outcome of the CD practices described in Section I; the causal effects of agile practices on these advocated CD practices within the delivery pipeline [17] are also considered. Fig. 1 below presents an overview of the generic flow process of CD. The entire process line is known as the delivery pipeline, deployment pipeline or build pipeline [17]. The complexity of the pipeline created by teams will vary depending on the level of available resources, the project risks involved and the criticality of the developed software [35]. This research work is based on a standard 4-stage deployment pipeline as represented in Fig. 1.

Legend:
Level: entity that builds or diminishes over a specified period of time.
Inflow/Outflow: rate of change in the level.
Mathematically, a level is the accumulation over time of its inflow minus its outflow, starting from its initial value.
Fig. 1 above shows each activity and its corresponding artifacts. The success of each stage is a criterion for the commencement of the succeeding stage. Our work focuses on this pipeline to determine the relevant factors affecting the efficiency of this 'journey' for frequent software delivery.
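The level and flow definitions in the legend can be written compactly as an integral. The form below is the standard stock-and-flow formulation used in SD rather than a transcription of the authors' own equation; the symbols $t_0$ (start time) and $s$ (integration variable) are assumed here for illustration.

```latex
\mathrm{Level}(t) \;=\; \mathrm{Level}(t_0) \;+\; \int_{t_0}^{t} \bigl[\mathrm{Inflow}(s) - \mathrm{Outflow}(s)\bigr]\, ds
```

Equivalently, the rate of change of the level at any instant is its inflow minus its outflow.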

A. Research Goal
The goal of this work is to develop an SD model to act as a tool for the delivery pipeline to support a repeatable, predictable and low-risk CD activity for software projects. The model is intended to make the delivery environment more controllable and to help management anticipate the results of their deliberate actions.

B. Research Questions
This research work is aimed at answering the following major Research Questions (RQ):
RQ1: What are the key factors (environmental, human and technological) in software projects that impact the success of CD? What are the agile practices that have an impact on the quality of software projects in the CD process?
RQ2: What are the dynamic and causal effects of each of these factors on software quality in CD?
RQ3: What is the impact of agile practices such as on-site customer, TDD, PP and Pair Testing (PT) on software quality in CD? What is the impact of the ability of the Quality Assurance (QA) tester on the quality of the software?

C. Research Benefits
A number of benefits would be achievable from the success of this research work. Firstly, it will help to maintain control of the available resources to achieve a stable, repeatable and predictable CD process. The lack of such a tool has created a huge gap in the industry and made delivery stability a difficult task. The stability that is realizable with this tool will help organizations striving to achieve CMMI level 4 and 5 [22] accreditation.
This model may also be used as a risk management tool for the delivery process. Since the impact of potential technological and strategic decisions on outcomes such as project completion dates and number of deliverable features can be explored via simulations, potential risks can be anticipated and proactively planned against or avoided completely. Several software organizations depend mainly on SD models as their major risk management tools [5]. This model will act as a valuable tool for project managers, release managers and senior management of software development organizations interested in the frequent release of their software to customers.
In addition, the model can serve as a process improvement tool by helping to determine points for optimization of important variables such as acceptance rate, build time and required effort.

III. METHODOLOGY
This section describes how the objectives of the research work are planned to be achieved.

A. Data Sources
• Interview: Primarily, semi-structured interviews will be conducted with experienced agile consultants, project managers and developers to elicit the major active variables needed to achieve the objectives of this research work. A formal approach will be adopted to narrow down these factors to the most relevant active factors.
• Questionnaire/Survey: This will be developed and sent to practitioners within the CD field, who will give their responses based on their experience in the area. The responses will then be analyzed systematically.
• Literature review: Keywords such as "continuous delivery (modeling)", "release management (simulation)" and "software system dynamics" will be used to search for related work in digital libraries. Significant findings from related work will not only help in identifying some factors but also help in quantifying the impact the factors have on other variables in the project. The quantification of this impact will be vital in the calibration of the SD model for simulations.
• Author's discretionary assumption: Where necessary, the authors' assumptions are used in the development of the model. Such assumptions will be validated, and where appropriate moderated, by experienced agile practitioners via interviews and questionnaires.

B. Simulation
Simulations provide the computerized prototype of an actual system run over a specified period of time. They are useful in software projects to improve project understanding and the knowledge base of project stakeholders.
Simulations offer a more realistic and cost-effective approach to realizing the objectives of this work as opposed to the 'rigidity' of empirical methods. The flexibility provided by simulation techniques to alter the variables for system behavior analysis would be impractical to achieve if conventional empirical methods were adopted [5][15]. SD, a continuous simulation technique, provides the functionality needed to achieve the goals and objectives of this research work, hence its adoption. SD facilitates the visualization of the complex inter-relationships between variables in a software project system and runs simulations to study how complex project systems behave over time [6]. A system dynamics model has a non-linear mathematical structure of first-order differential equations expressed as:

$$ \frac{dy(t)}{dt} = f\bigl(y(t), m\bigr) $$

where y represents the vector of levels, f is a non-linear function and m is a set of parameters.

IV. CONTINUOUS DELIVERY MODEL AND PARAMETERIZATION
The full CD model is divided into three sub-models for ease of analysis: the schedule pressure sub-model, the delivery pipeline sub-model and the CD cost effectiveness sub-model. Due to space constraints, only the automated acceptance testing section of the delivery pipeline sub-model is presented in this paper. This section is responsible for estimating the AAT Pass Multiplier.
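As a rough illustration of how a continuous model of the first-order form dy/dt = f(y, m) is stepped through time, the sketch below applies fixed-step Euler integration, one of the integration methods offered by SD tools such as Vensim. The function names, the two-level backlog example and its `completion_fraction` parameter are hypothetical and are not part of the authors' model.

```python
from typing import Callable, Dict, List

Levels = List[float]
Params = Dict[str, float]


def euler_integrate(f: Callable[[Levels, Params], Levels],
                    y0: Levels, m: Params, t_end: float, dt: float) -> List[Levels]:
    """Integrate dy/dt = f(y, m) from t = 0 to t_end using fixed Euler steps."""
    y = list(y0)
    trajectory = [list(y)]
    for _ in range(int(round(t_end / dt))):
        rates = f(y, m)  # net rate of change of each level
        y = [level + dt * rate for level, rate in zip(y, rates)]
        trajectory.append(list(y))
    return trajectory


def backlog_model(y: Levels, m: Params) -> Levels:
    """Hypothetical two-level example: features flow from a backlog to 'done'."""
    backlog, _done = y
    completion_rate = m["completion_fraction"] * backlog  # features per time unit
    return [-completion_rate, completion_rate]


if __name__ == "__main__":
    run = euler_integrate(backlog_model, [100.0, 0.0],
                          {"completion_fraction": 0.1}, t_end=3.0, dt=1.0)
    for step, (backlog, done) in enumerate(run):
        print(step, round(backlog, 2), round(done, 2))
```

Because the outflow of one level is the inflow of the other, the total number of features is conserved at every step, which is a quick sanity check on any stock-and-flow structure.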

A. The AAT PASS Multiplier
This sub-model was designed to determine the impact of various policies on the quality of automated acceptance tests. Schedule pressure, an occurrence triggered when the actual time left to finish the development of the software exceeds the estimated time to finish it, plays a pivotal role in the level of adoption of process improvement practices [5]. The figure below shows the dynamic modelling of the factors responsible for the quality of the automated acceptance tests.
Owing to space constraints, the authors discuss the elicitation and calibration of the tdd factor only; it is not possible to discuss all the active variables in the model here. TDD as a development technique has been a core practice in agile software projects. TDD involves a sub-iterative and incremental 6-step process in the following order: write a failing unit test; run it to ensure failure; write the functionality code; rerun the unit test to ensure success; refactor; proceed. This iterative and incremental procedure instills a high degree of reliability into the developed software and reduces redundancy in the production code and test artifacts [23]. Some researchers have investigated the impact of TDD on the quality of automated acceptance tests. A recent study investigated the effects of TDD on external quality and productivity using meta-analytical techniques [24]. Results of the analysis suggest that TDD has a relatively small positive impact on the quality of software; however, the impact on productivity is inconclusive.
George and Williams [25] carried out a controlled experiment with 24 professional pair programmers to evaluate the external code quality and speed of development of TDD adopters vis-a-vis waterfall approach adopters. Three experiments were performed with 8-person groups at 3 different companies to program Martin's bowling game task [26]. Results showed that the TDD group passed roughly 18% more tests than the control group that adopted the waterfall approach. The results may, however, have been confounded by the effects of PP.
A more relevant experiment was carried out by Yenduri [27]. Two 9-person groups of senior undergraduate students were used to evaluate the impact of TDD on software quality and productivity; one group used the test-first approach and the other the test-last approach. Results showed a 55% improvement in the acceptance test pass rate of the TDD group. The task, however, was described as a "small" project, which implies that results could vary with larger projects. Also, the number of subjects is relatively small.
A survey was conducted by the authors, directed at experienced project managers and developers with over 5 years' experience in TDD usage, to determine the impact of TDD on acceptance test success. Analysis of the responses gave values ranging from 30% to 60% improvement, with a mean of approximately 54%. The authors assumed a modest value of 50%, and this value was further supported by two interviewees. One of the interviewees (I1) said: "The high level of granularity of functionality testing during TDD, when done effectively, guarantees the behavioural requirement of the system is fulfilled, severely limiting the causes of failures during functional testing to environmental factors or inaccuracy in the requirement elicitation. Conveniently, we achieve 50% better success during acceptance testing than when we used to adopt the test-last approach..." Barring other factors, the authors make the bold assumption that a full adoption of TDD should guarantee 100% success of acceptance tests. Hence, we estimate the acceptance test pass rate without TDD to be 67%, so that a full adoption of TDD will yield 100% success in our model. The estimated acceptance pass rate of the test-last approach is represented by the average non-TDD pass rate in the model. This value is quite close to the value for test-last adopters (75%) in Williams' work [25].
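The 67% figure follows directly from the two assumptions stated above (a 50% relative improvement from TDD and a 100% pass rate under full adoption). The short derivation below makes the arithmetic explicit; the symbols $p_0$ (average non-TDD pass rate) and $i$ (TDD impact) are introduced here only for illustration.

```latex
p_0 \,(1 + i) = 1, \qquad i = 0.5
\;\;\Longrightarrow\;\;
p_0 = \frac{1}{1.5} = \frac{2}{3} \approx 0.67
```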
The planned degree of tdd in the model represents the planned level of TDD adoption in the development of the software features for the project. No literature exists on the average level of TDD adoption. However, the standard degree of unit test coverage in the industry ranges from 80% to 90% [28]. Some platform providers enforce a strict level of unit test coverage before allowing promotion of software onto their platform. Salesforce, a leading PaaS provider, insists on minimum unit test coverage of 75% before allowing promotion of customers' software onto its staging environment [29]. In our project case study, the planned level of TDD adoption is 100%, i.e. all features were planned to be developed using the TDD approach.
For ease of analysis, we assume that the test suite offers complete coverage, implying that all behavioral defects in the system are detected during development when TDD is fully adopted. This follows a similar assumption made by Williams et al. in the development of their economic model [30].
The actual degree of tdd adoption is affected by schedule pressure [6][23]. Developers tend to "cut corners" when the team is behind schedule to try and catch up: the procedural steps of adopting TDD are easily bypassed to increase development speed. The actual degree of tdd is the effective percentage of features developed using the TDD approach throughout the project. There is no published work on the estimated impact of schedule pressure on the degree of TDD adoption, which prompted the authors to derive a simple mathematical model to estimate the impact of schedule pressure on the planned degree of tdd adoption. Effective and simple mathematical models can be developed by researchers when reliable data is not available for model parameterization. Forrester advised: "A mathematical model should be based on the best information that is readily available, but the design of a model should not be postponed until all pertinent parameters have been accurately measured. That day will never come. Values should be estimated where necessary…." [31]
As schedule pressure (SP) develops, the team responds to falling behind schedule by working extra hours and cutting its slack time to try and make up the lost work [32]. This makes the initial effect of SP very small, hence the initial flatness in the curve. However, as the pressure mounts, the "threshold" is exceeded and the team responds by cutting corners and reducing its adoption of the TDD steps, following the test-last approach instead. A maximum point is eventually reached beyond which a further increase in SP has no additional effect; this forms the flat tail at the other extreme of the graph. This relationship is built into the variable lookup of SP for tdd.
The variable tdd factor, which represents the impact of TDD on the estimated pass rate of the automated acceptance tests, has the formula:
    tdd factor = average non TDD pass rate + (actual degree of tdd * tdd impact * average non TDD pass rate)
where
    actual degree of tdd = IF THEN ELSE(Time=11, 0, planned degree of tdd * lookup of SP for tdd(schedule presure))
    tdd impact = 0.5
    average non TDD pass rate = 0.67
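The tdd factor formula above can be sketched in Python to show how the pieces fit together. The constants 0.5 and 0.67 and the Time=11 condition come from the text; the lookup points in `SP_LOOKUP` are purely hypothetical values chosen to mimic the flat, falling, then flat shape described for the lookup of SP for tdd, and the 0-to-1 schedule pressure scale is likewise an assumption, since the paper's calibrated lookup table is not given here.

```python
from bisect import bisect_right

AVERAGE_NON_TDD_PASS_RATE = 0.67  # from the text
TDD_IMPACT = 0.5                  # from the text

# ASSUMPTION: illustrative (schedule pressure, multiplier) points only.
SP_LOOKUP = [(0.0, 1.0), (0.2, 1.0), (0.4, 0.9), (0.6, 0.5), (0.8, 0.2), (1.0, 0.2)]


def lookup_sp_for_tdd(schedule_pressure: float) -> float:
    """Piecewise-linear interpolation over SP_LOOKUP, clamped at both ends."""
    xs = [x for x, _ in SP_LOOKUP]
    ys = [y for _, y in SP_LOOKUP]
    if schedule_pressure <= xs[0]:
        return ys[0]
    if schedule_pressure >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, schedule_pressure)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (schedule_pressure - x0) / (x1 - x0)


def actual_degree_of_tdd(time: int, planned_degree: float, schedule_pressure: float) -> float:
    """Mirrors IF THEN ELSE(Time=11, 0, planned * lookup(SP))."""
    if time == 11:
        return 0.0
    return planned_degree * lookup_sp_for_tdd(schedule_pressure)


def tdd_factor(time: int, planned_degree: float, schedule_pressure: float) -> float:
    """average non TDD pass rate + (actual degree * tdd impact * average non TDD pass rate)."""
    actual = actual_degree_of_tdd(time, planned_degree, schedule_pressure)
    return AVERAGE_NON_TDD_PASS_RATE + actual * TDD_IMPACT * AVERAGE_NON_TDD_PASS_RATE


if __name__ == "__main__":
    for sp in (0.0, 0.5, 0.9):
        print(sp, round(tdd_factor(time=1, planned_degree=1.0, schedule_pressure=sp), 3))
```

With full planned adoption and no pressure the factor evaluates to 0.67 × 1.5 = 1.005, marginally above 1 because 2/3 was rounded to 0.67.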

V. MODEL VALIDATION
The model is validated in two phases, following the approach described by Richardson et al. [20]: a structural phase and a behavioral phase. Structural validation is the examination of the structure of the entire model. This involves studying the inter-relationships and parameterization of the variables to ensure they are credible enough to replicate real-life scenarios. Experienced project managers, consultants and developers were sought for this process, with critical feedback used to rework the model iteratively until the structure was approved by the reviewers. The model was also presented at two conferences, and valuable feedback was incorporated to rework the model.
Behavioural validation aims to verify that the model actually produces results that are similar to real-life project outputs. The model is validated against data for output variables from a completed software project with similar characteristics that successfully implemented CD. Coherence between the simulation outputs and the actual completed project outputs indicates that the model is capable of producing real-life project results, hence validating the model. Success at this stage is also a critical prerequisite before the model can be subjected to sensitivity analysis to answer the remaining research questions outlined.

A. Project Data
Data was sourced from a completed project, carried out by a sales software vendor, that adopted CD and agile practices. The developed software is part of a comprehensive software suite used by manufacturers for enhancing sales of products.
The case study used for this research was the software development project for the vendor's sales modeling solution. Data from the project is presented in Table 1.
Table 2 presents the data used to simulate the pressure experienced by the team. The pressure influences the adoption of major practices in the model and consequently the outcome of the project [5][6]. Fig. 5 below shows the simulated graphical representation of the SP experienced by the team. Schedule pressure is determined across each iteration in the project by the formula: (3).

Parameter                            Value
Iteration duration                   2 weeks
Team Size                            5
Team Velocity                        50
Agile Methodology Used               XP/Scrum
Automated Acceptance Testing Tool    JBehave
Team Experience Mix                  Average of 9 years software projects experience
Working hours/day                    7.5

B. Simulation Results
The table below shows the extracted results from the simulation model. "AA" denotes "automated acceptance" in Table 3. The data provided in Table 3 is used to examine the validity of the model by comparing the actual project outcome with the outcome produced by the developed simulation model. The actual and simulated automated acceptance test pass rates represent, respectively, the actual and simulated numbers of passing automated acceptance and user acceptance test cases, each expressed as a percentage of the total number of automated acceptance and user acceptance test cases. Noticeably, the results from the model correlate highly with the actual project outcome. There were two main points of significant discrepancy in the AAT results: the 1st and 2nd iterations, and the 12th iteration. In the first and second iterations, the team recorded a low number of automated acceptance test cases because relatively few stories were delivered, which significantly reduced the total sample for those iterations; hence the high impact on the % variation between the simulated and actual results. It is plausible that the actual pass ratios for these iterations with few test cases are exaggerated. In the 12th iteration, the team had significantly more actual failing tests due to the impact on the passing test suite of the major refactoring that occurred in the 11th iteration. It has been reported that software project teams generally experience problems of failing tests after a major redesign due to the coupling among various components of the software [33]. The simulation model does not recognize the 11th iteration as a non-productive iteration, even though actual project progress was inhibited by the management decision to carry out major refactoring. This behaviour is not built into the simulation model, as it is a manual decision made solely at the discretion of the team.
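The small-sample effect described above for the first two iterations can be illustrated with a few lines of Python. The test counts below are hypothetical, and the relative-difference definition of "% variation" is an assumption, since the text does not state the exact formula used.

```python
def pass_rate(passing: int, total: int) -> float:
    """Passing test cases expressed as a percentage of all test cases."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * passing / total


def percent_variation(simulated: float, actual: float) -> float:
    """ASSUMED definition: absolute difference relative to the actual value, in %."""
    return 100.0 * abs(simulated - actual) / actual


if __name__ == "__main__":
    # Hypothetical counts: one extra failing test in a small versus a large sample.
    small_swing = pass_rate(5, 5) - pass_rate(4, 5)
    large_swing = pass_rate(100, 100) - pass_rate(99, 100)
    print("swing with 5 test cases:", small_swing)
    print("swing with 100 test cases:", large_swing)
    print("variation, simulated 70% vs actual 80%:", percent_variation(70.0, 80.0))
```

A single test outcome moves the pass rate by 20 percentage points when only five test cases exist, which is why iterations with few delivered stories can show a large gap between simulated and actual values.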
The major point of disparity between the simulated and actual pass rates in the UAT scenario is apparent in the 19th iteration. A possible explanation is that testers tend to overlook many possible scenarios when a project is seemingly coming to a close and build assumptions into the system to get the project over with; in extreme cases, testers actually pass failing tests and are reluctant to find faults, so as to avoid project extension, as they look forward to celebrating project completion. This phenomenon was further attested by an interviewee (I2). It may explain why the actual passing test rate in the final iteration differs considerably from that projected by the simulation model.

C. Model Experimentation
Experiments are performed to carry out sensitivity analysis on the model to determine the impact of various policies on the quality of the developed software by altering the planned level of adoption of the major influencing agile practices. The major practices of interest are PP, PT, customer involvement and TDD. The impact of the ability of the QA (cognitive ability and domain savvy) is also investigated. Schedule pressure plays a prominent role in the actual level of adoption of the practices. The project data used for the model validation is used, and the level of adoption of each practice is altered. Table 4 below shows the various scenarios typifying various managerial policies regarding the adoption of agile practices. Table 5 and Fig. 6 show the relative impact of applying various managerial policies on the quality of the software, with the values rounded to 1 decimal place. Table 5 shows the impact of various management policies on the AAT pass rates. Clearly, scenario 8 (adoption of all practices) provides the best results until iteration 5, when it levels with scenario 7 (TDD and customer involvement).
While the performance of scenario 4 (customer involvement) is not the best, it provides the most stable pass ratios throughout the project irrespective of the schedule pressure. Unsurprisingly, scenario 1 had the poorest results, having adopted none of the practices.
The results for the UAT pass multiplier under the various scenarios are presented in Table 7 and Fig. 7. Various PT adoption policies were simulated, together with management hiring options pertinent to the ability of the QA tester. These factors help improve the sad path test coverage and discover defects that are usually only discoverable by the system end user [34][36]. PT in the context of this paper is the practice of developers pairing with the QA alone, or with the QA and the onsite customer, in writing and coding the test cases to run behavioral examples of system features authored by the customer [34].
As such, PT is not considered to have a significant impact on the AAT pass ratio, since the test examples written by the customers are unequivocally defined and do not necessarily need the exploratory testing skill of a second tester or developer. SP is clearly seen to reduce the pass ratio in some scenarios, while it remains relatively inactive in others. The cognitive ability of the QA has the most significant effect on the UAT pass rate in the project, followed closely by PT. The adoption of PT together with a QA with high cognitive ability and commendable domain savvy yields a 17% improvement in the UAT pass rate relative to a project with poor QA ability and no pair tester. However, it remains to be known whether the savings made by deploying a second tester and hiring a QA with strong domain savvy and cognitive ability exceed the cost of their introduction.

D. Limitation of the Study
The calibration of the model was based on data from peer-reviewed literature, surveys and interviews. Bias in any of the sources could undermine the validity of the model. Furthermore, the sample size of the actual test cases per iteration produced by the team is relatively small. As this is a middle-sized project, the model may be applicable only to middle-to-large-sized projects in which numerous test cases are developed due to the high number of features; the model may yield different results for small projects.
Most importantly, the effectiveness of the various factors is valid under the conditions experienced by the project team, most notably the schedule pressure experienced. Intuitively, without the effects of schedule pressure on process improvement practices, the adoption of these factors would yield better results.

VI. CONCLUSIONS AND FUTURE WORK
This paper reports an SD model that acts as a decision-making and process improvement tool for software development teams practicing CD. The goal of the model is to improve the effectiveness of the CD process and help managers optimize their development process. The impacts of practices such as PP, TDD, PT and customer involvement on the quality of the software were investigated. The authors also investigated the impact of QA ability on the quality of software. The impact of the SP experienced by teams is also substantiated in this study. Validating the model against data from a completed middle-sized project, customer involvement proves to have the most significant impact on the quality of onsite AAT, while the cognitive ability of the QA has the most impact on the quality of UAT.
The authors are addressing the limitations of this work and are currently evaluating the model in an uninfluenced, "ideal" environment by simulating an exploratory project case study to fully evaluate the impact of various managerial policies on the CD process. Furthermore, it is not enough to determine the qualitative impact of these various factors on the quality of the software project. This work draws attention to questions such as: "what are the trade-offs of these practices and the optimal level of adoption of these practices with respect to the CD performance metrics?"; "what is the economic effectiveness of the adoption of the agile practices in the CD process?"; "what extra resources are necessary to adopt these practices?"; "is the extra cost necessary to incorporate these practices better devoted to other value-adding tasks such as development or QA?"; "do the benefits (quality improvement) of the adoption of these practices outweigh their associated costs?"

Fig. 6. Graph for Automated Acceptance Test Rate for Various Scenarios

TABLE II. PROJECT DATA USED FOR SCHEDULE PRESSURE SIMULATION

TABLE III. RESULTS COMPARISON OF ACTUAL PROJECT OUTCOME AND SIMULATED PROJECT OUTCOME

TABLE VI. SCENARIOS FOR UAT MODEL SUB-SECTION

62.23 64.65 60.5 72.1 66.5 67.7 74.5
Fig. 7. Graph for User Acceptance Testing Rate for Various Scenarios
Table 6 shows the various scenarios of factors affecting the quality of user acceptance testing.