How does project size affect cost estimation error? Statistical artifacts and methodological challenges
Highlights
► Diverging results on how project size affects cost overrun.
► Reported results from observational studies are likely to be statistical artifacts.
► The robust evidence that exists is limited to smaller tasks.
Introduction
Percentage cost estimation error in projects may be measured as the difference between the actual and the estimated cost divided by the estimated cost. If the percentage cost estimation error is positive there is a cost overrun, and if negative there is a cost underrun. A frequently reported estimation bias is the tendency towards higher percentage cost overruns for larger than for smaller projects. This finding is reported for tasks with sizes ranging from simple, small-scale tasks, e.g., the paper sheet counting tasks in Roy and Christenfeld (2008), to large engineering projects (Gray et al., 1999, Heemstra and Kusters, 1991, Moløkken-Østvold et al., 2004, Yang et al., 2008b). Explanations in support of an increase in cost overrun with increasing project size may involve both human biases, e.g., increased over-confidence with increased size and/or complexity (Grieco and Hogarth, 2009) or the belief that complexity increases linearly with project size when the relationship is in fact non-linear (Staats et al., submitted for publication), and biases resulting from rational estimation strategies, e.g., the variance shrinkage effect that typically occurs when people base their estimates on the average cost of similar projects (Hatton, 2007).
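The error measure defined above can be sketched in code (a minimal illustration; the function name is ours):

```python
def percentage_cost_error(actual, estimated):
    """Percentage cost estimation error: (actual - estimated) / estimated.

    Positive values indicate a cost overrun, negative values a cost underrun.
    """
    return (actual - estimated) / estimated

# A project estimated at 100 that ends up costing 120 has a 20% overrun;
# one that ends up costing 80 has a 20% underrun.
print(percentage_cost_error(120, 100))  # 0.2
print(percentage_cost_error(80, 100))   # -0.2
```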
There are also studies that report the opposite bias, i.e., a decrease in cost overrun with increased project size. This opposite bias is, for example, reported in several engineering project contexts (Bertisen and Davis, 2008, Creedy, 2006, Odeck, 2004). In addition, there are studies showing no significant relationship between project size and percentage cost overrun, e.g., the rail and road construction projects in Flyvbjerg et al. (2004). The difference in results has, not surprisingly, led to differences in recommendations. Odeck (2004), who found an increase in percentage cost overrun with decreasing project size, recommends that managers pay special attention to cost control of the smaller projects. Sauer et al. (2007a), who found the opposite bias, recommend, on the other hand, that managers keep projects small to avoid cost overruns. The common belief among practitioners, as far as we have experienced, is that the percentage cost overruns of larger projects tend to be higher than those of smaller ones. The belief that larger projects are less predictable and manageable than smaller ones is, for example, consistent with the creation of software development methods based on so-called “incremental development” (Larman and Basili, 2003), i.e., methods based on splitting a large project into smaller ones.
There may be natural reasons for the studies' differences in the reported relationship between project size and percentage cost overrun. There may, for example, be differences across contexts in the complexity of large projects and in the ability to handle them. It is, however, also possible that the statistical analyses typically used to support the claimed relation between project size and cost estimation bias are problematic and cause artificial variations in the results. In that case, we should be very careful about making claims about the underlying (causal, “true”) relationship and about recommendations based on the analyses. In this paper we claim that there are indeed problems with the analyses. The analyses do not, we argue, enable a separation of statistical artifacts from underlying relationships.
The remainder of this paper is organized as follows: First, we review relevant empirical studies and report that the studies' differences in results seem to be strongly related to their choice of project size variable. Then, we apply regression analysis mathematics to show that the observed project size related difference in reported results is an expected consequence (a statistical artifact) of imperfect correlation between the estimated and the actual cost. Our examination of problems related to random error in project size measurement and to non-random samples suggests that project size variables related to the estimated and to the actual cost are likely to lead to interpretation problems. We discuss these interpretation problems and argue that robust knowledge about the underlying relationship between project size and cost estimation bias may require other types of studies and analyses, e.g., controlled experiments with fixed project size variables and studies aiming at a better in-depth understanding of the involved mechanisms.
Section snippets
A review of empirical studies
In this section we review empirical studies that indicate an underlying relationship between project size and percentage cost overrun. The main purpose is to give an initial assessment of the validity of general claims regarding the impact of project size on percentage cost overrun. The assessment is based on the assumption that a change from one meaningful project size measure to another should not lead to a substantial change in the reported underlying relationship between project size and percentage cost overrun.
The effect of choice of project size variable
The relationship between project size and cost estimation error may be modeled as follows, using the actual cost (ACT) as the project size variable:
Correspondingly, we may model the relationship using the estimated cost (EST) as the project size variable:
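Both models can be written in a simple log-linear form. The specification below is our sketch, consistent with the interpretation of β1 that follows, and not necessarily the exact form used in the paper:

```latex
% Size measured by the actual cost (ACT):
\ln\!\left(\frac{ACT}{EST}\right) = \beta_0 + \beta_1 \ln(ACT) + \varepsilon

% Correspondingly, with size measured by the estimated cost (EST):
\ln\!\left(\frac{ACT}{EST}\right) = \beta_0 + \beta_1 \ln(EST) + \varepsilon
```

Since ln(ACT/EST) increases with the percentage cost overrun, a positive β1 corresponds to larger overruns on larger projects.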
The expression in Eq. (2) implies that there is an increase in percentage cost overrun (disproportionally more under-estimation or less over-estimation) with increased actual cost if β1 > 0, a constant estimation error if β1 = 0, and a decrease in percentage cost overrun with increased actual cost if β1 < 0.
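The statistical artifact at the core of this section can be reproduced with a small simulation (our own illustration, not the paper's code). When the estimated and the actual cost are imperfectly correlated, the same set of estimation errors yields a positive slope when regressed on actual cost and a negative slope when regressed on estimated cost, purely by regression toward the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Latent (true) project size, with independent noise in estimate and outcome,
# so that estimated and actual cost are imperfectly correlated.
log_true = rng.normal(0.0, 1.0, n)
log_est = log_true + rng.normal(0.0, 0.5, n)  # estimated cost (log scale)
log_act = log_true + rng.normal(0.0, 0.5, n)  # actual cost (log scale)

log_error = log_act - log_est  # log of ACT/EST, i.e., the relative error

# Identical errors, two different "project size" variables.
slope_vs_act = np.polyfit(log_act, log_error, 1)[0]
slope_vs_est = np.polyfit(log_est, log_error, 1)[0]

print(f"slope vs actual cost:    {slope_vs_act:+.2f}")  # positive
print(f"slope vs estimated cost: {slope_vs_est:+.2f}")  # negative
```

With these parameters the expected slopes are +0.2 and -0.2 (noise variance divided by total variance), even though no underlying size-related bias was built into the data.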
Random error in measurement of project size
Measurement of the estimated and actual cost of projects will typically be exposed to random error. This violates an essential assumption of ordinary regression analysis, which requires that the independent variables are “fixed” (or at least have no random error). A fixed variable is one where the values of the independent variables are set, and not just observed as a result of sampling projects from a population. If the random error of the independent variables is substantial, the estimated regression coefficients may be substantially biased.
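The bias from random error in an independent variable can be illustrated with a simulation (ours, for illustration): adding measurement noise to the regressor shrinks the estimated slope toward zero by the reliability ratio Var(x) / (Var(x) + Var(noise)), the classical errors-in-variables attenuation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x_true = rng.normal(0.0, 1.0, n)
y = 2.0 * x_true + rng.normal(0.0, 0.5, n)  # true slope is 2

# The independent variable is observed with random error,
# violating the "fixed X" assumption of ordinary regression.
x_observed = x_true + rng.normal(0.0, 1.0, n)

slope_clean = np.polyfit(x_true, y, 1)[0]      # close to the true slope, 2
slope_noisy = np.polyfit(x_observed, y, 1)[0]  # attenuated toward 2 * 1/(1+1) = 1

print(f"slope with fixed X: {slope_clean:.2f}")
print(f"slope with noisy X: {slope_noisy:.2f}")
```

Here the reliability ratio is 1 / (1 + 1) = 0.5, so the noisy-regressor slope lands near 1 instead of 2.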
Non-random sampling
Non-random sampling may be a consequence of practical concerns, e.g., data collection limitations, or caused by processes inherent in the phenomenon under study. As is the case with incorrectly specified models, the direction and size of the total bias is in general hard to predict (Berk, 1983). We will in this section argue that non-random sampling contributes to the interpretation problems of observational studies of the relation between project size and cost estimation error.
Discussion and conclusion
We have in the previous sections argued that there are reasons to doubt the robustness of observational studies on the relation between project size and cost estimation bias. There are a couple of laboratory-based, small-task studies that avoid the interpretation problems through fixed size variable values and random allocation of treatments. These two studies suggest that there is an increase in effort (cost) overrun with increased task (project) size. In the other reviewed studies, however, we cannot separate statistical artifacts from underlying relationships.
Acknowledgement
The authors would like to thank Les Hatton, John Hill, Kim E. van Oorschot and Michael M. Roy for making their data sets available.
References (51)
- et al. Tests in contingency tables as regression tests. Economics Letters (2009).
- et al. Overconfidence in absolute and relative performance: the regression hypothesis and Bayesian updating. Journal of Economic Psychology (2009).
- et al. Inconsistency of expert judgment-based estimates of software development effort. Journal of Systems and Software (2007).
- et al. Experts' estimates of task durations in software development projects. International Journal of Project Management (2000).
- A review of studies on expert estimation of software development effort. Journal of Systems and Software (2004).
- et al. Software effort estimation by analogy and “regression toward the mean”. Journal of Systems and Software (2003).
- A note on the analysis of repeated measurements of the same subjects. Journal of Chronic Diseases (1962).
- Regression facts and artifacts. Evaluation and Program Planning (2000).
- et al. Variability and reproducibility in software engineering: a study of four companies that developed the same system. IEEE Transactions on Software Engineering (2009).
- Properties of the geometric mean functional relationship. Biometrics (1988).