Avoiding Cost Overrun Through Stochastic Cost Estimation and External Quality Assurance

Cost overruns are a threat to project performance and continue to attract interest in both the popular media and the academic literature. Numerous studies from all continents have demonstrated that overruns remain prevalent in all industries. Although there are different suggestions as to what are the main causes of this problem, few studies have demonstrated what can be done to improve cost performance. This article provides evidence that improved cost estimation methodologies combined with external quality assurance can significantly reduce the extent of cost overruns in projects. The authors use data from 96 government projects in Norway, which implemented a quality assurance regime for large investment projects in the year 2000. The results show that cost performance was reasonably good. Only c. 25% of the projects subject to the regime experienced cost overruns. This suggests that by using proper cost estimation methodologies that are embedded in a governance framework that ensures that projects are subject to external scrutiny, the risks of cost overruns can be significantly reduced. The results should be encouraging for project owners who may have the impression that overrun is an unavoidable part of project delivery.

operate and maintain infrastructure over time. Therefore, accurate cost estimates and efficient project delivery are essential for ensuring successful projects.
However, project success is a heterogeneous measure and there is no universally accepted measure of what makes a project successful, even if the subject is at the heart of project management [1]. Traditionally, the project management literature has focused on the "iron triangle" of time, cost, and scope [2]. These criteria were challenged by Atkinson [3], who argued that they were inadequate. In recent years, many authors have argued for the need to take a wider and more strategic view. Instead of focusing on output, the attention has turned to the outcome and strategic effects. Projects are implemented to deliver benefits and create value for users, parent organizations, and/or society at large. Thus, Samset [4] argued that success should be measured from an operational, tactical, and strategic perspective, and incorporate the interests of the project, the users, and society. Success in operational terms typically means adhering to the criteria of the iron triangle, which are short-term targets. Tactical success refers to the achievement of the formal goals, often formulated in the project's business case. Strategic success covers the long-term economic impacts of the project, meaning whether or not the impacts can be sustained in the long term and continue to satisfy societal needs. He also argued, like Cooke-Davies [5], that it is more important to do the right project than to do the project right, and that a project can be regarded as a success despite experiencing a large cost overrun. Therefore, there is a need for a wide view of the success and failure of projects. Samset and Volden [6] referred to the University Hospital in Oslo project, which experienced considerable cost overrun and was delivered one year behind schedule, resulting in widespread negative media coverage. However, in relative terms, the overrun was equivalent to only a few months' operational costs for the hospital, and therefore insignificant from a lifetime perspective. 
Since its opening, the hospital has been regarded as highly successful, despite inefficient project delivery. Zwikael and Meredith [7] argued for a similar categorization of project success: project management success, project ownership success, and project investment success. Thus, cost performance only measures one dimension of project success. However, to be truly successful, projects should perform well in all three respects. Project efficiency remains an important element in project management, and project success is closely related to project efficiency [8]. Since the assessment of benefits for public projects such as schools, music venues, and military equipment may be characterized by personal judgement and uncertainty, quantitative measures such as the cost of implementation remain fundamental in the decision-making process of government departments and agencies.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Despite this, the track record on cost performance in large projects is poor. Overruns of more than 100% are not uncommon. Some high-profile projects, such as the Edinburgh Trams and the Scottish Parliament Building in Scotland, HS2 and Crossrail in England, Berlin Brandenburg Airport in Germany, and the extension of the Storting building, which houses the Norwegian Parliament, have attracted considerable public attention due to their cost overruns. The evidence of widespread waste is not just anecdotal. Several studies have documented that cost overruns happen regularly in all countries, in different industries, and in both the private and public sectors. Odeck [9] reviewed 48 studies of cost performance in the transport sector from 1973 to 2015, covering all continents, and found that overruns were prevalent and on average 34%, regardless of the type of transport asset. Bent Flyvbjerg of Saïd Business School, Oxford, has repeatedly claimed that nine out of ten transport projects experience a cost overrun (e.g., [10]). Similar results have been found in other sectors. Flyvbjerg and Budzier [11] studied 1471 ICT projects and found that the average overrun was 27%. One in six projects had a cost overrun of 200% or more. For some types of projects, such as the Olympic Games, 100% experience cost overruns. Flyvbjerg et al. [12] found that the average overrun in a sample of 19 of 30 Games organized between 1960 and 2014 was 156%. Almost half of all the Games exceeded their budget by more than 100%.
Even though the academic literature is full of examples of studies that document large cost overruns in different industries, there are also examples of good practice, where most project costs are below or close to budget. In Odeck's review of studies of cost overruns [9], 12 studies had overruns of 10% or lower. An earlier study of 620 road projects by the same author revealed a relatively modest mean overrun of 7.9%; 48% of the projects in the sample experienced no overruns [13]. In a later study of 1045 road projects, Odeck [14] found that even though the mean overrun was 10%, the largest projects had a mean underrun of 3.8%. Similar results were found by Odeck et al. [15], who demonstrated that among 22 large road projects estimated by stochastic methods and subjected to external quality assurance, the mean underrun was 10.8%. Underruns have been found in other industries too. Ahsan and Gunawan [16] studied international development projects, a project category often associated with poor performance. Among the 100 projects in their sample, the average underrun was 14.5%. Love et al. [17] studied 1093 water infrastructure projects in the U.K. and found that although overruns were more common than underruns, project costs were on average delivered below budget. The mean underrun was 0.8%. In a study of social infrastructure projects in Hong Kong, Love et al. [18] found that 43% of projects incurred a cost underrun from their contract award, and they suggested that projects may be exposed to both optimism bias and pessimism bias. The U.K. set out to improve the performance of publicly financed projects after Mott MacDonald [19] documented that large public procurements were underestimated by an average of 38%. Park [20] documented that the efforts have been successful, as the mean underrun against the estimated P70 was 4.7%. In Park's sample, 62% of the projects were completed below the budget.
Despite some evidence of good practice, there should be little doubt that overruns are a challenge for project-based organizations worldwide. Limited progress seems to have been made in terms of improving cost performance. Most studies on the topic have focused on demonstrating failures, and often data for the studies have been aggregated over long time scales and even collected from different countries, where project governance regimes may differ considerably. Less attention has been given to measures that have proven to improve cost performance. A frequently cited study by Flyvbjerg et al. [21], and one repeated in many subsequent publications (e.g., [10]), asserted that bias and underestimation are the root causes of cost overrun. However, it is too simplistic merely to assume that planning processes and estimation methodologies in a large sample of projects have been inadequate without discussing how projects are planned and costs are estimated. To improve the process by which projects are developed and budgets are set, we need to compare the results of different estimation methodologies.
This article aims to demonstrate that it is possible to reduce cost overruns through a project governance framework in which stochastic cost estimation and external quality assurance play a crucial role. In 2000, the Norwegian Government introduced a new and mandatory quality assurance regime for its largest projects. Since then, cost performance has improved, and today most of the projects are completed within budget. In the article, we demonstrate these results using a sample of 96 projects. We compare results across sectors, investigate whether there has been an improvement over time, and discuss whether the results could be compared with results from other countries.
Furthermore, the article offers empirically based advice to planners and decision-makers on what could be done to reduce the risks of overruns by demonstrating that overruns can be avoided if projects are implemented within a governance framework that ensures quality at entry. We pay special attention to the cost estimation methodology and the process for ensuring quality up-front. Few studies of cost overruns discuss how cost estimates are produced, even though many different methods for estimation are available. We argue that even if bias may be an unavoidable part of project planning, there is still a need to develop procedures that can, as far as possible, quantify the risks in individual projects and minimize the impact of deliberate human bias. The results presented in the article should be relevant to policymakers and anyone involved in the delivery of large projects in both the public and private sectors.
The article proceeds as follows. First, we review some potential causes of cost overruns, followed by a discussion of whether studies with data from different industries and countries can be compared. We then describe the Norwegian quality assurance regime as the context for our empirical findings. Thereafter, we present a description of the data and methodology, followed by the results. The article ends with our discussion and conclusions.

II. PREVIOUS STUDIES OF CAUSES OF COST OVERRUNS
In Section I, we referred to several studies that have documented the magnitude and frequency of overruns in different countries and industries. It is well documented and agreed among academics and practitioners that the extent of overruns is too large. There is less consensus regarding the causes of overruns. In this section, we refer to some of the main lines of explanations before we discuss if different studies from different countries can be compared.

A. Causes of Overruns
Along with the many studies of cost overruns, at least an equal number of explanations of overruns have been suggested [21]. The theories can be grouped into two categories. The first category includes behavioral and political explanations and suggests that conscious or unconscious bias leads planners to produce unrealistically low cost estimates that increase the likelihood of project approval. This suggests that underestimation may be an important reason why projects experience cost overruns. The second category of explanations is related to traditional causes of overrun, such as scope changes, contractual disputes, ground conditions, and other manifestations of uncertainty. Ahiaga-Dagbui and Smith [22] referred to the first category as underestimation and the second as overrun, meaning that a project can experience cost overrun during project completion due to issues such as unexpected ground conditions, technical and managerial difficulties, and price changes, even if costs are not underestimated in the project's front-end.
Flyvbjerg et al. [10] argued that the root cause of cost overrun is human bias, namely psychological and political explanations. They completely dismissed traditional explanations and argued that such issues may be causes, but not root causes, meaning that cost estimates should factor in these risks. Flyvbjerg et al. [10] argued that, if this is not done, the reason is either a matter of deliberate decision or self-delusion. They further argued that the problem is not cost overrun but cost underestimation. Hence, if we solve the problem of underestimation, we solve the problem of overrun. Flyvbjerg et al.'s [10] work has since had a substantial influence on cost estimation practice and on governments, such as in the U.K. and Ireland, where "optimism bias uplifts" are used to avoid intentional or unintentional underestimation [23], [24].
The rather provocative suggestions of widespread fraudulent behavior among planners and managers have appealed to both the media and parts of the scientific community [25], but such behavior remains a controversial issue, conclusive proof of it may be difficult to find, and it has been criticized by other scholars (e.g., [26]- [30]).
Traditional explanations for cost overrun take a more rational approach to project planning and delivery, namely that project performance can be improved through streamlining and improving procedures for estimation and planning. An important reason for cost overrun is the occurrence of events or conditions during project execution that do not concur with assumptions made in the front-end.
Another source of uncertainty regarding project delivery is contract issues. In contracts that are awarded based on the lowest bid, the contractor may have an incentive to make unrealistically low bids and instead incorporate speculated costly change orders. If the client requests changes and additions beyond the agreed scope of the contract, the contractor will normally require higher compensation for carrying out such work. This, in turn, may result in contract overruns. In a study of 67 construction contracts, Love et al. [32] found that the deviation between the agreed contract sum and the final sum was on average 23.8%. Welde and Dahl [33] found similar results from Norway, where the average overrun among 712 contracts was 17%. They pointed out, however, that the projects in which the contracts were carried out normally had contingencies to cover excess payments to contractors and that overruns in contracts did not necessarily lead to overruns in projects.
Another potential reason for overruns is the dynamic between the project manager and the project owner [34]. This relates to governance in projects, that is, how owners follow up and support their projects. Active owners will be able to identify and act on emerging problems more quickly and more effectively than passive owners. The project manager responsible for project delivery may have few incentives to deliver the project with costs lower than budgeted, so without active project ownership, the project may use all or close to all the funds available (Parkinson's law). Thus, in a portfolio of projects, small underruns in some projects may be insufficient to compensate for large overruns in other projects.
Finally, there is the often-ignored issue of estimation methodology and access to data. Large and complex projects are subject to uncertainty. Cost estimates should consider this uncertainty through risk analysis and by adding necessary contingency. Different cost outcomes have different probabilities. Even if we select the most likely cost of all elements in an estimate, there might be a less than 30% probability of that sum occurring. Therefore, large cost overruns should be expected in a portfolio of projects where estimates have been based on deterministic bottom-up estimation [35].
Stochastic estimation is used to consider the uncertainty in input parameters and to produce probability-based estimates. The median (the P50) is often used by decision-makers who are willing to accept a fifty-fifty risk of cost overrun. If costs are normally distributed, the P50 and the mean will be identical, but the reality is that the distribution of costs might be heavily skewed to the right. There is a limit to how much costs can underrun, but almost no limit to how much they can overrun. For example, a 100% overrun is quite possible, but a 100% underrun is impossible. Therefore, the mean, or the expected value, can be significantly higher than the median. Emhjellen et al. [36] demonstrated that the mean can be 20% above the median in a cost estimate with a positive skew and they argued that the practice of using the median instead of the mean could explain part of the observed overruns in many projects.
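The gap between the median and the mean in a right-skewed cost distribution can be illustrated with a small simulation. The sketch below is not the method of Emhjellen et al. [36]; it assumes, purely for illustration, that final costs follow a lognormal distribution, with a hypothetical P50 of NOK 1000 million and a spread parameter chosen so that the mean lies 20% above the median.

```python
import math
import random
import statistics


def mean_to_median_ratio(sigma: float) -> float:
    """Analytical mean/median ratio of a lognormal distribution."""
    # For a lognormal: median = exp(mu), mean = exp(mu + sigma^2 / 2)
    return math.exp(sigma ** 2 / 2)


def simulate_costs(median_cost: float, sigma: float,
                   n: int = 100_000, seed: int = 1) -> list[float]:
    """Draw n hypothetical final costs from a lognormal distribution."""
    rng = random.Random(seed)
    mu = math.log(median_cost)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Assumption: sigma is chosen so that the mean is 20% above the median.
SIGMA = math.sqrt(2 * math.log(1.2))
MEDIAN_COST = 1000.0  # hypothetical P50 in NOK million

costs = simulate_costs(MEDIAN_COST, SIGMA)
sample_median = statistics.median(costs)
sample_mean = statistics.fmean(costs)
# Share of simulated outcomes that exceed the expected value (the mean)
share_above_mean = sum(c > sample_mean for c in costs) / len(costs)

print(f"median={sample_median:.0f}  mean={sample_mean:.0f}  "
      f"ratio={sample_mean / sample_median:.2f}")
print(f"share of outcomes above the mean: {share_above_mean:.0%}")
```

Under these assumptions, a budget set at the median is exceeded in about half of the simulated outcomes, while the long right tail pulls the expected cost well above that budget.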

B. Comparing the Results of Different Studies
Despite the extant literature on cost overruns, there are reasons to exercise caution when interpreting and comparing the results between countries. One of the main reasons for the large differences in results between studies can be how cost overruns are defined and measured.
The term "cost overrun" may seem straightforward, but different studies use different definitions of the term. Siemiatycki [37] referred to 13 auditor studies of cost overruns. In eight of the studies, the final costs were compared with the estimate at the "go decision," while in the other five they were compared with the contract value. Less than half the studies had adjusted budgets and costs for inflation, which illustrates the concern raised by Love et al. [31]. The large differences between the results of studies may be due to the differences in the "point of reference" from which the cost overrun is measured. According to Invernizzi et al. [38], neither the Project Management Institute nor the Association for Project Management provides a formal definition of cost overruns. The authors argued that the assessment of cost overruns can be especially difficult when the development of a project is long and challenging. Love et al. [39] criticized the use of the term cost overrun to describe scope changes to a project. They argued that in cases when changes are sanctioned by the client, the term that should be used is "cost growth." In the transport sector, projects are often parts of strategic plans developed by the transport authorities. The projects may have long histories and undergo substantial changes before they are allocated formal budgets and client organizations can put the engineering works out to tender. Welde and Odeck [40] studied 42 Norwegian road projects that opened for traffic from 2000 to 2014 and found that while the deviation between the final cost and the P50 estimate approved by Parliament was a mere 1%, the average increase from the estimate prepared for the national transport plan developed years earlier was 39%. The difference between the two results arises because a strategic plan may indicate an intention to carry out a range of projects and may be subject to considerable scope changes and cancellations, whereas the projects are normally well defined once a budget has been allocated and money is allowed to flow.
As argued by Flyvbjerg et al. [10], the baseline for measuring cost overruns should reflect what we want to measure. If the intention is to study lock-in, scope creep, perverse incentives, optimism bias, and strategic underestimation, the early estimates, which often are produced by local promoters, should be the baseline. The budget at the time of the decision to build is relevant for measuring the quality of the decision-making process and the management of the project, while the contracted budget may be relevant for measuring the performance of contractors or the contract management of the client organization. However, the decision to build can occur at different cost accounting stages of a project [41]. Thus, there may be different measures of cost overruns depending on the starting point for the analyses. There should be a careful reflection on this point in comparisons of studies.

III. COST ESTIMATION, QUALITY ASSURANCE, AND BUDGET APPROVAL IN NORWEGIAN GOVERNMENT PROJECTS
In the year 2000, the Norwegian Government introduced a system whereby all large government projects are required to undergo external quality assurance (QA) of cost estimates and final business cases before Parliament can approve a budget for the project. For years, several road, railway, and public building projects had suffered large overruns and delays and had caused much concern and embarrassment for the responsible agencies and their ministries. A working group led by the Ministry of Finance investigated the problem and concluded that projects often were rushed through the decision-making process without proper scrutiny and that there was a need to standardize planning procedures and cost estimation methodologies. Today, except for health-related projects and projects from the oil and gas industry, which have their own arrangements, all Norwegian government investment projects with an expected cost above NOK 1 billion (c. EUR 100 million) must be subjected to QA by external consultants selected by the Ministry of Finance [43]. QA is a system for ensuring a desired level of quality in the development, production, and delivery of products and services [15]. Independent and external peer reviews of forecasts and business cases have long been regarded as part of good practice in projects and as a tool for debiasing estimates that have been influenced by tunnel vision and delusion. Such reviews may be a potential remedy for cost overruns and optimism bias, and a way to improve the quality of front-end management by taking an "outside view" of planned actions [42], including comparing planned project costs with those of completed projects. QA is normally used as part of a comprehensive system of project governance in which a financing party introduces systems and regulations to ensure that projects are successful. Volden and Samset [43] reviewed principles and practices for project governance in six countries and found that independent QA was mandatory in all of them.
The Norwegian system, often referred to as the QA scheme, is a gateway model and all large projects must go through two external reviews:
1) QA1: Quality assurance of choice of concept before the government decision to start a pre-project.
2) QA2: Quality assurance of cost estimates before the project is submitted to Parliament for approval and funding.
The process can be described as shown in Fig. 1.
1) Original project proposals are often based on local initiatives, with rough estimates based on very little information on what the actual solution will look like, and without any in-depth analysis.
2) If the proposal addresses an actual problem, the Government may instruct one or more agencies to carry out a conceptual appraisal that considers several potential solutions (including a do-nothing or do-minimum alternative). The appraisal is then subjected to scrutiny by external experts through QA1 before the Government may (or may not) allow further planning to proceed. The early appraisals include rough strategic estimates to compare alternatives, but not for budget purposes. If a conceptual solution is selected and accepted for planning, the pre-project will be subject to QA2, which includes a significantly more thorough estimation and uncertainty analysis. The project is compared with similar projects. On that basis, the consultant will make recommendations regarding the budget for the project, including necessary contingency reserves to account for uncertainty. Then, Parliament may take the formal decision to finance and execute the project.
3) Detailed planning and design start before contractors and suppliers are invited to make offers on the project. Bids are assessed by the client before one or more contracts are awarded for the execution of works.
4) After execution, the actual cost is established based on continuous bookkeeping, checks, and balances.
The above-described process helps to ensure that projects that receive government funding are sufficiently mature and that the risk of optimism bias is reduced. By allowing projects time to develop, the risk of bad investments due to premature decisions is reduced. However, a long front-end may lead to increased expectations from stakeholders and escalation of commitment by decision-makers to the extent that final project approval is inevitable. Fig. 1 shows that project planning and appraisal may be a time-consuming process and that a cost estimate is not a single figure that is determined at the start of a project and fixed from thereon. Rather, it evolves as the project matures and is inherently linked to the development of the project scope and schedule [44]. The process allows for the rejection of unviable projects and ensures that projects that receive funding are sufficiently mature to proceed to the execution phase.
Cost performance is a central part of the QA scheme, and the responsible government agencies use a lot of resources on estimating the costs of the projects leading up to QA2 and parliamentary approval. In Norway, various forms of stochastic cost estimation have been common since the 1990s. The methodologies came about as a result of collaboration between researchers specializing in statistical theory, psychology, and engineering economics at technical universities in Denmark and Norway [45], [46].
Stochastic cost modeling, in which input variables are assumed to be uncertain and results are presented as probabilities, is not new. It has been used in various industries for decades, but stochastic cost estimation has been uncommon in government projects. The process in Norwegian agencies varies, but usually consists of the following main steps.
1) Establish a suitable analysis group of experts, who then prepare by reviewing planning documents and data from relevant reference projects.
2) Break down the project into a few elements using a top-down approach.
3) Quantify all uncertain elements using triple estimates.
4) Identify generic risks relevant to the inherent uncertainty and quantify their impact on total costs based on triple estimates.
5) Calculate total project costs using a stochastic estimation tool, normally based on Monte Carlo simulation.
6) Report to the responsible Ministry and inform stakeholders of the probability of different outcomes.
The estimation process produces a range of outcomes with assigned probabilities. According to the Ministry of Finance's guidelines, the budget should normally represent the cost that has an 85% probability of being met (the P85 percentile), minus an identified potential for scope reductions [47]. Olsson [48] found that potential scope reductions agreed upon before project delivery were equivalent to 2.7% of project budgets. The use of "reduction lists" that can be implemented if costs escalate is thought to have a disciplining effect on project managers and gives the responsible agency a list of pre-approved scope reductions that can be implemented if necessary. This means that in most cases the budget is closer to the P80 than the P85 [49]. Studies that document the actual probability of cost estimates are rare, but by using the P80 to P85 value as the formal budget, Norwegian authorities have adopted a rather conservative approach to risk compared with using either the median or the mean.
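Steps 2) to 5) and the budget rule can be sketched in a few lines of code. The example below is a minimal illustration, not the tool used by Norwegian agencies: the cost elements, the single generic risk factor, the reduction list, and the choice of triangular distributions for the triple estimates are all hypothetical assumptions.

```python
import math
import random

# Hypothetical triple estimates in NOK million: (low, most likely, high)
ELEMENTS = {
    "groundworks": (200, 300, 550),
    "structures": (350, 450, 800),
    "technical_installations": (100, 150, 300),
    "owner_costs": (50, 80, 160),
}
# Hypothetical generic risk (e.g., market conditions) as a factor on total cost
MARKET_FACTOR = (0.90, 1.00, 1.25)
# Hypothetical pre-approved scope reductions in NOK million
REDUCTION_LIST = 40.0


def percentile(sorted_values: list[float], p: float) -> float:
    """Nearest-rank percentile of an ascending list, with p in (0, 100]."""
    k = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]


def simulate_totals(n: int = 50_000, seed: int = 42) -> list[float]:
    """Monte Carlo simulation of total cost; returns sorted outcomes."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        # random.triangular takes (low, high, mode)
        base = sum(rng.triangular(lo, hi, ml) for lo, ml, hi in ELEMENTS.values())
        lo, ml, hi = MARKET_FACTOR
        totals.append(base * rng.triangular(lo, hi, ml))
    totals.sort()
    return totals


totals = simulate_totals()
p50 = percentile(totals, 50)          # target for the responsible agency
p85 = percentile(totals, 85)
budget = p85 - REDUCTION_LIST         # formal budget under the assumed rule

# Deterministic estimate: the sum of the most likely values
deterministic = sum(ml for _, ml, _ in ELEMENTS.values())
prob_deterministic = sum(t <= deterministic for t in totals) / len(totals)

print(f"P50={p50:.0f}  P85={p85:.0f}  budget={budget:.0f}")
print(f"P(total <= sum of most likely values) = {prob_deterministic:.1%}")
```

Because every triple estimate in this hypothetical setup is skewed to the right, the sum of the most likely values falls in the lower tail of the simulated distribution, which mirrors the argument in Section II that deterministic bottom-up estimates carry a high probability of overrun.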
A cost estimate for a hypothetical Norwegian Government project is illustrated in Fig. 2. In addition to the formal budget, the business case for projects includes a P50 estimate, which acts as a target for the responsible agency. The difference between the P50 and the budget, typically around 10% of the expected cost, is a contingency reserved at a higher organizational level and may only be used after department approval has been given.
The cost estimation process typically produces results in the way shown in Fig. 2. The total budget includes a reserve to cover pure uncertainty (unknown unknowns) and contingency reserves to cover the consequence of known uncertainty and calculated risks, in addition to the basis cost consisting of known cost items.
Cost estimation under uncertainty is carried out using different software and can involve considerable resources. Estimating the costs of large projects usually requires three to four days plus preparations, which may be extensive, and supplementary work. It involves up to 15-20 different people with different backgrounds and a professional facilitator. An open and standardized process that involves many people reduces the potential of one or a few individuals introducing bias into the estimates.
Stochastic estimation has the advantage of describing the uncertainty of a cost estimate and identifying the most important risks so that necessary mitigation steps can be taken. Projects have different risk profiles. A simple uplift for uncertainty may be insufficient to identify the riskiest projects, which may require special attention. As discussed in Section II, different studies of cost overruns provide limited information on which cost estimation method has been used, but our impression is that stochastic or probabilistic estimation is uncommon. Even in a capital-intensive industry such as oil and gas, cost estimation has traditionally been based on deterministic values [50]. Among governments, Australia is a notable exception, as the Department of Infrastructure, Regional Development and Communications [51] has issued a suite of documents on stochastic cost estimation guidance.

IV. DATA AND METHODOLOGY
In this section, we outline the empirical strategy for fulfilling the research purpose of the article. We first describe the data, followed by hypotheses, and the methodology that we use to answer them.

A. Data
The data were collected by the Concept Research Programme, which is tasked by the Ministry of Finance to research projects that have been through external QA. The organizations responsible for the projects are required to submit accurate cost information to the program following Ministry of Finance directives. The projects in the sample were all subjected to the QA scheme, which ensured consistency in planning, estimation methodology, and project maturity at the time of the decision to implement the projects.
We compared the final cost with the budget (including the contingency) and we adjusted the budget and the annual project expenditure to the year of the final cost using sector-specific indexes developed by Statistics Norway.
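The price adjustment can be illustrated with a short calculation. The sketch below assumes the conventional definition of cost deviation, (final cost - budget) / budget, with both figures expressed in the price level of the final year; the index values, budget, and annual expenditures are hypothetical and are not taken from Statistics Norway or from the sample.

```python
def to_final_year_prices(amount: float, year: int,
                         index: dict[int, float], final_year: int) -> float:
    """Convert an amount from the price level of `year` to that of `final_year`."""
    return amount * index[final_year] / index[year]


def cost_deviation(final_cost: float, budget: float) -> float:
    """Relative deviation; positive means overrun, negative means underrun."""
    return (final_cost - budget) / budget


# Hypothetical sector-specific construction cost index (2005 = 100)
INDEX = {2005: 100.0, 2006: 104.0, 2007: 110.0, 2008: 116.0}
FINAL_YEAR = 2008

# Hypothetical project: budget approved in 2005 prices, spending in current prices
budget_2005 = 1200.0  # NOK million, including contingency
annual_spend = {2006: 300.0, 2007: 500.0, 2008: 350.0}

budget_adj = to_final_year_prices(budget_2005, 2005, INDEX, FINAL_YEAR)
final_cost_adj = sum(
    to_final_year_prices(amount, year, INDEX, FINAL_YEAR)
    for year, amount in annual_spend.items()
)
deviation = cost_deviation(final_cost_adj, budget_adj)

print(f"budget={budget_adj:.0f}  final cost={final_cost_adj:.0f}  "
      f"deviation={deviation:.1%}")
```

In this hypothetical case, the project is delivered about 12.9% below the price-adjusted budget. Without the adjustment, the nominal comparison (1150 against 1200) would suggest an underrun of only about 4%, which shows why unadjusted figures are difficult to compare across studies.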
The projects were carried out within the same governance regime and all budgets were based on the same methodology for cost estimation and risk analysis. This ensured a more robust assessment of causation compared with studies that used data from disparate sources around the world.
The sample for this study comprises projects that were subjected to QA2, that have been completed, and for which the project accounts have been finished. Since the QA scheme was implemented 20 years ago, c. 200 projects have been subjected to external QA2, of which c. 130 have finished. We had access to the final costs for 96 of the latter projects. The reason why the sample of final costs was smaller than the total number of finished projects is that it often takes time before project accounts are closed, due, for example, to disputes with the contractor and warranty work. To date, no projects that have been through QA1 have been completed.
As discussed in Section II, the baseline for comparing cost performance may vary between studies. In this study, we assess the quality of the cost estimates prepared for final budget authorization and the responsible organizations' ability to deliver the project efficiently. Cost estimates go through a process of refinement during project development, but there is no universal definition in the academic literature of the degree of detail needed at different levels of project maturity. AACE International provides a classification of estimate classes from concept screening (Class 5) through to estimates for bid/tender (Class 1). According to the AACE classification, Class 3 estimates are prepared for budget authorization with semi-detailed unit costs [52] and act as a baseline for later assessments of estimate accuracy. In the U.K., the Infrastructure and Projects Authority (IPA) expects project maturity to be at c. 60% in the final business case for making a final investment decision [44]. These classes of maturity fit well with the Norwegian cost estimates that are prepared for the final investment decision.
Most of the projects in our sample were approved for implementation between 2003 and 2010 (see Table I). Their average size, measured by their median cost estimate (the P50), was relatively stable: in nominal terms, it was in the range of NOK 1000-1500 million (c. EUR 100-150 million). The combined total value of the projects in the sample was some NOK 125 000 million in nominal terms. At the time, they were large projects by Norwegian standards. Since then, the average size of the projects subjected to QA2 has increased, and several building and construction projects are currently being implemented with an expected cost of between NOK 10 000 million and NOK 25 000 million.
The majority of the projects in the sample were carried out by the Norwegian Public Roads Administration (50), followed by the Norwegian Armed Forces (15), Statsbygg (the government's building commissioner; 15), the Norwegian Railway Authority (9), and various other government agencies (7).
As Norway is a mountainous country, many road projects include both bridge and tunnel construction, in addition to ordinary roadworks, such as dualling or realignment. Therefore, the road project sample cannot be disaggregated further. The rest of the categorization follows the organizational responsibilities, except for five ICT projects carried out by the Railway Authority (1) and the Norwegian Armed Forces (4).

B. Hypotheses
The purpose of the article is to help planners and decision-makers to understand potential strategies to reduce cost overruns. To do so, we assess cost performance through traditional descriptive statistics, and we investigate the impact of different variables. The variables and their associated hypotheses are presented in Table II.
The sample includes projects delivered both by organizations responsible for many large projects annually and by organizations that only deliver one or fewer projects above the QA threshold per year. The Norwegian Public Roads Administration is the largest land-based project organization in Norway, and we hypothesize that its projects should experience fewer overruns than those of other organizations. Furthermore, nonstandardized projects such as ICT projects and defense acquisitions should, in our opinion, be more vulnerable to overruns.
Learning should be an essential part of all organizations, especially capital-intensive organizations where the actions of individuals can have huge financial implications. However, the link between individual learning and group and organizational learning is often weak and complex. In itself, individual learning does not guarantee organizational learning. This means that even if an organization carries out many projects per year, there is no guarantee that good practice in one project will benefit the others. In their study of transport projects, Flyvbjerg et al. [21] claimed that no learning had taken place, as they found that overruns were of the same order of magnitude as 70 years earlier.
Odeck [9] reached a different conclusion, as his sample of studies showed a reduction in overruns over time. We hypothesize that as organizations develop experience, we should expect a gradual improvement in cost performance.
Size could impact overruns in different ways. Small projects can have a higher risk of cost overruns than large projects because the uncertainty in large projects can be partly diversified away. Conversely, large projects can be characterized by more uncertainty because size can be an indication of complexity and because large projects take longer to complete, although there is no common definition of project complexity [53]. Odeck [13], [14] found that smaller projects had larger cost overruns than large projects (he found that on average large projects experienced underruns). Flyvbjerg et al. [54] found that small and large projects were equally prone to overruns, while Cantarelli et al. [55] found that large and very large projects had lower cost overruns. Despite indications that small projects may be more vulnerable to overruns, the transferability of results between countries may be limited, as what is defined as a large project in one country may be a small project in another country.
Our final hypothesis relates to the geographical dimension. We hypothesize that civil engineering projects in urban areas can be more demanding than greenfield developments in rural areas. Our sample comprises 20 road, rail, and building projects in urban locations and 54 in rural areas.

C. Methodology
To measure cost performance (cost overrun or cost underrun), we use the measure most commonly used in the literature, namely the mean percentage cost overrun, which is defined as the actual final costs as a ratio of estimated costs. The reference point is point (2) in Fig. 1 and the percentage cost overrun (PCO) in each project is calculated as follows:
\[ \mathrm{PCO} = \frac{X_a - X_{est}}{X_{est}} \times 100 \]
where PCO is the per cent inaccuracy, $X_a$ is the actual final cost, and $X_{est}$ is the estimated cost or budget.
The PCO measures the overrun in the individual projects. For comparisons on the portfolio level, an averaging measure is required. We use the mean percentage cost overrun (MPCO), which is defined as follows:
\[ \mathrm{MPCO} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{PCO}_i \]
where $n$ is the number of projects and $\mathrm{PCO}_i$ is the PCO of project $i$.
Costs include all the client's costs to implement an asset or procure services and equipment, which in turn includes all contracts with, for example, contractors and suppliers, design engineering, and ground acquisition, depending on the nature of the project. Final costs are defined as real, accounted costs determined at the time of final project completion (i.e., when the final project report and project accounts in the sample were completed). Budgets and annual project expenditures have been converted to the same measurement year, using the appropriate indices for the sectors where the projects were carried out. The purpose was to evaluate the ability of the responsible agencies to deliver projects within budgets approved by the Norwegian Parliament and to gauge the extent to which the same organizations, with the help of external expert reviewers, estimated costs and identified risks so that budgets would be realistic.
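The two measures can be sketched in a few lines. The sketch assumes that the PCO is the percentage deviation of the actual final cost from the estimate, so that negative values denote underruns, which is the sign convention used in the results; the four projects are hypothetical.

```python
def pco(actual, estimate):
    """Percentage cost overrun of one project; negative values are underruns."""
    return (actual - estimate) / estimate * 100.0


def mpco(projects):
    """Mean percentage cost overrun over (actual, estimate) pairs."""
    values = [pco(actual, estimate) for actual, estimate in projects]
    return sum(values) / len(values)


# Hypothetical (final cost, budget) pairs in NOK million, same price level.
projects = [(950.0, 1000.0), (1320.0, 1200.0), (480.0, 500.0), (2100.0, 2000.0)]

share_over = sum(1 for a, e in projects if a > e) / len(projects)
print(f"MPCO: {mpco(projects):.1f}%")
print(f"Share of projects above budget: {share_over:.0%}")
```

Note that the MPCO weights every project equally regardless of size, so one large project with a big overrun counts no more than a small one; a cost-weighted average would answer a different question.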
We measure the final costs against both the formal budget and the P50 estimate. The budget is the formal point of reference for measuring overrun or underrun and represents the amount that Parliament has approved that the responsible organization should set aside to complete the project. The P50 estimate represents a target cost at which the responsible organizations (agencies) aim to complete the project. In a portfolio of projects, such as represented by our sample, about half of the projects should be completed below the P50 and about half above it. The P50 is also useful for measuring the symmetry of the distribution of final costs. Theoretically, the final costs should be normally distributed around the P50. As most international studies have shown that final costs are heavily skewed to the right, we investigate whether Norwegian projects experience similar fat upper tails.

V. RESULTS
In this section, we present the results relating to the hypotheses outlined in Section IV. Table III shows the descriptive statistics for the final costs compared to the formal budget. On average, the final costs are 4.4% below the budget. In other words, large Norwegian projects experience an average cost underrun, which contrasts with the findings of most other studies on this topic. The result is significantly different from zero at the 95% level [t(95) = -2.4, p = 0.02]. The dispersion is relatively large, with a standard deviation of 17.8% and a range from 43% under budget to 84% over budget.
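The reported test statistic can be checked from the summary statistics alone. The sketch below assumes a standard one-sample t-test of the mean deviation against zero and uses only the mean, standard deviation, and sample size given in the text.

```python
from math import sqrt

mean_pco = -4.4  # mean deviation from budget (%), as reported in the text
sd_pco = 17.8    # standard deviation (%), as reported in the text
n = 96           # number of projects in the sample

standard_error = sd_pco / sqrt(n)
t_stat = mean_pco / standard_error
df = n - 1

print(f"t({df}) = {t_stat:.1f}")  # prints "t(95) = -2.4"
```

The standard error of about 1.8 percentage points shows why a mean underrun of 4.4% is distinguishable from zero even though individual projects vary widely.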

A. Cost Performance
Only two projects have cost overruns above 30%. The confidence interval for the mean was -7.9% to -0.2%, which implies that (based on this sample) the mean cost performance of the portfolio is an underrun at the 95% confidence level. This is encouraging for risk-averse decision-makers. The added contingency has been sufficient for projects to keep within budget in most cases. The mode, or the most common value, is -1.8%. A typical Norwegian Government investment project experiences a small cost underrun. When we consider only the projects with cost overruns, the final costs are on average 16% over budget. About 25% of these projects have overruns above 20%. Although the maximum overrun is large, this indicates that even among the "worst" performing projects, the results are within a range that should be acceptable to government agencies that can diversify among many projects annually.
The dispersion of the results is shown in Fig. 3. About half the projects have final costs within +/-10% of the budget. One project has a large overrun (84%). The outlier is the Norwegian Defence Logistics Project, a complicated ICT project that experienced both a large cost and time overrun. The red line illustrates the distribution of observations produced by the best-fit function of Palisade @RISK. The distribution that best describes the data is the Laplace distribution, which is similar to the normal distribution, but is more pointed in the middle and has tails that are not as thin as those of the normal distribution.
A first glance at Fig. 3 suggests a left skew in the results, as should be expected when the budget is set at the P80-P85 level. However, as shown in Table III, the number of projects above budget is higher than desirable. If estimates had been perfectly calibrated, we should not expect more than a maximum of 20% of the projects to overrun. This is not fully accomplished, as 27% of the projects have experienced cost overruns. Despite the potential for improvement, the result presented in Fig. 3 is different and better than most of the results reported in the literature. However, the authors of most international studies have not provided information on which methodology their estimates have been based on or their probability levels, so any comparison of the results would be circumstantial.
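One way to gauge how far the observed share of overruns lies from a calibrated budget is a simple binomial calculation. The sketch below is an illustration rather than an analysis performed in the study: it assumes that projects overrun independently of one another, and the count of 26 overruns is inferred here from the reported 27% of 96 projects.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_projects = 96
n_overruns = 26  # assumption: 27% of 96, rounded to a whole number of projects

# Budgets at the P85 and P80 levels imply overrun probabilities of 15% and 20%.
for target in (0.15, 0.20):
    tail = prob_at_least(n_overruns, n_projects, target)
    print(f"Overrun probability {target:.0%}: P(at least {n_overruns} overruns) = {tail:.3f}")
```

A small tail probability would suggest that the observed share is unlikely under perfect calibration, although the independence assumption is strong for projects that share agencies, markets, and time periods.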
For evaluation of cost estimates, it is more common to use the P50. If the estimates were perfectly calibrated, the distribution of costs would be normally distributed around the median. Fig. 4 shows the distribution of costs compared with the P50.
Almost 60% of the projects have final costs above the P50. The mean deviation is +4.8% and the median is 1.5%. The mean is significantly different from zero at the 95% level [t(95) = -2.2, p = 0.03]. This further illustrates the skew to the right, as also illustrated in Fig. 3. Again, as information on estimation methodologies in other studies is limited, it is difficult to compare these Norwegian results with results from studies of cost overruns elsewhere, but the deviation from the P50 and the approved budget in our sample appears to be smaller than reported in most other academic studies. This indicates that probability-based estimation paired with external quality assurance may be a useful tool for curbing cost overruns. However, there are more issues to be considered. To improve practices and results, we must look beyond the descriptive statistics.
Table IV shows the results for different project categories. The Norwegian Public Roads Administration has a low mean underrun in its projects, but the dispersion of its results is high and the share of projects above budget is the highest among the large project-based government agencies in the sample.

B. Difference Between Project Categories
Overruns are more common in ICT projects, but not in defense acquisitions, which have the lowest share of overruns among the projects in the sample. The Norwegian Defence Material Agency is highly specialized and possesses in-house expertise in project management, but the cost of equipment for the Norwegian Armed Forces is difficult to estimate because of its nonrepetitive nature and limited market transparency. The low share above budget may indicate that contingencies have been too high and that excess resources have been tied up in these projects. The two remaining categories, buildings and railways, have moderate mean underruns and a mid-range frequency of cost overruns. Fig. 5 shows a boxplot of the differences between the project categories. The boxplot illustrates that the roads category includes the projects with the best cost performance and some of the worst. This is perhaps counterintuitive, given their relatively simple technology, long tradition, and comprehensive experience. However, road projects normally cover long distances over land with different geological properties. Furthermore, due to the mountainous terrain of the country, Norwegian road projects often include tunnels and bridges, which may be more prone to complications than projects in other areas. Unforeseen and differing ground conditions, including underground utilities, represent a major risk in road projects. This may explain the larger variation in final costs compared with civil engineering projects that are fixed in location, such as buildings. Defense acquisitions have a similarly large spread, but this is more intuitive. They are highly specialized, technology-driven, complex, and hard to oversee, especially from the outside. ICT projects also have a large spread, as might be expected, and an outlier that lies well beyond the range, which is similar to the impression from the literature.
The results show that there are only small differences in cost performance between government agencies. The differences between groups are not statistically significant (ANOVA [F(5, 90) = 0.97, p = 0.44]), nor is the difference between ICT projects and the other projects in the sample [t(94) = 1.76, p = 0.08]. Therefore, our hypotheses that organizations responsible for many projects experience better cost performance, and that defense acquisitions and ICT projects are more vulnerable to overruns, are not supported.
Table V shows the difference in cost performance between projects with investment decisions in the periods 2001-2005, 2006-2009, and 2010-2014. The results do not show any sign that there has been a positive development over time. While in the first years after the introduction of the QA scheme the projects performed in accordance with estimates (less than 15% of projects experienced overruns), the proportion of overruns in the later periods has been well above the maximum target. The difference over time is not a result of coincidence. A one-way ANOVA revealed that the difference in mean cost performance between periods is statistically significant [F(2, 93) = 3.535, p = 0.03].
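For readers unfamiliar with the test, a one-way ANOVA F statistic can be computed as sketched below. The PCO values per period are hypothetical and much fewer than in the study; only the structure of the calculation is meant to carry over. With 96 projects in three periods, the degrees of freedom would be (2, 93), as reported above.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical PCO values (%) by period of investment decision; not study data.
periods = {
    "2001-2005": [-12.0, -8.0, -5.0, -10.0, 2.0],
    "2006-2009": [-4.0, 3.0, -6.0, 8.0, -1.0],
    "2010-2014": [5.0, -2.0, 12.0, 1.0, 9.0],
}

f_stat, df_b, df_w = one_way_anova(list(periods.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Turning the F statistic into a p-value requires the F distribution, which is not in the Python standard library; a statistics package would normally be used for that step.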

C. Impact of Time and Learning
These results call for some critical reflection. If projects were identical over time, we might expect results to be too. There is no reason to think this is the case in real life. We know that the size of projects has increased, as indicated in Table I. Increasing size may not be a problem in itself, but often it indicates increasing complexity. Over the last 20 years, issues such as Industry 4.0 and the Internet of Things have entered the picture, transport is becoming electrified, ICT is now integrated into all types of projects, and all systems are integrated. Klakegg et al. [56] investigated governance frameworks in Norway, the U.K., and The Netherlands and found that projects had become more challenging, but also that efforts to control projects had improved. They warn that the effect of new procedures and improved systems wears off quickly. The governance frameworks, management systems, and other efforts to control projects need continuous improvement to avoid losing their effect.
Table VI shows the difference in cost performance between projects with an expected cost below and above NOK 1000 million (c. EUR 100 million) at the time of budget authorization.
TABLE V. DEVELOPMENT IN COST PERFORMANCE OVER TIME
TABLE VI. THE IMPACT OF PROJECT SIZE ON COST OVERRUN
TABLE VII. IMPACT OF LOCATION ON COST OVERRUN

D. Impact of Size
The results presented in Table VI indicate that the cost performance for the two groups of projects is similar. An independent samples t-test confirmed that the difference in mean cost performance is not statistically significant and may be a result of coincidence [t(94) = 0.24, p = 0.81]. The share of projects with overruns is lower for the smaller projects, but that may be because there were more small projects in the years after the millennium and projects generally performed better then. As such, neither of the hypotheses concerning project size presented in Table II is supported by the results.
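The comparison of two groups can be sketched as a pooled-variance independent samples t-test. That the study used the pooled (equal variance) form is an assumption made here, based on the reported degrees of freedom (96 projects give 96 - 2 = 94); the PCO values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t, df) for an equal-variance independent samples t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical PCO values (%) for smaller and larger projects; not study data.
small_projects = [-8.0, -3.0, 4.0, -12.0, 1.0, -6.0]
large_projects = [-5.0, 6.0, -9.0, 2.0, -4.0, -7.0]

t_stat, df = pooled_t(small_projects, large_projects)
print(f"t({df}) = {t_stat:.2f}")
```

If the two groups had clearly different variances, Welch's unequal-variance version of the test would be the more cautious choice.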

E. Impact of Location
The difference in cost performance between projects in urban areas and other projects in the sample is shown in Table VII.
The results show that whereas projects that are not located in urban areas have a mean percentage cost overrun of -6.9%, urban projects have lower average underruns and a higher proportion of overruns. Table VII indicates that construction projects in urban areas perform worse than projects in rural areas. However, there is an overlap between the results, and the mean results could be due to coincidence. An independent samples t-test confirmed this. The difference between the cost performance of urban and rural projects is not significant at the 95%-level [t(71) = 1.70, p = 0.09].

VI. CONCLUDING REMARKS
In this article, we documented that good cost performance at the portfolio level (i.e., completing the majority of projects within budget and with mean underrun) is possible if we follow good practice for stochastic cost estimation and risk analysis, and combine the efforts of the responsible organizations with external quality assurance. We used empirical data from 96 large government projects that have been planned and implemented within the same governance framework. This is relevant to other countries because Norway has used stochastic estimation methods for over 20 years and for most of that time, cost estimates have been scrutinized by external experts before Parliament has been allowed to make investment decisions. Funding relies on projects being considered mature enough to proceed, after having been through the necessary and mandatory gateways. Most of the individuals involved in the preparation and scrutiny of estimates have nothing to gain personally from providing unrealistically low estimates, and the number of people involved effectively prevents individuals or stakeholders from deliberately biasing the estimates. The governance regime, within which the projects in the sample used in this study were planned and executed, ensures quality-at-entry, as it combines the inside view of experienced project experts with the outside view of external consultants that can draw on experiences from many other projects across sectors.
A mean underrun of 4% is considerably better than what has been reported in most of the academic literature on this subject. Even if we consider the possibility of hedging (i.e., that contingencies have been too generous), the results are good compared with those of other studies. The median and mean deviations from the estimated P50, which is the estimated cost without contingency, are modest (1.5% and 4.8%, respectively). This confirms the findings of Jørgensen et al. [57], who used similar data and found that estimates had been well-calibrated.
Most other studies of cost performance have found that most projects overrun their budgets. The results presented in this article are more positive and in contrast to, for example, those of Flyvbjerg et al. [10] and Odeck [9]. However, and as we referred to in Section I, there are other previous examples of good practice. This article adds to that literature and demonstrates that overruns can be avoided through recognized good practices for project planning and cost estimation. We argued that the institutional context through which projects are delivered can be crucial to their success. The results indicate that a common governance framework has a strong levelling effect across sectors. It helps the development of new knowledge and the spread of experience from one project to another, and it institutionalizes good practices. The latter include who is involved and how many people are involved, their incentives, how the quality of plans and estimates is scrutinized before funding is approved, and perhaps most importantly, how costs and uncertainty are estimated. Thus, we accept both the traditionalist view of cost overruns due to overruns during the delivery stage and the planning fallacy account that leads to underestimation in projects' front-end. The Norwegian approach combines an inside view through best practice methodologies with an outside view that recognizes that human error is an unavoidable part of project plans but can be rooted out through external quality assurance. Norway is, however, ranked among the least corrupt nations in the world and has strong institutions [58], which may reduce the risk of deliberate misrepresentation. Even if we would argue that a system that requires the decisions made by one party to be subjected to external scrutiny by another has merit across countries, there is no guarantee that the results presented in this article could be replicated in another country with other challenges and institutional structures.
In the article, we acknowledged that comparing studies between sectors and countries may be challenging, as information on project maturity and cost estimation methodology is lacking in most studies. We encourage authors of future studies to include this highly relevant information and to compare their data with the maturity classes suggested by, for example, the AACE in the U.S. or the IPA in the U.K.
Despite the relatively positive results presented in this article, we have also identified room for improvement, described as follows.
1) The share of projects above both the total budget and the P50 is higher than desirable. The cost estimates have not properly recognized the asymmetrical nature of the true costs and perhaps unrealistically assumed a symmetric distribution.
2) The outcome space is typically wider than assumed at the time of the investment decision, especially for road projects, ICT projects, and construction projects in urban areas. The prediction intervals have been too narrow to reflect the true uncertainty.
3) There has been an increasing number of overruns over time. Learning from past projects has been inadequate. Estimation and QA may have become repetitive exercises and not adapted to the increasing complexity of projects.
The approach to cost estimation and project governance presented in this article is not new. Stochastic cost estimation, quantitative risk analysis, and external quality assurance have been used in various industries and advocated by professional associations for decades. However, based on the reported results regarding cost overruns in the academic literature, it may seem that the progress in implementing effective methodologies has been slow. For countries and organizations struggling with persistent overruns in investment projects, the Norwegian experiences may provide a valuable source of inspiration.