Construction Time Estimation Function for Canadian Utility Scale Power Plants

Abstract: Construction time and time overruns for infrastructure projects have been frequently studied; however, the construction time of power plants has not been studied. This lack of study is problematic, as more renewable energy power plants, such as wind and solar, are planned for many jurisdictions. Accurately estimating the construction time of a power plant will assist construction planning, budget estimates, and policy development encouraging the use of more renewable sources. The construction times of utility scale power plants in Canada were studied using publicly available data. Multiple linear regression analysis techniques were applied to the data to generate construction time estimation functions for all power plants together, and for individual technologies. The analyses reveal that construction time is sensitive to jurisdiction and the decade of construction, indicating that decisions made by individual Canadian provincial governments at different times had statistically significant impacts on construction time. The analyses also indicated that construction time is a strong function of installed capacity, independent of technology. This finding suggests that large solar or wind energy facilities will encounter longer construction times similar to large hydroelectric facilities.


Introduction
The focus of this paper is the development of an estimation function for the construction time of utility scale power plants, and the associated statistical analysis of the results from the estimation function. Construction cost of power plants has been often studied in the literature; however, a statistical analysis of the construction time for power plants has not been executed previously. The development of this function is novel and aims to increase technical accuracy and to inform policy makers. The technical motivation is to provide a function for use in the preliminary project planning phase to accurately estimate the required construction time, logistics, and budget. The policy motivation is to inform policy makers about the time required to implement renewable energy sources and integrate the sources into the grid because a concern of many governments is energy security. This paper determines the construction time for power plants using stepwise, multiple linear regression techniques with interaction terms. The paper is limited in geographical scope to Canadian power plants because of the limited number of provinces. Data was collected for the provinces of Alberta, British Columbia, Manitoba, New Brunswick, Nova Scotia, Ontario, Prince Edward Island, Quebec, and Saskatchewan.
Prior to a discussion of the investigation, the statistical tools and approach are explained and justified. The investigation begins with existing construction time data for hydroelectric, nuclear, and wind power plants across Canada. These data are explored for sensitivities to year of construction, technology, location, and size. A regression is
The majority of research on construction times has focused on waterworks and sewage [11], buildings [12], transportation infrastructure [13], and petrochemical projects [14]. Six categories of delay were identified: contractor performance, owner administration, planning and design, government regulations, environmental assessments, and supervision. Of these, contractor performance is considered the major cause of delay. Upward of 70% of projects experience delays, with an average delay of 39% of the intended construction duration [11]. Contractor performance was echoed as a cause of delay by Alhajri et al., who cited specific examples of poor site management and conflict with the contractor from interviews conducted with project managers, engineers, and supervisors [14].
Al-Momani considered construction delays and their causes for 130 public projects in Jordan, including residential buildings, medical facilities, and schools. The sources of delays varied based on the group surveyed. Engineers identified cash flow from the project owners as the main source of delay, while owners identified design errors by the engineers and labor availability of the contractors as the main sources of delay. The survey groups also identified that the early stages of a project were the origins of most delays, which coincides with the findings by Wright. The analysis used Excel to develop a simple linear regression for the construction delays [12].
A more recent study of power transmission projects attempted to determine the origins of the delays that afflict that sector. A survey of 311 stakeholders identified 63 delay factors that can be organized into ten groups: sector-specific, general, administrative, employer, contractor, consultant, materials, equipment, labor, and unavoidable. The sector-specific and general factors are novel from this study as they identified factors that are particular to power transmission projects such as access roads, right-of-way, poor site management, and poor coordination between different types of work [15]. The study did not include any regression techniques and was solely empirical; however, the identification of factors impacting delays is useful for power plant related analyses.
Love and colleagues examined cost overrun probability distributions for 276 Australian construction projects to determine whether the distribution is normal [13]. Normality was assessed because applying an incorrect distribution would affect the predicted results. This assessment is important since Flyvbjerg assumes Gaussian distributions in their technique. The authors used Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared tests to compare distributions and found that the distributions are non-Gaussian and are best described by a three-parameter Fréchet function for cost overruns. This result suggests that Flyvbjerg's normality assumption for cost overruns is incorrect.
Further, there is debate whether the project type and geographical location affect the cost overrun prediction. Flyvbjerg predicts independence of location but dependence on project type [16]. Dependence upon project type was also identified by Bhargava and colleagues for highway projects, where project type influenced both cost and construction time [17]. Meanwhile, Odeck predicts independence of project type [18]. Last, Flyvbjerg's approach assumes that projects of the same class have the same optimism bias [19]; however, earlier work by Love and Odeck shows that the bias is not uniform for a given class [18,20,21]. The present work assesses the dependence on location and project type, and applies non-Gaussian distributions, thus contributing to the literature. The novelty is the consideration of construction time instead of cost.
The technical approach considered here is the use of multiple linear regression (MLR) techniques, as described in Section 4. Regression analyses have an established presence in predicting cost and time overruns of infrastructure projects. Samarghandi et al. used regression techniques to quantify construction delay factors in Iran for residential projects and educational facilities [22]. Educational facilities have been a particular interest for assessing cost and time overruns, with a more advanced regression analysis executed by Asiedu et al. [23]. Samarghandi et al. used linear regression techniques where a single dependent variable was correlated with a single independent variable. The Asiedu et al. team used multiple linear regression (MLR), where they identified 10 predictive variables that could influence cost overruns. The MLR analysis determined that five of the variables influenced the overruns [23]. A study of 911 building projects in Ghana also applied MLR analysis in the statistical software R and considered seven predictive variables [24]. A total of three variables were determined to influence the completion cost of the projects.
Other construction projects, such as buildings and drainage, have been analyzed using regression techniques. Senouci et al. used linear regression models to predict cost overruns in Qatari construction projects. These models were developed in Excel and determined the correlation between contract price and cost overrun [25]. Croatian water infrastructure projects have also been modeled using MLR techniques. Ninety-three projects were considered with data collected via survey of project managers. These surveys identified 108 variables that were evaluated using MLR, and a set of 5 variables were determined to have the majority of the impact on the success of a water project [26].
Electrical infrastructure projects have been studied less often; however, an extensive 2017 analysis of cost overruns covered 401 projects across 57 countries. These projects were constructed between 1936 and 2014, and the team applied a linear regression analysis to the construction costs. They determined that large projects such as hydroelectric and nuclear facilities have a high correlation with cost overruns, while decentralized facilities such as wind and solar have a negative correlation [27]. This study did not include a time trend, as the data were clustered in certain decades. For example, most of the nuclear power plants used in this study were constructed during the 1970s. A subsequent study examined hydroelectric power plants and their time overruns by considering 57 hydroelectric dams installed between 1975 and 2015 [28]. This study only included hydroelectric facilities financed by the World Bank Group and assessed the uncertainties in the costs and benefits of the technology. The data analyzed indicated that 80% of the 57 facilities considered experienced a time overrun, which highlights the need to accurately predict the construction time of hydroelectric facilities [28].

Materials
A list of power plants across Canada was assembled, and the construction start and end dates compiled by searches through utility documents, press releases, and newspaper reports. The construction dates are subject to variability because the definitions of a "start" and "end" are non-uniform. Some "start" dates referred to when a license or regulatory approval was granted for a project, and other "start" dates were when the design process began, or when construction began. Similarly, the "end" date may be when the power plant is grid connected, or when the balance of equipment is installed. The variability of the dates affects the final results and is discussed in Section 6.
The data collected are a combination of continuous variables and categorical variables, as shown in Table A1 of Appendix A. The continuous variables are the power plants' rated capacities and construction times. The categorical variables are power plant type, province of construction, and decade of construction. These are categorical variables because they can only take discrete values. The categorical variables are necessary to include because each province in Canada may have different regulations and procedures for the approval of a power plant. Further, nuclear power plants are subject to federal approval; therefore, power plant type becomes a relevant category. The decade of construction is an additional categorical variable, and is intended to capture larger effects such as the economic or global conditions that may impact the construction of a large project.
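The treatment of categorical variables described above can be illustrated with a short Python sketch of 0/1 indicator coding. This is a minimal illustration under stated assumptions, not the coding used for Table A1: the function names, the dictionary layout, the choice of reference levels (nuclear and Alberta), and the use of the start year to assign a decade are all hypothetical, and the example plant is invented.

```python
# Reference level is listed first; it receives no indicator column.
PLANT_TYPES = ["nuclear", "hydro", "wind"]
PROVINCES = ["AB", "BC", "MB", "NB", "NS", "ON", "PE", "QC", "SK"]

def indicator_columns(value, levels, prefix):
    """Return one 0/1 column per non-reference level of a categorical variable."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"{prefix}_{level}": int(value == level) for level in levels[1:]}

def encode(plant):
    """Turn one plant record into a row of continuous and indicator variables."""
    row = {"MW": plant["mw"], "Time": plant["months"]}
    row.update(indicator_columns(plant["type"], PLANT_TYPES, "type"))
    row.update(indicator_columns(plant["province"], PROVINCES, "prov"))
    # Assumption: the decade is taken from the construction start year.
    decade = plant["start_year"] // 10 * 10
    row[f"D{decade}"] = 1
    return row

# Invented record, for illustration only.
example = {"type": "hydro", "province": "QC", "start_year": 1971,
           "mw": 1000.0, "months": 120.0}
row = encode(example)
print(row)
```

Dropping one reference level per category avoids perfect collinearity with the intercept, which is why only two plant-type indicators are needed for three technologies.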

Model
The theoretical framework is composed of early diagnostics, regression analysis, and residual analysis, and was completed using R, a free programming language and software environment for statistical analysis and graphics. Two early diagnostics were performed prior to the regression analysis. First, the dependent variable (Time) was plotted against each independent variable to observe any non-linearities. Second, a correlation matrix was computed to check for any indication of multicollinearity among the independent variables.
In regression analysis, a function must be established that relates the dependent variable to the independent variables. The function may use the dependent and independent variables directly, known as a level-level form, or use the logarithm of one or both variables. Additionally, the function may include interaction terms, which capture relationships between independent variables that may influence the dependent variable. Further, the function may be linear or a polynomial.
For the regression analysis considered here, several functional forms were considered. The objective was to determine the functional form that best captured the relationship between the construction time and the independent variables. Each of the following functional forms was applied to the data, and a stepwise regression was applied within each form. The stepwise regression added parameters to the model such as technology type, decade of construction, or midpoint of construction. At each step and form, the residuals of the model were calculated and recorded. The form that had the lowest residual was selected as the best model. Below are the forms considered:
• The level-level form without interaction terms is the simplest model and uses the dependent and independent variables directly. For example, construction time as a function of power plant installed capacity.
• The log-level form without interaction terms is similar to the level-level form except the logarithm of the dependent variable is used. In this case, that would be the logarithm of construction time as a function of installed capacity. A logarithm may be necessary, as some technologies have long construction periods.
• The level-log form without interaction terms is a related form. This model would be the construction time as a function of the logarithm of installed capacity. The power plants considered here range in capacity from a few MW to the GW scale; therefore, the logarithm of installed capacity may provide a better fit.
• The log-log form without interaction terms uses the logarithm of both the dependent and independent variables.
Interaction terms describe any relationships between the independent variables that may have an effect on the construction time. Models with interaction terms are more sophisticated than the basic models and contribute to the stepwise regression approach used here. The simplest models were attempted first, then interaction terms were added to determine if a more accurate model in terms of predicting construction time was possible. As an example in the context of power plant construction, the jurisdiction where a power plant is constructed can interact with power plant size. A jurisdiction with large hydroelectric resources may become more efficient at construction of large installed capacities; therefore, the construction time would be shorter compared to other jurisdictions. The model would require a term to account for the interaction of jurisdiction with power plant size.
The level-level, log-level, level-log, and log-log models all attempt to construct a linear relationship between the independent variables and the dependent variables. A line of best fit is the result of these models. With some data, a linear relationship is not feasible and a nonlinear relationship is required. The polynomials describe nonlinear relationships such as quadratic, cubic, or quartic relationships between the independent and dependent variables. These models are more sophisticated than the linear models and may also use logarithms of the variables.
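The four basic functional forms can be compared on a toy example. The Python sketch below is illustrative only: the data are synthetic (generated from an exact power law, so the log-log form fits perfectly by construction), the helper name fit_line is hypothetical, and a single predictor is used where the models in this paper use several. It is not the R workflow used for the analysis.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot

# Synthetic data: capacity quadruples while construction time grows by 1.5x.
capacity_mw = [5.0 * 4 ** k for k in range(6)]
time_months = [12.0 * 1.5 ** k for k in range(6)]

log_mw = [math.log(v) for v in capacity_mw]
log_time = [math.log(v) for v in time_months]

# Each form is (independent variable, dependent variable).
forms = {
    "level-level": (capacity_mw, time_months),
    "log-level": (capacity_mw, log_time),
    "level-log": (log_mw, time_months),
    "log-log": (log_mw, log_time),
}
results = {name: fit_line(x, y) for name, (x, y) in forms.items()}
for name, (b0, b1, r2) in results.items():
    print(f"{name:12s} intercept={b0:.3f} slope={b1:.5f} R^2={r2:.4f}")
```

One caution when reading such a comparison: R² values computed on different response scales (time versus log time) are only a rough guide, since they measure fit to different quantities.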
Each model is estimated by the ordinary least squares (OLS) method [29]. Initial model development and selection of the functional form was performed using a stepwise regression procedure. For each functional form, stepwise regression selects a group of independent variables by employing forward selection, backward elimination, or bidirectional elimination methods. Stepwise regression adds or removes independent variables in a regression model by optimizing an objective function or criterion such as the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), or the R² value. The set of models suggested by the stepwise regression was then reduced by applying several selection criteria: standard error of regression, overall significance of a model, individual significance of independent variables, and adjusted R² value. The number of models was further reduced by a multicollinearity test and a residual analysis. The ideal regression model will determine the function that relates the dependent variable to a single independent variable. Actual data may involve the dependent variable being related to several independent variables, or the independent variables may be related to each other, which is multicollinearity. The existence of these relationships must be checked and the extent of the relationships evaluated.
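The stepwise procedure described above can be sketched as forward selection under AIC, which is only one of the variants named. The Python code below is a minimal sketch under stated assumptions: the data are synthetic, the function names (ols_rss, aic, forward_stepwise) are hypothetical, the AIC is the Gaussian form up to an additive constant, and the normal equations are solved directly, which is adequate only for small, well-conditioned problems. It does not reproduce the R procedure used for the analysis.

```python
import math

def ols_rss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual sum of squares."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

def aic(rss, n, n_params):
    """Gaussian-likelihood AIC up to an additive constant."""
    return n * math.log(rss / n) + 2 * n_params

def forward_stepwise(candidates, y):
    """Greedily add the candidate that lowers AIC most; stop when none lowers it."""
    n = len(y)
    selected = []
    best = aic(ols_rss([], y), n, 1)
    while True:
        trials = []
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected + [name]]
            trials.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not trials or min(trials)[0] >= best:
            return selected, best
        best, chosen = min(trials)
        selected.append(chosen)

# Synthetic candidates: log capacity, a hydro indicator, and an unrelated column.
log_mw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 2.5, 6.5]
hydro = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0]
unrelated = [0.3, -0.1, 0.2, -0.4, 0.1, 0.0, -0.2, 0.4, -0.3, 0.0]
noise = [0.02, -0.01, 0.03, -0.02, 0.01, -0.03, 0.02, -0.01, 0.0, 0.01]
log_time = [2.0 + 0.3 * l - 0.5 * h + e for l, h, e in zip(log_mw, hydro, noise)]

candidates = {"log_mw": log_mw, "hydro": hydro, "unrelated": unrelated}
selected, final_aic = forward_stepwise(candidates, log_time)
print(selected, round(final_aic, 2))
```

Backward elimination and bidirectional selection follow the same pattern, with the trial step removing variables or doing both.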
Residual analysis was implemented to see if the surviving models satisfied the six multiple linear regression (MLR) Gauss-Markov assumptions. For this purpose, first residuals were plotted against fitted values to detect if there was a serious violation of the linear parameters, zero conditional mean, and homoskedasticity assumptions. Similarly, a histogram and a normal probability plot of residuals were created to check the normality assumption. Finally, outliers and influential observations were determined using outlier tests, standardized residuals, and Cook's Distance. After the residual analysis, model selection was completed by assessing the remaining models in terms of their predictive power.
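Of the diagnostics listed above, Cook's Distance is straightforward to sketch. The Python code below computes it for a single-predictor regression using the closed-form leverage; the data are synthetic, the function name is hypothetical, and the models in this paper use several predictors, for which the leverage comes from the hat matrix instead. The threshold of 1 used in the comment is a common rule of thumb, not a criterion stated in this paper.

```python
def cooks_distances(x, y):
    """Cook's Distance for each observation in the regression y = b0 + b1*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    leverage = [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [(e * e / (p * s2)) * h / (1.0 - h) ** 2
            for e, h in zip(residuals, leverage)]

# Synthetic data: seven points on a line plus one distant, off-trend point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 10.0]

distances = cooks_distances(x, y)
most_influential = max(range(len(distances)), key=distances.__getitem__)
# Rule of thumb (an assumption here): values above 1 merit a closer look.
print(most_influential, distances[most_influential] > 1.0)
```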

Results
The statistical techniques described in Section 4 were applied to several data samples of Canadian power plants constructed over the past 100 years. The total sample size was 87 power plants with 25 wind, 23 nuclear, and 39 hydroelectric power plants. The total sample size was first assessed to distinguish if any power plant technologies appeared to be outliers, as shown in Section 5.1. The subsequent subsections consider models for hydroelectric power plants, and nuclear power plants. In these models, time is returned in months while the power plant size is provided in MW.
The wind power plants were modeled in isolation, but none of the models were significant compared to the arithmetic average construction time because of the very small variation in construction times of wind power plants. The available data for the wind plants were to the nearest year of construction unlike the other power plants where the data were available to the nearest month of construction. Construction time should be a continuous variable, which is possible for monthly data; however, yearly data creates a discrete variable. In the specific case of the wind power plant data, the recorded construction times were either 12 months or 24 months; therefore, a function was not generated from this data.

Modeling All Power Plants Together
When considering all power plants together, the stepwise regression procedure determined that a log-log model with interaction terms passed all model selection criteria and gave the lowest residual error. All functional forms were considered, and a subset of their resulting R² values are shown in Table 1. Only a subset of the stepwise regression is shown in Table 1 because the other forms and combinations of indicators gave results that were not statistically significant. The three models reported in the table are all statistically valid; however, the third model indicates the best fit via the R² value and is reported in the following.
The selection process identified that the key plant-type indicators are for hydro and wind, and that the construction time is dependent upon the size of the power plant. The process also identified that the location of the power plant does not affect the construction time. Lastly, the process identified another key indicator, mPoint, that represents the year at the mid-point of the construction time of a given power plant [30]. In the estimated model, Equation (1), β0 through β5 are the regression coefficients, while ε is the residual error. The regression process determined the values and significance of the coefficients as shown in Table 2. Substituting the parameters from Table 2 into Equation (1) gives Equation (2), where the hat symbol denotes the estimated nature of the equation. The summary statistics of the model are provided in Table 3, where N indicates the number of data points used in the model and s is the standard error, which is the average distance of the data points from the regression line. In terms of the units of the model, a standard error of 0.3523 corresponds to a difference of 1.4 months. The summary statistics indicate that the model is statistically significant.
The use of technology indicators in the model shown in Section 5.1 reveals that a single estimation model for the construction time is not feasible. The inclusion of technology indicators in Equation (2) means that the type of technology will alter the predicted construction time. For example, if a hydroelectric power plant is considered, the construction time will be reduced given the coefficient of −0.323 on the hydro indicator. If construction time were insensitive to technology, then Equation (2) should not include "hydro" or "wind". The subsequent step is to consider regression models for individual power plant technologies.
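The log-scale quantities quoted above for the combined model can be checked with a short calculation. The Python sketch below assumes that the logarithms in the model are natural logarithms (the base is not stated here) and reads a log-scale quantity as a multiplicative factor on construction time; the variable names are illustrative.

```python
import math

se_all_plants = 0.3523      # standard error reported for the combined model
hydro_coefficient = -0.323  # coefficient reported on the hydro indicator

# Exponentiating a log-scale quantity gives a ratio on the original scale.
se_factor = math.exp(se_all_plants)
hydro_factor = math.exp(hydro_coefficient)

print(round(se_factor, 2))     # 1.42
print(round(hydro_factor, 3))  # 0.724
```

Under these assumptions, exponentiating 0.3523 gives about 1.42, which agrees with the quoted 1.4; strictly, a log-scale error back-transforms to a ratio of times rather than a fixed number of months. The hydro factor of about 0.72 corresponds to a roughly 28% shorter predicted time with all other terms held fixed, and it ignores any interaction term that also involves the hydro indicator.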

Modeling Hydro Power Plants
The form of this estimation function for hydroelectric power is different from the form used for all power plant technologies combined, indicating a different relationship between construction time and size of the power plant. Applying the functional forms shown in Section 4, a level-log model with interaction terms was found to provide the best fit according to the regression procedure. The results from other possible, statistically valid, model forms are shown in Table 4. The hydro power model introduced interaction terms for province and decade of construction. The inclusion of additional decade indicators was necessary because large hydro plants may require 10 or more years to complete construction. The longer duration means that the construction process is sensitive to economic conditions at the time. A logarithm for the power plant size was also introduced because the data for hydro power plants included a large range of sizes; thus, the logarithm may linearize the relationship between size and construction time. In the resulting model, Equation (3), D1910 and D2000 indicate the decade of construction, and QC indicates the province of construction. All are categorical data that take discrete values of 0 or 1. The model, with coefficients, is shown in Equation (4), where the coefficients originate from the regression results shown in Table 5. The summary statistics of the model, shown in Table 6, indicate a higher standard error for the estimated construction time compared to the standard error achieved with the model for all power plants. The standard error is equivalent to 1.7 months compared to the earlier error of 1.4 months.

Modeling Nuclear Power Plants
Assessment of the construction time for nuclear power plants generated a log-level model with interaction parameters. The logarithm is applied to the construction time, which is significant because the nuclear power plants took disproportionately long to construct for their nameplate capacity. Compared to the previous models, the construction time of nuclear power plants is more influenced by geographic location, and by the decade of construction if a nuclear accident occurred. Similar to the hydro power plants, nuclear power plants may take 10 or more years to construct; thus, the economic and socio-political environment will influence the continued construction. Unlike nuclear power plants, hydro power plants have not encountered suspended construction because of accidents; thus, the hydro power plant model did not require a logarithmic function for time.
The nuclear power plant model is a function of mPoint. The parameter mPoint is the year corresponding to the midpoint of the construction period for a given power plant. This parameter is a continuous variable and is necessary to capture major shifts in public acceptance for the continued construction of a nuclear power plant. Accidents such as Three-Mile Island, Chernobyl, and Fukushima occurred in short periods of time; however, the resulting shift in public acceptance had lingering effects. For example, after the Three-Mile Island accident, no new nuclear reactors for power generation were constructed in the United States [31]. The results from the leading model forms are shown in Table 7. This model is unique in that cross-products of interaction terms and independent variables are present, as shown in Equation (5).
In Equation (5), Darlington indicates a particular nuclear power plant. The Darlington power plant appears as an indicator because the plant was under construction when the Chernobyl accident occurred. The ramifications of this accident on the construction time are elaborated in Section 6. The regression process applied to this model generated the coefficients tabulated in Table 8. Substitution of the parameters in Table 8 into Equation (5) gives Equation (6). The resulting model has a standard error of 0.082, shown in Table 9, which is equivalent to a 1 month discrepancy in construction time.
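The mPoint parameter used in the nuclear model can be computed from a plant's start and end dates. The Python sketch below is one possible reading: it returns a fractional calendar year, whereas the data set may record mPoint as a whole year, and the function name is hypothetical. The Darlington dates use 1 January of the start and end years quoted in this paper (1982 and 1993) because the exact days are not given here.

```python
from datetime import date

def midpoint_year(start, end):
    """Fractional calendar year at the midpoint of the interval [start, end]."""
    mid = start + (end - start) / 2  # date arithmetic keeps whole days only
    year_start = date(mid.year, 1, 1)
    next_year_start = date(mid.year + 1, 1, 1)
    return mid.year + (mid - year_start) / (next_year_start - year_start)

# Assumed dates: only the years 1982 and 1993 are stated for Darlington.
darlington_mpoint = midpoint_year(date(1982, 1, 1), date(1993, 1, 1))
print(round(darlington_mpoint, 1))  # 1987.5
```

A continuous midpoint of this kind places a long project such as Darlington in the year around which most of its construction took place, which is the role mPoint plays as a continuous variable in the model.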

Discussion
The discussion of the results is organized into three subsections that consider all the power plants together, hydro plants in isolation, and nuclear plants in isolation. Finally, the limitations of the models and the data are discussed and assessed against the hypothesis.

Modeling All Power Plants Together
The model shown in Equation (1) indicates that the coefficients for the Hydro and Wind variables are statistically significant, as their p-values are less than 0.05. The presence of the Hydro and Wind variables means that the construction times of wind and hydro power plants are significantly different from that of nuclear power plants. This result agrees with Flyvbjerg's assertion that project type is a factor for cost overrun predictions [16], rather than Odeck's later assertion of type independence [18]. This difference in type required the derivation of separate estimation functions for each type of power plant. Despite the need for project type indicators, the model has a small standard error of 1.4 months, which is substantially less than the 30% increase in construction time proposed by Koch [6] for wind turbines and the 39% determined by Al-Khalil [11]. Thus, this model provides a more accurate estimate for construction time.
The model also demonstrates a statistically significant correlation with the power plant size because the MW variable has a p-value less than 0.05. The model includes the mid-point of construction to achieve a more accurate prediction. The mid-point of construction is important because for longer construction times, such as those experienced by hydro and nuclear power plants, there may be a change in economic conditions during the construction that influence the balance of the project.

Modeling of Hydro Power Plants
The hydro power model indicates that hydro power construction time is consistent across Canadian provinces and decades except for construction in Quebec, and construction during the 1910s and 2000s. The model, Equation (4), indicates that the construction time is dependent upon the size of the power plant because the MW variable appears. Further, because the model is of the level-log form, the construction time exhibits a linear relationship with the logarithm of power plant size. The exceptions for Quebec and the two decades indicate an increased construction time.
The longer construction time in Quebec may be a function of the interactions of Hydro Quebec with the First Nations whose traditional lands were affected by the hydroelectric installations and reservoirs [32,33]. For example, the La Grande project began construction in 1971, but a court order brought by the Quebec Association of Indians stopped construction in 1975, and construction was not completed until 1985 [32]. Since the La Grande project, with its associated long construction time, was included in the data set used for the regression analysis, the Quebec parameter appeared. The appearance of the Quebec parameter also contradicts Flyvbjerg's assertion of independence of location for cost overrun predictions [16].

The extended construction time for plants built during the 1910s pertains to Great Falls Generating Station in Manitoba and Sir Adam Beck I Generating Station in Ontario.
The Great Falls Generating Station's construction commenced in 1919 and was completed in 1928. Construction was delayed by World War I, and subsequent "material shortages and uncertain financing" caused additional delays [34]. The first generating station came online in 1923, and by 1928 there were a total of six generators in operation [34].
Two factors that influenced the duration of construction projects were the accessibility of transportation and available technologies. A lack of road infrastructure during the 1910s made transporting materials and heavy machinery more difficult in comparison to later decades. A period report states that heavy equipment was moved using skids, horses, and steam hoists. In the warmer months, ferries could be used, but when the water froze over, equipment had to be moved over the ice [34].
The Sir Adam Beck (SAB) I generating station project, located near Niagara Falls, Ontario, commenced construction in May 1917. By 5 December 1925, it had nine generators in operation, with a tenth generator to be finished and in service by July 1930. The 437 MW project was unlike anything the Ontario government had undertaken before, and it was the world's largest hydroelectric power station of its time [35]. Being the first of its kind, SAB I encountered a number of novel construction challenges. The power station required a 20 km hydro canal to be excavated across uphill terrain to exploit a larger hydraulic head, starting in Chippawa and ending in Queenston [35].
The construction delay during the 2000s is attributable to the economic recession in 2008, which impacted the budgets associated with the necessary construction logistics [30].

Modeling of Nuclear Power Plants
The nuclear power plant model (Equation (6)) demonstrates that the construction time for nuclear power plants is consistent across provinces and dependent upon the size of the power plant, and the mid-point of the construction period. The consistency across provinces is demonstrated by the absence of provincial variables except for Quebec, and the Darlington power plant variable. The construction time for nuclear power plants in Quebec is longer because of the data for the Gentilly 1 facility. This facility introduced many new technologies that encountered delays, and is only one of two nuclear power plants constructed in Quebec. The impact of the single power plant on the province is higher than for provinces with more nuclear power plants. The models also demonstrate a dependence on the mid-point of the construction period, which is expected because of the number of phases that nuclear construction must complete in Canada [36].
The Darlington nuclear power plant was constructed in Ontario between 1982 and 1993, and there are several factors to consider when investigating this lengthy construction time. The Chernobyl accident occurred during construction, which led to the provincial government requesting further analysis of the power plant's safety. The Canadian Deuterium Uranium (CANDU) reactor is technologically very different from Chernobyl's RKMB reactor design; however, a construction halt was imposed during the review. A change of scope for Darlington caused the most amount of delays along with uncontrolled economic factors [37]. A decrease in forecasted electricity load was the main reason for major project delays for all four units. Units 3 and 4 suffered further economic impact as policies stipulated a reduction in the borrowing for capital expenditures [38]. This sudden change in access to capital negatively impacted the construction time. In 1986, there was a strike by the electrical workers on site causing six month delays on both Unit 1 and Unit 2 [37].
While these unexpected factors caused the majority of the delay for Darlington, technical issues and resource mismanagement created further delays. The first major issue observed was damage to the fuel bundles caused by vibrations from the main heat transport pumps. This was followed by cracks in the main generator shaft, which required a redesigned rotor shaft [39]. The delays on Units 1 and 2 also led to turnover of trained operating staff [37]. Because the technical issues were observed early in the project on Units 1 and 2, Units 3 and 4 could be commissioned with the redesigned parts.

Limitations and Evaluation of Hypothesis
The estimation functions generated for all the power plants combined, hydro power plants in isolation, and nuclear power plants in isolation are limited by several factors. The first factor is that the regression functions were developed using Canadian data; thus, the specific functions may not apply to other countries. However, the function forms reveal the independent variables and indicators that should be included in construction time estimation functions, and these parameters are transferable to other countries. The second factor is the limited data set used to generate the regression functions. The wind power plant data included in the models for all power plants lack a continuous variable for construction time, as the completion of these power plants is reported to the nearest year rather than the nearest month. The third factor is that the data was collected through public resources. More precise analyses are possible if the utilities and contracting companies provide the schedules from each facility's construction, so that construction start and end times follow a uniform definition and are resolved to the day rather than the year or month. More accurate analyses are also possible by increasing the size of the data set, which could be achieved if the same utilities provide the required data on all their facilities.
The hypothesis was that the construction time for a power plant is a strong function of installed capacity. The presence of installed capacity in all the models confirms the hypothesis. The models also reveal that the construction time depends on the technology; thus, the usefulness of a universal model for predicting construction time is limited.

Conclusions
The construction times of utility scale power plants in Canada were examined using multiple linear regression analysis techniques to assist in identifying the factors that may contribute to time overruns. Construction time overruns for power plants are an area where there has been little study, while other infrastructure projects such as educational facilities, water systems, and highways have been thoroughly studied. This lack of study has implications for construction planning, policy development to adopt more renewable power, and larger grid level planning. If the construction time is poorly estimated, the logistics for delivering supplies and workers to the site will be incorrect, policy windows that are favorable to certain technologies may close, and the retirement of older, inefficient power plants may be delayed.
The MLR analysis of Canadian power plants revealed that the construction time is strongly dependent on the installed capacity of the power plant, shown by all the models (Equations (2)-(6)) requiring the inclusion of the power plant size through the MW variable. The construction time is also a function of the type of technology, as the analyses demonstrated that hydro and nuclear power plants required separate models (Equations (4)-(6)) to accurately estimate their construction time compared to a model developed from hydro, nuclear, and wind (Equation (2)). Lastly, the construction time is a function of location, as both the hydro and nuclear models required location indicators.
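The structure described above, a continuous MW variable combined with 0/1 technology and location indicators, can be illustrated with a minimal ordinary least squares sketch. Everything in it is an assumption made for illustration: the records are synthetic, the coefficients in `true_beta` are invented to generate the toy response, and the names `is_hydro`, `in_province_a`, and `estimate_months` are hypothetical. It does not reproduce Equations (2)-(6), nor the stepwise selection and interaction terms used in the paper.

```python
import numpy as np

# Synthetic, illustrative records only. These are NOT the paper's data.
rng = np.random.default_rng(0)
n = 60
mw = rng.uniform(50.0, 3000.0, size=n)       # installed capacity in MW
is_hydro = rng.integers(0, 2, size=n)        # hypothetical technology indicator (0/1)
in_province_a = rng.integers(0, 2, size=n)   # hypothetical location indicator (0/1)

# Invented coefficients, used only to generate the toy response:
# intercept, MW slope, technology offset, location offset (all in months).
true_beta = np.array([20.0, 0.015, 12.0, 6.0])

# Design matrix: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones(n), mw, is_hydro, in_province_a])
months = X @ true_beta + rng.normal(0.0, 0.5, size=n)

# Ordinary least squares fit of construction time on the regressors.
beta_hat, *_ = np.linalg.lstsq(X, months, rcond=None)


def estimate_months(capacity_mw, hydro, province_a):
    """Evaluate the fitted linear function for a single hypothetical plant."""
    row = np.array([1.0, capacity_mw, hydro, province_a])
    return float(row @ beta_hat)
```

In this form, each indicator shifts the estimate by a fixed number of months, while the MW term scales the estimate with plant size. This is the sense in which a model "requires" location or technology indicators.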
The dependence of construction time on technology means that an accurate estimate of the construction time for novel power plants may be difficult in jurisdictions where no power plants of that type currently exist. Construction planners are advised to use the worst case, i.e., the longest construction time, from existing technologies. In the Canadian context, the regression model developed here for hydroelectric facilities is recommended. These regression models are also of use to policy developers because there is typically a policy window within which a policy should be enacted. Policy analysts aiming to encourage the use of more utility-scale renewable energy are advised to use the regression models to estimate how long the power plants will take to construct. This duration will indicate how long a particular construction project may remain at the forefront of the public's perception. The duration will also help with planning to meet environmental targets. Lastly, the inclusion of jurisdictions in the models suggests that policy planners should examine the policies implemented by certain Canadian provinces that may have eased the implementation of power plant technologies.
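The worst-case recommendation can be sketched as follows: evaluate each available technology-specific estimation function at the planned capacity and keep the longest result. The function `worst_case_months`, the labels `technology_a` and `technology_b`, and all coefficients are hypothetical placeholders. They are not the fitted coefficients of Equations (4)-(6) and would need to be replaced with the actual fitted models.

```python
from typing import Callable, Dict, Tuple


def worst_case_months(
    models: Dict[str, Callable[[float], float]], capacity_mw: float
) -> Tuple[str, float]:
    """Return the name and estimate of the model giving the longest time."""
    estimates = {name: model(capacity_mw) for name, model in models.items()}
    worst = max(estimates, key=estimates.get)
    return worst, estimates[worst]


# Placeholder linear models with made-up coefficients (months as a function
# of MW). These numbers are for illustration only and are NOT from the paper.
placeholder_models = {
    "technology_a": lambda mw: 30.0 + 0.020 * mw,
    "technology_b": lambda mw: 60.0 + 0.005 * mw,
}

name_small, months_small = worst_case_months(placeholder_models, 500.0)
name_large, months_large = worst_case_months(placeholder_models, 3000.0)
```

With these placeholder numbers, the model that gives the longest estimate changes with capacity, so the worst case should be evaluated at the planned plant size rather than assumed in advance.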
The development of regression models for the construction time of Canadian utility-scale power plants fills a gap in the knowledge of construction time and time overruns for energy infrastructure. These models have several limitations, the major limitation being that the models were developed with publicly available data. As a result, not every power plant in Canada was used in the regression analyses, as the necessary data has not been released for many power plants. For this research to continue, power plant construction companies and energy ministries are encouraged to make timeline data publicly available. A second limitation is that the end times of construction may not be uniformly defined across the entire data set. Some reported end-of-construction times corresponded to when a power plant was grid connected, while others corresponded to when a generator became operational. The last limitation is that the definition of the construction start time varied: some facilities used the issuance of government licenses as the start, while others used the breaking of ground.

Acknowledgments:
The authors would like to thank Y. Han for her editorial assistance with the paper. The authors would also like to thank A. Boehm for her research assistance. This work was not funded by any organization.

Conflicts of Interest:
The authors declare no conflict of interest.