PREDICTING FREEWAY PAVEMENT CONSTRUCTION COST USING A BACK-PROPAGATION NEURAL NETWORK: A CASE STUDY IN HENAN, CHINA

mark.king@qut.edu.au
Abstract. The objective of this research was to develop a model to estimate future freeway pavement construction costs in Henan Province, China. A comprehensive set of factors contributing to the cost of freeway pavement construction was included in the model formulation. These factors reflect regional characteristics, topography and altitude variation; the cost of labour, material, and equipment; and time-related variables such as index numbers of labour prices, material prices and equipment prices. An Artificial Neural Network model using the Back-Propagation learning algorithm was developed to estimate the cost of freeway pavement construction. A total of 88 valid freeway cases were obtained from freeway construction projects let by the Henan Transportation Department during the period 1994−2007. Data from a random selection of 81 freeway cases were used to train the Neural Network model and the remaining data were used to test its performance. The tested model was used to predict freeway pavement construction costs in 2010 based on predictions of input values. In addition, this paper suggests a correction for the predicted value of future freeway pavement construction costs. Since the change in future freeway pavement construction cost is affected by many factors, the predictions obtained by the proposed method, and therefore the model, will need to be tested once actual data are


Introduction
Henan Province is located in the centre of China. Its area is 167 000 km2, it has a population of 99 mln and contains 18 cities ranging in size from 1.5 mln to 11 mln people. In 1994, Henan's first freeway was built from Zhengzhou to Kaifeng. In 2007, the total length of the freeways in Henan Province was 4556 km, and by 2020, it is expected that the total length will reach 6280 km.
In the past 10 years, the freeway network in China has developed very quickly and the total investment has been huge. However, in some freeway projects, the final construction cost is higher than the estimated cost at the detailed design stage, which in turn is higher than the conceptual cost at the preliminary design stage. In the context of government financial accountability practices in China this presents challenges; any deviation is likely to be queried, and the Secretary of the Provincial Transportation Department or a senior official in the department will often have to defend the increased costs publicly or in the state legislature. As a result, the legislature and the public may come to perceive the department as incompetent, and confidence in it is eroded. A more accurate cost estimation process for freeways would therefore contribute to greater public and government confidence in the operation of infrastructure planning and development agencies, as well as contributing to more efficient budget processes.
Research has indicated that project definition in the early planning process is an important factor leading to project success (Le et al. 2010; Scott-Young, Samson 2008; Thomas, Fernández 2008). To prepare reliable budgets for freeway construction programs, road authorities must have accurate estimates of the future funding allocations they are likely to receive, and of future project costs for long-term infrastructure programs. While future funding is never known with a great deal of certainty, it is more often the inaccurate estimation of project costs that causes greater disruption to the execution of construction programs.
Various critical factors must be identified to estimate construction costs effectively. Several studies have set out to identify relevant factors, ranging from generic management and financial factors through to those that are specific to the industry under consideration. Stoy et al. (2008) identified quantitative cost factors such as absolute size, construction duration, and compactness as influential factors for good bidding information. Liu et al. (2011) found that uncertain factors, such as meteorological conditions, introduce considerable uncertainty into the construction schedule of hydropower projects. Pinto and Mantel (1990) identified ten critical factors, including project scope, management goals, time planning and management, and communication with the owner. In a study conducted in Newfoundland, Hegazy and Ayed (1998) found that season, location, type of project, contract duration, and contract size had a significant impact on individual contract cost. Wilmot and Cheng (2003) described future construction cost in terms of predicted index values based on forecasts of the price of construction labour, materials, and equipment and the expected contract characteristics and contract environments. In a building construction study conducted by Cheng et al. (2009b), ten key factors were identified in the planning stage of projects. Six were quantitative: floors underground, total floor area, floors aboveground, site area, the number of households and households in adjacent buildings; and four were qualitative: soil condition, seismic zone, interior decoration and electromechanical infrastructure. Thus, examination of the literature shows that a wide variety of factors have been found to influence construction costs. Factors such as those described above have been used in models of construction costs, but the models rarely attempt to use a comprehensive set of factors. In part, this is a consequence of the methods used for estimation.
Shi and Li (2008) integrated rough sets (RS) theory and an Artificial Neural Network (ANN) to forecast construction project cost. To overcome cost overruns in projects, methods such as Probabilistic Simulation (Chou et al. 2009) and the Support Vector Machine (Cheng et al. 2010; Chou 2011) have been used to develop appropriate cost models for predicting the expected project cost.
By contrast, regression analysis represents a traditional approach (Khosrowshahi, Kaka 1996), an inherent disadvantage of which is its requirement of a defined mathematical form for cost functions, i.e. the nature of the relationships between variables must be assumed at the outset. In addition, such traditional methods of estimating project costs are hampered by the large number of important variables and the interactions between them. Moreover, some of the variables that influence construction costs, such as the cost of labour, equipment, and materials, are usually highly correlated with each other, resulting in multicollinearity in the model when more than one of them is included. Thus, traditional methods are limited in their potential applicability to the estimation of construction costs.
As a comparatively new method, Neural Network (NN) models have no predefined functional form and therefore have greater freedom to fit the data than do regression models. It is therefore possible that the greater flexibility in the relationship between input and output variables in NN might translate into a better model than that achieved with regression analysis. One purpose of the research reported in this paper is to use NN to identify a better model. Some researchers have employed NN models to estimate the construction costs of individual projects (Ji et al. 2009). By combining NN and fuzzy logic, Boussabaine (1999) and Boussabaine and Elhag (1999) developed neurofuzzy systems to estimate the construction cost and project duration of individual building projects. Wilmot and Mei (2005) developed a NN model to estimate highway construction cost escalation over time. Cheng et al. (2009b) developed an evolutionary fuzzy neural inference model to estimate costs at the concept stage. Ma et al. (2012) proposed modifying the traditional back-propagation neural network (TNN), an existing cost-sensitive model that handles a single cost, by extending the back-propagation error equation to multiple cost decisions. Yip et al. (2014) presented a comparative study of general regression neural network (GRNN) models and conventional Box-Jenkins time series models for predicting the maintenance cost of construction equipment.
Furthermore, hybrid models (combining NN and other approaches) have also been developed to estimate construction costs. Hegazy and Ayed (1998) used NN to develop a parametric cost estimating model for highway projects, with optimal NN weightings optimized by genetic algorithms. Kim et al. (2005) applied hybrid models of NN and genetic algorithms to residential building cost estimation in order to predict preliminary cost estimates.
These studies indicate that NN and NN hybrid models have been used instead of traditional methods to estimate the cost, duration, and other features of construction projects, including highway construction projects. However, it is also clear from the limited literature that NN models have usually been used only for individual construction projects, rather than to investigate the overall cost of construction across a range of projects and to examine how that cost alters over time. This approach has an inherent limitation, i.e. the models developed are relevant only to the case studied, and will therefore not be readily generalizable to other projects. The models discussed also lack relevance to similar projects undertaken some time later, as each model is specific to a particular time, whereas some of the important variables change over time in ways that these models do not capture. The objective of this paper is to address these issues by developing a NN model based on a range of freeway pavement construction projects and taking temporal factors into consideration. In particular, this research applies a back-propagation (BP) NN model to predict design cost estimates for freeway pavement construction projects, using historical data on freeway construction projects in Henan Province as a case study of the application of the approach.

Influential factors analysis
The first step in developing the model is to identify which factors influence the costs of freeway pavement construction, so that they are considered for inclusion in the model. These factors have been categorized below as location, resource or time factors, though they also incorporate other variables, e.g. location is related to altitude and topography, both of which influence pavement construction costs in Henan. While the list of potential influencing factors is quite lengthy, a balance needs to be found, such that the number of factors is sufficient to provide adequate forecasts of costs, but not too large for practical application in a management setting. There is no clear guideline as to what the ideal number of factors should be. In this study, it was judged that the nine factors described below (two location, four resource and three time-related factors) should provide a more comprehensive basis for modelling and forecasting than has previously been the case, without creating disproportionate information needs.

Location factors
The freeway construction projects were located across Henan, which is characterized by differences in climate, geology, and topography that might be expected to have an influence on the cost of the projects. These characteristics tended to vary together, so that it was possible to define just three regions based on climate, geology and topography. The region factor is given in Table 1.
An important practical issue for highway construction is the amount of variation in altitude along the road, as greater variation increases costs. This is related to topography, which is taken into account in a broad sense in the regional categories above, but the degree of variation between individual projects pointed to a need to develop categories at the project level. Variation in altitude was therefore divided into five categories from "very small", which described roads that were essentially flat, to variations of between 450 m and 800 m. The variation in altitude factor categories (B1 to B5) are listed in Table 2, along with an indication of where the projects for each category took place, the range of absolute altitudes which applied there, and the freeway contracts which fell into these categories.

Resource factors
Labour, material and equipment are the main resources for a construction project. For simplicity, this study randomly selected five cases as an example to illustrate pavement construction cost components as shown in Table 3. Material costs constituted nearly 85% of pavement construction costs and the equipment costs constituted nearly 12%.
A construction project usually requires more than 100 types of material. The components of pavement material costs of the five cases are shown in Table 4, with the largest four components listed separately. Taken together, the two largest components (concrete and asphalt costs and stone costs) accounted for approximately 76% of material costs. For simplicity, the authors proposed to use the costs of crushed stone (diameter 4 cm) as a proxy for stone costs. The quantity of stone category material is written in the following form:
$$N_{stone} = \frac{\sum_j N_j \, p_{jo}}{P_{stone,o}}, \quad (1)$$
where $N_{stone}$ − the quantity of stone category material; $N_j$ − the quantity of material $j$; $p_{jo}$ − the bid price of material $j$ at time $o$; $P_{stone,o}$ − the crushed stone price at time $o$.
A similar process is used to quantify an equivalent amount of calx (calcium oxide) as a proxy for concrete and asphalt material costs. The quantity of concrete and asphalt category material is written in the following form:
$$N_{ConcreteAndAsphalt} = \frac{\sum_j N_j \, p_{jo}}{P_{calx,o}}, \quad (2)$$
where $N_{ConcreteAndAsphalt}$ − the quantity of concrete and asphalt category material; $N_j$ − the quantity of material $j$; $p_{jo}$ − the bid price of material $j$ at time $o$; $P_{calx,o}$ − the calx price at time $o$.
The Index Number of Equipment Prices (INEP) is based on the price of using equipment item $i$ at time $k$ and the price of using equipment item $i$ at time $o$. As with the materials costs, this study did not base INEP calculations on all equipment items, selecting only major items which together accounted for more than 80% of all equipment costs.
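The proxy conversion described above can be illustrated with a short sketch. It assumes that the equivalent quantity is the total bid cost of the materials in a category divided by the price of the proxy material at the same time; the function name and all figures are hypothetical and are not taken from the Henan data.

```python
def equivalent_quantity(quantities, bid_prices, proxy_price):
    """Express a category of materials as an equivalent quantity of one proxy material.

    quantities  -- quantity N_j of each material j in the category
    bid_prices  -- bid price p_jo of each material j at the base time o
    proxy_price -- price of the proxy material (e.g. crushed stone) at time o
    """
    if len(quantities) != len(bid_prices):
        raise ValueError("quantities and bid_prices must have the same length")
    if proxy_price <= 0:
        raise ValueError("proxy_price must be positive")
    # Total bid cost of the category, then re-expressed in units of the proxy.
    total_cost = sum(n * p for n, p in zip(quantities, bid_prices))
    return total_cost / proxy_price


# Hypothetical figures for illustration only (not from the paper's tables).
n_stone = equivalent_quantity([1000.0, 500.0], [60.0, 80.0], 50.0)
print(n_stone)
```

Working in equivalent quantities lets a single price series (the proxy price) stand in for the many individual material prices in the category.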

Data for model development
Data were obtained for freeway construction projects contracted by the Henan Transportation Department during the period 1994−2007. Some nonstandard design and construction projects were removed from the database. The effective data consisted of contractual information on 88 projects, all of which were four-lane divided carriageway freeways with 120 km/h speed limits. Pavement construction cost factors for a sample of the projects are shown in Table 5.
Eighty-one of the 88 projects in the data set were used as a training data set, which was designed to comply with the following criteria for minimum size and proportion of the total data set.
The minimum training set for the NN is written as follows:
$$N = \prod_{i=1}^{m} n_i, \quad (6)$$
where $m$ − the number of factors of the BP NN; $n_i$ − the number of possible values of factor $i$ (Shi 1995); $N$ − the number of combinations of all possible values of the $m$ factors. The training sample set is considered incomplete in terms of solving the problem without a sample size equal to or greater than $N$. The problem in this paper has 9 factors. The location factor has three possible values; the altitude factor has five possible values; the labour cost per km is simple and does not involve categories, so it has only one possible value; the two largest components of the cost of materials are "concrete and asphalt" and "stone", so the materials cost has two possible values; and the cost of equipment is simple and therefore has one possible value. The influence of "other costs" on the overall cost of pavement construction is not significant compared with other resource costs, so it is assigned a value of 1 in the calculation. The possible values of the Index Number of Labour Prices (INLP) and Index Number of Equipment Prices (INEP), which correspond to the labour and equipment resource factors, are both 1. As the possible values of the different types of materials have been taken into account in the materials resource costs, the value of the Index Number of Material Prices (INMP) is set at 1 to avoid double counting. It follows that the minimum size of the training set for the BP NN used in this paper was $N = 3 \times 5 \times 2 = 30$, since all other factors contribute a value of 1. In theory, the more training samples, the better, but in practice there are limitations on the number of road segments available. A total of 88 samples were gathered; a common practice is to select about 90% of the samples for training and use the remainder for testing. This research selected 81 of the 88 samples as training samples and used the remaining 7 for testing.
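The minimum training set size can be checked with a few lines of arithmetic. The sketch below assumes that the minimum size is the product of the numbers of possible values quoted in the text; the dictionary keys are descriptive labels chosen here for illustration.

```python
from math import prod

# Number of possible values assigned to each factor in the text.
possible_values = {
    "region": 3,
    "variation in altitude": 5,
    "labour cost": 1,
    "material cost (concrete and asphalt, stone)": 2,
    "equipment cost": 1,
    "other costs": 1,
    "INLP": 1,
    "INMP": 1,
    "INEP": 1,
}

# Number of combinations of all possible factor values.
min_training_size = prod(possible_values.values())

training_cases = 81
total_cases = 88
training_share = training_cases / total_cases

print(min_training_size)            # 30
print(training_cases >= min_training_size)
print(round(training_share, 2))     # 0.92
```

With 81 training cases the data set exceeds the minimum of 30, and the training share of about 92% is close to the 90% split mentioned in the text.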

Artificial neural network models for construction cost estimation
Artificial Neural Networks (ANNs) were selected to model the pavement construction cost. ANNs are versatile because of their highly distributed parallel structures and adaptive learning processes (Cheng et al. 2009b; Raab et al. 2013; Šliupas, Bazaras 2013; Wilmot, Mei 2005). Of the many structures available for NNs, the multilayer feed-forward network was chosen for this study because such networks have the ability to deal with complex systems and yet are relatively easy to construct (Hegazy, Ayed 1998; Hunter et al. 2012; Ji et al. 2009). To train the model, the back-propagation (BP) learning algorithm was used because it has strong classification and generalization capabilities (Cheng et al. 2009a; Li, Chen 2012; Xiaokang, Mei 2010). The form of neural network used in this study is common in civil engineering applications.
In theory, a three-layer BP network consisting of an input layer, a hidden layer and an output layer can map n input variables to m target output variables. Therefore, the general form of the neural network models used in this study is represented as the simple three layers shown in Fig. 1.
The number of neurons in the hidden layer is difficult to ascertain and is normally found by experiment and experience.
The number of neurons in the hidden layer is directly related to the requirements of the problem and the number of neurons in the input or output layer. If the number is too small, there will be insufficient information acquired by the network to resolve the question; if there are too many neurons, it will increase the number of iterations of the network, thus extending the training time and reducing network generalization, thus decreasing predictive power.
First, the number of neurons in the hidden layer is determined using empirical formulae during the design of the network. Second, the network is trained using different neuron numbers. Finally, the optimal number of neurons is obtained by comparing the operating results. The general empirical formula used to determine the number of neurons in the hidden layer (Hirose et al. 1991; Sheela, Deepa 2013) is written as:
$$i = \sqrt{n + m} + a, \quad (7)$$
where $i$ − the number of hidden neurons; $n$ − the number of input neurons; $m$ − the number of output neurons; $a$ − a constant and $1 < a < 10$.
According to Kolmogorov's theorem, if the number of neurons in the input layer is $n$, then the number of neurons in the hidden layer is $2n + 1$, so $i$ is written as:
$$i = 2n + 1. \quad (8)$$
A further expression for $i$, (9), was also used, where $n$ − the number of input neurons; $i$ − the number of hidden neurons. In this study, the maximum and minimum numbers of hidden neurons ($i_{max}$, $i_{min}$) were determined by (7), (8) and (9). The network was then trained repeatedly, with the number of hidden neurons increased by one each time from $i_{min}$ to $i_{max}$. The optimal number of hidden neurons was selected by comparing the convergence and training error obtained with each number of neurons.
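The search range for the hidden layer can be sketched as follows. This is an illustration under two assumptions that are not stated in the text: that the general empirical formula takes the commonly quoted form i = sqrt(n + m) + a, and that the lower bound is obtained by rounding that value down at a = 1. The upper bound uses the 2n + 1 rule attributed to Kolmogorov's theorem. The function name is hypothetical.

```python
import math


def hidden_neuron_range(n_inputs, n_outputs, a_min=1):
    """Return an assumed (lower, upper) search range for the hidden layer size.

    lower -- floor(sqrt(n + m) + a_min), from the assumed empirical formula
    upper -- 2n + 1, the rule attributed to Kolmogorov's theorem
    """
    lower = math.floor(math.sqrt(n_inputs + n_outputs) + a_min)
    upper = 2 * n_inputs + 1
    return lower, upper


# Nine input neurons and one output neuron, as in this study.
i_min, i_max = hidden_neuron_range(9, 1)
candidates = list(range(i_min, i_max + 1))  # one network is trained per candidate
print(i_min, i_max, len(candidates))
```

Under these assumptions the range for nine inputs and one output runs from 4 to 19 hidden neurons, so sixteen candidate networks would be trained and compared.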

Pavement construction cost model development
Nine neurons were used in the input layer. These arose from the construction cost factors identified earlier and shown in Table 5 (region, variation in altitude, labour costs, stone costs, concrete and asphalt costs, equipment costs, INLP, INMP and INEP).
The minimum and maximum numbers of hidden neurons determined by (7), (8) and (9) were 4 and 19. The training error and testing error obtained with different numbers of neurons are listed in Table 6. According to the changes in training step and training error listed in Table 6, the training error gradually decreased as the number of hidden layer neurons increased, but it rebounded when the number was 17 to 19. The optimal number of hidden neurons was therefore 16.
Only one neuron appeared in the output layer, representing pavement construction cost.

MATLAB program
The MATLAB software package was used to estimate the neural network models. MATLAB provides a range of training functions for BP networks, including traingd, trainrp, traincgf, trainscg, trainlm and trainbr. Each has its own characteristics, but no single function suits the training process in all cases (Adeli, Wu 1998; Minli, Shanshan 2012). There are also many improved BP algorithms, such as gradient descent with momentum and an adaptive learning rate, implemented by the MATLAB function 'traingdx'; gradient descent with momentum, implemented by 'traingdm'; and gradient descent with an adaptive learning rate, implemented by 'traingda'. The training data set was used to map the input variable pattern to the target output pattern and to minimize the error by adjusting the weights of the network links in an iterative process. Training was set to stop after 7000 iterations or once the root mean square error (RMSE) had converged to a value less than 0.01, whichever occurred first.
With the number of neurons in each layer of the network determined, the training functions were compared by observing the changes in training step and training error obtained with each. The results showed that 'trainbr' was the best function, as its testing error was the smallest. Although 'trainlm' and 'trainrp' required few training steps, their testing errors were relatively large; 'trainscg', 'traincgf' and 'traingd' showed much worse results; and 'traingdm' and 'traingd' showed the worst results. Accordingly, 'trainbr' was chosen as the training function for the network.
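The stopping rule described above (a cap of 7000 iterations or an RMSE below 0.01) can be illustrated with a minimal back-propagation sketch. It uses plain batch gradient descent on a one-hidden-layer network, not MATLAB's 'trainbr', and it runs on a tiny synthetic data set with two inputs and three hidden neurons rather than the 9-16-1 network and the Henan data. All names, the learning rate and the data are assumptions made for illustration.

```python
import math
import random


def train_bp(samples, targets, n_hidden, lr=0.1, max_epochs=7000, rmse_goal=0.01, seed=0):
    """Train a one-hidden-layer network (tanh hidden units, linear output) by batch
    gradient descent. Stops after max_epochs, or earlier once the training RMSE
    falls below rmse_goal. Returns (epochs_run, final_rmse, predict_function)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Small random initial weights; the last entry of each row is the bias.
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in w_hid]
        y = sum(w * v for w, v in zip(w_out, h)) + w_out[-1]
        return h, y

    def rmse():
        sq = sum((forward(x)[1] - t) ** 2 for x, t in zip(samples, targets))
        return math.sqrt(sq / len(samples))

    epochs, error = 0, rmse()
    while epochs < max_epochs and error >= rmse_goal:
        g_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        g_out = [0.0] * (n_hidden + 1)
        for x, t in zip(samples, targets):
            h, y = forward(x)
            delta_out = y - t  # gradient of 0.5 * (y - t)^2 at the linear output
            for j in range(n_hidden):
                g_out[j] += delta_out * h[j]
                # Propagate the error back through the tanh hidden unit.
                delta_h = delta_out * w_out[j] * (1.0 - h[j] ** 2)
                for k in range(n_in):
                    g_hid[j][k] += delta_h * x[k]
                g_hid[j][n_in] += delta_h
            g_out[n_hidden] += delta_out
        step = lr / len(samples)
        for j in range(n_hidden):
            for k in range(n_in + 1):
                w_hid[j][k] -= step * g_hid[j][k]
        for j in range(n_hidden + 1):
            w_out[j] -= step * g_out[j]
        epochs += 1
        error = rmse()
    return epochs, error, lambda x: forward(x)[1]


# Synthetic data for illustration only: the target is a simple function of two inputs.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(8)]
y = [0.3 * a + 0.5 * b for a, b in X]
epochs, final_rmse, predict = train_bp(X, y, n_hidden=3)
print(epochs, final_rmse)
```

The loop condition encodes both stopping criteria, so the run ends either at the iteration cap or as soon as the error goal is met, whichever comes first.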

Model testing
A random selection of 81 freeway cases were used as a training data set for the neural network model and the remaining 7 freeway cases were used as a testing set on which the performance of the NN model was evaluated. The testing set projects were Shang-Zhou 4, Shang-Zhou(SQ)02, Yong-Bo A4, An-Nan, Daguang-Xin 8, Ji-Jin and Feng-Nan 08.
The NN model was programmed in MATLAB, with each run producing a slightly different result. The results of 10 runs on the testing set are listed in Table 8. The mean absolute percentage error (MAPE) was used to measure the performance of the models. The MAPE of the seven test cases varied from 0.048% to 2.24%, and the mean MAPE was 0.67%. The implications of this value for the accuracy of cost estimates are discussed below.
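For reference, the MAPE can be computed as in the sketch below, which uses the standard definition (absolute error divided by the actual value, averaged and expressed as a percentage). The cost figures are hypothetical and are not taken from Table 8.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent, relative to the actual values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical actual and predicted costs for two test cases.
actual_costs = [100.0, 200.0]
predicted_costs = [101.0, 198.0]
error_percent = mape(actual_costs, predicted_costs)
print(round(error_percent, 3))
```

Because each error is scaled by the actual cost, the measure is comparable across contracts of very different size.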

Predicting
The model was used to forecast the change in future freeway pavement construction costs based on predictions of input values.
Input variables such as labour, $N_{stone}$, $N_{ConcreteAndAsphalt}$, and equipment costs utilized average values observed between 1994 and 2007. The next two variables, variation in altitude and region, were taken from Tables 1 and 2 based on the specific location and characteristics of each contract. The other three variables, INLP, INMP and INEP, were calculated using (3), (4), and (5) and were based on forecasts of future GDP. The resulting values are listed in Table 9.
Using these values, the freeway pavement construction costs predicted by the model for 2010 are shown in Table 10.

Accuracy of predicted costs
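As noted above, the MAPE is around 0.67%, which needs to be taken into account in the predictions made by the model. The MAPE is used to calculate a range within which the actual future cost is expected to fall. MAPE is written in the following form:
$$MAPE = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|C_{Prediction,t} - C_{Reality,t}\right|}{C_{Reality,t}} \times 100\%, \quad (10)$$
where $C_{Prediction}$ − the cost predicted by the model; $C_{Reality}$ − the actual cost; $n$ − the number of test cases.
Hence, $C_{Reality}$ is expressed as the predicted cost multiplied by a lower and an upper correction coefficient, both derived from the MAPE. Therefore, the range of future freeway pavement construction costs, (13), is bounded by these two corrected values. Using (13), the range of future freeway pavement construction costs in 2010 is shown in Table 11.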
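One way to turn a MAPE into a cost range is sketched below. It assumes that the percentage error is measured relative to the actual cost, so that the actual cost lies between the predicted cost divided by (1 + MAPE) and the predicted cost divided by (1 − MAPE). The correction coefficients used in the paper may be defined differently; the function name and the predicted cost are hypothetical.

```python
def cost_range(predicted_cost, mape_percent):
    """Return an assumed (lower, upper) range for the actual cost.

    Assumes |predicted - actual| / actual = MAPE, which gives
    actual = predicted / (1 + MAPE) or actual = predicted / (1 - MAPE).
    """
    m = mape_percent / 100.0
    if not 0.0 <= m < 1.0:
        raise ValueError("mape_percent must be in the interval [0, 100)")
    return predicted_cost / (1.0 + m), predicted_cost / (1.0 - m)


# Hypothetical predicted cost in arbitrary currency units; MAPE of 0.67% as in the text.
lower, upper = cost_range(1000.0, 0.67)
print(round(lower, 2), round(upper, 2))
```

With a MAPE below 1% the range is narrow, within about 0.7% of the predicted value on either side.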

Discussion
This paper presents a prediction of the construction cost of freeway pavement in Henan, China using an Artificial Neural Network. The model appears to be informative and to provide accurate forecasts. However, the following issues need to be taken into account:
- there are more than nine factors that influence the costs of freeway pavement construction; increasing the number of factors would give a more accurate model, but a greater sample size might also be needed;
- for NN, in theory, the more training samples the better, but in practice there are limits to the number of road segments available;
- product prices may change suddenly due to the international economic situation;
- the model will need to be tested using actual data.
In short, there is still some distance to go before this approach can be used as an engineering application, but in the meantime it is useful as a reference for the Secretary of the Provincial Transportation Department or a senior official in the department.

Conclusion
1. This paper has explored a new approach to the estimation of future freeway pavement construction costs by using a Neural Network trained with real data from 81 construction projects and incorporating a more comprehensive set of factors than is typically employed. Data were obtained from freeway construction projects let by the Henan Transportation Department during the period 1994-2007. The data consisted of information on 88 freeway contracts. Data from a random selection of 81 freeway cases were used to train a neural network model and the remaining data were used to test the performance of the Neural Network model. Finally, the likely range of pavement construction cost of three freeways in 2010 was predicted.
2. The factors used in the Neural Network model in this study reflect the characteristics of location (region, which incorporates differences in climate, geology and topography, and variation in altitude along the constructed road), resource costs (labour costs, proxy costs for stone and for concrete and asphalt, and equipment costs), and time-related changes dependent on indexation of the costs of labour, materials and equipment (Index Number of Labour Prices, Index Number of Material Prices and Index Number of Equipment Prices, respectively). Neural Network models have usually been used only for individual construction projects, rather than to investigate the overall cost of construction across a range of projects and to examine how that cost alters over time, so this approach represents a new way of addressing the problem of predicting future pavement construction costs.
3. A question which arises is the generalizability of the results; however, this concerns the specific model derived rather than the process described. While the Neural Network was developed using data from Henan Province, the principal factors and the Neural Network process are transferable to other locations. The nine factors used in this study may not all be applicable in another location, and this would need to be determined at the outset through consultation with the relevant agencies and experts.
4. The general form of the neural network model used in this study was three layers, and the Back-Propagation learning algorithm was used for training. In addition, the MATLAB® software package was used to estimate the neural network models, utilizing the training function 'trainbr', which applies Bayesian regularization. Alternative approaches could also be tested and other software packages used.
5. One limitation which requires further testing relates to the success of the model's predictions in practice. This study has shown how to develop and train the model, and has tested how consistent its predictions were when applied to a different set of cases, but no attempt was made to test the accuracy of the predictions in practice. This requires a longer-term study with greater amounts of data.