Modeling Causes of Cost Overrun in Large Construction Projects with Partial Least Square-SEM Approach: Contractor's Perspective

Construction cost overrun is a major problem faced by the construction industry globally, and it needs serious attention. Cost overrun results from one cause or a combination of several causes, which must be identified to achieve effective cost performance. Current methodologies that focus on identifying causes do not give insight into the underlying relationships between those causes. Hence, this study examined the fundamental relationships between factors of cost overrun using the Partial Least Square-SEM (PLS-SEM) method, an advanced multivariate analysis technique for estimating and analyzing causal relationships in path models. Data were collected through a structured questionnaire survey among contractors involved in large construction projects in Malaysia. A hierarchical model for assessing causative factors and cost overrun was developed and analyzed using Smart PLS software, and it was found that contractor's site management related factors had a strong effect on cost overrun. The calculated Global Fit (GoF) index of the model was 0.405, which indicates that the developed model had substantial explaining power to represent large construction projects in the Malaysian construction industry. Hence, improvement in contractor's site management is the critical requirement for controlling construction cost overrun.


INTRODUCTION
The construction industry nowadays faces chronic problems such as delay in completion, low productivity, low quality and cost overrun. Of these, cost overrun is the most significant issue faced globally. Most projects face cost overrun when executed (Azhar et al., 2010), and these overruns produce immediate effects on construction stakeholders and on the country's economy (Moura et al., 2007). This is because the construction industry plays a vital role in the economic and social growth of any country. In Malaysia, as in other countries, the construction industry is growing rapidly and consequently a huge amount is invested in it. Under the 10th Malaysian Plan, RM230 billion has been allocated for construction development (Abu Mansor, 2010). However, construction projects are rarely finished within the stipulated time and at the estimated cost. In a study of 308 public and 51 private projects, only 46.8% of public sector and 37.2% of private sector projects were completed within budget, whereas 84.3% of private sector and 76% of public sector projects were completed within 10% cost deviation (Endut et al., 2009).
To prevent poor cost performance, it is often necessary to evaluate a project's vulnerability to cost overrun before it is too late (Cha and Shin, 2011). Ibrahim et al. (2010) stated that in Malaysia very little research has been carried out by academics and practitioners on problems faced by the construction industry; more specifically, there is a lack of investigation of construction cost factors (Toh et al., 2011). Hence, this study focused on developing a path model of the causative factors of cost overrun in construction. It was limited to large construction projects, and the target group for data collection was contractors. In Malaysia, any construction project with a contract sum of more than RM 5 million is regarded as a large project (Abdullah et al., 2009).
Because the causal relationships among cost overrun factors have rarely been assessed, the Structural Equation Modeling (SEM) approach was adopted to analyze these factors. SEM is regarded as an extension of standardized regression modeling used to deal with poorly measured independent variables and is ideally suited for many research issues in the fields of construction engineering and management (Molenaar et al., 2000). The SEM method is suitable for exploring relationships among key variables and is highly applicable for resolving complicated problems in the construction domain (Yang and Ou, 2008), as the functionality of SEM is better than that of other multivariate techniques including multiple regression, path analysis and factor analysis (Ng et al., 2010). Two approaches may be used for SEM analysis: (i) covariance-based structure analysis and (ii) component-based analysis using partial least square estimation, also known as PLS-SEM (Haenlein and Kaplan, 2004). For this study, the PLS approach was used, as it is more advisable when the objective of the study is testing causal relations and theory development (Hair et al., 2011).

LITERATURE REVIEW
Cost overrun and common causes: Cost is among the major considerations throughout the project management life cycle and is considered a prime factor for the success of any project. However, in spite of its proven importance, it is uncommon to see a project completed within the estimated cost (Azhar et al., 2008). In today's construction industry, cost overrun is a very common phenomenon worldwide. The problem of cost overrun is critical and needs to be studied further to alleviate it (Angelo and Reina, 2002). A study conducted by Frame (1997) covering 8,000 projects showed that only 16% of the projects could satisfy the three well-known performance criteria: completing projects on time, within budgeted cost and to the quality standard (Ameh et al., 2010). In a global study on cost overrun in transport infrastructure projects covering 258 projects in 20 nations, Flyvbjerg et al. (2003) concluded that 9 out of 10 projects face cost overrun and that cost performance has not improved over time: it is in the same order of magnitude as it was 10, 30 or 70 years ago.
Similarly, Omoregie and Radford (2006) reported a minimum average cost escalation of 14% in construction projects in Nigeria, while in Portugal construction projects face a minimum cost overrun of 12% (Moura et al., 2007). In Pakistan, the minimum cost overrun was reported as 10% of the estimated project cost. This trend is sometimes more severe in developing countries, where cost overrun sometimes exceeds 100% of the anticipated project cost (Azhar et al., 2008).
Cost overrun in construction projects can occur due to various causes, which a number of researchers have investigated. Ameh et al. (2010), in a study of 42 cost overrun causes, found that lack of experience of contractors, cost of material, fluctuation in the prices of materials, frequent design changes, economic stability, high interest rates charged by banks on loans, mode of financing, bonds and payments, as well as fraudulent practices and kickbacks were the dominant factors causing cost overrun in Nigeria. Enshassi et al. (2009) found that the top 10 of 42 investigated factors causing cost overrun in construction projects in Gaza were: increment of materials prices due to continuous border closures; delay in construction, supply of raw materials and equipment by contractors; fluctuations in the cost of building materials; unsettlement of the local currency in relation to dollar value; project materials monopoly by some suppliers; resources constraint (funds and associated auxiliaries not ready); lack of cost planning/monitoring during pre- and post-contract stages; improvements to standard drawings during the construction stage; design changes; and inaccurate quantity take-off. Le-Hoai et al. (2008) found that poor site management and supervision, poor project management assistance, financial difficulties of the owner, financial difficulties of the contractor and design changes were the most severe and common causes of cost overrun in the Vietnamese construction industry. Memon et al. (2010) investigated large projects of MARA Malaysia and found that cash flow and financial difficulties faced by contractors, contractor's poor site management and supervision, inadequate contractor experience, shortage of site workers, and incorrect planning and scheduling by contractors were the most severe factors, while changes in project scope and frequent design changes were the factors that least affected construction cost. Koushki et al. (2005) studied private residential projects in Kuwait and concluded that contractor-related issues, material-related problems and financial constraints were the major reasons for cost overrun. Several other researchers have conducted surveys in different areas of the world to identify the main causes of construction cost overrun; these studies were reviewed in order to identify the common causes. The review identified 35 common causes of cost overrun, categorized in 7 groups: Contractor's Site Management related factors (CSM) with 8 items (also known as manifest variables), Design and Documentation related Factors (DDF) with 5 items, Financial Management related factors (FIN) with 6 items, Information and Communication related factors (ICT) with 3 items, Human Resource (Workforce) Related Factors (LAB) with 5 items, Nonhuman Resource Related Factors (MMF) with 4 items, and Project Management and Contract Administration related factors (PMCA) with 4 items. These factors were considered for further investigation in the Malaysian construction industry and are presented in Table 1.

Partial Least Square Structural Equation Modeling (PLS-SEM): Partial Least Square Structural Equation Modeling (PLS-SEM) is also referred to in the literature as PLS path modeling (Ringle, 2010). The PLS path modeling approach is a general method for estimating causal relationships in path models (Ringle, 2010). PLS uses a component-based approach, similar to principal components factor analysis (Compeau et al., 1999). PLS path analysis predominantly focuses on estimating and analyzing the relationships between the latent variables in the inner model. Latent variables, however, are measured by means of a block of items or manifest variables, with each of these indicators associated with a particular latent variable (Ringle, 2010). PLS path models are formally defined by two sets of equations: the inner model (or structural model) and the outer model (or measurement model). The inner model specifies the relationships between unobserved or latent variables, whereas the outer model specifies the relationship between a latent variable and its observed or manifest variables (Henseler et al., 2009). In PLS, the outer model can be of 2 types: formative and reflective (Gudergan et al., 2008; Henseler et al., 2009). A formative measurement model has cause-effect relationships from the manifest variables to the latent index (independent causes), whereas a reflective measurement model involves paths from the latent construct to the manifest variables (dependent effects) (Henseler et al., 2009). PLS path modeling is predominantly used in the fields of marketing, strategic management and management information systems (Henseler et al., 2009).

DEVELOPING CONCEPTUAL MODEL
In order to assess the effect of causative factors on cost overrun as a hierarchical conceptualization, a reflective construct was adopted. A hierarchical model based on the groups and items identified in Table 1, showing their relation to the endogenous latent variable (cost overrun), is shown in Fig. 1.
Data collection and sample characteristics: A quantitative approach using a structured questionnaire was used to understand the perception of contractors in Malaysia towards factors influencing construction cost in construction projects. In order to be able to select the appropriate method of analysis, ordinal scales of measurement were used. A five-point Likert scale was adopted: 1 = Not Significant (NS); 2 = Slightly Significant (SS); 3 = Moderately Significant (MS); 4 = Very Significant (VS); 5 = Extremely Significant (ES). A total of 200 questionnaire sets were distributed to randomly selected contractor organizations registered in the G7 category (the top class for large contractors), identified through the CIDB Malaysia official portal (CIDB, 2011). Of these, 124 responses were received; however, some of the questionnaire sets were incomplete or only partially filled in, and these were considered invalid and unsuitable for further analysis. Table 2 shows the summary of data collection. The respondents involved in the survey had several years of experience in handling various types of projects: 39% of respondents were engaged in handling building projects, 29% had experience of handling infrastructure projects and 32% had experience of handling both types of project. All the respondents had experience in handling large projects, i.e., projects with a contract amount of more than RM 5 million.
A significant number of respondents (36%) had experience of handling very large projects, i.e., projects with a contract amount of more than RM 50 million, while 45 and 19% of respondents were involved in handling projects with contract amounts of RM 10-50 million and RM 6-10 million, respectively. The majority of the respondents had working experience of more than 10 years: 31% of respondents had worked in the construction industry for more than 20 years and 23% had more than 16 years of experience in handling construction projects, while 17 and 20% of respondents had more than 11 and 6 years of experience, respectively. Only 9% of respondents had less than 5 years of experience in the construction industry. This shows that the respondents were competent and capable enough to participate in the survey.

Data analysis:
The theoretical model (Fig. 1) was analyzed with the partial least square estimation approach. Smart PLS 2.0 (Ringle et al., 2005) was used to estimate the measurement and structural model parameters. A two-step process (Henseler et al., 2009) was adopted to calculate the PLS model criteria. The PLS path model evaluation steps are: (i) outer model (measurement model) evaluation to determine the reliability and validity of the constructs (Hulland, 1999), which can be assessed by examining the individual loading of each item, internal composite reliability and discriminant validity (Chin, 1998); and (ii) inner model (structural model) evaluation to assess the relationship between exogenous and endogenous latent variables (independent latent variables and the dependent variable) in respect of variance accounted for (Hulland, 1999). In the structural model, the hypotheses are tested by assessing the path coefficients, "which are standardized betas" (Compeau et al., 1999). Nonparametric bootstrapping (Akter et al., 2011a) with 5000 replications was applied to test the hypotheses and obtain the standard errors of the estimates.
This sequence ensures that the reliability and validity of the measures of the constructs are ascertained before attempting to draw conclusions about the nature of the relationships between constructs (Aibinu et al., 2011). The Goodness of Fit (GoF) measure (Akter et al., 2011a) was then used to assess the explaining power of the model. The GoF index is crucial for assessing the global validity of a complex PLS-based model (Tenenhaus et al., 2005, cited by Akter et al., 2011b).
Assessment of measurement model: Properties of the measurement scales were assessed by calculating (i) indicator reliability and convergent validity and (ii) discriminant validity, as adopted by Akter et al. (2011a) and Aibinu et al. (2011).

Individual item reliability and convergent validity:
Individual item reliability is the extent to which measurements of a latent variable made with a multiple-item scale reflect mostly the true score of the latent variable relative to the error (Hulland, 1999). It is expressed by the correlations of the items with their respective latent variables. To evaluate individual item reliability, the standardized loadings (or simple correlations) were assessed. Aibinu and Al-Lawati (2010) suggested that items with low loadings should be reviewed, and perhaps dropped, since they add very little explanatory power to the model and therefore bias the estimates of the parameters linking the latent variables. According to Hulland (1999), items with loadings of less than 0.4 should be dropped, while Chin (1998) argued that items with loadings below 0.5 should be dropped. A common threshold is that items with an outer loading higher than 0.7 should be considered highly satisfactory (Hulland, 1999; Henseler et al., 2009; Gotz et al., 2010), and for items with loadings between 0.4 and 0.7 the potential practical significance should be assessed prior to elimination. If an indicator's reliability is low and eliminating this indicator goes along with a substantial increase in composite reliability, it makes sense to discard the indicator (Henseler et al., 2009). Hence, the iterative process for elimination of items suggested by Aibinu et al. (2011) was adopted, with an indicator discarded only when doing so increased the composite reliability. Convergent validity is the measure of internal consistency which ensures that the items assumed to measure a particular construct actually measure it and not another construct (Hulland, 1999). Composite Reliability (CR) scores, Cronbach's alpha and Average Variance Extracted (AVE) tests were used to determine the convergent validity of the measured constructs (Fornell and Larcker, 1981, as cited by Akter et al., 2011a; Aibinu et al., 2011).
The composite reliability measure (synonymous with factor reliability or Joreskog's rho) can be used to check how well a construct is measured by its assigned indicators. The reliability test depicts the degree of internal consistency. The most commonly used reliability coefficient is Cronbach's alpha, which is a generalized measure of the internal consistency of a unidimensional, multi-item scale. A basic assumption is that the average covariance among indicators has to be positive. AVE measures the amount of variance that a latent variable captures from its measurement items relative to the amount of variance due to measurement errors.
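As an illustration of the three measures just described, the following minimal sketch computes AVE and composite reliability from the standardized outer loadings of one construct, and Cronbach's alpha from raw item scores. It assumes the commonly used formulas for standardized indicators (error variance of an item taken as one minus its squared loading); the function names and the example loadings are hypothetical and are not taken from Table 3 or from the Smart PLS software.

```python
from statistics import variance


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized outer loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """Joreskog's rho: (sum of loadings)^2 over itself plus summed error variance.

    Assumes standardized indicators, so each error variance is 1 - loading^2.
    """
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1.0 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical loadings for a four-item construct (illustrative only).
example_loadings = [0.8, 0.7, 0.6, 0.9]
ave = average_variance_extracted(example_loadings)  # about 0.575
cr = composite_reliability(example_loadings)        # about 0.841
print(f"AVE = {ave:.3f}, CR = {cr:.3f}")
```

With these illustrative loadings the construct would pass the AVE and CR cut-offs discussed in this section (0.5 and 0.7), even though one item loads below 0.7.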

The composite reliability can vary between 0 and 1. Researchers argue that the composite reliability value for a good model should be more than 0.7 (Akter et al., 2011a). Similarly, the value of alpha can vary between 0 and 1. A common threshold for sufficient values of Cronbach's alpha is 0.6, and if the value is more than 0.7, the data are considered highly acceptable (Yang and Ou, 2008; Cheung, 2005; Akter et al., 2011a). Fornell and Larcker (1981, as cited by Akter et al., 2011a; Aibinu et al., 2010) stated that AVE should be higher than 0.5, which means that at least 50% of the measurement variance is captured by the latent variables. In summary, the cut-off values for AVE, CR and Cronbach's alpha were 0.5, 0.7 and 0.7, respectively. Table 3 shows the results of the individual item reliability and convergent validity measures. In iteration 1, almost all the manifest items had outer loadings of more than 0.5, except DDF02 and DDF04 in the DDF construct. The AVE of this construct was also very low. This implied that the construct required modification by dropping the items DDF02 and DDF04. The manifest items of the other constructs had loading values above 0.5, and the AVE, CR and alpha values of all constructs exceeded the cut-off values of 0.5, 0.7 and 0.7, respectively, except for the FIN construct, whose AVE was lower than required. Hence, this construct was also considered for modification, and the item with the lowest loading value, FIN04, was selected for dropping. Using the iterative process of deletion, in iteration 2, DDF04 (which had the lowest loading in its construct) and FIN04 were omitted and the PLS algorithm was run again to test the model properties. As the results of iteration 2 in Table 3 show, omitting one item from each of the constructs FIN and DDF improved the values of AVE, CR and alpha, which then exceeded the required cut-off values.
The outer loading of DDF02 also improved to 0.492, which is higher than the value below which deletion is suggested by Hulland (1999). Hence, DDF02 was not deleted, and the measurement model was considered satisfactory, with evidence of adequate reliability and convergent validity.

Discriminant validity of constructs:
Discriminant validity indicates the extent to which a given construct is different from other constructs (Hulland, 1999). The discriminant validity of the measurement was evaluated by analyzing the average variance extracted (Akter et al., 2011a; Aibinu et al., 2011), using the criterion that "a construct should share more variance with its measures than it shares with other constructs in the model" (Fornell and Larcker, 1981; Aibinu et al., 2011). This can be examined by comparing the variance a construct shares with its own measures (its AVE) with the variance it shares with other constructs; for valid discrimination, the former should be greater than the latter. The rule that the square root of the AVE of each construct should be larger than the correlation between any two constructs (Chin, 1998) was applied. This was done by replacing the diagonal of the correlation matrix with the square root of the AVE. For adequate discriminant validity, the diagonal elements need to be greater than the off-diagonal elements in the corresponding rows and columns (Hulland, 1999). Table 4 presents the correlation matrix for the constructs. It was found that square root of AVE had
Figure 2 shows the results of the structural model. According to Cohen (1988), the R² of an endogenous latent variable can be assessed as substantial (0.26), moderate (0.13) or weak (0.02). From Fig. 2, the R² of the endogenous latent variable (cost overrun) is 0.262, which exceeds the threshold for a substantial value, and hence the model is at a satisfactory level. In assessing the path coefficients, the beta values of all structural paths are compared: the higher the path coefficient, the stronger the effect on the endogenous latent variable. Figure 2 shows that CSM has the highest coefficient value, 0.604. This means that CSM shares a high proportion of variance with cost overrun and has a large effect on it. The second major construct affecting cost overrun is LAB, with a path coefficient of 0.251.
Further, the significance of the path coefficients was tested by calculating t-values using the non-parametric bootstrap procedure in the Smart PLS software, which provides confidence intervals for all parameter estimates and builds the basis for statistical inference. In general, the bootstrap technique provides an estimate of the shape, spread and bias of the sampling distribution of a specific statistic.
Bootstrapping treats the observed sample as if it represents the population. The procedure creates a large, pre-specified number of bootstrap samples (e.g., 5,000). Each bootstrap sample should have the same number of cases as the original sample. Bootstrap samples are created by randomly drawing cases with replacement from the original sample. The PLS results for all bootstrap samples provide the mean value and standard error for each path model coefficient. This information permits a Student's t-test to be performed for the significance of path model relationships (Henseler et al., 2009). Table 5 summarizes the path results with the corresponding t-values and the estimated p-value associated with each t-value, calculated from a bootstrap run using 5000 bootstrap samples. For all the paths, a two-tailed t-test was used. Table 5 shows that all the paths had t-values higher than the minimum cut-off value of 2.58 at a significance level of 1% (Hair et al., 2011). This implies that all the constructs have a significant effect on cost overrun.
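The resampling logic described above can be sketched as follows. This is only an illustration under stated assumptions: a full PLS estimation is replaced here by a simple Pearson correlation between two observed score vectors, standing in for a single standardized path coefficient, and the data are synthetic. The names `pearson` and `bootstrap_t` are hypothetical helpers, not part of Smart PLS.

```python
import random
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation, used here as a stand-in for one standardized path."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_t(xs, ys, statistic=pearson, n_boot=5000, seed=0):
    """Return (estimate, bootstrap standard error, t-value) for the statistic."""
    rng = random.Random(seed)
    n = len(xs)
    estimate = statistic(xs, ys)
    replicates = []
    while len(replicates) < n_boot:
        # Draw n cases with replacement, keeping x and y paired by case.
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            replicates.append(statistic([xs[i] for i in idx],
                                        [ys[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance; redraw
    se = stdev(replicates)
    return estimate, se, estimate / se


# Synthetic data (assumption): 100 cases with a moderately strong relation.
data_rng = random.Random(42)
xs = [data_rng.gauss(0, 1) for _ in range(100)]
ys = [0.6 * x + data_rng.gauss(0, 0.8) for x in xs]

estimate, se, t_value = bootstrap_t(xs, ys)
print(f"estimate = {estimate:.3f}, SE = {se:.3f}, t = {t_value:.2f}")
print("significant at 1% (|t| > 2.58):", abs(t_value) > 2.58)
```

The t-value is the original-sample estimate divided by the standard deviation of the bootstrap replicates, which is then compared with the two-tailed critical value of 2.58 used in this study.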
Overall model assessment: The Global Fit measure (GoF) for PLS path modeling, defined as the geometric mean of the average communality and the average R² (for endogenous constructs), was used to assess the overall fitness and explaining power of the model. The GoF index is bounded between 0 and 1. Wetzels et al. (2009) suggest using 0.50 as the cut-off value for communality (Fornell and Larcker, 1981) together with the different effect sizes of R² (Cohen, 1988, cited by Akter et al., 2011b) to obtain GoFsmall = 0.10, GoFmedium = 0.25 and GoFlarge = 0.36 as cut-off values for global validation of a PLS model, as adopted by Akter et al. (2011a, b). The following equation, used by Akter et al. (2011a), was adopted to calculate GoF:

$$\mathrm{GoF} = \sqrt{\overline{\mathrm{AVE}} \times \overline{R^{2}}} = \sqrt{0.626 \times 0.262} = 0.405$$

In this study, a GoF value of 0.405 was obtained for the complete (main effects) model, which exceeds the GoFlarge cut-off value of 0.36. This shows that the model has substantial explaining power.
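The GoF arithmetic can be reproduced with a few lines of code. The sketch below uses the average communality (0.626) and R² (0.262) reported in this study and the three cut-off values quoted above; the function names are hypothetical and chosen only for illustration.

```python
import math


def goodness_of_fit(avg_communality, avg_r2):
    """GoF: geometric mean of average communality (AVE) and average R^2."""
    return math.sqrt(avg_communality * avg_r2)


def classify_gof(gof, small=0.10, medium=0.25, large=0.36):
    """Label a GoF value against the small/medium/large cut-offs."""
    if gof >= large:
        return "large"
    if gof >= medium:
        return "medium"
    if gof >= small:
        return "small"
    return "below small"


gof = goodness_of_fit(0.626, 0.262)
print(round(gof, 3), classify_gof(gof))  # 0.405 large
```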

DISCUSSION AND CONCLUSION
This study investigated various factors affecting cost overrun using the partial least square approach to structural equation modeling. The results indicated that contractor's site management factors were the major contributing causes of cost overrun. These findings are supported by Frimpong et al. (2003) and Žujo and Car-Pušić (2008), while Le-Hoai et al. (2008) found poor site management to be the first-ranked cause of cost overrun in the Vietnamese construction industry. In Pakistan, poor site management was also a very significant cause of cost overrun. Aje et al. (2009) state that contractors' performance is crucial to the success of any construction project, as it is the contractors who convert design into practical reality, and that contractors' management capability enhances project performance. Poor management causes many constraints on projects, such as poor follow-up of progress, incorrect distribution of works, non-commitment of site employees and poor monitoring of the project (Enshassi et al., 2009). Poor site management reflects the weakness and incompetency of contractors (Le-Hoai et al., 2008). Therefore, improved site management and supervision by contractors can improve project performance and contribute to effective cost management, which in turn helps to control cost overrun.