Hybrid Artificial Intelligence HFS-RF-PSO Model for Construction Labor Productivity Prediction and Optimization

Abstract: This paper presents a novel approach using hybrid feature selection (HFS), machine learning (ML), and particle swarm optimization (PSO) to predict and optimize construction labor productivity (CLP). HFS selects the factors that are most predictive of CLP to reduce the complexity of CLP data. The selected factors are used as inputs for four ML models for CLP prediction. The study results showed that random forest (RF) obtains better performance than the other three models in mapping the relationship between CLP and the selected factors affecting CLP. Finally, the integration of RF and PSO is developed to identify the maximum CLP value and the optimum value of each selected factor. This paper introduces a new hybrid model named HFS-RF-PSO that addresses the main limitation of existing CLP prediction studies, namely the lack of capacity to optimize CLP and its most predictive factors with respect to a construction company's preferences, such as a targeted CLP. The major contribution of this paper is the development of the hybrid HFS-RF-PSO model as a novel approach for optimizing the factors that influence CLP and identifying the maximum CLP value.


Introduction
The construction industry is a key sector of the national economy for countries around the world [1]. Since construction is a labor-intensive industry, poor construction labor productivity (CLP) often causes cost and time overruns in projects [2,3]. To overcome this issue, the construction industry is constantly trying to identify CLP improvement strategies [4]. However, project managers first require a CLP model that helps them identify which factors lead to positive changes in CLP and by how much [5,6]. Furthermore, accurate prediction of CLP is essential for effective scheduling and planning prior to and during project execution [7]. CLP is a form of efficiency measure that is mainly defined as a ratio of units of output (i.e., project components) to units of input (e.g., labor work hours or labor cost), or vice versa [7][8][9]. Many factors can potentially affect CLP, reducing the accuracy of a predictive model and imposing the risk of data overfitting [10]. The power of any prediction method relies on choosing the proper factors that affect the model output [11]. Since the identification of factors that influence construction productivity is essential for productivity performance improvement, different studies have identified numerous factors that affect CLP [12]. Several studies used questionnaire surveys to identify the factors with the greatest influence on CLP [13][14][15][16][17][18][19]. However, most studies focused on identifying CLP factors, and comparatively few works in the literature address labor productivity prediction [20].
The most reliable estimate of productivity can be achieved using past project data because the important predictive productivity information can be extracted for future project management and planning [12].
The remainder of this paper is organized as follows. Section 2 presents a brief review of past research on modeling CLP. In Section 3, the proposed methodology for predicting and optimizing CLP is presented. Section 4 provides the experimental results from using the proposed methodology to predict and optimize CLP, using a data set. Finally, Section 5 presents conclusions and recommendations for future work.

Literature Review on Construction Productivity Modeling
Modeling CLP is challenging because it requires evaluating the impact of numerous factors simultaneously. To deal with this challenge, AI techniques, such as fuzzy logic, ANN, classifiers, learning algorithms, and hybrid techniques, are widely used in the construction management domain. Golnaraghi et al. [33] developed a CLP prediction model using ANN and compared it with other techniques, including the adaptive neuro-fuzzy inference system (ANFIS) and the radial basis function neural network. El-Gohary et al. [12] introduced an engineering approach using ANN to map the relationship between CLP and the factors influencing it. Nasirzadeh et al. [34] developed ANN-based prediction intervals to predict CLP using historical data; their model identified various sources of uncertainty affecting prediction. Momade et al. [35] proposed a data-driven approach using support vector machine (SVM) and random forest (RF) to model and predict CLP. Their results showed that SVM achieved higher accuracy than RF. Recently, Sarihi et al. [36] developed a comparative analysis of CLP models using ANN, ANFIS, and a fuzzy inference system (FIS). They found that ANFIS showed better accuracy than the two other models.
However, in recent years, hybrid systems based on machine learning (ML), optimization algorithms, and simulation techniques have been applied to several construction problems because they are superior to standalone AI techniques [7,37]. Gerami Seresht and Fayek [26] developed a fuzzy system dynamics technique by integrating system dynamics and fuzzy logic to model the multifactor productivity of equipment-intensive activities. Tsehayae and Fayek [9] demonstrated the application of data-driven fuzzy clustering in the development of FIS; they then used genetic algorithm (GA)-based optimization to address the FIS limitation of being unable to learn from data. Khanzadi et al. [8] developed a hybrid simulation model by combining system dynamics and agent-based modeling to predict labor productivity by evaluating various influencing factors in a concrete placing project. Raoufi and Fayek [38] proposed the integration of fuzzy logic and agent-based modeling to predict the performance of construction crews, according to crew motivation and situational input variables. Gerami Seresht et al. [39] introduced a new fuzzy clustering algorithm, using Gustafson-Kessel's algorithm and Adam optimization to determine the number of clusters automatically and assign weights to the FIS rules to improve accuracy; they then used the proposed algorithm to predict CLP for concrete placing activities, and the results showed that the new approach improved accuracy and efficiency, compared to previous research. Although the aforementioned papers developed hybrid methods to model and predict construction productivity, very few studies have applied HFS methods, the combination of filter and wrapper FS methods, to find the most predictive factors in CLP prediction. Ebrahimi et al. [10] proposed the integration of ANN and GA as a wrapper method for selecting the most influential CLP factors and then predicting CLP using ANN.
The results showed accuracy improvement, compared to previous work using filter methods. Recently, Cheng et al. [7] introduced a hybrid model, including least square SVM, symbiotic organisms search, and wrapper-based FS methods to predict construction productivity. Goodarzizad et al. [40] proposed the integration of ANN and the grasshopper optimization algorithm to identify the factors with the greatest influence on CLP. They then applied an ANN to measure CLP, using the identified factors.
One wrapper FS method that has been developed in other disciplines is the integration of SVM and GA, which shows appropriate efficiency in selecting the optimal feature subset. Fei and Min [41] developed the integration of SVM and GA to select a feature subset and optimize SVM parameters for solving binary classification problems. Furthermore, Tao et al. [42] presented a novel approach based on GA for feature selection and parameter optimization of SVM in hospitalization expense modeling. Given the advantages of HFS methods noted in the introduction, this study integrated the ReliefF algorithm as a filter method with SVM-GA as a wrapper method to develop the proposed HFS model.
Modeling the optimization process is another important challenge in CLP studies. Optimization techniques have been used in various construction domains. Jin et al. [43] proposed a workspace-based multi-objective optimization model to produce optimal solutions for scaffolding resource allocation and space planning. Lin and Lai [44] proposed a time-cost trade-off model to reduce project duration that used GA to evaluate variable productivity. Shahbazi et al. [45] presented a model, using mixed-integer nonlinear programming to allocate tasks to employees with different skill levels. However, the previous studies did not explore hybrid optimization in the area of construction labor productivity optimization by using evolutionary optimization techniques to evaluate the optimum value of factors influencing CLP.

Research Methodology
This paper presents an HFS method for identifying the factors that are most predictive of CLP and utilizing them as inputs of the developed CLP models, and develops a novel hybrid evolutionary optimization technique by integrating HFS, a predictive model, and particle swarm optimization (PSO) to optimize CLP and the factors that influence it. This proposed methodology for modeling, predicting, and optimizing CLP is accomplished in the following four main steps: (1) preparing CLP data, using different techniques (Section 3.1); (2) developing an HFS to reduce dimensionality and identify the factors most predictive of CLP (Section 3.2); (3) developing four algorithms widely used in predictive models, namely ANFIS, ANFIS-GA, ANN, and RF (Section 3.3); and (4) developing a hybrid optimization model, using PSO to search for the maximum CLP value and optimum value of each selected factor (Section 3.4). These four steps are presented in Figure 1.

CLP Data Identification
In this study, the proposed methodology was used to predict and optimize CLP of concrete placing activities, using the data collected by Tsehayae and Fayek [9] in a previous study. Data were collected in Alberta, Canada, in four construction project contexts, including residential and commercial warehouse buildings, residential and commercial high-rise buildings, industrial buildings, and institutional buildings. A literature review conducted by Tsehayae and Fayek [13] initially identified 169 factors that influence CLP. They collected 112 factors influencing CLP for concrete placing activities over 92 days of data collection. In this study, per Equation (1), CLP is defined as a ratio of output, which is installed quantity, to input, which is labor work hours; CLP has positive real values.

CLP = Installed quantity (output) / Labor work hours (input)    (1)

In the existing data set, some CLP factors are objective, such as crew size, which has a numerical measure (in terms of number of workers), while other factors are subjective, such as complexity of task, which does not have a well-defined measurement. Subjective factors were measured using a predetermined rating scale of 1-5, according to Tsehayae and Fayek [5]. CLP factors can be grouped into six levels: (1) activity, (2) project, (3) organizational, (4) provincial, (5) national, and (6) global.

CLP Data Preparation
The CLP data preparation process consists of normalization, imputing missing values, removing factors with zero variance, and eliminating outliers.
Most CLP data have varying scales, which increases training time, introduces bias in predictive models, and affects convergence in prediction [33]. Hence, the experimental data are normalized using Equation (2), a process called "max-min normalization": r_ij = (x_ij - x_jmin) / (x_jmax - x_jmin), where x_ij is the value of instance i for factor j; x_jmin and x_jmax are the minimum and maximum values of factor j, respectively; and r_ij is the normalized value of instance i for factor j. Max-min normalization guarantees that all features are mapped to the same [0, 1] scale.
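As an illustration, the max-min normalization described above can be sketched in a few lines of Python (a minimal sketch; the study's actual implementation is not shown, and the handling of constant columns is an assumption):

```python
def min_max_normalize(columns):
    """Max-min normalization per Equation (2):
    r_ij = (x_ij - x_j_min) / (x_j_max - x_j_min)."""
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no spread; mapping it to 0.0 is an assumption.
        normalized.append([(x - lo) / span if span else 0.0 for x in col])
    return normalized
```

For example, `min_max_normalize([[2, 4, 6]])` yields `[[0.0, 0.5, 1.0]]`.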
Data sets often have some missing values, due to human error or non-availability of real data. Imputation methods use ML algorithms to help estimate missing values. Based on Choudhury and Pal [46], the neural network-based imputation method is able to train a data set containing incomplete samples and identify instances similar to instances with missing values. Based on the results of several studies [46][47][48], neural network imputation was applied in the present study in order to impute missing values of CLP.
Standard deviation measures the dispersion of each factor in a data set. Removing factors with no variation across data instances is a standard pre-processing step [49]. In this study, CLP factors with zero standard deviation were removed from the data set.
Detecting and eliminating outliers is another essential step in data preparation. Although outliers are part of a data set, they are significantly different from other observations. In this study, Tukey's method, which utilizes the median and the upper and lower quartiles of a data set, was applied as an outlier detection method [50]. Since quartiles are resistant to extreme values, Tukey's method is less sensitive to outliers than methods based on the mean and standard deviation [50].
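Tukey's method can be sketched as follows, assuming the conventional fences Q1 - 1.5·IQR and Q3 + 1.5·IQR (the paper does not state its fence multiplier, so k = 1.5 is an assumption):

```python
def tukey_outliers(values, k=1.5):
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    s = sorted(values)
    n = len(s)

    def quartile(q):
        # Linear interpolation between the closest ranks.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return s[lo] + (pos - lo) * (s[hi] - s[lo])

    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]
```

For example, `tukey_outliers([1, 2, 3, 4, 5, 100])` flags only `100`.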

Hybrid Feature Selection (HFS)
The developed HFS is a combination of the ReliefF algorithm as a filter method and the integration of SVM and GA as a wrapper method [51] and is utilized to identify the factors that are most predictive of CLP. The structure of three algorithms, namely ReliefF, SVM, and GA, are briefly discussed in the following sections.

ReliefF
Relief is a widely used filter-based FS method that identifies the best subset of features by measuring feature weights. Proposed by Kira and Rendell [52], this algorithm assigns weights to features based on the correlation between features and categories, and then selects all features with weights greater than an artificial threshold. Notably, the Relief algorithm is limited to binary classification problems. To address this problem, Kononenko [53] proposed the ReliefF algorithm, which has the ability to work with multiclass problems. ReliefF is a distance-based feature selector that uses Manhattan distance to measure weights. The evaluation criterion of ReliefF is presented in Equation (3), where W(f_0,i) stands for the weight of the ith feature before updating; W(f_i) is the updated weight of the ith feature; A is the vector of features; k is the number of nearest neighbors; m is the number of cycles; f_h(x_i) and f_r(C) are the values of the k nearest neighbors of x_i in the same class and a different class, respectively; P(C) is the ratio of the target samples C to the total samples; P(class(x_i)) is the ratio of the samples in the same class as x_i to the total samples; and diff() denotes the distance between two samples on each feature in A.
This study uses Manhattan distance to measure the distance between two samples, as shown in Equation (4), where diff(A, R1, R2) is the difference between samples R1 and R2 on the vector of features A.
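A minimal sketch of the per-feature diff() and the resulting Manhattan distance, assuming numeric features scaled by their value ranges (a common ReliefF convention; the paper's exact form is given by Equations (3) and (4)):

```python
def manhattan_diff(a, r1, r2, ranges):
    # diff(A, R1, R2): distance between samples r1 and r2 on feature a,
    # scaled by that feature's value range so all features are comparable.
    return abs(r1[a] - r2[a]) / ranges[a] if ranges[a] else 0.0

def sample_distance(r1, r2, ranges):
    # Total Manhattan distance: the sum of per-feature differences.
    return sum(manhattan_diff(a, r1, r2, ranges) for a in range(len(r1)))
```

For example, with feature ranges `[10, 2]`, the distance between `[0, 0]` and `[5, 1]` is `0.5 + 0.5 = 1.0`.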
In this study, ReliefF selected the most correlated CLP factors as its output. The factors selected by ReliefF were then applied as inputs to the combination of SVM and GA.

Support Vector Machine (SVM)
SVMs can solve linear and nonlinear problems and provide powerful classification results [54]. The most important advantage of SVM is that it can control overfitting and high dimensionality while decreasing computational complexity and avoiding local extrema [42]. Introducing a kernel function can facilitate the solving of nonlinear problems. Types of kernel functions include linear, polynomial, and sigmoid functions. The radial basis function (RBF), presented in Equation (5), is one of the most popular kernel functions because it requires only one parameter, δ, a free parameter with a significant effect on classification accuracy, and has lower complexity compared with other functions [41,55]. Another essential parameter in SVM problems is C, the penalty factor, which represents the cost of misclassification. Given the significance of C and δ for the result of SVM, they need to be optimized to obtain the desired accuracy, which can be accomplished using GA optimization.
GA is an adaptive heuristic search algorithm whose goal is finding an optimal solution [41]. GA uses a fitness function to estimate the significance of results from the evaluation step. Two GA operators, the mutation and crossover functions, randomly modify chromosomes and affect the fitness value [56]. Crossover selects two chromosomes that will generate a new offspring chromosome, while mutation is the process used to change genes in chromosomes from their initial state [10,57]. This study selected the four best chromosomes to be part of the next generation and used single-point crossover and binary mutation.
GA minimizes the fitness function value, denoted FF and calculated for each chromosome using Equation (6), where SVM_RMSE is the root mean square error (RMSE) of an SVM model, w is a weight for the specified number of factors (n_f), s_i is '1' if factor i is selected or '0' if it is not, and c_i is the cost of factor i. In the first step of HFS, ReliefF measures the weights of all CLP factors based on their correlations. It then selects all factors with weights greater than or equal to a defined threshold as inputs of the wrapper method. In the second step, GA randomly generates the initial population, where each chromosome is an available feature subset for the problem. In the third step, using the selected factors from ReliefF as inputs, training of the SVM model begins and the RMSE of SVM is measured; the FF calculation for each chromosome is then completed. In the fourth step, if the result meets the termination criteria, the process stops; otherwise, the process goes back to the GA operation to find a better solution. Once the termination criteria are satisfied by the final generation, the iteration stops, and the final generation contains the factors that most influence CLP.
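Equation (6) is not reproduced in the text, so the following sketch assumes FF adds a weighted factor-cost penalty to the SVM error, consistent with the symbols defined above; the exact functional form may differ:

```python
def fitness(svm_rmse, selection, costs, w):
    """Hypothetical fitness FF for one chromosome: prediction error plus
    a weighted penalty for the cost of the selected factors.
    `selection` is the binary chromosome (s_i); `costs` holds c_i."""
    penalty = sum(s * c for s, c in zip(selection, costs))
    return svm_rmse + w * penalty
```

Lower FF is better: a chromosome that keeps accuracy while selecting fewer, cheaper factors wins.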

CLP Predictive Modeling
According to the literature review on past CLP modeling techniques, ANN and ANFIS have been found to perform well and were thus chosen for this study. ANN is a suitable model for the complex relationships between CLP and the factors that influence it, as these relationships cannot be captured in a precise manner [12,24]. ANFIS models have been widely used in past CLP studies because they are less reliant on expert knowledge and follow a systematic data-driven process [36]. In order to optimize the ANFIS parameters, the integration of ANFIS and GA was also developed. Another algorithm that has shown accurate performance in a number of studies in other disciplines is RF, which was developed and compared with the other techniques in this study. Results from past studies show that RF is highly capable of solving non-linear classification problems, compared to other ML models [35]. Since most crucial factors related to CLP do not follow a normal distribution, RF is a common ML technique for modeling construction productivity [58]. The following sections discuss the structure and components of these four widely used ML modeling techniques developed in this study.

Artificial Neural Network (ANN)
In the past few decades, ANN has become a popular and helpful technique for classification, clustering, pattern recognition, and prediction in many disciplines [59]. ANNs are able to deal with noisy or incomplete data and can be very effective, particularly in modeling problems where the relationships between inputs and outputs are not sufficiently known [60]. Based on these abilities, ANN-based models can be ideal alternatives for modeling CLP. An ANN consists of three types of layers: an input layer, hidden layers, and an output layer. In this study, a multilayer feedforward back-propagation network with one hidden layer was developed as an ANN model to predict CLP.
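A single forward pass of such a network can be sketched as follows, assuming sigmoid hidden units and a linear output, a common regression setup (the paper does not state its transfer functions):

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass of a single-hidden-layer network:
    sigmoid hidden units, linear output unit."""
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
        for w, b in zip(w_hidden, b_hidden)
    ]
    # Linear combination of hidden activations gives the predicted CLP.
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out
```

Back-propagation then adjusts `w_hidden`, `b_hidden`, `w_out`, and `b_out` to reduce the prediction error on the training data.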

Adaptive Neuro Fuzzy Systems (ANFIS)
ANFIS is a hybrid technique that integrates the linguistic interpretability and fuzzy reasoning of FIS and learning capability of ANN in order to map inputs to an output [61]. In an ANFIS structure, fuzzy rules are extracted from ANN and the parameters of fuzzy membership functions are adaptively utilized during the hybrid learning process [62].

ANFIS-GA
The combination of ANFIS and GA is used to improve the performance of the ANFIS-based model and optimize its parameters. GA is utilized to find the optimum parameters of ANFIS.

Random Forest (RF)
RF can be considered an ensemble of classification and regression trees (CART), since multiple CART models are generated and used as base models [35,58]. In this approach, RF first generates several training data sets by sampling randomly from the original training data set. After generating the new training data sets and before the tree-splitting process, RF implements variable randomization to boost the diversity of trees. As both the training data and variable sets are generated randomly, the trees in RF are different from each other and independent [63]. RF then combines all trees by averaging their predictions. This joint prediction process increases accuracy and decreases large errors [64].
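The bootstrap-and-average recipe described above can be sketched as follows; `fit_tree` is a hypothetical stand-in for CART training, which is omitted for brevity:

```python
import random

def bootstrap_forest(X, y, n_trees, fit_tree, seed=0):
    """Sketch of the RF recipe: draw bootstrap samples, fit one base
    model per sample, and average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) indices with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x):
        # Joint prediction: average over all base models.
        preds = [m(x) for m in models]
        return sum(preds) / len(preds)

    return predict
```

For illustration, passing `fit_tree=lambda Xs, ys: (lambda x, m=sum(ys)/len(ys): m)` uses a trivial mean predictor as the base model; a real RF would fit a CART with per-split variable randomization.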

CLP Optimization
In the last step of the methodology, the PSO algorithm searches for the optimum values of CLP and the factors influencing it, using the predictive model proposed in the previous section.
PSO is one of the swarm intelligence-based algorithms, first proposed by Kennedy and Eberhart [65]. PSO is simple to implement and is able to find solutions with acceptable accuracy, which makes it popular [66]. Each particle maintains three D-dimensional vectors: a position vector, a velocity vector, and a personal best vector. A particle retains its current position in the position vector X_i = (x_i^1, x_i^2, ..., x_i^D), for i = 1 to N (N = number of particles). Particles obtain their initial positions randomly in the search space. The velocity vector V_i = (v_i^1, v_i^2, ..., v_i^D) of the ith particle is utilized to update its position, and it also obtains its initial value randomly. The best position attained by the ith particle is preserved in the personal best vector, denoted Pbest_i = (Pbest_i^1, Pbest_i^2, ..., Pbest_i^D). Similarly, the swarm best position is denoted Gbest = (Gbest^1, Gbest^2, ..., Gbest^D). The movement of a particle corresponds to updating its velocity and position in the tth iteration (t = 2, 3, ...), based on Equations (7) and (8):

v_i^d(t) = w v_i^d(t-1) + c_1 r_1 (Pbest_i^d - x_i^d(t-1)) + c_2 r_2 (Gbest^d - x_i^d(t-1))    (7)

x_i^d(t) = x_i^d(t-1) + v_i^d(t)    (8)
where w is the inertia weight, c_1 is the cognitive acceleration coefficient, c_2 is the social acceleration coefficient, and r_1 and r_2 are random values between 0 and 1. Figure 2 presents a flowchart of the PSO algorithm. The optimization phase of this study pursued the following goals: • Goal 1: The predicted CLP (CLP_Pred) has minimum deviation from the "targeted CLP" (CLP_tgt), as shown in Equation (9), where ω is the relative importance of Goal 1 compared to Goal 2.
• Goal 2: The predicted CLP factors (F_Pred,i) have minimum deviation from the "average value of factors" (F_Avg,i) in the data set, among all possible combinations of improvement scenarios, as shown in Equation (10).
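The velocity and position updates of Equations (7) and (8) can be sketched as one PSO iteration; the coefficient values here are illustrative defaults, not the study's settings:

```python
import random

def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One PSO iteration: pull each particle's velocity toward its
    personal best and the swarm best, then move the particle."""
    rng = rng or random.Random(0)
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = rng.random(), rng.random()
            velocities[i][d] = (w * velocities[i][d]
                                + c1 * r1 * (pbest[i][d] - positions[i][d])
                                + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
    return positions, velocities
```

A particle already sitting at both its personal best and the swarm best, with zero velocity, stays put, as the update equations require.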
In Goal 1, "targeted CLP" is the preferable CLP that a company tries to achieve. In this study, the value of CLP is between 0 and 1 after the normalization process, and greater CLP indicates better productivity in a project. Goal 1 tries to predict CLP considering the minimum distance from the targeted CLP.
Goal 2 tries to minimize changes in the factors that most influence CLP. Companies generally prefer minimal changes and corrective measures to achieve the preferred CLP because of the cost of implementing new strategies and corrective measures. In Goal 2, the average value of each factor is obtained from the existing CLP data set, which is discussed in Section 4. Since obtaining a value near the average value of each factor in the data set is feasible, the goal is to minimize the distance between the average and optimum values for each factor. Therefore, the objective function is defined as in Equation (11), where CLP_tgt and CLP_Pred are the targeted and predicted CLP, respectively, n is the number of selected factors affecting CLP, F_Pred,i is the predicted value of the ith CLP factor, F_Avg,i is the average value of the ith CLP factor in the data set, ω is the relative importance of Goal 1 compared to Goal 2, and Z is the minimum value of the objective function. The weight ω ranges from 0 to 1, where 0.5 means that Goals 1 and 2 have equal importance.
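Equation (11) is not reproduced in the text, so the following sketch assumes an ω-weighted blend of the Goal 1 and Goal 2 deviations, consistent with the symbols defined above; the exact form may differ:

```python
def objective(clp_pred, clp_tgt, f_pred, f_avg, w):
    """Hypothetical two-goal objective Z: a w-weighted blend of the
    deviation from the targeted CLP (Goal 1) and the mean deviation
    of the selected factors from their data-set averages (Goal 2)."""
    goal1 = abs(clp_tgt - clp_pred)
    goal2 = sum(abs(p - a) for p, a in zip(f_pred, f_avg)) / len(f_pred)
    return w * goal1 + (1 - w) * goal2
```

PSO then searches the factor space for the combination minimizing Z; Z = 0 means the targeted CLP is reached with no factor moved away from its average.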
The outputs of this model are CLP_Pred, which is the optimized and predicted CLP value, and F_Pred,i, which is the predicted value of the factors influencing CLP.

CLP Data Preparation and Feature Selection
Based on the data preparation process, the CLP value was normalized, using Equation (2). As a result of normalization, the CLP value was between 0 and 1, and greater CLP indicates better labor productivity for the project. After imputing missing values and removing factors with zero standard deviation, the number of factors was reduced to 108. By eliminating outliers from the CLP data set, 7 data points were removed as outliers, and the total number of data points became 85. Therefore, the CLP data set after the preparation process had 85 data points, 108 CLP factors, and a CLP value.
Next, the number of features was reduced by the proposed HFS method. For this study, a threshold of 0.25 was defined for ReliefF: all features with weights greater than or equal to 0.25 were selected as essential features for the next HFS stage. From the 108 factors in the final CLP data set, ReliefF selected 43 as essential features. In the next stage of HFS, the integration of SVM and GA as a wrapper method, the GA parameter settings were a population size of 50, a GA maximum iteration of 60, a crossover rate of 0.83, and a mutation rate of 0.2. The SVM penalty factor C was 10, the kernel type was RBF, and the kernel cache was 200. These parameters were obtained by trial and error and are the optimum values for this case. The termination criteria were a maximum of 60 generations or no improvement in performance over 5 generations. The proposed wrapper method was developed considering these parameters, and it selected 14 of the 43 factors identified by ReliefF. The set of 14 factors corresponded to the wrapper-model run with the lowest RMSE. Table 1 presents the selected CLP factors resulting from HFS. As shown in Table 1, the first 11 factors are all from the activity level, and the next three factors belong to the project level, which shows the significant impact of activity-level factors on predicting CLP. Among the selected factors, "Level of interruption and disruption", "Complexity of task", "Working condition (dust and fumes)", "Location of work scope (elevation)", and "Congestion of work area" negatively influence CLP. In other words, after normalization, when negatively influencing factors have values close to 0, they result in greater CLP than when their values are close to 1. The other selected factors are positively influencing factors; when their values are close to 1, they result in greater CLP.
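The ReliefF filter stage with the 0.25 threshold described above can be sketched as:

```python
def filter_by_weight(weights, names, threshold=0.25):
    """Filter stage of the HFS: keep every factor whose ReliefF
    weight meets the threshold (0.25 in this study)."""
    return [n for w, n in zip(weights, names) if w >= threshold]
```

The surviving factors (43 of 108 in this study) become the candidate pool for the SVM-GA wrapper stage.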

CLP Modeling Comparison and Results
To develop the predictive CLP model, four different AI models were developed using the selected factors from HFS as input variables and CLP as the output. The accuracy of the four models was measured by comparing their predictions to the actual field data and calculating two commonly used error measures, mean absolute error (MAE) and RMSE, which are shown in Equations (12) and (13), where t_i and y_i are the actual and predicted CLP values for the ith instance, respectively, and m is the number of instances. For this purpose, the data were divided into training and testing data sets, with 70% of the data used for training and 30% for testing. The ANN model was developed using the MATLAB NN Toolbox as a multilayer feedforward back-propagation network with two hidden layers of sizes 5 and 6. The learning rate was set to 0.33, and 200 training cycles were performed. The ANN model resulted in an RMSE of 0.164 and MAE of 0.130 for the training data set, and an RMSE of 0.165 and MAE of 0.135 for the testing data set.
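The error measures of Equations (12) and (13) are the standard MAE and RMSE and can be computed as:

```python
import math

def mae(actual, predicted):
    # Mean absolute error over m instances: (1/m) * sum |t_i - y_i|.
    return sum(abs(t - y) for t, y in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Root mean square error: sqrt((1/m) * sum (t_i - y_i)^2).
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(actual, predicted)) / len(actual))
```

RMSE penalizes large individual errors more heavily than MAE, which is why both are reported for each model.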
The ANFIS model was generated using the ANFIS function of MATLAB Fuzzy Logic Toolbox. The basic learning rules for optimizing membership functions in ANFIS are either hybrid learning or back-propagation gradient descent. Hybrid learning combines the gradient descent and least square methods, and it overcomes the major limitation of the back-propagation method, which is that the learning process gets trapped in the local minima. Therefore, this study used the hybrid learning method. The training data set was grouped using subtractive clustering with an influence range of 0.4, squash factor of 1.15, and accept and reject ratios set at 0.5 and 1.15, respectively. The selected CLP factors were used as input variables and CLP as the output of ANFIS. The ANFIS model resulted in an RMSE of 0.042 and MAE of 0.034 for the training data set and an RMSE = 0.176 and MAE = 0.138 for the testing data set.
The ANFIS-GA model, developed using MATLAB, optimizes the ANFIS parameters, and it showed better performance than ANFIS alone. In this study, the values of 0.2, 0.83, and 60 were assigned to the mutation rate, crossover percentage, and maximum iteration of GA, respectively. These parameters were obtained by trial and error and are the optimum values for this case. Different population sizes were tested, and based on the results shown in Table 2, the ANFIS-GA model with a population size of 25 had the best testing performance, with an MAE of 0.096 for the training data set and an MAE of 0.129 for the testing data set. Therefore, a population size of 25 was used in this study.

The RF model was developed using the Python programming language and required three parameters, namely the minimum number of terminal nodes for each tree, the number of trees, and the number of randomly selected variables to grow the trees [63]. In this study, these three parameters were set to 5, 145, and 6, respectively. The results of the RF prediction model are listed in Table 3, along with the results of the ANN, ANFIS, and ANFIS-GA models for comparison. The results presented in Table 3 indicate that the RF model had the highest accuracy among the four predictive models, with an RMSE of 0.137 and MAE of 0.112 on the testing data set. The second most accurate model was ANN, with a testing data set RMSE of 0.165 and MAE of 0.135. The third most accurate was the combination of ANFIS and GA, with an RMSE of 0.172 and MAE of 0.129 on the testing data set. Finally, a testing data set RMSE of 0.176 and MAE of 0.138 indicate that the ANFIS model was the least accurate.
According to the RMSE value of 0.137 for the RF testing data set, the CLP values predicted by RF were closer to the actual CLP values than those of the other three developed models. In other words, RF outperformed ANN, ANFIS, and ANFIS-GA in mapping the relationship between the selected CLP factors and CLP. Moreover, the closeness of the RMSE values for the training and testing data sets indicates that ANN and RF were more stable than ANFIS and ANFIS-GA. Therefore, the RF model was selected to predict CLP in the optimization process of this study. Comparing the results of this study with past studies indicates that the proposed RF predictive model performs better. For example, Gerami Seresht et al. [39] obtained an RMSE of 0.22 for their proposed CLP predictive model, while in this study, using the same data set, the RF model achieved an RMSE of 0.137. Therefore, the proposed CLP predictive model achieved better accuracy in CLP prediction than that of Gerami Seresht et al. [39].

CLP Optimization Results
Next, the integration of RF and PSO was developed to obtain the optimum value of the selected factors and the maximum CLP value, according to the objective function in Equation (11). For this case study, the average value of each factor (F_Avgi) and of CLP after normalization are shown in Table 4; the average CLP value for the data set is 0.259.

Table 4. Average values of selected factors and CLP of the data set.

To illustrate the CLP improvement trend, a sensitivity analysis was carried out to show the influence of different values of the input parameters (ω and CLP_tgt) on the output variables (CLP_Pred and Z). Table 5 shows the results of the sensitivity analysis, listing the values of Z and predicted CLP as outputs for different values of ω and CLP_tgt as inputs of the RF-PSO model. The value of ω was varied between 0.27 and 1; ω = 1 is the largest possible value of ω and indicates that Goal 2 has no impact on the model. CLP_tgt was varied in the range of 0.45 to 1, where CLP_tgt = 1 is the largest possible CLP value resulting from the normalization process. Figure 3, which is based on the results in Table 5, shows the value of CLP_Pred for different values of ω and CLP_tgt. For a given CLP_tgt, CLP_Pred increases as ω increases, which shows the model's sensitivity to ω, the relative importance of Goal 1. For CLP_tgt = 0.45 and CLP_tgt = 0.6, the changes in CLP_Pred are much smaller for ω greater than 0.4. It can therefore be concluded that when CLP_tgt is less than or equal to 0.6, the most appropriate value of ω is less than or equal to 0.4. This means that the minimum deviation of F_Predi (the predicted values of the CLP factors) from F_Avgi (the average values of the CLP factors in the data set), i.e., Goal 2 in Equation (10), has more weight than the minimum deviation of CLP_Pred from CLP_tgt, i.e., Goal 1 in Equation (9).
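The two-goal objective underlying this sensitivity analysis can be sketched as follows. The exact form of Equations (9)-(11) is not reproduced in this section, so the weighted sum of absolute deviations below (and the function name `objective_z`) is an assumption, chosen to be consistent with the described behavior in which ω = 1 removes the effect of Goal 2:

```python
import numpy as np

def objective_z(clp_pred, clp_tgt, f_pred, f_avg, w):
    """Assumed weighted-sum objective: Goal 1 penalizes deviation of
    predicted CLP from the target; Goal 2 penalizes deviation of the
    factor values from their data-set averages."""
    goal1 = abs(clp_pred - clp_tgt)
    goal2 = np.mean(np.abs(np.asarray(f_pred) - np.asarray(f_avg)))
    return float(w * goal1 + (1.0 - w) * goal2)
```

Setting w = 1.0 makes the second term vanish, matching the statement that ω = 1 means Goal 2 has no impact on the model.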
For selecting the most appropriate weight and targeted CLP, a company's preferences are important. Most companies prefer minimum deviation from the average value of the factors, which is feasible to reach, helps them decrease the number of corrective measures required, and thus reduces the cost of implementing corrective measures. Accordingly, Goal 2 needs to have more weight than Goal 1, which leads to selecting a value of ω less than or equal to 0.5 as the weight of Goal 1.
For this case study, a targeted CLP (CLP_tgt) of 0.75 and an ω of 0.27 were selected. Equation (14) gives the objective function of the HFS-RF-PSO algorithm for the selected factors. In the presented algorithm, the number of particles was set to 50, the maximum number of iterations to 30, and the maximum velocity to 2, and the learning factors c1 and c2 were both set to 2.05. The initial parameter values were established on the basis of the relevant literature [67], and a large number of trials were performed to obtain the optimum values for this case.
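A minimal PSO loop with the settings above can be sketched as follows. The sphere function stands in for the RF-based objective Z (which requires the trained RF model), and the inertia weight of 0.7 is an assumed value not reported here:

```python
import numpy as np

def pso(objective, dim, n_particles=50, n_iter=30, v_max=2.0,
        c1=2.05, c2=2.05, w_inertia=0.7, seed=0):
    """Minimize `objective` over [0, 1]^dim with a basic PSO loop."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_particles, dim))      # positions in [0, 1] (normalized factors)
    v = np.zeros((n_particles, dim))
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    g = pbest[np.argmin(pbest_val)].copy()  # global best position
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w_inertia * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)       # maximum velocity = 2
        x = np.clip(x + v, 0.0, 1.0)        # keep factors in the normalized range
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return g, float(pbest_val.min())

# Placeholder objective: a sphere function over the 14 factor dimensions.
best_x, best_z = pso(lambda p: float(np.sum((p - 0.5) ** 2)), dim=14)
```

In the actual model, the objective would be Equation (14), evaluated by feeding each particle's factor vector to the trained RF model to obtain CLP_Pred.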
Based on the selected inputs, the RF-PSO model yielded 0.057 as the minimum value of Z, i.e., the minimum value of the objective function (Equation (14)), and 0.522 as the maximum value of predicted CLP (CLP_Pred). The optimum value of each factor, obtained from the RF-PSO model as the predicted value of the CLP factors (F_Predi), is shown in Table 6, along with the deviation of each optimum value from the average value. The deviation from the average value was computed using Equation (15):

Deviation_i = F_Predi − F_Avgi (15)

As shown in Table 6, the optimum values of the factors "Ground condition," "Crisis management," and "Risk monitoring and control" have the least deviation from the average values of the selected factors in the data set (F_Avgi), with deviations of 0.004, −0.005, and 0.007, respectively. Therefore, these factors do not need major changes to achieve the optimum CLP value of 0.522. Notably, the optimum values of "Level of interruption and disruption," "Working condition (dust and fumes)," and "Fairness in performance review of crew by foreman" have the largest deviations from the average values of the factors, at −0.119, −0.110, and 0.114, respectively. In other words, "Level of interruption and disruption" needs to be reduced to 0.043, "Working condition (dust and fumes)" needs to be reduced to 0.108, and "Fairness in performance review of crew by foreman" needs to increase to 0.808 in order to obtain the optimum CLP value. Improving the factors with high deviation helps companies reach the optimum predicted CLP. To improve factors that deviate strongly from their average value, a number of improvement strategies and corrective measures can be implemented; for example, to reduce dust and fumes in the working area, preventive maintenance of the air-conditioning system can be conducted.
The proposed HFS-RF-PSO model has the potential to benefit construction companies in achieving their preferred labor productivity by applying minimum changes to the factors influencing CLP. Another capability of the proposed model is that companies can define their own targeted value for each factor influencing CLP instead of using the average value of the factors. The model then returns the predicted CLP and predicted factor values that minimize the deviation from the targeted values of both CLP and each factor. This novel approach can help companies identify the factors that need the most change to achieve their targeted CLP and, consequently, prioritize the management practices that focus on the factors with the greatest deviation from their average value in the HFS-RF-PSO model.
The proposed HFS-RF-PSO model has a few limitations that need to be addressed in future research. First, the hybrid model was developed using field data collected for concrete placing activities. In order to develop a generic CLP model for different types of labor-dependent activities, new field data need to be collected. Second, although the PSO algorithm is computationally efficient compared to other optimization techniques and is robust with respect to its control parameters, it can fall into a local optimum in high-dimensional spaces. In future research, an adaptive PSO algorithm can be developed and added to the hybrid model to improve the diversity of the algorithm and avoid convergence to local optima.

Conclusions and Future Work
Developing models for predicting and improving a project's labor productivity is challenging because of the complexity of construction projects. Hence, accurate CLP prediction is required for effective decision making before and during project execution. The fact that numerous factors affect CLP is the main challenge in modeling labor productivity. This study addressed this challenge by developing an HFS method prior to CLP prediction. The main aim of this study was to develop a novel approach for predicting and optimizing CLP. After developing the HFS method, four different predictive models were developed and compared, using ANN, ANFIS, ANFIS-GA, and RF, with the 14 factors selected by HFS as inputs and CLP as the output. The comparative analysis of the four predictive models showed that the RF model achieved better accuracy than the other three models. RF, as the most accurate predictive model, was then integrated with PSO to identify the optimum values of the influential factors and the maximum CLP value. The proposed HFS-RF-PSO model is capable of obtaining an optimized CLP close to a company's preferred value while minimizing the deviation of the predicted CLP factors from the average values of the factors in a data set. Based on the results of the HFS-RF-PSO model using this data set, among the 14 selected factors, "Level of interruption and disruption," "Working condition (dust and fumes)," and "Fairness in performance review of crew by foreman" have the largest deviations from their average values, which means major improvements in these factors are needed in order to obtain the optimum CLP. Furthermore, comparing the four most common ML models highlighted some critical modeling features of the presented models, which can assist researchers in future studies.
The contributions of this study include (1) identifying the most predictive factors for CLP by developing an HFS model that integrates ReliefF and SVM-GA, (2) developing and comparing four different predictive models for CLP and identifying the most accurate model, and (3) developing a novel approach, the HFS-RF-PSO algorithm, for optimizing the factors that influence CLP and identifying the maximum CLP value, considering the minimum deviation from the targeted CLP value, while also finding the optimum values of the selected factors by minimizing their deviation from their average values in the data set. The proposed HFS-RF-PSO model helps project managers predict, optimize, and improve the CLP value, taking into account the factors that are most predictive of CLP. Although construction projects are unique and the factors affecting CLP may differ from project to project, the proposed model is flexible, and new influencing factors can be added to the existing model structure to predict and optimize CLP and its factors for a given project. The results of this study and the implementation of the HFS-RF-PSO model will help project managers identify causes of low labor productivity and select and prioritize corrective measures, based on the deviation of the factors in the model, to improve CLP. The model also enables project managers to improve the reliability of predictions.
Future research can focus on using the proposed methodology to model and optimize multifactor construction productivity, which includes labor, equipment, and materials. Furthermore, future studies can present corrective measures to improve CLP, according to the HFS-RF-PSO results that show which factors need the most changes for reaching the targeted CLP. In addition, using new field data from other labor-dependent activities will help researchers overcome one of the mentioned limitations of this study and develop a generic hybrid model for predicting and optimizing CLP.
Author Contributions: S.E.: conceptualization, methodology, formal analysis, software, investigation, writing-original draft preparation, writing-review and editing. A.R.F.: conceptualization, writing-review and editing, supervision, project administration, funding acquisition. V.S.: conceptualization, writing-review and editing. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: No new data were created or analyzed in this study. Data sharing is not applicable to this article.