Comparison Study of Machine Learning Optimisation Techniques for Predicting Concrete Compressive Strength


In this research, a comparison study of machine learning (ML) optimisation techniques for predicting the compressive strength of concrete is presented. Previous studies focused on identifying the best machine learning model by comparing ensemble, bagging, and fusion methods for predicting concrete strength. In this research, hyper-parameter optimisation of an ML model is used to improve the prediction accuracy and performance of the model. Extreme gradient boosting (XGBoost) is used as the base model for the prediction, as XGBoost has built-in ensemble, bagging, and boosting algorithms. Grid Search, Random Search, and Bayesian Optimisation are selected to optimise the hyperparameters of the XGBoost model. For this particular prediction study, the models optimised with Random Search performed better than those optimised with the other methods, showing substantial improvements in prediction accuracy, modelling error, and computation time.


Literature Review
Concrete has been commonly used in construction and architecture due to its favourable engineering properties. Because it can be cast and hardened at ambient temperatures, concrete is a popular option for constructing structural elements, especially in high-rise buildings [1]. The benefits of concrete include high compressive strength and excellent water resistance, making it the material of choice for structures that need a solid foundation to withstand critical environmental conditions, such as tunnels, dams, and reservoirs [2].
In general, concrete comprises four primary components: coarse aggregate, fine aggregate, cement, and water. Concrete's economic value and the ready availability of its raw materials in local markets allow it to be widely used in construction. It also offers notable benefits over other construction materials such as steel, and it can be produced with minimal effort. In certain instances, supplementary materials like fly ash (PFA) [1][2], ground granulated blast furnace slag (GGBS) [3], silica fume [4], and other industrial wastes are added to concrete to enhance its mechanical properties [3]. The introduction of industrial waste [5][6] into concrete offers environmental benefits while increasing the longevity and resiliency of concrete structures.
XGBoost, the algorithm behind numerous Kaggle competition wins, is a highly effective machine learning method for modelling structured datasets owing to its scalable tree boosting system and sparsity-aware split finding. However, in most previous studies, the XGBoost model was used only in comparison studies to evaluate its performance.

Objectives
In this paper, several hyper-parameter optimisation methods, namely Grid Search, Random Search, and Bayesian Optimisation, are used to optimise the prediction accuracy of the XGBoost models. As a reference, several regression-based machine learning models were used to evaluate and compare the performance of the XGBoost models.
The fundamentals of XGBoost models are described in Section 2, followed by the statistical properties of the dataset and the modelling approach in Section 3. The results of the XGBoost models under the various optimisation methods are compared, and the importance of each hyper-parameter optimisation is analysed in Section 4. The conclusions and the significance of model optimisation in XGBoost modelling are discussed in Section 5.

Theoretical Fundamentals of XGBoost
XGBoost is a decision tree-based ensemble machine learning algorithm that uses gradient boosting and is particularly effective for structured (tabular) data. The algorithm has been the source of countless cutting-edge applications and has driven many recent advances. It has been widely used in industrial solutions such as customer churn prediction [9], applicant risk assessment [10], malware detection [11], stock market selection [12], classification of traffic accidents [13], disease identification [14], and even prediction of patient mortality during SARS-CoV-2 (Covid-19) treatment [15]. In general, the XGBoost algorithm is the result of decision tree algorithms being improved over time. Figure 1 below shows the evolution from decision tree-based algorithms to XGBoost.

Figure 1 The Evolution of XGBoost
The most significant benefit of XGBoost is its scalability across a wide range of scenarios [16]. Like GBM's ensemble tree method, XGBoost employs the concept of boosting weak learners using a gradient descent architecture. However, XGBoost outperforms the GBM algorithm through superior system optimisation and algorithmic improvements. Figure 2 illustrates the advantages of XGBoost that contribute to its exceptional prediction results.

Figure 2 The Advantages of XGBoost
Parallelization
XGBoost builds trees in parallel on top of the sequential boosting process. The loop order is interchanged by initialising with a global scan and sorting with parallel threads, which increases algorithmic efficiency by allowing parallelisation without additional overhead.

Tree Pruning
Rather than stopping greedily when the loss reduction at a split point falls below a threshold, XGBoost grows trees to the specified max_depth and then prunes them backwards. This 'depth-first' approach increases computational efficiency.

Hardware Optimization
XGBoost is a cache-aware algorithm that allocates internal buffers in each thread to store gradient statistics. It is also enhanced with out-of-core computation for handling large datasets.

Regularization
The algorithm penalises overly complex models by applying LASSO (L1) and Ridge (L2) regularisation, which reduces model overfitting.

Sparsity Awareness
XGBoost automatically learns how to handle sparse or missing inputs, efficiently dealing with the different patterns of sparsity in the training dataset.

Weighted Quantile Sketch
The distributed weighted quantile sketch algorithm in XGBoost finds the optimal split points among weighted datasets.

Cross-validation
The built-in cross-validation in XGBoost removes the need to specify the exact number of boosting iterations in a single run.

The XGBoost algorithm applies an adaptive regularisation technique to the objective function; the regularisation term keeps the complexity of the model low and prevents overfitting, as shown in Eq. (1), where $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}$, with $T$ the number of leaves and $w$ the leaf weights.
The objective function is approximated with a second-order Taylor expansion, where $g_i = \partial_{\hat{y}^{(t-1)}}\, l\bigl(y_i, \hat{y}^{(t-1)}\bigr)$ is the first-order derivative and $h_i = \partial^{2}_{\hat{y}^{(t-1)}}\, l\bigl(y_i, \hat{y}^{(t-1)}\bigr)$ is the second-order derivative of the loss function. The aim is to establish optimal tree structures by incrementally adding partitions to the current leaf nodes.
When a decision tree node splits, two branches are formed. The information gain from the split is measured as the change in the objective function before and after the split, as defined in Eq. (4). If the splitting gain is less than the fixed gain, or the number of splits exceeds the maximum tree depth, splitting stops and the final model is obtained.
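For reference, the split gain of Eq. (4) is commonly written in the standard XGBoost formulation as below (reproduced here as an assumed form of the equation, with $G_L, H_L$ and $G_R, H_R$ denoting the sums of first- and second-order gradients in the left and right child nodes, $\lambda$ the L2 regularisation term, and $\gamma$ the fixed gain):

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right] - \gamma$$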

Hyperparameters in XGBoost
Most machine learning algorithms are known for learning patterns and regularities in the data automatically, and their behaviour is governed by hyperparameters. In machine learning models, the hyperparameters control choices such as the decision variables at each node and the numeric thresholds that influence the predictions. In XGBoost, hyperparameters are categorised into three groups: general parameters, booster parameters, and learning task parameters. Table 1 below shows the categories of hyperparameters and the critical parameters in XGBoost.

Booster Parameters
• min_child_weight (minimum sum of instance weights needed in a child)
• max_depth (maximum depth of a tree)
• gamma (minimum loss reduction required to make a split)
• subsample (fraction of observations to be randomly sampled for each tree)
• colsample_bytree (fraction of columns to be randomly sampled for each tree)
• colsample_bylevel (subsample ratio of columns for each split in each level)
• lambda (L2 regularisation term on weights)
• alpha (L1 regularisation term on weights)

Learning Task Parameters
Hyperparameters used to define the optimisation objective and the evaluation metric:
• objective (loss function to be minimised)
• eval_metric (evaluation metric for data validation)

The influence of hyperparameters on a model is well known; however, it is difficult to determine the best value for a hyperparameter, or the best combination of hyperparameters, for a given dataset. Hyperparameter optimisation, or hyperparameter tuning, is an approach used to evaluate various values for the model hyperparameters and select the subset that gives the model the best predictive results on a given dataset.
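To illustrate how the booster and learning task parameters listed above are exposed in practice, the short Python sketch below instantiates an XGBoost regressor with these hyperparameters set explicitly. The values shown are illustrative placeholders (mostly the library defaults), not the tuned values used in this study.

# Minimal sketch: the Table 1 booster and learning task parameters in code.
# The values shown are illustrative placeholders, not the tuned values of this study.
from xgboost import XGBRegressor

model = XGBRegressor(
    min_child_weight=1,      # minimum sum of instance weights needed in a child
    max_depth=6,             # maximum depth of a tree
    gamma=0,                 # minimum loss reduction required to make a split
    subsample=1.0,           # fraction of observations sampled for each tree
    colsample_bytree=1.0,    # fraction of columns sampled for each tree
    colsample_bylevel=1.0,   # subsample ratio of columns for each split, per level
    reg_lambda=1.0,          # L2 regularisation term on weights (lambda)
    reg_alpha=0.0,           # L1 regularisation term on weights (alpha)
    objective="reg:squarederror",  # learning task objective (loss to be minimised)
    eval_metric="rmse",      # evaluation metric for data validation
)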
In optimisation, a search space is established as an n-dimensional volume, where each hyperparameter characterises a different dimension and the value along that dimension may be a real, integer, or categorical value. Points in the search space are vectors holding specific values for the full set of hyperparameters. The goal of optimisation is to find the point that yields the best model output after training, i.e., the most accurate model or the model with the least error.
Various optimisation algorithms can be used; two of the most reliable and simplest are Grid Search and Random Search. For comparison, a more advanced optimisation technique, Bayesian Optimisation, was also used in this research.

Grid Search
Grid Search optimisation is an exhaustive parameter-searching approach in which the model is trained and evaluated with every combination of the parameter values and the optimal combination is selected. In Grid Search, the candidate values of the parameters are laid out as a multidimensional mesh grid.
Each mesh grid node represents one set of parameters, and a model is calibrated at each node according to the predetermined parameter values. To confirm which nodes are the most efficient, the algorithm screens the entire grid, similar to an exhaustive search. For simplicity and representativeness, the best possible outcomes are obtained with cross-validation [17].
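A minimal Python sketch of a grid search over an XGBoost regressor using scikit-learn's GridSearchCV is shown below. The grid values, the RMSE-based scoring, and the 5-fold cross-validation are illustrative assumptions rather than the exact search space used in this study.

# Minimal sketch: exhaustive grid search over an XGBoost regressor.
# The grid values below are illustrative, not the search space of this study.
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

param_grid = {
    "max_depth": [3, 6, 9],
    "learning_rate": [0.01, 0.1, 0.3],
    "subsample": [0.6, 0.8, 1.0],
}

grid = GridSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror"),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",  # optimise for RMSE
    cv=5,                                   # 5-fold cross-validation
)
# grid.fit(X_train, y_train)   # X_train, y_train: the 75% training split (assumed)
# grid.best_params_, grid.best_estimator_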

Random Search
Random Search optimises the hyperparameters of a model by evaluating random combinations of values from the parameter space; the objective function is tested at any number of randomly drawn combinations. The chances of discovering a near-optimal parameter set are relatively high in random search because the varied search patterns allow the model to be trained on diverse parameter combinations without aliasing. Random search is best suited to lower-dimensional problems, as it takes less time and fewer iterations to find a good parameter combination [18].
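A corresponding sketch of random search with scikit-learn's RandomizedSearchCV is given below; the sampling distributions and the number of iterations are assumptions for illustration only.

# Minimal sketch: random search over the same kind of parameter space.
# Distributions and n_iter are illustrative assumptions.
from scipy.stats import randint, uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBRegressor

param_distributions = {
    "max_depth": randint(3, 10),            # integers 3..9, sampled uniformly
    "learning_rate": uniform(0.01, 0.29),   # uniform on [0.01, 0.30]
    "subsample": uniform(0.6, 0.4),         # uniform on [0.6, 1.0]
}

search = RandomizedSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror"),
    param_distributions=param_distributions,
    n_iter=50,                               # number of random combinations tried
    scoring="neg_root_mean_squared_error",
    cv=5,
    random_state=42,
)
# search.fit(X_train, y_train)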

Bayesian Optimisation
The fundamental difference between Bayesian Optimisation and the other methods is that it builds a probabilistic model of the objective and then uses it to decide where to evaluate the function next, while explicitly accounting for uncertainty. The basic principle is to use all the information available from previous evaluations of the model rather than relying only on local gradient and Hessian approximations [19].
This allows the minimum of complex non-convex functions to be found with fewer evaluations, at the cost of computing where to look next. When using Bayesian Optimisation, two essential decisions must be made. Firstly, a prior over functions must be chosen to express assumptions about the function being optimised; a Gaussian process prior is used here for its simplicity and tractability. Secondly, an acquisition function is selected to construct a utility function from the model posterior, which is then used to determine the next evaluation point.
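A minimal sketch of this approach is given below using scikit-optimize's BayesSearchCV, which fits a Gaussian-process surrogate over the search space; this library choice and the search space values are assumptions, as the study does not state which implementation was used.

# Minimal sketch: Bayesian optimisation of XGBoost hyperparameters with a
# Gaussian-process surrogate (scikit-optimize's BayesSearchCV is one possible
# implementation; the search space values are illustrative).
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBRegressor

search_spaces = {
    "max_depth": Integer(3, 10),
    "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
    "subsample": Real(0.6, 1.0),
}

opt = BayesSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror"),
    search_spaces=search_spaces,
    n_iter=50,                               # number of surrogate-guided evaluations
    scoring="neg_root_mean_squared_error",
    cv=5,
    random_state=42,
)
# opt.fit(X_train, y_train)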

Model Structure
In general, the machine learning model structure involves several key processes, summarised in Figure 3. The final step, analysis and reporting, consists of a case study comparing various machine learning models, optimisation parameters, and evaluation metrics.

Figure 3 Step-by-step XGBoost modelling and optimisation

Data Collection and Pre-Processing
A dataset of concrete compressive strength with 1030 samples was collected from the UCI Machine Learning Repository [20]; its statistical properties are summarised in Table 2 below.

Figure 4 Pearson's Correlation Heatmap
As shown in Figure 4 above, the correlation between the input and output parameters is relatively low. The correlation coefficients are generally in the range of -0.66 to 0.50, and only aggregate is negatively correlated with all other parameters.
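A minimal sketch of how such a correlation heatmap can be produced is shown below; the file name and the use of pandas/seaborn are assumptions for illustration.

# Minimal sketch: Pearson's correlation heatmap for the concrete dataset.
# The CSV file name and column layout are assumed (a local copy of the UCI dataset).
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("concrete_data.csv")
corr = df.corr(method="pearson")        # pairwise Pearson correlation coefficients

sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Pearson's correlation heatmap")
plt.tight_layout()
plt.show()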
An additional statistical analysis of each parameter was also performed to ensure that the training dataset comprised input parameters whose values encompassed the entire range of the dataset. Figures 5-11 show the distributions of the input parameters and their correlations with the output parameter, i.e., strength. To train and evaluate the XGBoost prediction results, the dataset was randomly partitioned into two sets, a training set and a testing set. Around 75% of the primary dataset's records were used to train the XGBoost models, while 25% were used for testing. This 75%/25% split is widely used in past research [21][22][4][8].
Before training machine learning models, data pre-processing is required. To prevent training from being dominated by one or a few features with large magnitudes, features should be normalised so that their ranges are consistent. Here, the features are normalised to the range 0 to 1 before training by dividing the data points of each input feature by its highest magnitude. After prediction, the outputs are mapped back to their original scale.
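A minimal sketch of the split and scaling steps is shown below; the target column name and the use of scikit-learn's MaxAbsScaler (which divides each feature by its maximum absolute value, matching the description above) are assumptions, and MinMaxScaler would be a common alternative.

# Minimal sketch: 75%/25% split and 0-1 normalisation of the input features.
# Column names and the choice of scaler are assumptions based on the description above.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MaxAbsScaler

df = pd.read_csv("concrete_data.csv")
X = df.drop(columns=["strength"])       # eight input features (mix components and age)
y = df["strength"]                      # target: compressive strength (MPa)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42   # 75% training, 25% testing
)

scaler = MaxAbsScaler()                     # divides each feature by its maximum magnitude
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)    # reuse the training-set scaling for the test set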

Model Evaluation
In this paper, four separate statistical measures were used to quantify the prediction efficiency of the XGBoost models. In simpler terms, the evaluation parameters estimate the accumulated error in the predictions with respect to the actual observations. The statistical parameters are: coefficient of determination (R2), mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE). These mathematical formulations are defined in Eq. 6-10, where n is the total number of test dataset records and y′ and y are the predicted and measured values, respectively. The value of R2 ranges from 0 to 1; the closer the value is to 1, the better the fit of the model. The smaller the MAE, i.e., the difference between the predicted and measured values, the better the prediction of the model.
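As an illustration, these four metrics can be computed with scikit-learn as in the sketch below; the helper function and argument names are placeholders, not code from the study.

# Minimal sketch: the four evaluation metrics of Eq. 6-10 computed with scikit-learn.
# y_true are measured strengths, y_pred are model predictions (assumed arrays).
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    mse = mean_squared_error(y_true, y_pred)        # mean squared error
    return {
        "R2": r2_score(y_true, y_pred),             # coefficient of determination
        "MSE": mse,
        "MAE": mean_absolute_error(y_true, y_pred), # mean absolute error
        "RMSE": np.sqrt(mse),                       # root mean squared error
    }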

Initial Modelling
Initial modelling was undertaken using the default hyper-parameter settings of the XGBoost algorithm. For this research, five hyperparameters were selected for optimisation; Table 3 summarises these five hyperparameters and the default values used in the initial XGBoost modelling. The initial modelling reached scores of 0.95 and 0.88 for the training and testing models, respectively. The coefficient of determination, R2, reached 0.88, and the other evaluation metrics for the initial modelling are listed in Table 4. Figures 12 and 13 illustrate the distribution of predicted results compared to actual results and the best-fit line for the prediction distribution. Based on the evaluation metrics and prediction results, the initial modelling indicates a fair result; however, it can be further optimised using the Grid Search, Random Search, or Bayesian Optimisation models, as discussed in the following section.
Table 5 below summarises the optimised hyper-parameter values for the proposed Grid Search, Random Search, and Bayesian Optimisation models. The initial model was used as the reference model, and all three optimisation algorithms were run to optimise the five default hyper-parameters towards the best RMSE value. As indicated in Table 5, each optimisation algorithm returns a unique value for each of the five hyper-parameters. Moreover, the optimisation duration was recorded for each optimisation algorithm. Grid Search shows a significantly longer optimisation duration than Random Search and Bayesian Optimisation, recorded at around 1 hour.

Features Importance Analysis (Sensitivity Analysis)
In addition to model optimisation, a sensitivity analysis, or feature importance analysis, was performed to understand the influence of each feature (concrete component) on the predicted compressive strength. Figure 20 below displays all the features used in the compressive strength prediction model and their relative importance. Age is identified as the most important feature, followed by OPC as the second most important. Aggregate is the least important feature and has a notably low effect on the strength prediction.
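A minimal sketch of how such relative importances can be extracted from a fitted XGBoost model is shown below; the function and argument names are placeholders rather than code from the study.

# Minimal sketch: relative feature importance from a fitted XGBoost model.
# "model" is assumed to be the final optimised XGBRegressor and "feature_names"
# the eight input features (cement, slag, fly ash, water, superplasticiser,
# coarse aggregate, fine aggregate, age).
import pandas as pd

def feature_importance(model, feature_names):
    scores = pd.Series(model.feature_importances_, index=feature_names)
    return scores.sort_values(ascending=False)   # most important feature first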

Uncertainty Analysis
As the final step of the analysis, a new dataset was fed into the model to assess its performance and behaviour. A separate set of 30 batches was used for prediction with the final model. The evaluation metrics are listed in Table 8. The distribution of predicted results compared to actual results and the best-fit line for the prediction distribution are shown in Figure 21 and Figure 22, respectively. The final model with the new dataset shows acceptable prediction results and comparatively better evaluation metrics. The prediction distribution and the best-fit line also indicate an improved prediction, with R2 reaching 0.95; the evaluation metrics were improved significantly by the hyper-parameter optimisation algorithms. RMSE and MAE are essential indicators of the prediction accuracy of the model on the new dataset, which fits the primary purpose of a predictive model. Additionally, various machine learning algorithms were also modelled and compared to the XGBoost model optimised with the Random Search algorithm. Table 10 summarises the multiple ML models and their evaluation metrics for predicting the compressive strength of concrete.
In total, ten models were developed and compared to XGBoost, and only the 'Extra Trees Regressor' model achieved comparably good prediction results, with an R2 of 0.91 and an RMSE of 5.41.