Interpretable Ensemble-Machine-Learning models for predicting creep behavior of concrete

This study aims to provide an efficient and accurate machine learning (ML) approach for predicting the creep behavior of concrete. Three ensemble machine learning (EML) models are selected in this study: Random Forest (RF), Extreme Gradient Boosting Machine (XGBoost) and Light Gradient Boosting Machine (LGBM). Firstly, the creep data in Northwestern University (NU) database is preprocessed by a prebuilt XGBoost model and then split into a training set and a testing set. Then, by Bayesian Optimization and 5-fold cross validation, the 3 EML models are tuned to achieve high accuracy (R 2 = 0.953, 0.947 and 0.946 for LGBM, XGBoost and RF, respectively). In the testing set, the EML models show significantly higher accuracy than the equation proposed by the fib Model Code 2010 (R 2 = 0.377). Finally, the SHapley Additive exPlanations (SHAP), based on the cooperative game theories, are calculated to interpretate the predictions of the EML model. Five most influential parameters for concrete creep compliance are identified by the SHAP values of EML models as follows: time since loading, compressive strength, age when loads are applied, relative humidity during the test and temperature during the test. The patterns captured by the three EML models are consistent with theoretical understanding of factors that influence concrete creep, which proves that the proposed EML models show reasonable predictions.


Introduction
Creep is one of the most significant behaviors of concrete, which forms necessary basis for precise prediction and evaluation of both short-and long-term mechanical response of concrete structures under sustained loads.On one hand, for early-age concrete members that bear shrinkage-induced tension stress, creep essentially reduces the magnitude of tensile stress evolution [1,2].On the other hand, for long-term mechanical behavior of concrete structures, creep is a major reason accounting for serious time-dependent engineering problems such as substantial deflection and prestress loss in bridges [3,4].Therefore, an accurate estimation of creep behavior of concrete is important for long-term serviceability of concrete structures.For this, a number of experimental and numerical approaches have been developed, as described below.

Creep and relaxation have been experimentally studied for decades.
A wide range of variables have been investigated, including different concrete mixtures (i.e., w/c ratio, supplementary cementitious materials (SCM), fiber, admixture.)[5][6][7][8], environmental conditions (i.e., relative humidity, temperature.)[9][10][11] and loading schemes (i.e., compressive or tensile loading, age or duration of loading.)[12,13].These tests provided valuable data for understanding the creep behavior of concrete.However, accurate prediction remains a challenge because of high sensitivity of creep to wide ranges of parameters and interplay between these parameters [14].
In recent years, researchers have tried to decouple the influence of multiple variables to more accurately measure concrete creep under certain conditions.Hydration degree and its interaction with loading conditions can significantly influence the experimental results of creep tests [15,16].To overcome this, Wyrzykowski et al. [17] partly replaced unhydrated cement with an inert quartz powder to form a non-aging system, on which the mechanical properties of pseudo cement paste were calibrated to mimic the real one and then uniaxial compressive creep tests were conducted.With the concern that most researchers assumed an axial load in their creep/relaxation tests, Liang et al. [18] performed biaxial creep tests of high strength concrete using ring test method and concluded that creep properties under biaxial stress condition were significantly lower than that under uniaxial stress condition.Besides, in order to decouple the influence of shrinkage in creep tests, Liang et al. [19] also devised the methodology of flexural deflection test to exclude the influence of shrinkage on measurement of creep properties under the assumption of a symmetric distribution of relative humidity about the neutral plane.As an important factor influencing both the hydration reaction and the time-dependent change of microstructure, temperature can significantly influence creep behavior of concrete.Ladaoui et al. [20] tested the basic creep of 4 types of high-performance concrete under different temperatures, and observed a factor of 2-3 in basic creep when rising the temperature 20 • C and 50 • C. Similarly, the studies [21,22] also observed significant increase of concrete creep under elevated temperatures.These observations are in line with [23] and can be explained by distortion of water from the CSH gel.Meanwhile, To exclude the influence of temperature and achieve the aim of in-situ and early-age measurement of creep, a series of temperature-regulated stress testing methods were employed to investigate the influencing mechanisms of loading age and loading scenarios on creep [24][25][26].
Except for macro-scale tests mentioned above, micro-scale indentation tests have been conducted to investigate the intrinsic creep properties of C-S-H in recent years [27,28].Recently, Gan et al. [29][30][31][32] have proposed the methodology of micro-cantilever bending test to characterize the creep properties of cement paste, by which the effects of w/b ratio, binder type and stress level on creep measurement were investigated.Their results showed that microstructural features largely influence the global creep behavior.
To sum up, the state-of-the-art testing methods for creep of concrete can effectively improve the measurement accuracy by using more sophisticated loading scenarios, material processing methods, and advanced testing machines.However, an unavoidable challenge is that only a limited number of parameters can be tested and decoupled in laborious experimental works, which largely restraints the estimation accuracy of creep behavior of concrete under practical loading conditions.

Numerical methods
Based on experimental tests, a series of numerical, analytical and semi-empirical models were proposed to predict creep behavior of concrete.Bazant et al. [33] proposed a rate-type creep law to characterize the linear creep behavior of concrete, which can be interpreted by a Maxwell or a Kelvin Chain model.To avoid the need for storing all the stress or strain history of all FE elements and improve the computational efficiency, the exponential algorithm proposed by Carol et al. [34] was incorporated in the rate-type law.Meanwhile, in order to tackle the computational instability and non-uniqueness of a discrete Kelvin spectrum, Bazant et al. [35] proposed a continuous retardation spectrum to unambiguously fit the experimental creep curve, which can be discretized and then implemented in a complete creep analysis.Di Luzio et al. [36] systematically performed the rate-type creep analysis methods formed by Refs.[33][34][35] on 3 typical creep constitutive models (i.e., Eurocode 2, ACI Model, and Model B3), which further validated the effectiveness of rate-type creep model in characterizing the creep behavior of concrete under various loading scenarios.Similarly, Hedegaard et al. [37] assumed the concrete and reinforcement as a composite material and implemented the rate-type creep law to approximate the creep properties of reinforced concrete.Besides the rate-type method, many models were also proposed to simulate creep/relaxation behavior with emphasis on various influencing mechanisms (i.e., degree of hydration [38], relative humidity [39], loading eccentricity [40] and water-aggregate interaction [41]) and different numerical expressions of the creep constitutive model (i.e., time-varying generalized Maxwell model [42], age adjusted effective modulus [43] and parallel creep curve [44]).
Except for the modelling approaches that mainly consider a single influencing mechanism, more sophisticated models were developed to partly or fully characterize the interplay between creep, shrinkage/ swelling, damage, hydration and heat transfer.Based on experimental observations, time-dependent softening models were developed to simulate the creep failure of concrete under different levels of sustained loads [45][46][47].Moreover, multi-physics models [48][49][50][51][52][53] were developed to incorporate almost all significant aspects of concrete behaviors including creep, shrinkage, hydration and crack.These models coupled the hydro-thermo-chemo-mechanical mechanisms of concrete and therefore could characterize both the short-and long-term behavior of concrete.Specifically, for a practical creep test, which is enviably influenced by complex interplay of shrinkage, damage, and aging, such models will be useful in separating the influence of each individual factor on the creep behavior.
Subsequently, with the advancement of experimental micromechanical facilities, abundant creep data gathered from micro-scale tests motivated more micro-scale creep models to spring up, which consider the influence of aggregate [54,55], C-S-H gel [56] and hydration [57].Recently, with the aim to reproduce the experimentally observed creep behavior of cement paste, a local force -based lattice fracture model was developed to model the creep behavior [58].Based on the results of X-ray computed microtomography analysis and nanoindentation tests, the experimentally informed model was capable of explaining the experimental results.
To sum up, the sophisticated numerical models described above have not only provided an effective alternative to experimental methods with good precision under certain conditions or assumptions, but also delivered in-depth explanations for various creep mechanisms and interactions between multiple influencing parameters.Nevertheless, given the objectives of predicting the creep behavior of concrete structures, most numerical methods turn out to be inappropriate because of the gap between the ideal conditions presumed by numerical models and practical engineering conditions which also constantly change during the creep process.

Significance of this study
With the aim to supplement the insufficiency of experimental and numerical methods mentioned above, this study provides a Machine-Learning (ML) solution to predict the creep behavior of concrete materials.As a tributary of artificial intelligence, supervised ML has been considered as a promising data-driven approach to build robust models for predicting various properties and behaviors of heterogenous materials.
In recent years, many researchers in the field of construction materials have been implementing ML models in prediction of drying shrinkage [59], cracking propagation [60], compressive strength [61][62][63], elastic modulus [64], breakout capacity [65], shear capacity [66], slump flow [67], flexural strength [68] and interfacial bond strength between FRPs and concrete [69].In these studies, conventional machine learning (CML) models including artificial neural network (ANN), Gaussian process regression (GPR), support vector machine (SVM), multilayer perceptron artificial neural network (MLP-ANN) and random forests (RF) proved to be effective in making precise prediction based on an existing database.Furthermore, as a well-known approach to reduce the risk of overfitting that brought by CML model, ensemble machine learning (EML) models were adopted in predicting surface chloride concentration [70] and compressive strength [71,72].The results showed that compared with CML models, EML models could better consider multiple influencing factors and thus had superior prediction performance.
In this study, based on the Northwestern University (NU) creep database [73] that constituted by parameters characterizing different loading conditions, mixture proportions and environmental conditions, three decision tree-based EML models (i.e., Random Forest (RF), Extreme Gradient Boosting Machine (XGBM) and Light Gradient Boosting Machine (LGBM)) are established to learn the underlying patterns behind 26,458 creep test results.Then, by tunning the hyperparameters with Bayesian Optimization, the performance of 3 EML models is efficiently improved within 60 iterations of a cross validation process.Finally, based on the well-tuned EML models, a game-based theory SHapley Additive exPlanation (SHAP) is implemented to unravel the underlying pattern that EML models reveal from the database, which can provide an in-depth understanding for the prediction of creep behavior.

Data preprocessing
This study aims to build EML models by exhaustively learning the creep data from NU database [73], which contains 29,196 observations of creep compliance at different time points.Fourteen parameters are selected as model input: water-cement ratio, aggregate-cement ratio, cement content (kg/m 3 ), cement type, amount of superplasticizer (kg/m 3 ), compressive strength at the age of 28 days (MPa), volume-surface ratio, environmental humidity of specimen preconditioning (%), concrete age at loading (day), temperature during loading ( • C), humidity during test (%), creep test type, loading stress (MPa) and time since loading (day).

Categorical parameters
Compared with other numeric parameters, both cement type and creep test type are both treated as categorical parameters here, which enables this study to cover a wider range of parameters and promote robustness for predictions in practical conditions.
In the category of cement type, the number 1, 2, 3, 4 will be used to classify slowly hardening cement (SL), normal cement (N), rapid hardening cement (R) and rapid hardening high strength cement (RS).While in the category of creep test type, according to the original records of NU database, most creep tests are lying in the categories of "basic", "basic?", "total?" and "total".Due to ambiguous experimental process described by the original papers, the NU database does not provide clear information regarding the type of every creep test.As basic creep refers to time-dependent deformation without the influence of drying creep, fully-sealed conditions are clearly required in basic creep tests.While on the other hand, total creep tests aim to measure all the deformation that induced by both basic and drying creep, which corresponds to no-sealed conditions.Therefore, for the creep test type "basic", "basic?", "total?" and "total", the sealing conditions of fully-sealed, inadequately-sealed, partly-sealed and no-sealed can be reasonably determined, with the two terms "inadequately-sealed" and "partly-sealed" are tentatively assumed to tackle the uncertain annotation of creep test type in NU database.Subsequently, this study labels each creep test type according to their sealing condition, using the number 1, 2, 3, 4 to represent the creep test type of "total", "total?", "basic?", "basic".

Missing data
In the selected 14 input parameters mentioned above, much data is missing.Therefore, an appropriate data imputation procedure should be taken to supplement missing data.First of all, to ensure the reliability of data, any observations that have more than 3 missing data has been discarded in this study.
There are mainly two kinds of available strategies to tackle the problem of missing data, namely univariate imputation methods and multivariate imputation methods.The univariate imputation method is a simple approach to impute the missing data with the statistics (i.e., mean, median or most frequent) of each column where the missing data are located.While the multivariate imputation is a more sophisticated approach to impute the missing data by implementing a model to fit the correlation of all the available data, which can then be used to estimate the missing data point so as to make the process of data imputation unbiased and statistically valid [74][75][76].For parameters other than compressive strength, there are little correlations between the missing data and existing data, so a multivariate imputation would be inappropriate.Therefore, these missing values are imputed following the method of univariate imputation method with its median values.
Compressive strength is one of the most important parameters of concrete that can be correlated to multiple concrete properties [23], therefore an accurate estimation for missing compressive strength data will be carried out.The XGBoost model, which is widely reported to have superior performance in predicting compressive strength [77][78][79][80][81], will be established here to conduct imputation of missing compressive strength data.Based on, in total, 3051 sets of compressive strength data (1206 sets from NU database, 1030 sets from Ref. [82], and 815 sets from Refs.[83][84][85][86][87][88][89][90][91]), the XGBoost model for compressive strength is trained on 2440 samples and tested on the remaining 611 samples.Besides, Bayesian Optimization and 10-fold cross validation [92] will be implemented to tune the hyper parameters of the XGBoost model.The input for prediction of compressive strength are as follows: amount of cement, fly ash, slag, superplasticizer, w/c ratio and curing age.The hyperparameters are shown in Table 1.The training and testing performance evaluated by coefficient of determination (R 2 ) are shown in Fig. 1.Compared with the other ML-Based compressive strength prediction models [71,72,[77][78][79][80][81], the XGBoost model is built on the largest database of concrete strength, and therefore its prediction should be robust and accurate.After the preprocessing, 26,458 observations of creep tests results from NU database are selected as final input data and will be used in prediction of creep, as is shown in Fig. 2.

Methodology
For prediction problems involving unstructured data (e.g., image, text), deep artificial neural network tends to outperform all other ML algorithms.However, when it comes to medium tabular data, tree-based ML algorithms remain to be best-in-class.In this study, three kinds of the state-of-the-art tree-based EML models, namely Random Forest (RF), Extreme Gradient Boosting Machine (XGBoost) and Light Gradient Boosting Machine (LGBM) will be built to predict creep compliance of concrete under various experimental conditions.During the process of cross validation, Bayesian Optimization will be implemented to efficiently tune the hypermeters.Afterwards, with the well-tuned models, Shaley Additives Values will be calculated to characterize the influence of each parameter on the final inference of creep compliance.

Decision tree-based ensemble machine learning (EML) models
The basic strategy of a decision tree-based EML is to build multiple decision trees based on randomly selected subsamples of the whole data set, and then combine the prediction of each pre-built decision tree estimator to give a new inference corresponding to the input data.Specifically, there are mainly two training methods for EML models, namely parallel training and sequential training.Parallel learning will be used in training of RF, while sequential training will be used for XGBoost and LGBM.

Classification and regression tree (CART)
The classification and regression tree (CART) is implemented here as the basic estimators that will be further ensembled to build the EML model.For regression aims, CART consistently conducts binary partition for each parameter of the dataset at each level until the prescribed maximum depth is reached [93].During this process, the input data X will be partitioned at the split point s for splitting variable j: By repeating the binary partition process, the whole dataset will be divided into M regions (i.e., terminal nodes) R 1 , R 2 , …, R m , with each region R i corresponding to a prediction response c i .In order to find best splitting point s for each splitting variable j, the mean squared error is used to build up the cost function: For any choice of j and s, the minimization in Eq. ( 2) is given by:  where y i is the i-th ground-truth value, and N 1 , N 2 is the number of data points in region R 1 , R 2 .Therefore, the best partition pair of (j, s) can be found by minimizing the cost function of Eq. ( 2).An example of CART with a maximum depth of 3 and number of nodes of 4 is shown in Fig. 3.With a proper value of maximum depth, number of nodes and pruning strategy, a CART can serve as a qualified standalone estimator.However, with the intrinsic hierarchical nature of training process, the effect of an error in the top split is propagated down to all of the splits below, which means CART has very high variance and a slight change of input data can greatly change the output [93,94].As a result, overfitting often happens and thus reduces the generalization capabilities of the CART model.

Random forest (RF)
RF is proposed as a straightforward EML model to tackle the instability of a standalone CART as described in section 3.1.1.The workflow of RF is shown in Fig. 4. Following the parallel learning method, RF uses a bagging strategy (i.e., bootstrap and aggregation) to build a number of CART models based on subsamples randomly selected from the training dataset [95].Besides, the number of features that used as input for each CART model is also randomized from the original number of features.Finally, by taking averages, summation or following the majority vote, responses of all CART models are combined to provide an inference according to a certain input.Although the two injected randomness can yield the CART with certain decoupled errors, the aggregation procedure of combining all CART can be effective to cancel out most of these errors [96].Thus, RF can effectively reduce variance at a very small cost of bias.
In this study, based on the Scikit-Learn API [97], RF will be implemented to build a baseline prediction model for concrete creep behavior.The selected hyperparameters for tunning process are n_estimators (the number of CART), max_features (the maximum size of every random subset of features for training each CART), max_depth (the maximum depth for each CART), min_samples_split (the minimum number of samples required to split an internal node of each CART) and min_-samples_leaf (the minimum number of samples required in an internal node).

Extreme Gradient Boosting Machine (XGBoost)
As a counterpart to RF, gradient boosting machine adopts a sequential training strategy to create a strong learner by sequentially training a number of weak learners (i.e., CART in this study).In every step, a weak learner is trained to minimize the loss function by using gradient descent optimization algorithm [93].XGBoost is recognized as an advanced implementation of gradient boosting machine, which can more effectively control over-fitting by using a more regularized model formation [98].The objective function during the training process of XGBoost is expressed as: in which L is the loss function; Ω is the regularization term for measuring the model complexity; t is the t-th iteration for generating t-th CART; n is the total number of CART; f i is the mapping function of i-th CART.The model complexity of each CART can be expressed as: in which T is the total number of nodes, w is the score of each node, γ and λ are the penalty terms for regularization to avoid over-fitting.When t-th CART is created to fit the residual error of the former CARTs, the prediction of the new tree can be expressed as: Then the objective function can be written as: Expanding f t with Taylor polynomial and assuming the loss function L uses the form of mean square error, the approximation of objective function with second-order accuracy can be expressed as: in which g i and h i denote the first-and second-order derivative: Because the value of objective function depends only on g i and h i , a customized loss functions can be supported by XGBoost.In this study, the mean square error is used as the loss function.Using the XGBoost API [98], the selected hyperparameters for tunning process are gamma (minimum loss reduction required to make a further partition on a leaf node of a CART), n_estimators (the number of CART), learning_rate (boosting learning rate), max_depth (the maximum depth for each CART) and min_child_weight (minimum sum of instance weight needed in a node).

Light Gradient Boosting Machine (LGBM)
As a major challenger to XGBoost, LGBM was developed to achieve faster training speed and lower memory usage, while maintain high accuracy.The main difference between LGBM and XGBoost lies in their ways of growing trees, as is shown in Fig. 5. XGBoost, as well as most Fig. 3.An example of CART.gradient boosting algorithms, grows trees level-wise following the splitting procedure of CART, as described in section 3.1.1.While LGBM grows tree leaf-wise by splitting the leaf node using a histogram-based algorithm, which yields great advantages on both efficiency and memory consumption.As leaf-wise tree growth leads to more model complexity, LGBM can result in more accuracy gain in every iteration.However, it also means high risks of over-fitting, which is tackled by regularization terms.Two techniques, namely gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB), are also implemented to make the LGBM a fast, efficient and stable EML algorithm [99].During the training process, because data samples with high gradient will contribute more to the information gain, GOSS selects the samples with high gradient and discards the ones with low gradient.On the other hand, EFB bundles exclusive features in sparse feature space, thus reduces the dimensionality and improve efficiency.
In this study, the API developed in Ref. [99] is used to build a LGBM model based on the NU database.The selected hyperparameters are num_leaves (the maximum tree leaves of each CART), max_depth (the maximum depth of each CART), n_estimators (number of boosted CART to fit), learning_rate (boosting learning rate) and min_child_samples (minimum number of samples needed in a leaf node).

Bayesian Optimization
In most supervised learning tasks, a bias-variance tradeoff has always been one of the most tricky and essential objectives, since it represents the generalizing capabilities of the model to predict unseen new input data.To achieve the best bias-variance tradeoff, hyperparameters that represent the complexity and the training step of EML models should be well tuned to get highest prediction accuracy in the process of cross validation.For such optimization aim, methods like grid-search and random-search are broadly implemented by enumerating all combinations of presumed values for each hyperparameter [59][60][61][62][63][64][65][66][67][68][69][70][71][72].For example, for the 5 hyperparameters selected in LGBM mentioned in section 3.1.4,if 3 values are assigned to each hyperparameter, then there is 3 5 = 243 combinations of hyperparameters being required to be implemented in 243 times of training process.Furthermore, if a 10-fold validation is required, then 2430 times of training will be required to be  implemented.Therefore, grid-search and random-search can be highly inefficient and time-consuming when it comes to complex models with parameters of high dimensionality and large amount of data.Besides, such methods strongly depend on experience of the programmer, since only limited number of presumed values can be pre-assigned to each hyperparameter.In view of the limitations of grid-search and random-search, this study proposes to adopt Bayesian Optimization to find the optimal hyperparameters corresponding to the highest prediction accuracy in the process of cross-validation.The process of Bayesian Optimization mainly includes two components: a statistical model for modelling the objective function and an acquisition function for the next sampling point.The workflow of Bayesian Optimization used in this study is given in Fig. 6.
In Bayesian Optimization, a fundamental assumption is that the optimization objective follows the multivariate Gaussian distribution (i.e., Y (X) ~ N (0, K), where Y is the prediction performance in crossvalidation, while X is the hyperparameter vectors).Then the Gaussian Process (GP) regression can be implemented to provide a prior for the objective Y, which forms the GP model mentioned in Fig. 6.In GP, the kernel for the covariance term that is corrupted by noise with zero mean and standard deviation σ noise can be expressed as: in which k is the covariance function that is calculated by an exponential function of the second norm of the difference value between two samples; n is the number of samples that are incorporated in the GP model.Under the assumption of Gaussian distribution, the distribution of the observation of next sample Y n+1 can be expressed as a joint Gaussian distribution with the observations in the first n iteration denoted as Thereby, with the GP model, a mapping from the selected hyperparameters to the prediction performance in cross-validation can be drawn, with a credible interval for every inference.Then, an acquisition function will be implemented to calculate the improving potential of every sample point.In this study, the Expect Improvement (EI) is adopted as acquisition function to achieve the balance between exploitation and exploration, which can be expressed as [101]: where Φ(⋅) and φ(⋅) are thecumulative distribution function and probability distribution function of the standard Gaussian distribution; y best is the tentative optimal value in current sample space.Note that the first term in Eq. ( 15) aims to exploit every sample point by evaluating the difference between the potential sample point and the best point in current sample space.The second term in Eq. ( 15) aims to quantify the uncertainty of each sample point.Therefore, the EI function tends to get higher values at sample points with larger posterior mean and credible intervals, and thus the balance of exploitation and exploration can be achieved.

Results and discussion
To evaluate the generalization capabilities of the 3 EML models, based on a pseudo random-number generator, the database is shuffled first and then divided into the training and testing set following the ratio of 8:2.For each EML model, 5 hyperparameters will be selected and tuned by Bayesian Optimization in the process of 5-fold cross validation [92,102].And then, each tuned EML model will be examined by the 5292 unseen samples within the testing set.Furthermore, the performance of EML models on testing set will be compared with the broadly-used empirical model in the fib Model Code 2010 [103].Finally, the game-based theory Shapley Additive Explanation will be implemented on each EML model to provide explanations for its prediction.

Model performance
The performance of the EML models will be evaluated using the following 3 metric indexes: coefficient of determination (R 2 ), root mean square error (RMSE) and mean absolute error (MAE), whose formulations are given as follows: ) which uses a hyperparameter vector θ i as the input and then gives an output of the model performance indexed by the averaged R 2 of 5-fold cross-validation.Thereby, the form of the objective function of the Bayesian Optimization is given as follows:

Optimization result and EML model comparison
Following Eq. ( 19), the performance of 3 EML models in the process of 5-fold cross-validation is optimized.For each EML model, 5 iterations are firstly conducted to form an initial sample space, which establishes the initial GP prior.Then in the following iterations, the acquisition function EI is calculated to select new sample point and then update the GP prior at the end of each iteration.According to Ref. [104], at least 30 iterations is recommended for most optimization aims.In this study, 60 iterations will be implemented to find the best combination of hyperparameters for each EML model.The optimizing history of the 3 EML models is shown in Fig. 7.With Bayesian Optimization, the 3 EML models successfully obtain high prediction accuracy measured by R 2 in the process of 5-fold cross-validation.However, there are distinct differences lying between them: 1) Compared with both RF and XGBoost, LGBM obtains highest prediction accuracy within significantly shortest calculation time, which is about 2.7% of the XGBoost and 3.4% of the RF.The significant improvement in computational efficiency as well as accuracy is attributed to the leaf-wise tree growing strategy, gradient-based subsampling method GOSS and feature bundling method EFB, according to Section 3.1.4.2) Compared with RF, the two gradient boosting models XGBoost and LGBM shows much higher accuracy at the beginning of Bayesian Optimization process and ends with limited improvement of R 2 , which means that both XGBoost and LGBM are much more robust than RF and therefore require less effort in hyperparameter tunning.
The optimized hyperparameters for each EML model are listed in Table .2.The range of each hyperparameter is selected according to references [97][98][99], which is listed beneath the name of each hyperparameter in the form: (lower boundary, upper boundary).Note that although with different names, the parameters min_samples_leaf of RF, min_child_weight of XGBoost and min_child_samples of LGBM all refer to the minimum number of samples required in the leaf node of CART, which controls the complexity of the basic estimators and thus reduces the overfitting risk for each EML model.As shown in Table .2,all 3 well-tuned EML models show preference to larger amounts of basic estimators, which is logical since ensemble strategies (bagging for RF and gradient boosting for XGBoost and LGBM) play as inherent advantage to gain high accuracy for EML models.However, a significant difference between the ensemble strategies lies in their preference for complexity of basic estimators.For RF, which adopts bagging as the ensemble strategy, more complex basic estimators with more depth and smaller leaf nodes are preferred.As a standalone estimator, such complexity is highly prone to overfitting due to its high variance and low bias, especially for CART which has a hierarchical decision path and therefore is sensitive and unstable to fluctuation of input parameters.However, through bagging, the variance of all these complex estimators can be effectively counteracted and therefore produce a strong ensemble estimator.On the other hand, for gradient boosting models XGBoost and LGBM, a preference for less complex basic estimators (i.e., lower depth and large nodes) is observed.Less complex basic estimator, which has high bias and low variance, is prone to underfitting and therefore is called a weak learner.However, by gradient descent methods and strong regularization procedures, gradient boosting method can train the weak learners sequentially to compensate the residual errors of the former Fig. 7. Optimizing history for 3 EML models.
M. Liang et al. ones, and therefore produce a strong learner at last.

Testing results and comparison with model code 2010
Using the 5292 data that was separated beforehand from the original database and therefore have never been exposed to the EML model, the generalization capabilities of the EML models can be examined.The validation results are shown in Table .3.As can be expected from the results of 5-fold cross validation, 3 EML models all obtain superior accuracy in both the training and the testing set.Because only minor discrepancies are observed between the accuracy of training set and testing set, the extent of overfitting is marginal and therefore and be reasonably neglected.Comparing the performance of the 3 EML models, RF and XGBoost obtain almost the same accuracy, while LGBM outperforms the others in both the training and the testing set in most accuracy indexes.
To further validate the performance of the EML models, the broadlyused empirical prediction model of creep in the Model Code 2010 (MC2010) [103] is also implemented on the testing set.There are 7 input parameters considered by MC2010, namely cement type, compressive strength at 28 days, concrete age at loading, volume surface ratio, relative humidity of the ambient environment, loading time and temperature.The comparison between MC2010 and the proposed EML models are shown in Fig. 8.The results show that EML models outperform MC2010 in the testing set.An intrinsic reason accounting for such gap lies in the sophisticated statistical advantages of EML models, which guarantees a balance between bias and variance during training process.Besides, another reason is the number of parameters that are taken into consideration.In EML models, there are 14 parameters being used for training, which is twice as many as MC 2010.Despite the significant gap when compared to EML models, MC2010 still shows certain precision in most data points, which can be attributed to the correct selection of the 7 input parameters.In the following section, 5 of the 7 input parameters of MC 2010 will be proved to be the most influential ones in the prediction of EML models.

Shapley Additive Explanations (SHAP)
Although many ML studies in concrete materials have achieved high accuracy in predictions of their objectives, inadequate attentions are paid to the interpretability of the ML models.For tree-based models, many studies calculate the feature importance based on the decision path, heuristic approaches, or model-agnostic approaches [105].However, these methods are often infeasible and biased for EML models, especially for those ones with high bias [106].In this study, the SHapley Additive exPlanation (SHAP) is adopted to illustrate both local and global interpretation of each input parameter.

Overview of SHAP
Based on the cooperative game theory, SHAP is represented by the average marginal contribution of a feature value across all possible coalitions.Specifically, the SHAP value of a feature is the averaged prediction value of the samples with this feature minus the averaged prediction value of the other samples without this feature.To provide interpretability of an ML model, the output of the ML model is expressed by a linear addition of its input features multiplied by corresponding SHAP values: where f is the mapping function represented by the ML model; N is the number of input features; φ 0 is the average of all predictions; φ i is the SHAP value for i-th feature; X i ' is the coalition vector of the i-th feature, which can be calculated from the original input X i by a mapping function expressed as X i = h x (X i ').Lunderberg et al. [107] proposed a unique solution for Eq. ( 20) based on the 3 desirable preconditions: local accuracy, missingness and consistency.Local accuracy ensures that the additive relation expressed by Eq. ( 20) is achieved.Missingness ensures that all missing values get a SHAP value of 0. And consistency ensures that changing a ML model does not change the relative SHAP value of that feature, which means that assuming z' \j =0 and f x '(z')-f x '(z' \j ) ≥ f x (z')-f x (z' \j ), then φ i (f', x) ≥ φ i (f, x).Thereby the unique solution to the SHAP value of a feature value is weighted and summed over all possible feature value combinations: where |z'| is the number of non-zero entries in z'.For computational efficiency, Lundberg et al. [94] with S being the non-zero indexes in z'.In this study, the SHAP value of each input features are calculated on the testing set for each tree-based EML models based on the API [108].

Local interpretation
Local interpretation aims to give explanation for prediction of each individual sample.As is given by Eq. ( 20), the linear addition of SHAP values forms the output of each sample data, and therefore the SHAP value of each input parameter can characterize the contribution of each feature.As is shown in Fig. 9, based on LGBM, 3 samples that represent 3 typical scenarios are selected to be explained by their SHAP values.With a fixed base value of 68.59, the final prediction for creep compliance is the counteracting result of SHAP values of different features.Force plot is used here to visualize the counteracting process, in which red and blue bars present positive and negative contribution to the magnitude of prediction value, respectively.
In scenario 1 as shown by Fig. 9 (a), although beginning from the base value of 68.59, the final creep compliance predicted by LGBM is 99.06.As shown by the force plot of scenario 1, the most significant reason accounting for this prediction result is the high temperature (T = 80 • C), which is believed to induce desorption of water from the gel and therefore the gel gradually becomes the sole phase subject to molecular diffusion and shear flow [23].Another feature that is of positive contribution for scenario 1 is that drying-related parameters volume-surface ratio (V/S = 30) and relative humidity (RH_test = 99%).High V/S means that less surface area of the specimen is exposed to the environment, which indicates less drying creep and therefore the magnitude of predicted creep compliance decreases.On the other hand, high RH_test can reduce the drying creep and hence is identified as a negative contribution in scenario 1. Besides, loading age (t 0 ) is considered as a negative contribution since scenario 1 has a late loading age (t 0 = 90 days).However, in scenario 2 as shown by Fig. 9 (b), t 0 becomes the most significant positive contribution since its value 0.65 indicates loading at very early age, when the degree of hydration is low and the gel microstructure is weak, indicating more deformation under the action of sustained loading.On the other hand, the leading negative contribution of scenario 2 is high strength at 28 days (fc28 = 122 MPa), which means that in the scenario of early age loading, high strength concrete has a lower creep compliance than normal concrete.
In scenario 3 as shown by Fig. 9 (c), when the loading stress is applied at the age of 28 days (t0 = 28 days), high compressive strength (fc28 = 97.3MPa) is identified as the most significant contribution to reduce the magnitude of predicted creep compliance.Besides, as mentioned in scenarios 1 and 2, normal temperature (T = 20 • C) and high relative humidity (RH_test = 101%) are both considered as negative contribution in scenario 3. Note that the value 101% of RH_test means a steam experimental condition.

Global interpretation
Global interpretation aims to provide an overview of the SHAP values for input parameters of all required samples.In this section, SHAP values of 5292 samples in the testing set are calculated based on the 3 EML models.The SHAP values of each feature are shown in Fig. 10, in which the features are sorted in the order of their averaged SHAP value.Therefore, the features listed on the top contribute more to the final prediction results of their corresponding samples.Comparing the results of the 3 EML models, 5 features are identified as the most significant input parameters influencing the prediction results of creep compliance: time since loading (t), compressive strength (fc28), age when loads are applied (t 0 ), relative humidity during the test (RH_test) and temperature during the test (T).Moreover, for the 5 influential features, boundaries of feature values that can separate negative and positive contribution can be explicitly observed.The influence of the top 5 features on final prediction of creep compliance illustrated by global SHAP values here is consistent with what has already been discussed in local interpretation (see Fig. 9).Besides, for the influence of cement type (cem), the boundaries between the negative and positive contribution are not clear, which mainly due to: 1) limited influence of cement type on creep and 2) imbalanced distribution of the input parameter (see Fig. 2).While for the influence of creep test type (type), which corresponds to the sealing degree (described in detail in section 2.1), a very clear boundary can be observed.The distribution of SHAP values calculated by the 3 EML models for the input parameter creep test type (type) points out that high sealing degree can reduce the magnitude of creep compliance, which is exactly consistent with most observations in basic creep tests and total creep tests and can be explained by the influence of drying creep [9].In order to look deeper into how a precise prediction is given by the EML model, the SHAP values of the most influential parameters need to be further analyzed.Although there is strong consensus between the 3  21), the calculation of SHAP values requires to enumerate all the possible coalition of the input features, which can be very time-consuming.Therefore, the SHAP results of LGBM, which shows the best accuracy and efficiency as mentioned in section 4.1, is adopted to show how the 6 most influential factors affects the final prediction of creep compliance (Fig. 11).The overall patterns of how the 6 parameters influence the creep compliance captured by the LGBM model are: 1) The increase of compressive strength, concrete age, and relative humidity will decrease the (predicted) creep compliance; 2) The increase of loading time, temperature and water-cement ratio will increase the predicted creep compliance.Moreover, some more specific patterns can also be captured: 1) The influence of loading time and concrete age will decrease as their values increase, which is in accordance with power function shape of creep compliance curve and hydration degree evolution in early-age stages [38]; 2) the influence of temperature becomes significant when the temperature reaches 70 • C [23].Thereby, based on common knowledge of how concrete creep is influenced by various factors [23], the reasonability of the LGBM can be examined.

Exemplification of typical scenarios
To further exemplify the applicability of the proposed EML models for predicting the creep behavior of concrete, this section selects 6 typical scenarios from the testing set and makes predictions of the corresponding creep compliance ranging from 1 to 12,000 days (i.e., 32.8 years).The details of 6 selected scenarios (S1~S6) are shown in Table

Exemplification results
Based on the LGBM model, the predictions of the 6 selected scenarios are shown in Fig. 12.As comparison, the predictions made by MC2010 are also plotted.As reported by Refs.[109][110][111], the creep compliance curve can be effectively characterized by power law function and logarithmic function.Thereby, the predictions made by LGBM will also be fitted by the power law function and logarithmic function, as shown in Eq. ( 22) and Eq. ( 23): where a, b, c are fitting parameters.The fitting procedure is based on Least Square Method and is solved by Levenberg-Marquardt algorithm, which is robust and efficient for solving unconstraint problems [112].As shown in Fig. 12, for each scenario, the prediction results of LGBM are in conformity with the experimental results.In S1, S2, S5, and S6, where the experimental data cover wide time ranges, such conformity is much more obvious.Meanwhile, as already validated in Section 4.1.2,the prediction accuracy of LGBM significantly outperforms that of MC2010.Obvious gaps between MC2010 and experimental results can be seen in all selected scenarios.In S2, although the MC2010 shows certain accuracy in first days, it still cannot capture the long-term development of creep compliance in later stages.By comparing the fitting results, both logarithmic and power functions can well characterize the prediction results of LGBM, with the R 2 all above 0.965.In S1-S4, the logarithmic function shows slight advantages compared to power function, but in S5-S6 the power function prevails with higher R 2 .

Limitations and discussion
As shown in Fig. 12, non-smooth creep compliance curves predicted by LGBM are observed.This is inherently due to the nature of the input database in the following 3 aspects: 1) The data of creep tests in the NU database are formed by scattered points distributed in different time ranges and characterized by multiple mixture parameters and experimental conditions, as shown in Fig. 2; 2) Derived from creep tests that last days, months or years, errors inevitably happen to the experimental results and therefore the scattered points often do not follow a presumed smooth curve, as shown in Fig. 12; 3) Some information is missing in the NU database, as processed in Section 2.1 and 2.2, which can also influence the prediction results.Because the performance of the EML models highly depends on the input database, the scattered, incomplete and possibly abnormal data can result in local numerical instability of EML models in prediction of the creep compliance at a long-term continuous time range.In view of the limitations mentioned above, 3 possible measures can be taken to promote the applicability of ML approaches in prediction of creep compliance: 1) Improve the quality of the database before training.This can be achieved by implementing a thorough data cleaning procedure to identify anomalies and interpolate the missing values.However, large-scale data cleaning procedure requires a highly-accurate algorithm, or the quality of the database may be significantly reduced.A potential method is to fit all the creep test data with a predefined function (e.g., logarithmic or power functions), and then using the fitting parameters as output of ML models.However, this can be risky because the fitting results are already biased (by the predefined function) and may often be very different from the original testing results.In this study, only slight and careful preprocessing measures have been implemented to tackle the missing information while maintaining the originality of the NU database (as discussed in Section 2.1 and 2.2). 2) Increase the robustness of ML models to decrease the influence of unstructured and possibly abnormal data points.This can be achieved by the "bagging and bootstrap" procedure typically used in EML models, which are reported to be highly effective in prevent overfitting (as discussed in Section 3.1).Besides, instead of training the ML models on scattered points as this study, another potential method can be to treat the experimental results as time-series, although the unstructured database may pose even more difficulties in this case.3) Keep the originality of the database and filter the prediction results after training.This can be achieved by simply fitting the prediction results of EML models with a predefined function.In this study, the logarithmic and power functions are fitted by the prediction of LGBM.As shown in Fig. 12, these fitted results still preserve high accuracy when compared with original experimental results.
Overall, this study mainly adopts measure 2 and 3: the advantages of EML models are taken to prevent overfitting and obtain high accuracy through Bayesian Optimization.This is validated by both crossvalidation and SHAP.For further application, the final results on a long-term continuous time range should be processed by a proper fitting function (e.g., logarithmic or power functions).As for measure 1, this study does not implement large-scale data cleaning procedures, while only conduct slight and careful measures to complement missing information.This keeps the originality of the database but also allows the anomalies to influence the training process.Further studies are recommended to focus on the data preprocessing techniques to effectively filter the anomalies, interpolate missing information while maintaining the originality of the database.

Conclusions
This study builds three Ensemble Machine Learning (EML) models (i.e., Random Forest (RF), Extreme Gradient Boosting Machine (XGBoost)  and Light Gradient Boosting Machine (LGBM)) for prediction of creep of concrete, based on an input vector constituted by parameters of concrete mix proportions, environmental conditions and loading conditions.Firstly, the NU database is preprocessed by a prebuilt XGBoost model and split into a training and a testing set.Then, through Bayesian Optimization and 5-fold cross validation, the EML models are tuned to achieve a high accuracy.Based on the cooperative game theories, the Shapley Additive Explanations (SHAP) is adopted to interpretate the predictions of the EML model and validate its reasonability by comparing its interpretations with common knowledges in the field of creep of concrete.Finally, the applicability of the EML models is exemplified by 6 typical scenarios and the limitations and potential improvement measures of the ML approaches are discussed.Through this study, the following conclusions can be obtained: (t0), relative humidity during the test (RH_test) and temperature during the test (T).(4) The patterns that EML models captured are consistent with common knowledge about factors that influence concrete creep, which validates the reasonability of proposed EML models.(5) The prediction of EML models on a long-term continuous time range are in conformity with experimental observations.The creep compliance curves are not smooth due to the scattered, incomplete and possibly abnormal data in the NU database.Further improvements are required on the data preprocessing techniques for anomaly identification and missing data imputation.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

4 .
The selected scenarios cover creep test results of the frequently-used w/c ratio of 0.4-0.6 under a temperature of about 20 • C. S1 and S2 represent the total creep tests starting from the concrete age (t0) of 28 days with loading durations of 1944 and 1278 days, respectively.S3 and S4 represent the basic creep tests starting from the concrete age of 7 days with the same loading duration of 28 days.S5 and S6 represent the basic creep tests starting from the concrete age of 14 days with loading durations of 11,076 and 4992 days, respectively.

Fig. 9 .
Fig. 9. Force plot for local interpretation of 3 typical scenarios based on LGBM.

Fig. 11 .
Fig. 11.SHAP values of the most influential features based on LGBM.

Table 1
Hyperparameters of XGBoost for estimating missing compressive strength data.
M.Liang et al.

Table 3
Comparison of EML model performance.

Table 4
Details of selected scenarios for exemplification.