Predictive Regression Models for the Compressive Strength of Fly Ash-based Alkali-Activated Cementitious Materials via Machine Learning

Fly ash powders produced from pulverized carbon are a promising renewable and sustainable replacement for Ordinary Portland Cement (OPC) in concrete. However, quantifying the desired compressive strength threshold requires defining the ratio of Fly Ash (FA) to fine aggregates (S). This study presents two novel machine learning models to predict the mechanical properties of FA-based Alkali-Activated Cementitious Materials (AACMs) using supervised regressors. The two models, SLR and MGSVM, showed high prediction accuracy (~95%) based on raw compressive strength training datasets from AACMs with mixed proportions of FA/S (0, 5, 10, 15, 20, 25, and 30%) after 28 days of curing. A maximum compressive strength of ~67.5MPa was observed at approximately 20% FA/S (spline interpolation), suggesting the attainment of high mechanical stability. Raising FA/S above 30% indicates a high probability of returning to the original strength of 61MPa of pristine AACMs. The non-linear stress and strain patterns against FA/S confirmed the applicability of stress-strain relationships and elasticity laws. The pozzolanic properties of FA facilitate interaction with Ca(OH)₂ for aggregation, which is linked to the non-linear behavior. This study provides generalized design models for correlating the mix proportions in OPC-substituted AACMs to the optimum compressive strength.

I. INTRODUCTION
Concrete consists of Ordinary Portland Cement (OPC), aggregates, water, and other materials and chemical additives, such as superplasticizers. OPC, as the primary binding material, plays a crucial role in determining concrete's properties [1]. Mineral admixtures are natural pozzolans (i.e., materials that, in finely divided form and in the presence of water, react with calcium hydroxide to form cementitious compounds) like coal Fly Ash (FA) or fine aggregates (S), which can be obtained commercially from thermal power plants [2,3]. The compressive strength of FA-based concrete was found to be strongly correlated with the Blaine value [4] and with the pozzolanic reaction or curing age (a high Ca(OH)₂ content and the reaction of SiO₂+Al₂O₃ in FA raise the reaction rates), resulting in non-linear relationships [5] and the formation of hydrate deposits (the decreased Ca(OH)₂ content and low Ca²⁺ concentration increase the deposition of C₃S and the cement hardness) [6][7][8]. Physical parameters, including Blaine (surface area) and particle size, along with chemical parameters, such as the C₃S, C₂S, C₃A, C₄AF, and SO₃ contents, are some of the factors determining the compressive strength after 28 days of curing [9][10][11]. The compressive strength of FA concrete was predicted in [12] using the "Particle Model", according to the classification and the chemical information of FA particles. The model reached very high precision (R²=0.99) in predicting the compressive strength for 20% and 40% FA substitution (mass replacement) of cementitious materials at curing ages from 3 to 180 days. This particle model applied Machine Learning (ML) classification algorithms, which classified FA particles into 9 different groups [13][14], making it possible to build models and carry out regression analysis at different curing times, yielding empirical prediction models.
These models were able to predict the compressive strength of FA concrete for 26 different FA sources (Class C and Class F based on ASTM C618 [15] for concrete mixtures made of OPC). It should be noted that ASTM C618 ASEM was previously published as a widely accepted method to compare the performance of different FAs [16]. Type I Portland cement consists mostly of CaO (~63%), SiO₂ (~21%), and Al₂O₃ (~4.6%) with other trace elements. A water-to-cementitious material ratio of 0.45 is common practice when testing mixture proportions with a 20% or 40% FA replacement content. The identified strength performance of the mixtures, based on 2000 particles and 11 elements (Si, Al, Fe, Ca, Mg, S, Na, K, Ti, P, and Sr), showed that the Particle Model was capable of building accurate predictive equations [16][17][18].
In [19], the Unconfined Compressive Strength (UCS) of coal FA-based cement pastes, mortars, and concrete was predicted using ML models, showing a negligible mean square error of 5MPa for mixtures following the European Standards (35% for cement and 55% for concrete). These models described experimental data with UCS ranges of 32.5-52.5MPa and 12-60MPa based on the European limits on cement and concrete, respectively. Mix composition data were used to train a Neural Network Analysis (NNA) model for generalization [20], based on the defined input and output variables and their non-linear relationships. The analysis concluded that the least influential variables on strength were the additives, water, aggregates, and cement, while the products and compositions of different amounts of FA were the determining variables. The development of sustainable concrete mixtures with optimal compressive strengths was also studied in [1][21][22][23][24][25] by combining linear regression with SVM and Artificial Neural Networks (ANNs), resulting in successful and precise models focused on reducing the environmental impact of concrete from information such as aging time, contents, and ratios between contents.
This study examined the impact of adding FA powders or S as a sustainable full replacement of OPC in concrete, using collected experimental datasets on the compressive strength of FA-based Alkali-Activated Cementitious Materials (AACMs). An ML analysis was applied with supervised training and testing of mixed proportions data, using SLR and MGSVM regression trainers from MATLAB toolboxes, to build accurate strength predictive models for 28-day cured AACMs. The reliability and validity of the built models were evaluated using residual analysis to identify the models with minimum statistical error. The compression results obtained from data fitting and the prediction analysis using trained models were utilized to find the ideal FA/S ratio that maximizes the compressive strength and mechanical stability. Furthermore, changes in the mechanical properties were studied at high mixing ratios to check the effect of FA pozzolanic properties on particle aggregation and provide a generalized design model for OPC-substituted AACMs with optimum compressive strength.
II. METHODS AND FRAMEWORK
All AACM composite mixtures were prepared at room temperature (20±1°C) and 65±5% relative humidity and were then cured under steam conditions. The studied parameters were slag cement (SL), used in a constant quantity as the core binder for the full replacement of OPC, combined with FA as a partial replacement of S according to the previously mentioned ratios. An alkali activator (AL) to slag cement ratio (AL/SL) of 20% and a water to slag cement ratio (W/SL) of 50% were used for all mixes to create an economically desirable performance with the complementary benefit of supporting the sustainable development of high-performance systems [26][27][28][29][30]. The studied mixture proportions of AACMs can be found in [26][27][28][31], and the datasets were gathered from previous experimental works [26,27,31]. The collected datasets for 28-day curing included FA/S of 0, 5, 10, 15, 20, 25, and 30% to evaluate the most favorable mixture designs that improve the mechanical stability of AACMs after steam curing with constant ratios of AL and W to SL as the core binder. ML analysis was applied using supervised training and testing of mixed proportions data through the SLR and MGSVM regression trainers from MATLAB toolboxes to build accurate strength predictive models. The original datasets containing 21 values were expanded to 105 using the previously introduced concept of "in-between randomization" [32][33][34][35], by assigning randomly generated strength values, lying between the three measured samples of each mixing ratio, to the same FA/S proportion. This 5-fold expansion allows better ML training and testing analysis. The datasets were then divided randomly into two groups, 80% for training and 20% for testing, to check the validity and reliability of the models in predicting the compressive strength.
The supervised regression analysis was initiated from a selection of training data points (84) and testing data points (21) of the already curated and built datasets, based on raw experimental results (7 data points with 3 trials each). The model's validity was examined using the testing datasets and by applying residual analysis to identify models with minimum statistical errors. The collected datasets included only FA/S as an independent input parameter and compressive strength as the only dependent variable, based on raw datasets taken from earlier works of AACMs with 28-day steam curing [26,27,31,36,37].
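The expansion and split described above can be sketched in a few lines of Python. This is only an illustration, not the MATLAB workflow used in the study: it assumes that "in-between randomization" means drawing uniform random strengths between the lowest and highest of the three trials at each FA/S ratio, and the strength values in `RAW` are hypothetical placeholders rather than the measured data. The function names are likewise invented for this sketch.

```python
import random

# Hypothetical placeholder strengths in MPa: three trials per FA/S ratio (%).
# These are NOT the measurements from the cited experimental works.
RAW = {
    0: [60.5, 61.0, 61.5],
    5: [62.0, 62.6, 63.1],
    10: [63.9, 64.4, 65.0],
    15: [65.8, 66.3, 66.9],
    20: [67.0, 67.5, 68.0],
    25: [64.8, 65.4, 66.0],
    30: [61.9, 62.5, 63.0],
}

def expand_in_between(raw, per_ratio=15, seed=0):
    """Keep the measured trials and add uniform draws between their min and max.

    Assumed reading of "in-between randomization": 3 measured + 12 generated
    values per ratio gives 7 * 15 = 105 rows, a 5-fold expansion of 21.
    """
    rng = random.Random(seed)
    rows = []
    for ratio, trials in raw.items():
        lo, hi = min(trials), max(trials)
        extra = [rng.uniform(lo, hi) for _ in range(per_ratio - len(trials))]
        rows.extend((ratio, strength) for strength in trials + extra)
    return rows

def split_train_test(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

dataset = expand_in_between(RAW)
train, test = split_train_test(dataset)
print(len(dataset), len(train), len(test))  # prints: 105 84 21
```

Every generated value stays inside the spread of the three trials at its ratio, so the expanded set adds no information beyond the measured scatter. That is worth keeping in mind when interpreting accuracy figures obtained on such data.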
Various supervised regression learners from MATLAB's toolbox [38] were selected for training and testing. Linear and tree regression models as well as SVM and Gaussian Process Regression Models (GPRM) with 50-fold cross-validation (CV) were used to identify the optimal compressive strength [38][39][40][41]. The training datasets consisted of four 84×1 matrices representing the input parameters and the compressive strength output. The compression results obtained from data fitting and prediction analysis were utilized to find the ideal FA/S ratio with the maximum compressive strength. Furthermore, changes in the mechanical properties were studied at high mixing ratios to check the effect of the pozzolanic properties of FA on particle aggregation. It should be noted that each sample number was correlated with the gradually increasing FA/S mixing proportions of 0, 5, 10, 15, 20, 25, and 30% and their corresponding experimental compressive strengths, for both the training and testing data points, which were originally drawn from the raw datasets.
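The cross-validation step can be illustrated with a short Python sketch. It is not the MATLAB Regression Learner procedure used in the study: a plain least-squares line stands in for the regression learners, the data are synthetic placeholders, and the function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_rmse(xs, ys, k=50, seed=0):
    """Pooled RMSE over all held-out predictions of a k-fold CV."""
    sq_err = 0.0
    for fold in kfold_indices(len(xs), k, seed):
        held = set(fold)
        tx = [x for i, x in enumerate(xs) if i not in held]
        ty = [y for i, y in enumerate(ys) if i not in held]
        a, b = fit_line(tx, ty)  # train on everything outside the fold
        sq_err += sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold)
    return (sq_err / len(xs)) ** 0.5

# Synthetic placeholder data: 84 points over the seven FA/S ratios,
# lying exactly on a line so that the CV error should be close to zero.
ratios = [0, 5, 10, 15, 20, 25, 30]
xs = [ratios[i % 7] for i in range(84)]
ys = [60.0 + 0.1 * x for x in xs]
folds = kfold_indices(len(xs), 50)
print(len(folds), cross_val_rmse(xs, ys, k=50))
```

With 84 training points, 50 folds hold out only one or two points each, so the procedure is close to leave-one-out validation. A straight line also cannot follow a strength curve that peaks near 20% FA/S, which is one reason a non-linear learner would be expected to score better on the real data.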
It is quite common to measure a model's validity and prediction ability using various statistical metrics, including the coefficient of determination (R²), Mean Square Error (MSE), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and residual analysis. These metrics were calculated using their mathematical definitions [42][43][44]. In these definitions, x_{o,i} denotes the observed values, x_{p,i} the values predicted by the ML model, x̄_o the average of the experimentally obtained values, x̄_p the average of the predicted values, and n the dataset size. RMSE measures the deviation between the predicted and observed values, while more similarities arise between the trendlines of the experimental and predicted samples with higher R² [45]. Once the regression learners were trained, the statistical metrics obtained from the different models were compared. Then, the best models (e.g. SLR, fine trees, MGSVM, ensemble boosted trees, ensemble bagged trees, squared exponential Gaussian process regression) were selected by checking if R²>0.7. Only models that met this condition were further examined to compare their predicted response patterns. Finally, the MGSVM (R²>0.95) model was chosen for further analysis against the SLR (R²>0.68). Figure 1 depicts the process of selecting the optimal supervised ML models for the accurate prediction of strengths correlated to FA/S.
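The metric equations themselves did not survive in this copy of the text. Under the standard definitions, and writing the observed values as x_{o,i}, the predicted values as x_{p,i}, their averages as \bar{x}_o and \bar{x}_p, and the dataset size as n, they would plausibly read as below. The squared-correlation form of R² is an assumption, chosen because the symbol list includes the average of the predicted values. The subscripts o and p are notation introduced here.

```latex
\begin{align}
R^2 &= \left( \frac{\sum_{i=1}^{n} (x_{o,i}-\bar{x}_o)(x_{p,i}-\bar{x}_p)}
      {\sqrt{\sum_{i=1}^{n} (x_{o,i}-\bar{x}_o)^2 \, \sum_{i=1}^{n} (x_{p,i}-\bar{x}_p)^2}} \right)^{2} \\
\mathrm{MSE} &= \frac{1}{n} \sum_{i=1}^{n} (x_{o,i}-x_{p,i})^2 \\
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\
\mathrm{MAE} &= \frac{1}{n} \sum_{i=1}^{n} \lvert x_{o,i}-x_{p,i} \rvert
\end{align}
```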

III. RESULTS AND DISCUSSION
A. Impact of Adding FA/S
The impact of adding FA/S content to the studied AACMs was identified from the datasets. The selected testing data points showed a similar pattern, while an inspection of the experimental results shown in Figure 2(A) confirms the reliability of the built ML models for accurate predictions. The cementitious materials showed a maximum compressive strength of ~67.5MPa at ~20% FA/S ratio according to the raw data, as well as the spline interpolation fitting analysis using ORIGIN, as illustrated in Figure 2(B). Without added FA or S, it appears that the cement cannot withstand compressive stresses greater than 61MPa, indicating relatively poor mechanical strength before the addition of FA/S. However, increasing the FA/S ratio beyond 30% increases the probability of reaching a point where the original compressive strength of 61MPa is recovered, which is similar to the case of pristine cement without additives.
Interestingly, these results were supported by experiments, supervised ML, and regression fitting. Combining the core binder with a higher FA/S replacement ratio of up to 20% improved the compressive strength by forming a denser binder matrix, which is consistent with more released heat and a higher initial mix temperature. However, FA/S replacement ratios over 25% reduced the composites' compressive strength due to the lack of accessibility for hydration, as the lower heat produced inhibited dense binder formation, influencing the pozzolanic reactions and the mechanical properties of the AACMs. The raw data suggest the addition of higher FA rates than those of fine aggregates to achieve high mechanical stability, as close as possible to the optimal point of ~20% FA/S ratio for the maximum mechanical and compression stability of the cementitious materials, as shown in Figure 2. This could be achieved using training/testing dataset ranges of 0-300kg/m³ and 900-1200kg/m³ for the FA and S additives, respectively.

B. Compression Predictions from SLR and MGSVM
The SLR and MGSVM learners showed unexpectedly high prediction accuracy for the observed pattern, as shown in Figure 3. The selection of training and testing data points from the already curated and built datasets, based on the raw experimental results, produced training and testing trendlines of the various samples against the observed compressive strength, as shown in Figures 3(A) and (B), respectively. Similar regression trends were generated when testing the models' validity. It is worth noting that the sample number is correlated with the gradually increasing FA/S ratios and their corresponding compressive strengths. Since both SLR and MGSVM were capable of producing results very close to the experimentally observed pattern, they were both considered as the possible optimal available supervised models to predict the changing pattern of the experimental datasets, including the training and testing data points originally drawn from the raw datasets.

C. Exceedance Probability for SLR and MGSVM
The exceedance probability is the probability that a certain value will be exceeded in a predefined future period [46,47]. It is computed from n, the total number of compressive strengths, and m, the rank of each value when the observed, trained, and tested data are each sorted separately from highest to lowest. The observed (training) or predicted (testing) outputs were utilized to statistically characterize the compressive strength of AACMs, as shown in Figure 4, which gives the percentage of records that exceed each compressive strength value. A 5% exceedance probability from either the SLR or the MGSVM model corresponds to a high compressive strength that is exceeded by only 5% of all sampled records, whereas a 95% exceedance probability corresponds to a low compressive strength that is exceeded by most of the sampled records.
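A ranking-based exceedance probability of this kind can be sketched as follows. The exact expression is not present in this copy of the text, so the sketch assumes the common Weibull plotting position, P = 100·m/(n+1), with m the rank from highest to lowest. The function name and the strength values are hypothetical placeholders.

```python
def exceedance_probability(strengths):
    """Return (strength, probability in %) pairs ranked from highest to lowest.

    Assumes the Weibull plotting position P = 100 * m / (n + 1), where m is
    the rank (1 = highest strength) and n is the number of records.
    """
    n = len(strengths)
    ranked = sorted(strengths, reverse=True)
    return [(s, 100.0 * m / (n + 1)) for m, s in enumerate(ranked, start=1)]

# Hypothetical compressive strengths in MPa (placeholders, not study data).
sample = [61.0, 62.5, 64.4, 66.3, 67.5, 65.4, 62.6, 63.3, 66.0]
curve = exceedance_probability(sample)
for strength, prob in curve:
    print(f"{strength:5.1f} MPa  exceeded with P = {prob:5.1f}%")
```

The highest strength receives the lowest exceedance probability and the lowest strength the highest, which matches the reading of the 5% and 95% levels given above.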

D. Prediction Accuracy and Statistical Errors
The training and testing datasets used to build the ML regression prediction models are compared in Figure 5. The statistical analysis allowed checking the prediction accuracy of the built models. Most of the observed versus predicted points of the training dataset, from either the SLR or the MGSVM model, were very close to the measured compression values and to the diagonal dotted line drawn in Figure 5(A). The same pattern was observed for the testing datasets, as shown in Figure 5(B). These results imply the correctness and reliability of the models for predicting the experimentally obtained compression results, without the need to conduct further experiments when dealing with similar cementitious materials. Note that, regardless of the number of data points used for testing, similarly accurate results would be expected provided that a minimum of 84 experimental data points is used in model training to ensure the model's capability of producing accurate predictions. The statistical error parameters obtained for each built model, including RMSE, R², MSE, and MAE, are shown in Table I.
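The error parameters listed above can be computed from paired observed and predicted strengths as in the following sketch. It assumes the standard definitions of MSE, RMSE, and MAE, and takes R² as the squared correlation between observed and predicted values, which is an assumption about the form used in the study. The function name and the sample values are hypothetical.

```python
import math

def regression_metrics(observed, predicted):
    """Compute R2, MSE, RMSE, MAE, and residuals for paired samples.

    R2 is taken as the squared Pearson correlation (an assumed form).
    Both inputs must vary, otherwise the R2 denominator is zero.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "R2": cov * cov / (var_o * var_p),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "residuals": residuals,
    }

# Hypothetical strengths in MPa (placeholders, not values from Table I).
observed = [61.0, 63.0, 65.0, 67.0]
predicted = [61.5, 62.5, 65.5, 66.5]
metrics = regression_metrics(observed, predicted)
print({k: v for k, v in metrics.items() if k != "residuals"})
```

The returned residuals (observed minus predicted) are the quantities a residual plot would show. Points near zero indicate a close fit.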

E. Residual Analysis, Stress/Strain in FA-Based Composites
Residual statistical analysis is a method to check the reliability of the built ML models for their potential adoption by the scientific community. The compressive strength results of the prediction analysis were compared according to their calculated residuals, which fell within a very narrow range of ±1, indicating the high precision of the supervised models, as shown in Figure 6. The closer the residual points are to the zero line, the better the model accuracy, indicating very low statistical errors. Thus, it seems that the MGSVM outperformed the SLR trained model, as its points lie closer to the zero line in the residual analysis. Moreover, the residuals of the testing results were almost identical to those obtained from the training datasets, which holds only when the required compression of the AACM composites is for samples with FA/S mixing ratios in the range 0-30%. Extrapolation to FA/S ratios beyond 30% may be possible if a much larger selection of data points is available. This can be achieved through in-between randomization, which leverages the inevitable experimental errors to create dataset inputs correlated to the same output ranges and yields much larger training datasets. However, supervised ML models are ideal when analyzing inputs within the same experimental dataset constraints, ensuring the correctness of the results without the need to conduct the experimental work. The average stress and strain values were estimated using stress-strain relationships and elasticity laws for various FA/S ratios, confirming patterns that follow the non-linear behavior of a third-order polynomial for stress and a second-order polynomial for strain, as shown in Figure 7(A).
Furthermore, the approximate impact of high compression on the designed FA-based composites, or of the stretch after the applied force is released, was determined as changes in the matrix length from the Hooke's law analysis shown in Figure 7(B). However, both stress and strain were found to follow a linear pattern when plotted against the compressive strength observed for the various FA/S ratios, as shown in Figures 7(C) and (D), respectively. This indicates the possibility of directly applying common stress-strain relationships to the designed composites for further compression investigations.
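The length-change estimate can be illustrated with the one-dimensional form of Hooke's law, where strain equals stress divided by the elastic modulus and the change in length equals strain times the original length. The sketch below is only an illustration: the elastic modulus and specimen length are hypothetical values that are not given in the text, and the function name is invented.

```python
def hooke_length_change(stress_mpa, modulus_mpa, length_mm):
    """Linear-elastic estimate: strain = stress / E, delta_L = strain * L0."""
    strain = stress_mpa / modulus_mpa
    return strain, strain * length_mm

# Hypothetical inputs: a 67.5 MPa stress on a 50 mm specimen with an
# assumed elastic modulus of 30 GPa (30000 MPa).
strain, delta_l = hooke_length_change(67.5, 30000.0, 50.0)
print(f"strain = {strain:.5f}, length change = {delta_l:.4f} mm")
```

Because the relation is linear in stress, this form matches the linear stress and strain trends against compressive strength reported above. It would not by itself reproduce the non-linear dependence on FA/S.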

F. Novelty, Contribution, and Significance of the Results
Several ML models were used to predict the mechanical properties of FA-based AACMs. The two presented models, SLR and MGSVM, showed a high prediction accuracy of ~95%. A maximum compressive strength of ~67.5MPa was observed at ~20% FA/S (spline interpolation), which suggests the attainment of high mechanical stability. An FA/S ratio greater than 30% indicates a high probability of recovering the original strength of 61MPa of pristine AACMs. This analysis showed the promise of using FA for a sustainable full replacement of OPC in concrete, validated by SLR and MGSVM. The non-linear patterns of the observed stress and strain against FA/S ratios were confirmed and linked to the pozzolanic properties of FA, which facilitate interactions with Ca(OH)₂ for aggregation. A novel theoretical analysis was suggested to investigate changes in the mechanical properties of various FA-based compositions and the impact of FA pozzolanic properties on particle aggregation, to provide generalized design models for correlating mix proportions to optimum compressive strength.
IV. CONCLUSION
This study presented two high-accuracy predictive models built from collected experimental datasets on the compressive strength of FA-based AACMs. The applied ML analysis was conducted using a supervised training and testing procedure on fly ash (FA) to fine aggregate (S) ratios of 0, 5, 10, 15, 20, 25, and 30%. The analysis showed the promise of using FA as a sustainable full replacement of OPC in concrete and was validated with SLR and MGSVM regression trainers for 28-day steam-cured samples. This study aimed to evaluate the most favorable mixture designs that would improve the mechanical stability and maximum compressive strength of AACMs.
The built models predicted the mechanical properties of FA-based AACMs with high prediction accuracy (~95%), with the MGSVM regressions outperforming the SLR-trained models, as their points lie closer to the zero line in the residual analysis, which translates into minimum statistical errors. The concept of "in-between randomization" was applied by taking advantage of inevitable experimental errors to expand the raw datasets 5-fold and obtain a better ML analysis, using strength as the output and mix ratios as the input. The results revealed a maximum compressive strength of ~67.5MPa at ~20% FA/S, obtained from data fitting using trained models, suggesting an optimal ratio for an economically desirable compressive strength threshold and the attainment of high mechanical stability of the AACMs. The observed average stress and strain followed non-linear patterns against FA/S ratios but linear patterns against strength, indicating the applicability of stress-strain relationships and elasticity laws to the built composites. The pozzolanic properties of FA, which facilitate interaction with Ca(OH)₂ for aggregation, were linked to the non-linear relationships. Furthermore, the approximate impact of high compression on the designed composites was estimated from the changes in the matrix length obtained from Hooke's law. This study suggests a novel theoretical analysis to investigate the changes in the mechanical properties of various FA-based compositions and the impact of FA pozzolanic properties on particle aggregation. Such works could offer generalized design models for the optimum compressive strengths needed in engineering construction applications.