Abstract

This study aimed to develop accurate models for estimating the compressive strength (CS) of concrete using a combination of experimental testing and different machine learning (ML) approaches: baseline regression models, boosting models, bagging models, tree-based ensemble models, and average voting regression (VR). The research utilized an extensive experimental dataset with 14 input variables, including cement, limestone powder, fly ash, ground granulated blast furnace slag, silica fume, rice husk ash, marble powder, brick powder, coarse aggregate, fine aggregate, recycled coarse aggregate, water, superplasticizer, and voids in mineral aggregate. To evaluate the performance of each ML model, five metrics were used: mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), coefficient of determination (R2-score), and relative root mean squared error (RRMSE). The comparative analysis revealed that the VR model exhibited the highest effectiveness, displaying a strong correlation between actual and estimated outcomes. The boosting, bagging, and VR models achieved impressive R2-scores in the range of 86.69%–92.43%, with MAE ranging from 3.87 to 4.87, MSE from 21.74 to 38.37, RMSE from 4.66 to 4.87, and RRMSE between 8% and 11%. Notably, the VR model outperformed all other models with the highest R2-score (92.43%) and the lowest error rate. The developed models demonstrated excellent generalization and prediction capabilities, providing valuable tools for practitioners, researchers, and designers to efficiently evaluate the CS of concrete. By mitigating environmental vulnerabilities and associated impacts, this research can contribute significantly to enhancing the quality and sustainability of concrete construction practices.

1. Introduction

Machine learning (ML) has emerged as a transformative tool in civil engineering [1–3], offering promising avenues for advancing prediction and analysis within diverse domains. Its integration into civil engineering practices holds the potential to augment predictability and cost-effectiveness by reducing the dependence on resource-intensive real-time experimentation.

A significant application of ML in civil engineering is the prediction of compressive strength (CS) in concrete [4]. The CS of concrete, a crucial factor for ensuring structural integrity in constructions like buildings and bridges, traditionally involves time-consuming experimental testing. The advent of ML presents an opportunity to create models capable of precise CS estimation, thereby streamlining the assessment process.

Concrete, valued for its strength, durability, and adaptability, serves as a cornerstone in construction. The CS of concrete directly influences its performance, making accurate predictions essential. Traditional methods are resource-intensive, prompting the exploration of ML models for efficient and cost-effective CS assessment. This study focuses on applying various ML models to experimental data involving concrete and industrial byproducts to forecast concrete strength.

To contextualize the study, recent research by Nguyen-Sy et al. [5] applied extreme gradient boosting regression (XGB), artificial neural networks (ANN), and support vector machines (SVM) to predict uniaxial compressive strength in concrete, revealing XGB’s superior performance. Additionally, studies on sugarcane bagasse ash [6–8] demonstrated the effectiveness of ML models.

The research landscape further expands with investigations into waste marble powder [9], self-compacting concrete [10], lightweight fiber-reinforced concrete [11, 12], and recycled aggregate cement [13], showcasing the versatility of ML in diverse concrete compositions. Studies on foamed concrete [14], high-performance concrete [15], and steelmaking slag concrete [16] underscore the effectiveness of ML models, with outcomes shaping future applications. In addition, ANN was employed to predict the CS of concrete mixed with high volumes of fly ash (FA) [17]. Likewise, for concrete containing industrial waste materials such as ground granulated blast-furnace slag (GGBFS) and FA, evolutionary learning algorithms such as particle swarm optimization (PSO) and the genetic algorithm (GA) were employed with the support vector regression (SVR) model as the objective function; the resulting SVR–PSO and SVR–GA models [18] were applied to experimental data to forecast the CS of this type of concrete. Moreover, concrete containing FA [19] was modeled with ANN and fuzzy logic for CS prediction.

In another study involving concrete with FA [20], researchers evaluated experimental data using various ML models. The bagging model (BAM) emerged with a higher coefficient of determination (R2) compared to gene expression programming, ANN, and decision trees (DT). ML-based approaches were also applied to concrete mixes containing GGBFS [21] to predict their CS.

Furthermore, research on geopolymer concrete [22], cement with metakaolin [23], and supplementary cementitious materials [24] reflects the continuous evolution of methodologies. Notably, studies incorporating rice husk ash (RHA) [25–27] demonstrate stacking-based ensemble learning, ANN, and other ML models for predicting concrete CS.

The present study focuses on a unique approach by combining various types of waste materials with cement to create concrete composites. The waste materials include limestone powder (LP), FA, ground granulated blast furnace slag (GGBS), silica fume (SF), RHA, marble powder (MP), recycled coarse aggregate (RCA), superplasticizer (SP), voids in mineral aggregate (VMA), and brick powder (BP). These materials are combined with the conventional components of concrete, namely coarse aggregate (CA), fine aggregate (Fa), and water (W).

The presented research makes a significant contribution to sustainability efforts through its innovative approach to concrete composite production. By incorporating an extensive array of waste materials, including LP, FA, GGBS, SF, RHA, MP, RCA, SP, VMA, and BP, into the concrete mix, the study not only addresses environmental concerns related to waste disposal but also reduces reliance on traditional raw materials. This inclusive approach aligns with sustainable practices by promoting the reuse of industrial byproducts and minimizing the environmental impact associated with waste generation.

The experimental results were obtained by conducting extensive experimentation to assess the performance of these different combinations. The experimental results were then modeled with a comprehensive set of regression models. The study employed four baseline regression models (BRM), namely linear regression (LR), SVM, k-nearest neighbors (KNN), and DT. It additionally explored two boosting-based models, light gradient boosting machine (LGBM) and XGB, as well as tree-based ensemble models such as random forest (RF) and extra trees regression (ETR). Moreover, this study incorporated both an ETR-based bagging model (BagETR) and an XGB-based bagging model (BagXGB). Furthermore, average voting regression (AVR) models were used in the analysis.

Additionally, the research contributes to sustainability by leveraging advanced regression and ensemble models to predict the CS of the resulting concrete composites. The utilization of these modeling approaches demonstrated a commitment to enhancing predictive accuracy and efficiency. By optimizing concrete formulations through sophisticated modeling techniques, the study aims to improve the overall sustainability and performance of concrete in construction applications.

This current research stands out from the related works mentioned previously, as it not only considers a wider range of waste materials in the concrete mix but also employs a diverse set of regression and ensemble models to predict the CS of the resulting concrete composites. By combining various waste materials with traditional concrete components and leveraging multiple advanced modeling techniques, this study aims to explore innovative ways to enhance the sustainability and performance of concrete in construction applications.

2. Materials and Methods

The primary objective of this study is to determine the CS of concrete using artificial intelligence methods, offering an efficient and cost-effective approach compared to extensive empirical measurements. In this section, we outline the methods utilized to predict the CS of concrete, the dataset, and the testing procedures. This work employed the following artificial intelligence techniques: BRM, boosting models (BM), BAM, and tree-based ensemble models (TAEM).

2.1. Dataset

The initial dataset used in this study consists of 223 experimental compositions of concrete. The CS of the concrete samples ranged from 10 to 120 MPa. The dataset includes the following features: cement (C), LP, FA, GGBS, SF, RHA, MP, BP, CA, fine aggregate (Fa), RCA, W, SP, and VMA.

2.2. Data Collection and Testing Procedures

The data collection process involved testing concrete cube samples of size 150 × 150 × 150 mm. The samples were tested in a universal testing machine with a capacity of 100 T, adhering to the recommendations of the Bureau of Indian Standards (IS 516 (1959)).

The experimental data were compiled in three phases. In the first phase, experiments were carried out on combinations of cement, CA, Fa, RCA, and W. In the second phase, MP, brick powder, CA, Fa, and W were added to cement. In the third phase, LP, Fa, GGBS, SP, RHA, VMA, CA, and W were combined with cement. By employing the described artificial intelligence methods and utilizing the comprehensive dataset obtained from the experimental tests, we aim to accurately determine the CS of concrete. This approach offers a valuable alternative for assessing the mechanical properties of the material, capturing both linear and nonlinear relationships between the input features and the predicted values. The statistical characteristics of the dataset are shown in Table 1.

The dataset consists of 223 instances with 15 features (as reported in Table 1), in which CS is the dependent feature and the remaining 14 are independent variables.

From the dataset, it is observed that 11 of the features/materials were unused in many mixes, as reported in Table 2. In such cases, the value for the corresponding material is set to zero to indicate that the material was unused. This makes the dataset noisy [28–30] and gives it behavior close to that of a sparse matrix, as the sketch below illustrates.
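As a quick illustration, the sparsity statistics discussed later in Section 4 can be reproduced directly from the feature matrix. The following Python sketch uses a hypothetical stand-in for the 223 × 14 matrix of independent features (the real data are available from the authors); only the zero-counting logic reflects the paper's procedure:

```python
import numpy as np

# Hypothetical stand-in for the 223 x 14 matrix of mix-proportion features;
# as in the real dataset, unused materials are recorded as 0.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 400.0, size=(223, 14))
X[rng.random(X.shape) < 0.5] = 0.0

zero_frac_per_feature = (X == 0).mean(axis=0)  # per-material share of zeros
overall_zero_frac = (X == 0).mean()            # ~56% overall in the real data
print(zero_frac_per_feature.round(2), overall_zero_frac.round(4))
```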

3. ML Modeling and Performance Evaluation

3.1. Proposed Model Architecture

The proposed model consists of three phases: first, data scaling; second, data splitting; and finally, model training, performance evaluation, and comparison analysis. The first two phases, scaling and splitting, are common to all the ML models. Figure 1 shows the architecture for the base regression models: the LR, SVR, KNN, and DT models are trained, each trained model produces predictions on the test data, and the evaluation results for each model are computed from the predicted and actual values.

Figure 2 shows the architecture for the boosting, tree-based ensemble, and bagging models. In Figure 2, the LGBM, XGB, RF, ETR, BagETR, and BagXGB models are trained, each trained model produces predictions on the test data, and the evaluation results for each model are computed from the predicted and actual values.

Figure 3 shows the architecture of the proposed max voting regression (MVR) model. The three best-performing models, ETR, BagETR, and BagXGB, were integrated to develop the voting model: objects of the three models are combined, the MVR model is trained and then tested, and its predictions on the test data are evaluated against the actual values. The AVR takes the average outcome of all these models as the final result.

3.1.1. Scaling

Feature standardization, or scaling, is an important step in ML modeling since it puts all features on an equal footing by transforming their values into the same range. This helps the model assign equal weight to all features of the dataset during the learning process. Standardization first computes the mean of each feature (Equation (1)) and its standard deviation (Equation (2)), and then produces the scaled feature values (Equation (3)):

\[ \mu_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \tag{1} \]

\[ \sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( x_{ij} - \mu_j \right)^2}, \tag{2} \]

\[ x'_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}, \tag{3} \]

where \(n\) indicates the number of instances and \(x_{ij}\) the value of feature \(j\) for instance \(i\).
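A minimal sketch of this standardization step, assuming scikit-learn and reusing the feature matrix `X` from the earlier sketch:

```python
from sklearn.preprocessing import StandardScaler

# StandardScaler computes the per-feature mean (Equation (1)) and standard
# deviation (Equation (2)), then applies the transform of Equation (3).
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)      # x' = (x - mu) / sigma, per feature
print(X_scaled.mean(axis=0).round(6))   # ~0 for every feature after scaling
```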

3.1.2. Data Splitting

The dataset was divided into a training set of 189 instances and a test set of 34 instances, an 85%/15% train–test split. The training set was further partitioned into the 189 × 14 matrix of independent features and the 189 × 1 vector of the dependent feature, referred to as the training data for the independent and dependent features, respectively. Similarly, the test set was split into the 34 × 14 matrix of independent features and the 34 × 1 vector of the dependent feature, referred to as the testing data for the independent and dependent features, respectively.
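Assuming a target vector of measured CS values, this split can be reproduced with scikit-learn; the target values and the `random_state` below are illustrative assumptions:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical CS targets in the 10-120 MPa range reported for the dataset.
y = np.random.default_rng(1).uniform(10.0, 120.0, size=223)

# With 223 instances, test_size=0.15 yields exactly 34 test and 189 training
# instances, matching the split described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.15, random_state=42)
print(len(X_train), len(X_test))  # 189 34
```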

3.1.3. ML Models

In view of the nature of the dataset [17–19], this work considers four different categories of regression models, namely BRM, BM, BAM, and TAEM, as described below:

(1) BRM. The baseline regression models, namely LR, SVR, KNN, and DT, were applied to the dataset, and their performance was evaluated through five metrics. These models provide a straightforward approach for capturing the relationships between the input features and the predicted output. The structure of each model is as follows (a sketch of their instantiation is given after the list):

(1) LR establishes a linear relationship between the waste materials (LP, FA, GGBS, SF, RHA, MP, RCA, SP, VMA, BP) and the CS of the concrete composites. The coefficients and intercept are determined during training to predict the output CS.

(2) SVM fits a hyperplane to the data points in a high-dimensional space. It employs a kernel function to map the input features into a higher dimensional space, making it effective for capturing nonlinear relationships between the waste materials and CS.

(3) KNN operates on the similarity between data points in feature space, predicting the CS of a mix from its k-nearest neighbors and thereby capturing local relationships between the waste materials and CS. The number of neighbors (k = 3) and the distance metric were important considerations during implementation.

(4) DT comprises a tree-like model of decisions, where each internal node represents a decision based on a feature, each branch represents the outcome of the decision, and each leaf node holds the predicted CS. It provides a clear representation of the decision rules.
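A minimal sketch of the four baseline models with scikit-learn, reusing the train/test arrays from Section 3.1.2; only k = 3 is stated in the text, and the remaining settings are library defaults:

```python
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

baselines = {
    "LR":  LinearRegression(),
    "SVM": SVR(),                                  # kernelized regression (RBF default)
    "KNN": KNeighborsRegressor(n_neighbors=3),     # k = 3, as stated in the text
    "DT":  DecisionTreeRegressor(random_state=0),  # random_state is an assumption
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 4))  # R2-score on test data
```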

(2) BM. Utilizing the concept of boosting, this category combines weak learners to create a powerful predictive model. Boosting is an ensemble learning approach used to improve the performance of ML models: it trains a specified set of weak learners, each of which tries to improve efficiency by correcting the incorrect predictions of the previous model, and finally takes the weighted average as the final output. Moreover, BMs [1] handle noisy data, missing values, and zeros effectively and attain high accuracy; as noted, the dataset presented in Table 1 is close to a sparse matrix in nature. Therefore, the proposed work uses the LGBM and XGB models. LGBM is faster and consumes less memory but is prone to overfitting; the regularization methods in XGB prevent overfitting, so it obtains more accurate results than LGBM. XGB uses default parameters such as a learning rate of 0.1, a number of estimators of 100, a random state of 0, and a maximum tree depth of 3. LGBM employed a maximum tree depth of -1, with the remaining parameters the same as XGB. A sketch of both configurations follows.
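The parameter values below are those quoted in the text; everything else stays at the library defaults of the xgboost and lightgbm packages:

```python
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

# XGB: learning rate 0.1, 100 estimators, random state 0, max tree depth 3.
xgb = XGBRegressor(learning_rate=0.1, n_estimators=100,
                   max_depth=3, random_state=0)

# LGBM: same settings except max_depth = -1, i.e., no depth limit.
lgbm = LGBMRegressor(learning_rate=0.1, n_estimators=100,
                     max_depth=-1, random_state=0)

for model in (xgb, lgbm):
    model.fit(X_train, y_train)
```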

(3) TAEM. These models, random forest regression (RFR) and extra trees regression (ETR), leverage DTs to create ensembles that improve prediction accuracy by combining multiple base trees. Both models build multiple DTs and aggregate their results into a final prediction. ETR randomly chooses the splitting point and considers all the features and data, which makes it less sensitive to noise than RFR; for this reason, ETR can outperform RFR on noisy datasets. This work therefore employed ETR in view of the near-sparse nature of the dataset. The number of estimators was set to 100 and the random state to 1 for both RF and ETR, as in the sketch below.
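A sketch of the two tree-based ensembles with the stated parameters, assuming scikit-learn:

```python
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

# Both ensembles use 100 trees and random state 1, as stated above.
rf = RandomForestRegressor(n_estimators=100, random_state=1)
etr = ExtraTreesRegressor(n_estimators=100, random_state=1)

rf.fit(X_train, y_train)
etr.fit(X_train, y_train)
```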

(4) BAM. Based on the concept of bagging, this model generates multiple bootstrap samples and averages their predictions to enhance accuracy. Bagging is also an ensemble method in which the training data are divided into different subsets, multiple models are trained independently on those subsets, and the combination of their predictions is taken as the final prediction. Bagging can be built with any ML model; hence, to increase effectiveness and prediction score, the proposed work employed ETR and XGB as base learners for bagging, yielding two more models, BagETR and BagXGB. These models used 10 estimators with bootstrap sampling and a fixed random state, as in the sketch below.
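A sketch of BagETR and BagXGB under these settings; the base-learner configurations and the `random_state` values are assumptions where the text is not explicit:

```python
from sklearn.ensemble import BaggingRegressor, ExtraTreesRegressor
from xgboost import XGBRegressor

# Each bagging model trains 10 copies of its base learner on bootstrap samples.
# `estimator=` is the scikit-learn >= 1.2 keyword (older releases used
# base_estimator).
bag_etr = BaggingRegressor(
    estimator=ExtraTreesRegressor(n_estimators=100, random_state=1),
    n_estimators=10, bootstrap=True, random_state=1)
bag_xgb = BaggingRegressor(
    estimator=XGBRegressor(learning_rate=0.1, n_estimators=100,
                           max_depth=3, random_state=0),
    n_estimators=10, bootstrap=True, random_state=1)

bag_etr.fit(X_train, y_train)
bag_xgb.fit(X_train, y_train)
```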

(5) VR. The voting regressor is an ensemble meta-regression model that takes several unfitted regressors and fits them on the training data. The predictions of all the regressors are then collected and combined into the final prediction; in this work, as described for the AVR above, the average of the regressor outcomes is used. As reported in Figure 3, the three best-performing models, ETR, BagETR, and BagXGB, were used to develop the voting regressor model sketched below.
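A sketch of the voting ensemble built from the three configurations above; scikit-learn's VotingRegressor refits clones of the supplied estimators and averages their outputs, matching the averaging behavior described in the text:

```python
from sklearn.ensemble import VotingRegressor

# Combine the three best performers; predictions are the unweighted average.
voter = VotingRegressor(estimators=[
    ("etr", etr), ("bag_etr", bag_etr), ("bag_xgb", bag_xgb)])
voter.fit(X_train, y_train)
cs_pred = voter.predict(X_test)  # averaged CS predictions on the test set
```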

3.1.4. Evaluation Metrics

The performance metrics mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R2-score, and relative root mean squared error (RRMSE) were employed to evaluate, compare, and analyze the performance of all the models.

(1) MAE. The mean absolute error is the mean of the absolute error values, where each error is the difference between the actual and predicted CS. The error value for each instance is calculated as in Equation (4), and the MAE is the sum of all error values divided by the number of instances, as in Equation (5):

\[ e_i = \left| y_i - \hat{y}_i \right|, \tag{4} \]

\[ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} e_i, \tag{5} \]

where \(y_i\) and \(\hat{y}_i\) are the actual and predicted CS for instance \(i\), \(e_i\) is the error value for instance \(i\), and \(n\) is the number of instances in the dataset.

(2) MSE. The mean squared error, as in Equation (6), measures the average squared distance between the actual (\(y_i\)) and predicted (\(\hat{y}_i\)) values; put another way, it is the sum of the squared error values over all instances divided by the number of instances \(n\):

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2. \tag{6} \]

(3) RMSE. Also known as the root mean square deviation or root mean squared error of prediction, it indicates how closely the predicted values (\(\hat{y}_i\)) cluster around the actual values (\(y_i\)). Equation (7) shows the mathematical formula for RMSE:

\[ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}. \tag{7} \]

(4) Coefficient of Determination or R2-Score. Also known as the model score, it lies between 0 and 1; a score close to 1 indicates that the model predicts accurately with a low error rate. Equation (8) shows the mathematical formula:

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}, \tag{8} \]

where \( \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i \) is the mean of the actual values.

(5) RRMSE. The relative root mean squared error measures the performance of a model in terms of percentages. A model is rated excellent when its RRMSE is below 10%, good between 10% and 20%, fair between 20% and 30%, and poor above 30%. Equation (9) shows the calculation of RRMSE as the RMSE normalized by the mean of the actual values:

\[ \mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%. \tag{9} \]
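All five metrics (Equations (4)–(9)) can be computed in a few lines; this sketch reuses the fitted voting model and test data from the earlier sketches, and the RRMSE normalization follows Equation (9) above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    mae = mean_absolute_error(y_true, y_pred)   # Equation (5)
    mse = mean_squared_error(y_true, y_pred)    # Equation (6)
    rmse = np.sqrt(mse)                         # Equation (7)
    r2 = r2_score(y_true, y_pred)               # Equation (8)
    rrmse = 100.0 * rmse / np.mean(y_true)      # Equation (9), in percent
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2, "RRMSE": rrmse}

print(evaluate(y_test, cs_pred))
```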

4. Comparison Analysis and Results

The results reported in Table 2 show that the base regression models of Figure 1 underperformed due to the near-sparse nature of the dataset. As shown in Table 1, four features/materials (“Recycled Coarse Aggregate,” “Brick Powder,” “Marble Powder,” and “RHA”) consist of 90%–95% zeros; another four (“Silica Fume,” “Limestone Powder,” “VMA,” and “GGBS”) have between 80% and 85% zeros; two materials (“Fly Ash” and “SP”) have 50%–55% zeros; and “Coarse Aggregate” contains 2.7% zeros. In general, a dataset whose entries are 66%–67% zeros overall is treated as sparse; this dataset consists of roughly 56.04% zero values overall, placing it close to sparse-matrix statistics. In turn, this phenomenon reduced the performance of the standard regression models. Table 2 shows the results of all regression models through five evaluation metrics: MAE, MSE, RMSE, R2-score, and RRMSE.

The BRM, such as DT, SVM, and LR, secured R2-scores of 54.89%, 58.61%, and 69.96%, respectively. This indicates that the base regression models are unable to handle the sparse nature of the dataset and hence achieved lower prediction performance. However, KNN obtained an R2-score of 86%, a reasonably good prediction performance and the highest among the baseline models.

Furthermore, our choice of ensemble models was guided by their inherent capability to adeptly handle missing and noisy data, addressing the challenges posed by the dataset outlined in Table 1. The observed performance of boosting, bagging, and TAEM surpassed that of the BRM, validating their effectiveness in mitigating the impact of noise on predictions.

Notably, the proposed ensemble-based MVR model, detailed in Figure 3, emerges as the top-performing model, as corroborated by the comprehensive results presented in Table 2. Figures 4–9 provide an insightful comparative analysis of the predicted CS against the actual concrete strength across the various regression models.

Figure 4 highlights the distribution of predictions for LR, SVM, and MVR, while Figure 5 showcases the spread of predictions for KNN, LGBM, and MVR. In addition, Figure 6 visualizes the spread of predicted values around the actual CS for DT, RFR, and MVR, while Figure 7 delineates the predictions of XGB, ETR, and MVR. Further insights are provided by Figure 8, displaying the predictions of BagXGB, BagETR, and MVR, and Figure 9, illustrating the distribution of the MVR predictions around the actual CS.

The BMs LGBM and XGB secured R2-scores of 86.69% and 87.87%, respectively. However, the tree-based ensemble models RF and ETR performed better, obtaining R2-scores of 91.65% and 92.05%, respectively. This indicates that the ensemble-based models outperform both the base and boosting regression models.

However, the BagETR and BagXGB models obtained even better results, as shown in Table 3. BagXGB achieved an R2-score of 92.18% with an MAE of 3.62 and an RMSE of 4.74; its RRMSE of 8% indicates excellent model performance. Furthermore, the AVR outperforms all the models with an R2-score of 92.43%, an MAE of 3.42, an MSE of 21.74, an RMSE of 4.66, and an RRMSE of 8%, placing it in the excellent category with a lower error rate than any other model.

4.1. The Performance of ML Models on Augmented Dataset

The extended evaluation of our regression models on the augmented dataset aimed to assess the generalizability of our approach, yielding promising results detailed in Table 4. In this comprehensive testing phase, the voting model exhibited superior performance compared to the other models, showcasing a low error rate, satisfactory model performance (as indicated by RRMSE), and an R2-score of 88.92% on the augmented dataset. The predictive performance of the voting model is illustrated in Figures 10–15, where the predicted values closely align with the actual values. It is noteworthy that the augmentation process, which integrated our original experimental data (223 instances) with additional instances (59) from the literature [31–34], presented challenges due to disparities in experimentation properties. Despite these variations, the model demonstrated a commendable level of performance. We acknowledge the inherent impact of differences in experimental conditions on model outcomes and recognize the need for careful consideration in future studies.

5. Conclusion and Future Scope

In conclusion, this study adeptly harnesses a diverse array of ML models to predict concrete strength, effectively navigating the challenges presented by a noisy and sparse dataset while incorporating various waste materials. The robustness of the proposed models was further tested on an augmented dataset, integrating our original experimental data (223 instances) with additional instances (59) from the literature [31–37]. Despite the inherent variations in experimentation properties, our models exhibited a commendable level of performance.

The key findings emphasize the substantial correlation achieved by the AVR model between predicted and actual concrete strength values.

Moving forward, it is vital to acknowledge that the constraints posed by the limited dataset size have influenced our ability to fully utilize advanced modeling approaches. This limitation underscores the need for caution when applying our developed models to larger datasets and highlights the importance of securing more extensive and diversified datasets in future research endeavors.

The future scope of our work extends to exploring advanced ML techniques, with a specific focus on integrating deep learning models. By doing so, we aim to overcome the dataset size limitation and further refine the accuracy of concrete strength predictions. This forward-looking initiative will involve a dedicated commitment to collecting comprehensive datasets that encompass a wider range of concrete compositions, addressing challenges related to sparsity and noise, and ensuring a more representative sample for robust model development.

By incorporating these challenges into our future scope, we aim to not only improve the forecasting tool’s accuracy but also contribute to the sustainable evolution of concrete strength prediction methodologies in the field of civil engineering.

Data Availability

The datasets used in this research are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Authors’ Contributions

Conceptualization was done by Lakshmana Rao Kalabarige, Jayaprakash Sridhar, Ravindran Gobinath; Methodology was done by Ravindran Gobinath, Sivaramakrishnan Subbaram, Palaniappan Prasath; Writing—review and editing was done by Lakshmana Rao Kalabarige, Jayaprakash Sridhar, Ravindran Gobinath; Visualization was done by Palaniappan Prasath, Sivaramakrishnan Subbaram; Data curation was done by Ravindran Gobinath, Lakshmana Rao Kalabarige; Formal analysis was done by Lakshmana Rao Kalabarige, Jayaprakash Sridhar; Project administration was done by Jayaprakash Sridhar; Software-related task was done by Lakshmana Rao Kalabarige; and Validation was done by Ravindran Gobinath, Lakshmana Rao Kalabarige.