Using artificial intelligence methods to predict the compressive strength of concrete containing sugarcane bagasse ash

Sugarcane bagasse ash (SCBA) is an agricultural and industrial waste produced in millions of tonnes annually. Although it has traditionally been used as a fertiliser or buried underground, incorporating it into concrete not only reduces cement consumption and addresses environmental concerns but can also enhance the mechanical properties of concrete. This research investigates the predictive capabilities of several artificial intelligence (AI) models, namely a radial basis function neural network (RBFNN), an artificial neural network (ANN), and support vector regression (SVR), for estimating the compressive strength of concrete containing SCBA. A dataset of 1819 data points was collected from previous studies, comprising three main groups: 340 data points for concrete with SCBA, 459 for concrete without supplementary cementitious materials (SCMs), and 1020 for concrete incorporating other SCMs. To evaluate the influence of the data on model performance, the dataset was used in three ways: models were first trained solely on SCBA data (Group A), then on SCBA data combined with concrete data without any SCMs (Group B), and finally on the entire dataset (Group C). Statistical analysis showed that the SVR model trained on Group A data and the RBFNN model trained on Group B data achieved the highest accuracy. Further refinement of the no-SCM data in Group B improved model performance, surpassing even that of the SVR model trained on Group A data.


Introduction
Concrete is one of the most popular materials in the construction industry owing to its water resistance, the wide availability of its components in nature, and its flexibility in terms of form and size [1]. It has been used in construction projects of all scales, from small domestic buildings to large structures such as dams and high-rise buildings [2]. The primary components of concrete are cement, natural sand, crushed stone, and water [3]. Owing to the high demand for buildings and infrastructure, concrete is the second most used material on Earth after water [1]. However, cement-based materials (CBMs), including concrete, contribute significantly to greenhouse gas emissions, particularly CO₂ emissions. Cement production generates CO₂ at several stages of the process, including the decarbonation (calcination) of limestone, fuel combustion in kilns, and electrical energy consumption [4].
To mitigate these emissions, supplementary cementitious materials (SCMs) such as agricultural wastes and industrial by-products (including rice husk ash, sugarcane bagasse ash, palm oil fuel ash, fly ash, slag, silica fume, and others) can be used. These materials can enhance the mechanical and durability properties of CBMs while promoting sustainability in construction [5]. Otherwise, disposing of these waste materials in landfills poses a significant environmental threat [1]. In summary, although concrete and other CBMs are widely used, their production contributes significantly to greenhouse gas emissions; incorporating SCMs in concrete can promote sustainability in construction while reducing this environmental impact.
One SCM that can be used in concrete as a partial replacement for cement is sugarcane bagasse ash (SCBA). SCBA is a residue of the sugar industry produced when bagasse (the fibrous material remaining after sugar is extracted from sugarcane) is burned [6]. The resulting ash is rich in amorphous silica and alumina, which makes it an excellent pozzolanic material that can partially substitute for cement in concrete [6,7]. The amount of silica in the ash varies with factors such as the type of soil on which the sugarcane is grown and the burning temperature [6]. Sugarcane is grown in many countries, with 15 countries (Brazil, Pakistan, India, Thailand, China, the United States, Argentina, Myanmar, Australia, Bangladesh, Cuba, Mexico, the Philippines, Colombia, and South Africa) accounting for the majority of production [8]. Global sugarcane production is estimated to exceed 1.50 billion tonnes annually, generating 400-500 million tonnes of bagasse waste [6]. Ashes from sugarcane biomass can be obtained from various streams, such as energy recovery and open-air burning, and their pozzolanic characteristics depend on the type of ash, its origin and processing, and the temperature reached during combustion [6].
SCBA has the potential to improve the workability of cementitious composites. Compared with the reference concrete (concrete without SCMs), Priya and Ragupathy [3] demonstrated that the use of SCBA in concrete leads to higher compaction factor values and improved workability. Hussein et al. [9] also explored the effect of SCBA as an SCM, at replacement levels of 5 % to 30 %, on the workability (slump) of fresh concrete. Other studies [10,11] have reported a decrease in workability with increasing SCBA content, which is attributed to water absorption by the SCBA particles. In the first of these studies [10], cement was replaced with SCBA at levels of 5 %, 10 %, 15 %, 20 %, 25 %, and 30 %, and the slump was measured. At 10 % substitution the slump increased by 17.3 %, whereas at 30 % substitution it decreased by more than 34 %, indicating that at higher SCBA levels the water demand increased and the slump decreased. In another study, Rao and Prabath [11] found that workability decreased when SCBA was used in concrete, which they attributed to the water absorption of SCBA. The influence of SCBA on concrete workability is therefore not consistent and depends on the chemical and physical properties of the material. In most cases, the slump decreased when SCBA was added to the mixture because of the porous, irregularly shaped SCBA particles [12,13]. In such cases, not only does workability decrease but the mechanical properties can also suffer, because the irregular particles increase water absorption, reduce the density of the concrete, and hinder the completion of the pozzolanic reaction [12][13][14].
The most effective way to enhance the chemical and physical properties of SCBA, and consequently the properties of concrete, is to apply post-processing methods such as thermal treatment, grinding, flotation, sieving, and chemical activation [15][16][17]. However, most post-processing methods for SCBA require energy and may result in additional pollutant emissions [15]. In some situations, simply adding a superplasticiser to the concrete mixture can suffice to mitigate the loss of workability [18]. The post-processing methods can be used alone or in combination. Combining them can increase the specific surface area of SCBA, which accelerates the pozzolanic reaction, and can raise the amount of reactive silica and alumina by reducing the loss on ignition (LOI) [9,19,20]. The choice of post-processing method depends on the properties of the SCBA, the aim being to improve those properties while minimising the energy consumed in treatment [21]. In some cases, sieving [22] or grinding [18,23] alone is sufficient. Although sieving and grinding consume energy, the environmental benefit of using SCBA can substantially outweigh the energy consumed, particularly when these processes are limited to a few minutes [15]. In other cases, depending on the properties of the SCBA, researchers have chosen reburning or a combination of grinding and reburning [5,24]. These studies [5,24] indicate that post-processing can be a very important part of using SCBA in concrete: with post-processing, both the compressive strength and the slump improved compared with reference samples and samples containing unprocessed SCBA. Nevertheless, [5] also showed that unprocessed SCBA could still have a positive effect on concrete properties. In addition, a few studies [3,9,25,26] do not mention any kind of post-processing when using SCBA as a partial cement replacement, yet their results show improvements in compressive strength, slump, or both. It can therefore be concluded that the most important factors in deciding whether SCBA needs post-processing, and which method to choose, are its quality and its chemical and physical properties. These properties depend mainly on the soil on which the sugarcane is cultivated and on the burning process and temperature. In general, because post-processing improves the physical and chemical properties of SCBA, it can further improve concrete properties relative to unprocessed SCBA, even when unprocessed SCBA already enhances them.
Incorporating SCBA into cement-based materials such as concrete can enhance their mechanical properties, especially the compressive strength (CS) of concrete [1]. However, the improvement in CS depends on several factors, including the concrete grade, water-to-binder ratio, SCBA content, curing time, and curing conditions [6]. The key to using SCBA in concrete is therefore to determine the appropriate SCBA content [3,10,[27][28][29]]. According to previous research [3,11,27], the optimal proportion of SCBA replacing cement in the mixture is usually between 5 % and 25 %.
Using a novel predictive technique that estimates the mechanical properties of concrete quickly and at lower cost reduces the need for sample preparation (casting and testing concrete specimens) and therefore reduces material consumption. This is a small but significant step towards addressing environmental concerns. Because the method relies on existing data, it reduces the need to cast and crush specimens, a process that generates waste and is prone to experimental error. Implementing this method can therefore reduce environmental pressures in several ways.
In recent decades, there has been significant research into the use of artificial intelligence (AI) methods as predictive tools, and previous studies have shown that AI methods can help engineers and concrete technologists predict concrete strength. Golafshani et al. [30] used Harris Hawks Optimisation-based data-driven methods to predict the compressive strength of green concrete; their models used 1374 data points, with compressive strength as the output. Thi Mi et al. [31] employed three machine learning models, namely Decision Tree, Light Gradient Boosting Machine, and Extreme Gradient Boosting (XGBoost), to predict the compressive strength of fibre-reinforced self-compacting concrete, with XGBoost giving the best results. Another study [32] investigated the predictive performance of three AI methods for fly ash-based geopolymer concrete, namely the Radial Basis Function Neural Network (RBFNN), Group Method of Data Handling (GMDH), and Artificial Neural Network (ANN), with RBFNN performing best. Samin et al. [3] explored machine learning for predicting the compressive and tensile strength of concrete, while Jibril et al. [33] implemented classical regression and nonlinear computing models, including the Elman Neural Network (ENN), Support Vector Machine (SVM), Multilinear Regression (MLR), and Feed-Forward Neural Network (FFNN), to predict the compressive strength of high-performance concrete. Numerous other studies have investigated AI methods and machine learning models for predicting and optimising the mechanical properties of concrete [34][35][36][37][38][39][40].
As few studies have addressed the prediction of the compressive strength of concrete containing SCBA, this study investigates the efficacy of various AI methods for this purpose. Previous studies that used AI methods for this task employed only 90-150 data points and examined compressive strength solely at 28 days [41][42][43]. In this study, the age of the concrete is included as an input parameter, allowing compressive strength to be predicted at various ages. Furthermore, data on 340 SCBA concrete samples were collected through a comprehensive literature review. In addition, 1479 indirect data points were gathered, covering concrete containing other types of SCMs and concrete without SCMs, and were introduced to the SVR, RBFNN, and ANN models. The indirect data are used to investigate how these two groups (concrete with SCMs other than SCBA and concrete without SCMs) affect model performance: the former dataset was used to assess the models' accuracy, and the latter to assess whether the accuracy and performance of the models could be further improved. Finally, the performance of the three models is compared to identify the most accurate and capable model for predicting the compressive strength of SCBA concrete. In Section 6, a multi-objective Genetic Algorithm (GA) is introduced to determine the optimal mix design of SCBA concrete using the developed machine learning model; the algorithm considers both the cost of the concrete and its compressive strength simultaneously. The model was further tested with higher target values of price and compressive strength to confirm that it operates effectively across a wider range of prices and compressive strengths.

Data collection
A critical step in any study is finding reliable sources and comprehensive literature from which to collect data. This involves reviewing many papers and studies to obtain accurate and credible information. In this study, a total of 340 SCBA concrete data points [5,9,10,14,18,[22][23][24][25][26][44][45][46][47][48][49][50][51][52] and 1479 indirect data points [53][54][55][56] were collected, then carefully analysed and evaluated. Of the indirect data, 459 data points contain no SCMs, while the remaining 1020 contain fly ash or slag as SCMs. In total, 1819 data points were collected and used as inputs for the models. The dataset was divided into three groups: Group A consists of the 340 SCBA concrete data points; Group B includes 799 data points, namely the 340 SCBA concrete data points plus 459 data points for concrete without any SCMs; and Group C comprises all 1819 data points, i.e. SCBA concrete, concrete without SCMs, and concrete containing other SCMs (fly ash and slag).
Each group of the dataset was then separately introduced to the models to investigate the impact of these groups on the results, which is extensively discussed in Section 5.
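The nested grouping described above can be expressed in a few lines. This is a minimal Python sketch under assumptions not stated in the text: each mix is a hypothetical dictionary of binder contents keyed `scba`, `fly_ash`, and `slag`, and no mix combines SCBA with fly ash or slag. It is illustrative only and is not the authors' data-handling code.

```python
from typing import Dict, List

# Hypothetical record layout: one dict of binder contents (kg/m^3) per mix.
Mix = Dict[str, float]


def build_groups(mixes: List[Mix]) -> Dict[str, List[Mix]]:
    """Return the nested data groups A, B and C described in the text."""
    scba = [m for m in mixes if m.get("scba", 0.0) > 0]
    # "Plain" mixes contain no SCM at all (no SCBA, fly ash or slag).
    plain = [
        m for m in mixes
        if all(m.get(k, 0.0) == 0 for k in ("scba", "fly_ash", "slag"))
    ]
    return {
        "A": scba,           # SCBA concrete only
        "B": scba + plain,   # SCBA concrete + concrete without SCMs
        "C": list(mixes),    # everything, including fly-ash and slag mixes
    }
```

With 340 SCBA mixes, 459 plain mixes and 1020 fly-ash or slag mixes, this yields groups of 340, 799 and 1819 records, matching the counts above.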
In addition to collecting data, identifying the main parameters to use as input variables is crucial. Previous research on the compressive strength and mechanical properties of concrete was therefore analysed to determine the model inputs. The input parameters are the weights of cement, water, SCBA, fly ash, slag, fine aggregate, and coarse aggregate, together with the age of the samples. Water, cement, coarse and fine aggregates, and SCMs are the main ingredients of concrete, and the amount of each has a significant effect on compressive strength, although some, such as cement, have a more pronounced effect than others. Another crucial parameter is the age of the concrete, i.e. the number of days between casting and testing, which plays a vital role in determining compressive strength. These parameters, which have been identified as significant factors affecting the compressive strength of concrete, were therefore selected as the input variables for the model designed to predict the compressive strength of concrete containing SCBA.
To depict the relationship between the input variables and the output, Fig. 1 displays the Pearson correlation coefficients. This metric measures the linear association between two variables and ranges from −1 to 1: the closer its absolute value is to 1, the stronger the linear relationship (positive or negative), whereas values close to 0 indicate a weak linear relationship. In addition, the Variance Inflation Factor (VIF) is used to evaluate multicollinearity among the input parameters, which act as independent variables in a regression analysis. VIF quantifies how much the variance of an estimated regression coefficient is inflated by multicollinearity, which occurs when two or more independent variables in a regression model are strongly correlated [57]. A high VIF for a given variable indicates a pronounced correlation with the other independent variables in the model. The VIF values for each parameter are provided in Table 1. In general, VIF values below 5 are considered acceptable, while values approaching 1 signify minimal multicollinearity [58]. Most parameters exhibit low to moderate multicollinearity; the first and fourth features show relatively higher multicollinearity than the others, but none of the VIF values is exceptionally high, indicating favourable conditions regarding multicollinearity within the dataset. The range of each parameter is shown in Fig. 2. For easier interpretation, Tables 2(a) and 2(b) give the value ranges of the data used in this study: Table 2(a) covers all 1819 data points, while Table 2(b) covers only the SCBA concrete data.
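The two screening statistics above can be computed directly. The sketch below, in Python with NumPy, is illustrative only: the function names are hypothetical, and it assumes VIF is computed in the usual way as 1/(1 − R_j²), where R_j² comes from an ordinary least-squares regression (with intercept) of input j on all the other inputs. The software actually used in the study is not stated.

```python
import numpy as np


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays (range -1 to 1)."""
    return float(np.corrcoef(x, y)[0, 1])


def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of X (rows = mixes, cols = inputs).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination of an
    ordinary least-squares fit (with intercept) of column j on all other columns.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out
```

For independent inputs the VIF values sit close to 1, while a column that is nearly a copy of another produces a very large VIF; applied to the eight input columns, values below about 5 would meet the acceptability threshold cited above.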
It is important to note that the only output of the models is the compressive strength of concrete.This means that the models are specifically designed to predict the compressive strength of concrete based on the input parameters mentioned earlier.

Methods
This section presents an overview of the models employed in this paper. It begins with the RBFNN model, covering its structure, advantages, adjustment parameters, and performance evaluation, followed by the Ant Colony Optimisation Algorithm (ACOA) and the rationale for its use in this research. Support Vector Regression (SVR) is then introduced from a mathematical perspective, followed by the firefly algorithm and its relevance to the study. Finally, the structure and functioning of artificial neural networks are reviewed.

Radial basis function neural network (RBFNN)
The Radial Basis Function Neural Network (RBFNN) is a type of artificial neural network that has been widely used in pattern recognition, function approximation, and time-series prediction.This network is composed of three layers: an input layer, a hidden layer, and an output layer (Fig. 3).The input layer receives the input data, and the hidden layer consists of radial basis functions that process the input data.The radial basis functions are mathematical functions that depend on the distance between the input data and a set of reference points called centroids.The output layer performs a linear combination of the hidden layer outputs to produce the network's final output.
The RBFNN has several advantages over other neural network architectures, such as fast learning and good generalisation. It is also relatively simple to implement and can be trained using a variety of optimisation algorithms, such as gradient descent. However, one of the main challenges in using the RBFNN is determining the optimal number of neurons and the width of the radial basis functions (Spread). These parameters can significantly affect the performance of the network, and their optimal values are often found by trial and error. Alternatively, an optimisation algorithm can be used to determine these key parameters; the optimal values can then be obtained efficiently within a short time, improving the accuracy and efficiency of the network.
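To make the roles of the number of neurons and the spread concrete, here is a minimal RBFNN sketch in Python with NumPy. It assumes a Gaussian basis of the form exp(−d²/(2·spread²)) (tools differ in how they scale the spread), centres drawn at random from the training inputs, and output weights fitted by linear least squares. It does not reproduce the training procedure used in the study, and all function names are hypothetical.

```python
import numpy as np


def rbf_layer(X: np.ndarray, centres: np.ndarray, spread: float) -> np.ndarray:
    """Hidden-layer outputs: a Gaussian of the distance from each input to each centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * spread ** 2))


def fit_rbfnn(X, y, n_neurons: int, spread: float, seed: int = 0):
    """Pick centres as a random subset of training inputs, then solve the
    linear output layer (weights plus a bias) by least squares."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=n_neurons, replace=False)]
    H = np.column_stack([rbf_layer(X, centres, spread), np.ones(len(X))])
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return centres, weights


def predict_rbfnn(X, centres, weights, spread: float) -> np.ndarray:
    """Output layer: a linear combination of hidden outputs plus a bias."""
    H = np.column_stack([rbf_layer(X, centres, spread), np.ones(len(X))])
    return H @ weights
```

Because only the output layer is linear in its parameters, the fit for a fixed number of neurons and spread reduces to a least-squares problem. This is why the search effort of an optimiser can be concentrated on those two hyperparameters.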
The Ant Colony Optimisation Algorithm (ACOA) was employed in this research because it is a powerful and precise method for solving optimisation problems. A concise explanation follows. ACOA is inspired by the foraging behaviour of ants. In nature, ants leave their nest and explore different directions in search of food, depositing a chemical substance called pheromone along the path between the nest and the food source. When an ant discovers food, the quality and quantity of the pheromone change [59], and other ants detect this change and follow the trail to the food source [60]. This indirect communication inspired ACOA, which mimics the ants' behaviour by creating pheromone trails in the search space and updating them according to the quality of the solutions found, allowing the algorithm to converge efficiently towards an optimal or near-optimal solution. The success of ACOA depends on balancing exploration and exploitation: the algorithm must explore the search space to find new solutions while also exploiting those already found to improve the overall quality of the solution.
In summary, ACOA is an optimisation algorithm that is based on the behaviour of ants in their search for food.It is designed to strike a balance between exploration and exploitation in order to find optimal solutions efficiently.
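The text does not specify which ACO variant was used. The adjustment parameters listed later for the hybrid model (number of ants, sample size, deviation-distance ratio) resemble those of the continuous ACO_R scheme of Socha and Dorigo, so the following Python sketch assumes that scheme: an archive of good solutions plays the role of the pheromone trail, ants sample new solutions from Gaussians centred on archived ones, and the archive keeps only the best. The function name, parameter names and defaults are illustrative assumptions.

```python
import numpy as np


def aco_r(cost, lower, upper, archive_size=10, n_ants=20, q=0.5, xi=0.85,
          max_iter=100, seed=0):
    """Minimise `cost` over a box using an ACO_R-style continuous ant colony."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Solution archive (the "pheromone" memory), kept sorted best-first.
    archive = rng.uniform(lower, upper, size=(archive_size, dim))
    costs = np.array([cost(s) for s in archive])
    order = np.argsort(costs)
    archive, costs = archive[order], costs[order]

    # Rank-based weights: q controls how strongly ants favour top-ranked solutions.
    ranks = np.arange(archive_size)
    weights = np.exp(-ranks ** 2 / (2.0 * (q * archive_size) ** 2))
    probs = weights / weights.sum()

    for _ in range(max_iter):
        # Step size per archived solution: xi (deviation-distance ratio) times
        # the mean absolute distance to the other archive members.
        spread = xi * np.abs(archive[:, None, :] - archive[None, :, :]).sum(axis=1) / (archive_size - 1)
        guides = rng.choice(archive_size, size=n_ants, p=probs)
        ants = np.clip(rng.normal(archive[guides], spread[guides]), lower, upper)
        ant_costs = np.array([cost(s) for s in ants])
        # Pheromone update: merge, then keep only the best archive_size solutions.
        archive = np.vstack([archive, ants])
        costs = np.concatenate([costs, ant_costs])
        order = np.argsort(costs)[:archive_size]
        archive, costs = archive[order], costs[order]

    return archive[0], float(costs[0])
```

In an RBFNN + ACOA setting, `cost` would train the network for a candidate (number of neurons, spread) pair, rounding the neuron count to an integer, and return a validation error; a simple quadratic can stand in for it when testing the optimiser itself.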

Support Vector regression (SVR)
Vapnik [61] introduced the Support Vector Machine (SVM), grounded in statistical learning theory, for classification problems. SVM was subsequently extended to regression in the form of Support Vector Regression (SVR) [61]. The basic idea of SVR is to map the input data into a higher-dimensional feature space using a nonlinear mapping function and then fit a linear function of minimal complexity in that space. Keeping this function as flat as possible reduces its complexity and improves generalisation over a larger range. SVR therefore provides a way to balance model complexity and accuracy. The SVR procedure is described as follows.

For convenience, a dataset is denoted as $D = \{(x_i, t_i),\ i = 1, \dots, n\}$, where $x_i$ and $t_i$ are the input vector and target value, respectively, and $n$ is the number of data points. The aim of SVR is to find a linear relation between the input vectors and the output variable:

$$f(x) = \langle w, x \rangle + b \tag{1}$$

where $w$ and $b$ are the weight vector and the bias, which are calculated by minimising the regularised risk function:

$$R(C) = C\sum_{i=1}^{n} L_\varepsilon\big(t_i, f(x_i)\big) + \frac{1}{2}\lVert w \rVert^2 \tag{2}$$

The constant $C$ is a weighting parameter that determines the trade-off between model flatness and empirical error. $L_\varepsilon$ is the $\varepsilon$-insensitive loss function defined by Vapnik [61]:

$$L_\varepsilon\big(t, f(x)\big) = \begin{cases} 0, & |t - f(x)| \le \varepsilon \\ |t - f(x)| - \varepsilon, & \text{otherwise} \end{cases} \tag{3}$$

By introducing two positive slack variables $\xi_i$ and $\xi_i^*$, the regression problem can be transformed into the following optimisation problem:

$$\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^*\right) \quad \text{subject to} \quad \begin{cases} t_i - f(x_i) \le \varepsilon + \xi_i \\ f(x_i) - t_i \le \varepsilon + \xi_i^* \\ \xi_i,\ \xi_i^* \ge 0 \end{cases} \tag{4}$$

This problem can be reformulated as a dual optimisation problem using a set of Lagrange multipliers $\alpha_i$ and $\alpha_i^*$. After solving it, the weight vector is given by

$$w = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right) x_i \tag{5}$$

and the bias is calculated as

$$b = \begin{cases} t_i - \langle w, x_i \rangle - \varepsilon, & 0 < \alpha_i < C \\ t_i - \langle w, x_i \rangle + \varepsilon, & 0 < \alpha_i^* < C \end{cases} \tag{6}$$

For a new input $x_{new}$, the output value is calculated as

$$f(x_{new}) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right)\langle x_i, x_{new} \rangle + b \tag{7}$$

where $\langle x_i, x_{new} \rangle$ is the inner product of the new observation and the $i$th training pattern.
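The ε-insensitive loss, the regularised risk and the dual-form prediction described above can be checked numerically. This Python sketch assumes the risk is written as C·ΣL_ε + ½‖w‖² (some texts scale the loss term by 1/n instead) and a linear model; the function names are hypothetical, and solving the dual problem itself is not shown.

```python
import numpy as np


def eps_insensitive_loss(t, f, eps):
    """Zero inside the eps-tube around the target, linear outside it."""
    return np.maximum(0.0, np.abs(np.asarray(t) - np.asarray(f)) - eps)


def regularised_risk(w, b, X, t, C, eps):
    """C * sum of eps-insensitive losses + 0.5 * ||w||^2 for the linear model f = Xw + b."""
    f = X @ w + b
    return float(C * eps_insensitive_loss(t, f, eps).sum() + 0.5 * w @ w)


def predict_dual(alpha, alpha_star, X_train, b, x_new):
    """Linear SVR prediction in dual form: sum_i (alpha_i - alpha*_i) <x_i, x_new> + b."""
    return float((alpha - alpha_star) @ (X_train @ x_new) + b)
```

The dual prediction equals the primal one once the weight vector is rebuilt as Σ(α_i − α_i*)x_i, which is what lets the inner product be swapped for a kernel in the nonlinear case.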
In nonlinear regression, a kernel function can be applied to map the input data into a higher-dimensional feature space in which a linear regression hyperplane is constructed. The SVR function in Eq. (7) can then be written as

$$f(x) = \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^*\right) k(x_i, x) + b \tag{8}$$

where $k(x_i, x)$ is the kernel function.

Table 2
Statistical description of parameters in the database.

One of the most widely used kernel functions is the Gaussian kernel, defined as

$$k(x_i, x) = \exp\left(-\frac{\lVert x_i - x \rVert^2}{2\sigma^2}\right) \tag{9}$$

The SVR model has three primary parameters that are essential for its performance: Sigma, C, and Epsilon. Sigma ($\sigma$) regulates the width of the kernel, C is the regularisation parameter, and Epsilon ($\varepsilon$) determines the margin of tolerance around the regression function. To achieve optimal performance, the best possible values of these parameters must be determined. As mentioned earlier, this can be done either by trial and error or by using an optimisation algorithm. Owing to its accuracy and speed, the firefly optimisation algorithm was selected in this study to determine the optimal SVR parameters.
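For the nonlinear case, the prediction replaces the inner product with a kernel evaluation. The sketch below assumes the Gaussian kernel is written exp(−‖x_i − x‖²/(2σ²)) (some libraries instead use a coefficient gamma = 1/(2σ²)) and that the dual coefficients and bias are already known; in practice they come from solving the dual problem, which is not shown. Function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(xi, x, sigma):
    """k(x_i, x) = exp(-||x_i - x||^2 / (2 sigma^2)); sigma sets the kernel width."""
    diff = np.asarray(xi, dtype=float) - np.asarray(x, dtype=float)
    return float(np.exp(-(diff @ diff) / (2.0 * sigma ** 2)))


def svr_predict(alpha, alpha_star, support_vectors, b, x, sigma):
    """Kernel SVR output: sum_i (alpha_i - alpha*_i) k(x_i, x) + b."""
    k = np.array([gaussian_kernel(sv, x, sigma) for sv in support_vectors])
    return float((np.asarray(alpha) - np.asarray(alpha_star)) @ k + b)
```

A larger σ makes the kernel decay more slowly with distance, so each support vector influences a wider neighbourhood of the input space; this is why σ is tuned together with C and ε.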
The firefly algorithm, developed by Yang [62], is an optimisation algorithm inspired by the behaviour of fireflies; its main idea is that fireflies attract each other according to their brightness and distance. The algorithm follows three rules. First, fireflies are unisex, so any firefly can attract any other regardless of gender. Second, attractiveness is proportional to brightness and decreases as the distance between fireflies increases: of two flashing fireflies, the less bright one moves towards the brighter one, and a firefly with no brighter neighbour moves randomly. Finally, the brightness of a firefly is determined by the landscape of the objective function [62].
In the wild, fireflies use bioluminescence to attract potential mates, and this behaviour inspired the algorithm. Each firefly represents a candidate solution in the problem space, and its brightness represents the quality of that solution. The fireflies search for the best possible solution by moving towards brighter (better) fireflies, with the strength of the attraction decreasing with distance. The algorithm has been applied in many fields, including engineering, medicine, and finance, to find optimal solutions to complex problems.
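The rules above translate into a short loop. This Python sketch follows the standard update of Yang's algorithm for minimisation, x_i ← x_i + β0·exp(−γr²)(x_j − x_i) + α(u − 0.5), where u is uniform on [0, 1) and the random term is scaled by the search range. Mapping γ, β0 and α onto the light absorption (L), attraction base (B) and mutation (M) coefficients named for the SVR hybrid is an assumption, as are the range-normalised distance, the damping of α, the function name and all defaults.

```python
import numpy as np


def firefly_minimise(cost, lower, upper, n_pop=20, max_it=100, gamma=1.0,
                     beta0=1.0, alpha=0.2, alpha_damp=0.97, seed=0):
    """Minimise `cost` over a box with a basic firefly algorithm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    scale, dim = upper - lower, lower.size
    pos = rng.uniform(lower, upper, size=(n_pop, dim))
    f = np.array([cost(p) for p in pos])        # lower cost = brighter firefly
    best_i = int(np.argmin(f))
    best_pos, best_f = pos[best_i].copy(), float(f[best_i])

    for _ in range(max_it):
        for i in range(n_pop):
            moved = False
            for j in range(n_pop):
                if f[j] < f[i]:                 # j is brighter, so i moves towards j
                    r2 = np.sum(((pos[j] - pos[i]) / scale) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)   # attraction decays with distance
                    step = alpha * (rng.random(dim) - 0.5) * scale
                    pos[i] = np.clip(pos[i] + beta * (pos[j] - pos[i]) + step, lower, upper)
                    f[i] = cost(pos[i])
                    moved = True
            if not moved:                       # no brighter firefly: random move
                step = alpha * (rng.random(dim) - 0.5) * scale
                pos[i] = np.clip(pos[i] + step, lower, upper)
                f[i] = cost(pos[i])
            if f[i] < best_f:                   # remember the best solution seen so far
                best_pos, best_f = pos[i].copy(), float(f[i])
        alpha *= alpha_damp                     # shrink the random step over time
    return best_pos, best_f
```

For SVR tuning, each firefly position would hold a candidate (σ, C, ε) triple and `cost` would return a validation error of the resulting model.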

Artificial Neural network (ANN)
Artificial Neural Networks (ANNs) are a class of machine learning algorithms inspired by the structure and function of the human brain. Their fundamental elements are layers and neurons (nodes) that work together to process information (see Fig. 4). Each layer is composed of neurons that perform a specific role, such as input, processing, or output. A neuron receives input from other neurons, applies a mathematical function to it, and produces an output. Each connection between neurons carries a weight that determines the strength and direction of the connection, and each neuron has a bias term that is added to its input to shift its output; the bias gives the network an additional degree of freedom, allowing it to model more complex relationships between inputs and outputs. During training, the weights are adjusted to optimise the performance of the network; one of the most commonly used training algorithms for ANNs is Levenberg-Marquardt (LM) [63]. ANNs can solve a wide range of machine learning problems, including classification, regression, and prediction, and have gained popularity in recent years owing to their ability to learn intricate patterns in data and their versatility in handling various types of input.
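The roles of weights and biases can be seen in a single forward pass. This sketch assumes one hidden layer with a tanh activation and a linear output neuron, a common choice for regression; the architecture actually used in the study is not described here, training (for example by Levenberg-Marquardt) is not shown, and the function name is hypothetical.

```python
import numpy as np


def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with tanh activation, followed by a linear output layer."""
    h = np.tanh(W1 @ x + b1)   # weights scale each input; bias shifts each neuron
    return W2 @ h + b2
```

For the eight inputs used in this study (seven mix constituents and age), `W1` would have shape (hidden neurons, 8) and `W2` shape (1, hidden neurons). With all weights set to zero the output reduces to the output bias `b2`, which shows the extra degree of freedom the bias provides.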

Radial basis function neural network facilitated by Ant Colony optimisation algorithm (RBFNN + ACOA)
This study presents a hybrid model that combines the Radial Basis Function Neural Network (RBFNN) with the Ant Colony Optimisation Algorithm (ACOA) [32,64]. As mentioned earlier, the optimisation algorithm is incorporated to determine the optimal values of the RBFNN parameters. ACOA was selected for its advantageous characteristics, including parallelism, rapid identification of satisfactory solutions, self-organisation, suitability for dynamic applications, and positive feedback [65]. By leveraging these strengths, the hybrid model aims to enhance the overall performance of the RBFNN in solving complex problems. The implementation of the hybrid model is summarised below (see Fig. 5). First, the collected data are divided into training and testing sets, and the adjustment parameters of the ACO algorithm are specified: the number of iterations, the number of ants, the sample size, and the deviation-distance ratio. An initial population for the artificial colony is then generated as a set of random solutions. The ACO algorithm is integrated with the RBFNN, and the hybrid model is trained and tested using these solutions. At this stage, model performance is assessed using several statistical parameters: root mean squared error (RMSE), normalised root mean squared error (NRMSE), R-value, coefficient of determination (R²), and mean absolute error (MAE) [Eqs. (10)-(14)]. The best-performing solutions are selected and preserved for further analysis. These statistical parameters are used to evaluate the outcomes of all AI models in this study. To improve the results further, a new population of artificial ants is generated and the pheromone levels along their paths are updated, after which the model is again trained and tested using the new solutions. This iterative process of generating populations, training and testing the model, and evaluating the results continues until at least one of the stopping criteria is met: the maximum number of iterations, the maximum number of generations of the ant population, or the maximum number of attempts to determine the error.
In Eqs. (10)-(14), l_i represents the actual output for the i-th instance and z_i the corresponding calculated output; l̄ and z̄ denote the averages of the actual and predicted outputs, respectively, and m is the total number of samples.
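Eqs. (10)-(14) are not reproduced in this section, so the following is a minimal sketch of the five metrics using the symbols defined above (l = actual, z = predicted, m = number of samples). Two conventions are assumptions and are labelled in the code: NRMSE is taken as RMSE divided by the mean of the actual values, and R² is taken as the square of the Pearson R-value. Other common definitions (for example, normalising by the range) give different numbers.

```python
import numpy as np

def regression_metrics(l, z):
    """RMSE, NRMSE, R, R^2 and MAE for actual outputs l and predicted outputs z."""
    l = np.asarray(l, dtype=float)
    z = np.asarray(z, dtype=float)
    m = l.size                                   # total number of samples
    rmse = np.sqrt(np.sum((l - z) ** 2) / m)
    nrmse = rmse / l.mean()                      # assumption: normalised by the mean actual value
    l_bar, z_bar = l.mean(), z.mean()
    r = np.sum((l - l_bar) * (z - z_bar)) / np.sqrt(
        np.sum((l - l_bar) ** 2) * np.sum((z - z_bar) ** 2))
    r2 = r ** 2                                  # assumption: R^2 as the squared Pearson R-value
    mae = np.sum(np.abs(l - z)) / m
    return {"RMSE": rmse, "NRMSE": nrmse, "R": r, "R2": r2, "MAE": mae}

# Illustrative (made-up) compressive strengths in MPa.
metrics = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
```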

Support vector regression assisted by firefly algorithm (SVR + FA)
This hybrid model combines SVR (support vector regression) with FA (the firefly algorithm) [66]. Optimal SVR performance depends on choosing appropriate values for its parameters, namely Epsilon, C, and Sigma, which makes an optimisation algorithm essential. FA was selected as the optimiser because of its inherent capabilities, such as automatic subdivision and the ability to handle multimodality [62]. The hybrid model therefore uses FA to determine the optimal SVR parameters. Its workflow is described below and depicted in Fig. 6.
First, the hybrid model is given the comprehensive dataset collected from previous studies. This dataset is randomly divided, with 70 % of the data allocated to training and 15 % to testing (the remaining 15 % forms the validation set). The model relies on several adjustment parameters: the maximum number of iterations (MaxIt), a predetermined limit on how many times the algorithm updates the positions of the fireflies while searching for the optimal solution; the number of fireflies (nPop); the light absorption coefficient (L); the base value of the attraction coefficient (B); and the mutation coefficient (M). After these initial steps, the model generates a population of fireflies representing random solutions. FA is then coupled with SVR so that the model can be trained and tested using these firefly-based solutions. Model performance is evaluated with the same statistical parameters as for the previous model, including RMSE, R-value, and MAE, and the best results are saved for further analysis. Next, a new population of fireflies is generated using the firefly position-update equation. The model is re-executed with this updated population, its performance is re-evaluated, the best results are saved, and the model checks whether any stopping criterion has been met. These criteria may include reaching the maximum number of iterations or a specific generation number, or achieving the desired error threshold. If no stopping criterion is satisfied, the model continues to run, repeating the process of generating new populations of fireflies.
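The loop above can be sketched with the standard firefly movement rule, in which firefly i moves towards each brighter firefly j by β0·exp(−γ·r²)·(x_j − x_i) plus a random step scaled by α. This is a minimal sketch under stated assumptions, not the authors' code: `gamma`, `beta0` and `alpha` are assumed to play the roles of L, B and M above, distances are normalised by the search range, α is damped each iteration, and the toy objective is hypothetical (in the study it would be the SVR error for a given (Epsilon, C, Sigma)).

```python
import numpy as np

def firefly_minimise(objective, lower, upper, n_pop=20, max_it=100,
                     gamma=1.0, beta0=1.0, alpha=0.2, alpha_damp=0.97, seed=0):
    """Minimal firefly algorithm: lower cost = brighter firefly."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    span = upper - lower
    dim = lower.size
    pos = rng.uniform(lower, upper, size=(n_pop, dim))   # random initial fireflies
    cost = np.array([objective(x) for x in pos])
    best_i = int(np.argmin(cost))
    best_x, best_cost = pos[best_i].copy(), cost[best_i]
    for _ in range(max_it):                               # MaxIt
        for i in range(n_pop):
            for j in range(n_pop):
                if cost[j] < cost[i]:                     # j is brighter, so i moves towards j
                    r2 = np.sum(((pos[i] - pos[j]) / span) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)    # attraction decays with distance
                    step = alpha * (rng.random(dim) - 0.5) * span  # random (mutation) step
                    pos[i] = np.clip(pos[i] + beta * (pos[j] - pos[i]) + step, lower, upper)
            cost[i] = objective(pos[i])
            if cost[i] < best_cost:
                best_x, best_cost = pos[i].copy(), cost[i]
        alpha *= alpha_damp                               # gradually reduce randomness
    return best_x, best_cost

# Hypothetical stand-in for the SVR validation error.
fa_target = np.array([1.0, 2.0])
fa_x, fa_cost = firefly_minimise(lambda x: float(np.sum((x - fa_target) ** 2)), [-5, -5], [5, 5])
```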

Artificial neural network development (ANN)
This study used a basic artificial neural network (ANN) model, a widely accepted and reliable approach, to predict the compressive strength of sugarcane bagasse ash concrete. The model was constructed with three primary layers (input, hidden, and output), and the popular Levenberg-Marquardt training algorithm was employed. The number of neurons in the hidden layer was determined by trial and error, and the best results were achieved with 15 neurons in the hidden layer.
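To illustrate Levenberg-Marquardt training of a three-layer network, the following is a minimal NumPy sketch, not the toolbox implementation the authors presumably used. Assumptions: a tanh hidden layer with a linear output neuron, a Jacobian obtained by forward differences (production implementations use analytic derivatives), and the usual damping rule of dividing μ by 10 after a successful step and multiplying it by 10 after a rejected one. The demo data are synthetic.

```python
import numpy as np

def predict(theta, X, n_hidden):
    """Input layer -> tanh hidden layer -> single linear output neuron."""
    n_in = X.shape[1]
    i = n_in * n_hidden
    W1 = theta[:i].reshape(n_hidden, n_in)
    b1 = theta[i:i + n_hidden]
    w2 = theta[i + n_hidden:i + 2 * n_hidden]
    b2 = theta[-1]
    return np.tanh(X @ W1.T + b1) @ w2 + b2

def train_lm(X, y, n_hidden=15, n_iter=100, mu=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    n_params = X.shape[1] * n_hidden + 2 * n_hidden + 1
    theta = rng.normal(scale=0.5, size=n_params)
    residual = lambda t: predict(t, X, n_hidden) - y
    r = residual(theta)
    for _ in range(n_iter):
        J = np.empty((r.size, n_params))          # Jacobian by forward differences
        for k in range(n_params):
            t = theta.copy()
            t[k] += 1e-6
            J[:, k] = (residual(t) - r) / 1e-6
        A, g = J.T @ J, J.T @ r
        while mu < 1e10:
            step = np.linalg.solve(A + mu * np.eye(n_params), -g)
            r_new = residual(theta + step)
            if r_new @ r_new < r @ r:             # accept: behave more like Gauss-Newton
                theta, r, mu = theta + step, r_new, max(mu / 10, 1e-9)
                break
            mu *= 10                              # reject: behave more like gradient descent
        else:
            break                                 # no improving step found, stop training
    return theta

# Synthetic demo data (not from the study).
rng = np.random.default_rng(1)
X_demo = rng.uniform(-2, 2, size=(120, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2
theta_demo = train_lm(X_demo, y_demo, n_hidden=15)
mse_demo = np.mean((predict(theta_demo, X_demo, 15) - y_demo) ** 2)
```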

Discussion and comparison
Because three models (RBFNN, SVR, and ANN) and three dataset groups are used [group A: a database of concrete containing SCBA with 340 data; group B: concrete containing SCBA plus concrete without SCMs, with 799 data; and group C: all data (concrete containing SCBA, concrete without SCMs, and concrete with SCMs), with 1819 data], the study first evaluates the performance of each model for every dataset group. It then determines the most effective model and approach for predicting the compressive strength of SCBA concrete. The primary objective of using three dataset groups was to examine the influence of each group and its parameters on model performance. The process and performance of the models are outlined below:

Database of concrete containing SCBA (340 data in group A)
For this dataset group of 340 data points, the data were divided into three subgroups: training, testing, and validation data. The training data account for 70 % of the dataset, the testing data for 15 %, and the validation data for the remaining 15 %.
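A minimal sketch of the 70 %/15 %/15 % random split, assuming a simple shuffled-index approach (the study does not state how the shuffling was done, and the function name is hypothetical):

```python
import numpy as np

def split_70_15_15(n_samples, seed=0):
    """Return shuffled index arrays for training (70 %), testing (15 %) and validation (15 %)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(round(0.70 * n_samples))
    n_test = int(round(0.15 * n_samples))
    return idx[:n_train], idx[n_train:n_train + n_test], idx[n_train + n_test:]

train_idx, test_idx, val_idx = split_70_15_15(340)
print(len(train_idx), len(test_idx), len(val_idx))  # 238 51 51
```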
For the RBFNN + ACOA model, ACOA identifies the optimal values of the spread and the number of neurons of the RBFNN, which are 57.50 and 145, respectively. Similarly, for the SVR model, the firefly algorithm optimises the parameters, giving Epsilon = 0.44, C = 54, and Sigma = 0.28. For the ANN model, by contrast, the same configuration (three main layers and the Levenberg-Marquardt training algorithm) is used for every dataset group; only the number of neurons in the hidden layer differs between models. For this group, the hidden layer has 15 neurons.
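To show what the two tuned RBFNN parameters control, here is a minimal sketch of an RBF network with Gaussian hidden units and a linear output layer fitted by least squares. Assumptions: the spread follows the common convention that a neuron's response falls to 0.5 at a distance equal to the spread (as in MATLAB's `newrb`), and the centres are a random subset of training points rather than being added one by one as `newrb` does. The class name and demo data are hypothetical. Note that the spread is in input units, so the values reported above (57.50, 300, 900) depend on the scale of the input variables.

```python
import numpy as np

class SimpleRBFNN:
    def __init__(self, spread, n_neurons, seed=0):
        self.spread = spread          # distance at which a neuron's output drops to 0.5
        self.n_neurons = n_neurons    # number of Gaussian hidden neurons (centres)
        self.rng = np.random.default_rng(seed)

    def _design(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centres[None, :, :], axis=2)
        b = np.sqrt(-np.log(0.5)) / self.spread           # exp(-(b * spread)^2) = 0.5
        H = np.exp(-(d * b) ** 2)
        return np.hstack([H, np.ones((X.shape[0], 1))])   # bias column for the output layer

    def fit(self, X, y):
        k = min(self.n_neurons, X.shape[0])
        self.centres = X[self.rng.choice(X.shape[0], size=k, replace=False)]
        self.weights, *_ = np.linalg.lstsq(self._design(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._design(X) @ self.weights

# Synthetic demo data (not from the study).
rng = np.random.default_rng(2)
X_rbf = rng.uniform(0, 10, size=(100, 2))
y_rbf = np.sin(X_rbf[:, 0] / 2) + 0.1 * X_rbf[:, 1]
rbf = SimpleRBFNN(spread=3.0, n_neurons=60).fit(X_rbf, y_rbf)
rmse_rbf = np.sqrt(np.mean((rbf.predict(X_rbf) - y_rbf) ** 2))
```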
Fig. 7a-f compares the model predictions with the experimental results for the training and testing data. In this figure, the x-axis represents the predicted values and the y-axis the corresponding experimental data. The diagonal line represents perfect agreement between prediction and experiment, and each dot is one pair of predicted and experimental values. The closer the dots lie to the diagonal, the stronger the correlation between the model outputs and the experimental data, and the more accurate the model. To make this relationship clearer, two black dotted lines marking the ± 20 % range are plotted for the testing phase; they show how many data points, and which ones, fall within the ± 20 % prediction range. The figures show that all models perform acceptably. For the training data, however, the SVR and RBFNN models show a similar density of points along the diagonal, higher than that of the ANN model. In the testing phase, RBFNN and SVR again outperform ANN. Although RBFNN and SVR perform similarly, some RBFNN points lie further from the diagonal than the corresponding SVR points. Consequently, SVR performs better than RBFNN in the testing phase.
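Counting the points inside the ± 20 % band can be sketched as follows, assuming the band is defined relative to the experimental value, i.e. |predicted − experimental| ≤ 0.20 × experimental (the figure may instead draw the lines relative to the predicted value). The function name and numbers are illustrative.

```python
import numpy as np

def share_within_band(actual, predicted, band=0.20):
    """Fraction of points whose relative error (w.r.t. the experimental value) is within the band."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rel_err = np.abs(predicted - actual) / np.abs(actual)
    return float(np.mean(rel_err <= band))

share = share_within_band([10, 20, 30, 40], [11, 25, 30, 31])  # 0.5: 2 of 4 points within ±20 %
```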
For a more comprehensive statistical evaluation, Table 3 reports the performance of the models for each phase and dataset group. The statistical parameters in Table 3 confirm the superior performance of the RBFNN and SVR models in the training phase and the best performance of SVR in the testing phase. In the training phase, the R-value, RMSE, R², NRMSE, and MAE of SVR and RBFNN are very close to each other and better than those of the ANN model. In the testing phase, the advantage of SVR is evident from its better statistical parameter values compared with the other models.

Database of concrete containing SCBA + Concrete without SCMs (799 data in group B)
This dataset consists of the 340 SCBA concrete data points plus 459 data points for concrete without SCMs. The 459 additional data points are added to the training data only, while the testing data remain the same as in the previous dataset group. This makes it possible to observe the effect of including concrete without SCMs on the model results.
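The key point of this protocol is that the extra data enlarge only the training set, while the SCBA test set stays fixed so that results are comparable across groups. A minimal sketch, assuming the same SCBA training/testing indices as group A; the function name and the placeholder arrays (five random features) are hypothetical.

```python
import numpy as np

def build_training_set(X_scba, y_scba, train_idx, test_idx, X_extra=None, y_extra=None):
    """Fixed SCBA test split; any extra (non-SCBA) data only enlarge the training set."""
    X_train, y_train = X_scba[train_idx], y_scba[train_idx]
    if X_extra is not None:
        X_train = np.vstack([X_train, X_extra])
        y_train = np.concatenate([y_train, y_extra])
    return (X_train, y_train), (X_scba[test_idx], y_scba[test_idx])

rng = np.random.default_rng(0)
X_scba, y_scba = rng.random((340, 5)), rng.random(340)          # placeholder SCBA data
X_no_scm, y_no_scm = rng.random((459, 5)), rng.random(459)      # placeholder concrete without SCMs
idx = rng.permutation(340)
scba_train_idx, scba_test_idx = idx[:238], idx[238:289]         # 70 % / 15 % of the SCBA data
(X_tr_B, y_tr_B), (X_te_B, y_te_B) = build_training_set(
    X_scba, y_scba, scba_train_idx, scba_test_idx, X_no_scm, y_no_scm)
```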
As mentioned earlier, the ANN configuration is the same for all data groups, and only the number of neurons in the hidden layer changes; for this group, the ANN has 12 hidden neurons. For the RBFNN model, ACOA determines the number of neurons and the spread as 170 and 300, respectively. For the SVR model, the parameters are Epsilon = 0.45, C = 137, and Sigma = 0.72.
Fig. 8a-f compares the model outputs with the experimental results. In the training phase, the RBFNN model clearly outperforms the other models: its points cluster more closely around the diagonal line. The statistical parameters in Table 3 support this observation: the R-value of the RBFNN model is closer to one, indicating a stronger correlation, and its RMSE and MAE are lower than those of the other models. In the testing phase, the figure suggests that RBFNN and ANN perform almost equally and better than SVR. A closer look at the statistical parameters, however, shows that ANN performs best in this phase: although the R-values of ANN and RBFNN are similar, ANN has lower RMSE, NRMSE, and MAE, indicating more precise predictions.

All data (concrete containing SCBA + concrete without SCMs + concrete with SCMs, 1819 data in group C)
As in the previous dataset groups, the testing data for this group remain unchanged, and the additional data points are included in the training data only. For this dataset, the key parameters of the RBFNN model are a spread of 900 and a maximum number of neurons of 220. The SVR model uses Sigma = 1.20, C = 120, and Epsilon = 0.30, and the ANN model has 15 neurons in its hidden layer.
As in the previous sections, the performance of the models for this dataset is evaluated using Fig. 9a-f, which compares the model outputs with the experimental results, and the statistical parameters in Table 3.
Fig. 9a-f shows that ANN and RBFNN perform quite similarly in the training phase and both outperform SVR. The statistical parameters support this: the R-value, RMSE, and MAE of ANN and RBFNN are better than those of SVR, and the accuracy of RBFNN slightly exceeds that of ANN. Hence, RBFNN performs best in the training phase when all data are used. In the testing phase, the ANN points lie closer to the diagonal than those of RBFNN and SVR, and the statistical parameters confirm ANN's superior performance: its R-value is approximately 2 % higher than those of the other models, its RMSE is at least 25 % lower, and its MAE is approximately 11 % better. ANN is therefore the most efficient model in this testing phase.

Comparison of the results of the database
This section compares the best performance achieved with each dataset group to determine which group is most suitable as model input. Sections 5.1-5.3 show that a different model performs best for each group: SVR for the SCBA dataset, RBFNN for the combined SCBA and concrete without SCMs dataset, and ANN for the entire dataset. The comparison is therefore made between the best model of each dataset group.
The statistical parameters in Table 3 show that all dataset groups can be used to train the models, and the best model of each group achieves acceptable, though different, levels of accuracy. On closer analysis, training the models on SCBA data alone gives very high accuracy in the training phase, but the gap between the training and testing results is somewhat larger, suggesting possible overfitting. When the entire dataset is used for training, the gap between the two phases narrows, but the overall accuracy of both phases also decreases. Compared with using the entire dataset, training on the SCBA data alone therefore appears more reasonable. When the combination of SCBA and concrete without SCMs data is used for training, however, the models achieve good accuracy with a smaller gap between training and testing. Moreover, model performance is more consistent across the different AI methods than when only SCBA data are used. Consequently, the likelihood of overfitting decreases, and the model outputs are more genuine.
The main competition is therefore between dataset groups A and B. With either group as input, the models perform acceptably with good accuracy, and the choice ultimately depends on the researcher's goals and priorities. If very high accuracy in the training phase is the priority and potential overfitting is not a concern, group A is a suitable choice. If more genuine outputs, in addition to good accuracy and performance, are the priority, group B is preferable. An attempt was then made to improve model performance for group B and to assess how refining the concrete without SCMs training data affects the test outputs (SCBA data). In an iterative process, each data point was examined to determine whether its inclusion significantly increased prediction errors; a data point whose prediction showed a high error relative to its experimental value was treated as an outlier and removed. This process was applied exclusively to the concrete without SCMs data, which were then used as training data to predict the strength of SCBA concrete. The models were run using this new database, and the results are shown in Table 4.
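The refinement loop can be sketched as: train on the SCBA training data plus the retained extra points, flag extra points whose predicted value deviates strongly from the experimental value, drop them, and repeat until nothing is flagged. The 30 % relative-error threshold, the function names, and the linear least-squares stand-in model are all hypothetical; the study does not report its exact criterion or which model drove the screening.

```python
import numpy as np

def refine_extra_data(fit, predict, X_scba_train, y_scba_train, X_extra, y_extra,
                      threshold=0.30, max_rounds=10):
    """Return a boolean mask of extra (concrete without SCMs) points kept after outlier removal."""
    keep = np.ones(len(y_extra), dtype=bool)
    for _ in range(max_rounds):
        X_train = np.vstack([X_scba_train, X_extra[keep]])
        y_train = np.concatenate([y_scba_train, y_extra[keep]])
        model = fit(X_train, y_train)
        rel_err = np.abs(predict(model, X_extra) - y_extra) / np.abs(y_extra)
        outliers = keep & (rel_err > threshold)   # only points still in the training set
        if not outliers.any():
            break
        keep &= ~outliers
    return keep

def fit_linear(X, y):
    """Stand-in model: ordinary least squares with an intercept."""
    A = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def predict_linear(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Synthetic demo: two deliberately corrupted extra points.
X_scba_tr = np.linspace(0, 10, 20).reshape(-1, 1)
y_scba_tr = 2 * X_scba_tr[:, 0] + 10
X_ext = np.linspace(0, 10, 30).reshape(-1, 1)
y_ext = 2 * X_ext[:, 0] + 10
y_ext[[15, 16]] *= 3
keep = refine_extra_data(fit_linear, predict_linear, X_scba_tr, y_scba_tr, X_ext, y_ext)
```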
Table 4 shows that refining the concrete without SCMs training data had a significant positive effect on the models. The refined data enabled better training and produced more accurate outputs in both the training and testing phases. For the RBFNN model in the training phase, the RMSE, MAE, and R-value improved from 3.90, 2.40, and 0.98 to 2.20, 1.58, and 0.99, respectively. In the testing phase, RMSE decreased from 6.10 to 4.30, R² increased from 0.90 to 0.956, and MAE fell from 4.90 to 3.50.
These improvements also mean that RBFNN gave the best testing-phase results of all models. Not only did the accuracy of the RBFNN model increase in both phases, but the gap between the phases also narrowed, indicating more reliable results. Strong RBFNN performance has also been reported in other papers [30,32,67,68]. However, a direct comparison between their results and those of this study is not feasible, because objectives, dataset sizes, and data types often differ significantly. Although RBFNN has consistently performed well in previous research, other models could outperform it depending on the topic, dataset, and methodology. In the present research, given the statistical parameters and the data refinement applied, RBFNN emerges as the top performer.
As noted above, the performance of the model in this study for SCBA concrete cannot be compared directly with that of models from other research on compressive strength, because input parameters, datasets, and numbers of data points often differ. Nevertheless, a review of other studies on predicting the compressive strength of concrete, such as [69][70][71][72][73][74] (among many others), shows that their test-phase R² values fall within the range of 0.7 to 0.98 and their RMSE values range from 1.5 to 6. Comparing these ranges with the statistical parameters of the RBFNN indicates that the performance and accuracy of this model compare favourably.
Compared with other studies that used AI or machine learning to predict the strength of concrete containing SCBA [41][42][43], several differences emerge. Those studies used limited datasets, often of approximately 120 or 150 samples, focused predominantly on 28-day concrete, and did not use additional data, such as concrete without SCMs, as part of the training dataset. A direct comparison between this study and those studies is therefore not straightforward. Nevertheless, evaluating the statistical parameters of the RBFNN in this study alongside those of the other studies shows that the accuracy of our model is acceptable and notably high.
To facilitate comparison with the RBFNN model, Table 5 lists the statistical parameters of previous studies that used machine learning models to predict the compressive strength of SCBA concrete. It includes R² and the Nash-Sutcliffe Efficiency (NSE), both normalised statistical parameters. The corresponding parameters for the RBFNN are given in the first row of Table 5.
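R² and NSE are not interchangeable when R² is computed as a squared correlation: squared correlation ignores systematic bias, whereas NSE, defined as 1 − Σ(l_i − z_i)² / Σ(l_i − l̄)², penalises it. The toy example below (made-up numbers, and assuming the squared-Pearson form of R²) shows predictions that are perfectly correlated with the actual values but offset by 5 MPa.

```python
import numpy as np

def nse(l, z):
    """Nash-Sutcliffe Efficiency: 1 - SSE / total sum of squares of the actual values."""
    l = np.asarray(l, dtype=float)
    z = np.asarray(z, dtype=float)
    return 1.0 - np.sum((l - z) ** 2) / np.sum((l - l.mean()) ** 2)

def r_squared_pearson(l, z):
    return np.corrcoef(l, z)[0, 1] ** 2

l_demo = np.array([20.0, 30.0, 40.0, 50.0])
z_demo = l_demo + 5.0                          # perfectly correlated but biased predictions
r2_demo = r_squared_pearson(l_demo, z_demo)    # approximately 1.0
nse_demo = nse(l_demo, z_demo)                 # 0.8
```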
Based on Table 5, the RBFNN model outperforms the models used in previous studies. In the training phase, RBFNN has an R² of 0.98, matching the highest R² value in the comparison, and an NSE of 0.98, whereas the nearest values for the other models are 0.95 (R²) and 0.94 (NSE). RBFNN also performs best in the testing phase, with an R² of 0.955 and an NSE of 0.95, whereas the best values for the other models are 0.90 for both R² and NSE.
Judged solely by their statistical parameters, the RBFNN outperforms the previous models used to predict the compressive strength of SCBA concrete: its values in both the training and testing phases are better than those reported in the other studies, which further supports the effectiveness of the RBFNN model. By contrast, some of the other models show better statistical parameters in the testing phase than in the training phase, which may indicate biased data selection. Furthermore, other papers that use machine learning to predict concrete compressive strength (e.g., [34,70-72,75,76]) report R² values from 0.51 to 0.96. Compared with these values, the RBFNN's R² of nearly 0.96 for SCBA concrete falls within an acceptable range and performs well.

Optimal mix design of SCBA concrete
The composition of the concrete mixture is the primary factor influencing the mechanical properties of concrete, particularly its compressive strength, and adjusting the proportions of the ingredients modifies these properties accordingly. Cost management is another crucial consideration in construction projects. Finding an optimal mixture that achieves the desired compressive strength at minimum cost can therefore meet construction requirements while reducing overall costs. To address this challenge, a multi-objective genetic algorithm was implemented to determine the optimal mix design for SCBA concrete. The algorithm considers both the 28-day compressive strength of the concrete and the cost of the mixture, identifying the mix design that best balances strength and affordability.
The genetic algorithm (GA) is a widely recognised optimisation algorithm and one of the most powerful tools for solving optimisation problems. The primary objective here is to minimise the cost of one cubic metre of concrete with a cylinder compressive strength of around 50 MPa. The cost is calculated as follows:
$$\text{Total price} = P_c A_c + P_w A_w + P_s A_s + P_{ca} A_{ca} + P_{fa} A_{fa} \qquad (13)$$
where P_i is the price of the i-th material, A_i is its amount, and the subscripts c, w, s, ca, and fa denote cement, water, sugarcane bagasse ash, coarse aggregate, and fine aggregate, respectively. Table 6 lists the material costs per kilogram in Australian dollars (AUD).
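Eq. (13) is a weighted sum of material amounts. The sketch below evaluates it; the amounts and unit prices are made-up placeholders, not the Table 6 values, and the function name is hypothetical.

```python
def total_price(amounts_kg, prices_per_kg):
    """Eq. (13): sum of P_i * A_i over cement (c), water (w), SCBA (s), coarse (ca) and fine (fa) aggregate."""
    return sum(prices_per_kg[k] * amounts_kg[k] for k in ("c", "w", "s", "ca", "fa"))

# Hypothetical amounts (kg per m^3) and unit prices (AUD per kg); NOT the Table 6 values.
amounts = {"c": 400.0, "w": 200.0, "s": 50.0, "ca": 1000.0, "fa": 700.0}
prices = {"c": 0.5, "w": 0.01, "s": 0.1, "ca": 0.05, "fa": 0.04}
price = total_price(amounts, prices)  # about 285 AUD per m^3 for these made-up values
```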
The second objective function of the GA uses an RBFNN assisted by FA to predict the 28-day compressive strength of SCBA concrete; this RBFNN is trained on the actual 28-day compressive strengths of SCBA concrete. FA first determines the optimal number of neurons and spread of the RBFNN, so that the RBFNN operates with high accuracy; this constitutes the second objective of the optimisation model. In addition, the range of each material is given to the model as upper and lower bounds, based on the amounts of material used in the test data. The GA then starts by generating initial random solutions for the material amounts within these bounds, each forming a mixture, and each mixture is passed to the configured RBFNN to determine its compressive strength. The solutions are improved on the basis of these objective functions (see Fig. 10), and the process is repeated until the model finds the mixture with the highest compressive strength at the lowest cost.
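The search loop can be sketched as a simple elitist Pareto-ranking GA: random mixtures within the material bounds, arithmetic crossover, Gaussian mutation, and survival by non-dominated rank (minimise cost, maximise strength). This is a simplified illustration, not NSGA-II (there is no crowding distance) and not necessarily the authors' implementation. `strength_fn` stands in for the FA-tuned RBFNN, and the two-variable mix, prices, and strength surrogate in the example are hypothetical.

```python
import numpy as np

def pareto_ranks(F):
    """Non-dominated rank of each row of F (all objectives minimised); 0 = Pareto front."""
    D = np.all(F[:, None] <= F[None], axis=2) & np.any(F[:, None] < F[None], axis=2)  # D[i, j]: i dominates j
    ranks = np.full(len(F), -1)
    remaining = np.ones(len(F), dtype=bool)
    rank = 0
    while remaining.any():
        dominated = D[remaining][:, remaining].any(axis=0)
        idx = np.flatnonzero(remaining)[~dominated]
        ranks[idx] = rank
        remaining[idx] = False
        rank += 1
    return ranks

def mo_ga(cost_fn, strength_fn, lower, upper, pop_size=40, n_gen=60, mut_scale=0.05, seed=0):
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    evaluate = lambda P: np.array([[cost_fn(x), -strength_fn(x)] for x in P])
    pop = rng.uniform(lower, upper, size=(pop_size, lower.size))   # random mixtures within bounds
    for _ in range(n_gen):
        parents = pop[rng.integers(pop_size, size=(pop_size, 2))]  # random parent pairs
        w = rng.random((pop_size, 1))
        children = w * parents[:, 0] + (1 - w) * parents[:, 1]     # arithmetic crossover
        children += rng.normal(0.0, mut_scale * (upper - lower), children.shape)  # mutation
        children = np.clip(children, lower, upper)
        union = np.vstack([pop, children])
        ranks = pareto_ranks(evaluate(union))
        order = np.lexsort((rng.random(len(union)), ranks))        # by rank, random tie-break
        pop = union[order[:pop_size]]
    return pop[pareto_ranks(evaluate(pop)) == 0]

# Hypothetical example: cement and SCBA contents (kg/m^3) with made-up cost and strength models.
ga_lower, ga_upper = [300.0, 0.0], [500.0, 100.0]
ga_cost = lambda x: 0.30 * x[0] + 0.05 * x[1]
ga_strength = lambda x: 0.12 * x[0] + 0.20 * x[1] - 0.002 * x[1] ** 2
front = mo_ga(ga_cost, ga_strength, ga_lower, ga_upper)
best_mix = min(front, key=lambda x: ga_cost(x) / ga_strength(x))  # lowest price-to-strength ratio
```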
The model was first used to find the mixture with the lowest price and highest strength among the test data, that is, to minimise the price-to-strength ratio. It identified a blend costing $336.1 with a compressive strength of 48.7 MPa, giving a price-to-strength ratio of 6.9, the minimum ratio among the test data. As given in Table 7, the lowest-cost mixture in the test data costs $353.3 and has a compressive strength of 48.2 MPa. Comparing the model's mixture with the experimental data (see Table 7) shows that the model found a mixture with a slightly lower cost and almost the same compressive strength. To verify the model in the higher range of compressive strength and cost, another experimental mix design was then given to the model as a target, with a compressive strength of 51.30 MPa and a cost of $351.50. As shown in Table 8, the model determined an SCBA concrete mixture with a compressive strength of 48.80 MPa and a cost of $347.50. Based on this comparison with the experimental data (see Table 8), the model shows good ability and accuracy in determining the optimum SCBA concrete mixture over a wide range, considering the relationship between compressive strength and mixture cost.
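The price-to-strength ratios behind this comparison can be computed directly from the costs and strengths reported in the paragraph above:

```python
# Costs ($ per m^3) and 28-day compressive strengths (MPa) as reported in the text.
mixes = {
    "model, minimum-ratio search": (336.1, 48.7),
    "test data, lowest-cost mix": (353.3, 48.2),
    "test data, higher-CS target": (351.50, 51.30),
    "model, higher-CS search": (347.50, 48.80),
}
ratios = {name: cost / cs for name, (cost, cs) in mixes.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} $/MPa")
# model, minimum-ratio search: 6.90 $/MPa
# test data, lowest-cost mix: 7.33 $/MPa
# test data, higher-CS target: 6.85 $/MPa
# model, higher-CS search: 7.12 $/MPa
```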

Conclusions
Sugarcane bagasse ash is an agricultural and industrial waste material that can be used in concrete as a replacement for cement or fine aggregate. Reusing such waste in concrete can reduce environmental pressure and also improve the properties of the concrete. This study investigated three AI methods, RBFNN, SVR, and ANN, as cheap and fast means of predicting the compressive strength of concrete containing sugarcane bagasse ash. The collected data were categorised into three main groups, as described in Section 5. The following conclusions can be drawn:
• For the SCBA concrete dataset (group A), the SVR model performed best. Its key parameters, determined by the FA algorithm, were Epsilon = 0.445, C = 54, and Sigma = 0.28.
• For the database of SCBA concrete + concrete without SCMs (group B), the RBFNN model achieved the highest accuracy. Its primary parameters, the maximum number of neurons and the spread, were determined by the ACO algorithm as 170 and 300, respectively.
• For all data (concrete containing SCBA, concrete without SCMs, and concrete with SCMs; group C), the ANN model produced the best outputs according to the statistical parameters. However, the accuracy of the models for this group was lower than for the other groups.
• Across all models and data groups, the SVR model for group A showed the highest accuracy, and the RBFNN model for group B gave the results closest to it. After the concrete without SCMs data in group B were refined by deleting outliers, however, the accuracy of the models, particularly RBFNN, improved significantly, especially in the testing phase. As a result, the RBFNN model for group B and the SVR model for group A show high and comparable accuracy, and both can be used to predict the compressive strength of SCBA concrete. Nevertheless, the gap between the training and testing phases is smaller for RBFNN than for SVR, suggesting that the RBFNN outputs are more realistic and reliable. Moreover, an advantage of including concrete without SCMs data is that performance is consistent across the different AI methods, unlike when only SCBA data are used.
• The outputs of the multi-objective GA show that it has a strong ability to determine the optimal mix design for SCBA concrete. Its performance was further confirmed by determining the optimum mixture, in terms of compressive strength and price, for a target in a higher range than the optimal mixture of the test data.
The main limitation of this study and the proposed model is their dependence on the quality and quantity of the data: a large amount of high-quality data can yield good results, whereas low-quality data will reduce the accuracy of the model. Future research could increase the amount of SCBA concrete data, change or add input parameters, and incorporate other AI and machine learning models.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

G. Pazouki et al.

Fig. 2. Scattering range of parameters; each chart shows the scattering range of (a) cement, (b) water, (c) coarse aggregate, (d) fine aggregate, (e) sugarcane bagasse ash, (f) slag, (g) fly ash, (h) water to binder ratio for all data, and (i) water to binder ratio for SCBA data.


Fig. 5. Diagram outlining the performance flow of the hybrid model combining RBFNN and ACOA.

Fig. 6. Diagram of the flow of the hybrid model of SVR and FA.

Fig. 7. Values of the model's prediction of CS against the values of experimental CS for RBFNN model (a) training dataset, (b) testing dataset, SVR model (c) training dataset, (d) testing dataset, and ANN model (e) training dataset, and (f) testing dataset.

Fig. 8. Values of the model's prediction of CS against the values of experimental CS for RBFNN model (a) training dataset, (b) testing dataset, SVR model (c) training dataset, (d) testing dataset, and ANN model (e) training dataset, and (f) testing dataset.

Fig. 9. Values of the model's prediction of CS against the values of experimental CS for RBFNN model (a) training dataset, (b) testing dataset, SVR model (c) training dataset, (d) testing dataset, and ANN model (e) training dataset, and (f) testing dataset.

Table 1
VIF values of parameters.

Table 3
Statistical parameters for each model performance.

Table 4
Statistical parameters of models with the new dataset.

Table 5
Model performances for predicting compressive strength of SCBA concrete in this and previous research.

Table 7
Optimal mixture for the sample with the minimum cost in the test data.

Table 8
Optimal mixture for the sample with the higher CS in the test data.