The objectives of this study were (1) to conduct a comprehensive analysis, using the Python programming language, of the interactions between predictive variables in concrete composition and their impact on compressive strength, in order to gain in-depth insights into the relationships between these variables, and (2) to compare the advantages of ML techniques with traditional approaches for analyzing concrete properties and predicting compressive strength, considering specific models such as support vector regression (SVR), gradient boosting (GB), and artificial neural networks (ANNs) (Braga et al. 2020; Chaabene et al. 2020).
Jupyter Notebook
The Jupyter Notebook (version 6.4.12), an open-source web application, was selected as the development environment for implementing the ML models. It offers an interactive computing environment that seamlessly integrates code, documentation, and visualizations into a unified workflow (Wang and Zeller 2020). The Python programming language (version 3.9.13) and ML techniques were used to investigate the complex interactions between different variables (e.g., water, cement, fly ash, blast furnace slag, superplasticizer, coarse aggregate, fine aggregate, and curing time) and their influence on the compressive strength of concrete. Recent studies have shown that Jupyter Notebook provides flexibility for data exploration, pre-processing, modeling, and evaluating model performance (Paixão et al. 2022, 2023), in addition to streamlining the documentation of the process, enabling clear and cohesive communication of the analysis stages, decision-making, and results.
Database
An initial data set was supplemented with information extracted from other articles to enhance the diversity and robustness of the learning process (Achong and Gunter 2021; Da Silva and Silva 2022; Feng et al. 2020; Kamath et al. 2022; Paixão et al. 2022). The final data set consists of 1234 records on the compressive strength of concrete, which were used for training the ML algorithms. This data set provided comprehensive and representative information on the properties of concrete and their compressive strength values. An existing database was used to ensure that the research was based on a representative and diverse sample (Brownlee 2020), thereby making the results more robust and generalizable. Another advantage of this approach is that it allows for external validation of the model and results; this external validation serves as an independent check that strengthens the reliability of the study.
Furthermore, utilizing a pre-existing database was crucial for time and resource efficiency, as collecting data can be time-consuming and costly. By using this data set to train an ML algorithm, it was possible to explore the patterns and relationships between the eight input variables (water, cement, fly ash, blast furnace slag, superplasticizer, coarse aggregate, fine aggregate, and curing time) and the output variable (compressive strength). Each concrete variable has interdependent relationships, as demonstrated in Table 1.
Table 1
Characteristics of the variables that make up concrete
Variable | Characteristics |
Cement | The quantity of cement in the concrete mix (measured in kg/m³). |
Blast furnace slag | The amount of blast furnace slag in the concrete mix (measured in kg/m³). Blast furnace slag is a by-product of iron and steel production, and it can be used as a partial substitute for cement in concrete. |
Fly ash | The amount of fly ash in the concrete mix (measured in kg/m³). Fly ash is a by-product of coal combustion and can also be used as a partial substitute for cement in concrete. |
Water | The amount of water in the concrete mix (measured in kg/m³). Water is essential for cement hydration, which is the process of hardening. |
Superplasticizer | The amount of superplasticizer in the concrete mix (measured in kg/m³). Superplasticizers are chemical additives that reduce the water needed in concrete without compromising its workability. |
Coarse aggregate | The quantity of coarse aggregate in the concrete mix (measured in kg/m³). Coarse aggregate is typically crushed stone or gravel, giving volume and strength to the concrete. |
Fine aggregate | The amount of fine aggregate in the concrete mix (measured in kg/m³). Fine aggregate is usually sand and fills the gaps between the coarse aggregate particles. |
Test age | The age of the concrete at the time of testing (measured in days). Concrete strength increases with age, making it important to know the age of the concrete when interpreting test results. |
Concrete compressive strength | The compressive strength of concrete, measured in MPa. Compressive strength refers to the ability of concrete to resist compression under a load. |
Source: Adapted from Helene and Andrade (2007) and Helene and Silva Filho (2011). |
The algorithm undergoes a training process in which it learns from the training data, adjusting its parameters to create a model capable of making predictions or decisions based on the acquired knowledge. To assess the algorithm's performance, the data were divided into training and test sets in an 80:20 ratio. Before training, an initial exploratory analysis was conducted to evaluate data quality using descriptive statistics, which is crucial for understanding the nature of the data and identifying any potential obstacles to effective model training.
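The exploratory check and the 80:20 split described above can be sketched as follows. This is a minimal illustration using a small synthetic stand-in for the concrete data set (the real 1234-record database is not reproduced here); column names are assumptions based on the variables listed in the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the concrete data set (100 synthetic rows)
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "cement": rng.uniform(100, 550, 100),
    "water": rng.uniform(120, 250, 100),
    "age": rng.integers(1, 365, 100),
    "strength": rng.uniform(5, 80, 100),
})

# Initial exploratory analysis via descriptive statistics
print(df.describe())

X = df.drop(columns="strength")
y = df["strength"]

# 80:20 train/test split, as described in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)
print(len(X_train), len(X_test))  # 80 and 20 rows
```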
Attribute Selection
To identify the best attributes from data set X based on the target variable Y, we employed the “SelectKBest” feature selection technique from the sklearn.feature_selection package (Chandrashekar and Sahin 2014). The selection criterion used was the F regression (f_regression), which measures the linear correlation between each attribute and the target variable. After the analysis, studies were conducted with two different scenarios: one with k = 8, including the predictor variables water, cement, fly ash, blast furnace slag, superplasticizer, coarse aggregate, fine aggregate, and curing time; and another with k = 6, in which fly ash and fine aggregates were removed. This comparison allowed us to assess how the analysis performs when the number of predictive variables analyzed varies.
This attribute selection strategy simplifies the model by removing variables that contribute little valuable information to enable more efficient and interpretable modeling. The selection process, carried out by SelectKBest, prioritizes the most relevant variables, enhancing the model’s ability to identify patterns and make more accurate predictions on unseen data. By focusing on the most significant attributes in explaining the variability of the target variable, it is possible to improve the model’s effectiveness.
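The selection procedure described above can be sketched as follows, here for the k = 6 scenario. The data are synthetic placeholders (the real predictor matrix is not reproduced), but the `SelectKBest`/`f_regression` calls match the scikit-learn API named in the text.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical matrix with 8 predictor columns (water, cement, etc.)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
# Synthetic target strongly correlated with the first two columns
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# k = 6 scenario: keep the six attributes with the highest F scores
selector = SelectKBest(score_func=f_regression, k=6)
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)         # (200, 6): two attributes removed
print(selector.get_support())  # boolean mask of retained columns
```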
Linear Regression
Linear regression is employed to model the relationship between a set of independent variables and a dependent variable (Chein 2019), and this study utilized a flexible approach to capture complex relationships. The linear regression model was implemented using the “LinearRegression” class from the “sklearn.linear_model” package. The model was trained on the data set X, containing the independent variables, and the dependent variable y.
The quality of the model’s fit was assessed using two metrics: the coefficient of determination (R²) and the root mean square error (RMSE). The former measures the proportion of the variance in the target variable explained by the model, while the latter quantifies the average magnitude of the prediction errors in the units of the target variable. For evaluation purposes, the results of model training were printed. The R² was calculated using the “score” method of the “LinearRegression” class, and the RMSE was obtained by applying the “sqrt” function from the “numpy” package to the output of the “mean_squared_error” function from the “sklearn.metrics” package.
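The fitting and evaluation steps above can be sketched as follows. The data are synthetic (the study's data set is not reproduced), but the `LinearRegression`, `score`, `mean_squared_error`, and `numpy` `sqrt` calls are the ones named in the text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic data with a strong linear signal plus small noise
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(150, 3))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(scale=0.3, size=150)

model = LinearRegression().fit(X, y)

# R² via the score method; RMSE via mean_squared_error + numpy sqrt
r2 = model.score(X, y)
rmse = np.sqrt(mean_squared_error(y, model.predict(X)))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```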
Support Vector Machine for Regression
The parameters for a support vector machine model and a cross-validation object were defined. Subsequently, a “GridSearchCV” object was created to conduct a grid search over the SVR parameters using cross-validation (Smola and Schölkopf 2004). The “GridSearchCV” object was then applied to the training data, Xtrain and ytrain, to search the SVR parameter grid and fit the best model to the training data (Cervantes et al. 2020). The best estimator, retrieved via the “grid.best_estimator_” attribute, represents the model that exhibited the highest performance in the cross-validation, as determined by the scoring metric specified when configuring the “GridSearchCV” object.
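The grid search described above can be sketched as follows. The parameter grid, cross-validation settings, and data here are illustrative assumptions (the study's actual grid is not given); the `GridSearchCV`, `SVR`, and `best_estimator_` usage matches the scikit-learn API named in the text.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic training data standing in for Xtrain / ytrain
rng = np.random.default_rng(2)
X_train = rng.uniform(-1, 1, size=(120, 4))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=120)

# Illustrative parameter grid; the study's actual grid is not specified
param_grid = {"C": [1, 10], "gamma": ["scale", 0.1], "kernel": ["rbf"]}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

grid = GridSearchCV(SVR(), param_grid, cv=cv,
                    scoring="neg_root_mean_squared_error")
grid.fit(X_train, y_train)

# Model with the best cross-validated score under the chosen metric
best_model = grid.best_estimator_
print(grid.best_params_)
```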
Gradient Boosting
Another method used in this study was GB, an ML technique designed for regression and classification problems. This particular approach generates a prediction model as an ensemble of weak prediction models, typically represented by decision trees (Zhu et al. 2023). The parameters employed and their respective functions are as follows:
• n_estimators: This parameter represents the number of boosting stages. Since GB is fairly robust to overfitting, a larger number generally leads to better performance;
• max_depth: This parameter determines the maximum depth of the individual regression estimators. The maximum depth restricts the number of nodes in the tree, and adjusting this parameter aims to optimize performance, with the ideal value depending on the interaction of the input variables;
• min_samples_split: This parameter specifies the minimum number of samples required to split an internal node;
• learning_rate: This parameter refers to the learning rate, which shrinks the contribution of each tree by the learning_rate value. There exists a trade-off between learning_rate and n_estimators;
• loss: This parameter denotes the loss function to be optimized, where ‘squared_error’ refers to the mean squared error.
The model was trained using the GradientBoostingRegressor class from the sklearn.ensemble package. The trained model was then used to make predictions on the X test data set, and the RMSE of these predictions was calculated (Cherkassky and Ma 2004).
Artificial Neural Networks
To apply ANN, we first identified the predictor variables (X) and the output variable (y) and then divided the data into training and test sets. The architecture of the neural network was adjusted for optimization by considering the following parameters:
• solver = ‘adam’: This refers to the weight optimization solver. ‘adam’ is a gradient-based stochastic optimization algorithm particularly well-suited for large data sets.
• hidden_layer_sizes = (32, 64, 32): This represents the number of neurons in the hidden layers. The model has three hidden layers, with the first layer containing 32 neurons, the second 64 neurons, and the third 32 neurons.
• n_iter_no_change = 200: This indicates the maximum number of epochs allowed without an improvement in the training process before training is stopped.
• random_state = 1: This is the seed of the random number generator, which is used to initialize the weights randomly.
• max_iter = 5000: This represents the maximum number of iterations for the solver.
• learning_rate_init = 0.0001: This signifies the initial learning rate for the ‘adam’ solver.
• verbose = True: This setting enables the printing of the training progress.
Once the training is complete, the ANN model is ready to make predictions.
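The configuration above can be sketched with scikit-learn's `MLPRegressor`, using the hyper-parameters listed in the text. The data are synthetic placeholders, and `verbose` is set to False here only to suppress the per-epoch progress output.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the predictor variables X and output y
rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(200, 4))
y = 3 * X[:, 0] + X[:, 1] + 0.05 * rng.normal(size=200)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Architecture and hyper-parameters as listed in the text
ann = MLPRegressor(
    solver="adam",
    hidden_layer_sizes=(32, 64, 32),
    n_iter_no_change=200,
    random_state=1,
    max_iter=5000,
    learning_rate_init=0.0001,
    verbose=False,  # the study uses True to print training progress
)
ann.fit(X_train, y_train)

# Once training is complete, the model is ready to make predictions
predictions = ann.predict(X_test)
print(predictions.shape)  # (40,)
```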