Statistical Model for Predicting the Optimum Gypsum Content in Concrete

Internal sulfate attack in concrete is widespread in Iraq and neighboring countries because the sand and gravel used in concrete usually have a high sulfate content. In the present study the total effective sulfate in concrete was used to calculate the optimum SO3 content. Models based on linear regression analysis were developed to predict the optimum SO3 content, usually referred to as the optimum gypsum content (O.G.C), in concrete. The data were separated into 155 points for developing the models and 37 points for checking them. Eight models were built for the 28-day age. A late-age (greater than 28 days) model was then developed based on the predicted optimum SO3 content at 28 days and the late age. Eight models were also built for all ages. The most important result obtained from the developed models is the positive effect of C3S, C3A and C4AF on the optimum SO3 content. The effect of C3A on the optimum SO3 content is about twice that of C4AF. The study also showed a positive and important effect of cement fineness in most models; the exceptions are attributed to statistical overlap between the independent variables.


DETERMINATION OF THE STATISTICAL MODEL VARIABLES

1.1 Collecting Data
Building a predictive regression model requires data sets that cover a wide range of variation in the independent variables. A survey of locally published literature from 1977 to 2002 was carried out to obtain the required data, as presented in Table 1.

The Independent Variables
The following independent variables were selected; the data were processed to obtain the information listed below, as presented in Table 2.
1. Total alkalis as equivalent Na2O.
2. Main compounds of cement.
3. Cement surface area (Blaine fineness).

The Dependent Variables
The optimum SO3 content had to be determined from the relationship between compressive strength and SO3 content reported in each of the surveyed studies, as shown in Table 3. The decision was based on the SO3 content at which the compressive strength was maximum and on the change of this SO3 content with age for the same mix.

Preliminary Statistical Analysis
The analysis focused on calculating the following measures of central tendency and dispersion for the data, which number 178 points.
1. Mean, median and mode (central tendency).
2. Minimum, maximum, range and standard deviation (dispersion).
The calculated measures of central tendency and dispersion are presented in Table 4.
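The measures listed above can be sketched with the Python standard library. This is only an illustration: the sample values are hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption, since the text does not state which form was used.

```python
import statistics

def describe(values):
    """Return the central tendency and dispersion measures for one variable."""
    lowest, highest = min(values), max(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        # Assumption: sample standard deviation (divisor n - 1).
        "std": statistics.stdev(values),
    }

# Hypothetical optimum SO3 values (%), for illustration only.
sample_so3 = [3.5, 4.0, 4.0, 4.5, 5.0]
summary = describe(sample_so3)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```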

Correlation Analysis
Two types of correlation coefficient, Pearson and Spearman [SPSS manual], were obtained between the dependent and independent variables; they are presented in Tables 5 and 6 respectively. The first is used for linear relationships and the second for nonlinear relationships. Significance is assessed by comparing the calculated correlation coefficient (r) with the critical correlation coefficient (r_c) at a specified level of significance [Bland (1985)]. The critical value is calculated from an equation with the following terms:
r_c = the critical correlation coefficient,
α = the level of significance,
t = the standard t variable,
n = number of sample data pairs.
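A minimal sketch of this comparison is given below. The equation for the critical coefficient is not reproduced in the text, so the form used here, r_c = t / sqrt(t^2 + n - 2), is an assumption: it is the standard relation consistent with the symbols t and n listed above. The function names and the data are hypothetical, and the t value must be taken from a t table for the chosen level of significance and n - 2 degrees of freedom.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient (linear relationship)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def critical_r(t, n):
    """Assumed standard form: r_c = t / sqrt(t^2 + n - 2)."""
    return t / math.sqrt(t ** 2 + n - 2)

# Hypothetical data: C3A content (%) and optimum SO3 (%).
c3a = [6.0, 8.0, 9.5, 11.0, 12.0]
so3 = [3.2, 3.9, 4.1, 4.8, 5.0]

r = pearson(c3a, so3)
rho = spearman(c3a, so3)
# t = 3.182 is the two-tailed table value for alpha = 0.05 and 3 degrees of freedom.
r_c = critical_r(3.182, len(c3a))
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}, critical r = {r_c:.3f}")
print("significant" if abs(r) > r_c else "not significant")
```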

DEVELOPED REGRESSION MODELS FOR CONCRETE

2.1 28-days model
Predictive models for concrete were developed in two stages based on the age of the product. The first stage focused on data for the age of 28 days and the second stage on late ages greater than 28 days. The descriptive statistical analysis is presented in Table 7. The calculated Pearson and Spearman correlation coefficients are presented in Tables 8 and 9 respectively. Comparison of the values in Tables 8 and 9 indicates that the independent variables are highly correlated with each other. From the partial correlation presented in Table 10, it can be concluded that, in general, the coefficients of correlation for the linear relationship are higher than the critical coefficient of correlation, except for total alkalies, C4AF and Blaine fineness, which are lower than the critical value. The linear coefficients are also higher than those for the nonlinear relationship, so multiple linear regression analysis was used for model development. Eight models were built for the 28-day age, as presented in Table 11. The number of data points is 33 when Abdul-Latif's data (1997-2001) are ignored, which leaves no missing values for total alkalies (models 1-A, 2-A and 3-A), and 42 when these data are included (models 1-B, 2-B, 3-B, 4 and 5). The missing values for total alkalies were replaced by the average value of all other data. From Tables 11 and 12 the following can be concluded:
1. The best statistical model is 1-A, since it has the highest coefficient of determination R2 (0.992), the lowest root mean square of error (0.3424) and a Durbin-Watson value within the accepted range of 1.5-2.5. Its T-value is not the best but is still low. However, the effects of some independent variables are not in line with concrete technology.
2. For models 2-A and 2-B the effect of L.O.I is removed because it is less effective in concrete. The following can be concluded:
-The effects of some independent variables are not in line with concrete technology.
-High coefficients of determination R2 (0.985 and 0.974) and low root mean square of error (0.458 and 0.5816), but the Durbin-Watson values (1.36 and 0.952) are outside the accepted range; the T-values are low (-0.24 and 0.22).
3. For models 3-A and 3-B the effects of L.O.I and MgO are removed, because the values in the collected data are below the limit given in the ASTM specification (6%). Examination of these models suggests the following:
-The effects of some independent variables are not consistent with the current knowledge of concrete technology.
-Low coefficients of determination R2 (0.954 and 0.952) in comparison with the other developed models, high root mean square of error (0.7851 and 0.7753), Durbin-Watson values (1.331 and 0.788) outside the accepted range, and low T-values (-0.81 and 0.01).
4. For model 4 the effect of total alkalies is removed in order to include Abdul-Latif's data (1997-2001), so the model has no missing values. From this model the following can be observed:
-High coefficient of determination R2 (0.987) and low root mean square of error (0.417); the Durbin-Watson value is not within the accepted range but is the closest to it, and the T-value is the lowest (0.01).
5. For model 5 the effects of the independent variables are, in general, consistent with the current knowledge of concrete technology, but the model falls short statistically: it has a low coefficient of determination R2 (0.952) compared with the other models, a root mean square of error of 0.7686 and a Durbin-Watson value (0.726) outside the accepted range, despite the low T-value (-0.01).
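The replacement of missing total alkalies values by the average of the available values, as described for the B models, can be sketched as follows. The function name and the data are hypothetical; missing entries are represented by None.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical total alkalies (% as equivalent Na2O); None marks a missing value.
total_alkalies = [0.50, None, 0.70, 0.60, None]
filled = fill_missing_with_mean(total_alkalies)
print(filled)
```

Mean replacement keeps the sample size but reduces the variance of the filled variable, so the regression coefficient obtained for that variable should be interpreted with caution.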
From the above analysis it can be concluded that it is difficult to choose a single best model that satisfies the requirements of both concrete science and regression analysis. Therefore, models 1-A, 3-B and 4 were selected for further examination.
Scatter plots of C3A and the predicted optimum SO3 versus the residuals are presented in Figs. 1 and 2 for model 1-A, Figs. 3 and 4 for model 3-B and Figs. 5 and 6 for model 4. They indicate that model 3-B does not adequately represent the obtained data. Therefore this model is ignored in the following analysis.
Further statistical analysis was carried out to find the best of the models described above. The relationships between the observed and predicted SO3 are presented in Fig. 7 and Fig. 9 for models 1-A and 4 respectively, and indicate that these models result in minimal random error. By contrast, the relationship in Fig. 8 for model 3-B is less clear.
Moreover, the distributions of residuals presented in Figs. 10, 11 and 12 for models 1-A, 3-B and 4 respectively provide further evidence that model 3-B is not a reliable model. To conclude this section, the data presented in Fig. 12 were judged to show the best agreement between observed and predicted SO3 values, which implies that model 4 best describes the obtained data.
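The statistics used to compare the models in this section can be sketched from the observed and predicted values. The definitions below are the common textbook ones and are assumptions: the text does not state whether the root mean square of error divides by n or by the residual degrees of freedom, nor how R2 was computed for models without an intercept. The data are hypothetical.

```python
import math

def fit_diagnostics(observed, predicted):
    """R2, root mean square of error and Durbin-Watson statistic of a fit."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    mean_observed = sum(observed) / n
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    # Durbin-Watson: squared successive residual differences over ss_res.
    dw = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)) / ss_res
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),  # assumption: divisor n
        "durbin_watson": dw,
    }

# Hypothetical observed and predicted optimum SO3 values (%).
observed = [3.0, 4.0, 5.0, 6.0]
predicted = [3.1, 3.9, 5.1, 5.9]

stats = fit_diagnostics(observed, predicted)
dw_accepted = 1.5 <= stats["durbin_watson"] <= 2.5  # range quoted in the text
print(stats, "DW within accepted range:", dw_accepted)
```

In this small example the residuals alternate in sign, so the Durbin-Watson value falls above the accepted range of 1.5-2.5 even though R2 is high, which illustrates why the two criteria are checked separately.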

Late Ages models (Greater Than 28-Days)
Following the development of the 28-day models, a late-age (greater than 28 days) model was developed based on the predicted optimum SO3 content at 28 days and the late age. The number of data points is 77.
Descriptive statistical analysis is presented in Table 13. The predicted models for late ages are presented in Table 14.
Optimum SO3 % (late ages), model 4 = 0.976 × SO3 (predicted for 28 days) + 1.251E-03 × Time (late ages)    eq. (1)
Table 15 shows that the coefficient of determination (R2) is 0.97, which implies that 97.0% of the observed scatter in the data is explained by the adopted model. This is consistent with the comparison of the calculated F value (1206.493) with the tabulated critical F value (3.127) at the 95% level of confidence. Moreover, the calculated Durbin-Watson value (1.939) is within the range 1.5-2.5, so minimal random error would be expected. The T-statistic is 0.08. Further evidence that the developed model results in minimal random error can be found by examination of Fig. 13. Figs. 14 and 15 show scatter plots of the predicted optimum SO3 and MgO versus the residuals; they suggest random variation between the variable values and their residuals. Finally, the distribution of the residuals is shown in Fig. 16, from which it is clear that the residuals are almost normally distributed. From the statistical analysis presented above, it is nevertheless difficult to select model 4 as the best 28-day model, since the effects of some of its independent variables are not in line with concrete technology. The all-age model may therefore be the alternative.
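Eq. (1) can be applied directly, as in the short sketch below. The coefficients are those of eq. (1); the function name and the input values are hypothetical, and the unit of time is assumed here to be days, which the text does not state explicitly.

```python
def late_age_optimum_so3(so3_28_days, time_late_age):
    """Eq. (1): late-age optimum SO3 (%) for model 4.

    so3_28_days   -- optimum SO3 (%) predicted by the 28-day model
    time_late_age -- late age (assumed to be in days)
    """
    return 0.976 * so3_28_days + 1.251e-03 * time_late_age

# Hypothetical input: 28-day predicted optimum SO3 of 4.0 % at an age of 90 days.
estimate = late_age_optimum_so3(4.0, 90)
print(f"late-age optimum SO3 = {estimate:.3f} %")
```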

All-Age Concrete Models
Eight models were built for all ages. The number of data points is 132 when Abdul-Latif's data (1997-2001) are ignored and 155 when they are included. The results of the preliminary descriptive statistical analysis are presented in Table 16. The results of the linear and nonlinear (Pearson and Spearman) correlation analysis are presented in matrix form in Tables 17 and 18 respectively. The data suggest that, in general, the linear model provides a better fit and that the cement compounds are highly correlated with each other. From the partial correlation presented in Table 19, the coefficients of correlation for the linear relationship are in general higher than the critical coefficient of correlation, except for MgO and C3A. For the nonlinear relationship, the coefficients of all independent variables are lower than the critical value. Based on this result, multiple linear regression was used to develop the required statistical models. The regression coefficients obtained, the t-values and the decisions are presented in Table 20. From Tables 20 and 21 the following can be concluded:
1. Statistically, the best model is 1-A, since it has the highest R2, the lowest root mean square of error and a Durbin-Watson value within the accepted range. Its T-value is not the best but is still low. However, the effects of some independent variables are not in line with concrete technology.
2. Model 1-B may be selected as the best model for the following reasons:
-In general the regression coefficients are in line with concrete technology, except for total alkalies. This is because the missing values in Abdul-Latif's data (1997 and 2001) were replaced by the mean value, which affects the final result.
-High coefficient of determination R2 (0.98) and low root mean square of error (0.4821); the Durbin-Watson value is not within the accepted range but is near it, and the T-value is low (-0.49).
-The model includes all independent variables.
3. For models 2-A, 2-B, 3-A and 3-B, the tables show a shortcoming either in the statistics or in the effects of some independent variables, which are not in line with concrete technology. This is due to the high correlation between the independent variables.
4. Model 4 is the best model for the following reasons:
-The effects of all independent variables are in line with the understanding of concrete technology, as the model contains all expected positive and negative factors and values.
-High coefficient of determination R2 (0.98) and low root mean square of error (0.417); the Durbin-Watson value is not within the accepted range but is the closest to it, and the T-value is the lowest (-0.08).
-The model includes all independent variables except total alkalies.
5. For model 5 the effects of all independent variables are in line with concrete technology, but the model falls short statistically: it has a low coefficient of determination R2 (0.962) compared with the other models, the highest root mean square of error and a Durbin-Watson value outside the accepted range, despite the low T-value (0.08).
Figs. 17 and 18 for model 4 and Figs. 19 and 20 for model 5 show scatter plots of C3S, C3A, Blaine fineness and age versus the residuals. They suggest random variation between the variable values and their residuals, which further supports the conclusion that model 4 can be considered the best model. Further evidence that model 4 results in minimal random error can be found by examination of Fig. 21. The distribution of the residuals is shown in Fig. 22, from which it is clear that the residuals are almost normally distributed.
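For readers who want to reproduce this kind of fit, a generic least-squares sketch of multiple linear regression is given below. It is not the procedure used in the study, whose coefficients are those of Tables 20 and 21; the function names, the two predictors and the synthetic data are hypothetical, and the coefficients are found by solving the normal equations with plain Gaussian elimination.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_linear_model(rows, targets, intercept=True):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    design = [([1.0] if intercept else []) + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Synthetic data: (C3A %, age in days), generated from known coefficients
# so that the fit can be checked. These are not values from the study.
rows = [(8.0, 28), (10.0, 28), (8.0, 90), (12.0, 180), (10.0, 365)]
targets = [1.0 + 0.3 * c3a + 0.001 * age for c3a, age in rows]

coefficients = fit_linear_model(rows, targets)
print([round(c, 6) for c in coefficients])  # intercept, C3A, age
```

When the predictors are highly correlated, as reported above for the cement compounds, the normal equations become ill-conditioned and individual coefficients can change sign or magnitude between models, which is consistent with the behaviour discussed for models 2-A to 3-B.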

CONCLUSIONS

3.1 Development of Models for Concrete (28 Days - Late Age and All Age Models):
1. Examination of the data for all variables indicates that the coefficients of correlation for the linear relationship are substantially higher than those for the nonlinear relationship.
2. In general, it was found statistically that the MgO content of cement positively affects the optimum SO3 content.
3. Increasing the SO3 content in sand affects the optimum SO3 content of concrete more significantly than increasing the SO3 content in coarse aggregate, so the total effective SO3 in concrete is the preferred measure.

28 Days - Late Age Models:
In the 28-day models the relationships among the independent variables and with the optimum SO3 content overlap, resulting in high correlation between them. From the regression analysis presented it is difficult to choose the best model, because the models either agree with concrete technology or give the best statistical fit, but not both. From the results of the 28-day models, the following can be concluded:
1. In general, the trends for both C3S and C2S are positive, owing to the positive influence of both compounds on 28-day strength.
2. In support of this conclusion, the regression coefficient of C2S is smaller than that of C3S in all models where both effects are positive (1-B, 2-B, 3-B, 4 and 5).
3. The effect of C3A was shown statistically to be positive.
4. In general, the effect of C4AF is positive.
5. In general, the effect of C3A is about double that of C4AF, except for models 1-A, 2-A and 3-A; this is attributed to the combined effect of Abdul-Latif's data and the other authors' data.
6. The trend of Blaine fineness is not clear and needs more study.
7. It was shown statistically that the optimum SO3 content increases with age in the late-age model.

All Age Models
For the best models, 4 and 5, the following can be concluded:
1. The effect of C3S is positive and that of C2S is negative.
2. The effects of C3A and C4AF are positive.
3. The effect of C3A is about twice that of C4AF.
4. The effects of C3A and C4AF are higher than those of C3S and C2S.
5. The effect of Blaine fineness is positive.
6. The positive effect of time leads to an increase in the optimum SO3 content with age.
7. Blaine fineness and C3A were found to be the major factors affecting the optimum SO3 content.

[Figure: optimum SO3, actual versus predicted (%)]