Improving the Prediction of Cement Compressive Strength by Coupling of Dynamical Models

A dynamic approach based on two well-known techniques, multiple linear regression (MLR) and artificial neural networks (ANN), has been used to predict the 28-day compressive strength of cement. The modeling is based on Portland cement data and uses daily physical and chemical analyses, as well as early strength results at 1 and 7 days. Two kinds of models have been built: one containing the 1-day strength as an independent variable, and one containing both the 1- and 7-day strength. The models are dynamic because their parameters are calculated from a movable past period of T_D days and then applied to a future period of T_F days. The comparison is based on the residual error of the testing period, and T_D and T_F have been optimized. Eight ANNs of different complexity have been developed, some of which suffer from over-fitting. A third model has also been created by coupling the initial two. The time parameters, as well as the filtering and weighting coefficients of the coupled model, have been optimized. The simple ANNs with one node in the hidden layer, sigmoid or hyperbolic tangent activation functions, and bias show better performance. Combining the coupled model with these two best ANN techniques improves the prediction of 28-day strength compared with the initial model containing the 1-day strength. The coupled model is also less sensitive to the T_F parameter, which is a practical benefit in daily industrial application. Implementing these methods in cement process control can contribute to quality improvement by maintaining a low variance of the typical strength.


Introduction
Artificial neural networks (ANN) are an attractive tool for modeling non-linear processes and phenomena. In the cement industry, ANNs are mostly used to describe and control the main production operations: burning [1][2][3][4] and grinding [5][6][7]. Predicting a cement's 28-day strength from earlier analysis results of the same sample, based on a quality control database of cement produced in the past, remains a challenging issue. Mainly linear and polynomial models have been developed, or algorithms that can be traced back to such models. Extensive reviews of these techniques can be found in the literature [8][9]. Neural networks have been used successfully to predict concrete strength by building various ANN structures, as reported in numerous studies [10][11][12][13][14][15][16][17]. However, relatively few studies use ANNs, or evolutionary and genetic algorithms (GA) in general, to correlate a cement's typical 28-day strength with other cement properties.
In 2011, Dolado et al. [18] presented an excellent review of recent efforts to describe cement-based materials by computational means. Their approach was to distinguish the models at several levels: they summarized sub-micro level simulations, as well as micro-, meso-, and macro-level models. They concluded that the trend towards open platforms for modeling and for the use of simulation models illustrates the increasing demand for such tools in solving real engineering problems. Akkurt et al. [19] developed a GA-ANN model of cement compressive strength by collecting and processing six months' worth of industrial data on chemical, physical, and mechanical characteristics. Their results indicated that an increase in C3S, SO3 and specific surface leads to increased strength. Gene expression programming (GEP) and neural networks were used by Baykasoglu et al. [20] and Thamma et al. [21] to predict the strength of Portland composite cement. Motamedi et al. [22] predicted the compressive strength of cockle shell-cement-sand mixtures by applying and comparing the support vector regression (SVR) and adaptive neuro-fuzzy inference system (ANFIS) techniques. First, the ANFIS network was used to find the parameters with the strongest impact on strength. Their findings showed that the ANFIS approach generalized better than the SVR estimation. Motamedi et al. [23] also applied the ANFIS technique to predict the strength of a pulverized fuel ash-cement-sand mixture, concluding that ANFIS provided a suitable platform when the analysis aimed at countering the uncertainties in a system. Zhang et al. [24] introduced an algorithm named Double-layer Multi-expression Programming (DMEP). They applied DMEP to the prediction of Portland cement's 28-day strength and compared the results with those of four other computational models: the Multi-Expression Programming (MEP) model, the Gene Expression Programming (GEP) model, a Neural Network model and a Fuzzy Logic (FL) model. Madsen et al. [25] applied FL and GA techniques to predict the strength of CEM I cement for all strength classes. Ren et al. [26] applied generalized regression neural network (GRNN) techniques to predict the heat of hydration and compressive strength of cement. Yongzheng et al. [27] developed a model combining Principal Components Analysis (PCA) and ANN algorithms. These predictions were accurate within their field of application. However, if a parameter that is not contained in the set of independent variables changes noticeably during cement production, the predictive model could fail. Most of these models can therefore be called "static", since the parameters are estimated from a fixed data set and then used to predict future strength.
Tsamatsoulis [28][29] presented a comparison between static models and models with a movable time horizon, based on polynomial equations and long-term process results. The latter incorporate the uncertainty due to the time variability of factors not included in the modeling; thus, they are dynamic. The particularities of these two kinds of models have been explored thoroughly, and the superiority of the dynamic models has been demonstrated. Apart from chemical and physical measurements, early strength results were also used, and two independent equations were applied to predict the 28-day strength: (i) one in which the 1-day strength is an independent variable, named Str_28_1; (ii) one in which the 7-day strength is also included, named Str_28_7. The second model is much more accurate than the first, but its longer delay (more than 7 days) is a severe drawback in daily practical application. Tsamatsoulis extended these studies [30] by developing neural network structures incorporating the 1-day strength results as well as physical and chemical characteristics.
The aim of this study is twofold. The first aim is to compare two types of dynamic techniques: multiple linear regression (MLR) and a technique based on several types of ANNs. Both techniques use the data of a predetermined period for training, i.e. for computing the optimal parameters. Afterwards, the data belonging to a time interval that follows the training period are used to validate the model. The training and validation intervals constitute the past and future periods, respectively. Throughout this text, the errors corresponding to past periods are called training errors, while those computed from future period data are called test errors. The criterion of comparison is the capability of a model to predict the cement's future strength, which can be characterized as the generalization ability [31] of the model. The training and validation time intervals need optimization to achieve the best prediction. The minimal test errors of MLR and ANNs are compared in order to select the best technique for further examination. Special care has been given to the over-fitting problem that frequently appears in ANN techniques. Subirats et al. [32] clearly stated that one strategy for avoiding over-fitting is the search for compact architectures. Other popular techniques include early stopping of training using a validation data set, weight decay, and exclusion of noisy instances from the training data. The MLR and ANN techniques are applied to the data sets corresponding to the Str_28_1 and Str_28_7 models mentioned earlier. The second main objective of this study is to couple dynamically the predictions of the model based on the 7-day strength with those of the model based on the 1-day strength, in order to use the prior information derived from the Str_28_7 model to improve predictability once the 1-day strength has been measured. A more precise prediction of strength leads to improved quality control of the production process. The study is restricted to Portland composite cement types.

Materials and testing methods
Two Portland cement types, produced according to EN 197-1:2011 [33], were studied: CEM II A-L 42.5 N and CEM II B-M (P-L) 32.5 N. Each common cement type necessarily contains clinker and gypsum. Besides clinker, the first cement type contains limestone as its main constituent, while the main constituents of the second type are natural pozzolana and limestone. The typical chemical characteristics of the raw materials are shown in Table 1; the oxide analysis was performed with XRF. The typical mineral composition of the clinker is also presented, calculated with the Bogue equations (1)-(4), where all contents are in % by mass and CaO_f denotes the clinker free lime:
$$\mathrm{C_3S} = 4.07\,(\mathrm{CaO} - \mathrm{CaO_f}) - 7.60\,\mathrm{SiO_2} - 6.72\,\mathrm{Al_2O_3} - 1.43\,\mathrm{Fe_2O_3} \qquad (1)$$
$$\mathrm{C_2S} = 2.87\,\mathrm{SiO_2} - 0.754\,\mathrm{C_3S} \qquad (2)$$
$$\mathrm{C_3A} = 2.65\,\mathrm{Al_2O_3} - 1.69\,\mathrm{Fe_2O_3} \qquad (3)$$
$$\mathrm{C_4AF} = 3.04\,\mathrm{Fe_2O_3} \qquad (4)$$
The EN 197-1:2011 requirements regarding composition and 28-day strength limits are shown in Table 2. The modeling is based on the results of daily average samples obtained from the corresponding chemical analyses and physical and mechanical measurements. The following data were used: (i) residue on the 40 μm sieve, measured by air-jet sieving according to EN 196-6 [34]; (ii) specific surface, measured according to EN 196-6; (iii) loss on ignition and insoluble residue of the cement, measured according to EN 196-2 [35]; (iv) SO3, measured with XRF; (v) compressive strength at 1, 7 and 28 days. The preparation, curing, and testing of the specimens were carried out according to EN 196-1 [36]. The typical chemical, physical, and mechanical properties of the two cement types are given in Table 3. The modeling is based on more than 3400 data sets corresponding to nine years of daily production results of the Halyps plant.
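For readers who want to reproduce the clinker phase estimate behind Bogue equations (1)-(4), a short Python sketch follows. It assumes the standard Bogue coefficients rounded to three significant figures, with free lime subtracted from CaO and no SO3 correction; the function name and the oxide values in the example are hypothetical, not Halyps plant data.

```python
def bogue_clinker(cao, sio2, al2o3, fe2o3, cao_free=0.0):
    """Estimate clinker phases (C3S, C2S, C3A, C4AF), in %, from oxide contents in %.

    Standard Bogue coefficients (assumed form); free lime is removed from CaO
    before C3S is computed. The method presumes Al2O3/Fe2O3 >= 0.64.
    """
    if fe2o3 > 0 and al2o3 / fe2o3 < 0.64:
        raise ValueError("Bogue equations in this form need Al2O3/Fe2O3 >= 0.64")
    c3s = 4.07 * (cao - cao_free) - 7.60 * sio2 - 6.72 * al2o3 - 1.43 * fe2o3
    c2s = 2.87 * sio2 - 0.754 * c3s
    c3a = 2.65 * al2o3 - 1.69 * fe2o3
    c4af = 3.04 * fe2o3
    return c3s, c2s, c3a, c4af


# Hypothetical clinker analysis (% by mass), for illustration only
phases = bogue_clinker(cao=66.0, sio2=21.5, al2o3=5.0, fe2o3=3.3, cao_free=1.2)
print([round(p, 1) for p in phases])  # [62.0, 14.9, 7.7, 10.0]
```

Note that C2S depends on the computed C3S, so the equations must be evaluated in order.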

Mathematical models predicting strength
The common independent variables in all models are: loss on ignition (LOI), sulfate content (SO3), insoluble residue (Ins_Res), residue on the 40 μm sieve (R40) and specific surface (S_b). Chemical analysis is used instead of the cement composition, as was done in earlier modeling, in order to generalize the equations as far as possible: the direct use of chemical analysis requires no prior knowledge of the raw materials and clinker composition. Two basic and independent models were initially applied to predict the 28-day strength: (i) Str_28_1, in which the 1-day strength (Str_1) is an input variable in addition to the physical and chemical data; (ii) Str_28_7, in which the 7-day strength (Str_7) is also an input variable.

Multiple linear regression
The seven independent variables are named X_I, with I = 1 to 7, where X_1 = LOI, X_2 = SO3, X_3 = Ins_Res, X_4 = S_b, X_5 = R40, X_6 = Str_1 and X_7 = Str_7. The 28-day strength is the dependent variable, Y = Str_28, and the algorithm proceeds as follows: (i) For a given data set, the minimum and maximum values of X_I and Y (X_I,MIN, X_I,MAX, Y_MIN and Y_MAX, respectively) are computed.
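(ii) The variables X_I and Y are normalized. The new variables XN_I and YN are calculated from equations (5) and (6), so that the normalized data belong to the interval [0, 1]:
$$XN_I = \frac{X_I - X_{I,MIN}}{X_{I,MAX} - X_{I,MIN}} \qquad (5)$$
$$YN = \frac{Y - Y_{MIN}}{Y_{MAX} - Y_{MIN}} \qquad (6)$$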
(v) For a total number of data sets M and actual strength Str_28_Act, the coefficients A_I, I = 0 to N, are computed by minimizing the residual error s_Res, calculated from formula (9).
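Steps (i), (ii) and (v) amount to a least-squares fit on min-max normalized data. The following NumPy sketch illustrates this under stated assumptions: the linear form YN = A_0 + Σ A_I XN_I, a closed-form least-squares solve with np.linalg.lstsq in place of the Newton-Raphson regression mentioned later in the text (both reach the same minimum for a linear model), and a residual error with an (M minus number of coefficients) denominator, since formula (9) is not reproduced here. Function names and data are hypothetical.

```python
import numpy as np


def minmax(a, a_min, a_max):
    """Equations (5)-(6): scale values linearly onto [0, 1]."""
    return (a - a_min) / (a_max - a_min)


def fit_mlr(X, y):
    """Least-squares fit of YN = A0 + sum_I A_I * XN_I on normalized data.

    X: (M, N) array of independent variables; y: (M,) array of Str_28 in MPa.
    Returns the coefficients A_0..A_N and the bounds needed for prediction.
    """
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    y_min, y_max = y.min(), y.max()
    XN = minmax(X, x_min, x_max)
    yn = minmax(y, y_min, y_max)
    design = np.column_stack([np.ones(len(yn)), XN])  # column of ones carries A_0
    coef, *_ = np.linalg.lstsq(design, yn, rcond=None)
    return coef, (x_min, x_max, y_min, y_max)


def predict_mlr(coef, bounds, X):
    """Apply the fitted model and map YN back to MPa."""
    x_min, x_max, y_min, y_max = bounds
    yn = coef[0] + minmax(X, x_min, x_max) @ coef[1:]
    return y_min + yn * (y_max - y_min)


def s_res(y_act, y_calc, n_coef):
    """Residual error; the (M - n_coef) denominator is an assumed form of (9)."""
    return float(np.sqrt(np.sum((y_act - y_calc) ** 2) / (len(y_act) - n_coef)))


rng = np.random.default_rng(0)
X = rng.uniform([2.0, 3000.0], [12.0, 5000.0], size=(50, 2))  # hypothetical LOI, S_b
y = 60.0 - 1.2 * X[:, 0] + 0.002 * X[:, 1]                   # hypothetical Str_28
coef, bounds = fit_mlr(X, y)
print(s_res(y, predict_mlr(coef, bounds, X), n_coef=len(coef)))  # close to zero
```

Because of the min-max scaling, a variable that is constant within the training window would cause a division by zero, so such a column would have to be dropped before fitting.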

Neural networks
Two main kinds of neural networks were developed: the usual three-layer feed-forward ANN and the more complicated cascade ANN. Back-propagation is applied in batch mode. The hidden layer of the three-layer ANN contains one or two nodes. Non-linearity is introduced through sigmoid, hyperbolic tangent and radial basis activation functions, and ANNs both with and without bias are modeled. These combinations result in a large number of structures, whose nomenclature is presented in Table 4. The three-layer ANNs with one or two nodes in the hidden layer are depicted in Figure 1 for the model Str_28_1. The model Str_28_7 includes an additional input node for Str_7 and therefore needs one or more additional synaptic weights, depending on the number of nodes in the hidden layer. The ANN with one hidden node accepts and processes all the data. In the two-node case, each node takes as inputs the data of one cement type (CEM II B-M 32.5 or CEM II A-L 42.5); the software identifies the cement type from the chemical analyses. At each node, the linear combination of the inputs and synaptic weights is computed, and the result enters the activation function. For sigmoid or radial basis functions, equations (5) and (6) are applied for normalization. For the hyperbolic tangent function, normalization follows formulae (10) and (11), so that XN_I and YN belong to the interval [-1, 1]. Str_28 is then back-calculated from its normalized value with equation (12):
$$XN_I = 2\,\frac{X_I - X_{I,MIN}}{X_{I,MAX} - X_{I,MIN}} - 1 \qquad (10)$$
$$YN = 2\,\frac{Y - Y_{MIN}}{Y_{MAX} - Y_{MIN}} - 1 \qquad (11)$$
$$\mathrm{Str\_28} = Y_{MIN} + \frac{(YN + 1)\,(Y_{MAX} - Y_{MIN})}{2} \qquad (12)$$
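The normalization used with the hyperbolic tangent function maps each variable onto [-1, 1], and it must be inverted to express the network output in MPa. A minimal Python sketch, assuming the linear min-max form; the function names and the strength range are hypothetical.

```python
def to_pm1(v, v_min, v_max):
    """Map v from [v_min, v_max] onto [-1, 1] (hyperbolic tangent case)."""
    return 2.0 * (v - v_min) / (v_max - v_min) - 1.0


def from_pm1(vn, v_min, v_max):
    """Inverse mapping: recover the value in original units from its [-1, 1] image."""
    return v_min + (vn + 1.0) * (v_max - v_min) / 2.0


# Hypothetical Str_28 range of a training window, in MPa
s_min, s_max = 38.0, 58.0
for s in (38.0, 48.0, 58.0):
    sn = to_pm1(s, s_min, s_max)
    print(s, sn, from_pm1(sn, s_min, s_max))
```

The bounds must be those of the training window; test-period values outside them map outside [-1, 1], which is the extrapolation case discussed for the test errors.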
The activation functions are described by equations (13) to (16), where o(J) is the output of node J. When the hidden layer contains one node, J = 1, while when two nodes exist, J is either 1 or 2.
In the output layer activation function (16), N_1 = 1 or 2, depending on the number of nodes in the hidden layer.
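To make the structure of the simplest networks concrete, the following NumPy sketch trains a three-layer network with one sigmoid hidden node and biases (apparently the S_1N_B structure of Table 4) by batch back-propagation. It is a sketch under stated assumptions, not the authors' code: the output node is assumed to be a sigmoid as well, the loss is half the mean squared error on normalized data, and the learning rate, epoch count, initialization and all names are arbitrary, hypothetical choices.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(params, XN):
    """Hidden node o(1) = sigmoid(W.x + b); output = sigmoid(V * o(1) + c) (assumed)."""
    w, b, v, c = params
    h = sigmoid(XN @ w + b)
    return h, sigmoid(v * h + c)


def train_one_node(XN, yn, lr=1.0, epochs=10000, seed=0):
    """Batch back-propagation on 0.5 * mean squared error of normalized data."""
    rng = np.random.default_rng(seed)
    m, n = XN.shape
    params = [rng.normal(size=n), 0.0, float(rng.normal()), 0.0]
    for _ in range(epochs):
        w, b, v, c = params
        h, out = forward(params, XN)
        d_out = (out - yn) * out * (1.0 - out)  # error signal at the output node
        d_hid = d_out * v * h * (1.0 - h)        # error signal propagated back to o(1)
        params = [
            w - lr * (XN.T @ d_hid) / m,
            b - lr * d_hid.mean(),
            v - lr * (d_out * h).mean(),
            c - lr * d_out.mean(),
        ]
    return params


rng = np.random.default_rng(1)
XN = rng.uniform(size=(200, 2))             # hypothetical normalized inputs
yn = 0.3 + 0.4 * XN[:, 0] - 0.2 * XN[:, 1]  # hypothetical normalized target
params = train_one_node(XN, yn)
mse = np.mean((forward(params, XN)[1] - yn) ** 2)
print(mse, np.var(yn))  # trained error vs. error of always predicting the mean
```

With only n + 3 parameters, this compact architecture has little capacity to over-fit, which is consistent with the better test errors the text reports for the one-node networks.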
The more complicated structure of the cascade ANN is depicted in Figure 2. Cascade ANNs have been applied to the prediction of concrete strength by Badde et al. [37]. To develop this kind of ANN, the technique described by Shetinin [38] has been followed. All the hidden layers contain sigmoid functions, while the output layer linearly combines the outputs of the hidden layers with the weights V_I, I = 1..4. The first hidden layer is fed with Str_1 and LOI, the two inputs with the greatest impact on Str_28, and their linear combination with the weights W_11 and W_21 follows. The output o(1) is multiplied by the weight V_1 and fed to the output node. These three synaptic […] error. One explanation is the expected decreasing relation between R40 and S_b. Given the fundamental impact of specific surface on the rate of cement hydration, it is concluded that S_b is more significant than R40, at least for this ANN type.

Dynamic modeling
The common feature of the linear regression and neural network techniques is the dynamic modeling, described by the following algorithm: (i) At date t, a new 28-day strength result becomes available. The specimen was prepared 28 days earlier, from cement produced on day t-29.
(ii) A time interval of T_D days and the samples belonging to the period [t-29-T_D, t-29] are considered. The dynamic data set contains the results of this population of daily average samples, i.e. all the daily results of LOI, Ins_Res, SO3, R40, S_b, Str_1, Str_7 and Str_28 of both cement types for the cement produced in each cement mill. An example of a data set is provided in Figure 3, for T_D = 180 days.
(iii) Using the selected technique (MLR or ANN), the sets of parameters that minimize the residual errors of models Str_28_1 and Str_28_7 are computed.
(iv) At day t, the chemical and physical results of the cement produced on the previous day, the 1-day strength of the cement produced 2 days earlier, and the 7-day strength of the cement produced 8 days earlier are measured.
(v) With the set of parameters computed in step (iii), the 28-day strength of the cement produced on days t-2 and t-8 is estimated by applying models Str_28_1 and Str_28_7, respectively.
(vi) Steps (iv) and (v) are repeated for all consecutive days up to t+T_F-1, where T_F is a predetermined time interval, without re-estimating the parameters.
(vii) Thus, for the days belonging to the interval [t, t+T_F-1], the future strength of the cement produced in the intervals [t-2, t+T_F-3] and [t-8, t+T_F-9] is computed with the parameters of step (iii). When the date exceeds t+T_F-1, a new parameter estimation is carried out starting from step (i), taking as initial time the first date greater than t+T_F-1.
(viii) When the results of T_F days have been completed, the time interval of length T_D moves forward by T_F days. Thus, the future 28-day strength is calculated using models fitted to movable data sets of time span T_D, advancing in steps of length T_F. In other words, when the data of the interval [t-29, t+T_F-29] are added, the data contained in the first T_F days of the interval [t-29-T_D, t-29] are removed.
(ix) The parameters T_D and T_F are optimized considering two criteria: (a) minimum MSRE_Past during modeling, and (b) minimum MSRE_Future during future application of the models.
(x) For each model, each pair (T_D, T_F), and each past and future time interval, a set (A_I, MSRE_Past, MSRE_Future) is computed from the samples belonging to this interval. A Newton-Raphson non-linear regression method has been used to determine the parameter values A_I that minimize s_Res,TD(I). The model with this set of parameters is then applied to the data of the following T_F days to obtain s_Res,TF(I).
The number of consecutive sets (A_I, MSRE_Past, MSRE_Future) is K_TD, a function of T_D and T_F. The mean square residual errors MSRE_Past and MSRE_Future are calculated by equations (17) and (18), respectively.
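The moving-window procedure of steps (i)-(x), together with the scan over (T_D, T_F), can be sketched as follows. This is a simplified illustration under several assumptions: one data row per production day (the plant data contain samples per cement type and mill), ordinary least squares as a stand-in for the MLR or ANN training step, a fixed gap of 26 days between the last training production date (t-29) and the first predicted one (t-2), as for Str_28_1, and MSRE taken as the root of the mean squared residual over all windows, since equations (17) and (18) are not reproduced here. All function names and data are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Hypothetical stand-in for the training step: ordinary least squares."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]


def dynamic_errors(X, y, t_d, t_f, gap=26, fit=ols_fit, predict=ols_predict):
    """Slide a T_D-day training window forward in steps of T_F days.

    After each training window, `gap` days are skipped and the next T_F days
    form the test set. Returns (MSRE_Past, MSRE_Future, K_TD).
    """
    past, future = [], []
    start = 0
    while start + t_d + gap + t_f <= len(y):
        train = slice(start, start + t_d)
        test = slice(start + t_d + gap, start + t_d + gap + t_f)
        model = fit(X[train], y[train])
        past.append(np.mean((predict(model, X[train]) - y[train]) ** 2))
        future.append(np.mean((predict(model, X[test]) - y[test]) ** 2))
        start += t_f  # the training window advances by T_F days
    if not past:
        return float("nan"), float("nan"), 0
    return float(np.sqrt(np.mean(past))), float(np.sqrt(np.mean(future))), len(past)


def scan_time_parameters(X, y, t_d_values, t_f_values, gap=26):
    """Grid scan of (T_D, T_F); returns the pair with minimum MSRE_Future."""
    results = {}
    for t_d in t_d_values:
        for t_f in t_f_values:
            past, fut, k = dynamic_errors(X, y, t_d, t_f, gap)
            if k > 0:
                results[(t_d, t_f)] = (past, fut)
    best = min(results, key=lambda p: results[p][1])
    return best, results


rng = np.random.default_rng(2)
days = 900
X = rng.normal(size=(days, 3))               # hypothetical daily inputs
drift = np.linspace(0.0, 2.0, days)          # slow change of an unmodeled factor
y = 45.0 + X @ np.array([1.5, -0.8, 0.4]) + drift + rng.normal(scale=1.0, size=days)
best, results = scan_time_parameters(X, y, [60, 120, 180, 360], [1, 5, 30])
print(best, results[best])
```

The drift term mimics a factor that is not among the inputs; it is what makes a finite T_D preferable to fitting all past data at once.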

Coupling of models
The coupling of models Str_28_1 and Str_28_7 makes use of the exponentially weighted moving average (EWMA) filter [39]. For a variable X and discrete time I in days, the EWMA variable Y is defined by the following procedure: (i) For time I = 0, the initial moving average Y(0) is given by relation (19). (ii) For a parameter λ, where 0 < λ ≤ 1, the statistic Y(I) is computed by the recursive formula (20):
$$Y(I) = \lambda\,X(I) + (1 - \lambda)\,Y(I-1) \qquad (20)$$
(iii) If λ = 1, the moving average becomes equal to the current value. The smaller the value of λ, the lower the rate of change, so that trends of longer duration can be revealed.
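The recursion of the EWMA filter is short enough to show directly. A minimal Python sketch, assuming that the filter starts at the first observation when no start value is supplied (relation (19) defines the actual start value); the function name and the Diff(I) series are hypothetical.

```python
def ewma(x, lam, y0=None):
    """Exponentially weighted moving average: Y(I) = lam*X(I) + (1 - lam)*Y(I-1).

    0 < lam <= 1. If y0 is None, the filter starts at the first observation
    (an assumption; the text defines the start value by its own relation).
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    out = []
    y = x[0] if y0 is None else y0
    for value in x:
        y = lam * value + (1.0 - lam) * y
        out.append(y)
    return out


diff = [0.0, 0.0, 2.0, 2.0, 2.0, 2.0]  # hypothetical Diff(I) series in MPa, with a step
print(ewma(diff, 1.0))   # follows the data exactly
print(ewma(diff, 0.25))  # [0.0, 0.0, 0.5, 0.875, 1.15625, 1.3671875]
```

The small-λ output approaches the step only gradually, which illustrates why a slow filter suppresses day-to-day noise in Diff(I) while still tracking persistent shifts.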
Suppose that, at time I, the actual 28-day strength is Str_28(I) and the value computed with Str_28_7 is Str_28_7(I). The difference Diff(I) is defined by formula (21).
The following procedure is then applied: (i) The moving average of Diff(I), EW_Diff(I), is calculated by applying equation (20) for a predefined value of λ.
(ii) The corrected value of Str_28_1(J), named Str_28_EW(J), is computed from equation (22), where J ≥ I. (iii) The parameters k and λ of the coupled model need optimization, using the mean square residual error (MSRE) as criterion. Different parameters are used for each CEM type: the set (k_BM, λ_BM) for CEM II B-M 32.5 and the set (k_AL, λ_AL) for CEM II A-L 42.5.

Preliminary processing of data
To examine the correlations between all input variables and Str_28, the correlation coefficients were computed; the results are presented in Table 5. These correlations should be interpreted critically, because some of them are misleading, and the data therefore have to be examined from a physical standpoint. From the typical characteristics of the cements shown in Table 3 and the composition and strength limits given in Table 2, the following remarks can be made: (a) CEM II A-L 42.5 N has a higher average SO3 content than CEM II B-M (P-L) 32.5 N. (b) The higher insoluble residue of the second cement reflects its pozzolana content, which was not detected in the first. (c) The lower LOI of the first cement means a lower limestone content than in the second. The negative correlation between Str_28 and S_b is misleading: limestone, pozzolana, and gypsum are materials of high grindability, so the cement type with the higher strength has the lower specific surface. The positive correlation between Str_28 and SO3 is also doubtful and needs deeper investigation. A low correlation was found between 28-day strength and R40, because R40 is similar for both cements (it is a process variable regulated by the separator of the cement mill), while strength differs considerably due to the different clinker content of each cement. All the above considerations need deeper investigation. Strong correlations exist between Str_28 and LOI, Ins_Res, Str_1 and Str_7. Early strength is a function not only of the physical and chemical characteristics, but also of clinker activity and grinding conditions. A model that relates the 28-day strength only to the physical and chemical properties of the cement therefore has a higher modeling error, because all other influencing variables act as noise. Tsamatsoulis [29] performed such a comparison by developing second-degree polynomial models applied to a subset of the data used in this study.
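The misleading negative correlation between Str_28 and S_b arises from pooling two groups with different means, an effect of the Simpson's-paradox type. A small NumPy illustration on invented, noise-free numbers (not Table 5 data) shows how a positive trend within each cement type can turn into a strongly negative pooled correlation.

```python
import numpy as np

# Hypothetical data: within each cement type, strength rises with S_b, but the
# type with more soft constituents has both higher S_b and lower strength.
sb_al = np.linspace(3800.0, 4200.0, 20)  # CEM II A-L-like group, cm2/g
str_al = 48.0 + 0.01 * (sb_al - 3800.0)  # MPa
sb_bm = np.linspace(4800.0, 5200.0, 20)  # CEM II B-M-like group, cm2/g
str_bm = 36.0 + 0.01 * (sb_bm - 4800.0)  # MPa

r_al = np.corrcoef(sb_al, str_al)[0, 1]
r_bm = np.corrcoef(sb_bm, str_bm)[0, 1]
r_pooled = np.corrcoef(np.concatenate([sb_al, sb_bm]),
                       np.concatenate([str_al, str_bm]))[0, 1]
print(round(r_al, 3), round(r_bm, 3), round(r_pooled, 3))
```

Separating the data by cement type before looking for trends, as done with LOI and Ins_Res in the following analysis, removes this artifact.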
A detailed investigation of the shape of the functions between Str_28 and the input variables follows, based on the results of two years. The analysis is performed on a subset of the total population in order to reduce the impact of the independent variables not included. For the early strengths, Str_1 and Str_7, the interval between their minimum and maximum values is partitioned into N equal sub-intervals. In each sub-interval, the average values of early strength and Str_28 are computed and plotted in Figure 4. A noticeable non-linearity occurs in the function between Str_1 and Str_28, and the gain is higher for lower values of early strength. This is mainly attributed to the pozzolanic action in the CEM II B-M cement. In CEM II A-L, an increase in early strength is mainly achieved through higher fineness, but this characteristic contributes less to the strength development. A similar but more linear trend exists between Str_7 and Str_28. The correlations between {SO3, R40, S_b} and 28-day strength were studied by first separating the results of each CEM type, using LOI and Ins_Res as criteria. The procedure used for early strength was then followed for each independent variable and cement type. The trends are depicted in Figure 5. For CEM II B-M 32.5, the function between sulfates and Str_28 presents a clear maximum for SO3 ∈ [2.4, 2.6] %. This result agrees with a laboratory study by Tsamatsoulis et al. [40] on the same clinker and raw materials, where parabolic equations were used to express this correlation. For CEM II A-L 42.5, the function between the two variables is increasing, meaning that the results are located on the left branch of the parabola. The function between residue at 40 […]ments. In Figure 5(c), a consistently higher specific surface is observed in CEM II B-M than in CEM II A-L, due to the higher content of softer materials in the former. The function between specific surface and Str_28 passes through a maximum for both cements. Although this result seems to contradict the principle that finer cement has higher strength, the explanation is as follows: on the left branch of each curve, as S_b increases, the specific surface of the clinker also increases, leading to greater cement strength; a further increase in S_b is caused by a higher content of soft materials, leading to a strength reduction.
The negative correlation coefficients of LOI and Ins_Res with Str_28 indicate that the effect of each variable on strength can only be examined after the results have been classified. This is achieved by generating two families of curves: (i) The data are classified into five groups with low variance of insoluble residue, with Ins_Res limits of (0, 2), (2, 4), (4, 6), (6, 8) and >8 %. For each group, the average LOI and Str_28 are calculated and plotted in Figure 6(a). (ii) For LOI, the classification is made into five groups with limits (5, 8), (8, 9.5), (9.5, 11), (11, 12.5) and >12.5 %. The average Ins_Res and Str_28 are found and plotted in Figure 6(b). Figure 6(a) shows that an increase in LOI causes a decrease in Str_28 for all Ins_Res groups, and that the slope of the reduction is greater at higher LOI values. A similar trend occurs in the function between Ins_Res and Str_28, but the gradient of strength reduction is smaller. In particular, for Ins_Res between 6 % and 9 %, the average decrease in strength is fairly low, demonstrating the positive effect of pozzolana addition on 28-day strength.
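Both averaging procedures, equal-width sub-intervals for the early strengths and fixed class limits for Ins_Res and LOI, reduce to computing group means over a set of interval edges. A minimal pure-Python sketch follows; the function names and values are hypothetical, and the half-open convention [lo, hi) with the last interval closed is an assumption, since the text does not state how boundary values are assigned.

```python
import math


def equal_width_edges(values, n):
    """Split [min, max] of the values into n equal sub-intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n
    return [lo + k * step for k in range(n)] + [hi]


def group_means(x, y, edges):
    """Mean of x and mean of y for the samples whose x falls in each interval.

    Intervals are [edges[k], edges[k+1]); the last one also includes its upper
    edge. Empty intervals are skipped.
    """
    means = []
    last = len(edges) - 2
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        idx = [i for i, v in enumerate(x) if lo <= v < hi or (k == last and v == hi)]
        if idx:
            means.append((sum(x[i] for i in idx) / len(idx),
                          sum(y[i] for i in idx) / len(idx)))
    return means


# Equal-width sub-intervals (as for Str_1 and Str_7); hypothetical values
early = [0, 1, 2, 3, 4]
print(group_means(early, [10, 11, 12, 13, 14], equal_width_edges(early, 2)))

# Fixed class limits for Ins_Res in %: (0, 2), (2, 4), (4, 6), (6, 8), > 8
ins_edges = [0, 2, 4, 6, 8, math.inf]
print(group_means([1.0, 3.0, 9.0, 12.0], [50.0, 45.0, 38.0, 36.0], ins_edges))
```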

Residual errors of Str_28_1 and Str_28_7 models
The dynamic models are first applied to model Str_28_1 for a movable training period of T_D = 180 days, and the corresponding MSRE_Past values are computed. The parameters of each model are then applied to the results of the next T_F days, which constitute the testing set. This period starts at least 29 […] parameters not included in the current models (clinker composition, mineralogy, and activity) have been modified from their previous state; (b) some values of the input variables do not belong to the range of these variables during the T_D period, so the models are obliged to extrapolate, which can worsen the prediction. Figure 7 shows that the dynamic MLR model is highly efficient: only four of the eight ANN models provide a lower test error. The S_1N, S_1N_B, CASC and HT_1N_B models perform better than MLR in future predictions, whereas the more complicated ANNs, with two nodes in the hidden layer or with radial basis functions, give a worse MSRE_Future. In general, the addition of bias improves the generalization ability of the ANNs with one node in the hidden layer. The simple S_1N_B model provides the minimum test errors. Model S_2N, despite its lower training error, fails to predict future strength better than MLR, a clear indication of over-fitting. Over-fitting also appears in the ANNs based on radial basis functions (RBF): the RBF structure with two hidden nodes has a lower training error than that with one hidden node, but its MSRE_Future is noticeably higher over the whole range of T_F. The cascade ANN performs somewhat better than MLR regarding MSRE_Future, but it is the most complicated ANN and needs a longer computational time. Therefore, higher generalization ability appears to be achieved with more compact architectures, i.e. models with fewer parameters.
The MLR and ANN techniques have also been compared using the Str_28_7 model. This model has been applied with only three of the eight ANN techniques, those with one node in the hidden layer, a sigmoid or hyperbolic tangent activation function, and with or without bias: the S_1N, S_1N_B and HT_1N_B models. As before, the MSRE_Past and MSRE_Future curves were constructed for T_D = 180 days and T_F ranging from 1 to 60 days; the results are shown in Figure 8. The MLR model shows an adequately small MSRE_Past, around 1.27 MPa, almost equal to the MSRE_Past of S_1N_B and HT_1N_B, while the training error of S_1N is somewhat higher. Regarding the test errors, the MSRE_Future of MLR is lower than those of S_1N and S_1N_B and almost equal to that of HT_1N_B. Therefore, at the current level of ANN development, the simple MLR technique remains very reliable.

Optimization of T_D and T_F parameters
The training and test periods, T_D and T_F respectively, need optimization with respect to the minimization of MSRE_Future. To achieve this, the models Str_28_1 and Str_28_7 were applied for T_F ranging from 1 to 60 days, using the MLR technique as well as the two best ANNs, S_1N_B and HT_1N_B, which show the minimal MSRE_Future. The dynamic models have been implemented for T_D values from the set {60, 90, 120, 180, 240, 300, 360, 540, 720} and T_F values from the set {1, 2, 5, 10, 20, 30, 60}. For all the possible combinations of (T_D, T_F), the errors

Residual errors of Str_28_EW model
The model Str_28_EW is composed by superposing models Str_28_1 and Str_28_7. An EWMA filter is applied to smooth the variable Diff(I) defined by equation (21). Str_28_EW contains four additional parameters, k_BM, λ_BM, k_AL and λ_AL, whose values need to be determined. Two difficulties arose in solving this problem: (a) the residual errors do not decrease monotonically as k and λ vary, so an optimization with MSRE_Future as the criterion is required; (b) some trials with a non-linear regression technique converged to local minima, so the optimal values were not guaranteed. For these reasons, the following steps were applied: the coupled model Str_28_EW has been applied using the MLR technique as well as the two best ANNs, S_1N_B and HT_1N_B, and time parameters […] The sensitivity of the optimum (k, λ) is also studied by determining the pairs (k, λ) that give MSRE_Future ≤ 1.005 MSRE_Future,Min, using interpolation between the discrete parameter values and the results shown in Figures 12(a), 13(a) and 14(a). This area of (k_BM, λ_BM) constitutes the optimum region. The respective areas for MLR and the two ANN techniques are depicted in Figure 15, which shows that the sensitivity of the optimal k_BM, λ_BM is low. For the HT_1N_B neural network, the range of (k, λ) giving the optimum test error is larger than for S_1N_B, for almost the same MSRE_Future,Min. From these results, it is concluded that a slow EWMA filter provides a more effective correction and improvement of the prediction. The test errors for the full range of T_D […] (i) The coupled model is noticeably better than Str_28_1 for all three techniques regarding test error, and the improvement grows as T_D and T_F increase. For both models, a small T_D period increases the test error, meaning that the training set is too small to train the model satisfactorily. This is comparable to a small integral time in a PID controller, which creates oscillations around the set point.
(ii) The time parameters T_D and T_F have a strong impact on the test error for both models, independently of the technique applied: the optimum value of T_F is one day, while the test error as a function of T_D passes through a minimum between 120 and 240 days, depending on the model and technique. The optimum (T_D, T_F) and the minimum MSRE_Future for both models and all techniques are shown in Table 6.
(iii) As T_D increases, the MSRE_Future surface is flatter for Str_28_EW than for Str_28_1, for all three techniques, i.e. Str_28_EW is more robust regarding the selection of T_D.
(iv) For the optimum values of T_D, the differences between the MSRE for 1 < T_F ≤ 30 and the MSRE for […]
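The sensitivity analysis of the coupling parameters, which retains every pair (k, λ) with MSRE_Future ≤ 1.005 MSRE_Future,Min, can be sketched in a few lines of Python. This sketch assumes the test error has already been evaluated on a discrete grid; unlike the text, it does not interpolate between grid points, and the toy error surface and function names are hypothetical.

```python
def scan_grid(error_fn, ks, lams):
    """Evaluate MSRE_Future = error_fn(k, lam) on a discrete grid of (k, lambda)."""
    return {(k, lam): error_fn(k, lam) for k in ks for lam in lams}


def optimum_region(grid, tol=1.005):
    """Return the best (k, lambda), its error, and all grid pairs within tol * minimum."""
    best = min(grid, key=grid.get)
    e_min = grid[best]
    region = sorted(p for p, e in grid.items() if e <= tol * e_min)
    return best, e_min, region


def toy_error(k, lam):
    # Hypothetical smooth error surface in MPa, used only to exercise the functions
    return 1.80 + (k - 0.5) ** 2 + (lam - 0.1) ** 2


grid = scan_grid(toy_error, ks=[0.25, 0.5, 0.75, 1.0], lams=[0.05, 0.1, 0.2])
print(optimum_region(grid))  # ((0.5, 0.1), 1.8, [(0.5, 0.05), (0.5, 0.1)])
```

An exhaustive grid scan avoids the convergence to local minima reported for the non-linear regression, at the cost of evaluating the dynamic model once per grid point.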

Conclusions
A dynamic approach based on two well-known techniques, multiple linear regression (MLR) and artificial neural networks (ANN), has been used to model and predict the 28-day cement strength. The modeling uses analyses of daily average samples of Portland cement produced industrially according to EN 197-1:2011. Physical, chemical, and early strength results are used to predict the typical 28-day strength through two kinds of models, Str_28_1 and Str_28_7, which include as independent variables only the 1-day strength or both the 1-day and 7-day strength, respectively. The models are dynamic because their parameters are calculated from a movable past period of T_D days and then used for a future period of T_F days; upon completion of the future period, the process is repeated with the periods moved forward. Various three-layer ANNs with one or two nodes in the hidden layer have been developed, together with a cascade ANN. Three types of activation functions have been used: sigmoid, hyperbolic tangent, and radial basis functions. The comparison is based on the MSRE of the testing sets. The linear technique performs well, as only four of the eight ANNs provide a lower test error for the Str_28_1 model: two ANNs with one hidden node and sigmoid transfer function, with and without bias; one ANN with one hidden node, hyperbolic tangent function and bias; and the cascade ANN. The other, more complicated ANNs suffer from over-fitting, and their test error is not lower than that of MLR. Using this error as the criterion, the training and testing periods, represented by the time parameters T_D and T_F, have been optimized. For the Str_28_1 model and all the techniques applied, the best estimations were obtained for T_F = 1 day.
The model Str_28_7, which takes the 7-day strength results into account, generates significantly smaller training and test MSRE than Str_28_1, but it has a longer delay time. Therefore, although it provides more accurate information, it is difficult to use for direct control purposes. For this reason, a third model, called Str_28_EW, has been developed by coupling the Str_28_1 model with a filtered version of Str_28_7. This model involves two additional parameters per CEM type: the weight λ of the EWMA filter and the coupling weight k. These parameters have been optimized to reach the minimum test error. With MLR as well as with the two best ANN techniques, Str_28_EW yields errors consistently smaller than those of Str_28_1. These best ANN techniques have only one node in the hidden layer, a sigmoid or hyperbolic tangent activation function, and bias. Their implementation in the coupled model provides a smaller error than the combination of Str_28_EW with MLR. Consequently, combining the coupled model with the two best ANNs and selecting the optimal time, filtering and weighting parameters yields the minimum test error. For the optimal values of T_D, k and λ, the coupled model is less sensitive to the selection of T_F. In process control, the benefit is that if the model is not updated daily (the optimum T_F = 1 day) but in some cases only a few days later, the prediction deteriorates much less for the Str_28_EW model than for Str_28_1. Further improvement of these techniques should follow the directions below: (a) The ANN inputs should include the chemical characteristics of the clinker, such as C3S, C3A, equivalent alkalis and free lime.
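The EWMA filter named above is standard; how its output is coupled to Str_28_1 is defined by equations (16)-(18) of the paper, which are not reproduced in this section. The sketch below therefore uses a hypothetical additive form, Str_28_EW(I) = Str_28_1(I) + k · EWMA[Str_28_7 - Str_28_1](I - lag), in which only 7-day results already available at day I enter the filter. The additive form, the lag of 7 rows (one row per day), the initialization z(0) = x(0) and all function names are assumptions made for illustration.

```python
import numpy as np


def ewma(x, lam):
    """EWMA filter: z[i] = lam * x[i] + (1 - lam) * z[i - 1], with z[0] = x[0]."""
    x = np.asarray(x, dtype=float)
    z = np.empty_like(x)
    z[0] = x[0]
    for i in range(1, len(x)):
        z[i] = lam * x[i] + (1.0 - lam) * z[i - 1]
    return z


def coupled_prediction(str28_1, str28_7, lam, k, lag=7):
    """Hypothetical coupling of Str_28_1 with an EWMA-filtered Str_28_7 correction.

    The filtered gap (Str_28_7 - Str_28_1) is delayed by `lag` rows so that row i
    only uses gaps from rows up to i - lag; rows without that history get no correction.
    """
    s1 = np.asarray(str28_1, dtype=float)
    s7 = np.asarray(str28_7, dtype=float)
    gap = ewma(s7 - s1, lam)
    delayed = np.zeros_like(gap)
    if lag == 0:
        delayed = gap
    elif lag < len(gap):
        delayed[lag:] = gap[:-lag]
    return s1 + k * delayed
```

Low λ makes the correction respond slowly to individual days, which is one way to read the finding that the optimum lies at low λ; with k = 0 the sketch falls back to Str_28_1.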

Fig. 3 - Example of data set for T_D = 180 days

(Figure legend: CEM II A-L 42.5, CEM II B-M 32.5)

Fig. 4 - Effects of early strength on Str_28
Fig. 5 - Effects of SO3, Sb and R40 on Str_28

Fig. 8 - Training and test MSRE of Str_28_7 model
Fig. 9 - Training and test errors for T_D ∈ [60, 720], T_F ∈ [1, 60] and Str_28_1 model

MSRE_Past and MSRE_Future are determined. The results for the Str_28_1 model are depicted in Figure 9. The analysis of these results leads to the following conclusions:
(a) MSRE_Past is an increasing function of T_D and does not depend significantly on T_F.
(b) For the same T_D value, the MSRE_Past values computed with both ANNs are about 0.02 MPa lower than those computed with MLR, over almost the entire range of T_F.
(c) MSRE_Future is a strong function of both T_D and T_F, and for each T_F value there is a T_D at which MSRE_Future becomes minimal. For the MLR and S_1N_B techniques, these T_D values of minimum future error show an increasing trend as T_F increases: for small T_F, T_D = 120 days, while for higher T_F values T_D increases to 180 days, reaching 240 days for T_F = 60 days. Therefore, a positive correlation between the two time parameters exists for these two techniques. For the HT_1N_B technique, T_D is independent of T_F, remaining equal to 120 days as T_F varies from 1 to 60 days.
(d) Small values of T_D (T_D = 60 days) lead to a noticeable worsening of the future prediction, meaning that short training periods are insufficient to train the models.
(e) The minimum MSRE_Future for the MLR technique appears at (T_D, T_F) = (120, 1) and is equal to 1.89 MPa.
(f) For the same values of T_D and T_F, the two ANNs provide a residual error consistently lower than that of MLR. Therefore, both ANNs predict the future 28-day strength better than MLR.
(g) The minimum MSRE_Future for S_1N_B and HT_1N_B appears at (T_D, T_F) = (120, 1) and is equal to 1.86 MPa.
(h) Additionally, the error of the ANNs for T_F = 2 remains lower than that of MLR for T_F = 1. This advantage allows a more robust implementation of the selected neural networks in the actual quality control of a cement plant.
(i) Because all points (T_D, T_F) of the grid T_D ∈ {60…720}, T_F ∈ {1…60} have been scanned, the global minimum MSRE_Future is assured.
(j) For small T_D = 60 days or high T_F ≥ 20 days, MSRE_Future differs noticeably from the optimum. The causes are explained in the previous paragraph: variables not included in the current models, and input values lying outside the range these variables had during the training period.

The sensitivity of the optimum (T_D, T_F) = (120, 1) has also been studied by determining the pairs (T_D, T_F) that provide MSRE_Future ≤ 1.02 · MSRE_Future,min, using interpolation between the discrete time parameters. This area constitutes the optimum region of the time parameters. The results are shown in Figure 10. The optimum areas are very similar for the MLR and HT_1N_B techniques. In the case of S_1N_B, T_D extends to higher values, but the maximum T_F is approximately one day shorter than for HT_1N_B. This indicates a higher robustness of the HT_1N_B technique.

A similar analysis has been performed for the residual errors during the training and testing periods of the model Str_28_7. The results are plotted in Figure 11, and the following trends were observed:
(a) MSRE_Past is an increasing function of T_D and independent of T_F.
(b) The minimum MSRE_Future appears for the highest T_D, equal to 720 days, and for this T_D value it does not depend on T_F.
(c) From (a) and (b) it is deduced that smaller training periods are not adequate to train the models, leading to a higher test error.
(d) The lowest test errors over the full range of T_D, T_F are provided by the HT_1S_B neural network. The errors obtained with MLR are not far from those of HT_1S_B. Third in this ranking is the S_1N_B model.
(e) As T_D decreases from 720 to 60 days, MSRE_Future becomes an increasing function of T_F at fixed T_D.
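Item (i) and the 1.02 · MSRE_Future,min criterion described above for Str_28_1 amount to an exhaustive grid search followed by a tolerance test. A minimal sketch, assuming `error_fn` is any callable returning MSRE_Future for a given (T_D, T_F), for instance a rolling-window evaluation of one technique. Unlike the procedure in the text, it does not interpolate between the discrete time parameters, so the optimum region is reported only at grid points. The function names are hypothetical.

```python
import numpy as np


def scan_grid(error_fn, td_values, tf_values):
    """Evaluate error_fn(t_d, t_f) at every (T_D, T_F) grid point."""
    err = np.empty((len(td_values), len(tf_values)))
    for i, td in enumerate(td_values):
        for j, tf in enumerate(tf_values):
            err[i, j] = error_fn(td, tf)
    return err


def optimum_and_region(err, td_values, tf_values, tol=1.02):
    """Global grid minimum plus all grid points whose error is <= tol * minimum."""
    i, j = np.unravel_index(np.argmin(err), err.shape)
    e_min = float(err[i, j])
    best = (td_values[i], tf_values[j], e_min)
    rows, cols = np.nonzero(err <= tol * e_min)
    region = [(td_values[a], tf_values[b]) for a, b in zip(rows, cols)]
    return best, region
```

Because every grid point is evaluated, the minimum returned is the global one over the grid, which is the argument made in item (i); the size of `region` is a simple measure of how forgiving the optimum is.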
(a) For each parameter k_BM, λ_BM, k_AL, λ_AL, minimum and maximum values k_MIN, k_MAX, λ_MIN, λ_MAX are selected.
(b) Steps of change dk, dλ are also chosen.
(c) The rectangular area [k_MIN, k_MAX] × [λ_MIN, λ_MAX] is scanned with steps dk, dλ for each CEM type, and the training and test errors given by equations (13) and (14) are computed.
(d) The minimum MSRE_Future is selected as the optimization criterion.
(e) Steps (a)-(d) are implemented for each (T_D, T_F), and the optimum (k_AL, λ_AL, k_BM, λ_BM) producing the minimum MSRE_Future is found.
Because it performs calculations at all lattice points, the designed optimization technique finds the actual optimum to within the resolution of the steps dk, dλ.
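Steps (a)-(d) can be sketched as a lattice scan for one (T_D, T_F) pair. This is a minimal illustration, assuming `future_error(k, lam)` is a hypothetical callable returning MSRE_Future for one CEM type, and that the type labels "AL" and "BM" stand for CEM II A-L 42.5 and CEM II B-M 32.5. Scanning each type separately gives the joint optimum of (k_AL, λ_AL, k_BM, λ_BM) only if the pooled squared error splits into one term per type, each depending on its own (k, λ); otherwise the full four-dimensional lattice must be scanned.

```python
import numpy as np


def lattice(v_min, v_max, step):
    """Inclusive grid v_min, v_min + step, ..., v_max (rounded to limit float drift)."""
    n = int(round((v_max - v_min) / step)) + 1
    return np.round(v_min + step * np.arange(n), 10)


def scan_k_lambda(future_error, k_range, lam_range):
    """Scan every lattice point of [k_MIN, k_MAX] x [lam_MIN, lam_MAX]; keep the lowest error.

    k_range and lam_range are (min, max, step) tuples.
    """
    best = None
    for k in lattice(*k_range):
        for lam in lattice(*lam_range):
            e = future_error(k, lam)
            if best is None or e < best[2]:
                best = (float(k), float(lam), float(e))
    return best


def optimise_per_cem_type(errors_by_type, k_range, lam_range):
    """Run the scan separately for each CEM type (assumes the error separates by type)."""
    return {cem: scan_k_lambda(fn, k_range, lam_range) for cem, fn in errors_by_type.items()}
```

Step (e) then wraps `optimise_per_cem_type` in an outer loop over the (T_D, T_F) grid; halving dk and dλ roughly quadruples the number of evaluations per pair.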

The test errors of the Str_28_EW model obtained by implementing the MLR, S_1N_B and HT_1N_B techniques are shown in Figure 16. For each pair (T_D, T_F), the optimum values (k_AL, λ_AL, k_BM, λ_BM) have been determined. Comparing these results with those of Figure 9(b), which concern the test error of the Str_28_1 model, leads to the following remarks.

Fig. 14 - HT_1N_B
Fig. 15 - Test errors for Str_28_EW model, HT_1N_B technique and (a) (λ_AL, k_AL) = (0.1, 1), (b) (λ_AL, k_AL) = (0.5, 7)
(Panel labels: T_D = 120 days, T_F = 1 day)

The differences between MSRE_Future for T_F > 1 and MSRE_Future for T_F = 1 are shown in Figure 17 for all models and techniques. As T_F increases, MSRE_Future grows much more slowly for the coupled model than for Str_28_1, meaning that the Str_28_EW model is highly robust with respect to this time parameter. The benefit of this characteristic in daily quality control is that, if for some reason the model is not updated daily (T_F = 1) but only a few days later, the prediction deteriorates much less for the Str_28_EW model than for Str_28_1. (v) The implementation of neural networks assures a lower test error than the linear technique for both models. The minimum error for the Str_28_EW model is obtained with the S_1N_B technique for T_D = 240 and T_F = 1. The above remarks show that combining ANN techniques with model coupling leads to a substantial improvement of the 28-day strength prediction. This improvement is all the more significant because the Str_28_1 model combined with MLR already shows a high ability to predict future strength. Next, a basic Str_28_1 model with MLR was considered for T_D = 120 and 1 ≤ T_F ≤ 60. The percentage reduction of MSRE_Future obtained by using the optimum Str_28_EW model with the S_1N_B neural network, for T_D = 240 and the same range of T_F, appears in Figure 18. The error reduction starts at about 5 % for T_F = 1 and reaches about 10 % for T_F = 30. This is a noticeable improvement, which can lead to enhanced quality control and a reduction of the variance of cement strength.

Table 6 - Comparison of minimum MSRE_Future between Str_28_1 and Str_28_EW models for the optimal techniques
Optimum area of (k_BM, λ_BM) for Str_28_EW model
Fig. 16 - Test errors for Str_28_EW model as a function of T_D, T_F and optimum values of k, λ
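The percentage reduction plotted in Figure 18 is presumably the usual relative difference between the two future errors at the same T_F. Written out in our own notation (an assumption, not an equation quoted from the paper), with T_D = 120 days for the Str_28_1/MLR baseline and T_D = 240 days for the Str_28_EW/S_1N_B model:

```latex
\mathrm{Red}(T_F) = 100\,
\frac{\mathrm{MSRE}_{\mathrm{Future}}^{\mathrm{Str\_28\_1},\,\mathrm{MLR}}(T_F)
      - \mathrm{MSRE}_{\mathrm{Future}}^{\mathrm{Str\_28\_EW},\,\mathrm{S\_1N\_B}}(T_F)}
     {\mathrm{MSRE}_{\mathrm{Future}}^{\mathrm{Str\_28\_1},\,\mathrm{MLR}}(T_F)}\ \%,
\qquad 1 \le T_F \le 60 .
```

As a rough check, with the Str_28_1/MLR minimum of 1.89 MPa at (T_D, T_F) = (120, 1) reported earlier in this section, a reduction of about 5 % corresponds to an MSRE_Future of roughly 1.89 × 0.95 ≈ 1.80 MPa for the coupled model; this is back-of-envelope arithmetic, not a value taken from Table 6.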

Fig. 17 - Difference MSRE_Future(T_F) - MSRE_Future(1) as a function of T_F
Fig. 18 - % Reduction of MSRE_Future between Str_28_1, MLR and Str_28_EW, S_1N_B

Nomenclature
A_0, A_I - Coefficients of equation (3)
C3S - Tricalcium silicate, %
Diff(I) - Difference between calculated and actual 28-day strength at time I, MPa
dk - Step of change of parameter k
EW_Diff(I) - EWMA difference between calculated and actual 28-day strength at time I, MPa
I - Discrete time in equations (16), (17), days
Ins_Res - Insoluble residue, %
J - Discrete time in equation (18), days
k - Weight parameter of the coupled model
K_TD - Number of consecutive data sets
LOI - Loss on ignition, %
M - Total number of data sets in equation (5)
MSRE - Mean square residual error, MPa
N - Number of coefficients
N - Number of nodes in the hidden layer
o(J) - Activation functions of neural networks
R40 - Residue at 40 microns sieve, %
Sb - Specific surface, 10 m² kg⁻¹
SO3 - Sulfates content, %
s_Res - Residual error, MPa
Str_1 - Compressive strength at 1 day, MPa
Str_7 - Compressive strength at 7 days, MPa
Str_28 - Compressive strength at 28 days, MPa
Str_28_1(I) - Calculated strength at 28 days from model Str_28_1 at time I, MPa
Str_28_7(I) - Calculated strength at 28 days from model Str_28_7 at time I, MPa
Str_28_EW(I) - Calculated strength at 28 days from model Str_28_EW at time I, MPa
t - Time, days
T_D - Set training period, days
T_F - Set testing period, days
X(I)

Greek symbols
λ - Parameter of the EWMA model
dλ - Step of change of parameter λ
σ_I - Variance parameters of the radial basis function

Abbreviations
ANN - Artificial neural networks
CEM - Cement type
DMEP - Double-layer multi-expression programming
EWMA - Exponentially weighted moving average
FL - Fuzzy logic
GEP - Gene expression programming
GA - Genetic algorithms
GRNN - Generalized regression neural network
MEP - Multi-expression programming
MLR - Multiple linear regression
PCA - Principal components analysis
RBF - Radial basis function

Table 4 - Description of ANNs structure
Fig. 1 - Three-layered ANN with one and two nodes in the hidden layer

weights are trained in batch mode and then tested. The second step involves a new layer where Str_1 and Ins_Res enter. The weights W_12, W_32, V_1, V_2 are trained and tested as before. New hidden layers are then constructed by adding, each time, Str_1 and one of the remaining variables. The algorithm stops when the addition of a new variable no longer decreases the training error, which means that the ANN is probably over-fitted. Following this logic, R40 was not added to the cascade ANN, as it worsens the test error.

microns and Str_28 is descending. The rate of strength reduction increases as R40 aug-

days after the last date of the training period. The calculated and actual 28-day strengths during the testing period are compared, and the errors MSRE_Future are thus determined. The training and test errors, MSRE_Past and MSRE_Future, of Str_28_1 for MLR and all ANN techniques for T_D = 180 and T_F ranging from 1 to 60 days are shown in Figure 7. Because the training errors of each model depend mainly on T_D, they remain almost constant for T_F ranging from 1 to 60 days. As regards the test errors, a decrease in T_F generally causes a decrease in MSRE_Future. In particular, for T_F ≥ 20 days, a worsening of MSRE_Future is observed, meaning that the ability of ANNs or MLR to predict future results is reduced. This result is attributed to two main causes: (a) Some of the param-

Fig. 6 - Effects of LOI and Ins_Res on Str_28
Fig. 7 - Training and test MSRE of Str_28_1 model

are determined and applied. For each λ value there is only one k value at which MSRE_Future becomes minimal. From the shape of the curves, it is concluded that as λ increases, the minimal MSRE_Future is obtained at lower k values, i.e., there is a negative correlation between these two parameters. Despite the differences in the shape of the surfaces in the three figures, the minimum error is generally found at low values of λ and high values of k, in the intervals (0.1, 0.2) and (0.9, 1.0), respectively.
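For the single-hidden-node networks that performed best (S_1N_B and HT_1N_B, read here as sigmoid or hyperbolic tangent activation, one hidden node, with bias), the forward pass of the three-layered structure in Fig. 1 reduces to one weighted sum, one activation and one weighted output. A minimal sketch, assuming a linear output node and mapping the input-to-hidden weights W and the hidden-to-output weight V of Fig. 1 onto the hypothetical arguments `w` and `v`; setting both biases to zero gives the variant without bias, and the batch-mode training of the weights is not shown.

```python
import numpy as np


def one_node_ann(x, w, v, b_hidden=0.0, b_out=0.0, activation="sigmoid"):
    """Three-layer ANN with a single hidden node: y = v * f(x . w + b_hidden) + b_out.

    x may be one input vector or a 2-D array with one sample per row.
    """
    a = np.dot(x, w) + b_hidden  # weighted sum entering the single hidden node
    if activation == "sigmoid":
        h = 1.0 / (1.0 + np.exp(-a))
    elif activation == "tanh":
        h = np.tanh(a)
    else:
        raise ValueError(f"unsupported activation: {activation}")
    return v * h + b_out  # linear output node (assumed)
```

With a single hidden node the model is a monotone function of one linear combination of the inputs, which keeps the number of trainable weights small and is consistent with these networks over-fitting less than the larger ones.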