A Dynamic Weighted RBF-Based Ensemble for Prediction of Time Series Data from Nuclear Components

In this paper, an ensemble approach is proposed for the prediction of time series data, based on a Support Vector Regression (SVR) algorithm with a Radial Basis Function (RBF) kernel. We propose a strategy to build diverse sub-models of the ensemble based on the Feature Vector Selection (FVS) method of Baudat & Anouar (2003), which decreases the computational burden while preserving the generalization performance of the model. A simple but effective strategy is used to calculate, for each data point, the weights of the different sub-models built with RBF-SVR. A real case study on a nuclear power production component is presented. Comparisons with the results given by the best single SVR model and by a fixed-weights ensemble show the robustness and accuracy of the proposed ensemble approach.


INTRODUCTION
Combining various data-driven approaches into an ensemble has become a popular direction of research in the last decades, motivated by the aim of improving the robustness and accuracy of the final prediction. The models which compose the ensemble are called sub-models. Various strategies have been proposed for building sub-models, including error-correcting output coding, Bagging, Adaboost, and Boosting (Kim, Pang, Je, Kim & Bang, 2003; Hu, Youn, Wang & Yoon, 2012). Similarly, several methods for aggregating the prediction results of the sub-models have been proposed, such as majority vote, weighted vote, Borda count, Bayes and probabilistic schemes, etc. (Polikar, 2006). Support Vector Machine (SVM) is a popular and promising data-driven method for prognostics. SVM-based ensemble models have been proposed for classification. Chen, Wang and Zuylen (2009) use an ensemble of SVMs to detect traffic incidents. The sub-models use different kernel functions and parameters, and their outputs are combined to improve the classification performance. Acar and Rais-Rohami (2009) treat the general weighted-sum formulation of an ensemble as an optimization problem and then minimize an error metric to select the best weights for the SVM sub-models. Kurram and Kwon (2013) try to achieve an optimal sparse combination of the sub-model results by jointly optimizing the separating hyperplane obtained by each SVM classifier and the corresponding weights of the sub-decisions. Valentini and Dietterich (2003) prove that an ensemble of SVMs employing bagging of low-bias algorithms improves the generalization power of the procedure with respect to a single SVM. The ensemble of SVMs built with bagging and boosting can greatly outperform a single SVM in terms of classification accuracy (Kim et al., 2003).
In this paper, we focus on the combination of multiple SVR sub-models (Liu, Seraoui, Vitelli & Zio, 2012) with a Radial Basis Function (RBF) kernel. The case study considered to present the application of the method concerns the monitoring of the leak flow in the first seal of the Reactor Coolant Pump (RCP) of a Nuclear Power Plant (NPP), using real data collected from sensors.
The high complexity of the NPP system and the catastrophic economic and environmental losses caused by accidents make monitoring for failure prediction very important in NPP accident management (Lee, 1998; Hallbert and Thomas, 2014). The RCP pumps the coolant into the reactor to transfer the heat to the steam generator and to protect the nuclear material (In Soo and Kim, 2000). With a large amount of leakage from the RCP, the NPP runs the risk of a meltdown. Thus, it is critical to predict future leakage. The prediction can be divided into long-term (months) and short-term (hours) prediction. Long-term prediction provides information for maintenance scheduling, while short-term prediction supports emergency actions.
An ensemble of SVRs with RBF and a dynamic weighting strategy is proposed in this paper. The method has several elements of novelty.
In the previously cited literature on ensembles of SVMs, the weights of the sub-models in the ensemble are calculated during training and kept fixed for testing. However, a sub-model may perform well only on a part of the dataset. Hence, the weights need to be updated considering the different datasets involved in the case study, and even different input vectors. In Fantoni, Figedy and Racz (1998), a dynamic strategy is integrated into a Neuro-Fuzzy Model. A dynamic weighting method is also used in Muhlbaier, Topalis and Polikar (2009), Yang, Yuan and Liu (2009) and Razavi-Far, Baraldi and Zio (2012), for adding new classifiers to the ensemble model, but the weights are not adjusted to the different input vectors.
A novel dynamic weighting strategy, based on local fitness calculation (Baudat & Anouar, 2003), is proposed in this paper.
To generate diversity in the sub-models, each of them is trained on a different dataset. In this respect, one can use bagging and boosting with possible overlapping between different datasets (Quinlan, 1996). In this work, the strategy to form the training dataset of each sub-model is based on the angle between different data points in the Reproducing Kernel Hilbert Space (RKHS), so as to reduce the computational burden.
Moreover, in order to be able to build ensembles of SVRs on very large datasets, FVS is used to select a smaller subset of the training data points of each sub-model, again to decrease the computational burden.
All the above novel strategies are tested on the case study concerning the prediction of leak flow of the RCP in an NPP.
The rest of the paper is organized as follows. Section 2 gives details about the proposed ensemble approach. Section 3 illustrates the case study, the available data and how the proposed ensemble model is constructed. Section 4 presents the experimental results from the SVR ensemble models and describes the comparison with a single SVR model and a fixed-weights ensemble. Finally, conclusions with some considerations are drawn in Section 5.

DYNAMIC-WEIGHTED RBF-BASED ENSEMBLE
The underlying strategy motivating the use of ensemble-based methods in prediction problems is to benefit from the strength of different sub-models by combining their outputs to improve the global prediction performance, compared to the results of a single sub-model.
In this section, we give details about the proposed Dynamic-Weighted RBF-based Ensemble (named DW-RBF-Ensemble, in short). A good property of RBF is that for each data point $x$, $k(x, x) = 1$, i.e., the data point in RKHS is a unit vector. The difference between different data points in RKHS is only the angle between them.
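The unit-norm property can be checked numerically. The following is a minimal sketch, assuming the common RBF form $k(x, z) = \exp(-\gamma \|x - z\|^2)$; the width parameter `gamma` and the function names are illustrative choices, not taken from the paper.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - z||^2); gamma is an assumed width parameter."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def rkhs_angle(x, z, gamma=0.5):
    """Angle (radians) between the RKHS images of x and z.

    Both images have unit norm because k(x, x) = k(z, z) = 1,
    so the cosine of the angle is simply k(x, z).
    """
    cos_angle = min(1.0, rbf_kernel(x, z, gamma))
    return math.acos(cos_angle)

x = [0.2, 0.4]
z = [0.9, 0.1]
print(rbf_kernel(x, x))   # squared norm of the image of x in RKHS
print(rkhs_angle(x, z))   # angle between the images of x and z
```

Since the kernel value always lies in (0, 1], the angle computed this way always lies in [0, π/2).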

Feature Vector Selection
In Baudat and Anouar (2003), the authors propose a Feature Vector Selection (FVS) method to select a subset of the training data points (i.e. Feature Vectors (FVs)) which can represent the dimension of the whole dataset in RKHS. The other data points can all be expressed as a linear combination of the selected FVs. Suppose $(x_i, y_i)$, for $i = 1, 2, \dots, M$, are the training data points and the mapping $\varphi(x)$ maps each input vector $x_i$ into RKHS with the mapping $\varphi_i$, for $i = 1, 2, \dots, M$. The kernel $k_{i,j} = k(x_i, x_j)$ is the inner product between $\varphi_i$ and $\varphi_j$. Suppose that the FVs selected from the training dataset are $\{x_1, x_2, \dots, x_L\}$ and the corresponding mapping is $S = \{\varphi_1, \varphi_2, \dots, \varphi_L\}$: the process for selecting the next FV is to calculate $\{a_{N,1}, a_{N,2}, \dots, a_{N,L}\}$ which gives the minimum of Eq. (4), with $\varphi_N$ being the mapping of the new input vector $x_N$:

$$\delta_N = \frac{\left\| \varphi_N - \sum_{i=1}^{L} a_{N,i}\, \varphi_i \right\|^2}{\left\| \varphi_N \right\|^2}. \quad (4)$$

The minimum of $\delta_N$ can be expressed with an inner product, as shown in Eq. (5):

$$\min \delta_N = 1 - \frac{K_{S,N}^{T} K_{S,S}^{-1} K_{S,N}}{k_{N,N}}. \quad (5)$$

With the global fitness defined as in Eq. (6), the FVS procedure proceeds to select a subset of training data points with minimal size, which gives zero global fitness.
The details of FVS are shown in Figure 1.
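The selection rule can be illustrated with a short sketch. It assumes a simple sequential scan in which a point becomes a new FV whenever one minus its local fitness exceeds a small tolerance; the tolerance `tol`, the kernel width `gamma`, the scan order and all function names are assumptions made for illustration, and the procedure of Figure 1 may differ.

```python
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel with an assumed width parameter gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * u[k] for k in range(r + 1, n))
        u[r] = (M[r][n] - tail) / M[r][r]
    return u

def local_fitness(fvs, x, gamma=1.0):
    """K_SN^T K_SS^{-1} K_SN / k(x, x) for the current set of FVs."""
    K_SS = [[rbf(a, b, gamma) for b in fvs] for a in fvs]
    K_SN = [rbf(a, x, gamma) for a in fvs]
    u = solve(K_SS, K_SN)
    return sum(k * ui for k, ui in zip(K_SN, u)) / rbf(x, x, gamma)

def select_fvs(points, tol=1e-3, gamma=1.0):
    """Keep a point as a new FV when 1 - local fitness exceeds tol (assumed rule)."""
    fvs = [points[0]]
    for x in points[1:]:
        if 1.0 - local_fitness(fvs, x, gamma) > tol:
            fvs.append(x)
    return fvs

points = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(select_fvs(points))  # repeated points are not selected again
```

In this toy example the repeated points are exactly representable by the FVs already selected, so only the three distinct points are kept.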

Ensemble-Based Approach
An ensemble-based approach is obtained by training diverse sub-models and then combining their results following given strategies. It can be proven that this can lead to superior performance with respect to a single model approach (Bauer & Kohavi, 1999). A simple paradigm of a typical ensemble-based approach with N sub-models is shown in Figure 2. Ensemble models are built on three key components: a strategy to build diverse models; a strategy to construct accurate sub-models; a strategy to combine the outputs of the sub-models in a way such that the correct predictions are weighted more than the incorrect ones.
In the DW-RBF-Ensemble that we are proposing, the sub-models are built using a modified SVR model with RBF. A dynamic weighted-sum strategy is proposed to combine the outputs of the sub-models. As mentioned in the Introduction, different methods can be applied to calculate the weights for the sub-models. In the methods that can be found in the literature, the weights are normally fixed after the ensemble model is built. They are only updated when new sub-models are added to the ensemble or when some sub-models are changed. In some real applications with fast changing environmental and operational conditions, the performance of the ensemble model may degrade rapidly. This degradation is not always caused by the low robustness or capability to adapt of the ensemble model, but can be due to the fact that the best sub-models are not given proper weights.
In this paper, a dynamic weighting strategy is thus proposed. The weights are no longer constant during the prediction, but dependent on the input vector. They are recalculated each time a new input vector arrives. Inspired by the work of Baudat and Anouar (2003) and considering the characteristics of SVR, a local fitness calculation is implemented in this paper to calculate the weights of the different sub-models for each input vector.

Sub-datasets determination
Clustering methods are widely used in ensemble approaches for determining the sub-datasets for different sub-models.
In this paper, SVR models are trained with RBF. The difference between different data points in RKHS is only the angle between them, as the norm of all data points in RKHS is one. Thus, we can use the angular-clustering algorithm to divide the whole training dataset into several sub-datasets.
The pseudo-code is shown in Figure 3. As a kernel function, RBF is the inner product of two vectors in RKHS, and the angle between them can be expressed as in Eq. (7) in the pseudo-code of Figure 3.
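As an illustration of clustering by angle, the sketch below assigns each data point to the cluster center with which it forms the smallest angle in RKHS, using the fact that the cosine of the angle between two unit vectors equals their inner product, i.e. the kernel value. The cluster centers are assumed to be given, and the assignment rule, `gamma` and the function names are hypothetical; the algorithm of Figure 3 may choose centers and assign points differently.

```python
import math

def rbf(x, z, gamma=1.0):
    """RBF kernel with an assumed width parameter gamma."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def angle(x, z, gamma=1.0):
    """Angle between the unit-norm RKHS images of x and z: arccos(k(x, z))."""
    return math.acos(min(1.0, rbf(x, z, gamma)))

def angular_clusters(points, centers, gamma=1.0):
    """Assign every point to the center forming the smallest RKHS angle."""
    clusters = [[] for _ in centers]
    for x in points:
        nearest = min(range(len(centers)),
                      key=lambda c: angle(x, centers[c], gamma))
        clusters[nearest].append(x)
    return clusters

points = [[0.0], [0.1], [0.9], [1.0]]
centers = [[0.0], [1.0]]  # hypothetical centers, given in advance
print(angular_clusters(points, centers))
```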

Train an RBF-SVR sub-model
With the angle-clustering method, the training dataset is divided into several clusters. However, in the DW-RBF-Ensemble method, the data points in each cluster are not used directly to train an RBF-SVR. FVS is first used to select the FVs in each cluster and then the SVR model is trained on these selected FVs, in order to decrease the computational burden.
The procedure for training an SVR model with FVs is not the same as that shown in Sub-Section 2.1, as the estimate function in Eq. (2) is no longer a kernel expansion on all the training data points in one cluster, but only on the selected FVs.
Suppose that for the $j$-th cluster, the training data points are $(x_i, y_i)$, for $i = 1, 2, \dots, M_j$, and the FVs selected by FVS are $(x_k, y_k)$, for $k = 1, 2, \dots, L_j$; the estimate function of SVR for the $j$-th cluster is given in Eq. (8). Such a process can efficiently decrease the risk of overfitting and guarantee the generalization performance of the sub-models.

Weights Calculation
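The weight $w_i(t)$ of the $i$-th sub-model for a new input vector is calculated with Eq. (9) from the local fitness $J_i(t)$ introduced in Section 2.2, where $\tau$ is a very small value so that Eq. (9) works in the case $J_i(t) = 1$.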

Combining Sub-Models Outputs
Figure 4 shows the paradigm of DW-RBF-Ensemble, where $N$ is the number of sub-models, $x(t)$ is a new input vector arriving at time $t$, $w_j(t)$ is the weight assigned to the $j$-th sub-model for the new input vector, $\hat{y}_j(t)$ is the predicted value of the $j$-th sub-model given by RBF-SVR and $\hat{y}(t)$ is the final output of the ensemble model. […], if we assume the sub-models results to be uncorrelated.
Note that all the sub-models weights and outputs are a function of , which means that they are all dependent on the input vector of the ensemble model.
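The dependence of weights and output on the input vector can be sketched as follows. The code assumes, purely for illustration, that the weight of a sub-model is proportional to $1 / (1 - J_j(t) + \tau)$ and that the weights are normalized to sum to one; this form is an assumption and may differ from Eq. (9). The weighted sum follows the weighted-sum strategy described in the text, and the function names and numerical values are hypothetical.

```python
def dynamic_weights(local_fitness, tau=1e-6):
    """Assumed weighting: a larger local fitness gives a larger weight.

    tau keeps the expression finite when a local fitness equals 1.
    """
    raw = [1.0 / (1.0 - fitness + tau) for fitness in local_fitness]
    total = sum(raw)
    return [r / total for r in raw]

def ensemble_output(predictions, weights):
    """Weighted sum of the sub-model predictions for one input vector."""
    return sum(w * y for w, y in zip(weights, predictions))

# Hypothetical local fitness values of three sub-models for one input vector.
fitness_at_t = [0.99, 0.50, 0.10]
predictions_at_t = [0.42, 0.60, 0.80]

weights_at_t = dynamic_weights(fitness_at_t)
print(weights_at_t)
print(ensemble_output(predictions_at_t, weights_at_t))
```

Both functions are called again for every new input vector, so the weights change from one prediction to the next.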

CASE STUDY DESCRIPTION
The study considered in this paper concerns the 1-day ahead prediction of leak flow from the first seal of the RCP of an NPP. The RCP is a critical component in an NPP, whose function is to circulate coolant into the reactor to transport the heat produced by nuclear fission to the steam generator. The leakage of coolant reduces such heat removal function, posing serious safety concerns. Short-term prediction can provide warning on a time horizon of hours and the lead time for deciding emergency actions. The time horizon of one day has been considered appropriate for the NPP systems of interest, as indicated by the experts involved in this work.
In this section we describe the time series data and briefly recall the data pre-processing steps.We also detail the strategies to build the diverse sub-models of the ensemble.

Data Description and Pre-Processing
The data provided correspond to 9 scenarios of leak flow from different NPPs. Each scenario contains a time series of the leak flow. They are named Scenario 1, Scenario 2, …, Scenario 9 in the following sections of the paper. These data are monitored every four hours. As these data are time-dependent and recorded within different time windows, only scenarios coming from the same NPP have the same size. Some of the scenarios contain missing data points and outliers, and both issues have to be dealt with. First of all, we must remove anomalous data, since their extreme values would affect the results of the analysis. Outliers can be detected with reference to some constraints, e.g. the limits $\bar{a} \pm 3\sigma_a$, where $\bar{a}$ is the mean of the data point values and $\sigma_a$ is their standard deviation. The outliers are the data points whose values are larger than $\bar{a} + 3\sigma_a$ or smaller than $\bar{a} - 3\sigma_a$; they are detected with these limits and subsequently removed. Some observations are in order with respect to the adopted procedure: i) In nonstationary time series, this outlier detection method should be carried out on local data and not on the whole scenario. We choose this method so as not to delete any values which are possible indicators of changing conditions. ii) Given that the scenario is known, the strategy of outlier selection is chosen considering its overall development, where there are no sudden changes, and the single values that are significantly outside the range of their neighbors are considered as outliers. iii) Note that we use those constraints, rather than the usual ones based on the median and the InterQuartile Range (IQR), to be more conservative in the outlier selection, due to the dependence among data (Brodsky, Lemmens, Brock-Utne, Vierra & Saidman, 2002).
Secondly, we want to reconstruct missing data. A possible way to deal with the reconstruction of missing data is local polynomial regression fitting (Masry, 1996). This local least squares regression technique effectively estimates the values of missing data points. Moreover, it can also be used to smooth the available observations, in order to reduce noise. We will thus use this technique both to reconstruct data where missing, and to obtain a smoother and less noisy time series in all remaining time instances. All the time series data of all scenarios are then normalized from 0 to 1. All details on this pre-processing task can be found in Liu et al. (2012).
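The three-standard-deviation rule for outliers can be sketched in a few lines. The example assumes the sample standard deviation and applies the limits to one window of values; for a nonstationary series the same function would be applied to local windows rather than to the whole scenario. The function name and the toy data are illustrative.

```python
import statistics

def three_sigma_filter(values):
    """Split values into kept points and outlier indices using mean +/- 3 std."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumed choice)
    lower, upper = mean - 3 * std, mean + 3 * std
    kept = [v for v in values if lower <= v <= upper]
    outliers = [i for i, v in enumerate(values) if v < lower or v > upper]
    return kept, outliers

# Toy window: twenty regular readings followed by one anomalous value.
window = [0.5] * 20 + [5.0]
kept, outliers = three_sigma_filter(window)
print(outliers)   # index of the anomalous reading
print(len(kept))  # number of readings retained
```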

Strategies to Build Sub-Models
We have a time series dataset and we need to decide the best number of historical values to be used as inputs.
Suppose $a(t)$ represents an instance of the time series data of one scenario. For 1-day ahead prediction, the output $y(t)$ is $a(t + 6)$, because the signals are monitored every four hours. In order to decide the best $H$ for selecting the input vector $x(t) = (a(t - H + 1), \dots, a(t))$ most related to the output, a partial autocorrelation analysis is carried out, i.e. the correlation between the output values at the current time and at different temporal lags is computed. Figure 5 shows the results of this analysis on all the scenarios, where the x and y axes represent the temporal lag (a multiple of four hours) and the corresponding empirical partial autocorrelation, respectively. The bounds of a 95% confidence interval are also shown with dashed lines in the Figure. The correlation decreases with the lag (although not linearly) and after a lag of 17 time steps it is no longer comparable with the values observed for lags smaller than 17, i.e. the best choice is $H = 17$. Although the autocorrelation is still high for lags of 18 and 20, the results are not improved on the real case study, according to the numerical experiments. The selection of 17 is already a "conservative" choice: in fact, the most important values appear to be the first 7 values. Then, the training dataset is divided into several sub-datasets for different sub-models using the angle-clustering algorithm described in sub-section 2.3.1.
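The construction of input vectors and outputs from one scenario can be sketched as below, with 17 historical values as input and a horizon of 6 time steps (6 × 4 hours = 1 day). The function and parameter names are illustrative, and the toy series simply uses the time index as the value so that the alignment is easy to check.

```python
def make_samples(series, n_lags=17, horizon=6):
    """Build (input vector, output) pairs from one time series.

    The input at time t holds the n_lags most recent values up to t,
    and the output is the value observed horizon steps later.
    """
    samples = []
    for t in range(n_lags - 1, len(series) - horizon):
        x = series[t - n_lags + 1 : t + 1]
        y = series[t + horizon]
        samples.append((x, y))
    return samples

series = list(range(30))  # toy series: value equals its time index
samples = make_samples(series)
first_x, first_y = samples[0]
print(len(samples))
print(first_x[0], first_x[-1], first_y)
```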

Comparison of DW-RBF-Ensemble with Single SVR and Fixed Weights Ensemble
The ensemble model is expected to give better results than a single SVR model. To verify this claim, a comparison between a single SVR model and the proposed DW-RBF-Ensemble is carried out on the considered case study. A fixed-weights ensemble (Kurram and Kwon, 2013) is also taken as a benchmark method to show the benefit of using a dynamic weighting strategy.
Each time, one out of the 9 scenarios is chosen as the test dataset (named Observed Scenario) and the other 8 scenarios (named Reference Scenarios) form the training dataset, which is used to construct the DW-RBF-Ensemble and the Fixed Weights Ensemble (FW-Ensemble). An SVR model is also trained on the training dataset for comparison (it is named Single SVR to be distinguished from the two ensemble models).
The steps for the comparison are the following:
1. Train a Single SVR model with all the training dataset.
2. Divide the training dataset into 6 clusters with the angle-clustering algorithm.
3. Train the DW-RBF-Ensemble: FVS selects the FVs in each cluster and a sub-model is trained on the selected FVs. The weights of the different sub-models for each data point are calculated with Eq. (9).
4. Train a FW-Ensemble: train a sub-model with all the data points in each cluster. The weight of each sub-model is decided by minimizing the MAE on the training dataset.
5. Calculate the Mean Absolute Error (MAE) and the Mean Relative Error (MRE) of the outputs of DW-RBF-Ensemble, FW-Ensemble and Single SVR.
6. Compare prediction accuracy, computational burden and model robustness.
The results and comparisons among these models are presented in the next section.
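The two error measures used in the comparison can be written compactly. The sketch below assumes the usual definitions, i.e. MAE as the mean of the absolute errors and MRE as the mean of the absolute errors divided by the absolute true values; the exact MRE definition used in the study is an assumption here, and the numerical values are toy data.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |true - predicted| over all test points."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def mean_relative_error(y_true, y_pred):
    """Mean of |true - predicted| / |true| (assumed definition; true values nonzero)."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / len(y_true)

y_true = [0.50, 0.40, 0.80]
y_pred = [0.45, 0.50, 0.80]
print(mean_absolute_error(y_true, y_pred))
print(mean_relative_error(y_true, y_pred))
```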

RESULTS
In this section, the results from DW-RBF-Ensemble, FW-Ensemble and Single SVR are compared with respect to different aspects.

Prediction Accuracy
Figure 6 shows the prediction results of the ensembles (DW-RBF-Ensemble and FW-Ensemble) on the first scenario. The bad results of the Single SVR are caused by the fact that the predictions are highly dependent on the training dataset. Moreover, the hyperparameter optimization is also critical to the performance of SVR. Well-chosen hyperparameter values can improve the performance of the SVR. However, the optimization method may converge to a local extremum, which results in a good performance at the beginning but a bad one at the end of the scenario. The ensemble approach can avoid this problem by combining the results from different sub-models. These unstable results from the Single SVR show the necessity of the ensemble approach for avoiding the limits of the Single SVR in attaining the desired accuracy and robustness of the model.
In this case study, FW-Ensemble gives the worst results, as the weights are fixed after training. Somewhat surprisingly, it even gives results worse than the Single SVR. This is due to the fact that, with the partitioning by angle clustering for the training of the ensemble sub-models, for some data points the best sub-model is not given the largest weight. In this case, overlapping datasets for sub-model training would likely improve the FW-Ensemble prediction performance. Figure 9 shows the weights for the different sub-models of DW-RBF-Ensemble in the case of selecting the ninth scenario as the Observed Scenario. It is clear that the weights of the sub-models change frequently to adapt to the incoming data points.
The prediction results from DW-RBF-Ensemble confirm the practicability and efficiency of the proposed approach.

Computational complexity
Suppose the size of the training dataset is $M$; then, the computational complexities of the Single SVR for training and testing are $O(M^3)$ and $O(M)$, respectively. For very large datasets, the computational burden of the Single SVR model is very high and sometimes unacceptable. By dividing the training dataset into different sub-datasets, the total computational burden is decreased, as $M^3 > M_1^3 + \dots + M_N^3$, with $M_1 + \dots + M_N = M$. With FVS, the size of the training dataset is further decreased for training and testing. Thus, the computational complexity of the DW-RBF-Ensemble approach is much smaller than that of the Single SVR trained on all the training dataset and of the FW-Ensemble.
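A small numerical example makes the inequality concrete. The dataset size and the equal split into 6 clusters below are hypothetical values chosen for illustration, and the cubic cost is only a proxy for the training effort.

```python
def cubic_cost(sizes):
    """Sum of the cubed dataset sizes, a proxy for the SVR training cost."""
    return sum(n ** 3 for n in sizes)

total_size = 6000            # hypothetical number of training points
clusters = [1000] * 6        # hypothetical equal split into 6 clusters

single_cost = cubic_cost([total_size])
split_cost = cubic_cost(clusters)
print(single_cost, split_cost, single_cost / split_cost)
```

With these numbers, training on the six clusters costs 36 times less than training on the whole dataset, before any further reduction obtained with FVS.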

CONCLUSIONS
In this paper, we have proposed an innovative dynamic-weighted RBF-based ensemble approach for short-term prediction (1-day ahead prediction) with time series data. An angular-clustering algorithm is used to divide the training dataset into sub-datasets and FVS is used to decrease the number of training data points by selecting only the representative data points in RKHS. Local fitness calculation is integrated to calculate the specific weights of the sub-models of the ensemble for each new input vector, without adding much computational burden.
The proposed ensemble approach has been shown to perform well in a real case study of signals recorded on an NPP component. The proposed ensemble model outperforms the single SVR model and the FW-Ensemble on prediction accuracy, computational burden, robustness and adaptability.
Further research needs to be carried out to optimize the number of sub-models and the tuning of the hyperparameters. From the application point of view, further developments of the method for long-term prediction will be investigated, for the purpose of remaining useful life prediction, i.e. prognostics.
In Eq. (5), $K_{S,S} = (k_{i,j})$, $i, j = 1, 2, \dots, L$, is the kernel matrix of S and $K_{S,N} = (k_{i,N})$, $i = 1, 2, \dots, L$, is the vector of the inner products between $\varphi_N$ and the elements of S. The expression $J_{S,N} = K_{S,N}^{T} K_{S,S}^{-1} K_{S,N} / k_{N,N}$ is the local fitness of $x_N$ with respect to the present feature space S. If $1 - J_{S,N}$ is zero, the new data point is not a new FV; otherwise, it is a new FV and is added to S. The global fitness is defined as in Eq. (6).
In order to avoid the overfitting problem, the optimization still aims at finding the minimum of the objective function in Eq. (1) on all the training data points in the cluster. Thus, by replacing the estimate function in Eq. (2) with the kernel expansion on the selected FVs, $\sum_{i=1}^{L_j} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b$, we obtain the new dual formulation of SVR. Classical methods can be used to estimate the unknowns in Eq. (8).
In Section 2.2, FVS defines global and local criteria to characterize the feature space. The proposed local fitness can describe the linearity between the mapping of a new input vector and the mapping of all the Feature Vectors (FVs) of the model: if a linear combination of the mapping of the FVs can better approach the mapping of the new input vector, i.e. $1 - J_{S,N} \approx 0$, the model gives a better approximation of the output of the new data point; otherwise, i.e. $1 - J_{S,N} \approx 1$, the model performs worse for this data point. Thus, local fitness can be used to derive the weight of each sub-model for each input vector. With Eq. (5), for a new data point arriving at time $t$, we can calculate the local fitness $J_i(t)$ with respect to the FVs of the $i$-th sub-model. The weight of the $i$-th sub-model for this data point is then calculated as in Eq. (9).

Fig. 5. Partial autocorrelation function with respect to time lags (multiples of four hours). Dotted lines are the bounds of the 95% confidence interval.

Fig. 6. Prediction results of ensembles and Single SVR, for the Scenario 1.

Fig. 9. Weights of different sub-models of DW-RBF-Ensemble for test data points of the Scenario 9.

Figures 7 and 8 report the MAE and MRE obtained, respectively, by DW-RBF-Ensemble, FW-Ensemble and Single SVR. It is clear that DW-RBF-Ensemble gives the best results in this case study, i.e. on average, the MAE and MRE values are smaller than for Single SVR and FW-Ensemble.
From Figures 7 and 8, it is seen that the DW-RBF-Ensemble gives more stable prediction results compared to the Single SVR model and the FW-Ensemble. The Single SVR model cannot properly handle the noise in the data and it is difficult to find the globally optimal values of the hyperparameters. The weighted-sum ensemble models can decrease the influence of the noise by combining the prediction outputs of the sub-models. However, the fixed weighting strategy cannot adapt to the changing environment, as the weights of the sub-models are not changed adaptively. This is one reason why the DW-RBF-Ensemble model can give stable results, i.e. the DW-RBF-Ensemble model is more robust compared to the Single SVR and FW-Ensemble.