Compound Autoregressive Network for Prediction of Multivariate Time Series



Introduction
In the information era, data play a significant role in various artificial and natural systems. Data provide the basis for machine control, industrial system operation, economic markets, environmental management, etc. For such complex systems, accurate real-time data are essential for control and operation. Future information is also important: it is predicted from historical data and can guide advance operations such as system adjustment, environmental adaptation, and accident avoidance. Therefore, reliable prediction of data in the time domain has become an urgent issue for complex systems. Because of their complicated composition and internal mechanisms, the time-series data in these systems are usually nonstationary, nonlinear, and noisy, which makes prediction difficult. Besides, the variables in the time series affect each other, which further complicates the nonlinear relations. Prediction is therefore challenged by both the complicated time-series characteristics and the multivariate correlation.
Various explorations have been conducted to uncover the underlying rules and features in time-series data. For practical applications in some fields, prediction methods have been proposed based on mechanism models. In these methods, the inner mechanism of a system is studied in depth, and the relations between system components are built with approaches from physics, chemistry, and biology, such as models of the water environment (WASP [1] and EFDC [2]) and models of atmospheric diffusion (Gaussian puff and plume models [3]). The system change can then be predicted by simulating the mechanism model. However, such models are difficult to build because the inner structure is complex and often unknown. Moreover, professional and interdisciplinary knowledge is required for the mechanism analysis.
The data-driven solution has been an effective complement to the mechanism methods. Unlike the mechanism methods, data-driven methods focus on the external data characteristics instead of the inner structural relations. They have developed from statistical methods to machine learning methods, which can extract more features from massive data. Machine learning mainly addresses the problem of parametric model setting and adaptation in statistical methods such as the autoregression (AR), moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models [4]. Machine learning, including traditional neural networks and deep learning, also faces some problems in time-series analysis. First, multiple variables usually need to be considered for the target variable to be predicted. In multivariable analysis, traditional networks mainly model the multivariable mapping relations while neglecting the sequential features, whereas deep learning methods specialize in sequential feature extraction for a single variable. Second, computational efficiency should be considered in the prediction models, especially for terminal applications that cannot provide high-end hardware. Third, the training method strongly affects the network performance, so a suitable and extensible learning framework should be designed for the neural network. Based on the analysis of the existing research, we explore an approach to time-series prediction in view of multivariable modelling performance, computational efficiency, and training methods. The rest of this paper is organized as follows. Section 2 introduces the related prediction methods, including statistical models and machine learning methods. In Section 3, the main prediction model is proposed, and the compound autoregressive network is presented with the prediction algorithm. Experiments are conducted in Section 4 to test the network. The methods and results are discussed in Section 5. Finally, the paper is concluded in Section 6.

Related Works
The direct solution to prediction is to figure out the change rule of the system, which is the basic idea of the mechanism-based prediction methods. Obviously, it is difficult to build a complete mechanism model that describes the system composition and change rule. The data-driven method then becomes a feasible solution, since it works with the external characteristics irrespective of the inner construction and relations of the system. The data-driven methods can be divided into two categories: statistical models and machine learning models.

Prediction Models Based on Statistics.
The statistical model is based on the mathematical description and calculation of the data. The classical statistical models are built on the autocorrelation function and exponential decay of the time series. The typical models include the AR, MA, and hybrid models. The AR model describes the change process of the regressor variable itself: the random variable at the next time step is expressed as a linear combination of the variables at previous moments. The MA model uses a sliding window to extract the time-series features from adjacent data segments. Because the length of the sliding window largely determines the feature extraction ability, exponential smoothing methods have been proposed to optimize the MA model, among which the cubic exponential smoothing method is widely applied. Based on the AR and MA models, hybrid models have been proposed for more accurate modelling, including the ARMA and ARIMA. The ARIMA has been the typical hybrid model for nonstationary regression problems. It has been applied to prediction problems in environment monitoring [5], financial economy [6,7], food safety [8], traffic systems [9], etc.
The statistical models can be expressed as follows. Let $x_t$ be the value of the time series at time $t$, $p$ the number of autoregressive terms, $q$ the number of moving average terms, $d$ the differencing order, $\varepsilon_t$ the white noise at $t$, $L$ the lag operator, and $\alpha_1, \ldots, \alpha_p$ and $\beta_1, \ldots, \beta_q$ the weights. Then, the AR model can be expressed as
$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \varepsilon_t.$$
The MA model is
$$x_t = \varepsilon_t + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i}.$$
The ARMA model is
$$x_t = \sum_{i=1}^{p} \alpha_i x_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \beta_i \varepsilon_{t-i}.$$
The ARIMA model is
$$\left(1 - \sum_{i=1}^{p} \alpha_i L^i\right)(1 - L)^d x_t = \left(1 + \sum_{i=1}^{q} \beta_i L^i\right)\varepsilon_t.$$
The statistical models rely on the assumption of stationarity in the time series. Although the models have been improved and extended, they are still limited by the need to transform and process the data into stationary form. Besides, selecting a proper model and estimating the model parameters remain problems. Practice indicates that the models perform well in linear short-term prediction, while the prediction accuracy declines markedly for complex and long-term time series. New prediction solutions are therefore needed for nonstationary time series.
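As a minimal sketch of how the statistical models described above produce a forecast, the following Python code applies the differencing operator $(1 - L)^d$ and computes a one-step ARMA(p, q) forecast. The function names `difference` and `arma_one_step` are illustrative and do not come from the paper. The weights are assumed to be already estimated, and the unknown future noise term is replaced by its zero mean.

```python
def difference(series, d=1):
    """Apply d-th order differencing, i.e. the operator (1 - L)^d."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def arma_one_step(history, residuals, alpha, beta):
    """One-step ARMA(p, q) forecast with the future noise set to zero.

    alpha[0] multiplies the most recent value x_{t-1}; beta[0] multiplies
    the most recent residual eps_{t-1}. With beta == [] this is a pure
    AR(p) forecast, and with alpha == [] a pure MA(q) forecast.
    """
    p, q = len(alpha), len(beta)
    if len(history) < p or len(residuals) < q:
        raise ValueError("not enough history for the requested orders")
    ar_part = sum(alpha[i] * history[-1 - i] for i in range(p))
    ma_part = sum(beta[i] * residuals[-1 - i] for i in range(q))
    return ar_part + ma_part
```

For an ARIMA forecast under the same assumptions, one would difference the series `d` times, forecast the differenced series with `arma_one_step`, and then add the forecast back onto the last observed values.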

Prediction Model Based on Machine Learning.
Machine learning has developed fast in classification and regression research. The black-box nature of machine learning offers broad possibilities for complex modelling problems. The backpropagation neural network (BP), radial basis function neural network (RBF), nonlinear autoregressive neural network (NAR), support vector machine (SVM), and Bayes network have been studied and applied in prediction problems [10].

Complexity
Some studies have been conducted to improve the network and prediction performance. Pradeepkumar [11] proposed a novel particle swarm optimization algorithm to train the quantile regression neural network, which was applied to financial data prediction. Daly [12] designed the structure of the NAR to predict the video traffic in the Ethernet passive optical network. Wang [13] proposed an adaptive method based on the multiple-rate network to predict the parameters in industrial control. Liu [14] studied an improved grayscale neural network, which was tested in predicting the traffic stop. Combinations of different methods are also a hotspot in machine learning studies. Doucoure [15] predicted the wind speed with wavelet analysis and a neural network. Wang [16] improved the BP with the self-adaptive differential evolution algorithm.
The machine learning methods above are mainly shallow networks. They are suitable for multivariate modelling because of their network structure with multiple input nodes. The data at different time steps are fed into the network independently and cyclically, which emphasizes the nonlinear mapping relation instead of the sequential connection in the time domain. Generally, they are limited in massive data processing and in modelling complex time-series relations. For prediction in particular, the sequence features should be extracted, which is difficult to realize in the traditional fully connected network. The recurrent neural network (RNN) [17] has drawn much attention for its handling of sequence features. In the RNN, the hidden nodes are connected across time steps, so the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. The RNN has been extended to the multidimensional recurrent neural network (MDRNN) [18] and the bidirectional recurrent neural network (BiRNN) [19] for higher performance. The long short-term memory network (LSTM) [20] was proposed to address the long-term dependency problem of the traditional RNN. Variants of the LSTM have appeared with improved or redesigned structures and gates, including the bidirectional LSTM network (BiLSTM) [21] and the gated recurrent unit (GRU) [22]. Although the deep networks usually perform better than the traditional networks, they have been studied and applied more with univariate than with multivariate series. Besides, their structures are more complex, and they need more training time and computing resources.
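The recurrent connection described above can be illustrated with a single hidden-layer update. This is only a sketch of a plain (Elman-style) recurrent step, not code from any of the cited works. The name `rnn_step`, the tanh activation, and the weight layout are assumptions made for the example.

```python
import math


def rnn_step(x, h_prev, W_in, W_rec, bias):
    """One recurrent update of the hidden state.

    h[j] = tanh(sum_i W_in[j][i] * x[i] + sum_k W_rec[j][k] * h_prev[k] + bias[j])
    The second sum is what distinguishes the RNN from a fully connected
    network: the previous hidden state is part of the current input.
    """
    h = []
    for j in range(len(bias)):
        s = bias[j]
        s += sum(W_in[j][i] * x[i] for i in range(len(x)))
        s += sum(W_rec[j][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(s))
    return h
```

Calling `rnn_step` repeatedly over a sequence, passing each returned state as the next `h_prev`, carries information from earlier time steps forward.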
In time-series prediction, on the one hand, we should consider the sequence features of the time series as well as the mutual effects of the related variables. On the other hand, we should balance the prediction accuracy of the network against the computing speed and the resources occupied. Considering the related works above, the advantages of different networks should be combined, including the simple structure and multivariate analysis ability of the shallow networks and the sequence feature extraction of the recurrent networks. Therefore, the shallow recurrent neural network NAR [23] is selected as the basic network, which can extract the nonlinear and sequence features in the time series, and a compound network structure and algorithm are designed to analyse multiple variables. The novel framework of the compound network can be applied to prediction problems in complex systems, providing an alternative data-driven solution for analysing data change.

Compound Autoregressive Prediction Network
For the time series in the systems, the main features are the trend in their changing process and the incidence relation among different variables. The trend means that there are underlying rules in the changing data, which can be linear, periodic, or stochastic. The incidence relation means the effect among multiple variables. For example, the temperature fluctuates according to its own change rule, and it is also affected by other meteorological variables such as precipitation and humidity. Based on these two factors, a compound neural network is built to predict the object variable. The overall network structure is introduced first. Then, the components and training methods are analysed. Finally, the prediction algorithm for the multivariate time series is proposed.

Compound Autoregressive Network.
Among the traditional neural networks, the NAR can realize the regression analysis of a time series on itself. The network has been applied in practice and performs well in short-term prediction. Besides, the data needed for its training are considerably fewer than those of deep networks such as the LSTM and GRU. The NAR can therefore be an effective tool for univariate prediction. Moreover, the nonlinear autoregressive network with external input (NARX) has been developed from the NAR to account for the incidence relation among multiple variables. Combining the advantages of the NAR and NARX, the compound network is designed for multivariate prediction, as shown in Figure 1. The compound autoregressive network proposed in this paper is abbreviated as CARN.
The CARN consists of two parts, namely, the primary network and the auxiliary network. In a prediction problem, one variable is the main target to be predicted, and some other variables are selected as the correlated variables according to their correlation degrees. The components of the compound network correspond to these different types of variables. The primary network is built on the structure of the NARX to predict the object variable, and the auxiliary network is built on the NAR to provide the reference of the correlated variables.
For the primary network, the inputs include the object variable (Y in Figure 1) and the correlated variables (U in Figure 1). The nonlinear and complex relations among the variables are usually difficult to analyse with mechanism modelling, but the network performs well in mining black-box mapping relations. The design with two types of inputs can therefore capture the associations among multiple variables. Besides the two types of inputs, the other characteristic of the network is the feedback of the object variable from the output to the input. The changing trend of the object variable itself is usually more important than the multivariable relation, and this self-trend is modelled through the feedback in the time dimension.

For the auxiliary network, the main inputs are the variables associated with the object variable. The network mainly models the time-series trend with the feedback structure. In the feedback, the data change gradient is also set as an input to compensate the prediction. The NAR-based auxiliary network realizes the regression of a single variable. Since the object variable usually has more than one effect variable, several auxiliary networks are used in practice, and the number of auxiliary networks equals the number of effect variables.

Design and Train of Discrete Networks.
In the framework of the compound network, the primary and auxiliary networks are set up to predict the variables. Two issues need to be solved: the concrete network structure and the network training method. The structures of the networks are shown in Figure 2.
There are three layers in the primary network, namely, the input, hidden, and output layers. The inputs include the object variable and the effect variables, which come from the auxiliary networks. In the time dimension, the past data of the object variable are used to predict the data at the next time steps, while the data at present are provided by the auxiliary networks. The nonlinear regressive function of the network can be expressed as
$$y(t) = f\big(u(t-1), u(t-2), \ldots, u(t-n_u),\; y(t-1), y(t-2), \ldots, y(t-n_y)\big),$$
where $y(t)$ is the prediction output, $u(t)$ is the effect variable input, $(t-i)$ denotes the time step, $n_u$ is the input delay, and $n_y$ is the output delay.
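To make the roles of the input delay and output delay concrete, the sketch below assembles the argument list of a NARX-type regressive function for one time step. It assumes the standard NARX form, in which the prediction at step t depends on the previous n_u effect-variable values and the previous n_y object-variable values. A single effect variable and 0-based list indexing are used for simplicity, and `narx_regressor` is a hypothetical helper name.

```python
def narx_regressor(u, y, t, n_u, n_y):
    """Collect the lagged inputs for predicting y at index t.

    Returns [u(t-1), ..., u(t-n_u), y(t-1), ..., y(t-n_y)], where u is the
    effect-variable series and y is the object-variable series.
    """
    if t < max(n_u, n_y) or t > min(len(u), len(y)):
        raise ValueError("time step t is outside the usable range")
    past_u = [u[t - i] for i in range(1, n_u + 1)]
    past_y = [y[t - i] for i in range(1, n_y + 1)]
    return past_u + past_y
```

With several effect variables, the same lagged slice would be taken from each series and concatenated, so the input layer grows by n_u nodes per effect variable.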
The relation between the input and hidden layers is defined for the hidden neurons $j = 1, 2, \ldots, l$, where $i_1$ is the index of the historical input data, $u_{i_1}$ is the $i_1$-th input, $i_2$ is the index of the historical output data, $y_{i_2}$ is the $i_2$-th output, $l$ is the number of hidden-layer neurons, $f$ is the activation function of the hidden layer, $W_{i_1 j}$ is the connection weight between the $i_1$-th input and the $j$-th neuron in the hidden layer, $W_{i_2 j}$ is the connection weight between the $i_2$-th output and the $j$-th neuron in the hidden layer, and $A_j$ is the threshold value of the $j$-th hidden neuron. The network output $O$ is obtained from the hidden-layer outputs $H_j$, where $W_j$ is the connection weight between the output neuron and the $j$-th neuron in the hidden layer and $B$ is the threshold value of the output neuron.
Similar to the primary network, the auxiliary network also has input, hidden, and output layers, but the hidden part is extended to two layers. The inputs include the effect variable itself and the data change gradient, which serves as a reference to improve the prediction accuracy. The network is expressed in terms of the effect variable input $u(t)$ and the data change gradient $\Delta(t)$, which is computed with the input delay $n_u$ and the time step interval $t_0$. In the concrete model of the auxiliary network, $j_1 = 1, 2, \ldots, l_1$ and $j_2 = 1, 2, \ldots, l_2$ index the neurons of the two hidden layers, $i_1$ is the index of the historical input data, $i_2$ is the index of the linear relation weights between $u(t)$ and $u(t-1)$, $l_1$ and $l_2$ are the numbers of hidden-layer neurons, $n_u$ is the input delay, $f$ is the activation function of the hidden layer, $u_{i_1}$ is the $i_1$-th input, $\omega$ is the connection weight between input and hidden neurons, and $a$ is the threshold value of the hidden neuron. The output is derived from the hidden layers, where $b$ is the threshold of the second hidden layer and $c$ is the threshold of the output layer.
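The layer relations described above can be sketched as a forward pass through a network with one hidden layer, as in the primary network. The symbols follow the text (W for connection weights, A for hidden thresholds, B for the output threshold), but several details are assumptions of this sketch rather than statements from the paper: the thresholds are added to the weighted sums, the hidden activation defaults to tanh, and the output neuron is linear. The function name `primary_forward` is hypothetical.

```python
import math


def primary_forward(inputs_u, inputs_y, W_u, W_y, A, W_out, B, f=math.tanh):
    """Forward pass of a one-hidden-layer network with two input groups.

    inputs_u: lagged effect-variable values; W_u[i][j] links input i to hidden neuron j
    inputs_y: lagged object-variable values; W_y[i][j] links input i to hidden neuron j
    A[j]: threshold of hidden neuron j; W_out[j]: hidden-to-output weight; B: output threshold
    """
    hidden = []
    for j in range(len(A)):
        s = A[j]  # assumed sign convention: threshold is added
        s += sum(W_u[i][j] * inputs_u[i] for i in range(len(inputs_u)))
        s += sum(W_y[i][j] * inputs_y[i] for i in range(len(inputs_y)))
        hidden.append(f(s))
    # Assumed linear output neuron: weighted sum of hidden outputs plus threshold
    return sum(W_out[j] * hidden[j] for j in range(len(hidden))) + B
```

An auxiliary network in the same style would stack a second hidden layer between `hidden` and the output and take the effect variable's own lags plus the data change gradient as inputs.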
Based on the design of the networks above, the training method should be studied.
The basic learning method is derived from the backpropagation through time algorithm, in which the variable from the feedback can be regarded as a new input variable. The errors of the primary and auxiliary networks, $e_1$ and $e_2$, are taken between the prediction outputs ($o$ and $O$) and the designed outputs ($y$ and $Y$). The connection weights and thresholds $\omega_{i_1 j_1}$, $\omega_{i_2 j_2}$, $\omega_{j_1}$, $\omega_{j_2}$, $W_{i_1 j}$, $W_{i_2 j}$, $W_j$, $a_{j_1}$, $a_{j_2}$, $b_1$, $b_2$, $A_j$, and $B$ are adjusted with the errors until the global error or the number of training iterations reaches the preset value. Based on the backpropagation algorithm, the weights are updated with the learning rates $\eta_1$ and $\eta_2$ and the global errors $E_1$ and $E_2$ of the two networks.
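For orientation, a generic gradient-descent update of the kind used in backpropagation can be written as follows. This is a sketch under assumptions: the squared-error form of the global error and the plain gradient step are the common textbook choices and are not confirmed by the text, and $w$ stands for any one of the weights or thresholds listed above, $k$ for the training iteration, and $N$ for the number of training samples.

```latex
% Assumed global error: half the sum of squared sample errors e(n)
E = \frac{1}{2} \sum_{n=1}^{N} e(n)^{2}

% Generic update of a weight or threshold w with learning rate \eta
w^{(k+1)} = w^{(k)} - \eta \left. \frac{\partial E}{\partial w} \right|_{w = w^{(k)}}
```

Under this reading, the same rule is applied to each network separately, with $(\eta_1, E_1)$ for one network and $(\eta_2, E_2)$ for the other.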

Prediction Algorithm for Multivariate Time Series.
Based on the CARN proposed above, practical data can be used to train the networks, which then predict the object variable with the effect variables. The prediction algorithm for the multivariate time series is designed based on the network model. In the algorithm, the data processing and calculation procedure is specified to obtain the final prediction results. The algorithm flow is shown in Figure 3.
The inputs of the prediction algorithm include the historical data of the object variable and effect variables and the data change gradient. The output is the series of the object variable at the next time steps. The steps of the algorithm are as follows:
(1) The effect variables are selected according to the correlation degrees between the object and effect variables. The historical data of the object variable and the selected effect variables are preprocessed with normalization. In the preprocessing, the data change gradients of the effect variables are calculated for the auxiliary networks.
(2) The processed historical data are fed into the auxiliary networks. The networks are trained with the method in Section 3.2.
(3) The outputs of the auxiliary networks and the historical data of the object variable are fed into the primary network to obtain the main prediction model.
(4) The time step is moved forward, and the updated data at the next time step are obtained by repeating the steps above.
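The forward-stepping part of the algorithm, steps (3) and (4), can be sketched as a loop in which the auxiliary models supply the effect variables and the primary model predicts the object variable. The function `rolling_forecast` and the callable interfaces are hypothetical, and the trained networks are represented by plain Python callables. The sketch appends its own predictions to the history, which is an assumption; the Discussion notes that measured values are fed back in the experiments.

```python
def rolling_forecast(y_hist, u_hist, aux_models, primary_model, steps):
    """Step the compound model forward for a number of time steps.

    y_hist: history of the object variable (list of floats)
    u_hist: dict mapping effect-variable name to its history
    aux_models: dict mapping name to a callable(history) -> next value
    primary_model: callable(y_history, dict of next effect values) -> next y
    """
    y = list(y_hist)  # copy so the caller's data are not modified
    u = {name: list(values) for name, values in u_hist.items()}
    predictions = []
    for _ in range(steps):
        # Auxiliary networks: one prediction per effect variable
        u_next = {name: aux_models[name](u[name]) for name in u}
        # Primary network: object variable from its history and the effect values
        y_next = primary_model(y, u_next)
        for name in u:
            u[name].append(u_next[name])
        y.append(y_next)
        predictions.append(y_next)
    return predictions
```

Replacing the two `append` calls with newly measured values would turn this into the rolling scheme with true values.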
The compound network and the prediction algorithm for the multivariate time series have now been presented. In practice, the prediction length should be set, and the effect variables should be selected reasonably. The desired prediction results of the object variable can then be obtained from the historical data.

Experiment Data and Setting.
In the experiments, we focus on data prediction in the complex environment system. Two sets of environment data are chosen for testing. One is the atmospheric quality data from the monitoring system of an industrial park, and the other is the meteorological forecast data.
For the atmospheric quality data, 3240 sets of data were taken from the monitoring system of an industrial park in Hebei Province, China. The data come from different time periods, which represent different trends. The time periods are June to August 2016 (set A), September to November 2016 (set B), and December 2016 to February 2017 (set C). The monitored variables are SO₂, NO₂, CO, O₃, VOC, humidity, temperature, wind speed, atmospheric pressure, etc., and they were recorded every hour by the monitoring system. SO₂ is the main factor in the atmospheric environment management of the industrial park. Therefore, SO₂ is set as the object variable to be predicted, and the correlation degrees between the other variables and SO₂ were calculated, as shown in Figure 4. The main effect variables were then selected, including NO₂, CO, O₃, humidity, and wind speed.
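The selection of effect variables by correlation degree can be sketched as a ranking step. The text does not specify which correlation measure was used, so the absolute Pearson coefficient below is an assumption, and the names `pearson` and `select_effect_variables` are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series carries no linear correlation
    return cov / math.sqrt(var_a * var_b)


def select_effect_variables(target, candidates, k):
    """Return the names of the k candidates most correlated with the target.

    candidates maps a variable name to its series; the ranking uses the
    absolute value so that strong negative correlations also count.
    """
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(target, candidates[name])),
        reverse=True,
    )
    return ranked[:k]
```

In the setting of the paper, `target` would be the SO₂ series and `candidates` the other monitored variables.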
For the meteorological forecast data, there are 24 sets of data per day. Every set contains the meteorological factors, including the temperature, humidity, wind speed, precipitation, and atmospheric pressure. Similar to the atmospheric quality data, the most relevant variables are selected for the object variable, temperature. The effect variables are the humidity, wind speed, and precipitation.
In the setting of the prediction models, the data were first preprocessed with min-max normalization, and the output of the prediction network is denormalized accordingly. The data were divided into training, validation, and test sets in proportions of 70%, 15%, and 15%. The numbers of data in the sets are listed in Table 1.
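The preprocessing described above can be sketched as follows. Min-max scaling to the range [0, 1] and a chronological split (earliest data for training, latest for testing) are assumptions of this sketch, since the text gives only the method name and the proportions. All function names are illustrative.

```python
def minmax_fit(values):
    """Return the (minimum, maximum) used for scaling."""
    return min(values), max(values)


def normalize(values, lo, hi):
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def denormalize(values, lo, hi):
    """Invert normalize(), e.g. for the network output."""
    return [v * (hi - lo) + lo for v in values]


def chronological_split(data, train=0.70, val=0.15):
    """Split a sequence into training, validation, and test parts in order."""
    n_train = round(len(data) * train)
    n_val = round(len(data) * val)
    return (
        data[:n_train],
        data[n_train:n_train + n_val],
        data[n_train + n_val:],
    )
```

If the 3240 atmospheric samples are assumed to divide evenly into three subsets of 1080 (the text does not state this), a 15% test share gives 162 samples per subset, which agrees with the 162 test sets reported for the atmospheric data.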
In the experiments, the parameters of the network structure and training were obtained and are listed in Table 2. The networks are then trained to run the prediction algorithm in Section 3.3. The prediction results are presented in Section 4.2.
Some typical prediction methods are set as the contrast methods, including the ARIMA model, BP, RNN, and LSTM. The contrast methods cover the main types of classic statistical models and machine learning methods. In the concrete experiments, the ARIMA and RNN predict the object variable from its own history only, while the BP and LSTM are designed with multiple inputs including the object variable and effect variables.

Results of Atmospheric Quality Data.
In the experiments, 162 sets of atmospheric quality data are tested for the prediction performance. The prediction results are shown in Figure 5. According to the experiment setting, the input delay determines the historical data used, and the output delay determines the prediction steps. For the atmospheric quality data, the historical data of the latest 6 hours are used as input, and the prediction results are the SO₂ concentrations in the next 6 hours. The data are used in a rolling manner as the time step moves forward. In Figure 5, the reference true value and the prediction results of the various methods are presented with lines in different colours, and some parts are enlarged for clearer comparison.
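The windowing implied by the delays can be sketched as follows: each sample pairs the latest `n_in` values with the following `n_out` values, with 6 and 6 for the atmospheric data. The helper name `make_windows` and the stride of one time step are assumptions made for the example.

```python
def make_windows(series, n_in, n_out):
    """Build (input window, target window) pairs from one series.

    Each input holds n_in consecutive values and each target holds the
    n_out values that immediately follow; windows advance by one step.
    """
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples
```

A series shorter than `n_in + n_out` yields no samples, so at least 12 hourly values are needed before the first 6-in, 6-out pair can be formed.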
For the prediction results in Figure 5, all methods can trace the general trend of the SO₂ concentration data. The results of the ARIMA and RNN fluctuate more sharply than the others. The results of the CARN are so close to the true value that the black line seems to be hidden in the figure. For clearer comparison of the different methods, the errors are calculated and shown in Figure 6. The mean absolute error (MAE) and root-mean-squared error (RMSE) are selected as the evaluation indicators, which are listed in Table 3.
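The two evaluation indicators follow their standard definitions and can be computed as in this short sketch. The function names `mae` and `rmse` are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error: square root of the mean squared difference."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )
```

Because the RMSE squares each difference before averaging, a few large errors raise it more than they raise the MAE, which is why it is often read as an indicator of stability.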
The absolute errors show a trend similar to the prediction results in Figure 5. In general, the CARN performs more stably than the other methods, while the errors of the ARIMA and RNN change more sharply. The prediction performance can be evaluated objectively with the indicators in Table 3. The MAE is the average of the absolute values of all errors. In terms of the MAE, the CARN and LSTM perform better than the others. The MAE of the ARIMA is the largest, while the RNN and BP show a similar MAE. The RMSE reflects the overall closeness of the results to the true values and can indicate the stability of the prediction methods. The ranking of the methods by RMSE is similar to that by MAE, and the CARN is more stable than the others in prediction.

Results of Meteorological Forecast Data.
Different from the experiment on the atmospheric quality data, the input and output delays are set to 12: the latest 12 sets of data are used to predict the temperature in the next 12 hours. The data shown in Figure 7 present an obvious periodic trend. In fact, the 216 sets of data are the meteorological data of 9 days, and the temperature changes cyclically with a period of one day, so the data change rule is more distinct. The prediction results of the CARN are closer to the true value than the others. The ARIMA and RNN fluctuate because they predict only with the object variable, whereas the other methods use the object variable together with the effect variables.
The errors are calculated and presented in Figure 8 and Table 4. Figure 8 shows the errors of the different prediction results. From the prediction results in Figure 7 and the errors in Figure 8, it can be seen that all methods trace the data change rule closely because of the periodicity in the meteorological data. The errors mainly occur in the fluctuations. The maximal MAE reaches 4.43 °C in the ARIMA which is near to

Discussion
For the prediction of multivariate time series, a compound network framework is introduced, in which the structure based on the nonlinear autoregressive network and the prediction algorithm are designed. The experiments are conducted on environment data, including the atmospheric quality data and the meteorological forecast data. The prediction methods and results are discussed in this section.
Firstly, the method shows favourable short-term tracking of the data change rule. Generally, prediction methods cannot avoid divergence in the long term. No divergence appears in our prediction results; this is not because our approach is perfect but because of the prediction time setting. The prediction horizons in the experiments are 6 and 12 time steps, which belong to short-term prediction, and the actual true values are fed into the model in a rolling manner to output the future data. Therefore, the prediction results show good regression. The results indicate that the proposed method can meet the need for short-term prediction.
Secondly, the proposed method focuses on the prediction problem with multiple variables. For accurate prediction, the variables related to the target variable should be considered. In the experiments, SO₂ and temperature are set as the object variables, and the related variables are selected as the effect variables. Among the comparison methods, the ARIMA and RNN use only the object variable to predict itself, while the BP, LSTM, and CARN use multiple variables and obtain more accurate results. This indicates that the effect variables help improve the prediction performance. In the proposed method, the design of the auxiliary network meets the need of multivariate analysis.
Thirdly, the proposed method seeks a balance between prediction accuracy and computing resource occupancy. As mentioned in the related works, deep learning shows excellent performance in prediction. This is supported by the experiments, in which the result of the LSTM is similar to that of the CARN. However, the structure of the deep network is more complex than that of the NAR, which may lead to a large consumption of computing resources. In the proposed method, networks based on the NAR are combined to obtain the expected prediction accuracy, while the simple structure of the NAR reduces the computing resource demand. This balance between accuracy and computing resources benefits practical application. The proposed CARN achieves the expected effect in time-series prediction. The effect is ensured by the compound structure of the primary and auxiliary networks, which models the multivariable relation. Meanwhile, the training method of the CARN is also verified by the experimental results based on the adjustment of the network parameters.
For an objective appraisal, the performance and application of the proposed network can be extended in the future. For the network performance, the training method is derived from the framework of backpropagation through time, which is an effective and simple solution for network learning. The related works on backpropagation learning are abundant, and their improvements can be adopted within the compound network structure. For the application, the proposed network can solve direct prediction problems, such as the forecast of the weather, environment, economic market, and health management. It can also support data prediction in other complex systems indirectly. For example, the network may help predict the control parameters in a nonlinear time-delay system [24]. The prediction results will be important information for control and management.

Conclusion
For intelligent and advance management in the information era, the data-driven prediction method is studied in this paper. Considering the nonstationarity and multivariate effects in nonlinear time series, a compound prediction framework is designed based on the autoregressive neural network. Experiments on environment data are conducted to verify the performance of the method. The method shows favourable accuracy at an appropriate computational scale.
The proposed network realizes multivariate prediction. Besides, it takes the computational efficiency into account as well as the prediction performance. Furthermore, the principle of the network training in this paper is practical. It provides a feasible solution for nonlinear multivariate time series with a shallow neural network. In future work, the training method can be improved based on advanced research, and the long-term prediction performance should be improved. Moreover, the compound autoregressive network can be applied in other fields, including the direct forecasting of time series and the indirect prediction of parameters and components in complex systems.

Figure 1: Compound autoregressive network for the multivariable time-series prediction.

Figure 2: Structure design of the networks: (a) primary network; (b) auxiliary network.

Figure 3: Prediction algorithm flow for the multivariate time series.

Figure 4:

Figure 5: Prediction results of atmospheric quality data in three time periods: subset data (a) from June to August in 2016, (b) from September to November in 2016, and (c) from December in 2016 to February in 2017.

Figure 6: Errors of the prediction results of atmospheric quality data in different methods: subset data (a) from June to August in 2016, (b) from September to November in 2016, and (c) from December in 2016 to February in 2017.

Figure 7: Prediction results of meteorological forecast data.

Figure 8: Errors of the prediction results of meteorological forecast data in different methods.

Table 1: Number of data in the experiments.

Table 4 lists the error evaluation indicators MAE and RMSE.

Table 2: Parameters of the network structure and training.

Table 3: Error evaluation indicators of the prediction results of atmospheric quality data.

Table 4: Error evaluation indicators of the prediction results of meteorological forecast data.