Prediction of HFMD Cases by Leveraging Time Series Decomposition and Local Fusion

Hand, foot, and mouth disease (HFMD) is an infection that is common in children under 5 years old. Although it is usually not serious, it is one of the most widespread infectious diseases and can still be fatal, so it remains a threat to the lives and health of children and adolescents. An effective prediction model would greatly help HFMD control and prevention. Several methods have been proposed to predict HFMD outpatient cases. These methods tend to exploit the connection between cases and exogenous data, but exogenous data is not always available. In this paper, a novel method combining time series decomposition and local fusion is proposed. The Empirical Mode Decomposition (EMD) method is used to decompose the HFMD outpatient time series. Linear local predictors are applied to process the input data, and the predicted value is generated by fusing the outputs of the local predictors. The proposed model is evaluated on a real dataset and compared with state-of-the-art methods. The results show that our model is more accurate than the baseline models. Thus, the proposed model can be an effective method for HFMD outpatient prediction.


Introduction
Hand, foot, and mouth disease (HFMD) is a common infection caused by a group of viruses. It mostly affects children under 5 years old and poses a serious threat to children's health. The disease is especially damaging in developing Asian countries. China is a country with a large population and a vast territory, and development is uneven across regions, which makes it difficult to control the spread of infectious diseases. HFMD has been a nationally notifiable disease in China since 2008, and new cases must be reported within 24 hours. However, the situation is still worsening. According to data from the Chinese Centre for Disease Control and Prevention (CCDC) [1], nearly 2 million cases were reported in China in 2019, with an incidence rate of over 137/100,000. Although most HFMD cases are self-limiting, HFMD can still be fatal. Thus, the prevention and control of HFMD are very important. If health authorities could anticipate the situation before an outbreak, much unnecessary damage could be avoided.
Many methods have been proposed to predict HFMD cases. ARIMA is one of the most general time series models and has already been used for HFMD prediction [2]. ARIMAX is ARIMA with external (exogenous) variables added, and one study showed that ARIMAX performs better than ARIMA [3]. With the increase in computing power, various learning models have been applied to HFMD prediction, such as LSTM [4], RNN, and CNN-RNN [5]. These methods often attempt to learn the trend of disease spread with a single global predictor.
However, on the one hand, HFMD outpatient data is nonlinear and nonstationary. On the other hand, the spread of HFMD is affected by complex and diverse external factors, such as climate, living habits, and living conditions. These two characteristics make it difficult to improve performance with a global predictor. The relationship between target data and external factors has given researchers a new idea, and many studies have used external factors to enhance model performance. The data about external factors is called exogenous data to distinguish it from target data. In this paper, we use two other ideas to improve prediction accuracy: time series decomposition and local fusion.
Essentially, decomposition is the process of dividing a complex problem into subproblems that can be easily solved.
In our experiments, a classical method named Empirical Mode Decomposition (EMD) is used to decompose the HFMD outpatient data. This method decomposes a time series into several subseries named Intrinsic Mode Functions (IMFs) and a residual. Each IMF contains a local feature. In our study, the residual is also treated as an IMF, and all IMFs are treated equally by the local predictors.
In this paper, we propose a Concurrent Autoregression with Decomposition (CARD) model for HFMD prediction. We try to improve prediction accuracy as much as possible without exogenous data. CARD generates the predicted value by fusing the outputs of local predictors. The method uses two linear autoregression predictors to process the past outpatient data and the IMFs, respectively. Then, a fusion component fuses the outputs of the two linear predictors. Finally, a global predictor is introduced to generate the predicted result. In short, we propose an effective time series decomposition and local fusion method that achieves higher accuracy than several general methods that use only historical outpatient data.
The main contributions of this paper can be summarized as follows:
(1) We propose a novel prediction model that applies time series decomposition and local fusion to the prediction of HFMD outpatient cases.
(2) A classical decomposition method named EMD is introduced to decompose the HFMD outpatient time series. Compared with several other decomposition methods, EMD is simpler and more efficient in this study.
(3) The proposed method applies a linear weighted module to fuse the outputs of two local predictors. Each local predictor produces an output independently, and the fusion module is then trained to generate the final predicted value from the outputs of the local predictors.
The rest of the paper is organized as follows. Section 2 introduces related work. The proposed CARD model is explained in detail in Section 3. Section 4 describes the experiment design. Section 5 analyzes the experimental results. Finally, the research is summarized in Section 6.

Related Work
This section introduces several of the most commonly used decomposition methods and fusion methods related to our research.
Wavelet transform [6] inherits and develops the localization idea of the short-time Fourier transform. It is a local transform that replaces the infinitely long trigonometric basis of the Fourier transform with finite-length wavelet bases. For a signal processed by the wavelet transform, both the frequency content and its specific position in time can be obtained. Compared with the Fourier transform, it has good time-frequency localization characteristics and can extract information from signals more effectively.
RobustSTL [7,8] is a robust method for decomposing complex time series into trend, seasonality, and remainder components. This method allows for multiple seasonal, cyclic components, and multiple linear regressors with constant, flexible, seasonal, and cyclic influence.
EMD [9] is an adaptive, data-driven signal decomposition method that can process nonlinear and nonstationary signals. Unlike Fourier- or wavelet-based methods, it does not rely on predefined basis functions. Compared with most decomposition methods, EMD is easy to use: because it decomposes data according to the local features of the data itself, it does not require extra parameters to be set in advance.
EEMD [10] is a variant of EMD. In EMD, the extremum points of the signal affect the IMFs, and mode mixing occurs if the extrema are unevenly distributed. EEMD was proposed to solve this mode mixing problem. Exploiting the uniform spectral distribution of white noise, the method adds white noise to the signal to be analyzed so that components at different time scales are automatically separated into the corresponding reference scales. In other words, the added white noise supplies the missing scales, which gives good performance in signal decomposition.
In recent years, time series decomposition has been used for time series prediction in several research areas. A regression model combined with the wavelet transform was proposed to forecast the future value of the S&P 500 [11]. EMD has been used for electricity load forecasting [12]. Time series decomposition has also been applied to disease prediction. An ensemble model for chickenpox forecasting uses STL decomposition to generate the input of the model. A Wavelet-ARIMA model performed well in COVID-19 case prediction [13]. An improved EEMD algorithm was used to decompose a diarrhea time series [14]. A TDDF model uses heterogeneous data to predict HFMD cases [15].
Our study uses the HFMD outpatient time series. The spread of HFMD is easily affected by many external factors, which makes the time series difficult to process. The adaptive nature of EMD helps to overcome this problem, so in this paper we use EMD to process our input data.

Fusion Methods.
Time series forecasting has been a subject of interest in several research areas, including disease control and prevention. In practical problems, things are not isolated from each other but inextricably connected. The same goes for HFMD. Many studies have fused exogenous data to improve the accuracy of prediction.
The spread of HFMD is influenced by many external factors, such as meteorological factors (including temperature, humidity, and rapid climate change), local policies, air quality, and population [15-17]. In addition, other data sources, such as search engine query data, can help researchers make predictions [18,19].
Models that use exogenous data can be classified into two categories: stochastic methods and learning methods. Stochastic methods usually combine the past data and exogenous data linearly and then learn a linear function to obtain prediction results [16, 17, 19-23]. The main differences between these methods are the regression of target variables, the functions applied to exogenous data, and the decomposition of exogenous data. In the past few years, exogenous data has been widely used in learning methods. These methods can be roughly divided into the following three categories:
(1) Traditional learning methods using exogenous inputs. The most common models are multiple linear regression (MLR), support vector regression (SVR), and neural networks. In these methods, exogenous data is treated as an input dimension, just like the past data, and every element of the input is treated equally. To guard against data jitter, these methods need to be validated.
(2) Temporal learning methods, in which inputs of different categories are treated differently, such as [24-26]. In these methods, RNN structures capture the temporal dynamics of the input data, and a nonlinear mapping from the inputs to the target is learned from training data. To treat exogenous inputs and target inputs differently, an encoder-decoder structure is employed for time series prediction. The encoder-decoder framework consists of two RNN layers and maps an input sequence to an output sequence [27].
(3) Temporal attention learning methods. The attention mechanism is fused into sequential models to predict future values, such as TPA-LSTM [28], DA-RNN [29], HRHN [30], and LSTNet [31]. These models have a strong capacity to memorize training samples. For small-scale infection data in particular, the training loss can become very small while the accuracy is worse than that of general methods, i.e., the models do not generalize well.
Although exogenous data can help to improve accuracy, it has some unavoidable drawbacks: it takes considerable effort to collect and organize, and it is sometimes unavailable. Therefore, it is not always wise to rely on exogenous data for prediction, especially in real-time systems, where it is almost impossible to integrate the required data into the model dynamically. Considering these drawbacks, we focus on the target data itself and do not use exogenous data.

The Proposed CARD
This section formulates the problem and illustrates our approach. Figure 1 shows an overview of the proposed model. The model consists of 3 stages: data preprocessing (left), Concurrent Autoregression with Decomposition (upper right), and data postprocessing (bottom right). For any module in Figure 1, if it makes any changes in the input data, then this module will be connected to the following modules using dotted lines.
In the data preprocessing stage, the input data is the HFMD outpatient time series. The outpatient data is normalized and then segmented. Finally, each segment is decomposed into a finite number of IMFs and a residual by EMD. In CARD, the softmax function is introduced to avoid unfairness in feature extraction. Two linear autoregression components are used to mine the details of the sequence features and enhance the feature representation of the input data. The outputs of the two linear components are then fused, and another linear component is applied to generate the predicted value. In the data postprocessing stage, the final result is generated and evaluated after denormalization.

Problem Formulation and Notations.
The main notations are explained in Table 1.
Window size T. A window is a subsequence of the original data, and T is the length of the subsequence. The subsequence is the data observed in a certain interval and used to predict the value at a future time point.
IMF. Without a termination condition, the EMD algorithm would loop indefinitely. In our experiments, we set a maximum number of IMFs, denoted K, to stop the decomposition.
The problem addressed in this paper is a time series prediction task. A time series is a list of consecutive historical observation values at equal time intervals. Our goal is to predict the outpatient value of the next day.
It is a mapping from the historical observation time series and its IMFs to the future outpatient value. The symbol $y_t \in \mathbb{R}^1$ is the value at time $t$. The historical observation values with window size $T$ are denoted $[y_1, y_2, \cdots, y_T]$, and $D(y_1, y_2, \cdots, y_T)$ is the matrix obtained by decomposing the windowed time series. $\hat{y}_{T+1}$ denotes the predicted value at time $T+1$. The mapping process can be formulated as follows:
$$\hat{y}_{T+1} = F\big([y_1, y_2, \cdots, y_T],\ D(y_1, y_2, \cdots, y_T)\big) \tag{1}$$
where $F(\cdot)$ is the mapping to be learned. In this study, $[y_1, y_2, \cdots, y_T]$ denotes the HFMD outpatient values within a window of size $T$.

Wireless Communications and Mobile Computing

Data Preprocessing. Normalization.
The normalization operation scales the data into a specified range. Normalization is required to avoid the dominance of large values caused by differences in data magnitude.
Min-max normalization (0-1 normalization) is a widely used method in time series normalization. It is a linear transformation of the original data that makes the result fall into the interval [0, 1]. The relative differences between values are preserved by the linear transformation. Thus, min-max normalization is suitable for the outpatient time series in our study. The formula is as follows:
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{2}$$
where $x$ denotes one of the observed samples, $x'$ is the normalized result, $\min(x)$ is the smallest value in the samples, and $\max(x)$ is the largest.
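As a minimal sketch of the min-max step and of its inverse (used later in postprocessing), the following Python snippet scales a list of daily counts into [0, 1] and maps the values back to the original scale. The function names and the sample counts are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; also return the min and max used."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization is undefined for constant data")
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled, lo, hi


def denormalize(scaled, lo, hi):
    """Inverse of min_max_normalize: map [0, 1] values back to the original scale."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical daily outpatient counts, for illustration only.
daily_cases = [12, 30, 21, 48]
scaled_cases, case_min, case_max = min_max_normalize(daily_cases)
restored_cases = denormalize(scaled_cases, case_min, case_max)
```

Keeping the minimum and maximum alongside the scaled series is a design choice of this sketch: the same two numbers are needed later to bring predictions back to case counts.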
Segmentation. The purpose of segmentation is to transform time series data into supervised data. For a given time series $Y$ with $K$ points, the segmentation formula is as follows:
$$\begin{bmatrix} y_1 & y_2 & \cdots & y_T \\ y_2 & y_3 & \cdots & y_{T+1} \\ \vdots & \vdots & & \vdots \\ y_{K-T} & y_{K-T+1} & \cdots & y_{K-1} \end{bmatrix} \rightarrow \begin{bmatrix} y_{T+1} \\ y_{T+2} \\ \vdots \\ y_K \end{bmatrix} \tag{3}$$
where the left matrix is the input data and the right matrix is the output data.
Empirical Mode Decomposition. In this paper, we use EMD [13] for data decomposition. We perform time series decomposition on the supervised data generated by segmentation. Each sequence is decomposed into 3 IMFs and a residual.
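The segmentation step described above amounts to a sliding window over the series. The sketch below shows one way to build the input windows and the next-day targets; the function name `segment` is an assumption made for this example, and the toy series stands in for the normalized outpatient data.

```python
def segment(series, window):
    """Turn a series into (inputs, targets) pairs with a sliding window.

    Each input is `window` consecutive values; the target is the value
    that immediately follows the window.
    """
    if window <= 0 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    n_samples = len(series) - window
    inputs = [series[i:i + window] for i in range(n_samples)]
    targets = [series[i + window] for i in range(n_samples)]
    return inputs, targets


# Toy series for illustration only.
toy_series = [1, 2, 3, 4, 5, 6]
toy_inputs, toy_targets = segment(toy_series, window=3)
```

A series with K points and window size T yields K - T supervised samples, which matches the number of rows in the segmentation matrices.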
An IMF must satisfy the following requirements:
(1) In any local time scale, the number of extrema and the number of zero crossings must be equal or differ by at most 1.
(2) At any point, the mean value of the upper envelope defined by the local maxima and the lower envelope defined by the local minima is close to 0.
The procedure of the EMD algorithm is shown in Algorithm 1.
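Condition (1) can be checked numerically. The sketch below counts strict local extrema and sign changes in a sampled sequence; the helper names are hypothetical, and it tests only the counting condition over the whole sequence, not the envelope condition (2) or the full sifting procedure of Algorithm 1.

```python
import math


def count_extrema(x):
    """Count strict local maxima and minima among the interior points of x."""
    count = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            count += 1
    return count


def count_zero_crossings(x):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def satisfies_count_condition(x):
    """IMF condition (1): extrema and zero crossings differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1


# A zero-mean oscillation passes the check; the same wave shifted upward does not,
# because it keeps its extrema but never crosses zero.
wave = [math.sin(2 * math.pi * 3 * i / 300 + 0.5) for i in range(300)]
shifted = [v + 2.0 for v in wave]
```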

Concurrent Autoregression. The processing of IMF.
We use the softmax function to process the IMF features. The softmax function is an extension of the logistic function. It maps a $k$-dimensional vector of arbitrary real numbers to another $k$-dimensional real-valued vector such that each element is in the interval (0, 1) and the sum of all elements is 1. After softmax, the largest value is highlighted and the components that are far below the maximum are suppressed. The softmax function is expressed as follows:
$$\omega_i = \mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}} \tag{4}$$
where $x_i$ is the $i$-th element of the input vector, $\omega$ is the weight matrix, and $k$ is the number of output elements. The input memory $a_i$ is generated from the input vector $X$ and the weight matrix $\omega$:
$$a_i = \omega_i x_i \tag{5}$$
CARD employs a linear layer to obtain a regression result from the IMFs:
$$e_q = \omega_q A + b_q \tag{6}$$
where $A$ is the matrix of input memories $a_i$, $e_q$ is the weighted IMF feature matrix, $\omega_q$ is the weight corresponding to the input dimension, and $b_q$ is a bias value.
The processing of the HFMD outpatient data. The processing of outpatient data is essentially the same as that of the IMFs. The difference is that the softmax function is not used for normalization; only a linear component is applied to analyze the trends in the outpatient data:
$$e_y = \omega_y [y_1, y_2, \cdots, y_T] + b_y \tag{7}$$
where $e_y$ is the weighted outpatient feature matrix, $\omega_y$ is the weight corresponding to the input dimension, and $b_y$ is a bias value.
Concatenation. The CARD model combines the outputs of the two concurrent components with the cat function in PyTorch. The concatenated data is the input of the last linear module, which generates the predicted value $\hat{y}_{T+1}$. The generation of $\hat{y}_{T+1}$ is formulated as follows:
$$\hat{y}_{T+1} = \Phi\big(\omega [e_y; e_q] + b\big) \tag{8}$$
where $[e_y; e_q]$ is the concatenated vector of the outputs of the two branches, $\omega$ is the weight applied to the outputs of the two sources, $\hat{y}_{T+1}$ is the predicted number of outpatients on the next day, $b$ is a bias value, and $\Phi$ represents the activation function.
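To make the data flow of the two branches and the fusion step concrete, the following is a rough forward-pass sketch in plain Python rather than PyTorch. Several details are assumptions of this sketch and not confirmed by the text: the softmax is taken across the IMFs at each time point, the input memory is the element-wise product of the softmax weights and the IMF values, the hidden size `hidden` is a free choice, the activation defaults to the identity, and the names `card_forward` and `init_params` are hypothetical. Training is omitted.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, inputs):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def card_forward(window, imfs, params, activation=lambda z: z):
    """One forward pass: IMF branch, outpatient branch, then fusion."""
    steps = len(window)
    # IMF branch: softmax weights across IMFs at each time point (assumed axis),
    # then an element-wise product to form the input memory.
    memory = [[0.0] * steps for _ in imfs]
    for t in range(steps):
        weights_t = softmax([imf[t] for imf in imfs])
        for k, imf in enumerate(imfs):
            memory[k][t] = weights_t[k] * imf[t]
    flat_memory = [value for row in memory for value in row]
    e_q = linear(params["w_q"], params["b_q"], flat_memory)
    # Outpatient branch: a linear layer on the raw window, no softmax.
    e_y = linear(params["w_y"], params["b_y"], window)
    # Fusion: concatenate both outputs and apply the final linear layer.
    fused = e_y + e_q  # list concatenation, the analogue of torch.cat
    return activation(linear(params["w"], params["b"], fused)[0])


def init_params(steps, n_imfs, hidden, seed=0):
    """Small random weights; a stand-in for trained parameters."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_q": matrix(hidden, n_imfs * steps), "b_q": [0.0] * hidden,
        "w_y": matrix(hidden, steps), "b_y": [0.0] * hidden,
        "w": matrix(1, 2 * hidden), "b": [0.0],
    }


# Illustrative inputs: a window of T = 4 normalized values and two IMF rows.
example_window = [0.1, 0.2, 0.3, 0.4]
example_imfs = [[0.05, -0.02, 0.01, 0.03], [0.10, 0.20, 0.30, 0.35]]
example_prediction = card_forward(example_window, example_imfs, init_params(4, 2, 3))
```

In the experiments described in the paper, each window would carry three IMFs plus the residual; two rows are used here only to keep the example short.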

Data Postprocessing. Denormalization.
Denormalization is the inverse of normalization. The denormalization formula is applied to the output of our model to generate the final prediction results:
$$x = x' \big(\max(x) - \min(x)\big) + \min(x) \tag{9}$$
The detailed steps of the proposed CARD are shown in Algorithm 2.

Experimental Setup
This section describes the configuration of our experiments. Section 4.1 introduces the dataset we use. Section 4.2 gives three evaluation metrics. Section 4.3 presents the implementation of our model and the baseline models used for comparison. All experiments are conducted on real-world HFMD outpatient case time series data.

Data.
The real dataset used in our experiments is HFMD outpatient case data collected from the Xiamen Center for Disease Control and Prevention (XCDC). The dataset consists of daily records from January 1, 2012, to December 30, 2018, for a total of 2555 sample points. Figure 2 shows the time series at one-year intervals.

Metric.
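To measure the performance of our proposed model and compare it with the selected baseline models, three widely used metrics are adopted in our experiments: the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination ($R^2$). They are defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| \tag{10}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2} \tag{11}$$
$$R^2 = 1 - \frac{\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2}{\sum_{t=1}^{n} \left( y_t - \bar{y} \right)^2} \tag{12}$$
In these equations, $y_t$ is the real value at time $t$, $\hat{y}_t$ is the predicted value, $\bar{y}$ is the mean of the real values, and $n$ is the number of samples. MAE is a basic and universal metric in regression tasks. Like MAE, RMSE is expressed in the same units as the data. For $R^2$, the denominator can be understood as the degree of dispersion of the original data, and the numerator is the error between the predicted data and the original data. Dividing the two eliminates the influence of the dispersion of the original data. These three metrics can be used together to evaluate the performance of the model comprehensively and objectively.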
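The three metrics are straightforward to compute from paired lists of real and predicted values. The sketch below uses the standard definitions of MAE, RMSE, and the coefficient of determination; the function names and the example numbers are illustrative and do not come from the paper's experiments.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the real values are constant")
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
real_values = [3, 5, 7, 9]
predicted_values = [2, 5, 8, 9]
```

Lower MAE and RMSE indicate better predictions, while a higher R^2 (at most 1) indicates a better fit.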

Configuration. Parameter settings.
In our experiments, the target data is divided into two parts: a training set (80%) and a test set (20%). The batch size is set to 32. A set of experiments was run to find the best window size; the results are shown in Figure 3, and the best performance is achieved when T = 10. For each experiment, we chose the learning rate between 0.0005 and 0.002 in steps of 0.0005 to obtain the best performance of every model. We repeat each set of experiments five times and take the average value as the final result, which makes the result more stable and credible.
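The protocol above can be outlined in a few lines. In this sketch the split is assumed to be chronological (the text does not say how the 80/20 split is drawn), and `run_experiment` is a hypothetical placeholder for training and evaluating one model with a given learning rate and random seed.

```python
def split_train_test(samples, train_fraction=0.8):
    """Split samples in order: the first part for training, the rest for testing."""
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


def learning_rate_grid(low=0.0005, high=0.002, step=0.0005):
    """Candidate learning rates from low to high (inclusive) in fixed steps."""
    n_steps = int(round((high - low) / step))
    return [round(low + i * step, 6) for i in range(n_steps + 1)]


def average_over_runs(run_experiment, learning_rate, n_runs=5):
    """Repeat one experiment n_runs times and return the mean score."""
    scores = [run_experiment(learning_rate, seed) for seed in range(n_runs)]
    return sum(scores) / len(scores)


def best_learning_rate(run_experiment, n_runs=5):
    """Pick the learning rate with the lowest averaged error."""
    grid = learning_rate_grid()
    return min(grid, key=lambda lr: average_over_runs(run_experiment, lr, n_runs))
```

With the default arguments, the grid contains the four learning rates 0.0005, 0.001, 0.0015, and 0.002.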
Decomposition algorithm. RobustSTL is more suitable for long time series, and the time series we used is too short for this method. Therefore, we only consider the wavelet transform, EMD, and EEMD as candidate decomposition methods. Four comparison experiments were conducted, and the results are shown in Figure 4. "IMF3" means that the original data is decomposed into three IMFs and one residual. Both "db" and "sym" are commonly used wavelet basis functions. "db" is the abbreviation of Daubechies, and "db2" represents a wavelet of order 2. "sym" stands for symlets, and "sym3" means a wavelet of order 3. As Figure 4 shows, the wavelet transform-based approach has a time advantage, while the EEMD algorithm consumes too much computational time. Although EMD is slower than the wavelet transform, it leads in all three metrics. Thus, this configuration is applied in the formal experiments.
Algorithm 2 (recovered fragment):
e_y ← y_{i,j} using Equation (7);
for each sample x in X do
  for i ← 1 do
    for j ← 1 do
      ω ← softmax(x_{i,j}) using Equation (4);
      A ← ω and X using Equation (5);
      e_x ← A using Equation (6)

... [35], CNN-1d, and CNN-RNN [36] are selected as the baseline models. MLR is a widely used regression model in many research areas. LSTM, GRU, and ED are improved neural networks based on RNN. CNN-RNN is a hybrid model of CNN and RNN.
Experiment process. All experiments are divided into 3 groups. The main difference between the three groups is their input data. The first group uses only historical data as input, and the second group uses only the data obtained by time series decomposition. The last group takes both historical and decomposed data as input. As a result, the final group has the best performance. Details are discussed in the next section.

Results and Analysis
This section gives prediction results, comparisons, and analyses.

Effects on Decomposition and Fusion.
In this subsection, we investigate the effects of decomposition and fusion. As we can see in Figure 5, the result of three metrics shows that ...
Time series decomposition is an important part of this study. We decompose the HFMD outpatient time series into a finite number of IMFs at multiple time scales and a residual; each subsequence is then modeled and predicted separately with a local linear predictor. Each IMF carries a specific physical meaning, such as seasonality or trend, and each sequence is treated equally in the model. Compared with the original data, each IMF represents a local feature by itself. This means that predicting each sequence separately and then fusing the results may give better results than using only the raw data, and the experimental results support this.
The prediction accuracy of all baseline models increases after the HFMD outpatient case data is fused with the IMFs. This result shows the superiority of data fusion. A possible explanation is that existing models do not work well with complex time series such as IMFs, and some methods cannot capture the relation between different sequences. The data processing of CARD can be divided into two stages. In the first stage, each sequence is predicted separately, and the results are fused. In the second stage, the predicted values are obtained from the fused data, which makes it possible to analyze the relationship between the sequences. This is why we obtain better results than other models. Note that the IMFs may lose some features of the original data, and this defect is more obvious with short and complex time series. However, the fusion of IMF and case data overcomes this shortcoming. That may explain why all models improve to various degrees after fusion.

Comparison.
The main results are shown in Figure 5, and our analysis is as follows. Out of all the models, CARD performs the best, and MLR is the second-best model. Compared with MLR, CARD is slightly behind in MAE and RMSE and slightly ahead in $R^2$. In addition, CARD is ahead of the other models by at least 0.1, 0.4, and 0.1 in the three metrics, respectively. The advantages of our model are described in Section 5.1, and these should explain its leading position.
In the experiments using only decomposed data as input, several baseline models show various degrees of performance degradation, and their performance improves if the outpatient data is added to the input. However, as we can see in Figure 5, the best performance of these models is still obtained in the first set of experiments, in which the input is outpatient data only. In contrast, the performance of CNN-1d and MLR shows only small fluctuations. One possible explanation is that the EMD algorithm filters peaks in the time series while CNN-1d and MLR are insensitive to peaks, so the accuracy of these two models is not affected much. LSTM, ED, and GRU learn the temporal dependence within a time series, and since these models cannot capture the relationship between the series, the IMFs may negatively affect their prediction accuracy. CARD performs weighting at each time point and predicts each IMF separately. Finally, the model generates a result by fusion. Thus, CARD solves these problems and obtains better performance.
Although the CARD model does not make revolutionary advances, it is much less computationally intensive than most neural network models and therefore has relatively low hardware requirements. Moreover, the model still has good predictive performance when using only historical data, which means that the data needed to run the model is easily available. This further lowers the threshold for practical use of the model. Therefore, our proposed model has good prospects for practical applications.

Conclusions
Our experiments indicate that data decomposition and local fusion can improve prediction performance. In this paper, we propose a time series decomposition and local fusion model named CARD for HFMD outpatient case prediction. The main conclusions of this study are as follows:
(1) Compared with the wavelet transform and EEMD, the EMD method yields higher prediction accuracy for HFMD outpatient prediction. Therefore, EMD is suitable for the HFMD outpatient time series.
(2) The fusion model we proposed is superior to most general methods, which means that such a model still has great potential in infectious disease forecasting.
Further research is needed. In this paper, we do not test the prediction accuracy of multistep prediction. In the next step, we can try to extend our model to multistep time series prediction and to other diseases.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.