A Hybrid Grey Prediction Model for Small Oscillation Sequence Based on Information Decomposition

Grey prediction model has good performance in solving small data problem, and has been


Introduction
Big data technology is a computational strategy and method for processing large data sets. It is based on large data and has gradually become a research hotspot in recent years. However, it is sometimes difficult to obtain large data. Due to technological capabilities or historical reasons, many data sets are still small, such as unconventional energy production, short-term traffic flow, sulfur dioxide emissions, crops disaster area, and so on [1][2][3][4]. The above problems show that there are many grey systems in the real world, and the data of these grey systems are limited. Big data technology cannot effectively describe a grey system from small data.
The grey prediction model is a useful method for studying uncertain systems with partly known and partly unknown information [5,6]. At present, there are mainly two kinds of sequences suitable for the grey prediction model: one is the monotone sequence [7][8][9][10], the other is a sequence with a saturated "S" shape [11][12][13]. For other sequences, such as oscillation or fluctuation sequences, the performance of the grey prediction model is poor. However, the real world is complex. The monotonic sequence and the saturated S-shaped sequence are only two special cases, and more sequences show an oscillation characteristic [14][15][16]. Therefore, how to reasonably construct a grey prediction model for oscillation sequences has become a research trend.
Currently, the grey prediction model has made some progress in modelling oscillation sequences. These studies fall mainly into the following three aspects: (a) increasing the smoothness of oscillating sequences: the poor smoothness of the oscillation sequence is the main reason for its poor modelling accuracy, so smoothing the oscillation sequence becomes a way to improve the modelling accuracy. At present, sequence smoothness is mainly improved by sequence transformation, such as smoothness operators and amplitude compression [17][18][19][20]; (b) modelling the oscillation interval by envelope: from the perspective of range, the envelope of the oscillation sequence is modelled by a grey prediction model, and the simulation and prediction of the variation range of the oscillation sequence are realized [21][22][23]; (c) improving the structure of the grey prediction model by a periodic operator: in order to adapt to periodic sequences, scholars have introduced periodic factors based on trigonometric functions and have established periodic grey prediction models to match the periodic fluctuation of the sequence and reduce the modelling error [24,25].
The above methods can improve the modelling ability of the grey prediction model for oscillation sequences to a certain extent, but they still have some shortcomings. The sequence transformation method destroys the characteristics of the original sequence and cannot make full use of the information carried by the sequence. The envelope design is too arbitrary and generalizes weakly. The grey periodic prediction model not only increases the complexity of the model structure, but also works only for periodic and regular fluctuation sequences. When the sequence has an oscillation characteristic, the performance of the grey periodic prediction model is poor.
The oscillation sequence is composed of information at different scales, such as trend, randomness, periodicity, etc. It reflects the final result of the system under the influence of various uncertainties [26][27][28]. A single prediction method is suitable for modelling a sequence with a single time scale. It cannot simultaneously simulate and predict information at two or more time scales of the oscillation sequence, and therefore cannot achieve the intended effect.
However, preprocessing a complex sequence into simpler modes has often led to satisfactory prediction results. The empirical mode decomposition (EMD) algorithm is a multi-scale analysis method. It decomposes complex oscillation sequences into a set of sub-sequences, which contain the information of the original sequence at different time scales [29]. According to the characteristics of the sub-sequences, appropriate models are selected to simulate and predict the corresponding sub-sequences. Integrating the simulated and predicted values of the sub-sequences gives the simulated and predicted values of the original sequence.
After decomposition by the EMD algorithm, a small-sample oscillation sequence is usually split into two kinds of sub-sequences: a short-time trend sub-sequence and one or more random fluctuation sub-sequences. The GM(1,1) model, the most classical grey prediction model, needs only a small amount of data (no fewer than four points). It extracts the trend of the system through grey generation processing and thereby achieves simulation and prediction. Therefore, GM(1,1) has superior performance in modelling small trend sub-sequences. Random fluctuation sub-sequences are usually modelled by the ARMA model. Based on the above facts, we use the GM(1,1) model and the ARMA model to simulate and predict the respective sub-sequences. Depending on the result of the decomposition, there may be other kinds of sub-sequences, but trend and fluctuation sub-sequences are the most common cases. Therefore, we mainly study the general situation and specifically analyse the other situations.
In this paper, a hybrid grey model for predicting small oscillation sequences is proposed based on information decomposition. In order to verify the validity of the proposed model, we select the crops disaster area in China, which has small oscillation characteristics, as the modelling object. Comparing the simulation accuracy of the new model with that of the traditional ARIMA and GM(1,1) models shows that the new model is clearly superior to the traditional models, which supports the validity of the new model. The remainder of this paper is organized as follows. In Section 2, the principle of empirical mode decomposition is introduced. In Section 3, the EMD-ARMA-GM(1,1) prediction model is proposed. In Section 4, the modelling condition and the testing method of model errors are studied. This is followed by comparisons of the proposed model with the ARIMA and GM(1,1) models, and the proposed model is used to predict the crops disaster area in China. Then, conclusions are drawn in Section 6.
A chart showing the structure of this paper is given as Figure 1.

Empirical Mode Decomposition Principle
Empirical mode decomposition (EMD) is a signal decomposition method that does not depend on prior data and relies completely on the intrinsic characteristics of the data itself. After EMD adaptively decomposes the original data according to its intrinsic characteristics, the obtained Intrinsic Mode Functions (IMFs) reflect the inherent characteristics of the data [30]. An IMF satisfies the following two conditions at the same time: (i) in the whole data set, the number of extrema and the number of zero-crossings must either be equal or differ at most by one; (ii) at any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero [31]. The operation steps of the EMD algorithm for an oscillation sequence $x(t)$ are as follows [32]:
Step 1. Identify all the maximum points and minimum points in sequence $x(t)$, and use a cubic spline interpolation function to fit all the maximum points to form the upper envelope, and then fit all the minimum points to form the lower envelope, which are marked as $u(t)$ and $w(t)$, respectively.
Step 2. In each time period $t$, the average of the upper and lower envelopes of sequence $x(t)$ is denoted as $m_1(t)$ and is calculated as
$$m_1(t) = \frac{u(t) + w(t)}{2}.$$
Step 3. Subtract the average envelope from sequence $x(t)$:
$$h_1(t) = x(t) - m_1(t).$$
If sequence $h_1(t)$ has negative local maxima and positive local minima, then $h_1(t)$ is regarded as a new original sequence $x(t)$. Repeat the above process until $h_1(t)$ satisfies the two conditions of an IMF. It is denoted as $c_1(t)$, where $c_1(t) = h_1(t)$, which is called the first IMF component after decomposition of the original sequence $x(t)$.
Step 4. Sequence $c_1(t)$ is separated from the original sequence $x(t)$ and the residual component is obtained, which is denoted as $r_1(t)$, that is
$$r_1(t) = x(t) - c_1(t).$$
Step 5. The residual component $r_1(t)$ is regarded as a new original sequence, and the "filtering" process of Step 1 is repeated until no new IMF component can be separated. At this time, the original sequence $x(t)$ is "filtered" by the EMD algorithm to get $n$ IMFs and one residual component, where
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t).$$
An example of the empirical mode decomposition of an oscillating sequence is shown in Figure 2.
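As a rough illustration of Steps 1 to 3, the following Python sketch performs a single sifting pass. It is a simplification under stated assumptions, not the procedure used in this paper: the envelopes are piecewise-linear rather than cubic splines (which keeps the example dependency-free and usable on very short sequences), and the names `local_extrema`, `linear_envelope` and `sift_once` are hypothetical.

```python
def local_extrema(x):
    """Return indices of strict interior local maxima and minima of x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def linear_envelope(idx, x):
    """Piecewise-linear envelope through the points (i, x[i]) for i in idx.

    Assumption: linear interpolation replaces the cubic spline of Step 1,
    and the envelope is held constant outside the first and last knot.
    """
    env = []
    for t in range(len(x)):
        if t <= idx[0]:
            env.append(x[idx[0]])
        elif t >= idx[-1]:
            env.append(x[idx[-1]])
        else:
            # Interpolate between the two knots that bracket t.
            for left, right in zip(idx, idx[1:]):
                if left <= t <= right:
                    weight = (t - left) / (right - left)
                    env.append((1 - weight) * x[left] + weight * x[right])
                    break
    return env


def sift_once(x):
    """One sifting pass: envelopes, their mean m1, and the detail h1 = x - m1.

    Returns None when x has no interior maximum or minimum, i.e. when no
    further oscillating component can be separated.
    """
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        return None
    upper = linear_envelope(maxima, x)   # upper envelope (Step 1)
    lower = linear_envelope(minima, x)   # lower envelope (Step 1)
    m1 = [(u + w) / 2 for u, w in zip(upper, lower)]   # Step 2
    h1 = [xi - mi for xi, mi in zip(x, m1)]            # Step 3
    return m1, h1
```

In a full implementation, `sift_once` would be repeated on `h1` until the two IMF conditions hold, and the whole procedure would then be repeated on the residual, as described in Steps 3 to 5.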
In Equation (7), the modelled sequence is a stationary oscillation sequence; $p$ is the ACF tail order of the sequence and $q$ is its PACF tail order; $\varphi_1, \ldots, \varphi_p, \theta_1, \ldots, \theta_q$ are real parameters and can be estimated by the identification function ARMAX.
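To make the role of the ARMA orders and parameters concrete, the sketch below computes in-sample one-step-ahead values for a zero-mean ARMA model once the parameters are known. It assumes the common convention x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}; the sign convention of the paper's Equation (7) may differ. The function name `arma_one_step` is hypothetical, and parameter estimation itself (done with ARMAX in the paper) is not shown.

```python
def arma_one_step(x, phi, theta):
    """In-sample one-step-ahead values of a zero-mean ARMA(p, q) model.

    x     : observed stationary sequence
    phi   : AR coefficients [phi_1, ..., phi_p]      (assumed known)
    theta : MA coefficients [theta_1, ..., theta_q]  (assumed known)
    Terms that would need values before the start of the sample are
    dropped, which is a simplifying assumption of this sketch.
    """
    p, q = len(phi), len(theta)
    fitted, resid = [], []
    for t in range(len(x)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * resid[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        fitted.append(ar + ma)
        resid.append(x[t] - fitted[-1])   # innovation estimate e_t
    return fitted, resid
```

For an ARMA(2, 1) model, `phi` would hold two coefficients and `theta` one.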

Definition 4. Assume that the sequence is as stated in Definition 3, and consider the error sequence of the GM(1,1) model. According to the least squares method (LSM), the sum of squared errors can be minimised with respect to the parameters $a$ and $b$, from which the estimates of $a$ and $b$ are obtained.
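The least squares estimation can be illustrated with the textbook form of GM(1,1). The sketch below assumes the standard basic form x0(k) + a*z1(k) = b, where z1 is the mean sequence of the accumulated series, together with the usual time response function; the symbols `a` and `b` and the function names `gm11_fit` and `gm11_simulate` are illustrative choices and are not taken from the paper.

```python
import math


def gm11_fit(x0):
    """Estimate GM(1,1) parameters (a, b) by ordinary least squares."""
    n = len(x0)
    if n < 4:
        raise ValueError("GM(1,1) needs at least 4 data points")
    # First-order accumulated generating sequence (1-AGO).
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: mean of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zi * yi for zi, yi in zip(z, y))
    denom = m * szz - sz * sz
    # Closed-form least squares solution of y = -a * z + b.
    a = (sz * sy - m * szy) / denom
    b = (szz * sy - sz * szy) / denom
    return a, b


def gm11_simulate(x0, a, b, steps=0):
    """Simulated values for the sample plus `steps` forecasts.

    Uses the time response x1_hat(k) = (x0(1) - b/a) * exp(-a*k) + b/a
    and restores x0_hat by differencing. Assumes a != 0.
    """
    n = len(x0)
    c = x0[0] - b / a
    x1_hat = [c * math.exp(-a * k) + b / a for k in range(n + steps)]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n + steps)]
```

A typical call is `a, b = gm11_fit(series)` followed by `gm11_simulate(series, a, b, steps=4)`, where the last four returned values are forecasts.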

Complexity 6
Definition 9. Assume that $r = (r(1), r(2), \ldots, r(n))$, where $r(k) \geq 0$ for $k = 1, 2, \ldots, n$; then the following is referred to as the smoothness ratio of sequence $r$:
$$\rho(k) = \frac{r(k)}{\sum_{i=1}^{k-1} r(i)}, \quad k = 2, 3, \ldots, n.$$
The concept of smoothness ratio reflects the smoothness of a sequence. Obviously, the smoother the change of the sequence is, the smaller the smoothness ratio is.
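A small sketch of the smoothness ratio is given below. It assumes the standard grey-system definition, in which the ratio at position k is the k-th value divided by the sum of all earlier values, and it checks the usual quasi-smooth condition (ratios decreasing, and bounded by a threshold below 0.5 from the third point on). The function names and the default threshold are assumptions made for illustration.

```python
def smoothness_ratios(x):
    """Return [rho(2), ..., rho(n)] with rho(k) = x(k) / (x(1) + ... + x(k-1)).

    Assumes all values are positive so that no division by zero occurs.
    """
    ratios = []
    running = x[0]
    for value in x[1:]:
        ratios.append(value / running)
        running += value
    return ratios


def is_quasi_smooth(x, eps=0.5):
    """Check the usual quasi-smooth condition of grey system theory.

    (1) rho(k+1) / rho(k) < 1 for k = 2, ..., n-1
    (2) 0 <= rho(k) <= eps for k = 3, ..., n
    """
    rho = smoothness_ratios(x)            # rho[0] corresponds to rho(2)
    decreasing = all(rho[i + 1] / rho[i] < 1 for i in range(len(rho) - 1))
    bounded = all(0 <= value <= eps for value in rho[1:])
    return decreasing and bounded
```

For example, `is_quasi_smooth([10, 9, 8, 7])` holds for a slowly decreasing sequence, while a strongly oscillating sequence such as `[1, 5, 1, 5]` fails the check.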
In the hybrid prediction model, the EMD algorithm decomposes the original time series into an IMF sequence and a residual sequence to extract the intrinsic characteristics of the complex system. The IMF sequence is input into the ARMA model to describe the random changes, and the residual sequence is substituted into the GM(1,1) model to describe the trend. Superposing the ARMA output and the GM(1,1) output gives the simulated or predicted value of the original sequence. The flow of the EMD-ARMA-GM(1,1) model is shown in Figure 3.

Modelling Condition and Error Checking Method for the EMD-ARMA-GM(1,1) Model

Modelling Condition of the EMD-ARMA-GM(1,1) Model.
Each prediction model has a specific modelling condition and applicable range. A model can be used for prediction only when the modelling condition is satisfied.
Table 2: The quasi-smooth condition of sequence $r(t)$.

Estimated value 2 1 1.284 0.9363 0.9184

For a given threshold value, which is set according to the specific situation of the system, when the model error is smaller than the threshold, the grey model is said to be error-satisfactory.

Application
China is a large agricultural country, but its special geographical location and climate lead to frequent natural disasters, which damage a large area of crops every year. Large-scale crops disasters have seriously affected national grain security, the basic status of agriculture and the sustainable development of the rural economy. A scientific prediction of crops disaster areas can provide a reasonable reference for arranging agricultural production subsidies and disaster relief subsidies, which has positive significance for promoting the sustainable development of agriculture and China's economy. Crops disasters in China have a long history. To prevent and mitigate disasters, the Chinese government has proposed and implemented many significant policies since 2010. These policies have effectively improved the situation of crops disasters and profoundly influenced the crops disaster area in China.
The data of crops disaster area in China from 2010 to 2017, shown in Table 1, form a small oscillation sequence.
The quasi-smooth condition of the residual component is used as the criterion to test whether an oscillation sequence can be used to establish the EMD-ARMA-GM(1,1) model.

Error Checking Method for the EMD-ARMA-GM(1,1) Model.
A model's performance can be judged by testing, and only a model that passes the test can be meaningfully employed to make predictions.
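The error check can be sketched as follows. The sketch assumes that MRSPE denotes the mean relative simulation percentage error, computed as the average absolute relative error in percent over all points; the exact averaging range used in this paper may differ, and the function names and threshold argument are hypothetical.

```python
def mrspe(actual, simulated):
    """Mean relative simulation percentage error, in percent.

    Assumption: the average of |simulated - actual| / |actual| * 100 over
    all points; actual values must be non-zero.
    """
    if len(actual) != len(simulated):
        raise ValueError("sequences must have the same length")
    errors = [abs(s - a) / abs(a) * 100.0 for a, s in zip(actual, simulated)]
    return sum(errors) / len(errors)


def is_error_satisfactory(actual, simulated, threshold):
    """True when the MRSPE is below the user-chosen threshold (in percent)."""
    return mrspe(actual, simulated) < threshold
```

The threshold is problem dependent; a call such as `is_error_satisfactory(data, fitted, threshold=10.0)` accepts the model only when the average relative error is below ten percent.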

For the residual sequence $r = (r(1), r(2), \ldots, r(n))$, the smoothness ratio can be obtained from Definition 9, and its values are shown in Table 2.

Data Decomposing.
The EMD algorithm is applied to decompose the sequence of crops disaster area in China, and an IMF1 and a residual component $r(t)$ are obtained. The results are shown in Figures 4 and 5, respectively.
As can be seen from Figure 4, IMF1 is a curve oscillating around the horizontal axis, showing the linear and random characteristics of the original sequence.
In Figure 5, $r(t)$ is a monotonically decreasing curve and shows the decreasing trend characteristic of the original sequence.

Modelling.
Firstly, IMF1 is introduced into the ARMA($p$, $q$) model. By increasing the order gradually, the model captures the dependence structure of the data more closely. When the fitting effect is best, the procedure stops and yields the values of $p$ and $q$. Next, the parameter identification function ARMAX is used to estimate $\varphi_1, \ldots, \varphi_p, \theta_1, \ldots, \theta_q$. The optimal order and parameters obtained are shown in Table 3.
As shown in Table 3, the proper value of $p$ is 2 and that of $q$ is 1. So we use the ARMA(2, 1) model to simulate IMF1 and draw the simulated curve of this model based on IMF1, as shown in Figure 6.
Next, $r(t)$ is introduced into GM(1,1) and the parameters are estimated by the least squares method, as shown in Table 4. As provided in Table 4, we substitute the parameters into the whitening equation of the GM(1,1) model and get the simulated value of $r(t)$. The simulated curve of $r(t)$ is shown in Figure 7.
Finally, by integrating the simulated values of IMF1 and $r(t)$, we get the simulated value of China's crops disaster area. The simulated curve of crops disaster area in China is shown in Figure 8.

Result and Analysis.
To verify the performance of the EMD-ARMA-GM(1,1) model, we compare its MRSPE with that of traditional mainstream prediction models, including the ARIMA model and the GM(1,1) model. The simulated values, the errors and the MRSPE of the three models are presented in Table 5.
[...] more than 10%. Comparatively, the performance of the GM(1,1) model is second to that of the EMD-ARMA-GM(1,1) model because it does not consider the effect of the random oscillation characteristic; the performance of the ARMA model is the worst among the three models because it does not consider the influence of the trend characteristic. In order to clearly illustrate the simulation effects of the three models for China's crops disaster area, we draw the simulated curves and errors of the three models based on the data in Table 5 in MATLAB, as shown in Figures 9-12.
According to Figures 9-12, the performance of the EMD-ARMA-GM(1,1) model is the best among the three models. Thus it is evident that the performance of the EMD-ARMA-GM(1,1) model is better than that of traditional mainstream prediction models.

Prediction of Crops Disaster Area in China.
The EMD-ARMA-GM(1,1) model is used to predict the crops disaster area in China from 2018 to 2021, and the results are shown in Table 6.
Table 6 shows that the overall trend of crops disaster area in China is decreasing over the next four years, but the crops disaster area is still very large. By 2021, it will reach 19633390 hectares. Such a large crops disaster area may cause grain shortages and inhibit the rural economy. Therefore, in order to maintain the sustainable development of agriculture and the national economy, the Chinese government needs to develop policies for production subsidies and disaster relief subsidies, and set aside sufficient funds to deal with crop failures caused by future natural disasters.

Conclusion
In this paper, the shortcomings of the grey prediction model in modelling small oscillation sequences are analysed, and the reason why the grey prediction model is not effective in predicting oscillation sequences is identified by analysing the intrinsic characteristics of the oscillation sequence: the system behind an oscillation sequence is complex, and trend and random oscillation are often combined. Therefore, based on information decomposition and aiming at extracting the intrinsic characteristics of the sequence, a hybrid grey prediction model is established in this paper. The results of the case analysis show that the proposed model considers the complexity of system information, effectively describes the operation behaviour and rules of the system, and performs better than a single classical prediction model. The new grey hybrid prediction model provides a new idea and method for small oscillation sequences. However, when the oscillation sequence is large, big data methods such as neural networks and support vector machines can be used to simulate and predict it. In that case, the performance of the new hybrid grey prediction model needs to be compared with that of the big data methods; the simulation and prediction errors can be used to assess the methods, and the superior one is then selected to study the oscillation sequence.
In the following work, we will further consider the other characteristics of the sub-sequences generated by the EMD algorithm and establish suitable methods to study the oscillation sequence.

Data Availability
The China crop disaster area data used to support the findings of this study are included within the article.

Figure 2: The effectiveness of the EMD algorithm.

Figure 7: The simulated curve of $r(t)$.

Figure 8: The simulated curve of crops disaster area in China.

Figure 10: The simulated curve of the ARIMA model.