A Novel Linear Time-Varying GM(1,N) Model for Forecasting Haze: A Case Study of Beijing, China

Haze is the greatest challenge facing China's sustainable development, and it seriously affects China's economy, society, ecology and human health. Given the uncertainty and suddenness of haze, this paper proposes a novel linear time-varying grey model, GM(1,N), based on interval grey number sequences. Because the original GM(1,N) model based on interval grey number sequences has constant parameters, it neglects the dynamic change of the parameters over time. The novel model is therefore built on the original GM(1,N) model by introducing a linear time polynomial. To verify the validity and practicability of the model, this paper uses PM10, SO2 and NO2 concentration data for Beijing, China, from 2008 to 2018 to establish a linear time-varying GM(1,3) model based on interval grey number sequences, and compares its prediction results with those of the original GM(1,3) model. The results indicate that the novel model predicts better than the original model. Finally, the model is applied to forecast PM10 concentration in Beijing for 2019 to 2021, providing a reference for the government in carrying out haze control.


Introduction
In recent years, urbanization and industrialization have been accelerating in China, but air pollution has become increasingly serious. At present, haze is one of the most severe atmospheric pollution problems in China. Haze not only has adverse effects on the ecological environment [1][2][3] but also poses a major threat to human health [4,5]. As one of the largest developing countries in the world, China faces a major challenge in achieving sustainable development. In 2018, China's government officially issued the 'Three-Year Plan of Action to Win the Blue Sky Defense War'. Therefore, an accurate study of haze is of great importance.
Beijing is one of the key cities for haze control in China. Sixteen haze events occurred in Beijing from November 2012 to January 2013, and the minimum visibility dropped to 667 m during this period [6]. Moreover, the haze event in January 2013 may have caused 690 deaths in Beijing, corresponding to an estimated loss of 253.8 million dollars [7]. To reduce haze events, China's government implemented the 'Air Pollution Prevention and Control Action Plan' in September 2013. Since then, an increasing number of scholars have studied haze from different perspectives. Current haze studies mainly focus on health impact [3,4,7], economic loss [7], chemical composition [8], statistical characteristics [9], trend prediction [10], formation mechanism [11], and so on. These studies are mainly based on hourly or daily data, but the goal of the Blue Sky Defense War is to reduce the average annual concentration of pollutants. Therefore, it is necessary to forecast haze using annual data. Due to the uncertainty and suddenness of haze, the data of its related indices are uncertain. In system research, a grey number is an uncertain number that takes its value in an interval or a general number set [12]. Because of the constraints of data acquisition tools, acquisition conditions and errors made by acquisition personnel, measured haze values inevitably contain measurement errors within a certain range; thus, haze has obvious grey number features. Therefore, Wu et al. established, for the first time, a grey prediction model to forecast the annual concentrations of pollutants affecting haze from limited data [13]. The grey prediction model is an important part of grey system theory. As an emerging theory of uncertain systems, grey system theory is characterized by its ability to obtain accurate results from small data sets, and it has been widely applied across many fields.
Grey system theory was established by Professor Deng. It addresses uncertain systems with small data and poor information, mainly by mining the inherent laws in the information that is available. Besides, the grey number is the basic unit of a grey system [12]. In summary, the grey prediction model is an effective tool for studying haze.
At present, the original grey model, GM(1,N), based on interval grey number sequences estimates its structural parameters by the least squares method, and the resulting parameters are fixed values that do not depend on time. Although this method is simple, it ignores the dynamic change of the parameters over time, so the model may fit the data with high precision yet predict poorly. Therefore, this paper selects the PM10 concentration (PM10 refers to particles with an aerodynamic equivalent diameter less than or equal to 10 microns in ambient air, called inhalable particulate matter), SO2 concentration and NO2 concentration data in Beijing, China, from 2008 to 2018, and establishes a novel linear time-varying GM(1,3) model based on interval grey number sequences to perform simulations and make predictions. Meanwhile, we establish an original GM(1,3) model based on interval grey number sequences as the contrast model and compare the results of the two models.
This paper is organized as follows: the literature on haze and on grey prediction models is reviewed in Section 2; the modeling algorithms and model testing methods of the linear time-varying GM(1,N) model based on interval grey number sequences are illustrated in Section 3; the linear time-varying GM(1,N) model based on interval grey number sequences is established to forecast the PM10 concentration of Beijing in Section 4; the main conclusions of this paper are summarized in Section 5.

Study of Haze
To control haze more scientifically and effectively, many scholars at home and abroad have carried out extensive studies. PM2.5 (particles with an aerodynamic equivalent diameter less than or equal to 2.5 microns in ambient air, called fine particulate matter) and PM10 are the main factors causing haze; they not only affect atmospheric environmental quality and visibility but also endanger human health. Voukantsis et al. used a multilayer perceptron neural network to predict the daily change of PM10 concentration in Thessaloniki and Helsinki [14]. Kumar [20]. Xu et al. proposed a novel air quality early warning system that included an evaluation module, a prediction module and a feature estimation module, and used a novel dynamic fuzzy comprehensive evaluation method to determine the air quality level and major pollutants in the study area [21]. Han et al. predicted the spatial distribution of PM2.5-induced lung cancer in males in China from 2010 to 2015, and used the spatial autocorrelation method to evaluate the spatial relationship between the incidence of lung cancer and the atmospheric level of satellite-derived PM2.5 from 2006 to 2009 [22].
Some scholars have also studied haze with grey system theory. Xiong et al. established the multivariate grey model MGM(1,M) based on interval grey number sequences to predict the visibility and relative humidity during a haze period in Nanjing [23]. Gong et al. combined the GM(1,1) model with both the Markov chain model and the residual error correction model to establish a modified grey Markov chain model, and used this model to predict the PM2.5 concentration in Shanghai [24]. Wang studied the distribution characteristics of PM2.5 concentration in Huaian with a non-parametric hypothesis test, and predicted the PM2.5 concentration of Huaian for the next five years using the GM(1,1) model [25]. Chen et al. predicted the hourly PM2.5 and PM10 concentrations in Taichung's Dali area using several GM(1,1) models and a back-propagation artificial neural network, and compared their predictive performance [26]. Wang et al. established a novel grey correlation degree model and used it to dynamically analyze the influencing factors of haze in southern China [27].

Study of Grey Prediction Model
The grey prediction model has been widely applied in many fields, and it occupies an important position in grey system theory. The GM(1,1) model, which has only one variable, is the most widely used prediction model in grey systems, but it is best suited to monotonically increasing or monotonically decreasing time sequences. Hence, many scholars have worked to improve it. Wu et al. proposed a new GM(1,1) model with fractional order accumulation, which had better predictive performance than the traditional model [28]. Focusing on why the discrete grey model simulates a constant growth rate, Zhang et al. established a linear time-varying discrete grey model by introducing a linear time polynomial [29]. Wang introduced a time polynomial function into the GM(1,1) power model and optimized the power exponent of the model [30]. Zeng et al. built a self-adapting intelligent grey prediction model to predict the natural gas demand of China [31].
However, the GM(1,1) model ignores the effect of related factors on the system behavior data. Professor Deng therefore proposed the GM(1,N) model, which has one system behavior sequence and N-1 related factor sequences [12]. The GM(1,N) model, as an extension of the GM(1,1) model, fully considers the impact of related factors on the system behavior data. As a result, the GM(1,N) model and its optimized variants are increasingly studied and applied. Ding et al. established a novel GM(1,N) model that incorporates the changing trend of the driving term, and applied it to predict CO2 emissions from fuel combustion in China [32]. Zeng et al. proposed the optimal background-value GM(1,N) model by optimizing the background-value coefficient with the particle swarm optimization algorithm [33]. Wang et al. constructed a nonlinear GM(1,N) model by introducing the power exponent of the related factors, and used it to predict the carbon emissions of fossil energy consumption in China [34]. The above studies on grey prediction models were mainly based on real number sequences. Recently, more and more scholars have begun to explore grey prediction models based on interval grey number sequences. Ye et al. explored and expanded the axiom of generalized non-decreasing grey degrees and established a prediction model for interval grey number sequences [35]. Luo et al. established a discrete GM(1,1) model for kernel and measure sequences, and then restored the predicted value of the interval grey number [36]. Yang et al. established a prediction model based on interval grey number sequences for the case in which the uncertain information follows a normal distribution [37].
Xiong et al. established a new nonlinear GM(1,N) model based on interval grey number sequences and applied it to predicting the air quality index during haze periods [38].

The Introduction of Interval Grey Number
In this section, we introduce the basic concepts of the interval grey number, including the definitions of the interval grey number, the kernel and the grey radius.
Definition 1. [12] A grey number that has both a lower bound $a_k$ and an upper bound $b_k$ is called an interval grey number, denoted as $\otimes_k \in [a_k, b_k]$.
Definition 2. [12] Suppose that $\otimes_k$ is a continuous interval grey number; then $\tilde{\otimes}_k = (a_k + b_k)/2$ is called the kernel of the interval grey number $\otimes_k$.
Definition 3. [12] When $\otimes_k$ is a continuous interval grey number, $r(k) = (b_k - a_k)/2$ is called the grey radius of the interval grey number $\otimes_k$.
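As a small illustration of Definitions 2 and 3, the sketch below computes the kernel and grey radius for a sequence of interval grey numbers. The function name and the sample bounds are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def kernel_and_radius(lower, upper):
    """Return the kernel (a_k + b_k)/2 and grey radius (b_k - a_k)/2 of each interval."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(lower > upper):
        raise ValueError("each lower bound must not exceed its upper bound")
    kernel = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    return kernel, radius

# Made-up bounds, for illustration only.
lower = [100.0, 90.0, 80.0]
upper = [120.0, 110.0, 90.0]
kernel, radius = kernel_and_radius(lower, upper)
```

Working with the kernel and radius instead of the two bounds turns one interval-valued sequence into two ordinary real-valued sequences, which is what allows real-number grey models to be applied.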

Linear Time-Varying GM(1,N) Model Based on Interval Grey Number Sequences
In this section, the modeling mechanism of the linear time-varying GM(1,N) model based on interval grey number sequences is introduced. The model is constructed separately for the kernel and grey radius sequences, and the resulting kernel and grey radius values are then restored to interval grey number sequences. The modeling steps are shown in Figure 1: after the original data are obtained, the procedure is as follows.
Step 1: Calculate the kernel and grey radius sequences of interval grey number sequences.
Step 2: Establish the linear time-varying GM(1,N) model based on the kernel and grey radius sequences respectively, and obtain the simulated and predicted values.
Step 3: Restore the simulated and predicted values of the kernel and grey radius sequences to obtain the simulated and predicted values of the upper and lower bound of interval grey number sequences.
Step 4: Calculate the average relative error of the upper and lower bounds.
The remainder of this section introduces the linear time-varying GM(1,N) model based on kernel sequences. Let $\tilde{\otimes}_1^{(1)}$ be the first-order accumulated sequence of the system behavior kernel sequence, and let $z_1^{(1)}(k) = \frac{1}{2}\left[\tilde{\otimes}_1^{(1)}(k) + \tilde{\otimes}_1^{(1)}(k-1)\right]$, $k = 2, 3, \cdots, n$, be the mean sequence generated by consecutive neighbors of $\tilde{\otimes}_1^{(1)}$. The linear time-varying GM(1,N) model based on kernel sequences and its whitening equation are defined in terms of these sequences.
(2) When $n - 1 > 2N - 1$, that is $n > 2N$, and $B$ . . .
The proof is similar to literature [39].
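The least squares step can be sketched in Python with NumPy as follows. The model equation itself is not reproduced in this text, so the form used here, $x_1^{(0)}(k) + a z_1^{(1)}(k) = \sum_{i=2}^{N} (b_{i1} k + b_{i2}) x_i^{(1)}(k)$, is an assumption, chosen only because it has $2N - 1$ parameters and so matches the condition $n - 1 > 2N - 1$. The function names and all numbers are hypothetical.

```python
import numpy as np

def build_system(main, factors):
    """Assemble B and Y for the ASSUMED form
    x1_0(k) + a*z(k) = sum_i (b_i1*k + b_i2) * xi_1(k),  k = 2..n.
    `main` holds n values; `factors` has shape (N-1, n)."""
    main = np.asarray(main, dtype=float)
    factors = np.atleast_2d(np.asarray(factors, dtype=float))
    n = main.size
    main_acc = np.cumsum(main)                        # accumulated main sequence
    z = 0.5 * (main_acc[1:] + main_acc[:-1])          # mean sequence, k = 2..n
    k = np.arange(2, n + 1, dtype=float)
    factors_acc = np.cumsum(factors, axis=1)[:, 1:]   # accumulated factors, k = 2..n
    columns = [-z]
    for row in factors_acc:
        columns.extend((k * row, row))                # time-varying and constant parts
    return np.column_stack(columns), main[1:]

def estimate_parameters(B, Y):
    """Least squares estimate; minimum-norm solution if the system is underdetermined."""
    coef, *_ = np.linalg.lstsq(B, Y, rcond=None)
    return coef

# Made-up sequences with n = 8 and N = 3, so B has n - 1 = 7 rows and 2N - 1 = 5 columns.
main = [110.0, 105.0, 101.0, 96.0, 93.0, 90.0, 86.0, 84.0]
factors = [[30.0, 28.0, 33.0, 25.0, 27.0, 22.0, 24.0, 20.0],
           [50.0, 52.0, 47.0, 55.0, 49.0, 53.0, 46.0, 51.0]]
B, Y = build_system(main, factors)
params = estimate_parameters(B, Y)   # ordered as [a, b21, b22, b31, b32]
```

`np.linalg.lstsq` is used here instead of an explicit $(B^{T}B)^{-1}B^{T}Y$ because it also returns a sensible answer when $B$ has fewer rows than columns.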

Theorem 2.
After calculating the coefficient vector $\hat{a}$, the solution of the linear time-varying GM(1,N) model based on kernel sequences can be obtained using the initial condition $\tilde{\otimes}_1^{(1)}(1) = \tilde{\otimes}_1^{(0)}(1)$. The inverse accumulating reduction equation is then
$$\hat{\tilde{\otimes}}_1^{(0)}(k) = \hat{\tilde{\otimes}}_1^{(1)}(k) - \hat{\tilde{\otimes}}_1^{(1)}(k-1), \quad k = 2, 3, \cdots, n.$$
The modeling mechanism of the linear time-varying GM(1,N) model based on grey radius sequences is the same as that of the model based on kernel sequences, so it is not repeated here.
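The accumulation, mean-sequence and inverse-accumulation operations referred to in this section can be sketched as follows. These are the standard grey-modeling operations; the function names and sample values are hypothetical.

```python
import numpy as np

def ago(x0):
    """First-order accumulation: x1(k) = x0(1) + ... + x0(k)."""
    return np.cumsum(np.asarray(x0, dtype=float))

def mean_sequence(x1):
    """Mean of consecutive neighbors: z(k) = 0.5 * (x1(k) + x1(k-1)), k = 2..n."""
    x1 = np.asarray(x1, dtype=float)
    return 0.5 * (x1[1:] + x1[:-1])

def inverse_ago(x1):
    """Inverse accumulation: x0(1) = x1(1), x0(k) = x1(k) - x1(k-1) for k >= 2."""
    x1 = np.asarray(x1, dtype=float)
    return np.concatenate((x1[:1], np.diff(x1)))

kernel = np.array([110.0, 100.0, 85.0])   # made-up kernel values
accumulated = ago(kernel)
background = mean_sequence(accumulated)
restored = inverse_ago(accumulated)       # recovers the original sequence
```

Applying `inverse_ago` to the model's accumulated output is what turns it back into values on the original concentration scale.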
After obtaining the simulated and predicted values of the kernel and grey radius, we calculate the lower and upper bounds of the interval grey number sequences as follows [40]:
$$a_k = \tilde{\otimes}_k - r(k), \quad b_k = \tilde{\otimes}_k + r(k). \quad (5)$$
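Because the kernel is the midpoint of the interval and the grey radius is its half-width (Definitions 2 and 3), the bounds are recovered as kernel minus radius and kernel plus radius. A minimal sketch, with a hypothetical function name and made-up values:

```python
import numpy as np

def restore_bounds(kernel, radius):
    """Recover lower and upper bounds from kernel and grey radius values."""
    kernel = np.asarray(kernel, dtype=float)
    radius = np.asarray(radius, dtype=float)
    return kernel - radius, kernel + radius

# Made-up simulated kernel and radius values.
lower_hat, upper_hat = restore_bounds([110.0, 100.0, 85.0], [10.0, 10.0, 5.0])
```

Note that this sketch does not guard against a negative predicted radius, which would swap the two bounds; how to treat that case is left open here.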

Model Evaluation Criterion
To analyze the reliability and credibility of the prediction model, this section presents the criterion used to test model accuracy. The model is tested by comparing the relative errors and average relative errors of the upper and lower bounds of the interval grey number sequences.
The relative errors of the lower and upper bounds of the interval grey number sequences are
$$\Delta_a(k) = \frac{|\hat{a}_k - a_k|}{a_k} \times 100\%, \quad \Delta_b(k) = \frac{|\hat{b}_k - b_k|}{b_k} \times 100\%, \quad k = 1, 2, \cdots, n, \quad (6)$$
where $\hat{a}_k$ and $\hat{b}_k$ denote the simulated or predicted lower and upper bounds.
The average relative errors of the lower and upper bounds of the interval grey number sequences are
$$\bar{\Delta}_a = \frac{1}{n} \sum_{k=1}^{n} \Delta_a(k), \quad \bar{\Delta}_b = \frac{1}{n} \sum_{k=1}^{n} \Delta_b(k). \quad (7)$$
Prediction accuracy is an important criterion for measuring the reliability of the prediction model. Therefore, this paper provides the prediction accuracy corresponding to the average relative error in Table 1.
Table 1. The average relative error criterion for testing model [41].
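The error measures can be sketched as follows, assuming the usual definition of relative error as the absolute deviation divided by the actual value, expressed in percent. The function names and numbers are hypothetical.

```python
import numpy as np

def relative_errors(actual, simulated):
    """Pointwise relative error in percent: |simulated - actual| / |actual| * 100."""
    actual = np.asarray(actual, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    return np.abs(simulated - actual) / np.abs(actual) * 100.0

def average_relative_error(actual, simulated):
    """Mean of the pointwise relative errors, in percent."""
    return float(np.mean(relative_errors(actual, simulated)))

# Made-up upper-bound values; the same functions apply to the lower bounds.
actual_upper = [100.0, 200.0]
simulated_upper = [110.0, 190.0]
errors = relative_errors(actual_upper, simulated_upper)
mean_error = average_relative_error(actual_upper, simulated_upper)
```

The two bounds are evaluated separately, so each model yields one average relative error for the lower bound and one for the upper bound.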

Empirical Results and Discussion
In this section, the linear time-varying GM(1,N) model and the original GM(1,N) model, both based on interval grey number sequences, are established to simulate the development trend of haze in Beijing, and the model with the higher prediction accuracy is selected to forecast the haze situation in Beijing from 2019 to 2021.

Data Selection and Processing
PM10 can be formed by the interaction of sulfur oxides, nitrogen oxides and other compounds in the ambient air. Additionally, PM10 is highly correlated with SO2 and NO2 in 31 cities of China [42]. Therefore, PM10 concentration in Beijing from 2008 to 2018 is selected as the system behavior sequence, and SO2 concentration and NO2 concentration are selected as the related factor sequences. The data are taken from the annual report of air quality in Beijing. In data processing, the maximum and minimum of three consecutive observed values are taken as the upper and lower bounds of the interval grey number at the third of these time points. The interval grey number sequences for 2008 to 2014 are used as the modeling data, and the interval grey number sequences for 2015 to 2018 are used as the prediction data. We denote the interval grey number sequences of PM10 concentration, SO2 concentration and NO2 concentration as $X_1(\otimes)$, $X_2(\otimes)$ and $X_3(\otimes)$, respectively. The original data are shown in Table 2.
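The interval construction described above can be sketched as a sliding window over the annual values. This is one reading of the data-processing step, with a hypothetical function name and made-up observations rather than the Beijing data.

```python
import numpy as np

def to_interval_sequence(values, window=3):
    """Bounds at each time point from the min and max of the last `window` observations."""
    values = np.asarray(values, dtype=float)
    if values.size < window:
        raise ValueError("need at least `window` observations")
    spans = [values[i - window + 1:i + 1] for i in range(window - 1, values.size)]
    lower = np.array([span.min() for span in spans])
    upper = np.array([span.max() for span in spans])
    return lower, upper

# Made-up annual observations, for illustration only.
lower, upper = to_interval_sequence([5.0, 3.0, 4.0, 6.0, 2.0])
```

With a window of three, a series of length m yields m - 2 interval grey numbers, the first one located at the third observation.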

Establishment and Comparison of Model
Step 1: On the basis of the data from 2008 to 2014 in Table 2, the kernel and grey radius sequences of PM10 concentration, SO2 concentration and NO2 concentration are calculated according to Definitions 2 and 3, and the results are shown in Table 3.
Table 3. The kernel and grey radius sequences of PM10 concentration, SO2 concentration and NO2 concentration.
Step 2: After the model parameters are calculated by the least squares method, a linear time-varying GM(1,3) model is established for the PM10 concentration kernel sequence and, similarly, for the PM10 concentration grey radius sequence. From these two models, the simulated values of the PM10 concentration kernel and grey radius sequences are obtained. For 2015 to 2018, the SO2 concentration and NO2 concentration sequences predicted by the GM(1,1) model are used as the related factor sequences. The linear time-varying GM(1,3) model is then used to obtain the predicted values of the PM10 concentration kernel and grey radius sequences.
Step 3: According to Equation (5), the simulated and predicted values of the upper and lower bounds of the PM10 concentration interval grey number sequence are obtained by restoring the simulated and predicted values of the kernel and grey radius sequences. The results are shown in Table 4.
Step 4: The relative error and average relative error of the PM10 concentration upper and lower bounds are calculated according to Equations (6) and (7). The results are shown in Table 4. This paper compares the simulated and predicted values and the relative errors of the upper and lower bounds of the PM10 concentration interval grey number sequence. According to Table 4
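Step 2 above relies on GM(1,1) forecasts of the related factor sequences for the prediction years. The following is a minimal sketch of the standard GM(1,1) procedure, not the authors' code; the function name and the input series are hypothetical.

```python
import numpy as np

def gm11_forecast(x0, steps):
    """Fit a standard GM(1,1) model to x0 and forecast `steps` further values.
    Returns (fitted values for the input span, forecast values)."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    x1 = np.cumsum(x0)                          # first-order accumulation
    z = 0.5 * (x1[1:] + x1[:-1])                # background (mean) sequence
    B = np.column_stack((-z, np.ones(n - 1)))
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    # Time response of the whitening equation; assumes the fitted a is nonzero.
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.concatenate((x0[:1], np.diff(x1_hat)))   # inverse accumulation
    return x0_hat[:n], x0_hat[n:]

# Made-up, steadily decreasing series standing in for a pollutant kernel sequence.
series = 100.0 * 0.9 ** np.arange(6)
fitted, forecast = gm11_forecast(series, steps=4)
```

The forecast values would then take the place of the observed related factor values when the GM(1,3) model is run beyond the modeling period.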

Forecast Results and Discussion
To understand the future trend of haze in Beijing, the linear time-varying GM(1,3) model proposed in this paper is used to forecast PM10 concentration in Beijing for 2019 to 2021; the forecast results are shown in Table 5. According to Table 5, PM10 concentration in Beijing will decrease slowly from 2019 to 2021 but will still exceed China's environmental air quality standard [13]. Considering all the results, we offer the following discussion. The average predicted relative errors of the linear time-varying GM(1,3) model are smaller than those of the original GM(1,3) model, which is attributed to the model's improved adaptability to data with dynamically changing characteristics. The proposed model is simple to build and convenient to compute and apply. In addition, the novel model extends the predicted values from real numbers to interval grey numbers, which broadens its range of applications. To a certain extent, it can compensate for the errors caused by data acquisition tools, acquisition conditions and acquisition personnel. However, the novel model still has certain limitations. It only uses air pollutants as the related factor sequences and neglects the influence of meteorological factors on haze. Therefore, future research will select both air pollutants and meteorological factors as the related factor sequences to study haze.

Conclusions
To forecast haze in Beijing more accurately, this paper introduced a linear time polynomial into the GM(1,N) model based on interval grey number sequences and established a novel linear time-varying GM(1,N) model based on interval grey number sequences. The PM10, SO2 and NO2 concentration data in Beijing, China, from 2008 to 2018 were selected to establish a linear time-varying GM(1,3) model. The results indicate that the proposed model has a higher prediction accuracy than the original model, and both corresponding prediction errors are less than 5%, which demonstrates the validity and practicability of the model. When the novel model is used to forecast PM10 concentration in Beijing, the forecast shows a downward trend for 2019 to 2021. However, PM10 concentration is not only determined by air pollutants but is also related to meteorological factors such as wind speed and precipitation. Specifically, when wind speed or precipitation increases, haze in Beijing decreases [43]. Meteorological factors vary from year to year, and extreme weather events such as El Niño cannot be accurately forecast a year in advance. When an extreme weather event occurs, the actual PM10 concentration in Beijing may differ from our forecast for 2019 to 2021. If no extreme weather event occurs in a given year, the predicted haze value of this paper is expected to be accurate. Therefore, this model can provide decision support for the government when working toward greater haze control.
The model proposed in this paper achieved high prediction accuracy, but there is still much room for improvement. This paper selected only two pollutants, SO2 and NO2, as related factors to forecast PM10 concentration in Beijing. In future research, we will include meteorological factors such as wind speed, relative humidity, air pressure, temperature and precipitation as related factors when constructing the model. Besides, we will apply this novel model to several key cities in the Beijing-Tianjin-Hebei region of China. On this basis, we will use data mining technology to compare and analyze the haze of different cities in the Beijing-Tianjin-Hebei region.