A New Decomposition Ensemble Learning Approach with Intelligent Optimization for PM 2.5 Concentration Forecasting

In this study, we focus on forecasting daily PM 2.5 concentrations. Following the principle of “divide and conquer,” we propose a novel decomposition ensemble learning approach that integrates ensemble empirical mode decomposition (EEMD), artificial neural networks (ANNs), and adaptive particle swarm optimization (APSO) to forecast PM 2.5 concentrations. The approach is designed specifically to handle the difficulty of quantifying meteorological information characterized by high volatility, irregularity, and complexity. It consists of three main steps. First, EEMD decomposes the original PM 2.5 concentration time series into a specific number of independent intrinsic mode functions (IMFs) and a residual term. Second, an ANN whose connection parameters are optimized by the APSO algorithm is used to model each IMF and the residual term separately. Finally, another APSO-ANN aggregates the forecasts of the IMFs and the residual term into the final forecasting results. The empirical results show that our decomposition ensemble learning approach outperforms the benchmark models in terms of both level accuracy and directional accuracy.


Introduction
With technological development and rising living standards, environmental pollution has become increasingly serious, especially in developing countries. PM 2.5 refers to particles with a diameter of 2.5 micrometers or smaller, which can penetrate directly into the alveoli of the lungs. Compared with PM 10 (particles of 10 micrometers or less) and TSP (total suspended particulates, 100 micrometers or less), PM 2.5 is more likely to absorb hazardous and noxious substances and acts as a carrier of many kinds of toxic substances in the air. Scientific research shows that, under particular environmental conditions, nitrogen oxide and sulfur dioxide emissions can be transformed into the nitrate and sulfate ions of PM 2.5, respectively. Human exposure to PM 2.5 can lead to a variety of adverse health effects, such as cardiovascular and respiratory problems [1][2][3]. Based on their effects on the environment and human health, PM 2.5 pollution levels are divided into six grades, from excellent to serious pollution, as described in Table 1. Over the past two decades, epidemiological studies have demonstrated that particulate matter is the major air pollutant affecting human health [4], and its adverse health effects have become a well-known problem in daily life. Beyond the accumulation of dust and the reduction of visibility, its direct effect on human health through inhalation is a severe problem [5,6].
Because of the serious harm it causes, PM 2.5 has in recent years become a primary subject of study and a chief pollutant under rigorous control worldwide, especially in developed countries. Air quality monitoring systems collect large amounts of hourly or daily pollutant concentration data, which need to be analyzed with appropriate methods [7,8]. Because of serious environmental pollution, these monitoring systems show that many areas do not meet air quality standards, which may lead to serious health problems as well as ecological and economic effects. However, very few countries have a real-time air quality forecasting (RT-AQF) platform. In the United States, the public can learn about the forecast air quality index (AQI), including air pollutant concentrations and their associated health risks, through television, newspapers, radio, the Internet, and other media [9]. PM 2.5 concentration forecasting is therefore clearly necessary: it can provide early pollution warnings and help nip pollution in the bud, so that precautions and control measures can be taken as early as possible. Recently, considerable effort has gone into research on PM 2.5 concentration forecasting.
Many mathematical models have been applied to forecast PM 2.5 concentrations. According to their fundamental principles and mathematical representation, these models fall mainly into two types: empirical models and deterministic models [10,11]. Empirical models use statistics or big data technology to quantify the relationship between the AQI observed by air quality monitoring systems and meteorological parameters. Deterministic models estimate the air quality index by simulating physical and chemical reactions: they use mathematical models to describe how chemical processes occur during the transport and transformation of pollutants in the atmosphere, and then test these models to see whether they reproduce the desired results [11]. However, because meteorological parameters are complex and difficult to estimate quantitatively, there is considerable uncertainty that causes PM 2.5 concentration forecasts to deviate from reality. Therefore, compared with deterministic models, empirical models offer higher forecasting precision and better adaptability. Many empirical models, such as autoregressive integrated moving average (ARIMA), multilinear regression (MLR), and artificial neural network (ANN) models, have been applied to PM 2.5 concentration forecasting [12][13][14][15]. As a traditional statistical model, ARIMA requires continuous historical data and is best at capturing the linear patterns of a time series, especially seasonal patterns. Similarly, MLR is more suitable for linear patterns and has difficulty capturing extreme values. In contrast, ANNs, a versatile machine learning technique, can recognize noise and nonlinear patterns, including extremes, in the original data [16].
Moreover, some researchers have found that, compared with single models, hybrid empirical models can better capture both the linear and nonlinear patterns of a time series and handle extreme values effectively, ultimately improving forecast accuracy [17][18][19].
Because of their computational efficiency and forecasting accuracy, ANN models have been widely used [20,21]. Three types of artificial neural network models and a linear model have been used to forecast daily PM 2.5 concentrations in El Paso (USA) and Ciudad Juarez (Mexico) [22]. Zhu et al. proposed a hybrid model optimized by the particle swarm optimization (PSO) algorithm and obtained good performance in PM 2.5 concentration forecasting [23]. However, even when meteorological and geographical data are considered, combinations of linear and nonlinear models cannot fully capture the complexity of air quality data [24].
Fortunately, some of the problems mentioned above can be partially solved by the principle of "divide and conquer" [25]. The purpose of "divide" is to reduce the forecasting difficulty by decomposing a task into relatively easy subtasks, while the overall goal is to formulate a consensus forecast for the original data [26]. Based on this principle, several hybrid ensemble approaches have recently been proposed for difficult forecasting problems such as forecasting international crude oil prices, and empirical results show that these hybrid ensemble approaches outperform individual forecasting models [27,28]. In fact, previous research has already demonstrated the advantages of the "divide and conquer" principle. For instance, whereas holistic models may ignore some value properties and thus lead to evaluation errors, Fischer showed that decomposition methods can analyze problems and their intrinsic properties and make them more comprehensive and clear [28]. Likewise, Kleinmuntz argued that individuals have a bounded ability to process information, which may break down in the face of a large and complex system [29].
The main contribution of this study is to establish a more accurate approach to forecasting PM 2.5 concentrations and to evaluate its forecasting performance. PM 2.5 concentrations are influenced by many factors, but the way these factors act is uncertain, so this study considers only the PM 2.5 concentration series itself. Because PM 2.5 concentration time series are complex and irregular, and based on the principle of "divide and conquer," this study proposes a novel decomposition ensemble learning approach integrating EEMD, ANN, and the APSO optimization algorithm to forecast PM 2.5 concentrations in Lanzhou, China. In the proposed approach, a difficult forecasting task is divided into several relatively simple subtasks; adding this decomposition step makes the forecasting problem easier to solve and thus improves forecasting performance. Lanzhou was selected as the research area mainly because of its distinctive climate, topography, and population characteristics. In addition, the study verifies how well the proposed approach performs under different circumstances. The remainder of this article is organized as follows. Section 2 describes research data collection and preprocessing. Section 3 briefly introduces the related methods used in this study. The accuracy of the forecasting results and the validity of the proposed approach are discussed in Section 4. Finally, Section 5 concludes the paper.

Data Collection and Preprocessing
The research, analysis, and results of this paper are all based on PM 2.5 concentration data from Lanzhou, the capital city of Gansu province, which has distinctive geographical and climatic conditions. Lanzhou is located on the upper reaches of the Yellow River, at the geometric center of China's continental territory. The Yellow River runs through the city, which is sandwiched between mountains on its northern and southern banks. Lanzhou has an average altitude of 1520 m, lies at 36°03′N and 103°40′E, and is situated in the temperate zone with a semiarid climate. The PM 2.5 concentration data used in this study were obtained from the Ministry of Ecology and Environment of China (http://www.mee.gov.cn/). The daily PM 2.5 concentration data cover the period from January 1, 2017, to October 31, 2019, with a total of 1004 observations.

Related Methods
PM 2.5 concentration series exhibit high volatility, nonlinearity, and irregularity. In this study, we propose a new decomposition ensemble learning approach to forecast PM 2.5 concentrations based on the principle of "divide and conquer." The general framework of the proposed approach consists of three stages: decomposition, single forecasting, and ensemble forecasting. First, a decomposition method is used to decompose the original PM 2.5 concentration data into several meaningful components. Then, optimized forecasting methods are employed to forecast each component separately. Finally, the forecasts of all components are aggregated into the final forecasting results by means of an ensemble approach [30].
In general, different combinations of data decomposition methods, intelligent optimization algorithms, forecasting models, and ensemble approaches yield different decomposition ensemble learning approaches. In this study, we first use ensemble empirical mode decomposition (EEMD) to decompose the original PM 2.5 concentration data into a specific number of independent intrinsic mode functions (IMFs) and a residual term. Second, an artificial neural network (ANN) optimized by adaptive particle swarm optimization (APSO) is applied to forecast each IMF and the residual term separately. Finally, another APSO-ANN is employed to aggregate the forecasts of the IMFs and the residual term into the final forecasting results.
This approach is called the EEMD-based APSO-ANN ensemble learning approach. Its overall formulation process is described as follows.

Ensemble Empirical Mode Decomposition.
Empirical mode decomposition (EMD) was initially proposed by Huang et al. [31]. To overcome the mode mixing problem of EMD, Wu and Huang presented ensemble empirical mode decomposition (EEMD) [32]. Compared with traditional decomposition methods such as wavelet decomposition and Fourier decomposition, EMD and EEMD are self-adaptive algorithms. Because they rely on the local characteristics of the data, they can identify all the oscillatory modes present, and they decompose a signal into several intrinsic mode functions according to its characteristic time scales.
In recent years, EMD and EEMD have been widely applied to decompose complex time series and to model complex systems [31][33][34][35]. This study chooses EEMD as the data decomposition method. The EMD and EEMD methods are introduced below.
EMD is an adaptive time series decomposition technique for processing nonlinear and nonstationary signals, and it is the first stage of the Hilbert-Huang transform (HHT) [36]. Because real data are complex, the method assumes that the data may contain different oscillation modes simultaneously. A signal is decomposed into a number of intrinsic mode functions (IMFs) by a local sifting method, and the time-frequency spectra of the IMFs are then obtained by the Hilbert transform. Each IMF must satisfy the following two conditions: (1) over the entire time domain, the number of local extrema and the number of zero crossings must be equal or differ by at most one; (2) at any time point, the mean of the upper envelope (defined by the local maxima) and the lower envelope (defined by the local minima) must be zero.
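The two IMF conditions can be checked numerically. The following Python sketch is an illustration, not the authors' implementation; it assumes NumPy and SciPy, counts extrema and zero crossings for condition (1), and approximates condition (2) by requiring the mean of the cubic-spline envelopes to stay within a hypothetical tolerance `tol` of the signal's peak amplitude, evaluated only between the first and last extrema to avoid spline edge effects.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def local_extrema(c):
    """Indices of interior local maxima and minima."""
    d = np.diff(c)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima

def is_imf(c, tol=0.05):
    """Approximate test of the two IMF conditions; tol is an assumed threshold."""
    c = np.asarray(c, dtype=float)
    maxima, minima = local_extrema(c)
    n_extrema = len(maxima) + len(minima)
    n_zero = int(np.sum(np.signbit(c[:-1]) != np.signbit(c[1:])))
    # Condition (1): extrema and zero crossings equal or differ by at most one.
    if abs(n_extrema - n_zero) > 1:
        return False
    if len(maxima) < 2 or len(minima) < 2:
        return False                      # too short to build both envelopes
    lo = max(maxima[0], minima[0])
    hi = min(maxima[-1], minima[-1])
    if hi <= lo:
        return False
    t = np.arange(lo, hi + 1)             # interior only, avoiding spline edge effects
    upper = CubicSpline(maxima, c[maxima])(t)
    lower = CubicSpline(minima, c[minima])(t)
    # Condition (2), approximated: the envelope mean stays close to zero.
    return bool(np.max(np.abs(upper + lower) / 2) <= tol * np.max(np.abs(c)))
```

A pure sinusoid passes both checks, while the same sinusoid shifted upward by a constant fails condition (1) because it no longer crosses zero.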
These two conditions define a meaningful IMF. According to this definition, any complicated data series $x_t$ ($t = 1, 2, \ldots, T$) can be decomposed by the following sifting process:
(1) Find all the local extrema of the data series $x_t$.
(2) Use cubic spline interpolation to construct the upper and lower envelopes $x_{up,t}$ and $x_{low,t}$ through the local maxima and minima, respectively, and calculate their mean: $m_t = (x_{up,t} + x_{low,t})/2$.
(3) Subtract the envelope mean from the data series and denote the result by $c_t$: $c_t = x_t - m_t$. Check whether $c_t$ satisfies the two IMF conditions above; if it does not, replace $x_t$ with $c_t$ and repeat Steps 1-3.
(4) Once $c_t$ is an IMF, extract it and replace $x_t$ with the residual $r_t = x_t - c_t$. Repeat Steps 1-3 until the stopping criterion is satisfied (for example, when the residual becomes a monotonic function from which no further IMF can be extracted).
Through this sifting process, the original data series $x_t$ is finally decomposed into a sum of IMFs and a residue term:

$$x_t = \sum_{j=1}^{n} c_{j,t} + r_{n,t}$$

where $n$ is the number of IMFs, $r_{n,t}$ is the final residue term, and $c_{j,t}$ ($j = 1, 2, \ldots, n$) is the $j$th IMF.
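A compact sketch of the sifting procedure in Python, assuming NumPy and SciPy's `CubicSpline`. Two simplifications are assumptions of this sketch rather than part of the method described above: each IMF is sifted a fixed number of times (`n_sift`) instead of being tested against the IMF conditions, and both envelopes are pinned to the series end points to limit spline overshoot. Decomposition stops when the residue has too few extrema to build envelopes, which approximates the monotonic-residue stopping rule.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def envelope_mean(x):
    """Step 2: mean of the upper/lower cubic-spline envelopes, or None if
    x has too few local maxima or minima to build both envelopes."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    last = len(x) - 1
    # Assumption: pin both envelopes to the end points to limit spline overshoot.
    up = np.r_[0, maxima, last]
    low = np.r_[0, minima, last]
    return (CubicSpline(up, x[up])(t) + CubicSpline(low, x[low])(t)) / 2

def emd(x, max_imfs=10, n_sift=10):
    """Return (imfs, residue) such that imfs.sum(axis=0) + residue == x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        if envelope_mean(residue) is None:   # residue has (nearly) no oscillation left
            break
        h = residue.copy()
        for _ in range(n_sift):              # Steps 1-3, repeated a fixed number of times
            m = envelope_mean(h)
            if m is None:
                break
            h = h - m                        # c_t = x_t - m_t
        imfs.append(h)                       # Step 4: extract the IMF ...
        residue = residue - h                # ... and continue on r_t = x_t - c_t
    return np.array(imfs).reshape(-1, len(residue)), residue
```

Because each IMF is subtracted from the running residue, the IMFs and the final residue always sum back to the input series, which is exactly the sum-of-IMFs-plus-residue decomposition stated above.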

Discrete Dynamics in Nature and Society
Even though EMD is a fully data-driven and self-adaptive decomposition method, it has an obvious drawback: mode mixing. To address this problem, Wu and Huang proposed the EEMD technique [32]. EEMD builds on EMD and resolves the mode mixing caused by intermittent noise by adding white noise to the original time series before decomposition. In this way, EEMD preserves the information of the original data while overcoming mode mixing. The sifting steps of EEMD are as follows:
(1) Add a white noise series $w_t$ to the original data series $x_t$ to obtain $X_t = x_t + w_t$.
(2) Decompose $X_t$ with EMD to obtain a series of IMFs: $X_t = \sum_{j=1}^{n} c_{j,t} + r_{n,t}$.
(3) Repeat Steps 1 and 2 with a different white noise series $w_{i,t}$ in each trial ($i = 1, 2, \ldots, N$), obtaining the corresponding components $X_{i,t} = x_t + w_{i,t} = \sum_{j=1}^{n} c_{i,j,t} + r_{i,n,t}$.
(4) Take the ensemble means of the corresponding IMFs as the final results: $c_{j,t} = \frac{1}{N}\sum_{i=1}^{N} c_{i,j,t}$.
Wu and Huang demonstrated that the effect of the added noise is controlled by the following statistical rule [32]:

$$\varepsilon_n = \frac{\varepsilon}{\sqrt{N}}$$

where $N$ is the number of ensemble trials, $\varepsilon$ is the amplitude of the added noise, and $\varepsilon_n$ is the final standard deviation of the error between the original data series and the sum of the corresponding IMFs. In practice, $N$ is often set to 100, and the amplitude $\varepsilon$ of the added white noise is set to 0.1 or 0.2 [33].
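A minimal Python sketch of EEMD, assuming NumPy and SciPy. It is not the authors' code: it embeds a simplified EMD (`emd_fixed`, a hypothetical helper) that always extracts a fixed number of IMFs with a fixed number of sifting iterations, so that the IMFs of every noisy trial line up and can be averaged. The noise amplitude is taken as a fraction (`noise_ratio`) of the standard deviation of the data, an assumed convention that matches the 0.1 to 0.2 settings mentioned above.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _mean_envelope(h):
    """Mean of the cubic-spline envelopes; zeros if h has too few extrema."""
    d = np.diff(h)
    mx = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    mn = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    if len(mx) < 2 or len(mn) < 2:
        return np.zeros_like(h)
    t = np.arange(len(h))
    up = np.r_[0, mx, len(h) - 1]          # envelopes pinned to the end points
    lo = np.r_[0, mn, len(h) - 1]
    return (CubicSpline(up, h[up])(t) + CubicSpline(lo, h[lo])(t)) / 2

def emd_fixed(x, n_imfs, n_sift=10):
    """Hypothetical simplified EMD returning exactly n_imfs IMFs plus a residue."""
    residue, imfs = x.copy(), []
    for _ in range(n_imfs):
        h = residue.copy()
        for _ in range(n_sift):
            h = h - _mean_envelope(h)
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue

def eemd(x, n_imfs=6, trials=100, noise_ratio=0.2, seed=0):
    """EEMD Steps 1-4: average the IMFs of `trials` noise-added copies of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    eps = noise_ratio * x.std()             # noise amplitude (assumed convention)
    imf_sum = np.zeros((n_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(trials):
        noisy = x + eps * rng.standard_normal(len(x))   # X_{i,t} = x_t + w_{i,t}
        imfs, res = emd_fixed(noisy, n_imfs)
        imf_sum += imfs
        res_sum += res
    return imf_sum / trials, res_sum / trials   # c_{j,t} = (1/N) sum_i c_{i,j,t}
```

Because each trial reconstructs its noisy copy exactly, the averaged IMFs plus residue differ from $x_t$ only by the mean of the added noise, whose standard deviation is about $\varepsilon/\sqrt{N}$. This is the Wu and Huang rule in action: quadrupling the number of trials roughly halves the leftover noise.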

Artificial Neural Networks.
Artificial neural networks (ANNs) are widely applied in air pollution forecasting because they can build flexible models for various nonlinear problems. Compared with other types of nonlinear models, ANNs are universal approximators that can estimate a large class of functions with high reliability and accuracy. In addition, ANN models are largely driven by the characteristics of the data during model building, so they do not require prior assumptions about the model form. A neural network architecture usually consists of an input layer, a hidden layer, and an output layer [37]. The input layer receives the data fed into the network, and the output layer produces the output results. The hidden layer, between the input and output layers, consists of a number of neurons (hidden units) arranged in parallel. Mathematically, the hidden neuron $h_j$ is described as follows [38]:

$$h_j = \varphi\left(\sum_{i} w_{ij} x_i + b_j\right)$$

where $\varphi(\cdot)$ is the activation function, usually chosen as the logistic sigmoid function $\varphi(x) = 1/(1 + e^{-x})$, $w_{ij}$ is the weight of input $x_i$ at neuron $j$, and $b_j$ is the bias of neuron $j$. The network output $f(x)$, fitted on the $l$ training samples $(x_i, y_i)_{i=1}^{l}$, is expressed as:

$$f(x) = w_0 + \sum_{j=1}^{q} w_j h_j$$

where $w_j$ ($j = 0, 1, 2, \ldots, q$) are the connection weights, $q$ is the number of hidden nodes, and $f(\cdot)$ is a function determined by the network structure and the connection weights. In this study, the ANN architecture is a backpropagation neural network (BPNN), one of the most popular and effective forecasting techniques. A BPNN is a three-layer feedforward network trained with the backpropagation (BP) algorithm. Details of the BPNN can be found in [38].
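To make the hidden-neuron and output expressions concrete, here is a minimal NumPy sketch of a three-layer BPNN with a sigmoid hidden layer and a linear output $f(x) = w_0 + \sum_j w_j h_j$ (the linear output is an assumption of this sketch), trained by plain full-batch gradient descent on the mean squared error. The learning rate, epoch count, and initialization scale are illustrative choices, not values from the study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class BPNN:
    """Three-layer feedforward network: sigmoid hidden layer, linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.5, (n_in, n_hidden))   # w_ij
        self.b = np.zeros(n_hidden)                       # b_j
        self.w = rng.normal(0.0, 0.5, n_hidden)           # w_j, j = 1..q
        self.w0 = 0.0                                     # w_0

    def forward(self, X):
        H = sigmoid(X @ self.W + self.b)    # h_j = phi(sum_i w_ij x_i + b_j)
        return H, self.w0 + H @ self.w      # f(x) = w_0 + sum_j w_j h_j

    def fit(self, X, y, lr=0.2, epochs=3000):
        n = len(y)
        for _ in range(epochs):
            H, out = self.forward(X)
            err = out - y                                   # gradient of 0.5*MSE w.r.t. output
            grad_w, grad_w0 = H.T @ err / n, err.mean()
            dH = np.outer(err, self.w) * H * (1.0 - H)      # backpropagate through the sigmoid
            grad_W, grad_b = X.T @ dH / n, dH.mean(axis=0)
            self.w -= lr * grad_w
            self.w0 -= lr * grad_w0
            self.W -= lr * grad_W
            self.b -= lr * grad_b
        return self

    def predict(self, X):
        return self.forward(X)[1]
```

For example, `BPNN(n_in=3, n_hidden=8).fit(X, y)` would train on a matrix `X` whose rows hold three lagged values. In the APSO-ANN models of this study the weights and thresholds are set by APSO rather than by random initialization.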

Adaptive Particle Swarm Optimization.
Particle swarm optimization (PSO) is a heuristic search algorithm based on swarm intelligence and has been widely used to solve various problems. PSO simulates the way a flock of birds updates its positions while searching for food. It first initializes a group of particles in the solution space, each representing a potential optimal solution. Each particle is characterized by three quantities: position, velocity, and fitness. At each iteration, particles update their positions by tracking the individual best position ($P_{best}$) and the global best position ($G_{best}$). Because the basic PSO algorithm is prone to premature convergence, an adaptive particle swarm optimization (APSO) algorithm has been proposed to overcome its low precision and avoid premature convergence [39].
Suppose that a population of $n$ particles, $X = (X_1, X_2, \ldots, X_n)$, searches a $D$-dimensional space, where $X_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]^T$ denotes the position of the $i$th particle and represents a potential solution of the problem. The fitness value of each particle is calculated from the objective function. The velocity of the $i$th particle is $V_i = [V_{i1}, V_{i2}, \ldots, V_{iD}]^T$; $P_i = [P_{i1}, P_{i2}, \ldots, P_{iD}]^T$ denotes the best previous position of the $i$th particle (the one with the best fitness value), and $P_g = [P_{g1}, P_{g2}, \ldots, P_{gD}]^T$ denotes the best position found by all particles in the population. Each particle updates its velocity and position as

$$V_{id}^{t+1} = w_t V_{id}^{t} + c_1 r_1 \left(P_{id} - x_{id}^{t}\right) + c_2 r_2 \left(P_{gd} - x_{id}^{t}\right)$$

$$x_{id}^{t+1} = x_{id}^{t} + V_{id}^{t+1}$$

where $w_t$ is the adaptive inertia weight, $c_1$ and $c_2$ are nonnegative constants called acceleration factors, and $r_1$ and $r_2$ are random numbers uniformly distributed in $[0, 1]$. The inertia weight is adjusted adaptively [39]; in this adjustment, $\lambda$ and $\theta$ are constraint factors in the range $[0, 1]$, and $w_{min}$ is the minimum inertia weight. $f(\cdot)$ is the fitness function, which in this study measures the error between $y_i$ and $\hat{y}_i$, the actual and forecast values of PM 2.5 concentration, respectively [37].
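The particle updates can be sketched in Python as follows. Because the exact adaptive rule of [39] (with its constraint factors λ and θ) is not reproduced here, this sketch substitutes a common fitness-based adaptive inertia weight as a stand-in: particles with below-average fitness receive a weight between `w_min` and `w_max` in proportion to how far they are from the current best, and the remaining particles receive `w_max`. The acceleration factors, velocity clamp, and search bounds are illustrative assumptions.

```python
import numpy as np

def apso(fitness, dim, n_particles=30, iters=200, c1=1.5, c2=1.5,
         w_min=0.4, w_max=0.9, bound=5.0, seed=0):
    """Minimize `fitness` over [-bound, bound]^dim; returns (G_best, its fitness)."""
    rng = np.random.default_rng(seed)
    v_max = 0.2 * bound                          # assumed velocity clamp
    x = rng.uniform(-bound, bound, (n_particles, dim))
    v = np.zeros((n_particles, dim))
    f = np.array([fitness(p) for p in x])
    p_best, p_fit = x.copy(), f.copy()           # individual bests (P_i)
    g = int(p_fit.argmin())
    g_best, g_fit = p_best[g].copy(), p_fit[g]   # global best (P_g)
    for _ in range(iters):
        # Stand-in adaptive inertia weight (an assumption, not the rule of [39]).
        f_min, f_avg = f.min(), f.mean()
        w = np.full(n_particles, w_max)
        good = f <= f_avg
        if f_avg > f_min:
            w[good] = w_min + (w_max - w_min) * (f[good] - f_min) / (f_avg - f_min)
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w[:, None] * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, -bound, bound)
        f = np.array([fitness(p) for p in x])
        improved = f < p_fit                     # update individual bests
        p_best[improved] = x[improved]
        p_fit[improved] = f[improved]
        g = int(p_fit.argmin())
        if p_fit[g] < g_fit:                     # update the global best
            g_best, g_fit = p_best[g].copy(), p_fit[g]
    return g_best, float(g_fit)
```

To optimize an ANN with it, one would flatten all weights and thresholds into a single vector of length `dim` and pass a fitness function that returns the training error of the network built from that vector, which is the role APSO plays in the APSO-ANN models of this study.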

The Framework of Our Proposed Approach.
Given a time series $x_t$ ($t = 1, 2, \ldots, T$), we aim to make an m-step-ahead forecast $\hat{x}_{t+m}$. In this study, we apply the iterative forecasting method, which for $m \le l$ can be represented as:

$$\hat{x}_{t+m} = f\left(\hat{x}_{t+m-1}, \ldots, \hat{x}_{t+1}, x_t, x_{t-1}, \ldots, x_{t+m-l}\right)$$

where $\hat{x}_{t+m}$ is the forecast value, $x_t$ is the actual value, and $l$ denotes the lag order; for $m = 1$ this reduces to $\hat{x}_{t+1} = f(x_t, x_{t-1}, \ldots, x_{t-l+1})$. In an ANN, the initial weights and thresholds play an important role in learning and optimizing the network [40]. However, these parameters are ordinarily generated randomly at the start and then adjusted throughout training. Hence, the APSO algorithm is applied to determine the thresholds and weights of the ANN, as shown in Figure 1. Moreover, the time series inevitably contains noise and other uninformative components. Therefore, our proposed EEMD-based APSO-ANN ensemble learning approach is established to forecast PM 2.5 concentrations in Lanzhou, China.
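The iterative scheme can be sketched in Python with NumPy: `make_lagged` builds input windows of length $l$ with next-step targets, and `iterative_forecast` feeds each new forecast back into the window to reach $m$ steps ahead. Both helper names are hypothetical, and `predict_one` stands for any fitted one-step model (an APSO-ANN in this study).

```python
import numpy as np

def make_lagged(x, l):
    """Input rows [x_{t-l+1}, ..., x_t] (oldest first) with targets x_{t+1}."""
    x = np.asarray(x, dtype=float)
    X = np.array([x[i:i + l] for i in range(len(x) - l)])
    return X, x[l:]

def iterative_forecast(predict_one, history, l, m):
    """m-step-ahead forecasts: each forecast is fed back as the newest input."""
    window = list(np.asarray(history, dtype=float)[-l:])
    forecasts = []
    for _ in range(m):
        x_next = float(predict_one(np.array(window)))
        forecasts.append(x_next)
        window = window[1:] + [x_next]   # drop the oldest value, append the forecast
    return np.array(forecasts)
```

Any one-step model can be plugged in; for instance, with the coefficients `coef` of a least-squares autoregression (plus intercept) fitted on the output of `make_lagged`, one could pass `predict_one = lambda w: w @ coef[:-1] + coef[-1]`. The iterative design reuses a single one-step model for every horizon, at the cost of compounding its errors as $m$ grows.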
Based on the framework in Figure 2 and previous research, this study establishes a novel decomposition ensemble learning approach integrating EEMD and APSO-ANN for PM 2.5 concentration forecasting. As shown in Figure 2, the proposed EEMD-APSO-ANN-APSO-ANN decomposition ensemble learning approach consists of three main steps:
(1) The original PM 2.5 concentration time series $x_t$ ($t = 1, 2, \ldots, T$) is decomposed by EEMD into $n$ IMFs $c_{j,t}$ ($j = 1, 2, \ldots, n$) and one residual component $r_{n,t}$.
(2) An APSO-ANN model is built to forecast each IMF and the residual component separately.
(3) Another APSO-ANN model aggregates the forecasts of all IMFs and the residual component into the final forecasting results.
In short, following the principle of "divide and conquer," the proposed approach can be described as the general framework "EEMD (decomposition)-APSO-ANN (single forecasting)-APSO-ANN (ensemble forecasting)." To verify its effectiveness, PM 2.5 concentration data collected in Lanzhou are used as the test case, as discussed in detail in the next section.
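The three-step "divide and conquer" structure can be sketched generically in Python with NumPy. This is not the EEMD-APSO-ANN-APSO-ANN implementation: to keep the example short and self-contained, a centered moving-average split (hypothetical `decompose`) stands in for EEMD, a least-squares autoregression stands in for each APSO-ANN single forecaster, and least-squares combination weights stand in for the APSO-ANN ensemble. The simple-addition (ADD) ensemble is computed alongside for comparison. Swapping in the real components would change only the helper functions.

```python
import numpy as np

def decompose(x, window=7):
    """Stand-in for EEMD: centered moving-average trend plus remainder.
    The components sum exactly to x, as the IMFs and residue do in EEMD."""
    pad = window // 2
    kernel = np.ones(window) / window
    trend = np.convolve(np.pad(x, pad, mode="edge"), kernel, mode="valid")
    return [x - trend, trend]

def forecast_component(series, l=3):
    """Stand-in for APSO-ANN: least-squares AR(l) with intercept.
    Returns in-sample one-step fits (aligned with series[l:]) and the next value."""
    X = np.array([series[i:i + l] for i in range(len(series) - l)])
    A = np.c_[X, np.ones(len(X))]
    coef = np.linalg.lstsq(A, series[l:], rcond=None)[0]
    return A @ coef, np.r_[series[-l:], 1.0] @ coef

def decomposition_ensemble(x, l=3):
    x = np.asarray(x, dtype=float)
    components = decompose(x)                                  # Step 1: decomposition
    results = [forecast_component(c, l) for c in components]  # Step 2: single forecasts
    F = np.column_stack([fit for fit, _ in results])           # in-sample component forecasts
    f_next = np.array([nxt for _, nxt in results])
    target = x[l:]
    # Step 3a: simple addition (ADD) ensemble.
    add_next = f_next.sum()
    sse_add = np.sum((F.sum(axis=1) - target) ** 2)
    # Step 3b: learned ensemble (stand-in for the APSO-ANN combiner).
    A = np.c_[F, np.ones(len(F))]
    w = np.linalg.lstsq(A, target, rcond=None)[0]
    learned_next = np.r_[f_next, 1.0] @ w
    sse_learned = np.sum((A @ w - target) ** 2)
    return add_next, learned_next, sse_add, sse_learned
```

The learned ensemble is fitted on the in-sample component forecasts with the original series as the target. Because its weights include the ADD solution (all weights 1, zero intercept) as a special case, its in-sample error can never exceed that of ADD; whether that advantage carries over to the test period is an empirical question, which is what comparisons like those in Table 3 address.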

Empirical Study
In this study, the sample data are divided into two subsets: a training subset and a testing subset. Data from January 1, 2017, to September 30, 2019, with 974 observations, form the training subset used for model training. Data from October 1, 2019, to October 31, 2019, with 31 observations, form the testing subset used to evaluate forecasting performance. In addition, the values of the past 1, 2, 3, 4, and 5 days (lag orders 1 to 5) are used in turn as the input form to forecast the next day's PM 2.5 concentration, and the input form with the minimum forecasting error is chosen as the optimal input structure.
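The lag-order selection described above can be sketched as follows, assuming NumPy. A least-squares autoregression with intercept stands in for the ANN, and `select_lag` (a hypothetical helper) fits each candidate lag order on the chronological training portion, makes one-step forecasts over the last `n_test` points from actual past values, and returns the lag with the smallest test MSE.

```python
import numpy as np

def lagged_design(x, l):
    """Rows [x_{t-l+1}, ..., x_t, 1] with targets x_{t+1}."""
    X = np.array([x[i:i + l] for i in range(len(x) - l)])
    return np.c_[X, np.ones(len(X))], x[l:]

def select_lag(x, n_test, lags=range(1, 6)):
    """Chronological split; returns (best lag order, {lag: test MSE})."""
    x = np.asarray(x, dtype=float)
    scores = {}
    for l in lags:
        A, y = lagged_design(x, l)
        split = len(y) - n_test          # the last n_test targets form the test period
        coef = np.linalg.lstsq(A[:split], y[:split], rcond=None)[0]
        pred = A[split:] @ coef          # one-step forecasts from actual lagged values
        scores[l] = float(np.mean((y[split:] - pred) ** 2))
    return min(scores, key=scores.get), scores
```

Every lag order is scored on the same test targets, so the MSE values are directly comparable. Choosing the input form on the testing subset mirrors the procedure described here; reserving a validation window inside the training data would keep the test error an unbiased estimate.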

Evaluation Criteria of Forecasting Performance.
In this study, two criteria are used to evaluate the level forecasting performance of the proposed decomposition ensemble learning approach: the mean square error (MSE) and the mean absolute percentage error (MAPE). Smaller values indicate better forecasting performance. The criteria are defined as follows [41]:

$$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(Y_t - \hat{Y}_t\right)^2$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{Y_t - \hat{Y}_t}{Y_t}\right| \times 100\%$$

where $N$ is the number of observations, $Y_t$ is the actual PM 2.5 concentration in period $t$, and $\hat{Y}_t$ is the forecast value for the same period. In addition, we consider directional forecasting accuracy, measured by the directional statistic:

$$D_{stat} = \frac{1}{N}\sum_{t=1}^{N} a_t \times 100\%$$

where $a_t = 1$ if $(Y_{t+1} - Y_t)(\hat{Y}_{t+1} - Y_t) \geq 0$, and $a_t = 0$ otherwise. A larger $D_{stat}$ indicates better directional accuracy.
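The three criteria translate directly into NumPy. This sketch is illustrative; it assumes no actual value is zero (MAPE divides by $Y_t$) and computes $D_{stat}$ over the $N-1$ consecutive pairs for which a direction is defined.

```python
import numpy as np

def mse(y, y_hat):
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error in percent; assumes no zero actuals."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def d_stat(y, y_hat):
    """Directional statistic in percent over the N-1 consecutive pairs:
    a_t = 1 when (y_{t+1} - y_t) * (y_hat_{t+1} - y_t) >= 0."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    a = (y[1:] - y[:-1]) * (y_hat[1:] - y[:-1]) >= 0
    return float(np.mean(a) * 100.0)
```

For example, with actual values `[100, 110, 105, 120]` and forecasts `[98, 112, 112, 115]`, `mse` gives 20.5 and `d_stat` gives about 66.7, because the forecast for the third period moves up from the previous actual value while the actual value moves down.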

Empirical Results.
In the proposed EEMD-APSO-ANN-APSO-ANN decomposition ensemble learning approach, the first step applies EEMD to decompose the original PM 2.5 concentration series into several independent IMF components and one residue term. In this study, the number of ensemble members is set to 100, and the standard deviation of the white noise added in each ensemble member is 0.2. The IMF components are sorted from the highest to the lowest frequency, and the last component is the residue term. The decomposition results for the original PM 2.5 concentrations in Lanzhou, China, are shown in Figure 3. The original PM 2.5 concentration time series is decomposed into nine independent components.
For comparison, several other popular forecasting models are chosen as benchmarks for the proposed EEMD-APSO-ANN-APSO-ANN decomposition ensemble learning approach. Following previous literature, five single forecasting models (ANN, GA-ANN, PSO-ANN, APSO-ANN, and ARIMA) and three groups of decomposition ensemble learning approaches are chosen as benchmark models. For consistency, the decomposition ensemble learning approaches use the same parameters as the corresponding single forecasting models.
For clarity, the empirical results are presented in two parts. In the first part, we compare the results of the five single forecasting models and choose the best one as the single forecasting and ensemble model of the decomposition ensemble learning approach. In the second part, the forecasting performance of the proposed EEMD-APSO-ANN-APSO-ANN approach is compared with that of the other three groups of decomposition ensemble learning approaches.

Performance Comparison of Single Models.
In this subsection, we compare five single forecasting models: ANN, GA-ANN, PSO-ANN, APSO-ANN, and ARIMA. For the ANN techniques, the numbers of input and hidden layer nodes are determined by trial and error, and the activation function of the hidden layer is the sigmoid function. Table 2 reports the forecasting performance in terms of MSE, MAPE, and $D_{stat}$. The results indicate that APSO-ANN achieves the highest forecasting accuracy, followed by PSO-ANN.
Table 2 clearly shows that all of the ANN techniques are superior to the traditional ARIMA model and that the optimal lag order of the inputs is 3. Since APSO-ANN outperforms the ANN without any optimization scheme, the APSO-optimized ANN is adopted as both the single forecasting and the ensemble forecasting method in the proposed decomposition ensemble learning approach.

Performance Comparison of Decomposition Ensemble Approaches.
This subsection compares the forecasting performance of three groups of decomposition ensemble learning approaches. Variants with other decomposition methods (e.g., EMD) and other ensemble approaches (e.g., simple addition (ADD)) are also employed as decomposition ensemble learning benchmarks for comparison with the proposed EEMD-APSO-ANN-APSO-ANN approach.
We therefore select three groups of decomposition ensemble learning approaches as benchmarks. Table 3 provides the forecasting results of the different decomposition ensemble learning approaches.
For these decomposition ensemble learning approaches, we first discuss forecasting performance under different decomposition methods. The EEMD-based approaches clearly achieve better forecasting accuracy than the corresponding EMD-based approaches; that is, EEMD is more effective than EMD in decomposing PM 2.5 concentration data. Second, the APSO-ANN-based ensemble approaches mostly outperform the ADD-based ensemble approaches in terms of MSE, MAPE, and $D_{stat}$, which indicates that APSO-ANN is a powerful ensemble learning method.
Third, comparing the single forecasting models, the forecasting accuracy of APSO-ANN is clearly better than that of PSO-ANN and GA-ANN.
In general, the analysis above yields some interesting findings: (1) Decomposition ensemble learning approaches are significantly better than the single models ARIMA, ANN, GA-ANN, PSO-ANN, and APSO-ANN, mainly because the "divide and conquer" strategy effectively improves the performance of PM 2.5 concentration forecasting. (2) It is [...] in terms of both level forecasting accuracy and directional forecasting accuracy. Therefore, the proposed EEMD-APSO-ANN-APSO-ANN approach can serve as an effective framework for forecasting PM 2.5 concentrations.
In addition, we set the input length of the ANN to lag orders 1, 2, 3, 4, and 5 in turn; with lag order 5, the proposed EEMD-APSO-ANN-APSO-ANN approach achieves the highest forecasting accuracy. Figure 4 shows the best forecasting results for PM 2.5 concentrations in Lanzhou, China, from October 1, 2019, to October 31, 2019.

Conclusions
Rising PM 2.5 concentrations lead to serious health, climate, and environmental problems and cause respiratory and cardiovascular diseases. It is therefore important and urgent to establish an early warning system based on accurate PM 2.5 concentration forecasting. To address this difficult issue, based on the principle of "divide and conquer," this study proposes a new decomposition ensemble learning approach that integrates ensemble empirical mode decomposition (EEMD), artificial neural networks (ANNs), and adaptive particle swarm optimization (APSO) to improve the performance of PM 2.5 concentration forecasting. The PM 2.5 concentration data used in this study cover the period from January 1, 2017, to October 31, 2019, in Lanzhou, China. The proposed approach takes advantage of multiple methods, such as the effective self-adaptive data decomposition of EEMD and the parameter optimization of APSO, to improve forecasting performance. To verify its performance, three groups of decomposition ensemble learning approaches were chosen as benchmarks for comparison with the proposed EEMD-APSO-ANN-APSO-ANN approach. Empirical results show that the proposed approach significantly improves forecasting performance and outperforms the other benchmarks in terms of both level forecasting accuracy and directional forecasting accuracy. This indicates that the proposed decomposition ensemble learning approach, with effective decomposition as well as nonlinear single and ensemble forecasting, is a very promising framework for other complex time series forecasting problems, especially those involving data characterized by high volatility and irregularity.
In addition, the proposed EEMD-APSO-ANN-APSO-ANN approach can be applied to other areas such as financial forecasting and energy forecasting. Furthermore, this study considers only univariate time series forecasting; other factors affecting PM 2.5 concentrations were not taken into account. Incorporating those factors into the proposed approach may further improve its forecasting performance. These limitations will hopefully be addressed in future research.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.