The evolving concept of air pollution: a small‐world network or scale‐free network?

To analyze the dynamics of air pollution, a homogenous partition of the coarse graining process is employed to transform the daily air pollution index series in Lanzhou into a character series consisting of five characters (R, r, e, d and D). The nodes of the pollution fluctuation network are 125 three‐symbol strings (i.e. 125 fluctuation patterns in a duration of 3 days) linked in the network's topology by a time sequence. The network contains integrated information about the interconnections and interactions among the fluctuation patterns of pollution in the network topology. After calculating the dynamical statistics of degree and degree distribution, we find that the distribution follows a three‐stage power‐law distribution characterized by a scale‐free property with hierarchy structure and small‐world effect. Therefore, the pollution fluctuation network is not only a scale‐free network with hierarchy but also a small‐world network. The higher the degree of the node is, the greater the probability that the pollution fluctuation modes will occur. The main nodes of pollution fluctuation networks generally contain the symbols R and r, which demonstrates that the feature of pollution fluctuation is mainly ascending.


Introduction
With the rapid development of China's economy, the environment has been placed under increasingly heavy pressure. Atmospheric pollution has become pervasive, a condition that is closely related to the health of people everywhere. Lanzhou, an important part of China's economy, has been developing rapidly and has a high population density. The region's air pollution and other atmospheric environmental issues have gradually attracted widespread attention (Sanchez-Reyna et al., 2005; Chen et al., 2015). Technical support for air pollution control and air quality improvement must be provided. The dynamic change in air quality must be systematically analyzed, and the laws governing changes in atmospheric pollution should be determined.
In recent years, research on atmospheric environmental change has mainly focused on SO2, NOx (Hamer and Shallcross, 2007; Griffiths and Cox, 2009) and total suspended particulates (Latha and Badrinath, 2004; Sanchez-Reyna et al., 2005) and has involved the organic compounds and heavy metals in PM10 and PM2.5. The research methods employed are mainly analysis methods that combine investigation-sampling and physics-chemistry experiments (West et al., 2005; Hamer and Shallcross, 2007), the pollution-load method (Zhang et al., 2014), Spearman's rank correlation coefficient method (Heo and Lee, 2013), the environmental Kuznets curve model (Xiao et al., 2011), the neural network model (Pasini and Modugno, 2013), the fuzzy mathematics method (Li and Liu, 2013), and so on. The development of air pollution is a complex dynamic system, and the air pollution series itself is nonlinear and non-stationary. Hence, traditional methods struggle to reveal the dynamic characteristics of air pollution and cannot bring out the natural laws behind it.
Lanzhou is one of the most heavily air-polluted cities in China. To establish reasonable preventive countermeasures, the pollutant characteristics and the mechanisms of the pollution indexes' temporal evolution need to be understood. In recent years, numerous researchers have devoted considerable attention to Lanzhou's atmospheric environmental quality and air pollution, including the causes and influencing factors of air pollution (Wang, 1992; Yu et al., 2011) and the physical mechanism of atmospheric pollution formation in Lanzhou (Wang, 1992; Hu and Zhang, 1999; Yu et al., 2011). To further explore effective pollution forecasting methods, the meteorological conditions and meteorological data characteristics in Lanzhou and their relationship with pollutant concentrations have been investigated (Wang et al., 2000; Shang et al., 2001). Several researchers established different types of atmospheric pollution diffusion models based on Lanzhou's special geographical location in a valley basin (Jiang and Peng, 2002; Xie and Jie, 2010). In these studies, the application of several models to Lanzhou (e.g. the site-optimized model (Dirks et al., 2006)) is limited because of the topography of the valley. The analysis of the causes and formation mechanism of atmospheric pollution has not been fully developed. Time variation and dynamism exist in the distribution of the concentration of atmospheric pollutants, and a chaotic time series should be applied to reflect the temporal evolution law of atmospheric pollutants along with the change in time and other factors.
A complex network is a network that has some or all of the properties of self-organization, self-similarity, attractors, small-world behavior and scale-freeness (Watts and Strogatz, 1998; Jeong et al., 2000). Since the late 20th century, the number of studies on complex networks has been increasing. Complex networks have been widely utilized in research on social science (Wasserman and Faust, 1994), computer science (Gao and Li, 2009), transportation (Soh et al., 2010) and other fields. However, their application in environmental science is rare.
Symbolic dynamics establishes the relation between a motion trajectory and a formal language, and grammar complexity theory characterizes the complexity. The core content is symbolic dynamics and the coarse graining of time series. Applying different levels of coarse graining, discarding small-scale details, and turning what remains into characteristic quantities help highlight the essential characteristics (Mitran et al., 2012). Hence, studying the change law of atmospheric pollution with a complex network is a new attempt and thus has important research value.
Using the air pollution index (API) data for Lanzhou from 1 July 2000 to 30 June 2015, obtained from the website of the Chinese Ministry of Environmental Protection (http://www.zhb.gov.cn/), this study examines the daily API data of Lanzhou with complex network theory and reveals the dynamic characteristics of air pollution change in Lanzhou from the perspective of a complex network. By using the uniform probability principle as the foundation of a homogenous partition of the coarse graining process, the daily air pollution index series in Lanzhou is transformed into a character series consisting of five characters (R, r, e, d and D). The nodes of the pollution fluctuation network are 125 three-symbol strings linked in the network's topology by a time sequence. The network contains integrated information about the interconnections and interactions between fluctuation patterns of pollution in the network topology. To better understand the complex characteristics of the air pollution system, the dynamic statistical characteristics and topological parameters of the network are analyzed, and the inherent law of the pollution fluctuation network is obtained.

Data source and data preprocessing
The data source is the daily air quality index of Lanzhou published on the website of the Chinese Ministry of Environmental Protection (http://www.zhb.gov.cn/); the data cover 1 July 2000 to 30 June 2015. The air pollution index data of 9 days were unavailable, and each missing value was filled with the arithmetic average of 2 days' data. If 1 July 2000 is set as number 1, then 30 June 2015 is set as 5477. To eliminate border effects, the data were extended by the cycle method, and the main data processing and wavelet transform were accomplished with Matlab 7.6. The seasons were divided as follows: spring (March to May), summer (June to August), autumn (September to November), and winter (December to February). The time series were filtered with a locally projective nonlinear noise reduction filter (Chelidze, 2013). The method works on the hypothesis that a natural time series is a combination of a low-dimensional dynamical system and high-dimensional (random) noise. Unlike linear filters, nonlinear ones only remove noisy data points. These points can then be replaced by estimates computed from a nonlinear interpolation process (Fuentes, 2003).
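The gap-filling step can be sketched as follows. This is a minimal illustration only: it assumes that the "2 days" are the nearest available days before and after each gap, which the text does not state, and the function name `fill_gaps` is hypothetical.

```python
import numpy as np

def fill_gaps(api):
    """Fill missing daily values (NaN) with the mean of the nearest valid neighbours.

    Assumption: the two days averaged are the closest available day on each side;
    at the ends of the series only one neighbour exists and is used alone.
    """
    x = np.asarray(api, dtype=float).copy()
    valid = np.flatnonzero(~np.isnan(x))      # positions with original data
    for i in np.flatnonzero(np.isnan(x)):     # positions to fill
        before = valid[valid < i][-1:]        # nearest valid index before i, if any
        after = valid[valid > i][:1]          # nearest valid index after i, if any
        neighbours = [x[j] for j in before.tolist() + after.tolist()]
        x[i] = np.mean(neighbours)
    return x

print(fill_gaps([10.0, float("nan"), 30.0]))
```

Only originally valid days are used as neighbours, so a filled value never feeds into another filled value.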

Building a pollution network
The creation of a pollution network for the daily air pollution index series reflects the change fluctuation characteristics of air pollution in Lanzhou. We calculated the topology statistics of pollution from the perspective of complex networks. The following is a brief introduction of the main steps implemented to build the pollution network.
The first step is a five-valued coarse graining process. The fluctuation k(t) of the air pollution index series is calculated over the time interval scale Δt of the factor sequence. We mainly focused on Δt = 2, i.e. pollution index fluctuations over any three consecutive days. The variation slope k over three consecutive days of the air pollution index time series P(t) is fitted with the least squares method. The probability P_k of each possible fluctuation value in the air pollution index series is then calculated from Num(x), the number of times fluctuation mode x occurred for the corresponding air pollution index. We divided the P_k values into five equal intervals. The pollution fluctuation k(t), which lies within one of the five equal intervals, can then be represented by one of the symbols R, r, e, d and D.
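One plausible form of the quantities in this step is given here as a sketch. It assumes the standard least-squares slope over the Δt + 1 points of each window and a relative-frequency estimate for P_k; the normalisation by N, the total number of fluctuation values, is an assumption.

```latex
k(t) = \frac{\sum_{i=0}^{\Delta t} (t_i - \bar{t})\,\bigl(P(t_i) - \bar{P}\bigr)}
            {\sum_{i=0}^{\Delta t} (t_i - \bar{t})^{2}},
\qquad t_i = t + i,
\qquad
P_k = \frac{\mathrm{Num}(k)}{N}
```

Here \bar{t} and \bar{P} are the means of t_i and P(t_i) over the window. For Δt = 2 the deviations t_i − \bar{t} are −1, 0 and 1, so the slope reduces to k(t) = [P(t + 2) − P(t)]/2.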
The meaning of R, r, e, d and D in this study is shown in Figure 1.
The daily air pollution index series P(t) of Lanzhou over the last 15 years was transformed into the corresponding symbolic series.
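The coarse graining step can be sketched in a few lines. The sketch assumes a sliding window of Δt + 1 days, quintile boundaries as the "five equal intervals", and the symbol order D, d, e, r, R from the steepest decline to the steepest rise; the function names are hypothetical and this is not the authors' Matlab code.

```python
import numpy as np

# Assumed order: lowest-slope quintile -> D (sharp decline), highest -> R (sharp rise).
SYMBOLS = "DderR"

def window_slopes(p, dt=2):
    """Least-squares slope of p over every window of dt + 1 consecutive days."""
    p = np.asarray(p, dtype=float)
    t = np.arange(dt + 1) - dt / 2.0                       # centred time offsets
    windows = np.lib.stride_tricks.sliding_window_view(p, dt + 1)
    # Because the offsets sum to zero, the mean of P drops out of the numerator.
    return windows @ t / np.sum(t ** 2)

def symbolize(k):
    """Map slopes to five symbols using equal-probability (quintile) bins."""
    k = np.asarray(k, dtype=float)
    edges = np.quantile(k, [0.2, 0.4, 0.6, 0.8])           # four inner boundaries
    bins = np.searchsorted(edges, k, side="right")         # bin index 0..4
    return "".join(SYMBOLS[b] for b in bins)

print(window_slopes([1, 2, 3, 4]))   # two windows, each with slope 1
print(symbolize(np.arange(10)))      # DDddeerrRR
```

With real data, many identical slope values (for example runs of zero change) can make the five classes unequal in size, so the equal-probability property holds only approximately.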
Transforming time series into symbolic series, i.e. symbolic time series analysis (STSA), is a new analysis method developed from symbolic dynamic theory, chaos theory, and information theory. The basic idea is to transform a data series with a number of possible values into a symbolic series with only a few different values. Because symbolic rules differ (Lehman et al., 1997), the symbols that scholars select also differ. Having many symbols captures the details of the information, but the amount of computation becomes too large. With too few symbols, however, the details of the series may be obscured and a large share of the information is lost; this can change the dynamic characteristics of the system (Chen et al., 2010).
Research on the atmospheric system generally focuses on the change in the process. However, this chapter investigates the fluctuation patterns that represent air pollution index fluctuations over several consecutive days, so the meta-patterns are guaranteed to be equal in quantity when the numerical fluctuations are categorized. In this manner, we can ensure that the five symbols occur with equal probability in the fluctuation patterns.
Time interval scale Δt is an important parameter in the process of transforming an air pollution index numeric series into a symbolic series. If the Δt value is selected differently, the time series resolution will also differ. For example, Zhou et al. (2008) set Δt to 1 in a study of temperature change. The number of times, i.e. N(R),  Figure 2 shows the graphs of symbol statistics in double logarithmic coordinate. Figure 2 reveals that a good linear relationship exists among symbols R, r, e, d and D. This condition indicates that air pollution index fluctuation itself presents a good scale-free characteristic. The scale-free characteristic of fluctuation reflects the optimization in the network evolution process (Watts and Strogatz, 1998;Brede, 2011); it tends to have a small time scale in processing the symbols using five-valued coarse graining (Liu et al., 2007). Hence, it is effective to investigate the transformation from air pollution fluctuation into symbolic series in three consecutive days in this study (Li and Wang, 2003).
The second step is to build the network. We introduced a directed weighted network to describe the correlation among the fluctuation patterns in the air pollution index series. The nodes of the network are the 125 fluctuation patterns formed by three-symbol strings, and each edge points from one node to the node that follows it in time. That is, one pattern is transformed into the next one, and one pollution process is transformed into another. The weight of the edge between two nodes is the number of parallel connections between them, i.e. the number of times that transition occurs. For example, in the pollution fluctuation network we built, the symbolic series is eRdDeRdrdeDDDreDDDrDedDdDdedrRreeRrreRedrrDdredDrDDedDereDdDeeRdeeRedrdeDdD. The three-symbol strings are utilized as the nodes of the network, and the directed connection of the nodes is eRd→DeR→drd→eDD→Dre→DDD→rDe→dDd→Dde→drR→ree→Rrr→eRe→drr→Ddr→edD→rDD→edD→ere→DdD→eeR→dee→Red→rde→DdD.
From this, we can build a directed weighted network that shows the interactions among various fluctuation patterns. Figure 3 provides the correlation diagram for a subset of the nodes.
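The construction can be illustrated with the example series from the text. In that worked example the series is split into consecutive, non-overlapping three-symbol strings, and the sketch follows that reading; the function name `build_network` is hypothetical.

```python
from collections import Counter

# Example symbolic series from the text, written in three pieces for readability.
series = (
    "eRdDeRdrdeDDDr"
    "eDDDrDedDdDdedrRreeRrreRedrrDdredDrDDedDe"
    "reDdDeeRdeeRedrdeDdD"
)

def build_network(symbols, width=3):
    """Split the series into consecutive width-symbol nodes and count transitions."""
    nodes = [symbols[i:i + width]
             for i in range(0, len(symbols) - width + 1, width)]
    # Edge (a, b) means pattern a is followed by pattern b; the count is the weight.
    edges = Counter(zip(nodes, nodes[1:]))
    return nodes, edges

nodes, edges = build_network(series)
print("→".join(nodes[:4]))       # eRd→DeR→drd→eDD
print(len(set(nodes)), "distinct nodes,", sum(edges.values()), "transitions")
```

In this short excerpt every transition occurs once, so all edge weights are 1; over the full 15-year series, repeated transitions accumulate into larger weights.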

Discussion
With the in-depth study of complex networks, many concepts and measures have been proposed to describe and express the structural characteristics of such networks. Among them, degree and degree distribution are the most important statistical characteristics.
Degree is also called connectivity, and the degree of a node is the number of edges connected to that node. Degree has different meanings in different networks. In a social network, degree can represent individual effects and importance. The higher the degree of a node is, the greater its influence and function in the entire system, and vice versa. Degree distribution is the probability distribution function p(k) of node degree, i.e. the probability that a node is connected with k edges. The current research involves two common degree distributions. The first one is the exponential degree distribution, in which p(k) attenuates exponentially with increasing k. The second one is the power-law distribution, that is, p(k) ∼ k^(-γ), in which γ is the degree exponent; it has various kinetic properties in different networks. In addition, degree distribution has other forms. For instance, it is a two-point distribution in a star network and a one-point distribution in a regular network.
Table 1. Sequence of the size of node degree of various fluctuation modes in the pollution fluctuation network.

Node RrR rdR DDR rrR rRR DeR RDr rRr eRe RRe
Degree 362 247 237 226 209 182 173 151 142
Figure 3 shows the complex network constructed with partial nodes of the daily average air pollution index series. The thickness of the edge lines that connect the nodes reflects the strength of correlation. For instance, the thickest line is between nodes RrR and rdR, which means the correlation between these two air pollution fluctuation modes is the strongest; in addition, the two modes remain closely connected over the long term. Table 1 shows the fluctuation modes of the air pollution fluctuation network ranked by node degree. Several node degrees, such as those of nodes DDR, rrR and DeR, are relatively large. This shows that the fluctuation modes represented by these nodes play an important direct correlation role in the air pollution fluctuation network.
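A ranking such as the one in Table 1 could be derived from the weighted edge list along the following lines. The sketch assumes that the degree of a node is the total weight of its incoming and outgoing edges, which the text does not spell out; the function names and the toy edge list are hypothetical.

```python
from collections import Counter

def node_degrees(edges):
    """Weighted degree: total weight of the edges entering or leaving each node."""
    degree = Counter()
    for (src, dst), weight in edges.items():
        degree[src] += weight
        degree[dst] += weight
    return degree

def degree_distribution(degree):
    """Fraction of nodes p(k) that have degree k, sorted by k."""
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}

# Toy edge list for illustration only (not data from the study).
toy_edges = Counter({("a", "b"): 2, ("b", "c"): 1, ("c", "a"): 1})
deg = node_degrees(toy_edges)
print(deg.most_common())          # nodes ranked by degree, as in Table 1
print(degree_distribution(deg))
```

Under this definition a self-loop contributes its weight twice, once as an outgoing and once as an incoming edge.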
All the fluctuation modes were transformed into these important modes. The frequency of mode transformation is high, so extreme pollution events are likely to occur in Lanzhou. In addition, we derived the character frequency statistics for the node degrees of the pollution fluctuation network. Among the first 17 nodes of high degree, symbol R, which represents a sharp rise, appears often, whereas symbol D, which represents a rapid decline, appears seldom; symbols R and D appear 18 times respectively. Hence, the pollution fluctuation characterized by a sharp rise appears more often in the pollution change. Figure 4 shows the degree distribution and cumulative degree distribution of nodes. The degree distribution of nodes in the fluctuation network satisfies the power-law distribution overall and has a heavy tail as a result of a random selection mechanism caused by random connection. However, as long as the amount of certainty is sufficiently large, the random heavy tail of the power-law distribution will be restrained (Patriarca et al., 2006). In addition, the node degree of the network obeys a three-segment power-law distribution. Therefore, the pollution fluctuation network has a scale-free property, but the degree is distributed unevenly, and the difference in the importance of the degree among different pollution fluctuation modes is relatively large. After fitting and statistical calculation, the cut-off points were determined to be 60 and 100. The first segment index example to explain the cause of the power-law distribution by HOT theory. The atmosphere system is composed of many subsystems; large ones include climate and hydrological systems, and small ones include temperature, precipitation, and pollution dynamic systems.
If the pollution system can bear significant changes in the robust climatological-hydrological factors, such as temperature, humidity, precipitation, and pollution, but cannot bear the disturbance of small pollution events with some uncertainties, such as exhaust emissions from home kitchens and emissions of a few air pollutants from catering, small factories, and vehicles, then serious pollution events may be created by the self-organized criticality (SOC) behavior of air pollutants (Shi et al., 2008; Shi and Liu, 2009). When the atmosphere pollution system is in a highly optimized tolerance (HOT) state, the system satisfies the power-law distribution.
In the semi-logarithmic coordinates shown in Figure 4, the degree distribution of the pollution network approximately obeys an exponential decay distribution (Han et al., 2009), which points to a random selection mechanism in the network evolution process. The pollution fluctuation modes occur with some randomness (when and which fluctuation mode will occur are unknown). This further indicates that the air pollution index has chaotic characteristics, but the number of times that each type of pollution fluctuation mode occurs abides by certain laws over a long period of time. The range of the degree distribution for the fluctuation network is relatively narrow. Therefore, air pollution index fluctuations obey an exponential distribution, and this law is similar to the distribution of the number of times that all types of earthquakes occur in the earth system (Abe and Suzuki, 2012). The reason may be that the pollution process follows the maximum entropy principle (Thomas, 1979), which embodies the inherent essential dynamic characteristics of the atmosphere system to a certain extent.
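The two readings of the degree distribution, a power law in double logarithmic coordinates and an exponential decay in semi-logarithmic coordinates, can be compared with straight-line fits. The sketch uses ordinary least squares on transformed data as a rough diagnostic; the fitting procedure actually used in the study is not described, the function names are hypothetical, and the data below are synthetic.

```python
import numpy as np

def loglog_exponent(k, p, k_min=None, k_max=None):
    """Estimate gamma in p(k) ~ k^(-gamma) from a line fit in log-log coordinates.

    k_min and k_max optionally restrict the fit to one segment, k_min <= k < k_max.
    """
    k = np.asarray(k, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = (k > 0) & (p > 0)
    if k_min is not None:
        mask &= k >= k_min
    if k_max is not None:
        mask &= k < k_max
    slope, _intercept = np.polyfit(np.log10(k[mask]), np.log10(p[mask]), 1)
    return -slope

def semilog_rate(k, p):
    """Estimate lam in p(k) ~ exp(-lam * k) from a line fit in semi-log coordinates."""
    k = np.asarray(k, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = p > 0
    slope, _intercept = np.polyfit(k[mask], np.log(p[mask]), 1)
    return -slope

# Synthetic check: an exact power law and an exact exponential.
k = np.arange(1, 200, dtype=float)
print(loglog_exponent(k, k ** -2.0, k_min=60, k_max=100))
print(semilog_rate(k, np.exp(-0.05 * k)))
```

A three-segment fit would call `loglog_exponent` once per segment, for example with the cut-off points 60 and 100 reported in the text as segment boundaries.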
As a result, the air pollution fluctuation network has a scale-free property and a small-world effect and is a small-world network with a scale-free property. A high-speed railway's and an airline's compound network as well as the topological properties of international oil prices have such characteristics. The two characteristics of the air pollution fluctuation network reflect the harmony and unity of certainty and randomness. With chaotic characteristics, natural unity and diversity are further described with the passing of time in the pollution process. Many networks have scale-free characteristics and small-world effect. For instance, the atmospheric network is not only a small-world network but also a scale-free network and indicates a long-range correlation.

Conclusions
A pollution fluctuation network was built on the homogenous partition of the coarse graining method by applying complex network theory to the daily air pollution index series of Lanzhou. In accordance with the principle of equal probability, the fluctuation of the air pollution index was transformed into a symbol series. To describe the complexity of the spatial structure in the air pollution system quantitatively based on the structural characteristics, the relevant factor values were input into the corresponding network topology.
To obtain the inherent law of the fluctuation network, we specifically analyzed the topological properties of the degree distribution of the fluctuation network and obtained the following conclusions.
(1) The degree distribution of the pollution fluctuation network obeys a three-segment distribution; the network is a small-world network with scale-free characteristics. The frequency of the various fluctuation modes of the pollution fluctuation network follows the maximum entropy principle at a large time scale. This is probably the embodiment of the self-organized criticality of the pollution network.
(2) Several differences exist in the degree values of the nodes in the fluctuation network. The degrees of RrR, rdR, DDR, rrR and rRR are markedly high; the pollution fluctuation modes represented by these five nodes play an important direct correlation role in the process of pollution change, and most pollution fluctuation modes show a rising or sharply rising trend.
In conclusion, by adopting the theory of complex networks, we analyzed the daily air pollution index in Lanzhou, identified the inherent laws of the pollution fluctuation network, and obtained a series of valuable conclusions that are significant in controlling and reducing the air pollution in Lanzhou. In the future, we plan to extend our work in the following three aspects. First, we will investigate the natural dynamic mechanism of air pollution. Second, we will study the causes of the differences among different modes. Third, we will explore why the fluctuation network obeys a three-segment distribution.