Identify the contribution of elevated industrial plume to ground air quality by optical and machine learning methods

Regional severe haze caused by the explosive growth of atmospheric particle concentrations is one of the biggest environmental problems in China and has yet to be fully understood. This research identified the linkage between the diverse shapes of heavy industrial stack plumes (HISP) and local ground-level particle concentrations. We used two optical methods, LIDAR and an auto-shoot camera, to capture the HISP's vertical shape, and two machine learning models, binary classification and decision tree, to find the quantitative relationship between the HISP's shape and PM2.5 concentration. The PM2.5 concentration correlated with the polygon length (PL) of the HISP's shape following a logistic function. When the plume length exceeded twice the stack height, the spread of the HISP's shape was accompanied by a decrease in PM2.5 concentration to <100 μg m−3. The residence time of the HISP's particles was longer (>20 h) under uniform offshore dispersion than in a heterogeneous wind field, and under these conditions the footprint of the HISP was estimated to be >7 km. We obtained a decision tree model that accurately predicts PM2.5 concentration, in which the HISP's length played a statistically significant role. Although plume shape is only one easy-to-use indicator of complex meteorological conditions, it offers policy makers a practical and rapid way to identify particle pollution caused by elevated sources.


Introduction
With unified national planning, the Chinese government has achieved a spectacular improvement in air quality over the past five years [1][2][3]. However, regional severe haze events, in which the concentration of fine particles (diameter <2.5 μm) reached >200 μg m−3, still plagued north China, especially in wintertime [4,5]. A better understanding of regional air pollution is urgently needed to develop follow-through remediation. Notably, with intensive energy consumption and air pollutant emissions, China's heavy industries produce 2-3 billion tons of iron/steel and consume 3-4 billion tons of coke/coal per year, accounting for half of the total world output [6]. These heavy industrial plants, whose stack plumes are usually treated with low-efficiency waste gas cleaning, are concentrated in north China [5,7]. From a regional view, the heavy industrial stack plume (HISP) can rise to a height of more than 1000 m, tends to be transported from the elevated emission source to the downwind area [8,9], interacts with thermals from the land, and is rapidly diluted down to the ground [10,11]. From a local view, the HISP mixes with ambient air masses within just several hours of being emitted [12,13]. The visible white or gray shape of the HISP appears because of water vapor condensation, but the dispersing particles actually extend far beyond this visible envelope. Therefore, the HISP poses a major human health risk to local and downwind areas, and it is important for policy makers to quickly determine how large a role the HISP plays in air pollution. Previously, numerical models were the most commonly employed method for evaluating HISP impacts. In classical physical models, such as the Gaussian plume model driven by prescribed meteorological data, the plume's particles disperse under the assumption of a uniform state with a homogeneous wind field.
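To make the uniform-state assumption concrete, the sketch below evaluates the textbook steady-state Gaussian plume equation with total ground reflection. It is an illustration under stated assumptions, not a model used in this study: a single wind speed u applies everywhere downwind (the homogeneous wind field), and the dispersion coefficients sigma_y and sigma_z are caller-supplied inputs rather than values from a stability scheme. All numbers in the usage line are hypothetical.

```python
import math


def gaussian_plume(q, u, h_eff, y, z, sigma_y, sigma_z):
    """Steady-state Gaussian plume concentration with total ground reflection.

    q: emission rate (g s-1); u: uniform wind speed along x (m s-1);
    h_eff: effective release height (m); y, z: crosswind and vertical
    receptor coordinates (m); sigma_y, sigma_z: dispersion coefficients (m)
    at the receptor's downwind distance, supplied by the caller.
    Returns concentration in g m-3.
    """
    if u <= 0:
        raise ValueError("the model needs a positive, spatially uniform wind speed")
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    # Direct plume term plus an image source at -h_eff (reflection at the ground).
    vertical = (math.exp(-(z - h_eff) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h_eff) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


# Hypothetical inputs: 100 g/s source, 3 m/s wind, 200 m stack, ground receptor.
ground = gaussian_plume(q=100.0, u=3.0, h_eff=200.0, y=0.0, z=0.0,
                        sigma_y=150.0, sigma_z=80.0)
print(f"{ground * 1e6:.1f} ug m-3")
```

Because u and the sigmas are single numbers, the model cannot represent the vertically sheared or time-varying winds discussed in the following paragraph.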
However, these assumptions are deficient under the spatially and temporally heterogeneous meteorology that prevails in most conditions [14]. Because it assumes instantaneous vertical mixing of plume particles, the Lagrangian model also produces inaccurate (typically underestimated) results when the entrainment rate is large or the vertical spread speed is small [15,16]. Eulerian numerical models perform even worse than Lagrangian models, since they cannot resolve sub-grid processes [16,17]. Besides, classical physical approaches are computationally expensive and depend on large databases of input parameters that may be incomplete [18]. Decision-makers are under growing pressure to provide rapid, economical, and effective solutions for air pollution control [19]. The simplest way to track a changeable HISP is to use optical methods, such as cameras that capture the boundary of the elevated plume [20]. The vertical variability of elevated plumes has been studied less than their variability in the horizontal plane [21]. Optical photo methods enable sensing over a wide span of space to capture the HISP's dispersion and transport [22]. When the bottom envelope of the HISP is close to the ground, the plume's particles are likely to increase ground PM2.5 concentration, since they are mainly in the fine size range [23]. In addition to optical methods, more practitioners are turning to data-driven methods such as machine learning, which can predict air pollution and identify key factors without considering complex physical and chemical processes [24][25][26]. Note that optical and data-driven methods are generally case-specific and offer limited repeatability, whereas repeatability is the irreplaceable advantage of classical physical models [25].
The dispersion of the HISP is determined by meteorological conditions and stack parameters such as stack height, stack diameter, plume exit temperature, and plume exit velocity. The plume shape is one easy-to-use indicator of complex meteorological conditions. We chose a coastal industrial area to apply the optical and machine learning methods using plume shape as the indicator. Coastal regions are often preferred sites for heavy industry development and have high emission loads extending over hundreds of kilometers [27]. In recent years, under remediation policies, heavy industries such as Iron and Steel Foundries (ISF) in the North China Plain have been moving to coastal areas to spare this largest population center from fumigation. Specifically, industrial smoke emission in Shandong province ranks the highest (1.2 million tons) among all provinces in China [28]. Rizhao is a coastal city in Shandong province with numerous ISF plants and a production capacity that is still increasing under China's unified planning. In this paper, we approach air pollution control by capturing the shape of the HISP emitted from an ISF located in the industrial area of Rizhao city. We identified the relationships among the HISP's shape, meteorological conditions, and ground PM2.5 concentration. This can help policy-makers identify air pollution caused by industrial sources through real-time evaluation.

Materials and methods
In September 2018, we launched a campaign at an urban site within an ISF in Rizhao (35.124°N, 119.311°E). The ISF's stacks, about 200 in number, are located in the dense production area. Satellite maps of the ISF's heavy stack plumes and their affected area are shown in figure S1 (available online at stacks.iop.org/ERC/2/021005/mmedia). Because the observations took place in early autumn, coal burning for residential heating can be ignored. At the ISF site, a dual-wavelength (NIES-MIE 1064, 532 nm) depolarization LIDAR was set up to discern the industrial plumes. The LIDAR was developed by the National Institute for Environmental Studies (Japan); more details can be found in [29,30]. In March 2019, we set up an auto-shoot camera fixed at the ISF site to capture the different types of HISP shapes. After obtaining the photos, we used the software Labelme (GitHub: https://github.com/wkentaro/labelme) to annotate the HISP's shape as a polygon and then calculated the polygon length of the annotated shape. Some representative photos can be seen in figure 1. Because the HISP's shape was annotated by hand, the labeling process introduced operator errors. The PM2.5 concentration data come from a local monitoring station in the ISF. The meteorological parameters at different heights are taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data (ERA-Interim, 0.125°×0.125°), which are publicly available on the ECMWF website. The international weather station (ID=549450) whose observations are assimilated by the ECMWF model is about 20 km from the ISF site. We extracted the profiles of air temperature (T) and wind (U, V) at the ISF site. The related annotation files (.json) and other data are available for public review in the GitHub repository created by Limin Feng (https://github.com/Limin-Feng1993/Data_Plume_Shape_and_Particle_Concentration).
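For readers reproducing the annotation step, a minimal sketch of reading a Labelme annotation file follows. It assumes the usual Labelme JSON layout (a top-level "shapes" list whose entries carry "label", "shape_type" and "points" in pixel coordinates); the label name "plume" and the file name in the comment are hypothetical, not taken from the authors' repository.

```python
import json


def load_polygons(json_text, label="plume"):
    """Return the vertex lists of all polygons carrying the given label.

    Assumes the Labelme layout: "shapes" -> [{"label", "shape_type", "points"}],
    with points given as [[x, y], ...] in pixels. "plume" is a hypothetical label.
    """
    data = json.loads(json_text)
    polygons = []
    for shape in data.get("shapes", []):
        if shape.get("shape_type", "polygon") == "polygon" and shape.get("label") == label:
            polygons.append([(float(x), float(y)) for x, y in shape["points"]])
    return polygons


# Reading an annotation file (hypothetical path):
# with open("photo_0001.json", encoding="utf-8") as f:
#     plume_polygons = load_polygons(f.read())
```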

Results and discussion
3.1. HISP's shape caught by extinction signals
The scattering components of the HISP's particles, such as sulfate and nitrate, induce a large extinction coefficient in the upper air and are responsible for visibility degradation [31]. In September 2018, we captured four episodes with significant extinction signals at >200 m height, indicating that different types of HISP passed over the ISF site.
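How a plume layer and its bottom boundary can be read off a single LIDAR extinction profile is sketched below. This is an assumption-laden illustration, not the retrieval used with the NIES LIDAR: the threshold is a hypothetical user choice, and bins below 200 m are skipped to mirror the >200 m criterion above.

```python
def find_plume_layers(heights_m, extinction, threshold, min_height_m=200.0):
    """Return (bottom, top) heights of contiguous bins whose extinction exceeds
    `threshold`, ignoring bins below `min_height_m`.

    heights_m must be in ascending order and aligned with `extinction`.
    """
    layers = []
    bottom = top = None
    for h, ext in zip(heights_m, extinction):
        if h < min_height_m:
            continue  # skip near-ground bins dominated by surface aerosol
        if ext > threshold:
            if bottom is None:
                bottom = h  # entering a layer: record its bottom boundary
            top = h
        elif bottom is not None:
            layers.append((bottom, top))  # leaving a layer
            bottom = None
    if bottom is not None:
        layers.append((bottom, top))  # layer reaches the top of the profile
    return layers


# Hypothetical profile: 100 m bins with an elevated layer at 500-1000 m.
heights = list(range(0, 1600, 100))
ext = [0.3 if h < 200 else (0.9 if 500 <= h <= 1000 else 0.05) for h in heights]
print(find_plume_layers(heights, ext, threshold=0.2))  # [(500, 1000)]
```

Tracking the first element of each tuple over time gives the kind of bottom-boundary history used below to judge whether a HISP is sinking toward the ground.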
3.1.1. Episode-1: HISP's trapping by stable air
Plume trapping occurs when a plume is capped by a deep lid of stable air, which is frequent on cloudy days [32]. Under low cloud cover on September 2, we observed significant extinction signals at 500-1000 m height, named episode-1 (figure 2(a)). Cloud signals are shown as blank areas in the figure. The cloud base height was about 1000 m, and the upward dispersion of the HISP was suppressed. The extinction coefficient of the HISP's particles was close to 1. Owing to the interaction of low clouds and stack plumes, the internal mixing of the HISP's particles was enhanced, inducing significant radiative forcing and extinction [33]. Static air, with a wind speed close to zero, prevailed below 1000 m, while the ground PM2.5 concentration decreased from 22 μg m−3 to 7 μg m−3. Airborne particles did not accumulate, suggesting that the surface air lacked external pollutant input from either the ground or the HISP. The static surface air kept the sinking HISP from reaching the ground.
3.1.2. Episode-2 and episode-3: HISP's spreading in homogeneous wind field
On September 10, we observed the HISP passing over the observation site with significant aerosol extinction signals at 500-1500 m height, named episode-2 (figure 2(b)). The bottom boundary of the HISP was lower than 200 m (close to the stack height), indicating the HISP's sinking and fumigation. In episode-2, a westerly offshore wind with a speed >2 m s−1 dominated 0-3000 m height, strongly disturbing the plume aerosols. Accordingly, the ground PM2.5 concentration ranged from 24 to 71 μg m−3 during episode-2, suggesting that ground aerosols mixed with HISP particles. Comparing episode-1 and episode-2, we conclude that unstable air below 1000 m intensified the HISP's downward dispersion, while stable air above 1000 m suppressed its upward dispersion.
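Wind descriptions such as "a westerly offshore wind faster than 2 m s−1" are derived from the U and V components. A minimal conversion sketch follows, assuming the standard meteorological convention that U is positive toward the east, V is positive toward the north, and direction is the compass bearing the wind blows from.

```python
import math


def wind_speed_direction(u, v):
    """Convert wind components to (speed, meteorological direction).

    u: eastward component (m s-1); v: northward component (m s-1).
    Direction is the bearing the wind blows FROM, in degrees
    (0 = north, 90 = east, 270 = west); it is meaningless when speed is 0.
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


print(wind_speed_direction(3.0, 0.0))  # (3.0, 270.0): a westerly wind
```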
Episode-3 occurred from September 7 to September 8, when we observed two HISPs passing aloft one after the other. As figure 2(b) shows, the HISPs spread within 500-3000 m height, and both could be observed continuously for over 20 h. During episode-3, a northerly wind dominated 0-3000 m height for more than 48 h. The homogeneous northerly winds created a distinct clean layer between the HISP's particles and ground aerosols; accordingly, the ground PM2.5 concentration ranged from 7 to 12 μg m−3 during episode-3. In just 6 h, the two HISPs spread over a vertical span of 2000 m, so the vertical dispersion rate of the HISP was estimated as ∼6 m min−1. Given that the plumes tilt at some angle rather than lying flat or rising straight up, we assume that the horizontal transport speed of the HISP was comparable to (typically larger than) the vertical rate. Thus, with a residence time of 20 h, the footprint of the passing HISP was estimated to be at least 7 km.
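The episode-3 estimate can be checked with a few lines of arithmetic. The sketch uses only the numbers quoted above and the text's assumption that horizontal transport is at least as fast as vertical spreading.

```python
# Episode-3 numbers quoted in the text.
spread_m = 2000.0                # vertical span covered by the two HISPs (m)
spread_time_min = 6 * 60.0       # time to cover that span: 6 h in minutes
residence_time_min = 20 * 60.0   # residence time: 20 h in minutes

vertical_rate = spread_m / spread_time_min  # m per minute
print(f"vertical dispersion rate: {vertical_rate:.2f} m/min")  # 5.56, i.e. ~6

# Assumption from the text: horizontal speed >= vertical rate,
# so footprint >= rate x residence time.
footprint_km = 6.0 * residence_time_min / 1000.0  # using the rounded ~6 m/min
footprint_unrounded_km = vertical_rate * residence_time_min / 1000.0
print(f"footprint with rounded rate: {footprint_km:.1f} km")             # 7.2
print(f"footprint with unrounded rate: {footprint_unrounded_km:.1f} km")  # 6.7
```

With the unrounded rate of about 5.6 m min−1 the bound is about 6.7 km, so the ∼7 km figure reflects rounding the rate to ∼6 m min−1.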

3.1.3. Episode-4: HISP's spreading in heterogeneous wind field
Under sea-land breeze circulation (typically below 1000 m height), temporally unstable and spatially heterogeneous winds vertically shear the plume shape [34,35]. Episode-4 occurred on September 23; after 18:00, the wind was easterly at 0-1000 m and westerly at 1000-3000 m (figure 2(c)). During daytime, the plume spread over 500-3000 m height under a robust westerly land breeze at 1000-3000 m and static air at 0-1000 m. At 18:00, however, an easterly sea breeze began to dominate below 1000 m, and the bottom boundary of the HISP sank to <300 m. During the night, the HISP shrank, remained distinct from ground aerosols, and disappeared at ∼1500 m height the next morning.
After 18:00 on September 23, the extinction coefficient along the HISP's center line increased from ∼0.08 to >0.15. Extinction is mainly determined by coarse particles with a diameter >1 μm [36,37]. The 532 nm extinction coefficient shown in figure 2 is mainly determined by particles aloft with a diameter >532 nm, and it increases as particles grow larger. Therefore, we infer that the plume's particles shifted to a larger size and were then continuously scavenged after 18:00 on September 23. The enlarged particle size can be explained by hygroscopic growth, since the coagulation rate of super-micron particles is too low to affect particle growth [38]. Substantial particle volume growth has been reported to be associated with sulfate-rich plumes [39]. Particle growth in the HISP was enhanced by the condensation of SO2 and the uptake of droplets. As a result, in-cloud scavenging amplified the absorption and scattering of the HISP while reducing the residence time of the plume's particles.

3.2. HISP's shape caught by visible photos
To further determine the impact of the HISP's shape on ground particle concentration, we annotated the visible plume in the photos from the March 2019 campaign, calculated its polygon length, and attempted to establish the quantitative relationship between the two. The polygon length is the maximum distance between any two of the polygon's vertices (essentially its longest diagonal). A long/broad HISP contains large amounts of water droplets and liquid particles and disperses over a long period.
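Under the definition above, the polygon length is the largest distance between any pair of annotated vertices. A minimal sketch follows; the conversion to stack heights assumes the 385-pixel stack height reported for figure 3(b), and the brute-force pairwise search is adequate for hand-labeled polygons with tens of vertices.

```python
import math
from itertools import combinations

STACK_HEIGHT_PX = 385.0  # stack height H in the photos (about 200 m), from the text


def polygon_length(vertices):
    """Largest Euclidean distance between any two vertices, in pixels."""
    if len(vertices) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(vertices, 2))


def in_stack_heights(length_px):
    """Express a pixel length in units of the stack height H."""
    return length_px / STACK_HEIGHT_PX


outline = [(0, 0), (300, 0), (300, 400), (0, 400)]  # hypothetical 300 x 400 px box
pl = polygon_length(outline)
print(pl, round(in_stack_heights(pl), 2))  # 500.0 1.3
```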

The non-linear correlation between HISP's shape and PM 2.5 concentration
The temporal trends of plume length and PM2.5 concentration suggest a negative correlation between the two parameters, especially on days with a long HISP (figure 3(a)). However, this negative correlation is non-linear and cannot be regarded as a causal relationship. Figure 3(a) also shows that PM2.5 concentration was controlled by synoptic conditions, which produced different visibilities and wind fields. For example, from March 1 to March 5 and from March 11 to March 12, visibility was below 5 km because of sea fog and low clouds (see typical photos in figure S3). Accordingly, the PM2.5 concentration stayed at a high level of >100 μg m−3. During these low-visibility periods, cloud/fog processing of the HISP's particles and ground aerosols was enhanced, with intensified phase transformation and hygroscopic growth [27,40].
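One way to quantify a monotonic but non-linear association like this is Spearman's rank correlation. The text does not report it; it is shown here only as an illustrative sketch on hypothetical numbers. For a strictly decreasing curve Spearman's rho reaches −1, whereas Pearson's coefficient on the raw values does not, and neither measure implies causation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson's coefficient computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical plume lengths (pixels) and a strictly decreasing PM2.5 curve.
pl = [500, 600, 700, 800, 900, 1000]
pm25 = [9 + 196 / (1 + math.exp((p - 700) / 50)) for p in pl]
print(round(spearman(pl, pm25), 3), round(pearson(pl, pm25), 3))  # Spearman -1.0; Pearson about -0.96
```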
As a first step, we simplified the problem by fitting a logistic function to the polygon length (PL) of the HISP's shape and the PM2.5 concentration. The logistic function is a generalized form of the Boltzmann distribution (also known as the Gibbs distribution), which describes the probability distribution of particles in a system in thermal equilibrium. In figure 3(b), H denotes the stack height, which was about 200 m and corresponds to 385 pixels in the photos. The value of 9 μg m−3 represents the background PM2.5 concentration under the most favorable conditions, and 205 μg m−3 represents the maximum impact of the HISP. Note that the distribution of PL versus PM2.5 concentration covers only half of the logistic function, because the HISP's length cannot approach zero; the plume always has a certain length. From March 1 to March 5, sea fog appeared at the ISF site, and the plume length was <500 pixels in the photos, with large labeling errors. Therefore, we chose only HISP shapes with a polygon length of >500 pixels as credible results for fitting.
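The text does not print the fitted equation, so the sketch below assumes the common four-parameter Boltzmann sigmoid, y = A2 + (A1 − A2)/(1 + exp((x − x0)/dx)), with A1 = 205 μg m−3, A2 = 9 μg m−3 and x = PL/H. The midpoint x0 and width dx are hypothetical placeholders, and the filter keeps only PL > 500 pixels, as described above.

```python
import math

STACK_HEIGHT_PX = 385.0  # stack height H (~200 m) in the photos, from the text
PM_MAX = 205.0           # upper asymptote: maximum HISP impact (ug m-3)
PM_BACKGROUND = 9.0      # lower asymptote: background PM2.5 (ug m-3)


def boltzmann(x, x0, dx, a1=PM_MAX, a2=PM_BACKGROUND):
    """Boltzmann-form sigmoid: ~a1 for x << x0, decaying to ~a2 for x >> x0.

    x is plume length in units of H. x0 (midpoint) and dx (width, > 0) are
    fit parameters the text does not report, so any values are hypothetical.
    """
    z = (x - x0) / dx
    if z > 700.0:  # avoid math.exp overflow; the curve has reached a2
        return a2
    return a2 + (a1 - a2) / (1.0 + math.exp(z))


def credible_points(pl_pixels, pm25, min_pl_px=500.0):
    """Keep (PL/H, PM2.5) pairs with PL > min_pl_px, mirroring the filter above."""
    return [(pl / STACK_HEIGHT_PX, pm)
            for pl, pm in zip(pl_pixels, pm25) if pl > min_pl_px]
```

In practice x0 and dx would be estimated by non-linear least squares on the credible points, for example with scipy.optimize.curve_fit; at x = x0 the curve sits halfway between the asymptotes, at 107 μg m−3.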
In machine learning, the logistic function is used not as a regression line but as a binary classification method [41]. Here it divides the distribution into two groups. In the group with plume length >2H, the spread of the HISP's shape was accompanied by a decrease in PM2.5 concentration to <100 μg m−3. In the group with plume length <2H, the PM2.5 concentration ranged from 10 μg m−3 to 200 μg m−3; that is, within this group the HISP's shape has no significant correlation with PM2.5 concentration, and this group is the main source of fitting errors. As noted above, factors other than the HISP's shape also control ground PM2.5 concentration. Both the HISP's particles and ground particles can be regarded as subsets of the mixing-layer aerosols, so these two subsets change together only under robust mixing. Under the weak diffusion conditions indicated by a short HISP, ground aerosols may instead originate from road dust raised by the transport of raw materials for the ISF plants and from residential combustion in winter. Thus, the factors controlling PM2.5 concentration in the second group were complex.
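The two-group reading above can be reproduced with a simple threshold split at 2H. This is a sketch of that grouping on hypothetical data, not a fitted logistic-regression classifier, and the function name is illustrative.

```python
def split_by_plume_length(pl_over_h, pm25, threshold_h=2.0):
    """Split paired observations into long-plume (PL > threshold) and
    short-plume groups and summarize PM2.5 (ug m-3) in each."""
    long_group = [pm for x, pm in zip(pl_over_h, pm25) if x > threshold_h]
    short_group = [pm for x, pm in zip(pl_over_h, pm25) if x <= threshold_h]

    def summary(values):
        if not values:
            return {"n": 0}
        return {"n": len(values), "min": min(values), "max": max(values),
                "share_below_100": sum(v < 100 for v in values) / len(values)}

    return summary(long_group), summary(short_group)


# Hypothetical pairs of (PL/H, PM2.5).
pl_h = [1.2, 1.5, 1.8, 2.3, 2.6, 3.0]
pm = [180, 20, 150, 60, 40, 25]
long_stats, short_stats = split_by_plume_length(pl_h, pm)
print(long_stats)
print(short_stats)
```

A tight, uniformly low long-plume group next to a widely scattered short-plume group is the pattern the text describes.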

The multiple factors controlling ground PM 2.5 concentration
Because the logistic function falls far short of what is needed to predict PM2.5 concentration, we took wind and temperature profiles into consideration and adopted decision tree regression (another popular machine learning method) to make the prediction transparent and precise [42]. We reduced the temporal resolution of the data to 6 h to match the ECMWF meteorological data (U, V, and T at 850 hPa, 950 hPa, and 1000 hPa). Simulated and observed values are compared in figure 4(a). The root mean square error (RMSE) of the decision tree regression is 10 μg m−3 for the training set and 19 μg m−3 for the test set. The larger RMSE in the test set is mainly caused by several greatly overestimated values (blue outliers in figure 4(a)). In general, the decision tree model predicts PM2.5 concentration well, with acceptable error and an R² value >0.8.
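The text does not say which implementation of decision tree regression was used; a library routine such as scikit-learn's DecisionTreeRegressor with max_depth=6 would be typical. To show the mechanics, here is a small from-scratch CART-style sketch: each node picks the feature and threshold that minimize the summed squared error of its two children, and each leaf predicts the mean PM2.5 of its samples. Feature rows such as [PL, U_850] are assumed to be already aggregated to the 6 h ERA-Interim resolution, and the data are hypothetical.

```python
def _sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def build_tree(rows, targets, max_depth=6, min_samples=2, depth=0):
    """Grow a regression tree greedily by minimizing squared error.

    rows: feature vectors, e.g. [PL, U_850]; targets: PM2.5 values.
    Internal nodes store (feature, threshold); leaves store a mean value.
    """
    mean = sum(targets) / len(targets)
    if depth >= max_depth or len(targets) < min_samples:
        return {"value": mean}
    best = None  # (cost, feature index, threshold)
    for f in range(len(rows[0])):
        levels = sorted(set(r[f] for r in rows))
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2.0  # midpoint between neighbouring values
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    # Stop when no split exists or no split reduces the error.
    if best is None or best[0] >= _sse(targets):
        return {"value": mean}
    _, f, t = best

    def grow(ids):
        return build_tree([rows[i] for i in ids], [targets[i] for i in ids],
                          max_depth, min_samples, depth + 1)

    return {"feature": f, "threshold": t,
            "left": grow([i for i, r in enumerate(rows) if r[f] <= t]),
            "right": grow([i for i, r in enumerate(rows) if r[f] > t])}


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["value"]


def rmse(observed, predicted):
    """Root mean square error, the metric quoted in the text."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5


# Hypothetical 6-hourly samples: [PL in pixels, U_850 in m s-1] -> PM2.5.
rows = [[500, -8], [450, -7], [900, -2], [1000, 1], [600, -1], [700, 0], [520, -1]]
pm25 = [180, 180, 50, 50, 100, 100, 100]
tree = build_tree(rows, pm25, max_depth=6)
print(rmse(pm25, [predict(tree, r) for r in rows]))  # 0.0 on this toy training set
```

A zero training error like this is what overfitting looks like; the gap between the 10 and 19 μg m−3 RMSE values above is why a held-out test set matters.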
The decision tree has a depth of 6 (chosen as the depth with the highest prediction precision), which is too complex to explain in full (figure 4(b)). Subscripts of U and T denote pressure levels (hPa), and a negative U denotes a westward-blowing (easterly, onshore) wind. We need only focus on two main trunks of the decision tree: the left one, controlled by a strong easterly wind (U) and a short HISP, and the right one, controlled by a long HISP. The left trunk (red border) is: U_850 < −5.4 m s−1 → PL < 509 pixels → PM2.5 > 176 μg m−3. The right trunk (blue border) is: U_850 > −5.4 m s−1 → PL > 796 pixels (∼2H) → PM2.5 < 67 μg m−3. The former trunk indicates that a robust onshore wind and a short HISP were related to severe particle pollution at ground level. The onshore easterly wind brings back the dispersed HISP, causes secondary plume fumigation, and produces advection fog over the coastal area.
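The two highlighted trunks can be written as explicit rules. The sketch below encodes only the thresholds quoted above (U_850 = −5.4 m s−1, PL = 509 and 796 pixels); the full depth-6 tree has other branches, so any other case returns None, and the function name is hypothetical.

```python
def highlighted_trunk(u_850, pl_pixels):
    """Match a case against the two trunks highlighted in figure 4(b).

    u_850: eastward wind component at 850 hPa (m s-1); negative values are
    easterly (onshore) winds. pl_pixels: plume polygon length in pixels.
    Returns a short description, or None for cases on other branches.
    """
    if u_850 < -5.4 and pl_pixels < 509:
        return "left trunk: PM2.5 > 176 ug m-3"
    if u_850 > -5.4 and pl_pixels > 796:
        return "right trunk: PM2.5 < 67 ug m-3"
    return None


print(highlighted_trunk(-7.0, 450))  # left trunk: PM2.5 > 176 ug m-3
print(highlighted_trunk(-2.0, 900))  # right trunk: PM2.5 < 67 ug m-3
print(highlighted_trunk(-2.0, 600))  # None
```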
The latter trunk indicates that under offshore wind and a long/broad HISP, the PM2.5 concentration remained low. In this case, the HISP's particles were scavenged faster under the combined effect of offshore dispersion and in-plume droplet scavenging. Relying solely on ground-level PM2.5 concentration in such cases would lead to redundant effort in controlling emission sources. This decision tree model accurately predicts PM2.5 concentration from the HISP's shape and wind profile, giving policy makers a practical and rapid way to identify particle pollution caused by elevated sources or meteorological conditions.

Conclusions
In this research, we analyzed the contribution of heavy industrial stack plumes to local ground particle concentration. An auto-shoot camera and LIDAR were set up to capture the passing HISP. Here we suggest that:
• The ground PM2.5 concentration correlated with the polygon length of the HISP following a logistic function. When the plume length exceeded twice the stack height, the spread of the HISP's shape was accompanied by a substantial decrease in PM2.5 concentration. Longer elevated stack plumes have a smaller detrimental impact on ground air quality.
• The residence time of the HISP's particles was longer (>20 h) under uniform offshore dispersion than in a heterogeneous wind field, and under these conditions the footprint of the HISP was estimated to be >7 km. Whether or not the vertical wind field is homogeneous determines the plume's dispersion boundary.
• We built a decision tree model that accurately predicts PM2.5 concentration, which confirms that the HISP's shape plays a fundamental role in determining the local atmospheric particle concentration. The stack plume's shape offers an economical way to evaluate industrial air pollution.