Identification of long-range aerosol transport patterns to Toronto via classification of back trajectories by cluster analysis and neural network techniques

doi:10.1016/j.chemolab.2005.12.009

Chemometrics and Intelligent Laboratory Systems

Volume 83, Issue 1, 7 July 2006, Pages 26-33

https://doi.org/10.1016/j.chemolab.2005.12.009

Abstract

In this work, back trajectories of air masses arriving in Toronto were classified into distinct transport patterns by cluster analysis and, for the first time, by a neural network (Adaptive Resonance Theory—ART-2a). Different similarity criteria were used by the two classification techniques, the former relying on the Euclidean distances between trajectories, the latter on the Euclidean angles between trajectories. Nevertheless, both techniques provided similar conclusions as to the location of PM_2.5 emission sources and the level of pollution associated with a given air transport pattern. Both techniques illustrated the cleaner nature of northerly and northwesterly transport patterns in comparison to southerly and southwesterly ones, as well as the effect of near stagnant air masses. In addition, ART-2a resolved a much larger percentage of trajectories than cluster analysis into groups with clearly identifiable transport patterns and compared favourably with cluster analysis with respect to the precision of the classification.

Introduction

Identifying the sources of airborne pollutants is of great importance to the study of fine particulate matter (PM_2.5), which has been linked to adverse health effects [1]. The examination of transport patterns of air masses through the use of back trajectories is commonly performed for source identification. Unlike wind direction, back trajectories provide visualization of not just the local air direction, but transport over a continental scale. While back trajectories are on average accurate to within 20% of the distance traveled, individual back trajectories may be completely incorrect [2]. Thus, a large dataset is required to provide meaningful source identification.

Two approaches have recently emerged for the visualization of air quality data. The first generates a probability map of the areas around a receptor site that contribute to its poor air quality days, as characterized by high PM_2.5 and/or trace gas levels (the so-called Potential Source Contribution Function approach); it is the focus of our work in an upcoming publication [3]. The second, which is the focus of the present work, groups back trajectories with similar distances traveled or similar overall direction, and has been judged the best approach for the visualization of air quality data [4]. Over the last two decades, cluster analysis, which uses physical distances between trajectories, has typically been used to group similar trajectories via algorithms such as average-linkage clustering, Ward’s method and k-means clustering [4]. These algorithms generate different classification results, and their interpretation is often subjective. Consequently, there is no single best grouping algorithm [2], [4], [5], [6], [7], [8].
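As an illustration of distance-based grouping, the following is a minimal Python sketch of k-means applied to back trajectories. It is not the implementation used in this work: the trajectory coordinates are invented, each trajectory is assumed to be a fixed-length list of (x, y) displacements in km from the receptor site, and the names `flatten`, `sq_dist` and `kmeans` are hypothetical.

```python
def flatten(traj):
    """Turn [(x1, y1), (x2, y2), ...] into [x1, y1, x2, y2, ...]."""
    return [coord for point in traj for coord in point]


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means, seeded with the first k vectors so the
    result is reproducible (real studies usually try several seeds)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each trajectory joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Invented example trajectories (km east, km north of the receptor).
north_a = [(0.0, 100.0), (0.0, 200.0)]
south_a = [(-50.0, -50.0), (-100.0, -100.0)]
north_b = [(10.0, 110.0), (20.0, 210.0)]
south_b = [(-60.0, -40.0), (-110.0, -90.0)]

vectors = [flatten(t) for t in (north_a, south_a, north_b, south_b)]
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

Note that the number of clusters `k` has to be chosen in advance, and that a different seeding or a different algorithm can give a different partition, which is one source of the subjectivity mentioned above.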

Another grouping technique that has so far been overlooked for the classification of air mass back trajectories is neural network analysis. Neural networks have long been known to be useful for analyses in synoptic climatology [9]. They have since found some use in general circulation models, as they provide the tightest possible mapping of the complex, non-linear relationships between the atmosphere and the surface environment [10]. However, a neural network study of the transport patterns of air masses, a less complex problem, has not emerged.

The neural network approach differs from cluster analysis in three ways. First, a desired degree of separation must be specified rather than a desired number of clusters. Second, the dot product used in neural net classification incorporates an angular component, relatable in this application to wind direction, rather than the geometric distance used in cluster analysis. Finally, neural nets are designed to learn: when a novel trajectory is encountered, a neural net will create a new class with this trajectory as its sole member. This feature can be incorporated into some cluster analysis algorithms, but it is not fundamental to the method. Hence, sufficient differences existed to suggest that the two methods might produce different insights.
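These differences can be sketched in a few lines of Python. The code below is a deliberately simplified, ART-style classifier and not the ART-2a algorithm of [13], [14], [15] as used in this work: it omits the preprocessing and other details of the full method, and the function names, the `vigilance` and `learning_rate` parameters, and the trajectory vectors are all assumptions made for illustration. It shows the three points above: a vigilance threshold is specified instead of a number of classes, similarity is the dot product of unit-length vectors (the cosine of the angle between them), and a trajectory that matches no existing class founds a new one.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def art_style_classify(vectors, vigilance=0.9, learning_rate=0.1):
    """Assign each vector to the best-matching class, or create a new class
    when the best dot product falls below the vigilance threshold."""
    weights = []  # one unit-length prototype per class
    labels = []
    for v in vectors:
        x = normalize(v)
        scores = [sum(a * b for a, b in zip(x, w)) for w in weights]
        best = max(range(len(weights)), key=lambda j: scores[j]) if weights else None
        if best is not None and scores[best] >= vigilance:
            # Resonance: nudge the winning prototype toward the input.
            blended = [
                (1 - learning_rate) * w + learning_rate * a
                for w, a in zip(weights[best], x)
            ]
            weights[best] = normalize(blended)
            labels.append(best)
        else:
            # Novel pattern: it becomes the sole member of a new class.
            weights.append(x)
            labels.append(len(weights) - 1)
    return labels, weights


# Invented flattened trajectories [x1, y1, x2, y2] in km from the receptor.
slow_south = [0.0, -50.0, 0.0, -100.0]
fast_south = [0.0, -200.0, 0.0, -400.0]
north = [0.0, 50.0, 0.0, 100.0]
southwest = [-50.0, -50.0, -100.0, -100.0]
data = [slow_south, fast_south, north, southwest]

strict_labels, _ = art_style_classify(data, vigilance=0.9)
loose_labels, _ = art_style_classify(data, vigilance=0.5)
print(strict_labels, loose_labels)
```

In this toy example the slow and fast southerly trajectories share a class at either setting because they point in the same direction, even though they are far apart in Euclidean terms. Lowering the vigilance merges the southwesterly trajectory into the southerly class, so the number of classes follows from the chosen degree of separation rather than being fixed beforehand.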

Air masses arriving in Toronto have diverse histories ranging from clean, fast-moving Arctic air to polluted, nearly stagnant Ohio Valley air. In fact, Southern Ontario PM_2.5 concentrations have been reported to be 2 to 4 times higher under southerly or southwesterly flow conditions than under northerly flow conditions [11]. Thus, the ideal trajectory taxonomy would not group trajectories that pass through both northerly and southerly regions before arriving in Toronto with purely northerly or southerly trajectories. Grouping of air trajectories has been used to study the origins of ozone pollution in Toronto [12]. However, this approach has not previously been applied to particulate matter in the region.

For this reason, the main goal of this work was a comparison of the air pollution information provided by cluster analysis and an artificial neural network, Adaptive Resonance Theory (ART-2a) [13], [14], [15], for back trajectories ending in Toronto during a thirteen-month sampling period. The adaptation of cluster analysis and ART-2a to interpret atmospheric pollutant concentration is described. The inter-cluster variation of atmospheric species concentration was also explored, with special attention devoted to those trajectory groups that displayed abnormally large or small concentrations of atmospheric particulate matter. Some reasons for the dissimilarity of the classifications are also suggested.
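The kind of inter-group comparison described here can be sketched as follows. This is only an illustration under stated assumptions: the group labels and PM_2.5 values are invented placeholders rather than measurements from this study, and the helper `group_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (trajectory group, PM2.5 in ug/m3) pairs; the values are invented.
observations = [
    ("north", 4.0),
    ("north", 6.0),
    ("southwest", 18.0),
    ("southwest", 22.0),
    ("stagnant", 15.0),
]


def group_means(pairs):
    """Mean concentration for each trajectory group."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


means = group_means(observations)
overall = mean(value for _, value in observations)

# Ratio of each group's mean to the overall mean highlights groups with
# unusually high or low concentrations.
ratios = {label: m / overall for label, m in means.items()}
for label in sorted(ratios):
    print(f"{label}: mean={means[label]:.1f}, ratio={ratios[label]:.2f}")
```

A real analysis would also need a measure of spread within each group before calling a difference abnormal; the ratio is used here only to keep the sketch short.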

Section snippets

Sampling of airborne pollutants

Urban Toronto PM_2.5 mass concentration, particle number concentration and nitrate PM_2.5 mass concentration were measured for each hour in the sampling duration by a tapered element oscillating microbalance (TEOM 1400A, Rupprecht & Patashnick Co.), an aerodynamic particle sizer (APS) (Model 3321, TSI Inc.), and a real-time nitrate analyzer (Series 8400N, Rupprecht & Patashnick Co.), respectively. SO₂ concentration was also measured hourly. Note that the APS provided a total particle number concentration between 0.5 and 2.5 μm in this work.

Compilation of back trajectories

Comparison of cluster analysis and ART trajectory classifications

Since this was the first application of ART-2a to back trajectory analysis, a comparison between the ART and cluster analysis solutions follows. Due to the different similarity criteria, each ART class was composed of trajectories that were placed in different clusters. Fig. 2 illustrates an example to explain the reasons behind the different trajectory assignments and to highlight some associated implications.
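One simple way to see how the members of each ART class are spread over the clusters is a cross-tabulation of the two label sets. The sketch below assumes each trajectory carries one ART class label and one cluster label; the labels are invented and the helper `cross_tabulate` is hypothetical, not part of the original analysis.

```python
from collections import Counter

# Invented labels for six trajectories, listed in the same order.
art_classes = [7, 7, 7, 3, 3, 3]
clusters = [2, 2, 5, 5, 5, 1]


def cross_tabulate(labels_a, labels_b):
    """Count how often each (label_a, label_b) pair occurs."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same trajectories")
    return Counter(zip(labels_a, labels_b))


table = cross_tabulate(art_classes, clusters)
for (art_class, cluster), count in sorted(table.items()):
    print(f"ART class {art_class} / cluster {cluster}: {count} trajectories")
```

In this toy table, ART class 7 is split across clusters 2 and 5, which is the pattern expected when the two methods use different similarity criteria.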

Trajectories 1 and 3 were assigned to one ART class (class 7), while trajectories 1

Conclusions

In this paper, back trajectories of air masses arriving in Toronto were classified into distinct transport patterns by cluster analysis and a neural network (ART-2a). The application of bulk data to the classification of air mass back trajectories by cluster analysis and a neural network (ART-2a) demonstrated that both techniques provide similar conclusions as to the location of emission sources and the level of pollution associated with a given air transport pattern. Both techniques

Acknowledgements

The authors thank the Canada Foundation for Innovation, Ontario Research and Development Challenge Fund, Natural Sciences and Engineering Research Council (NSERC), Environment Canada, and the Ontario Ministry of the Environment, for funding to construct and operate the University of Toronto Facility for Aerosol Characterization. The authors are also grateful to Environment Canada for the TEOM results and the loan of the nitrate analyzer.

References (20)

A. Stohl, Atmos. Environ. (1998)
J.N. Cape et al., Atmos. Environ. (2000)
E. Brankov et al., Atmos. Environ. (1998)
L.A. Moy et al., Atmos. Environ. (1994)
S.R. Dorling et al., Atmos. Environ. (1992)
E. Brankov et al., Environ. Pollut. (2003)
C. Perrino et al., Atmos. Environ. (2002)
Ontario Medical Association (Ed.), The illness costs of air pollution—findings report,...
S. Owega, G.J. Evans, B. Khan, R.E. Jervis, M. Fila, Ecological modelling (submitted for...
M.E. Fernau et al., J. Appl. Meteorol. (1990)

There are more references available in the full text version of this article.

