Identification of long-range aerosol transport patterns to Toronto via classification of back trajectories by cluster analysis and neural network techniques

https://doi.org/10.1016/j.chemolab.2005.12.009

Abstract

In this work, back trajectories of air masses arriving in Toronto were classified into distinct transport patterns by cluster analysis and, for the first time, by a neural network (Adaptive Resonance Theory—ART-2a). Different similarity criteria were used by the two classification techniques, the former relying on the Euclidean distances between trajectories, the latter on the Euclidean angles between trajectories. Nevertheless, both techniques provided similar conclusions as to the location of PM2.5 emission sources and the level of pollution associated with a given air transport pattern. Both techniques illustrated the cleaner nature of northerly and northwesterly transport patterns in comparison to southerly and southwesterly ones, as well as the effect of near stagnant air masses. In addition, ART-2a resolved a much larger percentage of trajectories than cluster analysis into groups with clearly identifiable transport patterns and compared favourably with cluster analysis with respect to the precision of the classification.

Introduction

Identifying the sources of airborne pollutants is of great importance to the study of fine particulate matter (PM2.5), which has been linked to adverse health effects [1]. The examination of transport patterns of air masses through the use of back trajectories is commonly performed for source identification. Unlike wind direction, back trajectories provide visualization of not just the local air direction, but transport over a continental scale. While back trajectories are on average accurate to within 20% of the distance traveled, individual back trajectories may be completely incorrect [2]. Thus, a large dataset is required to provide meaningful source identification.

Two approaches have recently emerged for the visualization of air quality data. The first generates a probability map of the areas around a receptor site that contribute to its poor air quality days, as characterized by high PM2.5 and/or trace gas levels (the so-called Potential Source Contribution Function approach). This is the focus of our work in an upcoming publication [3]. The second approach, the focus of the present work, groups back trajectories with similar distances traveled or similar overall direction, and it has been judged the best approach for the visualization of air quality data [4]. Over the last two decades, similar trajectories have typically been grouped by cluster analysis, which uses physical distances between trajectories, via algorithms such as average-linkage clustering, Ward’s method and k-means clustering [4]. These algorithms generate different classification results and their interpretation is often subjective. Consequently, there is no one best grouping algorithm [2], [4], [5], [6], [7], [8].
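As a minimal sketch of the distance-based criterion used by cluster analysis, the snippet below treats each back trajectory as a list of (x, y) positions at matching time steps and assigns it to the nearest group mean, as in a single assignment step of k-means. The coordinates, the planar kilometre offsets from the receptor, and the names `trajectory_distance` and `assign_to_nearest` are illustrative assumptions, not the procedure or data used in this work.

```python
import numpy as np

def trajectory_distance(a, b):
    """Euclidean distance between two trajectories sampled at the same time steps."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def assign_to_nearest(trajectories, centroids):
    """Return, for each trajectory, the index of the closest centroid."""
    return [
        int(np.argmin([trajectory_distance(t, c) for c in centroids]))
        for t in trajectories
    ]

# Hypothetical trajectories: (x, y) offsets in km from the receptor at two time steps.
north_fast = [(0, 400), (0, 800)]
north_slow = [(0, 50), (0, 100)]
south_slow = [(0, -50), (0, -100)]

# Two hypothetical group means, used here only to illustrate the assignment step.
centroids = [north_fast, south_slow]
labels = assign_to_nearest([north_fast, north_slow, south_slow], centroids)
print(labels)
```

In this toy case the slow northerly trajectory lies closer, in Euclidean terms, to the slow southerly one than to the fast northerly one, which illustrates how a distance criterion is sensitive to the distance traveled as well as to direction.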

Another grouping technique that has so far been overlooked for the classification of air mass back trajectories is neural network analysis. Neural networks have long been known to be useful for analyses in synoptic climatology [9]. They have since found some use in general circulation models, as they provide the tightest possible mapping of the complex, non-linear relationships between the atmosphere and the surface environment [10]. However, no neural network study of the transport patterns of air masses, a less complex problem, has yet emerged.

The neural network approach differs from cluster analysis in three ways. First, a desired degree of separation must be specified rather than a desired number of clusters. Second, the dot product used in neural net classification incorporates an angular component, relatable in this application to wind direction, rather than the geometric distance used in cluster analysis. Finally, neural nets are designed to learn: when a novel trajectory is encountered, a neural net will create a new class with this trajectory as its sole member. This feature can be incorporated into some cluster analysis algorithms but it is not fundamental to the method. Hence, sufficient differences existed to suggest that the two methods might produce different insights.
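The three features described above can be sketched in a few lines. This is a simplified, ART-2a-style pass written for illustration only: it omits the preprocessing and thresholding steps of the full ART-2a algorithm, and the vigilance value (the "degree of separation"), the learning rate, the trajectory coordinates and the function names are all assumptions rather than the settings used in this work.

```python
import numpy as np

def unit(v):
    """Flatten and scale to unit length, so a dot product equals the cosine of the angle."""
    v = np.asarray(v, dtype=float).ravel()
    return v / np.linalg.norm(v)

def art2a_like(trajectories, vigilance=0.9, learning_rate=0.05):
    """Simplified ART-2a-style pass; returns (labels, class weight vectors)."""
    weights, labels = [], []
    for traj in trajectories:
        x = unit(traj)
        sims = [float(np.dot(w, x)) for w in weights]
        best = int(np.argmax(sims)) if sims else -1
        if best >= 0 and sims[best] >= vigilance:
            # Close enough in angle: nudge the winning class toward this trajectory.
            weights[best] = unit((1 - learning_rate) * weights[best] + learning_rate * x)
            labels.append(best)
        else:
            # Novel pattern: create a new class with this trajectory as its sole member.
            weights.append(x)
            labels.append(len(weights) - 1)
    return labels, weights

# Hypothetical trajectories: (x, y) offsets in km from the receptor at two time steps.
north_fast = [(0, 400), (0, 800)]
north_slow = [(0, 50), (0, 100)]
south_slow = [(0, -50), (0, -100)]

labels, weights = art2a_like([north_fast, north_slow, south_slow])
print(labels)
```

Because each trajectory is normalized before comparison, the fast and slow northerly trajectories in this toy case fall into the same class, while the southerly one starts a new class. The number of classes is not fixed in advance; it follows from the chosen vigilance.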

Air masses arriving in Toronto have diverse histories ranging from clean, fast-moving Arctic air to polluted, nearly stagnant Ohio Valley air. In fact, Southern Ontario PM2.5 concentrations have been reported to be 2 to 4 times higher under southerly or southwesterly flow conditions than under northerly flow conditions [11]. Thus, the ideal trajectory taxonomy would not group trajectories that pass through both northerly and southerly regions before arriving in Toronto with purely northerly or southerly trajectories. Grouping of air trajectories has been used to study the origins of ozone pollution in Toronto [12]. However, this approach has not previously been applied to particulate matter in the region.

For this reason, the main goal of this work was to compare the air pollution information provided by cluster analysis and by an artificial neural network, Adaptive Resonance Theory (ART-2a) [13], [14], [15], for back trajectories ending in Toronto during a thirteen-month sampling period. The adaptation of cluster analysis and ART-2a to the interpretation of atmospheric pollutant concentrations is described. The inter-cluster variation of atmospheric species concentrations was also explored, with special attention devoted to those trajectory groups that displayed abnormally large or small concentrations of atmospheric particulate matter. Some reasons for the dissimilarity of the classifications are also suggested.

Sampling of airborne pollutants

Urban Toronto PM2.5 mass and number concentration, SO2 concentration and nitrate PM2.5 mass concentration were measured for each hour of the sampling period by a tapered element oscillating microbalance (TEOM 1400A, Rupprecht & Patashnick Co.), an aerodynamic particle sizer (APS) (Model 3321, TSI Inc.), and a real-time nitrate analyzer (Series 8400N, Rupprecht & Patashnick Co.), respectively. Note that the APS provided a total particle number concentration between 0.5 and 2.5 μm in this work.

Compilation of back trajectories

Comparison of cluster analysis and ART trajectory classifications

Since this was the first application of ART-2a to back trajectory analysis, a comparison between the ART and cluster analysis solutions follows. Due to the different similarity criteria, each ART class was composed of trajectories that were placed in different clusters. Fig. 2 illustrates an example to explain the reasons behind the different trajectory assignments and to highlight some associated implications.

Trajectories 1 and 3 were assigned to one ART class (class 7), while trajectories 1

Conclusions

In this paper, back trajectories of air masses arriving in Toronto were classified into distinct transport patterns by cluster analysis and a neural network (ART-2a). The application of bulk data to the classification of air mass back trajectories by cluster analysis and a neural network (ART-2a) demonstrated that both techniques provide similar conclusions as to the location of emission sources and the level of pollution associated with a given air transport pattern. Both techniques

Acknowledgements

The authors thank the Canada Foundation for Innovation, Ontario Research and Development Challenge Fund, Natural Sciences and Engineering Research Council (NSERC), Environment Canada, and the Ontario Ministry of the Environment, for funding to construct and operate the University of Toronto Facility for Aerosol Characterization. The authors are also grateful to Environment Canada for the TEOM results and the loan of the nitrate analyzer.

References (20)

  • A. Stohl, Atmos. Environ. (1998)
  • J.N. Cape et al., Atmos. Environ. (2000)
  • E. Brankov et al., Atmos. Environ. (1998)
  • L.A. Moy et al., Atmos. Environ. (1994)
  • S.R. Dorling et al., Atmos. Environ. (1992)
  • E. Brankov et al., Environ. Pollut. (2003)
  • C. Perrino et al., Atmos. Environ. (2002)
  • Ontario Medical Association (Ed.), The illness costs of air pollution—findings report,...
  • S. Owega, G.J. Evans, B. Khan, R.E Jervis, M. Fila, Ecological modelling (submitted for...
  • M.E. Fernau et al., J. Appl. Meteorol. (1990)
There are more references available in the full text version of this article.
