Elsevier

Atmospheric Environment

Volume 53, June 2012, Pages 60-74

Model evaluation and ensemble modelling of surface-level ozone in Europe and North America in the context of AQMEII

https://doi.org/10.1016/j.atmosenv.2012.01.003

Abstract

More than ten state-of-the-art regional air quality models have been applied as part of the Air Quality Model Evaluation International Initiative (AQMEII). These models were run by twenty independent groups in Europe and North America. Standardised modelling outputs over a full year (2006) from each group have been shared on the web-distributed ENSEMBLE system, which allows for statistical and ensemble analyses to be performed by each group. The estimated ground-level ozone mixing ratios from the models are collectively examined in an ensemble fashion and evaluated against a large set of observations from both continents. The scale of the exercise is unprecedented and offers a unique opportunity to investigate methodologies for generating skilful ensembles of regional air quality model outputs. Despite the remarkable progress of ensemble air quality modelling over the past decade, there are still outstanding questions regarding this technique. Among them, what is the best and most beneficial way to build an ensemble of members? And how should the optimum size of the ensemble be determined in order to capture data variability while keeping the error low? These questions are addressed here by looking at optimal ensemble size and quality of the members. The analysis carried out is based on systematic minimization of the model error and is important for performing diagnostic/probabilistic model evaluation. It is shown that the most commonly used multi-model approach, namely the average over all available members, can be outperformed by subsets of members optimally selected in terms of bias, error, and correlation. More importantly, this result does not strictly depend on the skill of the individual members, but may require the inclusion of low-ranking skill-score members.
A clustering methodology is applied to discern among members and to build a skilful ensemble based on model association and data clustering, which makes no use of a priori knowledge of model skill. Results show that, while the methodology needs further refinement, by optimally selecting the cluster distance and association criteria, this approach can be useful for model applications beyond those strictly related to model evaluation, such as air quality forecasting.

Introduction

Regional air quality (AQ) models have undergone considerable development over the past three decades, mainly driven by the increased concern regarding the impact of air pollution on human health and ecosystems (Rao et al., 2011). This is particularly true for ozone and particulate matter (e.g., Holloway et al., 2003, Jacob and Winner, 2009). Regional AQ models are now widely used for supporting emissions control policy formulation, testing the efficacy of abatement strategies, performing real-time AQ forecasts, and evaluating integrated monitoring strategies. Moreover, ozone estimates have been used in assimilation schemes to provide further information on meteorological variables such as wind speed (e.g., Eskes, 2003). The combination of outcomes predicted by several models (regardless of their field of application), in what is typically defined as ensemble modelling, has been shown to enhance skill when compared against an individual model realisation (e.g., Delle Monache and Stull, 2003, Galmarini et al., 2004, van Loon et al., 2007). Although ensemble modelling is well established (both from the applied and theoretical perspectives) and is now routinely used in weather forecasting, it is only during the last decade that a growing number of AQ modelling communities have joined their model outputs in multi-model (MM) combinations (Galmarini et al., 2001, Carmichael et al., 2003, Rao et al., 2011). The advantages of ensemble modelling versus an individual model are at least twofold: (i) the mean (or median) of the ensemble is, in effect, a new model that is expected to lower the error of the individual members due to mutual cancellation of errors; and (ii) the spread of the ensemble represents a measure of the variability of the model predictions (Galmarini et al., 2004, Mallet and Sportisse, 2006, Vautard et al., 2006, Vautard et al., 2009, van Loon et al., 2007). 
Potempski and Galmarini (2009) also point out the scientific consensus around MM ensemble techniques as a way of extracting information from many sources and synthetically assessing their variability. In particular, the mean and median offer enhanced performance, on average, compared with single-model (SM) realisations (Delle Monache and Stull, 2003, Galmarini et al., 2004, McKeen et al., 2005, and others).
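
The two advantages listed above, error cancellation in the ensemble mean or median and the spread as a measure of variability, can be illustrated with a minimal Python sketch. The data are synthetic: the "observed" ozone cycle, the five hypothetical members, their biases, and the noise level are all assumptions chosen for illustration and are not AQMEII output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "observed" hourly ozone (ppb) with a daily cycle; illustrative only.
hours = np.arange(24 * 30)
obs = 40.0 + 15.0 * np.sin(2 * np.pi * (hours - 9) / 24)

# Five hypothetical members: each has its own constant bias plus independent noise.
biases = np.array([-6.0, -3.0, 0.0, 3.0, 6.0])
models = obs + biases[:, None] + rng.normal(0.0, 8.0, size=(5, hours.size))


def rmse(pred, obs):
    """Root mean square error between a prediction and the observations."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


member_rmse = [rmse(m, obs) for m in models]

ens_mean = models.mean(axis=0)          # (i) the ensemble mean acts as a new model
ens_median = np.median(models, axis=0)  # (i) the median is a more outlier-resistant alternative
ens_spread = models.std(axis=0)         # (ii) hour-by-hour spread across members

print("member RMSE:", np.round(member_rmse, 2))
print("mean RMSE:  ", round(rmse(ens_mean, obs), 2))
print("median RMSE:", round(rmse(ens_median, obs), 2))
```

In this idealised setting the member errors are independent and the biases are symmetric, so they cancel almost completely in the mean. Real model errors are often correlated, which is one reason the benefit of adding members is not guaranteed.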

A MM ensemble can be generated in many ways (see, e.g., Galmarini et al., 2004), including by varying some internal parameters for multiple simulations with an SM, by using different input data (e.g., emissions) for multiple simulations with an SM, or by applying several different models to the same scenario. This latter approach is the main focus of the Air Quality Model Evaluation International Initiative (AQMEII) (Rao et al., 2011), an international project aimed at joining the knowledge and experiences of AQ modelling groups in Europe and North America. Within AQMEII, standardised modelling outputs have been shared on the web-distributed ENSEMBLE system, which allows statistical and ensemble analyses to be performed by multiple groups (Bianconi et al., 2004, Galmarini et al., 2012). A joint exercise was launched for European and North American AQ modelling communities to use their own regional AQ models to simulate the entire year 2006 for the continents of Europe and North America, retrospectively. Outputs from numerous regional AQ models have been submitted in the form of both gridded, hourly concentration fields and values at specific locations, allowing for direct comparison with air quality measurements available from monitoring networks across North America and Europe (see Rao et al., 2011 for additional details). This type of evaluation, with large temporal and spatial scales, is essential to assess model performance and identify model deficiencies (Dennis et al., 2010, Rao et al., 2011).

In this study, we analyse ozone mixing ratios provided by simulations from eleven state-of-the-art regional AQ models run by eighteen independent groups from North America (NA) and Europe (EU); a companion study examines particulate matter (Solazzo et al., 2012). Model predictions have been made available, along with observational data, to the ENSEMBLE system. The ability of the ensemble mean and median to reduce the error and bias of SM outputs is examined, and conclusions are drawn regarding the size of the ensemble and its quality. The level of repetition provided by each individual model to the ensemble is quantified by applying a clustering analysis, to examine whether the improvement in error obtained with the mean or median of the model ensemble is due to the increased ensemble size, or whether the information carried by each model contributes to the MM superiority.

The main objective of this study is to assess the statistical properties of the ensemble of models in relation to the individual model realisations for a range of air quality cases. Each model has imperfections, and it is beyond the scope of this analysis to identify the causes of model bias for each ensemble member. Several other papers examining the performance of the individual model simulations are available in the AQMEII special issue.

Section snippets

Experimental set up

To carry out a comprehensive evaluation of the participating regional-scale AQ models, the model estimates are compared to observations for the year 2006, with the various modelling groups providing hourly ozone mixing ratios and concentrations of other compounds. Surface concentrations were then interpolated to the monitoring locations for the purposes of model evaluation.

Participating models

Table 1 summarises the meteorological and AQ models participating in the AQMEII intercomparison exercise and

Operational SM and ensemble statistics for the continental-wide domains

van Loon et al. (2007) showed that the ensemble mean ozone daily cycle over EU, obtained by averaging over all monitoring stations for the entire year of 2001, agrees almost perfectly with the observations, and better than any individual member of the ensemble. This result provides substantial evidence of the enhanced skill of MM predictions versus the individual SM predictions. Such a result, though, while encouraging, poses some additional questions, such as what is the role of repeated

Ensemble size

In this section we evaluate whether an ensemble built with a subset of individual models can outperform the ensemble mean of all available members, as anticipated by the theoretical analysis of Potempski and Galmarini (2009). The analysis is done for the sub-regions of EU and NA separately, using hourly ozone data for the period JJA.
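
The question of whether a subset can beat the full ensemble mean can be sketched as an exhaustive search over all member combinations. The example below is a minimal illustration under stated assumptions: the function name `best_subsets`, the four synthetic members, and their biases are hypothetical, and RMSE is used as the only selection metric.

```python
from itertools import combinations

import numpy as np


def rmse(pred, obs):
    """Root mean square error between a prediction and the observations."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))


def best_subsets(models, obs):
    """For each ensemble size k, return (RMSE, member indices) of the
    subset whose mean has the lowest RMSE against the observations."""
    n = models.shape[0]
    best = {}
    for k in range(1, n + 1):
        scored = (
            (rmse(models[list(combo)].mean(axis=0), obs), combo)
            for combo in combinations(range(n), k)
        )
        best[k] = min(scored)  # tuples compare by RMSE first
    return best


# Synthetic demonstration data (assumed values, not AQMEII output).
rng = np.random.default_rng(1)
obs = 40.0 + 15.0 * np.sin(2 * np.pi * np.arange(2000) / 24)
biases = np.array([1.0, 3.0, 8.0, -4.0])  # member 2 is a strong outlier
models = obs + biases[:, None] + rng.normal(0.0, 2.0, size=(4, obs.size))

best = best_subsets(models, obs)
for k, (score, members) in best.items():
    print(k, members, round(score, 2))
```

With these assumed biases, the best three-member subset omits the outlier but keeps the member with bias -4, whose individual skill is poor, because its error offsets the positive biases of the others. The search evaluates 2^n - 1 subsets, which stays tractable for ensembles of roughly a dozen members (2047 subsets for eleven) but grows quickly beyond that.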

Consider the distribution of some statistical measures (RMSE, PCC, MB, MGE, defined in Appendix A) of the mean of all possible combinations of available ensemble

Reduction of data complexity: a clustering approach

Results discussed in the previous section have shown that a skilful ensemble is built with an optimal number of members and often includes low-ranking skill-score members as well. In order to discern which members should be included in the ensemble, a method for clustering highly associated models and then discarding redundant information was developed using the PCC as the determining metric (we note that PCC is independent of model bias; therefore, the analysis would be the same for unbiased
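
The idea of grouping highly associated members by their PCC and keeping one representative per group can be sketched as follows. This is an illustration rather than the procedure used in the study: the distance 1 - PCC, the threshold of 0.1, the single-linkage rule, and the choice of the first member as representative are all assumptions, as are the function name and the synthetic data.

```python
import numpy as np


def cluster_by_correlation(models, max_distance=0.1):
    """Group members whose pairwise distance d = 1 - PCC is at most max_distance.

    This matches cutting a single-linkage dendrogram at max_distance:
    the clusters are the connected components of the "d <= max_distance" graph.
    """
    n = models.shape[0]
    dist = 1.0 - np.corrcoef(models)  # (n, n) matrix of 1 - PCC
    labels = -np.ones(n, dtype=int)   # -1 marks a member not yet assigned
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over closely correlated members
            i = stack.pop()
            neighbours = np.flatnonzero((dist[i] <= max_distance) & (labels < 0))
            for j in neighbours:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


# Synthetic demonstration: three members follow one signal, two follow another.
rng = np.random.default_rng(2)
t = np.arange(24 * 20)
signal_a = np.sin(2 * np.pi * t / 24)
signal_b = np.cos(2 * np.pi * t / 24)
models = np.vstack([signal_a, signal_a, signal_a, signal_b, signal_b])
models = models + rng.normal(0.0, 0.1, size=models.shape)
models = models + np.array([5.0, -3.0, 0.0, 10.0, -7.0])[:, None]  # biases do not affect PCC

labels = cluster_by_correlation(models)
# Keep the first member of each cluster as its representative (arbitrary choice).
_, representatives = np.unique(labels, return_index=True)
print(labels, representatives)
```

No observations enter the clustering, so the grouping uses no prior knowledge of model skill. The constant offsets added to each member leave the clusters unchanged, which reflects the independence of PCC from model bias.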

Conclusions

This study collectively evaluates and analyses the performance of eleven regional AQ models and their ensembles in the context of the AQMEII inter-comparison exercise. The scale of the exercise is unprecedented, with two continent-wide domains being modelled for a full year. The focus of this study was on the collective analysis of surface ozone mixing ratios, rather than on inter-comparing metrics for each individual model. The study began with an analysis of ozone time series for sub-regions

Acknowledgments

The work carried out with the DEHM model was supported by The Danish Strategic Research Program under contract no 2104-06-0027 (CEEH). Homepage: www.ceeh.dk.

References (57)

  • S. Galmarini et al. Ensemble dispersion forecasting. Part I: concept, approach and indicators. Atmospheric Environment (2004)
  • S. Galmarini et al. ENSEMBLE and AMET: two systems and approaches to a harmonised, simplified and efficient assistance to air quality model developments and evaluation. Atmospheric Environment (2012)
  • W. Gong et al. Cloud processing of gases and aerosols in a regional air quality model (AURAMS). Atmospheric Research (2006)
  • A. Guenther et al. Natural volatile organic compound emission rate estimates for US woodland landscapes. Atmospheric Environment (1994)
  • J.A. Herwehe et al. Diagnostic analysis of ozone concentrations simulated by two regional-scale air quality models. Atmospheric Environment (2011)
  • D.J. Jacob et al. Effect of climate change on air quality. Atmospheric Environment (2009)
  • U. Nopmongcol et al. Modeling Europe with CAMx for the Air Quality Model Evaluation International Initiative (AQMEII). Atmospheric Environment (2012)
  • E. Renner et al. Modelling the formation and atmospheric transport of secondary inorganic aerosols with special attention to regions with high ammonia emissions. Atmospheric Environment (2010)
  • A. Russell et al. NARSTO critical review of photochemical models and modelling. Atmospheric Environment (2000)
  • K. Sartelet et al. Impact of biogenic emissions on air quality over Europe and North America. Atmospheric Environment (2012)
  • H. Schmidt et al. A comparison of simulated and observed ozone mixing ratios for the summer of 1998 in western Europe. Atmospheric Environment (2001)
  • S.C. Smyth et al. A comparative performance evaluation of the AURAMS and CMAQ air quality modelling systems. Atmospheric Environment (2009)
  • M. Sofiev et al. A dispersion modeling system SILAM and its evaluation against ETEX data. Atmospheric Environment (2006)
  • E. Solazzo et al. Operational model evaluation for particulate matter in Europe and North America in the context of the AQMEII project. Submitted for publication to Atmospheric Environment (2012)
  • M. van Loon et al. Evaluation of long-term ozone simulations from seven regional air quality models and their ensemble average. Atmospheric Environment (2007)
  • R. Vautard et al. Skill and uncertainty of a regional air quality model ensemble. Atmospheric Environment (2009)
  • R. Vautard et al. Evaluation of the meteorological forcing used for AQMEII air quality simulations. Atmospheric Environment (2012)
  • K.W. Appel et al. Examination of the Community Multiscale Air Quality (CMAQ) model performance for North America and Europe for the AQMEII project. Atmospheric Environment (2012)