Comparison of OMI NO2 tropospheric columns with an ensemble of global and European regional air quality models

Abstract. We present a comparison of tropospheric NO2 from OMI measurements to the median of an ensemble of Regional Air Quality (RAQ) models, and an intercomparison of the contributing RAQ models and two global models for the period July 2008–June 2009 over Europe. The model forecasts were produced routinely on a daily basis in the context of the European GEMS ("Global and regional Earth-system (atmosphere) Monitoring using Satellite and in-situ data") project. The tropospheric vertical column of the RAQ ensemble median shows a spatial distribution which agrees well with the OMI NO2 observations, with a correlation r=0.8. This is higher than the correlations from any one of the individual RAQ models, which supports the use of a model ensemble approach for regional air pollution forecasting. The global models show high correlations compared to OMI, but with significantly less spatial detail, due to their coarser resolution. Deviations in the tropospheric NO2 columns of individual RAQ models from the mean were in the range of 20–34% in winter and 40–62% in summer, suggesting that the RAQ ensemble prediction is relatively more uncertain in the summer months. The ensemble median shows a stronger seasonal cycle of NO2 columns than OMI, and the ensemble is on average 50% below the OMI observations in summer, whereas in winter the bias is small. On the other hand the ensemble median shows a somewhat weaker seasonal cycle than NO2 surface observations from the Dutch Air Quality Network, and on average a negative bias of 14%. Full profile information was available for two RAQ models and for the global models. For these models the retrieval averaging kernel was applied. Minor differences are found for area-averaged model columns with and without applying the kernel, which shows that the impact of replacing the a priori profiles by the RAQ model profiles is on average small. 
However, the contrast between major hotspots and rural areas is stronger for the direct modeled vertical columns than the columns where the averaging kernels are applied, related to a larger relative contribution of the free troposphere and the coarse horizontal resolution in the a priori profiles compared to the RAQ models. In line with validation results reported in the literature, summertime concentrations in the lowermost boundary layer in the a priori profiles from the DOMINO product are significantly larger than the RAQ model concentrations and surface observations over the Netherlands. This affects the profile shape, and contributes to a high bias in OMI tropospheric columns over polluted regions. The global models indicate that the upper troposphere may contribute significantly to the total column and it is important to account for this in comparisons with RAQ models. A combination of upper troposphere model biases, the a priori profile effects and DOMINO product retrieval issues could explain the discrepancy observed between the OMI observations and the ensemble median in summer.


Introduction
NO2 is a key chemical variable determining air quality. It affects human health directly, and indirectly through increased ozone concentrations (Godowitch et al., 2008), as NO2 acts as a catalyst in ozone formation (Knowlton et al., 2004). The trace gases relevant for regional air quality are affected by local sources and weather conditions, but also by changing background conditions influenced by long range transport of pollution from elsewhere. Regional Air Quality (RAQ) models have been developed in many countries to describe and forecast surface concentrations of health-related species, such as O3, aerosols and NOx. As the quality of the RAQ models improves, their use in an operational system for the provision of daily forecasts of regional air pollution levels comes within reach. Examples are the French Prevair system (Rouil et al., 2009), or the US AIRNow system (http://www.airnow.gov). NO2 is one of the key trace gases that is extensively monitored, and is subject to health regulations.
The European project "Global and regional Earth-system (atmosphere) Monitoring using Satellite and in-situ data" (GEMS) has developed a pre-operational system for forecasting the chemical composition of the atmosphere, both on the global scale and on the regional scale for Europe (Hollingsworth et al., 2008).
Three global Chemistry Transport Models (CTMs) are incorporated in the GEMS system. The MOZART model (Horowitz et al., 2003; Kinnison et al., 2007) was coupled to ECMWF's integrated forecast system (IFS) (Flemming et al., 2009). This coupled system delivers daily forecasts for reactive trace gases. The models MOCAGE (Josse et al., 2004; Bousserez et al., 2007) and TM5 (Krol et al., 2005) have been running in an offline mode and, in an experimental phase, in a forecast mode coupled to IFS.
As part of the GEMS project an ensemble of ten RAQ models has been set up independently to deliver forecasts of trace gases on a daily basis, up to three days ahead. Several RAQ models use the IFS forecast as meteorological driver and most RAQ models use the global MOZART-IFS forecasts for their trace gas boundary conditions. A high-resolution anthropogenic emission inventory has become available during GEMS (Visschedijk et al., 2007), and was used by most RAQ models.
Apart from these common elements the RAQ models differ significantly with respect to the applied chemical mechanisms, and detailed implementation of the transport schemes, meteorological processes and emissions. This diversity is an important motivation for the multi-model ensemble forecast approach adopted in the GEMS project. Several studies have shown that a model ensemble mean or median performs better than the best individual model, e.g. van Loon et al. (2007). Furthermore, the spread of an air-quality model ensemble may serve as an indicator of the uncertainty of the ensemble forecast.
In this paper we compare 12 months of semi-operational global and regional forecast results from the RAQ ensemble with satellite NO2 measurements. During this one year of operations some of the models changed their configuration related to model upgrades (e.g. increasing resolution) and bug-fixes (e.g. implementation of emissions). These changes are listed in the model description, Sect. 2.1.
During the GEMS project the RAQ models were routinely verified against surface observations of trace gases and OMI NO2 satellite observations. Although the verification against surface observations is most relevant from the perspective of air pollution levels at the surface, there are complicating factors with this type of validation, in particular concerning the representativity, coverage and the measurement accuracy of the surface observations. Complementary to the surface observations, satellite data can give valuable insight into the quality of the models, because they provide complete coverage and contain information on concentrations aloft, i.e. in the full boundary layer and the free troposphere.
Satellite data have been used in several studies to validate global CTMs. For instance, van Noije et al. (2006) have performed a multi-model intercomparison for NO2 on a global scale, based on GOME retrievals. In their study both the retrievals and the models were smoothed to a common 5×5° grid. It highlighted the differences in the models, but also showed significant differences between the retrieval algorithms. A more detailed analysis on a 0.5° grid for a regional model (CHIMERE) using the SCIAMACHY NO2 data and surface observations was presented by Blond et al. (2007). In these studies some of the differences between models as well as differences of models compared to NO2 retrievals remained unclarified. For instance, the effect of the model resolution, related to the large spatial/temporal gradients and the short lifetime of NO2, has not been considered. The use of the retrieval averaging kernel in the comparisons has an impact when the model profile shape is different from the a priori profile used in the satellite retrieval (Eskes and Boersma, 2003). The intercomparison of RAQ models, which are designed to simulate the chemistry and dynamics in surface concentrations, and global CTMs, which are more focussed on simulating background concentrations in the free troposphere, can also be used to quantify altitude-dependent model uncertainties.
Atmos. Chem. Phys., 10, 2010, www.atmos-chem-phys.net/10/3273/2010/. V. Huijnen et al.: Comparison of NO2 in regional and global models to OMI.
In the analysis of modeled tropospheric columns the NO2 contribution from the free troposphere needs to be accounted for (Napelenok et al., 2008), in particular because the satellite is generally more sensitive to NO2 in the free troposphere. Therefore the combination of global and regional scale models compared with both surface observations and satellite retrievals helps to attribute model errors at different levels.
In this study we compare the tropospheric NO2 column data derived from the OMI satellite instrument, the DOMINO product, to the NO2 forecasts produced by the RAQ models and global CTMs. This retrieval product contains the averaging kernel as well as the a priori profile shapes. The DOMINO product was validated in several studies, e.g. Boersma et al. (2008, 2009b); Brinksma et al. (2008). OMI achieves a resolution of up to 13×24 km^2 at nadir, with a daily global coverage. This makes the data very suitable for the daily comparison to the high-resolution RAQ model predictions (with a typical resolution of 0.2×0.2°). Because of its daily coverage a sufficient amount of data is available for a quantitative, statistical analysis on a monthly basis.
Eight members of the RAQ ensemble have been providing tropospheric NO2 concentration fields on an hourly basis. We intercompare these model results in terms of total columns, profile shape and surface concentrations from July 2008 to June 2009 over the European domain. The ensemble median is used as reference to which the individual models and OMI retrievals are compared. This gives information on the model spread, a measure of the uncertainty of the ensemble forecast. The impact of averaging kernels on modeled columns is assessed, in relation to the vertical profiles. Additionally the ensemble median and the individual models are compared against surface observations from the Dutch Air Quality Monitoring Network (LML) (Beijk et al., 2007).
An analysis of NO2 from two global models, MOZART-IFS and TM5, is also included. This gives information on the consistency between the regional and global models. It also illustrates the effect of using a limited domain in the horizontal and in the vertical in the RAQ models, versus a limited resolution in the global models. A sensitivity study with TM5, with the use of a regional 1×1° resolution over the EU-RAQ domain, versus a global 3×2° baseline version is used to investigate the resolution issue in more detail.

Participating models
In this section we describe the models that contributed to this study. The models are all participating in the EU-GEMS project. Included are two global models (MOZART-IFS and TM5), and eight RAQ models.

Regional models
The contributing regional models are BOLCHEM (Mircea et al., 2008), CAC (Gross et al., 2007), CAMx (Morris et al., 2003), CHIMERE (Bessagnet et al., 2008), EMEP, EURAD-IM (Elbern et al., 2007), MATCH (Andersson et al., 2007) and SILAM (Sofiev et al., 2008a). All these models delivered tropospheric NO2 columns on an hourly basis, up to 72 h forecast time. The model domain ranges from −15 to 35° longitude and 35 to 70° latitude. The RAQ models differ substantially in resolution (0.15-0.5°), model top (100-500 hPa), meteorology, chemical mechanism and transport scheme. A model specification is provided in Table 1. Four models directly use meteorology from the IFS operational forecasts, whereas EURAD-IM and CAMx use the MM5 model (Kain, 2002). BOLCHEM uses the BOLAM meteorological model and CAC uses HIRLAM. All these regional meteorological models use initial and boundary values provided by the operational IFS forecast. The chemical mechanisms in CAC and CAMx are based on updated versions of the CBM-IV mechanism (Gery et al., 1989). CHIMERE uses the MELCHIOR II mechanism (Schmidt et al., 2001). BOLCHEM applies the SAPRC90 gas chemistry mechanism (Carter, 1990).
The EURAD-IM model applies a 3-D-var data assimilation procedure before the beginning of a forecast, which uses NO2 concentrations from ground-based measurements of the European air quality networks. Most of the RAQ models, except for CAC and EMEP, use boundary conditions for trace gases, including O3, CO, NO, NO2, PAN and HNO3 (horizontally and at model top), from the MOZART-IFS forecast system. The EMEP model applies climatological data for most species, and a constant boundary value for O3 of 40 ppb.
All RAQ models have produced daily semi-operational 3-day forecasts and the present study is based on the accumulated output produced in the course of one year. During this year some model upgrades were implemented. The MATCH model resolved a bug in the application of the NOx emissions and applied the MOZART-IFS boundary conditions from the first of November onwards. The EURAD-IM model increased its resolution to 0.15×0.125° after 13 February 2009.
Table 1 (partial; CAC and CAMx entries only). CAC: Gross et al. (2007); A. Gross; L25, 250 hPa; Undén et al. (2002); Gery et al. (1989); Carter (1996); Yang et al. (2005). CAMx: Morris et al. (2003); NKUA, I. Kioutsioukis, A. Poupkou; 0.3×0.3, L15, 300 hPa; MM5/ECMWF; CBM-IV + updates, Gery et al. (1989), Carter (1996); Collela and Woodward (1984); K-theory, coeff. from MM5, Hong and Pan (1996).

Global models
The MOZART-IFS forecast run (experiment ez2m; Flemming et al., 2009) is based on MOZART-3 (Kinnison et al., 2007; Horowitz et al., 2003), coupled to ECMWF's Integrated Forecasting System (IFS). Advection is treated by a numerically fast, flux form semi-Lagrangian transport scheme (Lin and Rood, 1996). The chemical mechanism contains the chemical families Ox, NOx, HOx, ClOx and BrOx, as well as CH4 and a series of Non-Methane Hydrocarbons (NMHCs).
In total there are about 108 species, over 200 gas-phase reactions and 70 photolytic processes (Horowitz et al., 2003; Kinnison et al., 2007). The current version applies a Gaussian grid with a resolution of about 1.875° longitude/latitude and a distribution of 60 layers, with the top layer at 0.1 hPa. This system has run continuously from January 2008 to April 2009, delivering global forecasts of trace gases up to three days ahead. This experiment is based on a free-running coupled system, i.e. without data assimilation. The TM5 model (Krol et al., 2005), version KNMI-cy3-GEMS, is employed offline, and uses the operational meteorological fields from ECMWF. The baseline horizontal resolution is 3×2° longitude/latitude. In the current setup the model has 34 vertical layers with the top layer at 0.1 hPa. The chemistry scheme in TM5 is based on a modified CBM-IV mechanism (Gery et al., 1989; Houweling et al., 1998). The main modifications concern an extension of the methane oxidation chemistry and updating the product distribution for the isoprene oxidation reactions. This improves the performance for background conditions (Houweling et al., 1998). The rate constants have been updated to the latest recommendations from JPL (Sander et al., 2006). Tracer advection is evaluated with the "slopes" scheme (Russell and Lerner, 1981), and turbulent transport is according to Holtslag and Boville (1993). Another difference compared to the standard version of TM5 is that transport of NO2 and NO is evaluated explicitly, rather than using a scaling by NOx. For this study model runs were performed with the baseline resolution as well as with a zoom region with a resolution over Europe of 1×1°. These model runs are denoted as TM5 and TM5-Zoom, respectively. The tropospheric column is evaluated based on a definition for the tropopause where O3 exceeds 150 ppb. Above Europe this is at about 200 hPa.
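The tropopause criterion described above can be illustrated with a minimal sketch. It assumes the model output is available as per-layer NO2 partial columns and O3 mixing ratios ordered from the surface upward; the function name, argument names and the choice to stop at the first layer exceeding the threshold are illustrative assumptions, not the TM5 implementation.

```python
def tropospheric_column(no2_partial_columns, o3_ppb, o3_threshold_ppb=150.0):
    """Sum NO2 partial columns from the surface up to the chemical tropopause.

    Assumptions (hypothetical layout): both lists are ordered from the
    surface upward, one entry per model layer, and the tropopause is taken
    as the first layer in which O3 exceeds the threshold.
    """
    total = 0.0
    for no2, o3 in zip(no2_partial_columns, o3_ppb):
        if o3 > o3_threshold_ppb:
            break  # this layer and everything above counts as stratosphere
        total += no2
    return total


# Illustrative numbers only (units of 1e15 molec/cm^2 and ppb).
example_column = tropospheric_column([2.0, 1.0, 0.5, 0.3], [40.0, 60.0, 120.0, 300.0])
```

With these made-up values the three lowest layers are summed and the top layer, where O3 is 300 ppb, is excluded.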
point-sources, which may be injected into higher model levels. The total amount of anthropogenic NOx emissions for the EU RAQ domain is 4.2 Tg N/yr. The emission inventory in both global models is based on the RETRO inventory for the year 2000 (http://retro.enes.org), see Table 2. In Fig. 1 the yearly-average NOx emissions from RETRO and TNO are shown. The high resolution of the TNO emissions as compared to RETRO is clear from this figure. On average for the RAQ domain the total NOx anthropogenic emissions for RETRO are about 10% higher, due to higher emissions over the western part of Europe (see Fig. 2, light-blue region). In this region, the emissions are on average about 2.5 times higher than what is specified by TNO. Over east and south-east Europe the inventories are on average more alike, and occasionally TNO is higher.
Recently emission inventories for Greece (Markakis et al., 2010) and the Greater Istanbul Area (Markakis et al., 2009) have been compiled based on detailed activity data as well as national emission reports employing bottom-up methodologies. The comparison between these inventories and the TNO inventory indicates a possible underestimation of NO2 in the TNO inventory of 26% for Greece and 57% for Istanbul, which has seen substantial economic growth in the past ten years.
In SILAM the EMEP inventory (Tarrasón et al., 2005) is used to fill in the missing emissions in the TNO inventory for some eastern European and Asian countries. In CAMx, CHIMERE, BOLCHEM, MATCH and SILAM ship emissions based on the EMEP inventory (Vestreng, 2003) have been included. In contrast to the global CTMs, the regional models apply a diurnal cycle and distinguish between working days and weekends. The implementation of this temporal variability is different between the models, for instance in CAMx this is based on the GENEMIS project (Society, 1994). The NOx emissions are injected as a combination of NO and NO2. In the RAQ models the fraction of NO emissions varies between 85%, as in BOLCHEM, and 95% as in EMEP.
In TM5 and MOZART-IFS the NOx emissions are injected in the model as NO. Unfortunately, due to an implementation error the actual NO emissions applied in the MOZART-IFS forecast system were scaled down by approximately a factor two compared to the original RETRO inventory. Also different to the RAQ models, the global models include parameterizations for lightning NOx emissions, aircraft emissions and a climatological emission set for biomass burning. The lightning and aircraft emissions as applied in TM5 are slightly larger than in MOZART-IFS (Table 2). Of the RAQ models only the EMEP model includes a parametrization for lightning NOx production (Köhler et al., 1995).

DOMINO Product description
OMI has an overpass at approximately 13:30 LT and achieves a resolution of 13 km along track and 24 km in nadir across track, with its highest resolution at small viewing zenith angles. It obtains global coverage within one day, as OMI observes the atmosphere with a 114° field of view corresponding to a 2600 km wide spatial swath. This image is constructed from 60 discrete viewing angles, perpendicular to the flight direction.
In this study we compare the modeled NO2 columns to tropospheric columns from the DOMINO product, version 1.0.2. The retrieval algorithm for the DOMINO product has been described by Boersma et al. (2007, 2009a). Slant columns for NO2 are retrieved using the differential optical absorption spectroscopy technique (DOAS) in the 405-465 nm range. For the evaluation of tropospheric columns a combined retrieval-assimilation-modelling approach is used. The stratospheric NO2 columns are obtained by running the TM4 chemistry transport model forward in time based on assimilated NO2 information from previously observed orbits. For the evaluation of the retrieval Air Mass Factor (AMF), the TM4 tropospheric NO2 profiles simulated for 13:30 LT are used. TM4 evaluates the tropospheric composition on a 3×2° resolution and uses basically the same chemical mechanism as in TM5, as described in Houweling et al. (1998). Cloud fraction and cloud pressure are obtained by the O2-O2 algorithm (Acarreta et al., 2004). The main differences of version 1.0.2 from the previously described version 0.8 are the use of level-1 radiance and irradiance spectra with much improved instrument calibration parameters (Collection 3, see Dobber et al., 2008), and the switching off of the a posteriori viewing-angle dependent corrections. Prior to 17 February 2009 surface albedo values from combined TOMS and GOME data sets were used in the standard DOMINO product. After this date, a surface albedo map derived from the OMI database at 471 nm has been used. The OMI datasets are publicly available from the TEMIS project website (http://www.temis.nl).

Fig. 2. Illustration of regions as defined in
For this study the retrieved tropospheric NO2 columns have been filtered for pixels where the fraction of the satellite-observed radiance originating from clouds is less than 50%. This roughly corresponds to cloud fractions below 10-20%, which implies that the models are evaluated for (nearly) clear-sky conditions. In cases where multiple measurements are available at the same location for the same day a weighting of observation data is applied, based on the squared cosine of the satellite viewing zenith angle. In this way high resolution observations are given more weight than observations at the side of the swath. During the analysis period several row anomalies occurred in OMI data. The affected rows have been removed from the data set, see http://www.temis.nl and Boersma et al. (2009a).
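As an illustration of the filtering and weighting just described, the following sketch combines several same-day observations at one location. The tuple layout, the function name and the handling of the case where no observation passes the filter are assumptions made for this example; only the 50% cloud radiance threshold and the squared-cosine weight come from the text.

```python
import math


def combine_observations(observations, max_cloud_radiance_fraction=0.5):
    """Weighted mean of same-day NO2 columns at one location.

    observations: list of (column, cloud_radiance_fraction, vza_deg) tuples
    (hypothetical layout). Pixels whose cloud radiance fraction is not below
    the threshold are skipped; the rest are weighted by cos^2 of the viewing
    zenith angle, so near-nadir pixels count more than swath-edge pixels.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for column, cloud_radiance_fraction, vza_deg in observations:
        if cloud_radiance_fraction >= max_cloud_radiance_fraction:
            continue  # too cloudy: rejected by the clear-sky filter
        weight = math.cos(math.radians(vza_deg)) ** 2
        weighted_sum += weight * column
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0.0 else None


# Illustrative values: a nadir pixel, a swath-edge pixel and a cloudy pixel.
combined = combine_observations([(2.0, 0.1, 0.0), (4.0, 0.2, 60.0), (9.0, 0.8, 10.0)])
```

In this example the pixel at 60° receives a quarter of the weight of the nadir pixel, and the cloudy pixel does not contribute at all.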

Uncertainties in the DOMINO product
The contributions to the error estimate in the tropospheric NO2 column are described in Boersma et al. (2004). The uncertainty due to cloud fraction (and aerosols) was estimated to be up to 30% for polluted regions and uncertainties due to the surface albedo up to 25%. For the retrieval of the vertical NO2 column an a priori estimate of the NO2 profile is needed. Errors in the a priori profile shape can be caused by an under-representation of the OMI pixels, due to the low spatial resolution of the a priori concentration field. The uncertainty in the tropospheric AMF due to the model profile is evaluated for the GOME retrieval by Boersma et al. (2004), and is estimated to be of the order of 10%.
Recently it was shown that an improved surface albedo map leads to an average decrease of the OMI NO2 columns by about 12% in September over the Netherlands (Hains et al., 2010). They also found that the DOMINO product in September over the Netherlands is over-estimating the total columns by 10% when using the TM4 profiles, compared to using LIDAR measurement profiles. This was attributed to a too modest mixing of the boundary layer in the TM4 model. For measurement locations in less polluted regions the a priori profile shapes are generally well in line with the observations. A study where TM4 a priori profiles were replaced with GEOS-Chem profiles (Lamsal et al., 2010), which assumes full mixing in the planetary boundary layer, confirmed these findings. Also Zhou et al. (2009) reported a high bias over rural areas in spring and summer over the Po Valley and the Swiss Plateau. Another effect that leads to systematic errors in the current DOMINO product concerns the Air Mass Factor (AMF) for the lowest model layer. The interpolation method used results in too low values for the lowest box AMF and consequently 0-20% too high tropospheric NO2 columns (Zhou et al., 2009). Taken together the above results suggest that the current OMI product is biased high over polluted regions by 0-40%, especially in summer.

Intercomparison approaches
In this study we use the model fields from the first forecast day only, as we are mainly focussing on the general differences of NO2 between models and OMI, rather than their forecast skills over time. Ideally for all models the inner product of the simulated profiles and the OMI averaging kernels should be taken, before comparing the modeled retrieval equivalents to the DOMINO product. Unfortunately full 3-D information is available only for two RAQ models. Instead the OMI product is directly compared to the modeled total columns, which are readily available. To investigate the effect of the neglect of the averaging kernels on the model results, we have performed a sensitivity test for two RAQ models for which the full 3-D model output is present, see Sect. 10. For the intercomparison of modeled total columns to the retrieval product, the model data are interpolated in space and time to the OMI measurement points. Specifically, the model data are collocated at the OMI measurement points, which means that implicitly the same cloud cover selection criteria as for the OMI observations are used. Next, the measurement data and the corresponding model data are regridded onto a common 0.1×0.1° grid. The ensemble median is then created from the daily median of the tropospheric columns from all contributing regional models, in every grid cell.
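Two of the steps above lend themselves to a short sketch: the inner product of a model profile with the averaging kernel, and the per-cell ensemble median. The example assumes the model profile is given as partial columns already interpolated to the kernel layers, and that gridded fields are flat lists with None marking cells without data; these layouts and the function names are assumptions for illustration only.

```python
from statistics import median


def kernel_weighted_column(averaging_kernel, partial_columns):
    """Inner product of the averaging kernel with model partial columns.

    Both sequences are assumed to be defined on the same vertical layers;
    the result is the model equivalent of the retrieved column.
    """
    return sum(a * x for a, x in zip(averaging_kernel, partial_columns))


def ensemble_median(model_fields):
    """Per-cell median over models.

    model_fields: list of equally long lists, one per model, with None where
    a model has no collocated value (hypothetical gap marker).
    """
    result = []
    for cell_values in zip(*model_fields):
        valid = [v for v in cell_values if v is not None]
        result.append(median(valid) if valid else None)
    return result


# Illustrative numbers only.
column_with_kernel = kernel_weighted_column([0.5, 1.0, 1.5], [2.0, 1.0, 1.0])
median_field = ensemble_median([[1.0, 4.0], [3.0, None], [2.0, 6.0]])
```

When the kernel is smaller than one near the surface and larger aloft, as in this made-up example, a profile concentrated near the surface yields a kernel-weighted column below its direct sum, which is the kind of profile-shape dependence the sensitivity test examines.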
For the intercomparison to surface observations from the Dutch Air Quality Monitoring Network the model output is interpolated in space and time to the available measurements from all rural sites. All available observations are averaged on a monthly basis.
Seven regions have been defined to facilitate the comparison of the models in different parts over the RAQ domain, see Table 3 and Fig. 2. During winter months there are no retrievals available over the northern part of Europe, due to large solar zenith angles. To intercompare area-averaged statistics for different months, a "mid/southern-Europe" region is defined where all year round OMI data are available. The region over the Netherlands is defined in order to relate the comparison to OMI observations with the analysis at the surface.

Comparison of the RAQ ensemble median with OMI observations
Maps of monthly mean tropospheric NO2 columns for the ensemble median of the regional models are given in Fig. 3 for August, December 2008 and April 2009, as compared to OMI NO2 observations. The scale is approximately logarithmic and ranges over two orders of magnitude. In general the ensemble median captures the observed locations of high and low NO2 columns over the densely populated regions, like the Benelux region and the large cities in Europe, and the low values over the Atlantic ocean. The spatial correlation between the RAQ ensemble median over the mid/south region and the OMI observations is r=0.80 both in August and in December. For this evaluation the ensemble and the observations are averaged onto a common 0.4×0.4° grid (n=6000).
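The averaging onto a coarser grid and the spatial correlation can be sketched as follows. The example assumes fields stored as nested lists on the fine grid with None for missing cells, simple unweighted block averaging (a factor of 4 would take 0.1° cells to 0.4° cells) and a Pearson correlation; the text does not specify these details, so they are assumptions, as are the function names.

```python
import math


def block_average(grid, factor):
    """Average a 2-D nested list over factor x factor blocks, ignoring None."""
    n_rows, n_cols = len(grid), len(grid[0])
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            values = [grid[a][b]
                      for a in range(i, min(i + factor, n_rows))
                      for b in range(j, min(j + factor, n_cols))
                      if grid[a][b] is not None]
            row.append(sum(values) / len(values) if values else None)
        coarse.append(row)
    return coarse


def spatial_correlation(model, obs):
    """Pearson correlation over cells where both flat lists have data."""
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    n = len(pairs)
    mean_m = sum(m for m, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in pairs)
    var_m = sum((m - mean_m) ** 2 for m, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    return cov / math.sqrt(var_m * var_o)
```

A typical use would block-average both the model and the OMI field, flatten the coarse grids row by row, and pass the two flat lists to `spatial_correlation`.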
In summer (August) OMI shows considerably higher NO2 columns than the RAQ model median. Although the values in the hotspots (London, Paris, Madrid, Ruhr) are quite comparable, the mean background values over continental Europe are considerably higher in the OMI retrieval than in the models. This suggests that the concentrations higher up in the atmosphere are higher than modelled. This could indicate that the NOx lifetime, which is determined by chemistry (including the conversion of reservoir species such as PAN), dry and wet deposition, is longer than predicted by most models. Also transport processes to the free troposphere may be underestimated. As discussed before, the DOMINO product may have a positive bias which is most pronounced in summer. The lower ratio between hotspots and background values in the DOMINO product compared to the ensemble median can also partly be explained by the coarse horizontal resolution of the a priori profiles in the retrieval product, as will be discussed in Sect. 10.1. OMI shows relatively high NO2 concentrations over parts of eastern Europe in comparison to the models. For instance, Istanbul appears more pronounced in the OMI data, indicating that the TNO emissions may be underestimated. The RAQ models generally do not include soil-NOx emissions, which could lead to an under-estimation over Ukraine.
Also over southern Europe, and especially the Iberian Peninsula, the model ensemble shows systematically lower NO2 tropospheric columns in summer compared to OMI. This can partially be explained by missing emission sources from biomass burning and lightning.
On average the measured tropospheric NO2 column increases in winter months, due to an increased NO2 lifetime. The discrepancy between the ensemble median and the retrieval is on average relatively small, compared to summer. However, regional differences between the ensemble median and OMI are observed. For instance over the Po Valley and the outflow over the Adriatic sea, the RAQ ensemble underestimates the high NO2 columns.
Scatter plots have been produced of the RAQ ensemble median versus OMI for the mid/south RAQ region (not shown). The regression slope is 0.54 in August and 0.68 in December, while the offset is −0.2×10^15 molec/cm^2 in August, and 1.1×10^15 molec/cm^2 in December (n=6000). The small slope in August illustrates the much higher values of OMI in summer. The relatively small slope combined with the offset in winter indicates that the ensemble median does not capture the full range of values as observed by OMI, as the mean column amount is comparable.
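A slope and offset of this kind can be obtained with a straight-line fit of the model columns against the OMI columns. The sketch below uses ordinary least squares with OMI on the horizontal axis; the text does not state which regression variant or axis assignment was used, so both are assumptions, and the function name is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + offset.

    Here x is assumed to hold the OMI columns and y the ensemble median
    columns for the same grid cells.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


# Synthetic data constructed as model = 0.5 * omi + 1.0 (units of 1e15 molec/cm^2).
omi_columns = [1.0, 2.0, 3.0, 4.0]
model_columns = [1.5, 2.0, 2.5, 3.0]
slope, offset = linear_fit(omi_columns, model_columns)
```

A slope below one together with a positive offset, as in this synthetic case, means the model spans a narrower range of values than the observations even if the two means are similar.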
In Table 4 the regional mean of the ensemble median and the corresponding OMI observations are given for winter (DJF) and summer (JJA) time periods. During winter months the model average is well in line with the observations, in both cases about 3.0×10^15 molec/cm^2 for the mid/south region. For the same region in summer the ensemble median is 0.9×10^15 molec/cm^2, which is about 50% of the mean OMI column. Over the western European region the RAQ ensemble is 40% below OMI. Also over eastern Europe and the Iberian Peninsula the model ensemble shows systematically lower NO2 tropospheric columns in summer compared to OMI. Although the surface albedo maps in the DOMINO product were replaced in February 2009, a comparison of OMI observations for May-June 2008 to the 2009 data did not reveal an overall systematic change.

Model intercomparison
Maps of monthly mean tropospheric NO 2 columns for all regional models as well as the global models are given in Figs. 4 and 5 for August 2008. Additionally, Fig. 6 shows the seasonal evolution of the mean tropospheric NO 2 columns over the selected regions.
The EURAD-IM, EMEP and CAMx models show generally good correspondence to each other, and to the ensemble median. BOLCHEM shows relatively high tropospheric columns over the big cities, and at the same time similar low values in rural areas as the other RAQ models. The MATCH model suffered from a relatively large low bias during the summer months compared to the ensemble median, which was identified as a problem in the application of the emission inventory. Unlike the other models MATCH is relatively high in August at its domain boundaries over the Atlantic. With a model-upgrade in November  Fig. 4. Mean modeled tropospheric NO 2 columns in August for the contributing regional models. emissions are increased and boundary conditions are taken from MOZART-IFS, which led to a better correspondence to the ensemble median afterwards. The CHIMERE model is generally well in line with other models for summer months, but it misses the NO 2 hotspot over Madrid. Ship tracks west and south from Spain, as visible in the ensemble median are visible in BOLCHEM, CAMx, MATCH, EMEP, SILAM and TM5-Zoom. The EURAD-IM, CHIMERE and CAC models do not show enhanced NO 2 columns at the major shipping routes, as these emissions have been omitted in these model versions. The reason for this is that they were not part of the initial distribution of the prescribed emission inventory. In the global models MOZART-IFS and TM5 the NO 2 is too diluted in the large grid-boxes to see any signal from shipping.
Atmos. Chem. Phys., 10, 3273-3296, 2010 www.atmos-chem-phys.net/10/3273/2010/
Compared to the ensemble median the SILAM model shows relatively large NO2 columns all over the continent, indicating a longer NO2 lifetime in this model. Only for Scandinavia and over the Atlantic Ocean tropospheric NO2 columns are low. In December the difference between SILAM and the other RAQ models is smaller, although this model still shows relatively high columns. As a result the seasonal cycle in this model over the western and eastern European regions is closer to what is observed by OMI. Because of its exceptional behavior as compared to the other models, a number of sensitivity studies have been performed. This revealed that the modeled background level of NO2 is mainly explained by the specific chemical mechanism used in SILAM. When this mechanism was replaced with a basic version of CBM4 (Gery et al., 1989) the modeled columns were much more in line with the ensemble median, but somewhat worse compared to OMI. Secondly, a decrease in the intensity of the vertical mixing was shown to result in a decrease in the tropospheric column.
The spatial correlation between the individual models and the OMI observations over the mid/south region is given in Fig. 7. The correlation between the ensemble median and OMI is higher than that of any individual RAQ model contributing to the ensemble. This illustrates the strength of the ensemble approach. The good performance of the median suggests that quasi-random errors in the individual models cancel out in the ensemble. The regression slopes and offsets for the individual models and the ensemble median, compared to OMI, are given in Table 5. With the exception of MATCH and SILAM, the slopes for the individual RAQ models range between 0.45 and 0.87 in summer and between 0.69 and 0.77 in winter (n=6000). The offsets for the individual RAQ models are comparable, both in summer and in winter. Only CHIMERE shows a relatively low offset in winter compared to the ensemble median, suggesting that the model captures the dynamic range observed by OMI, but with a mean value that is low in December. This is also visible in Fig. 6, where CHIMERE shows relatively low tropospheric columns over eastern Europe and Italy in the winter season. Similar to the missing NO2 concentrations in summer over Madrid, this is most likely caused by a problem with the application of the emission inventory. Table 4 also lists the model spread, quantified as the RMS of the difference between the individually modeled regional mean tropospheric columns and the mean of all regional models. On average for the mid/south RAQ region, the spread in the models is of the order of 45% in summer and 27% in winter. For smaller regions the model spread varies between 40% and 62% in summer and between 20% and 34% in winter. This indicates that the model results are relatively more uncertain in summer. In this season the NOx photochemistry, which is the key difference between the contributing models, is more active than in winter. This may also partly explain the larger differences observed in the comparison with OMI in summer.
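The spread metric described above (the RMS of the differences between the individual regional means and the multi-model mean, expressed relative to that mean) can be sketched in a few lines of Python. This is only an illustration: the function name `relative_spread` and the input values are assumptions and are not taken from Table 4.

```python
import numpy as np


def relative_spread(regional_means):
    """RMS deviation of the individual model means from the
    multi-model mean, expressed as a fraction of that mean."""
    values = np.asarray(regional_means, dtype=float)
    mean = values.mean()
    rms = np.sqrt(np.mean((values - mean) ** 2))
    return rms / mean


# Hypothetical regional-mean columns for four models (arbitrary units).
example_means = [1.0, 2.0, 3.0, 2.0]
spread = relative_spread(example_means)
print(f"relative spread: {100 * spread:.0f}%")
```

Scaling by the multi-model mean makes the spread comparable between seasons even though the absolute columns differ strongly between summer and winter.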

Global models
The MOZART-IFS global model shows low NO2 columns compared to the RAQ ensemble, both in summer and winter. The low bias in MOZART-IFS is attributed to the fact that NOx emission fluxes in this experiment were underrepresented by about a factor of 2, which is resolved in a new model version (not shown). In the TM5 model at 3°×2° resolution the tropospheric NO2 columns over the western Europe region are slightly larger in summer than in the RAQ ensemble. This can be explained by the use of the RETRO emission inventory, which is significantly higher for this region than the TNO inventory. TM5-Zoom shows much more spatial detail in the NO2 columns than the reference TM5 run; its spatial correlation with the observed columns is similar in summer and larger in winter. On the other hand, the total columns in TM5-Zoom are also significantly higher compared to the reference run, and also compared to most of the other models. This, together with the fact that the global models are not able to resolve the observed hotspots, illustrates that the use of high-resolution models is necessary to account for the spatial variation in NO2. At the same time it reveals a sensitivity to the change in model resolution. This is possibly related to the shorter time-stepping in TM5-Zoom, which results in larger vertical mixing and consequently larger tropospheric columns. The correlation of the global models with OMI is higher than that of the regional models. This is an artifact of the lower resolution of the global models. A similar effect was observed earlier by van Noije et al. (2006), where a simple smoothing of global model results led to higher correlation coefficients compared to observations. Therefore the regional and global correlations cannot be quantitatively compared.
For a sound comparison of the correlation statistics between the regional and the global models it would be necessary to regrid all individual model results to the same (coarse) resolution as the global models. In this process the spatial detail of the regional models would be completely lost. We therefore limit ourselves to an intercomparison of the correlation statistics of the global models over Europe. The correlation in August is approximately 0.83 (n=6000) for all global models, and somewhat lower in winter. The lower winter correlation of about 0.73-0.81 may be explained by the relatively large variability in the observed values in December, possibly a consequence of poorer sampling of OMI data than in summer, which is not captured by the global models.
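To illustrate what regridding to a common coarse resolution would involve, the sketch below block-averages a fine field before computing a spatial correlation. It is a simplified illustration, not the procedure used in this study: it assumes a regular grid whose dimensions are divisible by the coarsening factor, ignores area weighting and missing data, and the names `coarsen` and `spatial_correlation` are hypothetical.

```python
import numpy as np


def coarsen(field, factor):
    """Block-average a 2-D field by an integer factor along both axes."""
    field = np.asarray(field, dtype=float)
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))


def spatial_correlation(a, b):
    """Pearson correlation between two fields on the same grid."""
    return np.corrcoef(np.ravel(a), np.ravel(b))[0, 1]


# Toy 4x4 "fine" field, coarsened to 2x2.
fine = np.arange(16.0).reshape(4, 4)
coarse = coarsen(fine, 2)
print(coarse)
```

Applying `coarsen` to both the model and the observed field before calling `spatial_correlation` would put regional and global models on an equal footing, at the cost of the spatial detail discussed above.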

Model intercomparison of vertical profiles
The modeled total columns and surface concentrations are linked by the NO2 profiles shown in Figs. 8 and 9. These figures show the area-averaged monthly mean profiles at midday (12:00 UTC), using all available model data for August and December 2008. The RAQ models have stored their daily forecasts at four levels: the surface, 500, 1000 and 3000 m above the surface. These levels have been converted to pressure levels, using a standard surface pressure for the selected regions. For the global models as well as the two RAQ models with full 3-D information (EURAD-IM and CAMx) the fields from all model levels are used. The RAQ models show qualitatively similar mean profile shapes in August over the western Europe and the Netherlands regions. The SILAM and CHIMERE model concentrations are relatively high, especially at about 900 hPa, and MATCH and MOZART-IFS are on the low side, related to the implementation of emissions in these models. Over the eastern Europe region the SILAM model shows very high NO2 concentrations in summer. This indicates a longer NO2 lifetime, which was explained by the chemical mechanism adopted in this model, as well as implementation differences in eastern Europe as compared to the other models.
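The text does not specify how the four RAQ output heights were converted to pressure levels. As a rough sketch of one possible approach, the snippet below uses an isothermal barometric formula; the surface pressure of 1013.25 hPa, the scale height of 8 km and the function name `height_to_pressure` are all assumptions made for illustration.

```python
import math

# Assumed values; the study only mentions "a standard surface pressure".
STANDARD_SURFACE_PRESSURE_HPA = 1013.25
SCALE_HEIGHT_M = 8000.0  # assumed isothermal scale height


def height_to_pressure(height_m,
                       surface_pressure_hpa=STANDARD_SURFACE_PRESSURE_HPA,
                       scale_height_m=SCALE_HEIGHT_M):
    """Approximate pressure (hPa) at a given height above the surface."""
    return surface_pressure_hpa * math.exp(-height_m / scale_height_m)


# The four RAQ output levels named in the text (metres above the surface).
raq_levels_m = [0.0, 500.0, 1000.0, 3000.0]
raq_levels_hpa = [height_to_pressure(z) for z in raq_levels_m]
for z, p in zip(raq_levels_m, raq_levels_hpa):
    print(f"{z:6.0f} m -> {p:7.1f} hPa")
```

Under these assumptions the 1000 m level falls a little below 900 hPa, which is roughly where the profile differences between the models are discussed above.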
Over Italy and specifically the Po-valley region the BOLCHEM and EMEP models show a relatively large NO2 gradient in the boundary layer with large surface concentrations (not shown). This is also the case for BOLCHEM over the Iberian Peninsula, where it produces high concentration hotspots around the cities, as discussed in Sect. 6. Background concentrations in BOLCHEM match the model median relatively well.
In December the spread in the model results is smaller than in August, which is in line with the ensemble spread results for the NO2 column. In particular the CAC model is relatively high in this month, both in the boundary layer and the free troposphere.
The global models TM5 reference and TM5-Zoom are well in line with the RAQ models. The profiles from the MOZART-IFS system have a shape similar to those of the RAQ models, but the concentrations are lower in both August and December.
Differences in the profile shape between the models could partly be explained by the applied boundary layer mixing scheme. Models with enhanced mixing show lower NO2 concentrations near the surface and a smaller vertical gradient in the boundary layer. The injection of NOx as either NO or NO2 may also play a role. BOLCHEM applies a distribution of 85% NO versus 15% NO2 emissions. This is a relatively large fraction of NO2 injected in the model compared to the other models, where the NOx emissions are introduced as at least 90% NO, up to 100% in the global models. This may locally lead to a shift in the photostationary equilibrium between NO, NO2 and O3, in particular in high emission regions. This could contribute to the high surface concentrations seen locally over the Iberian Peninsula.
Other explanations are differences in the chemistry and, indirectly, in the photolysis scheme that determines the NO2/NO equilibrium. The photolysis rates are in turn affected by meteorology, as for instance modeled cloud cover has an impact on the solar radiation. High NO2 in the free troposphere, as seen in SILAM, CHIMERE and BOLCHEM and in winter in the CAC model, could also be explained by the chemical mechanism. The conversion of NO2 to other species depends on the OH concentration in the models: high OH concentrations lead to a reduced lifetime of NOx. However, the OH concentration and its variability depend on many other aspects of the photochemical mechanism. Heterogeneous chemistry, and specifically the removal of N2O5 by hydrolysis, also plays an important role in the removal of NOx (Dentener and Crutzen, 1993). Finally, reactive nitrogen is transported to cleaner regions via PAN and organic nitrates. Their formation rates vary between the different chemistry schemes (Emmerson and Evans, 2009). The quantification of the relative importance of all these aspects would require an in-depth comparison of the chemistry schemes, and is outside the scope of the current analysis.

Comparison to in-situ observations in The Netherlands
The modeled monthly mean concentrations at the lowest model layer are compared to observations from the Dutch Air Quality Monitoring Network (LML; Beijk et al., 2007) at 13:00 UTC. We have selected 17 rural stations, as their measurements are considered most representative of regions comparable to the coinciding model grid (Blond et al., 2007). The corresponding model results have been interpolated in space and time to these measurement sites. The measurements of NO2 from the ground stations are all based on detection of NO by chemiluminescence and the reduction of NO2 to NO by heated molybdenum converters. It is well known that this method is subject to interferences from NOz components (e.g. PAN and HNO3) (Winer et al., 1974; Steinbacher et al., 2007). Here NOz is defined as NOy − NOx, with NOy the sum of all reactive nitrogen oxides. This interference effect is stronger at background stations than in urban regions, larger in summer than in winter, and larger in the afternoon than in the morning. A correction factor has been proposed by Lamsal et al. (2008), based on the estimated ratio of NO2 to NOz. This also accounts for the efficiency with which NOz species are converted into NO on the molybdenum surface. Based on independent CHIMERE model results for NOz (Boersma et al., 2009b), which have been validated for a rural measurement site at Taenikon, located on the Swiss plateau (Lamsal et al., 2008), monthly-mean correction factors at 14:00 UTC for all individual stations have been calculated. These factors range from 0.6 in summer (with a spread due to variations in the modeled concentrations of σ=0.14) to 0.97 (σ=0.01) in winter. This implies an increase of the seasonal cycle in the observations due to this interference correction. The comparison of the individual models, as well as the RAQ ensemble median, to the corrected measurements is shown in Fig. 10.
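A minimal sketch of how such an interference correction could be applied is given below. It assumes that the correction factor is the ratio of NO2 to the sum of NO2 and the efficiency-weighted NOz species, in the spirit of the description above. The function names, the species list, the conversion efficiencies and the concentrations are hypothetical values chosen for easy arithmetic, not those used in this study.

```python
def correction_factor(no2, noz_species, efficiencies):
    """Fraction of the molybdenum-converter signal that is true NO2.

    no2          -- modeled NO2 concentration
    noz_species  -- dict of modeled NOz species concentrations
    efficiencies -- dict of assumed conversion efficiencies (0..1)
    """
    interference = sum(efficiencies[name] * conc
                       for name, conc in noz_species.items())
    return no2 / (no2 + interference)


def corrected_no2(measured, factor):
    """Apply the correction factor to a measured 'NO2' value."""
    return factor * measured


# Hypothetical summer afternoon values (same arbitrary unit throughout).
factor = correction_factor(
    no2=6.0,
    noz_species={"PAN": 2.0, "HNO3": 4.0},
    efficiencies={"PAN": 1.0, "HNO3": 0.5},  # illustrative only
)
print(factor, corrected_no2(10.0, factor))
```

A factor close to 1, as reported for winter, leaves the measurement almost unchanged, while a smaller summer factor reduces the reported concentration substantially.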
In summer (July-August 2008) the RAQ ensemble is very close to the LML observations, see also Table 4.
In DJF the RAQ ensemble underestimates the observed NO2 concentrations by 21%, while tropospheric columns in this period are only low by 9% as compared to OMI. On average the model spread, evaluated as the RMS of the individual seasonal means scaled to the ensemble mean, is 33% in July-August, whereas in DJF it is 14%.
Again the MATCH and MOZART-IFS models predict the lowest surface concentrations, whereas MATCH gets more in line with the other models from November onwards. In summer 2008 model data from EURAD-IM, EMEP, SILAM and TM5 are well in line with observations. CAC is relatively low in summer 2008, but it is remarkable that this model, as well as SILAM, performs best in predicting the observed high concentrations in winter. It is interesting that SILAM is able to produce summertime surface concentrations that are well in line with observations, but at the same time produces high NO2 column values in summer as compared to the other models and also in comparison to the DOMINO product over this region. EURAD-IM performs relatively well in summer 2008 and winter, but has a negative bias in spring 2009. TM5, TM5-Zoom, BOLCHEM, EMEP and CHIMERE show a relatively modest seasonal cycle, with a good correspondence or overestimation in spring/summer, and an underestimation in winter. CAMx is low both in summer and winter. It should be noted that the current analysis over the Netherlands is representative of one of the most NOx-polluted regions in Europe, where differences in NO2 concentrations are mostly dominated by the proximity of emission sources rather than by transport effects. These results therefore may not be representative of other regions in Europe where the NOx lifetime plays a more important role.
In summary, the individual model performance depends on the season and region. Individual models perform better for specific months than the ensemble median, but on average for the whole year the ensemble median is as good as the best two RAQ models, with a negative bias of 14% compared to observations.

Diurnal cycle
Figure 11 shows the diurnal cycle of the area-averaged tropospheric column for the region over The Netherlands. This region is chosen as it is representative of regions with high anthropogenic emissions. The CAC model data and August data for SILAM were not available for this purpose. The spread in the RAQ models, quantified as the RMS of the individual RAQ members at OMI overpass time scaled to the monthly mean, is of the order of 36% in August and 16% in December. These numbers and also the monthly means are similar to what was found earlier, see Table 4.
Apart from differences in their offset, the models show significant differences in the diurnal cycle. All models show a drop in NO2 concentrations during daytime, related to the changing photochemistry, but the timing and magnitudes are different. At OMI overpass time (13:30 LT, which corresponds on average for this region to approximately 12:00 UTC) the models are close to their daytime minimum. The ratio of the maximum over the minimum tropospheric column is a bit larger in summer compared to winter. For August this ratio is on average for the model mean 1.8, with a spread in the models, defined as the RMS of the individual ratios of the monthly mean diurnal maximum to minimum, of 0.6, while in December the mean ratio is 1.6 (spread 0.3). Model results with GEOS-Chem over Israel (Boersma et al., 2009b), which also included a diurnal cycle in anthropogenic emissions, also showed a stronger cycle in summer compared to winter, in line with observations. In their study larger ratios in summer were attributed to larger daytime NO2 loss rates in summer compared to winter, as the photochemical sink from oxidation by OH is larger in summer than in winter. The ratio of the RMS to the model mean diurnal cycle ranges from 15% in December to 33% in August. This indicates a larger spread between the models in summer compared to winter with respect to their diurnal cycle. The RAQ models show a distinct peak in NO2 concentrations in the evening, related to the rush hour emissions and the NO to NO2 conversion. A modest peak in NO2 is found also in the morning hours (06:00-09:00 UTC), which can also be attributed to increasing (traffic) emissions, before the photolysis rate of NO2 becomes important. BOLCHEM shows a remarkably strong diurnal cycle in summer. This could be related to the application of the relatively large fraction of NO2 over NO, emitted into the model (15% of NO2 versus 85% of NO), together with the increase in rush-hour emissions in the evening.
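The diurnal max/min ratio and its inter-model spread can be sketched as follows. The snippet assumes that the spread is the RMS deviation of the individual ratios about their mean; the function names and the two hypothetical diurnal cycles are illustrative only and do not represent any of the models in Fig. 11.

```python
import numpy as np


def diurnal_ratio(columns):
    """Ratio of the diurnal maximum to the diurnal minimum column."""
    cycle = np.asarray(columns, dtype=float)
    return cycle.max() / cycle.min()


def ratio_mean_and_spread(cycles_by_model):
    """Mean max/min ratio over the models and the RMS spread about it."""
    ratios = np.array([diurnal_ratio(c) for c in cycles_by_model.values()])
    mean = ratios.mean()
    spread = np.sqrt(np.mean((ratios - mean) ** 2))
    return mean, spread


# Hypothetical monthly-mean diurnal cycles (arbitrary units, four time steps).
cycles = {
    "model_a": [6.0, 4.0, 3.0, 5.0],   # ratio 2.0
    "model_b": [4.5, 3.5, 3.0, 4.0],   # ratio 1.5
}
mean_ratio, ratio_spread = ratio_mean_and_spread(cycles)
print(mean_ratio, ratio_spread)
```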
The global models capture the decrease in NO2 during daytime, but to a lesser extent the increases in the morning and evening hours predicted by the regional models. This can be attributed to the timing of emissions. In the global models these emissions are simply constant over the whole day, which results in an overestimation of NO2 concentrations during night-time and an underestimation during daytime. The figures also show that NO2 columns from TM5-Zoom for this region are higher than those from TM5 throughout the day, and more in line with the regional models. This is partly a resolution effect, as TM5 is not able to resolve the high emission area considered here.

Effect of averaging kernel on modeled total column
The tropospheric NO2 retrieval algorithm accounts for the fact that the sensitivity of the satellite instrument is changing with altitude. On average OMI is more sensitive to NO2 in the free troposphere than to NO2 in the boundary layer. This vertical sensitivity information is stored in the averaging kernel which depends on the satellite viewing geometry, and on aspects like the cloud cover and the surface reflectivity. This averaging kernel profile is included in the retrieval product for every individual pixel (Boersma et al., 2009a). The retrieval of the vertical tropospheric column depends on independent information on the vertical distribution. In the DOMINO product best-guess NO2 tropospheric profiles have been derived from collocated TM4 model simulations sampled at local overpass time. This implies that the direct comparison of RAQ model tropospheric columns with the DOMINO product depends also on the quality of the TM4 profile simulations.
A better solution is the comparison between OMI and the modeled profile where the averaging kernel is applied. In this case the actual sensitivity of the satellite measurement is explicitly accounted for and the a priori TM4 profile shape no longer influences the comparison (Eskes and Boersma, 2003). In mathematical language: (y − Ax)/y or (y − Ax)/(Ax) is independent of the a priori profile shape used in the retrieval. Here y is the OMI observation, A is the averaging kernel vector, and x is the vertical profile of NO2 partial columns of the model to be compared with OMI.
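In code, applying the kernel amounts to a dot product between the averaging kernel vector A and the model partial columns x. The sketch below illustrates this with a hypothetical three-layer profile; the function names and all numbers are assumptions for illustration and do not correspond to an actual OMI pixel.

```python
import numpy as np


def column_with_kernel(kernel, partial_columns):
    """Model column as seen by the satellite: A x."""
    return float(np.dot(kernel, partial_columns))


def direct_column(partial_columns):
    """Plain vertical column: the sum of the partial columns."""
    return float(np.sum(partial_columns))


def relative_difference(observation, kernel, partial_columns):
    """(y - A x) / y for one observation y."""
    ax = column_with_kernel(kernel, partial_columns)
    return (observation - ax) / observation


# Hypothetical three-layer example, ordered from the surface upward.
A = np.array([0.5, 1.0, 1.5])   # lower sensitivity near the surface
x = np.array([4.0, 1.0, 0.5])   # model partial columns
y = 5.0                         # "observed" column, same units as x

print(direct_column(x), column_with_kernel(A, x), relative_difference(y, A, x))
```

In this toy case the kernel-weighted column is smaller than the direct column because most of the model NO2 sits in the layer where the assumed sensitivity is lowest.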
For most of the RAQ models only limited vertical information (concentrations at a few vertical levels) was available for this study. For these models we have therefore compared the reported tropospheric NO2 column with the OMI retrieval. Two models, EURAD-IM and CAMx, have provided the full 3-D model fields. We use these two models to answer the following questions:
(1) What is the quantitative difference between the direct column comparison and the comparison using the averaging kernel?
(2) What is the error introduced by a missing upper troposphere in those models with a model top below the tropopause?
(3) What is the free troposphere contribution to the OMI tropospheric column observation?
(4) How do the regional and global model profiles compare with the TM4 a priori profile?
[...] with the averaging kernel. This is a measure of the contribution of NO2 from these levels to the total signal as measured by OMI. In the following, the integrated partial columns are denoted as N_tc for the total model column or N_k = Ax for the profile where the averaging kernel is applied. Table 6 lists the direct tropospheric columns over the western Europe region, as well as their contributions from the boundary layer (1000-800 hPa), the free troposphere, and also specifically the upper part of the free troposphere (500-200 hPa), compared to the corresponding partial columns multiplied with the averaging kernels, for August and December 2008. The TM4 a priori column N_tc is identical to N_k, due to the definition of the averaging kernel. The columns N_tc and N_k for the CAMx and EURAD-IM models are very similar, both for August and December 2008. Thus the comparisons that explicitly use the kernel lead on average to similar results as the direct column comparisons. The total, area-averaged columns with and without kernels for July-December 2008 are shown in Fig. 13, together with the corresponding average retrieval from the DOMINO product. It shows that also for the other months the mean difference between the column with kernel and the direct column is small over the western Europe region. For other regions in Europe the conclusions are similar. Also the RMS difference between N_k and N_tc is provided. This value does not exceed 10% in summer and approximately 20% in winter.

The impact of the averaging kernel on the partial columns
However, locally the differences between N_tc and N_k are substantial. This is shown in Fig. 14, for the model EURAD-IM in August 2008. N_tc is higher than N_k over major cities and other hotspots of pollution, whereas it is lower over background regions. The contrast between major hotspots and rural areas will therefore be smaller in the OMI data from the DOMINO product than in the vertical NO2 column from the RAQ models. This partly explains the differences between the model median vertical columns and the OMI data as presented in Fig. 3. For instance, over Oslo the DOMINO product is significantly lower than the ensemble. Part of this difference will disappear if N_k instead of N_tc is displayed for the RAQ median. This is related to the horizontal resolution in the TM4 model, used to generate the a priori profiles, which cannot resolve these relatively small-scale effects. For quantitative applications such as the estimation of emissions it is therefore crucial that the kernels are used.
If the model profile contains relatively more NO2 at higher altitudes as compared to the a priori profile used in the retrieval, then N_k > N_tc. The largest gradients in the averaging kernel occur near the surface, and therefore the comparisons are most sensitive to the exact altitude of the NO2 lower in the atmosphere, e.g. in the boundary layer.
In the boundary layer the TM4 a priori profile peaks closer to the surface than in the regional models CAMx and EURAD-IM. This is true for both August and December. This leads to a relative increase in N_k in the RAQ models compared to the case where the BL profile shape would be identical to TM4. Table 6 quantifies this effect: for TM4 we find a ratio N_k,BL/N_tc,BL = 3.6/4.8 = 0.75 (0.79) for August (December); for EURAD-IM this ratio is N_k,BL/N_tc,BL = 0.83 (0.78); for CAMx this ratio is N_k,BL/N_tc,BL = 0.78 (0.89). On average this ratio is therefore 7% (6%) higher in the regional models in August (December) as compared to TM4. The effect is not very large, but systematic.
A second effect comes from the free troposphere. Figure 12 and Table 6 show that the global model TM4 predicts much higher NO2 concentrations above 800 hPa than the regional models. The ratio is N_tc,FT/N_tc,BL = 1.0/4.8 = 0.21 (0.07) for TM4 in August (December). For EURAD-IM this ratio is N_tc,FT/N_tc,BL = 0.17 (0.07), and for CAMx this ratio is N_tc,FT/N_tc,BL = 0.13 (0.06). The regional models have a larger fraction of their NO2 column in the boundary layer as compared to the a priori TM4. This results in a decrease of N_k relative to N_tc. To conclude, the regional models have relatively more NO2 at the top of the boundary layer, but relatively less in the free troposphere as compared to the a priori profiles. These two effects are not very strong, and partly cancel, which explains the small differences between N_k and N_tc in Fig. 13. We note that these statements are made for monthly and regional averages.
For the interpretation of the satellite measurements, however, Fig. 12 and Table 6 hold an important message. The kernel results provide the contribution of different altitude ranges to the signal observed by OMI. Based on the TM4 a priori profiles the troposphere above 800 hPa contributes 2.2/(3.6+2.2)×100 = 38% (26%) to the signal observed. For EURAD-IM and CAMx these numbers are somewhat smaller: 32% (25%) and 22% (16%), respectively. The contribution of the different sublayers to the OMI signal is shown in Fig. 12 (dashed lines). This means that a large part of the observations should be interpreted as representative of the free troposphere. (Clearly 800 hPa is a crude estimate of the BL top pressure.) Table 6 shows that the TM5 model has very similar ratios between the free troposphere and boundary layer subcolumns as the TM4 a priori. Also MOZART-IFS has similar ratios, despite the much lower total column amounts. Therefore all global models are in reasonable agreement as far as profile shape is concerned and predict larger free troposphere concentrations than the two regional models studied.
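The ratios quoted in the preceding paragraphs can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text (in the units of Table 6); the variable names are illustrative, and reading the "7% (6%)" figure as the relative increase of the two-model average ratio over the TM4 ratio is an assumption that happens to reproduce the quoted values.

```python
# TM4 a priori partial columns for August, as quoted in the text.
n_k_bl, n_tc_bl = 3.6, 4.8   # boundary layer: with kernel, direct
n_k_ft, n_tc_ft = 2.2, 1.0   # above 800 hPa: with kernel, direct

bl_ratio = n_k_bl / n_tc_bl                    # N_k,BL / N_tc,BL
ft_over_bl = n_tc_ft / n_tc_bl                 # N_tc,FT / N_tc,BL
ft_signal_share = n_k_ft / (n_k_bl + n_k_ft)   # share of the OMI signal

# Boundary-layer ratios quoted for EURAD-IM and CAMx: (August, December).
eurad_im = (0.83, 0.78)
camx = (0.78, 0.89)
tm4 = (0.75, 0.79)

# Relative increase of the two-model mean ratio over the TM4 ratio.
increase = [((e + c) / 2) / t - 1 for e, c, t in zip(eurad_im, camx, tm4)]

print(f"N_k,BL/N_tc,BL  = {bl_ratio:.2f}")
print(f"N_tc,FT/N_tc,BL = {ft_over_bl:.2f}")
print(f"FT share of OMI signal = {100 * ft_signal_share:.0f}%")
print(f"regional vs TM4: {100 * increase[0]:.0f}% (Aug), "
      f"{100 * increase[1]:.0f}% (Dec)")
```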
The relatively high partial columns in TM4 near the surface compared to the RAQ models are in line with Hains et al. (2010), who found that TM4 a priori partial columns in the lower boundary layer are higher compared to LIDAR measurements. Lamsal et al. (2010) showed that replacing the TM4 a priori profiles with GEOS-Chem profiles, which have more mixed NO2 concentrations in the boundary layer, leads to reduced tropospheric columns. Our analysis of surface NO2 concentrations at rural sites over the Netherlands also suggests that TM4 overestimates NO2 compared to the measurements as presented in Fig. 10. In summer TM4 simulates concentrations that are twice as high as the surface observations, while in winter TM4 is in line with the ensemble median, and underestimates the surface observations by 20%. An analysis of the TM4 code as used in the DOMINO product revealed an NO2 sampling error. Just before the moment of sampling, vertical transport including boundary layer mixing is applied to NOx. However, this vertical mixing and transport is not explicitly applied to NO2, resulting in too high surface concentrations in TM4. The impact of the profile shape in the study based on GEOS-Chem (Lamsal et al., 2010) is stronger than in our analysis based on GEMS-RAQ models, due to the assumption of local mixing in the boundary layer in GEOS-Chem.

The contribution of the upper troposphere to the partial NO2 columns
As suggested by Napelenok et al. (2008), the contribution from the upper part of the free troposphere (above 500 hPa) to the tropospheric NO2 column may be important. This is confirmed by simulations from the TM4 and TM5 models, see Table 6. In these models 5-10% of the tropospheric NO2 column is situated at levels above 500 hPa in summer. Although emissions in the free troposphere are low compared to the boundary layer, the NOx lifetime is much longer. The percentage contribution in MOZART-IFS, CAMx and EURAD-IM at these levels is lower. The difference between TM5 and MOZART-IFS NO2 concentrations in the upper part of the free troposphere can be attributed to different aircraft and lightning emissions, as well as differences in the chemistry schemes. When considering the partial columns with the averaging kernels, the percentage contribution to the OMI observed signal reaches a total of the order of 10-20% in the global models over the western Europe region, both in summer and in winter. Again, the CAMx and EURAD-IM models show a relatively small contribution, between 0 and 15% in summer and about 5% in winter, over this region. This implies that when regional model output is compared to OMI with the averaging kernels applied, one has to correct for the upper tropospheric contribution to the total column if this region is not accounted for in the models.

Conclusions
We presented a comparison of tropospheric NO2 from OMI measurements (the DOMINO product) to the median of an ensemble of Regional Air Quality (RAQ) models, and an intercomparison of the contributing RAQ models and two global models for a period of one year (July 2008-June 2009) over Europe. The models are all part of the GEMS forecasting system. The majority of the regional models apply a similar anthropogenic emission inventory, the same driving meteorological forecast fields, and the same boundary conditions. Apart from these common factors the models are characterized by considerable differences. The RETRO NOx emissions used in the global models are generally larger than the TNO emissions used in the RAQ models. An evaluation of the model performance of the ensemble median compared to the individual models, and an intercomparison of the individual models, lead to the following conclusions:
-A good correspondence of spatial patterns as well as the variation in magnitude from the ensemble median compared to the OMI columns over Europe is found. The spatial correlation r=0.8 (n=6000) of the ensemble median is higher than that of any individual contributing RAQ model, both in summer and in winter. The global models agree well with the large-scale features of the OMI NO2 distribution, but capture substantially less detail than the RAQ models. The profile shapes of the different RAQ models are qualitatively similar and correspond well to the profile shapes in the global models.
-With respect to the Dutch surface observations, on average for the whole year the ensemble median shows equally good performance as the best individual RAQ models. The yearly average bias of the ensemble median is −14% relative to observed surface NO2 concentrations in the Netherlands.
-The model spread, quantified as the RMS of the RAQ models scaled to their mean, is 45% in summer for the mid/south region. This relative spread is smaller in winter (20%-34%, depending on the region) than in summer (40%-62%). In summer, differences in NOx photochemistry between the models could be more important than in winter, suggesting a larger uncertainty in the model ensemble.
-The diurnal ratio of maximum over minimum tropospheric columns is slightly larger in summer than in winter, in line with a study by Boersma et al. (2009b). This is probably due to a larger photochemical sink from oxidation by OH in summer. The spread in the models, defined as the RMS of this ratio, ranges between 15% of the ratio itself in December and 33% in August.
-The relatively high background concentrations in SILAM point to a significantly longer NO2 residence time compared to the other models. We note that the quality of SILAM cannot be judged on the basis of the NO2 comparisons presented here, and more extensive comparisons for other compounds are needed. However, it is interesting to note that a model with an enhanced lifetime is able to relate the seasonal cycles observed by OMI and by the Dutch surface stations in a quantitative way.
A comparison of area-averaged columns from two regional models (EURAD-IM and CAMx) and the global models shows only a remarkably small bias when the averaging kernel is not taken into account. This reflects the fact that the higher sensitivity to the boundary layer profile of the RAQ models in summer is largely compensated by the lower NO2 in the free troposphere as compared to the TM4 a priori profile. In winter the modeled profile shapes are more similar to the a priori. Tropospheric concentrations in the RAQ models may be low because of missing emissions from aircraft and lightning. These emissions are partly accounted for via the MOZART-IFS boundary conditions. However, MOZART-IFS concentrations of NO2 in the free troposphere are also relatively low compared to TM4 and TM5, due to an underrepresentation of the NOx emissions.
However, there are substantial local differences between the direct vertical columns and the columns with the averaging kernels. Vertical columns are larger (smaller) than the columns with the kernels over major hotspots (rural areas). Therefore, the vertical columns in the RAQ models are expected to show a stronger contrast between cities and background areas as compared to the OMI retrieval from the DOMINO product, which uses the coarse TM4 a priori.
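The difference between a direct vertical column and a column with the averaging kernel applied can be sketched as follows. The sketch assumes the common formulation in which the kernel-weighted column is the sum over model layers of the kernel value times the layer partial column. All names and numbers are hypothetical, and a single illustrative kernel is used for both profiles, whereas retrieval kernels vary from pixel to pixel.

```python
def direct_column(partial_columns):
    """Plain vertical column: the sum of the layer partial columns."""
    return sum(partial_columns)


def kernel_column(partial_columns, averaging_kernel):
    """Column as seen through the retrieval: sum of kernel * partial column."""
    if len(partial_columns) != len(averaging_kernel):
        raise ValueError("profile and kernel must have the same number of layers")
    return sum(a * x for a, x in zip(averaging_kernel, partial_columns))


# Illustrative kernel, ordered from the surface layer upwards: reduced
# sensitivity near the surface, enhanced sensitivity aloft (hypothetical values).
kernel = [0.5, 0.75, 1.0, 1.5]

# Hypothetical layer partial columns (arbitrary units).
hotspot = [4.0, 2.0, 1.0, 0.5]   # NO2 concentrated near the surface
rural = [0.25, 0.25, 0.5, 1.0]   # relatively more NO2 aloft

for name, profile in (("hotspot", hotspot), ("rural", rural)):
    print(name, direct_column(profile), kernel_column(profile, kernel))
```

In this toy case the direct column exceeds the kernel-weighted column for the surface-dominated profile and falls below it for the profile with more NO2 aloft, which is the same sign of difference as described above for hotspots and rural areas.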
Validation studies (Hains et al., 2010; Lamsal et al., 2010; Zhou et al., 2009) have indicated a high bias of the DOMINO product, due to the a priori profile shape, the surface albedo map and the error in the air-mass factor at the surface. The combined effects lead to an estimated overestimation of the order of 0-40% in summer. In winter the TM4 and RAQ profiles are very similar and the bias in the OMI retrievals is probably less affected by TM4 profile shape issues. With these considerations in mind the remaining conclusions from this study can be summarized as follows:
-It is found that the TM4 a priori NO2 concentrations near the surface, as used in the retrieval algorithm, are significantly larger than the observations and than the concentrations in all contributing global and RAQ models. This is caused by a time sampling issue in TM4 that leads to an underestimate of vertical mixing. This is consistent with earlier findings by, e.g., Hains et al. (2010). This affects the a priori profile shape, and hence contributes to a positive bias in the DOMINO product.
-The amplitude of the seasonal cycle in the ensemble median is larger than that observed by OMI. Averaged over the middle and southern parts of the RAQ domain, the mean of the RAQ models is 50% below the OMI observations in summer. In winter the RAQ median and OMI are in closer agreement.
-The comparison to Dutch surface observations shows that the ensemble median performs better in summer than in winter, when concentrations are underpredicted. This implies that the seasonal cycle in surface observations is stronger than the cycle in the ensemble median surface concentrations. The good correspondence between the RAQ ensemble median and surface observations in summer, combined with the overall discrepancy between models and OMI over this region, is in line with the suggestion of an OMI high bias in summer.
-The global models indicate that the upper part of the free troposphere (above the 500 hPa level) contributes 6-10% to the tropospheric NO2 column. For EURAD-IM and CAMx this contribution is 0-4%. When considering the partial columns where the averaging kernels are applied, the percentage contribution to the signal observed by OMI reaches up to 20%, both in summer and in winter. This implies that when the averaging kernels are applied in the comparison of regional models to OMI observations, one has to correct for the upper-tropospheric contribution to the total column if this region is not part of the model domain.
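The last item above can be made concrete with a small sketch that computes the share of a column residing above 500 hPa, with and without kernel weighting. The layer pressures, partial columns and kernel values are hypothetical, and layers are assigned to the upper troposphere by their mid-level pressure, which is a simplifying assumption.

```python
def upper_fraction(partial_columns, mid_pressures_hpa, weights=None,
                   threshold_hpa=500.0):
    """Fraction of the (optionally weighted) column in layers above the threshold.

    'Above' means lower pressure than threshold_hpa. If weights is None,
    every layer is weighted by 1 (direct vertical column).
    """
    if weights is None:
        weights = [1.0] * len(partial_columns)
    weighted = [w * x for w, x in zip(weights, partial_columns)]
    upper = sum(v for v, p in zip(weighted, mid_pressures_hpa)
                if p < threshold_hpa)
    return upper / sum(weighted)


# Hypothetical profile, ordered from the surface layer upwards.
pressures = [950.0, 850.0, 700.0, 400.0, 250.0]   # layer mid pressures in hPa
partial = [4.0, 2.0, 1.5, 0.25, 0.25]             # layer partial columns
kernel = [0.5, 0.75, 1.0, 1.5, 2.0]               # illustrative averaging kernel

direct_share = upper_fraction(partial, pressures)
kernel_share = upper_fraction(partial, pressures, weights=kernel)
print(f"direct: {100 * direct_share:.1f}%, with kernel: {100 * kernel_share:.1f}%")
```

Because the kernel weights the upper layers more strongly than the boundary layer in this example, the upper-tropospheric share of the kernel-weighted column is larger than its share of the direct column, which is why a regional model with a limited vertical domain would need a correction for the missing layers.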
More conclusive statements on uncertainties in processes for the individual models, such as emission, deposition, chemical reaction rates including photolysis and (vertical) transport, would require dedicated sensitivity studies, which are beyond the scope of this paper. The ensemble median showed the best spatial correlation to OMI observations, which supports the use of an ensemble of models for the prediction of regional air quality, as implemented in the GEMS project.