Evaluation of the performance of different atmospheric chemical transport models and inter-comparison of nitrogen and sulphur deposition estimates for the UK

Evaluation against measurements showed a generally better representation of measured aerosol and precipitation concentrations by more complex models. The models were compared graphically by plotting maps and cross-country transects of wet and dry deposition as well as calculating budgets of total wet and dry deposition to the UK for sulphur, oxidised nitrogen and reduced nitrogen. The total deposition to the UK varied by ±22–36% amongst the different models depending on the deposition component. At a local scale, estimates of both dry and wet deposition for individual 5 km × 5 km model grid squares were found to vary between the different models by up to a factor of 4.


Introduction
Concern over the emissions of pollutant gases leading to acidification of soils and surface waters in Europe arose during the 1970s and 1980s, principally due to SO2 emissions from commercial power production caused by burning coal. The environmental degradation of sensitive ecosystems, notably in upland regions, was linked to the emissions of pollutants which in some cases originated in neighbouring countries, up to approximately a thousand km away.
Following substantial reductions in SO2 emissions (http://naei.defra.gov.uk/), scientific interest has become more focused on eutrophication of natural ecosystems due to the deposition of nitrogen (N) from both oxidized nitrogen (emitted primarily as NOx from fuel combustion) and reduced nitrogen (emitted mostly as NH3 from agricultural sources). Heath-land communities are highly sensitive to N deposition. Field experiments have correlated inorganic nitrogen deposition to a loss of biodiversity in different ecosystems ranging from grassland (Stevens et al., 2004) to boreal forest (Nordin et al., 2005). Nitrogen deposition is also an important pathway leading to acidification of terrestrial and freshwater ecosystems. The eutrophication of fresh waters can cause a severe reduction in water quality, impacting on fish stocks and other plant and animal life. Atmospheric deposition of reactive nitrogen has been recognised as one of the most significant threats to global biodiversity (Sala et al., 2000).
Deposition of sulphur and nitrogen to the earth's surface can occur via both 'dry' deposition and 'wet' deposition. Dry deposition is primarily due to gaseous compounds (SO2, HNO3, NH3 and NO2) with aerosol making smaller contributions. In the case of NH3, the deposition is largest near to emissions sources which occur in the rural environment (e.g. Loubet et al., 2009; Vogt et al., 2013). Oxidized nitrogen (NOx) emissions are primarily in the form of NO, which has a very low deposition velocity to vegetation, so atmospheric oxidation to HNO3 must take place before significant deposition occurs. For SO2, emissions are predominantly due to power generation from elevated point sources, with the speed of vertical diffusion to the surface being strongly influenced by meteorological conditions. Wet deposition occurs due to the incorporation of aerosol particles (acting as cloud condensation nuclei and scavenged below cloud) which fall to ground as precipitation, as well as below-cloud and in-cloud scavenging of soluble gases.
National and international monitoring networks have been set up during the last few decades to analyse the chemical composition of precipitation, notably sulphate (SO4^2-), nitrate (NO3^-) and ammonium (NH4^+). In the UK, the network in its current configuration was initiated in 1986 and is now a component network of the UK Government's Eutrophying and Acidifying Atmospheric Pollutants (UKEAP) project (http://uk-air.defra.gov.uk/networks/network-info?view=ukeap, last access 5/3/2015). Monitoring of gas phase pollutants initially focused on SO2; since then, networks to monitor NO2, NH3 and other pollutant concentrations at rural locations have been set up. The DELTA system was developed initially for regional, long-term monitoring of NH3 and NH4^+ and was subsequently extended to sample acid gases (SO2, HNO3, HCl) and aerosols, as well as the inorganic components of aerosol (size fraction < 4 mg m^-3). For the period of the model inter-comparison addressed here (2003, see below) the DELTA method was applied for all these air pollution components at 12 sites, with the network since having been extended to 30 sites.
International legislation has been successful in reducing emissions of SO2 to the atmosphere through the United Nations Economic Commission for Europe Gothenburg Protocol (1999) and the European Union National Emissions Ceiling Directive (2001). In the UK, an 85% reduction in SO2 emissions occurred between 1970 and 2003, primarily due to fuel switching from coal to gas and the introduction of flue gas desulphurization to power generating plants. The reduction in emissions has led to major decreases of sulphur concentrations measured in the atmosphere in both air and precipitation (RoTAP, 2012), with corresponding reductions in acidifying inputs to natural ecosystems in the UK and other European countries. Major reductions in emissions of NOx of 40% between 1970 and 2003 have also occurred due to the introduction of more efficient combustion processes and the fitting of catalytic convertors on vehicles. However, these reductions have not resulted in major decreases in wet deposition of oxidized nitrogen, most likely because of non-linearity in atmospheric chemical reactions, in particular the interactions between gas phase and aerosol lifetimes (Fowler et al., 2005). Furthermore, decreases in estimated emissions of ammonia in the UK have been more modest (11% between 1990 and 2010) and reductions in concentrations of ammonia in air and wet deposition of reduced nitrogen have not been observed on a national scale. As a result, the decrease of inputs of nitrogen to natural ecosystems has been much less significant than that for sulphur deposition during recent decades (Matejko et al., 2009; Fowler et al., 2005). Analysis of data from the EMEP (European Monitoring and Evaluation Programme) monitoring network has shown that whilst ammonium and nitrate concentrations in precipitation have declined in Europe, the sum of nitrate and nitric acid in air remained at the same level (Fagerli and Aas, 2008).
Atmospheric Chemical Transport Models (ACTMs) are computer programs which have been developed to simulate meteorological, physical and chemical processes. They are able to provide estimates of the concentration and deposition of air pollutants known to have detrimental impacts on both human health and natural ecosystems. In this study a range of simpler and more complex ACTMs have been applied to make estimates of sulphur and nitrogen deposition. An operational evaluation of the performance of the models has been undertaken by comparison with measurements of concentrations in air (gas and aerosol) and precipitation. A comparison of wet and dry deposition obtained with the different models has been made using both national deposition totals and a cross-country transect.

Description of models
ACTMs have been used in the UK during the last two decades to calculate acid deposition and provide advice to policy makers. The advantages of models include: (i) Estimation of the concentration and deposition of air pollutants at a large number of model grid cells in the UK (typically ~10,000 for a model with a 5 km grid resolution). (ii) Estimation of the future changes of impacts on ecosystems based on projections for pollutant emissions. (iii) Attribution of pollutant deposition to individual emissions sources through source emission reduction simulations.
In contrast, monitoring of air pollutants is both spatially and temporally limited by the number of sites and the period of their operation.
Atmospheric chemical transport modelling in the UK was initially undertaken using 'simpler' models such as HARM (Metcalfe et al., 2001) and FRAME (Singles et al., 1998). These Lagrangian models use straight line trajectories and operate in an annual average mode, assuming constant drizzle (Fournier et al., 2005) to drive wet deposition (based on maps of precipitation for the UK) and annual wind direction frequency roses (Dore et al., 2006) to represent general circulation patterns of air trajectories. These models simulate a 'moving air column' and independently perform calculations along pre-defined trajectories, in contrast to Eulerian models which simultaneously perform calculations at all points in a predefined grid.
Major advances in High Performance Computer technology as well as a general move to open source code for both meteorological models and ACTMs have driven a move to the use of more complex models during recent years, with an emphasis on Eulerian approaches. These models include the US Environmental Protection Agency Community Multi-scale Air Quality (CMAQ) modelling system (Byun and Schere, 2006; Chemel et al., 2011) and the EMEP model, including its high resolution application to the UK which is used in the present study (EMEP4UK, Vieno et al., 2014). Such systems use a meteorological model to generate 3-dimensional temporally evolving data on wind speed, temperature, humidity, cloud and precipitation which are then used to drive the ACTM. The meteorological data were evaluated with the Met Office Integrated Data Archive System (MIDAS, http://catalogue.ceda.ac.uk/uuid/220a65615218d5c9cc9e4785a3234bd0, last access 2 July 2015). These 'more complex' models also include Lagrangian approaches, such as NAME (Redington et al., 2009) which is driven by temporally evolving meteorology in a Lagrangian framework. For the present study CMAQ and EMEP4UK used independently generated meteorological data calculated with the WRF (Weather Research and Forecasting) model (http://www.wrf-model.org; Skamrock and Klemp, 2008) while the NAME model was run using global meteorological data calculated with the UK Met Office Unified Model.
The use of both simpler and more complex models provides complementary benefits. For example the advantages of more complex models include: a more detailed representation of meteorology and its influence on concentrations of air pollutants; high temporal resolution of pollutant concentration (Vieno et al., 2014); more detailed parameterisation of non-linear atmospheric chemical reactions; simultaneous multi-pollutant simulation (i.e. representation of acid deposition, surface ozone and particulate matter in one model; Byun and Schere, 2006). In contrast, the simpler models benefit from a fast simulation time which allows: multiple simulation applications including source-receptor and integrated assessment studies (e.g. Oxley et al., 2013); uncertainty studies (Page et al., 2008); high spatial resolution studies and detailed vertical resolution (Hallsworth et al., 2010; Dore et al., 2012).
The models involved in this study included two simpler Lagrangian models (employing annually averaged meteorology) and three more complex models driven by dynamic meteorology and using diurnally variable photo-chemical reaction schemes. A summary of the models is given in Table 1. This inter-comparison included two independent applications of the CMAQ model and also two applications of the EMEP model run at different resolutions of 50 km and 5 km respectively (termed EMEP.MSCwest and EMEP4UK). The EMEP.MSCwest simulation used data from the HIRLAM meteorological model (http://www.hirlam.org/, last access 2 July 2015). This allowed an assessment of the sensitivity to model grid resolution and of the variability in modelled concentrations and deposition not just between different models but for different applications of the same model. The two CMAQ simulations used identical meteorological inputs but different annual emissions profiles (discussed below). The models used common inputs of annual atmospheric emissions from the UK National Atmospheric Emissions Inventory (http://naei.defra.gov.uk/, last access 5 March 2015) which are updated annually and gridded at a 1 km resolution. The models were unconstrained with regard to choice of boundary conditions, meteorological data, land use cover and internal model parameters. The model domains covered the entire United Kingdom (including the northern islands) for HARM and the British Isles (including the Republic of Ireland) for CMAQ, EMEP4UK, FRAME and NAME. The model domains were not uniform but typically covered an area of approximately 900 km west–east × 1200 km south–north. The models included in this study are of varying levels of chemical complexity and therefore have differences regarding the speciation of the chemical components of the atmosphere.
To generate the boundary conditions for a UK scale simulation, the ACTMs were first run at a coarser 50 km resolution over a European domain, with the EMEP4UK and CMAQ European simulations using meteorological data from a 50 km resolution WRF simulation. All the models used in this study include the major inorganic sulphur and nitrogen compounds. These include gases which are significant for dry deposition (SO2, NO2, HNO3 and NH3), as well as particulate matter components which are efficiently wet deposited (NH4^+, NO3^- and SO4^2-) and can also be dry deposited. Other chemical components including many NOy species such as nitrous acid (HONO) are included in the more complex models but in this exercise their dry deposition was not explicitly modelled. The simpler models adopt a single scavenging parameter for wet deposition processes whilst the more complex models have separate scavenging coefficients for in-cloud and below-cloud scavenging of gases and particles. Various resistance formulae are used to calculate dry deposition velocities. However, whilst the more complex models use temporally evolving meteorology in their calculations, dry deposition calculated with simpler models is based on annually averaged deposition velocities.
Other models used to calculate sulphur and nitrogen deposition include the Danish Ammonia Modelling System (DAMOS, Geels et al., 2012) which combined a long range transport model with a local scale Gaussian model for dry deposition. Kranenburg et al. (2013) describe the development of a source apportionment tool in the LOTOS-EUROS model which was used to track the emissions sources contributing to nitrogen concentrations in the Netherlands. The CHIMERE model was used by Garcia-Gomez et al. (2014) to assess the threat of nitrogen deposition to the Natura 2000 network of nature reserves in Spain. Appel et al. (2010) assessed the performance of CMAQ over the USA by comparison with measurements of wet deposition of sulphur and nitrogen from the National Atmospheric Deposition Programme.

Measurement data
Atmospheric monitoring data for 2003 were used in this study for comparison with the model estimates. Measured concentrations in air and precipitation were obtained as part of the component networks which now collectively comprise UKEAP: (i) SO4^2-, NO3^- and NH4^+ precipitation concentrations from bulk sampler analysis at 37 sites. (ii) Aerosol (SO4^2-, NO3^-, NH4^+) concentrations at 12 sites using DELTA samplers. (iii) SO2, NH3 and HNO3 gas concentrations at 12 sites using DELTA samplers. (iv) SO2 gas concentrations at 37 sites using bubbler samplers. (v) NH3 gas concentrations at 88 sites using both active (DELTA) samplers and passive (ALPHA) samplers. (vi) NO2 gas concentrations at 32 sites using diffusion tubes.
Further details of the monitoring networks are included in Hayman et al. (2004), which is available from http://uk-air.defra.gov.uk/networks/network-info?view=ukeap (last access: 5/3/2015). Data capture averaged across the sites exceeded 97% of the samples collected for particulate and gaseous chemical concentrations. For precipitation chemistry, data capture (on average 78%) was lower, principally due to exclusion of samples with high phosphate concentrations indicating contamination from bird strike. All monitoring sites used in this study are based at rural or semi-rural locations which are located at least 2.5 km away from significant emissions sources, such as major roads. The locations of the monitoring sites are shown in the supplementary material (Fig. 1(a)–(f)). Bulk samplers used to measure precipitation composition are sampled fortnightly, DELTA and ALPHA samplers record monthly averages and NO2 diffusion tubes are changed every 4–5 weeks.

Evaluation of models by comparison with measurements
The models were evaluated by comparing annually averaged measurements of gas concentrations (SO2, NO2, NH3) and aerosol concentrations (SO4^2-, NO3^- and NH4^+) in air as well as ion concentrations in precipitation with the output of the models. It is noted that HNO3 measurements are currently under review and have not been included in this assessment.
The evaluation was undertaken with the Openair software using the R statistical language (Carslaw and Ropkins, 2012). A report blending text and data analysis was automatically generated (Xie, 2013). This approach has the advantage that the results are easily reproducible by a third party and updates to submitted model data can rapidly be incorporated by re-running the software. Development of the Openair software and its application to intercomparison of the models in this study as well as models for surface ozone and local dispersion is discussed in detail in Carslaw (2011). The more complex models participating in this study generate data with high temporal frequency (typically with resolution of a few hours) whilst the simple models are designed to calculate only annually averaged concentrations and deposition. For this study, the models have been evaluated using only annually averaged data for a single year. 2003 was selected based on the availability of meteorological data to drive the complex model simulations.
A variety of different metrics have been proposed to evaluate the performance of atmospheric chemical transport models by comparing the difference between model predictions and observations (e.g. Chang and Hanna, 2004). Here we adopt relatively simple criteria for a model to be considered 'fit for purpose' which were set according to a previously agreed model evaluation protocol (Derwent et al., 2010). These were: FAC2 > 0.5 and -0.2 < NMB < 0.2, where FAC2 (i.e. 'factor of 2') is the fraction of points greater than 0.5 times and less than 2 times the measured value and NMB is the Normalised Mean Bias defined as:
$$\mathrm{NMB} = \frac{\sum_{i=1}^{n} (M_i - O_i)}{\sum_{i=1}^{n} O_i}$$
where $O_i$ represents the i-th observed value and $M_i$ represents the i-th modelled value for a total of n observations. The NMB illustrates model over- or under-estimate relative to measurements and is useful for comparing pollutants that cover different concentration scales as the mean bias is normalised by dividing by the observed concentration.
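The two criteria above can be sketched in a few lines of Python. This is a minimal illustration and not the Openair implementation used in the study: the function names and the sample values are hypothetical, and FAC2 is coded with strict inequalities to follow the wording of the definition given here.

```python
def fac2(modelled, observed):
    """Fraction of points where the model is within a factor of 2 of the observation."""
    pairs = list(zip(modelled, observed))
    # Strict inequalities follow the definition in the text (assumption for edge cases).
    within = sum(1 for m, o in pairs if 0.5 * o < m < 2.0 * o)
    return within / len(pairs)


def nmb(modelled, observed):
    """Normalised Mean Bias: sum of (M_i - O_i) divided by sum of O_i."""
    return sum(m - o for m, o in zip(modelled, observed)) / sum(observed)


def fit_for_purpose(modelled, observed):
    """Apply both criteria: FAC2 > 0.5 and -0.2 < NMB < 0.2."""
    return fac2(modelled, observed) > 0.5 and -0.2 < nmb(modelled, observed) < 0.2


# Hypothetical annual mean concentrations at four sites (not data from the study).
obs = [2.0, 4.0, 6.0, 8.0]
mod = [1.8, 4.4, 5.0, 20.0]

print(fac2(mod, obs))             # three of four sites lie within a factor of 2
print(nmb(mod, obs))              # large positive bias driven by the fourth site
print(fit_for_purpose(mod, obs))  # fails on the NMB criterion
```

In this example a single strongly over-predicted site leaves FAC2 above the threshold but pushes the NMB outside the accepted range, which is why the two criteria are applied together.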
Example plots of the correlation of the models with a gas concentration (NO2), a particulate concentration (SO4^2-) and a concentration in precipitation (NH4^+) are illustrated in Fig. 1(a)–(c) with performance statistics summarised in Table 2. The results for other chemical components of gas, aerosol and precipitation concentrations are illustrated in the supplementary material (Figure S3, Table S1). Table 3 illustrates correlation statistics for all measured chemical components averaged across the different models. Further details of the analysis are available at: http://uk-air.defra.gov.uk/library/reports?report_id=652 (last access 30/6/2015).
The first of the evaluation criteria, FAC2 > 0.5, was generally satisfied by the models for all variables, but the second condition, -0.2 < NMB < 0.2, was not satisfied for all variables (as shown in Table 2 and Table S1(a)–(c), supplementary material).
The example of comparison with measurements of a gas concentration (NO2, Fig. 1(a)) shows that a simpler model (FRAME) was able to achieve a level of agreement with measurements (FAC2 = 0.97, NMB = -0.1) which is as good as the more complex models. This may be due to the fine vertical resolution in FRAME (1 m at the surface) which permits detailed specification of the height at which different types of emission source are input to the model. For SO2 (supplementary material, Figure S3.1) there were more significant variations between models for NMB (from 0.07 to 2.38) and FAC2 (from 0.04 to 0.96). SO2 emissions originate primarily from a small number of elevated point sources (principally power stations) and treatment of emission height can be important. The CMAQ and FRAME models include a plume rise parameterization for point source emissions whereas other models apply an emission sector-dependent height. The scatter in correlation of the models with NH3 gas concentrations (supplementary material, Figure S3.2) is generally higher (FAC2 less than 0.78 for all models) than for SO2 and NO2. This does not necessarily reflect a difficulty in the models to simulate the behaviour of ammonia. It is more likely to be caused by the high spatial variability in emissions in rural locations which results in changes in ammonia concentrations on scales not captured by ACTMs with grid spacing typically of approximately 5 km. The NH3 concentration measured at an individual site may not be representative of the surrounding area as represented in a 5 km model grid cell (Hallsworth et al., 2010; Vogt et al., 2013).
Table 1. Summary of models participating in the inter-comparison including model grid resolution. Two independent applications of the CMAQ and EMEP models (the latter at different grid resolutions) are included.
For aerosol concentrations there is clear evidence that the more complex models obtain better correlation with measurements than the simpler models. EMEP4UK, EMEP.MSCwest, CMAQ.UH and NAME all achieved a FAC2 of 1.0 and NMBs of 0.0, -0.03, -0.03 and 0.20 for SO4^2- aerosol respectively (Fig. 1(b)). This may be due to the difficulty of the simple models to capture the full magnitude of long range transport of particulate matter from the European continent during 2003 when winds from the east were more common than normal. Furthermore, complex models use hourly meteorological data to drive the formation of secondary inorganic aerosols whereas simpler models assume an annual average formation rate. More complex models also performed well for NO3^- aerosol (supplementary material, Figure S3.4), notably NAME (FAC2 = 1.0, NMB = 0.04) and CMAQ.JEP (FAC2 = 1.0, NMB = -0.20). The overall correlation of the models with measurements of NO3^- aerosol (average FAC2 = 0.75) is not as good as for SO4^2- (average FAC2 = 0.94) which may be due to the more complex chemical reactions leading to the formation of oxidised nitrogen aerosol. All models showed some underestimate of NH4^+ aerosol concentrations (average NMB = -0.30, supplementary material Figure S3.3).
Considerable scatter in the correlation of all the models for NH4^+ concentrations in precipitation is apparent (Fig. 1(c)). None of the models is able to achieve FAC2 > 0.9. On average the models tend to underestimate reduced nitrogen concentrations in precipitation as well as the gaseous and particulate forms, which may be an indication that emissions sources are underestimated or that removal of NH3 by dry deposition is too rapid. The average value of r for all the models for NH4^+ concentration in precipitation is 0.67, compared to 0.74 and 0.76 for SO4^2- and NO3^- respectively (Table 3). All models underestimate NO3^- concentrations in precipitation (average NMB = -0.26). This may suggest either a missing source of oxidised nitrogen emissions, or overall underestimates in atmospheric chemical conversion or washout coefficients. The models generally exhibited negative values of NMB for aqueous phase concentrations (average values of -0.10, -0.26 and -0.18 for SO4^2-, NO3^- and NH4^+ respectively, Table 3). This result may be explained by the fact that bulk precipitation collectors are used in the monitoring network and will be subject to dry deposition contamination, principally by gaseous deposition (as discussed below). Overall, more complex models tended to score higher values for FAC2 and r than the simpler models for precipitation concentrations. The influence of model grid resolution can be assessed by comparing the results of the correlation with measurements for EMEP.MSCwest (50 km grid resolution) and EMEP4UK (5 km grid resolution). In general, EMEP4UK performed better than EMEP.MSCwest for gas concentrations (SO2, NO2 and NH3) whereas for aerosol and precipitation concentrations, the differences in correlation are less significant.

Comparison of modelled deposition
The second part of this study involves a comparison of deposition data generated by the models described above (with the exception of EMEP.MSCwest) using national scale deposition budgets as well as maps and plots along a transect across the UK for wet and dry deposition of SOx, NOy and NHx, shown in supplementary material, Figure S2. This approach allows visualization of both the national scale and local scale variability in deposition between the models.
The total UK wet and dry deposition budgets of SOx, NOy and NHx are illustrated in Fig. 2 for all models. The average values and standard deviations for modelled deposition were: 95 ± 30 Gg S for SOx dry deposition and 71 ± 17 Gg S for SOx wet deposition; 67 ± 14 Gg N for NOy dry deposition and 47 ± 9 Gg N for NOy wet deposition; 79 ± 19 Gg N for NHx dry deposition and 59 ± 11 Gg N for NHx wet deposition. These values show that the models predict that dry deposition is overall a more important process for removal of sulphur and nitrogen compounds from the atmosphere (with higher values than those for wet deposition by 35%, 44% and 34% for SOx, NOy and NHx respectively) for the year 2003. However, it should be noted that for SOx and NOy dry deposition the higher values occur in industrial and urban areas whereas many ecosystems sensitive to acid deposition and nitrogen deposition are located in upland areas where wet deposition is the most important process. Furthermore, the year 2003 was noted for its low annual precipitation and this result may not be typical of other years. Differences in dry deposition of SO2 amongst the models occur due to significant variation in modelled surface SO2 concentrations, as is evident from the NMB values for the correlation with measurements (Table S1(a)), and can be attributed to different model treatment of elevated point source emissions. The two CMAQ simulations achieved close agreement for dry deposition and NOy wet deposition. However, wet deposition of NHx and SOx is notably higher with CMAQ.JEP than with CMAQ.UH.
Table 2. Model performance statistics for comparison with concentration measurements: FAC2: fraction of points greater than 0.5× and less than 2× the measured value; NMB: normalised mean bias; r: Pearson correlation coefficient.
Whilst the meteorological data used were common to the two models and parameter settings for the CMAQ simulations were generally similar, significant differences occurred due to the seasonal profile of ammonia emissions. Annual average ammonia emissions were identical but the CMAQ.UH simulation had a large seasonal variation in ammonia emissions, with summer time emission rates higher by a factor of ten than the winter time values (Figure S4, supplementary material). This was effective in restricting the rate of formation of ammonium sulphate aerosol during the winter months. For CMAQ.JEP, ammonia emission rates during summer months were approximately two times higher than winter time values and there was less restriction of ammonium sulphate formation during the winter. In reality, emissions of ammonia are highly sensitive to meteorological conditions, particularly temperature. This issue is discussed in detail by Sutton et al. (2013). Skjoth et al. (2004) describe a system for dynamically generating seasonally and diurnally variable ammonia emissions for use in an ACTM using modelled meteorological data.
The spatially distributed deposition data for all the models were used to calculate the mean and standard deviation (Figs. 3.1 and 3.2) across the UK. Sulphur dry deposition is highest in the industrial regions of northern England as well as near the coast and at major ports due to the major contribution to SO2 emissions from international shipping. NOy dry deposition is highest in the region of major cities and urban areas. For NHx dry deposition, the highest values occur in areas of intensive livestock farming, including Northern Ireland and western and eastern England. The geographical distribution of wet deposition is similar for sulphur and both oxidised and reduced nitrogen. Due to the long range transport of aerosol, the highest values occur in the high rainfall areas of the hills of Wales and northern England. The normalized standard deviation of model deposition gives an indication of the uncertainty in modelled deposition associated with choice of model. Sulphur dry deposition and reduced nitrogen deposition show the greatest variability of deposition in source regions. For wet deposition the highest values of standard deviation amongst the models occur in the hill regions of Scotland and the far northwest of the country. These differences are caused by: variations in formation and long range transport of particulate matter; differences in representation of atmospheric washout of sulphur and nitrogen compounds by the models; different estimates of precipitation, particularly orographic precipitation over hill regions. Fig. 4(a) and (b) illustrate plots of dry deposition and wet deposition respectively for SOx, NOy and NHx along a west–east transect across the UK. The transect (illustrated in Figure S2, supplementary material) passes through agricultural regions in Northern Ireland, crosses the North Sea and passes over high precipitation regions in southern Scotland and industrial and urban regions on the east coast of Northern England.
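The per-grid-cell mean and normalized standard deviation across models described above can be sketched as follows. This is an illustrative sketch only: it assumes all model fields have already been regridded to a common grid, it uses the sample standard deviation (the text does not state whether the sample or population form was used), and the function name and the tiny example grids are hypothetical.

```python
from statistics import mean, stdev


def ensemble_mean_and_normalised_std(fields):
    """Return (mean grid, normalised std grid) for a list of same-shape 2-D grids.

    fields: one grid per model, each a list of rows of deposition values.
    The normalised standard deviation is the standard deviation across
    models divided by the ensemble mean in the same grid cell.
    """
    n_rows, n_cols = len(fields[0]), len(fields[0][0])
    mean_grid, nstd_grid = [], []
    for r in range(n_rows):
        mean_row, nstd_row = [], []
        for c in range(n_cols):
            values = [field[r][c] for field in fields]
            mu = mean(values)
            mean_row.append(mu)
            # Sample standard deviation (assumption); undefined where the mean is zero.
            nstd_row.append(stdev(values) / mu if mu > 0 else float("nan"))
        mean_grid.append(mean_row)
        nstd_grid.append(nstd_row)
    return mean_grid, nstd_grid


# Hypothetical deposition values for three models on a 1 x 2 grid.
model_a = [[4.0, 10.0]]
model_b = [[6.0, 10.0]]
model_c = [[8.0, 10.0]]

mean_grid, nstd_grid = ensemble_mean_and_normalised_std([model_a, model_b, model_c])
print(mean_grid)  # ensemble mean per grid cell
print(nstd_grid)  # spread relative to the mean; zero where all models agree
```

Dividing by the ensemble mean makes the spread comparable between high deposition source regions and low deposition remote regions, which is the reason for mapping the normalized rather than the absolute standard deviation.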
Considerable variation is evident in the magnitude of dry deposition. Reasons for these differences include: different treatment of elevated (e.g. SO2) and low (e.g. NH3) emissions sources which can influence surface gas concentrations; variation in deposition velocities; differences in representation of land cover and use of vegetation-specific deposition velocities. The last of these is particularly important for ammonia as the deposition velocity can vary by an order of magnitude between improved grassland and forest (Flechard et al., 2011). Generally the standard deviation of NOy dry deposition is lower than for SOx and NHx. The magnitude of wet deposition at a local scale is found to vary considerably between models, by a factor of up to 4 between the lowest and highest estimate.
Variation in estimates of wet deposition can occur due to different precipitation values used by the models. The simpler models (HARM and FRAME) use spatially distributed data based on measurement and interpolation of annual precipitation measurements from the UK Meteorological Office (UKMO) national precipitation monitoring network which is mapped at a 5 km resolution (Simpson and Jones, 2012). Wet deposition in the complex models is driven by calculations of dynamic precipitation from a meteorological model. For CMAQ and EMEP4UK the meteorological driver is the WRF (Weather Research and Forecasting) model (http://www.wrf-model.org; Skamrock and Klemp, 2008). The UKMO and WRF precipitation maps (Fig. 5(a) and (b)) show similar spatial distributions, with the lowest values of precipitation of approximately 600 mm year^-1 along the east coast of England and the highest values, above 1800 mm year^-1, in the hills of Scotland, Wales and Northern England. High precipitation regions are closely correlated to terrain height (Fig. 5(c)). Analysis of the difference in precipitation between UKMO and WRF (Fig. 5(d)) shows that the UKMO data generally have higher values in the upland areas, with differences relative to WRF of over 200 mm year^-1. Meteorological models may underestimate upland precipitation due to the complexities of air flow and formation of cloud and rainfall in hill areas which are not fully resolved at a 5 km grid resolution (Richard et al., 2007). Precipitation in hill areas is also uncertain using the UKMO measurement-interpolation approach. Rain gauges exposed to higher winds and lower temperatures may capture precipitation inefficiently and fail to record snowfall during sub-zero temperatures (Sevruk et al., 2009). Interpolation of rainfall in complex terrain may introduce errors by failing to capture the influence of local orography on annual precipitation.
Hill areas are the regions of highest wet deposition and the location of sensitive ecosystems where deposition of nitrogen may exceed critical loads, but they are also the areas where precipitation is less accurately estimated by both meteorological models and measurement-interpolation methods.

Discussion
The models were found on average to have negative mean biases for all precipitation concentration and gas and particulate phase measurements, except for SO2. It is important to review this in the light of systematic errors in measurement. Cape et al. (2009) used a 'flushing sampler' which, by detecting the onset of precipitation, was able to collect dry-deposited material separately from that contained in precipitation. Comparison of this design with a standard bulk sampler over 3 months at a site in eastern Scotland showed that dry deposition to the funnel surface contributed approximately 20% of SO4²⁻, 20–30% of NO3⁻ and 20–40% of NH4⁺ ions. Uncertainties in measurement of gas and aerosol concentrations may also occur due to incomplete reaction of gases with the substrate on a denuder tube, loss of aerosol mass by impaction on tubing, incomplete capture of fine particulate matter by filter papers and chemical analysis by ion chromatography. Araya et al. (2012) estimated an uncertainty of 20% in the measurement of anion and cation components in aerosol particles. It is therefore clearly inappropriate to set limits on the normalised mean bias of less than ±20% for model evaluation without due consideration of systematic errors in measurement technique. This emphasises the important issue that evaluation criteria to assess the performance of atmospheric chemical transport models should never be used alone in the absence of expert knowledge. Wet-only collectors are now widely used to collect samples for precipitation chemistry and can be combined with measurements of precipitation from tipping bucket rain gauges to give site-based estimates of wet deposition (e.g. Van der Swaluw et al., 2011). Furthermore, wet-only collection is the standard procedure recommended by the World Meteorological Organisation. However, there are technical issues associated with their operation (more complex maintenance; the need for electrical power; under-collection of precipitation).
Historically the reason for installation of a monitoring network for precipitation chemistry in the UK has not been for model validation but rather to detect long term trends in pollutant concentrations and provide adequate spatial coverage for mapping purposes. The emphasis has therefore remained on use of a simple low cost technique with a relatively dense network and continuity of measurement technique. However, the increased use of ACTMs in recent years to support national policy decisions inevitably means that the design of monitoring networks in the future should take account of their role in evaluating modelled concentrations.
The deposition model inter-comparison study was undertaken for the single year 2003 based on availability of input meteorological data for the Eulerian models. It is beyond the scope of the present study to include a detailed analysis of multiple years of data. However, the question as to whether the results of the model comparison would have changed with the choice of a different year needs to be considered. Changes in annual circulation and precipitation can have a strong influence on concentrations of nitrogen and sulphur compounds in air and their deposition. Kryza et al. (2012) showed that inter-annual variability of precipitation and general circulation could cause major variations in sulphur and nitrogen deposition, equivalent to changes associated with long-term emissions changes. The year 2003 was characterised by the lowest precipitation over the UK during the last 20 years. The 2003 UK annual average according to the UK Met Office precipitation maps was 880 mm compared to 1130 mm averaged over the years 1986–2011 (http://www.metoffice.gov.uk/climate/uk/summaries). Another feature of 2003 was the high value of the annual average aerosol concentrations in air. These were the highest for SO4²⁻, NO3⁻ and NH4⁺ in air since measurements with the Delta samplers began in 1999 and approximately 50% higher than the long-term average. Whilst the high aerosol concentrations were caused partly by low precipitation, a more important reason was the high incidence of south-easterly flow (in general an infrequent wind direction in the UK) leading to elevated concentrations of aerosol during the months of February, March and April caused by import of particulate matter from the European continent (Vieno et al., 2014). The year 2003 may therefore be considered somewhat uncharacteristic for general circulation of air masses to the UK.
A model may demonstrate good agreement with measurements of, for example, total aerosol concentrations without necessarily capturing specific atmospheric processes accurately (e.g. the relative contributions from national emission sources and long range transport). It cannot therefore be assumed that the correlation with measurements presented here for the year 2003 would be reproduced for other years with different meteorology and emissions. Simulating a year in which long range transport from the European continent made a greater than average contribution to sulphur and nitrogen concentrations poses additional challenges for ACTMs. Despite this, both simpler and more complex models achieved a good degree of success in simulating the measured concentrations.
A multiple year study is recommended for future work to assess model sensitivity to inter-annual changes in meteorology.
An alternative approach to emissions-based atmospheric chemical transport modelling is to make use of spatially distributed measurements combined with interpolation techniques to generate deposition data. This technique is frequently used to map wet deposition by combining measurements of ion concentrations in precipitation with annual precipitation measurements (e.g. Smith and Fowler, 2001). Spatially distributed dry deposition estimates can also be made by interpolating gas and aerosol concentrations and combining these with vegetation-specific deposition velocities, as described by Smith et al. (2000) using a big leaf model. In the UK the combination of these dry and wet deposition estimates forms the Concentration Based Estimated Deposition (CBED) data and has been used, averaged over three years, to estimate the exceedance of critical loads for nitrogen and acid deposition and changes in recent decades (RoTAP, 2012). For the year 2003 the CBED total annual deposition estimates for the UK showed significantly higher values for wet deposition (by 31%, 65% and 47% for SOx, NOy and NHx respectively) than the mean value of the ACTMs presented in this study. For dry deposition of NHx and SOx, CBED obtained similar values to the ACTMs. Dry deposition of NOy in CBED is currently under revision due to re-calibration of HNO3 concentrations. The reasons for these differences require detailed investigation, which should be undertaken in further work.
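The measurement-based mapping of wet deposition described above amounts to multiplying a precipitation-weighted concentration by an annual precipitation depth. The following is a minimal sketch of that arithmetic for a single grid square, not the CBED implementation. The function name and the concentration value are hypothetical; the two precipitation depths are the 2003 and 1986–2011 UK averages quoted in the Discussion. The conversion factor follows from 1 mm of rain over 1 m² being 1 litre, so that mg l⁻¹ × mm gives mg m⁻², and 1 mg m⁻² equals 0.01 kg ha⁻¹.

```python
def wet_deposition_kg_per_ha(conc_mg_per_litre, precip_mm):
    """Annual wet deposition (kg per hectare) for one grid square.

    conc_mg_per_litre: precipitation-weighted mean concentration (mg per litre)
    precip_mm: annual precipitation depth (mm)
    """
    if conc_mg_per_litre < 0 or precip_mm < 0:
        raise ValueError("concentration and precipitation must be non-negative")
    # mg/l * mm = mg/m^2, and 1 mg/m^2 = 0.01 kg/ha
    return conc_mg_per_litre * precip_mm * 0.01


# Hypothetical concentration of 0.5 mg N per litre (illustrative only,
# not a value from the study).
concentration = 0.5
for label, precip in [("2003", 880.0), ("1986-2011 mean", 1130.0)]:
    deposition = wet_deposition_kg_per_ha(concentration, precip)
    print(f"{label}: {deposition:.2f} kg N per ha per year")
```

Holding the concentration fixed while changing precipitation is a simplification made only for illustration; in practice concentrations in precipitation also vary with rainfall amount.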
The model evaluation in this study has been based on annually averaged concentrations in air and precipitation because the simpler models are designed to calculate annual deposition and these are the standard data required for ecosystem impact assessment. More complex models are able to output data at high temporal resolution and can be subjected to a more detailed evaluation involving hourly or daily measured data. Future work should employ updated emissions estimates to calculate multiple year estimates of annual deposition of sulphur and nitrogen to the UK for use in environmental impact assessments. It is recommended that this should include an assessment of the sensitivity of critical load exceedance (Hall et al., 2006) to both the choice of technique for deposition estimation (ACTM or measurement-interpolation system) and the choice of individual model. More complex models are recommended as effective tools to assess future changes in nitrogen and sulphur deposition based on projected emissions reductions. The faster run times of simpler models, however, mean that they will continue to be useful in studies requiring high resolution spatial simulation or large numbers of model runs (e.g. uncertainty estimates and source-receptor calculations).

Conclusion
An evaluation has been made of a range of simpler and more complex atmospheric chemical transport models, applied to make spatial estimates of acid deposition and nitrogen deposition to the UK. Deposition data from such models can be used to calculate the exceedance of critical loads, which provides valuable information to policy makers on the need to reduce emissions of SO2, NOx and NH3 to protect natural ecosystems. The models were evaluated by comparison with annually averaged measurements of gas (SO2, NOx and NH3), aerosol and precipitation concentrations (SO4²⁻, NO3⁻ and NH4⁺) from the national monitoring networks for the year 2003. A model evaluation protocol was used to set the criteria for 'fitness for purpose'. The first criterion, that at least 50% of modelled concentrations should be within a factor of two of the measured value, was generally satisfied by the models. The second criterion, that the magnitude of the normalised mean bias should be less than 20%, was not always satisfied. Uncertainties resulting from measurement techniques were not accounted for in this analysis; however, these can be significant. In particular, ion concentrations in precipitation can be overestimated by 20–40% when bulk precipitation collectors are used (Cape et al., 2009). It is therefore recommended that uncertainties and biases in measurement technique are taken into account when using model evaluation criteria to judge whether a model is fit for purpose. For example, 'adjusted NMB' criteria could be used, with variable maximum and minimum limits of acceptability for the normalised mean bias that depend on errors and uncertainty in measurement techniques.
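The two evaluation criteria can be illustrated with a short sketch. It assumes the commonly used definitions of the factor-of-two fraction (the share of sites with a modelled-to-measured ratio between 0.5 and 2) and the normalised mean bias (the sum of modelled minus measured values divided by the sum of measured values); the exact formulae in the evaluation protocol are not reproduced here and may differ in detail. The function names, the asymmetric limits used to mimic an 'adjusted NMB' criterion and the sample values are all hypothetical.

```python
def fac2(modelled, measured):
    """Fraction of sites where modelled/measured lies within a factor of two."""
    pairs = [(m, o) for m, o in zip(modelled, measured) if o > 0]
    if not pairs:
        raise ValueError("no sites with positive measured values")
    within = sum(1 for m, o in pairs if 0.5 <= m / o <= 2.0)
    return within / len(pairs)


def normalised_mean_bias(modelled, measured):
    """Sum of (modelled - measured) divided by the sum of measured values."""
    total_measured = sum(measured)
    if total_measured == 0:
        raise ValueError("sum of measured values is zero")
    return sum(m - o for m, o in zip(modelled, measured)) / total_measured


def fit_for_purpose(modelled, measured, nmb_lower=-0.2, nmb_upper=0.2):
    """Apply both criteria; the NMB limits may be asymmetric (assumed form
    of an 'adjusted NMB' criterion)."""
    nmb = normalised_mean_bias(modelled, measured)
    return fac2(modelled, measured) >= 0.5 and nmb_lower <= nmb <= nmb_upper


# Illustrative values only (not data from the study).
measured = [1.0, 2.0, 4.0, 0.5]
modelled = [0.8, 1.5, 3.0, 1.2]

print(fac2(modelled, measured))                     # 0.75
print(round(normalised_mean_bias(modelled, measured), 3))
print(fit_for_purpose(modelled, measured))          # default limits of +/-20%
# Hypothetical adjusted limits allowing for measurements that read high:
print(fit_for_purpose(modelled, measured, nmb_lower=-0.4, nmb_upper=0.2))
```

Widening only the lower limit reflects the case where a sampler is thought to overestimate concentrations, so that a negative model bias relative to the measurements is partly an artefact of the measurement method.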
Simple models have practical advantages due to their fast run times and ability to perform multiple simulations, and they performed satisfactorily when compared with measurements. Complex models are able to represent chemical transformation and long range transport of pollutants more accurately, leading to better representation of particulate concentrations of SO4²⁻, NO3⁻ and NH4⁺. They also benefit from the ability to simultaneously represent other pollutants (e.g. surface ozone). No attempt was made to rank the models overall. However, it was clear from the evaluation that different models performed best for different pollutants (sulphur, oxidised nitrogen, reduced nitrogen) and states (gas, particulate, aqueous), so that in practical terms ranking would not be a simple task.
Comparison of the modelled deposition budgets to the UK showed that total deposition varied by ±22–36% depending on the deposition component, with similar variability amongst both wet and dry deposition estimates. At a local (5 km grid square) scale, however, variability in estimates of deposition amongst models could be very much higher, varying by up to a factor of four between different models. These results give an indication of the uncertainty associated with estimating sulphur and nitrogen deposition due to choice of model. Variation, and therefore uncertainty, was notably high for wet deposition in high precipitation upland areas, regions where ecosystems which are sensitive to nitrogen deposition are present.
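As an illustration of how such inter-model spread can be quantified, the sketch below computes the ratio of the highest to the lowest estimate for a single grid square, and a relative spread for a national budget. The half-range relative to the mean used in the second function is an assumption made for illustration; the statistic behind the ± figures quoted above is not restated in this section and may be defined differently. The function names and all values are hypothetical.

```python
def spread_factor(estimates):
    """Ratio of the highest to the lowest model estimate for one grid square."""
    lowest, highest = min(estimates), max(estimates)
    if lowest <= 0:
        raise ValueError("estimates must be positive to form a ratio")
    return highest / lowest


def relative_half_range(budgets):
    """Half the range of the model budgets divided by their mean
    (an assumed definition of a '+/- x%' spread)."""
    mean = sum(budgets) / len(budgets)
    return (max(budgets) - min(budgets)) / (2.0 * mean)


# Illustrative deposition estimates (kg N per ha per year) from three
# hypothetical models for one 5 km grid square.
grid_square = {"model_a": 2.0, "model_b": 4.0, "model_c": 8.0}
print(spread_factor(list(grid_square.values())))  # 4.0

# Illustrative national budgets (arbitrary units) from the same three models.
budgets = [80.0, 100.0, 120.0]
print(f"+/-{100 * relative_half_range(budgets):.0f}%")
```

Applied to every grid square in turn, the first function would give a map of the local factor of disagreement between models, which is one way to highlight areas such as the uplands where the choice of model matters most.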