A tropospheric chemistry reanalysis for the years 2005–2012 based on an assimilation of OMI, MLS, TES, and MOPITT satellite data

We present the results from an 8-year tropospheric chemistry reanalysis for the period 2005–2012 obtained by assimilating multiple data sets from the OMI, MLS, TES, and MOPITT satellite instruments. The reanalysis calculation was conducted using a global chemical transport model and an ensemble Kalman filter technique that simultaneously optimises the chemical concentrations of various species and emissions of several precursors. The optimisation of both the concentration and the emission fields is an efficient method to correct the entire tropospheric profile and its year-to-year variations, and to adjust various tracers chemically linked to the species assimilated. Comparisons against independent aircraft, satellite, and ozonesonde observations demonstrate the quality of the analysed O3, NO2, and CO concentrations on regional and global scales and for both seasonal and year-to-year variations from the lower troposphere to the lower stratosphere. The data assimilation statistics imply persistent reduction of model error and improved representation of emission variability, but they also show that discontinuities in the availability of the measurements lead to a degradation of the reanalysis. The decrease in the number of assimilated measurements increased the ozonesonde-minus-analysis difference after 2010 and caused spurious variations in the estimated emissions. The Northern/Southern Hemisphere OH ratio was modified considerably due to the multiple-species assimilation and became closer to an observational estimate, which played an important role in propagating observational information among various chemical fields and affected the emission estimates. The consistent concentration and emission products provide unique information on year-to-year variations in the atmospheric environment.


Introduction
Long-term records of the tropospheric composition of gases such as ozone (O3), carbon monoxide (CO), and nitrogen oxides (NOx) are important for understanding changes in tropospheric chemistry and human activity and their consequences for the atmospheric environment and climate change (HTAP, 2010; IPCC, 2013). Satellite instruments provide observations of the global distributions of tropospheric composition. For example, tropospheric O3 has been retrieved from the Tropospheric Emission Spectrometer (TES) since 2004 (Beer, 2006) and from the Infrared Atmospheric Sounding Interferometer (IASI) since 2007 (Coman et al., 2012). Tropospheric NO2 column concentrations have been retrieved from the Ozone Monitoring Instrument (OMI) since 2004 (Levelt et al., 2006), the Scanning Imaging Absorption Spectrometer for Atmospheric Cartography (SCIAMACHY) from 2002 to 2012 (Bovensmann et al., 1999), the Global Ozone Monitoring Experiment (GOME) from 1996 to 2003, and GOME-2 since 2007 (Callies et al., 2000). The availability of satellite-derived measurements of various chemical species has prompted increasing interest in methods that combine these sources of observational information to study long-term variations in the atmospheric environment and to improve estimates of emission sources (Inness et al., 2013; Streets et al., 2013).
Combining measurements of O3, CO, and NOx in the atmosphere puts constraints on the concentration of OH, the main radical responsible for removing pollutants from the atmosphere and for determining the lifetime of many chemicals (Levy, 1971; Logan et al., 1981; Thompson, 1992). At the same time, the combined use provides constraints on different sources of surface emissions and on the production of NOx by lightning (LNOx) (e.g. Martin et al., 2007; Miyazaki et al., 2014). The information that can be obtained from a combined use of multiple satellite data sets without a model is limited because of differing vertical sensitivity profiles, different overpass times, and mismatches in spatial and temporal coverage between the instruments, as well as missing information on the chemical regime and origin of the air masses.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Data assimilation is a technique for combining different observational data sets with a model while accounting for the characteristics of each measurement (e.g. Kalnay, 2003; Lahoz and Schneider, 2014). Advanced data assimilation schemes such as the Kalman filter or the related 4D-Var technique use the information provided by satellite-derived measurements and propagate it, in time and space, from a limited number of observed species to a wide range of chemical components, providing global fields that are physically and chemically consistent and in agreement with the observations. Various studies have demonstrated the capability of data assimilation techniques for analysing chemical species in the troposphere and stratosphere.
Assimilation of satellite limb measurements of O3 profiles and nadir measurements of O3 columns has been used to study O3 variations in the stratosphere and the upper troposphere (e.g. Stajner and Wargan, 2004; Jackson, 2007; Stajner et al., 2008; Wargan et al., 2010; Flemming et al., 2011; Barré et al., 2013; Emili et al., 2014). Several studies have produced long-term integrated data sets of stratospheric O3 by combining multiple satellite retrieval data sets (e.g. Kiesewetter et al., 2010; van der A et al., 2010). The assimilation of satellite observations has also been applied to investigate global variations in the tropospheric composition of gases such as O3 and CO (e.g. Parrington et al., 2009; Coman et al., 2012; Miyazaki et al., 2012b). To provide long-term integrated data of tropospheric composition, Inness et al. (2013), in a pioneering study, performed an 8-year reanalysis of tropospheric chemistry for 2003-2010 using an advanced data assimilation system. They included atmospheric concentrations of O3, CO, NOx, and formaldehyde (CH2O) as forecast model variables in the Integrated Forecasting System with modules for atmospheric composition (C-IFS), and demonstrated improved O3 and CO profiles in the free troposphere. They also highlighted biases remaining in the lower troposphere associated with fixed surface emissions, which are not adjusted in the 4D-Var assimilation scheme presented by Inness et al. (2013).
Currently available bottom-up emission inventories, which are based on statistical data such as emission-related activities and emission factors, contain large uncertainties, mainly because of inaccurate activity rates and emission factors for each category and a poor representation of their seasonal and interannual variations (e.g. Jaeglé et al., 2005; Xiao et al., 2010; Reuter et al., 2014). Top-down inverse approaches using satellite retrievals have been applied to obtain optimised emissions of CO (e.g. Kopacz et al., 2010; Hooghiemstra et al., 2011) and NOx (e.g. Lamsal et al., 2010; Miyazaki et al., 2012a; Mijling et al., 2013) by minimising the differences between observed and simulated concentrations, as summarised by Streets et al. (2013). In addition to surface emissions, an improved representation of LNOx sources is important for realistically representing O3 formation and chemical processes in the upper troposphere (Schumann and Huntrieser, 2007; Miyazaki et al., 2014).
The simultaneous adjustment of emissions and concentrations of various species is a new development in tropospheric chemical reanalysis and long-term emission analysis. Miyazaki et al. (2012b) developed a data assimilation system, called CHASER-DAS, that simultaneously optimises the atmospheric concentrations of various trace gases together with the surface emissions of NOx and CO and the LNOx sources, while taking their complex chemical interactions into account as represented by the CHASER chemistry-transport model. Within this simultaneous optimisation framework, the analysis adjustment of atmospheric concentrations of chemically related species has the potential to improve the emission inversion (Miyazaki and Eskes, 2013; Miyazaki et al., 2014). This contrasts with an emission inversion based on measurements of a single species alone, in which uncertainties in the model chemistry affect the quality of the emission source estimates. In addition, the improved emission estimates benefit the atmospheric concentration analysis by reducing the model forecast error. The simultaneous adjustment of emissions and concentrations is therefore a powerful approach to optimising all aspects of the chemical system that influence tropospheric O3 (Miyazaki et al., 2012b).
In this study, we present a tropospheric chemistry reanalysis data set for the 8-year period from 2005 to 2012, produced with the CHASER-DAS system introduced in Miyazaki et al. (2012b). The system uses the ensemble Kalman filter (EnKF) assimilation technique and assimilates retrieved observations from the Microwave Limb Sounder (MLS), OMI, TES, and Measurement of Pollution in the Troposphere (MOPITT) instruments. The chemical concentrations and emission sources are optimised simultaneously during the reanalysis, and the resulting fields are expected to provide useful information for research topics related to the interannual variability of the atmospheric environment and short-term trends.
The remainder of this paper is structured as follows. Section 2 introduces the data assimilation system, and Sect. 3 describes the observations used for the assimilation and validation. Section 4 evaluates the reanalysis performance based on data assimilation statistics. Section 5 presents comparisons against independent observations. Section 6 describes the emission source estimation results. Section 7, which discusses possible errors in the reanalysis data and offers thoughts on future developments, is followed by the conclusions in Sect. 8.

Data assimilation system
The CHASER-DAS system (Miyazaki et al., 2012a, b, 2014; Miyazaki and Eskes, 2013) has been developed based on an EnKF approach and a global chemical transport model called CHASER. The data assimilation settings used for the reanalysis calculation are mostly the same as in Miyazaki et al. (2014), but the calculation was extended to cover the eight years from 2005 to 2012, and several updates were applied to the a priori and state vector settings. Brief descriptions of the forecast model, data assimilation approach, and experimental settings are presented below.

Forecast model
The CHASER model (Sudo et al., 2002; Sudo and Akimoto, 2007) was used as the forecast model. It has a so-called T42 horizontal resolution (2.8° in longitude and the T42 Gaussian grid in latitude) and 32 vertical levels from the surface to 4 hPa. It is coupled to version 5.7b of the atmospheric general circulation model (AGCM) of the Center for Climate System Research and Japanese National Institute for Environmental Studies (CCSR/NIES). Meteorological fields are provided by the AGCM at every CHASER time step (i.e. every 20 min). The AGCM fields were nudged toward the National Centers for Environmental Prediction/Department of Energy Atmospheric Model Intercomparison Project II (NCEP-DOE/AMIP-II) reanalysis (Kanamitsu et al., 2002) at every AGCM time step to reproduce past meteorological fields. The nudged AGCM enabled CHASER calculations that include short-term atmospheric variations and parameterised transport by sub-grid-scale convection and boundary layer mixing.
The a priori values for surface emissions of NOx and CO were obtained from bottom-up emission inventories. Anthropogenic NOx and CO emissions were obtained from the Emission Database for Global Atmospheric Research (EDGAR) version 4.2. Emissions from biomass burning are based on the monthly Global Fire Emissions Database (GFED) version 3.1 (van der Werf et al., 2010). Emissions from soils are based on the monthly mean Global Emissions Inventory Activity (GEIA) data (Graedel et al., 1993). EDGAR version 4.2 was not available after 2008 at the time the reanalysis was started; therefore, the 2008 emissions were used in the calculations for 2009-2012. GFED 3.1 was not available for 2012, and thus the emissions averaged over 2005-2011 were used in the calculation for 2012. For surface NOx emissions, a diurnal variability scheme developed by Miyazaki et al. (2012a, b) was applied depending on the dominant category in each area: anthropogenic, biogenic, or soil emissions.
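The inventory fallbacks above amount to a simple year-selection rule. The Python sketch below illustrates that rule for the anthropogenic and biomass-burning components only; the function name `apriori_fields` and the layout (each inventory as a dictionary mapping a year to a gridded array) are hypothetical, and neither the climatological GEIA soil emissions nor the diurnal-variability scheme is represented.

```python
import numpy as np

EDGAR_LAST_YEAR = 2008               # last EDGAR v4.2 year available when the run started
GFED_CLIM_YEARS = range(2005, 2012)  # 2005-2011, averaged to stand in for 2012


def apriori_fields(year, edgar, gfed):
    """Select a priori anthropogenic and biomass-burning fields for `year`.

    edgar, gfed: hypothetical dicts mapping year -> gridded emission array.
    """
    # Years after the last EDGAR year reuse the last available year.
    anthro = edgar[min(year, EDGAR_LAST_YEAR)]
    if year in gfed:
        fire = gfed[year]
    else:
        # Missing GFED year: use the 2005-2011 mean field.
        fire = np.mean([gfed[y] for y in GFED_CLIM_YEARS], axis=0)
    return anthro, fire
```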
For the calculation of a priori LNOx emissions, the global distribution of the flash rate was parameterised in CHASER for convective clouds based on the relation between lightning activity and cloud top height (Price and Rind, 1992).
To obtain a realistic global annual total flash frequency, a tuning factor was applied to the global total, independently of the lightning adjustment in the assimilation. The model generally reproduces the observed global distribution of the total flash rate well, except for overestimations over northern South America and underestimations over Central Africa and most of the oceanic Intertropical Convergence Zone (Miyazaki et al., 2014).
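As an illustration of the cloud-top-height parameterisation and the global tuning factor, the sketch below uses the widely quoted Price and Rind (1992) power laws for continental and marine convective clouds (flashes per minute, cloud-top height in km). Whether CHASER uses exactly these coefficients, and how it weights individual grid cells, are assumptions here; the single multiplicative factor only mimics a global tuning that is independent of the assimilation.

```python
import numpy as np


def price_rind_flash_rate(cloud_top_km, is_land):
    """Flash rate (flashes per minute) from cloud-top height, Price and Rind (1992)."""
    h = np.asarray(cloud_top_km, dtype=float)
    land = 3.44e-5 * h**4.9    # continental convective clouds
    ocean = 6.4e-4 * h**1.73   # marine convective clouds
    return np.where(is_land, land, ocean)


def tuned_flash_rate(cloud_top_km, is_land, target_total):
    """Rescale the raw field with one global factor so that its sum equals target_total."""
    raw = price_rind_flash_rate(cloud_top_km, is_land)
    factor = target_total / raw.sum()
    return raw * factor, factor
```

Because the tuning is a single global factor, it changes the total flash frequency but not the spatial pattern, so regional biases such as those listed above remain until the assimilation adjusts the LNOx sources.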

Data assimilation technique
The data assimilation technique employed is an EnKF approach, namely the local ensemble transform Kalman filter (LETKF; Hunt et al., 2007), which is based on the ensemble square root filter (SRF) method and uses an ensemble forecast to estimate the background error covariance matrix. The covariance matrices of the observation error and background error determine the relative weights given to the observations and the background in the analysis. The LETKF has conceptual and computational advantages over the original EnKF. First, the analysis is performed locally in space and time, which reduces sampling errors caused by the limited ensemble size. Second, performing the analysis independently for different grid points allows parallel computation, which reduces the computational cost. These advantages are important in the chemical reanalysis calculation because of the many analysis steps in the 8-year reanalysis run and the large state vector used for the multiple-state optimisation (cf. Sect. 2.3 and 2.7).
The assimilation step transforms a background ensemble ($\mathbf{x}^b_i$; $i = 1, \ldots, k$) into an analysis ensemble ($\mathbf{x}^a_i$; $i = 1, \ldots, k$) and updates the analysis mean, where $\mathbf{x}$ represents the model variable, $b$ the background state, $a$ the analysis state, and $k$ the ensemble size. The forecast and analysis steps are described briefly below.

The forecast step
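In the forecast step, the background ensemble mean $\bar{\mathbf{x}}^b$ and its perturbation matrix $\mathbf{X}^b$ are obtained from the evolution of each ensemble member using the forecast model at every model grid point:

$$\bar{\mathbf{x}}^b = \frac{1}{k}\sum_{i=1}^{k}\mathbf{x}^b_i, \qquad \mathbf{X}^b_i = \mathbf{x}^b_i - \bar{\mathbf{x}}^b, \tag{1}$$

where $\mathbf{X}^b$ is an $N \times k$ matrix and $N$ indicates the system dimension (the state vector size times the physical system dimension). Based on the assumption that the background ensemble perturbations $\mathbf{X}^b$ sample the forecast errors, the background error covariance is estimated as

$$\mathbf{P}^b = \frac{1}{k-1}\,\mathbf{X}^b\left(\mathbf{X}^b\right)^{\mathrm{T}}, \tag{2}$$

where $\mathbf{P}^b$ varies with time and space, reflecting dominant atmospheric processes and the locations of the observations. An ensemble of background vectors $\mathbf{y}^b_i$ and the matrix of background perturbations in observation space $\mathbf{Y}^b$ are estimated using the observation operator $H$ (cf. Sect. 2.5):

$$\mathbf{y}^b_i = H\left(\mathbf{x}^b_i\right), \qquad \mathbf{Y}^b_i = \mathbf{y}^b_i - \bar{\mathbf{y}}^b. \tag{3}$$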

The analysis step
The analysis ensemble mean is obtained by updating the background ensemble mean:

$$\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \mathbf{X}^b\,\mathbf{P}^a\left(\mathbf{Y}^b\right)^{\mathrm{T}}\mathbf{R}^{-1}\left(\mathbf{y}^o - \bar{\mathbf{y}}^b\right), \tag{4}$$

where $\mathbf{y}^o$ represents the observation vector, $\mathbf{R}$ is the $p \times p$ observation error covariance, and $p$ indicates the number of observations. The observation error information is obtained for each retrieval (cf. Sect. 2.6). $\mathbf{P}^a$ is the $k \times k$ local analysis error covariance in ensemble space:

$$\mathbf{P}^a = \left[\frac{(k-1)\,\mathbf{I}}{1+\Delta} + \left(\mathbf{Y}^b\right)^{\mathrm{T}}\mathbf{R}^{-1}\mathbf{Y}^b\right]^{-1}. \tag{5}$$

A covariance inflation factor ($\Delta$ = 6 %) was applied to inflate the forecast error covariance at each analysis step. The inflation prevents an underestimation of the background error covariance, and the resulting filter divergence, caused by model errors and sampling errors. In the LETKF algorithm, estimating $\mathbf{P}^a$ does not require computing any large vectors or matrices of dimension $N$.
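The new analysis ensemble perturbation matrix in model space ($\mathbf{X}^a$) is obtained by transforming the background ensemble perturbations $\mathbf{X}^b$ with $\mathbf{P}^a$:

$$\mathbf{X}^a = \mathbf{X}^b\left[(k-1)\,\mathbf{P}^a\right]^{1/2}. \tag{6}$$

The new background ensemble members $\mathbf{x}^b_i$ for the next analysis step are then obtained from model simulations starting from the analysis ensemble members $\mathbf{x}^a_i = \bar{\mathbf{x}}^a + \mathbf{X}^a_i$.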
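The forecast-ensemble statistics and the analysis update described above can be condensed into a single function. The NumPy sketch below is a minimal, global version of the LETKF update of Hunt et al. (2007), assuming a dense observation error covariance and no localisation (CHASER-DAS performs the analysis locally for each grid point, with the localisation described in Sect. 2.7); the function name `letkf_analysis` and its arguments are hypothetical. The 6 % multiplicative inflation enters as (k − 1)/(1 + delta), and the analysis perturbations use the symmetric square root computed by an eigen-decomposition.

```python
import numpy as np


def letkf_analysis(ens_b, yo, obs_op, R, delta=0.06):
    """One global (unlocalised) LETKF analysis step.

    ens_b  : (N, k) background ensemble, one member per column
    yo     : (p,)   observation vector
    obs_op : callable mapping an (N,) state to its (p,) model equivalent
    R      : (p, p) observation error covariance
    delta  : multiplicative covariance inflation
    """
    k = ens_b.shape[1]
    xb_mean = ens_b.mean(axis=1)
    Xb = ens_b - xb_mean[:, None]                    # background perturbations
    ens_y = np.column_stack([obs_op(ens_b[:, i]) for i in range(k)])
    yb_mean = ens_y.mean(axis=1)
    Yb = ens_y - yb_mean[:, None]                    # perturbations in obs space
    C = Yb.T @ np.linalg.inv(R)                      # (k, p)
    Pa = np.linalg.inv((k - 1) / (1.0 + delta) * np.eye(k) + C @ Yb)
    Pa = 0.5 * (Pa + Pa.T)                           # remove round-off asymmetry
    # Analysis mean: weights in ensemble space mapped back with Xb.
    xa_mean = xb_mean + Xb @ (Pa @ C @ (yo - yb_mean))
    # Symmetric square root of (k - 1) Pa via eigen-decomposition.
    evals, evecs = np.linalg.eigh((k - 1) * Pa)
    W = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    Xa = Xb @ W                                      # analysis perturbations
    return xa_mean[:, None] + Xa                     # analysis ensemble (N, k)
```

With `delta=0` and a linear observation operator, the analysis mean and covariance reproduce the standard Kalman filter update computed from the sample covariance, which is a convenient check of such an implementation. Note that all k × k operations are independent of the state size N, which is what makes the local, per-grid-point analysis affordable.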

State vector
The state vector for the reanalysis calculation is chosen to optimise the tropospheric chemical system and to improve the reanalysis performance. It includes several emission sources (surface emissions of NOx and CO, and LNOx sources) as well as the predicted concentrations of 35 chemical species. The chemical concentrations in the state vector are expressed as volume mixing ratios, while the emissions are represented by scaling factors: one for each surface grid cell for the total NOx and CO surface emissions (not for individual sectors), and one for each production rate profile of the LNOx sources. Including these model parameters in the state vector and perturbing them introduces an ensemble spread of chemical concentrations and emissions in the forecast step. The background error correlations, estimated from the ensemble model simulations at each analysis step, determine the relationship between the concentrations and emissions of related species and can reflect daily, seasonal, interannual, and geographical variations in transport and chemical reactions. The emission sources were optimised at every analysis step throughout the reanalysis period, which reduced the initial bias in the a priori emissions during the data assimilation cycle.
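To make the role of the augmented state concrete, the toy example below (a hypothetical one-box model, not CHASER) places a concentration c and an emission scaling factor beta in the same ensemble state. Only c is observed, yet beta is corrected through the ensemble cross-covariance between the two, which is the mechanism described above. The 40 % prior spread used in the check mirrors the a priori emission error given in Sect. 2.7; all other numbers are arbitrary.

```python
import numpy as np


def augmented_update(beta_ens, e_prior, tau, c_obs, obs_err):
    """Toy one-box model c = beta * E_prior * tau; augmented state = [c, beta]."""
    c_ens = beta_ens * e_prior * tau            # forecast concentration per member
    X = np.vstack([c_ens, beta_ens])            # augmented state, shape (2, k)
    k = X.shape[1]
    Xp = X - X.mean(axis=1, keepdims=True)
    P = Xp @ Xp.T / (k - 1)                     # includes the c-beta cross-covariance
    gain = P[:, 0] / (P[0, 0] + obs_err**2)     # H = [1, 0]: only c is observed
    return X.mean(axis=1) + gain * (c_obs - X[0].mean())
```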

Covariance localisation
Because of the limited number of ensemble members, the EnKF approach is prone to introducing unrealistic long-distance error correlations. During the reanalysis calculation, such spurious correlations lead to errors in the fields that may accumulate and degrade the reanalysis quality. To improve the filter performance, the covariance among non-related or weakly related variables in the state vector is set to zero based on sensitivity calculation results, as in Miyazaki et al. (2012b). The analysis of surface emissions of NOx and CO allowed for error correlations with OMI NO2 and MOPITT CO data, while those with other data were neglected. For the LNOx sources, covariances with MOPITT CO data were neglected. Concentrations of NOy species and O3 were optimised from TES O3, OMI NO2, and MLS O3 and HNO3 observations. One difference from the study of Miyazaki et al. (2012b) is that concentrations of non-methane hydrocarbons (NMHCs) were not optimised in the reanalysis: the assimilation of MOPITT CO data caused NMHC concentrations to increase to unrealistic values during the reanalysis, likely associated with excessive chemical destruction of CO (cf. Sect. 7.4.2).
Covariance localisation, described in Sect. 2.7, was also applied to limit the influence of remote observations.
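The zeroing of covariances between unrelated variables can be sketched as a mask applied to the Kalman gain. In the hypothetical example below, the state vector is reduced to six labelled groups, and the mask for an OMI NO2 observation follows the description of the OMI NO2 assimilation (surface NOx emissions, LNOx sources, and NOy concentrations only); the group names and the single-observation serial update are assumptions made for brevity.

```python
import numpy as np

# Hypothetical variable-group label for each row of a toy state vector.
STATE_GROUPS = np.array(["NOx_emis", "CO_emis", "LNOx", "NOy_conc", "O3_conc", "CO_conc"])

# Groups that an OMI NO2 observation is allowed to update.
OMI_NO2_ALLOWED = {"NOx_emis", "LNOx", "NOy_conc"}


def masked_single_obs_update(ens, hx_ens, yo, r, allowed):
    """EnKF mean update for one observation, with covariances to unrelated groups set to zero."""
    k = ens.shape[1]
    xp = ens - ens.mean(axis=1, keepdims=True)   # state perturbations (n, k)
    hp = hx_ens - hx_ens.mean()                  # obs-space perturbations (k,)
    cov_xy = xp @ hp / (k - 1)                   # state-observation covariance (n,)
    var_y = hp @ hp / (k - 1)
    mask = np.isin(STATE_GROUPS, list(allowed))  # True where the covariance is kept
    gain = np.where(mask, cov_xy / (var_y + r), 0.0)
    return ens.mean(axis=1) + gain * (yo - hx_ens.mean())
```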

Observation operator
The observation operator ($H$), which maps the model fields ($\mathbf{x}^b_i$) into retrieval space ($\mathbf{y}^b_i$), includes the spatial interpolation operator ($S$), the retrieval a priori profile ($\mathbf{x}_{\mathrm{a\,priori}}$), and the averaging kernel ($\mathbf{A}$), thereby accounting for the vertical averaging implicit in the observations:

$$\mathbf{y}^b_i = H\left(\mathbf{x}^b_i\right) = \mathbf{x}_{\mathrm{a\,priori}} + \mathbf{A}\left(S\left(\mathbf{x}^b_i\right) - \mathbf{x}_{\mathrm{a\,priori}}\right), \tag{7}$$

where $\mathbf{x}^b_i$ is the $N$-dimensional state vector and $\mathbf{y}^b_i$ is the $p$-dimensional model equivalent of the observation vector. The averaging kernel $\mathbf{A}$ defines the vertical sensitivity profile of the satellite observation. Even though the retrieval $\mathbf{y}^o$ and the model equivalent $\mathbf{y}^b_i$ both depend on the a priori, the use of the kernel removes the dependence of the analysis, or of the relative model-retrieval comparison $(\mathbf{y}^b_i - \mathbf{y}^o)/\mathbf{y}^b_i$, on the retrieval a priori profile (Eskes and Boersma, 2003; Migliorini, 2012).
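A minimal sketch of the averaging-kernel observation operator, assuming a linear interpolation in ln(p) for the spatial operator S (the interpolation actually used in CHASER-DAS is not specified here) and hypothetical function names. The check also illustrates the statement above: for a linear retrieval y_o = x_ap + A(x_true − x_ap), with noise omitted and x_true a hypothetical true profile, the model-minus-retrieval difference reduces to A(S(x) − x_true) and does not depend on the retrieval a priori.

```python
import numpy as np


def make_vertical_interp(p_model, p_ret):
    """S: linear interpolation in ln(p); p_model must be increasing for np.interp."""
    lnp_model = np.log(p_model)
    lnp_ret = np.log(p_ret)
    return lambda x: np.interp(lnp_ret, lnp_model, x)


def obs_operator(x_model, interp, x_apriori, A):
    """Model equivalent of a retrieval: x_apriori + A (S(x) - x_apriori)."""
    return x_apriori + A @ (interp(x_model) - x_apriori)
```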

Observation error
The observation error provided in the retrieval data products includes contributions from smoothing errors, model parameter errors, forward model errors, geophysical noise, and instrument errors. In addition, a representativeness error was added for the OMI NO2 and MOPITT CO observations to account for the difference in spatial resolution between the model and the observations, using a super-observation approach following Miyazaki et al. (2012a). The super-observation error was estimated by assuming an error correlation of 15 % among the individual satellite observations within a model grid cell.
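As a worked illustration of the 15 % error correlation, the sketch below averages the n retrievals in one grid cell and computes the error of that mean assuming a uniform pairwise correlation c between retrieval errors, Var = [(1 − c) Σσi² + c (Σσi)²] / n². This is a generic statistical result rather than necessarily the exact formulation of Miyazaki et al. (2012a), and the representativeness error term is omitted; the function name `superobs` is hypothetical.

```python
import numpy as np


def superobs(values, errors, corr=0.15):
    """Average n retrievals in one grid cell; return (mean, error of the mean).

    Error correlation between any two different retrievals is assumed to be `corr`.
    """
    v = np.asarray(values, dtype=float)
    s = np.asarray(errors, dtype=float)
    n = v.size
    # Var(mean) = (1/n^2) * sum_ij rho_ij s_i s_j, with rho_ii = 1 and rho_ij = corr.
    var = ((1.0 - corr) * np.sum(s**2) + corr * np.sum(s) ** 2) / n**2
    return v.mean(), np.sqrt(var)
```

For equal errors σ the result is σ√((1 − c)/n + c), so the error no longer decreases as 1/√n but approaches σ√c as more retrievals fall into the cell.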

Reanalysis settings
Because a single continuous data assimilation calculation for 8 years requires a long computational time, we parallelised the reanalysis calculation. Eight 1-year calculations, each starting on 1 January of a year in 2005-2012 and preceded by a 2-month spin-up from 1 November of the previous year, were conducted to produce the 8-year reanalysis data set. Each 1-year run was parallelised on 16 processors. The 2-month spin-up removed the differences in the analysis between the separate runs, providing a continuous 8-year data set.
Because of distinct diurnal variations in the tropospheric chemical system, the data assimilation cycle was set to be short (i.e. 120 min) to reduce sampling errors.The emission and concentration fields were analysed and updated at every analysis step.
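The parallel run layout described above can be written down as a schedule. The sketch below (hypothetical function `reanalysis_schedule`) lists, for each target year, the spin-up start, the start of the retained period, and the number of 120-min assimilation cycles per run; it encodes only the dates and cycle length stated in the text.

```python
from datetime import datetime, timedelta

CYCLE = timedelta(minutes=120)  # assimilation cycle length


def reanalysis_schedule(first_year=2005, last_year=2012, spinup_month=11):
    """One independent run per year, each with a spin-up from 1 November of the previous year."""
    runs = []
    for year in range(first_year, last_year + 1):
        spinup_start = datetime(year - 1, spinup_month, 1)
        keep_start = datetime(year, 1, 1)  # analyses before this date are discarded
        end = datetime(year + 1, 1, 1)
        runs.append({
            "year": year,
            "spinup_start": spinup_start,
            "keep_start": keep_start,
            "end": end,
            "n_cycles": (end - spinup_start) // CYCLE,  # timedelta // timedelta -> int
        })
    return runs
```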
In the reanalysis calculation, the ensemble size was set to 30, which is somewhat smaller than the 48 members used in our previous studies. A smaller ensemble size reduces the computational cost but slightly degrades the analysis performance, as quantified in Miyazaki et al. (2012b). The horizontal localisation scale L was set to 450 km for NOx emissions and to 600 km for CO emissions, LNOx sources, and the concentrations. The physical vertical localisation length was set to 0.2 in ln(P1/P2), where P1 and P2 are the pressures of the two levels. These choices are based on sensitivity experiments (Miyazaki et al., 2012b). The influence of an observation was set to zero when the horizontal distance between the observation and the analysis point exceeded 2L × √(10/3) (i.e. a cut-off radius of 2191 km for L = 600 km). We also account for the averaging kernels of the instruments, which capture the vertical sensitivity profiles of the retrievals. The ensemble members and ensemble spread (error covariance) vary from one location to the next and from one species to the next, thereby representing the large number of degrees of freedom contained in the model and the way these are constrained by the observations. The a priori error was set to 40 % for surface emissions of NOx and CO and 60 % for LNOx sources, but no model error term was implemented for emissions during the forecast. To prevent covariance underestimation and maintain emission variability during the long-term reanalysis calculation, we applied covariance inflation to the emission source factors in the analysis step; i.e. model error is represented through a covariance inflation term. The standard deviation was artificially inflated to a predefined minimum value (30 % of the initial standard deviation) at each analysis step. This was found to be important for representing realistic seasonal and interannual variability in the emission estimates, as confirmed by the improved agreement between the predicted concentrations and independent observations when this emission covariance inflation setting is used.
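The localisation and emission-inflation settings above are sketched below. We assume a Gaussian distance weight with standard deviation L, truncated at 2L√(10/3) as in common LETKF implementations, and apply the same form in ln(p) with the 0.2 vertical scale; the exact weighting function used in CHASER-DAS is not stated here. The function `inflate_to_floor` is a hypothetical implementation of the minimum-spread rule: it rescales the scaling-factor perturbations so that their standard deviation never falls below 30 % of the initial value, leaving the ensemble mean unchanged.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km) between points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def cutoff_radius(L):
    """Distance beyond which an observation has no influence: 2 L sqrt(10/3)."""
    return 2.0 * L * np.sqrt(10.0 / 3.0)


def localisation_weight(dist_km, L, dlnp=0.0, Lv=0.2):
    """Assumed Gaussian weight in horizontal distance and ln(P1/P2), zero past the cut-off."""
    w = np.exp(-0.5 * (dist_km / L) ** 2 - 0.5 * (dlnp / Lv) ** 2)
    return np.where(dist_km > cutoff_radius(L), 0.0, w)


def inflate_to_floor(beta_ens, sigma_init, floor_frac=0.3):
    """Keep the spread of emission scaling factors >= floor_frac * sigma_init (last axis = members)."""
    mean = beta_ens.mean(axis=-1, keepdims=True)
    pert = beta_ens - mean
    sigma = pert.std(axis=-1, ddof=1, keepdims=True)
    target = np.maximum(sigma, floor_frac * sigma_init)
    # A collapsed ensemble (sigma == 0) cannot be rescaled, so its scale stays 1.
    scale = np.divide(target, sigma, out=np.ones_like(sigma), where=sigma > 0)
    return mean + pert * scale
```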
In addition to the standard reanalysis run, we conducted a control run for the 8-year period from 2005 to 2012 and several sensitivity calculations for 2005 and 2010 with modified data assimilation settings. The control run was performed without any data assimilation but with the same model settings as the reanalysis run. The settings and results of the sensitivity calculations are presented in Sect. 7.

Assimilated data sets
The assimilated observations were obtained from the OMI, TES, and MLS instruments on the Aura satellite, launched in July 2004, and from MOPITT on the Earth Observing System (EOS) Terra satellite, launched in December 1999.

OMI tropospheric NO2 column
The OMI provides measurements of both direct and atmosphere-backscattered sunlight in the ultraviolet-visible range (Levelt et al., 2006). The reanalysis used tropospheric NO2 column retrievals from the version-2 DOMINO data product (Boersma et al., 2011). The analysis increments in the assimilation of OMI NO2 were limited to adjusting only the surface emissions of NOx, the LNOx sources, and the concentrations of NOy species. Low-quality data were excluded before assimilation following the recommendations of the product specification document (Boersma et al., 2011). Since December 2009, approximately half of the pixels have been compromised by the so-called row anomaly, which reduced the daily coverage of the instrument.

TES O3
The TES O3 data used are the version 5 level 2 nadir data obtained from the global survey mode (Herman and Kulawik, 2013). This data set consists of 16 daily orbits with a spatial resolution of 5-8 km along the orbit track. in the tropics and in the summer hemisphere for cloud-free conditions (Worden et al., 2004). The standard quality flags were used to exclude low-quality data (Herman and Kulawik, 2013). We also excluded data poleward of 72° because of the low retrieval sensitivity. The data assimilation was performed on the logarithm of the mixing ratio, following the retrieval product specification.

MLS O3 and HNO3
The MLS data used are the version 3.3 O3 and HNO3 level 2 products (Livesey et al., 2011). We excluded tropical-cloud-induced outliers, following the recommendations in Livesey et al. (2011). We used data at pressures lower than 215 hPa for O3 and 150 hPa for HNO3 to constrain the LNOx sources and the concentrations of O3 and NOy species. The accuracy and precision of the measurements, described in Livesey et al. (2011), were included as the diagonal elements of the observation error covariance matrix.

MOPITT CO
The MOPITT CO data used are the version 6 level 2 TIR products (Deeter et al., 2013). The MOPITT instrument is mainly sensitive to free-tropospheric CO, especially in the middle troposphere, with degrees of freedom for signal (DOFs) typically much larger than 0.5. We excluded data poleward of 65° and night-time data because of data quality problems (Heald et al., 2004). The data at 700 hPa were used to constrain the surface CO emissions.

Validation data sets
For the comparisons with satellite observations, the model concentrations were interpolated to the retrieval pixels at the overpass time of the satellite while applying the averaging kernel of each retrieval, and both the retrieved and simulated concentrations were then mapped onto a horizontal grid with a resolution of 2.5° × 2.5°. For comparisons with aircraft and ozonesonde observations, the data were binned on a pressure grid with an interval of 30 hPa and mapped with a horizontal resolution of 5.0° × 5.0°, while the model output was interpolated to the time and location of each sample.
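A minimal sketch of the binning step for aircraft and ozonesonde data, assuming bins anchored at 90° S, 0° E, and 0 hPa (the bin origin is not specified in the text) and a hypothetical function name `bin_samples`. The same binning would be applied to the model values interpolated to each sample, so that observed and simulated bin means can be compared directly.

```python
import numpy as np


def bin_samples(lat, lon, pres_hpa, values, dlat=5.0, dlon=5.0, dp=30.0):
    """Average point samples into (lat, lon, pressure) bins; returns {bin index: mean}."""
    ilat = np.floor((np.asarray(lat, dtype=float) + 90.0) / dlat).astype(int)
    ilon = np.floor(np.mod(np.asarray(lon, dtype=float), 360.0) / dlon).astype(int)
    ip = np.floor(np.asarray(pres_hpa, dtype=float) / dp).astype(int)
    sums, counts = {}, {}
    for a, b, c, v in zip(ilat, ilon, ip, values):
        key = (int(a), int(b), int(c))
        sums[key] = sums.get(key, 0.0) + v
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}
```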

GOME-2 and SCIAMACHY NO2
Tropospheric NO2 retrievals were obtained from the TEMIS website (www.temis.nl) and consist of the version 2.3 GOME-2 and SCIAMACHY products (Boersma et al., 2011). The ground pixel size of the GOME-2 retrievals is 80 km × 40 km with global coverage within 1.5 days, whereas that of the SCIAMACHY retrievals is 60 km × 30 km with global coverage approximately once every 6 days. The equatorial overpass times of GOME-2 and SCIAMACHY are 09:30 and 10:00 LT, respectively. Observations with a cloud radiance fraction below 50 % and a quality flag of 0 were used for validation.

MOZAIC/IAGOS aircraft data
Aircraft O3 and CO measurements obtained from the MOZAIC/IAGOS (Measurement of OZone, water vapour, carbon monoxide and nitrogen oxide by AIrbus in-service airCraft/In-service Aircraft for Global Observing System) programmes (Petzold et al., 2013; Zbinden et al., 2013) were used to validate the tropospheric profiles near airports and the upper-tropospheric spatial distributions at a flight altitude of about 12 km in the Northern Hemisphere (NH) and some parts of the tropics. The data are available at www.iagos.fr. The measurements of O3 and CO have an estimated accuracy of ±(2 ppb + 2 %) and ±5 ppb (±5 %), respectively (Zbinden et al., 2013).

HIPPO aircraft data
HIAPER Pole-to-Pole Observation (HIPPO) aircraft measurements provide global information on vertical profiles of various species over the Pacific (Wofsy et al., 2012). Latitudinal and vertical variations in O3 and CO obtained from the five HIPPO campaigns (HIPPO I, 8-30 January 2009; HIPPO II, 31 October to 22 November 2009; HIPPO III, 24 March to 16 April 2010; HIPPO IV, 14 June to 11 July 2011; and HIPPO V, 9 August to 9 September 2011) were used to validate the assimilated profiles. The DC-8 measurements obtained during the INTEX-B campaign over the Gulf of Mexico (Singh et al., 2009) were used for the comparison for March 2006. Data collected over highly polluted areas (over Mexico City and Houston) were removed from the comparison, because they can cause serious errors in representativeness (Hains et al., 2010).

NASA aircraft campaign data
The NASA ARCTAS mission (Jacob et al., 2010) was conducted in two 3-week deployments based in Alaska (April 2008, ARCTAS-A) and western Canada (June-July 2008, ARCTAS-B). During ARCTAS-A, most of the measurements were collected between 60 and 90° N, whereas during ARCTAS-B, the measurements were mainly recorded in the sub-Arctic between 50 and 70° N.
During the NASA DISCOVER-AQ campaign over Baltimore (US) in July 2011, the NASA P-3B aircraft performed extensive profiling of the optical, chemical, and microphysical properties of aerosols (Crumeyrolle et al., 2014).
The Deep Convective Clouds and Chemistry (DC3) experiment field campaign investigated the impact of deep, mid-latitude continental convective clouds, including their dynamical, physical, and lightning processes, on upper-tropospheric composition and chemistry during May and June 2012 (Barth et al., 2015). The observations were conducted in three locations: northeastern Colorado, western Texas to central Oklahoma, and northern Alabama. The observations obtained from the DC-8 (DC3-DC8) and G-V (DC3-GV) aircraft were used.

Ozonesonde data
Ozonesonde observations from the World Ozone and Ultraviolet Radiation Data Center (WOUDC) database (available at http://www.woudc.org) were used to validate the vertical O3 profiles. All available data from the WOUDC database are used for the validation (19 273 profiles in total from 149 stations during 2005-2012). The observation error is 5-10 % between 0 and 30 km (Smit et al., 2007).

WDCGG CO
The CO concentration observations were obtained from the World Data Centre for Greenhouse Gases (WDCGG) operated by the World Meteorological Organization (WMO) Global Atmospheric Watch programme (http://ds.data.jma.go.jp/gmd/wdcgg/).Hourly and event observations from 59 stations were used to validate the surface CO concentrations.

χ2 diagnosis
The long-term stability of the data assimilation performance is important in evaluating the reanalysis. The χ2 test can be used to evaluate the data assimilation balance (e.g. Ménard and Chang, 2000). It is estimated from the ratio of the actual observation minus forecast (OmF: $\mathbf{y}^o - H\bar{\mathbf{x}}^b$) to the sum of the estimated model and observation error covariances in observation space ($\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathrm{T}} + \mathbf{R}$), as follows:

$$\chi^2 = \frac{1}{m}\left(\mathbf{y}^o - H\bar{\mathbf{x}}^b\right)^{\mathrm{T}}\left(\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right)^{-1}\left(\mathbf{y}^o - H\bar{\mathbf{x}}^b\right),$$

where m is the number of observations. χ2 becomes 1 if the background error covariance ($\mathbf{P}^b$) is properly specified to match the observed OmF under the prescribed observation error ($\mathbf{R}$).
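A sketch of the χ2 statistic, assuming that HPbHᵀ is estimated from the ensemble perturbations in observation space, Yb Ybᵀ/(k − 1), as in the LETKF forecast step; the function name `chi2_statistic` and its argument layout are hypothetical. Solving the linear system avoids forming the matrix inverse explicitly.

```python
import numpy as np


def chi2_statistic(omf, Yb, R):
    """chi^2 = d^T (H P^b H^T + R)^{-1} d / m, with H P^b H^T from obs-space perturbations.

    omf : (m,) observation-minus-forecast vector d
    Yb  : (m, k) ensemble perturbations in observation space
    R   : (m, m) observation error covariance
    """
    d = np.asarray(omf, dtype=float)
    m = d.size
    k = Yb.shape[1]
    HPHt = Yb @ Yb.T / (k - 1)  # ensemble estimate of H P^b H^T
    return float(d @ np.linalg.solve(HPHt + R, d)) / m
```

Values above 1 indicate that the OmF is larger than the combined prescribed errors (the situation reported for OMI NO2 after 2010), whereas values below 1 indicate the opposite.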
Figure 1 shows the temporal evolution of the number of assimilated observations (m) and χ2 for each assimilated measurement type. The number of super-observations is shown for OMI NO2 and MOPITT CO. In most cases, the mean values of χ2 are within 50 % of the ideal value of 1, which suggests that the forecast error covariance is reasonably well specified throughout the reanalysis. Note that the covariance inflation factors for the concentrations and emissions were tuned to bring χ2 close to the ideal value based on sensitivity experiments (Miyazaki et al., 2012b). For the OMI NO2 assimilation, χ2 exceeds 1, which indicates overconfidence in the model or an underestimation of the super-observation error (computed as a combination of the measurement error and the representativeness error). The χ2 for OMI NO2 was less sensitive to the choice of the inflation factor than that for the other assimilated measurements. Lower-tropospheric NO2 is controlled by fast chemical reactions that constrain it toward (possibly biased) chemical equilibrium states, which leads to an underestimation of the background error covariance during the forecast. Although the emission analysis introduces spread into the concentration ensemble, the perturbations are present primarily near the surface and tend to be removed in the free troposphere because of the short chemical lifetime of NOx.
Before 2010, the annual mean χ² is roughly constant, which confirms the stable performance of the assimilation. Seasonal and interannual variations in χ², especially after 2010, can be attributed to variations in the coverage and quality of the satellite retrievals as well as to changes in atmospheric conditions (e.g. chemical lifetime and dominant transport type). The increased χ² for OMI NO2 after 2010 is associated with a decrease in the number of assimilated measurements and with changes in the super-observation error. Both the mean measurement error and the representativeness error (a function of the number of OMI observations) are typically larger in 2010-2012 than in 2005-2009; averaged over 30-55° N in January, the mean measurement error and the total super-observation error (the sum of the measurement error and the representativeness error) are about 7 and 9 % larger, respectively, in 2010-2012 than in 2005-2009. After 2010, the excessive χ² indicates an underestimation of the analysis spread, while the increased OmF indicates smaller corrections by the assimilation (cf. Sect. 4.2). To correct the concentrations and emissions using OMI super-observations with larger super-observation errors, the forecast error needs to be further inflated. A technique that adaptively inflates the forecast error covariance for the concentrations and emissions of NO and NO2 is therefore required to better maintain the data assimilation balance throughout the reanalysis.
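The adaptive inflation called for above is not specified in this reanalysis. One widely used, innovation-based formulation estimates a multiplicative factor λ from the same quantities that enter χ²: taking the trace of $E[\mathbf{d}\mathbf{d}^{\mathrm{T}}] = \lambda HP^bH^{\mathrm{T}} + R$, with $\mathbf{d}$ the OmF vector, gives $\lambda \approx (\mathbf{d}^{\mathrm{T}}\mathbf{d} - \mathrm{tr}\,R)/\mathrm{tr}(HP^bH^{\mathrm{T}})$. The sketch below implements that single-step estimate; the function name, the floor at 1, and the absence of temporal smoothing are simplifying assumptions (practical schemes usually smooth the estimate over successive cycles).

```python
import numpy as np

def adaptive_inflation(omf, hpht, r, floor=1.0):
    """Single-step innovation-based estimate of a covariance inflation factor.

    Uses trace(E[d d^T]) = lam * trace(H P^b H^T) + trace(R), d = OmF vector.
    The estimate is not allowed to fall below `floor` (no deflation by default).
    """
    d = np.asarray(omf, dtype=float)
    tr_r = np.trace(np.asarray(r, dtype=float))
    tr_hpht = np.trace(np.asarray(hpht, dtype=float))
    lam = (d @ d - tr_r) / tr_hpht
    return max(float(lam), floor)


# Hypothetical two-observation case: innovations larger than the assumed spread
print(adaptive_inflation([1.0, 3.0], 1.5 * np.eye(2), 2.0 * np.eye(2)))
```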

OmF
OmF statistics are computed in observation space to investigate the structure of model-observation differences and to measure improvements in the reanalysis (Fig. 2). Model biases, as measured by the OmF in the control run, persist throughout the reanalysis period and vary considerably with season. The figure shows an underestimation (i.e. positive OmF) of tropospheric NO2 columns compared with the OMI NO2 data from the Southern Hemisphere (SH) subtropics to the NH mid-latitudes, an underestimation of tropospheric CO compared with MOPITT CO data in the NH, an overestimation (i.e. negative OmF) of middle- and upper-tropospheric O3 in the extratropics compared with TES and MLS O3 data, and an underestimation of middle-tropospheric O3 in the tropics compared with TES. The underestimation of tropospheric CO by CHASER is very similar to that found in most other chemistry-transport models (CTMs) (Shindell et al., 2006).
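For reference, the OmF bias and RMSE used in this section can be computed as below. This is a sketch that assumes the observations and the forecast have already been mapped to the same observation space (including any averaging kernels) and that statistics are averaged within a latitude band; the function and argument names are illustrative.

```python
import numpy as np

def omf_stats(y_obs, hx_forecast, lat=None, band=None):
    """Return (bias, rmse) of observation minus forecast (OmF).

    A positive bias means the model underestimates the observations.
    If band = (lat_min, lat_max) is given, only observations with
    lat_min <= lat < lat_max are used.
    """
    d = np.asarray(y_obs, dtype=float) - np.asarray(hx_forecast, dtype=float)
    if band is not None:
        lat = np.asarray(lat, dtype=float)
        d = d[(lat >= band[0]) & (lat < band[1])]
    if d.size == 0:
        raise ValueError("no observations in the selected band")
    bias = float(d.mean())
    rmse = float(np.sqrt(np.mean(d ** 2)))
    return bias, rmse


# Hypothetical values; the third observation lies outside the 30-55 N band
print(omf_stats([3.0, 5.0, 10.0], [1.0, 5.0, 0.0], lat=[40.0, 50.0, -10.0], band=(30.0, 55.0)))
```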
After 2010, the positive OmF for MOPITT CO in the control run decreases in the NH, and the positive OmF for OMI NO2 increases at the NH mid-latitudes. As the quality of these retrievals is considered constant over the reanalysis period (e.g. Worden et al., 2013), the interannual variations in OmF can probably be attributed to long-term changes in the model bias. The anthropogenic emission inventories for 2008 were used in the model simulation for 2009-2012, which could be partly responsible for the absence of a concentration trend in the model.
In the reanalysis run, the OmF bias and root-mean-square error (RMSE) for MLS O3 become nearly zero globally because of the assimilation. The systematic reductions of the OmF confirm that model errors are continuously corrected by the assimilation. The remaining error is almost equal to the mean observational error. The OmF reduction is relatively smaller for MLS HNO3 than for MLS O3 because of the larger observational errors.
The mean OmF bias against TES O3 data in the middle troposphere is almost completely removed by the assimilation, and the mean OmF RMSE is reduced by about 40 % in the SH extratropics and by up to 15 % from the tropics to the NH. The error reduction is weaker in the lower troposphere (figure not shown) because of the reduced sensitivity of the TES retrievals to lower-tropospheric O3. The analysed OmF becomes larger after 2010, corresponding to the decreased number of assimilated measurements.
Data assimilation removes most of the OmF bias against MOPITT CO data, with a mean bias (RMSE) reduction of about 85 % (60 %) in the NH extratropics and about 80 % (30 %) in the tropics. The annual mean OmF becomes almost constant throughout the reanalysis, suggesting that the a posteriori emissions realistically represent the interannual variations.
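Reductions such as "85 % (60 %)" compare a statistic from the reanalysis with the same statistic from the control run. A minimal sketch follows, assuming the reduction is expressed relative to the magnitude of the control-run value (the exact convention is not stated in the text); the numbers in the example are hypothetical, not values from this reanalysis.

```python
def percent_reduction(control, reanalysis):
    """Relative reduction (%) of |reanalysis| with respect to |control|."""
    control, reanalysis = abs(control), abs(reanalysis)
    if control == 0.0:
        raise ValueError("control-run statistic is zero")
    return 100.0 * (control - reanalysis) / control


# Hypothetical OmF statistics in ppb: bias 20 -> 3, RMSE 30 -> 12
print(percent_reduction(20.0, 3.0))
print(percent_reduction(30.0, 12.0))
```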
The mean OmF bias against OMI NO2 is reduced by about 30-60 % at the NH mid-latitudes and about 50-60 % in the tropics. The remaining errors could be associated with the short chemical lifetime of NOx in the boundary layer compared with the OMI revisit time of roughly 1 day, biases in the simulated chemical equilibrium state, and the underestimation of the emission spread. The OmF is relatively larger in 2010-2012 than in other years, corresponding to the reduction by about half in the number of OMI NO2 observations. The number of assimilated measurements is important for reducing model errors, even when global coverage is provided. The mean observation-minus-analysis (OmA) bias is about 10-15 %; compared with the mean OmF in the reanalysis, it is smaller at the NH mid-latitudes and almost the same in the tropics and SH (figure not shown).

Fig. 2 caption (partial): The first row is the OmF for OMI NO2 data (in 10¹⁵ molec cm⁻²), the second row is for TES O3 data between 500 and 300 hPa (in ppb), the third row is for MOPITT CO data between 700 and 500 hPa (in ppb), the fourth row is for MLS O3 data between 216 and 100 hPa (in ppm), and the fifth row is for MLS HNO3 data between 150 and 80 hPa (in ppb). A super-observation approach is employed for the OMI and MOPITT measurements, whereas individual observations are used in the analysis of the others.

Analysis increment
The analysis increment, estimated from the difference between the forecast and the analysis in the reanalysis run, is a measure of the adjustment made in the analysis step. The analysis increment for O3 is mostly positive at 700 hPa and negative at 400 hPa at mid-latitudes (Fig. 3). The positive (negative) increments imply that the short-term model forecast underestimates (overestimates) the O3 concentrations. As the increments are introduced by the TES assimilation, these vertical structures suggest that the tropospheric TES O3 data contain independent information on lower- and upper-tropospheric O3. Jourdain et al. (2007) showed that the TES retrievals have 1-2 DOFs in the troposphere.

The mean analysis increment for NO2 varies greatly in space and time in the troposphere (not shown). For some regions with strong surface emissions, especially at the NH mid-latitudes, the NO2 increment becomes negative in the free troposphere because of the assimilation of non-NO2 measurements, compensating for the tropospheric NO2 column changes caused by the (positive) surface emission adjustment. This demonstrates that simultaneous data assimilation provides independent constraints on the surface emissions and on the free-tropospheric NO2 concentration, because it uses observations of multiple species with different measurement sensitivities. Large adjustments are introduced to the NO2 concentration in the upper troposphere-lower stratosphere (UTLS), because the MLS O3 and HNO3 assimilation effectively corrects the model NO2 bias through the inter-species correlations in the error covariance matrix.
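The transfer of observational information from one species to another through the error covariance can be shown with a two-variable example. The sketch below uses a plain ensemble Kalman mean update with the gain $K = P^bH^{\mathrm{T}}(HP^bH^{\mathrm{T}} + R)^{-1}$ built from the ensemble sample covariance; it is not the specific ensemble Kalman filter implementation used in the reanalysis, and the state (one O3 and one NO2 value), the ensemble, and the observation are hypothetical.

```python
import numpy as np

def enkf_mean_update(ensemble, h, y_obs, r):
    """Return the analysis mean and increment for a linear observation operator.

    ensemble : (k, n) forecast ensemble (k members, n state variables)
    h        : (p, n) observation operator
    y_obs    : (p,) observations
    r        : (p, p) observation error covariance
    """
    ens = np.asarray(ensemble, dtype=float)
    h = np.asarray(h, dtype=float)
    xb = ens.mean(axis=0)
    xp = ens - xb
    pb = xp.T @ xp / (ens.shape[0] - 1)            # sample background covariance
    s = h @ pb @ h.T + np.asarray(r, dtype=float)  # innovation covariance
    gain = np.linalg.solve(s, h @ pb).T            # K = P H^T S^-1 (P and S symmetric)
    increment = gain @ (np.asarray(y_obs, dtype=float) - h @ xb)
    return xb + increment, increment


# Hypothetical state [O3, NO2] with positively correlated ensemble errors
ens = np.array([[40.0, 1.0], [50.0, 2.0], [60.0, 3.0]])
h = np.array([[1.0, 0.0]])  # only O3 is observed
xa, inc = enkf_mean_update(ens, h, [60.0], [[100.0]])
print(inc)  # NO2 receives an increment although it is not observed
```

If the ensemble correlation between the two species were negative, the same positive O3 innovation would produce a negative NO2 increment, which is the kind of cross-species adjustment described above.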

Ozonesonde
The validation of the reanalysis and control run against global ozonesonde observations is summarised in Table 1. As depicted in Figs. 4 and 5, the CHASER simulation reproduces the main observed features of the global O3 distribution in the troposphere and lower stratosphere. However, there are systematic differences, such as a negative bias in the NH high-latitude troposphere and a positive bias from the middle troposphere to the lower stratosphere in the SH.
The reanalysis shows improved agreement with the ozonesonde observations. The mean negative bias at NH high latitudes is reduced in the troposphere. At the NH mid-latitudes, the model's positive bias in the UTLS and negative bias in the lower troposphere are mostly removed. The large reduction of the mean lower-tropospheric bias at the NH mid-latitudes is attributed primarily to increased O3 concentrations in boreal spring-summer (Fig. 5). The RMSEs with respect to the ozonesonde observations are also reduced throughout the troposphere. The remaining errors, especially near the surface, are associated with low retrieval sensitivity in the lower troposphere and with differences in spatial representativeness between the model and the observations.
In the tropics, the data assimilation generally increases the O3 concentration, reducing the negative bias in the upper troposphere but increasing the positive bias in the lower troposphere. The increased positive bias could be attributed to the positive bias in the TES measurements (Sect. 7.2).
In the SH, the model's positive bias from the middle troposphere to the lower stratosphere is attributed largely to a positive bias in the prescribed O3 concentrations above 70 hPa in CHASER, which is mostly removed in the reanalysis.

The observed seasonal and interannual variations are captured well in the reanalysis. The observed tropospheric O3 concentration varies from year to year during the reanalysis period (Fig. 5). As summarised in Table 2, the reanalysis agrees better with the observed linear slope in most cases. The observed linear slope during the reanalysis period is positive (+2.9 ± 2.8 ppb (8 years)⁻¹) at the NH mid-latitudes between 850 and 500 hPa, but the significance of this trend is not very high. The slope over the 8-year period in the same region is also positive in the reanalysis (+1.2 ± 2.1 ppb (8 years)⁻¹), whereas it is negative in the control run (−1.2 ± 2.1 ppb (8 years)⁻¹). At the NH mid-latitudes in the lower stratosphere (200-90 hPa), the observed slope is negative (−17.7 ± 41.9 ppb (8 years)⁻¹), and the reanalysis (−25.7 ± 38.8 ppb (8 years)⁻¹) agrees better with it than the control run (−35.8 ± 46.3 ppb (8 years)⁻¹). The seasonal and year-to-year variations are generally well reproduced by the control run in the NH troposphere (r = 0.73-0.93), and the reanalysis further improves the temporal correlation by 0.07 between 850 and 500 hPa and by 0.04 between 500 and 200 hPa at the NH mid-latitudes.
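Slopes such as "+2.9 ± 2.8 ppb (8 years)⁻¹" are linear trends over the reanalysis period. The following ordinary-least-squares sketch returns the slope and its standard error, both scaled to an 8-year period. The text does not state whether the ± values are one standard error or a confidence interval, or whether the series are deseasonalised first, so those choices (and the function name) are assumptions.

```python
import numpy as np

def linear_slope(t_years, values, period_years=8.0):
    """OLS slope and its standard error, both expressed per `period_years`."""
    t = np.asarray(t_years, dtype=float)
    v = np.asarray(values, dtype=float)
    if t.size < 3:
        raise ValueError("need at least three points")
    tc = t - t.mean()
    sxx = np.sum(tc ** 2)
    slope = np.sum(tc * (v - v.mean())) / sxx
    resid = v - (v.mean() + slope * tc)
    se = np.sqrt(np.sum(resid ** 2) / (t.size - 2) / sxx)
    return float(slope * period_years), float(se * period_years)


# Hypothetical monthly series, 2005-2012, with a prescribed trend of 0.5 ppb per year
t = 2005.0 + (np.arange(96) + 0.5) / 12.0
slope, se = linear_slope(t, 40.0 + 0.5 * (t - 2005.0))
print(slope, se)
```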
The observed time series show obvious year-to-year variations in the tropics associated with modes of variability such as the El Niño-Southern Oscillation (ENSO), including its influence on biomass-burning activity. The tropical O3 variations are better represented in the reanalysis (r = 0.80 between 850 and 500 hPa and r = 0.72 between 500 and 200 hPa) than in the control run (r = 0.74 and r = 0.59). In the tropics and SH, the annual and zonal mean O3 concentration does not show clear linear trends during the reanalysis period in either the observations or the reanalysis. However, local O3 concentrations might have significant trends. For instance, Thompson et al. (2014) showed wintertime free-tropospheric O3 increases over Irene and Réunion, probably due to long-range transport of growing pollution in the SH. Further analyses will be required to investigate the detailed characteristics of O3 variations.
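The temporal correlation coefficients (r) quoted for the control run and the reanalysis are Pearson correlations between observed and simulated time series. Sonde records often have missing months, so the sketch below drops pairs in which either value is missing; that handling, and the function name, are assumptions rather than the documented procedure.

```python
import numpy as np

def temporal_correlation(obs, model):
    """Pearson r between two time series, ignoring non-finite pairs."""
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    ok = np.isfinite(obs) & np.isfinite(model)
    if ok.sum() < 3:
        raise ValueError("need at least three valid pairs")
    return float(np.corrcoef(obs[ok], model[ok])[0, 1])


# Hypothetical monthly means with one missing observation
print(temporal_correlation([31.0, 35.0, np.nan, 42.0, 38.0], [30.0, 33.0, 36.0, 40.0, 39.0]))
```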
The ozonesonde-minus-analysis difference is slightly larger in 2010-2012 than in 2005-2009 (Table 3 and Fig. 6). The large positive bias throughout the troposphere in winter and the negative bias below 500 hPa in spring-autumn remain in 2010-2012 (Fig. 6). This is associated with the decreased number of assimilated measurements (TES and OMI), as discussed further in Sect. 7.3. In contrast, during 2005-2009 the mean O3 bias in the reanalysis does not change significantly from year to year, which confirms the stable performance of the O3 reanalysis field. Verstraeten et al. (2013) highlighted that the TES-sonde O3 biases do not change over time, which suggests that TES is an appropriate instrument for long-term analysis of free-tropospheric O3.

Aircraft
Both the model and the reanalysis generally capture well the observed horizontal, vertical, and seasonal variations in O3 concentration compared with the MOZAIC/IAGOS aircraft measurements (Figs. 7 and 8). However, in the middle and upper troposphere (between 850 and 300 hPa in Table 1), the model mostly overestimates the O3 concentration from the northern tropics to the mid-latitudes and underestimates it at NH high latitudes, consistent with the comparison with ozonesonde observations. Although the improvement is not large in the upper troposphere (500-300 hPa, Fig. 7), the reanalysis run agrees better with the MOZAIC/IAGOS measurements in the middle troposphere (850-500 hPa) and at the aircraft cruising altitude (300-200 hPa), as summarised in Table 1. Most of the model's negative bias in the troposphere at NH high latitudes is reduced throughout the reanalysis period. A substantial improvement is observed at the aircraft cruising altitude around the tropopause (between 300 and 200 hPa) at NH high latitudes, where the mean positive bias is reduced from +8 % in the control run to +3 % in the reanalysis. By assimilating individual measurement types separately in observing system experiments (OSEs), we confirmed that this improvement is mainly attributable to the MLS assimilation (not shown).
From the NH subtropics to the mid-latitudes, the model's mean positive bias at the aircraft cruising altitude (300-200 hPa) is reduced, whereas the positive bias for the low concentrations in autumn-winter in the middle troposphere (850-500 hPa) is increased. In the tropics, the MOZAIC/IAGOS measurements were mostly collected near large biomass-burning areas (Fig. 7; e.g. Central Africa and Southeast Asia), where the tropospheric O3 concentration becomes too high in the reanalysis, probably because of a positive bias in the TES O3 observations (cf. Sect. 7.2). Note that the improvements relative to the aircraft measurements are more substantial in 2005-2009 than in the later years.
HIPPO measurements provide information on the vertical O3 profiles over the Pacific. The observed tropospheric O3 concentration is higher in the extratropics than in the tropics, and higher in the NH than in the SH (Fig. 9). The observed tropospheric O3 concentration displays a maximum in the NH subtropics in March (HIPPO3) because of the strong influence of stratospheric inflow along the westerly jet stream. The observed latitude-height distributions are generally captured well by both the model and the reanalysis for all the HIPPO campaigns.
The model shows negative biases in the NH extratropics and positive biases from the tropics to the SH compared with the HIPPO measurements (Table 1). These bias characteristics are also found in the comparisons with global ozonesonde observations in this study (cf. Sect. 5.1.1) and are reduced effectively in the reanalysis. A considerable bias reduction can be found in lower- and middle-tropospheric O3 at the NH mid-latitudes, where O3 variations could be influenced by long-range transport from the Eurasian continent. Direct concentration adjustments by TES measurements in the troposphere and by MLS measurements in the UTLS played important roles in correcting the tropospheric O3 profiles. In addition, the corrections made by OMI to the O3 precursor emissions over the Eurasian continent, especially over East Asia, were important in influencing the tropospheric O3 concentration over the North Pacific around 35-60° N, especially in boreal spring. This demonstrates that the assimilation of multiple-species data sets is a powerful means of correcting global tropospheric O3 profiles, including those over remote oceans. In contrast, the positive bias in the tropics is further increased in the reanalysis (between 850 and 500 hPa, and from +10 to +15 % between 500 and 300 hPa), as also found in the comparisons against the MOZAIC/IAGOS and ozonesonde measurements (cf. Sects. 5.1.1 and 5.1.2).

Vertical profiles obtained during the NASA aircraft campaigns were also used to validate the O3 profiles (Fig. 10). The comparisons show improved agreement in the reanalysis in the middle and upper troposphere during INTEX-B over Mexico and during the ARCTAS campaign over the Arctic, but the model's positive bias near the surface is further increased for the INTEX-B profile. For the DISCOVER-AQ profile, the model's negative bias in the free troposphere is mostly removed in the reanalysis. For the DC3 profiles, the model captures the observed tropospheric O3 profiles well, whereas the assimilation leads to small overestimations.

Surface
Surface CO concentrations are compared with the WDCGG surface observations from 59 stations, as summarised in Table 4 and depicted for 12 selected stations in Fig. 11. The control run underestimates CO concentrations by up to about 60 ppb in the NH extratropics, with the largest negative bias in winter and the smallest in summer. This underestimation is common to most CTMs (Shindell et al., 2006; Kopacz et al., 2010; Fortems-Cheiney et al., 2011; Stein et al., 2014). The model's negative bias is also found at most tropical sites, but not in the SH.
Most of the negative bias in the NH extratropics and in the tropics is removed in the reanalysis run because of the increased surface CO emissions in the analysis (cf. Sect. 6). The MOPITT assimilation dominates the reduction of the negative bias through the optimisation of surface CO emissions, whereas the assimilation of the other data has only a small influence on the CO concentration analysis through changes in the OH field. The annual and regional mean surface bias becomes positive after assimilation at NH mid- and high latitudes, as illustrated at locations such as Midway and Bermuda (32° N, 65° W; figure not shown). The observed negative trends at most NH sites are captured well in the reanalysis.
Tropical CO concentrations show distinct interannual variations associated with variations in tropical biomass-burning activity and meteorological conditions. In the tropics, the temporal correlations with the observations at Christmas Island and Barbados are about 0.1-0.2 higher in the reanalysis than in the control run.
In the SH, the model generally agrees well with the surface observations. However, the assimilation increases the CO concentration and leads to overestimations in some places (e.g. Showa). The mean bias at the SH mid-latitudes changed from −10 % in the control run to +7 % in the reanalysis.

Aircraft
The model underestimates CO concentrations in the tropics and the NH compared with the MOZAIC/IAGOS aircraft measurements throughout the troposphere (below 300 hPa) and around the tropopause at the aircraft cruising altitude (between 300 and 200 hPa), as depicted in Fig. 12. The model's negative bias is mostly removed in the reanalysis, with a mean improvement of 50-90 % throughout the troposphere, as summarised in Table 4. This confirms that the constraints on the surface emissions propagate well into the concentrations throughout the troposphere, with a delay in peak timing and a decay in amplitude. Note that the CO concentrations were not directly adjusted in the data assimilation. The spatial distribution in the upper troposphere is also captured well in the reanalysis (Fig. 7). Despite the overall improvement, the low concentrations in the NH lower and middle troposphere in summer and autumn remain underestimated, whereas the analysed concentration becomes too high at NH high latitudes at the aircraft cruising altitude (Fig. 12). A decreasing trend is observed in both the lower and upper troposphere in the NH, which is The analysis and the comparison with the independent observations show that this caused unrealistic interannual CO variations and an underestimate of the decreasing trend in the control run.
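The delayed and damped response of concentrations to emission changes can be illustrated with an idealised one-box model, dC/dt = E(t) − C/τ, driven by a sinusoidal emission of angular frequency ω. Its periodic solution lags the forcing by arctan(ωτ)/ω, and its relative amplitude is reduced by the factor 1/√(1 + (ωτ)²). This is only a textbook illustration under assumed values (a 60-day CO lifetime and an annual emission cycle) and ignores transport; it is not a description of the CHASER chemistry.

```python
import math

def box_response(lifetime_days, period_days):
    """Amplitude ratio and lag (days) of a first-order one-box model.

    dC/dt = E0 * (1 + a sin(w t)) - C / tau has the periodic solution
    C = tau * E0 * (1 + a * g * sin(w t - phi)), with
    g = 1 / sqrt(1 + (w tau)^2) and phi = atan(w tau).  Returns (g, phi / w).
    """
    w = 2.0 * math.pi / period_days
    wt = w * lifetime_days
    damping = 1.0 / math.sqrt(1.0 + wt ** 2)
    lag_days = math.atan(wt) / w
    return damping, lag_days


# Assumed 60-day lifetime and annual emission cycle
g, lag = box_response(60.0, 365.0)
print(f"relative amplitude x{g:.2f}, lag {lag:.0f} days")
```

For ωτ ≪ 1 the concentration follows the emissions with a lag close to τ and little damping; for longer lifetimes or faster forcing the lag approaches a quarter period and the amplitude shrinks.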
The distinct interannual variations in the tropics (over Southeast Asia and around Central and North Africa) observed from the MOZAIC/IAGOS aircraft measurements mainly reflect variations in biomass-burning emissions.The temporal variations of CO are captured better by the reanalysis between 850 and 500 hPa (r = 0.67 in the control run and 0.78 in the reanalysis).
The HIPPO observations exhibit large latitudinal CO gradients around 15-25° N over the Pacific in all campaigns (Fig. 13). Tropospheric air in the tropics can be distinguished from that in the extratropics because of the transport barrier around the subtropical jet (Bowman and Carrie, 2002; Miyazaki et al., 2008). This transport barrier produces the large CO gradient in the subtropics and acts to accumulate high levels of CO in the NH extratropics. In the SH, CO concentration increases with height in the free troposphere because of the strong poleward transport in the upper troposphere from the tropics to SH high latitudes. The assimilation increases the CO concentration and reduces the model's mean negative bias relative to the HIPPO measurements by about 60-80 % in the NH extratropics. The remaining negative bias could be attributed to overly strong chemical destruction while air is transported from the Eurasian continent to the HIPPO locations over the central Pacific. For instance, the negative bias of the surface CO concentration is mostly removed in the reanalysis at Yonaguni, which is located near (downwind of) large Chinese emission sources (Fig. 11). This suggests that the emission sources are realistically represented in the reanalysis. Errors in stratospheric CO might also cause the negative bias through stratosphere-troposphere exchange (STE).
Reductions in the model's negative bias in tropospheric CO can be found in comparisons against the NASA aircraft campaign profiles from INTEX-B, ARCTAS-A, and DC3 (Fig. 10), although the bias reduction is small for the ARCTAS-B profile. Bian et al. (2013) demonstrated that most of the enhanced CO concentrations observed during ARCTAS-A originate from Asian anthropogenic emissions. This suggests that the reanalysis realistically represents the Asian anthropogenic emissions and their influence on western Arctic CO levels. Bian et al. (2013) also suggested a lower fraction of CO from Asian anthropogenic emissions during ARCTAS-B than during ARCTAS-A and showed that the along-track measurements are not representative of the concentrations within the large domain of the western Arctic during ARCTAS-B, which may explain the small bias reduction for the ARCTAS-B profile in our comparison. MOPITT data are assimilated only equatorward of 65°, and only the CO emissions are optimised in the reanalysis. Direct adjustment of CO concentrations using high-latitude retrievals could be expected to improve the representation of CO in the ARCTAS profiles, as demonstrated by Klonecki et al. (2012) using IASI measurements.

Tropospheric column
Compared with the satellite retrievals, the model generally underestimates the NO2 concentration over most industrial areas (e.g. East China, Europe, the eastern USA, and South Africa) and over large biomass-burning areas (e.g. Central Africa), as shown in Fig. 14. The model underestimation is common to the comparisons against all three retrievals. The three products are produced using the same retrieval approach (Boersma et al., 2011); therefore, the differences between these retrievals are dominated by the overpass time difference and by diurnal variations in chemical processes and emissions. The negative bias over these regions is greatly reduced in the reanalysis, which decreases the 8-year global mean negative bias by about 65, 45, and 30 % relative to OMI, SCIAMACHY, and GOME-2, respectively (Table 5).
The improvement can also be seen in the spatial correlation, which increases by 0.03-0.05, and in the RMSE, which is reduced by 15-30 %. Over East China, the model's negative bias is large in winter, and the assimilation reduces the wintertime bias by about 40 % relative to the OMI retrievals. The observed low concentration in 2009 and high concentrations in 2010-2012 are captured in the reanalysis, whereas the control run mostly fails to reproduce the interannual variability. The reanalysis shows larger positive trends than the control run, but the observed trend is even higher. The underestimation of the mean concentration and of the positive trend remains large in the reanalysis, especially compared with the SCIAMACHY and GOME-2 retrievals. Note that over polluted areas, realistic NO2 concentration pathways do not follow simple linear trends but reflect the combined effects of environmental policies and economic activity. For instance, NOx emissions in China have been increasing because of rapid economic growth, although an economic slowdown affected the growth rate in 2009 (Gu et al., 2013). Over Europe, the model's negative bias in summer is reduced by about 10-30 % in the reanalysis. The observed wintertime concentration is high in 2011-2012 and relatively low in 2010 because of the global economic recession and emission controls (Castellanos and Boersma, 2012). The assimilation increases the wintertime NO2 concentration in 2011-2012 and better captures the observed interannual variations.
Over the eastern USA, the observed NO2 concentration is high in 2005-2007 and low after 2008. The control run fails to reproduce these variations. In the reanalysis run, the model's negative bias relative to the OMI retrievals is reduced in 2005-2007, yielding a negative trend over the reanalysis period. The improvement is smaller for the SCIAMACHY and GOME-2 retrievals.
Despite the general improvement, the reanalysis still has large negative biases relative to the satellite retrievals over the polluted regions. There may be several reasons for the remaining underestimation of NO2 concentrations. Part of the analysis increment can be lost during the forecast because of the short lifetime of NOx (Miyazaki and Eskes, 2013), especially when concentrations are adjusted. Other model processes, such as the diurnal cycle, boundary layer mixing and venting, and the chemical equilibrium at overpass time, may not be described well. Also, the averaging kernels show relatively small sensitivity close to the surface, resulting in relatively small adjustments in the assimilation. The remaining bias varies considerably with season (e.g. it is mostly absent during summer over East China and the eastern USA), and the eight 1-year calculations were conducted separately. Therefore, the remaining underestimation of NO2 concentrations did not cause (spurious) gradual intra-annual or year-to-year increases in the estimated surface NOx emissions during the reanalysis period (cf. Sect. 6.1). The larger discrepancies with respect to the SCIAMACHY and GOME-2 retrievals may be attributed to errors in the simulated diurnal NO2 variations and to biases between OMI and these retrievals. Both the emission factors and the tropospheric concentrations of NOx are constrained primarily in the early afternoon by OMI, whereas no direct observational constraint on tropospheric NOx is available in the morning (i.e. during the SCIAMACHY and GOME-2 overpass times).

Over North and Central Africa, the data assimilation removes most of the negative bias throughout the year because of the increased biomass-burning emissions. The remaining negative bias in the reanalysis is relatively large compared with GOME-2 over North Africa and with SCIAMACHY and GOME-2 over Central Africa. The observed concentration is relatively low in 2010-2012 over North Africa, and the reanalysis captures the observed interannual variations better than the control run.
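The low near-surface sensitivity of the averaging kernels discussed above for the remaining NO2 bias can be made concrete. Assuming the retrieval supplies a tropospheric column averaging kernel value a_l for each model layer l, the model column comparable with the retrieval is Σ_l a_l x_l, where x_l is the model partial column, so a change in layer l moves the compared column by a_l times that change. The kernel values and partial columns below are hypothetical, and real products additionally require vertical interpolation of the kernel to the model grid and product-specific conventions, which are omitted.

```python
import numpy as np

def kernel_weighted_column(partial_columns, column_ak):
    """Model tropospheric column as seen by the retrieval: sum_l a_l * x_l."""
    x = np.asarray(partial_columns, dtype=float)
    a = np.asarray(column_ak, dtype=float)
    if x.shape != a.shape:
        raise ValueError("kernel and partial columns must be on the same layers")
    return float(np.sum(a * x))


# Hypothetical NO2 partial columns (10^15 molec cm^-2), surface layer first,
# and a kernel with reduced sensitivity near the surface
x = np.array([2.0, 1.0, 0.5])
a = np.array([0.4, 0.9, 1.1])
base = kernel_weighted_column(x, a)
surface_bump = kernel_weighted_column(x + [1.0, 0.0, 0.0], a) - base
free_trop_bump = kernel_weighted_column(x + [0.0, 0.0, 1.0], a) - base
print(surface_bump, free_trop_bump)  # the same change is "seen" less near the surface
```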
The control run fails to reproduce the observed distinct seasonal and interannual variations over Southeast Asia (r = 0.74-0.79 in the control run and r = 0.89-0.98 in the reanalysis compared with the three retrievals).The control run underestimates the concentration throughout the year with the largest biases in boreal spring in 2008-2009.The negative bias is greatly reduced in the reanalysis throughout the year, and the interannual variations are represented realistically.The remaining negative bias is large, especially when compared with the GOME-2 retrievals.

Aircraft
Compared with the vertical NO2 profiles from the aircraft measurements, the simulated NO2 concentration in the troposphere is generally too low (Fig. 10). For the ARCTAS profiles, the data assimilation has less impact in the troposphere: at high latitudes, the surface NOx emissions have only a small effect on the tropospheric NO2 profiles, and the observational error of the OMI measurements is large in comparison with the observed low concentrations. Compared with the two DC3 profiles, the modelled NO2 is too high in the lower troposphere and too low in the middle/upper troposphere. Data assimilation further increases the positive bias in the lower troposphere. The relatively coarse resolution of the model could cause large differences near the surface in comparisons at urban sites such as those of the DC3 profiles. Compared with the DISCOVER-AQ profile, the rapid change in NO2 concentration in the lower troposphere is captured well by both the model and the reanalysis. The MLS O3 and HNO3 data assimilation effectively corrects the amount of NO2 in the lower stratosphere, especially for the ARCTAS-A profile, through the use of the inter-species correlations in the analysis step and by influencing the NOx/NOy species in the forecast step.

Other reactive species
The main observed features of the HNO3 profiles are captured by both the control and reanalysis runs. The increase in HNO3 toward the surface is driven mainly by oxidation of NOx in polluted areas, which is visible in the INTEX-B, ARCTAS-B, DC3-DC8, DC3-GV, and DISCOVER-AQ profiles. The positive corrections by the assimilation, primarily attributable to the increased NO2 concentrations and NOx emissions, reduce the model's underestimation for the DC3-GV profile but lead to concentrations that are too high for the INTEX-B, DC3-DC8, DC3-GV, and DISCOVER-AQ profiles. The assimilation only slightly influences the tropospheric HNO3 concentration for the ARCTAS profiles because of the negligible impact of surface NOx emissions at NH high latitudes and the absence of tropospheric HNO3 measurements. To further improve the lower-tropospheric HNO3 concentrations, corrections to its removal processes, including deposition, might be important.
In the middle and upper troposphere, both the control and reanalysis runs generally underestimate the HNO3 concentration. The assimilation partly reduces the negative bias for the DC3 profiles. Additional positive increments of NO2 appear to be required to compensate for the negative bias in HNO3. In the UTLS, the model's negative HNO3 bias is reduced globally in the reanalysis because of the MLS assimilation. For the ARCTAS profiles, Liang et al. (2011) and Wespes et al. (2012) found that an adequate representation of stratospheric NOy inputs is important for accurately simulating tropospheric Arctic O3 and NOx at pressures < 400 hPa. The vertical HO2 profile mainly reflects variations in tropospheric water vapour concentrations, which decrease with latitude. The control run overestimates the tropospheric HO2 concentration for the INTEX-B and ARCTAS-A profiles but underestimates it for the ARCTAS-B, DC3-DC8, and DC3-GV profiles. The reanalysis generally increases HO2 concentrations, while it decreases OH concentrations. and atmospheric transports, which are hardly optimised by the currently available measurements.

Estimated emissions
In previous publications (Miyazaki and Eskes, 2013; Miyazaki et al., 2014), we demonstrated that the simultaneous analysis of chemical concentrations and emissions improves the estimates of surface NOx emissions and LNOx sources, with differences of up to 58 % in regional surface NOx emissions. The analysis increment produced directly in the chemical concentrations plays an important role in reducing the model-observation mismatches that arise from model errors other than those related to emissions. Here we describe the estimated emissions briefly. More detailed analyses of the 8-year variations in the estimated emission sources will be presented in a separate paper.

Surface NOx emissions
Figures 15 and 16 show, respectively, the time series and global distributions of the analysed emission sources obtained during the reanalysis period. The data assimilation increases the 8-year mean of global total surface NOx emissions from 38.4 to 42.2 Tg N. This increase of approximately 10 % in global total emissions is attributable to an increase of approximately 7 % in the NH (20-90° N) and of 14 % in the tropics (20° S-20° N). The large increase in the NH emissions is associated with positive corrections over industrial areas such as China and India, and with corrections in Europe and the USA. Meanwhile, the increased emissions over Central Africa indicate larger emissions from biomass burning than given by the inventories. These adjustments are consistent with our previous estimates for 2007 (Miyazaki and Eskes, 2013). The seasonal and interannual variability is also modified considerably in many regions. The emission inventories have considerable uncertainties in representing seasonal and interannual emission variability, associated with uncertain input information such as economic conditions, biomass-burning activity, and emission factors (e.g. Jaeglé et al., 2005; Xiao et al., 2010; Reuter et al., 2014). For instance, the anthropogenic emissions were reported on a yearly basis, and thus seasonal variability in anthropogenic emissions, such as that from wintertime heating of buildings (e.g. Streets et al., 2003), was not considered in the a priori emissions. Wang et al. (2007) also suggested that the emission inventories underestimate soil emissions at NH mid-latitudes during summer by a factor of 2-3. The assumptions applied to the a priori emissions (cf. Sect. 2.1; for example, the anthropogenic emissions for 2008 are used in the estimations for 2009-2012) also cause an unrealistic lack of interannual variability in the a priori emissions and lead to significant differences between the a priori and a posteriori emissions.

LNOx sources
The average yearly global flash rate obtained for the reanalysis period 2005-2012 was 45.3 flashes s⁻¹, which is comparable with the climatological estimate of 46 flashes s⁻¹ derived from Lightning Imaging Sensor (LIS) and Optical Transient Detector (OTD) measurements (Cecil et al., 2014). The LNOx sources differ greatly between the control and reanalysis runs. The mean annual global total LNOx source in the reanalysis run is estimated at 6. … during 2010-2012. From a sensitivity reanalysis calculation in which the TES measurements for 2005 were removed, we conclude that the large increase in 2010-2012 is at least partly an artefact of the lack of constraints from the TES measurements. The TES data assimilation generally tends to decrease the global LNOx amount in the simultaneous assimilation framework (the global total LNOx source in 2005 is 5.8 and 6.6 Tg N when estimated with and without the TES measurements, respectively). For the period 2005-2009, when the assimilated measurement density is nearly constant, the analysed LNOx variability is considered to be induced by variations in convective activity, thunderstorm type, and cloud distributions. The positive slope (+3.1 ± 4.2 % year⁻¹) obtained for 2005-2009 in the reanalysis implies that variations in these processes increased the LNOx sources. This increase in the global LNOx sources is attributed to large increases over North Africa (+5.7 ± 26.8 % year⁻¹), South America (+3.2 ± 22.0 % year⁻¹), and the Atlantic Ocean (+7.4 ± 11.5 % year⁻¹). Further detailed analyses are required to understand the possible causal mechanisms.
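Trends such as the +3.1 % year⁻¹ quoted above are linear slopes normalised by a reference value. The sketch below shows one common way to compute such a relative trend with an ordinary least-squares fit. It assumes that the slope is divided by the mean of the series and that the ± term is the standard error of the slope; the text states neither convention, so both are assumptions, and the input series is made up.

```python
import math
from statistics import fmean


def relative_trend(years, values):
    """OLS trend of `values` against `years`, in % of the series mean per year.

    Returns (trend, standard_error), both in % per year.
    """
    n = len(years)
    if n < 3:
        raise ValueError("need at least three points to estimate a standard error")
    xm, ym = fmean(years), fmean(values)
    sxx = sum((x - xm) ** 2 for x in years)
    slope = sum((x - xm) * (y - ym) for x, y in zip(years, values)) / sxx
    # Residuals computed about the means to avoid cancellation with large years.
    rss = sum((y - ym - slope * (x - xm)) ** 2 for x, y in zip(years, values))
    se = math.sqrt(rss / (n - 2) / sxx)
    return 100.0 * slope / ym, 100.0 * se / ym


years = [2005, 2006, 2007, 2008, 2009]
lnox = [6.0, 6.1, 6.3, 6.2, 6.5]  # HYPOTHETICAL annual totals, Tg N
trend, err = relative_trend(years, lnox)
print(f"{trend:+.1f} ± {err:.1f} % per year")
```

Dividing by the mean is only one normalisation; dividing by the first-year value gives slightly different percentages.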
The global LNOx amount in the reanalysis for 2007 (6.15 Tg N) agrees with our previous estimate for the same year (6.31 Tg N; Miyazaki et al., 2014). However, because the tuning factor applied to the global total flash frequency, which is based on the recent climatological estimate of Cecil et al. (2014), is about 10 % larger than in the previous estimate, the analysis increments can differ between the two estimates. For instance, in the reanalysis the positive increment for 2007 is smaller, or becomes negative, over Siberia, Southeast Asia, and South America. Note that the global structure of the analysis increment is generally similar between 2007 (figure not shown) and the 8-year reanalysis mean. Meanwhile, the seasonal variation in the tropical LNOx sources is modified more significantly in the reanalysis than in the previous estimate. In the reanalysis, the observational information is accumulated throughout a 1-year calculation after a 2-month spin-up, during which the LNOx source factors are continuously corrected. In the previous estimate (Miyazaki et al., 2014), the LNOx sources were estimated from shorter data assimilation calculations (i.e. twelve 1-month calculations were conducted after a 15-day spin-up).

Surface CO emissions
Data assimilation increases the 8-year mean of global total CO emissions by 36 % (1298 Tg CO vs. 820 Tg CO), mainly because of an approximately 110 % increase in the NH. The increase in total CO emissions in the NH is large in the boreal late winter-spring period, especially over China and Europe. Stein et al. (2014) similarly found it necessary to adjust emissions seasonally, using regionally varying scaling factors with large corrections during winter-spring for industrialised countries. A similar seasonality in the adjustments is found in Fig. 15, whereas this seasonality is mostly absent from the a priori emissions in the NH. The positive increments in surface CO emissions are introduced by the assimilation of MOPITT CO observations, whereas the assimilation of non-CO observations also affects the CO emission estimates via changes in OH concentrations. For instance, changes in surface NOx emissions decreased tropospheric OH concentrations at NH mid-latitudes, which in turn acted to increase tropospheric CO concentrations; this is discussed further in Sect. 7.4.2.
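The percentage changes quoted for the global totals depend on the reference value. The short check below uses only the NOx and CO totals given in the text. For NOx, the quoted "approximately 10 %" matches the change relative to the a priori (3.8/38.4 ≈ 9.9 %). For CO, the two totals imply an increase of about 58 % relative to the a priori, whereas 36 % matches the increment expressed as a fraction of the a posteriori total (478/1298 ≈ 36.8 %). Which reference the authors intended is not stated, so this is offered only as arithmetic on the quoted numbers.

```python
def change_vs_prior(prior, posterior):
    """Increment expressed as a percentage of the a priori value."""
    return 100.0 * (posterior - prior) / prior


def change_vs_posterior(prior, posterior):
    """Increment expressed as a percentage of the a posteriori value."""
    return 100.0 * (posterior - prior) / posterior


# 8-year mean global totals quoted in the text: (a priori, a posteriori).
NOX_TG_N = (38.4, 42.2)
CO_TG_CO = (820.0, 1298.0)

for name, (prior, post) in [("NOx", NOX_TG_N), ("CO", CO_TG_CO)]:
    print(f"{name}: {change_vs_prior(prior, post):.1f} % of a priori, "
          f"{change_vs_posterior(prior, post):.1f} % of a posteriori")
```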

Impact of emission analysis
The impact of the emission optimisation on the tropospheric O3 analysis is evaluated by comparing the reanalysis run with a sensitivity calculation that excludes the emission factors for the surface emissions and LNOx sources from the state vector. The emission optimisation changes the lower-tropospheric O3 concentrations by about 15 % on average in the tropics and by about 10 % at NH mid-latitudes. These changes improve the agreement with ozonesonde observations in the lower troposphere in both the NH and SH (reanalysis vs. w/o emission in Table 6), but not in the tropics. At NH mid-latitudes, optimising the emission factors improves the agreement with the ozonesonde observations from April to August below about 500 hPa (Fig. 17), in association with the pronounced O3 production caused by NOx increases; the monthly mean positive bias below about 900 hPa is reduced by 10-15 % in summer, and the negative bias between 900 and 500 hPa is reduced by 30-50 % in spring and summer. Vertical transport of O3 and its precursors propagates the variations in surface emissions into the free troposphere, whereas the LNOx source optimisation directly improves the upper-tropospheric O3 simulation. In the tropics, the impact of the emission optimisation on the free troposphere is large throughout the year.
The observed O3 concentration at NH mid-latitudes between 850 and 500 hPa increased from 2005 to 2010 (+2.3 ppb (5 years)⁻¹); this positive slope is captured, although underestimated, in the reanalysis run (+1.0 ppb (5 years)⁻¹), whereas the case without emission source optimisation (w/o emission) shows a negative slope (−1.1 ppb (5 years)⁻¹). These results imply that the simultaneous optimisation approach improves both the concentrations and the emissions in the model and produces high-quality multiple-year reanalysis data for tropospheric O3 profiles.

Biases in the observations
TES O3 retrievals are known to have a positive bias relative to ozonesonde observations in the troposphere (e.g. Herman and Osterman, 2012; Verstraeten et al., 2013). Verstraeten et al. (2013) determined that the upper- and lower-tropospheric mean biases range from −0.4 to +13.3 ppb and from +3.9 to +6.0 ppb, respectively. In the reanalysis described in this paper, we did not apply a bias correction to TES because it is difficult to estimate a bias structure that may vary in time and space over the reanalysis period. We tested a bias correction scheme with a linear concentration-bias relationship, in which the slopes and intercepts estimated by Verstraeten et al. (2013) for five latitude bands for the upper troposphere (above 464 hPa), at 464 hPa, and for the lower troposphere (below 464 hPa) were interpolated in log pressure to the model's vertical layers. For the Arctic lower troposphere, a constant bias of 1.1 ppb was assumed because Verstraeten et al. (2013) found only a very small correlation there.
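The tested correction can be sketched as follows: slope and intercept coefficients given at a few pressure nodes are interpolated linearly in ln(p) to a model layer, and a bias of the form a(p) × O3 + b(p) is subtracted. This is a hedged illustration only. The published coefficients are not reproduced in the text, so the node pressures chosen to represent the "upper troposphere" and "lower troposphere" values, and all coefficient values, are hypothetical; applying the bias to the retrieved concentration is also an assumption.

```python
import math


def interp_log_p(p_hpa, nodes):
    """Piecewise-linear interpolation in ln(p).

    nodes: (pressure_hPa, value) pairs; the value is held constant above the
    highest and below the lowest node pressure.
    """
    pts = sorted((math.log(pn), v) for pn, v in nodes)
    x = math.log(p_hpa)
    if x <= pts[0][0]:
        return pts[0][1]
    for (x0, v0), (x1, v1) in zip(pts, pts[1:]):
        if x <= x1:
            return v0 + (v1 - v0) * (x - x0) / (x1 - x0)
    return pts[-1][1]


def correct_o3(o3_ppb, p_hpa, slope_nodes, intercept_nodes):
    """Subtract a bias assumed linear in the retrieved value: bias = a(p) * O3 + b(p)."""
    a = interp_log_p(p_hpa, slope_nodes)
    b = interp_log_p(p_hpa, intercept_nodes)
    return o3_ppb - (a * o3_ppb + b)


# HYPOTHETICAL coefficients for one latitude band (not the published values):
# a representative UT level, 464 hPa, and a representative LT level.
SLOPE = [(316.0, 0.10), (464.0, 0.08), (681.0, 0.05)]
INTERCEPT = [(316.0, 0.0), (464.0, 1.0), (681.0, 2.0)]
# Arctic lower troposphere: constant 1.1 ppb bias (zero slope), as in the text.
ARCTIC_LT_SLOPE = [(464.0, 0.0)]
ARCTIC_LT_INTERCEPT = [(464.0, 1.1)]

for p in (900.0, 681.0, 464.0, 316.0, 200.0):
    print(p, round(correct_o3(50.0, p, SLOPE, INTERCEPT), 2))
print("Arctic LT", round(correct_o3(50.0, 800.0, ARCTIC_LT_SLOPE, ARCTIC_LT_INTERCEPT), 2))
```

In a full implementation the coefficients would also be selected by latitude band, and the same bias would be applied to every retrieval level, which is why a bias that varies in time (as discussed in the next paragraph) cannot be captured this way.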
A sensitivity calculation for 2005 with the TES bias correction (TES-bias in Table 6) shows a reduced positive O3 bias against the ozonesonde observations in the tropical lower and middle troposphere. Conversely, at NH mid- and high latitudes, the mean negative O3 bias in the lower and middle troposphere increases. Because the bias was assumed to be constant in time, applying the TES bias correction did not improve the representation of the interannual O3 variation between 2005 and 2010.
In the CHASER-DAS data assimilation approach, the O3 analysis bias is not determined solely by the bias in the assimilated O3 measurements. A sensitivity experiment without the assimilation of TES measurements (w/o TES in Table 6) shows improvements in lower- and middle-tropospheric O3 in the NH extratropics compared with the control run, demonstrating that measurements other than TES led to corrections in lower- and middle-tropospheric O3. The additional use of the TES O3 measurements further improved the O3 analysis in most cases (see Table 6).

Satellite data availability
Any discontinuities in the availability and coverage of the assimilated measurements affect the quality of the reanalysis and the estimated interannual variability. In particular, the number of assimilated TES O3 retrievals decreases after 2010, and approximately half of the OMI retrieval pixels per orbit have been compromised since December 2009. Correspondingly, the data assimilation performance, as measured by the data assimilation statistics (Sect. 4) and by comparisons against independent observations (Sect. 5), became worse in the NH after 2010. The lack of direct O3 measurements and the reduced constraints from the precursor (i.e. NO2) measurements degrade the O3 analysis in the NH after 2010, limit the evaluation of the analysis uncertainties (cf. Sect. 7.6), and may cause spurious interannual changes and trends. Changes in the observing system thus limit the usability of the reanalysis for studies of long-term variability.

A priori emissions
The choice of a priori emissions influences the reanalysis result. To study the sensitivity of the reanalysis to the a priori settings, emissions from EDGAR-HTAP v2 (http://edgar.jrc.ec.europa.eu/htap_v2/index.php?SECURE=123) for 2008 and 2010 were used as alternative a priori anthropogenic NOx and CO emissions in the calculations for 2005 and 2010, respectively (the inventory was not available for 2005 at the time of this study). EDGAR-HTAP v2 was produced using nationally reported emissions combined with regional scientific inventories from the European Monitoring and Evaluation Programme (EMEP), the Environmental Protection Agency (EPA), Greenhouse gas-Air Pollution Interactions and Synergies (GAINS), and the Regional Emission Inventory in Asia (REAS). The model simulation using the a priori emissions, constructed from the EDGAR v4 and GFED v3 emissions, significantly underestimates tropospheric CO concentrations, as do most CTMs (e.g. Stein et al., 2014), and this underestimation is large over urban sites in the NH (Sect. 5.2). The global CO emissions of the EDGAR-HTAP v2 inventory are about 20 % higher than the a priori emissions. Using the EDGAR-HTAP v2 emissions instead of the a priori emissions reduces the negative bias in the simulated surface CO concentration by about 20-40 % in the tropics and the NH extratropics, as shown by the green lines in Fig. 11. In the NH, the error reduction is large in winter-spring and small in summer, whereas it is mostly negligible in the SH.
Despite the large differences in the simulated concentrations, the choice of a priori emissions has only a slight influence on the a posteriori CO concentrations and emissions: in 2005, the annual global total emission is 1398 Tg CO with the EDGAR v4 and GFED v3 emissions and 1360 Tg CO with the HTAP v2 emissions.
The O3 analysis is only slightly influenced by the choice of a priori emissions (reanalysis vs. HTAP in Table 6), except that using the EDGAR-HTAP v2 emissions improves the agreement with the ozonesonde observations in the NH extratropics between 850 and 500 hPa. The changes are attributable to the slightly different a posteriori surface CO and NOx emissions (in 2005, the annual NH (20-90° N) total NOx emission is 26.5 Tg N with the EDGAR v4 and GFED v3 emissions and 29.4 Tg N with the HTAP v2 emissions). The spatial distribution of the estimated LNOx sources is also somewhat influenced by the choice of a priori surface emissions at NH mid-latitudes (not shown), which leads to differences in the agreement with the ozonesonde observations in the upper troposphere at 200 hPa.

OH distribution
OH is a key driver of the tropospheric chemical system, as the removal of hydrocarbons from the atmosphere starts with their reaction with OH. However, its distribution is poorly represented in CTMs. Patra et al. (2014) estimated an NH / SH OH ratio of 0.97 ± 0.12 using methyl chloroform observations (a proxy for OH concentrations), whereas the ratio is 1.26 in the CHASER control run. The simulated ratio from this study falls within the range of 1.28 ± 0.10 reported for ACCMIP (the Atmospheric Chemistry and Climate Model Intercomparison Project; Naik et al., 2013). The concentration of OH is directly linked to the concentrations of the species that determine the primary production (O3 and H2O), removal (CO, CH4), and regeneration (NOx) of OH. Because the CHASER-DAS system constrains O3, CO, and NOx, the assimilation holds the promise of a positive impact on the modelled OH concentration, provided that the reactions are reasonably well described by the model. The impact of the assimilation on OH is shown in Fig. 18. The tropospheric OH concentration is decreased by the assimilation in the NH and increased in the SH tropics; these changes are primarily attributable to the increased concentrations of CO and O3, respectively. From sensitivity experiments in which the state vector was modified (either the emission factors or the concentrations were excluded), we confirmed that the emission optimisation alone decreases the OH concentration in the NH troposphere, whereas both the concentration assimilation (mainly TES O3) and the emission optimisation (mainly NOx emissions) increase the OH concentration in the tropics. The decrease in the tropospheric OH concentration in the NH is found throughout the reanalysis period, with the largest reductions of about 10 % during boreal spring-summer, leading to a decrease of about 2 % in the global annual mean OH concentration, linked to the CO increases in the NH. Changes in surface NOx emissions tend to decrease the annual mean tropospheric OH concentration by about 3 % at NH mid-latitudes and to increase it by about 5 % in the tropics. The 8-year mean NH / SH OH ratio is 1.18 in the reanalysis, which is smaller than the values of 1.26 in the control run and 1.28 in ACCMIP and closer to the observational estimate of 0.97 (Patra et al., 2014). Because the chemical lifetimes of NOx and CO depend on the amount of OH, these changes again highlight the importance of simultaneously optimising concentrations and emissions, both for the entire tropospheric chemical system and for the emission estimates.
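The NH / SH OH ratio compares hemispheric mean OH concentrations. The sketch below assumes simple air-mass weighting over tropospheric grid cells. Published estimates may weight differently (for example by the reaction rate with methyl chloroform) and use different tropospheric masks, which changes the resulting number, so the weighting and the grid cells here are illustrative assumptions only.

```python
def nh_sh_ratio(cells):
    """Air-mass-weighted mean OH in the NH divided by that in the SH.

    cells: iterable of (latitude_deg, air_mass, oh_concentration).
    Cells exactly on the equator are counted in the NH (an arbitrary choice).
    """
    num = {"NH": 0.0, "SH": 0.0}
    den = {"NH": 0.0, "SH": 0.0}
    for lat, mass, oh in cells:
        hemi = "NH" if lat >= 0.0 else "SH"
        num[hemi] += mass * oh
        den[hemi] += mass
    return (num["NH"] / den["NH"]) / (num["SH"] / den["SH"])


# HYPOTHETICAL cells: (latitude, relative air mass, OH in 1e6 molecule cm-3).
cells = [(45.0, 3.0, 1.2), (15.0, 1.0, 0.8), (-15.0, 2.0, 1.0), (-45.0, 2.0, 1.0)]
print(round(nh_sh_ratio(cells), 3))
```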
Although the methyl chloroform analysis of Patra et al. (2014) has considerable uncertainties, the large discrepancy between their analysis and our estimate suggests that errors in the modelled OH could have degraded the reanalysis. If OH is overestimated in the NH, then top-down emission estimates of reactive species such as CO in the NH could also be overestimated. Sensitivity calculations were conducted to investigate the influence of a possible remaining positive OH bias on the reanalysis results. In these sensitivity reanalysis calculations, a factor of 0.8 was applied to the rate of the reaction CO + OH → CO2 + HO2 in the NH, in consideration of the obtained difference (1.18 vs. 0.97). Other reaction rates were not adjusted, to simplify interpretation of the calculations. In the sensitivity model calculation with reduced OH, the model's negative CO bias is reduced by about 30-50 % in the NH. After assimilation with reduced OH, the a posteriori annual total CO emissions in the NH become 15 % smaller, whereas the a posteriori surface CO concentration changes little. Conversely, in the free troposphere, the a posteriori CO concentration becomes about 5-10 % higher with the reduced OH, in better agreement with the MOZAIC/IAGOS aircraft measurements. Thus, a possible overestimation of the simulated OH might lead to overestimated CO emissions and underestimated CO concentrations in the free troposphere. The large positive adjustment needed for CO concentrations in the NH may therefore be related to deficiencies in the modelling of OH rather than to emissions that are too low.
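A one-box steady-state argument shows why a slower CO + OH reaction leads to smaller a posteriori CO emissions: with burden B = E τ and lifetime τ = 1/(k[OH]), scaling k by 0.8 lengthens τ by a factor of 1.25 and lowers the emission needed to sustain the same burden by 20 %. This is a back-of-the-envelope sketch only. The rate constant and OH number density are illustrative assumptions, and the full model's response (15 % less NH emission, per the text) also involves transport and the free-tropospheric changes described above.

```python
SECONDS_PER_DAY = 86400.0


def co_lifetime_days(k_co_oh, oh):
    """Chemical lifetime of CO against OH, tau = 1 / (k [OH]), in days.

    k_co_oh: rate constant (cm3 molecule-1 s-1); oh: OH number density (molecule cm-3).
    """
    return 1.0 / (k_co_oh * oh) / SECONDS_PER_DAY


def emission_for_burden(burden, lifetime):
    """One-box steady state: emission = burden / lifetime (consistent units)."""
    return burden / lifetime


# Illustrative values (assumptions, not model output).
K_CO_OH = 2.4e-13   # approximate near-surface value, cm3 molecule-1 s-1
OH = 1.0e6          # typical tropospheric mean, molecule cm-3
SCALE = 0.8         # rate-reduction factor used in the NH sensitivity runs

tau_ref = co_lifetime_days(K_CO_OH, OH)
tau_red = co_lifetime_days(SCALE * K_CO_OH, OH)
ratio = emission_for_burden(1.0, tau_red) / emission_for_burden(1.0, tau_ref)
print(f"lifetime {tau_ref:.0f} -> {tau_red:.0f} days; emission for fixed burden x{ratio:.2f}")
```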
Note that CO is also produced by the oxidation of methane and biogenic NMHCs, a source that contributes about half of the background CO (Duncan et al., 2007). Errors in this component could also account for part of the missing CO. Stein et al. (2014) suggested that anthropogenic CO and VOC emissions in their inventory are too low for industrialised countries during winter and spring.

Other error sources
Emissions of O3 precursors other than NOx and CO, such as VOCs, have a pronounced influence on tropospheric chemistry, and further constraints on them are required to improve the O3 analysis. Optimising isoprene emissions using satellite CH2O measurements in the reanalysis framework has the potential to improve the O3 analysis; this will be investigated in a future study.
Deficiencies in the modelled atmospheric transport and chemistry lead to forecast errors and degrade the reanalysis performance. Improving the forecast model is important for properly propagating observational information in space and among different species.
The meteorological fields used as inputs to the chemical reanalysis calculation were produced by an AGCM simulation nudged toward the meteorological reanalysis, in order to reproduce past meteorological variations while simulating the influence of sub-grid transport processes. Simultaneous assimilation of meteorological and chemical observations, using an advanced data assimilation technique that considers radiative feedbacks and the covariances between the meteorological and chemical fields, is expected to reduce systematic model errors and improve the chemical reanalysis performance.

Data assimilation setting
To improve the data assimilation analysis with a limited ensemble size, covariance localisation was applied to neglect error correlations among unrelated or weakly related variables in the background error covariance matrix. Including correlations among a larger number of variables allows observational information to propagate among various fields, but it requires a large ensemble to represent the multivariate relationships properly. For instance, Zoogman et al. (2014) demonstrated that joint O3-CO data assimilation could substantially benefit the analysis of near-surface O3 if the instrument sensitivity to CO in the boundary layer is larger than that to O3. Such covariances were not considered in our reanalysis calculation.
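Localisation among variables, as described above, amounts to an element-wise (Schur) product of the sample background error covariance with a 0/1 mask that zeroes the cross-covariances between variables treated as unrelated. The variable names and the relation table below are hypothetical; the text does not list which variable pairs CHASER-DAS treats as related.

```python
from statistics import fmean


def sample_cov(ensemble):
    """Sample covariance matrix of an ensemble (list of equal-length state vectors)."""
    n, dim = len(ensemble), len(ensemble[0])
    mean = [fmean(m[j] for m in ensemble) for j in range(dim)]
    pert = [[m[j] - mean[j] for j in range(dim)] for m in ensemble]
    return [[sum(p[i] * p[j] for p in pert) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def localise_by_variable(cov, names, related):
    """Schur product with a 0/1 mask: keep variances and the cross-covariances
    of declared pairs, and set all other cross-covariances to zero."""
    dim = len(names)
    return [[cov[i][j] if i == j or frozenset((names[i], names[j])) in related else 0.0
             for j in range(dim)] for i in range(dim)]


# HYPOTHETICAL 3-variable state, relation table, and 4-member ensemble.
names = ["NO2", "NOx_emis", "CO"]
related = {frozenset(("NO2", "NOx_emis"))}
ens = [[1.0, 0.9, 100.0], [1.2, 1.1, 98.0], [0.8, 1.0, 103.0], [1.1, 1.2, 99.0]]
P = localise_by_variable(sample_cov(ens), names, related)
for row in P:
    print([round(v, 4) for v in row])
```

With only four members, the sample covariance between NO2 and CO is almost certainly noise; masking it prevents an NO2 observation from spuriously changing CO, which is the motivation given in the text.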

Uncertainty estimation
The error covariance provides important information about the reanalysis product. Within the EnKF assimilation framework, the analysis ensemble spread, estimated as the standard deviation of the simulated concentrations across the ensemble, can be used in combination with the χ² test as a measure of the uncertainty of the reanalysis product (Miyazaki et al., 2012b). The analysis spread arises from errors in the model input data, the model processes, and the assimilated measurements, and it decreases as the analysis converges to the true state.
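Both diagnostics named above are simple to compute. The sketch below takes the spread as the ensemble standard deviation relative to the ensemble mean, and uses the common innovation-based form χ² = dᵀ(HPᶠHᵀ + R)⁻¹d / m, which should be close to 1 when the forecast and observation error covariances are specified consistently. To keep it short, both covariance matrices are assumed diagonal; that simplification is ours and not necessarily the one used in CHASER-DAS.

```python
from statistics import fmean, stdev


def relative_spread(members):
    """Ensemble standard deviation as a percentage of the ensemble mean."""
    return 100.0 * stdev(members) / fmean(members)


def chi_square(innovations, hph_var, obs_var):
    """chi^2 = d^T (H P^f H^T + R)^-1 d / m, with both matrices taken as diagonal.

    innovations: observation-minus-forecast values d_i (OmF);
    hph_var: forecast error variances in observation space; obs_var: R_ii.
    """
    terms = [d * d / (s + r) for d, s, r in zip(innovations, hph_var, obs_var)]
    return sum(terms) / len(terms)


# HYPOTHETICAL O3 ensemble (ppb) at one grid point and three OmF values.
print(relative_spread([45.0, 50.0, 55.0]))
print(chi_square([2.0, -1.0, 3.0], [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]))
```

A χ² well above 1 means the actual OmF is larger than the assumed error variances allow, i.e. the forecast (or observation) error covariance is underestimated.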
The analysis spread for O3 in the tropical upper troposphere at 200 hPa is about 8-12 % of the analysed concentration (lower panels in Fig. 3) and is mostly determined by the assimilation of TES and MLS O3 retrievals. The analysis spread is relatively small in the extratropical lower stratosphere (4-7 %), except in the polar regions, because of the high accuracy of the MLS measurements. At 700 and 400 hPa, the O3 analysis spread is generally smaller in the tropics than in the extratropics because of the higher sensitivity of the TES O3 retrievals there. The simultaneous emission and concentration optimisation is important for producing proper ensemble perturbations, especially in the lower troposphere.
The global analysis spread for O3 at 700 and 400 hPa is small in 2010-2012 (lower panels in Fig. 3). Because the agreement with the ozonesonde observations is poorer in 2010-2012 than in 2005-2009 (Table 3), this small analysis spread cannot be interpreted as an error reduction resulting from the analysis converging to the true state. The small analysis spread is likely associated with the lack of effective observations for constraining the analysis uncertainties and with the stiff chemical system. These results indicate that additional observational information and/or stronger inflation of the forecast error covariance are required for the long-term analysis spread to correspond to the actual analysis uncertainty. The overly large χ² values for OMI NO2 and TES O3 in 2010-2012 (Fig. 1) also suggest that the forecast error covariance is underestimated relative to the actual OmF (cf. Sect. 4.1).
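Multiplicative inflation, one of the options mentioned above, scales the forecast perturbations about the ensemble mean by a factor λ ≥ 1, so that the ensemble variance grows by λ² while the mean is unchanged. The text does not specify the inflation scheme or value used in CHASER-DAS; the factor and ensemble below are arbitrary examples.

```python
from statistics import fmean, stdev


def inflate(members, factor):
    """Multiplicative inflation of a scalar ensemble: x_i -> mean + factor * (x_i - mean).

    The mean is preserved and the sample variance is multiplied by factor**2.
    """
    if factor < 1.0:
        raise ValueError("inflation factor should be >= 1")
    mean = fmean(members)
    return [mean + factor * (x - mean) for x in members]


forecast = [48.0, 50.0, 51.0, 53.0]  # HYPOTHETICAL O3 members, ppb
inflated = inflate(forecast, 1.1)     # illustrative factor
print(round(stdev(forecast), 3), round(stdev(inflated), 3))
```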

Applications and future developments
The chemical reanalysis data set has great potential to contribute in a number of ways to studies of the atmospheric environment and climate:
1. The concentration and emission data, which are produced consistently from a single analysis system, provide comprehensive information on atmospheric composition variability that can improve understanding of the processes controlling the atmospheric environment, including OH, and of their roles in a changing climate.
2. The reanalysis data provide initial and boundary conditions for climate and chemical simulations. They can also be used as an input to meteorological reanalyses for radiation calculations (Dragani and McNally, 2013).
3. The obtained emission data can be used to study emission variability and to evaluate bottom-up emission inventories.
4. The statistical information obtained during the reanalysis calculation can be used to guide developments of models and observations. A large spread can be regarded as an indicator that further constraints are required, whereas the analysis increment identifies sources of model error.
Several further developments have been identified as necessary to improve the quality and value of the reanalysis data set:
1. Discontinuities in the assimilated measurements lead to changes in the reanalysis quality. The O3 analysis performance was degraded in 2010-2012, corresponding to the decreased number of assimilated measurements. The influence of data discontinuities must be considered or removed when studying interannual variability and trends using reanalysis products. Including more data sets, such as IASI and GOME-2 measurements, could improve the reanalysis quality.
2. Applying a bias correction procedure to multiple measurements could improve the reanalysis quality but should be carefully checked (Inness et al., 2013). Aircraft and ozonesonde measurements or independent satellite data sets can be used as anchors in the bias correction. Alternatively, these data could be assimilated to provide additional unbiased constraints, as demonstrated by Baier et al. (2013).
3. Additional constraints are required to improve the lower-tropospheric and boundary layer concentrations and emissions. Recently developed retrievals with high sensitivity to the lower troposphere would be helpful (e.g. Deeter et al., 2013; Cuesta et al., 2013). Moreover, optimising the emissions of additional precursors could be important for improving the lower-tropospheric analysis, including the representation of long-term variability.
4. Extending the forecast model to the entire stratosphere with detailed stratospheric chemistry is expected to reduce forecast errors in both the stratosphere and the troposphere. We plan to replace the forecast model with one that has an updated chemical scheme and a model top extended into the stratosphere (Watanabe et al., 2011). This would also allow the assimilation of total column measurements; the combined assimilation of limb profiles and nadir column measurements could benefit the reanalysis performance, especially in the UTLS (Barré et al., 2013; Inness et al., 2013; Emili et al., 2014).

Conclusions
We conducted a chemical reanalysis calculation for the 8 years from 2005 to 2012 based on an assimilation of multiple satellite data sets obtained from OMI, MLS, TES, and MOPITT. The simultaneous optimisation of the chemical concentrations and the precursor emissions provides a comprehensive data set that can be used for various applications in air-quality and climate research. In the simultaneous analysis of concentrations and emissions, the improved atmospheric concentrations of chemically related species can improve the emission inversion, whereas the improved representation of the seasonal, interannual, and geographical variability of the emissions benefits the atmospheric concentration reanalysis through a reduction in model forecast error. Data assimilation statistics were analysed to evaluate the long-term stability of the chemical reanalysis. The analysis confirmed that the forecast error covariance was specified reasonably well. The OmFs without assimilation varied from year to year, which suggested an unrealistic lack of interannual variation in the precursor emissions. After assimilation, the OmFs became smaller and almost constant over the reanalysis period, implying a persistent reduction of model error and an improved representation of emission variability. The information on the analysis uncertainty obtained during the assimilation adds value to the chemical reanalysis data set; the large analysis spreads indicated a requirement for further constraints from additional observations. However, the discontinuity in the assimilated measurements limited the usability of the reanalysis product. The number of available TES measurements decreased significantly after 2010, which produced unrealistically small analysis spreads and degraded the quality of the tropospheric O3 analysis.
The analysed O3, CO, and NO2 concentrations in the troposphere showed good agreement with independent observations on both regional and global scales, for both seasonal and interannual variations, from the lower troposphere to the lower stratosphere. The linear O3 trends observed during the reanalysis period were positive in the lower troposphere at NH mid-latitudes and negative in the NH UTLS; these interannual variations were captured well in the reanalysis, whereas the model simulation without assimilation mostly failed to reproduce them. The simultaneous assimilation of multiple-species data with optimisation of both the concentration and emission fields was shown to be effective in correcting the profiles for the entire troposphere, including the long-term variations in O3, CO, and NO2. The global distribution of OH was modified considerably by the simultaneous assimilation throughout the reanalysis period, reducing the difference between the NH and SH; this played an important role in propagating observational information among various species and in modifying the chemical lifetimes of reactive gases. In conclusion, the combined analysis of concentrations and emissions is an important development in tropospheric chemistry reanalysis.
Producing better chemical reanalysis data will require additional constraints, a better forecast model, and bias correction. Although the assimilation of multispecies data influences the representation of the entire chemical system, the influence of persistent model errors remains a concern. For instance, the reanalysis still has large negative biases in NO2 concentrations over polluted regions, which may be associated with errors in the model's chemical equilibrium states, planetary boundary layer mixing, and diurnal variations in chemical processes and emissions. Adjusting additional model parameters, such as VOC emissions, deposition, and/or chemical reaction rates, by adding observational constraints will help to reduce model errors. Extending the forecast model to the entire stratosphere and incorporating detailed stratospheric chemistry is expected to reduce forecast errors in both the stratosphere and troposphere and to allow the assimilation of total column measurements (Inness et al., 2013). Techniques are also required to reduce the influence of discontinuities in the assimilated measurements on the quality of the reanalysis and to use sparse observations efficiently (van der A et al., 2010).
Vertical profiles of seven key gases (O3, CO, NO2, OH, HO2, HNO3, and CH2O) obtained from six aircraft campaigns were used: the Intercontinental Chemical Transport Experiment Phase B (INTEX-B); Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS)-A and ARCTAS-B; Deriving Information on Surface Conditions from Column and Vertically Resolved Observations Relevant to Air Quality (DISCOVER-AQ); and Deep Convection Clouds and Chemistry (DC3)-DC8 and DC3-GV.

Figure 1. Time series of the monthly mean chi-square value and its standard deviation (black lines) and the number of assimilated observations per month (blue bars) for OMI NO2, TES O3, MOPITT CO, MLS O3, and MLS HNO3. A super-observation approach is applied to the OMI and MOPITT measurements (the number of super-observations is shown), whereas individual observations are used in the analysis of the other data sets.

Figure 2. Time-latitude cross sections of the monthly and zonal mean OmF obtained without assimilation (left panels) and with assimilation (centre panels). Positive and negative OmF values are shown in red and blue, respectively; a positive OmF represents a negative model bias relative to the observations. Right panels show latitudinal distributions of the 8-year mean OmF bias (black line) and RMSE (red line) obtained with assimilation (solid line) and without assimilation (dotted line). The first row shows the OmF for OMI NO2 data (in 10¹⁵ molec cm⁻²), the second row for TES O3 data between 500 and 300 hPa (in ppb), the third row for MOPITT CO data between 700 and 500 hPa (in ppb), the fourth row for MLS O3 data between 216 and 100 hPa (in ppm), and the fifth row for MLS HNO3 data between 150 and 80 hPa (in ppb). A super-observation approach is applied to the OMI and MOPITT measurements, whereas individual observations are used in the analysis of the other data sets.

Figure 3. Time-latitude cross sections of the analysis increment (upper panels, in ppb per analysis step) and the analysis spread (lower panels, in ppb per analysis step) obtained for O3 at 700 hPa (left), 400 hPa (centre), and 200 hPa (right).
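Assuming the usual ensemble conventions, that the analysis increment is the ensemble-mean analysis minus the ensemble-mean forecast and that the analysis spread is the standard deviation of the analysis ensemble, both quantities can be computed for one analysis step as follows. The function name and the three-member example are hypothetical; the plotted fields would additionally be averaged over the analysis steps in each month and over longitude.

```python
import numpy as np


def increment_and_spread(fcst_ens, ana_ens):
    """Ensemble-mean analysis increment and analysis spread for one analysis step.

    fcst_ens, ana_ens : (n_members, ...) forecast and analysis ensembles
    """
    increment = ana_ens.mean(axis=0) - fcst_ens.mean(axis=0)
    spread = ana_ens.std(axis=0, ddof=1)   # sample standard deviation across members
    return increment, spread


# Three hypothetical members at two grid points (O3 in ppb)
fcst = np.array([[40.0, 60.0], [44.0, 64.0], [42.0, 62.0]])
ana = np.array([[43.0, 61.0], [45.0, 63.0], [44.0, 62.0]])
inc, spr = increment_and_spread(fcst, ana)
print(inc.tolist(), spr.tolist())  # [2.0, 0.0] [1.0, 1.0]
```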

Figure 4. Comparison of the vertical O3 profiles from ozonesondes (black), the control run (blue), and the reanalysis (red) averaged for the period 2005–2012. The left column shows the mean profile; the centre and right columns show the mean difference and the RMSE between the control run and the observations (blue) and between the reanalysis and the observations (red). From top to bottom, results are shown for the NH high latitudes (55–90° N), NH mid-latitudes (15–55° N), tropics (15° S–15° N), SH mid-latitudes (15–55° S), and SH high latitudes (55–90° S).

Figure 6. Vertical profiles of the time series of the monthly mean O3 concentration difference (in %) between the control run and ozonesondes (top) and between the reanalysis and ozonesondes (bottom), averaged over the NH mid-latitudes (15–55° N).

Figure 7. Spatial distributions of O3 (left column) and CO (right column) averaged between 500 and 300 hPa and over 2005–2012, obtained from the MOZAIC/IAGOS aircraft measurements (first row), the control run (second row), and the reanalysis (third row). Differences between the control run and the observations (fourth row) and between the reanalysis and the observations (fifth row) are also plotted. Units are ppb.

Figure 9. Latitude-pressure cross sections of the mean O3 concentration (in ppb) obtained from HIPPO aircraft measurements (first row), the control run (second row), and the reanalysis (third row). The relative differences (in %) between the control run and the observations (fourth row) and between the reanalysis and the observations (fifth row) are also shown. Results are shown for all HIPPO campaigns (from left to right: HIPPO I, 8–30 January 2009; HIPPO II, 31 October to 22 November 2009; HIPPO III, 24 March to 16 April 2010; HIPPO IV, 14 June to 11 July 2011; and HIPPO V, 9 August to 9 September 2011).

Figure 11. Time series of the monthly mean CO concentration obtained from the WDCGG ground measurements (black), the control run (blue), and the reanalysis (red). Model simulation results with the HTAP emissions are also plotted (green).

Figure 12.Same as in Fig. 8, but for CO concentration obtained from MOZAIC/IAGOS aircraft measurements.

Figure 15. Time series of monthly total global and regional surface NOx emissions (in Tg N yr⁻¹, top), LNOx emissions (in Tg N yr⁻¹, centre), and surface CO emissions (in Tg CO yr⁻¹, bottom) obtained from the reanalysis (solid lines) and from the emission inventories or the control run (dashed lines) over the globe (90° S–90° N), the NH (20–90° N), the tropics (TR, 20° S–20° N), and the SH (90–20° S). The 8-year mean emission values obtained from the reanalysis run and from the emission inventories (in brackets) are shown on the right-hand side.
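The regional totals in Tg yr⁻¹ follow from integrating a gridded flux (kg m⁻² s⁻¹, the unit family used in Fig. 16) over grid-cell areas and one year. This sketch assumes a regular latitude-longitude grid, a spherical Earth of radius 6371 km, a 365-day year, and a flux already expressed as mass of N (giving Tg N yr⁻¹) or of CO. Region bounds follow the caption (NH 20–90° N, TR 20° S–20° N, SH 90–20° S). Function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6            # assumed spherical Earth radius
SECONDS_PER_YEAR = 365.0 * 86400.0  # assumed 365-day year


def cell_areas(lat_edges_deg, n_lon):
    """Cell areas (m^2) of a regular lat-lon grid, shape (n_lat, n_lon)."""
    phi = np.deg2rad(lat_edges_deg)
    dlon = 2.0 * np.pi / n_lon
    band = EARTH_RADIUS_M**2 * dlon * (np.sin(phi[1:]) - np.sin(phi[:-1]))
    return np.repeat(band[:, None], n_lon, axis=1)


def total_tg_per_year(flux, lat_edges_deg, lat_min, lat_max):
    """Integrate a flux (kg m^-2 s^-1) over rows whose centre latitude is in [lat_min, lat_max]."""
    area = cell_areas(lat_edges_deg, flux.shape[1])
    centres = 0.5 * (lat_edges_deg[1:] + lat_edges_deg[:-1])
    rows = (centres >= lat_min) & (centres <= lat_max)
    kg_per_s = np.sum(flux[rows] * area[rows])
    return kg_per_s * SECONDS_PER_YEAR / 1e9     # kg yr^-1 -> Tg yr^-1


lat_edges = np.linspace(-90.0, 90.0, 73)         # 2.5-degree rows
flux = np.full((72, 144), 1.0e-12)               # uniform test flux, kg m^-2 s^-1
regions = {"globe": (-90, 90), "NH": (20, 90), "TR": (-20, 20), "SH": (-90, -20)}
for name, (lo, hi) in regions.items():
    print(name, round(total_tg_per_year(flux, lat_edges, lo, hi), 2))
```

Because the row edges fall exactly on ±20°, the NH, TR, and SH totals add up to the global total with no row counted twice.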

Figure 16. Global distributions of surface NOx emissions (in 10⁻¹³ kg m⁻² s⁻¹, left column), LNOx sources (in 10⁻¹⁴ kg m⁻² s⁻¹, centre column), and surface CO emissions (in 10⁻¹⁰ kg m⁻² s⁻¹, right column) averaged over 2005–2012. The a priori emissions (upper row), the a posteriori emissions (middle row), and the analysis increment (lower row), i.e. the difference between the a posteriori and the a priori emissions, are shown in each column.

Figure 17. Month-pressure cross section of the zonal mean bias of O3 concentration (in %) relative to the ozonesonde observations, averaged over 30–60°, for the reanalysis run (top) and the sensitivity experiment that excludes the emission factors from the state vector (w/o emission, bottom).

Figure 18. Latitude-pressure cross sections of the 8-year mean OH concentration (right panels) and time-latitude cross sections of the monthly mean OH concentration averaged between 1000 and 300 hPa (left panels). The top panels show the OH concentration obtained from the reanalysis, and the bottom panels show the difference between the reanalysis and the control run. Units are ppt.
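The Northern/Southern Hemisphere OH ratio discussed for this reanalysis can be sketched from a field like the 1000–300 hPa mean OH in the left panels. This minimal version weights each latitude row by cos(latitude) as an assumption. Published estimates often weight by air mass or by the CH4 reaction rate instead, which can change the ratio. The function name and the synthetic field are hypothetical.

```python
import numpy as np


def nh_sh_ratio(field, lat_deg):
    """NH/SH ratio of cos(latitude)-weighted means.

    field   : (n_lat,) zonal means or (n_lat, n_lon) gridded values
    lat_deg : (n_lat,) row-centre latitudes; rows exactly on the equator are excluded
    """
    zonal = field if field.ndim == 1 else field.mean(axis=1)
    w = np.cos(np.deg2rad(lat_deg))
    nh, sh = lat_deg > 0, lat_deg < 0
    nh_mean = np.sum(zonal[nh] * w[nh]) / np.sum(w[nh])
    sh_mean = np.sum(zonal[sh] * w[sh]) / np.sum(w[sh])
    return nh_mean / sh_mean


lat = np.arange(-87.5, 90.0, 5.0)          # 36 row centres
oh = np.where(lat > 0, 0.06, 0.05)         # synthetic OH (ppt), 20% higher in the NH
print(round(nh_sh_ratio(oh, lat), 3))
```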

Table 1. Comparisons of the mean O3 concentrations between the reanalysis or control run (in brackets) and the observations. The units of the root-mean-square error (RMSE) and bias are ppb. Results are provided for WOUDC ozonesonde observations during 2005–2012, MOZAIC/IAGOS aircraft measurements during 2005–2012, and HIPPO aircraft measurements during 2009–2011.

Table 2. Linear trend (slope, in ppb (8 years)⁻¹) and standard deviation (in ppb) of O3 derived from the WMO ozonesonde observations, the control run, and the reanalysis during 2005–2012.
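A linear trend expressed per 8 years can be obtained from a least-squares fit to monthly means. Because a strong seasonal cycle can bias a straight-line slope over a short record, this sketch fits the trend together with an annual harmonic. That choice, the use of the standard deviation of the monthly values, the function name, and the synthetic series are all assumptions; the table may instead be based on a plain linear fit or on deseasonalised data.

```python
import numpy as np


def trend_per_period(monthly, period_years=8.0):
    """Trend (units per `period_years`) and standard deviation of a monthly series.

    The slope is fitted jointly with an annual sine/cosine harmonic so that the
    seasonal cycle does not leak into the trend estimate.
    """
    y = np.asarray(monthly, dtype=float)
    t = (np.arange(y.size) + 0.5) / 12.0            # mid-month time in years
    design = np.column_stack([
        np.ones_like(t), t,
        np.sin(2.0 * np.pi * t), np.cos(2.0 * np.pi * t),
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1] * period_years, y.std(ddof=1)


t = (np.arange(96) + 0.5) / 12.0                       # 96 months, 2005-2012
o3 = 50.0 + 0.25 * t + 5.0 * np.sin(2.0 * np.pi * t)   # synthetic series (ppb)
trend, sd = trend_per_period(o3)
print(round(trend, 3), round(sd, 2))
```

With a pure straight-line fit to this synthetic series, the seasonal term would pull the 8-year slope noticeably away from the imposed 2 ppb; including the harmonic recovers it.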

Table 3. Comparisons of the mean O3 concentrations between the reanalysis run and the WOUDC ozonesonde observations in the Southern Hemisphere (SH, 90–30° S), tropics (TR, 30° S–30° N), and Northern Hemisphere (NH, 30–90° N). The mean differences are shown for each year of the reanalysis period and for the mean concentrations during 2005–2009 and during 2010–2012. The latter includes results for the control run, given in brackets.

Table 5. Comparisons of global tropospheric NO2 columns between the control run and the satellite retrievals (in brackets) and between the reanalysis run and the satellite retrievals: OMI for 2005–2012, SCIAMACHY for 2005–2011, and GOME-2 for 2007–2012. S-Corr is the global spatial correlation coefficient. The bias represents the control run or reanalysis minus the retrievals. The averaging kernel of each retrieval is applied to the control run and the reanalysis. The units for the RMSE and bias are 10¹⁵ molec cm⁻².
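Applying a retrieval's averaging kernel before comparison can be sketched by assuming the common tropospheric-kernel form in which the model column seen by the retrieval is Σₗ Aₗ xₗ, with xₗ the model NO2 partial column in retrieval layer l (zero above the tropopause) and Aₗ the tropospheric column averaging kernel. The statistics then follow the caption: S-Corr as the Pearson correlation across grid cells, and the bias as model minus retrieval. Function names and values are hypothetical, and the interpolation of the model to the retrieval layers is not shown.

```python
import numpy as np


def column_with_kernel(kernel, partial_cols):
    """Model tropospheric column as seen by the retrieval: sum over layers of A_l * x_l.

    kernel       : (n_cells, n_layers) tropospheric column averaging kernel
    partial_cols : (n_cells, n_layers) model partial columns on the retrieval layers
    """
    return np.sum(kernel * partial_cols, axis=1)


def compare_columns(model, retrieved):
    """Spatial correlation (S-Corr), RMSE, and bias (model minus retrieval)."""
    diff = model - retrieved
    s_corr = np.corrcoef(model, retrieved)[0, 1]
    rmse = np.sqrt(np.mean(diff ** 2))
    return s_corr, rmse, diff.mean()


# Hypothetical example: 4 grid cells, 3 layers, columns in 10^15 molec cm^-2
partial = np.array([[0.5, 0.3, 0.2],
                    [2.0, 1.0, 0.5],
                    [4.0, 2.0, 1.0],
                    [0.2, 0.2, 0.1]])
kernel = np.array([[0.6, 1.0, 1.3]] * 4)    # hypothetical kernel values
model = column_with_kernel(kernel, partial)
retrieved = model + 0.5                     # retrieval uniformly 0.5 higher
s_corr, rmse, bias = compare_columns(model, retrieved)
print(s_corr, rmse, bias)
```

A uniform offset leaves S-Corr at 1 while producing a negative bias, which is why the table reports the spatial correlation and the bias separately.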

Table 6. … between the control/reanalysis calculations and the ozonesonde observations for 2005 in the SH (90–30° S), TR (30° S–30° N), and NH (30–90° N). Sensitivity reanalysis calculations were conducted by excluding the emission factors from the state vector (w/o emission), with TES O3 bias correction (TES-bias), without assimilation of TES measurements (w/o TES), and with the HTAP-v2 emission inventories for 2008 as the a priori surface emissions (HTAP).