A global historical ozone data set and prominent features of stratospheric variability prior to 1979

We present a vertically resolved zonal mean monthly mean global ozone data set spanning the period 1901 to 2007, called HISTOZ.1.0. It is based on a new approach that combines information from an ensemble of chemistry climate model (CCM) simulations with historical total column ozone information. The CCM simulations incorporate important external drivers of stratospheric chemistry and dynamics (in particular solar and volcanic effects, greenhouse gases and ozone depleting substances, sea surface temperatures, and the quasi-biennial oscillation). The historical total column ozone observations include ground-based measurements from the 1920s onward and satellite observations from 1970 to 1976. An off-line data assimilation approach is used to combine model simulations, observations, and information on the observation error. The period starting in 1979 was used for validation with existing ozone data sets and therefore only ground-based measurements were assimilated. Results demonstrate considerable skill from the CCM simulations alone. Assimilating observations provides additional skill for total column ozone. With respect to the vertical ozone distribution, assimilating observations increases on average the correlation with a reference data set, but does not decrease the mean squared error. Analyses of HISTOZ.1.0 with respect to the effects of El Niño–Southern Oscillation (ENSO) and of the 11 yr solar cycle on stratospheric ozone from 1934 to 1979 qualitatively confirm previous studies that focussed on the post-1979 period. The ENSO signature exhibits a much clearer imprint of a change in strength of the Brewer–Dobson circulation compared to the post-1979 period. The imprint of the 11 yr solar cycle is slightly weaker in the earlier period. Furthermore, the total column ozone increase from the 1950s to around 1970 at northern mid-latitudes is briefly discussed. Indications for contributions of a tropospheric ozone increase, greenhouse gases, and changes in atmospheric circulation are found. Finally, the paper points to several possible future improvements of HISTOZ.1.0.

Published by Copernicus Publications on behalf of the European Geosciences Union.
S. Brönnimann et al.: A global historical ozone data set


Introduction
Stratospheric ozone affects the radiation budget of the atmosphere and therefore needs to be incorporated in climate model simulations. Models usually require a two-dimensional (latitude, pressure) ozone distribution as a boundary condition. As the spatial distribution of ozone changes on different timescales related to external forcings, a transient ozone boundary condition is essential. While in the third Coupled Model Intercomparison Project (CMIP3) some models still used a time-invariant ozone climatology (Miller et al., 2006), current model simulations (CMIP5) mostly use a time-dependent ozone data set (Taylor et al., 2012). Since 1979, two- or three-dimensional ozone data sets have been available from merging different satellite (or other) observations (e.g. Stolarski and Frith, 2006; Hassler et al., 2008, 2013). For CMIP5 simulations starting in 1850, a global monthly mean two-dimensional ozone data set was produced whose temporal variability was described using a regression approach based on stratospheric chlorine and the 11 yr solar cycle (Cionni et al., 2011). This data set was designed for coupled ocean-atmosphere simulations and consequently does not include interannual variability that arises due to unforced climate variability.
For atmospheric general circulation models (AGCMs), a vertically and meridionally resolved ozone data set would be desirable that is as close to observations as possible and hence also reflects interannual variability, which may arise, e.g. from changes in sea surface temperatures (SSTs). For instance, the response of a climate model to a forcing such as volcanic eruptions may depend on the prescribed ozone field (e.g. Muthers et al., 2013). Moreover, for analysing stratospheric variability prior to 1979 and its relation to SSTs or external forcing, a vertically and meridionally resolved, historical ozone data set would also be valuable.
Here we present an approach for producing a monthly mean zonal-mean vertically resolved global historical ozone data set based on chemistry climate model simulations and historical (ground-based and satellite-based) total column ozone observations using data assimilation techniques. The main aim of this paper is to discuss the approach and point to possible improvements. The product, termed HISTOZ.1.0, is presented and analysed with respect to prominent variability prior to 1979, focusing on El Niño-Southern Oscillation, the 11 yr solar cycle, and the stratospheric ozone trend from 1957 to 1970.
The paper is organised as follows. Section 2 describes the data used and Sect. 3 gives an overview of the approach. In Sect. 4 we briefly report on the ancillary processing and analysis steps, with further details in the Supplement. In Sect. 5 we present HISTOZ.1.0 and discuss validation results in the post-1979 period. Some prominent variations in the global ozone field such as those due to El Niño-Southern Oscillation (ENSO) or the 11 yr solar cycle as well as the apparent ozone increase between the 1950s and around 1970 are analysed in Sect. 6. Conclusions are drawn in Sect. 7.

Model simulations
In this study we use an initial condition ensemble of nine simulations from 1901 to 1999 that was performed with the chemistry-climate model SOCOL Version 2 (Schraner et al., 2008). The simulations are described in Fischer et al. (2008b). SOCOL is a combination of the middle atmosphere version of the ECHAM4 spectral AGCM (Manzini and McFarlane, 1998) and the chemistry-transport model MEZON (Egorova et al., 2003). It was run with a spectral truncation of T30 and 39 levels on a hybrid sigma-pressure coordinate system, with the model top at 0.01 hPa. A hybrid numerical advection scheme is used for transporting chemical species (Zubov et al., 1999), with the Prather advection scheme in the vertical (Prather, 1986) and the semi-Lagrangian scheme in the horizontal direction (Williamson and Rasch, 1989). The model participated in the CCMVal validation experiments (SPARC CCMVal, 2010) and the C20C intercomparisons for AGCMs (e.g. Scaife et al., 2008). Further comparisons are performed in Sect. 4.
The model was constrained at the boundaries with monthly SSTs and sea ice from the HadISST1 data set (Rayner et al., 2003) and land-surface properties based on Hagemann (2002). Solar variability was prescribed using spectral solar irradiance data from Lean (2000). Greenhouse gases and organic chlorine and bromine containing gases were prescribed in the lowermost five layers. In addition, surface emissions of CO and NO$_x$ were prescribed (see Fischer et al., 2008b for details). Stratospheric aerosols were taken from Sato et al. (1993). The model set-up included a nudging of the quasi-biennial oscillation (QBO) using a preliminary version of the QBO reconstruction by Brönnimann et al. (2007b).

Ground-based total column ozone observations
Ground-based measurements of total column ozone have been made since the 1920s (for a review see Brönnimann et al., 2003a). In 1957, the start of the International Geophysical Year (IGY), a global standardized network, employing techniques superior to what had previously been available, was established. Hence we distinguish pre-IGY and post-IGY periods in the following analysis.
For the pre-IGY period, we use all series described by Brönnimann et al. (2003a), comprising several long, homogenised, and semi-continuous series such as those from Arosa since 1926 (Staehelin et al., 1998), Oxford, 1924-1975 (Vogler et al., 2007), Tromsø since the 1930s (Hansen and Svenøe, 2005), Dombås and Oslo from the 1940s (Svendby, 2003), and Spitsbergen, 1950-1963 (Vogler et al., 2006), as well as several shorter series. The latter were re-evaluated as described in Brönnimann et al. (2003b). Those historical data series qualified as "poor" in Brönnimann et al. (2003b) were excluded. In addition, measurements from Gulmarg and Srinagar were excluded due to their poor seasonal coverage. Despite their rather low quality, total column ozone data derived from measurements by the Smithsonian Institution at Table Mountain, California (Brönnimann, 2005), were used because they constitute an important source of information for the 1920s and 1930s. For the post-IGY period, data were obtained from the World Ozone and Ultraviolet Radiation Data Centre (WOUDC). We chose measurement series with at least 5 complete years prior to 1979 and few instrument changes. In all, total column ozone series at 57 locations were used (Fig. 1, Table S1). All selected series were also used after 1978 for testing and validation purposes.
Monthly means were calculated as the average of the available observations if at least ten daily values were available. Moreover, at least eight monthly means must be available per sequence (i.e. per series obtained with the same instrument type and observation mode, see Table S1), otherwise the series were discarded. This criterion was necessary to retain enough degrees of freedom for the bias adjustment. Figure 2 gives the number of monthly mean values. The first observations became available in 1925. Until about 1950, measurements from around five sites are available in each month. This number then increases to around ten between 1950 and 1957 and to around 20 to 35 from the IGY onwards.
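The two selection thresholds above can be illustrated with a minimal sketch. It assumes the daily values have already been grouped by (year, month) for one sequence; the function names and the dictionary layout are hypothetical and are not taken from the processing code behind HISTOZ.1.0.

```python
from statistics import mean

MIN_DAILY_VALUES = 10   # threshold stated in the text
MIN_MONTHLY_MEANS = 8   # threshold stated in the text


def monthly_means(daily):
    """Average daily values per (year, month); keep months with enough data.

    `daily` maps (year, month) -> list of daily total column ozone values (DU).
    This layout is an assumption made for the sketch.
    """
    return {
        key: mean(values)
        for key, values in daily.items()
        if len(values) >= MIN_DAILY_VALUES
    }


def keep_sequence(monthly):
    """A sequence is retained only if it provides enough monthly means."""
    return len(monthly) >= MIN_MONTHLY_MEANS
```

A sequence here stands for one series obtained with the same instrument type and observation mode; a station with an instrument change would be passed through these functions once per sequence.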

Backscatter ultraviolet (BUV) observations
Total column ozone was measured from space starting in 1970 with the backscatter ultraviolet (BUV) instrument onboard Nimbus-4 (Heath et al., 1973; Stolarski et al., 1997). Due to battery failure, coverage decreased after the first two years, but data were retrieved until 1976. The data quality is inferior to later missions and the data have only rarely been used in publications. Here we use reprocessed BUV data (Stolarski and Frith, 2006). Comparisons with ground-based total column ozone data were performed to check their quality (see Supplement). We make use of monthly zonal mean total column ozone in 5° latitude bins. Figure 2 shows the number of available monthly means from BUV. Note that whenever a zonal mean value from BUV was available, no ground-based data from the corresponding 5° latitude belt were assimilated (but all ground-based data were used for the pre-processing steps described in Sect. 4).

Auxiliary observation-based data
Additional data sets were used in various pre-processing steps. With respect to ozone, we used the monthly BDBP vertically resolved ozone data set (Bodeker et al., 2013), 1979-2007, as a reference. In addition, Version 8 TOMS satellite total column ozone data were used for validation and, in some cases (see Table S1), for calibrating the correction function for the zonal mean adjustment (see Sect. 4 and Supplement). Note that no TOMS total column ozone data were assimilated. For comparison purposes we also use total column ozone from two reanalyses, namely from ERA-40 from 1957 to 2002 (Uppala et al., 2005) and from the twentieth century reanalysis (20CR, Compo et al., 2011) from 1901 to 2010.
For estimating zonal mean total column ozone from a station, we used 200 hPa GPH data from ERA-40 from 1957 to 2002 in a regression framework (see Sect. 4.3). ERA-40 was supplemented back in time by statistically reconstructed upper-level fields from Griesser et al. (2010) which are based on historical upper-air and surface data. Additionally, for one analysis we also include 200 hPa GPH from NCEP/NCAR reanalysis (Kistler et al., 2001).

Figure 2. Number of monthly mean values assimilated in the assimilation period and the validation period (1979-1999). Ground-based and BUV data are shown in light and dark grey shading, respectively.

Method
Our method starts from the modelled ozone distribution and uses a data assimilation approach to correct the model fields according to the sparse historical data. The resulting estimates are both physically consistent and consistent with the observations. In this Section we outline the principal steps of the approach.
Data assimilation combines information from observations with numerical models. It implies that the models have predictability on the timescales considered. In numerical weather prediction or reanalysis applications, assimilation time steps are short (hours) and predictability mainly comes from the initial conditions, i.e. the model is used to generate a short-term forecast, which is then corrected according to observations. The corrected fields are then used as new initial conditions for the next forecast.
In our application we use long assimilation time steps (seasons, although the data are monthly, see Sect. 3.2). There is hardly any predictability on a seasonal scale from initial conditions. However, on a seasonal scale, total column ozone is strongly influenced by factors that are prescribed in the model such as sea surface temperatures, the quasi-biennial oscillation, or volcanic aerosols. Our simulations thus have seasonal predictability from the boundary conditions. In fact, Fischer et al. (2008b) show that a considerable fraction of the observed, zonally averaged total column ozone variability is captured by the SOCOL simulations based solely on the boundary conditions.
Since predictability comes from the boundary conditions, there is no need to update the initial conditions. This allows using pre-computed simulations in an off-line approach, which has many advantages. For instance, the procedure can be optimised and observations or error information can be updated without the need to repeat the simulations.

Ensemble square root filter
The basis of our approach (sketched schematically in Fig. S1) is the off-line data assimilation procedure described by Bhend et al. (2012). The approach aims to find an optimal state (in the model space), termed "analysis" or $x^a$, starting from a "background" model state $x^b$ and adding a correction that depends on the observations $y$. The observation error covariance matrix $\mathbf{R}$ determines the weight given to each observation. The background error covariance matrix $\mathbf{P}^b$ specifies how deviations in each element of the model state vector are related to deviations in each other element and thus determines how the information from the observations is spread into model space.
We use the ensemble square root filter (EnSRF, Whitaker and Hamill, 2002), a variant of the ensemble Kalman filter (EnKF, e.g. Evensen, 2003), to determine the correction (see Bhend et al., 2012 for the following). In our application, $x^b$, a vector of length $m$, denotes the monthly mean ozone at all grid points (latitudes, levels) for a given time from one simulation (one member) in an initial-condition ensemble simulation. Each member, as well as the ensemble mean, is corrected using the observations $y$ (all available monthly means for the corresponding time), a vector of length $n$. In the EnSRF, $x^b$ is decomposed into the ensemble mean $\bar{x}^b$ and the deviations therefrom ($x'^b$). Similarly, the correction is separated into an ensemble mean correction (Eq. 1), which is identical to the EnKF correction, and a correction of the anomalies from the ensemble mean (Eq. 2):

$$\bar{x}^a = \bar{x}^b + \mathbf{K}\,(y - \mathbf{H}\bar{x}^b) \qquad (1)$$

$$x'^a = x'^b - \tilde{\mathbf{K}}\,\mathbf{H}x'^b \qquad (2)$$

where $\mathbf{H}$, a matrix of size $n \times m$, is the Jacobian matrix of the linear observation operator that mimics the observations from the model state $x$ (see Sect. 3.3). The Kalman gain matrix $\mathbf{K}$ ($m \times n$) is identical to the gain matrix in the classical EnKF approach (Eq. 3):

$$\mathbf{K} = \mathbf{P}^b\mathbf{H}^T\,(\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R})^{-1} \qquad (3)$$

The gain matrix for the ensemble anomalies is expressed as follows (Eq. 4):

$$\tilde{\mathbf{K}} = \mathbf{P}^b\mathbf{H}^T \left[\left(\sqrt{\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R}}\right)^{-1}\right]^T \left(\sqrt{\mathbf{H}\mathbf{P}^b\mathbf{H}^T + \mathbf{R}} + \sqrt{\mathbf{R}}\right)^{-1} \qquad (4)$$

where $\mathbf{P}^b$ is the $m \times m$ background error covariance matrix and $\mathbf{R}$ is the $n \times n$ observation error covariance matrix (Sect. 4 and Supplement). We assume that the observation errors are uncorrelated ($\mathbf{R}$ is diagonal), which allows a serial incorporation of each observation (i.e. in each step $\mathbf{R}$ is a scalar).
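For a single observation, the matrices in Eqs. (3) and (4) collapse to scalars, and the anomaly gain becomes the mean gain multiplied by a factor $\alpha = 1/(1 + \sqrt{R/(\mathbf{H}\mathbf{P}^b\mathbf{H}^T + R)})$ (Whitaker and Hamill, 2002). The following sketch shows one such serial step in plain Python. It is an illustration under stated assumptions rather than the code used for HISTOZ.1.0: the function name and list-based layout are hypothetical, and $\mathbf{P}^b$ is taken here from the raw ensemble covariance without the localisation described in Sect. 4.

```python
import math


def ensrf_update(ensemble, h, y, r):
    """Assimilate one scalar observation y (error variance r) into an ensemble.

    ensemble: list of k state vectors (lists of length m)
    h: row of the observation operator (list of length m)
    Returns the updated ensemble as a list of k lists.
    Assumption: P^b is the raw, unlocalised ensemble covariance.
    """
    k, m = len(ensemble), len(h)
    mean_b = [sum(member[i] for member in ensemble) / k for i in range(m)]
    anomalies = [[member[i] - mean_b[i] for i in range(m)] for member in ensemble]

    # Project ensemble mean and anomalies into observation space.
    hx_mean = sum(h[i] * mean_b[i] for i in range(m))
    hx_anom = [sum(h[i] * a[i] for i in range(m)) for a in anomalies]

    # P^b H^T (vector) and H P^b H^T (scalar) from the ensemble.
    pht = [
        sum(a[i] * ha for a, ha in zip(anomalies, hx_anom)) / (k - 1)
        for i in range(m)
    ]
    hpht = sum(ha * ha for ha in hx_anom) / (k - 1)

    gain = [p / (hpht + r) for p in pht]              # K for a scalar observation
    alpha = 1.0 / (1.0 + math.sqrt(r / (hpht + r)))   # K-tilde = alpha * K

    mean_a = [mean_b[i] + gain[i] * (y - hx_mean) for i in range(m)]
    return [
        [mean_a[i] + a[i] - alpha * gain[i] * ha for i in range(m)]
        for a, ha in zip(anomalies, hx_anom)
    ]
```

Calling the function repeatedly, once per observation and each time on the ensemble returned by the previous call, gives the serial assimilation that the diagonal $\mathbf{R}$ makes possible.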

State vector
In most data assimilation approaches, the vector $x$ describes the model state at a given time. In our off-line data assimilation procedure, $x$ is not used as initial condition, and hence there is no need to consider the entire model state. In our case, $x$ comprises only zonal mean ozone concentrations.
The product is generated at monthly resolution. However, as ozone at middle to high latitudes has strong month-to-month correlation during the summer months of the respective hemisphere (Fioletov and Shepherd, 2003), observations in a given month were allowed to affect neighbouring months in the same season by combining six consecutive months into one state vector $x$. In other words, $x$ contains the monthly and zonally averaged ozone number densities from all grid cells for six months. Sudden steps due to gaps in individual observation records are expected to be smoothed and the sparse information is exploited more fully. Based on the correlation tables in Fioletov and Shepherd (2003), the six-month seasons are defined as May to October and November to April. Correlations between the seasons are low so that this procedure is not expected to introduce large steps at the transitions of the seasons.

Observation operator
The function $H(x)$ transforms the model space into the observation space. For a given observation $y$:

$$H(\mathbf{x}) = \frac{1}{c}\sum_{x \in \mathbf{x}} \max\!\left(0,\; 1 - \frac{|\varphi(x) - \varphi(y)|}{\Delta\varphi}\right) \Delta z(x)\; \mathrm{mon}(x)\; x$$

where $\varphi(x)$ is the latitude of element $x$ of $\mathbf{x}$, $\Delta\varphi$ is the grid spacing, and $\varphi(y)$ is the latitude of the observation (i.e. the term $\max(0, 1 - |\varphi(x) - \varphi(y)|/\Delta\varphi)$ provides the weights of a linear interpolation), $\Delta z(x)$ is the layer thickness and the constant $c = 2.6868 \times 10^{16}$ molec cm$^{-2}$ converts the result to Dobson Units (DU). The function $\mathrm{mon}(x)$ selects the month, i.e. it equals 1 (else 0) if element $x$ pertains to the same month as $y$. The layer thickness is calculated from geopotential height (GPH) from the ensemble mean. For simplicity (and after finding only a very small dependence on interannual variability), the GPH profile from 1960 was used for all years.
As $H$ expresses a linear combination of the elements of $\mathbf{x}$, the Jacobian matrix of $H$, termed $\mathbf{H}$, consists of its coefficients:

$$\mathbf{H}_x = \frac{1}{c}\,\max\!\left(0,\; 1 - \frac{|\varphi(x) - \varphi(y)|}{\Delta\varphi}\right) \Delta z(x)\; \mathrm{mon}(x)$$

In our implementation, $\mathbf{H}$ is a vector (because observations are assimilated serially) and its scalar product with $\mathbf{x}$ gives the integrated and interpolated ozone column for the latitude and month of the observation.
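The construction of one row of the observation operator can be sketched as follows. The sketch assumes that the state vector holds number densities in molec cm$^{-3}$ and that layer thicknesses are given in cm, so that the vertical sum is in molec cm$^{-2}$ before division by $c$; these units, the function name and the argument layout are assumptions made for illustration.

```python
DU_IN_MOLEC_PER_CM2 = 2.6868e16  # constant c from the text


def observation_row(lats, months, thickness_cm, obs_lat, obs_month, dlat):
    """Return the row h such that sum(h[i] * x[i]) is the ozone column in DU.

    lats, months, thickness_cm: latitude (deg), month index and layer
    thickness (cm) of every element of the state vector x, which is assumed
    to contain number densities in molec cm^-3.
    """
    row = []
    for lat, month, dz in zip(lats, months, thickness_cm):
        weight = max(0.0, 1.0 - abs(lat - obs_lat) / dlat)   # linear interpolation
        same_month = 1.0 if month == obs_month else 0.0      # mon(x)
        row.append(weight * same_month * dz / DU_IN_MOLEC_PER_CM2)
    return row
```

Because the weights vanish beyond one grid spacing, each observation only touches the two latitude rows that bracket it, in the month it belongs to.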

Data processing and auxiliary steps
Before the sketched approach can be applied, many auxiliary steps are necessary, which are briefly summarized here; each paragraph in this Section corresponds to a Section in the Supplement, which gives a detailed technical description.

Model assessment
As already discussed and shown by Fischer et al. (2008b), the modelled total column ozone agrees well with observations. However, further analyses are necessary to assure the physical consistency of the modelled fields, since it affects the covariance structure, which has an important effect on the outcome. In a regression approach, we analysed the response of the zonal mean ozone fields to several important forcing factors. Results are shown in the Supplement and demonstrate that SOCOL reproduces the expected effects of the QBO, volcanic eruptions, solar irradiance changes, and ENSO in a statistical sense and in a physically and chemically consistent manner. The assessment also shows, however, that the annual cycle at 25 hPa and higher levels does not agree well with observations (BDBP) in the extratropics (see Fig. 3).

Debiasing of model data
Ozone in SOCOL has a seasonally varying bias relative to the BDBP data set. This bias was removed by means of a regression model calibrated in the overlapping period 1979-1999 in each calendar month (see Supplement). The results (Fig. 3) show that the debiasing brings the time series closer to the observations and again shows good agreement with respect to interannual variability, but trends differ in some cases. The procedure also resolves, in a statistical sense, the problem identified in the previous paragraph concerning the mismatch of the extratropical annual ozone cycle at higher levels as compared to observations.

Adjustment of ground-based total column ozone to zonal means
The ground-based total column ozone data represent a given location, whereas the assimilation requires zonal averages.
Because station data are too sparse to form averages over latitude bands, the data from each station were adjusted to represent the zonal mean total column ozone of the corresponding latitude. The adjustment was performed based on 200 hPa GPH, assuming that deviations of local total column ozone from zonal mean total column ozone behave similarly to deviations of local 200 hPa GPH from zonal mean 200 hPa GPH. Previous studies have found a very close relation between upper-level GPH and total column ozone, specifically at mid-latitudes (e.g. Steinbrecht et al., 1998; Brönnimann et al., 2000) and this relation was also found to be strong in historical data (Vogler et al., 2007; Brönnimann and Compo, 2012). Although the procedure works relatively well, it clearly adds uncertainty that needs to be taken into account when specifying the errors (Fig. S3).

Debiasing of total column ozone observations
Recall that the model data were debiased with respect to BDBP in the overlapping period. A similar debiasing is necessary for the total column ozone observations (both ground-based and BUV satellite data). We applied $H$ to the debiased model data $x^b$ and calculated the difference from observations, i.e. $y - H(x^b)$. The first harmonic of the seasonal cycle of this difference was then used to correct the observations.
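One possible reading of this step is sketched below. It assumes that the correction consists of the mean offset plus the annual (first) harmonic of the mean seasonal cycle of $y - H(x^b)$, and that every calendar month is represented in the series; both points are assumptions, as are the function names. The exact procedure is described in the Supplement.

```python
import math


def first_harmonic(monthly_cycle):
    """Mean plus annual harmonic of a 12-value mean seasonal cycle."""
    n = len(monthly_cycle)  # expected to be 12
    angles = [2.0 * math.pi * m / n for m in range(n)]
    a0 = sum(monthly_cycle) / n
    a1 = 2.0 / n * sum(c * math.cos(t) for c, t in zip(monthly_cycle, angles))
    b1 = 2.0 / n * sum(c * math.sin(t) for c, t in zip(monthly_cycle, angles))
    return [a0 + a1 * math.cos(t) + b1 * math.sin(t) for t in angles]


def debias(obs, model_equiv, month_index):
    """Subtract the fitted harmonic of (obs - model_equiv) from obs.

    obs, model_equiv: equally long lists of y and H(x^b);
    month_index: calendar month 0..11 of each entry.
    Assumption: every calendar month occurs at least once.
    """
    diffs = [[] for _ in range(12)]
    for y, hx, m in zip(obs, model_equiv, month_index):
        diffs[m].append(y - hx)
    cycle = [sum(d) / len(d) for d in diffs]
    correction = first_harmonic(cycle)
    return [y - correction[m] for y, m in zip(obs, month_index)]
```

Restricting the correction to a smooth harmonic, rather than twelve independent monthly offsets, uses only three fitted parameters per sequence, which fits the earlier remark about retaining degrees of freedom for the bias adjustment.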

Construction of observation error covariance matrix
$\mathbf{R}$ is a diagonal matrix that contains the observation error variances. We assume that the error variance $\sigma^2_{\mathrm{obs}}$ consists of three additive terms: the error variance of a zenith observation, an air mass dependent error variance, and an error variance due to an insufficient adjustment to zonal means (see Supplement). The error of a zenith observation was determined based on metadata and literature information (with standard deviations between 4 and 20 DU). The air mass dependent error was estimated based on a statistical analysis of BUV total column ozone and co-located ground-based total column ozone data. It was estimated as 2 % per unit air mass for BUV and 0.5 % for ground-based total column ozone (assuming solar noon observations). Finally, the error due to an insufficient adjustment to zonal means was estimated from applying the adjustment procedure to TOMS spatially resolved total column ozone data and comparing to the TOMS zonal mean. This error was additionally weighted to account for the fact that neighbouring stations have correlated errors (see Supplement).

Background error covariance
The background covariance matrix $\mathbf{P}^b$ was estimated from the ensemble covariance matrix. Due to the small ensemble size, spurious correlations off the diagonal may occur, which affect the results. The covariance matrix was therefore localised in latitude (leaving the altitude dimension unaltered), i.e. covariances between distant latitudes were reduced towards zero using a distance weighting. Because ozone is strongly affected by the Brewer–Dobson circulation (BDC), which shows a distinct, seasonally dependent, latitudinal structure, the distance weighting itself depends on season and latitude (see Supplement, Fig. S4). A localisation was also applied in time, using a length scale of 3 months (consistent with Fioletov and Shepherd, 2003).
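The general idea of localisation can be sketched as an element-wise product of the ensemble covariance with a distance-dependent taper. The sketch below uses a Gaussian taper with a single fixed length scale in latitude, which is a simplifying assumption: the weighting actually used depends on season and latitude and is given in the Supplement. The function name is hypothetical.

```python
import math


def localise(cov, lats, length_scale_deg):
    """Element-wise (Schur) product of cov with a Gaussian latitude taper.

    cov: m x m covariance as a list of lists; lats: latitude (deg) of each
    state element. The Gaussian form and the fixed length scale are
    assumptions made for this sketch.
    """
    m = len(lats)
    return [
        [
            cov[i][j]
            * math.exp(-0.5 * ((lats[i] - lats[j]) / length_scale_deg) ** 2)
            for j in range(m)
        ]
        for i in range(m)
    ]
```

Elements at the same latitude but different altitudes have zero latitude distance and therefore keep their full covariance, which corresponds to leaving the altitude dimension unaltered.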

Assessment of observation errors
The consistency of errors was assessed monthly by comparing the variance of differences between observations and model $y - H(x^b)$ with the sum of the variances of the observation error, $\sigma^2_{\mathrm{obs}}$, and the background error (i.e. the ensemble variance $\sigma^2_{\mathrm{ens}}$, see Supplement). The variance of the differences was slightly higher than expected, but in general our error estimates that are based on metadata and independent analyses are broadly consistent with the expected differences. Further improvements on the side of the observations (debiasing, homogenisation, outlier screening) might be beneficial.
All processing steps are summarized, using Rome as an example site, in Fig. 4 (top, note that November-April averages are shown here). The zonal correction of the observations (orange) removes some of the variance of the original raw series (black). The subsequent debiasing of the observations (blue) does not change the variance but brings the curve closer to $H(x^b)$ (dashed). The middle panel shows debiased observations (solid) as well as $H(x^b)$ (dashed, November-April averages) for three sites (Rome, Nashville, Tateno) and one BUV series (35-40° N). All series represent similar latitudes. As a consequence, $H(x^b)$ is quite similar for all sites. The zonal correction should bring the three observation-based series (which are at very different longitudes) closer together. Indeed, the mutual correlations increase from 0.6-0.7 to 0.65-0.8, but differences remain. BUV shows an outlier (winter 1975), which is related to an observation error. The procedure rejects observations outside $\pm 3(\sigma^2_{\mathrm{obs}} + \sigma^2_{\mathrm{ens}})^{0.5}$, hence the erroneous value is not assimilated.
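The rejection criterion can be written as a one-line check. The sketch assumes that the interval is centred on the model equivalent $H(x^b)$, i.e. that it is applied to the difference $y - H(x^b)$; the function name and argument names are hypothetical.

```python
import math


def is_rejected(y, hx_b, var_obs, var_ens, n_sigma=3.0):
    """True if y - H(x^b) lies outside +/- n_sigma * sqrt(var_obs + var_ens).

    Assumption: the tolerance band is centred on the model equivalent hx_b.
    """
    return abs(y - hx_b) > n_sigma * math.sqrt(var_obs + var_ens)
```

With an observation error variance of 100 DU$^2$ and an ensemble variance of 44 DU$^2$, for example, the band is ±36 DU, so a value 50 DU away from $H(x^b)$ would be rejected while one 30 DU away would be kept.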

The HISTOZ.1.0 data
The assimilation corrects the zonally averaged background ozone field in such a way as to best match all adjusted total column ozone observations. Corrections are usually largest at the locations of observations and at altitudes of high ozone variability. However, large corrections sometimes also occur in polar regions, even though little information is available (see Fig. S5 for four example months). In addition, corrections of opposite sign are sometimes found equatorward of the assimilated information, pointing to the important role of $\mathbf{P}^b$.
Figure 4 (bottom) shows results for the zonally adjusted, debiased total column ozone average from Rome, Nashville, Tateno and the corresponding series from $\bar{x}^a$ and $\bar{x}^b$ (and $x^a$ and $x^b$). It is apparent that $\bar{x}^a$ is much closer to the observations than $\bar{x}^b$. The same holds for $x^a$ and $x^b$. Correlations with observations increase from 0.69 to 0.90. The ensemble spread also decreases considerably. The observations very often lie outside the ensemble spread (assuming uncorrelated errors, we estimated the observation error in Fig. 4 at around 2.5 DU) and hence the model might be overconfident. No sudden jumps in the corrections are found during the time of season transition.
The HISTOZ.1.0 ensemble mean at 25.1 hPa is compared to the SPARC data set (Cionni et al., 2011) (interpolated to 25.1 hPa) and BDBP (from 1979 onward) (Fig. 5) in the form of latitude time cross sections. As expected, HISTOZ.1.0 shows clear interannual variability which is lacking in the Cionni et al. (2011) data by construction. The interannual variability in HISTOZ.1.0 is already seen in the first subperiod plotted (1901-1926, i.e. before observations are assimilated), generated by the model boundary conditions. In the last period, i.e. from 1979 onward, the agreement between HISTOZ.1.0 and BDBP is very good for the tropical and subtropical regions, but differences are found over the polar regions, especially Antarctica, where HISTOZ.1.0 shows lower minimum values than BDBP.

Validation using quasi-independent data
In the following, the agreement between HISTOZ.1.0 and quasi-independent data sets is assessed more quantitatively. Note that this is only possible for a validation period after 1979, where quasi-independent data of high quality are available. The pre-1979 part of HISTOZ.1.0 cannot be validated in this way; some comparisons with total column ozone from reanalyses for the early period are shown in the next subsection.
We use the reduction of error RE (see Cook et al., 1994) as a measure of skill. RE measures the squared differences between the candidate data set $x_c$ and "truth" $x_t$ (a validation data set) and compares this statistic with a "no knowledge prediction" $x_0$ such as a climatology:

$$\mathrm{RE} = 1 - \frac{\sum (x_c - x_t)^2}{\sum (x_0 - x_t)^2}$$

For $x_c = x_0$, RE is 0. Often $x_0$ has little or no variance. As a consequence, if $x_c$ has much more variance, it may lead to negative RE values even if $x_c$ correlates with $x_t$ (the RE measure tends to punish high variance). In general, positive RE values are assumed to indicate skill. In our application, we use the debiased model background ($x^b$ is used for evalu- [...] As "truth" we use TOMS for total column ozone and BDBP for the vertical ozone distribution. The analysis was performed for the period 1979-1999. During this period, total column ozone observations from around 30 stations (no satellite data) were assimilated, which is typical for the post-1957 period. Note that the validation period contains two volcanic eruptions, anthropogenically driven ozone depletion, and strong trends in atmospheric circulation (e.g. positive trends in the North Atlantic Oscillation) that are generally not well depicted by climate models (e.g. Scaife et al., 2008). It thus provides a rather strong test.
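The RE statistic in its standard form can be computed with a few lines of Python. The function and argument names are hypothetical; the sums run over all time steps of the validation period for one grid cell or latitude band.

```python
def reduction_of_error(candidate, truth, reference):
    """RE = 1 - sum((x_c - x_t)^2) / sum((x_0 - x_t)^2).

    candidate: x_c, truth: x_t, reference: the "no knowledge prediction" x_0.
    All three are equally long sequences of numbers.
    """
    sse_candidate = sum((c - t) ** 2 for c, t in zip(candidate, truth))
    sse_reference = sum((r - t) ** 2 for r, t in zip(reference, truth))
    return 1.0 - sse_candidate / sse_reference
```

A perfect candidate gives RE = 1, a candidate identical to the reference gives RE = 0, and a candidate with large excess variance can give negative values even when it is correlated with the truth, which is the penalty on variance mentioned above.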
The validation against TOMS zonal mean total column ozone shows mostly positive RE values (Fig. 6), indicating that HISTOZ.1.0 is mostly closer to TOMS than $x^b$. With respect to the 1979-1999 climatology of $x^b$, RE reaches values between 0.3 and 0.6, indicating the total skill of the product. The high skill in the equatorial zone is arguably due to the well-modelled effect (after the seasonal debiasing) of the QBO on ozone. Using $x^b$ as $x_0$, values are close to 0 in the tropics (where almost no station data are assimilated) and around 0.2-0.5 in the extratropics. This skill results entirely from assimilating observations. The skill for the ensemble mean is mostly higher than the skill for the individual ensemble members, but the latitudinal structure is very similar.
The validation of the vertical structure using BDBP as "truth" (here we only analyse the ensemble mean) shows much lower RE values (Fig. 7). There is skill relative to climatology, stronger on a seasonal mean basis than on a month-to-month basis. This suggests that most of the skill comes from the simulations, which show stronger responses to boundary conditions on seasonal-to-interannual than on month-to-month timescales. Relative to $x^b$ there is hardly any skill. Although the assimilation correctly increases or decreases the amount of ozone in the column for a given latitude, it may put the difference at a slightly wrong altitude or add variance, thus decreasing RE. Degrading the spatiotemporal resolution (seasonal mean, 10° latitude bins, 5 altitude levels) before calculating RE leads to slightly higher values (not shown).
In terms of correlation with BDBP on a seasonal mean basis, HISTOZ.1.0 performs slightly better than $x^b$ (Fig. 8), showing that the assimilation of observations slightly increases the correlation. This implies that the lack of skill in terms of RE values may be related to increased variance. The covariance matrix was not localised in the vertical dimension, hence applying a localisation might lead to improved results.
In all, the results show that HISTOZ.1.0 has considerable skill for total column ozone and for the seasonally averaged vertical ozone distribution. The skill in the vertical ozone distribution mainly comes from the simulations. Assimilating the observations slightly improves the vertical ozone distribution in terms of correlation, but does not reduce the mean squared errors. As only total column ozone is assimilated, skill in the vertical structure (with respect to $x^b$) can only come through the background error covariance matrix $\mathbf{P}^b$, pointing to the need to further improve the specification of this matrix. Another reason for low skill is that $x^b$ (used as $x_0$) was debiased with respect to the same data set (BDBP) and the same time period as used for validation (unlike the debiasing for $y$) with the result that its error is small by construction. The skill added by the observations is largest in the lower stratosphere of the mid-latitudes where dynamically induced ozone changes are large and observations are available.

Comparison with reanalysis data
Although no other observation-based ozone data set is available for comparison in the pre-1979 time period, it is interesting to compare total column ozone from HISTOZ.1.0 with independent total column ozone data from reanalyses. Note that the latter data themselves have large uncertainties. Here we use data from ERA-40 reanalyses (1957-1978) and 20CR (1924-1978). Both data sets have been compared with observations previously (Dethof and Hólm, 2004; Kunz et al., 2007; Brönnimann and Compo, 2012). Pre-satellite total column ozone in ERA-40 at northern mid-latitudes was found to be biased high, with a strong increase during the 1957-1978 period. For 20CR, a good correspondence with observations was found on the day-to-day scale at northern mid-latitudes (Brönnimann and Compo, 2012). In the following we focus on boreal winter total column ozone at mid-latitudes, which is where we expect the skill of all products to be largest. As almost no total column ozone observations had been assimilated prior to 1927, we use data from the winter 1927/28 onward.
For zonal mean total column ozone near 48° N (Fig. 9) we find a relatively good agreement between HISTOZ.1.0 and 20CR in terms of interannual variability. The Pearson correlation coefficient (r = 0.43) is statistically significant for the analysed period. This is due to the assimilated [...] (1952) and others, the trend peaking in 1970 by Komhyr et al. (1971), Johnston et al. (1973) and Goldsmith et al. (1973). The maxima have also triggered attention in recent years (e.g. Brönnimann et al., 2004; Fischer et al., 2008a, on the 1940-1942 and 1976 extrema) and were analysed in the context of ENSO. The total column ozone increase peaking in 1970 has been interpreted in terms of tropospheric ozone increases (Shindell and Faluvegi, 2002).
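The significance of a correlation such as the r = 0.43 between HISTOZ.1.0 and 20CR can be checked with a Fisher z-transform. This sketch rests on two assumptions that are not stated in the text: a sample of 51 winters (1927/28 to 1977/78) and independence between winters. The test actually applied by the authors is not specified.

```python
import math

def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the Fisher z-transform
    (normal approximation, independent samples assumed)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))

n_winters = 51  # assumption: winters 1927/28 to 1977/78
p = correlation_p_value(0.43, n_winters)
print(p)
```

Under these assumptions the p-value is below 0.01. Serial correlation in the winter means would reduce the effective sample size and weaken the result.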

Analyses
In this section three analyses of HISTOZ.1.0 are presented. The first two concern the effects of the El Niño-Southern Oscillation (Sect. 6.1) and of the 11 yr solar cycle (Sect. 6.2) on ozone. There is ample literature for both effects, but based primarily on the last 30 yr, which unfortunately also carry other strong external signals (volcanic eruptions, ozone depletion, greenhouse gases). HISTOZ.1.0 provides an opportunity to analyse these effects in an independent period that is less disturbed by other forcings. In Sect. 6.3 we also briefly revisit the total column ozone increase between 1957 and 1970.

El Niño-Southern Oscillation (ENSO)
It has been suggested that ENSO affects the distribution of ozone in the stratosphere by changing the strength of the BDC (e.g. Sassi et al., 2004; Brönnimann et al., 2004). The SOCOL model (Fischer et al., 2008a), as well as the simulations used here (Fischer et al., 2008b), have been assessed for the effects of ENSO on ozone (see also Fig. S2). Here we focus on particular, strong events. We start with the ENSO cycle 1939-1944, comprising the long-lasting El Niño event in 1940-1942 (Brönnimann et al., 2004) that was apparent in [...] boreal winter and a decrease in the tropics, but the differences vanish when analysing calendar year averages (Fig. 10a). In contrast, the observations show an ozone increase in the extratropics in the annual means (Brönnimann et al., 2004).

Fig. 9 (continued). [...] (orange dashed). (middle) November-to-April averages of zonal mean total column ozone near 48° N in four SOCOL ensembles: all forcings simulations (i.e. x_b, thick light grey), fixed greenhouse gas concentrations (fixed GHG, thin dark grey, scaled to match HISTOZ.1.0 in the 1928-1957 period), fixed ozone depleting substances (fixed ODS, purple) and fixed tropospheric ozone precursors (fixed CO/NOx, dark olive). (bottom) November-to-April averages of total column ozone near 48° N from HISTOZ (black) and zonal mean GPH (right scale) from different reanalysis data sets.
In HISTOZ.1.0, after assimilating the observations (Fig. 10b), the El Niño-La Niña difference is more pronounced. The meridional structure is consistent with a strengthening of the BDC. The difference (between HISTOZ.1.0 and x_b) of the El Niño-La Niña differences is shown in Fig. 10c, indicating the locations of stations whose data were assimilated. As expected, the assimilation increases ozone at latitudes where observations are available. Interestingly, an even more pronounced increase is found for the polar regions. Furthermore, the procedure decreases ozone in the tropics, although no observations are available from that region. This is consistent with a strengthening of the BDC, showing that the correction resulting from the assimilation is physically meaningful. The 1969/70 positive and 1975/76 negative excursions in Fig. 9 might also partly be due to ENSO events. The effect of the 1976 La Niña event on ozone was modelled in Fischer et al. (2008a). Here we contrast the 1969/70 El Niño event (which was only moderate) with the strong 1975/76 La Niña event by compositing differences in the annual means (1969-1970 minus 1975-1976). The results (Fig. 10, bottom row) again show a sign of an increase in strength of the BDC with El Niño events. An additional contribution may have come from the solar cycle (see Sect. 6.2).
Most previous studies on the ozone response to ENSO have focused on the period after 1979. However, two of the El Niño events during this period concurred with volcanic eruptions, which may obscure a clean attribution to ENSO, and not many other strong events have been observed. Using HISTOZ.1.0 we analysed the average ozone fields in January-March and composited all strong El Niño winters after 1934 (to ensure a minimum number of assimilated observations) minus strong La Niña winters using the list given in Brönnimann et al. (2007a) (see Table 1). The difference (Fig. 11a) shows a very prominent ozone signature with an increase in the northern extratropics and a decrease in the tropics (relative deviations are shown in Fig. S6). This pattern, reflecting an increase in strength of the BDC, is consistent with the literature (see Sassi et al., 2004; Manzini et al., 2006; Brönnimann et al., 2004; Fischer et al., 2008b). The composite of three El Niño and three La Niña events after 1979 from BDBP (Fig. 11b) shows a signal that is much less clear and even shows a decrease in northern high latitudes.
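The compositing step described above can be sketched as follows. The array layout, the function name and in particular the event years are hypothetical placeholders: they do not reproduce the list in Table 1, and the fields are random numbers rather than ozone data.

```python
import numpy as np

def composite_difference(fields, years, warm_years, cold_years):
    """Mean of `fields` over warm_years minus the mean over cold_years.

    fields: array (n_years, n_lat, n_lev) of January-March mean ozone
    years:  array (n_years,) labelling the first axis of `fields`
    """
    years = np.asarray(years)
    warm = fields[np.isin(years, warm_years)].mean(axis=0)
    cold = fields[np.isin(years, cold_years)].mean(axis=0)
    return warm - cold

# Synthetic fields on a hypothetical 36-latitude x 20-level grid.
years = np.arange(1934, 1979)
rng = np.random.default_rng(2)
fields = rng.normal(size=(years.size, 36, 20))

el_nino_winters = [1940, 1941, 1958]   # hypothetical placeholder years
la_nina_winters = [1943, 1950, 1956]   # hypothetical placeholder years
difference = composite_difference(fields, years, el_nino_winters, la_nina_winters)
print(difference.shape)
```

With so few events per composite, a single unusual winter can dominate the difference field, which is one reason a longer record is useful.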

The 11 yr sunspot cycle
Analyses have also been performed with respect to the effect of the 11 yr solar cycle. Rather than reconstructed total solar irradiance from Lean (2000), which is only annually resolved, we used the monthly International Sunspot Number record, available from the Royal Observatory of Belgium's website (http://sidc.oma.be/sunspot-data/SIDCpub.php), as the proxy for solar activity. After smoothing the series with a 24-month moving average, we chose periods of 24 consecutive months of maximum and minimum activity in the 11 yr sunspot cycle (Table 2), subdivided into pre-1979 (4 cycles, analysed in HISTOZ.1.0) and post-1979 periods (3 cycles, BDBP). Note that using Lean (2000) total solar irradiance, the same maxima and minima would be found within the uncertainty of the temporal resolution.
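One way to select such 24-month periods is sketched below for a single cycle. The exact selection rule used for Table 2 is not given in the text, so this is an assumption: the window with the highest (lowest) 24-month running mean is taken as the maximum (minimum). The series is an idealised cosine with a 132-month period, not the SIDC record, and the function names are illustrative.

```python
import numpy as np

def moving_average(x, window):
    """Running mean; element i is the mean of x[i : i + window]."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

def extreme_windows(series, window=24):
    """Start indices of the windows with the highest and lowest mean."""
    smooth = moving_average(series, window)
    return int(np.argmax(smooth)), int(np.argmin(smooth))

# Idealised monthly "sunspot number" covering one 132-month cycle.
months = np.arange(132)
sunspots = 80.0 - 70.0 * np.cos(2.0 * np.pi * months / 132.0)

start_max, start_min = extreme_windows(sunspots)
solar_max_months = months[start_max:start_max + 24]
solar_min_months = months[start_min:start_min + 24]
print(start_max, start_min)
```

For a multi-cycle record the same search would be applied within each cycle separately, so that one maximum and one minimum window are obtained per cycle.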
The post-1979 period shows an ozone increase throughout the stratosphere with increasing solar activity, strongest in the lower stratosphere (in number density) or upper stratosphere (in mixing ratio, see also Fig. S2). The average difference in x_b in the pre-1979 period (Fig. 12) shows a similar pattern, although the increase in the northern hemisphere is no longer uniform, and differences arise in the tropical tropopause region (the same plot is given in Fig. S6, with ozone changes expressed as percentage). The large increase near 40 km altitude is comparable with the literature (e.g. Hood and Soukharev, 2012) and can be explained by direct photochemical effects. Signatures in the tropical lower stratosphere are likely dynamical consequences of the primary upper-stratospheric signal. Compared to x_b, the signature in HISTOZ.1.0 is slightly weaker. Note that the El Niño period around 1970 corresponds to a solar maximum and the La Niña around 1976 to a solar minimum, which may have contributed to the signal found in the previous figure. In all, the new results confirm the findings from previous studies based on the post-1979 period.

The total column ozone increase from the 1950s to around 1970
The apparent total column ozone increase from the 1950s to around 1970 is briefly studied in this subsection. Shindell and Faluvegi (2002), highlighting the importance of this period for understanding radiative forcings, found that the apparent total column ozone increase mostly took place in the northern hemisphere and can be attributed to a tropospheric ozone increase. The vertical structure of the linear trend in annual mean HISTOZ.1.0 ozone from 1957 to 1970 (Fig. 13b) shows that the ozone increase originates in the lower and middle stratosphere, while the lowermost stratosphere shows a decrease. SOCOL has only a crude treatment of tropospheric chemistry and arguably underestimates regional tropospheric ozone production. Nevertheless, x_b shows a slight increase in tropospheric ozone over northern mid-latitudes (Fig. 13a). Compared to x_b, HISTOZ.1.0 amplifies the total column ozone trend (for mid-latitudes in boreal winter, see Fig. 9). This amplification is due to less negative trends in the lowermost stratosphere, but also an increase in tropospheric ozone poleward of 40° N (Fig. 13c). Given the low skill of HISTOZ.1.0 in the vertical distribution and given the crude tropospheric chemistry, it cannot be excluded that more of the total column ozone increase is related to tropospheric ozone increase.
Interestingly, a positive trend at 48° N is also found for total column ozone in 20CR (Fig. 9), which neither explicitly includes chemistry nor resolves changes in the BDC. A possible cause in 20CR might be decadal circulation changes near the tropopause. In fact, zonal mean 200 hPa GPH near 48° N in boreal winter shows a decrease in NCEP/NCAR reanalysis and ERA-40 (Fig. 9, bottom). Ozone has a very long lifetime in the lowermost stratosphere, so that even decadal scale changes in circulation may cause changes in the ozone distribution (see Steinbrecht et al., 1998). Decadal changes in atmospheric circulation from the 1950s to around 1970 thus might have contributed to the apparent total column ozone change. Thompson et al. (2010) have pointed to an abrupt change in Northern Hemispheric SSTs between 1968 and 1970, near the end of the period of the ozone increase. The relation between the ozone changes and climate trends thus remains to be further studied.
Shindell and Faluvegi (2002) find that increasing greenhouse gas concentrations may have led to a negative stratospheric ozone trend over the 1957-1970 period due to an increase in stratospheric water vapour. While a detailed attribution study is beyond the scope of this paper, we also analysed three small (3 members each) ensembles of SOCOL simulations that were performed in the same way as outlined in Sect. 2 except that one factor per ensemble was kept constant. In one set, greenhouse gases were fixed at their 1900 value, in a second set (starting 1950), ozone depleting substances (ODS) were fixed at their 1950 value, and in a third set (also starting 1950), tropospheric ozone precursor emissions (CO and NOx) were fixed at their 1950 values. Comparing total column ozone in these simulations (ensemble means) to x_b for 48° N in boreal winter (Fig. 9b) implies again a small contribution from tropospheric ozone formation, but a larger contribution from greenhouse gas changes. The comparison also shows that the ozone increase would have even been higher without ODS, which in the model already had an influence on ozone from the mid-1960s onwards.
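The comparison between the all-forcings ensemble and a fixed-factor ensemble can be expressed as a difference of linear trends. The sketch below assumes that the contributions add linearly and uses invented, perfectly linear series. The numbers are illustrative only and are not SOCOL output.

```python
import numpy as np

def linear_trend(years, values):
    """Least-squares slope of `values` against `years` (units per year)."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return slope

years = np.arange(1957, 1971)

# Invented winter-mean total column ozone (DU) at one latitude.
all_forcings = 350.0 + 1.0 * (years - 1957)   # stand-in for x_b
fixed_factor = 350.0 + 0.6 * (years - 1957)   # same set-up, one driver held constant

trend_all = linear_trend(years, all_forcings)
trend_fixed = linear_trend(years, fixed_factor)

# Trend attributed to the driver that was held constant,
# assuming the individual contributions are additive.
contribution = trend_all - trend_fixed
print(trend_all, trend_fixed, contribution)
```

With only three members per ensemble, such a difference carries substantial sampling uncertainty, so it indicates the sign and rough size of a contribution rather than a precise attribution.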
In summary, we find indications for three contributions to the total column ozone increase at northern mid-latitudes from the 1950s to around 1970: tropospheric ozone formation, climate effects of the greenhouse gas increase, and decadal changes in atmospheric circulation, all of which counteracted a negative trend expected from ozone depleting substances. Further studies involving HISTOZ.1.0, other data sets, and model simulations may help to elucidate the causes of the ozone increase in the 1957-1970 period more fully.

Conclusions and outlook
In this paper we have produced, validated, and analysed a two-dimensional ozone data set covering 1900-2008, termed HISTOZ.1.0, that can be used as a boundary condition for climate model simulations. The data set is based on chemistry-climate model simulations up to 1925 and an off-line ensemble Kalman filter approach, combining the simulations with historical total column ozone data afterwards. Independent validation suggests that the data set has high skill relative to a climatology (both for total column ozone and for the vertical ozone distribution). Relative to the model background, additional skill is found for total column ozone. For the vertical distribution, a slight increase in correlations with the validation data set is found as compared to the model background, but no decrease in the mean squared error. An analysis of the difference fields between HISTOZ.1.0 and the model background shows that the increments are physically meaningful. HISTOZ.1.0 supports analyses of the effects of ENSO and the 11 yr solar cycle on ozone in periods that have not previously been studied. Importantly, the 1930s to 1970s period studied here, unlike the satellite period that is studied in most other publications, is much less strongly affected by volcanic eruptions, ozone depletion, and greenhouse gas emissions. Our results largely confirm previous studies. In the case of ENSO, the signature is more robust than in the satellite period. A clear strengthening of the BDC during El Niño winters relative to La Niña winters is found. High solar irradiance elevates ozone throughout the stratosphere, again consistent with previous studies. Furthermore, we have addressed the increase in northern extratropical total column ozone from the 1950s to around 1970. Although a more detailed study is beyond the scope of this paper, we found indications for a contribution of tropospheric ozone, changes in atmospheric circulation, as well as greenhouse gas forcing, all of which counteracted a decreasing tendency due to ozone depleting substances.
Several shortcomings of HISTOZ.1.0 were identified, and improvements could be explored for future versions of the data set. These concern the number of ensemble members and the quality of the simulations themselves, as well as details in the set-up of the assimilation such as the debiasing of observations, the localisation of the background error covariance matrix, or the incorporation of a non-diagonal R matrix. Using the same general approach (outlined by Bhend et al., 2012), it might also be possible to assimilate upper-air information to better constrain ozone or to generate 3-dimensional ozone distributions.

Fig. 1. Map showing the locations of the ground-based total column ozone stations used in this study.

Fig. 2. Number of monthly mean values assimilated in the assimilation period (1925-1978) and validation period (1979-1999). Ground-based and BUV data are shown in light and dark grey shading, respectively.

Fig. 3. Comparison of zonal mean ozone at two levels and in three latitude regions from the original SOCOL simulations (red dotted), debiased SOCOL simulations (blue) and BDBP (thick black).

Fig. 4. Example of the pre-processing steps of monthly mean total column ozone observation series performed in this paper. Top: comparison of raw series (black), zonally adjusted series (orange), debiased series (blue solid), and ensemble mean background (debiased SOCOL, blue dashed) for the case of Rome. Middle: debiased observations (solid) and ensemble mean background (debiased SOCOL, dashed) for Rome, Tateno, Nashville, and BUV zonal mean data at 35-40° N. Bottom: comparison of debiased observations (black, grey shading indicates the estimated observation error), background (ensemble mean and members, red and orange) and the final HISTOZ.1.0 product (ensemble mean and members, dark and light blue) for the average of the three stations Rome, Tateno, Nashville.

Fig. 7. RE values for the zonal mean vertical ozone distribution derived from comparing HISTOZ.1.0 with BDBP (only for the ensemble mean) for boreal winter and summer periods.

Fig. 8. Difference in the Pearson correlation coefficient between r(HISTOZ, BDBP) and r(x_b, BDBP) for the ensemble mean for boreal winter and summer averages.

We contrast the mean ozone values for the El Niño event 1940-1942 with the averages of the years 1939, 1943 and 1944, during which La Niña was present. The model correctly simulates an increase in ozone in the extratropics in [...]

Fig. 9. (top) November-to-April averages of zonal mean total column ozone near 48° N in HISTOZ.1.0, x_b (thick light grey), 20CR (green dashed, scaled to match HISTOZ.1.0 in the 1958-1978 period), and [...]

Figure 10. Annual mean zonal mean ozone differences for El Niño and La Niña for (a, d) the background, (b, e) HISTOZ.1.0, and (c, f) the difference HISTOZ.1.0 minus background. Arrows in (c) indicate locations where observations were assimilated. The upper row shows the differences between 1940-1942 and the years 1939, 1943, and 1944; the lower row shows the difference between the years 1969-1970 and 1975-1976.

Figure 11. Differences in Jan-Mar averages of zonal mean ozone for El Niño minus La Niña winters (see Table 3) for the pre-1979 period (a, based on HISTOZ.1.0) and the post-1979 period (b, BDBP). Percentage deviations are shown in Fig. S10.

Figure 12. Zonal mean ozone differences for solar maximum minus solar minimum (see Table 4) prior to 1979 in (a) the background, (b) HISTOZ.1.0 and (c) the difference HISTOZ.1.0 minus background. Panel (d) shows corresponding differences in the post-1979 period (BDBP). Percentage deviations are shown in Fig. S10.

Table 2. Definition of periods of solar maxima and minima. We used January-December 2007 for the last minimum as BDBP ends in 2007.