Atmospheric Chemistry and Physics Open Access Atmospheric Measurement Techniques Validation of ozone monthly zonal mean profiles obtained from the version 8.6 Solar Backscatter Ultraviolet algorithm

Abstract. We present the validation of ozone profiles from a number of Solar Backscatter Ultraviolet (SBUV and SBUV/2) instruments that were recently reprocessed using an updated (version 8.6) algorithm. The SBUV data record spans a 41 yr period from 1970 to 2011 with a 5 yr gap in the 1970s. The ultimate goal is to create a consistent, well-calibrated data set of ozone profiles that can be used for climate studies and trend analyses. SBUV ozone profiles have been intensively validated against satellite profile measurements from the Microwave Limb Sounders (MLS) (on board the UARS and Aura satellites) and the Stratospheric Aerosol and Gas Experiment (SAGE II) and ground-based observations from the microwave spectrometers, lidars, Umkehr instruments and balloon-borne ozonesondes. In the stratosphere between 25 and 1 hPa the mean biases and standard deviations are mostly within 5% for monthly zonal mean ozone profiles. Above and below this layer the vertical resolution of the SBUV algorithm decreases. We combine several layers of data in the troposphere/lower stratosphere to account for the lower resolution. The bias in the SBUV tropospheric/lower stratospheric combined layer relative to similarly integrated columns from Aura MLS, ozonesonde and Umkehr instruments varies within 5%. We also estimate the drift of the SBUV instruments and their potential effect on the long-term stability of the combined data record. Data from the SBUV instruments that collectively cover the 1980s and 2000s are very stable, with drifts mostly less than 0.5% per year. The features of individual SBUV(/2) instruments are discussed and recommendations for creating a merged SBUV data set are provided.


Introduction
As levels of chlorine and other ozone-depleting constituents in the atmosphere stabilize and start to fall, we expect stratospheric ozone values to begin to recover. Atmospheric chemistry models predict a slow recovery, with stratospheric ozone levels returning to or exceeding 1980 values in the middle of the century, between 2037 and 2056 (Strahan et al., 2011). To isolate long-term trend signals from natural ozone variations and quantify the rate of ozone recovery in observations, very well calibrated data sets are required. Several recent studies have demonstrated increases in stratospheric ozone at mid and high latitudes, which are marginally statistically significant (Newchurch et al., 2003; Yang et al., 2006), but longer data records are needed to verify and increase the significance of these results. We rely on models to predict future ozone evolution, but those models must be validated using past and current observations to give confidence in their future results (WMO, 2011).
The Solar Backscatter Ultraviolet (SBUV) series of instruments provides the longest available record of global ozone profiles, with nearly continuous data over a 41 yr time span, from April 1970 to the present, except for a 5 yr gap in the 1970s (McPeters et al., 2013). The data set includes ozone profile records obtained from the Nimbus-4 BUV and Nimbus-7 SBUV instruments, and a series of SBUV(/2) instruments launched on NOAA operational satellites (NOAA's 9, 11, 14, 16, 17, 18 and 19). Two instruments are currently operational. In addition, the Ozone Mapper and Profiler Suite (OMPS), a successor of the SBUV, was recently launched onboard the Suomi NPP (National Polar-orbiting Partnership) satellite. Data from the OMPS instrument will be used to continue the 40 yr record of ozone observations. OMPS observational data have been acquired since March 2012, and the instrument is performing well, exceeding its design requirements. The data are consistent with those from SBUV/2 instruments on NOAA's 17, 18, and 19, even though the calibration is still being finalized.

Published by Copernicus Publications on behalf of the European Geosciences Union.

N. A. Kramarova et al.: Validation of ozone monthly zonal mean profiles

Although modifications in instrument design were made in the evolution from the BUV instrument to the modern SBUV/2 model, the basic principles of the measurement technique and retrieval algorithm remain the same (Bhartia et al., 2012). Each instrument makes nadir-viewing measurements of ultraviolet radiation scattered by Earth's atmosphere. Measurements are made at 12 wavelengths ranging from 252 nanometers (nm) to 340 nm. Recently, all SBUV data have been reprocessed using the updated version 8.6 (hereafter v8.6) algorithm (Bhartia et al., 2012). Part of the v8.6 processing included an updated intercalibration of the individual instruments to a common standard (DeLand et al., 2012). The homogeneity of measurements allows us to create a consistent, calibrated data set of ozone profiles for use in climate study and trend analysis. However, to properly apply these data we must have a thorough understanding of the uncertainties in each instrument, and how these uncertainties translate to uncertainties in the combined record. For this purpose it is important to validate ozone profiles obtained from each individual instrument against coincident satellite and ground-based measurements. Hereafter we will use "SBUV" to refer in general to all instruments and will specify individual instruments by their satellite name (for example, N9 refers to the NOAA 9 SBUV/2).
In this study we analyze the newly processed v8.6 SBUV (N4-N18; N19 not included in this study) monthly zonal mean profiles based on comparisons with a number of independent profile measurements. In Sect. 2 we provide a description of the SBUV version 8.6 data set and correlative independent data and discuss the validation methods used. In Sect. 3 we estimate and analyze biases and standard deviations of the SBUV instruments relative to independent measurements in the stratosphere between 25 and 1 hPa. In Sect. 4 we validate the SBUV ozone amounts in the thick layers with the corresponding amounts obtained from Aura MLS (250-25 hPa and 250-16 hPa), ozonesonde (surface-30 hPa) and Umkehr (surface-30 hPa) measurements. In Sect. 5 we estimate drifts for each individual SBUV instrument relative to independent satellite and ground-based measurements. In the final section we present conclusions, including recommendations we follow when constructing the merged SBUV data set.
2 Data sets and methods

SBUV version 8.6 data
Several processing changes were made in version 8.6. The Brion-Daumont-Malicet ozone cross sections (Daumont et al., 1992; Brion et al., 1993; Malicet et al., 1995) were used instead of Bass and Paur cross sections. In addition, a new cloud height climatology derived from Aura OMI (Ozone Monitoring Instrument) measurements (Joiner et al., 1995; Vasilkov et al., 2008), and a new ozone climatology based on Aura MLS (Microwave Limb Sounder) and ozonesonde observations (McPeters and Labow, 2012) have been implemented. Radiance adjustments in v8.6 were made for each instrument to maintain a consistent calibration, and data for all instruments were reprocessed using the adjusted radiances (DeLand et al., 2012). The absolute radiance calibrations for N14, N16, N17 and N18 were made relative to N17, while N7, N9 and N11 were calibrated against the shuttle's SBUV. The calibration for N4 is based on measurements at Arosa.
Figure 1 shows a timeline for SBUV instruments. The x axis shows years, while the y axis shows the Equator-crossing local time (ECT). The first two instruments, N4 and N7, flew in stable near-noon ECT orbits. Instruments starting with N9 were launched on satellites with drifting orbits. When the orbit of an SBUV instrument approaches the terminator the quality of the measurements declines. It is not clear why this occurs (Bhartia et al., 2012), but most likely it is due to instrumental problems that increase at high viewing angles (DeLand et al., 2012). The first three SBUV/2 instruments launched on NOAA satellites (N9, N11 and N14) began drifting rapidly shortly after launch. The onset of orbit drift is slower for the more recent instruments, starting with N16.
The v8.6 algorithm uses the optimal estimation technique (Rodgers, 2000) to retrieve ozone profiles as ozone layer amounts (partial columns, DU) in 21 pressure layers (see Supplement, Table S1). This is the same algorithm used in the previous version 8 processing. The corresponding total ozone values are calculated by summing ozone columns at individual layers.
Fig. 1. [...] satellites 9, 11, 14, 16, 17, 18, and 19. The orbital properties of each satellite vary. In general, measurements taken within the 08:00-16:00 ECT range are less noisy. Periods of operation for SAGE II, UARS MLS and Aura MLS are denoted at the bottom of the figure.

For the first time we are releasing the SBUV monthly zonal mean (mzm) ozone profiles as a primary product in addition to the more familiar level 2 PMF (Product Master File) files. The SBUV v8.6 mzm time series are best suited for long-term trend analysis rather than for day-to-day variability studies. In this paper we focus on validating the mzm SBUV profiles. V8.6 SBUV mzm profiles for each instrument are available for download in HDF-5 format at http://disc.sci.gsfc.nasa.gov/daac-bin/DataHoldingsMEASURES.pl?PROGRAM_List=RichardMcPeters. The mzm profiles are calculated in 5° latitudinal bins with midpoints starting at 87.5° S by simply averaging individual profiles in the specific month and latitude bin. To create mzm profiles all level 2 ozone profiles are screened to ensure high quality, and only profiles with an error flag of 0 (no flag) or 1 (solar zenith angle in the 84-88° range) are accepted. In general, 5 % or fewer (in most cases 1 % or fewer) individual profiles are rejected. We also require that the mean latitude of measurements within each latitude band is within 1° of the center of the band, and similarly that the mean time of measurements within a given month is within 4 days of the center of the month (i.e., day 15) to ensure that the mzm is adequately sampled. The mzm is not computed if these criteria are not met. Data have been averaged for either the ascending phase of the orbit or the descending phase of the orbit, whichever gives the best coverage. A volcano contamination index (VCI) flag has been developed to identify potential aerosol contamination of the ozone measurements following the eruptions of El Chichón (April 1982) and Mt. Pinatubo (June 1991). The VCI flag uses the absolute value of mzm ozone in layer 1 (639-1013 hPa) and the standard deviation of mzm ozone values in layer 10 (10.1-16.1 hPa) as indicators of possible contamination. This flag is currently relevant for N7 SBUV data following the El Chichón eruption, but does not appear to properly capture the aerosol evolution in time and latitude following the Mt. Pinatubo eruption. Therefore in our analysis we use the VCI as a filter for N7 SBUV, and exclude all data after the eruption of Mt. Pinatubo through the end of 1992 for N9 and N11. We note that N9 was in a near-terminator orbit when Mt. Pinatubo erupted, and the data are likely affected beyond 1992.
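The screening and sampling rules for the mzm product can be illustrated with a short sketch. This is not the operational processing code: the function name, argument layout and default parameters are hypothetical, and the choice between ascending and descending orbit phases and the VCI filter are left out. It only shows how the flag test, the 1° mean-latitude test and the 4 day mean-time test combine.

```python
from statistics import mean

def monthly_zonal_mean(profiles, lats, days, flags, band_center,
                       half_width=2.5, max_lat_offset=1.0,
                       mid_month=15.0, max_day_offset=4.0):
    """Return the mean profile for one month and one 5-degree band, or None.

    profiles: list of per-layer ozone amounts, one list per level 2 profile.
    lats, days, flags: latitude, day of month and error flag of each profile.
    All names and defaults are illustrative assumptions.
    """
    # Accept only profiles in the band with error flag 0 or 1.
    keep = [k for k in range(len(profiles))
            if flags[k] in (0, 1) and abs(lats[k] - band_center) <= half_width]
    if not keep:
        return None
    # Sampling checks: the mean latitude and mean day must be close to
    # the center of the band and the center of the month.
    if abs(mean(lats[k] for k in keep) - band_center) > max_lat_offset:
        return None
    if abs(mean(days[k] for k in keep) - mid_month) > max_day_offset:
        return None
    n_layers = len(profiles[0])
    return [mean(profiles[k][j] for k in keep) for j in range(n_layers)]
```

Returning `None` mirrors the rule that no mzm value is computed when the sampling criteria fail.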
As part of the v8.6 processing we focused on estimating the various sources of error in the SBUV ozone retrievals using independent observations and analysis of the algorithm itself (Bhartia et al., 2012; Kramarova et al., 2013). We found that the main source of error in the SBUV profile retrievals is the smoothing error, which is the error due to vertical ozone variability that the SBUV observing system cannot measure. Due to the low vertical resolution in the lower stratosphere and troposphere, SBUV measures a signal from a wide vertical range and the retrieval algorithm relies on the a priori information to properly distribute this signal among the individual layers. The smoothing error represents the uncertainties related to this distribution. The smoothing errors for the SBUV mzm profiles and total ozone were estimated and reported along with the data. The smoothing errors are less than 1 % between 10 and 1 hPa and increase above and below this range. The largest smoothing errors were found in the troposphere (up to 15 %). To minimize the SBUV smoothing effect we recommend combining individual layers in the lower stratosphere and troposphere (Kramarova et al., 2013).
To facilitate analysis with the SBUV profile data, we provide the SBUV averaging kernel matrices, a priori profiles, weighting functions (Jacobian) and smoothing errors in addition to the ozone mzm product. Having the error bars and retrieval characteristics along with ozone profiles makes it easier to analyze and properly use SBUV data (Kramarova et al., 2013). Reported averaging kernels ($A^{IK}$) are applicable to ozone profiles in SBUV native units of layer amount (DU/layer), and Bhartia et al. (2012) refer to them as integrating kernels. To get the traditional bell-shaped averaging kernels, applicable to profiles of fractional ozone changes, the following expression can be used:

$$A_{ij} = A^{IK}_{ij}\,\frac{x_{a,j}}{x_{a,i}},$$

where $x_a$ are the SBUV a priori profiles, and $i$ and $j$ are the layer indices.
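As an illustration of the kernel conversion, the sketch below rescales an integrating kernel matrix into a fractional averaging kernel. It assumes that element (i, j) of the integrating kernel is multiplied by the ratio of the a priori amounts in layers j and i, which follows from requiring that both kernels describe the same retrieval response; the function name and the two-layer numbers are made up for the example.

```python
def fractional_averaging_kernel(a_ik, x_a):
    """Rescale an integrating kernel (DU/layer units) to fractional units.

    a_ik: square matrix as a list of rows; x_a: a priori layer amounts (DU).
    Assumes A[i][j] = A_IK[i][j] * x_a[j] / x_a[i].
    """
    n = len(x_a)
    return [[a_ik[i][j] * x_a[j] / x_a[i] for j in range(n)]
            for i in range(n)]

# Hypothetical two-layer example (values are not SBUV data).
a_ik = [[0.5, 0.2],
        [0.1, 0.6]]
x_a = [10.0, 20.0]
a_frac = fractional_averaging_kernel(a_ik, x_a)
```

The diagonal elements are unchanged by the rescaling; only the off-diagonal elements are reweighted by the a priori ratio.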
In addition to the layer data, SBUV profile ozone is reported as a mixing ratio at 15 fixed levels between 50 and 0.5 hPa (see Supplement, Table S2).Ancillary data, including the number of profiles in the mzm average, the standard deviations, the average solar zenith angles, and the total covariance matrices used to compute the smoothing error, are also included in the mzm files.

Independent satellite ozone profile measurements
We validate SBUV ozone mzm profiles against independent satellite measurements, obtained from the Stratospheric Aerosol and Gas Experiment II (SAGE II) instrument and two MLS instruments flown on the Upper Atmosphere Research Satellite (UARS) and Aura satellites. The time frames for each independent satellite observation are shown in Fig. 1. In this section we provide a brief description of each independent data set.

SAGE II
The SAGE II instrument was launched in October 1984 aboard the Earth Radiation Budget Satellite (ERBS) and operated until August 2005. SAGE II uses the solar occultation technique to measure ozone, aerosol, nitrogen dioxide, and water vapor profiles. The ozone is derived from the attenuation of solar radiation at 600 nm as it passes through the limb of the atmosphere (Mauldin et al., 1985). SAGE retrieves ozone vertical profiles with 1-2 km resolution from the upper troposphere to about 60 km (Chu et al., 1989). The instrument observes 14-15 sunrises and sunsets per day, resulting in profiles equally spaced in longitude along two narrow latitude bands. The poleward extent of the SAGE coverage varies with season, from 50° in the winter hemisphere to 70° in the summer hemisphere. The latitude of daily observations varies in time such that the full latitude range is covered about every 3 weeks.
We use both sunrise and sunset SAGE II data from 1985-1999. We do not consider SAGE II data after 1999 because they are limited to sunset events only. In this study we use SAGE II version 6.2 data. Wang et al. (2002) assessed the quality of the SAGE v6.1 data and reported data precision of 4 % or better above 25 km, with less precision (10-50 %) at lower altitudes. The accuracy of SAGE II ozone profiles is 6 % above 25 km (Cunnold et al., 1989). Only minor changes (typically < 0.5 %) to the SAGE ozone were made in the update to SAGE v6.2 (http://www-sage2.larc.nasa.gov/Version6-2Data.html). We apply filters to the SAGE profiles to account for aerosol and cloud contamination and other sporadic anomalous data as recommended by Wang et al. (2002).

UARS MLS
The first satellite-based MLS instrument flew on the UARS platform, launched in September 1991 (Barath et al., 1993; Waters et al., 1993). The instrument operated through August 1999, but the number of measurements decreased over time beginning in 1994 in an effort to limit antenna degradation and conserve instrument power. Ozone measurement noise increased after the shutdown of the 63 GHz channel in June 1997. The MLS instrument measures the thermal emission spectrum from Earth's limb at three wavelengths. In this study we use ozone retrieved from the 205 GHz channel. The UARS orbit was such that measurements were made from 34° in one hemisphere to 80° in the other. Every 36 days the satellite underwent a 180° yaw maneuver, and the opposite hemisphere was imaged. This gives full coverage equatorward of 34°, and roughly every-other-month coverage at latitudes poleward of 34°. UARS MLS profiles are retrieved at a resolution of ∼2.5 km, though the true vertical resolution is lower in the troposphere and mesosphere. We use the version 5 data in the vertical range 64-1 hPa, and filter data as recommended by the instrument team (Livesey et al., 2003). The accuracy and precision of UARS measurements are 6 and 4-10 %, respectively, and increase to 15 and 20 %, respectively, at 68 hPa (Livesey et al., 2003). UARS MLS measures at varying local solar times, day and night, but we use only daytime measurements for comparison with SBUV.

Aura MLS
The Aura satellite was launched in July 2004 with an MLS instrument on board, and the instrument continues to operate (Froidevaux et al., 2008). Aura MLS measures thermal limb emissions over five broad-wavelength ranges using seven radiometers. The Aura orbit allows for daily uniform spatial sampling over the globe, and ozone profiles are retrieved from the 240 GHz spectral band. The Aura MLS vertical range is 200-0.02 hPa, which is wider than that of the UARS MLS measurements. The vertical resolution of the instrument is ∼2.5 km throughout most of the profile, and the ozone is retrieved at a resolution of 12 levels per decade of pressure (about 1 km). We use the current version 3.3 MLS data set, and filter the data according to recommendations outlined in the MLS version 3.3 user's guide (Livesey et al., 2011). The estimated uncertainties of Aura MLS ozone measurements are 5 % or less throughout the stratosphere and increase to 10 % (occasionally to 20 %) in the lower stratosphere (Froidevaux et al., 2008). For comparison with SBUV we use only daytime MLS measurements.

Independent ground-based ozone profile measurements
We validate SBUV measurements against three types of ground-based ozone profilers: microwave spectrometers, lidars and Umkehr instruments. Here we summarize the main features of the ground-based instruments. Table 1 shows the list of ground-based stations and overlapping time periods with each SBUV instrument.

Microwave spectrometers
We use data from ground-based microwave spectrometers located at Mauna Loa, Hawaii, USA (20° N) and Lauder, New Zealand (45° S). The microwave spectrometers measure a spectral line produced by a rotational transition of ozone at 110.836 GHz. One of the advantages of the microwave spectrometers is their ability to operate unattended day and night, independent of weather conditions (Parrish et al., 1992). Microwave profiles cover an altitude range of 56-0.1 hPa (about 20-66 km) with a vertical resolution of about 8 km below 3 hPa and up to 17 km at 0.2 hPa. Net precision of the measurements is 4-6 % between 55 and 0.2 hPa (Connor et al., 1995). Regular observations at Mauna Loa and Lauder started in 1995 and 1992, respectively, and continue to date. In this study we use specially reprocessed microwave data with an hourly time resolution (Parrish et al., 2012). For comparisons with SBUV we use daytime microwave measurements only.

Lidar instruments
For our validation work we use data from four lidar instruments, located at Mauna Loa, Hawaii, USA (20 [...] et al., 1990). The lidars retrieve ozone profiles in the vertical range from approximately 20 to 50 km in units of number density (cm⁻³) on geometric height (Megie et al., 1985). The vertical resolution of lidars is about 0.3-0.5 km in the middle stratosphere, decreasing to 3-5 km around 45 km (Godin-Beekmann et al., 2003). The lidars can operate only at night and depend on weather conditions. Thus the sampling at each station is distributed unevenly and at some stations depends on the season. We consider only lidar measurements with reported errors of less than 10 %. The 10 % error screening significantly reduces the number of lidar measurements available at the bottom (below 40 hPa) and top (above ∼5 hPa) layers. Recently, Nair et al. (2012) assessed the performance of lidar measurements at 6 stations (including four stations considered in this study) relative to multiple satellite observations and demonstrated that biases and drifts are mostly within ±5 % and ±0.5 % per year, respectively. The lidar stations started operating in the late 1980s (see Table 1 for more details).

Umkehr instruments
For SBUV validation we chose 6 Umkehr stations with long time records and high measurement quality (Arosa, Switzerland (47 [...]. The Umkehr data are reported as ozone amounts in 10 Umkehr layers divided into equal log-pressure vertical intervals starting at the surface (except the bottom layer where layers 0 and 1 were combined). These Umkehr layers are approximately 5 km thick. Here we use profiles from the updated Umkehr algorithm (Petropavlovskikh et al., 2005) that uses fixed seasonal a priori profiles instead of profiles based on measured total ozone amount. We remove three years of Umkehr data after the eruptions of El Chichón and Mt. Pinatubo (I. Petropavlovskikh, personal communication, 2013). A comparison between Umkehr and SAGE profiles (Newchurch et al., 1998) indicates a 5 % bias that increases to 15 % in layer 8. The Umkehr technique is too noisy to monitor short-term ozone variability, but Umkehr measurements can be used to monitor long-term changes of ozone monthly means with less than 5 % uncertainty in the stratosphere (Petropavlovskikh et al., 2005).

Ozonesondes
To validate SBUV measurements in the troposphere and lower stratosphere we chose four ozonesonde stations based on their long time records and their proximity to the northern midlatitude Umkehr stations. We use data from Boulder, Colorado, USA (40° N); Hohenpeissenberg, Germany (48° N); Lindenberg, Germany (52° N) and Payerne, Switzerland (47° N). Ozonesondes measure in situ ozone concentration from the ground up to 30-35 km and report profiles of partial ozone pressures. Long-term ozonesonde measurements provide valuable information about ozone concentration in the troposphere. Two main types of ozonesondes have been in use at the stations used in this study: Brewer-Mast (Brewer and Milford, 1960) and electrochemical concentration cell (ECC) (Komhyr, 1969). Recent studies (e.g., Smit et al., 2007) demonstrate differences up to ±5-10 % among different types of ozonesondes, which might affect the long-term stability of ozonesonde records if different types of sondes were used at a given station. Accuracy and precision of the ECC [...]

Vertical coordinates
Before doing comparisons, all independent ozone profiles are converted to partial column ozone in SBUV pressure layers (see Supplement, Table S1), except for the Umkehr profiles (see details below). To convert mixing ratios into layer amounts, we first calculate partial ozone columns from each fixed pressure level to the top of the atmosphere. The logarithm of the resulting cumulative ozone as a function of ln (pressure) is interpolated to the SBUV pressure scale. Layers are successively subtracted from the top-down to obtain partial ozone column in each individual layer. SAGE and lidar profiles, reported as ozone number density on altitude levels, are first converted to a mixing ratio on pressure scale using NCEP temperature and pressure profiles. However, offsets and drifts in the NCEP data could contribute to spurious long-term trends in the ozone layer amounts through drifts in the air density and the altitude of pressure surfaces derived from temperature (Keckhut et al., 2001; Rosenfield et al., 2005; Terao and Logan, 2007; McLinden and Fioletov, 2011). Terao and Logan (2007) found the NCEP error primarily to be a problem at pressures less than 10 hPa (higher in the atmosphere). Further, the problematic long-term trends in NCEP temperatures are largely due to short-term variations, likely caused by shifting from one temperature satellite record to another (Gaffen et al., 2000; McLinden and Fioletov, 2011). Since we use SAGE II and lidar data for validation of individual SBUV instruments over comparatively short time periods, we do not explicitly correct for possible trends in the temperature data and we expect that the possible temperature trend will not significantly affect the validation results.
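The conversion from mixing ratio to layer amounts can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the paper: it assumes hydrostatic balance with constant gravity, a constant mixing ratio above the top reported level, and pressure levels ordered from highest to lowest pressure. The function names, constants and layer edges are chosen for the example only; the actual SBUV layer edges are listed in Table S1 of the Supplement.

```python
import math

# Standard constants; with these, 1 ppmv over 1 hPa is about 0.789 DU.
G = 9.80665            # gravity, m s^-2 (assumed constant with height)
M_AIR = 28.964e-3      # molar mass of dry air, kg mol^-1
N_A = 6.02214076e23    # Avogadro constant, mol^-1
DU = 2.6867e20         # molecules m^-2 in one Dobson unit
DU_PER_PPMV_HPA = 1e-6 * 100.0 * N_A / (M_AIR * G) / DU

def cumulative_column(p_hpa, vmr_ppmv):
    """Ozone column (DU) above each level; p_hpa must decrease with index."""
    cum = [0.0] * len(p_hpa)
    # Assumption: constant mixing ratio above the top reported level.
    cum[-1] = vmr_ppmv[-1] * p_hpa[-1]
    for i in range(len(p_hpa) - 2, -1, -1):
        cum[i] = cum[i + 1] + 0.5 * (vmr_ppmv[i] + vmr_ppmv[i + 1]) * (
            p_hpa[i] - p_hpa[i + 1])
    return [DU_PER_PPMV_HPA * c for c in cum]

def interp_loglog(p_target, p_hpa, cum):
    """Interpolate ln(cumulative ozone) linearly in ln(pressure)."""
    for i in range(len(p_hpa) - 1):
        if p_hpa[i] >= p_target >= p_hpa[i + 1]:
            x0, x1 = math.log(p_hpa[i]), math.log(p_hpa[i + 1])
            y0, y1 = math.log(cum[i]), math.log(cum[i + 1])
            w = (math.log(p_target) - x0) / (x1 - x0)
            return math.exp(y0 + w * (y1 - y0))
    raise ValueError("pressure outside the profile range")

def layer_amounts(p_hpa, vmr_ppmv, edges_hpa):
    """Partial columns (DU) between consecutive layer edges (decreasing hPa)."""
    cum = cumulative_column(p_hpa, vmr_ppmv)
    above = [interp_loglog(e, p_hpa, cum) for e in edges_hpa]
    # Difference the overlying columns at adjacent edges to get each layer.
    return [above[k] - above[k + 1] for k in range(len(above) - 1)]
```

For a constant mixing ratio the cumulative column is proportional to pressure, so the log-log interpolation is exact in that case, which gives a simple check of the implementation.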
For comparison with Umkehr measurements, SBUV partial columns are interpolated onto 10 Umkehr layers. We first calculate the ozone amount above each SBUV level, and then we interpolate the resulting values to Umkehr pressure levels. We also account for the elevation of the Umkehr stations. Finally, the ozone columns for each individual Umkehr layer are calculated by subtracting the adjacent layers. Although the standard Umkehr retrieval returns data at 10 layers, the actual vertical resolution of the Umkehr retrievals is much coarser, due to a combination of physical atmospheric scattering processes, finite instrument spectral resolution, and real atmospheric vertical correlations (Mateer, 1965; Hahn et al., 1995). Analysis of the corresponding averaging kernels demonstrates that the retrievals possess, at most, four independent pieces of information. Thus combining all layers above layer 8, and merging layers 2 and 3 for the middle latitude stations and layers 1 + 2 and 3 + 4 in the tropics is recommended to increase information content and accuracy (I. Petropavlovskikh, personal communication, 2013).
We use ozonesonde observations to validate SBUV measurements in a broad troposphere/lower stratosphere layer. Using sonde measurements we calculated integrated ozone columns from the ground to 30 hPa and compared with the corresponding values obtained from the SBUV instruments.

Vertical resolution
When comparing two profiles it is important to account for differences in vertical resolution. There are two ways to approach this issue. One approach is to convolve the highly resolved profile by the averaging kernels (AK) of the profile with the lower vertical resolution (e.g., Liu et al., 2010). However, it is not clear how to convolve ozone profiles that cover only part of the atmosphere (for example, lidars measure ozone only between 60 and 1 hPa), because the SBUV AKs should be applied on the profiles that cover the entire range from the surface to the top of the atmosphere (Kramarova et al., 2013). In addition, the physical interpretation of comparisons with the convolved profiles is a challenge. Thus as an alternative, we choose to combine several layers of SBUV data. The combined layer is more representative of what SBUV actually measures, and the smoothing error for the combined layer is reduced. We then can directly compare the ozone amount in the combined layer with the corresponding amount obtained from similarly integrating the independent, highly resolved measurements. In this case the results of the comparison will have a clear physical interpretation.
Figure 2 shows the SBUV smoothing error for mzm profiles as a function of altitude for several latitude bands. Analysis of the SBUV retrieval algorithm and smoothing error (Kramarova et al., 2013) demonstrates that the SBUV smoothing error is low (about 1-2 %) for individual layers between 25 and 1 hPa in mid and high latitudes (poleward of 20° latitude) and between 16 and 1 hPa in the tropics (20° S-20° N). In this vertical range SBUV ozone layer amounts can be directly compared to the corresponding quantities obtained from the highly resolved measurements. Smoothing errors of the order of 1-2 % can be neglected compared to other sources of random and systematic errors. Below and above this range the SBUV vertical resolution decreases and smoothing errors increase accordingly (up to 10-15 % in the troposphere). To reduce the smoothing error down to 1-2 %, we combine all layers below 25 hPa in the extratropics (or below 16 hPa in the tropics) down to 250 hPa (or down to the surface). In Sect. 4 we validate the SBUV ozone amounts in the thick layers with the corresponding amounts obtained from Aura MLS (250-25 hPa and 250-16 hPa), ozonesonde (surface-30 hPa) and Umkehr (surface to 30 hPa) measurements.
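Why combining layers reduces the smoothing error can be seen in a toy calculation. The sketch assumes the standard optimal-estimation form of the smoothing error covariance (Rodgers, 2000), S_s = (A − I) S_a (A − I)^T, with A the integrating kernel and S_a the ozone covariance, both in DU units; whether the reported SBUV smoothing errors were computed in exactly this way is an assumption here. The two-layer matrices are invented for illustration and are not SBUV values.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def smoothing_covariance(a_ik, s_a):
    """S_s = (A - I) S_a (A - I)^T, all quantities in DU units."""
    n = len(a_ik)
    d = [[a_ik[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return matmul(matmul(d, s_a), transpose(d))

def combined_layer_error(s_s, layers):
    """Smoothing error (DU) of the sum of the listed layers."""
    var = sum(s_s[i][j] for i in layers for j in layers)
    return math.sqrt(max(var, 0.0))

# Toy case: the retrieval cannot separate two layers but conserves their sum.
a_ik = [[0.5, 0.5],
        [0.5, 0.5]]
s_a = [[4.0, 0.0],
       [0.0, 4.0]]
s_s = smoothing_covariance(a_ik, s_a)
single = combined_layer_error(s_s, [0])       # error of one layer alone
merged = combined_layer_error(s_s, [0, 1])    # error of the combined layer
```

In this toy case the errors of the two layers are fully anti-correlated, so each layer alone carries an error while the combined layer carries none. Real retrievals fall between these extremes.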

Spatial and temporal coincident criteria
Appropriate coincidence criteria in both time and space are very important for validation. Above 1 hPa diurnal ozone variation plays a significant role (e.g., Connor et al., 1994; Haefele et al., 2008), and time coincidence criteria should be stricter. For this reason we limit the vertical range of the validation to 1 hPa and below for all instruments except lidars, where we limit the upper range to 1.6 hPa due to the reduced number of lidar measurements above this altitude after the 10 % error screening.
The spatial and temporal coincidence requirements vary depending on the spatial and temporal resolution of the external instruments. SBUV, UARS and Aura MLS all have good spatial resolution with sufficient sampling to produce representative monthly zonal mean values. Thus when comparing SBUV to MLS we simply compute mzm values for each instrument and compare them directly.
SAGE II data have comparatively poor spatial/time coverage, so in this case we subset the SBUV data set to match the SAGE space/time coverage. For each SAGE profile we find all SBUV profiles within ±12 h and within ±1° latitude and ±14° longitude. This is typically 1-3 SBUV profiles. When more than one SBUV profile match is found we average the profiles using a linear weighting by distance from the SAGE profile location. Then we construct monthly zonal means from the SAGE and sub-sampled SBUV for comparison.
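The matching step can be sketched as below. The coincidence window follows the text; the weighting formula is not specified in the paper, so the sketch assumes weights that fall linearly from 1 at zero separation to (nearly) 0 at the corner of the coincidence box, with distance measured in units of the box half-widths. The function name, data layout and the small weight floor are hypothetical.

```python
import math

def coincident_average(sage, sbuv, dt_h=12.0, dlat=1.0, dlon=14.0):
    """Distance-weighted mean of SBUV profiles coincident with one SAGE profile.

    sage: (time_h, lat, lon); sbuv: list of (time_h, lat, lon, profile).
    Returns None when no SBUV profile falls inside the window.
    """
    t0, lat0, lon0 = sage
    matches = []
    for t, lat, lon, prof in sbuv:
        dx = (lon - lon0 + 180.0) % 360.0 - 180.0   # wrapped longitude offset
        dy = lat - lat0
        if abs(t - t0) <= dt_h and abs(dy) <= dlat and abs(dx) <= dlon:
            # Normalized distance: 0 at the SAGE location, 1 at a box corner.
            dist = math.hypot(dy / dlat, dx / dlon) / math.sqrt(2.0)
            # Assumed linear weighting; the floor avoids an all-zero weight sum.
            matches.append((max(1.0 - dist, 1e-6), prof))
    if not matches:
        return None
    wsum = sum(w for w, _ in matches)
    n_layers = len(matches[0][1])
    return [sum(w * p[j] for w, p in matches) / wsum for j in range(n_layers)]
```

The longitude offset is wrapped so that profiles on either side of the 0°/360° meridian are still matched.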
We use the same procedure when comparing SBUV to ground-based instruments. We require at least five coincident profiles to calculate monthly means for ground-based microwave data and two profiles for lidar, sonde and Umkehr data. On average we have about 15 coincident microwave profiles, between 2 and 20 Umkehr profiles, between 2 and 15 lidar profiles and about 2-5 ozonesonde profiles each month. Measurements from the ground-based microwave spectrometer at Mauna Loa are available at high time resolution, so for these comparisons we restrict the time difference to ±1.5 h. The microwave instrument at Lauder also measures ozone profiles at high time resolution, but the number of profiles that satisfy the ±1.5 h coincidence criterion is too low for statistical significance. Instead we calculate the daily average using all measurements between 09:00 and 17:00 local solar time to get a sufficient number of profiles.

Bias, standard deviation and relative drift
The bias and standard deviation are calculated for each pair of instruments. The bias is the mean deviation of profiles measured by two different instruments:

$$b = \frac{1}{N}\sum_{i=1}^{N}\left(X_{\mathrm{sbuv},i} - X_{\mathrm{ext},i}\right),$$

where $X_{\mathrm{sbuv}}$ is the SBUV mzm ozone profile, $X_{\mathrm{ext}}$ is the external mzm profile (profile from the independent instrument), and $N$ is the number of coincident mzm profiles. To estimate the percent relative bias we normalize bias $b$ by the SBUV a priori $x_a$. The standard deviation of the differences is estimated using

$$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(X_{\mathrm{sbuv},i} - X_{\mathrm{ext},i} - b\right)^2}.$$

We also compute the drift between the ozone time series from the instrument pairs. To estimate possible drifts for each SBUV instrument relative to various external measurements, we calculate monthly mean time series of seasonal anomalies by subtracting the seasonal cycle from each instrument independently. Deseasonalizing reduces persistence in the time series of residuals, so that we can assume the residuals are random and normally distributed. Then we compute the time series of differences between the pair of deseasonalized anomalies and linearly regress the difference time series at all altitudes (see example in Fig. S14 in the Supplement). Linear regression provides a simple way to estimate drift. The standard deviation of the slope of the linear regression is estimated using

$$\sigma_{\mathrm{slope}} = \sqrt{\frac{s_e^2}{\sum_{i=1}^{N}\left(p_i - \bar{p}\right)^2}},$$

where $s_e^2 = \frac{1}{N-2}\sum_{i=1}^{N} e_i^2$, $e_i$ are the residuals between the difference time series and the linear regression fits ($a + bp$) at layer $k$, $N$ is the number of months, $p$ is time and $a$ and $b$ are regression coefficients (Wilks, 2006); here $b$ denotes the regression slope, not the bias defined above. Long overlapping time periods and sufficient sampling are required to accurately estimate the relative drift and to reduce the standard deviation of the slope (see Fig. S19 in the Supplement).
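The statistics described in this section can be sketched in a few lines. The sketch makes several assumptions: the bias is taken as SBUV minus external, the standard deviation uses the N − 1 normalization, the seasonal cycle is removed as the mean of each calendar month, and the slope uncertainty follows the ordinary least-squares expression given by Wilks (2006). Function names are hypothetical and the inputs are single-layer time series.

```python
import math
from statistics import mean, stdev

def bias_and_std(x_sbuv, x_ext):
    """Mean and sample standard deviation of SBUV-minus-external differences."""
    d = [s - e for s, e in zip(x_sbuv, x_ext)]
    return mean(d), stdev(d)

def deseasonalize(series, months):
    """Subtract the mean of each calendar month from a monthly time series."""
    clim = {m: mean(v for v, mm in zip(series, months) if mm == m)
            for m in set(months)}
    return [v - clim[m] for v, m in zip(series, months)]

def drift(diff, t):
    """Least-squares slope of diff against time t and its standard deviation."""
    n = len(diff)
    t_bar, d_bar = mean(t), mean(diff)
    sxx = sum((p - t_bar) ** 2 for p in t)
    b = sum((p - t_bar) * (d - d_bar) for p, d in zip(t, diff)) / sxx
    a = d_bar - b * t_bar
    # Residual variance with N - 2 degrees of freedom.
    s_e2 = sum((d - (a + b * p)) ** 2 for p, d in zip(t, diff)) / (n - 2)
    return b, math.sqrt(s_e2 / sxx)
```

A typical use would deseasonalize the SBUV and external series separately, difference the two anomaly series, and pass the result to `drift` with time expressed in years so that the slope is a drift per year.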
Results: mean biases and standard deviations in the middle and upper stratosphere

Relative to independent satellite measurements
We validate SBUV mzm profiles relative to independent satellite measurements in the altitude range between 25 and 1 hPa. The time series of mzm differences for each SBUV instrument relative to independent satellite measurements for three latitude zones (20-50° N, 20 [...] over its lifetime from 1991 to 1999; Aura MLS overlaps with N16, N17 and N18 over the time period from October 2004 to 2011. The differences between SBUV and independent instruments are mostly within ±10 %. Differences for N9, N11 and N14 relative to SAGE II and UARS MLS closely follow each other. However, the range of differences is much narrower relative to Aura MLS compared to the earlier satellites. N16 demonstrates a drift starting from 2007 that is especially notable in the 4-2.5 hPa layer. This is the period when the N16 orbit was approaching the terminator.

In the tropical lower stratosphere (layer 25-16 hPa) the differences relative to all satellites show a clear signal of the quasi-biennial oscillation (QBO). Because of the low SBUV vertical resolution in the lower tropical stratosphere, the SBUV algorithm is not capable of accurately retrieving the QBO signal in individual layers (Kramarova et al., 2013). Due to this limitation of the SBUV algorithm in the narrow tropical zone (20° S-20° N), we do not recommend using the SBUV data in the 25-16 hPa layer in the tropics. Instead it is better to merge several layers from the surface (or from 250 hPa for Aura MLS comparisons) up to 16 hPa.

We calculate mean biases for each pair of instruments that have at least a 24 month overlap. Figure 4 shows mean biases for individual SBUV instruments relative to (a) MLS and (b) SAGE II mzm as a function of latitude in four layers between 25 hPa and 1 hPa. The left panel in Fig. 4 shows mean biases relative to UARS and Aura MLS. Biases for N9, N11 and N14 have been calculated relative to UARS MLS, while biases for N16, N17 and N18 have been calculated relative to Aura MLS. We do not calculate biases for N14 relative to Aura MLS, since the overlap time is less than a year. Vertical bars in Fig. 4 indicate the standard error of the bias (σ/√N). The right panel in Fig. 4 shows mean biases for four SBUV instruments (N7, N9, N11 and N14) relative to SAGE II. The standard deviations of the differences relative to independent satellite instruments for individual latitude bands are mostly within 5 % (not shown here).
Between 50° S and 50° N the mean biases are mostly within ±5 % for all SBUV instruments. Between 25 and 10 hPa, below the ozone peak, all SBUV instruments underestimate ozone by about 3-5 % compared to the reference independent satellite observations. We also found negative systematic biases in the layer between 1.6 and 1 hPa for all SBUV instruments except for N7, which has almost zero offset relative to SAGE II in the tropics.
Although for the most part SBUV instruments demonstrate consistent results, we note some features of individual instruments. N9 has larger negative offsets in the 16-10 hPa and 10-6 hPa layers (more negative relative to UARS MLS compared with SAGE II), which is not consistent with the behavior of other SBUV instruments. N11 has negative biases relative to both UARS MLS and SAGE II throughout the vertical range. Again biases are more negative relative to UARS MLS than to SAGE II. Between 2.5 and 10 hPa biases are more negative for the descending portion of N11 (after 1998). The largest spread (∼10 %) among the SBUV instruments is in the layer between 10 and 6 hPa, where biases for the three recent SBUV instruments relative to Aura MLS are positive and biases for N9, N11 and N14 relative to UARS MLS are negative.
Figure 5 shows the altitude dependence of the mean biases for individual SBUV instruments averaged over the wide latitude zone 50° S-50° N relative to (a) SAGE II, (b) UARS MLS and (c) Aura MLS. Biases, standard deviations (see Fig. S1 in the Supplement) and drifts (see Sect. 5.1 below) for 50° S-50° N are calculated by first constructing the 50° S-50° N area-weighted mean time series and then applying the equations presented in Sect. 2.4.4. We use this approach rather than calculating mean values from the biases, standard deviations and drifts for individual latitude bands to reduce the noise associated with the limited sampling at some latitude bands and to isolate the robust patterns that help to characterize the performance of individual SBUV instruments. Again, the mean biases for the wide latitude band are mostly within the ±5 % range. Comparisons with UARS MLS and SAGE II show that the profile of mean biases for the ascending mode of N14 is very similar to the shape of the biases for N16, N17 and N18 relative to Aura MLS, with negative biases below 16 hPa and above 1.6 hPa, and slightly positive or close to zero biases in between. The vertical shape of the biases for N7 relative to SAGE II is also very similar to that described above. N9 and N11 have slightly different shapes. N9 has large negative biases between 16 and 6 hPa, while N11 shows negative biases at all layers between 25 and 1 hPa. It is important to remember that N14, N16, N17 and N18 were calibrated against N17, while N7, N9 and N11 were calibrated against the shuttle's SBUV (DeLand et al., 2012). Thus, similarity in shape of the mean biases for N7 and the four recent instruments demonstrates that these instruments have common systematic errors that can be attributed to the SBUV retrieval algorithm itself rather than to features of the individual SBUV instruments. These results highlight the consistency among the individual SBUV instruments and add to the credibility of the SBUV merged data set for long-term trend analysis.
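The area-weighted mean over a wide latitude zone can be sketched as follows. The weighting is an assumption made for illustration: on a sphere the area of a zonal band is proportional to the difference of the sines of its bounding latitudes, and bands without data in a given month are skipped. The function names and the `(lat_south, lat_north, value)` layout are hypothetical.

```python
import math


def band_weight(lat_south, lat_north):
    """Relative area of a zonal band on a sphere (latitudes in degrees)."""
    return math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))


def area_weighted_mean(bands):
    """Area-weighted mean over latitude bands for one month.

    bands: list of (lat_south, lat_north, value); value is None when the
    band has no data that month, in which case it is skipped.
    Returns None if no band has data.
    """
    num = 0.0
    den = 0.0
    for lat_south, lat_north, value in bands:
        if value is None:
            continue
        w = band_weight(lat_south, lat_north)
        num += w * value
        den += w
    return num / den if den > 0 else None
```

Applying this month by month yields a single wide-band time series, to which the bias, standard deviation and drift estimates can then be applied.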
The corresponding standard deviations for the differences relative to Aura MLS (shown in Supplement, Fig. S1) in the broad latitude band from 50° S to 50° N are less than 1.5 %, except for N16. In 2004 the N16 satellite started quickly drifting toward the terminator, and by the middle of 2007 the local equator crossing time passed 16:00. After mid-2007 the N16 differences relative to Aura MLS significantly increase. As a result the standard deviations for N16 over the entire overlap period are 2-2.5 %, while standard deviations over the period from October 2004 to July 2007 are of the same order as for N17 and N18. For this reason we do not recommend using N16 measurements after mid-2007. Standard deviations for differences relative to UARS MLS and SAGE II are larger, varying within 1-3 %. Larger standard deviations for N9, N11 and N14 are partially due to poorer sampling with SAGE II and UARS MLS, but are also due to the lower quality of these SBUV instruments.
We also estimate seasonal biases, defined as the difference in the seasonal cycles for the pair of instruments (see Supplement, Figs. S8-S10). Seasonal biases relative to Aura MLS are less than 2 % in the tropics and mostly statistically insignificant (see Supplement, Fig. S8). Outside of the tropics we found a clear seasonal pattern relative to Aura MLS, though the amplitudes of seasonal variability are still mostly within 2-3 %, increasing to 5-6 % in the 10-6 hPa layer. There is an approximately 6 month lag between southern and northern midlatitudes. A clear seasonal signature can also be seen in Fig. 3 in the time series of differences relative to Aura MLS in the extratropics of both hemispheres. We were not able to isolate clear seasonal structures from UARS MLS and SAGE comparisons, possibly due to the poorer spatial and temporal sampling. The amplitude of seasonal differences varies within ±2-8 % and is mostly less than the 2σ standard deviations (see Supplement, Figs. S9, S10).
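One way to compute a seasonal bias of this kind is sketched below. The details are assumptions, since the text only defines the seasonal bias as the difference in the seasonal cycles: here each seasonal cycle is the calendar-month climatology with the instrument's own mean over the available months removed, so that the constant mean bias does not enter. The function names are hypothetical.

```python
from collections import defaultdict


def seasonal_cycle(series):
    """Calendar-month climatology with its mean removed.

    series: iterable of (month, value) pairs with month in 1..12.
    """
    by_month = defaultdict(list)
    for month, value in series:
        by_month[month].append(value)
    clim = {m: sum(v) / len(v) for m, v in by_month.items()}
    overall = sum(clim.values()) / len(clim)
    return {m: c - overall for m, c in clim.items()}


def seasonal_bias(sbuv_series, ext_series):
    """Difference of the two seasonal cycles for months present in both."""
    c_sbuv = seasonal_cycle(sbuv_series)
    c_ext = seasonal_cycle(ext_series)
    return {m: c_sbuv[m] - c_ext[m] for m in sorted(c_sbuv) if m in c_ext}
```

A result that is negative in winter months and positive in summer months would correspond to the kind of seasonal pattern described for the extratropics.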

Relative to ground-based profile measurements
We validate SBUV profiles against three types of ground-based ozone profilers: microwave spectrometers, lidars and Umkehr instruments. All instruments have different vertical resolutions and make measurements in various vertical ranges. We compare ozone amounts obtained from different instruments in vertical ranges where both instruments have sufficient information content and vertical resolution. For comparisons against microwave spectrometers, we consider ozone at 8 layers between 40 and 1 hPa, and for lidar comparisons we validate ozone at 7 layers between 40 and 1.6 hPa. We compare SBUV to Umkehr in five layers: from the surface to 31 hPa, 31-16 hPa, 16-8 hPa, 8-4 hPa and 4-2 hPa. Comparisons for the layer between the surface and 30 hPa are considered in Sect. 4.

Ground-based microwave
We validate SBUV ozone profiles against coincident microwave observations at Mauna Loa and Lauder. The time series of differences relative to ground-based microwave measurements are shown in Supplement Fig. S2. The vertical profiles of mean biases are shown in Fig. 6 for both microwave stations. Various colors correspond to different SBUV instruments. Biases at both locations are negative between 25 and 10 hPa (up to −7 %), which is consistent with the satellite comparisons. Between 10 and 4 hPa the biases are positive and flip sign above 2 hPa for all instruments except for N9 at Lauder. The main difference in the results for the two locations is the altitude where the biases switch sign from positive to negative. Mauna Loa biases are negative above 4 hPa, while at Lauder the transition occurs above 2.5 hPa. The vertical pattern of biases at Mauna Loa is very consistent for all SBUV instruments. However, we note that the shape of biases for N11 differs slightly from the other instruments, and N9 has larger biases. At Lauder we see a larger spread for individual SBUV instruments in upper levels. Above 2 hPa biases for N16, N17 and N18 are significantly negative and exceed 5 %, while the biases for N11 and N14 are less negative and close to zero. At the same time N9 has positive biases everywhere above 6 hPa. The standard deviations of the differences at Mauna Loa average about 3 % and are almost independent of altitude (see Supplement, Fig. S5). The standard deviations at Lauder are larger and vary from 3 to 5 %.
For microwave instruments we also calculate biases and standard deviations for individual profiles without monthly averaging (results are not shown here). We found that the value and shape of the biases remain the same, though the standard deviations increase approximately by a factor of 2.

SBUV vs. lidars
We compared SBUV profiles with ground-based lidar observations at Mauna Loa, Table Mountain, Lauder and Haute Provence. The time series of differences at 3 different layers are shown in Supplement Fig. S3.
Figure 7 shows vertical profiles of SBUV mean biases relative to lidar measurements for all locations. Different colors correspond to different SBUV instruments. We can clearly see the consistency from station to station for the later instruments N16, N17 and N18. Biases are slightly negative between 25 and 10 hPa, positive between 10 and 4 hPa, and switch sign again above. However, at Lauder biases remain positive above 4 hPa. Biases for the three recent SBUV instruments are mostly within a ±7 % range.
All earlier SBUV instruments demonstrate behavior that is not consistent from station to station or from one SBUV instrument to another. At Mauna Loa the vertical pattern of biases for N14 is very similar to that for N16, N17 and N18 with slightly less positive anomalies between 10 and 4 hPa. This pattern is consistent with the results obtained from satellite comparisons. Biases for N9 and N11 are very different, but still within ±7 %. At Table Mountain biases above 4 hPa are negative for the three recent instruments (N16-18) and positive up to +10 % for the four early instruments. Such inconsistency is most likely related to a major upgrade that was applied to the Table Mountain lidar in 2001 (http://tmf-lidar.jpl.nasa.gov/instruments/TMF_strato_DIAL.htm). At the Lauder station, biases above 10 hPa are negative for the earlier instruments and positive for the recent instruments, possibly due to fewer coincident profiles in this period. The biases relative to lidar measurements at Haute Provence are very consistent and are mostly within 5 % for all SBUV instruments, except in the 2-1.6 hPa layer, where biases increase up to −10 %.
The standard deviations of the differences between monthly mean profiles are mostly within the 2-6 % range between 40 and 2 hPa. In the 2.5-1.6 hPa layer standard deviations increase to 10-12 % (see Supplement, Fig. S6). This again might be a result of the reduced number of lidar measurements at higher altitudes due to the 10 % lidar precision screening. The lowest standard deviations we found were for comparisons with Mauna Loa, where ozone variability is naturally low.
Comparisons with ground-based microwave spectrometers and lidars are consistent with the results we found above from the satellite comparisons. The vertical structure of biases for N16, N17 and N18 is very robust. At some (but not all) stations the shape of biases for N14 is similar to the shape for the three recent instruments. N9 and N11 demonstrate different behavior that is inconsistent from station to station. It is important to note that the quality and frequency of ground-based measurements gradually increase over time. Thus larger uncertainties in the 1990s can be partially attributed to the lower quality of ground-based observations.

SBUV vs. Umkehr
Umkehr ground-based observations provide the longest record of ozone profiles, but the frequency of Umkehr measurements varies with location and over time. We select 6 Umkehr stations with relatively long time records and high quality of measurements for comparisons: Mauna Loa, Arosa, Belsk, Lauder, Haute Provence and Boulder. Figure 8 shows mean biases for individual SBUV instruments relative to Umkehr for all six locations. The vertical structures of biases are similar at all locations (except Mauna Loa) with positive biases between 8 and 2 hPa and between 30 and 16 hPa. Between 16 and 8 hPa biases tend to be closer to zero. Similar results were shown by Nair et al. (2011), who noticed low Umkehr ozone values at Haute Provence relative to lidar observations at the same location. At Mauna Loa the vertical structure of biases is similar to that described above, except that biases are negative between 8 and 4 hPa. The similar structure of biases at all locations points to a systematic error in the Umkehr retrievals, causing Umkehr instruments to underestimate ozone amounts in the stratosphere.
At Mauna Loa, Belsk, Lauder, Haute Provence and Boulder the vertical structures of the biases for all SBUV instruments are very similar. At Arosa, the last four instruments demonstrate similar behavior, while N4, N7, N9 and N11 show less positive biases in layers between 16 and 4 hPa. The standard deviations of monthly mean biases vary from 2 to 6 %, with larger standard deviations at Belsk (up to 10-12 %) for N17 and N18 (see Supplement, Fig. S7).
The Umkehr technique measures the entire profile, so we also compare total column ozone amounts with those obtained from SBUV instruments. Biases are mostly positive and vary from 1 to 3 % with corresponding standard deviations of about 4 %. Labow et al. (2013) compared SBUV v8.6 total ozone against ground-based Dobson and Brewer observations and found differences within ±1 %.
We also calculated seasonal biases for comparisons with ground-based stations (results are shown in the Supplement, Figs. S11-S13). Similarly to satellite comparisons, we find that seasonal biases are smaller at the tropical station (Mauna Loa) and do not exceed ±2-3 %. In the northern midlatitudes we do not detect a stable seasonal pattern, and seasonal biases at all northern stations are within ±3-5 %. However, at some layers the amplitude is as high as 10 %. Overall seasonal biases are within the 2σ range of standard deviations. We find clear seasonal structures relative to the instruments at Lauder (microwave spectrometer, lidar and Umkehr). In particular, we isolate a robust seasonal pattern in the 10-6 hPa layer with negative biases in winter and positive biases in summer. The seasonal pattern at Lauder in this layer is consistent with Aura MLS comparisons. However, at other layers the seasonal pattern is not consistent from one ground-based instrument to another and not consistent with Aura MLS. The cause of the larger seasonal biases between SBUV measurements and ground-based observations at Lauder is not clear.

Results: validation of partial ozone columns in the lower stratosphere and troposphere
In order to get around the issue of differing vertical resolution, we compare SBUV ozone partial columns in the lower stratosphere and troposphere with corresponding values obtained from Aura MLS. Figure 9 shows the mean biases and standard deviations as a function of latitude for two recommended layer combinations: 250-25 hPa and 250-16 hPa (see Sect. 2.4.2). The biases for the 250-25 hPa layer are negative, from 0 to −2 %, outside the tropics. In the narrow tropical zone between 20° S and 20° N biases increase to −6 %.
The biases are slightly more negative for the 250-16 hPa layer outside of the tropics, but the latitudinal structures are similar to those described above (dotted lines in Fig. 9). Previously Froidevaux et al. (2008) also detected positive biases in the troposphere/lower stratosphere for Aura MLS v2.2, meaning that Aura MLS slightly overestimates ozone concentration here.
The corresponding standard deviations for the 250-25 hPa layer vary within 1-2 % outside the tropics and increase to 3-4 % in the tropics. However, the standard deviations are reduced to 1 % over the tropics for the broader 250-16 hPa layer. This example demonstrates the increase in the precision of SBUV measurements in the tropics when the vertical resolution is downgraded by combining the layers up to 16 hPa. Standard deviations for the differences between the SBUV and Aura MLS measurements at individual layers in the lower tropical stratosphere vary from 3 to 10 % (results not shown here).
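Combining layers into a partial column amounts to summing the layer ozone amounts between two pressure levels. The sketch below illustrates this under stated assumptions: the `(p_lower_hPa, p_upper_hPa, ozone_DU)` layout, the function name `partial_column` and all numbers in the usage example are hypothetical and are not SBUV values.

```python
def partial_column(layers, p_bottom, p_top):
    """Sum layer ozone amounts for layers lying entirely within [p_top, p_bottom].

    layers: list of (p_lower_hPa, p_upper_hPa, ozone_DU) with p_lower > p_upper.
    p_bottom, p_top: pressure bounds of the combined layer in hPa.
    """
    total = 0.0
    for p_lower, p_upper, ozone_du in layers:
        if p_lower <= p_bottom and p_upper >= p_top:
            total += ozone_du
    return total


# Illustrative (made-up) layer amounts in Dobson units.
example_layers = [
    (1013.25, 250.0, 30.0),
    (250.0, 100.0, 40.0),
    (100.0, 25.0, 110.0),
    (25.0, 16.0, 45.0),
    (16.0, 10.0, 40.0),
]
column_250_25 = partial_column(example_layers, 250.0, 25.0)
column_250_16 = partial_column(example_layers, 250.0, 16.0)
```

The same function applied to both instruments gives similarly integrated columns, which can then be compared with the bias and standard deviation estimates.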
In addition, we compare partial ozone columns between the surface and 30 hPa against corresponding amounts obtained from the ensemble of four northern midlatitude Umkehr instruments (see Fig. 10). The colored lines on this plot show the time series of differences for individual SBUV instruments, while the thick black line shows the 12 month moving average. The number of stations changes each month, and we require observations from at least two stations. The biases are mostly within ±5 %. The mean biases are very close to zero between 1995 and 2011. In the late 1980s and early 1990s mean biases are negative with amplitudes of −2 to −3 %.
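The ensemble averaging with a minimum number of stations, followed by a 12 month moving average, can be sketched as below. Two details are assumptions made for illustration: missing months are represented by `None`, and the moving average is a trailing window that averages whatever valid months it contains (the text does not say whether the window is centered or trailing). The function names are hypothetical.

```python
def ensemble_mean(station_values, min_stations=2):
    """Mean over stations reporting this month, or None if too few report."""
    valid = [v for v in station_values if v is not None]
    if len(valid) < min_stations:
        return None
    return sum(valid) / len(valid)


def moving_average(series, window=12):
    """Trailing moving average over `window` months, ignoring None entries.

    Returns None for the first window - 1 months and for windows
    that contain no valid data.
    """
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = [v for v in series[i + 1 - window:i + 1] if v is not None]
        out.append(sum(chunk) / len(chunk) if chunk else None)
    return out
```

Requiring at least two stations keeps a single noisy station from defining the ensemble value in months with sparse coverage.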
We also validate the SBUV partial ozone columns between the surface and 30 hPa with the corresponding values obtained from the ensemble of four northern midlatitude ozonesonde stations. Figure 11 shows time series of differences between SBUV and ozonesonde measurements. The thick black line shows the 12 month moving average. Mean biases for individual SBUV instruments are mostly positive and within ±5 %.
Results shown in Figs. 9-11 demonstrate that despite the limited SBUV vertical resolution in the lower stratosphere and troposphere, the mean biases for partial ozone columns obtained from SBUV instruments and from independent satellite and ground-based instruments are within ±6 %.

Results: drifts
We estimate possible drifts in the SBUV time series relative to independent ground-based and satellite measurements as described in Sect. 2.4.4. However, the short overlap periods between SBUV and reference instruments, degradation of satellite instruments and multiple adjustments applied to ground-based instruments make it difficult to confidently estimate drifts. Three SBUV instruments (N9, N11 and N14) made measurements on both the ascending and descending modes of their orbit. DeLand et al. (2012) show that instrument behavior often appears to change after crossing the terminator. Thus we estimate drifts for ascending and descending modes separately.

Drifts relative to satellite instruments
For drift calculations we require at least a 24 month overlap between two time series. We estimate drifts relative to Aura MLS for the three recent instruments N16, N17 and N18. The N16 record starts drifting notably after mid-2007 (see Fig. 3) when the equatorial crossing time of the orbit passes 16:00. Thus here we show drifts for N16 only up to mid-2007. We evaluate drifts for N9 descending and N14 ascending relative to both SAGE II and UARS MLS and drifts for N7, N9 ascending, N11 ascending and descending relative to SAGE II. N11 ascending and descending drifts relative to UARS MLS are not evaluated. N11 ascending data do not have sufficient overlap because of data loss after the eruption of Mt. Pinatubo (1991-1992) and limited spatial coverage as the N11 orbit approaches the terminator (1994-1995). N11 descending drift estimates are also not computed because the overlap is in the period after mid-1997, when UARS MLS data quality is reduced and the data should therefore be used with caution in trend analyses (Livesey et al., 2003). However, we use UARS MLS data after 1997 when computing drifts for N14 ascending to increase the statistical significance of the results.
Figure 12 shows drifts for individual SBUV instruments relative to independent satellite measurements as a function of latitude at two layers. Vertical bars indicate two times the standard deviation of the slope. Hereafter, we consider a drift to be significant where the drift is different from zero at the 2σ level. Various colors correspond to individual SBUV instruments. Drifts relative to the MLS instruments are mostly within ±1 % yr⁻¹, except for N9 descending (Fig. 12a). N17 and N18 have the smallest drifts (mostly less than ±0.3 % yr⁻¹) among all SBUV instruments. Drifts are slightly larger for N16 (up to 1 % yr⁻¹) due to the shorter overlap period with Aura MLS observations (October 2004-June 2007). We estimated drift for the whole N16 record (results are not shown here) and found drifts varying from −2.5 to +2 % yr⁻¹. N14 has drifts up to ±1 % yr⁻¹ and the vertical structure of the drifts is robust across all latitude bands (see Fig. S18 in the Supplement). N9 descending has larger drifts in the tropics, up to +2 % yr⁻¹ between 16 and 6.4 hPa, and drifts of more than −1 % yr⁻¹ in the 25-16 hPa layer (see Fig. S18 in the Supplement).
Drifts are larger relative to SAGE II. There are several factors that contribute to this: trends in the temperature records used to convert SAGE profiles from altitude to pressure scale; short overlapping time periods; and sparse SAGE sampling. N7 drifts are less than 1 % yr⁻¹ everywhere except the 1.6-1 hPa layer, where drifts are slightly larger. N7 has small but significant negative drifts relative to SAGE II in both northern and southern midlatitudes between 16 and 6 hPa, and positive drifts above 2.5 hPa (see Fig. S18 in the Supplement). In the tropics the N7 drifts are significant and positive above 6 hPa. Drifts for N9 ascending are larger over the tropics, with ascending mode drifts of +2-3 % yr⁻¹ between 25 and 16 hPa and −3 % yr⁻¹ between 6 and 2.5 hPa. Drifts in the descending mode of N9 measurements are up to −1 % yr⁻¹ at 25-16 hPa, +2 % yr⁻¹ between 10 and 6 hPa and about +1 % yr⁻¹ above 1.6 hPa for all latitudes (see Fig. S18 in the Supplement). We estimate drifts for the descending mode of N9 relative to both SAGE II and UARS MLS and find consistent results. The drifts for N11 ascending are mostly less than 0.5 % yr⁻¹ throughout the considered altitude range. However, we found large and significant drifts during the descending portion of N11, with the largest drifts over the tropics. N11 descending drifts vary from up to −1.5 % yr⁻¹ between 25 and 10 hPa to +2 % yr⁻¹ between 6 and 1.6 hPa. We also found statistically significant drifts in the N14 ascending mode measurements in all latitude bands relative to both SAGE II and UARS MLS. N14 has larger drifts relative to SAGE than relative to UARS MLS (see Fig. S18 in the Supplement). The N14 drifts relative to SAGE II exceed −1 % yr⁻¹ between 6.4 and 2.5 hPa.

Figure 13 shows drifts for SBUV instruments relative to independent satellite measurements as a function of altitude for the wide latitudinal band 50° S-50° N. The drifts for N7 and N11 ascending are less than 0.5 % yr⁻¹ (Fig. 13a, b) and mostly insignificant. The drifts of the three recent SBUV instruments relative to Aura MLS (Fig. 13d) are also less than 0.3-0.5 % yr⁻¹ and mostly insignificant. Both portions of N9, N11 descending and N14 ascending have larger drifts that exceed ±1 % yr⁻¹. Drift estimations relative to UARS and SAGE for N9 descending and N14 ascending are consistent. The vertical structure of the N14 drift is robust for all latitudes with positive drifts between 25 and 10 hPa and negative drifts (up to −1.2 % yr⁻¹) between 6 and 2.5 hPa.

Drifts relative to ground-based instruments
Our analysis demonstrates that records from some stations have obvious time-dependent changes (see Supplement, Figs. S2-S4). Thus we choose only instruments that show no signs of such temporal changes to estimate drifts. We evaluate SBUV drifts relative to lidars at Haute Provence, Mauna Loa and Lauder. We exclude Table Mountain due to changes above 4 hPa that occurred after the upgrade of the lidar in 2001. We use records from four Umkehr stations: Arosa, Lauder, Haute Provence and Boulder; we exclude Mauna Loa Umkehr due to significant stray light problems and Belsk due to missing measurements during winter. We relax the overlap criterion to 18 months for the ascending and descending portions of N9 and N11 to account for shorter overlap periods with ground-based data.
Figure 14 shows vertical profiles of the mean drifts for each SBUV instrument relative to each type of ozone profiler.
The mean drifts are calculated as the mean of regressions at all considered stations weighted by the corresponding error. Drifts relative to all ground-based instruments are shown in the Supplement (Figs. S15-S17). The top panel shows the drifts over the time period 2000-2011 and the bottom panel shows drifts over the time period 1985-1999. As before, the three recent instruments have smaller drifts. In particular, N17 has proved to be a stable instrument with drifts mostly within ±0.2 % yr⁻¹. The slightly larger drifts for N18 are insignificant, most likely due to the shorter overlap periods. Generally, drifts for N16 and N18 are within ±0.5 % yr⁻¹ and never exceed ±1 % yr⁻¹. The magnitude and vertical structure of drifts for N16-N18 relative to ground-based microwave and lidars are consistent with the drifts estimated relative to Aura MLS. Drifts in the N11 ascending mode and N7 data relative to ground-based Umkehr observations are less than ±1 % yr⁻¹. We estimate drift in N4 relative to measurements from the Umkehr instrument at Arosa (see Supplement, Fig. S17) and find that drifts are statistically insignificant. However, this is not an independent measure because the N4 radiance calibration was made relative to Arosa measurements (Bhartia et al., 2012). We detected rather large drifts (more than ±1 % yr⁻¹) for ascending and descending modes of N9 and N14 and for the descending mode of N11 relative to ground-based instruments, which is consistent with the satellite comparison results. We found a good correspondence between the value and shape of drifts for N14 ascending detected by the ground-based instruments and the drifts estimated relative to SAGE II and UARS MLS.
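A possible form of the error-weighted mean drift, together with the 2σ significance criterion used for the drift estimates, is sketched below. The weighting scheme is an assumption: the text states only that station regressions are weighted by the corresponding error, and inverse-variance weights (1/σ²) are used here as one common choice. The function names are hypothetical.

```python
import math


def weighted_mean_drift(estimates):
    """Inverse-variance weighted mean of station drifts.

    estimates: list of (drift, sigma) pairs, one per station, with sigma > 0.
    Returns the weighted mean and its standard error.
    """
    weights = [1.0 / (sigma ** 2) for _, sigma in estimates]
    total = sum(weights)
    mean = sum(w * drift for w, (drift, _) in zip(weights, estimates)) / total
    return mean, math.sqrt(1.0 / total)


def is_significant(drift, sigma, k=2.0):
    """True when the drift differs from zero at the k-sigma level."""
    return abs(drift) > k * sigma
```

Under this weighting, stations with long overlaps and small slope errors dominate the mean, while poorly constrained stations contribute little.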

Conclusions
We validate SBUV monthly zonal mean profiles from Nimbus 4 and 7 and NOAA 9, 11, 14, 16, 17 and 18 against independent satellite and ground-based profile measurements between 50° S and 50° N. We validate SBUV profiles in the vertical range between 25 and 1 hPa, where SBUV has the best vertical resolution and smallest smoothing error. We are looking for differences that are consistent across many comparisons. Such differences characterize the accuracy of the SBUV retrievals.
Relative to independent satellite measurements, SBUV biases for monthly zonal mean profiles never exceed ±10 % and are mostly within ±5 %. Standard deviations of the differences with Aura MLS in the wide latitude band between 50° S and 50° N are about 1-2 %, while for comparisons with SAGE II and UARS MLS standard deviations range from 3 to 4 %, likely due to poorer sampling and lower quality of SBUV instruments in the 1990s (N9, N11 and N14). We found negative biases between 25 and 10 hPa for all SBUV instruments (on average between −2 and −8 %) relative to all independent satellite instruments, meaning that SBUV underestimates ozone amounts below the ozone peak relative to SAGE and MLS measurements. We also found biases of −4 to −5 % between 1.5 and 1 hPa.
The biases and standard deviations are slightly larger when comparing SBUV time series to data from ground-based stations. Differences relative to ground-based lidar and microwave instruments are mostly within ±7 % and occasionally increase to 12 %. The corresponding standard deviations are mostly within 2-6 %. The vertical structure of biases for N16, N17 and N18 is very robust and similar for ground-based microwave and lidar instruments and satellite comparisons, with negative biases between 25 and 10 hPa and above 2.5 hPa and positive biases between 10 and 4 hPa. A very similar shape for biases was found for the ascending portion of N14.
Comparisons with Umkehr instruments demonstrate consistent positive biases for all locations, suggesting that Umkehr underestimates ozone amounts. Differences were as much as 10-15 % between 8 and 4 hPa. These biases are most likely due to problems with the Umkehr retrievals.
We validate ozone amounts in the broad layers between 250 and 25 hPa and 250 and 16 hPa against the corresponding values obtained from Aura MLS. We detect slightly negative biases (−1 to −2 %) in the extratropics and up to −6 % in the tropics. The standard deviations in the extratropics are about 1-2 % for both layer combinations and decrease from 3 % down to 1 % over the tropics when layers up to 16 hPa are combined together. Comparisons of ozone amounts between the surface and 30 hPa relative to corresponding values obtained from an ensemble of Umkehr instruments and ozonesonde stations in the northern midlatitudes show mean biases of less than ±5 %.
We evaluate drifts for individual SBUV instruments relative to ground-based and satellite instruments. We calculate relative drifts separately for ascending and descending orbit modes. N17 is very stable with drifts mostly less than 0.3 % yr⁻¹. N7, N16, N18 and the ascending portion of N11 show stable behavior as well with relative drifts of less than ±1 % yr⁻¹ everywhere, and mostly less than ±0.5 % yr⁻¹. Larger and significant drifts (more than ±1 % yr⁻¹) are detected for the ascending and descending portions of N9 and N14 and the descending portion of N11.
The results of this validation work are used to select data for the SBUV merged ozone data set (Frith et al., 2012). We find that N7 and N16, N17 and N18 have well-characterized biases of less than ±5 % and drifts of less than ±0.5 % yr⁻¹. These instruments collectively cover two time periods from 1979 to 1990 and from 2000 to the present. Data from N9, N11, and N14 have larger variations compared to independent data sets and compared to each other. Thus choosing one instrument over another can make a substantial difference in the merged product, indicating a larger uncertainty during this period in the merged time series, regardless of which data are included. We find large biases and drifts in both N9 ascending and descending mode data. Conversely, long-term drifts in the ascending orbit portion of N11 are mostly less than ±0.5 % yr⁻¹, making these data preferable to N9 data when extending the record into the 1990s. In 1995, as N11 enters the terminator, the N14 data become available. Though we find significant drifts in the N14 ascending mode data, the vertical structure of the drift is well-characterized and can be accounted for in the merged data set uncertainties. Limited validation of N4 ozone profiles does not allow us to draw specific conclusions, but results suggest that N4 data can reasonably be used to extend the SBUV data set back to the 1970s.

Fig. 1. Equator crossing times of the SBUV instrument series as a function of time. The SBUV data set includes measurements obtained from the Nimbus-4 BUV instrument, the Nimbus-7 SBUV instrument, and the series of SBUV/2 instruments on board NOAA satellites 9, 11, 14, 16, 17, 18, and 19. The orbital properties of each satellite vary. In general, measurements taken within the 08:00-16:00 ECT range are less noisy. Periods of operation for SAGE II, UARS MLS and Aura MLS are denoted at the bottom of the figure.

Fig. 3. Time series of differences for individual SBUV instruments relative to SAGE, UARS and Aura MLS for three latitude bands (30-50° S, 20° S-20° N and 30-50° N) and four layers between 25 and 1 hPa. SAGE measurements overlap with N7, N9, N11 and N14 over the period 1984 to 1999; UARS MLS overlaps with N9, N11 and N14 over the period 1993 to 1999; and Aura MLS overlaps with N16, N17 and N18 over the period 2004 to 2011. Colors correspond to individual SBUV instruments. Differences relative to SAGE are marked as crosses, relative to UARS MLS as triangles and relative to Aura MLS as filled circles.

Fig. 4. Mean biases for individual SBUV instruments as a function of latitude for four layers between 25 and 1 hPa. Colors correspond to individual SBUV instruments. The vertical bars indicate standard errors of the biases (σ/√n). (a) Biases relative to UARS MLS for N9, N11 and N14 and relative to Aura MLS for N16, N17 and N18. (b) Biases relative to SAGE II data for N7, N9, N11 and N14.

Fig. 5. Vertical profiles of mean biases relative to (a) SAGE II, (b) UARS MLS and (c) Aura MLS for the latitude band 50° S-50° N. Colors correspond to individual SBUV instruments.

Fig. 6. Vertical profiles of mean biases for individual SBUV instruments relative to coincident ground-based microwave measurements at (a) Mauna Loa and (b) Lauder. Colors correspond to individual SBUV instruments.

Fig. 9. Biases (top) and standard deviations (bottom) for N16, N17 and N18 relative to Aura MLS in the integrated ozone layers 250-25 hPa (solid lines) and 250-16 hPa (dotted lines) as a function of latitude.

Fig. 13. Vertical profiles of the drifts for individual SBUV instruments relative to independent satellite measurements for the wide latitudinal band 50° S-50° N. The horizontal error bars indicate two times the standard deviation of the slope. Colors correspond to individual SBUV instruments, and dashed lines correspond to the descending modes. (a) Drifts for N7 and ascending and descending modes of N9 relative to SAGE II; (b) drifts for ascending and descending modes of N11 and ascending mode of N14 relative to SAGE II; (c) drifts for descending mode of N9 and ascending mode of N14 relative to UARS MLS; (d) drifts for N16, N17 and N18 relative to Aura MLS.

Fig. 14. Vertical profiles of mean drifts for individual SBUV instruments relative to each type of ground-based instrument: (a, d) microwave spectrometers; (b, e) lidars; (c, f) Umkehr instruments. The mean drifts are calculated as the mean of regressions at all considered stations weighted by the corresponding standard deviations. The horizontal error bars indicate the 2σ standard deviation of the slope. Colors correspond to individual SBUV instruments, and dashed lines correspond to descending modes. The top row shows drifts over the time period from 2000 to 2011 and the bottom row over the period from 1985 to 1999.
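The station-weighted mean drift described in the caption of Fig. 14 can be sketched as follows. The caption does not specify the exact weighting, so this example assumes inverse-variance weights (1/σ²) built from the standard deviation of each station's regression slope; the function name and the station values are hypothetical.

```python
import math


def weighted_mean_drift(slopes, sigmas):
    """Combine per-station drifts into one mean drift.

    slopes : drift at each station (% per year)
    sigmas : standard deviation of each station's slope (% per year)
    Assumes inverse-variance weighting, which is one common reading of
    "weighted by the corresponding standard deviations".
    Returns (weighted mean drift, standard deviation of that mean).
    """
    if len(slopes) != len(sigmas) or not slopes:
        raise ValueError("slopes and sigmas must be non-empty and equal length")
    weights = [1.0 / (s * s) for s in sigmas]
    weight_sum = sum(weights)
    mean = sum(w * b for w, b in zip(weights, slopes)) / weight_sum
    sigma_mean = math.sqrt(1.0 / weight_sum)
    return mean, sigma_mean


# Hypothetical drifts at three stations for one instrument and one layer.
station_slopes = [0.2, 0.4, -0.1]
station_sigmas = [0.1, 0.1, 0.3]
mean_drift, mean_sigma = weighted_mean_drift(station_slopes, station_sigmas)
print(f"mean drift = {mean_drift:.2f} +/- {2.0 * mean_sigma:.2f} % per year (2 sigma)")
```

With this choice, stations with poorly constrained slopes contribute little to the mean, and the 2σ bar of the combined drift narrows as more stations are included.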

Table 1. Overlapping time periods for individual SBUV instruments with a variety of ground-based instruments (mm/yyyy).