Intercomparison of slant column measurements of NO2 and O4 by MAX-DOAS and zenith-sky UV and visible spectrometers

Abstract. In June 2009, 22 spectrometers from 14 institutes measured tropospheric and stratospheric NO2 from the ground for more than 11 days during the Cabauw Intercomparison Campaign of Nitrogen Dioxide measuring Instruments (CINDI), at Cabauw, NL (51.97° N, 4.93° E). All visible instruments used a common wavelength range and set of cross sections for the spectral analysis. Most of the instruments were of the multi-axis design with analysis by differential spectroscopy software (MAX-DOAS), whose non-zenith slant columns were compared by examining slopes of their least-squares straight line fits to mean values of a selection of instruments, after taking 30-min averages. Zenith slant columns near twilight were compared by fits to interpolated values of a reference instrument, then normalised by the mean of the slopes of the best instruments. For visible MAX-DOAS instruments, the means of the fitted slopes for NO2 and O4 of all except one instrument were within 10% of unity at almost all non-zenith elevations, and most were within 5%. Values for UV MAX-DOAS instruments were almost as good, being 12% and 7%, respectively. For visible instruments at zenith near twilight, the means of the fitted slopes of all instruments were within 5% of unity. This level of agreement is as good as that of previous intercomparisons, despite the site not being ideal for zenith twilight measurements. It bodes well for the future of measurements of tropospheric NO2, as previous intercomparisons were only for zenith instruments focussing on stratospheric NO2, with their longer heritage.


Introduction
UV-visible spectrometers that observe scattered sunlight provide the simplest method for routine remote sensing of NO2 from the ground. By observing sunlight scattered from the zenith sky, they originally determined the total vertical amount of NO2, weighted to the stratospheric amount (Brewer et al., 1973; Noxon, 1975). The spectrum is analysed by least-squares fits of laboratory cross-sections, after spectral filtering to eliminate slowly-varying spectral features: the so-called Differential Optical Absorption Spectroscopy (DOAS) method (Platt et al., 1979; Platt and Stutz, 2008).
More recently, observations of the sky at several elevations between horizon and zenith have allowed tropospheric NO2 in polluted regions to be distinguished from the stratospheric NO2: the so-called Multiple Axis or MAX-DOAS method illustrated in Fig. 1 (Hönninger and Platt, 2002; Hönninger et al., 2004; Wittrock et al., 2004). Because the path to the last scattering point is confined to lower altitudes at elevations close to the horizon, measurements at several elevations down almost to the horizon yield information about the vertical profile of the absorber within the troposphere. The number of MAX-DOAS instruments for NO2 deployed worldwide has grown considerably in recent years. This increasing use of MAX-DOAS instruments for tropospheric observations, together with the diversity of their designs and operation protocols, created the need for a formal intercomparison to include as many different instruments as possible.
The Cabauw Intercomparison Campaign of Nitrogen Dioxide measuring Instruments (CINDI) described here was held under the auspices of the European Space Agency (ESA), of the International Network for Detection of Atmospheric Composition Change (NDACC), and of the EU Framework 6's ACCENT-AT2 Network of Excellence and GEOMON Integrated Project. ESA promotes accuracy of ground-based measurements that can be used for satellite validation; NDACC promotes excellence in measurements of atmospheric composition; and GEOMON has been responsible for maintaining and developing networks of ground-based remote sensors, in support of the preparation of the GMES Atmospheric Service.
One component of ensuring high quality of measurements is to compare instruments and analyses when measuring and analysing identical fields, and NDACC holds intercomparisons of relevant instruments and analysis techniques from time to time. So far for NO2, only stratospheric measurements have been intercompared (Lauder, New Zealand, in 1992 by Hofmann et al., 1995; Camborne, UK, in 1994 by Vaughan et al., 1997; OHP, France, in 1996 by Roscoe et al., 1999; and Andoya, Norway, in 2003 by Vandaele et al., 2005). Here we present results from the first intercomparison of MAX-DOAS as well as zenith-sky ground-based remote sensors of NO2.
Fig. 1. The principle of MAX-DOAS measurements (Multiple Axis, i.e. elevation scanning, with DOAS spectral analysis): the stratospheric paths at low elevation and zenith are almost identical at low solar zenith angles. Hence if a spectrum at lower elevation is divided by a zenith sky spectrum, the result of the subsequent spectral analysis is only sensitive to the tropospheric absorber amount.
The interest of ESA is stimulated by the ability of recent atmospheric chemistry nadir sensors such as GOME, SCIAMACHY, OMI and GOME-2 to measure tropospheric NO2. More such instruments are planned, for example the GMES Sentinel 4 and 5 missions, and the GMES Sentinel 5 Precursor (to be launched in 2014). Validation of tropospheric composition measurements from space is crucial because of the typically large uncertainties in retrievals that rely on a-priori knowledge of surface properties, cloud and aerosol effects and the vertical distribution of the measured trace gas. Furthermore, tropospheric NO2 measurements from nadir UV-visible sensors show little or no vertical discrimination beyond correction for the stratospheric contribution, and are therefore limited to total tropospheric amounts. Hence surface in-situ measurements are not necessarily useful for validation; instead, validation demands a technique that can deliver the mean concentration throughout the troposphere, for which the elevation scanning of MAX-DOAS measurements is ideal.
In fact, elevation scanning allows two to three pieces of independent vertical information to be retrieved, a subject that will be explored in a companion paper that will intercompare vertical profiles, retrieved using different inversion programs and/or different data sets. An important aspect of the retrieval is that the weighting functions are strongly dependent on the aerosol profile, which can be determined from measurements of the oxygen dimer O4, which has a well known vertical profile and several prominent absorption bands in the UV and visible. However, this introduces the need for accurate O4 measurements, hence they are included in this intercomparison exercise.
Measurements in the UV part of the spectrum of NO2 are not particularly useful for measurements in the stratosphere as the absorption cross-section is smaller and the light intensity lower in the UV, leading to overall reduced sensitivity. However, for tropospheric MAX-DOAS observations the situation is different as UV measurements have a different set of vertical weighting functions and a very different sensitivity to aerosol. Measurements in both visible and UV regions therefore improve the vertical information content of MAX-DOAS measurements, as well as supplying redundancy for quality control. In other situations where only one MAX-DOAS instrument can be operated, the UV is sometimes chosen because other important tropospheric gases can only be measured there (e.g. BrO, HCHO). Hence we also include UV measurements of NO2 and O4 in this intercomparison.
Compared to earlier intercomparison campaigns dealing with stratospheric observations of the zenith sky at twilight, measurements of tropospheric NO2 by MAX-DOAS face different challenges:
1. Clouds interfere strongly with observations close to the horizon. They change the observed intensity, and they change the average light path and so the expected slant column. They also change the signal from interfering gases.
2. The expected temporal and spatial variability of tropospheric NO2 is large, which calls for a high measurement repetition rate or exact synchronisation of measurements.
3. The need for good temporal resolution, together with the need for observation at different elevation angles, reduces the time available for individual measurements, which tends to reduce signal to noise ratios. On the other hand, measurements are taken during full daylight rather than twilight, which tends to increase signal to noise ratios.
4. The large change in sensitivity with elevation angle results in a strict requirement for pointing accuracy, unlike measurements of the zenith sky where pointing accuracy is not an issue.
5. To ensure good agreement between measurements from different instruments, in spite of the horizontal variability of NO2, good alignment in the viewing azimuth is also needed.

The intercomparison campaign
The campaign took place at Cabauw (latitude 51.97° N, longitude 4.93° E, at sea level) at KNMI's Cabauw Experimental Site for Atmospheric Research (CESAR), in The Netherlands (see Fig. 2). This location was chosen because of its unobstructed views close to horizontal at many azimuth angles, its large variability in tropospheric NO2, the absence of local pollution sources, good local support because of its closeness to KNMI headquarters in De Bilt near Utrecht, and its tower of height over 200 m. The same site has been used for two previous MAX-DOAS campaigns focusing on validation of satellite observations (Brinksma et al., 2008; Hains et al., 2010). The Cabauw site has a large suite of meteorological instruments deployed continuously, specialising in the boundary layer. The tower has wind, pressure and temperature instruments at various heights, NO2 is sampled in situ close to the base of the tower, the wind profiling radar at its own site determines winds throughout the troposphere, and there is a cloud lidar at the site of the roof-top deployments. The site has a Total Sky Imager and a CT75 Ceilometer, and is a certified BSRN irradiance measurement station and a certified AERONET aerosol measurement station. Some additional instruments were assembled for the campaign: an NO2 lidar with elevation scanning and an aerosol lidar were deployed at the roof-top site, extra in-situ NO2 instruments were operated on the ground and near the top of the tower, and some novel NO2 sondes were flown on balloons. Results from these profiling instruments will be compared to retrieved profiles from the MAX-DOAS measurements in a companion paper.
Fig. 3. Some of the roof-top instruments deployed at the campaign site, in views looking to the south (upper) and to the west (lower). Some other instruments were on the ground to the west of these containers; four other instruments were on the tower 370 m to the right of the lower picture; and a further three instruments were at the wind profiler site 160 m to the right of the upper picture.
The intercomparison campaign took place in June and July 2009. Instruments (see Fig. 3) were installed and tested between 8 and 14 June; the formal semi-blind intercomparison was from 15 to 30 June inclusive (16 days); extra measurements of various kinds were continued by some instruments until 24 July. During the formal intercomparison, most of the instruments were measuring most of the time, the maximum data absent from any one instrument being 4 days. Weather conditions were mixed, with frequent changes in cloud cover, some rainy periods, and some early-morning mist. There were five days with exceptionally clear skies throughout the morning: 18, 23, 24, 25 and 30 June. Special attention will be paid to their results because we might expect less scatter then, due to the absence of clouds passing overhead; and because they will be the important data sets for use in companion papers exploring profiling methods.
The instruments participating in the campaign not only differ in design, but also in the way they are normally operated. Some scan from the horizon at close intervals in elevation, others take measurements at a smaller number of elevations. Some instruments also vary the azimuth angle, to investigate horizontal variability and to better constrain the aerosol profiles. Some instruments are also capable of direct sun observations. To ensure comparability of the measurements, a set of minimum requirements was defined which had to be performed by all instruments. This included measurements at elevations of 2°, 4°, 8°, 15°, 30° and the zenith, all to be performed within a maximum of half an hour. All instruments were oriented to an azimuth of 287° (north-west). For the intercomparison, only measurements with Solar Zenith Angle (SZA) less than 80° were used. Some instruments performed measurements at additional elevation and/or azimuth angles, but these were not part of the formal intercomparison.
Following the precedent set by Roscoe et al. (1999) and adopted by Vandaele et al. (2005), the intercomparison protocol was semi-blind:
a. Measurement and analysis results from the previous day had to be provided to the referee (HKR) by 10 a.m. At the daily meeting in the early afternoon, slant columns measured the previous day were displayed without assignment to the different instruments.
b. The referee notified instrument representatives if there was an obvious error so that it could be corrected immediately.
c. At the end of the formal campaign, plots had instrument names attached, and plots of mean differences from one instrument were discussed.
[...] of an easily corrected error. For example, after the second day it was clear that at least one instrument had elevation angles that were wrong by about 1° (see below). Without correction, this would have been particularly frustrating as the elevation sampling was at 2° intervals, so that measurements at the adjacent nominal elevation could not simply be substituted.
For instruments observing sunlight, it is important to divide the measurement spectrum by a reference spectrum, in order to eliminate fine structure in the solar spectrum (Fraunhofer lines). The result of the subsequent spectral fit is then the difference in slant amounts of absorber between the measurement and reference spectrum. This quantity, sometimes called the "Differential Slant Column Density", is what we hereafter call simply the "slant column". For MAX-DOAS measurements focussing on tropospheric NO2, the best approach is to divide by a reference spectrum containing the same amount of stratospheric NO2, which would be the zenith measurement during each elevation scan. Unfortunately the scans by instruments in this campaign could not be synchronised to each other, so the resulting slant amounts being observed with such a choice of reference could be different for each instrument, because of the temporal variability in tropospheric NO2. We therefore chose to use as a reference the spectrum at zenith near local noon. Instrument scientists were encouraged to allocate at least half an hour for measurement of reference spectra, so that a spectrum could be selected without broken cloud passing the field of view, important because cloud significantly alters the O4 and tropospheric NO2 amounts.
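As a compact restatement of the paragraph above, the standard DOAS relation can be written as below. This is a sketch in notation that is not taken from the paper: I is the measured spectrum, I_ref the reference spectrum, sigma_i the cross-section of absorber i, S_i its slant column, and P a low-order polynomial standing in for the slowly-varying spectral features. The campaign's actual fit settings are those of Table 2.

```latex
\ln\frac{I_{\mathrm{ref}}(\lambda)}{I(\lambda)}
  \approx \sum_i \sigma_i(\lambda)\,\Delta S_i + P(\lambda),
\qquad
\Delta S_i = S_i - S_{i,\mathrm{ref}}
```

The fitted quantity is the difference of slant columns, so any absorber amount common to both spectra (here, ideally, the stratospheric NO2) cancels.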

Instruments
In total, 22 instruments from 14 institutes participated in the campaign. Table 1 shows that instruments observed over a variety of differing wavelength ranges. However, NO2 and O4 were mostly analysed over a wavelength interval from 425 to 490 nm in the visible, or from 338 to 370 nm in the UV. An exception was MPI-Mainz, which in the visible could only analyse from 420 to 450 nm, and so could not provide a useful visible O4 value.
Most instruments had a field of view (Table 1) that would not permit seeing the horizon even at the lowest elevation angle of 2°, except in case of significant elevation errors (see below). That of Toronto at 2° was much the largest: it was designed as a zenith sky instrument, and only modified to MAX-DOAS elevation scanning in the run-up to the campaign. A uniform set of cross sections and other parameters was used for spectral analysis, as listed in Table 2. Cross sections were all at room temperature except ozone. This is justified by the dominance of tropospheric absorption features in lower-elevation spectra when using a zenith-sky measurement as the reference spectrum. We note that for quantitative analysis at large SZA, either corrections would be needed to account for the low temperature at which stratospheric NO2 absorbs, or a reference from the same SZA must be used rather than noon.
Accuracy of the elevation of MAX-DOAS instruments can be a severe problem, as air mass factors change considerably with small changes in elevation when within one or two degrees of the horizon. Most groups aligned their instrument via an external reference surface set to horizontal using a spirit level (most spirit levels are accurate to 0.02° or better). In many cases this was during an operating point in the elevation scan, which was then adjusted via software.
Unfortunately several instruments had significant backlash in the scanning mechanism, which became clear by the third day of the campaign when 2°-elevation values differed significantly from other instruments whereas 8° agreed well. For most instruments, a dark horizon of trees was visible, whose non-zero elevation could be calculated within 0.05° from visual observation and dead reckoning. On a day with bright cloud, the dark horizon could be scanned to determine its apparent elevation, thereby finding the error in elevation angle. Some instruments were as much as 1° in error in their earlier setting of horizontal.
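The horizon-scan check described above can be sketched as follows. The paper does not say how the apparent horizon was extracted from a scan, so the half-way intensity criterion used here is an assumption, as are the function names and the sample numbers.

```python
def apparent_horizon(elevations, intensities):
    """Nominal elevation (deg) at which the intensity crosses the level midway
    between the dark horizon and the bright cloud (assumed edge criterion).

    elevations must be in increasing order; intensities are in relative units.
    """
    half = 0.5 * (min(intensities) + max(intensities))
    for i in range(1, len(elevations)):
        lo, hi = intensities[i - 1], intensities[i]
        if lo < half <= hi:
            # Linear interpolation between the two samples that bracket the edge.
            frac = (half - lo) / (hi - lo)
            return elevations[i - 1] + frac * (elevations[i] - elevations[i - 1])
    return None  # no dark-to-bright edge found in this scan


def elevation_offset(elevations, intensities, true_horizon_deg):
    """Correction (deg) to add to nominal elevations so that the scanned
    horizon lands at its independently known elevation."""
    apparent = apparent_horizon(elevations, intensities)
    return None if apparent is None else true_horizon_deg - apparent


# Hypothetical scan: the tree line is known to lie at 0.25 deg elevation.
scan_elev = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
scan_int = [10.0, 10.0, 10.0, 30.0, 70.0, 90.0, 90.0]
offset = elevation_offset(scan_elev, scan_int, 0.25)
```

A negative offset means the instrument's nominal angles read too high. An instrument with backlash would need the scan repeated in both directions, which this sketch does not handle.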

MAX-DOAS results
Intercomparison of raw MAX-DOAS results between one instrument and another proved difficult because the measurements were not simultaneous, and because measurements at low elevations were often changing rapidly in response to variations in cloud and in NO2 concentration. Figure 4 shows an example, where a cloud at 15:20 UT caused a large increase in slant NO2, but the difference in sample times between BIRA and Bremen measurements resulted in a large difference in the apparent increase. The difference was reduced, though not in this case eliminated, by taking 30-min averages.
Taking 30-min averages also allowed us to use the mean of a set of instruments as a reference for the analyses, rather than just one instrument. An example analysis is shown in the straight-line fits in Fig. 5. The fits provide three types of information: the slope between the slant columns of each instrument and the reference, which should be close to 1; the intercept, which should be close to 0; and the scatter, which indicates the precision of the measurements, but also is influenced by the sampling issues discussed above. Firstly we made plots for the instruments, such as Fig. 5, but against one instrument arbitrarily chosen as Bremen, in order to make a preliminary assessment of the quality of their slant columns. The most consistent instruments (those with similar slopes to each other, with small intercepts and with small residuals) were then chosen for the reference set for straight line fitting, and their weighted 30-min average values were found. Instruments in the visible reference set were Bremen-Vis, BIRA-Vis, INTA-RASAS2, NASA, NIWA and Washington.
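The averaging and fitting procedure described above can be illustrated with a minimal sketch. It is not the campaign's analysis code: the function names are hypothetical, times are assumed to be decimal hours within a single day, and the reference is an unweighted mean, whereas the paper used weighted 30-min averages whose weights are not specified here.

```python
from collections import defaultdict
from math import sqrt


def half_hour_means(times_h, values):
    """Average samples into 30-min bins; times are decimal hours (UT)."""
    bins = defaultdict(list)
    for t, v in zip(times_h, values):
        bins[int(t * 2)].append(v)  # bin index = half hours since 00:00
    return {k: sum(v) / len(v) for k, v in bins.items()}


def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns slope, intercept, standard error of the slope and rms residual.
    Needs at least three points and non-constant x.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - slope * a - intercept) ** 2 for a, b in zip(x, y))
    slope_se = sqrt(ss_res / (n - 2) / sxx)
    return slope, intercept, slope_se, sqrt(ss_res / n)


def compare_to_reference(instrument, reference_set):
    """Fit one instrument's 30-min means against the mean of a reference set.

    instrument is (times_h, values); reference_set is a list of such pairs.
    """
    inst = half_hour_means(*instrument)
    refs = [half_hour_means(*r) for r in reference_set]
    # Keep only bins sampled by the instrument and by every reference instrument.
    common = sorted(k for k in inst if all(k in r for r in refs))
    x = [sum(r[k] for r in refs) / len(refs) for k in common]  # unweighted mean
    y = [inst[k] for k in common]
    return fit_line(x, y)
```

In this sketch a slope near 1, an intercept near 0 and a small rms residual correspond to the three types of information listed in the text; the standard error of the slope is what the error bars in plots such as Fig. 6 would represent.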

In order to facilitate comparison between all the instruments and all elevations, the slopes and standard errors in slope have been derived from fits similar to Fig. 5, using data from the whole time period of the formal intercomparison except those discarded because of elevation errors (see above). The results presented in Figs. 6 and 7 show that the means of the fitted slopes for NO2 and O4 of all except one instrument in the visible were within 10% of unity at almost all non-zenith elevations, and most were within 5%. The small values for the standard errors of the slopes show that these differences of slopes from unity are highly significant.
As mentioned above, adjustments were made to some instruments and data sets early in the campaign, when the referee detected obvious inconsistencies in the values submitted. In addition, revised values were used for Leicester, whose semi-blind results showed significant disagreement, with slopes smaller than 0.8 for NO2. The fault was analytical, and arose from fitting errors introduced by custom spectral fitting software under development by this group. Following publication of slant column intercomparisons, spectra were reanalysed by the Leicester group using BIRA's QDOAS software (a multi-platform derivative of WinDOAS), significantly improving agreement. Such algorithmic errors, which may remain undetected outside of an intercomparison campaign, demonstrate the importance of the availability of trusted common retrieval software such as WinDOAS for validation of developmental algorithms, and the importance of checking developmental software when an instrument is deployed alone.

Fig. 6. Straight-line slopes and their standard errors of NO2 slant columns against those of the reference data set, for each instrument at visible wavelengths and for the whole campaign. Colours refer to elevation angles shown top right. Note that MPI-Mainz used a non-standard wavelength range for spectral analysis because of the limited range of the instrument (see Table 1).
The question arises whether any of the differences in Fig. 6 are caused by interference from clouds. This seems unlikely given the small standard errors on the slopes. However, some part-days were almost entirely cloud free. These were especially useful for comparison of vertical profiles, but they also enabled a definitive answer to this question. The part-days were the mornings of 18, 23, 24, 25 and 30 June, and Fig. 8 shows a similar plot to Fig. 6 but on just those part-days. It shows that the differences between instruments in Fig. 6 are not due to interference from clouds, as much of the pattern of differences is the same in the two figures. Figure 8 also shows that the scatter within any one instrument is not caused by the increase in variability expected from partial cloud: the results in Fig. 8 are if anything more scattered, which might be expected from a smaller number of days sampled if the variability were similar on all days.
It is also important to distinguish between good average agreement over the whole campaign with the reference data set, and the error bar for an individual day's MAX-DOAS measurement as it would be used for satellite validation. Table 3 lists the standard deviations of daily fitted slopes, which are a measure of this latter error. The values in Table 3 are more consistent with the larger scatter in Fig. 8 than in Fig. 6. Excluding the most extreme cases, the standard deviations vary from 3 to 15%, with generally larger values at 30°, probably due to the reduced slant columns at a larger elevation. This day-to-day variability in the slopes results from the combined effects of instrumental noise and variability in pointing errors, together with the effect of the temporal mismatch between the measurements allied with the temporal variability in the NO2 concentrations. Because of elevation errors and other instrument faults, some instruments had a smaller sample than others. However, only Heidelberg sampled less than half the available days at 2° elevation, and half or more of the available days were sampled by all instruments at higher elevations (see Table 3).
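A statistic of the kind listed in Table 3 could be computed as in the short sketch below. It assumes one fitted slope per day is already available, that the sample (n - 1) standard deviation is intended, and that the percentage is expressed relative to a slope of unity; the paper does not state these details, and the slope values shown are invented for illustration.

```python
import statistics


def daily_slope_spread(daily_slopes):
    """Return the mean slope and the sample standard deviation of the
    daily fitted slopes, the latter expressed in percent of unity slope."""
    mean = statistics.mean(daily_slopes)
    spread_percent = 100.0 * statistics.stdev(daily_slopes)
    return mean, spread_percent


# Hypothetical daily slopes for one instrument at one elevation angle.
mean_slope, spread = daily_slope_spread([0.95, 1.00, 1.05])
```

Here the spread is the day-to-day uncertainty a user would attach to a single day's measurement, as distinct from the much smaller standard error of the campaign-mean slope.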
Another way to assess the quality of measurements is to examine the histograms of differences from the reference data set. Ideally, the histograms should be symmetric and Gaussian in shape. Asymmetry might result from a number of reasons, for example saturation of some spectra, or attribution of absorption to the wrong cross-section at small absorber amounts. Generally speaking, a large non-Gaussian tail to the distribution (especially if occurring at any elevation angle) implies poorer spectral fits in some circumstances. Asymmetries or shifts occurring mostly at the lowest elevation angles might be related to pointing inaccuracies or variabilities. Figure 9 shows that Leicester, MPI-Mainz, JAMSTEC and INTA-NEVA had non-Gaussian tails to some of their distributions; and JAMSTEC, Heidelberg, KNMI-2, BIRA, Washington and NIWA had asymmetric distributions at certain elevations. However, the figure does show symmetric near-Gaussian histograms for many other combinations of instrument and elevation angle.
Eight of the instruments in the campaign had the ability to measure NO2 in the UV, where light intensities are smaller, tropospheric light paths are shorter and sensitivity to aerosols is different. Figures 10 and 11 show that the means of the fitted slopes for NO2 and O4 of all except one instrument were within 12% of unity at almost all non-zenith elevations, and most were within 7%. Again, the small values for the standard errors of the slopes show that these differences of slopes from unity are highly significant. In contrast to the results from the visible instruments, the size of errors is strongly linked to elevation, with the largest errors at 30°, where signals are smallest. This indicates that in the UV, the error is probably dominated by the signal to noise ratio. The histograms (Fig. 12) show that most instruments have either asymmetric or non-Gaussian distributions of residuals at several elevations.
At the end of the formal intercomparison, Toronto modified their zenith-sky instrument to include MAX-DOAS viewing, simultaneously moving to UV wavelengths so as to also measure HCHO. Several other instruments also continued observations for some days, so a MAX-DOAS intercomparison that includes Toronto could be made. The results in Fig. 13 show that Toronto performed well, with a slope within 8% of unity at all elevations and within 5% of unity at most elevations.
Fig. 8. Straight-line slopes and their errors of NO2 slant columns against those of the reference data set, for each instrument at visible wavelengths, for clear sunny mornings only. Colours refer to elevation angles shown top right. Note the similarity to Fig. 6 in differences from unity slope for many instruments, but with larger scatter and errors because the number of data points is much smaller.
Atmos. Meas. Tech., 3, 1629-1646, 2010, www.atmos-meas-tech.net/3/1629/2010/

Zenith sky results near twilight
Although the focus of this intercomparison was on tropospheric observations, all instruments also performed zenith-sky measurements during twilight, when sensitivity to stratospheric absorbers is largest.
However, in comparison to instruments operated solely for stratospheric measurements, the frequency of measurements was reduced, as a large fraction of the time was used for low-elevation measurements. Operation was changed to zenith-sky only at about 80° SZA, but this threshold varied slightly between instruments, making the sampling of the time series highly variable.
Further, the technique used to compare MAX-DOAS measurements (straight-line fits to 30-min averages) cannot be used for zenith sky measurements because the slant amounts of NO2 change too rapidly during twilight. Hence we could not provide an average of several instruments to use as a reference for straight-line fitting. Instead, we chose one instrument with good sampling (INTA-RASAS2, see Fig. 14), and interpolated its values to the time of observation of each other instrument. Because INTA-RASAS2 was switched to UV observations later on 26 June, this limited the zenith-sky intercomparison to the period 15 June to 26 June.
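The interpolation step can be sketched as follows. Linear interpolation in time is an assumption (the paper does not state the interpolation scheme), the function names are hypothetical, and a real analysis would probably also refuse to interpolate across long gaps in the reference series.

```python
from bisect import bisect_left


def interpolate(ref_t, ref_v, t):
    """Linearly interpolate the reference series at time t.

    ref_t must be strictly increasing. Returns None outside its range.
    """
    if t < ref_t[0] or t > ref_t[-1]:
        return None
    i = bisect_left(ref_t, t)
    if ref_t[i] == t:
        return ref_v[i]
    t0, t1 = ref_t[i - 1], ref_t[i]
    v0, v1 = ref_v[i - 1], ref_v[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def pair_with_reference(ref_t, ref_v, inst_t, inst_v):
    """Pair each instrument sample with the reference value at the same time."""
    pairs = []
    for t, v in zip(inst_t, inst_v):
        r = interpolate(ref_t, ref_v, t)
        if r is not None:  # drop samples the reference does not cover
            pairs.append((r, v))
    return pairs


def fit_line(pairs):
    """Least-squares slope and intercept of instrument against reference."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx
```

Interpolating the densely sampled instrument onto the times of the sparser ones, rather than the reverse, keeps the interpolation intervals short while the twilight slant column is changing quickly.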
If the fitted slopes of the style shown in Fig. 15 were used without modification, then most values would be less than unity because the values from INTA-RASAS2 were generally a little larger than others. This would make it difficult to make a sensible statement about the level of agreement. Instead, we found the average of the slopes of all instruments and divided all slopes by this average, to produce the normalised values in Fig. 16 and Table 4. Figure 16 shows an excellent level of agreement: all instruments had slopes within 5% of the mean, thereby fulfilling the most important NDACC acceptance criterion for NO2 (see Roscoe et al., 1999; Vandaele et al., 2005). Although this level of agreement is similar to what was achieved in the last campaign, this is a great success here, as this campaign had no focus on twilight measurements, and Cabauw is not ideal for a stratospheric NO2 intercomparison because of the significant amounts of tropospheric NO2 on some days. As with the MAX-DOAS intercomparisons, revised values were used for the Leicester instrument, whose semi-blind results had shown significant disagreement.
Fig. 9. Histograms of the absolute deviations of visible measurements from the reference visible data set, for the whole campaign.
Another NDACC acceptance criterion is that the intercept should be within ±0.1 × 10^16 molec cm^-2. Table 4 shows that Heidelberg significantly exceeds the intercept limit, and Toronto and CNRS exceed it by small amounts. The relatively large intercepts obtained in this comparison (compared to previous exercises at clean sites like Lauder, OHP and Andoya) are to be expected. This is because at such a polluted site, reference spectra that cannot be perfectly synchronised will contain different amounts of NO2 because of its temporal variability in the troposphere. This is almost certainly the cause of the large intercept for Heidelberg, with its otherwise good performance, arising for their instrument by chance.
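The normalisation of the slopes and the two acceptance limits quoted in the text can be combined in a small sketch. Only the 5% slope limit and the 0.1 × 10^16 molec cm^-2 intercept limit come from the paper; the function names, instrument labels and numbers are placeholders, and the full NDACC protocol contains further requirements that are not modelled here.

```python
def normalise_slopes(slopes):
    """Divide each instrument's fitted slope by the mean slope of all instruments."""
    mean = sum(slopes.values()) / len(slopes)
    return {name: s / mean for name, s in slopes.items()}


def meets_criteria(norm_slope, intercept, slope_tol=0.05, intercept_tol=0.1e16):
    """Check the two limits quoted in the text: normalised slope within 5% of
    unity, and intercept magnitude no larger than 0.1e16 molec cm^-2."""
    return abs(norm_slope - 1.0) <= slope_tol and abs(intercept) <= intercept_tol


# Placeholder slopes against the reference instrument (labels are not real instruments).
raw = {"A": 0.95, "B": 0.97, "C": 0.93}
norm = normalise_slopes(raw)
ok_small_intercept = meets_criteria(norm["B"], 0.05e16)
ok_large_intercept = meets_criteria(norm["B"], 0.30e16)
```

Dividing by the mean slope removes the common offset caused by the reference instrument reading slightly high, so that the spread between instruments, rather than the choice of reference, determines who passes.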
The NDACC protocol also requests the measurement of slit function, polarisation and stray light. Measurement of slit function is requested via spectral lamps, but for this campaign most investigators used an analysis suite that determined the slit function via fitting to Fraunhofer lines in the spectra themselves, which can allow for changes during the campaign. Polarisation is not an issue for the majority of the instruments, which have a fibre between the input optics and the spectrometer. Of those with no fibre, Washington uses a wedge polariser and CNRS have instruments that have previously been accepted because of their negligible polarisation response. Finally, stray light is hardly an issue with modern spectrometers at the longer wavelengths analysed here.
Another NDACC requirement is to demonstrate the quality of the data, e.g. by showing the smoothness of a time series. Here we have taken alternative approaches, investigating the distribution of differences from a reference instrument (Fig. 17), and the root-mean-square residuals from a straight line fit to the reference instrument (Table 4). Table 4 shows that most instruments except Leicester have similar residuals, and Fig. 17 shows that this is due to a few of their differences being atypically large, thereby biasing the rms. Discarding these outliers, Leicester's histogram in Fig. 17 is similarly narrow to those of other instruments. Many histograms have some asymmetry, probably due to a dependence of the differences on SZA. For completeness, we also list in Table 4 the mean errors in the spectral fits, which are rather less than the residuals from the straight line fit, as might be expected.
To conclude, most instruments meet the zenith-sky criteria for endorsement by NDACC, with an important caveat about analysis software for Leicester (resolved by the use of the WinDOAS derivative QDOAS), and except for a strange distribution of differences from the Leicester instrument and [...]
While the agreement between the measurements from all the instruments is good, some points have been identified that are of particular relevance for MAX-DOAS observations:
1. Exact alignment of the elevation angle is of utmost importance, and probably should be checked on a regular basis (not relevant for instruments that include direct-sun capability, such as NASA, as alignment is then regularly confirmed). During the campaign, problems with pointing were detected for several instruments which would have gone unnoticed in normal operations.
2. Temporal variability in the tropospheric signals is large, and a high frequency of measurements is needed to arrive at representative results. For future intercomparison campaigns, synchronisation of measurements should be considered, as a significant part of the scatter is probably due to differences in measurement time.
3. The consistency of NO2 and O4 observations is good but not perfect, and their spread gives a useful indication of representative uncertainties to be assigned to these quantities when used in profile inversion.
[...] These are fits to twilight data from the whole campaign, hence the large density of measurements.
Fig. 16. Slopes, errors in slope, and intercepts, of straight line fits of each instrument's data to that of INTA, for the whole campaign, after the slopes from fits such as those of Fig. 15 were normalised by dividing by the mean of the slopes of all instruments.