Sea surface temperature datasets for climate applications from Phase 1 of the European Space Agency Climate Change Initiative (SST CCI)

Sea surface temperature (SST) datasets have been generated from satellite observations for the period 1991–2010, intended for use in climate science applications. Attributes of the datasets specifically relevant to climate applications are: first, independence from in situ observations; second, effort to ensure homogeneity and stability through the time‐series; third, context‐specific uncertainty estimates attached to each SST value; and, fourth, provision of estimates of both skin SST (the fundamental measurement, relevant to air‐sea fluxes) and SST at standard depth and local time (partly model mediated, enabling comparison with historical in situ datasets). These attributes in part reflect requirements solicited from climate data users prior to and during the project. Datasets consisting of SSTs on satellite swaths are derived from the Along‐Track Scanning Radiometers (ATSRs) and Advanced Very High Resolution Radiometers (AVHRRs). These are then used as sole SST inputs to a daily, spatially complete, analysis SST product, with a latitude‐longitude resolution of 0.05° and good discrimination of ocean surface thermal features. A product user guide is available, linking to reports describing the datasets' algorithmic basis, validation results, format, uncertainty information and experimental use in trial climate applications. Future versions of the datasets will span at least 1982–2015, better addressing the need in many climate applications for stable records of global SST that are at least 30 years in length.

Introduction
Sustained observations from satellites contribute vital knowledge to our understanding of Earth's climate and how it is changing. Sea surface temperature (SST) is an 'essential climate variable' (Global Climate Observing System (GCOS), 2011) that can be measured precisely by remote sensing and is central to understanding climate variability and change. Satellite and in situ SST measurements need to be used together to quantify marine change over many decades. Satellite SSTs both corroborate and challenge the quantification of global change available from the in situ ocean observing system, which is not in general designed to have the stability and traceability required to monitor climate (Kennedy, 2013). For these reasons, SST is one of the essential climate variables included in the European Space Agency's Climate Change Initiative (CCI; Hollmann et al., 2013), the project that generated the datasets described here.
SST is derived in near-real time from observations by sensors on meteorological satellites by several agencies globally, many of which co-ordinate activities through the Group for High Resolution SST (GHRSST, http://www.ghrsst.org). These near-real time data streams are essential to numerical weather prediction (e.g., Stark et al., 2007; Donlon et al., 2012), but are not necessarily optimized for applications in climate science. For climate science, it is necessary to optimize characteristics such as: stability of observation (a key component of 'homogeneity'), sensitivity of estimated SST to true SST variability, independence from other datasets, and comparability of data with in situ measurements by drifting buoys. Reprocessing of SST datasets for 1991-2010 with this climate focus has been the main purpose of the SST CCI project, whose outputs are presented in this article. These climate-quality issues are discussed (in the context of analysis of the 'ARC v1.1' dataset described below; see Table 2) in , and they are briefly reviewed below in the context of describing the SST CCI products.
The SST CCI outputs include two single-sensor datasets and a blended product. The description of these outputs below is structured as follows. First (section 1), we describe the dataset derived from the series of three Along Track Scanning Radiometers (ATSRs). This dataset is fundamental, since it provides the calibration reference for the other datasets. Second (section 2), we describe the dataset derived from Advanced Very High Resolution Radiometers (AVHRRs). The ATSR and AVHRR datasets are combined in the blended product described in section 3. The discussion in section 4 places these SST CCI outputs in the context of other available SST datasets. Key validation results and feedback from users are presented in section 5.

Retrieval, cloud-detection and uncertainty algorithms
The ATSRs have been well described in Llewellyn-Jones and Remedios (2012), and references therein. In common with the AVHRRs, they have channels sensitive to infra-red radiation emitted by the Earth's surface, centred around wavelengths of ~3.7, 11 and 12 μm. Because of high quality instrumental calibration and the robustness of retrieval available from their dual-view scanning geometry (e.g., , the ATSRs are able to support SST retrieval based on radiative transfer (RT) simulations (Merchant and Le Borgne, 2004), in contrast with the usual empirical methods applied to AVHRRs (e.g., Kilpatrick et al., 2001). Thus, the SST retrievals are not tuned to in situ measurements of SST, which is a unique feature. The importance of having an independent dataset for assessment of recent change and variability is discussed in .
A common approach for SST estimation is to form a weighted combination of two or more window-channel brightness temperatures (BTs). The weights are referred to as 'retrieval coefficients'. Such coefficient-based SST retrievals from ATSRs provide the stable reference for all SST CCI products. This retrieval scheme was developed in an earlier project, the ATSR Reprocessing for Climate (ARC), and extended to 2010 within SST CCI. The development of these coefficient-based SSTs is documented as follows: Embury et al. (2012a) present their basis in RT simulation; Embury and Merchant (2012) document the formulation of the coefficients; O. Embury and C. J. Merchant (submitted) describe the steps taken to harmonize the dataset across the three consecutive ATSR sensors; and Embury et al. (2012b) give the results of validation of the SSTs.
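A coefficient-based retrieval of this kind reduces to an offset plus a weighted sum of BTs, as in the minimal sketch below. The coefficient and offset values are hypothetical placeholders chosen to make the arithmetic visible; real ARC coefficients are derived from RT simulations and depend on, among other things, the channel set and viewing geometry.

```python
from typing import Sequence


def coefficient_sst(bts_k: Sequence[float], coeffs: Sequence[float], offset_k: float) -> float:
    """Return SST (K) as offset + sum_i a_i * BT_i.

    bts_k:  window-channel brightness temperatures in kelvin
            (e.g. 11 and 12 um, optionally 3.7 um, from one or both views).
    coeffs: one retrieval coefficient a_i per brightness temperature.
    """
    if len(bts_k) != len(coeffs):
        raise ValueError("need exactly one coefficient per brightness temperature")
    return offset_k + sum(a * bt for a, bt in zip(coeffs, bts_k))


# Hypothetical split-window coefficients (11 um, 12 um), for illustration only.
# 3*BT11 - 2*BT12 equals BT11 + 2*(BT11 - BT12): the BT difference, which grows
# with water-vapour absorption, supplies the atmospheric correction.
ILLUSTRATIVE_COEFFS = (3.0, -2.0)

sst = coefficient_sst((290.0, 289.0), ILLUSTRATIVE_COEFFS, offset_k=0.5)
print(f"{sst:.2f} K")  # 292.50 K
```

Because the weights sum to one, a uniform change in both BTs passes straight through to the retrieved SST, while the BT difference carries the atmospheric correction.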
The SSTs in CCI Phase 1 products are not the coefficient-based retrievals of ARC. Instead, they are obtained by a reduced-state-vector optimal estimation (OE) algorithm similar to Merchant et al. (2008), and described further in the SST CCI Algorithm Theoretical Basis Document (ATBD; . OE is not essential in order to obtain low-bias SSTs from dual-view observations such as those of the ATSRs; ARC dual-view coefficient-based retrievals meet the SST accuracy requirements (Embury et al., 2012b), because of the extra degrees of freedom brought by observing at two zenith angles. However, for the single-view AVHRRs, coefficient-based approaches do not yield satisfactory levels of bias for all regions for daytime observations (Merchant et al., 2009), and OE is considered to be necessary. To maximize ATSR-AVHRR consistency, we also applied the OE algorithm to ATSR, having tuned the OE procedure to try to replicate the ARC SSTs. The full rationale is given in the SST CCI Algorithm Selection Report (Merchant and MacCallum, 2012). As discussed below, the ATSR SSTs by OE in the CCI products do not quite match the ARC SSTs' performance in terms of accuracy on regional scales, and this strategy may be reviewed in a future dataset release.
Retrieval by OE involves comparison of the ATSR BTs against a simulation of those BTs. The simulation uses a profile describing the atmospheric state from numerical weather prediction, and also assumes a prior SST. The difference between the observations and simulations is the basis on which the prior SST is adjusted to give an improved SST estimate given the ATSR observations. SST retrieval from infra-red imagery is valid only for clear-sky conditions, so in addition to the retrieval method, the method of cloud detection is critical to the resulting SST quality. A physically-based, probabilistic (Bayesian) approach to cloud detection is applied for the ATSRs, based on Merchant et al. (2005). This approach has been found to reduce substantially the incidence of both missed cloud and false detection of cloud relative to operational cloud masking methods (Embury et al., 2012b). Cloud detection failures are still likely in specific situations, such as night-time observations of sea ice with surface temperatures close to 0°C, or of low-lying fog. Detection or estimation of tropospheric aerosol is not integrated into the Bayesian cloud detection scheme, although such integration would in principle be advantageous. Tropospheric aerosol is masked as if it were cloud when it is sufficiently optically thick, whereas optically thin aerosol may affect some SST measurements. An infra-red desert dust index (Good et al., 2012) is used to minimize SST biases arising specifically from mineral aerosol.
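The Bayesian step at the heart of such a scheme can be sketched as below. This is a simplification under stated assumptions: the clear-sky likelihood is taken as Gaussian around BT features simulated from NWP fields and a prior SST, the cloudy-sky likelihood is supplied as a single density value, and all numbers are hypothetical. The scheme of Merchant et al. (2005) uses its own choice of spectral and textural features and probability density functions, which are not reproduced here.

```python
import numpy as np


def gaussian_pdf(y, mean, cov) -> float:
    """Multivariate normal density of observation vector y."""
    y, mean, cov = (np.asarray(a, dtype=float) for a in (y, mean, cov))
    d = y - mean
    norm = np.sqrt((2.0 * np.pi) ** d.size * np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)


def prob_clear(y_obs, y_sim_clear, cov_clear, pdf_cloudy: float, prior_clear: float) -> float:
    """Posterior probability of clear sky via Bayes' theorem.

    y_obs:        observed features (e.g. an 11 um BT and an 11-12 um BT difference)
    y_sim_clear:  the same features simulated for clear sky (NWP fields + prior SST)
    cov_clear:    covariance of clear-sky observation-minus-simulation differences
    pdf_cloudy:   probability density of y_obs given cloud (hypothetical input)
    prior_clear:  prior probability of clear sky for this pixel
    """
    p_clear = gaussian_pdf(y_obs, y_sim_clear, cov_clear) * prior_clear
    p_cloudy = pdf_cloudy * (1.0 - prior_clear)
    return p_clear / (p_clear + p_cloudy)


# Hypothetical numbers: features are (BT11, BT11 - BT12) in kelvin.
sim = [288.0, 1.2]
cov = np.diag([0.3**2, 0.3**2])
print(prob_clear([288.1, 1.25], sim, cov, pdf_cloudy=1e-3, prior_clear=0.3))  # near 1
print(prob_clear([283.0, 1.25], sim, cov, pdf_cloudy=1e-3, prior_clear=0.3))  # near 0
```

An observation close to the clear-sky simulation is assigned a high clear-sky probability; one several kelvin colder, as cloud would typically produce, is assigned almost none. A threshold on this probability then yields the cloud mask.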
The method to estimate the uncertainty of each SST is physically based, and discussed in . Estimates of radiometric noise in the ATSR BTs and simulation-minus-observation uncertainty from fast forward modelling of BTs are propagated through the OE procedure using standard OE formulations for retrieval uncertainty, to give uncertainty components for individual pixels.
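Assuming that the forward model is linear about the prior, one OE update and its retrieval error covariance can be sketched as below. The two-element state (SST and total column water vapour, TCWV) illustrates the reduced-state-vector idea; the Jacobian, covariances and BTs are hypothetical, and in practice the simulated BTs and Jacobian would come from a fast RT model. Radiometric noise and forward-model uncertainty both enter through the BT-space error covariance S_noise + S_fm, and the square root of the SST element of S_hat is the resulting pixel-level SST standard uncertainty.

```python
import numpy as np


def oe_retrieval(y_obs, y_sim, K, x_prior, S_a, S_noise, S_fm):
    """Linear optimal-estimation update about the prior state.

    y_obs, y_sim: observed and simulated BTs (K), one per channel
    K:            Jacobian d(BT)/d(state), shape (n_channels, n_state)
    x_prior:      prior state, here [SST (K), TCWV (kg m-2)]
    S_a:          prior error covariance of the state
    S_noise:      radiometric-noise covariance of the BTs
    S_fm:         forward-model (simulation-minus-observation) covariance
    Returns the retrieved state and its error covariance.
    """
    S_e_inv = np.linalg.inv(S_noise + S_fm)  # total BT-space error
    S_hat = np.linalg.inv(K.T @ S_e_inv @ K + np.linalg.inv(S_a))
    x_hat = x_prior + S_hat @ K.T @ S_e_inv @ (y_obs - y_sim)
    return x_hat, S_hat


# Hypothetical two-channel (11, 12 um) example.
K = np.array([[0.90, -0.05],
              [0.80, -0.10]])
x_prior = np.array([290.0, 30.0])
S_a = np.diag([1.0**2, 10.0**2])
S_noise = np.diag([0.05**2, 0.05**2])
S_fm = np.diag([0.10**2, 0.10**2])
y_sim = np.array([288.0, 287.0])
y_obs = y_sim + K @ np.array([0.5, 0.0])  # truth: SST 0.5 K warmer than prior

x_hat, S_hat = oe_retrieval(y_obs, y_sim, K, x_prior, S_a, S_noise, S_fm)
print(f"SST = {x_hat[0]:.2f} K, uncertainty = {np.sqrt(S_hat[0, 0]):.2f} K")
```

The retrieved SST moves from the prior towards the truth, and its uncertainty is smaller than the prior uncertainty; how far it moves depends on the balance between S_a and the BT-space errors.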

Contents of SST CCI ATSR product
ATSRs have a full resolution of 1 km at nadir, sampled across a swath ~500 km wide. With ~14 orbits per day, this swath width is sufficient to give a view of most of the globe over about 3 days. Because cloud cover obscures the majority of the ocean at any given instant when resolved at 1 km, a single day of ATSR SSTs gives rather sparse coverage (Figure 1). This sparseness is the main motive in the SST CCI project for adding AVHRRs to the climate data record in a manner consistent with ATSR SSTs.
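The '~3 days' figure can be checked with back-of-envelope arithmetic. The sketch assumes that each orbit crosses the equator twice (one ascending and one descending pass) and that swaths abut without overlap at the equator; both are simplifications, since swaths converge and overlap towards the poles, where coverage is therefore faster.

```python
EQUATOR_KM = 40075.0  # equatorial circumference of the Earth


def days_to_tile_equator(orbits_per_day: float, swath_km: float, passes_per_orbit: int = 2) -> float:
    """Days for non-overlapping swaths to span the equator once.

    Each orbit is assumed to cross the equator passes_per_orbit times,
    each pass contributing one swath width.
    """
    km_per_day = orbits_per_day * passes_per_orbit * swath_km
    return EQUATOR_KM / km_per_day


print(f"{days_to_tile_equator(14, 500):.1f} days")  # 2.9 days
```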
Although processing (cloud detection and retrieval) is done on full-resolution imagery, the SST CCI ATSR product comprises 'level 3 uncollated' (L3U; see Table 1 for definition; Group for High Resolution Sea Surface Temperature (GHRSST) Science Team, 2012) files. Data are averaged in space (only) from the 1 km native resolution to a regular latitude-longitude grid at 0.05° resolution (of order 5 km). Only clear-sky pixels are included in the average. Different channel combinations are feasible for SST retrieval during the day (11 and 12 μm only) and at night (3.7 μm in addition). Data are flagged where the 3.7 μm channel is used, so day and night cells can be easily identified. Except at very high latitudes, the local time of morning observations is close to 1030 h (ATSR-1 and ATSR-2) or 1000 h (Advanced ATSR, AATSR), and that of night observations is close to 2230 h (ATSR-1 and ATSR-2) or 2200 h (AATSR).
Figure 1. Typical coverage of 1 day in the ATSR dataset. The coloured swaths correspond to SSTs obtained with the sun below the horizon, using channels at three wavelengths (red colours are warm SSTs and blue/violet are cool, for illustrative purposes). The grey swaths show SST data obtained with the sun above the horizon, i.e., 'day-time' retrievals using channels at two wavelengths. Gaps along swaths indicate pervasive cloud cover preventing SST estimation. Gaps between swaths are areas not observed by the satellite during this day. The pattern of swaths is shifted longitudinally on each consecutive day, such that the pattern shown approximately repeats every 3 days, although there were exceptions to this '3-day pseudo-repeat' configuration during some phases of the ATSR-1 mission.
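The space-only averaging of clear-sky 1 km pixels into 0.05° cells can be sketched as below. The cell-indexing convention (rows counted from 90°S, columns from 180°W) and the function name are assumptions for illustration, not the product's internal layout.

```python
import numpy as np


def grid_cell_means(lat, lon, sst, clear, res_deg=0.05):
    """Average clear-sky pixel SSTs into regular lat-lon cells (space only).

    Returns {(row, col): (mean SST, number of clear pixels)}; cells with no
    clear pixels are simply absent.
    """
    lat, lon, sst = (np.asarray(a, dtype=float) for a in (lat, lon, sst))
    clear = np.asarray(clear, dtype=bool)
    rows = np.floor((lat[clear] + 90.0) / res_deg).astype(int)
    cols = np.floor((lon[clear] + 180.0) / res_deg).astype(int)
    sums, counts = {}, {}
    for r, c, t in zip(rows, cols, sst[clear]):
        key = (int(r), int(c))
        sums[key] = sums.get(key, 0.0) + float(t)
        counts[key] = counts.get(key, 0) + 1
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}


# Four pixels falling in one 0.05-degree cell; the cloudy one is excluded.
cells = grid_cell_means(
    lat=[10.01, 10.02, 10.03, 10.04],
    lon=[20.01, 20.02, 20.03, 20.04],
    sst=[290.0, 290.2, 290.4, 285.0],
    clear=[True, True, True, False],
)
print(cells)  # one cell, mean ~290.2 K from 3 clear pixels
```

The pixel count retained per cell matters later, because it controls how much of the random uncertainty component averages down in the cell mean.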
The primary data are the cell-mean skin SST (sea_surface_temperature). (The radiation to which all infra-red SST sensors are sensitive is controlled by the temperature within ~0.01 mm of the air-sea interface, which is generally a few tenths of a kelvin cooler than the SST immediately below the oceanic thermal skin layer, itself of order a millimetre deep; e.g., Donlon et al., 2002.) Each skin SST has an associated mean observation time (time + sst_dtime), latitude (lat) and longitude (lon).
In addition, each skin SST also has an individual estimate of total uncertainty (sses_standard_deviation). Uncertainty is the degree to which the measurement is in doubt, and we quantify this using standard uncertainty, i.e., the standard deviation of the estimated distribution of (unknown) errors. (The name sses_standard_deviation ensures compatibility with readers of GHRSST-format products.) The total uncertainty in the cell-mean skin SST comprises three components . The SST uncertainty from radiometric noise at pixel level is propagated to the cell-mean assuming the noise is uncorrelated between pixels. The SST uncertainty from the forward modelling in the OE process is propagated to the cell mean assuming associated errors are fully correlated within the cell. Lastly, a component accounting for the global systematic uncertainty of the estimated SSTs is included.
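These three propagation rules can be sketched as follows. Combining the components in quadrature assumes they arise from mutually independent error sources, which is the usual convention; the function name and example values are illustrative only.

```python
import math
from typing import Sequence


def cell_mean_uncertainty(u_noise: Sequence[float], u_fm: Sequence[float], u_sys: float) -> float:
    """Total standard uncertainty (K) of a cell-mean skin SST.

    u_noise: per-pixel uncertainty from radiometric noise (uncorrelated between pixels)
    u_fm:    per-pixel uncertainty from forward modelling (fully correlated in the cell)
    u_sys:   global systematic uncertainty of the retrieved SSTs
    """
    n = len(u_noise)
    if n == 0 or len(u_fm) != n:
        raise ValueError("need matching, non-empty per-pixel uncertainty lists")
    # Uncorrelated errors: variances add, and the mean divides by n.
    u_rand = math.sqrt(sum(u * u for u in u_noise)) / n
    # Fully correlated errors: the uncertainty of the mean is the mean uncertainty.
    u_corr = sum(u_fm) / n
    # Independent components combine in quadrature.
    return math.sqrt(u_rand**2 + u_corr**2 + u_sys**2)


# Four clear pixels: noise 0.2 K each, forward model 0.1 K each, systematic 0.05 K.
print(round(cell_mean_uncertainty([0.2] * 4, [0.1] * 4, 0.05), 3))  # 0.15
```

Averaging four pixels halves the noise component (0.2 K to 0.1 K) but leaves the correlated forward-model component unchanged, which is why the two must be propagated differently.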
For each cell-mean skin SST, there is an additional estimate of the SST that would be measured at a depth of 20 cm at a standardized local time (sea_surface_temperature_depth). The 20 cm SSTs are obtained by adding to the skin SST an estimate of the adjustment required to account for the ocean skin effect and near-surface thermal stratification (and their evolution between the observation time and standardized time). The standardized local time is 1030 h or 2230 h, whichever is closest to the observation time.
We provide both skin and 20 cm SST because each is relevant in different ways. Skin SST is the primary measurement and controls the outgoing long-wave radiation and the turbulent fluxes of heat and moisture across the air-sea interface, but it is difficult to validate comprehensively because ship-borne radiometers (e.g., Donlon et al., 2008) are not presently deployed with sufficient geographical coverage. 20 cm SST at a fixed local time is comparable to measurements made by drifting buoys and to the historical in situ-based record of SST, and can therefore be validated globally during most of the period. It is intended to be stable with respect to changes in satellite overpass time, which would otherwise alias the SST diurnal cycle into the long-term signal, and climate and ocean modellers can compare it with their uppermost model SST.
The uncertainty provided for the 20 cm SST (sst_depth_total_uncertainty) is larger than that for the corresponding skin SST, because uncertainty in the adjustment from skin to depth is included.
In addition to the total uncertainty estimates mentioned above, specific components of uncertainty are also provided. These quantify the uncertainty arising from effects giving rise to errors that have different degrees of correlation. Three components of uncertainty are estimated: from errors that are purely random between SST values (e.g., arising from instrumental noise), from errors that are in common between SST values over large space-time scales (e.g., from calibration error), and from errors that are correlated between SST values on the space-time scales of atmospheric variability. Distinguishing these components allows rigorous propagation of uncertainty when creating SST datasets averaged to coarser space-time resolution.
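A sketch of how such components might be propagated when averaging N values to a coarser cell is given below. It assumes that the errors correlated on atmospheric space-time scales are fully correlated within hypothetical groups (for example, boxes of a size comparable to weather systems) and independent between groups; the grouping is an assumption for illustration, not a product field.

```python
import math
from collections import defaultdict


def aggregate_uncertainty(u_random, u_synoptic, u_large_scale, group_ids):
    """Standard uncertainty of the mean of N SSTs, by correlation class.

    u_random:      per-value uncertainty from errors independent between values
    u_synoptic:    per-value uncertainty from errors correlated within a group
    u_large_scale: per-value uncertainty from errors common to all values
    group_ids:     hypothetical synoptic-group label for each value
    Returns (random, synoptic, large_scale, total) uncertainties of the mean.
    """
    n = len(u_random)
    # Independent errors: variances add.
    u_r = math.sqrt(sum(u * u for u in u_random)) / n
    # Fully correlated within a group (add linearly), independent between groups.
    group_sums = defaultdict(float)
    for u, g in zip(u_synoptic, group_ids):
        group_sums[g] += u
    u_s = math.sqrt(sum(s * s for s in group_sums.values())) / n
    # Fully correlated everywhere: averaging does not reduce it.
    u_l = sum(u_large_scale) / n
    return u_r, u_s, u_l, math.sqrt(u_r**2 + u_s**2 + u_l**2)


parts = aggregate_uncertainty([0.2] * 4, [0.1] * 4, [0.05] * 4, ["a", "a", "b", "b"])
print([round(p, 3) for p in parts])  # [0.1, 0.071, 0.05, 0.132]
```

The three classes shrink at different rates as values are averaged: the random part falls as 1/√N, the synoptic part only as one over the square root of the number of independent groups, and the large-scale part not at all.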
The universal time of the 20 cm SST observation was omitted from the product definition. It can be inferred from the skin observation time and location, but this is not ideal. Future versions of the SST CCI products will include this time explicitly.
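One way to infer it is sketched below. The sketch assumes that 'local time' means local mean solar time (UTC plus longitude/15 hours) and applies the rule that the standardized time is whichever of 1030 h or 2230 h is closest to the observation. The product's own local-time convention may differ (for example, in its treatment of the equation of time), so the results should be treated as approximate; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

STANDARD_LOCAL_HOURS = (10.5, 22.5)  # 1030 h and 2230 h local time


def standardized_time_utc(obs_utc: datetime, lon_deg: float) -> datetime:
    """Approximate UTC time to which a 20 cm SST was standardized.

    obs_utc: skin SST observation time (time + sst_dtime), timezone-aware UTC
    lon_deg: longitude in degrees east
    """
    hours = obs_utc.hour + obs_utc.minute / 60 + obs_utc.second / 3600
    local = (hours + lon_deg / 15.0) % 24.0  # local mean solar time
    # Signed shift (hours, in [-12, 12)) from the observation to each standard time.
    shifts = [((h - local + 12.0) % 24.0) - 12.0 for h in STANDARD_LOCAL_HOURS]
    shift = min(shifts, key=abs)  # the closest standardized time
    return obs_utc + timedelta(hours=shift)


obs = datetime(2010, 1, 5, 3, 0, tzinfo=timezone.utc)
print(standardized_time_utc(obs, lon_deg=0.0))  # 2010-01-04 22:30:00+00:00
```

Note that the standardized time can fall on the previous or next UTC date, as in this example, which is one reason explicit storage of this time is preferable.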

Table 1. SST product processing levels.
L2P (Level 2 Pre-Processed): SST retrievals on the same grid as the source satellite observations, typically the satellite projection for one orbit.
L3U (Level 3 Uncollated): data from a single L2P file remapped and/or averaged onto a regular grid.
L3C (Level 3 Collated): data from multiple L3U files from a single sensor combined to cover a longer period of time, typically daily files.
L4 (Level 4 Analysis): data from multiple sensors combined with an analysis procedure, such as Optimal Interpolation, to produce a gap-free SST product.

L3U product format
The product format is an extension of the GHRSST Data Specification version 2.0 revision 5 (Group for High Resolution Sea Surface Temperature (GHRSST) Science Team, 2012), using the Network Common Data Form (netCDF; see http://www.unidata.ucar.edu/software/netcdf/) with the Climate and Forecast (CF) metadata conventions. Readers designed for GHRSST products should be able to ingest the SST CCI skin SST, skin SST uncertainty, location information and quality flags without modification.
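Variables in CF-convention netCDF files are often stored packed as integers with scale_factor, add_offset and _FillValue attributes. CF-aware readers (for example, xarray with its default decoding, or netCDF4-python with automatic masking and scaling enabled) apply these automatically; the sketch below shows the arithmetic for a reader that does not. Whether a given SST CCI variable is packed, and with which attribute values, should be read from the file itself; the values here are illustrative assumptions.

```python
import numpy as np


def unpack_cf(raw, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Decode a CF-packed variable: value = raw * scale_factor + add_offset.

    Elements equal to fill_value (compared on the packed integers, before
    scaling) become NaN.
    """
    raw = np.asarray(raw)
    values = raw.astype(np.float64) * scale_factor + add_offset
    if fill_value is not None:
        values = np.where(raw == fill_value, np.nan, values)
    return values


# Illustrative attributes for a packed sea_surface_temperature variable.
packed = np.array([1500, -32768], dtype=np.int16)
print(unpack_cf(packed, scale_factor=0.01, add_offset=273.15, fill_value=-32768))
# first value ~288.15 K, second masked as NaN
```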

SST CCI advanced very high resolution radiometer products
The SST CCI AVHRR products are derived from Global Area Coverage (GAC) imagery. In GAC imagery, the full-resolution (~1.1 km at nadir) BTs in the infra-red window channels are both averaged (over four across-track pixels) and sub-sampled along track (every third scan line) on board the satellite, because of historical limitations in downlink bandwidth. Full-resolution AVHRR imagery would be preferable, but no such dataset with global coverage exists. The resulting GAC resolution is considered to be representative of ~4 km (coarser at the swath edges). Since this is comparable to the ATSR L3U spatial resolution, GAC-based SSTs are not further averaged, and the SST CCI AVHRR product therefore comprises 'level 2' (L2P) products. The AVHRR swath is ~2900 km across. The platforms are generally placed in 'morning' or 'afternoon' sun-synchronous orbits with daily repeat cycles, and (for platforms operated by the US National Oceanic and Atmospheric Administration) the local equator crossing time is allowed to drift during the mission life.
As with the SST CCI ATSR products, an extension of the GHRSST Data Specification version 2.0 revision 5 is used, here for L2P files. The geophysical content is the same as for ATSR L3U files: skin SST at time of observation; 20 cm SST at standardized local time; space-time location information; and standard uncertainty estimates per observation.
The AVHRR SSTs are retrieved by reduced-state-vector OE, as with the ATSRs. AVHRR skin SSTs are usually more uncertain than ATSR skin SSTs, firstly because the AVHRR instruments are typically radiometrically noisier, and secondly because there is generally more ambiguity in inferring SST from a single-view observation. However, because the AVHRR is a wide-swath instrument, and two AVHRRs are processed for most of the period, the sampling is greatly improved (roughly ten times as many SST observations as in the ATSR dataset; see Figure 2).
The OE retrieval for AVHRRs is tuned to the calibration of the ATSR SSTs by a bias correction of brightness temperature simulations, as described in the SST CCI ATBD . This makes the SST CCI AVHRR products independent of in situ measurements, in contrast with the AVHRR Pathfinder SSTs (Kilpatrick et al., 2001). Briefly, the steps in bias correction are as follows. Multi-sensor matches, in which both an ATSR and an AVHRR overpass coincide with a drifting buoy measurement, are obtained. AVHRR BTs are simulated for each multi-sensor match, using the same simulation procedure as in the ATSR OE. The SST used as input to the simulation is an SST retrieved from the ATSR observations using the reference SST coefficients. An adjustment for the difference in time between the ATSR and AVHRR overpasses is obtained from the drifting buoy time-series during that interval. Since the drifting buoy record is used only to give an SST difference between two times, the AVHRR SSTs remain independent of in situ SST calibrations: the OE is tuned to return a skin SST consistent with skin SSTs from the ATSR. The tuning is achieved by parameterizing the difference between simulated and observed BTs for each AVHRR channel across all the multi-sensor matches obtained. The parameters used include the AVHRR instrument temperature, satellite zenith angle and aspects of the numerical weather prediction profile used in the simulation. The use of the AVHRR instrument temperature is particularly important, since temporal changes in instrument temperature strongly influence secular changes in AVHRR calibration (Mittaz and Harris, 2011).
The cloud detection applied to the AVHRR dataset is the Extended Clouds from AVHRR (CLAVR-x) algorithm (Heidinger et al., 2012). This is a 'naïve Bayesian' cloud detection algorithm, based on clear-sky probability estimates from six classifiers that are assumed to be independent.
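The parameterized tuning of simulated against observed AVHRR BTs can be sketched as a linear least-squares fit of observation-minus-simulation differences against the listed predictors. The linear functional form, the use of the secant of the satellite zenith angle, and the choice of total column water vapour (TCWV) to stand for the NWP-profile predictors are assumptions for illustration; the form actually used is documented in the ATBD and may differ. The synthetic matches below merely check that the fit recovers a known hypothetical bias model.

```python
import numpy as np


def design_matrix(t_inst, zenith_deg, tcwv):
    """Predictors: constant, instrument temperature (K), sec(zenith) - 1, TCWV."""
    t_inst, zenith_deg, tcwv = (np.asarray(a, dtype=float) for a in (t_inst, zenith_deg, tcwv))
    sec_minus_1 = 1.0 / np.cos(np.radians(zenith_deg)) - 1.0
    return np.column_stack([np.ones_like(t_inst), t_inst, sec_minus_1, tcwv])


def fit_bt_bias(obs_minus_sim, t_inst, zenith_deg, tcwv):
    """Least-squares coefficients for one channel's BT bias across the matches."""
    X = design_matrix(t_inst, zenith_deg, tcwv)
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(obs_minus_sim, dtype=float), rcond=None)
    return coeffs


def bias_corrected_simulation(bt_sim, coeffs, t_inst, zenith_deg, tcwv):
    """Add the modelled bias so simulations match the observed AVHRR calibration."""
    return np.asarray(bt_sim, dtype=float) + design_matrix(t_inst, zenith_deg, tcwv) @ coeffs


# Synthetic matches with a known (hypothetical) bias model, to check the fit.
rng = np.random.default_rng(0)
n = 500
t_inst = rng.uniform(285.0, 295.0, n)
zen = rng.uniform(0.0, 55.0, n)
tcwv = rng.uniform(5.0, 60.0, n)
true_coeffs = np.array([-5.8, 0.02, -0.3, 0.004])
dbt = design_matrix(t_inst, zen, tcwv) @ true_coeffs
print(np.round(fit_bt_bias(dbt, t_inst, zen, tcwv), 4))
```

Including instrument temperature as a predictor lets the correction follow calibration changes that track the instrument's thermal environment, rather than fixing a single offset for the whole mission.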
The error effects of noise and retrieval ambiguity are represented in the standard uncertainty attached to each AVHRR SST. This standard uncertainty is derived as part of the OE retrieval, as with ATSR (see section 1.2), except using noise assumptions appropriate to the AVHRR series. Other effects that cause error in AVHRR SSTs are not presently represented in the SST uncertainty fields. These include cloud screening (which is more challenging when using averaged and subsampled BTs) and variability in AVHRR calibration. The local time of overpass of the satellites carrying AVHRRs generally drifts over time, changing the instrument's thermal environment, with possible effects on calibration. In addition, AVHRR-12 SSTs abruptly become warmer (by ~1 K) relative to ATSR-1 during three periods (11 January-27 January 1992, 3 June-31 August 1993 and 12 May-2 August 1994); in the current version of the dataset, there is no bias correction for this effect, nor is the problem reflected in the uncertainty information.

Optimally interpolated SST fields
Many climate users require SST fields that are complete in time and space, without the gaps that swath limitations and cloud distributions introduce into L2P and L3U products. Such interpolated and (in this case) multi-sensor blended products are referred to as 'SST analyses' or 'L4 SSTs'.
The SST CCI analysis product is a daily analysis on a regular latitude-longitude grid of 0.05° resolution. The input fields are the 20 cm SSTs from both the SST CCI ATSR and AVHRR datasets. The analysis is therefore a 20 cm SST product, comparable to drifting buoy measurements and useful for comparison with the uppermost SST in climate/ocean models. The analysis is spatially complete (Figure 3). As described earlier, the 20 cm SST product is adjusted to standardized local times (1030 and 2230 h). The temperature at these times is on average close to the mean of the diurnal cycle (Figure 4), so the analysis represents a daily mean, with a systematic uncertainty of less than ~0.02 K arising from the local-time sampling of the diurnal cycle. This time adjustment, which gives stable sampling of the diurnal cycle by the 20 cm SST estimates, is important for the stability of the SST CCI analysis. SST reaches its daily maximum in early afternoon and its minimum around dawn, so the drift of local equator crossing time seen in AVHRR missions in particular can introduce trends in measured SST without any real underlying trend in the ocean. SST CCI therefore seeks to minimize this aliasing effect by adjusting data to a standardized local time. In summary, the SST CCI analysis product provides an estimate of the daily mean SST at 20 cm depth. Unlike other SST analyses, it uses no in situ data, and the dataset is independent of in situ data.
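The reasoning about diurnal aliasing can be illustrated with an idealized diurnal cycle built from 24 h and 12 h harmonics. This toy model, with hypothetical amplitudes and phases, is not the model used in SST CCI and does not reproduce the empirical result of Figure 4. It shows two mechanisms only: averaging samples 12 h apart removes the 24 h (and any odd) harmonic exactly but not the even harmonics, and a drift in overpass time changes the sampled value even when the ocean's daily mean is unchanged.

```python
import math


def diurnal_anomaly(local_hour, a1=0.3, peak_hour=14.0, a2=0.0, phase2_hour=0.0):
    """Idealized SST anomaly (K) from its daily mean: 24 h and 12 h harmonics.

    All amplitudes and phases are hypothetical; both harmonics average to zero
    over a day, so the anomaly is relative to the daily mean.
    """
    w = 2.0 * math.pi / 24.0
    return (a1 * math.cos(w * (local_hour - peak_hour))
            + a2 * math.cos(2.0 * w * (local_hour - phase2_hour)))


def two_time_mean_bias(h1, h2, **cycle):
    """Bias of the average of two fixed-local-time samples relative to the daily mean."""
    return 0.5 * (diurnal_anomaly(h1, **cycle) + diurnal_anomaly(h2, **cycle))


# Samples 12 h apart cancel the 24 h harmonic exactly ...
print(two_time_mean_bias(10.5, 22.5))            # ~0
# ... but not the 12 h harmonic.
print(two_time_mean_bias(10.5, 22.5, a2=0.05))   # ~0.035 K
# A 2.5 h drift of a single afternoon overpass aliases into an apparent change.
print(diurnal_anomaly(16.0) - diurnal_anomaly(13.5))  # ~ -0.04 K
```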
The system used for optimal interpolation is a reanalysis version of the Met Office 'Operational Sea Surface Temperature and Sea Ice Analysis' (OSTIA; Donlon et al., 2012; Roberts-Jones et al., 2012) system. The optimal interpolation scheme in OSTIA uses the previous day's analysis field as the basis for a first-guess ('background') field. Feature resolution in an analysis does not necessarily match grid resolution, and is an important property (Reynolds et al., 2013). Within the SST CCI project, the OSTIA reanalysis system has been updated to better preserve high-resolution features by optimizing the assumed length-scales of error correlation in the analysis and background error covariances (J. Roberts-Jones, K. Bovis, M. Martin, and A. McLaren, submitted; Figure 5). The system has been adapted to use satellite-only input data, and there have been some improvements to the use of sea ice data in the system (Roberts-Jones et al., 2013). Other differences from the previous OSTIA reanalysis system (described in Roberts-Jones et al., 2012) are as follows. First, the SST CCI AVHRR data are spatially subsampled (1 in 4) to reduce data volumes; this was found to have no significant impact on the analysis validation statistics. Second, when using 20 cm SST products at standardized times, there is no need to remove observations at risk of contamination by diurnal warming, so the diurnal-cycle filters are turned off for the SST CCI analysis. Third, lakes are included in the temperature analysis, but as they are not included in the SST CCI ATSR and AVHRR datasets, their temperature is simply set by a relaxation to the ARC-Lake temperature climatology (MacCallum and Merchant, 2011). Fourth, there is no masking of SST under sea ice. Finally, for the background field relaxation (in the absence of recent observations), the climatology from the MyOcean OSTIA reanalysis (Roberts-Jones et al., 2012) is used.
Standard uncertainty from the SST CCI ATSR and AVHRR inputs is used within the analysis system along with background field uncertainty estimates to weight the satellite SSTs contributing to the optimally interpolated SST at any particular place and time. The standard uncertainty attached to a given analysed SST value is based on optimal interpolation of the weights given to the observations in the analysis (Donlon et al., 2012); the analysis uncertainties should therefore reflect the spatial distribution and uncertainties of the observations and uncertainties in the background field used in the analysis.
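For a single grid point and a single observation, this uncertainty-based weighting reduces to the familiar scalar form sketched below. This is a simplification: OSTIA solves the multivariate problem with spatially correlated background errors and many observations, and the numbers here are hypothetical.

```python
import math


def scalar_oi(background, sigma_b, obs, sigma_o):
    """Combine a background value and one observation by their error variances.

    Returns (analysis, analysis standard uncertainty, observation weight).
    """
    w = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    analysis = background + w * (obs - background)
    sigma_a = math.sqrt((1.0 - w) * sigma_b**2)
    return analysis, sigma_a, w


a, s, w = scalar_oi(background=290.0, sigma_b=0.6, obs=291.0, sigma_o=0.3)
print(f"analysis={a:.2f} K, uncertainty={s:.3f} K, weight={w:.2f}")
# analysis=290.80 K, uncertainty=0.268 K, weight=0.80
```

A smaller observation uncertainty pulls the analysis closer to the observation, and the analysis uncertainty is smaller than either input uncertainty, consistent with the description above.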
The OSTIA system has been run operationally using AATSR SSTs as a reference to which other satellite SSTs were bias-adjusted (Donlon et al., 2012). The SST CCI ATSR and AVHRR SSTs input to the SST CCI analysis are intended to be highly consistent with each other, since the OE algorithm in both cases is tuned at brightness temperature level for consistency with ARC SSTs. Nonetheless, it was found beneficial to retain in the analysis system the bias correction of AVHRR to ATSR SSTs, to remove residual biases between the data. One factor in this decision was the marked AVHRR-12 problems mentioned above. More generally, the consistency between the SST CCI AVHRR and ATSR products (as measured by monthly mean differences of 20 cm SST) has not yet reached the target level of <0.1 K regionally on scales of 1000 km and longer (Merchant et al., 2009; GCOS, 2011).
The OSTIA product includes a sea ice concentration field. This was sourced from products of the Ocean and Sea Ice Satellite Application Facility of EUMETSAT (Eastwood et al., 2011; Eastwood, 2012), and future versions will take account of outputs from the CCI project on sea ice, to maintain consistency between CCI datasets.

Context for use of the SST CCI datasets
The purpose of this section is to help potential users understand SST CCI products in comparison with other closely comparable options. Many other SST datasets are available, but discussing them is beyond our present scope. Table 2 summarizes the properties of six datasets: the three SST CCI datasets presented here, plus an existing similar dataset in each case. The table gives basic information, such as the satellite sensors of relevance, the years with data and the grid resolution. The next row states whether the dataset is independent of in situ data: i.e., is the overall SST calibration obtained from the sensor calibration and RT modelling (independence), or by empirically relating satellite BTs to in situ measurements (no independence)? The next row addresses whether explicit steps have been taken to harmonize the time series, i.e., whether overlap periods between consecutive sensors have been exploited to minimize steps in the data arising from the introduction and disappearance of different sensors. Harmonization should improve the stability of measurement across the time series. Also related to stability is whether adjustment has been made to compensate for any drift in the local time of observation, addressed in the next row. As explained above for the SST CCI analysis, drift in the local equator crossing time (seen in AVHRR data in particular) can alias the diurnal cycle, with its early-afternoon maximum and dawn minimum, into spurious trends in measured SST; one approach, used in SST CCI, to minimize this aliasing effect is to adjust data to a standardized local time.
As noted in Table 2, the ARC v1.1 dataset  is the most accurate and stable of the six datasets. However, the ARC v1.1 dataset exists on a coarser grid resolution. The OE algorithm used in the SST CCI v1.1 ATSR datasets is tied to ARC SSTs, but preliminary validation (see section 5) suggests it does not fully replicate the accuracy and stability of Both products have the same grid resolution of 0.05°. The feature resolution in the SST CCI analysis is improved; quantification of the improvement is yet to be undertaken. ARC . Both ATSR datasets have relatively sparse spatial coverage. The Pathfinder v5.2 AVHRR dataset is the longest time series, being consistently processed between 1984 and 2012. These SSTs are not independent of in situ observations, being regressed to drifting buoys. Although consistently processed, no harmonization or diurnal cycle adjustment is applied in Pathfinder, and it is an open question whether the stability of the time series is adversely affected. The MyOcean OSTIA re-analysis  is longer than the SST CCI analysis. The MyOcean dataset is not based on consistent, harmonized inputs and significantly smooths SST features (fronts). In comparison with the SST CCI analysis, the utility of the MyOcean dataset lies in its earlier start date. Corlett et al. (2014) reported detailed validation results for all SST CCI products (each ATSR and AVHRR mission, and the analysis), validating both SSTs and associated SST uncertainty information. The report also contains comparisons with other SST products. Here, we present key results and conclusions from the validation of the SST CCI ATSR and AVHRR products, and some more detailed results for the analysis product. 20 cm SSTs from ATSR and AVHRRs were matched to drifting buoys, tropical moored buoys and for the AATSR period, August 2002 onwards, uppermost mea-surements from Argo profiling floats. 
Results were consistent across different types of in situ validation data, so summary statistics are given in Table 3 only relative to global drifting buoys. Table 3 shows that day time SSTs in the SST CCI ATSR and AVHRR products are generally noisier than night time SSTs, having larger robust standard deviation (RSD). RSD is calculated by scaling the median absolute deviation from the median, and is equal to the standard deviation in the absence of outliers. (For ATSR-1, failure of the 3.7 µm channel early in the mission means that the same channels are used for retrieval day and night, unlike the later ATSR sensors.) The median differences, which estimate bias assuming the mean calibration of the drifters is correct, are mostly in the range 0.0–0.1 K. In many cases the day time median difference is more negative than the night time median difference for the same sensor. Generally, the statistics are better for the newer sensors in both the ATSR and AVHRR series. The project target for bias is <0.1 K, which is achieved in the global median in many cases.

Key validation and assessment results
Detailed analysis of the results for each sensor, using a variety of metrics, allowed conclusions to be made about the particular nature of errors for each sensor. Some key conclusions are as follows, with full details in Corlett et al. (2014).
AVHRR 12 displays large (~1 K) intermittent fluctuations in SST bias in the earlier years of the SST CCI dataset (1991–1994), attributed to unstable instrument calibration. Small dependencies of bias on wind speed and total column water vapour are found in SST CCI retrievals for many AVHRR sensors, and these tend to be similar in form between 'afternoon' satellites (AVHRRs 12, 14, 16 and 18). Evidence of desert dust affecting SST retrievals is noted for AVHRRs 15, 16, 17, 18 and Metop A; for AVHRRs 14, 16, 17, 18 and Metop A there are detectable adverse effects of unscreened cloud (usually more prominent in night time SSTs). These systematic effects mean that biases are locally greater than the target of 0.1 K in some products. Evidence of biases from desert dust is also detectable for AATSR and ATSR-2. Both AATSR and ATSR-2 SSTs tend to be warmer relative to validation data in the tropics than at other latitudes by about 0.1 K, a latitudinal dependence of bias that is propagated through to the SST CCI analysis.
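The Table 3 summary statistics and the covariate dependencies described above rest on two robust measures of satellite-minus-buoy differences: the median and the RSD. The following is a minimal Python sketch of how such statistics could be computed, not the project's processing code. It assumes the conventional normal-consistency factor of 1.4826 for scaling the median absolute deviation (the text says only that the MAD is scaled), and the function names `robust_stats` and `binned_median`, the bin edges and the matchup values are all hypothetical.

```python
import statistics

# Assumed normal-consistency constant: 1.4826 * MAD equals the standard
# deviation for Gaussian data. The exact factor is not stated in the text.
MAD_TO_SD = 1.4826


def robust_stats(diffs):
    """Median and robust standard deviation (RSD) of satellite-minus-buoy
    SST differences, both in kelvin."""
    med = statistics.median(diffs)
    mad = statistics.median(abs(d - med) for d in diffs)
    return med, MAD_TO_SD * mad


def binned_median(diffs, covariate, edges):
    """Median difference in each covariate bin [edges[i], edges[i+1]),
    or None for an empty bin. Hypothetical helper for examining bias
    dependence on, e.g., wind speed or total column water vapour."""
    medians = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [d for d, c in zip(diffs, covariate) if lo <= c < hi]
        medians.append(statistics.median(sel) if sel else None)
    return medians


if __name__ == "__main__":
    # Hypothetical matchups: differences in K (the last one an outlier,
    # e.g. unscreened cloud) and a wind speed in m/s for each matchup.
    diffs = [0.05, -0.10, 0.12, 0.02, 0.08, 3.0]
    wind = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
    med, rsd = robust_stats(diffs)
    print(f"median {med:.3f} K, RSD {rsd:.3f} K")
    print(f"sample SD {statistics.stdev(diffs):.3f} K (inflated by the outlier)")
    print("median within 0.1 K bias target:", abs(med) < 0.1)
    print("median by wind-speed bin:", binned_median(diffs, wind, [0, 5, 10, 15]))
```

The single outlier inflates the ordinary sample standard deviation far more than the RSD, which is why a median-based spread is convenient for matchup statistics that may contain residual cloud contamination.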
The stability of the ATSR SSTs is particularly important in SST CCI products, since the ATSRs give the overall absolute reference for the SST CCI analysis product. The stability assessment is reported in Rayner et al. (2014). Relative to tropical moored buoys during 1995–2010, the 95% confidence intervals for the linear trend in the difference between moored buoy measurements and the SST CCI ATSR product are +0.7 to +0.32 mK year⁻¹ (day time SSTs) and −1.3 to +6.4 mK year⁻¹ (night time SSTs), with the AATSR period being most stable. Stability is an order of magnitude poorer during 1991–1995, because ATSR-1 was the least stable of the three ATSR instruments and brightness temperatures (BTs) were affected by volcanic stratospheric aerosol. There is evidence of increased bias in the last months of ATSR-1 data (May and June 1996). The target stability in the project is 10 mK year⁻¹.
Figure 6 shows median differences between the SST CCI analysis and matched drifting buoy observations, both as a map and as a latitude–time plot. In general, the analysis is warmer than drifting buoys at lower latitudes and agrees more closely with them at higher latitudes. Exceptions are close to the western coast of Saharan Africa and in the Arabian Sea, where negative biases of up to −0.5 K are evident, traceable to desert dust effects on BTs. The latitude–time plot shows an annual cycle in satellite–drifter differences at most latitudes. The cycle amplitude varies with latitude, reaching ~0.3 K (e.g., at 30°N). Cycles are most clearly evident after 1999, which is about the point at which the global coverage of drifting buoys becomes more consistent from year to year. In the early part of the record up to 1994, there is no evidence of the large biases present in the AVHRR 12 SSTs propagating into the SST CCI analysis, showing that the bias correction of AVHRR to ATSR within the optimal interpolation processing is effective.
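The stability figures quoted above are 95% confidence intervals on a linear trend in the difference between buoy and satellite SSTs. A minimal sketch of such a calculation follows, assuming ordinary least squares on a regularly sampled difference series, a normal approximation for the interval, and independent residuals. We do not know how Rayner et al. (2014) treated serial correlation, which would widen the interval. The functions `trend_ci` and `within_target`, and the reading of the 10 mK year⁻¹ target as requiring the whole interval to lie within ±10 mK year⁻¹, are illustrative assumptions.

```python
import math
from statistics import NormalDist


def trend_ci(times_yr, diffs_k, level=0.95):
    """OLS trend of a difference series with a confidence interval.

    times_yr: decimal years; diffs_k: differences in kelvin.
    Returns (slope, lower, upper) in mK per year. Uses a normal
    approximation and assumes independent residuals (an assumption;
    serial correlation would make the true interval wider).
    """
    n = len(times_yr)
    t_mean = sum(times_yr) / n
    d_mean = sum(diffs_k) / n
    sxx = sum((t - t_mean) ** 2 for t in times_yr)
    sxy = sum((t - t_mean) * (d - d_mean) for t, d in zip(times_yr, diffs_k))
    slope = sxy / sxx
    intercept = d_mean - slope * t_mean
    resid = [d - (intercept + slope * t) for t, d in zip(times_yr, diffs_k)]
    sigma2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    se = math.sqrt(sigma2 / sxx)                   # standard error of slope
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for 95%
    return tuple(1000.0 * v for v in (slope, slope - z * se, slope + z * se))


def within_target(ci_mk, target_mk=10.0):
    """Hypothetical check: True if the whole interval lies within +/- target."""
    _, lower, upper = ci_mk
    return -target_mk <= lower and upper <= target_mk


if __name__ == "__main__":
    # Hypothetical 15 years of monthly differences: a 2 mK/yr drift
    # plus alternating +/-0.05 K noise.
    times = [1995 + (i + 0.5) / 12 for i in range(180)]
    diffs = [0.002 * (t - 1995) + (0.05 if i % 2 else -0.05)
             for i, t in enumerate(times)]
    ci = trend_ci(times, diffs)
    print("trend %.2f mK/yr, 95%% CI %.2f to %.2f mK/yr" % ci)
    print("within 10 mK/yr target:", within_target(ci))
```

Expressing the slope in mK per year keeps the result directly comparable with the project stability target and with the intervals quoted in the text.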
Table 4 shows global validation statistics for the SST CCI analysis compared to drifting buoys, tropical moorings and Argo floats.
Associated with every SST value in the SST CCI products is an uncertainty estimate. Corlett et al. (2014) also validate that this uncertainty estimate properly discriminates between SSTs with small and large uncertainty. The uncertainty estimates in the SST CCI analysis product range from 0.1 to 1.5 K, depending principally on the proximity in time and space of a particular analysed SST to satellite observations. Corlett et al. (2014) bin matches of analysed SST to drifting buoys according to the stated analysis uncertainty, and show that the standard deviation of the satellite–drifter differences depends as expected on the analysis uncertainty: e.g., when the analysis uncertainty is largest, the spread of satellite–drifter differences is largest and is close in value to the analysis uncertainty. This confirms the appropriateness of the calculation of analysis uncertainty.
Rayner et al. (2014) also report assessments of the SST CCI products that go beyond traditional validation activities. These assessments capture feedback on the experiences of climate scientists using the SST CCI products as 'trail-blazer users'. For example, the SST CCI analysis was found to be a suitable tool for evaluation of the mean state and variability of coupled climate models. As mentioned above, one area of concern was the Arabian Sea, where variability appears to be exaggerated by intermittent biases related to desert dust, adversely affecting the ability to simulate monsoon rainfall when using the SST CCI analysis to drive an atmosphere-only model. Users identified benefits from using the SST uncertainties provided within the products. SST CCI products showed stronger relationships to precipitation and cloud than comparison SST data, which, taken together with other results, suggests the variability in the SST CCI product is more geophysically representative. Users generally found the products convenient to use and helpfully documented.
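The uncertainty validation of Corlett et al. (2014) described above bins satellite–drifter matches by stated analysis uncertainty and compares the spread of differences in each bin with that uncertainty. A minimal sketch of this kind of check follows. The function name `uncertainty_bins`, the bin edges and the data are hypothetical, and the optional `u_ref` term, which adds an assumed drifter (reference) uncertainty in quadrature to the expected spread, is our illustrative assumption rather than a documented part of the published method.

```python
import math
import statistics


def uncertainty_bins(diffs, stated_unc, edges, u_ref=0.0):
    """Bin matchups by stated analysis uncertainty (K) and compare the
    observed spread of differences with the expected spread.

    Returns rows of (bin_centre, count, observed_sd, expected_sd), where
    expected_sd combines the bin-centre uncertainty with an assumed
    reference uncertainty u_ref in quadrature. Bins with fewer than two
    matchups are skipped because a sample SD needs at least two values.
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = [d for d, u in zip(diffs, stated_unc) if lo <= u < hi]
        if len(sel) < 2:
            continue
        centre = 0.5 * (lo + hi)
        rows.append((centre, len(sel), statistics.stdev(sel),
                     math.hypot(centre, u_ref)))
    return rows


if __name__ == "__main__":
    # Hypothetical matchups: small spread where stated uncertainty is small,
    # large spread where it is large.
    diffs = [0.2, -0.2, 0.2, -0.2, 1.0, -1.0, 1.0, -1.0]
    unc = [0.2, 0.2, 0.2, 0.2, 1.0, 1.0, 1.0, 1.0]
    for centre, n, sd, expected in uncertainty_bins(diffs, unc, [0.0, 0.5, 1.5]):
        print(f"u ~ {centre:.2f} K  n={n}  observed SD={sd:.2f} K  "
              f"expected={expected:.2f} K")
```

If the stated uncertainties are realistic, the observed SD should rise with the bin uncertainty and track the expected value, as reported for the largest-uncertainty bin in the text.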
The Product User Guide (Good and Rayner, 2013) is the recommended starting point for new users of SST CCI products.
Phase 2 of the SST CCI project commenced in 2014. The project team intends to extend the period covered by its datasets back to the early 1980s and forward to link with the era of new dual-view radiometers (Sea and Land Surface Temperature Radiometers) due to be launched and operational by 2016. Work will be done to improve accuracy, particularly during periods of significantly elevated stratospheric aerosol (1982/1983 and 1991/1992), and to develop means of maintaining independence from in situ SST for the AVHRR dataset prior to the advent of the ATSR series in 1991. The SST CCI analysis from Phase 2 will use the new ATSR and AVHRR datasets, and will thus be more than 10 years longer than the v1 analysis described here. To improve feature representation further, adaptive correlation length-scale parameterization and other developments will be explored. Thus, the objective for Phase 2 is a climate data record for SST of >30 years duration, with the independence and high stability required for many applications in climate science.