Narrowband-to-Broadband Conversions for Top-of-Atmosphere Reflectance from the Advanced Very High Resolution Radiometer (AVHRR)

The current lack of a long, 30+ year, global climate data record of reflected shortwave top-of-atmosphere (TOA) radiation could be tackled by relying on existing narrowband records from the Advanced Very High Resolution Radiometer (AVHRR) instruments, and transform these measurements into broadband quantities like provided by the Clouds and the Earth’s Radiant Energy System (CERES). This paper presents the methodology of an AVHRR-to-CERES narrowband-to-broadband conversion for shortwave TOA reflectance, including the ready-to-use results in the form of scene-type dependent regression coefficients, allowing a calculation of CERES-like shortwave broadband reflectance from AVHRR channels 1 and 2. The coefficients are obtained using empirical relations in a large data set of collocated, coangular and simultaneous AVHRR-CERES observations, requiring specific orbital conditions for the AVHRRand CERES-carrying satellites, from which our data analysis uses all available data for an unprecedented observation matching between both instruments. The multivariate linear regressions were found to be robust and well-fitting, as demonstrated by the regression statistics on the calibration subset (80% of data): adjusted R2 higher than 0.9 and relative RMS residual mostly below 3%, which is a significant improvement compared to previous regressions. Regression models are validated by applying them on a validation subset (20% of data), indicating a good performance overall, roughly similar to the calibration subset, and a negligible mean bias. A second validation approach uses an expanded data set with global coverage, allowing regional analyses. In the error analysis, instantaneous accuracy is quantified at regional scale between 1.8 Wm−2 and 2.3 Wm−2 (resp. clear-sky and overcast conditions) at 1 standard deviation (RMS bias). On daily and monthly time scales, these errors correspond to 0.7 and 0.9 Wm−2, which is compliant with the GCOS requirement of 1 Wm−2.


Introduction
Broadband measurements of top-of-atmosphere (TOA) reflected solar flux and emitted thermal flux are essential climate variables of which a high-quality data record of satellite measurements with sufficient length ("Climate Data Record" or CDR) is needed by, among others, the climate modeling community (for validation purposes), and preferably spanning several decades [1]. However, apart from instrument-specific broadband measurement campaigns (e.g., CERES, 2000-present [2]) and regional datasets based on geostationary satellites (e.g., Meteosat-based, 1983Meteosat-based, -2015 [3]), to date no harmonized global CDR dating back several decades exists. An alternative method is to rely on existing global long-term CDRs of harmonized narrowband reflectance (FCDRs), e.g., from the Advanced Very High Resolution Radiometer (AVHRR) instrument [4,5], and transform these measurements into broadband quantities [6]. This study aims to establish robust relations between the long-term zenith angle (VZA; θ). Built on this knowledge, the current study includes the SZA-dependency in the same way as Minnis et al. [22] and adds the VZA-dependency [23], resulting in the following regression equation: The inverse of the cosine being a good approximation for the atmospheric path. While the first two of above mentioned studies [16,22] removed anisotropy by applying a narrowband angular directional model (ADM), prior to the NTB conversion, the latter study [23] argued against this practice: it prevents the user from applying an ADM of his choice, e.g., anticipating on future improved ADMs. Their point of view is largely followed in this study, in which no a priori angular correction is applied. The resulting ρ SW can then be used in a broadband ADM of choice (e.g., [24][25][26], or anticipated future ADMs) to remove anisotropy and ultimately obtain the real hemispherical albedo and flux.

Data
The broadband measurements are taken from the instantaneous footprint data of the CERES instrument [27] onboard satellites Terra FM1 and Aqua FM3 (Single Scanner Footprint TOA/Surface Fluxes and Clouds edition 4A [28]). Apart from the actual broadband reflectance, also included are geophysical variables (e.g., cloud mask) simultaneously recorded by the accompanying MODIS imager and aggregated to CERES resolution. These will be compared to geophysical variables from AVHRR, to verify whether the matched NTB pair also has identical conditions (apart from the purely temporal-geometrical conditions). Footprint size and shape are spatially variable: cross-track and along-track dimensions increase with viewing zenith angle (θ) due to an increasingly spread-out projected area, which is even more magnified by the earth's curvature. Assuming a spherical pixel with diameter of 32 km at nadir (PSF with 95% energy cutoff [29]), at θ = 70 • its dimensions reach 212 km (cross-track) and~71 km (along-track). The narrowband measurements are taken from the AVHRR instrument [30] onboard NOAA-17,-18,-19 satellites, from which an FCDR is created with Global Area Coverage Level-1c data created using the PyGAC processor [5]. This data set is supplemented with geophysical variables (such as cloud mask, sea ice concentration, etc.) remapped on the same grid [31]. The data have a spatial resolution of 4 km at nadir, but similar to CERES SSF, pixel size and shape are subject to biaxial deformation with increasing viewing zenith angle.
The analysis time period has been selected by examining the orbital planes of Terra/Aqua and NOAA satellites: the former have a true sun-synchronous orbit (with daytime equatorial crossing times of~10:30 and~13:30 LST) whereas the latter suffer from orbital drift ( Figure 1). The orbital planes coincide around January 2005 (NOAA-17), 2008 (NOAA-18) and 2012 (NOAA- 19), which then provides the necessary conditions for having frequent matched NTB pairs, i.e., simultaneous, collocated, and coangular. Since coinciding orbital planes of AVHRR-and CERES-carrying satellites did not occur outside these three time periods, this study analyses an unprecedented exhaustive amount of matched NTB pairs. Combining satellites with deviating orbital planes is not efficient, since there is only a marginal addition of NTB pairs at a high computational cost. This is illustrated by a small test on January 2012: the combination Aqua+NOAA19 (coinciding) results in 4.124.334 matched NTB pairs, whereas Aqua+NOAA18 (~1h off) generates only 99.605 pairs, or 2.4%, which furthermore occur in two limited high-latitude bands. To account for seasonal variability in spectral properties of some land cover types (such as vegetation or sea-ice), the full annual cycle is considered. Therefore, a range of at least +/−6 months is chosen around January, ending up with the periods given in Table 1. The long time span of a full year furthermore enables regressions per surface type, as the sample size must be large enough within each of these groups, and finally it also allows splitting off a smaller subset serving as independent observations for validation purposes.

Data Preparation
An initial version of the dataset is constructed by identifying matched NTB pairs: based on temporal and spatial constraints, the algorithm iterates over all pixels in both datasets to find corresponding pixels.
In a first step, the CERES pixels are filtered on the criterion that it only contains either land, water, or snow (mixtures are excluded). Only water and sea ice are permitted to co-exist in a single CERES pixel. Then, the AVHRR pixel closest to the CERES pixel center is identified, but only considered if differences in 3D viewing angle (combining viewing zenith and azimuth) and observation time are less than 6 • and 450 s [13], respectively. If so, based on the viewing geometry and assuming an at-nadir size of 32 × 32 km (cf. Appendix A.1), the footprint size of the CERES pixel is determined, the corresponding AVHRR pixels are identified (their centroids should be located inside the CERES footprint), and their associated radiance and other geophysical variables are aggregated. This converts the binary cloud mask, either 0 or 1, into a continuous cloud fraction between 0-100%. An additional constraint is that all AVHRR pixels should belong to the same surface type (first column in Table 2: ocean, forest, etc.), otherwise the pixel disqualifies and the iteration continues. Therefore the algorithm finds the nearest matching IGBP land cover category for each AVHRR pixel in a static 1 km grid IGBP map [32], followed by a mapping from IGBP category to surface type [33] as shown in the first two columns in Table 2. Sea ice surface types can only contain a mixture of ocean and sea ice, and are subdivided according to sea ice concentration: subdivisions are based on the non-linear relations between concentration, color and albedo due to the seasonal effects such as melting pond formation [34]. Finally, sun glint contamination is avoided by rejecting NTB pairs with a sun glint angle (between sun specular reflection and viewing direction) smaller than 25 • [13]. The resulting dataset amounts over 45 million matched NTB pairs ( Figure 2). In a second step, filtering is performed on the resulting dataset. Matched NTB pairs are categorized according to their cloud fraction: "clear" and "overcast" pairs require both the CERES and AVHRR cloud fraction to exactly amount 0% and 100% respectively. "All-sky pairs" simply require that the difference between CERES and AVHRR cloud cover is no more than 20% point. At this point, the dataset contains over 35 million all-sky pairs and is referred to as the "expanded validation database" characterized by relaxed matching criteria (used in Sections 3.3.2 and 4.3.2).
In a third step, the dataset is narrowed down to 3.3 million pairs by restricting angular and temporal matching criteria: for most NTB pairs, the maximum allowed temporal and 3D-angular differences between CERES and AVHRR observations are set to 250 s and 2.5 • , respectively. These threshold values were established after an iterative trial-and-error process to find a trade-off between keeping a sufficiently large sample size (higher thresholds) and lowering angle/time difference error (lower thresholds). For surface types ocean and permanent snow, their areal abundance allows stricter criteria: 100 s and 1.0 • respectively. For some scarce combinations of surface type and cloud cover (i.e. scene type), namely clear forests, overcast savannas and clear broken sea ice (60-10%), the thresholds are set to 300 s and 3.0 • . Subsequently, for each scene type, every fifth pair in the chronologically ordered dataset is split off: these 20% of the pairs constitute the "smaller validation subset" characterized by strict matching criteria (used in Sections 3.3.1 and 4.3.1). The remainder of the data constitutes the "calibration subset" (80% of the pairs), used to derive the NTB regression coefficients (about 2.7 million pairs).

Regression Calibration
For each scene type, a multivariate linear regression is performed on the calibration subset using the least squares method with AVHRR reflectance and solar/viewing zenith angles as predictors (Equation (2)). The resulting best-fit coefficients are stored together with the associated regression statistics (adjusted R 2 , absolute and relative RMS residual). Letρ SW and ρ SW be the estimated and observed broadband shortwave reflectance, then residual r i is defined as (ρ SW,i − ρ SW,i ) and the sum of squared residuals (SSR) as (∑ n i=1 r 2 i ), so that the surface-type-dependent Root-Mean-Square residual (RMSr) is calculated as follows: where n is the number of NTB pairs i within a given scene type. Please Note that the units of RMSr are identical to the units of reflectance, i.e., expressed as percentage and ranging from 0 to 100. The relative RMS residual (rRMSr) is defined as the absolute RMSr divided by the mean observed broadband shortwave reflectance (< ρ SW >), and can be expressed as fraction or percentage: For each scene type, also two simpler regression models are constructed by using only 3 and 2 predictors, respectively (instead of 4 predictors in the complete model), from which the 2-predictor model is given by Equation (1). This allows to assessment of the added value of the additional predictors.
Finally, beside the surface-type-dependent regressions, a generic surface-type-independent regression is fitted by considering all matched NTB pairs regardless surface type. This regression allows to assessment of the added value of with a separate regression for each surface type.

Validation on Smaller Validation Subset (with Strict Matching Criteria)
To validate the surface-type-dependent NTB regression models obtained from the calibration subset, they are applied on narrowband AVHRR reflectance from the smaller validation subset (20% of the matched NTB pairs with strict matching criteria) using Equation (2). The resulting estimated broadband reflectance is compared with its observed equivalent (from CERES-SSF), and their difference quantified by the Mean Bias (MB) and relative Mean Bias (rMB) per scene type, each containing n NTB pairs: withρ SW the estimated and ρ SW the observed broadband reflectance. The statistical significance of MB is estimated by using Welch's Two Sample t-test at 95% confidence level, meaning that when the p-value remains below 0.05 the null hypothesis is rejected and the means are considered significantly different. Furthermore, for each scene type, the reflectance bias is also expressed in terms of isotropic flux (MB f lux , Wm −2 ), also referred to as flux-equivalent Mean Bias, allowing a better assessment of the bias' relevance since it is weighted by insolation: where TSI is the total solar irradiance (here taken at 1363 Wm −2 ). The statistical significance calculation of MB f lux is equal to the one of MB. This approach does not allow a regional (spatial) analysis since the strict matching criteria for both calibration and validation subsets limit the sampling, geographical coverage and hence the construction of global maps and globally aggregated metrics. This problem is tackled when using a different validation approach (Section 3.3.2).

Validation on Expanded Validation Database (with Relaxed Matching Criteria) and Regional Analysis
A second validation approach is to apply the surface-type-dependent NTB regressions on the narrowband reflectances from the expanded validation database, consisting of 35 million pairs with global coverage due to the relaxed matching criteria. The bias (MB, rMB, MB f lux ) between estimated and observed broadband reflectance is then calculated for each pair and stratified per surface type. Furthermore, to detect regional biases and to calculate a global mean bias regardless region-specific over-or undersampling of the matched NTB pairs, the expanded database is spatially aggregated into a 5 • × 5 • equirectangular grid. Prior to the global mean calculation, grid boxes with less than 32 observations are excluded and the grid is subjected to area-weighting [cos(latitude)]. Additional global statistics are calculated, such as the mean absolute bias (MAB) and RMS bias, both indicators for uncertainty at the 5 • regional scale.

Added Value of Surface-Type-Dependent Regressions
Both above mentioned validations are also done using the generic surface-type-independent regression, which is applied on all NTB pairs regardless their surface type. The added value of having a separate NTB regression for each surface type is then assessed by comparing the validation results derived by surface-type-dependent and by surface-type-independent regression coefficients.

Error Budget
The error is separated in three components: the precision, the stability and the accuracy.

Precision
The precision refers to the non-systematic error associated with the derivation of the NTB regression. Note that this does not include the error associated with the input data (including the FCDR). The precision depends on the sample size, i.e., number of NTB pairs within each scene type (n) used to derive the NTB regressions, and is quantified by the standard error of residuals (SEr), calculated as the RMS residual divided by √ n: Just like all other above mentioned metrics (R 2 adj , RMSr, rRMSr, MB, rMB, MB f lux ), this quantity is calculated for each scene type separately.

Stability
The stability refers to the change of the systematic error (bias) in time. Usually this includes jumps due to instrument switches and slow temporal degradation (aging of sensor). Here we only look to the first, i.e., the change of the bias from one AVHRR instrument to the other. Therefore, the regressions calibrated using all three satellite combinations (Section 3.2) are now applied on each satellite combination separately (NOAA17+Terra, NOAA18+Aqua and NOAA19+Aqua). This is done using the entire database of NTB pairs with relaxed matching criteria (max 6 • 3D angle and 450 s). Global mean statistics are calculated as described in Section 3.3.2, i.e., by aggregating all matched NTB pairs on a global map with 5 • × 5 • resolution and averaging all grid boxes after weighting by areal fraction [cos(latitude)]. For the flux-equivalent mean bias (MB f lux ) we derive three values MB f lux,N17 , MB f lux,N18 , and MB f lux,N19 : where n stands for the number of NTB pairs for each satellite combination sat (across all scene types).

Accuracy
Finally, the accuracy is defined as the regional bias associated with the NTB regressions. Similar to the stability statistic, the accuracy statistics are calculated by considering the global 5 • × 5 • bias grid based on NTB pairs with relaxed matching criteria, but now for all satellite combinations together. On the resulting bias map, the Mean Absolute Bias (MAB) and root mean square bias (RMSB) are calculated. The latter can be used to describe regional uncertainty at 1 standard deviation. Table 2 shows the different surface types with their associated IGBP category and occurrence frequency of matched NTB pairs using strict matching criteria. Combined sample size for categories "clear" and "overcast" is much lower than for "all-sky", because they have much stricter selection criteria (cfr. Section 3.1). For clear and overcast categories, the regional sampling patterns are plotted in Figure 3. These patterns are self-evident and reflect the spatial distribution of dry and wet climate regions. The figures are shown for all satellites together; subdividing the density maps per satellite does not significantly affect the spatial distribution (results not shown here).

Regression Calibration
The resulting regression coefficients are listed in Table 3. NTB conversions can be applied on any combination of channel 1 + 2 AVHRR reflectances by inserting the coefficients associated with a given scene type, together with solar and viewing zenith angles, into Equation (2).
The regressions are characterized by high coefficient of determination (R 2 adj ) and low RMS residual (Table 4), indicating the high goodness-of-fit and explanatory power of the models. The largest RMS residual (RMSr) can be noticed for surface type "Sea ice 10-60%" because of the high internal variability within this surface type which enlarges the spread around the regression line. Clearsky scenes have on average much lower absolute RMSr compared to overcast scenes, but this comparison doesn't hold anymore when considering the relative RMSr (rRMSr). The latter should be interpreted with care: ocean, for instance, has a rather high rRMSr (4.82%) but one of the lowest absolute RMSr (0.26). Overall, the regression models fit the observations well.
When comparing to earlier NTB regressions with AVHRR as narrowband reflectance predictor, the present study forms a significant improvement. For example Wydick et al. [17] report for clear ocean, clear land and clouds an absolute RMSr of respectively 1.4, 2.1 and 7.5 (compared to our 0.26, 0.33-0.67 and 0.80-1.76). Li and Leighton [11] report for the same surface types an RMSr of respectively 1.0, 1.8 and 3.1, corresponding to a relative RMSr of 13.8%, 10.4% and 6.5% (compared to our 4.82%, 2.26-3.13% and 1.49-3.71%). Explained variance (R 2 ) is higher for all scene types, e.g., for clear ocean (0.91), clear land (0.95-0.98) and clouds (0.97-0.98) compared to respectively 0.64, 0.75, 0.76 [17] and 0.83, 0.77, 0.93 [11]. Part of the improvement can be attributed to the CERES instrument, which has a better spatial resolution allowing a more homogeneous signal (less noise) within a scene type, compared to the ERBE broadband instrument used in earlier studies. Also the strict pixel selection criteria (data filtering) used in this study, which is based on both AVHRR and the CERES-accompanying MODIS imager, could contribute to less noise (compared to the previous studies in which only AVHRR was used for scene identification).

Clear-Sky
Overcast All-Sky Ocean  The adjusted R 2 is a modified version of R 2 that has been corrected for the number of predictors: it increases only if the additional predictor improves the model more than it would be expected by chance. By comparing R 2 adj obtained by the two simpler models, with respectively only 2 and 3 predictors, the added value of the SZA and VZA predictors becomes obvious as (for every scene type) their R 2 adj is equal or lower than the full 4-predictor model (Appendix A.2). Furthermore, rRMSr is in both simple models equal or higher than the full 4-predictor model (results not shown).
The generic surface-type-independent regression combines all matched NTB pairs, regardless surface type. With a value of 0.998, the adjusted R 2 from the generic regression is higher than any of the surface-type-dependent regressions (Table 4). This has been reported in similar studies [11,17], from which the latter argues that this does not mean that surface type discrimination is unimportant, since applying the generic model on homogeneous scenes would introduce systematic biases [11]. The added value of the surface-type-dependent regressions is assessed in Sections 4.3 and 4.4.3, where the results are derived using both surface-type-dependent and surface-type-independent regressions.

Validation on Smaller Subset of Data (with Strict Matching Criteria)
Validation is done by applying the regression models to the narrowband AVHRR reflectances in the smaller validation subset, thereby quantifying their performance, of which the results are shown in Table 5. The bias metrics (rMB, MB, and MB f lux ) should be small and as close as possible to zero (note that in the calibration subset this metric is per definition zero, and therefore omitted in Table 4). The relative RMS residual (rRMSr) is calculated similarly to the calibration rRMSr (Equation (4)), but in this case applied on the validation subset instead of the calibration subset. The results indicate an overall good performance, with rRMSr roughly similar to the calibration rRMSr, and the rMB close to zero with an outlier maximum value of only +0.48% (corresponding to MB f lux of +0.57 Wm −2 ). None of the scene types have a statistically significant bias. MB f lux : mean bias expressed as isotropic flux (Wm −2 ); rMB: relative mean bias; rRMSr: relative rms residual.
Instead of using surface-type-dependent regressions, the same validation is done using the generic surface-type-independent regression: the results shown in Appendix A.3 indicate that for all scene types the performance is significantly worse, demonstrating the advantage of having surface-type-dependent regressions.

Validation on Expanded Validation Database (with Relaxed Matching Criteria) and Regional Analysis
For the second validation approach, the regression models are applied on the expanded database, and the results are listed in Table 6. Compared to the first validation approach (Section 4.3.1) relative RMSr is mostly larger due to the relaxed matching criteria, which introduces additional noise due to spatial and angular errors. The mean biases are relatively small, but still significant for some scene types, with MB f lux ranging from −1.14 to +1.20 Wm −2 (for respectively all-sky ocean and clear sea-ice 10-60%). Please Note that the ratio MB f lux /MB varies, e.g., clear-sky ocean has a Mean Bias (MB) of −0.078 and a resulting MB f lux of −0.78 Wm −2 (factor 10), whereas clear-sky permanent snow/ice has a Mean Bias of −0.116 and a resulting MB f lux of −0.64 Wm −2 (factor 5), which is due to differences in geographical distribution (latitude) and hence solar zenith angle for the different surface types (Equation (7)).
rRMSr obtained by validation of the two simpler models, with respectively only 2 and 3 predictors, demonstrates the added value of the SZA and VZA predictors as (for every scene type) this metric is equal or lower than the full 4-predictor model (results not shown).
Instead of using surface-type-dependent regressions, the same validation is done using the generic surface-type-independent regression: the results shown in Appendix A.4 indicate that for all scene types the performance is significantly worse, demonstrating the advantage of having surface-type-dependent regressions.
Global statistics can be made after the bias is aggregated and mapped on a global grid of 5 • × 5 • . Table 2 and Figure 3 illustrate the sample size and corresponding geographical coverage of the dataset with strict matching criteria: it is clear from this that the sampling coverage is insufficient (not global). Using relaxed criteria, however, the sample size becomes much larger (35 million instead of 3.3 million matched pairs) and the geographical coverage improves accordingly (figures not shown), allowing a spatial analysis of the regional bias as well as the calculation of a global mean. The resulting global MB f lux (Table 6)  SZA is known to be a significant predictor in shortwave NTB regression models, explaining increases up to 10% for observations with highest SZA (polar regions) compared to lowest SZA (tropics) [16,22,35]. To a lesser extent, this is also true for VZA. Models ignoring these parameters are tuned to the most frequently sampled SZA and VZA. The dominance of midlatitude/subpolar sampling of the scene type overcast ocean in our analysis (Figure 3b) thus explains the positive bias associated with overcast tropical ocean (Figure 4a) and its reduction when adding SZA as predictor (Figure 4b). Similarly, the inclusion of VZA improves the negative bias associated with overcast poles (Figure 4b,c), because in this limited region the VZA is much higher than the global average (due to specific sampling near the terminator). The added value of SZA and VZA as regression parameters is also quantified using the accuracy statistics (cfr. Section 4.4.3).  [2] −0.08 −0.044 +0.19% n.a. [2] −0.64 −0.092 +0.64% n.a. [2] MB f lux : mean bias expressed as isotropic flux (Wm −2 ); rMB: relative mean bias; rRMSr: relative rms residual. * statistically significant on the 99% confidence level.
[1] After spatial aggregation of all NTB pairs on a 5 • ×5 • grid with area-weighting. [2] Since rRMSr is calculated with respect to all NTB pairs within that scene type, this metric cannot be globally aggregated.
When considering the complete models using all predictors, the bias does not show pronounced region-specific patterns (Figures 4c and 5a,b).

Precision
The standard error of residuals (SEr), calculated on the calibration subset, is very small ( Table 4): it ranges from 0.001 to 0.024, indicating a sufficient sample size for all scene types.

Stability
The change of systematic bias from one AVHRR instrument to the other is investigated. Therefore the NTB regressions are applied on three subsets of the expanded validation dataset, which is stratified by satellite combination (NOAA17+Terra, NOAA18+Aqua, NOAA19+Aqua). After aggregating these instrument-specific biases (MB f lux,N17 , MB f lux,N18 , MB f lux,N19 ) on a 5 • × 5 • grid, the resulting global mean biases are calculated (Table 7).
When applying the regressions on the subset with satellite combination NOAA17+Terra, a small positive bias becomes apparent (+0.32 Wm −2 for all-sky conditions). For the subset NOAA18+Aqua there is a clear negative bias (−1.42 Wm −2 for all-sky conditions), and the subset NOAA19+Aqua takes an intermediate position (−0.86 Wm −2 for all-sky conditions).
The instrument-specific bias can be attributed to the FCDR on which the analysis is based. Homogenizing satellite data from different satellites and viewing conditions is a challenging task and a continuous effort that will remain subject of future research. The stability, here quantified as instrument-specific bias, may therefore be alleviated whenever the FCDR is improved with updated calibrations.  Figure 6 illustrates the figures in a spatially-explicit way; the satellite-specific biases are well spread across all regions (scene types), and mostly reflect the average numbers indicated in Table 7.

Accuracy
The (regional) regression uncertainty at 1 standard deviation is quantified by the Root Mean Square of biases (RMSB) calculated on the earlier derived bias maps (Section 4.3.2: Figures 4 and 5). It is estimated at 1.82 Wm −2 and 2.29 Wm −2 for clearsky and overcast conditions, respectively ( Table 8). As expected from visual inspection of the bias maps, including the solar zenith angle as regression predictor clearly improves the regression accuracy for overcast conditions (by 0.6 Wm −2 ), whereas the inclusion of viewing zenith angle only yields a smaller improvement (by another 0.3 Wm −2 ). The global Mean Absolute Bias (MAB) is an alternative way of quantifying accuracy, and shows similar tendencies.
Above mentioned accuracy estimations are valid for the instantaneous matched NTB pairs. For daily or monthly time scales, however, the error decreases due to the inclusion of nighttime (zero flux). The proportion between monthly mean shortwave flux from the CERES Energy Balanced and Filled (EBAF) dataset [2] (98.7 Wm −2 ) and mean instantaneous shortwave flux from matched NTB pairs (254.4 Wm −2 ) is 0.388. This is used as correction factor to calculate the accuracy on daily time scale, which is estimated at 0.71 Wm −2 (clear-sky) and 0.89 Wm −2 (overcast), both compliant with the GCOS requirement of 1 Wm −2 [1].
The same accuracy assessment is done using the generic surface-type-independent regression, instead of the surface-type-dependent regressions: the results shown in Appendix A.5 indicate that all statistics the performance is significantly worse, demonstrating the advantage of having surface-type-dependent regressions.

Conclusions
This paper establishes conversions between narrowband and broadband reflectances observed by respectively the AVHRR and CERES instruments. This is achieved by using empirical relations in a large data set of collocated, coangular and simultaneous AVHRR-CERES observation pairs. Specific orbital conditions for the AVHRR-and CERES-carrying satellites are required, which seriously narrows down the temporal range of potential matched pairs. A data survey has been performed using all available data for an unprecedented observation matching between both instruments.
Multivariate linear regressions are performed for 15 surface types and 3 cloud cover classes, resulting in regression coefficients for 45 scene types. As a result, for each combination of AVHRR reflectances in channels 1 and 2, for a given scene type, a CERES-like broadband reflectance can be estimated. The linear regressions were found to be accurate, as demonstrated by the regression statistics: adjusted R 2 > 0.9 and relative RMS residual (rRMSr) below 3% for almost all clear-sky and overcast scene types, which is a significant improvement compared to previous works in the literature.
The added value of scene-dependent models was extensively demonstrated for regression calibration (better fit) as well as for validation (lower bias and rRMSr). In addition, this study shows that the inclusion of solar and viewing zenith angle as predictors increases goodness-of-fit and lowers rRMSr, and furthermore proves necessary to avoid sampling-based model biases.
Validation of the regression models is done first by applying them on a validation subset (20% of pairs with strict matching criteria). The results indicate an overall good performance, with rRMSr similar to the calibration-rRMSr, and the flux-equivalent Mean Bias (MB f lux ) close to zero. A second validation approach makes use of a much expanded database with global coverage (obtained by relaxing matching criteria), allowing a spatially-explicit analysis and calculation of global mean statistics.
Error analysis is performed with three measures: Precision is given by the standard error of residuals, which is found very small (<0.03%). Stability is calculated by stratifying the data set according to the AVHRR-CERES instrument combination. The analysis reveals systematic instrument-specific biases with an order of (NOAAA19+Aqua) > (NOAA18+Aqua) > (NOAA17+Terra): this is illustrated by MB f lux for all-sky conditions (+0.32, −0.86, −1.42 Wm −2 ), stressing the need for a well calibrated FCDR. Finally, accuracy is quantified as regional uncertainty at 1 standard deviation on a 5 • × 5 • grid (RMS bias), which is estimated for clear-sky and overcast conditions at respectively 1.8 and 2.3 Wm −2 (instantaneous); this corresponds with 0.7 and 0.9 Wm −2 on daily mean time scale, which is compliant with the GCOS requirements of 1 Wm −2 .
Author Contributions: T.A. wrote the paper which was thoroughly re-read by N.C. All authors have read and agreed to the published version of the manuscript

Conflicts of Interest:
The authors declare no conflict of interest.
[1] After spatial aggregation of all NTB pairs on a 5 • × 5 • grid with area-weighting. [2] Since rRMSr is calculated with respect to all NTB pairs within that scene type, this metric cannot be globally aggregated.