Improved cloud and snow screening in MAIAC aerosol retrievals using spectral and spatial analysis

An improved cloud/snow screening technique in the Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm is described. It is implemented as part of MAIAC aerosol retrievals based on analysis of spectral residuals and spatial variability. Comparisons with AERONET aerosol observations and a large-scale MODIS data analysis show strong suppression of aerosol optical thickness outliers due to unresolved clouds and snow. At the same time, the developed filter does not reduce the aerosol retrieval capability at high 1 km resolution in strongly inhomogeneous environments, such as near centers of the active fires. Despite significant improvement, the optical depth outliers in high spatial resolution data are and will remain the problem to be addressed by the application-dependent specialized filtering techniques.


Introduction
A new Multi-Angle Implementation of Atmospheric Correction (MAIAC) algorithm developed for Moderate Resolution Imaging Spectroradiometer (MODIS) was described recently (Lyapustin et al., 2011a, b). This is a generic algorithm which retrieves aerosol information over land simultaneously with parameters of the bidirectional reflectance distribution function (BRDF) model. To achieve this goal, MAIAC uses the time series of MODIS radiance measurements as well as processing of groups of pixels. This approach utilizes the difference in the time-space variability of aerosols and surface reflectance which can be captured with the daily near-global coverage of MODIS: namely, aerosols vary slowly in space but may change between consecutive MODIS observations, whereas the land surface reflectance has a high spatial variability but low rate of change at short time intervals. A similar idea has recently been implemented for the advanced processing of PARASOL data (Dubovik et al., 2011).
MAIAC aerosol retrievals are performed at high 1 km resolution which is needed in different applications such as visibility assessments , aerosol source identification, air quality analysis (Hoff and Christopher, 2009) etc. In a recent work, Emili et al. (2011) evaluated MAIAC cloud/snow mask and aerosol products in the region of European Alps characterized by a heterogeneous aerosol distribution with strong impact of topography and aerosol sources localized in the narrow valleys with width of several km. While this study clearly demonstrated benefits of the high resolution data as compared to the standard 10 km MODIS product (Levy et al., 2007), including improved spatial coverage and 50 % increase in the number of observations, it has also revealed residual cloud and snow contamination. This effect becomes particularly noticeable in rather pristine Alpine conditions with low average mid-visible aerosol optical thickness AOT ∼ 0.05-0.2. The problem of bias was successfully overcome by Emili et al. (2011) with AOT data filtering where the main filter was based on the 3 × 3 pixel spatial variance test (σ ≤ 0.05). In more detail, this filter successively removed the highest AOT value from the 3 × 3 km 2 area if the standard deviation exceeded 0.05, and then averaged the remaining values effectively leading to 3 km resolution of the aerosol product. The filtering significantly 844 A. Lyapustin et al.: Improved cloud and snow screening in MAIAC aerosol retrievals improved correlation of MAIAC data with AErosol RObotic NETwork (AERONET) AOT for the selected mountainous sites, for example from R ∼ 0.2 to ∼0.8 for Laegeren and Davos, Switzerland, by excluding ∼30 % and ∼50 % of AOT retrievals, respectively.
While filtering is clearly required for some applications, such as climatology analysis, it would have negative consequences for the others. For example, the σ -filter along with reduction of the effective spatial resolution of MAIAC AOT from 1 km to 3 km would eliminate many meaningful retrievals with point sources such as fire smoke plumes. It is, therefore, desirable to address the problem of residual cloud/snow contamination within MAIAC itself. In this work, we explore the opportunities which exist within MA-IAC aerosol algorithm to improve the cloud/snow mask.

Spectral residuals and spatial variability
The full description of MAIAC has been given before (Lyapustin and Wang, 2009;Lyapustin et al., 2011a, b). Below, we provide the minimum level of detail which is only relevant for the current discussion. MAIAC processing starts with gridding MODIS L1B data to a 1 km resolution (Wolfe et al., 1998). The gridded data are placed in the Queue which stores from 5 (poles) to 16 (equator) days of imagery, depending on latitude. The Queue implements a sliding window algorithm used for cloud masking (CM) and surface characterization. Both algorithms utilize 1 km grid cells, which are called pixels, as well as fixed 25 × 25 km 2 areas called blocks. MAIAC CM algorithm (Lyapustin et al., 2008) provides a generally robust performance which is similar to that of the MODIS operational cloud mask (Ackerman et al., 1998) or may exceed it in difficult conditions, for example over bright surfaces and snow. Common to all CM algorithms, the MAIAC CM has a limited ability to identify thin or sub-pixel clouds. MAIAC CM algorithm includes a dynamic land-water-snow classification based on the time series analysis. The snow detection tests are commonly based on the fixed thresholds, which automatically creates a problem of residual snow contamination in aerosol retrievals. The main outcome of the MAIAC surface characterization, relevant for aerosol retrievals, are the BRDF model parameters and surface reflectance uncertainty (ε λ ) at the top of atmosphere (TOA) for every 1 km grid cell in the reflective MODIS bands.
The aerosols are modeled conventionally as a superposition of the fine and coarse modes. Following the MODIS operational Dark Target algorithm MOD04 (Levy et al., 2007), the fine and coarse aerosol models in MAIAC are fixed regionally based on AERONET (Holben et al., 1998) climatology. MAIAC uses the latest MODIS measurements to perform aerosol retrieval based on the knowledge of spectral surface BRDF and its uncertainty in bands B3 (0.47 µm), B1 (0.67 µm) and B7 (2.13 µm) from the previous retrievals. For each pixel (i, j ), it computes AOT by matching the modeled TOA reflectance to the measurement in the Blue band (B3). This procedure is repeated for the increasing values of the coarse mode fraction (CMF) characterized by parameter η equal to ratio of volumetric concentrations of the coarse and fine fractions. The final solution (τ , η) is selected based on the rmse test which is computed using the Red (B1) and shortwave infrared (SWIR, B7) bands: If Eq. (1) cannot be satisfied with aerosol models, the algorithm also tries a liquid water cloud model. The latter represents a cloud consisting of 5 µm water droplets with narrow size distribution (σ = 0.5 µm) which is used to test possible cloud contamination in each pixel. This additional test improves cloud detection capturing many thin or small subpixel clouds (e.g., see Fig. 5 from Lyapustin et al., 2011b). Prior to aerosol retrievals, a snow test (Li et al., 2005) originally implemented in the MOD04 algorithm is performed to filter undetected snow pixels. While the rmse (χ ) test proved to be useful for improved cloud masking, there is additional information contained in the individual spectral residuals δ λ = R Meas λ −R Theor λ ε λ . For example, a retrieval for a thin cloud pixel with the background aerosol model will result in positive residuals δ 0.67 , δ 2.13 . For a partly cloudy pixel, the residuals will be positive with the aerosol models and will change sign when the cloud model is used in the retrievals.

Spectral residuals
A proposed simple cloud test is based on the difference in spectral dependence of extinction of aerosols and clouds due to a large difference in the particle size. To understand its capabilities and assess sensitivity to thin clouds over different surfaces, numerical simulations were conducted. The TOA radiance was first simulated for a given atmosphere-surface model using code SHARM (Lyapustin, 2005), and then MA-IAC aerosol retrieval was applied. We used two surface types representing a typical dense vegetation and a bright urban area whose BRDF model and its uncertainty were provided by MAIAC from MODIS data. The green and bright urban areas geographically represent the summertime northern Washington DC, with albedo q = {0.014, 0.02, 0.033, 0.061} and q = {0.04, 0.081, 0.149, 0.20}, respectively, as a measure of surface brightness in the MODIS channels B8 (0.412 µm), B3, B1 and B7. Here, the MODIS "Deep Blue" band B8 was added to the set of channels used by MAIAC in aerosol retrievals. The liquid water cloud was modeled using a lognormal size distribution with radius 10 µm and standard deviation 0.5 µm using refractive indices from Hale and Querry (1973). A dynamic East Coast aerosol model (see Lyapustin et al., 2011b) was used in the retrievals.
The test results are presented in Fig. 1. The plots (a-b) show spectral residuals for the vegetated and urban surfaces obtained with aerosol model for a thin cloud pixel with optical thickness (COT) of 0.234, and plot (c) shows results for the urban surface and a thicker cloud of COT = 0.7. The different lines represent different view geometries: the red, black and blue lines correspond to cosines of view zenith angle µ = cos(VZA) = −1, −0.7 and −0.4, while the solid, dashed and dotted lines represent three relative azimuths of 35 • (forward scattering), 90 • , and 145 • (backscattering). The residual in the Blue channel (0.47 µm), which is used to compute AOT, is always zero. It is positive in the Red and SWIR bands over the dark and vegetated surfaces. The plot (a) shows that a simple criterion δ 0.67 > 1.5-2, δ 2.13 > 1.5-2 could be used in this case to detect very thin liquid water clouds with COT ∼ 0.25. Over brighter surfaces, however, the residuals may take both positive and negative values, depending on the view geometry. In this case, a sufficient detection sensitivity is attained for thicker clouds (COT > 0.7), as illustrated by plot (c). These plots also show that adding the "Deep Blue" channel (0.412 µm), where the residual systematically takes negative values, may enhance the cloud discrimination capability of the proposed spectral test.
A similar idea can be used for detection of the residual snow which increases surface brightness in the visible wavelengths (δ 0.67 > 0) and decreases it in the SWIR (δ 2.13 < 0).
While the idea of the proposed spectral test is seemingly simple, its realization is complicated by several factors: (1) the spectral surface reflectance in MAIAC is known with uncertainty characterized by its standard deviation ε λ . This error includes contributions from all sources including gridding, atmospheric correction and fitting limitations of the BRDF model. Assuming Gaussian distribution of errors, the specific reflectance would agree with the model to within ±ε λ in ∼68 % cases and within ±2ε λ in ∼95 % cases; (2) the surface can change since the last BRDF retrieval. For example, rain can darken the soil decreasing its reflectance in both Red and SWIR channels, whereas undetected sub-pixel snow would increase surface reflectance in the Red band and decrease it in the SWIR. Over vegetated surfaces, the common perennial changes are related to the vegetation phenology tracking transitions between winter and summer in the northern latitudes or between wet and dry seasons in sub-tropics. While the "green-up" surface signal is spectrally unique, the senescence or "browning" of the surface presents a particular problem because it is spectrally similar to the effect of thin clouds in the Red-SWIR bands. This discussion shows that the individual pixel tests are prone to errors, and they should be used together with the larger-scale analysis based on groups of pixels (blocks), which would provide statistical mitigation of errors and a robust separation between clouds and surface change.

Aerosol spatial variability
With the block-level analysis, justified above, one can use additional spatial variability techniques to screen outliers caused by clouds and snow. They are based on a relative homogeneity of global aerosols at scales below ∼50 km (e.g., Anderson et al., 2003). For example, the MOD04 Collection 5 algorithm uses a 3 × 3 spatial variance cloud filter (Martins et al., 2002) and discards the darkest 20 % and brightest 50 % pixels in the 10 × 10 km 2 box, helping to screen cloud shadows and clouds/snow, respectively. A similar approach was implemented in MAIAC validation analysis against AERONET (Lyapustin et al., 2011b, c) based on screening of the high 50 % of retrieved AOT data. For this reason, MAIAC validation was not generally affected by the outliers. In addition, averaging the remaining data over a 10-20 km window allowed us to account for the meteorological conditions and the time difference between AERONET measurements and MODIS overpass (Ichoku et al., 2002), as well as to increase the comparison statistics. Emili et al. (2011), however, conducted validation using a single 1 km AOT value closest to the AERONET sunphotometer location which explicitly revealed the high scatter and biases in the unfiltered MAIAC AOT data over mountainous regions.   The spatial variability analysis, based on the 25 km blocks, was introduced in the current version of MAIAC. Specifically, it filters high AOT values when clouds and/or snow are detected by the CM algorithm. The threshold linearly depends on the cloud fraction (CF), decreasing from 60th percentile for CF = 0.05 down to 20th percentile for CF = 0.7. An extensive analysis of MODIS data showed that the dynamic threshold depending on cloud fraction provides much better cloud screening than the static global threshold. If the cloud fraction exceeds 70 %, MAIAC does not perform processing for the given block. In cloud-free conditions with snow detected, the high threshold represents the 25th percentile of AOT data.
The described screening is not applied when the cloud fraction is low to preserve MAIAC capability for high resolution aerosol retrieval near the aerosol sources.

Algorithm implementation
The improved scheme of MAIAC aerosol retrievals is illustrated in Fig. 2. It includes the surface change detection component and enhanced cloud, cloud shadow, and snow masking. The numbered rectangles on the left represent separate routines with the block-level (25 × 25 km 2 ) scope of application. Symbol B on the left indicates that an operation is applied to every cloud-free pixel of the block, whereas RB means "the Remaining pixels of the Block" unaffected by the previous actions. Because the individual pixel tests do not guarantee the correct result, the algorithm is designed to al-low commission errors, for example detecting clouds in clear conditions, with restoration of the correct results in the final stage of processing.
The aerosol algorithm starts with computing aerosol optical thickness and rmse (τ 1 , χ 1 ) ij using the background aerosol model (k = 1) for every clear pixel of the block according to the MAIAC cloud mask (step 1). A copy of these results is saved for a later "Clear Sky Restore" analysis (step 9).
The next retrievals should be repeated with higher CMF values (η) searching for a pair (τ , η) that minimizes the rmse given by Eq. (1). At this stage, the undetected surface change may introduce a systematic error as was mentioned above. For example, during senescence, when the surface is brightening in the Red and SWIR channels, the straightforward approach would overestimate both CMF and AOT and would also result in a high commission error of false cloud detection. To avoid that, a Surface Change Detection algorithm is implemented in step 2. It is applied when AOT is low and the day is clear which for a given block is verified by a high covariance (cov ≥ HIGH) between the measured reflectance (in B1) and the reference clear-sky image maintained by the CM algorithm (Lyapustin et al., 2008).
The Surface Change Detection looks for a simultaneous anti-correlated change of surface reflectance in the Red and NIR (band B2) bands during the 16-day interval. To enable reflectance comparisons measured at different view angles on different days, the surface reflectance is first normalized to the standard view geometry of nadir view and solar zenith angle 45 • using the BRDF model. While the green-up change is spectrally unique and can be accepted for an individual pixel, the "browning" can be easily confused with undetected thin cloud. For this reason, detected "browning" is confirmed only if it is observed for at least a quarter of the block's pixels. Otherwise, the detected "browning" is classified as a random noise and is discarded. The details of this algorithm will be described separately.
If the retrieved AOT is low or rmse < 1 or the surface change has been detected, the algorithm reports AOT for the background aerosol model in step 3, and aerosol processing for a given pixel terminates. Otherwise, cloud test 1 (CT1) is applied (δ 0.67 > 1.5, δ 2.13 > 1.5, and δ 0.412 < 0), and if successful, the pixel is flagged as possibly cloudy (CM PCLOUD). Note that all newly detected CM PCLOUD pixels must be validated in the final "Clear Sky Restore" test in step 9.
Step 4 shows the standard aerosol retrieval loop with higher coarse mode fractions (index k) according to Eq. (1).
The further processing is designed to detect additional clouds, cloud shadows and snow.
Step 5 helps avoid unnecessary processing in clear conditions (cov ≥ HIGH).
In the next Step 6, the aerosol retrievals are repeated with the cloud model for the remaining pixels of the block. The pixel is masked as possibly cloudy if rmse k < 1, or if spectral residuals are negative with the cloud model but were positive with the last aerosol model (χ K 0.67 < 0 , χ K 2.13 < 0 and χ K−1 0.67 > 1, χ K−1 2.13 > 1) which often indicates presence of the sub-pixel clouds.
Step 7 implements the spatial variability analysis discussed in Sect. 2.2 based on low (AOT 0.1 ) and high (AOT High ) AOT thresholds, where the latter depends on the cloud fraction.
The next step 8 performs additional tests for residual snow, shadows and clouds: -If snow has been detected in the block (N snow > 5), then the pixel is flagged as CM CLEAR SNOW if δ 0.67 > 2, δ 2.13 < −2, τ ij > AOT High and cov is high. If covariance is low, indicating presence of clouds, then the pixel is flagged as possibly cloudy (CM PCLOUD).
Earlier we mentioned that uncertainty in the knowledge of surface reflectance and surface change often result in selection of unrealistically high aerosol coarse mode fraction (and high AOT value) or false cloud detection. The last Clear Sky Restore Test 9 is designed to correct these errors and restore the value of cloud mask and AOT/CMF. It is based on the idea that aerosol variability at scale of 25 km is expected to be low in clear conditions, so the pixel AOT should be close to the average value. To this end, the block-average value (AOT av ) is first computed for the pixels with the background aerosol model retrieval. Next, for the other cloud-free pixels we check if the original AOT value retrieved with the background model and saved at stage 1 (τ 1 ij ) is close to the AOT av . Specifically, we restore the CM CLEAR value of the cloud mask for pixels masked as CM PCLOUD, or the background model for pixels with high CMF if τ 1 ij < 1.2 × AOT av .

Performance analysis
The performance of the upgraded algorithm was first assessed in comparison with AERONET data for different sites, including Laegeren, Switzerland, and Ispra, Italy analyzed by Emili et al. (2011). We used three different averaging window sizes, 3, 5 and 10 km, and applied a standard validation approach which filters high MAIAC AOT data above 70th percentile (Lyapustin et al., 2011b) and requires at least 3 valid retrievals in the window. In order to test MAIAC AOT at 1 km resolution, we used the nearest valid pixel within 1 km from the AERONET sunphotometer location.
The results for the Laegeren and Ispra sites, based on MODIS Terra 2000-2008 data, are presented in Fig. 3. The original 1 km data show the most improvement in correlation coefficient, from R ∼ 0.2 to R ∼ 0.84 for Laegeren and from R ∼ 0.67 to R ∼ 0.91 for Ispra, despite the relatively high scatter and offset. The number of comparison points is the lowest at 1 km increasing by a factor 4-8 for 3 and 10 km windows for Laegeren, and 6-14 for Ispra, respectively. The regression parameters improve noticeably when switching to a 3 km window due to filtering residual outliers and data averaging, however further improvement with increasing the window size to 10 km is only incremental. Moreover, the regression coefficient for Ispra slightly drops when the window size becomes larger than 3 km. Analysis of MAIAC retrievals shows a significant heterogeneity of aerosols on hazy days in the area of Ispra created by aerosol transport from highly industrialized and polluted Po valley to the south-east and blocking effect of Alps on the north-west. Figure 4 gives an illustration of significant south-east to north-west AOT gradient across the 50 km area centered at Ispra during 5 days in October 2001. In these circumstances, the smaller 3 km window may give a better representation of the local conditions sampled by the AERONET sunphotometer. A similar observation was reported by Lyapustin et al. (2011c) for Beijing where a better agreement with AERONET was observed with 10 km rather than 20 km window.
One of the key requirements for the developed cloud/snow filter in MAIAC was to keep the aerosol retrievals with high spatial variability unaffected, in particular near sources.   274 (b). The images show MODIS Terra TOA data, MAIAC RGB BRF, cloud mask and AOT 0.47 for 600 km tiles. The cloud mask color coding is described in caption of Fig. 4. (b) The same as with variable cloudiness and high aerosol levels in the middle of the image transported by easterly winds. The aerosol concentration is much lower on the northern and southern parts of the image where the good quality of cloud masking becomes apparent. The second image (day 274) shows a number of active fires and smoke plumes. Despite the high spatial aerosol variability, these retrievals generally are not affected by the developed filter even in the centers of the burning area.

Conclusions
Contamination of aerosol retrievals by unresolved clouds and snow is a serious problem of remote sensing gaining a particular significance at high spatial resolution. Due to a continuous spectrum of cloud/snow presence in measurements in terms of sub-pixel area and reflectance magnitude, there is no ultimate solution to this problem, but it can be sufficiently mitigated to provide an overall high quality aerosol product. Moreover, the definition of AOT outliers is clearly application-dependent: for example, AOT enhancement near cloud edges may be a manifestation of 3-D radiative effects (Wen et al., 2006) or of complex aerosol-cloud interactions in the "twilight" zone (Koren et al., 2007), while being an obvious outlier for the climatology or air quality analysis.
This paper described an improved cloud/snow screening technique in algorithm MAIAC which is implemented as part of the aerosol retrieval based on analysis of spectral residuals and spatial variability. Comparisons with AERONET aerosol measurements and a large-scale MODIS data analysis show strong suppression of aerosol optical depth outliers due to unresolved clouds and snow. At the same time, the developed filter does not reduce the aerosol retrieval capability at high 1 km resolution in strongly inhomogeneous environments, such as near centers of active fires. Despite significant improvement, the optical depth outliers in high spatial resolution data are and will remain a problem which needs to be addressed by the application-dependent specialized screening techniques.