Downscaling MODIS Surface Reflectance to Improve Water Body Extraction

Inland surface water is essential to terrestrial ecosystems and human civilization. Accurate mapping of surface water dynamic is vital for both scientific research and policy-driven applications. MODIS provides twice observation per day, making it perfect for monitoring temporal water dynamic. Although MODIS provides two bands at 250m resolution, accurately deriving water area always depends on observations from the spectral bands with 500m resolution, which limits its discrimination ability over small lakes and rivers. The paper presents an automated method for downscaling the 500mMODIS surface reflectance (SR) to 250m to improve the spatial discrimination of water body extraction.Themethod has been tested at Co Ngoin and Co Bangkog in QinghaiTibet plateau. The downscaled SR and the derived water bodies were compared to SR and water body mapped from Landsat-7 ETM+ images were acquired on the same date. Consistency metrics were calculated to measure their agreement and disagreement. The comparisons indicated that the downscaled MODIS SR showed significant improvement over the original 500m observations when compared with Landsat-7 ETM+ SR, and both commission and omission errors were reduced in the derived 250m water bodies.


Introduction
Inland surface water bodies include fresh and saline lakes, rivers, and reservoirs in the land. Although inland surface water only covers a small portion of global land, it is essential to terrestrial ecosystems and human civilization. For example, the distribution of surface water in space and its dynamic over time would greatly alter the Earth's surface conditions and influence the general circulation of the atmosphere [1,2]. On the other hand, by influencing the hydrological cycle and water phase variability, variations in temperature and precipitation strongly affect lakes and glaciers. They would also impose some feedback on the climate system and affect the environment of human beings [3,4]. Thus, the expansion and shrinkage of lakes are related to many environmental and ecological issues and are important factors that must be considered in human socioeconomic development [5].
Particularly, in the mountain regions, as the alpine hydrologic environment has been minimally disturbed by human activities, such as agricultural settlements and irrigation, the high altitude lakes are sensitive to the changes in air temperature, precipitation, snow-glacier melt, and soil frost degradation [6]. Because of the remoteness and inaccessibility of the lakes in mountain regions, remote sensing provides quick views of land from distance and has been proven to be the most effective way for mapping surface water in terms of spatial distribution and dynamic [7]. Data acquired from many remote sensing sensors, such as Landsat-5 TM (Thematic Mapper), Landsat-7 ETM+ (Enhanced Thematic Mapper Plus) and MODIS (Moderate Resolution Imaging Spectroradiometer), have been used for water mapping [8]. Although remote sensing data with higher spatial resolution (e.g., Landsat-5 TM and Landsat-7 ETM+) are able to delineate smaller water bodies, its long repeating time (16 days) limits it from 2 Advances in Meteorology catching rapid water dynamic due to seasonal or extreme weather events, such as heavy rainfall [9]. The long repeating time also makes it difficult to observe the water body progress especially in cloudy areas due to its vulnerability to cloud contaminations [10]. Quick repeating remote sensing sensors, such as MODIS on board of Terra and Aqua, are able to catch water body changes precisely with their high temporal resolutions, making it particularly suitable for detecting water body dynamic for large areas [11]. The downside of MODIS is the low spatial resolution. MODIS has only two 250 m bands, but water bodies are usually detected at 500 m since they require Green or SWIR bands at 500 m resolution, unless combining other high resolution reference data. Ji et al. found that the Normalized Difference Water Index (NDWI) calculated from Green and SWIR with shorter wavelength region (1.2 to 1.8 um) had the most stable threshold to delineate surface water features [12]. Soti et al. suggested that among several water indices, two middle infrared-based indices (MNDWI-Modified Normalized Difference Water Index and NDWI) were most efficient for locating and monitoring temporary ponds in arid lands [13]. Another example is the production of the MODIS water mask data set (MOD44W) by combining MODIS 250 m bands and 90 m SRTM water body dataset [14]. However, previous studies have suggested that there are possible relationships between the 250 m bands and other 500 m MODIS bands, and such relationships can be used for improving the spatial resolution with MODIS data downscaled to 250 m. For example, Discrete Wavelet Transform (DWT) fusion method by decomposing high resolution images and inverse DWT was used to generate other related high resolution images. However, DWT fusion method had lower visual interpretation [15]. Trishchenko et al. proposed an adaptive regression model for every generic scene type (vegetation, desert/barren land, snow, water and cloud), relating MODIS band3-7 with band1, band2 and Normalized Difference Vegetation Index (NDVI) to obtain band3-7 SR at 250 m. The adaptive regression model was mainly built on MODIS MOD 02-Level-1B calibrated and geo-located ataperture radiances product (MOD02HKM/MOD02QKM) without atmospheric conditions correction [16]. Therefore, to build an adaptive regression model, atmospheric correction aimed to adjust for atmospheric scattering and absorption effects and a generic land cover classification were needed, which required complicated processing steps and much time [17]. This paper presents a downscaling method based on the modified adaptive regression model towards extracting water bodies from MODIS SR at 250 m resolution. The method was applied to the MODIS image acquired in Co Ngoin and Co Bangkog in the Tibetan Plateau. Finally, we make comparison between the extracted water area at original 500 m, downscaled 250 m and water mask visually interpreted from Landsat-7 ETM+ data acquired on the same date.

Study Area and Data Sources
2.1. Study Area. The Qinghai-Tibet plateau (QTP) is located in central Asia, with an average elevation over 4,000 m and is called "the Third Pole" of the Earth and "Asian water tower" because of the thousands of lakes formed by water melted from the large glaciers in QTP [18]. Among all the lakes in QTP, Co Ngoin and Co Bangkog located in the center of the plateau is selected as the study area to represent the lakes in QTP, primarily because their complex shapes and shorelines are perfect for investigating the improvement after incasing the spatial discrimination ( Figure 1). Both Co Ngoin (31 ∘ 24 -31 ∘ 32 N and 91 ∘ 28 -91 ∘ 33 E) and Co Bangkog (31 ∘ 40 -31 ∘ 47 N and 89 ∘ 26 -89 ∘ 39 E) are located in the basin of Selin. The area of Co Ngoin is 61.3 km 2 and supplied by rivers in its southwest. Average annual temperature in the lake catchment is −2.0∼0.0 ∘ C, and the annual evaporation is around 1,000 mm. Annual precipitation fluctuates between 300 and 600 mm,. and more than 80% annual precipitation concentrates between May to September and falls mainly as hailstorms [19]. Co Bangkog is about 90 km 2 in area and the average annual temperature in the lake catchment is −2.0∼ −1.0 ∘ C. The annual precipitation is 308.33 mm and the annual evaporation is around 2238.6 mm. More thunderstorms and hail concentrate in July and August [20]. In rainy seasons, regional continuous precipitation could result in the expansion and shrinkage of lake area over several days in QTP. For example, there was twice continuous precipitation in the period of both from August 31 to September 5th and from September 16 to 17 for Zhuonai lake, which led to the dramatic extension of lake area. After flooding, the area of Zhuonai lake declined steeply [21]. MODIS provides twice observation per day, making it particularly suitable for detecting water body rapid dynamic.

Data Sources.
The MODIS sensor on the Terra and Aqua satellites has viewing swath width of 2,330 km and a revisit period of one day with 36 spectral bands ranging in wavelength from 0.4 m to 14.4 m. The sensor takes observations at R and NIR bands at a nominal resolution of 250 m at nadir, five bands at 500 m, and the remaining 29 bands at 1 km (Table 1) [22]. MOD09GA provides MODIS 1-7 daily SR at 500 m resolution and 1 km observation and geo-location statistics. MOD09GQ provides MODIS 1-2 daily SR at 250 m resolution in conjunction with the MOD09GA where important quality and viewing geometry information is stored [23]. The two products are distributed in the Sinusoidal (SIN) projection and HDF-EOS format. The global coverage of the datasets is divided into 648 tiles. The data were downloaded from http://modis.gsfc.nasa.gov/.
The Landsat-7 ETM+ image acquired on the same dates as MODIS data was collected for validating the results because: (1) The Terra MODIS and Landsat-7 are in the same orbit, with MODIS Terra nadir observations occurring approximately 15 min after Landsat-7 observations while the orbit of Landsat-5 is greatly different from MODIS Terra which may lead to disagreements on SR due to Bidirectional Reflectance Distribution Function (BRDF) effect [24]; (2) The MODIS bands have been proven to be comparable and strongly agree with the 6 Landsat-7 ETM+ multispectral bands [25] ( Table 1). We only use the Landsat-7 ETM+ SR acquired before 2003 to avoid the scan-line corrector (SLC) off issue.  Figure 1: Locations of Co Ngoin and Co Bangkog (RGB543 TM image).  500 m resolution, which is then applied to MODIS 250 m 1 and 2 to predict corresponding 3-7 SR (Equation (1)). The logic and sequence of operations for this processing scheme ( Figure 2) as follows: where NDVI = ( 2 − 1 )/( 2 + 1 ) and is the observed reflectance for band .
Step 1 (data blocking). A few negative SR values resulted from atmospheric correction algorithm [23] were firstly filled with 1. To minimize the impact of vegetation diversity across different climatic and ecological zones, latitudinal variation of surface properties, and sun-view geometry effects on the SR prediction, each MOD09GA tile was equally divided into blocks (4 × 4) where pixels within the NDVI range of 0.1 as homogeneous land cover type were in the same regression units. Similar processes were applied to corresponding MOD09GQ tile as well.
Step 2 (training samples collection). To build the model precisely, three sampling rules were applied: (1) pixels with cloud contaminations and detector problem, identified by the 1 km QA layer in MOD09GA [23], were filtered; (2) heterogeneous pixels were excluded. Specifically, a window of 3 × 3 MODIS pixels was used to calculate the value range of each centered pixel as the difference between the maximum and the minimum values of the 9 MODIS pixels. A MODIS pixel was homogeneous if the value range was less than a predefined threshold value tested by Feng et al. [28]; (3) for the regression unit without adequate homogeneous samples, these samples were then pooled with other homogeneous samples within corresponding block to form an ensemble training sample.
Step 3 (regression coefficients calculation). In each regression unit, taking 1 , 2 , and NDVI as independent variables and 3 to 7 as dependent variables separately at 500 m resolution, regression coefficients were derived using the nonlinear least-squares method of Press et al. [29]. Subsequently, according to the NDVI, each pixel at 250 m resolution was matched to specific regression coefficients within the parallel blocks. Pixels with NDVI values out of the range of NDVI at 500 m resolution were assigned to coefficients related to nearest maximum or minimum NDVI at 500 m. The matched regression coefficients finally were applied to MODIS 250 m 1 and 2 SR to generate the surrogate intermediate 3 to 7 SR at 250 m.
Step 4 (spectral normalization). A simple 2 × 2 active window for preserving the radiometric consistency was utilized to average SR values of four pixels at 250 m to the corresponding pixel SR at the original 500 m resolution.

Water Extraction Based on Multiple Water Indexes.
Water body extraction is mainly based on the visible, near-infrared and shortwave-infrared band reflectance difference in water from other land cover types (e.g., vegetation, soil, and resident region) [30]. For example, NDWI proposed by Mcfeeters took advantage of the obvious difference of reflectance in the visible green and near-infrared wavelengths (Equation (2)) [31]. However, Xu discovered that the reflectance of water in the shortwave-infrared band is generally a little lower to augment the difference with the visible green in comparison with near-infrared band, hence MNDWI was proposed to detect water body (Equation (3)) [32]. Consider where CH2, CH4, and CH6 are near-infrared, visible green, shortwave-infrared band reflectance in MODIS, respectively. To our knowledge, single NDWI or MNDWI is difficult to distinguish water from other features. For example, shallow or low quality water bodies showing weak water character in the multispectral bands, are difficult to be identified. Hill shade and shadow may also show similar spectral profile and cause errors in water extraction. Therefore, water body extraction model built in this paper integrates NDWI and MNDWI with spectral relation between near-infrared, shortwave-infrared band [33]. More specifically, typical water bodies were identified using a stable threshold value of NDWI (0.1). Since MNDWI is more sensitive to water bodies and ice and snow, it was used to extract ice and snow of the QTP region [34]. Finally, unlike other features (such as vegetation and soil), water in the short-wave infrared reflectivity is lower than the near infrared wave band, thus spectral relation between short-wave infrared and near infrared band can distinguish the difference between water and land mixed pixels [35]. Figure 3 presents the steps of the water detection process.

Implementation and
Sharing of the Methods. The method was implemented using C and Python programming languages. Open-source libraries, that is, GDAL/OGR (http:// www.gdal.org/), PROJ4 (http://trac.osgeo.org/proj/), NumPy (http://www.numpy.org/), Matplotlib (http://matplotlib.org/), were integrated into the implementation to avoid using commercial software and also to enable application on multiple platforms, for example, Linux, Windows. The method implementation has also been standardized and deployed on a model sharing platform to promote applications of the method in the scientific research communities. The shared model can be called and interacted through predefined protocols, such as the Open Geospatial Consortium (OGC) Web Processing Service (WPS) standard. Detail of the model sharing can be found in [36].

Validation
To evaluate the performance of MODIS downscaling method, it was applied to MODIS SR acquired in Co Ngoin and Co Bangkog, and the results were compared to simultaneously acquired Landsat-7 ETM+ SR. The Landsat-7 ETM+ SR was aggregated to match the MODIS spatial resolution for comparisons. Water bodies extracted from the original and downscaled MODIS SR data were compared to water body visually interpreted from Landsat-7 ETM+ images. Confusion matrix was calculated to measure their agreement.

MODIS-Landsat-7 ETM+ SR Comparison.
The original and downscaled images were compared to coincident Landsat-7 ETM+ SR, respectively. When comparing the results derived from MODIS to Landsat, the Landsat pixels were aggregated to MODIS projection and pixel footprint for comparison to minimize the error caused by reprojection. The Landsat data is distributed at 30-m resolution in Universal Transverse Mercator (UTM) projection. The area of a 250-m MODIS pixel is roughly 64 times of a Landsat pixel. In order to match the coverage of each MODIS pixel more accurately, the four corners of each MODIS pixel are used to draw a polygon, which represents the footprint of the MODIS pixel. The polygon was re-projected to Landsat UTM projection, and only Landsat pixels located within the polygon were used for calculate an average reflectance value for the corresponding MODIS pixel, which was then compared to the SR value of the MODIS. Additionally, MODIS has a wider view than Landsat-7. To avoid disagreement caused by view zenith angles differences, cloud and shadow, only the cloudfree pixels with view zenith angles within the view zenith angle range of the Landsat-7 (i.e., ±7.5 ∘ from nadir) were used in the comparison. Consistency metrics were calculated to measure the agreement and disagreement between the original and downscaled MODIS SR and aggregated Landsat-7 ETM+ SR. These evaluation metrics were quantified by Mean Absolute Error (MAE), Root-Mean-Squared Difference (RMSD) recommended by Willmott [37]. Consider where and are the measured SR values for MODIS and aggregated Landsat-7 ETM+, is the count of joint observations in the sample. In order to reduce uncertainties of MODIS and Landsat-7 ETM+ SR values, the linear relationship ( = + ) between and was modeled by Reduced Major Axis (RMA) regression to calculate the parameters (intercept), (slope), and 2 (correlation coefficient). Consider Advances in Meteorology where , , , and are mean and sample variance of and , respectively, and is covariance of and . Consider wherêis predicted MODIS SR value from a Landsat-7 ETM+-derived SR value based on the modeled correlation ( = + ). Systematic error (MSE ) represents the difference between the trend of MODIS-derived and Landsat-7 ETM+ SR while unsystematic error (MSE ) accounts for the variation surrounding the trend. Therefore, MSE and MSE are aggregated to RMSD. Consider To maintain consistent units, the square roots of MSE and MSE are reported, that is, RMSD and RMSD , in units of percent reflectance.

Accuracy Assessment of Extracted Water Bodies.
The water bodies extracted from both the original and downscaled MODIS SR were compared to the water body interpreted from Landsat-7 ETM+ image to assess the improvement of the downscaled method in terms of water classification. To achieve better pixel accuracy, water map acquired from coincident Landsat-7 ETM+ images was first aggregated to MODIS spatial resolution based on the principle of dominant type by calculating the number of Landsat-7 water pixels within the extent of each sampled MODIS pixel. Detected water bodies were separately overlaid with aggregated Landsat-7 water extent to verify the content of the pixels on the MODIS image. Accuracy measures were based on confusion matrix [7], overall classification accuracy ( ), percentage of omission, and commission error ( and ). Consider where is the accuracy of water pixels identified correctly and and are commission and omission error, respectively. is the number of water pixels in the reference ETM+ water map, is the number of water pixels detected in MODIS images, is the number of land pixels misclassified as water in MODIS images, and is the number of water pixels misclassified as land in MODIS images. Figure 4 presents false-color compositions (Band SWIR1, NIR and R) of downscaled 250 m MODIS SR data (Figure 4(a)), the original 500 m MODIS SR data (Figure 4(b)), and simultaneously acquired aggregated 250 m Landsat-7 ETM+ SR data (Figure 4(c)) in Co Nagoin. Figure 5 presents false-color compositions (Band SWIR1, NIR and R) of downscaled 250 m MODIS SR data ( Figure 5(a)), the original 500 m MODIS SR data (Figure 5(b)), and simultaneously acquired aggregated 250 m Landsat-7 ETM+ SR data ( Figure 5(c)) in Co Bangkog. Significant improvement of spatial delineation can be observed in the downscaled data than the original data. For example, in the two study areas, the lakes and islands (Figures  4(a) and 5(a)) were showing closer areas and shapes to the higher resolution Landsat-7 ETM+ reference data (Figures  4(c) and 5(c)). The white noise pixels in band 6 of the original MODIS data Figure 4(b) were fixed in the downscaled result (Figure 4(a)).

Improvement of Downscaled MODIS SR.
Although the downscaled data showed more detail, few uneven SR values can be observed, especially at the transition boundary of different land cover type, such as the red pixels sat at the edge of the lake and islands in the Figures 4(a) and 5(a). Further investigation revealed that the disagreements were caused by spectral normalization of the downscaled method. In this study, spectral normalization using equally weighted average method possibly led to the excessive enhancement on mixed pixels with high SR values. Theoretically, the pixel SR at 500 m can be derived from aggregating the 250 m pixels within the particular extent centered at the corresponding pixel at 500 m [38]. For mixed pixels of 500 × 500 m cell size at the edge of the lakes, SR values of the corresponding four pixels at 250 m change greatly, especially in the two MODIS SWIR bands. As a result, the equally weighted average method based on a 2 × 2 active window generally causes excessive enhancement in the pixels of higher SR and widens the distinction among neighboring pixels. This variable condition can also be confirmed in Figure 6(a) where part of pixels, especially for the two MODIS SWIR bands, tend to have higher SR values than Landsat-7 ETM+. Nevertheless, excessive enhancement has inherent advantage of identifying mixed water pixels at the edge of lakes on the ground that excessive enhancement can increase the difference between water and other land types to better detect water pixels.
The performance of the downscaling method was further assessed by comparing the MODIS and Landsat-7 ETM+ SR in density plots shown in Figure 6. Following the conclusion of the comparability of Landsat and MODIS bands proposed by Masek et al. [39], each MODIS band was compared to its Landsat-7 ETM+ solar-reflective spectral band with closer spectral range (Table 1). Figure 6(a) presents the SR comparison between the MODIS downscaled SR and aggregated Landsat-7 ETM+ SR, and Figure 6 percentage points. Correlations at the local scale are strongly linear, with 2 higher than 0.9 for all bands except blue band. Intercepts vary between −1.5 and 0.9, and slopes are close to 1, although outliers-boundary pixels SR of excessive enhancement for two MODIS SWIR bands (Table 2). Correspondences between 500 m MODIS SR and the coincident Landsat-7 ETM+ SR are similar to those between 250 m MODIS SR and Landsat-7 ETM+ SR, but the spread of the data are wider (Figures 6(b)), and deviations (RMSD) range from 1.9 to 4.6 percentage points (Table 2). When comparing results at 500 m spatial resolution, 2 is typically higher and other evaluation indicators are lower (i.e., MAE, RMSD, RMSD , RMSD ), indicating that SR results at 250 m are superior to corresponding SR at 500 m. On the other hand, there are still some discrepancies in comparison to Landsat-7 ETM+ SR, particularly for two MODIS SWIR bands. Table 2 shows that two SWIR bands have higher RMSD than the other 5 bands. Wavelength differences between Landsat-7 and MODIS bands (Table 1) are likely source of the SR differences. The much narrower bandwidth of the two MODIS SWIR bands make it possible to avoid the spectral ranges with lower atmospheric transmittance that are covered by the Landsat-7 ETM+ SWIR bands [26]. Figure 7 presents the comparison of MODIS water masks and spatial error distributions in comparison to Landsat-7 ETM+ data in Co Nogin and Figure 8 is for Co Bangkog. Although the same water classification method was applied, the water map produced from the downscaled data (Figures 7(c) and 8(c)) shows more details than the map from the original MODIS data (Figures  7(d) and 8(d)). For instance, the edges of the identified lakes are smoother. Shapes of the small islands in Co Ngoin and Co Bangkog are also improved. This richer information generally matches the features that are depicted in the water body reference map well (Figures 7(b) and 8(b)).   Apart from visual interpretation, the spatial errors were calculated and presented in Table 3. For Co Nagoin, MODIS results (5166 water pixels at 250 m and 5096 at 500 m resp.) underestimate the extent of water body compared to that of Landsat-7 ETM+ image with 5228 water pixels. Although omission ( ) and commission error ( ) exist in both water bodies from the original and downscaled MODIS data, both and at 250 m (4.32% and 3.14%) are almost cut in half in comparison to the original 500 m resolution data (8.30% and 5.78%), and overall accuracy ( ) increases from 91.70% to 95.68%. For Co Bangkog, is reduced slightly from 9.13% to 8.68%.

Improvement of Water Body Map.
significantly reduces from 12.02% at 500 m to 3.45% at 250 m. Overall accuracy ( ) increases from 90.87% to 91.31%.

Conclusion
In order to improve the spatial delineation of water mask detected from MODIS data, this paper presents an automated method for downscaling the original 500 m bands to 250 m for water detection. The method was applied to Co Ngoin and Co Bangkog, to extract water bodies at a higher spatial resolution. Comparisons to SR and water body extracted from simultaneously acquired Landsat-7 ETM+ image suggest the improvement over the original data in terms of downscaled SR and water classification. The edges of the lakes become smoother than the results from original MODIS image and smaller islands are delineated with clearer shape in the downscaled 250 m water body maps, as well as a higher overall accuracy and lower omission and commission error of water bodies mapping for Co Ngoin and Co Bangkog at 250 m. The paper presented downscaled 250 m resolution SR and detected water bodies for two lakes in QTP. With two MODIS images, the results only demonstrated static representing of water extent of the two lakes on the image acquisition dates, but the success of the method enables detecting possibly daily water dynamics using MODIS Terra observations. The automated method is also expected to be applied to other lakes in QTP to meet the need for measuring water distribution and dynamic in the plateau and its surrounding areas. Further effort is needed to assess the robustness of the method for nonwater applications and water bodies in other areas beyond QTP.