Generating High Spatio-Temporal Resolution Fractional Vegetation Cover by Fusing GF-1 WFV and MODIS Data

As an important indicator to characterize the surface vegetation, fractional vegetation cover (FVC) with high spatio-temporal resolution is essential for earth surface process simulation. However, due to technical limitations and the influence of weather, it is difficult to generate temporally continuous FVC with high spatio-temporal resolution based on a single remote-sensing data source. Therefore, the objective of this study is to explore the feasibility of generating high spatio-temporal resolution FVC based on the fusion of GaoFen-1 Wide Field View (GF-1 WFV) data and Moderate-resolution Imaging Spectroradiometer (MODIS) data. Two fusion strategies were employed to identify a suitable fusion method: (i) fusing reflectance data from GF-1 WFV and MODIS firstly and then estimating FVC from the reflectance fusion result (strategy FC, Fusion_then_FVC). (ii) fusing the FVC estimated from GF-1 WFV and MODIS reflectance data directly (strategy CF, FVC_then_Fusion). The FVC generated using strategies FC and CF were evaluated based on FVC estimated from the real GF-1 WFV data and the field survey FVC, respectively. The results indicated that strategy CF achieved higher accuracies with less computational cost than those of strategy FC both in the comparisons with FVC estimated from the real GF-1 WFV (CF:R2 = 0.9580, RMSE = 0.0576; FC: R2 = 0.9345, RMSE = 0.0719) and the field survey FVC data (CF: R2 = 0.8138, RMSE = 0.0985; FC: R2 = 0.7173, RMSE = 0.1214). Strategy CF preserved spatial details more accurately than strategy FC and had a lower probability of generating abnormal values. It could be concluded that fusing GF-1 WFV and MODIS data for generating high spatiotemporal resolution FVC with good quality was feasible, and strategy CF was more suitable for generating FVC given its advantages in estimation accuracy and computational efficiency.


Introduction
Vegetation is a major component of terrestrial ecosystems, playing a key role in the exchanges of carbon, water and energy [1]. Fractional vegetation cover (FVC), which is commonly defined as temporally continuous vegetation indices are generated. The vegetation phenological characteristics can be extracted from the constructed time-series vegetation indices, and then phenological change analysis and vegetation type identification can be conducted. Walker et al. [25] fused MODIS and Landsat data to generate time-series NDVI and EVI with 30-m spatial resolution for dryland vegetation monitoring in Arizona. Meng et al. [26] fused the NDVI from HJ-1 CCD and MODIS for crop biomass estimation. Jia et al. [27] generated time-series NDVI with 30-m spatial resolution and 8-day intervals for forest cover change analysis. From the current studies, the following findings are observed: (i) the spatio-temporal fusion methods are mostly used for the combination of Landsat and MODIS data due to their similar spectral band configuration designs in visible and near-infrared spectral regions; (ii) although spatio-temporal fusion methods have been widely used in vegetation phenology, vegetation indices reconstruction and vegetation type identification, these methods are rarely used in vegetation parameter fusion and have not been used for FVC estimation; (iii) band reflectances are the inputs for the fusion algorithm in most studies, whereas vegetation parameters have seldom been selected as the object to fuse.
The GaoFen-1 Wide Field View (GF-1 WFV) sensors launched by China in recent years can acquire 16-m spatial resolution multi-spectral data, which are of high value for estimating finer resolution FVC. However, GF-1 WFV data are easily influenced by clouds, making it difficult to produce time-series FVC data. The MODIS data are extensively applied to dynamic monitoring and simulation of land surface processes due to its ability for high-frequency observations. Therefore, time-series FVC with high spatio-temporal resolution can potentially be generated by integrating the high spatial resolution from GF-1 WFV and the high temporal resolution from MODIS data based on spatio-temporal fusion methods. The objective of this study is to evaluate the feasibility of generating high spatio-temporal FVC data using GF-1 WFV and MODIS data. Generally, there are two fusion strategies. One method first fuses reflectance data from GF-1 WFV and MODIS and then estimates FVC from the fused reflectance data (strategy FC, Fusion_then_FVC), and the other method directly fuses FVC data generated from GF-1 WFV and MODIS reflectances (strategy CF, FVC_then_Fusion). Which strategy is more suitable is also the key issue to be discussed in this study.

Study Area and Field Survey Data
The study area is located in Hengshui City (115 • 10 E~116 • 34 E, 37 • 03 N~38 • 23 N) of Hebei Province, China (Figure 1a). The landform is plain, with an average elevation of 23 m. The average annual temperature and precipitation are approximately 13.33 • C and 522.5 mm, respectively. Cultivated land and residential land are the main land cover types in this area.
The sample sites were distributed over the cropland in 11 counties of Hengshui ( Figure 1b). The field survey data were collected for winter wheat and maize in four key crop growth periods (Table 1). Each collection period lasted for 2-3 days. There were two 100 m × 100 m sample sites in each county. For each sample site, five sample points with size of 30 m × 30 m were selected, including four on the corners and one at the center. Therefore, there were 110 ground truth points (GTPs) during each observation period. At each sample point, five digital images were acquired using a digital camera from the nadir, and the average of FVC extracted from these five digital images was considered as the field survey FVC at this sample point.

Study Area and Field Survey Data
The study area is located in Hengshui City (115°10′E~116°34′E, 37°03′N~38°23′N) of Hebei Province, China (Figure 1a). The landform is plain, with an average elevation of 23 m. The average annual temperature and precipitation are approximately 13.33 °C and 522.5 mm, respectively. Cultivated land and residential land are the main land cover types in this area. The FVC of a digital image was extracted by calculating the proportion of the vegetation pixel number in the total image. To eliminate the systematic bias caused by the distortion and the perspective effect of the image edge, 25% of the image edge was removed. The FVC extraction from digital image was based on an automatic shadow-resistant LABFVC (SHAR-LABFVC) algorithm, which introduced Hue Saturation Intensity (HSI) to enhance the brightness of the shadow, making sure that vegetation pixels were accurately identified over heavy shaded areas [28]. The extraction example of this algorithm used for winter wheat image ( Figure 2) revealed that FVC could be effectively extracted. The sample sites were distributed over the cropland in 11 counties of Hengshui ( Figure 1b). The field survey data were collected for winter wheat and maize in four key crop growth periods (Table  1). Each collection period lasted for 2-3 days. There were two 100 m×100 m sample sites in each county. For each sample site, five sample points with size of 30 m × 30 m were selected, including four on the corners and one at the center. Therefore, there were 110 ground truth points (GTPs) during each observation period. At each sample point, five digital images were acquired using a digital camera from the nadir, and the average of FVC extracted from these five digital images was considered as the field survey FVC at this sample point. The FVC of a digital image was extracted by calculating the proportion of the vegetation pixel number in the total image. To eliminate the systematic bias caused by the distortion and the perspective effect of the image edge, 25% of the image edge was removed. The FVC extraction from digital image was based on an automatic shadow-resistant LABFVC (SHAR-LABFVC) algorithm, which introduced Hue Saturation Intensity (HSI) to enhance the brightness of the shadow, making sure that vegetation pixels were accurately identified over heavy shaded areas [28]. The extraction example of this algorithm used for winter wheat image ( Figure 2) revealed that FVC could be effectively extracted.

GF-1 WFV Data
The GF-1 satellite is equipped with two Panchromatic/Multi-spectral (PMS) cameras and four WFV cameras. The GF-1 WFV sensors can provide multi-spectral data with a spectral range from 450 to 890 nm, including three visible bands (blue, green, red) and one near-infrared band. Although the

GF-1 WFV Data
The GF-1 satellite is equipped with two Panchromatic/Multi-spectral (PMS) cameras and four WFV cameras. The GF-1 WFV sensors can provide multi-spectral data with a spectral range from Remote Sens. 2019, 11, 2324 5 of 21 450 to 890 nm, including three visible bands (blue, green, red) and one near-infrared band. Although the revisit cycle of GF-1 WFV could be set as 4 days by combining the four WFV sensors, the GF-1 WFV data provided by China Centre for Resources Satellite Data and Application (CRESDA) (http://www.cresda.com/CN/) could not be used to construct time-series data with 4-day intervals due to the occasional absence of data for half a month, and the unfavorable weather conditions reduced the number of high-quality data providing continuous and effective observations for the same region. Therefore, it is necessary to generate synthetic GF-1 WFV data to fill the temporal gaps using temporal information from other remote sensing data with high-frequency observations, such as cloud-cleared MODIS data. GF-1 WFV data used in this study are listed in Table 2. The pre-processing of GF-1 WFV data included radiance calibration, atmospheric correction, and geometric correction. The DN value of the original data was converted to radiance using the following equation: where L e is the radiance, and Gain and Offset are the calibration coefficients obtained from CRESDA (http://218.247.138.119/CN/Downloads/dbcs/11307.shtml). According to the spectral response functions of GF-1 WFV sensors provided by CRESDA, the atmospheric correction was conducted using the FLAASH method [29] based on the ENVI platform (Version 4.8). For the geometric correction, the orthographical correction was first performed using the RPC data of GF-1 WFV. Then, the precision geometric correction was performed. Because the geometric positioning of Landsat 8 data was in good consistency with the GPS values of field survey data, high-quality Landsat 8 data were selected as the reference data for ground control points selection.

MOD09A1 Reflectance and GLASS FVC Data
MODIS reflectance product (MOD09A1) and Global LAnd Surface Satellite (GLASS) FVC product were selected as coarse resolution data for reflectance fusion and FVC fusion, respectively. The Visible Infrared Imaging Radiometer Suite (VIIRS) on-board the Suomi National Polar-Orbiting Partnership (S-NPP) satellite, a successor for MODIS, is also a valuable coarse resolution data source. However, a mature time-series FVC product of VIIRS data has not been generated to date. Thus, MODIS and its derived FVC product (GLASS FVC) are currently more suitable for this study. MOD09A1 is an 8-day composited surface reflectance product with 500-m spatial resolution. The green (band 4), red (band1) and near-infrared bands (band 2) of MOD09A1 were used for reflectance fusion. The MOD09A1 data were reprocessed using the temporal-spatial filtering method developed by Tang et al. [30] to obtain cloud-cleared MODIS reflectance data. The GLASS FVC product was derived from the reprocessed MOD09A1 reflectance data based on the Multivariate Adaptive Regression Splines (MARS) model, which was trained using the globally distributed training samples extracted from Landsat TM and ETM+ data [31,32]. The validation results showed that GLASS FVC had good accuracy and spatio-temporal continuities [33]. The tiles and acquisition dates of MOD09A1 and GLASS FVC data selected are presented in Table 3. The pre-processing of MOD09A1 reflectance and GLASS FVC data were identical. First, the projection was converted from the Sinusoidal to the Universal Transverse Mercator (UTM) using the MODIS Reprojection Tool (MRT), in which the green, red and near-infrared bands of MOD09A1 were extracted simultaneously. Then, the re-projected data were clipped to match the range of the study area and resampled to 16 m spatial resolution using the cubic convolution method to meet the requirements of the fusion algorithm.

Overall Workflow
The flow chart of this study is shown in Figure 3. Strategies FC and CF were conducted, and the predicted results of these two strategies were evaluated and compared based on the field survey FVC and the FVC estimated from the actual GF-1 WFV reflectance data.  The pre-processing of MOD09A1 reflectance and GLASS FVC data were identical. First, the projection was converted from the Sinusoidal to the Universal Transverse Mercator (UTM) using the MODIS Reprojection Tool (MRT), in which the green, red and near-infrared bands of MOD09A1 were extracted simultaneously. Then, the re-projected data were clipped to match the range of the study area and resampled to 16 m spatial resolution using the cubic convolution method to meet the requirements of the fusion algorithm.

Overall Workflow
The flow chart of this study is shown in Figure 3. Strategies FC and CF were conducted, and the predicted results of these two strategies were evaluated and compared based on the field survey FVC and the FVC estimated from the actual GF-1 WFV reflectance data.

FVC Estimation Model for GF-1 WFV Data
Due to the clear physical significance, the physical method was adopted to construct the FVC estimation model for GF-1 WFV data. The complex inversion process was simplified by machine learning method in this study. Random Forest Regression (RFR) is insensitive to noise or overfitting, and less parameters need to be optimized in this method compared with other machine learning

FVC Estimation Model for GF-1 WFV Data
Due to the clear physical significance, the physical method was adopted to construct the FVC estimation model for GF-1 WFV data. The complex inversion process was simplified by machine learning method in this study. Random Forest Regression (RFR) is insensitive to noise or overfitting, and less parameters need to be optimized in this method compared with other machine learning algorithms, such as Neural Networks (NNs) and Support Vector Regression (SVR). RFR also shows satisfactory accuracy in the FVC estimation using remote sensing data [34,35]. Therefore, RFR is a suitable choice for building the FVC estimation model for GF-1 WFV data.
The construction of the FVC estimation model for GF-1 WFV involved generating learning a dataset and training the RFR. First, the widely used PROSAIL model was utilized to simulate a learning dataset containing the canopy reflectance and the corresponding FVC under various vegetation conditions. Then, the RFR was trained using simulated data to model the relationship between band reflectance and FVC and was finally used to estimate FVC from GF-1 WFV data.

Generating Simulated Datasets Based on the PROSAIL Model
The PROSAIL model, a combination of the leaf optical property model PROSPECT [36] and the canopy reflectance model SAIL [37], has been widely used in the retrieval of vegetation biophysical parameters given its simplicity, general robustness and satisfactory performance in validation [2,13,38,39]. In this study, the version of PROSPECT-D model was selected. The hemispheric reflectance and transmission of leaves over wavelengths from 400 nm to 2500 nm can be simulated based on the PROSPECT model by providing several physiological and biochemical parameters [36]. The SAIL model with hot-spot correction [40] is a canopy Bi-directional Reflectance Distribution Function (BRDF) model to simulate the vegetation canopy reflectance under different observation conditions [37]. Since FVC is the canopy structural parameter to retrieve, the classical gap fraction described by LAI and ALA was used to convert the simulated LAI to FVC [2]. To avoid the redundant calculation and better represent land surface conditions, reasonable ranges or fixed values were given to the input parameters for the PROSAIL model based on the previous studies [35,[41][42][43] (Table 4).
Gauss 60 30 SZA Soil reflectance is also an important parameter for the PROSAIL model. The soil reflectance was derived from the global soil spectral library provided by the International Soil Reference and Information Center (http://www.isric.org) in this study. Because the original soil spectral data were enormous and contained redundant information, the Spectral Angle Mapper (SAM) method [44] was used to select representative soil spectral from the original data. Assuming two spectral vectors with n Remote Sens. 2019, 11, 2324 8 of 21 bands, X = (x 1 , x 2 , · · · , x n ) and Y = (y 1 , y 2 , · · · , y n ), the spectral angle between these two vectors can be expressed as follows: where α XY represents the spectral angle between soil spectral vectors X and Y, ranging from 0 to π/2. The greater difference between the two vectors, the lager α XY is. In this study, when α XY is less than 0.05, these two soil spectral vectors are considered to be similar. Finally, 20 soil reflectances ( Figure 4) were extracted from the original soil spectral data to characterize the possible range of soil spectral reflectances.
x y cos α x y (2) where α XY represents the spectral angle between soil spectral vectors X and Y, ranging from 0 to π 2 ⁄ . The greater difference between the two vectors, the lager α XY is. In this study, when α XY is less than 0.05, these two soil spectral vectors are considered to be similar. Finally, 20 soil reflectances ( Figure 4) were extracted from the original soil spectral data to characterize the possible range of soil spectral reflectances. The reflectance of the vegetation canopy at each wavelength under different parameter combinations was calculated using PROSAIL. Then, according to the spectral response functions of GF-1 WFV sensors, resampling was conducted to simulate GF-1 WFV observations at green, red and near-infrared bands. The simulation finally generated a lookup table containing 180,000 cases of reflectances and the corresponding FVC for each WFV sensor. In total, 80% of the cases were selected as the training samples of RFR model, and the remaining 20% were selected as validation samples.

Random Forest Regression
The RFR algorithm [45] is an ensemble learning algorithm composed by multiple regression trees. The relationship between band reflectances and FVC can be modelled by a set of decision rules using RFR [46]. Each regression tree generates a predicted value, and the final predicted value of RFR is the average response value of all regression trees. The randomness of random forest is mainly reflected in the following two aspects: (i) Each tree is grown on the independent bootstrap sample sets extracted from the training samples randomly. (ii) A random subset of the explanatory variables is selected at each split node when constructing the splitting rules. Two parameters need to be optimized in RFR: the number of trees in random forest (n tree ) and the number of band reflectances The reflectance of the vegetation canopy at each wavelength under different parameter combinations was calculated using PROSAIL. Then, according to the spectral response functions of GF-1 WFV sensors, resampling was conducted to simulate GF-1 WFV observations at green, red and near-infrared bands. The simulation finally generated a lookup table containing 180,000 cases of reflectances and the corresponding FVC for each WFV sensor. In total, 80% of the cases were selected as the training samples of RFR model, and the remaining 20% were selected as validation samples.

Random Forest Regression
The RFR algorithm [45] is an ensemble learning algorithm composed by multiple regression trees. The relationship between band reflectances and FVC can be modelled by a set of decision rules using RFR [46]. Each regression tree generates a predicted value, and the final predicted value of RFR is the average response value of all regression trees. The randomness of random forest is mainly reflected in the following two aspects: (i) Each tree is grown on the independent bootstrap sample sets extracted from the training samples randomly. (ii) A random subset of the explanatory variables is selected at each split node when constructing the splitting rules. Two parameters need to be optimized in RFR: the number of trees in random forest (n tree ) and the number of band reflectances randomly extracted at each node (m try ). RFR is not very sensitive to their values [47]. In this study, n tree was set as 500, and m try was set as 1/3 of the number of bands, which were the default values.

Spatio-Temporal Fusion Method for GF-1 WFV and MODIS Data
The ESTARFM proposed by Zhu et al. [24] was utilized to fuse GF-1 WFV and MODIS data. Compared with the STARFM, spatial heterogeneity was considered in this algorithm. Instead of simply assuming that the fine pixel changes are equal to the coarse pixel changes, a conversion coefficient denoting the ratio of changes between fine and coarse pixels was introduced. This modification improved the prediction accuracy in the heterogeneous region, and more spatial details could be captured. Based on this coefficient, the prediction of a GF-1 WFV pixel can be determined using Equation (3): where F and C are the GF-1 WFV pixel value and MODIS pixel value, respectively; (X,Y) is the location of the target pixel; T p is the prediction date; T 0 is the acquisition date of the base data; and V(X,Y) is the conversion coefficient of the target pixel.
The key step of ESTARFM is the calculation of V. To obtain the value of V, two pairs of MODIS data and GF-1 WFV data before and after the prediction time (T m and T n ) are required. V was computed for each pixel by establishing linear regression relationship between the GF-1 WFV and MODIS pixel values of all the similar pixels in two observed pairs. The slope of the fitted line in the linear regression analysis is considered as the conversion coefficient V. Here, similar pixels are selected in a moving search window for the target pixel based on the spectral similarity. To reduce uncertainty, ESTARFM also takes full advantage of the information from these similar pixels to calculate the final prediction by giving weights to them. Similar pixels with higher spectral similarity between the coarse and fine pixels and closer spatial distance to the central pixel are considered to have greater impacts on the central pixel and given higher weights. Moreover, two predicted results were generated using the base data at T m and T n , separately. The final prediction at T p was the combination of the two predicted results by giving weights to them. Based on the above optimizations, the predicted result can be computed as follows: F X w/2 , Y w/2 , T p = W m ×F m X w/2 , Y w/2 , T p +W n ×F n X w/2 , Y w/2 , T p (4) where F is the final predicted result; F m and F n are the predictions generated by base data at T m and T n , respectively; W m and W n are the temporal weight given to F m and F n ; (X w/2 , Y w/2 ) is the location of the target pixel; and w is the size of the moving window for the target pixel. In this study, w was set as 51 × 51 pixels of GF-1 WFV, the number of similar pixels for the target pixel in the moving window was set as 20, and the number of land cover class for the selection of similar pixels was set as 4. Since there is no limitation on the number of input bands for ESTARFM, multiple reflectance bands and one FVC band can be fused using ESTARFM. Therefore, there are two fusion strategies as mentioned above to generate FVC with high spatio-temporal resolution: the strategy of fusing reflectance data (FC) and the strategy of fusing derived FVC data (CF). In some cases, the MODIS data matching the date of GF-1 WFV data cannot be found in the time series. Considering that MOD09A1 is an 8-day composited product, the MODIS data fused with GF-1 WFV data can be selected according to the composited period to which the corresponding GF-1 WFV data belong.

Validation Method
Two parts need to be validated in this study: the accuracy of the FVC estimation model for GF-1 WFV and the reliability of the fusion of GF-1 WFV and MODIS data. Assessment of the FVC estimation model was first performed based on simulated validation data derived from the PROSAIL model. Then, a direct validation comparing the field survey FVC and the FVC estimated from the actual GF-1 WFV data was conducted. To validate the reliability of the fusion result, FVC estimated from the real GF-1 WFV data and the field survey FVC were considered as the reference data. Although the aim of the spatio-temporal fusion is to predict the FVC at the time when real GF-1 WFV data could not be acquired and to compare the predicted results against the reference data, FVC was predicted on the date when the reference data were known. Therefore, the performances of strategies FC and CF were validated using two methods. The first approach was to compare the predicted result with the FVC estimated from the real GF-1 WFV data, in which visual comparison and pixel-to-pixel validation were both implemented. The second approach was to compare the predicted FVC at each sample site with the field survey FVC. To evaluate the performances quantitively, the coefficient of determination R 2 and Root Mean Square Error (RMSE) were calculated.

Assessment of the GF-1 WFV FVC Estimation Model
From the validation based on the simulated data ( Figure 5), the GF-1 WFV FVC estimation models for the four WFV sensors show satisfactory accuracy (R 2 ≥ 0.966, RMSE ≤ 0.05). Most of the scattered points are distributed along the 1:1 reference line. As the input parameter, LAI of the PROSAIL model was set as a Gaussian function distribution, and the simulated FVC also conformed to the Gaussian distribution after LAI was converted into FVC. The validation performance indicates that the RFR model is reliable and has good generalization ability to adapt to various situations. Although the aim of the spatio-temporal fusion is to predict the FVC at the time when real GF-1 WFV data could not be acquired and to compare the predicted results against the reference data, FVC was predicted on the date when the reference data were known. Therefore, the performances of strategies FC and CF were validated using two methods. The first approach was to compare the predicted result with the FVC estimated from the real GF-1 WFV data, in which visual comparison and pixel-to-pixel validation were both implemented. The second approach was to compare the predicted FVC at each sample site with the field survey FVC. To evaluate the performances quantitively, the coefficient of determination R 2 and Root Mean Square Error (RMSE) were calculated.

Assessment of the GF-1 WFV FVC Estimation Model
From the validation based on the simulated data ( Figure 5), the GF-1 WFV FVC estimation models for the four WFV sensors show satisfactory accuracy (R 2 ≥ 0.966, RMSE ≤ 0.05). Most of the scattered points are distributed along the 1:1 reference line. As the input parameter, LAI of the PROSAIL model was set as a Gaussian function distribution, and the simulated FVC also conformed to the Gaussian distribution after LAI was converted into FVC. The validation performance indicates that the RFR model is reliable and has good generalization ability to adapt to various situations.  Although there were four field survey periods, the GF-1 WFV data matching the second field survey period (4 May 2017 to 6 May 2017) was covered by cloud at the sampling locations, and there was no GF-1 WFV data acquired around the fourth field survey period (29 July 2017 to 31 July 2017). Therefore, only the field survey data collected from 29 March 2017 to 31 March 2017 and from 5 July 2017 to 8 July 2017 was used to validate the accuracy of the FVC estimation model for GF-1 WFV ( Figure 6). In Hengshui City, maize in early July is in the seedling stage, whereas winter wheat is in the vigorous growth period at the end of March and the beginning of April. Thus, the FVC of winter wheat is much higher than that of maize. The fitted line with a slope of 0.973 is almost parallel to the 1:1 reference line. The validation result (R 2 = 0.89, RMSE = 0.071) also indicates that the GF-1 WFV FVC estimation model has reliable estimation accuracy, and the trained RFR model is robust and can be used for generating high quality FVC from GF-1 WFV reflectance data. Although there were four field survey periods, the GF-1 WFV data matching the second field survey period (4 May 2017 to 6 May 2017) was covered by cloud at the sampling locations, and there was no GF-1 WFV data acquired around the fourth field survey period (29 July 2017 to 31 July 2017). Therefore, only the field survey data collected from 29 March 2017 to 31 March 2017 and from 5 July 2017 to 8 July 2017 was used to validate the accuracy of the FVC estimation model for GF-1 WFV ( Figure 6). In Hengshui City, maize in early July is in the seedling stage, whereas winter wheat is in the vigorous growth period at the end of March and the beginning of April. Thus, the FVC of winter wheat is much higher than that of maize. The fitted line with a slope of 0.973 is almost parallel to the 1:1 reference line. The validation result (R 2 = 0.89, RMSE = 0.071) also indicates that the GF-1 WFV FVC estimation model has reliable estimation accuracy, and the trained RFR model is robust and can be used for generating high quality FVC from GF-1 WFV reflectance data.

Assessment Based on FVC Estimated from the Actual GF-1 WFV Data
The acquisition dates of GF-1 WFV and MODIS data to fuse and the acquisition date of actual data for validation are shown in Table 5. A subset of the study area (1590 × 1351 GF-1 WFV pixels) was selected to assess the performance of strategies FC and CF (Figure 7). Figure 8 shows the FVC prediction from strategies FC and CF and the FVC estimated from actual GF-1 WFV data. Visually, the results obtained from the two strategies both closely resemble the FVC estimated from actual data, successfully capturing most of the FVC changes. However, strategy CF performs better than strategy FC given that the patches with low FVC prediction values in the cropland near to the residential area are considerably more obvious in FC. Cropland and residential areas are the main endmembers for the mixed MODIS pixels between these two land cover types. When the residential area contributes more information to the mixed pixels, the FVC of the mixed pixels are significantly lower compared with cropland, making it easier to generate patches with low prediction after fusion. As shown in the figures presenting details of the two results, there are unrealistic areas in both strategies FC and CF, but CF generates a better prediction. For strategy FC, obvious noise is present in the predicted result. In addition, large errors are produced in the cropland in the center of the amplifying scenes (Figure 8b), and its value is significantly lower than the actual FVC. In contrast, almost no noise is present in the prediction of strategy CF.

Assessment Based on FVC Estimated from the Actual GF-1 WFV Data
The acquisition dates of GF-1 WFV and MODIS data to fuse and the acquisition date of actual data for validation are shown in Table 5. A subset of the study area (1590 × 1351 GF-1 WFV pixels) was selected to assess the performance of strategies FC and CF (Figure 7). Figure 8 shows the FVC prediction from strategies FC and CF and the FVC estimated from actual GF-1 WFV data. Visually, the results obtained from the two strategies both closely resemble the FVC estimated from actual data, successfully capturing most of the FVC changes. However, strategy CF performs better than strategy FC given that the patches with low FVC prediction values in the cropland near to the residential area are considerably more obvious in FC. Cropland and residential areas are the main endmembers for the mixed MODIS pixels between these two land cover types. When the residential area contributes more information to the mixed pixels, the FVC of the mixed pixels are significantly lower compared with cropland, making it easier to generate patches with low prediction after fusion. As shown in the figures presenting details of the two results, there are unrealistic areas in both strategies FC and CF, but CF generates a better prediction. For strategy FC, obvious noise is present in the predicted result. In addition, large errors are produced in the cropland in the center of the amplifying scenes (Figure 8b), and its value is significantly lower than the actual FVC. In contrast, almost no noise is present in the prediction of strategy CF.    The difference between the predicted and actual FVC data is represented by difference maps and difference histograms (Figure 9). Most of the pixels are concentrated between −0.1 and 0.1, making the histogram curves of strategies FC and CF both reach peak values when the difference is approximately 0. Pixels with differences between −0.1 and 0.1 account for 84% of the total pixels in FC, while the proportion of CF reaches 92%. Meanwhile, comparing Figure 9a and Figure 9b, it can be seen that the pure red and pure blue patches representing large deviations in FC are significantly greater in number and larger than those in CF. The areas in blue (predicted value higher than the actual FVC) are mainly buildings or fields with low FVC, whereas the areas in red (predicted value lower than the actual FVC) are mainly fields with high FVC that are near the residential areas. Remote Sens. 2019, 11, x FOR PEER REVIEW 13 of 21 The difference between the predicted and actual FVC data is represented by difference maps and difference histograms (Figure 9). Most of the pixels are concentrated between −0.1 and 0.1, making approximately 0. Pixels with differences between −0.1 and 0.1 account for 84% of the total pixels in FC, while the proportion of CF reaches 92%. Meanwhile, comparing Figure 9a and Figure 9b, it can be seen that the pure red and pure blue patches representing large deviations in FC are significantly greater in number and larger than those in CF. The areas in blue (predicted value higher than the actual FVC) are mainly buildings or fields with low FVC, whereas the areas in red (predicted value lower than the actual FVC) are mainly fields with high FVC that are near the residential areas. To further evaluate the performances of strategies FC and CF, pixel-to-pixel validations were conducted ( Figure 10). The validation accuracy indicates that both the strategies FC (R 2 = 0.9345, RMSE = 0.0719) and CF (R 2 = 0.9580, RMSE = 0.0576) show good FVC prediction accuracy, and CF performs slightly better than FC. Compared with CF, there are more points distributed far away from the 1:1 reference line in FC. There are two groups of points of FC with obvious erroneous prediction in the interval of [0.0, 0.2] and [0.8, 0.9] (Figure 10a). This phenomenon is consistent with the information shown in Figure 9. There are more pixels with large deviations in the prediction results of FC. The slope of the fitted line of CF is 1.017, which almost coincides with the 1:1 reference line. The slope of the fitted line of FC is 0.9229, which has larger difference from 1 compared with CF. To further evaluate the performances of strategies FC and CF, pixel-to-pixel validations were conducted ( Figure 10). The validation accuracy indicates that both the strategies FC (R 2 = 0.9345, RMSE = 0.0719) and CF (R 2 = 0.9580, RMSE = 0.0576) show good FVC prediction accuracy, and CF performs slightly better than FC. Compared with CF, there are more points distributed far away from the 1:1 reference line in FC. There are two groups of points of FC with obvious erroneous prediction in the interval of [0.0, 0.2] and [0.8, 0.9] (Figure 10a). This phenomenon is consistent with the information shown in Figure 9. There are more pixels with large deviations in the prediction results of FC. The slope of the fitted line of CF is 1.017, which almost coincides with the 1:1 reference line. The slope of the fitted line of FC is 0.9229, which has larger difference from 1 compared with CF.

Assessment Based on Field Survey FVC
To assess the accuracy of strategies FC and CF using field survey data, four experiments were conducted (Table 6). In the first three experiments, actual GF-1 WFV and MODIS data were fused to predict FVC at the time when field survey data were acquired. In the No.4 Experiment, due to the absence of actual GF-1 WFV data, the second fine resolution image input to ESTARFM was the predicted result of the No.3 Experiment rather than the actual GF-1 WFV data. Because clouds covered the field survey sites in some cases, not all the field survey data were selected for the accuracy validation.

Assessment Based on Field Survey FVC
To assess the accuracy of strategies FC and CF using field survey data, four experiments were conducted (Table 6). In the first three experiments, actual GF-1 WFV and MODIS data were fused to predict FVC at the time when field survey data were acquired. In the No.4 Experiment, due to the absence of actual GF-1 WFV data, the second fine resolution image input to ESTARFM was the predicted result of the No.3 Experiment rather than the actual GF-1 WFV data. Because clouds covered the field survey sites in some cases, not all the field survey data were selected for the accuracy validation.

DOY209
Different from the pixel-to-pixel validation results in Section 3.2.1, strategy CF (R 2 = 0.8138, RMSE = 0.0985) achieved more satisfactory performance than that of strategy FC (R 2 = 0.7173, RMSE = 0.1214) when compared with the field survey data. As is shown in Figure 11a, the scattered points of the four measurement periods in strategy FC have different degrees of deviation from the 1:1 reference line. There is a point on the horizontal axis with the coordinate of (0.1, 0). It is possible that the pixel corresponding to this point was predicted as a non-vegetation land cover type after reflectance fusion, and then the FVC value of this pixel was estimated as 0 when calculated using the  Different from the pixel-to-pixel validation results in Section 3.2.1, strategy CF (R 2 = 0.8138, RMSE = 0.0985) achieved more satisfactory performance than that of strategy FC (R 2 = 0.7173, RMSE = 0.1214) when compared with the field survey data. As is shown in Figure 11a, the scattered points of the four measurement periods in strategy FC have different degrees of deviation from the 1:1 reference line. There is a point on the horizontal axis with the coordinate of (0.1, 0). It is possible that the pixel corresponding to this point was predicted as a non-vegetation land cover type after reflectance fusion, and then the FVC value of this pixel was estimated as 0 when calculated using the FVC estimation model. All the results show that FC is instability compared to CF, which will bring greater uncertainty and higher probability of generating large errors.
that the predicted results of the period from 5 July to 8 July were obtained by the fusion of the actual GF-1 WFV FVC data on 26 June and the FVC fusion result of No.3 Experiment. Thus, the fusion results of No.3 Experiment were "reused". Therefore, we can infer that reliable prediction could also be obtained by reusing the predicted results as the base data of fusion, and it is possible to utilize this "reused" method to generate long-term time-series FVC product in the future.

Discussion
Spatial and temporal fusion methods have been applied for the generation of NDVI and EVI time series in previous studies [25][26][27]. This study extends such fusion algorithms to the estimation of FVC time series with fine spatio-temporal resolution. Since vegetation indices, such as NDVI and EVI, are directly obtained from the band combination, the fusion results of vegetation indices were directly compared with the vegetation indices calculated from real data in previous studies [48,49], which can adequately validate the performance of the fusion. However, for vegetation parameters, the fusion results need to be validated with the ground truth to determine whether they are consistent with the actual condition. Hence, in addition to the comparison against the FVC derived from the real GF-1 WFV data, validation based on field survey data was also conducted in this study to further assess the accuracy of the fusion results.
This study indicates the feasibility of fusing GF-1 WFV and MODIS data to generate high spatiotemporal resolution FVC estimates at the GF-1 WFV scale. The results were compared for a strategy of data fusion of reflectance data (strategy FC) versus fusion of derived FVC (strategy CF). The results indicate that strategy CF outperforms FC, which is mainly reflected as follows: (i) The predicted FVC data generated by FC have large unrealistic patches, whereas this phenomenon is not obvious in CF, which better reflects the spatial details; (ii) The accuracy of CF is significantly better than that of FC with a higher R 2 and a smaller RMSE; (iii) The probability of generating abnormal CF values is lower. The fact that FC performs worse than CF is mainly caused by error propagation. Regarding strategy FC, the errors caused by the sensors, data pre-processing and the fusion algorithm will be further propagated to the estimation of FVC after reflectance fusion, and certain uncertainties will be generated in the process of error propagation. Although some errors are also be propagated to FVC Unlike the strategy FC, the sample points of strategy CF are mainly distributed near the 1:1 reference line (Figure 11b). The scattered points of the period from 5 July to 8 July appear to distribute more consistently with the 1:1 reference line than those of the other three periods. It should be noted that the predicted results of the period from 5 July to 8 July were obtained by the fusion of the actual GF-1 WFV FVC data on 26 June and the FVC fusion result of No.3 Experiment. Thus, the fusion results of No.3 Experiment were "reused". Therefore, we can infer that reliable prediction could also be obtained by reusing the predicted results as the base data of fusion, and it is possible to utilize this "reused" method to generate long-term time-series FVC product in the future.

Discussion
Spatial and temporal fusion methods have been applied for the generation of NDVI and EVI time series in previous studies [25][26][27]. This study extends such fusion algorithms to the estimation of FVC time series with fine spatio-temporal resolution. Since vegetation indices, such as NDVI and EVI, are directly obtained from the band combination, the fusion results of vegetation indices were directly compared with the vegetation indices calculated from real data in previous studies [48,49], which can adequately validate the performance of the fusion. However, for vegetation parameters, the fusion results need to be validated with the ground truth to determine whether they are consistent with the actual condition. Hence, in addition to the comparison against the FVC derived from the real GF-1 WFV data, validation based on field survey data was also conducted in this study to further assess the accuracy of the fusion results.
This study indicates the feasibility of fusing GF-1 WFV and MODIS data to generate high spatio-temporal resolution FVC estimates at the GF-1 WFV scale. The results were compared for a strategy of data fusion of reflectance data (strategy FC) versus fusion of derived FVC (strategy CF). The results indicate that strategy CF outperforms FC, which is mainly reflected as follows: (i) The predicted FVC data generated by FC have large unrealistic patches, whereas this phenomenon is not obvious in CF, which better reflects the spatial details; (ii) The accuracy of CF is significantly better than that of FC with a higher R 2 and a smaller RMSE; (iii) The probability of generating abnormal CF values is lower. The fact that FC performs worse than CF is mainly caused by error propagation. Regarding strategy FC, the errors caused by the sensors, data pre-processing and the fusion algorithm will be further propagated to the estimation of FVC after reflectance fusion, and certain uncertainties will be generated in the process of error propagation. Although some errors are also be propagated to FVC fusion after data pre-processing in strategy CF, CF only needs to fuse one FVC band compared with FC which fuses three reflectance bands, and the impact of errors on the final predicted result is relatively smaller. In addition to the greater uncertainty of predicted results and higher probability of abnormal predicted values, the strategy FC is also limited by the type of WFV sensor. There are four WFV sensors on-board GF-1 satellites, and their spectral response functions have some minor differences. Thus, four different FVC estimation models were constructed according to the sensor type. After reflectance fusion in strategy FC, a suitable FVC estimation model should be selected for the fused data. When GF-1 data acquired from two different WFV sensors are selected as two fine-resolution base data for reflectance fusion, it is a problem when selecting the FVC estimation model for the reflectance fusion result. Regardless of which model is selected, some errors may be introduced into the final result. In addition, since FC needs to fuse three reflectance bands, the operation time required is approximately three times longer than that of CF. Therefore, in terms of computational efficiency, CF is also obviously superior to FC.
The accuracy of the fusion results is related to whether the land cover type changes [50]. The change of land cover type may lead to poorer accuracy. For the cropland, the crop type may change between two base dates of fusion. Taking the study area in this research as an example, crop grown in the same cropland changes from winter wheat to maize after July. Therefore, when applying the ESTARFM algorithm, it should be noted that the dates of the two pairs of base data should be as close as the prediction date to ensure they are in the same crop-growing season. If the spectral characteristics of vegetation at the two base time are significantly different, uncertainties would be introduced to the fusion process and further influenced the accuracy of the predicted results. When this method is extended to large-scale areas with multiple vegetation types, the base data of the fusion are relatively difficult to determine. The changes of the growing season in different local areas are not synchronized due to the differences in vegetation types. Therefore, a strategy that uses the identical base images for the entire area may not satisfy all local conditions. According to the vegetation types and the changes of the growth conditions, the large-scale region can be decomposed into several small regions. Suitable base data with acquisition dates that are close can then be selected for each small region, ensuring the dates of the base images are within the same growing season. If there are cropland, forest, grassland and other vegetation types in the area, it is recommended to focus on the changes in cropland rather than other types and divide the area mainly according to the differences between croplands. Crop type may change during several months in the same cropland, whereas this change is not obvious for forest and grassland.
The MODIS data are chosen as the coarse resolution data for the fusion. As mentioned above, VIIRS data also have potential in spatio-temporal fusion. Although an FVC estimation algorithm for VIIRS has been developed [51], the mature time-series VIIRS FVC product has not been produced, which makes VIIRS data unable to effectively fill the gaps in time series for high spatial resolution data. Therefore, the application of VIIRS data in spatio-temporal fusion can be studied in the future, and it is also worth comparing the performances of VIIRS and MODIS in the spatio-temporal fusion. Which method can generate more reliable prediction will be explored in further work.

Conclusions
This study evaluated the feasibility of generating high spatio-temporal resolution FVC based on the fusion of GF-1 WFV and MODIS data. The validations based on the field survey FVC and the FVC derived from the real GF-1 WFV data indicated that GF-1 WFV and MODIS data could be fused to generate high spatio-temporal resolution FVC with satisfactory accuracy. In addition, the strategies of estimating FVC from the prediction of reflectance fusion (strategy FC) and directly fusing FVC data (strategy CF) were compared, and results indicated that CF is more preferable for generating high spatio-temporal resolution FVC than FC. However, the fusion method will be limited by the huge computational cost when applied to a large region. Therefore, it is necessary to consider the parallel processing of the fusion algorithm in the future work to generate high spatio-temporal resolution FVC products for large-scale regions. Acknowledgments: The authors would like to appreciate Xiaolin Zhu (The Hong Kong Polytechnic University, Hong Kong) for providing the ESTARFM IDL code.

Conflicts of Interest:
The authors declare no conflict of interest.