Spatiotemporal image fusion in Google Earth Engine for annual estimates of land surface phenology in a heterogenous landscape

the accuracy achieved by executing the original ESTARFM algorithm. Excluding snow and cloud obscured observations, the algorithm produced approximately 215 observations per 30-meter pixel in 2017. Root mean squared prediction error (RMSPE) of Normalized Difference Vegetation Index (NDVI) for the GEE predicted images ranged from 0.032 to 0.066, and the RMSPE for the original ESTARFM predicted images from the ranged from 0.027 to 0.064. Phe-nometric estimates were evaluated with near-surface sensors (PhenoCams) in shrubland, conifer, and agricultural sites and field observations of phenology in grassland, open-pine, and mixed-conifer sites. Although phenometric estimates were dissimilar at all PhenoCam sites, the general temporal pattern of the GEE image fusion and PhenoCam time series was often similar. The start of season derived from the GEE image fusion time series had closer correspondence to the PhenoCam-derived start of season at the shrubland site (13 days) than the agri-culture and conifer sites. The end of season was closest at one of the conifer sites and the agriculture site (22 and 31 days, respectively). Trends of some of the field-based phenology observations aligned with phenometrics estimated from the image fusion time series. At the grassland and open-pine field sites, the phenometrics from GEE image fusion were associated with phenophase trends of dominant plant functional types. Though characterizing LSP within the interior Pacific Northwest remains a challenge, this study demonstrates that image fusion implemented in GEE can produce a densified time series capable of capturing seasonal trends in NDVI related to vegetation phenology, which can be used to estimate intraannual phenometrics.


Introduction
The study of vegetation phenology provides important information about past, current, and potential future ecosystem states. Variation in the timing of phenology results from temperature, precipitation, plant community composition and condition, genetic traits, and soil characteristics (Wolkovich et al., 2014), giving it the capacity to serve as an indicator of climate change (Richardson et al., 2013). Phenology also influences many processes, including carbon flux (Forkel et al., 2016;Richardson et al., 2012), wildfire activity (Westerling et al., 2006), crop production (Anwar et al., 2015), and wildlife populations (Morellato et al., 2016). Moreover, certain species' phenology can drive plant community composition, with invasive species being of great concern (Colautti et al., 2017). However, monitoring phenology with traditional field-based methods is expensive and limited to small spatial extents (Richardson et al., 2009).
The availability of satellite-imagery time series has allowed for phenological observation across previously unattainable extents, expanding our understanding of the relationship between phenology and the environment. Characterization of the temporal patterns of electromagnetic reflectance with satellite imagery is referred to as land surface phenology (LSP; de Beurs and Henebry, 2004). However, the spatial resolution of LSP does not always match the scale at which vegetation communities vary. Moreover, the phenological metrics (e.g., start of season) extracted by these means do not necessarily represent the same stages that might be recorded at a field site (e.g., leaf emergence) and can be influenced by abiotic processes like snow melt and soil-moisture fluctuation. Consequently, LSP does not entirely fit with the ecological definition of phenology, which is "the study of the timing of recurring biological events, the causes of their timing with regard to biotic and abiotic forces, and the interrelation among phases of the same or different species" (Lieth, 1974). This inconsistency makes validation of LSP challenging (Nijland et al., 2016;Richardson et al., 2018). Quantifying LSP at more meaningful spatial and temporal scales may be one potential solution to some of the discrepancies between LSP estimates and ground-based observations of phenology.
A major challenge in choosing satellite data for LSP is the tradeoff between revisit frequency and spatial detail. Satellites like MODIS (Moderate Resolution Imaging Spectroradiometer), AVHRR (Advanced Very High Resolution Radiometer), and VIIRS (Visible Infrared Imaging Radiometer Suite) identify fine-grained temporal signals of plant development owing to their daily revisit frequency (White et al., 1997;Zhang et al., 2018Zhang et al., , 2003. These revisit frequencies allow for intraannual estimates of phenology. However, the relatively low spatial resolution (e.g., 250 m -1 km for MODIS) of these satellites results in a highly mixed composition of vegetation, which can reduce the utility of LSP estimates for studying phenomena occurring at a fine scale. This is particularly problematic in landscapes where resources are variably dispersed leading to heterogeneous vegetation patterns.
Other satellites, like the Landsat missions, have finer spatial resolutions (30-meter) but typically lack the temporal resolution to capture fine-scale intraannual patterns of LSP (Fisher et al., 2006;Jönsson and Eklundh, 2002). To address this deficiency, some have assessed longterm phenology estimates by aggregating multiple years of data (Melaas et al., 2013), ensuring adequate observations across the growing season. Melaas et al. (2016) used such methods to correct the phenological estimate for individual years by adjusting the long-term mean curve based on individual years' anomalies. More recently, others have developed methods to estimate LSP at 30-meters using Landsat and Sentinel-2 imagery (Bolton et al., 2020;Gao et al., 2020;Zhang et al., 2020). Although the utilization of Landsat and Sentinel-2 shows promise moving forward, methods focused on leveraging older platforms enable the investigation of phenological changes that have occurred over the last 20 or more years.
Methods that blend data from these sensors (i.e., spatio-temporal image fusion) have been developed to capture the complementary strengths of Landsat and MODIS. These algorithms leverage the temporal frequency of MODIS (daily) and the spatial resolution of Landsat (30-meter) to predict imagery at 30-meter resolution for times when observations are unavailable from Landsat. There have been many new fusion methods developed in the last decade (see Belgiu and Stein, 2019). One of the first methods developed for this purpose was the Spatio-temporal Adaptive Reflectance Fusion Model (STARFM; Gao et al., 2006). STARFM uses pairs of Landsat and MODIS images to predict the Landsat reflectance at a time where only MODIS is available. This method performs well when change in the spatial dimension is gradual but is less effective when change is abrupt (Emelyanova et al., 2013;Gao et al., 2006;Hilker et al., 2009b). Other variations of spatio-temporal image fusion were developed to address issues with STARFM, including STAARCH (Spatial Temporal Adaptive Algorithm for mapping Reflectance Change) which improved the method for detecting disturbances (Hilker et al., 2009a) and ESTARFM (Enhanced STARFM; Zhu et al., 2010) which improved predictions in heterogenous regions (Emelyanova et al., 2013).
The STARFM algorithm has been employed to estimate LSP in a limited number of instances (Coops et al., 2012;Liang et al., 2014;Walker et al., 2014). Walker et al. (2014) reported that the inclusion of STARFM-fused images helped improve LSP estimates in semi-arid ecosystems. Cropland LSP was also recently assessed with STARFM and Timesat (Jönsson and Eklundh, 2004) by . They were able to extract various phenometrics (i.e., phenological transition dates) from the time series and evaluate these estimates with crop progress reports. There remains a need to execute and evaluate high-spatial and high-temporal LSP across expansive, heterogenous natural landscapes.
While image fusion methods are promising as a means of filling in missing or noisy Landsat observations, these methods are computationally expensive Rao et al., 2015). The development of cloud computing platforms like Google Earth Engine (GEE; Gorelick et al., 2017) may allow for the development and deployment of image fusion techniques, increasing their availability and reducing the onsite processing infrastructure and processing time. One such method was recently developed and tested using GEE (Moreno-Martínez et al., 2020), demonstrating the potential for this type of application of the GEE platform.
In this study, we developed and evaluated methods for estimating LSP in a heterogenous region of the interior Pacific Northwest of the United States. This research had three primary objectives: 1) develop an ESTARFM-like approach to spatio-temporal image fusion that is capable of running on a cloud-computing platform (GEE); 2) process and assemble a time series of daily 30-meter imagery and evaluate the quality of fused images across the growing season; and 3) estimate LSP and assess the similarity of LSP estimates to estimates from near-surface cameras (PhenoCams) and ground-based observations from field data. Phenometrics were evaluated at varying spatial scales and with multiple datasets to account for potential discrepancies in scale and in the method of phenological observation.

Methods
The estimation of LSP with high-spatial and high-temporal resolution was accomplished with a two-phase process. First, a high-resolution time series was assembled from Landsat observations and MODISderived predictions with an ESTARFM-like algorithm implemented on GEE (hereafter, GEE image fusion). The time series created with the GEE image fusion was then used to estimate phenometrics with a doublelogistic smoothing method where transition dates were extracted based on rates of change and curve inflection points.

Study area
The interior Pacific Northwest region is composed of a variety of natural and human-derived land cover types that are well suited to testing the methods of this research (Fig. 1). The region includes parts of the Columbia Plateau, Blue Mountains, and Northern Basin and Range Ecoregions, including the eastern edge of the Eastern Cascades Slopes and Foothills (Omernik, 1987). The region has an arid to semi-arid climate resulting from the Cascade Range's rain-shadow, which interacts with numerous mountain ranges, canyons, and valleys to produce a mosaic of forest, grassland, and shrubland plant communities. The marine-influenced continental climate is characterized by warm, dry summers and cold winters, during which most of the annual precipitation occurs. On average, total annual precipitation for this region ranged from 16 to 277 cm (mean of 43 cm) and mean annual temperature ranged from − 1 to 13 • C (mean of 8 • C) between 1981 and 2010 (PRISM Climate Group, 2012).

Satellite data and image pre-processing
The satellite data used in this study are Landsat 8 OLI (Operational The dates of all Landsat images used in the GEE image fusion and LSP estimation for each WRS-2 scene. For LSP estimation, images occurring in January and February 2017 were dropped due to persistent cloud and snow cover during this period. An orange point indicates that the Landsat image was paired with a MODIS image for GEE image fusion. A black point indicates that the image was not used in the GEE image fusion because it had greater than 5-percent cloud cover, but valid pixels from the image were retained in the final time series. Percent cloud cover is indicated by the transparency of each point where high transparency correspond with high cloud cover. Specific images used to evaluate the accuracy of image-fusion predictions are denoted with an "X" mark. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Land Imager) Surface Reflectance and MODIS NBAR (Nadir BRDF-Adjusted Reflectance) imagery provided to GEE by the United States Geological Survey. Landsat data were corrected to surface reflectance with the LaSRC method (Landsat Surface Reflectance Code; Vermote et al., 2016) and included a cloud mask calculated with the CFMask method (C code based on the Function of Mask; Foga et al., 2017). MODIS NBAR data (i.e., MCD43A4) are generated using both Terra and Aqua satellites to correct MOD09 surface reflectance to a nadir viewing angle using the bidirectional reflectance distribution function generated from images in a 16-day moving window (Schaaf et al., 2002;Vermote et al., 1997). These data were found to yield the best results in spatiotemporal image fusion based on comparisons among MOD09GA, MCD43A4, and MOD09A1 (Walker et al., 2012).
In preparation of the image-fusion process, additional filtering and preprocessing methods were applied to the Landsat and MODIS data. First, Landsat and MODIS images were acquired for dates between July 29th, 2016 and June 1st, 2018. Landsat images used in the image fusion were restricted to those with less than 5-percent cloud cover to ensure that fused images were as close to cloud-free as possible. Landsat images with greater than 50-percent snow-flagged pixels were also excluded from use in the image-fusion process. All Landsat pixels flagged as cloud, cloud shadow, or snow in the quality band of each image were masked. The MCD43A4 product does not include a mask for snow, so the Snow Water Index (Dixit et al., 2019) was used to identify and mask all snowcovered pixels from the MODIS imagery. For each image, the NDVI (Normalized Difference Vegetation Index; Rouse et al., 1973) was calculated from red and near-infrared bands. The filtered and masked Landsat images were then paired with masked MODIS images from the corresponding date. For each scene, this resulted in between 6 and 11 pairs of Landsat and MODIS images. The earliest Landsat image from 2016 was the first image paired with a MODIS image prior to September (further details provided in Section 2.3.2). All MODIS images from dates between the first and last pairs were used for prediction. The geolocation accuracy of MODIS and Landsat differ, partially as a result of pixel resolution, and it has been reported that an additional pre-fusion step of co-registration can improve image-fusion results (Gao et al., 2015). Therefore, MODIS images used in the GEE image fusion were registered to the earliest Landsat image of each set of pairs. The co-registration of MODIS images to Landsat images was performed using a rubber-sheet technique based on image correlation .

Image fusion algorithm and theory
Preprocessed Landsat and MODIS imagery were used to perform an image fusion-process similar to the methods developed by Gao et al. (2006) and Zhu et al. (2010). Although systematic differences exist between these two sensors for the same location and date, Landsat OLI and MODIS surface reflectance products were recently determined to be highly comparable (Vermote et al., 2016). The implementation of image fusion in GEE was guided by the following theory and assumptions.
For a MODIS pixel with homogenous land cover (i.e., a homogenous MODIS pixel), the relationship with Landsat surface reflectance can be represented as where L and M represent Landsat and MODIS images, respectively; s i is the location of a MODIS pixel; s ij is the location of a Landsat pixel within the MODIS pixel at s i ; t k is the date; b is the band; and ε t k is the reflectance difference at date t k . Reflectance difference can be induced by various factors including geolocation error and solar geometry at the time of acquisition. Assuming that the error is the same between time periods (i.e., ε t0 = ε t1 ; Gao et al., 2006), Eq. (1) can be rewritten to approximate the Landsat reflectance at a time where only MODIS reflectance is available as where time-1 (t 1 ) represents a date without a true Landsat observation and time-0 (t 0 ) is a date with both Landsat and MODIS observations. Land cover is often heterogenous at the scale of a MODIS pixel. The reflectance of a MODIS pixel (M si,t k ,b ) can be represented as a mixture of the reflectance of each cover class within that pixel. In this context, the reflectance of a MODIS pixel could be thought of as the area weighted average reflectance of each land-cover class at the Landsat scale, where C is the total number of land cover classes; N is the total number of s ij Landsat-resolution pixels within a MODIS pixel at s i ; n c is the number of Landsat pixels in a land-cover class; and here L sij,t k ,b,c notates the L sij,t k ,b (Landsat pixel reflectance) in the c th land-cover class. In practice, a pixel's surface reflectance can be better estimated by considering a moving window that includes multiple MODIS pixels at times when Landsat observations are not available. Within this window, 'similar' pixels (i.e., pixels assumed to be the same land-cover class) are used to estimate the central pixel's value. For this study, similar pixels were selected based on the following criteria defined by Gao et al. (2006) For simplicity, L si ,t k ,b is hereafter redefined to represent the central pixel in the moving window and L s ij ,t k ,b to represent any other pixel within the moving window; σ t k ,b is the standard deviation of a band (b) of the Landsat image (L) at time t k ; and C L is the number of land-cover classes in image L. Similar pixels were constrained to those that were similar in the image pairs immediately before and after the prediction date. Note that M has been resampled to the same resolution as L (i.e., 30-meter). At this resolution for M, M s ij ,t k ,b refers to a pixel within the moving window, as defined above.
To improve the prediction accuracy, similar pixels within a landcover type were weighted based on spatial and spectral proximity (Zhu et al., 2010). It is assumed that similar pixels within a land-cover type are more likely to change similarly to the central pixel. Therefore, this weighting step ensures that pixels of the same land-cover class within close proximity are given greater weight. Different cover classes may not change at the same rate over time, so linear regression can be used to approximate a scaling coefficient (i.e., β in Eq. (5)) for the rate of change of an individual cover class within the window.
After adding the moving window and scaling coefficient, Eq.
(2) becomes where P is the number of similar pixels within the window; w p is the weight of the p th similar pixel in the window; β is a scaling coefficient for the MODIS pixel difference, which is based on the rate of change between image dates for all similar pixels in the window; and s ijp notates the p th similar pixel, which is located at s ij . The final refinement to the prediction is achieved by averaging predictions obtained from Landsat and MODIS image pairs before and after the prediction date. The approach outlined here follows much of the same theory and assumptions as image fusion with the ESTARFM algorithm, however, some details differ. The design of GEE prevents the direct iteration over pixels performed in the original STARFM and ESTARFM algorithms. However, similar approaches can be applied in GEE where the neighborhood around each pixel is used. As mentioned above, the calculation for the scaling coefficient (β) in the GEE image fusion approach is based on all similar pixels within a window. In contrast, the calculation in ESTARFM solely uses values within an individual MODIS pixel. As opposed to having a scaling coefficient applied to a broad region (e.g., an entire image), calculating the scaling coefficient in the whole window keeps this calculation local to the window but still increases the region over which the scaling coefficient is applied in comparison to the calculation implemented in ESTARFM. Producing a dense time series of a vegetation index can be performed with "blend-then-index" or "indexthen-blend" approaches, but it was found that the latter method produces higher accuracy results because there is less error propagation in the process (Jarihani et al., 2014). Although the algorithm used in the GEE image fusion is flexible in terms of its application to an index or reflectance, an "index-then-blend" approach was used. This raises an issue for calculating spectral similarity using the ESTARFM methodology. In assigning the weight to similar pixels Zhu et al. (2010) determined spectral similarity based on the correlation between Landsat and MODIS spectral vectors. However, spectral similarity calculated from the correlation of a single band could only take the value of − 1 or 1. Therefore, spectral similarity in the GEE image fusion was calculated as the absolute difference between Landsat and MODIS pixels allowing for any number of bands to be used in the fusion process.

Evaluation
Landsat 8 images with low amounts of cloud cover that were not used in the GEE image fusion process (i.e., cloud cover between 5 and 25%) were used to evaluate the quality of the images produced through the image fusion. These images were not suited for use in the image fusion, as the amount of cloud cover was too high, but contained cloud and snow free observations that could be compared with the image fusion predictions. Clouds, cloud shadows, and snow were masked from these images.
Two Landsat WRS-2 scenes were used for evaluation (path 43/ row 29 and path 43/ row 30). These scenes contained all land-cover types present within the broader study area and areas of high spatial heterogeneity. All six images meeting cloud cover requirements were used to evaluate the image fusion predictions (Fig. 1, Appendix 1). The six evaluation images were captured during April, June, July, August, and September, allowing performance to be quantified across the growing season.
The overall quality of the prediction was evaluated using root mean squared prediction error (RMSPE), bias, signed relative bias (SRB), and Pearson's correlation (r), calculated as: where n is the number of sampled pixels, NDVI i is the NDVI value for the i th pixel from the true Landsat image, N DVI i is the NDVI value for the i th pixel from the predicted image, NDVI is the mean NDVI of the true image, N DVI is the mean NDVI of the predicted image, and sign(bias) is the sign associated with the value of bias. These measures were calculated from a random sample of 5,000 pixels across each image to get an overall measure of accuracy at an image level. Accuracy was also evaluated by land-cover class in each image by sampling 5,000 pixels within forest, shrub, grassland, and agriculture land-cover types. The NLCD 2016 land-cover map (Yang et al., 2018) was utilized to determine the location of each of these land-cover types. In addition to evaluating prediction metrics, the NDVI from both true Landsat and predicted images was visually inspected to identify unusual patterns in the images.
To determine cloud-computing improvements to computation time and confirm that predictive performance of the GEE implementation of image fusion was comparable to the original ESTARFM algorithm, a Python version of ESTARFM (https://xiaolinzhu.weebly.com/open-so urce-code.html) was also executed and evaluated using the same preprocessed image pairs, including the same sampling process for evaluation metrics at the scene level and within each of the land-cover classes. The ESTARFM algorithm was run on a 16-core processor (AMD Ryzen 9 3950x) at an average clock rate of 4.2 GHz.

Land surface phenology 2.3.1. Image post-processing
As a post-processing step for estimating LSP, predictions of Landsatresolution images produced using the GEE image fusion were combined with all true Landsat observations from late 2016 through the end of 2017. Landsat images that contained cloud-and snow-free observations not used in the GEE image fusion were combined with the time series of fused images. True Landsat observations were retained at times when there were both true pixels and predicted pixels. The final time series contained a near-daily record over late 2016 through the end of 2017. Detailed information about the Landsat and MODIS images used in the GEE image fusion can be found in Appendix 1. All image processing, from pre-processing through post-processing, was completed using the Python and JavaScript API's for GEE.

Time series smoothing and phenometric extraction
Periods of cloud and snow cover are common in the interior Pacific Northwest, resulting in missing data in the early season (Appendix 1). To approximate the NDVI at these times, the median NDVI value from September 2016 was imputed to the months of January and February of 2017 as this time of year represents the NDVI expected for dormant vegetation in the region. This method of determining the 'winter' NDVI is similar to methods that have been employed to estimate long-term phenological cycles in northern latitudes, which typically do not have many snow-or cloud-free early-or late-season observations (Beck et al., 2006). Later in the growth cycle, other occasional missing values were linearly interpolated from values occurring before and after the missing observation.
Moving from a time series of observations to an estimate of phenometrics can be accomplished through several approaches (Cai et al., 2017;Zeng et al., 2020;Zhou et al., 2016). Some methods only provide an estimate of the start of season (SOS) while others offer a suite of phenology characteristics. A recent comparison of methods found that using double-logistic functions produced predictions that showed coherent spatial patterns, corresponded well with gross primary productivity, and agreed with the expected effects of elevation on phenometrics (Cai et al., 2017). Furthermore, double-logistic smoothing can capture asymmetrical annual patterns and is more robust to the effects of noise than local smoothing methods like Savitsky-Golay filtering and LOESS smoothing (Cai et al., 2017). Double-logistic smoothing also allows for the extraction of several phenometrics automatically and robustly across an image.
Double-logistic smoothing and automated phenometric-extraction techniques were applied to calculate phenometrics. The following double-logistic function was used to model the annual growth pattern: where v t is the vegetation index at time t; m 1 is the minimum index value or the 'winter' NDVI; m 2 is the maximum index value; m 3 and m 5 are the rate of change associated with the SOS and end of season (EOS), respectively; and m 4 and m 6 are the day of year (DOY) associated with the SOS and EOS, respectively. This double logistic model (Eq. (10)) also allows for the automated extraction of phenometrics that characterize six seasonal transition dates: start of green-up, SOS, maturity, EOS, dormancy, and length of season (Fig. 3). The SOS and EOS are represented by m 4 and m 6 , respectively, and are found at the point along the spring growth and fall senescence trajectory where the model's slope is steepest (i.e., maximum (SOS) and minimum (EOS) of the first derivative of Eq. (10)). Start of green-up, maturity, and dormancy correspond to the model's inflection points (i.e., local maxima of the second derivative of Eq. (10)). Length of season is the number of days between SOS and EOS. Other characteristics that could be used to describe the phenology for an individual time series include the seasonal amplitude (i.e., m 2 − m 1 ) and the slope at the SOS and EOS (m 3 and m 5 , respectively).

Evaluation
Fusion-derived LSP and phenometrics were compared with estimates from near-surface cameras and ground-based observations. Near-surface estimates came from the PhenoCam Dataset v2.0 (Seyednasrollah et al., 2019) and ground-based observations came from field data provided by the Oregon Department of Fish and Wildlife (ODFW; ODFW, unpublished data). To compare the fusion-derived estimates with PhenoCam and ODFW data, the pixel nearest the location of each PhenoCam or ODFW site were extracted. Scenes overlapping PhenoCam and ODFW sites include Landsat WRS-2 path 45/ row 29, path 44/ row 30, path 43/ row 30, path 43/ row 28, and path 42/ row 28.
The PhenoCam project is a network of digital cameras that observe near-surface conditions across North America (http://phenocam.unh. edu). While several PhenoCams are located within the region, only four cameras had an adequate number of observations for 2017. The four cameras are burnssagebrush, oregonMP, oregonYP, and cafcoo-keastltar01, referred to hereafter as PhenoCam-1 through 4, respectively (Fig. 4). The four cameras are located within land-cover types including sagebrush steppe (PhenoCam-1), ponderosa pine forest (PhenoCam-2, PhenoCam-3), and agriculture (PhenoCam-4). The camera located at the agriculture site did not have observations recorded before the time when the crop was planted in 2017, so values from the dormant period after the harvest were used to estimate conditions in early 2017.
Several preprocessing procedures were performed for the PhenoCam data (downloaded from https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_i d=1674) before their utilization in this analysis; further details can be found in Seyednasrollah et al. (2019). Green Chromatic Coordinate (G cc ), a canopy greenness metric, is the most similar metric to NDVI produced with the PhenoCam data and has been frequently used to compare LSP from satellites and PhenoCams Richardson et al., 2018). Green Chromatic Coordinate is defined as where the subscript DN indicates the digital number and R DN , G DN , and B DN are the red, green, and blue bands, respectively. The double-logistic smoothing and phenometric extraction techniques were applied to the prepared G cc time series with the method described in Section 2.3.2. Field phenology data contributed by ODFW were collected at the Starkey Experimental Forest and Range (SEFR), located in the area overlapping Landsat WRS-2 scenes at path 44/ row 29 and path 43/ row 29 (Fig. 1). A total of 11 sites in three plant-community types were monitored at two-week intervals starting April 6th, 2017 and continuing through November 11th, 2017. The plant-community types included grassland, open pine, and mixed conifer (4, 4, and 3 sites, respectively). At each site, a 25-meter transect was established containing four onemeter square subplots. Within each subplot, the understory aerial cover was recorded in 20-percent intervals for four plant functional groups (forbs, grasses, deciduous woody, and evergreen woody). Additionally, the percent of plants in each of three phenophase categories (green-up, vegetative, and cured) was recorded in 20-percent intervals for each functional group. Therefore, the proportion of a given functional group in each phenophase is weighted by that functional group's total aerial cover at that time. The green-up phase included plants with any new growth (e.g., the onset of leaf greening, young leaves, increasing leaf size); the cured phase included plants with senescent leaves or other vegetative parts (e.g., loss of pigment, leaf drop, cured plant parts); and the vegetative phase included plants that fell between green-up and cured (e.g., fully green leaves, elongated stems, no longer putting on new growth).
The ODFW data of aerial cover and phenophase were summarized at  the transect level to represent a scale closer to a Landsat pixel. To accomplish this for a given functional group and phenophase (e.g., forb in green-up), the subplot-level values for aerial cover class and phenophase class were first converted to the midpoint of their respective ranges (i.e., 10% represents the 0-20% class). Next, midpoint values for aerial cover and phenophase were multiplied together at the subplot level. Finally, these values were averaged across the 4 subplots within a transect. The resulting values represent a summary of the proportion of each functional group in each phenophase relative to the cover of the functional group.

Spatio-temporal image fusion
Combining all available cloud-and snow-free Landsat images with the GEE image fusion predictions resulted in a substantial increase in the number of 30-meter observations. Within the two scenes for which GEE image fusion was evaluated, the final time series had a median of 215 observations in each pixel between DOY 1 and 300, with the middle 90% of pixels containing between 53 (5th quantile) and 248 observations (95th quantile) for this period. This temporal range (i.e., DOY 1 -300) captures the region's growing season, with a buffer on either side. Pixels associated with the lower range of these values corresponded to high elevation areas with persistent snow and cloud cover.
Visual comparison of the GEE image fusion predictions with true Landsat images from the same date showed relatively strong correspondence (Fig. 5). Areas where differences were most apparent corresponded to conditions of especially high-spatial or high-temporal variability, such as areas with forest and grassland in close-proximity and areas where surface water was ephemerally present. This was not unexpected as an accurate prediction of abrupt or rapid changes is a known challenge for image fusion techniques (Gao et al., 2006;Zhu et al., 2010).
Overall, the images predicted with the GEE image fusion strongly corresponded with the true Landsat observations from the same date (Table 1, Fig. 6, Appendix 2). Predicted images had the lowest correlation to true Landsat images in the early growing season (e.g., the image from DOY 94 at path 43/ row 30). The evaluation metrics calculated on a scene-wide basis also showed that the GEE image fusion performed better during peak growing season than during earlier or later periods of the year. The predictions had negative bias and SRB, overpredicting NDVI in five out of the six images. Overall, SRB ranged from 0.07 to − 0.54 and was best during the peak of the season. As expected, variability in the predictions was similar but slightly lower than variability in the true Landsat images (an artifact of downscaling course-resolution data). Differences in variability in the evaluation images follows seasonal trends also present in RMSPE, bias, and SRB.
When compared to the locally run ESTARFM algorithm, total processing time between the two methods differed substantially. On average, the GEE fusion completed in 105.83 min and the ESTARFM processing completed in 298.41 min. Evaluation metrics demonstrated similar performance between the GEE fusion and locally run ESTARFM methods (Table 1).
When predicted images were evaluated by the four land-cover classes, a seasonal trend was also present in the RMSPE, bias, and SRB (Fig. 7). Predictive accuracy varied by land-cover type, with shrubland and grassland generally performing better across the year in terms of RMSPE and bias. However, when the class variability is considered (i.e., SRB), difference in performance between land cover classes was less noticeable. While forest and agriculture classes tended to have higher variation in bias across the year, the SRB is more similar across classes because variability of NDVI is lower in grassland and shrubland. This is not unexpected as forest and agricultural areas are more spatially heterogenous than grassland and shrubland areas in this region. GEE fused images tended to overpredict rather than underpredict, regardless of the cover type. As found in the scene wide evaluation, ESTARFM produced similar results across the four different land-cover types (Fig. 7).

PhenoCam
General temporal trends in the double-logistic curve fit were consistent between the GEE image fusion and PhenoCam datasets (e.g., Fig. 8). However, not all NDVI-derived phenometrics estimated from the GEE image fusion dataset reliably aligned with the G cc -derived phenometrics estimated from the PhenoCam datasets. Both datasets resulted in similar SOS but different EOS for the PhenoCam located in sagebrush steppe near Burns, Oregon (PhenoCam-1; Fig. 8).

Table 1
Image evaluation results for each DOY and scene combination. A total of 5000 pixels were sampled from each image to calculate bias, correlation, RMSPE, and SRB. All pixels were used to calculate the variance of each image. Italicized values inside parentheses correspond to the ESTARFM results while the GEE image fusion results are in a regular typeface. The PhenoCam in agricultural lands near Moscow, Idaho (Pheno-Cam-4) showed little similarity in the predicted SOS and EOS metrics (Table 2, Appendix 3). However, the image fusion dataset did show a similar trend in declining NDVI at the EOS. Some of the differences in predicted metrics at this site are likely attributable to greening in the early season not recorded by this PhenoCam (this site had early season values imputed). As other authors have noted, conifer forests typically do not have distinct transitions between the dormant and active states. The double-logistic model is not suited to reliably capture such seasonal trends (Nijland et al., 2016). Accordingly, the phenometrics from the two PhenoCam sites located in conifer forest near Bend, Oregon (Phe-noCam-2, PhenoCam-3) did not align well with the NDVI-derived phenometrics (Table 2, Appendix 3).

Starkey experimental forest and range
The fusion-derived SOS corresponded with SEFR understory observations at a few of the open pine sites (Fig. 9, transects 4, 6, and 9), for which the estimated SOS occurred when the dominant functional groups (grass, forb) were primarily in the green-up phenophase. Because phenology monitoring at SEFR did not start until April 6th, 2017, the true beginning of the season and growth initiation was not captured for the SEFR grassland sites (Fig. 10). The fusion-derived EOS and dormancy closely aligned with field-observed patterns of senescence and temporal trends in dominant functional groups. EOS most frequently occurred when the proportion of plants putting on new growth rapidly decreased and right before or during the time when the proportion of cured plants increased. Dormancy coincided with the time of year when Fig. 7. Performance of the GEE image fusion and locally run ESTARFM within the four NLCD land-cover types and within the scenes located at A) path 43/ row 29, and B) path 43/ row 30. The x-axis shows the DOY in which the model performance was evaluated for each scene (3 DOY's/scene). Performance metrics evaluated for each scene-date combination include RMSPE, bias, and SRB. T.C. Nietupski et al. the proportion of cured plants was at or near its maximum. As expected, correspondence between the fusion-derived phenometrics and the SEFR understory field observations was lowest for the SEFR mixed-conifer sites (Appendix 4). Correspondence is expected to decrease with increased mature conifer canopy cover as conifer dominance will obscure the signal exhibited by understory plants.

Discussion
This study explored the utility of GEE in implementing an ESTARFMlike image fusion technique that was applied to estimate LSP at a 30-meter resolution. When estimates of LSP are derived exclusively from MODIS or Landsat, they are limited by the spatial or temporal characteristics of these data. As a result, intraannual LSP from MODIS is limited to 250 to 500-meter resolutions (Zhang et al., 2018(Zhang et al., , 2003, while higher spatial-resolution estimates from Landsat are often limited to multiyear averages (Melaas et al., 2016(Melaas et al., , 2013. Both approaches serve to characterize different aspects of phenology but may be constrained in their application to processes that fit within their respective spatial or temporal domains. Recently launched satellites like Sentinel-2 and the creation of the Harmonized Landsat Sentinel dataset (Claverie et al., 2018) have created new opportunities for estimating LSP with high-spatial and high-temporal resolution (Bolton et al., 2020;Gao et al., 2020;Zhang et al., 2020). However, these data lack the long archive available with sensors like Landsat or MODIS, which allow for LSP estimates from the past two decades or more.

Spatio-temporal image fusion
This study demonstrated that spatio-temporal image fusion implemented on a cloud-computing platform can produce accurate image predictions throughout the growing season. Implementing image fusion in cloud-computing environments can increase the accessibility of image fusion datasets for enhanced prediction of intraannual phenology. The  evaluation showed that our implementation of image fusion on GEE had similar accuracy to ESTARFM for a heterogenous landscape, both when considered as a whole and within particular vegetation classes. The accuracy reported in this study was also within the range of accuracy reported by other studies evaluating image-fusion algorithms Liao et al., 2016). Early season predictions from GEE image fusion were the least correlated with true Landsat images, which is likely attributed to the rate of vegetation change in early season or the length of time between image pairs and predicted images (Fig. 2, Appendix 1). The linear-change assumption of the model is less reliable when time between predictions is greater. Implementation on GEE also substantially decreases the processing time required for large-scale imagefusion tasks, processing evaluation images 2.8 times faster than the locally run ESTARFM algorithm. In testing the GEE image fusion algorithm, a year and a half of image predictions completed in approximately the same time as reported for a year of image predictions produced with the STARFM algorithm ; this is a significant reduction, considering that other comparisons show that ESTARFM processing takes longer than STARFM (Rao et al., 2015). Unlike the ESTARFM algorithm, the GEE image fusion method uses all MODIS pixels with the moving window to determine the conversion coefficient. Predictive performance in heterogenous landscapes for ESTARFM draws from this conversion coefficient and its relation to principles of spectral unmixing (Zhu et al., 2010). Specifically, the conversion coefficient accounts for the land-cover rate of change by estimating change within a land-cover class relative to the change in an individual MODIS pixel. Instead of considering change relative to an individual MODIS pixel, our implementation determines this coefficient for the entire window. This allows the relative rate of change to remain local to the window while providing a more generalized approximation based on the relationship between all MODIS and Landsat pixels in a cover class. In both panels, lines may overlap for functional groups with little aerial cover (i.e., those with cover recorded in the 0-10% class or low proportion in a given phase; e.g., shrub in transect 5 or, at day of year 100 in transect 5, grass in the cured phase).

Land surface phenology
The comparison between image-fusion and PhenoCam phenometrics showed mixed results. While general trends in phenology held for shrublands, phenometrics such as SOS and EOS were not similar between the datasets. In a worldwide study, Richardson et al. (2018) found that NDVI-derived estimates of the SOS from MODIS in agriculture and grassland differed by 25 and 11 days with standard deviations of 27 and 15 days, respectively. Differences in the spectra used for NDVI (image fusion) and G CC (PhenoCam), sun-sensor angles, cloud-or snow-cover effects, and field-of-view could all contribute to phenometric differences between the datasets. This study's use of the MCD43A4 product could have also impacted the similarity of image-fusion-derived phenometrics and PhenoCam phenometrics or field data. MODIS NBAR data are corrected to account for the pixel level anisotropy using a BRDF model calculated from a 16day window (Schaaf et al., 2002). Accordingly, any surface changes (e. g., vegetation growth/senescence, disturbance) that may have occurred over this time frame could impact the BRDF model and the adjusted reflectance. Depending on the surface conditions over the 16-day window, the BRDF adjusted reflectance could result in earlier or later NBARbased phenometrics compared to either PhenoCams or field observations. However, the benefits correcting view angle effects outweighed the potential phenology related drawbacks.
A unique characteristic of the SEFR field data was that vegetation was assessed for functional groups as opposed to individual species. Temporal trends in functional groups coincided with LSP estimated from GEE image fusion at the grassland site and most of the open-pine sites. Additionally, EOS estimates derived from the GEE image fusion dataset closely corresponded with the dates when the dominant functional groups transitioned to a senescent state. Some LSP studies have found good SOS correspondence with field data but poorer correspondence with EOS estimates Ganguly et al., 2010). Fieldbased estimates of SOS and EOS typically do not account for Fig. 10. The SEFR transects sampled in grassland sites for the 2017 growing season. The top panel shows percent cover by functional group at the four sites (transects). The bottom panel shows the proportion of vegetation in each phase by functional group. In both panels, vertical lines show the phenometrics estimated from the GEE image fusion's NDVI time-series, including start of green-up (SOG), start of season (SOS), end of season (EOS), and dormancy (Dorm). In both panels, lines may overlap for functional groups with little aerial cover (i.e., those with cover recorded in the 0-10% class or low proportion in a given phase; e.g., shrub in transect 5 or, at day of year 100 in transect 5, grass in the cured phase). herbaceous understory vegetation when making comparisons to satellite-based estimates (Nijland et al., 2016). In the SEFR open pine and grassland sites it appeared that this herbaceous component was indicative of the EOS and dormancy metrics estimated from the satellite data ( Fig. 9, Fig. 10). However, this herbaceous understory signal was obscured when overtopped by high tree canopy cover (e.g., mixedconifer sites; Appendix 4).
Our validation with ground-based observations and near-surface camera data highlights some of the challenges and opportunities with applying these datasets for LSP studies. Comparisons between satellite LSP and PhenoCam estimates may also benefit from the utilization of sensors in PhenoCams that capture wavelengths more similar to those used in multispectral satellites. For example, sensors with a nearinfrared band are more capable of observing vegetation structural changes Luo et al., 2018;Petach et al., 2014) and these types of sensors have been demonstrated to provide reasonable correspondence with LSP from multispectral satellite observations (Eklundh et al., 2011).
The comparability of field-based observations to LSP has historically been limited due to mismatches in scale and observation method (Hufkens et al., 2012); for example, budburst of a species vs. aggregate vegetation within a pixel. Mismatches in scale and observation method may be partially resolved by downscaling LSP to resolutions more attainable by field studies. Correspondence between satellite LSP and functional group phenology warrant further investigation to assess if field data collection of functional groups is preferable to species-level assessments.

Future directions and potential applications
Implementation of an ESTARFM-like image fusion algorithm in GEE was not without its challenges. Moving windows are not ideal for a platform designed to divide datasets into smaller units for parallel computation. Recently, Moreno-Martínez et al. (2020) implemented a bias-aware Kalman filter method in GEE for fusion of Landsat and MODIS, which produces a monthly gap-filled 30-meter product across the United States. The integration of deep learning libraries like Ten-sorFlow  with GEE also offers new possibilities for further innovation in image fusion methods, potentially building upon current deep learning methods for image fusion (Song et al., 2018), while taking advantage of the cloud computing infrastructure of GEE.
The phenometrics extracted from the GEE-image-fusion time series represented some grassland, shrubland, and open-pine patterns in this study. However, not all landcover types were adequately captured, such as dense conifer forest and agriculture. Sensitivity of NDVI to fluctuations in vegetation biomass or the double logistic function's ability to adequately model the development patterns of vegetation may account for challenges in these landcover types. As opposed to using a single index like NDVI, a multivariate perspective of phenology could be considered, which would allow for temporal patterns of different spectral bands or indices to be captured simultaneously (Pasquarella et al., 2016). Many other methods for phenometric extraction exist (Zeng et al., 2020) and could be used for vegetation with temporal patterns that are difficult to characterize with a double-logistic function. For example, in agricultural areas it may be necessary to employ a method capable of characterizing multiple crop cycles within a single year.
Capturing intraannual LSP may also help ecologists studying rapidly changing ecosystems, like those where species invasions lead to shifts in plant community composition (D'Antonio and Vitousek, 1992;Gunderson, 2000;Kerns et al., 2020). Phenometrics from GEE image fusion could be used to identify the distribution of exotic annual grasses. The distinct phenological traits (Wallace et al., 2015) and rapid colonization from Ventenata dubia in the interior Pacific Northwest (Kerns et al., 2020) may present this methodology's ideal application. Phenological patterns have been employed to identify and map populations of other invasive annual grasses in the western United States, however data used in these studies was limited in spatial resolution (Boyte and Wylie, 2016;Bradley et al., 2017). Limited spatial resolution presents a challenge in identifying small grass populations across spatially heterogenous landscapes. Additionally, annual grass species strongly respond to year-toyear climatic variation (Pilliod et al. 2017) and represent a substantial proportion of surface canopy cover in some shrubland, grassland, and open pine communities (D'Antonio and Vitousek 1992). Thus, in temporally dynamic situations, intraannual LSP estimates may be useful. The availability of high-spatial and high-temporal resolution estimates of LSP over the last two decades may also provide an opportunity to track the unique characteristics of this rapid plant invasion over time.

Conclusions
This study addressed two challenges in estimating vegetation phenology at ecologically meaningful spatial and temporal scales. Employing spatio-temporal image fusion on a cloud-computing platform like GEE is feasible and can produce high-quality predictions in reasonable timeframes. We found that GEE image fusion predictions were accurate and similar to those produced by a locally run version of ESTARFM. Furthermore, using a time series enhanced with the additional data from the image-fusion process can lead to LSP estimates that coincide with plant development patterns in shrubland, grassland, and open-pine land-cover types. We showed that satellite-LSP estimates aligned with the phenology of dominant functional groups at some field sites. However, we found that NDVI-derived phenometrics from satellite data and G CC -derived phenometrics from PhenoCam data did not closely align, which could be attributed to spectral, sun-sensor angle, and cloudcover differences between the two datasets.
Characterizing LSP in semi-arid regions like the interior Pacific Northwest continues to be challenging, especially within conifer dominated areas and when cloud cover prevents surface observation in the early growing season. Incorporation of data from satellites like Sentinel-2 and continued efforts to improve image fusion on cloud-computing platforms can help to overcome some current data limitations. Capturing intraannual vegetation development patterns can continue to provide insight into meaningful and useful processes as they play out across landscapes now and in the future.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.