Quantifying vegetation response to environmental changes on the Galapagos Islands, Ecuador using the Normalized Difference Vegetation Index (NDVI)

The vegetation of the Galapagos Islands (Ecuador) is strongly influenced by climate. El Niño events, seasonality, isolation, volcanism, and increasing human activity define the ecosystems of the archipelago. Given their socio-cultural and economic importance, it is critical to monitor the response of Galapagos vegetation to changes in climate and assess its vulnerability. This study explores the potential to use Normalized Difference Vegetation Index (NDVI) as a proxy to describe trends in primary productivity in the Galapagos (2000–2019) and models the relationship between NDVI and climate variables including evaporation and atmospheric carbon dioxide concentration. From numerous possible co-variates compiled from reanalysis and satellites, we identify the independent variables that most strongly influence NDVI using the least absolute shrinkage and selection operator (LASSO) method. Significant variables, including carbon dioxide concentration, evaporation, and autocorrelation (1-month and 12-months lagged NDVI) are then used to model NDVI in a generalized linear model (GLM) framework. The model predicts NDVI more effectively where values for NDVI are high (high elevation, lush vegetation), and clearly reflects seasonality. Validation of the model across pixels produces R 2 values ranging from 0.05 to 0.94, and the mean R 2 is 0.57 (0.65 for elevation >20 m). This methodology has the potential to continuously and non-intrusively monitor vegetation changes in sensitive ecological regions, such as the Galapagos.


Introduction
The Galapagos Islands, located ∼1000 km off the west coast of Ecuador in the equatorial Pacific, are world famous. According to the United Nations Educational, Scientific and Cultural Organization (UNESCO), these islands and the surrounding marine reserve are a unique 'living museum and showcase of evolution' [1]. Island ecosystems often boast uniquely diverse biota, due to their isolation and micro-climates, which can drive species endemism [2]. Even so, the Galapagos have an exceptional scientific legacy. Charles Darwin conducted field research in the Galapagos in 1835 that helped inform his theory of evolution by natural selection-a theory that has fundamentally changed scientific understanding of biological diversity [3]. The organisms he studied and chronicled in On the Origin of Species [4], including tortoises and finches, are legendary; they are textbook examples of adaptive radiation and provide living evidence of evolution in progress [1]. They also help draw approximately 170,000 annual visitors to the islands, bestowing considerable economic importance to the Galapagos [1].

Study region
The Galapagos archipelago (figure 1) sits on the Equator and extends for approximately 260 km E-W (90°01′ W to 89°16′ W) and approximately 200 km N-S (1°40′ S to 1°36′ N) [17], with a total area near 52,000 km 2 . It is composed of 128 named islands [6]; four of the islands, including the largest, Isabela, are inhabited, with a combined population of approximately 30,000 [1,18]. The archipelago is volcanic in origin and geologically young; The oldest islands formed 3-6 million years ago (mya) and the youngest formed 0.05-0.5 mya [19]. Lavas represent ∼44% of land area.
Monthly average daytime/nighttime surface temperatures range from 24-42°C/14-23°C, respectively, with average diurnal temperature swings of 14°C. Average precipitation varies from 88-263 mm/year and is largely seasonal, driven by the interaction of nearby air and sea currents. From January-May, the Inter-Tropical Convergence Zone (ITCZ) is to the south of the Islands and the Panama Current brings tropical heat and rain to the Galapagos. [5]. From June-December, the ITCZ moves north of the Galapagos and the Humboldt Current keeps the archipelago unusually cool and dry for its latitude [5,6]. Garúa (misty/drizzly air blown inland and upslope from the ocean) is characteristic of this period, and rainfall is uncommon [22]. El Niño events, which occur every 2-8 years [5,23], resemble sustained rainy seasons in the Galapagos. Along with volcanism, ENSO is primarily responsible for interannual climate variability in the archipelago. Of vegetated land in the Galapagos, ∼61% is dry forest, which dominates the lowlands. ∼21% is evergreen forest and scrubland, which occurs at higher elevations (>200MASL). A study of the Galapagos National Park found that ∼54% had vegetation cover characteristic of native ecosystems, while ∼2% was dominated by invasive species [2]. The dominant species associated with dry and evergreen forest ecosystems are summarized in table A1-2 in the appendix.

Normalized difference vegetation index
Many Earth-observing satellites are equipped with sensors designed to measure near-infrared (NIR) and red reflectances (ρ). These are converted into NDVI [24] as: NDVI values range from −1.0 and 1.0 [25]. High values indicate a higher density of green vegetation, low values indicate scarce, moisture-stressed, or dryland vegetation [26,27], and values close to 0 are likely to  [20]; and 'World ocean base'. Sources: Esri, Garmin, GEBCO, NOAA NGDC, and other contributors [21].) correspond to non-vegetated surfaces (such as bare soil, urbanized areas, and exposed rock/lava flows). Negative values correspond primarily to clouds, water, or snow.
Average NDVI in the islands varies from −0.04 to 0.78. Lower values of NDVI are found on Isabela, Fernandina, Santiago, and Marchena Islands. These values correspond to active volcanic areas near the shore, which also have lower elevations and predominantly arid climates. Higher NDVI values are mainly associated with elevations 200-800 MASL. Rivas-Torres et al (2018) [2] classified the mean values of NDVI across the diverse ecosystems of the Galapagos Islands, and showed that values 0-0.2 corresponded to rocky outcrops and/ or sparse, deciduous vegetation; values from 0.2-0.59 represented deciduous vegetation, and values higher than 0.6 indicated dense and evergreen vegetation. Zones of mixed vegetation (such as transitional zones) had intermediate mean NDVI values [2]. Section 3.1 gives additional details on the distribution of NDVI across the archipelago.

Independent variables
The set of independent variables used in this study was divided in 5 categories: air composition, atmospheric state, soil, ENSO, and topographical.
Variations in the concentration of air molecules and particles, which can potentially affect vegetation, are both natural and man-made. The concentration of carbon dioxide, a major greenhouse gas, varies due to anthropogenic emissions, biological activity, and air-sea fluxes. Carbon monoxide is associated with fires, as well as transport of polluted air from industrial areas. Dust is carried by wind, and can be generated by lofting from volcanoes or deserts. It can negatively affect plant photosynthesis, or in some cases supply valuable nutrients. Ozone is another species whose concentration is affected by anthropogenic activity and atmospheric transport pathways and that in high concentration can negatively affect plant development by oxidizing tissues exposed to it. These variables are listed in table 2 as variables 1-4.
The atmospheric variables include day and night air and surface temperatures. Excessively high temperatures stress plants and increase respiration rates. Wind can affect plant water loss rate and physical integrity, depending on the direction, velocity, and duration. The water cycle is very important in vegetation growth. The amount of rain (precipitation) is a key component of ecosystem water balance, along with the amount of water that evaporates from earth or plants. The variables thus considered are listed in table 2 as variables 5-12.
Water from precipitation can be stored by the soil at different rates depending on the type of soil. This is known as soil moisture. Along with the temperature of soil at different depths, this influences the amount of water and nutrients plants can absorb. Soil variables are listed in table 2 as variables 15-18. The El Niño-Southern Oscillation (ENSO) includes the warm (El Niño) and the cool (La Niña) phases of a recurrent climate pattern across the tropical Pacific. Affecting the air currents and the ocean temperature, it can impact the growth of plants. Indices of ENSO are listed in table 2, variables 19-20.
The topographical variable consists of an elevation data set, which allows identifying the elevation ranges where vegetation is generally more dense. In addition, from the gridded elevation we can calculate aspect (slope face direction) and slope (change in elevation per unit horizontal distance) over the archipelago, which could potentially affect light and water fluxes and hence local temperature and moisture status. This is listed in table 2 as variable 21.

Datasets
Data were obtained from the following sources: The Moderate Resolution Imaging Spectroradiometer (MODIS) is a sensor on board the Terra and Aqua satellites, which were launched in 1999 and 2002 respectively. As polar-orbiting satellites, they move around the Earth in a north-south orientation, with Terra crossing the equator in the morning and Aqua crossing in the afternoon. Terra and Aqua observe the entire Earth surface every 1-2 days, acquiring data in 36 spectral bands (ranges of wavelengths) [28]. The MODIS Collection 6 products provide vegetation index (VI) values on a pixel basis using blue, red, and near-infrared reflectances with a spatial resolution of 0.05°latitude/longitude (5,600m at the equator). These MODIS products are a monthly composite of cloud-free spatial Level 3 products. The NDVI products, used independently in this study, are MOD13C2 (Terra) and MYD13C2 (Aqua). The new Enhanced Vegetation Index (EVI) is another product from MYD13C2, and uses the blue band to remove atmospheric contamination, minimizes canopy variations, and maintains sensitivity over dense vegetation. NDVI is much more commonly used as a vegetation index in the Galapagos and other tropical research, compared to EVI [2,9,10]. MODIS products also include land surface temperature and emissivity (MOD11C3), both of which average the corresponding daytime and nighttime observations over each month.
The Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) incorporates and synthesizes data gathered since 1980 within a global climate model framework. MERRA-2 incorporates space-based observations of aerosols to represent their interactions with other physical processes in the climate system, and has a spatial resolution of 50 km in the latitudinal direction [29]. From the different products that MERRA-2 produces, we selected 15 products, including meteorological, atmospheric, and geologic variables that have the potential to affect vegetative phenology (as summarized above, and listed in table 2).
The Tropical Rainfall Measuring Mission (TRMM) (1997-2015) was designed to improve the understanding of the distribution and variability of precipitation within the tropics [30]. The TRMM-3B43-7 monthly precipitation product is used, which is based on data from TRMM 3B42 product (3-hours merged microwave and infrared based precipitation (mm/hr), calibrated to station precipitation gauges).
The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Digital Elevation Model Version 2 (GDEM V2), released on October 2011, has a coverage from 83°N-83°S latitude, spanning 99% of Earth's landmass, with a resolution of 30 meters [31].
Monthly CO 2 dry air mixing ratio from Mauna Loa, Hawaii (20°N) is used to quantify the potential impact of global carbon dioxide (CO 2 ) concentration on the vegetation of the Galapagos Archipelago.
MEI (Multivariate ENSO Index) and ONI (Oceanic Niño Index) are used to quantify the effect of ENSO over the islands. ONI, an index of the National Oceanic and Atmospheric Administration (NOAA), uses the sea surface anomaly for the Niño 3.4 region in the Equatorial Pacific, with El Niño defined as when the anomaly exceeds +0.5°C for three months. MEI is a weighted average of the anomaly of six meteorological variables as associated with ENSO: sea surface temperature, sea level pressure, surface air temperature, surface wind (meridional and zonal components), and cloud fraction [32].  [33][34][35]. All other datasets presented in table 2 (below) and NDVI datasets are spatially resolved and were downloaded from NASA's Giovanni web interface [36]. All data represent monthly values, except for the digital elevation model (DEM) which is temporally constant. Data were processed in MATLAB v9.4.

Methods
The methods used in this paper to model the spatiotemporal patterns in NDVI are the Least Absolute Shrinkage and Selection Operator (LASSO) regression for selecting predictor variables, followed by the generalized linear model (GLM) for fitting the selected variables. First, we preprocessed each dataset by filling in missing values using nearest-neighbor or linear interpolation, as long as the percentage of the missing data did not exceed 5% of the study period (≈11.5 months). Filled data represent 13.6% of the total study area. Then the remaining selected pixels are used as a mask to eliminate independent variables that have more than 5% of missing data. Each dataset was spatially regridded from its original resolution (shown in table 2) to match the NDVI resolution (0.05°≈5.6 km). The regridded method used is the nearest neighbor interpolation. Then, each series was standardized using the mean and standard deviation for that pixel. In addition to the original set of 20 variables (table 2, variables 1-20), lags of 1 month, 3 months, 6 months, 9 months, and 12 months was applied to the NDVI, soil moisture, and precipitation data to consider possible relationships between long-term moisture accumulation, stress, and vegetation. Finally, all data sets were divided into two parts. The first 183 months of the data set (80% of the data) was used for calibration of the model, and the most recent 49 months (20% of the data) was used in validation of the model. The model was additionally tested using two other subsets, 70%-30% and 60%-40% for calibration-validation respectively (table 4). Table 2. List of explanatory independent variables. Each independent variable is numbered, named, and presented with its unit, source, spatial resolution, and product name. The temporal resolution of NDVI and potential independent variables is monthly, expcet for topography which is constant in time. The table is divided in sections. The first row is the dependent variable, followed by the numbered rows of independent variables categorized under air molecules and particles, atmosphere, soil, ENSO, and topography.
Air molecules and particles For calibration, we used a regression-based technique known as LASSO [37,38], to identify the best set of independent variables that can explain NDVI variability while dealing with multicollinearity between the possible independent variables for each pixel. The LASSO method assumes a linear relationship between dependent and independent variables, with Gaussian noise, but constrains the L1 norm of the regression coefficients in the least-squares optimization. The inclusion of the L1 constraint (α=1) results in the shrinkage of certain coefficients to zero, hence providing a way to obtain the best subset of independent variables. The variables associated with the coefficients that were shrunk to zero were dropped, giving a final set of 10 independent variables.
Then, we developed a generalized linear regression model, which used maximum likelihood [39] to model NDVI with independent variables selected from the LASSO analysis along with functions of time to account for seasonality and interannual variability. The GLM used a Gaussian link function, which makes our model equivalent to a linear model. Autoregressive (AR) spectrum analysis was used to detect frequencies in the NDVI and select the peak NDVI cycle amplitudes. Thus, the model took the following form: where i indicate that the values are different for each grid point, β 0 is the intercept, β i (α i ) is the independent variable (frequency) coefficient, x ji is an independent variable, ω i is a frequency component, and ò i is an error term, assumed to be a Gaussian distributed random variable. Depending on the number of frequencies present in the data, the model could use up to three sine components. Different combinations of the ten possible independent variables, including lags, were individually tested, and p-values were calculated. A series of GLMs with a minimum of one independent variable and a maximum of ten (including autocorrelation) were constructed and run; the models with higher correlation with the NDVI trend, using the least number of independent variables and best p-value were selected. Models were compared using adjusted R 2 (hereafter referred to as R 2 ) and a robust linear regression model. The robust regression model used iteratively reweighted least squares (IRLS) to assign a weight to each data point. This weight was assigned equally to each data point in the first iteration and model coefficients were estimated using ordinary least squares. At following iterations, points further from the model predictions in the previous iteration are given lower weight, then model coefficients are recomputed using weighted least squares. This process continued until the values of the coefficient estimates converged within a specified tolerance. This weighting ensured that the final model was not much affected by outliers [40] and allowed us to identify a smaller set of independent variables that robustly improve NDVI prediction.
Finally we compared the model-predicted NDVI with the observed data at each pixel, and we validate the model using the second part of the data. Figure 2 shows the overall workflow.

Results and analysis
3.1. Description of NDVI data Figure 3(a) shows the spatial distribution of the average NDVI value, and figure 3(b) shows the standard deviation (SD). The SD values reflect the temporal variability of NDVI at each pixel. The maximum standard deviation is 0.18 (in NDVI units). Regions with low NDVI values also have low standard deviations, as shown in figure 3(b). However, higher standard deviations are not necessary indication of higher NDVI values.
To get a general overview of the NDVI trend for the years of this study, NDVI values were averaged spatially, giving a mean NDVI value for the whole archipelago per month. Figure 3(c) shows a linear trend and a two stepchange model for mean NDVI, along with a clear seasonal cycle. Using a linear regression on the whole period, mean NDVI shows a 1% annual increase (R 2 =0.065, p-value=5.73E-05, RMSE=64.1). Two significant change points were found using Pettitt's test. In significance order, one occurred at January 2010 (month=120, p-value=2.36E-08) and the other occurred at June 2003 (month=41, P-value=3.50E-03). During the first period, the NDVI average value was 0.3447 for 3.33 years (40 months). Then, during the second step, NDVI decreased, with an average NDVI value of 0.3089 for the next 6.58 years (79 months). Then, for the last step, NDVI increased, with an average NDVI value of 0.3673 for the next 9.17 years (110 months) figure 3(d) shows the seasonal cycle of NDVI. NDVI values are higher from January to June (max: 0.33-0.48 in May [interannual range]), which is a warm and wet season, and lower from July to December (min: 0.26-0.36 in October), which is a cool and dry season. The dry season has lower interannual variability in NDVI, while the wet season has higher interannual variability.
In figure 4, topographical features such as aspect, slope, and elevation are compared with average NDVI to observe relationships between vegetation and the island topography. Figure 4(a) (left) shows the spatial distribution of the aspect angle over the whole archipelago, and (right) the radial distribution in degrees of each pixel. There is not a clear relationship between aspect and NDVI values.
shows the spatial distribution of the angle of the slope over the whole archipelago, and (right) the scatter plot of the dependence of NDVI to the slope angle. R 2 =5.29% (p-value=5.01E-6). Elevation has the clearest relationship with NDVI. There are two marked elevation zones where NDVI values are very high (over 0.6). One is between 2-16 MASL and the other is between 250-800 MASL (figure 4(c)). Higher variability of the seasonal trend in NDVI is observed at elevations lower than 20 MASL. Because of this, later analysis is divided in two elevation ranges. One elevation range includes the pixels with an elevation lower or equal to 20 meters above sea level (MASL) (35% of the area of study) and the other includes pixels with an elevation higher than 20 MASL (65% of the area of study). Elevation is unevenly distributed, resulting in gaps in NDVI data for certain elevation ranges; thus, figure 4(c) shows monthly mean NDVI based on a 100 m elevation interval. Therefore, this figure does not show well trends within the lower elevation range. Even though pixels representing 2-16mof elevation comprise approximately 10% of the data, they are lost when averaged with the rest of the pixels in this range because of the decreasing number of pixels with respect to increasing elevation. Figure 5 shows maximum annual NDVI (mean and median, using dashed and solid red lines, respectively). Monthly MEI values are presented in blue bars, and volcanic eruptions that registered on the VEI scale are presented with black asterisks. Negative MEI values indicate La Niña events, and positive MEI values indicate El Niño events. Figure 5 shows that larger magnitude ENSO events tend to correlate with higher average maximum NDVI values, while minor events (such as those indicated between 2003 and 2008) tend to correlate with lower  overall NDVI values. Higher NDVI values are often found with negative MEI values, suggesting that during these events vegetation may benefit from extended effects of El Nino, even after it passes.

Selection of independent variables
After gap filling and standardization, 368 (374) pixels from Terra (Aqua) were available to be used in the analyses. The ten independent variables selected after lasso regression are CO 2 , evaporation, precipitation, surface temperature day, and lags of 1 month for NDVI and soil moisture, 3 months for soil moisture, and 12 months for NDVI, soil moisture and precipitation. The average R 2 for NDVI was calculated using the independent variables named previously for each elevation region. For pixels with elevation less than or equal to 20 MASL, R 2 =46.88%, and for elevation higher than 20 MASL, R 2 =69.03%. NDVI autocorrelation (NDVI t−1 ), CO 2 , evaporation (E), and NDVI autocorrelation (NDVI t−12 ) were selected as independent variables to build the final model (table 3); each was the dominant independent variable for 68%, 51%, 42%, and 46% of the area of study, respectively. Using these 4 independent variables (section 3.3), the R 2 values changed to 43.28% (ΔR 2 =−3.6% from using all the lasso-selected independent variables); for pixels with elevations 20 MASL, and R 2 changed to 66.41% (ΔR 2 =−2.62% from using all the lasso-selected independent variables) for elevations higher than 20 MASL.

Building of the model and calibration
The following results are based in the calibration set. R 2 values presented on the graphs are based on the mean NDVI, and modeled mean NDVI for the whole archipelago. Figure 8 shows the spatial distribution of R 2 for the two elevation ranges previously mentioned. In the top left panel, shoreline pixels with elevations less than or equal to 20 MASL, are highly variable in R 2 values. The number of pixels in this range is 130, representing 35% of the study area. Pixels with elevations higher than 20 MASL (top right panel) generally have higher R 2 values. For each range, each pixel shows the Table 3. List of independent variables used to build the model and the number of pixels that they influence (expressed as percentage of the study area), RMSE, adjusted R 2 , and P-value for carbon dioxide, evaporation, autocorrelation of one-month and 12-months lag in NDVI for the study period, 2000-2019.  time series of the modeled NDVI, the actual NDVI trend, and their average for the whole archipelago (figure 8, second row). The peaks of the modeled NDVI have lower amplitude than the real observations, but the periodicity and trends match precisely. Figure 9 top panel shows a bar plot of all usable pixels in the archipelago from which R 2 was calculated. The lowest R 2 is 4.5% and the highest R 2 is 93.8%. The values of R 2 range from 4.5%-89% and 16-93.8% in lower and higher elevation zones, respectively. The overall mean R 2 is 58.5%; for pixels with an elevation higher than 20 MASL mean R 2 is 66.32% and for pixels with an elevation lower than 20 MASL mean R 2 is 45.4% .
Variance decomposition was used on the independent variables retained in the final model ( figure 6). NDVI autocorrelation of one month lag explained the largest proportion of the modeled NDVI variance (42%), followed by evaporation (33%), NDVI autocorrelation at twelve months lag (21%), and CO 2 (5%). After subtracting the effect of the NDVI autocorrelation, evaporation explained 83% of the NDVI variance, followed Figure 8. On the left side R 2 values for pixel with an elevation lower or equal than 20 MASL, and on the right side R 2 values for pixel with an elevation higher than 20 MASL. The first row shows the spatial correlation of the normalized difference vegetation index (NDVI) R 2 . Pixels in black represent pixels outside of the area of analysis. The second row shows the time series of the normalized difference vegetation index (NDVI)(light blue) and its average (blue) and the generalized linear regression model by pixel (mustard) and its average (red). These results are for the calibration section of the model, and on the third row are the results for the validation section of the model. by CO 2 , which explained 17%. CO 2 was more important as an independent variable in areas with very low NDVI values, as shown in figure 3 of Rivas Torres et al (2018) [2], corresponding to sites with relatively recent lavas and scarce or absent vegetation. Fernandina Island and Isabela Island, which have the most active volcanoes of the archipelago [41] present good examples of low NDVI values associated with recent and active lava flows [42]. Evaporation increases in importance with the presence of vegetation, but its predictive power is also influenced by the type of vegetation present, which varies across the archipelago. NDVI on Santa Cruz Island, which is one of the main agricultural zones of the archipelago, is highly influenced by evaporation. NDVI autocorrelation (lag-1 month) shows higher predictive power close to shore on Isabela, Fernandina, Santiago, San Cristobal, and Floreana islands. The area with dense vegetation such as between active volcanoes on Isabela and Fernandina islands, an unusual ecoregion, also corresponds to higher values of NDVI autocorrelation (lag-12 months).
Different combinations of independent variables and periodic trends were used in an attempt to improve the predictive power of the model for the entire archipelago. The AR spectrum analysis showed that the peak NDVI cycle amplitudes (in decreasing order) have periods of 17, 34, and 13 months. After a series of iterations, the best fit to the NDVI data included periodic trends with 17 and 34-month periods. These timescales are influenced by ENSO [34]. However, adding these periodic terms to the model only resulted in a slight increase (0.002%) in the average values of R 2 across pixels. Therefore, they were not included in the final model.
Removing the NDVI autocorrelation resulted in a dramatic decrease in R 2 to a maximum of 27.82% for pixels with an elevation 20 MASL and 39.04% for pixels with an elevation >20 MASL (appendix, figure A2-2). . Bar plot of R 2 of the calibration section (top) and validation section (bottom) for the whole archipelago, segmented by two elevation ranges. One segment shows pixels with an elevation lower or equal to 20 MASL in red, and the other segment shows pixels with an elevation higher than 20 MASL in blue. On the right side of the plot, the total number of pixels corresponding to each R 2 % is presented in grey.
In both elevation regions, the model without autocorrelation does not fit with the observed NDVI trend during the period of lower NDVI in 2003-2008. The peaks of modeled NDVI have lower amplitude than the observations, but the periodicity and trends match exactly for pixels with elevations higher than 20 MASL. This match is weaker for pixels with elevations 20 MASL. Figure A2-3 (appendix) shows a bar plot of all usable pixels in the archipelago from which R 2 was calculated. For elevation less than or equal to 20 MASL, R 2 is between 0%-71%. The number of pixels with an R 2 greater than 50% is 26. For elevation higher than 20 MASL, R 2 range is between 0%-75%. The number of pixels with an R 2 lower than 20% is 41. The different ranges of R 2 , 0%-70%, have an average number of pixels of 50.
Because NDVI autocorrelation (one month-lag, and 12-months lag) improved the fit of the model, especially for the period of lower NDVI (2003-2008), NDVI autocorrelation was retained in the model. The final model was reduced to Residuals were temporally independent, indicating that the method produced an adequate fit to the NDVI dataset. Kolmogorov-Smirnov (K-S) test was applied to check the fit of the residuals of each pixel to a normal distribution. Residuals did not follow a standard normal distribution (p-value=6 x 10 −6 ). The mean value of residuals for the whole archipelago is 0.0036; the model very slightly underestimates observed NDVI, which is most apparent in peak values, but otherwise matches the observed trend well. Ten other normality tests [43] (α=0.05) were applied on the residuals. These results are presented in table A1-1. Figure 7 shows the comparison of R 2 with elevation, average NDVI, and NDVI skewness. Figure 7(a) presents the distribution of R 2 based on pixel elevation and average NDVI value. Pixels with R 2 values <50% are generally those with elevation <20 MASL and NDVI values <0.3. These pixels could be representative of disturbed or arid zones. Those pixels with higher R 2 values and elevations generally have NDVI values >0.3, indicating denser vegetation. Figure 7(b) shows R 2 based on the average NDVI for those pixels that have an elevation higher than 20 MASL Pixels show an inverted u-pattern where pixels with lower (<0.15) and higher (>0.7) values of average NDVI tend to have a lower R 2 value. Figure 7(c) shows the distribution of the NDVI skewness for the comparison of R 2 with average NDVI. Pixels that represent extreme average NDVI values likewise have extreme skewness values (<−1 or >1). Based on these results, NDVI under 20 MASL is more variable and not as influenced by the independent variables used in the current model.

Validation
The model was validated using data from January 2016 to January 2018 (the validation subset described in 2.3). The period of analysis is shorter than the full 4 years because of the need to incorporate NDVI autocorrelation (12 month lag) in the model. The bottom left panel of figure 8 shows the average R 2 value for those pixels with elevation 20 MASL is 35.47%, while the average R 2 values for higher elevations is 64.86% (bottom right panel). At higher elevation, the values of R 2 are similar to those values obtained from calibration processes with differences (Δ) <5%. On the other hand, 20 MASL, Δ may be up to 10%. RMSE values are low for all elevations, and lower for elevations 20 MASL at any percentage of the data set. Table 4 shows RMSE and R 2 for three sets of calibration and validation data. Additionally, the model was run with NDVI from the Aqua data set for the same validation period tested using data from Terra (table 4, last two columns). Results for lower Table 4. List of R 2 from calibration (higher% data) and validation (lower% data) using different data sets length. Validation values are compared to NDVI from Aqua data set (last two columns). In the bottom panel of figure 9, a large number of pixels with elevation lower or equal to 20 MASL have an R 2 value that range between 0 to 20% while at higher elevation, many pixels have R 2 values between 70 to 90%. Overall, the maximum R 2 is 75%.

Interpretation of results
Climate change is expected to trigger changes to temperature and precipitation patterns across the globe. In the Galapagos, climate change is expected to create warmer and wetter conditions with stronger El Niño events; all of these are conditions that enhance plant growth and greenness, which can be measured through NDVI. Our results show a statistically significant increase in NDVI of 1% per year over the 19 years of study, which could be a signal of anthropogenic climate change (e.g. changes in temperature, precipitation, or CO 2 concentration), although a change-point detection (step change) analysis shows a decrease in NDVI values from 2003 to 2010, which could be related to ENSO events and volcanic eruptions at that time. Furthermore, small peaks in average/median annual maximum NDVI (2002,2008,2011,2015, and 2017) could reflect the multi-year periodicity associated with ENSO [34]. Importantly, not all the above years correspond to El Niño events, e.g. 2008, but this year is noted as having been unusually warm and rainy in the Galapagos [5]).
Our results show that, out of over 20 potential predictors considered, temporal autocorrelation at 1-month and 12-months lag, evaporation, and carbon dioxide levels are the most important independent variables for modeling NDVI in the Galapagos Islands, yielding values of R 2 for the GLM that range from 5%-93%. Prior research supports the importance of considering temporal autocorrelation in explanatory NDVI models [44]; given the uniquely stable character of Galapagos ecosystems, it is unsurprising that intrinsic variables and/or system memory are important for modeling vegetative dynamics.
We offer conjectural interpretations of some of the spatial patterns seen in the coefficients of our model for NDVI variability. These interpretations are subject to refinement as longer and finer-resolution data sets become available.
Evaporation appears to be an important proxy of NDVI in the ecosystems that exhibit strong seasonality, as shown in figure 6. This is particularly true of the deciduous forests and shrublands across the archipelago (see Rivas-Torres et al (2018) [2], figure 3). In Santiago, Santa Cruz, San Cristobal, Genovesa, and Pinzon, the dominant ecosystem is dry forest, where primary productivity is limited by moisture availability and dependent on seasonal rainfall [45]; importantly, precipitation is extremely variable in the Galapagos (spatially, interannually, and intra-annually) [45] and its relationship to NDVI is not direct-especially in the xeric ecosystems that dominate the archipelago [2,45]. On the other hand, increased evaporation-even at a coarse scale-can indicate overall conditions that are favorable for plant growth, including increased moisture availability, higher temperatures, and increased solar irradiation [5,45,46]. Evaporation and leafiness are positively associated in seasonal forests, because plants only produce larger leaves with larger surface areas when they can afford to lose water ( figure A2-1). The inverse is also true; leaf shedding is a phenological response of deciduous organisms to water stress. Indeed, other studies of vegetative response to atmospheric and climate variables have shown evaporation to be more strongly correlated with above ground primary productivity (AGPP) than precipitation [46][47][48]. Chen et al (2019) [46] offer a compelling explanation for this phenomenon, arguing that evaporation provides a more accurate estimation of NDVI by considering the interaction of moisture availability with available solar energy-both of which are limiting factors for plant growth [46]. A CO 2 fertilization effect has been observed in the tropics as well as, more famously, in the northern hemisphere; Krakauer et al [49] showed that for Nepal, global CO 2 concentration was better at modeling NDVI intrannually than climate variables, (which were more important interannually). Moreover, we suggest that in volcanic zones of the Galapagos, pioneer species make global CO 2 concentration a better indicator of NDVI than local climatic variables. Fast growing plants (such as pioneers) are more sensitive to elevated CO 2 than slow growers [50][51][52]. On primary lava, temperature and moisture are highly variable due to the lack of surface vegetation and soils, yet early successional species are adapted to full sun conditions and photosynthesize more rapidly than later successional species. Idealized curves for photosynthesis rates from Bazzaz (1979) [53] show that in full sun, early successional species achieve photosynthesis rates of ≈25mg CO 2 dm −2 h −1 , compared to ≈15 mg CO 2 dm −2 h −1 for mid successional species, and ≈2-5 mg CO 2 dm −2 h −1 for late successional species. Pioneer species in the Galapagos include numerous representatives of the fast growing 'weeds' of the Poaceae family (including Aristida repens and Eragrostis mexicana (lovegrass)) and Asteraceae family (including Sonchus oleraceus L., a dandelion relative, and Baccharis gnidifolia). Just outside of the ashy barren zones, pioneer communities also include members of the Solanacae family (e.g. Solanum erianthum, the potato tree,Lycopersicon (tomato), Exedeconus miersii (shore petunia), and numerous vines. Solanaceae are C3 photosynthesizers, which have been shown to respond to elevated CO 2 levels with higher levels of photosynthesis, compared to C4 species [54]. The predictive power of CO 2 for NDVI in areas that correspond to recent lava flows suggests a quickly colonizing, resilient pioneer community that is adapted to local climate variation and geologic disturbance. At the same time, sensitivity to CO 2 levels suggests that global CO 2 increase may have the capacity to drive successional dynamics and pioneer community composition in the future [55].
In the same way as volcanic zones, shoreline areas have a very dynamic and challenging environment that promotes pioneer communities. In these areas, the importance of previous-month NDVI as a predictor of current-month NDVI speaks to the sustainability of these communities within seasons in these harsh environments. Finally, previous-year NDVI as a predictor represents the adaptability of vegetation to slower changes in climate factors on the region. This is especially visible for dense vegetation where NDVI values are higher.
Since our model works best to explain NDVI at higher elevations (>20 MASL), and works especially well for sites with elevation >200 MASL, low elevation areas and littoral zones may represent unique plant communities (endemic Galapagos dry forest, shrub land, and/or Mangrove forests), whose overall productivity responds most strongly to variables not considered in this study. The model works best to explain NDVI in the dry season (June-December), with higher variability during rainy periods, including El Niño events, a trend also observed by Kalisa et al (2019) [56]. Seasonally anomalous NDVI values could be representative of a nearly universal greening response to water availability during these periods.

Potential implications for management
Trueman et al (2011) [5] present a detailed description of how a changing climate might affect the phenology of the islands' varied vegetation. Increased El Niño, temperatures, and precipitation are generally expected to enhance greenness across the islands, by stimulating growth and producing larger flowers, leaves, and bushes, as well as through longer growing seasons and leaf retention periods. Although climate change is broadly expected to benefit Galapagos flora, it is the non-native species that are expected to benefit most [5]. Among endemic species, sustained wet conditions can be detrimental. Opunita echios (prickly-pear cactus) Jasminocerus thouarsii (candelabra cactus), and Bursera graveolens (palo santo tree) can become waterlogged, lose large proportions of seedlings, and even break or collapse under the weight of rapidly growing vines and herbaceous species, which proliferate under wet conditions [2,57].
In the highlands, Scalesia pedunculata (daisy tree) is also speculated to be vulnerable to climate change because historically, it has been shown to die back in response to El Niño events [5,57]. In addition to direct mortality, dramatic die offs of Scalesia could fundamentally alter the ecosystem by ceding valuable growing territory in the humid highlands to opportunistic non-native species that have been shown to respond to El Niño conditions with increased growth [5].
Over the last several decades, there has been a concerted effort in the Galapagos to remedy direct anthropogenic disturbances to the ecosystem, including creation of the Galapagos National Park (GNP) Service and delineation of the official Park boundaries, the establishment of the Galapagos Biosphere Reserve and Whale Sanctuary, and concentrated eradication campaigns directed at large non-native herbivores such as goats and pigs [10,58]. Watson et al (2010) [58] attempted to quantify the proportion of the archipelago that had been severely degraded by direct human activities (defined as active and abandoned agricultural areas and urban centers). The team arrived at an overall estimate of ∼5%, which varied across the five inhabited islands from a maximum of 17% and 14% degraded land area on San Cristobal and Santa Cruz Islands respectively, to 8%, 5%, and 0.1% of total land area on Floreana, Isabela, Santiago, respectively. The team highlighted that the humid highlands are significantly more degraded than the lowlands, because growing conditions are more favorable for agriculture and invasive colonizers [58]. The Galapagos National Park currently occupies over 96% of the land area of the Galapagos Islands, and is protected from direct anthropogenic stressors such as land conversion and agricultural development [2]. Accordingly, anthropogenic impacts on the ecosystems of the archipelago are primary indirect for the period of our study, related to the dynamics of invasive species and-we argue-the early signals of a changing global climate.
The variability of our model's predictability across the Galapagos highlights the diversity of the flora that NDVI is designed to assess. Although climate is fundamentally important to vegetative activity and phenology, the variables that most strongly influence NDVI have been shown to vary, both by ecosystem and by plant type [56,59]. The Galapagos has a unique climate and a stable biological community that has existed for thousandsperhaps tens of thousands-of years [22], sorted into unique ecosystems across the archipelago. Many of the ecosystems that flourish in the Galapagos are poorly quantified by NDVI because they are adapted to aridity and characterized by low levels of greenness; indeed, the evolutionary advantage of many endemic Galapagos plants comes from their toughness and ability to withstand extreme environmental conditions. Wetter summers and more frequent El Niño events, which are expected over the next century, could benefit non-native plants whose life strategy is to grow fast and die young, at the expense of natives that evolved over millennia to survive extremes and poor growing conditions, and rely on seed banks that are similarly adapted to the severity of the environmental context.
Due to the lack of public local information from the different islands, it was not possible to corroborate the different remote sensing datasets used in the manuscript. Resolution of the different sets plays a significant role in exploring and understanding the vegetation on the Galapagos Islands. Since the finest resolution on this study is ∼5.6 km from the NDVI dataset, a large number of small islands were not taken into account in the analysis. It is also worth mentioning that sparse vegetation, such as that common in dryland systems, is underestimated by the spatial resolution of MODIS (∼5.6 km). This effect would be dominant at elevations lower than 20 MASL where vegetation is primarily deciduous grass, shrubland, and dry forest. Moreover, the 50 km resolution of the MERRA-2 reanalysis does not capture topographic variability across the archipelago well, which could be hindering our effort to model the factors driving variability in NDVI. In addition, global data such as atmospheric CO 2 concentrations can affect the credibility of the results since they are based on observations from higher latitude than the Galapagos Islands. We consider that those measurements give us a general idea of how atmospheric CO 2 affects vegetation in the archipelago, although they may not fully represent site-specific interactions between atmospheric CO 2 and flora at the different islands.

Conclusions
Our study demonstrates that NDVI can quantify the effect of changing climate variables on vegetation (NDVI  0.3) in the Galapagos Archipelago. This is significant, given the cultural, economic, educational, and scientific importance of these islands. Their capacity to serve as a museum and evolutionary laboratory for future generations will depend on the resiliency of the ecosystem and of its anchor vegetation in the face of global change, as well as on its responsible management. Remote sensing products and models such as ours should help scientists and conservationists develop tools to monitor the ecosystem continually and non-intrusively, especially in the highlands where our model works well to explain NDVI variability, and where species invasion is active and change in community composition is likely.
Studies of vegetation regime shifts on volcanic islands and in the tropics using remote sensing are still sparse, so there is much research to be done. It remains to be seen whether remote sensing products such as NDVI might develop the resolution to detect fine scale community dynamics or species invasions, or whether additional data and non-linear approaches [60] can clearly distinguish NDVI responses to various components of climate change, such as natural versus anthropogenic. Such studies will be useful to the global community as we monitor the responses of our natural systems to a changing climate.   Because the primary purpose of this analysis was to identify cyclical trends, all variables were standardized and linear trends (e.g. increasing CO 2 were removed prior to graphing).