Representativeness of Eddy-Covariance flux footprints for areas surrounding AmeriFlux sites

Large datasets of greenhouse gas and energy surface-atmosphere fluxes measured with the eddy-covariance technique (e.g., FLUXNET2015, AmeriFlux BASE) are widely used to benchmark models and remote-sensing products. This study addresses one of the major challenges facing model-data integration: To what spatial extent do flux measurements taken at individual eddy-covariance sites reflect model- or satellite-based grid cells? We evaluate flux footprints — the temporally dynamic source areas that contribute to measured fluxes — and the representativeness of these footprints for target areas (e.g., within 250 – 3000 m radii around flux towers) that are often used in flux-data synthesis and modeling studies. We examine the land-cover composition and vegetation characteristics, represented here by the Enhanced Vegetation Index (EVI), in the flux footprints and target areas across 214 AmeriFlux sites, and evaluate potential biases as a consequence of the footprint-to-target-area mismatch. Monthly 80% footprint climatologies vary across sites and through time ranging four orders of magnitude from 10 3 to 10 7 m 2 due to the measurement heights, underlying vegetation- and ground-surface characteristics, wind directions, and turbulent state of the atmosphere. Few eddy-covariance sites are located in a truly homogeneous landscape. Thus, the common model-data integration approaches that use a fixed-extent target area across sites introduce biases on the order of 4% – 20% for EVI and 6% – 20% for the dominant land cover percentage. These biases are site-specific functions of measurement heights, target area extents, and land-surface characteristics. We advocate that flux datasets need to be used with footprint awareness, especially in research and applications that benchmark against models and data products with explicit spatial information. We propose a simple representativeness index based on our evaluations that can be used as a guide to identify site- periods suitable for specific applications and to provide general guidance for data use. target areas, respectively. These results suggest that most AmeriFlux sites represent a specific land-cover type at each site, but they may not represent the dominant land-cover type at a larger spatial extent (e.g., 1000 m, 3000 m) around the tower.


Introduction
Global and regional networks of eddy-covariance towers such as those within the FLUXNET and the AmeriFlux provide the largest synthesized in situ datasets of energy, water, carbon, and momentum fluxes between Earth's surface and the atmosphere. By providing a global network of calibration and validation sites, flux networks serve as the cornerstone of the Earth Observing System (EOS) for global terrestrial vegetation monitoring (Running et al., 1999). The large datasets of greenhouse gas and energy surface-atmosphere fluxes are widely used to benchmark Earth system models (Chen et al., 2018;Ricciuto et al., 2018) and flux data products that are based on satellite remote-sensing and machine-learning algorithm upscaling (Tramontana et al., 2016;Verma et al., 2015). While benchmarking models against flux data helps identify key model shortcomings and guide their development, the value of comparisons is greatest when the data are used to understand which processes matter at which spatial and temporal scales. This so-called "space-time representativeness issue" remains one of the major challenges facing model-data benchmarking Hoffman et al., 2017). Adopting the definition in Nappo et al. (1982), representativeness describes the extent to which a set of (flux) measurements taken in a given space-time domain reflect the actual (flux) conditions in a different space-time domain. If focusing on the spatial aspect, the representativeness includes: 1) the network-to-region representativeness (Hargrove et al., 2003), i.e., to what extent do flux measurements taken at a relatively sparse network of tower locations reflect the aggregated flux conditions in a regional or global domain? 2) the point-to-area representativeness (Schmid, 1997), i.e., to what extent do flux measurements taken at a point (tower) location reflect the aggregated conditions over an area that is represented by a model-or satellite-based grid cell? The point-to-area representativeness is of primary interest in the present manuscript. Our primary focus is on evaluating flux data's representativeness and realizing that similar issues exist in models and other datasets (e.g., satellite data).
While data from eddy-covariance flux towers are well recognized for their rich temporal information (Baldocchi et al., 2001;Katul et al., 2001;Stoy et al., 2009), few eddy-covariance sites are truly homogeneous, and thus, the spatially dynamic nature of these data is often overlooked by modeling and synthesis communities (Giannico et al., 2018;Griebel et al., 2020). Briefly, the extent of the source area contributing to the flux at each observation time-the flux footprint-depends on the wind direction, relative measurement height, underlying vegetation-and ground-surface characteristics, and turbulent state of the atmosphere (Schmid, 2002). When aggregating over time, the flux footprint climatology is essential to identify the spatial extents and temporal dynamics of the areas contributing to the observed fluxes at a tower site (Amiro, 1998). While the footprint concept has gained recognition among the flux community (e.g., Chasmer et al., 2011;Helbig et al., 2016;Wang et al., 2006) and has been evaluated in a few regional-scale studies (Chen et al., 2012;Göckede et al., 2008;Wang et al., 2016), it was often omitted under the assumption of homogeneity at flux tower sites or out of necessity by large-scale syntheses and model-data benchmarking efforts (e.g., Heinsch et al., 2006;Jung et al., 2009;Verma et al., 2015). The main challenges include the lack of footprint information for many sites and difficulty in interpreting and incorporating footprint information into these modeling efforts (i.e., what flux towers observe vs. what models assume). Since most flux tower sites are located in more-or-less heterogeneous landscapes (Griebel et al., 2020;Stoy et al., 2013), the lack of "footprint awareness" leads to unknown biases and uncertainties in modeling and synthesis research (Metzger, 2018).
Some approaches to addressing the spatial mismatch in previous model-data benchmarking research are summarized here. First, Earth system models typically have a model grid cell of around 10 8 m 2 (Hoffman et al., 2017;Williams et al., 2009) or 10 6 m 2 in a few regional modeling cases (Buotte et al., 2019;Law et al., 2018), which are often several orders of magnitude larger than the flux footprints of around 10 3 -10 7 m 2 . Scale mismatch remains one of the key challenges and active research topics in model-data benchmarking Hoffman et al., 2017;Metzger, 2018;Xu et al., 2020). Model-data benchmarking is often carried out using point scale simulations at individual sites (e.g., Bonan et al., 2012;Chaney et al., 2016;Chen et al., 2018), prescribing with local meteorological forcing data, plant functional type, and other site characteristics that neglect the heterogeneity within the flux footprint. In some cases, model simulations use spatially-explicit gridded forcing data (e.g., satellite-based leaf area index, climate reanalysis data) and assume forcing data from the nearest grid cell (e.g., ~10 5 -10 9 m 2 ) to represent the conditions at individual sites (Ricciuto et al., 2018;Williams et al., 2020).
Second, remote-sensing data products and models that utilize moderate spatial resolution satellite retrievals (e.g., Moderate Resolution Imaging Spectroradiometer-MODIS) have a grid cell of around 10 5 -10 6 m 2 , which is close to the spatial scale of typical flux footprints. The challenge is to incorporate the temporally dynamic footprints into the modeling frameworks. The moderate resolution can be insufficient for capturing the fine-scale spatial variation, as seen by the flux footprints in some cases (Robinson et al., 2018;Wagle et al., 2020;Yang et al., 2020). Many of the common global, gridded satellite products are themselves aggregated in time and space, e.g., via spatial averaging and compositing over 16-day periods, which further compound the problem of matching flux to remote-sensing data, particularly during periods of disturbance or seasonal transition (Xin et al., 2013). Typically, a fixed set of pixels near the tower locations are selected (Heinsch et al., 2006;Verma et al., 2015;Zhang et al., 2017), assuming the flux conditions in the predefined areas represent the flux conditions observed by the flux footprints. Similar approaches and assumptions are also adopted in studies to extrapolate flux data to spatially-explicit data products (Jung et al., 2019;Xiao et al., 2014;Xiao et al., 2011) and to assess the representativeness of eddy-covariance networks at the regional and global scale (Chu et al., 2017;Villarreal et al., 2018;Villarreal et al., 2019).
Third, recent advances in fine resolution (e.g., ~10 2 -10 3 m 2 ) remote-sensing data products (e.g., ECOSTRESS, IKONOS, SPOT, CubeSats, Sentinel, MODIS-Landsat fusion) now enable the flux and remote-sensing data integration at a spatial scale sufficient to resolve the spatial variation as seen by the flux footprints (Anderson et al., 2018;Fisher et al., 2020;Yang et al., 2017). However, a similar approach that selects a fixed set of pixels near the tower locations is still commonly adopted because it lacks footprint information across sites. Alternatively, a few studies adopt a prognostic approach pairing flux data with land surface characteristics at carefully selected pixels based on the prevailing wind direction and estimated source areas (DuBois et al., 2018;Fisher et al., 2020).
To date, network-wide footprint information is still unavailable or sparse in flux data products, including La Thuile 2007 and FLUX-NET2015 (Papale et al., 2006;Pastorello et al., 2020;Pastorello et al., 2017), which leads to unknown biases and uncertainties in the modeling and synthesis research as discussed above. While several cross-site footprint analyses existed (Chen et al., 2011;Chen et al., 2012;Göckede et al., 2008;Rebmann et al., 2005;Wang et al., 2016), none were carried out extensively across a large number of sites, e.g., FLUX-NET, AmeriFlux. Also, the outcome of the footprint analyses was often not tailored for use by the modeling and synthesis communities. Fortunately, the flux measured at any given time can be traced back to its sources (i.e., spatial locations), providing us with an opportunity to connect the fluxes with land surface characteristics (e.g., land cover type, vegetation characteristics, landform). This study aims to reveal the importance of the network-wide flux footprint information to the modeling and synthesis communities. Our objectives are to 1) calculate and present the variation in the spatial distribution and extent of flux footprints across AmeriFlux sites, 2) evaluate the representativeness of flux footprints to target areas (i.e., areas within 250 m, 500 m, 1000 m, 1500 m, 2000 m, and 3000 m radii around flux towers) that are commonly used in modeling and synthesis studies, and 3) construct simple site-specific representativeness indices that indicate the footprint-to-target-area representativeness and improve the uses and interpretation of flux data by the modeling and synthesis user community.

Overview
To achieve our objectives, we: 1) calculated footprint climatologies for each site, 2) retrieved land surface characteristics in the footprint climatology areas and for a series of predefined target areas at each site, and compared land-surface characteristics between the footprint climatologies and the target areas, and 3) evaluated footprint-to-targetarea representativeness as described in Figure 1 and the subsequent sections.
Monthly footprint climatologies were calculated for 214 AmeriFlux sites using a two-dimensional footprint model (Figure 1a-1b, Section 2.2) (Kljun et al., 2015). Two land surface characteristics-one categorical (land cover type) and one continuous (vegetation index)-were retrieved at a fine resolution for each site to match the periods of footprint climatologies (Figure 1c-1d, Section 2.3). These land surface characteristics were used as proxies for representing the spatial variation of land conditions for the fluxes. We defined a series of target areas within 250 m, 500 m, 1000 m, 1500 m, 2000 m, and 3000 m radial distances from the tower at each site. The footprint-to-target-area representativeness was evaluated by comparing the land surface characteristics covered by the footprint climatologies to that in the target areas (Figure 1e-1f, Section 2.4). These target areas were selected based on the spatial resolutions of grid cells/pixels used in previous modeling and synthesis studies. Finally, we proposed qualitative indices for the footprint-to-target-area representativeness for each AmeriFlux site.
Micrometeorological data for each site were obtained from the AmeriFlux BASE data product (https://ameriflux.lbl.gov/, accessed in December 2019), including horizontal wind speed, wind direction, friction velocity, Obukhov length, air temperature, humidity, incoming shortwave radiation, and sensible heat flux. Additionally, the standard deviation of cross-wind velocity (V_SIGMA) was obtained for 106 sites from the AmeriFlux BASE data product, or by contacting the site investigators. Most sites' data had a half-hourly resolution, except the US-MMS, US-Ha1, US-Cop, US-Ne1, US-Ne2, and US-Ne3 sites, which had an hourly resolution. All of the following analyses used the data at their original temporal resolution (i.e., half-hourly, hourly). For sites with short data records (< 3 years), all available years were used. For sites with long data records, we selected 3-6 years based on data coverage and also the availability of the land surface characteristics (see Table S2 for details of site-year selection and data filtering). Altogether, 712 siteyears were used in the study, with the majority of years falling between 2001-2018. We adopted the Flux Footprint Prediction (FFP) model (http://f ootprint.kljun.net/) of Kljun et al. (2015) for the footprint calculations. The FFP model is a dimensionalized parameterization of a backward Lagrangian stochastic dispersion model (Kljun et al., 2002), which applies to a wide range of boundary layer stratifications and measurement heights (Kljun et al., 2015). Most required variables for the FFP model were obtained from the AmeriFlux dataset as described above, with the following exceptions. Temporal changes in roughness lengths and zero-plane displacement heights were derived following Chu et al. (2018) weekly for cropland, grassland, and wetland sites where seasonal dynamics of canopy heights were expected, and annually for the shrubland, savanna, and forest sites. The planetary boundary layer height was calculated following Kljun et al. (2015), who used the Obukhov length, friction velocity, air temperature, and sensible heat flux in the calculations. For V_SIGMA, we developed an empirical model using the random forests machine-learning algorithm to predict V_SIGMA for sites with no data available (Text S1, Figures S1-S2, Table S3). The model was trained, validated, and tested using the 106 sites with available V_SIGMA data. The model used six predictor variables (i.e., friction velocity, planetary boundary layer height, wind speed, incoming shortwave radiation, Obukhov stability parameter, and IGBP classifications) and showed robust performance against a withheld test dataset (R 2 = 0.79, mean absolute error (MAE) = 0.15 m s − 1 ) and an independent dataset (R 2 = 0.77, MAE = 0.16 m s − 1 ) collected using the portable eddy-covariance system by the AmeriFlux Tech Team through 42 site visits (https://ameriflux.lbl.gov/tech/site-visits/) ( Table S4). Details of the model validation and testing are described in the Supplement (Text S1, Figures S1-S2, Table S3).
The footprint calculations were carried out for each (half-)hour, generating a two-dimensional gridded map of footprint weights centered on the tower. All available footprint-weight maps in a month were aggregated to daytime and nighttime footprint climatologies (Amiro, 1998) (Figure 1a-1b). Daytime and nighttime were separated by the potential incoming radiation (i.e., > 0 W m − 2 ) calculated based on a site's geolocation and time zone. In total, around 14,000 monthly footprint climatologies were generated for all sites combined (Datasets S1-S3). Because the uncertainty of footprint models increases with upwind distance from the receptor (Kljun et al., 2015), we truncated the footprint climatologies at the 80% contour of source weights for all subsequent analyses (Kim et al., 2018). As Schmid (1997) suggested, a point source located outside the 50% contour of source weights must be at least 5-10 times stronger than the point source at the maximum source weight location to achieve a similar sensor response. Our sensitivity tests also showed that the 80% contour cutout selection had a marginal influence on the final results (Text S2, Figures S3-S7, Table S5).
To summarize the footprint climatologies across sites, the following metrics were calculated. The footprint fetch (X 80_day , X 80_night (daytime, nighttime)) was defined by the maximal distance from the tower location to the 80% footprint climatology contour. The footprint area (A 80_day , A 80_night ) was defined by the areas enclosed by the 80% contour. Both footprint fetch and area were averaged from all available monthly footprint climatologies in a site-year. The footprint symmetry index (S 80_day , S 80_night ) was calculated using the following equation: Equation 1 applies equivalently to S 80_night . The symmetry index ranges from zero to one, with a value of one indicating a perfectly symmetric (circular) footprint climatology centered around the tower location. The footprint overlap indices included the seasonal overlap index (O 80_sea-son_day , O 80_season_night ) and daytime-nighttime overlap index (O 80_daynight ) based on the following equations. Both overlap indices range from zero to one, with a value of one indicating perfectly overlapped footprint climatologies either across months (i.e., seasonal overlap) or between daytime and nighttime for each month. The overlap indices were calculated based on all available monthly footprint climatologies in a site-year.
where i and k designate one pixel and one month, φ day ik and φ night ik denote the daytime and nighttime footprint weights at i-pixel and k-month, respectively. I and K denote the total number of pixels within the footprint climatology and the number of months in a site-year, respectively. As we truncated the footprint climatologies at the 80% contours, the footprint weights were rescaled to sum up to unity within the 80% contours (e.g., = 1). The nighttime overlap index was calculated similarly using Equation 2.

Land surface characteristics
Two types of land surface characteristics-one continuous vegetation index and one categorical land cover type-were used in this study. We chose the enhanced vegetation index (EVI) derived from the Landsat images due to availability. EVI is closely related to vegetation and land cover variations, such as leaf area index (LAI), canopy type, plant physiognomy, and canopy architecture (Huete et al., 2002). Additionally, EVI is an important land surface product often used in upscaling tower-observed CO 2 fluxes and as a surrogate of CO 2 fluxes when the true fluxes (e.g., spatial-explicit) are unavailable (Chen et al., 2011;Fu et al., 2014;Stoy et al., 2013;Xiao et al., 2014). Landsat's 30 m nominal spatial resolution allows the characterization of landscape heterogeneity within the flux footprints. Landsat's 16-day revisit cycle also allowed us to obtain multiple images per year. For each site, we collected all cloud-free (<1% within a 3000 m radius from the tower) Landsat-5/TM and Landsat-8/OLI scenes (USGS Collection Tier 1 atmospherically corrected surface reflectance) that overlapped with the flux measurement periods. Landsat-7 ETM+ data were not considered due to large data gaps caused by the failure of Scan Line Corrector after 31 May 2003. Landsat-5/TM and Landsat-8/OLI provided sufficient data for this study. After the collection of Landsat images, EVI was computed using the following equation (Huete et al., 2002): where ρ NIR , ρ RED , and ρ BLUE are the surface reflectance of near-infrared, red, and blue bands of the Landsat images, respectively. We automated the aforementioned process of image retrieval and EVI calculation in the Google Earth Engine cloud-computing platform. Altogether, around 3,300 images were processed for the 214 AmeriFlux sites.
We chose land cover type as the categorical characteristic because it is commonly used in modeling and upscaling studies (e.g., Fu et al., 2014;Williams et al., 2009;Xiao et al., 2014). The land cover products used in this study include the 2001-2016 United States National Land Cover Dataset products (NLCD; https://www.mrlc.gov/) (Yang et al., 2018) and 2010 Land Cover of Canada (https://open.canada.ca/) (Latifovic et al., 2017). Both products were derived from Landsat imagery with a spatial resolution of 30 m. For NLCD, we selected the year that overlapped with or approximated the period when the flux data were collected. Because the two land cover data products had slightly different land cover classifications, we merged and consolidated the land cover types into 16 groups (Table S6).

Evaluation of footprint-to-target-area representativeness
The footprint-to-target-area representativeness was evaluated at each site based on the comparisons of EVI and land-cover composition between the footprints and the series of target areas within 250 m, 500 m, 1000 m, 1500 m, 2000 m, and 3000 m radial distances from the tower (Figures 1e-1f, S8-S9).
For land-cover evaluations, we first identified the dominant landcover type within the footprints, i.e., the land-cover type that had the highest footprint-weighted percentage based on all available monthly footprint climatologies (P footprint ). Then, the corresponding percentage of this dominant land-cover type in the target areas (P target ) was calculated. The evaluation of representativeness was based on P footprint , P target , and the Chi-square test between the footprint-weighted and target-area landcover compositions. For simplicity, we propose a three-level index illustrating the site-level footprint-to-target-area representativeness as follows.
• High representativeness: A site had ≥ 80% of a certain land-cover type both within the footprint and in a target area, and the landcover compositions were not significantly different (p ≥ 0.05) between the footprint and a target area (see an example in Figure S8a-S8b). • Medium representativeness: A site had ≥ 50% of a certain land-cover type both within the footprint and in a target area, and the landcover compositions were not significantly different (p ≥ 0.05) between the footprint and a target area, if not meeting the criteria of high representativeness ( Figure S8c-S8d).
• Low representativeness: A site had < 50% of a certain dominant land-cover in the footprint or target area, or the land-cover compositions were significantly different (p < 0.05) between the footprint and a target area ( Figure S8e-S8f).
The criteria of 50% and 80% were chosen following Göckede et al. (2008). All statistics of P footprint , P target , and Chi-square test are provided in the supplement (Dataset S4). In the preliminary tests, we found that 34 sites had incorrect land-cover classifications from the land-cover products (e.g., NLCD) as compared with sites' metadata (e.g., IGBP) (details in Text S3). Thus, the site-specific representativeness of land-cover composition was quantified at only 180 sites.
For EVI evaluations, we first matched the available Landsat EVI images with the corresponding month's footprint climatologies. At each available month, we calculated the target-area mean EVI (EVI target ), footprint-weighted EVI (EVI footprint ), and the sensor location bias (Δ) after Schmid and Lloyd (1999): where j designates a pixel, J denotes the total number of pixels within the footprint climatology, φ j and EVI j denote the footprint weight and EVI at pixel j, respectively. The sensor location bias, which was adopted in previous studies (Chen et al., 2011;Kim et al., 2006;Kim et al., 2018), represented the time-explicit footprint-to-target-area bias of EVI in each available month. Additionally, we quantified the site-level footprint-to-target-area representativeness using linear regression (model II) between all available footprint-weighted and target-area EVI at a site: where β 0 and β 1 denote the intercept and slope of regression, respectively. Similarly, we propose a three-level representativeness index based on the regression results as follows.
• High representativeness: A site had linear regression with R 2 ≥ 0.8, a slope of 1.0 ± 0.1, and an intercept of 0.0 ± 0.1 between the footprint-weighted and target-area EVI (see an example in Figures S9a-S9b). • Medium representativeness: A site had linear regression with R 2 ≥ 0.6 and p < 0.05, if not meeting the criteria of high representativeness (see an example in Figures S9c-S9d). • Low representativeness: A site had linear regression with R 2 < 0.6 or p ≥ 0.05 ( Figures S9e-S9f).
Different thresholds (e.g., 5%, 10%) of sensor location bias have been used to justify the representativeness in the previous studies (Chen et al., 2011;Kim et al., 2006;Wang et al., 2016). Here, we adopted a similar 10% threshold for both the sensor location bias in each available month (i.e., |Δ| ≤ 10%) and site-level regression (i.e., -0.1 ≤ β 0 ≤ 0.1, 0.9 ≤ β 1 ≤ 1.1). All statistics of EVI footprint , EVI target , Δ, and regression are provided in the supplement (Datasets S5-S6). While all 214 sites were used in the pooled analyses, site-specific regressions were carried out for only 166 sites with at least six matches between the monthly footprint climatologies and Landsat EVI. The median of available matches was 13 per site for the 166 sites.
Unless specified, all data processing and statistical analyses were conducted using R (R Core Team, 2019). Specifically, the R-version of the FFP model was downloaded through http://footprint.kljun.net/. The h2o package was used to interface with H2O (https://www.h2o.ai/), an open-source cloud platform for distributed and scalable machine learning, to carry out the random forests model training and validation.
The model II linear regression (lmodel2 package) was adopted for the comparison of EVI and evaluation of V_SIGMA prediction (Legendre, 2014). The ggmap and rasterVis packages were used in generating the footprint maps (Kahle and Wickham, 2013;Lamigueiro and Hijmans, 2018).

Footprint climatology across AmeriFlux sites
Across AmeriFlux sites, the spatial extent of flux footprint climatologies spanned several orders of magnitude (Figure 2, S10-S17). The footprint climatology extents, defined by the maximal distance from the tower to the 80% contour of the monthly footprint climatologies, ranged mostly from 100 m to 450 m (25th and 75th percentiles, Figure 3a). The footprint climatology areas, defined by the 80% contour of the monthly footprint climatologies, covered mostly around 13,000 to 260,000 m 2 in area (25th and 75th percentiles, Figure 3b). In almost all studied siteyears (> 95%), the nighttime footprint climatologies extended farther (~45% on average) and covered a larger area (~90% on average) than the daytime footprint climatologies. In rare cases, the daytime footprint climatologies may cover a slightly larger area than nighttime due to the distinct and variable wind directions between the daytime and nighttime (Text S4, Figure S18-S19). As the daytime and nighttime footprint climatologies largely overlapped in the areas with higher weights, around 93% of the site-years had O 80_daynight larger than 0.8 (Figure 3d, Text S2). Additionally, footprint climatology, which aggregated footprints from many time steps from different wind directions, smoothed out the variable and potentially extreme footprint conditions as seen by the half-hourly footprints ( Figure S19). That also explained the relatively high O 80_daynight at most studied sites. While most sites had a relatively symmetric footprint climatology centered around the tower locations, some sites were asymmetric (e.g., US-NR1 (S 80_night = 0.31), US-SRM (S 80_night = 0.18) in Figure 2), as a consequence of the unimodal or bimodal prevailing wind directions. Using an arbitrary criterion of 0.30 for the symmetry index, around 7% and 15% of the studied siteyears showed relatively asymmetric footprint climatologies for daytime and nighttime, respectively (Figure 3c). Noticeable monthly variability was found at ~32-44% of the studied site-years (i.e., O 80_season_day , O 80_season_night < 0.8, Figure 3d). Specifically, cropland, grassland, and wetland sites that experienced large changes of canopy heights throughout the growing season showed relatively pronounced monthly variations in footprint climatologies (e.g., US-Ne1, US-ARM, US-Ro5 in Figure 2).

Evaluation of land-cover type
While most AmeriFlux sites (83%-84%, i.e., daytime, nighttime) had relatively homogeneous land cover within the flux footprint climatologies, only 25%-39% of sites had similar land-cover compositions in the areas extending kilometers from the towers. Around 84% and 83% of AmeriFlux sites had one dominant land-cover type, as seen by the daytime and nighttime footprint climatologies, respectively (i.e., the dominant land-cover type accounted for ≥ 80% footprint-weighted land-cover percentages (Figure 4)). On the other hand, only around 64% of the sites had the same land-cover type also dominating (i.e., ≥ 80%) the 250 m target areas. The percentages of sites decreased rapidly as the target area extended farther from the towers. With the 1000 m and 3000 m target areas, only 39% and 25% of sites had the same dominant land-cover type as seen by the flux footprint climatologies. The median difference of the dominant land-cover percentages between the footprint and target area was ~6% and ~20% (P footprint > P target ) at the 1000 m and 3000 m target areas, respectively. These results suggest that most AmeriFlux sites represent a specific land-cover type at each site, but they may not represent the dominant land-cover type at a larger spatial extent (e.g., 1000 m, 3000 m) around the tower.
For site-specific evaluations of land-cover composition, around 64% and 65% of sites showed high representativeness for the 250 m target area based on daytime and nighttime footprints (Figures 5a, S20a), and the percentage dropped to 21%-27% and 19%-28% as the target areas extended beyond 1000 m from the towers. The percentage of highrepresentativeness sites also varied substantially among IGBP and ecoregion groups (Figures 5, S20). Cropland sites, mostly located in highly managed agricultural landscapes, had 90%-50% of highrepresentativeness sites for 250-3000 m target areas (Figure 5e, S20e). On the other hand, only half of the wetland sites showed high representativeness even for a 250 m target area (Figures 5g, S20g, examples in Figures S17 (US-ORv, US-Srr, US-StJ)). The percentage of high-representativeness sites dropped to ~20% when extending the target areas beyond 500 m from the towers. For savanna and grassland sites, the percentage of high-representativeness sites was around 50%-60% within the 500 m target area. It gradually decreased to about 16%-33% as the target areas extended beyond 1000 m from the towers. Except for cropland, all ecosystem types had 4%-18% of low-representativeness sites as the target areas extended beyond 1000 m from the towers. Among the ecoregions, southwestern desert/semiarid and Mediterranean California regions had 40%-82% of highrepresentativeness sites across all target areas (Figures 5l, 5m, S20l, S20m). Eastern temperate and Boreal regions had only 2%-16% of highrepresentativeness sites when the target areas extended beyond 500 m around the tower.

Evaluation of Enhanced Vegetation Index
Systematic EVI biases (4%-20%) were found between the flux footprint climatologies and the target areas using a fixed area across all available site-years ( Figure 6, Table 1). As indicated by the regression slopes, the systematic biases were ~4% and ~9% when using fixed 250 m and 500 m target areas across all site-years (Table 1), respectively. Bias increased as the target area extended farther from the tower, reaching around 20% with a 3000 m target area. At all target areas, footprint-weighted EVI was systematically higher than target-area EVI,

Fig. 2.
Maps of monthly footprint climatologies at 16 AmeriFlux Core sites. In each panel, a true-color satellite image (accessed through Google Map) centered on a flux tower (yellow crosshairs) was overlaid with monthly footprint climatologies (i.e., 80% contour, red: daytime, blue: nighttime) from a selected year. White circles denote the distance from the tower. For better visualization, each map was truncated at different target areas. The site ID and selected year are indicated in the inserted label. Similar maps for other AmeriFlux sites are provided in the Supplement (Figures S10-S17).
suggesting that the flux footprints mostly covered areas with higher EVI than the surroundings. Only marginal differences (~1% in regression slopes) were found between results using daytime and nighttime footprint climatologies (insets in Figure 6, Table 1). In addition to the larger systematic biases, the comparisons of footprint-weighted and target-area EVI were more variable (i.e., they exhibited case-by-case, month-by-month variation) with the target areas extending farther from the towers. The R 2 was 0.94 with a 250 m target area and dropped to 0.71 with a 3000 m target area. The results suggest that no single fixed target area can represent the flux footprint climatologies across sites without introducing biases. However, the 250 m target area was less biased in representing the footprint-weighted EVI when pooling all siteyears altogether.
The time-explicit sensor location biases (Δ) showed similar dependency on the extents of the target areas. Around 73% and 75% of the monthly sensor location biases were within the ±10% threshold when using a 250 m target area for daytime and nighttime, respectively (Figure 7). The percentages within the ±10% threshold decreased as the target areas extended farther from the tower, reaching 42% and 43% for daytime and nighttime at a 3000 m target area. Separating by seasons, the summer months (June-August) had slightly higher percentages within the ±10% threshold, which were, respectively, 75% and 77% for daytime and nighttime at a 250 m target area and 45% and 46% at a 3000 m target area (Figures S21e-S21f). In contrast, the winter months (December-February) had lower percentages of 70% and 68% for daytime and nighttime at a 250 m target area and 33% and 34% at a 3000 m target area, respectively ( Figures S21a-S21b).
For site-specific regressions using EVI, 53% and 54% of sites showed high representativeness for a 250 m target area based on daytime and nighttime footprints (Figure 8a, S22a), and the percentage dropped to 11%-16% and 11%-16% as the target areas extended beyond 1000 m from the towers. In all IGBP and ecoregion groups, the percentages of high-representativeness sites generally decreased as the target areas extended farther away from the tower. In addition, the percentage of high-representativeness sites varied substantially among the ecosystem types and ecoregions (Figures 8, S22). In sum, broadleaf and mixed  Figure (a) shows the distribution of the maximal distance (i.e., fetch (X 80_day , X 80_night )) from the tower location to the 80% footprint climatology contour. Figure (b) shows the distribution of the area within the 80% footprint climatology contour (A 80_day , A 80_night ). Figure (c) shows the distribution of the footprint symmetry index (S 80_day , S 80_night ). Figure (d) shows the distribution of the footprint overlap index for seasonal (O 80_season_day , O 80_season_night , solid lines) and daytime-nighttime overlaps (O 80_daynight , dashed line). When calculating these statistics, a data point represents an averaged value from all available monthly footprint climatologies in a studied site-year. The kernel density function was then obtained using data points from all available site-years. The black and grey lines show the distributions based on daytime and nighttime footprint climatologies, respectively. Both fetches and areas are logarithmic transformed (x-axis) in Figures (a) and (b).

Fig. 4.
Cumulative distributions of the percentage of dominant land-cover type in the footprints and target areas across AmeriFlux sites. The red and blue lines are based on daytime and nighttime footprint climatologies, respectively. The orange dash-dotted lines show results from a series of target areas (i.e., 250-3000 m radii centered around the tower location) as indicated by the color intensity (from dark to light, indicating an increasing distance from the tower).
forest sites had the highest percentages of high-representativeness sites across all target areas (Figures 8b, S22b), from ~77% with a 250 m target area to 27-32% when the target areas extended beyond 1000 m from the towers. Savanna and shrubland sites also had relatively high percentages of high-and medium-representativeness sites across all target areas. Cropland sites, while having ~50% of highrepresentativeness sites with a 250 m target area, decreased drastically as the target areas extended beyond hundreds of meters from the towers. Croplands, needleleaf forests, grasslands, and wetlands had around 20%-40% of low-representativeness sites as the target areas extended beyond 1000 m from the towers. Among the ecoregions, the great plains had relatively higher percentages of highrepresentativeness sites across all target areas (Figures 8k, S22k), from ~53% with a 250 m target area to 21-29% when the target areas extended beyond hundreds of meters from the towers. Northwestern mountain/coast, eastern temperate, southwestern desert/semiarid, and Mediterranean California ecoregions had around 20%-67% of lowrepresentativeness sites with a target area extending beyond 1000 m from the towers.   Table 1.

Example Cases
Three typical cases are discussed based on the representativeness for both the land-cover composition and EVI ( Figures S23-S26). First, an example demonstrates that a site's flux footprints are representative of most of the target areas. The US-MOz site-a deciduous forest located in a forest-dominated landscape-showed similar evaluation results for both the land-cover composition and EVI at most target areas ( Figure S23). The site was classified as high representativeness for the EVI at all target areas and land-cover composition up to a 1500 m target area. At 2000 m and 3000 m target areas, the site was classified as medium representativeness for land-cover composition due to a slightly higher percentage of cropland coverage.
Second, an example demonstrates that a site's flux footprints represent only limited extents from the tower. The US-Vcp site-an evergreen forest located within a forest-shrub-grassland landscape-showed a gradually declining representativeness as the target area extended farther from the tower ( Figure S24). The site was classified as high representativeness for both EVI and land-cover composition at a 250 m target area. Extending farther from the tower, the target areas covered more shrublands and grasslands. Consequently, the site was classified as low and medium representativeness for EVI and land-cover composition when the target area extended beyond a 1000 m radius around the tower. Both US-MOz and US-Vcp cases show similar evaluation results between the land-cover composition and EVI.
Third, contrasting representativeness between land-cover composition and EVI were found in several cropland sites, attributed to the variety of crop types that were not differentiated by the land-cover products. For example, the US-Ro6 site-a cropland located in an agricultural landscape dominated with corn/soybean rotation-was planted with wheat, clover, and corn in 2017-2018 ( Figure S25). The site was planted with spring wheat in April 2017. Kura clover was planted after the harvest of wheat in August. Corn was planted into the clover in the spring of 2018. While the site was classified as high/medium representativeness for the land-cover composition across all target areas, the site was classified as low representativeness for the EVI in all except the 250 m target area, as a consequence of the distinct crop types and phenology between the footprints and nearby fields. On the other hand, the US-Bi2 site-a corn cropland located in a corn-dominant agricultural landscape-showed medium to high representativeness for both landcover composition and EVI at all target areas ( Figure S26).

Implication of footprint-to-target-area representativeness
We advise the modeling and synthesis communities to be "footprintaware" when using the large-scale flux datasets (e.g., FLUXNET2015, AmeriFlux BASE data products), especially in research such as pointscale simulations and spatially-explicit land surface models, remotesensing-based models, and machine-learning upscaling studies. Our study highlights the importance of considering spatiotemporal dynamics of flux footprints, particularly within heterogeneous landscapes, when using flux data in model-data benchmarking. Across AmeriFlux sites, we found several orders of magnitude difference in the extents and areas of the footprint climatologies. As most flux tower sites were located in more-or-less heterogeneous landscapes (e.g., land cover, EVI) (Griebel et al., 2020;Reuss-Schmidt et al., 2019;Stoy et al., 2013), no single fixed-extent target area can represent the land surface characteristics as seen by the flux footprints universally across sites without introducing biases. These biases' sign and magnitude were site-specific and varied depending on the extents of target areas and the land surface characteristics. In general, the 250 m target area tended to be less biased in representing the land surface characteristics as covered by the flux footprint climatologies. Yet, around 36% and 47% of sites still differed in the land-cover composition and EVI between the footprints and a 250 Table 1 Pooled results of the comparison of footprint-weighted and target-area EVI for all available site-months (N = 3307). Linear regression model: EVI target ~ β 0 + β 1 × EVI footprint . RMSE: root mean square error. MAE: Mean absolute error. Numbers in square brackets indicate 95% confidence intervals. See Figure 6 for the results from the 250 m and 3000 m target areas.   Figure S21. m target area, respectively. With limited footprint information across sites, previous studies often assumed flux data collected at the tower locations adequately represented the conditions at a certain set of fixed areas surrounding the towers (e.g., 1 × 1 or 3 × 3 km 2 ) (Tramontana et al., 2016;Verma et al., 2015;Xiao et al., 2014), or from a specific plant functional type (Chen et al., 2018;Williams et al., 2009). Such assumptions may not hold for many sites and would need to be revisited. We discuss potential approaches to being "footprint aware" in the following sections. We advocate that the representativeness indices can be used to identify and select sites suitable for specific applications considering the spatial scales of interest and relevant land surface characteristics-a subsetting approach. While simple and straightforward, the subsetting approach utilizes only a portion of available site-data. Also, the criteria of the representativeness levels may not be suitable for certain research and applications. Alternatively, the representativeness indices can be used to provide guidance of data usage for sites with different representativeness levels-a hybrid approach. For sites classified as "highrepresentativeness" for a given target area, the flux data may be used without additional detailed footprint analyses (after considering the limitations discussed below). Sites classified as "medium-representativeness" for a given target area were often located in a relatively heterogeneous or patchy landscape, and hence a prognostic method, such as pairing flux data with land surface characteristics at carefully selected pixels (i.e., a spatially-explicit method; DuBois et al., 2018;Fisher et al., 2020;Verma et al., 2015;Zhang et al., 2017), might be sufficient. Our monthly footprint climatologies (Figures 2, S10-S17, Datasets S2-S3) could be used to define the areas. Alternatively, a more sophisticated method that incorporates the subgrid variability in model simulations would be necessary to adequately link the observations with model outputs (Baldocchi et al., 2005;Bonan et al., 2012). Sites classified as "low-representativeness" for a given target area were typically located in a heterogeneous or patchy landscape, and additionally, were often characterized by time-varying footprint climatologies and/or distinct vegetation phenology within the landscape (e.g., different crop types in adjacent fields, a mix of different plant functional types, vegetated and non-vegetated cover). We advise that site-specific footprint analysis is needed to use and interpret flux data at these sites. The footprint quality flags recently adopted by AmeriFlux and Integrated Carbon Observation System (ICOS: https://www.icos-cp.eu/) or the time-explicit footprint weight maps provided by National Ecological Observatory Network (NEON: https://www.neonscience.org/) can be used to identify and filter out periods when flux footprints covered the less desired source areas. Several site-level studies also demonstrated the potential benefits of utilizing detailed footprint information in decomposing flux data signals in a patchy and heterogeneous site (Helbig et al., 2016;Tuovinen et al., 2019;Wang et al., 2006). Alternatively, a spatiotemporally-explicit method that incorporates both the temporal dynamics of footprints and the spatial variations of land surface characteristics is highly recommended (Chen et al., 2010;Fu et al., 2014;Ran et al., 2016;Xu et al., 2017a;Xu et al., 2017b).
Several cross-site footprint analyses were carried out at the network scales previously (Chen et al., 2011;Chen et al., 2012;Göckede et al., 2008;Rebmann et al., 2005;Wang et al., 2016). While incorporating certain aspects or metrics adopted in those previous studies, our approach is tailored to the modeling and synthesis data users who intend to use flux data for many sites. We demonstrate our approach's feasibility to be implemented "extensively" across a large number of sites and propose a simple representativeness index that could be used for site selection and data use guidance. Such an "extensive" approach can help subset data that better fulfill the space-time assumptions underlying a particular application. To address the space-time mismatches between flux observation and model simulation in more detail and utilize (rather than subset) the full set of available sites, we recommend a hierarchy of multiple approaches where feasible . For example, Griebel et al. (2020) proposed a practical approach to account for contributions of spatial sampling variations due to varying wind directions over heterogeneous surfaces for annual budgets of carbon fluxes. While the approach does not calculate footprints explicitly, it had relatively few implementation requirements and was successfully applied to 154 sites in the FLUXNET2015 dataset. Alternatively, a few studies proposed relatively "intensive" approaches to fully incorporate the footprint dynamics in footprint-to-target-area upscaling frameworks (Fu et al., 2014;Metzger et al., 2013;Ran et al., 2016;Reuss-Schmidt et al., 2019;Xu et al., 2017b). These approaches show merit for providing spatiotemporally-explicit flux data products that could be benchmarked against model predictions at designated grid cells. One such development even shows promise to close the surface energy imbalance frequently observed at flux towers (Metzger, 2018;Xu et al., 2020), which to date cripples data synthesis and model-data fusion with another pervasive bias (Cui and Chui, 2019;Mauder et al., 2020;Stoy et al., 2013). However, the data requirements for implementing these approaches across sites are comparatively high. These include the availability of fine-resolution (spatially and temporally) land surface characteristics and/or reprocessing the high-frequency eddy-covariance data. Our representativeness results could serve as a guide to identify and prioritize the sites where the intensive approach is deemed necessary .

Limitations and uncertainties
The FFP model-an analytical parameterization of a Lagrangian stochastic dispersion model-is robust in performance (Heidbach et al., 2017;Nicolini et al., 2017) and flexible enough to be applied across many sites over long periods. However, the FFP model, as well as most footprint models, has its assumptions and limitations. These include the assumption of steady-state conditions, a horizontally homogeneous turbulence field, no vertical advection, and the applicability of the Monin-Obukhov similarity theory (Rannik et al., 2012;Kljun et al., 2015). While we recognize these assumptions and carefully filter out less ideal sites and/or data periods (Table S2), it is challenging to systematically evaluate these assumptions (especially the last two) across AmeriFlux sites. We briefly discuss the potential uncertainties and implications here.
With flux data being carefully quality-controlled/filtered by the site investigators and a large number of (half-)hourly footprints aggregated for monthly footprint climatologies, we argue that the influences of non-steady states should be marginal on the final results. The unmet assumption of homogeneity at certain sites requires caution when interpreting the evaluation results. Similar to previous footprint-based evaluation studies (Chen et al., 2012;Göckede et al., 2008), our evaluation of footprint representativeness is inherently influenced by landscape heterogeneity because it violates the homogeneity assumption of most footprint models. The sites classified as "low-representativeness" might be prone to additional uncertainties resulting from an inhomogeneous turbulence field (Heidbach et al., 2017). Besides, ~14% of the studies sites were located in an undulated terrain, a strong slope (>10%), or a hilltop, and ~28% of the studied sites had the eddy-covariance measurement height within the roughness sublayer, assuming it extends to 1.5 times the aerodynamic canopy height (Chu et al., 2018). The possible occurrence of advection and violation of the Monin-Obukhov similarity theory may complicate the footprint calculations. We suggest using caution when interpreting results from these sites (listed in Dataset S4-S5). Further site-specific footprint analyses are highly recommended for these cases. It should be noted that the footprint models typically assume flux sources/sinks are located dominantly in the plant canopy. Such an assumption implies that our footprint evaluations are less applicable to CH 4 and nighttime CO 2 fluxes, especially at tall forest sites.
Additional uncertainties in footprint calculations could result from the selection of footprint models or uncertainties in the input variables/ parameters. First, we recognized that there are other types of footprint models such as the analytical models (e.g., Hsieh et al., 2000;Kormann and Meixner, 2001;Schmid, 2002;Schuepp et al., 1990) and Lagrangian stochastic models (e.g., Leclerc and Thurtell, 1990;Thomson, 1987). Our preliminary tests found that the extents and areas of footprint climatologies could vary substantially using a different footprint model (e. g., Kormann and Meixner, 2001 (KM01 model)) (Text S2, Figure S3). Yet, as the areas with higher footprint weights were largely overlapped between the FFP and KM01 models, we found only marginal differences in the results of footprint-to-target-area representativeness, footprint-weighted land-cover percentage and EVI (Text S2, Figures S4-S7). Footprint model benchmarking and validation is an active research area beyond the scope of this study, and we refer interested readers to the relevant studies (Arriga et al., 2017;Dumortier et al., 2019;Heidbach et al., 2017;Kumari et al., 2020;Nicolini et al., 2017). We also advocate that future research should aim to better quantify and constrain the uncertainty of footprint models, as ultimately, all footprint models are "models" and have their limitations and uncertainties. Second, we parameterized the roughness lengths and zero-plane displacement heights following Chu et al. (2018). This allowed us to track canopy dynamics (e.g., seasonal dynamics for croplands, Figures S25-S26) and evaluate many sites where required metadata (e.g., canopy structure) are not always available. Yet, the approach has its assumptions and limitations (Chu et al., 2018). Potential uncertainties or biases could result from the violation of the Monin-Obukhov similarity theory, an aerodynamically inhomogeneous turbulence field, or any unaccounted changes in canopy structures. Again, caution should be exercised when interpreting results at sites with complex terrain and relatively low eddy-covariance measurement heights because those sites likely violate the assumptions mentioned above. Third, the use of predicted V_SIGMA could introduce additional uncertainties in the footprint calculations, especially in the lateral dimension of the footprint area (Detto et al., 2008). Yet, we argue that the influences should be marginal on the final evaluation results, as our analyses were conducted using monthly footprint climatologies, aggregating footprints from many time steps and different wind directions. The aggregation process tends to cancel out random differences in the lateral dimensions of the half-hourly footprints. To facilitate an in-depth footprint analysis at a finer temporal scale (e.g., daily, hourly), we advocate that V_SIGMA should be calculated and provided when possible.
Our evaluations were constrained by the availability of both flux tower data and gridded land surface characteristics, which may not be comprehensive in all possible situations. We suggest that future research should further investigate the following aspects. First, the footprint climatology, which aggregates footprints from many time steps and provides a panoramic view of footprints over a certain period, smooths out the variable and extreme footprint conditions as seen by the halfhourly footprints (Text S4). We suggest future research can explore the use of half-hourly footprints directly if there is a need to resolve the fine-scale footprint variability (e.g., CH 4 flux; Helbig et al., 2016;Tuovinen et al., 2019) or if the data-model integration can be carried out at a finer temporal scale (e.g., Xu et al., 2017b). Second, both land-cover type and EVI are important land surface characteristics often used in modeling and upscaling tower-based CO 2 fluxes. However, the relation between the land surface characteristics and fluxes may be nonlinear and ecosystem-specific. We suggest that future research could explore other proxies of fluxes (e.g., machine-learning predictions, model outputs from mechanistic models) such as those used in the evaluation of the network-to-region representativeness (Carvalhais et al., 2010;Papale et al., 2015;Sulkava et al., 2011). Third, our preliminary tests found that ~82 sites had mismatches between the site's IGBP classifications and the dominant land-cover obtained from the land-cover data products (e.g., NLCD, Land Cover Canada). Around 59% of the mismatches could be attributed to differences in classifications, especially in patchy and sparsely-vegetated landscapes (e.g., savanna, shrub-grassland, forest-wetland mosaics), disturbed and restored sites (e.g., logged and replanted forests), and wetlands (see Text S3).
However, we found that around 41% of the mismatches were incorrect land-cover classifications. This discrepancy highlights the need for better assessments and improvements to the available land-cover data products (Jin et al., 2019;Wickham et al., 2017;Wickham et al., 2013). Fourth, the selection of land surface characteristics largely depends on the research questions and data availability (Chen et al., 2012;Kim et al., 2006;Reuss-Schmidt et al., 2019), generally focusing on the vegetation and landscape features that constrain the target fluxes. When available, we suggest considering other fine-resolution (i.e., tens of meters or finer) land surface characteristics in the evaluations. Potential land surface characteristics may include, but are not limited to, land surface temperature (e.g., ECOSTRESS) (Fisher et al., 2020), topographic characteristic, canopy structure, and soils (e.g., LiDAR, digital elevation model) (Chasmer et al., 2011;Giannico et al., 2018;Jucker et al., 2018), vegetation composition (Allred et al., 2020;Davidson et al., 2017), fine-scale disturbance (Xin et al., 2013), vegetation indices (e.g., leaf area index, land surface water index, near-infrared reflectance of vegetation, solar-induced fluorescence) (Griebel et al., 2016;Magney et al., 2019;Pasqualotto et al., 2019;Taddeo et al., 2019), repeated digital camera imagery (Richardson et al., 2018), and other spectral retrievals Singh et al., 2015). Potentially, a holistic approach utilizing multivariate metrics could be adopted similar to the previous works evaluating network representativeness (Chu et al., 2017;Hargrove and Hoffman, 2004;Villarreal et al., 2018). Fifth, the footprint aggregation's temporal interval could be tailored to the retrieval frequency of specific land surface characteristics. For example, with the new terrestrial-monitoring satellites (e.g., ECOSTRESS, Sentinel-2) (Belgiu and Csillik, 2018;Fisher et al., 2020) and other airborne platforms (e.g., unmanned aerial vehicle, airborne visible-infrared imaging spectrometer) becoming available (Chapman et al., 2019;Klosterman et al., 2018;Wang et al., 2019), evaluations could be carried out at a finer temporal scale (e.g., days or weeks). Similarly, proximal optical monitoring at the scale of the flux tower footprint can help understand how best to match remote-sensing and flux data in time and space (Gamon, 2015). Sixth, our evaluation focused on the site-level representativeness and chose 1-6 years of available data at all studied sites. With some AmeriFlux sites being operated for 15-30 years and experiencing changes in vegetation compositions and structures, future evaluations could be carried out separately for individual periods or years (Kim et al., 2018). Similar considerations also apply to ecosystems that experience rapid changes in vegetation compositions and structures (e. g., disturbed, intensively managed sites). Last, many remote-sensing datasets are aggregated in time and/or space (Robinson et al., 2017;Xin et al., 2013) or merged from multiple data sources (Allen et al., 2007;Hilker et al., 2009). This adds extra complexity in matching the spatial and temporal scales between flux and remote-sensing data. While beyond the immediate scope of this study, we recommend that future efforts consider the spatiotemporal scales and relate data aggregation of the flux data and the remotely sensed datasets.

Conclusions
Eddy-covariance flux data are spatiotemporally dynamic. Largescale flux datasets, such as FLUXNET2015 and AmeriFlux BASE, need to be used and interpreted with "footprint awareness", especially in research and applications intended to benchmark against models and/or data products with explicit spatial information. This includes, but is not limited to, point-scale simulations and spatially-explicit land surface models, remote-sensing-based models, and machine-learning upscaling studies. The extent and direction of the source areas contributing to the measured fluxes at flux tower sites (i.e., flux footprint) vary largely across sites and through time depending on wind direction, measurement heights, underlying surface characteristics, and turbulent states of the atmosphere. As most flux tower sites are located in heterogeneous landscapes, the spatial variations of land surface characteristics and the temporal dynamics of flux footprints jointly lead to the so-called representativeness issue, i.e., to what spatial extent do the flux measurements taken at individual tower locations reflect the flux conditions at the corresponding model or data-product grid cells. The commonly adopted model-data integration approach that assumes that the flux footprint represents a fixed target area near the tower universally across sites would certainly introduce biases (e.g., 4%-20% in EVI, 6%-20% in the dominant land-cover percentage). These biases' sign and magnitude were site-specific and varied depending on the target area extent and the land surface characteristics. We suggest that the (half-)hourly footprint information or the required variables for footprint calculations should be generated routinely and made available to the modeling and synthesis user communities when possible. In the meantime, our proposed representativeness indices could be used as a guide to identifying sites suitable for specific applications considering the spatial scales of interest and relevant land surface characteristics. Additionally, the representativeness indices provide general guidance for data use for sites with different representativeness levels, recognizing there are uncertainties in these as well. We advocate that future research should explore a hierarchy of multiple approaches to fully address the representativeness issue.

Declaration of Competing Interest
None