A review of globally available data sources for modelling the Water-Energy-Food Nexus

The Water-Energy-Food Nexus (WEFN) is gaining attention as an important approach to manage resources more holistically. Data availability, however, is a crucial barrier to WEFN implementation globally. To assist modellers and practitioners in navigating this barrier, ‘data hierarchies’ were created, based on a comprehensive review of open, globally available data in the water, energy, and food sectors. Global data sources considered include Satellite Remote Sensing (RS), Reanalysis, Global Observation Networks and Land Surface Models. The data hierarchies detail what data are suitable for modelling water, energy, and food availability globally. Furthermore, the review has highlighted which variables are not well monitored by global data sources, and for the first time has discussed global data sources’ interactions in the WEFN. The findings from the review and the associated data interactions have been used to recommend priority areas for improving global data availability from a WEFN perspective, which include streamflow, wind speed, and hydropower data.


Introduction
The Water-Energy-Food Nexus (WEFN) promotes coordination and collaboration across its constituent sectors (FAO, 2014), to identify trade-offs, synergies, and the most beneficial policies across all sectors (Bazilian et al., 2011). This builds on the integrated water resources management (IWRM) approach, but the WEFN instead considers the water, energy, and food sectors with equal importance (Wicaksono et al., 2017; Bach et al., 2012). The WEFN approach is critical to achieving the UN's Sustainable Development Goals (SDGs) relating to water, energy, and food (Pahl-Wostl, 2019) as well as the interlinkages between all SDGs (Scharlemann et al., 2020; Requejo-Castro et al., 2020). The nexus is thus an important research area globally for future sustainable development.
To implement the WEFN, science needs to identify the key trade-offs and synergies that exist between sectors. This is challenging, especially when considering data availability and compatibility across sectors. The availability of data is regarded as one of the major challenges in implementing the WEFN (Vinca et al., 2021b; Kaddoura and El Khatib, 2017; Liu et al., 2017). The aim of this paper is to assist WEFN practitioners in navigating this challenge around data availability by 1) assessing globally available data sources in each of the water, energy, and food sectors and 2) developing a hierarchical decision-support tool to select data suitable for nexus-based modelling at national-, regional-, or basin-scales. National-, regional-, and basin-scales are the focus of the paper because the WEFN is generally implemented at these spatial scales. Thus, the focus is on globally-available data rather than data suitable for global-scale analyses.
Globally-available data (hereafter global data), for example Satellite Remote Sensing (RS), have been used widely in water, energy, and/or food nexus assessments (Al Zayed and Elagib, 2017; Basheer et al., 2021; Alam et al., 2019) and for understanding water and energy cycles and budgets (L'Ecuyer et al., 2015; Rodell et al., 2015). Global data sources thus have a valuable role in WEFN analysis and understanding resource availability. However, the large number of data products and associated assessments of these data products for individual water, energy, and food sectors makes it difficult to evaluate them for the purposes of WEFN modelling. There has to date been no attempt to review and synthesize all globally available data for water, energy, and food sectors, nor to assess if there are variables which do not have any global data available. This paper addresses this gap.
While comprehensive global reviews have been completed for some variables, e.g. precipitation (Sun et al., 2018), reviews of other variables are limited, being outdated and/or not comprehensive. Solar energy estimation using global data sources has not been comprehensively reviewed since 2003 (Hammer et al., 2003), and even then, only RS was considered. The use of global data for understanding groundwater resources has not been comprehensively reviewed since 2007 (Jha et al., 2007; Becker, 2006). Reviews since then have focussed on one aspect of groundwater, e.g., groundwater potential mapping, considering various auxiliary data (Díaz-Alcaide and Martínez-Santos, 2019), but they have not considered all groundwater attributes that can be estimated by RS. Hydropower does not have any comprehensive review of the use of RS or other global data. Furthermore, some reviews are in sub-discipline literature and therefore may not be accessible to all WEFN practitioners, e.g. an irrigation focus for evapotranspiration (ET) products (Massari et al., 2021). Thus, it can be difficult for practitioners working in the WEFN to find global data sources when local data are unavailable.
This paper aims to address the gaps in knowledge of open, global data that are suitable for modelling and implementing the WEFN. Thus, data from global ground observation databases, Land Surface Models (LSMs), reanalysis, and RS are reviewed. Literature was reviewed for each of the water, energy, and food sectors, to summarise data availability, resolution, and the uncertainties of these data types and sources. This paper provides guidance for the rapid assessment of water, energy, and food data availability as the first step in the design and implementation of the WEFN at regional, national, and transboundary levels.

Globally available data for the WEFN
There are four main types of global datasets discussed here: RS, reanalysis, LSMs, and global ground observation databases. The focus here is on primary data sources, so although there are other datasets available from, for example, process-based numerical models and machine learning models, these have generally not been included to limit the scope of the review. Where these outputs have promise in terms of addressing important gaps for the WEFN, readers are directed to pertinent resources in the relevant sub-section. In this section, an overview of the datasets and the general advantages and limitations of each is provided. Benefits at global scales may not always apply for specific locations, and local case studies may still be required to understand what datasets are best for any given location.
Throughout this review, "variable" has been used to describe each data type or source, i.e., soil moisture, wind energy, solar energy, and crop area are all referred to as variables. Some of the "variables" could alternatively be referred to as processes, parameters, forcings or predictors, particularly in a modelling context. "Variables" has been chosen for consistency. Furthermore, when discussing WEFN modelling, reference is to both conceptual and numerical modelling, where the former is a framework for representing the WEFN interactions and systems and the latter represents a collection of mathematical equations to simulate one or more variables.

Remote sensing
RS provides information about an area, object, or phenomenon without contact with said area, object, or phenomenon (Lillesand et al., 2015). This can include data collected by ground based RS, aircraft, unmanned aerial vehicles, or satellites. Satellite RS is the primary focus of RS discussed in this paper. There are two types of RS data available: passive, where sensors record the Earth's naturally emitted energy; and active, where energy is transmitted by a sensor to elicit information of interest (Lillesand et al., 2015). In either case, the variables of interest are not directly measured by RS, and numerical models are used to estimate them from the direct measurements.
Optical RS imagery has the advantage of being available at relatively high spatial resolution (Musa et al., 2015), but cloudy conditions present substantial barriers if data are required at fine temporal resolution. Radar and microwave RS can penetrate clouds and can also measure Earth systems at night (Musa et al., 2015).

Reanalysis
Reanalysis datasets use data assimilation to constrain earth system models with observation data to generate gridded simulations across the earth's surface and vertically through the atmosphere and oceans (Bosilovich et al., 2008). The reanalysis models considered here for the variables of interest are routinely updated to present day (with a small delay for collection and processing of information). Spatial resolution is lower than many RS products, but temporal coverage is usually much longer. Reanalyses estimate almost all the variables discussed in this review. However, due to their relatively coarse resolution, reanalysis data do not provide useful estimates for numerical modelling of all the WEFN variables discussed here at the spatial scales of interest.

Land surface models
LSMs are numerical models which simulate coupled land surface-atmosphere processes, including carbon, energy, and water (Bonan, 1996). When included as part of a General Circulation Model (GCM) or Earth System Model with potential future radiative forcings, LSMs can also be used to project future changes of the Earth system, including changes in surface hydrology, vegetation, and crops (Fisher and Koven, 2020), all relevant to the WEFN.

Global openly available observation networks and gridded products
Global observation networks, based on in situ measurements, are available for some important variables for modelling the WEFN. One of the limitations of these networks is their lack of spatial and temporal homogeneity, because the ground observation network density has been variable in time and space. Global observation networks frequently lack data in remote regions and over large bodies of water, including lakes. Thus, gridded products developed from gauged data have varying bias and uncertainty according to gauge density. Furthermore, although the overall product may cover a long period, missing data at individual sites can be a substantial problem and the maximum temporal coverage is usually only available for a subset of the sites.

Water, energy, food data review
The datasets reviewed cover variables required for understanding water, energy, and food availability. The most widely used data in the literature for each sector are covered. Secondary data used in equations for estimating variables are not considered, e.g., the inputs for the Penman-Monteith estimates of evapotranspiration have been omitted to limit the scope of the review. Ground and aerial based RS products have also been omitted, as these are not globally available. Temporal resolution is an important consideration when selecting datasets. Monthly time steps are commonly used in WEFN studies (Bakhshianlamouki et al., 2020; Sušnik et al., 2021; Yang et al., 2016) and capture seasonality. However, finer temporal resolutions would be necessary if the WEFN modelling is designed to understand extreme event impacts, system planning, or nonlinear interactions among variables. Furthermore, whilst monthly resolution is considered sufficient for capturing seasonality, monthly values produced from coarse temporal resolution input data could omit important events within that month for certain variables, causing uncertainties in datasets. An example of this is river discharge: although a satellite such as SWOT may meet the monthly resolution requirement with its overpass interval of 21 days (NASA, 2020), a single observation in the month is unlikely to be representative of flow for the whole month in many locations. Thus, the temporal fluctuation of a variable is important when considering whether a monthly observation of a given product is suitable.
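The river discharge example can be illustrated with a minimal Python sketch. The flow series is entirely synthetic (a constant baseflow plus one short flood pulse), and the function name and all parameter values are hypothetical choices made for illustration rather than values from any dataset reviewed here.

```python
import math


def daily_flow(day: int) -> float:
    """Synthetic daily discharge (m3/s): steady baseflow plus one short flood pulse."""
    baseflow = 50.0
    # Hypothetical flood peak centred on day 10 of the month
    pulse = 400.0 * math.exp(-((day - 10) ** 2) / 2.0)
    return baseflow + pulse


days = range(1, 31)                        # a 30-day month
flows = [daily_flow(d) for d in days]

monthly_mean = sum(flows) / len(flows)     # what a daily gauge record would give
single_overpass = daily_flow(21)           # one satellite sample, here on day 21

relative_error = (single_overpass - monthly_mean) / monthly_mean
print(f"monthly mean   : {monthly_mean:.1f} m3/s")
print(f"single overpass: {single_overpass:.1f} m3/s")
print(f"relative error : {relative_error:+.0%}")
```

In this constructed case the single overpass misses the flood entirely and underestimates the monthly mean flow. A flashier or more seasonal regime would change the size of the error, which is why the temporal behaviour of each variable needs to be considered.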

Water
Although a WEFN framework considers all sectors equally, water does have a special role in the nexus, as in many applications water cannot be replaced with alternative resources (Cansino-Loeza and Ponce-Ortega, 2021; Zhang et al., 2018; World Economic Forum, 2011). There is a wide range of open, globally available data and tools for hydrological modelling and understanding, covering most hydrological processes (Table 1).

Precipitation
Precipitation is the major driver of surface water availability and variability in the hydrological cycle and thus, precipitation data are imperative for estimating runoff. Global precipitation data can be collected from global databases of ground observations, satellite RS, and reanalysis (Fig. 1). Although precipitation data are available worldwide from ground observations, often the density of stations is too low to provide accurate distributions of precipitation over larger areas (Kidd and Huffman, 2011). This affects water balance in hydrological models, and further impacts the irrigation requirement estimates for the food sector. For more extensive details of existing precipitation products, reference should be made to Sun et al. (2018).
Different precipitation data are advised in the data hierarchy, with the choice of product based on gauge network density, temperature/climate, and timespan of data required (Fig. 1). Reliability of gridded surface gauge data depends on the availability of gauges in the region and the quality of gauged data; reliability is lower where station density is lower (Rudolf et al., 1994; Schamm et al., 2014), and the published temporal coverage of a product does not always represent temporal coverage of data in the area of concern. Thus, where station density is low or temporal coverage is short, gridded surface gauge products should not be used.
Depending on climate, different datasets have varying accuracy in representing rainfall. At higher latitudes (>40°), reanalysis products are recommended because RS products are often unavailable at high latitudes (Tang et al., 2020; Beck et al., 2017). Where ground is frozen, or snow cover is present, some RS products do not estimate rainfall well, and reanalysis sometimes performs better (Massari et al., 2017). In summer months in tropical wet climates, RS often performs better than reanalysis, which does not represent localised convective storms well (Beck et al., 2019a; Hu et al., 2016; Massari et al., 2017; Beck et al., 2017). Several RS sensors are used for rainfall data, including Passive Microwave (PM), Active Microwave (AM), and infrared (IR) (Michaelides et al., 2009). PM spatial and temporal sampling is often too low for localised events, whilst neither PM nor IR captures warm orographic rainfall over complex terrain well (Derin and Yilmaz, 2014). A combination of passive and active sensors is beneficial for estimating precipitation (Levizzani and Cattani, 2019). Temperate regions are usually better represented than arid regions (Xiang et al., 2021). Merged products, which combine RS and reanalysis and are corrected with gauged data, typically represent rainfall best (Xiang et al., 2021; Beck et al., 2017; Beck et al., 2019a), and are thus recommended first.

Table 1
Open, globally available data used for water modelling.

There are major challenges in collecting snow depth and converting it to snow water equivalent (SWE) data using RS, especially in mountainous areas, although snow cover extent is better understood (Lettenmaier et al., 2015; Skofronick-Jackson et al., 2015). There are also challenges in understanding the performance of precipitation products in many snow covered regions, due to a lack of in situ gauges (Palerme et al., 2017; Wrzesien et al., 2019). SWE is of particular importance for WEFN analyses in mountainous catchments where snowmelt dominates annual and seasonal water flows, providing water for hydropower and crops. Optical sensors can estimate snow cover area, whilst passive and active microwave sensors are capable of estimating SWE, although PM's spatial resolution limits its ability in mountainous areas (Largeron et al., 2020). Several reanalysis datasets can also be used to monitor snow water depth, including MERRA, ERA, and GLDAS, albeit with large uncertainties (Wrzesien et al., 2019). In the European Alps, CHIRPS, a merged precipitation dataset, had reasonable skill in modelling snow (Weber et al., 2021). Treichler and Kääb (2017) detected snow depth with ICESat and high accuracy DEMs. However, ICESat's revisit time (91 days) means it is unsuitable for monthly analysis. Lievens et al. (2019) showed that Sentinel-1 SAR could capture inter-annual variations and spatial variability of snow in mountain ranges in the northern hemisphere. A more comprehensive review on the current state of snow depth estimation can be found in Largeron et al. (2020).
Temporal coverage of precipitation data varies widely (Table 2). If over 40 years of data are required, e.g. for climate scenarios, reanalysis is better (Sun et al., 2018), because merged and RS datasets are often only available for 20-40 years (Beck et al., 2019b). Whilst 30 to 40 years of data are adequate for analysing climatological means and seasonal variability, longer time series are generally required to understand the impacts of interannual variability, including droughts and pluvial extremes. However, in many locations, even century-long observational records do not capture the full spectrum of such variability, particularly when non-stationarity is exacerbated by anthropogenic climate change (Milly et al., 2008). Resilient WEFN policies will likely require testing with climate scenarios and/or paleoclimate-informed stochastic simulations.
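The selection criteria described in the text (record length, latitude, frozen ground or snow, and gauge density) can be expressed as a small rule set. The Python sketch below is an illustrative reading of the prose only and is not a reproduction of the hierarchy in Fig. 1. The class, field, and function names are hypothetical, the gauge density flag is left as a user judgement because no threshold is given, and placing gridded gauge products last is an arbitrary choice of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudyArea:
    """Hypothetical description of a study area (all fields are user judgements)."""
    latitude_deg: float             # signed latitude of the area centroid
    years_required: int             # length of record the analysis needs
    frozen_or_snow: bool            # frozen ground or snow cover is common
    dense_long_gauge_record: bool   # gauges are dense AND cover the period needed


def rank_precipitation_sources(area: StudyArea) -> List[str]:
    """Return data types in a rough order of preference (illustrative only)."""
    if area.years_required > 40:
        # Merged and RS records are often only 20-40 years long
        ranking = ["reanalysis"]
    elif abs(area.latitude_deg) > 40 or area.frozen_or_snow:
        # RS is often unavailable or less reliable in these conditions
        ranking = ["merged", "reanalysis", "remote sensing"]
    else:
        # e.g. tropical wet climates, where RS captures convective storms better
        ranking = ["merged", "remote sensing", "reanalysis"]
    if area.dense_long_gauge_record:
        # Position at the end of the list is an assumption of this sketch
        ranking.append("gridded gauge")
    return ranking


example = StudyArea(latitude_deg=-12.0, years_required=25,
                    frozen_or_snow=False, dense_long_gauge_record=False)
print(rank_precipitation_sources(example))
```

Returning an ordered list rather than a single answer reflects the point that local case studies may still be needed to confirm which dataset performs best at a given location.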
The WEFN is often analysed using a monthly temporal resolution and at this resolution, most precipitation products have adequate skill. However, if the impacts of short duration extreme events are important then products should be chosen accordingly (Table 2). For example, the cross sectoral impacts of short duration, extreme floods on food supply and transport, particularly in the context of spatially or temporally compound events, are an active area of research.

Fig. 2. Evapotranspiration data hierarchy. The products mentioned here are those which contain ET data which can be directly downloaded.

Evapotranspiration
ET data are imperative in deriving irrigation requirements (Brouwer and Heibloem, 1985) and catchment losses. Thus, ET represents a key water-food interaction in the WEFN. ET is physically driven by solar radiation, wind speed, humidity deficit, temperature, soil moisture, crop type, and crop stage (Brouwer and Heibloem, 1985). ET is difficult to measure directly and there are many methods for estimating ET, including empirical, energy budget, deterministic, and vegetation index (VI) methods, which have varying levels of complexity and data requirements (McMahon et al., 2013). As a result, ET differs from the other variables reviewed here, with algorithms using a variety of data sources, combining indices and products (Zhang et al., 2016a). ET has high spatial variability, making in situ data collection at adequate spatial resolution difficult (Liu et al., 2010). This section gives a brief overview of ET numerical models that directly provide globally available ET estimates.
The driving factors for selecting ET data are temporal coverage, spatial resolution, and availability of in-country datasets (Fig. 2). If over 20 years of data are required for analysis in the WEFN, reanalysis-merged datasets, such as GLEAM V3.6a or NTSG, are available from 1980 (Table 3). If available, national ET datasets are advised because of their finer spatiotemporal resolution and higher accuracy due to use of local data. Examples include OpenET (OpenET, 2021) and LANID (Xie et al., 2021) in the US, and TERN CMRSET in Australia, available at 30 m spatial resolution (McVicar et al., 2022).
The spatial resolutions of global products range from 70 m to >1 km. The PML_V2 and MOD16A2 visual spectrum products were the only two actual ET (ETa) products available globally at 500 m resolution, until the ECOSTRESS PT-JPL product was launched in 2018 at 70 m resolution (Guerschman et al., 2022) (Table 3). However, the short temporal coverage of ECOSTRESS is unsuitable for many applications. Finer spatial resolution products are necessary to understand crop variations and food security (Shi et al., 2014; Thenkabail et al., 2010). Global ET products do not exactly represent crop water requirements, and this needs to be considered when modelling irrigation for the WEFN.
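To make the distinction between an ET product and a crop water requirement concrete, the sketch below uses the widely taught crop coefficient approach, in which crop evapotranspiration is a reference ET scaled by a crop coefficient, and the net irrigation need is what remains after effective rainfall. This formulation is assumed here for illustration and is not taken from the products reviewed above. The function names and the example values (reference ET, crop coefficient, effective rainfall) are hypothetical.

```python
def crop_water_requirement(et0_mm: float, kc: float) -> float:
    """Crop evapotranspiration ETc = Kc * ET0 (mm over the period)."""
    return kc * et0_mm


def net_irrigation_requirement(et0_mm: float, kc: float,
                               effective_rain_mm: float) -> float:
    """Irrigation need = ETc minus effective rainfall, never negative."""
    etc_mm = crop_water_requirement(et0_mm, kc)
    return max(0.0, etc_mm - effective_rain_mm)


# Hypothetical monthly values for a mid-season crop
need_mm = net_irrigation_requirement(et0_mm=150.0, kc=1.15, effective_rain_mm=60.0)
print(f"net irrigation requirement: {need_mm:.1f} mm/month")
```

An actual ET product already reflects water stress and mixed land cover within a pixel, so it cannot simply be substituted for the crop water requirement in this calculation without further assumptions.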
Many ET data products merge RS with ground observations, land surface models that include data assimilation, and/or reanalysis (Liu et al., 2010; Guerschman et al., 2022; Peña-Arancibia and Ahmad, 2020; Zhang et al., 2019; Li et al., 2009; Courault et al., 2005). Thermal infrared (TIR) data are particularly useful in the majority of these methods, whilst visual based RS is also widely used, e.g. SSEBop (Senay et al., 2020) and MOD16A2 (Mu et al., 2011). Reference should be made to Li et al. (2009) and Zhang et al. (2016a) for more information on the methods, advantages and limitations of different ET calculation methods.

Streamflow
The WEFN is a recommended approach for developing transboundary river agreements (Keskinen et al., 2016; De Strasser et al., 2016; UNECE, 2015), and thus streamflow data are a necessity. RS, LSMs, or reanalyses alone do not provide reliable data for streamflow, although some global modelling efforts have produced streamflow data. The only openly available dataset providing streamflow measurements, requiring no hydrological modelling or analysis, is the Global Runoff Data Centre (GRDC) dataset, with data provided for over 10,000 stations (The Global Runoff Data Centre, 2020). However, data coverage is not equally distributed across the world, and whilst some data are available from the 1800s to present, other stations are not up to date (The Global Runoff Data Centre, 2020). The Global Streamflow Indices and Metadata (GSIM) Archive also provides metadata on temporal and spatial availability and links for access to streamflow data (Do et al., 2018).
RS cannot reliably measure streamflow in ungauged basins without hydrological modelling or in situ observations (Van Dijk et al., 2016). RS data can be used to supplement sparsely gauged river data in space and time (Pham et al., 2018) and for calibrating hydrologic models (Gleason and Durand, 2020). RS can also be used for prediction in ungauged basins through transfer functions relating streamflow data from geologically and climatologically similar gauged catchments (Jódar et al., 2018). Width/stage-discharge rating curves can be derived from fine spatial resolution RS, including Jason-2, MODIS, ENVISAT, and Landsat (Wang and Xie, 2018; Papa et al., 2012). The Surface Water and Ocean Topography Mission (SWOT) was launched in December 2022 and may contribute to improved understanding of streamflow. With three products, SWOT will monitor approximately 200,000 rivers with over 100 m width, with overpasses every 21 days (NASA, 2020). Although this is a relatively coarse temporal resolution, it should be useful for understanding seasonal streamflow dynamics at monthly scale for the WEFN.
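As an illustration of how a stage-discharge rating curve links RS water levels to flow, the sketch below fits a power law of the form Q = a(h - h0)^b by least squares on log-transformed values. The power-law form is a common hydrological convention assumed here, not a method prescribed by the studies cited above. The function names are hypothetical, the cease-to-flow stage h0 is assumed known, and the paired stage and discharge values are synthetic.

```python
import math
from typing import Sequence, Tuple


def fit_rating_curve(stages_m: Sequence[float],
                     discharges_m3s: Sequence[float],
                     h0_m: float = 0.0) -> Tuple[float, float]:
    """Fit Q = a * (h - h0)**b by least squares on log-transformed data."""
    x = [math.log(h - h0_m) for h in stages_m]
    y = [math.log(q) for q in discharges_m3s]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                       # slope in log space is the exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log space gives the coefficient
    return a, b


def discharge_from_stage(stage_m: float, a: float, b: float,
                         h0_m: float = 0.0) -> float:
    """Apply the fitted curve to a new stage observation, e.g. from altimetry."""
    return a * (stage_m - h0_m) ** b


# Synthetic calibration pairs generated from a = 12, b = 1.8
stages = [0.5, 1.0, 1.5, 2.0, 3.0]
flows = [12.0 * h ** 1.8 for h in stages]
a_fit, b_fit = fit_rating_curve(stages, flows)
print(f"a = {a_fit:.2f}, b = {b_fit:.2f}")
print(f"Q at 2.5 m stage = {discharge_from_stage(2.5, a_fit, b_fit):.1f} m3/s")
```

The calibration pairs still have to come from in situ discharge measurements or a hydraulic model, which is consistent with the point that RS alone cannot reliably measure streamflow.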
An emerging area of research is the development of global streamflow reanalysis products and machine learning models for estimating global streamflow. These products are model derived, are generally at coarse spatial resolutions and can have substantial biases, but may still be useful in terms of understanding seasonal dynamics or if in situ data are available for bias correction (Alfieri et al., 2020). GloFAS-ERA5 streamflow reanalysis covers the period 1979-present with a daily time step and 0.1 degree resolution (Harrigan et al., 2020). Lin et al. (2019) used RS, reanalysis, and a ground observation network to produce daily and monthly streamflow estimates at 2.94 million reaches from 1979 to 2014, in the Global Reach-level A priori Discharge Estimates for Surface Water and Ocean Topography (GRADES) product. The GRADES product was evaluated at approximately 14,000 gauges and although monthly correlations were generally high, there were substantial biases (>20%) at the majority of locations. Gridded monthly runoff data at 0.5 degree spatial resolution have also been estimated using machine learning, covering the period from 1902 to 2014 (Ghiggi et al., 2021).
As precipitation data are more easily collected and hence more widely available, basin scale hydrological models are most commonly used for estimating runoff in rivers, using the available precipitation data described above (Wang and Xie, 2018). RS can be used to determine inputs for hydrologic models, including land-use, soil, vegetation, drainage, area, elevation and slope data (Coskun et al., 2010). Numerical modelling is not the focus of this paper, but an overview of the different types of hydrological modelling available, and the benefits and drawbacks of these models, can be found in Fatichi et al. (2016), Sitterson et al. (2018), or Jaiswal et al. (2020). Despite attempts to calibrate hydrologic models without in situ data (Sun et al., 2015; Emery et al., 2018), such models generally still require calibration against ground observed records. Thus, some streamflow records are still required.
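The observation that a modelled streamflow product can correlate well with gauge records while still being substantially biased can be shown with a short sketch. The metric definitions (percent bias with positive values meaning overestimation, and Pearson correlation) and the mean-ratio correction are assumptions chosen for illustration. They are not the evaluation or correction procedures of the cited products, and the data are synthetic.

```python
import math
from typing import List, Sequence


def percent_bias(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Percent bias; positive values mean the model overestimates total flow."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


def pearson_r(sim: Sequence[float], obs: Sequence[float]) -> float:
    """Pearson correlation coefficient between simulated and observed flows."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def scale_to_observed_mean(sim: Sequence[float], obs: Sequence[float]) -> List[float]:
    """Simplest possible bias correction: rescale so the totals match."""
    factor = sum(obs) / sum(sim)
    return [s * factor for s in sim]


# Synthetic monthly flows (m3/s): the model has the right timing but is 30% too high
observed = [10.0, 20.0, 40.0, 30.0, 15.0, 8.0]
simulated = [1.3 * q for q in observed]

print(f"correlation : {pearson_r(simulated, observed):.2f}")
print(f"percent bias: {percent_bias(simulated, observed):+.0f}%")
corrected = scale_to_observed_mean(simulated, observed)
print(f"bias after scaling: {percent_bias(corrected, observed):+.0f}%")
```

The correction step needs gauge data for the same river, so the sketch also illustrates why some in situ streamflow records remain necessary even when global model products are available.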

Soil moisture
Soil moisture information is important in irrigation scheduling for agriculture and in estimating runoff. There are generally three methods for estimating soil moisture content: in situ measurements, LSMs, and RS. There are some in situ networks of soil moisture data, including the Cosmic-ray Soil Moisture Observing System (COSMOS) (Zreda et al., 2012) and the International Soil Moisture Network (ISMN) (Dorigo et al., 2021). However, Njoku et al. (2003) concluded that ground networks are largely unsuitable for representing the high spatial and temporal variability of soil moisture, and there has been limited improvement in terms of spatial or temporal coverage in the following 20 years. RS and LSMs with data assimilation can represent the spatial and temporal variability well but introduce their own uncertainties.
Globally available soil moisture data include RS, LSMs with data assimilation, and reanalysis products (Fig. 3). In most cases only the top 5 cm depth of soil moisture is measured by RS, and though this is used widely in RS soil moisture studies, the depth of the root zone in agriculture means that estimates of the top 5 cm of soil moisture are likely insufficient for WEFN studies. Thus, LSMs with data assimilation (hereafter LSMs) are useful to understand soil moisture at greater depths, particularly where vegetation is dense (Nicolai-Shaw et al., 2015). GLDAS (an LSM) is used widely to estimate soil moisture deeper than 5 cm and performs well (Spennemann et al., 2015).
Where RS is recommended (Fig. 3), SMAP should be considered first as it has outperformed other RS datasets in most cases globally and has higher spatial resolution than other products (Table 4) (Chen et al., 2018a; Fan et al., 2022; Kumar et al., 2018; Montzka et al., 2017). Additionally, SMAP Level 4 (L4) can estimate soil moisture deeper than 5 cm. If temporal coverage is required from before 2000, ESA CCI is recommended (Qiu et al., 2016; Li et al., 2022) (Table 4). In dense vegetation, the ability of microwave RS products is generally reduced (Duygu and Akyürek, 2019). Although there have been cases where RS products perform well in dense vegetation (Kim et al., 2020), LSM products have been shown to perform more consistently over a range of dense vegetation (Fang et al., 2016).

Groundwater
Groundwater (GW) is water stored in soil and porous rock underneath the Earth's surface, representing up to 33% of the world's water withdrawals (Siebert et al., 2010; Famiglietti, 2014). It is important as a primary drinking water source for much of the world's population and for irrigation water, but in many regions of the world it is poorly monitored (Famiglietti, 2014). Important considerations for groundwater from a WEFN perspective include surface water-GW interactions, the availability of GW, and understanding natural and artificial recharge (Jha et al., 2007). In situ observations are rarely available due to the cost of installation and monitoring (Li et al., 2019). The only global data source for GW measurements is the Global Groundwater Information System (GGIS), which provides locations of transboundary aquifers, a global monitoring network, and sites of managed aquifer recharge (https://ggis.un-igrac.org/).
No global data currently exist to quantify GW availability, but RS can be used as a proxy (Fig. 4). More information on the use of RS for GW monitoring data generally can be found in Becker (2006) and Jha et al. (2007). There have been no comprehensive reviews on global data for all GW information since 2007, with more recent reviews only considering one aspect of GW. Features important to GW availability such as topography, vegetation, springs and seeps can be detected and defined

Fig. 5. Lake data hierarchy. RS is recommended for global estimations of lake area, level, and storage changes. With regards to depth changes, global databases which contain altimetry data should be referred to first. Area can be estimated from several sensors using algorithms discussed below. Using both altimetry and area data, storage changes can then be calculated.
GRACE and GRACE-FO have received substantial attention for measuring seasonal mass changes associated with overall water storage, including GW, soil moisture, surface water, snow, ice, intercepted water, and biomass (Li et al., 2019). GRACE data can be used to estimate interannual variability of GW worldwide in large catchments (Yeh et al., 2006; Frappart and Ramillien, 2018; Rodell et al., 2009), noting the need to isolate GW from other terrestrial water sources, thus relying on other hydrological information (Rodell et al., 2009). This makes GRACE post-processing requirements substantial and limits its use in WEFN studies.
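The isolation of GW from total water storage is, at its simplest, a residual calculation: the other storage components are subtracted from the GRACE total. The sketch below assumes all series are anomalies in the same units relative to the same baseline period, and it neglects ice, intercepted water, and biomass for brevity. The function name and the example values are hypothetical, and real processing involves further steps (e.g. filtering and scaling) that are not shown.

```python
from typing import List, Sequence


def groundwater_anomaly(total_storage: Sequence[float],
                        soil_moisture: Sequence[float],
                        snow_water_equivalent: Sequence[float],
                        surface_water: Sequence[float]) -> List[float]:
    """Residual GW storage anomaly: total minus the other components.

    All inputs are anomalies in the same units (e.g. cm equivalent water
    height) and relative to the same baseline period.
    """
    return [
        tws - (sm + swe + sw)
        for tws, sm, swe, sw in zip(total_storage, soil_moisture,
                                    snow_water_equivalent, surface_water)
    ]


# Hypothetical monthly anomalies (cm equivalent water height)
gw = groundwater_anomaly(
    total_storage=[5.0, -2.0],
    soil_moisture=[2.0, -1.0],
    snow_water_equivalent=[1.0, 0.0],
    surface_water=[0.5, -0.5],
)
print(gw)
```

Because the soil moisture, snow, and surface water terms typically come from LSMs or other products, their uncertainties propagate directly into the GW estimate.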

Lake/reservoir storage
Lakes and reservoirs (hereafter, lakes, encompassing both, without neglecting the importance of reservoir storage in the WEFN) with surface area > 50 km² cover more than 1.4% of global land surface (excluding Antarctica and glaciated Greenland) (Lehner and Döll, 2004).
Lakes provide water for agriculture and energy, and are affected by these two sectors, thus creating a central link in the WEFN. Storage is affected in the long-term by climate variability and change, and at all time scales through hydropower and irrigation (Crétaux et al., 2011a). However, lake and reservoir management is very rarely reported. RS presents a good opportunity for lake monitoring in hydrology (Dörnhöfer and Oppelt, 2016).
RS can be used to estimate lake area and water level, and to infer storage changes (Fig. 5) (Crétaux and Birkett, 2006; Mohsen et al., 2018; Lu et al., 2013). Altimetry missions were originally designed to measure ocean surface height, but their data have since been used for studying lake levels (Singh et al., 2012), and are recommended here. To estimate lake surface area, SAR, visible, IR, and passive microwave sensors can be used (Lu et al., 2013; Mohsen et al., 2018; Rokni et al., 2014). If approximately synchronous area and level variations are estimated from RS, it is possible to infer storage changes without in situ data (Baup et al., 2014). This would only account for changes in storage, not total storage, for which bathymetry and topography data are usually necessary. This information is difficult to obtain, and RS data require substantial processing to derive it (Lu et al., 2013).
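A minimal sketch of inferring a storage change from two approximately synchronous area and level observations is given below. It assumes that area varies linearly with level between the two dates (a trapezoidal approximation), which is an assumption of this sketch rather than the method of the studies cited above. The function name and the example values are hypothetical.

```python
def storage_change_mcm(area1_km2: float, level1_m: float,
                       area2_km2: float, level2_m: float) -> float:
    """Storage change in million cubic metres between two observation dates.

    Assumes area varies linearly with level between the two observations.
    1 km2 x 1 m = 1 million m3, so no further unit conversion is needed.
    """
    mean_area_km2 = (area1_km2 + area2_km2) / 2.0
    return (level2_m - level1_m) * mean_area_km2


# Hypothetical reservoir: level rises 1.5 m while area grows from 100 to 110 km2
change = storage_change_mcm(area1_km2=100.0, level1_m=250.0,
                            area2_km2=110.0, level2_m=251.5)
print(f"storage change: {change:.1f} million m3")   # 157.5 million m3
```

Summing such increments over a time series gives relative storage only. An absolute volume would still require bathymetry below the lowest observed level.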
Laser and radar altimetry are available for water level data, with laser altimeters providing higher spatial but lower temporal resolution compared to radar (Gao, 2015; Crétaux and Birkett, 2006). Laser altimetry (e.g. ICESat) has better than 30 cm accuracy in most cases (Yuan et al., 2020; Xu et al., 2021), while radar accuracy ranges from a few centimetres in larger lakes to tens of centimetres in smaller lakes due to fewer samples for averaging to compute the mean lake level against datum (Crétaux and Birkett, 2006; Crétaux et al., 2018). Despite the good spatial resolution of ICESat, its overpass time is only every 91 days, which would be unsuitable for many WEFN studies (Table 5). If longer consistent data are required, a combination of altimetry products may be necessary (Singh et al., 2012).
Altimetry products can be hard to process without some expertise in RS (Abdalla et al., 2021); thus, lake level databases derived from RS are recommended if they cover the study area. Databases with lake levels include HYDROWEB (Santos Da Silva et al., 2010; Crétaux et al., 2011b), G-REALM (Birkett et al., 2011), Zhao and Gao (2018), Shen et al. (2022), and DAHITI (Schwatke et al., 2015). Although these data are available globally, the number of lakes captured by RS is limited and processing of altimetry data for small water bodies remains a challenge (Sulistioadi et al., 2015; Crétaux et al., 2018). The high resolution SWOT mission (discussed in Section 3.1.3) should enable monitoring of many more water bodies worldwide (Nair et al., 2021; Grippa et al., 2019), although this will cause inhomogeneities in long-term data records in the future. Thus, effort is required to make historic lake altimetry data more readily available for WEFN studies.
Several RS products are available for mapping water surface area. SAR provides useful data in cases of darkness, cloud cover, and forest cover; however, emergent vegetation and wind increase uncertainties (Smith, 1997). Passive microwave can also detect water under cloudy conditions and dense vegetation, but its spatial resolution is coarse (Gao, 2015; Ji et al., 2009). Since at least the 1990s, passive visible/IR sensors have been easily accessible, available at higher spatial and temporal resolutions, and have provided good delineation where clouds, dense vegetation, and smoke do not obscure the water surface (Smith, 1997; Alsdorf et al., 2007; Ji et al., 2009). This is still the case, and visible sensors are still widely used in water surface mapping, especially at large spatial scales and coarse temporal resolutions. Such sensors include Landsat, MODIS, ASTER, and AVHRR (Table 6). Water features can be delineated from these sensors using several methods. Both NDWI and MNDWI have been successfully used (El-Shirbeny and Abutaleb, 2018; Duan and Bastiaanssen, 2013; Rokni et al., 2014; Sarp and Ozcelik, 2017). NDWI overestimates water area in built-up areas; thus, where there is substantial anthropogenic development, MNDWI might be better (Xu, 2006).
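To make the difference between the two water indices concrete, the following sketch computes NDWI as (green − NIR)/(green + NIR) and MNDWI as (green − SWIR)/(green + SWIR) for two pixels. The reflectance values, the zero threshold, and the function names are hypothetical and chosen only for illustration; thresholds in practice are scene-specific.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green, nir):
    """Normalised Difference Water Index from green and NIR reflectance."""
    return normalized_difference(green, nir)


def mndwi(green, swir):
    """Modified NDWI, which swaps the NIR band for a shortwave-IR band."""
    return normalized_difference(green, swir)


def is_water(index, threshold=0.0):
    """Classify a pixel as water when its index exceeds the threshold (assumed 0)."""
    return index > threshold


# Hypothetical surface reflectances (green, NIR, SWIR) for two pixels.
pixels = {"open water": (0.10, 0.04, 0.02), "built-up": (0.20, 0.18, 0.25)}
for name, (green, nir, swir) in pixels.items():
    print(name, is_water(ndwi(green, nir)), is_water(mndwi(green, swir)))
```

With these assumed values the built-up pixel is flagged as water by NDWI but not by MNDWI, which is the kind of overestimation in developed areas described above.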

Energy
Substantial increases in renewable energy generation are required globally to achieve net zero scenarios (IEA, 2021), to meet the SDGs (United Nations, 2021) and the Nationally Determined Contributions to the Paris Agreement to reduce greenhouse gas emissions (UNFCCC, 2022) (Table 7). Thus, this section will focus on onshore renewable energy (hydropower, solar, wind, geothermal, and biomass). Offshore wind resources are also considered due to their similarities with onshore wind, fast development and current prominence in national policies worldwide (IEA, 2019). The key data of interest for energy development are the theoretical availabilities of the resources which produce the energy; for example, wind speed is assessed for wind power. There are several data sources available for most sectors (Table 7). Practical constraints, such as proximity to grid or topography, are important, but these considerations, and therefore the data for these aspects, are outside the scope of this paper.

Fig. 8. Wind data hierarchy, based on globally openly available data. GWA is only recommended to assess feasibility of wind for country scale. Reanalysis is generally recommended for analysis after this. Although remote sensing has some capability in offshore wind speed, it is not useful for onshore assessments. Complex terrain is characterised by uneven topography, for example including mountainous regions.

Hydropower
Hydropower has been the centre of many WEFN studies (Basheer et al., 2021; Yang et al., 2018; Jalilov et al., 2015). As well as energy generation, hydropower dams have multiple benefits and impacts on water and food sectors. There are two main data needs: the theoretical potential energy available from new hydropower and the operating rules of existing hydropower dams (Fig. 6). However, because of their strong commercial relevance, data on hydropower operations are rarely shared.
The theoretical potential of hydropower is primarily dependent on flow and elevation head (Coskun et al., 2010; Larentis et al., 2010; Kusre et al., 2010). Flow data have been discussed in Section 3.1.3, which concluded that there are currently limited globally available data sources for flow, and that these limited data sources are unlikely to be available for the exact point of interest. Because streamflow data are usually unavailable from global sources at points of interest, hydrological modelling has been necessary (Coskun et al., 2010; Kusre et al., 2010; Pandey et al., 2015). Elevation head can be estimated using a Digital Elevation Model (DEM), which represents topography and can thus be used to measure the elevation differences between mountains and valleys where hydropower is being considered. There is a wide range of DEMs available, including, for example, the Shuttle Radar Topography Mission (SRTM) from NASA at 30 or 90 m resolution, between 60° N and 50° S (USGS, 2018). For national scale assessments, it may be sufficient to assess whether hydropower is a feasible option for energy generation using only precipitation, evapotranspiration, and elevation data; however, these hydropower energy estimates will not be precise.
Operating rules govern how water is released from reservoirs. In the absence of publicly available operating rules, flow, lake drawdown, and lake area are required (Fig. 6; Sections 3.1.3 and 3.1.6). The limited availability of flow data also limits the ability to understand hydropower operating rules at points of interest.
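The dependence of theoretical potential on flow and elevation head can be illustrated with the standard hydraulic power relation P = η ρ g Q H. The sketch below is a minimal example under assumed inputs: the flow, head, efficiency, and capacity factor are hypothetical, and the function names are introduced here purely for illustration.

```python
RHO_WATER = 1000.0  # water density, kg/m3
G = 9.81            # gravitational acceleration, m/s2


def hydropower_kw(flow_m3s, head_m, efficiency=1.0):
    """Hydraulic power in kW; efficiency=1.0 gives the theoretical potential."""
    return efficiency * RHO_WATER * G * flow_m3s * head_m / 1000.0


def annual_energy_gwh(power_kw, capacity_factor=1.0):
    """Energy over an 8760-hour year in GWh for a given capacity factor."""
    return power_kw * 8760.0 * capacity_factor / 1e6


# Hypothetical site: modelled mean flow of 10 m3/s and a DEM-derived head of 50 m.
theoretical = hydropower_kw(10.0, 50.0)
technical = hydropower_kw(10.0, 50.0, efficiency=0.85)
print(f"Theoretical: {theoretical:.0f} kW, with 85% efficiency: {technical:.0f} kW")
print(f"Annual energy at 50% capacity factor: {annual_energy_gwh(technical, 0.5):.1f} GWh")
```

Because power scales linearly with both inputs, any bias in modelled flow or DEM-derived head carries through proportionally to the energy estimate.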

Solar energy
Solar energy capacity in 2030 has to be more than eight times the capacity in 2019 (Table 7) to achieve net-zero (IEA, 2021). Solar energy encompasses solar thermal heating and solar photovoltaic energy (Solar PV). In both cases, Solar Surface Irradiance (SSI) is the primary driver of the potential energy resource. SSI is made up of three variables: Direct Normal Irradiance (DNI), Diffuse Horizontal Irradiance (DHI) and Global Horizontal Irradiance (GHI) (Qin et al., 2021). DNI and GHI are the two most important components for estimating potential solar energy resources (Lopes et al., 2018). DNI and GHI can be determined in three ways: numerical modelling, ground observations, or satellite RS (Jia et al., 2021). The uncertainty of all products usually increases under cloudy conditions (Damiani et al., 2018; Zhao et al., 2013; Huang et al., 2019).
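The three irradiance components are linked by the commonly used closure relation GHI = DNI · cos(θz) + DHI, where θz is the solar zenith angle. The sketch below applies this textbook relation, which is not taken from the studies cited above; the function name and the irradiance values are hypothetical.

```python
import math


def ghi_from_components(dni_wm2, dhi_wm2, zenith_deg):
    """Global horizontal irradiance from its direct and diffuse components.

    The cosine is clipped at zero so that the direct beam contributes
    nothing once the sun is below the horizon.
    """
    cos_zenith = max(math.cos(math.radians(zenith_deg)), 0.0)
    return dni_wm2 * cos_zenith + dhi_wm2


# Hypothetical clear-sky values in W/m2 at a 60 degree solar zenith angle.
print(f"GHI: {ghi_from_components(800.0, 100.0, 60.0):.0f} W/m2")
```

A check of this kind can help flag inconsistent records when DNI, DHI, and GHI come from different products.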
For estimating availability of solar energy at national scale, the Global Solar Atlas (GSA), or an existing national solar atlas, is recommended first (Fig. 7). Although the spatiotemporal resolution of GSA is not high, it is easy to use and suitable for determining whether solar resources are viable in an area of interest; it should not, however, be used for detailed investigation. In GSA, uncertainties are higher for DNI, higher latitudes, regions with high aerosols, coastal zones, humid climates, regions with low data quality or no data availability, and snow- or ice-covered mountainous regions in particular (Global Solar Atlas, 2019b).
At sub-national level, satellite RS is recommended (Fig. 7). The National Solar Radiation Database (NSRDB) is a good near-global database, which has estimates of typical year solar irradiance and provides some data at finer temporal resolutions from recent geostationary satellites, namely METEOSAT, HIMAWARI, and GOES (https://nsrdb.nrel.gov/). Geostationary satellites are generally recommended for current solar irradiance data because of their fine spatiotemporal resolutions (Hirooka et al., 2018; Bessho et al., 2016) and, collectively, they cover most of the globe. The temporal coverage of many recent solar irradiance products using these satellites is, however, limited, typically not beginning before 2000. There are many methods available for use with RS information, including physical process based methods and statistical methods (Huang et al., 2019).

Wind energy
Wind energy can be harvested through wind turbines located on land (onshore) or over water (offshore). Wind energy is more difficult to estimate over large areas compared to solar energy due to its substantial spatial heterogeneity from land use, altitude, and topography (Murthy and Rahi, 2017; Zhou et al., 2011; Eurek et al., 2017; Keyhani et al., 2010). The most accurate way of collecting wind resource data is via in situ anemometer measurements, placed on masts at turbine height. However, wind turbines are being constructed taller; thus, placing anemometers at turbine height is becoming more difficult. Existing measurements are spatially sporadic (Monaldo et al., 2001). Because wind energy is proportional to the cube of the wind speed, errors in wind speed measurements result in large errors in wind energy output (Arun Kumar et al., 2020).
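The cubic dependence can be made explicit with the wind power density relation P/A = ½ ρ v³. The following sketch shows how a relative error in wind speed propagates to power; the air density, the 10% error, and the function names are illustrative assumptions.

```python
AIR_DENSITY = 1.225  # kg/m3, assumed standard sea-level air density


def wind_power_density(speed_ms, rho=AIR_DENSITY):
    """Kinetic power per unit swept area in W/m2."""
    return 0.5 * rho * speed_ms ** 3


def relative_power_error(relative_speed_error):
    """Relative error in power caused by a relative error in wind speed."""
    return (1.0 + relative_speed_error) ** 3 - 1.0


# Hypothetical case: a 7 m/s wind speed that is overestimated by 10%.
print(f"Power density at 7 m/s: {wind_power_density(7.0):.0f} W/m2")
print(f"Power error from +10% speed error: {relative_power_error(0.10):.1%}")
```

Under these assumptions a 10% overestimate in speed inflates the power estimate by roughly a third, which is why uncorrected wind products can be misleading for energy assessments.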
To assess feasibility of resources at national scales, the Global Wind Atlas (GWA) (https://globalwindatlas.info/) is useful, despite its low spatiotemporal resolution. Any analysis considering more detail would require higher precision in energy output estimates, and GWA would be inappropriate for this (Fig. 8).
Reanalysis is widely used for energy estimates of both offshore and onshore resources (Staffell and Pfenninger, 2016; Gruber et al., 2022; Rabbani and Zeeshan, 2020), and thus is recommended when finer than national scale availability is required. However, reanalysis products have large uncertainties, and generally need to be corrected with ground observations. Thus, reanalysis should be approached with caution in study locations with sparse data (Rabbani and Zeeshan, 2020; Fan et al., 2021). Complex topography, including mountainous regions, coastal areas, and lower wind speeds are not as well represented by reanalysis (Jourdier, 2020; Rabbani and Zeeshan, 2020; Gualtieri, 2022). Temporal aggregation to the monthly values appropriate for WEFN analyses substantially improves data reliability (Gruber et al., 2022). However, the spatial resolution of reanalysis is still relatively coarse (Table 9).
Reference should be made to Fan et al. (2021) and Miao et al. (2020) when considering which reanalysis product performs best for any particular region. If detailed or precise wind resource assessments are needed, reanalysis data can be coupled with a computational fluid dynamics model to understand complex flows induced by surrounding topography (Kim et al., 2018; Waewsak et al., 2019). RS is only useful to estimate offshore wind resources (Young et al., 2020; Hasager et al., 2008; Monaldo et al., 2001). There are several satellite products available for estimating wind speed, including SAR products, scatterometers, altimeters, and passive microwave radiometers (Young et al., 2020; Hasager et al., 2008). It is generally recommended to use multiple satellites in wind resource assessments, to reduce uncertainty and improve the temporal resolution (Wei et al., 2019). However, in situ observations are still required to validate the offshore RS wind products prior to use in wind energy estimation (Hasager et al., 2008).

Geothermal energy
The temperature gradient in the Earth's crust is usually 25-30 °C/km depth; however, in volcanic zones, this can increase to up to 150 °C/km depth (Van der Meer et al., 2014). Within a depth of 10 km from the Earth's surface, geothermal resources can produce the equivalent energy of 3 × 10¹⁷ billion barrels of oil, theoretically meeting global energy requirements for the next six million years (Lund et al., 2008).
Satellite RS can be used in the initial stages of investigations, but in situ surveys are required to precisely measure geothermal energy potential (Qin et al., 2011). There are two ways to assess geothermal resource availability using RS (Fig. 9). VIs are recommended for mapping thermally stressed vegetation and different growth rates, which can indicate CO₂ vents from geothermal systems (Bateson et al., 2008).
Minerals can also suggest the presence of geothermal resources (Kratt et al., 2010). RS for both vegetation and minerals relies on visible, mid-IR, and near-IR channels to identify areas of potential resources, although both methods have led to false positives in the past (Kratt et al., 2010; Bateson et al., 2008). Other direct and indirect indicators of geothermal resources include temperature, gas emissions, surface deformation, hot springs, fumaroles, and calderas (Haselwimmer and Prakash, 2013).
Surface features can be used to estimate temperature and the energy potential of underground resources, although these estimates have large uncertainties (Vaughan et al., 2012; Heasler and Jaworowski, 2018). TIR can estimate surface temperature and geothermal heat flux, but surface features are usually tens of metres in dimension (Mongillo et al., 1995), and thus most current satellites do not have high enough spatial resolution (Heasler and Jaworowski, 2018).

Biomass
Biomass accounts for 35% and 3% of primary energy needs in developing and developed countries, respectively (Avtar et al., 2019). Biomass can come from agricultural by-products, forests, and dedicated energy crops (Fig. 10). Energy from agricultural by-products, or residual energy, cannot be accurately determined directly by any globally available products. Therefore, to estimate the residual energy potential, local or regional data on biomass residue are required, along with yield and heat energy values (Gao et al., 2016; Jiang et al., 2019; Hiloidhari et al., 2014; Angelis-Dimakis et al., 2011).
VIs can be derived from optical RS based on the changes of reflectance emanating from changes to biomass production (Ahamed et al., 2011). VIs are, however, specific to species, conditions, and vegetation type (Viña et al., 2011), meaning VIs employed at large scale might not represent biomass well without processing. VIs are also sensitive to cloud cover, with relatively high uncertainty, whereas SAR data are suitable for almost all weather conditions, making them especially good for crops which grow in wet seasons, e.g. rice (Chao et al., 2019). Optical data uncertainties result from the inability to assess biomass structural uncertainty, heterogeneity, and seasonality (Kumar and Mutanga, 2017). Optical sensing is good for assessing horizontal canopy cover and vegetation types. However, there are limitations in estimating canopy height (Kumar et al., 2015), which is especially important in forests. A list of satellite products for biomass is provided (Table 10).

Food
Food demand is increasing rapidly, with global crop demand expected to grow by up to 110% between 2005 and 2050 (Tilman et al., 2011). As a result, large areas of land will potentially have to be cleared, with associated increases in irrigated areas required (Tilman et al., 2011). Food as a sector is a strong driver of both water and energy use, with 70% of global water used for irrigation, and 30% of global energy used for food production and supply (FAO, 2014). Food data are often available through national statistics offices, though the data usually do not cover all crop types and there is limited information on the spatial or temporal distribution of resources. Satellite RS presents a good alternative to address these limitations.
The focus of this section is on crop area and yield, which can be estimated using a variety of methods (Table 11). Crop area and yield are the variables directly representing resource availability at the regional and catchment scale considered here. At the local scale, crop yield is affected by pests, disease, soil nutrient availability, and meteorological conditions (Karthikeyan et al., 2020). For more information on how RS can be used for monitoring crop pests and diseases, refer to the review by Mutanga et al. (2017). For more information on how RS data can be used as a proxy for food security, including access to available food, refer to Brown (2016).

Crop area
Satellite imagery has been widely used to map cropland, although crop area is sometimes difficult to estimate due to its high spatiotemporal variability and the ground data necessary to distinguish different crop types (Shi et al., 2014; Brown, 2016). To estimate crop area, three types of data are of interest: overall crop extent, crop type, and irrigated area (Fig. 11).
To estimate cropland extent, the 10 m spatial resolution WorldCover product, available for 2020-2021 (Zanaga et al., 2022), is the finest spatial resolution product available (Table 11), followed by the Global Cropland-Extent Product at 30-m Resolution (GCEP30), using Landsat data and VIs (Thenkabail et al., 2021). Notably, several products for cropland extent, irrigation area, and crop classification have severely limited temporal coverage, with many only available for nominal years (Table 11).
If crop classifications are required, the MIRCA2000 dataset, with 26 crop classes available, is useful, although the spatial resolution is 5 arc min, and it is only available for around the year 2000 (Portmann et al., 2010). If more recent and/or higher spatial resolution data on crop types are required, then global maps are inadequate. In this case, multitemporal SAR or VIs are recommended. Several images are required because it is difficult to distinguish between crop types based on single images (Shi et al., 2022). Both SAR and VIs have performed well in estimating overall crop extent and individual crop areas (McNairn et al., 2014; Steele-Dunne et al., 2017; Jia et al., 2012; Chen et al., 2020; Guiomar et al., 2021; Xiao et al., 2005). The spatial resolution of SAR has steadily improved, which means it is capable of capturing more heterogeneous farming (Chen et al., 2020). SAR data are available in all weather conditions and at night. They can thus assess temporal variations of cropland and are becoming increasingly popular for crop classification (Shi et al., 2022; Skriver et al., 2011; Kim et al., 2012; McNairn and Shang, 2016). Thus, SAR is better applied to crops in cloudy areas, or grown in the rainy season, whilst VIs are well suited to cloud free conditions, or where longer temporal gaps in data are acceptable.
For irrigation mapping, visible, NIR, LSMs, and microwave data can all be used (Massari et al., 2021). Visible and NIR can be used to understand irrigation extent, frequency, and timing, from local to global scale, at spatial resolutions as fine as 30 m, using Landsat and MODIS with VIs and/or ancillary data (Ozdogan et al., 2006; Beltran and Belmonte, 2001; Biggs et al., 2006; Chen et al., 2018b). For assessing irrigated areas, multiple images over time are also usually required (Ozdogan et al., 2006; Beltran and Belmonte, 2001). NDVI has been used to map irrigated areas with high accuracy (Ozdogan et al., 2006; Beltran and Belmonte, 2001; Biggs et al., 2006). Microwave detected soil moisture has also provided promising results in identifying irrigated areas (Qiu et al., 2016; Lawston et al., 2017). There are also several global irrigation products which were produced for years between 2000 and 2015 (Table 11).

Crop yield
There are three useful methods for approximating crop yield: linear regression models with VIs, Solar Induced Fluorescence (SIF), and active microwave (Fig. 12) (Mulla, 2013; Guan et al., 2017; Kenduiywo et al., 2018; Kim et al., 2012; Mohammed et al., 2019; Guan et al., 2016). The performances of the methods are affected by a range of variables, including crop type, length of training data, atmospheric variables, and soil type (Peng et al., 2020; Karthikeyan et al., 2020). Food crop yield (i.e., the yield of the crop which is used as food) cannot be precisely estimated without some form of numerical modelling or ground-based data for first establishing an empirical relationship. Thus, although RS can be used alone to broadly compare differences in yields in a given area, it will not give precise values for food crop yield alone.
Regression based methods with VIs are commonly used for crop yield estimates, particularly where long term data with high spatial resolution are required (Guan et al., 2017). There is a strong relationship between crop yield and VIs, especially in the later part of the growing season, although local ground data are required to first establish these empirical relationships (Labus et al., 2002; Quarmby et al., 1993; Lai et al., 2018; Mirasi et al., 2021). The combinations of NIR and red bands and of NIR and green bands perform especially well in crop yield studies (Mulla, 2013; Labus et al., 2002). Using several VI observations over time generally improves results (Labus et al., 2002; Lai et al., 2018). Hyper- and multi-spectral images can also be used to estimate crop yield and have shown promising results (Kira et al., 2016; Wu et al., 2010; Mulla, 2013). Using multiple sensors and various spectral ranges can also improve yield estimates (Guan et al., 2017).
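As a minimal sketch of the regression approach, the snippet below computes NDVI from NIR and red reflectance as (NIR − red)/(NIR + red) and fits an ordinary least-squares line between late-season NDVI and ground-measured yield. The NDVI and yield values, the function names, and the choice of a single-predictor linear model are hypothetical and serve only to illustrate why local ground data are needed before RS values can be converted to yield.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical training data: late-season NDVI and surveyed yield (t/ha) per field.
field_ndvi = [0.45, 0.55, 0.62, 0.70, 0.78]
field_yield = [2.1, 2.9, 3.4, 4.0, 4.6]
slope, intercept = fit_line(field_ndvi, field_yield)

# Apply the empirical relationship to a new pixel with assumed reflectances.
new_ndvi = ndvi(nir=0.50, red=0.10)
print(f"Predicted yield: {slope * new_ndvi + intercept:.2f} t/ha")
```

The fitted coefficients are only meaningful for the crop, season, and region of the training data, which is consistent with the dependence on local ground data described above.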
Leaf Area Index (LAI) is also closely related to yield and is a common input into crop models (Doraiswamy et al., 2005; Mokhtari et al., 2018; Doraiswamy et al., 2004). There are several ways to calculate LAI using satellite RS, mainly via optical and microwave sensors with empirical relationships or algorithms. Fang et al. (2019) provide a good overview of these and of the uses of LAI more generally. LAI can also be estimated via microwave and LiDAR, with LiDAR available from ICESat used with radiative transfer models, and radar backscatter used with empirical relationships (Fang et al., 2019). Global LAI products have higher uncertainties in tropical and boreal regions as well as in summer seasons (Fang et al., 2019).
Solar Induced Chlorophyll Fluorescence (SIF) emissions are a more direct indicator of photosynthesis compared with VIs and other indicators (Mohammed et al., 2019; Guan et al., 2016). However, SIF measurements are not currently practical for WEFN assessments due to their coarse resolution (Mohammed et al., 2019; Peng et al., 2020) and the need for supplementary numerical modelling to process the data (Yang et al., 2019). The Fluorescence Explorer (FLEX) mission will provide higher resolution estimates of SIF when launched in 2025 (Drusch et al., 2017). With the FLEX mission and more research, this could be a viable option in the future.
Another project which may improve food data reliability is the Global Food Security-Support Analysis Data 30 m resolution (GFSAD30) project, which aims to monitor crop dynamics from 2000 to 2025, producing maps with cropland types and spatial and temporal intensities (Oliphant et al., 2022; Western Geographic Science Center, 2018).
Active microwave is also influenced by crop variables, and can be used to estimate crop structure, crop growth, LAI, water content, and biomass (Kenduiywo et al., 2018; Kim et al., 2012). SAR is the most widely used radar due to its relatively fine spatial resolutions, although it is still too coarse for representing highly spatially variable crop planting practices (Steele-Dunne et al., 2017). A benefit of radar is the ability to provide information on the entire canopy, not constrained to the top layer of leaves (Kim et al., 2012). As for crop area, SAR can provide crop yield monitoring under all weather conditions (McNairn and Shang, 2016).

Overall synthesis of WEFN data availability and suitability
RS, reanalysis, and LSMs all provide data useful for WEFN modelling but with varying reliability and post-processing requirements (Fig. 13). Overall, no sector is better represented than any other by global data sources. RS can represent more variables (thirteen) and with better reliability than either LSMs or reanalysis (seven). The best represented variables are soil moisture, precipitation, and solar irradiance. The largest gaps in global datasets (Fig. 13) are streamflow, groundwater, hydropower, and geothermal resources. This ranking is based on the literature available, the amount and complexity of post-processing required overall, and the relative precision of products without calibration. The relative reliability is based on the literature stating to what extent variables can be estimated. That is, the variables ranked lower are only useful for initial scoping investigations because of the high or undefined uncertainty associated with the data, whilst the variables ranked higher provide estimates with quantified uncertainties.
It is important to remember that all the global data sources discussed here are modelled to varying degrees, from relatively simple algorithms or linear regression to complex numerical models. Thus, all data products have included some ground data for calibration and validation at some point during their development. The ranking here is thus focused on how much additional post-processing or local ground data are required by WEFN modellers to provide useful estimates of variables influencing the nexus.
Precipitation and soil moisture are best represented by global datasets. Whilst solar irradiance is also represented well, it was ranked lower than precipitation and soil moisture (Fig. 13), due to the widespread use of geostationary satellites which are not global. ET, crop area, and crop yield data are also reliably estimated by global products; however, the data provided usually need to be combined with other data products (e.g., LSMs or reanalysis) or land observations and are therefore ranked as requiring more post-processing for WEFN modelling. Crop area was ranked higher than crop yield because total combined crop area maps can be produced without in situ data, whereas crop yield for food relies on local ground data for correction. Wind, lake storage, and biomass are ranked around the middle of the 13 variables (Fig. 13). Storage is not truly globally available, with a very low percentage of lakes measured by altimeters and in global databases, and is hence ranked lower in both reliability and post-processing. Wind data have high uncertainties, and biomass requires other data to calculate the required energy estimates. Streamflow RS and reanalysis data were ranked low, as both can only be used with complex numerical models to provide estimates, and most data outputs have large uncertainties unless corrected with in situ data, which are rarely available.
Hydropower is often central in WEFN studies (Basheer et al., 2021; Yang et al., 2018; Jalilov et al., 2015; Vinca et al., 2021a). The limited availability of streamflow data leads to a lack of data on hydropower potential and operating rules. Although SWOT (Section 3.1.3) will improve this, data will likely still not be available at potential new hydropower dam sites or directly downstream of existing hydropower dam sites. Even national streamflow datasets may not be sufficient. Therefore, it is likely that some form of hydrological modelling will continue to be required in WEFN studies.

WEFN variable interactions
In this sub-section the focus is specifically on WEFN data interactions in the context of the review and not on the larger system interactions that are a key feature of the WEFN. In particular, global data for one variable sometimes contribute to global data sources for another variable. The synergies in Fig. 14 have been identified based on the literature review undertaken here, where one variable was directly necessary and/or beneficial in the global data calculation of a second variable. These data interactions can be uni-directional or bi-directional (Fig. 14). For example, soil moisture and precipitation are widely used for RS irrigation mapping (Massari et al., 2021; Qiu et al., 2016; Brown, 2016), depicted as a uni-directional interaction. Bi-directional interactions occur between ET and crop mapping because crop mapping products are used in some ET products and, likewise, ET, along with other meteorological products, is sometimes used in irrigation mapping products. However, although hydropower data are relevant information for agriculture at a local scale, global agricultural data, e.g., NDVI, do not use hydropower data as a direct input. Thus, hydropower is not considered a direct data interaction for agriculture. There are many more interactions between variables; for example, since wind affects ET, it will also impact streamflow in a hydrologic model, as a second-order model interaction. Here only primary global data interactions are considered, which is the first time that such interactions have been explicitly mapped for the WEFN.
Improving global data for one WEFN variable can benefit other variables' global data. For example, in investigating geothermal resources, it is important to remove solar radiation, soil moisture, and vegetation fluxes to minimize false positives (Qin et al., 2011; Coolbaugh et al., 2007). Wind data are important for ET numerical models and global ET estimations (Courault et al., 2005; Gomis-Cebolla et al., 2019). However, reanalysis wind speed is unreliable and presents challenges in ET calculations when using reanalysis products (Nouri and Homaee, 2022); thus, improvement of wind data can benefit ET. Solar irradiance is important in solar energy estimation, but also contributes to ET estimates (Huang et al., 2019). Variable interactions are also important to understand uncertainties in estimates of the WEFN sectors individually and collectively. The interactions mean that data uncertainties can cascade through the WEFN if global data for one variable are used in another variable's global data calculation. For example, uncertainties in precipitation datasets could affect crop mapping, which in turn impacts geothermal energy estimates.
There is an urgent need for a WEFN-focused database that can manage data interactions at local and global scales as well as the multidisciplinary nature of WEFN studies. The database should include all the globally available variables reviewed here. A monthly temporal resolution would balance the need to represent seasonal and interannual variability with the need to maximise data availability, as some variables are not monitored at finer temporal resolution. Such a database would be beneficial for basin- to national-scale rapid assessment of nexus interactions in near-future decision making and planning. For example, if food availability is low in a region, there is no benefit in assessing energy crop biomass. Further analysis and modelling would still be required for full WEFN implementation, but global maps could assist in this rapid nexus assessment.
The multi-disciplinary nature of WEFN studies makes a user-friendly database interface necessary, for example by combining the global data through map layers. With WEFN-associated international agencies as custodians and suitable investment, this could take the form of a data repository; however, this would require substantial effort. Failing this, a referral service to the appropriate online water, energy, or food data outputs is recommended. As highlighted in Fig. 13, global data often require expertise in post-processing. With the number of disciplines involved in the WEFN, data collection and presentation are complex, and both decision makers and researchers will be required to work outside of their main area of expertise. A map-based interface could assist under these conditions.

Priorities for data improvements for WEFN modelling
Considering the direct data interactions within the WEFN summarised in this review (Fig. 14), there are clear priority areas for data collection. Soil moisture, crop mapping, and solar irradiance are central in the WEFN (Fig. 14b), but existing global data sources capture these variables relatively well compared with other variables. Wind speed has also been allocated a central role in the nexus due to its importance in ET calculations for irrigation. Wind speed data have high uncertainties, which have large consequences for the WEFN, considering that 70% of water is used for food production globally (FAO, 2014). Wind speed is thus considered a priority area for WEFN global data improvement. Considering only global data interactions, streamflow data are only relevant in global water and energy data; however, they are considered a priority area for the WEFN, due to their impact on hydropower calculations, which are central to many WEFN studies. Wind estimation and streamflow both currently require substantial post-processing and/or numerical modelling of the global data sources considered here to create location specific, precise estimates.
Thus, ground-based, national- and multilateral-agency-level, and other data sources are currently required for WEFN implementation. Variables where global data are currently insufficient include streamflow, groundwater, crop yield, biomass, geothermal energy, and hydropower energy. This requirement cascades into other global data calculations (Fig. 14); therefore, ground networks and other data sources must continue to be maintained and improved. Notably, Geographic Information Systems (GIS) are useful tools for managing data for the WEFN. Although the focus of this review was mainly on temporally varying variables, static variables, which are often best managed in GIS, can also complement the data discussed here. Examples include river and reservoir/lake GIS datasets showing locations and vectors for dams and rivers, such as MERIT-Hydro (Yamazaki et al., 2019), SWORD (Altenau et al., 2021), and GeoDAR (Wang et al., 2022). National and multilateral level data collection is also imperative from a WEFN perspective to fill the gaps in global data products. National Statistics Offices often report on past crop yield and area data, as well as meteorological data with varying variables and resolutions. National meteorological offices collect and often share some hydrological data, although the data publicly available online vary. Multilateral agencies, such as the World Meteorological Organization (WMO), IRENA, and FAO, collect and publish data on water, energy, and food, respectively.
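The sensitivity of irrigation demand to wind speed and of hydropower estimates to streamflow, noted above, can be illustrated with a minimal sketch. It assumes the standard FAO-56 Penman-Monteith reference ET equation and the usual hydropower relation P = η ρ g Q H; neither is presented here as the method used in the studies reviewed. The function names, the default efficiency of 0.85, the simplified vapour pressure term (mean relative humidity applied at mean temperature), and all input values are illustrative assumptions.

```python
import math


def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def et0_penman_monteith(t_c, u2, rn, rh, g=0.0, pressure_kpa=101.3):
    """Daily reference ET (mm/day) from the FAO-56 Penman-Monteith equation.

    t_c: mean air temperature (deg C)
    u2: wind speed at 2 m height (m/s)
    rn: net radiation (MJ/m2/day)
    rh: mean relative humidity (%)
    g: soil heat flux (MJ/m2/day), commonly taken as 0 for daily steps
    Simplification (assumption): actual vapour pressure is derived from
    mean RH at mean temperature rather than from daily Tmax/Tmin.
    """
    es = saturation_vapour_pressure(t_c)
    ea = es * rh / 100.0
    delta = 4098.0 * es / (t_c + 237.3) ** 2   # slope of vapour pressure curve
    gamma = 0.000665 * pressure_kpa            # psychrometric constant
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_c + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


def hydropower_mw(q_m3s, head_m, efficiency=0.85):
    """Hydropower output (MW) from discharge (m3/s) and hydraulic head (m)."""
    rho, grav = 1000.0, 9.81  # water density (kg/m3), gravity (m/s2)
    return efficiency * rho * grav * q_m3s * head_m / 1e6


if __name__ == "__main__":
    # Hypothetical warm, dry day: compare ET0 for two wind speed estimates.
    for u2 in (2.0, 3.0):
        print(f"u2 = {u2} m/s -> ET0 = {et0_penman_monteith(25.0, u2, 15.0, 50.0):.2f} mm/day")
    # Hypothetical plant: a 20% streamflow error gives a 20% power error.
    for q in (100.0, 120.0):
        print(f"Q = {q} m3/s -> P = {hydropower_mw(q, 50.0):.1f} MW")
```

Because power is linear in discharge, any relative error in streamflow passes directly into the hydropower estimate, whereas wind speed enters reference ET through both the numerator and the denominator, so its influence depends on the vapour pressure deficit.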

Conclusion
This review of global data sources has synthesized and recommended data sources for modelling the WEFN, whilst highlighting priority areas for WEFN global data sources. The data hierarchies presented will assist data prioritisation and collection efforts for WEFN researchers, practitioners, and policy makers, with specific relevance for enhancing progress towards national SDG and net zero targets across each of the water, energy, and food sectors. Implementation of WEFN modelling using openly available global datasets allows national and transboundary consideration of SDGs 2, 6, and 7, even for locations where ground observation data are unavailable or inadequate for assessment.
The recommendations for priority areas are based on current gaps in data availability as well as the synergies that exist in global data. Streamflow, wind speed, and hydropower are priority areas in the WEFN that are not currently suitably monitored by global data sources. Crop yield, biomass, geothermal energy, and groundwater also require additional data or post-processing before use. Furthermore, although global data sources can monitor many WEFN variables, spatial and temporal scale discrepancies between variables and sectors still exist, particularly with respect to the coarse spatial resolution of data. Given the synergies between the global data sources reviewed here, a combined database for WEFN assessments would help to integrate decision making and homogenise the different data sources. Interactions between variables and their uncertainties were highlighted because they must be represented correctly in WEFN modelling.
Considering the number of variables which still require additional data for estimation, national and multilateral institution data are still needed for WEFN modelling. These data can complement the data collected from the global data sources discussed here for calibration and validation. Thus, ground networks and the associated databases must continue to be maintained and improved in parallel with global data improvements. The data hierarchies and recommendations presented here provide an insight into the data available for WEFN implementation in locations where data scarcity is still prevalent.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
Table 12 provides a list of URLs for data sources mentioned above. In the "Data Information" column are links to home pages for datasets, or pages which provide a brief description of data. In the "Data URL" column, there are URLs to data download pages. There are some databases which contain many of the datasets mentioned, including JAXA's "GPortal", NASA's "Earthdata", and USGS's "Earth Explorer". Whilst an attempt has been made to refer readers to the original data source, there may be cases where a secondary data source is cited.

Table 12
Data websites and downloading platforms.

Fig. 3. Soil Moisture data hierarchy, based on depth of soil moisture of interest and vegetation.

Fig. 4. Groundwater Data Proxies. Note none of the above products provide exact estimates of GW.

Fig. 6. Hydropower data hierarchy, based on globally available data. No readily available data currently exist, and reference should be made to appropriate sections.
J.W.Lodge et al.

Fig. 9. Geothermal data hierarchy based on globally, openly available data. Only where deep individual surface features can be detected and analysed, can resource availability be estimated. Otherwise, RS can only be used for identifying areas for further investigation.

Fig. 10. Biomass data hierarchy, based on globally openly available data. National statistics are necessary for residue estimation.

Fig. 11. Crop Area data hierarchy based on globally openly available data. Note that finer spatial resolution represents sub-kilometre spatial resolution.

Fig. 13. WEFN Global Data Source Capability for Water (blue), Energy (yellow), and Food (green). Point shapes represent data source types: triangle for satellite remote sensing (RS) and square for reanalysis or land surface model (LSM) datasets. Streamflow data, which are not currently globally available, are represented by a circle. Reliability (x-axis) is a qualitative measure of how much uncertainty is inherent in the datasets based on literature reviewed. The y-axis represents the amount of post-processing required to use the datasets for WEFN modelling. P-Precipitation, ET-Evapotranspiration, SM-Soil Moisture, St-Lake Storage, Fl-Streamflow, CA-Crop Area, CY-Crop Yield, SI-Solar irradiance, W-Wind, W(off)-Offshore wind, B-Biomass, H-Hydropower, Geo-Geothermal Energy. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 14. (a) Direct data synergies between the natural and human system variables involved in the WEFN (left), where each circle represents a variable coloured according to its primary sectoral classification (blue: water, yellow: energy, and green: food), and (b) data nexus overlaps (right). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2
Different global precipitation data types and examples of datasets available.Spatial resolution, temporal resolution, temporal coverage, *: gauge corrected, (*): Both gauge corrected and uncorrected available.Adapted from Beck et al. (2019a).

Table 3
Openly available ET data products available at the global level.

Table 4
Openly available soil moisture products available at the global level.

Table 5
Openly available altimetry products. Note that, although globally available, these data currently capture only a small percentage of lakes.

Table 6
Openly available data for water surface area, available at the global level.

Table 7
Openly available energy data, available at the global level. Also included are current energy installed and future targets of energy installation to achieve net-zero by 2030.

Table 8
Openly available solar irradiance products. HELIOSAT databases, satellites, reanalysis, and near-global to global databases are detailed below. A method is advised only for GOES, as this method is commonly used for that satellite's coverage.

Table 9
Openly available global scale wind energy data products.

Table 10
Biomass Satellite Missions, including optical and radar. Note: many radar products are available only by request when downloading fine spatial resolution data.

Table 11
Crop Information Datasets grouped according to application. Useful in WEFN modelling for irrigation estimates, crop area, crop productivity, and food output estimation.