Spatially explicit regionalization of airborne flux measurements using environmental response functions

. The goal of this study is to characterize the sensible ( H ) and latent (LE) heat exchange for different land covers in the heterogeneous steppe landscape of the Xilin River catchment, Inner Mongolia, China. Eddy-covariance flux measurements at 50–100 m above ground were conducted in July 2009 using a weight-shift microlight aircraft. Wavelet decomposition of the turbulence data enables a spatial discretization of 90 m of the flux measurements. For a total of 8446 flux observations during 12 flights, MODIS land surface temperature (LST) and enhanced vegetation index (EVI) in each flux footprint are determined. Boosted regression trees are then used to infer an environmental response function (ERF) between all flux observations ( H , LE) and biophysical (LST, EVI) and meteorological drivers. Numerical tests show that ERF predictions cov-ering the entire Xilin River catchment ( ≈ 3670 km 2 ) are accurate to ≤ 18 % (1 σ ). The predictions are then summarized for each land cover type, providing individual estimates of source strength (36 W m − 2 < H < 364 W m − 2 , 46 W m − 2 < LE < 425 W m − 2 ) and spatial variability (11 W m − 2 < σ H < 169 W m − 2 , 14 W m − 2 < σ LE < 152 W m − 2 ) to a precision of ≤ 5 %. Lastly, ERF predictions of land cover specific in River Agreement of cover specific Bowen to within the of the presented approach. This study indicates the potential of ERFs for (i) extending airborne flux measurements to the catchment scale, (ii) assessing the spatial representativeness of long-term tower flux measurements, and (iii) designing, constraining and evaluating flux algorithms for remote sensing and numerical modelling applications.


Introduction
Measurements of the exchange of heat and moisture between the land surface and the atmosphere are critical to our understanding of the role of terrestrial ecosystems in the global climate system.Ground-based eddy-covariance (EC, a summary of all notation is provided in Appendix A) measurements are suited to continuously monitor selected sites for long periods and enable the integration in time (Baldocchi et al., 2001).However, these results might only represent  Steffens et al., 2008).
small areas around the immediate measurement locations (e.g.Kaharabata et al., 1997;Schuepp et al., 1992).On the other hand aircraft-based measurements can provide flux information at regional scales (e.g.Desjardins et al., 1995) but are restricted to short periods of time.Thus the temporal and spatial characteristics of ground-based and airborne measurements complement each other (Gioli et al., 2004;Mauder et al., 2007).It is desirable to integrate both approaches in an effort to provide suitable datasets for the design, constraint, and evaluation of mass and energy exchange models at site as well as at regional scales (Chen et al., 1999;Desjardins et al., 1997).In the following we briefly review the requirements for spatial scaling of airborne EC measurements, and the applicability of airborne EC measurements over complex terrain.
Aggregation approaches enable estimating the exchange over entire landscapes, provided fluxes for characteristic land cover features or domains are known (Beyrich et al., 2006).Flight path segmentation can be a useful tool to directly relate airborne EC measurements to landscape units (e.g.Desjardins et al., 1994;Vellinga et al., 2010).It is also possible to functionally relate these measurements to land cover properties, which then reflect the effects of vegetation, climate, soil and topography on the flux strength.For example, Kirby et al. (2008) propose a method for discerning individual fluxes in a heterogeneous landscape based on subsets of "pure" flux fragments.Another approach is to utilize quantitative information about the EC measurement's spatial context, on which basis environmental response functions (ERFs, Desjardins et al., 1994) can be derived.The general idea of ERFs is to establish a relationship between spatially or temporally resolved flux observations (responses) and corresponding environmental drivers.Hence ERFs are a quanti-tative mechanism to extract relationships from, and to condense the information content in a dataset.If sufficiently accurate, the extracted relationships can then be used, e.g. to bridge observational scales or to adjust the spatial representativeness of ground-based flux measurements.In addition, current methods to spatially resolve surface fluxes are mainly focused on remote sensing algorithms (e.g.Fan et al., 2007) and process-based land surface models (e.g.Vetter et al., 2012).These procedures often demand far-reaching assumptions, such as the closure of the energy and water balances (e.g.Anderson et al., 2012), or are challenging with respect to the required data basis (e.g.Kaminski et al., 2012;Ziehn et al., 2011).In contrast, accurate ERFs enable inferring high-resolution surface flux maps directly from observational data with minimal, quantifiable assumptions.However, ERFs cannot provide insights, e.g.into ecosystem pools.Consequently, ERFs might be suitable for complementing data assimilation and remote sensing approaches, e.g. through contributing to the design, constraint and evaluation of flux algorithms.
The forenamed applications require the relation of the airborne measured fluxes to land cover properties.To enable this requirement an aircraft is bound to measure close to the surface, where characteristic fluxes from different land covers are not yet fully homogenized (or blended, Mason, 1988;Wood and Mason, 1991).Moreover, the flux must be measured at a constant altitude above ground, so as to avoid artificial flux contributions through altitude fluctuations along vertical gradients (Vickers and Mahrt, 1997).However investigation areas are seldom ideally flat, and topography can vary significantly throughout a domain.To safely follow terrain contours at a low and constant altitude above ground, the aircraft must possess a low ratio of true airspeed to climb rate.Only a few airborne platforms fulfil this requirement (e.g.Bange et al., 2006;Gioli et al., 2004;Thomas et al., 2012), with the weight-shift microlight aircraft (WSMA) being one of them (Metzger et al., 2011(Metzger et al., , 2012)).In forenamed studies we describe a WSMA that enables airborne EC flux measurements in remote settings at reasonable cost and minimal infrastructural demand.The objectives of the present study are to investigate the possibilities of (i) deriving meaningful EC fluxes from WSMA measurements over complex terrain, and (ii) scaling the results to a domain of interest.
We applied the WSMA over the undulating steppe of the Xilin River catchment (XRC), Inner Mongolia, China (Fig. 1).On 21 days in the summer of 2009, flights along line transects were conducted at 50-100 m a.g.l.From boundary layer scaling it is found that the vertical flux gradients below the flight level satisfy the surface layer definition (constant within 5-10 %).Hence measured sensible (H ) and latent heat flux (LE) can be interpreted as surface fluxes (Sect.3.1).Because of its climate and management practices typical for semiarid grasslands of China (Butterbach-Bahl et al., 2011), intensive ecological research commenced in the XRC in the late 1970s (Jiang, 1985).Besides Tibet, Inner Mongolia is China's most important province for grassland-based livestock production, and desertification due to overgrazing is a major problem for extensive areas (Meurer and Jiang, 2001).In addition, the land cover in the investigation area varies distinctly in space and time (Ketzer et al., 2008;Schaffrath et al., 2011).We postulate that airborne EC flux measurement is a promising tool to gain new insights into the spatial variability of heat and moisture exchange across the XRC.
In the present paper we firstly introduce the WSMA and the on-board measurements, and give an overview of the climate and physical composition of the study area (Sects.2.1, 2.2).We then describe a measurement strategy (Sect.2.3) which is linked to a novel data processing approach (Sect.2.4).A wavelet transformation allows us to resolve fluxes above each overflown cell of a 90 m land cover raster without neglecting flux contributions on much larger scales.In combination with footprint modelling and a nonparametric machine learning technique, an ERF is computed between the airborne flux observations and meteorological and land surface drivers (Fig. 5 provides an overview of the data flow).In Sect. 3 we present the results of this functional relationship and evaluate its potential to explain the spatial distribution of the heat and moisture exchange along the flight lines.We interpret the results in the context of blending scales and statistical errors, and discuss the uncertainty associated with using the ERF to predict fluxes to freely selectable domains in the XRC (Sect.3.2.4).Lastly, we give an outlook on potential applications of WSMA flux measurements and future improvements of the presented methodology (Sect.4).

The weight-shift microlight aircraft
The structure of a WSMA differs from common fixed-wing aircraft: it consists of two distinct parts, the wing and the trike, which hangs below the wing and contains pilot, engine and the majority of the scientific equipment.This particular structure provides the WSMA with exceptional transportability and climb rate, which qualifies it for applications in inaccessible and topographically structured terrain.A detailed description of the physical properties of the WSMA used in this study as well as characteristics and manufacturers of sensors and data acquisition is given in Metzger et al. (2011Metzger et al. ( , 2012)).In short, most variables are sampled at 100 Hz and are block-averaged and stored at 10 Hz, yielding a horizontal resolution of approximately 2.5 m.In this study, we use the 10 Hz measurements of the 3-D wind speed (σ w = 0.04 m s −1 precision), temperature (σ = 0.04 K), humidity (σ = 0.005 g m −3 ), and the height a.g.l.(σ = 0.04 m).From error propagation it was found that changes in friction velocity (u * ), H and LE of 0.02 m s −1 , 5 W m −2 , and 3 W m −2 , respectively, can be reliably distinguished (Metzger et al., 2012).In addition, we use for this study slow measurements (≤ 0.1 Hz) of humidity (TP3 dew point mirror, Meteolabor AG, Wetzikon, Switzerland), surface temperature (CT infrared thermometer, Optris GmbH, Berlin, Germany), and down-welling shortwave radiation (LI-200 SZ, LI-COR Inc., Lincoln, Nebraska, 400-1100 nm, within an error of 5 % equal to pyranometer measurements, 300-3000 nm).

Study area
Airborne EC flux measurements were performed in the XRC from 23 June to 4 August 2009.The hilly investigation area lies south of the provincial capital Xilinhot, Inner Mongolia, China (43.1-43.9• N, 116.0-117.2• E; 1000-1500 m a.s.l., Fig. 2).The XRC covers an area of ≈ 3670 km 2 and is characterized by temperate continental monsoon climate, with cold and dry winters and warm and wet summers.From data of the years 1982-2005 at the Inner Mongolia Grassland Ecosystem Research Station (IMGERS, 43.63 • N, 116.70 • E; 1187 m a.s.l., Fig. 2), the monthly mean air temperature ranges from −21 • C in January to +19 • C in July, with an annual mean of +1 • C (Liu et al., 2008).Variability in total annual precipitation is high (166-507 mm) with a mean annual sum of 335 mm.Typically, 60-80 % of the rainfall occurs from June to August (Chen, 1988).June 2009 (57 mm) and August 2009 (60 mm) were in the usual range, but July 2009 (35 mm) only received half of the long-term average rainfall.Detailed information on the meteorological conditions during the flight campaign is provided in Appendix B. Chestnut soils are the main zonal soil types, with a land cover dominated by Stipa grandis, S. krylovii, Artemisia and Leymus chinensis steppe.Throughout the XRC the abundance of C4 species in the steppe composition is relatively homogeneous (15-25 %, Auerswald et al., 2009).The growing season usually lasts from the end of May to late September (Liang et al., 2001).
Land cover in the XRC had been classified on the basis of a Landsat 7 Thematic Mapper image of 17 August, 2005 (Wiesmeier et al., 2011).In recent years however the development of settlements sprawled, and irrigated agriculture is gaining popularity (Qi et al., 2007).The Bowen ratio (Bo) of the latter is distinctly different from the land cover classes that already exist in the classification of Wiesmeier et al. (2011).This land cover classification was thus updated and extended by visual reclassification of Advanced Spaceborne Thermal and Reflection Radiometer (ASTER) images of 7 and 28 April 2009.The result is a land cover map with a resolution of 90 m, which is dominated by generic steppe (71 % coverage), intersected by a dune belt (10 %, Fig. 2).The coverage of bare soil, mountain meadow, marshland and rainfed agriculture is each ≈ 5 %, and the coverage of water bodies, settlements and irrigated agriculture is sub-per cent.In the context of this study the land cover classification represents the longer-term effects of vegetation, climate, soil and topography.
The spatial variation of temperature and precipitation in the XRC follows altitudinal and latitudinal trends (Auerswald et al., 2009;Wittmer et al., 2010).To resolve the effective state of biophysical surface properties over time, we use Moderate Resolution Imaging Spectroradiometer (MODIS) data.We chose 8-day composites of the daytime land surface temperature (LST, MOD11A2.5, 1 km resolution), and 16-day composites of the enhanced vegetation index (EVI, MOD13Q1, MYD13Q1, 250 m resolution) for this purpose.Due to an 8-day overlap of the EVI data products by the MODIS Terra and Aqua missions, LST and EVI datasets could both be acquired for 4, 12, 20, 28 July, and 5 August 2009.The LST and EVI datasets were bi-linearly interpolated to the 90 m resolution of the land cover classification, and linearly interpolated in time to yield an individual map for each flight day.The spatial gradients in temperature and precipitation throughout the XRC are clearly reproduced by the greenness of the vegetation (Fig. 2).The spatio-temporal resolution of the MODIS data enables assessing the actual state of the biophysical conditions at the land surface.LST and EVI vary significantly not only throughout the study period, but also between the different land cover types (Fig. 3).All land cover types follow a similar temporal trend, with LST and EVI peaking mid-July and end-July, respectively.While open water is the coolest surface, LST and greenness increase from mountain meadow over marshland to irrigated agriculture, which are likely strong sources of evapotranspiration.The reverse relationship (increasing LST and decreasing EVI) is found for settlements, rainfed agriculture, dunes, steppe and bare soil, which are likely strong sources of H .
A ceilometer (LD40 -Vaisala, Helsinki, Finland) was deployed at IMGERS and provided vertical profiles of the atmospheric laser radiation backscatter intensity (Mini light detection and ranging, originally applied for the detection of the cloud base height).The depth of the convective boundary layer (CBL) was inferred from 10 min means of these data, in combination with semi-daily radiosonde ascends in nearby Xilinhot (World Meteorological Organization station 54102, http://weather.uwyo.edu/upperair/sounding.html).For this purpose the maximum gradient method is used, which enables the detection of up to five lifted inversions (Emeis et al., 2008;Helmis et al., 2012;Münkel and Roininen, 2010).It is assumed that the aerosol number concentration, size distribution, shape and chemical composition (refractive index, absorption) adapt rapidly to the CBL structure.If there was more than one maximum or layer detected, the lowest one is taken as the CBL depth.The CBL depth is used in Sect.2.4.1 for the calculation of atmospheric length scales, and in Sect.2.4.3 for the source area calculation of the airborne flux measurement.In addition, the cloud cover during the flight periods was monitored by ground personnel at IMGERS.

Measurement strategy
Advancing into more complex terrain, a flight strategy needs to be derived that considers (i) pilot safety, (ii) vertical flux gradients, (iii) orographically induced effects on radiative transfer and turbulence generation, (iv) statistical errors, and (v) the land cover distribution.Such a flight strategy was derived for the XRC study region using the geographic information system ArcMap 9.2 (Environmental Systems Research Institute, Redlands, CA, USA).(i) A minimum flight level of 50 m above ground was found to provide the pilot with sufficient clearance for safe flight even under buoyancy-driven turbulence.(ii) Measuring at a constant pressure level would make a conversion of the measured temperature and densities into potential quantities (Sect.2.4) less important.However, over tilted or undulating terrain the aircraft would partially travel along the vertical flux gradients, thus spuriously contaminating the measured flux signal.A correction for the vertical flux gradients over complex terrain is not as straightforward and well conditioned as the conversion into potential quantities.In order for the fluxes to remain interpretable, the aircraft should thus measure at approximately constant height above terrain; i.e. the flight paths should follow the terrain contours.(iii) We use a digital elevation model (Shuttle Radar Topography Mission, Tile 60 04, data version 4.1, Jarvis et al., 2008) to calculate slope angles.In an effort to avoid immediate orographically induced effects on radiative transfer and turbulence generation, all locations within 500 m radius around slopes exceeding 6 • were masked.This radius approximately equals five times the standard deviation (SD) of the terrain elevation.(iv) To reduce the statistical errors, the flight path should be long and perpendicular to the mean wind direction.Thus we aligned straight flight lines along four wind axes in the areas which were not masked in the previous step.(v) Of the flight lines that were frequently perpendicular to the mean wind direction, those that best represented the land cover distribution in the XRC were covered on multiple flight days.This strategy results in a terrain-following flight patterns, with typical altitude gradients of 100 m vertical on 10 km horizontal.The climb angle of the aircraft rarely exceeds ±5 • , and on average the height above ground is constant to within 12 m (Table 1).Each flight line was repeated until a minimum of 40 km of data were acquired.
The aircraft was operated from IMGERS (Fig. 2).For the present study we use data from six days in July 2009 (Table 1), which were selected according to the availability of auxiliary datasets and homogeneity of the down-welling shortwave radiation S ↓ along the flight tracks (Table B1).On each day measurements were carried out along pairs of approximately parallel flight lines at a nominal airspeed of 27 m s −1 , with eight individual flight lines in total.Each pair is located across or along the humidity and temperature gradients in the XRC (Fig. 2).This strategy provides two independent datasets for each flight day, and covers the fundamental climatic gradients in this area.
The land cover type most frequently observed below all flight lines is steppe (Table 1).Flight lines with significant surface coverage of marshland or irrigation agriculture tend to be greener (higher EVI) compared to flight lines with significant coverage of dunes or bare soil.In the following, we investigate whether spatially resolved land cover information can be used as predictor for H and LE measured along the flight lines.At this we hypothesize that LST and EVI are rep- resentative proxies for heat and moisture sources on the surface, respectively (e.g.Glenn et al., 2008;Lyons and Halldin, 2004;Nagler et al., 2007).Because aircraft measurements cover a broad state space, the resulting observations are particularly suited to infer ERFs.

Data processing
An analysis package for the processing of the airborne EC data was developed in GNU R version 2.13 (R Development Core Team, 2012).The analysis package is described in detail in Metzger et al. (2012) and is available upon request.Relevant processing steps are: (i) the raw data are screened for spikes; (ii) humidity from fast response and slow reference sensors are merged using a complementary filter; (iii) the WSMA temperature and densities are transformed to potential quantities at the mean flight altitude (pressure level) of each flight line; (iv) the time delay due to separation between the vertical wind measurement and the temperature and humidity measurements is corrected by maximising their lagged correlation; (v) the WPL correction according to Webb et al. (1980) is used to correct LE for density fluctuations; (vi) correction for spectral artefacts (−1 ± 1 %, −2 ± 1 %, and −6 ± 2 % for the SDs in the wind components along, transverse and vertical to the aircraft coordinate  Lenschow and Stankov (1986) and Lenschow et al. (1994).

Horizontal mixing between surface and flight level
Horizontal mixing between the surface and the flight level results in the spatial integration of fluxes above heterogeneous terrain, a process also referred to as "blending" (e.g.Mason, 1988).Working toward an ERF between surface properties (driver) and flux measurement (response), we will test three hypotheses related to horizontal mixing.At flight level (i) the turbulence is in approximate equilibrium with the land surface in the flux footprint; (ii) the measured turbulence statistics are representative of the mechanical setting upwind; (iii) changes in the turbulent flux can be resolved at the horizontal scale of surface heterogeneities.Several analytical formulations have been developed to characterize a mixing regime (e.g.Mahrt, 2000;Raupach and Finnigan, 1995;Wood and Mason, 1991).Such formulations are usually based on the comparison of characteristic length scales for surface heterogeneity and CBL mixing.
Here we use the autocorrelation function to estimate the typical horizontal scale of surface heterogeneity L H .For this purpose we integrate the autocorrelation function of the WSMA-measured surface temperature T s at distance d along a flight line from zero lag to the first crossing with zero at lag c 0 ; where overbars denote the mean along a flight line, and primes denote the deviations from this mean.L H can be interpreted as the spatial coherence of surface features along the flight path (e.g.Strunin et al., 2004).In the following we assume that L H is isotropic within the flux footprint, i.e. also representative perpendicular to the flight path.
In order to further characterize the mixing regime, L H can be compared to atmospheric length scales.These length scales inter-relate the transport strengths in the horizontal and vertical directions, and correspond to the along wind distance after which the air mass below a reference level is approximately homogenized.Formulations for atmospheric length scales mainly differ in their use of (i) the measures of transport strengths, and (ii) the vertical or horizontal reference scale.Raupach and Finnigan (1995) proposed a length scale L R (now also referred to as Raupach length), which characterizes the mixing regime throughout the entire CBL; with CBL depth z i , average (bulk) horizontal transport velocity u, and convective velocity w * .Because u cannot be measured directly, it is substituted in Eq. ( 2) and in the following investigations with the measured horizontal wind speed at flight altitude.The influence of surface heterogeneities with spatial scales L H that are small compared to L R is confined below the CBL top.In this case the concept of a "blending height" within the CBL arises.The blending height corresponds to a vertical level at which the turbulent flow field over heterogeneous terrain approaches equilibrium with the local vertical gradient.If the blending height is confined within the surface layer, Monin-Obukhov similarity can be applied above the blending height (Mahrt, 2000).Wood and Mason (1991) define the thermal blending height for unstable stratification; with the buoyancy heat flux from the virtual potential temperature θ 0,v , and the horizontal wind speed u at the blending height.This thermal blending height can be rearranged as thermal blending length; now representing the smallest scale of surface heterogeneity that significantly influences the turbulent flow at flight level z above ground.An improved version of the thermal blending length was proposed by Mahrt (2000); which considers the SD of T s as a measure for the amplitude of surface heterogeneity.The numeral factors in Eqs.
(3)-( 5) were estimated from observations by Mahrt (2000).In Sect.3.1 the results of above formulations are used to test the initial hypotheses related to horizontal mixing.

Wavelet cross-scalogram
The differentiation of land cover types is at odds with the classical time-domain EC method, which assumes homogeneous terrain.However, Parseval's theorem implies that the covariance of signals may be studied not only in the time domain, but equivalently in the frequency domain.There, we have the wavelet transform family of methods at our disposal, which are particularly suited for the spectral analysis of nonstationary signals.
A wavelet is a signal that is localized in both time and frequency.Different localizations of the same basic shape (daughter wavelets) are constructed as a function of time t by defining; where ψ is a suitable mother wavelet, a a scale parameter (in frequency domain), b a location parameter (in time domain), and q = (t − b)/a a dimensionless coordinate (in time-frequency space).The convolution x(t)ψ a, b (t) dt of a signal x with a daughter wavelet ψ a, b yields a wavelet coefficient W x (a, b), a wavelet transform being a collection of such coefficients.We follow the procedure of Torrence and Compo (1998), using the continuous wavelet transform approximation for discrete input.The chosen mother wavelet is the Morlet wavelet ψ(q) = π −1/4 e iω 0 q e −q 2 /2 with the frequency parameter ω 0 = 6.The relevant parameters are spaced exponentially in frequency and linearly in time, respectively: a j = a 0 2 j δj for j = 0, . . ., J and b n = nδt for n = 0, . . ., N − 1, with length of the dataset N, initial scale parameter a 0 (δt, ψ) and number of scale increments J (a 0 , δj, δt, N, ψ). a 0 and J are chosen such that the extreme wavelet scales match the period of the Nyquist frequency, here 0.2 s, and the duration of the dataset, respectively.The unit of increment in the time domain, δt, is given by the sampling period of the time series, here 0.1 s.The unit of increment in frequency domain, δj , can be set to different values, with smaller values increasing both resolution and redundancy.For the present data, the results of the wavelet analyses were insensitive to the choice of δj in the range 0.0625 < δj < 0.25.Hence we follow the example of Torrence and Compo (1998) where C δ is a reconstruction factor specific to each mother wavelet, here 0.776 for the Morlet wavelet.The covariance can also be estimated locally for a subinterval of either j or n.This is a useful feature for dealing with changes in land cover: the continuous wavelet transform is highly redundant, with high correlation between adjacent low-frequency coefficients.Therefore, the covariance for a subinterval in time can be estimated without neglecting low-frequency, largescale contributions.The downside of the large low frequency support is that edge effects due to the finite overall dataset increase with scale.Torrence and Compo (1998) define the cone of influence (COI) as the boundary where the power of edge-related artefacts is damped by a factor of e −2 .Integration over all scales yields results close to the time-domain EC method (here, −7 to −3 % median differences), but also includes less reliable estimates above the COI (e.g.Strunin and Hiyama, 2004).Considering only scales below the COI rejects those less reliable estimates (e.g.Mauder et al., 2007) procedure also systematically increases the discrepancy between wavelet and time-domain EC methods (here, −22 to −7 % median differences).Moreover, the COI tapers toward the centre of the dataset (Fig. 4).Different scales would be included when estimating the covariance for subintervals at different positions along the flight path.Hence, for the localization of flux contributions in space we (i) integrate over all scales of a subinterval, and (ii) use a correction factor for each individual flight path to compensate the difference between wavelet and time-domain EC methods.Such a procedure is suited for the derivation of ERFs, because (i) it enables localization in space, (ii) considers contributions to the local flux from scales that are larger than the subinterval, (iii) is not biased with respect to the global time-domain covariance, and (iv) uncertainty arising from edge-effects is propagated in the ERF and included in the final uncertainty metric (Sect.2.5.1).
Computations were performed with a continuous wavelet transform package written in GNU R (R Development Core Team, 2012), partially based on the published code of Torrence and Compo (1998) available from http://atoc.colorado.edu/research/wavelets.u * , H and LE (and analogously the SDs of the wind components) are calculated for overlapping subintervals of 1000 m length.The subintervals are centred above each cell of the land cover/LST/EVI grids that was overflown by the WSMA, principally yielding one flux observation every 90 m.The resulting sample size for all 12 flights in Table 1 is N = 8446.

Footprint modelling
The footprint-or source weight function quantifies the spatial contributions to each flux observation (Schmid, 2002;Vesala et al., 2008).For this purpose we use the footprint model of Kljun et al. (2004, KL04), which is a parameterisation of the backward Lagrangian model of Kljun et al. (2002) in the range −200 ≤ z/L ≤ 1, u * ≥ 0.2 m s −1 , and 1 m≤ z ≤ z i .The parameterisation depends upon u * , measurement height z, SD of the vertical wind σ w and the aerodynamic roughness length z 0 , of which u * , z and σ w are measured directly by the WSMA.The roughness length is inferred using the logarithmic wind profile with the integrated universal function for momentum exchange after Businger et al. (1971) in the form of Högström (1988).The KL04 is a cross-wind integrated footprint model; i.e. it does not resolve the distribution perpendicular to the main wind direction.In order to account for cross-wind dispersion, the KL04 was combined with a Gaussian cross-wind distribution function (Kljun et al., 2013, in the following referred to as KL04+).In addition to above variables, the SD of the crosswind from WSMA measurements and the depth of the CBL z i from ceilometer measurements (Sect.2.2) are used.This results in a computationally fast footprint parameterisation which considers 3-D dispersion and is not constrained to applications in the surface layer.Metzger et al. (2012) evaluated KL04+ against a backward Lagrangian reference footprint model, and good agreement was found for all considered cases.
Turbulence statistics for a 1000-m-long subinterval over the wavelet scalograms (Sect.2.4.2) are used to evaluate the KL04+.One evaluation is carried out for each overflown cell of the land cover, LST and EVI grids (i.e.every 90 m along the flight path).With the overflown grid cell as base point, the footprint weights w xy ( w xy = 1) are calculated for each grid cell with position x, y, relative to the base point.From here the footprint composition is calculated;

Environmental response function
We base the development of a catchment-specific ERF on the works of Chen et al. (1999), Hutjes et al. (2010) and Ogunjemiyo et al. (2003).The general idea is to establish a functional relationship between spatially or temporally resolved flux observations (responses) and corresponding environmental drivers.Figure 5 provides an overview of the novel approach to ERF presented in the following.Thus far, a suitable number of flux observations was obtained by either shortening the time-domain EC averaging interval (Chen et al., 1999;Ogunjemiyo et al., 2003), or by stratifying repeated observation along the same flight line on different days (Hutjes et al., 2010).The inherent drawbacks are the neglect of either long wavelength contributions to the flux measurement, or inter-day variability of ecosystem drivers.Both are overcome using the wavelet crossscalogram technique (Sect.2.4.2).
Previously, the development of ERFs has solely focused on drivers in the footprint of the flux observations, namely discrete land cover classifications.This procedure ignores within-class variability across a catchment, e.g.along climatic or altitudinal gradients, which can be overcome by using continuous variables such as LST and EVI instead.In addition, the present approach considers the meteorological drivers S ↓, mixing ratio (MR), and potential temperature (θ ).This avoids the need for stratifying or pre-selecting data, and enables constructing a single ERF that is valid for the en-tire observation period and, within the range of the measured variables, throughout a catchment of interest.
Hitherto, ERFs were determined as the inverse of a linear mixing matrix, using either numerical (Chen et al., 1999) or regression methods (Hutjes et al., 2010;Ogunjemiyo et al., 2003).Such a procedure assumes a linear relationship between drivers and responses, which is subject of on-going discussion and research (e.g.Raupach and Finnigan, 1995).Instead, the present approach uses boosted regression trees (BRTs), a non-parametric machine learning technique, to establish an ERF between drivers and responses.In contrast to parametric approaches, BRTs do not assume a predetermined form of the response, but construct an ERF according to the information in the data.It is for this reason that not the absolute values of the land surface and meteorological drivers are important, but rather their spatial variability and coherence.In case of the land surface drivers for example, the only assumption made here is that the spatial patterns of LST and EVI approximate the spatial patterns of source strength in H and LE (e.g.Holmes, 1970;Oke, 1987).This is a much weaker assumption than a mechanistic link, and adds power to the method.BRTs can fit complex nonlinear relationships, automatically handle interactions between drivers, and provide predictive performance that is superior to most traditional modelling methods (e.g.Hu et al., 2010).Here we use the BRT work package by Elith et al. (2008), which builds upon the GBM library by Ridgeway (2012).To identify the optimal choice of parameters and variables for the BRTs, a sensitivity analysis was conducted using the cross-validation (CV) procedure described in Elith et al. (2008).During cross validation all available data are divided into 10 random combinations of training (90 %) and evaluation (10 %) fractions, which allows assessing and optimising model performance.The parameter settings that minimized predictive deviance for the present dataset were found to be: absolute (Laplace) error structure, bag fraction (0.7), tree complexity (5), learning rate (0.1), and number of trees (10 4 ).The initial set of variables also included time of the day, MODIS albedo, atmospheric pressure, land cover, z, z i , u, u * , z 0 , virtual potential temperature, as well as elevation, topographic wetness index, aspect, and slope of the footprint modelled source area.We use the variable dropping algorithm by Elith et al. (2008) to reach a compromise between predictive deviance and model parsimony.This algorithm (i) fits a BRT model, (ii) performs a 10-fold CV, (iii) drops the least important predictor (determined from the improvement to the model and the number of splits, Friedman, 2001), and (iv) repeats this sequence until a stopping criterion is reached.The mean CV deviance can be used to decide how many variables can be removed without significantly affecting predictive performance.Here, we set an upper threshold of 30 W m −2 for the mean CV deviance, which equals ≤ 1/2 the random sampling error in the flux observations (Table 4).The dropping of variables first stopped for LE at 29.2 W m −2 mean CV deviance, yielding a set of the five most important predictors (LST, EVI, S ↓, MR, and θ ).
For H the same predictor set yields a mean CV deviance of only 22.6 W m −2 .Remarkably, atmospheric pressure, z, and z i were no significant predictors for the observed fluxes.This indicates that the chosen flight/analysis strategy effectively minimizes cross-contamination of the flux observations by vertical flux/pressure gradients.Analogously the algorithm dropped elevation, aspect, and slope of the footprint modelled source area as predictors.This shows that slope-induced effects on radiative transfer or turbulence generation do not significantly impact the flux observations.Consequently, the final BRT model is fitting an ERF to H and LE as function of only the five most important predictors.This ERF is then used to predict H and LE throughout the XRC, as a function of LST and EVI for each grid cell, and the median S ↓, MR, and θ for the duration of a flight.
In the following we will use the term LTFM to refer to the overall procedure consisting of Low level flights, Time-frequency-, Footprint-, and Machine learning analyses (Fig. 5).

Uncertainty
Throughout the present study, we use the median and the median absolute deviation as preferred measure of location and scale, respectively (Croux and Rousseeuw, 1992;Rousseeuw and Verboven, 2002).All resulting uncertainty estimates are representative of one standard deviation.For the purpose of detecting systematic differences between observations and predictions, we use the maximum-likelihood fitting of a functional relationship (MLFR, Ripley and Thompson, 1987).This method assigns a weight to each data couple in the relationship, which is inversely proportional to its error variances.In our case, the squared random flux errors in the observations, and the residuals in the BRT cross-validation ensemble are used.This appreciates reliable data and depreciates uncertain data couples.The errors in the MLFR coefficients are determined from a jackknife estimator (Quenouille, 1956;Tukey, 1958).If the regression intercepts were not significant, the relationships were forced through the origin, and confidence intervals were determined from the slope error.The coefficient of determination R 2 was calculated in analogy to weighted least-squares regression (Kvalseth, 1985;Willett and Singer, 1988).It is the proportion of variation in the weighted dependent variable that can be accounted for by the weighted independent variable.Uncertainty in the LTFM up-scaling procedure originates from different sources during measurement and data analysis.Part of these uncertainty terms exhibit random characteristics; i.e. they tend to cease with sample size.Another part however will systematically bias the results, independent of sample size.An uncertainty budget for the random and systematic uncertainties in the LTFM procedure will consist of uncertainty terms for (i) instrumentation and hardware, (ii) turbulence sampling, (iii) spatio-temporal analysis, (iv) BRT residuals, (v) BRT response function, and (vi) BRT state variables.While uncertainty terms (i), (ii), and (iv) can be quantified with readily available procedures (Sects.2.4, 3.3), in teh following we describe several techniques to assess terms (iii), (v) and (vi).

Spatio-temporal analysis in heterogeneous terrain
Under the umbrella of spatio-temporal analysis, we quantify in the following the uncertainty contribution from wavelet analysis, footprint modelling, and the assumption of linear mixing.The fluxes derived from the wavelet cross-scalogram were adjusted to match the leg-averaged fluxes from timeseries EC, which avoids bias between both techniques.Also areas above the wavelet cross-scalogram COI were used in the flux calculation to ensure including all scales of turbulent transport along the entire transect.However, values above the COI are potentially distorted due to edge effects, in particular close to the beginning and the end of each transect.These artefacts propagate in the resulting variances and fluxes, and consequently into the footprint estimates.Additional spatial uncertainty terms result from the use of an "offline" footprint model that does not consider the actual flow field, as well as from the MODIS EVI and LST data.The use of BRTs does not expect a linear response between the state variables and the flux signal.However, LTFM still assumes the linear mixing of the flux signal with respect to the contributing surface patches with different biophysical properties and source strengths.
To quantify the error inherent in the above analysis steps, we compare maps of LTFM predicted fluxes to airborne flux observations.(i) The BRT is trained with all available observations (N = 8466).(ii) Using the median state variables along each flight leg (N = 42), the BRTs response function is used to predict a similar number of flux maps (Fig. 11).(iii) The LTFM footprints are superimposed over these flux maps.(iv) For each flux observation, a predicted flux is calculated as the footprint-weighted average of all contributing cells, and (v) predictions and observations are compared.

Response function
BRTs are a non-parametric machine learning technique in which a response function is constructed according to the coherencies in the training data.As a direct consequence the predictive performance of BRTs depends on how complete the combinations of state variables in the evaluation data are represented in the training data.Here we assess the susceptibility of the BRT response function and predictive performance to missing state variable combinations in the training data.For this purpose, 12 incomplete training datasets are created, each of which omitting a different flight out of the total of 12 flights in Table 1.For each incomplete training dataset, (i) the BRT is trained, (ii) the resulting response function is used with the state variables along the omitted flight for prediction, and (iii) predictions and observations are compared.

State variables
Here, we consider the uncertainty resulting from disregarding part of the natural variability in the state variables that are used for spatially and temporally explicit BRT predictions.For this purpose we quantify the disregarded parts of the natural variability in each state variable and propagate it through the full BRT model.While explicit in time, the meteorological variables measured by the aircraft do not cover the entire catchment.We estimate a measure of spatial variability from all subsequent pairs of flights that are located in different areas of the catchment (Table 1, Fig. 2).The median differences throughout the catchment for S ↓ (−6±12 W m −2 ), θ (−1.1 ± 1.1 K), and MR (−0.5 ± 0.3 g kg −1 ) are not significant (Wilcoxon rank-sum test, p ≥ 0.18).On the contrary, MODIS EVI and LST are explicit in space, but not continuous in time.The 8-day trends from one scene to the next are accounted for in the BRT procedure through temporal interpolation between the MODIS scenes (Sect.2.2).However, processes of shorter duration, such as frequent events of small-scale convective precipitation, go unaccounted.Hence we estimate a measure of the natural variability between two MODIS scenes.For this purpose we calculate the median change of all grid cells between all subsequent MODIS scenes, amounting to 0.01 ± 0.05 for EVI and −0.5 ± 6.2 K for LST.The random part of EVI and LST natural variability by far exceeds the MODIS data product uncertainty of ≈ 0.015 (Xiang et al., 2003) and ≈1 K (Wan and Li, 2008), respectively.Hence MODIS data product uncertainty was not considered separately.
The correlation matrix between the state variables was calculated using all 8446 aircraft observations.A variancecovariance matrix was calculated from this correlation matrix and the random part of the state variables' natural variability.Preserving the variance-covariance relationship, 1000 samples were drawn from a multivariate normal distribution with zero mean.These represent 1000 combinations of co-existing natural variability in the state space of the BRT model.The propagation through the BRT model was performed individually for each combination by (i) superimposing the estimated natural variability over the measured state variables of all 8446 observations, (ii) performing a BRT prediction, and (iii) comparing the results to the undisturbed predictions.In the first part of this section, we assess the surfaceatmosphere mixing regimes.From there wavelet analysis, footprint modelling and BRTs are used to infer ERFs between land surface properties and the flux measurements.Lastly, uncertainties in the LTFM up-scaling procedure are analysed and discussed.

Horizontal mixing between surface and flight level
On spatial average energy conservation requires that the vertical profiles of H and LE approach their respective entrainment flux at the top of the CBL (e.g.Deardorff, 1974;Sorbjan, 2006).The linear vertical flux gradient of H throughout the CBL was calculated (−0.21 W m −2 m −1 -−0.06 W m −2 m −1 ), assuming that H ceases at the statically stable entrainment zone around 0.8 CBL.However, the entrainment flux of E is unknown.Hence we cannot estimate the vertical flux gradient of E, but assume a comparable order of magnitude as for H .The resulting effect of the vertical flux gradient below the flight level is −5 ± 2 % of H , which falls well within the surface layer definition (e.g.Raupach and Finnigan, 1995;Stull, 1988, flux constant within 5 − 10 %).Thus, it is feasible to assume that H and E measured at flight level are representative of surface fluxes.
The characteristic length scale of surface heterogeneity is on the order of several hundred to thousand meters, with an average of L H = 1012 ± 715 m (Table 2).Along identical flight paths L H is comparable between days with different meteorological settings (e.g. 15 and17 July 2009, 26 and30 July 2009).This confirms the usefulness of the surface temperature measurement as a proxy for surface heterogeneity.Only the longer flight paths C1 and C2 cross the dune belt in the centre of the catchment (Fig. 2).The dune belt is the largest continuous land cover after steppe, and consequently the autocorrelation function of T s estimates large values of L H (1458-2615 m).During all flights, L H was small compared to the Raupach length (L R = 1532-5214 m), and thus the influence of the surface heterogeneity is confined within the CBL (z i = 1100-2500 m).Here we use the thermal blending height (z TB1 = 40 ± 29 m) as an estimate for the vertical level where quasi-equilibrium of the turbulent exchange between land surface and atmosphere is reached.At all times the flight level (z = 48-102 m) is above z TB1 and below ≈ 10 % of the CBL depth, a common estimate for the depth of the atmospheric surface layer (e.g.Raupach and Finnigan, 1995;Stull, 1988).Hence it is feasible to assume that the turbulence measurement at flight level is representative for the land surface in the flux footprint.The interpretation of the flux observations might be more complicated for measurement heights below the thermal blending height (limited spatial representativeness) or above the surface layer (vertical flux gradient).The blending length formulations L TB1 (1660 ± 723) and L TB2 (957 ± 441) are used to assess the minimum size of surface heterogeneity that significantly influences the flow at flight level.Here we use 255 m < L TB2 < 1852 m as a guideline, because L TB2 is also representative of the magnitude of surface heterogeneity.The native resolution of the EVI data (230 m) and the land cover data (90 m) is equal to or better than L TB2 , and thus sufficient to reproduce the variability of the land cover.In comparison, the native resolution of the LST data (1000 m) is coarse, potentially leading to an attenuation of the ERFs.
Using the wavelet cross-scalogram, long wavelength contributions to the flux do not constrain the spatial resolution of the flux computation along the flight path.Nevertheless, the random flux error is inversely proportional to the square root of the averaging length (e.g.Lenschow and Stankov, 1986), and propagates directly into the computation of the ERFs.Hence, we consider a trade-off between random error (high resolution) and smearing (low resolution) of the resulting flux estimates.The upwind distance (perpendicular to the WSMA flight path) where 80 % of the flux contributions are included in the footprint, L 80 % = 1171 ± 314 m, is comparable in magnitude to L H . Thus, a flight path length of similar extent (1000 m) is a physically meaningful window for the computation of turbulence statistics and fluxes, because (i) changes in the turbulent flux (response) are resolved at the same spatial scale as the characteristic surface heterogeneity (driver); (ii) the turbulence statistics used for footprint calculations are representative on the same spatial scale as the upwind extent; and (iii) the random error for each flux estimate decreases by ≈ 70 % compared to a window length of 90 m.
The (aerodynamic) roughness length is usually below 1 m, with exception of the low wind speed situation on 17 July 2009, pattern O11, and the higher flight levels (z ≥ 97 m) on 26 July 2009 (Table 2).

Flux un-mixing
The presentation of the flux un-mixing results follows the sequence of the LTFM analysis steps.In Sect.3.2.1 the spatially resolved flux observations from the wavelet cross-scalogram are illustrated.Subsequently, footprint modelling is used to infer the biophysical surface properties in the source area of each flux observation (Sect.3.2.2).In Sect.3.2.3 the ERFs between flux observations and meteorological and land surface drivers are established.These ERFs are then used to predict the surface fluxes throughout the XRC, which are finally summarized for different land covers (Sect.3.2.4).

Spatially resolved flux measurement
Here and in the following we use a flight along pattern O12 for illustration, which follows a shallow elevation gradient (Fig. 4   cross-scalogram allows a high spatial discretization of turbulent flux measurements.At the same time it includes flux contributions from wavelength that are significantly longer than the 1000 m subinterval for each flux observation (Fig. 4).The resulting high number of flux observations along a flight line leads to previously unachievable resolution and coverage of the state space.Spatially coherent flux contributions are detected on transport scales (eddy sizes) of 500-2000 m, that is of similar size as the extent of homogeneous surface patches L H . Strong local flux contributions are confined to scales < 500 m, and approximately decay within the lower threshold of the observed blending lengths L TB1 and L TB2 .This confirms a close coupling between atmospheric turbulence structures with surface patchiness, and consolidates the interpretation of the length-scale approach.The less certain flux contributions above the wavelet COI are small (−15 to −4 % median differences for all flights).In the present example the COI is confined to relatively small scales (≤ 4 km), which is a direct result of the comparatively short flight.In general, more certain flux contributions below the COI include transport scales up to ≈ 1/3 of the flight length, and can reach ≈ 16 km for flight patterns C1 and C2.However, this also implies that the maximum considered transport scale differs between the flight patterns, just as it would be the case for the time-domain EC method.The wavelet crossscalogram reveals strong turbulent transport in the second and fourth quarter of the flight for H , and in the first and third quarter for LE (Fig. 4).When integrated over all transport scales for each overflown 90 m cell of the land cover grid, these patterns correspond to strong upward fluxes.

Land cover
In Sect.3.2.1,turbulence statistics and fluxes were integrated for each overflown 90 m cell of the land cover grid.In the following we expand the integration window to overlapping subintervals of 1000 m length, while retaining a spatial discretization of 90 m.Such a procedure significantly reduces the random sampling error (Sect.3.1), though at the cost of decreasing the number of resulting observations by one window size (dN ≈ 10).The resulting turbulence statistics are used to calculate the source area of each individual flux observation along the flight line, which are superimposed over the land cover grids.Figure 6 shows that in general LST and EVI follow the land cover patterns, e.g.lower temperature and higher greenness for irrigated agriculture and marshland.However, it is also evident that the static land cover classification cannot reflect the current surface conditions.For example, the marshland in the north-western quadrant appears dried-out (high LST and low EVI), while the steppe area in the north-eastern quadrant shows large variations in LST.Hence, biophysical surface properties also vary significantly within the land cover classes.This is likely a function of geomorphological properties such as aspect, slope and soil type, but also due to the large variability of convective rainfall events across the study area (e.g.Schaffrath et al., 2011).Following superimposition of the footprints over the land cover data, the spatial contributions of different surface properties to each flux observation can be quantified (Fig. 7).It is evident that measured Bo changes in correspondence with the dominating land cover, i.e. low Bo for marshland and irrigated agriculture, and high Bo for bare soil and steppe.LST and EVI are stratified between the land covers, although in different sequence compared to the regional average (Fig. 3).The variability of LST and EVI within the land cover classes is equal to or larger than the between-class variability, in particular for marshland, irrigated and rainfed agriculture.While LST and EVI behave inversely for all natural land covers (−0.78 < r < −0.10), the contrary is true for irrigated (r = 0.92) and rainfed (r = 0.30) agriculture.The latter finding appears counter-intuitive, but can be explained by tillage farming in the low-level plains with crops that are not adapted to the semiarid climate, such as potatoes.The albedo of these densely vegetated crops can be lower compared to the sparsely vegetated steppe land cover (α ≈ 0.2, Ketzer et al., 2008), resulting in higher foliage temperatures.Only two natural land covers, marshland and mountain meadow, exhibit similarly high EVI values as the field crops (Figs. 2, 3).Nevertheless, the LST of these land covers is comparatively low.In case of the marshland this can be explained by water-saturated soils with high heat capacity.Conversely, lower temperatures in accordance with the adiabatic temperature gradient are expected for the mountain meadows at higher altitudes.In Fig. 8 H and LE observations along the flight line are shown together with the LST and EVI in the respective source area.Because of the 1000 m integration window over the wavelet cross-scalogram, the results appear smoother compared to Fig. 4, where a 90 m integration window is used.It is apparent that H and LE both systematically change with LST (r H = 0.64, r LE = −0.84)and EVI (r H = −0.62,r LE = 0.73).However, peaks in H (3 km and 9 km in Fig. 8) and in LE (0 km and 7 km) do not manifest when their respective land surface drivers in the footprint are maximal.Instead they seem to follow a trade-off function between LST and EVI.

Environmental response functions
Thus far our findings indicate that the interactions between land surface and atmosphere are multi-facetted and potentially non-linear.Hence we use LST and EVI as topical, spatio-temporal proxies for the source strength of H and LE, rather than using the land cover classification directly.In comparison to earlier flux un-mixing studies (Chen et al., 1999;Hutjes et al., 2010;Ogunjemiyo et al., 2003), this has the benefit of (i) providing individual source strength representations for the effects of surface moisture and temperature, and (ii) representing the land surface by continuous (LST, EVI) rather than discrete variables (land cover classes), thus enabling the use of more advanced scaling algorithms.
Here, we use BRTs to extract the relationships between all (N = 8446) flux observations and land cover (LST, EVI) and meteorological (S↓, MR, and θ) variables.While BRTs are capable of reproducing complex interactions through multilayered branching, the fitted function can be summarized, e.g. as partial dependence plots (Fig. 9).These show the effect of each individual variable on the response after (i) subtraction of the offset (H 0 = 161 W m −2 , LE 0 = 176 W m −2 ), and (ii) accounting for the average effects of all other variables in the model.The partial dependence plots in Fig. 9 are sorted in order of the relative importance of the response variables (Friedman, 2001).The most important responses of H are non-linear (LST, θ), followed by linear responses (S ↓, MR, and EVI).With the exception of MR and EVI, the individual responses are positive in sign.The order of the responses for LE is partially different (MR, LST, θ, S ↓, and EVI), and only the responses on S↓ and EVI are approximately linear (not shown).With exception of MR (concave, maximal response around 10 g kg −1 ) and LST (convex, minimal response around ≈ 310 K), the signs of the responses for LE are positive.It appears surprising that H and LE are only weakly related to S ↓.This can be explained by using only noontime flights in the present study, where S ↓ mainly fulfils the purpose of accounting for varying cloud/radiation conditions between different measurement days.In addition, during individual flights S ↓ was usually constant to within ≤10 % (Table B1).However, when using ERFs to reproduce a diurnal cycle, a much larger dependence of H and LE on S ↓ would be expected.
In Fig. 10 MLFRs are established between BRT fitted values for H and LE and the observed fluxes (N = 8446).
Here we use the BRT cross-validation residuals and the random sampling errors in the observations to determine the MLFR weights of each data point.Uncertainty terms (i), (iii), (v) and (vi) (Sect.2.5) cannot be quantified individually for each observation.Hence these terms are not considered here, but in the final uncertainty budget (Tables 3 and  4).For both H and LE the agreement between the BRT fitted values and the observed fluxes is excellent.Contrary to our initial anticipation, the ERFs are not attenuated by the relatively coarse MODIS LST resolution, as indicated by approximately zero MLFR offset and unity slope.The median absolute deviation in the residuals is small (≤ 1 %).However, several outliers are found for moderate to high fluxes of H (N = 41) and LE (N = 133), for which the BRTs underestimate the observed value by −150 W m −2 or more.The majority of these cases occur during the flights O8 on 13 July 2009 and C1 on 26 July 2009, respectively.On both dates the outliers concur with highly intermittent solar irradiance (200 < S ↓< 1200 W m −2 ) along a short section of the flight paths.For instance an intermittent cloud cover can disrupt the functional relation between the irradiance (driver) and the flux (response) observations, because, (i) at a flight level of 50-100 m a.g.l., the aircraft irradiance measurement does not represent S ↓ in the source areas of H and LE, and (ii) the plant physiological response can vary substantially on spatio-temporal scales that are small compared to atmospheric transport processes between the land surface and the aircraft.
Our choice of land surface and meteorological drivers appears to work well for describing the noontime surfaceatmosphere exchange of heat and water vapour over a moisture-limited landscape.However, it is important to note that appropriately describing exchange processes over longer periods of time, for different landscapes or scalars might require finding an entirely different set of predictors.

Extrapolation and summarization
For the duration of each flight pattern, the trained BRT models are used to extrapolate H and LE throughout the XRC.For this purpose the median meteorological state variables during each flight pattern as well as topical grids of MODIS LST and EVI data are used.Grid cells that exceed the state space of the BRT training dataset (N = 8446) are excluded from extrapolation.Fig. 11 shows the resulting flux grids for three different days, with a spatial coverage of ≥ 92 %.Because of the identical state space ranges for BRT training and prediction, also the ranges of the extrapolated turbulent fluxes are within limits of the observations.Despite that the land cover classification was never used during the extrapolation process, several landscape units are clearly recognizable in the flux maps.For instance bot, the Xilin River valley and the mountainous headwater area to the east display low H and LE.On the contrary, the non-vegetated basin on the northern tip shows consistently low evapotranspiration.For a given meteorological boundary condition (MR, θ, S ↓), the heat fluxes within several hours of solar zenith can be expressed as a function of LST and EVI (Fig. 9).In turn, these biophysical surface properties are characteristic within a land cover class (Fig. 3).Here, we aggregate all grid cells of the flux maps according to land cover class, resulting in sample distributions of H and LE.  a formal transition from a mosaic-to a tile representation of H and LE over the XRC for the duration of each flight pattern (Mengelkamp et al., 2006).These sample distributions then enable the analysis of land cover specific source strengths (36 W m −2 < H < 364 W m −2 , 46 W m −2 < LE < 425 W m −2 ), as well as the spatial variability within a land cover (11 W m −2 < σ H < 169 W m −2 , 14 W m −2 < σ LE < 152 W m −2 ).Table 3 gives an overview of the median land cover specific H and LE over all flight patterns, and their median spatial variability.These results fall well within the range of summertime ensemble average fluxes during solar noon observed by ground-based EC measurements over different land covers in this region (100 W m −2 < H < 310 W m −2 and 100 W m −2 < LE < 480 W m −2 ; Gao et al., 2009;Hao et al., 2007;Hao et al., 2008;Shao et al., 2008).
In comparison, the flight-line average heat fluxes are in the range of 71 W m −2 < H < 310 W m −2 and 46 W m −2 < LE < 300 W m −2 (Table B2).However, the magnitudes of H and LE are not only functions of land cover, but also proportional to the available energy.The available energy changes within, but in particular between flight days.To alleviate this effect and to enable the comparison between different flights, we calculate the Bowen ratio Bo = H /LE between the sample distributions.Despite differences in the meteorological drivers (MR, θ, S ↓), the median land cover specific Bo agrees well between subsequent flight patterns on all measurement days (Fig. 12).During the afternoon flights, 12 ± 9 % higher Bo values are observed compared to the morning flights, as expected from a land surface that desiccates in the course of the day.Nevertheless, the 99.9 % confidence interval includes unity slope.Hence, for several hours within solar zenith Bo does not change significantly, and can be interpreted as a characteristic land surface property.On this basis we summarize the regional flux estimates for the duration of the flight campaign as time series of land cover specific Bo ratios (Fig. 12).The order of Bo between the land covers follows the order of the land cover specific EVI approximately inversely, while the temporal pattern follows the pattern of the land cover specific LST (Fig. 3).High Bo values until mid-campaign indicate that the land surface dries out.This trend is reversed toward the end of the campaign, when the approach of humid air masses leads to considerable precipitation.The median daily  natural variability of Bo within the land covers ranges from 48 % (rainfed agriculture) to 79 % (marshland).Water absorbs strongly in the near infrared, leading to negative EVI values that are not indicative of vegetation greenness.Hence EVI values for water surfaces are discarded, and the land cover "water" cannot be modelled by the present ERFs.Metzger et al. (2012) have shown that turbulent flux measurements with the WSMA platform and instrumentation are unbiased, and precise to within 8 %.Uncertainty due to the limited sampling size of turbulent eddies is estimated using the methods of Lenschow and Stankov (1986) and Lenschow et al. (1994).Details on the implementation can be found in Metzger et al. (2012).systematic (and random) components of this sampling uncertainty range from < 1 % (57 %) for H to < 1 % (121 %) for LE.Table 4 summarizes above uncertainty sources, as well as additional sources which are discussed in the following.

Uncertainty
In order to assess the uncertainty arising from the spatiotemporal analyses (Sect.2.5.1),we compare the median observed and predicted fluxes along all flight legs (Fig. 13).The LTFM predictions slightly overestimate the observed fluxes (H = 5 %, LE = 5 %), but in both cases the 99.9 % confidence intervals include unity slope.The median differences of dH = 2 % (40 %), and dLE = 4 % (47 %) agree marginally more closely.Moreover, the median residuals between fitted and observed values emphasize that the BRT fitting technique is unbiased (Table 4).Subsequently, we assess the predictive performance of the BRT response function in light of missing state variable combinations in the training data.For this purpose one flight at a time was omitted from the training data, and the incompletely trained BRT model was used to predict the missing data.The resulting median differences amount to 11 % (69 %, N = 7311) for H and 18 % (77 %, N = 7265) for LE.During prediction, cases where one or more state variables exceed their respective range during training were excluded.As a consequence the sample size is ≈ 14 % smaller than the total number of observations (N = 8466).
Lastly, we consider the uncertainty resulting from disregarding part of the spatio-temporal variability in the state variables during BRT predictions.For this purpose we quantify the disregarded parts of the natural variability, and propagate it through the full BRT model.The resulting median differences amount to 13 % (77 %) and 14 % (75 %) for H and LE, respectively of LST natural variability (r = 0.81, and −0.69).Because the response of the BRT predictions on LST is non-linear (Fig. 9), deviations of similar magnitude but opposite sign in LST do not cancel out in the predictions.This can lead to a systematic overestimation as a function of the specific state variable combination in each prediction, and is hence dependent on the catchment composition.However, in all test cases the 99.9 % confidence intervals between observed and predicted fluxes include unity slope.Hence we go without introducing a non-linearity response factor, but assign an accuracy of ≤ 20 % to the LTFM method.
Assuming normal distribution and independence, the random parts of all uncertainty terms (Table 4, in parentheses) can be combined to their Gaussian sum.Then, the ensemble random uncertainty σ ens considers the reduction of the random uncertainty with sample size (e.g.Mahrt, 1998); with zero expected value σ ens and the SD σ ran of the population with size N.While σ ran is a measure for the average dispersion of a single observation or grid cell, σ ens quantifies the level of confidence we can expect from aggregating multiple observations or grid cells.The resulting ensemble random uncertainty for land cover specific flux estimates throughout the XRC ranges from < 1 % for steppe to 5 % for settlements and irrigated agriculture (Table 3).

Conclusions
The overarching goal of airborne EC flux measurements is to bridge the gap between observations and data assimilation approaches on different spatial scales.This study develops the LTFM procedure to characterize the exchange of sensible and latent heat for different land covers in a heterogeneous steppe landscape.The procedure "mines" the information content of EC flux observations and extracts quantitative relationships with environmental drivers.In the process LTFM maximises objectivity and data use efficiency -all available observations are considered.The subsequent steps of LTFM are (ii) Wavelet decomposition of the turbulence data yields unprecedented spatial resolution of the flux observations.However, due to edge effects flux observations close to the start or end of a dataset can contain spectral artefacts.Using alternative techniques such as empirical mode decomposition (Barnhart et al., 2012a, b) or structure-parameter methods (Van Kesteren et al., 2013) might help to further improve the results.
(iii) An "offline" footprint parameterization considering 3-D dispersion is suitable to map the differences in surface properties encountered by a flux measuring aircraft.However, when adapting LTFM e.g. to ground-based measurements, the range of surface properties is likely to shrink significantly.In order to improve the decreased signal-to-noise ratio, it might become important to also consider the local flow field, especially when measuring at greater heights.For example, closure models with terrain-following coordinates (Hsieh and Katul, 2009;Sogachev and Lloyd, 2004) or "online" Lagrangian dispersion modelling (Markkanen et al., 2010;Matross et al., 2011;Wang and Rotach, 2010;Weil et al., 2012) could be useful for such a purpose.
(iv) Instead of a static and discrete land cover classification, the LTFM method uses spatio-temporally continuous and topical information of biophysical surface properties.Only the continuous nature of MODIS land surface data enabled the use of the BRT machine learning technique.In this combination the climatic and altitudinal gradients throughout the XRC are successfully reproduced.In the interest of further advancing LTFM, it is desirable to also consider the uncertainty in the observations during machine learning, and to explore alternative machine learning techniques such as support vector machines (e.g.Yang et al., 2007).
The ERFs resulting from LTFM can aid bridging observational scales, e.g. by isolating and quantifying relevant landatmosphere exchange processes, estimating land cover specific emission factors, extending flux measurements to the catchment scale, assessing the spatial representativeness of EC flux measurements, etc. Analogously applying LTFM to ground-based EC measurements could aid, e.g.advancing the treatment of location bias from diagnostic assessment (e.g.Chen et al., 2012) to prognostic transfer functions, constraining local to regional water budgets, distinguishing anthropogenic and natural sources/sinks in urban environments and substantiating process-studies.

Meteorological conditions
The midday flights are usually accompanied by a thin layer of cirrus clouds, interspersed with local convective cumuli, resulting in a cloud cover between 4/8 and 7/8 (Table B1).
The down-welling shortwave radiation decreases over the duration of the campaign, with minima on 17 and 30 July 2009.These minima coincide with the advection of comparatively moist air, as evident from the higher mixing ratios.S ↓ also correlates with the precipitation history (Table 1,  of northerlies occur.Both the virtual potential air temperature and the surface temperature peak during the middle of the flight campaign.As a result of several convective precipitation events, the mixing ratio increases over the flight campaign, accompanied by a dampening of the surface temperature variability. The ranges of the flight line average turbulent fluxes are 0.3 < u * < 0.5 m s −1 , 71 W m −2 < H < 310 W m −2 and 46 W m −2 < LE < 300 W m −2 (Table B2).The friction velocity peaks during flights under high wind speeds.While H dominates the heat exchange during the middle of the campaign, LE peaks at the beginning and end of the campaign.The Bowen ratio throughout the campaign correlates (r = −0.67)with precipitation history (Table 1), i.e. the moisture available for evapotranspiration.Moreover, H clearly correlates with S ↓ (r = 0.68), while no such relationship was found for LE (r = 0.02).The atmospheric stratification was unstable throughout all flights (Monin-Obukhov length L = −34 ± 20 m), with corresponding high values of the SD of the vertical wind σ w = 0.88 ± 0.11 m s −1 and convective velocity w * = 2.21 ± 0.39 m s −1 .

Fig. 1 .
Fig. 1.Location of the Xilin River catchment in the Inner Mongolia Autonomous Region, China (modified after Steffens et al., 2008).

Fig. 3 .
Fig. 3. Change of land surface temperature (top) and enhanced vegetation index (bottom) for each land cover class throughout the study period.The land cover colour code and corresponding abbreviations are identical with Fig. 2. Vertical dashed lines indicate the flight dates.The land cover "Water" is not present for the EVI, because water absorbs strongly in the near infrared, leading to negative EVI values that are not indicative of vegetation greenness.
and use δj = 0.125.The wavelet scalogram of a signal x is defined as the matrix of |W x (a, b)| 2 for all admissible a, b.Likewise, the wavelet cross-scalogram of two signals x, y is the matrix of W x (a, b)W y (a, b) * , where * denotes the complex conjugate.The global covariance of x and y can be estimated by weighted averaging;

Fig. 4 .
Fig. 4. Wavelet cross-scalograms for the sensible heat flux (top panel) and the latent heat flux (centre panel) along flight pattern O12 on 8 July 2009, 12:16-12:24 CST.The colour palette changes from blue (downward fluxes) over white (neutral) to red (upward fluxes).The shaded areas identify the cone of influence.Below each cross-scalogram the integrated flux over all scales is shown for each overflown 90 m cell of the land cover grid.The surface elevation along the flight pattern is displayed in the bottom panel.

Fig. 5 .
Fig. 5. Flow chart showing how input and reported data streams are processed along the four principal steps of the LTFM method.Additional detail is provided in Sects.2.4.4 and 4, and a summary of all notation can be found in Appendix A.
bottom panel).This flight pattern is particularly suitable for this purpose because of its marked land cover changes over a relatively short distance.The wavelet

Fig. 7 .
Fig. 7. Biophysical surface properties in the footprint of each observation (N = 124) along the flight pattern O12 on 8 July 2009, 12:16-12:24 CST, summarized by land cover.Shown are (clockwise from top right panel) land surface temperature, enhanced vegetation index, Bowen ratio, and the land cover fraction in the footprint.The dashed lines are land cover averages for LST and EVI, and the spatial trend for Bo.The land cover colour code and corresponding abbreviations are identical with Fig. 2.

Fig. 8 .
Fig. 8. Sensible heat flux (left panels) and latent heat flux (right panels) along the flight pattern O12 on 8 July 2009, 12:16-12:24 CST.Also shown is the random sampling error (error bars) for each observation (N = 124), and the spatial trend (dashed line).The top and bottom panels show the land surface temperature and the enhanced vegetation index in the footprint of each observation, respectively.

Fig. 9 .
Fig. 9. Boosted regression tree partial response plots of H for all five state variables in order of their relative importance (in braces).The fitted function (black) shows the variable response of the BRT over the range of one individual state variable, while the remaining state variables are held at an average, constant value.The red dashed line is a smoothed representation of the fitted function (locally weighted polynomial regression).

Fig. 10 .
Fig. 10.Maximum likelihood functional relationships between N = 8446 aircraft observation and LTFM predictions of sensible heat flux (left) and latent heat flux (right).The weight of each data point in the relationship is represented by the size of the circles.The error bars show the cross-validation residuals for the LTFM predictions, and the ensemble random sampling error for the aircraft measurement.The 99.9 % confidence intervals are too narrow to be displayed properly.

Fig. 11 .
Fig. 11.Maps of the LTFM predicted fluxes of sensible heat (H , top) and latent heat (LE, bottom) on 13, 17 and 26 July 2009 (left to right).The colour gradient from blue over grey to red represents values that are lower, equal to, or greater than the of the values, respectively (see legend).Percentages in braces after the flight ID indicate the spatial coverage of the prediction throughout the catchment.Meteorological state variables from the superimposed flight lines are used in the respective LTFM prediction (illustration identical with Fig. 2).

Fig. 12 .Fig. 13 .
Fig. 12. Left: MLFR of Bowen ratio between the first and the second flight pattern on every measurement day.The weight of each data point in the relationship is represented by the size of the circles.Right: time series of Bo for different land covers throughout the measurement campaign.In both images the error bars represent the Gaussian sum of the natural variability in each land cover class and the ensemble random error in the LTFM procedure.The land cover colour code and corresponding abbreviations are identical with Fig. 2.
, and are dominated by the effect 1) low level EC flux flights, (2) time-frequency analysis of the flux observations, (3) source area modelling of continuous biophysical surface properties, and (4) inferring ERFs from non-parametric machine learning.(i) The use of a weight-shift microlight aircraft with low airspeed and high climb rate enables low level flights at constant height even above topographically structured terrain.Masking out slopes during flight planning effectively minimizes cross-contamination of the flux observations by slope-induced effects on radiative transfer or turbulence generation.This reduces the degrees of freedom in explaining the observed flux responses, albeit potentially at the expense of oversimplifying surfaceair exchange processes.
r = −0.40).The wind speed at flight level decays from up to 8.3 m s −1 at the beginning down to 2.7 m s −1 towards the last quarter of the flight campaign.All wind sectors with the exception

www.biogeosciences.net/10/2193/2013/ Biogeosciences, 10, 2193-2217, 2013Table 1 .
Summary of the WSMA flights selected for analysis and related surface conditions.Shown are date, Chinese standard time (CST = coordinated universal time + 8), flight identifier (ID), length of each flight line l, repetitions rep, cumulated precipitation in a 10-day trailing window P , most frequently occurring land cover classes LC1-LC3, and the enhanced vegetation index EVI immediately below the flight lines.A legend with colour codes for LC and EVI is provided at the bottom.The LC colour code and corresponding abbreviations are identical with Fig.2.
* , H and LE, respectively) after Metzger et al. (2012); and (vii) calculation of random and systematic statistical errors after

Table 2 .
Mean length scales ±SD between repetitions during the WSMA flights selected for analysis.Shown are CBL depth z i , aerodynamic roughness length z 0 , flight altitude z, thermal blending height z TB1 , length scale of surface heterogeneity L H , Raupach length L R , the thermal blending lengths L TB1 and L TB2 , and the upwind distance from the WSMA L 80 % , where 80 % of the flux contributions are included in the flux footprint.

Table 3 .
Median land cover specific flux estimates of H and LE from the LTFM procedure over all flight patterns ± median spatial variability within the respective land cover.Also shown are the corresponding median ensemble random uncertainties σ ens (H ), σ ens (LE) and land cover specific sample size N.

Table 4 .
Median systematic-and random uncertainty terms (in parentheses) for a single flux observation or grid cell throughout the LTFM procedure.

Table B1 .
Mean meteorological conditions ±SD between repetitions during the WSMA flights selected for analysis.Shown are cloud cover CC, shortwave down-welling radiation S↓, mixing ratio MR, horizontal wind speed u, wind direction DIR, virtual potential temperature θ v , surface temperature T s , and the SD of the surface temperature σ T s .

Table B2 .
Mean turbulence statistics ±SD between repetitions during the WSMA flights selected for analysis.Shown are friction velocity u * , sensible heat flux H , latent heat flux LE, Monin-Obukhov length L, SD of vertical wind σ w , and convective velocity w * .