Uncertainty in gap filling and estimating the annual sum of carbon dioxide exchange for the desert Tugai forest, Ebinur Lake Basin, Northwest China

In most eddy covariance (EC) studies, carbon flux measurements have a high defect rate for a variety of reasons. Obtaining the annual sum of carbon dioxide exchange requires imputation of data gaps with high precision and accuracy. This study used five methods to fill the gaps in carbon flux data and estimate the total annual carbon dioxide exchange of the Tugai forest in the arid desert ecosystem of Ebinur Lake Basin, Northwest China. The Monte Carlo method was used to estimate the random error and bias caused by gap filling. The results revealed that (1) there was a seasonal difference in the friction velocity threshold of nighttime flux, with values in the growing season and non-growing season of 0.12 and 0.10 m/s, respectively; (2) the five gap-filling methods explained 77–84% of the data variability in the fluxes, and the random errors estimated by these methods were characterized by non-normality and leptokurtic heavy tail features, following the Laplacian (or double-exponential) distribution; (3) estimates of the annual sum of carbon dioxide exchange using the five methods at the study site in 2015 ranged from −178.25 to −155.21 g C m−2 year−1, indicating that the Tugai forest in the Ebinur Lake Basin is a net carbon sink. The standard deviation of the total annual carbon dioxide exchange sums estimated by the five different methods ranged from 3.15 to 19.08 g C m−2 year−1, with bias errors ranging from −13.69 to 14.05 g C m−2 year−1. This study provides a theoretical basis for the carbon dioxide exchange and carbon source/sink assessment of the Tugai forest in an arid desert ecosystem. In order to explore the functioning of the Tugai forest at this site, a greater understanding of the underlying ecological mechanisms is necessary.


INTRODUCTION
In recent decades, with the development of sonic anemometers, the eddy covariance method has become one of the direct techniques for determining the surface-atmosphere carbon dioxide exchange (Baldocchi, 2014). This method consists of calculating the covariance of fluctuations in vertical wind speed and gas concentration, providing valuable data on the dynamic interactions of the surface and atmosphere. At present, more than 800 EC stations around the world are monitoring the carbon dioxide and water exchange of the surface-atmosphere. These stations are located in forests, grasslands, farmlands and other terrestrial ecosystems, concentrated in the grasslands of North America, Europe and Africa, as well as in forest ecosystems (Baldocchi, Chu & Reichstein, 2018;Rebane et al., 2019). However, there have been few consecutive observations of material fluxes in arid desert ecosystems, leading to great difficulties in understanding the role of arid desert ecosystems in the global carbon cycle.
Eddy covariance is influenced by numerous factors during the measurement of carbon dioxide (CO 2 ) flux (e.g., limited turbulence, non-stationarity, blizzards, sand, precipitation and instrument resolution), usually resulting in data loss or abnormalities. One study found that missing or abnormal data in carbon flux observations account for approximately 20-60% of the total (Papale et al., 2006). This poses enormous difficulties for estimating the annual sum of carbon dioxide exchange. The important step in this estimation is gap filling. To solve this problem, a number of approaches have been proposed, including non-linear regression (Ruppert et al., 2006), look-up tables, mean diurnal variation methods (Falge et al., 2001), artificial neural networks (Braswell et al., 2005;Moffat et al., 2007;Papale & Valentini, 2003), MDS (Reichstein et al., 2005), data assimilation, and Bayesian model approaches (Gove & Hollinger, 2006). The differences among gap-filling methods for missing data and their impacts on the estimation of carbon dioxide exchange vary temporally and spatially in urban, forest, and grassland ecosystems (Kunwor et al., 2017;Liu et al., 2010;Moffat et al., 2007). Establishing a standard set of gap-filling methods can enhance comparability between different sites in regional and global flux networks (Falge et al., 2001). Models based on the light and temperature response are widely used for filling gaps in carbon flux observations of natural vegetation (Desai et al., 2005). Gap-filling approaches that rely on the average temperature underestimate flux values compared to those that preserve the observed variance as a consequence of Jensen's inequality, adding further bias to the long-term estimates of carbon dioxide exchange (Moffat et al., 2007). The introduction of additional predictors, such as vapor pressure deficit and soil moisture content, may reduce the bias that has been found to be proportional to the percentage of gaps filled in the annual sums of carbon dioxide exchange (Falge et al., 2001). Hagen et al. (2006) utilized two regression models to fill the gaps of carbon dioxide exchange in Howland Forest, Maine, USA, concluding that the greatest uncertainty comes from model differences. The Gaussian processes (GP) model and the radial basis function (RBF) neural network have been shown to significantly improve model performance by capturing complex relationships in carbon dioxide exchange studies (Menzer et al., 2013;Menzer et al., 2015;Schmidt, Wrzesinsky & Klemm, 2008). There have been few studies on gap filling and the annual sum of carbon dioxide exchange in arid desert ecosystems, and uncertainty still remains in the estimation and evaluation of the annual sum of carbon dioxide exchange (Soloway et al., 2017). The evaluation of gap-filling uncertainty plays an important role in estimating regional carbon sink strength. The scientific comparison and evaluation of carbon flux gap-filling and the carbon budget in arid desert ecosystems are conducive to the sharing of flux data and its related research.
Arid desert ecosystems are playing an increasingly important role in the global carbon cycle (Donohue et al., 2013) and may even supplant tropical rainforests, affecting the interannual variation in the global carbon cycle (Poulter et al., 2014). Recent findings that desert regions remove carbon dioxide from the atmosphere at a magnitude of 100 g C m −2 year −1 suggest that these systems may explain at least a portion of the terrestrial carbon sink that has not been fully identified nor had its mechanisms explained in the global carbon cycle (Jasoni, Smith & Arnone, 2005;Ma et al., 2014;Stone, 2008;Wohlfahrt, Fenstermaker & Arnone, 2008;Xie et al., 2009). Irrigated saline/alkaline arid land is a potentially large carbon sink in the terrestrial ecosystem (Li et al., 2015). The study of carbon flux has direct implications for the missing global carbon sink. The Ebinur Lake Basin is a typical salinized area (Wang et al., 2019b). The Tugai forest is the only natural forest community in the Ebinur Lake Basin, and its contribution to the carbon dioxide exchange of the region cannot be overlooked. Estimating the annual sum of the carbon dioxide exchange of the Tugai forest can provide insight into the issue of the missing carbon sink to a certain extent.
In this study, EC techniques were used to obtain continuous records of the carbon flux of the Tugai forest on the northern bank of the Aqikesu River in the Ebinur Lake Wetland National Nature Reserve, Northwest China. We evaluated the ability of five gap-filling methods to analyze random uncertainties and bias in order to provide more robust estimations of the annual sums of the carbon dioxide exchange from the Tugai forest. In addition, the impacts of these different gap-filling methods-the artificial neural network for backward propagation algorithm (ANN), the radial basis function neural network, the Gaussian processes model (GP), the regularized hierarchical linear model (HLM) with a Bayesian framework, and marginal distribution sampling (MDS)-on the annual carbon dioxide exchange in the Tugai forest were examined and the random uncertainties were studied.

Site description
The study site is located in the Ebinur Lake Wetland National Nature Reserve in the Bortala Mongolian Autonomous Prefecture of the Xinjiang Uygur Autonomous Region,[82][83]. The Xinjiang Ebinur Lake Wetland National Natural Protection Zone Management Bureau approved site access. With a temperate continental climate, the reserve located in the Alataw Pass (Fig. 1A) is a typical arid desert, featuring high temperatures, drought, and sufficient sunshine in summer, and dry and cold conditions in winter. With strong winds from the Alataw Mountain Pass, the loose and relatively thick Quaternary sediment provides a rich sand source for sand shifting, and the soil salinization is generally pronounced. Typical soils in the study area are gray desert soil, gray-brown desert soil, and aolian sandy soil. The azonal soil is dominated by saline soil, and supplemented by meadow soil, bog soil, and brown calcic soil (Jin et al., 2010;Wang et al., 2019a). The annual average temperature in the basin is 6-8 C, the annual precipitation is approximately 100 mm, and the annual average sunshine is 2,800 h. The ecological environment is relatively fragile and consists primarily of two ecological ecosystems: the wetland ecosystem and the arid grassland-forest ecosystem (Fig. 1B). The flux observation tower stands near the Aqikesu River (44 37′4.8′′N, 83 33′59.4′′E) in the forest-desert transitional zone influenced by lakes and rivers (Fig. 1E). The dominant species of plants in this region are Populus euphratica, Haloxylon ammodendron and Phragmites australis, with a combined coverage of more than 60%, accompanied by a variety of shrubs, herbs and desert-specific short-lived plants (He, Lv & Qin, 2014, He et al., 2015Wang et al., 2019). During the observation period, the annual average temperature at the flux tower was 9 C, the highest temperature was 43 C, and the lowest temperature was −26 C. More weather conditions are shown in Fig. 2.

Carbon flux data observation and processing
The data for this study were obtained from the Ebinur Lake Desert Ecosystem Flux Tower (2012XJ-AibiHu-OPEC) using an EC system consisting of a 3-D sonic anemometer (CSAT3; Campbell Scientific, Logan, UT, USA) and an open-path infrared gas analyzer (EC150; Campbell Scientific, Logan, UT, USA). The instruments were installed on the southwest side of the tower above the forest canopy at a height of 15.0 m (~4.0 m above  (Kormann & Meixner, 2001). The measurement tower is marked with a red circle. The white lines represent isopleths of the average contributions.
Only high-quality data were selected to establish the model, and 5,784 (approximately 33.01%) of the data observations with a rank other than zero were excluded. Based on changes in surface vegetation, the year can be divided into the growing season (May-September) (Munir et al., 2015) and the non-growing season (January-April and October-December). This study divided the data into daytime and nighttime periods based on the sunrise and sunset times at the location of the flux tower. The observation time of the daytime data extended from sunrise to sunset, and the observation time of the nighttime data extended from sunset to sunrise. There were 8,924 daytime observations, accounting for 50.94% of the total and 8,596 nighttime observations, accounting for 49.06%. The total count of null and low-quality observations was 6,735, accounting for 38.44%. Thus, there was a large amount of missing data, and data gap-filling faced enormous challenges.

Net ecosystem carbon dioxide exchange
The net ecosystem carbon dioxide exchange (NEE) between the ecosystem and the atmosphere can be estimated as the sum of the vertical EC and the storage term: The low density of the Tugai forest canopies, which is favorable to eddy penetration, can reduce the importance and frequency of storage events. The lower the vegetation height, the less storage of CO 2 . In addition, the effect of the storage terms on the carbon uptake characteristics over long-term time scales is not significant since the cumulative value of the storage terms in the long-term is approximately equal to 0 (Massman & Lee, 2002).
In this study, the storage terms were considered to be negligible compared with the terms associated with EC fluctuation.

Night friction velocity threshold
The friction velocity u Ã is intended to capture the turbulence characteristics in the vicinity of the surface. In a general way, u Ã can be expressed as a fraction of the average wind speed u: where the parameter Π is the roughness index. We used standardized major axis (SMA) regression to compare the roughness difference between the growing season and the non-growing season (Warton et al., 2012). Due to the relatively stable atmospheric stratification at night, the carbon flux observed using EC techniques cannot truly represent the surface-atmosphere carbon dioxide exchange. Nighttime data processing was an important step in the flux data processing. The uncertainty of the annual sum of ecosystem carbon exchange caused by different treatment methods can reach 150 g C m −2 year −1 (Williams et al., 2009). The bias due to the data processing methods in this study area, as well as the estimated bias from the gap-filling techniques, may be greater. To correct selective systematic errors, the international flux community usually excludes nighttime observations below the friction velocity threshold in order to ensure that EC techniques are measured under strong turbulence conditions (Aubinet & Feigenwinter, 2010). Currently, despite theoretical and practical deficiencies, the empirical and effective method for processing nighttime negative flux is filtered by the friction velocity threshold (Aubinet & Feigenwinter, 2010). The friction velocity threshold can be determined by testing the relationship between the flux and the friction velocity, since, in theory, carbon dioxide flux and friction velocity are independent. When the friction velocity is lower than this threshold, F C decreases as the friction velocity decreases. The nighttime flux data of the growing and non-growing seasons were sorted based on the increment of u Ã , and the friction velocity thresholds for the growing and non-growing seasons were then determined. A total of 45 levels were set, and the thresholds were derived by statistically comparing (using the two-sample Wilcoxon test) the flux average at each u Ã level with the flux average at the next higher u Ã level.

Detection of abnormal data points
The technique of using EC to observe carbon flux is susceptible to environmental factors such as rapid changes in turbulence, as well as the instrumentation itself. The 30 min data obtained from processing the observed raw data contained a large number of gaps and abnormal values, which deviated from the range of normal data variation (Li et al., 2005). Given the relatively clear daily variation of the carbon flux data, this study divided the carbon flux data into 48 parts according to the time of day and defined any carbon flux data beyond the upper and lower limits as abnormal points. For outliers in the continuous data, this study used 7 days as the window to determine the difference between the median of three consecutive data points (D i ). The specific algorithm (Papale et al., 2006) was as follows: where F C is the carbon dioxide flux, and i is the ordinal number of the observation data for 30 min. When D i satisfies the relationship the corresponding ith observation data point is normal; otherwise, it is defined as an "abnormal point" and is eliminated (Daytime and nighttime data were processed separately). Md is the median of the F C differences between all sets of two adjacent points. MAD = median (|D i − Md|) and is the median of all |D i − Md| for a 7-day window. In this study, the sensitivity coefficient z of the artificially defined outlier identification was set to 5.5.

Gap-filling methods
Given the actual conditions of a site, selecting the appropriate flux data processing method and reducing the uncertainty of the flux data estimation are important issues that need to be understood and analyzed by the flux community. The effectiveness of different methods for filling gaps in carbon dioxide flux (F C ) data has been investigated. This study used ANN, RBF, GP, HLM and MDS to fill the gaps. The high-quality data were randomly divided into two parts, 80% of which were used to train the model and 20% to test the model. In order to avoid overfitting, four different methods (ANN, GP, HLM and RBF) were used for model training before the best model for each method was selected and used for prediction. In the data analysis, the neuralnet, RSNNS, laGP and INLA packages (R language) were used to train and test ANN, RBF, GP and HLM, respectively. The gap-filling model was evaluated in terms of three statistical factors: the coefficient of determination (R 2 ), root mean square error (RMSE), and bias error (BE).

ANN and RBF
For ANN training, the input data were passed through the nodes using weighted connections, the error between the predicted and actual output values was calculated by backpropagation, and the weight was adjusted to minimize the error (Braswell et al., 2005;Papale & Valentini, 2003). The training data were pre-sampled to ensure equal coverage under different conditions, and fuzzy values were used to represent additional information such as time (Moffat et al., 2007;Papale & Valentini, 2003). By testing the structure of the ANN, the network was built to be as simple as possible, and overfitting was avoided.
During the model training process, with the exception of the input and output layers, only one hidden layer was needed to obtain sufficient results; additional hidden layers did not improve model performance. In the output layer, a linear transfer function was used. Both the ANN and RBF were built using the input variables listed in Table 1, as well as the 30 min data points of F C , with the same parameters presented by Schmidt, Wrzesinsky & Klemm (2008). The number of neurons in the ANN hidden layer was 5, and the number of neurons in the RBF hidden layer was 181.

Regularized hierarchical linear regression model with Bayesian framework
Since signals may be less pronounced than random errors, overfitting is a potential problem in regression models in which parameters estimate noise rather than the underlying signal. In earlier works, hierarchical modeling produced more accurate predictions than regular regression and also tended to reduce overfitting in the presence of a hierarchical data structure (Gelman, 2006). There have been some detailed examples highlighting uses of Bayesian and hierarchical Bayesian methods in plant physiology and ecosystem ecology (Ogle & Barber, 2008). Hierarchical modeling has been proposed to predict monthly streamflow (Lima & Lall, 2010) and extreme flooding (Lima et al., 2016). This study used varying intercepts of hierarchical modeling for the carbon flux data with a temporally hierarchical structure. The form of the model was as follows: where y ij is F C at time j within Julian Day i, and x ij is a vector for the environmental measurement (Table 1) corresponding to F C . The hyperparameters of the prior distribution of a j are g j and s 2 a j . In previous studies, the prior ridge has been chosen for shrinking coefficients due to its weaker penalization of large coefficients (Fahrmeir, Kneib & Konrath, 2010). Therefore, we assumed that bjs 2 b $ N 0; s 2 b I .

Gaussian processes
As nonlinear regression tools, GP have nonparametric flexibility and can be used to fit arbitrary functions or surfaces (Azmy et al., 2009). Therefore, the Gaussian process is often used to replace the supervised neural network in nonlinear regression. The GP can be described by a mean function m(x) (a vector) and a covariance function k(x, x 0 ) (a matrix): where f x ð Þ is a random variable. In this study, f x ð Þ was the modeled F C . This assumes that the mean function m(x) is equal to 0 since our data were standard normalized, and k x; x 0 ð Þ is a function of the Euclidean distance between the predictors x and x′. The squared exponential covariance function was chosen with automatic relevance determination (Bishop, 2006): where s 2 f is a hyperparameter for the amplitude variance. The width hyperparameter s 2 i is separate for each input variable x i .

Characteristics of random uncertainty
The correlation between the predicted carbon flux and its corresponding residual is represented by Kendall's tau (τ) coefficient. If τ is close to 0, the posterior residual does not contain structural error from the model. Conversely, if τ is close to 1, the predicted value should contain the model structure error of the influence, and such interpretation information may be passed to the posterior residual (Wang, Riley & Collins, 2015).
The relationship between the standard deviation of the random uncertainty of carbon flux observations and the magnitude of the observed values is commonly expressed as follows: The intercept term a varies from site to site, with a typical range of 0.9-3.5 mmol m −2 s −1 (Richardson et al., 2008). The larger the value of a, the greater the uncertainty. The range of the slope b is usually 0.1-0.2; the smaller the value of b, the closer the probability distribution of the uncertainty to white noise.

Uncertainty in the cumulative carbon dioxide exchange
To determine the random error of carbon flux and the error caused by interpolation, this study used the bootstrap method to estimate the deviation of the annual sum of ecosystem carbon exchange and its 95% confidence interval. Estimating the RMSE, variance, and bias of the annual sum of carbon exchange required a complete carbon flux time series (1,000 bootstrap sequences). The variance of the total annual carbon exchange can be directly estimated, but due to vacancies in the carbon flux data, the deviation of the total annual carbon exchange cannot be directly estimated. To estimate the (potential) bias in carbon flux vacancy data, we used the bootstrap method to simulate and analyze the bias of high-quality modeling data in order to extend it to an annual scale Wang, Riley & Collins, 2015). The estimation steps used to determine the bias of the annual sum of carbon exchange were as follows: 1. The vectorê, which is the estimation of the random error e, was estimated by subtracting the observed F C from the F C estimated by the models.
2. All data (observed F C with gaps as well as modeled F C ) were divided in terms of the growing season, daytime, non-growing season, and nighttime. The daytime data of the growing season were divided according to the incident radiation intensity (limited to 400 W·m −2 ), resulting in 6 parts.
3.ê was repeatedly sampled in order to obtain a new complete bias sequence (a total of 17,520 repeated extractions per year).

4.
A cumulated summation of the new complete bias sequence was calculated.
6. The empirical distribution of the 1,000 biases of the annual sum of ecosystem carbon exchange, along with the 95% confidence interval of the annual sum of carbon exchange, were estimated.

Carbon dioxide flux data characteristics
To model the high-quality carbon flux data, the 30 min carbon flux data with high signal quality were screened. It can be seen that the flux observation points exhibit more obvious carbon uptake characteristics during the growing season (Fig. 3). To extract the nighttime flux error, the nighttime friction velocity thresholds for the growing and non-growing seasons were estimated, and the data on friction velocities less than the threshold were screened and eliminated. This amounted to approximately 2.4% of the data (415 data points). After processing for outliers and nighttime friction velocity threshold screening, approximately 35.9% (or 6,293 observations) of the high-quality data remained, which was then used to build and test the model (Table 2). Among these measurements, there were 2,892 valid daytime observations (~16.5%) and 1,032 valid nighttime observations during the growing season (~5.9%). There were 1,583 valid daytime observations (~9%) and 786 (~4.5%) valid nighttime observations during the non-growing season.
Within the boxes, the horizontal bars indicate medians, while the tops and bottoms of the boxes illustrate the 75th and 25th quartiles, respectively. Small circles represent outliers in the observations. This study defined the length of time in which data were continuously missing as the missing time window. A histogram plot was used to display the distribution of the missing time window (Fig. 4). The average length of the missing time window for the flux data was 5.8 h; a total of 24 missing time windows were longer than 24 h, mainly concentrated in the non-growing season; the longest missing time window was 30 days (Fig. 5). It can be seen that wind speed and friction velocity u Ã differed significantly between the growing and non-growing seasons (Fig. 6). The details of the relationship between u Ã and wind speed using SMA regression are presented in Table 3; Fig. 6C. The slopes of the SMA regression for the growing and non-growing seasons were not equal, indicating that the roughness index varied considerably with season. Red, orange, and gray represent Qc = 0, Qc = 1 and Qc = 2, respectively. The nighttime friction velocity threshold during the growing season was 0.12 m/s, and the corresponding value for the non-growing season was 0.10 m/s. Detailed information concerning the quality filtering procedures is listed in Table 2. A total of 42 negative F C values were observed at night in this study, accounting for 1.9% of the nighttime data. After removal by the friction velocity threshold and other quality filtering procedures, the number of nighttime negative values was reduced to 16 (~0.7%).

Results and residual analysis of gap-filling methods
The results of the ANN, GP, HLM, RB, and MDS methods are illustrated in Fig. 7. The test set R 2 obtained by these different methods ranged from 0.77 to 0.84 and the RMSE ranged from 2.04 to 2.50 mmol m −2 s −1 . There were differences in the R 2 and RMSE obtained by these methods. The GP and MDS methods represented the full range of scatter in the observations, while the others all exhibited the upper and lower limits (Fig. 7). The training set fit curve overlapped with the test set fit curve, and they were equally close to the 1:1 line, indicating that there was no overfitting phenomenon in these methods. The gap-filled F C for the five methods exhibited a relatively obvious diurnal variation (Fig. 8).
When model error is quantified, the model residuals can quantify the random uncertainty. In this study, it was assumed that model error was negligible. Thus, the model residuals could be attributed almost entirely to random error. The empirical distribution of the model residuals and the fitted curves ( Fig. 9) can describe the uncertainty of the random error and largely reflects the nature of the carbon flux observations. The model residual distributions obtained by the five methods exhibited characteristics of sharp peaks and thick tails, indicating non-normality. Comparing the kurtosis for the empirical distributions of the residuals obtained by these methods (Fig. 9), it was found that GP > MDS > ANN > HLM > RBF. Additionally, the empirical distribution of these residuals was discovered to approximate the Laplace (or double-exponential) distribution by the Wilcoxon rank sum test (p > 0.05).
Comparing the correlations between the carbon flux observations and the corresponding residuals obtained by the five different methods (Table 4), it was found that during the day in the non-growing season, Rd ≤ 400 W m −2 showed a significant weak positive correlation; the same was true at night in the non-growing season. Additionally, it was shown that the random uncertainty during the non-growing season increased with increasing carbon flux. The correlations between the flux prediction values and the residuals obtained by the five methods differed during the growing season. In the growing season, the GP and MDS methods each exhibited a significant positive correlation, while the other methods showed either a relatively weak correlation (τ close to 0) or no significant correlation.
The standard deviation of the random flux measurement error varied linearly with the magnitude of the F C . Results are summarized in Table 5. The range of the intercept term a of the flux observation station in this study was 1.3209-1.5550 mmol m −2 s −1 , which is within the typical range. The largest values came from the MDS method; the smallest from the GP. The uncertainties of the carbon flux gap-filling from these five methods were ranked as follows: GP < ANN < RBF < HLM < MDS. The slope b ranged from 0.0579 to 0.1092, which was slightly less than the typical range. The smallest slope came from the HLM; the largest from the MDS. The posterior residual of the carbon flux gap-filling from the HLM method was closer to white noise than that from the MDS method. The carbon flux data gaps were filled by the different methods. The probability distributions of the measured random errors varied with changes in the

Annual sum of carbon dioxide exchange
Utilizing the gap-filling from the five methods, the complete carbon flux data were obtained. As shown in Table 6, the gap-filled annual F C sums from MC simulations were estimated to be −159.91 ± 18.35 g C m −2 year −1 for ANN, −178.25 ± 8.01 g C m −2 year −1 for GP, −167.16 ± 19.08 g C m −2 year −1 for RBF, −156.95 ± 3.15 g C m −2 year −1 for HLM, and −155.21 ± 5.16 g C m −2 year −1 for MDS. The CO 2 exchange sums from MC simulations for the growing and non-growing seasons are listed in Table 7. Given that five different gap-filling models were used, it was definitely necessary to correct the model error. Based on the chosen fusion method (Kuncheva, 2002), it was assumed that the mean of the five methods was close to the true value of the annual sum of carbon dioxide exchange. The annual sum of CO 2 exchange in 2015 at the observed site was estimated to be −163.50 ± 9.43 g C m −2 year −1 , indicating that the desert Tugai forest in the Ebinur Lake Basin is a net carbon sink on an annual basis. The averages of the annual sum of CO 2 exchange obtained by the five methods ranged from −178.25 to −155.21 g C m −2 year −1 , and the standard deviation ranged from 3.15 to 19.08 g C m −2 year −1 ( Table 6). The smallest (in terms of absolute value) came from the MDS method and the largest from the RBF. The average bias of the annual sum of ecosystem carbon exchange for the five methods ranged from −13.69 to 14.05 g C m −2 year −1 . Table 5 Relationship between the magnitude of carbon flux and the standard deviation of random uncertainty.

Method
Intercept (  Notes: * Indicates 0.05 < P-value < 0.1. ** Indicates 0.01 < P-value < 0.05. *** Indicates p-value < 0.01. The HLM and MDS tended to underestimate, while the other methods tended to overestimate ( Table 6). The smallest mean bias of the annual sum of carbon exchange came from the ANN; the largest was from the RBF. The GP had the smallest 95% confidence interval for the bias of the annual sum of carbon dioxide exchange, while the HLM had the largest. Comparing the carbon accumulation losses of ecosystems estimated by the different methods (Fig. 10), it was found that these five methods exhibited a consistent performance during the growing season and large differences during the non-growing season. The data sources were composed of three parts, which were high-quality modeling data, data used to calculate annual sum of carbon dioxide exchange, and predictions from the different gap-filling methods.

DISCUSSION
Differences in annual sums of CO 2 exchange among gap-filling methods for the Tugai forest The annual sum of carbon dioxide exchange from the Tugai forest in our study was estimated to range from −178.25 to −155.21 g C m −2 year −1 . In a previous study, a total  annual carbon exchange ranging from −155 to −55 g C m −2 year −1 for oak/grass savanna from 2000 to 2006 was obtained (Ma et al., 2007), slightly less than the estimated value in this investigation. For comparison, Wohlfahrt, Fenstermaker & Arnone (2008) reported a net sink from 102 to 110 g C m −2 year −1 in the Mojave desert ecosystem. For a desert shrub community in Baja California/Mexico, the total annual carbon dioxide exchange was determined to range from −52 to −39 g C m −2 year −1 (Hastings, Oechel & Muhlia-Melo, 2005). In a riparian forest with similar mean annual temperature and precipitation, Ma et al. (2017) reported a net carbon sink from 278 g C m −2 year −1 to 427 g C m −2 year −1 . Thus, the carbon sink capacity of the Tugai forest in the Ebinur Lake Basin is higher than that of the savanna in the Mojave Desert ecosystem of California and the desert shrub in Baja California/Mexico, while lower than the riparian forest. The performances of the gap-filling methods in this study were comparatively good. It is difficult to determine the best applicability of the gap-filling methods in the desert Tugai forest via a single statistical parameter. Therefore, this study comprehensively evaluated the gap-filling models using R 2 , RMSE, and BE. It was found that the test set R 2 obtained by the ANN method was 0.827. The test set R 2 obtained by Menzer et al. (2015) using the ANN method was 0.740, which was 8% lower than this study. The R 2 value obtained by the GP method was 0.844, which was 6% higher than that of Menzer et al. (2015); R 2 obtained by the RBF method was 0.790, which was 39% higher than that of Järvi et al. (2012) and 3% higher than that of Menzer et al. (2015). The RMSE of the test set obtained by the five methods ranged from 2.04 to 2.50 mmol m −2 s −1 , which was smaller than that of Järvi et al. (2012) and Menzer et al. (2015). This may be primarily due to the fact that the estimations by Järvi et al. (2012) and Menzer et al. (2015) were conducted on urban ecosystems in which the underlying surface of the observation sites was rougher, and the environmental factors affecting carbon dioxide exchange were more complex. Although the EC method requires that the underlying surface be uniform and flat, the position of the terrain is not absolutely guaranteed in the actual selection process of the underlying surface. In addition, when screening the modeling data, data with a signal quality of two were excluded (Järvi et al., 2012). In the current study, however, all data with a signal quality other than 0 (including 1 and 2) were excluded, resulting in high modeling data quality. It is clear that the selection of modeling data is particularly important when training models.
There were minor differences among the standard deviations of the annual sum of carbon dioxide exchange obtained by the five methods. Due to the low annual sum of arid desert ecosystem carbon dioxide exchange, its estimation is sensitive to data processing methods, with some methods possibly leading to bias (Liu et al., 2010). In addition, the soil volume water content data were distorted as a result of the high degree of soil salinization. Therefore, this study ignored the impact of soil moisture on carbon dioxide flux, leading to a very large difference in the annual sum of carbon dioxide exchange obtained by the five methods. It should be noted that this study was limited to the seasonal signal from only a single year's worth of data at the site. Future studies should utilize a longer dataset, which would provide more insight into the ecological functioning of the site.
The BE of the annual sum of carbon dioxide exchange obtained by the five methods ranged from −13.69 to 14.05 g C m −2 year −1 . The minor differences found in the BE may be related to the fact that the carbon source/sink function of the Tugai forest is more affected by environmental factors than that of other ecosystems, especially soil moisture content. The impact of data processing methods on the estimation of the annual sum of carbon dioxide exchange was discovered to be highly variable, which may be related to the sensitivity of arid desert ecosystems to environmental factors, as well as the definition of valid data.

Carbon dioxide exchange during nighttime and non-growing season daytime
This study used a quantile-based method to determine the outliers, and also identified and deleted the anomalies in the F C time series. After careful screening during data preprocessing, a total of 42 negative values of carbon dioxide exchange at night were observed, accounting for 1.9% of the nighttime data, which was far less than the 28% of temperate broadleaf forest observations (Thomas et al., 2011). The negative carbon flux observations indicate that there was carbon uptake in the underlying vegetation of the flux tower at night, although this phenomenon was unlikely (Thomas et al., 2011). There is currently no unified view to explain this phenomenon, ignoring atmospheric advection and precipitation as possible causes. Moreover, there is also a systematic error that stems from ignoring the atmospheric advection and precipitation measurements. Even though advection correction is attractive from a theoretical point of view, it is unlikely that air advection characteristics varied over the observed period and sites (Feigenwinter et al., 2008).
The accumulation of carbon dioxide exchange obtained by the five different methods performed consistently during the growing season and exhibited very large differences during the non-growing season (Fig. 10). During the growing season, the carbon flux data were relatively complete, with no long gaps; the carbon flux data during the non-growing season, however, had more gaps, and the continuously missing time windows were long (Fig. 5), leading to very large differences in method performance. The five methods displayed obvious carbon uptake characteristics in February (Fig. 10), which was consistent with the daily variation characteristics of the selected high-quality modeling data from January and February (Fig. 3A). This result may have been influenced by ignoring the advection and storage terms, which produced a certain systematic error (Bowling et al., 2010). In addition, the high saline alkali water and soil played an important role in the carbon uptake (Li et al., 2015;Ma et al., 2013;Xie et al., 2009).
To some extent, the valid data observed at the site depend on independent processes influenced by the geographic environment of the observatory. The resulting uncertainty is primarily random, depending on both the missing time window and the gap-filling method. The friction velocity u Ã is intended to capture the turbulent characteristics in the vicinity of the surface. Nighttime data under unsteady conditions are often removed by u Ã threshold screening. Each site has a specific u Ã threshold that is expected to vary over time (Béziat, Ceschia & Dedieu, 2009;Moureaux et al., 2008). The nighttime friction velocity threshold (0.12 m/s) during the growing season was greater than the corresponding value for the non-growing season (0.10 m/s), indicating that the flux observation station experienced different air turbulence characteristics during different seasons (Fig. 6). A possible reason for this is simply that the air temperature during the growing season is higher than during the non-growing season. According to the turbulence intensity theorem, the turbulence intensity is a function of both velocity and temperature shears (Hu, Chen & Zuo, 2007). The near-surface temperature changes more rapidly during the growing season night than during the non-growing season night. Thus, in the near-surface layer, the temperature shear formed during the growing season night is larger than that formed during the non-growing season night. In addition, due to vegetation growth, the underlying surface roughness of the observatory is higher during the non-growing season than during the growing season (Fig. 6C). Uncertainty may still exist after u Ã threshold screening. There is no guarantee that the data screening process will eliminate all bad data, nor that only bad data will be removed.

Uncertainty analysis
All measurements have errors or uncertainties (Taylor, 1997), and measurement uncertainty includes random and systematic uncertainties (ISO/IEC 2008). Systematic error is considered to be a constant but unknown deviation (Abernethy, Benedict & Dowdell, 1985) and must be estimated by empirical, theoretical, or supplementary measurements. Systematic error cannot be determined by statistical analysis of the measured data itself, nor can it be reduced by averaging. The bootstrapping method has been used to quantify the random uncertainty of the flux integral at different time scales (Wang, Riley & Collins, 2015). In addition, the Monte Carlo method is used to estimate the transmission of uncertainty over longer time scales, particularly the uncertainty associated with gap filling (Moffat et al., 2007).
Comparing the change in the empirical distribution of the residuals obtained by the five gap-filling methods (Fig. 9), it was discovered that the empirical distribution of these residuals approximated the Laplace (or double-exponential) distribution. Actually, the distribution of the F C random error from the Tugai forest (|F C | < 10 mmol m −2 s −1 ) is leptokurtic. As a characteristic of random uncertainty, the leptokurtic distribution has been shown to be robust across a variety of sites and ecosystem types (Hollinger & Richardson, 2005;Richardson et al., 2008;Stauch, Jarvis & Schulz, 2008;Lasslop et al., 2008;Liu et al., 2009;Wang, Riley & Collins, 2015). It has been suggested that the leptokurtic distribution is the result of a superposition of Gaussian distributions and a non-constant variance (Hollinger et al., 2004;Lasslop et al., 2008). After normalizing the error (i.e., dividing each flux observation by the expected standard deviation), its distribution approximated a Gaussian distribution (Richardson et al., 2008).
We also found that the random uncertainty increased with increasing carbon flux (Table 5). This was consistent with the findings of Richardson et al. (2008) for the uncertainty of carbon flux measurements from model residuals. Using different methods to fill gaps leads to the differences among the distributions of the measured random errors and affects the estimation of the annual sum of carbon dioxide exchanges. Considering the differences in site characteristics and types of ecosystems, as well as the degree to which the valid flux data are contaminated with data from a separate (nonbiological or atmospheric) process, it is necessary to evaluate random uncertainty carefully and site specifically. This study may provide a valid method for evaluating the gap-filling uncertainty of other types of ecosystems.

CONCLUSIONS
This study estimated the annual sum of carbon dioxide exchange of the Tugai forest in the Ebinur Lake Basin of Northwest China in 2015. Selective system errors were corrected by determining the friction velocity threshold and utilizing it to extract high-quality data on the growing and non-growing seasons. Five different methods were used to fill the gaps in the carbon flux time series. These methods all exhibited higher R 2 and lower RMSE values than previous techniques. The GP method was the best of these gap-filling techniques. Uncertainty was inferred from the model residuals; the distribution of the residuals exhibited non-normal properties of spikes and thick tails. The ultimate goal of this investigation was not to select the best gap-filling method by comparing the merits of each but to estimate the uncertainty caused by the gap-filling and to correct model error. The total annual carbon exchange of the Tugai forest at the observation site in 2015 was −163.50 ± 9.43 g C m −2 year −1 , indicating that the Tugai forest in the Ebinur Lake Basin is a carbon sink. The errors from different sources were complex, making it difficult to give accurate estimates; therefore, this study only estimated the random errors and bias. Estimating errors from different sources will be the focus of uncertainty studies on the cumulants of carbon dioxide exchange in the future. Exploring the function of the Tugai forest in arid regions still requires additional understanding of the underlying ecological mechanisms.