Carbon flux estimates are sensitive to data source: a comparison of field and lab temperature sensitivity data

A large literature exists on mechanisms driving soil production of the greenhouse gases CO2 and CH4. Although it is common knowledge that measurements obtained through field studies vs. laboratory incubations can diverge because of the vastly different conditions of these environments, few studies have systematically examined these patterns. These data are used to parameterize and benchmark ecosystem- to global-scale models, which are then susceptible to the biases of the source data. Here, we examine how greenhouse gas measurements may be influenced by whether the measurement/incubation was conducted in the field vs. laboratory, focusing on CO2 and CH4 measurements. We use Q 10 of greenhouse gas flux (temperature sensitivity) for our analyses because this metric is commonly used in biological and Earth system sciences and is an important parameter in many modeling frameworks. We predicted that laboratory measurements would be less variable, but also less representative of true field conditions. However, there was greater variability in the Q 10 values calculated from lab-based measurements of CO2 fluxes, because lab experiments explore extremes rarely seen in situ, and reflect the physical and chemical disturbances occurring during sampling, transport, and incubation. Overall, respiration Q 10 values were significantly greater in laboratory incubations (mean = 4.19) than field measurements (mean = 3.05), with strong influences of incubation temperature and climate region/biome. However, this was in part because field measurements typically represent total respiration (Rs), whereas lab incubations typically represent heterotrophic respiration (Rh), making direct comparisons difficult to interpret. 
Focusing only on Rh-derived Q 10, these values showed almost identical distributions across laboratory (n = 1110) and field (n = 581) experiments, providing strong support for using the former as an experimental proxy for the latter, although we caution that geographic biases in the extant data make this conclusion tentative. Due to a smaller sample size of CH4 Q 10 data, we were unable to perform a comparable robust analysis, but we expect similar interactions with soil temperature, moisture, and environmental/climatic variables. Our results here suggest the need for more concerted efforts to document and standardize these data, including sample and site metadata.


Introduction
Understanding the mechanisms that drive greenhouse gas (e.g. CO 2 and CH 4 ) production depends on accurate measurements of the production of these gases. With the current trajectory of our changing climate, including rising temperatures and increasing precipitation fluctuations, we can expect accelerated CO 2 and CH 4 flux from environmental systems, and it is important to understand mechanisms that drive these processes in order to build more robust predictive models. Such improvements are crucial, as the degree to which greenhouse gases from land ecosystems will feed back to the climate remains one of the least certain aspects of Earth System Models (ESMs) (Friedlingstein et al 2014).
Both field and lab experiments are used to understand temperature-driven changes in soil C and to parameterize the models seeking to predict those changes. Field experiments provide an integrated site-level understanding of biogeochemical transformations and, because of their similar scale to eddy covariance and remote sensing products, the potential to scale these processes and fluxes regionally and globally. However, their inherent complexity can make understanding mechanistic causality difficult; field measurements are often subject to low signal-to-noise ratios due to environmental and climatic fluctuations, or multiple interacting drivers that are difficult to unravel and control.
In contrast, laboratory soil studies occur in tightly controlled environments and are almost entirely experimental, rather than observational, which allows for clearer mechanistic understanding. However, laboratory experimental conditions may not accurately reflect in situ temperature and moisture variations (Kirschbaum 1995). In addition, and perhaps more importantly, sampling separates the soils from the pedosphere and thus laboratory incubations are inherently artificial, excluding the effect of roots, litter, and soil fauna, as well as processes such as nitrogen uptake and leaching (Williams et al 1998, Risk et al 2008). Sampling also introduces physical disturbances (such as cutting of roots and disrupting fungal hyphae) that alter the biological and biochemical conditions at the pore to core scale. As a consequence, laboratory incubations typically allow us to measure only heterotrophic respiration, Rh, whereas field experiments typically give us a measure of total soil respiration, Rs (i.e. autotrophic + heterotrophic), resulting in a further mismatch when directly comparing field and lab measurements of respiration (Bond-Lamberty et al 2004, Subke et al 2006). A troubling consequence is that measured variables and thus model parameterizations tend to systematically differ between these two approaches.
A complicating factor, but also a powerful potential way to examine these differences, is that models representing field and lab conditions tend to have different goals and structures. The predictive models that emerge from (and are needed by) field studies are generally simpler in structure, and tend to focus on larger-scale dynamics and processes (Manzoni and Porporato 2009). In contrast, the predictive models that emerge from lab studies tend to be more mechanistic and limited in temporal and spatial scale. They are usually designed to serve very specific problems with explicit system simplifications, which may not be widely applicable, and their necessary parameters may not be measurable at a larger scale. Field-scale models, however, are also often parameterized using results from lab studies, resulting in large uncertainties in their predictive power. A typical example is modeling soil heterotrophic respiration processes in ESMs. Most land models in ESMs employ various empirical functions to represent the impacts of temperature and moisture changes on respiration rates. These empirical functions are mostly derived from lab-based experiments (e.g. Moyano et al 2012, Sierra et al 2015), but have been frequently used to simulate field processes at regional or global scales. Model intercomparisons have shown large disagreement in simulated soil carbon dynamics (Wieder et al 2018), partly due to the variations in the functional format of temperature and moisture responses derived from lab experiments.
We propose that quantifying and explaining the gap between lab and field observations will reduce model uncertainties and provide a more systemic understanding of biogeochemical cycling, including how soils interact with temperature, moisture, and C inputs to drive transformations and fluxes in different ecosystems. Here, we specifically examine how greenhouse gas measurements may be influenced by whether the measurement/incubation was conducted in the field vs. laboratory, focusing on CO 2 and CH 4 measurements. We use Q 10 of greenhouse gas flux (temperature sensitivity) for our analyses, because of the ubiquity of this metric in biological and Earth system sciences and its importance to many modeling frameworks. Reported Q 10 values differ greatly between laboratory incubations (e.g. 1.6-2.7, Chen Soil respiration and its Q 10 have been heavily studied for the last few decades, and numerous studies have identified key environmental and edaphic controls on the temperature sensitivity of soil respiration, including soil temperature and moisture (Kirschbaum 1995, Janssens and Pilegaard 2003, Carey et al 2016, Meyer et al 2018, texture/clay content (Zhang et  We offer a unique, quantitative perspective of experimental biases introduced by incubation environmental conditions. This analysis of gas flux measurements at different scales will provide an opportunity to systematically understand the factors driving divergence of field and lab results.

Data in published papers
The studies included in this analysis were identified by searching the Web of Science and Google Scholar databases until December 2021. The search terms used were ('CO 2' OR 'carbon dioxide' OR 'respiration' OR 'CH 4' OR 'methane') AND 'soil' AND 'Q 10'. We only included studies that reported Q 10 values. Some studies were syntheses/meta-analyses (e.g. Kirschbaum 1995, Hamdi et al 2013, Chen et al 2020), and we also used these syntheses to identify additional sources of Q 10 data. We recorded the Q 10 values, incubation temperatures/temperature ranges, site locations (latitude, longitude), and any experimental manipulations/treatments. We included only unmanipulated samples/controls in our analysis to avoid confounding effects of nutrient or substrate amendments, warming, burning, etc.

Published respiration databases
In addition, we used data from publicly available (open access) soil respiration databases. The Soil Respiration Database (SRDB-V5) is a near-universal database of globally published field respiration measurements, particularly seasonal-to-annual respiration fluxes (Bond-Lamberty and Thomson 2010a, Jian et al 2020). This database includes 572 studies that reported Q 10 values, which were screened for studies that were in unmanipulated, natural (non-managed, including agricultural) ecosystems; these were then used directly in our current analysis without any further data manipulation, across all soil depths. The Soil Incubation Database (SIDb) is an open database containing time-series respiration measurements from 16 laboratory experiments (Schädel et al 2020, Sierra et al 2020). We extracted respiration flux data from this database and calculated Q 10 using the exponential equation (Cui et al 2020):
$$R_T = a\,e^{bT}, \qquad Q_{10} = \frac{R_{T+10}}{R_T} = e^{10b},$$
where a and b are the fitted parameters for the model, R T is the soil respiration rate at temperature T (Celsius), and R T+10 is the soil respiration rate at temperature T + 10. A number of functions have conventionally been used to calculate Q 10 of soil respiration, with different parameters (Cui et al 2020); we chose the exponential model because it was the most widely used function in the SRDB.
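As an illustration of the exponential-model calculation described above, the following minimal sketch fits R_T = a·exp(bT) by ordinary least squares on log-transformed rates and converts the slope to Q 10 = exp(10b). The log-linear fitting approach, the function names, and the temperature/rate values are assumptions made for illustration; the fitting procedure actually applied to the SIDb time series is not specified in the text.

```python
import math


def fit_exponential(temps_c, rates):
    """Fit R_T = a * exp(b * T) by least squares on ln(R) = ln(a) + b * T.

    Assumption: a log-linear fit is used here for simplicity; a nonlinear
    fit on untransformed rates would weight the observations differently.
    """
    logs = [math.log(r) for r in rates]
    n = len(temps_c)
    mean_t = sum(temps_c) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in temps_c)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temps_c, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_t)
    return a, b


def q10_from_b(b):
    """Q10 = R_(T+10) / R_T = exp(10 * b) for the exponential model."""
    return math.exp(10.0 * b)


# Hypothetical incubation temperatures (Celsius) and respiration rates,
# generated from a known curve so the fit can be checked.
temps = [5.0, 10.0, 15.0, 20.0, 25.0]
rates = [1.5 * math.exp(0.08 * t) for t in temps]

a, b = fit_exponential(temps, rates)
q10 = q10_from_b(b)
```

Under this model Q 10 depends only on b, so it is constant across the fitted temperature range; that is a property of the exponential form, not of the data.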

Screening
(a) Experimental manipulations: we included only unmanipulated samples/controls in our analysis. A list of manipulations reported in the SRDB is included in appendix A3. Where manipulations were part of the experimental design, we included only samples listed as 'control'.
(b) Study durations: we included all studies, irrespective of study duration or time/season of data collection, as Q 10 has previously been found to be independent of incubation duration (Reichstein et al 2005).
(c) Measurement method: we did not filter data by measurement method, and we included all data and studies. However, we provide a comparison of the three common measurement types in appendix A6. Of the nearly 6000 data points for CO 2 Q 10 in this analysis, 4494 were measured using infra-red gas analyzers/IRGA (primarily LI-COR instruments, but also including other makes and models); 582 were measured by the alkali absorption method; and 593 were measured by gas chromatography. Other measurement types included isotope ratio mass spectrometry, tunable diode laser absorption spectroscopy, or 'unknown' (not listed), but these measurements made up a very small portion (3%) of the data analyzed in this paper.
Based on these criteria, we identified a total of 744 studies for CO 2 Q 10 data and 47 studies for CH 4 Q 10 data (figures 1 and 2, table 1). Following the criteria outlined above, we extracted a total of 1230 datapoints (181 studies) from published papers, 4818 datapoints (1764 studies) from SRDB-V5, and 44 datapoints (16 studies) from SIDb.

Data processing

Incubation temperatures
Our compiled dataset contained flux data at various incubation temperatures, spanning a wide range from −15 °C to +60 °C (supplemental figure S1). Initial analysis was performed on the entire dataset, and these data were subsequently categorized into discrete classes to investigate the effect of incubation temperature on Q 10: <5, 5-15, 15-25, and >25 °C (table 2).

Site climate and biome classification
Mean annual air temperature and precipitation for the study sites were obtained from the Center for Climate Research at the University of Delaware (Willmott and Matsuura 2001), and the sites were classified into one of five biome types (equatorial, arid, temperate, snow, and polar) based on the Köppen-Geiger climate classification (Kottek et al 2006, Appendix A2).  Of the 682 respiration studies assigned to a climate region, 4% were equatorial, 5% arid, 46.5% temperate, 38% snow, and 6.5% polar.

Partitioning of soil respiration
We used the 'RC' (root contribution) index provided within SRDB-V5 to identify data that were dominated by autotrophic vs. heterotrophic respiration. The RC index is defined as the ratio of annual root respiration to Rs, and is a unitless value ranging from 0 (no root contribution, or 100% heterotrophic/microbial) to 1 (no microbial contribution, 100% autotrophic/roots) (Bond-Lamberty et al 2004, Jian et al 2022). We used a cutoff of 0.5 to group our data into broad root/autotrophic dominated (RC > 0.5) vs. microbial/heterotrophic dominated (RC < 0.5) categories.
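The grouping rule above can be sketched as a small function. The function name and the labels are hypothetical, and the handling of missing RC values and of RC exactly equal to 0.5 is an assumption, since the text defines only the RC > 0.5 and RC < 0.5 cases.

```python
def classify_by_rc(rc, cutoff=0.5):
    """Group a field record by its root contribution (RC) index.

    RC is a unitless fraction of Rs between 0 (fully heterotrophic)
    and 1 (fully autotrophic).
    """
    if rc is None:
        # Assumption: records without an RC value are left ungrouped.
        return "unclassified"
    if not 0.0 <= rc <= 1.0:
        raise ValueError("RC must lie between 0 and 1")
    if rc > cutoff:
        return "Ra-dominated"
    if rc < cutoff:
        return "Rh-dominated"
    # Assumption: RC exactly at the cutoff is not assigned to either group.
    return "unclassified"


# Hypothetical RC values for illustration only.
groups = [classify_by_rc(rc) for rc in (0.2, 0.5, 0.8, None)]
```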

Statistical and data analysis
We used analysis of variance (ANOVA) to detect statistically significant differences between field and lab measurements. To account for unequal sample sizes between field and laboratory measurements, we employed a bootstrapping approach (10 000 iterations × sample size 10).
All data processing and analysis was performed using R version 4.
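One plausible reading of the bootstrapping step (10 000 iterations, each drawing a sample of size 10) is sketched below. The sketch is in Python rather than R; resampling with replacement, summarizing each draw by its mean, the function name, and the Q 10 values are all assumptions for illustration, as the text does not state these details.

```python
import random
import statistics


def bootstrap_means(values, n_iter=10_000, sample_size=10, seed=0):
    """Return n_iter means, each from sample_size draws with replacement.

    Drawing the same number of values from every group puts groups of
    very different sizes on an equal footing before they are compared.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.choices(values, k=sample_size))
        for _ in range(n_iter)
    ]


# Hypothetical Q10 values; the groups are deliberately unequal in size.
field_q10 = [1.8, 2.1, 2.4, 2.6, 2.9, 3.3, 3.8, 4.5]
lab_q10 = [1.5, 1.9, 2.2, 2.3, 2.5, 2.8, 3.1, 3.6, 4.2, 6.0, 9.5, 22.0]

field_boot = bootstrap_means(field_q10)
lab_boot = bootstrap_means(lab_q10)
```

The two resulting lists have equal length regardless of the original group sizes and can then be compared, for example by plotting their distributions side by side.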

CH 4 : no difference between field and laboratory measurements
The Q 10 values for CH 4 ranged from 0.80 to 83.00 and did not differ significantly between field and lab measurements (figure 3, ANOVA, F = 0.183, P = 0.670). The smaller sample size of the CH 4 data did not permit robust analyses based on temperature or climate grouping, as we did for CO 2 (see below).
Because methane is such an important greenhouse gas, we share our limited results here, and suggest that broader efforts to quantify and document methane emissions are needed.

CO 2 : laboratory measurements were more variable than field measurements
Overall, Q 10 values for CO 2 ranged from 0.56 to 132 for field measurements (mean = 3.05) and from 0.50 to 344 for laboratory incubations (mean = 4.19) and differed significantly between the two experiment types (figure 4(A), ANOVA, F = 18.9, P < 0.001). Contrary to our expectations, laboratory measurements were significantly more variable than field measurements (F-test, F = 38.547, P < 0.001; coefficient of variation: field = 103%, lab = 409%). Despite these wide ranges and high variability, the median Q 10 values were generally similar for the two: 2.66 for field measurements vs. 2.35 for laboratory measurements. These median values are noticeably higher than generalized values in the literature; respiration Q 10 is often assumed to be 1.5 or 2 (Raich et al 1991, Potter et al 1993, Ise and Moorcroft 2006, Foereid et al 2014).
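The two variability measures quoted above can be sketched as follows. The use of the sample standard deviation, the function names, and the input values are assumptions; the text does not say which group's variance was placed in the numerator of the F statistic. Obtaining a P value would additionally require the F distribution, which the Python standard library does not provide.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def variance_ratio(group_a, group_b):
    """F statistic for comparing two variances: var(group_a) / var(group_b)."""
    return statistics.variance(group_a) / statistics.variance(group_b)


# Hypothetical Q10 values for illustration only.
field_q10 = [1.8, 2.1, 2.4, 2.6, 2.9, 3.3, 3.8, 4.5]
lab_q10 = [1.5, 1.9, 2.2, 2.3, 2.5, 2.8, 3.1, 3.6, 4.2, 6.0, 9.5, 22.0]

cv_field = coefficient_of_variation(field_q10)
cv_lab = coefficient_of_variation(lab_q10)
f_stat = variance_ratio(lab_q10, field_q10)
```

A single very large value inflates both the mean and the standard deviation, but the standard deviation more strongly, which is why a long right-hand tail can produce a coefficient of variation well above 100%.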

Extreme values may represent laboratory artifacts
Incubation temperature has long been recognized as a strong driver of temperature sensitivity (Kirschbaum 1995, Chen and Tian 2005), due to the fundamental underlying biokinetics (Davidson and Janssens 2006), and we see a consistent pattern in figure 4(B). Q 10 measurements at lower temperatures were typically 1-2 orders of magnitude greater than those at higher temperatures. Because these values represent 'apparent' temperature sensitivity (sensu Davidson and Janssens 2006), they have a long right-hand tail of seemingly extreme values. In our analysis, most of the Q 10 values > 30 (99.9th percentile in field measurements) were from snow and polar regions (with a few temperate), in laboratory experiments with nearly zero or sub-zero incubation temperatures (appendices A4 and A5). Many of these 'extreme' data were obtained from Mikan et al (2002), who reported drastically greater Q 10 values for frozen (Q 10 = 63-237) compared to thawed soils (Q 10 < 10) in laboratory incubations of arctic tundra soils, suggesting shifting controls on respiration as soils are frozen. Water is an important driver of soil respiration, affecting spatial accessibility and substrate bioavailability (Bailey et al 2019, Patel et al 2021), including controls on substrate and enzyme diffusion (Ebrahimi and Or 2015, Zheng et al 2022). Freezing water below 0 °C limits the diffusion of substrates, nutrients, and enzymes in soils, providing additional physical barriers for substrate access, compared to unfrozen soils. This can decouple the link between temperature sensitivity and substrate decomposition (Ostroumov and Siegert 1996). The Q 10 values of frozen soils therefore do not accurately represent kinetic response to temperature, and instead are more likely to represent physical barriers to diffusion.
It is interesting to note that it was only the lab experiments that showed such high Q 10 values for sub-zero incubation temperatures. Most field Q 10 values were below 30, including for sub-zero temperatures, with only two measurements higher, at 105 and 131 (Nakane et al 1997, Monson et al 2006). This suggests that the laboratory incubations introduced experimental artifacts that may have influenced the high Q 10 values, including physical disturbance from sampling and sieving and disruption of roots and microbes, which release fresh labile carbon into the system (Curtin et al 2014, Herbst et al 2016). Researchers must thus be cautious and aware of experimental and environmental artifacts that can influence these values when comparing data across different experiments.

Q 10 by biome and ecosystem type
The spread in Q 10 values was greatest for 'cold-influenced' biomes (i.e. temperate, snow, and polar), as high as 150 in temperate, 237 in snow, and 344 in polar regions (figure 5(A)). The median Q 10 values were consistently between 1.5 and 2.5 across all five biomes, despite the wider ranges and greater variation for the cold-influenced regions. There was a significant difference between field and lab measurements in arid and snow regions (lab > field, ANOVA, P < 0.01), but not in any of the other biomes. We suggest that the difference was greatest in these two biomes because they represent regions that are strongly constrained by environmental conditions (one is dry and one is cold), and thus even small shifts in water content or temperature during laboratory incubations would likely induce strong responses. For arid soils, in particular, soil respiration is decoupled from temperature and less sensitive to temperature changes, because drought reduces access to organic substrate, leading to lower Q 10 values (Jassal et al 2008, Suseela et al 2012, Carey et al 2016). Liu et al (2016a) reported that soil respiration in arid areas was strongly influenced by increased precipitation, whereas more humid regions would be less sensitive to precipitation/moisture changes, and we can assume similar responses to laboratory incubations, as water is added to the experimental units.
For all biomes except arid, there was a significant difference in variability between field and lab measurements (F-test, P < 0.001). For equatorial regions, field measurements were more variable than lab. But for temperate/snow/polar, lab measurements were more variable than field, strongly influenced by the extreme values. This is interesting because we expected field measurements to be more variable, in contrast to the tightly controlled conditions found in laboratory experiments. However, these data represent all the measurements across all incubation temperatures, including more 'extreme' laboratory incubations. In fact, the range of incubation temperatures for laboratory experiments was much broader than that seen in field measurements, indicating that the lab incubations may not always reflect the 'normal' field conditions (appendices A4 and A5). Additional experimental artifacts may also drive the variability in the laboratory measurements, including the physical disturbance of sampling and sieving, which could damage/cut roots and hyphae, introducing fresh carbon for metabolism. This may be a source of a carbon surge that is more temperature sensitive than the naturally turned over carbon in the field (Zimmermann and Bird 2012, Datta et al 2014, Sokol and Bradford 2018, Makita et al 2021). However, when excluding Q 10 > 30, equatorial, snow, and temperate regions showed significant differences in variability between field and lab. When grouped by ecosystem type (figure 5(B)), there were significant differences between field and lab measurements only for forest (field > lab) and wetland (lab > field). For wetland soils, this might be due to experimental artifact, as most respiration incubations are performed on partially saturated soils, as opposed to field conditions, where the soils are presumably saturated. Forest soils made up the majority of the data in this synthesis, and the differences in field vs. lab are likely due to the variable incubation/experimental temperatures, and the unequal distribution across biomes (field data points were 42% temperate and 54% snow, whereas lab data points were 54% temperate and 10%-15% each of snow, polar, and equatorial).

Effect of incubation temperature
The results in figures 4 and 5 include data across all incubation temperatures from −15 °C to +60 °C (incubation temperature ranges provided in figure S1) and therefore do not provide an accurate comparison of field vs. laboratory measurements. Q 10 measurements at lower temperatures are typically 1-2 orders of magnitude greater than those at higher temperatures (figure 4(B)) (Kirschbaum 1995, Mikan et al 2002, Chen and Tian 2005), and we therefore need to account for this when comparing Q 10 values across different studies. To account for these temperature effects, we split the data into groups based on incubation temperature ranges: 5 °C-15 °C, 15 °C-25 °C, and >25 °C (figure 6(A)). We chose only studies where incubation temperature ranges were 10 °C or less (for instance, 5-10, 10-12, and 5 °C-15 °C all fell under the group 5 °C-15 °C; but 5 °C-25 °C was excluded). We chose these groups because they had the greatest number of datapoints, allowing for a more robust analysis (appendix A1). Another consideration was to prevent confounding effects of freezing (Mikan et al 2002); we therefore chose 5 °C-15 °C, and not 0 °C-10 °C. For this analysis/figure, we only include data from incubations above 5 °C (figure 6(A)), as we did not have sufficient data points below 5 °C for a robust analysis.
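The grouping rule described above can be sketched as a small function that takes the lower and upper incubation temperature of a study and returns its group, or None if the study is excluded. The function name is hypothetical. The treatment of ranges that straddle a group boundary (for example 10-20 °C) and of values lying exactly on a boundary is an assumption, since the text gives only the examples 5-10, 10-12, 5-15 (included) and 5-25 (excluded).

```python
def assign_temperature_group(low_c, high_c):
    """Assign an incubation temperature range (Celsius) to an analysis group.

    Returns "5-15", "15-25", ">25", or None when the range is excluded.
    """
    if high_c < low_c:
        raise ValueError("high_c must be greater than or equal to low_c")
    if high_c - low_c > 10:
        return None  # range wider than 10 degrees, e.g. 5-25
    if low_c >= 5 and high_c <= 15:
        return "5-15"
    # Assumption: a value exactly on a boundary goes to the lower group,
    # because the checks run from the coldest group upward.
    if low_c >= 15 and high_c <= 25:
        return "15-25"
    if low_c >= 25:
        return ">25"
    # Assumption: ranges below 5 degrees, or straddling two groups
    # (e.g. 10-20), are excluded rather than assigned to either group.
    return None


# The examples given in the text, plus one range above 25 degrees.
examples = [(5, 10), (10, 12), (5, 15), (5, 25), (25, 35)]
assigned = [assign_temperature_group(lo, hi) for lo, hi in examples]
```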
These comparisons were done on unequal sample sizes (see figure 6(A)). Such sampling inequality complicates frequentist statistical tests, and we therefore performed a bootstrapping analysis on these data to compare data across equal sample sizes (figure 6(B)). The trends between field and laboratory data still held true after the bootstrapping analysis, with laboratory Q 10 > field Q 10 for 5 °C-15 °C, and field Q 10 > laboratory Q 10 for 15 °C-25 °C and >25 °C. It is interesting to note that despite the wide spread of Q 10 values, the bootstrapped data remained only between 0 and 10, highlighting once again the overall rarity of the extreme values.
Why the different trends? Due to stronger temperature limitations on respiration at lower temperatures, it is likely that these soils were more sensitive or responsive to other sampling disturbances or incubation artifacts, increasing the variability across seemingly comparable experimental incubations. On the other hand, these responses could be muted or countered by other factors at higher incubation temperatures. Another reason for this variable pattern across temperatures could be respiration partitioning (autotrophic vs. heterotrophic vs. total soil respiration), as the shifting balance between autotrophic (Ra) and heterotrophic (Rh) respiration across temperatures is an important factor to be considered (Wei et al 2010, Rankin et al 2021, Lei et al 2022), as we discuss below.

Heterotrophic soil respiration was comparable between field and lab
A complication when comparing field and laboratory respiration measurements is the measurement of heterotrophic vs. total soil respiration, Rs (Bond-Lamberty et al 2004, Subke et al 2006, Maseyk et al 2008, Liu et al 2016b, Bond-Lamberty et al 2018, Feng et al 2018). Soil surface CO 2 flux (total soil respiration) consists of respiration by roots (autotrophic respiration, Ra) and respiration by soil organisms (heterotrophic respiration, Rh). In contrast, laboratory incubations of CO 2 flux generally account only for heterotrophic respiration, because roots are often cut and removed for these experiments. Greenhouse experiments offer an alternative to laboratory experiments to address respiration partitioning: they can provide the experimental control needed, and the inclusion of plants in greenhouse incubations can provide estimates of total soil respiration. However, we do not have sufficient Q 10 data from these studies for our analysis, and they are not included here.
Thus, a direct comparison of field vs. lab may not provide accurate comparisons, and we must account for differences due to respiration partitioning when we analyze data across different experiments. Most of the SRDB data represent total soil respiration in the field, but some studies (e.g. Dhital et al 2010, Ruehr and Buchmann 2010, De Simon et al 2013, Yan et al 2015) partitioned total soil respiration into autotrophic and heterotrophic components. We used the 'RC' (root contribution) index provided within SRDB (Bond-Lamberty and Thomson 2010a) to compare data that were dominated by autotrophic (RC > 0.5) vs. heterotrophic (RC < 0.5) respiration (figure 7). Q 10 values for autotrophic-dominated respiration were significantly greater than those for heterotrophic-dominated respiration (mean autotrophic = 3.13, heterotrophic = 2.70; ANOVA, F = 24.67, P < 0.001, table 3). Interestingly, the distributions of Rh-dominated field data and laboratory data (Rh-only) showed a strong overlap, suggesting that, based on this limited dataset, heterotrophic respiration measurements may be similar across field and lab experiments.
The almost identical distributions of laboratory- and field-derived Q 10 values for Rh (figure 7) provide strong support for using the former as an experimental proxy for the latter.
Figure 7. Density plot of Q10 for heterotrophic (Rh) vs. autotrophic (Ra) dominated respiration. Field-based data were classified as Ra-dominated or Rh-dominated based on the RC index (root contribution) from SRDB. All laboratory-based data reflect Rh. The dashed lines represent the means for the groups (field-Ra = 3.13; field-Rh = 2.70; lab-Rh = 2.58). The Ra-dominated field Q10 data were significantly different from Rh-dominated field and laboratory Q10 data (ANOVA, P < 0.001). The Rh-dominated field Q10 data were not significantly different from the laboratory (Rh) Q10 data (ANOVA, P = 0.764).
The contribution of Ra vs. Rh to total soil respiration is an important consideration for field measurements. Since root respiration is more sensitive to temperature changes, Ra is likely to have a stronger phenological/seasonal pattern (Schindlbacher et al 2009), with 'root growing periods' inflating respiration rates because of increased fine root/fine tissue respiration during this period (Boone et al 1998, Epron et al 1999, Hanson et al 2003, Davidson et al 2006). Conversely, microbes are generally more insulated from aboveground temperature changes, and are therefore less likely to show strong seasonal patterns in Rh Q 10 values. Yet another complication is that Rh:Rs has been rising significantly over the last few decades (Bond-Lamberty et al 2018, Lei et al 2021), reflecting enhanced soil organic matter (SOM) mineralization driven by climate changes, showing a shifting balance between autotrophic and heterotrophic respiration. Ra, however, has remained unchanged over this period (Lei et al 2021), and it is therefore important to understand the relationships between Ra, Rh, and Rs as we study the soil carbon cycling in a changing environment.
Also important is that the proportion of plant roots (and therefore Ra:Rs) scales with the successional stage of an ecosystem, implying that the age of the stand will also influence the respiration partitioning and hence the overall Q 10 patterns (Wang et al 2010).

Field vs. lab measurements: perspective
Sixteen years after the call of Davidson et al (2006) to 'move beyond Q 10', and in spite of the wide recognition of its weaknesses (Gu et al 2004, Tang and Riley 2020), temperature sensitivity remains a central concept in lab, field, and modeling sciences of the earth system. As a parameter commonly used in existing models, it is easy to understand, and as an index, it allows us to compare measurements and data across different study types that may have different measurement methods.
Our objective was to identify the biases occurring in field vs. lab experiments that would guide optimization of measurements for specific uses, decreasing the aforementioned signal-to-noise ratio. We demonstrate that this is a very complicated question. Initial assumptions were that field measurements would be more variable than lab measurements, given the abundance of environmental factors that cannot be controlled. Lab measurements were predicted to be less variable, but less representative of true field conditions, owing to the absence of those same environmental factors. Surprisingly, our analyses revealed that there was greater variability in the Q 10 values calculated from lab-based measurements of CO 2 fluxes. This initially surprising result makes sense on further reflection: lab experiments can explore extremes rarely seen in situ and, more critically, by design isolate single experimental factors, removing other constraints. In contrast, field observations will always be subject to constraint by the most limiting factor, and only rarely will these factors 'line up' to produce extreme observations.
In spite of this, models typically have trouble replicating real-world extremes, because we need a better mechanistic understanding of the extremes themselves and the ecosystem carbon-cycle processes responding to these extremes (Reichstein et al 2013, Zscheischler et al 2014). This speaks to the value of both types of observations and ways of doing science. The real-world Q 10 values at core-to-ecosystem scales are critical to evaluate models against (Todd-Brown et al 2018), but impossible (or very difficult) to draw mechanistic insight from. In contrast, the artificiality of incubations means that they should not be used for larger-scale, integrated (plant + soil) model benchmarking; but these studies are essential for probing mechanistic understanding (Wieder et al 2019). Together, these approaches highlight the critical role of ecosystem-scale manipulations (e.g. SPRUCE (Hanson et al 2017), FACE (Palmroth et al 2006), TEMPEST (Hopple et al submitted), BBWM (Patel et al 2019)) that provide experimental control but also integrated, real-world soil, plant, and microbial conditions. The overall variabilities of field vs. lab-based measurements appear to depend on the geographic origin of the soils, with water being a key driver.

Environmental factors contributing to bias
Water content exerts strong physicochemical and biochemical controls on carbon availability (Moyano et al 2012, Ebrahimi and Or 2015, Yan et al 2016, Patel et al 2021), and strong correlations have been reported between respiration Q 10 and soil moisture, both positive (Xu and Qi 2001, Craine and Gelderman 2011, Meyer et al 2018) and negative (Luan et al 2013, Meyer et al 2018), depending on the land use and vegetation type, SOM quality, and other soil properties. Reported soil moisture values for our compiled dataset ranged from 30% to 70% water holding capacity, up to 75% water-filled pore space, and as high as 340% gravimetric water content. However, not all studies reported soil moisture and, as these values demonstrate, the moisture that was reported used inconsistent units (Franzluebbers 2020); we are therefore unable to perform a robust analysis of soil moisture effects here.
Seasonality and study duration can also influence respiration Q10 values, due to shifting patterns of temperature and moisture on an annual scale. For instance, winter Q10 values are generally larger than summer Q10 values (Rayment and Jarvis 2000, Janssens and Pilegaard 2003, Han and Jin 2018). Short-term measurements, especially in the field, are therefore subject to these seasonal variations, which must be considered when we interpret respiration data. Further, short-term incubations (days) could have higher Q10 values than longer incubations, driven by experimental artifacts that are smoothed out over time (Janssens and Pilegaard 2003, Wang et al 2014). This is especially evident under 'extreme' environmental conditions, such as in dry or cold regions, where the microbes are likely more sensitive to small changes in temperature and moisture. For lab incubations, the time of year at which the soils are sampled may also influence how comparable the data are to field measurements. For our current analysis, we include data from all experiments, irrespective of duration and season.
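For reference, Q10 is the factor by which a rate changes for a 10 °C increase in temperature. The sketch below shows two common ways of estimating it: from a pair of measurements, and from a log-linear fit across several temperatures. It is a minimal illustration with hypothetical function names and synthetic values, not the procedure used by the individual studies compiled here, which may have fitted Q10 differently.

```python
import math


def q10_two_point(rate_1, temp_1, rate_2, temp_2):
    """Q10 from two flux measurements: (R2 / R1) ** (10 / (T2 - T1)).
    Temperatures are in degrees C; rates must be positive, in the same units."""
    if temp_1 == temp_2:
        raise ValueError("temperatures must differ")
    if rate_1 <= 0 or rate_2 <= 0:
        raise ValueError("rates must be positive")
    return (rate_2 / rate_1) ** (10.0 / (temp_2 - temp_1))


def q10_from_fit(temps, rates):
    """Q10 from an exponential model R = a * exp(b * T), fitted by ordinary
    least squares on ln(R) against T, so that Q10 = exp(10 * b)."""
    if len(temps) != len(rates) or len(temps) < 2:
        raise ValueError("need at least two paired observations")
    n = len(temps)
    log_rates = [math.log(r) for r in rates]
    t_mean = sum(temps) / n
    l_mean = sum(log_rates) / n
    sxx = sum((t - t_mean) ** 2 for t in temps)
    if sxx == 0:
        raise ValueError("temperatures must not all be equal")
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(temps, log_rates))
    slope = sxy / sxx
    return math.exp(10.0 * slope)


if __name__ == "__main__":
    # Synthetic example: a flux that doubles for every 10 degrees C of warming
    temps = [0.0, 5.0, 10.0, 15.0, 20.0]
    rates = [2.0 ** (t / 10.0) for t in temps]
    print(q10_two_point(rates[0], temps[0], rates[2], temps[2]))
    print(q10_from_fit(temps, rates))
```

Because both estimates depend on the temperature range and the set of observations included, Q10 values derived from a single season or a short incubation need not match those derived from a full annual record.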

Current gaps and future opportunities
This data synthesis highlights a number of crucial gaps in data and understanding, but also opportunities for both experimentalists and modelers studying soils and their temperature-sensitive GHG processes. First, we need information on soil depth and composition, including simple measurements such as organic vs. mineral soil, to be reported more regularly. Reporting the soil depth associated with each Q10 value is also crucial, as soils from different depths respond differently to temperature fluctuations in situ and in the lab. Soil texture is another crucial piece of information (Ghezzehei et al 2019), and can, along with gravimetric water content, help to infer soil water tension and water retention properties. While we were unable to perform a comparably robust analysis for the CH4 data, we expect similar interactions with soil temperature, moisture, and environmental/climatic variables, as discussed above. Numerous studies have reported on CH4 emissions over the last few decades, but our results here suggest the need for more concerted efforts to document and standardize these data, including sample and site metadata (Bond-Lamberty et al 2021).
Alongside these current limitations are future opportunities. The work we present here highlights some of the challenges in interpreting data across different experimental/incubation types, as well as the need for more concerted and targeted experiments. Quantifying and understanding how and why field and lab measurements of GHG temperature sensitivity vary is crucial to better understanding the strengths and limitations of experimental designs.
Finally, our results have implications for the parameterization and assessment of ESMs. The sensitivity of terrestrial carbon pools to climate change is one of the largest sources of uncertainty in earth system modeling (Friedlingstein et al 2014, Bonan and Doney 2018), meaning that robust parameterization of fundamental processes in ESMs, and benchmarking of these models' outputs, are crucial. This process is most effective when observations and modeling iteratively strengthen each other (Kyker-Snowman et al 2022). In our analysis, the consistency between field- and lab-based Rh Q10 distributions (figure 7) provides confidence in the use of laboratory experiments to parameterize larger-scale models, and suggests that models' emergent Rh temperature sensitivity can be reasonably compared to ecosystem-scale observations (Moyano et al 2013, Shao et al 2013). Such emergent behaviors likely provide the strongest scale-dependent response metric for evaluating ESMs (Collier et al 2018), for which assembled field and lab datasets will be crucial resources. This highlights the need for improved and expanded respiration measurements from underrepresented/excluded regions (Shang 2016, Kim et al 2022). Most published studies focus on temperate regions, a common problem in soil field sciences, but it is in the less represented high- and low-latitude regions that the climate and carbon cycle are changing most rapidly (Pörtner et al 2022).

Data availability statement
The data and R scripts that support the findings of this study are openly available at the following URL/DOI: https://github.com/kaizadp/field_lab_q10 (doi:10.5281/zenodo.7106554) and archived at ESS-DIVE (doi:10.15485/1889750).