Standardized Data to Improve Understanding and Modeling of Soil Nitrogen at Continental Scale

Nitrogen (N) is a key limiting nutrient in terrestrial ecosystems, but there remain critical gaps in our ability to predict and model controls on soil N cycling. This may be in part due to lack of standardized sampling across broad spatial–temporal scales. Here, we introduce a continentally distributed, publicly available data set collected by the National Ecological Observatory Network (NEON) that can help fill these gaps. First, we detail the sampling design and methods used to collect and analyze soil inorganic N pool and net flux rate data from 47 terrestrial sites. We address methodological challenges in generating a standardized data set, even for a network using uniform protocols. Then, we evaluate sources of variation within the sampling design and compare measured net N mineralization to simulated fluxes from the Community Earth System Model 2 (CESM2). We observed wide spatiotemporal variation in inorganic N pool sizes and net transformation rates. Site explained the most variation in NEON’s stratified sampling design, followed by plots within sites. Organic horizons had larger pools and net N transformation rates than mineral horizons on a sample weight basis. The majority of sites showed some degree of seasonality in N dynamics, but overall these temporal patterns were not matched by CESM2, leading to poor correspondence between observed and modeled data. Looking forward, these data can reveal new insights into controls on soil N cycling, especially in the context of other environmental data sets provided by NEON, and should be leveraged to improve predictive modeling of the soil N cycle.

snowpack, there is a strong positive influence of snow cover on N availability in spring (Brooks et al., 1998;Durán et al., 2014), while in warmer ecosystems, temporal trends in N availability are often linked to seasonality of rainfall (Austin et al., 2004;Davidson et al., 1990;Homyak et al., 2014). But many regions are experiencing less snowpack and earlier growing season arrival, along with more variable rainfall and prolonged droughts (Gergel et al., 2017;Kapnick & Hall, 2010). How will these changes influence seasonal patterns in N supply and demand?
Along with seasonality, other factors such as vegetation cover and soil type are known to influence N availability. For instance, herbaceous cover tends to promote higher soil N availability compared to forests due to higher-quality plant litter inputs (Chapman et al., 2006;Weintraub et al., 2017), highly weathered soils such as those found in the humid tropics tend to be relatively N-rich (Brookshire et al., 2011;P. Vitousek & Matson, 1988), and organic soils found in colder climates tend to have high N stocks and turnover rates compared to underlying mineral horizons (Hart & Gunther, 1989). In the context of our rapidly changing climate and resulting alterations in seasonality and vegetation phenology, we need a better understanding of the trajectory of seasonal N dynamics and the relative importance of this compared to controls by vegetation composition, soil horizon type, and other ecosystem factors.
Given the many variables that influence N availability, this property is difficult to predict and model across scales. A common proxy for available N is net nitrogen mineralization, the net microbial production of inorganic N from organic matter. Nitrogen mineralization provides a major N source for plant and microbial growth in terrestrial ecosystems, although soluble organic N can be equally important in low N systems (Schimel & Bennett, 2004). N release from plant litter is well predicted using initial litter chemical characteristics and site climate (Parton et al., 2007). Yet we still lack the ability to predict broad-scale controls on the rates of N cycle processes in soil, where mineralogy and mineral-organic matter interactions are likely to figure prominently (Bingham & Cotrufo, 2016;Jilling et al., 2018). This holds true for N mineralization as well as nitrification, the conversion of ammonium to nitrate with a byproduct of N 2 O and other N trace gasses. For instance, Colman and Schimel (2013) measured rates of net N mineralization via laboratory incubations of continentally distributed mineral soils under constant temperature and moisture and found that edaphic and site factors could only explain ∼33% of the observed variation. Using data synthesis, Li et al. (2020) found increasing rates of net nitrification with decreasing latitude and increasing temperature, and a positive relationship with N 2 O production and net nitrification rate. However, they observed wide variation in nitrification rates within biomes and low explanatory power overall. Soils display high spatial heterogeneity from local to regional and continental scales, with land use history contributing to variation in N dynamics. Explicitly capturing this cross-scale variance may be key to improved understanding and modeling of ecosystem function and response to global change (Bradford et al., 2021). Indeed, N cycle processes have proved challenging to represent in Earth system models (ESMs; Davies-Barnard et al., 2020;Thomas et al., 2015), and this makes it difficult to predict N availability and related carbon cycle processes, now and in the future.
A suite of standardized, broadly distributed measurements of soil inorganic N pools and net transformation rates could shed new insights into controls on N cycling in soils. The National Ecological Observatory Network (NEON) provides a community resource to deliver such measurements, along with a rich set of ancillary biogeochemical, ecological, and climatic variables. To our knowledge, there has been no other effort of this type to collect standardized data on soil N pools and transformation rates at sites distributed strategically across the U.S., with stratified within-site sampling across land-cover types, intra-annual sampling to capture seasonal dynamics, and plans for repeated sampling every 5 years for several decades. The spatial and temporal axes covered by this sampling should allow for new understanding and improve our ability to understand and model the forces shaping soil N availability in response to global change.
The aim of this paper is to introduce the NEON soil inorganic N data set to the biogeochemistry research community and illustrate how a unified data set such as this can address important research questions. First, we describe the spatial and temporal sampling design along with analytical approaches and challenges in generating a robust inorganic N data set, even for a project with highly standardized protocols. After exploring solutions to overcome the challenges, we use the data to investigate sources of variation in the context of the NEON sampling design, to probe whether NEON measurements are capturing heterogeneity in N cycling along dimensions of seasonality, soil horizon type, vegetation cover, and across spatial scales. Additionally, we compare NEON soil N measurements with outputs from an ESM (Community Earth System Model version 2, CESM2) to evaluate the degree 3 of 16 of agreement between the two. Finally, we discuss the many opportunities to leverage these data, pairing them with other NEON data sets, additional measurements, and modeling efforts to yield new insights into terrestrial N availability.

NEON Sampling and Data
The NEON is a U.S. based monitoring networking that provides open ecological data at a continental scale (www. neonscience.org). The NEON provides 182 data products that deliver standardized measurements of biodiversity, biogeochemistry, micrometeorology, disease ecology, and other environmental variables. The observatory is composed of 47 terrestrial and 34 aquatic sites, distributed across 20 eco-climatic domains ( Figure 1). Sensor infrastructure and observational sampling are used to observe ecosystems on the scale of minutes to years and from microns to kilometers (Balch et al., 2020;Keller et al., 2008).
Measurements of soil inorganic N pools and net transformation rates are a key component of the terrestrial biogeochemistry sampling design (Hinckley et al., 2016). These measurements are provided as part of NEON data product DP1.10086.001, soil physical and chemical properties, periodic (https://data.neonscience.org/data-products/DP1.10086.001). At each NEON terrestrial site, ten 40 × 40-m plots are used for long-term soil sampling. These plots are located using stratified random sampling to encompass the major vegetation/land-cover types within each site, according to National Land Cover Database (NLCD) classification (Barnett et al., 2019). Each plot is divided into quadrants, and soils are sampled from three of these four quadrants from a random, preselected location during each field campaign, which in NEON terminology is a "bout" (Figure 1). Thus, at each site, each of the 10 plots and 3 of the 4 subplots within each plot are resampled during each sampling bout, and the exact location within each subplot rotates for each bout. Samples are collected from 0 to 30 cm depth (or refusal), a standardized depth that was selected by NEON to facilitate within and across site comparisons and which includes the highest rates of soil N cycling activity and the majority of fine root biomass in most ecosystems (Binkley & Hart, 1989;Jackson et al., 1996), including NEON sites (Table S1 in Supporting Information S1). Soils are generally sampled using 2 or 2¼ in. (5.1 or 5.7 cm) soil augers, but with some adjustments to accommodate difficult to sample soils (see Appendix table in the NEON soil protocol, Stanish, 2022). If a surficial organic horizon is present, it is separated from mineral material and processed as a distinct sample.
Most NEON sites have three soil sampling bouts per year, except for the five sites located in Alaska that only have one due to the short growing season. All sites are sampled during their period of peak greenness, and most (c) Each plot is divided into four subplots, three of which are sampled in any given bout at random, preselected locations (brown circles). (d) At each subplot location, an initial core is collected and an incubated core is installed for in situ, net N transformation rate measurements.
are also sampled during two seasonal transition periods, the timing of which differs between sites as a function of climate. Seasonally cold sites are sampled during their respective spring and fall, while sites with milder temperature with greenness driven by rainfall are sampled during rainy season onset and dry down. Inorganic N pools and net transformation rates are measured during all bouts in a given sampling year, but each site only conducts inorganic N analyses one out of every 5 years, with other years reserved for nonbiogeochemistry soil sampling. This allows for the examination of seasonal trends in N dynamics, as well as how these dynamics may be changing over time at the continental scale.
Within 1 day of collection, field moist mineral horizon soils are sieved to 2 mm, and organic horizon soils are hand-picked to remove roots, rocks, and other non-soil material. Then, inorganic N pools are measured using 2 M potassium chloride (KCl) extractions with 10:1 solution to fresh soil weight ratio (Bremner & Keeney, 1966). The NEON includes up to three procedural blanks for each day of extractions to account for any contaminant N introduced in the procedure. Aliquots from "initial" soil samples are used to measure bacterial and fungal community composition, microbial biomass, pH, and gravimetric soil moisture, as well as total organic C and total N concentrations, C and N stable isotope ratios, and soil metagenomic content during peak greenness. To measure net N transformation rates, covered cores (PVC cylinders with loose-fitting caps) are installed to 30 cm depth (or refusal) when initial soil samples are taken. The exception is for wetland plots where buried bags are used instead to prevent the water table from interfering with N accumulation (Hart et al., 1994). These covered cores (or buried bags) are left to incubate in situ, then recovered 2-4 weeks later (depending on site and time of year), and extracted with 2 M KCl, similar to the initial samples.
The KCl extracts are frozen following collection and shipped to a centralized facility contracted by NEON (2017-2018, EcoCore Analytical Services at Colorado State University; 2019-present, University of Florida Wetland Biogeochemistry Laboratory) for analysis of ammonium (NH 4 + ) and nitrate + nitrite (NO 3 − + NO 2 − ). Ammonium is measured using the salicylate nitroferricyanide method, and NO 3 − + NO 2 − is measured using cadmium reduction and sulfanilamide, both on an autoanalyzer. Due to resource constraints, NEON samples are not run with and without the cadmium reduction step, thus all measurements for nitrate include nitrite. However, nitrite is often a minor and short-lived constituent of natural soils (Hart et al., 1994); hereafter, we refer to NO 3 − + NO 2 − as nitrate. Difference between final and initial inorganic N is net N mineralization, while difference in final versus initial nitrate is net nitrification. Data are reported as milligrams N per liter KCl (mg-N/L). In order to blank-correct the data and then calculate milligrams N per kilogram dry soil (per day for net rates), data must be combined from several tables that come in the data product download, followed by a series of basic calculations. The NEON developed a simple R package to perform these calculations (https://github.com/NEON-Science/NEON-Nitrogen-Transformations). Data are generally available within 6 months of sample collection on the NEON data portal, along with field and laboratory metadata, sampling and analytical protocols, and other documentation.

Challenges to Continental-Scale Inorganic N Sampling
To our knowledge, this is the first time a monitoring network has attempted repeated, standardized, continental-scale soil inorganic N sampling, and it has not been without challenges. The NEON employs a large cohort of technicians every season, and while most have some experience with ecology or field work, rarely is this specific to soils or biogeochemistry. The NEON has learned that all details required to ensure high-quality soil inorganic N pool and flux data must be included explicitly in protocols and training materials. Examples of details that were added over time include setting a minimum and maximum distance between the initial and incubated core, as well as a requirement for similar depths (within 5 cm) and adding detailed instructions for laboratory equipment preparation and clean lab techniques. Moreover, the Domain Support Facility (DSF) laboratories where extractions occur are set up for general ecological field work, not biogeochemistry per se, and the in-house water purification systems only achieve Type II (1 MOhm) grade. This water was used in 2017 to create KCl solutions but, upon examination of the first year of data, it was clear that certain DSF purification systems were not removing enough NH 4 + or NO 3 − and blanks were unacceptably high. To solve this, all DSFs began purchasing Type I (18.2 MOhm) water to use for solution preparation and final equipment rinses starting in 2018.
Another issue involves "negative" sample concentrations after accounting for the inorganic N in analytical blanks. After observing this in several sites, troubleshooting revealed the cause to be nitrite contamination (up to 0.15 mg N/L) in certain batches of KCl used for extractions. In the presence of acidic soil, NO 2 − is abiotically converted to nitrogen gasses and lost (chemodenitrified), but this process does not occur in blanks (Homyak et al., 2015). This often leads to significant negative NO 3 − + NO 2 − values upon blank-correction for highly contaminated KCl batches and greatly reduces sensitivity of the method in low inorganic N concentration NEON sites where nitrate-N tends to be <0.05 mg/kg, about one third of the total. NEON continues to troubleshoot this issue. It has been difficult to source KCl powder that is consistently low in nitrite, so NEON has considered options such as in-house purification of KCl or custom order of KCl free from contaminants-but these solutions have proved challenging. For the 2022 field season, the workflow was to test each batch of KCl ordered from standard vendors, then use only batches where nitrite concentrations were verified to be low (generally <0.02 mg NO 2 − -N/L, although a few batches were used with up to 0.05 mg NO 2 − -N/L due to supply chain issues). The plan is to continue with this procedure unless or until an alternate solution can be implemented. Any sample values that blank-correct negative beyond the level of background noise, which is considered less than −0.02 mg N/L (more than twice the method detection limit of 0.01 mg/L), are flagged in the data with "blanks exceed sample value." Below, we discuss possible approaches to account for these values.

Data Download and Blank-Corrections
To examine the inorganic N data, we downloaded DP1.10086.001 soil physical and chemical properties, periodic from the NEON data portal on 9 June 2022 (NEON, 2022a) using the neonUtilities R package (Lunch et al., 2022). We chose to download "RELEASE-2022" data that includes a static version of inorganic N measurements from 2017 to 2020. More information about the sites included in the download along with relevant site metadata are summarized in Table S1 in Supporting Information S1. This includes fine root biomass fraction in the top 30 cm of soil, derived from DP1.10066.001 root biomass and chemistry, Megapit (NEON, 2022b). We used the def.calc.ntrans function in the neonNTrans R package (Weintraub, 2017) to calculate blank-corrected inorganic N concentrations and net N transformation rates on a per soil mass basis. Expressing pools and rates per gram soil is common practice in the biogeochemical literature (Liu et al., 2017), should be reflective of what plants and microbes "see" when competing for N, and is relevant to N losses to water and the atmosphere. Recent work has linked soil C with variation in net N transformation rates (Gill et al., 2023;Liu et al., 2017), and it could be interesting to examine rates per gram soil C. However, this would require assumptions and extrapolations, as the peak green sampling bout is the only one where total soil C is measured on the NEON soil cores, thus this would introduce more uncertainty (especially for organic horizons). We avoid this given our aim to assess sources of variation in the context of the NEON sampling design but acknowledge this could be relevant for future studies.
Given the issues with high NO 2 − blanks described above, we explored four possible ways to account for negative blank-corrected concentration data, as outlined in Table 1. Each has certain advantages, disadvantages, and assumptions. In the end, we used Option 4 because it yields a larger data set with all sites and samples retained, while avoiding substantial inflation with zeros ( Figure S1 in Supporting Information S1). We suggest that the combination of a quantile regression for nitrate to estimate and add back the NO 2 − from KCl lost from samples but not blanks, plus use of an empirical cutoff for blank values for NH 4 + ( Figure S2 in Supporting Information S1) produces a data set highly suitable for further analyses. However, we recognize that there are other ways these data can (and will) be analyzed. Over time, these issues should become less prominent as NEON is now paying attention to and remediating the nitrite contamination issue.

Mixed Model Analyses to Examine Sources of Variation
Taking the resulting data set generated from the Option 4 blank-correction procedure, we fit mixed-effects models to test which aspects of the NEON sampling design were associated with variation in inorganic N dynamics. We fit separate models for the following response variables, leading to four final models, one each for: net N mineralization, net nitrification, and the initial concentrations of NH 4 + and NO 3 − measured in the field. The net N mineralization and net nitrification data were strongly skewed and included large positive and negative values, so we used a "neglog" transformation to reduce skewness while preserving the sign of each value (Whittaker et al., 2005). Positive values were summed with one prior to a natural log transformation, and negative values were multiplied by negative one and summed with one prior to a natural log transformation, and were again multiplied by negative one to preserve the negative sign of the original value (Whittaker et al., 2005). The NH 4 + and NO 3 − concentrations were natural log-transformed.
In each model, we included soil horizon type (organic or mineral), seasonal timing, and horizon by timing interactions as candidate fixed effects along with candidate random effects to account for the spatial structure of the NEON sampling design. Subplot, plot, site, and domain were all coded as unique spatially explicit variables (i.e., there were observations from 1,328 subplots, 349 plots, 35 sites, and 19 domains). Given the NEON design of random stratification of plots within sites across NLCD vegetation class, this parameter was also included as a random effect with 10 levels: deciduous forest, pasture/hay, cultivated crops, shrub/scrub, evergreen forest, mixed forest, grassland/herbaceous, woody wetlands, dwarf scrub, and sedge/herbaceous. NLCD class varies at the plot scale, but not within plots. Because seasonal timing of sampling intentionally differed among sites as a function of climate (e.g., rainfall transitions in some sites compared to spring and fall in others), we generated a timing variable that was more comparable among sites. Specifically, seasonal timing was recoded from the initial five levels (peak greenness, winter/spring transition, fall/winter transition, dry/wet, and wet/dry) to a simpler scheme, where dry/wet and winter/spring were recoded as "ramp up" and wet/dry and fall/winter were recoded as "ramp down," and were then compared relative to "peak greenness." In the initial phase of analysis, we explored adding other fixed effects such as mean annual temperature, soil moisture, pH, and soil organic C concentration, but they resulted in only weak increases in explanatory power of the statistical models (no more than a 0.1 increase in R 2 ). This was not pursued further in order to focus on evaluating components of the sampling design.
We fit the linear mixed models using the lmer function in the "lme4" package in R (Bates et al., 2015). Singular fits were obtained in some cases where a random effect had variance near zero, so models were refit after removing that particular random effect. We assessed variable importance by plotting estimates of fixed effects coefficients, the standard deviations of random effects, and approximate p-values of fixed effects using a Type II Wald chi-square test with the ANOVA function in the "car" package (Fox & Weisberg, 2018). We report the marginal and conditional R 2 values, which represent the approximate variance explained by fixed and fixed plus random effects, respectively (Nakagawa et al., 2017). We evaluated model assumptions using plots of residual versus predicted values and sample quantiles versus theoretical quantiles. Moderate heteroscedasticity and deviations from normality were observed, so models should be treated with caution.

ESM Outputs and Data-Model Comparison
We used modeled monthly rates of net N mineralization from the CESM2, from simulations conducted as part of the Coupled Model Intercomparison Project 6 (CMIP6; O'Neill et al., 2016). Specifically, we used output from the Shared Socioeconomic Pathway (SSP) scenario 5, Representative Concentration Pathway (RCP) 8.5, Most liberal approach as all data are retained, but inflates the data set with many zeroes ( Figure S1 in Supporting Information S1) Leads to the smallest data set as certain sites are removed completely given site-specific issues with contaminated water or batches of KCl ( Figure S1 in Supporting Information S1)  Figure S2 in Supporting Information S1) Allows for the largest data set, all sites and majority of samples retained, yet the data set is not inflated with zeros which represents the high end of plausible future pathways in terms of radiative forcing (O'Neill et al., 2014). Compared with CESM1, the land model used in CESM2 (Community Land Model version 5, CLM5) includes a more detailed and dynamic representation of the ecosystem N cycle (Lawrence et al., 2019). For example, CLM5 allows for flexible plant tissue stoichiometry, interactions between plant N and C, and more dynamic N fixation responses. Observations were compared with the simulated variable "fNnetmin" (i.e., net N mineralization) from soil and litter pools across the whole soil profile. Thus, modeled rates are not a perfect match with the NEON observations, which were collected from soil to 30 cm depth. However, there is a wealth of data to suggest that rates of N mineralization decline substantially with depth (Chen et al., 2019;Darby et al., 2020;Dessureault-Rompré et al., 2016). Since it is reasonable to assume that surface soils are making a significant contribution to modeled rates, the data-model comparison should prove useful. Moreover, temporal comparisons are still quite relevant; if the model represents seasonal patterns in N mineralization correctly, the temporal trends in data versus model should align, even if absolute magnitudes differ. Model data have a nominal 100 km spatial resolution. Site-level net N mineralization data were extracted based on latitude and longitude for each CESM2 grid cell that contained a site. We used the "ncdf4" package v.1.17 in R to work with netCDF files (Pierce, 2021) and the "raster" package v.3.4-5 (Hijmans, 2022) to match site locations to grid cells.
In order for the data set of net N mineralization observations to be comparable to CESM2 outputs, measurements were converted from a per-kg soil basis to per area fluxes, according to the following equation: This required bulk density and percent coarse fragments (particles >2 mm diameter) for each sample. For a subset of plots used for periodic soil sampling, plot-level measurements of these variables were available by downloading DP1.10047.001 Soil physical and chemical properties, distributed initial characterization (NEON, 2022c). This data product includes physical and geochemical properties from 8 to 26 soil pits, generally 1-m deep, dug by the U.S. Department of Agriculture Natural Resource Conservation Service at each site to characterize the diversity of local soils. For this effort, soils were sampled by pedogenic horizon; to convert these data to a fixed depth interval (0-30 cm) to better match inorganic N measurements, data from each pit were separated into "mineral" and "organic" horizons, then the "slab" function in the aqp package (Beaudette et al., 2022) was used to estimate 0-30 cm values. Where periodic sampling and initial characterization data were measured in the same plots, we assigned bulk density and coarse fragment data to all inorganic N concentration values from that plot. For inorganic N samples from plots that lack these measurements, we estimated bulk density using a pedotransfer function (Drew, 1973;Guevara et al., 2020): Bulk density ( Mg∕m 3 ) = 1∕(0.6268 + 0.0361 × organic carbon (%) × 1.724) As percent coarse fragments cannot be estimated from a pedotransfer function, we instead calculated mean of percent coarse fragments by horizon type (mineral vs. organic) and NLCD cover classes within each site, then assigned these values to inorganic N data within the same site, horizon, and NLCD cover. Net N mineralization rates were thus estimated for all organic and mineral horizon samples on an areal basis. Then, horizon values were summed for each within-plot sampling location where both were present. Finally, we took the average and standard error for each site and temporal sampling bout (n = 89), assigned month to each bout of NEON sampling, and assigned dominant NLCD class at the site level according to whichever cover class accounted for the greatest area of the NEON soil plots.
First, we examined the seasonal time course of net N mineralization in observations compared to modeled results from the same year to visually assess how well the two agreed. Then, we plotted average net N mineralization by month and site in observation versus model data and examined the relationships between them using several metrics: Spearman rank correlation, a linear model, and a mixed model with site as a random effect. We calculated the Nash-Sutcliffe efficiency of the data-model comparison using the NSE function in the "hydroGOF" package (Zambrano-Bigiarini, 2020) and the root mean square error (RMSE) using the rmse function in the "Metrics" package (Hamner & Frasco, 2018). Lastly, we extracted the residuals from the linear model and used analysis of variance to see if residuals could be explained by dominant NLCD class or seasonal timing. All analyses and data visualizations were performed using R Statistical Software (v 4.0.5; R Core Team, 2021) and statistical tests were considered significant with p-values <0.05.

Results
The 35 NEON sites examined here displayed wide variation in rates of net N transformations and inorganic N pool sizes (Figure 2). Overall, organic horizons showed higher rates and pool sizes compared to adjacent mineral horizons when expressed on a weight basis. The largest portion of variance in soil extractable pools and net N transformation rates was explained by site, except for net N mineralization where domain was slightly statistically more important (Table 2 and Figure 3b). Plot explained the next largest fraction of variance for most variables, followed by domain, with subplot and NLCD class contributing the least explanatory power. The fixed effects did not explain a high proportion of the variation compared to the random effects, and each model had a substantiation amount of variation in residuals. For the fixed effects, horizon was important for all four variables, with  Table S1 in Supporting Information S1.
higher levels in organic layers compared to mineral layers on a per mass basis (Figure 3a). Soil NH 4 + concentrations and net mineralization rates were also influenced by seasonality. Soil NH 4 + concentrations displayed lowest values during peak greenness compared to the transition seasons (e.g., positive coefficients for ramp up and ramp down when compared to peak greenness, Figure 3a), while the net rates were higher during the ramp up and slower during the ramp down compared to peak greenness-but only in organic horizons.
Most NEON sites displayed some degree of seasonality in net N mineralization rates, and whether temporal patterns aligned between measured and modeled data depended on the site (Figure 4). In some cases, dynamics observed in the field were captured reasonably well by CESM2, such as at the sites STER, HARV, and SCBI (see Table S1 in Supporting Information S1 for site names and metadata). For other sites, the temporal patterns did not coincide, for example, in GUAN, DCSF, and WREF. There was no clear visual indication that the model was better at capturing seasonality in certain types of ecosystems.
The Spearman rank correlation between mean observed net N mineralization and modeled net N mineralization in the same month was significant (p-value <0.001, r = 0.38; Figure 5), as was the linear correlation between modeled versus observed values (Nmin obs = Nmin mod × 0.323 + 0.022, p = 0.003, r 2 = 0.09), but the explanatory power was low. A higher fraction of the variation could be explained by using a mixed model accounting for site as a random effect (p = 0.004, conditional r 2 = 0.22). The RMSE of the datamodel comparison was 0.05 g N m −2 day −1 , and the Nash-Sutcliffe efficiency was −0.51. Analysis of variance  Combined (i.e., Subplot, Plot, Site, Domain, and NLCD Class)  Note. A blank cell means a variable was not a significant predictor. Net nit., net nitrification; net N min., net N mineralization; NLCD, National Land Cover Database; residual, unexplained variance in each model. suggested that dominant NLCD cover was a significant predictor of data versus model residuals, but only for the "Pasture Hay" class, which had larger residuals compared to other land-cover types (p = 0.006). The "ramp down" period had marginally significantly lower residuals compared to peak greenness (p = 0.08), but overall this statistical model had low explanatory power (r 2 = 0.11, p = 0.03).

Discussion
Despite the importance of N mineralization and nitrification for terrestrial N cycling, we have a poor understanding of the controls on these processes at a continental scale. Data collected by NEON provide a potential opportunity to make progress in this area, and the standardized data sets can be useful for validating and improving predictive models. The NEON measurements capture high spatiotemporal heterogeneity in inorganic N pools and fluxes, some of which are predictable as a function of the spatial structure of sampling and core aspects of the NEON design.

Variation in N Pool and Net Flux Rates Across Sites
Certain types of sites conformed to expectations. For example, those with greatest soil nitrate pools and highest net N transformation rates were either in the tropics (GUAN) or under agricultural management (KONA, STER). Most forested sites outside the tropics had larger ammonium pools and very little nitrate, which is typical (although on its own, not diagnostic) for N-limited ecosystems. In contrast, some sites showed unexpected patterns; for example, relatively large soil inorganic N pools and high net transformation rates in the Alaskan tundra (TOOL) where we expected very low N availability, and low net N transformation rates and N pool sizes in subtropical woody wetlands in the Everglades (DSNY). Sites with low inorganic N pools might have low rates of N transformations, or alternatively, high plant N uptake, and this question could potentially be further investigated using additional NEON data.

Sampling Design Captures Important Sources of Variation, But Still Much Is Unexplained
Given that this is a first of its kind effort, it is important to evaluate the NEON design to ensure key sources of variation are being captured. According to the mixed model analysis, the nested components of the NEON design ( Figure 1) all have some explanatory power for inorganic N pools and processes, suggesting that the sampling approach is generally robust. Sites tend to account for the most variation (Table 2), hence adding more sites could yield more insight into soil N cycling compared to adding more plots within sites or subplots within plots. While NEON is unlikely to make this expansion in the near future, and adding more sites is no assurance that fundamental understanding will increase, if others use NEON's standardized, publicly available protocols and analysis methods, the community can assess whether new insight can be gained by filling in geographic, climatic, and other kinds of gaps. The next most important source of variation is at the level of plots, which suggests that NEON should preserve the existing number of plots even if it means sacrificing within-plot replication. This is helpful given budgetary and other logistical pressures and can help to focus efforts when sampling resources are strained (e.g., during the COVID-19 pandemic). Not only are these insights useful for NEON, but also potentially for researchers seeking to set up cross-site soil monitoring efforts or establish within-site sampling designs.
Surficial organic (O) horizons tended to have greater extractable inorganic N pools and higher mineralization rates compared to mineral horizons on a per weight basis (Figures 2 and 3). This is consistent with findings from previous studies, which tend to be limited to single sites or regions (Darby et al., 2020;Hart & Gunther, 1989) and implies that the pattern holds when looking across a wide variety of ecosystems. Given higher N concentrations and faster net mineralization rates per unit soil, organic horizons are likely to be "hotspots" for provision of mineral N and it may be useful for investigators to separate organic and mineral layers, even if otherwise pursuing fixed-depth sampling, to understand the contributions of the organic layer to soil nutrient availability. Moreover, as the climate warms and dries, organic horizons are likely to be reduced in areal density due to increased decomposition rates (Melillo et al., 2017), as their persistence is governed by temperature and moisture constraints as opposed to physical protection of organic matter as in the mineral soil (Schmidt et al., 2011). Because these organic horizons have high rates of microbial decomposition and inorganic N production, especially in the early part of the growing season ( Figure 3a, ramp up), their decline might exacerbate N limitation. On the other hand, there could be counterbalancing forces in the kinds of ecosystems with thick organic horizons (e.g., high latitude or altitude), such as longer growing seasons and an expansion of the active layer, that could yield higher N availability in mineral layers. The net effects on N availability will be important to assess going forward, and NEON data can be leveraged for this.
There was a significant amount of unexplained variation in the statistical models used for predicting inorganic N pools and net flux rates, similar to previous findings (Colman & Schimel, 2013;Li et al., 2020). Clearly, understanding the sources of this unexplained variation in N dynamics remains a ripe area for research, and the NEON samples provide a unique opportunity to integrate a suite of data sets collected at a variety of spatiotemporal scales to assist in uncovering what drives these patterns. Our exploratory analyses suggested that snapshots of soil moisture and temperature (i.e., measurements at the time of soil sampling) did not provide additional statistical insight. However, NEON monitors soil moisture and temperature continuously via automated sensors, and these temporally resolved measurements might shed light on controls on N pool sizes and transformation rates. Sensor data could be especially useful if paired with information on microbial community abundance and composition, which NEON provides. We did not find NLCD class to be a strong driver of N cycle variation, yet this is a very coarse metric of vegetation cover. Within the same NLCD class, various attributes of plant communities can strongly influence N dynamics (e.g., mycorrhizal type, Phillips et al., 2013). Information on plant biomass, composition, phenology, or chemistry, derived from on-the-ground measurements and/or high-resolution airborne remote sensing, are also available from NEON and may provide new insights into how vegetation influences soil N dynamics across diverse ecosystems.

Simulations Do Not Capture Observed Rates of Net N Mineralization
Overall, we observed a relatively poor fit between NEON measured and CESM2 modeled net N mineralization, suggesting room for growth in our understanding of the soil N cycle and its representation in large-scale models. There was a large RMSE and a negative Nash-Sutcliffe efficiency value suggesting that the process model was worse for prediction than taking only the mean of the observed values. However, the Spearman rank correlation was higher than linear model fits, suggesting that although prediction of the absolute values was difficult, there was some correspondence in the general patterns observed. As noted earlier, there are limitations in both the data and model side of the comparison. For observations, we had to make assumptions about bulk density and coarse fragment content in some plots, and soils below 30 cm depth were not included. For the modeled data, the observed climate during net N transformation measurements may have differed from the one simulated in the CESM2 outputs. Nonetheless, it is likely that the way N cycling is mechanistically represented in the model is not sufficient to capture spatial and temporal variation in N cycling processes. We were unable to explain much of the residual in the observations versus simulations linear model with dominant NLCD class or sample timing, but a next step would be to probe more systematically where and why the model does better or worse and seek to improve the underlying process representation from there. Additionally, NEON's standardized, continental-scale data on soil inorganic N pools and fluxes can be used to evaluate process-based model estimates of soil N 2 O emissions. Even without the N 2 O flux data, soil pools and net N transformation rates can serve as standardized benchmarks and play important roles in the N 2 O Model Intercomparison Project (NMIP, Tian et al., 2018;Xu et al., 2021).

Opportunities for Future Research and Engagement
It is critical to understand how cycling of a key limiting nutrient-N-will respond to increasing climatic and other ecological stressors. The NEON data provide a baseline for monitoring changes in soil N cycling going forward. Changes in the timing and magnitude of N pools and process rates will be mediated by complex, cross-scale interactions among different ecosystem components. NEON helps meet this challenge by monitoring many ecosystem dynamics at the same time. Nonetheless, there remain some obvious information gaps. This is where individual researchers can propose to collaborate with NEON (SanClements et al., 2020) to uncover new, continental-scale insights. For instance, we now recognize that minerals play an important role in soil N availability (Bingham & Cotrufo, 2016;Jilling et al., 2018). There is potential to not only leverage existing NEON data on soil mineralogy, geochemistry, and N cycle metrics but also propose new collaborative projects at NEON sites. Our ability to benchmark models and upscale measured rates would be improved by having estimates of bulk density and coarse fragments in all NEON soil plots. This is not part of the baseline NEON design, but could be added through an initiative driven by the scientific community.
There are several ways for researchers to engage with the NEON data sets described in this paper. This ranges from simply downloading and using data from the NEON portal, to taking NEON data tutorials, contributing code resources to the NEON code hub, or proposing collaborations to conduct additional research (Nagy et al., 2021;SanClements et al., 2020). To fully utilize these large, integrated, publicly available data sets, techniques such as machine learning, deep learning, information theory, and others will be needed to uncover previously hidden connections between N cycling and ecological dynamics. Moreover, the biogeochemistry research community will derive the most benefit by wholly embracing the FAIR data principles (Wilkinson et al., 2016), to ensure all derived data sets, code, and analyses are findable, reproducible, and enable new insights to build on the most recent contributions. By leveraging these unique data sets and opportunities, we can unlock new insights into the controls on soil N cycling.