Spatially distributed snow depth, bulk density, and snow water equivalent from ground-based and airborne sensor integration at Grand Mesa, Colorado, USA

. Spaceborne remote sensing of snow currently enables landscape-scale snow covered area, but estimating snow mass in the mountains remains a major challenge from space. Airborne LiDAR can retrieve snow depth, and some promising results have recently been shown from spaceborne platforms, yet density estimates are required to convert snow depth to snow water 15 equivalent (SWE). However, the retrieval of snow bulk density remains unsolved, and limited data is available to evaluate model estimates of density in mountainous terrain. Knowledge of the spatial patterns and predictors of density is critical for accurate assessment of SWE and essential snow physics, such as energy balance and mechanics related to hazards and over-snow mobility. Toward the goal of landscape-scale retrievals of snow density, we estimated bulk density and length-scale variability by combining ground-penetrating radar (GPR) two-way travel-time observations and airborne LiDAR snow depths 20 collected during the mid-winter NASA SnowEx 2020 campaign at Grand Mesa, Colorado, USA. Key advancements of our approach include an automated layer picking method that leverages co-and cross-polarization coherence and distributed LiDAR–GPR inferred bulk density with machine learning. The root-mean-square error between the distributed estimates is 12 cm for depth, 27 kg/m 3 for density, and 42 mm for SWE, and the median relative uncertainty in distributed SWE is 7 %. Wind, terrain, and vegetation interactions display corroborated controls on bulk density that show model and observation 25 agreement. The spatially continuous snow density and SWE estimated over approximately 16 km 2 represents the next step towards broad-scale SWE retrieval.


Introduction
Throughout the past half-century, snowpacks in the western US declined by ∼ 20 % because of ongoing warming (Pierce et al., 2008;Mote et al., 2018).By the end of the 21st century, projections suggest that the snow water equivalent (SWE) in this region will decline by an additional ∼ 50 % (Siirila-Woodburn et al., 2021).The decreased snow water supply and increased demand motivate new innovations for SWE measurement and modeling (e.g., Lettenmaier et al., 2015).Ground observations of SWE, such as those from snow telemetry (SNOTEL) sites or manual measurements performed during snow surveys, provide useful information in the context of a historical record.However, as a strategy for adapting to changing snow-climatological conditions, building the relationship between these observations Published by Copernicus Publications on behalf of the European Geosciences Union.
T. G. Meehan et al.: Spatially distributed snow depth, density, and water equivalent and snow distribution patterns across watersheds requires innovative spatiotemporal datasets and for snow hydrology models to advance.
In this pursuit, NASA's snow experiment (SnowEx; 2017-2023) campaign tested a suite of remote-sensing instruments with the potential to measure global SWE if deployed on a future satellite platform (Marshall et al., 2019).The work presented here was part of SnowEx and was designed to expand the spatial scale over which snow depth and density can be observed and reliably extrapolated.Our work provides a validation dataset for SnowEx SWE retrieval methods and yields new insights into the spatial patterns and driving factors of snow density at Grand Mesa, Colorado.Spaceborne snow depth estimates have been obtained from passive microwave sensors (Tedesco et al., 2010), Sentinel-1 radar returns (Lievens et al., 2019(Lievens et al., , 2022)), high-resolution satellite stereo imagery (Marti et al., 2016;McGrath et al., 2019), and light detection and ranging (lidar) with ICESat (Treichler and Kääb, 2017) and ICESat-2 (Hu et al., 2021;Deschamps-Berger et al., 2023;Besso et al., 2024).Lidar and photogrammetric techniques can measure snow depth by differencing repeated acquisitions during periods with and without snow cover (e.g., Deems et al., 2013).Because it has the advantages of greater spatial resolution and flexible scheduling to target acquisitions during periods of interest, airborne lidar constitutes a prominent method for estimating snow depth and is flown operationally for integration with hydrologic modeling at the catchment scale (Painter et al., 2016;Hedrick et al., 2018).Regardless of the choice in snow depth retrieval, an estimate of snow density is required to convert snow depths to SWE, and bulk density often provides the greatest source of uncertainty in SWE estimates, especially in deeper snow (Raleigh and Small, 2017).
Excavating and weighing snow samples of a known volume remains the state-of-the-art approach for measuring snow density, even though this labor-intensive work limits the number of possible observations.Because snow depth varies more in space than density (e.g., Elder et al., 1991;Sturm et al., 2010;López-Moreno et al., 2013) and depth measurements may be collected more rapidly, density is observed far less frequently (e.g., Rovansek et al., 1993;Elder et al., 1998).As a result, snow sampling strategies tend to be too coarse to examine the 10 0 -10 3 m scale spatial variability of snow density (e.g., Fassnacht et al., 2010), and the spatial nature of snow density remains largely unknown.
Often, empirical models are used to spatially distribute the density in SWE estimates.Linear regression models developed using snow depth alone are often unsuccessful because the snow load only has a linear effect on bulk density, while snow type characteristics (e.g., faceted crystals versus rounded-grain snow) can have an exponential effect (Sturm and Holmgren, 1998).Successful regression models parameterized by snow depth have been split into elevation and month-of-year classes (Jonas et al., 2009), accumulation and melt seasons (Hill et al., 2019), or day-of-year and snow cover classification (Sturm et al., 2010) and account for the effects of snow depth and snow age on density (McCreight and Small, 2014).Snow density often depends on environmental (i.e., slope, aspect, elevation, and vegetation) and climatological (i.e., precipitation, solar radiation, temperature, and wind) factors (Meløysund et al., 2007), which makes these constituents candidate features for predicting distribution patterns (e.g., Winstral et al., 2002).Machine-learning (ML) approaches utilizing environmental or climatological features (e.g., Elder et al., 1998;Wetlaufer et al., 2016;Broxton et al., 2019) are often distributed over vast areas with little of the validation or consideration of the underlying physical processes required to gain an acceptable level of model confidence.
Snow density can also be distributed with process-based snow models, which may account for changes in bulk snow density due to new snowfall, metamorphism, and compaction.The representations of snow densification range in complexity, with some models utilizing time-dependent compaction curves and other models representing snow compaction dynamically as a function of snow viscosity and overburden pressure (Essery et al., 2013).Dynamic models offer more consistent and accurate characterizations of snowpacks; however, even for a single physics-based model, performance in snow density simulations varies across snow climates and watersheds (e.g., Marks et al., 1992;Lv and Pomeroy, 2020).The choice of snow density model (empirical or physical) produces differences in spatial distributions and basin mean estimates of snow density (Raleigh and Small, 2017).
Despite numerous techniques for modeling snow density, few studies characterize spatial variations in snow density and the underlying processes driving variability, largely due to limited density datasets.The labor-intensive nature of in situ observations severely limits spatial analyses, requiring the development of broad-scale snow density retrieval.Relationships between snow density, dielectric permittivity, and radar signals (e.g., Tiuri et al., 1984) provide the radar-retrieved snow density.Yet, many radar remote-sensing retrievals require constraints on the snow depth, density, stratigraphy, and microstructure to be reliable now (Tsang et al., 2022).Our research utilizes ground-penetrating radar (GPR), lidar, and ML to define an approach to map snow density at resolutions appropriate for air-and spaceborne remote-sensing calibration and validation.
Ground-penetrating radar records the amplitude and travel time of each of a series of echoes from short-pulse electromagnetic waves as an image in range-time and position coordinates.Provided the snow depth is constrained, GPR analysis can estimate the snow density, or, by exploiting a ray path function of travel time versus antenna separation (offset), the snow depth and density can be estimated independently (e.g., Griessinger et al., 2018;Meehan et al., 2021).By combining snow depths from drone-based aerial photogrammetry or lidar with GPR travel times, snow density has been estimated  (Hiemstra et al., 2021).(c) Inset map of the contiguous US which identifies the location of Grand Mesa, Colorado.Landcover classification data were accessed from the 2016 National Land Cover Database (Dewitz, 2019).Slope hillshade data were accessed from the USGS 3D Elevation Program (Lukas and Baez, 2021).Cartographic boundary files were accessed from the Census Bureau's Master Address File/Topologically Integrated Geographic Encoding and Referencing geographic database (US Census Bureau, 2020).The geographic coordinate projection of these maps is UTM zone 12 N; EPSG code 32612.
along 100 m scale transects and then analyzed as a time series to understand the densification process (McGrath et al., 2022;Valence et al., 2022;Bonnell et al., 2023) and explore extrapolation across the study-plot scale (Yildiz et al., 2021).
Our work leverages airborne lidar snow depths in tandem with GPR two-way travel times (TWTs) to facilitate density estimates.These data then become input to multiple ML regression approaches to develop and compare spatially continuous estimates of bulk snow density and SWE across the entire lidar domain.Sensitivity testing of regression models informed the model repeatability and forcing processes for spatial density patterns at Grand Mesa, Colorado, USA.As part of this workflow, we developed a reliable automated radar coherence approach for automatically interpreting the TWTs needed to retrieve snow density.This work highlights interactions between snow, terrain, vegetation, and wind in the densification process as well as the importance of careful ML model parameterizations and validation approaches.Our work addresses the need for high-accuracy distributed density measurements as assimilation data for parameterizations of snow densification to reduce runoff model uncertainty.
Additional knowledge of the spatial patterns and predictors of density may improve the calibration, validation, and parameterization of radar remote-sensing SWE retrievals.

Study area
Grand Mesa, Colorado, is a high-elevation subalpine plateau with an average elevation of ∼ 3200 m and an area of ∼ 1300 km 2 .Grand Mesa has a cold and dry continental snow climate, low relief, and vegetation cover that varies from shrub steppe and subalpine meadow to dense conifer forest.These factors, along with the proximity to a regional airport, make Grand Mesa a near-ideal study area for evaluating airborne snow remote-sensing techniques and developing many challenging snow remote-sensing advancements (e.g., Boyd et al., 2022;Singh et al., 2023).
The Grand Mesa NASA SnowEx Intensive Observation Period (IOP) spanned 27 January-12 February 2020.During that time, more than 150 snow pits were excavated and https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024 T. G. Meehan et al.: Spatially distributed snow depth, density, and water equivalent nearly 38 000 in situ snow depth measurements were collected.Snow pits were distributed within forested and open areas along the swaths of the airborne remote-sensing campaign flight lines (Fig. 1).
As part of the SnowEx campaigns at Grand Mesa, five meteorological stations were installed between 2016 and 2017 and operated through the 2021 water year (Houser et al., 2022).Of these sites, the 3 m elevation wind speed and direction data measured at Mesa Middle (MM) and Mesa West (MW) were examined as the validation for snow transport potential and to quantify differences in exposed and sheltered terrain (Appendix C1).The MW station was in exposed western terrain of the mesa, 350 m west of the study domain boundary (Fig. 1).The MM station is sheltered within a dense stand of conifer trees 18.7 km east of the study domain boundary.

GPR data acquisition
Two GPR instruments were operated during the first week of the Grand Mesa IOP.To acquire data within forested areas of central Grand Mesa, a conventional L-band GPR was pulled by ski during 30 January to 1 February and on 5 February (Webb, 2021).This unit was equipped with a global positioning system (GPS) receiver with 2.5 m horizontal accuracy.In open areas, we deployed a multi-polarization L-band GPR fastened within a sled that was pulled by a snowmobile at approximately 3 m s −1 in the open areas of the central and south regions of western Grand Mesa on 28 and 29 January and 4 February 2020 (Meehan, 2021).The snowmobile was driven along the edges of forested stands but could not travel through densely treed areas.The multichannel Lband GPR was configured with one transmitting antenna and two receiving antennas that were oriented parallel (H) and orthogonal (V) to the transmitter (H).The transmit and receive antennas were separated by 25 cm.Using this GPR configuration, we simultaneously acquired the radar imagery in co-and cross-polarizations (HH and HV).A global navigation satellite system (GNSS) receiver with approximately 1 m horizontal position uncertainty was located on the snowmobile 5 m away from the GPR array.We applied a geometric correction to relocate the coordinate positions to the antenna midpoint of each channel.The GPR data were acquired within a few meters of, but not directly beside, the snow pit walls, which necessitated a radius for pairing retrieved or modeled data with validation observations.The GPR systems were operated continuously, collecting approximately 30 traces per second, given the duration of the time window for each trace (30 ns), the sample interval (0.1 ns), and the number of stacks acquired (2).Due to differences in the travel speed, the spatial interval of the GPR traces collected via snowmobile is approximately 10 ± 1 cm, while the interval for traces collected by ski is 5 ± 1 cm.We used piecewise cubic Hermite interpolating polynomials (Kahaner et al., 1989) to fix a geolocation to every acquired trace, as the GPS acquisition rate was 1 Hz.Throughout this week, we acquired 144 km of quasi-gridded and spiraled snowmobiledriven radar transects and 16 km of skied spiral transects in the forest.Spiral transects were coincident with depth measurements.We used a 4.5 km by 3.5 km portion of the snowon lidar acquisition to bound the GPR transects (Fig. 1) and omitted any transects acquired beyond the lidar boundary.

GPR data processing
Multi-polarization radargrams were processed using an automated routine (Meehan, 2024).We applied a frequencywavenumber (F-K) filter as a 2D bandpass filter (Kim et al., 2007).Time-zero correction was performed automatically using the modified energy ratio first-break picker (Wong et al., 2009).We removed coherent noise by subtracting the median trace from the radargrams (Kim et al., 2007).The trace amplitudes were corrected for spherical divergence by applying t-squared scaling as a signal gain function (Yilmaz, 2001).In a random-noise removal step, we then applied edge-preserving smoothing (Kuwahara et al., 1976).This routine emphasized the continuity and amplitude of the ground reflection, which benefited the method for automatically picking the travel times.The GPR data within forests were processed with a bandpass filter, time-zero correction, and background subtraction prior to manual interpretation using a semiautomatic algorithm and are available through the National Snow and Ice Data Center (Webb, 2021).The slower-paced data acquisition by ski improves the quality of the radargram, which benefits the tracking of the ground surface in the more variable forest environment.

Multi-polarization coherence for automatic two-way travel-time determination
The rough ground depolarized the L-band radar signal, and thus we used the coherence between the co-and crosspolarized channels as a filter that illuminates the ground reflections and removes the planar reflections of the snow stratigraphy.We paired the co-and cross-polarization radargrams into shot gathers, which are the bins of traces that share the same transmitter location.The automatic traveltime pick is determined by maximizing the coherence between the co-and cross-polarization shot gathers.To measure the coherence for each pair of traces, we applied the unnormalized cross-correlation sum: which is half of the summed difference between the energy of the stacked traces and the energy of the input traces (Neidell and Taner, 1971).The calculation in Eq. ( 1) is performed in a sliding window over N = 11 samples, which is evaluated at every sample (j ) of the GPR signal (S i,j ) for channel i (M = 2).The HH-HV coherence (C HH-HV ) at each shot location is then normalized by the maximum coherence, . (2) Small (one-wavelength) offsets introduce waves that have an approximately normal incidence with respect to the reflection horizons, such that the nonlinear effects of travel-time moveout are negligible and snow depth can be directly retrieved from the measurement of TWT.Because the offsets are equal, the travel times to the ground for each channel are equal to within a small error (due to the variability of the ground surface inside the radar footprint), and therefore the two channels sum coherently.
We automatically chose the travel time with the maximum coherence of each trace and subtracted 1 ns (1/2 wavelet) to estimate the first break of the reflection (Booth et al., 2010).We then applied a median filter to remove outliers and reviewed the automatic picks for any systematic errors.
2.4 Observed, derived, and explanatory data 2.4.1 In situ measurements Snow pit observations and manual depth probe measurements were collected throughout the 27 January to 12 February 2020 IOP to serve as a validation for the SWE and snow depth retrieved by airborne remote sensing.Snow pits were measured for the snow depth, density, water equivalent, temperature, wetness, liquid-water content, grain size, and stratigraphy (Vuyovich et al., 2021).Snow density (ρ s,pit ) was measured continuously every 10 cm from the snow surface to the ground using a 1000 cm 3 wedge sampler, with duplicate samples.If the difference between the two measurements at a given depth exceeded 10 %, the density was sampled a third time, and bulk density was then calculated by averaging all the measurements for each snow pit.Because the density snapshot we retrieved is valid for the time of the lidar flight, we corrected the measured density to 12:00 on 1 February using densification rates determined by linear regression for both open and forested areas.Liquidwater content was estimated by combining the density and in situ measurements of dielectric permittivity in an empirical formula, which showed that the snowpack remained almost completely dry throughout the IOP (Webb et al., 2021).Snow depth measurements (h s,probe ) were collected using geolocated probes (± 3 m spatial accuracy) along spiral transects (∼ 60 m radius) centered around pits (Hiemstra et al., 2020).

Lidar snow depth
Snow depth (H s,lidar ) was estimated from repeated airbornelidar point cloud surface elevations of snow-free and snowcovered terrain (e.g., Lague et al., 2013).The Airborne Snow Observatories (ASOs) performed the snow-free acquisition on 26 September 2016 (Painter et al., 2016;Painter and Bormann, 2020) and NV5 Geospatial (formerly Quantum Spatial Inc.) acquired snow-covered surface elevations during the IOP; both had a point density of approximately 20 points m −2 .We selected the 1-2 February 2020 flight to minimize temporal differences with the GPR and resulting errors due to snow redistribution and densification.We transformed the 2016 snow-free vertical datum into NAVD88/Geoid 12B (the same as the 2020 snow-on) using NOAA VDatum 4.3 software (NOAA, 2021).Then, we applied the point-cloud-differencing method to estimate snow depth on a 1 m grid (Appendix B1).Negative snow depth values were filtered as no-data values.After computing the snow depth, the 3 m ASO bare-earth and vegetation data products were resampled using the nearest-neighbor approximation to the 1 m resolution of the snow-covered SnowEx 2020 lidar acquisitions, and the coordinate system was transformed from UTM zone 13 N to UTM zone 12 N.As a comparison between our lidar snow depths and data processed using raster differencing, we used the 1-2 February 2020 ASOacquired snow depths and upscaled H s,lidar to 3 m using the nearest-neighbor method.

Lidar-GPR-estimated density
We combined the lidar snow depths with the GPR TWTs to calculate the radar wave velocity, which is only a function of density in dry snow.We applied a k-d tree searcher (Bentley, 1975) to find the lidar coordinates within a 1 m radius of the GPR TWTs.We then used the median values of the TWTs within a 1 m radius of these coordinates to interpolate to the lidar grid.
The average electromagnetic wave speed in the snowpack was estimated using for each of the coincident lidar snow depths (H s,lidar ) and GPR two-way travel-times (τ ).We then related the electromagnetic wave speed to the dry snow density using the Complex Refractive Index Method (CRIM; Wharton et al., 1980): The CRIM equation relies on the known wave speeds of the pore space (v a = 0.3 m s −1 ) and ice matrix (v i = 0.169 m ns −1 ), the measured bulk wave speed of the snowpack (v s,lidar-GPR ; Eq. 3), and the density of ice (ρ i = 917 kg m −3 ) to determine the dry snow density (ρ s,lidar-GPR ; Eq. 4).

Wind and terrain exposure
Wind data were examined from 1 October 2019 through the end of the SnowEx IOP on 12 February 2020.Hourly air temperature data parameterized an empirical relationship to determine the threshold for snow-transportable wind speed (Li https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere , 18, 3253-3276, 2024 T. G. Meehan et al.: Spatially distributed snow depth, density, and water equivalent and Pomeroy, 1997).For values exceeding this threshold minus the 95 % confidence interval, we then determined the median wind speed and direction for snow transport (Fig. S4b).
We utilized the maximum upwind slope (Sx) and wind factor (Winstral et al., 2002;Appendix C1) as parameters explaining the patterns and processes captured by the ML regression ensemble, rather than incorporating this information as model predictors of snow density.To validate the GPR-lidarestimated training data and the modeled results, we calculated the correlation between the model input and output and the Sx and wind factor rasters for all wind directions.

Spatial scales of variability for snow depth, travel time, density, and SWE
We examined the differences in snow properties between forested and open areas using generalized relative semivariograms (Isaaks and Srivastava, 1989).The generalized relative semi-variogram describes the average percent variability, relative to the mean, as a function of separation distance between observations.To estimate the spatial variability of the snow depth, TWT, density, and the resulting SWE of the 1 m gridded data along the radar transects, the experimental variograms were first calculated in 1 m bins up to a 250 m lag and then fitted with exponential models via least squares to estimate the range, sill, and nugget parameters (e.g., Cressie, 1985).We used an exponential variogram model for which the correlation length is equal to 3 times the range parameter.We created 250 realizations of the experimental variogram calculation using Monte Carlo simulation with 10 % random subsampling to assess the means and standard deviations of the variogram parameters (Efron and Tibshirani, 1986).
2.6 Modeling of the spatial snow density

Machine-learning model ensemble
To distribute the spatial observations of average snow density to areas without GPR observations, we tested three regression techniques: multiple linear regression (MLR; Andrews, 1974; Appendix A1.1), random-forest regression (RF; Breiman, 2001; Appendix A1.2), and artificial-neuralnetwork regression (ANN; Jain et al., 1996; Appendix A1.2).We examined the ∼ 16 km 2 area of the lidar domain, which closely bounded the extent of the GPR survey.A set of normalized predictor variables, notated with capital lettering, were developed using the elevations of four lidar rasters (the bare-earth elevation (Z g ), snow-covered elevation (Z s ), snow depth (H s ), and vegetation height (H veg )); the aspect, slope, and x and y derivatives of the elevation rasters (excluding H veg ); and the distance to the nearest vegetation ≥ 0.5 m (S veg ).Aspect rasters were transformed by the cosine to remove the wrapping ambiguity around north.We smoothed the elevation, vegetation height, and snow depth rasters using a median filter with a 5 m × 5 m window and the derivatives of these rasters (slope, aspect, dx, and dy) with a 25 m × 25 m window.Regression models were trained on the lidar-GPR-estimated snow density using cross-validation and were applied to the surrounding terrain.By retraining the model architectures on random subsets of data, 50 model ensembles were generated and then averaged for both RF and ANN regressions.The model hyperparameters were developed such that the variance of the predictions in pixels where training data exists matches that of the predictions.The appropriate hyperparameterization coincided with an R 2 of approximately 0.8.An ML snow density ensemble (ρ s,Ens ) was composed by averaging the MLR, RF, and ANN outputs.For more detail on the model hyperparameterization and predictor importance, see Appendix A.

Gaussian random field model
To serve as a baseline for model assessment, a Gaussian random field snow density model was synthesized from the statistics of the in situ density measurements and the correlation length of snow density estimated via variogram analysis.
Provided that the empirical variogram function was available, a covariance matrix was determined between all pairs of points in the ∼ 16 km 2 domain.Using Cholesky decomposition, the large covariance matrix was efficiently inverted to determine a matrix of weights with the desired covariance properties (Vecherin et al., 2022).The synthetic snow density model was then generated by multiplying a normal random vector with zero mean and the same standard deviation as the in situ observations by the weighting matrix and adding the mean value of the density observations.

Distributed snow water equivalent and uncertainty
Multiplying H s,lidar by ρ s,Ens yielded SWE (b s,lidar-Ens ) distributed throughout the lidar domain.As a benchmark example drawn from in situ sampling, we also distributed SWE using the average snow density (276 kg m −3 ), the average density of both open (280 kg m −3 ) and forest (257 kg m −3 ) areas, and the Gaussian random field model (b s,lidar-Rand ).See Appendix B3 for additional details.We upscaled b s,lidar-Ens to 50 m resolution using the nearest-neighbor approximation for comparison with the 50 m ASO SWE.
Using simple linear regression, we modeled the snow density errors for ρ s,Ens .No correlation between error and snow depth was found, and the RMSE (11 cm) was used to estimate the random error.Using the random errors in snow depth, linear errors in density, and linear error propagation, we estimated the uncertainty in b s,lidar-Ens to first order (Raleigh and Small, 2017).Appendix B4 has further information on SWE uncertainties regarding sampling errors estimated from in situ measurements.

Snow depth
The lidar-derived snow depths show an overall increasing trend from west to east in addition to smaller-scale patterns near vegetation, with deeper snow around the perimeter of treed areas and shallow snow on the ground beneath tree canopies (Fig. 2).This pattern is consistent with previous snow depth distribution studies of Grand Mesa (e.g., Mc-Grath et al., 2019).The mean lidar snow depth for the entire domain is 92 cm with a standard deviation of 18 cm.In open areas (H veg < 0.5 m), the mean lidar snow depth is 96 ± 15 cm, while in the forest (H veg ≥ 0.5 m), the mean lidar snow depth is 79 ± 23 cm.In validation snow depth observations (h s,Pit and h s,Probe ), the average snow depth over the lidar domain is 95 ± 16 cm (R 2 = 0.61, RMSE = 11 cm, ME = 0 cm).Lidar-and GPR-estimated snow depths within the open domain and those within the forested domain are also compared to in situ snow depths (Table 1).In snow depth comparisons between H s,lidar , H s,ASO , and H s,Probe , the two lidar processing methods showed similar correlations (R 2 ≈ 0.6) and root-mean-square errors (RMSE ≈ 12 cm).However, H s,ASO (86 ± 16 cm) underestimates snow depth by 7 cm, while H s,lidar (93 ± 16 cm) is unbiased on average (Fig. S1 and Table S1 in the Supplement).

GPR travel time
Ground-penetrating radar travel-time data analyzed at crossover locations exhibited a root-mean-square deviation of 1 ns with a bias of 0 ns, and no systematic bias between https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024 the two GPR instruments was found.Approximately 90 % of the travel-time data applied in this work were automatically determined using the coherence method, where less than 1 % of the automatic picks required manual correction.To illustrate this, automated picks are overlaid on the radargrams of a 900 m long transect in Fig. 3.The resulting TWT data produced from this method and used in this study are available through the National Snow and Ice Data Center (Meehan, 2021).

Lidar-GPR-estimated density
The lidar-GPR-retrieved average snow density shows repeatable structure at the many crossover locations and greater variability in the open terrain than areas sheltered by forest canopies (Fig. 4).The integrated lidar and GPR data resolve lower spatial frequency patterns than the snow pit observations, which are sparse and have limited spatial support.When compared to snow pit observations, the relative RMSE among the 37 snow pits that are within 12.5 m of the GPR transects is 35 kg m −3 or 13 % (Table 2).The ρ s,Pit and ρ s,lidar-GPR data are both normally distributed, as evidenced by a Z test (Appendix B3) with overlapping standard deviations.The maximum upwind slope and wind factor evaluated on GPR transects show the strongest correlation (R = −0.45 and R = 0.48, respectively) in the 225 and 220°directions (Fig. S2 and Table S2).

Spatial variability of the lidar snow depth, GPR travel time, density, and SWE
The generalized relative semi-variogram allows us to examine the differences in the length scales of variability among depth, density, SWE, and TWT within the forested and open areas of Grand Mesa (Fig. 5).Table S3

Density modeled using machine-learning regression and a Gaussian random field
Using supervised ML regression, three models were generated from lidar information (Fig. S3a-c).Using prior information from the in situ snow pit observations a randomly distributed density was synthesized (Fig. S3d).The mean of the regression-based ensemble was taken to generate ρ s,Ens (Fig. 6).Generally, the regression models predict higher snow density in the open and exposed areas than in areas that are protected from the wind by trees.Each of these five models are evaluated against ρ s,lidar-GPR and ρ s,Pit (Table 2).The ML regression snow densities are generally uncorrelated (R 2 ≈ 0.05) and have an RMSE of ∼ 10 % with snow pit observations.The spatial similarity of these models is presented in Appendix A3.

Model representation of wind, terrain, and vegetation effects
Maximized correlations between the terrain exposure parameters for all wind directions and the ML regression modeled snow density agree with the median wind direction that is able to transport snow (6.4 m s −1 at 200°; Fig. S2).The wind speed and direction data from the Mesa West meteorological station (Houser et al., 2022) are presented for all of the time period from 1 October to 12 February (Fig. S4a) and the time period where the wind was strong enough to transport snow (Fig. S4b).Upwind-slope and wind-factor parameters calculated for a 200°wind direction are provided for reference (Fig. S5).No correlation was evident between these wind exposure parameters and the Gaussian-random-fielddistributed snow densities (Table S2).

Spatially distributed snow water equivalent
SWE distributed within the ∼ 16 km 2 domain (Fig. 7) underestimated the snow pit average SWE by 20 mm (R 2 = 0.57; RMSE = 46 mm; Table 3; Fig. S6a-c).Upscaling b s,lidar-Ens to 50 m resolution resulted in decorrelation (R 2 = 0.03) and increased error (RMSE = 60 mm), and the bias remained nearly the same (−18 mm; Fig. S6d-f).ASO SWE (b s,ASO ; 50 m resolution) had similar errors (RMSE = 57 mm) and bias (−21 mm) and a low correlation (R 2 = 0.1) to measurements (Fig. S6g-i).The 50 m b s,lidar-Ens had a mean SWE and standard deviation of 245 ± 33 mm, while b s,ASO was 236 ± 45.The average snow density of b s,ASO was 274 kg m −3 (versus a value of 264 kg m −3 for ρ s,Ens ).Distributing SWE using the average measured value led to a lower bias (−9 mm) and root-mean-square error (38 mm) than b s,lidar-Ens .Using a constant density causes the SWE spatial patterns and data correlation to be driven solely by snow depth.Tested against the Gaussian-random-distributed density (R 2 = 0.49), the ML-regression-modeled densities explained more variation in the distributed SWE estimates (R 2 = 0.56) -about as much as snow depth alone.For comparison, b s,lidar-Ens is presented alongside H s,lidar and ρ s,Ens in Fig. S7.https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024     H s,lidar snow depths were computed from point cloud differencing rather than raster differencing.The relative accuracy of the snow depth measurement was estimated at 7 cm (based on the maximum standard deviation of the point cloud differencing method), which agrees well with previous lidar error assessments using this approach (Hojatimalekshah et al., 2021).The absolute accuracy of the snow depths (11 cm) was determined by comparison to validation snow depth measurements, which shows 0 cm of bias on average over all measurements (Table 1).On plowed roads, we observed snow depths ranging between 0 and 5 cm (Fig. S8).

Lidar-GPR co-registration
We estimated the horizontal accuracy of the multipolarization GPR positions, which had a clear-sky view and a GNSS receiver, at ± 1 m.The GPR system operated within the forest stands had lower GPS fidelity, which we conservatively estimate at ± 3 m.Ground validation showed less than 1 cm horizontal position uncertainty within the lidar point cloud.We found that errors in the co-registration of the lidar and GPR data are the leading source of error in the estimated densities.The overall accuracy of the spatial registration between the lidar and GPR varies on the order of a few meters.Sensitivity analysis showed how measurement errors propagate into the lidar-GPR-measured snow density (Appendix B2).At the 1σ confidence interval, we found that measurement errors are on the order of 10 cm for lidar and 1 ns for GPR, which may translate into errors in the density estimate of 150 kg m −3 or greater.Variogram analyses show that beyond 10 m, these errors are spatially uncorrelated and can be treated with random noise filtering.After median filtering and interpolating through outliers, error estimates reduced to 30 kg m −3 (Appendix B2).

Spatial and temporal support of density measurements
Measurements accumulated over 12.5 m distance introduce inherent variability on the order of 10 % (Sect.3.4).We found that the expected variability among co-located ρ s,lidar-GPR is approximately 2 %, which is consistent with replicated in situ density observations (ρ s,Pit ).The average density sampled from each of the snow pit columns (within 1 m) shows high repeatability, with a root-mean-square deviation of 2.5 %.We observed linear temporal trends in snow densification of 0.07 kg m −3 h −1 in forested areas and 0.13 kg m −3 h −1 in open areas.These trends were removed and were not considered within uncertainty analyses.

Multi-polarization GPR
This work advances the utility of GPR for seasonal snow applications by resolving the spatial snow density and SWE through the integration of remotely sensed lidar and GPR observations.Grand Mesa is a good site for testing our approach of combining lidar and GPR for density and SWE retrieval, yet it presents many challenges for GPR analysis because of the abrupt discontinuities along reflection horizons due to vegetation and boulders on the ground surface.By exploring the effects of depolarization on L-band GPR signals, we developed a new, automated GPR processing workflow that reliably identifies the ground surface beneath the snow cover.This advance encourages the collection of large multipolarization GPR datasets for operational use by removing the subjectivity involved in the GPR post-processing and interpretation and alleviating the labor of manually interpreting radargrams through an objective function.

SWE uncertainty
Our work characterized the measurement uncertainties and the resulting SWE uncertainty in pursuit of the goal of 10 % uncertainty in global SWE estimation (National Academies of Sciences, Engineering, and Medicine, 2018).We found that GPR TWT and lidar snow depth contribute approximately equal uncertainties to the retrieved snow density.The uncertainty in lidar snow depth tends to vary spatially and is dependent on landscape characteristics such as slope and vegetation (Deems et al., 2013).However, our evaluation of snow depth in forested and open areas did not suggest that lidar snow depth errors were greater beneath the tree canopy (Table 1).The relative SWE uncertainty resembles the snow depth distribution, with shallow snow within forest stands yielding greater relative uncertainty.The absolute SWE uncertainty resembles the snow density patterns, especially in areas of anomalous snow density, where the modeled errors are greatest.Our uncertainty analysis (Fig. S10) suggests that, even when using innovatory measurements from airborne and ground-based sensors, 10 % uncertainty is difficult to achieve.However, the joint lidar-GPR technique shows promise for SWE remote-sensing calibration and validation and demonstrates advantages over contemporary SWE retrievals at plot to forest-stand scales.

Geolocation errors in sensor fusion
Though the signals from lidar and GPR instruments are repeatable and coherent, the leading source of error in our density measurement is spatial misalignments (potentially sourced from geolocation inaccuracies, point cloud to raster processing, and coordinate transformations) that are on the scale of the 1 m resolution data products.In some locations, the co-registration between the two instruments may be nearly exact, and the resulting error will be low.We found by cross-correlating the GPR and co-located lidar snow depth transects that misalignments of approximately 1-5 m are possible.Utilizing 3 m resolution lidar snow depths paired with GPR TWTs for snow density retrieval did not reduce the errors.To evaluate how spatial misalignments impact the training data, the predictor data, and the regression model output, and to estimate the uncertainties introduced upon integrating the cross-platform sensor data, we created multiple sets of training data by effectively perturbing where lidar-GPR transects are aligned via cross-correlation lagging and introducing common practice mistakes in the sensor integration, such as mixing the geographic coordinate system of the data between NAD83 and WGS84.We found that perturbing the sensor integration introduces less than 1 kg m −3 error into the modeled density on average (up to 5 kg m −3 in forest stands), that outlier filtering is robust to sensor integration errors, and that this error is small relative to the overall SWE uncertainty.

Spatial variability in snow hydrological properties
Measurements retrieved from GPR profiles permitted us to quantify the spatial length scales of variability in TWT, snow depth, snow density, and SWE.We determined that density measurements up to ∼ 100 m apart are correlated.These findings differ from a previous variogram analysis that found correlation lengths for snow density of less than 10 m at a smaller study site which limited the maximum lag separation to approximately 50 m (Yildiz et al., 2021).Variability is study-site dependent (Bonnell et al., 2023), and it may be that we have identified an additional longer, lower spatial frequency scaling of snow density.As it is a corollary to SWE, the two-way travel time in dry snow depends on both snow depth and density.We found that TWT and SWE consistently exhibited similar correlation lengths (∼ 100 m) and nugget variabilities in the forested (∼ 35 %) and open (∼ 15 %) areas.This finding supports TWT as an informer of spatial SWE variability.We found that snow depth and TWT reached comparable maximum variabilities in open arhttps://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024 eas (∼ 25 %), while depth variability in the forested areas (∼ 50 %) was greater than that of TWT and SWE (∼ 45 %).
In situ snow density observations have limited spatial support and tend to examine shorter length scales of variability than are expressed in distributed models or retrievable by remote sensing.As a random-sampling strategy was targeted for the IOP, the median distance between snow pit observations in our study is beyond the length scale of variability for snow density.The observations are therefore spatially uncorrelated.Snow density exhibited 4 % greater variability in the open areas than in the forests, indicating that wind exposure increased the variability and, conversely, shelter provided by terrain and vegetation tended to reduce the spatial density variability.The larger (roughly 10 m) spatial support of the lidar-GPR-estimated densities cannot directly sense subpixel correlation lengths and potentially missed a 0-5 m scale break that is more comparable to the spatial support of in situ density observations.Differences between representative observation scales may explain the weak correlation between the estimated density and the in situ measurements (R 2 = 0.01 in open areas and R 2 = 0.39 in forest stands; Table 2b).Our analysis of the correlation length of lidar snow depths generally agrees with the scale breaks identified in previous studies within forested and open areas (Deems et al., 2006;Marshall et al., 2006;Trujillo et al., 2009).We observed decreased correlation and increased error when upscaling the rasterized SWE to 50 m resolution, which suggests that evaluating the 50 m SWE with sparse point measurements may not be the most representative approach, and a greater than 20 % error can be expected due to spatial variability.

Density modeling using ML regression
The ML regression models developed from the lidar and GPR acquisitions during the SnowEx 2020 Grand Mesa IOP will likely have only weak predictive capabilities at other field sites, as they require model recalibration.The SWE predictions reported here represent a single snapshot in time of snow depth and density.The patterns in depth, density, and SWE may be characteristic for mid-winter dry-snow conditions, but other times of the year may exhibit different spatial patterns in all three (e.g., due to variable melt or liquid water during ripening).For each regression model, we identified the most important lidar features that are used to distribute density, and we found dependencies of predictor importance on model choice and architecture (Appendix A2).Vegetation height and proximity to vegetation greater than 0.5 m in height appeared prominent in the three regression models, whereas the dependence of snow density at Grand Mesa on elevation, slope, and aspect was weakened.We used a "kitchen-sink" approach to our regression modeling but found comparable accuracy in models using fewer parameters.Evaluations against snow pit density did not indicate an obvious best model choice, as so we averaged the regression models.
To capture the range of terrain features (i.e., elevation, slope, aspect, and forest attributes) that influence snow densification, one field campaign in the western US collected density measurements from 300 snow cores at 10-20 m intervals and 17 snow pits (Broxton et al., 2019).Based on these observations, bulk snow density was distributed at 1 m resolution using an ANN combined with the airborne-lidar-derived snow depth to estimate the SWE.Broxton et al. (2019) highlighted the importance of representing the broader landscape with distributed densities for estimating SWE by finding ∼ 30 % differences between the distributed estimates and observations from a nearby SNOTEL station.Elder et al. (1998) used a simpler, three-feature (net radiation, slope, and elevation, with an intercept) MLR model that was trained on density observations of five snow pits and averages of five snow core transects to predict the basin-wide average density and SWE.More recently, a similar study used a sampling strategy to represent unique classes of basin-wide physiography, acquiring ∼ 1000 snow core observations and using MLR and binary-classification tree models to distribute the density based on elevation and incoming radiation (Wetlaufer et al., 2016).The dependence of density on net solar radiation may explain the good performance of these models, whereas terrain parameters, such as slope and aspect, relate indirectly to radiation.
We tested model sensitivity to training and learned how many data are required for ML density estimation.Using approximately 30 000 lidar-GPR-derived densities (10 % of the total) from random subsets, we obtained density models that are statistically identical to those generated from the larger dataset.Though random sampling is not a practical method for GPR data acquisition and analysis, this exercise showed that the amount of GPR information required to train the model parameters is not as important as collecting data for a variety of landscape and snow-cover characteristics.

Model representation of snow, terrain, vegetation, and wind interactions
The lidar predictors were inspired by a theory that windterrain-vegetation interactions govern the snow distribution (Winstral et al., 2002).However, to keep the model design innate to lidar information, we did not include wind data or predictors such as the maximum upwind slope and wind factor.Instead, these wind parameters were utilized as a corroboratory metric for explaining spatial patterns retrieved by the lidar-GPR density and predicted in ML regression modeling.We support the idea that spatial patterns are representative of snow exposure or shelter provided by the topography and vegetation from the wind.We found that the prevailing wind direction capable of transporting snow at Mesa West generated the upwind-slope and wind-factor parameters that agree most strongly with the retrieved and ML-modeled snow densities.A larger correlation is observed for the wind factor than the maximum upwind slope, which suggests that wind shelter by vegetation, not only terrain, has an important quantifiable effect on snow density.The role of forest vegetation in snow density is evinced in this topographically simple environment because a large-scale topographic trend, such as one driven by elevation or aspect, does not saturate the signal of retrieved density.In validating wind effects on density, our approach is an explanatory simplification of the controls on snow density, which may be further impacted by forest stands.Effects such as the blocking of shortwave and emitted longwave radiation from forest canopies, the delivery of canopy-intercepted snow to the snowpack, or the loss of snow mass (and thereby the altered compaction) due to the sublimation of canopy-intercepted snow were unaccounted for (Bonner et al., 2022).

Value of pit observations for distributed SWE
The Grand Mesa IOP was one of the largest campaigns to examine the spatiotemporal patterns of SWE, and it provided a rich data source for snow density analysis.In most circumstances, only a few snow pits are dug, and uncertainties arise from the spatial sampling of the underlying density distribution.The distribution of density on Grand Mesa during the early February SnowEx 2020 campaign appears to be a random normal variable with a mean and standard deviation of 276 ± 21 kg m −3 , despite differences of roughly 25 kg m −3 between the average densities of forested and open areas.The large sample size of snow pits allowed us to accurately quantify the mean snow density for distributed SWE estimates.While the uncertainty in any measurement of density was found to be 2.5 % on average, we sought to quantify the degree of uncertainty in the SWE distributed based on the sampled population of snow density as a function of sample size.Our analysis suggests that 10 spatially random snow pit observations within the study domain are sufficient to reduce the median uncertainty in distributed SWE to within 10 ± 2 % (Appendix B4).Although the differences are marginal, we have shown that, on average, this simpler approach for distributing density represents the in situ observations more accurately than the SWE distributed using our modeled estimates of density (Sect.3.7), but it does not resolve any information about spatial patterns of snow density and is therefore not useful for understanding density patterns across the landscape.In situ snow campaigns targeting the average SWE require far fewer pits than are needed to resolve the spatial patterns.Snow pits are an invaluable source of calibration and validation observations, but they do not adequately scale spatially, they incur human errors and biases, and they are time intensive to sample.For example, a team of two can fully sample a 1 m deep SnowEx pit in 2 h, which, for the approximately 100 snow pits in the study area, amounts to ∼ 400 h of labor (excluding the time taken to quality control (QC), cu-rate the snow pit logs, and travel to and from the field site).The 160 km of GPR data used in this work required approximately 20 h to collect and an additional 20 h to QC the TWTs, which amount to ∼ 40 h or roughly a 90 % reduction in field labor, excluding the labor for the acquisition and processing of the airborne lidar.We must note the greater financial cost of obtaining GPR equipment and outsourcing airborne-lidar data collection.
However, densities estimated from GPR TWTs and lidar snow depths are objective, repeatable, and offer the spatial continuity and areal coverage needed to provide insights into the spatial patterns of density.The expense of acquiring airborne remote-sensing data is a crux of the technique, and it may not be feasible to fly entire catchments across the breadth of snow climates.Less-expensive techniques for estimating the SWE distribution, such as drone-based radar retrievals of dielectric permittivity (e.g., Valence et al., 2022), and in situ measurement campaigns combined with ML models (e.g., Wetlaufer et al., 2016;Broxton et al., 2019) should be utilized where appropriate and examined for physical representativeness.

Conclusion
We developed an innovative approach to estimate SWE across a ∼ 16 km 2 domain by evaluating GPR travel times for bulk snow density given a snow depth constraint and then extrapolating across the domain using machine learning.Our automatic and objective technique for interpreting radargrams reduces post-processing labor, which is a primary hindrance to the widespread use of GPR in snow science.We leveraged the lidar-estimated snow depth to solve for snow density along ∼ 160 km of GPR transects.From these alongtrack estimates, we calculated the length scales of variability for depth, density, and SWE and found that snow pit observations that follow the SnowEx 2020 Grand Mesa IOP sampling strategy are independent and unable to resolve spatial patterns < 150 m in scale.Radar travel time informed the dry-snow SWE variability better than either depth or density independently.Snow density distributed by machine learning revealed anomalies associated with localized terrain features and forest stands that shelter the snowpack from wind densification.Density spatial patterns show the best agreement in the direction of prevailing winds strong enough to transport snow, where roughly 60 % of the density variability in our single midwinter survey can be accounted for using a wind factor analysis.On average, distributed relative SWE uncertainty was less than 15 %.While our analysis suggests that measurements from 1 snow pit per km 2 may reduce the SWE uncertainty to within 10 ± 2 %, using such a sampling strategy would not resolve the spatial patterns and variability in snow properties.This pilot study provides a useful method for resolving explanatory spatial patterns in snow depth, density, and water equivalent with comparable uncertainty to in https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024 The MLR model has the form where y is the observed density along the GPR transects, X is a matrix with columns containing the normalized lidar predictors at the coordinates along the GPR transects, β is the vector of the regression coefficients which we seek to estimate, and represents the model residual.From the method of least squares, the multiple linear regression coefficients are estimated as Using cross-validation to assess the model accuracy and sensitivity, we estimated the MLR model parameters.We trained the model with 1000 Monte Carlo simulations by randomly sampling 90 % of the density observations and testing on the remaining 10 %.Additionally, we repeated this process, but we randomly sampled only 10 % of the data and tested on the remaining 90 %.In doing so, we created two sets of parameters that robustly span the parameter space.Using these regression coefficients, Eq. (A1) was computed to distribute the predicted densities.The modeled densities are insensitive to the training chosen for parameter estimation, as the root-mean-square deviation between the two models is less than 1 kg m −3 .

A1.2 Random-forest and artificial-neural-network regression
Whereas MLR models are relatively inflexible and model overtraining is not a concern, techniques such as randomforest and artificial-neural-network regression are highly tuneable and may overfit the data.Hyperparameters determine the model architecture, which is often designed subjectively or through an optimization process.The number of trees and the minimum leaf size of a tree were the hyperparameters adjusted for the random-forest method.Neural networks offer a greater hyperparameter space, allowing for the design of the number of and size of hidden layers, the neuron activation function, and model regularization.The machinelearning models were implemented using the MATLAB Regression Learner application, where it was determined that model hyperparameters which minimize the cross-validation mean-squared error overfit the data.Model overfitting was remedied by training ensembles of models with various hyperparameters.We calculated the averaged standard deviations of density data for each ensemble predicted along the GPR transects (σ train ) and elsewhere in the lidar domain (σ pred ) and the coefficient of determination (R 2 ) of the training data and prediction.The optimal hyperparameters which do not overfit the data were then determined by minimizing the objective function The ratio of the standard deviations asserts that an appropriate model will have similar variance throughout the modeled domain, as it penalizes overfitted data in the training locations and rewards the model which explains the data accurately.We found that the best model parameterizations that are not overfitted had R 2 ∼ = 0.8 with RMSE ∼ = 15 kg m −3 .The corresponding hyperparameters for the random-forest regression were 10 trees with a minimum leaf size of 200.The ANN architecture had two hidden layers, each with 50 neurons and hyperbolic tangent activations, and the regularization of λ = 0.015.

A2.1 Multiple linear regression
We applied the "kitchen-sink" approach because the MLR model that was trained using the lidar-GPR densities, which utilized every lidar predictor, exhibited the largest correlation (R 2 = 0.27) to the observations.However, various model parameterizations which utilized few parameters yielded equivalent accuracies.To assess the importance of the individual predictors, we assembled all combinations of 1 to 17 predictor models, solved the regression for each combination, and cross-validated against a test set of the lidar-GPR-estimated densities.We considered the optimal models to be the top 1 % of outputs, and we tracked which predictors composed each of these models.We identified the relative importance of each predictor (Fig. A1) by summing the number of appearances for a given predictor and dividing by the number of optimal models.Vegetation parameters and the east-west gradient of the ground surface elevation were featured in all the most accurate modeled predictions.Notably, snow density on Grand Mesa exhibits a weak dependence on elevation, aspect, and slope.

A2.2 Random-forest regression
The permutation accuracy importance was calculated to determine which of the lidar-derived predictor variables are  most valuable in predicting the response.The permutation importance was assessed by comparing the accuracy of the prediction for a given learner (tree) and then randomly permuting the predictor variable of interest and recalculating the prediction accuracy (Hapfelmeier et al., 2014).An important predictor will lose its predictive capability after a random permutation, while an unimportant predictor will be unaffected by the randomization.The prediction accuracy was calculated using "out-of-bag" observations that were excluded from the population used to build the decision tree (Breiman, 2001).The relative out-of-bag predictor importance for the ensemble of random forests generated using 10 trees with a minimum leaf size of 200 suggests that the slope of the snow depth, snow surface elevation, ground surface elevation, proximity to vegetation, and vegetation height are the five leading predictors of snow density (Fig. A2).

A2.3 Artificial-neural-network regression
Approaches which partition the weights between neural connections to determine the relative importance of the predictors within an ANN have a classically simple architecture, with the network presenting one hidden layer (Goh, 1995).This technique becomes obfuscated when applying an ANN with multiple hidden layers.To determine the relative importance of predictors within an ensemble of networks with two hidden layers, we simply multiplied the matrices of the weights connecting the input to the first hidden layer, the first to the second hidden layer, and the second hidden layer to the output.The greater the overall weighting that is assigned to a predictor, the greater the importance of the feature.This method suggests that vegetation height, proximity to vegetation, the north derivative of the ground surface elevations, the slope of the ground elevations, and the east derivative of the snow surface elevations are the five leading predictors of snow density (Fig. A3).

A3 Model similarity intercomparison
Visual inspection reveals apparent structural similarity among the three regression-based models.As a quantified model intercomparison, we applied the coefficient of determination measured by Pearson correlation calculated on a pixel-by-pixel basis.The structural similarity index (SSIM; Wang et al., 2004) is a normalized value between 0 and 1 that is defined by the image luminance, contrast, and standard deviation.We calculated the SSIM in 100 m radius kernels (comparable to the estimated correlation length of the density) as a second means of determining the similarity among the model ensembles.Capturing various structural length scales allows an examination of the model similarity across the correlation length.Table A1 overviews the R 2 similarity matrix.Nearly 50 % of the features observed in the MLR model are explained within the RF model and vice versa.The SSIM similarity matrix suggests that there is greater structural similarity with a larger spatial support (Table A1).The https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024 The point cloud differencing method operates directly on point cloud data by selecting core points from the point clouds and distinguishing between the reference point cloud (representing the ground) and the compared point cloud (depicting snow-covered surfaces).Utilizing a user-defined radius denoted as D, the algorithm fits a plane within a radius of D/2 around each core point and determines the normal vector of the plane.Subsequently, based on the core point and the associated normal vector, the algorithm fits a cylinder with a radius of D/2, oriented along the normal vector, with the cylinder axis passing through the core point.This cylinder intersects both the reference (ground) and compared (snow) point clouds, resulting in two sets of points.The algorithm then projects the points within each set onto the normal vector and calculates the average and standard deviation for each set.These values represent the average position and roughness, respectively.In the presence of outliers, the algorithm substitutes the mean and standard deviation with the median and interquartile range, respectively.Ultimately, the local distance between the average positions of the two sets provides the snow depth.However, in cases where surfaces are rough and the surface orientation does not align consistently with the normal direction, the measured distance (snow depth) uncertainty increases.This point cloud differencing method could prove beneficial for flat areas, given the effectiveness of the method on both rough and smooth surfaces.
The Cloud Compare Caractérisation de Nuages de Points (CANUPO) method was employed to distinguish vegetation from ground and snow returns following methods in Štroner et al., (2021).This process involves a combination of train-ing and classification which defines the dimensionality of point clouds (1D for a line, 2D for a plane, or 3D for a volume) around specific points within a sphere at various scales (radii).The expectation is that branches exhibit a more linear (1D) structure, leaves display 2D surfaces at small scales (e.g., centimeter scales), and trees manifest in 3D at larger scales (e.g., meter scales).Both snow and ground surfaces tend to exhibit more of a 2D nature at both small and large scales.The determination of dimensionality is based on principal component analysis within the algorithm.The combination of point dimensionality at different scales is used to define object categories and, in this instance, remove vegetation.

B2 Error analysis of lidar-GPR-estimated density
We conducted a sensitivity analysis to evaluate how the errors in radar travel time and lidar snow depth affect the estimated snow density.This involved establishing a level curve through the average snow values of the study area (mean density 276 kg m −3 , mean TWT 8 ns, and mean snow depth 96 cm) and applying perturbations to evaluate the density error (Fig. B1).Perturbations of up to ± 1 ns were added to the TWT and those of ± 15 cm were added to the depth.After the TWT and depth perturbations were applied, the densities were evaluated, and the mean (276 kg m −3 ) was subtracted from this result to measure the density perturbations.The error bars in Fig. B1 represent the lidar root-mean-square deviation evaluated by co-located depth probing (11 cm) and the RMSD of the GPR TWT crossovers (0.9 ns).At the 1σ level, errors of approximately ± 150 kg m −3 can be expected from this sensor integration method.
The combined measurement and geolocation errors in lidar-derived snow depths and GPR TWTs may translate into errors in the retrieved density that are larger than the range of densities observed in the snow pits.Upon filtering out outliers, the random error reduced to ± 30 kg m −3 and a meaningful density signal was retrieved.We chose the interquartile range of the lidar-GPR-retrieved densities as the threshold for determining outliers because the 25th and 75th percentiles envelop the range of snow densities observed in the snow pits.We then applied a 2D median window with a 12.5 m radius (chosen to extend beyond the correlation length of the errors) to smooth the densities and interpolate those at the outlying locations.
Although we only included GPR observations within 100 h of the 1 February lidar flight, seeking a potential bias related to the effects of densification and redistribution, we regressed ρ s,lidar-GPR against hours elapsed prior to and after 1 February.Separate linear trends were identified in the forested and open regions traversed with the GPR.As conducted for the snow pit density observations, the trends were centered on 1 February and removed from the retrieved density observations.Figure B1.Perturbations were added to the mean values of 8 ns for TWT and 96 cm for depth, and then the density was evaluated to estimate potential errors resulting from sensor integration (colored area and contours).The error bars represent the RMSE of the lidar (11 cm), evaluated by probing, and the RMSD of the GPR TWT crossovers (0.9 ns).At 1 standard deviation, combined snow density errors of ± 150 kg m −3 can be expected from sensor integration, which are reduced to within ± 30 kg m −3 by outlier filtering.

B3 Statistics of the in situ, lidar-GPR-retrieved, and modeled snow densities
To show that the retrieved and modeled densities are within the range of measurements observed in the snow pits, we provide the distributions of these three datasets for the entire study area domain in Fig. B2.The means of the distributions overlap within the standard deviations of the datasets.The lidar-GPR-retrieved densities suggest a broader distribution of densities than those observed in the pits or modeled.Sampling biases may explain the small disagreement between mean values.The sample size and spatial representation of each dataset vary by many orders of magnitude.The distributions are unequally represented by vegetation class, as 18 % of the snow pits (17 of 96), 23 % of the modeled domain (3 665 343 of 15 753 500), and 7 % of the GPR transects (19 978 of 278 627) were located within the forest stands.However, other than a bias of -10 kg m −3 , the distribution of the modeled estimates closely resembles that of the snow pit measurements.Z tests confirm that the snow density data are normally distributed about the mean and the standard deviations are listed with high confidence.

B4 Sample uncertainty of density and SWE
To estimate the uncertainty in average density due to the sample size and to propagate this uncertainty in terms of the SWE, we conducted Monte Carlo simulations by randomly subsampling density observations.Utilizing 1000 Monte Carlo realizations as a function of sample size, we incrementally increased the number of randomly sampled snow pits to estimate the mean and the standard deviation of the distributed average snow density (Fig. B3).We set the sampled standard deviation as the spatial uncertainty in density and summed in quadrature the distributed errors in lidar snow depth (as described in Sect.2.7) to propagate the SWE uncertainty (Fig. B2).Our analysis suggests that 10 snow pit observations are sufficient to reduce the median uncertainty in SWE to within 10 ± 2 %.

Appendix C C1 Maximum upwind slope and wind factor
To infer the physical bases underlying the snow density patterns of ρ s,Ens , we computed two parameters (the maximum upwind slope and wind factor; Winstral et al., 2002) from the 1-2 February lidar-derived snow surface elevations.The maximum upwind slope characterizes the degree of wind exposure for a given pixel.A wind-sheltered pixel has a positive slope, indicating that higher terrain exists upwind of the pixel.Conversely, wind-exposed terrain has a negative slope value, with lower-elevation terrain in the upwind direction.
The maximum upwind slope parameter (Sx) was calculated for each azimuth from 0 to 355°in 5°increments averaged over ± 15-degree overlapping bins (Winstral et al., 2002).A search distance of 25 m was applied in the calculation of the local Sx, while a 250 m distance was used to calculate the outlier length scale Sx between 25 and 250 m from the pixel.The local and outlying Sx parameters were differenced to calculate the slope break parameter, Sb (Winstral et al., 2002).
Because pixels with high wind exposure have negative Sx values, and we hypothesize that wind-exposed terrain will https://doi.org/10.5194/tc-18-3253-2024 The Cryosphere, 18, 3253-3276, 2024 have greater snow density (e.g., due to the formation of wind slabs), the wind direction expressing the largest negative correlation was recognized as the best explanation of the density patterns determined by Sx.
The wind-factor parameter determines the degree of wind exposure or shelter for a given pixel, based on Sx, Sb, vegetation proximity, and the average scalar multiple by which wintertime winds at the MW station exceeded those at the MM station (Winstral et al., 2002).The value of 2.18 determined from the wind speed data at MW and MM is in close agreement with the value of 2.3 determined by Winstral et al. (2002) and was applied to compute the wind factor by inversely rescaling Sx to the range between 1 and 2.18, where larger values indicate more wind exposure.Vegetated windsheltered zones were identified as pixels within a 3 m buffer of lidar vegetation greater than 0.5 m in height.The wind factors of sheltered vegetated areas and regions of Sb which exceeded the 97.5th percentile were arbitrarily reduced by 10 % to enhance the effective wind shelter provided by vegetation.Because the wind factor was inversely rescaled, the wind direction expressing the largest positive correlation with the snow density was recognized as being the best explanation of the density patterns determined by the wind factor parameter.
with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper.While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Figure 1 .
Figure 1.(a) Study area map of snow pit locations (yellow circles), the Mesa West weather station (brown circle), GPR transects (black lines), and the lidar boundary (grey wireframe).Land-cover classification identifies the forested areas as green and lakes as blue.(b) Inset map of Grand Mesa, Colorado, depicting the extent of the dataset acquired during the NASA SnowEx 2020 Intensive Observation Period at Grand Mesa, Colorado (Hiemstra et al., 2021).(c) Inset map of the contiguous US which identifies the location of Grand Mesa, Colorado.Landcover classification data were accessed from the 2016 National Land Cover Database(Dewitz, 2019).Slope hillshade data were accessed from the USGS 3D Elevation Program(Lukas and Baez, 2021).Cartographic boundary files were accessed from the Census Bureau's Master Address File/Topologically Integrated Geographic Encoding and Referencing geographic database (US Census Bureau, 2020).The geographic coordinate projection of these maps is UTM zone 12 N; EPSG code 32612.

Figure 2 .
Figure 2. Snow depths (1 m resolution) from the 1 February 2020 lidar flight.The western half of the domain is relatively unforested shrub steppe (lakes are masked in black), while the eastern half has stands of dense forest (see Fig.1).The color map is centered on the mean value.
Table 1.(a) Mean and standard deviation values of the snow-pit and probe-measured snow depths.(b) Estimates co-located by the lidar and GPR techniques and gathered from within a 1 m radius of the in situ observations are compared to the in situ observations.

Figure 3 .
Figure 3.A 900 m GPR transect with autopicks in magenta for (a) HH and (b) HV profiles of travel time and (c) the coherence of these radargrams (Eqs. 1 and 2).

Figure 4 .
Figure 4. Bulk snow density along radar profiles, estimated by combining lidar snow depths with GPR TWTs.Average densities measured in the 96 snow pits within the lidar boundary are overlaid as larger makers.Forested areas (grey) and lakes (black) are shown.The color map is centered on the mean value.

Figure 5 .
Figure 5. Generalized relative semi-variograms in (a) open and (b) forested areas for lidar snow depth, GPR TWT, average density retrieved along the GPR transects, and the resulting SWE.Experimental variograms were fitted with an exponential model to determine the variogram parameters.The larger markers represent the average nugget, sill, and correlation length estimated by Monte Carlo subsampling.

Figure 6 .
Figure 6.Snow density distributed spatially using the ML regression ensemble average (b s,lidar-Ens ).The color map is centered on the mean value.

Figure 7 .
Figure7.Spatially distributed snow water equivalent estimated using the regression ensemble mean density (b s,lidar-Ens ) and lidar snow depths (H s,lidar ).Forests and wind-scoured areas tend to have lower SWEs, and forest perimeters have higher SWEs.The meter-scale stippled texture is the result of low-stature vegetation (H veg < 0.5) and boulders, which both reduce the snow depth and decrease the snow density.Large markers are SWE values measured at snow pits.Lakes are masked in black.The color map is centered on the mean value.

Figure A1 .
FigureA1.The relative importance of the following lidar-derived predictors: Hs (snow depth), aspctHs (aspect of the snow depth), slpHs (slope of the snow depth), dyHs (north component of the snow depth gradient), dxHs (east component of the snow depth gradient), Zs (snow surface elevation) and its derivatives, Zg (ground elevation) and its derivatives, Hveg (vegetation height), and Sveg (distance to vegetation with a height greater than 0.5 m).The predictor importance was determined from the top 1 % of the models.

Figure A2 .
Figure A2.Relative importance of lidar predictors calculated using the out-of-bag technique for random-forest regression.Uncertainties were developed using random subsets of training data in a Monte Carlo simulation.

Figure A3 .
Figure A3.Relative importance of lidar predictors within the ANN comprising two hidden layers.

Figure B3 .
Figure B3.Monte Carlo uncertainty analysis for (a) mean snow density, (b) the standard deviation of the snow density, and (c) the propagated SWE uncertainty as functions of sample size.

Table 2 .
(a) Mean and standard deviation for snow pit, lidar-GPR (Sects.2.4.1 and 2.4.3), and modeled densities.Snow pit means and standard deviations are estimated from all available data (N = 96 snow pits for the entire domain, N = 79 for the open domain, and N = 17 for the forested domain).(b) Comparison between snow pit densities and estimated densities.Statistics (R 2 , RMSE, and bias) are measured from a subset of snow pits within 12.5 m of the GPR transects (N = 42 for all the domain, N = 36 for the open domain, and N = 6 for the forested domain).(c) ρ s,lidar-GPR -estimated densities are evaluated against modeled results.

Table 3 .
(a)Mean and standard deviation values of the snow water equivalent measured in snow pits and distributed by lidar snow depths using an average snow pit density value (276 kg m −3 ), the average snow pit density in each respective domain (276, 280, 257 kg m −3 ), the regression ensemble densities, and the random-field densities.In situ means and standard deviations are estimated from all available data (N = 96 snow pits for the entire domain, N = 79 for the open domain, and N = 17 for the forested domain).(b) Comparison of SWE between snow pit observations and distributed estimates obtained by multiplying lidar snow depths by the average density measured from snow pits in the respective domain, the ML-ensemble-modeled densities, and the random-field-synthesized densities.
T. G.Meehan et al.:Spatially distributed snow depth, density, and water equivalent situ methods but with spatial continuity at resolutions that make the calibration or validation of airborne or spaceborne radar remote-sensing retrievals of SWE practical.The spatial data generated and the validation data acquired during the NASA SnowEx 2020 IOP at Grand Mesa, Colorado, USA, provide a core set of observables that continue to inform essential snow research.

Table A1 .
(Lague et al., 2013) R 2 values for a pixel-by-pixel intercomparison, and SSIM values for a model intercomparison estimated over a 100 m radius (the approximate correlation length of snow density).Point cloud processing was performed using the Multiscale Model to Model Cloud Compare method(Lague et al., 2013).