Bayesian Approach to Estimate Proglacial Lake Volume (BE‐GLAV)

We present a new model called Bayesian Estimated Glacial Lake Volume (BE-GLAV) to estimate the volume of proglacial lakes. Assuming the lake cross-section to be trapezoidal, BE-GLAV uses a Bayesian calibration approach to adjust the cross-sectional geometry so that modeled and observed lake surface widths match. We validated our model using bathymetric measurements from lakes spread across High Mountain Asia (specifically, the Himalaya and Tien Shan), with areal extents ranging from 0.01 to 5.5 km². The modeled lake volumes agreed with the measured lake volumes with a root-mean-square absolute uncertainty of ∼14%. With minimum and maximum errors of ∼0.3% and ∼61.2%, BE-GLAV performed well compared to 10 other models in a model inter-comparison experiment. Using the measured set of volumes, our model can constrain both the root mean square (RMS) error and the maximum percentage error in modeled lake volume, unlike other models, some of which can compute only the RMS uncertainty.


Introduction
Glaciers have experienced extensive recession and thinning throughout the 20th and 21st centuries, primarily due to global warming. In many regions, glacier retreat has accelerated over the past few decades (Azam et al., 2018; Bolch et al., 2012; Hock et al., 2019; Zemp et al., 2019), causing the formation and/or expansion of proglacial lakes (Chen et al., 2020; Haritashya et al., 2018; Shugar et al., 2020; Wang et al., 2020). Some studies suggest a ∼50% increase in the number of lakes and a ∼53% increase in the total lake area worldwide (Shugar et al., 2020).
High Mountain Asia (HMA) is home to ∼15,000 to ∼25,000 proglacial lakes (e.g., Furian et al., 2021). Some of these lakes are also sources of glacial lake outburst floods (GLOFs), which occur when the lake water is discharged catastrophically, causing significant damage to communities, infrastructure and the geomorphology of downstream regions (e.g., Sattar et al., 2022; Wilson et al., 2019; Zheng et al., 2021). Veh et al. (2019) showed that flood peaks from GLOFs can rival monsoon-fed discharges in rivers up to thousands of kilometers downstream. Recently, Emmer et al. (2020) showed that several thousand lives have been claimed by a small number of GLOFs and associated mass movements, and many more people suffer when major infrastructure is damaged by them (e.g., Lützow et al., 2023).
The volume of a glacial lake is an important input parameter for many hydrological applications, such as modeling the severity of GLOF events (e.g., Emmer et al., 2022; Rinzin et al., 2023); estimating glacier retreat when coupled with various minimalistic ice-flow models (Nick & Oerlemans, 2006); and local small-scale hydropower production (e.g., Sattar et al., 2019). Unfortunately, glacial lakes are often located in some of the most remote and challenging terrains, making in-situ measurement of their volumes extremely difficult. Consequently, in-situ estimates of lake volume exist for only a handful of lakes in the HMA (e.g., Haritashya et al., 2018; Kapitsa et al., 2017; Sharma et al., 2018; Watson et al., 2020; Zhang et al., 2023).
Several studies have proposed power-law (non-linear) relationships between lake volume and other lake parameters (i.e., area and/or mean depth) to estimate lake volume. These relationships (scaling models) are derived by fitting a power-law or polynomial curve to volume-area or volume-mean depth data from multiple lakes (e.g., Huggel et al., 2002; O'Connor et al., 2001; Zhang et al., 2023). Some also utilize the mean lake surface width, volume and area (e.g., Muñoz et al., 2020). Scaling models have been popular because of their simple formulation and ease of implementation on large spatial scales. The scaling models proposed by O'Connor et al. (2001), Huggel et al. (2002), and Fujita et al. (2013) were calibrated against lake volume and area data from the Central Oregon Cascade Range, the Swiss Alps, and the Himalaya, respectively, and Cook and Quincey (2015) derived a scaling relationship using a global data set of lake volumes and areas. Some scaling models have been derived for a specific lake using time series of the lake's volume and area (e.g., Sharma et al., 2018; Yao et al., 2012). Wang et al. (2012) developed a model using volume and area data from 20 lakes in the Longbasaba and Pida regions (in China), and Emmer and Vilímek (2013) calibrated their scaling model with volume and area data from 35 lakes in the Peruvian Andes. Further, Zhang et al. (2023) calibrated their model against area and volume data from 47 lakes across the Indian, Nepal, Bhutan and Tibetan Himalaya. The scaling models that Muñoz et al. (2020) and Qi et al. (2022) developed were among the first to use lake surface width as an input parameter, along with volume and area, calibrating them against data from glacial lakes in the Andes and the Tibetan Himalaya, respectively.
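To illustrate how such scaling relations are typically derived, the sketch below fits a power law V = c·Aᵞ by least squares in log-log space. The data, coefficients and function name are invented for the example and do not reproduce any of the published models cited above:

```python
import numpy as np

def fit_volume_area_scaling(areas_km2, volumes):
    """Fit V = c * A**gamma by linear least squares in log-log space,
    the usual way empirical volume-area scaling models are derived."""
    gamma, log_c = np.polyfit(np.log(areas_km2), np.log(volumes), 1)
    return np.exp(log_c), gamma

# Synthetic example: lakes roughly following V = 0.1 * A**1.5 with noise
rng = np.random.default_rng(0)
areas = rng.uniform(0.01, 5.5, 40)                      # km^2
volumes = 0.1 * areas**1.5 * np.exp(rng.normal(0, 0.1, 40))
c, gamma = fit_volume_area_scaling(areas, volumes)
```

Because the fit is performed on logarithms, a fixed relative scatter in volume translates into a rapidly growing absolute error as area increases, which is the behavior criticized in the following paragraph.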
The uncertainty associated with the lake volume estimated using scaling models increases exponentially as the lake area increases (e.g., Zhang et al., 2023), due to the non-linear nature of the empirical relationship between lake volume and lake area. Therefore, the scaling models have been reported to give unrealistic estimates of lake volume (i.e., more than three times the measured lake volume in some cases) for individual lakes (e.g., Zhang et al., 2023). Additionally, Cook and Quincey (2015) showed that lake volume and lake area are auto-correlated, which can give a false impression of accuracy in the estimated lake volume. Hence, the use of lake area in both variables of the Volume-Area (V-A) regression for proglacial lakes may create a misleading impression about the quality of the data used in the relationship, and the high correlation coefficients may misleadingly indicate reduced uncertainty in the V-A relation (e.g., Cook & Quincey, 2015; Haeberli et al., 2016). With over 5,000 existing glacial lakes at present and more predicted to form in the future, for example, in the Karakoram region (Furian et al., 2021; Zheng et al., 2021), it is important to develop a comprehensive modeling framework that addresses concerns with the empirical volume-area scaling approaches and also constrains the level of uncertainty. In Section 4.2, we address the auto-correlation issue.
Bayesian calibration-based models have been successfully used in many glaciological applications, such as the estimation of ice thickness (Brinkerhoff et al., 2016; Werder et al., 2020); the estimation of supraglacial debris heat conductivity (Laha et al., 2023); the optimization of firn densification models (Verjans et al., 2020); and the inference of basal sliding parameters (Gopalan et al., 2021). The calibration process is based on Bayes' theorem of probability and allows prior knowledge of the system variable(s) to be combined with the fitting of a physical model to observations (e.g., Werder et al., 2020).
This study presents a new Bayesian statistics-based numerical model (Bayesian Estimated Glacial Lake Volume, BE-GLAV) to estimate proglacial lake volume. To our knowledge, it is the first of its kind among existing glacial lake volume estimation models.

Materials and Methods
Our model (BE-GLAV) assumes that at any point along the lake's centerline, the cross-sectional profile of the bed elevation of a proglacial lake can be approximated as parabolic or trapezoidal (Figure 1a). This is because the present lake bed was formerly under a glacier, whose cross-sectional geometry resembles a trapezium (e.g., Oerlemans, 1997). This geometry works particularly well for lakes with thick sediment deposits on their beds and nearly constant sloping side walls. It is also a useful approximation for lakes that have U-shaped cross-sections. We expect that most of the statistical uncertainty associated with this model, which we determine directly for our sample of measured lakes, will occur for common U-shaped lake cross-sections where there are anomalous extreme overdeepenings, exceptional thicknesses of deposited sediment (hence, shallowing), or other sharp departures from trapezoids. Another source of anomaly is the ice ramps that extend under the water line from the connected glacier's terminus (Robertson et al., 2012). These ramps have been found to extend >500 m from the calving termini of the connected glacier into the respective proglacial lake. The surfaces of these ice ramps are undulating and covered with a thick layer of unsorted sediment derived from supraglacial and englacial debris, lateral moraines and fluvial deltaic deposition (Robertson et al., 2012). The sediment on the ice ramps reduces melting and, therefore, thinning of the ramps, and can load the ramp enough to hinder flotation. Consequently, the ice ramps can remain intact a few hundred meters in front of the calving termini (Robertson et al., 2012).
Throughout the paper, the phrase "present lake bed" refers to the deglaciated area under the lake at the time of the bathymetric measurements. Consequently, for every cross-section, we have three variables, namely lake base width (W_b), lake depth (H), and side slope (λ = cotα_0 + cotα_1) (Figure 1a). The lake surface width at a particular cross-section (W_s) can be expressed in terms of these variables as (Figure 1a):

W_s = W_b + λH (1)

Therefore, the volume (V) of a proglacial lake with n cross-sections is computed by summing the trapezoidal cross-sectional areas along the centerline:

V = Σ_{i=1}^{n} [(W_si + W_bi)/2] H_i Δx (2)

where W_si, W_bi, and H_i are the lake surface width, base width and lake depth at cross-section i, respectively, and Δx is the 50 m grid resolution.
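The trapezoidal geometry and the volume summation described above can be sketched as follows (a minimal illustration; the function names are ours, and Δx defaults to the paper's 50 m grid spacing):

```python
import numpy as np

def surface_width(w_b, lam, h):
    """W_s = W_b + lambda * H for a trapezoidal cross-section,
    with lambda = cot(alpha_0) + cot(alpha_1)."""
    return w_b + lam * h

def lake_volume(w_s, w_b, h, dx=50.0):
    """Sum trapezoidal cross-section areas along the centerline:
    V = sum_i 0.5 * (W_si + W_bi) * H_i * dx."""
    w_s, w_b, h = map(np.asarray, (w_s, w_b, h))
    return float(np.sum(0.5 * (w_s + w_b) * h * dx))

# One cross-section: base 100 m, lambda = 2, depth 50 m
ws = surface_width(100.0, 2.0, 50.0)          # 200 m surface width
vol = lake_volume([ws], [100.0], [50.0])      # one 50 m slice
```

For the single cross-section above, the trapezoid area is 0.5 × (200 + 100) × 50 = 7,500 m², giving 375,000 m³ for one 50 m slice.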
Our model estimates physically plausible values of W_b, λ, and H for a cross-section by minimizing the difference between observed and modeled surface width at each grid point along the lake's centerline (Figure 1d). The minimization procedure was performed using Bayesian calibration.

Bayesian Calibration
The Bayesian calibration approach identifies W_b, λ and H as the free parameters and defines this parameter set as θ. The calibration process relies on Bayes' theorem (Equation 3), which allows updating a probability distribution P(θ) based on observed data D:

P(θ/D) = P(D/θ) P(θ) / P(D) (3)

where P(θ) is called the prior distribution; P(D/θ) is the likelihood distribution, which quantifies the difference between modeled and measured lake surface width (W_s); and P(θ/D) is the posterior probability distribution, the quantity that our model calculates. The term P(D), the marginal probability distribution of the data, is a constant independent of θ that does not affect P(θ/D) and is therefore disregarded in our analyses (e.g., Brinkerhoff et al., 2016).

Figure 1. Aqua colored regions represent the currently glaciated areas; the blue region represents the area that was filled with water in the past as well as during the bathymetric surveys; green regions represent the portion of the current lake bed that was glaciated in the past but is currently part of the lake bed. (a) Assumption of trapezoidal-shaped lake cross-sections, where the dashed red line represents the actual lake bed; the free parameters are W_b (base width), λ (sidewall slope parameter) and H (lake depth), and the lake surface width is W_s; here, λ is formulated as a function of the slope angles (α_0 and α_1) as λ = cotα_0 + cotα_1. (b) Top view of the lake's 2-D extent and the past ice thickness map of the region; yellow dots represent grid points at 50 m intervals. (c) The procedure to derive the marginal and joint prior distributions of the free parameters. (d) Schematic showing the necessary boundary conditions, that is, H = 0, W_b = 0 and undefined λ. (e) Flowchart of the MCMC block of our model (see Section 2.4); for Type A lakes, the variable V = 3, else it is set to 2 (Equation 7); the Monte Carlo module is solved at every grid point (yellow circle) located along the lake's centerline at 50 m intervals (bottom left corner). The flow in the panel (e) MCMC block can be followed using the Roman numerals.
Earth and Space Science 10.1029/2024EA003542 GANTAYAT ET AL.

For Type A Lakes
By Type A lakes, we refer to those lakes where the ice thickness map of the corresponding connected glacier (i.e., from a time when part of the lake's current bed was beneath the glacier) is available.
We use normal and weakly informative prior distributions for the initial values of θ so that the constraint of P(θ) on P(D/θ) is minor (e.g., Verjans et al., 2020). To calculate P(θ) for the grid points along the lake's centerline (Figure 1d), we first estimate the distribution of each of the free parameters (i.e., W_b, λ and H) at the grid points using an ice thickness map of the glacier from a time when either the whole or a part of the current lake bed was glaciated (i.e., the light-green colored region in Figure 1d). Using the ice thickness and the corresponding surface elevation maps of the glacier (Figure 1b), we estimate the maximum extent and the corresponding depth distribution of the glacier bed overdeepening of which the lake is a part (i.e., the aqua colored region in Figure 1b).
We then delineate the centerline of the overdeepening, and at every 50 m interval along the centerline we estimate W_s, W_b, λ, H, W_b/W_s (R1) and H/W_s (R2) for the corresponding cross-sections of the bed overdeepening, assuming that the depression (i.e., the aqua colored region in Figure 1c) is completely filled. We then estimate the mean (μ_λ, μ_R1, μ_R2) and standard deviation (σ_λ, σ_R1, σ_R2) of λ, R1, and R2 at the 50 m apart grid points for the area shown in Figure 1c. We also estimate the correlation coefficients for the pairs (W_b, H), (H, λ), and (W_b, λ), namely ρ_1, ρ_2 and ρ_3, respectively. ρ_1, ρ_2 and ρ_3 are considered non-zero only when the total number of grid points in the overdeepening area exceeds 10. Because the overdeepening and the lake are both parts of a single, larger glacier bed depression, our model assumes that the means, standard deviations, and correlation coefficients calculated above can be applied to the corresponding entities in the present lake bed, which is the light-green colored region shown in Figure 1d. The means and standard deviations of λ, R1 and R2 for all grid points (except for the extremes; see Figure 1d) located along the centerline of the present lake bed (Figure 1d) are therefore μ_λ, μ_R1, μ_R2 and σ_λ, σ_R1, σ_R2, respectively. The minimum and maximum values of R2 (i.e., m_R2 and M_R2, respectively) are also calculated for use in the posterior distribution, as explained in Section 2.4. Similarly, within the present lake bed, the correlation coefficients for the pairs (W_b, H), (H, λ) and (W_b, λ) are ρ_1, ρ_2 and ρ_3, respectively. At the two grid points where the lake centerline ends, we assume H = 0, W_b = 0 and an undefined λ. H is then calculated for each grid point along the centerline of the present lake bed (Figure 1d) by inferring depth values from the distributed depth map of the glacier bed overdeepening. This map is produced using the ice thickness and glacier surface elevation for the previously glaciated region, which is now part of the lake (i.e., the green region in Figure 1d). By linearly interpolating the depth values from the grid point that is closest to the former glacier front and is located in the light-green region (i.e., X in Figure 1d), H is calculated for the remaining grid points (i.e., those that are in the blue zone of Figure 1d).
The mean and standard deviation for λ (i.e., μ_λ and σ_λ) stay constant at each grid point along the lake's centerline (aside from the first and last) (Figure 1d). The mean and standard deviation for W_b are formulated as μ_W = W_s μ_R1 and σ_W = W_s σ_R1, where W_s is the lake surface width at the grid point (Figure 1d). The mean for parameter H (i.e., μ_H) at every grid point is calculated as μ_H = H. The standard deviation (σ_H) of H at the grid point is assumed to be of the form:

σ_H = kH

where the value of the parameter k is set as 0.5 (i.e., we assume 50% margins around lake depth). Finally, for every grid point lying along the lake's centerline (Figure 1d), the prior probability distribution of θ was formulated as a multivariate normal distribution with mean μ_θ = [μ_λ, μ_W, μ_H] and covariance Σ_θ, a 3-by-3 matrix as shown below:

Σ_θ = | σ_λ²            ρ_3 σ_λ σ_W     ρ_2 σ_λ σ_H |
      | ρ_3 σ_λ σ_W     σ_W²            ρ_1 σ_W σ_H |
      | ρ_2 σ_λ σ_H     ρ_1 σ_W σ_H     σ_H²        |

The above procedure was followed for seven lakes (i.e., those named in bold font in Table S1 in Supporting Information S1).
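A minimal sketch of how the Type A prior at one grid point could be assembled from these quantities (the function name and argument order are ours; the correlation structure follows the pairings ρ_1 = (W_b, H), ρ_2 = (H, λ), ρ_3 = (W_b, λ) given above):

```python
import numpy as np

def type_a_prior(w_s, mu_lam, sig_lam, mu_r1, sig_r1, depth_h,
                 rho1, rho2, rho3, k=0.5):
    """Mean and covariance of the multivariate normal prior over
    theta = (lambda, W_b, H) at one grid point (Type A lakes).
    rho1, rho2, rho3 are the correlations of (W_b, H), (H, lambda)
    and (W_b, lambda) inherited from the bed overdeepening."""
    mu_w, sig_w = w_s * mu_r1, w_s * sig_r1   # scaled by surface width
    mu_h, sig_h = depth_h, k * depth_h        # k = 0.5 -> 50% depth margin
    mu = np.array([mu_lam, mu_w, mu_h])
    cov = np.array([
        [sig_lam**2,             rho3 * sig_lam * sig_w, rho2 * sig_lam * sig_h],
        [rho3 * sig_lam * sig_w, sig_w**2,               rho1 * sig_w * sig_h],
        [rho2 * sig_lam * sig_h, rho1 * sig_w * sig_h,   sig_h**2],
    ])
    return mu, cov

# Example values (hypothetical): W_s = 200 m, depth 40 m
mu, cov = type_a_prior(200.0, 2.0, 0.3, 0.5, 0.1, 40.0, 0.2, 0.1, 0.15)
```

By construction the covariance is symmetric, with the parameter variances on the diagonal and the inherited cross-correlations off-diagonal.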


For Type B Lakes
By Type B lakes, we refer to those lakes for which the corresponding glacier ice thickness map is unavailable, unlike their Type A counterparts. Our approach to calculating P(θ) at grid points differs slightly when historical ice thickness maps are unavailable. In this case, we use two free parameters (W_b and H) instead of three, with λ treated as a constant equal to two (i.e., the lake's sidewalls are assumed to have 45-degree slopes). Therefore, for these lakes, the free parameter set θ consists of two parameters, W_b and H. To avoid P(θ) unduly constraining P(D/θ), we assume normal and weakly informative priors for the initial values of θ (i.e., W_b and H) (e.g., Verjans et al., 2020). Except for the first and last grid points, the prior distribution for H was modeled as a normal distribution with mean μ_H = H_lake and standard deviation σ_H = 0.5H_lake (i.e., we assumed 50% margins about the initial estimate of H), where H_lake is the mean lake depth as determined by the two volume-area scaling models of Zhang et al. (2023) for the Himalaya and Kapitsa et al. (2017) for the Tien Shan. These models were chosen because they were calibrated against the available in-situ data (i.e., volume and area) in the respective regions. We establish a normal prior for W_b with mean μ_W and standard deviation σ_W, where σ_W is set equal to the standard deviation of the observed lake surface widths at the grid points.
Since there is no prior information on the lake geometry, we assume that μ_W equals the grid spacing of 50 m. We further assume that the free parameters W_b and H are independent of each other for lakes lacking past ice thickness maps of their corresponding glaciers. As a result, for each grid point along the lake's centerline, the prior probability distribution of θ, that is, P(θ), was defined as a multivariate normal distribution with mean μ_θ = [μ_W, μ_H] and a 2-by-2 covariance matrix Σ_θ that is diagonal because of the assumed independence:

Σ_θ = | σ_W²   0    |
      | 0      σ_H² |

Finally, the prior distribution of θ is formulated as a multivariate normal distribution: P(θ) = N(μ_θ, Σ_θ).
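The corresponding Type B prior reduces to a diagonal covariance because W_b and H are treated as independent. In this sketch, the default σ_W value is a placeholder standing in for the standard deviation of the observed surface widths:

```python
import numpy as np

def type_b_prior(h_lake, mu_w=50.0, sig_w=10.0):
    """Mean and (diagonal) covariance of the normal prior over
    theta = (W_b, H) for Type B lakes. mu_w defaults to the 50 m
    grid spacing; sig_w would in practice be the standard deviation
    of the observed surface widths (10.0 here is a placeholder)."""
    mu = np.array([mu_w, h_lake])
    cov = np.diag([sig_w**2, (0.5 * h_lake)**2])  # 50% margin on H
    return mu, cov

# H_lake from a V-A scaling model, e.g. 20 m mean depth (hypothetical)
mu, cov = type_b_prior(20.0)
```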

Estimation of P(D/θ)
Our model uses a normal likelihood function P(D/θ) that quantifies the match between modeled and observed lake surface widths:

P(D/θ) ∝ exp(−(W_sm − W_so)² / (2σ²))

where W_sm and W_so represent the modeled and observed lake surface widths, respectively, and σ² is the variance. The variance sets the spread allowed for the model outputs when compared to measured values and is calculated by taking 10% margins around the corresponding observed surface widths (e.g., Verjans et al., 2020). Allowing for such spread is necessary because model-observation discrepancy can result from various factors, including errors in measuring surface widths from satellite imagery, uncertainty in estimating the means and standard deviations of the free parameters, and uncertainty in the modeled past ice thickness map of the glacier that covered a portion of the lake's current bed. Errors are also likely to be introduced by the measured volume, depending on the survey style of the bathymetric measurements, track density, and the interpolation technique used in preparing 3-D bathymetric maps from point-based, single-beam sonar surveys (Watson et al., 2020).
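In log form, the likelihood described above could be implemented as follows (a sketch assuming the 10% margin enters as σ = 0.1·W_so; the exact handling of the margin in BE-GLAV may differ):

```python
import numpy as np

def log_likelihood(w_sm, w_so):
    """Log of the normal likelihood P(D|theta) comparing the modeled
    (w_sm) and observed (w_so) lake surface widths; the standard
    deviation is taken as a 10% margin around the observed width."""
    sigma = 0.1 * w_so
    return (-0.5 * (w_sm - w_so) ** 2 / sigma**2
            - np.log(sigma * np.sqrt(2.0 * np.pi)))
```

A perfect width match maximizes the log-likelihood; any mismatch is penalized quadratically relative to the 10% margin.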

Estimation of P(θ/D)
The posterior distribution P(θ/D) gives our model's probability distribution over the parameter space (i.e., θ) given the observed lake surface widths. In our model, with a weakly informative P(θ), the distribution P(θ/D) is substantially governed by P(D/θ). Because no analytical form of P(θ/D) exists for the studied lakes, we explore the parameter space to generate an ensemble of θ_j approximating P(θ/D) (e.g., Brinkerhoff et al., 2016; Verjans et al., 2020). This is achieved efficiently by applying a gradient-free Markov Chain Monte Carlo (MCMC) algorithm, Random Walk Metropolis (RWM) (Hastings, 1970). The posterior distribution of the free parameter set is calculated at every grid point. We first estimate the lake surface width for all grid points using the grid-point-specific initial estimates of μ_θ. The difference between modeled and observed lake surface width is quantified by the likelihood, that is, P(D/θ). The posterior probability (i.e., P(θ/D)) is calculated following Equation 3.
At this point, the RWM algorithm starts: the state of the chain, θ_j (Figure 1e part (i)), is set to the original values of the free parameters, and the corresponding posterior probability is saved as P(θ_j/D). Note that the subscript j denotes the iteration number, which is equal to 0 at this initial step. The RWM algorithm then proposes a new θ_j* from a proposal distribution (Figure 1e part (ii)). Here, we assume the proposal distribution to be a symmetric multivariate normal distribution centered on θ_j. Consequently, the random selection of θ_j* depends only on the current state θ_j and the proposal covariance (Σ_prop in Figure 1e). For the parameter set θ_j*, the model recalculates the lake surface width for the grid point and P(θ_j*/D) is computed (Figure 1e parts (iii) and (iv)). From there, the proposed θ_j* is either accepted or rejected in the ensemble approximating P(θ/D) (Figure 1e part (vi)). This decision depends on the ratio α = P(θ_j*/D)/P(θ_j/D) (Figure 1e part (v)). The RWM then draws a random number u from the uniform distribution U(0,1), and θ_j* is accepted into the ensemble if u < α (so proposals with α ≥ 1 are always accepted) (Figure 1e part (vi)). The saved set becomes the updated current state for the next iteration, θ_j+1, with its associated P(θ_j+1/D) (Figure 1e parts (vii) and (viii)). The algorithm iterates this process and reaches a final posterior distribution over θ. The RWM algorithm has the property that the chain will ultimately converge to a stationary distribution representing the posterior P(θ/D). Thus, after a sufficiently high number of iterations, the ensemble of parameter sets is representative of P(θ/D). In our study, we found that 5,000 iterations are sufficient to achieve convergence. The proposal covariance Σ_prop must account for inter-dependence among the different components of θ; that is, the value of one free parameter can influence the value of another if the model is to reach a good match with the observed data. Σ_prop can capture this dependence between parameters and, for optimality, it is updated every given number of iterations (100 in our study) using Equation 7 (Rosenthal, 2011):

Σ_prop = (2.38² / V) Σ_cov (7)

where Σ_cov is the covariance matrix between the model's free parameters at this stage of the iterative chain, and V is the number of free parameters, which is three for lakes with a historical ice thickness map of the connected glacier and two for those without.
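The RWM loop with the adaptive proposal covariance can be sketched as below. This is our own minimal implementation, demonstrated on a synthetic 2-D Gaussian target rather than the lake model; the adaptation interval (100) and iteration count (5,000) follow the values quoted in the text:

```python
import numpy as np

def rwm(log_post, theta0, sigma_prop0, n_iter=5000, adapt_every=100):
    """Random Walk Metropolis sampler with the adaptive proposal
    covariance Sigma_prop = (2.38**2 / V) * Sigma_cov (Rosenthal, 2011)."""
    rng = np.random.default_rng(0)
    v = len(theta0)                           # number of free parameters (V)
    cov = np.asarray(sigma_prop0, dtype=float)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = [theta.copy()]
    for j in range(1, n_iter):
        # propose from a symmetric normal centered on the current state
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop)
        # accept with probability min(1, alpha); on the log scale this
        # is log(u) < log P(prop|D) - log P(theta|D)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta.copy())
        # periodically re-estimate the proposal covariance from the chain
        if j % adapt_every == 0 and j > 2 * v:
            cov = (2.38**2 / v) * np.cov(np.array(chain).T) + 1e-8 * np.eye(v)
    return np.array(chain)

# Demo on a synthetic 2-D Gaussian target centered at (1, 2)
chain = rwm(lambda th: -0.5 * np.sum((th - np.array([1.0, 2.0])) ** 2),
            [0.0, 0.0], 0.5 * np.eye(2))
```

The small ridge term added to the adapted covariance keeps the proposal distribution well-defined even when the early chain has little spread.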
The ensemble approximating P(θ/D) was then checked for the presence of physically unrealistic parameter sets before computing the posterior probability distribution. This was carried out following a two-step approach wherein we first removed all the θs with non-positive values for any free parameter (i.e., W_b, λ, H). We then separated the lakes where the historical map of the associated glacier's ice thickness was available from those without. For lakes with an ice thickness map, we eliminated θs where the ratio of the simulated lake depth to the associated observed lake surface width was outside the range [m_R2, M_R2] (see Section 2.2 for more information on m_R2 and M_R2). For the remaining lakes with no ice thickness map, we first estimated the means of m_R2 and M_R2 (i.e., μm_R2 and μM_R2, respectively) and rejected θs where the ratio of modeled lake depth to observed lake surface width was not in the interval [μm_R2, μM_R2]. The θs remaining in the ensemble after the above filtering were used to estimate the posterior distribution. After obtaining the posterior distributions of the free parameter set at every grid point along the lake's centerline, we generate an ensemble of lake volume estimates using Equation 2. Finally, the mean of the ensemble is our modeled estimate of lake volume.
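The two-step plausibility filter could look like the following sketch (the function name and array layout are ours; H is assumed to be the last component of each θ):

```python
import numpy as np

def filter_ensemble(thetas, w_so, r2_min, r2_max):
    """Drop physically implausible parameter sets: any non-positive
    component, or a depth-to-surface-width ratio H / W_s outside
    [r2_min, r2_max]. `thetas` has one sampled parameter set per
    row, with lake depth H in the last column."""
    thetas = np.asarray(thetas, dtype=float)
    positive = np.all(thetas > 0, axis=1)          # step 1: positivity
    ratio = thetas[:, -1] / w_so                   # step 2: R2 bounds
    in_range = (ratio >= r2_min) & (ratio <= r2_max)
    return thetas[positive & in_range]

# Three hypothetical samples of (lambda, W_b, H) at a 200 m wide section
kept = filter_ensemble([[2.0, 100.0, 20.0],
                        [2.0, 100.0, -5.0],       # negative depth
                        [2.0, 100.0, 80.0]],      # H/W_s too large
                       w_so=200.0, r2_min=0.05, r2_max=0.3)
```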

Data and Validation Sites
We validated our model by assessing its effectiveness across 66 proglacial lakes in the central and eastern Himalaya and the Tien Shan (Figures S1a and S1b in Supporting Information S1). Table S1 in Supporting Information S1 provides the lake parameters of all 66 lakes. The glacier ice thickness and corresponding surface elevation were inferred from the global consensus ice thickness data sets of Farinotti et al. (2019) and Millan et al. (2022).

Modeled Lake Volumes
Modeled and measured estimates of all 66 lakes' volumes agree well with each other, with a mean absolute relative uncertainty of ∼14% (Table S1 in Supporting Information S1). Lakes Bechung and Aksu 1 had the largest (∼61.2%) and the smallest (∼0.3%) error in the predicted lake volume, respectively. The inventory of 66 lakes was approximately equally divided into four categories based on lake area (Figure 2). Except for Lake Bechung, the percentage deviation of our modeled lake volumes from the measured volumes is consistently within the range of ∼ −36% to ∼36% (Figure 2b; Table S1 in Supporting Information S1). In addition, the results were unchanged when the grid resolution was coarsened from 50 to 100 m.
In our further treatment, we use the term "error" to mean the percentage departure of an individual measured lake volume from its modeled volume. "Uncertainty" then carries a statistical significance of how the actual (but unmeasured) volumes of an unknown population of lakes may vary from the modeled mean for a given lake area (Figure 2), whereas the term "maximum error" could, for instance, refer to the possible greatest outliers of the shallowest and deepest lakes for a given lake area in any comparable population of about 66 unmeasured lakes in any location where glacial valley hypsometry closely resembles that in the Himalaya and Tien Shan.

Figure 2. For simplicity, the set of 66 lakes was divided into four parts based on lake area. Boxplots of % error (i.e., % deviation from measured) of modeled lake volumes for lakes with area > 0.2 km² (a), 0.05 km² ≤ area < 0.2 km² (c), 0.03 km² ≤ area < 0.05 km² (e), and area < 0.03 km² (g); the red dots in (a), (c), (e) and (g) represent the % error in lake volumes modeled by BE-GLAV. Boxplots of measured lake area for lakes with area > 0.2 km² (b), 0.05 km² ≤ area < 0.2 km² (d), 0.03 km² ≤ area < 0.05 km² (f), and area < 0.03 km² (h). In all the box and whisker plots, the ends of the whiskers mark the ±3σ limits beyond which any data point was treated as an outlier; the vertical ends of the boxes represent the interquartile range.

Comparison With Other Models
Figure 2 and Table S3 in Supporting Information S1 show the performance of our model with respect to 10 previously published models (listed in Table S2 in Supporting Information S1) that are essentially power-law relations between lake volume and lake area.
The minimum error in lake volume simulated by each scaling model ranged from ∼0.1% to ∼92%, while the maximum error ranged from ∼92% to ∼1,779% (Table S3 in Supporting Information S1). The large value of the maximum bound of the error indicates the unreliability of the scaling models for individual lakes. Lake volumes modeled using the Fujita et al. (2013) and Yao et al. (2012) approaches, both calibrated using volume-area data from Himalayan lakes, showed a systematic uncertainty of at least 130% and 250%, respectively, for all lakes in the Tien Shan (Table S3 in Supporting Information S1). This demonstrates that using some scaling models in places where they have not been calibrated with corresponding observed lake volume-area data is dubious.
The scaling model of Zhang et al. (2023) gave the overall third-best estimate of proglacial lake volume in the Himalayan region, with a minimum and maximum error of 0.1% and 233%, respectively, while the scaling model developed by Kapitsa et al. (2017) gave the best estimate of lake volume for all the lakes located in the Tien Shan region (i.e., maximum error < ∼10%) (Table S3 in Supporting Information S1). This is likely due to the inclusion of in-situ volume-area data from 47 lakes in the case of Zhang et al. (2023) and 50 for Kapitsa et al. (2017).
The scaling approach of Emmer and Vilímek (2013) performed second best, being among the top five models for 61 out of 66 lakes (i.e., 48 lakes in the Tien Shan and 13 in the Himalaya) (Figure S3 in Supporting Information S1). We were surprised by this outcome because that model was calibrated using volume and area data from lakes in the Peruvian Andes (Emmer & Vilímek, 2013). This suggests morphometric similarities between the Andes and the HMA, which could be attributed to active, receding glaciers and concurrent intense tectonism in both regions. However, our analysis is restricted by the poor quality of existing knowledge about depth erosion by glaciers and the overdeepened topography of their rock beds (Haeberli et al., 2016). We believe that more research is needed to support the above hypothesis, such as running our model on Peruvian Andes lakes and comparing its performance to that of Emmer and Vilímek (2013).
Our model was among the top five performing models for 65 out of 66 lakes, with minimal prior tuning against measured lake area-volume data, unlike the scaling models. In our volume estimates, the minimum and maximum errors were ∼0.3% and ∼61%, respectively (Figure 2; Table S3 in Supporting Information S1). Compared to all the scaling models, our model's predicted lake volume was among the two closest to the observed volume for Lake Bechung (∼61% deviation from the measured volume), even with the high magnitude of error. The outcome was the same for model runs at finer spatial resolutions of 30 and 20 m, suggesting that a lake's morphometry can sometimes be too complicated for any model to simulate, regardless of its capability.
Besides volume, BE-GLAV can estimate depth along the lake's centerline. Bathymetry was available for only 17 lakes, located in the Himalaya. Unlike volume, modeled and measured lake depths were in good agreement with each other for only eight lakes (Figure S2 in Supporting Information S1). Of these eight lakes, Thulagi, Imja, Lower Barun, South Lhonak, Bechung, Luggye and Ranzeriaco were classified as Type A lakes, and Maqiongco was classified as a Type B lake (Figure S2; Table S1 in Supporting Information S1). For these eight lakes, the root mean square uncertainty between modeled and measured lake depth along the lake centerline ranges between 4 and 17 m (Figure S2 in Supporting Information S1). Discrepancies between modeled and measured lake depth along the centerline were found to be large near the calving front. However, the lake depth near the glacier terminus is not only difficult to measure but is also often marred by uncertainties due to the presence of ice ramps that extend laterally up to at least a hundred meters from the calving front (Haeberli et al., 2016). Additionally, due to the dynamic nature of the region near the glacier terminus, the lake depth there is estimated via spatial interpolation of point-based lake depth values that were measured relatively far from the terminus. These factors contribute to the uncertainty in measured lake depth near the glacier terminus. However, this does not pose an issue, because the main aim of our paper is to estimate lake volume, not lake bathymetry along the centerline.
Earth and Space Science 10.1029/2024EA003542 GANTAYAT ET AL.
Results demonstrate that, unlike scaling models, for which the level of uncertainty in the lake volume is unpredictable, BE-GLAV can restrict the upper limit of the error in volume (likely owing to the Bayesian calibration procedure), and it can do so with minimal pre-calibration against regional observed lake volume and area data.
For every Type B lake, we formulated the prior distribution of lake depth at every grid point along the lake centerline as a Gaussian distribution centered on H_lake, the mean depth derived using the V-A scaling models proposed by Zhang et al. (2023) and Kapitsa et al. (2017) for lakes located in the Himalaya and the Tien Shan, respectively. Interestingly, despite using auto-correlated variables (i.e., lake volume and lake area) in calculating H_lake, the Bayesian calibration part of our model was able to restrict both the maximum error in individual lake volumes and the 95% confidence level of the global uncertainty in the modeled lake volume. This shows that BE-GLAV provides a robust way forward for estimating proglacial lake volume using a mean lake depth derived from a V-A scaling model. Such a feat was not previously possible using V-A scaling techniques alone, because lake area and lake volume are autocorrelated (discussed in Section 1), which can give a false impression of accuracy in the estimated lake volume (e.g., Cook & Quincey, 2015; Haeberli et al., 2016).
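As an illustration of this prior construction, the sketch below derives a mean depth from a generic power-law V-A relation and builds a Gaussian depth prior around it. The coefficients c and gamma, the example lake area, and the prior spread are illustrative placeholders, not the published coefficients of Zhang et al. (2023) or Kapitsa et al. (2017):

```python
import math
import random

def mean_depth_from_scaling(area_km2, c=30.0, gamma=0.25):
    """Mean depth H_lake (m) from a generic power law H = c * A**gamma.
    c and gamma are illustrative placeholders, NOT published coefficients."""
    return c * area_km2 ** gamma

def depth_prior_logpdf(h, h_lake, sigma):
    """Log-density of the Gaussian prior on depth at one centerline grid
    point, centered on the scaling-derived mean depth H_lake."""
    return (-0.5 * ((h - h_lake) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))

def sample_depth_prior(h_lake, sigma, n, seed=0):
    """Draw n prior depth samples, clipped at zero (depths cannot be negative)."""
    rng = random.Random(seed)
    return [max(rng.gauss(h_lake, sigma), 0.0) for _ in range(n)]

# Hypothetical lake of area 1.2 km^2
h_lake = mean_depth_from_scaling(1.2)
prior_draws = sample_depth_prior(h_lake, sigma=0.3 * h_lake, n=1000)
print(round(h_lake, 1))  # prints 31.4 with the placeholder coefficients
```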
It is crucial to highlight that BE-GLAV, as presently formulated, cannot be applied to supraglacial, ice-marginal, or extremely shallow lakes (depth <5-10 m), since our assumption of a trapezoidal or parabolic lake cross-section may not hold for such lakes.

Conclusions
We present a proof of concept in which a new Bayesian calibration-based numerical model (BE-GLAV) is used to estimate the volume of a proglacial lake. Assuming the lake cross-section to be trapezoidal, BE-GLAV minimizes the difference between modeled and measured lake surface widths. Our modeled volumes agreed well with the measured ones, with a mean absolute uncertainty of ∼14%, and the model was able to limit the maximum error to ∼61%, unlike the large errors of the scaling models. A model intercomparison experiment showed that all models (the scaling models and our method) performed only about average for lakes with extremely shallow bathymetry (<10 m) over more than half of the current lake extent (e.g., Bechung).
BE-GLAV could be a good, and we think better, alternative to scaling models for regions where bathymetric surveys are scarce. It is easy to implement with minimal calibration, unlike its scaling counterparts. We have indirect evidence that BE-GLAV may perform well for the Peruvian Andes, but we would need to test it there, as well as in areas where active tectonics and valley hypsometry differ from those of HMA.

Figure 1. Schematic representation outlining the process to estimate P(θ) for Type A lakes (a)-(d); aqua-colored regions represent the currently glaciated areas, the blue region represents the region that was filled with water in the past as well as during the time of the bathymetric surveys, and green regions represent the portion of the current lake bed that was glaciated in the past but is currently part of the lake bed. (a) Assumption of trapezoidal-shaped lake cross-sections, where the dashed red line represents the actual lake bed; the free parameters are W_b (base width), λ (sidewall slope parameter), and H (lake depth). The lake surface width is W_s. Here, λ is formulated as a function of the slope angles (i.e., α_0 and α_1) as λ = cot α_0 + cot α_1. (b) Top view of the lake's 2-D extent and the past ice thickness map of the region; yellow dots represent grid points at 50 m intervals. (c) The procedure to derive the marginal and joint prior distributions of the free parameters. (d) Schematic showing the necessary boundary conditions, that is, H = 0, W_b = 0 and undefined λ. (e) Flowchart of the MCMC block of our model (see Section 2.4). For Type A lakes, the variable V = 3; otherwise it is set to 2 (Equation 7). The Monte Carlo module is solved at every grid point (yellow circle) located along the lake's centerline at 50 m intervals (bottom left corner). The flow in the panel (e) MCMC block can be followed via the Roman numerals.
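The trapezoidal geometry in panel (a) can be written down directly: the surface width is W_s = W_b + λH with λ = cot α_0 + cot α_1, the cross-sectional area is the mean of the base and surface widths times the depth, and a volume follows by integrating those areas along the 50 m centerline grid. A minimal sketch under those definitions (the section values are hypothetical; the end sections use the boundary conditions H = 0 and W_b = 0 from panel (d)):

```python
import math

def sidewall_lambda(alpha0_deg, alpha1_deg):
    """Sidewall slope parameter λ = cot(α0) + cot(α1)."""
    return (1.0 / math.tan(math.radians(alpha0_deg))
            + 1.0 / math.tan(math.radians(alpha1_deg)))

def surface_width(w_b, lam, h):
    """Trapezoid surface width W_s = W_b + λH."""
    return w_b + lam * h

def cross_section_area(w_b, lam, h):
    """Trapezoidal cross-section area: mean of base and surface widths times depth."""
    return 0.5 * (w_b + surface_width(w_b, lam, h)) * h

def lake_volume(sections, dx=50.0):
    """Integrate cross-sectional areas along the centerline (dx spacing, m)
    with the trapezoidal rule; sections = [(W_b, λ, H), ...]."""
    areas = [cross_section_area(w, lam, h) for w, lam, h in sections]
    return sum(0.5 * (a0 + a1) * dx for a0, a1 in zip(areas, areas[1:]))

# Hypothetical centerline sections; boundary sections have H = 0 and W_b = 0
lam = sidewall_lambda(30.0, 45.0)
sections = [(0.0, lam, 0.0), (100.0, lam, 40.0), (120.0, lam, 55.0), (0.0, lam, 0.0)]
print(round(lake_volume(sections)))  # volume in cubic meters
```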