1 1. Introduction

Glacier monitoring in alpine regions plays a crucial role in the understanding of the response of glaciers to climate change and its implications for water resources, natural hazards, and ecosystems. Over the years, various remote sensing approaches have been utilized to capture glacier dynamics and evolution. These approaches include satellite-based sensing (Altena and Kääb 2020; Scherler et al. 2008), satellite synthetic aperture radar (SAR) (Strozzi et al. 2020), 3D photogrammetric reconstruction from UAVs (Chudley et al. 2019; Ioli et al. 2022), aerial platforms (De Gaetani et al. 2021) and optical high-resolution satellite (Giulio Tonolo et al. 2020), ground-based monoscopic or stereoscopic time-lapse cameras (Messerli and Grinsted 2015; Schwalbe and Maas 2017; Hendrickx et al. 2022), ground-based SAR (Dematteis et al. 2018; Noferini et al. 2009), aerial and UAV laser scanner (Hartl et al. 2023), and terrestrial laser scanners (TLS) (Hendrickx et al. 2022; Voordendag et al. 2023).

Satellite-based optical sensing provides a wide area coverage but offers limited spatial and/or temporal resolution, which hinders the ability to capture short-term glacier dynamics of small alpine glaciers. On the other hand, UAV photogrammetry offers high-precision 3D reconstruction of mountain environments at a low cost (Chudley et al. 2019; Ioli et al. 2022). However, it requires in-situ intervention to perform the flights and to deploy and measure ground control points (GCPs). These operations may be time consuming and labor intensive. Consequently, UAV photogrammetry is not suitable for short-term glacier monitoring.

Permanent terrestrial SAR and TLS have gained significant attention for short-term monitoring. SAR, in particular, offers all-weather and day-and-night imaging, making it suitable for real-time monitoring and early-warning systems (Dematteis and Giordan 2021; Noferini et al. 2009). However, both ground-based SAR and permanent TLS are expensive and require complex logistics. These limitations prevent their applications in distributed sites at regional scale. Thus, their usage may be limited the investigation of hazardous locations.

Fixed time-lapse cameras offer an affordable alternative to terrestrial SAR and TLS for gathering qualitative and quantitative information on glacier dynamics (Giordan et al. 2016; James et al. 2016; Maas et al. 2006; Messerli and Grinsted 2015). Digital image correlation (DIC) can be used on sequences of images from terrestrial time-lapse cameras to estimate glacier surface velocity. Initially used in laboratory settings, DIC was first applied to glaciers by Evans (2000). Subsequently, numerous studies employed this technique to measure glacier kinematics (Ahn and Box 2010; Giordan et al. 2016; Hadhri et al. 2019; James et al. 2016; Messerli and Grinsted 2015). However, when images capture non-planar areas, external digital surface models (DSMs) are required to convert the 2D pixel displacements obtained by DIC to 3D displacement vectors.

While most of the existing literature focuses on single time-lapse camera systems, some studies have been conducted on multi-camera systems in recent years. Although it was not applied to glaciers, Roncella et al. (2014) first developed a pure stereoscopic system for monitoring a moving landslide, proposing a fully photogrammetric approach to derive a 3D model of the landslide and detect collapses, without the need for a priori DSMs. Schwalbe and Maas (2017) proposed an application of multiple time-lapse cameras for deriving a coarse digital surface model using structure-from-motion (SfM) to estimate a distance map, which was used to scale displacement vectors. Marsy et al. (2020) used a stereo camera system to construct a glacier DSM and back-projected the DIC-derived velocity field. However, their main focus was on deriving glacier kinematics rather than achieving a short-term 3D reconstruction of the glacier. Recently, Kneib et al. (2022) used a multi-camera system to estimate sub-seasonal variability of supraglacial ice cliff melt rates. Taylor et al. (2023) successfully tested a low-cost Raspberry Pi with a high-quality camera module and a telephoto lens mounted on a boat for reconstructing a glacier calving front by SfM achieving sub-meter accuracy when compared to a UAV benchmark. Blanch et al. (2023, 2024) proposed a low-cost fixed multi-camera system for natural hazard monitoring using a photogrammetric approach, and they achieved sub-decimetric change detection sensitivity in rockfall detection.

Nevertheless, all of the aforementioned studies used small or normal camera baselines (i.e., the distance between the cameras). This made it easier for traditional photogrammetric methods to find corresponding points between images and achieve 3D scene reconstruction. However, in mountainous terrain, finding stable, accessible locations with ideal baseline and viewing geometry can be challenging. Conversely, wide baselines or highly convergent views hinder a 3D reconstruct by traditional photogrammetry.

In traditional photogrammetry, keypoints are identified on different images by local feature detectors and descriptors such as SIFT (Lowe 2004), Kaze (Alcantarilla et al. 2012), and ORB (Rublee et al. 2011) and matched by minimizing descriptor distances in the descriptor space. Bundle adjustment (BA) is then used to estimate the camera parameters and the 3D coordinates of tie points in an arbitrary coordinate system

Extracting tie points becomes challenging when the camera baseline is wide. Changes in viewpoint lead to complex distortions, missing content, and occlusions between corresponding objects, making feature matching difficult. Deep learning (DL) techniques, and particularly attention-based architectures (Vaswani et al. 2023), have emerged as powerful techniques to extract and match features in difficult conditions, such as wide baselines and significant radiometric differences (Jin et al. 2021; Yao et al. 2021). Research in DL-based matching has soared in recent years (Chen et al. 2021), addressing various challenges, including large baselines, oblique images (Yao et al. 2021), historical image matching (Maiwald et al. 2021), low-quality cameras, and public webcams (Wu et al. 2021). A comprehensive survey by Remondino et al. (2022) covers methods from traditional corner detectors to state-of-the-art DL matching approaches.

This paper presents a pilot study on short-term 4D monitoring of an alpine glacier using a low-cost stereo-camera system with a wide baseline. The study focuses on investigating daily glacier dynamics by calculating surface movement, ice volume loss, and glacier retreat of a debris-covered alpine glacier. The proposed methodology integrates 3D reconstruction from stereo cameras for volume estimation and utilizes DIC on monoscopic cameras to derive surface velocities. By utilizing state-of-the-art DL matching techniques, this approach overcomes the challenges posed by the wide camera baselines and allows for accurately reconstructing the scene under suboptimal viewing conditions.

The objective of this work is twofold: (i) to demonstrate the feasibility and effectiveness of using a cost-effective photogrammetric solution for short-term glacier monitoring using a DL approach; (ii) to investigate the short-term relationship between air temperature, ice volume variations, and glacier surface velocity for an alpine debris-covered glacier. By combining stereo and monoscopic image sequence analysis, this study contributes to a better understanding of subseasonal glacier dynamics, particularly in the context of changing climate conditions and rising temperatures.

2 Study Area, Equipment, and Datasets

2.1 Study Area

The Belvedere Glacier (Randolph Glacier Inventory code RGI60–11.02858) is an alpine glacier in Valle Anzasca (Italy), on the east side of the Monte Rosa Massif (N \({45^{\circ};58^{\prime}}\) E \({7^{\circ};55}^{\prime}\)) (Fig. 1). The lower part of the Belvedere Glacier is a temperate debris-covered glacier that covers an area of \(\sim\)\(1.8\,\text{k}\text{m}\)\({}^{2}\) and extends from an altitude of \(\sim\)\(2250\,\text{m\,a.s.l.}\) to \(\sim\)\(1800\,\text{m\,a.s.l.}\) This region is characterized by a gentle slope, and it is fed by ice falls and snow avalanches coming from the Monte Rosa East Face (Haeberli et al. 2002). In its low-relief sector, the Belvedere Glacier splits into two lobes, reaching \(\sim\)\(1800\,\text{m\,a.s.l.}\) The northern lobe, in particular, ends with a prominent ice cliff, from which the River Anza springs. Except for the terminal ice cliff, the glacier tongue is completely covered by rocks and boulders with dimensions ranging from few decimeters to some meters.

Fig. 1
figure 1

a Map of the Belvedere Glacier, with marked the location of the two cameras C1 and C2, the automatic weather station (AWS) and the Zamboni Zappa hut; b The area of study: the stereoscopic reconstruction is focused on the terminal ice cliff (dashed light blue line), while the monoscopic DIC processing from camera C2 image sequence is focused on the upper area of the north lobe (dashed green line)

In the past, several hazardous events originated by the Belvedere Glacier, such as floods and slope instability, threatened the nearby village of Macugnaga and the Zamboni Zappa hut, at 2070 m a.s.l. (Kääb et al. 2004). At the beginning of the twenty-first century, the Belvedere Glacier was characterized by a particular surge-type dynamic (Haeberli et al. 2002); a wave of compression–decompression stresses was generated by an accelerated ice flow derived from the glacier accumulation area. Surface velocities reached up to \(200\,\text{m}\)\({\text{y}}^{-1}\) (Kääb et al. 2004), and the ice thickness increased more than \(20\,\text{m}\).

The northern lobe of the Belvedere Glacier is concurrently experiencing a fast melt; in the past few years, an average retreat of \(\sim\)\(20\,\text{m}\)\({\text{y}}^{-1}\) was documented (Ioli et al. 2022), and it is actively changing every day, due to ice falls and collapses. Therefore, the northern lobe was chosen as a test site for the camera setup.

2.2 The Low-cost Stereoscopic System

The low-cost stereoscopic system consists of two autonomous and independent monitoring units. Each unit includes an off-the-shelf DSLR camera (Canon Eos 1200D), an Arduino microcontroller for camera triggering, and a Raspberry Pi Zero with a SIM card for sending images to a remote server via a GSM network (Ioli et al. 2023a). The instruments are housed in waterproof cases and mounted on tripods anchored to stable rocks along the glacier moraines. Power is supplied by a solar panel combined with a sealed lead-acid battery. The total cost of each unit was less than €2000, including the camera and the lens. The low-cost camera system is described in detail in Ioli et al. (2023a).

The two monitoring stations, labelled as C1 and C2, were installed in summer 2021 on either side of the north tongue of Belvedere Glacier (Fig. 1). The harsh glacier environment with steep and unstable moraines, often subject to rockfall due to glacier retreat, constrained the camera installation location at \(\sim\)\(230\,\text{m}\) (camera C1) and \(\sim\)\(350\,\text{m}\) (camera C2) from the terminal ice cliff, with a strongly convergent pose. Different lenses were used for the two cameras to achieve the same image scale in correspondence of the glacier tongue; C1 was equipped with a 24 mm lens, while a 35 mm lens was used for C2.

The baseline between the two cameras of \(\sim\)\(261\,\text{m}\) ensured a good viewing geometry because of the large parallax between corresponding points. However, the baseline comparable to the camera–object distance (base–height ratio close to one; Table 1) led to complex affinity-like distortions and occlusions between corresponding areas in the two images (Yao et al. 2021). Additionally, C1 was positioned at a lower viewpoint compared to C2, due to site geometry constraints. Therefore, camera C1 provides a limited view that primarily encompasses the frontal ice cliff but does not capture the glacier surface, which is only visible from camera C2 (Fig. 2).

Table 1 Summary of the characteristics of the two cameras. Fields marked with \({}^{*}\) were computed considering the distance between each camera and the ice cliff
Fig. 2
figure 2

Example of a stereo pair acquired on 28 July 2022: a image taken from Camera C1, located on the stream-wise right moraine; b image from Camera C2, located on stream-wise left moraine

The stereoscopic system operated from August 2021 up to December 2022, when it was temporarily unmounted for ordinary maintenance; it was finally mounted again in June 2023. During the operational period, the system was programmed to acquire two images per day, but only one image per day was used for the multi-temporal processing. Only daily images taken during the snow-free period between 01 May 2022 and 13 November 2022 were considered. In this period, the glacier experienced the most significant changes, flow velocity, and ablation rates. On the other hand, during winter and spring seasons, the glacier was covered by snow, making it difficult to extract relevant information from optical images for tasks such as 3D reconstruction and surface velocity estimation.

2.3 UAV Surveys

Two UAV flights, spaced by a 10-day interval, were conducted in summer 2022 to acquire ground-truth data for assessing the proposed methodology. The first UAV flight, labeled UAV‑A, was carried out on 28 July 2022 with a DJI Matrice 300 RTK quadcopter and a DJI Zenmuse P1 camera with a 35 mm lens. During the survey, 436 images were captured, encompassing both nadiral and oblique perspectives, with an average GSD of \(\sim\)\(3\,\text{c}\text{m}\). Additionally, we measured 19 ground control points (GCPs), including both artificial targets and natural features. The GCPs were measured using a combination of a total station Leica MS60 and differential GNSS, employing topographic-grade GNSS receivers Leica GS14 as a rover and Leica 1200 as a local master station. The GCPs included both artificial targets and natural features. During the flight, the UAV was equipped with an on-board RTK GNSS receiver, enabling the acquisition of camera projection centers with decimetric accuracy. The photogrammetric block (Fig. 3a) was processed with the commercial software package Agisoft Metashape (Agisoft 2023) by using 14 GCPs and 5 check points (CPs) to evaluate the block accuracy. The global RMSE evaluated on the CPs was equal to 4.0 cm.

Fig. 3
figure 3

a UAV-A block and b UAV‑B block processed with Agisoft Metashape. The flags represent the targets used either as GCPs (flags with yellow markers) in the BA or as CPs (flags with red markers) to evaluate the quality of the photogrammetric block

The second UAV flight, UAV‑B, was carried out on 05 August 2022 with a DJI Phantom 4 RTK, focusing on a smaller portion of the northern lobe of the Belvedere Glacier. However, due to technical constraints, moving GCPs located inside the glacier body were not measured again. Therefore, only the fixed targets located outside the glacier were employed. Among these, eight targets were designated as GCPs, while the remaining four were used as CPs. The photogrammetric block encompassed 428 nadiral and oblique images, again with an average GSD of \(\sim\)\(3\,\text{c}\text{m}\) (Fig. 3b). A global RMSE of 7.5 cm was obtained on the 4 CPs.

2.4 Meteorological Monitoring Station

To analyze the correlation between the Belvedere Glacier dynamics and external environmental variables, data measured from an AWS located close to the Zamboni Zappa hut were used. The AWS is located at an altitude of \(2075\,\text{m\,a.s.l.}\) and at a distance of 2 km from the area of study. In our study, we analyzed mean daily values of air temperature, precipitation, and incoming solar radiation from 01 May 2022 to 15 November 2022.

3 Methodology

This chapter outlines the methodology developed to monitor the evolution of the Belvedere Glacier northern lobe. The daily images were processed using a framework consisting of two parallel processing chains (Fig. 4): (i) a photogrammetric-based stereoscopic approach; (ii) a DIC-based monoscopic approach. The daily stereo pairs were used to generate 3D models of the glacier terminus, enabling the estimation of ice volume loss on a daily basis by computing point cloud differences along the main flow direction. The main flow direction at the north lobe was identified as the mean direction of the average surface velocity field derived by UAV orthophotos between 2015 and 2022 (Ioli et al. 2022). However, due to limited overlapping views of the two cameras, the stereoscopic approach primarily provides a 3D reconstruction of the terminal ice cliff (dashed-blue area in Fig. 1b). Consequently, deriving the 3D surface velocity field of the glacier solely from photogrammetry was not feasible. The image sequence captured by camera C2, offering a higher viewpoint and broader coverage of the glacier surface, was employed to determine the glacier’s surface velocity over a larger area of the north lobe of the Belvedere Glacier (dashed-green area in Fig. 1b).

Fig. 4
figure 4

General workflow with two parallel processing chains involving stereoscopic reconstruction of terminal ice cliff from stereo pairs of images to derive ice volume losses at the glacier terminus and glacier retreat and monoscopic digital-image correlation to derive surface velocities

3.1 Image Selection

Once the acquired images were received from the monitoring system, an automatic selection of the images was performed to exclude the ones acquired in rainy, foggy, or poor lighting conditions. This selection was based on the analysis of images median entropy (Tsai et al. 2008); the images with a low value over the entire image were rejected. Additionally, a visual inspection was performed on the entire image dataset. The inspection had the following objectives: (i) to select only one image per day; (ii) to reject poor quality images that were not automatically rejected, and images in which the fixed targets placed along the moraines were not visible; (iii) to detect sudden changes in the morphology of the glacier terminus (e.g., icefall).

3.2 Camera Calibration

Each camera was mounted with all its electronics inside a waterproof case and protected by a neutral filter that was glued to the case and fixed in front of the cameras. Since the filter introduced additional distortion, the cameras need to be calibrated inside the boxes to reproduce the final setup. Therefore, a two-step approach was taken: first, a 120 cm \(\times 70~\) cm calibration board with a checkerboard printed on it was used to estimate an initial set of parameters for the interior orientation (Zhang 2000). To this end, the cameras were mounted on tripods inside their cases while the calibration board was moved and rotated in front of the camera to simulate a convergent hemispheric acquisition. For each camera, about 30 images were collected and processed in Agisoft Metashape to obtain a first estimate of the interior orientation. However, the average camera-to-panel distance was much smaller than the actual camera-to-glacier distance. Therefore, a refinement of the calibration was performed in situ by incorporating the stereo pair acquired by the cameras on 28 July 2022, within the UAV-A block, carried out on the same day. This block was processed with Agisoft Metashape to refine the interior camera orientation of the stereo cameras, aided by the increased robustness of the block given by the additional matches between the two images taken by cameras C1 and C2 and the UAV.

An additional challenge was represented by camera interior orientation stability over time. From experimental evidence, the interior orientation parameter that suffered the most because of temperature variations that occurred in mountain environment was the camera principal distance (Elias et al. 2020), while the other parameters remained more stable during time. To mitigate the impact of temperature-induced variations, it was crucial to incorporate camera self-calibration during the stereoscopic processing at each epoch to refine the pre-calibrated principal distance, following the procedure detailed in Sect. 3.4.3.

3.3 Camera Stability and GCPs

Since the two cameras were mounted on topographic tripods, the stability of the cameras was not perfectly guaranteed. In particular, the two cameras experienced small vibrations around their pivots due to wind gusts. On the other hand, the position of the cameras was constrained by topographic heads that kept the position of the camera center constant to the centimeter level. Therefore, the baseline of the cameras can reasonably be considered as constant, with a value of \(261.55\,\text{m}\). Further, camera angular vibrations implied that the relative orientation of the cameras must be estimated at each epoch. Moreover, to fix the world reference system over time, absolute orientation of the stereo model was required. To this end, the position of the cameras was measured in situ using a topography-grade GNSS receiver in real-time kinematic (RTK) positioning. In addition, four GCPs were materialized with plastic targets, anchored to stable rocks in front of the terminal ice cliff and along the streamwise left moraine (Fig. 1a). While the minimum requirement to estimate a Helmert transformation would have been just the two cameras’ location and one GCP, having a redundant number of GCPs overcomes the possibility that on some days not all GCPs were clearly visible in the images due to low clouds or fog. Additionally, GCPs can be included into the BA to refine the cameras’ interior orientation, and in particular the camera’s focal length.

As the cameras are subjected to slight rotations, the image coordinates of the GCPs’ projections must be detected at each epoch. To this end, a feature tracking routine was developed based on the ImGRAFT TemplateMatch method (Messerli and Grinsted 2015), enabling the detection of a template defined on a reference image within a target image through DIC. The routine utilized the orientation correlation algorithm (Fitch et al. 2002), providing increased robustness against illumination changes and achieving sub-pixel accuracy (Dematteis and Giordan 2021; Heid and Kääb 2012).

3.4 Stereoscopic Image Processing Workflow

For the daily stereoscopic reconstruction of the glacier snout, a Python-based toolkit ICEpy4D was developed (Ioli et al. 2023c). ICEpy4D allows for solving different steps of scene reconstruction (Fig. 5). In particular, the main steps consist of: (i) finding corresponding points by deep learning feature matching; (ii) tracking features on sequences of images acquired from the same camera; (iii) performing 3D scene reconstruction and point cloud post-processing. The ICEpy4D code is available in the repository https://github.com/franioli/icepy4d. All the processing was carried out with a mid-class workstation equipped with a 20-core i7-12700 CPU, 32 Gb of DDR5 RAM and a GPU NVIDIA RTX A2000 12 GB.

Fig. 5
figure 5

Scheme of the stereoscopic workflow performed with ICEpy4D. At a generic epoch \(i\), new features are extracted and matched. At the same time, features from the previous epoch \(i-1\) are tracked on the current epoch images. After geometric verification, features successfully matched are used for 3D scene reconstruction. Point clouds obtained on different days are used to compute ice volume differences and glacier retreat

3.4.1 Wide-baseline Image Matching

To find corresponding points, the combination of DL feature-matching algorithms SuperPoint (DeTone et al. 2018) and SuperGlue (Sarlin et al. 2020) was used and fully integrated into ICEpy4D package. SuperPoint is a convolutional neural network (CNN) with an encoder–decoder structure, which detects interesting points and computes 256-value descriptors in a single forward pass. DeTone et al. (2018) trained SuperPoint to detect corner-like features first using corners of synthetic shape datasets as ground truth. They employed a self-supervised approach by applying random homographies to warped copies of the input training images and combining the results (DeTone et al. 2018). SuperGlue is an attention-based graph CNN used for local feature matching. It was specifically designed to match SuperPoint features based on their descriptors. Sarlin et al. (2020) trained SuperGlue in a supervised manner and published two different sets of weights: one for indoor environments and another for outdoor environments. The outdoor environment model was trained on the Megadepth dataset (Li and Snavely 2018). The SuperPoint and SuperGlue models were not fine-tuned to ensure the replicability of the proposed pipeline in different scenarios, without the need for a dedicated ground-truth dataset.

Since feature matching was performed on a GPU, which has limited memory, it was necessary to decompose the full-resolution images into smaller tiles of a maximum size of \(2000\times 1500\,\text{px}\) and match pairs of tiles one after the other. Therefore, a two-step approach was implemented: first, matching was performed with SuperPoint and SuperGlue on downsampled images. Subsequently, the full-resolution images were subdivided into regular tiles, and only the tiles that had corresponding features in the low-resolution images were selected as candidates for a second matching step. In the second step, the selected tiles were matched using the same procedure as before. The features matched in each tile were then reassembled to recover their image coordinates on the original image for geometric verification. Incorrectly matched keypoints, which did not satisfy the epipolar constraint, were rejected using Pydegensac (Mishkin et al. 2015), by imposing a maximum re-projection error of 1.5 px.

3.4.2 Across-epoch Matching

Features matched across two images acquired by the two cameras in the same epoch (intra-epoch matching) are tracked over time (across-epoch matching). This approach has two main advantages: (i) increasing the number of matches at each epoch; (ii) obtaining a time-series at each epoch; (ii) obtaining a time series of corresponding points that can be triangulated to derive their triangulated to derive their 3D coordinates over time. The procedure is illustrated in Fig. 6. For brevity, the images are labeled with the camera labels \(C1\) and \(C2\), while a superscript indicates the acquisition epoch. For a generic epoch \(i\), intra-epoch matches are found between images \(C1^{i}\) and \(C2^{i}\) (see Sect. 3.4.1). The intra-epoch matched features at epoch \(i\) are then tracked across epochs on the images acquired at epoch \(i+1\) for each camera independently. Across-epoch matching is again performed using SuperGlue. If a feature is successfully matched across epochs between \(C1^{i}\) and \(C1^{(i+1)}\) and between \(C2^{i}\) and \(C2^{(i+1)}\), then this feature is considered a valid match also for epoch \(i+1\) and merged with the new intra-epoch matches extracted at epoch \(i+1\). If there are duplicate matches (e.g., the same feature is matched intra-epoch and also tracked across epochs), the duplicate matches are removed. Finally, a new geometric verification is performed to reject outliers.

Fig. 6
figure 6

Scheme of the across-epoch matching procedure: at each epoch, new intra-epoch matches are extracted (horizontal lines), but features are also tracked across epoches (diagonal lines). At epoch 0, only intra-epoch matches are extracted (red triangles). At epoch 1, features matched at epoch 0 are tracked across epoches (red triangles), but new intra-epoch matches are also extracted (green triangles), Similarly, at epoch 2, new across-epoch matches are extracted (yellow pentagons), but features matched at epoch 0 and 1 are also tracked across epoches (red and green triangles)

3.4.3 3D Scene Reconstruction

At each epoch, the reconstruction of the terminal ice cliff was carried out using ICEpy4D in two main steps: (i) relative-absolute orientation, and (ii) bundle adjustment. In the first step, matches between each stereo pair were utilized to estimate the relative pose of the two cameras, based on the camera’s pre-calibrated intrinsic matrix. Subsequently, the absolute orientation was determined by calculating a Helmert transformation, considering the cameras’ locations and the available GCPs. In the second step, a full BA was performed to refine the relative–absolute solution. For this purpose, the Agisoft Metashape software package was employed via its Python API. The choice of Agisoft Metashape was driven by its support for GCPs, the possibility to assign different a‑priori weights to observations, and the availability of a Python API, enabling the software to be run from a Python routine in headless mode. The Metashape BA was seamlessly integrated into ICEpy4D, enabling the entire workflow chain to run automatically without any interaction with the GUI.

During the BA, the cameras’ locations and the available GCPs were used as observations. A centimetric a‑priori accuracy was assigned to the world coordinates of the GCPs. A centimetric accuracy was also given to the camera locations to constrain their positions, while still allowing for an estimation of the cameras’ rotations. Additionally, during the BA process, the principal distance of the two cameras was refined by self-calibration based on the available GCPs, while the remaining interior orientation parameters were kept fixed at their pre-calibrated values.

The dense reconstruction of the ice cliff terminus was again carried out using the Agisoft Metashape API. In fact, although Agisoft Metashape could not perform feature matching with wide baselines due to its reliance on hand-crafted features, it was effective in performing dense matching through its own dense matching algorithm (probably based on a variation of the Semi-Global Matching algorithm (Hirschmüller et al. 2012), when accurate camera poses were available for image rectification. Depth maps were generated from full-resolution images (i.e., the highest quality parameter in Agisoft Metashape) with mild filtering. The estimated depth maps were then used to reconstruct a dense point cloud and a triangulated mesh.

3.5 Volume Variation Estimation

The daily variation of ice volume was determined by calculating the DEM of difference (DOD) along the streamwise direction. A local reference system was established by defining the X‑direction as the streamwise direction, the Z‑direction as the geodetic height, and the Y‑direction to complete the right-hand reference system. The point clouds of two different epoches were rasterized to a reference YZ-plane, corresponding to the frontal view of the terminal ice cliff, using a grid step of 0.3 m, while maintaining the same raster origin and extent (i.e., the two point clouds were first clipped with the same convex polygon). The volume change between two epochs was then calculated by differentiating the two rasters cell-by-cell (Fig. 7). To account for possible incomplete coverage and holes in the 3D reconstruction, which may result in volume underestimation, the computed volume difference was normalized by the percentage of filled cells (i.e., cells with at least one point in the original point cloud) in both rasters. The normalization was computed as in Eq. 1:

$$dV_{\text{norm}}=dV\times\frac{n_{\text{filled}}}{n_{\text{total}}},$$
(1)

where \(n_{\text{filled}}\) is the number of filled cells in both the rasters and \(n_{\text{total}}\) is the total number of cells within the clipping polygon. During the study period considered, the percentage of filled cells in the rasterized point clouds of the terminal ice cliff ranged from 90% to 98.8%, with a mean value of 96.8%.

Fig. 7
figure 7

Scheme of the DEM of difference (DOD) approach used to compute volume variations at the glacier terminal ice cliff. Two point clouds built at epoch i and i+dt (we used dt=5 days to increase the signal-to-noise ratio), were rasterized to a vertical YZ-planes, with normals parallel to the glacier flow direction (X-direction). Before rasterization, the point clouds are clipped with the same convex polygon in the YZ-plane (red line). The colors of the grids represent the depth of each cell of the two rasters along the X direction (i.e., the distance along X from a reference plane). Volume variation was computed by DOD of the two rasters along X

3.5.1 Temporal Lag Between Point Clouds for Volume and Velocity Estimation

To estimate volume variation and surface velocity, the point clouds were processed using pairs spaced 5 days apart. The choice of this interval is consistent with an estimated annual retreat of the terminal ice cliff of about \(\sim\)\(20\,\text{m}\)\({\text{y}}^{-1}\) derived from annual photogrammetric UAV surveys conducted on Belvedere Glacier from 2015 to 2022 (Ioli et al. 2022, 2023b). This results in an average daily retreat of \(\sim\)\(5.5\,\text{c}\text{m}\)\({\text{d}}^{-1}\). Given the decimetric accuracy of the stereo point clouds, as detailed in the results presented in Sect. 4.6, the 5‑day interval was considered appropriate to achieve a favorable signal-to-noise ratio. However, in case of adverse weather conditions, which hindered stereo reconstruction and resulted in the unavailability of a point cloud on a certain date, the closest one acquired within a range between 5 to 7 days earlier was chosen as a substitute. This approach ensured a balance between preserving temporal resolution and minimizing data gaps caused by unfavorable weather conditions.

To ensure consistency between volume variations and surface velocities, the same pairs of days utilized for estimating volume differences from point clouds were also employed for selecting pairs of images to compute surface velocities using DIC on monoscopic image sequences (see Sect. 3.7).

3.6 Automatic Extraction of Ice Cliff Top Edge

To calculate the glacier retreat over time, the top edge of the ice cliff was chosen as a reference point. This decision was made because the top edge remained rather stable and experienced limited changes in morphology compared to the rest of the terminal ice cliff, which is often affected by deformation and other processes that altered its shape, such as ice block collapses.

The top edge of the ice cliff was automatically extracted from point clouds, by exploiting geometric features of each point with respect to its neighborhood (Hackel et al. 2016). To identify elongated features situated at the edges of the ice cliff within the point cloud, the linearity and normal change rate were computed for each point with respect to a neighborhood sphere of radius 2 m. By employing empirically determined thresholds specifically tailored to the study site, these geometric properties facilitated the identification and extraction of the top edge of the glacier terminus from the point cloud data.

3.7 Digital Image Correlation from Single Cameras

DIC allows for determining the displacement of an image patch between two images (master and slave) of the same scene and acquired at different epochs by the same camera. The displacement \(d\) is obtained as (Eq. 2):

$$d=\text{argmax}_{(r,q)}S(A(i,j),B(i+r,j+q)),$$
(2)

where \(S\) is a similarity function, \((i,j)\) are the coordinates of the master patch \(A\), and \((r,q)\) are the coordinates of a search area around the slave patch \(B\). In our study, we adopted the open-source Local Adaptive Multiscale image Matching Algorithm (LAMMA) written in Matlab (Dematteis et al. 2022). LAMMA adopts a hierarchy structure of patch grids of increasing spatial resolution and uses locally adaptive search area sizes, according to the displacements already obtained in the neighboring region. LAMMA allows faster computation and reduces the occurrence of outliers compared to traditional DIC processing (Dematteis et al. 2022). As a correlation function, we used the cosine similarity applied to orientation images (Dematteis and Giordan 2021), which is less sensitive to changes in the shadow pattern and is known to perform well in glacier environments (Heid and Kääb 2012).

We applied DIC to the undistorted images, which were corrected according to the interior orientation parameters estimated at every epoch within stereo BA. Moreover, assuming a zero translation of the camera perspective centers, but only small rotations (Sect. 3.3), the transformation between the image plane at an arbitrary epoch with respect to a reference one (named master) is described by a homography transformation. As master image, we adopted the image taken on 28 July 2022, which was coeval with the DSM built from the UAV-A survey. All the other images acquired were warped to be coregistered to the master image, by computing a homography transformation \(H\) (Eq. 3):

$$H=KRK^{-1},$$
(3)

where \(K\) is the camera intrinsic matrix and \(R\) is the relative rotation between the image planes (Förstner and Wrobel 2016). The relative rotation \(R\) between the camera pose at a generic epoch and the pose of the master camera was known from the stereo processing.

The same image pairs used for volume variation computation, spaced by a time lag of 5 days, were selected for DIC (Sect. 3.5.1). The resulting displacement maps on the image plane, in pixel units, were finally post-processed using a local filter to remove outliers. This filter calculates the average reciprocal vectorial distance of all the displacement vectors in a \(3\times 3\) sliding window, centered on a given node. If the mean distance of the specific node does not belong to the lowest four values, it is replaced with the median of these four vector displacements; otherwise it is maintained as it is.

To geocode the displacement maps, we used the open-source ImGRAFT Matlab toolbox (Messerli and Grinsted 2015). Given a set of camera interior parameters (from stereo automatic calibration), ImGRAFT allows the projection of 3D world coordinates \((x,y,z)\) into the 2D image coordinates and the back projection of image coordinates onto a DSM, expressed as \(f(i,j)\to(x,y,z)\). Thereby, the metric conversion of the image displacement vectors, with components \((du,dv)\), is obtained by \(v(dx,dy,dz)=f(i+du,j+dv)-f(i,j)\), where \(v\) is the 3D displacement vector. The \(f\) function is a form of ray tracing, carried out based on the input DSM and the interior and exterior camera parameters (Messerli and Grinsted 2015). To estimate the transformation between 2D and 3D coordinate systems, we adopted the DSM acquired by UAV photogrammetry on 28 July 2022, re-sampled at 1 m resolution to smooth small local DSM variations.

3.7.1 Orthorectification Uncertainty

The orthorectification of the 2D DIC displacement vectors to obtain 3D vectors is influenced by the chosen DSM and its associated uncertainty (Travelletti et al. 2012). Ideally, using an updated DSM derived from daily stereo processing, as performed in Marsy et al. (2020), would be the optimal approach. However, due to the camera’s viewing location, it was not feasible to reconstruct the daily DSM of the upper surface of the glacier lobe. Consequently, we utilized the UAV- A DSM acquired on 28 July 2022.

To estimate the uncertainty derived by the use of a single DSM over the study period, we conducted the following experiment: using the DSM of 28 July 2022 as the reference, we virtually translated the glacier DSM along the flow direction to create two synthetic DSMs representing the most advanced and the most retreated positions that the glacier snout experienced in 2022. The selection of these positions was based on the actual results of the stereo processing (Sect. 4.5): the most advanced location of the snout was approximately 12 m downstream (May 2022), while the most retreated location was about 5 m upstream (November 2022). This rigid translation of the glacier body not only affected the front position but also resulted in a variation in the glacier surface elevation, i.e., thinning in the case of retreat. Subsequently, we back projected the DIC displacement values of 28 July 2022 onto the original DSM and the two synthetic DSMs. Finally, we compared the metric displacement values considering the median absolute deviation (MAD) between the original and the synthetic results.

3.8 Correlation Between Glacier Dynamics and Meteorological Variables

We compared the behavior of glacier velocity and ice-volume loss at the snout with the mean daily air temperature, precipitation, and incoming solar radiation measured by the AWS located 2 km from the glacier terminus (Sect. 2.4). Since both volume loss and velocity were obtained by comparing point clouds and images acquired with a temporal lag of 5 days, we applied a moving average with a window of 5 days to the weather data. Subsequently, to filter out the main signal due to the seasonal trend, we detrended the time series by subtracting a robust loess smoothing function (Cleveland 1979) evaluated on a period of 180 days. Finally, we calculated the Spearman correlation with the original data and obtained residuals, which indicates how well the relationship between two variables can be described using a monotonic function. We adopted the Spearman correlation because it is more robust against outliers compared to the linear correlation and because it allows the modelling of any kind of correlation (i.e., linear and non-linear) between meteorological variables, ice volume losses, and glacier surface velocities.

4 Results

4.1 Image Acquisition

During the snow-free study period (1 May 2022–13 November 2022), both camera C1 and camera C2 were able to acquire images every day, for a total of 197 images. However, 39 images were discarded due to bad weather conditions, such as rain, low clouds, or fog, resulting in 158 days of valid data for stereo and monoscopic processing (Fig. 8).

Fig. 8
figure 8

Days with images acquired by the two cameras C1 and C2. Valid shots are represented with dark colors, while discarded days due to bad weather conditions are represented with light colors

4.2 Automatic Detection of GCPs

Automatic target tracking of GCPs in monocular image sequences allowed successful tracking of GCPs in 99% of image sequences from both the cameras. Only in 2 out of 158 epochs did the template matching fail, mostly due to the presence of clouds covering the targets or fog reducing the image contrast. In these cases, manual collimation of the target was required. To validate template matching results, manual collimation was performed on all the images. For camera C1, the MAD between the image coordinates of the automatically tracked targets and the manually collimated one was 0.46 px, with a standard deviation of 0.25 px. Similarly for camera C2, the MAD was 0.55 px, with a standard deviation of 0.23 px, denoting a sub-pixel accuracy of the template matching routine.

4.3 Wide-baseline Feature Matching and Tracking

The features matched by ICEpy4D on each stereo pair extended over the entire frontal ice cliff, with some features detected in overlapping regions along the streamwise left moraine (Fig. 9), near targets T1 and T2 (see Fig. 1b). For each epoch, a number of features ranging from 1000 to 3500 were successfully matched and validated following geometric verification (Fig. 10a). Among them, approximately 30–40% of the matched features were effectively tracked from previous epochs, increasing the total number of matches available for camera pose estimation while establishing a linkage between consecutive epochs.

Fig. 9
figure 9

Example of features successfully matched on the stereo pair acquired on 27 September 2022

Fig. 10
figure 10

a Number of valid matches extracted from the stereo pair at each epoch and number of points tracked from the previous epoch. b Average and standard deviation of the reprojection error obtained by projecting the 3D coordinates of the tie points to the images. The average reprojection error was computed as the mean reprojection error for all the points on the two images

After BA, the global average reprojection error was computed by projecting the 3D coordinates of all the tie points to the two images and averaging the norm of the reprojection errors for all the points. Figure 10b show the global reprojection error, which exhibited a consistent fluctuation around 0.45 pixels (px), with a standard deviation of approximately 0.3 px.

4.4 3D Scene Reconstruction

A total number of 158 point clouds was obtained from the stereo processing, corresponding to the days with valid images (Sect. 4.1). At each epoch, the dense point cloud consisted of 5 to 7 million points, with a spacing of \(\sim\)\(3\,\text{c}\text{m}\) at the terminal ice cliff, which is comparable to the average image GSD on the ice cliff. Due to the wide baseline and the different camera viewpoints, only the sub-vertical part of the terminal ice cliff and a few boulders near the top of the ice cliff were reconstructed.

For a clear representation, Fig. 11a shows a point cloud for each month. The figure highlights a total glacier retreat of approximately \(\sim\)\(17.5\,\text{m}\). Four vertical cross sections, shown in Fig. 11b‑e, were extracted parallel to the streamwise direction at specific locations identified in Fig. 11a. The accelerated ice ablation rate that occurred during the summer, particularly from June to September, is evident in the leftward and downward shifts of the cross sections (i.e., light green, light blue, and red curves) compared to those in the fall (i.e., yellow, dark green, and dark blue curves). In particular, cross section AA’ (see Fig. 11b) captures an ice block collapse in August. This can be seen as the rightmost sections detach from the main ice cliff body and slide downward and to the right from May to August. This behavior is consistent until the September (i.e., yellow profile), where the detached ice block is no longer present. The remaining sections of the profile show a leftward movement of the ice cliff, highlighting a glacier retreat. Cross section BB’ (Fig. 11c) depicts the deformation of the ice cave at the center of the terminal ice cliff, where the Anza river originates. Initially extending about \(5\,\text{m}\), the ice cave was reduced to a small cavity between May and July due to ice melting, resulting in a more vertical terminal ice cliff. Cross sections CC’ and DD’ (Fig. 11d–e) show the ice melting and glacier retreat. The effect is much more pronounced during the summer months, from May to September, and decreases by an order of magnitude during the fall season.

Fig. 11
figure 11

a Subset of the series daily dense point clouds built at the beginning of each month from 01 May 2022 to 01 November 2022. The basemap is derived from a previous UAV survey carried out by the authors in July 2022 and serves as a visual reference. b–e Vertical cross sections extracted at the location marked in a and named respectively Sez. AA’, Sez. BB’, Sez. CC’ and Sez. DD’. All the cross sections are extracted in a local reference system, with the X‑direction pointing in direction of the glacier flow, the Z‑direction pointing upward, and the Y‑direction completing the right-hand reference system. Please note the different scales of the figures, indicated by the scale bar (in meters)

4.5 Volume Variations and Glacier Retreat

Estimation of ice volume variation was performed by DOD along the streamwise direction on a daily basis (Sect. 3.5). Figure 12 presents the daily ice volume variations and the cumulated curve, highlighting the varying ablation rate throughout the study period. Every value of the time series is referred to the mean instant between the date of the two point clouds from which it was derived (see Sect. 3.5.1). The ice loss rate increases during summer and significantly reduces in autumn, particularly after mid-September. A total ice loss of \(63\,000\,\text{m}\)\({}^{3}\) was observed from 01 May 2022 to 13 November 2022.

Fig. 12
figure 12

a Daily ice volume lost at the glacier terminus, estimated by DOD of pairs of point clouds spaced by 5 days. b Cumulative curve of the ice volume loss during the study period

The daily glacier was estimated continuously tracking a point located at the center of the top edge of the terminal ice cliff, extracted as described in Sect. 3.6. The glacier retreat curve is shown in Fig. 13b and, similarly to that of volume variations, it shows a higher rate of retreat during the summer months, while the retreat velocity decreased between September and November. From 01 May 2022 to 13 November 2022, a retreat of 17.8 m was estimated.

Fig. 13
figure 13

a Location of the segmented ice cliff top edge at the beginning of each month from May to November 2022. The basemap is derived from a previous UAV survey carried out by the authors in July 2022 and serves as a visual reference; b Estimation of daily retreat based on the displacement of the top edge along the flow direction (X-direction). The location of the ice cliff at 01 May 2022 is considered as baseline for the retreat. The orange marker indicates the day of the UAV-A survey

4.6 Validation of the Stereo Models with UAV Data

To validate the stereo models, both in terms of internal geometric and absolute georeferencing errors, the dense stereo point clouds were compared with those obtained from UAV-A (28 July 2022) and UAV‑B (05 August 2022) flights. The signed difference between the UAV point clouds and the stereo point clouds acquired on the same day were computed using the M3C2 algorithm (Lague et al. 2013), in CloudCompare. M3C2 first computes a 3D local surface normal for each core point at a scale D, by fitting a plane to the core points located within a radius of size D/2. Once the normal vectors are defined, the local distance is computed for each core point as the mean distance of the target points that fall inside a cylinder oriented as the normal vector, projected to the cylinder axis.

The comparison with the UAV-A point cloud resulted in a non-significant mean difference between the point clouds of 0.01 m, with a standard deviation of 0.04 cm, indicating absence of systematic errors in stereo point cloud georeferencing. The largest differences were observed at the edges of the stereo point cloud and inside the ice cave. Considering the UAV‑B block, the mean difference was 0.05 m, with a standard deviation of 0.03 m. The non-negligible mean difference was likely due to a combination of the georeferencing accuracy of the stereo point cloud and the georeferencing error in the UAV model, which was in the order of few centimeters (see Sect. 2.3). UAV‑B block, in fact, had limited coverage in the surveyed area and fewer GCPs compared to UAV-A block. Overall, from both the comparison showed an error of stereo models smaller than 10 cm. Therefore, a decimetric level of precision can reasonably be considered for the stereo models.

4.7 Glacier Surface Velocity and Morphology

DIC displacement maps allow for evaluating the surface kinematics of the glacier and the moraines (Fig. 14a). We orthorectified the results only in the area of the right lobe up to 1 km of distance from the terminal ice cliff, covering most of the north lobe of the glacier. Since the morphology of the frontal ice cliff evolved and moved significantly during the year, we did not orthorectify the displacement vectors in this area (Fig. 14b).

Fig. 14
figure 14

a DIC displacement map computed by DIC between 25 and 30 July 2022, with vectors expressed in px/day. b Orthorectified displacement map computed by DIC between July 25 and 30 2022, with vectors expressed in meters/day; The two triangles indicate the points P1 and P2 where we extracted the velocity time series. (c, d) Glacier domains characterized by specific morphology and kinematics. Beside the glacier body, on the orographic right, the main moraine ridge is visible, currently partially vegetated. The debris talus is the portion of the moraine destabilized by glacier movement. The mobilized moraine deposit is composed of moraine material that has collapsed onto the glacier surface and is moving downhill with the glacier

From the displacement maps of the oblique (Fig. 14a) and georeferenced images (Fig. 14b), it was possible to identify different domains, based on their morphology and kinematic behavior (Fig. 14c,d):

  1. 1.

    The frontal ice cliff of the glacier, which was the portion more prone to ablation. The slope was very steep, and several ice and rock falls occurred, particularly at the upper edge, at the limit with the rear part of the glacier.

  2. 2.

    The debris-covered main body of the glacier, covering the last 400 m of the lobe. This region was gentler, and the velocity was higher in the central part.

  3. 3.

    The mobilized morainic deposit, where the lack of lateral pressure from the glacier due to the loss of volume destabilized the original moraines. The internal part collapsed onto the glacier and moved downslope along with the glacier.

  4. 4.

    The debris talus resulting from erosional processes on the moraine, where the boundary between the glacier and the stable morainic deposit was partially covered by a debris deposit formed as a result of superficial erosional processes on the steep morainic side.

Compared to the glacier main body, the two latter domains moved at progressively lower rates, with the morainic talus showing the slowest velocity. Besides the kinematic regimes, these domains are clearly distinct due to long longitudinal fractures indicating displacement gradients. In the oblique images (Fig. 14c), it is possible to recognize other portions of the glacier outside the area of interest of this study, including a lower crevassed area and the icefall that feeds the glacier lobes.

We considered the displacement time series of two points on the glacier surface: one at 10 m from the frontal ice cliff at the end of the considered period (P1) and one in the central part approximately 120 m from the front (P2), as indicated in Fig. 14a,b. The kinematics of such points is representative of the respective domains. The spatial pattern of displacement (Fig. 14a,b) did not change during the period of study. The orthorectified 3D components of daily velocity and the velocity module, calculated between pairs of images spaced 5 days apart (see Sect. 3.5.1), are shown in Fig. 15. The time series exhibit similar behavior, with a Spearman correlation coefficient of 0.88, and show comparable velocity values between May and the first half of June and from the second half of September until the end of the period. However, P1 has a 30% higher velocity during the warm season, with values ranging from \(0.15\,\text{m}\)\({\text{d}}^{-1}\), occurring on a few days during the second week of August and at the beginning of September, to \(0.23\,\text{m}\)\({\text{d}}^{-1}\) in mid-July. The south–north velocity component is the highest, varying during the warm season between \(0.10\,\text{m}\)\({\text{d}}^{-1}\) to \(0.18\,\text{m}\)\({\text{d}}^{-1}\) (P1) and \(0.06\,\text{m}\)\({\text{d}}^{-1}\) to \(0.15\,\text{m}\)\({\text{d}}^{-1}\) (P2). The east–west component is approximately \(0.05\,\text{m}\)\({\text{d}}^{-1}\) lower, while the vertical component assumes smaller values, approximately \(0.03\,\text{m}\)\({\text{d}}^{-1}\).

Fig. 15
figure 15

Time series of displacement extracted in the terminal part of the glacier lobe. The red curves correspond to point P1, located at 10 m from the terminal ice cliff (considering the location of the terminal ice cliff on 28 July 2022), the blue curves correspond to point P2 located 120 m far from the ice cliff. The location of points P1 and P2 is marked in Fig. 14

4.8 Velocity Orthorectification Uncertainty

To estimate the error related to DSM variations, we simulated two different snout positions, as described in Sect. 3.7.1. The first simulated snout position was 12 m downstream of the location of the terminal ice cliff on 28 July 2022, which was the most advanced front position at the beginning of the season (Fig. 13b). The second simulated snout position was 5 m upstream, representing the most retreated position at the end of the season (Fig. 13b). On average, the different front locations caused a glacier elevation change of \(+1.76\,\text{m}\) and \(-0.88\,\text{m}\) for the advanced and retreated positions, respectively. The differences between the velocity vectors orthorectified with the true and the simulated DSMs are summarized in Fig. 16. The velocity vectors considered were measured by DIC between 25 and 30 July 2022, during a 5-day period centered around the acquisition of the UAV-A DSM of 28 July 2022. It was observed that the velocity vertical component \(v_{Z}\) was the least influenced, with a normalized median absolute deviation in the Z direction (MAD\({}_{z}\)) of \(2.6\%\) for the advanced simulated position and \(1.3\%\) for the retreated position. The MAD of the east and north planimetric components \(v_{E},v_{N}\) were similar and approximately \(2.5\) times higher than MAD\({}_{z}\), while the normalized MAD of the velocity modules were \(5.2\%\) and \(2.6\%\) (advanced and retreated positions).

Fig. 16
figure 16

Uncertainty analysis of velocity vector orthorectification against possible DSM variation. Blue boxes represent the difference between the velocity vectors back-projected on the true DSM of 28 July 2022 and those back-projected on a simulated DSM advanced downstream by 12 m (as it was in May 2022). Red boxes represent the difference between the true velocity vectors and those obtained with a simulated DSM with the glacier front retreated 5mupstream (as it was in November 2022). \(v_{E}\), \(v_{N}\), \(v_{z}\), and \(v\) are, respectively, the three components in the east, north, and height directions and the magnitude of the differences of the velocity vectors

4.9 Comparison Between Surface Velocity, Frontal Ice Loss and Meteorological Variables

We compared the time series of ice volume variation \(dV\), obtained with the stereo cameras, and the magnitude of the surface velocity \(v\) obtained by the monoscopic camera approach at the locations of points \(P1\) and \(P2\) (labeled \(\mathbf{v}_{P1}\) and \(\mathbf{v}_{P2}\)) with the series of daily meteorological measurements acquired by the AWS. We observed a significant relationship between the daily average air temperature \(T\), glacier surface velocity, and volume variations, while precipitation and incoming solar radiation were decorrelated. Figure 17 shows the time series and the seasonal trend. All the series exhibit the same behavior, which is in phase. To remove the main seasonal trend and analyze the detrended signal, we subtracted the robust LOESS fit (Cleveland 1979) from the original series and computed the Spearman correlation coefficients for each pair of variables (Table 2). The correlations were always larger than \(0.49\) for the detrended signal and larger than \(0.81\) for the original series.

Fig. 17
figure 17

Time series of the daily volume loss due to the glacier terminus retreat, compared to the time series of glacier surface velocity extracted by monoscopic DIC at the location of point P1 (located close the ice cliff terminus and marked with an orange triangle in Fig. 15a) and the 5‑day smoothed time series air temperature measured by the AWS at the location marked in Fig. 1

Table 2 Spearman correlation coefficients between original (\(\rho_{S}\)) and detrended \(\rho_{D}\) signals for the daily volume variation (\(dV\)), mean air temperature (\(T\)), and surface velocity of points \(P1\) and \(P2\) (\(\mathbf{v}_{P1}\) and \(\mathbf{v}_{P2}\), respectively). In all the cases, the p‑value was \(<10^{-5}\)

5 Discussion

5.1 Hand-crafted vs Deep Learning Matching

In the case of the Belvedere Glacier, the wide baseline between the two cameras posed a challenge for traditional hand-crafted matching techniques, which failed to find sufficient corresponding points for estimating camera poses. Traditional feature matching, including commercial solutions like Agisoft Metashape or open-source solutions such as COLMAP (Schoenberger et al. 2016) or MICMAC (Rupnik et al. 2017), yielded a limited number of poorly distributed tie points, ranging from a dozen to a hundred tie points. In all the cases, the tie points were concentrated mainly in the central part of the ice cliff, where the ice cliff was nearly vertical, and it was not possible to determine the reliably camera relative orientation.

On the other hand, DL feature matching algorithms outperformed traditional hand-crafted methods, successfully finding a significant number of corresponding features. SuperPoint and SuperGlue matched between 1000 and 3500 features, which were well distributed over the entire ice cliff with some along the streamwise right moraine in the background. This substantial matching success allowed for estimating a sparse but complete 3D reconstruction of the terminal ice cliff, serving as the starting point for a dense scene reconstruction. With a reliable camera pose and sparse reconstruction in place, traditional dense matching algorithms, like semi-global matching, proved to be effective. To further enhance the result, state-of-the-art deep learning-based dense matching algorithms, such as PSMNet (Chang and Chen 2018), UniMVSNet (Peng et al. 2022) and RAFT-Stereo (Lipson et al. 2021), or semi-dense such as LOFTR (Sun et al. 2021) and RoMa (Edstedt et al. 2023) can be used.

In this work, the SuperPoint and SuperGlue models were not fine-tuned to ensure the replicability of the proposed pipeline in different scenarios, without the need for a dedicated ground-truth dataset. In fact, acquiring a ground-truth dataset can be challenging, especially for natural scenarios and mountain environments. Typically, generating datasets for training wide baseline matching requires collecting a substantial number of image sequences with a normal baseline and artificially creating a wide baseline by skipping intermediate images. However, this approach would demand numerous field campaigns to capture series of UAV images under different environmental conditions and throughout various seasons, making it generally unfeasible. Moreover, the matching results obtained with the Belvedere stereo pairs were already satisfactory using pre-trained weights for outdoor environments without any fine tuning.

5.2 Merging Stereoscopic and Monoscopic Processing to Study the Glacier Dynamics

The stereoscopic and monoscopic workflows focused on different aspects of the glacier’s dynamics and the combined use of these two pieces of information is useful for the characterization of the glacier’s behavior of the studied area. The installation of two cameras allowed for the stereoscopic reconstruction of the glacier snout and the computation of volume variations, while the use of a single camera (C2) allowed for the characterization of the movement of the debris-covered sector of the lobe.

The proposed system was able to monitor the evolution of the glacier in its different sectors and supports the identification of different kinematic sectors (Fig. 14). The north lobe of the Belvedere Glacier exhibited a complex motion that involves multiple components. This included a retreat along the streamwise direction, a downslope forward motion due to glacier sliding, and a thinning of the glacier surface caused by ablation. From the stereoscopic processing, it was possible to extract morphology and position of the terminal lobe, which retreated of 17.6 m between May and November 2022 (Fig. 13), and ice volume loss (Fig. 12). On the other hand, the monoscopic approach allowed us to derive the forward motion of the glacier, by tracking features on the glacier surface. Additionally, the availability of volumetric and kinematic data allowed the combined analysis of potential relationship between volume variation, surface velocity, and external meteorological factors such as air temperature.

5.3 Glacier Velocity, Frontal Ablation, and Temperature

Typically, mountain glacier motion is dominated by basal sliding (Willis 1995), which is related to the state of the hydraulic drainage system (Vincent et al. 2016). In particular, the water pressure is higher in the presence of distributed small channels (Pimentel and Flowers 2011), while it diminished when large cavities form and a more efficient drainage develops (Nienow et al. 2005). As a consequence, glacier flow is often faster in early summer than at the end of the warm season (Sanders et al. 2018; Vincent et al. 2016). Therefore, at seasonal scale, glacier surface velocity is sometimes not linked to air temperature (Sanders et al. 2018). At hourly to daily scale, several studies showed that the flow velocity is linked to sub-glacial water pressure (Sugiyama et al. 2010) and air temperature (Liu et al. 2019), even though Allstadt et al. (2015) did not observe a significant variation in the diurnal cycle. Moreover, in many cases, the flow velocity was observed to rise after intense rainstorms (Benoit et al. 2015; Horgan et al. 2015; Sugiyama et al. 2010).

We showed that the daily air temperature variations are strictly linked to glacier surface velocity at a daily to weekly scale. On a monthly scale, we registered the highest velocity in July, even though the absolute peak was reached in the first half of June, concurrently with high air temperature. Contrary to the findings of other scholars (Benoit et al. 2015; Horgan et al. 2015; Sugiyama et al. 2010), we did not register any relationship with rainfall episodes. These observations suggest that (at least in the lower portion of the north lobe) the sub-glacial hydraulic system remained homogeneous during the season. We also observed an even stronger relationship of the air temperature with the frontal ablation, which, to our knowledge, has never been documented before. This observation is not trivial because it shows that the influence of air temperature has immediate effects on the dynamics of the snout. Moreover, this relationship gradually weakens towards the upper sectors of the glacier. This was possible thanks to the stereoscopic system that produced volumetric data, increasing the understanding of the glacier snout dynamics.

The principal aim of our study was to develop a low-cost stereoscopic system that is able to derive daily 3D reconstruction in a complex environment, considering the acquisition geometry, the harsh environmental conditions, and the strong glacier activity. Therefore, we focused our attention to the frontal part of the glacier, which presents the worst conditions, since it experience fast changes in the geometry due to the fast ice melting and the related morphological variations. As demonstrated in the manuscript, the system we developed proves effective even under these complex conditions. We are optimistic about its adaptability to other sectors of the glacier characterized by less pronounced morphological changes, given its robust performance in the demanding frontal region. This can be achieved by adding additional cameras for extending the viewing coverage for 3D reconstruction.

5.4 Transferability of the System

The transferability of the low-cost camera system used in this study to other test sites is a crucial aspect to consider. The installation process of the system is relatively straightforward, as the cameras are mounted on topographic tripods, eliminating the need for complex and permanent structures. This ease of installation enables the system to be replicated at different sites without significant difficulty. While a more robust installation with minimal vibrations and camera rotations would be beneficial, the simplicity of the setup allows for efficient deployment by a small team, without the requirement of specialized equipment.

The Belvedere glacier, characterized by debris cover and a dirty ice terminal cliff, presents favorable conditions for 3D reconstruction due to the presence of distinct patterns in the images. However, successful 3D reconstructions of bare ice or snow-covered areas have also been achieved in previous studies. Belloni et al. (2023) and Gindraux et al. (2017) utilized UAV-SfM to generate 3D models of debris-free or partially debris-covered glaciers. Taylor et al. (2023) demonstrated the effectiveness of an extremely low-cost camera for 3D reconstruction of a debris-free glacier calving front in Iceland. Additionally, Avanzi et al. (2018) and De Michele et al. (2016) employed UAV-based photogrammetric reconstruction to map snow depth. Despite snow having even fewer discernible patterns than bare ice, the use of multi-camera photogrammetry allowed for achieving accurate 3D reconstructions. It is worth noting that these examples utilized traditional feature matching techniques. The results may be further improved employing state-of-the-art DL sparse and dense matching techniques. Even though the Belvedere glacier’s characteristics facilitate 3D reconstruction, successful reconstructions of debris-free ice and snow-covered areas using similar techniques have been demonstrated in the literature, supporting the broader applicability of the low-cost camera system for monitoring glaciers in diverse environments.

6 Conclusions

We presented a pilot study that introduced a low-cost multi-camera system for high-rate glacier monitoring. By utilizing two cameras near the snout of the Belvedere Glacier, we successfully integrated 3D reconstruction from stereo cameras and surface velocity estimation on monoscopic cameras by DIC. This approach enabled accurate estimation of glacier dynamics, including surface movement and terminus ice volume loss and retreat. The application of DL matching techniques proved to be a significant strength of our approach, as it allowed us to overcome the challenges posed by wide camera baselines. Furthermore, the affordability and ease of installation made the system highly replicable, offering the potential for widespread deployment in monitoring multiple glaciers or dynamic objects in mountainous environments.

One of the key findings of this study was the significant correlation observed between air temperature and glacier surface velocity and ablation. In the literature, the relationship between ablation and temperature is well established, but rarely assessed quantitatively in the short term. On the other hand, a direct relationship between surface displacement and temperature has not been established in the literature with the same level of detail as in our study.

Future developments of this system include expanding the study area by adding more cameras, which would provide a broader coverage of glacier dynamics. Additionally, this would allow for building daily DSMs of the glacier for georeferencing DIC velocity fields, enhancing the accuracy of the monitoring system. Integrating ICEpy4D with other libraries, such as py4dgeo (Anders et al. 2021; Winiwarter et al. 2023), would enable 4D change detection analysis on point clouds. Moreover, the adoption of full BA with Ceres Solver (Agarwal et al. 2022) aims at eliminating the dependency on commercial software such as Agisoft Metashape and enhancing the system’s flexibility.