Remote Sensing of Environment

Rock glaciers are an important component of the cryosphere and are one of the most visible manifestations of permafrost. While the signi ﬁ cance of rock glacier contribution to stream ﬂ ow remains uncertain, the contribution is likely to be important for certain parts of the world. High-resolution remote sensing data has permitted the creation of rock glacier inventories for large regions. However, due to the spectral similarity between rock glaciers and the surrounding material, the creation of such inventories is typically conducted based on manual interpretation, which is both time consuming and subjective. Here, we present a novel method that combines deep learning (convolutional neural networks or CNNs) and object-based image analysis (OBIA) into one work ﬂ ow based on freely available Sentinel-2 optical imagery (10m spatial resolution), Sentinel-1 interferometric coherence data, and a digital elevation model (DEM). CNNs identify recurring patterns and textures and produce a prediction raster, or heatmap where each pixel indicates the probability that it belongs to a certain class (i.e. rock glacier) or not. By using OBIA we can segment the datasets and classify objects based on their heatmap value as well as morphological and spatial characteristics. We analysed two distinct catchments, the La Laguna catchment in the Chilean semi-arid Andes and the Poiqu catchment in the central Himalaya. In total, our method mapped 108 of the 120 rock glaciers across both catchments with a mean overestimation of 28%. Individual rock glacier polygons howevercontained false positives that are texturally similar, such as debris-ﬂ ows, avalanche deposits, or ﬂ uvial material causing the user's accuracy to be moderate (63.9 – 68.9%) even if the producer's accuracy was higher (75.0 – 75.4%). We repeated our method on very-high-resolution Pléiades satellite imagery and a corresponding DEM (at 2m resolution) for a subset of the Poiqu catchment to ascertain what di ﬀ erence image resolution makes. We found that working at a higher spatial resolution has little in ﬂ uence on the producer's accuracy (an increase of 1.0%), however the rock glaciers delineated were mapped with a greater user's accuracy (increase by 9.1% to 72.0%). By running all the processing within an object-based environment it was possible to both generate the deep learning heatmap and perform post-processing through image segmentation and object reshaping. Given the di ﬃ culties in di ﬀ erentiating rock glaciers using image spectra, deep learning combined with OBIA o ﬀ ers a promising method for automating the process of mapping rock glaciers over regional scales and lead to a reduction in the workload required in creating inventories.


Introduction
Globally, the cryosphere is in a state of rapid change, for example mountain glaciers have lost 335 ± 144 Gt of mass per year between 2006 and 2016 (Zemp et al., 2019). As a consequence, many regions of the world are expected to undergo changes in seasonal and long-term water availability (Kehrwald et al., 2008;Immerzeel et al., 2010;Piao et al., 2010;Huggel et al., 2020), with the strongest effects being felt in arid and semi-arid environments (Huss and Hock, 2018;Pritchard, 2019).
Rock glaciers are typically tongue-or lobate-shaped landforms, consisting of poorly sorted, angular debris and ice-rich sediments, are commonly found in semi-arid and arid mountains. Rock glaciers can contain significant amounts of ice and are a visible manifestation of permafrost (Barsch, 1996;Haeberli et al., 2006;Rangecroft et al., 2014;Jones et al., 2018a).
Rock glaciers are generally categorised as active rock glaciers that contain sufficient ground ice to permit measurable ice deformation and therefore surface movements, inactive rock glaciers or transitional rock glaciers that contain ice, but not enough to actively deform, and relict rock glaciers that contain minimal to no ice (Barsch, 1996;Cremonese et al., 2011;Colucci et al., 2019). Rock glaciers are more resilient to climate change than other components of the cryosphere as both the rocky material and the active layer insulate the ice contained within the landform. As such they may become an important future water source, especially for arid and semi-arid regions, where in some cases the demand for water has been increasing while future climate scenarios predict a decrease in precipitation (Bolch and Marchenko, 2009;Azocar and Brenning, 2010;Rangecroft et al., 2015;Schaffer et al., 2019). Although studies are scarce, some estimates put current rock glacier contribution to annual streamflow as high as 13-30% for some catchments (Geiger et al., 2014;Schaffer et al., 2019). It is therefore important to create and maintain up-to-date and accurate inventories of rock glaciers.
Mountain glaciers can typically be semi-automatically delineated with high-resolution satellite imagery, such as Landsat, ASTER, or Sentinel-2 Pope and Rees, 2014) using a combination of band ratios between the visible and shortwave infrared sections of the electromagnetic spectrum for clean glacier ice, and a mixture of topographic and Synthetic Aperture Radar (SAR) coherence datasets for debris-covered ice (Bolch et al., 2007;Racoviteanu and Williams, 2012;Paul et al., 2015;Robson et al., 2015). These methods are however not applicable to the identification of rock glaciers, which are both spectrally inseparable from the surrounding paraglacial terrain and are deforming at rates sufficiently low to maintain SAR coherence. Historically, the majority of rock glacier inventories have been created based on manual interpretation of aerial or very high-resolution satellite imagery. The earliest rock glacier inventories were produced solely using manual interpretation of aerial photography (Wahrhaftig and Cox, 1959;Outcalt et al., 1965;Gorbunov and Titkov, 1989). Manual interpretation of high-resolution satellite or aerial imagery relies on identifying surface features indicative of rock glaciers, such as furrows, ridges and steep frontal slopes, and separating them from the surrounding terrain (Jones et al., 2018b). High-resolution aerial photography in conjunction with high-resolution Digital Elevation Models (DEMs) has been used to create inventories of rock glaciers in several mountain ranges (Esper Angillieri, 2009;Scotti et al., 2013;Onaca et al., 2017).
The prohibitively high cost of sub-metre-scale satellite imagery and elevation products however means in regions without accessible national aerial photography datasets, Google Earth imagery is often the best alternative in case of limited budget, for example Rangecroft et al. (2014), Jones et al. (2018b), and Pandey (2019). The use of Google Earth imagery however impedes automation of workflows, and analysis is restricted to the manual creation of polygons within the software.
Many recent rock glacier inventories have been produced with the aid of auxiliary data, for example topographic datasets can be used to help distinguish rock glaciers by their morphology (Falaschi et al., 2014;Rangecroft et al., 2014;Bolch et al., 2019), while Differential Interferometric Synthetic-Aperture Radar (DInSAR) can be used to identify areas that are actively deforming (Liu et al., 2013;Barboux et al., 2014;Bodin et al., 2016;Necsoiu et al., 2016;Villarroel et al., 2018). DInSAR however cannot be directly used for automatically detecting rock glaciers due to its sensitivity to atmospheric disturbances and ability to only observe deformations in the satellite line of sight (LOS). Additionally foreshortening, layover, and shadowing can cause problems in mountainous terrain. Additionally, DInSAR is only able to distinguish actively deforming land surfaces which could include other kinds of surface deformations such as solifluction.
The creation of rock glacier inventories is often subjective, with results varying due to the datasets used, the image interpreter, and landform definitions chosen (Bolch and Gorbunov, 2014;Brardinoni et al., 2019). As such, an automated methodology is needed. Some studies have made advances in automated identification of rock glaciers. The methods developed thus far typically rely on surface textures or rock glacier morphological characteristics (Janke, 2001;Brenning, 2009;Brenning et al., 2012). To date, these methods have only been applied to regions with a relatively small number of rock glaciers and have not been widely applied to larger regions.
Finally, recent studies have used remote sensing data combined with existing rock glacier inventories to distinguish active rock glaciers from inactive and relict rock glaciers. Bertone et al. (2019) used multi-temporal Sentinel-1 derived backscatter and coherence rasters to differentiate between moving and non-moving rock glaciers. Similarly, Kofler et al. (2019) used Sentinel-2 imagery, a LiDAR DTM within three classifiers (logistic regression (generalized linear model -GLM), support vector machine (SVM) and random forest (RF)) and assigned rock glaciers in the South Tyrolean Alps in Italy to be either intact or relict.
Convolution Neural Networks (CNNs), also known as deep learning, in conjunction with Object-Based Image Analysis (OBIA) presents a possibility for automated identification of rock glaciers. CNNs are a branch of machine learning that are increasingly being applied to remotely sensed imagery. CNNs rely on large sample datasets to train the algorithm to recognise recurring patterns within datasets and are typically utilised in applications where spectral characteristics are not sufficient, such as landslides and avalanches (Ding et al., 2016;Yu et al., 2017;Bianchi et al., 2019;Ghorbanzadeh et al., 2019), permafrost thaw slumps (Huang et al., 2020), urban mapping (Längkvist et al., 2016;Mahdianpari et al., 2018;Zhang et al., 2018), and ship identification in various sea and ice conditions (Bentes et al., 2016;Gallego et al., 2018).
OBIA is an image analysis method that relies on near-homogeneous objects created through image segmentation as the basis for classification, as opposed to individual pixels. Analysis at the object-level rather than the pixel-level allows classifications to utilise contextual, hierarchical, and spatial characteristic of image objects in addition to solely using spectral information. In recent years OBIA has proven to be a useful method for automating the delineation of natural phenomena including clean-ice and debris-covered glaciers (Rastner et al., 2013;Robson et al., 2015). Using CNNs together with OBIA is likely to provide a novel and powerful technique to automatically identify rock glaciers in mountainous landscapes.
The main objective of this study is to develop a novel method based on deep learning and OBIA to extract the location and extent of rock glaciers in two catchments, namely the La Laguna catchment in Northern Chile, and the Poiqu catchment in Central Himalaya. We ascertain the accuracy and transferability of our Convolutional Neural Network Object-Based Image Analysis method by comparing our results with reference outlines of rock glaciers based on manual interpretation of orthorectified Pléiades imagery (henceforth referred to as RG_Man). We assess both Sentinel 2 imagery (spatial resolution 10 m, henceforth referred to as CNN_OBIA) and very-high resolution Pléiades imagery (resolution 0.5 m, resampled to 2 m, henceforth referred to as CNN_OBIA_Ple).

Backgrounddeep learning
Deep learning is a rapidly emerging area within the fields of remote sensing and geoinformatics Li et al., 2018;Ma et al., 2019). Deep learning, as well as other machine learning or artificial intelligence methods, attempt to interpret imagery in the same way as a human operator would, relying not only on pixel values but reoccurring patterns and textures (Ma et al., 2019). As such, deep learning can be used to extract information from complex situations where normal classification methods are not sufficient .
In this study we restrict our analysis to CNNs which are the most widely used deep learning model and are well suited to multispectral satellite imagery (Ma et al., 2019). CNNs are made up of three components: convolution layers, pooling layers, and fully connected layers. The input data, for example a multispectral image, is first convolved with a moving window of a fixed size (for example a 5 × 5 or 3 × 3 pixel kernel). This then creates the first convolution layer. Each kernel is repeated several times, each one looking for a different distinct texture or pattern. These are referred to as feature maps. A convolution layer is often expressed in terms of its pixel size in X and Y dimensions, and its number of feature maps in the Z dimension, for example, a convolution of 5 × 5 × 40 refers to an image being convolved by a 5 × 5 kernel producing 40 feature maps.
CNNs are typically composed of multiple convolution layers. These convolution layers are often interspaced by pooling layers (Fig. 1). A pooling layer acts to aggregate pixels in a layer, thereby producing a new layer of reduced dimensions. The most common type of pooling, and the type used in this analysis, is max pooling. Max pooling simply uses a 2 × 2 kernel and takes the highest value and outputs it to the new layer. Lastly, a fully-connected layer brings all the previous layers together, and determines which feature maps are most useful at identifying the landform or class that the CNN is aiming to classify (for example a rock glacier). The output from a fully-connected layer is a prediction raster, also known as a heatmap, where pixels range between 0 and 1, with higher values indicating a given pixel has a greater certainty of belonging to the desired class. A simplistic schematic explanation of a CNN is shown in Fig. 1.

Object-based image analysis (OBIA)
OBIA is an established method within remote sensing (Blaschke et al., 2014). OBIA works by segmenting imagery into near-homogeneous objects, which serve as the basis for subsequent classification. Image segmentation is perhaps the most critical step within OBIA and can strongly influence the final classification result (Drăguţ et al., 2014;Jozdani and Chen, 2020). If the objects created are too large then multiple features can be treated as single objects, while too small objects reduce the effectiveness of using shape and contextual constraints within the classification (Rastner et al., 2013;Robson et al., 2015). Image segmentation is a bottom-up process, where additional imageobject levels can be created by segmenting existing image objects.
Conducting the analysis at the object-level rather than the pixellevel allows spatial, contextual, hierarchical and textural information to be used instead of solely relying on spectral reflectance values (Lang, 2008). Additionally, OBIA permits the integration of many data types (optical, radar, topographic, point cloud, vector) within the analysis. OBIA has been shown to be well-suited for the mapping of natural phenomena such as debris-covered glaciers (Rastner et al., 2013, Robson et al., 2015, landslides (Hölbling et al., 2012;Hölbling et al., 2016) and flood extents (Mallinis et al., 2013;Dao and Liou, 2015). By utilising the spatial and contextual properties of image objects, it is possible to automate an element of the classification post-processing using OBIA, for example removing irregularly shaped, or asymmetrical image objects (Rastner et al., 2013, Robson et al., 2015.

Study area
We conducted our analysis in two distinct periglacial catchments (Fig. 2). The study areas were chosen due to the availability of VHR Pléiades satellite imagery, as well as each site including rock glaciers of a variety of sizes and activity.

La Laguna catchment, Chile
The La Laguna catchment (~30 o 11'53 S, 69 o 56'15 W) is situated at the headwaters of the Elqui River catchment, with glaciers and rock glaciers contributing between 4 and 13% of the annual streamflow (Favier et al., 2009;Pourrier et al., 2014). The study area is situated between~4000 and 6000 m a.s.l. The catchment is~140 km 2 in size and contains 105 rock glaciers (~14.8 km 2 ) according to the Chilean National Water Directorate (DGA) inventory. Tapado Glacier (1.26 km 2 ) is the largest glacier in the catchment and contains a clean ice section, a debris-covered glacier section, and a rock glacier tongue. Precipitation falls mainly as snow and is concentrated within the austral winter. The area is semi-arid and cold, and at an elevation of approximately 3100 m a.s.l. has a mean precipitation rate of 167 mm per year measured between 1970 and 2009, and a mean annual air temperature (MAAT) of 8°C between 1974 and 2011. The MAAT has been reported to be rising by 0.17°C per decade between 1974 and 2011 .

Poiqu catchment, Central Himalaya
The Poiqu catchment (~28 o N, 85 o E) is a transboundary watershed that drains through the Himalayas to Nepal and ultimately into the river Ganges. The catchment has a drainage area in excess of 2000 km 2 and ranges in elevation from~1100 to > 8000 m a.s.l. (Mt. Shishapangma, 8027 m). It contains an assortment of clean-ice, debriscovered glaciers, and rock glaciers. The mean annual precipitation (measured at Nyalam, 3750 m a.s.l.) is~650 mm and is influenced by the Indian monsoon, with most precipitation falling in the autumn (September to November). Mean annual temperatures at Nyalam range from 10.7°C to −3.4°C with temperatures typically below freezing between November and March (Xiang et al., 2014). We restricted our study to approximately 1500 km 2 of the Poiqu catchment that according to the inventory created P. Rastner  contains 140 rock glaciers (~21 km 2 in total) with sizes varying from < 0.01 to > 1 km 2 . A smaller subset (~63 km 2 ) centred around the Mulaco massif in the centre of Poiqu catchment, an area that contains a total rock glacier area of 3.5 km 2 , was chosen for repeating the analysis using Pléiades imagery. The areas processed for both subsets are shown in Fig. 2.

Data
For both study areas, we used freely available Sentinel-2 imagery (Blue, Green, Red, NIR and SWIR bands) as well as SAR coherence data generated from interferometric Sentinel-1 imagery (Table 1). We included the coherence data as the lower coherence values over debris- Fig. 1. A simplified example of a Convolutional Neural Network workflow. A convolution based on a 6 × 6 moving window is applied to an input image, resulting in the first convolutional layer. Max pooling, and an additional convolution are then applied. The heatmap shows the probability that a pixel belongs to a given class. Note that the process has been simplified and most CNNs would involve additional convolution and pooling layers, feature maps, and additional input bands. covered glaciers helped separate them from rock glaciers, which can in some cases be spectrally similar.
All Sentinel data was downloaded through the Copernicus Open Access Hub. Topographic data is important in rock glacier mapping, and while a high-resolution, freely available DEM was available for the Poiqu catchment (namely the 8 m resolution High Mountain Asia DEM (Shean, 2017)) it was necessary to have consistent and comparable datasets for both study regions that would also be applicable at other sites for future implementation. For that reason, we used tri-stereo Pléiades satellite imagery from 2018 and 2019 to generate DEMs over both study areas (Table 1).
Reference rock glacier outlines were used for the generation of training data. For both study areas we used manually delineated outlines based on orthorectified Pléiades imagery and hillshade models based on the Pléiades DEMs. For the La Laguna catchment these were the outlines created by Schaffer and Macdonell (2020) and for the Poiqu catchment the outlines are those created by P. Rastner .
It should be noted that in Chile, both glaciers and rock glaciers are typically incorporated into one inventory (Nicholson et al., 2009). As such, we excluded clean-ice glaciers from our analysis. We did, however, include two landforms (Tapado Glacier and an unnamed landform (CL104300033)) that have both an identified debris-covered glacier component as well as a rock glacier component .

Methods
The methods used in this paper can be split into three sections ( Fig. 3): (A) pre-processing of imagery and generation of training data, (B) image classification, and (C) post-processing and accuracy assessment. Full details about the thresholds and parameters used are given in Fig. 3. A combination of ArcGIS 10.7, eCognition 9.5, PCI Geomatica 2018, the European Space Agency's Sentinel Application Platform (SNAP) and Correlation Image Analysis Software (CIAS) (Kääb and Vollmer, 2000) were used.

Data pre-processing
The Pléiades tri-stereo stereo images were processed in PCI Geomatica 2018. DEMs were produced using the Rational Polynomial Coefficients (RPCs) provided with the imagery, as well as between 100 and 200 automatically generated tie points per tri-stereo pair. The DEMs were extracted at both 10 m and 2 m postings using a Semi-Global Matching algorithm, which has been shown to outperform normalised cross-correlation and thereby produce cleaner DEMs (Hirschmuller, 2007). For the Poiqu catchment, the resulting ten DEMs were mosaicked together. The multispectral imagery was panchromatically sharpened using a multi-resolution analysis (MRA) fusion that uses wavelet decomposition to preserve spectral fidelity (González-Audícana et al., 2005). The pan-sharpened images were then orthorectified and mosaicked together. As a prerequisite to running the CNN on the data, it was necessary to convert all the satellite imagery and DEMs into 32-bit floating rasters with pixel values normalised between 0 and 1. SAR coherence rasters were generated for both study areas using SNAP. For each location, two Sentinel-1 single look complex (SLC) images in interferometric wide (IW) swath mode with a temporal baseline of 12 days were co-registered together using a DEM-assisted co-registration based on the Pléiades DEMs. The resulting image stacks were then deburst before the coherence was generated. The resulting rasters were then multilooked using ten pixels in range and two in azimuth, before being converted to ground range and exported as a GeoTiff.
The RG_Man outlines were used for training data. We extracted 30% of RG_Man for training data based on a random number generation. Given the small number of rock glaciers in the subset covered by the Pléiades imagery, 50% of the polygons were used as training. This resulted in 2.3 km 2 (n = 11) and 6.1 km 2 (n = 41) of rock glaciers for the La Laguna catchment and Poiqu catchment, respectively, and 0.7 km 2 (n = 8) for the Pléiades subset. The remaining polygons (n = 50 for La Laguna, n = 117 for Poiqu, n = 7 for Poiqu Pléiades subset) were used for accuracy assessment. In both study areas, adjacent rock glacier polygons were merged, and small polygons (< 0.05 km 2 ) were removed. Approximately 300 random training points were generated within the rock glacier training outlines. Points were also generated for debris-covered glaciers, clean ice glaciers and stable (i.e. not related to fluvial, glacial or periglacial processes) terrain. All data were projected into UTM 19 S for La Laguna catchment, and UTM 45 N for the Poiqu catchment.

Image classification
The entire image classification procedure (training of the CNN, applying to the entire imagery, OBIA reshaping) was conducted within eCognition 9.5 (Fig. 3). The CNN model used in this study, Google Tensorflow, is free and open source; however, by running the process through eCognition it was possible to have the CNN and OBIA reshaping components in one seamless workflow.

Sentinel-2 analysis
We trained the CNN using the Sentinel 2 (Red, Green, Blue, NIR, SWIR), Sentinel-1 SAR coherence, as well as the DEM and curvature for the classes of rock glacier, debris-covered glacier, clean ice, stable terrain, and shadows. Additionally, we found it beneficial to also include the NDVI (Normalised Difference Vegetation Index), MNDWI (Modified Normalised Difference Water Index) (Xu, 2006) and SAVI (Soil Adjusted Vegetation Index) (Alba et al., 2011) within the analysis.
As a first step, the pixels with the lowest 5% reflectance in a mean of all the multispectral channels were classified as shadows and masked out. A 10-m buffer was constructed around the training point files for each and used to generate sample patches measuring 30 × 30 pixels using all the multispectral channels, the DEM and the curvature. Both the number of sample patches as well as the distribution of used can influence the output of a CNN. Some studies have chosen the number of sample patches per class based on the prevalence of each class (e.g. Guirado et al., 2017;Csillik et al., 2018;Zhang et al., 2020). In our case, we manually chose the number of sample patches used for training the model based on a combination of relative spectral distinction between classes (for example clean ice, debris-covered ice, and shadows requiring less samples than rock glaciers or stable ground) and the prevalence of a class (for example stable ground covering most of the image). Five thousand sample patches were collected for rock glaciers, 3000 for clean ice, debris-covered ice, and shadows, and 15,000 for stable ground.
It is still an ongoing debate within the machine learning community how to tune a CNN for optimal performance (Csillik et al., 2018;Schratz et al., 2019). In the absence of other studies that have applied CNNs to identify rock glaciers, we experimented with the simplest CNN architecture (a one-layer model) and adapted the kernel sizes, the number of convolutional layers and feature maps by a trial and error approach on the data covering the La Laguna catchment. In order to ascertain the transferability of our method we then applied the same ruleset on the Poiqu catchment without modifications. We found that having more convolutions with relatively small kernel sizes produced a cleaner heatmap than having fewer layers with larger kernels. In the end, a CNN with five convolutional layers gave the cleanest heatmap with the most contrast between rock glaciers and other classes. Our final CNN had the architecture 3 × 3 × 70, 7 × 7 × 40, 3 × 3 × 20, 1 × 1 × 12, 3 × 3 × 12 with max pooling applied to the third and fifth layers. The trained CNN model was then applied producing a predictive heatmap, with each pixel indicating the probability (ranging from 0 to 1 of rock glacier presence. The heatmap was then smoothed with a 7 × 7 Gaussian filter.

Pléiades imagery analysis
We repeated our analysis using the four Pléiades multispectral (Red, Green, Blue, NIR) bands at 2 m resolution for a subset of the Poiqu catchment. We chose to not use the 0.5 m pan-sharpened imagery due to the memory demands of processing at such a resolution. As the Pléiades satellite imagery does not include a shortwave infrared (SWIR) spectral band, the analysis had to be modified. This also meant that certain multispectral indices such as the MNDWI could not be included. Due to the finer resolution we however opted to include a canny edge detection filter on the slope raster in the analysis. The rest of the classification procedure was the same as for the Sentinel-2 imagery.

OBIA reshaping
The reshaping of image objects in order to create rock glacier outlines was also performed in eCognition. As is common with OBIA, parameters in the image segmentation as well as thresholds in the image classification were chosen subjectively (Hay and Castilla, 2008;Blaschke et al., 2014).
A three-level image segmentation was conducted based on the multispectral bands as well as the slope. We found that having a multilevel segmentation helped group rock glaciers into larger objects, thereby making the reshaping simpler to perform. The resulting objects were then classified using a combination of the mean deep learning heatmap, the morphology (mainly the slope, and the standard deviation of the slope which was used to gauge the surface roughness), object shape and contextuality. Details of the thresholds used are given in Fig. 3. Objects were classified as rock glaciers using a combination of fixed and fuzzy criteria. The resulting outlines were expanded using a pixel-based growing algorithm based on a 5 × 5 kernel. Objects were expanded if 50% of the kernel had a heatmap value greater than 0.3. This helped include more irregularly shaped parts of the rock glaciers, such as the sections next to the headwall of the mountain. Lastly, the outlines were smoothed using a 7 × 7 kernel pixel-based growing and shrinking algorithm before being exported as ESRI Shapefiles.

Accuracy assessment
Assessing the accuracy of remote sensing classifications can be troublesome and is heavily dependent on the quality of the validation data. For both study regions, RG_Man outlines were used based on orthorectified Pléiades imagery and accompanying DEM to map rock glaciers geomorphologically, following the criteria set out by Delaloye et al. (2019). A national rock glacier inventory exists for Chile (Dirección General de Aguas (DGA), 2012, Barcaza et al., 2017), however when we visually inspected the inventory there were several cases of over-and underestimates of rock glacier size and missing polygons. As such our accuracy assessment was exclusively based on the Pléiades derived outlines. We computed accuracies for the Sentinel-2 classifications over both catchments. An accuracy assessment was also performed for a subset of the Poiqu catchment using both Pléiades and Sentinel-2 imagery.
We determined accuracies in two ways. In both cases our accuracy assessment was based on the rock glacier outlines and rock glacier absence (i.e. clean ice, debris-covered glaciers, shadows, or stable terrain), The first method is a simplistic approach to quantifying classification accuracies based on the percentage overestimation or underestimation when compared to RG_Man. Many studies on automated glacier and rock glacier inventories use this method (for example Paul et al. (2004), Alifu et al. (2015), Mithan et al. (2019)). This however can be too simplistic, and neglects errors of omission and commission, which over larger areas can conceivably cancel each other out. As such we portioned our data into a training and validation set in which we determined errors of omission and commission; namely a user's accuracy (the percentage of the CNN_OBIA classification that is actually a rock glacier) and a producer's accuracy (the percentage of total rock glaciers that were classified by the CNN_OBIA method). Given that our analysis using the Pléiades data had a different extent to the catchmentscale analysis of Poiqu, we also computed accuracies for the Sentinel-2 analysis using the same extent as the Pléiades analysis to compare accuracies.
Many papers addressing machine learning on remote sensing imagery typically adopt cross-validation techniques (e.g. Brenning (2009), Brenning et al. (2012, Knevels et al. (2019)) which are less affected by spatial autocorrelation (Schratz et al., 2019). Cross-validation was difficult to implement within this study owing both to the computational challenges of re-running our analysis multiple times, as well as the OBIA refinement that we performed being based on subjective thresholds Our validation techniques are however in line with other studies that integrate CNN with OBIA (Csillik et al., 2018;Fu et al., 2019;Ghorbanzadeh et al., 2019;Zhang et al., 2020) although we do acknowledge that by not accounting for spatial autocorrelation, our accuracy assessment may be over-optimistic.

Results
For the validation data over both study sites, the CNN_OBIA classification mapped in total 108 rock glaciers (26.0 km 2 ) out of 120 (20.3 km 2 ) in RG_Man. This corresponds to an overestimation of 28.0% ( Table 2). The user's and producer's accuracies were 65.9% and 71.4% respectively, indicating that a moderately high proportion of the total rock glacier was identified yet the classification contains false positives.

La Laguna catchment
When looking at the validation data, our CNN_OBIA detected 26 of the 39 rock glaciers with a mean size of 0.2 km 2 (Fig. 4). In total 4.42 km 2 out of 4.39 km 2 were detected, representing a mean Table 2 Classification accuracies for the CNN_OBIA classification for the La Laguna catchment, Poiqu catchment and Poiqu catchment using Pléiades imagery based on the validation set of the data. Note that in order to assess the impact of using higher-resolution Pléiades imagery, we also present the accuracy of the Sentinel-2 classification for the same subset as the Pléiades analysis. The mean over/underestimation of total rock glacier area is shown for both catchments combined is also shown.

Classification
User accuracy (%) Producer accuracy (%) Over/Underestimation (%) overestimation of 0.6%. Of those detected, the three largest landforms are rock glaciers CL104300054 (1.50 km 2 ), CL104300039 (Tapado rock glacier; 1.01 km 2 ), and CL104300038 (0.63 km 2 ). Rock glaciers were detected at elevations ranging from 3814 to 4948 m a.s.l. Individual estimations of rock glacier area range considerably. Generally larger rock glaciers were classified with lower classification errors, with glaciers between 0.1 and 0.2 km 2 being on average underestimated by 6.8%, while rock glaciers less than 0.1 km 2 in area were overestimated by an average of 115.9%. We determined a user's accuracy of 63.9% and a producer's accuracy of 75.4%.

Poiqu catchment
When compared to the validation data, CNN_OBIA mapped 82 rock glaciers (18.0 km 2 ) out of 90 (12.5 km 2 ) resulting in an overestimation of the area by 43.6% (Fig. 5). There was a large variability of over-and underestimations between individual rock glacier polygons, from rock glaciers that were underestimated by > 90% to those that were vastly overestimated by more than 450%, although it should be noted that these rock glacier polygons were generally the smallest found in the RG_Man inventory. Generally, the smallest rock glaciers (< 0.05 km 2 ) were mapped with the lowest accuracies (overestimations of 207.5%). The magnitude of error decreased as the mean rock glacier area increased, with rock glaciers between 0.2 and 0.3 km 2 being underestimated by an average of 42.8%, while rock glaciers with areas between 0.3 and 0.5 km 2 were underestimated by 11.3%. The largest rock glaciers in the region however were overestimated by 37.4%. The user's accuracy was 68.9% indicating that the classification contained false positives within the classification. The producer's accuracy was 75.0% with a user's accuracy of 56.5%.

Pléiades imagery classification
The validation data for the portion of the Poiqu catchment that was classified with the Pléiades imagery contained seven rock glaciers totalling 2.1 km 2 . Of these the CNN_OBIA_Ple classification identified all seven rock glaciers (total area 2.6 km 2 ) with an overestimation of rock glacier area of 23.2%, as visible in Fig. 6. It seems that three rock glacier outlines were misclassified and grouped as a single polygon (Fig. 6). The misclassified landforms has a morphology that is similar to the rock glaciers, causing a high probability heatmap score resulting in CNN_OBIA_Ple grouping the landforms as one. It can be difficult to reliably determine from the satellite imagery if these landforms are periglacial, glacial or fluvial in nature When evaluating the influence of running the classification on Pléiades imagery instead of Sentinel-2 imagery, we found that the producer accuracy remains approximately the same (87.4% with Sentinel-2, 88.4% with Pléiades), however the user accuracy increases with the use of higher-resolution imagery (62.9% with Sentinel-2, 72.0% for Pléiades).When comparing the rock glaciers found in the validation data for both classifications, CNN_O-BIA_Ple outperformed CNN_OBIA on 3 polygons, was equivalent on 1 polygon, and performed worse on 2 polygons. The rock glaciers on the southern slope of the Mulaco massif were generally delineated with higher accuracies (74.4%) than those on the northern slopes (44.4%).

Use of deep learning to classify rock glaciers
On average, our CNN_OBIA classification managed to map rock glaciers in two distinct periglacial environments with a mean overestimation of 28%. The user accuracy ranges from 63.9 to 68.9% indicating that the final classified polygons included false positives that were not rock glaciers. The producer accuracies were 75.4% and 75.0% for the La Laguna and Poiqu catchments respectively, indicating that even although the individual rock glacier polygons were in most cases overestimated, a large proportion of the total rock glaciers were successfully identified. The CNN itself is only capable of identifying features large enough to be detected after the convolution. In our case, we convolved our image five times and applyied max pooling twice to reduce noise, meaning that a rock glacier approximately 30 × 30 pixels in size (300 × 300 m) would be visible as 3 × 3 (30 × 30 m) pixels on the generated heatmap, which is below the 0.025 km 2 (i.e. 158 × 158 m) threshold used in the OBIA refinement. As such, due to the finer spatial resolution the Pléiades classification was able to classify much smaller landforms with more subtle, smaller-scale morphological patterns that were not visible on the Sentinel-2 imagery. The higher producer's accuracy of CNN_OBIA_Ple therefore suggests that higher resolution satellite imagery is more suitable for identifying smaller rock glaciers, or rock glaciers with less prominent surface features.
While our method worked well in both of our study catchments, the flexible nature of machine learning methods, such as CNNs, means we cannot be certain that our method will work equally well in other periglacial catchments.
The probability heatmap corresponds well with RG_Man and it appears that the heatmap value relates to the prominence of the surface morphology, with landforms exhibiting marked morphology such as ridges and furrows (for example Fig. 7A) having higher probability heatmap values than those areas without prominent morphology (for example Fig. 7B). This infers that our method is more suitable for identifying active rock glaciers with prominent surface morphology.
The heatmap produced from the deep learning itself could be used as auxiliary data to aid the creation of rock glacier inventories. Our CNN_OBIA method could be used to reduce the uncertainty associated with manual inventories due to inter-user inconsistencies and personal subjectivity. Our method could provide a rock glacier outline base product which could be refined manually to create a finalised rock glacier inventory, thereby reducing the amount of manual digitisation required.

Comparison with other rock glacier inventory methods
The majority of rock glacier inventories are compiled with manual interpretation of high-resolution imagery, often complimented with auxiliary data (morphological or kinematic) (e.g. Villarroel et al. (2018), Bolch et al. (2019)). It is not straightforward to compare these classifications to ours, as in the absence of reference data these studies do not provide accuracy assessments. Additionally, the scale of analysis differs, some of the rock glacier inventories created manually have been conducted over regional scales (for example Wang et al., 2017;Villarroel et al., 2018) compared to our catchment scale. We can however compare the strengths and weaknesses of the methods involved.
The most common recent form of auxiliary data used in the creation of rock glacier inventories is surface velocities or kinematics derived from SAR interferometry (Liu et al., 2013;Bodin et al., 2016;Wang et al., 2017;Villarroel et al., 2018). Our CNN_OBIA method has a couple of key advantages over the use of interferometry. Firstly, it is possible to identify inactive rock glaciers which would not be identified using velocities alone since they are essentially not flowing. These features however are still important for the local hydrology and are a physical manifestation of permafrost. Additionally, our CNN_OBIA method is not reliant on VHR and costly satellite imagery, and instead runs on freely available 10 m resolution Sentinel-2 imagery. In our case we produced a DEM from commercial stereo imagery, however many regions of the world are covered by freely available DEMs of comparable resolutions. Unlike interferometry, our CNN_OBIA method is not restricted to analysing rock glaciers that are oriented in the LOS of the satellite, is less influenced by atmospheric disturbances, and does not depend on maintaining high SAR coherence between multiple acquisitions. This latter point means that the CNN_OBIA methodology can be equally be applied to regions where vegetation is growing either within the vicinity of rock glaciers, or on the rock glaciers themselves, providing a solution to the problem encountered by Bertone et al. (2019) when working in the Italian Alps. Finally, DInSAR is dependent on the perpendicular and temporal baselines making it easier to calculate over a larger area, whereas the CNN_OBIA method does not have this limitation.
Our method builds on the work of Brenning et al. (2012) who used textural filters on IKONOS imagery to identify rock glaciers based on surface morphology indicative of flow patterns It is not possible to compare accuracies due to differences in validation (computation of AUROC (Area Under the Receiver Operating Characteristic) as well as masking of areas by topography, bedrock, and landcover). Importantly though, our method was applied using freely available Sentinel-2 imagery available worldwide. As such we believe that analysis of textural signatures shows promise for identifying rock glaciers.
Our method is however not without limitations. One shortcoming is that sufficient and reliable inventory data must be available for a given region in order to train the CNN model. The training data does not need to be spatially complete, but nevertheless this can be challenging for completely unstudied periglacial regions. Secondly, since the CNN relies on identifying recurring spectral patterns and textures, it can misclassify debris flows, rock avalanche deposits and fluvial environments as rock glaciers which share some spectral, textural and morphological properties. This was problematic for both the Sentinel-2 classification and the Pléiades classification. The OBIA reshaping however helps mitigate these false positives to an extent, in a way that cannot be achieved solely using CNNs. OBIA allows objects to be excluded that are irregularly shaped, or by their spectral or morphological properties. As such, while the raw probability heatmap can assist an analyst to identify rock glaciers manually, by running the whole procedure through OBIA, it is possible to segment the imagery and obtain individual rock glacier polygons in an automated way.

Potential methodological developments
Our method shows promise yet there further development is needed before the method can be used to create rock glacier inventories over large areas without significant manual editing. At present, our method is prone to misclassifying landforms with similar textural characteristics such as rock avalanche deposits. However, these landforms can also be visually difficult to distinguish from rock glaciers without additional information. One possible solution to this could be including surface velocity data, either derived from feature tracking or from SAR interferometry, albeit with the limitation of LOS movements as discussed earlier. This would have the additional advantage of being able to classify rock glaciers based on their level of activity and would likely reduce the number of false positives in the classification. Using a combination of CNNs and OBIA could also be useful for other landforms that are hard to identify using image spectra alone, such as debriscovered glaciers, lava flows, and landslides.

Conclusion
Studies of rock glaciers over catchment to regional scales are hampered by inventories of variable quality that are based on subjective criteria and prone to inter-user inconsistencies. In this study we have presented a semi-automated workflow to map rock glaciers based on freely available Sentinel data and a high-resolution DEM using a combination of CNNs and OBIA. We trialled our method in two catchments, the La Laguna catchment in the semi-arid Andes of Chile, and the Poiqu catchment in central Himalaya. Our method managed to map rock glaciers across both catchments with a mean overestimation of 28.0% with the freely available remote sensing data. Producer accuracies were higher than user accuracies indicating that the classification successfully identified many of the rock glaciers in each catchment, yet the rock glacier outlines created contained false positives that must be removed manually. We found that better detection rates are not necessarily obtained from using VHR Pléiades satellite data, although the rock glaciers identified are mapped with a higher user accuracy. There are also some limitations with texturally similar landforms such as debris flows, avalanches, and fluvial deposits that become misclassified as rock glaciers. Nevertheless, our method produced promising results, runs as one workflow, and reduces the amount of manual work required. Further advances in machine learning methods are likely to lead to refined identification of rock glacier surface textures, as such we recommend the continued use of deep learning to semi-automatically identify rock glaciers and believe that OBIA provides a good framework for conducting and further automating the analysis.

Description of author's responsibilities
The study was conceptualised by BAR, TB, SM and PR. The analysis was performed by BAR and DH. All authors contributed to writing and editing the manuscript.

Dedication
This article is dedicated in memory of Michael Williams (1956 -2020).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.  read an early version of the manuscript. We are thankful to ESA for the provision of Sentinel data and CNES/Airbus DS for the provision of the Pléiades satellite data for a reduced price within the ISIS programme. We are grateful for the constructive comments from three anonymous reviewers.