A machine learning method for Arctic lakes detection in the permafrost areas of Siberia

ABSTRACT Thermokarst lakes are the main components of the vast Arctic and subarctic landscapes. These lakes can serve as geoindicators of permafrost degradation; therefore, proper lake distribution assessment methods are necessary. In this study, we compared four machine learning methods to improve existing lake detection systems. The northern part of Yakutia was selected as the study area owing to its complex environment. We used data from Landsat 8 and spectral indices to take into account the spectral characteristics of the lakes, and MERIT DEM data to take into account the topography. The lowest accuracy was found for the classification and regression trees (CART) method (overall accuracy = 81%). On the other hand, the random forests (RF) classification provided the best results (overall accuracy = 92%), and only this classification coped well in all problematic areas, such as shaded and humid areas, near steep slopes, burn scars, and rivers. The altitude and bands SWIR1 (Short wave infrared 1), SWIR2 (Short wave infrared 2), and Green were the most important. Spectral indices did not have significant impact on the classification results in the specific conditions of the thermokarst lakes environment. 17,700 lakes were identified with the total area of 271.43 km2.


Introduction
Permafrost regions cover a quarter of the land surface in the northern hemisphere and contain large amounts of organic carbon (Brown et al., 1997;Jorgenson & Grosse, 2016;Obu et al., 2019).Climate change is expected to significantly degrade permafrost (Anisimov et al., 2010;Lawrence & Slater, 2005;Romanovsky et al., 2017), leading to the formation of thermokarst lakes (Zandt et al., 2020).Lakes in Siberia are point sources of carbon dioxide and methane that release long-term carbon stocks into the atmosphere (Hughes-Allen et al., 2021;Kallistova et al., 2019;Serikova et al., 2019).This process initiates positive climate feedback, potentially contributing to a 0.39°C rise in surface air temperature by 2300 (Schneider von Deimling et al., 2015;Zandt et al., 2020).Thermokarst lakes are one of the main features of the Arctic and Subarctic landscapes of Siberia, Alaska, Canada (Grosse et al., 2008) and Qinghai-Tibet Plateau (Chen et al., 2021) creating characteristic thermokarst landscapes.The formation of lakes in Siberia significantly impacts regional landscape morphology, hydrology, and biogeochemistry.Such lakes are characterized by landforms that develop as a result of the melting of ice-rich permafrost or massive ground ice (Brown & Grave, 1979).The common occurrence of thermokarst lakes has a large impact on the arctic areas, despite the fact that it is a local thermal disturbance of the ground (Grosse et al., 2013).The active thermokarst often indicates that permafrost is unstable and warming up (Niu et al., 2011;Serreze et al., 2000).
Proper lake distribution assessment methods are necessary to accurately determine their location and characteristics as well as to study their temporal and spatial dynamics.After studying these characteristics, it will be possible to ascertain whether thermokarst lakes can serve as geoindicators of permafrost degradation.The concept of geoindicators appears to be particularly well suited to determining changes in morphogenetic and sedimentary environments or geosystems in general, especially in sensitive polar and subpolar systems (Zwoliński et al., 2008;Zwoliński, 2004).
Scientists have used various sensors to map lakes in permafrost regions.Kripotkin et al. 2008 studied the dynamics of thermokarst lakes in continuous and discontinuous cryolithozones of Western Siberia using fusions of different sensors (Landsat 1, 4, 5, and 7, Meteor-3 M, and ERS-2).They used manual interpretation; therefore, their approach cannot be implemented for large areas.Kravtsova et al. (2009, 2011, Kravtsova & Rodinova, 2016) studied changes in the size of thermokarst lakes using Landsat, Soyuz-22 sensors, and aerial images.They pointed out many difficulties in mapping lakes in northern regions, caused by the similar spectral characteristics of lakes, burn scars, and shadows.Sentinel 2 satellite imagery has been used since 2015 to map water bodies (Bogoyavlensky, 2019;Du et al., 2016;Yang et al., 2017).It is possible to map lakes smaller than 1 ha owing to Sentinel's high spatial resolution (Șerban et al., 2020), but they cannot be used to analyse surface changes over long periods.
The main methods used in lake mapping are digitizing through visual interpretation, thresholding, standard image classification techniques, edge detection using a single or a combination of multiple bands, algebraic operations (e.g.band ratios and spectral water indices), spectral transformation, and texture analysis (Song et al., 2014).
Previous research on this topic is based mainly on the definition of threshold values that best separate the image pixel values of water from land.Grosse et al. (2008) used panchromatic Spot-5 and Ikonos-2 images for threshold assessments.They observed strong reflectance differences between the land surface and the water bodies.However, they highlighted some difficulties in lakes with very shallow water levels, turbid water with high sediment suspension, frozen lakes, deep thermo-erosional valleys, and steep northfacing cliffs or slopes, which resulted in misclassification.They manually corrected errors, which were ineffective for research covering large territories.Morgenstern et al. (2011) used the thresholding method with mid-infrared wavelengths of Landsat data and encountered similar problems.Lindgren et al. (2021) applied the Object-Based Image Analysis (OBIA) classification for lake detection; however, they faced misclassification caused by terrain shadows, rivers, streams, and channels, which were manually removed.
Numerous spectral indices exist, but their suitability for water detection in different settings must be investigated (Palmer et al., 2015).The Normalized Difference Vegetation Index (NDVI) is widely used in remote sensing research to capture regional and global vegetation changes (Wang et al., 2014).McFeeters (1996) developed the most basic water index, the Normalized Difference Water Index (NDWI), using the green and near-infrared (NIR) bands of the Landsat thematic mapper (TM) to maximize the identification of water bodies.Lu et al. (2011) used the combined difference between NDVI and NDWI to enhance the contrast between water bodies and the surrounding surface features.The NDWI index exhibited some shortcomings in builtup areas.To solve this problem, the NIR range was replaced with the short-wave infrared (SWIR) range, resulting in the Modified Normalized Difference Water Index (MNDWI) index (Xu, 2006).However, there are still severe problems owing to shadows in the highlands.Therefore, Feyisa et al. (2014) proposed an automated water extraction index (AWEI) for identifying water bodies, which has two conditions: AWEIsh is primarily designed to remove shadow pixels, while AWEInsh is designed for areas with urban backgrounds.
Machine-learning methods have also been used for the classification of water objects.Both supervised (Acharya et al., 2019;Huang et al., 2015) and unsupervised (Brezonik et al., 2007;Kloiber et al., 2002;Olmanson et al., 2008) approaches have been tested.In the Arctic regions, machine learning algorithms are mostly used for the permafrost modelling (Deluigi et al., 2017;Wang et al., 2019) and water body mapping (Dirscherl et al., 2020;Nitze et al., 2017;Rokni et al., 2015;Șerban et al., 2020).The results of these studies are characterized by various accuracies and mapping results.Xie et al. (2016) obtained an accuracy of 96% using Landsat TM imagery (30 m resolution), while Pradhan achieved an accuracy of 58% using radar TerraSAR-X (3 m resolution).Nitze et al. (2017) used the random forest method for lake change detection.They optimized the data quality, trying to remove the misclassifications described in the previously mentioned works, however they still faced the problems mainly attributed to the presence of snow, ice, and shadows.Șerban et al. (2020) compared five machine learning methods for the classification of water bodies using Sentinel-2 data.Automatic classifications have proved their efficiency and accuracy in the water body extraction (Acharya et al., 2019;Nie et al., 2017;Xie et al., 2016) and performed better than simple thresholding of the water indices as NDWI (Șerban et al., 2020).Bangira et al. (2019) compared thresholding with machine learning classifiers for mapping complex water and determined that machine learning methods were less sensitive to variations such as water turbidity and aquatic plants.
Attempts were made to create global lake databases, both, by combining existing databases (Lehner & Döll, 2004) and, by using satellite imagery (Donchyts et al., 2016;Verpoorter et al., 2014).A recent global highresolution mapping approach was conducted using Landsat satellite imagery; however, this method does not distinguish between rivers and lakes (Pekel et al., 2016).Such data cannot be used to map thermokarst lakes in large territories or areas crossed by river channels.
Most of the related work focus on the classification of water objects only and there is no specific distinction between lakes and other water objects.Such a distinction may be needed when studying thermokarst lakes.Often, rivers must be excluded manually from the dataset.In this study we assumed that rivers differ spectrally from lakes, and are characterized by different topographic condition and morphology, so the distinguish will be possible.
Previous works highlighted that most classification errors are caused by similar spectral features of lakes and other objects such as shadows, burn scars, or rivers.However, lakes often have different spectral characteristics owing to their depth and the presence of vegetation or ice, which causes false positive errors.The main goal of this study was to develop a method for detecting water bodies as lakes in a differentiated environmental setting.It was assumed that machine learning techniques perform better than thresholding in permafrost environment covered by wetlands and peatlands with complex relief.The northern part of Yakutia was selected as the study area owing to its complex environment and poor exploration.It is characterized by a significant variety of terrain and is entirely located in a continuous permafrost zone, which means that most lakes are of thermokarst origin.The main objectives of this study were to: (i) identify the features that best distinguish lakes from other spectrally similar objects, (ii) compare and determine the accuracy of the supervised classification methods for lake detection, and (iii) design a method for detecting lakes for large areas.

Study area
As the main aim of this research was to develop a method for lake detection in a complex environment, we selected the northern part of Yakutia as the study area.The territory is distinguished by various landforms and landscapes as well as complex geodiversity and biodiversity, thereby making it possible to test the designed method under various environmental conditions.The study area is significant owing to its location in a continuous permafrost zone.
Thermokarst lakes are widespread in the study area.The region's area is more than 137,400 km 2 , and its population is just over 11,000 people.Field research in this area is financially unprofitable and logistically difficult, making it very important to improve the relevant research technologies using remote sensing.
The study area belongs to the Verkhoyansk and Chersky mountain systems (Figure 1).The climate of the area is harsh and sharply continental (Matveev, 1989).Its development is influenced by the geographical location between the Arctic and subarctic climatic zones as well as the isolation of the area by mountain ranges.The main river in the study area is the Yana River.The Yana River basin covers highlands with mountain tundra, rocky deserts, and plateaus with mountain larch forests.The area is dominated by sparse woodlands, mainly Kayander larch (Larix Cajanderi), with birch, dwarf pine, cranberries, and intensive moss-lichen complexes (Matveev, 1989).Lichen and moss tundra are common in the north of the district.The mountainous landscape is dominated by cedar elfin and shrubby birch, and the soil surface is covered with lichens (Matveev, 1989).
Visual validation was performed on three selected test sites (Figure 2).The total area and number of lakes is as follows: -test site 1-1.86 km 2 and 8 lakes, test site 2-5.58 km 2 and 20 lakes, test site 3-1.24km 2 and 8 lakes.The total area of three test sites is 186 km 2 .Each test site has the area of 62 km 2 .

Data
An extended processing time is one of the main problems when using machine learning methods over large areas (Nath & Deb, 2010).The problem of insufficient computing power has mainly been solved since the release of the Google Earth Engine (GGE) (Google).GEE is a cloud platform dedicated to geographic data processing and analysis (Gorelick et al., 2017).GEE is widely used for mapping water bodies on a regional (Mahdianpari et al., 2020;Nguyen et al., 2019;Wang et al., 2020) or even on a global scale (Pekel et al., 2016).Thus, all of the presented analyses were performed using the GEE with the code written in JavaScript.The north-eastern part of the Siberian plateau was first imaged in 1999 by Landsat sensors (Gutman et al., 2013;Pekel et al., 2016), and since that year the number of imaging has been increasing.Analyses were performed using Landsat 8 data, available from 2014 as it was the first possibility of building a cloud free mosaic in the area of research.In the study area, there is no snow cover from June to September (Matveev, 1989).Therefore, only images from this period were selected for the mosaic.The images were taken from the USGS Landsat collection 8 Surface Reflection Level 1.Therefore, atmospheric and geometric corrections were unnecessary.All images with cloud cover lower than 30% were filtered from the collection.The collection contained 55 images in total.
In addition to Landsat 8 images, a digital elevation model, the Multi-Error-Removed Improved-Terrain DEM, was used (Yamazaki et al., 2017).MERIT DEM is a high-precision global digital elevation model with a spatial resolution of 3 arc seconds (ca 36 m at the study area, 90 m at the equator), obtained by excluding major error components from existing DEMs (NASA SRTM3 DEM, JAXA AW3D DEM, Viewfinder Panoramas DEM).

Preprocessing
An image mosaic covering the entire study area was created, covering the period of June to September (2014).If two images were available for a specific area, an image with a lower cloud cover was selected.A cloud mask created using the CFMask algorithm was applied in the next step (Foga et al., 2017).
Spectral indices used in the study were selected according to the Acharya et al. (2018), who tested spectral indices for water body detection in a mountainous environment (Table 1).As some mountain shadows cannot be distinguished from lake features using spectral variables, MERIT DEM data were used to create an additional dataset with the geomorphometric variables.

Classification
In the next stage, the training and test data were prepared.Ground truth points were selected based on Joint Research Centre (JRC) Global Surface Water Mapping Layers, v1.3 (Pekel et al,. 2016), and visual assessment according to the Landsat data mosaic and high-resolution images available from Google Earth Pro.The number of 400 ground truth points was decided based on the square root of the study area in Table 1.Spectral indices used in the study, acc. to Acharya et al. (2018).square kilometres and rounding the result to the nearest hundredth.200 points were assigned to the lake category and another 200 to the non-lake category.Field validation was not possible because of the inaccessibility of the territory and the large area of research (137,400 km 2 ).The points characterizing the lakes were chosen to reflect all the spectral and spatial characteristics of the lakes, while the rest of the points were chosen with an emphasis on the most difficult to distinguish objects, such as shadows, river bends, and fire scars.Examples of the ground truth are shown in Figure 3.The entire dataset was randomly divided into training (75%) and test (25%) data.
Classification and regression trees (Lewis, 2000) is a well-proven machine learning algorithm for identifying and classifying objects in the remote sensing field (Shelestov et al., 2017).The CART classification does not require expert knowledge, automatically selects useful spectral and ancillary information from data supplied and can be used with continuous and categorical ancillary data (Lawrence & Wright, 2001).
Bayes' theorem was proposed to statistically express the relationship between the conditional probabilities of two events.NB is a particular case of Bayesian networks, where a class node has no parents, and each eigenvector has the same class node as its unique parent.Despite the simplicity of this method, it often performs better than other complex classification methods (Bressan et al., 2009).
The RF algorithm is developed by Breiman (2001).It uses machine learning systems and relies on creating many decision trees based on a random set of data.Random forests, which generalize the decision tree concept, are classified as ensemble methods.The final decision is made by a majority vote on the classes indicated by individual robust decision trees.To find the best partitioning at each node in the decision tree, the RF uses a metric called the Gini index.In this context, the Gini index was used to calculate the average Gini decrease, which returns the importance of the variables used during classification (Belgiu & Drăguţ, 2016).The average Gini decrease was calculated to assess the importance of all input variables used to train the model.
In recent decades, SVM have become a popular supervised machine learning algorithms for classification and regression (Suthaharan, 2016).The SVM is based on the concept of perpendicular distance between decision planes (defined as a decision boundary) and data points.The maximum margin is chosen as the decision boundary.Objects are separated and assigned to different classes based on this distance.A subset of the data, called the reference vector, determines the border position (Suthaharan, 2016).

Validation
The classification results were evaluated using a confusion matrix that allows obtaining general measures of statistical accuracy, including precision and recall, F-score, and Cohen's kappa (K).Precision (P) (or user accuracy) was computed by dividing the number of true pixels by the total number of predicted pixels in the class.Recall (R), often called producer accuracy, was calculated by dividing the number of true pixels in a class prediction by the total number of true samples in the class.To capture the information from P and R simultaneously, the F-score (F1) was calculated, providing their harmonic mean (Jolly, 2018): The omission error (EO) and commission error (EC) were also calculated.EO returns the number of false negative (FN) (classification of a lake as non-lake) results in relation to all true samples of a class, while EC describes the frequency of all false positive (FP) (classification of a non-lake as lake) results in relation to all predicted samples in a class.All indicators were calculated for each class (lakes and non-lakes).
The overall accuracy (OA) was calculated by dividing the sum of all true positive (TP) and true negative (TN) classification results by the total amount of data (TS).The overall accuracy represents the probability that a test will correctly classify an individual.As the OA decreases, the model performance decreases.Therefore, the expected accuracy (EA) was calculated from TP, TN, FP, FN, and TS (Cohen, 1960): As a general measure of accuracy, Cohen's kappa coefficient (K) was calculated using the following formula (Cohen, 1960): K determines the degree of compliance of multiple measurements of the same variable under different conditions (Cohen, 1960;Landis & Koch, 1977).A similar approach to accuracy measures was used by Dirscherl et al. (2020), who classified Antarctic supraglacial lakes using the random forest method.All calculated measures of accuracy are between 0 and 1. K, OA, F1, P and R, with higher values represents better performance, while EO and EC with lower values represents better results.Visual validation was performed for the three selected test sites described in the study area section.The lakes in these areas were manually digitized.The areas of the classified lakes, false negatives, and false positives were also calculated.

Model performance
Table 2 illustrates the calculated values of accuracy and error rates for each type of classification.The omission error (EO) level of the lake class was the highest for the CART classification (0.16), and the EO for the non-lake class was the highest for the CART and NB classifications (0.22).The lowest EO characterized the NB classification for the lake class.The non-lake class had the lowest EO in the RF classification.The commission error (EC) level for the lake class was lowest in the RF classification.In contrast, the non-lake class had the lowest EC in the NB classification.R and P were analysed simultaneously in terms of F-score (F1).The best F1 was characterized by the RF classification, both for the lake and non-lake classes.The worst F1 characterized the CART classification in both classes.
The best OA and K were found for RF classification and the worst for CART classification.The RF classification yielded the lowest error in both the lake and non-lake classes.
Based on the visual assessment, it can be seen that the RF classification yielded the best results in the lowland conditions (Figure 4).Other types of classification significantly overestimate the number of lakes and are characterized by a high error.The CART and SVM classifications were characterized by a considerable error and often classified objects such as shadows and non-vegetated areas as lakes.Areas close to the waterbodies, were also often classified as lakes.False positives significantly underestimated the classification results.
It can be concluded that only the RF model coped well in rivers and wetland conditions (Figure 5).The NB model classified all rivers as lakes.The CART and SVM methods often classified objects such as shadows, bare grounds, and areas close to the waterbodies as lakes.For all types of classifications, the shallow parts of the lakes were characterized by a large number of false negatives.
The CART and SVM classifications were characterized by high errors under mountainous conditions (Figure 6), especially in highly shaded areas.A large number of false positives characterized shadows in both the classifications.The lowest error in mountain conditions characterized the NB and RF classifications.Both classifications were resistant to the misclassification of shaded mountain slopes.
Table 3 illustrates the classified lake areas on each of the three test sites.In test site 1, the RF classification was characterized by the lowest false positive and false negative errors.The CART and SVM classifications had the highest number of false positives.The NB had the highest number of false negatives.In terms of false positives and false negatives, the CART and NB classifications had the highest values.The RF classification had the lowest error values.Test site 3 had the lowest number of false negatives.Neither of the classifications was characterized by a high number of false negatives at test site 3.The CART classification had the falsest positives RF and NB classifications had the lowest number of false positives Based on the information shown in Table 3, it can be concluded that the RF classification is the most accurate, as it had the lowest number of false positives and false negatives in all three sites.Similar results were obtained for the NB classification but only at the test site 3.In contrast, the worst results were obtained for the CART and SVM classifications.

Variable importance
Figure 7 shows the importance of all input variables of the RF model.Altitude had the greatest importance, while other terrain features, such as slope and aspect, had medium importance.Almost all Landsat 8 channels had high importance, except for thermal bands (B10 and B11) and band 1. AWEish was the most significant, while NDVI was the least significant among all the indices.In general, the indices had the lowest overall impact on the classification results.
Rivers, burn scars and shadows were the hardest to distinguish from the lakes (Figure 8).Noticeable differences between lakes and rivers are observed in bands 2, 3 and 4 of the Landsat images.Burns scars have different spectral characteristics than lakes in thermal channels (B10 and 11).Water indices did not show any significant differences between lakes and rivers.However, these indices did have large spectral differences between water features compared to shadows and burn scars.The NDVI values differed between the lakes and rivers classes, and were higher in the lakes than in the rivers.The altitude varied significantly for the shadows class as most shaded places are in mountainous areas with high slope values.On the other hand, lakes and rivers are in flat areas and are of low altitudes; moreover, rivers are situated at lower altitudes than lakes.

Lakes inventory
According to the RF classification results, 24923 lakes were identified with the total area of 275.84 km 2 .The smallest identified lake had an area of 900 m 2 .We assumed the smallest lake area according to Pekel et al. (2016), who reported issues with the detection of water bodies smaller than 30 × 30 m (900 m 2 ).However, after the classification we decided to raise this value to 0.5 ha, due to the high value of the false positives of the smaller lakes.The lakes smaller than this value has been removed from the dataset.After the data cleaning the total area of lakes has been 271.43 km 2 , with 17,770 identified lakes.The largest lake had an area of 9.95 km 2 .The average lake area was 0.015 km 2 .Most of the lakes were located in the central and north part of the research area (Figure 9).Lakes were located mainly along rivers.Mountain areas and lowlands in the south were not characterized by a high density of lakes.

Model performance
The statistical accuracy metrics revealed promising workflow functionality.However, some classification methods showed lower overall accuracy than others.
The lowest accuracy scores were obtained using CART and SVM.In other studies, different classifiers showed various results depending on the subject and location of study.In the assessment of machine learning classifiers for global lake ice cover mapping, the RF (96% accuracy) method showed advantage over SVM (79% accuracy) and the SVM turned out to be the least accurate method among the tested (Wu et al., 2021).However, the SVM proved to be suitable for water quality assessment (Bouamar & Ladjal, 2007;Danades et al., 2016).Misra et al. (2018) used SVM for the shallow water bathymetry mapping, with the accuracy of 80%.The CART classification has been used with success (92%) on urban lake areas mapping, which is contrary to our results.This is probably due to a completely different specificity of the study area (Zhang et al., 2018).The main disadvantage of the CART is their sensitivity to training data.Thus, a slight change in the training dataset could significantly affect the results.Such conclusions were confirmed by Yasmin et al. (2019), who compared the CART and SVM methods for detecting water bodies using Landsat 8 OLI.Our results are in line with by older studies comparing RF and CART, where RF has been superior over CART (Oliveira et al., 2012;Peters et al., 2007;Vorpahl et al., 2012).Rithin Paul Reddy et al. ( 2020) assessed multiple classifiers and the NB method identified water regions better than others, contrary to our results.In our study, the RF method was found to be highly resistant to all forms of landscapes and factors causing errors.This study justified the use of RF models based on multiple variables to extract water bodies from massive remote sensing data.This is supported by other studies on the mapping of water bodies in China (Deng et al., 2017) and Australia (Tulbure & Broich, 2019), where the RF method was more than 90% accurate, as was the case in this study.Șerban et al. (2020) tested various machine learning techniques for the thermokarst lakes detection on north-eastern Qinghai-Tibet Plateau.They report that the maximum likelihood classification gave the most accurate results, nevertheless the RF classification was characterized by the accuracy higher than 90%.Our studies also have similar accuracy results to the land cover studies using RF method (Deluigi et al., 2017;Gislason et al., 2006;Shi et al., 2019;Wang et al., 2019).
Despite the high accuracy, a visual check of the classification results is necessary, particularly in areas close to rivers and multichannel rivers.Visual interpretations of the results revealed some deficiencies in the classification methods in problematic places (rivers, shadows, burn scars).According to the visual assessment, only the RF classification was not characterized by a large error in distinguishing between lakes and rivers.The other classification results were characterized by considerable errors in cloud-shaded areas and mountain slopes, areas with high moisture, and areas with burn scars.

Variable importance
The Gini index was calculated to determine the variables that influenced the classification results the most.Although some variables returned lower significance than others, we decided not to reduce the number of  input variables while training the model.Each of the explanatory variables has different characteristics, therefore the removal could result in worse results, especially during the classification of the objects with similar spectral signatures.The input variables were selected based on a careful literature review (Acharya et al., 2018;Dirscherl et al., 2020;Feyisa et al., 2014;Kravtsova & Rodinova, 2016;Palmer et al., 2015).As the primary goal of this study was to develop an approach that is portable in space and time, a wider range of input variables provides more flexibility to spatially classify independent regions.Among the raw Landsat 8 channels, the SWIR 2 (band 7) and SWIR 1 (band 6) bands had the highest importance values.These bands are commonly used to assess the soil and vegetation moisture (Bao et al., 2018;Khellouk et al., 2020).However, the indices built using these bands did not reach such high values, which is contrary to other studies (Acharya et al., 2018;Yang et al., 2017;Zhai et al., 2015).Most of the indices  uses only two bands (NIR and Green) and for this reason a shallow thaw lake, with low capability of green spectral band can be underestimated (Szabó et al., 2020;Zhao et al., 2020).It may explain why the water indices did not work well for mapping thermokarst lakes, as these types of the water bodies are often shallow.However, Chen et al. (2021) prepared a dataset of permafrost lakes across the Qinghai-Tibetan Plateau permafrost region using the NDWI index with the satisfactory results.Our study area, especially in the lowlands, is characterized by high soil moisture.Depending on the season, these areas are often flooded, which may lead to large errors in the classification.According to the Șerban et al. (2020), the NIR band, which is used in most of the indices is prone to commission errors as clusters of wet soil, shadows and clouds can be misinterpreted as water.Muster et al. (2013) and Wangchuk and Bolch (2020) claim that the shadows problem can be solved with the R, G, B bands, which explains the great importance of these bands in our study.However apart from these bands we observed high spectral difference between shadows and waterbodies on the SWIR bands and spectral indices built on these bands (MNDWI).The high importance of the SWIR bands was also confirmed by the study of water bodies with Sentinel 2 images (Du et al., 2016).RGB bands of the Landsat images differed spectrally in the lakes and rivers class.This could be caused by different water clarity and turbidity in lakes and rivers (Brezonik et al., 2007;Zhao et al., 2011).This is also confirmed by higher NDVI values, which may indicate greater eutrophication of lakes than rivers, more developed aquatic vegetation, and thus different spectral values.The potential of using NDVI for lake detection was previously confirmed by multiple studies in various environments (Fan et al., 2020;Han & Niu, 2020;Kiage & Walker, 2009;Propastin, 2008).
Mapping lakes remains uniquely challenging in mountainous regions, especially because of the usually small size of lakes (Wangchuk & Bolch, 2020).Compared to other variables, altitude was far more important.This is most likely due to the organization of the runoff of water from the thawing of permafrost on the mountain slopes, while, on the flat areas, the water after permafrost thawing remains in situ.Polishchuk et al. (2018) highlighted that the number and proportion of lakes are strongly controlled by variations of topography, which confirms the importance of altitude in the classification.Most of the lakes in the region have a thermokarst origin and are placed in shallow depressions called alas, formed by permafrost subsidence in the Yedoma regions.Yedoma is an ice-rich permafrost deposit containing large syngenetic ice wedges, which accumulated in regions of Eurasia, Alaska, and Northwest Canada during the Pleistocene (Strauss et al., 2013).These areas of Yedoma are strongly affected by thermokarst processes, which results in the formation of lakes.
The difficult aspect in this study was to find differences between river and lakes that could help in the classification.We assumed that rivers, despite similar spectral characteristics as lakes, will be distinguishable by taking into account the topographic factor.The results indicated that the rivers are situated at lower elevations than the lakes, which was significant in the problem of distinction between these two classes.
Lakes inventoryIn our study, we assumed that the minimum lake area that can be mapped using Landsat imagery is the pixel size of this sensor (30 × 30 m).The spatial resolution of the Landsat data (30 m) may have led to the exclusion of many small lakes and, therefore, underestimated the water extent (Șerban et al., 2020).Many studies used different area thresholds.Polishchuk et al., (2008), specified that the smallest identifiable lakes were those of 5.55 Landsat pixels in size (5,000 m 2 ).Other studies assumptions strongly varied from 1,000 m 2 (Muster et al., 2013) to 200,000 m 2 (Morgenstern et al., 2008).According to Smith et al. (2005), the smallest reliable lake area to be mapped is 400,000 m 2 to avoid seasonal variability.Our preliminary results show that lakes below 5,000 m 2 are often characterized by high classification error and, at the same time prone to seasonal variability, but the topic needs to be explored.
Our inventory showed that the spatial distribution of lakes in the study area is not uniform.Grosse et al. (2008) showed that the thermokarst lakes distribution is strongly connected to hydrological and geomorphological parameters.We found high density of lakes at the lowlands, in the valleys and bogs of the central part of the region.Most lakes are located along rivers or its interfluves.These areas, with high density of thermokarst lakes, are probably highly vulnerable to the thermokarst processes and are a part of the thermokarst landscapes in the region (Olefeldt et al., 2016).Mountain areas are characterized by a much lower lake density, which is in line with the results in Qinghai -Tibet Plateau, where the number of lakes is considerably smaller in mountainous areas (Niu et al., 2011).
The field mapping and monitoring of lakes in these areas are complicated owing to their large numbers, remoteness, and inaccessibility.Consequently, remote sensing-based approaches are highly preferred over field-based approaches.In recent years, remote sensing methods have rapidly developed owing to the free availability of data and extensive spatial coverage.However, remote sensing approaches have certain drawbacks.Limitations, in particular, include their inability to reliably map lakes over large areas owing to many disruptive factors, such as shadows from mountains and clouds, lake turbidity, cloud cover, areas with high moisture, burn scars, and similar spectral characteristics of lakes and rivers.This study demonstrated that the errors caused by these factors can be overcome.
In the future, using Sentinel 2 data or a data fusion between Sentinel 2 and Landsat 8 can help identify smaller lakes.On the other hand, Sentinel 2 does not have a time series as long as Landsat, so analysing lake dynamics over a long period would not be possible.More training data can be used during model training in areas with high errors to improve classification accuracy.According to the Millard and Richardson (2015) who investigated the importance of training data in RF image classification, as many training and validation sample points as possible should be collected.More sample points should be collected from mountainous areas than from lowland areas.Classification methods can be adapted for time-series analyses when annual DEM models are available.The investigated area is strongly exposed to thermokarst processes, therefore the changes in the relief are dynamically shaped, even at annual intervals.The relief is exposed to strong land subsidence or landslides, which may shape the distribution of lakes as well as changes in their surface.High resolution data is required to accurately identify these changes.To analyse the temporal dynamics of lakes, data sources that consider annual changes in hypsometry are required.

Conclusion
This study provides an adaptation of known classification methods for lake detection in a very specific territory with cold climate, such as a permafrost area, using Landsat and MERIT DEM data.Based on the results of this case study, the following conclusions can be drawn: •Altitude and bands 7 (SWIR2), 6 (SWIR1), and 3 (Green)of Landsat 8 OLI had the greatest impact on the lakes classification results.
•Most classification errors were false positive (classification of a lake as non-lake) rather than false negative (classification of a non-lake as lake).
•The CART, NB, and SVM classifications were characterized by high error in shaded and moist areas, near steep slopes, burn scars, and rivers.Only the RF classification coped with these limitations and provided the best results.
•The greatest error characterized the CART classification owing to the large number of false-positive errors.
•Future developments may involve improving the RF model with more training data and using different variables (spectral indices based on the SWIR bands), with a greater spatial resolution.
The RF technique has exploratory potential and can be used to map and observe changes in permafrost lakes on a definite time-space scale in arctic and subarctic permafrost areas.

Description of author's responsibilities
JP: conception of the paper; JP, JN, and ZZ: designed the analysis; JP: collected the data and performed the analysis; JP, JN, and ZZ: data interpretation; JP: wrote the paper; JN and ZZ: critical revision of the paper; ZZ: final approval.

Figure 1 .
Figure 1.A) the map indicates the location of the study area in Siberia on the background of the permafrost distribution (Obu et al., 2020).b) the location of the studied Yana River drainage basin (Global Runoff Data Centre,2020) on digital elevation model (DEM based on Yamazaki et al., 2017).

Figure 2 .
Figure 2. Three selected test sites a) Lowland conditions; b) River and wetland conditions; c) Mountainous area with river-lake system.

Figure 3 .
Figure 3. Examples of ground points in study area.The white colour indicates points in the lake category.Points in the non-lake category are marked in yellow.A Landsat 8 composite of bands 5, 4, and 3 was used as the base map.

Figure 7 .
Figure 7. Importance of the variables based on the Gini index.The variables were divided into three groups based on their characteristics and application.

Figure 9 .
Figure 9. Lakes distribution plot, according to the RF classification results.a) altitude, b) longitude, c) latitude.

Table 2 .
Accuracy assessment results for each classification type.The best values are bold.

Table 3 .
Lake area [km 2 ], false positive and false negative error percentage for each of the tested classifications on three test sites.Area value is given in square kilometres while false positives and false negatives are given as percentages.The best values are bold.