An artificial intelligence approach to remotely assess pale lichen biomass

Although generally given little attention in vegetation studies, ground-dwelling (terricolous) lichens are major contributors to overall carbon and nitrogen cycling, albedo, biodiversity and biomass in many high-latitude ecosystems. Changes in biomass of mat-forming pale lichens have the potential to affect vegetation, fauna, climate and human activities including reindeer husbandry. Lichens have a complex spectral signature and terricolous lichens have limited growth height, often growing in mixtures with taller vegetation. This has, so far, prevented the development of remote sensing techniques to accurately assess lichen biomass, which would be a powerful tool in ecosystem and ecological research and rangeland management. We present a Landsat based remote sensing model developed using deep neural networks, trained with 8914 field records of lichen volume collected for >20 years. In contrast to earlier proposed machine learning and regression methods for lichens, our model exploited the ability of neural networks to handle mixed spatial resolution input. We trained candidate models using input of 1 × 1 (30 × 30 m) and 3 × 3 Landsat pixels based on 7 reflective bands and 3 indices, combined with a 10 m spatial resolution digital elevation model. We normalised elevation data locally for each plot to remove the region-specific variation, while maintaining informative local variation in topography. The final model predicted lichen volume in an evaluation set (n = 159) reaching an R2 of 0.57. NDVI and elevation were the most important predictors, followed by the green band. Even with moderate tree cover density, the model was efficient, offering a considerable improvement compared to earlier methods based on specific reflectance. The model was in principle trained on data from Scandinavia, but when applied to sites in North America and Russia, the predictions of the model corresponded well with our visual interpretations of lichen abundance. We also accurately quantified a recent historic (35 years) change in lichen abundance in northern Norway. This new method enables further spatial and temporal studies of variation and changes in lichen biomass related to multiple research questions as well as rangeland management and economic and cultural ecosystem services. Combined with information on changes in drivers such as climate, land use and management, and air pollution, our model can be used to provide accurate estimates of ecosystem changes and to improve vegetation-climate models by including pale lichens. * Corresponding author. E-mail addresses: rasmus.erlandsson@nina.no (R. Erlandsson), jarle.bjerke@nina.no (J.W. Bjerke), eirik.finne@nina.no (E.A. Finne), slpiao@pku.edu.cn (S. Piao), xuhui.wang@pku.edu.cn (X. Wang), tarmo.virtanen@helsinki.fi (T. Virtanen), aleksi.rasanen@luke.fi (A. Räsänen), timo.kumpula@uef.fi (T. Kumpula), tiina. kolari@uef.fi (T.H.M. Kolari), teemu.tahvanainen@uef.fi (T. Tahvanainen), hans.tommervik@nina.no (H. Tømmervik).


Background
While satellite-based methods for remote sensing of green vascular plants have been the most important source for monitoring global environmental change (e.g. Myneni et al., 1997;Gould, 2000;Piao et al., 2020;Ryu et al., 2019), reliable methods for lichens have been elusive (Falldorf et al., 2014;Nordberg and Allard, 2002). Lichens are symbiotic associations of fungi and algae and/or cyanobacteria characterized by low requirements of nutrient and low growth rates that enable them to pioneer in the colonization of exposed surfaces (Cutler, 2010). They can remain prominent in later succession stages, it is calculated that 6-8% of the Earth land surface is covered and dominated by lichens (Crittenden, 2000;Larson, 1987), and they may colonize nearly all terrestrial habitats while being functionally important in many ecosystems (Kappen, 1988). Especially in Arctic, alpine, boreal and dryland ecosystems lichens are major contributors to overall carbon and nitrogen cycling, biodiversity and biomass (Cutler, 2010;Elbert et al., 2012).
Terricolous (ground-dwelling) fruticose lichens, typically forming 10-20 cm thick mats in undisturbed regions, have the potential to be the dominating growth form in Arctic and alpine tundra, in open boreal forests, and on the drier parts of peatlands (Ahti and Hepburn, 1967;Joly et al., 2009;Rautiainen et al., 2007;Tømmervik et al., 2012). Their role as a vital winter food resource for reindeer/caribou (Rangifer tarandus, hereafter Rangifer) makes them economically and culturally important in areas with semi-domestic or wild populations of Rangifer (Joly et al., 2009;Riseth et al., 2016). Due to their wide distribution in the Arctic and boreal regions of Eurasia and North America, the area covered by reindeer husbandry alone amounts to some four million square kilometres (Jernsletten and Klokov, 2002;Tømmervik et al., 2012). Rangifer have impact on climate through their modification of vegetation structure by trampling and selective grazing (Cohen et al., 2013;Collins et al., 2011;Joly et al., 2010;Tømmervik et al., 2004).
While green plants have a relatively distinct reflective signal in the red and near-infrared bands of the light spectrum, lichens have considerably complex spectral signatures (Rees, 2004;Solheim et al., 2000). Notably, terricolous lichens consist of species both with melanised dark surfaces and pale surfaces with no or little melanin (Gauslaa, 1984), and these often grow in mixtures. Light reflectance properties are also affected by thallus humidity (Granlund et al., 2018) and growth form (crustose, foliose or fruticose). Another complicating factor is that terricolous lichens have limited growth height (maximum 25 cm), and they often grow in communities with a mixed cover of lichens, bryophytes and taller vascular plants, especially graminoids, shrubs and trees, that can partly shade lichens or hide them from optical sensors (Fraser et al., 2014;Tømmervik et al., 2004). Remote sensing assessment methods of lichen abundance should therefore ideally be able to detect lichens despite noise from varying vegetation structures. Simple arithmetic equations based on satellite reflectance values are typically sensitive to that type of noise (Nordberg and Allard, 2002), hence such methods should be used with caution, limiting their applicability to specific areas or habitats. This is for example the case with the Lichen Volume Estimator (LVE) model developed for pale lichens by Falldorf et al. (2014) and later applied by Rickbeil et al. (2017) and Macander et al. (2018).
Recently, methods to detect pale terricolous lichens based on machine learning have started to emerge He et al., 2021;Jozdani et al., 2021;Kennedy et al., 2020;Macander et al., 2020). Artificial intelligence-based approaches have several benefits since the combined information from all included bands are used without any a priori assumptions on their relationship to lichen reflectance. The blunt naivety of such an uninformed approach, compared to the more informed decisions when constructing a traditional equation, pays off in the ability to find context-dependent relationships (Brodrick et al., 2019). This allows detection algorithms of the lichen signal to vary across the landscape, for example along gradients from open tundra to boreal taiga. In this way, the approach resembles the ability of the human brain to distinguish a relevant signal despite different types of noise Jozdani et al., 2021), however with the potential to process a large number of bands (Kennedy et al., 2020). In addition to reflectance, it is possible to include any additional rasterised data, such as elevation, emphasising the potential of artificial intelligence to efficiently combine different types of geographical information to solve complex remote sensing tasks Jozdani et al., 2021;Kennedy et al., 2020). Jozdani et al. (2021) and Fraser et al. (2021) investigated the ability to train computers on extremely high-resolution (< 3.5 cm) images taken with unmanned aerial vehicles (drones) and changed scale to high-resolution satellites (0.5-6 m). While the method of Jozdani et al. (2021) was limited to a binary classification of terricolous lichens in general; Fraser et al. (2021) assessed the ability of a random forest model to quantify cover of pale, fruticose lichens of the genus Cladonia, reaching R 2 values from 0.07 to 0.49 for the Planet satellites, and 0.52 to 0.70 for the World View satellites, both at 6 m resolution. Although both studies showed the versatility and promising aspects of machine learning techniques to scale from local to regional scope, high resolution images are still often expensive, tend to have a limited availability, and low coverage or number of spectral bands. Not least, they have a very limited historical record, making such images unsuitable for long-term studies, and inapplicable with field records that are more than a few years old.
In contrast, Kennedy et al. (2020) presented a promising methodology using deep learning to train a neural network for assessment of lichen cover based on Landsat images, elevation data and climatic parameters. Kennedy et al. (2020) assessed pale lichen cover at three different high-latitude sites spanning the North American continent, achieving R 2 values ranging from 0.27 to 0.55. This is a pioneering application of AI to remote sensing of terrestrial lichen cover, demonstrating the high potential of the technique. However, correlation coefficients were rather low in several areas, while lichen cover remains a rather crude measurement of actual lichen biomass. Although biomass is a function of cover and thickness, the correlation with cover is nonlinear and highly sensitive to reindeer grazing pressure. In particular, when lichens are recovering after prolonged grazing, the cover can be substantial while the biomass is still very low due to rapid horizontal expansion from numerous lichen fragments while vertical growth is a slower process (Klein and Shulski, 2009). Thus volume is a more relevant ecological measure since it is closely correlated to biomass (Moen et al., 2007;Tømmervik et al., 2004;Tømmervik et al., 2012). Biomass is also socioeconomically relevant for reindeer herding and hunting-based communities as it quantifies the amount of food available for Rangifer, and lichen biomass can affect the ability of other vegetation to establish, since thick lichen mats can prevent vascular plant seeds from reaching the soil (Sedia and Ehrenfeld, 2003;Tømmervik et al., 2009). Furthermore, modellers of global ecosystem dynamics typically rely on biomass rather than cover. In addition, a thick layer of pale lichens would have a higher albedo compared to a thinner layer with the same cover, potentially affecting local climate (Stoy et al., 2012). However, while lichen cover can efficiently be obtained by using, for example, aerial photos or drone images, large datasets with lichen volume are tedious to collect since the thickness of the lichen cover must be measured in the field. As field data collection of representative lichen volumes is timeconsuming and expensive, an accurate remote sensing model for estimation of lichen volumes would be a powerful tool in ecological research and rangeland management.

Objectives
We assembled a data set of 8914 field records collected over a period of 23 years from both open and forested lichen habitats covering a gradient from no or negligible grazing conditions to high-grazing conditions. This dataset was used to address the following objectives: (1) To develop and evaluate a neural network-based method for assessment of pale lichen volume relying solely on reflectance and topography data and apply the model in selected circumpolar sites to evaluate the general validity of the model. (2) To test the ability of the model to detect a documented long-term change in lichen abundance.

Methods
We trained neural networks with Landsat images and ground-truth data collected over 20 years in northern Scandinavia and the Kola peninsula. In contrast to earlier proposed methods, we did not restrict the input data to single pixel values only, but exploited the ability of neural networks to handle mixed resolution input.
Although it is technically possible to feed virtually any type of geographic information to our neural network, we wanted to reduce complexity and limited the input data to Landsat reflectance and elevation data, since those datasets are easily retrievable and in high spatial resolution. To be able to utilise the historical records of Landsat images we only used bands that are similar across the Landsat 5, 7 and 8 satellites, allowing the method to be applied back to the 1980s.

Lichen abundance data
The ground-truth data we retrieved originated from different monitoring projects undertaken in Norway, Sweden, Finland and the Kola peninsula, Russia (Table 1, Fig. 1), and therefore, they are subject to some variation in methods and within-plot sampling regimes. The distribution of plots within sampling sites varied as well from random to systematic transects, depending on the original purpose of the sampling. All samples were based on 1 m 2 field plots except the data from Finland which was sampled from 0.25 m 2 . Values from fields plots that fell within the 30 m Landsat pixel size, were averaged (2-5 depending on area) and used as single values for training, evaluation and testing. A special case was made for the plots from west Finnmark that were located 50 m apart. Those plots were used as individual plots for training, to maintain a larger training set, but the mean was used for testing as the mean value, although sampled from adjacent Landsat pixels, more accurately reflected the general lichen biomass within the target pixel. All plots (n = 8914) were standardised to volume in litres (dm 3 ) per m 2 of pale terricolous fruticose lichens mainly belonging to the two genera Cladonia and Flavocetraria. In 21 plots, lichen cover was >100% reaching a maximum of 126% because of 2-3 layers of different lichen species, where the upper layers shade the lower ones. Vegetation cover measurements summing to over 100% are common in vegetation studies (Fehmi, 2010), and those records did not result in outlier volume measurements. Table S2 provides an overview of the main types of vegetation covered by the AI analysis in this study, following the concept developed by Fremstad (1997), which relies on ecological gradients in moisture, nutrient availability, soil type, topography, and climate. These types are dwarf shrub and lichen heaths, bilberry-lichen heath, alpine heather-lichen heath, lichen-birch woodland, lichen-pine woodland, and lichen-rich bog with heather and dwarf shrub.
Both a training and a validation set are needed to train neural networks. For an unbiased test of the model performance, an independent test set is also needed. We therefore randomly divided the dataset into 3 datasets: training (n = 3290, 73%), validation (n = 1054, 23%) and test (n = 159, 4%). Spatially overlapping records (within 300 m clusters) were allowed in the training and validation set, but plots that were closer than 300 m were removed from the test set in order not to bias the results (regarding the distance, see discussion below).For the test set, averaged values were used for multiple data points collected within a single or adjacent Landsat pixel.
We also rendered a 'remote' dataset, to investigate if boosting the training set with remotely assessed data could improve the models. Across an area of 280 km 2 in Central Norway, we classified a total area of 6.3 km 2 into 4 land-cover types based on lichen abundance through visual interpretation of aerial photography (0.5 m resolution). The categories were: a) vascular plants without lichens, b) bare ground with negligible amounts of lichens, c) intermediate lichen cover, d) lichendominated vegetation. Within the classified areas, we randomly placed 1734 points to which we assigned values for lichen cover and height corresponding to the average empirical values we measured in the field at comparable sites. To render variation within the assigned values, they were randomly picked from a class-specific distribution. The remotely assessed dataset was solely used as training data.

Image acquisition and pre-processing
The workflow described below is also graphically presented in a methodological flowchart (Fig. 2).

Satellite images
Atmospherically corrected surface reflectance Landsat collection 1, Table 1 Size of datasets used for training. Data points with high lichen volume were heavily underrepresented and therefore repeated in the training set to achieve a more even distribution (numbers given within parentheses). Tier 1 images (Landsat 5, 7 or 8 depending on availability, U.S. Geological Survey) were acquired using Google Earth Engine (Gorelick et al., 2017). For each field site and year that data was recorded, a collection of all images taken between July 1st and August 31st was masked for clouds, cloud shadows and snow using the embedded quality assessment band (CFMASK-derived, see Foga et al., 2017). A composite was then formed using median values. In some years and locations, clouds completely prevented the compilation of a useful composite. In those cases, we instead used composite from the closest possible adjacent year with enough coverage (Table 2). An alternative method to handle atmospheric noise and other distortions is to use composites averaged from multiple years. However, this method is sensitive to vegetation changes between years. We decided not to use multi-year averages, since reindeer grazing impact in some of the field sites often vary considerably between years. Hence, multi-year averages would possibly add, and not reduce, noise. In addition to the 7 bands that Landsat 5, 7 and 8 have in common, we included 3 normalised differentiated indices that have been used in previous efforts to sense lichens remotely: Normalised Differentiated Vegetation Index (NDVI, Tucker, 1979, bands: NIR and red), ND Lichen Index (NDLI, Nordberg, 1998, bands: SWIR1 and green), and ND Moisture Index (NDMI, Wilson and Sader, 2002, bands: NIR and SWIR1). All pixel values were scaled to 8-bit values (0-255) in Google Earth Engine prior to export to facilitate an accurate conversion from Tag Image File Format (TIFF) to 8-bit Portable Network Graphics (PNG), which was the image file format supported by the artificial intelligence platform (Peltarion AB, Stockholm, Sweden) used in this study. For the scaling, the highest recorded reflectance value (0.4785) occurring in our training set was used as 255. Since the normalised indices range between − 1 and + 1, those values were stretched to values between 0 and 255 separately.

Elevation
Terricolous fruticose lichens at northern latitudes have the capability to form large mats from sea level to mountain summits, with the same species being abundant along the entire bioclimatic gradient (Crittenden, 2000;Hein et al., 2021;Tømmervik et al., 2009). Hence, altitude per se has a limited effect on lichen growth in the areas covered by this study and the study sites with the highest lichen abundance were found in upland areas with little reindeer grazing and low impacts of other land-use activities. Since this was largely an effect of variation in extent of reindeer husbandry, rather than in ecological processes relevant for lichen growth, there was a risk that absolute altitude could be exploited by the neural network to identify an artificial relationship between altitude and lichen abundance, limiting the generality of the model. However, local topography variation has strong impact on snow accumulation, which in turn can affect reindeer access to lichen forage resources (Tømmervik et al., 2012). By normalising the elevation data for each plot, we removed the region-specific variation, while maintaining informative local variation in topography. (The regional minimum value within a square of 3870 m (387 pixels) was subtracted from all pixels, resulting in elevation models ranging from 0 to the locally highest point.).
Elevation data was either acquired from the Arctic Digital Elevation Model (2 m resolution, Porter et al., 2018) or, for Norwegian sites, a 2 m resolution digital elevation model produced by the Norwegian Mapping Authority ("Kartverkets digitale terrengmodell"). The resolution was reduced to 10 m × 10 m using Google Earth Engine and default settings (NN: nearest neighbour). All images were exported as GeoTIFF raster files in the format of their local Universal Transversal Mercator (UTM) zone.

Raster handling
During the exploratory phase of the method development, we experimented with different cropping sizes of the input rasters before we settled on 90 × 90 m. While for all applications (maps) of the model we used a 90 × 90 m cropping straight away, the rasters for the training and test data were constructed using a larger crop size of 129 Landsat pixels.
For each Landsat-derived raster and data point, we cropped a square of 3870 × 3870 m (Landsat: 129 × 129 px, elevation data: 387 × 387 px) with the sampling coordinate in the centre. If 11% or more of the area was masked due to clouds, the data point was dropped entirely. Otherwise, any masked pixels were assigned with the mean value of the unmasked pixels as this could be assumed to render least noise for the artificial intelligence modelling. Rasters were handled in R (R Core Team, 2020) using packages raster (Hijmans and van Etten, 2012), terra (Hijmans, 2021), sp (Bivand et al., 2013) and sf (Pebesma, 2018).

Fig. 1.
Map of the field plots used to train and test the deep neural network. Plots within a 1 km distance from each other was aggregated for readability. For sample sizes, see Table 1.

Fig. 2.
Methodological flow chart. Field work steps are shown with green background, while processing in Google Earth Engine, R, and the Peltarion AI-platform are shown in blue, beige, and red backgrounds, respectively. In Google Earth Engine, we compiled cloud-free median composites of all 1st tier surface reflectance Landsat images taken between 1st July and 31st August. We extracted the 7 bands that are in common for Landsat 5, 7 and 8 and calculated three spectral indices. Field records within pixel distance were averaged, resulting in the discrepancy between number of actual filed records and training/validation sets. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Structuring of data for training neural network models
We subdivided the datasets into 3 different categories (Table 1). The Core dataset consisted of data collected in the field by one or several of the authors or closely affiliated monitoring programs. The Mixed sources dataset was retrieved from external collaborators and collected following more diverse methods and with different levels of detail). The Remote dataset was compiled from aerial photos as described above.
We transferred 112 data points from the core dataset to the mixed sources dataset (Table 1), because these plots had comparatively high lichen volume-to-cover ratios (thick but patchy lichen cover, which occasionally occurs in forested systems with little reindeer grazing but high competition from shrubs) and could potentially add noise rather than contribute to an efficient training.
For each area where a dataset was collected, we randomly assigned 80% of the data for training and 20% for validation. Prior to this division, to avoid spatial overlap between training and evaluation sets, all data points that fell within a 300 m radius of any other point were grouped into clusters. In the assignment to the different set classes, training and evaluation points were never drawn from the same spatial clusters.
Before we trained the neural networks, we also extracted a random test-set from the Core training set. This set was completely removed prior to any training, including evaluation during training. To avoid spatial overlap between the test and the training data, the points in the test set were not allowed to come from the same spatial cluster as any data points in the training or validation set (similarly to the division of the training and test set). A recent debate has emerged regarding the implications of spatial autocorrelation when validating spatial predictions (Ploton et al., 2020;Wadoux et al., 2021). On the one hand, Ploton et al. (2020) argue that test data should be spatially independent from the training data. To test the spatial dependency, they propose to use variogram analysis to find out a threshold distance after which there is little spatial autocorrelation in the response variable. They suggest using such spatial cross-validation techniques in which test data points are not located within the spatial threshold distance from each training data point. On the other hand, Wadoux et al. (2021) argue that a random selection of points for evaluation should be used instead of spatial or traditional cross-validation, and that spatial cross-validation methods result in overly pessimistic assessment of model fit.
To assess the influence of spatial autocorrelation of our data set, we ran a spatial random forest analysis to analyse how well the lichen biomass in the test set could be predicted solely with the distance between data points (Hengl et al., 2018). To assess the scale of spatial autocorrelation, we constructed a variogram following the method described in Ploton et al. (2020). A distance sufficient to effectively handle the inherent spatial autocorrelation of the data set (at least 43 km, as indicated by a variogram analysis Supplementary Fig. S4) would render training impossible as it results in to few clusters per study site. The choice of clustering distance is a trade-off between spatial independence and having training sets in all study areas. Given that we were unable to compensate for the spatial autocorrelation, we settled for a clustering distance of 300 m to achieve a practical number of clusters while maintaining a minimum of spatial partitioning.
Data points with high lichen volumes were heavily underrepresented. This can be problematic since each iteration of the training works with a subset of the complete training data, and a skewed representation of records is likely to result in few or no records of underrepresented values in many training iterations. In our case, this would result in biased models that were not sufficiently trained to predict high lichen volumes. A way to limit this effect is to repeat less represented values in the training set and artificially achieve a more even distribution. We binned the values into 19 volume bins of equal intervals and repeated values in the training set belonging to the 17 highest categories (repeating values from 25 to 140 dm 3 /m 2 ) up to a maximum of 600 times (Fig. S1). Those values were repeated as is, without any additional noise.

Training of neural network models
After exploratory testing, we chose a model architecture (Fig. 3) that was used in all models. Landsat pixels (30 m) and elevation pixels (10 m) were given separate inputs due to the difference in spatial resolution. In addition to input values of a single Landsat pixel (1 × 1 Landsat +3 × 3 elevation), we also included an input of 3 × 3 Landsat pixels (9 × 9 elevation pixels, Fig. 3). The models were trained using a batch-size of 128, Adam optimizer, learning rate of 0.0005 and patience of 20. The final model discussed in detail below was trained for 33 epochs, reaching a plateau after 7 epochs. Epoch 13 was ranked best according to the training platform and thus used in the subsequent analyses.

Model evaluation
To evaluate model fit, we fitted a linear model with the empirical lichen volume of the test set as explanatory variable and the AI-model output as the response variable to test the fit of each candidate model. The candidate models were compared based on intercept, coefficient of determination (R 2 ), p-values and Root mean square error (RMSE). The R 2 value we present is based on the comparison between the predicted lichen volume and the empirical field measurement. For the sake of comparison, however, we also present the R 2 value derived from the linear regression, as this method might have been used in previous studies. To compare our model with the LVE (Falldorf et al., 2014) we applied the algorithm using Google Earth Engine on the same Landsat composites that we used for the AI analysis. We also calculated Bias and Mean Absolute Error (MAE) using the Metrics package (Hamner and Frasco, 2018) and R 2 using the MLmetrics package (Yan, 2016). Hengl et al. (2018) suggest that random forest can be used as a generic framework for spatial interpolation and to test to what degree the response variable is spatially autocorrelated. Following their method, we constructed a random forest (150 trees) with measured lichen volume as the dependent variable and buffer distance from other plots as independent variable (Hengl et al., 2018).

Importance of variables
One downside of deep neural networks is that the training process essentially is a black box offering little insight compared to more traditional methods, and the model gives no straight forward coefficients for the different predictors. Therefore, we did a permutation test to evaluate the explanatory contribution of each predictor variable. The idea is to compare how the model efficiency is affected when one of the input variables is permuted (randomly shuffled among the data points). A small decrease in model efficiency indicates that the permuted predictor contributes little to the predictive power of the model, and vice versa. We made 11 copies of the original data set, one for each predictor (7 bands, 3 indices and elevation), and permuted the input rasters for a single predictor in each copy. The resulting 11 runs (one for each predictor variable) were then compared to the output of the original test set, using the Nash-Sutcliffe model efficiency coefficient (Nash and Sutcliffe, 1970; Zambrano-Bigiarini, 2020).

Applicability
The Finnmark region in northern Norway has experienced a  considerable reduction in lichen abundance since the 1980s (Tømmervik et al., 2009). To test if the model could accurately detect changes in lichen abundance over time, we created a grid of points (240 m between point) covering the Kautokeino reindeer winter grazing area (5400 km 2 ) and applied the model based on 2 Landsat mosaics from 1984 to 1985 and 2020-2021 respectively. We interpolated the values between the gridded points and masked water bodies based on NIR pixel values (threshold = 90) in the original Landsat mosaics.
To evaluate if the model produced reasonable results in areas that were not used for training, we also applied the model to two sites in Canada and Russia, with a grid resolution of 120 m, and interpolating the values between. (For the Canadian site we used the Canadian Digital Elevation Model from Natural Resources Canada). We also applied the model in native Landsat resolution to two smaller areas in Norway to illustrate how the model responded to bright rocks and sand.

Results
When applied to the test set (n = 159), all models described a positive relationship between predicted values and empirical field measurements. The models were capable of detecting volumes as low as 1 dm 3 / m 2 and as high as 100 dm 3 /m 2 in heaths, tundra and open pine and birch forests. Applying the Lichen Volume Estimate algorithm (Falldorf et al., 2014) to our test set rendered an R 2 of 0.17 (RMSE = 12.31 dm 3 /m 2 , n = 158, df = 1, p < 0.01). In the evaluation of spatial autocorrelation, the random forest model explaining measured lichen volume based on distances between plots (to evaluate the influence of spatial autocorrelation) reached a R 2 = 0.33 (RMSE = 15.1 dm 3 /m 2 ).

Candidate model training
The use of different training sets (Table 1) affected the goodness of fit and variation explained by the candidate AI models with R 2 ranging from 0.53 to 0.57 and RMSE from 9.95 to 11.24 dm 3 /m 2 (Table 3, Fig. 4). The model based on all available training data (Core + Mixedsources) performed similarly as compared to the model trained with the core subset alone (Table 3). However, the model based on the Core and Mixed-sources had an intercept value of 4.05 compared to 5.64 in the Core set, (p intercept > 0.16). Including the "remote" dataset (derived from aerial photos) in the training resulted in lower RMSE, but the highest intercept value (9.95). We considered the model trained on the Core + Mixed-sources as the best, based on the lowest intercept, highest regression score and lowest mean absolute error (See Table 4). This is the model subject for all following analyses, mapped estimations and the discussion.

Permutation test
The full model had a Model Efficiency (ME) of 0.55. We ran 70 complete permutation iterations and calculated the mean ME. There was a sequential decrease in mean explanatory value (absolute change in ME) from the most important predictor variable, NDVI (ΔME = 0.13), to the least important, TIR (ΔME = 0), showing no drastic shifts across the range (Fig. 5). Elevation and the Green band was the second and third most important variables, whereas NDMI and the Red band was the 2nd and 3rd least important. For some predictors, the effect of the permutations varied considerably between iterations as indicated by the comparably large standard deviation (SD Green : 0.07, SD Elevation : 0.05, SD NIR :0.05). The TIR, however, was constantly uninformative (SD = 0.01).

Applying the best candidate model to circumpolar sites
The model was applied to test sites in Quebec, Canada and the eastern Kola Peninsula, Russia (Fig. 6). Although we did not have ground-truth data from the sites outside of the Nordic countries and the western Kola Peninsula to evaluate the accuracy of the model, the predictions of the model corresponded well with our visual interpretations of lichen abundance using images on Google Earth, and literature confirm that these regions are rich in lichens. Visual comparisons against high-resolution aerial photos in Norway confirmed that the model performed well, detecting high lichen abundance in sparse Scots pine (Pinus sylverstris) tree forest stands ( Supplementary Fig. S2).
However, when applied to sandy areas or human infrastructure, the model was unable to distinguish some bright surfaces from lichens and produced incorrect high estimations of lichens ( Supplementary Fig. S3a, Table 3 Table over the years field data was collected and what year the corresponding satellite image used for remote sensing was acquired. Satellite images originating from a different year than the field data was used when satellite coverage for the correct year was too patchy. Those instances are marked with * and the year of the used image. an industrial site and sandy river sides in North Norway).

Time series analysis
The model detected that lichen abundance halved between 1984 and 2020, both in mean lichen volume per m 2 (1984: 23.1 ± 14.45 dm 3 /m 2 , 2020: 11.9 ± 7.14 dm 3 /m 2 ) and in total volume for the whole Kautokeino area (1984: 116 million cubic meters, 2020: 60 million cubic meters, Fig. 7). The decrease was particularly visible near the reindeer fence along the Finnish-Norwegian border where the striking contrast between the thicker lichen layers on the Norwegian side and the scarcer lichen vegetation on the Finnish side decreased considerably during the period (Fig. 8).

Discussion
We assessed pale lichen volume using neural networks trained on Landsat satellite images and elevation data. By the use of artificial intelligence, we have here developed models that rendered considerably better fit than the method developed by Falldorf et al. (2014) did against field-based measurements.
In general, all candidate models managed to predict lichen volume up to 100 dm 3 /m 2 with high statistical significance (p < 0.001) and R 2 values over 0.53. The predictive power of the models was high, given the results of Räsänen and Virtanen (2019) and Virtanen and Ek (2014) who concluded that at least 5 m spatial resolution is required for accurate vegetation mapping in treeless northern ecosystems. However, all of the models tended to overestimate lichen volume at low values, as indicated by the intercept values. Model outputs for areas with low lichen abundance are therefore likely to result in some overestimations. The model did also underestimate highest volumes (indicated by a slope lower than 1), and the general accuracy for higher lichen volumes was lower. This is probably largely an effect of the training dataset having comparably few data points with high lichen volumes and that the repetition of high values could not entirely compensate for this. However, it could also be related to a decreasing difference between plots with varying thickness, but consistently high cover over a certain level. Although our experience from fieldwork is that the thickness of lichen cover has a major impact on the reflected light, satellites, in a strict sense, are only able to register surface properties. If the difference between plots with varying lichen thickness is becoming less pronounced with increasing cover, there is likely a saturation threshold where remote sensing methods become less Fig. 4. The performance of the three candidate models predicting empirical lichen volume in plots that were new to the deep neural network. The three models were trained with datasets of increasing size: a) solely the core data set, b) The core set as well as a data set from mixed sources, c) The core, mixed and remotely assessed data set.  accurate or unable to efficiently quantify differences in high lichen biomass. This kind of problem is not new to remote sensing, a parallel example is the issues with accurate remote sensing of biomass in forests (Lu et al., 2016). More high cover datapoints would be needed to assess this effect and improve the model. The inclusion of additional training data sets did not considerably affect the model performance (Table 3), but the intercept value improved (i.e. got closer to zero) by including the mixed sources dataset. The opposite happened when including the remotely assessed lichen volumes, which increased the intercept value. Altogether, this suggests that these additional datasets did not provide any information of fundamental value to the training process. While the mixed-sources dataset potentially resulted in more noise due to the variable quality of input data, the remotely assessed data points represented clear categories that likely did not introduce any new information to the training. This method, as we implemented it, did not provide an alternative to the time-consuming collection of empirical data in the field. Given the nature of neural networks, the current model can continuously be trained with new field data.
There was spatial autocorrelation in the field measured lichen data, and the test set was not spatially independent from the training set.
Earlier, Ploton et al. (2020) argued that spatial dependency between training and test set might lead to overestimates of model fit. However, the example of Ploton et al. (2020) represented an extreme case, since their model predicting aboveground forest biomass in Africa, in addition to 9 satellite bands, included 27 explanatory factors with strong inherent spatial autocorrelation (e.g. precipitation, temperature, absolute elevation, solar radiation). On the contrary, our model did not include any climatic or similar explanatory variables with strong patterns across pixels, which likely makes our modelling less sensitive to bias from spatial autocorrelation. This is further supported by the fact that the random forest model that used distances from other plots as the explanatory variables to explain lichen biomass (following Hengl et al., 2018) had notably lower explanatory capacity (R 2 = 0.33) than our neural network models (R 2 0.53-0.57). Wadoux et al. (2021) showed that a random K-fold cross-validation is appropriate for systematic random and simple random sample designs. However, our randomly selected test set is likely not able to compensate for the spatial distribution and autocorrelation of our data, and the test statistics can be suspected to be somewhat inflated. A tendency that is likely amplified by the fact that datapoints from Hardangervidda form a dominating part of the dataset. This is a consequence of the data not being collected with the current application in mind, but for different local studies and monitoring programs. Although desirable, a simple random or systematic random field sampling would demand a large investment in terms of money and time which was beyond our financial capacity.

Predictive power of the explanatory factors
The permutation test showed that NDVI was the predictor with the strongest explanatory capacity, closely followed by regional elevation (Fig. 5). The importance of NDVI differed strongly from the results of Kennedy et al. (2020), where NDVI had a very low explanatory value, while elevation turned out as the by far most important predictor. However, there are some differences in the elevation input that can be critical when comparing the role of topography in the two studies. Kennedy et al. (2020) used absolute altitude while our model applies locally normalised elevation. While topography is likely to be important, lichens are not limited by elevation per se (Crittenden, 2000). However, if altitude and lichen abundance differed considerably between study sites, a neural network could infer an elevation signal that represents regional differences (for example in Rangifer grazing regime) rather than an actual effect of topography. This can result in an artificially boosted predictive power in the training regions at the cost of general applicability. Locally normalised elevation values, on the other hand, remove altitudinal differences between study sites, while keeping the effect of local topography structure. In this regard, our elevation input was likely more comparable to the aspect and slope predictors used by Kennedy et al. (2020) which were the 5th and 8th most important predictors in their model. An important difference, however, is that Kennedy et al. (2020) used elevation data with the same spatial resolution as the satellite imagery, while we used a higher spatial resolution (10 m) providing 9 elevation pixels per reflectance pixel (30 m). This way we did not need to calculate separate values for aspect or slope as the neural network could derive such information directly from the data. Our single elevation input, hence, comprises the elevation, aspect and slope inputs of the model of Kennedy et al. (2020). Yet, despite a higher explanatory potential (more pixels provided), elevation had a lower explanatory capacity in our model compared to Kennedy et al. (2020). This, combined with the relatively lower predictive power of aspect and slope in their study, could be an indication that the model of Kennedy et al. (2020) may rely too much on regional differences in absolute altitude.
The predictive power of the green band (3rd most important) varied the most between iterations. The high variation suggests that certain combinations of the green band and other explanatory variables have a disproportional influence on the predictions. This would then also be the case, to a decreasing extent, for Elevation, NIR and NDLI.
The bands red, green, NIR and SWIR1 are all used in the calculation of the normalised indices. This could cloud the interpretation of their importance, since the model could, to some extent, compensate for the permutation of one of them by using the information in the corresponding index, and vice versa. The three indices diverge in this aspect. The most important variable, NDVI, is based on NIR and the red band, none of which turned out particularly informative. NDMI on the other hand, was less informative than its components NIR and SWIR1, while NDLI, and its components green and the SWIR1, ended up in the middle. Since the permutation test did not include correlation between variables, it is likely that the importance of less correlated predictors (e.g. elevation) are somewhat exaggerated.
The thermal band, TIR, could be expected to hold little to no information, and the results confirm this. The band could hence be excluded in further development of the model, but fills the function of a reference variable for the others to be compared against.
Temperature and precipitation were among the 4 most important predictors in the model of Kennedy et al. (2020). We deliberately excluded these variables to keep our model a strict remote sensing product, and we believe that the general applicability of our model is higher because of this choice. In addition to the increased simplicity, the fact that our model relied more on reflectance features is likely an advantage when applied to areas with homogeneous topography, for example lichen tundra or lichen-dominated shrub tundra at lower altitudes.

Applicability
Our results showed that our model was efficient in assessing pale lichen volume in ecosystems with moderate tree cover density, thereby offering a considerable improvement compared to earlier methods based on specific reflectance, for example the LVE equation by Falldorf et al. (2014). Furthermore, we observed that LVE, which is supposed to be applicable to open lichen heaths and lichen tundra, worked poorly on our evaluation set. Further, LVE cannot measure volumes higher than 60 dm 3 /m 2 , while our method has a theoretical limit of 130 dm 3 /m 2 (maximum value in the training set), although no plots over 90 dm 3 appeared in our random evaluation set. Our method detected even small volumes of lichens in lichen tundra, dry heaths, and even in sparsely wooded forests. However, areas with bright sand, such as river sides, was falsely assessed as lichen rich This means that model, although producing realistic results in our test applications, should be used with caution in areas with bare sand or artificially bright features and such larger areas should be masked out using a landcover map or visual classification. In future iterations of the model, specific training to improve the ability to distinguish sand and buildings from lichens should be a priority. Although the model correctly assessed low lichen abundance in a mosaic of dwarf shrub-lichen heath and bare bright bed rock with a scarce cover of fruticose lichens ( Supplementary Fig. S3b), we cannot rule out that bright bedrocks could result in overestimated lichen volumes as well. Furthermore it remains unexplored how the model would perform in estimation of lichen volume in forests with a denser canopy cover. From a reindeer herder perspective, dense forests are not the most important source for winter forage, as lichen volumes tend to decline with increasing tree crown density (Forbes et al., 2019). Thus, in economic and socio-cultural perspective, it is not so critical that the model has not yet been tested on datasets from dense forests.
Given that the training data was sampled in areas where lichens do occur to some extent (Table S2), we have not formally tested the applicability of the model in other vegetation types. Our results, when applying the model to larger areas, indicates that the model is fairly robust as it did not predict high lichen volume in areas where lichens clearly were absent. Although assessment of potentially lichen rich areas are of primary interest, the applicability of the model in other vegetation types is also important. It is possible that the model returns less reliable assessments in vegetation types that we have not investigated, and model output should therefore be manually investigated, especially if applied to vegetation types that were not covered by our study.
The results from the comparison between 1984 and 2020 in the Kautokeino winter grazing area in North Norway fitted well with the results reported by Tømmervik et al. (2009). Our results confirm the possible use of the Landsat catalogue of historical images (back to the launch of Landsat 5). The clear difference between years is an indication of the importance of the spectral bands for the model output, since the same topography input was used in 1984 and 2020 and hence remained constant across the years. The detected change over time confirms that although topography was the second most important explanatory variable, the model is not overinterpreting topography, but using topography in combination with the spectral information. The ability of the model to accurately assess historical changes, also confirms that the model is generally applicable using older images than the collections that were used for training.
The historical comparison also puts the problem with overestimation of lichen abundance in lichen free areas in perspective. Although the model had a tendency to assign moderately high lichen abundance in some lichen free areas, it was still sensitive to the drastic decrease in Kautokeino winter grazing area where large areas became virtually lichen free. This shows that the model, despite some inaccuracies in lichen free areas, is able to detect relative changes. However, when used for absolute assessments in low-lichen areas, it should be combined with control measurements, preferably in the field.
The consistency between the model output and our visual interpretation, when we applied the model on landscape-scale across different areas in North America and Russia, suggests that the model performs consistently across regions and continents. Although we lack groundtruth data to verify that the accuracy is consistent outside our field sites, we find those results promising, opening up for circumpolar investigation and monitoring of lichen volume. Together with the models ability to detect lichen in sparsely forested areas (Fig. S2), potentially also in areas with mosaics of shrubs and trees. This is a major improvement compared to the LVE, which was originally only intended for use in exposed lichen heaths and lichen tundra (Falldorf et al., 2014) and therefore having a more limited regional applicability.
Our results confirm that neural networks are useful in complex remote sensing tasks. But contrary to the results of Kennedy et al. (2020) and Jozdani et al. (2021) our best model achieved higher accuracy with fewer explanatory variables; hence delimiting the input data requirements to elevation and the bands that Landsat 5, 7 and 8 have in common, i.e. datasets that are easily retrievable and do not rely on any environmental assumptions or weather data. In addition, our method appears to be robust for direct volume estimations of lichens over vast areas. The cropping of satellite images is however a time-consuming process for larger areas, and hence we applied the model on a lower spatial resolution (120-140 m) than the native 30 m Landsat resolution, an limited more detailed assessments to the specific areas along the Finnish-Norwegian border and the examples in the supplementary data. Preliminary analyses of Tveraa et al. (unpublished data) showed that change from 120 and 240 resolution had a limited effect on total lichen volume assessment, but high computational load could still be a limitation for mapping over larger areas if higher resolution is needed. More efficient raster cropping, or implementation of the model directly in Google Earth Engine, have the potential to improve high spatial resolution processing in large scale studies.
To summarize, volume is arguably a more ecologically and socially relevant metric of lichen abundance than cover, as several field-based studies have confirmed a close relationship between volume and biomass of terricolous fruticose lichens. The ability to remotely assess lichen volume, combined with the long historical catalogue of Landsat images, enables spatial and temporal studies of variation and changes in lichen biomass related to multiple research questions. The approach we have developed can be used to predict the quality of lichen pastures for Rangifer and, ultimately, facilitate and improve Rangifer habitat and population management (Rickbeil et al., 2017;Tømmervik et al., 2012). Combined with information on changes in drivers, for example climate, land use and management, and air pollution, the model can be used to provide estimates of environmental changes, as well as ecosystem services (Riseth et al., 2016;Stoy et al., 2012). Finally, as current vegetation-climate models typically disregard lichen abundance (Ellis, 2019), our lichen volume model may help to improve vegetationclimate models by facilitating continental-scale assessments of groundlichen volumes.

Funding
This study was supported by the Research Council of Norway (287402,294948)

Declaration of Competing Interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: co-author is a close relative of a co-founder of Peltarion AB, R.E.

Data availability
Data will be made available on request.