PCA-based supervised identification of biological soil crusts in multispectral images

Graphical abstract

The multispectral approach in remote sensing typically includes the estimation of spectral indices. For biocrusts, the normalized difference vegetation index (NDVI [1,2]) has been proposed; however, high NDVI values of wet biocrusts may be misinterpreted as vascular plant vegetation dynamics, whereas dry biocrusts attain only negligible NDVI values [3]. To overcome this disadvantage, several more specific spectral indices have been proposed for biocrusts, such as the crust index (CI [4]), the brightness index (BI [5]) and the biological soil crust index (BSCI [6]), where biocrusts are characterized by typical ranges of the respective index values. All of these spectral indices reflect crust activity or biomass well but are not specific enough to differentiate between crust types, such as lichen, cyanobacterial, algal or moss biocrusts. Rodriguez-Caballero et al. [7] proposed support vector machines (SVM) for supervised classification of biocrusts from hyperspectral remote sensing data. However, a common problem with linear spectral mixture analysis (SMA) remains when the number of spectral endmembers is greater than the number actually required to unmix an individual pixel in the scene [3]. Non-spectral unsupervised principal component analysis (PCA) classification of biocrusts has indicated that the development of the microbial community was affected at multiple scales, including biocrust successional stage, seasonal effects and micro-geomorphology [8]. High-resolution VIS-NIR spectroscopy was employed to study the influence of wetting on cyanolichen-dominated biocrusts in a non-imaging approach [7]. To date, no attempt has been made in the literature to use multispectral PCA classification of biocrusts in high-resolution images.
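As a minimal illustration of the index-based approach discussed above, the NDVI can be sketched from its standard definition, NDVI = (NIR − Red) / (NIR + Red); the function name and the example reflectance values below are hypothetical and not taken from the study:

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    NDVI = (NIR - Red) / (NIR + Red), ranging from -1 to 1.
    Reflectances are assumed to be given as fractions (0..1).
    """
    denom = nir + red
    return (nir - red) / denom if denom else 0.0

# Illustrative values only: wet biocrusts reflect more strongly
# in the NIR and therefore gain higher NDVI values.
print(ndvi(0.45, 0.15))  # wet crust: around 0.5
print(ndvi(0.22, 0.20))  # dry crust: close to zero
```

This also makes the limitation noted above concrete: a single scalar per pixel cannot by itself separate a wet biocrust from vascular vegetation with a similar NDVI.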
The proposed method builds on Rodarmel and Shan [9], who suggested principal component analysis for preprocessing of hyperspectral images. Based on their finding that only the first few principal component image bands contain significant information, this study aimed at elucidating the feasibility of biocrust classification by manual selection of spectral features in PCA ordination plots.
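The idea of projecting pixel spectra onto the first few principal components can be sketched as follows; this is a generic PCA via eigendecomposition of the band covariance matrix, not the study's own R implementation, and the function name and synthetic data are assumptions for illustration:

```python
import numpy as np

def pca_scores(pixels, n_components=2):
    """Project pixel spectra onto their first principal components.

    pixels: (n_pixels, n_bands) array, e.g. one row per pixel with
    R, G, B and NIR values.  Returns (n_pixels, n_components) scores,
    i.e. the coordinates used in a PC1 vs. PC2 ordination plot.
    """
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return centered @ eigvecs[:, order[:n_components]]

# Synthetic 4-band pixels whose variance lies mostly along one
# spectral direction, so PC1 captures most of the information.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(500, 1)) @ np.array([[1.0, 0.8, 0.6, 2.0]]) \
         + 0.05 * rng.normal(size=(500, 4))
scores = pca_scores(pixels)
print(scores.shape)  # (500, 2)
```

Consistent with Rodarmel and Shan's observation, most of the spectral variance ends up in PC1, which is why a two-dimensional PC1/PC2 plot suffices for manual feature selection.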
The study site was a catena from the mobile part of an inland dune to dry acidic grassland dominated by Corynephorus canescens and located near Lieberose, Brandenburg, northeast Germany (51°55′49″N, 14°22′22″E). A detailed description of the sampling site is given by [10]. The samples represented the sandy substrate, an algal biocrust dominated by Zygogonium ericetorum, a moss crust dominated by Polytrichum piliferum, and a mixed biocrust composed of both Z. ericetorum and P. piliferum (Fig. 1).

Method workflow
Select spectral features by outlining a polygon with the mouse pointer. The function fhs (freehand select) of the gatepoints package allows selection of data points by outlining polygons in two-dimensional scatterplots. In the present method, each data point represents spectrally identical pixels in a PC1 vs. PC2 density plot; data points inside the polygon are denoted as T (TRUE) and all other data points as F (FALSE). Close the selection with a click on the right mouse button and plot the selection result using the previously defined plot_PCA function (see step 1 of this section), which generates binarized plots with selected features represented in white and all others in black. Fig. 3 shows the selection of the data points representing the bare sandy soil, algae and mosses.
> selectedPoints <- fhs(kk2, names = FALSE, pch = NA_integer_)
> plot_PCA(selectedPoints)
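Conceptually, closing the freehand polygon reduces to a point-in-polygon test over the ordination coordinates. The following sketch of that classification step uses a standard ray-casting test; it is not the gatepoints implementation, and the function name and coordinates are illustrative assumptions:

```python
def points_in_polygon(points, polygon):
    """Ray-casting test: True for points inside the outlined polygon.

    points:  list of (x, y) ordination coordinates (e.g. PC1, PC2 scores)
    polygon: list of (x, y) vertices of the freehand outline

    Returns a TRUE/FALSE flag per point, analogous to the T/F
    labelling described in the text.
    """
    selected = []
    n = len(polygon)
    for x, y in points:
        inside = False
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            # Count edge crossings of a horizontal ray from (x, y)
            if (y1 > y) != (y2 > y):
                if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                    inside = not inside
        selected.append(inside)
    return selected

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(points_in_polygon([(1, 1), (3, 1)], square))  # [True, False]
```

Pixels whose PC1/PC2 coordinates fall inside the polygon are then rendered white in the binarized plot, all others black.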

Hint 1
Step 6 may be repeated multiple times, allowing the simultaneous presentation of multiple classes, for example in combined color images of biocrust types (use, for example, selectedPoints1, selectedPoints2, etc. as fhs output objects for multiple polygons; Fig. 4).

Hint 2
Situations may arise when it is necessary to identify the area where objects with given spectral features are represented in the ordination plot. The predict function of R may be deployed in this case, with steps 2 and 3 of the method workflow being performed for an image containing the desired objects only. The binarized plots of biocrust types may be used for further image analysis, such as calculation of coverages (Fig. 7) or geostatistical analyses.
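The coverage calculation mentioned above amounts to counting the white (selected) pixels in a binarized image. A minimal sketch, assuming 255 encodes selected features as in the binarized plots, with a hypothetical function name:

```python
def coverage(binary_image, selected_value=255):
    """Percent cover of one biocrust class in a binarized image.

    binary_image: nested lists (rows of pixel values), where selected
    features are white (255) and everything else is black (0).
    """
    flat = [px for row in binary_image for px in row]
    return 100.0 * flat.count(selected_value) / len(flat)

# Toy 2x4 binary image: 3 of 8 pixels belong to the class
img = [[255, 0, 0, 0],
       [255, 255, 0, 0]]
print(coverage(img))  # 37.5
```

The same per-class masks can feed geostatistical analyses, since each mask preserves the spatial arrangement of the class within the image.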

Example
Provided that the normalized difference vegetation index (NDVI) correlates with net primary production [12], the proposed approach may be used to estimate the relative contribution of biocrust types to photosynthesis. Fig. 6 depicts an example spot covered with algae, mosses and some vascular vegetation (grass), as well as the NDVI image of that spot.
The NDVI image was subsequently masked with binary images of mosses, algae and grass, where pixel values of 255 were set to full transparency (Fig. 7). The pixel value histogram of the resulting combined images then represents the distribution of net primary production for each biocrust type (Fig. 8).
Fig. 7. Combined biocrust type image (A) with mosses (green), algae (red) and grass (blue); NDVI images of mosses (B), algae (C) and grass (D). The PCA ordination plot, data points of training spots (see Fig. 6) and the polygon for manual feature selection for each biocrust type are shown in the lower right corners. Coverages were estimated according to Hint 3. Trampled areas and the color checker were disregarded for the analysis.
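The per-class masking step can be sketched as follows: NDVI values are kept only where the class mask is set, and the retained values feed the per-class histogram. This is a generic NumPy sketch, not the study's own workflow; the function name and arrays are assumptions:

```python
import numpy as np

def masked_ndvi_values(ndvi_image, class_mask):
    """Collect the NDVI values of one biocrust class only.

    ndvi_image: (H, W) float array of NDVI values
    class_mask: (H, W) binary array with 255 where the class is
    present (other pixels are excluded, i.e. "transparent")
    """
    return ndvi_image[class_mask == 255]

# Toy 2x2 example: two pixels belong to the moss class
ndvi = np.array([[0.1, 0.6],
                 [0.4, 0.2]])
moss = np.array([[255, 0],
                 [255, 0]])
vals = masked_ndvi_values(ndvi, moss)
print(vals.mean())  # mean NDVI of the moss-covered pixels
```

A histogram of `vals` per class (e.g. via `numpy.histogram`) then corresponds to the per-class NDVI distributions shown in Fig. 8.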

Image registration workflow
VIS and NIR photographs were taken using an Olympus Camedia Z5000 consumer camera. Reflectance calibration of the images was performed using a ColorChecker classic (X-Rite) with known RGB and NIR reflectance values of all 24 chart cells [12]. Deviating from [12], the following MATLAB v2018b procedure (The MathWorks, Inc.) was used for image registration, where imageNIR and imageRGB denote the MATLAB image objects imported into the workspace from the NIR and RGB image files, respectively.
1. Select pairs of control points; when finished, copy the control points to the workspace as movingPoints for the NIR image and fixedPoints for the RGB image. This means that the NIR image will be aligned to fully match the RGB image. A minimum of 8 control point pairs yielded best results.
> cpselect(imageNIR, imageRGB)
2. Infer the spatial transformation from the control point pairs. t_concord contains the transformation matrix for the NIR image object and will be used later in step 4.
> t_concord = cp2tform(movingPoints, fixedPoints, 'projective');
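The projective transform that cp2tform estimates can be sketched with the standard direct linear transform (DLT); the following NumPy version is an illustrative analogue of that step, not the MATLAB internals, and all names and point pairs are assumptions:

```python
import numpy as np

def fit_projective(moving, fixed):
    """Least-squares projective (homography) fit from control points,
    analogous in spirit to cp2tform(movingPoints, fixedPoints, 'projective').

    moving, fixed: (n, 2) arrays of matched control points, n >= 4.
    Returns a 3x3 matrix H mapping moving -> fixed (homogeneous coords).
    """
    rows = []
    for (x, y), (u, v) in zip(moving, fixed):
        # Each correspondence contributes two linear constraints on H
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    A = np.array(rows, dtype=float)
    _, _, vt = np.linalg.svd(A)      # null space of A gives H up to scale
    return vt[-1].reshape(3, 3)

def apply_projective(H, pts):
    """Map (n, 2) points through H and dehomogenize."""
    ph = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Sanity check: recover a pure translation from 4 control point pairs
moving = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
fixed = moving + [2.0, 3.0]
H = fit_projective(moving, fixed)
print(np.allclose(apply_projective(H, moving), fixed))  # True
```

With more than 4 point pairs the SVD yields a least-squares solution, which is why additional control points (8 or more, as noted above) stabilize the registration.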