Classification of tree species based on hyperspectral reflectance images of stem bark

ABSTRACT Automatization of tree species identification in the field is crucial in improving forest-based bioeconomy, supporting forest management, and facilitating in situ data collection for remote sensing applications. However, tree species recognition has never been addressed with hyperspectral reflectance images of stem bark before. We investigated how stem bark texture differs between tree species using a hyperspectral camera set-up and gray level co-occurrence matrices and assessed the potential of using reflectance spectra and texture features of stem bark to identify tree species. The analyses were based on 200 hyperspectral reflectance data cubes (415–925 nm) representing ten tree species. There were subtle interspecific differences in bark texture. Using average spectral features in linear discriminant analysis classifier resulted in classification accuracy of 92–96.5%. Using spectral and texture features together resulted in accuracy of 93–97.5%. With a convolutional neural network, we obtained an accuracy of 94%. Our study showed that the spectral features of stem bark were robust for classifying tree species, but importantly, bark texture is beneficial when combined with spectral data. Our results suggest that in situ imaging spectroscopy is a promising sensor technology for developing accurate tree species identification applications to support remote sensing.


Introduction
Globally, forests are adapting to altering environmental conditions that have been accelerated by climate change and past forest management decisions.The ensuing biodiversity losses, decline in forest resilience caused by pest outbreaks, and extreme weather conditions have placed forest management policies under critical discussion.Consequently, for example in Europe, forests have been adopted into the European Union Green Deal (European Commission, 2019), which has an ambitious forest strategy (European Commission, 2021) to preserve and protect forests for future sustainability.The prevailing argument is that improving our knowledge on forests through research, innovative data collection, and versatile forest monitoring can improve and promote forest-based bioeconomy and forest management within sustainable limits.One way of assessing biodiversity and supporting forest management is to continue developing accurate tree species identification methods.With novel approaches, the extent and patterns of species diversity could be mapped and understood better.A potential data source for this could be in situ optical spectroscopy.Optical spectroscopy is particularly attractive due to its possible synergies with air-and spaceborne remote sensing platforms.Automatic tree species identification from in situ imaging spectroscopy could potentially reduce expenses of collecting training and validation data for remote sensing, and support decision-making in forest robotics (automatization of harvesters), while the same data source could also be used to provide spectral data on woody elements, for example, as input for forest reflectance models (Hovi et al., 2022;Kuusinen et al., 2021;Malenovský et al., 2008;Verrelst et al., 2010).In addition, visible to near-infrared reflectance images of stem bark could possibly be used to assess an individual tree's health (Finley et al., 2016) and understand better the drivers for wood boring pest infestations and their host selection (Campbell & Borden, 2005).
Although the stems of trees are often the most easily accessible canopy elements, field measured reflectance spectra of bark have been rarely utilized to identify tree species.Understanding the full power of bark spectra for tree species recognition is further hindered by the small number of geographical locations and species sampled by the previous studies.Previously, Hadlich et al. (2018) and Juola et al. (2020) have demonstrated using two different measurement set-ups that bark might hold significant spectral features to identify tree species.Hadlich et al. (2018) investigated if the reflectance spectra of inner and outer bark could be used to recognize 11 different Amazonian tree species.They utilized a portable point spectrometer and a contact probe to collect the stem bark reflectance spectra directly in the field and concluded that their technique can improve the quality of local species identification.Furthermore, Juola et al. (2020) demonstrated the use of laboratory measurements of multiangular reflectance spectra of outer bark in successfully classifying three common boreal tree species found in Europe.One reason for the scarcity of studies is that in situ spectral measurements are particularly tedious and the equipment are often expensive.The recent development of small portable imaging spectrometers (i.e.spectral cameras) provides an alternative to the conventional point spectrometers.In comparison to the standard point spectrometers, the benefit of spectral cameras is that they capture data in image format that can also offer supplementary information about bark, such as texture.Whilst supplementary, texture is something unique for all surface materials and might help in categorizing trees if the reflectance spectra are indistinguishable.Exploring the benefits of different data collection set-ups provides the necessary tools for developing more accurate tree species identification methods in the future.
Texture as a measure describes how pixel intensities are spatially arranged within the image or a region of interest.Common approaches to quantify texture have included the manual calculation of statistical features or automatic feature extraction with computer vision algorithms (Tuceryan & Jain, 1993).Historically, the most used approach to identify tree species from bark images has been to treat it as a texture recognition problem by utilizing the manually extracted statistical features (Sulc & Matas, 2013).Past studies have extracted features based on, for example, Local Binary Patterns (Boudra et al., 2015;Sulc & Matas, 2013), grey-level co-occurrence matrices (Fiel & Sablatnig, 2011;Song et al., 2004), Gabor filter banks (Zheru et al., 2003), SIFT descriptors (Fiel & Sablatnig, 2011), and rotationally invariant multispectral texture features (Remeš & Haindl, 2019).Recent studies have also begun to employ image classification methods with automatic feature extraction capabilities, such as convolutional neural networks (CNNs), and have reached equal or state-of-the-art accuracies when compared to the texture recognition approaches (Carpentier et al., 2018;Wu et al., 2021).Regardless of the specific method or approach, the task of identifying plants has been built around the recognition of RGB images (Šulc & Matas, 2017).Future work recommendations have been towards the adaptation of existing algorithms (e.g.convolutional neural network architectures), extraction of more efficient statistical features, collection of more and versatile data, or understanding what complementary information (e.g.tree structural measures or color) is required to surpass the current state-of-the-art classification accuracies (Carpentier et al., 2018).Hyperspectral images potentially provide complementary information compared to traditional RGB images that only resolve the red, green, and blue wavelengths.The additional spectral wavelengths might provide finer details about the texture differences among tree species compared to conventional methods.Finally, hyperspectral images that have been processed into reflectance quantities could be used to compare both the spectral and texture content to understand which features should be considered when identifying tree species based on stem bark.
The overall objective of this study is to introduce and analyze the use of hyperspectral reflectance images of stem bark in developing accurate in situ tree species identification methods for European forests.Our data set provides a unique perspective to an experimental problem that has not previously been explored with similar data or attributes.The specific aims are 1) to investigate how stem bark texture differs between tree species using a novel hyperspectral camera set-up and gray level co-occurrence matrices, and 2) to assess the potential of using visible to nearinfrared reflectance spectra and texture features of stem bark to identify tree species.

Study species and spectral data
We designed a hyperspectral camera set-up that can be operated in field conditions and used it to collect a data set of reflectance images of stem bark representing ten boreal and temperate tree species in Finland and Estonia (Figure 1).The studied tree species were Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst), silver birch (Betula pendula Roth), littleleaf linden (Tilia cordata Mill.), English oak (Quercus robur L.), black alder (Alnus glutinosa (L.) Gaertn.),European ash (Fraxinus excelsior L.), gray alder (Alnus incana (L.) Moench), Norway maple (Acer platanoides L.), and European aspen (Populus tremula L.).The common names will be used throughout this study to refer to the different species.The study sites and sampling design have been described by Juola et al. (2022).
All hyperspectral images were taken with a Specim IQ imaging spectrometer (serial number: 190-1100152) manufactured by Specim, Spectral Imaging Ltd. (Oulu, Finland) and covered the visible (VIS) to near-infrared (NIR) wavelengths region (400-1000 nm).The imaging spectrometer was attached to a tree stem measurement set-up during data acquisition (see Juola et al., 2022 for technical details).The measurement height was 1.3 m (i.e.breast height) and the measurement distance to the bark surface was approximately 20 cm.The processed data set consists of 10 species × 20 samples × 173 wavelengths.
The reflectance images were processed from corresponding hyperspectral images of a white reference standard and the target surface (i.e.stem bark).The reflectance quantity measured was hemisphericaldirectional reflectance factor (HDRF, Schaepman-Strub et al., 2006).The white reference used in this study was a 25.5 × 25.5 cm Spectralon® panel manufactured by Labsphere Inc. (North Sutton, U.S.A) with a nominal reflectance of 99% in VIS to NIR.The imaging spectrometer recorded 204 spectral bands in the 400-1000 nm wavelength region (full width at half maximum of 7 nm).The number of recorded pixels was 512 × 512 and the field-of-view of the Specim IQ was 31 × 31 degrees.Hence, the measured surface area on the stem and the pixel size were approximately 11.1 × 11.1 cm and 0.02 mm, respectively (Figure 1).Data between the wavelengths of 400-415 and 925-1000 nm were clipped due to signal instability, as reported originally by Behmann et al. (2018).Consequently, the size of one processed reflectance data cube was 512 × 512 × 173 (pixels/pixels/ bands).We use the term reflectance data cube to refer to the processed stack of reflectance images in all spectral wavelengths captured by the imaging spectrometer.Further technical details for the Specim IQ imaging spectrometer have been published by Behmann et al. (2018), and detailed explanations for the data collection and processing have been published by Juola et al. (2022).

The Gray Level Co-occurrence Matrix
We analyzed texture of stem bark with a set of Haralick texture features (Haralick et al., 1973), which are computed from a Gray Level Co-occurrence Matrix (GLCM).From a range of methods that tackle texture in image analysis, texture features based on GLCMs have become a standard approach in remote sensing and other fields (Bharati et al., 2004;Tuceryan & Jain, 1993).
The GLCM is a matrix that tallies the co-occurrences of neighboring gray tones found in the digital image.The GLCM is a square matrix, and it is commonly made symmetrical around the diagonal before normalization.Symmetry assures that each pixel has been considered as a reference and a neighbor.Normalization of the GLCM is done to represent the co-occurrences as probabilities, which are needed in the feature computations.The dimensions of the square matrix are determined by the number of gray levels, n, found in the image.In addition, the GLCM requires that the pixel values are quantized to a discrete range, e.g.7-or 8-bits.The distance between neighboring gray tones, d, can be defined according to user needs but the original features (as in Haralick et al., 1973) considered only the immediate pixel neighbors (i.e.d = 1).Finally, the GLCM can be constructed for the full spatial extent of the digital image (i.e.global view), or it can be constructed multiple times from smaller rectangular views, i.e. windows, that together cover the entire digital image.Different window sizes have been found to influence further analyses, but the choice is often made on an ad hoc basis (Ozkan & Demirel, 2021).

Computation of Haralick texture features
Haralick texture features can be calculated for each wavelength (i.e. for each reflectance image) separately, but due to the high computation time and linear correlation between neighboring wavelengths, it is advisable to process the high-dimensional data cubes into gray tone images in some way that summarizes the information content within that image.To produce gray level images from our reflectance data cubes, that still hold as much as possible of the original spectral information, we 1) applied Principal Component Analysis' dimensionality reduction separately for each individual reflectance data cube, 2) normalized the first principal component images to the range of 0-1 (with image-wise min-max scaling), and 3) quantized the normalized images to the range of 0-255 (i.e.8-bit quantization and total of 256 gray levels).Principal Component Analysis was implemented with a toolbox for Python called Scikit-learn (version 0.23.2, Pedregosa et al., 2011).
The 8-bit gray tone images were used to compute texture images with six different moving window pixel sizes (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 15 × 15).The gray tone images were padded (reflection of the edge pixels) appropriately to produce equal sized output after the moving window convolution (i.e.size of 512 × 512 pixels).For each moving window view, we created symmetric and normalized (256 × 256) GLCMs in four angular directions (0, π/4, π/2, and 3π/4) with d = 1.Hence, the total number of symmetric and normalized GLCMs per window view was four.The symmetric matrices in the four angular directions account for the immediate neighborhood of pixels to the reference.We also created symmetric and normalized GLCMs with the full extent of the gray tone image (i.e.no moving window and using all gray tone pixels to construct the global GLCMs in four angular directions).
Finally, we computed a set of six commonly used Haralick texture features from the four GLCMs for all viewed windows per sample and the global views: 1) contrast, 2) dissimilarity, 3) homogeneity, 4) angular second moment (ASM), 5) energy, and 6) correlation (Figure 2): (1) (2) Energy ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi P nÀ 1 i¼0 Where, n is the number of gray levels (n = 256), P i,j is the (i,j)th element in the GLCM, provide the standard deviations for the GLCM.To achieve directionally invariant texture features, each of the computed features for the four GLCMs per sample were averaged.The GLCMs and the Haralick texture features for the window views were computed using an image processing toolbox called Scikit-image that is written in Python (Scikit-image: image processing in Python, Van Der Walt et al., 2014).The computations were implemented as parallel computing in the Puhti supercomputer, provided by CSC -IT Center for Science, Finland.The python data module version used in Puhti for this study was 3.9-2.

Classification methods
We report the results for two highest overall accuracy scoring classifiers: linear discriminant analysis (LDA) and convolutional neural networks (CNN).We also explored other commonly used classifiers (parametric and non-parametric), namely, quadratic discriminant analysis, linear and non-linear support vector machines, and random decision forests.The accuracy of those classifiers was not as good as LDA and CNN in our data set.The support vector machines and random decision forests performed poorly because of the small ratio of training data to features, while the non-linear quadratic discriminant analysis could not generalize as efficiently as its linear counterpart.

Linear discriminant analysis
We used linear discriminant analysis (LDA) to assess the potential of using mean visible to near-infrared reflectance spectra and the texture features of stem bark to identify boreal and temperate tree species.To do this, the mean spectra were calculated from each reflectance data cube by averaging the HDRF quantities over the reflectance image (512 × 512 pixels) per wavelength, and texture features were calculated by averaging the quantities over the texture images (512 × 512 pixels) per texture feature (Figure 3).Consequently, the number of available texture and spectral features were 6 and 173, respectively (Figure 3).The spectral and texture features were tested to be normally distributed (Shapiro-Wilk test), with few exceptions for individual features and species (e.g.birch and maple).The exceptional features were observed to have generally slight skew and kurtosis present.To address the phenomena of curse of dimensionality (Bellman, 1957) and multicollinearity of reflectance spectra, the LDA analyses were performed with two feature selection approaches: 1) exhaustive feature search, and 2) stepwise forward feature selection (Figure 3).Exhaustive search ensures that an optimal combination of features (from the available candidates) is selected, but due to large computational burden it required a preselection of features, which could have limited the achievable classification accuracy.Stepwise selection can handle a larger number of features, but due to the greedy selection algorithm (features that perform best alone are selected first) does not necessarily guarantee an optimal combination of features.The preselection for the exhaustive search was done by calculating the correlations between neighboring wavelengths and then removing all wavelengths that had a correlation of 0.998 or higher with its neighboring wavelength.Hence, the number of spectral features left for the exhaustive feature search method was 19 (Figure 3).The six average texture features finally paired with the spectral features were selected based on classification scores of each of the seven different GLCM sizes (Figure 3).
For evaluating the accuracy of the results, we used a stratified random five-fold hold-out cross-validation scheme (Figure 3).The data set was split into training and test sets, which were composed of 80% and 20% of the samples, respectively.The ten different classes were balanced with 16 samples in the training sets and 4 samples in the test sets.The stratified data split was done in a random manner, so that each sample was once in a test set.The training and test set indices for the five folds from the first classification routine with exhaustive feature search and only spectral features were saved to perform the same cross-validation technique for all later classifications.Finally, the results were averaged over the five folds.For both exhaustive feature search and stepwise feature selection, the optimal subset of features was determined by the highest average overall accuracy produced from the five-fold cross-validation.

Convolutional neural network
In addition to the LDA methods, we carried out a similar classification task with a convolutional neural network (CNN) that consisted of 3D-convolutional layers (Figure 3).We will use the term 3D-CNN to describe our network architecture.Due to the high number of features and computation limits associated with them, the hyperspectral reflectance data cubes were first reduced from 173 wavelengths to 15 (Figure 3).This was accomplished by removing all VIS wavelengths and systematically selecting every 5 th wavelength from red-edge to NIR (i.e. total of 15 wavelengths from 708 to 920 nm).Stem bark spectra has been observed to have higher explanatory value in the red-edge and NIR in comparison to the VIS wavelengths (Juola et al., 2022) and performing a similar optimal feature selection as with LDA in Section 2.3.1 was not feasible.The feature selection and subsequent classification for the 3D-CNN model were done independently of the LDA methods.
The model consisted of eight three-dimensional convolutional layers (Conv1-Conv8 in Figure 4).The convolutional layers are followed by one fully connected linear layer.As by common standard, the input before the linear layer is flattened, i.e. transformed to a one-dimensional vector.In addition to the convolutional and linear layers, we implemented four max pooling layers (maximum value over a specified kernel size) to shrink the data to be able to learn high-level features from the data (Pool1-Pool4 in Figure 4).The number of input and output channels we used was either 256 or 512, depending on the layer in the network (Figure 4).The kernel sizes and the strides for the different layers in the network were selected to have the dimensions of (512 ×) 1 × 1 × 1 as input for the final linear layer (Figure 4).We used rectified linear unit functions (ReLU) as the activation for each of the eight convolutional layers (Figure 4).The activation functions are used to find complex non-linear patterns in the data.
To enhance our model's regularization, we used dropout before the final linear layer with a probability of 0.6 (Figure 4).Dropout is a technique to remove or zero random units from the network with some probability to prevent co-adaptation of neurons during training and to perform model averaging (Hinton et al., 2012).We used a learning rate of 0.0001, a batch size of 16, and we set 1200 epochs as the maximum number of iterations for training.Batch size is the number of samples fed through the network before the model parameters are updated, and one epoch refers to the number of times all training data have been used once to update the model parameters.Our model was trained with the Adam optimizer using recommended default parameters of β 1 = 0.9, β 2 = 0.999, and ε = 1e −8 .We did not use weight decay for the optimizer.We trained the model five times with the same training and test set indices that were determined for the LDA classifications (described in Section 2.3.1).We used the cross-entropy loss to determine the performance of our model.Cross-entropy loss is one of the most used criterions for multi-class classification tasks.Whilst we trained the five models for the full 1200 epochs, we saved each model's state when it achieved its highest overall accuracy and lowest loss in the test data set due to possible overfitting of the model.We used the model states with the highest test data accuracies from the five different models for further analysis (similarly to what was done with LDA in Section 2.3.1).
We further regularized our model's learning through several common data augmentation techniques.The objective of producing new samples from our preexisting training data was to again reduce overfitting and allow the model to classify unseen data more accurately.Artificially increasing our small data set using data augmentation methods can achieve the benefits of large data sets (Shorten & Khoshgoftaar, 2019).All augmentations were performed on-the-fly (i.e.images were augmented before feeding forward to the network) and only for the training set.The original reflectance images in our data set were of size 512 × 512 (height × width) pixels, so our chosen data augmentation pipeline for each training sample was to randomly 1) flip horizontally with a probability of 0.5, 2) rotate clockwise or counterclockwise by 0-45° with probability of 0.5, and 3) select a 224 × 224 square pixel area from the image produced after steps one and two.Randomly cropping to a smaller image size after rotation and flipping enabled us to artificially increase the number of training samples of our data set and train the network with smaller image size.To accommodate the network architecture and the size of the training samples (i.e.224 × 224 being the image size the network is trained with), the test set was center cropped to image size 224 × 224.Finally, the training set was shuffled during model training, while the test set was kept constant.
Our CNN model was implemented with PyTorch (version 1.10.0),an optimized tensor library for deep learning in Python (Paszke et al., 2019).We used an NVIDIA GeForce RTX 2070 GPU for the computations and model training.

Texture features and their performance in classification
Individual pixel quantities varied significantly within the Contrast, Dissimilarity, and Homogeneity images but the averaged values were alike for all seven GLCM sizes.On the other hand, the size of the GLCMs (i.e.window sizes) that were used to calculate the texture images had a notable effect on the averaged texture quantities of ASM, Energy, and Correlation.Consequently, the choice of window size influenced the classification accuracy if averaged ASM, Energy, and Correlation features were within the input features (Table 1).In general, the classification results improved with increasing window size, but there is some saturation to be expected as the global GLCM (i.e.window size covering the entire image) produced lower classification accuracies than the window sizes of 9 × 9, 11 × 11, and 15 × 15 (Table 1).When comparing the seven different GLCM sizes, the 15 × 15 moving window size performed the best with a mean overall accuracy of 58% after stratified 5-fold crossvalidation (Table 1).
The features were examined in detail for the window size of 15 × 15, which was concluded to be the best window size for separating tree species based on texture for our data set (Table 1).Overall, the averaged texture varied substantially within species but the differences between the 10 boreal and temperate tree species were subtle with considerable overlap between species (Figure 5A-F).From the six texture features we calculated, Homogeneity, ASM, and Energy were able to capture more visible differences between some species when viewed separately against one another (e.g.spruce and pine, or birch and maple) (Figure 5C-E).

Classification results with spectral and texture features combined
Both the exhaustive feature search and the stepwise feature selection methods with LDA disclosed that average spectral features derived from reflectance data cubes were robust and have strong potential for identifying boreal and temperate tree species in the field (Table 2).The highest overall classification accuracy when using only spectral features as input for LDA and exhaustive feature search was 92%, and for LDA and stepwise feature selection the result was 96.5% (Table 2).When both averaged spectra and texture were as input features for the two methods, the results improved further to 93% and 97.5%, respectively (Table 2).In general, the training and test set split in Fold 2 was the most challenging for discriminating the boreal and temperate tree species found in our data set (Table 2).
The statistical texture features based on GLCMs (i.e.Haralick features) provided small improvements to the classification accuracies when combined with mean reflectance data (Figure 6).It is also evident with our data set that LDA is sensitive to the number of features as input, as the accuracy saturated and began to degrade if the ratio of observations to features became too small (Figure 6).Exhaustive feature search disclosed that the best combination for only spectral features was 14 features out of 19, and the best combination for spectra and texture features was with 16 features out of 25 (Figure 6 and Table 3).With LDA and the stepwise feature selection, the best combination of only spectral features was achieved with 38 features out of 173, and the best combination with spectral and texture features was with 42 features out of 179 (Figure 6 and Table 3).When inspecting only the spectral features selected by the four different approaches, the exhaustive feature search selected spectral features evenly throughout the VIS -NIR region (Figure 7A-B), while the stepwise feature selection favored spectral features near the red and rededge wavelengths (Figure 7A-B).

Classification with convolutional neural network
The 3D-CNN classification approach used in this study achieved similar results to the two conventional LDA methods, with an overall accuracy of 94% (Table 2).Closer inspection of the performance for the highest average classifications with exhaustive feature search, stepwise feature selection, and 3D-CNN (Table 2) revealed that the methods performed slightly differently with the ten boreal and temperate tree species (Figure 8A-C).For the 3D-CNN the easiest  tree species to predict were pine and black alder (both 100% average overall accuracy) (Figure 8C).The two LDA methods (exhaustive feature search and stepwise feature selection) shared gray alder, maple, and oak as the easiest species to predict correctly (100% average overall accuracies, respectively) (Figure 8A-B).On the other hand, maple had the highest number of false positives for the exhaustive feature search (7 occasions) (Figure 8A) and stepwise feature selection methods (2 occasions) (Figure 8B), while the 3D-CNN predicted black alder false positively in 4 occasions (Figure 8C).For the exhaustive feature search black alder was the most difficult tree species to classify correctly when both spectral and texture features were available (80% average overall accuracy) (Figure 8A).For stepwise feature selection aspen was the most difficult species to rate correctly when given both spectral and texture features (85% average overall accuracy) (Figure 8B).Finally, for 3D-CNN the most difficult species to classify correctly were linden and oak (both 85% average overall accuracy) (Figure 8C).The common misclassifications for all three methods occurred with ash and aspen (Figure 8A-C).

Discussion
In this study, we aimed to assess whether visible to near-infrared reflectance spectra and texture features of stem bark from hyperspectral reflectance cubes could be a viable approach for in situ tree species identification.The main findings are related to the identification accuracies for ten boreal and temperate tree species found in Europe.Both manually obtained spectral and texture features (LDA), and automatically extracted spectral-spatial features (3D-CNN) enabled for species identification with high accuracy (93% when using LDA and exhaustive feature search with spectral and texture features, 94% when using a 3D-CNN, and 97.5% when using LDA and stepwise feature selection with spectral and texture features).The results suggest that there is clear potential in utilizing in situ reflectance images of stem bark captured by hyperspectral cameras to identify tree species.We demonstrated for the first time how reflectance data cubes of stem bark collected in the field with a hyperspectral camera could be used as a novel approach for identifying trees with both conventional and modern image recognition algorithms.The added   14 415, 435, 481, 531, 569, 607, 643, 688, 715, 724, 763, 802, 847, 899 Stepwise feature selection Spectra 38 418,423,429,432,435,438,458,496,522,525,528,546,551,554,646,649,676,682,691,694,706,712,718,721,724,727,730,751,799,866,869,893,896,899,908,911,914,920 Exhaustive feature search Spectra and texture 16 415,435,481,531,569,607,688,706,763,802,899 Dissimilarity, Homogeneity, ASM, Energy, Correlation Stepwise feature selection Spectra and texture 42 418,420,426,432,458,461,473,476,481,487,499,505,590,593,605,607,610,622,625,646,661,667,697,700,718,724,730,733,751,757,760,793,844,896,905,908,911 value of being able to collect bark images in the future with a portable hyperspectral camera is that the data collection is two-fold, it collects a wide range of spectral information, and it preserves the information of spatial distribution in image format.The statistical texture features computed from the hyperspectral reflectance data cubes further increased the already high classification accuracies achieved with reflectance spectra.We see that our approach does not only hold potential for identifying tree species in the field using hyperspectral smartphone-based cameras or hyperspectral cameras attached to forest machines, but the data could also be harnessed in remote sensing.
The six statistical features based on gray level cooccurrence matrices indicated that there are subtle differences in the texture of our ten boreal and temperate tree species.Differences are masked by the large intraspecific variation observed for the six features.Large variation within species is expected when tree species are sampled from a wide range of ages (in this case tree diameter ranging from 15 cm up to 48 cm), because outer bark of trees is often assumed to change with age.We would like to note that to be able to make meaningful comparisons of texture features of other tree species with our results, similar raw data, image resolution, GLCM window size, quantization method, and gray levels would be required because changing any of the above will affect the texture quantities (Brynolfsson et al., 2017).From visual examination of the texture images, it appears that smaller window sizes introduced more noise into the texture features and are consequently unable to capture the meaningful spatial distributions of pixels that describe stem bark for boreal and temperate tree species.From a general perspective, our results support previous findings by Fiel and Sablatnig (2011) where using bark texture alone to identify tree species was deemed challenging and difficult for even trained human experts.When comparing the effect of window sizes on classification results (58% overall accuracy with 15 × 15 window size), the two human experts in their study achieved 56.6% and 77.8% overall accuracies for 11 species with 9 images per species.They also performed classifications with similar GLCM-based texture features but with an SVM algorithm and achieved lower overall accuracies (ranging from 55.5% to 65.6%).Noteworthy from our results was that the largest 15 × 15 window size performed the best for the GLCM analysis.This suggests that window sizes larger than 15 × 15, but smaller than the entire image (512 × 512 pixels), could have improved the results.However, as reported by Lim et al. (2020), if the larger window sizes cannot extract the textural information adequately, the performance will not be further improved by selecting solely larger window sizes.We suggest that future works with similar data as ours should take into consideration also other window sizes to assess the effect on the classification performance.
Most importantly, our results indicate, for the first time, that bark texture is beneficial when combined with spectral data.Even with subtle differences, texture improved the recognition of tree species when reflectance spectra were difficult to distinguish.Reflectance spectra alone were robust, but we suggest more research should be directed towards developing standardized texture features that would be directly comparable between datasets, similarly to spectral signatures.Finally, we suggest that hyperspectral reflectance images and especially spectral features of stem bark should be considered more in future studies when identifying tree species directly in the field.Spectral features of stem bark could prove to be useful, for example, in precision forestry where selective thinning (e.g.cutting trees based on species and quality of individual trees) remains as one of the most complex decision-making problems to be addressed for autonomous wood harvesters (Hellström et al., 2009).Currently, multispectral airborne and satellite imagery provides the basis for tree species determination for robotic wood harvesters (Bergerman et al., 2016) but the joint use of such remote sensing data with new in situ sensor technologies, as the one presented in this study, should be further explored.
The LDA and 3D-CNN classification methods disclosed that special attention should be given to the selection of spectral and texture features.The problem with using only reflectance spectra is that the most optimal feature selection method (i.e.exhaustive feature search) is usually not the optimal method due to the sheer number of features that make the combination search impractical.With the preselection of features, exhaustive feature search suggested that there are informative wavelengths throughout the VIS -NIR region.The greedy method via stepwise feature selection indicated that less than a third of the available spectral features are needed to distinguish tree species most efficiently with LDA.The 38 and 42 features selected by stepwise feature selection are also below one-third of the number of samples, which is within the recommendation of Williams and Titus (1988).In general, there is redundancy in hyperspectral reflectance spectra when it comes to tree species recognition based on stem bark.The most informative wavelengths for stepwise feature selection were grouped primarily in the red and rededge region, with some concentration of wavelength selection in the blue and green wavelengths.Interestingly, gaps in wavelength selection were present in the VIS and in the NIR regions for both stepwise feature selection approaches.The preselection of features based on previous knowledge was also efficient for our 3D-CNN as the overall accuracy was strong (94%).These experimental results suggest that there is considerable potential in using CNNs with hyperspectral reflectance images of stem bark to recognize tree species in the future.However, future studies should recognize the importance of feature engineering when designing a tree species classification approach with hyperspectral reflectance data cubes of stem bark.

Conclusions
Our data set provides a unique perspective to an experimental problem that has not previously been attempted to be solved with an equivalent data source.We suggest continuing data collection with a wider geographical extent, covering more species and biomes.Our study showed that the spectral features of stem bark from hyperspectral reflectance images were very robust for recognizing tree species.There were subtle interspecific differences in texture, yet it remains an important feature to take into consideration in future studies.Finally, our results suggest that in situ imaging spectroscopy is a promising new technology for developing accurate tree species identification methods to support remote sensing of forests.

Figure 1 .
Figure 1.RGB visualization of the stem bark reflectance data cubes for each of the ten tree species and their corresponding mean spectra (black lines) and standard deviations (filled gray region).

Figure 2 .
Figure 2. RGB visualization of the reflectance data cube for a stem bark sample of European aspen (top image), 8-bit quantization of the same reflectance data cube with PCA and min-max scaling (second image from top), and the six Haralick texture images (Contrast, Dissimilarity, Homogeneity, ASM, Energy, and Correlation) calculated from the PCA image with six different moving window sizes (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 15 × 15).

Figure 3 .
Figure 3. Flowchart of the methods used to classify tree species from reflectance data cubes of stem bark.

Figure 4 .
Figure 4. General structure of the 3D-CNN architecture used to classify 10 boreal and temperate tree species from reflectance data cubes.Textboxes in the top row (blue text) depict the inputs and outputs from one layer to another.Textboxes in the bottom row (green text) depict the operations done between layers.The images above the features maps visualize the activations of the convolutional layers (Conv1-conv8) when a training sample is fed through the network.

Figure 5 .
Figure 5. Box and whisker plots for the six averaged Haralick texture features, a) Contrast, b) Dissimilarity, c) Homogeneity, d) ASM, and e) Correlation calculated for each of the 10 tree species from texture images produced by a moving window size of 15 × 15 pixels.

Figure 6 .
Figure 6.Average overall 5-fold cross-validation accuracy results using LDA with exhaustive feature search (blue solid and dashed lines), and stepwise feature selection (black solid and dashed lines).

Figure 7 .
Figure 7. Average spectra for each of the ten tree species and the spectral features selected by a) exhaustive feature search when only spectral features were available, b) exhaustive feature search when both spectral and texture features were available, c) stepwise feature selection when only spectral features were available, andd) stepwise feature selection when both spectral and texture features were available.

Figure 8 .
Figure 8.Average confusion matrices after 5-fold cross-validation with the highest results using: a) LDA and exhaustive feature search with average spectral and texture features, b) LDA and stepwise feature selection with average spectral and texture features, and c) 3D-CNN with reflectance data cubes.

Table 1 .
Classification results when comparing the effect of moving window sizes and averaged Haralick texture features.

Table 3 .
Features selected by the different classification methods that yielded the highest overall classification accuracies.The spectral features were hemispherical-directional reflectance values at the selected Specim IQ spectral bands.The central wavelengths (nm) of these bands, rounded to the closest integer value, are given in the table.