Near-infrared spectroscopy for the distinction of wood and charcoal from Fabaceae species: comparison of ANN, KNN and SVM models

Aim of study: The objective of this work was to evaluate the potential of NIR spectroscopy to differentiate Fabaceae species native to Araucaria forest fragments. Area of study: Trees of the evaluated species were collected from an Araucaria forest stand in the state of Santa Catarina, southern Brazil, in the region to be flooded by the São Roque hydroelectric project. Material and methods: Discs of three species (Inga vera, Machaerium paraguariense and Muellera campestris) were collected at 1.30 m above the ground and sectioned to cover the radial variation of the wood (regions near the bark, intermediate and near the pith). After wood analysis, the same samples were carbonized. Six spectra were obtained from each wood and charcoal specimen. The original and second derivative spectra, principal component statistics and classification models (artificial neural network, ANN; support vector machines with a radial basis function kernel, SVM; and k-nearest neighbors, k-NN) were investigated. Main results: Visual analysis of spectra was not efficient for species differentiation, so three NIR classification models for species discrimination were tested. The best results were obtained with k-NN for both wood and charcoal and with ANN for wood. In all situations, second derivative NIR spectra produced better results. Research highlights: Correct discrimination of wood and charcoal species for control of illegal logging was achieved. Fabaceae species in an Araucaria forest stand were correctly identified.


Forest Systems, December 2020 • Volume 29 • Issue 3 • e020

Introduction
In recent years, a wide range of nondestructive techniques with potential to evaluate wood properties have been developed, such as acoustics, Pilodyn densitometry, resistography, rigidimetry, computer tomography (CT), near infrared (NIR) spectroscopy, and scanning and radial sample acoustics (Fundova et al., 2018; Pyörälä et al., 2019; Schimleck et al., 2019; Silva et al., 2020). NIR spectroscopy is a vibrational spectroscopic technique operating in the region from 800 to 2500 nm, whose main advantages are high spectral acquisition speed and good potential for analysis of materials (Schwanninger et al., 2011). It has been applied to a diverse range of solid materials, such as fruits, meat, cereals, pharmaceutical products, rubber, textiles, soil and wood, among others (Pasquini, 2018). Tsuchikawa (2007), Tsuchikawa & Schwanninger (2013) and Tsuchikawa & Kobori (2015) produced comprehensive reviews of potential applications of near-infrared spectroscopy in wood science. The literature reports, for example, the use of NIR to identify chemical composition based on traditional destructive chemical analysis (Poke & Raymond, 2006; Funda et al., 2020), to evaluate moisture content based on hyperspectral images during natural drying (Kobori et al., 2013), to predict density of lodgepole pine (Stirling et al., 2007), to determine anatomical characteristics of eucalyptus (Viana et al., 2009), production line parameters of engineered wood material (Husted et al., 2007) and biomass properties (Acquah et al., 2015), and also to distinguish species (Russ et al., 2009), among other potential uses.
Considering the difficulties in identifying species when there are no leaves, fruits or flowers to analyze, the possibility of identifying samples by NIR spectroscopy is highly relevant for wood and especially for charcoal. Davrieux et al. (2010) highlighted the importance of spectral analysis to discriminate wood and charcoal from Tabebuia serratifolia and Eucalyptus grandis applying NIR and MIR (mid-infrared) spectroscopy.
However, to adequately interpret the results, it is necessary to consult databases containing data from a wide range of species and botanical families, from different locations. This is the case for species from remnant Atlantic Forest areas in Northeast Brazil, which encompass dense, mixed and open ombrophilous and semideciduous vegetation as well as other associated ecosystems, such as mangrove swamps, sand banks, mountain fields, marshes and forest enclaves (Brasil, 2006).
Mixed ombrophilous forest can be called Araucaria forest (Kersten et al., 2015). This type of forest physiognomy was highlighted starting in the twentieth century, principally as a result of wood overexploitation. Narvaes et al. (2005) stated it is necessary to preserve these forested areas by protection of remnants and/or management plans to prevent unsustainable exploitation. In this sense, the use of effective techniques to identify species during inspection of wood and charcoal being transported is indispensable.
In Brazil, important studies applying NIR for Amazonian wood discrimination were performed by  and Braga et al. (2011), in particular as a potential tool for distinction of mahogany, cedar, andiroba and curupixa. The application of NIR spectroscopy to distinguish wood species in Atlantic Forest remnants is scarce. Studies have been performed by Pace et al. (2019) with trees from a natural fragment in Espírito Santo state, and Vieira et al. (2019b) for Myrtaceae species in Araucaria forest areas in Santa Catarina state.
The Fabaceae family ranks third (in number) among tree families in Araucaria forest areas (Gasper et al., 2013), but there are no studies applying near infrared spectroscopy to distinguish the species of wood and charcoal samples from this forest type. The objective of this study was therefore to verify the potential of this spectroscopic technique for species distinction of wood and charcoal of three native Fabaceae species (Inga vera, Machaerium paraguariense and Muellera campestris) found in Araucaria forest areas.

Material
Trees of the evaluated species were collected from an Araucaria forest stand in the state of Santa Catarina, southern Brazil, in the region to be inundated by the São Roque hydroelectric project. Three trees of each species were analyzed (Table 1). Botanical material was registered in the Lages Herbarium of Santa Catarina State University (LUSC) and access to the material is registered with the Brazilian Council for Management of Genetic Heritage (CGEN/SISGEN) under number AF3EDDC.
In addition to other botanical material, two wood discs with approximate thickness of 5 cm were obtained at breast height from each tree. The botanical material and one disc from each tree were used for species identification and registration and were deposited at the LUSC herbarium. The nine remaining discs, one from each tree (three per species), were evaluated in this study.
To account for wood variation, samples were obtained in three radial positions: near bark, intermediate (mid-radius) and near pith, for a total of nine samples per species. The samples' dimensions were 2 × 2 × 2 cm (radial × tangential × longitudinal). To eliminate oxidation effects and saw marks, the surfaces were smoothed with 100-grit sandpaper, and the samples were stored at 25 ± 2 °C and relative humidity of 50 ± 2%. To obtain charcoal material, after spectral analysis of wood, the same samples were carbonized in a muffle furnace, with a final temperature of 450 °C and a heating rate of 1.66 °C min⁻¹, based on Muñiz et al. (2012).

NIR spectra
The NIR spectra of wood and charcoal samples were obtained with a Tensor 37 spectrometer (Bruker Optics, Ettlingen, Germany) with an integrating sphere, operating with 64 scans and a resolution of 4 cm⁻¹. Spectra from 10000 to 4000 cm⁻¹ were collected directly from the sample surface. Two spectra were obtained in each anatomical section (transversal, longitudinal tangential, longitudinal radial), i.e., 6 spectra per sample, for a total of 54 per species.
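The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration (the function and variable names are ours, not part of the original workflow): it converts the reported wavenumber window to wavelengths and recomputes the number of spectra per species from the sampling design.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1


# Spectral window reported in the text: 10000 to 4000 cm^-1.
range_nm = (wavenumber_to_wavelength_nm(10000), wavenumber_to_wavelength_nm(4000))

# Sampling design described in the text.
trees_per_species = 3
radial_positions = 3       # near bark, intermediate, near pith
anatomical_sections = 3    # transversal, longitudinal tangential, longitudinal radial
spectra_per_section = 2

samples_per_species = trees_per_species * radial_positions
spectra_per_sample = anatomical_sections * spectra_per_section
spectra_per_species = samples_per_species * spectra_per_sample

print(range_nm)             # (1000.0, 2500.0)
print(spectra_per_species)  # 54
```

The acquisition window therefore corresponds to 1000 to 2500 nm, which lies inside the 800 to 2500 nm NIR region mentioned in the Introduction.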

Principal component analysis
Principal component analysis (PCA) was performed with the R software, applying the packages FactoMineR (Lê et al., 2008) and factoextra (Kassambara & Mundt, 2017). The analysis was applied to wood and charcoal data to verify the behavior of NIR spectra with raw data and after Savitzky-Golay second derivative transformation, with 15 smoothing points. After preliminary analyses with first and second derivative data, we opted to use only second derivative spectra. Derivatives can remove baseline influence and undesirable noise (Rinnan et al., 2009) and can be applied to enhance selectivity (Pasquini, 2018).
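To illustrate what a Savitzky-Golay second derivative with 15 smoothing points does, the following is a minimal sketch in plain Python. It is not the code used in the study: the polynomial order is not reported in the text, so a local quadratic fit is assumed here (for a second derivative, quadratic and cubic fits give the same coefficients), and the function name and the unit point spacing are illustrative choices.

```python
def savgol_second_derivative(y, window=15, spacing=1.0):
    """Second derivative from a local quadratic least-squares fit (Savitzky-Golay).

    Returns values only for points that have a full window around them,
    so the output has len(y) - (window - 1) entries.
    """
    if window % 2 == 0 or window < 3:
        raise ValueError("window must be an odd integer >= 3")
    m = window // 2
    offsets = range(-m, m + 1)
    s0 = window
    s2 = sum(i ** 2 for i in offsets)
    s4 = sum(i ** 4 for i in offsets)
    denom = s0 * s4 - s2 ** 2
    # Least-squares weights for the quadratic coefficient a2;
    # the second derivative of the fitted parabola is 2 * a2.
    weights = [2.0 * (s0 * i ** 2 - s2) / denom for i in offsets]
    result = []
    for center in range(m, len(y) - m):
        segment = y[center - m:center + m + 1]
        value = sum(w * v for w, v in zip(weights, segment))
        result.append(value / spacing ** 2)
    return result


# Sanity check on a synthetic parabola (not spectral data): d2/dx2 of x^2 is 2.
parabola = [float(x * x) for x in range(30)]
second_derivative = savgol_second_derivative(parabola, window=15)
print(round(second_derivative[0], 6))  # 2.0
```

A constant or linearly sloping baseline gives zero output from this filter, which is the baseline-removal effect described above.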
The results obtained with PC1, PC2 and PC3 were evaluated, but the first two PCs were more representative of the evaluated data. So, here we show only results from analysis with PC1 and PC2.
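As a rough illustration of how scores on PC1 and PC2 are obtained, the sketch below computes the first two principal components by power iteration on the covariance matrix, using only the Python standard library. It is not the FactoMineR procedure used in the study: the data are centered but not scaled (the text does not state the scaling used), and all function names are hypothetical.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every value."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - mu for v, mu in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already centered data."""
    n = len(centered)
    p = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector by power iteration."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                for i in range(p))
    return value, vec


def pca_two_components(rows):
    """Return scores on PC1/PC2, explained variance ratios and loadings."""
    centered = center_columns(rows)
    cov = covariance(centered)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    l1, v1 = leading_eigenpair(cov)
    # Deflation: remove PC1 so that power iteration finds PC2.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(p)]
                for i in range(p)]
    l2, v2 = leading_eigenpair(deflated)
    scores = [[sum(a * b for a, b in zip(row, v1)),
               sum(a * b for a, b in zip(row, v2))] for row in centered]
    return scores, [l1 / total, l2 / total], [v1, v2]


# Synthetic example (not spectral data): two collinear columns plus a small third one.
rows = [[float(t), 2.0 * t, 0.1 * (-1) ** t] for t in range(10)]
scores, ratios, loadings = pca_two_components(rows)
print(len(scores), len(scores[0]))
```

In practice a library routine would be used; the sketch is only meant to make the PC1/PC2 scores and explained-variance ratios concrete.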

Classification models
To verify the potential for Fabaceae species classification, spectra from wood and charcoal were analyzed in raw form and after second derivative transformation, using three models: artificial neural network (ANN), support vector machines with a radial basis function kernel (SVM) and k-nearest neighbors (k-NN).
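Of the three models, k-NN is the simplest to state: a spectrum is assigned the most frequent species among its k closest training spectra. The sketch below shows the idea in plain Python with Euclidean distance. It is an illustration only: the value of k, the distance metric and the toy two-feature points are assumptions, not settings or data from the study.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature points (made up for illustration, not measured spectra).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
train_y = ["Inga vera"] * 3 + ["Muellera campestris"] * 3

print(knn_predict(train_x, train_y, (0.05, 0.1)))  # Inga vera
print(knn_predict(train_x, train_y, (1.0, 0.95)))  # Muellera campestris
```

With real spectra each point would have one coordinate per wavenumber, and k would normally be tuned on the learning set.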
These models were tested by applying the "train" function with the different methods available in the "caret" package of the R software (Kuhn, 2020). Accuracy and precision in calibration and prediction were evaluated for each data block. In constructing the models, data were divided into learning (70%, 38 spectra per species) and testing (30%, 16 spectra per species) sets by stratified random sampling as a function of species. The confusion matrix for each model was also evaluated.
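The 70/30 division stratified by species can be sketched as follows. This is a plain-Python illustration, not the caret routine used in the study; the function name, the seed and the rounding rule are assumptions, chosen so that 54 spectra per species yield the 38/16 split reported above.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into learning and testing sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # Rounding to the nearest integer is an assumption of this sketch.
        n_train = round(train_fraction * len(shuffled))
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:])
    return sorted(train), sorted(test)


species = ["Inga vera", "Machaerium paraguariense", "Muellera campestris"]
labels = [name for name in species for _ in range(54)]  # 54 spectra per species
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))  # 114 48
```

Splitting within each species keeps the class proportions identical in both sets, which matters with only 54 spectra per class.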

Results
The original mean spectra from wood and charcoal samples (Fig. 1) indicated differences in the analyzed material. In wood, infrared absorption decreased as a function of decreasing wavenumber, while in charcoal the opposite behavior occurred. Visually, a clear distinction of species was not observed. Another important aspect was the close similarity between mean wood spectra of Inga vera and Muellera campestris, and in charcoal between Machaerium paraguariense and Muellera campestris. Fig. 2 shows the differences in second derivative mean spectra of wood and charcoal of the studied species.
Raw and second derivative spectral data of wood and charcoal from principal component analysis are illustrated in Fig. 3.
There was a large difference in values from PC1 and PC2 with raw and second derivative data. For wood with raw data (Fig. 3A), some similarity was present in samples of Inga vera and Machaerium paraguariense, while samples of Muellera campestris were more distinct. However, with raw data for charcoal (Fig. 3B), the behavior was different, with higher similarity between Machaerium paraguariense and Muellera campestris. In the second derivative spectra for wood (Fig. 3C) and charcoal (Fig. 3D), there was better distinction of species, and the grouping of samples from Inga vera was more evident.
The classification models based on k-NN, SVM and ANN for identification of wood and charcoal, with raw and second derivative data, varied in accuracy and precision in the calibration and prediction tests (Table 2).
Considering both calibration and prediction, the classification models gave better accuracy and precision with second derivative spectra. Also, accuracy and precision were higher for the wood models than for the charcoal models.
In prediction with raw wood spectra, the best accuracy was found for the ANN model (94%), while the values were 75% for the k-NN and 81% for the SVM model. With second derivative data, all tested models for wood produced values higher than 96% for accuracy and precision.

[Table 2. Columns: Model, k-NN (%), SVM (%), ANN (%).]
In general, the prediction results for charcoal were inferior to those for wood. With raw data, the SVM model produced the best results, with 50% accuracy and precision. The ANN model also showed promising results for charcoal second derivative spectra.
To verify the behavior of spectra of Fabaceae species in the different discriminant models, confusion matrices were built with raw data and second derivative data, for both wood (Fig. 4) and charcoal (Fig. 5).
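For reference, accuracy and per-class precision can be read from a confusion matrix as sketched below. The counts are hypothetical (they are not taken from Table 2 or from Figs. 4 and 5), the convention of rows as true species and columns as predicted species is an assumption of this sketch, and the exact definition of precision used in the study is not stated in the text.

```python
def accuracy(matrix):
    """Fraction of all samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def precision_per_class(matrix):
    """For each class: correct predictions divided by all predictions of that class.

    Assumes rows are true classes and columns are predicted classes.
    """
    values = []
    for j in range(len(matrix)):
        predicted = sum(row[j] for row in matrix)
        values.append(matrix[j][j] / predicted if predicted else float("nan"))
    return values


# Hypothetical counts for 3 species with 16 test spectra each (illustration only).
confusion = [
    [14, 0, 2],
    [1, 15, 0],
    [3, 0, 13],
]

print(accuracy(confusion))             # 0.875
print(precision_per_class(confusion))
```

In this made-up example the second species is never predicted wrongly (precision 1.0), while the first and third are confused with each other, which is the kind of pattern the confusion matrices are meant to reveal.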

Discussion
Mean spectra (Fig. 1) for both evaluated materials were very similar, probably due to the anatomical and chemical characteristics of the species. The wood anatomical characteristics were comparable in terms of the type and distribution of axial parenchyma cells, as noted by the presence of lozenge-aliform and confluent cells (Inga vera) and aliform confluent cells (Machaerium paraguariense) (Vieira et al., 2019a). Also, all species have similar tangential vessel diameters.
Similarity of spectral patterns among native species from the same botanical family is frequent. Soares et al. (2017) also reported close similarity of spectra from six Amazon Forest species, while Pastore et al. (2011) reported difficulties in discriminating native species from Brazil by NIR spectroscopy.
Visually, for charcoal, some proximity was observed between the mean spectra of Machaerium paraguariense and Muellera campestris. On the other hand, for wood, the mean spectrum of Machaerium paraguariense behaved differently, principally at wavenumbers below 8500 cm⁻¹, which, according to Schwanninger et al. (2011), indicate lignin and hemicellulose.
When comparing second derivative spectra (Figure 2), some differences for wood and charcoal were observed in the region from 4400 to 4000 cm⁻¹, with the presence of bands related to cellulose, hemicellulose and lignin (Schwanninger et al., 2011). These results are in accordance with wood thermal degradation, which can vary as a function of carbonization time and temperature, but is principally influenced by species characteristics (Nisgoski et al., 2019).
Pretreatment of data is frequent in studies to verify the potential of NIR spectroscopy for forest species discrimination. Some authors have applied first derivative transformation (Bergo et al., 2016; Soares et al., 2017) and others second derivative (Toscano et al., 2017; Nisgoski et al., 2018; Vieira et al., 2019b), while Ramalho et al. (2018) applied the standard normal variate (SNV) transformation. In that work, three transformations were tested, but the authors only discussed the best results, obtained from second derivative transformation. Also, for charcoal, Sandak et al. (2016) reported that the chemical information was still preserved after second derivative application and that the pretreatment effectively removed the scatter and some noise.
Figure 3 shows the modifications in chemical composition after carbonization, which changed the NIR spectra, as expected, and consequently the PCA distribution. Carbonization causes degradation of biopolymers, resulting in changes in NIR spectra because these reflect the composition of charcoal, and even when the carbonization process is the same among species, alterations in chemical composition depend on the original material, i.e., the species' characteristics (Davrieux et al., 2010; Muñiz et al., 2013; Nisgoski et al., 2015).
The second derivative data (Figure 3 C, D) indicated diverse mean behavior of the species, confirmed by the ellipses. Better results for distinction of wood species based on PCA with second derivative spectra also were described by Hwang et al. (2016) in identifying Pinus species from Korean historic architecture.
In the classification models (Table 2), second derivative NIR spectra showed better results in all tested procedures. Horikawa et al. (2015) also described better results for identification of anatomically similar Diploxylon species in Japan based on second derivative NIR spectra and PLS-DA models.
The high values of accuracy and precision of the classification models confirmed the potential of using NIR spectroscopy to differentiate wood and charcoal samples of the species Inga vera, Machaerium paraguariense and Muellera campestris. However, this requires standardization of factors, such as sampling sites and moisture content of the material. Hein et al. (2017) commented that more studies are necessary to evaluate models based on NIR spectroscopy in real situations, such as evaluation of samples with diverse granulometry and surface quality, among other factors.
Other studies have also described adequate results for species distinction with these classification models. With the artificial neural network method (ANN), Esteban et al. (2009), using biometry of anatomical traits, reported 92% probability in differentiation of Juniperus cedrus and Juniperus phoenicea var. canariensis. Also applying quantitative anatomical characteristics, Turhan & Serdar (2013) evaluated SVM to differentiate three Salix species from Turkey and found a classification success rate of 95.2% in test group validation. Applying NIR spectroscopy, Zhou et al. (2020) reported that SVM produced better results than LDA in discrimination of a green mix of timber from Tsuga heterophylla and Abies amabilis, and Xu et al. (2019) reported a correct rate of 92.8% in prediction and identification of the origin of Angelica dahurica.
NIR spectroscopy in association with classification models can be applied to a wide range of materials. For example, Balabin et al. (2011), using different discriminant techniques, such as regularized discriminant analysis (RDA), soft independent modeling of class analogy (SIMCA), partial least squares regression and classification (PLS), K-nearest neighbor (KNN), multilayer perceptron (MLP), support vector machine (SVM) and probabilistic neural network (PNN), applied NIR spectra to classify motor oils. They found that among the tests evaluated, simple techniques such as k-NN were considered inadequate for classification.
In general, the confusion matrix for wood (Fig. 4) confirmed the behavior of the previous analyses. With raw data, discriminant models based on k-NN and SVM indicated the similarity between Inga vera and Muellera campestris, which was also verified in the visual analysis of spectra (Fig. 1) and PCA (Fig. 3). For raw data, the ANN model had higher efficacy in distinction of I. vera. With second derivative data, models based on k-NN and SVM distinguished all species, and ANN confused one sample between Muellera campestris and Machaerium paraguariense.
In the confusion matrix based on charcoal samples (Fig. 5), the same tendency as in the other analyses was verified. Just as for wood, raw data of charcoal based on k-NN and SVM confirmed the similarity between M. campestris and M. paraguariense, illustrated in Figs. 1 and 4. With second derivative data, SVM tended to classify samples of all species as I. vera.
Based on our results, and mainly considering the speed and the possibility of identifying wood based on NIR spectra in association with chemometrics, NIR spectroscopy can be applied for distinction of wood and charcoal of the evaluated Fabaceae species.

Conclusion
There was close similarity between mean wood spectra of Inga vera and Muellera campestris, and in charcoal between Machaerium paraguariense and Muellera campestris. Second derivative spectra of wood and charcoal indicated some distinction of material based on PCA distribution.
The three classification models tested can be applied for species discrimination. The best results were obtained with k-NN for both wood and charcoal and with ANN for wood. In all situations, second derivative NIR spectra produced better results.
NIR spectroscopy can be applied for distinction of wood and charcoal from Inga vera, Machaerium paraguariense and Muellera campestris, native Fabaceae species of Araucaria forest areas.