Introduction
Brazil is one of the largest charcoal producers, with production reaching 5.3 million tons in 2019 (Ministry of Mines and Energy 2020). Besides being a major producer, Brazil is also one of the largest consumers of charcoal. Most of this production is destined for the internal market, mainly for the pig-iron and steel sectors and, to a lesser extent, for the ferroalloy sector and residential consumption (ABRAF 2013). However, this demand is not met by charcoal from planted forests alone, which makes the illegal exploitation of native forests attractive.
To prevent this illegal production, the Ministry of the Environment, through Ordinance No. 253/2006, established the Forest Origin Document (DOF), an obligatory license for the transportation and storage of forest products and by-products that includes information about the origin of those products. The license becomes invalid when the transported product does not correspond to the species authorized in the DOF. In this context, forensic identification is used to analyze the wood structure preserved in charcoal and determine its origin (Gonçalves et al. 2012, Nisgoski et al. 2014), i.e., to distinguish charcoal produced from native forests from charcoal produced from planted forests, which are mainly composed of species of Eucalyptus (Davrieux et al. 2010). The principal clones used to produce charcoal are Eucalyptus urophylla, E. grandis, and the hybrids E. urophylla x grandis, E. urophylla x camaldulensis, and E. grandis x camaldulensis (Santos 2010, Pereira et al. 2012).
Usually, the anatomical analysis of charcoal can be done through a macroscopic or a microscopic approach. Microscopic identification examines features of the tissues and of the constituent cells of the wood (Zenid and Ceccantini 2012), while macroscopic analysis considers only anatomical features visible to the naked eye or with a magnifying glass, such as vessel arrangement and grouping, arrangement and abundance of axial parenchyma, and ray width (Wheeler and Baas 1998). Both analyses can be used to distinguish Eucalyptus from other genera.
Much has been proposed on microscopic analysis, as reported by Gonçalves et al. (2012), Albuquerque (2012) and Muñiz et al. (2012). Despite its higher cost and limited logistics, this approach can identify charcoal to the species level with reliable results, although this level of detail is not always necessary when charcoal is identified for inspection purposes. On the other hand, only a few studies have proposed macroscopic analysis to distinguish the origin of charcoal, although it is faster and more practical. The genus Eucalyptus presents a homogeneous anatomical constitution among its species at the morphological level, a factor that hinders their separation based only on the composition and structural arrangement of the wood constituents (Tomazello-Filho 1985, Oliveira 1997). This similarity can help in distinguishing this genus from the others.
Digital image processing and machine learning techniques are essential to this task because they allow the acquisition of visual features for automatic classification. Some studies proposed to classify charcoal images with a non-automated, user-based process. Khalid et al. (2008) proposed a method based on the analysis of anatomical images of the transverse plane to differentiate charcoal of the genus Eucalyptus sp. from charcoal of native species. Andrade et al. (2019) proposed a system to classify the origin of charcoal using texture analysis in digital images of the cross-section plane. For this, a database was produced containing 900 images of 18 species, 12 native and 6 of the genus Eucalyptus sp. Texture features were then extracted from each image using Gray Level Co-occurrence Matrices (GLCM) (Haralick et al. 1973) and used to train and evaluate statistical classifiers, which identified the origin of the charcoal correctly in about 97% of the attempts.
However, the previously cited works do not add much to the identification of the origin of charcoal in the field, because of the subjectivity, cost, and logistic limitations imposed by the use of microscopes and the preparation of the material. Advances in computational resources have allowed deep learning approaches to outperform techniques based on handcrafted feature extraction in several fields, such as computer-aided medical diagnosis systems (Litjens et al. 2017, Rodrigues et al. 2020), remote sensing (Nogueira et al. 2017, Zhu et al. 2017), forest species recognition (Hafemann et al. 2014), identification of ecosystems (Morales et al. 2018, Bayr and Puschmann 2019), agriculture (Kamilaris and Prenafeta-Boldú 2018, Knoll et al. 2018), and other applications (Gu et al. 2018).
Recently, Maruyama et al. (2018) proposed a method for automatic classification of native species of charcoal based on deep learning, using the Inception-V3 architecture (Szegedy et al. 2016) as a feature extractor. However, that work considered microscopy images, and its experiments used a simple hold-out validation technique (Devijver and Kittler 1982), which can randomly create biased sets, causing the CNNs to fit non-representative (abnormal) samples and produce unreliable accuracies. In contrast, we considered the VGG-16 architecture (Simonyan and Zisserman 2014) instead of Inception-V3. The VGG-16 network was chosen due to its simplicity and robustness. Moreover, it was the first architecture to replace larger filters, which require more computational power, with long sequences of convolutional filters of size 3 x 3.
In this work, we study an efficient method for automatic identification of charcoal origin based on deep learning and k-fold cross-validation using macroscopic images. This is the first work to automatically distinguish Eucalyptus from native species using the VGG-16 architecture. In addition, preprocessing strategies based on contrast enhancement, data centralization, and data augmentation by rotation of the training set images were tested to increase the performance of the CNN with fine-tuning.
Material and methods
The experiments were performed on a machine with an Intel i5 3.00 GHz processor, 16 GB of RAM, and an NVIDIA GeForce GTX 1050 Ti GPU with 4 GB of memory. All experiments were programmed using Python 3.6 and the PyTorch 1.7 deep learning framework (Paszke et al. 2019) under CUDA version 10.1 (2019) and cuDNN 7.6 (2020). The operating system was Ubuntu 18.04.5 LTS.
Image acquisition
The dataset of macroscopic images of charcoal was acquired from the Wood Panel and Energy Laboratory (LAPEM) at the Federal University of Viçosa (UFV), Brazil. The material is composed of samples of carbonized wood of Eucalyptus and of native species typical of the Zona da Mata region, Minas Gerais. Native species were chosen based on their anatomical similarity to the genus Eucalyptus as well as their attractiveness for the illegal production of charcoal. Eucalyptus species were chosen from those predominantly used for the production of charcoal, as defined by Pereira et al. (2012).
In this dataset, each species or hybrid is represented by a sample coming from a single tree, without information on age or position in the trunk. The samples were charred in a muffle-type electric furnace, starting at 150 °C, with an increase of 50 °C per hour up to a final temperature of 450 °C, totaling 7 hours of carbonization. The condensable gases were collected in a condenser coupled to the muffle door. The species and hybrids used in this study and the number of samples for each species are presented in Table 1.
The images were acquired using equipment with LED illumination and a cell phone support, generating images with 12 megapixels and an optical zoom of 20 times. As the charcoal pieces were broken rather than cut, there was a large amount of non-flat surface. This zoom made it possible to analyze a larger area free of the irregular breaks on the charcoal surface that make it difficult to analyze the distribution of cellular components.
The dataset is composed of 360 charcoal images, of which 135 are of Eucalyptus species and 225 are of native species. An expert in wood anatomy analyzed the charcoal images and labeled each of them with one of two classes: eucalyptus (135 images) or native (225 images). Table 2 shows the name, the quantity, and one example image of each class. All images of the charcoal dataset were then randomly sampled and partitioned into five stratified sets (folds).
Image preprocessing
All images were resized to 224 x 224 pixels, the input size required by the CNN architecture used in this work. One of the preprocessing methods was then applied, and the resulting images were used to train and test the VGG-16 architecture.
Figure 1 shows samples of charcoal images for each preprocessing strategy evaluated. The original image from the dataset is defined as strategy (a) (i.e., no preprocessing). Panel (b) shows an example of the contrast stretching strategy.
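As a minimal sketch of two of the preprocessing ideas mentioned in this work, the code below applies a linear (min-max) contrast stretch to a small grayscale patch and then centers a set of images by subtracting their average image. The exact stretching limits and the way the average image was computed are not stated in the text, so the min-max variant, the 0-255 output range, the function names, and the toy pixel values are all assumptions made for illustration.

```python
def contrast_stretch(image, out_min=0.0, out_max=255.0):
    """Linearly rescale pixel intensities so they span [out_min, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: there is no contrast to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[out_min + (p - lo) * scale for p in row] for row in image]


def subtract_average_image(images):
    """Center same-sized images by subtracting their pixel-wise mean image."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    mean = [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]
    return [[[img[r][c] - mean[r][c] for c in range(cols)]
             for r in range(rows)] for img in images]


# Toy 2 x 2 grayscale patches (hypothetical values, not taken from the dataset)
patch = [[50, 100], [150, 200]]
stretched = contrast_stretch(patch)
centered = subtract_average_image([patch, [[150, 100], [50, 0]]])
```

In this toy case the stretched patch spans the full 0-255 range, and the centered images have a pixel-wise mean of zero across the set.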
Data augmentation
Data augmentation is a strategy that increases the amount of training data without collecting new samples (Krizhevsky et al. 2012). In this study, we applied data augmentation based on rotations of the images, considering angles between 0° and 360° in steps of 45°, which increases the training set eightfold.
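The rotation scheme can be sketched as follows. Since a 360° rotation reproduces the 0° image, the sketch assumes eight distinct angles (0°, 45°, ..., 315°), which is consistent with an eightfold increase of the training set. The training-set size of 288 images is also an assumption, obtained by using four of the five folds of the 360-image dataset.

```python
def rotation_angles(step_deg=45):
    """Distinct rotation angles in [0, 360) for a given step in degrees."""
    return list(range(0, 360, step_deg))


angles = rotation_angles(45)

# Assumption: four of the five folds (4/5 of 360 images) form the training set
train_images = 360 * 4 // 5
augmented_size = train_images * len(angles)

print(angles)
print(train_images, augmented_size)
```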
Convolutional neural networks
The main concepts of the Deep Learning paradigm derive from Neural Networks, which aim to develop computer programs capable of solving problems that are difficult to solve through formal rules (Goodfellow et al. 2016). The main characteristic of a Convolutional Neural Network (CNN) is that it is composed mainly of convolutional layers, and its main application is the processing of visual information (Ponti et al. 2017). A CNN consists of three types of neural layers, described below (Guo et al. 2016).
Convolutional
The convolutional layer is generated by applying a set of filters over an input image. Each filter is responsible for detecting a specific type of feature. Figure 2 illustrates the basic structure of the convolutional layer, defined by Cl and composed of K filters whose size is given by their spatial extent and by a hyper-parameter taken from the input volume. Finally, the convolution result is added to the bias b, generating K 2D feature maps stacked in an output volume Ml, defined by Equation 1 (Rodrigues et al. 2020).
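The operation performed by one filter can be sketched in plain Python. This is an illustrative single-channel version with stride 1 and no padding; the function name, the example filter, and these settings are assumptions, and the sketch is not meant to reproduce Equation 1.

```python
def conv2d_single(image, kernel, bias=0.0):
    """Stride-1, no-padding 2D convolution as used in CNNs (cross-correlation)
    of a single-channel image with one filter, followed by adding the bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for u in range(kh):
                for v in range(kw):
                    acc += image[i + u][j + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 4 x 4 input and a 3 x 3 filter that responds to vertical edges
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]
feature_map = conv2d_single(image, vertical_edge, bias=0.5)
```

A layer with K filters would repeat this computation K times, once per filter, and stack the resulting 2D feature maps into the output volume.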
Pooling
The pooling layer reduces the size of the feature maps using maximum or average pooling. The CNN architecture considered in this paper applies maximum pooling because this criterion results in better generalization and faster convergence (Scherer et al. 2010). Figure 3 illustrates maximum and average pooling for a pooling layer of size 2 x 2.
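A minimal sketch of both pooling variants with a 2 x 2 window is given below. It assumes non-overlapping windows (stride equal to the window size); the function name and the input values are illustrative only.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    reduce_fn = max if mode == "max" else (lambda w: sum(w) / len(w))
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + u][j + v]
                      for u in range(size) for v in range(size)]
            row.append(reduce_fn(window))
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map
fmap = [[1, 3, 2, 4],
        [5, 7, 6, 8],
        [9, 2, 1, 0],
        [3, 4, 5, 6]]
max_pooled = pool2d(fmap, mode="max")  # keeps the strongest response per window
avg_pooled = pool2d(fmap, mode="avg")  # keeps the mean response per window
```

Both variants halve each spatial dimension here, turning the 4 x 4 map into a 2 x 2 map.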
Fully connected
The fully connected layers are the last layers of the network and convert the two-dimensional feature maps into a one-dimensional feature vector. Finally, the last layer is a softmax layer with one neuron for each class in the dataset. Figure 4 illustrates the fully connected layers after the convolutional and pooling layers, followed by the softmax layer.
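The two steps described here, flattening the feature maps and turning the final scores into class probabilities, can be sketched as follows. The feature-map values, the two raw scores, and the class order (native, eucalyptus) are hypothetical; the weights of the fully connected layers are omitted for brevity.

```python
import math


def flatten(feature_maps):
    """Turn a stack of 2D feature maps into a single 1D feature vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2 x 2 feature maps become a vector of 8 features
features = flatten([[[7, 8], [9, 6]], [[4.0, 5.0], [4.5, 3.0]]])

# Hypothetical raw scores of the last layer for (native, eucalyptus)
probs = softmax([2.0, 0.5])
```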
Training based on fine-tuning
Fine-tuning is a practical and common approach for training deep learning architectures (Goodfellow et al. 2016). The network is first trained for a classification task using a very large data set (Deng et al. 2009). The parameter values (weights) learned for the initial layers of the network are kept (frozen), and the top layers, which are intended to learn the more complex structures of the data, are trained on the data set of interest.
VGG-16 architecture
The VGG-16 network, which is composed of 13 convolutional layers, five pooling layers, and three fully connected layers (including the softmax) (Simonyan and Zisserman 2014), was chosen due to its simplicity and robustness. In this study, we evaluated the VGG-16 improved with batch normalization. This strategy keeps the mean of the layer outputs close to 0 and their standard deviation close to 1, increasing stability across the network and leading to faster training (Ioffe and Szegedy 2015).
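A minimal sketch of the normalization step is shown below, assuming a single feature observed over one mini-batch in training mode. Here gamma and beta stand for the learnable scale and shift parameters of batch normalization, and eps is a small constant added for numerical stability; the input values are made up for illustration.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


# Hypothetical activations of one feature for a mini-batch of four images
normalized = batch_norm([2.0, 4.0, 6.0, 8.0])
out_mean = sum(normalized) / len(normalized)
out_std = math.sqrt(sum((v - out_mean) ** 2 for v in normalized) / len(normalized))
```

With gamma = 1 and beta = 0, the output mean is approximately 0 and the output standard deviation is approximately 1.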
We keep all convolutional blocks fixed to preserve the parameters learned from training on the ImageNet dataset, while the top layers have their parameters adjusted using a small learning rate. Figure 5 illustrates the VGG-16, and the blue box indicates the fixed layers.
The training of the VGG-16 is defined as an optimization problem that seeks to improve the quality of the predictions. In this study, the objective function is the loss function, here the binary cross-entropy, which is commonly used for binary classification problems. We minimize this function using the Stochastic Gradient Descent (SGD) optimizer (Lecun et al. 1998), a popular algorithm for parameter optimization of machine learning and deep learning models. It approximates gradient descent by using batches of randomly selected data samples instead of computing the gradient over every object of the dataset. Thus, the SGD optimizer iteratively finds the parameter values that minimize the loss function (cross-entropy) (Goodfellow et al. 2016).
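For reference, the binary cross-entropy over a mini-batch can be sketched as below. The labels and predicted probabilities are hypothetical, and the clipping constant eps is an implementation assumption that only serves to avoid taking the logarithm of zero.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy; p_pred holds the probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]  # hypothetical labels
confident = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
uncertain = binary_cross_entropy(labels, [0.5, 0.5, 0.5, 0.5])
```

Predictions close to the true labels give a lower loss than uninformative predictions of 0.5, for which the loss equals ln 2.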
VGG-16 was trained for 100 epochs with a learning rate of 0.001, a weight decay of 1e-6, Nesterov momentum of 0.9, a mini-batch size of 32, and the Rectified Linear Unit (ReLU) activation function.
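To make the role of these hyper-parameters concrete, the sketch below applies SGD with Nesterov momentum and weight decay to a single scalar parameter. It follows the Nesterov variant as commonly implemented in deep learning frameworks, with weight decay added to the gradient as an L2 penalty. The toy loss w^2 and the starting value are assumptions chosen only to show the update rule, not the training of VGG-16.

```python
def sgd_nesterov_step(param, grad, buf, lr=0.001, momentum=0.9, weight_decay=1e-6):
    """One SGD update with Nesterov momentum and L2 weight decay."""
    grad = grad + weight_decay * param           # weight decay as an L2 penalty
    buf = grad if buf is None else momentum * buf + grad  # momentum buffer
    update = grad + momentum * buf               # Nesterov look-ahead term
    return param - lr * update, buf


# Toy problem (assumption): minimize loss(w) = w**2, whose gradient is 2 * w
w, buf = 5.0, None
for _ in range(2000):
    w, buf = sgd_nesterov_step(w, 2.0 * w, buf)
```

After the loop, w is very close to the minimizer at 0, which shows that the rule converges on this toy problem with the hyper-parameter values listed above.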
Validation
The classification is validated using the k-fold cross-validation statistical method (Kohavi 1995), which partitions the data into k folds used for training and testing. All images were sampled and partitioned into five stratified sets, i.e., the folds are built preserving (approximately) the proportion of examples of each class in the original set. We repeated the cross-validation five times; in each iteration, one of the folds is chosen for validation and the others are used for training.
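A stratified 5-fold split of a dataset with 135 eucalyptus and 225 native images can be sketched as follows. The round-robin assignment, the random seed, and the function name are assumptions for illustration; the text does not specify how the folds were generated.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds while preserving class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin across folds
    return folds


labels = ["eucalyptus"] * 135 + ["native"] * 225
folds = stratified_folds(labels, k=5)

splits = []
for held_out in range(5):
    val_idx = folds[held_out]
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    splits.append((train_idx, val_idx))
```

Each fold then holds 72 images (27 eucalyptus and 45 native), so every iteration trains on 288 images and validates on 72.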
Additionally, the mean accuracy (Equation 2) is used to quantify the quality of the results. The accuracy index is based on the number of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), computed from the confusion matrix, which makes it possible to compare the number of correct classifications with the classifications predicted for each class (Duda et al. 2000).
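Equation 2 is not reproduced in this text, so the sketch below assumes the standard definition of accuracy from the confusion matrix, together with the true positive rate and false positive rate. The counts are hypothetical and were chosen only to add up to a fold of 72 images with the native class treated as positive; they are not results of this study.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def true_positive_rate(tp, fn):
    """Fraction of positive samples that were correctly identified."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of negative samples wrongly classified as positive."""
    return fp / (fp + tn)


# Hypothetical confusion-matrix counts (positive class: native)
tp, fn, tn, fp = 40, 5, 20, 7
acc = accuracy(tp, tn, fp, fn)
tpr = true_positive_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
```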
In addition, to visualize the True Positive Rate (TPR) against the False Positive Rate (FPR) at various decision thresholds, we considered the Receiver Operating Characteristic (ROC) curve. The Area Under the ROC Curve (AUC) is used as a reliable measure of classification performance across all possible classification thresholds (Figure 6) (Fawcett 2006).
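The AUC can be computed without drawing the curve, using its equivalent interpretation as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as one half). The sketch below uses this pairwise form; the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = positive class) and classifier scores
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = roc_auc(y_true, scores)
```

An AUC of 1.0 corresponds to a perfect ranking of positives above negatives, while 0.5 corresponds to a ranking no better than chance.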
Results and discussion
We trained the VGG-16 architecture considering each contrast enhancement strategy and average subtraction. Figure 7 shows the evolution of the loss and accuracy values, averaged over all k-fold iterations, for each preprocessing strategy evaluated. This behavior suggests that the training did not overfit the data and that the generalization property of the CNN was maintained.
To assess the True Positive Rate (TPR) against the False Positive Rate (FPR), we analyzed the ROC curve and its AUC for each iteration of the k-fold. The evolution of these values is shown graphically in Figure 8. It is important to note that the AUC is above 80% for most of the folds, resulting in an average AUC of 84% and 81.6% for the original and contrast-stretched images, respectively. This result suggests that our approach is promising.
The mean accuracy obtained with VGG-16 is presented in Table 3 for each preprocessing strategy evaluated. The use of the original images is the best choice, resulting in a mean accuracy of 85.8%. The data centralization performed by average image subtraction has a positive impact, regardless of the preprocessing strategy.
The confusion matrices (Table 4) reveal some aspects of the classification problem investigated in this work. The presented values were obtained by training with the whole training set and predicting over the validation set (which is the 3rd fold). It is worth noting that charcoal from native wood is rarely misclassified as eucalyptus, which is the main objective of this research, i.e., to provide a computational method capable of helping to prevent the exploitation of native wood. Although the best overall result was obtained with the original images without preprocessing, contrast stretching allowed the identification of 97.78% of native woods when fold 3 is considered.
Figure 9 shows samples of native images classified as Eucalyptus for each strategy tested. Although the goal is to perform a binary classification, we found that native species with few samples in the database, such as Cydonia oblonga Mill, Inga edulis, Prosopis juliflora, and Sclerolobium paniculatum, may be classified as Eucalyptus. The small number of samples of these species therefore results in a lack of visual patterns. We also observed that the other misclassified native species present visual patterns similar to those of Eucalyptus, such as an increase in the thickness and distribution of the vessels in the center-to-bark direction (de Jesus and Silva 2020).
Conclusions
The results allow us to conclude that, for the classification of charcoal images, the VGG-16 architecture obtained its best results when the augmented data set was used with average subtraction as the preprocessing strategy (an accuracy of 85.8%). In addition, after learning the particular features, the VGG-16 model resulting from the proposed method was able to classify charcoal from native forests with a mean accuracy of at least 95% using the original images, i.e., without a preprocessing strategy, under the 5-fold cross-validation procedure.
The presented results open new opportunities for better exploiting deep learning in the automatic classification of charcoal produced from planted wood (Eucalyptus) and charcoal originating from native forests. In future work, other data augmentation strategies may be tested, together with other normalization strategies and different types of convolutional neural networks.