IMACA – Automated wood identification system of Colombian timber species using convolutional neural networks

doi:10.21203/rs.3.rs-3640320/v1


Research Article



This work is licensed under a CC BY 4.0 License

Version 1


Monitoring and controlling illegal timber trafficking remains a formidable global challenge. The timber sector faces this issue without practical and on-site support systems to facilitate these tasks, and there exists a limited availability of technological and automated tools to assist control personnel in fulfilling their responsibilities. The challenge intensifies in regions where workers possess inadequate expertise in confidently identifying the forest species involved in illegal trade. This paper introduces the architectural framework and a computational model for a digital support tool designed to recognize twenty timber species that are illicitly traded in the Colombian Amazon region. A lightweight convolutional neural network was trained using the transfer learning approach and an in-house generated dataset. The resulting model was deployed on the cloud, following Software as a Service principles, and on a portable embedded system. The prototype exhibits a classification performance exceeding 93%, successfully emulating real-world conditions in the field, including challenges such as imprecise cutting techniques, low-resolution image capture devices, and images captured at varying orientations. Furthermore, the classifier model has been incorporated into a chatbot and a low-cost microcomputer, enabling rapid responses in less than ten seconds. This integration enhances versatility, reduces the subjectivity of the identification process, supports both online and offline operation, and offers potential scalability for the entire system.

Chatbots

Colombian wood species dataset

computer vision-based wood identification

forest preservation

transfer learning

Forests play a pivotal role in the lives of hundreds of millions of people, as they provide invaluable resources and ecosystem services for both local and global populations. When wood is harvested and traded in a regulated manner, it can bring about substantial economic benefits. However, the looming threat of deforestation is exacerbated by the selective extraction of high-value commercial timber with multifaceted applications, which endangers not only native forests but also global biodiversity (FAO 2020; Carvalho et al. 2020). This practice significantly contributes to forest degradation and fosters the proliferation of products of illegal timber origin, escalating deforestation rates (Bastin et al. 2019; Peery et al. 2022; Amancio 2020).

One of the primary strategies for curbing illegal timber exploitation involves establishing international policies for legal timber trade and prohibiting the harvest of timber species with a severe impact on forests. This effort is led by the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) (Le 2019). However, the stark reality remains that these laws alone are inadequate for halting, or at least reducing, global illegal logging (CITES 2022; Barroso & Mello 2021; Thompson & Magrath 2021). The vulnerability of the timber supply chain persists at various stages, including storage, transportation, and wood processing, where illicit timber lots can infiltrate.

In this context, various fields of science and technology have turned to researching and developing solutions to mitigate this issue. There are two primary options: identifying wood at the laboratory level or within the supply chain environment. In the former case, various techniques have been explored, including physical, chemical, and genetic methods, such as DNA analysis (Jiao, Lu, He, Guo, & Yin, 2020) and spectrometry (Yin et al. 2020). In the latter case, there are techniques with broader practical applicability in the field, such as those related to wood anatomy, which enable taxonomic characterization based on the internal structure of wood and are typically performed by trained anatomists or technical experts (GTTN 2019; Wiedenhoeft et al. 2019; Ferreira et al., 2023).

To complement the efforts of wood anatomists, researchers have for several years investigated the application of computing, computer vision, and artificial intelligence. These technologies aim to provide tools that bolster CITES policies, support databases of protected species (Figueroa-Mata et al., 2018), and facilitate interactive identification systems using macroscopic wood characteristics as features (Souza et al., 2020). However, despite their utility, such systems are influenced by the geographical location and distinct regions of timber exploitation, because the same species varies across different countries. As a result, tools must be developed or adapted for specific contexts, and their use requires expertise on the part of those who employ them (Ravindran et al. 2019; Ravindran et al. 2021; Figueroa-Mata et al. 2022; Silva, Bordalo, Pissarra, & de Palacios, 2022).

Moreover, the extensive array of artificial intelligence and computer vision algorithms has paved the way for developing timber species identification systems using deep learning techniques, primarily through convolutional neural networks. These systems have exhibited reliability that renders them applicable in real-world processes (Verly et al. 2020; De Andrade et al. 2020; Kırbaş & Çifci 2022; Hwang & Sugiyama, 2021). For instance, Xylorix Inspector represents a notable example of a mobile phone application created by Agritix in Malaysia. It employs a macroscopic wood identification system, integrating convolutional network models. This application can be used with a macro lens in the field to determine the authenticity of wood, effectively combating fraudulent practices (Tang et al. 2018; Tang & Tay 2019; Xylorix Division 2022). Similarly, Xylotron, developed in the United States, is a PC tool with a high-resolution digital camera that employs deep learning models to identify various American wood species (Ravindran et al. 2020). It has been adapted to include species from other countries, such as Ghana (Ravindran et al. 2019), Costa Rica (Figueroa-Mata et al. 2022), Peru (Ravindran et al. 2021), and introduced in Colombia, covering 13 local wood species (Arévalo et al. 2021).

In recent developments, MaderApp has been released, employing convolutional neural networks to identify 25 wood species in Peru. It is a collaborative effort between the University Continental and the National Forestry and Wildlife Service (SERFOR) in Peru (Ramos C., 2023). Additionally, a noteworthy breakthrough was achieved with a wood identification app for smartphones using an attachable lens. This innovation relies on convolutional neural networks and can recognize up to 400 species. However, detailed scientific information about its implementation is currently limited (Universidad Politécnica de Madrid, Universidad de Granada, Asociación Española del Comercio e Industria de la Madera, 2023).

These developments underscore the critical need for ongoing research and the development of deep learning systems that can automatically identify the diverse wood species found not only within Colombia but across the globe. In Colombia alone, the number of wood species exceeds 500 (Universidad Nacional 2022). In recent years, there has been a growing interest in adapting tools like Xylotron to accommodate local species (Arévalo et al. 2021) and utilizing digital microscopes with machine learning algorithms tailored to wood identification (Cano Saenz, Ordoñez Urbano, Vargas-Cañas, & Gaitán Mesa, 2019). Hence, the imperative is to create more specialized regional models, develop and train tools from the ground up, and make them accessible for use in the field via mobile phones, digital tablets, or computers. These efforts are crucial in combating the ongoing issue of illegal timber trade.

Consequently, this study designed an automated wood identification system for twenty forest species found in the Pacific and Amazonian regions of Colombia. It replicates the current timber control process, which relies on a conventional low-resolution digital magnifying glass, and adds a cloud-based chatbot and an embedded system for querying a deep learning model. This system aims to contribute to the practical application of wood identification in field scenarios.

The CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology (Schröer et al., 2021) was employed to develop a system capable of identifying timber species using artificial intelligence. CRISP-DM’s iterative nature allows for adjustments in each of its generic phases, making it highly applicable to machine learning and deep learning challenges (Schröer et al., 2021). These phases encompass data acquisition and processing, algorithm modeling, performance evaluation, and system deployment. In this context, computational models trained iteratively were used to classify macroscopic image data derived from cross sections of wood. Macroscopic-based methods are the most advantageous for initial wood identification. However, in real-world environments, individuals typically require specialized training and a basic understanding of these methods to differentiate between species (Ruffinatto, Castro, Cremonini, Crivellaro, & Zanuttini, 2020).

2.1. Colombian Forest material - Data

A total of 20 species were carefully chosen, comprising 18 native species and two planted species. This selection includes some of the most commercially relevant species in the region and throughout Colombia. These species have been subject to a high level of illegal timber trafficking, as reported by the forestry authorities of the Cauca Regional Autonomous Corporation (CRC) within the environmental management division. The dataset is composed of images from two sources: (i) a dataset of macroscopic images of 11 species, previously documented in Cano Saenz et al. (2022), which were obtained using a digital microscope; and (ii) images of 9 native forest species from the Pacific region, which were acquired using the same type of device during forest harvesting operations conducted by the Ministry of Environment in collaboration with regional corporations. This dataset is publicly available for academic and research purposes, and it can be accessed at https://www.unicauca.edu.co/laboratorios-fisica/maderas_colombia/

The selection of forest species (Table 1) considered various factors, including different families, distinctions between hardwoods and softwoods, commercial and prohibited species, and whether they originated from natural or planted forests. Additionally, species with high ecological value within the Colombian ecosystem were considered.

Table 1

Description of wood species in the dataset.
Scientific Name/Folder Name	Family	Common Colombian Name/Global Trade Name	Wood Type	Feature	Samples
subset 1
Campnosperma panamensis	Anacardiaceae	Sajo/Orey Wood	Hardwood	Natural Forest	137
Cedrela odorata	Meliaceae	Cedro costeño/ Cigarbox cedar	Hardwood	Valuable - ban	159
Cedrelinga cateniformis	Fabaceae	Achapo/Cedrorana	Hardwood	Commercially available	115
Cordia alliodora	Boraginaceae	Nogal cafetero/Laurel	Hardwood	Commercially available	115
Dialyanthera gracilipes	Myristicaceae	Cuángare/Virola/White Cedar	Hardwood	Natural Forest	116
Eucalyptus globulus	Myrtaceae	Eucalipto blanco/Blue gum	Hardwood	Planted species	167
Handroanthus chrysanthus	Bignoniaceae	Guayacán amarillo/Trumpet Tree	Hardwood	Commercially available	116
Humiriastrum procerum	Humiriaceae	Chanul/Corozo	Hardwood	Valuable - ban	116
Fraxinus uhdei	Oleaceae	Urapan/fresno/Shamel ash	Hardwood	Commercially available	245
Cupresus lusitanica	Cupressaceae	Cipres/Pino Cipres	Softwood	Planted species	161
Pinus patula	Pinaceae	Pino patula/Ocote	Softwood	Planted species	248
subset 2
Osteophloeum Platyspermun	Myristicaceae	Aguamanil/caracoli	Hardwood	Commercially available	101
Brosimum utile	Moraceae	Sande/guaimaro	Hardwood	Natural Forest	120
Pouteria Caimito	Sapotaceae	Caimito/caimitillo	Hardwood	Natural Forest	101
Calophyllum mariae	Calophyllaceae	Palo María/ Aceite Maria	Hardwood	Commercially available	100
Carapa guianensis	Meliaceae	Tangare/Andiroba/Nandiroba	Hardwood	Valuable	100
Cariniana pyriformis	Lecythidaceae	Abarco/Caobano/Equitiva	Hardwood	Valuable	101
Guatteria cuatrecasasii	Annonaceae	Cargadero	Hardwood	Commercially available	101
Ocotea insulares	Lauraceae	Laurel Paliarte / guadaripo	Hardwood	Commercially available	100
Qualea acuminata Spruce ex Warm	Vochysiaceae	Pomo/Chisparo/acuminata	Hardwood	Commercially available	100

2.2. Sample preparation and imaging

The image acquisition protocol followed the guidelines provided by the Regional Autonomous Corporation of Cauca (CRC), an organization with extensive field expertise. This protocol was based on an adaptation of the standard procedures outlined by the Ministry of Environment and Sustainable Development, covering aspects such as extraction, cutting, and specific regions of interest (Fig. 1). Additionally, the protocol took into account recommendations from the literature, particularly those of Ravindran & Wiedenhoeft (2022), to ensure the creation of a high-quality and reliable test dataset.

The preparation of the dataset for the selected timber species involved obtaining a suitable number of images for the timber classifier. This process unfolded as follows:

a) The first subset (dataset 1), which comprises thousands of images, was captured using a digital magnifying glass on exposed cross-sections of wood blocks (Fig. 1). From this extensive collection, 100 to 250 images per species were selected on the basis of image quality, sharpness, and focus. The selection criteria were defined by the authors, with the intent of working with a manageable number of images.

b) The second subset (dataset 2), encompassing the additional nine species, involved the extraction of four to five cylindrical samples from two to three standing specimens per species. These samples were obtained using Pressler drills or forestry drills with a diameter of approximately 5 to 6 mm (Fig. 2). Subsequently, the wood specimens, or wood cores, were left to dry to reduce excess moisture, which can affect image saturation and intensity during image capture. Following this drying phase, the cores were cut in cross-section to reveal the anatomical characteristics of the wood. Multiple images were captured from various sections, totaling between 80 and 120 images per species.

Both subsets have consistent cross-sectional areas, irrespective of whether the wood was originally in the form of a block or a cylinder. This uniformity is maintained since the cut exposes the most pertinent anatomical characteristics for discrimination through macroscopic analysis (Barmpoutis et al., 2018). Additionally, the images were acquired using a digital magnifying glass with a fixed magnification of 10X, equivalent to 3.9 µm/pixel. This magnification covers an area measuring 2.5 mm x 1.9 mm, providing an appropriate scale for observing key wood anatomical features such as pores, fibers, vessels, and parenchyma. Lastly, the images’ resolution was set to 640x480 pixels, ensuring clarity and detail in the visual representation.
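The stated field of view follows directly from the pixel scale and the image resolution. The short check below reproduces that arithmetic; the constant and function names are illustrative and do not come from the original work.

```python
# Values taken from the text above; the names are illustrative.
MICRONS_PER_PIXEL = 3.9
IMAGE_SIZE_PX = (640, 480)  # width, height


def field_of_view_mm(size_px, um_per_px):
    """Return (width, height) of the imaged area in millimetres."""
    width_px, height_px = size_px
    return (width_px * um_per_px / 1000.0, height_px * um_per_px / 1000.0)


fov_w, fov_h = field_of_view_mm(IMAGE_SIZE_PX, MICRONS_PER_PIXEL)
print(f"{fov_w:.3f} mm x {fov_h:.3f} mm")
```

Rounded to one decimal place, the result agrees with the 2.5 mm x 1.9 mm area quoted in the text.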

2.2. Architectures and modeling for transfer learning

Convolutional Neural Networks (CNNs) are versatile models that have demonstrated their adaptability in various domains, particularly in tasks involving image classification and localization with reference images (Wang et al., 2021). CNNs embody the core principle of machine learning, aiming to reduce human intervention in the development of autonomous tools (Kattenborn et al., 2021; Macaulay & Shafiee, 2022). To achieve optimal learning, CNNs typically require a substantial volume of images, often in the hundreds of thousands. However, in the context of wood identification, where specimens and images are hard to obtain, transfer learning is a well-suited approach. During retraining, a balance must be struck between training error and validation error so that the model does not simply memorize the training images. This balance is achieved through various regularization techniques, including dropout, early stopping, weight decay, and stochastic depth (Xu et al., 2019; Gonçalves & João, 2022).

Consequently, in this study, various CNN models were retrained to find the most suitable weights for classification. Several architectures, such as Resnet-50, EfficientNet-B0, and MobileNet, were experimented with and compared based on their performance. The goal was to determine the most effective architecture for deploying the model in the field. The Resnet-50 architecture (Residual Network) was chosen as a reference due to its widespread use, flexibility, and robustness. Resnet-50 is particularly useful for training deep networks using the residual learning framework, as demonstrated in prior work (Moreno, 2020; Kırbaş & Çifci, 2022). This architecture has also been employed in the Xylotron tool (Ravindran et al., 2020).

Special attention was devoted to the EfficientNet-B0 architecture, which is newer and lighter in terms of trainable parameters. The EfficientNet family of architectures is designed to provide improved accuracy and computational efficiency (Tan & Le, 2019), making it a promising option for the future of timber species identification.

2.3. Evaluation

At this stage, the CNN models are evaluated on both the training and testing datasets, as well as on their real-world effectiveness when deployed in the field. Following common practice in timber species classification, 70% of the available data is used for training, and experimental validation is conducted with the remaining 30%, which the models never see during training. The models are fine-tuned by adjusting hyperparameters, including the learning rate (LR), the number of batches per epoch, the total number of epochs, the percentage of layers with dropout, and the number of frozen convolutional layers in transfer learning. The goal is to achieve optimal feature generalization and to deploy the model with the best metrics.
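A 70/30 partition of this kind can be sketched as follows. The text does not say whether the split was stratified by species, so the per-class split below is an assumption, as are the function name, the seed, and the file names in the toy example.

```python
import random
from collections import defaultdict


def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (path, label) pairs so each class keeps ~train_fraction for training."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        paths = by_label[label][:]
        rng.shuffle(paths)
        cut = round(len(paths) * train_fraction)
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test


# Toy example: hypothetical file names, class sizes taken from Table 1.
samples = [(f"pinus_{i}.jpg", "Pinus patula") for i in range(248)]
samples += [(f"carapa_{i}.jpg", "Carapa guianensis") for i in range(100)]
train, test = stratified_split(samples)
print(len(train), len(test))
```

Splitting per class keeps the species proportions similar in both partitions, which matters when class sizes range from 100 to 248 images.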

The performance of the classifier models is assessed using a range of metrics, including the confusion matrix (CM), from which key metrics such as accuracy (A), precision (P), sensitivity (S) or recall, F1-score, and the Matthews Correlation Coefficient (MCC) are derived. The MCC (Eq. 1) is a particularly valuable metric as it provides a balanced measure of classification performance (Chicco et al., 2021) and complements more traditional ones by taking into account all four values in the confusion matrix (True Positives TP, True Negatives TN, False Positives FP, False Negatives FN).

$$MCC = \frac{TP\times TN - FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{1}$$
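As a rough illustration, Eq. 1 can be computed directly from the four counts. Because the classifier distinguishes 20 classes, the sketch also includes the standard multiclass generalization of the MCC computed from a full confusion matrix. The text does not state which variant the authors used, so the multiclass function is an assumption, and both function names are illustrative.

```python
import math


def mcc_binary(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four counts of Eq. 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcc_multiclass(cm):
    """Multiclass MCC; cm[i][j] counts samples of true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    true_k = [sum(cm[k]) for k in range(n)]                       # row sums
    pred_k = [sum(cm[i][k] for i in range(n)) for k in range(n)]  # column sums
    num = correct * total - sum(t * p for t, p in zip(true_k, pred_k))
    denom = math.sqrt((total ** 2 - sum(p * p for p in pred_k))
                      * (total ** 2 - sum(t * t for t in true_k)))
    return 0.0 if denom == 0 else num / denom


# Made-up counts, used only to show that the two forms agree for two classes.
print(mcc_binary(tp=40, tn=45, fp=5, fn=10))
print(mcc_multiclass([[45, 5], [10, 40]]))
```

For two classes the multiclass form reduces to Eq. 1, which makes a convenient sanity check.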

In the field testing of the deployed model within the framework of this study, classification percentages were evaluated. The testing was conducted in warehouses, and involved using timber species samples acquired on-site that were provided by the Cauca Autonomous Regional Corporation. This testing phase offers insight into the classifier’s practicality as a tool for real-world applications. During this stage, a specialized metric known as the F3-score is introduced. This metric is designed to assess the device’s performance by focusing on the top 3 matches or highest-weighted species. The aim is to assist in the determination and verification of forest species. In other words, the system is not expected to achieve 100% accuracy, which is a feat even beyond human capabilities.
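The top-3 criterion described above can be sketched as a pair of rates: the fraction of samples whose best-scoring class is correct, and the fraction whose correct class appears among the three best-scoring classes. The function names and the scores in the example are invented for illustration and are not results from the study.

```python
def top_k_labels(scores, k=3):
    """Return the k class labels with the highest scores, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


def match_rates(predictions, truths, k=3):
    """Return (direct match rate, top-k match rate) over a set of samples."""
    direct = sum(top_k_labels(s, 1)[0] == t for s, t in zip(predictions, truths))
    in_top_k = sum(t in top_k_labels(s, k) for s, t in zip(predictions, truths))
    n = len(truths)
    return direct / n, in_top_k / n


# Invented scores for two samples and four candidate species.
predictions = [
    {"Pinus patula": 0.60, "Cupressus lusitanica": 0.30,
     "Cedrela odorata": 0.07, "Carapa guianensis": 0.03},
    {"Pinus patula": 0.50, "Cupressus lusitanica": 0.40,
     "Cedrela odorata": 0.07, "Carapa guianensis": 0.03},
]
truths = ["Pinus patula", "Cupressus lusitanica"]
print(match_rates(predictions, truths))
```

In this toy case the first sample is a direct match and the second is only a top-3 match, so the two rates differ.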

2.4. Deployment

Most works on deep learning for wood identification conclude after evaluating the model on a test dataset and demonstrating its utility for this purpose. Only a few applications, such as Xylorix and Xylotron, have progressed to the deployment phase, with the intention of commercial or academic use, as well as for applications in wood transport control processes (Arévalo et al. 2021; Xylorix Division 2022).

In this case, the goal is to enhance the flexibility of deploying and using the deep learning model, with or without an internet connection. Two deployment schemes have been considered: i) A low-cost and portable microcomputer (Fig. 4a) that can operate with or without internet access. The CNN model can be used sequentially to make classification predictions, and it offers a user-friendly interface on a touchscreen; ii) A chatbot response system (Fig. 3b) that provides versatility and can be accessed from any device with internet connectivity. This system offers the model’s functionalities through the Telegram messaging application and communicates with Amazon Web Services (AWS) cloud services via the HTTP protocol.

In the second deployment scheme, an API Gateway access point continuously listens for incoming requests. Each message it receives triggers an activation signal to an AWS Lambda computing service. This Lambda function is responsible for processing the image, making predictions, and sending a response message to the user through the messaging system interface. Notably, the AWS Lambda compute service is only activated when a request is received and does not store states. A distinct instance is activated for each request, ensuring streamlined parallelization and cost reduction, especially when the request rate is not continuous or predictable.
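The request flow described above can be sketched as a stateless handler. This is only a minimal sketch under several assumptions: the incoming event is assumed to carry the Telegram update as a JSON string under `body`, and `classify_image` and `send_reply` are hypothetical stand-ins for the model call and the outgoing message, returning placeholder values rather than real predictions.

```python
import json


def classify_image(file_id):
    """Hypothetical stand-in for fetching the photo and running the model.

    The returned species and scores are placeholders, not real predictions.
    """
    return [("Cedrela odorata", 0.81), ("Carapa guianensis", 0.12),
            ("Cordia alliodora", 0.04)]


def send_reply(chat_id, text):
    """Hypothetical stand-in for posting the answer back to the chat."""
    return {"chat_id": chat_id, "text": text}


def lambda_handler(event, context):
    """Stateless entry point: each invocation handles one incoming message."""
    update = json.loads(event.get("body") or "{}")
    message = update.get("message", {})
    chat_id = message.get("chat", {}).get("id")
    photos = message.get("photo", [])
    if chat_id is None or not photos:
        # Nothing to classify; acknowledge so the request is not retried.
        return {"statusCode": 200, "body": json.dumps({"handled": False})}

    # A photo message lists several sizes of the same image; use the widest.
    largest = max(photos, key=lambda p: p.get("width", 0))
    matches = classify_image(largest["file_id"])
    text = "\n".join(f"{rank}. {name} ({score:.0%})"
                     for rank, (name, score) in enumerate(matches, 1))
    reply = send_reply(chat_id, text)
    return {"statusCode": 200, "body": json.dumps({"handled": True, "reply": reply})}
```

Because the handler keeps no state between calls, concurrent requests can be served by independent instances, which is the property the text relies on for parallelization and cost control.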

3.1. Image Dataset

In this study, a dataset comprising twenty (20) different Colombian timber species was meticulously curated (Fig. 4). The selection of species was based on the availability of timber biological material in the region, particularly in collection centers, local timber warehouses, and timber harvesting forests in areas such as the Colombian Amazon and Pacific regions. The dataset compilation was carried out with technical assistance from the Cauca Autonomous Regional Corporation, a government entity. The cross-sectional images collected for this dataset (Table 1) totaled 2,619. Most of the species had fewer than 170 images per class, while only two had over 240 images. The dataset can be considered sufficiently balanced for effective application in deep learning. Images with defects, including insufficient focus, lack of cross-section detail and macroscopic characteristics, blurriness, or wood edge artifacts, were systematically discarded (Fig. 5).

The dataset described above served as the foundation for developing several deep learning models based on convolutional neural networks. This process encompasses retraining, validation, and testing. Transfer learning was employed, and the experimental conditions incorporated data augmentation techniques, including random rotation, flipping, transposition, random contrast, resizing, and rescaling. These transformations resulted in an expanded training dataset of over 10,000 images.
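Several of these transformations are illustrated below on a grayscale image stored as a list of pixel rows. This is a simplified sketch, not the authors' pipeline: rotation is restricted to quarter turns, resizing and rescaling are omitted, and the 0.8 to 1.2 contrast range and all function names are assumptions.

```python
import random


def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return [row[:] for row in img[::-1]]


def transpose(img):
    return [list(col) for col in zip(*img)]


def rotate_90(img):
    """Rotate a quarter turn clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_contrast(img, factor):
    """Scale pixel deviations from the mean; results are clipped to 0..255."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [[min(255, max(0, round(mean + factor * (p - mean)))) for p in row]
            for row in img]


def augment(img, rng):
    """Apply a random combination of the transformations above."""
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        img = rotate_90(img)
    if rng.random() < 0.5:
        img = flip_horizontal(img)
    if rng.random() < 0.5:
        img = flip_vertical(img)
    if rng.random() < 0.5:
        img = transpose(img)
    return adjust_contrast(img, rng.uniform(0.8, 1.2))


rng = random.Random(42)
image = [[10, 50, 90], [130, 170, 210]]  # tiny made-up 2 x 3 image
variants = [augment(image, rng) for _ in range(5)]
```

Generating several variants per training image is what expands the training set beyond the number of photographs actually taken.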

3.2. Model Performance

To assess the feasibility of solving the wood identification problem, the model’s performance was evaluated based on the number of parameters per network that require training (Fig. 6). This parameter count is a function of the neural network’s depth, and it directly impacts the network’s size and training duration.

The Resnet-50 and EfficientNet-B0 network architectures were implemented using the TensorFlow package and the Python language. The training and validation trends (Fig. 7) illustrate the convergence and performance, helping to determine their suitability as classification models for the 20 wood species classes.

Furthermore, performance metrics for these CNNs (Table 2) were derived from the confusion matrix (Fig. 8) using the test dataset. Because transfer learning was used for the network modeling, retraining made it possible to fine-tune the hyperparameters: dropout was set to 0.50 in two layers, an additional dense layer with 32 neurons was introduced to apply weight decay as a regularization technique with L1 and L2 set to 1E-4, and training ran for 60 epochs with a batch size of 60 and a learning rate (LR) of 1E-4. Fine-tuning was achieved by freezing certain layers while readjusting the weights of the last layer to enhance learning about wood features. This approach aimed to bring the metrics as close to 1.0 as possible.
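A configuration along these lines might look as follows in Keras. Only the numeric settings come from the text. The input size, the position of the two dropout layers, the fully frozen backbone, the Adam optimizer, and the loss function are assumptions made for the sake of a complete sketch.

```python
HYPERPARAMS = {
    "num_classes": 20,
    "dropout_rate": 0.50,
    "dense_units": 32,
    "l1": 1e-4,
    "l2": 1e-4,
    "epochs": 60,
    "batch_size": 60,
    "learning_rate": 1e-4,
    "input_shape": (224, 224, 3),  # assumption: not stated in the text
}


def build_model(hp=HYPERPARAMS):
    """Build an EfficientNet-B0 transfer-learning classifier (sketch)."""
    # Imported here so the settings can be inspected without TensorFlow installed.
    import tensorflow as tf

    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights="imagenet",
        input_shape=hp["input_shape"], pooling="avg")
    base.trainable = False  # assumption: the whole backbone is frozen at first

    regularizer = tf.keras.regularizers.L1L2(l1=hp["l1"], l2=hp["l2"])
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dropout(hp["dropout_rate"]),
        tf.keras.layers.Dense(hp["dense_units"], activation="relu",
                              kernel_regularizer=regularizer),
        tf.keras.layers.Dropout(hp["dropout_rate"]),
        tf.keras.layers.Dense(hp["num_classes"], activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=hp["learning_rate"]),
        loss="sparse_categorical_crossentropy",  # assumption: integer labels
        metrics=["accuracy"])
    return model
```

Training would then call `model.fit` with `epochs=HYPERPARAMS["epochs"]` and `batch_size=HYPERPARAMS["batch_size"]`; unfreezing selected backbone layers afterwards is one common way to continue fine-tuning.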

Table 2

Performance metrics on improved and optimized CNN models.
Metrics	Resnet-50	EfficientNet-B0
Accuracy	0.9269	0.9502
Recall	0.9235	0.9468
F1-Score	0.9289	0.9491
MCC	0.9232	0.9476
Loss function	0.4598	0.3499

3.3. Deployment

Considering the strong performance of convolutional models for timber species classification using transfer learning, the deployment strategy involved selecting one of the best-performing models. Based on the information from the confusion matrix (Fig. 8) and the performance metrics (Table 2), the EfficientNet-B0 architecture was chosen for deployment.

The implementation of the computational service scheme (Fig. 4) involved converting the trained EfficientNet-B0 model into a lightweight TensorFlow Lite (tflite) version suitable for mobile and portable devices. This tool allows for wood species classification through the messaging platform Telegram, making it accessible as an online service (Fig. 9a). Additionally, an alternative deployment option was developed, featuring a graphical user interface (GUI) within a portable embedded system, which functions effectively offline (Fig. 9b). These deployment choices offer a user-friendly, intuitive, and scalable solution, serving as a valuable support tool for wood inspection and verification processes.

The two deployment mechanisms of the system offer a range of options and buttons to optimize usability. In the chatbot, users can submit an image, which is then preprocessed and sent to the classifier. In addition to image submission, the chatbot features buttons for user assistance, including instructions for use, cutting, and image loading. Furthermore, it provides a list of the species that the classifier can identify (Fig. 9a).

Similarly, the classifier model on the microcomputer is equipped with a graphical user interface (GUI) that facilitates user interaction through three main options: Scan, Identification, and Exit (Fig. 9b). With this setup, users can capture an image of the wood specimen using the scan button. After ensuring that the image is suitable for analysis, users can press the identify button to retrieve potential species matches based on the anatomical features of the timber.

In both systems, the model's top three probable species matches are presented to assist in decision-making and species verification (Fig. 10).

3.3.1. Results of deployment identification

The timber species identification system was put to the test in diverse, uncontrolled environments, including warehouses (cross sections of block specimens), specimens obtained from the forest with the collaboration of the Cauca Regional Corporation (CRC) (cross sections of cylindrical specimens), and images sourced from species within the xylotheque of the University of Cauca (cross sections of block specimens). These evaluations were conducted to assess accuracy rates and to gauge the system's value as a support tool for legal timber inspection.

In this approach, images were collected from various types of specimens based on the available testing locations, and species identification was executed using the proposed IMACA system (in both deployment schemes). Direct matches indicate that the most probable result is the correct species. Indirect matches (species within the top 3) signify that the correct species was among the three most probable outcomes (Table 3).

Table 3

Field results using the IMACA system
Place	Images	Direct match	Species match in the top 3
Timber warehouse 1	12	50%	92%
Timber warehouse 2	10	40%	70%
Nursery (CRC) & Xylotheque	17	56%	94%
CRC forest harvesting	26	61.53%	84.61%

Based on the results obtained in this research, whose aim was to create a deployable and scalable identifier for Colombian forest species that is accessible to its intended users, the following key points are emphasized:

4.1. Datasets

The dataset used in this research, collected through procedures consistent with the control and surveillance of legal forest material, has several noteworthy features:

i) Macroscopic characteristics: It effectively highlights the macroscopic features that differentiate various species. These characteristics contribute to the recognition system, allowing it to distinguish between species commonly traded in the region.

ii) Compatibility with standard tools: The data collection process mirrors the methods used by wood anatomists, ensuring that the features observed by experts are present in the dataset. This compatibility enhances the system’s performance in identifying wood species.

iii) Potential for higher resolution: While the dataset was gathered using standard tools like a digital magnifying glass, higher-resolution devices capable of capturing larger areas could be considered for future improvements. These advanced tools may provide even greater detail and accuracy.

iv) Sensitivity to sample handling: Dataset preparation is sensitive to sample handling and cutting processes. Inadequate cutting may result in flaws in the captured images. Some species are more challenging to cut, affecting the number of images per specimen and per species.

v) Data augmentation: The use of data augmentation techniques, which introduce variations, noise, changes in scale, orientation, and other alterations to the dataset, plays a vital role. This process significantly increases the number of training images (up to 5 times), allowing the computational models to generalize better during training. It also simulates potential variations and challenges encountered in real verification processes under uncontrolled conditions.

4.2. CNN Models

The experimentation results provide valuable insights into the process of timber identification using CNNs and transfer learning. The key takeaways are: i) The study validates that reusing pre-trained models through transfer learning is a sound choice for wood identification. Performance improved significantly when the convolutional layers were also adjusted, which reflects the substantial differences between the source domain (common objects) and the target domain (wood anatomy); ii) The evaluation focused on two CNN architectures, ResNet-50 and EfficientNet-B0. ResNet-50, a well-established model, has been used for timber classification in other countries. EfficientNet-B0, a newer architecture, showed favorable characteristics such as slightly shallower depth, high performance, low computational requirements, and a robust ratio of accuracy to computational cost. Both models exhibited strong generalization capabilities for distinguishing macroscopic anatomical features of wood species through transfer learning; iii) Fine-tuning was performed by freezing specific layers and applying regularization methods such as dropout, weight decay, and early stopping. These steps were crucial to keep the models from overfitting and to reach good convergence during training, which improved accuracy on the validation dataset; iv) The confusion matrix derived from the test dataset was analyzed, focusing on the diagonal (correct matches) and the off-diagonal elements (false negatives and false positives). The performance metrics reported were accuracy and the Matthews correlation coefficient (MCC). EfficientNet-B0 outperformed ResNet-50 in both the number of matches and the performance metrics, thanks to its squeeze-and-excitation (“Squeeze&Excite”) module, which optimizes performance; and v) Some confusion in classification was observed, particularly between Cupressus lusitanica and Pinus patula, due to the smooth cross-sectional image they present, which makes them harder to distinguish from native species. Similar confusion occurred between Handroanthus chrysanthus and Eucalyptus globulus, possibly due to issues with the cutting process or structural similarity. This highlights the need for a greater variety of specimens and images for model training, although obtaining such diverse data remains challenging.
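The two metrics named in item iv) can both be derived from a confusion matrix. The sketch below assumes that rows hold the true classes and columns the predicted classes, and it uses the standard multiclass generalization of the MCC; this section does not state which formulation the authors applied, so that choice is an assumption. The 3 x 3 matrix is invented toy data, not a result from the study.

```python
import math


def accuracy(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    s = sum(sum(row) for row in cm)                           # all samples
    c = sum(cm[k][k] for k in range(n))                       # correct ones
    t = [sum(cm[k]) for k in range(n)]                        # true counts (row sums)
    p = [sum(cm[i][k] for i in range(n)) for k in range(n)]   # predicted counts (column sums)
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt(
        (s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t))
    )
    # The coefficient is undefined when the denominator is zero; 0.0 is a common convention.
    return numerator / denominator if denominator else 0.0


# Toy confusion matrix for three hypothetical classes (illustration only).
toy_cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
toy_accuracy = accuracy(toy_cm)
toy_mcc = multiclass_mcc(toy_cm)
```

Unlike accuracy, the MCC takes the row and column totals into account, so it is less easily inflated when some species have many more test images than others.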

4.3. CNN Deployment

EfficientNet-B0 was selected for deployment because of its superior performance metrics. Implementing this model on computer systems opens up opportunities for its application in less controlled environments. Both implementation methods use the TensorFlow Lite library, so the model operates as a black box that requires only a preprocessed input image and returns a vector of potential matches.

To make the system user-friendly, the sample preparation process mimics the procedures used in the monitoring and control of timber. This process (Fig. 11) involves cutting the wood to expose its transverse surface, lightly moistening it with water, and capturing an image using a digital magnifying glass, ensuring that a minimum area of 2.5 x 1.9 mm is obtained. Users can initiate this process from a mobile device or the IMACA portable system, sending the image through the Telegram messaging service or uploading it from the GUI. Within a short response time (5–8 seconds), the system provides users with the three most likely identifications, which have been previously filtered. This feature greatly assists users with limited knowledge of wood anatomy in verifying the species.
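The last step of this workflow, turning the model's output vector into three suggested species, can be sketched as follows. The filtering rule used here (discarding candidates below a minimum score) is an assumption, since this section does not describe how the options are filtered; the function name, the `min_score` value, and the example scores are likewise hypothetical.

```python
def top_k_matches(scores, labels, k=3, min_score=0.05):
    """Return up to k (label, score) pairs, best first, dropping weak candidates."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [(label, score) for label, score in ranked[:k] if score >= min_score]


# Hypothetical output vector for four of the species named in this paper.
labels = [
    "Cupressus lusitanica",
    "Pinus patula",
    "Handroanthus chrysanthus",
    "Eucalyptus globulus",
]
scores = [0.08, 0.02, 0.60, 0.30]
matches = top_k_matches(scores, labels)
```

Returning the scores together with the names lets the chatbot or GUI show how strongly each option is supported, which is useful when the first and second candidates are close.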

The deployment of the deep learning model in both the service and the portable architectures raises two crucial aspects: first, versatility and simplicity of use, and second, the time and quality of responses to queries. In terms of versatility and simplicity, the system closely emulates the actual processes carried out by competent government agencies in Colombia. This user-friendly approach allows users to upload images to the system easily, whether they are using a cellphone, computer, or tablet. The second aspect concerns response times and the quality of responses per query. In a real-world context, timber may be captured at various drying stages, and the system must demonstrate its ability to generalize to distributions beyond those used during the modeling process. Achieving 100% accuracy is virtually impossible due to various limitations, such as sample availability, preprocessing, and image acquisition. The literature also supports this view, acknowledging the inherent biases and limitations of AI systems in practical applications.

IMACA is positioned as a tool to support identification, offering the most probable species options according to the model. The results indicate that direct matches range from 40% to 61%, with an average of 51.8%, in a non-laboratory environment. This means that there is a reasonable level of accuracy for samples the system has never encountered before, and that the quality of sample preparation is a critical factor affecting efficiency. Factors such as the angle of the digital magnifying glass, image blurring, wood block positioning, light conditions, scalpel sharpness, wood condition, and pore coverage can affect the system's performance. Interestingly, when the first option is not correct, the correct species is often found among the second or third most probable options. This underscores the importance of using an F3-score type metric for evaluating field identification systems. The performance in these scenarios ranges from 70% to 94%, with an average of 85%, contributing significantly to the decision-making process during inspections. Ways to further enhance a timber identification system include building a more extensive dataset, increasing data augmentation, adopting higher-resolution digital magnification devices to capture more detail, and expanding the number of classes without compromising classification performance.
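The difference between the two figures above can be illustrated with a small sketch that compares how often the correct species is the first option with how often it appears anywhere among the first three. It assumes that "direct matches" correspond to top-1 hits and that the second set of figures corresponds to top-3 hits; the exact definition of the F3-score type metric is not given in this section, so this reading is an assumption. The query data are invented placeholders.

```python
def top_k_hit_rate(ranked_predictions, true_labels, k):
    """Share of queries whose true label appears among the first k ranked options."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Four hypothetical field queries, each with the model's ranked options.
ranked_predictions = [
    ["A", "B", "C"],  # correct species ranked first
    ["B", "A", "C"],  # correct species ranked second
    ["C", "B", "A"],  # correct species ranked third
    ["B", "C", "D"],  # correct species not among the options
]
true_labels = ["A", "A", "A", "A"]

top1 = top_k_hit_rate(ranked_predictions, true_labels, k=1)
top3 = top_k_hit_rate(ranked_predictions, true_labels, k=3)
```

In this toy case the top-1 rate is 0.25 and the top-3 rate is 0.75, which mirrors the pattern reported above: an inspector who reviews all three options sees the correct species far more often than one who only reads the first.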

5. CONCLUSIONS

The research demonstrated that high-performance convolutional neural networks (CNNs), particularly the EfficientNet-B0 and ResNet-50 architectures, can be successfully applied to timber identification problems. EfficientNet-B0 achieved an average F3-score of 85% in real-world and adverse conditions, and the two models showed testing dataset accuracies of over 95% and 92%, respectively. These values are influenced by the environment in which the image is acquired, the condition of the wood, the part of the trunk where the capture is performed, and the quality of the crosscut. These findings illustrate that these CNNs can effectively classify 20 Colombian timber species.

The implementation of this lightweight CNN on both cloud and portable embedded computing devices, with the ability to work with or without internet connectivity, represents a significant step in creating a wood identification system that can be widely accessible. The research not only focused on the technical aspects of identifying timber species using AI but also highlighted the practical applications of such a system in real-world scenarios. By developing prototype tools that individuals can easily use without expertise in wood anatomy, this technology can serve as a valuable resource in the fight against illegal timber trafficking. Furthermore, the versatility and scalability of the deployment mechanisms provide a solid foundation for potentially expanding the system to cover a broader range of timber species in the future. This research showcases the transformative potential of artificial intelligence in making complex tasks more accessible and contributing to environmental conservation efforts. It represents an exciting step toward addressing real-world challenges using cutting-edge technology.

ACKNOWLEDGMENTS The authors are grateful for the collaboration of the Cauca Regional Corporation (CRC), which has assisted with tools for the acquisition of the dataset of images of timber species and support for testing the system in a real environment, as well as the University of Cauca for their support during this study, especially the forestry engineering laboratory and xylotheque under the direction of PhD. Jorge Andres Ramirez.

Author contributions FO, RV, and ND were responsible for the design conception, with FO participating in data acquisition. FO, RV, and ND performed data analysis and interpretation. The initial manuscript was written by FO, with FO, RV, and ND contributing to the review and final approval of the content.

Funding The work described herein did not receive funding from any entity whatsoever.

Conflict of interest: The authors declare that they have no conflicts of interest regarding this article.

Amancio, N. L. (2 de 10 de 2020). Los últimos árboles de la Amazonía. (O. Publico, Ed.) Nodal - Noticias de América Latina y el Caribe. Obtenido de https://www.nodal.am/2020/10/los-ultimos-arboles-de-la-amazonia-por-nelly-luna-amancio-ojo-publico/
Arévalo B., R. E., Pulido R., E. N., Solórzano G., J. F., Soares, R., Ruffinatto, F., Ravindran, P., & Wiedenhoeft, A. C. (2021). Imaged based identification of colombian timbers using the xylotron: a proof of concept international partnership. Colombia forestal, 24(1), 5-16. doi:10.14483/2256201X.16700
Barmpoutis, P., Dimitropoulos, K., Barboutis, I., & Grammalidis, N. (2018). Wood species recognition through multidimensional texture analysis. Computers and Electronics in Agriculture, 144, 241-248. doi:10.1016/j.compag.2017.12.011
Barroso, L. R., & Mello, P. P. (2021). In Defense of the Amazon Forest: The Role of Law and Courts. HARVARD INTERNATIONAL LAW JOURNAL, 62, 1. Obtenido de https://ssrn.com/abstract=3830869
Bastin, J.-F., Finegold, Y., Garcia, C. A., Mollicone, D., Rezende, M., Routh, D., . . . Crowther, T. W. (2019). The global tree restoration potential. Science, 365(6448), 76-79. doi:10.1126/science.aax0848
Cano Saenz, D. A., Ordoñez Urbano, C. F., Gaitan Mesa, H. R., & Vargas-Cañas, R. (2022). Tropical Wood Species Recognition: A Dataset of. Data, MDPI, 7(8, 111), 1-7. doi:https://doi.org/10.3390/data7080111
Cano Saenz, D. A., Ordoñez Urbano, C. F., Vargas-Cañas, R., & Gaitán Mesa, H. R. (2019). Implementation of a computational system in order to perform wood species identification using Machine Learning. Ingenieria, tecnologia, automatización: Industria 4.0 y desarrollo CIITA (págs. 193-204). Medellín: CIMTED. Obtenido de http://memoriascimted.com/libros/
Carvalho Jr., E. A., Mendonça, E. N., Martins, A., & Haugaasen, T. (2020). Effects of illegal logging on Amazonian medium and large-sized terrestrial vertebrates. Forest Ecology and Management(118105). doi:10.1016/j.foreco.2020.118105
Chicco, D., Starovoitov, V., & Jurman, G. (2021). The Benefits of the Matthews Correlation Coefficient (MCC) Over the Diagnostic Odds Ratio (DOR) in Binary Classification Assessment. IEEE Access, 9, 47112-47124. doi:10.1109/ACCESS.2021.3068614
CITES. (10 de 07 de 2022). CITES - Convention on International Trade in Endangered Species of Wild Fauna and Flora. Obtenido de https://cites.org/eng
De Andrade, B. G., Basso, V. M., & De Figueiredo Latorraca, J. V. (2020). Machine vision for field-level wood identification. IAWA Journal, 41(4), 681–698. doi:10.1163/22941932-bja10001
FAO & UNEP, ONU. (2020). The State of the World’s Forests 2020: Forests, biodiversity and people. Roma, Italia: FAO and UNEP. doi:https://doi.org/10.4060/ca8642en
Ferreira, C. A., Inga Guillen, J. G., Buendia, R. H., Alanya Vidal, O. D., Reyes Aliaga, D. C., Centeno, W. G., . . . Tomazello Filho, M. (2023). Identification of 20 species from the Peruvian Amazon tropical forest by the wood macroscopic features. CERNE, 29, 1-14. doi:10.1590/01047760202329013134
Figueroa-Mata, G., Mata-Montero, E., Valverde-Otárola, J. C., & Arias-Aguilar, D. (2018). Automated Image-based Identification of Forest Species: Challenges and Opportunities for 21st Century Xylotheques. 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), (págs. 1-8). San Carlos, Costa Rica. doi:10.1109/IWOBI.2018.8464206.
Figueroa-Mata, G., Mata-Montero, E., Valverde-Otárola, J., Arias-Aguilar, D., & Zamora-Villalobos, N. (2022). Using Deep Learning to Identify Costa Rican Native Tree Species From Wood Cut Images. Frontiers in Plant Science, 13(789227). doi:10.3389/fpls.2022.789227
Gonçalves Dos Santos, C. F., & João, P. P. (2022). Avoiding Overfitting: A Survey on Regularization Methods for Convolutional Neural Networks. ACM Computing Surveys, 54(10), 1-25. doi:10.1145/3510413
GTTN. (17 de mayo de 2019). https://globaltimbertrackingnetwork.org/. Obtenido de https://globaltimbertrackingnetwork.org/2019/05/17/why-wood-anatomists-are-more-in-demand-than-ever/
Hwang, S.-W., & Sugiyama, J. (2021). Computer vision-based wood identification and its expansion and contribution potentials in wood science: A review. Plant Methods, 17(47). doi:10.1186/s13007-021-00746-1
Jiao, L., Lu, Y., He, T., Guo, J., & Yin, Y. (2020). DNA barcoding for wood identification: global review of the last decade and future perspective. IAWA, 41(4), 620-643. doi:10.1163/22941932-bja10041
Kattenborn, T., Leitloff, J., Schiefer, F., & Hinz, S. (2021). Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 173, 24-49. doi:10.1016/j.isprsjprs.2020.12.010
Kırbaş, İ., & Çifci, A. (2022). An effective and fast solution for classification of wood species: A deep transfer learning approach. Ecological Informatics, 69(101633). doi:10.1016/j.ecoinf.2022.101633
Koch, G., Heinz, I., Schmitt, U., & Richter, H.-G. (2018). Wood anatomy - the role of macroscopic and microscopic wood identification against illegal logging. 8th Hardwood Conference - New Aspects of hardwood utilization - From science to Technology, 8, pág. 10. Sopron, Hungary. doi:10.13140/RG.2.2.34178.32963
Le, T. (2019). CITES as global governance: Paths to consensus and defining nature through uncertainty. Journal of International Wildlife Law & Policy, 22(2), 115-144. doi:10.1080/13880292.2019.1629176
Macaulay, M. O., & Shafiee, M. (2022). Machine learning techniques for robotic and autonomous inspection of mechanical systems and civil infrastructure. Autonomous Intelligent Systems, 2(8). doi:10.1007/s43684-022-00025-3
Ministerio de Ambiente. (27 de 09 de 2022). COVIMA. Obtenido de https://archivo.minambiente.gov.co/index.php/temas-bosques-biodiversidad-y-servicios-ecosistemicos/4546-app-covima.
Moreno Diaz-Alejo, L. (2020). Análisis comparativo de arquitecturas de redes neuronales para la clasificación de imágenes. Universidad Internacional de la Rioja (UNIR), Madrid.
Peery, R., Cullingham, C. I., Coltman, D. W., & Cooke, J. E. (2022). Traceability of provenance-collected lodgepole pine in a reforestation chain of custody case study. Tree Genetics & Genomes, 18(5). doi:10.1007/s11295-022-01568-5
Ramos C., E. (2023). Identificación de las ventajas del aplicativo móvil Maderapp y su contribución a la transparencia en el control forestal de madera en la selva central-Junín. Lima: Universidad Continental. Obtenido de https://hdl.handle.net/20.500.12394/12600
Ravindran, P., & Wiedenhoeft, A. C. (2022). Caveat emptor: On the Need for Baseline Quality Standards in Computer Vision Wood Identification. Forest, 13(4), 632. doi:10.3390/f13040632
Ravindran, P., Ebanyenle, E., Ebeheakey, A. A., Abban, K. B., Lambog, O., Soares, R., . . . Wiedenhoeft, A. (2019). Image Based Identification of Ghanaian Timbers Using the XyloTron: Opportunities, Risks and Challenges. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Vancouver, Canada. doi:arXiv:1912.00296
Ravindran, P., Owens, F. C., Wade, A. C., Vega, P., Montenegro, R., Shmulsky, R., & Wiedenhoeft, A. C. (2021). Field-Deployable Computer Vision Wood Identification of Peruvian Timbers. Frontiers in Plant Science, 12(647515). doi:10.3389/fpls.2021.647515
Ravindran, P., Thompson, B. J., Soares, R. K., & Wiedenhoeft, A. C. (2020). The XyloTron: Flexible, Open-Source, Image-Based Macroscopic Field Identification of Wood Products. Frontiers in Plant Science, 11. doi:10.3389/fpls.2020.01015
Ruffinatto, F., Castro, G., Cremonini, C., Crivellaro, A., & Zanuttini, R. (2020). A new atlas and macroscopic wood identification software package for Italian timber species. IAWA Journal, 41(4), 393–411. doi:10.1163/22941932-00002102
Schröer, C., Kruse, F., & Gómez, J. M. (2021). A systematic literature review on applying CRISP-DM process model. Procedia Computer Science, 181, 526-534. doi:10.1016/j.procs.2021.01.199
Silva, J. L., Bordalo, R., Pissarra, J., & de Palacios, P. (2022). Computer Vision-Based Wood Identification: A Review. Forest, 13(12), 2041. doi:10.3390/f13122041
Souza, D. V., Santos, J. X., Vieira, H. C., Naide, T. L., Nisgoski, S., & Oliveira, L. E. (2020). An automatic recognition system of Brazilian flora species based on textural features of macroscopic images of wood. Wood Science and Technology, 54, 1065–1090. doi:10.1007/s00226-020-01196-z
Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. International conference on machine learning, 2019 PMLR (págs. 6105-6114). ArXiv. doi:10.48550/arXiv.1905.11946
Tang, X. J., & Tay, Y. H. (2019). Xylorix: An AI-as-a-Service platform for wood identification. IAWA-IUFRO International Symposium. Hong Kong, China: IAWA.
Tang, X. J., Tay, Y. H., Siam, N. A., & Lim, S. C. (2018). MyWood-ID: Automated Macroscopic Wood Identification System using Smartphone and macro-lens. Proceedings of the 2018 International Conference on Computational Intelligence and Intelligent Systems CIIS 2018, (págs. 37-43). Phuket, Thailand. doi:10.1145/3293475.3293493
Thompson, S. T., & Magrath, W. B. (Julio de 2021). Preventing illegal logging. Forest Policy and Economics, 128(102479). doi:10.1016/j.forpol.2021.102479
Universidad Nacional, Sede Medellin. (5 de 10 de 2022). Laboratorio de Productos Forestales “Héctor Anaya López”. Obtenido de Xiloteca MEDELw: https://cienciasagrarias.medellin.unal.edu.co/museos/xiloteca/
Universidad Politécnica de Madrid, Universidad de Granada, Asociación Española del Comercio e Industria de la Madera. (2023). (AEIM) IMAI App. Obtenido de https://monumai.ugr.es/goimai/
Verly Lopes, D. J., Burgreen, G. W., & Entsminger, E. D. (2020). North American Hardwoods Identification Using Machine-Learning. Forest, 11(3), 1-9. doi:10.3390/f11030298
Wiedenhoeft, A. C., Simeone, J., Smith, A., Parker-Forney, M., Soares, R., & Fishman, A. (2019). Fraud and misrepresentation in retail forest products exceeds U.S. forensic wood science capacity. PLoS ONE, 14(7). doi:10.1371/journal.pone.0219917
Xu, Q., Zhang, M., Gu, Z., & Pan, G. (2019). Overfitting remedy by sparsifying regularization on fully-connected layers of CNNs. Neurocomputing, 328, 69-74. doi:10.1016/j.neucom.2018.03.080
Xylorix Division, Agritix, Malaysia. (05 de 10 de 2022). https://www.xylorix.com/.
Yin, Y., Wiedenhoeft, A. C., & Donaldson, L. (2020). Advancing Wood Identification – Anatomical and Molecular Techniques. IAWA Journal, 41(4), 391-392. doi:10.1163/22941932-00002150
