A COMPARATIVE ANALYSIS OF CONVOLUTIONAL NEURAL NETWORKS APPROACHES FOR PHYTOPARASITIC NEMATODE IDENTIFICATION



INTRODUCTION
Phytoparasitic nematodes are harmful nematodes commonly found in plants. Around 4,100 phytoparasitic nematode species have been identified [1], each impacting its host plants differently.
Phytoparasitic nematodes cause enormous crop losses, projected at around 157 billion USD globally [2]. The current identification process relies on the classical method, in which a nematologist visually inspects the morphological characteristics of microscopic nematode images. Supplementary techniques are based on molecular analysis, such as fingerprinting, DNA sequencing, and protein analysis [3]. Conventional methods are challenging because the morphological characteristics of different species are similar. Identification is also a long process, prone to error, and the number of nematode experts is declining. Therefore, an accurate alternative method to assist phytoparasitic nematode identification is necessary for proper pest control and management.
Image-based methods have been shown to yield notable results in nematode identification. Advanced image-based nematode identification was pioneered by [4], whose research applied image processing techniques such as filtering, segmentation, and morphological operations for feature extraction, and an RBF neural network for classification. The ability of a skeleton-based technique to detect and separate C. elegans nematodes in population images was demonstrated by [5]. The resulting vision algorithm achieved a false rejection rate (FRR) of 7.9% and a false acceptance rate (FAR) of 8.4%. An application for automated measurement of nematode size and shape, called WormSizer, was introduced by [6]. The application implemented several image processing techniques, such as a global thresholding algorithm, image segmentation, and skeletonizing, to measure a nematode's width, length, and volume. An image-based algorithm was developed by [7] to acquire physical attributes of Meloidogyne type II specimens. The proposed algorithm, which applied several image processing methods, such as illumination correction, binarization, and segmentation, achieved error rates of 15% and 11% for length and width measurement, respectively.
The breakthroughs in machine learning and deep learning make it possible to apply these methods to nematode identification. They are well suited to processing extensive data and recognizing small, varied objects in challenging settings, such as microscopic nematode images. A deep convolutional selective encoder architecture was developed [8] to identify and count soybean cyst nematodes (SCN) in cluttered field images. The proposed algorithm is comparable to expert results, achieving 92% and 95% accuracy for SCN in less cluttered and highly cluttered images, respectively. Research by [9] proposed a new architecture combining DenseNet121 and Inception blocks for phytonematode identification. The performance was outstanding, reaching an accuracy of 98.99% for the model implemented with the transfer learning method. The feasibility of the Xception model, a Convolutional Neural Network (CNN)-based method, for classifying entomopathogenic nematodes (EPN) was studied in [10]. The model achieved an average validation accuracy of 69.45% for the adult nematode dataset and 88.28% for the juvenile nematode dataset. A deep learning-based application was developed by [11] for soil nematode identification. A ResNet-101 model deployed in a web-based system correctly identified 60% of nematode genera, 76% of c-p values, and 76% of feeding types on the I-nema dataset [12].
Faster region-based convolutional neural networks were implemented by [13] to detect nonparasitic and plant-parasitic nematodes in microscopic images. The method achieved an accuracy of up to 87.5%.
Deep learning models were recently implemented for identifying plant-parasitic nematodes using a self-collected dataset of nematodes commonly found in Indonesia [14]. That study explored the effect of optimization techniques and augmentation methods on the performance of four different deep learning models. The results show that most augmentation methods negatively impacted model performance. In terms of optimization, it was found that an easy-to-use optimizer, with its parameters fine-tuned to the problem at hand, produced the best performance.
To extend the previous research, this study investigates 15 popular and well-known CNN-based models, namely CoAtNet-0, DenseNet121, DenseNet169, DenseNet201, EfficientNetV2B0, EfficientNetV2B3, EfficientNetV2L, EfficientNetV2M, EfficientNetV2S, InceptionResNetV2, InceptionV3, ResNet101V2, ResNet50V2, VGG19, and Xception. The dataset used in this research is an improved version of that of the prior study [14], with additional images in several classes. The deep learning models were trained via transfer learning with the same optimizer, namely SGD. Fine-tuning was applied to match the dataset characteristics and the task to be solved. Model performance was then compared using several evaluation metrics, namely test accuracy, mean class accuracy, F1 score, precision, and recall, to obtain the best CNN approach for phytoparasitic nematode identification. Figure 1 presents the general workflow of this study. Initially, the phytoparasitic nematode dataset was collected in Indonesian agricultural areas and classified by an expert nematologist before further processing. The data pre-processing includes several image pre-processing techniques, such as edge detection, cropping, and conversion to grayscale. The selected CNN models were then built and configured with several hyperparameter settings. The pre-processing results are used as the input for training the CNN models. The results are compared using the evaluation metrics above to find the best CNN models.

Datasets.
The images were captured using an optical system linked to a laptop and a microscope, following the acquisition procedure presented in Figure 2. Initially, soil samples containing phytoparasitic nematodes were collected from diseased Indonesian agricultural plants. The nematodes were then extracted from the soil using a modified Whitehead tray [15]. The specimens were then prepared for morphological assessment using an Olympus CX31 light microscope with a magnification range of 40x to 1000x, with sample results presented in Figure 3. We added fifty-nine new images to the dataset of the prior study [14], resulting in the class distribution presented in Table 1. The dataset consists of 1016 plant-parasitic nematode images divided into 11 classes.
As can be seen in Table 1, the number of images per class is unbalanced. However, this reflects the real-world constraints faced by nematologists, as some nematode genera are scarce in agricultural areas.

CNN Implementations.
The workflow for training the CNN models on the phytoparasitic nematode dataset is presented in Figure 4. The data acquisition results from the procedure in Figure 2 were used as the input images for the CNN architectures. The images were then pre-processed using the same methods as in previous works [12][13][14]. Edge detection was applied to find the boundary of the nematode object before cropping and conversion to grayscale. The settings used for each model are summarized in Table 2. Moreover, zero momentum was applied for the SGD optimizer when training the CNN models.
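As an illustration of this pre-processing stage, the following sketch uses OpenCV; the Canny detector, its thresholds, and the padded bounding-box crop are assumptions made for illustration, since the text only specifies edge detection, cropping, and grayscale conversion.

```python
import cv2
import numpy as np

def preprocess_nematode_image(path, margin=10):
    """Illustrative pre-processing: grayscale conversion, edge detection to
    locate the specimen boundary, and cropping around it. The Canny detector
    and its thresholds are placeholders; the exact operators are not stated."""
    image = cv2.imread(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Edge detection to find the nematode boundary (placeholder thresholds).
    edges = cv2.Canny(gray, 50, 150)

    ys, xs = np.nonzero(edges)
    if ys.size == 0:          # no edges found: fall back to the full grayscale image
        return gray

    # Crop to a padded bounding box around the detected edges.
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin, gray.shape[0])
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin, gray.shape[1])
    return gray[y0:y1, x0:x1]
```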
Hyperparameters are crucial when training convolutional neural networks since they directly influence the behavior of the training algorithm and significantly impact model performance [20].
For consistency, each model was trained with the same hyperparameter values for batch size, activation layer, loss function, and number of epochs. A batch size of 32 was selected for the CNN models due to its improved generalization performance and lower memory usage [21]. However, because of memory limitations, the batch size for CoAtNet-0 was set to 16. For the multi-class classification problem, a sparse categorical cross-entropy loss function was applied, and a SoftMax activation was used in the final dense layer. The number of training epochs was set to 100, with an early stopping callback to end the training process when a certain condition is reached and to avoid overfitting. A LearningRateScheduler callback was also used; it takes a step decay function as an argument and returns the updated learning rate, decayed by a factor of 0.97 from the initial learning rate value.
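The training configuration described above could be expressed in Keras roughly as follows; the initial learning rate, the early-stopping patience, and the per-epoch reading of the 0.97 step decay are assumptions, while the batch size, epoch count, loss function, and zero-momentum SGD follow the text.

```python
import tensorflow as tf

INITIAL_LR = 1e-3   # assumption: the initial learning rate is not restated here
BATCH_SIZE = 32     # 16 was used for CoAtNet-0 because of memory limits
EPOCHS = 100

def step_decay(epoch, lr):
    # One plausible reading of the schedule: decay the learning rate by a
    # factor of 0.97 per epoch, starting from the initial value.
    return INITIAL_LR * (0.97 ** epoch)

callbacks = [
    # Stop training when validation loss stops improving (patience is an assumption).
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                     restore_best_weights=True),
    tf.keras.callbacks.LearningRateScheduler(step_decay),
]

# SGD with zero momentum and sparse categorical cross-entropy, as stated above.
optimizer = tf.keras.optimizers.SGD(learning_rate=INITIAL_LR, momentum=0.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy()
```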
Large datasets are ideal for training convolutional neural network models [22]. However, the total amount of data in this study is considered small. Transfer learning is frequently used to reduce overfitting caused by smaller datasets [23]. This method transfers knowledge from a model trained on the ImageNet dataset to the new phytoparasitic nematode dataset. This study also avoided training from scratch due to the computational expense of building the architectures. When implementing transfer learning, the final fully connected layer was removed, replaced with a layer with 11 output nodes, and the weights were retrained to better categorize phytoparasitic nematodes.
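A minimal Keras sketch of this head replacement is shown below; DenseNet121 is used only as an example backbone, and the 224x224 input size is an assumption. Only the 11-node softmax output layer is taken directly from the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, applications

NUM_CLASSES = 11
IMG_SHAPE = (224, 224, 3)   # assumption: the input resolution is not restated here

# Load an ImageNet-pre-trained backbone with its original classifier removed.
base = applications.DenseNet121(weights="imagenet", include_top=False,
                                input_shape=IMG_SHAPE, pooling="avg")

# Replace the removed fully connected layer with an 11-way softmax head.
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
model = models.Model(inputs=base.input, outputs=outputs)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.0),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)
```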
This study investigates 15 popular pre-trained deep learning models, trained with ImageNet [25] weights and largely provided by Keras [24]. The models were selected based on the computational limitations and on their image classification performance on the ImageNet benchmark [24][25][26]. The selected models are CoAtNet-0, DenseNet121, DenseNet169, DenseNet201, EfficientNetV2B0, EfficientNetV2B3, EfficientNetV2L, EfficientNetV2M, EfficientNetV2S, InceptionResNetV2, InceptionV3, ResNet101V2, ResNet50V2, VGG19, and Xception. All models share a convolutional base architecture. The number of parameters (an internal value related to the model architecture) and the main backbone of each convolutional neural network implemented in this study are summarized in Table 2 and Table 3, respectively.
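For reference, 14 of these backbones can be instantiated directly from keras.applications, as sketched below; CoAtNet-0 is not part of keras.applications and is assumed to come from a separate implementation, so it is omitted from this sketch.

```python
from tensorflow.keras import applications

BACKBONES = {
    "DenseNet121": applications.DenseNet121,
    "DenseNet169": applications.DenseNet169,
    "DenseNet201": applications.DenseNet201,
    "EfficientNetV2B0": applications.EfficientNetV2B0,
    "EfficientNetV2B3": applications.EfficientNetV2B3,
    "EfficientNetV2L": applications.EfficientNetV2L,
    "EfficientNetV2M": applications.EfficientNetV2M,
    "EfficientNetV2S": applications.EfficientNetV2S,
    "InceptionResNetV2": applications.InceptionResNetV2,
    "InceptionV3": applications.InceptionV3,
    "ResNet101V2": applications.ResNet101V2,
    "ResNet50V2": applications.ResNet50V2,
    "VGG19": applications.VGG19,
    "Xception": applications.Xception,
}

for name, constructor in BACKBONES.items():
    base = constructor(weights="imagenet", include_top=False, pooling="avg")
    print(name, base.count_params())   # e.g. inspect backbone parameter counts
```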
All models were then trained on Google Colab Pro with an NVIDIA P100 or T4 GPU, a Xeon CPU at 2.3 GHz, and up to 25 GB of memory, depending on availability.

CoAtNet. The CoAtNet model was developed by [27] by combining the strengths of convolution and transformer networks. This method has better generalization, a larger capacity, faster convergence, and improved efficiency. CoAtNet-0 consists of three convolutional stages and two transformer (attention) stages, as seen in the architecture presented in Figure 5. Global pooling is implemented in this model as the pooling layer before the final fully connected layer.

DenseNet. Dense Convolutional Networks (DenseNet) connect each layer to every other layer in a feed-forward manner [28]. This neural network architecture offers several significant advantages, including alleviating the vanishing-gradient problem, strengthening feature propagation, encouraging feature reuse, and substantially reducing the number of parameters. The network is divided into multiple densely connected blocks, called dense blocks, separated by downsampling layers that change the feature map size. The layer configuration within the dense blocks differentiates the versions of this model. The configuration of DenseNet121 is presented in Figure 6, while those of DenseNet169 and DenseNet201 are presented in Figure 7 and Figure 8, respectively.
Those three DenseNet models are implemented in this study.
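A minimal sketch of the dense connectivity pattern is given below; the layer count, growth rate, and the omission of the 1x1 bottleneck are simplifications for illustration, not the exact DenseNet configuration.

```python
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """Sketch of DenseNet-style connectivity: each layer receives the
    concatenation of all preceding feature maps (values are illustrative)."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        x = layers.Concatenate()([x, y])   # feature reuse: pass everything forward
    return x
```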

EfficientNet. The EfficientNet model improves performance over other architectures by using a smaller model with a faster convergence speed. The core of this model is a new scaling technique with a simple compound coefficient that scales the model's width, depth, and resolution to increase model capacity [29]. A new baseline model utilizing MBConv blocks is built using Neural Architecture Search (NAS) and then scaled with the compound coefficient to build EfficientNet [30]. The network's core MBConv and Fused-MBConv building blocks increase training efficiency while reducing model size.
The newer and enhanced models, namely EfficientNetV2, with certain training techniques, can reach a convergence rate 5 to 11 times faster than existing cutting-edge models while being up to six times smaller [31]. The EfficientNetV2B family has four models, from the B0 to the B3 version. The architectures of EfficientNetV2B0 and EfficientNetV2B3 implemented in this study are presented in Figure 9 and Figure 10, respectively. The small version of EfficientNetV2, namely EfficientNetV2S, was also utilized in this study. EfficientNetV2S is scaled up to generate EfficientNetV2M (Medium) and EfficientNetV2L (Large) with a few additional optimizations, namely limiting the image size and gradually adding more layers at later scales. The architectures of EfficientNetV2L, EfficientNetV2M, and EfficientNetV2S are presented in Figure 11, Figure 12, and Figure 13, respectively.
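For illustration, the compound scaling idea can be written as follows; the alpha, beta, and gamma constants are the ones reported for the original EfficientNet-B0 and are not claimed to be the values used in this study.

```python
# Compound scaling: depth, width, and input resolution are scaled jointly by a
# single coefficient phi. The constants below are the original EfficientNet-B0
# values (chosen so that alpha * beta**2 * gamma**2 is roughly 2); they are
# illustrative here.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    depth_mult = ALPHA ** phi        # number of layers
    width_mult = BETA ** phi         # channels per layer
    resolution_mult = GAMMA ** phi   # input image size
    return depth_mult, width_mult, resolution_mult

print(compound_scale(1))   # one scaling step: (1.2, 1.1, 1.15)
```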

Inception.
The core of the Inception model is scaling up networks. The model uses aggressive regularization and factorized convolutions to improve processing efficiency [32]. In this study, InceptionV3 and InceptionResNetV2 were used. The configuration of InceptionV3, presented in Figure 14, is retained; it improves the network by employing factorized 7x7 convolutions and BatchNorm in the auxiliary classifier. To prevent the network from overfitting, a regularizing component has been added to the loss formula [32]. The InceptionResNetV2 model [33] is a hybrid Inception architecture incorporating elements of ResNet. This model uses residual connections in place of the Inception filter concatenation stage, allowing Inception to retain its benefits while maintaining processing efficiency. The InceptionResNetV2 architecture implemented in this research is given in Figure 15.
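A small sketch of the factorization trick, assuming an arbitrary filter count, is shown below.

```python
from tensorflow.keras import layers

def factorized_7x7(x, filters=192):
    """A 7x7 convolution factorized into a 1x7 followed by a 7x1 convolution,
    the trick InceptionV3 uses to cut computation; the filter count is a
    placeholder chosen for illustration."""
    x = layers.Conv2D(filters, (1, 7), padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, (7, 1), padding="same", activation="relu")(x)
    return x
```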

ResNet.
The Residual Network (ResNet) architecture was built to train models far deeper than VGG, which would otherwise be challenging to train [34]. ResNet's residual learning framework simplifies the training procedure: by adding skip connections that act as shortcuts, the vanishing-gradient problem of earlier deep models is alleviated. The two core ResNet blocks are the identity and convolutional blocks, which can be stacked to produce deep residual networks. The most recent iteration of this model, known as ResNetV2, enhanced the prior version by implementing identity mapping on the skip connections, improving information propagation through each residual block [35]. The variants of version two differ in the number of layers. The ResNet101V2 and ResNet50V2 architectures implemented in this research are presented in Figure 16 and Figure 17, respectively; a minimal sketch of a ResNetV2-style block is given after the following paragraph.

VGG. The VGG19 model implemented in this study consists of sixteen convolutional layers and three fully connected layers, as seen in Figure 18.
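Returning to the residual blocks of the ResNet description above, a minimal Keras sketch of a ResNetV2-style pre-activation identity block follows; the filter count and kernel sizes are illustrative only.

```python
from tensorflow.keras import layers

def identity_block_v2(x, filters=64):
    """Sketch of a ResNetV2-style pre-activation identity block. The input is
    assumed to already have `filters` channels so the addition is shape-compatible."""
    shortcut = x
    y = layers.BatchNormalization()(x)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Add()([shortcut, y])   # identity mapping on the skip connection
```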

Xception. The Xception model is inspired by the Inception model but replaces the Inception modules with depthwise separable convolutions [37]. The model has approximately the same number of parameters as Inception but performs better on ImageNet classification. The architecture of Xception utilized in this research is presented in Figure 19.
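A minimal sketch of a depthwise separable convolution unit, with an illustrative filter count, is shown below.

```python
from tensorflow.keras import layers

def separable_unit(x, filters=128):
    """Depthwise separable convolution, the building block Xception uses in
    place of Inception modules: a per-channel spatial convolution followed by
    a 1x1 pointwise convolution. The filter count is a placeholder."""
    x = layers.SeparableConv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.Activation("relu")(x)
```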

RESULTS AND DISCUSSION
This study compared the pre-trained CNN models based on test accuracy, mean class accuracy, F1 score, precision, and recall, as tabulated in
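For reference, these metrics can be computed from the test-set predictions as sketched below with scikit-learn; macro averaging and the use of balanced accuracy as the mean class accuracy are assumptions, since the averaging scheme is not restated here.

```python
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, precision_score, recall_score)

def evaluate(y_true, y_pred):
    """y_true and y_pred are integer class indices (e.g. argmax of the softmax
    output). Macro averaging is assumed because the 11 classes are unbalanced."""
    return {
        "test_accuracy": accuracy_score(y_true, y_pred),
        # balanced accuracy = per-class recall averaged, i.e. mean class accuracy
        "mean_class_accuracy": balanced_accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }
```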

ACKNOWLEDGEMENT
We would like to thank Universitas Multimedia Nusantara for the support given during this study.

SOURCE OF FUNDING
This study was funded by Universitas Multimedia Nusantara with Grant Number 062/PI/LPPM-UMN/III/2022.