Automatic adaptive weighted fusion of features-based approach for plant disease identification

Abstract: With the rapid expansion in plant disease detection, there has been a progressive increase in the demand for more accurate systems. In this work, we propose a new method combining color information, edge information, and textural information to identify diseases in 14 different plants. A novel 3-branch architecture is proposed, containing a color information branch, an edge information branch, and a textural information branch that extracts textural information with the help of a central difference convolution network (CDCN). ResNet-18 was chosen as the base architecture of the deep neural network (DNN). Unlike traditional DNNs, the weights adjust automatically during the training phase and provide the best of all the ratios. Experiments were performed to determine the individual and combinational features' contribution to the classification process. Experimental results on the PlantVillage database with 38 classes show that the proposed method achieves higher accuracy, i.e., 99.23%, than the existing feature fusion methods for plant disease identification.


Introduction
The world is facing various threats to food security, whether from massive growth in the global population, severe weather caused by climate change, or the risk of sudden upsurges in severe crop disease epidemics. Unpredictable extreme weather conditions alter temperatures, eventually leading to various pathogen infections in crops. These infections affect the yield and quality of the crops [1]. Such conditions can lead to food shortage, human starvation [2,3], social instability, and substantial economic disruption. The production loss faced by four Indian districts, namely Nellore (92,000-105,000 tons), West Godavari (30,000-36,000 tons), Karnal (46,000 tons), and Rangareddy (22,000 tons), was estimated using the Cramer method, which showed a conservative loss proportion of 3-16% [4]. The epidemic caused by black pod disease in cocoa beans in Ghana in 2012 destroyed about 25% of the annual yield: out of an annual yield of 850,000 metric tons, 212,500 metric tons were lost to the disease, and the revenue loss was estimated at 7.5 million Ghanaian cedi [5]. Two diseases, target leaf spot and B. tabaci infection, caused a 44-48.66% loss in soybean yield, costing 20 USD per hectare of agricultural fields [6].
Crop protection methods can help prevent these disease epidemics. The techniques used in crop protection can detect the early onset of diseases so that preventive actions or treatments are confined to the affected region, minimizing the quantity of products applied before the appearance of visible symptoms [7]. The proposed approach is based on computer vision methods for identifying plant diseases. Computer vision is a branch of computer science that collects data from images, analyzes patterns, and produces predictions; plant disease identification systems can therefore be automated by integrating computer vision. Computer vision works on two main concepts: feature extraction and classification [8][9][10][11][12]. There are many challenges in the field of plant disease identification, for example, choosing the features that provide the best classification accuracy, which is possible using different machine learning techniques [13][14][15][16][17]. An advanced version of these machine learning techniques has been introduced in the world of computer vision, i.e., deep neural networks (DNNs). DNNs extract significant features without human intervention and can classify large amounts of data in just a few hours [17][18][19].
Several open problems remain in plant disease identification. One is choosing the best features according to their contribution to classification accuracy. Another is that the weights in DNNs update at random, which might reduce the accuracy of the system. Furthermore, state-of-the-art techniques cannot report the individual contribution of each feature to correct identification.
The main contribution of this work is an analysis of the contribution of individual features to the correct classification of the images. The rest of this article is organized as follows. Section 2 reviews the state-of-the-art feature-fusion techniques. Section 3 describes the novel proposed feature-fusion-based technique. The dataset is described in Section 4. Section 5 details the experimental setup and illustrates the results. Section 6 contains the visualization of the results, Section 7 the analysis of the proposed approach, and Section 8 the conclusion.

Related works
The task of feature extraction plays the most crucial role in image classification. An image can contain thousands of features which can be helpful, but it is a cumbersome task to identify which features to use. The features can be categorized into two types: Local features and global features. Global features describe the whole image with the help of one vector. On the other hand, the local features are manually extracted from the image and are represented by many small vectors [20,21]. Standard local descriptive features are the edge features and the texture features.
Color features mainly include the color moment features (e.g., mean value, standard deviation, and skewness), color histogram features, and average RGB features [22]. Color features can help distinguish between diseased and unaffected leaf areas. Various edge extraction techniques have proven good at extracting the veins, affected areas, and dead parts of the leaves. The Canny edge operator is one of the most common edge descriptors [23]; others include the Sobel and Prewitt edge detectors. Texture features capture the repetition of patterns in the images, which can facilitate the extraction of regions affected by disease. Texture features include contrast, entropy, RMS, energy, kurtosis, correlation, variance, the fifth and sixth central moments, smoothness, mean value, and standard deviation [24]. Other texture-based plant identification methods include local binary patterns, local edge pattern histograms, and complete local binary patterns [25,26].
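To make these handcrafted descriptors concrete, the sketch below computes the three color moments per channel, a Sobel gradient-magnitude map, and a few of the listed texture statistics using OpenCV and NumPy. It is a minimal illustration of the feature families named above, not the extraction pipeline of any of the cited works; the function names are ours.

```python
import cv2
import numpy as np

def color_moments(image_bgr):
    """First three color moments (mean, std, skewness) per channel."""
    feats = []
    for ch in cv2.split(image_bgr.astype(np.float64)):
        mean, std = ch.mean(), ch.std()
        skew = np.cbrt(((ch - mean) ** 3).mean())  # signed cube root of the third moment
        feats.extend([mean, std, skew])
    return np.array(feats)

def sobel_edges(image_bgr):
    """Gradient magnitude from Sobel x/y responses on the grayscale image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    return np.sqrt(gx ** 2 + gy ** 2)

def texture_stats(gray):
    """Simple global texture statistics: RMS, entropy, smoothness."""
    g = gray.astype(np.float64) / 255.0
    hist, _ = np.histogram(g, bins=256, range=(0, 1), density=True)
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    rms = np.sqrt((g ** 2).mean())
    smoothness = 1 - 1 / (1 + g.var())
    return np.array([rms, entropy, smoothness])
```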
Various researchers have used different combinational fusions of features with neural networks for different tasks. Khan et al. developed a multi-scale feature-fusion-based breast cancer detection technique. Deep features extracted from DenseNet-201, NasNetMobile, and VGG16 CNN pre-trained models were concatenated and passed into the classifier after performing numerous data augmentation techniques. Transfer learning and fine-tuning were integrated to facilitate the performance of the system. The system could achieve 98% accuracy with two breast cancer datasets [27]. A fusion of handcrafted features and DNNs was developed to detect melanoma and nevus skin lesions. CIE L*a*b* model was used to extract statistical color features from the region of interest (ROI). Area, perimeter, circularity, diameter, and eccentricity were used as shape features of the ROI. The texture was extracted using Gray-level co-occurrence matrix statistical features. The combination of handcrafted features and deep features was then passed into three types of classifiers: Linear regression, support vector machine (SVM), and relevance vector machine. The system could achieve an accuracy of 92.40% [28].
A system was developed to detect brain tumors using a feature-fusion-based approach. The fast nonlocal mean and the Otsu methods were used to preprocess the images and accomplish the segmentation task. GEO, local binary pattern (LBP), and Histogram of Oriented Gradients (HOG) features were extracted and fused to create a single feature vector. Seven classifiers were used to perform the classification task: SVM, logistic regression, k-nearest neighbors, ensemble, linear discriminant analysis, decision tree, and quadratic discriminant analysis [29]. A COVID-19 detection system was developed using chest X-ray images based on the fusion of extracted features and deep learning. The feature extraction was performed using HOG and a CNN-based feature extractor, and the classification was done with the help of VGG-Net. The system achieved 98.36% accuracy [30]. An image classification system was proposed to combine superpixels-based features exhibiting global (GIST), appearance (dilated SIFT histogram), and texture (color thumbnail) information. A complex feature vector was created using a parallel feature fusion strategy, and its dimensionality was reduced using principal component analysis because the vector size was huge. The highest accuracy across all experiments was 84.71% [31]. A feature fusion method was developed to classify different types of images: scale-invariant feature transform (SIFT), self-similarity (SSIM) descriptor, LBP, PACT descriptor, and HOG features were fused into a single vector, which was then passed to a multi-kernel SVM.
All the abovementioned techniques share a significant drawback: they do not quantify the contribution of the individual features used. The proposed work overcomes this limitation and reports the contribution of significant and non-significant features to the correct classification of the images.

The proposed approach
The novel proposed automatic adaptive weighted feature-fusion-based plant disease identification classifier is a 3-branch architecture, as illustrated in Figure 1. The first branch, namely the color feature branch, takes the RGB values of the images (of size 256 × 256) as input to a ResNet-18 DNN; deep, robust features F1 of size 512 × 7 × 7 are extracted from this branch. The second branch is the edge feature branch, whose input is an image containing the Sobel edge information. To generate this input, the RGB images are first converted into grayscale and the Sobel edge detector is then applied.
The Sobel edge images are then passed into a ResNet-18 architecture, and its deep, robust features F2 of size 512 × 7 × 7 are extracted from this branch. The third branch is called the texture feature branch. It is created by replacing the traditional convolutional layers of the ResNet-18 DNN with the newly introduced texture descriptor, the central difference convolution (CDC) layer. The RGB images are passed to this new ResNet-central difference convolution network (CDCN), and the deep, robust features F3 of size 512 × 7 × 7 are extracted from this branch.
The features obtained from every branch are of size 512 × 7 × 7. For analyzing the contribution of each feature, weights αn (n = 1, 2, 3) are assigned to the features obtained from each branch: multiplying the features by their assigned weights creates the new features F′1, F′2, and F′3. These weighted features are then combined into a single combinational feature, F4.
Since the three weighted feature maps are stacked along the channel dimension, the dimensionality of F4 is high, i.e., 1,536 × 7 × 7. It is reduced using two convolutional layers to suit the hardware constraints; the new features F6 obtained after the two convolutions are of size 16 × 5 × 5. These features are then flattened into a feature vector F7 of size 400 × 1, which is passed to the fully connected layer and then to the SoftMax layer for classification.
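The following PyTorch sketch mirrors this 3-branch design under stated assumptions: torchvision's ResNet-18 trunks serve as the branch backbones (the CDC variant for the texture branch is sketched in the next section), the fusion weights are softmax-normalized learnable parameters, and the kernel sizes of the two reduction convolutions are our choices, since the paper specifies only the 1,536 × 7 × 7 to 16 × 5 × 5 shapes. Note that 512 × 7 × 7 branch outputs correspond to 224 × 224 inputs for ResNet-18.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def backbone():
    """ResNet-18 trunk up to the last conv stage: 512 x 7 x 7 for 224x224 input."""
    m = resnet18(weights="IMAGENET1K_V1")
    return nn.Sequential(*list(m.children())[:-2])  # drop avgpool + fc

class WeightedFusionNet(nn.Module):
    def __init__(self, num_classes=38):
        super().__init__()
        self.color_branch = backbone()
        self.edge_branch = backbone()      # fed 3-channel Sobel edge images
        self.texture_branch = backbone()   # CDC variant in the full model
        # Learnable fusion weights, updated by backpropagation like any parameter.
        self.alpha = nn.Parameter(torch.zeros(3))
        # Two conv layers reduce 1536x7x7 to 16x5x5 (kernel sizes are assumptions).
        self.reduce = nn.Sequential(
            nn.Conv2d(1536, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 16, kernel_size=3), nn.ReLU(),   # 7x7 -> 5x5
        )
        self.fc = nn.Linear(16 * 5 * 5, num_classes)

    def forward(self, rgb, edge):
        a = torch.softmax(self.alpha, dim=0)        # positive weights summing to 1
        f1 = a[0] * self.color_branch(rgb)
        f2 = a[1] * self.edge_branch(edge)
        f3 = a[2] * self.texture_branch(rgb)
        f4 = torch.cat([f1, f2, f3], dim=1)         # 1536 x 7 x 7
        f7 = self.reduce(f4).flatten(1)             # 400-dimensional vector
        return self.fc(f7)                          # logits; softmax applied in the loss
```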

Extraction of texture features using CDCN
CDCN is a textural descriptor that provides texture difference information about an object by combining the pixel intensity value and its gradient information, as shown in Figure 2. Merging the intensity and gradient information helps CDCN capture the essential and significant features of the patterns. Unlike textural description models based on vanilla convolution, it yields more robust models and performs exceptionally well at extracting features from fine-grained patterns captured in environments with diverse variations. The advantage of CDC over traditional vanilla convolution is that it can replace vanilla convolution without adding any parameters while producing more robust models [32].
The standard vanilla convolution used to extract features in CNNs can be represented as follows:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n),$$

where $p_0$ is the current location on the feature map, $\mathcal{R}$ is the local receptive field, $w$ denotes the kernel weights, and $x$ is the input feature map. The CDC instead aggregates intensity differences with respect to the center pixel:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot \left(x(p_0 + p_n) - x(p_0)\right).$$

The CDC and vanilla convolution are combined to become

$$y(p_0) = \sigma \sum_{p_n \in \mathcal{R}} w(p_n) \cdot \left(x(p_0 + p_n) - x(p_0)\right) + (1 - \sigma) \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n),$$

where the parameter $\sigma \in [0, 1]$ controls the combination. The final generalized CDC equation can be expressed as follows [33]:

$$y(p_0) = \sum_{p_n \in \mathcal{R}} w(p_n) \cdot x(p_0 + p_n) + \sigma \cdot \left(-x(p_0) \cdot \sum_{p_n \in \mathcal{R}} w(p_n)\right).$$

The ResNet-CDCN model integrates transfer learning and fine-tuning for an improved training experience. In this work, CDC replaces all the convolutions in the ResNet-18 architecture, as shown in Figure 3, because it can do so without adding any parameters to the network while still providing a more robust textural description. A learning rate (LR) optimizer is also employed during training to optimize the network.
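A minimal CDC layer consistent with the generalized equation above can be written as a drop-in replacement for nn.Conv2d: the vanilla term is the normal convolution, and the central-difference term is a 1 × 1 convolution with the spatially summed kernel, which is equivalent to subtracting x(p0) times the kernel sum at every location. The default σ value below is an assumption; the paper does not report the value it uses.

```python
import torch.nn as nn
import torch.nn.functional as F

class CDConv2d(nn.Conv2d):
    """Central difference convolution: y = vanilla_conv(x) - sigma * x(p0) * sum(w).
    sigma = 0 recovers vanilla convolution; the default of 0.7 is an assumption."""
    def __init__(self, *args, sigma=0.7, **kwargs):
        super().__init__(*args, **kwargs)
        self.sigma = sigma

    def forward(self, x):
        out = super().forward(x)            # vanilla convolution term
        if self.sigma == 0:
            return out
        # The spatially summed kernel acts as a 1x1 conv; with the usual "same"
        # paddings of ResNet convs, its output shape matches `out`.
        kernel_sum = self.weight.sum(dim=(2, 3), keepdim=True)
        diff = F.conv2d(x, kernel_sum, bias=None, stride=self.stride,
                        padding=0, groups=self.groups)
        return out - self.sigma * diff
```

Replacing every nn.Conv2d in a ResNet-18 with CDConv2d yields the ResNet-CDCN texture branch without adding parameters, since the kernel sum is derived from the existing weights.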

Dataset description
The dataset used for this work is the publicly available PlantVillage dataset [34], one of the largest public datasets for plant disease identification, containing 53,606 color plant leaf images. This dataset has been used for the same task by many researchers [35][36][37][38][39]. The images are labelled with 38 classes, covering healthy plant leaves and leaves with different types of diseases across a total of 14 plants. The images have a resolution of 256 × 256 pixels and are in JPEG format. A few samples from the dataset can be seen in Figure 4. The images of healthy plant leaves show uniform color with no abnormalities or discoloration. The diseased plant leaf images, on the other hand, exhibit symptoms such as color spots, lines and patches, and discolored, crumbled, or dead-looking parts of the leaves. Such symptoms can also be identified manually by human experts.

Experimental setup and results
For the experimental setup, the parameters are fixed for all models. The database contains 53,606 JPEG images with a resolution of 256 × 256 pixels. The data split is 80:10:10, i.e., 80% of the data is used for training, 10% for validation, and the remaining 10% for testing. The batch size and number of epochs are chosen to suit the system capacity: 8 and 15, respectively. The ADAM optimizer is used for training all models. The initial LR is set to 0.0001, and a learning rate scheduler is integrated to improve the accuracy; the period of LR decay is 1 epoch, with a multiplicative decay factor of 0.8. All models are executed on an NVIDIA Graphical Processing Unit, as summarized in Table 1.
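A runnable PyTorch sketch of this configuration is shown below. The tiny stand-in model and random tensors only keep the example self-contained; the real pipeline would use the fusion network and the PlantVillage images.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-ins so the sketch runs; in the paper these are the fusion network
# and the PlantVillage dataset (53,606 images, 38 classes).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 38))
dataset = TensorDataset(torch.randn(100, 3, 8, 8), torch.randint(0, 38, (100,)))

n = len(dataset)
n_train, n_val = int(0.8 * n), int(0.1 * n)                       # 80:10:10 split
train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n - n_train - n_val])
loader = DataLoader(train_set, batch_size=8, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)         # initial LR 0.0001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.8)

for epoch in range(15):
    for x, y in loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # LR after epoch k: 1e-4 * 0.8 ** k
```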

LR optimization
The LR is an essential DNN hyperparameter that instructs the optimizer on how far to move the weights along the gradient direction for each batch of training images. It has to be chosen wisely: if the LR is set very low, training is accurate but the steps for minimizing the loss function become so small that training is time-consuming; if it is set too high, the weight updates are so large that they can hamper the training process and even make it worse [40]. A smarter way to adjust the LR is to use an LR scheduler [41], which decays the LR of all the parameters after a defined number of epochs by a multiplicative (gamma) factor. For example, if the initial LR is 0.1, the gamma factor is 0.6, and the step size is 5, then after every 5 epochs the LR is multiplied by 0.6: 0.1 becomes 0.06, then 0.036, and so on. In this work, the initial LR value is set to 0.0001 and decays after every epoch with a gamma factor of 0.8.

Automatic weight updation mechanism (AWUM)
The AWUM [42] consists of two steps: (1) combining three types of features that are good at extracting significant information, as shown in Figure 5; here, the color, edge, and texture features are chosen. All of these features perform well in feature extraction but contribute differently to the process. (2) Assigning different weights to the features and analyzing their individual and combinational contributions. This helps select the best features, reduces time consumption and redundant information, and thereby yields better results in classifying plant diseases.
The features used, apart from the color images, are the images containing Sobel edge features and the texture features extracted using the CDCN. The equation used for the AWUM can be expressed as follows [42]:

$$W = \sum_{i} \alpha_i w_i,$$

where $\alpha_i$ signifies the weight assigned to a feature, $w_i$ corresponds to the features, and $W$ represents the output of the AWUM. In this work, no human intervention is needed to update the weights: the system automatically updates the weights assigned to the features to obtain the best values. The α values are updated by the same mechanism as the system weights, i.e., the backpropagation algorithm; they are appended to the list of system weights so that they are updated in the same way. Each weight α depends upon the contribution of the corresponding feature and is updated automatically accordingly. This mechanism makes full use of each feature for plant disease classification and also extracts the most significant features. The final combinational feature is computed as

$$F_4 = \left[\alpha_1 F_1,\ \alpha_2 F_2,\ \alpha_3 F_3\right],$$

i.e., the channel-wise concatenation of the weighted branch features, where $\alpha_1$, $\alpha_2$, and $\alpha_3$ are the weights assigned to the individual features and are used to compute the contribution of each feature. The weights are constrained so that their sum is 1.
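The sketch below isolates the AWUM component: the α values are registered as ordinary trainable parameters so that backpropagation updates them together with the network weights, and a softmax keeps them positive and summing to 1. The softmax is one plausible way to enforce the stated sum-to-one constraint; the paper does not spell out the normalization.

```python
import torch
import torch.nn as nn

class AWUM(nn.Module):
    """Automatic weight updation mechanism: learnable per-branch weights that
    are trained by backpropagation along with the rest of the network."""
    def __init__(self, n_branches=3):
        super().__init__()
        # Appended to the module's parameter list -> updated by the optimizer.
        self.raw_alpha = nn.Parameter(torch.zeros(n_branches))

    def forward(self, features):
        alpha = torch.softmax(self.raw_alpha, dim=0)   # positive, sums to 1
        fused = torch.cat([a * f for a, f in zip(alpha, features)], dim=1)
        return fused, alpha

# After training, `alpha` reads out each feature's learned contribution,
# e.g., roughly (0.41, 0.17, 0.42) for the color/edge/texture branches.
```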

Image transformation and augmentation
Deep learning models face overfitting during the training phase: the training error becomes small while the validation error remains large, which leads to poor classification results. Deep learning models are useful only when both the training and validation errors are low. Data augmentation is a method used to achieve this: training with augmented images helps reduce the gap between training and validation error by representing a more significant set of feature points [43].
The augmentation process takes images of a class as input and produces processed images of the same size as output; these images are then fed to the subsequent layers of the network [44]. This work uses four types of operations for image augmentation: random rotation in the range of −180 to 180 degrees, random scaling with values 0.75-1.25, random shearing in the range of 2-4, and horizontal flipping.
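With torchvision, the four operations could look like the sketch below. The exact parameter mapping (e.g., shear expressed in degrees) is an assumption, since the paper lists only the value ranges.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),        # rotation in [-180, 180]
    transforms.RandomAffine(degrees=0,
                            scale=(0.75, 1.25),    # random scaling
                            shear=(2, 4)),         # random shearing, range 2-4
    transforms.RandomHorizontalFlip(p=0.5),        # horizontal flipping
    transforms.ToTensor(),
])
```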

Performance of color, edge, and texture information models
The RGB images are used for the color-only and texture-only models. The edge-only model uses the Sobel edge images produced by applying the Sobel edge detector to the RGB images, as represented in Figure 6.
It has been observed that the color-only model produced the highest test accuracy of the three individual models, as reported in Table 2. Loss/accuracy plots and confusion matrix plots for the individual feature information models can be seen in Figure 7.

Performance of color + edge information model with different weight combinations
It has been observed that the best results of the proposed color + edge model (illustrated in Figure 8) are obtained when α1 and α2 are initialized to 0.5 and 0.5, respectively. The automatic weighting then changes their values to 0.68 and 0.32, respectively, as shown in Table 3, indicating that the model considers 68% of the color features and 32% of the edge features. The plot of initial versus final α values, the loss/accuracy plots, and the confusion matrix plots for this model can be seen in Figures 9 and 10.

Performance of the color + texture (CDC) information model with different weight combinations
It has been observed that the best results of the proposed color + texture model (illustrated in Figure 11) are obtained when α1 and α2 are initialized to 0.5 and 0.5, respectively. The automatic weighting then changes their values to 0.62 and 0.38, respectively, as shown in Table 5, indicating that the model considers 62% of the color features and 38% of the texture features. The plot of initial versus final α values, the loss/accuracy plots, and the confusion matrix plots for this model can be seen in Figures 12 and 13 (Table 4).
Figure 13: Confusion matrix plot for the combinational color + texture information model with the best α1 and α2 values.

Performance of the edge + texture (CDC) information model with different weight combinations
It has been observed that the best results of the proposed edge + texture model (illustrated in Figure 14) are obtained when α1 and α2 are initialized to 0.4 and 0.6, respectively. The automatic weighting then changes their values to 0.42 and 0.58, respectively, as shown in Table 6, indicating that the model considers 42% of the edge features and 58% of the texture features. The plot of initial versus final α values, the loss/accuracy plots, and the confusion matrix plots for this model can be seen in Figures 15 and 16.

Performance of the color + edge + texture (CDC) information model with different weight combinations

It has been observed that the best results of the proposed color + edge + texture model (illustrated in Figure 17) are obtained when α1, α2, and α3 are initialized to 0.25, 0.25, and 0.50, respectively. The automatic weighting then changes their values to 0.41, 0.17, and 0.42, respectively, as shown in Table 7, indicating that the model considers 41% of the color features, 17% of the edge features, and 42% of the texture features. The plot of initial versus final α values, the loss/accuracy plots, and the confusion matrix plots for this model can be seen in Figure 18.

Overall, the color + texture model outperformed all other models with the highest accuracy of 99.23%, as shown in Table 8. The precision, recall, F1 score, and AUC are 98.98, 99.10, 99, and 100%, respectively, as represented in Figure 19.
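For reference, the reported metrics can be computed from model outputs as in the sketch below. The random arrays are stand-ins for the real predictions, and macro averaging over the 38 classes is our assumption about how the per-class scores were aggregated.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 38, size=500)          # stand-in ground-truth labels
y_prob = rng.dirichlet(np.ones(38), size=500)   # stand-in softmax scores
y_pred = y_prob.argmax(axis=1)                  # predicted classes

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1 score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
print("AUC      :", roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro"))
```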

Performance comparison of traditional plant disease identification approaches and the proposed approach
In this work, the test accuracy is the performance metric for comparing the traditional plant disease classification approaches with the proposed novel automatic adaptive weighted feature-fusion-based approach. The comparison is summarized in Table 8; the proposed approach provided the highest accuracy of 99.23%, as illustrated in Figure 20.

Table 8: Test accuracy (%) of existing approaches and the proposed approach

Agarwal et al. [48]: 94
Alehegn [49]: 95.63
Gao and Lin [50]: 96.02
Bin Tahir et al. [51]: 97
Kaur et al. [52]: 97.7
Sethy et al. [53]: 97.96
Waheed et al. [54]: 98.06
The proposed approach: 99.23

Visualization of results with LayerCAM for different feature attentions
Zhou et al. introduced the idea of visualizing features via class activation maps (CAMs) [48]. These CAMs were created using a neural network structure in which a global average pooling layer replaced the fully connected layer. Various methods introduced in past years to generate class activation maps could locate the target feature regions excellently. Still, a problem common to all these methods is that they utilized only the final convolutional layer of the CNN to create the CAMs. The issue with relying on the final convolutional layer is its low spatial resolution: it can trace only coarse regions [49], which limits the performance of CAM methods. The LayerCAM attention method, introduced by Jiang et al., works with standard CNN-based image classifiers without altering the network architecture or the weights obtained from backpropagation. LayerCAM does not rely only on the final convolutional layer but also utilizes shallower layers of the CNN to create effective class activation maps, capturing finer-grained features and locating the target more effectively. The attention plots (class activation maps) of the color model, the edge model, and the proposed color + edge + texture model can be seen in Figure 21.
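A minimal LayerCAM can be implemented with a pair of hooks, following the published formulation: each activation element is weighted by the ReLU of its own gradient, the weighted activations are summed over channels, and a final ReLU is applied. The function below is our sketch, with the choice of layer left to the caller.

```python
import torch
import torch.nn.functional as F

def layercam(model, layer, image, class_idx):
    """Minimal LayerCAM: ReLU(sum_k ReLU(dy/dA^k) * A^k), upsampled to image size."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image)                 # image: 1 x 3 x H x W
    model.zero_grad()
    logits[0, class_idx].backward()       # gradient of the target class score
    h1.remove(); h2.remove()

    a, g = acts["a"], grads["g"]
    cam = F.relu((F.relu(g) * a).sum(dim=1, keepdim=True))   # 1 x 1 x h x w
    cam = F.interpolate(cam, size=image.shape[-2:],
                        mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)       # normalize to [0, 1]
```

Calling it on a shallow layer versus the last block reproduces the coarse-to-fine behavior discussed above.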

Discussion
In this work, we introduced a novel automatic adaptive weighted fusion of features-based approach for plant disease identification. Evidence of the efficiency of the proposed approach can be seen in Tables 7 and 8. The key insights of the proposed approach are as follows:
1. The experiments examined the proposed approach in every aspect, i.e., in terms of test accuracy, F1 score, precision, recall, and AUC score.
2. The developed approach provided the highest accuracy among all the existing feature-fusion-based approaches, i.e., 99.23%.
3. Out of the total 53,606 images, the proposed feature-fusion-based approach classified 53,193 images correctly; only 413 images were misclassified.
4. The proposed approach provided significant insight into the contribution of the features used: the color and texture features contributed the most to the classification of the images, and almost equally, whereas the edge features contributed negligibly.
Apart from its several advantages, this approach also has some limitations. First, the images used for the experiments were captured in an environment with no illumination variation, which eased the classification process; the system never faced the challenge of tackling varying illumination. Second, each leaf image in the dataset contains only one disease. In real-world scenarios, disease identification is much more complex because a single leaf may carry multiple diseases. In future work, the system will be improved to handle leaf images with varying illumination and multiple disease symptoms.

Conclusion
Integrating the color, edge, and texture features, a novel 3-branch classifier for plant disease classification has been proposed. The classifier distinguishes among 38 classes across 14 different plants. A newly introduced texture extraction method, the CDCN, is used for extracting the prominent textures from the plant leaves. The features are trained using ResNet-18 with an AWUM that automatically adjusts the weights in the DNN and analyzes each feature's contribution to achieving the highest classification accuracy. The individual as well as the combinational features were analyzed through various experiments. The highest accuracy was achieved by the color + texture model initialized with equal (50%/50%) contributions of the color and texture features. The proposed classifier outperformed the existing feature-fusion-based approaches for plant disease identification.