Cotton Leaf Diseases Recognition Using Deep Learning and Genetic Algorithm

Globally, Pakistan ranks 4th in cotton production, 6th as an importer of raw cotton, and 3rd in cotton consumption. Nearly 10% of GDP and 55% of the country's foreign exchange earnings depend on cotton products. Approximately 1.5 million people in Pakistan are engaged in the cotton value chain. However, several diseases such as Mildew, Leaf Spot, and Soreshine affect cotton production. Manual diagnosis is not a good solution because of factors such as its high cost and the limited availability of experts. Therefore, it is essential to develop an automated technique that can accurately detect and recognize these diseases at their early stages. In this study, a new technique is proposed using a deep learning architecture with serially fused features and best-feature selection. The proposed architecture consists of the following steps: (a) a self-collected dataset of cotton diseases is prepared and labeled by an expert; (b) data augmentation is performed on the collected dataset to increase the number of images available for training; (c) a pre-trained deep learning model named ResNet101 is employed and trained through a transfer learning approach; (d) features are computed from the third and fourth last layers and serially combined into one matrix; (e) a genetic algorithm is applied to the combined matrix to select the best features for recognition. For final recognition, a Cubic SVM classifier was used and validated on the prepared dataset. On this newly prepared dataset, the highest accuracy achieved was 98.8% using Cubic SVM, which demonstrates the effectiveness of the proposed framework.

Machine learning (ML) has become a fundamental part of the agricultural industry [16]. ML is a branch of artificial intelligence [17] that allows computer applications to produce precise and accurate results from data. Algorithms used in ML are inspired by the characteristics of real-world objects [18]. The main aspect of such systems is that they learn automatically and improve with repetition and experience [19]. Learning, in this sense, means recognizing and understanding the input data and making intelligent decisions based on it. Nowadays, convolutional neural networks (CNNs) have become the most popular approach in computer vision (CV) [20]. To perform different tasks, a CNN requires training on a large dataset. However, large labeled datasets are difficult to obtain in agriculture. Therefore, a transfer learning (TL) method was used. In this study, a subset of the most commonly occurring diseases was investigated, namely (i) Areolate Mildew, (ii) Myrothecium leaf spot, and (iii) Soreshine. A few sample images are presented in Fig. 1.
A great revolution has emerged in the fields of ML [21], CV [22], and robotics [23] since the invention of neural networks (NNs) [24]. These systems are formed by different arrangements of layers (convolution, pooling, and fully connected (FC)) and activation functions (ReLU, softmax, etc.) [25]. NNs have several significant advantages: (i) an enormous number of variables can be handled simultaneously; (ii) both continuous and discrete variables can be optimized effectively; (iii) they search a wide sampling of the cost surface; (iv) they can optimize variables with very complex cost functions; and (v) they can provide several good solutions rather than a single one [26]. In [27], Edna evaluated and fine-tuned modern deep CNNs by comparing different deep learning architectures, such as VGG [28] and ResNet. A total of 14 disease classes from the Plant Village dataset were considered in that work. DenseNet [29] achieved an accuracy of 99.75%, higher than that of existing techniques. Qiufen et al. [30] used different CNN models for TL, such as AlexNet, GoogleNet, and ResNet, to identify diseases in soybean leaves; among these models, ResNet [31] obtained the highest classification accuracy of 93.71%. In [32], Shanwen proposed a new approach for the recognition of cucumber disease consisting of three pipelined procedures: the diseased leaf was segmented using K-means clustering, shape and color features were extracted from the lesion information, and the diseased leaf images were then classified using sparse representation. Many other techniques have been proposed in the literature for the identification and recognition of plant diseases. Most of these studies focused on segmentation, and few concentrated on classification. For accurate classification, feature fusion and selection techniques have been proposed [33].
Feature fusion is a process that extracts information from a group of training and testing images and integrates it without loss of information. After fusion, the resulting feature matrix is more informative. However, some researchers have noted that the fused matrix may contain irrelevant or redundant features, which need to be discarded. To address this problem, researchers have introduced several feature selection techniques based on heuristic and meta-heuristic methods. Different techniques have been developed to identify cotton leaf diseases; however, several limitations remain, including: i) the choice of color space, ii) weak contrast and boundaries, iii) similarities between infected and healthy cotton leaves, iv) different diseases whose symptoms have similar features and shapes, and v) the selection of useful features for accurate recognition. In this study, we propose a new automated approach for cotton leaf disease recognition using deep learning. The steps are as follows: a) a self-collected dataset of cotton diseases is prepared and labeled by an expert; b) data augmentation (DAg) is performed on the collected dataset to increase the number of images for better training; c) a pre-trained deep learning model named ResNet101 is employed and trained through TL; d) features are computed from the third and fourth last layers and serially combined into one matrix; and e) a genetic algorithm (GA) is applied to the combined matrix to select the best features for final recognition.
The rest of the manuscript is organized as follows. The proposed methodology, which includes deep learning model-based feature extraction and feature reduction, is discussed in Section 2. The results of the proposed method are discussed in Section 4. Finally, the conclusions drawn from this work are presented in Section 5.

Proposed Methodology
In this study, ResNet101 was used along with TL to recognize cotton leaf diseases. The dataset covers three major diseases of cotton leaves: Areolate Mildew, Myrothecium leaf spot, and Soreshine. Because the dataset was not sufficiently large, the DAg technique was applied to increase its size. The augmented dataset was then provided to ResNet101 for feature extraction, using training/testing ratios of 80/20, 70/30, and 50/50. Features were extracted from two layers (average pool and FC) and fused using a serial-based approach. To select the most prominent and non-redundant features, a GA was applied to the fused feature vector. The features selected by the GA were provided to Cubic SVM for final recognition. The main flowchart is shown in Fig. 2, and each step is described below.

Data Augmentation
Because the dataset used in this study was limited, DAg was performed. The original dataset contained only 95 images, as shown in Fig. 1, so enlarging it was essential. First, images were rotated by different angles, such as 90°, 180°, and 270°. In addition to rotation, flipping was performed in different directions to further increase the size of the dataset. Before DAg, there were only about 30 images in each class; after augmentation, each class contained 1000 images.
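The rotations and flips described above can be sketched with NumPy. This minimal example, assuming images are stored as H x W (x C) arrays, generates the 8 distinct rotation/flip variants of one image (rotations by 0°, 90°, 180°, and 270°, each with and without a left-right flip). The helper name `dihedral_augment` is hypothetical. Note that 30 images x 8 variants gives only 240 images per class, so reaching 1000 per class requires additional transformations (for example, other rotation angles) that the text does not specify.

```python
import numpy as np


def dihedral_augment(image):
    """Return the 8 rotation/flip variants of an H x W (x C) image array.

    Rotations are by 0, 90, 180, and 270 degrees (counter-clockwise),
    each kept as-is and also mirrored left-right.
    """
    variants = []
    for k in range(4):  # k quarter-turns
        rotated = np.rot90(image, k=k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirror along the width axis
    return variants


# Usage on a tiny non-symmetric "image".
img = np.arange(6).reshape(2, 3)
augmented = dihedral_augment(img)
```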

Convolutional Neural Network
CNN is a well-known deep neural network (DNN) model for feature extraction. The input of a CNN is typically a two-dimensional (2D) array, such as an image. As in feed-forward NNs, the neurons used in a CNN have learnable weights and biases. CNNs work directly on images rather than relying on hand-crafted feature extraction, which makes them well suited to image classification and recognition. A CNN includes two main parts: feature extraction and classification. Feature extraction is performed by the convolution, activation, and pooling layers, while classification assigns each input to one of the classes defined before training. The key strength of a CNN is its ability to extract strong and complex features from an image, which leads to good results in the classification phase.
Convolution layer: After the input layer, the first layer of a CNN is the convolution layer. In this layer, the input (the original image pixels or the previous layer's output) is convolved with learnable filters (weights), and the result is passed to the next layer. Features are captured locally in the form of patches, and the strongest responses of each patch are passed on for further processing. For an input $I$ and a kernel $K$ of size $m \times n$, the convolution operation is described below.
$$S(i,j) = (I * K)(i,j) = \sum_{a=0}^{m-1} \sum_{b=0}^{n-1} I(i+a,\, j+b)\, K(a,b)$$
After every convolution operation, an additional operation known as ReLU is performed. The ReLU activation layer [34] is described as follows.
$$f(x) = \max(0, x)$$
This equation shows that all negative values are converted to zero after applying this function. During DNN training, the input of each layer comes from the output of the previous layer. As the parameters of the previous layers change during training, the distribution of each layer's input changes considerably. This requires lower learning rates, which slows down the training process. This problem can be solved through a normalization technique called batch normalization (BN) [35]. Using this technique, the batch mean and variance are calculated and used to normalize each input $x_i$ of a mini-batch of size $m$, as given below.
$$\mu_b = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_b^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_b)^2, \qquad \hat{x}_i = \frac{x_i - \mu_b}{\sqrt{\sigma_b^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$
where $\sigma_b^2$ denotes the batch variance, $\mu_b$ is the batch mean, $\epsilon$ is a small constant for numerical stability, and $\gamma$ and $\beta$ are learnable scale and shift parameters. This process adjusts the parameters of each layer during training. BN allows higher learning rates, reduces the need for careful parameter initialization, and in some cases eliminates the need for dropout [36]. After that, the pooling layer is employed to summarize the feature information, so that only the rich and strong features are passed to the next layer. For a pooling window $R$ over a feature map $x$:
$$P_{max} = \max_{(i,j) \in R} x_{ij}, \qquad P_{min} = \min_{(i,j) \in R} x_{ij}, \qquad P_{avg} = \frac{1}{|R|} \sum_{(i,j) \in R} x_{ij}$$
where $P_{max}$ denotes the max-pooling features, $P_{min}$ denotes the minimum-pooling features, and $P_{avg}$ denotes the average-pooling features. The FC layer connects the output of the previous layer with the successive layers. FC layers appear at the end of the CNN architecture and combine the outputs of the earlier layers to produce the desired outcome. Each neuron in an FC layer is connected to all activations of the previous layer.
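The layer operations above (convolution, ReLU, batch normalization, and max/min/average pooling) can be sketched in a few lines of NumPy. This is a minimal single-channel illustration, assuming "valid" convolution without kernel flipping (the cross-correlation used by most CNN libraries) and non-overlapping square pooling windows; the function names are hypothetical and this is not the ResNet101 implementation used in the study.

```python
import numpy as np


def conv2d_valid(x, k):
    """Single-channel 'valid' convolution (cross-correlation, no kernel flip)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def relu(x):
    """f(x) = max(0, x): negative values become zero."""
    return np.maximum(0.0, x)


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch along axis 0 using the batch mean and variance."""
    mu_b = x.mean(axis=0)
    var_b = x.var(axis=0)
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)
    return gamma * x_hat + beta


def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    windows = x[:h * size, :w * size].reshape(h, size, w, size)
    reducer = {"max": np.max, "min": np.min, "avg": np.mean}[mode]
    return reducer(windows, axis=(1, 3))
```

For example, `pool2d(np.arange(16.0).reshape(4, 4), 2, "max")` keeps the largest value of each 2 x 2 block.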

Modified ResNet101
ResNet is a deep residual network designed for very deep architectures. Its shortcut connections create a direct path for propagating information throughout the network. The architecture relies on many stacked residual units, which are the basic building blocks of the network. The main difference between ResNet and plain networks such as VGG is that ResNet is much deeper, and this depth is made trainable by the shortcut connections. In plain deep networks, increasing the number of layers causes accuracy degradation. Therefore, in ResNet, shortcut connections are built to carry global features and high-level information across layers. Because a residual unit can learn a near-identity mapping, layers that are not needed can effectively be bypassed, which makes very deep networks easier to train. Mathematically, a residual unit can be formulated as:
$$H(\mu) = F(\mu) + \mu$$
where $F(\mu)$ is a non-linear function computed by the weight layers, and $\mu$ is the input of the residual unit. These weight layers therefore learn the residual mapping, defined as:
$$F(\mu) = H(\mu) - \mu$$
ResNet has five variants, with 18, 34, 50, 101, and 152 layers. In the proposed technique, ResNet101 was employed for feature extraction. The layer descriptions are given in Tab. 1. The first layer of ResNet101 is the image input layer. In total, there are 104 BN layers, 104 convolution layers, 100 ReLU layers, a max pooling layer, an average pooling layer, an FC layer, and a softmax layer. In addition, the network contains 33 addition layers, one for each residual unit. In this study, we modified ResNet101 and fine-tuned it on the prepared cotton dataset, which contained 1000 images per class and was split into different training/testing ratios. For fine-tuning, the last layer was removed and a new layer for the three classes was added. Subsequently, the model was trained using TL. TL [37] solves a given problem in less time, with fewer resources and less effort. The process of TL is illustrated in Fig. 3. After training, features were extracted from the last two layers, the average pool and FC layers, and later fused for better representation.
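The layer counts quoted for ResNet101 follow from its block layout. The sketch below assumes the standard bottleneck configuration of the original ResNet paper (3, 4, 23, and 3 residual units in four stages, three convolutions per unit, and one projection shortcut at the start of each stage) and reproduces the 104 convolution, 104 BN, 100 ReLU, and 33 addition layers. It also shows a toy residual unit computing $H(\mu) = F(\mu) + \mu$ with a ReLU after the addition, as in the original ResNet. The names `count_layers` and `residual_unit` are hypothetical illustrations, not part of any library.

```python
import numpy as np

# Bottleneck residual units per stage (He et al., 2016).
STAGES = {"resnet50": [3, 4, 6, 3], "resnet101": [3, 4, 23, 3], "resnet152": [3, 8, 36, 3]}


def count_layers(units_per_stage):
    """Count conv/BN/ReLU/addition layers of a bottleneck ResNet."""
    units = sum(units_per_stage)
    projections = len(units_per_stage)  # one 1x1 projection shortcut per stage
    conv = 1 + 3 * units + projections  # stem conv + 3 convs per unit + projections
    return {
        "conv": conv,
        "bn": conv,                     # every convolution is followed by BN
        "relu": 1 + 3 * units,          # stem ReLU + 2 inside each unit + 1 after each addition
        "addition": units,              # one shortcut addition per residual unit
    }


def residual_unit(mu, w1, w2):
    """Toy residual unit: ReLU(F(mu) + mu), where F is a two-layer weight path."""
    f = w2 @ np.maximum(0.0, w1 @ mu)   # F(mu): the learned residual mapping
    return np.maximum(0.0, f + mu)      # identity shortcut, then ReLU


print(count_layers(STAGES["resnet101"]))
```

If the second weight matrix is all zeros, `residual_unit` returns its (non-negative) input unchanged, which illustrates why a residual unit can easily learn an identity mapping.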

Features Fusion
In this study, we used a serial-based method to fuse the features of different layers into one matrix, as follows.
Suppose there are two feature spaces $U$ (average pool layer features, 2048 dimensions) and $V$ (FC layer features, 1000 dimensions) defined on the sample space $\phi$ of the pattern. For a random sample $\vartheta \in \phi$, the two corresponding feature vectors are $u \in U$ and $v \in V$. The serial combined feature of $\vartheta$ is defined by $\gamma = \begin{pmatrix} u \\ v \end{pmatrix}$. Clearly, if $u$ is a $k$-dimensional feature vector and $v$ is an $l$-dimensional feature vector, then $\gamma$ is a $(k + l)$-dimensional serial combined feature vector. The serially combined feature vectors of all pattern samples form a $(k + l)$-dimensional serial combined feature space. Here $k = 2048$ and $l = 1000$, so the resultant matrix has dimension $N \times 3048$, where $N$ is the number of images. This fused matrix is later optimized through the GA-based feature selection algorithm.
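Serial fusion is a row-wise concatenation: each image's 2048-dimensional average-pool vector $u$ is followed by its 1000-dimensional FC vector $v$. A minimal NumPy sketch, assuming both feature matrices are stored with one row per image in the same order; random arrays stand in for real ResNet101 activations, and `serial_fusion` is a hypothetical helper name.

```python
import numpy as np


def serial_fusion(U, V):
    """Serially concatenate per-sample feature vectors: row i becomes [u_i, v_i]."""
    if U.shape[0] != V.shape[0]:
        raise ValueError("U and V must contain the same number of samples")
    return np.hstack([U, V])  # shape: N x (k + l)


rng = np.random.default_rng(0)
U = rng.random((5, 2048))  # placeholder for average-pool features (k = 2048)
V = rng.random((5, 1000))  # placeholder for FC features (l = 1000)
fused = serial_fusion(U, V)
print(fused.shape)
```

Because serial fusion only concatenates, no information from either layer is lost, but the dimensionality grows to $k + l$, which motivates the feature selection step that follows.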

Features Optimization
After feature extraction [38], not all of the features were informative; some were redundant and not required in the next step. It was therefore essential to select the most notable features from the extracted feature vector to improve further processing and to reduce the computational time by decreasing the number of features [39]. Less prominent and redundant features were rejected during the feature selection phase. Several algorithms are used for feature selection, such as Entropy [40,41], PSO, partial least squares (PLS) [42], and the variance-based approach [43], to name a few [44,45]. In this study, we implemented a GA for feature selection. A detailed description of the GA is provided below.
Genetic algorithm: GA is a meta-heuristic feature selection algorithm [46]. Using this algorithm, the most prominent features are selected through repeated iterations. The GA proceeds through the following steps: parameter and population initialization, fitness-function-based evaluation, crossover, mutation, and reproduction. Parameter and population initialization is the first step of the GA. Its purpose is to initialize the GA parameters: the total number of iterations is set to 200, the population size to 20, the crossover rate to 0.7, the mutation rate to 0.01, and the selection pressure to 5. In the population initialization step, each individual is randomly generated, and the population is sorted according to the fitness function. If the features do not meet the criteria of the fitness function, the crossover operation is performed. In this operation, the information of two parents is combined to generate new offspring. The crossover operator strongly influences the GA's performance, as it creates child solutions from more than one parent solution. Uniform crossover was adopted in this work with a crossover rate of 0.7. This process is shown in Fig. 4.

Figure 4: Crossover operations
After the crossover operation, it is essential to maintain genetic diversity. For this purpose, a mutation operation is performed, similar to biological mutation. The selected mutation technique was uniform mutation with a mutation rate of 0.001. Parents were chosen by roulette wheel selection, in which subsets with a higher fitness value have a higher probability of being selected. The formulation of the roulette wheel selection is defined as follows: where α1 represents the selected parent pressure, and S β represents the sorted population. Subsequently, the features were passed to the fitness function, Fine KNN (FitKnn), which computes the accuracy and loss as follows: Using the loss value, the best features were selected for final recognition. This process continued until all iterations were completed. Finally, a selected feature vector of dimension N × K was obtained, where K = 1703 for this dataset. Cubic SVM was used as the main classifier for final recognition, and several other classifiers were used for comparison with Cubic SVM under the proposed method. The detailed algorithm of the GA-based feature selection is provided below.
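The GA steps described above (random binary population, roulette wheel selection, uniform crossover, uniform bit-flip mutation, and fitness-based reproduction) can be sketched in NumPy. This is a minimal illustration under several stated assumptions: the fitness is the accuracy of a 1-nearest-neighbour classifier with Euclidean distance (used here as a stand-in for MATLAB's Fine KNN), evaluated on a held-out validation split; the roulette probability decays exponentially with loss = 1 − accuracy, scaled by the selection pressure (the paper's exact formula is not recoverable); and parents and children are merged with the best individuals kept. Defaults follow the parameter values listed in the text. All function names are hypothetical.

```python
import numpy as np


def fine_knn_accuracy(X_tr, y_tr, X_val, y_val):
    """Accuracy of a 1-nearest-neighbour classifier with Euclidean distance."""
    d = ((X_val[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    pred = y_tr[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_val))


def fitness(mask, data):
    """Fitness of a binary feature mask = 1-NN accuracy on the selected columns."""
    if not mask.any():
        return 0.0  # an empty subset cannot classify anything
    X_tr, y_tr, X_val, y_val = data
    return fine_knn_accuracy(X_tr[:, mask], y_tr, X_val[:, mask], y_val)


def roulette_pick(fit, rng, pressure):
    """Roulette wheel: selection probability decays exponentially with loss."""
    loss = 1.0 - fit
    worst = loss.max() if loss.max() > 0 else 1.0
    p = np.exp(-pressure * loss / worst)
    return int(rng.choice(len(fit), p=p / p.sum()))


def ga_select(data, pop_size=20, iters=200, crossover_rate=0.7,
              mutation_rate=0.01, pressure=5.0, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = data[0].shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5  # True = feature kept
    fit = np.array([fitness(m, data) for m in pop])
    for _ in range(iters):
        children = []
        while len(children) < pop_size:
            a = pop[roulette_pick(fit, rng, pressure)]
            b = pop[roulette_pick(fit, rng, pressure)]
            if rng.random() < crossover_rate:
                swap = rng.random(n_feat) < 0.5  # uniform crossover: each gene from either parent
                c1, c2 = np.where(swap, b, a), np.where(swap, a, b)
            else:
                c1, c2 = a.copy(), b.copy()
            for child in (c1, c2):
                flip = rng.random(n_feat) < mutation_rate  # uniform bit-flip mutation
                children.append(child ^ flip)
        children = np.array(children[:pop_size])
        child_fit = np.array([fitness(m, data) for m in children])
        # Reproduction: merge parents and children, keep the fittest pop_size.
        merged = np.vstack([pop, children])
        merged_fit = np.concatenate([fit, child_fit])
        keep = np.argsort(-merged_fit, kind="stable")[:pop_size]
        pop, fit = merged[keep], merged_fit[keep]
    best = int(np.argmax(fit))
    return pop[best], float(fit[best])


# Usage on synthetic data where only the first two features carry the label.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
data = (X[:40], y[:40], X[40:], y[40:])
mask, best_acc = ga_select(data, iters=30)
print("selected features:", np.flatnonzero(mask), "accuracy:", best_acc)
```

Keeping the best individuals across generations guarantees that the best fitness never decreases; in the study, the surviving mask would select K = 1703 of the 3048 fused features.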

Experimental Results and Analysis
This section describes the experimental process of the proposed method. The prepared dataset was divided using three different training/testing ratios, and the experiments were performed as presented in Tab. 2. In experiment 1, the training set contained 80% of the sample images and the testing set contained the remaining 20%. In experiment 2, the training and testing sets comprised 70% and 30% of the images, respectively. In experiment 3, 50% of the images were used for training and 50% for testing. All three experiments were conducted using the proposed method. Cubic SVM was considered the key classifier, and the remaining classifiers were used for comparison. The performance of each classifier was measured in terms of accuracy (ACC), recall rate, false negative rate (FNR), and area under the curve (AUC). All experiments were performed in MATLAB R2020a on a Core i7 desktop computer with 16 GB of RAM and an 8 GB graphics card.
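The three training/testing ratios can be produced with a simple split over the 3 × 1000 augmented images. The text does not state whether the split was stratified; this sketch assumes a stratified split (each class divided in the same proportion) with a fixed random seed, and `stratified_split` is a hypothetical helper name.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction, seed=0):
    """Return (train_idx, test_idx), splitting each class in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


labels = (["Areolate Mildew"] * 1000 + ["Myrothecium leaf spot"] * 1000
          + ["Soreshine"] * 1000)
for frac in (0.8, 0.7, 0.5):  # experiments 1, 2, and 3
    tr, te = stratified_split(labels, frac)
    print(f"{round(frac * 100)}/{round((1 - frac) * 100)}: {len(tr)} train, {len(te)} test")
```

Stratifying keeps every class equally represented in both sets, so accuracy differences between experiments reflect the amount of training data rather than a shifted class balance.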

Results Experiment No. 1
In experiment 1, the proposed method was applied to the prepared dataset using a training/testing ratio of 80/20. Multiple classifiers were used to evaluate the proposed method, with recall and accuracy as the key calculated parameters. The results of this experiment are presented in Tab. 3. Cubic SVM achieved the highest accuracy, 98.7%, with a computational time of 14.72 s. The longest time in this experiment was 26.99 s, for Cosine KNN, and the shortest was 7.34 s, for Fine Tree. The recall rate and AUC of Cubic SVM were 98.66% and 0.99, respectively. Cosine KNN and Medium KNN also performed well, both achieving an accuracy of 98.7%. The remaining classifiers also performed well but did not match Cubic SVM. The recall rate of each classifier can be verified through the true positive rates (TPRs) given in Tab. 3. In addition, the recall rate of Cubic SVM can be verified from its confusion matrix in Tab. 4, in which the diagonal values show the TPRs (correct predictions). The highest prediction rate, 98%, was obtained for Areolate Mildew.
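The recall (TPR), FNR, and accuracy values reported in the tables can be derived from a confusion matrix. A minimal sketch, assuming rows are true classes and columns are predicted classes, and assuming the overall recall rate is the unweighted mean of the per-class recalls; the matrix in the usage example is hypothetical and is not taken from Tab. 4.

```python
def per_class_metrics(cm):
    """Metrics from a confusion matrix where cm[i][j] counts class-i samples predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    recall = [cm[i][i] / sum(cm[i]) for i in range(n)]  # TPR per class (diagonal / row sum)
    fnr = [1.0 - r for r in recall]                     # FNR = 1 - TPR
    return {
        "accuracy": correct / total,
        "recall": recall,
        "fnr": fnr,
        "mean_recall": sum(recall) / n,
    }


# Hypothetical 3-class matrix (Areolate Mildew, Myrothecium leaf spot, Soreshine).
cm = [[98, 1, 1],
      [2, 97, 1],
      [0, 1, 99]]
metrics = per_class_metrics(cm)
print(metrics)
```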

Results Experiment No. 2
In experiment 2, the proposed method was applied to the prepared dataset using a training/testing ratio of 70/30. Multiple classifiers were used to evaluate the proposed method, with recall and accuracy as the main calculated parameters. The results of this experiment are presented in Tab. 5. Cubic SVM achieved the best accuracy, 98.8%, with a computational time of 21.93 s. The longest time in this experiment was 36.83 s, for Cubic KNN, and the shortest was 13.63 s, for Fine Tree. The recall rate and AUC of Cubic SVM were 98.67% and 1.00, respectively. Fine Gaussian SVM, Coarse KNN, and Cubic KNN also performed well, achieving accuracies of 98.8%, 98.7%, and 98.6%, respectively. The remaining classifiers also performed well but did not match Cubic SVM. The recall rate of each classifier can be verified through the TPRs given in Tab. 5. In addition, the recall rate of Cubic SVM can be verified from its confusion matrix in Tab. 6, in which the diagonal values show the TPRs (correct predictions). The highest prediction rate, 100%, was obtained for Soreshine, and the prediction rate of each of the other classes was 98%. This experiment showed that increasing the number of testing samples increased the testing computational time, whereas the average accuracy increased by 1%. This suggests that the proposed method is not strongly affected by a smaller number of training images.

Results Experiment No. 3
Experiment 3 was performed on the prepared dataset using a training/testing ratio of 50/50. The evaluation was performed with multiple classifiers, and the recall and accuracy of each classifier were computed; Tab. 7 presents the results of this experiment. Cubic SVM achieved the best accuracy, 98.6%, with a computational time of 29.02 s (see Fig. 5). Fig. 5 also shows that the computational time of the testing phase increased with the number of testing samples. The longest time in this experiment was 49.11 s, for Cubic KNN, and the shortest was 20 s, for Fine Tree. The recall rate and AUC of Cubic SVM were 98.6% and 1.00, respectively. Medium KNN also performed well, achieving an accuracy of 98.5%. Overall, every classifier maintained strong performance as the number of testing samples increased. The TPRs and ROC values were also recorded for each classifier, as shown in Tab. 8. Tab. 8 presents the confusion matrix of Cubic SVM for experiment 3, in which the diagonal values show the TPRs (correct predictions). The highest prediction rate, 100%, was obtained for the Soreshine disease.

Conclusion
This study focused on the identification of three common cotton leaf diseases using deep learning. A dataset was prepared for the three cotton diseases, and a pre-trained deep model was reused through TL. Features were extracted from different layers and serially combined into one vector, giving a richer representation than single-layer features. However, fusion can also introduce redundant features that affect the system's accuracy; therefore, we applied the GA to select the best features. The features selected through the GA maintained accuracy and reduced the computational time during testing. Three experiments were conducted, and a maximum accuracy of 98.8% was achieved with the 70/30 training/testing ratio. Overall, the proposed method performed strongly on the prepared dataset. In the future, other cotton diseases will be considered for recognition. Moreover, the following improvements will be considered: (a) collecting more images to enlarge the dataset; (b) using the latest CNN models for the recognition of cotton diseases; (c) selecting features with other meta-heuristic techniques; and (d) minimizing the recognition time for easy real-time processing.
Funding Statement: This work was supported by the Soonchunhyang University Research Fund.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.