Corn Leaf Diseases Diagnosis Based on K-Means Clustering and Deep Learning

Accurate diagnosis of corn crop diseases is a complex challenge faced by farmers during the growth and production stages of corn. To address this problem, this paper proposes a method based on K-means clustering and an improved deep learning model for accurately diagnosing three common diseases of corn leaves: gray spot, leaf spot, and rust. To diagnose the three diseases, the K-means algorithm is first used to cluster the sample images, which are then fed into the improved deep learning model. This paper investigates the impact of various k values (2, 4, 8, 16, 32, and 64) and models (VGG-16, ResNet18, Inception v3, VGG-19, and the improved deep learning model) on corn disease diagnosis. The experimental results indicate that the method performs best on 32-means samples, where the diagnostic recall of leaf spot, rust, and gray spot disease is 89.24%, 100%, and 90.95%, respectively. Similarly, VGG-16 and ResNet18 also achieve their best diagnostic results on 32-means samples, with average diagnostic accuracies of 84.42% and 83.75%, respectively. In addition, Inception v3 (83.05%) and VGG-19 (82.63%) perform best on the 64-means samples. For the three corn diseases, the approach proposed in this paper has an average diagnostic accuracy of 93%. It is more effective than the other four approaches and can be applied in the agricultural field to protect crops.


I. INTRODUCTION
Corn is currently the highest-yielding food crop in the world and an important source of food and industrial raw material. The stable and healthy development of corn production plays a pivotal role in food security, farmers' income growth, and the national economy. Corn diseases directly affect its yield and quality. There are more than a dozen common diseases in corn, most of which occur in leaves, ears, and roots. Among them, leaf spot and rust are typical [1]. Leaf spot produces oval, rectangular, or spindle-shaped lesions on the leaves, surrounded by yellow-brown halos, 5-10 cm long and 1.2-1.5 cm wide. In severe cases, several lesions are connected, and the leaves die early. Rust mainly occurs in the middle and upper leaves of the plant. At first, small light-yellow dots appear, scattered or clustered, on the front of the leaf; they then protrude and expand into round to oblong, yellowish-brown or brown spots, and the surrounding epidermis turns up. Gray leaf spot, also known as corn Cercospora leaf spot and corn mildew, is a more severe disease. In its initial stage, light brown spots resembling water stains appear, which extend parallel to the veins and are often rectangular. However, the diagnosis of corn diseases has mainly relied on agricultural experts for field identification. This method has many shortcomings, such as subjectivity, high cost in time and energy, and low efficiency [2]. Therefore, it is vital to be able to identify corn leaf diseases accurately and quickly.
The associate editor coordinating the review of this manuscript and approving it for publication was Binit Lukose.
With the enhancement of computer data processing capabilities, machine learning technology combined with image processing is becoming increasingly popular in the intelligent diagnosis of plant diseases [3]. Many accomplishments have been made [4]-[10]. Local Gray Gabor Pattern (LGGP), developed by Patil et al., is a texture feature that combines the local binary pattern (LBP) and the Gabor filter. For soybean leaves infected with the mosaic virus, brown spot, and pod mottle, the detection efficiency is about 96%, 68%, and 76%, respectively [11]. Johannes et al. reported an image processing algorithm that uses candidate hot spot detection and statistical inference methods to analyze the early symptoms of three European epidemic wheat diseases (septoria, rust, and brown spots) [12]. The above studies all extract disease features through manual design. Although good results have been achieved, they also have disadvantages such as difficulty in feature extraction, poor adaptability, and weak anti-interference ability.
To improve the accuracy of corn leaf disease diagnosis, this paper proposes a method that combines K-means clustering with deep learning, using transfer learning to train the deep learning model and exploring different K values. The proposed CNN model is compared with classic deep learning models such as VGGNet and ResNet to study the impact of clustering on diagnosis results. The aim of this work is to effectively diagnose three common corn leaf diseases, so that the method can be applied in the agricultural sector for crop protection.
The main contributions of this work are summarized as follows: The first is to use K-means clustering to segment disease images. The second is to propose a deep learning model for corn leaf disease diagnosis, which is improved based on VGG-19. The third is that our method can be used to classify and diagnose corn leaf diseases.
The rest of this paper is organized as follows: Section II reviews the related work. Section III introduces the corn data and preprocessing methods and proposes a deep learning framework for corn disease diagnosis. Section IV explains the experimental process and analyzes the experimental results. Section V summarizes the whole work and points out directions for future work.

II. RELATED WORK
Accurate modeling and finding the most critical factors in the analysis are among the required steps in the preprocessing stage [13]. The convolutional neural network, in turn, is critical for feature extraction [14]. It can automatically extract image features and has good adaptability to image displacement, scaling, and distortion [15]. Therefore, deep learning models are applied in current research because of their excellent efficacy [16]. Deep learning is rapidly becoming the standard technology for image classification [17]. It has been applied to many fields, such as medical image recognition, remote sensing image recognition, autonomous vehicle driving, face recognition, text clustering, lunar impact crater identification, age estimation, and epidemic prevention and control [18]-[31]. In the field of agriculture, many studies have been conducted on the classification of plant pests and diseases [32]-[37], covering crops such as tea [38], apple [39], [40], rice [41], mango [42], and cucumber [43].
Saeed et al. [44] proposed an automated crop disease identification system that was evaluated algorithmically on tomato, corn and potato crops. They used partial least square (PLS) regression, fusion and selection of features extracted by the CNN model, which were then passed to multiple classifiers to obtain the final recognition. The average accuracy achieved by the PLS-based fusion and selection method is about 90.1%, which not only improves the recognition accuracy but also reduces the computation time.
Almadhor et al. [45] developed an artificial intelligence (AI) driven framework to detect and classify the most common guava plant diseases. They constructed a high-resolution guava image dataset. ΔE color-difference image segmentation was used to isolate guava disease regions, and a combination of color and texture features was applied instead of individual channels for disease detection and classification. The best recognition results were obtained on a set of RGB, HSV, and LBP features.
Oyewola et al. [46] proposed a technique to detect cassava mosaic diseases. The dataset was expanded and the cassava disease images were balanced by unique block processing. Some of the images have low contrast and poor resolution; these were improved using gamma correction and decorrelation stretching, which enhances the color separation of images with significant band correlation. In this work, the researchers chose eight performance metrics to evaluate the proposed model.
Cap et al. [47] proposed a novel suppressor superresolution method (LASSR) specifically for the diagnosis of leaf diseases. LASSR detects and suppresses artifacts to a large extent and can generate high-resolution images, thus improving the performance of automatic diagnosis of plant leaf diseases. Experiments with this method on five cucumber diseases showed that training with data generated by LASSR yielded an improvement of over 21% on unseen test data sets compared with the baseline.

A. DATA ACQUISITIONS AND PROCESSING
1) DATASET
The corn data set used in this study is from the Crop Disease Recognition track of the 2018 Artificial Intelligence Challenger Competition (challenger.ai). Three types of corn leaf disease (gray spot, rust, leaf spot) were selected for diagnosis. Three hundred images of each disease were chosen, resulting in a total of 900 disease images. Some of the disease images are shown in FIGURE 1.

2) IMAGE PREPROCESSING USING K-MEANS CLUSTERING
The K-means algorithm is often used in image segmentation. Corn leaf disease images contain rich color information, and observation of the disease images shows that the background, disease spots, and uninfected areas have apparent color differences. Therefore, using the K-means algorithm to cluster disease images can remove some noise reasonably and efficiently and facilitate subsequent diagnosis. The K-means clustering algorithm requires a predetermined number of clusters K, to which the pixels of the image are assigned, and the choice of K affects the image clustering result. In this study, 2-means, 4-means, 8-means, 16-means, 32-means, and 64-means clustering were performed on corn disease images.
VOLUME 9, 2021
A Convolutional Neural Network (CNN) is a neural network employed to recognize and classify images. It is one of the representative algorithms of deep learning and has achieved outstanding results [48], [49]. In identifying plant species and diseases [50], studies have shown that CNNs can be more competitive in performance than traditional feature extraction methods [51], [52]. A typical CNN architecture mainly includes convolutional layers, pooling layers, and fully connected (FC) layers, which are described below.
The convolutional layer is the core component of a CNN; it extracts specific features of images through convolution kernels of different sizes. After multiple applications of the convolutional layer, a set of feature maps of the input image can be extracted. Let $H_i$ represent the feature map of the i-th layer of the CNN, which is defined as follows:
$$H_i = \phi(H_{i-1} \otimes W_i + b_i) \tag{1}$$
where $H_i$ is the feature map of the current layer, $H_{i-1}$ is the feature map of the previous layer, $W_i$ represents the weight of the i-th layer, $b_i$ is the bias vector of the i-th layer, and $\phi(\cdot)$ represents the convolution function, in which $\otimes$ denotes the convolution operation.
The pooling layer is sandwiched between consecutive convolutional layers and has no trainable parameters. It downsamples the output of the previous layer, reducing the dimensionality of each feature map while retaining most of the relevant details. By compressing the amount of data and parameters, pooling helps avoid overfitting. Assuming that $f_i^l$ represents the output feature of the i-th local perception in the l-th pooling layer, $f_i^l$ can be expressed as follows:
$$f_i^l = \mathrm{down}(f_i^{l-1}, s) \tag{2}$$
where $\mathrm{down}(\cdot)$ represents the downsampling function, $f_i^{l-1}$ is the feature vector of the previous layer, and $s$ is the pooling size.
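The convolution and pooling operations described above can be illustrated with a minimal pure-Python sketch. The toy image, the kernel values, and the function names `conv2d_valid` and `max_pool` are illustrative assumptions, not taken from the paper; as in most deep learning frameworks, the kernel is applied without flipping (cross-correlation), and no padding is used.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and add `bias`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, s=2):
    """Downsample with non-overlapping s x s windows, keeping each maximum."""
    out_h = len(feature_map) // s
    out_w = len(feature_map[0]) // s
    return [
        [
            max(feature_map[r * s + i][c * s + j]
                for i in range(s) for j in range(s))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 6x6 single-channel "image" with a bright vertical stripe (assumed data).
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# Hypothetical 3x3 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 0, 1]] * 3

feature_map = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool(feature_map, s=2)        # 2x2 map after 2x2 pooling
```

The pooled map has a quarter of the entries of the feature map but still shows where the stripe's left edge (positive response) and right edge (negative response) lie.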
Generally, one or several FC layers sit between the stacked convolutional and pooling layers and the output layer. The purpose of the FC layer is to use the extracted features to classify images. The Softmax function is usually used to classify and predict the features extracted from the previous layer. Softmax is defined in Equation 3:
$$\mathrm{Softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}} \tag{3}$$
where $z_j$ is the output value of the j-th node, and $K$ is the number of output nodes, that is, the number of categories. Through the Softmax function, the multi-category output values are converted into a probability distribution in which each value lies in the range [0, 1] and the values sum to 1.
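As a small illustration of the Softmax function described above, the following sketch converts three raw output values into class probabilities. The score values and the class order are hypothetical; subtracting the maximum before exponentiation is a common numerical-stability step and does not change the result.

```python
import math


def softmax(z):
    """Map raw output values z_1..z_K to probabilities that sum to 1."""
    shift = max(z)  # guards against overflow; the probabilities are unchanged
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the last FC layer for (gray spot, leaf spot, rust).
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most probable class
```

The predicted class is simply the node with the largest probability, which is also the node with the largest raw score.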
After the convolutional layer and the FC layer, there is a Rectified Linear Unit (ReLU) layer, which uses ReLU as the activation function. ReLU usually refers to the ramp function in mathematics, that is:
$$f(x) = \max(0, x) \tag{4}$$
In the neural network, the ReLU function is used as the activation function of the neuron and gives the nonlinear output of the neuron after the linear transformation $w^T x + b$. In other words, for the input vector $x$ passed from the previous layer into the neuron, a neuron using the ReLU function outputs $\max(0, w^T x + b)$ to the next layer of neurons or as the output of the entire neural network.
ReLU enhances the nonlinear characteristics of the entire neural network without changing the convolutional layer or FC layer itself.
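The behavior of a single ReLU neuron can be sketched as follows. The weights, bias, and inputs are made-up values chosen only to show one inactive and one active case.

```python
def relu(x):
    """Ramp function: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


def neuron_output(w, x, b):
    """Linear transformation w^T x + b followed by the ReLU activation."""
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return relu(pre_activation)


w = [0.5, -1.0]  # hypothetical weights
b = 0.5          # hypothetical bias

inactive = neuron_output(w, [2.0, 3.0], b)  # w^T x + b = -1.5, clamped to 0
active = neuron_output(w, [4.0, 1.0], b)    # w^T x + b = 1.5, passed through
```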

1) VGGNET
Researchers from the Oxford University Visual Geometry Group and Google DeepMind developed VGGNet, which achieves excellent classification performance among convolutional neural networks. It contains 16 or more convolutional layers, pooling layers, and FC layers. The most prominent feature of VGGNet is that, through the combination and stacking of 3 × 3 convolution kernels, finer features in the input field are extracted [53]. As shown in FIGURE 4, by constructing and combining multiple 3 × 3 convolution kernels, the same receptive field as a 5 × 5 or 7 × 7 convolution kernel can be obtained. A stack of small convolution kernels has better nonlinearity than a single larger convolution kernel. VGGNet divides the network into five parts, each comprising multiple 3 × 3 convolutional layers in series followed by a maximum pooling layer; these parts are followed by three FC layers and a Softmax classification layer. VGGNet includes several network variants, whose depth varies from 11 to 19 layers. The most commonly used ones are VGG-16 and VGG-19.
FIGURE 5 provides a network structure diagram of VGG-19. The VGG-19 network contains 19 weight layers: 16 convolutional layers and three FC layers. The input is a 224 × 224 × 3 image. All convolutional layers use 3 × 3 convolution kernels. Every 2 or 4 convolutional layers are stacked consecutively to form a convolution sequence that simulates a larger receptive field. In order to maintain the translation invariance of the model, a 2 × 2 pooling window is employed in the pooling layer, which reduces the size of the feature map after convolution. Three consecutive FC layers follow, with 4096, 4096, and 1000 channels, and the result is finally classified and output by the Softmax classifier with 1000 labels.
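The equivalence between stacked 3 × 3 kernels and a single larger kernel can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions and counts kernel weights per input/output channel pair, ignoring biases; the function names are illustrative.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def kernel_weights(kernel_size, num_layers=1):
    """Kernel weights per (input channel, output channel) pair, no biases."""
    return num_layers * kernel_size * kernel_size


# Two stacked 3x3 layers see a 5x5 window; three see a 7x7 window.
rf_two = stacked_receptive_field(3, 2)
rf_three = stacked_receptive_field(3, 3)

# Weight counts: stacked 3x3 layers versus one large kernel.
weights_two_3x3, weights_one_5x5 = kernel_weights(3, 2), kernel_weights(5)
weights_three_3x3, weights_one_7x7 = kernel_weights(3, 3), kernel_weights(7)
```

Under these assumptions, the stacked layers cover the same window with fewer weights (18 versus 25, and 27 versus 49) while inserting additional nonlinearities between layers.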
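To make the VGG-19 layer description concrete, the following sketch traces the feature-map shape through the five convolutional parts. It assumes the standard VGG-19 configuration (2, 2, 4, 4, and 4 convolutional layers with 64, 128, 256, 512, and 512 channels) and a padding of 1 for every 3 × 3 convolution, so that only the pooling layers change the spatial size; the padding value is not stated in the text above.

```python
# (number of 3x3 conv layers, output channels) for each of the five parts.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def trace_vgg19(height=224, width=224):
    """Return the conv-layer count and the shape after each pooling layer."""
    conv_layers = 0
    shapes = []
    for num_convs, channels in VGG19_BLOCKS:
        conv_layers += num_convs  # padded 3x3 convs keep height and width
        height //= 2              # 2x2 max pooling with stride 2 halves both
        width //= 2
        shapes.append((height, width, channels))
    return conv_layers, shapes


conv_layers, shapes = trace_vgg19()
final_h, final_w, final_c = shapes[-1]
flattened = final_h * final_w * final_c  # input size of the first FC layer
weight_layers = conv_layers + 3          # plus the three FC layers
```

With a 224 × 224 input, five poolings reduce the map to 7 × 7 × 512, and the count of 16 convolutional plus three FC layers gives the 19 weight layers that name the network.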

2) OTHER CNNS
In addition to Oxford University's VGGNet [54], Google's Inception [55] and Microsoft's ResNet [56] models are also […] where x is the network input and H(x) is the output. The Inception module was the first to use branching processing of convolution kernels (also called a Bottleneck Layer), as shown in FIGURE 7. It introduces the idea of factorization into small convolutions, which splits a larger two-dimensional convolution into two smaller convolutions, such as splitting a 5 × 5 convolution kernel into a 5 × 1 and a 1 × 5 convolution kernel. This splitting of the convolutional structure can handle richer features and increase feature diversity.
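The saving from factorizing a 5 × 5 kernel can be shown with a short sketch. The kernel values are hypothetical, and the sketch covers only the purely linear case, in which a 5 × 1 convolution followed by a 1 × 5 convolution acts like a single 5 × 5 kernel equal to their outer product; in a real network a nonlinearity usually sits between the two layers.

```python
def outer(col, row):
    """5x5 kernel represented by a 5x1 kernel followed by a 1x5 kernel."""
    return [[c * r for r in row] for c in col]


col_kernel = [1, 2, 3, 2, 1]   # hypothetical 5 x 1 kernel
row_kernel = [1, 0, -1, 0, 1]  # hypothetical 1 x 5 kernel

full_kernel = outer(col_kernel, row_kernel)

factorized_weights = len(col_kernel) + len(row_kernel)  # 5 + 5
full_weights = len(full_kernel) * len(full_kernel[0])   # 5 * 5
```

The factorized pair covers the same 5 × 5 window with 10 weights instead of 25, at the cost of representing only separable kernels in this linear setting.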

C. TRANSFER LEARNING
To obtain better learning results, we use transfer learning, in which knowledge from one domain (the source domain) is transferred to another domain (the target domain). The ImageNet data set includes approximately 1.2 million images in 1000 categories, whereas the data set used in this study contains 900 images in three disease categories. By comparison, there is insufficient corn disease data to train a deep network from scratch, so transfer learning is used. Without adding data, the Inception V3, VGG-16, ResNet18, and VGG-19 networks are fine-tuned: each network is created and loaded with the weights pre-trained on ImageNet, and a new FC layer is connected.

D. PROPOSED FRAMEWORK
As mentioned above, VGGNet has robust and accurate classification capabilities. Because the model is highly portable, it is often used for transfer learning [57].
This paper proposes a corn disease diagnosis CNN model. It is an improvement over VGG-19, with five convolutional layers, five maximum pooling layers, two FC layers, and an output layer with a Softmax classifier. Experiments have shown that the 0-th, 5-th, 10-th, 19-th, and 28-th layers of the VGG-19 network have positive effects on feature extraction, so these five layers are included in the feature extraction part of the proposed CNN model. […] layer (convolution-1). After each convolutional layer from convolution-1 to convolution-5, a 2 × 2 maximum pooling layer is connected, with a stride of 2 and a padding of 0. The numbers of convolution kernels in convolution-1 to convolution-4 are 64, 128, 256, and 512, respectively, and convolution-5 also has 512 kernels. All convolution kernels are 3 × 3. The ReLU activation function is applied after each convolutional layer. The final FC layer has three neurons, one for each corn leaf disease, and its result is used as the input of the Softmax classifier.
The overall architecture of the proposed method is shown in FIGURE 9, and the plant disease diagnosis method is summarized as follows. First, the K-means clustering method is used to cluster the images. Then, for model training, the sample images are fed into the deep learning network proposed above, and the learned model is applied to the classification prediction of the test set images. Finally, the outcome of plant disease diagnosis is obtained. The procedure is briefly described in the following steps:
Step 1 Cleaning the Data: Remove duplicate data in diseased images. Because of the unequal distribution of disease data, 300 images of each disease were chosen randomly for the experiment.
Step 2 Image Resizing: Change all images to a fixed size of 224 × 224 pixels.
Step 3 Data Split: The disease images are divided into two parts, training and validation, with a ratio of 8:2. The training set images are clustered with different K values (2, 4, 8, 16, 32, 64).
Step 4 Model Training and Verification: Following the proposed method, use the training set to train the model, and employ the validation set to evaluate the model. To thoroughly verify the effectiveness of this method, the experiments were repeated multiple times.
Step 5 Test: Use images that were not involved in modeling to verify the effectiveness of the model. The output result is compared to the real label, and the relevant performance indices are computed.
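The clustering used in the steps above can be sketched in plain Python as color quantization of pixels. This is a minimal illustration under stated assumptions, not the authors' implementation: pixels are clustered in RGB space, the initial centers are taken at even intervals through the pixel list, and a fixed number of iterations is run; the text does not specify any of these choices. The pixel values are made up to mimic green leaf tissue and brown lesions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pixels(pixels, k, iterations=10):
    """Cluster RGB pixels into k groups with Lloyd's algorithm."""
    # Assumed initialization: k pixels taken at even intervals in the list.
    step = max(1, len(pixels) // k)
    centers = [tuple(float(v) for v in pixels[min(i * step, len(pixels) - 1)])
               for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in pixels]
        # Update step: each center moves to the mean color of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(ch) / len(members)
                                   for ch in zip(*members))
    return labels, centers


# Hypothetical pixels: four green (healthy) and four brown (lesion) colors.
pixels = [(34, 139, 34), (40, 150, 40), (30, 130, 30), (36, 142, 38),
          (139, 69, 19), (150, 75, 25), (130, 60, 15), (145, 72, 20)]

labels, centers = kmeans_pixels(pixels, k=2)
quantized = [centers[lab] for lab in labels]  # each pixel -> its cluster color
```

Replacing every pixel with its cluster center reduces the image to K colors, which is the sense in which larger K values (up to 64 in this study) preserve more detail while smaller values discard more.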

A. EXPERIMENTAL SETUP
The experimental environment is an Ubuntu 16.04 system, using the PyTorch framework for training and Python as the programming language. The computer has 8 GB of GPU memory and is equipped with an Intel(R) Xeon(R) CPU E5-2628 v3 processor.
In order to prevent model overfitting, we used 5-fold cross-validation and applied the dropout technique in the network structure. There are 720 training samples and 180 test samples for each fold.
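The sample counts per fold follow directly from splitting 900 images into five folds, as the sketch below shows. It is an index-only illustration with a hypothetical helper name; it performs no shuffling or per-class stratification, which a practical split would normally add.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train, test) index lists for contiguous, equal-sized folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for fold in range(n_folds):
        start = fold * fold_size
        # The last fold absorbs any remainder when n_samples is not divisible.
        stop = start + fold_size if fold < n_folds - 1 else n_samples
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test


folds = list(k_fold_indices(900, n_folds=5))
sizes = [(len(train), len(test)) for train, test in folds]
```

Each image appears in exactly one test fold, so averaging the five results uses every sample once for evaluation.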

B. PERFORMANCE METRICS
Define the confusion matrix $R_{ij}$, in which each column $j$ ($j = 1, 2, 3$) represents the class predicted by the classifier, and each row $i$ ($i = 1, 2, 3$) represents the true category to which the sample belongs. Several general metrics for evaluating the performance of multiclass models can be obtained from the confusion matrix.
Accuracy is the percentage of correctly labeled samples among all classified samples. It reflects the overall classification performance of the model on the data. Equation 6 shows its definition:
$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{3} R_{ii}}{\sum_{i=1}^{3}\sum_{j=1}^{3} R_{ij}} \tag{6}$$
Precision measures the proportion of instances predicted as class i that truly belong to class i (Equation 7):
$$\mathrm{Precision}_i = \frac{R_{ii}}{\sum_{j=1}^{3} R_{ji}} \tag{7}$$
Recall measures the proportion of instances labeled as class i that are predicted correctly. The recall calculation is given in Equation 8:
$$\mathrm{Recall}_i = \frac{R_{ii}}{\sum_{j=1}^{3} R_{ij}} \tag{8}$$
The F1-score (F1) is the harmonic mean of Precision and Recall (Equation 9):
$$F1_i = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Recall}_i}{\mathrm{Precision}_i + \mathrm{Recall}_i} \tag{9}$$
In other words, F1 conveys a balance between Precision and Recall. Although it is not as intuitive as accuracy, F1 is generally more valuable than accuracy, particularly when the class distribution is uneven.
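The metrics above can be computed from a confusion matrix with a few lines of Python. The matrix values below are hypothetical and are not the results reported in this paper; rows are true classes and columns are predicted classes, following the definition above, and every class is assumed to have at least one true and one predicted sample.

```python
def accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix, i):
    """Share of true class-i samples (row i) that were predicted as i."""
    return matrix[i][i] / sum(matrix[i])


def precision(matrix, i):
    """Share of samples predicted as i (column i) that truly are class i."""
    return matrix[i][i] / sum(row[i] for row in matrix)


def f1_score(matrix, i):
    """Harmonic mean of precision and recall for class i."""
    p, r = precision(matrix, i), recall(matrix, i)
    return 2 * p * r / (p + r)


# Hypothetical counts for (gray spot, leaf spot, rust); not from the paper.
R = [
    [50, 5, 0],
    [8, 52, 0],
    [0, 0, 60],
]

overall_accuracy = accuracy(R)
per_class = [(precision(R, i), recall(R, i), f1_score(R, i)) for i in range(3)]
```

In this toy matrix the third class is never confused with the others, so its precision, recall, and F1 are all 1, while the first two classes lose recall and precision to each other.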
The receiver operating characteristic (ROC) curve is a graphical tool widely used in classification problems to evaluate the accuracy of prediction models. It reflects the relationship between sensitivity and specificity. The X-axis is 1 − specificity (the false positive rate): the closer it is to zero, the higher the accuracy. The Y-axis is sensitivity (the true positive rate): the larger its value, the better the accuracy.
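A short sketch shows how ROC points, and the area under them, can be obtained for one class in a one-vs-rest setting. The labels and scores are invented for illustration, all scores are assumed to be distinct, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from the highest score to the lowest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical one-vs-rest data: 1 = the disease of interest, 0 = any other.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
auc_value = area_under_curve(points)
```

Here one negative sample outranks one positive sample, so the curve falls slightly short of the upper left corner and the area is 8/9 rather than 1.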
The area under the ROC curve is called AUC (Area Under Curve) and indicates the prediction accuracy: the higher the AUC value, that is, the larger the area under the curve, the higher the prediction accuracy. The closer the curve is to the upper left corner (the smaller the X and the larger the Y), the higher the prediction accuracy.
Compared with other networks, the proposed model has a simple structure: the number of parameters is 3.34E+09, and the number of operations is significantly reduced to 1.68E+07, which is second only to ResNet18, while its accuracy of 88.50% is higher than that of the other models.
The proposed model removes some convolutional layers and reduces the network depth based on VGG-19, which indicates that it is possible to obtain good classification results even if fewer feature maps are extracted.
From TABLE 3, for clustering preprocessing with different K values, the proposed CNN model, VGG-19, and Inception V3 achieve their best results on 32-means image data, whereas VGG-16 and ResNet-18 achieve theirs on 64-means image data, where their results are inferior to their performance on the original images. On clustered data with the same K value, the proposed CNN model ranks first on 4-means, 8-means, 16-means, 32-means, and 64-means data, exceeding the second-place model by 5.50%, 6.00%, 7.05%, 8.98%, and 5.86%, respectively. On 2-means data, it ranks second, which is still a good result. In addition, the proposed CNN model achieves its best performance on 32-means samples.
According to TABLE 3, the proposed CNN model is better than other models regardless of whether K-means preprocessing is performed, and effective K-means preprocessing will also bring more accurate diagnoses.
FIGURE 10 and FIGURE 11 depict the accuracy and loss curves of the different models over 50 epochs of training. VGG-16 and VGG-19 converge very quickly, and their training loss is finally about 0.3, but their accuracy on the validation set is not satisfactory. In FIGURE 10, the loss value of ResNet-18 is the largest among the models, reaching 0.4. After training, the maximum accuracy of our proposed method on the validation set reaches 96%, and the minimum loss is 0.27.
Since the literature [58]-[61] used the same AI Challenger data for similar crop disease research, it is comparable to the method in this article. TABLE 4 records the comparison between the method in this article and the methods in those four papers.
In the study [58], 61 kinds of crop diseases and insect pests were classified and identified based on a ResNet50 deep model. The authors pre-trained the model, adopted a fine-tuning strategy, and added some layers to complete the task of crop disease level detection. Using transfer learning, the detection and recognition accuracy of the final model is 88.65%, which is lower than the accuracy of the method in this paper. In [59], a residual dense network (RDN) based tomato leaf disease identification model was proposed. The RDN, originally designed for the image super-resolution task, was converted into a classification model by adjusting the model architecture. Part of the AI Challenger open-source data was used to identify 9 tomato diseases. The Late Blight Water Mold class had as many as 1536 samples, while the Target Spot Bacteria class had only 74. Although the final model reached 95% accuracy, the samples were imbalanced, and no solution to this problem was given. Ai et al. [60] designed the Inception-ResNet-v2 network model to identify 27 diseases of 10 crops, obtained 86.1% accuracy by training the model, and developed a WeChat applet for plant leaf image recognition. As in [59], the problem of sample imbalance is not considered; in addition, the structure is very complicated and the amount of calculation is huge. Xin et al. [61] proposed a multi-scale residual neural network, using AI Challenger data combined with self-collected samples to identify 8 types of grape diseases. Multi-scale ResNet introduces multi-scale convolution to change the response of the bottom layer of ResNet to features of different scales and uses an added SENet to improve the feature extraction ability of the network. Experimental results show that the average recognition accuracy of this method reaches 90.83%.
The above results show that, compared with existing related research, the model proposed in this paper is more cost-effective and has a competitive advantage in terms of the number of parameters and recognition accuracy, and is more suitable for deployment on devices with low hardware requirements.
During model training, the results on a single test set may be affected by chance and randomness. Therefore, in order to verify the stability and accuracy of the proposed model, the experiment uses 5-fold cross-validation to evaluate the model. TABLE 5 shows the model performance of each fold. It can be seen from the table that different data divisions have little influence on the results. For each single fold, the various indicators are distributed between 91% and 95%, with small fluctuations. After five calculations, the precision, recall, and F1 of the proposed model are finally stable at about 93%. Therefore, it is reliable to use the average value of the five-fold cross-validation as the final corn classification performance. FIGURE 12 shows the classification confusion matrix of the three corn leaf diseases. Following the definition of statistical parameters related to the confusion matrix, we analyzed and calculated three statistical parameters from the confusion matrix, as listed in TABLE 6, to better show the details of the proposed diagnosis method. According to TABLE 5 and FIGURE 12, it can be seen that 52 out of 55 gray spot diseased leaves were correctly identified, with a recall of 90.95%; 53 out of 62 leaf spot diseased leaves were correctly identified, with a recall of 89.24%; all 63 rust diseased leaves were correctly identified, with a recall of 100%. This shows that although our method has achieved good results in the average value of the various evaluation indicators, it has not yet reached the ideal diagnosis result for leaf spot diseased leaves. Nearly 17% of leaf spots were predicted to be gray spots. The reason is that the symptoms of gray spot and leaf spot are similar and difficult to distinguish, and some corn leaves have more than one disease. This is a challenge for the diagnosis of corn leaf disease.
In this paper, ROC curves for the three types of corn disease are drawn to further evaluate the classification and diagnosis capabilities of the model.
Although the method proposed in this article has achieved certain results in the classification and diagnosis of corn leaf disease, due to the small experimental sample, the simple sample background, and the limitations of the experimental conditions, it still has many problems that need to be solved in the future.
(1) In this work, our research on corn leaf diseases is based on a small data set, which contains fewer sample types, and most of the disease images are obtained under controlled conditions, which cannot meet the actual needs for disease identification. In future research, it is necessary to improve the disease data set, fully collect disease images under different conditions, and at the same time improve the diagnosis effect of the model in the actual complex environment.
(2) The model proposed in this paper currently addresses only the classification and diagnosis of corn leaf diseases, and its generalization ability needs to be further verified. In future research, we will extend it to other corn diseases and even other crops, and study the adaptability of the proposed model, in order to further improve it and achieve satisfactory extrapolation ability.
(3) In addition, although we have proved that image preprocessing with a suitable K-means clustering algorithm is helpful for the classification and diagnosis of corn leaf disease, an exhaustive brute force search is used for the selection of effective values of K. Future work will be committed to applying swarm intelligence optimization methods [62]- [65] to find the most suitable value of K.

V. CONCLUSION
This paper proposes a method based on K-means and an improved deep learning network to identify corn leaf diseases. K-means clustering algorithm is used in the image preprocessing stage to perform simple segmentation of diseased images. Compared with images that have not been pre-processed by clustering, proper K-means clustering can greatly improve the performance of model classification and diagnosis. In addition, under the same experimental conditions, compared with the traditional networks VGG-16, VGG-19, ResNet18, and Inception V3, the proposed deep learning network structure in this article is simple, reducing the number of parameters and the number of model calculations, and at the same time, it has higher performance than other models. Therefore, the proposed method is suitable for deployment on a disease recognition platform with low hardware conditions.