Disease Detection in Apple Leaves Using Image Processing Techniques

The agricultural sector in Saudi Arabia constitutes an essential pillar of the national economy and food security. Crop diseases are a major problem of the agricultural sector and significantly affect economic development in many countries around the world. This study employed three prediction models, namely CNN, SVM, and KNN, with different image processing methods to detect and classify apple plant leaves as healthy or diseased. These models were evaluated using the Kaggle New Plant Diseases database. This study aims to help farmers detect diseases and prevent them from spreading. The proposed method recommends an appropriate solution for each type of recognized plant disease based on the classification results.

Keywords-plant disease; apple; deep learning; machine learning

I. INTRODUCTION
Agriculture plays a substantial role in modern life and contributes to economic development by providing food security [1]. The agricultural sector in Saudi Arabia constitutes an essential pillar of the national economy and food security [2]. Crop diseases are one of the main problems of the agricultural sector and significantly affect economic development in many countries, as they reduce both the quantity and the quality of crops [3]. The resulting crop damage causes significant productivity losses. According to the Food and Agriculture Organization [4], the world population will exceed 9.1 billion by 2050, and food production should increase by 70% to meet the nutritional needs of such a large population.
Crop diseases can be challenging to diagnose. Some diseases, especially scab, black rot, and cedar rust, are particularly dangerous: if left untreated, they can spread quickly to other parts of the leaf and eventually lead to the death of entire crops [5]. In addition, manual diagnosis of crop diseases is expensive, time-consuming, and prone to errors because each plant must be analyzed individually [6]. Furthermore, consulting an expert about crop disease problems is too expensive for many farmers, especially in developing countries. Hence, it is crucial to detect and recognize these diseases in their initial stages to prevent them from spreading. The continuous development of Computer Vision (CV) and Digital Image Processing (DIP) has had a tremendous impact on the detection of diseases, as these techniques are used to improve image quality and aid diagnosis [7,8].
Improving the accuracy of diagnostic techniques through image processing and classification methods is crucial to address the challenges of manual diagnosis of plant diseases [9,10]. This has inspired several researchers to build efficient algorithms using Machine Learning (ML) and Deep Learning (DL) models to provide accurate diagnoses of plant leaf diseases [11,12]. This study employed three prediction models, namely CNN, SVM, and KNN, to analyze photos of apple plant leaves and classify them as healthy or diseased. Several image pre-processing methods were also applied to improve the quality of the images and remove noise that could affect the classification results. Segmentation techniques were used to isolate the correct disease spot and extract the most relevant features, and the proposed models were evaluated on a large and reliable database.
II. RELATED WORK
The methodologies used by previous researchers to classify plant diseases fall into two categories: DL-based and ML-based approaches. In [8], several ML-based classifiers were proposed for the detection and classification of plant diseases. A dataset of 14,956 images, classified into 38 distinct classes, was used. The Random Forest (RF) model achieved the highest accuracy, up to 73%. Several studies evaluated the performance of various DL methods in detecting plant diseases. Some studies used multiple CNN architectures, such as AlexNet, GoogLeNet, InceptionV3, ResNet-101, ResNet-50, ResNet-34, ResNet-18, and VGGNet-16, to detect and classify apple plant diseases. In [13,14], a VGG-INCEP model that combines VGGNet-16 with InceptionV3 was proposed to detect apple leaf diseases. This model had higher accuracy than several prominent deep CNNs. In [15], an SVM classifier and transfer learning with DL architectures, such as DenseNet, VGG, ResNet, and GoogLeNet, were used to detect apple plant diseases. The results showed that SVM had low accuracy, while the transfer learning models had the highest. The Plant Village dataset, with 1,821 labeled images of apple plant leaves in three disease classes and one healthy class, was utilized.
Deep CNNs built on the AlexNet platform were used in [16] to detect apple leaf diseases. The dataset had a total of 13,689 images and the highest accuracy was 97.62%. In [17], deep CNN architectures and ML methods, such as SVM and KNN, were evaluated. The dataset had a total of 54,000 images of different plant diseases, and the highest accuracy was achieved by the SVM model. In [18], a CNN segmentation model trained on tomato leaf images was proposed and deployed in a smartphone application that can be used for real-time diagnosis and detection of diseases and for monitoring and managing the early growth stages of tomato plants. In [19], a hybrid framework of DL and SVM was proposed, and the results showed that the SVM classifier outperformed the softmax function.

III. BASIC STEPS FOR DISEASE DETECTION
Recent developments in DL and ML appear to improve the achievable accuracy of plant disease detection [20]. The main stages of the proposed model are shown in Figure 1. It starts with pre-processing and enhancing the input images, followed by segmentation, feature extraction, and classification.

A. Image Pre-Processing
The first step in any CV-based system is to pre-process the raw images [21]. Image pre-processing is necessary to ensure uniformity across all images, which makes the subsequent segmentation and feature extraction steps more accurate and effective. For this reason, background noise should be removed and the images converted to a suitable color space, which helps to focus on the region of interest of an image [22]. All plant images were converted from RGB to LAB color space, as the latter decouples color and lightness, enhancing object detection [23].

1) CIELAB Color Model
CIELAB is a color space and color characterization system based on human vision, released by the International Commission on Illumination (CIE) in 1976. It is considered one of the most common and accurate color spaces for detecting affected areas in plants, because these areas differ in color and can be identified from the color channels independently of intensity [24,25]. The images were first converted from RGB to CIE XYZ [26] and then from CIE XYZ to LAB [27]. CIELAB uses a three-dimensional system where L represents lightness, A represents colors ranging from green to red, and B represents colors ranging from blue to yellow [25]. Figure 2 shows the RGB to LAB color conversion.
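
The RGB to LAB conversion can be illustrated with a minimal sketch. It assumes 8-bit sRGB input and the D65 reference white, which is the common convention but is not necessarily the exact variant used in [26,27]. The function name `srgb_to_lab` is hypothetical and chosen only for this example.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (assumed D65 white point)."""

    # Step 1: undo the sRGB gamma curve to obtain linear RGB in [0, 1].
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Step 2: linear RGB -> CIE XYZ with the standard sRGB/D65 matrix.
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl

    # Step 3: XYZ -> LAB relative to the reference white (Xn, Yn, Zn).
    xn, yn, zn = 0.9505, 1.0, 1.0890

    def f(t):
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    lightness = 116 * fy - 16      # L: 0 (black) to 100 (white)
    a_star = 500 * (fx - fy)       # A: negative = green, positive = red
    b_star = 200 * (fy - fz)       # B: negative = blue, positive = yellow
    return lightness, a_star, b_star
```

In this representation, a healthy green leaf pixel has a negative A value, so the A and B channels can separate discolored lesions from healthy tissue without depending on how brightly the leaf is lit.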

B. Image Segmentation
In many cases, images contain elements that affect the detection results, such as an unwanted background [28]. For this reason, it is important to apply image segmentation to isolate the correct disease spot, which also helps the feature extraction process [22]. Image segmentation divides data into groups with comparable properties to simplify subsequent analysis [29]. Several segmentation approaches have been proposed [30,31], including K-means clustering [32]. K-means clustering segmentation categorizes pixels by minimizing the sum of squared distances between the objects and the centers of the clusters they belong to [33]. The segmentation stage consisted of three steps. First, K-means clustering was used to group pixels and objects into K classes based on the feature set [34]. Subsequently, the flood fill method was used to fill the holes in the binary image, giving the connected pixels of each hole the same color as the surrounding region [35]. Finally, the mask obtained after the flood fill step was applied to the original RGB image to obtain the section containing the required characteristics [36], which gives a background-free image. Figure 3 shows the steps of the image segmentation process. The last image presents only the leaf, as the background containing all unnecessary information was eliminated.
Fig. 3. Steps of the image segmentation process.
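
The three segmentation steps (K-means clustering, flood fill, and masking) can be sketched as follows. This is a simplified illustration, not the implementation used in the study: it clusters a single-channel image with K = 2 and assumes that the leaf forms the brighter cluster, whereas the study masks the RGB image. The function names `kmeans_1d`, `fill_holes`, and `segment` are hypothetical.

```python
from collections import deque


def kmeans_1d(values, k=2, iters=20):
    """Cluster scalar pixel values into k >= 2 groups; return one label per value."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spread evenly over the value range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: (v - centroids[c]) ** 2) for v in values]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def fill_holes(mask):
    """Set to 1 every 0-pixel that is not connected to the image border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    # Seed the flood fill with all background pixels on the border.
    for y in range(h):
        for x in range(w):
            if (y in (0, h - 1) or x in (0, w - 1)) and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # Spread through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Anything the fill did not reach is either foreground or an enclosed hole.
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]


def segment(gray):
    """Return (background-free image, mask) for a single-channel image."""
    h, w = len(gray), len(gray[0])
    labels = kmeans_1d([v for row in gray for v in row], k=2)
    # Assumption: label 1 (the brighter cluster) is the leaf.
    mask = fill_holes([[labels[y * w + x] for x in range(w)] for y in range(h)])
    masked = [[gray[y][x] if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    return masked, mask
```

Filling the holes matters because dark lesions inside the leaf can fall into the background cluster. Without this step, the mask would remove exactly the diseased regions that the classifier needs to see.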

C. Feature Extraction
Feature extraction is a critical step in any CV system based on ML. Several feature extraction methods are used to obtain significant features for the disease classification stage. Some of the essential features adopted in plant disease detection are color, shape, and texture features [37-39]. Many researchers used techniques such as grey-level co-occurrence matrices and the Local Binary Pattern (LBP) for the extraction of texture features [40,41]. This study focused on texture because it is the most common and important feature used to detect different plant diseases. The LBP method [8] was used to extract the features, considering the texture and color of the image [41].
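
A minimal sketch of LBP feature extraction is given below. It assumes the basic 3x3 operator with 8 neighbors applied to a single-channel image of at least 3x3 pixels, and a normalized 256-bin histogram as the feature vector. The exact LBP variant and parameters used in the study are not stated, and the function names `lbp_image` and `lbp_histogram` are hypothetical.

```python
def lbp_image(gray):
    """Return the 8-neighbor LBP code of every interior pixel."""
    # Neighbor offsets in clockwise order, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(gray), len(gray[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbor at least as bright as the center sets one bit.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(gray):
    """Normalized 256-bin histogram of LBP codes, usable as a feature vector."""
    hist = [0] * 256
    total = 0
    for row in lbp_image(gray):
        for code in row:
            hist[code] += 1
            total += 1
    return [count / total for count in hist]
```

Because each code depends only on whether a neighbor is brighter or darker than the center pixel, the histogram describes local texture and is largely insensitive to uniform changes in illumination.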

D. Classification Method Based on Machine Learning
Classification is a supervised ML approach used in many applications, such as disease prediction [42], where the algorithms learn from a labeled dataset how to classify new samples. To evaluate the apple classifier, two ML algorithms were used: SVM and KNN. SVM is a popular classifier that has been used for plant disease detection and classification [19,42]. It determines whether plants are diseased or healthy by optimizing a hyperplane that separates the feature vectors of the data samples into two classes. First, the LBP features are extracted from the segmented images, and then the SVM is trained on the extracted features. KNN is another classifier that has been used in plant disease classification [18]. Its implementation includes the following steps:
• Choose the number of neighbors K to consider for a new sample.
• Find the K closest neighbors using a distance measure.
• Derive the classification from these neighbors by majority vote.
Plants are classified as diseased or healthy by finding the training points nearest to a new sample. With K = 1, if the nearest neighbor is diseased, the plant is classified as diseased. K is usually kept small: values of 1, 3, or slightly higher are commonly employed, and values above 1 help to reduce the effect of noisy data. Finally, the KNN is trained on the LBP features extracted in the previous stage.
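
The KNN procedure described above can be sketched in a few lines. The sketch assumes Euclidean distance and a simple majority vote. The function name `knn_predict` is hypothetical, and the feature vectors in the usage example are toy values rather than LBP features from the dataset.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training points."""
    # Distance from the new sample to every training feature vector.
    distances = sorted(
        (math.dist(features, sample), label)
        for features, label in zip(train_x, train_y)
    )
    # Keep the labels of the k closest neighbors and take the most frequent one.
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy usage with two-dimensional feature vectors (illustrative values only).
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["healthy", "healthy", "healthy", "diseased", "diseased", "diseased"]
prediction = knn_predict(train_x, train_y, [5.5, 5.5], k=3)
```

Since KNN stores the whole training set and computes a distance to every stored sample at prediction time, its cost grows with the size of the dataset.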

E. Classification Method Based on Deep Learning
In DL, CNNs are most commonly used for image recognition and classification [43]. In contrast to traditional feature extraction approaches, CNNs have been shown to learn robust, high-level features automatically from raw images and to outperform hand-crafted features. Convolution, pooling, and fully connected layers are the main components of a CNN architecture. This study used one of the most popular CNN models, the GoogLeNet architecture [43], which has been used successfully in CV challenges [14]. The goal is to employ this architecture to improve plant disease detection results. GoogLeNet is a 22-layer deep CNN and a variant of the Inception network, developed at Google [43]. The Inception network employs parallel convolutions in conjunction with a max-pooling layer, which allows it to capture multiple features at the same time. For this study, GoogLeNet was modified to suit the dataset and the classification task and was trained on the segmented images. The parameters of some layers of the network were changed as required.
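
The idea of parallel convolutions combined with max pooling can be illustrated with a toy sketch. It is not GoogLeNet: it uses a single input channel and fixed kernels supplied by the caller instead of learned weights, and it omits the 1x1 reduction layers and activation functions of the real Inception module. The function names `conv2d_same`, `maxpool3x3_same`, and `inception_block` are hypothetical.

```python
def conv2d_same(img, kernel):
    """2-D cross-correlation with zero padding; output size equals input size."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Pixels outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += img[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out


def maxpool3x3_same(img):
    """3x3 max pooling with stride 1; the window is clipped at the borders."""
    h, w = len(img), len(img[0])
    return [
        [
            max(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def inception_block(img, k1, k3, k5):
    """Apply 1x1, 3x3, and 5x5 convolutions and a max pool to the same input,
    then stack the four results as output channels."""
    return [
        conv2d_same(img, k1),
        conv2d_same(img, k3),
        conv2d_same(img, k5),
        maxpool3x3_same(img),
    ]
```

All four branches see the same input and produce feature maps of the same spatial size, so they can be concatenated along the channel axis. This is how the block combines responses from small and large receptive fields in a single layer.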

A. Classification Methods
During the experiment, multiple DL and ML models were implemented to evaluate their performance and determine which best fits this dataset and is most effective and efficient at classifying plant diseases. These models included CNN, KNN, and SVM. Accuracy and the Area Under the Curve (AUC) were used to analyze and compare their performance. Figures 4-6 show the confusion matrices of these models for the validation dataset. Figures 7-9 show the AUC for the GoogLeNet, SVM, and KNN classifiers, respectively.
The maximum accuracy of 98.5% was achieved with GoogLeNet. The DL classifier outperformed the ML classifiers and involved a simpler pipeline, as it does not need a separate feature extraction step before classification, which ML methods require. The DL classifier was also faster than the ML classifiers as the size of the dataset increased, whereas KNN and SVM became slower as the dataset grew. Furthermore, SVM works better for binary classification and KNN relies on majority voting, while the DL model learns to classify the 4 classes of apple leaves accurately on its own.
Table III presents a comparison with similar works. Many studies have been conducted to diagnose plant diseases using ML and DL, but as most of them used different datasets and evaluation measures, it is difficult to compare the results objectively. In [13], a GoogLeNet-based DL architecture was introduced for the identification of apple leaf diseases and achieved an accuracy of 94.12%. Several deep convolutional networks, one of which was GoogLeNet, were utilized in [14] for the detection of apple leaf diseases and were evaluated in terms of accuracy. The accuracy achieved by GoogLeNet in that study was 94.85%. ML-based classifiers and several DL models were utilized in [15], where SVM and GoogLeNet achieved accuracies of over 50% and 94%, respectively. A method to identify apple leaf diseases based on the GoogLeNet architecture was presented in [16] and obtained an accuracy of 95.69%. The authors of [8,17] used different datasets comprising apple and other plant diseases, so it is difficult to compare their results. ML-based classifiers, namely SVM and KNN, were presented in [8] to detect and classify plant leaf diseases, obtaining accuracies of 67.27% and 63.20%, respectively. An SVM classifier using LBP feature extraction was presented in [17] for the detection and classification of plant leaf diseases and obtained an accuracy of 80.6%.
These results show that the proposed method performed better than the others, as it attained 98.5% accuracy with GoogLeNet, 82.25% with SVM, and 70.3% with KNN. Therefore, image pre-processing and the extraction of informative features play an important role in increasing the accuracy and effectiveness of the models. Moreover, as the approaches in [8,13-17] have a higher computational cost, the proposed method is more successful and efficient for classifying plant diseases.
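
The evaluation measures used in this section can be computed as in the following sketch. The AUC function handles the binary case only and uses the rank interpretation of AUC. How the AUC was obtained for the four apple classes is not stated in the text, and a one-vs-rest scheme is only one possible choice. The function names `confusion_matrix`, `accuracy`, and `binary_auc` are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(truth == pred for truth, pred in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(y_true, scores):
    """AUC as the probability that a random positive sample receives a higher
    score than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Accuracy summarizes the diagonal of the confusion matrix in a single number, while the AUC measures how well the classifier scores rank diseased samples above healthy ones regardless of the decision threshold.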

B. Plant Disease Detection System
A Graphical User Interface (GUI), shown in Figure 10, was built for the proposed system to help users detect plant diseases and get appropriate pesticide recommendations to treat any disease found. The screen is divided into three sections: the left part (input panel) is used to load or delete an image, the middle section displays the loaded image, and the right part (output panel) shows the classification result and the appropriate pesticides for the disease. Figure 11 presents a sample of results. As shown, the system detected apple scab disease and recommended a sulfur pesticide to treat it. Figure 12 shows the result when no disease is detected and no pesticide is required. The user can press the delete button to test other images.

VI. CONCLUSION
This study aimed to prevent the spread of apple leaf diseases by detecting them in their early stages. To achieve that, 240 images were obtained and categorized as healthy and diseased. The images were pre-processed to improve their quality, and the Local Binary Pattern (LBP) method was used to extract features from the segmented region of interest. Three prediction models were applied: CNN, SVM, and KNN. GoogLeNet achieved the highest accuracy at 98.5%. A user-friendly GUI was created for the system that displays the classification results and recommendations for the treatment of diseases. During the investigation, some challenging problems arose because the symptoms of different diseases can be nearly identical and can appear simultaneously. In the future, the system is intended to be ported to a mobile application to make it easier to use and to provide faster diagnoses.