Abstract

Pathogens, including viruses, bacteria, and fungi, are the biotic agents that cause illnesses in crops and are the primary cause of yield losses of up to 16 percent in some parts of the globe. Deep learning algorithms, which are at the cutting edge of technology, are now being used to identify crop disease at an earlier stage. Supervised learning (support vector machine and K-nearest neighbor), ensemble learning (random forest and AdaBoost), and deep learning (neural network) approaches were used in this study to classify paddy leaf diseases, including bacterial leaf blight, blast, hispa, leaf spot, and leaf folder. To evaluate the performance of the learning approaches, accuracy, recall, precision, F1-score, and area under the receiver operating characteristic curve (ROC AUC) were used. According to the results of the investigation, as the fold value grows, the value of the evaluation metrics (AUC, CA, F1, precision, and recall) increases progressively, i.e., by roughly 0.001 compared with the values obtained with the previous folds. The assessment metrics demonstrate that the neural network performs much better than the baseline classifiers.

1. Introduction

Agriculture has played a vital role in the establishment of India’s monetary system. The farmer chooses the crop to be grown based on the type of land, the climatic conditions of the surrounding region, and the crop’s monetary value. Following the country’s rapid population increase, coupled with global warming and climate change, agricultural firms began looking for other methods of expanding their food production. This drives experts to seek new high-utility innovations that are both powerful and precise. By applying precision agriculture within information technology, farmers can acquire the data and facts needed to make the best possible decisions for high farm output. Precision agriculture (PA) is a cutting-edge technology that provides sophisticated techniques for improving field production, and its importance is increasingly acknowledged. Financial development in the agricultural sector can be achieved by applying new technological breakthroughs. PA may be beneficial in a variety of applications, including plant insect detection, weed identification, agricultural production enhancement, and the discovery of plant ailments. Farmers use pesticides to control pests, protect crops from diseases, and increase output. Plant or crop disease wreaks havoc on farmers’ livelihoods through poor yields and financial difficulties and burdens contemporary horticulture. Therefore, pathogen identification and severity should be determined in a suitable and consistent manner [1].

In the beginning, statistical approaches were employed to anticipate agricultural revenue, but later on, it became necessary to conduct logical examinations and disseminate information [2]. In 1763, Thomas Bayes developed Bayes’ theorem, based on the concept of likelihood, which serves as the foundation of data mining. Adrien-Marie Legendre proposed regression by least squares in 1805, which was later widely adopted. Data mining techniques were first used by analysts in the 1960s, and it was only in the 1990s that the database community officially adopted data mining as a standard practice [3]. Data mining algorithms are used to find unidentified data and to detect patterns in large datasets that are otherwise undetectable. Because data mining could not cope with images as input, it was necessary to switch to image processing techniques (IPT) [4], which were developed at the Jet Propulsion Laboratory in the 1960s. The Canny edge detection technique is used to locate the contaminated region of the leaves, and the features of the leaves are extracted based on the mean and standard deviation estimated from the contaminated area. Because image processing techniques have difficulty assessing massive datasets and offer no automated framework, another proposal followed in the 1980s: machine learning methods, which were offered for dealing with real-time dynamic challenges [5]. Machine learning makes effective use of resources, and computerized frameworks based on machine learning models were developed; however, machine learning models make more errors over the prediction horizon. Deep learning was offered in 2010 as a means of avoiding this problem altogether. Here, the convolutional neural network (CNN) is a learning algorithm that discovers new knowledge within the data and delivers the best classification accuracy in the anticipated time frame [6]. The following draws on Sairam Reddy Lattupally’s Quora answer [7]: in machine learning, the feature extraction process is influenced by the human’s prior knowledge, and in a few instances, manual feature extraction may result in incorrect analysis. Classical machine learning (ML) approaches have poor accuracy in identifying paddy illnesses; the deep convolutional neural network (DCNN) overcomes these limitations [8]. DCNN is an end-to-end pipeline that allows for the detection of illnesses at a low computational cost [9].

The main contributions of this work are as follows:
(1) Created our own dataset and applied machine learning and deep learning models on a real-time dataset for paddy classification
(2) Proposed a new deep learning model called CRI_NET_V1
(3) To determine which approach performs better in disease classification, conducted a study that included supervised learning methods (SVM and KNN), ensemble learning methods (random forest and AdaBoost), and a deep learning technique (neural network—Inception-v3)
(4) Used two-, three-, five-, and tenfold cross-validation to determine which fold count works best for the proposed model

The paper is outlined as follows: Section 2 explains the related works. Section 3 describes the materials and methods employed in this work; its subsections cover the data repository, preprocessing, feature extraction, supervised learning methods (SVM and KNN), ensemble learning methods (random forest and AdaBoost), and the deep learning technique (neural network—Inception-v3). Section 4 presents the results and discussion. Section 5 concludes the article and gives an outlook on future research.

2. Related Works

This section provides a full explanation of the related works offered by a variety of researchers. To support the plant community, the authors of [10] provide a framework and an easy-to-use logical classification of ML techniques so that the appropriate ML systems and best-practice rules can be applied precisely and adequately to various plant stress attributes, including biotic and abiotic stress. Reddy and Sashirekhak [11] provide information on several types of plant disease, as well as advanced ML and IPT to detect plant disease; their outline also identifies extensive evaluation opportunities that will aid further inquiry into precision agriculture. Radhakrishnan et al. [12] employ perception and machine learning algorithms to classify wilderness land on a landscape dataset derived from the ASTER imaging instrument, gaining understanding of the accumulated data by utilizing box plots and heat maps. In [13], a strategy for the detection and presentation of plant leaf illness using the KNN classifier was suggested. In [14], an improved artificial plant optimization estimator using machine learning (ML) is presented, which detects plant illness and groups the leaves into sound and spoilt using a dataset of 236 photos. The authors of [15] developed an AI-based modified plant leaf disease detection and prediction system that provided predicted solutions for disease repair.

Plant leaf disease detection and depiction using artificial intelligence (AI) was offered by [16] for an energetic and fundamental recognition of illness, followed by describing it and executing predicted arrangements to repair that sickness. Using a deep forest approach, [17] demonstrated the recognition and classification of maize plant leaf illnesses for the first time. In [18], a global pooling dilated CNN is recommended for the detection of plant disease. Tripathy et al. [19] focus on the most recent advances in investigations concerning machine learning (ML) for massive information coherence and diverse processes relating current computing requirements to various applications. In 2020, [20] announced a few-shot learning system for plant leaf classification utilizing deep learning with small datasets. Tripathy et al. [19] also provide a set of strategies intended to address, enhance, and empower multidisciplinary and multi-institutional machine learning research in clinical care informatics. Using hyperspectral imaging in conjunction with a variable selection system and machine learning, [21] investigated the feasibility and likelihood of presymptomatic recognition of tobacco contamination. [22] provides strategies on how best to use ML within any organization and evaluates the appropriateness, sensibility, and viability of ML applications. The researchers [23] describe a lightweight CNN approach that may be used to diagnose grape illnesses such as black rot, black measles, and leaf blight. The authors of [22] also developed a CNN technique with eight hidden layers for the categorization of tomato diseases. Table 1 illustrates the many research methodologies that have been used by scientists.

3. Materials and Methods

3.1. Data Repository

The following locations in the state of Tamil Nadu were photographed: the Agri field (VIT School of Agricultural Innovations and Advanced Learning (VAIAL), Vellore), Brahmapuram, Sevur, Latheri, and Vaduthangal in the Vellore district. A high-resolution camera, the Canon EOS 1200D, and a thermal imaging camera, the FLIR E8, were used to obtain the photographs. The photographs were taken between August and December of 2019 and January and March of 2020, under a variety of ambient lighting conditions, including sunny, semiovercast, and somewhat cloudy conditions. Image acquisition was carried out in two sessions: morning and evening. Figure 1 and Table 2 show the raw dataset as well as the image count for each class.

3.2. Preprocessing

During the preprocessing step, the input image is resized to a fixed dimension; the major reason for adopting a specific dimension is to ensure that the image information stays consistent. CNN is a memory-intensive approach, used here mostly through transfer learning techniques. A larger image size may cause the computer to run out of memory; thus, it is preferable to scale the image to a regularly used dimension that is supported by all pretrained architectures and provides superior accuracy overall. Data augmentation (rotating, flipping, cropping, and zooming) is used to enlarge the paddy dataset introduced in Section 3.1 to 4254 photographs. Noise is eliminated by the use of filtering methods such as the median and Wiener filters [1].
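As a rough sketch of this stage, the listing below resizes, filters, and augments images with Pillow, SciPy, and Keras; the 299x299 target size (the Inception-v3 default) and the specific augmentation ranges are assumptions, since the paper does not state the exact values.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import median_filter
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def load_and_preprocess(path, target_size=(299, 299)):  # target size assumed
    """Resize to a fixed dimension and suppress noise with a median filter."""
    img = Image.open(path).convert("RGB").resize(target_size)
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return median_filter(arr, size=(3, 3, 1))  # filter each channel independently

# Augmentation (rotation, flipping, zooming) used to grow the dataset
augmenter = ImageDataGenerator(rotation_range=30,
                               horizontal_flip=True,
                               vertical_flip=True,
                               zoom_range=0.2)
```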

3.3. Feature Extraction

The pattern of an object in a given image is referred to as its features, and it is this pattern that aids in the categorization of a certain object. Feature types include corners, edges, regions of points, and ridges. By sliding a kernel over the input image, the convolutional layer of a deep convolutional neural network helps extract a certain feature from that input image. The pooling layer aids in dimensionality reduction, which means it helps minimize the amount of memory used. The resulting feature table has 4254 instances and 2062 variables: 2048 numeric features (no missing values), a categorical target, and 13 meta attributes (2 categorical, 9 numeric, and 2 string).
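The 2048 numeric features are consistent with the output of the final global-average-pooling layer of a pretrained Inception-v3; the following sketch shows how such a feature table could be produced with Keras, offered as an assumption about the tooling rather than the authors' exact pipeline.

```python
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

# Pretrained backbone without its classification head; global average
# pooling reduces each image to a single 2048-dimensional vector.
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(images):
    """images: float array of shape (N, 299, 299, 3) with values in [0, 255]."""
    return extractor.predict(preprocess_input(images))  # -> shape (N, 2048)
```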

Figure 2 illustrates the idea of viewing the feature maps for a given input in order to understand which informative features are detected or preserved in those maps. In the feature maps near the input, detailed information is captured as small or fine-grained detail. As seen in Figure 2(a), the dark squares indicate small or inhibitory weights, while the bright squares represent large or excitatory weights. From this impulse response, we can observe that the filters in the first row detect a gradient from bright in the upper left to dark in the bottom right.

3.4. Model Development
3.4.1. Supervised

(1) SVM. Logistic regression and related approaches do not discriminate well between cases that lie near the border of the decision domain. A potential consequence is that the decision boundary such a method picks may not be optimal. The optimal decision boundary should instead maximize the distance between itself and all instances of both classes; in other words, it should maximize the margin [31]. This is why the SVM algorithm is significant. SVMs determine the best line in two dimensions, or the best hyperplane in higher dimensions, to categorize the space, which is a difficult undertaking. The hyperplane is determined by identifying the separator with the biggest margin, i.e., the one with the greatest distance to the nearest samples of both classes [32, 33].

The equation of the hyperplane is defined by

$$\mathbf{w} \cdot \mathbf{x} + b = 0.$$

It can be written as

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1 \quad \text{for every training sample } (\mathbf{x}_i, y_i),\; y_i \in \{-1, +1\}.$$

For two vectors $\mathbf{x}_+$ and $\mathbf{x}_-$,

$$\mathbf{w} \cdot \mathbf{x}_+ + b = +1, \qquad \mathbf{w} \cdot \mathbf{x}_- + b = -1.$$

Let us assume two support vectors, one positive and one negative. The margin is then calculated by subtracting the negative support vector $\mathbf{x}_-$ from the positive support vector $\mathbf{x}_+$ and taking the dot product of the difference with the unit normal vector $\mathbf{w}/\lVert\mathbf{w}\rVert$:

$$\text{margin} = (\mathbf{x}_+ - \mathbf{x}_-) \cdot \frac{\mathbf{w}}{\lVert\mathbf{w}\rVert} = \frac{2}{\lVert\mathbf{w}\rVert}.$$

The cost function and gradient updates are calculated by

$$J(\mathbf{w}) = \lambda \lVert\mathbf{w}\rVert^2 + \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b)\bigr).$$

The hinge-loss term measures the error incurred on misclassified samples, while the regularization term avoids overfitting; the major task is to find the trade-off between a large margin and keeping every sample on the correct side of the margin. The loss function above can be minimized by gradient descent. When a sample satisfies the margin, $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1$, only the regularizer contributes:

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\,(2\lambda\mathbf{w}).$$

Therefore, for a sample inside the margin or misclassified,

$$\mathbf{w} \leftarrow \mathbf{w} - \eta\,(2\lambda\mathbf{w} - y_i\mathbf{x}_i).$$

Then, the objective function is

$$\min_{\mathbf{w},\, b}\; \frac{1}{2}\lVert\mathbf{w}\rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \geq 1,\; i = 1, \dots, n.$$
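As a toy illustration of the cost function and gradient updates above (a didactic sketch, not the classifier used in the experiments), a linear SVM can be trained by subgradient descent:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.001, epochs=1000):
    """Subgradient descent on J(w) = lam*||w||^2 + mean hinge loss.
    X: (n, d) feature matrix; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) >= 1:
                w -= lr * (2 * lam * w)                # only the regularizer contributes
            else:
                w -= lr * (2 * lam * w - y[i] * X[i])  # hinge term pushes w toward y_i x_i
                b += lr * y[i]
    return w, b
```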

(2) KNN. KNN is classified as supervised learning in most cases. By the K-NN definition, a new case is compared with existing cases, and it is placed into the class to which it is most similar. The K-NN approach preserves all available data and then classifies a new instance based on how closely it resembles the stored data. This means that, using the K-NN approach, fresh data may be swiftly sorted into a well-defined group [34, 35].
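As a compact illustration of both supervised baselines, the scikit-learn sketch below trains an SVM and a KNN on the extracted feature vectors; here X and y stand for the feature table and labels from Section 3.3, and the kernel, neighbor count, and split ratio are illustrative assumptions rather than the paper's tuned settings.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# X: (n, 2048) feature vectors from Section 3.3; y: the six disease labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42, stratify=y)

svm = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("SVM accuracy:", svm.score(X_test, y_test))
print("KNN accuracy:", knn.score(X_test, y_test))
```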

3.4.2. Ensemble Model Development

Ensemble learning approaches combine individual models to increase the stability and predictive capacity of the final model, enabling higher predictive performance. An ensemble is a predictive model that integrates numerous machine learning models into a single predictive model. One model may be superior at capturing one aspect of the data while another model captures a different aspect. The idea is to train several simple models and then integrate their outputs to arrive at a final conclusion; the overall strength of the ensemble compensates for the variances and biases of the separate models. This results in a composite forecast with a final accuracy higher than that of the individual models. Ensemble learning is divided into two categories: bagging and boosting. Random forest is a kind of bagging, whereas AdaBoost, gradient boosting, and XGBoost are examples of boosting.

(1) Random Forest (Bagging). Random forest approaches combine several decision trees into a more general model by generating random subsets of the attributes; they are often used in machine learning applications. Smaller trees are constructed from these subsets, resulting in more tree diversity. Using a variety of decision trees is necessary to avoid overfitting [36].

(2) AdaBoost (Boosting). Boosting is a strategy for transforming weak learners into strong learners. AdaBoost was the first practical boosting algorithm; it combines several weak classifiers to form a single strong classifier [37]. The AdaBoost model is trained iteratively: at each iteration, it assigns a higher weight to incorrectly classified observations and assigns a weight to the trained classifier according to its accuracy, and this process repeats until the entire training set is fit without error [38, 39].
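Continuing the same hedged sketch (reusing X_train, y_train, X_test, and y_test from the previous listing; the estimator counts are assumptions), the two ensemble baselines can be set up as:

```python
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Bagging: each tree sees a bootstrap sample and a random subset of features
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Boosting: each round reweights the samples the previous learners got wrong
ada = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print("RF accuracy:", rf.score(X_test, y_test))
print("AdaBoost accuracy:", ada.score(X_test, y_test))
```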

3.4.3. Deep Learning Model

In the field of machine learning, deep learning is considered a subset. Deep learning teaches the system to filter the data it receives and to learn prediction and classification tasks at the same time. Deep learning finds its inspiration in the human brain, which includes roughly 100 billion neurons, each connected to around 100,000 neurons in its immediate vicinity. Deep learning frameworks are available in a variety of state-of-the-art forms. We settled on the Keras framework since it provides a high-level standard library and is easier to use than working with Theano or TensorFlow directly; on top of the TensorFlow library, it is the quickest and most straightforward way to create a neural network.

(1) Neural Network. Convolutional neural networks are the most effective for image classification because they extract the most important characteristics from a large amount of data in a short amount of time. By linking neurons only to their closest neighbors in the network, the number of required parameters is further reduced. Including dropout in the network makes it possible to reduce both the computational cost and the overfitting issue. Modern computer vision applications rely on deep convolutional neural networks (DCNNs), the most widely used neural network approach. In this study, the Inception-v3 architecture is used for classification [40–44]. The model structure is shown in Figure 3; it has about 24 million parameters.
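A minimal transfer-learning sketch along these lines, assuming Keras and ImageNet weights (the dropout rate and classification head are illustrative, not the paper's exact configuration), is:

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import InceptionV3

base = InceptionV3(weights="imagenet", include_top=False,
                   pooling="avg", input_shape=(299, 299, 3))
x = layers.Dropout(0.3)(base.output)             # dropout to curb overfitting
out = layers.Dense(6, activation="softmax")(x)   # six paddy leaf classes
model = Model(base.input, out)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```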

3.5. Novel Model CRI_NET_V1 Architecture

The motivation for developing a new model is to address the following problems: existing models utilize more RAM and disk space, require more parameters to achieve better accuracy, suffer from overfitting and underfitting, and incur higher computational cost. The proposed novel model (CRI_Net_V1), illustrated in Figure 4, contains the following layers. The first layer introduces zero padding and pads the input with an appropriate number of zeros so that the kernel can be applied across the corners. The second layer is a Conv2D layer that extracts the features, and the third layer is max pooling, which downsamples the images. Three convolutional blocks, one residual block, and one identity layer are utilized. The feature map is converted into a single column using a flatten layer, and two dropouts of 0.3 and 0.2 are introduced to reduce the overfitting problem. SoftMax regression is utilized for generating probability values. The proposed model achieves better accuracy than the pretrained model with fewer layers and less RAM and disk space utilization. ReLU is used as the activation function. Since there is enormous variation in the salient regions of the input images, picking the right kernel size is difficult, so an additional convolutional layer is introduced to reduce the computational cost. To avoid the degradation problem, shortcut connections are added to the residual block. The training speed of the model is increased by using batch normalization, which allows a higher learning rate; during backpropagation, batch normalization also makes weight initialization easier.
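Since the paper specifies the layer order but not the filter counts, input size, or dense width, the following Keras sketch should be read as one plausible rendering of CRI_Net_V1 under those assumptions, not the authors' exact network:

```python
from tensorflow.keras import Model, layers

def conv_bn_relu(x, filters, stride=1):
    x = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)  # speeds training, permits a higher learning rate
    return layers.ReLU()(x)

def residual_block(x, filters):
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # shortcut avoids degradation
    y = conv_bn_relu(x, filters)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([shortcut, y]))

inp = layers.Input((299, 299, 3))                # input size assumed
x = layers.ZeroPadding2D(3)(inp)                 # zero padding so kernels reach the corners
x = conv_bn_relu(x, 32, stride=2)                # Conv2D feature extraction
x = layers.MaxPooling2D(3, strides=2)(x)         # max pooling downsamples the image
for f in (64, 128, 256):                         # three convolutional blocks
    x = conv_bn_relu(x, f, stride=2)
x = residual_block(x, 256)                       # residual block with identity path
x = layers.Flatten()(x)                          # feature map -> single column
x = layers.Dropout(0.3)(x)
x = layers.Dense(256, activation="relu")(x)      # dense width assumed
x = layers.Dropout(0.2)(x)
out = layers.Dense(6, activation="softmax")(x)   # SoftMax probabilities for 6 classes
model = Model(inp, out)
```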

4. Results and Discussion

4.1. Experimental Setup

Hardware specifications included an Intel(R) Core(TM) i5-8300H CPU running at 2.30 GHz, 8 GB RAM, and a 64-bit operating system on an x64-based processor; software specifications included Anaconda Navigator and Python.

4.2. Interpretation of Confusion Matrix ML vs. DL

A confusion matrix may be used when analysing the results of the suggested classification model. While the correct classifications are located along the blue diagonal of the matrix, the misclassifications are found at the other positions. The number of classes determines the size of the confusion matrix: the 6-class model yields a 6x6 confusion matrix. This gives a full report on the correct and incorrect class mappings. The predicted classes are shown in the rows, while the actual classes are displayed in the columns. As shown in Figure 5, each cell is categorized as a true positive (TP), true negative (TN), false positive (FP), or false negative (FN).
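Under the row/column convention just described, a confusion matrix can be produced as follows (a sketch reusing the classifier and hold-out split from the earlier listings; note that scikit-learn's native convention is transposed):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# scikit-learn puts the true labels on the rows; transposing matches the
# convention above (predicted classes on rows, actual classes on columns).
cm = confusion_matrix(y_test, svm.predict(X_test)).T
true_positives = np.diag(cm)   # correct classifications on the diagonal
print(cm)
```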

Figure 6 depicts the confusion matrices for the ML and DL methods: support vector machine, K-nearest neighbor, random forest, AdaBoost, and neural network. While the correct classifications are located along the blue diagonal of each matrix, the misclassifications are found at the other positions. Among the supervised classifiers (SVM and KNN), SVM is more accurate in predicting the leaf folder class than the other classes. The images correctly predicted by SVM number 1101, 963, 1627, 1104, 1204, and 2033, respectively. For the actual classes, KNN correctly predicts 1081, 971, 1644, 943, 1215, and 1985 images; similarly, the true positive rate is greater for the leaf folder class than for the other classes. Among the ensemble learning approaches (random forest and AdaBoost), random forest correctly assigns 923, 779, 1613, 801, 964, and 1865 images to the actual classes, while AdaBoost correctly predicts 811, 672, 1483, 636, 800, and 1545 images. When compared to the standard supervised classifiers, the correctly classified results demonstrate that ensemble classifiers perform much better. The neural network correctly predicts 1177, 952, 1659, 925, 1237, and 2023 images for the actual classes. Given the higher number of correctly categorized items, the research concludes that the neural network outperforms SVM and KNN (supervised learning methods) as well as random forest and AdaBoost (ensemble learning methods).

4.3. Performance Evaluation

The performance of the model in this research is determined by analyzing the accuracy (Acc), recall, precision, F1-score, and area under the receiver operating characteristic curve (ROC AUC). Cross-validation is a resampling method that evaluates machine learning models on a limited data sample; it is used to verify the competency of a machine learning or deep learning model on unseen input. The idea is to use a limited sample to estimate how the model is expected to perform in general when used to make judgments and predictions. The folds used in this work are 2, 3, 5, and 10. The cross-validation results for the various models are shown in Tables 3–6. The AUC value for the neural network model is higher, at 0.996, during cross-validation 4, and the maximum CA value of 0.93 is achieved for the neural network model during cross-validation 5. In a similar vein, when compared to baseline classifiers, the F1, precision, and recall values are higher, at 0.937, 0.938, and 0.937, respectively.
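The k-fold evaluation can be sketched with scikit-learn as below, reusing the classifier and feature table from the earlier listings; the macro-averaged scorers are an assumption about how the multiclass metrics were computed.

```python
from sklearn.model_selection import cross_validate

scoring = ["accuracy", "precision_macro", "recall_macro",
           "f1_macro", "roc_auc_ovr"]
for k in (2, 3, 5, 10):                      # fold values used in this work
    scores = cross_validate(svm, X, y, cv=k, scoring=scoring)
    print(k, {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring})
```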

With the number of folds increased to 5, the AUC value for the neural network model is higher, at 0.996, and the greatest CA value, 0.941, is achieved for the neural network model. In a similar vein, when compared to baseline classifiers, the F1, precision, and recall values are higher, at 0.941, 0.942, and 0.941, respectively. With the number of folds increased to 10, the AUC value for the neural network model rises to 0.997, and the neural network model attains the maximum CA value in the study, 0.942. The F1, precision, and recall values are also higher compared to baseline classifiers, at 0.943, 0.943, and 0.942, respectively (Table 1). Raising the fold value has been found to improve the performance of the model; according to the findings of the investigation, the number of folds is directly linked to the model’s overall performance. As the fold value increases, the value of the evaluation metrics (AUC, CA, F1, precision, and recall) increases progressively, i.e., by roughly 0.001 compared with the values obtained with the previous folds. The receiver operating characteristic (ROC) curves for the various diseases are shown in Figure 7. Table 7 compares the performance of the deep learning optimizers, and Table 8 presents the recognition performance of the CRI_NET_V1 models.

5. Conclusion

Using supervised learning (support vector machine and K-nearest neighbor), ensemble learning (random forest and AdaBoost), and deep learning (neural network) approaches, paddy leaf diseases were classified and interpreted in this study. To measure the models’ performance, the following metrics were used: accuracy (Acc), recall, precision, F1-score, and area under the receiver operating characteristic curve (ROC AUC). The selected fold values are 2, 3, 5, and 10. The AUC for the neural network model is greatest at fold value 10, with an AUC of 0.997; the neural network also attains the greatest CA, 0.942, at fold value 10. The F1, precision, and recall values are also higher compared to baseline classifiers, at 0.943, 0.943, and 0.942, respectively (Table 1). The results reveal that the neural network outperforms the baseline classifiers. In addition, this work suggests motivating future directions for aspiring researchers: the neural network can be fine-tuned by updating the hyperparameters, and the classification can be extended to other agricultural tasks, such as weed detection, pest recognition, and plant disease prediction.

Data Availability

The data can be accessed upon reasonable request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.