Rice leaf disease detection based on bidirectional feature attention pyramid network with YOLO v5 model

To ensure higher quality, capacity, and production of rice, it is vital to diagnose rice leaf disease at an early stage so that pesticide usage in agriculture can be reduced, which in turn avoids environmental damage. Hence, this article presents a multi-scale YOLO v5 detection network to detect and classify rice crop disease at an early stage. The experiment starts by pre-processing the rice leaf images obtained from the RLD dataset, after which dataset labels are created and divided into training and test sets. DenseNet-201 is used as the backbone network, and depth-aware instance segmentation is used to segment the different regions of the rice leaf. Moreover, the proposed Bidirectional Feature Attention Pyramid Network (Bi-FAPN) is used to extract features from the segmented images and enhances the detection of diseases at different scales. Furthermore, the feature maps are processed in the detection head, where anchor boxes are applied to the output feature maps to produce the final output vectors of the YOLO v5 network. A subset of channels or filters is pruned from different layers of the deep neural network through a principled pruning approach without affecting overall framework performance. Experiments are conducted on the RLD dataset against different existing networks to verify the generalization ability of the proposed model. The effectiveness of the network is evaluated in terms of average precision, accuracy, average recall, IoU, inference time, and F1 score, achieving 82.8%, 94.87%, 75.81%, 0.71, 0.017 s, and 92.45%, respectively.


Introduction
Any country's agricultural industry is crucial to its economic development, and the bulk of a country's economy relies on agricultural products for raw materials. Rice is one of the most widely grown crops, cultivated in more than a hundred nations throughout the world. Each year, 158 million hectares of rice are harvested, yielding about 700 million tonnes of rice. Asia produces the majority of rice in contrast to other continents. Population growth has an impact on the environment in terms of global warming and rapid climate shifts [1]. As a result of these environmental changes, the agriculture sector is affected, and rice fields suffer from a range of diseases. Crop quality, quantity, and productivity all suffer as a result of these diseases. Nowadays different types of infections are causing damage to rice crops, reducing crop productivity all around the world. Many ailments, such as rice curl disease, brown spot, blight, rice leaf blast, and others, have been observed in recent years [2]. These infections are noticed first on the rice plant's leaves.

Related works
This literature review examines various strategies and approaches in Image Processing to identify rice leaf disease. This research explores the impact of Machine Learning and Deep Learning techniques by considering several network structural models.
A system for detecting diseases in rice leaves using machine learning techniques is reported by Ahmed et al (2019) [13]. Three diseases affect rice plants most frequently: leaf smut, bacterial leaf blight, and brown spot. A variety of machine learning methods, including KNN (K-Nearest Neighbor), Naive Bayes, Decision Tree, and Logistic Regression, were trained on the dataset. By combining various ML and image processing approaches, Daniya and Vigneshwari (2019) [14] review the detection of rice plant diseases based on images of infected rice plants. The numerous classification methods utilized in diverse applications in agricultural research include Probabilistic Neural Networks (PNN), Genetic Algorithms (GA), KNN, and Support Vector Machines (SVM). The size of the image dataset, preprocessing, segmentation methods, and classifiers were described together with a full analysis of rice plant disease. A rice leaf disease detection system was presented by Harun Rumy et al [15] utilizing a simple Artificial Intelligence method. The required features were extracted from the images after the necessary preprocessing was applied. These attributes were then fed into a variety of machine learning techniques used to build an image classification model.
Shah et al [16] present a review of image preprocessing and machine learning concepts applied to the classification and detection of infected rice leaves. They give a block diagram of the proposed research work to address the problems in this research domain, with important steps like image preprocessing, feature extraction, feature normalization, segmentation, and classification. Prajapati et al [17] evaluated four techniques for background removal and three techniques for segmentation, extracting various features to obtain clear and detailed information about infected rice leaves. A trained discriminating network model was thus obtained after back-propagation across all techniques.
The early detection of any disease and the application of necessary remedies to the damaged plants are vital for ensuring the healthy and proper growth of rice plants. An automated method is desirable because manual disease diagnosis requires a lot of time and manpower. Sachdeva et al [18] demonstrate the use of a deep CNN hybridized with a Bayesian learning approach for infection categorization in a variety of crops. The Bayesian technique was utilized to improve pixel dependency. The researchers used the PlantVillage dataset to acquire 20,639 photos for experimentation, which were sorted into 15 categories of infected and healthy leaf images. According to the findings of this study, the technique is a good tool for diagnosing and categorizing infections in plants. Many studies have been undertaken on this topic; however, disease diagnosis in soybeans was more difficult owing to a lack of information and technological challenges. Therefore, R-CNN was utilized in a study by Zhang et al [19] to investigate the identification of soybean fungal infections from synthetic images. To solve the issue of a limited database, this work focuses on constructing a synthetic soybean plant leaf photo collection.
In the study conducted by Sethy et al [20], deep-feature-based rice disease identification with the use of a Support Vector Machine (SVM) has been applied. Finding the best strategy for feature extraction is crucial in the field of machine learning because features are critical for image categorization. Mojjada et al [21] employ digital image processing to identify plant leaf problems in their study. A segmentation technique was used to automatically identify and categorize plant leaf disease in pine trees. As part of the research, surveys of various methodologies for classifying illness were done. Genetic algorithms were used to segment images, which was critical for detecting plant illnesses. Segmentation and Laws' mask feature extraction techniques were also employed to identify plant leaves with a Support Vector Machine classifier (Kaur and Devendran [22]). Chouhan et al [23] use machine learning and computer vision approaches to explore leaf disease segmentation and classification in the biofuel crops Jatropha curcas L. and Pongamia pinnata L. To categorize disorders in crop leaves, the researchers employed machine learning approaches including KNN, Logistic Regression, Random Forest, SVM, Naive Bayes, and Regression Trees.
The EfficientNet deep learning approach was used to categorize crop infections in a study by Atila et al [24]. Because many existing machine learning systems do not offer excellent results, the EfficientNet approach was used in this study to categorize leaf image disease. The efficiency of deep learning versus machine learning techniques in identifying leaf image infection was compared by Sujatha et al [25]. Citrus plant disease diagnostics were developed using machine learning approaches such as Stochastic Gradient Descent, Random Forest, and SVM, and deep learning approaches such as VGG-16 and Inception-v3. Hu et al [26] explore the detection and severity investigation of tea fungal infections. A Faster Region-based CNN model was utilized to recognize the foggy, blocked, and tiny pieces of leaves.
Numerous diseases affect the leaves, growth, and yield of rice plants. Given this context, it is imperative to put a recognition method into practice and reduce the loss. Zhencun Jiang et al (2021) [27] target two kinds of leaf diseases in wheat and three kinds of leaf diseases in rice, gathering 40 shots of each leaf disease; these are used to upgrade the Visual Geometry Group Network-16 (VGG16) model, which can predict all the considered varieties of wheat and rice leaf diseases. The work seeks to enhance the VGG16 model using the concept of multi-task learning before applying transfer learning and alternating learning with the model pre-trained on ImageNet. Venu Vasantha et al [28] discuss recently introduced methodologies with their performance measures and offer potential solutions using various machine learning techniques, along with a comparative study of algorithms for detecting the sort of disease that afflicted the crop based on crop image data. According to Asfaqur Rahman et al [29], the suggested model successfully categorizes and identifies the diseases affecting rice leaves using image processing methods; the employed CNN delivered 90% accuracy, and the model can distinguish between diseased and healthy rice plants. To identify and classify rice disease from images of rice leaves, Yibin Wang et al (2022) [30] propose an attention-based depth-wise separable neural network with Bayesian optimization. The findings of this study support the use of artificial intelligence in agriculture for quick diagnosis and management of plant diseases.
Most existing techniques address the classification of diseases in rice plants, whereas research on the detection of rice leaf disease is comparatively scarce. Recently, object detection networks have performed very well. Deep-learning-based object detection techniques can be separated into two categories: one-stage algorithms (like the YOLO series) and two-stage algorithms (such as Faster R-CNN) [12]. The BiFPN aggregates the features from the previous backbone layer at the different detector levels by learning varying feature weights based on their importance [31]; the highly efficient weighted BiFPN optimizes the cross-scale connections. To find objects, the YOLO network divides the image into grid cells and evaluates each cell as a proposal. Compared with Faster R-CNN, YOLO achieves end-to-end detection and skips proposal generation, enabling a real-time detection model. Due to the significantly increased detection speed, in-depth studies like SSD [32], YOLOv3 [33], and YOLOv4 [11] have been conducted. One-stage detectors are better suited for real-time detection of plant diseases in complex field conditions since their processing rates are faster than those of two-stage detectors. Zhaoyi Chen [34] developed an enhanced plant disease-recognition model based on the original YOLOv5 network to accurately identify plant diseases under challenging natural conditions. A performance-aware global channel pruning network structure is first theoretically demonstrated to achieve global channel pruning by using the joint saliency of channels from all layers, with a greedy policy as the pruning strategy, to achieve globally task-oriented optimization of the objective function [35]. The test results revealed that the improved YOLOv5 network's mAP reached 70%, which was 5.4% higher than that of the competing networks. Therefore, in this work, an enhanced rice leaf detection network based on the YOLO v5 model is introduced to improve disease detection accuracy.

Proposed methodology
YOLO v3 and YOLO v4 are one-stage deep CNNs that have achieved good performance in object detection. Because YOLO v5 makes use of a range of network architectures, it is particularly advantageous in terms of computational complexity and detection accuracy for the detection and identification of diseases. The proposed disease detection strategy contains four steps: preprocessing, segmentation, feature extraction, and detection. In the first step, preprocessing is performed on the input rice disease images. Image segmentation is then applied to segment the different regions of a rice leaf: each pixel in the image is labeled using depth-aware instance segmentation (DAIS), ensuring that pixels with the same label share the same properties. Then, on top of the DenseNet-201 backbone network, the Bidirectional Feature Attention Pyramid Network (Bi-FAPN) module is used to extract useful information from deep and shallow feature maps. The newly developed FAM attention mechanism directs the distribution of the various weights in order to attend to the extraction of low-level and tiny features; this attention module enhances the extracted features. To efficiently fuse the multi-scale features retrieved from the backbone network and improve the diagnosis of diseases of various sizes, feature pyramids are generated using both top-down and bottom-up fusion structures. The feature maps are finally processed in the detection head, where the output feature maps are combined with anchor boxes to produce the final output vectors of the YOLO v5 network. Figure 1 illustrates the architecture of the proposed methodology. The algorithm for the proposed method is illustrated in algorithm 1.

Algorithm 1 depicts the overall process. In its notation, P represents the lateral connections implemented as 1 × 1 convolutions. The arrows pointing up and down represent the upsampling and downsampling operations, respectively; a deconvolution layer followed by ReLU is employed for upsampling, while max-pooling is used for downsampling. The gating operation is denoted by G, and the 3 × 3 convolution by M.

Preprocessing
An image is modified during image preprocessing to emphasize details about the affected leaf area, effectively conveying information in the leaf images with good visual interpretability. Common preprocessing methods include resizing and normalization. For the segmentation and detection networks, the input images are resized to 224 × 224. Z-score normalization is used to normalize the images using the dataset's mean and standard deviation. Because the dataset is unbalanced and does not have an equal number of images for each category, training on it directly can result in a biased model; data augmentation helps ensure that each class has a comparable quantity of images. Three augmentation techniques are used in this study to balance the training images: rotation, scaling, and translation. As part of the augmentation process, the images are rotated by an angle between 5 and 15 degrees in both the clockwise and counterclockwise directions. Scaling increases or decreases the image size; magnifications ranging from 2.5% to 10% are used in this experiment. Images are translated by 5% to 20% in both the horizontal and vertical directions.
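The augmentation step can be sketched as follows. This is a minimal illustration using Pillow with the ranges stated above; the choice of library, interpolation, and border handling are assumptions, not the authors' implementation.

```python
import random
from PIL import Image

def augment(img: Image.Image) -> Image.Image:
    """Rotation, scaling, and translation with the ranges stated in the text."""
    # rotation: 5-15 degrees, clockwise or counterclockwise
    angle = random.uniform(5, 15) * random.choice([-1, 1])
    img = img.rotate(angle, resample=Image.BILINEAR)
    # scaling: 2.5% to 10% magnification
    s = 1.0 + random.uniform(0.025, 0.10)
    w, h = img.size
    img = img.resize((int(w * s), int(h * s)), Image.BILINEAR)
    # translation: 5% to 20% shift, horizontal and vertical
    dx = int(img.size[0] * random.uniform(0.05, 0.20)) * random.choice([-1, 1])
    dy = int(img.size[1] * random.uniform(0.05, 0.20)) * random.choice([-1, 1])
    img = img.transform(img.size, Image.AFFINE, (1, 0, dx, 0, 1, dy))
    # resize back to the 224 x 224 network input
    return img.resize((224, 224), Image.BILINEAR)
```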

DenseNet-201 backbone network
As the backbone network for this detection task, DenseNet, a classic CNN design developed to address the vanishing gradient problem, is used. DenseNet has the advantages of alleviating the vanishing-gradient problem, encouraging feature reuse and propagation, and substantially reducing the number of parameters [36].

In contrast to existing architectures that address this issue, such as ResNets or highway networks, DenseNet connects all layers directly to ensure maximum information flow between them; such a connectivity pattern introduces L(L + 1)/2 connections in an L-layer network. Figure 2 illustrates this layout schematically. The pre-trained DenseNet-201 is fine-tuned by modifying some layers and leaving the others frozen. More precisely, the DenseNet-201 model employs a compression factor of 0.5 and does not have any bottleneck layers: if a dense block produces m feature maps, the following transition layer outputs ⌊0.5m⌋ feature maps.
In consequence, the network has acquired substantial feature representations for many kinds of images. The input image size for the network is 224 × 224. Each layer includes batch normalization (BN), the ReLU activation, and convolution with a 3 × 3 filter, as shown in figure 2. Each block takes an input in matrix form that represents the image pixels. This input first passes through the batch normalization stage, which helps reduce over-fitting during training. The ReLU activation then maps any negative value of x to 0 and leaves non-negative values unchanged. Finally, the ReLU-activated feature map is convolved with a 3 × 3 filter to produce the block's output.
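As an illustration, the backbone can be instantiated with a pretrained DenseNet-201 from torchvision. The choice of which layers to freeze is an assumption here, since the text does not specify the split point.

```python
import torch
import torchvision

# ImageNet-pretrained DenseNet-201; keep the convolutional trunk as backbone.
densenet = torchvision.models.densenet201(weights="IMAGENET1K_V1")
backbone = densenet.features  # BN -> ReLU -> conv blocks, as in figure 2

# Freeze everything except the last dense block and final norm (assumed split).
for name, param in backbone.named_parameters():
    if not (name.startswith("denseblock4") or name.startswith("norm5")):
        param.requires_grad = False

x = torch.randn(1, 3, 224, 224)  # 224 x 224 network input from the text
features = backbone(x)           # shape: (1, 1920, 7, 7)
```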

Depth-aware instance segmentation
In the segmentation process, a depth interval [α, β] is quantized into K non-overlapping discrete depth bins using spacing-increasing discretization (SID), as is typical for the monocular depth estimation problem. Instead of uniformly splitting the specified depth interval, this method widens the bins as depth grows, which reduces the training losses in regions with high depth values and improves the accuracy with which the network predicts the depth of nearby objects. The ground-truth residual r_d for a ground-truth depth g_d that falls in bin i is calculated as [37]

r_d = g_d − m_i,    (1)

where the midpoint of bin i is given in equation (2),

m_i = (l_i + l_{i+1}) / 2,    (2)

and the left edge of the i-th depth bin is given in equation (3),

l_i = α (β/α)^(i/K),  i = 0, 1, …, K,    (3)

where l_{i+1} is the corresponding right edge; the K bins have K left edges, and a pixel is assigned to bin i if its ground-truth depth g_d lies in [l_i, l_{i+1}). For an image, MaskLab produces direction prediction logits and semantic segmentation logits. Region of Interest (RoI) pooling of the semantic channel corresponding to the RoI's predicted class is performed first on the semantic segmentation logits. Regional logits are then collected from each direction channel in order to take advantage of the direction information. Foreground/background segmentation is subsequently performed using the pooled direction logits and the cropped semantic segmentation logits.
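The binning of equations (1)-(3) can be sketched as follows. Note that the equations above were reconstructed from the standard SID formulation, so the exact residual definition and the example interval values are assumptions.

```python
import numpy as np

def sid_bin_edges(alpha: float, beta: float, K: int) -> np.ndarray:
    """Left edges l_i of the K SID depth bins, eq. (3); edges[K] is the
    right edge of the last bin."""
    i = np.arange(K + 1)
    return alpha * (beta / alpha) ** (i / K)

def bin_and_residual(g_d: float, edges: np.ndarray):
    """Bin index i for ground-truth depth g_d and its residual
    r_d = g_d - m_i against the bin midpoint m_i, eqs. (1)-(2)."""
    i = int(np.searchsorted(edges, g_d, side="right") - 1)
    m_i = 0.5 * (edges[i] + edges[i + 1])
    return i, g_d - m_i

# Example with an assumed interval [1, 80] and K = 64 bins.
edges = sid_bin_edges(alpha=1.0, beta=80.0, K=64)
print(bin_and_residual(12.3, edges))
```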
Semantic segmentation logits may be effective for distinguishing the RoI's predicted class from the surrounding background, but they do little more to separate individual instances. On the other hand, since instance borders frequently coincide with significant depth discontinuities, monocular depth estimates of the image can help separate the instances clearly [37]. Using the same pooling of direction logits, a Depth-Aware Instance Segmentation (DAIS) is developed in which monocular depth logits take the place of the semantic segmentation logits, as displayed in figure 3. Because the monocular depth logits directly affect the instance-mask output, training encourages the predicted depth logits to resemble a coherent instance mask, in addition to increasing the accuracy of instance segmentation.

BiFPN implements two cross-scale connectivity optimizations in this study. First, to fuse more features without incurring significant additional cost, BiFPN adds an extra edge from the original input to the output node whenever they are at the same level. Second, rather than merely adding up or concatenating features, which may result in feature mismatch and performance deterioration, BiFPN provides a learnable weight to determine the relative relevance of the various input features, as illustrated in figure 4. Given a list of multi-scale features, P_i signifies a feature level with a resolution of 1/2^i of the input image. For example, if the input resolution is 1024 × 1024, P_3 indicates feature level 3 with a resolution of 128 × 128 (1024/2^3 = 128), and P_7 has a resolution of 8 × 8.

The attention mechanism is a probability-weight distribution process. It weights the feature calculations so that features carrying more information receive higher weighting coefficients, improving the quality of the high-dimensional hidden-layer features. Given an input feature X ∈ ℝ^(C×W×H), the feature attention weight can be formulated by equation (4),

w = σ(P_conv(δ(D_conv(X)))),    (4)

where the feature map size is W × H, P_conv and D_conv denote the point-wise and depth-wise convolutions, respectively, x_i ∈ ℝ^(C×W×H) denotes the intermediate output of D_conv, δ refers to the ReLU function, and σ is the Sigmoid function. The kernel sizes of D_conv and P_conv are C̃ × C × 3 × 3 and C̃ × C × 1 × 1, respectively. The FAM structure is displayed in figure 5.
After the feature attention weight w is obtained, the final enhanced feature X′ ∈ ℝ^(C×W×H) is computed with equation (5),

X′ = w ⊗ X,    (5)

where the symbol ⊗ represents the element-wise multiplication operation. The aforementioned process captures the interdependencies between the spatial and channel dimensions, giving more flexibility when handling various feature types. In general, D_conv and P_conv in the attention module can greatly increase the efficiency of the detection network.
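Equations (4)-(5) translate directly into a small module. The following PyTorch sketch assumes the depth-wise and point-wise convolutions preserve the channel count, which the kernel-size notation above leaves ambiguous.

```python
import torch
import torch.nn as nn

class FAM(nn.Module):
    """Feature Attention Module: depth-wise conv -> ReLU -> point-wise conv
    -> Sigmoid, then element-wise reweighting of the input (eqs. (4)-(5))."""
    def __init__(self, channels: int):
        super().__init__()
        # depth-wise 3x3 convolution: one 3x3 filter per channel (D_conv)
        self.d_conv = nn.Conv2d(channels, channels, kernel_size=3,
                                padding=1, groups=channels, bias=False)
        # point-wise 1x1 convolution mixing channels (P_conv)
        self.p_conv = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.sigmoid(self.p_conv(self.relu(self.d_conv(x))))  # eq. (4)
        return w * x                                              # eq. (5)

fam = FAM(channels=256)
feat = torch.randn(1, 256, 64, 64)
out = fam(feat)  # same shape, attention-reweighted features
```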

Multi-scale detection using YOLO v5
The standard YOLO v5 network [38] identifies objects of various sizes on three types of output feature maps, the finest of which is obtained by 8× downsampling. Because the diseases in the rice leaf dataset are faint and small, we create an additional feature scale to concentrate on smaller lesions: the feature maps are upsampled until they reach 64 × 64 in size, producing a 4× downsampling feature map. To fully utilize the shallow and deep features, the top-down semantic feature maps are fused with the bottom-up localization feature information from the backbone network [8]. Figure 6 illustrates the four feature scales, which are 128 × 128, 64 × 64, 32 × 32, and 16 × 16. The 32 × 32 markings on the grid division show the size of each grid. Twelve anchors over four detection scales are used in place of the original nine anchors over three scales. The model converges easily and detects diseases at diverse scales thanks to the YOLOv5 model's ability to adaptively compute relevant anchors based on the dataset.
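YOLOv5's adaptive anchor computation is essentially a clustering of ground-truth box sizes. The simplified IoU-based k-means sketch below illustrates how twelve anchors for four scales could be derived; the real auto-anchor routine also applies a genetic refinement step, omitted here, and the placeholder data is not from the RLD dataset.

```python
import numpy as np

def kmeans_anchors(wh: np.ndarray, k: int = 12, iters: int = 100) -> np.ndarray:
    """Cluster (width, height) pairs into k anchors using IoU-based k-means."""
    anchors = wh[np.random.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # IoU between every box and every anchor, assuming co-centered boxes
        inter = (np.minimum(wh[:, None, 0], anchors[None, :, 0])
                 * np.minimum(wh[:, None, 1], anchors[None, :, 1]))
        union = (wh[:, 0:1] * wh[:, 1:2]
                 + (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
        assign = np.argmax(inter / union, axis=1)  # closest anchor by IoU
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = wh[assign == j].mean(axis=0)
    return anchors[np.argsort(anchors.prod(axis=1))]  # sort by area

# wh: N x 2 array of ground-truth (width, height) pairs from the dataset
wh = np.abs(np.random.randn(500, 2)) * 50 + 10  # placeholder box sizes
print(kmeans_anchors(wh, k=12))                 # 12 anchors for 4 scales
```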
The central point (t_x, t_y), height (t_h), width (t_w), and confidence score are all predicted by the four detection layers (P2-P5) on the left side of figure 6. The network gathers the label information from the input images, and the right part contains the ground truth. The loss between the predicted value and the actual value is then calculated to establish each detection layer's loss. The model steadily improves its performance and completes training through feedback from the loss.
The loss function design has a direct impact on the network's training time and detection accuracy. The intersection ratio between the prediction and ground-truth boxes is calculated as part of this study's use of the IoU loss function for disease detection [9]; the greater the value, the greater the degree of coincidence and similarity between the two boxes. The IoU loss function is mathematically expressed in equation (6) [39],

L_IoU = 1 − IoU(A, B) = 1 − |A ∩ B| / |A ∪ B|,    (6)

where A and B refer to the predicted box and the ground-truth box. The IoU loss function is scale-insensitive, since the intersection ratio compares the predicted box with the ground-truth box irrespective of their absolute size.
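Under the reconstructed equation (6), the loss can be implemented as follows, assuming boxes in (x1, y1, x2, y2) corner format.

```python
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Eq. (6): L_IoU = 1 - |A ∩ B| / |A ∪ B| for boxes in (x1, y1, x2, y2)."""
    x1 = torch.max(pred[..., 0], target[..., 0])
    y1 = torch.max(pred[..., 1], target[..., 1])
    x2 = torch.min(pred[..., 2], target[..., 2])
    y2 = torch.min(pred[..., 3], target[..., 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_b = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    union = area_a + area_b - inter
    return 1.0 - inter / union.clamp(min=1e-7)

# Example: one predicted box vs. its ground truth
print(iou_loss(torch.tensor([0., 0., 10., 10.]), torch.tensor([2., 2., 12., 12.])))
```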
Model pruning

Sparsity training
Sparse training uses the weight importance across all layers to prune the less important channels of the model. Batch normalization layers are integrated after each convolution layer in the model to accelerate convergence while, at the same time, preventing vanishing gradients, as shown in figures 7 and 8.
The BN layer is expressed in equations (7) and (8),

x̂ = (x_in − μ_B) / √(σ_B² + ε),    (7)
x_out = γ x̂ + β,    (8)

where x_in and x_out are the input and output values, and μ_B and σ_B² are the channel's mean and variance over the mini-batch. The sparse factor γ and bias parameter β are the parameters to be optimized as per equations (7) and (8). The channel importance is represented by the sparse factor γ learned during training; to force the γ of all BN layers to be sparser, the L1 norm of γ is added to the loss function as a penalty, as in equation (9),

L = Σ_{(x,y)} l(f(x, W), y) + λ Σ_{γ∈Γ} |γ|,    (9)

where the first term is the normal training loss. The first pruning threshold, d_1, is obtained from the maximum acceptable relative performance drop of that layer; the threshold of each subsequent step is derived from the previously applied threshold scaled by the constant factor l; and the global performance-drop constraint is represented by a.
In this way, the model can be compressed in a well-principled way, with different numbers of parameters retained for different values of λ.
The pruned model should then be fine-tuned with suitable hyper-parameters to restore its accuracy, so that objects are detected accurately even after pruning.
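The penalty of equation (9) is straightforward to add to a training loop. The following PyTorch sketch shows the L1 term on the BN scale factors; the λ value is an illustrative assumption, not a reported hyper-parameter.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L1 norm of all BN scale factors gamma, the penalty term of eq. (9).
    Added to the task loss during sparsity training, it drives the gamma of
    unimportant channels toward zero so those channels can be pruned."""
    gammas = [m.weight.abs().sum() for m in model.modules()
              if isinstance(m, nn.BatchNorm2d)]
    return lam * torch.stack(gammas).sum()

# Inside a training step (sketch; detection_loss is a placeholder):
#   loss = detection_loss(model(images), targets) + bn_l1_penalty(model)
#   loss.backward(); optimizer.step()
```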

Result and discussion
The proposed multi-scale YOLO v5 method is implemented in Google Colab using Python programming on a Core i3 processor with 4 GB RAM. The Kaggle rice leaf disease (RLD) dataset [40] is used for the experimentation; it contains images of rice leaf diseases with bounding boxes for Bacterial Leaf Blight, Rice Blast, and Brown Spot. The dataset contains 850 rice leaf images, divided into training, validation, and test sets of 583, 160, and 107 images, respectively. The 583 training images are input into the proposed YOLOv5 network, and the test results are then compared with the existing networks. Tables 1 and 2 display the dataset description and parameter details, respectively. The model can be evaluated on a training dataset and a validation dataset after each update, and the measured performance can be plotted as learning curves. A good fit is indicated by training and validation losses that decrease as training accuracy increases. According to the analysis, the accuracy rises with a growing number of epochs. The model's loss on the training dataset is almost always smaller than on the validation dataset, which implies that the gap between the training and validation losses in the learning curves should be examined. Figures 9 and 10 show an evaluation of the training loss and accuracy, respectively. The proposed technique has a high initial loss value, which decreases effectively as the epochs advance; by 95 training epochs, the loss value is relatively low. Consequently, the suggested network model has a high accuracy rate and a low loss value, and the network's training process is stable and converges quickly. The network is trained using the Adam optimizer with a learning rate of 0.001 and a momentum of 0.9 until convergence.
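The training configuration described above maps to PyTorch roughly as follows; the model, data, and loss below are placeholders, and treating the stated momentum of 0.9 as Adam's first-moment coefficient β1 is an assumption.

```python
import torch

# Placeholders standing in for the detector and the RLD data loaders.
model = torch.nn.Linear(10, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
loss_fn = torch.nn.CrossEntropyLoss()

x_tr, y_tr = torch.randn(64, 10), torch.randint(0, 3, (64,))
x_va, y_va = torch.randn(16, 10), torch.randint(0, 3, (16,))

history = {"train": [], "val": []}
for epoch in range(95):                      # loss flattens near 95 epochs
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_tr), y_tr)
    loss.backward()
    optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_va), y_va)
    history["train"].append(loss.item())     # learning curves (figures 9-10)
    history["val"].append(val_loss.item())
```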
Appendix table A1 shows the effect of rice disease image segmentation. The segmentation effect of Faster-RCNN is the worst, failing to segment all the disease spots. With Mask-RCNN, tiny affected regions are not segmented clearly. RPN segments some unaffected regions along with the affected ones. Yolo v3 fails to segment affected regions at the leaf edges. Yolo v4 cannot segment the detailed affected regions properly. Yolo v5 segments the affected region better than the other Yolo models, but small affected regions are sometimes missed from the group of affected regions. In contrast, the proposed DenseNet-Bi-FAPN with YOLO v5 segments both the edge regions and the detailed affected regions more completely than the other models.
Appendix table B1 shows the automatic segmentation outcomes obtained by different segmentation approaches for different rice disease input images. It can be seen from the segmentation results that the Faster-RCNN approach segments only some portion of the disease-affected areas. The Mask-RCNN approach provides a better segmentation effect than Faster-RCNN, but smaller affected areas are not covered. RPN mistakenly segments most of the normal areas as disease areas and fails to complete the segmentation accurately. Yolo v3 has a poor segmentation outcome on the diseased leaf edge area. Yolo v4 has a better segmentation effect than Yolo v3 in terms of the edge area and can achieve complete segmentation of the detailed parts in the rice disease leaf image. Yolo v5 does not properly detect the less affected leaf areas within the group of affected regions. The proposed DenseNet-Bi-FAPN with YOLO v5 segmentation approach can completely segment the rice disease-affected leaf areas and has a good effect on the detailed edge areas. Hence, the analysis shows that the proposed segmentation approach is far superior to the other segmentation approaches. The introduced DenseNet-Bi-FAPN with YOLO v5 can play a significant role in ensuring higher detection accuracy because it improves the accuracy by ~1% mAP. By efficiently balancing the multi-scale features, the multi-scale YOLO v5 can express its features more powerfully; as a result, the accuracy over the entire dataset is increased. The performance of the proposed method is identified by measuring the AP and AR metrics. The AP is calculated using IoU criteria of (0.5:0.05:0.95). Other significant measurements are reported according to convention; for instance, AP50 and AP75 are calculated at IoU thresholds of 0.50 and 0.75. AR stands for the highest recall rate, which is calculated using a fixed number of detections per image (i.e., 1, 10, 100, and 500), averaged over all categories and IoU thresholds. Tables 3 and 4 show the AP and AR measurements. The proposed detection network of DenseNet-Bi-FAPN with YOLO v5 produces better AP results: an AP of 63.79, an AP50 of 70.732, and an AP75 of 65.482. Accordingly, specific precise bounding boxes with IoU thresholds between 0.5 and 0.7 are preferable for the YOLO v5 network; AP50 and AP75 show the precision of the proposed predictions. In particular, the AR is expressively improved, with multi-scale testing yielding the greatest results, because the suggested strategy reduces the number of incorrect bounding boxes, increasing the confidence of bounding boxes with correct locations. The suggested disease detector reports better AR findings: an AR10 of 42.46, an AR100 of 52.26, and an AR500 of 64.87. Table 5 shows the results for 10 images from the RLD dataset, with performance further analyzed by AP, AR, mAP, IoU, inference time, accuracy, and F1 score. The inference times are further compared with those of Faster-RCNN, Mask-RCNN, RPN, Yolo v3, Yolo v4 [11], and Yolo v5 on the RLD dataset. The reported YOLO v5 detector value is greater than the APs of the deeper Faster-RCNN, Mask-RCNN, RPN, Yolo v3, and Yolo v4 models. The F1 score for the proposed network is 92.45%, while those of Faster-RCNN, Mask-RCNN, RPN, Yolo v3, Yolo v4, and Yolo v5 are 75.63%, 77.34%, 79.97%, 82.32%, 86.64%, and 91.52%, respectively.
The proposed disease detector's best performance is an AR of 75.18%, dramatically outperforming, to the best of our knowledge, all other detection schemes. Finally, the proposed network significantly increases the detection accuracy for all diseases. Figure 11 shows the accuracy analysis of the proposed and existing networks. From the analysis, the proposed network achieves 94.87% accuracy, the best among the compared networks.

Ablation study
This study presents an ablation experiment to illustrate the contribution of the modules of the DenseNet-Bi-FAPN with YOLO v5 to the YOLO-v5 optimization. To compare against the YOLO-v3 and YOLO-v4 algorithms, the path aggregation network structure of the feature pyramid in YOLO-v5 is swapped out for the Bi-FAPN structure in this study. The findings of the ablation investigation using Yolo v3, Yolo v4, and YOLO-v5 on the RLD dataset are shown in table 6. The ablation experiment compares the algorithms' AP, AR, mAP, IoU, inference time, accuracy, and F1 score. It is clear that the suggested disease detector outperforms, to the best of our knowledge, all other detection methods, with an AR of 75.18%. Finally, the suggested network considerably improves the accuracy of disease detection, with an IoU of 0.74 and an mAP of 70.8%.

Conclusion
In this paper, a method for the precise detection of rice leaf disease is proposed and evaluated.
The proposed system used the DenseNet-Bi-FAPN with YOLO v5 model, tested on the RLD dataset. The major goal of the proposed network is to detect and diagnose diseases in rice leaves. The proposed YOLO v5 network enhances detection accuracy by integrating the DAIS segmentation and Bi-FAPN networks. The features from the segmented images are obtained by the attention-based bidirectional FPN module. To detect the disease, the feature-map-based YOLO v5 model is used because of its ability to adaptively calculate the relevant anchors on the RLD dataset. The effectiveness of the network is evaluated on various parameters in terms of AP, mIoU, accuracy, AR, IoU, and F1 score. The proposed scheme's performance is examined and shows better results, with 94.87% accuracy. The efficiency of the proposed method is 91.89%, compared with existing methods such as Faster-RCNN (76.47%), Mask-RCNN (78.46%), RPN (80.91%), Yolo v3 (84.32%), Yolo v4 (88.64%), and Yolo v5 (91.89%). The computational cost of the proposed framework is greatly reduced by the principled pruning approach. Thus, the proposed approach helps farmers detect rice leaf diseases in their early stages. In future work, it can be improved by adding sensors to the application to preserve rice quality. The F1 score is also known as the Dice score, which is interrelated with the IoU. It is expressed in equation (10),

F1 = 2 · Precision · Recall / (Precision + Recall),    (10)

the harmonic mean of precision and recall, which by definition makes it among the most appropriate measures for unbalanced datasets. According to the formula, if either precision or recall is zero, the F1 score is also zero.