New method for rice disease identification based on improved deep residual shrinkage network

A new method based on an improved deep residual shrinkage network is proposed to address the subtle differences in spot characteristics among different rice diseases and the low recognition rate under noise interference. First, to reduce the number of network parameters and the computational cost while increasing the nonlinearity of the model, the InceptionA module is embedded in the original network, and the convolutional kernels in the original residual structure are replaced by multiple small-sized convolutional kernels. Second, to strengthen the spot features, the lightweight Convolutional Block Attention Module (CBAM) attention mechanism is introduced to achieve more effective information extraction. The Exponential Linear Unit (ELU) and the Focal loss function are introduced to jointly guide model training, and the 10-fold cross-validation method is used. The proposed InceptionA- and CBAM-based DRSN (ICDRSN) obtains 98.89% mean average precision, 98.65% accuracy and 98.68% recall on three rice leaf disease datasets. The recognition accuracy is improved by 2.6%, 3.34%, 1.86% and 2.23% compared with the DenseNet, ShuffleNet, MobileNet and ResNet models, respectively. These results verify that the ICDRSN model is stable, reliable, accurate and fast, with satisfactory generalization ability.


Introduction
As one of the major food crops in China, rice reached a total domestic production of nearly 212,843,000 tons in 2021 and is the staple food for about 65% of the country's population (National Bureau of Statistics, 2021, December 06). Rice disease is an important factor in the decrease of rice production and quality, which directly affects food security and agricultural income in China. Common rice diseases include flax spot disease, rice aspergillosis, rice blast, red blight, etc., among which rice blast, white leaf blight and brown spot disease are extremely serious threats to rice yield and quality. In view of the occurrence of diseases in the rice growing cycle, early preparation for disease control can not only save production cost and keep crop yields growing, but also ensure the sustainable development of food security. Traditional rice disease recognition is mainly based on manual observation of the shape and colour of the spots and on subjective diagnosis drawing on past planting experience or network resources. In addition, with the adjustment of the cultivation industry structure, the variety and complexity of rice diseases pose a great threat to rice production. Therefore, timely and effective diagnosis of disease is particularly important, see Ngugi et al. (2021). In recent years, machine learning methods have received research attention in rice disease identification. Classic machine learning algorithms have performed well and stably in the training process.
Nevertheless, manual labelling of features is often required, which is a tedious and lengthy process. Moreover, there are other problems, such as incomplete relevant features and difficulties in achieving deep feature extraction (Habib et al., 2020).
With the continuous research and development of computer simulation of human intelligence, image recognition technology based on machine learning has become increasingly mature in the field of intelligent crop identification and detection. To address the serious maize crop losses in India, Mishra et al. (2020) have constructed a real-time identification method for maize leaf disease based on deep learning to achieve automatic diagnosis of maize disease through intelligent devices. To address the poor robustness and low accuracy of recognition models caused by the complex backgrounds of crop disease images and small disease areas, Zeng and Li (2020) have introduced an attention mechanism into a convolutional neural network architecture to achieve efficient extraction of useful disease-spot information and recognition of disease species. To obtain a highly accurate pest recognition model, M. H. Wang, Wu, et al. (2021) have optimized the channel and spatial attention modules by connecting them in parallel, which significantly improves the performance of the pest identification algorithm. Zhang et al. (2022) have improved the capsule network and introduced an attention mechanism to construct the MCapsNet crop disease identification model, capturing high-weight classifications and accelerating model training. Hussain et al. (2022) have combined the VGG network and the Inception model to construct a recognition framework for diseases of cucumber stems and leaves; the extracted features have been fused in parallel and optimized by the whale optimization algorithm. Despite the effectiveness of these methods, there are still shortcomings, such as a sudden increase in the number of model parameters, slow convergence, and a decrease in recognition accuracy as the number of network layers increases (Makanapura et al., 2022; S. Wang, Sun, et al., 2021).
Residual networks (ResNet) have shown powerful advantages in the field of deep learning due to their good generalization ability and adaptation properties (Cai et al., 2021; Guo et al., 2021). Liang et al. (2022) have studied the problems of segmentation and weight prediction of grape clusters by cross-combining ResNet backbone feature extraction networks of different sizes with the SFNet, GCNet, EMANet and Deeplabv segmentation networks to construct the SFNet-ResNet18 model. The Deep Residual Shrinkage Network (DRSN) proposed by Zhao's team extracts features and analyses the data to eliminate the influence of noise in the fault diagnosis process (Zhao et al., 2019). Changchang et al. (2021) have used the DRSN model for fault diagnosis of vibration signals, which resulted in a 1% to 6% improvement in the average correct fault diagnosis rate and a low training error. Based on the above studies, an improved deep residual shrinkage network for rice disease identification is proposed in this paper. The main contributions are that: (1) The proposed ICDRSN model applies a sparse neural network module to the DRSN, which captures the corresponding range of receptive fields with convolutional kernels of different sizes, promotes the network to learn the required parameters by itself, and significantly reduces the model parameters and the computational cost of the algorithm. (2) To further strengthen the spot features, the CBAM (Convolutional Block Attention Module) module is introduced, which effectively improves the classification and localization performance as well as the robustness of the deep model. (3) The ELU and Focal loss functions are introduced to guide model training; the phenomenon that the gradient is zero when the input of the Rectified Linear Unit (ReLU) function is negative in the original model can be effectively eliminated, and the problem of unbalanced sample distribution can be alleviated at the same time.

He et al. (2016) have added skip connections between network layers to construct residual learning units that maintain good gradients as well as generalization ability, solving the earlier challenge that it is quite difficult to approximate an identity mapping in deep models by stacking multiple nonlinear layers. Phenomena such as gradient dispersion and gradient explosion during backpropagation have been effectively alleviated, and the network learning efficiency has been improved. Figure 1 illustrates the residual unit construction. Among the variants of deep residual networks, the DRSN is of great use in intelligent fault detection and diagnosis of noisy signals and in image recognition, and has achieved good results (Li & Chen, 2021; Salimy et al., 2022). At present, the DRSN has seldom been used for rice disease recognition. In this paper, the structure of the DRSN is improved to achieve real-time, stable and efficient recognition of rice diseases.

SE attention
The SENet structure in the DRSN algorithm performs convolution and pooling operations on the input features X of size H × W × C to obtain a feature map of the same size. After Global Average Pooling (GAP), a feature map of size 1 × 1 × C is obtained. Two linear layers are then connected to fit the complex correlation between channels, and the output is normalized to a factor between 0 and 1 by a sigmoid layer. The resulting channel weights of dimension 1 × 1 × C are multiplied element-wise with the initial input features to obtain a feature layer in which each channel carries a different weight. The structure of the SE module is shown in Figure 2.
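The squeeze-excitation-recalibration steps described above can be sketched as a minimal NumPy forward pass. This is an illustrative sketch, not the authors' implementation; the function and weight names (`se_block`, `w1`, `w2`) are assumptions, and the reduction ratio is left implicit in the weight shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """SE forward pass on a feature map x of shape (H, W, C).

    w1: (C, C // r) weights of the first linear layer (dimensionality reduction)
    w2: (C // r, C) weights of the second linear layer (restoration)
    """
    # Squeeze: global average pooling gives one scalar per channel (1 x 1 x C)
    s = x.mean(axis=(0, 1))              # shape (C,)
    # Excitation: two linear layers; sigmoid normalizes each weight into (0, 1)
    z = np.maximum(s @ w1, 0.0)          # ReLU after the first linear layer
    w = sigmoid(z @ w2)                  # per-channel weights, shape (C,)
    # Recalibration: scale each input channel by its learned weight
    return x * w                         # broadcasts over H and W
```

Because each channel weight lies strictly between 0 and 1, the block can only attenuate channels, never amplify them, which is how it emphasizes informative channels relative to the rest.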

Methodological overview
The attention mechanism in the DRSN model only establishes connections between channel information, which is not conducive to the flow of information within the network. At the same time, the DRSN model suffers from a huge number of parameters and long learning cycles with excessive computational cost, which limits its recognition accuracy and robustness. Under field growth conditions, the colour, shape, area and texture features of disease spots on rice leaves are complex and diverse. Moreover, the spot areas are uneven and scattered, so the capture of local feature information is highly demanding. In this paper, we combine the aspects of the DRSN network that need improvement with the characteristics of three diseases (rice blast, brown spot and white leaf blight), and integrate the lightweight CBAM attention mechanism and the InceptionA sub-network into the DRSN to build an improved residual shrinkage stacking module that realizes multi-scale extraction of rice leaf disease features and deeply strengthens the disease spot features. The ELU activation function and the Focal loss function are introduced to jointly guide the model training process.

InceptionA module structure
The structure of the InceptionA module is shown in Figure 3. The module is derived from the Inception V3 network structure (Szegedy et al., 2016) and uses two 3 × 3 convolutions instead of the original 5 × 5 convolution, which reduces the number of parameters to 72% of the original while the receptive field of the two stacked convolutions remains exactly equal to that of the single 5 × 5 convolution. This modular structure effectively alleviates computational imbalance and high computational cost, increases the learning space of the model, and improves its accuracy.
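The 72% figure follows directly from counting convolution weights: a 5 × 5 kernel has 25 weights per input-output channel pair, while two stacked 3 × 3 kernels have 2 × 9 = 18, and 18/25 = 0.72. A small arithmetic check (the channel count C = 64 is an arbitrary example, not a value from the paper):

```python
def conv_params(k, c_in, c_out):
    """Number of weights in a k x k convolution (biases ignored for the comparison)."""
    return k * k * c_in * c_out

# One 5x5 convolution versus a stack of two 3x3 convolutions,
# both mapping C channels to C channels.
C = 64
p_5x5 = conv_params(5, C, C)            # 25 * C * C weights
p_two_3x3 = 2 * conv_params(3, C, C)    # 18 * C * C weights

ratio = p_two_3x3 / p_5x5               # 0.72, i.e. 72% of the original
```

The ratio is independent of C, so the saving holds for any channel width, and the extra nonlinearity between the two 3 × 3 convolutions is what increases the learning space mentioned above.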

CBAM module architecture
The CBAM mechanism used in this paper consists of two modules, the Channel Attention Module (CAM) and the Spatial Attention Module (SAM), which realize feature recalibration. Figure 4 shows the structure of the CBAM module.
(1) Channel attention mechanism. A double pooling operation is carried out on the feature map F of input size H × W × C, producing two vectors M(1 × 1 × C) and A(1 × 1 × C) from max pooling and average pooling, respectively. Both vectors are passed through a shared multi-layer perceptron and summed, and the result is weighted onto the input:

M_c(F) = ρ(MLP(A) + MLP(M)) (1)

F′ = M_c(F) ⊗ F (2)

where M_c is the weight coefficient of the channel attention module, and ρ is the activation function sigmoid.
Figure 5 illustrates the construction of the channel attention mechanism.
(2) Spatial attention mechanism. The module obtains the compressed feature maps M(H × W × 1) and A(H × W × 1) through a double pooling operation over the channel dimension, concatenates the two maps, and performs a convolution operation on the stacked H × W × 2 input to obtain C. C is multiplied with the feature F′ to obtain the new feature. The calculation formula is:

M_s(F′) = ρ(f^(7×7)([A; M])) (3)

where ρ is the activation function sigmoid, f^(7×7) denotes the convolution operation with a 7 × 7 kernel, and M_s is the weight coefficient of the spatial attention module. Figure 6 illustrates the spatial attention mechanism.
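The two CBAM stages can be sketched in NumPy as below. This is a minimal illustrative sketch of the standard CBAM computation, not the paper's code; the weight and kernel arguments are assumptions, and the 'same'-padded 7 × 7 convolution is written as an explicit loop for clarity rather than speed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w1, w2):
    """M_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))); f has shape (H, W, C)."""
    avg = f.mean(axis=(0, 1))                        # A, shape (C,)
    mx = f.max(axis=(0, 1))                          # M, shape (C,)
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2     # shared two-layer MLP
    return sigmoid(mlp(avg) + mlp(mx))               # per-channel weights

def spatial_attention(f, kernel):
    """M_s = sigmoid(conv7x7([A; M])), pooling over the channel axis."""
    avg = f.mean(axis=2)                             # (H, W) average map
    mx = f.max(axis=2)                               # (H, W) max map
    stacked = np.stack([avg, mx], axis=2)            # (H, W, 2)
    H, W = avg.shape
    p = np.pad(stacked, ((3, 3), (3, 3), (0, 0)))    # 'same' padding for 7x7
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(p[i:i + 7, j:j + 7, :] * kernel)
    return sigmoid(out)                              # per-position weights

def cbam(f, w1, w2, kernel):
    f1 = f * channel_attention(f, w1, w2)                 # channel refinement
    return f1 * spatial_attention(f1, kernel)[..., None]  # spatial refinement
```

Applying the channel weights first and the spatial weights second, as in `cbam`, matches the sequential CAM-then-SAM ordering described above.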
The soft thresholding function and its partial derivative are given as:

y = x − τ, x > τ; y = 0, −τ ≤ x ≤ τ; y = x + τ, x < −τ (4)

∂y/∂x = 1, |x| > τ; ∂y/∂x = 0, |x| ≤ τ (5)

where the output and input features are denoted as y and x, respectively, and τ is the pre-selected threshold. The improved residual shrinkage network hierarchy is shown in Table 1, where C1, C2 and F correspond to Conv 1 × 1, Conv 3 × 3 and linear layers, respectively.
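The piecewise definition above collapses to a single expression, sign(x) · max(|x| − τ, 0), which is how soft thresholding is usually implemented. A minimal sketch (function names are illustrative):

```python
import numpy as np

def soft_threshold(x, tau):
    """Soft thresholding: shrink x toward zero by tau, zeroing the band |x| <= tau.

    Equivalent to the piecewise form: x - tau for x > tau,
    0 for -tau <= x <= tau, and x + tau for x < -tau."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def soft_threshold_grad(x, tau):
    """Derivative dy/dx: 1 outside the dead zone, 0 inside it."""
    return (np.abs(x) > tau).astype(float)
```

The derivative being either 0 or 1 is what makes soft thresholding attractive inside a residual shrinkage unit: gradients pass through unchanged for features above the threshold, while near-zero (noise-like) features are suppressed.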

ELU activation function
The ELU (Clevert et al., 2015) retains all the advantages of the ReLU. To solve the problem that the gradient of the ReLU function is zero whenever the input is negative during backpropagation, the ELU function is used in the activation layer instead of the original ReLU. The soft saturation part of the function suppresses input noise, while the linear part effectively alleviates the vanishing gradient problem. In addition, the ELU reduces the effect of bias shift, pushing the output mean close to zero and the gradient closer to the natural gradient, which further accelerates the learning process:

ELU(x) = x, x > 0; ELU(x) = α(e^x − 1), x ≤ 0 (6)

where x is the input feature, and α is a hyperparameter whose value is adjusted in the same way as other hyperparameters, usually set to 1.
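The ELU and its gradient can be sketched in a few lines; note how, unlike the ReLU, the gradient for negative inputs is α·e^x, which is small but never exactly zero (function names are illustrative):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) for x <= 0.

    The negative branch saturates softly at -alpha, which suppresses
    large negative (noisy) activations."""
    return np.where(x > 0, x, alpha * np.expm1(x))

def elu_grad(x, alpha=1.0):
    """Gradient: 1 for x > 0, alpha * exp(x) otherwise -- never exactly zero."""
    return np.where(x > 0, 1.0, alpha * np.exp(x))
```

`np.expm1` computes exp(x) − 1 accurately for small |x|, avoiding the cancellation that `np.exp(x) - 1.0` would suffer near zero.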

Focal loss function
The Focal loss function boosts the weights of low-proportion samples and misclassified samples during training and suppresses the weight of easily classified samples. Equation (8) illustrates the function:

FL(p_t) = −α_t (1 − p_t)^γ log(p_t) (8)

where p_t is the probability that the model prediction belongs to the foreground; its value is proportional to the sample's distinguishability, with a value range of 0 to 1. α_t is a dynamically changing variable that is less than 1 and reduces the overall loss value. (1 − p_t)^γ is the adjustment factor introduced into the cross-entropy loss function, and the corresponding loss value is controlled by the value of γ.
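Equation (8) is a one-liner in code. A minimal sketch (the default α_t and γ values here are common conventions from the Focal loss literature, not values stated in this paper):

```python
import numpy as np

def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p_t is the predicted probability of the true class. The factor
    (1 - p_t)**gamma down-weights easy examples (p_t near 1) so that
    hard, misclassified examples dominate the loss."""
    p_t = np.clip(p_t, 1e-12, 1.0)   # numerical safety for the log
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With γ = 0 and α_t = 1 the expression reduces to the ordinary cross-entropy −log(p_t), which makes the modulating role of γ easy to verify.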
Figure 7 shows the overall model architecture of ICDRSN.

Data acquisition
The collected disease images included 702 images of white leaf blight, 1035 images of rice blast, 756 images of brown spot, and 1589 images of healthy rice leaves, totalling 4082 original images. Among them, 3471 images were collected from the rice experimental field of Heilongjiang Bayi Agricultural University, and 611 images were obtained by scanning rice disease maps. The original image samples are shown in Figure 8.

Data enhancement
First, duplicate sample data are eliminated, and the feature layer is adjusted using the Batch Normalization (BN) operation. To ensure the diversity of samples, the disease categories with a small share of the training set are augmented by random rotation, image mirroring, colour dithering and added noise. Table 2 shows the division of the dataset.
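The four augmentation operations named above can be sketched on a raw image array. This is an illustrative NumPy stand-in (the function name, the 90-degree rotation granularity, and the jitter/noise magnitudes are assumptions; the paper does not specify its augmentation parameters):

```python
import numpy as np

def augment(img, rng):
    """Produce simple augmented variants of an H x W x 3 uint8 image:
    random rotation, mirroring, colour dithering, and additive noise."""
    out = []
    # Random rotation (restricted to multiples of 90 degrees in this sketch)
    out.append(np.rot90(img, k=int(rng.integers(1, 4))))
    # Horizontal mirroring
    out.append(np.fliplr(img))
    # Colour dithering: random global brightness scaling
    jitter = np.clip(img.astype(float) * rng.uniform(0.8, 1.2), 0, 255)
    out.append(jitter.astype(np.uint8))
    # Additive Gaussian noise
    noisy = np.clip(img.astype(float) + rng.normal(0, 10, img.shape), 0, 255)
    out.append(noisy.astype(np.uint8))
    return out
```

Each call turns one under-represented sample into four variants, which is how the minority disease classes can be brought closer to balance before training.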

Evaluation metrics
In this experiment, Precision, Recall and mAP are adopted to analyse and assess the model on the classification task:

Precision = TP / (TP + FP) (9)

Recall = TP / (TP + FN) (10)

mAP = (1/N) Σ AP_i (11)

where TP (True Positive) is the number of samples correctly predicted as positive, FP (False Positive) is the number of negative samples mistakenly identified as positive, FN (False Negative) is the number of positive samples mistakenly identified as negative, N is the number of all classes, and mAP is the mean of the average precision (AP) summed over all classes.
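The three metrics reduce to simple ratios over the confusion-matrix counts; a minimal sketch (helper names are illustrative):

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): of everything predicted positive, how much was right."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Recall = TP / (TP + FN): of everything actually positive, how much was found."""
    return tp / (tp + fn)

def mean_average_precision(ap_per_class):
    """mAP = (1 / N) * sum of the per-class average precision over the N classes."""
    return sum(ap_per_class) / len(ap_per_class)
```

For example, a class with 9 true positives and 1 false positive has precision 0.9 regardless of how many positives were missed, which is why precision and recall are reported together.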

Ablation experiments
During the training process, in order to compare the effects of different structural improvements on model performance through ablation experiments, the four control-group models selected for this study all used ImageNet pre-trained weights. The SGD algorithm incorporating a Momentum term (momentum value 0.9) is used for parameter updates, which allows the network to escape local optima to some extent. The randomly cropped image is scaled to 224 × 224, randomly flipped horizontally with a probability of 0.5, and then fed into the backbone network. The number of training epochs and the initial learning rate are set to 400 and 0.0001, respectively. For validation, the image is centre-cropped, scaled to 224 × 224, and passed into the trained model.
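The momentum update used above can be sketched directly; this is a generic SGD-with-momentum step, not the paper's training code, and the toy quadratic objective below is purely for illustration (the default `lr` and `momentum` mirror the 0.0001 and 0.9 stated in the text):

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=1e-4, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v + grad; w <- w - lr * v.

    The accumulated velocity v lets the iterate coast through small local
    minima and flat regions -- the 'escape local optima' behaviour noted
    in the text."""
    v = momentum * v + grad
    w = w - lr * v
    return w, v

# Toy check: minimise f(w) = w^2 (gradient 2w) with a larger lr for speed.
w, v = np.array([5.0]), np.array([0.0])
for _ in range(2000):
    w, v = sgd_momentum_step(w, v, 2.0 * w, lr=0.01)
# w is now very close to the minimiser 0
```

With momentum 0.9 the effective step size is roughly lr / (1 − 0.9) = 10× the nominal learning rate along persistent gradient directions, which is why momentum accelerates convergence on elongated loss surfaces.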
The experimental design considers the filter sizes, attention methods, and model parameters used by the models under various combinations of network structures; Table 3 shows the training performance. Overall, under the same environment, the average recognition accuracy obtained by schemes 3 and 4 is better than that of schemes 1 and 2, with scheme 4 performing best. The models are then validated. Analysing the data in Table 4 together with Woo et al. (2018), and comparing against the original standard ResNet50 network structure, the number of model parameters does increase after incorporating both SE and CBAM attention into the ResNet50 structure; however, such a small amount of parameter growth is well within the acceptable computational cost relative to the overall size and parameter count of the model and the performance improvement the module brings, and it has no substantial effect on the overall performance of the model. Meanwhile, the InceptionA module introduced at the front end of the network uses two convolutions of size 3 × 3 instead of the original convolution of size 7 × 7, which strengthens the network depth and width while reducing the parameters and accelerating network computation.
In addition to analysing the effects of different convolution sizes on model performance and the computational cost of the network under different structural combinations, the experiments also provide a comparative analysis of the effects on network performance of implanting CBAM modules at different locations in the model. Considering the training results of schemes 1, 2, 3 and 4 and the performance indexes obtained on the classification tasks in Table 3, scheme 4 outperforms the other structural combinations in terms of mean average precision (mAP) and reduction of network parameters, so the subsequent rice disease image recognition experiments are based on scheme 4.

Analysis of experimental results
From the matrix in Figure 9 (blast is rice blast, blight is rice white leaf blight, brownspot is rice brown spot, and healthy is a healthy rice leaf), it can be seen that mutual misidentification occurred between rice blast and healthy leaves as well as between brown spot and rice blast. Due to the subtle differences between brown spot and early rice blast, the spots presented under different views and in different growth periods are extremely similar, which can lead to misclassification. The improved model can further accurately identify rice disease spot characteristics in complex environments. Compared with the identification of white leaf blight, there is mutual misidentification between rice blast and brown spot; analysis shows that the degree of similarity between the spots of the two diseases varies across disease stages. The size, colour, texture and other initial characteristics of blight spots are particularly close to those of brown spot and are difficult to distinguish accurately with the naked eye, which makes accurate identification difficult and results in the model's weaker recognition of early rice blight and brown spot.
To verify the robustness of the model and to avoid the chance and randomness of relying on a single test set during training, the cross-validation method (Refaeilzadeh et al., 2009) is used for model evaluation, and the mean of the results obtained across the cross-validation runs is used to assess robustness. In this paper, the most commonly used 10-fold cross-validation method was applied; the recognition accuracies of the ten folds were 98.79%, 99.26%, 99.13%, 98.90%, 97.96%, 96.94%, 98.71%, 98.99%, 99.08% and 98.76%, with a highest value of 99.26% and a lowest value of 96.94%, giving an average model accuracy of 98.65%. A comprehensive analysis of these repeated tests shows that the variation in model accuracy is small, which demonstrates the robustness of the model.
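The 10-fold protocol can be sketched as index bookkeeping: split the sample indices into ten folds, hold each fold out once as the validation set, and average the per-fold accuracies. A minimal sketch (function names are illustrative; the real experiment would shuffle and stratify the indices):

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices into k folds; each fold serves once as the
    validation set while the remaining k - 1 folds form the training set."""
    indices = list(range(n_samples))
    # Distribute any remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits

def mean_accuracy(acc_per_fold):
    """The reported model accuracy is the mean over the k fold accuracies."""
    return sum(acc_per_fold) / len(acc_per_fold)
```

Averaging the ten fold accuracies listed in the text (98.79 through 98.76) with `mean_accuracy` reproduces the reported 98.65% figure (98.652 before rounding).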
The experimental results show that the improved ICDRSN network achieved 98.65% mean accuracy on the constructed rice leaf disease dataset, which is 2.6%, 3.34%, 1.86% and 2.23% higher than the DenseNet, ShuffleNet, MobileNet and ResNet models, respectively. The improved model also achieved 98.95% accuracy and 98.68% recall. These results show that the improved residual shrinkage network proposed in this paper can effectively extract disease spot features for the three rice diseases studied and accomplish rapid and accurate recognition of the diseases in complex environments.
The effects of the improved network for the recognition of the three studied rice disease images as well as healthy rice images are shown in Table 5.

Comparative analysis of ICDRSN and classical model results
After training on the constructed rice disease image set, the accuracy and loss curves of the improved ICDRSN network enter the convergence state earlier than those of the four reference models, and the converged state is relatively smooth.
As shown in Figure 10(a), the accuracy curves of the DenseNet, MobileNet and ResNet structures gradually reach convergence and stabilize near the 170th, 180th and 210th epochs, respectively. In Figure 10 and Table 5, the accuracy of the three models DenseNet, MobileNet and ResNet is relatively stable, all reaching 96% or more, while the new ICDRSN model reaches 98.65%, 3.34 percentage points above the 95.31% of the ShuffleNet model. At the same time, the accuracy of the model improved by 1.15%, 1.65%, 0.4% and 0.68% compared with DenseNet, ShuffleNet, MobileNet and ResNet, respectively. The loss and accuracy curves during training-set validation show that the ICDRSN network, combining the CBAM mechanism and the Inception module, converges faster and more smoothly, with obvious advantages, which shows that the model can effectively identify the above three rice diseases.

Conclusion
In this paper, intelligent recognition of rice disease images has been studied based on an improved deep residual shrinkage network. The InceptionA module has been embedded to achieve multi-scale feature fusion and reduce network parameters. The lightweight CBAM attention mechanism has been introduced to enhance the disease spot features and improve the robustness of the model. The cost function has been adapted to solve the problem of unbalanced sample classes affecting the recognition performance. Various network structures based on different convolutional kernel sizes and different attention module positions have been designed for performance comparison, and a model structure with high performance, low computational cost and balanced complexity has been determined. The improved model has achieved 98.65% recognition accuracy while keeping model size and memory consumption down. Compared with each classical reference model, the new model performs well in terms of convergence speed, generalization ability and robustness, and better balances model complexity against the various network performance indicators, indicating its effectiveness in rice disease recognition tasks. There is still room for optimizing the ICDRSN network structure; future research will focus on further optimization of the model to achieve fast and accurate identification of similar diseases and to provide new ideas for efficient, intelligent identification of crop diseases in complex field environments.

Disclosure statement
No potential conflict of interest was reported by the author(s).

ORCID
Yang Lu http://orcid.org/0000-0001-9887-7078

Figure 1. Schematic diagram of the structure of the residual unit.

Figure 9. Confusion matrix of the rice disease set based on the improved model.
Figure 10(b) illustrates that the loss curves of the DenseNet, MobileNet and ResNet structures during training are inferior to that of ICDRSN: their convergence is slower and their loss curves after convergence are less smooth. As shown in Figure 10(c), the improved ICDRSN network structure is the first to enter the convergence state during validation while maintaining high recognition accuracy and a smooth state; therefore, the improved ICDRSN network has advantages over the other four comparison models. As shown in Figure 10(d), the loss curves of DenseNet, ShuffleNet, MobileNet and ResNet enter convergence at the 130th, 170th, 155th and 160th epochs, respectively. The improved ICDRSN network reaches convergence before these four networks, with an obvious advantage in smoothness.

Figure 10. Changes of recognition accuracy and loss values during iterations of different models. (a) Accuracy curve of the training set; (b) loss curve of the training set; (c) accuracy curve of the validation set; (d) loss curve of the validation set.

Table 2. Rice leaf disease and healthy leaves.

Table 3. Comparison of ablation experiment results.