MnasNet-SimAM: An Improved Deep Learning Model for the Identification of Common Wheat Diseases in Complex Real-Field Environments

Deep learning approaches have been widely applied for agricultural disease detection. However, considerable challenges still exist, such as low recognition accuracy in complex backgrounds and high misjudgment rates for similar diseases. This study aimed to address these challenges through the detection of six prevalent wheat diseases and healthy wheat in images captured in a complex natural context, evaluating the recognition performance of five lightweight convolutional networks. A novel model, named MnasNet-SimAM, was developed by combining transfer learning and an attention mechanism. The results reveal that the five lightweight convolutional neural networks can recognize the six different wheat diseases with an accuracy of more than 90%. The MnasNet-SimAM model attained an accuracy of 95.14%, which is 1.7% better than that of the original model, while only increasing the model’s parameter size by 0.01 MB. Additionally, the MnasNet-SimAM model reached an accuracy of 91.20% on the public Wheat Fungi Diseases data set, proving its excellent generalization capacity. These findings reveal that the proposed model can satisfy the requirements for rapid and accurate wheat disease detection.


Introduction
Wheat is the world's most important cereal crop, being directly intertwined with humanity's survival and advancement [1]. Common wheat diseases include wheat rust [2][3][4], wheat powdery mildew [5], wheat smut [6], and wheat scab [7]. These diseases significantly reduce the quality and yield of wheat, resulting in substantial economic losses. Therefore, the rapid and accurate detection and identification of wheat diseases are vital measures to ensure healthy wheat growth and safeguard agricultural security [8].
Deep learning (DL) approaches have been frequently employed in identification tasks in the agricultural field and have produced remarkable outcomes, allowing for high identification accuracy at a relatively low cost [9]. However, one of the biggest challenges when using DL for agricultural recognition tasks is image recognition accuracy. Real-world agricultural production images are often affected by complex backgrounds, adverse weather conditions, focus blurring, occlusion, and the presence of irrelevant objects [10]. Additionally, the image quality can be significantly impacted by the image's size, position, and shape, as well as the lighting and shooting conditions. These variables are the primary causes of classification errors. Therefore, improving image recognition capability is a significant issue in environments with complex backgrounds. At present, attention mechanisms are widely used to improve model performance [11]. Attention mechanisms are a technique used in deep learning to simulate the selective attention and weighted processing of input data, similar to processes in the human visual system [12]. In a traditional neural network model, each input is usually treated equally, without differentiation. However, in actual tasks, different components or regions of the input data require different levels of attention. This notion is particularly crucial in agricultural classification tasks with complex backgrounds. When identifying specific crop diseases, it is crucial to focus on regions that exhibit distinct disease symptoms. Therefore, an attention mechanism is needed to efficiently allocate attention to relevant areas through autonomous learning and weight adjustment. In addition, through the use of Grad-CAM, attention mechanisms are also easy to interpret, effectively describing the key features or areas emphasized by the model to facilitate the decision-making process [13]. For large agricultural objects, deep learning models are now superior to other classification methods; however, their classification performance may fall short of expectations in the initial stages of symptom appearance and when considering diseases with similar symptoms. This disadvantage could be somewhat compensated for through the introduction of an attention mechanism.
Attention mechanisms have been introduced into the field of computer vision, imitating the ability of the human visual system to focus on salient regions in complex scenes; they can be categorized according to various approaches, such as channel attention, spatial attention, temporal attention, branch attention, and so on [14]. Spatial attention and channel attention are often used in deep learning. Their main difference lies in whether the mechanism focuses on a specific spatial region of the input data or a specific channel when dealing with images or feature maps. The channel attention mechanism enhances the model's ability to use multi-channel feature information and representations through allowing for weighted processing and selective attention to the input channel dimensions. Meanwhile, the spatial attention mechanism helps to carry out weighted processing and apply selective attention to the input spatial dimensions, and can effectively use the feature information of different locations to improve the perceptual range and accuracy of the model. Common attention mechanisms include the squeeze-and-excitation (SE) module [15], channel attention (CA) module [16], efficient channel attention (ECA) module [17], convolutional block attention module (CBAM) [18], and simple parameter-free attention module (SimAM) [19]. Plug-and-play attention mechanisms can be easily integrated into pre-existing models, allowing for significant improvements in accuracy with very few additional parameters. At present, integrating spatial attention, channel attention, or both is a major method for improving model performance [20].
In order to identify crop diseases swiftly and accurately, Genaev et al. [21] proposed a method for the recognition of five fungal diseases of wheat shoots based on EfficientNetB0. An approach based on an image hashing algorithm was used to reduce the degradation of the training data. The highest accuracy of the model on the used data set was 94.20%. Nigam et al. [22] created a data set called WheatRust21 and used a fine-tuned EfficientNetB4 to achieve 99.35% test accuracy on this data set. Nigam et al. [23] combined an attention mechanism with the EfficientNetB0 model to detect diseases in the WheatRust21 image data set and obtained a test set accuracy of 98.70%. Cheng et al. [24] proposed a lightweight crop disease image recognition model, DSGIResNet_AFF, based on attention feature fusion. This model was superior to other network models, with fewer parameters and floating point operations than the original model and an accuracy of 98.30%, making it suitable for mobile devices. Zhao et al. [25] proposed a model called DTL-SE-ResNet50, which integrates the SE module into ResNet50 based on dual transfer learning to achieve vegetable disease recognition under simple and complex backgrounds, and performed better than the traditional models. The system could identify vegetable diseases quickly, with a short detection time and high accuracy compared with DTL-CAM-ResNet50 and DTL-SA-ResNet50. A network that deeply integrated the SE module into the ShuffleNetV2 network was constructed by Xu et al. [26]; the accuracy of this model was 4.85% higher than that of the original model. Yang et al. [27] established a model named DGLNet to solve the problems related to background noise and the dispersed distribution of disease symptoms in real environments. The model combined the Global Attention Module (GAM) and the Dynamic Representation Module (DRM). The results showed that the recognition accuracy of DGLNet reached 99.82% and 99.71% on two plant disease data sets, respectively, outperforming state-of-the-art methods. Chen et al. [28] proposed a novel domain adaptive image recognition method called the simple domain adaptation network (SDAN), which combines channel and location attention modules for disease recognition in rice with a small number of samples.
The above studies clearly demonstrated that the use of an attention mechanism can improve the accuracy of plant disease recognition models. In this study, a lightweight convolutional neural network for wheat disease recognition based on near-ground remote sensing data, named MnasNet-SimAM, is proposed to solve the persistent problem of difficulty in recognizing crop diseases in real complex environments. The SimAM module is used to extract depth features, focus on the disease locations, and avoid redundant information. In addition, the training speed and recognition ability of the network are improved through the use of improved activation functions and normalization. The main contributions of this research are outlined below:

1. The effectiveness of five lightweight convolutional neural networks in identifying six common wheat diseases and healthy wheat is explored, based on two optimizers and three learning rate scheduling strategies.

2. The influence of different values of λ in the SimAM module on model recognition accuracy is studied, and the performance of the improved model is verified through visualization of the model results. Grad-CAM is used to compare the effects of different attention mechanisms in MnasNet.

3. The influence of agricultural pre-training weights on the model's dual transfer learning is analyzed.

4. The generalization ability of MnasNet-SimAM on public data sets is validated.

Image Acquisition
Wheat disease images, including wheat stripe rust, leaf rust, stem rust, smut, powdery mildew, Fusarium head blight (FHB), and healthy wheat leaves, were obtained from three different sources: (1) field photography, (2) public data sets, and (3) web crawling. For the field photography, during 2022-2023, we collected images in two main wheat-producing areas of Xinjiang: Yili Kazak Autonomous Prefecture and Bayingolin Mongol Autonomous Prefecture. The collection time was from 10 a.m. to 6 p.m. on sunny days. The images were taken in automatic exposure mode using a 48-megapixel mobile phone camera against the natural background of the field. All obtained images were in JPG format. Some wheat disease images were obtained from a public website (https://aistudio.baidu.com/). Figure 1 shows examples of wheat disease images with complex backgrounds. A total of 4677 images were collected, including 2117 images captured on-site, 895 images obtained from web sources, and 1665 images acquired from public data sets.

Image Preprocessing
Images with poor pixel quality were removed, and 500 images of each type of disease were selected from the original images; in this way, a total of 3500 images were obtained. In order to prevent overfitting of the model, random data augmentation was carried out on the images, including random rotation, random scaling, brightness adjustment, Gaussian blur, and Gaussian noise addition (Figure 2). After augmentation, a total of 7000 images were obtained, and the data set was divided into training, validation, and test sets in a ratio of 7:2:1 [29]. The sample distribution of the data sets is presented in Table 1.
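As a concrete illustration, the shuffle-and-split step described above can be sketched as follows (a minimal example with generic file names; the authors' actual tooling is not specified in the paper):

```python
import random

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle a list of image paths and partition it into training,
    validation, and test subsets according to the given ratios (7:2:1)."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```

For the 7000 augmented images, this split yields 4900 training, 1400 validation, and 700 test images.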


Attention Mechanism
The attention mechanism concept is inspired by the ability of the human brain to focus more attention on the important information in input data. As such, the use of an attention mechanism enables a model to efficiently process large amounts of data and extract critical information to improve its performance and generalization ability, better capturing complex relationships in the data [30]. Attention mechanisms can help models to resolve information imbalances in input data, highlight key parts, and reduce attention to noisy or irrelevant information, especially in complex contexts such as disease identification [31,32]. In complex classification tasks, the input data may contain a large amount of redundant or secondary information. The use of an attention mechanism allows the model to better discriminate critical information from secondary information, improving its perception of crucial information, enhancing its robustness to the input data, and making it more sensitive to small changes in the input data. In 2021, a 3D attention module named the simple parameter-free attention module (SimAM) was proposed, in which an energy function was designed to calculate the attention weights [19]. As a 3D attention mechanism, SimAM simultaneously considers the correlations of the spatial and channel dimensions through feature mapping of the feature layer, without adding parameters to the original network, as depicted in Figure 3. SimAM estimates the importance of individual neurons, through which it calculates the attention weights. In neuroscience, information-rich neurons usually exhibit firing patterns different from those of their surrounding neurons. Moreover, activated neurons usually inhibit peripheral neurons, in a process known as spatial inhibition. In other words, neurons showing spatial inhibition should receive higher attention, and the simplest method of finding these neurons is to measure the linear separability between the target neuron and the other neurons. Based on these neuroscientific findings, the following energy function for each neuron was defined in this study (Formula (1) [19]):

e_t(w_t, b_t, y, x_i) = (y_t − t̂)² + (1/(M − 1)) Σ_{i=1}^{M−1} (y_o − x̂_i)²   (1)

Minimizing the above formula is equivalent to finding the linear separability between the target neuron t and the other neurons in the same channel. For simplicity, binary labels were adopted and a regularization term was added, giving the final energy function (Formula (2) [19]):

e_t = (1/(M − 1)) Σ_{i=1}^{M−1} (−1 − (w_t x_i + b_t))² + (1 − (w_t t + b_t))² + λ w_t²   (2)

where t̂ = w_t t + b_t and x̂_i = w_t x_i + b_t are linear transforms of t and x_i, respectively; t and x_i are the target neuron and the other neurons in a single channel of the input feature X ∈ R^(C×H×W); i is an index over the spatial dimension; M = H × W is the number of neurons on the channel; and w_t and b_t are the weight and bias parameters.
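Formula (2) has a closed-form minimum, from which the SimAM authors [19] derive the inverse energy 1/e_t* = (t − μ)² / (4(σ² + λ)) + 0.5 and weight each neuron by a sigmoid of it. A minimal NumPy sketch of this weighting follows (the study itself used PyTorch; the array shape and names here are illustrative):

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM weighting of a (C, H, W) feature map.

    Uses the closed-form minimum of the energy function in Formula (2):
    1/e_t* = (t - mu)^2 / (4 * (var + lam)) + 0.5, then scales each
    neuron by sigmoid(1/e_t*)."""
    n = x.shape[1] * x.shape[2] - 1              # M - 1 neurons per channel
    mu = x.mean(axis=(1, 2), keepdims=True)      # per-channel spatial mean
    d = (x - mu) ** 2
    var = d.sum(axis=(1, 2), keepdims=True) / n  # per-channel variance estimate
    e_inv = d / (4 * (var + lam)) + 0.5          # inverse energy per neuron
    return x * (1.0 / (1.0 + np.exp(-e_inv)))    # sigmoid attention weights
```

Because e_inv ≥ 0.5, every attention weight lies strictly between sigmoid(0.5) ≈ 0.62 and 1, so SimAM rescales rather than zeroes out features; λ controls how sharply high-variance neurons are emphasized.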
The early squeeze-and-excitation (SE) module can adaptively learn the importance of each channel through the introduction of squeeze and excitation operations, adjusting the channel contributions in the feature map through dynamic weighting [33]. The SE module uses a global pooling operation to learn a weight vector over the channel dimension, then multiplies this weight vector with the original features to obtain an enhanced feature representation. The channel attention (CA) module also introduces a channel-dimension attention mechanism; however, unlike the SE module, the CA module uses two parallel convolutions to generate attention and multiplies the attention with the original features to enhance the feature representation. The CA module can capture the relationships between channels more accurately [34]. The efficient channel attention (ECA) module calculates the attention weight for each position through applying a learnable convolution kernel to the channel dimension [35]. Using the ECA module can reduce the number of parameters and the computational complexity of a model, while introducing an attention mechanism to enhance the feature representation. The convolutional block attention module (CBAM) combines channel attention and spatial attention [36]. First, channel attention is introduced through the SE module; then, an attention map is generated in the spatial dimension using lightweight convolution operations; finally, the final feature representation is obtained through multiplying the channel attention map with the spatial attention map.

The SimAM module is relatively smaller than the other attention mechanisms, as it does not introduce additional learnable parameters. This is advantageous for inference when the model size is limited, and the computational process may be more concise and efficient. In addition, the relatively smaller number of model parameters makes it easier to adapt to new task data and reduces the risk of overfitting. The attention calculation of the SimAM module is also more transparent and interpretable, due to its parameterless design. However, it has certain limitations; for example, the expressive ability of the model is limited, as it cannot adapt to different tasks through learnable parameters. Adding more parameters to SimAM may improve the model performance for some complex tasks. In this study, in order to explore the performance of SimAM, a comparative experiment was conducted using the above attention mechanisms.

Improved MnasNet Architecture
Dual transfer learning transfers knowledge from a source task to a target task through one or more intermediate tasks and further applies the learned knowledge to the new task, resulting in a better match between the source and target domains [37]. The apple leaf disease data set includes apple scab, apple rust, mixed disease, and healthy leaf images. As there are some similarities between the characteristics of apple leaf and wheat diseases, through pre-training on the apple disease data set, we explored whether the model could enhance its ability to capture wheat disease symptoms. The MnasNet model was improved through adding attention mechanism modules and using the dual transfer learning method. The attention mechanism was integrated into the last three inverted residual networks of MnasNet without changing the backbone network structure. The modified MnasNet-SimAM structure is shown in Figure 4. The steps of combining agricultural dual transfer learning with the attention mechanism were as follows: First, the SimAM module was added to the MnasNet model, which was pre-trained on ImageNet. Then, dual transfer learning was performed on the apple disease data set to obtain new weights for MnasNet-SimAM. Finally, MnasNet-SimAM was trained on the wheat disease data set, and the model was evaluated using the test set. In addition, MnasNet was optimized using other attention mechanisms, and the obtained results were compared.
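The integration step — attaching attention only after the last three inverted-residual stages while leaving the backbone untouched — can be sketched framework-agnostically as follows (block and function names are illustrative, not the authors' code):

```python
def attach_attention(blocks, attention, n_last=3):
    """Return a copy of `blocks` (callables mapping feature maps to feature
    maps) in which the last `n_last` blocks are followed by an `attention`
    step, mirroring how SimAM was inserted after MnasNet's last three
    inverted-residual stages without altering the backbone structure."""
    wrapped = list(blocks)
    for i in range(max(0, len(blocks) - n_last), len(blocks)):
        original = blocks[i]
        # capture `original` per iteration, then compose attention after it
        wrapped[i] = (lambda f: (lambda x: attention(f(x))))(original)
    return wrapped
```

Because the earlier blocks are returned unchanged, ImageNet pre-trained weights for the backbone remain directly reusable, which is what makes this modification compatible with the transfer learning pipeline described above.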


Evaluation Indicators
The performance evaluation of image classification models often relies on five common indicators: accuracy, precision, recall, F1 score, and the confusion matrix heat map. Accuracy represents the percentage of correctly categorized examples out of the total number. Precision measures a model's ability to distinguish between positive and negative samples by calculating the ratio of correctly predicted positive samples. Recall quantifies the percentage of positive samples correctly predicted in a given sample set, with higher recall values indicating models that excel at identifying positive samples. The F1 score provides a balanced metric that considers both precision and recall. The confusion matrix heat map visualizes the prediction results of the classification model and facilitates analysis of its strengths and weaknesses. Detailed formulas for these indicators are provided in Equations (3)-(6):

Accuracy = (TP + TN)/(TP + TN + FP + FN)   (3)

Precision = TP/(TP + FP)   (4)

Recall = TP/(TP + FN)   (5)

F1 = 2 × Precision × Recall/(Precision + Recall)   (6)

where TP is the number of positive samples that are correctly predicted, FP is the number of negative samples that are incorrectly predicted, TN is the number of negative samples that are correctly predicted, and FN is the number of positive samples that are incorrectly predicted.
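These four formulas translate directly into code; a minimal sketch computing the per-class binary metrics from the four counts:

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 score from the confusion-matrix
    counts defined above (Equations (3)-(6)). Zero denominators fall back
    to 0.0 to keep the helper total."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1
```

For a multi-class task such as the seven wheat categories, these counts are taken one-vs-rest per class and the per-class F1 scores are reported, as in Table 6.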

Experimental Environment
The training process for the proposed model was run on the Linux operating system with the PyTorch 2.0.0 (GPU version) framework. The software environment included CUDA 11.8 and Python 3.8. The CPU used for training was a 12 vCPU Intel(R) Xeon(R) Silver 4214R CPU @ 2.40 GHz, and the GPU was an RTX 3080 Ti (12 GB). The batch size was set to 16, and the number of iterations was set to 50.

Application of Five Lightweight Models for Wheat Disease Identification
The SGD and Adam optimizers were used to train five lightweight models with different initial learning rates, and the accuracy and loss values of the models were recorded. The three optimal weight data sets were used to calculate the average accuracy and standard error. The results of the five models regarding the recognition of six different wheat diseases and healthy wheat were compared. The training weight parameter sizes are provided in Table 2. The training results for the five lightweight models are listed in Table 3, and the accuracy and loss values are shown in Figure 5.
Figure 4 shows the confusion matrices for the five models. The darker the color of the diagonal elements, the better the recognition performance of the model. EfficientNetV2 had the best identification performance in all seven categories, while the MnasNet model had similar performance regarding the identification of the different diseases.

As shown in Table 3, there were significant differences in the average test accuracy of the same model under different initial learning rates and different training strategies (p < 0.05). Therefore, an appropriate initial learning rate needs to be chosen, according to the specific task. The five lightweight models all showed good recognition performance, with accuracies exceeding 90% on the test set. Among them, the model with the highest accuracy on the test set was EfficientNetV2, which achieved 97%, followed by MnasNet and GhostNet, which reached 93.43% and 93.29%, respectively. These results show that the use of a lightweight convolutional neural network for wheat disease identification in the actual field environment is feasible. MnasNet had fewer model parameters and, thus, trained faster: in particular, its parameter size was 19.11 MB, approximately four times smaller than that of the EfficientNetV2 model. Therefore, MnasNet was selected for subsequent improvement and optimization.

Influence of SimAM Attention Mechanism on MnasNet Model
λ is an important parameter in the SimAM module that affects the identification accuracy of the model. In this study, λ values were set in the range of 10⁻³ to 10⁻⁷, and a comparative analysis was performed. The average and maximum accuracies obtained with the different λ values on the test set are shown in Table 4. Each λ value was tested three times, and the mean value and standard error were obtained. The optimal SimAM module was selected for comparison with other attention mechanisms (CA, ECA, SE, CBAM) on the wheat disease data set. As shown in Table 4, the model's recognition performance was better when λ = 10⁻⁵ or 10⁻⁶. This demonstrates that the model paid more attention to the detailed characteristics of the samples when the λ value was small, which were taken as an important basis for judging the disease. The model could more accurately distinguish different types of diseases through analyzing the local characteristics of the samples. When the λ value was 10⁻⁷, the accuracy of the model decreased, as a smaller λ value led to a narrower search range, such that only the local features were focused on while the global information was ignored.
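The per-λ summary (three repeated runs, reporting mean and standard error) can be reproduced with a short helper; the accuracy values in the usage note are placeholders, not the paper's measurements:

```python
import statistics

def mean_and_se(values):
    """Mean and standard error (sample standard deviation / sqrt(n)) over
    repeated runs, as used to summarize each λ setting in Table 4."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / n ** 0.5
```

For example, three hypothetical runs of 94.0%, 95.0%, and 96.0% would be reported as 95.0 ± 0.58.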
From Table 5, it can be seen that the accuracy of the model increased by 1.14%, 1.42%, 0.57%, 0.42%, and 1.71% after the addition of the CA, ECA, SE, CBAM, and SimAM attention modules, respectively. MnasNet-SimAM had the best accuracy, at 95.14%. In this study, Grad-CAM class activation diagrams (Figure 6) were used to visualize the degree of attention paid to lesion features by the last three inverted residual networks of the model after the addition of the attention mechanisms. It can be seen that MnasNet-SimAM was much more focused on the lesion sites than the original model. In addition, the influence of some complex and irrelevant background elements on the classification task was reduced. Particularly in the classification of wheat smut, MnasNet-SimAM not only expanded the focus on disease spots, but also attended to small disease spots that were overlooked by the original model. Table 6 provides the classification results of MnasNet-SimAM for common wheat diseases. With F1 scores used as the final evaluation index, the recognition performance of the model reached more than 90% for each category, and the F1 score for wheat stripe rust reached 98%.

Effect of Dual Transfer Learning Using Agricultural Disease Pre-Training Weights
Next, the influence of dual transfer learning on wheat disease recognition based on agricultural pre-training weights was explored. The weights with the highest accuracy obtained through MnasNet-SimAM dual transfer learning were saved, and the changes in accuracy and loss values, as well as the comparison with single transfer learning, were recorded, as shown in Figure 7. The results show that, when using dual transfer learning, a higher initial accuracy and lower losses were achieved compared to single transfer learning, with the validation set showing only one-tenth of the losses observed with single transfer learning. When the model was trained to convergence at 25 epochs, using dual transfer learning reduced the training time by 2.566 min compared to single transfer learning, which accounted for 16.81% of the convergence process. Figure 8 exhibits the accuracy of the top five training weights on the test set for dual transfer learning versus single transfer learning. Dual transfer learning improved the speed of convergence on the validation set, but decreased the accuracy on the test set by about 4%. When trained on the apple leaf disease data set, the model might have overfit the specific features of that data set, leading to a decrease in its generalization ability on the wheat disease data set.


Testing Model Robustness on a Public Data Set
The Wheat Fungi Diseases (WFD) data set (available at http://wfd.sysbio.ru/index.html, accessed on 6 August 2024) includes images of wheat stripe rust, leaf rust, stem rust, powdery mildew, and healthy leaves. For this experiment, 50 images for each category were randomly selected from these data, and the resulting 250 images were used to construct the WFD test set. The WFD test set was then used to evaluate the MnasNet-SimAM model (see Figure 9 and Table 7).
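The construction of the WFD test subset (50 randomly sampled images per class) can be sketched as follows. This is a minimal illustration: the class names match the data set description above, but the file-name pattern and pool sizes are hypothetical, standing in for whatever file listing the downloaded data actually provides.

```python
import random

# Hypothetical per-class pools of image file names (stand-ins for the files
# listed from the downloaded WFD data set).
wfd_classes = ["stripe_rust", "leaf_rust", "stem_rust", "powdery_mildew", "healthy"]
pools = {c: [f"{c}_{i:04d}.jpg" for i in range(1, 201)] for c in wfd_classes}

random.seed(42)  # fixed seed so the sampled subset is reproducible
test_set = {c: random.sample(pools[c], 50) for c in wfd_classes}  # 50 per class, no repeats

total = sum(len(v) for v in test_set.values())
print(total)  # 5 classes x 50 images = 250 images in the test set
```

Using `random.sample` (rather than repeated `random.choice`) guarantees the 50 images per class are drawn without replacement.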
The obtained results indicated that MnasNet-SimAM had good fitting ability and robustness, with an accuracy of 91.20% on the public data set. The F1 scores for the five categories were 90.74%, 92.47%, 92.47%, 92.31%, and 89.11%, respectively.


Discussion
Many studies have shown that the use of attention mechanisms can significantly improve model performance. As this study built the wheat disease recognition network based on transfer learning, the addition of an attention mechanism should not change the network structure. Therefore, the SimAM module was added to the last three layers of the inverted residual network of MnasNet. The resulting model could capture more global context information and better understand the input image: it could better capture the non-linear relationships between pixels, extract complex image features, and retain useful information while suppressing useless information [38][39][40]. The size of the original model increased by only 0.01 MB after the SimAM module was added, meaning that it remained efficient for training agricultural disease image classification models. Li [41] introduced a convolutional neural network model called Sim-ConvNeXt for maize disease classification; integrating the SimAM attention module improved accuracy by 1.5% over the original model, consistent with the results of this study.
λ is an important parameter used by the SimAM module to calculate the importance of neurons. It is a regularization term that adds a small constant to the denominator when calculating the variance, ensuring numerical stability and avoiding division by zero. Yang [18] explored the influence of the λ value on SimAM performance: the highest accuracy was achieved with a value of 10⁻⁵, while performance declined at 10⁻⁶. However, in this study, the average accuracy on the test set was similar for λ values of 10⁻⁵ and 10⁻⁶. Different from the results obtained by Yang, the maximum accuracy of the module was 95.14% when λ was equal to 10⁻⁶. The strength of the attention mechanism and the performance of the model may be affected differently by the λ value due to differences in the nature of the task. In this study, the attention mechanism almost failed when λ was close to zero: the model might ignore most of the information in the input data, losing its focus on important features and degrading performance. When λ was too large, the SimAM module focused too much on some local features and ignored others, causing the model to be overly sensitive and/or to overfit to noise and irrelevant information. The optimal value of λ can be expected to vary according to the degree of global or local attention required by the actual task; therefore, determining λ may require multiple tests and hyperparameter adjustments. In the training of models for agricultural disease classification tasks, λ needs to be reduced such that the model attends to small disease spots while ignoring the influence of the complex background.
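The role of λ is visible directly in the SimAM energy function. A minimal NumPy sketch of the parameter-free attention, following the published formulation (the tensor shape and values are illustrative only):

```python
import numpy as np

def simam(x, lam=1e-6):
    """Parameter-free SimAM attention for one feature map of shape (C, H, W)."""
    c, h, w = x.shape
    n = h * w - 1                               # number of "other" neurons per channel
    mu = x.mean(axis=(1, 2), keepdims=True)     # per-channel spatial mean
    d = (x - mu) ** 2                           # squared deviation of each neuron
    v = d.sum(axis=(1, 2), keepdims=True) / n   # per-channel variance estimate
    # lam regularizes the denominator, keeping it away from zero
    e_inv = d / (4.0 * (v + lam)) + 0.5         # inverse energy = neuron importance
    return x * (1.0 / (1.0 + np.exp(-e_inv)))   # sigmoid-weighted feature reweighting

x = np.random.rand(3, 8, 8).astype(np.float32)
y = simam(x, lam=1e-6)
assert y.shape == x.shape  # no learned weights; input shape is preserved
```

Because λ is the only free quantity, moving it between 10⁻⁵ and 10⁻⁶ changes how sharply deviating neurons are emphasized without adding any learned parameters, which is consistent with the negligible 0.01 MB size increase reported above.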
Dual transfer learning is an advanced transfer learning technique that allows for better adaptation to the target task through the simultaneous use of knowledge from multiple source tasks. Zhao [25] investigated the impact of dual transfer learning on ResNet50: the ImageNet data set was used for single transfer learning, while the AI Challenger 2018 data set was used for dual transfer learning, and dual transfer learning improved the model's training efficiency and accuracy. As mentioned in the study of Mukhlif [37], most previous transfer learning studies suffer from overfitting; hence, a 50% dropout layer was added to their experiments to minimize this problem. In this study, we also increased the dropout layers by 20%, but the accuracy on the test set still decreased by about 4%. This might be due to domain differences, feature mismatches, or overfitting, as reported in previous studies. Although dual transfer learning sped up training convergence, sacrificing accuracy is not cost-effective when computing power demands are high and samples are small. Therefore, researchers need to continue exploring ways to improve training speed while maintaining accuracy.
When tested on the WFD data set, the F1 score for wheat leaf rust was below 90%, because 4 out of 50 leaf rust images were misclassified as stripe rust. In the study of Jiang [42], the seven models tested also confused stripe rust and leaf rust in the wheat disease recognition task, with the most serious misjudgment rate (8%) observed for DenseNet-121. As the similarity between the disease spots of wheat stripe rust and leaf rust can lead to classification errors, distinguishing similar diseases should be considered in future work.
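The per-class F1 scores discussed here follow directly from the confusion matrix in Figure 9. A small sketch of the computation; the example matrix is illustrative rather than the paper's actual Figure 9 counts, apart from the 4 leaf rust images predicted as stripe rust:

```python
import numpy as np

def per_class_f1(cm):
    """Per-class F1 from a confusion matrix (rows = true class, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                     # correctly classified counts
    precision = tp / cm.sum(axis=0)      # column sums = predicted totals
    recall = tp / cm.sum(axis=1)         # row sums = true totals
    return 2 * precision * recall / (precision + recall)

# Illustrative 2-class excerpt: of 50 true leaf rust images, 4 are
# predicted as stripe rust, as reported for the WFD test set.
cm = np.array([[46,  4],    # true leaf rust
               [ 1, 49]])   # true stripe rust (hypothetical counts)
f1 = per_class_f1(cm)       # per-class F1 for the two illustrative classes
```

Four confusions out of 50 cap leaf rust recall at 92%, which is why even high precision cannot lift the F1 score much above 90% for that class.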

Conclusions
This study explored the possibility of using lightweight convolutional neural networks to recognize common wheat diseases based on transfer learning. A model named MnasNet-SimAM was constructed based on the SimAM attention mechanism. The proposed model achieved a highest accuracy of 95.14% and an average accuracy of 94.62% on the test set. Moreover, the parameter size of the improved model increased by only 0.01 MB, while the accuracy improved significantly (by 1.7%) over that of the original model. Furthermore, the use of pre-training weights for dual transfer learning sped up model convergence, reducing convergence time by 16.81%. On the Wheat Fungi Diseases data set, MnasNet-SimAM achieved an accuracy of 91.20%, indicating that it is robust and can be used for wheat disease identification on mobile devices.
The images were normalized and scaled to 224 × 224 pixels before training, in order to reduce model overfitting and accelerate model convergence.

Figure 6. Class activation diagram for MnasNet with different attention mechanisms.


Figure 7. (a,b) Comparison of accuracy and loss of dual transfer learning and single transfer learning on the validation set.

Figure 8. Scatter plots of the five highest accuracies with dual transfer learning and single transfer learning.


Figure 9. Confusion matrix of MnasNet-SimAM on the WFD test set.


Table 1. Sample distribution in wheat disease data sets.

Table 2. Training weight parameter size of the five lightweight models.

Table 3. Results for the five lightweight models at different learning rates (p < 0.05).
Note: Different lowercase letters indicate a significant difference at the 0.05 level.

Table 5. Comparison of different attention mechanisms with the original model on the test set (p < 0.05).

Table 6. Classification results of the final model.

Table 7. Classification results of MnasNet-SimAM on the WFD data set.
