A Shallow Pooled Weighted Feature Enhancement Network for Small-Sized Pine Wilt Diseased Tree Detection

Abstract: Pine wilt disease poses a serious threat to the ecological environment of national forests. Combining object detection algorithms with Unmanned Aerial Vehicles (UAV) to detect pine wilt diseased trees (PWDT) is a significant step in preventing the spread of pine wilt disease. Existing detection algorithms have two weaknesses for this task: their shallow feature layers do not fully extract the features of small-sized diseased trees, and a single image contains only a few small-sized diseased trees. To address both issues, a Shallow Pooled Weighted Feature Enhancement Network (SPW-FEN) based on Small Target Expansion (STE) is proposed for detecting PWDT. First, a Pooled Weighted Channel Attention (PWCA) module is introduced into the shallow feature layers, which are rich in small-target information, to enhance the network's ability to express the characteristics of the two shallow feature maps. Additionally, an STE data enhancement method is introduced for small-sized targets, which effectively increases the number of small-sized diseased tree samples in a single image. Experimental results on the PWDT dataset indicate that the proposed algorithm achieved an average precision and recall of 79.1% and 86.9%, respectively. The recall and average precision are 3.6 and 3.8 percentage points higher, respectively, than those of the existing state-of-the-art method Faster-RCNN, and 6.4 and 5.5 percentage points higher than those of the recently proposed YOLOv6 method.


Introduction
Pine wilt disease (PWD), known as the pine killer, poses a significant threat to pine forests globally [1]. The disease is caused by the pine wood nematode, which infiltrates and reproduces within the pine tree, ultimately killing it [2].
At present, effective prevention and control measures involve manually cutting down infected pine trees affected by pine wilt disease, followed by centralized burning of the felled diseased trees. Additionally, a special medicine is sprayed on the stumps of the diseased trees and sealed to prevent secondary transmission. An important prerequisite for the above-mentioned control measures is the identification and localization of infected pine trees, which is achieved through the detection of diseased trees. Traditional monitoring of pine tree blights mainly relies on manual detection. Staff observe the appearance and surface morphological characteristics of trees, judging based on the color change characteristics of infected pine trees, such as yellowish-brown and reddish-brown [3]. This method has the disadvantages of poor timeliness and large recognition errors, making it difficult to effectively complete the task of epidemic monitoring.
Compared with manual detection, aerial remote sensing image monitoring has the advantages of wide coverage, low labor intensity, and high efficiency. However, imple- […]
The proposed network is evaluated on the pine wilt diseased trees dataset containing UAV images of PWDT. Experimental results show that SPW-FEN outperforms several state-of-the-art methods in terms of detection accuracy, especially for small PWDT. In addition, a comprehensive analysis is performed to study the effectiveness of the proposed pooling and weighting schemes, as well as the contribution of shallow and deep features.
The remainder of this paper is organized as follows: Section 2 describes the pine wilt diseased tree dataset produced in this work, the experimental environment, the design of the experimental parameters, and the proposed SPW-FEN method in detail. Section 3 summarizes and analyzes the results of the comparative and ablation experiments. Finally, Section 4 concludes the paper and discusses future directions.

UAV Pine Forest Image Acquisition
UAVs equipped with high-resolution cameras were used to photograph pine forests in Yiling District and Yidu City of Yichang City along fixed routes. The UAV model was the MD-25. Four T-MOTOR motors with T-MOTOR flame high-voltage electronic governors provide rotor power, and one T-MOTOR motor paired with a T-MOTOR high-voltage electronic governor provides fixed-wing power. The power device is an electric brushless engine with an electronic speed control system, and the control device is a micro servo steering gear. The overall appearance of the MD-25 UAV is shown in Figure 1, and the main parameters of the MD-25 drone casing are shown in Table 1 below. The cameras used Zeiss 35 mm fixed-focus lenses with 36 million pixels, as shown in Figure 2.
The routes were set as follows: the relative flight height was 350 m; the average ground resolution was 4.89 cm; the flight height difference between adjacent photos on the same route was ≤30 m; the difference between the actual flight height and the design flight height was ≤50 m; the heading overlap was 70%; the lateral overlap was 35%; and a single flight covered 70 km. The sun elevation angle at the time of photography was greater than 30-40°.
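The route parameters above determine how far apart consecutive photos and neighboring flight lines are. The following is a minimal sketch of that arithmetic, not the authors' flight-planning procedure. It assumes that the 4.89 cm ground resolution is a per-pixel value, that each image is 7952 × 5304 pixels (the size reported for the dataset images), and that the long image side lies across the flight direction; the function names are hypothetical.

```python
# Values taken from the text.
GSD_M = 0.0489            # average ground resolution, metres per pixel (assumed per-pixel)
IMAGE_PX = (7952, 5304)   # image size in pixels: (long side, short side)
HEADING_OVERLAP = 0.70    # overlap between consecutive photos on one route
LATERAL_OVERLAP = 0.35    # overlap between neighbouring routes


def footprint_m(image_px=IMAGE_PX, gsd_m=GSD_M):
    """Ground footprint of one photo in metres: (across track, along track).

    Assumption: the long image side is oriented across the flight direction.
    """
    across = image_px[0] * gsd_m
    along = image_px[1] * gsd_m
    return across, along


def spacing_m(heading_overlap=HEADING_OVERLAP, lateral_overlap=LATERAL_OVERLAP):
    """Distance between consecutive exposures and between adjacent routes."""
    across, along = footprint_m()
    photo_spacing = along * (1.0 - heading_overlap)
    route_spacing = across * (1.0 - lateral_overlap)
    return photo_spacing, route_spacing


across, along = footprint_m()
photo_spacing, route_spacing = spacing_m()
print(f"footprint: {across:.1f} m x {along:.1f} m")
print(f"photo spacing: {photo_spacing:.1f} m, route spacing: {route_spacing:.1f} m")
```

Under these assumptions, each photo covers roughly 389 m × 259 m on the ground, with an exposure about every 78 m and routes about 253 m apart.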

Pine Wilt Diseased Tree Dataset
The pine forest images acquired with the UAV, on-board camera, and routes described in the previous section were used as the data source of the dataset. Each drone image of 7952 × 5304 pixels was cropped into images of 1000 × 1000 pixels, as shown in Figure 3. The LabelImg tool was then used to label the pine wilt diseased trees in the cropped 1000 × 1000-pixel images; red non-diseased trees, yellow bare land, and red roofs, which are easily mistaken for diseased trees, were marked as negative sample classes, as shown in Figure 4. A total of 3271 positive samples of diseased pine trees and 1000 negative samples of easily confused objects were marked, and the labeled image data were then divided into training, validation, and test datasets.
Following the size classification of the COCO dataset, a diseased tree target with a pixel area smaller than 32 × 32 pixels is defined as a small target, a target with an area between 32 × 32 and 96 × 96 pixels as a medium target, and a target with an area larger than 96 × 96 pixels as a large target [15]. See Table 2 for more information on the dataset. Table 2 shows that in our PWDT dataset, medium-sized diseased trees accounted for the largest proportion, while small-sized and large-sized diseased trees were relatively few. The number of small target diseased trees in the training set was 727, accounting for 13.2% of the total target number.
The number of small target diseased trees in the validation dataset and the test dataset was also relatively small. Figure 5 shows in detail the proportions of small, medium, and large target diseased trees in the training set and the validation set. In the training set, small target diseased trees accounted for 13%, while medium and large target diseased trees accounted for 77% and 10%, respectively. In the validation set, small target diseased trees accounted for 11%, while medium and large target diseased trees accounted for 80% and 9%, respectively. In the pine wilt diseased tree dataset produced in this paper, medium target diseased trees therefore make up the vast majority, and small and large target diseased trees make up only a small proportion, resulting in an uneven size distribution of diseased trees within the class.
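The COCO-style size classes used in this section can be expressed as a small helper, shown here as a minimal sketch. The function names are hypothetical, and the handling of boxes that fall exactly on the 32 × 32 or 96 × 96 boundary is an assumption, since the text does not specify it.

```python
SMALL_MAX_AREA = 32 * 32    # below this area a target counts as small
MEDIUM_MAX_AREA = 96 * 96   # above this area a target counts as large


def size_class(width, height):
    """Classify a bounding box by pixel area into small / medium / large.

    Assumption: areas exactly on a boundary are treated as medium.
    """
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area <= MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def class_shares(boxes):
    """Fraction of boxes in each size class; boxes are (width, height) pairs."""
    counts = {"small": 0, "medium": 0, "large": 0}
    for width, height in boxes:
        counts[size_class(width, height)] += 1
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


# Toy example with made-up boxes, not data from the PWDT dataset.
print(class_shares([(20, 20), (50, 50), (50, 50), (120, 120)]))
```

Running the same tally over the real annotations would be one way to obtain proportions like those reported for the training and validation sets.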
Electronics 2023, 11, x FOR PEER REVIEW 6 of 18

Method
In this section, we first describe the Shallow Pooled Weighted Feature Enhancement Network (SPW-FEN) in Section 2.3.1, and then present the Pooled Weighted Channel Attention (PWCA) module in Section 2.3.2. Finally, Section 2.3.3 illustrates the proposed STE data enhancement method.

Shallow Pooled Weighted Feature Enhancement Network (SPW-FEN)
The proposed SPW-FEN uses a ResNet50 [16] feature extraction network to generate feature maps C1, C2, C3, C4, and C5. At the neck of the network, a feature pyramid structure fuses the high-resolution shallow feature maps with the low-resolution deep feature maps, which are rich in semantic information. The shallow feature maps carry more edge and position information about small-sized targets, which is conducive to the detection of small-sized PWDT. The deep feature maps carry more semantic information, but because a small target occupies only a few pixels in the image, its feature information is easily lost in the deep feature maps after multiple convolution layers.
The RetinaNet [17] algorithm predicts small-sized targets from the prediction feature map P3, which combines the feature map C4 with the shallow feature map C3. The feature maps C3 and C4 have already lost some details of some small-sized targets.
The proposed algorithm adds the prediction feature map P2, which combines the shallow feature maps C3 and C2. Small-sized diseased tree targets are divided into small-sized targets and minimum-sized targets, which are predicted separately by the prediction feature maps P3 and P2, respectively (shunt prediction output). At the same time, we introduce the PWCA module after the shallow feature maps C2 and C3 to enhance the feature response of the shallow feature layers to small-sized targets.
In addition, statistics of the scale distribution of diseased tree targets in the PWDT dataset show that minimum-scale diseased trees, with target scales of less than 24 × 24 pixels, accounted for 4.4% of the targets (295 trees), while diseased trees with a target scale greater than 256 × 256 pixels accounted for 0%. Therefore, the proposed network model adds the shallow prediction feature map P2, deletes the deep feature maps P6 and P7, and redesigns the anchor size of each layer. According to the distribution of target scales in the dataset, the anchor sizes on the prediction feature maps P2 to P5 are set to 16 × 16, 36 × 36, 78 × 78, and 140 × 140, respectively. Figure 6 shows the structure of the proposed SPW-FEN network.
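To make the level layout concrete, the sketch below lists the four prediction levels with their stated anchor sizes and picks, for a given box, the level whose anchor size is closest in scale. This is an illustration only. The strides are the usual ResNet/FPN values and are an assumption, since the text does not state them, and the closest-scale rule is a simplified stand-in for the IoU-based anchor matching a detector would actually use.

```python
import math

# Anchor side length per prediction level, as stated in the text.
ANCHOR_SIZES = {"P2": 16, "P3": 36, "P4": 78, "P5": 140}

# Assumption: standard ResNet/FPN strides; not given in the text.
STRIDES = {"P2": 4, "P3": 8, "P4": 16, "P5": 32}


def feature_map_size(image_size, stride):
    """Side length of the feature map for a square input image."""
    return math.ceil(image_size / stride)


def closest_level(box_w, box_h):
    """Level whose anchor size is closest to the box scale in log space.

    Simplified illustration; real assignment is done by IoU with anchors.
    """
    scale = math.sqrt(box_w * box_h)
    return min(ANCHOR_SIZES,
               key=lambda lvl: abs(math.log(scale / ANCHOR_SIZES[lvl])))


for level, anchor in ANCHOR_SIZES.items():
    side = feature_map_size(1000, STRIDES[level])
    print(f"{level}: stride {STRIDES[level]}, map {side} x {side}, anchor {anchor}")

print(closest_level(20, 20))  # a minimum-sized tree (below 24 x 24 pixels)
print(closest_level(30, 30))  # a small-sized tree
```

Under these assumptions, a tree smaller than 24 × 24 pixels falls to P2 and a slightly larger small tree falls to P3, which mirrors the split between minimum-sized and small-sized targets described above.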

Pooled Weighted Channel Attention (PWCA) Module
Many studies have shown that channel attention modules help the network extract features of the target area and can effectively mitigate the effect of background information on the feature extraction of small-sized targets [18][19][20][21]. To enhance the ability of the shallow feature layers to extract features of small-scale diseased trees, we propose a PWCA module, which is added after the 1 × 1 convolution operation of the shallow feature maps C2 and C3. The PWCA module increases the attention weight that the network model assigns to the diseased tree area, suppresses the feature response of the background area, improves the model's ability to distinguish small-target feature channels from background channels, and thereby improves the network's detection performance for small-scale diseased trees. The structure of the PWCA module is shown in Figure 7.
First, global average pooling (GAP) and global maximum pooling (GMP) are performed on the feature map F with dimensions H × W × C output from the backbone network to obtain two one-dimensional feature vectors of size 1 × 1 × C with different spatial context information. The two 1 × 1 × C vectors are then each convolved to generate two sets of channel weights, where the kernel size K is adaptively determined by a mapping of the channel dimension C, as shown in Formula (1), where γ = 2, b = 1, and K is taken as the nearest odd number. The two sets of channel weights, K_A and K_M, are then added with weighting to obtain the pooled weighted attention channel weights X, as shown in Formula (2), where λ and β are two hyperparameters.
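Formulas (1) and (2) are referenced in this description. A plausible form is sketched below as a reconstruction, not the authors' exact notation. It assumes that the kernel-size mapping follows the ECA-Net form, which matches the stated γ = 2, b = 1, and odd-number rule, and that Formula (2) is a plain weighted sum. K_A and K_M denote the channel weights obtained from the GAP and GMP branches, and Conv1D_K is a one-dimensional convolution with kernel size K.

```latex
\begin{align}
K &= \left| \frac{\log_2 C}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}},
  \qquad \gamma = 2,\; b = 1 \tag{1} \\
K_A &= \mathrm{Conv1D}_K\big(\mathrm{GAP}(F)\big), \qquad
K_M = \mathrm{Conv1D}_K\big(\mathrm{GMP}(F)\big) \notag \\
X &= \lambda K_A + \beta K_M \tag{2}
\end{align}
```

Under this reading, a feature map with C = 256 channels gives log2(256)/2 + 1/2 = 4.5, which is taken to the odd value K = 5.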
The weights X are then normalized to the range 0-1 by the sigmoid activation function to obtain the attention weights. The attention weights are multiplied channel by channel with the original feature map F to obtain the attention feature map F′, as shown in Formula (3).
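The data flow of the PWCA module (pooling, channel-wise convolution, weighted fusion, sigmoid, rescaling) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The kernel-size rule is assumed to follow the ECA-Net form, the convolution kernels are fixed placeholders where a trained network would use learned weights, and the values of λ and β (`lam`, `beta`) are arbitrary because the text does not report them.

```python
import math


def adaptive_kernel_size(channels, gamma=2, b=1):
    """Odd kernel size derived from the channel count (ECA-style rule, assumed)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def conv1d_same(vec, kernel):
    """1D cross-correlation along the channel axis with zero padding."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(vec) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(vec))]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pwca(feature, lam=0.5, beta=0.5, kernel_a=None, kernel_m=None):
    """Apply pooled weighted channel attention to a feature map.

    feature: list of C channels, each an H x W nested list of floats.
    Returns (attended feature map, per-channel attention weights).
    """
    channels = len(feature)
    k = adaptive_kernel_size(channels)
    # Placeholder kernels (assumption); these would be learned in practice.
    kernel_a = kernel_a or [1.0 / k] * k
    kernel_m = kernel_m or [1.0 / k] * k

    flat = [[v for row in ch for v in row] for ch in feature]
    gap = [sum(vals) / len(vals) for vals in flat]   # global average pooling
    gmp = [max(vals) for vals in flat]               # global maximum pooling

    w_avg = conv1d_same(gap, kernel_a)
    w_max = conv1d_same(gmp, kernel_m)
    fused = [lam * a + beta * m for a, m in zip(w_avg, w_max)]
    attn = [sigmoid(v) for v in fused]               # weights in (0, 1)

    out = [[[v * attn[c] for v in row] for row in feature[c]]
           for c in range(channels)]
    return out, attn


# Toy 4-channel, 2 x 2 feature map: channel c is filled with the value c.
toy = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
attended, weights = pwca(toy)
print([round(w, 3) for w in weights])
```

In a PyTorch model the same steps would typically be written with adaptive pooling layers and a `Conv1d` layer; the plain-Python form is used here only to keep the sketch dependency-free.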

Small Target Expansion (STE) Data Enhancement Method
Both the total number of small-sized diseased trees in our PWDT dataset and the number of small-sized diseased trees in a single image are small, and such a small amount of data is not enough for the feature extraction network to learn their features. Therefore, in this paper, we propose an STE data enhancement method for small-sized targets based on double fixed scaling.
First, four pictures numbered 1, 2, 3, and 4 are randomly selected from the pine wilt diseased tree dataset, and the length and width of each picture are scaled by the same ratio, chosen at random from 0.4, 0.5, and 0.6, to obtain the scaled pictures Img1, Img2, Img3, and Img4, as shown in Formula (4). Next, a new rectangular box is created whose length and width are twice those of a picture in the pine wilt diseased tree dataset. Taking the center of the box as the dividing point, the box is divided into four sub-areas of the same size, r1, r2, r3, and r4. The pictures Img1, Img2, Img3, and Img4 are then randomly filled into the sub-regions r1, r2, r3, and r4, and the length and width of the filled box are halved; the result is an expanded sample image. Finally, the pictures that have been scaled and spliced are removed from the pine wilt tree dataset, and the above steps are repeated on the remaining pictures. A large number of expanded sample images are obtained in this way and stored in the PWDT dataset. The specific operation flow is shown in Figure 8.
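The effect of STE on the annotations can be illustrated by following the bounding boxes through the three steps (scale, place in a sub-region, halve). The sketch below handles only box geometry, not pixels, and makes two assumptions that the text leaves open: each picture draws its own ratio independently, and each scaled picture is placed at the top-left corner of its sub-region. The function name `ste_boxes` is hypothetical.

```python
import random

SCALES = (0.4, 0.5, 0.6)  # scaling ratios stated in the text


def ste_boxes(samples, width, height, rng=random):
    """Map the boxes of four pictures into one expanded sample image.

    samples: list of four box lists; each box is (x1, y1, x2, y2) in pixels
             of a width x height source picture.
    Returns the boxes in the final width x height expanded image.
    """
    if len(samples) != 4:
        raise ValueError("STE combines exactly four pictures")

    # Top-left corners of sub-regions r1..r4 on the 2*width x 2*height canvas.
    origins = [(0, 0), (width, 0), (0, height), (width, height)]
    rng.shuffle(origins)  # pictures are filled into the sub-regions at random

    merged = []
    for boxes, (ox, oy) in zip(samples, origins):
        s = rng.choice(SCALES)  # assumption: one independent ratio per picture
        for x1, y1, x2, y2 in boxes:
            # Step 1: scale by s. Step 2: shift into the sub-region.
            # Step 3: halve the whole canvas.
            merged.append(((x1 * s + ox) / 2, (y1 * s + oy) / 2,
                           (x2 * s + ox) / 2, (y2 * s + oy) / 2))
    return merged


# Toy example: one 100 x 100 pixel tree in each of four 1000 x 1000 pictures.
rng = random.Random(0)
result = ste_boxes([[(0, 0, 100, 100)]] * 4, 1000, 1000, rng)
for box in result:
    print([round(v, 1) for v in box])
```

Each 100 × 100 pixel tree ends up 20 to 30 pixels wide, so a medium-sized target becomes a small one and a single expanded image holds the trees of four source pictures.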

Results
In this section, we first introduce our experimental environment. Then, we introduce the evaluation metric of our experimental results, and then compare our algorithm with several current mainstream object detection algorithms on our dataset. Finally, an ablation experiment is designed for the proposed modules.

Experimental Environment and Parameter Setting
The detection algorithm in this paper is implemented in the PyTorch framework and runs on an NVIDIA GeForce RTX 3090. The network model was trained on our self-made PWDT dataset for a total of 120 epochs, with the learning rate adjusted at the 80th and 110th epochs. The initial learning rate was set to 0.0001, and the batch size was set to four. The experimental environment and experimental parameter settings are shown in Table 3.
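The schedule described above (initial rate 0.0001, adjustments at epochs 80 and 110) can be sketched as a step schedule. The decay factor `gamma=0.1` is an assumption, since the text only says that the rate is adjusted, and so is the choice that the new rate applies from the milestone epoch onward with 1-based epoch numbering.

```python
def learning_rate(epoch, base_lr=1e-4, milestones=(80, 110), gamma=0.1):
    """Learning rate at a 1-based epoch under a step schedule.

    gamma is an assumed decay factor; it is not reported in the text.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


for epoch in (1, 79, 80, 110, 120):
    print(epoch, learning_rate(epoch))
```

In PyTorch, a schedule of this shape is usually expressed with `torch.optim.lr_scheduler.MultiStepLR`, which takes the same milestones and decay factor.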

Evaluation Metric
Target detection algorithm evaluation indicators are mainly divided into two categories: classification indicators and localization indicators.
Classification indicators: These mainly measure the ability of the algorithm to classify target categories. Commonly used indicators are Accuracy, Precision, Recall, and F1-score. Accuracy measures the overall classification performance of the algorithm, while precision and recall focus on the classification of a single target category. F1-score combines precision and recall and can therefore evaluate the classification ability of the algorithm more comprehensively. It is defined as the harmonic mean of precision and recall:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R}$$

Localization indicators: These mainly measure the ability of the algorithm to locate targets. Commonly used indicators are the Intersection over Union (IoU), average precision (AP), and mean average precision (mAP). The IoU measures the accuracy of the algorithm in locating a target. AP is also one of the indicators used to evaluate image retrieval results, where the prediction results for a set of query images are averaged. AP is calculated by sorting the results and computing the area under the precision-recall curve. For each query image, comparing the predicted results with the ground truth labels yields a ranked list in which each result has a relevance score. These scores are sorted from high to low, and the precision is calculated at each recall level. Finally, AP is obtained by averaging the precision over all recall levels. mAP considers the classification and localization capabilities of the algorithm for all target categories. AP is calculated as follows:

$$AP = \int_0^1 P(R)\, dR$$

where TP represents the number of samples with actual positive labels that are correctly classified as positive, FP indicates the number of samples with actual negative labels that are incorrectly classified as positive, and FN denotes the number of samples with actual positive labels that are incorrectly classified as negative. P represents precision, and R represents recall.
In practical scenarios, object detection algorithms are evaluated based on both classification and localization indicators to comprehensively assess their performance. However, for specific applications, different indicators may need to be selected based on the specific conditions and requirements.
In the case of detecting pine wilt diseased trees, the priority is to minimize missed detections to prevent the spread of the disease. Hence, this study uses recall rate and average precision as performance indicators, where the recall rate measures the proportion of all annotated positives that are correctly detected. The model's recall rate is expected to be as high as possible while maintaining a high overall AP.
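The two indicators used in this study can be computed from matched detections as in the following sketch. It assumes that each detection has already been marked as a true or false positive by IoU matching against the annotations, a step not shown here, and it uses the all-point area under the precision-recall curve without interpolation, which may differ from the exact AP variant used for the reported results.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_precision(scored_hits, num_positives):
    """Area under the precision-recall curve for one class.

    scored_hits: list of (confidence, is_true_positive), one per detection.
    num_positives: number of annotated positives (must be > 0).
    """
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    prev_recall = 0.0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recall = tp / num_positives
        precision = tp / (tp + fp)
        # Only true positives raise recall, so only they add area.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Toy example with made-up detections, not results from the paper.
print(precision_recall_f1(tp=8, fp=2, fn=2))
print(average_precision([(0.9, True), (0.8, False), (0.7, True)], num_positives=2))
```

A missed tree lowers recall but leaves precision untouched, which is why recall is the indicator prioritized here.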

Comparative Experimental Results
To verify the performance of the proposed network model, we compared seven current mainstream object detection algorithms with the proposed detection algorithm on the PWDT dataset; the experimental results are shown in Table 4 below. The results show that, compared with the classic network Faster-RCNN [22] and the mainstream networks SSD [23], YOLOv3 [24], ATSS [25], YOLOF [26], FoveaBox [27], and YOLOv6 [28], the proposed detection algorithm achieves the best detection results, with a recall and AP of 86.9 and 79.1, respectively. The visual comparison of the detection results of each network on the test set is shown in Figure 9.
The comparison on the two test samples in Figure 9 shows that the proposed SPW-FEN algorithm recognizes small-sized pine wilt diseased trees best. YOLOv3, Faster-RCNN, and ATSS all have obvious missed detections, whereas the proposed method greatly alleviates the missed detection of small-sized diseased trees.

Ablation Study
To further analyze the impact of the proposed channel attention module and data enhancement module on network performance, we used RetinaNet as the base network and examined the effectiveness of the designed method in three aspects: small-sized diseased tree shunt prediction output, anchor box recalibration, and the PWCA module. The specific experimental analysis data are shown below.

Small-Scale Diseased Tree Shunt Prediction Output
In order to verify the effectiveness of the small-sized disease tree shunt prediction output proposed in this paper, a comparative experiment was designed to analyze the results of only the P3 layer predicting output for small-scale diseased trees and using both the P2 layer and P3 layer to predict small-scale diseased tree output. The detection effect and the specific experimental data are shown in Table 5 below.  It can be seen from Table 5 that when only the P3 layer prediction feature map is used to predict the small-scale diseased tree output, the recall rate is 82.1, and the precision is only 77.1. When the P2 layer prediction feature map and the P3 layer prediction feature map are used at the same time when the scale disease tree is used for prediction output, the recall rate is increased by 1.2 percentage points, and the precision is increased by 0.9 percentage points. The recall rate and precision reach between 83.2 and 78.0, respectively. It can be seen that it is necessary to split the diseased tree for prediction output.

Recalibration of Anchor Boxes
According to the distribution of target scales in the dataset, the anchor sizes on the prediction feature maps P2 to P5 are set to 16 × 16, 36 × 36, 78 × 78, and 140 × 140, respectively, the three anchor aspect ratios are set to {1.0, 2.0, 0.5}, and the ratios of the anchor area are set to $\{2^{0}, 2^{1/3}, 2^{2/3}\}$. Combining the size, aspect ratio, and area ratio of the anchor box, nine kinds of anchors are redesigned at each pixel of each prediction feature layer. The comparison between the anchor box sizes in the original algorithm and those after recalibration is shown in Table 6 below. The data in Table 6 show that adjusting the anchor size noticeably changes the detection performance on diseased trees. The original RetinaNet [17] network has no P2 layer, and its detection precision and recall on diseased trees are low. When the P2 layer is added and its anchor size is adjusted, the precision and recall improve significantly. When the anchor of the P2 layer is set to 16 × 16, the anchor of the P3 layer to 36 × 36, the anchor of the P4 layer to 78 × 78, and the anchor of the P5 layer to 140 × 140, the recall and precision reach 85.4 and 78.4, respectively. Compared with the unadjusted anchor sizes, the recall and precision increase by 3 percentage points and 1.3 percentage points, respectively.
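The anchor construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `anchors_for_level` is hypothetical, the aspect ratio is assumed to mean height divided by width, and the multipliers $2^{0}$, $2^{1/3}$, $2^{2/3}$ are assumed to act on the anchor side length, as is common in RetinaNet implementations.

```python
import math

# Base anchor sizes per pyramid level, as stated in the text (pixels).
BASE_SIZES = {"P2": 16, "P3": 36, "P4": 78, "P5": 140}
# Aspect ratios from the text; treated here as height / width (assumption).
ASPECT_RATIOS = (1.0, 2.0, 0.5)
# Multipliers 2^0, 2^(1/3), 2^(2/3); applied to the side length (assumption).
SCALES = (2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3))


def anchors_for_level(base_size):
    """Return nine (width, height) anchor shapes for one pyramid level."""
    shapes = []
    for scale in SCALES:
        side = base_size * scale
        area = side * side
        for ratio in ASPECT_RATIOS:
            # Keep the area fixed while changing the aspect ratio.
            width = math.sqrt(area / ratio)
            height = width * ratio
            shapes.append((width, height))
    return shapes


for level, size in BASE_SIZES.items():
    rounded = [(round(w, 1), round(h, 1)) for w, h in anchors_for_level(size)]
    print(level, rounded)
```

Three scales times three aspect ratios give the nine anchors per location mentioned in the text; each shape would then be centered on every pixel of the corresponding feature map.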

Pooled Weighted Channel Attention (PWCA) Module
To validate the effectiveness of the proposed Pooled Weighted Channel Attention (PWCA) module, this section investigates the influence of global average pooling and global maximum pooling on the detection of pine wilt diseased trees by adjusting the weighting parameters (λ, β). The impact of dimensionality reduction (the MLP network) on the performance of the attention mechanism is also analyzed experimentally. The specific experimental data are presented in Table 7. When λ = 0 and β = 1, the attention mechanism is equivalent to ECA [29]. When one-dimensional convolution is used instead of the dimensionality compression operation of the MLP network, the accuracy improves by 0.7% compared with that of the baseline. When the dimensionality compression operation of the MLP network is employed, the attention mechanism becomes CBAM [30]. Substituting the MLP network in the CBAM attention mechanism with one-dimensional convolution leads to a 0.5% increase in accuracy compared with the baseline. Adjusting λ and β and analyzing the resulting data shows that the attention mechanism achieves the highest recognition accuracy for diseased trees when λ = 1.5 and β = 0.5. PWCA therefore achieves the highest experimental accuracy and the best detection results for diseased trees. The results on the pine wilt diseased tree dataset also indicate that the MLP network has a detrimental effect on the channel attention mechanism: capturing dependencies among all channels is inefficient and unnecessary.
Moreover, because pine wilt diseased tree images contain few targets per image, the PWCA attention mechanism with an increased weight on global maximum pooling performs better at diseased tree recognition.
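To make the role of the two weights concrete, the following sketch computes channel attention from a weighted sum of global maximum pooling and global average pooling, followed by a one-dimensional convolution across channels and a sigmoid. Several points are assumptions rather than details given in this section: λ is taken to weight maximum pooling and β to weight average pooling (consistent with λ = 0, β = 1 reducing to ECA, which uses average pooling), the two descriptors are summed before a single shared convolution, and the learned kernel is replaced by fixed values. All function names are hypothetical.

```python
import math


def global_pools(feature_map):
    """Return per-channel (averages, maxima) of a C x H x W nested list."""
    avgs, maxs = [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        avgs.append(sum(values) / len(values))
        maxs.append(max(values))
    return avgs, maxs


def conv1d_same(signal, kernel):
    """1-D cross-correlation along the channel axis with zero padding.

    The kernel length is expected to be odd so the output keeps its length.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def pwca_weights(feature_map, lam, beta, kernel):
    """Channel weights from lam * max-pool + beta * avg-pool (assumed form)."""
    avgs, maxs = global_pools(feature_map)
    pooled = [lam * m + beta * a for a, m in zip(avgs, maxs)]
    mixed = conv1d_same(pooled, kernel)
    return [1.0 / (1.0 + math.exp(-x)) for x in mixed]


def apply_attention(feature_map, weights):
    """Rescale every channel by its attention weight."""
    return [[[v * w for v in row] for row in channel]
            for channel, w in zip(feature_map, weights)]


# Toy input: two 2 x 2 channels, with a fixed kernel standing in for
# learned convolution weights.
features = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 3.0], [1.0, 3.0]]]
weights = pwca_weights(features, lam=1.5, beta=0.5, kernel=[0.0, 1.0, 0.0])
enhanced = apply_attention(features, weights)
```

Avoiding an MLP bottleneck keeps a direct correspondence between each channel and its weight, which is the property the experiments above credit for the accuracy gain.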

Comprehensive Experimental Analysis
To further analyze the impact of the proposed channel attention module and data enhancement module on network performance, we designed ablation experiments that add each module to the RetinaNet algorithm. The results are shown in Table 8 below. Table 8 shows that after anchor resetting and shunt prediction output are applied to the RetinaNet network, the recall increases by 3.0, from 82.4 to 85.4, and the AP increases by 1.3, from 77.1 to 78.4. After PWCA is added to the shallow feature maps of the RetinaNet algorithm, the recall increases by 1.9 and the AP by 1.1. After the STE data enhancement method is adopted, the recall and AP improve by 3.1 and 0.6, respectively. When the PWCA module and STE data enhancement are used together in the RetinaNet network, the recall improves by 4.5 and the AP by 2.0.
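As a quick consistency check on these figures, the snippet below adds the combined PWCA and STE gains to the RetinaNet baseline quoted in the same paragraph. It assumes the gains are measured against that baseline (recall 82.4, AP 77.1); under this reading the totals match the recall and average precision reported for the full method in the Conclusions.

```python
# Figures quoted in the text (percent); the baseline reading is an assumption.
baseline = {"recall": 82.4, "ap": 77.1}
gain_pwca_and_ste = {"recall": 4.5, "ap": 2.0}

# Round to one decimal place to avoid floating-point noise.
final = {key: round(baseline[key] + gain_pwca_and_ste[key], 1) for key in baseline}
print(final)  # {'recall': 86.9, 'ap': 79.1}
```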
As shown in Figure 10a, the picture is overexposed and appears brighter than it would under natural light. As shown in Figure 10b-d, the pictures produced by the original mosaic enhancement have lost the red and yellow-brown color characteristics of PWDT. Experiments show that these low-quality samples are mainly generated by the HSV transformation applied to the image during sample enhancement [31]. To solve this problem, the STE data enhancement proposed in this paper removes the HSV transformation, significantly improving the quality of the enhanced samples. In Figure 10, the red circles mark the small-sized diseased trees obtained with the STE data enhancement method. While preserving the quality of the original sample, the fixed scaling factor significantly increases the number of small target samples in the enhanced sample. The original images contain only one small target or none at all, whereas the number of small targets in the transformed samples increases by four to eleven, which effectively alleviates the problem of too few positive samples during training.

Figure 10. (a-d) show four pictures randomly generated by the Mosaic data enhancement method that deviate from the target color characteristics of the diseased tree, and (e-h) show four pictures generated by the STE data enhancement method. The white boxes mark the labeled targets in the mosaic enhancement method, the green boxes mark the labeled targets in the expanded samples of the STE method, and the red circles mark the small-scale diseased trees. The comparison shows that the STE data enhancement method yields more small-sized diseased trees.
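The effect of a fixed scaling factor on the number of small targets can be illustrated on bounding boxes alone. This sketch is not the STE procedure itself, which this section does not fully specify: the 2 × 2 tiling, the factor 0.5, the 32 × 32 pixel small-target threshold, and all function names are assumptions chosen for illustration. Pixel data is left out entirely, so no HSV or other color transformation is involved.

```python
SMALL_AREA = 32 * 32  # assumed small-target area threshold, in pixels


def scale_box(box, factor):
    """Scale an (x1, y1, x2, y2) box about the image origin."""
    x1, y1, x2, y2 = box
    return (x1 * factor, y1 * factor, x2 * factor, y2 * factor)


def shift_box(box, dx, dy):
    """Translate an (x1, y1, x2, y2) box by (dx, dy)."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def is_small(box):
    """Return True when the box area is below the assumed threshold."""
    x1, y1, x2, y2 = box
    return (x2 - x1) * (y2 - y1) < SMALL_AREA


def expand_small_targets(per_image_boxes, tile_size, factor=0.5):
    """Tile four images, each shrunk by a fixed factor, into a 2 x 2 canvas.

    Returns the boxes of all four images in canvas coordinates.
    """
    if len(per_image_boxes) != 4:
        raise ValueError("expected boxes for exactly four images")
    cell = tile_size * factor
    offsets = [(0, 0), (cell, 0), (0, cell), (cell, cell)]
    merged = []
    for boxes, (dx, dy) in zip(per_image_boxes, offsets):
        for box in boxes:
            merged.append(shift_box(scale_box(box, factor), dx, dy))
    return merged


# Four 640 x 640 images, each holding one 50 x 50 diseased tree.
originals = [[(100, 100, 150, 150)] for _ in range(4)]
merged = expand_small_targets(originals, tile_size=640)
small_before = sum(is_small(b) for boxes in originals for b in boxes)
small_after = sum(is_small(b) for b in merged)
print(small_before, small_after)
```

Because the scaling factor is fixed rather than random, every medium-sized target shrinks by the same predictable amount, so the number of small targets per training image is raised in a controlled way.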

Discussion
Through statistical analysis of the scale of the diseased trees in the pine wilt diseased tree dataset, we found that small-scale diseased trees are few in number, too few for the network model to learn their characteristics. We also found that in drone footage, a small-scale diseased tree occupies only a small part of the image, and most pixels belong to the background. This background information seriously interferes with feature extraction for small-scale diseased trees.
To counter background interference, a growing number of researchers have turned to attention mechanisms [32,33]. In this paper, we therefore propose a Pooled Weighted Channel Attention module to alleviate the background interference on small-scale diseased tree feature extraction. We studied the relative importance of global maximum pooling and global average pooling for feature learning, and extensive experiments showed that, for the detection of small-scale diseased trees, global maximum pooling contributes more than global average pooling. Through the weighted fusion of global maximum pooling and global average pooling, we explored the weight parameters most suitable for small-scale diseased tree detection.
In addition, after analyzing the advantages and disadvantages of existing data enhancement methods for small-scale diseased trees, we propose a data enhancement method based on small target sample expansion that increases the number of small-scale diseased trees without altering the color and shape characteristics of the diseased tree targets. The experimental results show that the proposed data enhancement method significantly increases the number of small-scale diseased trees and improves the robustness of diseased tree detection.
The current research methods have achieved good results in the detection of small-scale diseased trees, but their performance on late-stage diseased trees remains poor, and further research and analysis are needed. In addition, because acquiring diseased tree datasets requires considerable manpower and material resources, existing pine wilt diseased tree datasets are relatively few. How to learn the characteristics of the target from a small number of labels is the focus of future research. Active learning is developing rapidly in various fields [34,35], and it mainly focuses on how to build efficient classifiers with little labeled data. Active learning thus provides a theoretical basis for future research on the identification of pine wilt diseased trees. Next, we will study tasks such as the classification of diseased trees using active learning.

Conclusions
In this paper, to address the poor performance of existing target detection algorithms on small-sized PWDT, we propose a new target detection network, SPW-FEN, for the detection of PWDT. First, because the shallow feature layers in existing detection algorithms cannot adequately extract the features of small-sized diseased trees, we propose a PWCA attention module and add it to the shallow feature maps, effectively improving the algorithm's ability to extract the features of small-scale diseased trees. Moreover, because a single image contains too few small-sized diseased trees, we propose an STE data enhancement method that effectively increases the number of small-sized diseased trees in a single image. The proposed method enhances the feature extraction ability of the network for small-sized diseased trees, reduces their missed detection rate, and achieves efficient detection of small-sized diseased trees in UAV images under complex backgrounds. The experimental results show that the proposed method achieves an average precision of 79.1% and a recall of 86.9% for pine wilt diseased trees. The recall and average precision are 3.6 and 3.8 percentage points higher, respectively, than those of the current state-of-the-art method, Faster-RCNN [22], and 6.4 and 5.5 percentage points higher than those of YOLOv6 [28], the latest network in the YOLO series.
In the future, we will focus on improving the detection performance for late-stage diseased trees, and on using semi-supervised feature learning and detection methods with a small number of data samples to construct low-cost, high-precision diseased tree detection models. Additionally, we will further study the effect of mixed forests on the identification of diseased trees and verify how susceptible the method is to errors caused by the presence of trees of other species.