An object detection method for bayberry trees based on an improved YOLO algorithm

ABSTRACT To quickly detect and count bayberry trees, this paper improves the YOLO-v4 model and proposes an optimal YOLO-v4 method for detecting bayberry trees in UAV images. We used the Leaky_ReLU activation function to accelerate feature extraction and DIoU NMS to retain the most accurate prediction boxes. To increase the recall rate of object detection and construct the optimal YOLO-v4 model, the K-Means clustering method was embedded into DIoU NMS. We trained the model on UAV images of bayberry trees and determined that a threshold of 0.25 gave the optimal YOLO-v4 model the best extraction effect. The optimal YOLO-v4 model achieved a detection accuracy of up to 97.78% and a recall rate of up to 98.16% on the dataset. It was compared with the YOLO-v4, YOLO-v4 tiny, YOLO-v3 and Faster R-CNN models; with comparable accuracy, its recall rate was higher, up to 97.45%, and it detected bayberry trees better in different contexts. The results show that the optimal YOLO-v4 model can rapidly and accurately detect and count bayberry trees in large-area orchards.


Introduction
Bayberry, also known as Chinese bayberry, is rich in glucose, cellulose, mineral elements, and amino acids. Every hundred grams of edible bayberry contains 14 mg of calcium, 9 mg of vitamin C, and 8 mg of phosphorus, giving it high nutritional value (Ge et al. 2018). Brightly colored and sweet-sour, bayberries are popular with consumers. Despite an overall oversupply in the fruit market, the price of high-quality bayberry remains high, so its planting scale has gradually expanded with demand. Many provinces in southern China have bayberry tree plantations, including Jiangsu, Zhejiang, Taiwan, Fujian and Jiangxi (as listed in the Flora of China and the Catalogue of Life China as of 22 July 2022). Among them, Zhejiang Province ranks first in the country in planting area and output. The picking period for bayberry is very short; having no peel, mature bayberry is fragile and perishable (Wang, Lv et al. 2020). Bayberry matures after the Summer Solstice.

… reconstruction, which improved the detection effect of UAVs and enhanced the detection and positioning algorithms; however, the system efficiency needs to be further improved. Other work combined a data migration method to select a data region of interest as sample data, achieving the best classification with the least amount of data (He, Guo, and Yuan 2020; Tuia, Pasolli, and Emery 2011); however, this method covers few data categories and is not suitable for large-scale data classification. Lei et al. (2022) proposed a multi-module convolutional neural network segmentation method, applying semantic segmentation to bayberry fruits to enable automatic picking in bayberry orchards. However, two-stage algorithms need to determine pre-selection boxes before detecting objects, so the steps are cumbersome and the detection speed is not ideal. In 2016, Joseph Redmon proposed a one-stage object detection method, YOLO (You Only Look Once) (Redmon et al. 2016), which simplified the object detection process, pushed the development of real-time detection, and improved the accuracy and speed of object detection (Liu and Li 2022; Radovic, Adarkwa, and Wang 2017). In 2018, Joseph Redmon launched YOLO-v3 (Redmon and Farhadi 2018), which further improved detection speed (Dewi, Chen, and Yu 2020). Kuznetsova, Maleva, and Soloviev (2020) and Liu et al. (2020) used YOLO-v3 for fruit detection and achieved real-time detection speed in experimental tests. The proposal of YOLO-v4 (Bochkovskiy, Wang, and Liao 2020) further optimized the speed and accuracy of object detection. In the studies of Dewi et al. (2021) and Kumari et al. (2021), YOLO-v4 had the best accuracy and speed compared with other object detection methods. However, Xia, Qu, and Wan (2022) noted that deep learning methods require high computational cost and storage space in their medical image colorization research. Several scholars have addressed this issue; for example, Guo et al. (2022) proposed a hierarchical multi-attention transfer framework (HMAT) to compress deep learning models, with object detection results that outperformed state-of-the-art knowledge distillation (KD) methods.
Manually determining the number of bayberry trees in mountainous areas is difficult. Considering the superiority of the YOLO-v4 method in object detection and its lightweight, memory-friendly model, we propose an object detection and counting method for bayberry trees in large-area orchards based on an improved YOLO-v4 algorithm. First, image data of bayberry trees is collected with a UAV platform and preprocessed. Then, the YOLO-v4 tiny model is optimized and improved to make it suitable for object detection and counting of bayberry trees over large areas. The optimal threshold of the model is selected based on the characteristics of bayberry trees and the detection results, and the optimal YOLO-v4 model is generated to achieve the best detection effect, enabling automatic management and yield prediction for bayberry orchards. Finally, the experimental dataset is used as a test set to compare detection accuracy and speed with the YOLO-v4, YOLO-v4 tiny, Faster R-CNN and YOLO-v3 models. With reliable detection accuracy, the optimal YOLO-v4 model has the best recall rate, providing a high-quality reference method for the large-scale management of other orchards.

Model theory
The YOLO-v4 (You Only Look Once) algorithm is a method that casts object detection as a regression problem. While maintaining the detection speed of the original YOLO-v3 model, it studies, combines and innovates a large number of detection techniques to improve detection accuracy. YOLO-v4 employs the Mosaic and SAT data augmentation methods, the Mish activation function, and the CSPDarkNet53 backbone feature extraction network to improve accuracy. However, with about 60 million parameters, this method is very demanding on computer memory. Therefore, this study adopted the YOLO-v4 tiny structure (Figure 1), which has only 6 million parameters and greatly reduces memory consumption. Evaluating the YOLO-v4 tiny model (Figure 2), because of its shallow network depth, the loss function dropped to about 6 after 500 epochs and the model converged quickly.
The method is divided into three modules: CSPDarknet53-tiny, FPN, and YOLO head. It uses the Leaky_ReLU activation function instead of the Mish function (Misra 2019), and only one feature pyramid is used in the feature enhancement layer, without DownSampling.
The CSPDarknet53-tiny network structure (Wang, Mark Liao et al. 2020) consists of three BasicConv blocks and three Resblock_body blocks. To accelerate extraction, the Leaky_ReLU activation function (Formula 1; Figure 3) is used in the BasicConv block. The Resblock_body block uses the CSPNet structure, which splits the residual blocks: one part continues the stacking of the original residual blocks, and the other is reserved, like a residual edge, for final fusion. The FPN structure UpSamples one of the valid feature layers after BasicConv processing and Concats it with another valid feature layer. The YOLO-v4 tiny structure extracts two feature layers in total, and the YOLO-head output layers are (19, 19, 75) and (38, 38, 75) respectively.
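Formula 1, referenced above, is the Leaky_ReLU activation. Its standard form, with a small negative slope α (commonly 0.1 in YOLO implementations; the exact value used in this study is not reproduced in this extract), is:

```latex
f(x) =
\begin{cases}
x, & x \geq 0 \\
\alpha x, & x < 0
\end{cases}
\qquad (1)
```

Unlike ReLU, the non-zero negative slope keeps gradients flowing for negative inputs, which is why it is a common choice when extraction speed and stable training are both required.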

Model improvement
In the object detection of overlapping images, overlapping parts are repeatedly detected about 2-4 times, and ultimately only the most accurate predicted box should be retained. The YOLO-v4 method adds the aspect ratio to the CIoU loss function, but the CIoU loss can only be used when the coordinates of the ground-truth boxes are available. Since this is difficult in real-scene prediction, an NMS method is needed to delete duplicate predicted boxes. However, with the original NMS method, when the threshold is small, only sparse objects can be retained and overlapping objects are lost; when the threshold is large, it is difficult to delete some overlapping boxes, so an improved NMS algorithm is needed. Since DIoU takes into account the positions of the center points of the bounding boxes, it has advantages for object selection. The DIoU NMS method (Zheng et al. 2020) is used to delete duplicate boxes and obtain the boxes with the highest confidence.
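A minimal NumPy sketch of the DIoU NMS idea described above (the box format `(x1, y1, x2, y2)` and all function names are our own illustration; the paper's implementation may differ):

```python
import numpy as np

def diou(box, boxes):
    """DIoU between one box and an array of boxes, format (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    iou = inter / (area_a + area_b - inter)
    # squared distance between box centres
    cx_a, cy_a = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    cx_b, cy_b = (boxes[:, 0] + boxes[:, 2]) / 2, (boxes[:, 1] + boxes[:, 3]) / 2
    d2 = (cx_a - cx_b) ** 2 + (cy_a - cy_b) ** 2
    # squared diagonal of the smallest enclosing box
    ex1 = np.minimum(box[0], boxes[:, 0]); ey1 = np.minimum(box[1], boxes[:, 1])
    ex2 = np.maximum(box[2], boxes[:, 2]); ey2 = np.maximum(box[3], boxes[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return iou - d2 / c2

def diou_nms(boxes, scores, threshold=0.2):
    """Keep the highest-scoring box; drop boxes whose DIoU with it exceeds the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[diou(boxes[i], boxes[rest]) <= threshold]
    return keep
```

Because the center-distance penalty lowers the score for far-apart boxes, two overlapping but clearly distinct tree crowns are less likely to suppress each other than under plain IoU NMS.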
In data processing, the DIoU NMS method can accelerate convergence, but on its own it cannot solve the overlapping problem. Since DIoU NMS does not take the aspect ratio into account, the K-Means clustering method was used to divide the data and determine the range of aspect ratios. Cluster analysis uses center thresholds to find the relationships between data objects and group the data: the higher the similarity within a group and the greater the difference between groups, the better the clustering effect. The K-Means algorithm divides data based on distance; this paper used the Euclidean distance (Formula 2) to partition the object set into K clusters. The K-Means algorithm proceeds as follows:
Step 1: Randomly select k objects as the initial clustering centers.
Step 2: For the remaining objects, assign each to the nearest cluster according to its similarity (distance) to each center, and then compute the mean of all objects in each cluster as the new center.
Step 3: Repeat until the criterion function converges or the maximum number of epochs is reached.
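Formula 2, not reproduced in this extract, is the standard Euclidean distance d(x, y) = √(Σᵢ(xᵢ − yᵢ)²). The three steps above can be sketched as follows (a minimal illustration with our own function names, not the paper's implementation):

```python
import numpy as np

def kmeans(points, k, epochs=100, seed=0):
    """Plain K-Means with Euclidean distance (Formula 2), following Steps 1-3."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]  # Step 1
    for _ in range(epochs):
        # Step 2: assign each point to its nearest centre, then recompute means
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # Step 3: stop once centres stop moving
            break
        centers = new
    return centers, labels
```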
In this study, the K-Means clustering method was embedded directly into the model to automatically obtain anchors, cluster the aspect ratio range (Figure 4), and filter out unreasonable object boxes. After the aspect ratio threshold was obtained, the test data were monitored and the output counted, producing results consistent with the ground truth. Combined with the DIoU NMS method to remove duplicate boxes, a new object detection method was obtained.

Evaluation index for model
The evaluation indices used in this study are those commonly used for object detection models. At the model training stage, the loss function was used to evaluate the convergence of the training process, and the Mean Average Precision (mAP) was used to evaluate the performance of the model across object detection tasks. For the evaluation of the model results, the precision (Formula 3), recall (Formula 4) and F1-score (Formula 5) of the model were calculated by counting true positives (TP), false positives (FP) and false negatives (FN).
TP is the number of positive samples correctly detected as positive, FP is the number of negative samples incorrectly detected as positive, and FN is the number of positive samples incorrectly detected as negative. Precision is the ratio of correctly predicted positive samples to all predicted positive samples, and recall is the ratio of correctly predicted positive samples to the number of real objects, that is, the fraction of positive samples that are detected. The F1-score (0 ≤ F1 ≤ 1) balances precision and recall and is their harmonic mean; the higher the F1-score, the better the result.
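Formulas 3-5, referenced above but not reproduced in this extract, take the standard form:

```latex
\text{Precision} = \frac{TP}{TP + FP} \quad (3)
\qquad
\text{Recall} = \frac{TP}{TP + FN} \quad (4)
\qquad
F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (5)
```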

Experimental process
The experimental procedure is shown in Figure 7. First, the original data were processed: the images of the large-area bayberry orchard were segmented with overlap, and the dataset was divided into a training set and a test set. The performance of the YOLO-v4 convolutional neural network was verified by qualitative comparison and quantitative evaluation, and a lightweight model that places less strain on computer memory was chosen for training and parameter tuning. Then, parts of the YOLO-v4 model were improved to achieve accurate detection and counting of objects. The YOLO-v4 tiny model with the optimal threshold was applied to the test set for object detection. Finally, to achieve the best detection result, duplicate detection boxes were removed with the help of the K-Means clustering method.
The experiment mainly includes the following five parts:
(1) Data collection: acquiring appropriate UAV orchard images.
(2) Data preprocessing: dividing the dataset and producing the training set.
(3) Network model setting: the improved YOLO-v4 model was tested on the training set, and the experimental results were fused with the model settings to optimize the model parameters and obtain the optimal YOLO-v4 model.
(4) Bayberry tree object detection: the model was used to detect the overlapping orchard data after segmentation and coordinate extraction, and the number of bayberry trees was obtained from the detection results.
(5) Optimal YOLO-v4 model testing: the effectiveness of the optimal YOLO-v4 model for UAV image object extraction was verified by qualitative and quantitative comparison experiments.

Dataset production
The training speed of the YOLO-v4 tiny model is negatively correlated with image size, so the original images must be cropped. Pix4Dmapper was used to process the UAV data and extract the study area; 611 images were selected from 3108 original images, and the image data were divided into 1368 × 912 pixel tiles while ensuring the integrity of the bayberry tree objects. To ensure the comprehensiveness of the test results, the test set was divided into seven types based on the bayberry tree scene: sparse, dense, strong background, weak background, backlight, side light and facing light (Figure 8).
The dataset was used for model training and validation, and the test set was used to verify the final model results and compare with other models (Table 1). The training set accounts for 85% of the total dataset and the test set for 15%; the validation set accounts for 80% of the training set.

Selection of Overlapping Ratio of Segmented Regions
Each UAV image is 5472 × 3648 pixels. According to the image resolution and the pixel size of the dataset, the data were clipped with a 20% area overlap, so that bayberry tree objects were fully displayed and repeated object detection was reduced. Detailed segmentation information is shown in Table 2, the segmented images in Figure 9, and the segmentation method in Figure 10. During image segmentation, the coordinates were recorded, and the object detection boxes of the repeated regions were removed to avoid repeated detection caused by the overlapping segmentation.
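The overlapping tiling described above can be sketched as follows. This is a minimal illustration using the sizes from the text (5472 × 3648 originals, 1368 × 912 tiles, 20% overlap); the stride rounding and the back-shift of the final tile are our assumptions, not details given in the paper:

```python
def tile_origins(width, height, tile_w, tile_h, overlap=0.2):
    """Top-left origins of tiles covering a width x height image with the given
    fractional overlap; the last tile in each direction is shifted back so it
    stays inside the image instead of being padded."""
    def steps(total, tile):
        stride = int(tile * (1 - overlap))
        xs = list(range(0, max(total - tile, 0) + 1, stride))
        if xs[-1] + tile < total:  # cover the trailing edge
            xs.append(total - tile)
        return xs
    return [(x, y) for y in steps(height, tile_h) for x in steps(width, tile_w)]
```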

Coordinate Extraction of Segmented Area
The upper-left corner coordinates, width and height of the segmented images were recorded to compile statistics on the objects in the original images (Figure 11), including the index number of the image (index = 01), the upper-left corner coordinates (x, y), and the width and height (w, h). The detection box (obj) in each image includes the upper-left corner coordinates (x_left, y_left), the width and height (w_obj, h_obj), and the predicted score of the detection box (obj_score). After detection and counting, the information of the objects on the original images can be obtained by adding each obj result of an image to the image information of the corresponding index.
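The mapping from tile-local detections back to the original image amounts to adding the tile's recorded origin to each box, as described above. A minimal sketch (the function name and tuple layout are our own):

```python
def to_global(tile_x, tile_y, det):
    """Map a detection from tile coordinates back to the original image.
    det = (x_left, y_left, w_obj, h_obj, obj_score) as described in the text;
    width, height and score are unchanged by the translation."""
    x_left, y_left, w_obj, h_obj, score = det
    return (tile_x + x_left, tile_y + y_left, w_obj, h_obj, score)
```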
Determination of optimal model parameters for bayberry trees detection

DIoU NMS threshold selection of optimal model
The DIoU NMS method was used to detect overlapping boxes in the image-processing dataset and retain the most accurate prediction box. From the distribution of bayberry tree objects in the sample plots, the DIoU threshold should be set small to ensure the removal of repeated predicted boxes. The number of prediction boxes retained for the bayberry trees is determined by the threshold of the non-maximum suppression method (Figure 12). When the threshold was 0.9, too many prediction boxes were retained, leaving excess data for further work; when the threshold was 0.1, data were missed. To ensure monitoring accuracy while keeping the number of prediction boxes as small as possible, the DIoU NMS threshold used in this study is 0.2.

K-means threshold selection of optimal model
The K-Means clustering method was used in this study to remove invalid, cut-off prediction boxes based on the difference between the aspect ratios of boundary-segmented objects and those of conventional bayberry tree prediction boxes, so an appropriate screening threshold must be selected. As shown in the aspect ratio distribution histogram (Figure 13), the aspect ratios of the prediction boxes were concentrated in the range 0.6-1.18. To locate the threshold of the prediction boxes accurately, the SPSS data analysis tool was used to cluster the aspect ratios, with 7 and 8 clustering centers respectively. The results are shown in Table 3.
To verify the effectiveness of the threshold division from the K-Means clustering centers, the aspect ratio was divided according to Tables 4 and 5, and the detection results were marked in different colors. In Figures 14 and 15, subgraphs (a) and (b) show the threshold results of Table 4, and subgraphs (c) and (d) show the threshold results of Table 5. The thresholds in Table 5 are more accurate for the detection of samples 1 and 2, especially for the larger bayberry trees. In sample 2, a slight change of the aspect ratio threshold had little effect on the results, so the aspect ratio threshold was set to 0.4-1.6.
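The aspect-ratio screening selected above reduces to a simple filter; a minimal sketch using the 0.4-1.6 range from the text (box tuple layout and function name are our own):

```python
def filter_by_aspect(dets, lo=0.4, hi=1.6):
    """Drop prediction boxes whose width/height ratio falls outside [lo, hi].
    dets are (x, y, w, h, score) tuples; 0.4-1.6 is the range selected in the text
    for conventional bayberry tree crowns."""
    return [d for d in dets if lo <= d[2] / d[3] <= hi]
```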

Weight selection of optimal model
Model training produced one weight file every 800 epochs, 12 in total. The model with the highest mAP value was chosen as the model with the best overall performance. The model's threshold was then adjusted to balance accuracy and recall, so as to find the optimal weight that meets both the detection accuracy for bayberry trees and the counting requirement. Since bayberry trees are the only object class in this study, mAP = AP. Because the number of bayberry trees is the critical quantity in this single-category task, recall was prioritized over precision in the evaluation, under the premise of ensuring overall accuracy. The training parameters are shown in Table 6. The P-R curve (Figure 16) shows that recall was negatively correlated with precision: recall was low when precision was high. Since the object detection dataset contains only bayberry trees, a high recall rate is required. In the P-R curve of the optimal weight, the precision of the model reached 60% at a recall rate of 95%.

Threshold selection of optimal model
The selection of the threshold affects the accuracy and recall of the object detection results (Figures 17 and 18). When the threshold is too low, objects are detected with low precision, resulting in more detection boxes and a higher recall rate. Conversely, when the threshold is too high, the precision rises while the recall falls. The recall rate was high when the threshold was 0.25, which could be used as the model threshold with the best parameter effect.
As a result, to ensure the recall rate while balancing precision, the model threshold was set to 0.25, and the model trained for 2400 epochs was adopted in this experiment. It is referred to as the optimal YOLO-v4 model.
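Applying the selected confidence threshold is a one-line filter; a minimal sketch with the 0.25 value from the text (tuple layout is our own, with the score last as described in the coordinate-extraction section):

```python
def apply_confidence_threshold(dets, thr=0.25):
    """Keep detections whose predicted score is at or above the confidence
    threshold; 0.25 is the value selected for the optimal YOLO-v4 model."""
    return [d for d in dets if d[-1] >= thr]
```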

Analyze and extract the anchor box using K-means clustering
The experimental dataset was processed with the K-Means clustering method. First, the object boxes were extracted from all training data, and all rectangular boxes were pooled to process the widths and heights of the training data. Then, the clustering algorithm was used to analyze and extract the anchor boxes of the dataset. For an input size of 608 × 608, the computed anchor boxes of the bayberry tree dataset were (31 × 33), (39 × 39), (44 × 46), (48 × 53), (56 × 54), (7 × 65), (67 × 65), (70 × 77), (84 × 88).
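The anchor extraction described above can be sketched as clustering (width, height) pairs and sorting the resulting anchors by area. This is an illustration using the Euclidean distance the paper names; note that many YOLO implementations cluster with a 1 − IoU distance instead, which may be what was actually used here:

```python
import numpy as np

def anchors_from_boxes(wh, k=9, epochs=100, seed=0):
    """Cluster ground-truth (width, height) pairs into k anchor sizes with
    Euclidean K-Means, then sort anchors small-to-large by area as YOLO expects."""
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(epochs):
        labels = np.linalg.norm(wh[:, None] - centers[None], axis=2).argmin(axis=1)
        new = np.array([wh[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(axis=1))]
```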
The loss values obtained when training with the K-Means clustering prior boxes (Figure 19) show that the model's loss is large in the early stage of training. Without K-Means clustering anchor boxes, the loss dropped to about 6 after 2200 epochs (Figure 19(a)) and stabilized at about 1 after 9600 epochs of network training. With the K-Means anchor boxes, the loss dropped to 6 after 1500 epochs (Figure 19(b)), the model converged faster, and the loss gradually stabilized at about 0.5 after 9600 epochs. This shows that using data-derived anchor boxes can improve the convergence speed of the model.

Model performance assessment
The performance of the model is generally evaluated by the mAP value of the model weights; the higher the mAP, which ranges from 0 to 1, the better the effect. We compared the mAP values of the trained weights of the YOLO-v4 tiny model and the optimal YOLO-v4 model (Figure 20). The YOLO-v4 tiny model has a maximum mAP of only 0.16; for a dataset with a complex environment, the YOLO-v4 tiny structure is not sufficient to meet the detection requirements. The lowest mAP among the optimal YOLO-v4 model's weights is about 0.35, and at 2400 epochs the weight's mAP peaks at 0.8823. The overall performance of the model is significantly improved.

The statistical results of final detection
The optimal YOLO-v4 model was used to detect samples 1 and 2 (Table 7; Figure 21); the final number of extracted objects was 503 for sample 1 and 765 for sample 2.
The final statistical results show that the detection accuracy and recall on samples 1 and 2 were high, with F1 values of 0.9701 and 0.9816 respectively. From the FP (Figure 22) and FN (Figure 23) examples in the sample plots, the primary causes of the model's accuracy loss are dense objects and backlight occlusion, which also lead to accuracy losses in model detection and in the threshold adjustment of the statistics. However, the threshold adjustment results show a low loss rate.

Robustness comparison results of different models
To verify the detection effect of the optimal YOLO-v4 model, the accuracy, recall and F1-score of the YOLO-v4, YOLO-v4 tiny, Faster R-CNN, YOLO-v3 and optimal YOLO-v4 models were evaluated on the experimental dataset, using the density, growth background and illumination angle of the bayberry trees as test categories. Since the test set contains only one object class, bayberry trees, the number of trees detected is the critical quantity. Therefore, when comparing the robustness of the different models, the recall rate remained the focus, under the premise of ensuring accuracy.

Comparison results of different densities
In this section, the dataset was divided according to the growth density of the bayberry trees. The test set used 14 sparse and 14 dense images, containing 351 and 606 bayberry tree objects respectively. Compared with the YOLO-v4, YOLO-v4 tiny, Faster R-CNN and YOLO-v3 models (Table 8), the optimal YOLO-v4 model had a recall rate of 96.87% on sparse scenes and 95.05% on dense scenes while satisfying the detection accuracy requirement, the highest among the five models.
The optimal YOLO-v4 model applies the CIoU loss function and can accurately detect objects with the significant crown-shape characteristics of bayberry trees segmented at tile boundaries. The YOLO-v3 and Faster R-CNN models show no obvious ability to extract boundary-segmented bayberry tree objects, while YOLO-v4 and YOLO-v4 tiny can detect only some boundary objects (Figure 24).

Comparison results of different backgrounds
This part of the experiment used two test datasets: weak background (no objects similar in shape to bayberry trees in the image) and strong background (objects similar in shape to bayberry trees in the image), containing 369 and 430 bayberry trees respectively. The test results are shown in Table 9 and Figure 25. The optimal YOLO-v4 model had the lowest accuracy of the five models on the weak background, but it still reached 84%. In contrast, on the strong background, because the NMS detection threshold was low enough, objects with similar features at the image boundary were marked, raising the accuracy to 90%. In both backgrounds, the recall rate of the optimal YOLO-v4 model was always the best. The optimal YOLO-v4 model uses the CSPNet and PANet structures, which enhance feature extraction and strengthen the transmission and integration of feature information, while the Faster R-CNN and YOLO-v3 models have insufficient feature extraction ability and low recall; the YOLO-v4 and YOLO-v4 tiny models also have slightly lower recall than the optimal YOLO-v4 model.

Comparison results of different light angles
According to the illumination angle, the test dataset was divided into facing light (illumination direction consistent with the imaging direction), backlight (illumination direction opposite to the imaging direction) and side light (90° angle between the illumination and imaging directions), containing 471, 437 and 309 bayberry trees respectively.
In all three cases (Figure 26), the overall performance of the optimal YOLO-v4 model was relatively stable, with the best recall rate and the highest F1-score (Table 10). On the backlight dataset, which had the lowest F1-scores, the F1-score of the optimal YOLO-v4 model was 0.8983, still 0.2027 higher than that of the YOLO-v3 model and 0.0464 higher than that of the Faster R-CNN model. The main reason is that the optimal YOLO-v4 model uses the DropBlock regularization method to improve generalization, and the feature pyramid also improves feature extraction. The YOLO-v4 and YOLO-v4 tiny models are less capable of detecting boundary-segmented objects, resulting in lower recall.

Overall dataset detection results
A comprehensive analysis of the model test results on the overall dataset (Table 11) shows that the recall rate of the YOLO-v3 algorithm was very low on the different datasets, leading to a low F1-score. Compared with the YOLO-v4, YOLO-v3 and Faster R-CNN algorithms, the YOLO-v4 tiny detection accuracy was higher, but its recall rate did not reach a practical application level. The optimal YOLO-v4 model achieved a high recall rate at comparable accuracy, which provides the experimental basis for the model in practical applications.
The above analyses show that the recall rate of the optimal YOLO-v4 model reaches 96.03% under different growth conditions, the best detection level. However, the model also suffers some accuracy loss: when background features are similar to the bayberry tree objects, such as the same or similar colors, shapes and textures, misdetections or missed detections occur, resulting in inevitable accuracy loss.

Conclusions
Taking the detection and counting of bayberry trees as the focus of this study, the YOLO-v4 feature extraction network proved effective for obtaining the deep features of bayberry trees. The lightweight YOLO-v4 tiny model was chosen as the basis for optimizing the YOLO-v4 model, ensuring performance while minimizing memory pressure on the computer. The model used the Leaky_ReLU activation function to accelerate feature extraction. The DIoU NMS method, combined with the K-Means clustering method, was used to filter out duplicate detection boxes and retain the most accurate prediction boxes; its threshold was set to 0.2, so that occluded or boundary-segmented objects could also be predicted. The K-Means clustering method was used to screen the aspect ratios of the prediction boxes, effectively removing invalidly segmented boxes. By testing the model under different thresholds, the optimal threshold was finally determined to be 0.25, and the optimal YOLO-v4 model was obtained. Its detection results were compared with the YOLO-v4, YOLO-v4 tiny, YOLO-v3 and Faster R-CNN models on a test dataset divided into growth scenarios: sparse, dense, strong background, weak background, backlight, side light and facing light. The optimal YOLO-v4 model performed best, fully verifying the practicability of the model and the effectiveness of the counting method. With only 6 million parameters, the model is suitable for object detection with limited memory resources and has high practical value. The robustness of the optimal YOLO-v4 model is good: it detects bayberry trees well in sparse, dense, strong-background, weak-background, facing-light, side-light and backlight scenarios, and is suitable for single-object detection tasks in complex environments.
In summary, further work is needed: this paper mainly used a bayberry tree dataset, so whether the optimal YOLO-v4 model loses accuracy on fruit trees with different crowns and textures requires further testing and optimization, and the accuracy of fruit tree detection, extraction and counting in complex scenes needs to be improved.

Figure 7. The flow chart of the experiment.

Figure 13. The histogram of sample 2 aspect ratios.

Figure 14. The prediction results of sample 1 with different thresholds.

Figure 15. The prediction results of sample 2 with different thresholds.

Figure 17. The parameters of different thresholds.

Figure 18. The predicted results with different thresholds. (a) The predicted result with a threshold of 0.05. (b) The predicted result with a threshold of 0.95.

Figure 19. The loss function value of the model. (a) The loss function value without K-Means anchor boxes. (b) The loss function value with K-Means anchor boxes.

Figure 20. Model performance assessment. (a) The mAP value of the YOLO-v4 tiny model weights. (b) The mAP value of the optimal YOLO-v4 model weights.

Figure 21. The statistical results of the samples. (a) The statistical results of sample 1. (b) The statistical results of sample 2.

Figure 24. The results of five models under different test datasets. The top image is sparse, the bottom image is dense. (a) The optimal YOLO-v4 model. (b) The YOLO-v3 model. (c) The Faster R-CNN model. (d) The YOLO-v4 model. (e) The YOLO-v4 tiny model.

Figure 25. The results of five models under different test datasets. The top image is a weak background, the bottom image is a strong background. (a) The optimal YOLO-v4 model. (b) The YOLO-v3 model. (c) The Faster R-CNN model. (d) The YOLO-v4 model. (e) The YOLO-v4 tiny model.

Figure 26. The results of five models under different test datasets. The top picture is facing light, the middle picture is backlight, and the bottom picture is side light. (a) The optimal YOLO-v4 model. (b) The YOLO-v3 model. (c) The Faster R-CNN model. (d) The YOLO-v4 model. (e) The YOLO-v4 tiny model.

Table 2. The information of the sample orchards (image size unit: pixels).

Table 3. Different clustering centers of K-Means.

Figure 11. Image segmentation coordinate recording.

Table 4. The results of different thresholds.

Table 5. The results of different thresholds.

Table 6. The information of training parameters.

Figure 16. The P-R curve of the optimal model weight.

Table 7. The statistical results of the sample 1 and sample 2 orchards.

Table 8. The results of five models under different test datasets: sparse and dense.

Table 9. The results of five models under different test datasets: strong and weak background.

Table 10. The results of five models under different test datasets: facing light, backlight and side light.

Table 11. The results of five models on the whole test dataset.