A real-time branch detection and reconstruction mechanism for harvesting robot via convolutional neural network and image segmentation

https://doi.org/10.1016/j.compag.2021.106609

Highlights

  • Proposed a real-time detection and reconstruction scheme for obscured branches.

  • The method was applied to harvesting robots for fruit picking.

  • Combined image processing and CNNs to improve branch localization accuracy.

  • Branch reconstruction constraints are proposed according to the growth state.

  • The branch reconstruction speed reaches up to 22.7 FPS.

Abstract

To alleviate the burden of fruit harvesting imposed by rising costs and a decreasing labor supply, intelligent robots are highly desired on modern farms. A major problem, however, is how to detect and locate tree branches so that the robots can plan their arm movements during the harvesting process. This study addresses the obscured branch detection and reconstruction problem, and proposes a real-time branch detection and reconstruction (RBDR) mechanism using convolutional neural networks (CNNs) and image processing techniques. First, we build a Branch-CNN framework to detect bare branches and obtain their rough localization; then, we perform background segmentation in HSV space to obtain the precise branch regions. Finally, with distance and angle constraints considered, a polynomial fit is applied to the precise boxes of the same branch to fill in the obscured areas. The proposed RBDR mechanism is applied to a harvesting robot platform, and experiments in both a lab-simulated orchard environment and a real pomegranate tree environment are conducted to verify its feasibility. Results show that under the simulation environment, at an Intersection over Union (IoU) threshold of 0.5, Branch-CNN achieves the best overall performance, with an average detection precision, recall rate, and F1-Score of 90.98%, 92%, and 91%, respectively, and the average reconstruction accuracy of RBDR is 88.76%. Under the real pomegranate tree environment, Branch-CNN achieves 90.7% detection precision, 89% recall rate, and 90% F1-Score, respectively. The overall reconstruction speed of RBDR is 22.7 frames per second (FPS) on images with a resolution of 960×720. Such results fully demonstrate the rationality and effectiveness of RBDR.

Introduction

The increasing shortage of stable labor worldwide is placing a huge burden on the seasonal fruit industry and causing significant economic losses (Clark, 2017). As tools expected to partially or fully "replace" human workers, intelligent harvesting robots are being applied to various repetitive and heavy tasks, e.g., fruit and vegetable harvesting, seedling grafting, farmland irrigation, etc. (Ren et al., 2020, Utstumo et al., 2018). However, when facing unstructured and complex orchard environments, harvesting robots must be perceptual, semantic, and capable of extracting reliable environmental information. For example, a harvesting robot not only needs to detect and locate apples quickly and accurately, but also needs to identify the branch structures to plan its arm motion trajectory and grasping strategy (Tao and Zhou, 2017). So far, extensive work has been done on fruit recognition, and promising results have been achieved (Koirala et al., 2019, Afonso et al., 2020); for fruit tree branch detection and reconstruction, however, progress remains rather limited due to the complex spatial structures of branches (e.g., their topology, length, number, etc.). This problem introduces various difficulties for both the kinematic solution and the trajectory planning of a robot, and may even cause serious damage to it. Hence, accurate branch detection and reconstruction is of critical importance for the practical application of harvesting robots.

Existing work on branch detection can be roughly divided into two categories according to the sensors utilized, i.e., Light Detection and Ranging (LiDAR)-based and camera-based methods. The former directly acquires 3D point cloud data of the orchard using LiDAR, and then detects and reconstructs the branches based on this information (Zhang et al., 2020, Westling et al., 2021). For example, Hackenberg et al. developed hierarchical columnar structures to describe the relationships among branches and to extract different tree components (Hackenberg et al., 2015). However, this type of method requires high-quality point cloud data as input; in real scenes, such data are easily fragmented by lighting conditions or partially lost, which degrades the reconstruction results. The authors of (Westling et al., 2021) proposed skeleton-based approaches to extract branch curves from point cloud data and fit them using feature equations. The major drawback of these methods, however, is that they all require the LiDAR to scan the object from different positions to obtain sufficient information and to form a cohesive point cloud. Such a process largely limits the efficiency of branch reconstruction.

In contrast, the camera-based methods first use cameras to capture images of the orchard environment, and then identify the branches by their appearance, shape, texture, color, and spatial relationships. Finally, reconstruction is fulfilled with artificial intelligence techniques using the extracted branch information. Among the techniques used for recognition, convolutional neural networks (CNNs) are the most popular for their robustness, accuracy, and effectiveness (Zhang et al., 2018, Gao et al., 2020, Dias et al., 2018). Majeed et al. effectively segmented branches in the apple canopy based on SegNet, achieving a mean accuracy, Intersection over Union (IoU), and Boundary-F1 of 0.94, 0.52, and 0.92, respectively (Majeed et al., 2020). However, the method was examined only during the dormant season and cannot be directly applied in the harvesting season. Zhang et al. realized automatic trunk and branch segmentation with ResNet-18 and achieved promising results, with an average PcA and F1-score of 97% and 0.89, respectively (Zhang et al., 2021). Nevertheless, they did not consider the real-time scenario, which limits harvesting efficiency. Itakura et al. realized tree trunk diameter estimation with Yolo-v2 and achieved accurate diameter measurement with a recognition rate over 80% (Itakura and Hosoi, 2020). Unfortunately, the method lacks completeness, as no tree branch structure was considered. Zhang et al. used an R-CNN trained with trellis wires to detect apple tree branches in shake-and-catch harvesting, obtaining an average recall rate and accuracy of 92% and 86%, respectively (Zhang et al., 2018). The neural network-based segmentation approaches in (Majeed et al., 2020, Zhang et al., 2021) enable accurate segmentation of different classes of pixels directly in an image. However, they have two major drawbacks: one is the trade-off between real-time application and performance, and the other is the lack of pixel-level strong annotations in current image datasets.

It is worth noting that, owing to the time-consuming preparation of pixel-level annotation datasets and the slow detection speed, semantic segmentation-based neural networks are not well suited to harvesting robots. Leveraging the robustness of CNNs and the specificity of image segmentation methods, Livesley et al. directly used the CNN rectangular prediction box to identify bare branches, and distinguished the low-level semantics of the objects according to the specific task to achieve object segmentation (Livesley et al., 2016). Such a mechanism suggests a good strategy: using neural networks with fast detection speed for coarse target localization, followed by segmentation for refinement. Both Yolo (You Only Look Once) (Bochkovskiy et al., 2020) and SSD (Single Shot MultiBox Detector) (Liu et al., 2016) are representative algorithms of this category. Specifically, Yolo-v4 achieves 65.7% average precision (AP50) on the MS COCO dataset at a real-time speed of 65 FPS. However, since the detection boxes of both schemes are parallel to the image edges, they cannot accurately fit branches with complex growth states, and thus require further refinement. Furthermore, few studies have considered reconstructing partially obscured branches, let alone providing a branch description model and/or building a pixel-level database of branches.

To address the problem of real-time detection and reconstruction of partially obscured branches, this study proposes a real-time branch detection and reconstruction (RBDR) method for fruit harvesting with robotic arms during the harvest season. Specifically, we first build a novel CNN framework based on Yolo-v4, which offers highly real-time detection, although its detection boxes struggle to locate branch regions exactly. Then, the CNN detection boxes are refined via image segmentation to locate the branch boundaries accurately. Next, precise boxes belonging to the same branch are associated based on branch growth trend constraints, and the structure reconstruction is completed. Finally, experiments are performed on simulated and real fruit trees to verify the performance. By using transfer learning and parameter tuning, RBDR is also applied to the reconstruction of real pomegranate tree branches.
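The final reconstruction step described above, filling the obscured area between two detected segments of the same branch, can be sketched with a simple polynomial fit. This is a minimal numpy-only illustration, not the paper's implementation: the function name `bridge_segments`, the quadratic degree, and the monotone-x parameterization are assumptions, and the paper's distance and angle constraints for deciding that two segments belong to the same branch are not modeled here.

```python
import numpy as np

def bridge_segments(seg_a, seg_b, degree=2, n_fill=20):
    """Fit one polynomial through two visible segments of the same branch
    and sample points across the gap between them (the obscured area).

    seg_a, seg_b: (N, 2) arrays of (x, y) branch pixel coordinates,
    with seg_a lying at smaller x than seg_b (illustrative assumption)."""
    pts = np.vstack([seg_a, seg_b])
    # One polynomial y(x) over the union of both segments' pixels.
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], degree)
    # Sample the fitted curve over the x-gap separating the two segments.
    x_lo, x_hi = seg_a[:, 0].max(), seg_b[:, 0].min()
    xs = np.linspace(x_lo, x_hi, n_fill)
    return np.stack([xs, np.polyval(coeffs, xs)], axis=1)
```

For a straight branch (both segments lying on y = x), the filled points also lie on that line, which is the behavior one would expect before adding growth-state constraints.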

The contributions of this study are fourfold. First, branch reconstruction is applied for the first time to a harvesting robot that picks fruit with robotic arms. Second, a novel structured CNN is built. Third, image processing and CNNs are combined to improve the localization accuracy of target detection. Fourth, real-time reconstruction of branches is realized.

The rest of this paper is organized as follows. Section 2 presents the proposed CNN framework, prediction boxes optimization scheme, branches correlation analysis, and the structure reconstruction mechanisms. Section 3 shows the simulated and actual experimental results and analysis. Section 4 concludes the whole paper.

Section snippets

The overall RBDR mechanism

This paper focuses on real-time detection and structural reconstruction of obscured fruit tree branches. The specific operation steps are as follows (see Fig. 1). First, Branch-CNN is constructed to detect branches, and the model is pre-trained on a simulated fruit tree dataset. Then, in the HSV space, background segmentation is performed based on the CNN prediction information according to the branch color threshold range. After that, the branch boundaries are fitted with the minimum
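The HSV thresholding and boundary-fitting steps above can be sketched in a few lines. This is a numpy-only illustration under stated assumptions: the threshold values (`H_RANGE`, `S_MIN`, `V_MIN`) are hypothetical, since the paper's actual branch color ranges are not given in this excerpt, and a PCA of the mask pixels stands in for the minimum-area rotated box fit (in practice something like OpenCV's `cv2.minAreaRect` would be used on the mask contours).

```python
import numpy as np

# Hypothetical HSV thresholds for bark-colored pixels (illustration only).
H_RANGE = (10, 30)
S_MIN, V_MIN = 40, 40

def branch_mask(hsv):
    """Keep pixels whose HSV values fall inside the assumed branch range."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h >= H_RANGE[0]) & (h <= H_RANGE[1]) & (s >= S_MIN) & (v >= V_MIN)

def principal_angle(mask):
    """Orientation (degrees, in [0, 180)) of the masked branch pixels via PCA
    of their coordinates -- a stand-in for a minimum-area rotated box fit."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)
    cov = pts.T @ pts / len(pts)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]  # axis of largest variance
    return np.degrees(np.arctan2(major[1], major[0])) % 180.0
```

The recovered orientation is exactly what the later growth-trend constraints need: two precise boxes can only belong to the same branch if their principal angles are compatible.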

Branch detection with different CNNs

This experiment compares the performances of Branch-CNN and Yolo-v4 to verify whether Branch-CNN improves the detection performance. The P-R curves for both are shown in Fig. 18, where P and R are defined as in Eq. (6):

P (precision) = TP / (TP + FP)
R (recall) = TP / (TP + FN)
F1 = 2 × P × R / (P + R)

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. The F1-Score balances the precision and recall of the classification model. The integral of the P-R curve represents the AP
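Eq. (6) above is a direct computation from the confusion counts; a minimal sketch (the function name `detection_metrics` is our own, not the paper's):

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false
    negative counts, as in Eq. (6)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    return p, r, f1
```

For example, 90 true positives with 10 false positives and 10 false negatives give P = R = F1 = 0.9, matching the order of magnitude of the reported branch detection results.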

Conclusion

This paper has reported the design and performance of a novel branch recognition and structure reconstruction scheme based on neural networks and image segmentation. The experimental results show that the AP of branch detection is 90.98%, the reconstruction accuracy is 88.76%, and the recall is 92% on simulated fruit trees. Likewise, the AP, recall, and F1-score are 90.7%, 89%, and 90%, respectively, with a reconstruction speed of 22.7 FPS on actual pomegranate trees. All results demonstrate that the proposed

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the Key Research and Development Program of Shaanxi, China (2021SF-342), China Postdoctoral Science Foundation (2018M641013), Postdoctoral Science Foundation of Shaanxi Province, China (2018BSHYDZZ05).

References (25)

  • Afonso, M., et al., 2020. Tomato fruit detection and counting in greenhouses using deep learning. Frontiers in Plant Science.
  • Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M., 2020. Yolov4: Optimal speed and accuracy of object detection. arXiv preprint...