Fast Anther Dehiscence Status Recognition System Established by Deep Learning to Screen Heat Tolerant Cotton

Cotton is one of the most economically important crops in the world. The fertility of the male reproductive organs is a key determinant of cotton yield, and whether the anthers dehisce directly determines the probability of fertilization. Rapid and accurate identification of cotton anther dehiscence status is therefore important for judging anther growth status and promoting genetic breeding research. The development of computer vision technology and the advent of big data have prompted the application of deep learning techniques to agricultural phenotype research. Two deep learning models (Faster R-CNN and YOLOv5) were therefore proposed to detect the number and dehiscence status of anthers.

In addition, the percentage of dehiscent anthers in 30 randomly selected cotton varieties was observed under normal conditions and high temperature (HT) conditions through the ensemble of the Faster R-CNN model and manual observation. The results showed that HT decreased the percentage of dehiscent anthers to varying degrees in different cotton lines, consistent with the manual method.

This is the first time that deep learning technology has been applied to cotton anther dehiscence status recognition in place of the manual method to quickly screen HT tolerant cotton varieties. It can help to explore key genes for genetic improvement in the future and promote cotton breeding and improvement.

The single-stage model based on YOLOv5 has higher recognition efficiency and can be deployed to mobile devices. Breeding researchers can apply this model on terminals to gain a more intuitive understanding of cotton anther dehiscence status.

Cotton is an economically important crop, and its reproductive development is susceptible to a variety of adverse stresses that affect its yield and quality. The reproductive organs of cotton include stamens and pistils, and stamens are more sensitive to heat stress than female organs (Peet et al., 1998). In many summer crops, [...]

[...] to mine the diversity of stem-end meristematic tissues and to find candidate genes that correlate with the transport of phytohormones, cell division, and cell size by GWAS (Yang et al., 2007). In rice, the ratio of spikes to leaves, a new trait, has been extracted using a feature pyramid network mask model that achieved leaf and spike recognition accuracies of 0.98 and 0.99, respectively (Yang et al., 2020).

[...] linear weighting formula for Soft-NMS can be expressed as: [...] Thus, when the NMS algorithm is used to screen the prediction boxes, only the anther detections with the highest confidence are retained. Therefore, we used YOLOv5 with the Soft-NMS algorithm (Bodla et al., 2017) to screen the prediction boxes.

The loss function of the Faster R-CNN object detection network is shown in the formula: [...] In the above formula, i is the anchor index; t is the predicted bounding box; t* is the ground-truth box corresponding to a positive anchor; x, y, w, [...]

[...] are the newly generated sample and its corresponding label, respectively. λ is the mixing coefficient, drawn from a beta distribution with hyperparameters α and β.
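The linear Soft-NMS re-scoring referred to earlier in this section (Bodla et al., 2017) can be illustrated with a minimal sketch. It assumes boxes in (x1, y1, x2, y2) form; the function names, the IoU threshold of 0.3, and the score cut-off are illustrative assumptions, not values taken from this study. Instead of discarding every box that overlaps the current best box, the sketch multiplies its score by (1 - IoU) when the overlap reaches the threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_linear(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Linear Soft-NMS: decay the scores of overlapping boxes instead of removing them.

    iou_thresh and score_thresh are assumed, illustrative defaults.
    Returns a list of (box_index, final_score) in the order the boxes were selected.
    """
    scores = list(scores)               # work on a copy; scores are decayed in place
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        # Select the box with the highest current score.
        m = max(remaining, key=lambda i: scores[i])
        kept.append((m, scores[m]))
        remaining.remove(m)
        survivors = []
        for i in remaining:
            overlap = iou(boxes[m], boxes[i])
            if overlap >= iou_thresh:
                scores[i] *= 1.0 - overlap   # linear weighting
            if scores[i] > score_thresh:     # drop boxes whose score has decayed away
                survivors.append(i)
        remaining = survivors
    return kept


# Hypothetical boxes: the first two overlap heavily, the third is separate.
example_boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
kept = soft_nms_linear(example_boxes, example_scores)
```

With hard NMS the second box would be removed outright; here it is kept with a reduced score, which matters when neighbouring anthers genuinely overlap in the image.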

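As a minimal sketch of the mixup idea, assuming the standard formulation in which a new sample is λ·x_i + (1 - λ)·x_j and its label is λ·y_i + (1 - λ)·y_j, with λ drawn from a Beta(α, β) distribution: the function name, the flat-list representation of images, and the default α = β = 1.5 are illustrative assumptions rather than settings reported here. Python's `random.betavariate` requires α > 0 and β > 0, so the limiting case α = β = 0 is not covered by this sketch.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=1.5, beta=1.5, rng=random):
    """Blend two samples and their labels with a Beta-distributed coefficient.

    x_i, x_j: pixel values as flat lists of equal length (an assumed representation).
    y_i, y_j: label vectors (for example one-hot) of equal length.
    alpha, beta: assumed illustrative hyperparameters; both must be > 0.
    """
    lam = rng.betavariate(alpha, beta)  # mixing coefficient in [0, 1]
    x_new = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_new = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_new, y_new, lam


# Hypothetical two-pixel "images" with one-hot labels.
rng = random.Random(0)
x_new, y_new, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], rng=rng)
```

Every output pixel is a blend of two source pixels, which is why the method uses all pixel information while also introducing pixel values that occur in neither original image.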
In principle, the mixup method generates each new sample and its label as a weighted combination of two training samples and their labels, with weights λ and 1 - λ. According to the study, as the hyperparameters α and β increase, both the training error and the generalization ability of the network increase. When α = β = 0 in the beta distribution of the mixing coefficient λ, the network reverts to the ERM (Empirical Risk Minimization) principle, which minimizes the average error on the training data; [...] the beta distribution of the mixing coefficient λ has the best generalization ability and robustness. This method can make full use of all the pixel information, but at the same time it also introduces some unnecessary pseudo-pixel information.

[...] "all", which were 0.2523, 0.2619, and 0.3104 higher than those of YOLOv5, respectively. This may be due to the interference of location information. Although YOLOv5 has a slightly higher mAP@0.5:0.95, its R² is far lower than that of Faster R-CNN (Table S1). Since [...]

[...] (Table S2).

[...] and "all", respectively. The R² values in the "close" and "all" categories of Faster R-CNN with data augmentation were 0.0028 and 0.0035 higher than those of Faster R-CNN without data augmentation, whereas the R² in the "open" category of Faster R-CNN with data augmentation was 0.0133 lower than that of Faster R-CNN without data augmentation. Overall, the evaluation showed that Faster R-CNN with data augmentation performs better than Faster R-CNN without data augmentation (Table S3).

[...] (Table S4).

[...] (Table 1), and then affected the pollination process, resulting in a reduction in cotton yield. Finally, by observing 30 cotton lines, we found that the anther dehiscence rate of S003 and S004 was still more than 85% under HT stress, significantly higher than that of the other lines (Table 1). In addition, we screened cotton lines for HT tolerance in large quantities through machine recognition and obtained more than 35 HT tolerant cotton lines. These HT tolerant germplasms will be used in cotton HT tolerance breeding.

[...] alone. Therefore, when accurate data are needed, the detection results of the four models can be integrated so that the detection data obtained are the most reliable. Of course, directly using the Faster R-CNN model with the FPN structure, data augmentation and multi-scale gives higher robustness and higher accuracy.
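The text does not state how the detection results of the four models are combined, so the following is only a sketch under one assumed rule: averaging the per-class counts over the models and then computing the percentage of dehiscent anthers. The function names and all counts are hypothetical, and the sketch assumes that the "open" category denotes dehiscent anthers and the "close" category indehiscent ones.

```python
from statistics import mean


def integrate_counts(per_model_counts):
    """Average the 'open' and 'close' counts over several models (assumed rule)."""
    n_open = mean(c["open"] for c in per_model_counts)
    n_close = mean(c["close"] for c in per_model_counts)
    return n_open, n_close


def dehiscence_rate(n_open, n_close):
    """Percentage of dehiscent anthers among all detected anthers."""
    total = n_open + n_close
    return 100.0 * n_open / total if total else float("nan")


# Hypothetical counts for one image from four models (not data from this study).
counts = [
    {"open": 18, "close": 2},
    {"open": 17, "close": 3},
    {"open": 19, "close": 2},
    {"open": 18, "close": 1},
]
n_open, n_close = integrate_counts(counts)
rate = dehiscence_rate(n_open, n_close)
```

Averaging is a simple way to damp the miscounts of a single model; a median or a vote per detection would be equally plausible readings of "integrate".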

It is well known that the anther is the male organ of the plant; anther abortion directly leads to male sterility and reduces yield. From our previous studies, it could be preliminarily concluded that HT stress can reduce cotton yield by inhibiting cotton male fertility.

[...] were annotated one by one. The images were used as the data set and were randomly divided into a training set and a validation set in a ratio of 7:3 (Table S5).
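The random 7:3 division can be sketched as follows. The file names, the function name, and the fixed seed are illustrative assumptions; the seed is there only so that the split is reproducible.

```python
import random


def split_dataset(image_names, train_fraction=0.7, seed=0):
    """Randomly split image names into a training set and a validation set."""
    names = sorted(image_names)           # fixed starting order for reproducibility
    random.Random(seed).shuffle(names)    # shuffle with a local, seeded generator
    n_train = round(train_fraction * len(names))
    return names[:n_train], names[n_train:]


# Hypothetical file names.
names = [f"anther_{i:03d}.jpg" for i in range(10)]
train, val = split_dataset(names)
```

Splitting at the image level keeps all anthers of one image in the same subset, so no image contributes to both training and validation.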

Experimental operation environment
The hardware environment used in this study is listed in Table S6, and on [...]

The data set covers the dehiscence of cotton anthers under high temperature stress, so the data are temporarily unavailable. As the experiment progresses, the data will gradually be made available.

Competing interests
The authors declare that they have no competing interests.

Cotton anther identification effect graph. a: The purple box marks an indehiscent cotton anther, and the pink box marks a dehiscent cotton anther. b: The blue box marks an indehiscent cotton anther, and the gray box marks a dehiscent cotton anther. c: The pink box marks an indehiscent cotton anther, and the green box marks a dehiscent cotton anther. d: The gray box marks an indehiscent cotton anther, and the red box marks a dehiscent cotton anther. In each test, the colors of the prediction boxes with different labels were randomly generated.

[...] is the traditional Faster R-CNN. Model 4 is the Faster R-CNN with Multi-Scale, data augmentation and FPN structure. Epoch: one complete pass of all the data through the network, including forward calculation and back propagation. mAP@0.5:0.95: the IoU threshold is increased from 0.5 to 0.95 in steps of 0.05, and the mAP values obtained at each threshold are averaged.

Image labeling. The above figures are cotton anther images manually marked using the "Labelimg" software. Green boxes represent indehiscent anthers and red boxes represent dehiscent anthers. When the image labeling was finished, the location information of each image was matched one by one with the name of the image and saved in VOC format. a: All the anthers are indehiscent. b: All the anthers are dehiscent. c: Dehiscent anthers account for the majority. d: Indehiscent anthers account for the majority.
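Annotations saved in VOC format are XML files with one `object` element per labelled box. A minimal sketch of reading such a file and counting the labels is given below; the file name, the box coordinates, and the use of "open" and "close" as label names are assumptions made for illustration and may differ from the labels actually used.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def read_voc_annotation(xml_text):
    """Return (filename, [(label, (xmin, ymin, xmax, ymax)), ...]) from VOC XML text."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        label = obj.findtext("name")
        bndbox = obj.find("bndbox")
        box = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((label, box))
    return root.findtext("filename"), objects


# Hypothetical annotation with assumed label names.
example = """<annotation>
  <filename>anther_0001.jpg</filename>
  <object>
    <name>open</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>60</xmax><ymax>70</ymax></bndbox>
  </object>
  <object>
    <name>close</name>
    <bndbox><xmin>80</xmin><ymin>25</ymin><xmax>130</xmax><ymax>75</ymax></bndbox>
  </object>
  <object>
    <name>open</name>
    <bndbox><xmin>40</xmin><ymin>90</ymin><xmax>95</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>"""

filename, objects = read_voc_annotation(example)
label_counts = Counter(label for label, _ in objects)
```

Counting labels per file in this way gives the manual reference counts against which model detections can be compared.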