Superpixel-based segmentation algorithm for mature citrus

With the decline in agricultural labor and the increase in production costs, harvesting robots have become a research hotspot in recent years. To guide harvesting robots to pick mature citrus more precisely under variable illumination conditions, an image segmentation algorithm based on superpixels was proposed. The efficient simple linear iterative clustering (SLIC) algorithm, which takes the similarity of adjacent pixels into account, was adopted to segment images captured under variable illumination conditions into superpixels. The color and texture features of these superpixels were extracted and fused into feature vectors, which served as descriptors to train a backpropagation neural network (BPNN) classifier. When extracting texture features, the adjacency information of superpixels was considered by calculating the global local binary pattern (LBP) on R component images. To accelerate the classification process, the mean of the Cr-Cb image was used to find superpixels of interest, which were regarded as candidate citrus superpixels. These candidates were then classified by a pre-trained BPNN model with a superpixel-level accuracy of 98.77% and a pixel-level accuracy of 94.96%, while the average time to segment one image was 0.4778 s. The results indicated that the superpixel-based segmentation algorithm for citrus images was robust to lighting and accurate enough to guide a harvesting robot to pick mature citrus efficiently.


Introduction
Harvesting is one of the most time-consuming and labor-intensive parts of the fruit production chain [1]. The citrus harvesting process accounts for 35%-45% of the total production cost [2]. To make matters worse, growing urbanization, the aging of the population and rising education levels have led to a labor shortage and increasing labor costs, which present severe problems that may affect citrus production [3]. Moreover, manual harvesting poses a high risk of back strain and musculoskeletal problems to citrus pickers [4].
To improve the situation, using harvesting robots to pick citrus is an inevitable trend [5]. As with other fruit or vegetable harvesting systems, some unavoidable issues must be addressed: citrus detection and localization remain challenging tasks because of the complex natural environment [6][7], especially the variable lighting conditions caused by changes in the weather and in the relative position of the harvesting system and the light source.
Citrus fruit detection and localization have been studied extensively.
Chromatic information is one of the available cues for differentiating fruits from the background. Zhang et al. [8] proposed a rule for segmenting citrus from the background based on the difference between the R and B components. Cai et al. [9] distinguished targets from the background effectively by applying Otsu's method to 2R-G-B images. Li et al. [10] first recognized citrus using the hue component and then separated overlapping targets using the convex hull and the distance transform.
Although these studies demonstrated the feasibility of segmentation methods based on color difference information, a limitation remained. To guarantee the performance of these methods, the intensity difference between channel components in fruit regions ought to be greater than that in the background [11], which meant that these methods would give acceptable results only under specific lighting conditions. In the citrus growing environment, however, the changes in illumination are much greater than the chromatic difference between the fruits and the leaves [12]. To make the problem worse, some parts of the citrus may be saturated when lit by the sun under a clear sky.
To overcome the difficulties caused by varying illumination conditions, partial occlusion and clustering of citrus, color variation of citrus at different stages of maturity, and the inevitable movement of the camera and the light source during harvesting, Lyu et al. [13] used a multiclass support vector machine (SVM) followed by a morphological operation to simultaneously segment the fruits and branches in a natural environment. Lin et al. [14] excluded backgrounds using a depth filter and a Bayes classifier, then used a density clustering method to group adjacent points into clusters that might be citrus, and removed false positives with an SVM classifier based on color, gradient and geometry features, achieving an F1 score of 0.9197. Liu et al. [15] proposed a method that constructed a multi-elliptical boundary model in Cr-Cb coordinates to detect citrus fruit and tree trunks in natural light environments, with correct and false positive percentages of 90.8% and 11.2%, respectively. Zhuang et al. [16] generated an adaptive enhanced red and green chromatic map from an illumination-compensated image, then located potential citrus regions with segmentation and preprocessing methods such as the marker-controlled watershed transform and the convex hull operation. LBP was used as a texture feature and fed to a histogram intersection kernel-based SVM to classify the citrus. Lu and Sang [12] segmented image contours into smooth fragments, selected valid fragments based on their lengths, bending degrees and concavities, and then applied circle fitting to combine neighboring fragments into single citrus fruits.
Segmentation and localization of immature citrus in the natural environment have also been hotspots in recent years. Bansal et al. [17] proposed a green citrus detection method based on fast Fourier transform leakage, but faulty results appeared when the image included sky background or another symmetric area. Sengupta [18] applied a circular Hough transform (CHT) to detect possible citrus, and an SVM classifier trained on texture features was used to exclude false positives. Wang et al. [19] used the discrete wavelet transform, K-means clustering and CHT to detect citrus, with a detection accuracy, false positive rate and miss rate of 85.6%, 11.8% and 14.4%, respectively. Lu et al. [20] detected local intensity maxima in the G channel of images taken with a flashlight in low natural light, and the LBP features around them were extracted as the input of an ensemble classifier, random under-sampling with AdaBoost (RUSBoost), to get positive predictions. The hierarchical contour maps around the positive predictions, which were considered candidates, were extracted and fitted with CHT, and a candidate was accepted as a final target if its radius was in a predetermined range.
In the above-mentioned studies, the problem of local reflection was not handled effectively, and pixels were processed directly, so the potential relationship between adjacent pixels was not fully utilized.
Superpixel segmentation takes the adjacency of pixels into account by dividing an image into a certain number of subregions with semantic meaning. Compared with the pixel, the basic unit of traditional processing methods, the superpixel is more conducive to the extraction of local features and the expression of structural information, and thus greatly reduces the computational complexity of subsequent processing.
Superpixels have been widely used in the field of computer vision with remarkable performance [21][22][23], and their application in the agricultural field has also shown notable potential in unstructured environments [24,25]. To improve the robustness of image segmentation under variable illumination, and especially to avoid local holes and false positives, a citrus segmentation algorithm based on superpixels was proposed. The algorithm in this paper had four key steps: superpixel segmentation, feature extraction, superpixel screening and classification.

Image acquisition
Image acquisition is the first step, and an important one, because the quality of the images affects subsequent image processing. The images were obtained in a greenhouse at Shanghai Jiao Tong University, Shanghai, China during the citrus harvest season. A Canon 700D camera with a color complementary metal-oxide-semiconductor (CMOS) sensor was used to take the images. The trees were randomly selected in the greenhouse and the scene images were taken under natural daylight conditions. Both front-light and back-light conditions were considered. Python 3.6 and the Open Source Computer Vision Library (OpenCV 3.1.0, Intel Corporation) were used to implement the proposed detection algorithm on a laptop with an Intel(R) Core(TM) i5-8265 CPU @ 1.60 GHz, 1.80 GHz and 8 GB RAM.

Superpixel segmentation
SLIC was selected as the superpixel segmentation algorithm because it was reported to outperform existing superpixel methods in nearly every respect [26]. The SLIC algorithm builds N 5-dimensional feature vectors (N is the total number of image pixels), each composed of the three color components of a pixel in the Lab color space and the 2-dimensional position coordinates of that pixel in the rectangular coordinate system, and then constructs a distance metric on the 5-dimensional vectors to cluster pixels into superpixels iteratively.
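The distance metric on the 5-dimensional vectors can be sketched as follows. This is a minimal illustration of the commonly published SLIC distance, which combines a Lab color distance with a spatial distance normalized by the grid step S = sqrt(N/K). The function names and the compactness value used in the example are assumptions, since the paper does not state them.

```python
import math

def grid_step(n_pixels, k):
    """Approximate spacing between initial cluster centers, S = sqrt(N / K)."""
    return math.sqrt(n_pixels / k)

def slic_distance(pixel, center, n_pixels, k, compactness):
    """Distance between a pixel and a cluster center, each given as (L, a, b, x, y).

    compactness weights spatial proximity against color similarity;
    its value here is a free parameter, not one taken from the paper.
    """
    l1, a1, b1, x1, y1 = pixel
    l2, a2, b2, x2, y2 = center
    d_color = math.sqrt((l1 - l2) ** 2 + (a1 - a2) ** 2 + (b1 - b2) ** 2)
    d_space = math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2)
    s = grid_step(n_pixels, k)
    return math.sqrt(d_color ** 2 + (d_space / s) ** 2 * compactness ** 2)

# For a 180x320 image and K = 400, the grid step is 12 pixels.
step = grid_step(180 * 320, 400)
```

A larger compactness value favors spatially regular superpixels, while a smaller one lets superpixels follow color boundaries more closely.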
To obtain good segmentation performance while limiting the running time of the algorithm, the number of superpixels, defined as K, and the compactness of the superpixels had to be determined, especially K, which is directly related to the running time of the algorithm. Several indexes are available to evaluate the performance of superpixel segmentation, such as Boundary Recall (BR), Undersegmentation Error (UE) and Achievable Segmentation Accuracy (ASA) [27].
UE measures the fraction of pixels that leak across ground-truth boundaries [27]. In the definition of UE, g = {G_1, G_2, …, G_{n_g}} represents a ground truth segmentation with n_g segments, |G_i| denotes the size of segment G_i, and S_k denotes one of the superpixels. UE evaluates the quality of segmentation based on the requirement that a superpixel ought to overlap with only one object. The purpose of this work was to segment the citrus from the background, so UE was chosen as the criterion for selecting K. Fifty images with a resolution of 1080×1920 were randomly selected and precisely labeled. To speed up the algorithm and to allow for manual marking errors in the ground truth labels, the images and their labeled ground truth images were down-sampled to 180×320 when determining K. The overlap threshold between a superpixel and a labeled area was set to roughly 5% to determine whether the superpixel should be taken into account when computing UE.
The smaller the UE, the better the performance. The mean and standard deviation of UE measured the accuracy and stability of the superpixel segmentation process. After the 50 images were segmented into superpixels with different values of K, the mean and standard deviation of UE were calculated, with the results illustrated in Figure 1. When K was 400, the mean of UE was at a local minimum, while the standard deviation of UE no longer fell at a noticeable rate. Besides, if K was too large, the superpixel segmentation tended to lose its meaning and the computational burden of feature extraction would increase. Therefore, K was set to 400. The result of segmentation by the SLIC algorithm is displayed in Figure 2, in which almost every superpixel consisted of one object only.
The images were segmented by the SLIC algorithm into about 400 superpixels. These superpixels were then classified, and those belonging to citrus were picked out in the next step. Hence, each superpixel was identified as an object or a region corresponding to a certain part of the foreground or background. However, it was very difficult to describe superpixels directly because of their irregular shapes, so color features or texture features had to be extracted to characterize them.
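The UE equation itself is not reproduced in this text, so the sketch below is only one plausible reading. It assumes the form in which, for every ground truth segment G_i, the pixels of each overlapping superpixel S_k that fall outside G_i are counted as leakage, and the total is divided by the sum of |G_i|. The 5% overlap threshold is applied relative to the superpixel size, which is also an assumption, as is the function name.

```python
import numpy as np

def undersegmentation_error(superpixels, ground_truth, min_overlap=0.05):
    """Fraction of pixels leaking across ground truth boundaries.

    superpixels, ground_truth: integer label maps of the same shape.
    A superpixel contributes to a segment only if more than
    min_overlap of its own pixels lie inside that segment (assumed rule).
    """
    sp = np.asarray(superpixels).ravel()
    gt = np.asarray(ground_truth).ravel()
    leaked = 0
    for g in np.unique(gt):
        in_g = gt == g
        for s in np.unique(sp[in_g]):
            in_s = sp == s
            overlap = np.count_nonzero(in_s & in_g)
            size = np.count_nonzero(in_s)
            if overlap > min_overlap * size:
                leaked += size - overlap  # pixels of S_k outside G_i
    return leaked / gt.size  # sum of |G_i| equals the number of pixels

# A superpixel map that matches the ground truth exactly gives UE = 0.
ue_perfect = undersegmentation_error([[0, 0, 1, 1]], [[0, 0, 1, 1]])
```

Evaluating this function over a set of labeled images for several values of K, and then taking the mean and standard deviation, would give curves of the kind the selection of K relies on.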

Feature extraction and classification
In the natural environment, it was difficult to separate citrus from captured images, since the success rate of segmentation was affected not only by background objects such as sky, seeds and leaves, whose front and back sides had different appearances, but also by the illumination conditions.
Under variable light conditions, the obtained citrus images can be divided into three situations: front-lighting, back-lighting and overshadow [28]. For instance, under front-lighting there would be strong reflected light or shadow areas on the surface of the citrus, so concave edges or holes might occur after segmentation. Additionally, clustering and different levels of maturity also had to be taken into account. To consider these influencing factors comprehensively, both color features and texture features were used to describe superpixels.

Color feature
Despite variable natural illumination, color remained a significant and distinct feature of images that carried abundant valuable information, even though the color of different parts of the same object was not identical. There were mainly four kinds of objects in the captured images: citrus, leaves, branches and other objects, as shown in Figure 3.
Figure 3 Objects and color feature of citrus
Many researchers have studied citrus image segmentation based on color spaces, among which segmentation based on the RGB [8][9], HSV [10] and YCrCb [15] color spaces showed acceptable performance. These studies, nevertheless, were based on pixel-level segmentation. To find color features that could discriminate between superpixels, 185 samples of the four kinds of objects discussed above were cut from 50 randomly selected images. The samples were deemed to have color features similar to those of superpixels, and their mean values were calculated for the hue component, the R component, the 2R-G-B image and the Cr-Cb color difference image.
As shown in Figure 4a, the color feature acquired from the R component performed poorly, since it was unable to separate citrus from the other objects, although the texture feature acquired from the R component, introduced hereinafter, performed well. Figure 4b demonstrated that the color feature acquired from the 2R-G-B image failed to live up to expectations even though it is representative of pixel-level segmentation methods. Figure 4c illustrated that the H component was not the desired choice, since the mean values of citrus and branch were similar, whereas these values ought to be as different as possible to avoid damage to the harvesting robot. The feature extracted from the Cr-Cb image outperformed the others: the interval of mean values of citrus differed from that of the other three objects, as shown in Figure 4d.
One significant conclusion could also be drawn: color features alone were not able to guarantee the integrity of segmentation, and even a combination of several color spaces was not deemed reliable enough under variable illumination conditions.
In this paper, the RGB and YCrCb color spaces were nevertheless used to extract color features: the means and standard deviations of R, G, B, Y, Cr and Cb were extracted to form 12-dimensional color feature vectors of superpixels. To obtain better performance, texture features were also extracted, as described in section 3.2.
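The 12-dimensional color descriptor can be sketched as below, assuming a superpixel label map is already available. The YCrCb conversion uses the usual 8-bit formulas (Y = 0.299R + 0.587G + 0.114B, Cr = 0.713(R - Y) + 128, Cb = 0.564(B - Y) + 128). The function names and the ordering of the 12 values (six means followed by six standard deviations) are assumptions, not details given in the paper.

```python
import numpy as np

def rgb_to_ycrcb(rgb):
    """Convert an H x W x 3 RGB image to Y, Cr, Cb planes (8-bit convention)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128.0
    cb = (b - y) * 0.564 + 128.0
    return np.stack([y, cr, cb], axis=-1)

def color_features(rgb, labels):
    """Return {superpixel id: 12-D vector} of channel means and standard deviations."""
    # Stack R, G, B, Y, Cr, Cb into one H x W x 6 array.
    channels = np.concatenate([rgb.astype(np.float64), rgb_to_ycrcb(rgb)], axis=-1)
    features = {}
    for s in np.unique(labels):
        pixels = channels[labels == s]  # shape (n_pixels, 6)
        features[int(s)] = np.concatenate([pixels.mean(axis=0), pixels.std(axis=0)])
    return features

# Example: a uniform orange patch treated as a single superpixel.
patch = np.full((2, 2, 3), (255, 128, 0), dtype=np.uint8)
vec = color_features(patch, np.zeros((2, 2), dtype=int))[0]
```

For an orange patch like the one in the example, the mean Cr is well above the mean Cb, which is the property the Cr-Cb image exploits.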

Texture feature
As mentioned above, the relationship between adjacent pixels was considered when the images were segmented into superpixels. Since the targets in images were composed of superpixels, the adjacency information between superpixels was also considered when extracting texture features. Normally, statistical characterization would be used to extract the texture features of superpixels because of their irregular shapes. LBP, however, had shown its superiority in texture extraction [16][17][18][19][20], as discussed in section 1. In this paper, the global LBP was therefore computed instead of calculating statistical gray features of superpixels directly, so that the differences at the edges between superpixels were taken into account and the targets could be segmented more precisely. The texture features of superpixels were extracted as LBP statistical histograms. To reduce the amount of computation, the number of LBP levels was reduced from 256 to 8.
To evaluate whether the texture features extracted by the presented method could improve the classification accuracy of superpixels, they were used to train a BPNN. The topological structure of the BPNN used to assess the method was 8×6×1, which was determined by Equation (2).
Figure 4 Mean value of different color spaces
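A minimal sketch of this texture descriptor follows. It computes a basic 8-neighbor LBP code over the whole R component image first, so that pixels on a superpixel border are compared with neighbors in adjacent superpixels, and only then builds one histogram per superpixel. The paper does not say how the 256 LBP values are mapped to 8 levels, so uniform binning (integer division by 32), edge padding at the image border, histogram normalization and the function names are all assumptions.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP code (0..255) for every pixel of a 2-D grayscale image."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")  # assumed border handling
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((h, w), dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        codes |= (neighbor >= gray).astype(np.int32) << bit
    return codes

def texture_features(r_channel, labels, bins=8):
    """Return {superpixel id: normalized 8-bin LBP histogram}."""
    quantized = lbp_image(r_channel) // (256 // bins)  # 256 levels -> 8 levels
    features = {}
    for s in np.unique(labels):
        hist = np.bincount(quantized[labels == s], minlength=bins).astype(np.float64)
        features[int(s)] = hist / hist.sum()
    return features

# Example: a flat region puts every pixel in the highest LBP bin.
flat = np.full((4, 4), 100, dtype=np.uint8)
hist = texture_features(flat, np.zeros((4, 4), dtype=int))[0]
```

Concatenating such an 8-bin histogram with a 12-dimensional color vector would give a 20-dimensional descriptor per superpixel.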

Superpixel screening
It was difficult, however, to achieve real-time feature extraction and classification under limited computing power, since SLIC is essentially a clustering algorithm that takes time to iterate and, moreover, repeated computation was inevitable in the process of feature extraction. To accelerate the algorithm and approach real-time detection as closely as possible, superpixels of interest (SOI), which might belong to citrus, were selected before superpixel features were extracted.
To exclude non-SOI (superpixels that were not segmented from citrus), 185 parts were randomly cut out from citrus, leaf, branch and other background objects, respectively, and the means of their color difference (Cr-Cb) images were computed to set a threshold for screening out non-SOI quickly, as shown in Figure 4d. Through analysis, the minimum and maximum mean values of Cr-Cb of the citrus samples were found to be 39.89 and 170.41, respectively. To avoid mistakenly excluding SOI, the lower bound was set to 35 and the upper bound was set to 175.
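The screening step can be sketched as a simple range test on the mean color difference of each superpixel. The sketch assumes the difference is taken as Cr minus Cb (the sign convention is an assumption), uses the bounds 35 and 175 quoted above, and uses a hypothetical function name.

```python
import numpy as np

def select_soi(cr, cb, labels, lower=35.0, upper=175.0):
    """Return ids of superpixels whose mean Cr-Cb lies within [lower, upper]."""
    # Cast before subtracting so that uint8 inputs do not wrap around.
    diff = cr.astype(np.float64) - cb.astype(np.float64)
    soi = []
    for s in np.unique(labels):
        mean_diff = diff[labels == s].mean()
        if lower <= mean_diff <= upper:
            soi.append(int(s))
    return soi

# Example: superpixel 0 is orange-like (Cr well above Cb), superpixel 1 is not.
labels = np.array([[0, 0], [1, 1]])
cr = np.array([[200, 200], [120, 120]], dtype=np.uint8)
cb = np.array([[100, 100], [130, 130]], dtype=np.uint8)
candidates = select_soi(cr, cb, labels)
```

Only the superpixels returned by such a test would need feature extraction and classification, which is where the time saving comes from.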

Neural network training and classification
The color and texture features were fused to form 20-dimensional superpixel feature vectors, which were normalized to give the final feature description of superpixels, so the input layer of the BPNN had 20 inputs. The size of the output was 1, which indicated whether a superpixel belonged to citrus. The size of the hidden layer was defined by an empirical formula [29], given as Equation (2), where s is the size of the hidden layer, m is the size of the input layer and n is the size of the output layer; s is rounded to an integer. Equation (2) was determined by surface fitting based on a large number of instances. The empirical formula has been verified and applied in other studies [30,31]. With m = 20 and n = 1, s turned out to be 8, so the topological structure was determined as 20×8×1.
To train the BPNN classifier, 200 images captured under different illumination conditions were randomly selected and segmented, and superpixel feature vector samples were extracted from them, of which 30% were set aside as a test set.
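The body of Equation (2) is not reproduced in this text. As an illustration only, the sketch below assumes one widely quoted empirical formula, s = sqrt(0.43mn + 0.12n^2 + 2.54m + 0.77n + 0.35) + 0.51. This choice is an assumption, made because it reproduces both hidden-layer sizes quoted in the paper (6 for the 8-input texture network and 8 for the 20-input network). The function name is hypothetical.

```python
import math

def hidden_layer_size(m, n):
    """Hidden layer size from an assumed empirical formula, rounded to an integer.

    m: number of input nodes, n: number of output nodes.
    """
    s = math.sqrt(0.43 * m * n + 0.12 * n * n + 2.54 * m + 0.77 * n + 0.35) + 0.51
    return int(round(s))

texture_only = hidden_layer_size(8, 1)   # network trained on texture features only
fused = hidden_layer_size(20, 1)         # network trained on fused color and texture features
```

Under this assumed formula, the unrounded values are about 5.51 and 8.30 for the two networks.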

Experiment and discussion
A total of 2435 test samples were used to evaluate the BPNN classifier, and the confidence of each superpixel was obtained. If the confidence was less than 0.5, the superpixel was identified as background; otherwise it was identified as citrus. The ratio of positive to negative samples in this paper was close to 1, so accuracy was used as the evaluation criterion of classification performance, defined as:
$$\mathrm{accuracy} = \frac{n_{\mathrm{correct}}}{n_{\mathrm{total}}}$$
where n_correct is the number of correctly classified samples and n_total is the total number of samples. The statistical results displayed in Table 2 showed that the classification accuracy of the citrus superpixels was 98.95%, the classification accuracy of the background superpixels was 98.56%, and the accuracy of all superpixels was 98.77%.
Segmentation results of the algorithm proposed in this paper are shown in Figure 5, compared with the results obtained from R-B images and from a pixel-level BPNN classifier. More specifically, Otsu's method was used to segment the R-B images, while the features used to train the pixel-level BPNN classifier were the R, G, B, L, a and b components and the LBP value at each position. Before features were extracted to train the pixel-level BPNN classifier, the images were blurred by a 3×3 mean filter to overcome local distortions. It could be concluded from Figure 5 that the proposed superpixel-based algorithm outperformed the other two methods.
The superpixel-based algorithm was able to avoid background distractions and was robust under different environmental conditions, and the problem of local reflection or saturation was handled well.
Since the purpose of this paper was to segment the citrus from images as precisely as possible, the most direct evaluation of the method was whether all citrus pixels were classified accurately, rather than the classification accuracy of superpixels.
Hence, 50 precisely labeled citrus images were used to assess the performance of the algorithm at the pixel level, and the result showed that 95.18% of citrus pixels were classified accurately.
Besides, 200 images were processed by the algorithm before and after acceleration. The running time is displayed in Figure 6, while the pixel-level accuracy before and after screening out non-SOI is shown in Table 3. It could be concluded that the pixel-level accuracy varied within an acceptable range, while the average running time for processing one image was reduced from 0.9830 s to 0.4778 s, which indicated that the algorithm was capable of real-time segmentation.
Figure 6 Running time before and after acceleration
However, deficiencies did exist. As shown in Figure 5(d3), some of the citrus fruits were hidden by leaves or other background objects and were split into discrete parts whose relationships were difficult to determine, so locating them was not an easy task. Besides, the main purpose of this paper was to segment citrus from the background, and the classification of different obstacles was not discussed. Improving the pixel-level accuracy also seemed difficult, since it depended not only on the size of the training sample but also on the accuracy of superpixel segmentation.
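The decision rule and the accuracy measure described above can be sketched in a few lines. The function names and the toy confidence values are hypothetical. The rule follows the text: a confidence below 0.5 means background, and anything else means citrus.

```python
import numpy as np

def classify(confidences, threshold=0.5):
    """Label a superpixel 1 (citrus) if its confidence is at least the threshold, else 0."""
    return (np.asarray(confidences) >= threshold).astype(int)

def accuracy(predicted, actual):
    """Number of correctly classified samples divided by the total number of samples."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return np.count_nonzero(predicted == actual) / actual.size

# Toy example with four superpixels; the last one is a missed citrus superpixel.
predicted = classify([0.9, 0.2, 0.5, 0.4])
score = accuracy(predicted, [1, 0, 1, 1])
```

Plain accuracy is a reasonable summary here only because the two classes are roughly balanced, as the text notes.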
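One way to read this pixel-level measure is as the fraction of labeled citrus pixels that fall inside superpixels classified as citrus. The sketch below follows that reading, which is an assumption about how the figure was computed, and the function name is hypothetical.

```python
import numpy as np

def citrus_pixel_accuracy(labels, citrus_superpixels, ground_truth_mask):
    """Fraction of ground truth citrus pixels covered by superpixels predicted as citrus."""
    predicted_mask = np.isin(labels, list(citrus_superpixels))
    truth = np.asarray(ground_truth_mask, dtype=bool)
    return np.count_nonzero(predicted_mask & truth) / np.count_nonzero(truth)

# Toy example: superpixel 0 is predicted as citrus, but one citrus pixel lies in superpixel 1.
labels = np.array([[0, 0, 1, 1]])
truth = np.array([[1, 1, 1, 0]])
pixel_acc = citrus_pixel_accuracy(labels, {0}, truth)
```

The toy example shows why this measure depends on superpixel quality: a superpixel that straddles the fruit boundary costs pixels even when every superpixel is classified correctly.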

Conclusions
A superpixel-based algorithm for citrus image segmentation in the natural environment was proposed in this paper. Citrus images were first segmented into superpixels, from which color features and texture features were extracted. Color features consisted of the means and standard deviations of the components in selected color spaces, while the texture features of superpixels were extracted by calculating the global LBP on R component images and taking its statistical histogram. The results indicated that this algorithm was able to segment citrus images under variable illumination conditions, and in particular that it handled the problem of local saturation well.
The concept of SOI, meaning superpixels that might belong to the target, was first proposed in this paper. To accelerate the classification process, superpixels that did not belong to citrus were screened out to obtain the SOI, which reduced the average running time for processing one image from 0.9830 s to 0.4778 s. Additionally, the superpixel-level accuracy was 98.77% and the pixel-level accuracy was 94.96%, which indicated that the superpixel-based segmentation algorithm was robust and precise enough to guide a harvesting robot to pick citrus in real time under variable natural light.