Computer Vision Based Food Grain Classification: A Comprehensive Survey

This manuscript presents a comprehensive survey of recent computer vision based food grain classification techniques. It covers state-of-the-art approaches intended for different grain varieties. The approaches proposed in the literature are analyzed according to the processing stages considered in the classification pipeline, making it easier to identify common techniques and draw comparisons. Additionally, the type of images considered by each approach (i.e., images from the visible, infrared, multispectral, or hyperspectral bands) together with the strategy used to generate ground truth data (i.e., real or synthetic images) are reviewed. Finally, conclusions highlighting future needs and challenges are presented.


Introduction
With continued population growth, the food industry needs to keep increasing production and improving the quality of its products. Cereals are directly or indirectly related to this increase in food production, as they are at the base of the food industry pyramid, both for human and animal consumption. According to the Food and Agriculture Organization's latest report (http://www.fao.org/worldfoodsituation/csdb/en/), world cereal production in 2020 reached 2765 million tonnes, 2% higher than in 2019. The report shows that production has grown at the same rate during the last decade and is expected to keep doing so in the near future. Therefore, since it is difficult to increase the arable land, improvements are required in other processes of the production chain to increase productivity.
One of these improvements is related to the automation of food grain classification, where a great effort has been devoted in recent years to proposing new approaches that perform the classification in an automatic way. It should be noticed that the food grain classification problem requires specific features according to the type of variety or problem. Some classes (seeds) have a large inter-class variability, making the solution easier, while others show a very tiny inter-class variability (e.g., classifying between a good grain and an infected one). Actually, some of these challenging problems (small inter-class variability) require the usage of multispectral or hyperspectral technology. The contributions of this survey are as follows. Firstly, it presents a general pipeline that is used to analyze the different stages generally involved in the classification process, providing discussions for each one of them. Due to the lack of common benchmarks for validation and the complexity of reproducing different approaches, quantitative comparisons become difficult. Therefore, the survey presents an analysis of the most important proposals for each stage and provides quantitative evaluations when possible. Finally, general conclusions are given, pointing out the current limitations and future trends from a more general viewpoint.

Literature Review
This section presents a deep review of works related to the different stages involved in the food grain classification problem. The reviewed works are grouped following different criteria, starting with the main modules generally used in grain classification pipelines. In addition to the visible spectrum images commonly used by food grain classification systems, there are several works devoted to processing images from other spectral bands, for instance, the infrared spectrum.
Furthermore, some approaches exploit other types of images (e.g., multispectral or hyperspectral), covering not only the visible spectrum but also the ultraviolet or infrared spectral bands. These kinds of images provide information useful for the classification process that is not available in the classical single-band domain. This section reviews state-of-the-art approaches on visible and infrared spectral bands together with multispectral and hyperspectral based approaches, highlighting their advantages and limitations.

Visible and infrared spectral bands
Most of the approaches in the literature are based on the use of a single spectral band. In general, visible spectrum cameras are considered due to the low cost and availability of such devices. In addition to the visible spectrum, there are a few approaches relying on near infrared imagery, which makes it possible to better discriminate different objects in the given scene. In spite of that, the visible spectrum is more widely used; for instance, [4] proposes a robust, low-cost approach (see Table 1 for a summary of representative works, e.g., Huang (2016) [33], hyperspectral maize seed variety classification with LS-SVM, or Liu (2016) [34], multispectral online variety discrimination of rice seeds with SVM). Also working in the visible and near infrared spectral bands, Liu et al. [45] propose a method to determine the purity of non-transgenic rice seed varieties from their transgenic counterparts. The approach is based on multispectral image analysis combined with the study of chemometric data. For the experiments, 200 samples each of the transgenic and non-transgenic rice seeds were used, covering the visible and near infrared spectra in the range of 405-970 nm.
The use of multispectral images combined with chemometrics has shown the best classification results. In Liu et al. (2016) [34], the authors propose another rice seed classification method based on the usage of multispectral images. It exploits not only the joint spectral-spatial information but also combines the different grains' spectral and spatial relationships. On the contrary to the previous approaches, in Chu et al. [9] an infrared hyperspectral (900-1700 nm) approach has been proposed.
The authors tackle the classification of healthy corn kernels from fungus-infected ones, across three hybrid classes: dented, waxy, and semi-flint endosperms. Also working in the infrared spectral band, Berman et al. [56] present an infrared hyperspectral approach to perform the classification of individual sound and stained wheat grains belonging to 24 different Australian varieties. The image data were normalized based on their means, using only the spectral shape.
The experiments were carried out with image samples over the 420-2500 nm, 420-1000 nm, and 420-700 nm wavelength ranges. Also in the infrared hyperspectral domain, Sendin et al. [2] propose an approach to classify whole white corn kernels. The method performs a 13-class division of kernels and undesirable materials using hyperspectral imaging from 1118 to 2425 nm, with a 6.3 nm spectral resolution over 209 spectral points. Hyperspectral imaging is also exploited by Chu et al. [9], where an approach to classify infected corn seeds is proposed.
It uses infrared hyperspectral images in the range of 900 to 1700 nm. Although most of the approaches proposed in the literature for the seed grain classification problem work on the visible spectrum (see Table 1), there is an increasing number of approaches that rely on information from other spectral bands or other types of images (e.g., multispectral, hyperspectral), mainly due to the additional discriminative information they provide. As a general conclusion, multispectral and hyperspectral technologies offer many possibilities that still need to be explored. The main drawback that can be observed is the lack of well-documented and publicly available datasets for reference. In most of the approaches presented in the literature, researchers acquire their own dataset and make their contributions, which makes it difficult to compare the different techniques. Hopefully, common benchmarks will soon be available to be used as references by the community.

PREPROCESSING
The reviewed approaches, in general, apply some kind of preprocessing to the given raw images, either to put them all in the same format (e.g., cropping, scaling, color space mapping) or to facilitate the segmentation and classification process by enhancing the given images (e.g., noise filtering, contrast and sharpness enhancement) [16]. Hence, this section reviews the most relevant preprocessing approaches. In case the raw data correspond to a high resolution image, some cropping or scaling is needed. The cropping technique consists of splitting up the given image into regions of small size (patches) to obtain a representation that is easier to process [13]. This splitting process allows one to discard unwanted parts of the images and to focus just on the object of interest (e.g., [46], [20], [28], [42]).
Image cropping is also used by some authors, after segmenting the given image, to focus the classification process just on a region of interest that contains a single instance (e.g., [13], [15]). Image scaling is also a very common process; it resizes the given images so they are all represented at the same size. It involves a trade-off between efficiency, smoothness, and sharpness [46]. In Huang Nen-Fu et al. [18], for instance, the authors resize the images to fit the model requirements, setting the width and height to 180 pixels each. In [15] the authors also resize their inputs to 1024×1024, which is the default setting for Mask R-CNN. In [10], [28], and [14] image scaling is also considered in the preprocessing stage as an important operation in the pipeline.
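As an illustration of the cropping step described above, the splitting of a large image into fixed-size patches can be sketched as follows (a minimal NumPy sketch; the `extract_patches` helper and the patch size are illustrative, not taken from any cited work):

```python
import numpy as np

def extract_patches(img, patch=64):
    """Split an image into non-overlapping square patches,
    discarding incomplete regions at the borders."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            patches.append(img[y:y + patch, x:x + patch])
    return patches

img = np.zeros((180, 200), dtype=np.uint8)   # synthetic grayscale image
patches = extract_patches(img, patch=64)
print(len(patches), patches[0].shape)        # 6 patches of 64x64 pixels
```

Real pipelines typically crop around detected objects of interest rather than on a fixed grid, but the tiling idea is the same.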

Image enhancement
Image enhancement steps are focused on improving the image's quality from the human perception point of view; some examples are removing blurring or noise, or increasing the content's contrast [46]. In [15] an image enhancement is performed as a preprocessing step, applying the contrast-limited adaptive histogram equalization (CLAHE) technique. In the case of noise filtering, the most common approach is to apply a Gaussian filter. For instance, García et al. [16] deal with this problem by using 2D Gaussian smoothing filters. In [10], [43], and [49] Gaussian filters are also applied due to the type of noise they have to tackle. On the contrary, in those cases where the noise is produced by low lighting conditions, other filters are considered. In other words, depending on the type of noise, different filters should be applied; for instance, in Gujjar et al. [46] a special median filter is used to remove noise and smooth the given image. In [44] a median filter is also applied because it preserves the edges during the noise removal process; [47] and [39] follow the same approach. In most cases, a previous grayscale conversion is considered before the filtering operations. Both [27] and [10] apply a median filter while using Sobel edge detection to preserve edges during the noise removal process.
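The edge-preserving behavior of the median filter mentioned above can be illustrated with a naive NumPy version (a sketch for illustration only, not the implementation used in the cited works):

```python
import numpy as np

def median_filter(img, k=3):
    """Naive k x k median filter: replaces each pixel by the median of its
    neighbourhood, removing salt-and-pepper noise while keeping edges."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = np.median(padded[y:y + k, x:x + k])
    return out

img = np.full((20, 20), 100, dtype=np.uint8)
img[5, 5] = 255                 # isolated salt-noise pixel
clean = median_filter(img)
print(clean[5, 5])              # noise replaced by the neighbourhood median: 100
```

A Gaussian filter would instead spread the 255 spike over its neighbours, which is why the median variant is preferred when edges must survive.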

Morphological operations
Morphological operations, such as classical erosion or dilation, have also been used as preprocessing to tackle some specific tasks. For instance, in [43] and [52], the authors use erosion to eliminate shadows of grains, followed by dilation to enhance the image after the erosion and improve boundary sharpness.

Other solutions, for instance [28], use morphological operations during the preprocessing stage to remove white spot noise in the background. Although results are improved after using morphological operators, the main drawback lies in their high computational cost.
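The removal of white spot noise via erosion followed by dilation (a morphological opening) can be sketched as follows; this is a simplified NumPy illustration with a 3×3 structuring element, not the exact operators used in [28]:

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 structuring element."""
    h, w = mask.shape
    p = np.pad(mask, 1)
    out = np.ones_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out &= p[dy:dy + h, dx:dx + w]
    return out

def dilate(mask):
    """Binary dilation with a 3x3 structuring element."""
    h, w = mask.shape
    p = np.pad(mask, 1)
    out = np.zeros_like(mask)
    for dy in (0, 1, 2):
        for dx in (0, 1, 2):
            out |= p[dy:dy + h, dx:dx + w]
    return out

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 2:8] = 1          # a grain blob
mask[0, 9] = 1              # isolated white-spot noise
opened = dilate(erode(mask))
print(opened[0, 9], opened.sum())   # noise removed, blob restored: 0 36
```

The erosion deletes any structure smaller than the structuring element (including the isolated pixel), and the subsequent dilation restores the surviving blob to its original extent.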

Color conversion
Color space conversion is generally used to produce robust solutions or to highlight some specific features of the given image. There are different color spaces (e.g., RGB, CIELAB, CIEXYZ, CMYK), RGB being the one generally used in the grain classification problem. The capability of working in different color spaces is exploited by Patil et al. [52]; in this work, the RGB color model is mapped to the L*a*b and HSI color spaces to later enable the color feature extraction that serves as input for the classifier. In [35] the author proposes an approach for the classification of five types of grains, extracting morphology, color, and texture features. To increase the accuracy of the classification, the original RGB color space is converted to the HSV color space, which obtains the best results. A similar approach is followed in [3], where the authors present an approach to classify five types of rice, applying the BPNN algorithm to a feature vector computed from the luminance component of the converted HSV color space. More recently, the same research team has proposed an extension [11] of their previous work. In this case, the hue channel and an algorithm based on a neuro-fuzzy cascade network are used to obtain similar results for all four types of rice grains. Other authors, like [28], [19], and [38], propose to convert the given images to grayscale and use them as inputs to the system. Actually, when working in grayscale, just one dimension is considered; hence, in these cases texture or intensity analysis is performed [42].
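A color space mapping such as the RGB-to-HSV conversion used in the works above can be sketched with the standard library's `colorsys` module (a pixel-wise toy version for illustration; real pipelines use vectorized conversions):

```python
import colorsys
import numpy as np

def rgb_to_hsv(img):
    """Map an RGB image (floats in [0, 1]) to HSV, applying the stdlib
    colorsys conversion pixel by pixel."""
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = colorsys.rgb_to_hsv(*img[y, x])
    return out

# a pure-red pixel maps to hue 0 with full saturation and value
img = np.array([[[1.0, 0.0, 0.0]]])
hsv = rgb_to_hsv(img)
print(hsv[0, 0])   # [0. 1. 1.]
```

Separating hue from intensity in this way is what makes HSV-based features more robust to illumination changes than raw RGB values.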
On the other hand, in [14], [50], and [51] the authors propose the usage of color histograms to obtain the best threshold value. Based on the information obtained from the color histogram of the RGB image channels, some authors (e.g., [14], [38], [54], [57]) propose to use just one channel of the input image.

Altuntas et al. [14] use the blue channel of the RGB image to convert it to grayscale and then apply morphological operations using a median filter to reduce noise on corn kernel images. In the case of Birla et al. [38], the authors propose to use the green channel of the given image to convert it to grayscale and then apply a manual threshold to obtain the segmented image of rice grains. The most suitable color model varies from one problem to another, which is why it would be necessary to carry out a preliminary evaluation of the different color models within the preprocessing stage to find the best option depending on the problem to be addressed. Another challenge related to this problem is the touching-kernel scenario, which makes the classification task difficult.
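Single-channel selection followed by a manual threshold, in the spirit of the channel-based approaches above, can be sketched as follows (the channel index and threshold value are illustrative, not the values used in the cited works):

```python
import numpy as np

# toy RGB image with a strong blue component
rgb = np.zeros((4, 4, 3), dtype=np.uint8)
rgb[:, :, 2] = 200

gray = rgb[:, :, 2]        # use the blue channel as the grayscale image
binary = gray > 128        # manual threshold on the selected channel
print(binary.all())        # True: every pixel exceeds the threshold
```

Picking the channel with the highest grain/background contrast is what makes this trivial operation effective in practice.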

Most of the recent works are based on deep learning; in these cases, some authors use cropping and scaling preprocessing techniques to generate the necessary diversity of scenarios to carry out the training of the model.

SEGMENTATION
Following the pipeline defined in this work, after carrying out the preprocessing stage, a segmentation stage is applied to separate the grains from the background. This section reviews the segmentation approaches proposed in the literature, grouping them into classical and deep learning based approaches.

Classical Approaches
One of the most widely used grain image segmentation techniques is thresholding; this technique works on grayscale images and performs binarization using a threshold value, which depends on the type of grain analyzed together with the background color [58]. It should be mentioned that sometimes, after the image segmentation, some postprocessing techniques are applied to enhance the results; some of these postprocessing approaches are described next.

The main drawback of thresholding techniques lies in their sensitivity to the selected threshold value used to generate the binary image.
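A common way to avoid hand-picking the threshold value is Otsu's method, mentioned later in this section; a minimal NumPy sketch of it is shown below (for illustration only):

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold maximising the between-class
    variance of the grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    mu_total = (np.arange(256) * hist).sum() / total
    best_t, best_var = 0, -1.0
    w0 = mu0_sum = 0.0
    for t in range(255):
        w0 += hist[t]
        mu0_sum += t * hist[t]
        if w0 == 0 or w0 == total:
            continue
        mu0 = mu0_sum / w0                                   # background mean
        mu1 = (mu_total * total - mu0_sum) / (total - w0)    # foreground mean
        var = (w0 / total) * (1 - w0 / total) * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

gray = np.full((50, 50), 20, dtype=np.uint8)   # dark background
gray[10:20, 10:20] = 200                       # bright grain region
t = otsu_threshold(gray)
binary = gray > t
print(t, binary.sum())                         # threshold separates the 100 grain pixels
```

Because the threshold is derived from the histogram itself, the same code adapts to different grain/background contrasts without manual tuning.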
As mentioned above, in some cases, after thresholding, some additional processes are applied to the obtained binary image to improve the results of the subsequent classification process and deal with the problems generally found in the binary masks. Wah et al. [28] remove the noise of the binary image (areas smaller than 10 pixels) by applying two morphological operations, first erosion and then dilation. Huang Nen-Fu et al. [18] improve segmentation by applying a color detection method to remove the background of the coffee beans. Finally, Qiu et al. [25] are the only ones in the reviewed literature who use the spectral dimension and then apply thresholding to obtain the binary image.
On the other hand, Silva [49], Son [20], and Douik [53] use the background subtraction method for segmenting rice kernels; in Silva et al. [49] an additional morphological opening is applied together with contrast stretching on the given grayscale image. The usage of morphological operations has also been exploited in Guevara [51] and Siddagangappa [43] to delete shadows and improve edge sharpness to get better results. Some authors (e.g., [47], [31], [28]) use the Otsu method to convert the grayscale image into a binary image, according to the computed threshold value, to extract the grain from the background. Huang Sheng [19] and Shrestha [36] use the watershed method to obtain the segmented instance of each grain; in the first case this method is used to segment corn kernels, while in the second case it is used to segment wheat kernels. After generating the binary mask, elements are extracted from the original input image to proceed with the classification stage. Actually, some authors carry out several additional steps to separate each instance of grain; this will depend on whether the image contains a single kernel or a cluster of grains.
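The background subtraction strategy used for rice kernels can be sketched as a simple per-pixel difference against a reference image (an illustrative NumPy sketch; the `tol` tolerance parameter is hypothetical):

```python
import numpy as np

def background_subtract(img, background, tol=30):
    """Mark as grain any pixel differing from a reference background
    image by more than `tol` gray levels."""
    diff = np.abs(img.astype(int) - background.astype(int))
    return (diff > tol).astype(np.uint8)

bg = np.full((8, 8), 40, dtype=np.uint8)      # empty acquisition tray
img = bg.copy()
img[2:5, 2:5] = 180                           # a rice kernel on the tray
mask = background_subtract(img, bg)
print(mask.sum())                             # 9 grain pixels detected
```

This works well under the controlled lighting described later in the ground truth section, where a clean reference image of the empty tray is easy to obtain.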
In the case of clusters, each instance must be identified in order to be used in the classification task. Altuntas et al. [14] propose an approach to extract bounding boxes using contour lines for each grain. Zareiforoush [42] and Kilic [57] use a set of functions to separate and label each grain present in the image. Guevara et al. [51] calculate the center of mass of the regions obtained in the binarization process to label each instance of the wheat and barley grains.
Finally, another approach is proposed by Siddagangappa et al. in [43]; once the image is binarized, a labeling process is performed over the connected components, using labels and the similarity of gray level values.
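The labeling of connected components to separate grain instances, as in the approaches above, can be sketched with a breadth-first flood fill (a minimal 4-connectivity NumPy sketch, not the exact procedure of any cited work):

```python
import numpy as np
from collections import deque

def label_components(mask):
    """4-connected component labelling of a binary mask via BFS;
    each grain instance receives a distinct integer label."""
    labels = np.zeros(mask.shape, dtype=int)
    h, w = mask.shape
    current = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not labels[sy, sx]:
                current += 1
                labels[sy, sx] = current
                q = deque([(sy, sx)])
                while q:
                    y, x = q.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = current
                            q.append((ny, nx))
    return labels, current

mask = np.zeros((12, 12), dtype=np.uint8)
mask[1:4, 1:4] = 1    # first grain
mask[6:9, 6:9] = 1    # second grain
labels, n = label_components(mask)
print(n)              # 2 instances found
```

Each labeled region can then be cropped out and passed individually to the classification stage, which is the per-instance setup most of the reviewed classifiers assume.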

Deep Learning Based Approaches
On the contrary to the classic approaches reviewed in the previous section, recent works address the segmentation problem with deep learning models. Based on the reviewed literature, the Mask R-CNN architecture [61] has been the most commonly used one (e.g., [15], [12], [13]). This network allows performing instance segmentation, obtaining the binary mask of each grain present in the input image. In all these approaches the authors did not change the original architecture. It should be mentioned that Mask R-CNN is used to segment different types of cereals: in [15] different varieties of rice grains are segmented, while [12] performs the segmentation of different types of grains, such as rice, lettuce, oats, and wheat; on the contrary to previous works, [13] uses the Mask R-CNN to segment corn kernels.

Discussions on segmentation approaches
According to the reviewed literature, one of the techniques most used by the authors is thresholding. The main reason for its popularity is that this technique is simple and has a low computational cost.

CLASSIFICATION
Following the pipeline presented in Fig. 1, once images are segmented, every segmented instance is fed to the classification stage. Also based on classical approaches, in García et al. [16] an image processing plus machine learning approach is proposed to classify green coffee beans. The beans are classified as good or defective (five types of defects) using the K-NN algorithm; previously, a feature extraction stage is accomplished, obtaining surface area, roundness, area relation, and eccentricity. These features are used to train the classifier. Vlasov & Fadeev [30] also follow this approach, classifying five different types of seeds with K-means clustering.
In the context of classical approaches, some authors use the SVM, also known as a hyperplane classifier, to classify food grains. The main objective of this algorithm is to determine an optimal line (plane or hyperplane, depending on the feature space dimension) that allows separating two given classes [1]. However, in the case of performing a multiclass classification, it is necessary to build a combination of several binary classifiers, as done in several of the reviewed works.
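A feature-based classifier such as the K-NN used in [16] can be sketched in a few lines; the feature values and class labels below are toy data for illustration, not measurements from any cited work:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Minimal K-NN: classify a feature vector (e.g. surface area and
    roundness of a bean) by majority vote among the k nearest samples."""
    d = np.linalg.norm(X_train - x, axis=1)          # Euclidean distances
    nearest = y_train[np.argsort(d)[:k]]             # labels of k nearest
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                 # majority vote

# toy features: [surface_area, roundness]; 0 = good bean, 1 = defective
X = np.array([[100, 0.9], [110, 0.85], [40, 0.4], [45, 0.5]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([105, 0.88])))      # classified as good (0)
```

In practice, features such as area and roundness live on very different scales, so they are normalized before computing distances; the sketch omits that step for brevity.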

Deep Learning Based Approaches
With the evolution of technology, especially regarding memory capabilities and the parallel processing of big amounts of data, deep learning has taken a big advantage in a variety of tasks in the computer vision field; image recognition for agricultural problems is not the exception. Some authors evaluate several pretrained architectures and obtain the best results with VGG-19. A similar approach has been followed by Sheng et al. [19]; in this case, the authors propose to classify defective corn kernels from good ones, with defects including mold, worm damage, and discoloration. In this work, GoogLeNet and VGG networks were evaluated under a transfer learning scheme, and the first one obtained better results. Both approaches overcome machine learning state-of-the-art results. Finally, in [25], the authors also use VGGNet to speed up the learning process, outperforming state-of-the-art results.
On the contrary to previous approaches, there are some recent works where authors design a custom solution (e.g., [18], [13], [23]). In the case of [18] a two convolutional layer network, with a Rectified Linear Unit (ReLU) activation function, is proposed; this network is trained with grayscale images in order to make it easier to detect the shape of green coffee beans and their dark color. In the case of [13], the authors propose a lightweight CNN architecture, referred to as CK-CNN, to classify corn kernels into three categories: good kernels, defective kernels, and impurities. The network receives as input a single element from the segmentation module. It consists of five layers: three convolutional layers defined with 3×3 kernels and two fully connected layers. Finally, in [23] a novel architecture is designed with five convolutional blocks; a timeline of the reviewed works shows the first time a given approach was used.
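The basic building block shared by these custom CNNs (a 3×3 convolution followed by a ReLU activation and pooling) can be sketched in NumPy as follows; this is a generic illustration of the operations involved, not the CK-CNN architecture itself:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2D convolution (single channel), the core operation of the
    convolutional layers described above."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (img[y:y + kh, x:x + kw] * kernel).sum()
    return out

def relu(x):
    """Rectified Linear Unit activation."""
    return np.maximum(x, 0)

def max_pool(x, s=2):
    """s x s max pooling (truncating odd borders)."""
    h, w = x.shape[0] // s * s, x.shape[1] // s * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

img = np.random.rand(32, 32)          # one grayscale kernel crop
kernel = np.random.randn(3, 3)        # a single learned 3x3 filter
feat = max_pool(relu(conv2d(img, kernel)))
print(feat.shape)                     # (15, 15) feature map
```

Stacking a few of these blocks and flattening the final feature map into fully connected layers yields the kind of lightweight classifier the cited works describe; real implementations use a framework with learned weights rather than random filters.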

APPLICATIONS
There are several food grain applications based on computer vision; in general, they can be grouped into: i) quality control approaches and ii) grain variety classification approaches. Regarding quality control, some authors propose a method for estimating the size of the Oryza sativa L. rice class along with the detection of chalky and broken rice. Another rice quality control application can be found in [47], where the authors present a multiclass SVM algorithm to determine the grade of 4 types of rice kernels. There are also approaches in the literature for wheat grain quality control; for instance, in [8] the authors propose a CNN approach to evaluate wheat grains according to the electronic National Agriculture Market parameters of India, which enforce automatic grain quality assessment, using inexpensive mobile phones.
Also related to the quality control problem, but for a given sample set, some approaches evaluate the sample as a whole; in other words, they measure the percentage of good kernels, defective kernels (including broken or rotten kernels), and impurities (e.g., pieces of straw, foreign elements, dust) in the given sample set. A review of these applications is found in the works proposed by [10] to grade rice quality, or [11] and [13] to classify corn according to its quality using CNNs.

Grain Variety Classification
On the contrary to previous approaches, there are some works intended to discriminate grain varieties. The authors of [33] propose a hyperspectral imagery system to classify seed varieties using an LS-SVM model. In the same way, in [21], the authors propose a system to effectively classify 17 varieties of maize seed based on a multi-linear discriminant analysis model. On the other hand, in [56], the authors present an approach that implements a pixel-wise algorithm to classify wheat grains of 24 different Australian varieties. The authors in [34] and [25] propose approaches to classify rice varieties using LS-SVM and CNN, respectively. Similarly, in [46] the authors propose an approach to identify six varieties of Basmati rice; the approach is based on color, morphological, and textural features. This subsection is just a summary of some of the recent approaches proposed in the literature for grain variety classification.

Discussions on Applications
As presented above, computer vision systems have been used to support a wide range of grain classification applications. However, there is no benchmark dataset to be used as a reference to evaluate and compare results of applications targeting the same problem.

GRAIN VARIETY
This section reviews the state-of-the-art grain classification approaches according to the grain variety. Table 2 groups the reviewed literature according to the grain variety and depicts the most important features of each approach.

Corn
Corn is the cereal with the highest production worldwide according to FAOSTAT, being fundamental in the diet of humans and some animal species; it also has high genetic variability, which allows it to adapt to almost any climatic environment.
Given its high industrial use, it is important to improve the quality control of the corn grain. Hence, this section lists a series of techniques focused on the automatic classification of corn kernels. Some of them have been able to classify up to 17 different classes (e.g., [21], [33]), using hyperspectral or multispectral images. The analyzed techniques mostly consider non-touching kernels [24]. A few approaches have been implemented with CNNs [14], which allows improving the classification accuracy. As in the rice and wheat cases, for corn there are approaches intended to discriminate grains by variety [24], while others are intended to classify according to the quality of the sample (e.g., [19], [29]).
Each implemented approach has its own data acquisition process, generating a dataset that is not available for further comparisons or improvements.

Rice
Rice is at the base of the food chain in many countries; according to FAOSTAT, rice production represents the second-largest cereal production after corn.
To improve the process of identifying the types and quality of rice, an automatic and accurate classification process is required, which is a challenging problem due to the high similarity between the different varieties. This subsection lists the different approaches proposed in the literature for rice classification. In the reviewed literature, some recent approaches for classifying rice kernels according to the different varieties have been proposed (e.g., [34], [28]), other approaches have been proposed to classify rice kernels according to their quality (e.g., [47], [45]), while others have been devoted to both, classifying according to variety and quality [43]. Among the different reviewed approaches, hyperspectral and multispectral based techniques, generally using CNNs, are the ones that allow classifying the greatest number of varieties or reaching the highest performance on quality classification (e.g., [25], [45], [34]). In spite of that, the vast majority of techniques use images from the visible spectrum (e.g., [15], [27], [23], [3], [20]). Most of the approaches are intended for the non-touching kernel scenario, which represents an opportunity to explore techniques that support touching kernels.

Wheat
Wheat is the third most important grain in the human food chain after corn and rice. The huge volume of production needs effective methods to evaluate the quality of the grains and to improve productivity in the industry. The main objective is to improve quality control and replace manual processes that require time and effort and are ineffective in most cases. The following is just a summary of some of the approaches proposed in the literature for the automatic classification of wheat grains. During the last decades, several techniques have been designed to determine the wheat variety (e.g., [1], [53], [51]) or the quality of wheat grains (e.g., [36], [30], [31]). On average, the approaches proposed in the literature tackle two- or three-class problems. On the contrary to the rice classification problem, we can find approaches for touching kernels [36], as well as approaches for the non-touching case (e.g., [53], [37]). In the vast majority of cases, the proposed solutions are based on machine learning (e.g., [36], [31]). As in most computer vision domains, we can also find CNN based approaches for wheat classification [30], although to our knowledge there is not much work based on deep learning. Hence, this can be an opportunity to improve the precision of current wheat classification approaches.

Barley
According to FAOSTAT, barley is the fourth most cultivated cereal worldwide and is currently grown significantly for animal feed, malt products, and human food, respectively. Due to the importance of this cereal, it is necessary to have mechanisms that allow automating specific tasks within the production chain [12]. In most of the approaches, the analyzed cluster of grains contains non-touching kernels, which is not a realistic scenario. In a large number of cases, the proposed solutions are based on machine learning, although very few works are based on deep learning techniques (e.g., [12], [6]), which provides an opportunity to improve the accuracy rate of previous works.

Coffee

Several computer vision techniques have been proposed to improve the selection processes of coffee beans. Some of these techniques are listed below, describing their scope and focus. The analyzed coffee classification techniques are based on images of the visible spectrum [22], [18]. The classification discrimination in most cases is limited to two classes (e.g., [22], [16]). The developed methods generally use traditional machine learning based techniques [22]. On the contrary to the previous cases, in the coffee bean classification problem, all techniques tackle the non-touching kernel classification case [18].

Others
In addition to the approaches listed above, which were focused on the largest production grains, there are other approaches intended to classify different food grains. These approaches are mainly motivated by quality classification, among other goals. For instance, in [12] the authors propose a multi-variety approach that is trained using synthetically generated images. Other multigrain classification approaches use real images of the visible spectrum; for instance, in [35] and [54] machine learning based approaches are proposed to perform multiple grading of grains. In most of the cases, the implemented multi-variety techniques use morphological characteristics in order to differentiate the specific details of the grains and perform a better classification (e.g., [53], [51], [48], [58]). Although results from multi-variety approaches are interesting, they do not reach the performance of single-variety approaches. Furthermore, in general, in the seed classification problem, there are no multi-variety scenarios. In other words, the grains come from the harvest of a single crop.

Discussions on grain variety
After reviewing the techniques that allow the classification of different types of grains, it can be summarized that former works were mainly based on the analysis of color and texture features; geometry has also been considered in some cases. More recent works rely on CNNs trained with large labeled datasets. In most cases, the approaches use their own datasets to train and validate the techniques. Following the trend of deep learning based approaches, in grain classification there are some approaches based on the usage of synthetic ground truth. As mentioned above, this allows tackling the classification of different varieties at a low cost (i.e., a large amount of ground truth data is obtained easily). Although not included in the pipeline presented in Fig. 1, ground truth data generation is reviewed in the next section.

As specific conclusions for each grain variety, the following can be stated.
In the case of rice grains, the proposed approaches have been migrating from classical machine learning techniques to CNN models in order to improve the efficiency of the obtained results. In the wheat grain classification domain, it could be observed that multispectral or hyperspectral images are generally used to improve class differentiation. In the case of coffee grain approaches, the proposed techniques mostly use images of the visible spectrum and do not explore the use of CNNs. The touching-kernel scenario has not yet been explored, which is an opportunity to tackle new problems. Additionally, exploring the use of multispectral or hyperspectral images to improve class differentiation has not yet been considered. In the case of multi-grain techniques, although attractive results have been obtained, their performance does not reach that of standalone single-variety approaches.

Ground truth
Although not included in the pipeline presented in Fig. 1, ground truth data are an important part both for validating the results of a given approach and for comparing the performance of different proposals. In addition to these usages, ground truth data are needed to train machine learning based approaches. In general, a large amount of tagged data is required for training algorithms, which becomes a laborious and time-consuming task. A possible solution to this problem is to work with synthetic images, which already include the necessary annotations, so that there is no longer a dependency on trained human annotators. This section reviews strategies followed in the literature to generate datasets, both real and synthetic, together with the corresponding annotations for ground truth generation.
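For concreteness, the tagged data mentioned above are typically stored as structured, machine-readable records. A minimal sketch in the widely used COCO detection format is shown below; the field names follow the COCO schema, but the file name, categories, and box values are invented for illustration.

```python
import json

# Hypothetical annotation record for a single image containing grains.
# "bbox" is [x, y, width, height] in pixels, as in the COCO convention.
annotation = {
    "images": [{"id": 1, "file_name": "grains_0001.png",
                "width": 1024, "height": 768}],
    "categories": [{"id": 1, "name": "rice"}, {"id": 2, "name": "wheat"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [120, 80, 34, 61],
         "area": 34 * 61, "iscrowd": 0},
    ],
}

# Serialized once, such a record can be shared alongside the image files.
coco_json = json.dumps(annotation)
```

A synthetic-image pipeline can emit records like this automatically at render time, which is precisely what removes the manual annotation burden discussed above.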

Real data
Most of the authors of the reviewed literature perform data acquisition from scratch, at all times controlling the conditions of the environment where the images are obtained. For example, the distance and location of the camera are always controlled: most of the time the camera is placed orthogonal to the acquisition surface, at a specific distance that allows viewing the largest number of grains while maintaining an adequate aspect ratio. Another important condition to consider is the light source, which is generally located on top of the working area where the grains are placed (e.g., [38], [16]). There are different approaches to carry out the annotation of the ground truth; some authors perform manual labeling of the input data with the help of crowdsourcing tools such as Labelbox, Voxel51, Lionbridge, and SuperAnnotate, just to mention a few [13], [12]. This way of performing data annotations is the most expensive in terms of time and resources, and its cost depends on the number of objects present in the scene. Hence, trying to avoid this time-consuming task, some authors [15] use digital image processing techniques to partially automate the annotation process; among the different approaches proposed in the literature, watershed, discussed in Sec. 2.3.1, is the most used. Despite its drawbacks (e.g., over-segmentation and delimitation of incorrect contours, among others), under controlled environmental conditions it is a good option to save time.
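As an illustration of this semi-automatic annotation strategy, a marker-based watershed over the distance transform can separate touching grains in a binarized image. The sketch below uses scikit-image; the parameter values (e.g., `min_distance=20`) are illustrative assumptions, not taken from any surveyed work.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_otsu
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

def annotate_grains(gray):
    """Return a labeled mask (one integer id per grain) from a grayscale
    image with bright grains on a dark background, assuming the controlled
    lighting conditions described above."""
    # 1. Binarize with Otsu's global threshold.
    binary = gray > threshold_otsu(gray)
    # 2. Distance transform: maxima correspond to grain centers.
    distance = ndi.distance_transform_edt(binary)
    # 3. One marker per local maximum; min_distance limits over-segmentation.
    coords = peak_local_max(distance, min_distance=20, labels=binary)
    markers = np.zeros(gray.shape, dtype=np.int32)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    # 4. Flood the inverted distance map, restricted to the foreground mask.
    return watershed(-distance, markers, mask=binary)
```

The resulting label image can be converted directly into per-grain annotations, reducing manual work to visual verification and correction.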

Synthetic data
In most of the approaches mentioned in previous sections, the ground truth has been obtained from images captured in the real world; as aforementioned, this task requires a lot of effort and time in both activities: image acquisition and image annotation. Trying to overcome these problems, some authors generate ground truth from synthetic images. These synthetic images are obtained from virtual environments where different grain distributions may be generated (e.g., [12], [8]). It should be noticed that the usage of 3D grain models in virtual environments not only avoids the time required for acquisition and annotation but also helps to generate datasets with large variability, which is required for training deep learning algorithms. The more complex the 3D grain model (a parametric representation that allows changes in size, texture, and color) and the virtual environment (lighting conditions, camera models, etc.), the larger the variability of the acquired dataset will be. Taking advantage of the generalization offered by the synthetic data acquisition framework, Toda et al. [12] propose the usage of a synthetic cluster generator. This generator uses three different ways of distributing the wheat grains to form the clusters. The first approach places the kernel images, obtained from real scenarios, randomly; in the second approach the kernels are also placed randomly but enforcing a maximum overlap constraint; while in the last approach, the kernels are placed using a cell-population simulator. The resulting synthetic images are used as input to a U-Net architecture that performs the task of instance segmentation.
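The second placement strategy above (random positions subject to a maximum overlap constraint) can be sketched with rejection sampling. The following is a minimal, hypothetical implementation assuming axis-aligned kernel crops of a fixed size; the function name and parameters are our own, not those of [12].

```python
import random

def place_kernels(canvas_w, canvas_h, kernel_w, kernel_h,
                  n_kernels, max_overlap=0.2, max_tries=1000, seed=0):
    """Sample top-left corners for n_kernels kernel crops such that no pair
    overlaps by more than max_overlap (fraction of one kernel's area).
    Kernels that cannot be placed within max_tries attempts are skipped."""
    rng = random.Random(seed)
    area = kernel_w * kernel_h
    placed = []

    def overlap(a, b):
        # Intersection area of two axis-aligned boxes, as a fraction of
        # a single kernel's area.
        ax, ay = a
        bx, by = b
        dx = min(ax + kernel_w, bx + kernel_w) - max(ax, bx)
        dy = min(ay + kernel_h, by + kernel_h) - max(ay, by)
        return max(dx, 0) * max(dy, 0) / area

    for _ in range(n_kernels):
        for _ in range(max_tries):
            pos = (rng.randrange(canvas_w - kernel_w),
                   rng.randrange(canvas_h - kernel_h))
            if all(overlap(pos, q) <= max_overlap for q in placed):
                placed.append(pos)
                break
    return placed
```

Since every kernel's position is known by construction, pixel-accurate segmentation masks come for free, which is the key advantage of synthetic ground truth generation discussed in this section.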

Discussions on ground truth
Generating ground truth involves spending time and resources that depend on the type of approach used to generate it. In the case of real data, the labeling time is very long, since each of the images obtained in the acquisition stage needs to be manually labeled, and experts in the given grain variety are required. On the other hand, although used in a smaller proportion, the generation of synthetic data seems a better option. By using synthetic images there is no need to label the data nor to have an expert devoting time to this task; obviously, results depend on the quality of the 3D model used to represent the given grain (i.e., how similar it is to the real grain), the variability of such a 3D model (i.e., the model parameters used to generate different representations based on combinations of shape, texture, and color), and the virtual environment (i.e., lighting conditions, shadows, camera and lens modeling, etc.) used to generate the synthetic images. Although, considering all these advantages, synthetic data seem to be the best option, it is also true that, because this is a relatively new approach in the food grain domain, just a few authors use it. Most of the works are based on the usage of traditional techniques (i.e., ground truth manually annotated on real images).