Multimodal Classification of Mangoes

Grading, sorting, and classification of agricultural products are important steps in ensuring a profitable and sustainable food industry. Labor-intensive manual work is being replaced by devices and machines that can be used in-line and produce measurements fast enough for high production volumes. Most previous works focused on only one external quality parameter, such as color, size, mass, shape, or defects. In this work, we propose an integrated machine vision system that can grade, sort, and classify mangoes using multiple features, including weight, size, and external defects. We found that weight estimates from our proposed algorithm, which is based on visual information, were not statistically different from conventional weight measurements using a static digital load cell; the estimation error is relatively small (4–5%). We also constructed an artificial neural network model to classify mangoes with multiple types of external defects; the classification error is less than 8% in the worst case. The results indicate that our system shows great potential for use in a real industrial setting. Future work will investigate other features, such as ripeness and bruises, to increase the effectiveness and practicality of the system.


Introduction
Food standards are evolving both to ensure the sustainability of agriculture and to satisfy consumer needs. The reputation of producers, and consequently their market share, is based on the quality of the product, which makes quality control crucial. The market, together with ever-increasing social concerns about good agricultural practices (including environmental, economic, and social sustainability and traceability), requires guarantees of high quality from the earliest stages of the crop to postharvest storage and treatment.
Optical sensors have been used extensively in the industry, in applications ranging from the automatic sorting of products into categories to the control of processes that are difficult to observe, for instance, because of their long duration [1]. It is important to note that the quality of biological products is not easy to assess, as individuals of the same category may differ greatly from one another in color, shape, or size. Furthermore, because they are living products, their physicochemical properties evolve over time. This inherent variability sometimes introduces a certain amount of subjectivity into quality control, which makes automated inspection systems harder to develop. Addressing these challenges often requires research in advanced and multidisciplinary technologies and sometimes the use of expensive equipment.
In this work, we focus on the mango (Mangifera indica), especially the Cat-Chu cultivar, because of its increasing export potential in our country, Vietnam.
Postharvest handling of mangoes is usually completed in several steps: washing, sorting, grading, packing, storage, and transportation, as shown in Figure 1. Among these, sorting and grading are considered the most important, especially for fresh agricultural products.
Sorting of agricultural products is based on external quality parameters such as color, defects, shape, and size. Manual sorting relies on traditional visual quality inspection performed by trained human operators situated on one or both sides of a conveyor belt. They visually inspect the produce and remove pieces that do not satisfy the predetermined quality standards. Pieces are transported slowly enough to allow the workers to inspect all of them and even manipulate them to ensure that most of their surface is inspected. The process is normally tedious, time-consuming, subjective, slow, expensive, and inconsistent. Cost-effective, consistent, faster, and more accurate sorting can be achieved with machine vision-assisted sorting.
In this work, we present an integrated machine vision-based inspection system that covers sorting, grading, and weighing of mangoes, particularly the Cat-Chu cultivar.

Our research

Mass estimation
Consumers usually prefer fruits with nearly uniform masses and shapes. This is also one of the requirements for export. However, mango shapes are neither round nor oval and cannot be modeled easily. Commonly accepted laboratory instruments are shown in Figure 2, including a Vernier caliper for size and length measurements, a water replacement setup to estimate volumes, and a planimeter to calculate areas. These methods are time-consuming and not suitable for a real production line.
Several studies have attempted to formulate a relationship between the masses of mangoes and their sizes [2][3][4][5]. Guzman-Estrada et al. [2] used a set of complicated geometrical parameters to estimate the mass of mangoes; most of the parameters can only be obtained using a mechanical measurement tool. Vasquez-Caicedo et al. [3] used five parameters, such as length, width, and thickness at maximum width and minimum width, to estimate mango weight. Yimyam et al. [4] used four digital photographs to produce a three-dimensional model of Nam-Dokmai mangoes. Most of these methods do not rely on easy-to-obtain parameters. The exception is Spreer et al. [5], who provide an experimental weight-size correlation based on just three parameters, Length (L), Max Width (W), and Max Thickness (T), for a specific mango cultivar (Chok Anan) from Thailand.
The weight estimation using Spreer's method is given by

$$m_{\text{Chok Anan}} = 5.39 \times 10^{-4} \times L \times W \times T \qquad (1)$$

where the estimated mass $m_{\text{Chok Anan}}$ is in grams and $L$, $W$, and $T$ are in millimeters.

In this work, we use Spreer's approach to find a meaningful relationship between the shape parameters and the masses of Cat-Chu mangoes. We used over 200 mangoes as a training dataset to establish the weight-size relationship. We also obtained a linear relationship, as shown in Figure 3. The constant in our case is $4.879 \times 10^{-4}$, and the obtained $R^2$ is about 97.6%.
Our estimated mass is

$$m_{\text{Cat-Chu}} = 4.879 \times 10^{-4} \times L \times W \times T \qquad (2)$$

with the mass in grams and the dimensions in millimeters.

To validate our findings, we collected an additional 68 mangoes to be used as a validation dataset. The average percentage error on this set is 3.23%. This further supports the use of a simple linear correlation between mass and size to estimate mass effectively.
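The fitting and validation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the fit is assumed to be an ordinary least-squares line through the origin, which is consistent with a model that has a single constant.

```python
def fit_constant(dims_mm, masses_g):
    """Least-squares slope k of m = k * (L * W * T), forced through the origin.

    Assumption: a zero-intercept fit; the source only reports one constant.
    """
    products = [length * width * thickness for length, width, thickness in dims_mm]
    numerator = sum(p * m for p, m in zip(products, masses_g))
    denominator = sum(p * p for p in products)
    return numerator / denominator


def estimate_mass(length_mm, width_mm, thickness_mm, k=4.879e-4):
    """Estimated mass in grams from the three dimensions in millimeters."""
    return k * length_mm * width_mm * thickness_mm


def mean_abs_pct_error(estimated_g, measured_g):
    """Average absolute error as a percentage of the measured mass."""
    errors = [abs(e - m) / m for e, m in zip(estimated_g, measured_g)]
    return 100.0 * sum(errors) / len(errors)
```

With the constant reported for Cat-Chu mangoes, a fruit measuring 100 mm × 70 mm × 60 mm would be estimated at roughly 205 g.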
We also designed and constructed an image capturing platform to obtain images from two different viewpoints (top and side views). The platform was also used to test the algorithm's ability to estimate the masses of mangoes solely from their sizes. An algorithm was developed to capture and process the images while the mangoes travel along a conveyor.
Top and side views of each mango were captured to estimate its mass using Eq. (2), and the result was compared with a conventional mass measurement using a calibrated digital scale. We found that the masses estimated using this technique were not statistically different from those obtained with the digital scale (p < 0.05). Classification results showed an accuracy of 95-96% when grading mangoes solely based on mass.
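One possible way to obtain the three dimensions from the two views is sketched below. The source does not describe this step in detail, so everything here is an assumption: the mango is taken to be already segmented into binary masks, its long axis is taken to lie along the image's horizontal axis, and `mm_per_pixel` is a hypothetical calibration factor shared by both cameras.

```python
def extent_mm(mask, mm_per_pixel):
    """Return the (horizontal, vertical) extent of the nonzero region in mm."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    horizontal = (cols[-1] - cols[0] + 1) * mm_per_pixel
    vertical = (rows[-1] - rows[0] + 1) * mm_per_pixel
    return horizontal, vertical


def dimensions_from_views(top_mask, side_mask, mm_per_pixel):
    """L and W from the top view, T from the side view (all in millimeters)."""
    length, width = extent_mm(top_mask, mm_per_pixel)
    _, thickness = extent_mm(side_mask, mm_per_pixel)
    return length, width, thickness
```

An axis-aligned bounding box is the simplest choice; a rotated bounding box would be more robust if the fruit is not aligned with the conveyor.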

Image segmentation
In this section, we review methods for the automatic selection of threshold values; the two most important methods that we discuss are Otsu's method and the valley-emphasis method. For a more general discussion of thresholding techniques, see "Machine Vision" by Davies [6].

Otsu's method
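Otsu's method has long been one of the de facto standard algorithms in image segmentation [7]. An image is a two-dimensional matrix of $N$ pixels, each with an intensity level between $0$ and $L-1$, where $L$ is the number of distinct gray levels. The number of pixels with gray level $i$ is denoted $f_i$, and the probability of occurrence of gray level $i$ is given by

$$p_i = \frac{f_i}{N}$$

The average intensity level of the whole image can be calculated as

$$\mu_T = \sum_{i=0}^{L-1} i\,p_i$$

By segmenting the image using a single threshold, we get two disjoint regions $C_1$ and $C_2$, formed by the pixels with gray levels $[0, \ldots, t]$ and $[t+1, \ldots, L-1]$, respectively, where $t$ is the threshold level. Normally, $C_1$ and $C_2$ correspond to the object of interest and the background. The probability distributions of $C_1$ and $C_2$ are

$$C_1: \frac{p_0}{\omega_1(t)}, \ldots, \frac{p_t}{\omega_1(t)}, \qquad C_2: \frac{p_{t+1}}{\omega_2(t)}, \ldots, \frac{p_{L-1}}{\omega_2(t)}$$

where

$$\omega_1(t) = \sum_{i=0}^{t} p_i, \qquad \omega_2(t) = \sum_{i=t+1}^{L-1} p_i$$

The mean gray-level values of the two classes can be computed as

$$\mu_1(t) = \sum_{i=0}^{t} \frac{i\,p_i}{\omega_1(t)}, \qquad \mu_2(t) = \sum_{i=t+1}^{L-1} \frac{i\,p_i}{\omega_2(t)}$$

Using discriminant analysis, Otsu [7] showed that the optimal threshold $t^*$ can be determined by maximizing the between-class variance, that is

$$t^* = \arg\max_{0 \le t < L-1} \sigma_B^2(t)$$

where the between-class variance $\sigma_B^2$ is defined as

$$\sigma_B^2(t) = \omega_1(t)\,\bigl(\mu_1(t) - \mu_T\bigr)^2 + \omega_2(t)\,\bigl(\mu_2(t) - \mu_T\bigr)^2 = \omega_1(t)\,\mu_1^2(t) + \omega_2(t)\,\mu_2^2(t) - \mu_T^2$$

Because $\mu_T$ does not depend on $t$, maximizing $\sigma_B^2(t)$ is equivalent to maximizing $\omega_1(t)\,\mu_1^2(t) + \omega_2(t)\,\mu_2^2(t)$.

Otsu's method works well when the image histogram has clear peaks and valleys, in other words, when it shows a clear bimodal or multimodal distribution. When an image contains classes with widely different numbers of pixels, such as small external defects on a large fruit surface, Otsu's method may not give the correct threshold level, as shown in Figure 4.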
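The search for the threshold that maximizes the between-class variance can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function name `otsu_threshold` is hypothetical, the histogram is passed as a list of pixel counts indexed by gray level, and pixels with gray level at most `t` are assigned to the first class.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes the between-class variance.

    hist[i] is the number of pixels with gray level i.
    """
    total = sum(hist)
    p = [count / total for count in hist]
    mu_total = sum(i * p_i for i, p_i in enumerate(p))

    best_t, best_var = 0, -1.0
    w1 = 0.0       # cumulative probability of class 1
    m1_sum = 0.0   # cumulative sum of i * p_i for class 1
    for t in range(len(p) - 1):
        w1 += p[t]
        m1_sum += t * p[t]
        w2 = 1.0 - w1
        if w1 <= 0.0 or w2 <= 0.0:
            continue  # one class is empty, the variance is undefined
        mu1 = m1_sum / w1
        mu2 = (mu_total - m1_sum) / w2
        var_between = w1 * (mu1 - mu_total) ** 2 + w2 * (mu2 - mu_total) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

Accumulating the class probability and mean incrementally keeps the search linear in the number of gray levels.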

Valley-emphasis method
To overcome the drawbacks of Otsu's method, Ng et al. [8] proposed the valley-emphasis method. The idea is to select a threshold value that has a small probability of occurrence (a valley in the gray-level histogram) and that also maximizes the between-group variance, as in Otsu's method. The formulation for the valley-emphasis method is

$$t^* = \arg\max_{0 \le t < L-1} \Bigl\{ (1 - p_t)\,\bigl(\omega_1(t)\,\mu_1^2(t) + \omega_2(t)\,\mu_2^2(t)\bigr) \Bigr\}$$

where $p_t$ is the probability of occurrence of gray level $t$, and $\omega_k(t)$ and $\mu_k(t)$ are the probability and the mean gray level of class $C_k$, as in Otsu's method. The extra weight factor $(1 - p_t)$ favors thresholds with a small probability of occurrence $p_t$. Hence the name valley-emphasis: the selected threshold tends to reside at a valley of the histogram. For images with a clearly bimodal distribution, the valley-emphasis method should give a threshold value that is very close to the one generated by Otsu's method because both methods attempt to maximize the between-group variance of the histogram.
The segmentation experiment done previously using Otsu's method is repeated using the valley-emphasis method, as shown in Figure 5. The segmentation result is clearly much better, and it can be used in the subsequent analysis steps.
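The valley-emphasis criterion differs from Otsu's only in the weight applied to each candidate threshold, as the following plain-Python sketch shows. The function name and the list-based histogram are illustrative assumptions; the objective is the weighted form $(1 - p_t)(\omega_1 \mu_1^2 + \omega_2 \mu_2^2)$ described by Ng et al. [8].

```python
def valley_emphasis_threshold(hist):
    """Return the threshold t that maximizes (1 - p_t) * (w1*mu1^2 + w2*mu2^2).

    hist[i] is the number of pixels with gray level i.
    """
    total = sum(hist)
    p = [count / total for count in hist]
    mu_total = sum(i * p_i for i, p_i in enumerate(p))

    best_t, best_score = 0, -1.0
    w1 = 0.0       # cumulative probability of class 1
    m1_sum = 0.0   # cumulative sum of i * p_i for class 1
    for t in range(len(p) - 1):
        w1 += p[t]
        m1_sum += t * p[t]
        w2 = 1.0 - w1
        if w1 <= 0.0 or w2 <= 0.0:
            continue  # one class is empty
        mu1 = m1_sum / w1
        mu2 = (mu_total - m1_sum) / w2
        # The factor (1 - p[t]) penalizes thresholds placed on a histogram peak.
        score = (1.0 - p[t]) * (w1 * mu1 ** 2 + w2 * mu2 ** 2)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

For the symmetric histogram `[5, 20, 5, 1, 5, 20, 5]`, gray levels 2 and 3 give the same between-class variance, and the weight factor selects level 3, the bottom of the valley.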

Defect isolation
Because the mangoes are green, we use the G channel as the main channel, since defects are much easier to observe in it. To make the defects stand out, we use a simple linear contrast enhancement as shown in [3]. The results in Figure 6 illustrate the effectiveness of the contrast enhancement. After image enhancement, we apply another round of valley-emphasis segmentation on the area of the mango mask to isolate the defect zones. The result is shown in Figure 7.
To reduce the computational effort, we consider only defects with an area of at least 30 pixels. After segmenting the defect zones in the previous steps, we use their sizes and locations on the original image to generate the defect candidates for the subsequent classification steps, as shown in Figure 8.
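A linear contrast enhancement restricted to the mango area can be sketched as a min-max stretch of the G channel. The exact mapping used in the source is not given, so this is only one plausible form; the function name and the list-of-lists image representation are assumptions made for illustration.

```python
def stretch_green(green, mask):
    """Linearly stretch G values inside the mask to the range 0..255.

    green and mask are lists of rows with the same shape; pixels outside
    the mask (background) are set to 0.
    """
    values = [g for row_g, row_m in zip(green, mask)
              for g, m in zip(row_g, row_m) if m]
    if not values:
        raise ValueError("mask contains no object pixels")
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Flat region: nothing to stretch, keep the masked values unchanged.
        return [[g if m else 0 for g, m in zip(row_g, row_m)]
                for row_g, row_m in zip(green, mask)]
    # Integer arithmetic keeps the mapping exact for 8-bit inputs.
    return [[(g - lo) * 255 // span if m else 0 for g, m in zip(row_g, row_m)]
            for row_g, row_m in zip(green, mask)]
```

Computing the minimum and maximum only inside the mask prevents the removed background from compressing the useful intensity range.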
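The size filter and the extraction of defect candidates can be sketched with a simple connected-component pass over the binary defect mask. The 30-pixel minimum area comes from the text; the function name, the 4-connectivity, and the bounding-box output format are assumptions made for illustration.

```python
from collections import deque


def defect_boxes(defect_mask, min_area=30):
    """Return (top, left, bottom, right, area) for each defect zone.

    Zones are 4-connected regions of nonzero pixels; zones smaller than
    min_area pixels are discarded.
    """
    rows, cols = len(defect_mask), len(defect_mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not defect_mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(r, c)])
            seen[r][c] = True
            area, top, bottom, left, right = 0, r, r, c, c
            while queue:
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and defect_mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                boxes.append((top, left, bottom, right, area))
    return boxes
```

Each returned box can then be used to crop the corresponding region from the original image for classification.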

Defect classification
Many kinds of defects degrade the quality of mangoes [9]. The four most commonly seen kinds are shown in Figure 9: stripe-type scars, dark patches, sap burns, and small spots. The defect classification steps tell us how many kinds of defects are present on the fruit skin area, as shown in Figure 10.

Color features
We use an artificial neural network whose inputs are color features, shape features, and image statistical information. Li et al. [10] suggested that using the HSV (HSI) color space instead of RGB improves segmentation results. In this research, we use 18 H bins, 3 S bins, and 3 V bins, which gives 18 × 3 × 3 = 162 features in HSV space.
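The 162 color features correspond to a joint histogram over the three HSV channels. A minimal sketch using only the Python standard library is given below; the uniform bin edges, the normalization to relative frequencies, and the function name are assumptions, since the source only states the number of bins per channel.

```python
import colorsys


def hsv_histogram(pixels_rgb, h_bins=18, s_bins=3, v_bins=3):
    """Joint HSV histogram of 8-bit RGB pixels, flattened to a feature list.

    Returns h_bins * s_bins * v_bins relative frequencies (162 by default).
    """
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels_rgb:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a channel value of exactly 1.0 falls in the last bin.
        h_idx = min(int(h * h_bins), h_bins - 1)
        s_idx = min(int(s * s_bins), s_bins - 1)
        v_idx = min(int(v * v_bins), v_bins - 1)
        hist[(h_idx * s_bins + s_idx) * v_bins + v_idx] += 1.0
    count = len(pixels_rgb)
    return [x / count for x in hist] if count else hist
```

Hue receives many more bins than saturation and value, which reflects that hue carries most of the information needed to distinguish defect types by color.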

Shape features
To calculate shape features, we used the moment invariants proposed by Hu [11], with the practical implementation provided by OpenCV as in [12].

The classification results are summarized in Table 1. The statistics show that the classification accuracy decreases as the number of defect zones increases, and the computation time grows as well. The results are promising for application in an automated sorting and grading system. In the current version, no acceleration techniques have been applied; in the near future, parallel programming on graphics processing units (GPUs) could be used to speed up the process, with the aim of reaching real-time performance.
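To illustrate what the Hu moment invariants used as shape features measure, the first two of the seven invariants can be computed directly from a binary mask, as sketched below. This is a didactic sketch with a hypothetical function name, not the implementation used in this work; in practice, OpenCV's `cv2.moments` and `cv2.HuMoments` return all seven values.

```python
def hu_first_two(mask):
    """First two Hu moment invariants of the nonzero pixels of a binary mask."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    m00 = len(points)
    if m00 == 0:
        raise ValueError("mask contains no object pixels")
    cx = sum(x for x, _ in points) / m00
    cy = sum(y for _, y in points) / m00
    # Second-order central moments (translation invariant).
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    # Normalized central moments: divide by m00^(1 + (p + q) / 2) = m00^2.
    n20, n02, n11 = mu20 / m00 ** 2, mu02 / m00 ** 2, mu11 / m00 ** 2
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4.0 * n11 ** 2
    return h1, h2
```

Both values stay the same when the defect zone is translated, rotated by 90 degrees, or mirrored, which is what makes them usable as position-independent shape descriptors.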

Conclusion
In this work, we have established an integrated framework for an automated grading, sorting, and weighing system for Cat-Chu mangoes using features including weight, size, and external defects. We found a simple, easy-to-calculate relationship between three size parameters and mango mass. The estimation error is very small: less than 3% if we use a mechanical measurement tool and less than 5% if we use an optical measurement based on top- and side-view image captures. We also proposed a procedure to classify external defects based on an artificial neural network. The classification error is less than 8% in the worst case.
The results indicate that our system has great potential for use in a real industrial setting. Future work will investigate other features, such as ripeness and bruises, to increase the effectiveness and practicality of the system, and will explore speedups toward real-time performance using graphics processing units (GPUs) and further code parallelism.

Figure 1. Typical postharvest steps of agricultural products.

Figure 2. Equipment setup for measurement by lab instruments: (a) a Vernier caliper to measure size, (b) the water replacement method to measure volume, and (c) a planimeter to calculate area.

Figure 3. Correlation between mango masses and their sizes: L, T, and W.

Figure 5. Segmentation result using valley-emphasis method: (a) original image, (b) histogram with the valley-emphasis threshold level (red), and (c) resulting image.

Figure 6. Contrast improvement after background removal: (a) before and (b) after.

Figure 7. Defect zone isolation: (a) original image, (b) after background removal and contrast enhancement, and (c) defect isolation result.

Figure 8. Defect zones on the original image.

Figure 11. Our proposed feed-forward neural network with 193 inputs, 98 neurons in a hidden layer, and 4 outputs.

Table 1. Summary of classification results.