Image dataset of pomegranate fruits (Punica granatum) for various machine vision applications

A dataset is an essential requirement for any machine learning project. Collecting or creating a dataset in the agriculture domain is a highly challenging task because the domain itself is uncertain. The main objective of the present paper is to create an image dataset of pomegranate fruits of different grades. Accordingly, we considered the ‘Ruby’ cultivar of pomegranate and constructed the dataset with care. Fruits belonging to three grades are considered. Each fruit is imaged from three angles. The dataset also contains the weights of the fruits. The dataset consists of 12 folders named after their effective quality grades. The usefulness of this dataset has already been demonstrated in the authors' previous studies. The dataset should be helpful to data science engineers, machine learning programmers, and machine learning experts working in the field of precision agriculture.


Specifications
Subject: Agricultural Sciences
Specific subject area: Image dataset of pomegranate fruits (Punica granatum) of different grades
Type of data: Table, Image
How data were acquired: Data were acquired using two instruments, viz. (1) a weighing machine and (2) a custom-built image acquisition unit with provision to place the light source and cameras.
Data format: Raw
Parameters for data collection: The images were captured using a Logitech C905 720p webcam with a 2MP sensor and the Logitech (R) Webcam software Version 2.50, under CFL (Compact Fluorescent Light) as the light source.

Description of data collection
The weighing machine is used to measure the weight of each fruit sample. The image acquisition unit is a custom-built metallic compartment in which the object of interest (i.e., the pomegranate fruit), light sources and cameras can be positioned. The unit is designed so that each fruit can be imaged from three angles. The fruits are imaged, and their weights measured, every alternate day for a duration of eight days, resulting in four qualities within each grade of fruit.

Value of the Data
• The dataset is important as far as the grading of pomegranate fruits is concerned. Specifically, image datasets that include weight as a physical parameter are scarce. Hence the dataset is important for automated quality grading during post-harvest processing of pomegranate fruits.
• The dataset is made available in the public domain. Such a dataset is a valuable input for researchers building machine learning algorithms for quality grading of pomegranate fruits.
• The data may be used or reused in experiments on the quality grading of pomegranate fruits using various machine learning algorithms, apart from the algorithms that the authors have incorporated, as provided in the specifications table above. Moreover, researchers involved in automated quality inspection of other fruits may also benefit indirectly.
• The uniqueness of the dataset lies in combining grade-wise images of pomegranate fruits with their weights. Moreover, each fruit is imaged in three views.

Data Description
The dataset consists of three grades and four qualities for each grade. Accordingly, there are twelve folders. Each folder is titled after its corresponding quality grade as outlined in Table 6. Each folder contains 90 images, corresponding to three views of each of 30 sample fruits. We have used the following syntax in naming each image: Additionally, each folder contains an Excel sheet recording the weights of the 30 sample fruits of the corresponding quality category.
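The layout described above (twelve folders, 90 images each) lends itself to a quick consistency check after download. The following is a minimal sketch rather than part of the published dataset tooling. The image file extensions are an assumption, since the file format is not stated here, and the function names `count_images` and `audit` are hypothetical.

```python
from pathlib import Path

# Assumed image extensions; the source text does not state the file format.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}
EXPECTED_FOLDERS = 12  # 3 grades x 4 qualities, as stated in the text
EXPECTED_IMAGES = 90   # 30 fruits x 3 views per folder, as stated in the text


def count_images(dataset_root):
    """Return {folder name: number of image files} for each subfolder."""
    root = Path(dataset_root)
    counts = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Non-image files (e.g. the Excel sheet of weights) are skipped.
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def audit(dataset_root):
    """Return a list of problems; an empty list means the layout matches."""
    counts = count_images(dataset_root)
    problems = []
    if len(counts) != EXPECTED_FOLDERS:
        problems.append(
            f"expected {EXPECTED_FOLDERS} folders, found {len(counts)}"
        )
    for name, n in counts.items():
        if n != EXPECTED_IMAGES:
            problems.append(
                f"{name}: expected {EXPECTED_IMAGES} images, found {n}"
            )
    return problems
```

Calling `audit("path/to/dataset")` (a placeholder path) on an intact copy should return an empty list. Any folder with missing or extra images is reported by name.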

Experimental Design, Materials and Methods
One of the hardest problems that every programmer faces in developing machine vision applications is the availability of the right dataset. Machine learning depends heavily on data, without which it is impossible to train any algorithm to recognize patterns. Data is the most important aspect that makes algorithm training achievable. The accuracy of the training depends heavily on the quality of the input dataset [1]. Creating such a dataset is not always easy.
There are distinctive problems associated with the agricultural and horticultural industries, viz. (1) high post-harvest losses, (2) labor requirements, (3) subjectivity, (4) tediousness, (5) inconsistency, etc. One of the main causes of lowered product quality is the large number of post-harvest losses that occur at various stages of marketing [2]. However, studies have shown that the problems associated with post-harvest losses can be effectively addressed by combining digital image processing and machine learning techniques at various stages of post-harvest processing.
Post-harvest handling of fruits is vital in the horticulture domain, as fruits are an important supplement to the human diet. Moreover, fruit production in India has held an average share of 31.3% of the total production of horticultural crops over the last 5 years [3]. Pomegranate stands out among fruits because India is one of the biggest producers of pomegranates in the world and there is enormous potential for exporting pomegranate fruits from India.
Grading is one of the important steps of post-harvest management and is used to arrive at reasonable pricing of pomegranates in both domestic and export markets. Continued advances in image processing and machine learning can provide effective tools and techniques for building systems capable of grading pomegranate fruits, provided that the learning algorithms are given the right dataset. Accordingly, the goal of the present work is to build a dataset of pomegranate fruits that aids in developing machine vision-based applications including grading, quality assessment and sorting.
To the best of our knowledge, there is no public dataset available specifically for grading pomegranate fruits. Hence, there is a great need to build such a dataset.
Various researchers around the globe are working on fruit grading using machine vision. The present section outlines a few of the research works involving pomegranate fruits. The authors in [8] classified diseased and healthy pomegranate fruits using their own set of images. The quality of pomegranates was evaluated non-invasively in [9] using locally sourced Turkish pomegranates. Non-destructive pomegranate fruit grading and classification was carried out in [10] using the cofilab dataset [8]. Disease identification on pomegranate fruits using image processing was carried out in [7] using custom-built images. Sunburn on pomegranate fruits was identified in [4] using custom-built images. From the literature review, it is evident that image datasets of pomegranate fruits are highly scarce. Hence there is a great need to build a dataset of pomegranate fruit images. Table 1 summarizes the previous works in connection with image processing of pomegranate fruits.
In the current work, the 'Ruby' cultivar of pomegranate fruit is considered and the dataset is built accordingly. Three grades are considered in collecting the dataset. Four persons were involved in the grading process along with the corresponding author, and the authors have expressed their sincere gratitude in the acknowledgement section to the personnel involved in this process. Table 2 gives a brief description of the data collection. Table 3 outlines the description associated with each of the three grades.
In the present work, pomegranate fruits are collected and preserved for a duration of eight days for the purpose of analysis. The storage conditions are as follows: Average Temperature: 22 °C, Wind: 20 km/h, Gust: 25 km/h, Humidity: 90%, Pressure: 1005 mb. The preserved fruit samples are imaged every alternate day. This analysis resulted in the creation of four qualities within each grade. Hence, a total of twelve classes of effective qualities were created. Since the fruit samples are preserved for some duration, each fruit is tagged so that it can be tracked. Fig. 1 shows sample tagging of pomegranate fruits preserved for analysis. The designations for each class label are given in Table 4. Table 5 summarizes the description of the effective qualities in brief. Readers of this article are encouraged to refer to the work carried out by the authors [6] for further details and applications of the qualities in each grade. The quality definitions are the same for each of the three grades; for example, the fruits belonging to G1Q1, G2Q1 or G3Q1 all share the visual characteristics and applications of Q1 and differ only in their physical characteristics.
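The combination of three grades and four qualities can be sketched in code as follows. This is a minimal illustration, assuming that all twelve class labels follow the `G<grade>Q<quality>` pattern seen in the examples G1Q1, G2Q1 and G3Q1; the full list of designations is given in Table 4. The helper names `class_labels` and `parse_label` are hypothetical.

```python
import re
from itertools import product

GRADES = (1, 2, 3)        # three grades, as stated in the text
QUALITIES = (1, 2, 3, 4)  # four qualities per grade, one per imaging session
FRUITS_PER_CLASS = 30     # sample fruits per class
VIEWS_PER_FRUIT = 3       # each fruit is imaged from three angles

# Assumed label pattern, inferred from the examples G1Q1, G2Q1 and G3Q1.
_LABEL_PATTERN = re.compile(r"G([1-3])Q([1-4])")


def class_labels():
    """Enumerate the twelve effective-quality class labels."""
    return [f"G{g}Q{q}" for g, q in product(GRADES, QUALITIES)]


def parse_label(label):
    """Split a label such as 'G2Q3' into (grade, quality) integers."""
    match = _LABEL_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a recognised class label: {label!r}")
    return int(match.group(1)), int(match.group(2))


labels = class_labels()
# 12 classes x 30 fruits x 3 views
total_images = len(labels) * FRUITS_PER_CLASS * VIEWS_PER_FRUIT
```

Under these assumptions `labels` holds twelve entries, and `total_images` follows directly from the stated counts of 30 fruits and three views per class.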

Weight measurement
Table 6 specifies the weighing machine used in the current work to measure the weight of each fruit sample.

Image acquisition unit
Images are formed by a combination of the source of illumination and the energy reflected by the objects in the scene [5]. In the present work, the compact fluorescent light is the illumination source and the objects are the pomegranate fruits. An image acquisition compartment was custom built for the purpose of image acquisition, thereby mimicking industrial packing lines. The compartment is a metallic one in which the object of interest (i.e., the pomegranate fruit), light sources and cameras can be positioned. This image acquisition unit is shown in Fig. 2. In order to cover the entire fruit surface area, each pomegranate fruit is imaged from three angles (views). Table 7 gives the specifications of the image acquisition. Finally, the time span of the data collection is summarized in Table 8.

Ethics Statement
There is no funding for the present effort. There is no conflict of interest. The data is available in the public domain.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.