QuinceSet: Dataset of annotated Japanese quince images for object detection

With long-term changes in temperature and weather patterns, ecologically adaptable fruit varieties are becoming increasingly important in agriculture. For the selection of candidate cultivars in fruit breeding and for yield prediction, fruit set characteristics at different growth stages need to be described and evaluated, which is largely done visually. This is a time-consuming and labor-intensive process that also requires sufficient expert knowledge. The annotated dataset for Japanese quince, QuinceSet, consists of images of Japanese quince (Chaenomeles japonica) fruits taken at two phenological developmental stages and annotated for detection and phenotyping. The first stage is after flowering, when the second fruit fall is over and the fruits have reached 30-50% of their final size; the second is the ripening stage, just before the fruits are harvested. Images from both stages, classified as unripe and ripe, were annotated with ground-truth regions of interest (ROIs), and the annotations are provided in YOLO format. The dataset contains 1515 high-resolution RGB .jpg images with the same number of .txt annotation files. The images were manually annotated using LabelImg software, and a total of 17,171 annotations were provided by the experts. The images were acquired on site at the Institute of Horticulture in Dobele, Latvia, under different weather conditions, at different times of the day, and from different capturing angles. The dataset contains both fully visible quinces and quinces partially obscured by leaves. Care was also taken to ensure that the foreground, which contains the leaves, has adequate brightness with minimal shadows, while the background is darker. The presented dataset will make it possible to increase the efficiency of the breeding process and yield estimation and to identify and phenotype quinces more reliably, and it may also be useful for breeding other crops.

© 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )

Value of the Data
• Japanese quince (Chaenomeles japonica) is a comparatively new fruit crop, so there is relatively little research on it. The more the fruit is analyzed, including by non-invasive methods, the less cultivation and selection questions have to rely on manual measurements. Therefore, the publicly available dataset for Japanese quince presented here, which includes data from unripe and ripe quince annotated with ground-truth ROIs for Japanese quince detection and phenotyping, should play a central role in helping breeders develop phenotyping strategies.
• The dataset contains annotated image data classified into two classes according to the phenological stage of development of Japanese quince: unripe and ripe. The first, unripe, is about a month after flowering, when the second fruit fall is over (underdeveloped fruit sets have already fallen) and the yield can be statistically predicted. The second, ripe, is when the fruits are fully ripe and the yield can be estimated.
• The precision agriculture community can benefit from these data to detect, evaluate, and monitor the Japanese quince breeding process and to test more effective and more accurate yield prediction methods.
• The presented dataset can be used by researchers for image processing pipelines and model calibration in computer vision, and for training, testing, and validating Convolutional Neural Networks and Vision Transformers.
• The dataset can be used by researchers to develop and train quince classification and recognition models, and to develop new phenotyping algorithms.
• Farmers can use cell phones in combination with other technological means (e.g., drones) to predict and evaluate the harvest of Japanese quince.
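As an illustration of the training, validation, and testing use mentioned above, the following minimal sketch partitions the 1515 image/annotation pairs into three reproducible subsets. The article does not prescribe any split; the 70/20/10 ratios, the seed, the function name `split_dataset`, and the file stems of the form `img_0000` are all assumptions made for this example only.

```python
import random


def split_dataset(stems, ratios=(0.7, 0.2, 0.1), seed=0):
    """Shuffle file stems reproducibly and cut them into train/val/test lists.

    The ratios are an assumption for illustration, not part of the dataset.
    """
    ordered = sorted(stems)            # fixed starting order for reproducibility
    rng = random.Random(seed)          # local RNG, does not touch global state
    rng.shuffle(ordered)
    n_train = int(ratios[0] * len(ordered))
    n_val = int(ratios[1] * len(ordered))
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],  # remainder goes to the test set
    }


# Hypothetical stems; each stem stands for one .jpg image and its .txt file.
stems = [f"img_{i:04d}" for i in range(1515)]
splits = split_dataset(stems)
```

Splitting by file stem keeps every image together with its annotation file. Because the images come from eleven genotypes and several capture dates, a split stratified by class or date may be preferable in practice.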

Background
Chaenomeles japonica is a diploid species belonging to the Maloideae (Rosaceae). It is a dwarf shrub originally from central and southern Japan. Japanese quince was brought to Europe as early as 1869 and has been appreciated ever since as an ornamental plant because of its showy, long-lasting flowering [4]. Latvia was one of the first countries in Europe to start, in the 1950s, breeding Chaenomeles japonica as a fruit crop for processing. For the last 30 years, Japanese quince has been well known as a fruit crop not only in the Baltic countries but also in Ukraine, Scandinavia, Germany, and Poland. The fruits are an interesting raw material for the food industry because of their nutritional value. It is known that fruit set and yield are strongly dependent on genotype [5]. In Latvia, the breeding of Japanese quince continued at LatHort in the 1990s with the aim of obtaining local cultivars adapted to the Latvian climate. Significant differences were found between genotypes in terms of productivity, fruit quality, fruit size, biochemical content, and other traits [6]. After evaluation at LatHort, three cultivars, 'Rasa', 'Darius', and 'Rondo', were selected and registered in Latvia. These cultivars are very productive (4-8 kg per bush at full crop); the fruits are relatively homogeneous, weigh 40-60 g, and ripen in early or mid-September [7]. Yield in Chaenomeles japonica is an example of a complex trait that depends on both the target population of environments, i.e., meteorological conditions, and the genotypes.
Currently, LatHort holds a rich collection of Japanese quince genetic material, and breeding is being actively pursued. A number of promising hybrids have been identified and are under detailed consideration for registration as new cultivars. The genotypes differ in shrub shape, yield, winter hardiness, disease resistance, fruit quality characteristics (including shape, color, and biochemical composition), and fruit ripening time. Table 1 summarizes some of the most important parameters of the registered varieties and prospective genotypes.
The process of breeding Chaenomeles japonica takes 15-20 years from crossing to a new variety. To select candidate varieties, the characteristics of several thousand seedlings must be described and evaluated, most of which is done visually. This is a time-consuming and labor-intensive process that also requires sufficient manpower. In addition, visual scoring is relatively subjective, and results may vary among evaluators [8]. Therefore, the utility of new Machine Learning (ML) based techniques for non-invasive fruit detection and phenotyping to improve yield performance should be evaluated, taking cost-benefit and human-centered considerations into account.
ML and Deep Learning (DL) techniques have shown very promising results in fruit classification and detection problems [9] and in yield quality evaluation [10]. A clean, well-curated image dataset in precision agriculture [11], supplemented with an image labelling tool [12], is the basic requirement for building accurate and robust ML models for real-time environments. Previous reviews on the task of fruit detection in the field have reinforced the choice of the RGB camera as the sensor of choice because it is inexpensive and easy to implement [13].

Image capturing
The Japanese quince images were taken in an orchard of the Institute of Horticulture in Dobele, located in the southern part of Latvia (Coord WGS84 56 °37 335 N, 23 °33 233 E). The images were taken on a 0.3 ha plot planted with Japanese quince of eleven genotypes, with an average shrub width of 0.7-1 m and an average canopy height of 0.5-0.9 m. The images of the Japanese quince were taken with a Samsung Galaxy A8 cell phone, see Fig. 1.
Before image capture, LatHort experts evaluated the condition of the plants and, based on the Japanese quince growth stage, the best time for imaging. The images were captured in a field environment in sunny, cloudy, and partly cloudy weather. The distance between the camera and the Japanese quince varied from a minimum of 15-20 cm, at which mainly the quinces were visible, through 20-50 cm, at which the quinces and branches were visible, and 50-70 cm, at which the fruits were visible together with the shrub, to a maximum of 1 m from the plant. Where the quinces were not evenly distributed in the shrub, the distance was chosen so that all quinces were captured. The images were captured from different angles, ranging from top view to side view of the quinces, and against different backgrounds.
The images were captured at two growth phases of Japanese quince and were accordingly divided into two phenological development stages: (a) unripe and (b) ripe. The first set was captured about one month after flowering, when the second fruit fall was over (underdeveloped fruit sets had already fallen) and the fruits had reached 30-50% of their final size. The second set was captured at the ripening stage of the quinces, just before the fruits were harvested. Since not all genotypes (cultivars and hybrids) ripen at the same time, three dates were chosen for the ripe stage. Data were thus collected at two different times for unripe quinces and at three different times for ripe quinces, see Table 2.
LatHort experts evaluated the captured Japanese quince images and divided them into two classes (labels) according to growth stage: (1) unripe and (2) ripe. The images of unripe Japanese quinces were acquired on 14 and 15 June 2021 in daylight. The images of ripe Japanese quinces were acquired on 16, 20, and 23 August 2021.

[Figure: Examples of ground-truth labelling of individual Japanese quinces using LabelImg software in scenes with varying levels of occlusion by other quinces and leaves. The first row shows ripe-class images with ROI annotations; the second row shows unripe-class images with the corresponding ROI annotations.]
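To make the annotation format concrete, the sketch below reads YOLO-format label text, in which each line holds `class x_center y_center width height` with coordinates normalized to the image size, and converts it to pixel bounding boxes. The mapping of class index 0 to unripe and 1 to ripe is an assumption for illustration and should be checked against the class list shipped with the dataset; the names `Box`, `parse_yolo_line`, and `parse_yolo_text`, as well as the example label lines and image size, are hypothetical.

```python
from collections import Counter
from typing import NamedTuple

# ASSUMPTION: class-index order; verify against the dataset's class list.
CLASS_NAMES = {0: "unripe", 1: "ripe"}


class Box(NamedTuple):
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_line(line: str, img_w: int, img_h: int) -> Box:
    """Convert one normalized YOLO line into a pixel-coordinate box."""
    cls, xc, yc, w, h = line.split()
    xc_px, w_px = float(xc) * img_w, float(w) * img_w
    yc_px, h_px = float(yc) * img_h, float(h) * img_h
    return Box(
        CLASS_NAMES[int(cls)],
        xc_px - w_px / 2, yc_px - h_px / 2,   # top-left corner
        xc_px + w_px / 2, yc_px + h_px / 2,   # bottom-right corner
    )


def parse_yolo_text(text: str, img_w: int, img_h: int) -> list:
    """Parse the full contents of one .txt annotation file, skipping blanks."""
    return [parse_yolo_line(ln, img_w, img_h)
            for ln in text.splitlines() if ln.strip()]


# Made-up annotation text for a hypothetical 1000 x 500 pixel image.
example = "0 0.5 0.5 0.2 0.4\n1 0.25 0.75 0.1 0.1\n"
boxes = parse_yolo_text(example, img_w=1000, img_h=500)
counts = Counter(box.label for box in boxes)
```

Applying the same per-class count across all annotation files gives a simple consistency check against the reported total of 17,171 annotations.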

Ethics Statement
This study did not involve experiments with humans or animals.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Data Availability
QuinceSet: Dataset of Annotated Japanese Quince Images for Object Detection (Original data) (Zenodo).