An annotated image dataset of vegetable crops at an early stage of growth for proximal sensing applications

This article introduces a dataset of 2 801 images of vegetable crops. It covers maize (Zea mays), bean (Phaseolus vulgaris) and leek (Allium ampeloprasum) crops at an early stage of development (between 2 and 5 weeks from seeding or transplanting). Two kinds of annotations are provided: (i) bounding boxes enclosing the crops of interest or their stems, weeds being left unannotated, and (ii) crop structures in the form of star graphs whose vertices are the plant organs (stems and leaves) and whose edges represent the connections between them. The images were captured in various production and experimentation plots in France using an acquisition module that controls light conditions. They present a wide variety of soil conditions, weed infestation levels and growth stages. This dataset can benefit precision hoeing and in-field crop monitoring applications that are based on proximal imagery.


Value of the Data
• This dataset is a collection of images of vegetable crops at an early stage of development. The images are annotated with crop bounding boxes and crop structures, i.e. the precise location of the organs (stem and leaves) and their relations. These data are valuable for developing automated solutions for precision hoeing and in-field crop monitoring applications.
• These data can benefit research institutes working on the automatic detection of crops as well as commercial applications exploiting Deep Learning models for real-time precision hoeing and crop monitoring.
• These data can be used to reproduce our results [2,3]. They can also benefit any research in computer vision and artificial intelligence that aims at developing new crop detection and localization algorithms, the performance reported in the related research articles serving as the baseline for experimental comparison.

Data Description
The dataset is composed of 2 801 JPEG images. Each image is uniquely identified by its file name, whose fields are:
• <date> is the date of image capture in ISO 8601 standard format YYYY-MM-DD,
• <location> is the location of the image capture,
• <label> is the crop species,
• <serie> is an optional number identifying the crop row, and
• <id> is the index of the crop in that row.
A collection of images that are not dated, not located or that do not have a label is also provided in the dataset. They present different soil conditions, stages of growth and weed infestations.

Bounding Boxes Annotations
2 610 images are annotated with rectangular bounding boxes. Each crop is annotated with two bounding boxes: one for the whole crop and another one centered on the point where the crop stem enters the soil. Fig. 2 illustrates the bounding box annotations on three images of the dataset. Whole crop annotations are depicted in red, green and orange, and stem annotations are depicted in pink, blue and yellow, respectively for maize, bean and leek crops.

The bounding box annotations are provided in the standard PASCAL VOC format [1] (XML file). Each bounding box has a label giving the crop species (maize, bean or leek) and indicating whether the bounding box represents a stem or not, e.g. the label of a maize crop is maize and the label of a bean stem is stem_bean. Each annotation file has the same name as the corresponding image except for the .xml file extension. Table 1 summarizes the number of whole crop and stem bounding box annotations of the dataset by crop type.

Table 1 Number of bounding box annotations in the dataset by crop type. "Crops" refers to the number of whole crop bounding box annotations while "Stems" refers to the number of stem bounding box annotations. The label "no-obj" refers to the images without any crop.

Crop Type    Images    Crops    Stems
Maize        1 065     2 264    2 274
Bean         779       2 913    2 918
Leek         601       3 070    3
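As an illustration of the label convention described above, the following minimal sketch reads a PASCAL VOC annotation with the Python standard library and separates whole crop boxes from stem boxes. The sample XML content, the file name example.jpg and the helper name parse_voc are hypothetical and only serve the example; the element names (object, name, bndbox, xmin, ymin, xmax, ymax) are those of the usual PASCAL VOC layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation content (not taken from the dataset); the element
# names follow the usual PASCAL VOC layout.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <object>
    <name>maize</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
  </object>
  <object>
    <name>stem_maize</name>
    <bndbox><xmin>190</xmin><ymin>230</ymin><xmax>210</xmax><ymax>250</ymax></bndbox>
  </object>
</annotation>"""

STEM_PREFIX = "stem_"  # stem labels are written stem_<species>, e.g. stem_bean


def parse_voc(xml_text):
    """Return a list of (species, is_stem, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name").strip()
        is_stem = label.startswith(STEM_PREFIX)
        species = label[len(STEM_PREFIX):] if is_stem else label
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(float(bndbox.findtext(key)))
            for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((species, is_stem, coords))
    return boxes


boxes = parse_voc(SAMPLE_XML)
for species, is_stem, coords in boxes:
    print(species, "stem" if is_stem else "crop", coords)
```

Splitting the label into a species and a stem flag makes it easy to pair each whole crop box with its stem box or to train a detector on one of the two box types only.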

Crop Structures Annotations
1 135 images are annotated with crop structures. A crop structure is composed of the whole plant bounding box and a star graph where the plant organs (stem and leaves) are the vertices and the connections between them are the edges. Fig. 3 illustrates the crop structure annotations on two images of the dataset. For each crop, the bounding box is depicted with a blue rectangle, the blue dot being the bounding box center. The stem keypoint is depicted by a green dot and the leaf keypoints are depicted by red dots connected to the stem by a red line.
There is no standard serialization format for star graph annotations, so crop structures are provided in a custom JSON format. Each annotation file has the same name as the corresponding image except for the .json file extension. Listing 1 presents an example of a JSON annotation file for one image containing one bean crop with two leaves:
• The top level field image_name identifies the image corresponding to this annotation.
• The other top level field objects lists the crop annotations of the image. Each crop annotation has:
  - a label field giving the crop species ("maize" or "bean"),
  - a box field storing the bounding box coordinates of the whole crop, and
  - a parts field storing a list of keypoint annotations. Each keypoint annotation has a kind field (either "stem" or "leaf") giving the keypoint type and a location field indicating the keypoint location in the image.

Table 2 Number of crop structure annotations in the dataset by crop type. "Images" refers to the number of images. "Crops/stems" refers to the number of whole crop bounding box annotations, which is equal to the number of stem keypoint annotations. "Leaves" refers to the number of leaf tip keypoint annotations. The label "no-obj" refers to the images without any crop.
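The fields described above can be turned into an explicit star graph with a few lines of Python. The sketch below is only illustrative: the field names image_name, objects, label, box, parts, kind and location come from the description, but the inner layout of box and location (here x/y dictionaries), the sample values and the helper name star_graphs are assumptions. The code treats box and location as opaque values, so it should not depend on that inner layout.

```python
import json

# Hypothetical annotation. Field names come from the text; the inner layout of
# "box" and "location" and all numeric values are assumptions for illustration.
SAMPLE_JSON = """
{
  "image_name": "example.jpg",
  "objects": [
    {
      "label": "bean",
      "box": {"x_min": 50, "y_min": 60, "x_max": 250, "y_max": 300},
      "parts": [
        {"kind": "stem", "location": {"x": 150, "y": 180}},
        {"kind": "leaf", "location": {"x": 90, "y": 100}},
        {"kind": "leaf", "location": {"x": 220, "y": 110}}
      ]
    }
  ]
}
"""


def star_graphs(annotation):
    """Build one star graph per crop: the stem is the center, leaves are the tips."""
    graphs = []
    for crop in annotation["objects"]:
        stems = [p["location"] for p in crop["parts"] if p["kind"] == "stem"]
        leaves = [p["location"] for p in crop["parts"] if p["kind"] == "leaf"]
        if len(stems) != 1:
            # A star graph has a single center, so one stem keypoint is expected.
            raise ValueError("expected exactly one stem keypoint per crop")
        center = stems[0]
        graphs.append(
            {
                "label": crop["label"],
                "box": crop["box"],
                "stem": center,
                "edges": [(center, leaf) for leaf in leaves],
            }
        )
    return graphs


graphs = star_graphs(json.loads(SAMPLE_JSON))
for graph in graphs:
    print(graph["label"], "with", len(graph["edges"]), "leaves")
```

A crop without any leaf keypoint simply yields an empty edge list, which keeps the representation uniform across growth stages.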
For each crop, there is exactly one stem keypoint and one bounding box as well as an arbitrary number of leaf keypoints, zero included. The bounding box and keypoint coordinates are expressed in pixels in the usual image coordinate system whose origin is the top-left corner. Table 2 summarizes the number of crop structure annotations in the dataset and details the number of annotated stem and leaf keypoints.

Additionally, the dataset contains 4 text files listing the images used for the training and validation of the deep neural networks presented in our published articles. The two files sdnet-train.txt and sdnet-valid.txt list the images used for the training and the validation of SDNet [3]. The two files yolo-train.txt and yolo-valid.txt list the images used for the training and the validation of the neural networks presented in [2].
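Since annotation files share the name of their image except for the extension, a split listing can be mapped to annotation files by swapping the suffix. The following sketch assumes that each listing file contains one image file name per line, which is not stated in this article; the directory name annotations, the image names and the helper name annotation_paths are likewise hypothetical.

```python
from pathlib import Path


def annotation_paths(listing_text, annotation_dir, suffix):
    """Map each listed image name to the annotation file with the same stem.

    Assumes one image file name per line (an assumption, not a documented format).
    """
    paths = []
    for line in listing_text.splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        # Same name as the image, except for the file extension.
        paths.append(Path(annotation_dir) / Path(name).with_suffix(suffix).name)
    return paths


# Hypothetical listing content standing in for a file such as sdnet-train.txt.
listing = "image_a.jpg\nimage_b.jpg\n"
structure_files = annotation_paths(listing, "annotations", ".json")
box_files = annotation_paths(listing, "annotations", ".xml")
print([p.name for p in structure_files])
print([p.name for p in box_files])
```

In practice the listing text would be read from disk, for example with Path("sdnet-train.txt").read_text(), before being passed to the helper.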

Experimental Design, Materials and Methods
The images were acquired at four different locations in France: INRAE Montoldre, CTIFL Prigonrieux, Fermes Larrère at Liposthey and Bordeaux Sciences Agro in Bordeaux. Three species of vegetable crops are covered by this dataset: maize (Zea mays), bean (Phaseolus vulgaris) and leek (Allium ampeloprasum). The crops are at an early stage of development (2 to 5 weeks from seeding). The crop rows have not been hoed or treated with phytosanitary products. Natural weeds (mostly purslane (Portulaca oleracea), black nightshade (Solanum nigrum) and couch grass (Elymus repens)) or manually seeded ones (mustard (Sinapis alba), ryegrass (Lolium spp.), matricaria (Matricaria chamomilla) and lamb's quarters (Chenopodium album)) may be present.
The images are acquired with the acquisition module depicted in Fig. 4. The camera (Basler acA2500-gc equipped with a C125-0418-5M F1.8 f4 mm Basler lens for the current acquisition module) faces the soil at an elevation h between 35 cm and 45 cm from the ground. The light conditions are artificially controlled: a hull isolates the camera from the exterior and two 20 W LED panels provide a constant and homogeneous scene illumination.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.