Dataset of annotated food crops and weed images for robotic computer vision control

Weed management technologies that can identify weeds and distinguish them from crops require artificial intelligence solutions based on computer vision to enable the development of precisely targeted, autonomous robotic weed management systems. A prerequisite for such systems is robust and reliable object detection that can unambiguously distinguish weeds from food crops. One of the essential steps towards precision agriculture is using annotated images to train convolutional neural networks to distinguish weeds from food crops; detection can then be followed by mechanical weed removal or selective spraying of herbicides. In this data paper, we present an open-access dataset of manually annotated images for weed detection. The dataset is composed of 1118 images in which 6 food crop and 8 weed species are identified, with 7853 annotations in total. Three RGB digital cameras were used for image capture: Intel RealSense D435, Canon EOS 800D, and Sony W800. The images were taken of food crops and weeds grown in a controlled environment and in field conditions at different growth stages.


Specifications

Subject
Agronomy and Crop Science, Computer Vision and Pattern Recognition.

Specific subject area
Object classification, Object detection, Object recognition, Crop growth and development.

Type of data
Image. Annotations.

How data were acquired
Data were acquired by capturing RGB images with three digital cameras (Intel RealSense D435, Canon EOS 800D, and Sony W800) in field conditions and in a controlled environment.

Description of data collection
The dataset consists of two directories: images, containing 1118 images of food crops and weeds, and annotations, containing the 1118 corresponding annotation XML files. The annotation files include 7853 annotations of two classes: food crops (six species), 441 annotations, and weeds (eight species), 7442 annotations.
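Because every image has a counterpart XML file, a quick consistency check before training is to pair the two directories by file name. The following is a minimal Python sketch under stated assumptions: the directory names images and annotations come from the description above, while the accepted image extensions and the convention that an image and its XML file share the same file stem are assumptions, and pair_images_with_annotations is a hypothetical helper, not part of the dataset.

```python
from pathlib import Path

# Assumption: image files use one of these extensions; adjust if the
# released files differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def pair_images_with_annotations(root):
    """Match each file in root/images with root/annotations/<stem>.xml.

    Hypothetical helper. Assumes an image and its annotation share the
    same file stem, as is usual for Pascal VOC-style datasets.
    Returns (pairs, stems_without_xml, stems_without_image).
    """
    root = Path(root)
    # Map file stem -> path for every image file with a known extension.
    images = {
        p.stem: p
        for p in (root / "images").iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    }
    # Map file stem -> path for every XML annotation file.
    xmls = {p.stem: p for p in (root / "annotations").glob("*.xml")}

    common = sorted(images.keys() & xmls.keys())
    pairs = [(images[s], xmls[s]) for s in common]
    missing_xml = sorted(images.keys() - xmls.keys())
    orphan_xml = sorted(xmls.keys() - images.keys())
    return pairs, missing_xml, orphan_xml
```

On a complete copy of the dataset, a call such as pair_images_with_annotations("path/to/dataset") would be expected to return 1118 pairs and two empty lists; a non-empty list points to a missing or misnamed file.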

Data source location
Municipalities: Jelgava, Kekava, Rujiena, and Krimulda, Latvia.

Value of the Data
• The dataset presents images of food crops and weeds at their seedling growth stages, together with their manual annotations. It can be useful for agronomists and researchers in different fields working on precision agriculture and computer vision tasks.
• The dataset is open access and can be used by researchers and engineers to construct their own food crop and weed recognition algorithms.
• The dataset can be used for benchmarking deep learning algorithms, recognizing objects, constructing models, and navigating robots.
• The dataset can also be used to train, test, and validate convolutional neural networks.

… was used in a controlled environment. The data are organized into two directories: images and annotations. All raw images were annotated manually by human experts. The experts were asked to mark food crop and weed species with closed polygons and to assign a type (food crop or weed) to each polygon. All pixels that lie inside a polygon inherit the label from the polygon. The annotated dataset contains both the polygon information and the crop/weed annotation images.
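The rule that every pixel inside a polygon inherits the polygon's label can be illustrated with a point-in-polygon test. The sketch below is an illustration under assumptions, not the annotation tool used by the authors: it represents each polygon as a list of (x, y) vertices in pixel coordinates, uses hypothetical label ids (0 background, 1 food crop, 2 weed), tests each pixel at its centre with the even-odd ray-casting rule, and lets later polygons overwrite earlier ones where they overlap.

```python
# Hypothetical label ids; the text above only names the two classes.
BACKGROUND, FOOD_CROP, WEED = 0, 1, 2


def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) test: is point (x, y) inside the closed polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # ray cast towards +x crosses this edge
                inside = not inside
    return inside


def rasterize(width, height, labeled_polygons):
    """Build a height x width label mask from (label, polygon) pairs.

    Each pixel is tested at its centre (col + 0.5, row + 0.5); pixels
    inside a polygon take its label, later polygons overwrite earlier ones.
    """
    mask = [[BACKGROUND] * width for _ in range(height)]
    for label, polygon in labeled_polygons:
        for row in range(height):
            for col in range(width):
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = label
    return mask
```

The per-pixel loop costs width × height × vertices operations per polygon and is written for clarity; for full-resolution images, a vectorized or scanline rasterizer would be the practical choice.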

Experimental set up
The food crop species were chosen based on their popularity among consumers in Latvia and on the need to implement intensive weed management solutions.
Two types of images are included in the dataset: (i) images of food crops and weeds cultivated in vegetation pots under controlled greenhouse conditions; (ii) images of food crops and weeds in open field conditions. Greenhouse images were taken at the Scientific Institute for Plant Protection Research "Agrihorts", University of Life Sciences and Technologies of Latvia, Jelgava, Latvia. Field images come from three locations in Latvia: Kekava, Rujiena, and Krimulda. The digital images were captured from above the plants using perspective projection.
Eight weed species commonly found in vegetable fields were selected for the dataset: goosefoot (Chenopodium album), catchweed (Galium aparine), field pennycress (Thlaspi arvense), shepherd's purse (Capsella bursa-pastoris), field chamomile (Matricaria perforata), wild buckwheat (Polygonum convolvulus), field pansy (Viola arvensis), and quickweed (Galinsoga parviflora). Six food crops were selected: beetroot (Beta vulgaris), carrot (Daucus carota var. sativus), zucchini (Cucurbita pepo subsp. pepo), pumpkin (Cucurbita pepo), radish (Raphanus sativus var. sativus), and black radish (Raphanus sativus var. niger). The food crops and weeds are listed in the Table below.

In the greenhouse, plants were grown in vegetation pots under natural and artificial light. A peat substrate with the following characteristics was used: pH 6.0, moisture content < 65%, peat fraction < 20.0 mm, N 12.0%, P₂O₅ 14.0%, K₂O 24.0%, Te 1.0 kg m⁻³. In each vegetation pot, seeds were sown in one or two rows with a spacing of 2.0–5.0 cm. The seedling boxes were watered once or twice per week, and the temperature was set to +20 °C.

Field images were taken on organic vegetable farms at the three locations specified above, before the farmers carried out weeding. To capture the images, a four-wheeled platform was constructed, with a digital camera attached in the middle of the platform and its lens pointing downwards. The platform was moved across the field in different directions to photograph the plants. Afterwards, weeds and crops at early growth stages were manually marked in the pictures with ground-truth bounding boxes (see Fig. 2).
Python-based software was developed to annotate the images. In each image, individual food crops and weeds were labeled, taking into account that the green leaf area of a crop should not overlap with the leaves of other crops. The labels of each image, together with their coordinates, were stored in the annotations folder. The XML annotation files follow the annotation rules of the popular Pascal VOC dataset: http://host.robots.ox.ac.uk/pascal/VOC/voc2007/
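Since the XML files follow Pascal VOC conventions, they can be read with Python's standard xml.etree.ElementTree module. The sketch below assumes the usual VOC layout (a filename element, a size element with width and height, and one object element per box containing name and a bndbox with xmin, ymin, xmax, ymax); the class names actually written to the name elements of this dataset are not stated here, so the functions pass through whatever strings they find. parse_voc and count_labels are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def _to_int(text):
    # Some VOC writers store coordinates as "12.0"; go through float first
    # (fractional values are truncated towards zero).
    return int(float(text))


def parse_voc(source):
    """Parse one Pascal VOC-style XML annotation (path or file object).

    Returns (filename, (width, height), boxes), where boxes is a list of
    (class_name, xmin, ymin, xmax, ymax) tuples in pixel coordinates.
    Assumes every file has <size> and every <object> has a <bndbox>.
    """
    root = ET.parse(source).getroot()
    filename = root.findtext("filename")
    width = _to_int(root.findtext("size/width"))
    height = _to_int(root.findtext("size/height"))
    boxes = []
    for obj in root.findall("object"):  # one <object> per annotated plant
        bb = obj.find("bndbox")
        boxes.append((
            obj.findtext("name"),
            _to_int(bb.findtext("xmin")),
            _to_int(bb.findtext("ymin")),
            _to_int(bb.findtext("xmax")),
            _to_int(bb.findtext("ymax")),
        ))
    return filename, (width, height), boxes


def count_labels(sources):
    """Count bounding boxes per class name over many annotation files."""
    counts = Counter()
    for src in sources:
        _, _, boxes = parse_voc(src)
        counts.update(box[0] for box in boxes)
    return counts
```

Passing all files from the annotations directory to count_labels yields per-class box totals that can be compared against the annotation counts reported for the dataset.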

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.