Mapping of Rumex obtusifolius in nature conservation areas using very high resolution UAV imagery and deep learning

Rumex obtusifolius (Rumex or broad leaved dock) is one of the most common weeds in grasslands. It spreads quickly, lowers the nutritional value of the grass, and is poisonous for livestock due to its oxalic acid content. Mapping it is important before any control treatment is applied. Current methods for mapping Rumex involve either manual work or the use of ground robots, neither of which is efficient in large fields. This study investigated the feasibility of using aerial images from unmanned aerial vehicles (UAV) and deep learning to map Rumex in grasslands. Seven pre-trained CNN models were tested using transfer learning on UAV images acquired at 10 m, 15 m, and 30 m height. Based on cross validation results, MobileNet performed best in detecting Rumex at 10 m height, with an F1-score of 78.36% and an AUROC of 93.74%. At 15 m, the detection performance was lower (F1-score = 72.00%, AUROC = 88.67%), but the results showed that the performance can increase with more data. Experiments also showed that Rumex detection depended on the flight height, since the algorithm was unable to detect the plants at 30 m height. The code and the datasets used in this work were released in an open access repository to contribute to the advances in grassland management using UAV technology.


Introduction
Grasslands are an important agricultural ecosystem in Europe, representing 35% of the utilized agricultural area (Smit et al., 2008). Most grassland production is destined for fodder, so grasslands are essential for livestock nutrition and production. Grasslands have been shown to reduce soil erosion (Liu et al., 2020) and to store CO2 (Yang et al., 2019), which cuts carbon losses from agricultural land to the atmosphere and helps mitigate climate change. Nevertheless, the area devoted to grassland in Europe has been decreasing in recent years, mainly for two reasons: (1) the production costs are high and (2) the opportunity cost relative to sowing another crop is high. Among the production costs, pest control represents a large share of the expenditure where chemical products are restricted (organic farming), since weeds are removed manually (Flaten et al., 2020). Therefore, research should focus on developing methods to detect unwanted weeds at early stages and prevent their spread through the fields.
Rumex obtusifolius (Rumex or broad leaved dock) is one of the most common weeds in production grasslands in the Netherlands (Valente et al., 2019). It is a perennial, herbaceous plant of the knotweed family (Polygonaceae) and reaches heights of 50 to 120 cm (Mosyakin, 2005). Rumex is a weed because it competes with grass for water, light, and nutrients (mainly nitrogen) (Hiremath et al., 2013), reducing crop yield and lowering its edibility, since it is poisonous for livestock (Krištálováv et al., 2011). Rumex has a high dissemination capacity. A single plant produces over 60,000 seeds per year in its flowering period (Holm et al., 1977). Of those, 80% are viable seeds, which will germinate and produce new plants. Viable seeds are long-lived and can germinate after burial for 21 years (Toole and Brown, 1946). Rumex control is mainly done by spraying pesticides. However, chemicals are only effective at the rosette stage because they are systemic products that affect only the leaves and the stem of the plant. As Rumex has a taproot, a lot of energy is stored in the root. Hence, the pesticide affects the leaves and the stem, but not the root, which allows a quick regrowth of the weed (Cavers and Harper, 1964). In extensively used grassland and in conservation areas, large-scale herbicide usage is banned to preserve biodiversity. Frequent and early mowing is also forbidden to protect ground-breeding bird species and plant diversity. Both restrictions can result in massive spreading of broad leaved dock and thus complicate the cultivation of the grassland. There are other methods that do not use chemicals, for instance crop rotation and biological control (van Evert et al., 2009; Bond and Grundy, 2001). Another alternative is removing the weeds manually or using ground robots (Kounalakis et al., 2019; Lottes et al., 2016; van Evert et al., 2011).
However, all the mentioned approaches are inefficient in large grassland fields since they are either expensive, time-consuming and/or labour intensive. Consequently, there is a strong need to find an automatic detection method to control Rumex, among other weeds.

Related work
In recent years, there has been significant progress in the fields of object detection and image classification due to the increase in computation power and the availability of large datasets (Ciresan et al., 2012; Russakovsky et al., 2015; Simonyan and Zisserman, 2015). These advances also apply to weed detection, recognition, and management. Already in 2018, Kamilaris and Prenafeta-Boldú (2018) published a survey of 40 papers that used deep learning techniques to address agricultural problems, including weed detection, and found that these techniques outperformed the traditional methods implemented until then. The most challenging step in weed detection is to distinguish between weed and crop species (Wang et al., 2019). To overcome this issue, Brown and Noble (2005) used both spectral and spatial features to identify weeds in crops. Similarly, Hamylton et al. (2020) applied a CNN that leveraged information from both the high spatial resolution and the spatial context of the raster grids of UAV images to map island vegetation. Gao et al. (2018) implemented a semi-automatic Object-Based Image Analysis procedure with Random Forest to classify crop, weeds, and soil with an accuracy of 94.50%.
Grasslands have a great diversity of vegetation, which makes it challenging for algorithms to distinguish targeted weed species from the rest of the grass. Hence, deep learning applications for weed detection in grassland are quite limited. Gebhardt et al. (2006) and Gebhardt and Kühbauch (2007) carried out two consecutive studies in which image segmentation, local homogeneity calculation, and morphological operations were used to segment homogeneous regions, with detection rates from 71% to 95%. Nevertheless, their algorithms were trained with images taken by hand at close range, which is not feasible in large fields. Lam et al. (2021) developed an open source method that can be applied in site-specific weed management, with an F1-score of 78.65%. However, they implemented only one CNN architecture (VGG16), many elements of their workflow required manual intervention, and it needed commercial software.
Ground robots have already been used to detect weeds in grasslands. However, they have several disadvantages compared with UAVs, mainly the coverage time and the adaptation to the terrain. First, the robot used by van Evert et al. (2011) covered one hectare in three hours, whereas a UAV flies one hectare in less than half an hour, depending on the flight height and the resolution to be acquired. Second, ground robots need to adapt to the irregularities of the terrain (slope, rocks, pits…), which might damage them or interrupt their task, whereas UAVs are not affected by ground difficulties. A combination of both technologies is a good solution to speed up weed detection. Binch et al. (2018) flew a UAV for faster weed detection; it then informed the ground robot about the location of the weeds, directing the robot to the exact location and enabling it to spray the weeds from the ground. Their algorithm achieved 83-95% accuracy, with very specific flight parameters and a flight height of 8 feet. Some size distortions were introduced by the varying flight height, which needs to be controlled in future studies.
As a continuation of our previous research (Valente et al., 2019), UAV imagery is used in this research to map Rumex in grasslands. In this study, seven pre-trained convolutional neural network (CNN) models are applied to high-resolution aerial imagery acquired from an Unmanned Aerial Vehicle (UAV) over wide grasslands to evaluate their feasibility for detecting Rumex plants. To evaluate the performance of the deep learning algorithms, k-fold cross validation was implemented separately for the 10 m and 15 m datasets. To address the flight height limitation mentioned in our previous paper, three different flight heights (10 m, 15 m, and 30 m) are tested to find the best configuration for Rumex mapping in high-resolution UAV imagery. Moreover, the code and dataset used for this study are available in an open access repository for the scientific community.

Materials and methods
The overall methodology of this study is subdivided into two stages: development and mapping (Fig. 1). The development stage consists of two phases, namely preprocessing and training. During preprocessing, an orthomosaic and its ground truth are transformed into a format suitable for training a deep learning model. This transformed data is used to compare different deep learning models using k-fold cross validation. The best model is then trained using all the data. The mapping stage involves applying the trained model to a new orthomosaic to detect and map the Rumex locations. The details of both the development and mapping stages are explained in the remainder of this section.

Study area
The dataset used in this study was acquired in a grassland field located in Germany, near the city of Kleve. This field belongs to the Salmorth reserve and is used by researchers of the nature conservation center in the Kleve district (Naturschutzzentrum im Kreis Kleve e.V). The exact location of the field is presented in Fig. 2 (Centre of Field 1: 5748143 N, 714617 E. Centre of Field 2: 5748301 N, 714692 E). All the grassland fields along the Rhine river to the West of the city of Emmerich (close to the Dutch-German border) were infested with Rumex obtusifolius.

Data acquisition and annotation
The aerial imagery was acquired using a Phantom 3 Professional UAV on April 17th, 2018. Three flight heights were used (10 m, 15 m, and 30 m), with spatial resolutions of 5.9 mm, 6.4 mm, and 8.3 mm, respectively. The UAV images and the Ground Control Points (GCP) measured in the field were used to generate very high-resolution orthomosaics with the photogrammetry software Agisoft Metashape (St. Petersburg, Russia, version 1.7.3). The parameters chosen to generate the orthomosaics were the following: medium accuracy, 400 k key points, and 100 k tie points. The orthomosaics used in this work have been published in a public repository to be used as a benchmark for testing future work in this field (Valente and Kooistra, 2021). Fig. 3 shows the orthomosaic of Field 1 divided into four quadrants (four main cardinal points) and the location of the labelled Rumex plants marked with red bounding boxes. Dividing the orthomosaic into four quadrants was not an indispensable step, but was done to simplify the preprocessing steps (see Section 3.3) and to ease the division into training, validation and test sets.
Experts of the nature conservation centre in the Kleve district located Rumex plants in the orthomosaics and labelled them using Matlab (Massachusetts, USA, version 9.4). They found that they could not identify all the broad leaved dock plants in the 10-m- and 15-m-orthomosaics, and that the labelled plants amounted to approximately 80% of all the Rumex plants present in the field. In the 30-m-orthomosaic, they could not identify any Rumex plant due to the low spatial resolution, which means that no ground truth data was available for Field 2. Table 1

Preprocessing
The first step during the preprocessing phase was to divide the orthomosaics into non-overlapping image patches of size 256 × 256 pixels implementing a sliding window operation, as described in (Valente et al., 2019). Each image patch was labelled as Rumex (positive case) or Other (negative case) based on the ground truth bounding boxes provided by the experts. When the ground truth box overlapped with the image patch, the corresponding patch was labelled as Rumex. Otherwise, it was labelled as Other. Fig. 4 shows some examples of image patches with and without Rumex. The resulting patches were used in a cross validation setting to evaluate the model performance. The details of the CV procedure are explained in Section 3.6.
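The tiling and labelling rule described above can be illustrated with a minimal sketch. It is not the code from the study's repository; the function names, the box format (pixel corner coordinates), the choice to drop incomplete patches at the image border, and the rule that any overlap of non-zero area counts as an overlap are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share a non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def label_patches(width: int, height: int, boxes: List[Box],
                  patch: int = 256) -> List[Tuple[Box, str]]:
    """Tile an image into non-overlapping patches and label each one.

    A patch is labelled "Rumex" if any ground truth box overlaps it,
    otherwise "Other". Incomplete patches at the right and bottom
    borders are skipped (an assumption, not stated in the source).
    """
    labelled = []
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            window = (x, y, x + patch, y + patch)
            is_rumex = any(boxes_overlap(window, box) for box in boxes)
            labelled.append((window, "Rumex" if is_rumex else "Other"))
    return labelled


# Toy example: a 512 x 512 image with one box straddling two patches.
example = label_patches(512, 512, [(250, 10, 260, 20)])
example_labels = [label for _, label in example]
```

In this toy example the single box crosses the boundary between the two upper patches, so both are labelled Rumex, which shows how one plant can contribute more than one positive patch under this rule.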

Deep learning models
Several off-the-shelf deep learning models were compared for their efficacy in classifying Rumex on grasslands with UAV imagery. They were also used to establish a benchmark. The deep learning models considered are listed in Table 2. Large conventional models such as VGG, ResNet and DenseNet were tested, along with smaller models (ShuffleNet, MobileNet, EfficientNet, MNASNet) that were developed for computational efficiency, meaning that they can be run on devices with limited hardware capacity.
PyTorch (version 1.7) (Paszke et al., 2017) was chosen to implement and train these models. The library includes these models pretrained on the ImageNet database (Deng et al., 2009), which enables them to classify 1000 different object classes. Since there are only two classes in this study (Rumex and Other), the final layer of each model was modified to solve a binary classification problem instead of a 1000-class problem. Each model was fine-tuned by means of transfer learning (Yosinski et al., 2014) with the Rumex dataset. The training was carried out using the Adam optimizer, and early stopping was used to ensure good generalization (Goodfellow et al., 2016).
All hyperparameters' values were set to default except the learning rate, which was determined separately by testing a range of values as described in (Smith, 2017). The code in this study is available at the following GitHub repository https://github.com/satnih/rumex.

Performance metrics
Several metrics were used to characterize the performance of the classification models, including accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUROC) (Fawcett, 2006). The first four metrics are derived from the confusion matrix of a binary classifier, which consists of a 2 × 2 matrix that groups the predictions into true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN), as presented in Table 3. The AUROC is a performance metric for discrimination between positive and negative cases. It is the area under the ROC curve, which plots the true positive rate (TPR) against the false positive rate (FPR). It takes values in the range [0, 1], where a random classifier has a score of around 0.5 and a perfect classifier a score of 1. While the confusion matrix and the associated metrics are meant to analyze the actual classification decision of a model, AUROC summarises the model performance across all potential classification decisions. All the metrics are defined in Eqs. (1)-(6).
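As a rough illustration of how these metrics relate to the confusion matrix, the sketch below uses the standard textbook definitions, which are assumed (not confirmed) to match the paper's equations. The function names are hypothetical, and the AUROC is computed with the pairwise-ranking formulation, which is equivalent to the area under the ROC curve but only practical for small samples.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-score from confusion matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,  # identical to the true positive rate (TPR)
        "f1": f1,
    }


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Labels are 1 (Rumex) or 0 (Other).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy numbers, not results from the study.
toy_metrics = confusion_metrics(tp=8, tn=80, fp=2, fn=10)
toy_auroc = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

With these toy counts the accuracy is 0.88 while the F1-score is only about 0.57, which illustrates why accuracy alone is not informative when the Other class dominates.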

Cross Validation
Since there is a limited number of image patches, cross validation (CV) was used to evaluate all the models on both the 10 m and 15 m datasets to ensure there was no bias in the analysis. For the 10 m dataset, the patches from the four quadrants were randomly shuffled and split into five folds using stratified sampling. This ensured that each of the five folds had the same proportion of Rumex and Other patches, which matters because the classes are imbalanced: there are in total 481 Rumex patches and 1288 Other patches (Table 1). After splitting the data, 5-fold CV was used to evaluate each model. In each iteration, one fold was used as the test set, another was used as the validation set and the remaining three were used as the training set. This procedure was repeated five times, so that each fold served once as the test set. Each model's performance was determined by averaging the metrics of the five folds. For the 15 m dataset, 2-fold CV was used instead of 5-fold CV due to limited data points (129 Rumex and 393 Other). Table 4 displays the performance of the algorithms trained with the 10 m-orthomosaic. The model names are suffixed with "-10" to indicate that they are trained with 10 m-data. Due to the high class imbalance, Table 4 also includes the accuracy of a default classifier that always predicts an image patch to belong to the majority class (in this case, Other). Thus, the accuracy of the default classifier is simply the percentage of Other patches in the test set. It can be seen that all models performed better than the default classifier. MobileNet-10 has the highest AUROC (92.68%) and F1-score (77.46%) of all the models. However, the other models (except MNASNet) are not far behind. The minimum AUROC and F1-score obtained (excluding MNASNet) are 90.70% and 73.18%, respectively. However, the standard deviations of the F1-score of DenseNet-10 and EfficientNet-10 are very high (14 and 10, respectively), indicating that their performance was not consistent across different folds.
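The 5-fold splitting scheme for the 10 m dataset can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function names are hypothetical, the round-robin assignment is just one way of obtaining stratified folds, and using the next fold as the validation set is an assumption, since the text only states that another fold was used. The rotation needs at least three folds, so it does not cover the 2-fold setting used for the 15 m dataset.

```python
import random
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[str], k: int = 5,
                     seed: int = 0) -> List[List[int]]:
    """Shuffle indices per class and deal them round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cv_splits(folds: List[List[int]]) -> Iterator[Tuple[list, list, list]]:
    """Yield (train, validation, test) index lists, rotating the test fold."""
    k = len(folds)
    if k < 3:
        raise ValueError("this rotation needs at least three folds")
    for i in range(k):
        val_i = (i + 1) % k  # assumption: the next fold is the validation set
        train = [idx for j, fold in enumerate(folds)
                 if j not in (i, val_i) for idx in fold]
        yield train, folds[val_i], folds[i]


# Patch counts of the 10 m dataset as reported in the text.
labels = ["Rumex"] * 481 + ["Other"] * 1288
folds = stratified_folds(labels)
rumex_per_fold = [sum(labels[i] == "Rumex" for i in fold) for fold in folds]
splits = list(cv_splits(folds))

# Accuracy of a default classifier that always predicts "Other",
# evaluated here on the whole 10 m dataset.
majority_share = labels.count("Other") / len(labels)
```

With these counts every fold holds 96 or 97 Rumex patches, and the share of Other patches is 1288 / 1769, roughly 72.8%, which is the level that any useful model has to exceed in accuracy.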
Based on these results, MobileNet-10 was chosen for the rest of the study. Moreover, MobileNet is one of the smallest and most computationally efficient of the models compared, with about 300,000 parameters.

Results
To better understand the performance of MobileNet-10, the predictions of each fold were analyzed in more detail. For instance, Table 5 shows the confusion matrix of a fold that was the test set in one iteration. Overall, there were 34 misclassified patches, which can be divided into three categories based on the type of error: 1) wrong labels, 2) image artifacts, and 3) small plants. Fig. 5 shows some example patches from these three categories. The first row consists of patches that were mislabelled as Other. The model correctly identified them as Rumex, but they were counted as false positives in the confusion matrix. The second row shows patches with image artifacts, because of which the algorithm failed to detect Rumex. Finally, the last row shows patches with very small Rumex plants, which the model could not detect. Overall, MobileNet-10 failed only on difficult cases, indicating its robustness.

Flight height of 15 meters
MobileNet-10 was also tested on the 15 m-orthomosaic to examine its re-usability. As expected, its performance was poorer on 15 m-data than on 10 m-data (F1-score = 71.22% vs 77.46%, respectively). A likely reason for this reduction is that the Rumex plants appear smaller in the 15 m-orthomosaic, whereas the algorithm was trained on plants at the scale of the 10 m-orthomosaic. To solve this problem, the model was fine-tuned with some images of the 15 m-data. This updated model is referred to as MobileNet-15. Despite the limited training data, there was a significant improvement in performance, as shown in Table 6. In particular, the F1-score increased from 71.22% to 78.16%, indicating that the model could be adapted to new environments efficiently with a small amount of training data. In future studies, more data can be introduced to check whether the model's performance can increase further.

Flight height of 30 meters
MobileNet-10 was also tested on the 30-m-orthomosaic, shown in Fig. 6. The 30-m-orthomosaic did not come with ground truth data because the experts could not identify any Rumex plant in it (mainly due to the spatial resolution). MobileNet-10 also failed to detect Rumex plants in it. These results indicated that the flight height of 30 m and the 8.3 mm spatial resolution were limiting factors for Rumex detection.

Discussion
MobileNet-10 was able to correctly classify 93.84% of the image patches of an orthomosaic. Furthermore, it showed promising results when tested on the 15-m-data. Nevertheless, it should be noted that all data was acquired on the same day and under similar light conditions. Therefore, more research is required to assess the robustness of the method, for example, in different weather conditions and at different latitudes. However, the experiments carried out on 15-m-data showed that the same development methodology used to train MobileNet-10 can be used to update the model to new conditions. The reason is that MobileNet-15 was also trained with images from the 15 m flight height, which were more similar to the test dataset for that case. Therefore, the algorithm learnt some specific patterns of that dataset and consequently Rumex plants were better detected.
In the existing literature on Rumex detection, other approaches achieved similar or higher performance, such as van Evert et al. (2011), who achieved 93.00% accuracy, and Ahmed et al. (2014), who achieved an accuracy of 98.50%. Nevertheless, these studies used close range images with spatial resolutions of around 2 mm, taken either by hand or with a robot, whereas this study uses UAV imagery with 5.9 mm of spatial resolution. While it is more difficult to detect Rumex from a larger height, flying higher also makes it possible to cover larger areas, which eases scalability. In another related study, Lam et al. (2021) proposed an automated open-source workflow for mapping Rumex in grasslands. However, the proposed workflow has many elements that require manual intervention, for instance, the resolution parameter of the orthophoto in the WebODM software (ODM, 2020), which needs to be set manually on a case-by-case basis. Further, they claim to provide an open-source methodology while some elements of the workflow require the use of the commercial software Agisoft Metashape. While this software can be replaced by an open-source alternative, like QGIS (QGIS Development Team, 2009), that alternative also requires manual setting of several parameters. Furthermore, the authors do not demonstrate the generalisation ability of the model to new environments, nor do they provide a repository with their code and datasets.
It took only 4 s to obtain the results displayed in Table 4 for MobileNet-10 on Field 1. This is not yet fast enough for a real-time application, but a promising future research direction is to explore smaller and more efficient deep learning models and methods to map Rumex in a grassland during the UAV's flight.
Finally, while this study focused on mapping Rumex obtusifolius in grasslands, the same methodology should be tested to map other weeds or plants in grasslands. Moreover, future studies could focus on the detection and analysis of Rumex plants at an earlier development stage to plan weed control strategies ahead of time.

Conclusion
This study demonstrated that it is feasible to use deep learning models on UAV imagery to map Rumex obtusifolius in grasslands with very limited training data. Seven different deep learning models were explored. Among those, MobileNet was the most suitable for this application, with an F1-score of 78.36%, a recall of 79.76%, and an AUROC of 93.74%. Experiments showed that a model trained at a specific flight height does not directly generalize to different heights. Cross-validation confirmed that transfer learning on a small amount of additional data can be used to adapt the model to new flying conditions. Moreover, we found that 5.9 mm of spatial resolution is the minimum resolution required for Rumex detection using UAV. Finally, the code and the datasets used in this work are published in an open access repository to contribute to the advances in grassland management using UAV technology.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 5 (caption, partial)
The second and the third rows represent patches with image artifacts and small Rumex plants, respectively, in which MobileNet-10 failed to detect Rumex plants.

Table 6
Performance of MobileNet-10 and MobileNet-15. MobileNet-10 is the MobileNet model trained with 10 m-data, whereas MobileNet-15 is the updated version of MobileNet-10, which is fine-tuned with 15 m-data. Default-15 is the default classifier that always predicts an image patch to belong to Other.