Mapping Invasive Herbaceous Plant Species with Sentinel-2 Satellite Imagery: Echium plantagineum in a Mediterranean Shrubland as a Case Study

Abstract: Invasive alien plants (IAPs) pose a serious threat to biodiversity, agriculture, health, and economies globally. Accurate mapping of IAPs is crucial for their management, to mitigate their impacts and prevent further spread where possible. Remote sensing has become a valuable tool in detecting IAPs, especially with freely available data such as Sentinel-2 satellite imagery. Yet remote sensing methods for mapping herbaceous IAPs, which tend to be more difficult to detect, are still limited, particularly in Mediterranean-type shrubland ecosystems. There is a growing need to detect herbaceous IAPs at a large scale for monitoring and management; however, for countries or organizations with limited budgets, this is often not feasible. To address this, we aimed to develop a classification methodology based on optical satellite data to map herbaceous IAPs, using Echium plantagineum in the Fynbos Biome of South Africa as a case study. We investigate the use of freely available Sentinel-2 data, apply the robust non-parametric Random Forest classifier, and identify the most important variables in the classification, all within the cloud-based platform Google Earth Engine. Findings reveal the importance of the shortwave infrared and red-edge parts of the spectrum, and of including vegetation indices in the classification, for discriminating E. plantagineum. Here, we demonstrate the potential of Sentinel-2 data, the Random Forest classifier, and Google Earth Engine for mapping herbaceous IAPs in Mediterranean ecosystems.


Introduction
Invasive alien plants (IAPs) pose a significant threat to many ecosystems [1] and to agriculture globally [2]. The growing number of invasive plant species, which include agricultural weeds, compromises ecosystem stability and threatens economic productivity [3][4][5][6]. The costs and impacts of IAPs are extensive, ranging from reduced grazing land and poisoned livestock [7] to the loss of native species and lowered biodiversity [8][9][10][11].
Reliable invasive species distribution maps are important for targeting management of infestations, modelling future invasion risk [12], assisting with policy decisions, strategic allocation of funding, effective implementation of control programmes [13][14][15][16] and planning for ecological restoration [17,18]. Traditional mapping of weeds is time-consuming, rapidly outdated, labor-intensive and expensive, and relying heavily on field-based surveys has costs and contamination of wool by seeds [54]. The species contains pyrrolizidine alkaloids, which are toxic to grazing animals and may cause their death if eaten over prolonged periods [55,56]. Echium plantagineum is a prolific seed producer, with heavy infestations yielding up to 10,000 seeds per square meter [47,57], which may remain in the soil for up to six years [52,58]. To date, very little research has been conducted on detecting E. plantagineum with multispectral satellite imagery. Ullah et al. [59] used Landsat Thematic Mapper (TM) imagery to detect dense, flowering E. plantagineum in Australia; however, because no accuracy assessment was provided, it is not possible to quantify how successfully the species was detected. In another study, McIntyre [60] found that E. plantagineum could not be distinguished from pastures and crops in high-resolution aerial imagery, but that the species could be detected more reliably, and with higher accuracy, using EO-1 Hyperion hyperspectral satellite imagery [60,61]. Hyperspectral imagery has been used successfully to detect other herbaceous IAPs such as musk thistle (Carduus nutans) [62] and hoary cress (Cardaria draba) [63], while multispectral satellite imagery has been used to detect herbaceous IAPs such as giant hogweed (Heracleum mantegazzianum), knotweed (Fallopia sp.) and Striga (Striga hermonthica), with mixed results [37][38][39].
While the most common approach for species detection is based on differences in spectral signatures, phenological approaches, which are based on seasonal events such as flowering, have proven to be effective in detecting some invasive plant species [12,64,65]. Distinct phenological patterns can provide opportunities for remote detection if the invasive species under consideration has a different seasonal or inter-annual growth pattern compared to its co-occurring species [12,35], or floral characteristics that may assist in spectrally discriminating a plant species from its background such as flowering Acacia salicina and Acacia saligna trees [64], giant hogweed (Heracleum mantegazzianum), knotweeds (Fallopia spp.) [35] and E. plantagineum [59,60]. High temporal resolution satellite sensors such as Sentinel-2 that capture images every few days have the capability of detecting a species in near real time, allowing for opportune interventions and timely species monitoring and management [21,39,66,67]. In terms of supporting long-term and large-scale mapping of IAPs, especially in financially constrained countries, it is necessary to explore freely available multispectral datasets such as Sentinel-2 in conjunction with advanced machine learning algorithms [23].
Other remote sensing studies aimed at detecting IAPs in Mediterranean-type ecosystems have mostly focused on trees, but also on some shrubs and grasses [68][69][70][71][72]. Detection of herbaceous IAPs in a Mediterranean ecosystem using only freely available multispectral satellite imagery has not been performed, probably partly because the success of remote sensing analyses declines as site complexity increases [73]. The case of E. plantagineum within the Fynbos Biome of South Africa provides an ideal study system to address this gap. Here, we aim to develop a classification methodology to detect herbaceous IAPs using the cloud-based platform Google Earth Engine (GEE), a Random Forest classifier and Sentinel-2 data, with E. plantagineum as a case study. Specifically, we aim to answer two questions: (1) Can Sentinel-2 imagery be used to discriminate the invasive alien plant species E. plantagineum? (2) Which variables are important for detecting E. plantagineum in satellite imagery?

Study Area
Our study area is situated within three municipalities: the City of Cape Town, Stellenbosch and Drakenstein. These municipal areas lie within the Fynbos Biome, which is part of the Cape Floristic Region (CFR) of South Africa. The CFR, with its high native species richness and levels of endemism, is a biodiversity hotspot [74,75]; it is the smallest but most biologically diverse of the world's six plant kingdoms, and is recognized as a UNESCO World Heritage Site [76,77]. The Fynbos Biome is one of the most species-rich floristic regions in the world [78]; however, it is also the most invaded of all biomes in South Africa [3,41]. The Fynbos Biome has a Mediterranean-type climate with hot, dry summers and mild, wet winters, and a mean annual precipitation of 480 mm [79]. Fynbos, the dominant vegetation within the biome, is low-growing shrubby vegetation characterized by fine-leaved evergreen shrubs [42,80]. Study sites within the Fynbos Biome with known occurrences of E. plantagineum that could easily be accessed and sampled were selected (Figure 1).
Geomatics 2023, 3, FOR PEER REVIEW 4

Study Species
Echium plantagineum is an erect, softly hairy annual, occasionally biennial, herbaceous plant that can grow 20-60 cm tall [81]. In the Fynbos Biome it often occurs in pastures along with other pasture species, weeds and grasses, but it also inhabits road verges, abandoned fields and natural areas [47]. The species is an aggressive, drought-tolerant IAP that adapts to many conditions and soil types [58] and readily colonizes fallow land and disturbed areas [7,54]. The main flowering period begins in late winter and can last three to five months [82]. In our study region, the species usually flowers in early to mid-spring (September-October) for two to three months, although this varies with moisture availability and temperature [58]. During this time, its distinctive purple flowers can often be seen across expansive tracts of land where E. plantagineum occurs. In most instances, however, the species occurs as scattered patches of varying size and in narrow stretches along roadsides, where roadside management (mowing) helps to maintain its presence by encouraging regrowth [57] and removing competition from other species. Seedlings usually emerge and develop as rosettes in late winter, then produce stems that elongate and flower in spring. These plants usually die in the summer months; however, plants growing near areas with high moisture content (e.g., roadsides, rivers, dams) may continue to flower during summer and die in late summer or early autumn [58].

Training and Validation Data
The process of image classification requires ground truth data to train and verify model results [83]. Reference data for E. plantagineum presence samples were collected using a handheld Garmin 62S Global Positioning System (GPS) with an accuracy of 3 m. Presence samples were collected from 9 to 14 October 2020 to capture the species during its peak flowering stage, as we hypothesized that the flowering species is spectrally discernible from co-occurring plant species. Data were collected at grazing lands and open fields within the study area where E. plantagineum was present. The species commonly occurs with other herbaceous species, graminoids and bare ground, which often makes its detection in imagery challenging. At the study sites, E. plantagineum had more than 50% cover and was the only profusely flowering purple species at the time of data collection. Echium plantagineum sample points were collected as individual points where the species dominated, as close to the centre of a plot or field as possible. Because Sentinel-2 has a spatial resolution of 10-20 m, sample points were buffered by 10 m to include surrounding pixels of the class, while maintaining sample precision by remaining within 20 m. In addition, random sample points were assigned to the other representative land cover classes present in the Sentinel-2 image: bare ground, built-up, agriculture, shrubland and water. Validation points were selected from the labelled samples by randomly splitting them into 70% training and 30% validation data [84][85][86] within GEE (Table 1).
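The random 70/30 split can be illustrated with a short Python sketch. It mirrors the common GEE pattern of attaching a uniform random value to each labelled sample and thresholding it; the function name `split_samples`, the seed and the integer sample identifiers are illustrative assumptions, not the authors' script.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_fraction: float = 0.7,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly assign each sample to the training or validation set.

    Each sample receives a uniform random number in [0, 1); samples below
    `train_fraction` go to training, the rest to validation. The resulting
    proportions are therefore approximately (not exactly) 70/30.
    """
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    train: List[T] = []
    validation: List[T] = []
    for sample in samples:
        (train if rng.random() < train_fraction else validation).append(sample)
    return train, validation


# Hypothetical usage: 1000 labelled sample identifiers.
train, validation = split_samples(list(range(1000)))
print(len(train), len(validation))
```

Because each sample is assigned independently, the split is only approximately 70/30; an exact split would instead shuffle the samples and cut the list at 70%.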

Image Acquisition and Processing
The Copernicus Sentinel-2 mission is a constellation of two identical optical satellites, Sentinel-2A and Sentinel-2B, launched by the European Space Agency (ESA) in June 2015 and March 2017, respectively [87]. Sentinel-2 provides free, global multispectral imagery (13 spectral bands) at medium-high spatial resolution (10-60 m), with a combined revisit time of 5 days for the two satellites. Furthermore, Sentinel-2 offers novel spectral capabilities, with two near infrared (NIR) bands (NIR and narrow NIR), three red-edge bands and two shortwave infrared (SWIR) bands, which are valuable for vegetation detection [87].
Sentinel-2 Level-2A Surface Reflectance imagery acquired on 12 October 2020 was used, as this date coincided with the period when E. plantagineum presence samples were collected. Bands 1 (coastal aerosol), 9 (water vapour) and 10 (SWIR-cirrus) were excluded because they are designed for atmospheric applications and provide no land cover information. The remaining ten image bands (Table 2) were then clipped to the extent of the study region.
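The band selection can be written down as a small lookup table. The sketch below lists the 13 Sentinel-2 MSI bands with their nominal spatial resolutions and drops the three atmospheric bands; the names `SENTINEL2_BANDS` and `land_bands` are illustrative, and the resulting list is the kind of band list one would pass to an image `select` call in GEE.

```python
# Sentinel-2 MSI bands: name -> (description, nominal spatial resolution in m).
SENTINEL2_BANDS = {
    "B1": ("Coastal aerosol", 60),
    "B2": ("Blue", 10),
    "B3": ("Green", 10),
    "B4": ("Red", 10),
    "B5": ("Red edge 1", 20),
    "B6": ("Red edge 2", 20),
    "B7": ("Red edge 3", 20),
    "B8": ("NIR", 10),
    "B8A": ("Narrow NIR", 20),
    "B9": ("Water vapour", 60),
    "B10": ("SWIR - cirrus", 60),
    "B11": ("SWIR 1", 20),
    "B12": ("SWIR 2", 20),
}

# Bands designed for atmospheric applications, excluded from the classification.
ATMOSPHERIC_BANDS = {"B1", "B9", "B10"}


def land_bands(bands=SENTINEL2_BANDS, exclude=ATMOSPHERIC_BANDS):
    """Return the band names retained for classification, in sensor order."""
    return [name for name in bands if name not in exclude]


selected = land_bands()
print(selected)
# ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B11', 'B12']
```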
Sixteen vegetation indices (Table 2) were computed from the Sentinel-2 imagery in GEE using the visible, near-infrared (NIR) and red-edge image bands. These indices were selected for their ability to separate vegetation from non-vegetation and to account for bare soil and atmospheric influences, and because NIR and red-edge indices are useful for vegetation mapping [20,88]. The most commonly used vegetation index for separating vegetation from non-vegetation and for assessing vegetation vitality and stress is the Normalized Difference Vegetation Index (NDVI) [89]. The Enhanced Vegetation Index (EVI) has proven useful for vegetation classification because it simultaneously corrects for atmospheric influences and soil background [90]. The Normalized Difference Red Edge Index (NDRE) is similar to NDVI but uses the red-edge band in place of the red band, and is known to be effective in detecting healthy vegetation [91]. One of the earliest vegetation indices is the Simple Ratio (SR), the ratio of the NIR band to the red band. The more healthy green vegetation a pixel contains, the greater the contrast between NIR and red reflectance, and hence the higher the NIR-to-red ratio [92]. The Soil Adjusted Vegetation Index (SAVI), which corrects for the influence of soil background, was also included. In addition, we assessed the Atmospherically Resistant Vegetation Index (ARVI), which is similar to NDVI but minimizes the effects of atmospheric scattering [90,105]. Furthermore, we included the Green Normalized Difference Vegetation Index (GNDVI), (NIR - Green)/(NIR + Green) [97], which is like NDVI except that it uses the green band instead of the red band; this index is more sensitive to chlorophyll concentration than NDVI.
We also included the Infrared Percentage Vegetation Index (IPVI), NIR/(NIR + Red) [98]; this index is functionally equivalent to NDVI (IPVI = (NDVI + 1)/2) but is computationally faster. Table 2 also lists the NDGI, computed as (Green - Red)/(Green + Red). Additionally, we included the Normalized Flower Red-Edge (NFRE) index proposed by [60] for flowering E. plantagineum, adapted to Sentinel-2 image bands for testing.
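A per-pixel sketch of several of these indices, using their common textbook forms, is given below. It assumes surface reflectance scaled to 0-1 (GEE stores Sentinel-2 Level-2A reflectance multiplied by 10,000, so values would first be divided by 10,000). Two choices are assumptions rather than the study's Table 2 definitions: NDRE is computed with B8 and B5, and NFRE is written as a normalized difference of B5 and B4. Only a subset of the 16 indices is shown.

```python
from typing import Dict


def _nd(a: float, b: float) -> float:
    """Normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def vegetation_indices(r: Dict[str, float]) -> Dict[str, float]:
    """Compute a subset of vegetation indices for one pixel.

    `r` maps Sentinel-2 band names to surface reflectance in the range 0-1.
    """
    blue, green, red = r["B2"], r["B3"], r["B4"]
    red_edge, nir = r["B5"], r["B8"]
    soil_l = 0.5  # common SAVI soil-adjustment factor
    return {
        "NDVI": _nd(nir, red),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
        "NDRE": _nd(nir, red_edge),        # assumption: B8 with B5
        "SR": nir / red,
        "SAVI": (1 + soil_l) * (nir - red) / (nir + red + soil_l),
        "ARVI": _nd(nir, 2 * red - blue),  # red-blue term with gamma = 1
        "GNDVI": _nd(nir, green),
        "IPVI": nir / (nir + red),         # equals (NDVI + 1) / 2
        "NDGI": _nd(green, red),
        "NFRE": _nd(red_edge, red),        # assumption: form of adapted NFRE
    }


# Hypothetical reflectance values for a single vegetated pixel.
pixel = {"B2": 0.05, "B3": 0.08, "B4": 0.10, "B5": 0.20, "B8": 0.40}
vi = vegetation_indices(pixel)
print(round(vi["NDVI"], 3), round(vi["IPVI"], 3))  # 0.6 0.8
```

The IPVI result illustrates why it carries the same information as NDVI: it is a linear rescaling of NDVI from [-1, 1] to [0, 1].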

Image Classification-Random Forest
Based on the success of the Random Forest classifier in detecting other invasive plant species [38,40,64,106,107], we used this classifier in our case study. The Random Forest classifier provides an assessment of the importance of different variables during the classification process; variables or features (e.g., image bands and vegetation indices) with a higher importance value contribute more to classification accuracy. However, variable importance is sensitive to two Random Forest parameters: the number of trees and the number of variables per split. Choosing optimal parameters for the Random Forest classifier can improve the stability of variable importance scores and help to discriminate between important and unimportant variables [108]. Hyperparameters are parameters specified by the user to control the training process of the Random Forest algorithm. In GEE, the number of trees must be explicitly defined, while other parameters take default values if not specified. The number of trees was incrementally tested within GEE from 10 to 1000 to determine the optimal number of trees, i.e., the value that provided the best overall accuracy. Variables per split was set to the square root of the number of variables; the minimum leaf population, bag fraction and seed were set to their default values of 1, 0.5 and 0, respectively; and no limit was set on the maximum number of leaf nodes (max. nodes) (Table 3).
Parameter | Value | Description
Min. Leaf Population | 1 | Only create nodes whose training set contains at least this number of points.
Bag Fraction | 0.5 | The fraction of input to bag per tree.
Max. Nodes | No limit | The max. number of leaf nodes in each tree.
Seed | 0 | The randomization seed.
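The parameter set and the tree-count search can be sketched in Python. The dictionary keys follow the argument names of GEE's `ee.Classifier.smileRandomForest`; flooring the square root for variables per split is an assumption, and `evaluate` is a hypothetical callable standing in for "train a forest with n trees and return its overall accuracy on the validation samples". The synthetic accuracy curve below is invented purely to exercise the search and does not reproduce the study's results.

```python
import math
from typing import Callable, Dict, Iterable, Optional, Tuple


def rf_parameters(n_variables: int, number_of_trees: int) -> Dict[str, Optional[float]]:
    """Random Forest settings used in the study, keyed by GEE argument names."""
    return {
        "numberOfTrees": number_of_trees,
        # Square root of the number of variables (flooring is an assumption).
        "variablesPerSplit": math.floor(math.sqrt(n_variables)),
        "minLeafPopulation": 1,
        "bagFraction": 0.5,
        "maxNodes": None,  # None = no limit on leaf nodes
        "seed": 0,
    }


def tune_number_of_trees(evaluate: Callable[[int], float],
                         candidates: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Return the tree count with the highest overall accuracy.

    Ties are broken in favour of fewer trees, which are cheaper to run.
    """
    scores = {n: evaluate(n) for n in candidates}
    best = max(scores, key=lambda n: (scores[n], -n))
    return best, scores


def toy_accuracy(n_trees: int) -> float:
    """Synthetic accuracy curve that rises and then plateaus at 200 trees."""
    return 0.90 + min(n_trees, 200) / 10_000


best, scores = tune_number_of_trees(toy_accuracy, range(10, 1001, 10))
print(best)  # 200
```

Preferring the smallest tree count among equally accurate settings reflects the trade-off noted in the results: beyond the optimum, extra trees add computation without improving accuracy.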
An advantage of the Random Forest classifier is that the algorithm itself can be used for feature selection [110]. Feature selection methods select a subset of the original variables that are most valuable for predicting classes in the classification [111,112]. Feature selection indicates which variables (image bands and vegetation indices) are most relevant for a classification and ranks them in order of importance [110]. Selecting the optimal variables and removing highly correlated bands can help avoid bias in the results, and the reduction in data dimensionality simplifies processing and can improve classifier performance [113]. Here, the Random Forest algorithm was used to rank the image bands (10) and indices (16) (Table 2) according to their usefulness in discriminating the target classes [110]. Variable importance was calculated using the Mean Decrease in Gini (MDG). The MDG measures the total decrease in Gini impurity produced by splits on a given variable, averaged over all trees in the forest; the larger the MDG value, the more the variable contributes to producing pure (single-class) nodes, and thus the more important the variable [114,115].
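The Gini-based importance measure can be sketched from first principles. The code below computes Gini impurity, the size-weighted impurity decrease of a single split, and an MDG-style score that sums these decreases per variable and averages over trees. The toy two-tree "forest" is invented for illustration; GEE reports its own importance scores through the classifier's `explain()` output, and this sketch does not claim to reproduce GEE's internal computation.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent: Sequence[Hashable], left: Sequence[Hashable],
                  right: Sequence[Hashable]) -> float:
    """Impurity decrease of one split, weighting each child by its size."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# One split: (variable, parent labels, left-child labels, right-child labels).
Split = Tuple[str, Sequence[Hashable], Sequence[Hashable], Sequence[Hashable]]


def mean_decrease_gini(forest: List[List[Split]]) -> Dict[str, float]:
    """Sum node-size-weighted Gini decreases per variable, averaged over trees."""
    totals: Dict[str, float] = defaultdict(float)
    for tree in forest:
        for variable, parent, left, right in tree:
            totals[variable] += len(parent) * gini_decrease(parent, left, right)
    return {v: total / len(forest) for v, total in totals.items()}


# Toy forest: "E" = E. plantagineum, "S" = shrubland (illustrative labels).
node = ["E", "E", "S", "S"]
forest = [
    [("B11", node, ["E", "E"], ["S", "S"])],   # perfect split
    [("NDVI", node, ["E", "S"], ["E", "S"]),   # uninformative split
     ("B11", node, ["E", "E"], ["S", "S"])],
]
importance = mean_decrease_gini(forest)
print(importance)  # {'B11': 2.0, 'NDVI': 0.0}
```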
Three classification approaches were evaluated in this study, using: (a) Sentinel-2 image bands + VIs, (b) the most important image bands and VIs, and (c) Sentinel-2 image bands only. The first analysis (Bands and VIs) used all of the proposed Sentinel-2 image bands and indices, which were selected for their ability to separate vegetation from non-vegetation, to account for bare soil background and atmospheric influences, and for their usefulness in vegetation mapping. The second analysis used the best variables (top 50%) from the first analysis, while the third used only Sentinel-2 image bands for comparison with the other two analyses (Table 3). For each classification, the Random Forest algorithm was fine-tuned to identify the optimal number of trees.
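Selecting the top 50% of variables from an importance ranking is a one-line sort. In the sketch below, `top_variables` is a hypothetical helper, half is rounded up (so 26 variables yield 13), and the scores in the usage example are made up for illustration.

```python
import math
from typing import Dict, List


def top_variables(importance: Dict[str, float], fraction: float = 0.5) -> List[str]:
    """Return the highest-ranked `fraction` of variables by importance score.

    The count is rounded up; ties keep their input order because Python's
    sort is stable even with reverse=True.
    """
    k = math.ceil(len(importance) * fraction)
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# Made-up importance scores for four hypothetical variables.
print(top_variables({"v1": 0.2, "v2": 0.9, "v3": 0.5, "v4": 0.7}))  # ['v2', 'v4']
```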
We generated a validation dataset for the accuracy assessment by randomly splitting our samples into 70% for training and 30% for validation, so that accuracy was assessed on samples not used in training. The accuracy of each classification was assessed within GEE using the 30% validation samples. Classifications were assessed using an error matrix approach, including the overall accuracy (OA) and Kappa coefficient [116], as well as the user's accuracy (UA) and producer's accuracy (PA) for each class.
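The error-matrix measures follow directly from the matrix counts: OA is the diagonal sum over the total, PA divides each diagonal cell by its reference (row) total, UA divides it by its predicted (column) total, and Kappa compares OA with the agreement expected by chance. The sketch below uses a made-up two-class matrix; GEE's `ee.ConfusionMatrix` offers equivalent methods (`accuracy`, `kappa`, `producersAccuracy`, `consumersAccuracy`, the last corresponding to user's accuracy).

```python
from typing import Dict, Sequence


def accuracy_metrics(matrix: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Accuracy measures from an error (confusion) matrix.

    Rows are reference (validation) classes; columns are predicted classes.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]                             # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    overall = sum(diag) / n
    # Chance agreement from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = [d / r if r else float("nan") for d, r in zip(diag, row_totals)]
    users = [d / c if c else float("nan") for d, c in zip(diag, col_totals)]
    return {"OA": overall, "kappa": kappa, "PA": producers, "UA": users}


# Made-up two-class example (not the study's Table 4).
m = accuracy_metrics([[50, 10], [5, 35]])
print(m["OA"])  # 0.85
```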

Results
We found that fine-tuning the number-of-trees hyperparameter increased overall accuracy by up to 1.7% (Bands and VIs). Compared with a typical moderate forest size of 100 trees, the optimal number of trees increased accuracy by 0.4% (Figure 2).
Overall classification accuracy was high for all three classifications across the range of tree numbers tested (Figure 2). With all 26 variables (Bands and VIs), the optimal value of 160 trees yielded the best overall accuracy (93%); fewer trees slightly lowered overall accuracy (a 1% decrease at 20 trees), while with more trees accuracy plateaued at the expense of computational efficiency. The optimal number of trees was 30 when using only the most important variables (Most important bands and VIs) and 50 when using only the Sentinel-2 spectral bands (Bands).
Results suggest that all 26 variables (image bands and vegetation indices) were important in terms of MDG; however, some variables scored considerably higher than others. The most important variables were the two Sentinel-2 SWIR bands (B11 and B12), followed by the red-edge bands (B5 and B6) and the narrow NIR band (B8A). The Green Difference Vegetation Index (GDVI), Red Index (RI), Normalized Flower Red-Edge (NFRE) and Normalized Difference Red-Edge (NDRE) were the most important indices in the classification (Figure 3; see Table 2 for a description of variables).
Using all bands and VIs marginally outperformed both the analysis using the most important bands and VIs and the analysis using only image bands, with an overall accuracy (OA) of 93.11% compared with 92.66% and 92.77%, respectively. The kappa statistic (K) was also highest for the Bands and VIs analysis (0.92); the other two analyses both achieved a marginally lower value (0.91).
For E. plantagineum, the producer's accuracy (PA) ranged between 81.05% and 85.90%, while the user's accuracy (UA) ranged between 56.30% and 64.71% (Table 4). The UA of E. plantagineum increased by 8.40% when vegetation indices were included in the classification, compared with using only Sentinel-2 image bands. Similarly, the UA increased by 6.72% when using all image bands and indices (Bands and VIs) compared with using only the top variables (Most important bands and VIs), highlighting the importance of Sentinel-2 image bands and derived vegetation indices for detecting IAPs. Although the classification based only on Sentinel-2 image bands achieved high overall accuracy, Kappa and producer's accuracy, including vegetation indices was important for increasing the user's accuracy. In all three classifications, E. plantagineum was confused mainly with agriculture or shrubland rather than with other land cover classes, resulting in a lower accuracy for the species (Table 4). The classification result of the analysis based on Bands and VIs is shown in Figure 4.

Discussion
The main objective of this study was to test the capabilities of freely available multispectral Sentinel-2 imagery and the cloud-based platform Google Earth Engine for detecting herbaceous IAPs in a Mediterranean-type ecosystem, using E. plantagineum in the Fynbos Biome as a case study. We applied the Random Forest machine learning algorithm to Sentinel-2 image bands and indices within Google Earth Engine and identified the most important variables in the classification. Additionally, the number of trees in the Random Forest model was fine-tuned to increase classification accuracy. Even though default Random Forest settings usually work reasonably well for image classification [117], we confirmed that fine-tuning the number-of-trees hyperparameter improved the performance of the Random Forest classifier by up to 1.7%. Similarly, another study found that hyperparameter optimization increased classification accuracy by 2-3% compared with default hyperparameters [118].
The most important variables for discriminating classes were identified using the Random Forest algorithm for feature selection. These were the shortwave infrared, red-edge and near infrared image bands, consistent with findings from studies of invasive trees and shrubs in the South African context [44,45,119]. Similarly, the red-edge and shortwave infrared regions were the most significant for discriminating between C3 and C4 grass species using Sentinel-2 imagery [120]. Several other studies have also found the red-edge band to be critical in distinguishing vegetation classes and plant species [121][122][123]. This is often ascribed to the sensitivity of this portion of the electromagnetic spectrum to plant characteristics such as leaf pigmentation, water content and leaf structure [121,124]. Sentinel-2 has three red-edge bands, all of which were deemed important. Additionally, two of the most valuable vegetation indices incorporated the red-edge band (B5). Vegetation indices are commonly used as input to classification schemes and have been shown to help differentiate classes [72,125]. Part of this effort was to explore the value of using vegetation indices to classify E. plantagineum. The Normalized Flower Red-Edge (NFRE) was originally applied with hyperspectral imagery to determine a relationship with the floral cover of E. plantagineum [60]. We adapted the index to the closest Sentinel-2 multispectral bands, red-edge (B5) and red (B4), and found it to be one of the most important vegetation indices for distinguishing the species. This demonstrates the importance of phenological characteristics (such as flowering) for detecting IAP species and confirms that this range of the spectrum is relevant to detecting flowering E. plantagineum [61].
We found that the UA and PA varied with the set of variables used (Table 4): using only the most important variables increased the PA for E. plantagineum by 3.22%, while the UA decreased slightly by 1.68%. Although the producer's accuracy of E. plantagineum was 81.05% (an omission error of 18.95%), the user's accuracy of 64.71% indicates a commission error of 35.29%, revealing that the extent of the species was overestimated. The lower accuracy for the species in all three classifications can be attributed to the difficulty of discriminating E. plantagineum from agriculture and shrubland, which may co-occur with the species, to the species commonly being interspersed with other vegetation, and to small patches remaining undetected in 10 m spatial resolution imagery. Accuracy may be improved by collecting training samples at different levels of E. plantagineum purity, from pure homogeneous patches to increasingly mixed patches, to determine how accurately the species can be classified at each level. In the Australian study that attempted to detect E. plantagineum using 1 m resolution aerial imagery (red, green, blue and NIR bands), the species could not be differentiated from pastures and crops, and the authors reported high errors of commission and omission for the species [60]. Our producer's accuracy for E. plantagineum was similar to the 83% achieved with EO-1 Hyperion satellite hyperspectral imagery [60,61], which likely reflects offsetting differences in the spatial and spectral resolution of the two sensors. Because EO-1 Hyperion has a spatial resolution of 30 m, its pixels are more likely to contain heterogeneous patches, which lowers classification accuracy; however, the availability of 49 spectral bands may have increased classification accuracy.
Sentinel-2, with a spatial resolution of 10-20 m, is less likely to produce heterogeneous pixels, whereas its smaller number of bands may reduce its ability to detect the species compared with a sensor with many more bands.
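The omission and commission errors quoted above are simple complements of the producer's and user's accuracies. The small sketch below, with the hypothetical helper name `omission_commission`, reproduces them from the Bands and VIs values for E. plantagineum.

```python
from typing import Tuple


def omission_commission(pa_percent: float, ua_percent: float) -> Tuple[float, float]:
    """Omission error = 100 - PA; commission error = 100 - UA (all in %)."""
    return round(100 - pa_percent, 2), round(100 - ua_percent, 2)


# PA and UA for E. plantagineum (Bands and VIs) as reported in the text.
print(omission_commission(81.05, 64.71))  # (18.95, 35.29)
```

A commission error much larger than the omission error, as here, indicates that the classifier labels more pixels as the species than truly contain it, i.e., the mapped extent is overestimated.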
Mapping invasive plant species can be challenging in diverse Mediterranean ecosystems [70,126]; furthermore, the limitations experienced when mapping herbaceous IAPs can be attributed to the varying patch size and density of the species, with lower-density and smaller patches remaining undetected [35]. In this study, the E. plantagineum presence samples covered a minimum area of 10 m² and had at least 50% cover; however, the species is commonly interspersed with other herbaceous plants and grasses. Other studies have reported that IAP cover densities need to be sufficiently high for detection [28,127,128]. For example, bugweed (Solanum mauritianum) was best detected in WorldView-2 multispectral satellite data when cover density was greater than 76% [28], and spotted knapweed (Centaurea maculosa) was detectable using hyperspectral imagery where cover densities exceeded 70% and patches were larger than 0.1 ha [128]. Furthermore, including regular time series data may improve the ability to map invasive species [129], since E. plantagineum may bloom at different times at different sites owing to site characteristics (e.g., varying amounts of disturbance or water availability) that affect the phenology of the species and hence its detection. Future research may also assess the classification probability of each pixel and use it as a measure of uncertainty and to improve accuracy [115,130]. Alternatively, testing deep learning methods to downscale satellite data to a finer spatial resolution might be a potential solution (see, for example, [131]).
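One simple way to derive a per-pixel uncertainty measure from a Random Forest, sketched below under stated assumptions, is to treat the fraction of trees voting for each class as a class probability and summarize it with the winning probability, the margin to the runner-up class, and the Shannon entropy. This is an illustrative technique, not the approach of the cited studies; `vote_uncertainty` and the class labels are hypothetical. In GEE, a classifier's output can, for example, be switched to a probability mode with `setOutputMode`.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def vote_uncertainty(votes: Sequence[Hashable]) -> Dict[str, object]:
    """Summarize per-pixel uncertainty from the class votes of individual trees."""
    n = len(votes)
    probs = {label: count / n for label, count in Counter(votes).items()}
    ranked = sorted(probs.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    entropy = -sum(p * math.log(p) for p in probs.values())
    return {
        "label": max(probs, key=probs.get),  # majority-vote class
        "p_max": ranked[0],                  # vote fraction of that class
        "margin": ranked[0] - second,        # small margin = ambiguous pixel
        "entropy": entropy,                  # 0 when all trees agree
    }


# Hypothetical pixel: 7 of 10 trees vote E. plantagineum, 3 vote agriculture.
u = vote_uncertainty(["echium"] * 7 + ["agriculture"] * 3)
print(u["label"])  # echium
```

Pixels with a low margin or high entropy could then be flagged for field verification, which is one way the probability of each classified pixel could feed back into improving accuracy.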

Conclusions
Here, we show that using Sentinel-2 image bands and derived indices in a Random Forest classification holds promise for detecting herbaceous IAPs such as E. plantagineum. However, the field samples collected for the species had a minimum cover of 50%, which may have resulted in lower-density patches remaining undetected or being misclassified. For future research, we recommend including more samples with more than 70% cover [28,127] of E. plantagineum and a minimum patch size of 20-30 m² for Sentinel-2 imagery. Nevertheless, our results are valuable for initial screening to target management of dense infestations and to model future invasion risk. The results could also supplement the South African Plant Invaders Atlas, which provides only a very broad indication of the distribution of the species. The classification model may also be useful for historical analysis to determine inter-annual drivers of abundance. We recommend that future research apply the classification model to Sentinel-2 imagery from previous years, captured during the same phenological stage (flowering) of E. plantagineum, and compare the results with those presented here to indicate changes in the species over time. This may help to explain the relative roles of climate and management in determining the density or abundance of the species.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.