Automatic detection of Japanese knotweed in urban areas from aerial and satellite data

The presence of invasive alien plant species in urban areas has become a global issue. City administrations are making efforts to prevent invasions, to eradicate and/or control invasive species or, alternatively, to process them into useful products. To monitor the spread of species over a region, it is important to map invasive species. In the scope of the APPLAUSE project, we have developed a One-Class support vector machine (SVM) approach to detect invasive species from individual aerial images and multiple Sentinel-2 satellite images over the City of Ljubljana (Slovenia). In this paper, we focus specifically on the detection of Japanese knotweed, because it forms large stands and is therefore the most detectable invasive species in the studied area. For aerial images, the proposed SVM approach uses red-green-blue (RGB) band composites and infrared (IR) bands as input data, while for satellite images the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI) are used in addition. In the study, we used ground truth data collected by experts both as training and as validation data. On aerial images, we first perform segmentation, followed by two One-Class SVM classifications. On the classification output, we apply the K-means algorithm to the IR band, which groups the samples and removes those that were falsely recognized as Japanese knotweed. The final result is obtained by merging the results and masking out small samples and areas where Japanese knotweed is not present. On satellite data the approach is similar; the only difference is the use of multiple input images from different acquisition dates for the SVM classification. The detection accuracy for Japanese knotweed was 83% on the aerial images and 90% on satellite data for stands larger than 100 m².
The results demonstrate that the applied methodology, given high-quality ground truth data, can be used operationally for the automatic detection of Japanese knotweed at the municipality level.


Co-Editors' Note: This study was contributed in relation to the international conference "Detection and control of forest invasive alien species in a dynamic world" held in Ljubljana, Slovenia, September 25th-29th, 2019, organized by the project LIFE ARTEMIS (LIFE15 GIE/SI/000770) (https://www.tujerodne-vrste.info/en/project-life-artemis/projectactivities/international-conference-2019/). This conference has provided a venue for the exchange of information on various aspects of early detection and rapid response of forest invasive alien species. The conference provided an opportunity for dialog between academia, policymakers, and forestry practitioners.
Citation: Smerdu A, Kanjir U, Kokalj Ž (2020) Automatic detection of Japanese knotweed in urban areas from aerial and satellite data. Management of Biological Invasions 11(4): 661-676, https://doi.org/10.3391/mbi.2020.11.4.03

Introduction
Identification, detection and mapping of invasive alien plant species (IAPS) is becoming a high priority in urban areas across the globe, as invasive species are spreading intensively beyond indigenous vegetation, causing negative effects on existing ecosystems and human well-being. Ground surveys are still common for most mapping projects, despite intensive labor requirements, the associated high costs, and incomplete coverage of the landscape (Crosier and Stohlgren 2004). Thanks to the availability of high spatial resolution data, remote sensing (RS) is increasingly being used to study and model the spread of invasive plant species (Müllerová et al. 2017). RS therefore plays an important, albeit limited, role in detecting and mapping invasive plants, as it can assess large spatial extents and examine specific spatial phenomena on the Earth's surface rapidly and comprehensively (Evangelista et al. 2009). RS is successful at detecting IAPS as long as the target exhibits distinctive characteristics compared to the surrounding indigenous species (Huang and Asner 2009), whereas detecting a specific plant species in forests (under tree crowns) or small plant species has proved more challenging. Large-scale infestations, where invaders are clearly the dominant species and environmental heterogeneity is reduced, tend to be easier to detect (Lass et al. 2005). In addition, the detection rate of invasive plants using RS is usually higher if the target species has phenological attributes that are distinct from those of the native vegetation (Evangelista et al. 2009).
The choice and suitability of RS imagery for vegetation mapping are determined by a number of factors, including the spatial coverage of the observed area, spatial resolution (smallest unit or pixel size), spectral characteristics (the range of electromagnetic wavelengths used), spectral resolution (the width and number of distinct wavelength bands), temporal resolution (which governs seasonal and ephemeral effects on the discrimination between classes of interest), and the cost of data acquisition (Cuneo et al. 2009). Spatial resolution is crucial because it determines the scale of the study and the classification accuracy achievable for the target feature. Spectral resolution also plays an important role, because accurate vegetation classification from a multispectral image depends on the degree of spectral separation between the major vegetation classes relative to the variation within those classes (Cuneo et al. 2009).
In our study, we use two different RS imagery sets: aerial and satellite images. Aerial images have a much higher spatial resolution than satellite data, but a much lower temporal resolution; in Slovenia they are acquired for the whole country approximately once every three years. With temporally denser Sentinel-2 images, we can avoid a problem inherent in detection from a single image: if the image is acquired at the "wrong" time, some (alien) plant species cannot be distinguished from the surrounding vegetation by differences in their life cycle (Müllerová et al. 2017). High temporal resolution imagery also allows us to explore the phenological patterns of the analyzed invasive species in detail, which matters because the distinctiveness of any phenological attribute can vary widely with regional climate, latitudinal gradients, and species richness within an ecosystem (Jarnevich et al. 2013).
The primary objective of this study was to determine the spatial extent of Japanese knotweed (Fallopia japonica (Houtt.) Ronse Decraene 1988) in the City of Ljubljana (capital of Slovenia), although other invasive species are also widely present in the studied area (e.g. Canadian goldenrod (Solidago canadensis L., 1753), staghorn sumac (Rhus typhina L., 1756), giant hogweed (Heracleum mantegazzianum Sommier & Levier, 1895), Jerusalem artichoke (Helianthus tuberosus L., 1753), tree of heaven (Ailanthus altissima (Mill.) Swingle, 1916), Himalayan balsam (Impatiens glandulifera Royle, 1834)). Japanese knotweed forms distinctive, dense stands and is therefore a suitable invasive species to detect on satellite images; according to Müllerová et al. (2017), however, it is also difficult to recognize by RS means, because these stands are relatively indistinct from similar vegetation types. Larger Japanese knotweed stands usually occur on riverbanks, along roads, at construction sites, and in abandoned areas. The stands look different in winter and in summer, as the chlorophyll activity in the leaves varies greatly during the growing season. Furthermore, different stands might be at different growth stages during the same period (depending on groundwater level, exposure to sunlight, etc.). These phenological changes can be of great help for the detection of IAPS with RS data.
In our study, the identification, detection, and mapping of Japanese knotweed was based on a Support Vector Machine (SVM) classification approach. The SVM algorithm is a supervised statistical learning technique developed to handle either binary or multi-class classification (Vapnik 2006). SVM aims to identify a hyperplane that separates the input dataset into a predefined discrete number of classes consistent with the training data (Mountrakis et al. 2011). Various evaluations of SVM performance show that the algorithm can discriminate several classes from a small number of training samples without loss of accuracy (Foody and Mathur 2004; Mantero et al. 2005; Bruzzone et al. 2006; Shao and Lunetta 2012; Zheng et al. 2015). SVM is commonly used in RS for classification (Paz-Kagan et al. 2019; Bruzzone et al. 2006; Mantero et al. 2005). Species suitable for detection are usually those that form large, dense patches, such as trees (Rajah et al. 2019) and shrubs (Atkinson et al. 2014). Detection of Japanese knotweed using SVM has already been done on very high spatial resolution data (e.g. Martin et al. 2018; Jones et al. 2011), but to the best of our knowledge, no studies have examined this approach for Japanese knotweed detection on Sentinel-2 images.
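For reference, the hyperplane idea can be written out in its standard textbook form. The first line below is the binary soft-margin SVM with a feature map $\phi$, penalty $C$ and slack variables $\xi_i$; the second is the One-Class variant used later in this study, where only inliers are available and $\nu \in (0, 1]$ trades off the fraction of training samples treated as outliers against model complexity; the third is the radial basis function (RBF) kernel with width parameter $\gamma$. The symbols are the conventional ones and are not taken from the original text.

```latex
% Binary soft-margin SVM: labels y_i in {-1, +1}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i)+b\bigr)\ge 1-\xi_i,\;\; \xi_i\ge 0,
\qquad f(\mathbf{x})=\operatorname{sign}\bigl(\mathbf{w}^{\top}\phi(\mathbf{x})+b\bigr)

% One-Class SVM: only positive (inlier) samples x_1..x_n
\min_{\mathbf{w},\,\boldsymbol{\xi},\,\rho}\; \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho
\quad\text{s.t.}\quad \mathbf{w}^{\top}\phi(\mathbf{x}_i)\ge \rho-\xi_i,\;\; \xi_i\ge 0,
\qquad f(\mathbf{x})=\operatorname{sign}\bigl(\mathbf{w}^{\top}\phi(\mathbf{x})-\rho\bigr)

% RBF kernel, K(x, x') = phi(x)^T phi(x')
K(\mathbf{x},\mathbf{x}') = \exp\bigl(-\gamma\,\lVert\mathbf{x}-\mathbf{x}'\rVert^{2}\bigr)
```

In the One-Class form, $f(\mathbf{x}) = +1$ marks a sample as belonging to the training distribution (an inlier, here Japanese knotweed) and $-1$ marks an outlier.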

Study area
Our study focuses on the City of Ljubljana, which has a population of around 300,000 (2018). In the City of Ljubljana, invasive species are widely present throughout the whole municipality, and broad-scale eradication is closely controlled by the local authorities. The city serves as a case study within the APPLAUSE project (Urban Innovative Actions initiative), in the scope of which this study was conducted. The project addresses the problem of invasive alien plant species and explores possibilities for their innovative use with the active involvement of the residents of the City of Ljubljana.

High spatial resolution aerial images
The most intuitive and straightforward remote sensing approach for IAPS detection is the use of high spatial resolution images to visually inspect the spatial distribution of non-native species (Huang and Asner 2009). The idea in this approach is to pinpoint these species based on their unique spatial patterns or phenological characteristics.
We used the Slovenian national orthophoto imagery archive. The high resolution aerial dataset (0.5 m-1 m) was obtained from the Surveying and Mapping Authority of the Republic of Slovenia. The orthophotos are produced from the Cyclic Aerial Survey (CAS) images and the digital terrain model. Each orthophoto covers an area of 6.75 km² (2.25 km by 3 km), and 64 orthophotos cover the complete area of the City of Ljubljana. In our analysis, we used the images acquired in April 2018. All images contain red, green, blue (RGB), and near infrared (IR) bands and have a spatial resolution of 0.5 m.

Satellite data
The second data source in this study is freely available Sentinel-2 optical imagery. The Sentinel-2 satellite constellation consists of two satellites, Sentinel-2A and -2B (together Sentinel-2). It is an Earth observation mission of the Copernicus Programme, run by the European Space Agency (ESA) for the European Commission, that systematically acquires optical images at high spatial resolution over land and coastal waters. Sentinel-2 multispectral imagery provides 13 bands in different wavelength intervals at 10 m, 20 m or 60 m spatial resolution. Its large coverage (a swath width of 290 km) and short revisit time (5 days) make it highly cost-effective for the purpose of this study compared with other forms of data such as aerial photography and fieldwork.
From the Sentinel Hub platform (Sentinel Hub 2019), we acquired all available Sentinel-2 satellite images (Level-1C) with less than 10% cloud coverage from the beginning of acquisitions to the end of 2019. All Sentinel-2 images were preprocessed (atmospherically corrected to bottom-of-atmosphere reflectance, Level-2A) and have a spatial resolution of 10 m. Apart from downloading multispectral bands for the analysis, we also acquired the vegetation indices NDVI (normalized difference vegetation index) and EVI (enhanced vegetation index). NDVI is one of the most widely used vegetation indices for studying vegetation phenology (Yan and Roy 2014); it reduces the spectral noise caused by certain illumination conditions, topographic variations or cloud shadows (Huete et al. 2002). EVI is similar to NDVI, but has been recognized as having improved sensitivity in high-biomass regions and better vegetation monitoring capability. It reduces the effects of environmental factors such as atmospheric conditions and soil background (Matsushita et al. 2007).
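The indices in this study were downloaded from Sentinel Hub. For readers who want to compute them locally, a minimal sketch of the standard formulas follows, assuming Level-2A bands already scaled to surface reflectance in [0, 1], with B08 as near infrared, B04 as red and B02 as blue (all three at 10 m). The EVI coefficients (G = 2.5, C1 = 6, C2 = 7.5, L = 1) are the commonly used defaults; the reflectance values in the example are made up.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil_adj=1.0):
    """EVI = G * (NIR - Red) / (NIR + C1*Red - C2*Blue + L); expects reflectance in [0, 1]."""
    nir, red, blue = (np.asarray(b, dtype=np.float64) for b in (nir, red, blue))
    denom = nir + c1 * red - c2 * blue + soil_adj
    out = np.full(denom.shape, np.nan)
    np.divide(gain * (nir - red), denom, out=out, where=denom != 0)
    return out

# Hypothetical surface-reflectance values for two pixels:
# B08 (NIR), B04 (red), B02 (blue).
b08 = np.array([0.5, 0.3])
b04 = np.array([0.1, 0.3])
b02 = np.array([0.05, 0.2])
print(ndvi(b08, b04))
print(evi(b08, b04, b02))
```

For the first pixel, NDVI = 0.4 / 0.6 ≈ 0.67 and EVI = 1.0 / 1.725 ≈ 0.58. If the bands arrive as integer digital numbers instead of reflectance, they would first have to be rescaled according to the product's quantification value; that step is not shown here.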

Ground truth data
Field surveys enable the detection of smaller IAPS patches and of plants that are hard or impossible to recognize on RS images. In addition, they provide the reference data for machine learning algorithms. In the scope of the APPLAUSE project, ground truth data over the City of Ljubljana are continuously collected by biology experts. This particular database is currently for internal use only, but will be made publicly available once complete. Apart from field surveys, data about invasive species are also collected through a software application, in which both the communal companies that remove invasive plants and the citizens who use its IAPS plant recognition module update the locations of invasive species. Some of these data are point locations and some delineate the area of invasive plant stands; the latter is the more useful form of ground truth information for our analysis. In our study, we used the field data collected until December 2018. All described ground truth datasets were later updated with additional fieldwork over the whole study area in October 2019 by RS experts. The selected ground truth data were used both as training and as validation datasets, split in a 3:1 ratio. Additionally, a public dataset of many different alien and invasive plants and animals is available online (Web portal Invazivke 2019).
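The 3:1 split can be illustrated with a minimal sketch. It assumes that each ground truth stand has an identifier and that the split is made per stand rather than per pixel, so that neighbouring and therefore highly correlated pixels of one stand do not end up in both the training and the validation set; the function name `split_stands` and the fixed seed are illustrative choices, not part of the original workflow.

```python
import random

def split_stands(stand_ids, train_fraction=0.75, seed=0):
    """Shuffle stand IDs and split them into training and validation sets (3:1 by default)."""
    ids = list(stand_ids)
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 400 hypothetical stand IDs -> 300 for training, 100 for validation.
train, val = split_stands(range(400))
print(len(train), len(val))  # 300 100
```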

Aerial orthophotos
The procedure applied to the orthophotos is based on One-Class SVM classification. The input data were single orthophotos and positive samples (inliers), in our case approximately 900 stands of Japanese knotweed. The output of this procedure is, for each orthophoto, the set of segments detected as Japanese knotweed; these outputs are then merged to cover the complete municipality of Ljubljana.
The workflow is illustrated in Figure 2. First, we performed segmentation on all orthophotos from 2018 using the ENVI FX module (Harris 2018). The segmentation is controlled by two parameters, the scale level and the merge level, both ranging from 1 to 100%. The scale level determines the degree of fragmentation, whereas the merge level controls how smaller segments are combined into larger ones. The final, empirically selected parameters were 40 for the scale level and 85 for the merge level. We then applied two One-Class SVM classifications with a radial basis function kernel: the first on the RGB bands and their texture (Figure 2, left column), and the second on the RGB bands plus the IR band (Figure 2, right column). According to Yekkehkhany et al. (2014), the radial basis function kernel should provide the best overall results for extracting green urban areas. RGB and texture information for the segments was calculated during segmentation; additionally, we calculated the mean of the IR band for each segment. The training data for the SVM were approximately 900 segments of Japanese knotweed stands obtained from the ground truth data.
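The per-segment features and the second One-Class SVM (RGB plus IR) can be sketched as follows. This is a minimal illustration, assuming the ENVI FX segmentation has been exported as a raster of integer segment IDs; texture features are omitted, scikit-learn's `OneClassSVM` stands in for whatever implementation was actually used, and `nu=0.1` as well as the random training features are hypothetical placeholders rather than values from the study. Standardizing the features first is a design choice, because the RBF kernel is sensitive to the scale of each feature.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

def segment_means(labels, band):
    """Mean of `band` over each segment ID in `labels` (IDs 0..n-1); NaN for empty IDs."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=band.ravel().astype(np.float64))
    counts = np.bincount(flat)
    means = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means

def segment_features(labels, bands):
    """One row per segment, one column per band (here: mean R, G, B, IR)."""
    return np.column_stack([segment_means(labels, b) for b in bands])

# Toy 2x4 scene with two segments (IDs 0 and 1) and four bands (R, G, B, IR).
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
rng = np.random.default_rng(0)
bands = [rng.uniform(0, 255, labels.shape) for _ in range(4)]
X = segment_features(labels, bands)

# Hypothetical stand-in for ~900 ground truth knotweed segments (random numbers, not real data).
train = rng.normal(loc=[80, 120, 60, 200], scale=5, size=(900, 4))
model = make_pipeline(StandardScaler(), OneClassSVM(kernel="rbf", nu=0.1, gamma="scale"))
model.fit(train)
pred = model.predict(X)  # +1 = knotweed-like (inlier), -1 = outlier
print(X.shape, pred)
```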
After performing the first One-Class SVM, we applied the K-means algorithm to the IR band (Figure 2, left column). The algorithm groups the samples, and those that were falsely recognized as Japanese knotweed according to the ground truth data are removed.
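The K-means filtering step can be sketched as clustering the mean IR values of the detected segments. The text does not specify how clusters are judged against the ground truth, so the rule used here is a hypothetical one: keep every cluster that contains at least one ground truth knotweed segment and drop the others. The number of clusters (2) and the example IR values are likewise assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def filter_by_ir_clusters(ir_means, is_ground_truth, n_clusters=2, seed=0):
    """Return a boolean mask of detections to keep.

    Hypothetical rule: cluster the mean IR values and keep only the clusters
    that contain at least one known (ground truth) knotweed segment.
    """
    X = np.asarray(ir_means, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    gt = np.asarray(is_ground_truth, dtype=bool)
    keep_clusters = np.unique(km.labels_[gt])
    return np.isin(km.labels_, keep_clusters)

# Mean IR of seven detected segments; two of them coincide with ground truth stands.
ir = [200, 205, 210, 198, 90, 95, 88]
gt = [True, False, True, False, False, False, False]
keep = filter_by_ir_clusters(ir, gt)
print(keep.tolist())  # [True, True, True, True, False, False, False]
```

The low-IR segments form their own cluster without any ground truth stand and are discarded as likely false detections.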
In the final stage, we merged the filtered results with those of the second One-Class SVM by taking their union, at the same time removing samples smaller than 10 m² and those lying in forests or on agricultural fields. These areas were obtained from the freely available, regularly updated data layer Agricultural and forest land use in Slovenia, provided by the Ministry of Agriculture, Forestry and Food. This masking was applied because detecting Japanese knotweed in forests is extremely difficult or impossible, and because it is very unlikely to grow on regularly cultivated agricultural fields.
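A raster version of this merging step can be sketched with boolean masks. The study works with segments; here, as an assumption for illustration, both classification results and the forest/agricultural layer are rasterized onto the 0.5 m orthophoto grid. At that resolution one pixel covers 0.25 m², so the 10 m² threshold corresponds to 40 pixels. Connected patches are found with `scipy.ndimage.label` (4-connectivity by default).

```python
import numpy as np
from scipy import ndimage

def merge_and_clean(det_a, det_b, exclusion, pixel_size=0.5, min_area=10.0):
    """Union of two boolean detection rasters, minus excluded (forest/agricultural)
    pixels, with connected patches smaller than `min_area` (m²) removed."""
    merged = (det_a | det_b) & ~exclusion
    patches, n = ndimage.label(merged)          # patch IDs 1..n, background 0
    if n == 0:
        return merged
    areas = np.bincount(patches.ravel())[1:] * pixel_size ** 2  # patch areas in m²
    keep = np.concatenate(([False], areas >= min_area))        # index 0 = background
    return keep[patches]

det_a = np.zeros((20, 20), dtype=bool)
det_a[0:8, 0:8] = True        # 64 px = 16 m² at 0.5 m, kept
det_b = np.zeros((20, 20), dtype=bool)
det_b[15:18, 15:18] = True    # 9 px = 2.25 m², removed as too small
no_mask = np.zeros((20, 20), dtype=bool)
result = merge_and_clean(det_a, det_b, no_mask)
print(result.sum())  # 64
```

If a forest polygon covered half of the larger patch, the remaining 32 pixels (8 m²) would also fall below the threshold and be removed, which is why the order masking-then-size-filter matters in this sketch.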

Sentinel-2 images
As input data for the SVM method on Sentinel-2 images, we selected RGB composites and the NDVI and EVI indices of several images with less than 10% cloud coverage on the selected dates. Additional input data were shapefiles of pixel-sized training samples obtained from fieldwork. The workflow is illustrated in Figure 3.
First, all the bands and indices were stacked into one array. From the rasterized shapefile, we obtained the positive examples (pixels) of Japanese knotweed to train our SVM model. The proportion of samples used for training was approximately 75% (300 out of 400). We used One-Class SVM for inlier/outlier detection, i.e. we determined whether a sample belongs to the same distribution as the existing observations (training samples) of Japanese knotweed (an inlier) or not (an outlier). One-Class SVM requires training data only for Japanese knotweed, whereas a general SVM algorithm requires training samples for each class. Second, we used the radial basis function kernel for the SVM, as in the procedure applied to the orthophotos. The SVM parameters (nu and gamma) were determined by grid search. Intuitively, nu sets an upper bound on the fraction of training samples treated as misclassified and a lower bound on the fraction of training samples that become support vectors, while gamma defines the radius of influence of a single training example. We also developed a genetic algorithm that sets gamma and nu, but it still requires further tuning to perform at its best. The main obstacle is the lack of ground truth information about areas that are free of Japanese knotweed, which would enable a better fitness function. The current fitness function maximizes the number of correctly detected Japanese knotweed stands but does not consider false detections. As a result, it cannot distinguish between a classifier that returns only the ground truth pixels and one that labels every pixel as Japanese knotweed.
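The stacking and grid search can be sketched as follows, using scikit-learn's `OneClassSVM`. The fitness function here is a hypothetical variant, not the one used in the study: besides recall on held-out knotweed pixels, it subtracts a penalty proportional to the share of all scene pixels labelled as knotweed, a crude proxy for false detections in the absence of knotweed-free ground truth. The proxy also penalizes true detections in the scene, so the penalty weight would need care. All data below are synthetic, and the parameter grids are arbitrary examples.

```python
import itertools
import numpy as np
from sklearn.svm import OneClassSVM

def stack_dates(images):
    """images: list of (H, W, F) arrays, one per acquisition date -> (H*W, n_dates*F) matrix."""
    cube = np.concatenate(images, axis=-1)
    return cube.reshape(-1, cube.shape[-1])

def fitness(model, X_val_pos, X_scene, penalty=1.0):
    """Hypothetical fitness: recall on held-out knotweed pixels minus a penalty on
    the share of all scene pixels labelled knotweed (proxy for false detections)."""
    recall = np.mean(model.predict(X_val_pos) == 1)
    flagged = np.mean(model.predict(X_scene) == 1)
    return recall - penalty * flagged

def grid_search(X_train, X_val_pos, X_scene, nus, gammas, penalty=1.0):
    """Try every (nu, gamma) pair and return (score, nu, gamma, model) of the best one."""
    best = None
    for nu, gamma in itertools.product(nus, gammas):
        model = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
        score = fitness(model, X_val_pos, X_scene, penalty)
        if best is None or score > best[0]:
            best = (score, nu, gamma, model)
    return best

# Synthetic stand-in data: 4 dates x 5 features (R, G, B, NDVI, EVI) on a 20x20 scene.
rng = np.random.default_rng(42)
images = [rng.normal(0.3, 0.1, (20, 20, 5)) for _ in range(4)]
X_scene = stack_dates(images)                             # shape (400, 20)
knotweed = rng.normal(0.6, 0.05, (400, X_scene.shape[1]))
X_train, X_val = knotweed[:300], knotweed[300:]           # 300 of 400 for training
score, nu, gamma, model = grid_search(X_train, X_val, X_scene,
                                      nus=[0.05, 0.1], gammas=[0.1, 1.0])
print(f"best nu={nu}, gamma={gamma}, fitness={score:.2f}")
```

With `penalty=0` the function reduces to recall alone, which reproduces the weakness described above: a model that flags every pixel would score as well as one that flags only the true stands.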
The last step of this method is masking out forest and agricultural land, where Japanese knotweed is unlikely to grow, as in the procedure applied to the orthophotos. Any subset of the available images of the study area can be used for the calculations. We calculated results for many different subsets of images, varying in the number of images and their acquisition dates. By comparing the number of false detections and of correctly detected Japanese knotweed stands, we then determined which subsets are the most appropriate. For example, good results were produced with four images: one acquired in mid-winter (January), one at the start of the growing season of typical tree species (beginning of April, when Japanese knotweed is not yet growing), one in the middle of the growing season, and one in late autumn (November). This selection takes advantage of the specific phenological signature of Japanese knotweed. The spring images separate Japanese knotweed from other green vegetation, whereas the January and November images are better suited to separating Japanese knotweed from bare land and shrubs (owing to the characteristic reddish hues of dry Japanese knotweed).
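The subset search can be organized as a simple enumeration over combinations of acquisition dates. In this sketch, `evaluate` is a hypothetical callback that would run the classification for a given subset and return the number of correctly detected stands and the number of false detections; the dummy evaluator below only counts distinct months and is not a classifier, and the dates are invented. The combinatorics explain why exhaustive testing quickly becomes expensive: choosing 4 out of, say, 40 cloud-free dates already gives 91,390 subsets.

```python
from datetime import date
from itertools import combinations
from math import comb

def rank_date_subsets(dates, k, evaluate):
    """Evaluate every k-date subset and sort by (most stands detected, fewest false detections).

    `evaluate(subset)` must return (stands_detected, false_detections).
    """
    results = []
    for subset in combinations(sorted(dates), k):
        detected, false_detections = evaluate(subset)
        results.append((subset, detected, false_detections))
    results.sort(key=lambda r: (-r[1], r[2]))
    return results

def dummy_evaluate(subset):
    """Placeholder for illustration only: counts distinct months, reports no false detections."""
    return len({d.month for d in subset}), 0

# Hypothetical acquisition dates.
dates = [date(2019, 1, 15), date(2019, 4, 3), date(2019, 7, 10),
         date(2019, 11, 20), date(2019, 11, 25)]
ranked = rank_date_subsets(dates, 4, dummy_evaluate)
print(len(ranked), comb(40, 4))  # 5 91390
```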

Results
The detection maps of Japanese knotweed on orthophotos are shown in Figures 4 and 5. The introduced method for detecting Japanese knotweed on orthophotos achieves 83% detection accuracy: it detects 49 out of 59 samples. This is slightly lower than in the study by Dorigo et al. (2012) over the same study area, which reached 90% overall accuracy using orthophotos acquired at different times. In our case, the accuracy is generally strongly linked to the quality and number of training samples. Furthermore, the 2018 orthophotos were acquired at a less than ideal time for Japanese knotweed analysis. The stands were in full growth and are therefore hard to detect even with the naked eye, so we believe the detection would be more accurate if the aerial photography had been acquired in a different season. For example, Japanese knotweed is easy to spot at the beginning of its growing cycle, when most of the other vegetation is already green, yet its stands still lack foliage and the soil is bare.
Using the SVM methodology to detect Japanese knotweed on Sentinel-2 images gave approximately 90% accuracy for stands larger than 100 m² (one 10 m × 10 m pixel). Figure 6 shows a map of detections from satellite images. As with the orthophoto-based procedure, the accuracy strongly depends on the selected input images and the training samples provided.
Additionally, we observed that the accuracy of detection on Sentinel-2 images varies considerably depending on the chosen input images (the number of images, their acquisition dates, cloud coverage, etc.). Table 1 compares the results for different single images and for different combinations of acquisition dates, and Figure 7 visualizes this comparison. In Area 1 we observed 35 pixels and in Area 2, 70 pixels; together they formed 9 stands. The first two columns give the number of detected pixels in the two study areas, which also differ considerably. One of the reasons is the presence of clouds over some samples in Area 1 (noticeable as a very low percentage of detected pixels for all dates).
What is not visible from Table 1, but needs to be mentioned, is the number of falsely detected samples, which is much higher with single-date input data. Depending on the phenological state, the misclassified pixels fall on agricultural fields, grasslands, trees or shrubs. Even if the pixel-level accuracy does not seem very high, the last column in Table 1 shows how many stands of Japanese knotweed were at least partially detected. These numbers are also very important, as such detections give us the location of a stand even when the whole stand is not detected.
Both of the described methods rely heavily on good quality ground truth data. Of the ground truth data available to us, the data obtained from the field experts in December 2018 and the data from the Invazivke web application (Web portal Invazivke 2019) proved not to be very valuable for our method based on Sentinel-2 images, because of their small mapping units, given either as point locations or as small stand areas. Therefore, we collected our own training samples during fieldwork. However, for future developments we expect to use updated and extended field data from the experts.
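The distinction between pixel-level accuracy and the number of stands at least partially detected can be made concrete with a short sketch. It assumes that the reference stands are available as a boolean raster on the same grid as the detections and that each connected reference patch counts as one stand; in the study the stands come from field data, so this is a simplification. The same stand-level logic gives the orthophoto figure quoted above: 49 detected out of 59 samples is 49/59 ≈ 0.83.

```python
import numpy as np
from scipy import ndimage

def pixel_and_stand_detection(reference, detected):
    """reference, detected: boolean rasters on the same grid.

    Returns (share of reference pixels detected,
             number of stands with at least one detected pixel,
             total number of stands).
    """
    stands, n_stands = ndimage.label(reference)   # each connected reference patch = one stand
    pixel_rate = detected[reference].mean() if reference.any() else float("nan")
    hit = np.unique(stands[detected & reference])  # IDs of stands touched by a detection
    return float(pixel_rate), int(np.count_nonzero(hit)), n_stands

reference = np.zeros((8, 8), dtype=bool)
reference[0:2, 0:2] = True   # stand A (4 pixels)
reference[4:6, 4:6] = True   # stand B (4 pixels)
detected = np.zeros_like(reference)
detected[0, 0] = True        # a single pixel of stand A is detected
print(pixel_and_stand_detection(reference, detected))  # (0.125, 1, 2)
```

Here only 1 of 8 reference pixels (12.5%) is detected, yet half of the stands are located, which mirrors why the stand-level column in Table 1 is informative even when pixel-level accuracy looks low.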

Discussion
In this paper we have presented the potential of different remote sensing images (aerial and satellite) for detecting and mapping IAPS. The main differences between detecting IAPS on aerial and on satellite images stem from their different spatial and temporal resolutions, which determine the suitable detection approaches. An extensive review of spectral, textural and phenological approaches to IAPS detection was carried out by Bradley (2014), who highlights the circumstances in which the different approaches are likely to be most effective. The benefit of orthophotos is their high spatial resolution, which allows us to detect smaller stands. The lower spatial resolution of satellite data does not allow this. This limitation can affect the detection of most invasive plant species from satellite data, as they grow either individually or on small plots and are therefore not always visible at coarser spatial resolutions. However, Japanese knotweed forms large, dense stands that are favorable for detection from satellite images. The size of the unit that we want to detect is therefore very important. A single image element (a pixel) may not necessarily carry enough information to determine whether and where exactly invasive alien plant species are located. For a relevant analysis, the Japanese knotweed stands used as training data have to be sufficiently large: at least 10 m² for orthophotos and at least 100 m² for satellite images. For a stand to be detectable on remote sensing images, it has to be dense and compact, not mixed with other vegetation, and not growing under other plants. We also noticed that stands in the immediate proximity of roads and highways are difficult to detect on Sentinel-2 images, because the images may be misaligned with the ground truth data by up to one pixel.
Although the lower spatial resolution of satellite data can be overcome by using very high spatial resolution satellites (e.g. WorldView, GeoEye, Pleiades), according to Huang and Asner (2009) many non-native species are still not discernible. In future work we plan to test this hypothesis by applying the developed methodologies to very high resolution satellite data. Other suggestions for further work include exploring the merit of combining Sentinel-2 (high temporal resolution) with aerial orthophotos (high spatial resolution) in a single classifier; assessing the suitability of the method for detecting Japanese knotweed along the infrastructure network outside urban areas; including the proximity to roads/railways and rivers as an additional predictor; and comparing SVM results with neural networks. Another priority is to ease the bottleneck of ground truth data acquisition.
As for temporal resolution, the main limitation of regular CAS acquisitions, at least in Slovenia, is that images are acquired only every three years. In contrast, Sentinel-2 satellite images are freely available to everyone and have a very high temporal resolution; images over the same area may be acquired every five days. This allows us to use phenological features to distinguish between species on multi-temporal data. Knotweeds form dense stands that are relatively similar to the surrounding vegetation types. The main distinguishing factor is the difference in phenological state between knotweeds and the tree canopy. The growing season of knotweeds starts substantially later than that of the tree canopy: when knotweed starts growing, the trees are already green. Therefore, similarly to Müllerová et al. (2017), we emphasize the importance of considering seasonal changes in detection probability, which can be exploited well using dense optical satellite data. In addition, accurate detection with either data type requires taking advantage of the spectral and textural properties of Japanese knotweed (Jones et al. 2011).
Furthermore, the accuracy in the analyzed areas varies considerably with the number of input images and the dates of image acquisition. Therefore, in agreement with Alvarez-Taboada et al. (2017), we conclude that the acquisition date of the images is one of the factors that play a significant role in obtaining good results from satellite data.
Overall, maps of the exact location and extent of IAPS obtained from RS data can give a good overview of the spatial spread of invasive species, help us understand their environmental effects, enable cost-effective management efforts to prevent further spread and, not least, raise awareness among citizens. Our approach could be applied using either aerial or satellite data at large spatial scales (e.g. municipalities) to detect Japanese knotweed at early stages of invasion.