Waarnemingen.be – Non-native plant and animal occurrences in Flanders and the Brussels Capital Region, Belgium

Citizen scientists make important contributions to the collection of occurrence data of non-native species. We present two datasets comprising more than 520,000 records of 1,771 non-native species from Flanders and the Brussels Capital Region in Belgium, Western Europe, collected through the website http://www.waarnemingen.be hosted by Stichting Natuurinformatie and managed by the nature conservation NGO Natuurpunt. Most records were collected by citizen scientists, mainly since 2008. Waarnemingen.be aims at recording all species, native and non-native, and we show here that biodiversity portals of this kind are also particularly well suited to collecting large amounts of data on non-native species. Both datasets presented here are also discoverable through the Global Biodiversity Information Facility (GBIF).


Introduction
Invasive alien species (IAS) are considered a major threat to native biodiversity and 11% of the alien (non-native) species in Europe are considered invasive (Caffrey et al. 2014). One of the vital components in invasive species management is an early warning system to detect upcoming invasive species and communicate species alert information (Caffrey et al. 2014).
Information distributed by early warning systems can trigger management actions to control, eradicate or mitigate effects of the IAS (Genovesi 2005; Genovesi et al. 2010). Citizen scientists are already frequently engaged in detecting and reporting non-native species (Crall et al. 2010; Gallo and Waitt 2011; Scyphers et al. 2014; Adriaens et al. 2015b). In this paper, we describe two large datasets of non-native plant and animal species in Flanders and the Brussels Capital Region, mainly collected by citizen scientists.

Taxonomic coverage
In the absence of a nationwide approved non-native species list, non-native plants and animals were selected based on their status in the waarnemingen.be database. The species status was determined by national and international experts. The non-native plant dataset comprises three kingdoms: Plantae (1,352 species), Chromista (2 species) and Fungi (2 species); for simplicity it is referred to below as the non-native plant dataset. The non-native animal dataset includes 415 species from the kingdom Animalia. Apart from species, the datasets also contain records of subspecies, varieties, forms, hybrids, and multispecies. Figure 1 shows that the majority of non-native species (plants 77%, animals 75%) were reported no more than 50 times. The 10 most frequently reported non-native plant and animal species are shown in Table 1. Although fewer animal species (415) than plant species (1,356) are included, animals were reported more frequently than plants (380,100 and 143,497 records, respectively). Non-native species account for 2.7% of the observations in waarnemingen.be.

Spatial coverage
The presented datasets include non-native species records from Flanders and the Brussels Capital Region, Belgium (Figure 2). These regions are situated in the north of Belgium and cover an area of 13,522 km² and 162 km² respectively (13,684 km² in total or 45% of the Belgian territory).
Natuurpunt has not acquired permission from all observers to publish open data at the finest geospatial scale, so coordinates are generalized to a standardized grid (indicated in the field dataGeneralizations). For historical reasons, two different grid types are used, depending on the species group. All plant occurrence data are generalized to IFBL (Instituut voor Floristiek van België en Luxemburg) grid cells of 4 × 4 km, with the grid codes indicated in the field verbatimCoordinates. The WGS84 centroids of these grid cells are given in decimalLatitude and decimalLongitude with a coordinateUncertaintyInMeters of 2,828 meters (using Wieczorek et al. 2004). All animal occurrences were attributed to grid cells of 5 × 5 km of the Universal Transverse Mercator (UTM) projection. The centroids of the 5 × 5 km grid cells were calculated in WGS84 with a coordinateUncertaintyInMeters of 3,769 meters (Wieczorek et al. 2004).
Figure 3 shows the number of non-native species and observations per grid cell. Please note that the data is mostly presence-only data, without indication of the observation effort. Differences in the number of species can be related to species diversity, but also to variation in observation effort (see Methods and Discussion). Table 2 shows the 10 most widespread reported non-native species.
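As a rough plausibility check on the uncertainty values above, the distance from the centre of a square grid cell to one of its corners (half the diagonal) can be computed directly. This is only a sketch: the function name is hypothetical, and it does not implement the full point-radius method of Wieczorek et al. (2004). For a 4 km cell it reproduces the 2,828 m given for the plant dataset. For a 5 km cell it gives about 3,536 m, which is smaller than the 3,769 m reported for the animal dataset, so that value presumably includes terms this sketch does not model.

```python
import math


def half_diagonal_m(cell_size_m: float) -> float:
    """Distance in metres from the centre of a square cell to a corner.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    return cell_size_m * math.sqrt(2) / 2


print(round(half_diagonal_m(4000)))  # 2828
print(round(half_diagonal_m(5000)))  # 3536
```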

Temporal coverage
The dataset covers observations from 30 June 1859 to 31 December 2016. Most observations were reported by citizen scientists in waarnemingen.be. This online platform was launched in 2008 and most records have been registered since then, but historical records and datasets have also been imported. These imported datasets are mostly separate datasets from volunteers, or specific monitoring projects from study groups dating from before the start of waarnemingen.be in 2008. After a validity check by species specialists, these datasets were added to waarnemingen.be when requested by the person responsible for the dataset. Figure 4 shows the number of non-native species records per decade (left) or year (right).

Methods
Waarnemingen.be is promoted as the data portal for nature observations by Natuurpunt, the largest nature conservation NGO in Flanders. Hence, the majority of data is collected by non-professional volunteer citizen scientists. A minority of the data comes from professional researchers (mostly employees from Natuurpunt). Although the two types of data are not labelled separately in the dataset, we know that data collected by professionals is a minority, since the use of waarnemingen.be by external professionals is regulated under a separate agreement and contract.

Sampling description
Most of the data was opportunistically recorded, without a predefined sampling protocol. This resulted in uneven sampling in time (see Figure 4), in space and between species groups. In the absence of data on the true spatial search effort, we defined an approximation of search effort. For the animal dataset, this was calculated by selecting all users that recorded non-native animal species. For this selection of users, we calculated for each grid cell the number of separate days each observer was active (as evidenced by records). Their presence per grid cell was derived from the registration of records of all animal species in waarnemingen.be. This results in a single value per grid cell, representing the total number of day-visits per grid cell. This method gives a good indication of spatial variation in observer activity (search effort), even though it ignores visits that resulted in no observations. The same principles were used to approximate search effort for non-native plants (see Figure 5). As mentioned above, sampling is also uneven between species groups. Birds (58.3%), vascular plants (27.3%) and mammals (9.5%) are the most frequently recorded non-native species groups in the dataset. It must be noted that these percentages cannot be used to compare the abundance of non-native species groups, since they are heavily influenced by the interests and species knowledge of recorders. Birds are overrepresented in the dataset, while insects, other invertebrates, fish, reptiles and amphibians together represent only 4.5% of all records.
Currently, the data is mainly presence-only data. Presence is certain (in the absence of identification errors; see quality control description), whereas the absence of data can have multiple reasons: a grid cell was not visited (or not in the right period), the species was not present, the species was present but not detected, or the species was detected but not registered in the database.
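The search effort approximation described above can be sketched as follows. This is a minimal illustration, not the authors' actual implementation: the record layout (observer, date, grid cell, species), the function name and the example data are assumptions, and the input is assumed to be already restricted to the relevant species group (all animals or all plants).

```python
from collections import defaultdict


def day_visits_per_cell(records, target_species):
    """Count day-visits per grid cell.

    records: iterable of (observer, date, grid_cell, species) tuples
             (hypothetical layout, for illustration only).
    target_species: set of non-native species names.
    """
    records = list(records)
    # Step 1: keep observers who reported at least one non-native species.
    selected = {obs for obs, _, _, sp in records if sp in target_species}
    # Step 2: per cell, collect distinct (observer, date) pairs from ALL
    # records of the selected observers, native species included.
    visits = defaultdict(set)
    for obs, day, cell, _ in records:
        if obs in selected:
            visits[cell].add((obs, day))
    # Step 3: the number of distinct pairs is the day-visit total per cell.
    return {cell: len(pairs) for cell, pairs in visits.items()}


records = [
    ("ann", "2015-05-01", "C1", "Branta canadensis"),
    ("ann", "2015-05-01", "C1", "Parus major"),   # same day and cell: one visit
    ("ann", "2015-05-02", "C1", "Parus major"),
    ("bob", "2015-05-01", "C1", "Parus major"),   # no non-native record: excluded
    ("cat", "2015-05-01", "C2", "Branta canadensis"),
    ("cat", "2015-06-01", "C1", "Erithacus rubecula"),
]
print(day_visits_per_cell(records, {"Branta canadensis"}))
```

As in the text, a visit that produced no record at all leaves no trace in this count.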

Quality control description
The validation procedure is a multi-step process depending on the proof added to the observation (photograph or recorded sound), the species status and the region.
If a photograph or sound recording is added to the observation, it is always presented to validators (a group of experts, both professional and non-professional) for validation. When this proof is absent, an algorithm checks whether automatic validation (hereafter autovalidation) is active for this species. The activation of autovalidation, and the parameters used for it, are determined by the species group validators (see Supplementary material Table S1). Autovalidation depends on three parameters. For a record to be automatically accepted, there must be a minimum number of observations of the species supported by proof (at least one or two), within a certain radius (ranging from 100 m to 10 km) and within a specified time range (60-3000 days). Records that do not meet the autovalidation rules, and records of species for which autovalidation is not active, are treated depending on species status. Common species can be validated by regional validators, but a large number of observations have not (yet) been validated. Rare and very rare species are manually validated case by case. When no proof is provided with the record, this is mostly an interactive procedure in which observers can be asked for additional information by a team of validators, after which the validator manually [...]. In this dataset, 22% of non-native plant records and 9% of non-native animal records are supported by photographs on waarnemingen.be. An additional 49% (non-native plants) and 31% (non-native animals) were validated manually or by autovalidation (see Table 3). Although everybody can submit data to waarnemingen.be, we see that a small group of users (< 10%) contributes more than 90% of the data (K. Swinnen, unpublished information). This small group of users consists mostly of species group specialists with good species knowledge. The validation status is indicated in the field identificationVerificationStatus, the link to the original record in the field references.
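The three-parameter autovalidation rule can be illustrated with a short sketch. It is not the platform's actual algorithm: the class, the function names and the haversine distance are assumptions made for illustration, and the time range is assumed to apply in both directions around the record date, which the text does not specify.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    species: str
    lat: float
    lon: float
    day: date
    has_proof: bool = False


def distance_m(a: Observation, b: Observation) -> float:
    """Great-circle (haversine) distance in metres between two observations."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def can_autovalidate(record, archive, min_supporting, radius_m, max_days):
    """True if enough proof-supported observations of the same species lie
    within radius_m and within max_days of the record (assumed symmetric)."""
    supporting = sum(
        1
        for o in archive
        if o.has_proof
        and o.species == record.species
        and abs((record.day - o.day).days) <= max_days
        and distance_m(record, o) <= radius_m
    )
    return supporting >= min_supporting


archive = [
    Observation("Branta canadensis", 51.000, 4.0, date(2015, 1, 1), True),
    Observation("Branta canadensis", 51.005, 4.0, date(2015, 3, 1), True),
    Observation("Branta canadensis", 51.500, 4.0, date(2015, 3, 1), True),
]
record = Observation("Branta canadensis", 51.001, 4.0, date(2015, 6, 1))
print(can_autovalidate(record, archive, min_supporting=2, radius_m=1000, max_days=365))
```

A record that fails this check would go on to the status-dependent manual route described in the text.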

Discussion
A bottleneck in citizen science data collection is persuading the public to report, and to continue reporting, observations, particularly of common species. The growing number of citizen science projects and apps can confuse or even fatigue volunteers (Roy et al. 2012). Waarnemingen.be has a dominant position in the biological recording landscape in Flanders and the Brussels Capital Region (Adriaens et al. 2015b). The datasets described above show that large amounts of data on non-native species can be generated by citizen scientists, using a platform with an established and extensive user base that focuses on the recording of all species (native and non-native alike). The main reason for the success of the platform is that returns for the users (e.g. species-related statistics and information, distribution maps of species, species lists per area, validation of observations, comparison of species lists between observers) have always been considered important. Furthermore, the platform is managed and promoted (in magazines, newsletters, communication with naturalist workgroups, presentations and courses) by Natuurpunt, the largest non-governmental nature conservation organization in Flanders, with 107,000 family memberships and 187 associated workgroups. In addition, the data portal generates a growing user community, now totalling 25,000 contributors and 2,000,000 unique visitors since the launch of the platform. Since Natuurpunt is a regional organization, the data described here is limited to Flanders and the Brussels Capital Region.
In addition to waarnemingen.be, data on non-native and invasive species in Flanders and the Brussels Capital Region are also collected via other projects and portals. The region was partially included as research area for the European RINSE project (Reducing the Impacts of Non-native Species in Europe) (Adriaens et al. 2015a); the Flemish governmental Institute for Nature and Forest Research (INBO) monitors fish species in Flanders, including non-native and invasive species (Brosens et al. 2015); Florabank monitors vascular plant distribution in Flanders and the Brussels Capital Region (Van Landuyt et al. 2012); and iRecord (http://www.brc.ac.uk/irecord) and iNaturalist (https://www.inaturalist.org/) allow the registration of non-native and invasive species in the study area.
Apart from being a data collection platform, waarnemingen.be also contains descriptions, ecological and distribution information and pictures of most species. When submitting data on species that are difficult to identify, the observer receives information on how a correct identification can be achieved (microscopic or genital examination) or on which body parts need to be clearly visible in the submitted picture to allow validation of the observation. In addition, fact sheets were made for 94 selected species in the invasive alien species early warning system (see Vanderhoeven et al. (2015) for the project details and selection criteria of the species, and Table S2 for the species list). This early warning system notifies subscribed users based on their preferences for species or management area. A total of 42,111 alerts had been sent by the end of 2017 to subscribers, mostly local nature managers but also government officials or local authorities. These data can consequently be used for management actions, for example a rapid response to remove American mink (Neovison vison) (Adriaens et al. 2015a). Furthermore, citizen science observations contributed to the baseline information on the distribution of invasive alien species in Europe (Tsiamis et al. 2017) and Belgium (Adriaens et al. 2018). The open data can also be used in the TrIAS (Tracking Invasive Alien Species) project, a data-driven framework to inform policy (Vanderhoeven et al. 2017).
Since the launch of waarnemingen.be in 2008, data quantity has increased strongly (Figure 4). The proportion of data reported by smartphone has increased quickly since the launch of the apps in 2012 (Figure 4). Collection of data via smartphones has multiple advantages, as described by Vercayie and Herremans (2015). All observers using waarnemingen.be have to register prior to submitting observations. Users can decide on data sharing in their profile, where they also find information on the Creative Commons terms. The data sharing options are: 1) My observations cannot be shared with other organisations. 2) My observations can be shared for scientific purposes. 3) My observations can be shared for scientific purposes, policy, nature management and education. In addition, for specific data requests, users with a stricter data sharing preference can be asked (via e-mail) to share data for that specific purpose. This was also the case for the delivery of the data on non-native plants and animals to GBIF. Although the quality and usefulness of the data can be further improved (e.g. by stimulating additional pictures, or by using species checklists to allow the determination of (pseudo-)absences), the amount of data is large (even when only observations supported by pictures are considered) compared to the two IAS apps described by Adriaens et al. (2015b). To increase data fitness, waarnemingen.be has focused more on checklists and track registration since late 2016, which record observation effort more precisely than is currently the case.

Figure 1. Frequency distribution of the number of observations per non-native plant or animal species.

Figure 2. Location of Belgium within Europe (left) and the three administrative regions in Belgium (yellow = Flanders, black = Brussels Capital Region, red = Wallonia).

Figure 4. Number of collected records of non-native species currently present in waarnemingen.be between 1910 and 2000 (left) and between 2001 and 2016 (right). Each number on the left x-axis represents a period of 10 years (e.g., 1910 = 1 January 1901 to 31 December 1910). Note the different y-axis scales of the left and right graphs, the strong increase in data since the launch of waarnemingen.be in 2008, and the increase in records registered by smartphone since the launch of the apps ObsMapp (2012), iObs (2013) and WinObs (2014).

Figure 5. Approximated search effort for A) non-native plants and B) non-native animals. We selected all users that recorded non-native (plant or animal) species. We calculated for each grid cell the number of days each observer was present. Their presence was derived from the registration of records of all plant (A) or animal (B) species in waarnemingen.be. This results in a single value per grid cell, representing the total number of day-visits per grid cell. A) Day-visits per grid cell (4 × 4 km IFBL) by non-native plant observers (light green 0-250; dark green 251-500; blue 501-1,000; black 1,000-1,978). B) Day-visits per grid cell (5 × 5 km) (white 0-2,000; yellow 2,001-5,000; orange 5,001-10,000; red 10,001-32,000).

Table 1. Top 10 of the most frequently reported non-native plant and animal species in waarnemingen.be. % is calculated as the number of observations of the species divided by the total number of records of non-native plant or animal species. Only observations identified to species level are included.

Table 2. Top 10 of the most widespread reported non-native plant and animal species.

Table 3. Number of observations and validated proportion.