Which calibrated threshold is appropriate for ranking non-native species using scores generated by WRA-type screening toolkits that assess risks under both current and future climate conditions?

Score-based decision-support tools are increasingly used to identify potentially invasive non-native species as part of the risk screening (initial risk identification) component of non-native species risk analysis. Amongst these tools are the Weed Risk Assessment (WRA) and its derivatives, e.g. the Aquatic Species Invasiveness Screening Kit (AS-ISK), which have been extensively used on a large variety of terrestrial and aquatic plants and of aquatic animals worldwide. In this paper, a correction is made to the previous guidance on the use of two separate thresholds to risk-rank species, i.e. one for current climate conditions (basic risk assessment: BRA threshold) and one for future climate conditions (BRA + climate change assessment: BRA+CCA threshold). Re-evaluation of this practice reveals that, to avoid the incorrect risk-ranking of species, only the BRA threshold should be used in all future applications of WRA-type toolkits that include a separate set of climate-change questions – at present, this involves the AS-ISK and the newly released Terrestrial Animal Species Invasiveness Screening Kit (TAS-ISK). As a result of this revised guidance, all published studies containing AS-ISK applications to date are reviewed here and, where appropriate, corrected risk ranks are provided for species that were risk-ranked using a BRA+CCA threshold. Corrections are also made, whenever applicable, to published errors or incorrect risk ranks based on the BRA threshold in the AS-ISK applications reviewed.


Citation: Vilizzi L, Piria M, Copp GH (2022) Which calibrated threshold is appropriate for ranking non-native species using scores generated by WRA-type screening toolkits that assess risks under both current and future climate conditions? Management of Biological Invasions 13 (in press)

Introduction
An essential step in the use of decision-support tools for identifying potentially invasive non-native species (sensu Copp et al. 2005c), which pose a threat to native species and ecosystems worldwide (Molnar et al. 2008; Paini et al. 2016), is to determine the most appropriate risk threshold score for distinguishing high-risk from lower-risk species (Gordon et al. 2008). Risk screening (or identification) is the first step in the risk analysis process, followed by comprehensive (full) risk assessment, and risk management and communication (Canter 1993; Copp et al. 2016a). For the risk screening step, electronic score-based tools are increasingly used to screen non-native terrestrial plants (Pheloung et al. 1999; Gordon et al. 2008) and aquatic species (Copp et al. 2016b) to identify potential invaders. The commonality of these decision-support tools is that their output scores can be used to compute a "calibrated" (score-based) threshold that ranks species according to their risk level. This allows species that pose a high risk of being (or becoming) invasive in the designated risk assessment area to be distinguished from those that pose a medium risk of being or becoming invasive (Gordon et al. 2008; Copp et al. 2009; Vilizzi et al. 2022a). Identification of high-risk species provides policy and decision-makers with information (cf. risk level and assessor confidence in the information that supports their responses) with which to prioritise non-native species for management. The most common options include immediate action (rapid response) to control or eradicate the invader, and/or a more comprehensive risk assessment.
This information, therefore, serves to inform policy and decisions, such as the most appropriate management approach and the best way to communicate the risks more widely to stakeholders and the general public (Copp et al. 2016b). Amongst the most widely used screening tools are the Weed Risk Assessment (WRA) for terrestrial plants (Pheloung et al. 1999; Gordon et al. 2008) and the Aquatic Species Invasiveness Screening Kit (AS-ISK: Copp et al. 2016b, 2021; Hill et al. 2020) for aquatic species. Although adapted directly from the WRA (Copp et al. 2009), and therefore based on 49 basic risk assessment (BRA) questions (Copp et al. 2005a, b; Vilizzi et al. 2019), the AS-ISK was enhanced by the inclusion of an additional set of six climate change assessment (CCA) questions and a few other elements – as such, the AS-ISK complies with the "minimum requirements" (Roy et al. 2018) for the assessment of invasive non-native species with regard to the EU Regulation 1143/2014 (European Council 2014). These CCA questions therefore distinguish the AS-ISK from the WRA and other WRA-type toolkits (Pheloung et al. 1999; Gordon et al. 2008), which effectively assess non-native species under current climate only. The purpose of these six CCA questions is to allow the assessor to predict, based on available information, how the forecasted future climate conditions are likely to affect the various risks (of introduction, establishment, dispersal and impact) associated with the species being screened. This feature would appear to offer the possibility to compute two separate thresholds, i.e. using receiver operating characteristic curve analysis of the risk screening scores generated by the toolkit (see Vilizzi et al. 2022a): one BRA threshold based on scores from the 49 BRA questions, which evaluate risks under current-climate conditions; and a second BRA+CCA threshold based on the BRA plus the six CCA questions.
The computation and use of two thresholds has been employed in most AS-ISK applications to date (Table 1). However, consistent with the long-standing principle of risk analysis that assessment protocols and the resulting assessments are dynamic, and therefore should be regularly re-evaluated, the authors of this article have reviewed the use of a BRA+CCA threshold and have found that its use is most likely inappropriate.

Table 1 (caption fragment; table body not recovered). [...] (if applicable) for the BRA and the BRA+CCA (with the BRA+CCA threshold scores provided only for reference purposes and crossed out); Corrections (if applicable) resulting from using the BRA threshold to rank species based on their BRA+CCA score (n.c. = no corrections). Threshold values are given in the decimal points as originally provided in the source reference. In case of more than one risk assessment area for a certain application, the order follows the one given in the source reference.
Table 1 footnotes (as recovered):
5 Provision of both BRA and BRA+CCA thresholds involving more than one species and requiring no corrections.
6 Provision of both BRA and BRA+CCA thresholds involving more than one species and requiring correction of the BRA+CCA risk rankings plus a change in the BRA-based risk rankings.
7 Provision of both BRA and BRA+CCA thresholds involving more than one species and requiring correction of the BRA+CCA risk rankings only for one species incorrectly classified as medium risk.
8 Provision of both BRA and BRA+CCA thresholds involving more than one species and requiring correction of the BRA+CCA risk rankings and screening species from several taxonomic groups.
9 Provision of both BRA and BRA+CCA thresholds involving more than one species and requiring correction of the BRA+CCA risk rankings for one to nine species.
The rationale for inclusion of six CCA questions in the AS-ISK, and thus the use of the BRA+CCA threshold to rank species, was to take into account how future climate conditions are likely to affect a screened species' risk, i.e. by increasing, decreasing or not changing the risk score. However, use of the BRA+CCA threshold, instead of the (baseline) BRA threshold, could alter the species' risk rank (low, medium, high) in a manner that is inconsistent with the change in risk score. Overall, there are two possible errors in risk ranking that are introduced by the use of the BRA+CCA threshold instead of the BRA threshold:
1) In cases where the BRA+CCA threshold is higher than the BRA threshold (Table 1), a species ranked as high-risk using the BRA threshold could be reduced in risk rank (i.e. to medium risk) using the BRA+CCA threshold, even though the assessor's responses to the CCA questions indicated that the species' risks are likely to remain unchanged or to increase under predicted climate conditions. An example of this is the marine fish red drum Sciaenops ocellatus (Supplementary material Table S8), which was attributed the same total risk score (20.5) for the BRA and BRA+CCA, i.e. no decrease or increase under future climate conditions. However, the species' risk rank dropped from High to Medium using the BRA+CCA threshold (21.75), which (falsely) suggests a decrease in the species' invasiveness risk despite no change in its BRA+CCA score (situated slightly above the BRA threshold of 19.75).
2) In cases where the BRA+CCA threshold is lower than the BRA threshold (Table 1), a species could be increased in risk rank (to high risk) using the BRA+CCA threshold, even though the assessor's responses to the CCA questions indicated that the species' risks are likely to decrease under predicted climate conditions. An example of this is the brackish-water fish mango tilapia Sarotherodon galilaeus (Table S8), which received a BRA score of 29.0 and a BRA+CCA score of 25.0, i.e. a slight decrease in risk under future climate conditions. However, the species' risk rank increased from Medium to High using the BRA+CCA threshold (22.50), which (falsely) suggests an increase in the risk of the species being invasive in the risk assessment area despite a reduced risk score (situated below the BRA threshold of 30.50).
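The two ranking errors described above can be reproduced with a few lines of code. The following is only a minimal sketch: the scores and thresholds are those quoted in the text for red drum and mango tilapia, whereas the function name `risk_rank` and the data layout are illustrative assumptions and not part of the AS-ISK.

```python
def risk_rank(score: float, threshold: float) -> str:
    """Rank a score: Low if < 1, Medium if below the threshold, else High."""
    if score < 1:
        return "Low"
    return "Medium" if score < threshold else "High"


# (species, BRA score, BRA+CCA score, BRA threshold, BRA+CCA threshold),
# values as quoted in the text.
examples = [
    ("Sciaenops ocellatus", 20.5, 20.5, 19.75, 21.75),
    ("Sarotherodon galilaeus", 29.0, 25.0, 30.50, 22.50),
]

for name, bra, bra_cca, t_bra, t_bra_cca in examples:
    current = risk_rank(bra, t_bra)              # current climate
    two_thresholds = risk_rank(bra_cca, t_bra_cca)  # practice being corrected
    one_threshold = risk_rank(bra_cca, t_bra)       # BRA threshold only
    print(f"{name}: BRA rank {current}; BRA+CCA rank {two_thresholds} "
          f"(BRA+CCA threshold) vs {one_threshold} (BRA threshold)")
```

With the BRA threshold applied to both scores, the rank of red drum stays High (score unchanged) and that of mango tilapia stays Medium (score reduced), so the rank no longer moves against the direction of the score.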
The aim of the present paper is to re-evaluate the use of thresholds in published applications of WRA-type decision-support tools that currently contain separate climate-change questions, which at present are the AS-ISK and the recently released Terrestrial Animal Species Invasiveness Screening Kit (TAS-ISK: Vilizzi et al. 2022b) – both of these toolkits are available for free download at www.cefas.co.uk/nns/tools. The specific objectives of this study are to: 1) review published applications of the AS-ISK in which the BRA+CCA threshold was used to risk-rank non-native species; 2) re-allocate the screened species to the appropriate risk rank (i.e. low, medium, high) where necessary; and 3) provide new guidelines on the correct usage of thresholds in future applications of WRA-type toolkits that include both the 49 BRA and six CCA questions. Additionally, corrections are provided, where appropriate, to published BRA-based risk ranks that resulted from published errors or from incorrect setting of the threshold's limits by the authors of such applications.

Toolkit description
Full descriptions of the AS-ISK are available elsewhere (Copp et al. 2016b). Briefly, the AS-ISK represents a second-generation adaptation of the WRA, being a marriage of the generic screening questions in the "Pre-screening module" of the European Non-native Species in Aquaculture Risk Analysis Scheme (Copp et al. 2016a) with the architecture of the freshwater Fish Invasiveness Screening Kit v2 (Lawson et al. 2013). As with the WRA, the AS-ISK consists of 49 BRA questions but, unlike the WRA, it also includes six CCA questions (Copp et al. 2016b). The AS-ISK allows assessors to screen 27 taxonomic groups (as per Ruggiero et al. 2015) of aquatic organisms in their choice of 32 languages (see Copp et al. 2021).
Upon completion of a risk screening, the species is attributed a BRA score and a BRA+CCA (composite) score – these range from −20 to 68 and from −32 to 80, respectively. The BRA+CCA score increases or decreases by up to 12 points relative to the BRA score but will remain unchanged in cases where the total CCA score is 0. Scores < 1 suggest that the species is unlikely to become invasive in the risk assessment area and is therefore classified as "low risk" (Pheloung et al. 1999; Gordon et al. 2008). Higher scores classify the species as posing either a "medium risk" or a "high risk" of becoming invasive. Distinction between medium-risk and high-risk levels depends upon setting a threshold value, which is obtained by receiver operating characteristic curve analysis (Bewick et al. 2004; Gordon et al. 2008). This can be achieved when the screening includes a sufficient number and proportion of non-native species that are known (and therefore can be categorised) a priori to be either invasive or non-invasive (see Vilizzi et al. 2019, 2022a). Notably, risk screening tools of the WRA-type, including the AS-ISK, do NOT contain a feature that allows the assessor to compute a threshold. The calibration of scores to obtain a threshold is a procedure carried out as a separate, subsequent analytical step that uses the output scores from the WRA-type toolkits (Vilizzi et al. 2022a).
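As an illustration of this separate calibration step, the sketch below derives a threshold from BRA scores and a priori invasiveness categories. It rests on assumptions not stated in the text: Youden's J (sensitivity + specificity − 1) is used as the cut-off criterion, candidate cut-offs are restricted to the observed scores, and the data and the function name `youden_threshold` are hypothetical. Dedicated statistical software would normally be used for a real calibration.

```python
def youden_threshold(scores, invasive):
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    A species is predicted invasive when its score is >= threshold.
    `invasive` holds the a priori categorisation (True = invasive).
    """
    n_pos = sum(invasive)
    n_neg = len(invasive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both a priori invasive and non-invasive species")

    best_threshold, best_j = None, -1.0
    for cut in sorted(set(scores)):
        # True positives: invasive species at or above the cut-off.
        tp = sum(1 for s, y in zip(scores, invasive) if y and s >= cut)
        # True negatives: non-invasive species below the cut-off.
        tn = sum(1 for s, y in zip(scores, invasive) if not y and s < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_threshold, best_j = cut, j
    return best_threshold, best_j


# Hypothetical BRA scores and a priori categories for eight screened species.
bra_scores = [5.0, 12.5, 14.0, 18.0, 21.0, 24.5, 30.0, 33.5]
a_priori_invasive = [False, False, True, False, True, True, True, True]

threshold, j = youden_threshold(bra_scores, a_priori_invasive)
print(f"Calibrated BRA threshold: {threshold} (Youden's J = {j:.2f})")
```

Under the guidance given in this paper, only the threshold calibrated on the BRA scores would then be used to rank both the BRA and the BRA+CCA scores.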

Review of applications and risk-rank correction
In the comprehensive review, the following data were retrieved for each published AS-ISK application: (i) risk assessment area; (ii) number of screened species; (iii) taxonomic group of the screened species (see , whenever applicable and if related to the setting of group-specific thresholds; (iv) any other component involving replication at the species level affecting the screening scores (cf. separate assessors); (v) BRA threshold(s) (if provided); (vi) BRA+CCA threshold(s) (if provided).
For all applications that provided the BRA threshold only, or both the BRA and BRA+CCA thresholds, the risk ranks (i.e. low, medium, high) for the screened species were then corrected (if applicable), in all cases based on the BRA threshold only. Accordingly, correction of the risk rank based on both the BRA and BRA+CCA scores involved the ranking of a species into:
• "Low risk" if score < 1;
• "Medium risk" if score ≥ 1 and < Threshold;
• "High risk" if score ≥ Threshold.
The number of corrected current-climate (BRA) and/or climate-change (BRA+CCA) risk ranks for each application was then reported, as applicable.
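The re-ranking and tallying procedure can be sketched as follows. This is a hypothetical illustration only: the records, species labels, threshold value and the function names `risk_rank` and `tally_corrections` are assumptions made for the example and do not come from any of the reviewed applications.

```python
from collections import Counter

LOW_RISK_CUTOFF = 1  # scores below 1 are ranked "Low" (rule stated in the text)


def risk_rank(score, bra_threshold):
    """Apply the three-level ranking rule using the BRA threshold only."""
    if score < LOW_RISK_CUTOFF:
        return "Low"
    if score < bra_threshold:
        return "Medium"
    return "High"


def tally_corrections(records, bra_threshold):
    """Count (reported rank -> corrected rank) changes for BRA+CCA scores."""
    changes = Counter()
    for rec in records:
        corrected = risk_rank(rec["bra_cca_score"], bra_threshold)
        if corrected != rec["reported_rank"]:
            changes[(rec["reported_rank"], corrected)] += 1
    return changes


# Hypothetical application: reported ranks were based on a BRA+CCA threshold.
records = [
    {"species": "Species A", "bra_cca_score": 0.5, "reported_rank": "Low"},
    {"species": "Species B", "bra_cca_score": 24.0, "reported_rank": "Medium"},
    {"species": "Species C", "bra_cca_score": 31.0, "reported_rank": "High"},
    {"species": "Species D", "bra_cca_score": 1.0, "reported_rank": "Low"},
]

for (reported, corrected), n in tally_corrections(records, 22.5).items():
    print(f"{reported} -> {corrected}: {n} species")
```

Species D illustrates the boundary case in the rule: a score of exactly 1 is not < 1 and is therefore ranked Medium, not Low.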

Results
With the exception of the global application by , which will be discussed separately here below due to its unique approach, a total of 43 published AS-ISK applications were retrieved for review. Of these, a partial application of the AS-ISK was excluded because it provisionally and partially used the Biology/Ecology section only to screen a non-aquatic taxonomic group, i.e. terrestrial reptiles (Kopecký et al. 2019).
Of the 42 retained applications (Table 1; Tables S1-S30), two provided risk scores for the BRA only (Filiz et al. 2017a; Castellanos-Galindo et al. 2018) and one for both the BRA and the BRA+CCA (Velle et al. 2021), but without reference to any threshold; as a result, the risk ranks were empirically defined. Another eight applications provided risk scores for both the BRA and BRA+CCA, but with reference to the BRA threshold only (i.e. no BRA+CCA threshold was computed); as such, no correction of the resulting risk ranks was required (Filiz et al. 2017b; Li et al. 2017; Tarkan et al. 2017a, b, 2020, 2022; Baduy et al. 2020; Kumar et al. 2021). However, as the application by Tarkan et al. (2017a) resulted in a calibrated threshold of −3 (hence, lower than the "default" threshold of −1 to distinguish between low-risk and medium-risk species: see Materials and methods), this was taken to distinguish between low-risk and high-risk species only, with no medium-risk species identified in that application.
The remaining 31 applications provided both BRA and BRA+CCA thresholds against which the BRA and BRA+CCA scores were evaluated, respectively, in terms of risk rankings (Table 1). These applications were therefore subject to revision of the BRA+CCA risk ranks, which were redefined against the corresponding BRA threshold only (see Materials and methods). Of these 31 applications, six dealt with one species only and the remaining 25 involved screenings of more than one species (note that hereafter the term "species" will be used loosely to refer to any taxon, which in the reviewed applications included not only species but also genera, sub-species and hybrids – but see Dodd et al. (2019) and Yoğurtçuoğlu et al. (2021), below).
Of the six applications involving only one species (Table 1), one required two corrections based on assessor-specific risk ranks , whereas the other five did not require any correction (Suresh et al. 2019;Moghaddas et al. 2020;Zięba et al. 2020;Haubrock et al. 2021;IAVH 2021). Note that, in the IAVH (2021) application, the BRA and BRA+CCA threshold values were identical, hence no need to revise the risk ranks.
Of the 25 applications involving more than one species (Table 1), four did not involve any correction (i.e. Paganelli et al. 2018;Glamuzina et al. 2021;Ruykys et al. 2021;Saba et al. 2021). However, in the application by Ruykys et al. (2021), the BRA and BRA+CCA thresholds were identical (hence, no corrections to risk ranks), whereas the application by Saba et al. (2021) was included in this review only for the sake of completeness. This is because the procedure for calibration used by Saba et al. (2021), and the resulting thresholds and risk ranks described in that application, are to be regarded as statistically invalid due to the insufficient number (n = 5) of screened species required to achieve a calibration specific to the risk assessment area of interest.
Of the 21 applications that involved more than one species and required correction of the risk ranks, due to use of the BRA+CCA threshold (Table 1), four also required a change in the BRA-based risk rank. Specifically, the application by Glamuzina et al. (2017) provided incorrect BRA-based risk ranks for three species, though no corrections were required when the BRA threshold replaced the BRA+CCA threshold to rank species. In the application by Dodd et al. (2019), one species relative to a River Basin District was incorrectly classified as high-risk instead of medium-risk. The application by Li et al. (2021) required correction of the BRA scores equal to 1 for two species, which were classified as low risk instead of medium risk (i.e. low-risk score ≤ 1 instead of < 1: see Materials and methods). Finally, although the application by Paganelli et al. (2021) correctly described the risk ranks of screened species based on their "Scenario I", incorrect risk ranks were provided in Table 3 of their paper for most species (i.e. "low level of risk", where none of the species' BRA or BRA+CCA scores was < 1).
Of the 17 applications that involved more than one species and required correction of the BRA+CCA-based risk ranks only (Table 1), the application by Tarkan et al. (2021) required only one risk-rank correction, i.e. one species having been incorrectly classified as medium risk. Two applications screened species from several taxonomic groups, namely Clarke et al. (2020) and Tidbury et al. (2021), and of these, the former provided the same BRA and BRA+CCA thresholds for one group of species (hence, no correction required). All of the remaining 14 applications required a change in risk rank for one to nine species (i.e. Semenchenko et al. 2018; Bilge et al. 2019; Interesova et al. 2020; Killi et al. 2020; Lyons et al. 2020; Uyan et al. 2020; Moghaddas et al. 2021; Radočaj et al. 2021; Stasolla et al. 2021; Wei et al. 2021a, b; Yapici 2021; Yoğurtçuoğlu et al. 2021; To et al. 2022). However, the application by To et al. (2022) failed to implement a statistically reliable receiver operating characteristic curve analysis for calibration and used incorrect species names, but the study has been included here for the sake of completeness. Of particular note is the application by Yoğurtçuoğlu et al. (2021), which screened a single species with respect to 25 river basins (i.e. risk assessment areas), hence resulting in risk ranks that are river basin-specific instead of species-specific.
Regarding the global application by , this involved the screening of 819 non-native species from 15 groups of aquatic organisms. No species-specific risk scores were provided, except for the 29 species classified as very high-risk based on the BRA+CCA thresholds (their Fig. 4b).
As such, in any future AS-ISK applications that make use of these global thresholds, because risk assessment-area specific calibration is not possible (see  for details on the usage of global thresholds), only the global BRA based thresholds for aquatic organismal groups ( Table 2 in ) and for climate/marine ecoregions (Table 5: ibid.) should be used.
Overall, most of the corrections that resulted when the erroneous BRA+CCA threshold was replaced by the correct BRA threshold, involved an increase in risk rank from medium to high. This was the case of the global trial application  in which the calibrated BRA threshold was lower for most taxonomic groups than the calibrated BRA+CCA threshold. The exceptions to this were the applications by Clarke et al. (2020) and Uyan et al. (2020) for which most changes were decreases in risk rank from high to medium. In these two latter applications, five of the seven calibrated BRA thresholds (for taxonomic groups) were virtually equal to or lower than the calibrated BRA+CCA threshold.

Discussion
As a result of the revised guidance on the use of calibrated thresholds outlined in this paper, half (n = 21) of the published AS-ISK applications required a correction of the risk rank for one or more of the screened species. In some of these applications, a "moderate" to "substantial" increase in the proportion of species ranked as high-risk (instead of medium-risk, as originally reported) results from use of the correct (BRA) threshold to rank the species (i.e. Semenchenko et al. 2018; Bilge et al. 2019; Interesova et al. 2020; Killi et al. 2020; Paganelli et al. 2021; Wei et al. 2021a; Yapici 2021). If those studies are (or have been) used to advise policy and/or management decisions, then the stakeholders and decision-makers should be informed of these changes in risk rank. Similarly, though perhaps less critically, the decreases in risk rank (i.e. from high to medium) in some AS-ISK applications (e.g. Clarke et al. 2020; Uyan et al. 2020) may not necessarily require an amendment to the policy and management decisions (e.g. in terms of "prioritisation") if financial resources are available. For all other applications, any changes in risk rank (especially from medium to high) should be evaluated on a species-specific basis, with amendments to advice provision considered on a case-by-case basis for the risk assessment area of interest.
The AS-ISK and TAS-ISK electronic toolkits (currently available in version 2.3.2) include the setting of both the BRA and BRA+CCA thresholds (which were computed separately by receiver operating characteristic curve analysis), with the corresponding species' BRA and BRA+CCA risk ranks being provided. In view of the revised guidance provided here on the correct use of calibrated risk thresholds, until such time as the BRA+CCA threshold value field can be removed from the AS-ISK and the TAS-ISK, users of these decision-support tools should henceforth insert the same value into both of the toolkit's BRA and BRA+CCA threshold fields. This will ensure that the species screening report generated by the toolkit provides the same (correct) risk rank. However, it is important to understand that the threshold-setting function is not possible in AS-ISK and TAS-ISK applications whenever the database contains multiple screenings of the same species, i.e. by more than one assessor (as in Glamuzina et al. 2017; Li et al. 2017; Tarkan et al. 2017a, b; Bilge et al. 2019; Lyons et al. 2020; Glamuzina et al. 2021; Li et al. 2021; Moghaddas et al. 2021; Wei et al. 2021a). This constraint results from assessor-level replication of the species-specific scores, which are averaged over the assessors.