Where Did You Come From? Where Did You Go? Investigating the Origin of Invasive Leptocybe Species Using Distribution Modelling

Research Highlights: We present the first attempts to model the distributions of the two cryptic and globally invasive species of Leptocybe invasa sensu lato (Fisher & LaSalle) (Hymenoptera: Eulophidae) in its purported country of origin, namely Australia. Background and Objectives: Leptocybe invasa is an invasive eucalypt-galling wasp that spread quickly all over the world in the early to mid-2000’s, achieving significant pest status through its severe impacts on the growth and productivity of extra-limital eucalypt plantations. Until its discovery in Europe and the Middle East, the genus was undescribed, and its native range remains unclear. Molecular studies indicate the globally invasive population comprises two cryptic species with variable modes of reproduction. Collection records from Australia, the purported origin, represent only one of the invasive lineages, restricted to subtropical and tropical Queensland and northern New South Wales. To date, the original invasive lineage has not been found in Australia, despite searches over the seventeen years that it has been spreading overseas. Materials and Methods: To understand the distributions of the invasive populations, and to infer Leptocybe spp. native ranges within Australia, we used correlative niche modelling in Maximum Entropy (MaxEnt) and multivariate analysis, and created a CLIMEX model based on development rates of an invasive population. Results: We used the environmental conditions in the extra-limital range to infer possible origins, with our findings supporting the possibility that the invasive populations may have originated from different populations in Australia. Conclusions: We highlight the need for better understanding of the distribution, genetic diversity, and reproductive mode of the species within Australia. The variety of climatic niches occupied by invasive lineages of the wasp potentially present new threats to eucalypts in previously uninfested habitats.


Introduction
"Know from whence you came. If you know whence you came, there are absolutely no limitations to where you can go." -James Baldwin Eucalypts are cultivated worldwide for their fast growth and suitability for a range of uses. The high diversity of Eucalyptus species within Australia is associated with an immense diversity of insect herbivores, some of which can be very damaging when their abundance is unusually high [1][2][3][4][5][6][7]. Some of these insects have become invasive pests of eucalypts where they are grown as exotic species, and are of particular concern to hardwood plantations globally [8][9][10][11][12].
In the early 2000s, an invasive gall wasp previously unknown to science was discovered on eucalypts in the Middle East and Europe. The wasp, known only from females, was described into a novel monotypic genus as Leptocybe invasa [13], by which time it had already spread to Asia and Africa [14,15]. It is believed to have originated in Australia [13], a supposition supported by its close phylogenetic relationship with endemic Australasian species [16], and because the only eucalypt-gall-inducing Chalcidoidea are endemic to Australia [17], and based on scenario modelling of populations [18]. Although it was initially reported from south-eastern and northern Queensland during searches for potential biocontrol agents [19], molecular studies [18] were unable to match any Australian Leptocybe specimens to L. invasa.
The invasive populations were shown to comprise two cryptic species {named "Western" (the lineage to which L. invasa belongs) and "Asian" (the novel lineage) in Nugnes et al. [14] and "Lineage A" and "Lineage B" in Dittrich-Schröder et al. [18], respectively}. None of several extensive collections, ranging from North Queensland to northern New South Wales, including those from which successful biocontrol agents of L. invasa were sourced [19,20], were L. invasa [18,21]. A third lineage ("C") was reported from one site in Australia, but all other Australian populations reared from L. invasa-type galls [21] were identified as belonging to "Asian/Lineage B" [18]. The geographic origin, host range, endemic natural enemies and geographic range of L. invasa sensu stricto thus remain unknown. Additionally, the identification of two cryptic invasive Leptocybe spp. suggests that studies reporting on L. invasa conducted in regions where both occur in sympatry may have unknowingly used the second species or a mixture of both. The abundance of hosts, parthenogenetic reproduction and a lag in mortality from natural enemies might have contributed to the fast spread of L. invasa around the world [22], although it is now well-controlled by the parasitoid wasp Quadrastichus mendeli in several countries [15].
Of the many environmental factors influencing the distribution of living organisms, the altitude of occurrence of the host plant has been reported to restrict the occurrence of L. invasa galls in East Africa [23,24] and Turkey [25]. The influence of altitude is presumably mediated through effects on ambient temperature which can impact insects indirectly by altering biotic interactions or directly by modifying their metabolism, development time and ultimately their phenology [26,27]. Insects can survive low temperatures through freezing avoidance or thermal tolerance [28,29]; with supercooling ability [30,31], rapid cold hardening [32] and overwintering [33] all reported in L. invasa. Nevertheless, while low temperature is regarded as one of the most important factors influencing invasion and colonization by L. invasa [32] it is not known how temperature extremes impact its distribution.
Species-distribution modelling is often used to predict the potential invasive range of organisms where the native range is known [34], to predict pest distributions, biocontrol agent suitability, and assess exotic-endemic species interactions [35]. However, "reverse modelling" to determine the native range of species based on an established invasive range is less common. To infer the possible origins of overseas populations of L. invasa in Australia and the environmental variables which may delimit invasions, we used MaxEnt (Maximum Entropy) and CLIMEX modelling approaches, using environmental conditions in the invaded range, and developmental thresholds calculated in invasive range populations, respectively. We test the suggestion arising from genetic modelling that the two invasive lineages of the wasps originated from different locations in Australia [18] and compare results from the two distribution models. Identifying the geographic origin of L. invasa sensu stricto will aid in the search for biological control agents [15] and in understanding its invasion processes.

MaxEnt Modelling
Species occurrence data were obtained from a variety of sources. The distribution of L. invasa in invaded ecosystems was compiled from publications with georeferenced collection locations. Requests were sent to authors of non-georeferenced publications for the geographical locations of insect occurrence. Other collection records were obtained from the Centre for Agriculture and Bioscience International (CABI) Invasive Species Compendium (https://www.cabi.org/isc; last accessed 18 December 2017). We created databases separating the collection records into different populations based on the genetic groupings as described by [14,18]. The databases included Lineage A which comprises African, South American and Mediterranean-region populations, Lineage B (invasive) which included populations in Asia, and Lineage B (endemic) from Australian collections {Table S1 in Supplementary Material; for current distribution maps see [15,18]}. Since South Africa has a mixture of Lineages A and B but knowledge of their respective geographical distributions was unavailable, these records were omitted from the African database. Both lineages likewise co-occur in Thailand, Vietnam and Laos [18], but these countries represented less than 12% of our Asian records {with over half from China, where only Lineage B has been recorded [14]}, so all Asian records were treated as Lineage B. Environmental variables characterising climatic factors including precipitation, temperature, moisture index and radiation (Table S2), at 10-minute spatial resolution, were obtained from the CliMond dataset [36]. The 35 variables used represent climate in the period 1950-2000.

Niche Modelling
We carried out correlative niche modelling [37] using multivariate analysis and MaxEnt because our records were 'presence only' data [38]. MaxEnt estimates the probability of occurrence based on the suitability of environmental parameters [39] and has predictive power even with small datasets [40,41]. Indices of habitat suitability were predicted for each population using MaxEnt desktop version 3.3.3 k (http://biodiversityinformatics.amnh.org/open_source/maxent/) [38]. The environmental factors themselves that generated these models were compared for different populations. Since the climatic range of Eucalyptus does not extend to the polar region [42,43], our modelling excluded these regions. Background sampling was restricted by the rectangular extent of collection records [39,[44][45][46]. Because MaxEnt is capable of selecting the appropriate feature for the number of samples used for a model [39,46], we used default features and default regularization parameters. After determining that some databases were expansive and included regions across several continents, we split them into regional population groups which improved the area under the curve (AUC).
The distributions of the known lineages in different regions was considered in the resulting prediction maps and the projected suitability to Australian climatic conditions compared to that of the Australian collection localities. The models were evaluated by observing the AUC of the receiver operating characteristic plot of the predictions.

Testing for Niche Shifts
To infer whether niche shifts might have occurred, environmental data corresponding to each collection location were extracted in ArcMap from Bioclim variables and used in Multivariate analysis. Principal components analysis (PCA) was used to project the distribution of the collection records under reduced dimensions [47,48]. Canonical variates analysis was carried out in Genstat to visualise grouping patterns of occurrence records [49]. Populations were grouped according to lineage. Groupings were tested by bootstrapping and observing reallocation errors and Mahalanobis distances in stepwise discriminant analysis.

Effects of Environmental Factors
To evaluate the contribution of the climatic variables to the MaxEnt models, a jack-knife test was used. Wilk's lambda (λ) criterion in stepwise discriminant analysis was used to evaluate the importance of climatic variables in the distribution of the collection locations. The error of reallocation of data points to groupings was also considered.

CLIMEX Modelling
The Mediterranean CLIMEX default parameter file {Hearne Scientific CLIMEX v4 [50]} was used with the Match Locations function as the base model onto which the L. invasa developmental parameters reported in [33] were built ( Table 1). The Mediterranean default file was chosen because that was the region where L. invasa was first reported, widely spread and comprises only Lineage A [15]. For the modified model, temperature thresholds and day degree requirements were from [33], while cold stress values reflected the reported lower temperature threshold (0 • C) of Qui et al. (2011) reported in [33]. The CLIMEX model represented a different modelling approach from MaxEnt as the distribution records was not used; rather the default Mediterranean climate parameter file was built upon to incorporate biological development data to predict climatic suitability {eco-climatic index (EI) score} within Australia and hence predict the potential geographic origin of L. invasa Lineage A.

Niche Modelling and Projections in MaxEnt
The MaxEnt models had AUC values greater than 0.5 for all the populations: Lineage A 0.858; Lineage B invasive 0.909; Lineage B endemic 0.875. When collection records in invaded locations (invasive populations only; each lineage considered separately) were projected back to Australia, the models showed that the most suitable geographic regions for these populations differed, primarily with respect to southwest Western Australia and inland northern Australia (Figure 1).

Niche Modelling and Projections in MaxEnt
The MaxEnt models had AUC values greater than 0.5 for all the populations: Lineage A 0.858; Lineage B invasive 0.909; Lineage B endemic 0.875. When collection records in invaded locations (invasive populations only; each lineage considered separately) were projected back to Australia, the models showed that the most suitable geographic regions for these populations differed, primarily with respect to southwest Western Australia and inland northern Australia (Figure 1).

Testing for Niche Overlap
The suitability range for Lineage A when projected to Australia merges with part of the predicted endemic range of collections from Queensland to the southern region of Australia ( Figure  1A). The projection of the Asian population (Lineage B) to Australia shows suitability in northern and eastern Australia, corresponding to regions of both moderate and high suitability from the Australian collection records model ( Figure 1B), but also a reasonable match with the Australian collection records themselves in Figure 1A.
The PCA showed an aggregation of a population of Lineage A with both B lineages; another group comprised only Lineage A ( Figure 2). This indicated that Lineage A was not a homogenous group and we therefore separated them into their regional populations, i.e. African, American and Mediterranean populations.

Testing for Niche Overlap
The suitability range for Lineage A when projected to Australia merges with part of the predicted endemic range of collections from Queensland to the southern region of Australia ( Figure 1A). The projection of the Asian population (Lineage B) to Australia shows suitability in northern and eastern Australia, corresponding to regions of both moderate and high suitability from the Australian collection records model ( Figure 1B), but also a reasonable match with the Australian collection records themselves in Figure 1A.
The PCA showed an aggregation of a population of Lineage A with both B lineages; another group comprised only Lineage A (Figure 2). This indicated that Lineage A was not a homogenous group and we therefore separated them into their regional populations, i.e. African, American and Mediterranean populations.

Contribution of Environmental Variables to the Probability of Occurrence of L. invasa
In MaxEnt, the different lineages were separable by the climatic variables derived from the major factors of temperature, precipitation, radiation and moisture index (Table S3). The probability of occurrence generally increased with increasing annual mean air temperature for all the invasive

Contribution of Environmental Variables to the Probability of Occurrence of L. invasa
In MaxEnt, the different lineages were separable by the climatic variables derived from the major factors of temperature, precipitation, radiation and moisture index (Table S3). The probability of occurrence generally increased with increasing annual mean air temperature for all the invasive

Contribution of Environmental Variables to the Probability of Occurrence of L. invasa
In MaxEnt, the different lineages were separable by the climatic variables derived from the major factors of temperature, precipitation, radiation and moisture index (Table S3). The probability of occurrence generally increased with increasing annual mean air temperature for all the invasive populations ( Figure 4). Annual mean precipitation, radiation and moisture index showed the same effect but peaking around different ranges for the different populations. The Australian population showed a general decline with increase in annual mean temperature, precipitation and moisture index but an increase with increasing annual mean radiation (Figure 4). These effects were not evident, however, when other environmental factors were included in the models.  (Figure 4). Annual mean precipitation, radiation and moisture index showed the same effect but peaking around different ranges for the different populations. The Australian population showed a general decline with increase in annual mean temperature, precipitation and moisture index but an increase with increasing annual mean radiation (Figure 4). These effects were not evident, however, when other environmental factors were included in the models. The first component of the PCA was explained by environmental variability (36.0%), mainly in precipitation and moisture index, while the second (20.5%) was explained by radiation and temperature (Tables S4 and S5). In stepwise discriminant analysis, the variables with the lowest Wilk's lambda values were dominated by temperature-related variables though other factors also contributed (Tables S6 and S7).

CLIMEX Modelling
The Mediterranean default parameters used as an approximation of Lineage A showed a widespread climatic suitability ( Figure 5A). The addition of the developmental data of Qui et al.
(2011) {reported in [33]}, specifically a lower temperature threshold of 0 °C and degree-days of 1776, to the CLIMEX Mediterranean default in Australia did not change the geographic area of EI suitability beyond the Mediterranean projection, and was greater than the range predicted by MaxEnt modelling using distribution data that included the Mediterranean region. However, the CLIMEX model output using the much higher developmental temperature thresholds given in [33] (Figure 5B) followed more closely the reported endemic distribution of Lineage B [18], and the eastern MaxEnt model generated using invasive Lineage B distribution data ( Figure 1B). This supports the inference that the Chinese lineage used in [33] was Lineage B, i.e. the lineage reported from China by [14], and by [18] from the projected Australian distribution. There is a small possibility that the Qui et al. (2011) development rates may have been calculated using Lineage A, since the MaxEnt and CLIMEX models overlap somewhat in their predictions, but this remains untested as Lineage A has not so far been reported from China, although populations with differing reproductive modes have [14]. The first component of the PCA was explained by environmental variability (36.0%), mainly in precipitation and moisture index, while the second (20.5%) was explained by radiation and temperature (Tables S4 and S5). In stepwise discriminant analysis, the variables with the lowest Wilk's lambda values were dominated by temperature-related variables though other factors also contributed (Tables S6 and S7).

CLIMEX Modelling
The Mediterranean default parameters used as an approximation of Lineage A showed a widespread climatic suitability ( Figure 5A). The addition of the developmental data of Qui et al.
(2011) {reported in [33]}, specifically a lower temperature threshold of 0 • C and degree-days of 1776, to the CLIMEX Mediterranean default in Australia did not change the geographic area of EI suitability beyond the Mediterranean projection, and was greater than the range predicted by MaxEnt modelling using distribution data that included the Mediterranean region. However, the CLIMEX model output using the much higher developmental temperature thresholds given in [33] (Figure 5B) followed more closely the reported endemic distribution of Lineage B [18], and the eastern MaxEnt model generated using invasive Lineage B distribution data ( Figure 1B). This supports the inference that the Chinese lineage used in [33] was Lineage B, i.e. the lineage reported from China by [14], and by [18] from the projected Australian distribution. There is a small possibility that the Qui et al. (2011) development rates may have been calculated using Lineage A, since the MaxEnt and CLIMEX models overlap somewhat in their predictions, but this remains untested as Lineage A has not so far been reported from China, although populations with differing reproductive modes have [14].

Discussion
Our MaxEnt and CLIMEX models supported the widely held assumption that L. invasa originated in Australia, despite having never been recorded there, since both modelling methods showed climatic suitability for this lineage (Lineage A). The multivariate analysis showed climatic niche variability in the different populations, with the Mediterranean and Asian populations occupying niches different from all the other populations. It is therefore likely that variability occurs within the genus in Australia, and that the different lineages are from different origins, as suggested by genetic scenario modelling [18]. A small number of eucalypt species are endemic to New Guinea and Indonesia [51] and although little is known of their galling fauna, they have never been surveyed for Leptocybe and there remains a possibility that L. invasa may have originated there.
The aim of our study was to predict the geographic region of origin of L. invasa sensu stricto (Lineage A) using two modelling methods. The first, MaxEnt, used the current invasive distribution of each lineage to infer climatically-similar regions in Australia. The prediction for Lineage A overlapped with the known distribution of Lineage B in Queensland and northern New South Wales but showed suitability further south, and particularly in Western Australia. As for the CLIMEX model, MaxEnt predicted Lineage B across a more northerly distribution, with both modelled distributions overlapping the known range of Lineage B. CLIMEX, using a Mediterranean climate to represent the original and widespread exotic occurrence of Lineage A, predicted a large range that overlapped with the prediction generated by MaxEnt, and also a more southerly and western distribution than Lineage B has been found. Only Lineage B has been reported from China, where the development rate data used in the modified CLIMEX Mediterranean model was generated [33], and the model output fitted well with the known range of Lineage B in Australia. This model was a better fit with the known range of Lineage B than the MaxEnt model, which placed it further west than suitable hosts are likely to occur. Development rates, thermal thresholds and day-degree requirements for Lineage A (currently unavailable) might help to improve our predictive power for the endemic distribution of L. invasa using CLIMEX.
Potentially, phenotypic plasticity [52], along with asexual reproduction and concealed habit in galls, as well as anthropogenic distribution of suitable hosts adaptable to widely variable environmental conditions, probably aided range expansion by the wasps. Lineage B (Asian population) and the Mediterranean population of Lineage A occupy the most divergent climatic niches from specimens collected in Australia, suggesting a lack of niche conservatism since the Australian population is genetically closest to Lineage B invasive [18]. Males are more frequently observed in Lineage B [14] which could indicate sexual reproduction in these populations that could lead to future rapid genetic variability in the populations. Sexual reproduction has been suggested to

Discussion
Our MaxEnt and CLIMEX models supported the widely held assumption that L. invasa originated in Australia, despite having never been recorded there, since both modelling methods showed climatic suitability for this lineage (Lineage A). The multivariate analysis showed climatic niche variability in the different populations, with the Mediterranean and Asian populations occupying niches different from all the other populations. It is therefore likely that variability occurs within the genus in Australia, and that the different lineages are from different origins, as suggested by genetic scenario modelling [18]. A small number of eucalypt species are endemic to New Guinea and Indonesia [51] and although little is known of their galling fauna, they have never been surveyed for Leptocybe and there remains a possibility that L. invasa may have originated there.
The aim of our study was to predict the geographic region of origin of L. invasa sensu stricto (Lineage A) using two modelling methods. The first, MaxEnt, used the current invasive distribution of each lineage to infer climatically-similar regions in Australia. The prediction for Lineage A overlapped with the known distribution of Lineage B in Queensland and northern New South Wales but showed suitability further south, and particularly in Western Australia. As for the CLIMEX model, MaxEnt predicted Lineage B across a more northerly distribution, with both modelled distributions overlapping the known range of Lineage B. CLIMEX, using a Mediterranean climate to represent the original and widespread exotic occurrence of Lineage A, predicted a large range that overlapped with the prediction generated by MaxEnt, and also a more southerly and western distribution than Lineage B has been found. Only Lineage B has been reported from China, where the development rate data used in the modified CLIMEX Mediterranean model was generated [33], and the model output fitted well with the known range of Lineage B in Australia. This model was a better fit with the known range of Lineage B than the MaxEnt model, which placed it further west than suitable hosts are likely to occur. Development rates, thermal thresholds and day-degree requirements for Lineage A (currently unavailable) might help to improve our predictive power for the endemic distribution of L. invasa using CLIMEX.
Potentially, phenotypic plasticity [52], along with asexual reproduction and concealed habit in galls, as well as anthropogenic distribution of suitable hosts adaptable to widely variable environmental conditions, probably aided range expansion by the wasps. Lineage B (Asian population) and the Mediterranean population of Lineage A occupy the most divergent climatic niches from specimens collected in Australia, suggesting a lack of niche conservatism since the Australian population is genetically closest to Lineage B invasive [18]. Males are more frequently observed in Lineage B [14] which could indicate sexual reproduction in these populations that could lead to future rapid genetic variability in the populations. Sexual reproduction has been suggested to be associated with improved fitness in populations utilising heterogeneous environments [53] and could be associated with climatic range expansion in the sexual populations of L. invasa. Empirical evidence supporting this hypothesis exists for rotifers [54] but a study using thrips did not support it [55]. Nevertheless, given the serious impact of this pest on planted eucalypts worldwide [15] the economic implications of its continued range expansion are clear.
Although temperature is thought to be the most restricting environmental factor affecting insect development [56,57], supercooling ability is exhibited in L. invasa populations in temperate climates [30]. However, none of our models singled out temperature or its derived variables to be the major factor separating the climatic niches but our study does not explain the observed negative effect of altitude, i.e., low temperatures on galling intensity [23,24].
Our models indicated that Lineage A could have a wider area of endemism than the existing collection locations in Australia, mostly to the south, and suggest that future collections of galls should target southern parts of Australia to locate populations of natural enemies associated with Lineage A, and with more climatic similarity to its possible endemic origin. All the parasitoid species introduced as biological control agents [15] were collected from within the endemic range of Lineage B. The Australian-origin eucalypt weevil Gonipterus scutellatus sensu lato (Coleoptera: Curculionidae), comprises an invasive cryptic species complex exhibiting variable biocontrol as a potential climatic and/or taxonomic mismatch between host and biocontrol agent [58]. Climatic variability and host-parasitoid mismatch between different lineages could similarly impact the performance of parasitoids used for the biological control of L. invasa in different parts of the world [19,[59][60][61]. These findings also imply a risk of reintroduction of the invasive populations to Australia or local introductions to other parts of the world where they are currently not found, thereby impacting the performance of susceptible host plants.

Conclusions
Our findings {expanded from [62]} represent the first attempts to model the distributions of the two cryptic and globally invasive species of Leptocybe invasa sensu lato in its purported country of origin, namely Australia. Molecular and biological evidence for the existence of two species likely reflects adaptation of insects introduced overseas to climatically disparate regions in Australia. Surprisingly, neither temperature nor its derived variables were found to contribute strongly to the differentiation of the species' climatic niches in the endemic range. Understanding the threats posed to extra-limital plantings of eucalypts as well as those in Australian ecosystems remains challenging in the absence of verified collection records from Australia and better information about the species' genetic diversity and modes of reproduction.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/10/2/115/s1, Table S1: Collection records used for modelling climatic suitability for Leptocybe species, Table S2: Key to Bioclim variables {from [36]}, Table S3: Contributions of Bioclim variables in MaxEnt logistic prediction of Leptocybe invasa suitability models of different populations, Table S4: Principal components loadings for all the collection records divided into Lineages A, B invasive and B endemic (latter denoted as C in Figure 2), Table S5: Principal components loadings for the populations of Lineage A, Table S6: Wilk's lambda criterion values of the environmental variables used to predict the variability in climatic conditions of the different lineages of Leptocybe invasa, Table S7