Towards understanding insect species introduction and establishment: A community‐level barcoding approach using island beetles

Since Darwin put forward his opposing hypotheses to explain the successful establishment of species in areas outside their native ranges, the preadaptation and competition‐relatedness hypotheses, known as Darwin's naturalization conundrum, numerous studies have sought to understand the relative importance of each. Here, we take advantage of well‐characterized beetle communities across laurel forests of the Canary Islands for a first evaluation of the relative support for Darwin's two hypotheses within arthropods. We generated a mitogenome backbone tree comprising nearly half of the beetle genera recorded within the Canary Islands for the phylogenetic placement of native and introduced species sampled in laurel forests, using cytochrome c oxidase I (COI) sequences. For comparative purposes, we also assembled and phylogenetically placed a data set of COI sequences for introduced beetle species that were not sampled within laurel forests. Our results suggest a stronger effect of species preadaptation over resource competition, while also revealing an underappreciated shortfall in arthropod biodiversity data—knowledge of species as being native or introduced. We name this the Humboldtean shortfall and suggest that similar studies using arthropods should incorporate DNA barcode sequencing to mitigate this problem.

The first is the preadaptation hypothesis (Figure 1), where alien species that are evolutionarily related to native species are expected to have higher establishment success than less related species (Li et al., 2015).
The proposed mechanistic explanation is that alien species that are closely related to native species are likely to share similar traits with those native species, such as diet, physiological tolerances and ecological niche requirements, facilitating their establishment (Burns & Strauss, 2012; Li et al., 2015; Schaefer et al., 2011). The second is the competition-relatedness hypothesis (Figure 1), where alien species that are unrelated to native species are suggested to have higher establishment success through limited resource competition with native species (Darwin, 1859). The proposed mechanistic explanation is that alien species that are more distantly related to native species are less likely to share similar niche requirements. Such niche differentiation should act both to limit competition for resources and to facilitate the occupation of new or unfilled niches (Fristoe et al., 2021; Schaefer et al., 2011). In the current climate of global biodiversity loss, with concomitant negative knock-on effects to ecosystem services (Kumschick et al., 2015; Mack et al., 2000), and increasing economic costs from alien species (Diagne et al., 2021), understanding the ecological and evolutionary dynamics of biological invasion is an important area of investigation (Davidson et al., 2011; Fristoe et al., 2021; Li et al., 2015; Park et al., 2020; Procheş et al., 2008), for which establishment dynamics are fundamental.
In addition to the original hypotheses that form the so-called Darwin's naturalization conundrum, at least 33 additional hypotheses have been proposed as potential mechanistic explanations for alien species establishment (Jeschke & Heger, 2018), of which four make predictions based upon similarity among species. All four are consistent with Darwin's competition-relatedness hypothesis, in that they predict that introduced species that are less related (similar) to native species will have higher establishment success. The first of these is the enemy-release hypothesis (Keane & Crawley, 2002), which posits that the absence of natural predators or parasitoids may enhance the successful establishment of alien species in new territories. Thus, alien species that are distantly related to native species will have an establishment advantage over more related species by not being subjected to the pressure of native predators or parasitoids (Liu & Stiling, 2006; Torchin et al., 2001). The second is the life history hypothesis (Cassey, 2002; Jeschke & Strayer, 2006; Van Kleunen et al., 2010), which argues that evolutionarily distant alien species may have numerous different functional traits in comparison with native species, and that these may allow them to successfully establish and coexist with native species. The third is the limiting similarity hypothesis, which proposes that the probability of successful establishment of a given introduced species increases as a function of how different it is from native species (MacArthur & Levins, 1967). While this hypothesis does not explicitly reference evolutionary relatedness, it can be considered an extension of Darwin's competition-relatedness hypothesis under a general model where species diverge in multidimensional trait space through time. Finally, the novel weapon hypothesis predicts that introduced species with novel traits, relative to traits within the native community, will have a higher likelihood of establishment through competitive advantage (Callaway & Ridenour, 2004). This hypothesis is consistent with Darwin's competition-relatedness hypothesis, under the assumption that the more unrelated species are, the more likely they are to present different weapons.
FIGURE 1 Darwin's naturalization conundrum. Upper panel: hypothetical phylogeny of beetle species indicating relatedness among native (black) and non-native (grey) species. Bottom left panel: the preadaptation hypothesis predicts that introduced species that are evolutionarily related to native species are expected to have higher establishment success than less related species, facilitated by trait similarity. Bottom right panel: the competition-relatedness hypothesis predicts that introduced species that are unrelated to native species will have higher establishment success, through limited resource competition with native species.
Our understanding of how relatedness between introduced and native species may influence establishment success has been informed by phylogenetic analysis of community assembly. This has been studied far more in plants than in other organisms such as invertebrates (e.g. insects or arachnids), given the comparatively greater availability of historical data for plants (Daehler, 2001; Diez et al., 2008; Fristoe et al., 2021; Li et al., 2015; Schaefer et al., 2011). Well-studied floristic systems have also provided comprehensive time series, such as the flora of the Azorean islands, where introduced species records extend back to 1494 (Schaefer et al., 2011). In contrast, invertebrates in general, and arthropods in particular, have received relatively little attention, and this is likely due to several complicating factors. Introduced species are likely to be difficult to detect, even in the early stages of invasion, as exemplified by the crazy ant Anoplolepis gracilipes on Christmas Island, detected only when it had already spread through rainforest habitat, causing serious problems to biodiversity (O'Dowd et al., 2003). The typically small size and often cryptic nature of invertebrate species may also lead to detection difficulties (Simberloff, 2013).
The lack of community phylogenetic investigations focussing on introduced invertebrate species is an obstacle to understanding what, if any, general principles may underpin the successful establishment of introduced invertebrate species. This is particularly relevant given the substantial threats to biodiversity, associated ecosystem services, and food security, posed by invertebrate species. At a global scale, it has been estimated that approximately 8.7 billion US dollars are spent yearly to counteract the negative effects of introduced invertebrate species or to prevent their establishment (Diagne et al., 2021). While elimination is desirable, mitigation efforts are often more realistic, as exemplified by Lymantria dispar and Reticulitermes flavipes. The first is a European moth, capable of devastating forest damage within its introduced North American range (Sharov & Liebhold, 1998). [...] (Grapputo et al., 2005), the Japanese beetle, Popillia japonica, introduced to both North America and Europe (Mori et al., 2021), and the globally established whitefly, Bemisia tabaci (Oliveira et al., 2001).
Despite the challenges posed by arthropod systems, progress can be made if regional faunas have been well-characterized, and local communities have been subject to detailed inventory.
Here, we take advantage of well-characterized communities of beetle species within the laurel cloud forests of the Canary Islands (Salces-Castellano et al., 2021) to evaluate the influence of phylogenetic relatedness on introduced species establishment.
Using species inventory data from 31 plots sampled across the four western Canary Islands, together with data from a regional biodiversity databank, and publicly available DNA sequences, we categorize species as native (NAT) or introduced as a result of human activity (INT), and further categorize NAT species as endemic (END) or not endemic (NATnotend). Distinguishing between NATnotend and INT species origins is often challenging (Patiño & Vanderpoorten, 2015), and we address this by recategorizing presumed NATnotend species as INT if they present high sequence matching (≥99% similarity) to individuals sampled outside the Canary Islands. We use these data to determine the presence of introduced beetle species within laurel cloud forest habitat of the Canary Islands, and to test between the relative roles of the preadaptation and competition hypotheses.
Endemic species, by virtue of their evolved nature within the archipelago (but see Emerson & Kolm, 2005), can typically be assumed to have been present within the archipelago prior to the arrival of NATnotend species. In turn, species derived from human introductions represent a subsequent and more recent phase of community assembly. This temporal structure provides an opportunity to contrast phylogenetic structure among different establishment cohorts to infer community assembly processes through time. As community assembly progresses and species richness and complexity increase, the dimensionality of unoccupied niche space should diminish, providing less opportunity for species to establish as a consequence of increased competition for limited resources. We therefore hypothesize that more recently established species within the laurel cloud forests (INT species) will typically present stronger signatures for the preadaptation hypothesis, compared with species of less recent origin (NATnotend species). To address the aforementioned hypotheses, we first construct a mitogenome backbone tree for sampled genera, and use phylogenetic placement of COI sequences for individual species within genera to then estimate phylogenetic relatedness among species.

| Study taxa and sampling framework
The beetle fauna of the Canary Islands comprises 2245 described

| Field sampling, mtDNA sequencing and PBS delimitation
In addition to the 25 laurel forest plots described in Salces-Castellano et al. (2021), six plots were sampled in 2019 and 2020 (Figure 2, Table S1), following the protocol described by Emerson et al. (2017).
Individuals from the newly sampled plots were sorted to parataxonomic units (PU), from which four individuals within each plot, if possible, were sequenced for an 824 bp region of the mtDNA COI gene (hereafter referred to as COI-3P, reflecting its location towards the 3′ end of the gene). For some particularly difficult taxonomic groups (e.g. the genera Atheta (Staphylinidae) and Laparocerus (Curculionidae), and the tribe Cryptorhynchini), more than four individuals were sequenced. Sequences were then combined with those from Salces-Castellano et al. (2021). To identify presumed biological species (PBS; Emerson et al., 2017), a custom R script available from GitHub (Salces-Castellano et al., 2021; https://github.com/asalcescastellano/Divergence-threshold.git) was used to produce an unweighted pair group method with arithmetic mean (UPGMA) tree from pairwise K2P distances, using an alignment of all sequences from all PUs. This script first implements a conservative maximum intraspecific divergence threshold of 8% to create individual lineage alignments. Under this divergence threshold, it is assumed that all individuals of a given biological species will be represented within a single lineage, but that a given lineage may comprise more than one biological species. The script then generates a table summarizing: (i) the PU composition within each lineage; (ii) for each PU within a lineage, the proportion of individuals from that PU included within the lineage; and (iii) when a lineage comprises individuals from more than one PU, the presence or absence of monophyly among individuals from the same PU. Lineages exclusively comprising all individuals within a given PU were inferred to represent a single PBS. Individuals from PUs that segregated across more than one lineage were re-examined to evaluate potential PU assignment error or possible laboratory contamination, with the removal of any sequences ascribed to the latter.
When two or more PUs segregated within the same lineage, individuals were re-examined morphologically, and in the absence of any differences, the lineage was inferred to represent a single PBS. For lineages comprising two or more morphologically distinct PUs, and where PUs were phylogenetically structured within the lineage, PUs were inferred to represent different PBS. Morphologically distinct PUs that showed no clear segregation with regard to mtDNA variation were inferred to represent recent speciation with incomplete lineage sorting and/or gene flow, and were removed from subsequent analyses.
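The lineage-forming step described above can be illustrated with a minimal Python sketch. It is not the authors' R script: it assumes that sequences are already aligned, that the 8% threshold is applied to the mean pairwise K2P distance between clusters during UPGMA agglomeration, and the function names `k2p_distance` and `upgma_lineages` are hypothetical.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        return math.inf  # saturated: the K2P correction is undefined
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def upgma_lineages(seqs, threshold=0.08):
    """Average-linkage (UPGMA) grouping of sequences into lineages.

    seqs maps a sequence name to an aligned sequence. Clusters are merged
    while the mean pairwise distance between them is <= threshold
    (assumption: this is how the 8% cut-off is applied).
    """
    names = list(seqs)
    dist = {frozenset((a, b)): k2p_distance(seqs[a], seqs[b])
            for a, b in combinations(names, 2)}
    clusters = [[name] for name in names]

    def mean_dist(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), mean_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if d > threshold:
            break  # remaining clusters are separate lineages
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]
```

Each returned lineage would then be cross-tabulated against PU membership, as in steps (i) to (iii) above, before any PBS is inferred.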
FIGURE 2 Laurel forest sampling plots across the four western Canary Islands. See electronic Table S1 for specific details for each sampling plot.
For each PBS, from one to four individuals representative of mtDNA sequence variation within that PBS were sequenced for the 658 bp barcode region of the COI gene (hereafter referred to as COI-5P, reflecting its location towards the 5′ end of the gene; see Appendix S1 for more details about PCR amplification conditions). COI-5P and COI-3P are nonoverlapping fragments of the COI gene. For both COI regions, PCR products were sequenced using the Sanger DNA sequencing service of Macrogen (www.macrogen.com).

| Taxonomic assignment of PBS
Specimens from PBS were examined and taxonomically assigned by HL and EJG using dichotomous keys and additional information (see Appendix S2 for detailed information). Taxonomic assignments were further assessed and refined by comparison of barcode sequences against a near-complete reference library for the 626 species of Curculionoidea (weevils) of the Canary Islands (Machado, 2022; Stüben, 2021), and public databases. Barcode sequences were used for (i) a BLAST search (blastn -outfmt 5 -evalue 0.001 -max_target_seqs 100) against the NCBI nucleotide database and (ii) species identification using the BOLD identification system within the BOLD Systems database.

| Assessing existing native and introduced species categorizations
Inferred native but not endemic (NATnotend), endemic (END) or introduced (INT) status of species is provided within the Canary Islands Biodiversity databank (https://www.biodiversidadcanarias.es/biota). However, it is recognized that distinguishing between native and introduced status is often difficult in natural communities (Andersen et al., 2019). Thus, to further refine inferences between NATnotend and INT species status, we used high barcode sequence matching (≥99%) to individuals sampled outside the Macaronesian region as evidence that inferred NATnotend species are more likely of introduced origin (Cicconardi et al., 2017).
For those species that could only be identified to the genus level, assignment to either NATnotend, END or INT, was based upon species records for the genus within the Canary Islands, together with patterns of genetic variation among sequenced individuals. If all described species within a genus are END, then the species is assumed to be one of these endemisms, with a similar inference framework if all species within a genus are NATnotend, or INT. For genera with both END and NATnotend species, we took two approaches for downstream analyses. As a first approach, we assigned a status when the probability of a sampled species having that status was ≥80%, and excluded those species for which this could not be achieved. As a second approach, we excluded all species where status could not be definitively assigned. For all other cases, we assessed patterns of genetic variation and its spatial structuring, assuming species with moderate to high, and/or geographically structured genetic variation, to be NATnotend or END, and species with no, or limited but geographically unstructured genetic variation, to be INT, with equivocal species being excluded.
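The two decision rules in this section can be summarized in a short Python sketch. It is illustrative only: the record fields and function names are hypothetical, and the ≥80% rule is implemented here as the proportion of described species in the genus that share a status, which is an assumption about how that probability was obtained.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SpeciesRecord:
    name: str
    databank_status: Optional[str]           # "END", "NATnotend", "INT" or None
    best_external_identity: Optional[float]  # % identity to the best match
                                             # sampled outside the region


def refine_status(rec: SpeciesRecord, identity_threshold: float = 99.0) -> Optional[str]:
    """Recategorize presumed NATnotend (or unassigned) species as INT when
    they closely match individuals sampled outside the region."""
    matches_outside = (rec.best_external_identity is not None
                       and rec.best_external_identity >= identity_threshold)
    if matches_outside and rec.databank_status in ("NATnotend", None):
        return "INT"
    return rec.databank_status


def genus_level_status(status_counts: Dict[str, int],
                       min_probability: float = 0.8) -> Optional[str]:
    """Assign a status to a genus-level identification when one status
    accounts for at least min_probability of the described species recorded
    for that genus (assumed definition of the 80% criterion)."""
    total = sum(status_counts.values())
    if total == 0:
        return None
    status, count = max(status_counts.items(), key=lambda item: item[1])
    return status if count / total >= min_probability else None
```

Species for which `genus_level_status` returns `None` correspond to those excluded, or assessed further using patterns of genetic variation, as described above.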

| MtDNA COI reference sequences for unsampled introduced species
To obtain reference sequences for introduced species not sampled by us, taxonomic searches were performed in BOLD Systems and GenBank for all introduced species recorded within the four western Canary Islands. For those species for which a reference sequence was not available, and for which the species was the only representative of the genus among the islands, sequences from a congeneric species, when available, were used as a proxy for the introduced species. Where possible, both COI regions were extracted from complete mitogenomes. In the absence of a complete mitogenome, we downloaded both regions and concatenated them in Geneious Prime 2019.1.1 (Biomatters, Auckland, New Zealand). We prioritized obtaining both regions from the same individual, opting for different individuals of the same species when this was not possible. In the absence of both regions, a single region was used.

| Mitogenome sequencing, assembly and alignment
With the objective of generating a robust mitogenome phylogenetic backbone tree for all sampled genera, a GenBank search was carried out to identify all available mitogenomes corresponding to genera cited for the Canary Islands. These mitogenomes were complemented by de novo generation of mitogenomes for all genera sampled within the cloud forest plots that were not represented in GenBank.
For de novo generation of mitogenomes, one individual from each genus was used for DNA extraction, using complete specimens with the head/pronotum and abdomen separated. Extractions [...] (Boisvert et al., 2010), SPAdes (Bankevich et al., 2012) and Celera (Myers et al., 2000). Parameters used for each method are provided in Table S3. The resulting contigs from each method were then evaluated in Geneious Prime to eliminate those shorter than 2000 bp, for the purpose of improved processing time. Using the function De Novo Assemble in Geneious Prime, contigs were then assembled into supercontigs. Using the mitogenome of Tenebrio obscurus (GenBank accession number MG739327) as a reference, mitogenomic supercontigs were identified. Only regions consistently retrieved with at least two assemblers were retained, and a 50% majority rule consensus sequence was generated. Reference COI-3P and COI-5P sequences (described above) were then used to taxonomically assign each mitogenome. Mitogenomes were annotated using the MITOS webserver (Bernt et al., 2013).
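The consensus step can be sketched in Python as follows. This is a simplified illustration, not the Geneious Prime procedure: it assumes the supercontigs from the different assemblers have already been aligned to equal length, with "-" marking sites an assembler did not recover, and the function name `consensus_from_assemblers` is hypothetical.

```python
from collections import Counter


def consensus_from_assemblers(aligned, min_support=2, majority=0.5):
    """Build a consensus from aligned per-assembler sequences.

    Sites recovered by fewer than min_support assemblers are dropped.
    At retained sites, a base is called only if it exceeds the majority
    fraction among the assemblers that recovered the site; otherwise
    "N" is written.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        bases = [base for base in column if base != "-"]
        if len(bases) < min_support:
            continue  # region not consistently retrieved: discard the site
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base if count / len(bases) > majority else "N")
    return "".join(consensus)
```

Whether the majority is taken over all assemblers or only over those that recovered the site is a design choice in this sketch; the second option is used here.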

| Mitogenome backbone tree
RAxML and Phylobayes were used to reconstruct phylogenetic relationships among genera using both the nucleotide and amino acid alignments for the 13 mitochondrial protein-coding genes (PCGs). Partitioning by gene and by codon was [...] and -cat -gtr as an infinite mixture model) were run for 30 days, assessing chain convergence every 3 days, together with the number of iterations within each chain, using the software Tracer. Resulting trees were compared with recently published higher level phylogenies for Coleoptera (McKenna et al., 2015; Timmermans et al., 2016; Zhang et al., 2018) to evaluate consistency with regard to previously described relationships.

| Phylogenetic tree placement
Phylogenetic placement of COI sequences on the mitogenome backbone tree was performed using the tool 'RAxML-HPC v.8 on XSEDE (8.2.12) -Phylogenetic tree inference using maximum likelihood/rapid bootstrapping run on XSEDE' on the CIPRES Science Gateway V. 3.3. The input file was generated with a global alignment of all PBS, together with sequences for introduced species obtained from BOLD and GenBank, using MAFFT in Geneious Prime. Phylogenetic tree placement was run with the default options, with the above mentioned mitogenome backbone tree defined as the Binary Backbone (-r), a GTRCAT model, and 1000 alternative runs. Results were visually assessed in FigTree to evaluate the taxonomic consistency of phylogenetic placement within the mitogenome tree.

| Relatedness analyses
In a first step, the R packages 'ape' (Paradis et al., 2004) and 'picante' (Kembel et al., 2010) were used to prune (function prune.sample) the phylogenetic placement tree to yield the following [...] Analyses were also carried out within the set of introduced species sampled within laurel forest, to further test between the preadaptation and competition hypotheses. Species were ranked by the number of plots within which they were sampled and ranks were progressively grouped, in both an ascending and descending order, conducting mean nearest taxon distance (MNTD) analyses at each level. Under the assumption that higher plot occupancy for introduced species reflects higher establishment success, we expect relatedness trends observed across the full phylogeny to be more pronounced for more established introduced species. Consistency with relatedness trends observed across the full phylogeny was also evaluated using the R package 'stats', to conduct a GLM comparing mean relatedness to the nearest NAT species between the set of introduced species sampled in a single plot, and those introduced species sampled in more than one plot.
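To make the logic of a standardized MNTD comparison concrete, the following Python sketch computes the mean distance from each focal (e.g. introduced) species to its nearest reference (e.g. native) species, and compares it with a null distribution. It is a simplified analogue, not picante's `ses.mntd`: the null model used here (random draws of the same number of taxa from a candidate pool) and all function and variable names are assumptions made for illustration.

```python
import random
import statistics


def mntd_to_reference(focal, reference, dist):
    """Mean, over focal taxa, of the distance to the nearest reference taxon.

    dist maps frozenset({taxon_a, taxon_b}) to a phylogenetic distance.
    """
    return statistics.mean(
        min(dist[frozenset((f, r))] for r in reference) for f in focal)


def ses_mntd(focal, reference, pool, dist, runs=999, seed=0):
    """Standardized effect size of MNTD against random draws from pool.

    pool must be a list of candidate taxa that excludes the reference taxa.
    Negative z suggests clustering (focal taxa closer to reference taxa than
    expected); positive z suggests overdispersion. The p-value is the rank of
    the observed value in the null distribution divided by (runs + 1).
    Raises ZeroDivisionError if all null draws give the same value.
    """
    rng = random.Random(seed)
    observed = mntd_to_reference(focal, reference, dist)
    null = [mntd_to_reference(rng.sample(pool, len(focal)), reference, dist)
            for _ in range(runs)]
    z = (observed - statistics.mean(null)) / statistics.stdev(null)
    rank = 1 + sum(value < observed for value in null)
    return {"mntd_obs": observed, "z": z, "p": rank / (runs + 1)}
```

Under this rank-based convention, p-values close to 0 accompany clustering and p-values close to 1 accompany overdispersion.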

| Sampling and mtDNA COI sequencing
A total of 4225 individuals were sampled across the six new plots and classified to PU, from which 1817 were selected for COI-3P sequencing, with a sequencing success of 88%, yielding a total of 1591 COI-3P sequences. These sequences were combined with those from 25 previously sampled plots (Salces-Castellano et al., 2021) yielding a total of 4708 sequences, which segregated into 360 PBS (Table S4).

FIGURE 3 Community-level relatedness among introduced and native species under different phylogenetic expectations for establishment success. (a) Hypothetical phylogeny of beetle species indicating relatedness among native (black) and non-native (grey) species. (bi) Six species introductions into a native community of three species, for which phylogenetic relationships are represented in (a). (bii) Non-native species that are phylogenetically more related to native species have higher establishment success, resulting in a dominant signature of phylogenetic clustering at the community level. (biii) Non-native species that are phylogenetically less related to native species have higher establishment success, resulting in a dominant signature of phylogenetic overdispersion at the community level. (biv) Establishment success of non-native species is not phylogenetically related, resulting in the absence of phylogenetic structure at the community level.
One hundred and sixty-two COI-5P sequences, representing 160 PBS, were obtained from Arjona et al. (2022), and these were complemented with the sequencing of a further 426 individuals from across the remaining 200 PBS, yielding 402 new COI-5P sequences (94% sequencing success) for a total of 564 COI-5P sequences.
Across the 360 PBS, 351 (97.5% of the total) were represented by both COI-5P and COI-3P sequences, while the remaining nine PBS were only represented by COI-3P (Table S4).

| Taxonomic assignment, species status and new species records
A total of 268 PBS could be confidently taxonomically assigned to species level, with 92 assigned only to genus level. Four previously unrecorded species are herein reported for the Canary Islands. The species Corticaria fagi, Corticarina cavicollis, Ischnopterapion modestum and Psylliodes instabilis were all assigned with ≥99% sequence similarity to individuals sampled outside the Canary Islands. Two previously unrecorded genera for the archipelago, Cephennium and Pterostichus, were both sampled for a single species, but without species-level taxonomic assignment. Due to their lack of previous records, these six species are likely to represent relatively recent introductions. Among the remaining 354 species, 224 were classified as END within the regional databank, with a further 56 classified as NATnotend and 19 as INT. Among the 56 species classified as NATnotend in the regional databank, we identified 19 as being INT, based on high sequence similarity to populations outside the Canary Islands. A further 55 species with only genus-level taxonomic assignment could not be unequivocally assigned biogeographic status using the regional databank; however, three were assigned as INT based on their sequence similarity to populations outside the Canary Islands. Finally, 24 species without clear biogeographic status could be classified as END, and two more as NATnotend, using the 80% criterion (see Methods). Thus, the 360 species were classified as 248 END species, 39 NATnotend and 47 INT species, with only 26 remaining unassigned between END or NATnotend (Table S4).
[...] (Table S5). Among the remaining 33 species, available sequences from congeneric species were used as proxies for 14 introduced species that are the single representatives of their genera in the Canary Islands (Table S5). A total of 93 species were represented by both COI regions, while 42 species only had COI-5P (barcode region) and 10 species only COI-3P. Thus, COI sequence data were obtained for 145 of the 164 introduced species not sampled within the laurel forest, yielding a total of 505 species together with the 360 PBS sampled by us, of which 444 species were represented by both COI regions.

| Mitogenome sequencing
A total of 212 genera (29%) among the 736 recorded for the Canary Islands had an available mitogenome in GenBank (Table S6), including 59 of the 169 genera sampled in laurel forests. Individuals from the remaining 110 genera were selected for de novo mitogenome sequencing. This yielded a total of 82 mitogenomes, with the majority (75%) including all 13 PCGs (Table S7). An additional 31 tribal- or subfamily-level mitogenomes were publicly available for genera that uniquely represent their corresponding tribe or subfamily within the archipelago (Table S6). This yielded a total of 341 mitogenomes, representing 141 of 169 (83%) genera sampled within laurel forests, and 56 of 118 (48%) genera corresponding to introduced species that were not sampled within laurel forest.

| Mitogenome backbone tree generation and phylogenetic tree placement
Of the two mitogenome backbone trees that were generated with RAxML (amino acid and 13 PCG), only the 13 PCG tree resolved relationships at the superfamily level and, to a lesser extent, the family level, consistent with previous beetle phylogenetic studies (Crampton-Platt et al., 2015; Timmermans et al., 2016). For the Phylobayes analyses, only the two 13 PCG chains converged. No convergence was obtained for the amino acid chains after 1 month of analysis; therefore, these analyses were not taken any further.
Thus, the two trees derived from the 13 PCG matrix were considered as candidates for subsequent analyses. Among the two trees, the 13 PCG mitogenome backbone tree generated with Phylobayes was the more consistent with previously published mitogenome phylogenies for Coleoptera (Crampton-Platt et al., 2015; Timmermans et al., 2016; Zhang et al., 2018), providing similar superfamily and family-level resolution. Thus, the Bayesian 13 PCG backbone tree was used for the placement of the 505 species using their associated COI sequence data. The resulting tree was visually assessed in FigTree to ensure that genera were monophyletic and that genera fell within their corresponding families.

| Relatedness analyses
Assessing the relatedness of INT species sampled within laurel forests against NAT laurel forest species at the archipelago scale revealed phylogenetic clustering, but with only marginal significance (mntd.obs.z = −1.5278, p-value = .06; Table S8). In contrast, comparing the relatedness of INT species that were not sampled in laurel forests against NAT laurel forest species revealed significant overdispersion (mntd.obs.z = +2.7626, p-value = .997; Table S9). Assessing relatedness of NATnotend species against END species yielded a marginally significant trend towards overdispersion (mntd.obs.z = +1.5641, p-value = .939; Table S10). These p-values follow the quantile convention of picante's output, where values close to 1 indicate overdispersion and values close to 0 indicate clustering. However, no significant result was obtained when the species with uncertain biogeographic status (indicated in Table S4) were excluded from the NATnotend against END analysis (Table S11). [...] (Table S9). Assessing the relatedness of NATnotend species against END species also yielded no significant relationships, whether using species under the 80% criterion or omitting all species with uncertain biogeographical origin (Tables S10 and S11, respectively).

Analyses of introduced species ranked by plot occurrence, with ranks progressively grouped in both ascending and descending order, yielded trends towards phylogenetic clustering for all cumulative rank categories, with the exception of species sampled in only a single plot, which presented nonsignificant overdispersion. However, only three cumulative rank categories were marginally significant, with the remainder being nonsignificant (Table S12). [...] (Table S13).

| DISCUSSION
Our results show that introduced species within the laurel forest of the Canary Islands are more closely related to native species than expected by chance, consistent with a dominant influence of species preadaptation over limited resource competition. The establishment of phylogenetically distant species, such as Biphyllus lunatus [...] Darwin's naturalization conundrum, and those of Xu et al. (2022) who analysed communities of freshwater lake fish. All three studies converge on the observation that successfully established introduced species tend to be more phylogenetically related to native species, providing support for the preadaptation hypothesis. While support for the preadaptation hypothesis varies across the different geographic partitions of our data, additional support for a dominant role of preadaptation comes from our analyses specifically contrasting relatedness patterns of introduced species sampled in the laurel forest and the native species that they are most closely related to.
Comparing mean phylogenetic distances to the nearest native species for introduced species sampled in only one plot (a proxy for less established species) against introduced species sampled in two or more plots (a proxy for more established species) revealed significantly smaller phylogenetic distances for species present in two or more plots (Figure 4).

| Competitive displacement of related native species, or trait mediated coexistence?
For plants at least, it is generally recognized that exotic species are often competitively superior to native species (Jakobs et al., 2004; Levine et al., 2003; Vilà et al., 2011), favouring the displacement of native species by exotics. While it remains unclear to what extent the same may apply to insects, there are reasons to expect a similar relationship within insular environments, where some studies have detected a progressive increase in the arrival and establishment of introduced species (Borges et al., 2020; Pyšek et al., 2020), as well as a tendency towards the decline of endemic species populations in insular systems such as the Azorean islands (Borges et al., 2020).
FIGURE 4 Boxplots comparing nearest taxon distances to a native species for introduced species in laurel forest that were present in either one plot (N = 26) or in two or more plots (N = 21).
The typically reduced species richness on islands has been linked to reduced intensity of interspecific interactions, compared with continental settings (Patton et al., 2021). In turn, this has led to suggestions that island species are likely to be less adapted to competitive and predatory interactions (Elton, 1958; Kalmar & Currie, 2006; Wilson, 1961; Wilson & MacArthur, 1967), thus favouring their displacement when related species colonize from continental settings. Additionally, colonizing species of beetle are likely to be better dispersers than related endemic species, due to the evolution of secondary flightlessness in the latter (Waters et al., 2020). This is exemplified by genera such as Calathus (Carabidae), where phylogenetic analyses have revealed that at least three flighted lineages have colonized the archipelago, each subsequently becoming flightless (Emerson et al., 2000). Recent work indicates that a shift from higher to lower dispersal ability is likely to increase the longer term extinction probability of a lineage, thus favouring the displacement of native species when related introduced species have higher dispersal ability. Displacement of native species by closely related introduced species may be accentuated by hybridization and introgression (Rhymer & Simberloff, 1996).
In addition to close phylogenetic relatedness, both Li et al. (2015) and Marx et al. (2016) found that established introduced species of plant also tend to be functionally or ecologically distinct from native species. Their results thus argue for a primary importance of preadaptation, within which traits for competition avoidance are also favoured, thus favouring the coexistence of native species with introduced relatives. In the absence of relevant functional and ecological data for species of beetle within this study, it is not possible to understand the extent to which distinctiveness may also be associated with close phylogenetic relatedness. However, distinctiveness between related native and introduced species should favour their coexistence, while similarity should lead to competition, and this may be reflected in co-occurrence and abundance data. Sampling data for Ocypus aethiops, an introduced species, and its native congener O. umbricola, are suggestive of competitive displacement (Figure S1). Ocypus aethiops is a recent introduction to the Canary Islands, first recorded in 2012 (Wojas, 2021), where it was sampled from a single location within the laurel forest of Anaga. Across the 10 plots sampled in Anaga, Tenerife, in 2012 (with three plots resampled between 2017 and 2018), O. aethiops was sampled exclusively in the three most western plots of the Anaga peninsula, being absent or at a much lower frequency than the native congener, O. umbricola, across the seven more eastern plots. Li et al. (2015) have demonstrated the utility of temporal sampling for plants to show native species displacement by closely related introduced species. In the absence of robust functional or ecological data for insects, temporal sampling focussed on related native and introduced species may be a useful approach to understand the relative importance of coexistence and displacement dynamics.

| Preadaptation, competition-relatedness and temporal patterns in community assembly
In addition to evaluating relative support for the preadaptation and competition-relatedness hypotheses, we also tested the hypothesis that more recently established species (i.e. introduced species) will typically present stronger signatures for the preadaptation hypothesis, compared with species of less recent origin (i.e. native not endemic species). Patterns of phylogenetic relatedness are consistent with a history of community assembly, prior to human-mediated species introductions, within which species establishment is essentially random with regard to the preadaptation and competition-relatedness hypotheses. However, as pointed out for plants by Li et al. (2015), a dominant influence of preadaptation could be masked if newly arrived species are competitively superior to closely related native species, and thus capable of displacing them, leading to their eventual extinction. The observation of Li et al. (2015) highlights the context dependency for interpreting patterns of phylogenetic relatedness with regard to competition. In a scenario of pre-equilibrium community assembly, where immigration exceeds extinction, assembly patterns are likely to be stochastic and less deterministic (Emerson & Gillespie, 2008). Such a dynamic should favour random patterns of relatedness for newly established species, with regard to already established species, consistent with our results. However, as extinction (and speciation) becomes more important, and assembly dynamics tend towards equilibrium, turnover of species should eventually lead to a more deterministic set of species within a community (Simberloff & Wilson, 1970), for which competition effects are likely to have been more important. Within this dynamic, the concerns of Li et al. (2015) become relevant, complicating the inference of establishment dynamics of phylogenetically divergent native lineages.
This reinforces the need for studies understanding the nature of competitive differences or trait divergence between contemporary pairs of related native and introduced species. Such studies may provide a helpful framework for understanding probable ecological processes behind patterns of phylogenetic relatedness in earlier phases of community assembly.

| Barcode-assisted assignment of introduced species status
Distinguishing between introduced and native not endemic species status within natural communities is often difficult, particularly among invertebrates. This has prompted efforts to use relatedness information from DNA sequence data to infer probable status, both for traditional specimen-based sampling of invertebrate communities (Andersen et al., 2019; Smith & Fisher, 2009) and metabarcode sequencing. Here, we were able to address this limitation, and make species-level taxonomic assignments for 268 of the 360 species sampled, providing a direct association with their recorded native or introduced species status. The remaining 92 species were all assigned to their respective genera, with 55 of these (15.2% of all sampled species) being of equivocal biogeographic origin, due to the existence of unsampled native not endemic and introduced species within their constituent genera. Of these 55 species, three could be assigned as introduced (INT) species by comparison with an external barcode databank, two as native not endemic and 24 as endemic using the 80% criterion explained in the methods section (Table S4).
While our level of taxonomic assignment addresses the limitations faced by Andersen et al. (2019) and Kennedy et al. (2022), we considered an additional source of error. For any given native not endemic species, it remains possible that historical inferences of native or introduced status may be incorrect. This is a particularly relevant concern on islands, where such cases have been previ-

| Native or introduced species status: an overlooked biodiversity knowledge shortfall
The approximately 95% increase in the number of introduced species within Canary laurel forest through barcode sequence matching highlights an underappreciated shortfall in arthropod biodiversity data: knowledge of whether species are native or introduced. In addition to previously recognized shortfalls and challenges (Cardoso et al., 2011; Emerson et al., 2022; Hortal et al., 2015), a lack of reliable data on whether species are native or introduced will also constrain efforts to effectively monitor, understand, manage and ultimately conserve biodiversity. It is unlikely that our results are an idiosyncrasy, or unique to our focal taxonomic group and habitat.
It is rather more likely to be a universal concern, which may vary across different geographic regions and fractions of arthropod diversity. In harmony with the naming of previous shortfalls (Brown & Lomolino, 1998; Cardoso et al., 2011; Diniz-Filho et al., 2013; Hortal et al., 2015; Lomolino, 2004), we refer to this as the Humboldtean shortfall: the limited knowledge on the status of species as being either native or introduced.
Initiatives to taxonomically and geographically scale up local and regional barcode reference library production (Hobern, 2021) are thus likely to have a positive collateral impact upon the Humboldtean shortfall.

| CONCLUSIONS
To our knowledge, this is the first community-level phylogenetic analysis to address the dynamics of arthropod species introduction and establishment. Our results are consistent with previous work on plants (Li et al., 2015;Marx et al., 2016), and fish (Xu et al., 2022), indicating a dominant role for species preadaptation over competition avoidance. The importance of phylogenetic relatedness as a predictor of establishment by introduced species of beetle to laurel forest habitat is reinforced by lower relatedness for introduced species that were not sampled within forests, and higher occupancy within forests for introduced species more closely related to native species. We find indirect evidence for competitive displacement by introduced species; however, further work is needed to understand the relative importance of competitive displacement and trait mediated coexistence.
The combined mitogenome phylogeny and barcode approach presented here provides a generalizable framework for investigating arthropod species introduction and establishment dynamics.
With the costs and logistics of whole mitochondrial genome and barcode sequencing becoming more affordable and streamlined (Srivathsan et al., 2021), the approach can be applied to better understand the dynamics of establishment of introduced arthropod species. However, our results also highlight a potential concern for arthropod biodiversity research. High barcode sequence matching to individuals from distant continental areas revealed that 19 species presumed to be native are in fact introduced, doubling the number of introduced species to 38. The extent to which this problem manifests itself across different taxonomic domains and geographic areas will be determined by the completeness of relevant barcode reference libraries. In this context, globally coordinated efforts to generate taxonomically assigned barcode and metabarcode sequence data for arthropods Emerson et al., 2022)

ACKNOWLEDGEMENTS
The authors wish to thank the following for assistance with field-

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
Data used for the analyses were uploaded to the Dryad Digital Repository (10.5061/dryad.v41ns1s1v). The custom R script developed to define presumed biological species (PBS) by applying a maximum intraspecific divergence threshold is available from GitHub (https://github.com/asalcescastellano/Divergence-threshold.git).
COI sequences and mitogenomes generated for the study were uploaded to GenBank (see supporting information for accession numbers