Preliminary population studies of the grassland swallowtail butterfly Euryades corethrus (Lepidoptera, Papilionidae)

Abstract Euryades corethrus is a Troidini butterfly (Papilionidae, Papilioninae), endemic to grasslands in southern Brazil, Uruguay, Argentina and Paraguay. Formerly abundant, nowadays it is in the Red list of endangered species for those areas. During its larval stage, it feeds on Aristolochia spp, commonly found in southern grasslands. These native grassland areas are diminishing, being converted to crops and pastures, causing habitat loss for Aristolochia and E. corethrus. This study aimed to assess the genetic diversity, population structure and demographic history of E. corethrus. We sampled eight populations from Rio Grande do Sul, Brazil and based on Cytochrome Oxidase subunit I (COI) molecular marker, our results suggest a low genetic variability between populations, presence of gene flow and, consequently, lack of population structure. A single maternally inherited-genetic marker is insufficient for population-level decisions, but barcoding is a useful tool during early stages of population investigation, bringing out genomic diversity patterns within the target species. Those populations likely faced a bottleneck followed by a rapid expansion during the last glaciation and subsequent stabilization in effective population size. Habitat loss is a threat, which might cause isolation, loss of genetic variability and, ultimately, extinction of E. corethrus if no habitat conservation policy is adopted.


INTRODUCTION
The main goal of conservation biology is the maintenance of populations and species in the long term, over a wide geographic scope. Among the many hurdles in conservation efforts, we must include the identification of populations and units, the assessment of population characteristics such as size and connectivity, detection of hybridization, evaluation of the persistence and adaptation potentials of populations when faced with environmental change, and ultimately an understanding of the factors affecting this process (Hohenlohe et al. 2021). Population studies can provide lots of information, and although they are insufficient to investigate the factors controlling the abundance and distribution of the plants and animals being studied (Ehrlich & Murphy 1987), they are the first steps in an effort to improve the data on species that otherwise would remain lacking. As demonstrated by Petit-Marty et al. (2021), nucleotide diversity in COI can be a useful tool for preliminary evaluation of the conservation status of a species that lacks previous knowledge.
Sadly, conservation studies are biased toward vertebrates and forest areas (Di Marco et al. 2017, Lawler et al. 2006, and as a result, grassland is often overlooked. That apparent lack of diversity is deceiving, since they actually are among the richest ecosystems in the world, in some cases even richer than forest areas (Wilson et al. 2012). Grassland environments are also responsible for various ecological services, among them soil carbon sequestration and hydric resources management (Pillar & Vélez 2010). Two projects, called GLASOD (Global Assessment of Human-induced Soil Degradation) and UNEP (United Nations Environment Programme) project, mapped out the degradation of areas with vegetation cover in the world and found out that some form of antropic degradation is present in 15% of those areas. That number rises to 33% if we consider cultivation areas, according to an FAO (Food and Agriculture Organization) report. While forest coverage and structure can be good indices of conservation in ecoregions clearly dominated by forests, in non-forested ecoregions (e.g. grasslands, deserts), conservation should be based on other variables. Adding to that, Tropical grassy biomes might be mischaracterized as anthropogenic in origin and already degraded due to the lack of tree cover, which causes those areas to be mismanaged. This error can result in avoidance of disturbances (e.g. fires) that are necessary for the dynamics between grasses and trees (Parr et al. 2014). Furthermore, they might suffer "reforestation" with exotic species such as Pinus and Eucalyptus (Canadell & Raupach 2008), causing the replacement of native, ancient and endemic plant species (Bond & Parr 2010, Putz & Redford 2010, Stickler et al. 2009). The three southern states of Brazil (Paraná, Santa Catarina and Rio Grande do Sul) have a peculiar mosaic of biomes, where the Pampa in the southernmost areas turns into Atlantic forest as we go north and finally a small area of Cerrado at the borders of Paraná and São Paulo. The forest areas in the states of Santa Catarina and Paraná and in the northern half of Rio Grande do Sul have patches of grassland while the South Brazilian Plateau is mostly grassland (Pampa). The Pampa biome has been reduced to about 43% of its original size, mainly due to the conversion to agriculture and silviculture (Vélez-Martin et al. 2015) and yet, those areas are neglected and not properly protected (Overbeck et al. 2007). Adding to that, knowledge on insect diversity in the Pampa biome is severely lacking. On the other hand, if we consider that Mendonça et al. (2015) states that studies have found up to 17 different orders of arthropods in the grasslands of the Pampa biome, it is safe to speculate that those areas have a high diversity of insects. An evidence of that is the data available on iNaturalist, where a group called "Insects of Rio Grande do Sul" has gathered occurrences for 2117 different species (iNaturalist).
Also, Rio Grande do Sul is among the richest when it comes to butterflies, with 832 known species and subspecies (Giovenardi et al. 2013), and among those species, there is Euryades corethrus, a Neotropical swallowtail butterfly from the tribe Troidini (Papilionidae). This species is endemic to grasslands of southern regions of Brazil and neighbouring countries (Uruguay, Paraguay and Argentina) (Tyler et al. 1994) and are more easily found from September to April.
Euryades corethrus feeds on plants of the genus Aristolochia while on their larval stage and on plants such as Senecio and Eupatorium during the adult stage. Those plants are common in southern native grasslands (Costa 2016). According to the available literature (Biezanko et al. 1974, Klitzke & Brown 2000, Beccaloni et al. 2008, Volkmann & Núñez-Bustos 2010, E. corethrus uses five Aristolochia species as host plants during its larval stage. In our personal observation, whenever we encountered the larval butterfly in the field, it was feeding on A. sessilifolia. This could indicate that E. corethrus has a strong association to this particular species, but without further investigation we can only speculate. Many butterflies (with E. corethrus being one example), are highly sensitive to environmental changes. They are habitat specialists with moderate long range dispersal abilities, and as a consequence, climate-driven range shifts can directly impact their phylogeographical structure. An example of that effect for a similar swallowtail that occurs in a temperate mountainous habitat can be seen in Todisco et al. (2012) Apart from climate change effects, impacts caused by human activities (like the progressive shrinking of native grassland areas where the host plant can be found) could be compounded by climate-driven range shifts. According to Tyler et al. (1994), this used to be an abundant butterfly species in its distribution area; however, nowadays E. corethrus is listed in different categories in the Red list of The use of charismatic species to attract public support for a particular area or biome (Andelman & Fagan 2000) is a strategy that could be applied using E. corethrus. It can perform this role by acting as a surrogate, becoming an umbrella or an indicator species for the grassland. The approach of "indicator species" is based on the premise that the chosen species (in that case, E. corethrus) is reflecting the other species in the community, as well as reflecting the chemical and/or physical changes to that environment (Landres et al. 1988) Despite being a very good candidate for the role of flagship species, E. corethrus hasn't been extensively studied: records in the literature are mainly inventories or records of new occurrences (Dolibaina et al. 2011), with a few studies dealing with the biology of the species (Caporale et al. 2017), only recently focusing on its ecology (Costa 2016). Euryades corethrus has been tangentially investigated during studies dealing with the tribe Troidini, such as their origin and evolution (Braby et al. 2005), phylogenetic relationships within the tribe (Silva-Brandão et al. 2015) and biogeography and diversification patterns (Condamine et al. 2012). They were included in the phylogenetic relationships and divergence times of Papilionidae (Condamine et al. 2012, Allio et al. 2020) but an investigation focused on this particular species' relationship with the other members of Papilionidae still hasn't been done. Moreover, there is none specifically analyzing the genetic diversity and population structure of E. corethrus.
Holsinger & Weir (2009) defines a structured population as one that has different variances among populations and within populations. There is a substructure instead of the same variance throughout the whole population. Nordborg & Innan (2002) see that substructure as a perturbation of the standard neutral model. A structured population has less gene flow, which can eventually lead to isolation and speciation, but also raises the risk of extinction. On the other hand, gene flow is a "creative force", preventing speciation and natural selection and therefore inhibiting genetic evolution (Slatkin 1987). Knowing the extinction risks and mapping priority areas needed to preserve not only individuals but also the species' genetic diversity are invaluable tools for defining conservation strategies.
Considering E. corethrus is an endangered species in need of urgent conservation actions and it is unknown how habitat loss will affect the remaining populations, or even if and how those populations are structured, this work was aimed at obtaining preliminary data regarding the genetic diversity, population structure, gene flow and demographic history of E. corethrus from southern Brazil.

Specimen collection
The expected area of occurrence included the Brazilian states of Rio Grande do Sul, Santa Catarina and Paraná and also areas of Argentina and Uruguay that border those states. All specimens were captured in their adult stage, in the field, between March of 2016 and January of 2017. The exceptions are the specimens from Eldorado do Sul, Rio Grande do Sul (CE), which were provided by a collaborator after being collected in the field as eggs and reared in the lab until they reached the adult stage. We cannot ensure that siblings were not compared, but since they were collected on different days and that this butterfly has the habit of placing a single egg (occasionally up to five) (Caporale et al. 2017, Mega et al. 2020) on each plant. We cannot be certain if individuals within an area were part of the same brood. A total of 80 individuals from 8 different localities were used (Table I). The collecting sites were chosen based on previous knowledge about the expected distribution of the butterflies and known remnants of preserved grassland (G.W.G Atencio et al. unpublished data). In order to test if the butterflies form just a single population or several smaller populations within the considered area, we chose to include in the DNA extraction only specimens from sites with at least 10 captured specimens and localities as widespread as possible within the known area of occurrence. Captured specimens were killed by pinching, kept in individual, identified paper envelopes, frozen as soon as possible and kept at -20°C freezer until DNA extraction. The butterflies were collected under SISBIO permit number 52634-1.

DNA extraction and amplification
Total genomic DNA was extracted from thorax following Mega & Ravers (2011). DNA quality and concentration were measured using a spectrophotometer (Nanodrop ND˗1000, Thermo Fisher). The COI gene fragment was amplified by PCR using the universal primers LCO˗1490 (forward) and HCO˗2198 (reverse) from Hebert et al. (2003). PCR was conducted in 10 µL volume reactions, as follows: 6.75 µL ultrapure water, 1.0 µL 10x PCR buffer, 0.5 µL MgCl 2 , 0.25 µL dNTP mix (10 mM), 0.2 µL of each primer (20 mM), 0.1 µL Platinum Taq DNA polymerase (5 U/ µL), 1 µL DNA (approximately 50 ng). Thermal cycling conditions followed Doorenweerd et al. (2015), with modifications: 94°C for 3 minutes; 30 cycles of 94° for 30s, 50°C for 30s and 57°C for 40s; final extension at 72°C for 5 minutes. PCR products were purified with ExoSAP˗IT TM (Applied Biosystems). Sequencing reactions were performed both forward and reverse at Macrogen (Seoul, South Korea). Electropherograms were assembled, checked and sequences were edited and aligned using both strands for confirmation with Geneious Prime 2019.2.1 (https://www. geneious.com). All sequences were aligned with MAFFT (Rozewicki et al. 2019), implemented in Geneious Prime 2019.2.1. The generated sequences were compared with available sequences using BLASTN (https://blast.ncbi. nlm.nih.gov), with standard parameters and submitted to GenBank (Table I). The sequenced specimens were registered in SISGEN under the number A7278D9.  The population structure was evaluated by spatial analysis of molecular variance (SAMOVA) implemented in SAMOVA 2.0 (Excoffier et al. 1992, Dupanloup et al. 2002, Excoffier & Lischer 2010, without a priori information about population structure and also by Bayesian Analysis of Population Structure (BAPS) 6.0 with the 'spatial clustering of groups' models followed by admixture analysis (Corander & Marttinen 2006, Corander et al. 2008a, b, Cheng et al. 2013. BAPS uses both approaches to determine the most appropriate clustering (default model) and best groups in a fixed number of clusters (fixed-K mode).

Tests of demographic equilibrium and population expansion
Haplotype and nucleotide diversity were calculated using DnaSP 6.0 (Rozas et al. 2017). Demographic equilibrium was tested in each operational population unit by calculating Fu's Fs (Fu 1997) and Tajima's D (Tajima 1989) statistics. P-values for the two statistics were obtained as the proportion of simulated values smaller than or equal to the observed values (ɑ = 0.02 for Fu's Fs, and ɑ = 0.05 for Tajima's D). Expected mismatch distributions and parameters of sudden expansion s = 2lT were calculated using Arlequin 3.5.2.2 by a generalized least-squares approach (Schneider and Excoffier 1999), under models of pure demographic expansion and spatial expansion (Ray et al. 2003, Excoffier 2004. The probability of the data under the given model was assessed by the goodness-of-fit test implemented in Arlequin. Parameter confidence limits were calculated in Arlequin through a parametric bootstrap (10,000 simulated random samples).

Reconstruction of demographic histories
The demographic history of the population was estimated by running the MCMC analysis under the Bayesian skyline tree prior in BEAST 2.6.2 (Bouckaert et al. 2019). The HKY (Hasegawa, Kishino and Yano) model for nucleotide substitution (Hasegawa et al. 1985) was estimated in JModelTest. We assumed a relaxed lognormal molecular clock with a mean rate of 0.0075 (corresponding to the divergence rate of 1.5% per million years). This rate is commonly used in Lepidoptera studies (DeChaine & Martin 2005, Canfield et al. 2008, Kawakita & Kato 2009, Oliver et al. 2012, Pfeiler et al. 2012, Wang et al. 2014

Genetic diversity
The dataset of COI alignment contains 657 bp, of which 16 were variable (13 synonymous and 3 non-synonymous substitutions) and 6 parsimony informative. These polymorphic sites defined 17 haplotypes ( Figure 1). The haplotype diversity (Hd) ranged from 0.378 in CE to 0.667 in FMS, FZI, SA and U. The nucleotide diversity (π) ranged from 0.00061 in CE to, 0.00122 in FMS, FZI, SA and U. The summary of genetic diversity indices and neutrality tests are presented in Table II. From the 17 haplotypes recovered, 10 haplotypes were unique to a single locality and one haplotype was found in all localities. The remaining haplotypes range from 2 to 4 localities ( Figure 2). The most common haplotype was present in all population samples and was the most abundant haplotype across all populations, identified in 70% of all the sampled individuals across the entire study region. This main haplotype was the core of a star-like cluster topography, central to all other haplotypes, most of them represented by a single individual and divergent by a single mutation (Figure 1). Fu's Fs and Tajima's D statistics were calculated for all operational population units defined a priori and for a unique population approach. Values of Fs and D were negative for all populations, and significant for most of them (Table II). In stable populations, over time, it is expected for those values to be close to zero. Since our values were negative and significant for most of them, the null hypothesis of constant population size should be rejected. Just as is the case with Tajima's D, the value calculated for Fu's Fs is an indication of population expansion.
Mismatch analysis corroborates the sudden expansion model (Table III). The theta value (0.05 ma) indicates the true time since population expansion. The observed mismatch distributions corresponded to the expected frequency distributions of pairwise nucleotide site differences in an exponentially growing population (Figure 3). Also, the mismatch distribution was unimodal indicating demographic expansion.

Population structure
The population structure was evaluated by spatial analysis of molecular variance (SAMOVA) without a priori information about population structure and also by Bayesian Analysis of Population Structure (BAPS). Both analyses presented similar results, indicating a lack of population structure and therefore suggesting the populations defined a priori based on the geographic location are a unique panmictic population. The analysis of molecular variance (AMOVA) calculated the source of variation among and within populations (Sum of squares was 1.900 and 23.400, respectively); the variation within populations corresponded to 100% of total variation. Also, the average Fst value was -0.01676 (p value = 1) indicating no genetic differentiation between populations. Fst values are supposed to range from 0 to 1, but small samples with low differentiation levels can produce negative values close to zero. Pairwise Fst values for all populations are represented in Figure 4.

Demographic history
The effective sample size (ESS) for the Bayesian Skyline Plot was larger than 200, suggesting that the MCMC (Markov chain Monte Carlo) mixed properly and that the number of generations was sufficient to infer size changes.
The Bayesian skyline plot indicated an overall pattern of population stability throughout the Pleistocene. A fluctuation near the present (around 100,000 ya), suggests a population bottleneck followed by rapid population expansion (~50,000 ya). (Figure 5).

DISCUSSION
Although typically not sufficient when unraveling population-level questions, mitochondrial (mt) COI (Cytochrome Oxidase I) barcoding is a useful tool during the early stages of population structure investigation, highlighting patterns of genomic diversity within the target species (Hajibabaei et al. 2007). We believe our findings are useful as a starting point, but should be  approached carefully, since we are dealing with a single genetic marker with a rather idiosyncratic evolutionary history. It is, after all, maternally inherited and much smaller than the nuclear genome, but it is still useful information for a species that severely lacks genetic data, as is the case for E. corethrus. The COI sequences of E. corethrus revealed very low nucleotide diversity (0.00097± 0.00015) and a relatively high haplotype diversity (0.542± 0.068). Despite the haplotype diversity, most occur at a very low frequency and are differentiated from a dominant (probably ancestral) haplotype by a single mutational step. Tajima's D and Fu's Fs values were negative for all populations and significant for most of them, indicating an excess of rare nucleotide site variants compared to the expectation under a neutral model of evolution. These results lead us to reject the neutrality hypothesis.
A negative Tajima's D, paired with the haplotype network in a "star-like" pattern ( Figure 1) ( Slatkin & Hudson 1991, Avise 2000 indicates population size expansion after a bottleneck or a selective sweep and/or purifying selection (Tajima 1989). Further evidence is found in the aforementioned negative value of Fs (evidence for an excess number of alleles), which is expected to follow a recent population expansion or from genetic hitchhiking (Fu 1997). Schmitt & Seitz (2002) investigated the genetic structure of Polyommatus coridon (Lepidoptera: Lycaenidae) and also verified very low Fst values, just like our results for E. corethrus. They also didn't find hierarchical structure either between or within the study areas, as well as no correlation between  genetic and geographical distances. This lack of congruence between geographical and genetic patterns might be explained by the high dispersal power of some species, corroborated by Hastings & Harrison (1994). Our results suggest that populations from Southern Brazil probably represent one metapopulation sufficiently interconnected for the maintenance of the genetic pool. The sufficient gene flow between populations reduces or restrain the process of geographic differentiation (de Jong et al. 2011). This phenomenon is common in flying insects, especially in migratory and/or good dispersers, such as Monarch butterflies (Brower & Boyce 1991), bumble bees Bombus terrestris (Estoup et al. 1996), and dragonflies The BPS pattern suggests a long period of population stability, with a decrease in population size around 100,000 ya. This population decrease was followed by a rapid population growth around 50,000 ya. A similar pattern, where there was a decrease in population size followed by rapid population growth and range expansion during the Late Pleistocene can be found for other Lepidoptera, like Gnopharmia colchidaria (Geometridae), G. kasrunensis (Geometridae) and Polyommatus coridon (Lycaenidae) (Rajaei Sh et al. 2013, Schmitt & Seitz 2001.
The recent demographic expansion is in accordance with broadly observed patterns of population expansion in different taxa, following the last glacial period, which ended around 12,500 years ago (de Jong et al. 2011  if no habitat conservation policy is adopted in the near future. The habitat fragmentation can erode neutral and adaptive genetic diversity of populations caused by decreases in the effective population size and inter-populational gene flow (Johansson et al. 2007, Dixo et al. 2009, Wang et al. 2011, Liu et al. 2013, review in Lv et al. 2019. Thus, habitat fragmentation threatens not only local, but regional, and ultimately global biodiversity (Tilman et al. 1994, Dobson 1997. Euryades corethrus is inserted in an anthropogenic landscape consisting of highly fragmented agricultural landscapes, with embedded habitat remnants. In such a scenario, the butterflies need not only large and reachable habitats, but also smallers habitats in between that function as stepping stones in order to achieve long-term persistence at the landscape level (Heinrichs et al. 2015). This seems to agree with the strategy proposed by Seraphim et al. (2016), who advise that instead of creating biologic reserves, it is more effective to increase the conservation status of small remnants of suitable habitat, protect the patches that are already inserted in the metropolitan matrix and promote the connectivity or the populations through tree lined avenues and cultivation of native resting flora in urban parks. This is desirable since a metapopulation structure needs a balance between subpopulation diversity and migration in order to maintain the genetic diversity (Whitlock 2004).
Another viable strategy seems to be spatial risk spreading, a measure that compensates for the lag between the measures being taken to restore the habitats and the desired effect in the quantity and quality of habitats (Maes et al. 2004). Among the risk spreading strategies, we can mention translocations to suitable, unoccupied sites that otherwise would have a low probability of short-term spontaneous colonization or even re-introductions into sites previously occupied by the species (Oates 1992).
Regardless of the adopted strategy, it is possible that Rio Grande do Sul has become the last refuge of the species in Brazil. A recent work by G.W.G. Atencio et al. (unpublished data) surveyed the areas where the butterfly was expected to be found but very few places actually had either the butterfly or suitable grassland. Argentina and Uruguay have only very few scattered records in recent years, but the conservation status of native grasslands in these countries seems to be as dire as the one in Brazil, according to G.W.G Atencio et al. (unpublished data) and the relationship between these populations and the ones in Brazil remains to be investigated. We hope that the similarities between the areas in Brazil and the areas in the neighbouring countries of Uruguay and Argentina allow our data to be used for grassland research not only on where data was collected, but in similar areas in South America as well, since conservation efforts should aim to protect not just the Pampa, but the entire grassland network .