High-quality carnivore genomes from roadkill samples enable species delimitation in aardwolf and bat-eared fox

In a context of ongoing biodiversity erosion, obtaining genomic resources from wildlife is becoming essential for conservation. The thousands of yearly mammalian roadkill could potentially provide a useful source material for genomic surveys. To illustrate the potential of this underexploited resource, we used roadkill samples to sequence reference genomes and study the genomic diversity of the bat-eared fox (Otocyon megalotis) and the aardwolf (Proteles cristata) for which subspecies have been defined based on similar disjunct distributions in Eastern and Southern Africa. By developing an optimized DNA extraction protocol, we successfully obtained long reads using the Oxford Nanopore Technologies (ONT) MinION device. For the first time in mammals, we obtained two reference genomes with high contiguity and gene completeness by combining ONT long reads with Illumina short reads using hybrid assembly. Based on re-sequencing data from few other roakill samples, the comparison of the genetic differentiation between our two pairs of subspecies to that of pairs of well-defined species across Carnivora showed that the two subspecies of aardwolf might warrant species status (P. cristata and P. septentrionalis), whereas the two subspecies of bat-eared fox might not. Moreover, using these data, we conducted demographic analyses that revealed similar trajectories between Eastern and Southern populations of both species, suggesting that their population sizes have been shaped by similar environmental fluctuations. Finally, we obtained a well resolved genome-scale phylogeny for Carnivora with evidence for incomplete lineage sorting among the three main arctoid lineages. Overall, our cost-effective strategy opens the way for large-scale population genomic studies and phylogenomics of mammalian wildlife using roadkill.


27
In the context of worldwide biodiversity erosion, obtaining large-scale genomic resources 28 from wildlife is essential for biodiversity assessment and species conservation. An 29 underexploited but potentially useful source of material for genomics is the thousands of  (Fig. 2b). With a nucleotide distance of 0.054 for both COX1 and CYTB, the genetic 151 distance observed between the two subspecies of aardwolf (Proteles ssp.) was higher than the 152 majority of the intraspecific distances observed across Carnivora. However, with a nucleotide 153 distances of 0.020 for COX1 and 0.032 for CYTB, the genetic distance observed between the 154 two subspecies of bat-eared fox (Otocyon ssp.) was clearly in the ambiguous zone and did not 155 provide a clear indication of the specific taxonomic status of these populations. 156 Finally, to test whether the two pairs of allopatric subspecies diverged synchronously      Table 1). Due to quality differences among the extracted tissues for both 190 species, the N50 of the DNA fragment size for P. cristata (9,175 bp) was about twice higher 191 than the N50 of the DNA fragment size obtained for O. megalotis (4,393 bp). The quality of 192 the reads basecalled with the high accuracy option of Guppy was significantly higher than the 193 quality of those translated with the fast option, which led to better assemblies (see Fig. S1

201
The two reference genomes were assembled using MinION long reads and Illumina 202 short reads in combination with MaSuRCA v3.2.9 (Zimin et al., 2013). Hybrid assemblies for 203 both species were obtained with a high degree of contiguity with only 5,669 scaffolds and an 204 N50 of 1.3 Mb for the aardwolf (P. cristata) and 11,081 scaffolds and an N50 of 728 kb for 205 the bat-eared fox (O. megalotis) ( Table 1). Exhaustive comparisons with 503 available 206 mammalian assemblies revealed a large heterogeneity among taxonomic groups and a wide 207 variance within groups in terms of both number of scaffolds and N50 values (Fig. 3, Table   208 S3). Xenarthra was the group with the lowest quality genome assemblies, with a median 209 number of scaffolds of more than one million and a median N50 of only 15 kb. Conversely, 210 Carnivora contained genome assemblies of much better quality, with a median number of 211 scaffolds of 15,872 and a median N50 of 4.6 Mb, although a large variance was observed 212 among assemblies for both metrics (Fig. 3, Table S3). Our two new genomes compared 213 favourably with the available carnivoran genome assemblies in terms of contiguity showing 214 slightly less than the median N50 and a lower number of scaffolds than the majority of the 215 other assemblies (Fig. 3, Table S3). Comparison of two hybrid assemblies with Illumina-216 only assemblies obtained with SOAPdenovo illustrated the positive effect of introducing 217 Nanopore long reads even at moderate coverage by reducing the number of scaffolds from 218 409,724 to 5,669 (aardwolf) and from 433,209 to 11,081 (bat-eared fox) while increasing the 219 N50 from 17.3 kb to 1.3 Mb (aardwolf) and from 22.3 kb to 728 kb (bat-eared fox). With 220 regard to completeness based on 4,104 single-copy mammalian BUSCO orthologues, our two 221 hybrid assemblies are among the best assemblies with more than 90% complete BUSCO 222 genes and less than 4% missing genes (Fig. 4, Table S4). As expected, the two corresponding 223 Illumina-only assemblies were much more fragmented and had globally much lower BUSCO 224 scores (Fig. 4, Table S4).

292
Effective population size reconstructions 293 We used the pairwise sequential Markovian coalescent (PSMC) model to estimate the 294 ancestral effective population size (Ne) trajectory over time for each sequenced individual.

295
For both the aardwolf and the bat-eared fox, the individual from Eastern African populations 296 showed a continuous decrease in Ne over time, leading to the recent Ne being lower than that 297 in Southern African populations (Fig. 6). This is in agreement with the lower heterozygosity 298 observed in the Eastern individuals of both species. For the bat-eared fox, the trajectories of 299 the three sampled individuals were synchronised approximately 200 kya ago (Fig. 6a), which 300 could correspond to the time of divergence between the Southern and Eastern populations. In 301 contrast, the Ne trajectories for the aardwolf populations did not synchronise over the whole 302 period (~2 Myrs). Interestingly, the Southern populations of both species showed a marked 303 increase in population size between ~10-30 kya before sharply decreasing in more recent 304 times (Fig. 6).   Phylogenomic inference was first performed on the whole supermatrix using ML. The 326 resulting phylogenetic tree was highly supported, with all but one node being supported by 327 maximum bootstrap (UFBS) values (Fig. 7). To further dissect the phylogenetic signal 328 underlying this ML concatenated topology, we measured gene concordance (gCF) and site 329 concordance (sCF) factors to complement traditional bootstrap node-support values. For each 330 node, the proportion of genes (gCF) or sites (sCF) that supported the node inferred with the 331 whole supermatrix was compared to the proportion of the genes (gDF) or sites (sDF) that 332 supported an alternative resolution of the node (Fig. 7, Supplementary materials). Finally, a 333 coalescent-based approximate species tree inference was performed using ASTRAL-III based 334 on individual gene trees (Supplementary materials). Overall, the three different analyses 335 provided well-supported and almost identical results (Fig. 7). The order Carnivora was  Congruent phylogenetic relationships among Feliformia families and within hyaenids were 343 also retrieved with the mitogenomic data set (Fig. 2a). The short internal nodes of Felidae 344 were the principal source of incongruence among the three different analyses with 345 concordance factor analyses pointing to three nodes for which many sites and genes support 346 alternative topologies (Fig. 7) including one node for which the coalescent-based with maximum bootstrap support (Fig. 7), as in the mitogenomic tree (Fig. 2a). However, the 361 concordance factor analyses revealed that many sites and many genes actually supported 362 alternative topological conformations for this node characterized by a very short branch 363 length (sCF=34.1, SDF1=29.2, sDF2=36.7, gCF=46.9, gDF1=18.6, gDF2=18.2, gDFP=16.3) 364 ( Fig. 7). In Pinnipedia, the clade Odobenidae (walruses) plus Otariidae (eared seals) was 365 recovered to the exclusion of Phocidae (true seals), which was also in agreement with the 366 mitogenomic scenario (Fig. 2a). Finally, within Musteloidea, Mephitidae represented the first 367 offshoot, followed by Ailuridae, and a clade grouping Procyonidae and Mustelidae.

368
Phylogenetic relationships within Musteloidea were incongruent with the mitogenomic tree, 369 which alternatively supported the grouping of Ailuridae and Mephitidae (Fig. 2a).   With an increasing number of species being threatened worldwide, obtaining genomic 403 resources from mammalian wildlife can be difficult. We decided to test the potential of using  show that the two subspecies of P. cristata present a genetic divergence greater than the 448 threshold, whereas the divergence is slightly lower for the two subspecies of O. megalotis. 449 These results seem to indicate that the subspecies P. c. septentrionalis might have to be  Given that accurately defining the species taxonomic level is essential for a number of       In order to assemble a mitogenomic data set for assessing mitochondrial diversity among P.  (Table S6). Briefly, we extracted total genomic DNA total 680 using the DNeasy Blood and Tissue Kit (Qiagen) for P. c. cristata (NMB TS307), P. c.  To construct reference assemblies with high contiguity for the two focal species we selected 732 the best-preserved roadkill samples: NMB TS305 for O. megalotis and NMB TS307 for P. 733 cristata (Table 1)     individual from each population (Fig. S2). Finally, the two GDI were compared to check if 845 the Southern populations were more structured than the entire populations.

846
To contextualize these results, the same GDI measures were estimated for well-   Figure S1: Plot of the quality of Nanopore long reads base-called with either the fast or the high 1473 accuracy option of Guppy v3.1.5. The quality of the base-calling step has a large impact on the final 1474 quality of the assemblies by reducing the number of contigs and increasing the N50 value. 1475 1476 Figure S2: Definition of the genetic differentiation index (GDI) based on the F-statistic (FST). The 1477 main difference between these two indexes is the use of heterozygous allele states for GDI rather than 1478 real polymorphism for the FST. Green = π within , Orange = π between , Blue = Population A, Red = 1479 Population A+B. 1480 1481 Figure S3: Graphical representation of the results of contamination analyses performed with 1482 BlobTools for a) the aardwolf (Proteles cristata) and b) the bat-eared fox (Otocyon megalotis). 1483 1484 Table S1: Pairwise patristic distances estimated for the 142 species based on the phylogenetic tree 1485 inferred with the 15 mitochondrial loci (2 rRNAs and 13 protein-coding genes). 1486 1487 surveillance project", "Exclude anomalous", "Exclude partial", and using only the RefSeq assembly 1495 for Homo sapiens. 1496 1497  Figure S1: Plot of the quality of Nanopore long reads base-called with either the fast or the high accuracy option of Guppy v3.1.5. The quality of the base-calling step has a large impact on the final quality of the assemblies by reducing the number of contigs and increasing the N50 value.

High accuracy basecalling
Fast default basecalling Figure S2: Definition of the genetic differentiation index (GDI) based on the Fstatistic (FST). The main difference between these two indexes is the use of heterozygous allele states for GDI rather than real polymorphism for the FST. Green = π within , Orange = π between , Blue = Population A, Red = Population A+B.
π private A = π private B = π tot Fst = 0   Table S1: Pairwise patristic distances estimated for the 142 species based on the phylogenetic tree inferred with the 15 mitochondrial loci (2 rRNAs and 13 proteincoding genes).   S3: Sample details and assembly statistics (Number of contigs/scaffolds and associated N50 values) for the 503 mammalian assemblies retrieved from NCBI (https://www.ncbi.nlm.nih.gov/assembly) on August 13th, 2019 with filters: "Exclude derived from surveillance project", "Exclude anomalous", "Exclude partial", and using only the RefSeq assembly for Homo sapiens.  Table S5: Annotation summary and supermatrix composition statistics of the 53 species used to infer the genome-scale Carnivora phylogeny.    Table S10: Summary information for the Carnivora genomes available either on Genbank, DNAZoo (https://www.dnazoo.org) and the OrthoMaM database as of February 11th, 2020. The "OMM" column indicates if the genome was available on OMM (yes) or not (no). The "Annotation" column indicates whether the genome was already annotated (yes) or not (no).