The genome sequence of Tadarida brasiliensis I. Geoffroy Saint-Hilaire, 1824 [Molossidae; Tadarida]

We present a genome assembly from an individual male Tadarida brasiliensis (the Brazilian free-tailed bat; Chordata; Mammalia; Chiroptera; Molossidae). The genome sequence is 2.28 Gb in span. The majority of the assembly is scaffolded into 25 chromosomal pseudomolecules, including the X and Y sex chromosomes.


Introduction
Tadarida brasiliensis, commonly known as the Mexican free-tailed bat or the Brazilian free-tailed bat, is a medium-sized New World insectivorous bat. Tadarida is one of 21 genera in the family Molossidae, the fourth-largest family in Chiroptera (Simmons, 2023). The genus Tadarida contains 8 species, of which T. brasiliensis is the only New World representative (Figure 1). While some genera within Molossidae are supported as monophyletic, there is no evidence that Tadarida forms a monophyletic clade (Agnarsson et al., 2011). Because T. brasiliensis is the only New World member of the genus, placement in a separate subgenus, Rhizomops, has been proposed (Legendre, 1984), but morphological and genetic evidence does not support this distinction (Ammerman et al., 2012; Gregorin & Cirranello, 2016). An analysis of four genes recovers a clade formed specifically of T. brasiliensis, Tadarida aegyptiaca and Sauromys petrophilus (Ammerman et al., 2012), but a more recent morphological analysis does not support this clade (Gregorin & Cirranello, 2016). The closest relative of T. brasiliensis is T. aegyptiaca; T. brasiliensis is more closely related to Old World molossids, and its last common ancestor with the New World clade lived approximately 29 mya (Ammerman et al., 2012).
T. brasiliensis is one of the most widely distributed mammals in the New World and one of the most abundant bat species. It is listed as Least Concern by the International Union for Conservation of Nature (IUCN) (Barquez et al., 2015). The geographic range of the species includes most of the United States, Mexico, Central America, and southwestern South America, as well as the Greater and Lesser Antilles (Wilkins, 1989). Because of this large range and proposed behavioral and morphological differences, T. brasiliensis was once thought to comprise up to nine subspecies (Schwartz, 1955). However, neither genetic studies of population structure nor morphological evidence support any subspecies classification, and both indicate gene flow between mainland and island populations (Morales et al., 2016; Morales et al., 2018). Instead, phenotypic differences are hypothesized to correlate with climatic variation across regions, and it has been recommended that individuals be categorized as migratory or non-migratory (Morales et al., 2016). The large geographic range is most likely related to the species' strong dispersal capabilities.
One of the defining traits of T. brasiliensis is the "free tail", which extends beyond the uropatagium and is characteristic of the family Molossidae (Figure 2A). T. brasiliensis is the smallest of the New World molossids, with an adult weight of 11-14 g, an average total body length of 95 mm, and an average forearm length of 42 mm (Schmidly, 2018). The species has short, velvety brown pelage, with long hairs on the feet that extend past the toes (Wilkins, 1989). The snout is short, with wrinkled lips. When pressed forward, the ears do not extend past the snout, and they do not meet at the midline. The species is not strongly sexually dimorphic, but reproductively active males have an enlarged gular gland, a sebaceous gland in the suprasternal region of the neck (Krutzsch et al., 2002). This gland is seasonally active: during the breeding season, it secretes a thick, oily, odorous substance in reproductive adult males.
T. brasiliensis is known for forming large colonies, sometimes with several million individuals in a single colony. The densest populations of the species are found in Texas, where in summer an estimated 95-104 million bats (primarily females forming maternity colonies) occupy a few select caves known as guano caves (Schmidly, 2018). T. brasiliensis primarily roosts in caves or in man-made structures such as buildings and bridges, but has also been found in hollow trees in the southeastern US (Wilkins, 1989). The species is migratory, with some of the longest migrations recorded for bats: annual migrations are estimated to reach 1500 km (Villa-R & Cockrum, 1962), from the central and southwestern United States southward into Mexico (Cockrum, 1969). Shorter seasonal movements by temperate bats, mostly in response to unfavorable climate, are not uncommon (Popa-Lisseanu & Voigt, 2009), but longer migrations such as those of T. brasiliensis are rarer. While temperature changes could trigger this movement, another potential driver of the annual trip is the seasonal availability and distribution of food resources, such as the movements of migratory moths (Russell et al., 2005). Even on nightly foraging trips, individuals may travel more than 160 km from their roost to a foraging site and back in one evening, reaching horizontal flight speeds of up to 44 m/s (Davis et al., 1962; McCracken et al., 2016). This high dispersal capability may explain the lack of distinct population structure across the species' wide range. However, migratory behavior differs by geographic location and sex. Earlier banding experiments suggest sedentary or non-migratory subpopulations in parts of the United States (Cockrum, 1969), but no genetic differentiation has been found to account for this difference in migratory phenotype (Russell et al., 2005). In addition, females appear to travel farther, whereas more males form resident winter populations or travel less far into Mexico (Russell et al., 2005).
T. brasiliensis is an aerial insectivore that relies on echolocation to navigate and forage for prey (Figure 2B). In open spaces, this species emits a shallow frequency-modulated echolocation pulse that descends from approximately 25 to 20 kHz over 15-20 ms. When approaching obstacles or prey, it progressively shortens pulse duration to 2 ms while concomitantly raising the starting frequency of the fundamental harmonic to approximately 50 kHz; higher (2nd and 3rd), non-overlapping harmonics also become prominent in the signal (Schwartz et al., 2007; Simmons et al., 1978). T. brasiliensis is known to forage at high altitudes, with recorded foraging on migratory noctuid moths (Lepidoptera) as high as 3000 m (Williams et al., 1973). However, T. brasiliensis can also forage near the ground, and its diverse diet varies seasonally with prey abundance and availability (McCracken et al., 2021; Ross, 1961).
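The search-phase pulse described above can be approximated, as a simplifying assumption, by a linear frequency sweep. The Python sketch below synthesizes such a sweep from 25 to 20 kHz over 15 ms. The function names, the 250 kHz sampling rate, and the strictly linear sweep shape are illustrative assumptions, not measurements from this study; natural calls are not necessarily linear in time.

```python
import math


def instantaneous_frequency(f_start_hz, f_end_hz, duration_s, t_s):
    """Frequency (Hz) of an idealised linear FM sweep at time t_s (s)."""
    return f_start_hz + (f_end_hz - f_start_hz) * (t_s / duration_s)


def linear_fm_pulse(f_start_hz, f_end_hz, duration_s, sample_rate_hz):
    """Return unit-amplitude samples of a linear FM sweep.

    Assumption: frequency changes linearly with time; real bat calls
    may follow curved sweeps instead.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    # Sweep rate in Hz per second (negative for a downward sweep).
    sweep_rate = (f_end_hz - f_start_hz) / duration_s
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Phase is the time integral of f(t) = f_start + sweep_rate * t.
        phase = 2.0 * math.pi * (f_start_hz * t + 0.5 * sweep_rate * t * t)
        samples.append(math.sin(phase))
    return samples


# Search-phase pulse: about 25 -> 20 kHz over 15 ms (sampling rate assumed).
pulse = linear_fm_pulse(25_000, 20_000, 0.015, 250_000)
print(len(pulse))  # 3750
print(instantaneous_frequency(25_000, 20_000, 0.015, 0.0075))  # 22500.0
```

The same functions could model the shorter terminal-phase pulses by changing the duration and start frequency, although the end frequency of those pulses is not given here.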
Previously, only a short-read genome assembly of T. brasiliensis was available (GenBank accession: GCA_004025005.1), generated as part of the Zoonomia Project (Zoonomia, 2020). Notable studies using this assembly include comparative genomic analyses of diet, visual system adaptations related to foraging style, immunity and metabolic adaptations, and longevity across several bat species (Blumer et al., 2022; Davies et al., 2020; Fushan et al., 2015; Moreno Santillan et al., 2021; Potter et al., 2021). Looking ahead to applications of the reference-quality long-read assembly reported here, T. brasiliensis has been proposed as an ideal mammalian model for the genetic and epigenetic basis of migration, mainly because of its widespread range, abundant population, and variation in migratory phenotype (Merlin & Liedvogel, 2019). Functional genomics projects should be more feasible in this species than in other North American bats: sample size is often a limiting factor, and the high population levels of T. brasiliensis mean that sampling places less pressure on conservation. Although there remains potential for comparative projects across multiple bat species, we hope that access to this assembly will encourage more work on the genetic basis of the unique traits of T. brasiliensis.

Genome sequence report
The genome was sequenced from a single male Tadarida brasiliensis collected on the Texas A&M campus in College Station, Brazos County, Texas, USA. A total of 39-fold coverage in Pacific Biosciences HiFi long reads was generated after removal of all reads shorter than 10 kb, and the resulting contigs had a contig N50 of 86 Mb. Primary assembly contigs were scaffolded with chromosome conformation capture (Hi-C) data. The final assembly has a total length of 2.28 Gb in 147 sequence scaffolds, with a scaffold N50 of 111 Mb (Table 1). The majority (98.44%) of the assembly sequence was assigned to 25 chromosomal-level scaffolds, representing 23 autosomes (numbered by sequence length) and the X and Y sex chromosomes. Chromosomal pseudomolecules in the genome assembly of Tadarida brasiliensis are shown in Table 2. The assembly has a BUSCO (Simao et al., 2015) completeness of 96.3% using the Laurasiatheria reference set.
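The N50 values above follow the usual definition: sort sequences from longest to shortest and report the length at which the running total first reaches half of the assembly span. The sketch below computes this, together with the percentage of sequence placed in chromosome-level scaffolds. It assumes sequence lengths are already available as integers; the function names are hypothetical and the toy lengths are illustrative, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (bp).

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return 0  # empty input


def percent_in_chromosomes(chromosome_lengths, unplaced_lengths):
    """Percentage of total assembly length assigned to chromosome-level scaffolds."""
    placed = sum(chromosome_lengths)
    total = placed + sum(unplaced_lengths)
    return 100.0 * placed / total if total else 0.0


# Hypothetical toy assembly: three chromosome-level scaffolds, two unplaced ones.
chromosomes = [100, 80, 60]
unplaced = [40, 20]
print(n50(chromosomes + unplaced))                    # 80
print(percent_in_chromosomes(chromosomes, unplaced))  # 80.0
```

Applied to the real scaffold lengths, the first function would give the scaffold N50 and the second the share of sequence in chromosomal pseudomolecules.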
The assembly is not fully phased; the deposited assembly represents one haplotype.

Methods
The T. brasiliensis specimen was an adult male collected on the evening of October 15, 2018. The bat was caught with a hand net as it left a roost in a building on the Texas A&M campus in College Station, Brazos County, Texas, USA.

Hi-C chromatin conformation capture
Chromatin conformation capture was performed using the ARIMA-Hi-C kit (Material Nr. A510008) and the Hi-C+ kit (Material Nr. A410110), following the user guide for animal tissues (ARIMA-Hi-C kit, Document A160132 v01; ARIMA-Hi-C 2.0 kit, Document Nr. A160162 v00). In brief, circa 50 mg of flash-frozen, powdered tissue was chemically crosslinked. The crosslinked genomic DNA was digested with the restriction enzyme cocktails of the two kits, consisting of two and four restriction enzymes, respectively. The 5'-overhangs were filled in and labelled with biotin. Spatially proximal digested DNA ends were ligated, and the biotin-containing ligation fragments were enriched and used for Illumina library preparation, following the ARIMA user guide for library preparation with the Kapa Hyper Prep kit (ARIMA Document Part Number A160139 v00). The barcoded Hi-C libraries were run on a NovaSeq 6000 with 2 × 150 cycles.
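To illustrate the digestion step, the sketch below locates candidate cut sites in a DNA sequence for a two-enzyme and a four-enzyme cocktail. The recognition motifs used here (GATC and GANTC, plus CTNAG and TTAA for the four-enzyme version) are an assumption based on commonly reported Arima chemistries and are not stated in this report; check them against the kit documentation before reuse. All four motifs are palindromic, so scanning one strand is sufficient. Function and constant names are hypothetical.

```python
import re

# Assumed recognition motifs (N = any base); verify against the kit documentation.
TWO_ENZYME_MOTIFS = ["GATC", "GANTC"]
FOUR_ENZYME_MOTIFS = ["GATC", "GANTC", "CTNAG", "TTAA"]


def motif_to_regex(motif):
    """Translate an IUPAC motif with N wildcards into a regex.

    The lookahead makes finditer report overlapping sites as well.
    """
    return re.compile("(?=" + motif.replace("N", "[ACGT]") + ")")


def restriction_sites(sequence, motifs):
    """Return sorted, de-duplicated 0-based start positions of any motif."""
    seq = sequence.upper()
    positions = set()
    for motif in motifs:
        positions.update(m.start() for m in motif_to_regex(motif).finditer(seq))
    return sorted(positions)


# Hypothetical toy sequence containing GATC, GACTC and TTAA sites.
toy = "AAGATCTTGACTCCTTAAG"
print(restriction_sites(toy, TWO_ENZYME_MOTIFS))   # [2, 8]
print(restriction_sites(toy, FOUR_ENZYME_MOTIFS))  # [2, 8, 14]
```

The denser four-enzyme cocktail cuts more often, which is the usual motivation for adding enzymes: more cut sites give finer-grained Hi-C contacts.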
Assembly was carried out following the Vertebrate Genomes Project pipeline v2.0 (Rhie et al., 2020) as follows. HiFi reads were generated with ccs (v6.0.0). HiFiasm (v0.16.0) was used to create the initial contig set. Haplotypic duplication was identified and removed with purge_dups (v1.2.5) (Guan et al., 2020). Assembly quality was evaluated with Merqury (Rhie et al., 2020) and BUSCO (Manni et al., 2021). Scaffolding was carried out with 10X data using Scaff10X (commit bc3a0cb), with Bionano data using Bionano Solve (v3.6.1), and with Hi-C data (Rao et al., 2014) using SALSA2 (commit e6e3c77) (Ghurye et al., 2019). HiGlass (Kerpedjiev et al., 2018) was used to generate Hi-C contact maps and to perform manual curation of scaffolds into chromosomes. Figures 4 to 6 were generated using BlobToolKit (Challis et al., 2020). Software used for the T. brasiliensis analysis is depicted in ).

In summary, high-molecular-weight (HMW) gDNA was sheared to 20 and 25 kb fragments, respectively, with the MegaRuptor™ device (Diagenode). A total of 10 µg of sheared gDNA was used for library preparation. All PacBio SMRTbell™ libraries were size-selected for fragments larger than 9 to 13 kb, 13 kb, and 15 kb with the BluePippin™ device according to the manufacturer's instructions. The size-selected libraries were run on six Sequel II SMRT cells with the Sequel II sequencing kit 2.0 for 30 hours on the Sequel II of the DRESDEN concept Genome Center (DcGC), Germany. Circular consensus sequences were called using the default SMRTLink tools.
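Before assembly, HiFi reads shorter than 10 kb were removed, and sequencing depth is reported as fold coverage. The sketch below shows both steps on FASTQ text: a length filter and a depth calculation as total sequenced bases divided by genome size. It assumes unwrapped 4-line FASTQ records; the function names and toy reads are hypothetical and do not reproduce the actual SMRTLink or VGP tooling.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence) pairs from 4-line FASTQ records.

    Assumes unwrapped records: header, sequence, '+', quality.
    """
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        yield header[1:].split()[0], seq


def filter_min_length(records, min_length=10_000):
    """Keep only reads of at least min_length bp (shorter reads are discarded)."""
    return [(rid, seq) for rid, seq in records if len(seq) >= min_length]


def fold_coverage(read_lengths, genome_size_bp):
    """Sequencing depth = total sequenced bases / genome size."""
    return sum(read_lengths) / genome_size_bp


# Hypothetical toy reads of 12 kb, 8 kb and 15 kb.
toy_fastq = []
for rid, length in [("r1", 12_000), ("r2", 8_000), ("r3", 15_000)]:
    toy_fastq += [f"@{rid}", "A" * length, "+", "I" * length]

kept = filter_min_length(parse_fastq(toy_fastq))
print([rid for rid, _ in kept])                         # ['r1', 'r3']
print(fold_coverage([len(s) for _, s in kept], 27_000))  # 1.0
```

If the 2.28 Gb assembly span is taken as the genome size, 39-fold coverage corresponds to roughly 39 × 2.28 Gb ≈ 88.9 Gb of filtered HiFi sequence.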
Bionano optical mapping of megabase-size gDNA
Megabase-size gDNA of Tadarida brasiliensis was labelled as described in the Bionano Prep direct label and stain (DLS) protocol (Document number 30206). The DNA was tagged with the nicking-free DLE enzyme. One flow cell of the labelled gDNA was run on the Bionano Saphyr instrument at the DcGC, achieving circa 200X genome coverage of molecules longer than 150 kb.
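Because direct labelling marks the DNA at a short sequence motif, the expected label density of a reference sequence can be estimated in silico. A minimal sketch follows. It assumes that the DLE enzyme recognizes the motif CTTAAG, which is not stated in this report and should be checked against Bionano documentation; the motif is its own reverse complement, so one strand suffices. Function names and the toy fragment are hypothetical.

```python
import re

# Assumed DLE-1 recognition site; palindromic, so scanning one strand is enough.
DLE1_MOTIF = "CTTAAG"


def label_positions(sequence, motif=DLE1_MOTIF):
    """0-based start positions of every (possibly overlapping) motif occurrence."""
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def labels_per_100kb(sequence, motif=DLE1_MOTIF):
    """Label density expressed as labels per 100 kb of sequence."""
    if not sequence:
        return 0.0
    return len(label_positions(sequence, motif)) * 100_000 / len(sequence)


toy = "CTTAAG" + "A" * 994  # hypothetical 1 kb fragment with one site
print(label_positions(toy))   # [0]
print(labels_per_100kb(toy))  # 100.0
```

Run over a chromosome-scale sequence, such a density estimate indicates how informative an optical map of that region is likely to be for scaffolding.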

Data availability
Underlying data
The T. brasiliensis genome sequencing initiative is part of the Bat1K genome sequencing project. The genome assembly is released openly for reuse.
Data accession identifiers are reported in Table 1.
The first long-read genome assembly of Tadarida brasiliensis (Brazilian free-tailed bat) is reported in this work. Genomic studies will be beneficial for this species due to its unique traits, including as a mammalian model for the genetic and epigenetic basis of migration.
The sequencing methods and genome assembly are comprehensively outlined and appropriately designed based on the pipeline developed by the Vertebrate Genomes Project of the Genome 10K Consortium. The combination of advanced sequencing technologies (HiFi, Hi-C, and Bionano) fittingly addressed the challenge of generating a high-quality haplotype genome assembly in the absence of a reference sequence. Post-assembly quality control revealed that 96% of the genes have complete sequences.
The results are clearly and comprehensively detailed. The sequence data are also made available, allowing further use for downstream analysis such as annotation, comparative genomics, and functional studies.

I only have minor comments:
The short-read genome sequence of T. brasiliensis is already available. Although the authors argued for the importance of genomics research in this species, the need to generate a long-read genome assembly should be clearly communicated.

1. Does Figure 2 refer to the specific individual caught for this study, or are these data obtained from other studies? While the expertise of the authors on bat species identification is unquestionable, actual data may be added to validate the claim on the bat's identity, especially since it has close morphological features with other bats such as N. macrotis.

2. The sequencing coverage was 39X for HiFi and 100X for Bionano. How was the sequencing coverage decided? Was this estimated from the genome size of the organism?

3. The section Genome Sequence Report describes that the assembly obtained from HiFi long reads was scaffolded with Hi-C data. However, the role of Bionano genome mapping in the analysis was not clarified.

4. In the section Genome Sequence Report, it was claimed that 98.44% of the assembly sequence was assigned to 25 chromosomal-level scaffolds. However, this statistic was not mentioned in the data section. How was this value obtained? Please clarify.

5. It would be worthwhile to mention in the main text that the alternative haplotype is also available.

6. The sequencing was appropriately performed for one individual bat, but it was noted that different tissues were used (muscle, kidney, liver) for each sequencing technology. While there may be a technical explanation for this, this should be communicated.

7. The data follow the format for similar manuscripts in this journal; however, Figures 3-6 may not be understandable to a non-expert. Additional information in the figure captions or main text can help guide readers in the interpretation of data.

8. Were the authors able to make a comparison of the short-read and long-read genomes for T. brasiliensis?

9.
Is the rationale for creating the dataset(s) clearly described? Partly

Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: genomics, wildlife and disease spillover
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Heliana Dundarova
Institute of Biodiversity and Ecosystem Research, Bulgarian Academy of Sciences, Sofia, Bulgaria
The manuscript represents valuable research on the genome of the Brazilian free-tailed bat. Approaches and methodologies are based on strong and well-documented protocols, as shown in the excellent implementation of results. The language of the manuscript is in accordance with good scientific practice. Such a report is valuable and will help future research on bat genomes.
Minor comments: Keywords: please replace or delete Tadarida brasiliensis and genome sequence, because they are used in the title and cannot be used as keywords. For example, use Brazilian free-tailed bat instead of Tadarida brasiliensis. Species taxonomy: also include Vespertilionoidea, before Molossidae.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Chiroptera, phylogeny, phylogeography, taxonomy
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

2. In the methods section the authors mention they generate 10x Genomics and BioNano data, but there's no mention of using this in the Genome sequence report.
Reviewer Expertise: Evolutionary biology and genetics/genomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 1. Position of Tadarida brasiliensis in the phylogeny of the family Molossidae. Tadarida brasiliensis is one of 8 species currently recognized in the genus Tadarida (Rafinesque, 1814). Tadarida belongs to the subfamily Molossinae (Gervais, 1856), which currently includes 20 genera and 132 species. Figure created with BioRender.com.

Figure 2. Tadarida brasiliensis. A) Adult Mexican free-tailed bat, Tadarida brasiliensis. Note the tail extending beyond the uropatagium, the short wrinkled snout, and the hairs extending beyond the toes. Photo courtesy of Brock and Sherri Fenton, with permission; Windsor Cave, Jamaica. B) An echolocation pulse sequence emitted by Tadarida brasiliensis while foraging over a pond. The sequence begins with a typical shallow frequency-modulated search-phase pulse and ends with a terminal buzz.

Figure 4. Genome assembly metrics generated using BlobToolKit for the T. brasiliensis genome assembly. The larger snail plot depicts scaffold statistics, including N50 length (bright orange) and base composition (blue). The smaller plot shows BUSCO completeness in green.

Figure 3. Hi-C contact map of the T. brasiliensis assembly with 25 chromosomes, visualized using HiGlass.

Figure 5. GC-coverage plot generated for the T. brasiliensis assembly using BlobToolKit. Each circle represents an individual chromosome or scaffold and is sized in proportion to its length. Histograms show the summed chromosome/scaffold length along each axis. Circle color indicates the phylum of the taxonomic hits represented in the assembly.

Figure 6. Cumulative sequence plot generated for the T. brasiliensis assembly using BlobToolKit. The grey line shows the cumulative length of all chromosomes/scaffolds in the assembly. Colored lines represent the phyla represented in the assembly.

Reviewer Report 25 May 2024
https://doi.org/10.21956/wellcomeopenres.22806.r76577
© 2024 Dundarova H. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
3. In sentence three of the Introduction, change "While some genera with Molossidae ..." to "While some genera within Molossidae ..."
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Partly
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome assembly and conservation genomics.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

Table 2. Chromosomal pseudomolecules in the genome assembly of Tadarida brasiliensis
Chromosome    Size (bp)    GC%
The chromosome number of Tadarida brasiliensis is 2n=50.

Capture, handling, and sampling were approved by the local Institutional Animal Care and Use Committee (Texas A&M University animal use protocol # 2017-0163D) and by Texas Parks and Wildlife scientific collecting permit SPR-1104-610. Upon capture, a dichotomous key (Schmidly, 2018) was used to confirm species identity. The bats were identified as T. brasiliensis based on specific morphological features such as forearm measurements and other external features. In addition, the other molossids found in Texas (Eumops perotis, Nyctinomops femorosaccus, Nyctinomops macrotis) are much larger than T. brasiliensis. At the capture location, the range of T. brasiliensis overlaps little with those of other molossids, except for the uncommon N. macrotis. These two species can be distinguished by ear morphology: the ears of N. macrotis meet at the midline of the head, whereas those of T. brasiliensis do not. After collection, the specimen was brought back to the laboratory, where it spent 5 months in captivity before tissue extraction and sample preparation. The animal was euthanized with a pentobarbital overdose on February 28, 2019. Tissue samples collected were blood, brain (left hemisphere, cerebellum, front and back cortex, dorsal and ventral striatum), liver, heart (ventricle), left and right lung, spleen, left and right kidney, arm muscle, and left and right testicle. In total, 23 tissue samples were collected. All tissue samples were flash-frozen in liquid nitrogen and stored at -80°C until shipment, with the cold chain maintained. All data were recorded and reported in accordance with the ARRIVE guidelines (Kilkenny et al., 2010); see the Data availability section and Table 1.
Extraction of megabase-size gDNA
a) Bionano plug-based megabase-size gDNA extraction for Bionano optical mapping. Megabase-size gDNA was extracted from liver tissue according to the Bionano Prep™ Animal tissue DNA isolation soft tissue protocol (Document

b) SP-based megabase-size gDNA extraction for 10x linked Illumina reads
A second batch of megabase-size gDNA was extracted from snap-frozen kidney tissue with the beta version of the Bionano Prep SP Animal Tissue DNA Isolation Protocol (Document number 30339, Bionano, San Diego, CA). In brief, snap-frozen kidney tissue was homogenized with the TissueRuptor (Qiagen) on ice in a chaotropic buffer containing ethanol, and tissue lysis was performed by adding Proteinase K. Cell debris was removed by centrifugation. The released gDNA was bound to a Nanobind disk (a nanostructured silica layer on the outside of a thermoplastic paramagnetic disk) upon addition of salting buffer and isopropanol. After several washing steps, the gDNA was eluted from the Nanobind disk. Pulsed-field gel electrophoresis (PFGE) revealed DNA molecules of 50 kb up to 600 kb in length.