Complete chromosome-level genome assembly data from the tawny crazy ant, Nylanderia fulva (Mayr) (Hymenoptera: Formicidae)

The tawny crazy ant, Nylanderia fulva (Mayr) (Hymenoptera: Formicidae), has a native range that extends from northern Argentina to southern Brazil. In the U.S.A., this species has often been misidentified as Nylanderia (Paratrechina) pubens or N. cf. pubens and has likely been present in Florida and Texas for several decades [1]. In the early 2000s, explosive population growth in Texas and neighboring states drew renewed taxonomic focus. Genetic analyses [2,3] aided in identifying the pest species as N. fulva. This species poses an invasive threat to native flora and fauna and to human structures. In its invasive range, it has been reported to displace another invasive species, the red imported fire ant. The specimens used for genome sequencing were obtained from the coastal region of Mississippi. DNA was extracted from pupae. The genome data set was deposited with the National Center for Biotechnology Information as submission ID: SUB10775679, Project ID: PRJNA796544, Accession IDs: SAMN24895442 and JAKFQQ000000000. The organism taxid is 613905, and the locus tag prefix is L1K79. The assembly, USDA_Nfulva_1.0, was generated in collaboration with Dovetail Genomics (now Cantata Bio) to yield a chromosome-level assembly of 375 Mb with a 15.67 Mb N50 and 78X coverage, revealing 16 putative chromosomes. This high-quality chromosome-level genome assembly was released prior to publication as a public service to the research community.


Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Field-collected ants were identified by the Mississippi Entomological Museum, separated by developmental stage, and flash frozen in liquid nitrogen. Samples were mailed on dry ice to Dovetail Genomics (now Cantata Bio) for Omni-C library preparation, sequencing, and scaffold assembly. Sequences were aligned to the Nylanderia fulva (Taxonomy ID 613905) reference genome assembly GCA_005281655.1 [4] using bwa (https://github.com/lh3/bwa). The separations of Dovetail Omni-C read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and to make joins above a confidence threshold [5-7].

Data source location: The whole genome master sequence accession is JAKFQQ000000000.1, https://www.ncbi.nlm.nih.gov/nuccore/JAKFQQ000000000.1 (also https://www.ncbi.nlm.nih.gov/nuccore/2271424038). Chromosome assembly accession numbers and links are provided in Table 3.

Value of the Data
• This is a high-quality chromosome-level genome of the invasive tawny crazy ant, Nylanderia fulva (TCA), an important pest ant in its introduced regions.
• Researchers studying ant and social insect genomics, comparative genomics, and evolution will find value in the chromosome-level genome, particularly for studying supergenes and synteny.
• Ants (family Formicidae, order Hymenoptera) have extremely diverse chromosome-level genomes, with haploid chromosome number varying between 1 and 60 [8]. These data will help study this phenomenon.
• The dataset can be used to study genes involved in sexual and social phenotypic and behavioral polymorphisms.
• The dataset will help clarify the evolution of chemical communication and venom genes and biosynthetic pathways.

Objective
Of the approximately 14,000 extant ant species described, many have had their genomes (https://www.antweb.org, accessed July 2022) and transcriptomes sequenced and annotated [4,9,10]. While many sequencing efforts have focused on pest ant species from around the world, chromosome-level sequencing and annotation of ant species is far from complete. The main objective was to improve the public genome assembly to chromosome level to facilitate evolutionary and functional analyses of ants and other social insects.

Data Description
The genome data set was deposited with the National Center for Biotechnology Information (NCBI) as submission ID: SUB10775679, Project ID: PRJNA796544, Accession IDs: SAMN24895442 and JAKFQQ000000000. The organism taxonomy ID is 613905, and the locus tag prefix is L1K79. The assembly, USDA_Nfulva_1.0, was generated in collaboration with Dovetail Genomics (now Cantata Bio) to yield a chromosome-level assembly of 375 Mb with a 15.67 Mb N50 and 15.9 Kb N90, at 78X coverage (Table 1), and revealing 16 putative chromosomes (Table 3). Scaffolding improvements to the reference genome GCA_005281655.1 are illustrated in Fig. 1 and Table 2. BUSCO scores were: complete and single copy (S) 234 (91.76%); complete and duplicated (D) 7; fragmented (F) 9 in the input assembly and 8 in the chromosome-level assembly; missing (M) 5 in the input assembly and 6 in the chromosome-level assembly; total groups searched 255. The BUSCO version used was 4.05 with lineage dataset eukaryota_odb10 [11].
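For readers less familiar with the summary statistics quoted above, the following minimal Python sketch shows how N50/N90 values and BUSCO percentages are conventionally derived. It is an illustration only, not the pipeline used to produce Table 1: the function names `nx` and `busco_percentages` and the toy scaffold lengths are hypothetical, and the S and D counts are assumed to apply to both assemblies because the text gives a single value for each.

```python
def nx(lengths, fraction=0.5):
    """Length of the sequence at which the running total of lengths, taken
    from longest to shortest, first reaches `fraction` of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty")


def busco_percentages(counts):
    """Convert BUSCO category counts to percentages of all groups searched."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 2) for category, n in counts.items()}


# Toy scaffold lengths (hypothetical, not taken from the assembly).
toy_lengths = [10, 8, 5, 4, 2]
print(nx(toy_lengths, 0.5))  # N50 of the toy data
print(nx(toy_lengths, 0.9))  # N90 of the toy data

# BUSCO counts as reported in the text; S and D are assumed identical for
# both assemblies because only one value is given for each category.
input_assembly = {"S": 234, "D": 7, "F": 9, "M": 5}
chromosome_level = {"S": 234, "D": 7, "F": 8, "M": 6}
print(busco_percentages(input_assembly))
print(busco_percentages(chromosome_level))
```

Under that assumption, both sets of counts sum to the 255 groups searched, and 234 of 255 corresponds to the 91.76% reported for complete single-copy orthologs.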

Insect specimens
A portion of a TCA colony was collected in Ocean Springs, MS, on May 7, 2021. Ants were found in a wooded area on the grounds of the Cedar Point facility of the Gulf Coast Research Laboratory (30.39°N, 88.78°W) that has been known to have TCA for at least eight years. The colony had several queens and an abundance of brood. It was in a rotting limb of a small understory tree. Portions of the limb were broken off and TCA were shaken into a collection box for transport to Mississippi State University, Starkville, MS. The colony was maintained in the lab for over 45 days. A Fluon (Insect-a-slip, BioQuip) film applied inside the lower rim of the holding boxes kept the ants contained. The ants and their brood were manually transferred with small hobby brushes to separate clean containers. Using a Leica M125 stereo microscope (8-100X), over 200 mg of both early pupae and mixed-age larvae were placed in corresponding cryotubes. The tubes were chilled on ice for five minutes and then dropped into liquid nitrogen for at least four minutes to flash freeze the tissue. An aluminum tube block pre-chilled to -75 °C was used to transfer the tubes to an ultra-cold freezer at the same temperature. The samples were shipped on dry ice to the Dovetail Genomics lab for sequencing and subsequent analyses.

DNA preparation and sequencing
DNA samples were quantified using a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). For the Dovetail Omni-C library, chromatin was fixed in place with formaldehyde in the nucleus. Fixed chromatin was digested with DNase I and then extracted; chromatin ends were repaired and ligated to a biotinylated bridge adapter, followed by proximity ligation of adapter-containing ends. After proximity ligation, crosslinks were reversed and the DNA was purified. Purified DNA was treated to remove biotin that was not internal to ligated fragments. Sequencing libraries were generated using NEBNext Ultra enzymes and Illumina-compatible adapters. Biotin-containing fragments were isolated using streptavidin beads before PCR enrichment of the library. The PacBio SMRTbell library (~20 kb) for PacBio Sequel was constructed using the SMRTbell Express Template Prep Kit 2.0 (PacBio, Menlo Park, CA, USA) following the manufacturer's recommended protocol. The library was bound to polymerase using the Sequel II Binding Kit 2.0 (PacBio) and loaded onto the PacBio Sequel II. Sequencing was performed on PacBio Sequel II 8M SMRT cells. Scaffolding was done using paired-end sequencing on the Illumina HiSeqX platform and the Dovetail HiRise scaffolding assembly tools provided by Dovetail Genomics (now Cantata Bio).

Data processing analysis
The input de novo assembly and Dovetail Omni-C library reads were used as input data for HiRise, a software pipeline designed specifically for using proximity ligation data to scaffold genome assemblies [6]. Dovetail Omni-C library sequences were aligned to the draft input assembly using bwa (https://github.com/lh3/bwa). The separations of Dovetail Omni-C read pairs mapped within draft scaffolds were analyzed by HiRise to produce a likelihood model for genomic distance between read pairs, and the model was used to identify and break putative misjoins, to score prospective joins, and to make joins above a threshold. HiRise made 628 joins and 14 breaks to the input assembly.
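The likelihood model described above is built from the observed separations of read pairs that map within the same draft scaffold. The Python sketch below illustrates only that first step, tallying same-scaffold separations into log-spaced bins; it is not the HiRise implementation, and the function name `separation_histogram`, the tuple layout, and the bin width are assumptions made for illustration.

```python
import math
from collections import Counter


def separation_histogram(pairs, bins_per_decade=2):
    """Count same-scaffold read pairs in log10-spaced separation bins.

    `pairs` is an iterable of (scaffold1, pos1, scaffold2, pos2) tuples
    (a hypothetical layout). Pairs spanning two scaffolds, or with zero
    separation, are skipped.
    """
    counts = Counter()
    for scaffold1, pos1, scaffold2, pos2 in pairs:
        if scaffold1 != scaffold2:
            continue  # only within-scaffold pairs inform the distance model
        separation = abs(pos2 - pos1)
        if separation == 0:
            continue
        bin_index = math.floor(math.log10(separation) * bins_per_decade)
        counts[bin_index] += 1
    return dict(counts)


# Toy read pairs (hypothetical coordinates).
toy_pairs = [
    ("scaffold_1", 100, "scaffold_1", 1_600),   # separation 1,500 bp
    ("scaffold_1", 0, "scaffold_1", 50_000),    # separation 50,000 bp
    ("scaffold_1", 10, "scaffold_2", 20),       # different scaffolds, skipped
    ("scaffold_1", 5, "scaffold_1", 5),         # zero separation, skipped
]
print(separation_histogram(toy_pairs))
```

Normalizing such a histogram gives an empirical estimate of how contact frequency decays with genomic distance, which is the kind of relationship a scaffolder can use to judge whether a prospective join or an existing junction is plausible.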
Hi-C contact matrices were generated in two formats: cool and hic. Both contact matrices were generated from the same BAM file by using read pairs where both ends were aligned with a mapping quality of 60.

Topologically associated domains (TADs) are fundamental units of chromatin topology, wherein all the chromatin is in close physical proximity. It is thought that regulatory signals can be conveyed more easily within a TAD than between TADs. TAD boundaries often occur at CTCF binding sites and are thought to be established and maintained by the Cohesin/CTCF complex. TADs were identified using the Arrowhead program implemented in the JuicerTools package. TADs were called at three resolutions: 10 kbp, 25 kbp, and 50 kbp. The parameters used were -k KR -m 2000 -r 10000, -k KR -m 2000 -r 25000, and -k KR -m 2000 -r 50000. A/B compartments were identified at 1 Mbp using the eigenvector program implemented in the JuicerTools package (https://github.com/aidenlab/JuicerTools). The parameters used were KR BP 1000000. TAD statistics are presented in Table 2.

Isochores, extended genomic regions (typically 300 kb to multimegabase) of uniform, characteristic GC content, were searched for using the isofinder program; none were predicted. The parameters used were 0.90 p2 3000. The output was post-processed and converted to BEDPE format. A total of 959 CTCF sites were predicted using the CREAD program. The position weight matrix was downloaded from the CTCFBSDB 2.0 website. The output was then post-processed to convert it to a BED file. Multires files were generated using the clodius package. Scaffolding and TAD reports were provided by Dovetail.
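As a rough illustration of how the contact matrices described above relate to the filtered alignments, the Python sketch below keeps only read pairs in which both ends reach a mapping quality of 60 and bins them at a fixed resolution. It is a simplified, single-chromosome sketch and not the procedure used to write the cool and hic files; the function name `contact_counts` and the tuple layout are hypothetical.

```python
from collections import defaultdict

MIN_MAPQ = 60  # both ends must reach this mapping quality, as stated in the text


def contact_counts(pairs, resolution, min_mapq=MIN_MAPQ):
    """Bin read pairs from one chromosome into a sparse upper-triangle contact map.

    `pairs` is an iterable of (pos1, mapq1, pos2, mapq2) tuples (a hypothetical
    layout); `resolution` is the bin size in base pairs.
    """
    counts = defaultdict(int)
    for pos1, mapq1, pos2, mapq2 in pairs:
        if mapq1 < min_mapq or mapq2 < min_mapq:
            continue  # drop pairs where either end is not confidently placed
        # Sort the bin indices so each contact is stored once, as (low, high).
        bin1, bin2 = sorted((pos1 // resolution, pos2 // resolution))
        counts[(bin1, bin2)] += 1
    return dict(counts)


# Toy read pairs (hypothetical positions and mapping qualities).
toy_pairs = [
    (1_000, 60, 26_000, 60),
    (12_000, 60, 3_000, 60),
    (5_000, 30, 40_000, 60),   # first end has low MAPQ, so the pair is dropped
    (27_000, 60, 29_000, 60),
]
print(contact_counts(toy_pairs, resolution=10_000))
```

The same binning idea applies at each of the resolutions named in the text (10 kbp, 25 kbp, 50 kbp, and 1 Mbp); only the `resolution` argument changes.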

Ethics Statements
in this publication are for the information and convenience of the reader. Such use does not constitute an official endorsement or approval by the U.S. Department of Agriculture, Agricultural Research Service. USDA is an equal opportunity provider and employer.