Raw Pacific Biosciences and Illumina sequencing reads and assembled genome data for the cattle ticks Rhipicephalus microplus and Rhipicephalus annulatus

Ticks from the genus Rhipicephalus have enormous global economic impact as ectoparasites of cattle. Rhipicephalus microplus and Rhipicephalus annulatus are known to harbor infectious pathogens such as Babesia bovis, Babesia bigemina, and Anaplasma marginale. Reference-quality genomes of these ticks would advance research to identify druggable targets for chemical entities with acaricidal activity and to refine anti-tick vaccine approaches. We sequenced and assembled the genomes of R. microplus and R. annulatus using Pacific Biosciences and Illumina HiSeq 4000 technologies on very high molecular weight genomic DNA. We used 22 and 29 SMRT cells on the Pacific Biosciences Sequel for R. microplus and R. annulatus, respectively, and 3 lanes of the Illumina HiSeq 4000 platform for each tick. The PacBio sequence yields for R. microplus and R. annulatus were 21.0 and 27.9 million subreads, respectively, which were assembled with Canu v. 1.7. The final Canu assemblies consisted of 92,167 and 57,796 contigs with average contig lengths of 39,249 and 69,055 bp for R. microplus and R. annulatus, respectively. Genome quality was assessed by BUSCO analysis to provide quantitative measures for each assembled genome. Over 82% and 92% of the 1066-member BUSCO gene set were found in the assembled genomes of R. microplus and R. annulatus, respectively. For R. microplus, only 189 of the 1066 BUSCO genes were missing and only 140 were present in a fragmented condition. For R. annulatus, only 75 of the BUSCO genes were missing and only 109 were present in a fragmented condition. The raw sequencing reads and the assembled contigs/scaffolds are archived at the National Center for Biotechnology Information.


Keywords: Rhipicephalus microplus; Rhipicephalus annulatus; PacBio genome sequencing; Large genome assembly; Canu assembler; Cattle tick

Value of the Data
• These are high-quality genomes of important cattle parasites that vector bovine pathogens.
• Researchers studying arachnid and tick genomics, comparative genomics, and arachnid evolution will find the assembled genomes valuable.
• The datasets can be used to study genes involved in the development of pesticide resistance in these economically important tick species.
• Genes present in these genomes can provide foundational data for research to identify druggable targets for chemical entities with acaricidal activity and to refine anti-tick vaccine approaches.

Data Description
Rhipicephalus microplus and R. annulatus are known to harbor infectious pathogens including Babesia bovis, Babesia bigemina, and Anaplasma marginale. Bovine babesiosis is considered the most economically important arthropod vector-borne disease of livestock in the world. R. microplus is also of high consequence to animal agriculture in tropical and subtropical parts of the world, where it has developed resistance to all available commercial pesticide products [1]. Very high molecular weight genomic DNA was purified from eggs collected from laboratory-reared strains of R. microplus and R. annulatus. The genomic DNA was sequenced using 22 and 29 SMRT cells for R. microplus and R. annulatus, respectively, on the Pacific Biosciences Sequel and 3 lanes on the Illumina HiSeq 4000 platform. The Canu assembler was used to assemble each genome using only the PacBio reads. Raw read data can be found in the Sequence Read Archive (SRA) under accession numbers SRR9875273 for the R. microplus PacBio Sequel reads, SRR10034978 for the R. microplus Illumina Dovetail Hi-C reads and SRR10009121 for the [...] Tables 1, 2, and 3, respectively. Fig. 1 is a process flow diagram to clarify the data processing and genome assembly steps.
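As a minimal sketch of how the archived reads might be retrieved, the following Python snippet builds download commands for the two SRA runs whose descriptions are given in full above. It assumes the NCBI SRA Toolkit `prefetch` program is installed; the output directory name is hypothetical, and the commands are only printed, not executed. Very large runs may need additional options.

```python
import shlex

# SRA run accessions and descriptions as stated in the Data Description.
SRA_RUNS = {
    "SRR9875273": "R. microplus PacBio Sequel reads",
    "SRR10034978": "R. microplus Illumina Dovetail Hi-C reads",
}


def prefetch_command(accession, out_dir="sra_downloads"):
    """Return an argument list for SRA Toolkit `prefetch`.

    `out_dir` is a hypothetical directory name chosen for illustration.
    """
    return ["prefetch", accession, "--output-directory", out_dir]


if __name__ == "__main__":
    for accession, description in SRA_RUNS.items():
        # Print the command rather than running it, so the sketch has
        # no side effects and no dependency on network access.
        print(f"# {description}")
        print(shlex.join(prefetch_command(accession)))
```

Each list can be passed to `subprocess.run` once the toolkit is available on the system.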

Tick materials and genomic DNA purification
For R. microplus, genomic DNA was extracted from 10 g of a pooled collection of eggs obtained from the f7, f10, f11, and f12 generations of the Deutsch strain. The Deutsch strain was started from a few individual engorged female ticks collected during a 2001 tick outbreak in Webb County, TX, USA. For R. annulatus, we sought to reduce genetic heterozygosity by conducting single pair matings of generation 18 of the Klein Grass strain, placing one adult male with 10 female adults in a cloth sleeve glued to the shaved side of a bovine host. Following engorgement, individual females were placed into tubes to enable oviposition. We obtained a total of 1.25 g of eggs from 9 single paired matings, and this amount of eggs yielded 1.7 mg of genomic DNA. The Klein Grass strain was started in 2010 from an outbreak in Kinney County, TX, USA. Both tick strains have been inbred since their collection and creation; however, they are not genetically homogeneous. A protocol from Sambrook et al. [2] was used to purify very high molecular weight genomic DNA: frozen eggs were pulverized in a liquid nitrogen-cooled mortar and pestle and added to an aqueous buffer, followed by RNase treatment, proteinase K digestion, phenol extraction, and dialysis in 50 mM Tris, 10 mM EDTA, pH 8.0 [3]. The resultant DNA was determined by agarose gel electrophoresis to be >200 kb.

Genome sequencing and assembly
Sequencing at the Texas A&M AgriLife Genomics and Bioinformatics Service, College Station, TX used 22 and 29 SMRT cells on the Pacific Biosciences Sequel for R. microplus and R. annulatus, respectively. Each genomic DNA was also sequenced on 3 lanes of the Illumina HiSeq 4000 platform. The Illumina reads were originally intended for use in error-correcting the Sequel long reads. However, as we could not access the computational resources necessary to error-correct and assemble these large tick genomes with both read types, we chose to create a Sequel-only assembly using the Canu pipeline [4]. Read quality checks and filtering of raw reads were conducted via the manufacturer's standard protocol and protocols developed at the Texas A&M AgriLife Genomics and Bioinformatics Service prior to submission to NCBI and assembly. Canu error-corrects the long reads in multiple steps and can generate highly contiguous genome assemblies. We utilized the Pittsburgh Supercomputing Center Bridges system [5], granted through the National Science Foundation-sponsored Extreme Science and Engineering Discovery Environment (XSEDE) program [6]. Each tick genome's Canu assembly took approximately 25 consecutive days, running on a reserved node with access to 352 cores, 12 TB of RAM, and node-local disk storage to avoid unnecessary data transfers. Program parameters were corMhapSensitivity = high, corOutCoverage = 100, batOptions = -dg 3 -db 3 -dr 1 -ca 500 -cp 50, and an input genome size estimate of 2.9 and 3.0 Gb for R. microplus and R. annulatus, respectively, based upon our studies with Rhipicephalus tick genomes (F. Guerrero, unpublished results). Two rounds of polishing the assembly were performed using the ArrowGrid [7] wrapper tool, which incorporates the PacBio GenomicConsensus v2.3.2 Arrow algorithm.
The ArrowGrid installation included ArrowGrid commit d3aa0f3 dated July 18, 2018, and the PacBio pbbioconda GitHub repository (https://github.com/PacificBiosciences/pbbioconda) commit 1d1dd31 dated September 25, 2018. The scheduler part of the ArrowGrid workflow tool was adapted to run on the Texas A&M High Performance Research Computing (HPRC) Terra cluster, which uses the Slurm Workload Manager. BamTools v2.5.1 (https://github.com/pezmaster31/bamtools) was also used in the ArrowGrid workflow. Purge_Haplotigs v1.0.4 [8] was used to separate primary contigs from haplotigs in the assembled contigs after the second round of Arrow polishing. Purge_Haplotigs was also used to generate the NCBI placement file, which provides genomic coordinates of the haplotigs relative to the primary contigs. NUCmer v3.1 with MUMmer 3.2.3 was used for each purge_haplotigs step, and NUCmer v3.9.0alpha with MUMmer version 3.9.0alpha was used to generate the NCBI placement file, since the purge_haplotigs ncbiplace command required NUCmer v3.9+.
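The Canu settings reported above can be collected into a single command line. The following is a minimal sketch, not the authors' actual invocation: the run prefix, output directory, and read file name are hypothetical, and the use of `-pacbio-raw` as the input option is an assumption based on Canu 1.x conventions for uncorrected PacBio reads.

```python
import shlex

# Overlap-graph (bogart) options as reported in the text.
BAT_OPTIONS = "-dg 3 -db 3 -dr 1 -ca 500 -cp 50"


def build_canu_command(prefix, out_dir, genome_size_gb, reads_path):
    """Return a Canu argument list using the parameters reported above.

    `prefix`, `out_dir`, and `reads_path` are hypothetical names;
    `-pacbio-raw` is assumed to be the appropriate input flag.
    """
    return [
        "canu",
        "-p", prefix,
        "-d", out_dir,
        f"genomeSize={genome_size_gb}g",
        "corMhapSensitivity=high",
        "corOutCoverage=100",
        f"batOptions={BAT_OPTIONS}",
        "-pacbio-raw", reads_path,
    ]


if __name__ == "__main__":
    # Genome size estimates from the text: 2.9 Gb and 3.0 Gb.
    for prefix, size in (("rmicroplus", 2.9), ("rannulatus", 3.0)):
        cmd = build_canu_command(
            prefix, f"{prefix}_canu", size, f"{prefix}_subreads.fastq.gz"
        )
        # shlex.join quotes the batOptions value so it survives a shell.
        print(shlex.join(cmd))
```

Keeping the command as an argument list avoids shell-quoting problems with the space-separated `batOptions` value when the command is handed to a job scheduler wrapper.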
To further improve the quality of the genome assembly for R. microplus, we contracted with Dovetail Genomics (Scotts Valley, CA, USA) to access their chromosome conformation capture Hi-C capability. Using eggs from the Deutsch f12 and f13 generations, Chicago libraries were created in vitro by adding synthetic chromatin and crosslinks to facilitate proximity ligation. We also provided Dovetail Genomics with the R. microplus polished and assembled genome described above. Data from these libraries and our assembly were analyzed with the Dovetail proprietary algorithm HiRise [9] to find and resolve misjoins in the de novo assembly, and to generate the final genome assembly. Genome completeness was assessed using BUSCO v3.0.2 [10] in genome mode with the arthropoda_odb9 BUSCO lineage and the Augustus fly species.
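The BUSCO percentages quoted in the abstract can be checked against the reported gene counts. The sketch below assumes that "found" means every BUSCO gene that is not missing (complete plus fragmented), which is consistent with the stated figures of over 82% and over 92%; the class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoCounts:
    total: int        # size of the BUSCO gene set
    missing: int      # genes not detected in the assembly
    fragmented: int   # genes detected only in fragmented form

    @property
    def found(self) -> int:
        # Assumption: "found" = complete + fragmented = total - missing.
        return self.total - self.missing

    @property
    def complete(self) -> int:
        return self.total - self.missing - self.fragmented

    def percent(self, count: int) -> float:
        return 100.0 * count / self.total


# Counts as reported for the 1066-member gene set.
COUNTS = {
    "R. microplus": BuscoCounts(total=1066, missing=189, fragmented=140),
    "R. annulatus": BuscoCounts(total=1066, missing=75, fragmented=109),
}

if __name__ == "__main__":
    for species, c in COUNTS.items():
        print(
            f"{species}: found {c.found} ({c.percent(c.found):.1f}%), "
            f"complete {c.complete} ({c.percent(c.complete):.1f}%)"
        )
```

Under this assumption, 877 of 1066 genes are found for R. microplus and 991 of 1066 for R. annulatus, in line with the percentages given in the abstract.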

Ethics Statement
The cattle used to rear the laboratory strains of ticks that provided the eggs for DNA purification were cared for according to protocols approved by the USDA-ARS Cattle Fever Tick Research Laboratory Institutional Animal Care and Use Committee (IACUC).

Declaration of Competing Interest
This work was funded in part by the USDA-ARS CRIS Project No. 3094-32000-036-00D, a USDA-ARS Cooperative Agreement No. 58-3094-6-017 with the Department of Entomology, Texas A&M AgriLife Research, College Station, TX, USA, and by Texas A&M AgriLife Research through an Insect Vector Diseases Competitive Grant and High Consequence Genomics Research Project on Vector-borne Diseases to the Department of Entomology. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562. Specifically, it used the Bridges system, which is supported by NSF award number ACI-1445606, at the Pittsburgh Supercomputing Center (PSC).