Horn fly transcriptome data of ten populations from the southern United States with varying degrees and molecular mechanisms of pesticide resistance

Haematobia irritans irritans (Linnaeus, 1758: Diptera: Muscidae), the horn fly, is an external parasite of penned and pastured livestock that causes a major economic impact on cattle production worldwide. Pesticides such as synthetic pyrethroids and organophosphates are routinely used to control horn flies; however, resistance to these chemicals has become a concern in several countries. To further elucidate the molecular mechanisms of resistance in horn fly populations, we sequenced the transcriptomes of ten populations of horn flies from the southern US possessing varying degrees of pesticide resistance levels to pyrethroids, organophosphates, and endosulfans. We employed an Illumina paired end HiSeq approach, followed by de novo assembly of the transcriptomes using CLC Genomics Workbench 8.0.1 De Novo Assembler using multiple kmers, and annotation using Blast2GO PRO version 5.2.5. The Gene Ontology biological process term Response to Insecticide was found in all the populations, but at an increased frequency in the populations with higher levels of insecticide resistance. The raw sequence reads are archived in the Sequence Read Archive (SRA) and assembled population transcriptomes in the Transcriptome Shotgun Assembly (TSA) at the National Center for Biotechnology Information (NCBI).

Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)

Value of the data
• These are transcriptomes from different populations of parasitic biting flies of livestock with varying levels and mechanisms of phenotypically characterized insecticide resistance.
• Researchers studying insecticide resistance in biting flies will find these assembled transcriptomes valuable due to the expanded gene expression sequence data in the transcripts.
• The datasets produced from these transcriptomes can be used in comparative studies of molecular mechanisms causing insecticide resistance in biting flies.
• The RNASeq data provided in this study can be used with or without a genome for single nucleotide polymorphism (SNP) calling.

Objective
This dataset is the foundational set for Bendele et al. [1] and their study of molecular mechanisms of horn fly pesticide resistance. Our objective in designing the study and acquiring and analyzing the dataset was to be comprehensive and include all the known horn fly pesticide resistance phenotypes in the fly populations that were sampled. Ten horn fly populations with wide-ranging resistance phenotypes were selected and transcriptomes from each population were synthesized and sequenced. The transcriptomes were analyzed individually and all ten transcriptomes were pooled to produce a single horn fly transcriptome. This dataset adds value to Bendele et al. [1] by providing a foundation of 10 transcriptomes for more targeted molecular studies of a wide range of genes in pesticide resistant and susceptible horn flies, including genes with roles in pesticide metabolism and detoxification.

Raw data
Population samples were collected from the backs of cattle using the aerial hand net method, frozen while alive, and stored at −80 °C until they were sorted to make the RNA population samples. Horn fly samples were sorted by sex on dry ice, and 10 individual male and 10 individual female horn flies were pooled and placed into tubes prechilled on dry ice. A total of 4 replicates from each population were created, with the total fly masses shown in Table 2. To increase the statistical power of the data sets, 4 replicates were  Table 2 and checked on a 1% agarose gel for a final visual inspection before storage at −80 °C with Fig. 1 showing an overview of the process. RNA was barcoded and pooled, then sequenced on 2 lanes of an Illumina HiSeq 4000 with a 75 × 75 standard protocol producing 150 bp paired-end reads. The raw reads were deposited at NCBI under BioProject PRJNA30967, BioSample accession numbers SAMN11539754-SAMN11539763, SRA accession numbers SRR9016890-SRR9016899, and as SRA study SRP000249, as listed in Table 3. A Principal Component Analysis was conducted on individual replicate read counts aligned to the Kerrville Susceptible transcriptomes using Bowtie2 and R in RStudio 2022.12.0+353 with the tidyverse package (Fig. 2).

Assembled transcriptomes
Each population had four replicates sequenced, with each replicate treated individually for all preassembly bioinformatic steps and then pooled for the transcriptome assembly; Fig. 3 shows an overview of the analysis steps. The highest quality de novo CLC Genomics Workbench 8.0.1 transcriptomes were produced using kmer 25. The assembled transcriptomes from each population were submitted to the NCBI Transcriptome Shotgun Assembly (TSA) database with accession numbers GHLQ010

Sequencing
Ten horn fly populations with varying levels of phenotypically diagnosed insecticide resistance (Table 1) were sequenced in a research study on horn fly insecticide resistance mechanisms [1]. Pools of 10 adult male and 10 adult female horn flies were prepared, using 4 replicates of each population. RNA was isolated from each replicate using TRIzol Reagent (Thermo Fisher Scientific) followed by DNase treatment using the RNeasy kit (Qiagen) following manufacturers' recommended protocols. The RNA was sequenced at Texas A&M AgriLife Genomics and Bioinformatics Service (College Station, TX, USA), barcoding all samples and pooling into two lanes of the Illumina HiSeq 4000 with a 2 × 75 standard protocol producing 150 bp paired-end reads.

Transcriptome assembly
Raw read files were checked for quality using FastQC version 0.11.5, which showed a large amount of ribosomal and mitochondrial RNA contamination as overrepresented sequences. SortMeRNA version 2.1 [9] was used with a custom horn fly mitochondrial and ribosomal database created from NCBI accession numbers KM669714, DQ029097, NC_007102, EU375511, EU375513, EU375514, HQ844235, DQ437515, EF560184, U60809, JQ246755, JQ246651, EU179518, EU013947, EU013946, FJ025436, KJ470673, KJ470672, KJ470671 and KJ470670, provided in Supplementary File 1 at Mendeley Data [10]. The remaining reads were uploaded into the CyVerse Discovery Environment (DE) [11,12] using Cyberduck version 6.0.4 and then rechecked using the FastQC 0.11.5 (multi-file) app. The CyVerse DE Trimmomatic 0.36.0 [13] app was used with the Trimmomatic TruSeq2-PE adapter file and all default parameters, with the addition of a 14 bp read head crop length. The resulting Trimmomatic output files were checked a final time with the FastQC 0.11.5 (multi-file) app. Preassembly read statistics for each population and replicate can be found in Supplementary Table 1 at Mendeley Data [10]. Each replicate's read files were treated individually for all preassembly bioinformatic steps and then concatenated into a pair of read files (R1 & R2) for each population using the Concatenate Multiple Files app available in CyVerse DE for the transcriptome assembly.
De novo transcriptomes were assembled for each population using the CLC Genomics Workbench 8.0.1 (Qiagen) De Novo Assembler with word size/kmer values of 21, 23, 25, and 27, a bubble size of 75, a minimum contig length of 200, and default mapping options (mismatch cost of 2, insertion cost of 3, length fraction of 0.5 and similarity fraction of 0.8). The de novo transcriptome assemblies were filtered using the FASTA Minimum Size Filter app in CyVerse DE to remove any sequences of less than 200 bp. The CyVerse DE apps Compute Contig Statistics, BUSCO-v3.0 [14] with the diptera_odb9 lineage, and rnaQUAST_1.2.0 (de novo based) [15] were used with default parameters to assess the quality of the different word size/kmer transcriptome assemblies, and the results are tabulated in Supplementary Table 2 at Mendeley Data [10]. The word size/kmer 25 transcriptomes were determined to be of the highest quality and were submitted to NCBI as part of the Transcriptome Shotgun Assembly (TSA) database.
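As an illustration of the size filter and of one commonly reported contig statistic, the following Python sketch removes sequences shorter than 200 bp and computes N50. It is not the CyVerse DE FASTA Minimum Size Filter or Compute Contig Statistics app; the function names and the toy contigs are hypothetical, and N50 is used here only as an example metric, since the text does not list which statistics were compared.

```python
from typing import Dict, List

MIN_CONTIG_LENGTH = 200  # sequences shorter than this are removed, as in the text


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {contig_name: sequence} (illustrative helper)."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {contig: "".join(parts) for contig, parts in chunks.items()}


def filter_min_length(contigs: Dict[str, str],
                      min_length: int = MIN_CONTIG_LENGTH) -> Dict[str, str]:
    """Keep contigs whose length is at least min_length."""
    return {n: s for n, s in contigs.items() if len(s) >= min_length}


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not lengths:
        return 0
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    return 0


# Toy assembly with hypothetical contig names and lengths 500, 300, 250, and 150.
toy_fasta = "\n".join([
    ">contig_1", "A" * 500,
    ">contig_2", "C" * 300,
    ">contig_3", "G" * 250,
    ">contig_4", "T" * 150,
])

kept = filter_min_length(parse_fasta(toy_fasta))
print(sorted(kept), n50([len(s) for s in kept.values()]))
```

Running the same two functions on each kmer assembly would give one simple basis for comparison alongside completeness measures such as BUSCO.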

Transcriptome annotations
Each of the population TSA transcriptomes was annotated using BLAST2GO PRO version 5.2.5 [16][17][18][19], using CloudBlast to perform BlastX against the UniProtKB/Swiss-Prot database with an e-value of 1.0E-25 and default parameters (Supplementary Table 3 at Mendeley Data [10]). InterProScan searches were performed using CloudIPS for all families, domains, sites, repeats, structural domains, and other sequence features (Supplementary Table 4 at Mendeley Data [10]). Gene Ontology (GO) mapping was done using database version 2019.04, followed by GO annotation using default parameters except for a Blast E-value hit filter of 1.0E-25 to match the E-value used for the BlastX searches noted earlier (Supplementary Table 3 at Mendeley Data [10]). The InterProScan GO annotations are provided in Supplementary Table 5, with the top ten Biological Process, Cellular Component, and Molecular Function GO terms from each population shown in Supplementary Table 6 at Mendeley Data [10]. The top ten Biological Process, Cellular Component, and Molecular Function GO terms from these 10 different populations based on the LSU Rosepine Fall 1998 population are shown in Figs. 4-6, respectively. GO Enzyme Code Mapping was done using default parameters and can be found in Supplementary Table 7 at Mendeley Data [10]. Each transcriptome was also annotated using blast+/2.13.0 with default BlastN parameters against the NCBI Nucleotide Sequence Database (NT 3-14-2023), and the hits were then further filtered by an e-value of 1.0E-25; the results can be found in Supplementary 8 at Mendeley Data [10].
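The post hoc e-value filter applied to the BlastN hits can be sketched in Python as follows. The sketch assumes the hits were exported in BLAST tabular format (`-outfmt 6`), in which the e-value is the eleventh column; the text does not state which output format was used. It also assumes that hits at or below the cutoff are retained. The query and subject identifiers in the toy data are hypothetical.

```python
from typing import List

EVALUE_CUTOFF = 1.0e-25  # cutoff stated in the text
EVALUE_COLUMN = 10       # zero-based index of 'evalue' in default tabular output


def filter_hits(tabular_text: str, cutoff: float = EVALUE_CUTOFF) -> List[List[str]]:
    """Return tab-separated BLAST hits whose e-value is at or below the cutoff."""
    kept: List[List[str]] = []
    for line in tabular_text.splitlines():
        # Skip blank lines and comment lines (present in some tabular exports).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[EVALUE_COLUMN]) <= cutoff:
            kept.append(fields)
    return kept


# Toy hits with the default tabular columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
toy_hits = "\n".join(
    "\t".join(row)
    for row in [
        ["contig_1", "subj_A", "98.5", "400", "6", "0", "1", "400", "10", "409", "3e-40", "700"],
        ["contig_2", "subj_B", "85.0", "120", "18", "0", "1", "120", "5", "124", "1e-10", "95"],
        ["contig_3", "subj_C", "100.0", "900", "0", "0", "1", "900", "1", "900", "0.0", "1600"],
    ]
)

for hit in filter_hits(toy_hits):
    print(hit[0], hit[1], hit[EVALUE_COLUMN])
```

Filtering after the search, rather than through the search threshold itself, keeps the full default hit list available if a different cutoff is needed later.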

Ethics Statements
No experiments were conducted on animals for this manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Supplementary