Transcriptome data on salivary lipocalin family of the Asiatic Triatoma rubrofasciata

The dataset in this report is related to the research article entitled: “Salivary gland transcriptome of the Asiatic Triatoma rubrofasciata” [1]. Lipocalin family proteins were identified as the dominant component in T. rubrofasciata saliva, and phylogenetic analysis of the salivary lipocalins resolved five major clades (clades I-V). For further characterization, each clade of T. rubrofasciata lipocalins was subjected to alignment and phylogenetic analyses together with homologous triatomine lipocalins: procalin, a major allergen in T. protracta saliva, and its homologue Td04 from T. dimidiata (clade I); pallidipin and triplatin, inhibitors of collagen-induced platelet aggregation identified from T. pallidipennis and T. infestans, respectively, and their homologue Pc20 identified from Panstrongylus chinai (clade II); Td30 and Td38 from T. dimidiata, with unknown functions (clade III); the triatin-like salivary lipocalins Pc58 and Pc226 identified from P. chinai and Td18 from T. dimidiata (clade IV); and triafestin, an inhibitor of the activation of the kallikrein–kinin system identified from T. infestans saliva, and its homologues Td25 and Td40 from T. dimidiata and Pc64 from P. chinai (clade V).


Specifications
Subject: Insect Science
Specific subject area: Salivary lipocalins of a hematophagous insect
Type of data: Table, figure
How data were acquired: RNA-seq was performed using HiSeq 2500 (Illumina) with 100-bp paired-end reads. The Trinity sequences were aligned with CLUSTAL W software and examined using Molecular Evolutionary Genetics Analysis (MEGA) ver. 6. Phylogenetic trees were constructed by the maximum likelihood (ML) method with the distance algorithms available in the MEGA package.
Data format: Raw
Parameters for data collection: Triatoma rubrofasciata specimens were captured in Hanoi, Vietnam. Salivary glands were dissected from adult insects after 2 weeks of feeding, and total RNA was extracted from 20 sets of the salivary glands using NucleoSpin RNA Plus (Takara Bio, Shiga, Japan).

Description of data collection
The quality of paired-end reads obtained by HiSeq sequencing was checked by FastQC. All reads were trimmed using Trimmomatic to obtain high-quality sequences, and de novo assembly of trimmed reads was performed using Trinity. Read counts and FPKM (fragments per kilobase of exon per million mapped fragments) were calculated using RSEM (RNA-Seq by Expectation-Maximization). CDS were extracted using nucleotide sequence databases of the National Center of Biological

Value of the Data
• The data represents the first report of salivary lipocalins from an Asiatic triatomine bug.
• The results will provide further information on the salivary biochemical and pharmacological complexity of triatomine bugs and the evolution of salivary components in blood-sucking arthropods.
• cDNAs and recombinant proteins prepared from these transcripts will promote the discovery of novel pharmacologically active compounds, as well as the development of biomarkers following exposure to Triatoma rubrofasciata.

Data Description
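The salivary gland transcriptome of Triatoma rubrofasciata revealed 64 coding sequences (CDS) encoding lipocalin family proteins, which accounted for 89.27% of the FPKM of the secreted class and 64.82% of the FPKM of total molecules in the salivary glands [1]. Table 1 shows the grouping of transcripts coding for lipocalin family proteins in T. rubrofasciata salivary glands obtained by phylogenetic analysis [1]. Figures 1-5 present the alignment and phylogenetic analyses of each clade of T. rubrofasciata salivary lipocalins together with homologous proteins: procalin, a major allergen in T. protracta saliva, and its homologue Td04 from T. dimidiata (clade I); pallidipin and triplatin, inhibitors of collagen-induced platelet aggregation identified from T. pallidipennis and T. infestans, respectively, and their homologue Pc20 identified from Panstrongylus chinai (clade II); Td30 and Td38 from T. dimidiata, with unknown functions (clade III); the triatin-like salivary lipocalins Pc58 and Pc226 identified from P. chinai and Td18 from T. dimidiata (clade IV); and triafestin, an inhibitor of the activation of the kallikrein–kinin system identified from T. infestans saliva, and its homologues Td25 and Td40 from T. dimidiata and Pc64 from P. chinai (clade V).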
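The FPKM shares quoted in this section can be illustrated with a minimal sketch. It assumes the textbook FPKM formula (RSEM itself works with effective transcript lengths, so its values may differ slightly), and the transcript records, family labels, and counts below are hypothetical toy values, not data from the study.

```python
def fpkm(fragment_count, length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase per million mapped fragments."""
    return fragment_count * 1e9 / (length_bp * total_mapped_fragments)


def fpkm_share(records, predicate):
    """Percentage of the summed FPKM contributed by records matching predicate."""
    total = sum(r["fpkm"] for r in records)
    selected = sum(r["fpkm"] for r in records if predicate(r))
    return 100.0 * selected / total


# Hypothetical toy transcripts (illustrative only, not taken from the dataset).
records = [
    {"id": "tx1", "family": "lipocalin", "secreted": True, "fpkm": 600.0},
    {"id": "tx2", "family": "lipocalin", "secreted": True, "fpkm": 300.0},
    {"id": "tx3", "family": "other_secreted", "secreted": True, "fpkm": 100.0},
    {"id": "tx4", "family": "housekeeping", "secreted": False, "fpkm": 500.0},
]


def is_lipocalin(record):
    return record["family"] == "lipocalin"


# Share within the secreted class versus share of all transcripts.
secreted = [r for r in records if r["secreted"]]
share_secreted = fpkm_share(secreted, is_lipocalin)
share_total = fpkm_share(records, is_lipocalin)

print(f"lipocalin share of secreted class: {share_secreted:.2f}%")
print(f"lipocalin share of all transcripts: {share_total:.2f}%")
```

The two percentages differ only in the denominator: the first sums FPKM over secreted transcripts, the second over every transcript in the assembly.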

Experimental Design, Materials, and Methods
The sequences of T. rubrofasciata salivary lipocalins were obtained in the study "Salivary gland transcriptome of the Asiatic Triatoma rubrofasciata" [1]. The Trinity sequences coding for the lipocalin family of proteins were aligned with CLUSTAL W software [2] and examined using Molecular Evolutionary Genetics Analysis (MEGA) version 6 [3]. The best maximum likelihood (ML) model for analysis was selected based on the lowest Bayesian Information Criterion (BIC) score in MEGA 6, and phylogenetic trees were constructed by the ML method with the distance algorithms available in the MEGA package. Bootstrap values were determined based on 1,000 replicates of the datasets. The data can be accessed through "Salivary gland transcriptome of the Asiatic Triatoma rubrofasciata" [1].
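Selecting a model by lowest BIC can be sketched as follows. This is an illustration of the general criterion, BIC = k ln(n) - 2 ln L, not a reproduction of MEGA's internal procedure; the model names, log-likelihoods, parameter counts, and alignment length are hypothetical values chosen for the example.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical candidate substitution models with made-up fit statistics.
candidates = {
    "JTT": {"lnL": -5230.4, "k": 40},
    "WAG": {"lnL": -5215.9, "k": 40},
    "WAG+G": {"lnL": -5190.2, "k": 41},
}
n_sites = 200  # assumed number of aligned positions

# Score every candidate and keep the one with the lowest BIC.
scores = {name: bic(m["lnL"], m["k"], n_sites) for name, m in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: BIC = {score:.1f}")
print("selected model:", best)
```

The penalty term k ln(n) grows with the number of free parameters, so a more complex model is selected only when its gain in log-likelihood outweighs the added parameters.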

Declaration of Competing Interest
The authors declare that they have no competing financial interests or personal relationships that may have influenced the work reported in this paper.