RNA-seq data exploration after trypanosome RNA-binding protein UBP1 expression is altered by CRISPR-Cas9 gene editing and overexpression

Previous studies have shown that overexpression of the Trypanosoma cruzi U-rich RNA-binding protein 1 (TcUBP1) in insect-dwelling epimastigotes results in a gene expression pattern resembling that of the infective form of the pathogen. Here, we used CRISPR-Cas9-induced edition of TcUBP1 and full-length protein overexpression in epimastigote cells to monitor transcriptomic changes during the epimastigote-to-metacyclic trypomastigote stage transition of T. cruzi. This dataset includes the bioinformatics analysis of three different RNA-seq samples, each with three biological replicates, showing differential mRNA abundances. The current transcriptome report has the potential to shed light on the quantitative variances in the expression of significant up- or down-regulated mRNAs as a consequence of the levels of the UBP1 protein. Raw data files were deposited at the NCBI Sequence Read Archive - SRA at http://ncbi.nlm.nih.gov/Traces/sra/sra.cgi with accession numbers PRJNA907231 and PRJNA949967.

Dataset link: RNA-seq of TcUBP1 knockdown in Trypanosoma cruzi CL-Brener epimastigotes (Original data) Dataset link: RNA-seq of TcUBP1 transgenic Trypanosoma cruzi CL-Brener epimastigotes (Original data) a b s t r a c t Previous studies have shown that overexpression of the Trypanosoma cruzi U-rich RNA-binding protein 1 (TcUBP1) in insect-dwelling epimastigotes results in a gene expression pattern resembling that of the infective form of the pathogen.Here, we used CRISPR-Cas9-induced edition of TcUBP1 and full-length protein overexpression in epimastigote cells to monitor transcriptomic changes during the epimastigote-to-metacyclic trypomastigote stage transition of T. cruzi .This dataset includes the bioinformatics analysis of three different RNA-seq samples, each with three biological replicates, showing differential mRNA abundances.The current transcriptome report has the potential to shed light on the quantitative variances in the expression of significant

Value of the Data
• This is the first transcriptome study of Trypanosoma cruzi RNA-binding protein UBP1overexpressing and knockdown parasites.• The RNA-Seq dataset underlines the quantitative differences of significant up-or downregulated mRNAs as a direct effect of UBP1 protein levels.• Further gene expression analysis will provide a data source for understanding of how T. cruzi epimastigotes differentiate to infective metacyclic trypomastigotes.

Background
The present study focused on the U-rich RNA-binding protein 1 (UBP1, TcCLB.507093.220),one of the first trypanosome RNA-recognition motif-containing proteins to be described.Remarkably, UBP1 gene is encoded in a single stable dicistronic unit and its protein product regulates the abundance of several genes that have U-rich elements in Trypanosoma cruzi [ 1 , 2 ].We have already shown that when UBP1 is overexpressed in insect-dwelling epimastigotes, the microorganism shows an expression pattern that looks like the infective form [3] .To further investigate the regulatory role of this protein, we used the CRISPR-Cas9 technology to generate a population of parasites that did not express this protein.However, we obtained an N-terminal mutated protein that is significantly less expressed than the endogenous form found in normal cells [4] .

Data Description
This data describes a transcriptomic experiment to investigate the differential gene expression profiles between the T. cruzi N-terminal mutated UBP1 knockdown [4] and full-length protein overexpression in epimastigote cells.The experimental design consists of three highthroughput sequencing samples with three biological replicates each of wild-type RNA-binding protein TcUBP1 (WT), UBP1-GFP tetracycline-induced epimastigotes for four days (UBP1-OE), and CRISPR/Cas edited epimastigotes (UBP1mut-KD).Differential gene analysis was conducted using DESeq2 [5] , with the criteria of at least 2-fold change (|Log2 fold change| > 1, FDR-adjusted p value < 0.05).The expression values listed in Supplementary Table 1 show the average Log2 fold change for all mapped genes in UBP1-OE or UBP1mut-KD versus WT.TcUBP1 overexpression in epimastigote cells led to a significant increase in the abundance of 801 genes and a downregulation of 362 genes.In the population of UBP1mut-KD parasites, a significant decrease of 26 genes and an increase of 67 genes was observed ( Fig. 1 ).Next, the transcriptome dataset was examined to check the amount of UBP1 transcript levels within the two conditions, with UBP1 being the experimental internal control of our RNA-Seq analysis.As expected, the Log2 fold change obtained for UBP1-OE/WT was 6.16 (FDR-adjusted p value = 2.1E-84) and the Log2 fold change of UBP1mut-KD/WT was -0.90 (FDR-adjusted p value = 2.1E-02).In addition, the oppositely affected genes after UBP1 overexpression and knockdown were recently reported by Sabalette et al. (see Ref. [4] ).Raw data is available through NCBI's Sequence Read Archive at http://www.ncbi.nlm.nih.gov/sra.cgiwith accession numbers: PRJNA907231 and PRJNA949967.

Experimental Design, Materials and Methods
Control wild-type UBP1, four-day tetracycline-induced UBP1 parasites [6] , and UBP1-CRISPR/Cas9 edited cells [4] were cultured in epimastigote form conditions at 28 ºC.Total RNA from three biological replicates was prepared from approximately 10 7 epimastigote cells of each sample (WT, UBP1-OE, and UBP1mut-KD) using the TRIzol reagent following the manufacturer's instructions (Invitrogen).Isolation of poly(A) + mRNA, library preparation, and deep-sequencing on an DNBSeq platform were performed at BGI. 100 nt reads were mapped to the T. cruzi reference genome CL Brener Esmeraldo-like strain (TriTrypDB-59_TcruziCLBrenerEsmeraldo-like_Genome.fasta), as described [3] .Differential gene analysis was conducted using DESeq2 [5] .All RNA-seq raw data files for WT, UBP1-overexpression, and UBP1-CRISPR/Cas9 edited samples, used in this study, are available as FASTQ files of 100 bp paired-end reads in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database with the following study numbers: PRJNA907231 and PRJNA949967.

Fig. 1 .
Fig. 1.A bar graph for the number of genes identified in the two conditions.Stacked bar charts show the distribution of the number of significant genes and DEGs identified in each pairwise comparison between OE (UBP1-OE) and KD (UBP1mut-KD) versus WT.