Direct RNA sequencing dataset of SMG1 KO mutant Physcomitrella (Physcomitrium patens)

Nonsense-mediated mRNA decay (NMD) is a system that controls the quality of mRNA transcripts in eukaryotes by degradation of aberrant transcripts in a pioneer round of translation. In mammals, NMD targets one-third of mutated, disease-causing mRNAs and ∼10% of unmutated mRNAs, facilitating appropriate cellular responses to environmental changes [1]. In plants, NMD plays an important role in development and regulating abiotic and biotic stress responses [2]. The transcripts with premature termination codons (PTCs), upstream ORFs or long 3′-UTRs can be targeted to NMD. It was shown that alternative splicing plays a crucial role in regulation of NMD triggering, for example, by the introduction of a PTC in transcripts. Therefore, the correct identification of mRNA isoforms is a key step in the study of the principles of regulation of the cell transcriptome by the NMD pathway. Here, we performed long-read sequencing of Physcomitrella (Physcomitrium patens) mutant smg1Δ line 2 native transcriptome by Oxford Nanopore Technology (ONT). The smg1Δ is a knockout (KO) mutant deficient in SMG1 kinase is a key component of NMD system in plants and animals [3]. RNA was isolated with Trizol from 5 day old protonemata and sequenced using kit SQK-RNA002, flow cells FLO-MIN106 and a MinION device (Oxford Nanopore Technologies Ltd., UK (ONT)) in three biological repeats. Basecalling was performed with Guppy v.4.0.15. The presented transcriptomes give advantages in the identification and functional characterization of RNA transcripts that are direct targets of the Nonsense-mediated mRNA decay system.


a b s t r a c t
Nonsense-mediated mRNA decay (NMD) is a system that controls the quality of mRNA transcripts in eukaryotes by degradation of aberrant transcripts in a pioneer round of translation. In mammals, NMD targets one-third of mutated, disease-causing mRNAs and ∼10% of unmutated mRNAs, facilitating appropriate cellular responses to environmental changes [1] . In plants, NMD plays an important role in development and regulating abiotic and biotic stress responses [2] . The transcripts with premature termination codons (PTCs), upstream ORFs or long 3 -UTRs can be targeted to NMD. It was shown that alternative splicing plays a crucial role in regulation of NMD triggering, for example, by the introduction of a PTC in transcripts. Therefore, the correct identification of mRNA isoforms is a key step in the study of the principles of regulation of the cell transcriptome by the NMD pathway. Here, we performed long-read sequencing of Physcomitrella ( Physcomitrium patens ) mutant smg1 line 2 native transcriptome by Oxford Nanopore Technology (ONT). The smg1 is a knockout (KO) mutant deficient in SMG1 kinase is a key component of NMD system in plants and animals [3] . RNA was isolated with Trizol from 5 day old protonemata and sequenced using kit SQK-RNA002, flow cells FLO-MIN106 and a MinION device (Oxford Nanopore Technologies Ltd., UK (ONT)) in three biological repeats. Basecalling was performed with Guppy v.4.0.15. The presented transcriptomes give advantages in the identification and func-tional characterization of RNA transcripts that are direct targets of the Nonsense-mediated mRNA decay system.

Value of the Data
• The identification of NMD targets is a challenging task. In this context, the ONT direct RNA sequencing seems to be the ideal technology for the comprehensive and correct identification of all mRNA isoforms in NMD-deficient mutants because of its ability to identify full native transcripts [5] . This is the first dataset that describes native transcriptomes of plants with a disrupted NMD system. • The moss P. patens is a suitable model for studying the NMD pathway in plants [6] . Therefore, the presented dataset can be used for the analysis of transcriptome regulation in eukaryotes. • Using nanopore sequencing is the main advantage of the reported dataset because of the analysis of native transcripts. Therefore, it might be used for revealing new RNA targets of the NMD system in plants. Using this dataset, one can also correct own RNA-seq data and investigate the principles of plant transcriptome regulation.

Data Description
The dataset contains data obtained through the sequencing of purified polyadenylated RNAs extracted from the moss Physcomitrella ( Physcomitrium patens ) SMG1 KO mutant [3] . The three biological replicates were sequenced with a MinION sequencer (Oxford Nanopore Technologies Ltd., UK (ONT)). Each library was sequenced in an individual flow cell (R9.4.1) during 72 h. Raw data was basecalled to FASTQ with Guppy v.4.0.15. FASTQ files were deposited in NCBI Sequence Read Archive and are accessible through the BioProject: PRJNA670829 . The main information about runs is shown in Table 1 . The reads of the control sample (RNA CS) used in the library preparation (SQK-RNA002) were not filtered. Read quality score is calculated as the mean Phred quality score of all read nucleotides. Default minimum value of quality score for further analysis is 7. More than 87% of reads had a quality score higher than 7, and mean quality score among all reads lay between 9.4 and 10.2. Using minimap2, more than 90% of obtained nanopore reads were mapped to the reference genome, suggesting the high quality of data. The longest read is 51089 nucleotides, and its quality score is higher than 7. The distribution of read lengths is shown in Fig. 1 . Only reads with length less than 40 0 0 nucleotides are presented because longer reads are rare.

Experimental Design, Materials and Methods
SMG1 is the core kinase of the NMD machinery. Several lines with a deleted SMG1 in the basal land Physcomitrella patens subsp. patens ("Gransden 2004", Frieburg) were produced by James P. B. Lloyd [3] . One of these lines, SMG1 KO mutant line 2, was used for direct RNA sequencing by Oxford Nanopore Technology (ONT). Protonemata of the mutant line were grown in 200 ml liquid BCD medium supplemented with 5 mM ammonium tartrate (BCDAT) during a 16 h photoperiod at 25 °C for 5 days [4] . Total RNA from protonemata of three biological repeats was isolated using TRIzol TM Reagent. RNA quality and quantity were evaluated via electrophoresis in an agarose gel with ethidium bromide staining. The precise concentration of total RNA in each sample was measured using a Qubit TM RNA HS Assay Kit, 5-100 ng on a Qubit 3.0 (Invitrogen, US) fluorometer. 100 μg aliquots of total RNA were diluted in 100 μl of nuclease-free water, and poly(A) was selected using Poly(A)Purist TM -MAG Purification Kit Invitrogen by Thermo Fisher Scientific. Resulting poly(A) RNA was eluted in nuclease-free water. The Direct RNA sequenc-ing kit by Oxford Nanopore (SQK-RNA002) including the optional reverse transcription step was used to prepare libraries from the poly(A) RNA. 200 ng total library was loaded in FLO-MIN106 (ONT R9.4) flow cells and sequencing on the MinION platform and standard MinKNOW software. We used Guppy 4.0.15 (Oxford Nanopore Technologies) for basecalling direct RNA sequencing data. MinIONQC.R script [7] and Samtools v.1.10 [8] were used to calculate FASTQ quality control statistics. Minimap2 v.2.17 [9] with parameters -ax splice -uf -k14 -G2k was used to align reads to Physcomitrella ( Physcomitrium patens ) genome (assembly version v3) with added yeast enolase control sequence.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.