Transcriptome of Russet Norkotah and its clonal selection, TXNS278

Potato has a large genetic diversity. This diversity is in part due to somaclonal variability that appears within potato selections for which tubers are used as seeds. However, the tetraploid genome of potato, together with the use of tubers for crop propagation, makes genetic studies difficult. The objective was to gain knowledge at the genomic level from standard Russet Norkotah and a subclonal Russet Norkotah selection, TXNS278. In this report, we used RNA-seq, which allows genome-wide gene expression analysis, to sequence the transcriptomes of TXNS278 and of standard Russet Norkotah grown in commercial fields. Among the selections, TXNS278 appeared in a multi-year analysis in Texas as the top No. 1 yielding variety. Russet Norkotah and TXNS278 leaf and root transcriptomes were sequenced at two time points during the growing season.


Objective
Potato (Solanum tuberosum L.) is one of the most important human food crops, and its production is increasing [1]. Potato genetic diversity is partially the consequence of a tetraploid genome, as well as of propagation from tubers rather than true seeds. This allows new mutations, called somaclonal variations, to arise in cultivated populations [2]. The cultivar Russet Norkotah (RN) is the second most popular russet in the US [3]. In 1989, the Texas Potato Breeding and Variety Development Program made strain selections to improve the Russet Norkotah cultivar [4]. Strains were first selected for criteria such as large vines (giant hill). After these initial selections, the remaining selection cycles were based on tuber type and yield across multiple locations in Texas. By 1998, six clones remained in the program, including the Texas Russet Norkotah Strain 278 (TXNS278).
RN and TXNS278 were grown from 1989 to 2007 for 18 generations in 15 locations in Texas. It was determined that TXNS278 is genetically uniform and stable from generation to generation with no evidence of variants [5,6].
The objective of this study was to identify genes that are differentially expressed between RN and TXNS278. Genomic differences resulting from somaclonal selection of potato are difficult to characterize. For instance, to date, no genetic differences have been identified between RN and any of the Texas selections, including TXNS278, using isozymes, RFLP, and AFLP markers [6].
In this dataset, next-generation sequencing was used to sequence root and leaf transcriptomes of the two potato cultivars at two time points in the growing season. RNA-seq is a powerful research tool that facilitates the identification of differences in gene expression between cultivars, in order to gain insight into their genomic and physiological differences.

Plant material for transcriptome analysis
Russet Norkotah and the clonal selection TXNS278 were grown near Springlake, TX as part of the Texas Potato Breeding and Variety Development Program. Potatoes were planted April 1, 2013 from tuber seed obtained from certified seed growers. Tissues were sampled on July 6th (T1, 66 days after planting) and on July 24th (T2, 84 days after planting). Tissues were stored in tubes in RNAlater Stabilization Solution (Thermo Fisher Scientific, Waltham, MA) and placed on ice in a cooler for transportation to the laboratory, where they were frozen upon arrival and stored at −20 °C. Leaves and roots from different plants within the same plot were sampled independently at both time points. Plots of TXNS278 and RN were separated by two feet.

RNA extraction
RNA was isolated from individual samples using the Qiagen plant RNeasy kit. DNase treatment was performed according to the manufacturer's recommendations (Qiagen, Hilden, Germany). RNA was quantified using an Infinite 200 PRO NanoQuant (Tecan, Männedorf, Switzerland), and quality was verified by Bioanalyzer (Texas A&M AgriLife Genomics & Bioinformatics Service, College Station, TX).

RNA sequencing
A total of 24 RNA samples were submitted to the AgriLife Genomics & Bioinformatics Service. After RNA quality evaluation, three independent RNA samples (biological replicates) were pooled. Poly(A) RNA enrichment, library construction, and RNA sequencing from each pool were performed at the AgriLife Genomics & Bioinformatics Service.
One library was made for each potato cultivar (RN and TXNS278) at each time point (T1 and T2) and from each tissue (leaf and root). Therefore, a total of eight libraries were made using the TruSeq Kit (Illumina, San Diego, CA). Sequencing was performed as 100-bp single-end reads on one lane of the Illumina HiSeq 2000 platform.
The libraries were made publicly available through NCBI and can be found at the following address: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?&acc=GSE87857. A summary of the sequencing results is given in Table 1.
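The pooling design described in this section can be checked with a short enumeration. The following is a minimal Python sketch, not part of the original analysis. The library naming pattern is an assumption extrapolated from the single library name given in the text (RN-LeafT2), and all variable and function names are hypothetical.

```python
from itertools import product

# Factors of the design as described in the text.
CULTIVARS = ["RN", "TXNS278"]
TISSUES = ["Leaf", "Root"]
TIME_POINTS = ["T1", "T2"]
REPLICATES_PER_POOL = 3  # biological replicates pooled into one library


def library_names():
    """Enumerate one library per cultivar x tissue x time point.

    The "<cultivar>-<Tissue><time point>" pattern is an assumption based
    on the one name that appears in the text, "RN-LeafT2".
    """
    return [
        f"{cultivar}-{tissue}{time_point}"
        for cultivar, tissue, time_point in product(CULTIVARS, TISSUES, TIME_POINTS)
    ]


libraries = library_names()
total_rna_samples = len(libraries) * REPLICATES_PER_POOL

print(f"{len(libraries)} libraries from {total_rna_samples} RNA samples")
for name in libraries:
    print(name)
```

Under these assumptions, the eight libraries and the 24 submitted RNA samples follow directly from the 2 x 2 x 2 design with three replicates per pool.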

Mapping of RN and TXNS278 transcriptomes to the potato genome
Leaf and root samples were collected from a commercial field near Springlake, TX in 2013. Two time points were tested, based on plant development: T1 (full flowering) and T2 (senescing plants).
Over 176 million reads passed the quality filters (Additional file 1: Table S1), an average of 22 million reads per library. The reads were mapped to the S. tuberosum double haploid DMI3.4 genome ensembl19 using TopHat2 in CyVerse (iPlantcollaborative.org). Only 33% of the reads from the library RN-LeafT2 mapped to the potato genome (Table 1, data file 8). The samples used for this library were infected with potato virus; consequently, these data should be excluded from further analyses. After exclusion of this library, a minimum of 66.3% of reads mapped to the potato genome, of which 55 to 68% were uniquely aligned (Additional file 1: Table S1). Interestingly, a higher percentage of uniquely mapped reads was obtained from the root libraries (62-67%). This difference might be related to higher rRNA levels in leaves than in roots, in spite of the mRNA enrichment.
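The exclusion step described above amounts to a simple mapping-rate filter. The sketch below is an illustration in Python, not the procedure actually used in this study. The 50% cutoff, the function names, and the placeholder key "lowest_other_library" are assumptions. Only the two percentages (33% for RN-LeafT2 and the 66.3% minimum among the remaining libraries) and the read totals come from the text.

```python
def mapping_rate(mapped_reads, total_reads):
    """Return the percentage of reads that aligned to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * mapped_reads / total_reads


def flag_low_mapping(rates, min_percent):
    """Return the names of libraries whose mapping rate is below min_percent."""
    return sorted(name for name, rate in rates.items() if rate < min_percent)


# Read totals from the text: "over 176 million" reads across eight libraries,
# so the mean below is approximate.
TOTAL_READS_MILLIONS = 176
N_LIBRARIES = 8
mean_reads_millions = TOTAL_READS_MILLIONS / N_LIBRARIES

# Mapping rates reported in the text. The second key is a placeholder:
# the text gives the minimum rate but not which library it belongs to.
reported_rates = {
    "RN-LeafT2": 33.0,
    "lowest_other_library": 66.3,
}

MIN_PERCENT = 50.0  # hypothetical cutoff, not stated in the text
excluded = flag_low_mapping(reported_rates, MIN_PERCENT)

print(f"Mean reads per library: about {mean_reads_millions:.0f} million")
print(f"Libraries flagged for exclusion: {excluded}")
```

Any cutoff between the two reported values would flag only RN-LeafT2. In practice the decision here rested on the known virus infection of the samples as well as on the low mapping rate.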

Limitation
The libraries were sequenced on pooled samples from three biological replicates; this reduced the cost of sequencing multiple samples, but limited the statistical power of the analysis. Nevertheless, several studies