Robustness of RNA sequencing on older formalin-fixed paraffin-embedded tissue from high-grade ovarian serous adenocarcinomas

Formalin-fixed paraffin-embedded (FFPE) tissues are among the most widely available clinical specimens. Their potential utility as a source of RNA for transcriptome studies would greatly enhance population-based cancer studies. Although preliminary studies suggest FFPE tissue may be used for RNA sequencing, the effect of storage time on these specimens needs to be determined. We conducted this study to determine whether RNA in archived FFPE high-grade ovarian serous adenocarcinomas from Surveillance, Epidemiology and End Results (SEER) registries was present in sufficient quantity and quality for RNA-Seq analysis. FFPE tissues, stored from 7 to 32 years, were obtained from three SEER sites. RNA was extracted, quantified, quality assessed, and subjected to RNA-Seq (a whole transcriptome sequencing technology). FFPE specimens stored for longer periods of time had poorer RNA sample quality as indicated by negative correlations between specimen storage time and fragment distribution values (DV). In addition, sample contamination was a common issue among the RNA, with 41 of 67 samples having 5% to 48% bacterial contamination. However, regardless of specimen storage time and bacterial contamination, 60% of the samples yielded data that enabled gene expression quantification, identifying more than 10,000 genes, with the correlations among most biological replicates above 0.7. This study demonstrates that FFPE high-grade ovarian serous adenocarcinomas specimens stored in repositories for up to 32 years and under varying storage conditions are a promising source of RNA for RNA-Seq. We also describe certain caveats to be considered when designing RNA-Seq studies using archived FFPE tissues.


Introduction
Advances in molecular technologies have enabled more comprehensive molecular characterization of pathology tissues beyond what can be achieved using morphological or clinical PLOS  coded or coded but unlinked tissue blocks from the SEER registries. No contact with subjects was made for this study.

Subject/specimen selection
The same FFPE tissue sections used in our previous analysis of SEER FFPE tissues for whole exome sequencing were used in this study [5] and from the same cases as were used for proteomics analysis of FFPE tissues [12]. As specimens were retrospectively collected by the RTRs from multiple medical facilities and pathology labs within each of the three catchment areas, fixation times/conditions (such as type of formalin used) and storage conditions (such as temperature, controlled humidity, etc.) were unknown. Tissues were from high-grade serous ovarian adenocarcinomas (ICD-O-3 Topography code: C56.9; Morphology codes: 8441/3, 8460/3, 8461/3) and storage times ranged from 7 to 32 years; storage times were calculated from the date when the cancer was diagnosed to the date when RNA was extracted (storage times were rounded up to the next year). For seven cases, two different sections, or replicates, were prepared and sent to the lab for analysis ( Table 1). The lab performed each study procedure on these sections in a blinded fashion. The sections were then used to assess consistency of results.
Each SEER registry conducted a pathology review of lead and trail sections flanking the 5 sections from each tissue block to determine whether tissue was consistent with the selection criteria (high-grade serous ovarian adenocarcinoma, � 50% of cells with nuclei consistent with malignant cells, and � 50% of cells were necrotic); approximately 30 cases from each registry were reviewed to ultimately select 20 cases that met study criteria. For each case identified as meeting the study criteria, five 10-micron sections were placed in a sterile tube. Additional details regarding the pathology review are described previously [5].

DNA/RNA extraction and RNA quantity and quality assessment
The FFPE tissues were deparaffinized using xylene and ethanol according to the procedures detailed by Qiagen for the AllPrep DNA/RNA FFPE Kit. RNA was eluted in 60 microliters EB buffer (Qiagen). Qiagen All Prep FFPE kit was used to purify DNA and RNA from each specimen; the entire tissue section sent was used for nucleic acid purification (i.e. no macrodissection was performed). RNA quality was checked by running on the Agilent Bioanalyzer (RNA Nano chip) and RNA Integrity Number (RIN) values were obtained. The RNA quality was also assessed using the RT 2 RNA QC PCR Array from Qiagen that tests for the presence of transcripts of two housekeeping genes, ACTB and HPRT1, in addition to testing for inhibitors of Reverse Transcription and PCR amplification reactions, and the presence of genomic DNA contamination.

Library preparation
NGS libraries were prepared using the NEBNext Ultra II RNA Library Prep Kit for Illumina. The input amounts of RNA used for library preparation ranged from 21 ng to 1000 ng. The input amount of RNA (Table 2) was determined by the amount of sample available. Wherever >2000 ng RNA was available (36 samples), the maximum amount per reaction (1000 ng) was used as the input. For samples where the total amount available was between 2000 ng and 500 ng, half of the RNA was used for library preparation, and the other half was saved for repeating the library prep, if needed. For samples with less than 500 ng available, the entire RNA was used for library preparation. The volume used for the library prep ranged from 0.6 ul to 8.6 ul, depending on the RNA concentration. None of the samples were concentrated using SpeedVac or any other means. Libraries were prepared per manufacturer recommendations. Briefly, the ribosomal RNA (rRNA) was depleted by hybridization to rRNA probes, followed by RNase H digestion of the hybridized RNA. RNase H is a better method than RiboZero for depleting ribosomal RNA when low quality RNAs, such as from FFPE tissues, are anticipated [7]. Excess probes were removed by DNase I digestion, followed by RNA purification using Agencourt RNAClean XP beads. Because the samples had small sized fragments to begin with, no fragmentation was done during the library prep. First strand cDNA was synthesized by ProtoScript II Reverse Transcriptase using random priming at 42˚C, for 30-45 mins. Second strand cDNA synthesis by nick translation was enabled by RNase H by creating nicks and gaps in the RNA strand. Ends of the cDNA were then repaired to make them blunt and phosphorylated, followed by dA-tailing and adaptor ligation. Sample indexes were added during the PCR enrichment step. 15-19 cycles of amplification were used. PCR reactions were purified using Agencourt SPRIselect beads (0.9X). Samples with additional peaks at~80bp (primers) or 128bp (adaptordimers) in the bioanalyzer traces underwent additional clean ups to eliminate the residual primers or adaptor dimers. Libraries were QC'ed by running on the Bioanalyzer. The average library fragment size ranged from 290bp to 571bp (median: 405bp), and library yields ranged from 1 ng to 5 ug, with the median yield of 23.2 ng.

RNA sequencing
The libraries were sequenced on Illumina NextSeq 500 using NextSeq High Output v2 kit. Three samples were sequenced using the pair-end run mode with run setup as 2x76 bps. The rest of the 64 samples were sequenced on NextSeq 500 using the 1x151 bps single-read run with 10 samples pooled per run; this was done as recommended by Illumina tech support when the initial pair-end 2x76bps runs did not work for Read 2. In addition, 25 samples (25 out of 67; 37%) with low sequencing yield were re-sequenced on NextSeq 500 using the 1x151 bps runs to get additional sequencing depth.

Contamination assessment
Bacterial contamination of the RNA was determined for all samples using the FastScreen (https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen) software tool. The mapping step in the software filtered out the bacteria reads.

Bioinformatics and statistical analysis
Illumina's HiSeq Real Time Analysis software (RTA) was used for processing images files, and the Illumina bcl2fastq_v2.17 was used for demultiplexing. The quality of reads was investigated using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc) to detect any abnormalities in the raw sequencing reads. FastqScreen was also used to detect any reads originating from other species indicating bacterial contamination. The sequencing adapters and low quality bases were trimmed using Trimmomatic [13] software (version 0.30). The trimmed reads were aligned to human genome reference sequence GRCh Build 38 using STAR aligner version 2.5.1. The Gencode annotation version 24 was used to guide the spliced transcript alignment. Picard's MarkDuplicates (https://broadinstitute.github.io/picard/command-lineoverview.html#MarkDuplicates) was used to evaluate the library complexity by determining the amount of unique or non-duplicated reads in the samples. Picard's CollectRnaSeqMetrics program (https://broadinstitute.github.io/picard/command-line-overview.html) was used to collect mapping percentages on coding, intronic, intergenic and UTR regions as well as gene body coverage. In addition, RSeQC [14] version 2.3.5 was used to determine read pair inner distance, read GC content, junction saturation, and library strand. Additional QC statistics were generated by using multiQC [15] software package. RSEM [16] version 1.2.22 was used to get raw count as well as normalized read count on genes. Genes expressed below a noised threshold (TPM<1 or raw count <5) were filtered. Evidence of batch effects was analyzed using a Principal Components Analysis (PCA) analysis using an R script. Correlation analysis was carried out using the Pearson correlation between different metrics. R packages were used to generate sample correlation plots. See S1 Table for software pipeline and parameters. The biological replicates were handled as independent samples since the bacterial contamination, percent of mapped reads, and percent of mRNA reads were very different for replicates, as well as the libraries were prepared independently for each replicate sample. The correlation between the replicates was calculated.

RNA quality assessment
RNA extracted from the 67 FFPE samples obtained from the SEER repositories was assessed for quality before preparing NGS libraries (Fig 1). The RIN values of the RNA samples could be calculated for 61 of the 67 samples, and were between 1 and 2.8, with a median value of 2.4 (Fig 1 and Table 2). To better assess the degradation of the samples, fragment distribution values (DV) were calculated to estimate the percentage of fragments longer than 200 nt (DV 200 ), 100 nt (DV 100 ), and 50 nt (DV 50 ). The DV 200 (percentage of RNA fragments > 200 nt in size) was between 1% and 50%, with a median value of 13%. The DV 100 (percentage of RNA fragments > 100 nt in size) was between 11% and 86%, with a median value of 56%. DV 50 (percentage of RNA fragments > 50 nt in size) was found to be between 76% and 100%, with a median value of 98%; see Table 2. DV estimates could not be made for 14 samples; this could have been due to the fragment distribution in the sample not allowing the software to accurately identify the markers that are used to estimate the fragment size. Shorter fragments are a common characteristic of more degraded RNA. When comparing specimen storage times with DV 100 and DV 200 values, there was a negative correlation between the storage time and DV values (-0.44 for DV 200 and -0.39 for DV 100 ) indicating that longer storage time is associated with more degraded RNA (Fig 2); specimens stored for 7 to 12 years had an average DV 200 of 20 and DV 100 of 53, those stored 13 to 22 years yielded an average DV 200 of 14 and DV 100 of 46, and those stored 23-32 years yielded an average DV 200 of 8 and DV 100 of 34. In addition, there was a weak negative correlation between the age and library yield (-0.21); specimens stored for 7 to 12 years yielded an average library of 500 ng, stored 13   to 22 years yielded an average library of 87ng, and those stored 23-32 years yielded an average library of 41ng. This was expected since degraded samples have less amplifiable RNA and therefore, have lower library yields compared to non-degraded RNAs when using the same PCR conditions. We did not find the RIN values to be a useful measure of quality for the highly degraded FFPE RNA samples (Fig 2). DV 200 , though a better metric than RIN, was also not the best QC metric for these samples, since highly degraded RNA fragment sizes were less than 200 nt; therefore, the DV 200 values were very low. Thus, we decided to calculate DV 100 and DV 50 . DV 100 gave the most useful measure of fragment distribution, and thus, sample quality for these samples, as is evident from the correlation plot in Fig 2. We also plotted DV 200 and DV 100 values against the percentage mapped RNA (Fig 3) in order to identify minimum requirements for the sample QC values. Though both DV 200 and DV 100 resulted in similar plots, DV 100 provided greater resolution, making it the more suitable QC metric for highly degraded samples. Samples with DV 100 below 40 resulted in poorer mapping of the sequenced RNA reads (percent mapping < 50), while samples with DV 100 > 60 were more likely to perform better (percent mapping >60). With the caveat that DV value is not the sole determinant of sequencing outcome, we suggest, for future analyses of similar quality samples, proceeding with samples with DV100 > 40; we also suggest starting with higher input amounts of total RNA for all samples with DV100 < 60. The current Illumina recommendations for their TruSeq RNA Access / RNA Exome kit (the recommended kit for FFPE RNA samples) are based on DV 200 ; values above 70% are considered good with input recommendations as low as 20 ng, while 20-50 ng and 50-100 ng are recommended for samples with medium (DV 200 50-70%) and low (DV 200 30-50%) quality. Illumina does not recommend proceeding with samples with DV 200 < 30%. It is noteworthy that in the current set of samples, there were no samples with DV 200 >50%, and only 8 had DV 200 between 30% and 50% ( Table 2 and Fig 3). Thus, our DV 100 -based recommendations are useful for samples that fall below the 'low' quality RNA, as defined by Illumina. In order to further assess the quality of extracted RNA, we performed qPCR assays on a subset of SEER samples, along with some non-FFPE total RNA and some non-SEER FFPE RNA samples (S2 Table). We tested the panel of RNA samples for the presence of amplifiable transcripts for two housekeeping genes-ACTB and HPRT1 (amplicon sizes 174 bp and 57 bp, respectively). The assay also included controls for the presence of inhibitors of reverse transcription and PCR amplification, and the presence of DNA contamination. We were unable to detect the presence of either of the two housekeeping genes in any of the FFPE-RNA samples tested, even though no inhibition of reverse transcription or PCR amplification was detected in any of the tested samples. The non-FFPE samples showed good amplification of both housekeeping genes. This indicates that the RT 2 RNA QC PCR Array is not a good assay to assess the integrity of these samples. This could be either because of inherent incompatibility of some assay component(s) with FFPE samples, or because the genes or amplicons tested may not be optimum and another set of housekeeping genes / amplicons (with higher expression level in the assayed tissue and smaller amplicon size) may be more suited for FFPE-RNA.
Total RNA libraries were prepared from all the samples after rRNA depletion. Because the degraded FFPE-RNA fragments are likely to lack an intact polyA tail, rRNA depletion, followed by random priming during the reverse transcriptase reaction was chosen as the method of choice over the use of poly(dT) for mRNA selection and for priming first strand cDNA synthesis.

QC measures of RNA sequencing
Several RNA sequencing QC parameters were examined (Table 2), including sequencing yield, Q30 (% of bases greater than Q30 is a metric for the overall quality and accuracy of base calling of the sequencing run [17]) (Fig 4A), % total mapped reads, % uniquely mapped reads, and % non-duplicated reads (Fig 4B). The total pass filter reads for all samples were between 10.6 million to 159.4 million due to large variation in the library quality. All samples resulted in 69-91% of bases greater than quality score of Q30 (Fig 4A). Sequencing reads from smaller fragment sizes have very short portions of useable reads and are more likely to have ambiguous alignments to the reference genome, and are thus filtered out by the mapping software. Therefore, highly degraded RNA samples have very low mapping rates, which is reflected in the positive correlation between percent total mapping and DV values, e.g., a correlation value of 0.58 with DV 100 (Fig 2). Percent total mapping to the human genome (hg38) was between 7.4% to 94.9% (Fig 4B). Percent unique mapping was between 2% to 86% and percent non-duplicated reads for the samples ranged from 1% to 76%. These QC measures indicate vast differences in the amount of unique fragments per sample.

RNA statistics and potential artifacts
Picard's CollectRNAStatistics was used to determine the percent ribosomal RNA, mRNA, intronic, and intergenic bases present in the alignment files. All samples had less than 2.5% ribosomal bases. Intronic bases varied between 17% to 72%. Intergenic bases varied between 11% to 81%. The percentages of mRNA bases were between 1% to 34% (S1 Fig). In terms of the GC content profiles, some samples showed abnormal peaks around 5% and 85% GC content (Fig 5). The extremely low and high GC samples have very low mRNA mapped reads compared to samples with balanced GC content (the GC content normally is in range of 40%-60% for human samples). This observation agrees with a previous study [18], where high GC FFPE samples had a large fraction of reads mapped to the intronic regions that may have contributed to the abnormal GC-content peak in FFPE samples. The FFPE fixation process is known to introduce molecular artifacts such as GC content alterations compared with fresh frozen tissues [4].
Gene Body Coverage plots measure the percentage of reads that cover each nucleotide position of all of genes scaled to 100 bins from 5 0 UTR to 3 0 UTR. It is used to identify 3' or 5' bias that may have occurred during the library preparation or sequencing process. A 3 0 -end bias in gene body coverage may indicate degradation of RNA. A 5 0 -bias may be due to alternative poly-adenylation sites [18]. The coverage plot over normalized gene body for all samples indicates a small set of samples having either 5' bias, or 3' bias ( Fig 6A). The rest of the samples have relatively uniform coverage distributions. In order to assess whether sample quality and mRNA content may play a role in the 5' and 3' bias for gene body coverage, we split the samples into two groups based on the mRNA yield. We then selected 32 samples that had mRNA percentages greater than 10% and observed that the bias along the transcript length was mostly eliminated from the higher quality RNA sample group (Fig 6B). This result is consistent with previous studies [19,20] that revealed ribosomal RNA depletion based protocols provided more uniform transcript coverages than poly-A selection based mRNA-Seq protocol. However, we found that with highly degraded RNAs, 5 0 to 3 0 coverage bias along transcripts were not completed eliminated.
We also investigated the sample storage age impact on the uniformity of the read coverage by calculating the correlations between the specimen storage time with the read mapping rates for the exonic, intronic regions as well as the uniformity of the read coverage depicted by the 5' to 5' bias. There were slightly negative correlations (S2 Fig).

Contamination assessment
Fifty six of the 67 samples were found to have greater than 1% bacterial contamination ( Table 2). 41 of these had greater than 5% bacterial contamination. A common species contaminating the samples was Propionibacterium acnes. Although all of the contamination of the samples cannot be explained by this species, up to 14% of the total reads accounted for in the contaminated samples were from this bacteria, with percentages ranging from 0.02% to 14.35% of the reads. Twenty-eight of the samples had more than 5% of reads mapping to Propionibacterium acnes (S3 Fig). There were six samples contaminated with mouse genome with 4%-11% of reads mapped to mouse genome. Since the six samples were from two different SEER sites, not all samples from the same sites were contaminated. In addition, among the six contaminated samples, one sample had a biological replicate (sample SEER_045 and sample SEER_065) collected from same patient and stored at the same SEER site; only SEER_065 had reads mapped to mouse genome in the 4-11% range. This suggests that samples were not contaminated during the initial sample collection, but may have been introduced in somewhere along the sample preparation process (which includes all of the steps from tissue acquisition through to RNA sequencing). While we cannot conclude where the contamination was from, we do know that contamination with the bacterial or mouse genomes was not from a single SEER site; rather, it was found in 56 samples from all 3 SEER sites.

Specimen storage time, genome mapped reads and mRNA correlations
The correlation analysis (Fig 2) between specimen storage (archival) time and total percentage of mapped reads showed a weak negative correlation value of -0.14, which suggests that the storage time might affect the sample quality. In addition, the bioanalyzer library molarity and mapped mRNA reads shows a positive correlation value of 0.37. We have observed high percentages of the mapped reads aligned to intergenic regions as well as intronic regions, especially for poor quality samples with extremely low or high GC content. The DV 100 has positive correlations with the percentage of reads mapped on human genome (0.58), non-duplicated reads (0.53), as well as total gene detected (0.43). DV 100 , together with bioanalyzer library molarity values after library prep, can be used as a metrics parameter to screen poor quality samples before sequencing.
In order to better depict the relationship between sample storage time and sample quality, we divided the samples into three storage time cohorts, and calculated the mean value and SD of the sample DV 100 and sample mRNA content. The longer storage time was associated with lower RNA quality (DV 100 ) and lower mRNA content from NGS analysis. However, no significant correlation was observed between SEER site and DV 100 or mRNA content (Fig 7).
We further investigated which gene group might be conserved among samples across different specimen storage time cohorts and SEER sites based on gene expression profiles. We extracted 189 genes that had relatively high expression across samples (Part A of S3 Table). KEGG Pathway analysis of the 189 genes showed most of them were related to Ribosome, P13K-Akt signaling pathway, ECM-receptor interaction and Focal adhesion pathways (Part B of S3 Table). The P13K-Akt signaling pathway is known as a central regulator of ovarian cancer. This result suggests that valuable gene expression information could be extracted from aged FFPE blocks with degraded RNAs based current NGS technologies.

Correlations between biological replicates
Seven pairs of biological replicates (2 separate sets of tissue curls were cut from the same FFPE blocks for 7 subjects) were included in this study (S4 Table). The sample QC assessment showed most replicates (6 out of 7) had very similar RINs. However, the two oldest pairs of the samples (SEER 005 (24 year old sample) and SEER 007 (18 year old sample)) each had one sample with extremely low RIN value around 1.0. The library yields for the replicated samples showed from 1 to 100-fold difference between replicates. In addition, 5 pairs of replicates were heavily contaminated with bacteria genomes but the percentages of bacterial contamination between the replicates were very different, suggesting sample contamination. Moreover, the sequencing yields between the replicates were also different. Among the 7 pairs of replicates, 4 pairs of biological replicates had correlation values between 0.7-0.8 when comparing gene expression values between the replicates pairs; the remaining 3 pairs had correlations below 0.22. The low correlations for the biological replicates indicate that the variability in extracted RNA and sample processing have contributed towards the differences in sample gene expression profiles. Principal Components Analysis (PCA) showed 3 samples were extreme outliers while rest of the 11 samples are more closely related based on gene expression (S4 Fig). For some of the samples with similar gene expression profiles, the correlations were above 0.85, indicating that the results among high quality samples including biological replicates are reproducible.

TCGA ovarian prognostic signature gene set expression profile
In order to compare our sequencing data with pre-existing ovarian cancer data, we selected TCGA ovarian prognostic signature gene set [21] which includes 193 genes expressed in a fashion that can predict patient survival; TCGA data were generated using high-quality freshfrozen tissues. We profiled the expression level of this gene set using the SEER FFPE RNA-Seq data. All of the 193 genes from the TCGA dataset were found in the SEER FFPE RNA-Seq data set with a minimum of 5 read counts per gene as the minimum cutoff threshold. The gene expression values had a wide dynamic range with up to nearly 100 fold change differences among the 193 ovarian prognostic signature gene set.
In addition, we downloaded 48 RNA-Seq data sets from TCGA Ovarian Serous Cystadenocarcinoma project (https://portal.gdc.cancer.gov/projects/TCGA-OV) and one normal ovary tissue RNA-Seq data set from Human BodyMap Illumina project (https://www.ebi.ac.uk/ arrayexpress/experiments/E-MTAB-513). We then examined gene expression profiles of the 193 signature gene panel using the three data sets. The expression level was higher overall in a majority of the 193 genes in the TCGA fresh-frozen ovary tumor samples and Human Body-Map normal ovary tissue data sets compared to the SEER FFPE data set. This may be due to higher sequencing depth in the TCGA/Human BodyMap data sets than SEER FFPE RNA-Seq data. While SEER FFPE samples were targeted for 50 million reads per sample, TCGA-OV data set had 100-300 million reads per sample and Human BodyMap ovary tissue sample had about 243 million reads. Among the 67 SEER FFPE RNA-Seq samples, 26 SEER samples (Fig  8, cluster B) showed good correlation with TCGA and the Human BodyMap data sets (Fig 8, cluster A) for the majority of the genes. 13 SEER samples (Fig 8, cluster C) displayed low to medium overall expression of the 193 signature gene set. 28 SEER samples had very low overall expression and clearly were clustered as completely different clusters (Fig 8, cluster D). Most of those in cluster D had very low library yields and molarity compared to the rest of the samples. For these 28, there was low quantity of cDNA to sequence, very few mapped mRNA reads and very low gene counts.
Among the 193 prognostic signature gene set, we observed the gene expression level are different between the TCGA cluster A samples from the rest of the SEER samples. Such differences could be the consequence of different rRNA depletion methods as well as RNA extraction kits used. We also observed only a small subset of genes were expressed in SEER Cluster B but missing in TCGA Cluster A data sets. Upon closer examination, we found that those genes encode histone proteins. The transcripts of the histone genes, such as HIST1H2AB, HIST1H2BE or HIST1H3E, lack polyA tails, therefore, the polyA capture based RNA-seq protocol could miss those transcripts in TCGA data set.

Discussion
This study demonstrates that FFPE samples acquired from SEER under varying storage duration and conditions still have potential value as sources of RNA for NGS studies. Longer FFPE tissue storage time was associated with more degradation of the RNA (which supports previous findings [6,22]). This in turn can cause lower yields in library product, as we have shown. For a large portion of the samples (60%), gene expression quantification found more than 10,000 genes; the number of genes expressed was comparable to human control RNA (about 13,000 genes). In addition, the correlations among 4 out of 7 biological replicates were above 0.70.
Our study complements and expands upon several published studies that examined the use of FFPE tissues for RNAseq. As noted in a recent review article on the use of FFPE for high throughput genomic studies, reliable gene expression data can be obtained from RNAseq using FFPE tissues, provided they are not stored for a long periods of time [22]. Our study expands upon the storage time issue by including a wider range of storage times. We examined the use of FFPE tissues that had been stored up to 32 years (which was 12 years longer than samples analyzed in previous studies [6,22]) and found that while storage time was negatively associated with RNA sample quality, gene expression data could be obtained regardless of specimen storage time. Additionally, while traditional QC measures such as RIN are often used, they rarely predict the success of sequencing from FFPE tissues [23]. Here, we suggest using DV 100 for initial sample QC, combined with library quality analysis using bioanalyzer and qPCR to avoid sequencing of samples that are not likely to yield useful data. We further expanded upon previous studies by using a larger number of FFPE tissues from a single cancer site. Our study leveraged the SEER registries to obtain FFPE tissues from 60 ovarian cancer cases, which was more than triple what previous studies used for examining the utility of FFPE for RNAseq using a single cancer site; these previous studies had used between two [7-10, 18, 24] to 19 FFPE tissues from a single cancer site [6]. Together, the large sample size and wide range of specimen storage times allowed a more robust development of quality metrics for helping to predict RNAseq success and study design elements to increase the utility of resulting RNAseq data from archived FFPE tissues.
Another strength of our study was that the samples are reflective of what epidemiologists might anticipate having access to for translational clinical research studies. Residual tissues from surgical resections are typically preserved as FFPE blocks, and subjected to a variety of storage conditions. Our study shows that it is not essential to have pristine FFPE blocks in order to conduct RNA-seq analyses and obtain useable results.
Sample quality was one of the limitations of our study. We found that 40% of the samples had very low mRNA and gene counts due to highly degraded RNAs, and about 80% of the samples were contaminated with bacterial or mouse RNA. Larger library size was correlated with a lower percentage of total mapped reads to the human reference sequence (-0.78). Samples with heavier bacterial and mouse contamination had larger library sizes (correlation value is 0.77), which can explain the lower mapping percentages. As previous studies [19,25] suggested, targeted exome capture transcriptome protocols, such as Illumina TruSeq RNA Access Library Prep / RNA Exome kit, designed to capture the target gene regions using capture probes, effectively eliminated the contaminating species from costly sequencing and provided accurate and unbiased estimates of RNA abundance for degraded RNAs. However, the TruSeq RNA Access kit is not recommended for highly degraded samples (DV 200 < 30%) based on the protocol. Only 8 of the 53 SEER FFPE samples for which we could measure DV 200 , had DV 200 > 30%. Therefore, initial QC checks for RNA degradation and selection of a suitable library preparation protocol based on the RNA degradation profile is highly recommended. Our analysis showed that ribosomal RNA depletion based whole transcriptome analysis protocol such as NEBNext Ultra II RNA Library Prep Kit for Illumina works very well with sample input far lower in quantity and quality than typically recommended. However, it still showed limitations for handling highly degraded RNA samples. This observation is consistent with the recent study by Schuierer and colleagues [19]. Careful experimental design and bioinformatics analyses are critical to ensure high accuracy and reproducibility for profiling of very heterogeneous RNA samples that cover a wide range of low quantity and extremely low quality samples.
From our study, we observed that gene body coverage along the transcript length was fairly even for a majority of the RNA samples. Data from this study agree with a previous study done by Graw and colleagues [18] that random priming during reverse transcription of RNA from FFPE tumor samples effectively eliminates the 3 0 bias expected from the heavily degraded RNA. Thus, we recommend rRNA depletion using RNase H and random priming of the reverse transcription reaction for preparing sequencing libraries from FFPE-RNA samples, over the use of poly(dT) for mRNA selection and conversion to cDNA.
Another limitation to this study was the lack of fresh frozen (FF) tumor tissue from the same patients to compare the results. However, several studies have provided evidence showing FFPE as an acceptable alternative when FF tissue is not available for NGS [6,8,9,18,20,24,26]. From our study, we extracted genes that are relatively highly expressed among samples from different specimen storage time cohorts and storage sites and performed pathway analysis. Most of the genes are related to the Ribosome, P13K-Akt signaling pathway, ECM-receptor interaction and Focal adhesion pathways. The P13K-Akt signaling pathway is known as a central regulator of ovarian cancer. Moreover, we profiled the selected 193 ovarian prognostic signature gene set among the SEER FFPE samples, FF Ovarian Serous Cystadenocarcinoma sample from TCGA-OV project RNA-Seq, and the FF normal ovary tissue sample from Human BodyMap Illumina RNA-Seq project. The result of profiling the 193 ovarian prognostic signature gene showed 26 of the 67 SEER samples (39%) had similar expression patterns to the TCGA datasets despite the wide range of variations in storage time, sequencing depth, sample quality and library preparation protocol differences. Twenty eight (42%) of the SEER samples had low gene counts. The low gene counts were not associated with storage time. The low gene counts were likely due to very poor sample quality and extremely low library yields and library molarity, which limited the amount of cDNA available for sequencing and resulted in very low number of mapped mRNA reads.
The results collectively suggest that the NGS data produced from the FFPE SEER archive tissue can be reproducible when samples meet specific quality control criteria or the following caveats are taken into account: • Sample quality assessment using the normal QC metrics such as DV 200 or RIN alone may not provide an accurate measurement for highly degraded RNA. This is in agreement with several earlier studies that showed RIN may not be an accurate RNA quality assessment measure [23]. Therefore, it is necessary to adjust the QC metrics to assess shorter RNA fragments.
• DV 200 is not very informative for highly degraded samples, but DV 100 provides some resolution to assess the sample quality. Based on this study of FFPE-RNA, we suggest proceeding with samples with DV 100 > 40. For all samples with DV 100 < 60, it is beneficial to use higher sample input amounts, whenever possible. Using DV 100 for initial sample QC, combined with library quality analysis using bioanalyzer and qPCR are often necessary to avoid sequencing samples that are not likely to yield useful data.
• Anticipate more degradation of the RNA in older FFPE tissues. However, RNA-Seq can still yield reliable gene expression data for degraded RNA. Due to FFPE having relatively heavily degraded RNA, we suggest rRNA depletion using RNase H and random priming of the reverse transcription reaction for preparing sequencing libraries from FFPE-RNA samples, over the use of poly(dT) for mRNA selection and conversion to cDNA • Assess potential contamination of the RNA as part of the standard post sequencing RNA-Seq QC protocols. Bacterial contamination and genomic DNA contamination are common for some of the FFPE samples independent of storage condition of sites and sample preparation procedures. Contamination can lead to misinterpretation of results, especially when working with other species (such as the microbiome). A recent article by Heintz-Buschart et al [27] showed contamination with non-human sequences could be traced to RNA extraction columns; we did not use a control RNA extraction (with no sample) to control for contamination introduced during the RNA purification, but this may be a good precaution to take for future studies. Highly contaminated samples that do not yield much mRNA from the original tissues should be excluded from further downstream data analysis.
• Gene expression data from samples that have low quality cDNA, few mapped mRNA reads, and generate low gene counts should be used with caution. In our study, expression levels of the 193 signature gene panel in the 28 SEER samples with low gene counts (Fig 7, Cluster D) did not correlate well with TCGA, suggesting the RNAseq data from low gene count samples is not of good quality for gene expression analyses.

Conclusions
Recent studies have shown remarkably high consistency between RNA-seq data generated from paired freshly frozen and FFPE tissue samples [6,8,9,18,20,23,24,26]. Our study provides additional evidence and suggestions for feasibly conducting gene expression analysis using RNA-Seq with FFPE tissues spanning a wide range of storage times. There is no denying that there are technical and quality limitations for FFPE RNA-Seq data. However, many of these issues can be overcome through thorough QC and thoughtful bioinformatics analyses.
Our study supports the notion that RNA-Seq on archived FFPE samples can be used as an unbiased and comprehensive assessment of gene expression in biomedical studies.