Profiling miRNAs in nasopharyngeal carcinoma FFPE tissue by microarray and Next Generation Sequencing

Nasopharyngeal carcinoma (NPC) is a non-lymphomatous, squamous-cell carcinoma that occurs in the epithelial lining of the nasopharynx. NPC has a geographically well-defined distribution worldwide, with the highest prevalence in China, Southeast Asia, and Northern Africa. Symptoms of nascent NPC may be unapparent or trivial, and diagnosis is based on the histopathology of biopsied tissue following endoscopy of the nasopharynx. The tumor node metastasis (TNM) staging system is the benchmark for the prognosis of NPC and guides treatment strategy. However, there is a consensus that the TNM system is not sufficiently specific for the prognosis of NPC, as it does not reflect the biological heterogeneity of this tumor, making another biomarker for the detection of NPC a priority. We have previously reported on different approaches for microRNA (miRNA) biomarker discovery in Formalin Fixed Paraffin Embedded (FFPE) NPC tissue samples using both a targeted (microarray) and an untargeted (small RNA-Seq) discovery platform. Both miRNA discovery platforms produced similar results, narrowing the miRNA signature to 1–5% of the known mature human miRNAs, with the untargeted approach (small RNA-Seq) having the advantage of indicating "unknown" miRNAs associated with NPC. Both miRNA profiles were strongly associated with NPC, providing two potential discovery platforms for biomarker signatures for NPC. Herein, we provide a detailed description of the methods that we used to interrogate FFPE samples to discover biomarkers for NPC.


Genomics Data, journal homepage: http://www.journals.elsevier.com/genomics-data/

FFPE tissue samples were obtained from the biological repository in the Department of Pathology of The George Washington University Hospital, Washington, DC. Tissue sections from FFPE were reviewed by two independent pathologists (E.M. and S.E.) to confirm the diagnosis as shown in [1]. FFPE preparation, hematoxylin and eosin (H&E) staining, and representative images have also been previously reported [1]. It should also be noted that the SRA project submission contains four additional samples (Accession: SRX345915, SRX345913, SRX345913 and SRX345909). These samples come from a survey of serum pools from NPC-positive and control individuals discussed in [1] but not further referenced herein.

RNA isolation
Total RNA was isolated from 2 × 10 μm sections from each FFPE case using the miRNeasy FFPE kit (Qiagen) [1]. RNA concentration, purity, and integrity (RIN) were determined by spectrophotometry (Nanodrop 1000) and on the Agilent 2100 Bioanalyzer using the Agilent RNA 6000 Nano and small RNA kits. Purified RNA was stored at <−50 °C.
Yields of total RNA derived from FFPE were approximately 100 ng/μm with 260/280 and 260/230 ratios of ~2.0 and ~1.9, respectively. Analysis on the Agilent Bioanalyzer indicated that the samples were enriched for small RNA species, with RNA Integrity Number (RIN) values of two to three. Although such values typically indicate RNA degradation, the robustness of miRNAs in FFPE tissue [2] and reports from other groups [3] that RIN values have a negligible effect on miRNA results led us to consider this purified RNA suitable for further analysis by microarray and RNA sequencing.

Microarray, data normalization and analysis
All eight samples underwent analysis via microarray (Table 1). Total RNA isolated from each FFPE case was labeled, hybridized to an Agilent human miRNA microarray (miRBase Release 16.0), and scanned [1]. The intensities of each sample were converted to digital data and log2-transformed using Agilent Feature Extraction (V.10.7). Raw data files in text (.txt) format were analyzed with Agilent GeneSpring software (GX 12.6) [4]. A total of 1205 human and 144 human viral microRNAs from miRBase v16.0 were used.
To analyze the differentially expressed miRNAs, quantile normalization was performed to standardize the data across the samples. Raw data (thresholded and log2-transformed) were filtered by expression value (20.0–336133.0), requiring at least two of the eight samples to have values within the cut-off range, in order to remove very low signal values and background influence. The four tumor samples were grouped and analyzed against the four control samples by unpaired Student's t-test with a p-value cut-off of 0.05 (p-value obtained by asymptotic analysis) and a fold-change cut-off of 2.0. Hierarchical clustering was then performed [1] using the Euclidean distance metric and the Centroid linkage rule. We identified 35 significantly dysregulated miRNAs, including four Epstein-Barr Virus (EBV) miRNAs and 31 human miRNAs (13 down-regulated and 18 up-regulated) [1]. These analyses were repeated for this manuscript to verify their reproducibility. In addition, the miRNA signatures were compared against the more recent miRBase release (v19.0), which updated the miRNA nomenclature (Table 2) relative to the original publication of these data, which used miRBase v16.0 [1].
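Quantile normalization, as named above, forces every sample to share the same intensity distribution. The sketch below is a minimal pure-Python illustration of that idea and is not the GeneSpring implementation; the function name `quantile_normalize`, the toy intensities, and the tie handling (ties broken by probe order) are assumptions made for this example.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize equal-length sample columns of log2 intensities.

    Each inner list is one sample, with probes in the same order.
    Assumption: ties are broken by probe position, a simplification.
    """
    n_probes = len(columns[0])
    if any(len(col) != n_probes for col in columns):
        raise ValueError("all samples must have the same number of probes")

    # Sort each sample independently, then average across samples rank by rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[r] for col in sorted_cols) for r in range(n_probes)]

    normalized = []
    for col in columns:
        # Probe indices ordered from lowest to highest intensity in this sample.
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_idx in enumerate(order):
            # Replace each value by the cross-sample mean for its rank.
            out[probe_idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical log2 intensities: three samples, four probes each.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample contains the same set of values, and only the assignment of those values to probes differs between samples.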
Table 1 List of the raw data files deposited to NCBI GEO and SRA with accession numbers. Further details on the FFPE sample set are given in [1], with histological type, TNM staging [9] and WHO classification [10]. Denotes those from the same patient (i.e. paired NPC/Control tissue samples).

Significance analysis was completed using GeneSpring [4] as detailed below:
1) A new project was created, followed by a new experiment; miRNA was selected as the analysis type and the data import wizard as the workflow type.
2) In New miRNA Experiment Steps, the raw intensity files were uploaded. The selected technology was set to 31181_v16_0 and no baseline transformation was performed. The raw signal threshold was set to 1.0 and quantile was chosen as the normalization algorithm.
3) In the Experiment Setup, the samples were grouped into four tumor and four control cases under the Experiment Grouping option. While further interpretations may be created depending on analysis requirements, in this case a categorical experimental parameter, "tumor/control", was set up. The conditions tumor and control were selected, with Non-Averaged chosen for Average Over Replicates in Conditions. For Use Measurements Flagged, Detected and Not Detected were selected, along with Compromised.
4) Quality control: The correlation coefficient of all samples was >0.7, and therefore all the samples were used in further analysis. Further, 3D Principal Component Analysis (PCA) scores and plotting were used to determine any association among the samples (Fig. 1). Paired samples did not cluster more tightly than non-paired samples (NPC/Control tissue) in this analysis (Fig. 1 and hierarchical clustering [1]). In Filter by Expression, the appropriate entity list and interpretation were selected and filtered by raw data value. The lower cut-off of the range of interest was set to 20, and entities were retained when at least two out of eight samples had values within this range.
5) In Analysis, the condition was set as tumor versus control and tested by unpaired t-test, and an asymptotic p-value was computed without correction. The fold-change cut-off was >2.0, analyzed under pairs of conditions with tumor compared to control. Hierarchical clustering of the differentially expressed miRNAs from all samples was conducted on both entities and conditions by normalized intensity values, using the Euclidean distance metric and the Centroid linkage rule.
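The filtering and testing criteria in the GeneSpring workflow above (raw-value range filter, unpaired Student's t-test at p < 0.05, fold change of at least 2.0) can be sketched per miRNA as follows. This is an illustrative approximation, not GeneSpring's code: the function names and the toy values are hypothetical, and instead of computing a p-value the sketch compares |t| with 2.447, the two-sided 5% critical value of the t distribution with 6 degrees of freedom (four tumor plus four control samples).

```python
from math import log2, sqrt
from statistics import mean, variance

RAW_RANGE = (20.0, 336133.0)   # raw-signal range quoted in the text
MIN_SAMPLES_IN_RANGE = 2       # at least two of the eight samples
FOLD_CHANGE_CUTOFF = 2.0
T_CRIT_DF6 = 2.447             # two-sided p = 0.05 with 4 + 4 - 2 = 6 d.f.


def passes_expression_filter(raw_values):
    """Keep a miRNA if enough samples have raw signal inside RAW_RANGE."""
    low, high = RAW_RANGE
    return sum(low <= v <= high for v in raw_values) >= MIN_SAMPLES_IN_RANGE


def student_t(group_a, group_b):
    """Unpaired, equal-variance (pooled) Student's t statistic."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))


def is_dysregulated(raw_values, tumor_log2, control_log2):
    """Apply the expression filter, the fold-change cut-off and the t-test."""
    if not passes_expression_filter(raw_values):
        return False
    # On log2 data, the fold change is the difference of the group means.
    log2_fc = mean(tumor_log2) - mean(control_log2)
    if abs(log2_fc) < log2(FOLD_CHANGE_CUTOFF):
        return False
    return abs(student_t(tumor_log2, control_log2)) > T_CRIT_DF6


# Hypothetical normalized log2 intensities for one miRNA.
tumor_log2 = [8.0, 8.5, 9.0, 8.5]
control_log2 = [6.0, 6.5, 6.0, 6.5]
raw_values = [2 ** v for v in tumor_log2 + control_log2]
result = is_dysregulated(raw_values, tumor_log2, control_log2)
```

Using a fixed critical value keeps the sketch free of external dependencies; it is valid only for the four-versus-four design described here.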

Small RNA sequencing
Small RNA sequencing was performed on five of the samples used in the microarray analysis (Table 1). The other three samples (control samples 341E and 11311E and tumor sample 341B) were omitted because the total RNA purified from the small tissue areas available for the study had been exhausted. Total RNA derived from the FFPE tissue was subjected to Ribo-Zero pretreatment using the Ribo-Zero rRNA Removal Kit (Epicentre) as described by the manufacturer and in [1]. Library preparation and sequencing have been described in further detail in [1]. Briefly, Illumina libraries were constructed from 1 μg of total RNA using the TruSeq Small RNA Sample Kit (Illumina). Prior to sequencing, libraries were subjected to quality control on an Agilent 2100 BioAnalyzer and their concentration was determined using PicoGreen (Invitrogen). Sequencing was performed on the Illumina Genome Analyzer IIx by Expression Analysis, A Quintiles Company (Durham, NC).

Table 2 Microarray miRNA expression analysis between tumor and control NPC FFPE tissue using unpaired Student's t-test (p-value <0.05 and fold-change >2.0). In this repeated analysis by GeneSpring, the updated nomenclature in miRBase v19.0 was used to update the sample set found in [1]. Thirty-five miRNAs were dysregulated, including four EBV-specific miRNAs.

Sequencing processing: alignment, mapping and annotation
Initial processing was performed using FastqMcf and FastQC, which can be accessed at http://code.google.com/p/ea-utils/wiki/FastqMcf and http://www.bioinformatics.babraham.ac.uk/projects/fastqc, respectively.
After adaptor removal and quality filtering, ~28 million reads were aligned to the human genome (UCSC hg19) and the Human herpesvirus 4 (Epstein-Barr virus or EBV) genome (NCBI NC_007605.1), and miRNA counts were generated for each sample [1]. Both miRDeep 2.0.0.5 [5] and miRExpress 2.0 [6] were used to generate counts, and each provided comparable results, with over 50% of the reads mapping to miRNAs in either the human or EBV genomes (Table 3). Identification of known miRNAs was based on miRBase Release 19 [7], with an alignment identity of 1%, a tolerance range of 4, and a similarity threshold of 0.8 [1]. In total, using miRDeep and miRExpress, 984 and 847 human and EBV miRNAs were identified, respectively, with a count per million greater than one in at least two of the samples.
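The abundance criterion stated above (a count per million greater than one in at least two samples) can be illustrated with a short sketch. The function name `cpm_filter`, the miRNA labels, and the counts are hypothetical, and the library size is assumed here to be the sum of miRNA counts per sample, which may differ from the totals used in [1].

```python
def cpm_filter(counts, min_cpm=1.0, min_samples=2):
    """Return CPM values for miRNAs with CPM > min_cpm in >= min_samples samples.

    counts maps a miRNA name to a list of raw read counts, one per sample.
    Assumption: library size = column sum of the supplied counts.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("every sample needs at least one mapped read")

    kept = {}
    for mirna, row in counts.items():
        # Scale each raw count to reads per million mapped miRNA reads.
        cpm = [1e6 * c / size for c, size in zip(row, lib_sizes)]
        if sum(v > min_cpm for v in cpm) >= min_samples:
            kept[mirna] = cpm
    return kept


# Hypothetical counts for three miRNAs across three samples.
counts = {
    "mirna_A": [500000, 600000, 400000],
    "mirna_B": [1, 0, 0],
    "mirna_C": [499999, 400000, 600000],
}
kept = cpm_filter(counts)
```

In this toy input, `mirna_B` reaches a CPM of exactly 1.0 in one sample and zero elsewhere, so it fails the strict "greater than one in at least two samples" rule and is dropped.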
Using edgeR [8], a negative binomial model was used to compare the independent analyses from miRDeep and miRExpress [1]. The biological coefficient of variation (BCV) was used to estimate the variability across the dataset and plotted via the plotBCV function (Fig. 2A), with a common dispersion of 67% indicating a relatively high dispersion of gene expression levels. Given that this was an observational study on independent NPC cases using NPC tumors of different histological grades, such a value would not be considered atypical. Using the plotSmear function in edgeR, log-fold changes were plotted against log-cpm (Fig. 2B). Using edgeR, 99 dysregulated miRNAs were identified in NPC tumor tissue versus control tissue samples.

Top common human miRNAs illustrated [11] as detected in corresponding independent analyses from both microarray and RNA-Seq. A total of eight common miRNAs were highlighted across both methods under the statistical cut-offs previously described [1].
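As background to the dispersion estimate reported for the edgeR analysis (Fig. 2A), edgeR models each read count with a negative binomial distribution, in which the variance grows quadratically with the mean and the BCV is the square root of the dispersion. The symbols below are notation chosen for this sketch, not taken from [1]: $y_{gi}$ is the count for miRNA $g$ in sample $i$, $\mu_{gi}$ is its expected value, and $\phi_g$ is the dispersion.

```latex
\operatorname{Var}(y_{gi}) = \mu_{gi} + \phi_g\,\mu_{gi}^{2},
\qquad
\mathrm{BCV}_g = \sqrt{\phi_g}
```

The first term is the technical (Poisson-like) sampling variance and the second captures variation between biological replicates, which is why a heterogeneous set of tumors would be expected to yield a comparatively large dispersion.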