The long non-coding RNA GHSROS reprograms prostate cancer cell lines toward a more aggressive phenotype

It is now appreciated that long non-coding RNAs (lncRNAs) are important players in orchestrating cancer progression. In this study we characterized GHSROS, a human lncRNA gene on the opposite DNA strand (antisense) to the ghrelin receptor gene, in prostate cancer. The lncRNA was upregulated in prostate tumors from different clinical datasets. Transcriptome data revealed that GHSROS alters the expression of cancer-associated genes. Functional analyses in vitro showed that GHSROS mediates tumor growth, migration and survival, and resistance to the cytotoxic drug docetaxel. Increased cellular proliferation of GHSROS-overexpressing PC3, DU145, and LNCaP prostate cancer cell lines in vitro was recapitulated in a subcutaneous xenograft model. Conversely, in vitro antisense oligonucleotide inhibition of the lncRNA reciprocally regulated cell growth, migration, and gene expression. Notably, GHSROS modulates the expression of PPP2R2C, the loss of which may drive androgen receptor pathway-independent prostate tumor progression in a subset of prostate cancers. Collectively, our findings suggest that GHSROS can reprogram prostate cancer cells toward a more aggressive phenotype and that this lncRNA may represent a potential therapeutic target.

This revealed a probe set, 2652604, consisting of 4 probes complementary to GHSROS.
Cell and tissue exon array data were downloaded from NCBI GEO 141, EBI ArrayExpress 142 and the Affymetrix web site (see Supplementary Table S10). GEO datasets were bulk-downloaded using v3.6.2.117442 of the Aspera Connect Linux software (Aspera, Emeryville, CA, USA). In total, 3,924 samples were downloaded, corresponding to ~46% of all exon array data deposited in the NCBI GEO database. Arrays (individual CEL files) were normalized (output on a log2 scale, centered at 0) using the SCAN function in the R package 'SCAN.UPC' 143,144. SCAN normalizes each array (sample) individually by removing probe- and array-specific background noise using data from within the array. Next, arrays were interrogated using the UPC function in 'SCAN.UPC'. UPC outputs standardized expression values (UPC values), ranging from 0 to 1, which indicate whether a gene is actively transcribed in a sample of interest: higher values indicate that a gene is 'active' 143. UPC scores are platform-independent and allow cross-experimental and cross-platform integration.

Evaluation of GHSR/GHSROS transcription in a deep RNA-seq dataset
It has been estimated that reliable detection of low-abundance transcripts in humans warrants very deep sequencing (> 200 million reads per sample 145), far beyond most current datasets. To illustrate, we considered the expression of GHSR/GHSROS in a comparable clinical dataset. Publicly available RNA-seq data (NCBI GEO accession no. GSE31528) from eight subjects with metastatic castration-resistant prostate cancer (bone marrow metastases) 146 were interrogated. Briefly, total RNA-seq was performed on random-primed paired-end read libraries, to ensure consistent transcript coverage 147,148,149, generating an average of 160M reads per sample. Paired-end FASTQ files were aligned to the human genome (UCSC build hg19) using the spliced-read mapper TopHat (v2.0.9) 148, with reference gene annotations to guide the alignment. BigWig sequencing tracks for the UCSC genome browser 150 were obtained from TopHat-generated BAM files (indexed by samtools v1.2 151) using a local instance of the bamCoverage command in deepTools v2.5.4 152. BigWig files were visualized in the UCSC genome browser (hg19). A region with fewer than ~10 supporting reads can be considered to have low coverage, rendering active transcription difficult to interpret 152,153.
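The low-coverage rule of thumb above can be illustrated with a minimal Python sketch. It is not part of the original analysis, which relied on visual inspection of BigWig tracks; the function name, the half-open interval convention, and the example depth values are all hypothetical, and only the threshold of about 10 reads comes from the text.

```python
from typing import List, Tuple

# Approximate cut-off taken from the text (~10 supporting reads).
LOW_COVERAGE_THRESHOLD = 10


def low_coverage_intervals(
    depth: List[int], threshold: int = LOW_COVERAGE_THRESHOLD
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) intervals where depth < threshold."""
    intervals: List[Tuple[int, int]] = []
    start = None
    for pos, reads in enumerate(depth):
        if reads < threshold:
            if start is None:
                start = pos  # open a new low-coverage run
        elif start is not None:
            intervals.append((start, pos))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, len(depth)))  # run extends to the end
    return intervals


# Hypothetical per-base read depths across a short region (not real data).
example_depth = [0, 3, 12, 15, 9, 2, 0, 25, 30]
print(low_coverage_intervals(example_depth))  # [(0, 2), (4, 7)]
```

Intervals flagged in this way would be the regions where apparent transcription should be interpreted with caution.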

RNA secondary structure prediction
The ViennaRNA web server 154 was used to predict the secondary structure of GHSROS and its minimum free energy 155.
RNA sequencing of PC3-GHSROS cells
RNA was extracted from in vitro cultured PC3-GHSROS cells and controls, as outlined in the manuscript body. RNA purity was analysed using an Agilent 2100 Bioanalyzer, and RNA with an RNA Integrity Number (RIN) above 7 was used for RNA-seq. Strand-specific RNA sequencing (RNA-seq) was performed by Macrogen, South Korea. A TruSeq stranded mRNA library (Illumina) was constructed and RNA sequencing performed (50 million reads) on a HiSeq 2000 instrument (Illumina) with 100 bp paired-end reads. Pre-processing of raw FASTQ reads, including removal of contaminating adapters, was performed with scythe v0.994 (https://github.com/vsbuffalo/scythe). Paired-end human FASTQ files were aligned to the human genome (UCSC build hg19) using the spliced-read mapper TopHat (v2.0.9) 148, with reference gene annotations to guide the alignment.
Raw gene counts were computed from TopHat-generated BAM files using featureCounts v1.4.5-p1 156, counting coding sequence (CDS) features of the UCSC hg19 gene annotation file (gtf). FeatureCounts output files were analysed using the R programming language (v.3.2.2). Briefly, raw counts were normalized by Trimmed Mean of M-values (TMM) correction 157,158. Library size-normalized read counts (counts per million; CPM) were subjected to the voom function (variance modelling at the observation level) in limma v3.22.1 (Linear Models for Microarray Data) 159,160, with trend=TRUE for the eBayes function and correction for multiple testing (Benjamini-Hochberg false discovery rate; Q-value cut-off set at 0.05). Genes with at least a 1.5 log2 fold-change difference in expression between PC3-GHSROS and PC3-vector (empty vector) cells were defined as differentially expressed. Although validation is not required, as RNA-seq gives very accurate measurements of relative expression across a broad dynamic range 160, selected differentially regulated genes were validated using quantitative reverse-transcription PCR (qRT-PCR) (see manuscript body and Table S11). Detailed gene annotations were obtained by querying Ensembl with the R/Bioconductor package 'biomaRt' 161.
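To make the selection criteria concrete, the following Python sketch applies a Benjamini-Hochberg adjustment and then filters on the two cut-offs stated above (Q-value below 0.05, log2 fold-change of at least 1.5). It is an illustration only: the original analysis used limma in R, whose moderated statistics are not reproduced here. The function names and example values are hypothetical, and applying the fold-change cut-off to the absolute value (so that both up- and down-regulated genes pass) is an assumption.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values (Q-values) in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never decrease with rank (the step-up monotonicity constraint).
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(
    results: Dict[str, Tuple[float, float]],
    lfc_cutoff: float = 1.5,   # log2 fold-change cut-off from the text
    q_cutoff: float = 0.05,    # Q-value cut-off from the text
) -> List[str]:
    """Select genes passing both cut-offs.

    `results` maps gene name -> (log2 fold-change, raw p-value).
    Assumption: the fold-change cut-off applies to the absolute value.
    """
    genes = list(results)
    qvalues = benjamini_hochberg([results[g][1] for g in genes])
    return [
        g for g, q in zip(genes, qvalues)
        if abs(results[g][0]) >= lfc_cutoff and q < q_cutoff
    ]


# Hypothetical gene names and statistics (not real data).
example = {
    "geneA": (2.1, 0.001),
    "geneB": (-1.8, 0.004),
    "geneC": (0.4, 0.03),
    "geneD": (1.7, 0.20),
}
print(differentially_expressed(example))  # ['geneA', 'geneB']
```

In this toy example geneC is excluded by the fold-change cut-off and geneD by the Q-value cut-off.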

RNA sequencing of LNCaP-GHSROS cells
A scaled heat map (unsupervised hierarchical clustering by Euclidean distance) was generated in R using heatmap.3 (available at https://goo.gl/Yd9aTY) and a custom R script.
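The two ingredients named above, scaling and Euclidean distance, can be sketched in Python as follows. This is not the custom R script or heatmap.3 itself. Scaling by row (gene) z-score with the sample standard deviation is an assumption, since the text does not state the scaling direction, and the example values are hypothetical. The clustering (linkage) step is not reproduced.

```python
import math
from statistics import mean, stdev
from typing import List


def scale_row(values: List[float]) -> List[float]:
    """Z-score one row: subtract the mean, divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator)
    if s == 0:
        return [0.0] * len(values)  # constant row: no variation to scale
    return [(v - m) / s for v in values]


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical expression values for three genes across three samples.
rows = {
    "gene1": [1.0, 2.0, 3.0],
    "gene2": [10.0, 20.0, 30.0],
    "gene3": [3.0, 2.0, 1.0],
}
scaled = {gene: scale_row(vals) for gene, vals in rows.items()}

# After scaling, gene1 and gene2 share the same profile (distance 0),
# whereas gene3 has the opposite profile.
print(euclidean(scaled["gene1"], scaled["gene2"]))
print(euclidean(scaled["gene1"], scaled["gene3"]))
```

Scaling before computing distances means that genes cluster by the shape of their expression profile across samples rather than by absolute expression level.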