Whole-Genome Sequence of the Metastatic PC3 and LNCaP Human Prostate Cancer Cell Lines

The bone metastasis-derived PC3 and the lymph node metastasis-derived LNCaP prostate cancer cell lines are widely studied, having been described in thousands of publications over the last four decades. Here, we report short-read whole-genome sequencing (WGS) and de novo assembly of PC3 (ATCC CRL-1435) and LNCaP (clone FGC; ATCC CRL-1740) at ∼70× coverage. A known homozygous mutation in TP53 and homozygous loss of PTEN were robustly identified in the PC3 cell line, whereas the LNCaP cell line exhibited a larger number of putative inactivating somatic point and indel mutations (in particular, loss-of-stop-codon events). This study also provides preliminary evidence that loss of one or both copies of the tumor suppressor Capicua (CIC) contributes to primary tumor relapse and metastatic progression, potentially offering a treatment target for castration-resistant prostate cancer (CRPC). Our work provides a resource for genetic, genomic, and biological studies employing two commonly used prostate cancer cell lines.

Library preparation was performed using a TruSeq Nano DNA kit (Illumina, San Diego, CA) with a target insert size of 350 bp. Paired-end libraries (150 bp) were sequenced using a HiSeqX sequencer (Illumina). Base calls were converted into FASTQ files using bcl2fastq v2.15.0 and provided to our laboratory.

Data processing
Raw reads (FASTQ) were trimmed using scythe v0.994 (github.com/vsbuffalo/scythe), with default settings, to remove low-quality bases and read pairs, and contaminating adapter sequences.
Resulting scaffolds were gap filled using "sga gapfill" and error-corrected FASTQ reads. The genome assemblies (gap-filled scaffolds) were evaluated using QUAST v4.3 (Gurevich et al. 2013) and the human reference genome. Genes of interest were interrogated in the assembled genomes using BLAST, via a local instance of SequenceServer v1.0.9 (Priyam et al. 2015), and GMAP v2016-11-07 (a genomic mapping and alignment program for mRNA and EST sequences) (Wu and Watanabe 2005).
Single nucleotide variant (SNV) and short indel calling: Samtools v1.3.1 mpileup and bcftools (Li et al. 2009) were used to interrogate indexed BAM files of reads aligned to the reference genome and generate a VCF (Variant Call Format) file of SNVs and short indel variants. Variants likely to be common germline variants, i.e., those present in HapMap (Gibbs et al. 2003), 1000 Genomes phase 3 (2504 human genomes) (Sudmant et al. 2015), and the National Heart, Lung, and Blood Institute Exome Sequencing Project (Tennessen et al. 2012) (bundled variant data file available at https://goo.gl/mEogvD), were excluded.
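The exclusion of known germline variants can be illustrated with a minimal Python sketch. It assumes tab-delimited VCF records and a pre-built set of known variant keys; the function names and the example records are hypothetical, and the study itself relied on samtools, bcftools, and downstream filtering tools rather than this code.

```python
def variant_keys(vcf_line):
    """Return one (chrom, pos, ref, alt) key per ALT allele of a VCF data line."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return [(chrom, int(pos), ref, allele) for allele in alt.split(",")]


def exclude_known(vcf_lines, known):
    """Keep header lines and records with at least one allele absent from `known`."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header and meta lines pass through unchanged
        elif any(key not in known for key in variant_keys(line)):
            kept.append(line)
    return kept


# Hypothetical records, for illustration only.
calls = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tA\tG\t50\tPASS\t.",
    "1\t2000\t.\tC\tT\t60\tPASS\t.",
]
known_germline = {("1", 1000, "A", "G")}
private = exclude_known(calls, known_germline)
```

Keying on chromosome, position, reference allele, and alternate allele (rather than position alone) avoids discarding a novel allele that happens to share a site with a common variant.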
Copy number variation (CNV) calling: To screen PC3 and LNCaP genomes for CNV, we employed the R package "cn.mops" (Copy Number estimation by a Mixture Of PoissonS) (Klambauer et al. 2012). Briefly, paired-end genome reads from PC3 and LNCaP were aligned to the reference genome and compared with normal prostate reads to obtain genome-wide read-depth profiles. Custom R scripts were used to parse the output.
Gene expression potential analysis: We interrogated publicly available transcriptome data from PC3 (Wang et al. 2015) (NCBI GEO: GSE65112) and LNCaP cells (Metzger et al. 2016) (NCBI GEO: GSE64529). In addition, transcriptome data from normal prostate samples were obtained from TCGA (see above). Briefly, paired-end reads were trimmed using scythe, and aligned to human reference genome GRCh38 build 82 using the spliced-read mapper TopHat (v2.0.9) (Kim et al. 2013) and reference gene annotations to guide the alignment. Raw gene counts were computed from the generated BAM files by featureCounts v1.4.5-p1 (Liao et al. 2014), counting exon features of the gene annotation file (gtf) in order to include noncoding RNA genes. FeatureCounts output files were analyzed using the R programming language (v.3.2.2) (R Core Team 2013). Raw counts were normalized by Trimmed Mean of M-values (TMM) correction (Robinson and Oshlack 2010). The expression of genes in normal prostate, LNCaP, and PC3 was assessed using the Universal exPression Code (UPC) method (Piccolo et al. 2013), available in the R package "SCAN.UPC". This method estimates the active/inactive state of genes in a sample, where a UPC value > 0.5 indicates that a gene is actively transcribed.
cBioPortal analysis: Data on copy number alterations in prostate cancer tumor tissue were obtained using the cBioPortal tool (www.cbioportal.org) (Cerami et al. 2012; Gao et al. 2013) with the following parameters: "GENE: HOMDEL HETLOSS;", where "GENE" denotes a gene symbol. Clinical information was also downloaded and the data further analyzed using custom R scripts.
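As a small illustration, the query string described above can be generated for several genes at once. This Python sketch only assembles text in the "GENE: HOMDEL HETLOSS;" form quoted in the paragraph; the helper name is hypothetical and no cBioPortal API call is made.

```python
def deletion_query(gene_symbols):
    """Build one 'GENE: HOMDEL HETLOSS;' line per gene symbol."""
    return "\n".join(f"{symbol}: HOMDEL HETLOSS;" for symbol in gene_symbols)


# The four chromosome 19 genes discussed in the Results.
query = deletion_query(["CIC", "PAFAH1B3", "PRR19", "TMEM145"])
print(query)
```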
Kaplan-Meier survival analysis: Kaplan-Meier survival analysis was performed to compare disease-free survival (DFS) in patient groups stratified by CNVs. DFS is defined as the time to either recurrence or relapse, second cancer, or death (Gill and Sargent 2006). In the context of prostate cancer, DFS is a suitable surrogate for overall survival (OS), given that metastatic disease is not curable and recurrence of disease would be expected to contribute significantly to mortality.
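For readers unfamiliar with the estimator, a minimal Python sketch of the Kaplan-Meier product-limit calculation is given here. It is not the code used in this study (which used custom R scripts); the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g., relapse) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed  # events and censored patients leave the risk set
    return curve


# Hypothetical follow-up times (months) and event indicators.
curve = kaplan_meier([6, 12, 12, 20, 30], [1, 1, 0, 0, 1])
```

Censored patients contribute to the number at risk up to their last follow-up but never trigger a drop in the curve, which is why the estimator suits DFS data with incomplete follow-up.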
Gene ontology term enrichment analysis: Gene enrichment analyses were performed using DAVID (Database for Annotation, Visualization and Integrated Discovery) (Huang et al. 2009a). All gene groups are potentially informative, despite lower rankings, and serve to guide biological interpretation (Huang et al. 2009b).
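The statistical idea behind term enrichment can be sketched with a one-sided Fisher's exact test. The Python example below is an assumption-laden illustration: it implements the plain hypergeometric tail rather than the modified test applied by DAVID, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p(hits, list_size, category_size, background_size):
    """One-sided Fisher's exact P value.

    Probability of observing at least `hits` genes from a category of
    `category_size` genes in a list of `list_size` genes drawn from a
    background of `background_size` genes.
    """
    total = comb(background_size, list_size)
    upper = min(list_size, category_size)
    tail = sum(
        comb(category_size, k) * comb(background_size - category_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 3 of 5 listed genes fall in a 4-gene category (20-gene background).
p = enrichment_p(hits=3, list_size=5, category_size=4, background_size=20)
```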

Data availability
The genome reads reported in this paper have been deposited in the BioProject database as PRJNA361315 (PC3) and PRJNA361316 (LNCaP). Code used to generate the data and CNV analysis output files (tabulated text files) are available at github.com/sciseim/PCaWGS. Genome assemblies (FASTA format) (Seim 2017a,b), and filtered and annotated single-nucleotide and indel variation data files (VCF format) (Seim 2017c), have been deposited at Zenodo. A BLAST server is available at http://ghrelinlab.org.

RESULTS AND DISCUSSION
WGS
PC3 and LNCaP prostate cancer cells were obtained directly from ATCC, cultured for four passages, and 150 bp paired-end reads obtained using an Illumina HiSeqX sequencer. Following read trimming, 1.53 billion reads from PC3 were retained, of which 99.9% could be aligned to the Ensembl GRCh38.82 human reference genome at ∼71× mean coverage (Figure 1A). Similarly, we obtained 1.49 billion trimmed reads from LNCaP with a 99.9% alignment rate and mean coverage of ∼69× (Figure 1B).
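The relationship between read count and mean coverage can be checked with a short calculation. This Python sketch assumes a reference length of about 3.1 Gb (an approximation not stated in the text) and untrimmed 150 bp reads, so it yields an upper bound rather than the reported values; the function name is hypothetical.

```python
def mean_coverage(n_reads, mean_read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome length."""
    return n_reads * mean_read_length / genome_size


ASSUMED_GENOME_SIZE = 3.1e9  # assumption: approximate human reference length (bp)

pc3_upper_bound = mean_coverage(1.53e9, 150, ASSUMED_GENOME_SIZE)
lncap_upper_bound = mean_coverage(1.49e9, 150, ASSUMED_GENOME_SIZE)
```

Under these assumptions the bounds are roughly 74× and 72×, slightly above the reported mean coverage, as expected when trimming shortens some reads.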
We also performed de novo genome assembly to allow characterization of whole-gene loci by BLAST and other mappers. The final, gap-filled PC3 genome assembly consisted of 1.66 M scaffolds (largest scaffold 692.4 kb) with an N50 of 23.3 kb and an NG50 (the scaffold length at which scaffolds of that length or longer cover at least half of the reference genome length, rather than half of the assembly length) of 22.4 kb. The LNCaP assembly consisted of 1.70 M scaffolds (largest scaffold 536.0 kb) with an N50 of 44.4 kb and an NG50 of 45.0 kb.
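Assembly contiguity statistics of the kind reported by QUAST can be sketched in a few lines of Python. The example below assumes the standard definitions (N50 relative to the assembly length, NG50 relative to a reference genome length); the function name and scaffold lengths are hypothetical.

```python
def nx50(lengths, total=None):
    """Length L such that sequences of length >= L cover at least half of `total`.

    With total=None this is N50 (half of the summed assembly length);
    passing a reference genome length gives NG50. Returns None if the
    sequences never reach half of `total`.
    """
    target = (sum(lengths) if total is None else total) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


scaffolds = [80, 70, 50, 40, 30, 20, 10]  # hypothetical scaffold lengths (kb)
n50 = nx50(scaffolds)        # assembly length is 300 kb
ng50 = nx50(scaffolds, 400)  # hypothetical reference length of 400 kb
```

Because NG50 uses a fixed reference length, it allows assemblies of different total size to be compared on the same scale.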

Single-nucleotide and indel variation
After excluding common germline sequence variants (SNVs and short indels), filtering by SnpSift (Cingolani et al. 2012a), and annotation by SnpEff (Cingolani et al. 2012b), we identified in LNCaP 0.94 M and in PC3 0.56 M sequence variants (SNVs and short indels) that were private or unique to the particular cell line (Table 1). As expected, the majority of variants were found in noncoding regions.
In particular, we noted that LNCaP had a larger number of stop_gained events, which are changes predicted to confer nonsense mutations and result in nonfunctional proteins or proteins with reduced function (Table 1). In LNCaP, SNVs and indel variants contributed 378 stop_gained events in 209 genes. We next identified biological processes overrepresented in this gene set (Table 2). This included a C/T transition at amino acid position 318 of menin (MEN1) (c.T954A in NCBI RefSeq NM_000244). Somatic inactivating mutations of menin are found in endocrine cancers (Falchetti et al. 2009), suggesting that MEN1 is a tumor suppressor gene. However, it has recently been reported that MEN1 is an oncogene in prostate cancer: menin interacts with the androgen receptor, and patients with overexpression of MEN1 show poor OS (Malik et al. 2015). The MEN1 SNV is present in an LNCaP sample interrogated by whole-exome sequencing (Taylor et al. 2010) and is therefore unlikely to be a sequencing or data processing artifact. The functional regions of menin are currently not known; thus, the effect of the LNCaP premature stop codon event cannot be predicted. Interrogation of eight cBioPortal data sets suggests that inactivating mutations in the coding sequence of MEN1 in prostate cancer are unique to LNCaP (data not shown); however, it is possible that distinct patient populations possess this variant (e.g., see Lindquist et al. (2016)).
In PC3, 58 stop_lost events were identified (Table 1). There was a significant enrichment for HLA class II antigen presenting genes associated with the immune response (Table 3; Fisher's exact P ≤ 0.05). It has recently been shown that the PC3, LNCaP, and DuPro (but not the DU145) prostate cancer cell lines and prostate cancer tissues express HLA class II molecules (Younger et al. 2008; Doonan and Haque 2015). However, we could not identify any prostate cancer patients with stop_lost events in these genes using the cBioPortal tool (Cerami et al. 2012; Gao et al. 2013) (data not shown). While evasion of the antitumor immune response is an emerging research area (Drake 2010; Corrales et al. 2016), caution should be exercised when considering the use of PC3 cells in these studies. Sequence variant analysis and interrogation of the PC3 de novo genome assembly by BLAST and GMAP confirmed that the tumor suppressor p53 (TP53) is inactivated by a single frameshift event (p.A138fs; indel; c. Ã 4955A in NCBI RefSeq NM_000546) (Carroll et al. 1993) (Figure 2A). PC3 shared 0.26 M sequence variants (166,912 SNVs and 89,919 indels) with LNCaP, and 21 of these constituted stop_lost events (Table 1). Overrepresented biological processes in PC3 and LNCaP included "O-glycan processing" (the mucins MUC3A and MUC6) and "extracellular matrix disassembly" (the trypsinogens PRSS1 and PRSS2) (Table 2). Interestingly, while we have identified MUC3A stop_gained events in PC3 and LNCaP, cell lines generated from Caucasian patients, a recent study suggests that MUC3A protein-changing variants are rare in Caucasians and predominant in African Americans, the subpopulation with the highest prevalence of prostate cancer, where MUC3A changes are observed in 88% of patients (Lindquist et al. 2016).
Taken together, these data indicate that protein-coding genes in LNCaP are perturbed extensively by point and indel mutations. Even after filtering steps, our LNCaP data (at passage four from the ATCC stock) reveal a clear difference in the number of particular variant events compared to PC3. However, previous exome sequencing work suggests that the genome of the parental LNCaP strain sequenced here (clone FGC) and its derived strains are inherently unstable (Spans et al. 2012, 2014), and this could give rise to the apparently high mutation rate in protein-coding sequences. As with studies of the HeLa genome (Adey et al. 2013; Landry et al. 2013), further genome sequencing efforts are warranted to investigate whether the variants reported here are somatic mutations found in particular LNCaP strains, or if they represent preexisting subpopulations within the parental LNCaP strain. In the future, single-cell WGS is likely to resolve this issue. Nevertheless, LNCaP and PC3 appear to have distinct SNV and indel profiles.
Table 2 Significantly overrepresented biological processes associated with sequence variants contributing stop_gained events in the PC3 and LNCaP prostate cancer cell lines. Stop_gained events are denoted changes predicted to confer nonsense mutations and result in nonfunctional proteins or proteins with reduced function. Gene enrichment analysis was performed using DAVID (Database for Annotation, Visualization and Integrated Discovery). MHC, major histocompatibility complex; ER, endoplasmic reticulum; TAP, transporter associated with antigen processing.
Table 3 Putative deleted genes and their expression in the LNCaP and PC3 prostate cancer cell lines

Putative gene loss
Most human cancers have CNVs, which impact upon gene dosage through loss or gain of whole chromosomes or chromosome segments (Hanahan and Weinberg 2011). Previous studies have described CNVs in PC3 and LNCaP using targeted techniques, such as exome sequencing. However, WGS, together with continuously updated gene annotations, offers improved detection of copy number changes (Meynert et al. 2014; Belkadi et al. 2015; Warr et al. 2015). CNVs were identified using the R package cn.mops (Klambauer et al. 2012). In particular, we wished to identify genes that are lost in PC3 and LNCaP. The absence of this information can misinform even the most well-designed in vitro or cell line xenograft experiment (e.g., where a gene in an important pathway is lost). In the context of CNV analysis, we were interested in identifying putative homozygous deletions (CNV = 0; CNV0 events), i.e., genes that are inactivated by partial or complete gene deletion. To inform this analysis, we also considered the transcriptional potential of each gene by analyzing publicly available transcriptome (RNA-seq) data from normal prostate, LNCaP, and PC3. Genes with a UPC value of ≤0.5 were considered inactive (Piccolo et al. 2013).
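The combination of copy number and expression evidence described above can be sketched as a simple filter. This Python example assumes that a gene is flagged when its estimated copy number is 0 and its UPC value does not exceed 0.5, and that genes without expression data are treated as inactive; the function name, gene labels, and values are all hypothetical.

```python
def putative_lost_genes(copy_number, upc, upc_cutoff=0.5):
    """Return genes with copy number 0 whose UPC value is at or below the cutoff.

    copy_number -- dict of gene -> integer copy number estimate
    upc         -- dict of gene -> UPC value (0 to 1); missing genes are
                   treated as inactive (an assumption of this sketch)
    """
    return sorted(
        gene
        for gene, cn in copy_number.items()
        if cn == 0 and upc.get(gene, 0.0) <= upc_cutoff
    )


# Hypothetical values, for illustration only.
copy_number = {"GENE_A": 0, "GENE_B": 0, "GENE_C": 2, "GENE_D": 0}
upc = {"GENE_A": 0.02, "GENE_B": 0.90, "GENE_C": 0.80}
lost = putative_lost_genes(copy_number, upc)
```

Requiring both lines of evidence guards against calling a deletion where reads are simply hard to map: a gene with an apparent copy number of 0 but clear transcription (GENE_B here) is not flagged.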
Although a large number of SNVs and indel variations were observed in LNCaP, only a single homozygous deletion event (CNV0) was observed in this cell line. In the complex Prader-Willi gene locus there was a putative loss of PWRN1, a gene associated with epigenetic reprogramming during spermatogenesis (Wawrzik et al. 2009) (Table 3).
In contrast to LNCaP, 39 CNV0 events were found in PC3 (Table 3). CNV of the Y chromosome was evident from the PC3 sequence coverage (Figure 1A). In agreement with previous studies employing cDNA microarrays (Clark et al. 2003) and multicolor fluorescence in situ hybridization (Aurich-Costa et al. 2001), our CNV analysis revealed that large regions of the Y chromosome (including eight genes) were deleted in PC3 (Table 3). Clinical observations and experimental studies indicate that the growth hormone receptor (GHR) mediates the development and progression of cancer (Brooks and Waters 2010), and GHR expression is elevated in prostate cancer cell lines and tissues (Chopin et al. 2002; Weiss-Messer et al. 2004). Interestingly, we noted that the genes encoding the classical growth hormone receptor signaling molecules STAT3 (STAT3) and STAT5 (STAT5A and STAT5B) were lost in PC3 cells. Thus, autocrine GHR actions are likely to be associated with alternative signaling pathways (Barclay et al. 2010) in PC3. Loss of STAT3 in PC3 has been firmly established experimentally (Yuan et al. 2005; Pencik et al. 2015), and there is evidence to suggest that STAT3 suppresses prostate cancer metastasis and confers a good prognosis (Pencik et al. 2015).
We identified a homozygous deletion event spanning four genes (CIC, PAFAH1B3, PRR19, and TMEM145) on chromosome 19 in PC3 (Figure 3A). In LNCaP, a genome coverage plot of reads flanking this region revealed a putative heterozygous event (CNV1; loss of a single copy of the same genes) (Figure 3B). Of these four genes, the mammalian homolog of Drosophila CIC (Jiménez et al. 2012) is particularly interesting. Capicua is a transcriptional repressor of cancer metastasis in a number of cancers (Choi et al. 2015; Okimoto et al. 2017). Recent WGS data also suggest that CIC is lost in PC3 cells (Iorio et al. 2016). Homozygous deletions of CIC have been reported in neuroblastoma (Nagaishi et al. 2014; Fransson et al. 2016), and a homozygous deletion of CIC in a subpopulation of H1975 human non-small cell lung cancer cell line xenografts rendered them highly metastatic (Okimoto et al. 2017). We interrogated 75 cBioPortal data sets from diverse tumors, confirming that one or two copies of CIC are lost in many cancer types (see Supplemental Material, Figure S1).
CIC is abundantly expressed in normal prostate tissue, whereas its expression is reduced in primary tumors and ablated in metastatic prostate cancer (Choi et al. 2015). To characterize the potential clinical significance of CIC deletions in prostate cancer, we further examined 1311 tumors from eight data sets using the cBioPortal tool. While homozygous deletion events of the four genes deleted in PC3 cells were rare, a substantial fraction of prostate tumors harbored heterozygous deletions of these genes (Figure 3C). Approximately 6% of primary prostate tumors had heterozygous deletions and 2% had homozygous deletions of CIC, whereas 21% of metastatic tumors had homozygous CIC deletions and 2% heterozygous deletions (Figure 3D).
Prostate cancer relapse or recurrence frequently results in incurable metastasis, ultimately causing patient death (Wu et al. 2014; Weiner et al. 2016). As CIC deletions were more frequent in metastatic tumors, we reasoned that deletion of one or both copies of CIC is a means by which primary tumors in patients that eventually develop metastatic lesions achieve increased fitness and survival. The association between CIC homozygous deletion events and DFS in primary tumors could not be reliably assessed due to the low number (n = 2) of patients with recorded relapses; however, patients with primary prostate tumors with one lost copy of CIC (heterozygous deletion events) had a significantly worse outcome (P = 0.018, log-rank test) (Figure 3E). Similarly, OS is significantly worse in advanced-stage gastric cancer patients with low CIC expression (Okimoto et al. 2017).
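The log-rank comparison of two patient groups can be sketched in Python as follows. This is an illustrative implementation of the standard two-group test, not the analysis code used here; the function name and follow-up data are hypothetical, and the P value uses the chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    subjects = [(t, e, 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still under observation just before time t, per group.
        n_a = sum(1 for s, _, g in subjects if s >= t and g == 0)
        n_b = sum(1 for s, _, g in subjects if s >= t and g == 1)
        d_a = sum(1 for s, e, g in subjects if s == t and e and g == 0)
        d_b = sum(1 for s, e, g in subjects if s == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical DFS times (months); 1 = relapse observed, 0 = censored.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
```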
A recent study comparing PC3 and LNCaP reported that the long form of the CIC protein (CIC-L) was not expressed and that the short form (CIC-S) was expressed at extremely low levels in PC3 cells (Choi et al. 2015). Our CNV analysis, employing WGS reads, interrogation of the de novo PC3 assembly using BLAST and GMAP, and analysis of RNA-seq reads mapped to the reference genome, failed to detect an intact CIC gene in PC3. We sequenced low-passage PC3 cells sourced directly from ATCC and speculate that the previous study (Choi et al. 2015) detected low-level gene expression by PC3 subpopulations with intact CIC resulting from genetic drift during prolonged subculture (passaging; see Festuccia et al. (1999)).
Taken together, these data suggest that although a rare event in prostate tumors, homozygous deletion of CIC is not an idiosyncrasy of the PC3 cell line. Moreover, loss of a single gene copy of CIC is relatively common in prostate cancer. We speculate that disruption of one or both copies of CIC renders prostate cancer patients susceptible to an adverse disease outcome. A previous study employing forced overexpression of CIC in PC3 and LNCaP demonstrated that CIC is repressed by a trio of microRNAs (Choi et al. 2015). Altered MAPK signaling through the ERK pathway also suppresses endogenous CIC in lung cancer (Okimoto et al. 2017). Collectively, our data raise the possibility that the combination of microRNA repression, altered ERK signaling, and somatic events in the CIC locus promote tumorigenesis and confer a poor disease outcome.

Relevance of findings
In summary, we provide genome sequence data for PC3 and LNCaP, prostate cancer cell lines commonly employed in cancer research.
These data contribute to a catalog of cancer genomes, adding to recent whole-transcriptome sequencing, pharmacological profiling, and whole-exome sequencing efforts (Barretina et al. 2012; Klijn et al. 2015; Iorio et al. 2016) aimed at enhancing our understanding of human disease. For example, the phenomenon of androgen independence in prostate cancer has intrigued scientists for decades. Of the two cell lines interrogated in our study, PC3 is androgen-independent, whereas the LNCaP strain sequenced (LNCaP-FGC) is androgen-dependent. Recent work, including an investigation of 150 patients with metastatic CRPC (Robinson et al. 2015), suggests that anomalies (mutations, amplifications, and deletions) in a number of genes in the androgen receptor pathway play a role in the transition to androgen independence. We speculate that future work (employing WGS, RNA-sequencing, epigenetic profiling, and similar high-throughput methods) on a large number of cell lines and clinical samples is likely to identify genes critical for androgen independence. For instance, an androgen-independent strain of LNCaP (LNCaP-LNO) has been developed from cultures of an early passage of the LNCaP cells sequenced in our study (LNCaP-FGC) (van Steenbrugge et al. 1991). LNCaP-LNO and LNCaP-FGC were compared at the gene expression level (Oosterhoff et al. 2005), hinting that specific gene mutations or copy number events render LNCaP-LNO cells androgen-insensitive.
Raw reads (see Data availability in Materials and Methods) and sequence (SNV and indel) and CNV data are made available. We have generated de novo genome assemblies of both cell lines, allowing genes of interest to be investigated further, enabling, for example, the validation of gene loci associated with novel transcripts obtained from Trinity de novo transcriptome analysis (Grabherr et al. 2011; Haas et al. 2013). In addition, the genomes can be interrogated using a BLAST server, available at http://ghrelinlab.org. We acknowledge the limitations of short-insert (350 bp) genome sequencing, particularly when resolving complex repetitive or heterozygous regions (Rhoads and Au 2015; Merker et al. 2016). However, we anticipate that as sequencing becomes increasingly affordable, our sequencing efforts will complement future long-read genome assembly work and prove useful when correcting for errors (sequence polishing).
Finally, we reveal that one or both copies of CIC, a tumor metastasis suppressor gene, are frequently lost in prostate cancer and could drive metastatic CRPC. We anticipate that further biological insights into the role of Capicua in prostate cancer will shortly be gained by the research community, in line with the ethos of G3: Genes, Genomes, Genetics Genome Reports.