Targeted deep sequencing of urothelial bladder cancers and associated urinary DNA: a 23‐gene panel with utility for non‐invasive diagnosis and risk stratification

Objectives To develop a focused panel of somatic mutations (SMs) present in the majority of urothelial bladder cancers (UBCs), to investigate the diagnostic and prognostic utility of this panel, and to compare the identification of SMs in urinary cell‐pellet (cp)DNA and cell‐free (cf)DNA as part of the development of a non‐invasive clinical assay. Patients and Methods A panel of SMs was validated by targeted deep‐sequencing of tumour DNA from 956 patients with UBC. In addition, amplicon and capture‐based targeted sequencing measured mutant allele frequencies (MAFs) of SMs in 314 urine cpDNAs and 153 urine cfDNAs. The association of SMs with grade, stage, and clinical outcomes was investigated by univariate and multivariate Cox models. Concordance between SMs detected in tumour tissue and cpDNA and cfDNA was assessed. Results The panel comprised SMs in 23 genes: TERT (promoter), FGFR3, PIK3CA, TP53, ERCC2, RHOB, ERBB2, HRAS, RXRA, ELF3, CDKN1A, KRAS, KDM6A, AKT1, FBXW7, ERBB3, SF3B1, CTNNB1, BRAF, C3orf70, CREBBP, CDKN2A, and NRAS; 93.5–98.3% of UBCs of all grades and stages harboured ≥1 SM (mean: 2.5 SMs/tumour). RAS mutations were associated with better overall survival (P = 0.04). Mutations in RXRA, RHOB and TERT (promoter) were associated with shorter time to recurrence (P < 0.05). MAFs in urinary cfDNA and cpDNA were highly correlated; using a capture‐based approach, >94% of tumour SMs were detected in both cpDNA and cfDNA. Conclusions SMs are reliably detected in urinary cpDNA and cfDNA. The technical capability to identify very low MAFs is essential to reliably detect UBC, regardless of the use of cpDNA or cfDNA. This 23‐gene panel shows promise for the non‐invasive diagnosis and risk stratification of UBC.


Introduction
Despite intensive research into biomarkers for the noninvasive diagnosis of urothelial bladder cancer (UBC), the mainstay of detection remains flexible cystoscopy. Commercial urine tests exist; however, none have been widely accepted into routine clinical practice due to poor performance and/or poor evidence [1][2][3]. Many tests are based on levels of proteins or RNA and, as these are not unique to UBC or causally linked to the disease, they tend to lack specificity and are often not detectably elevated in small or low-grade tumours [4]. The ideal non-invasive test should detect all UBCs whilst not generating false-positive results from non-malignant urological conditions. DNA-based biomarkers (methylation, single nucleotide variants, and copy number variants) can be detected in urinary DNA and could be used for the non-invasive detection and characterisation of UBC [5]. Deep sequencing has enabled both the large-scale identification of somatic mutations (SMs) in UBC [6] and the sensitive detection of SMs in urinary DNA [7][8][9][10][11]. However, whole genome sequencing at sufficient depth to detect SMs at low mutant allele frequencies (MAFs) remains expensive; thus, to make a test affordable and interpretable, targeted sequencing of the minimum number of SMs that provide sufficient information is desirable. With optimisation of biomarkers and sample processing, highly sensitive and specific tests could be developed. Notwithstanding, most urine DNA-based studies have utilised DNA extracted from the cell pellets of centrifuged urine (cpDNA) [7,12,13]; however, several studies have reported that cell-free DNA (cfDNA) from supernatants of centrifuged urine better represents the genomic changes in UBC [14][15][16].
The primary objective of the present study was to develop a focused panel of SMs present in the majority of UBCs. Our secondary objectives were to investigate the prognostic utility of this panel and to compare the identification of these SMs in urinary cpDNA and cfDNA as a stepping-stone to the development of a non-invasive diagnostic and prognostic clinical assay. We used a combination of publicly available data and in-house exome sequencing to select candidate SMs for inclusion; many of the SMs are directly involved in UBC pathogenesis [6]. This panel of SMs in 23 genes was validated by amplicon deep-sequencing of primary UBCs from 956 patients. We subsequently used deep-sequencing to identify the tumour tissue SMs in matched urine samples comprising 314 urine cpDNAs and 153 urine cfDNAs. Amplicon sequencing and a capture-based approach were compared for SM detection in urinary DNAs.

Patients and Samples
Biospecimens were collected as part of the Bladder Cancer Prognosis Programme (BCPP, ethics approval 06/MRE04/65). Patients were recruited consecutively from 2005 to 2011 from 10 hospitals in the West Midlands (UK) and gave informed consent for enrolment based upon initial cystoscopic findings suggestive of primary UBC. All patients were newly diagnosed and treatment-naïve at biospecimen collection, and were subsequently treated and monitored according to contemporary European Association of Urology (EAU) guidelines (including re-resection where indicated) and EAU risk groups (for non-muscle-invasive bladder cancer [NMIBC]). Inclusion and exclusion criteria are detailed elsewhere [17]. Where necessary, tumour grade and stage records were amended according to results of early re-resection or cystectomy. We used the 1973 grade classification as it was in universal use in the UK at the time of patient recruitment, is the basis for the European Organisation for the Research and Treatment of Cancer (EORTC) and EAU NMIBC risk tables [18], and has comparable utility to the 2004/2016 classification [19]. For quality assurance, 10% of diagnostic formalin-fixed paraffin-embedded tumour samples were retrieved from local histopathology departments and underwent expert pathological review. All included tumours were purely or predominantly TCCs.
Urine (30-50 mL) was placed on ice, centrifuged within 8 h (600 × g for 10 min), and supernatant and pellet stored at −80°C. Tissues were collected at transurethral resection of bladder tumour (TURBT), snap-frozen, and stored at −80°C. DNA was extracted from tissues (25 mg) and blood (100 µL) using DNeasy Blood and Tissue kits (Qiagen, Hilden, Germany). DNA was extracted from urine pellets and supernatants (10 mL) using Quick-DNA Urine kits (Zymo Research, Irvine, CA, USA). DNA concentrations were determined fluorimetrically (Qubit; Thermo Fisher Scientific Inc., Waltham, MA, USA). We analysed: tumour DNA from 956 patients (along with 402 matched blood samples to discriminate between mutations and polymorphisms), urine cpDNA from 314 of these 956 patients, and paired urine cfDNA from 261 of these 314 patients where >10 mL urine supernatant was available (Fig. 1).

Library Preparation and Sequencing
Amplicon libraries were prepared by multiplex-PCR: primers were divided between two 30-cycle target-specific PCRs using 5 ng DNA for each and KAPA robust polymerase. The PCR products were combined and barcoded in a 15-cycle PCR using Phusion high-fidelity polymerase [8]. Up to 384 barcoded libraries were sequenced (2 × 150 bp) on a NextSeq mid-output flow-cell to a mean read depth of 5000×.
Capture-based libraries incorporating unique molecular identifiers (UMIs) were prepared according to the manufacturer's protocol using 20 ng DNA (Cell3™ Target; Nonacus Ltd, Birmingham, UK) and sequenced as above to a mean consensus read depth of 2200×. Briefly, DNA was enzymatically fragmented, end-repaired and A-tailed, followed by ligation of adapters containing UMIs and incorporation of sample barcodes by PCR. Libraries were pooled and hybridised to biotinylated probes overnight, followed by bead capture, amplification, and sequencing. A detailed workflow is available at nonacus.com.

Bioinformatics and Data Analysis
Amplicon sequencing reads were aligned to the human genome (Hg19) using bowtie, and reference and non-reference read depths were extracted using bam-readcount. Only Q >30 base-calls were considered, and a variant was called when non-reference reads exceeded 2.5% of the total read depth and numbered at least 10, as described previously [8]. All mutations included in the 23-gene panel had to meet the criteria of ≥10% MAF in ≥1 tumour and <2.5% in germline DNA. We used Sanger sequencing to confirm 50 such mutation calls, with 100% accuracy. With the exception of the well-known TERT promoter mutations, only mutations classified as moderate or high impact by Variant Effect Predictor [20] were considered. Reads from the capture-based libraries were aligned using the Burrows-Wheeler Alignment tool (BWA), and UMI sequences were extracted as part of the i7 index read and used to annotate the aligned reads on a per original molecule basis. Using a proprietary bioinformatic pipeline, consensus reads were built where at least two reads contained the same UMI sequence and had identical genomic start and stop coordinates. Variant calls required a minimum of four supporting consensus reads.
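The calling thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (the capture-based pipeline is proprietary); the function names, the tuple layout of the reads, and the rule that a UMI family must agree unanimously on the base at a site are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_BASE_QUALITY = 30   # only base calls with Q > 30 are counted
MIN_MAF = 0.025         # non-reference reads must exceed 2.5% of total depth
MIN_ALT_READS = 10      # and there must be at least 10 non-reference reads

def count_reads(pileup, ref_base):
    """Count reference and non-reference base calls at one site.
    pileup is a hypothetical list of (base, quality) pairs."""
    ref = alt = 0
    for base, quality in pileup:
        if quality <= MIN_BASE_QUALITY:
            continue  # discard low-quality base calls
        if base == ref_base:
            ref += 1
        else:
            alt += 1
    return ref, alt

def call_amplicon_variant(ref_reads, alt_reads):
    """Apply the amplicon thresholds to quality-filtered read counts."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return False
    return alt_reads >= MIN_ALT_READS and alt_reads / depth > MIN_MAF

def build_consensus_families(reads, min_family_size=2):
    """Group reads sharing UMI, start and stop; return one consensus base per family.
    Each read is a hypothetical (umi, start, stop, base_at_site) tuple."""
    families = defaultdict(list)
    for umi, start, stop, base in reads:
        families[(umi, start, stop)].append(base)
    consensus = []
    for bases in families.values():
        if len(bases) < min_family_size:
            continue  # a single read cannot form a consensus
        if len(set(bases)) == 1:  # assumed rule: unanimous agreement only
            consensus.append(bases[0])
    return consensus

def call_capture_variant(consensus_bases, alt_base, min_supporting=4):
    """Require at least four consensus reads supporting the variant."""
    supporting = sum(1 for base in consensus_bases if base == alt_base)
    return supporting >= min_supporting
```

For example, 150 non-reference reads at a depth of 5000 (3.0%) would pass the amplicon thresholds, whereas 100 reads at the same depth (2.0%) would not.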

Prognostic Utility of Frequently Mutated Genes
Kaplan-Meier curves were constructed to investigate the effect of mutated genes on outcomes (disease-specific survival, overall survival, and, where appropriate, progression-free and recurrence-free intervals). Hazard ratios (HRs) and P values presented with Kaplan-Meier curves were obtained by fitting univariate Cox models to the respective datasets. To account for confounding, base models including key influential factors were developed for each population and the relevant genes, then individually included in this model. If ≥2 genes were found to be significant (P < 0.1) in a population for a specific outcome when included with the base model, further Cox models were constructed. These included every appropriate pair of genes in addition to the base model. Conditions were applied to the genes that were evaluable and to the outcomes suitable for modelling. More details are given in Appendix S1 (Supplementary data).
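As a minimal sketch of how a Kaplan-Meier curve is obtained from follow-up data, the product-limit estimate can be written in a few lines of Python. The function name and input layout are assumptions for illustration; the Cox model fitting used for the HRs and P values is not reproduced here and would normally be done with a statistics package.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.
    times: follow-up durations; events: 1 if the event was observed, 0 if censored.
    Returns (time, survival probability) pairs at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events occurring at time t
        leaving = 0    # all subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Censored subjects reduce the number at risk without lowering the survival estimate, which is why outcomes with few events give wide, uninformative curves.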

Prognostic Utility of Frequently Mutated Genes
Across the entire cohort, TERT, FGFR3, TP53 and RAS were significantly associated with overall and disease-specific survival (Fig. 3). RAS mutations remained significantly associated with better overall survival when adjusting for EAU risk factors (HR 0.60, 95% CI 0.37-0.97; P = 0.04). There were insufficient events to adjust by EAU risk factors for disease-specific survival.
The influence of mutated genes on time to recurrence and overall survival was investigated in patients with NMIBC (there were too few events to consider progression and disease-specific survival). Mutations in RXRA, RHOB and the TERT promoter were associated with shorter time to recurrence (P < 0.05; Fig. 4), and remained significant after adjusting for gender and EAU risk group. RAS mutations were significantly associated with better overall survival after adjusting for gender and EAU risk group (P < 0.01).
We also analysed the influence of SMs on time to progression and disease-specific survival in patients with high-risk NMIBC; the association between FGFR3 mutations and longer time to progression approached significance (HR 0.35, 95% CI 0.12-1.05; P = 0.06). None of the genes were significantly associated with disease-specific survival; although survival curves for RAS mutant and wildtype in high-risk NMIBC patients diverged, there were too few events to calculate statistical significance by Cox model (Appendix S1. Supplementary data).

DNA Yield from Urine Pellets and Urine Supernatants
In 261 paired urinary cfDNAs and cpDNAs, the median cfDNA yield was 4.5 ng/mL of urine compared with 52 ng/mL for cpDNA. Using a minimum DNA input of 10 ng for amplicon sequencing enabled 74% of urine supernatants to be utilised, compared with >90% of pellets. Across the 261 urine samples there was no correlation between supernatant and pellet DNA yields (Fig. 5).
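The relationship between yield per mL of urine and the 10 ng input requirement can be made explicit with a short Python sketch. It assumes that each extraction used 10 mL of urine, as stated for the extraction step in the Methods, so that a sample is usable when its yield is at least 1 ng/mL; the function names and the example concentrations are illustrative only.

```python
EXTRACTION_VOLUME_ML = 10   # assumed urine volume per extraction
MIN_INPUT_NG = 10           # minimum DNA input for amplicon sequencing

def total_yield_ng(conc_ng_per_ml, volume_ml=EXTRACTION_VOLUME_ML):
    """Total DNA recovered from one extraction."""
    return conc_ng_per_ml * volume_ml

def usable_fraction(concentrations):
    """Fraction of samples whose total yield meets the input requirement."""
    usable = [c for c in concentrations if total_yield_ng(c) >= MIN_INPUT_NG]
    return len(usable) / len(concentrations)
```

Under this assumption, the median cfDNA yield of 4.5 ng/mL corresponds to 45 ng per extraction, comfortably above the 10 ng threshold.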

Detection of SMs in Paired Tumour and Urine cfDNA
Amplicon sequencing was used to analyse cfDNA from 153 patients with tumour SMs, cpDNA data, and >10 ng cfDNA. Of 437 SMs identified in tumour DNA, 353 were detected in urinary cfDNA (80.7%), and ≥1 SM was found in 128 cfDNAs (83.8%). This compares favourably with the detection of 326 SMs in the corresponding cpDNAs (74.6%), and the detection of ≥1 SM in 118 cpDNAs (77.3%). The allele frequencies of mutations detected in 153 paired cpDNAs and cfDNAs were positively correlated (r_s = 0.86; Fig. 6), with a median MAF of 24.5% in cfDNA vs 18.9% in cpDNA (P < 0.001). The median MAF in Grade 1, 2 and 3 disease was 2.2%, 26.1% and 36.7% for cfDNA and 3.2%, 20.4% and 29.8% for cpDNA, respectively. The proportions of mutations identified in individual genes in each type of DNA (tumour tissue DNA, urinary cpDNA, and urinary cfDNA) are shown in Fig. 7.
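The rank correlation between paired cpDNA and cfDNA MAFs can be sketched in plain Python as follows, assuming that the reported coefficient is Spearman's rank correlation. This is an illustrative implementation (average ranks for ties, then Pearson correlation of the ranks), not the analysis code used in the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks are used, the coefficient is insensitive to the systematically higher MAFs in cfDNA and reflects only whether the two DNA sources order the mutations similarly.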

Capture-Based cpDNA and cfDNA Analysis
In paired cpDNAs and cfDNAs from 45 patients, SMs were detected by a capture-based method (whereby consensus read building removes PCR and sequencing errors, permitting detection of MAFs >10-fold lower than standard amplicon sequencing [21]). All 45 pairs of samples were from patients with SMs identified in tumour tissue; for 30 patients, SMs were not detected in cpDNA by amplicon sequencing ('false negatives') and for 15 patients they were ('true positives'). All expected tumour SMs were detected in the true positive cpDNAs and corresponding cfDNAs; MAFs from the amplicon and capture-based methods were closely aligned (Fig. 8a), confirming the strong correlation between cpDNA and cfDNA MAFs (Fig. 8b).

Discussion
DNA-based urinary biomarkers have emerged as the frontrunners for the non-invasive detection of UBC. The ideal DNA-based non-invasive diagnostic test for bladder cancer would utilise the minimal amount of sequencing to obtain optimal sensitivity across all grades and stages of disease, whilst utilising a DNA substrate abundant in the majority of urine samples. In the present study, we describe such a test, identifying 451 SMs in 23 genes that, overall, were present in 96% of UBCs. Many commonly mutated large tumour suppressor genes (e.g. KDM6A, KMT2D) with SMs widely distributed across the gene were unsuitable for inclusion. Notwithstanding, this panel demonstrates potential for non-invasive detection of UBC via urinary DNA.
The distribution of common SMs across stages and grades of UBC in this cohort is consistent with previous data [22,23]. Also consistent with the literature [24][25][26], we found that TP53, FGFR3 and TERT promoter mutations are predictive of survival in univariate analyses, but are not significant in multivariate analyses adjusting for accepted risk factors [1]. RAS mutations are associated with improved survival and remain so after adjusting for EAU risk factors. As RAS mutations are known activators of a known oncogene, it is unlikely that they are beneficial per se and more likely that they co-occur with favourable events or are mutually exclusive with unfavourable events. We caution that RAS mutations have not been reported as prognostic in UBC in other large datasets [6,23]. There have been contradictory reports as to whether PIK3CA mutations are prognostic [23,27,28], but we demonstrate no relationship. Additionally, we find that RXRA, RHOB and TERT mutations are all associated with decreased recurrence-free interval in NMIBC.
We have shown that 71% of SMs harboured by UBCs can be detected in corresponding urine cpDNA by amplicon sequencing (2.5% MAF threshold) resulting in the detection of 76% of mutation-positive tumours. Capture-based analysis of cpDNAs confirmed that the SMs detected at >2.5% MAF by amplicon sequencing were genuine, and that decreasing the limit of detection to 0.2% MAF increases the number of SMs detected. If we had applied the capture-based approach to all cpDNAs we hypothesise that up to 94% of all SMs could have been detected, potentially identifying 95% of mutation-positive tumours.
Using amplicon sequencing, tumour SMs were detected in 78% of cfDNAs, and cfDNA and cpDNA MAFs were correlated, as previously demonstrated [29]. There was a small (5%) but significant (P < 0.001) increase in average MAF in cfDNA relative to cpDNA. We verified these data using a capture-based approach with improved analytical sensitivity; this method is less error-prone, extremely sensitive (due to UMIs and consensus reads), and quantitative (sequencing reads can be mapped back to individual DNA molecules). Using this method we found that all tumour SMs that can be detected at >2.5% MAF in cpDNA were also detected in cfDNA, and that 80% of the SMs missed in cpDNA at >2.5% MAF can be detected in cfDNA at >0.2% MAF. We hypothesise that if the capture-based method had been applied to all cfDNAs then up to 95% of all SMs could have been detected, potentially identifying 97% of mutation-positive tumours.
Our present data also show that sequencing selected regions of 10s of genes (rather than 100s of genes) could provide the basis for a non-invasive diagnostic test for UBC, with high sensitivity for all grades and stages of disease. The majority of false negative urine samples were due to undetectably low MAFs in cpDNA and cfDNA, and not due to the absence of mutations in the tumour. Thus, the technical ability to identify very low MAFs should be a key component of any such test.
Other workers have also utilised cpDNA and targeted deep sequencing for the identification of genomic alterations in urine samples from patients with UBC [7,9,30]; however, few studies have directly compared cpDNA and cfDNA by targeted deep sequencing in this setting [11,31]. Although cpDNA is conventionally utilised for urinary biomarker studies (principally due to higher yields than cfDNA), we have shown that SM detection in urinary cfDNA works as well as (or marginally better than) SM detection in cpDNA.
Notwithstanding, >25% of urine supernatants yielded <1 ng/mL DNA and were unsuitable for analysis; thus, the abundance of urinary cpDNA likely outweighs the marginal advantages of cfDNA. Preparing cpDNA and cfDNA in parallel, and then either analysing both, or cpDNA whenever possible and cfDNA in cases where cpDNA extraction fails, would reduce the number of untestable samples. To improve cfDNA yields per urine sample in the future, the development of economical and efficient methods to extract cfDNA from larger urine volumes (>100 mL) would facilitate the widespread applicability of urinary cfDNA analysis.
Our primary objective was to develop a focused panel of SMs present in the majority of UBCs and, secondarily, to investigate its prognostic utility and detection in cpDNA and cfDNA as a stepping-stone to the development of a clinical diagnostic assay. Validation in another cohort of patients with UBC will be required to translate these findings, as well as the presentation of sensitivities and specificities from participants with and without UBC; this work is ongoing. However, we consider the data presented here to be of interest to both the UBC and liquid biopsy research communities, with additional novel findings relating to prognosis. Furthermore, recent evidence also suggests that mutations in four of the genes within our panel (ERCC2, FGFR3, PIK3CA and ERBB2) are associated with response to cisplatin-based neoadjuvant chemotherapy for MIBC [32,33], and FGFR inhibitors are in clinical trials for patients with advanced MIBC [34], thus demonstrating additional potential utility of our panel. However, with regard to treatment selection for FGFR inhibition (and also of relevance to ERBB2), it was noticeable that the identification of actionable mutations by amplicon sequencing was better in tumour tissue DNA than in urinary DNA (Fig. 7); notwithstanding, the collection, shipping, handling and processing of liquid biopsies for such assays is generally easier than for conventional tumour biopsies, with the added benefits of abundance and the potential for repeat testing.
It should also be noted that patients in the present study all had primary UBC with urine samples collected pre-TURBT and tumour samples collected at TURBT; confirmation is required regarding the sensitivities and specificities of mutation detection in UBC surveillance urine samples (both NMIBC surveillance, and MIBC surveillance following bladder-preservation), and the potential confounding effects of urothelial field change, radiotherapy, and other urological conditions. Again, this work is ongoing.

Conclusions
We have described key components of a potential non-invasive diagnostic test for bladder cancer based upon a 23-gene panel, which also demonstrates additional utility for risk stratification and the possibility of therapeutic response prediction in specific settings. SMs can be reliably detected in urinary cpDNA and cfDNA, although the technical capability to identify very low MAFs is essential to reliably detect UBC regardless of the use of cpDNA or cfDNA. Given the higher yields of cpDNA per urine sample, cfDNA could be used to corroborate cpDNA results or if cpDNA yields are insufficient.