Aristolochic acid mutational signature defines the low-risk subtype in upper tract urothelial carcinoma

Rationale: Dietary exposure to aristolochic acids and similar compounds (collectively, AA) is a significant risk factor for nephropathy and subsequent upper tract urothelial carcinoma (UTUC). East Asian populations, who have a high prevalence of UTUC, have an unusual genome-wide AA-induced mutational pattern (COSMIC signature 22). Integrating mutational signature analysis with clinicopathological information may demonstrate great potential for risk ranking this UTUC subtype. Methods: We performed whole-genome sequencing (WGS) on 90 UTUC Chinese patients to extract mutational signatures. Genome sequencing data for urinary cell-free DNA from 26 UTUC patients were utilized to noninvasively identify the mutational signatures. Genome sequencing for primary tumors on 8 out of 26 patients was also performed. Metastasis-free survival (MFS) and cancer-specific survival (CSS) were measured using Kaplan-Meier methods. Results: Data analysis showed that a substantial proportion of patients harbored the AA mutational signature and were associated with AA-containing herbal drug intake, female gender, poor renal function, and multifocality. Field cancerization was found to partially contribute to multifocality. Nevertheless, AA Sig subtype UTUC patients exhibited favorable outcomes of CSS and MFS compared to the No-AA Sig subtype. Additionally, AA Sig subtype patients showed a higher tumor mutation burden, higher numbers of predicted neoantigens, and infiltrating lymphocytes, suggesting the potential for immunotherapy. We also confirmed the AA signature in AA-treated human renal tubular HK-2 cells. Notably, the AA subtype could be ascertained using a clinically applicable sequencing strategy (low coverage) in both primary tumors and urinary cell-free DNA as a basis for therapy selection. Conclusion: The AA mutational signature as a screening tool defines low-risk UTUC with therapeutic relevance. The AA mutational signature, as a molecular prognostic marker using either ureteroscopy and/or urinary cell-free DNA, is especially useful for diagnostic uncertainty when kidney-sparing treatment and/or immune checkpoint inhibitor therapy were considered.


Introduction
Approximately 90-95% of urothelial carcinoma (UC) occurs as urothelial carcinoma of the bladder (UCB), with upper tract urothelial carcinoma (UTUC) accounting for 5-10% [1]. Although they have a similar histopathologic appearance, UTUC is a distinct clinical entity with an aggressive clinical behavior and a more advanced presentation compared to UCB [2]. East Asian regions, such as China, have a much higher UTUC prevalence, accounting for more than 30% of UCs [3][4][5].
Conceivably, lifestyle behaviors in risk factors for UTUC, such as aristolochic acid (AA)-containing herb drug consumption, may account for the observation [3,[6][7][8]. It has been shown that the AA-induced mutational signature is characterized by A:T to T:A transversions [8,9]. The unusual genome-wide AA signature, termed signature 22 in COSMIC, holds great potential as "molecular fingerprints" for AA exposure in UTUC [8] and multiple cancer types [7,9].
Clinical interest in the AA-related subtype is increasing, especially in East Asia, since AAassociated UTUCs are prevalent [3,10]. Integrating mutational signature analysis with clinicopathological data may be a crucial step toward personalized treatment strategies for this UTUC subtype. However, due to the lack of WGS data and incomplete clinicopathological data in East Asian UTUC patients, the clinical significance of AA-related subtype UTUC remains largely unknown. In this study, we performed WGS on tumor tissues of 90 Chinese UTUC patients. Using the AA mutational signature as a screening tool, we were able to identify the AA Sig subtype of patients who should be stratified into a low-risk group, although with a high incidence of multifocality. As tumor stage is difficult to assert clinically in UTUC, this signature will identify those diagnostic uncertainties when kidney-sparing treatment and/or immune blockade therapy is considered.

Genomic characterization of Chinese upper tract urothelial carcinoma
We sequenced 90 UTUC samples ( Figure S1) using WGS (~30X) and identified a median of 19,639 (interquartile range (IQR), 16,578 to 32,937) SNVs and a median of 2,197 (IQR, 1,615 to 2,650) indels. A median of 437 (IQR, 355 to 584) coding mutations was noted ( Figure S2A). Combined with the clinical information, we found that the increased mutation load was consistent with the prevalence of exposure to potent mutagen AAs ( Figure S2B). Next, we examined the candidate driver genes with MutSigCV (Q<0.05). Only two genes, TP53 and FRG2C, were identified ( Figure S2C). However, many genes listed in the Cancer Gene Census as known driver genes were affected by nonsilent mutations, including genes that are frequently mutated in Western UTUC patients [11,12], such as KMT2A, C and D (27%) and ARID1A (14%) ( Figure S2C). Consistent with a previous study [13], hotspot driver mutations in the promoter of TERT (22% of all the patients) and FGFR3 mutations (2%) were identified in our cohort but at a much lower frequency ( Figure S2C). Overall, the landscape of point alterations in our UTUC cohort and Western UTUC cohort was similar, but the prevalence of mutations differed.

AA mutational signature defines an etiologically distinct subgroup with favorable outcomes
Next, we explored the dynamic interplay of risk factors and cellular processes using mutational signature analysis. We identified 23 mutational signatures defined by COSMIC in our cohort by MutationalPatterns [14]. Hierarchical clustering based on the number of SNVs attributable to each signature confirmed two major subtypes: AA Sig and No-AA Sig ( Figure 1A). There was a significantly higher number of signature 22 mutations in the AA Sig subtype than in the No-AA Sig subtype ( Figure 1B, Wilcoxon rank test, p<0.001). Consistent with previous epidemiological studies in Asian patients [6,[15][16][17], we found that the AA Sig subtype was significantly associated with AA-containing herb drug intake, poor renal function, female sex, multifocality and lower T stage ( Figure 1C, Table 1). In addition, a Kaplan-Meier plot revealed that the AA Sig subtype exhibited favorable outcomes compared with the No-AA Sig subtype in both cancer-specific survival (CSS) ( Figure 1D, log-rank, p=0.038) and metastasis-free survival (MFS) ( Figure  1E, log-rank, p=0.039). Consistently, the AA Sig subtypes also exhibited favorable outcomes in muscle-invasive UTUC patients (log-rank, p=0.028 for CSS, log-rank, p=0.028 for MFS) ( Figure 1F-G). Therefore, the AA mutational signature defines an etiologically distinct subgroup with favorable outcomes.

Field cancerization may contribute to multifocality in the AA Sig subtype
Consistent with our previous epidemiological study [17], we found an increased rate of multifocality and high bladder recurrence in the AA Sig subtype ( Figure S2B, Table 1). Field cancerization [18], which is the development of a field with genetically altered cells, has been proposed to explain the development of multiple primary tumors and local recurrence. Therefore, we further sequenced three AA Sig subtype patients, including a multifocal patient ( Figure S3 and Table S1). We did find similar numbers of SNVs and indels in the morphologically normal urothelium specimens in the multifocal patient (Table  S2). Strikingly, the AA mutational signature was consistently identified in urothelium specimens and tumor tissues in multifocal patients (Figure 2A-B), which indicated that AA exposure may contribute to field cancerization. However, copy number alterations were not identified in the urothelium tissues of this patient ( Figure 2C). Moreover, in the multifocal patient, the urothelial tumor in the renal pelvis from 2007 shared no genetic alterations with a renal pelvis tumor from 2015 or a bladder tumor from 2015. However, the two tumors from 2015 were genetically related ( Figure 2D). Therefore, field cancerization and intraluminal seeding could co-contribute to multifocality and increased bladder recurrence in AA Sig subtype patients. The AA mutational signature was consistently identified in urothelium specimens and tumor tissues in the other two AA Sig subtype patients ( Figure 2E). Strikingly, we found copy number alterations in urothelium specimens of an AA Sig subtype patient ( Figure 2F). Similarly, the copy number alteration pattern in urothelium specimens was not consistent with the matched tumor, which suggested that AA may also induce copy number alterations in the morphologically normal urothelium.

Potential for immunotherapy in the AA Sig subtype of UTUC patients
The AA Sig subtype bears high mutation burdens and thus may be a good candidate for immune checkpoint blockade therapy [19]. We predicted neoantigens binding to patient-specific human leukocyte antigen (HLA) types. The AA Sig subtype had the highest number of predicted neoantigens ( Figure 3A). Moreover, it has been reported that lymphocyte infiltration, especially CD3 + lymphocytes, in the tumor region is associated with improved survival in a range of cancers, including urothelial cancer [20][21][22], and the number of tumorinfiltrating lymphocytes independently correlates with progression-free survival in non-small-cell lung carcinoma patients treated with nivolumab immunotherapy. Therefore, we further evaluated the extent of tumor-infiltrating mononuclear cells (TIMCs) and CD3 + lymphocytes in 76 available samples (Table S3). We found that the number of CD3 + lymphocytes was positively associated with the number of stromal TIMCs (R 2 =0.74; p<0.001) ( Figure  3B). The AA Sig subtype had higher numbers of both stromal TIMCs (Wilcoxon rank test, p<0.001) and CD3 + lymphocytes (Wilcoxon rank, p<0.001) ( Figure  3C-D). Representative images from a patient with the AA Sig subtype and the No-AA Sig subtype are shown in Figure 3E and 3F, respectively.  AA mutational signature as "molecular fingerprint" for inferring previous AA exposure and AA Sig subtype patients by urinary cell-free DNA First, we experimentally verified that AA alone, as a purified isolated compound, was sufficient to cause AA mutational signatures in human renal tubular cell HK-2 and human uroepithelium cell SV-HUC-1. We found that HK-2 cells (IC50 31.97 µM) were more sensitive to AA treatment than SV-HUC-1 cells (IC50 43.78 µM) ( Figure 4A). Next, we treated HK-2 cells (4.5 months) and SV-HUC-1 cells (3 months) with AA at 50% IC50, which resulted in the development of mixed clones. Consistent with previous studies in several types of cells [7,23,24], we identified AA mutational signature mutations in HK-2 cells ( Figure 4B). The proportion of the AA mutational signature in HK-2 cells was more evident when called mutations were filtered by untreated cells ( Figure 4B). However, SV-HUC-1 cells hardly proliferated following AA exposure, and the AA mutational signature was not identified ( Figure S4A). Copy number changes were also identified in AA-treated HK-2 cells but not in SV-HUC-1 cells ( Figure S4B). Further mechanism study of AA exposure in SV-HUC-1 cells should maintain cells in culture with occasional passaging until cultures emerged from senescence as previously reported in human primary murine embryonic fibroblasts [23,24].
Moreover, we identified an AA mutational signature in histologically "normal" urothelial cells.
In a published study [25], a similar "field effect" was also identified in a Chinese UTUC patient. Thus, we evaluated whether we can take advantage of this "field effect" and used a noninvasive urine test to screen AA Sig subtype patients in urine sediment and/or cell-free DNA. One practical question that arises is how this approach could be implemented clinically. It has been shown that successful detection of the AA signature in urothelial tumors using archived FFPE specimens and low-coverage exome sequencing [26]. Thus, using our published low-coverage WGS data [27] for urinary cell-free DNA from 26 patients with UTUC, we detected a large proportion of the AA mutational signature (nearly >15%) in 4 out of 26 patients ( Figure 4C). To further validate whether the AA mutational signature in urinary cell-free DNA inferred the AA Sig subtype patients, we performed whole-genome sequencing in primary tumors of 8 urinary cell-free DNA matched patients: 4 with a high proportion (nearly >15%) and 4 with a very low proportion (<5%) of AA mutational signatures in cell-free DNA. Consistently, high proportion (>50%) of AA signature mutations was identified in matched tumor tissues in all 4 patients with high proportion (nearly >15%) of AA signature mutations in cfDNA but not in any of the 4 patients with low proportion (<5%) of AA signature mutations in cfDNA ( Figure 4D). Taken together, we confirmed that the AA mutational signature in cell-free DNA can infer the AA Sig subtype patients.

Discussion
In this study, the comprehensive genomic analysis of Chinese UTUC patients shows a significant association of the AA mutational signature with the consumption of AA-containing herbal formulations. Fifty-nine percent of patients (16/ 27) with a self-reported AA intake history of over 1 year were classified as AA Sig subtype, while there are also 17% (11/63) of patients without self-reported AA intake history which were classified as AA Sig subtype (Table 1, Supplementary file 1). This finding may be due to the difficulty of tracking the dosage of AAs from various herbal remedies. In our study, only 70 AA-containing single products and mixed herbal formulas were considered (Table S4). Nevertheless, AA-containing herbs should be discouraged for clinical use due to their nephrotoxic and UTUC-promoting potential.
Notably, our AA Sig subtype patients presented favorable clinical outcomes. This finding is contrary to previous reports in which the AA Sig subtype was significantly associated with multifocality and impaired renal function. According to the latest EAU guidelines, multifocality as a preoperative prognostic factor would define UTUC patients as high-risk, and radical nephroureterectomy is the standard intervention for such patients [1]. Very few AA Sig subtype patients can receive chemotherapy due to aggravation of renal function after radical surgery. AA Sig subtype patients normally experience multiple surgeries and eventually bilateral nephrectomy, such as the multifocal patient of the present study (one side radical nephroureterectomy in 2007 and the other side in 2015). However, in the AA Sig subtype being a low-risk group, kidney-sparing surgical management and close and stringent follow-up may be recommended. Furthermore, advanced and/or metastatic disease may be subjected to immune checkpoint blockade therapy. More importantly, we found that even though fewer SNVs were called with low-coverage WGS data ( Figure S5A), the proportion of the AA mutational signature in backgrounds of other mutational signatures can be successfully retrieved by low-coverage WGS data ( Figure S5B). Therefore, using the AA mutational signature as a screening tool by low-coverage WGS data with either diagnostic ureteroscopy or urinary cell-free DNA has great clinical significance for disease management, although this needs further investigation in clinical practice using a larger cohort. Additionally, the AA mutational signature has also been identified in several other types of cancer patients from both East Asia and Europe, such as kidney cancer and liver cancer [7,9,28,29]. The AA mutational signature may serve as a secondary prevention tool by screening for AA-associated cancers or for kidney disease in patients suspected or known to be exposed to AA.
Collectively, our study provides the most comprehensive genomic profile of Chinese UTUC patients to date. Use of the AA mutational signature as a screening tool may accelerate the development of novel prognostic markers and personalized therapeutic approaches for AA Sig subtype UTUC patients.

Patient cohort
All fresh UTUC samples in this study were obtained from Peking University First Hospital (Grant No.2015(977)). These fresh samples were stored in liquid nitrogen immediately after surgery. Formalin-fixed, paraffin-embedded (FFPE) UTUC samples were provided by the Institute of Urology after pathologic diagnosis. The main endpoint events consisted of cancer-specific survival (CSS) and metastasis-free survival (MFS). All patients were not treated with neoadjuvant chemotherapy. The study was approved by the Ethics Committee of Peking University First Hospital.

AA exposure assessment
AA exposure assessment was performed according to self-reported data on 70 AA and its derivative-containing herb drug (collectively, AA) intake [17,30]. These herbs were taken as single products (Guan Mu Tong (Aristolochia manshuriensis Kom), Guang Fangchi (Aristolochia fangchi), Qing Mu Xiang (Radix Aristolochiae), Ma Dou Ling (Fructus Aristolochiae), Tian Xian Teng (Caulis Aristolochiae), Xun Gu Feng (herba Aristolochiae mollissimae), and Zhu Sha Lian (Aristolochia cinnabarina)) or were components of mixed herbal formulas (e.g., Guan Mu Tong in the Long Dan Xie Gan mixture). The accumulated self-reported usage of the above drugs for more than a year was termed AA exposure patients. Clinical and demographic information was obtained from a prospectively maintained institutional database.

Whole-genome sequencing
For whole-genome sequencing, genomic DNA from FFPE cancer samples was isolated using the Quick-DNA™ FFPE Kit and Genomic DNA Clean & Concentrator (Zymo Research, CA, US) and from fresh cancer tissue samples using the QIAmp DNA Mini Kit (QIAGEN Inc., MD, US). The DNA sequencing libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, MA, US) following the manufacturer's instructions. Briefly, the genomic DNA was fragmented using a Covaris S2 Ultrasonicator instrument (Covaris Inc., MA, US). The sheared DNA was repaired and 3' dA-tailed using the NEBNext Ultra II End Repair/dA-Tailing Module unit and then ligated to paired-end adaptors using the NEBNext Ultra II Ligation Module unit. After purification by AMPure XP beads, the DNA fragments were amplified by PCR for 6-8 cycles. The quality of the DNA sequencing library was assessed with a Bioanalyzer 2100 system. Finally, the libraries were pooled and sequenced with the HiSeq X Ten platform (Illumina, San Diego, CA, US) according to the manufacturer's instructions, generating 2 x 150-bp paired-end reads.

Single nucleotide variations and insertions/deletions calling
The quality of short DNA reads was controlled by Trimmomatic [31]. The good quality PE reads were aligned with the human reference genome hg19 (http://genome.ucsc.edu/) using the BWA-MEM tool [32]. The reads mapped with the reference at the same coordinates were removed using Picard. Furthermore, realignment at insertion/deletion sites (indels) and base quality score recalibration (BQSR) were performed following the best practices of the GATK pipeline [33]. Single nucleotide variations (SNVs) and indels were called using VarScan2 [34] and Vardict [35], and then Rtg tools [36] was used to remove variants called in a set of 1000 healthy Chinese individuals [37] and obtain the common variants called by the two software packages. Further filtering criteria were carried out according to reference [38,39]. SNVs meet these criteria were removed: 1) The median shortest distance of the variant position within the read to either aligned end is less than 10; 2) The median absolute deviation of the shortest distance of the variant position within the read to either aligned end is less than 3; 3) The proportion of reads at the variant position with low mapping quality (less than 1) is greater than 10%; 4) The median mapping quality of variant reads is less than 40; 5) The median base quality at the variant position of variant reads is less than 20; 6) The strand bias for variant reads covering the variant position, i.e. the fraction of reads in either direction, is less than 0.02,unless the strand bias for all reads is also less than 0.2; 7) The length of repetitive sequence adjacent to the variant position, where repeats can be 1-, 2-, 3-, or 4-mers, is 12 or more; and 8) The largest number of variant positions within any 50 base pair window surrounding, but excluding, the variant position is greater than 2. Then, we set the mutational allele frequency cutoff as more than 0.25 and less than 0.75 according to the allele frequency distribution of the samples. (SNVs with allele frequency more than 0.75 were more likely germline mutations and SNVs with allele frequency less than 0.25 were more enriched in FFPE samples). Finally, Annovar [40] was applied to remove variants whose mutation frequency was no less than 0.001 in the 1000 Genomes project phase 3, latest Exome Aggregation Consortium (ExAC) dataset, NHLBI-ESP project with 6500 exomes, latest Haplotype Reference Consortium database, and latest Kaviar database. SNVs which are associated with Pathogenic in clinvar or annotated with urinary associated in cosmic were retrieved. When calling SNVs from low-coverage data, we did not filter the SNVs based on allele frequency. Phylogenetic relationships of the six samples from the multifocal patient were deciphered with SNVs using mrbayes_3.2.2 [41].

Mutational signature analysis
The R (3.5.1) package MutationalPatterns [14] was used to determine mutational signatures in each sample by using "fit_to_signatures" following the authors' guidelines. The output files of MutationalPatterns are presented in Supplementary file 2 and 3. We discovered 23 COSMIC signatures that were merged by shared etiologies into 10 signatures in our cohort. We named signature 22 as AA. Hierarchical clustering was performed by the number of SNVs attributable to each signature [42].

Neoantigen prediction
HLAscan [43] was used to genotype the HLA region with HLA-A, HLA-B and HLA-C taken into consideration (default parameters). NetMHC4.0 [44] was used for predictions of peptide-MHC class I interactions. Nonsynonymous SNVs were used to perform this analysis. An in-house script was used to obtain possible 9-amino acid sequences covering the mutated amino acids according to the manual. We counted the strong binders (%rank < 0.5) according to the manual of NetMHC4.0.

Assessing tumor-infiltrating lymphocytes
Tumor-infiltrating mononuclear lymphocytes were measured according to a standardized method from the International Immuno-Oncology Biomarkers Working Group [45]. To evaluate the CD3 + lymphocytes in tumor sections, anti-CD3 antibody (ab5690, 1:10000; Abcam) was used in a histochemical assay.

Statistical analysis
The variables of different groups were compared using the Wilcoxon rank test and Kruskal-Wallis test as indicated. Kaplan-Meier (K-M) analysis was used to evaluate the associations of the classifiers with metastasis-free survival (MFS) and cancer-specific survival (CSS). Two-sided P-values < 0.05 were considered statistically significant. All statistical analyses were conducted using SPSS version 20.0 (IBM Corporation, America) and R (3.5.1).

Study approval
The study was approved by the Ethics Committee of Peking University First Hospital.

Data availability
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive [47] in BIG Data Center [48], Beijing Institute of Genomics (BIG), Chinese Academy of Sciences (accession number HRA000029 and GVM000054 (GVM)). That can be accessed at http://bigd.big.ac. cn/gsa-human/s/HiObV4f3 and https://bigd.big.ac. cn/gvm/getProjectDetail?project=GVM000054.