Abstract
Despite the recent availability of complete tumor genome sequences from thousands of patients, isolating disease-causing (driver) non-coding mutations from the plethora of somatic variants is notoriously challenging, and only a handful of validated examples exist. By integrating whole-genome sequencing, gene expression, chromatin accessibility, and genetic data from The Cancer Genome Atlas (TCGA), we identified 301 non-coding somatic mutations that affect gene expression in cis. These mutations cluster into 36 hotspot regions and act through diverse molecular mechanisms of gene expression regulation. We further show that these mutations have hallmark features of non-coding drivers: they confer a selective growth advantage, functionally disrupt transcription factor binding sites, and contribute to disease progression, as reflected in decreased overall patient survival.
Footnotes
The boxplots in Figures 2 and 3 were revised to show the raw data points. All gene-level allele-specific expression (ASE) estimates and all somatic mutations called by VarScan for 1,165 TCGA samples were deposited in the Dryad database and are freely available to other researchers. These data should be useful to researchers who want to map these somatic mutations onto newly generated cancer-specific or common regulatory features. Researchers can also correlate mutation occurrence with gene-level ASE or with mRNA expression; the mRNA expression data can be downloaded from the Genomic Data Commons (GDC) Data Portal.