Evidence for widespread dysregulation of circadian clock progression in human cancer

The ubiquitous daily rhythms in mammalian physiology are guided by progression of the circadian clock. In mice, systemic disruption of the clock can promote tumor growth. In vitro, multiple oncogenes can disrupt the clock. However, due to the difficulties of studying circadian rhythms in solid tissues in humans, whether the clock is disrupted within human tumors has remained unknown. We sought to determine the state of the circadian clock in human cancer using publicly available transcriptome data. We developed a method, called the clock correlation distance (CCD), to infer circadian clock progression in a group of samples based on the co-expression of 12 clock genes. Our method can be applied to modestly sized datasets in which samples are not labeled with time of day and coverage of the circadian cycle is incomplete. We used the method to define a signature of clock gene co-expression in healthy mouse organs, then validated the signature in healthy human tissues. By then comparing human tumor and non-tumor samples from twenty datasets of a range of cancer types, we discovered that clock gene co-expression in tumors is consistently perturbed. Subsequent analysis of data from clock gene knockouts in mice suggested that perturbed clock gene co-expression in human cancer is not caused solely by the inactivation of clock genes. Furthermore, focusing on lung cancer, we found that human lung tumors showed systematic changes in expression in a large set of genes previously inferred to be rhythmic in healthy lung. Our findings suggest that clock progression is dysregulated in many solid human cancers and that this dysregulation could have broad effects on circadian physiology within tumors. In addition, our approach opens the door to using publicly available data to infer circadian clock progression in a multitude of human phenotypes.


Figure S1
Clock gene co-expression in healthy, wild-type mouse organs. (A) Scatterplots of expression for three clock genes in mouse lung (GSE59396). Each point is a sample, and the color indicates zeitgeber time (ZT), where ZT0 corresponds to "lights on." Expression values of each gene were normalized to have mean zero and standard deviation one. (B) Heatmaps of Spearman correlation between each pair of the 12 clock genes in each mouse dataset used to make the reference. Genes are ordered manually by a combination of name and known function in the clock. GSE54650 includes gene expression from 12 organs, but to maintain diversity in datasets, we used data from only two organs. (C) Heatmap of consistency in sign of Spearman correlation (across the eight datasets) for each pair of clock genes.
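As an illustration of the quantities in this caption, the following minimal sketch normalizes each gene and computes the Spearman correlation for every pair of genes. The gene names and expression values are hypothetical placeholders, the helper functions are not from the original analysis, and the use of the population standard deviation is an assumption, since the caption does not specify it.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def zscore(values):
    """Scale one gene's values to mean zero and standard deviation one.

    Assumption: population standard deviation (the caption does not say which).
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_spearman(expr):
    """expr maps gene name -> expression values (one value per sample)."""
    return {(g1, g2): spearman(expr[g1], expr[g2])
            for g1, g2 in combinations(sorted(expr), 2)}


# Hypothetical toy data: three placeholder genes measured in six samples.
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gene_b": [6.5, 5.0, 4.1, 3.0, 2.2, 1.0],
    "gene_c": [2.0, 2.0, 5.0, 1.0, 4.0, 3.0],
}
rho_raw = pairwise_spearman(expr)
rho_scaled = pairwise_spearman({g: zscore(v) for g, v in expr.items()})
```

Because the Spearman correlation depends only on ranks, the per-gene normalization affects the scatterplots but not the correlation heatmaps; in this sketch `rho_raw` and `rho_scaled` agree.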

Figure S2
Estimated times of peak expression for the 12 clock genes in each of the eight mouse datasets used to make the reference. Times of peak expression were estimated using ZeitZeiger and the time of day information for each sample, as described in the Methods. ZT refers to zeitgeber time and CT refers to circadian time. ZT0 and ZT24 (or CT0 and CT24, for samples collected under constant darkness) are equivalent. Genes are ordered identically to

Clock gene co-expression is insensitive to phase differences, based on data from liver of wild-type mice (GSE13093). CT refers to circadian time. Daytime feeding (food only available from circadian time CT1 to CT9) shifts the phase of the clock in the liver by 12 h, but does not affect clock gene co-expression. (A) Normalized expression of three clock genes over time. (B) Heatmaps of Spearman correlation for each condition. (C) Clock correlation distance (CCD; relative to the mouse reference) and p-value for each condition (calculated by permutation as described in the Materials and Methods).
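The insensitivity to phase can be illustrated with simulated data. The sketch below assumes that the CCD is the Euclidean distance between the vectors of pairwise Spearman correlations of a dataset and of the reference; this definition, the placeholder genes, their peak times, and the cosine model are all assumptions made for illustration and are not taken from the captions. The rank-based shortcut formula used here is valid only when a gene has no tied values.

```python
import math
import random
from itertools import combinations

# Hypothetical peak times (in hours) for three placeholder genes.
PEAK_TIMES = {"gene_a": 0.0, "gene_b": 4.0, "gene_c": 12.0}


def spearman(x, y):
    """Spearman correlation for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def corr_vector(expr):
    """Spearman correlation for every pair of genes, in a fixed order."""
    return [spearman(expr[g1], expr[g2])
            for g1, g2 in combinations(sorted(expr), 2)]


def ccd(expr, reference_vector):
    """Assumed definition: Euclidean distance between correlation vectors."""
    vec = corr_vector(expr)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, reference_vector)))


def simulate(times, shift_h=0.0):
    """Each gene is a 24-h cosine; shift_h delays every gene by the same amount."""
    return {g: [math.cos(2 * math.pi * (t - peak - shift_h) / 24) for t in times]
            for g, peak in PEAK_TIMES.items()}


rng = random.Random(1)
times = [rng.uniform(0, 24) for _ in range(24)]  # unlabeled, irregular sampling

reference = corr_vector(simulate(times))

# Shifting all genes by 12 h leaves the pairwise correlations unchanged.
ccd_shifted = ccd(simulate(times, shift_h=12.0), reference)

# Scrambling one gene across samples breaks its relationships with the others.
scrambled = simulate(times)
rng.shuffle(scrambled["gene_b"])
ccd_scrambled = ccd(scrambled, reference)
```

In this toy setting, a common phase shift changes when each gene peaks but not how the genes relate to one another, so `ccd_shifted` is essentially zero, whereas `ccd_scrambled` is clearly positive.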

Figure S5
Clock gene co-expression is robust to incomplete coverage of the 24-h cycle. (A) Pairwise scatterplots of expression for three clock genes in mouse lung (GSE59396). Each point is a sample. CT refers to circadian time. Daytime samples are light blue (CT0-CT8), nighttime samples are dark blue (CT12-CT20). Expression values of each gene were normalized to have mean zero and standard deviation one. (B) Heatmaps of Spearman correlation for daytime and nighttime samples from three datasets used to make the mouse reference. Samples from GSE11923 and GSE54650 were collected in constant darkness (DD), whereas samples from GSE59396 were collected in alternating light-dark. (C) Clock correlation distance (CCD; relative to the mouse reference) and p-value for each dataset and CT range.

Figure S6
Clock gene co-expression in human datasets that were designed to study circadian rhythms and in which samples are labeled with time of day (or time since synchronization, for in vitro datasets). (A) Scatterplots of expression for three clock genes in human brain (GSE71620). For this dataset, gene expression was measured in postmortem tissue, with zeitgeber time (ZT) based on time of death. Each point is a sample, and the color corresponds to zeitgeber time, where ZT0 corresponds to sunrise. Expression values of each gene were normalized to have mean zero and standard deviation one. The oscillation that was clear in the mouse data is no longer visible, but the correlations have remained. (B) Heatmaps of clock gene co-expression for datasets not shown in Fig. 1D. Datasets from U2OS cells were collected in vitro; the remaining datasets were collected from in vivo tissues.

Figure S7
Clock gene co-expression is generally stronger in mouse datasets than in human datasets. In addition, strength of clock gene co-expression in human datasets not labeled with time of day is comparable to that seen in human datasets designed to study circadian rhythms. For each dataset, we quantified the difference between the 95th and 5th percentiles of the distribution of Spearman correlations between pairs of the 12 clock genes (thus, the maximum difference is 2). Each point corresponds to a dataset.

Figure S11
(A) Delta clock correlation distance (ΔCCD) for each human cancer dataset. P-value corresponds to the probability that a random permutation of condition labels could produce a ΔCCD greater than or equal to the one observed. (B) Clock correlation distance (CCD) for non-tumor and tumor samples from each human cancer dataset.
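The label-permutation test for ΔCCD described above can be sketched as follows. Several details are assumptions rather than statements from the captions: the CCD is taken to be the Euclidean distance between vectors of pairwise Spearman correlations, ΔCCD is taken to be the CCD of tumor samples minus the CCD of non-tumor samples, the p-value counts the observed labeling as one of the permutations, and the four placeholder genes and all data are simulated.

```python
import math
import random
from itertools import combinations

# Hypothetical peak times (hours) for four placeholder genes.
PHASES = {"gene_a": 0.0, "gene_b": 3.0, "gene_c": 12.0, "gene_d": 15.0}
GENES = sorted(PHASES)


def spearman(x, y):
    """Spearman correlation via the rank-difference formula (assumes no ties)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def corr_vector(samples):
    """samples: list of dicts (gene -> value). One correlation per gene pair."""
    return [spearman([s[g1] for s in samples], [s[g2] for s in samples])
            for g1, g2 in combinations(GENES, 2)]


def ccd(samples, reference):
    """Assumed definition: Euclidean distance to the reference correlations."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(corr_vector(samples), reference)))


def delta_ccd(samples, labels, reference):
    """Assumed sign convention: CCD(tumor) - CCD(non-tumor)."""
    tumor = [s for s, lab in zip(samples, labels) if lab == "tumor"]
    normal = [s for s, lab in zip(samples, labels) if lab == "non-tumor"]
    return ccd(tumor, reference) - ccd(normal, reference)


def delta_ccd_pvalue(samples, labels, reference, n_perm=200, seed=0):
    """One-sided p-value from random permutations of the condition labels."""
    rng = random.Random(seed)
    observed = delta_ccd(samples, labels, reference)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if delta_ccd(samples, shuffled, reference) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def simulate(n, amplitude, noise, rng):
    """Toy samples taken at random, unrecorded times of day."""
    out = []
    for _ in range(n):
        t = rng.uniform(0, 24)
        out.append({g: amplitude * math.cos(2 * math.pi * (t - p) / 24)
                    + rng.gauss(0, noise) for g, p in PHASES.items()})
    return out


rng = random.Random(42)
reference = corr_vector(simulate(200, amplitude=1.0, noise=0.0, rng=rng))
normal = simulate(30, amplitude=1.0, noise=0.3, rng=rng)  # intact co-expression
tumor = simulate(30, amplitude=0.0, noise=1.0, rng=rng)   # co-expression lost
samples = normal + tumor
labels = ["non-tumor"] * 30 + ["tumor"] * 30

observed_delta, p_value = delta_ccd_pvalue(samples, labels, reference)
```

Under these assumptions, a positive `observed_delta` means the tumor samples lie farther from the reference than the non-tumor samples do, and a small `p_value` means that few random relabelings produce a difference at least that large.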

Figure S12
Clock gene co-expression is perturbed in tumors of various histological grades. Plots show delta clock correlation distance (ΔCCD) for all combinations of TCGA cancer type and tumor grade that included at least 50 tumor samples. In each case, ΔCCD was calculated using all non-tumor samples of the respective cancer type. Cancer types are ordered by aggregate ΔCCD.

Figure S14
Clock gene co-expression in clock gene knockouts in mice. (A) Heatmaps of Spearman correlation between clock genes in wild-type and knockout samples from seven datasets. Knockout samples in each dataset are from mice in which at least one component of the clock was knocked out either in the entire animal or in a specific cell type. Gene expression was measured in various tissues. Datasets are ordered by descending delta clock correlation distance (ΔCCD). For details of datasets, including sample sizes, see Table S1. (B) ΔCCD between wild-type and knockout samples in each dataset. Positive ΔCCD indicates that the correlation pattern of the wild-type samples is more similar to the mouse reference than is the correlation pattern of the mutant samples.

Figure S15
Loss of rhythmicity in clock gene expression in clock gene knockouts. Signal-to-noise ratio (SNR) of circadian expression for each gene in wild-type and knockout samples from each dataset. SNR was calculated using ZeitZeiger (see Materials and Methods for details); the calculation uses each sample's time of day information. For ease of visualization, three outliers corresponding to wild-type samples with SNR > 20 are not shown.