Widespread dysregulation of the circadian clock in human cancer

The mammalian circadian clock is a critical regulator of metabolism and cell division. Although multiple lines of evidence indicate that systemic disruption of the circadian clock can promote cancer, whether the clock is disrupted in primary human tumors is unknown. Here we used transcriptome data from mice to define a signature of the mammalian circadian clock based on the co-expression of 12 genes that form the core clock or are directly controlled by the clock. Our approach can be applied to samples that are not labeled with time of day and were not acquired over the entire circadian (24-h) cycle. We validated the clock signature in transcriptome data from healthy human tissues, then developed a metric we call the delta clock correlation distance (ΔCCD) to describe the extent to which the signature is perturbed in samples from one condition relative to another. We calculated the ΔCCD comparing human tumor and non-tumor samples from The Cancer Genome Atlas and eight independent datasets, discovering widespread dysregulation of clock gene co-expression in tumor samples. Subsequent analysis of data from clock gene knockouts in mice suggested that clock dysregulation in human cancer is not caused solely by loss of activity of clock genes. Furthermore, by analyzing a large set of genes previously inferred to be rhythmic in healthy human lung, we found that dysregulation of the clock in human lung cancer is accompanied by dysregulation of broader patterns of circadian co-expression. Our findings suggest that clock dysregulation is a common means by which human cancers achieve unrestrained growth and division, and that restoring clock function could be a viable therapeutic strategy in multiple cancer types. In addition, our approach opens the door to using publicly available transcriptome data to quantify clock disruption in a multitude of human phenotypes. Our method is available as a web application at https://hugheylab.shinyapps.io/deltaccd.


Introduction
Daily rhythms in mammalian physiology are guided by a system of oscillators called the circadian clock (1) . The core clock consists of feedback loops between several genes and proteins, and based on work in mice, is active in nearly every tissue in the body (2,3) . The clock aligns itself to environmental cues, particularly cycles of light-dark and food intake (4)(5)(6) . In turn, 1/29 the clock regulates various aspects of metabolism (7)(8)(9) and is tightly linked to the cell cycle (10)(11)(12)(13)(14)(15) .
Consistent with the tight connections between the circadian clock, metabolism, and the cell cycle, multiple studies have found that systemic disruption of the circadian system can promote cancer. In humans, long-term rotating shift work and night shift work, which perturb sleep-wake and circadian rhythms, have been associated with breast, colon, and lung cancer (16)(17)(18)(19) . In mice, environmental disruption of the circadian system (e.g., through severe and chronic jet lag) increases the risk of breast cancer and hepatocellular carcinoma (20,21) . Furthermore, both environmental and genetic disruption of the circadian system promote tumor growth and decrease survival in a mouse model of human lung adenocarcinoma (22) . While these studies support the link from the clock to cancer, other studies have established a link in the other direction, namely that multiple components of a tumor, including the RAS and MYC oncogenes, can induce dysregulation of the circadian clock (23)(24)(25) . Despite this progress, however, whether the clock is actually disrupted in human tumors has remained unclear. Given recent findings that stimulating the circadian clock slows tumor growth in a mouse model of melanoma (26) , it is important to determine the extent of clock disruption across human cancers, in order to delineate the general potential of anti-tumor strategies based on restoring or improving clock function.
When the mammalian circadian clock is functioning normally, clock genes and clock-controlled genes show characteristic rhythms in expression throughout the body and in vitro (2,3,27) . Measurements of these rhythms through time-course experiments have revealed that the clock is altered or perturbed in some human breast cancer cell lines (28,29) . Existing computational methods for this type of analysis require that samples be labeled with time of day (or time since start of experiment) and acquired throughout the 24-h cycle (30)(31)(32) . Unfortunately, existing data from resected human tumors meet neither of these criteria. In this scenario, one approach might be to look for associations between the expression levels of clock genes and other biological and clinical variables. For example, in human breast cancer, the expression levels of several clock genes have been associated with metastasis-free survival (with the direction of association depending on the gene) (33) . However, because a functional circadian clock is marked less by the levels of gene expression than by periodic variation in gene expression, this type of analysis cannot necessarily be used to determine whether the clock is functional.
A more sophisticated approach is to assume the presence of periodic variation and to infer a cyclical ordering of samples, using methods such as Oscope or CYCLOPS (34,35) . By applying CYCLOPS to transcriptome data from hepatocellular carcinoma, Anafi et al. found evidence for weaker or disrupted rhythmicity of several clock genes (as well as genes involved in apoptosis and JAK-STAT signaling) in tumor samples compared to non-tumor samples. Importantly, CYCLOPS does not require that samples be labeled with time of day, but it does require that the samples cover the entire cycle. As a result, the authors recommend that CYCLOPS be applied to datasets from humans with at least 250 samples (35) . In addition, although CYCLOPS can be 2/29 used to infer rhythmicity in the expression of individual genes, it is not designed to quantify the differences in circadian clock function as a whole between multiple conditions. Rather than attempting to infer an oscillation, an alternative approach might be to take advantage of the pattern of co-expression (e.g., pairwise correlation) that results from different clock genes having rhythms with different characteristic phases. Indeed, a previous study found different levels of co-expression between a few clock genes in different subtypes and grades of human breast cancer (33) . Although this finding was an important first step, its generalizability has been limited because the correlations in expression were not examined for all clock genes, in other human cancer types, or in healthy tissues where the circadian clock is known to be functional. Thus, a definitive answer to whether the circadian clock is functional across the spectrum of human cancers is still lacking.
The goal of this study was to determine whether the circadian clock is functional in human cancer. Using transcriptome data from mice, we defined a robust signature of the mammalian circadian clock based on the co-expression of clock genes. We validated the signature using transcriptome data from various organs in humans, then examined the extent to which the signature was perturbed in tumor compared to non-tumor samples from The Cancer Genome Atlas (TCGA) and from multiple independent datasets. Our findings suggest that the circadian clock is dysfunctional in a wide range of human cancers.

Consistent correlations in expression of clock genes in mice
The progression of the mammalian circadian clock is marked by characteristic rhythms in gene expression throughout the body (3) . We hypothesized that the relative phasing of the rhythms of different genes would give rise to a characteristic pattern of correlations between genes. Such a pattern could be used to infer the activity of the clock, even in datasets in which samples are not labeled with time of day (Fig. 1A). To test this hypothesis, we first collected eight publicly available datasets of genome-wide, circadian gene expression from various mouse organs under both constant darkness and alternating light-dark cycles (3,12,(36)(37)(38)(39)(40) (Table S1). We focused on 12 genes that are part of the core circadian clock or are directly controlled by the clock and that exhibit strong, consistently phased rhythms in expression in multiple organs (32) . For the rest of the manuscript, we refer to these 12 genes as "clock genes." For each dataset, we calculated the Spearman correlation between expression values (over all samples) of each pair of genes. The pattern of correlations was highly similar across datasets and revealed two groups of genes, where the genes within a group tended to be positively correlated with each other and negatively correlated with genes in the other group (Fig. 1B). Genes in the first group (Arntl, Npas2, and Clock), which are known to form the positive arm of the clock (41) , peaked in expression shortly before zeitgeber time 0 (ZT0, which corresponds to 3/29 time of lights on or sunrise; Fig. S1). Genes in the second group (Cry2, Nr1d1, Nr1d2, Per1, Per2, Per3, Dbp, and Tef), which are known to form the negative arms of the clock, peaked in expression near ZT10. Cry1, which appeared to be part of the first group in some datasets and the second group in others, tended to peak in expression around ZT18. These results indicate that the progression of the circadian clock in mice produces a consistent pattern of correlations in expression between clock genes. The pattern does not depend on the absolute phasing of clock gene expression relative to time of day. Consequently, the pattern is not affected by phase shifts, such as those caused by temporally restricted feeding (42) (Fig. S2).
Most computational methods for quantifying circadian rhythmicity and inferring the status of the clock require that samples be acquired over the entire 24-h cycle. Because our approach does not attempt to infer oscillations, we wondered if it would be robust to partial coverage of the 24-h cycle. We therefore analyzed clock gene co-expression in three of the previous datasets, in samples acquired during the first 8 h of the day (or subjective day) or the first 8 h of the night (or subjective night). In each dataset, the correlation pattern was preserved in both daytime and nighttime samples (Fig. S3). These results indicate that our approach can detect an active circadian clock in groups of samples without using time of day information, even if the samples' coverage of the 24-h cycle is incomplete.

Validation of the correlation pattern in humans
We next applied our approach to nine publicly available datasets of circadian transcriptome data from human tissues: one from skin (43) , two from brain (44,45) , three from blood (46)(47)(48) , and three from cells cultured in vitro (38,49,50) (Table S1). The dataset from human skin consisted of samples taken at only three time-points for each subject (9:30am, 2:30pm, and 7:30pm). The datasets from human brain were based on postmortem tissue from multiple anatomical areas, and zeitgeber time for each sample was calculated using the respective donor's date and time of death and geographic location. The datasets from human blood consisted of multiple samples taken throughout the 24-h cycle for each subject. The datasets from cells cultured in vitro were based on time-courses following synchronization by dexamethasone, serum, or alternating temperature cycles.
The patterns of clock gene co-expression in human tissues and cells were similar to the patterns in mice ( Fig. 2 and Fig. S4), which is consistent with our previous findings of similar relative phasing of clock gene expression in mice and humans (51) . The pattern was less distinct in human blood (Fig. S5), likely because several clock genes show weak or no rhythmicity in expression in blood cells (51) . The strong pattern in human skin was due to clock gene co-expression both between the three time-points and between individuals at a given time-point (Fig. S6). Compared to co-expression patterns in mouse organs and human skin, those in human brain were somewhat weaker, which is consistent with the weaker circadian rhythmicity for clock genes in those two brain-specific datasets (51) .
To confirm our findings in a broader range of human organs, we analyzed five transcriptome datasets from healthy human lung, liver, skin, and adipose tissue (52)(53)(54)(55)(56) (Table S1). Samples from these datasets were not collected for the purpose of studying circadian rhythms and therefore are not labeled with time of day. Nonetheless, we observed the expected pattern of clock gene co-expression in each dataset ( Fig. 2C and Fig. S7). We conclude that our approach can detect the signature of a functional circadian clock in a variety of human tissues in vitro and in vivo, even in datasets not designed to study circadian rhythms and with fewer than 100 samples.

Aberrant patterns of clock gene co-expression in human cancer
To investigate patterns of clock gene co-expression in human cancer, we applied our approach to RNA-seq data collected by The Cancer Genome Atlas (TCGA) and reprocessed using the Rsubread package (57) . TCGA samples are from surgical resections performed prior to neoadjuvant treatment. The times of day of surgery are not available, but presumably most or all of the surgeries were performed during normal working hours. We analyzed data from the 12 cancer types that included at least 30 samples from adjacent non-tumor tissue (Table S1). For each cancer type, we calculated the Spearman correlations in expression between clock genes across all tumor samples and all non-tumor samples.
In non-tumor samples from most cancer types, we observed a similar pattern of clock gene co-expression as in the mouse and human circadian datasets ( Fig. 3A-B and Fig. S8A). In contrast, in tumor samples from each cancer type, the pattern was weaker or absent. We observed the same trend when we restricted our analysis to only matched samples, i.e., samples from patients from whom both non-tumor and tumor samples were collected (Fig. S9). To confirm these findings, we analyzed eight additional datasets of gene expression in human cancer, four from liver and four from lung, each of which included matched tumor and adjacent non-tumor samples (58)(59)(60)(61)(62)(63)(64)(65) (Table S1). As in the TCGA data, clock gene co-expression in tumor samples was perturbed relative to non-tumor samples ( Fig. 3C and Fig. S8B).
To quantify the dysregulation of clock gene co-expression in human cancer, we first combined the eight mouse datasets in a fixed-effects meta-analysis ( Fig. 4A and Methods) in order to construct a single "reference" correlation pattern ( Fig. S10 and Table S2). For each of the 12 TCGA cancer types and each of the eight additional datasets of human cancer, we then calculated the Euclidean distances between the reference pattern and the non-tumor pattern and between the reference pattern and the tumor pattern. We refer to each of these distances as a clock correlation distance (CCD), and we refer to the difference between the tumor CCD and the non-tumor CCD as the delta clock correlation distance (ΔCCD). A positive ΔCCD indicates that the correlation pattern of the non-tumor samples is more similar to the reference than is the correlation pattern of the tumor samples.
Consistent with the visualizations of clock gene co-expression, every TCGA cancer type and additional cancer dataset had a positive ΔCCD (Fig. 4B), as did the individual tumor grades in 5/29 the TCGA data ( Fig. S11). Among the three TCGA cancer types with the lowest ΔCCD, prostate adenocarcinoma had a relatively high non-tumor CCD (suggesting dysregulated clock gene co-expression even in non-tumor samples), whereas renal clear cell carcinoma and thyroid carcinoma each had a relatively low tumor CCD (Fig. S12). To evaluate the statistical significance of the ΔCCD, we permuted the sample labels (non-tumor or tumor) in each dataset and re-calculated the ΔCCD 1000 times. Based on this permutation testing, the observed ΔCCD for 11 of the 18 datasets had a one-sided P ≤ 0.001 (Fig. 4B). Overall, these results suggest that the circadian clock is dysregulated in a wide range of human cancers.
Tumors are a complex mixture of cancer cells and various non-cancerous cell types. The proportion of cancer cells in a tumor sample is called the tumor purity and is an important factor to consider in genomic analyses of bulk tumors (66) . We therefore examined the relationship between ΔCCD and tumor purity in the TCGA data. With the exception of thyroid carcinoma and prostate adenocarcinoma, ΔCCD and median tumor purity across TCGA cancer types were positively correlated ( Fig. S13; Spearman correlation = 0.67, P = 0.059 by exact test). These findings suggest that at least in some cancer types, dysregulation of the circadian clock is stronger in cancer cells than in non-cancerous cells.

Distinct patterns of clock gene expression in human cancer and mouse clock knockouts
We next investigated whether the clock dysregulation in human cancer resembled that caused by genetic mutations to core clock genes. We assembled seven datasets of circadian gene expression that included samples from wild-type mice and from mice in which at least one core clock gene was knocked out, either in the entire animal or in a specific cell type (8,42,(67)(68)(69)(70)(71) (Table S1). For each dataset, we calculated the correlations in expression between pairs of clock genes in wild-type and mutant samples and calculated the ΔCCD ( Fig. S14 and Fig. S15).
The two datasets with the highest ΔCCD (>50% higher than any ΔCCD we observed in human cancer) were those in which the mutant mice lacked not one, but two components of the clock (Cry1 and Cry2 in GSE13093; Nr1d1 and Nr1d2 in GSE34018). The ΔCCDs for the other five mutants were similar to or somewhat lower than the ΔCCDs we observed in human cancer. Given the smaller sample sizes compared to the human cancer datasets, the ΔCCDs for four of the other five mutants were not significantly greater than zero (one-sided P > 0.05 by permutation test).
To further compare clock dysregulation in human cancer and clock knockouts, we calculated differential expression of the clock genes between non-tumor and tumor samples and between wild-type and mutant samples (Fig. 5A). Differential expression in the clock knockouts was largely consistent with current understanding of the core clock. For example, knockout of Arntl (Bmal1, the primary transcriptional activator) tended to cause reduced expression (irrespective of time of day) of Nr1d1, Nr1d2, Per1, Per2, Per3, Dbp, and Tef, and increased or unchanged expression of the other clock genes. In the double knockout of Cry1 and Cry2 (two negative regulators of the clock), this pattern was reversed. Interestingly, neither of these patterns of differential expression was apparent in human cancer.
In the clock gene knockouts, rhythmic expression of the clock genes was reduced or lost (Fig.  S16). Although it was not possible to directly quantify the rhythmicity of expression in the human cancer datasets, we reasoned that a proxy for rhythmicity could be the magnitude of variation in expression. Therefore, for each TCGA cancer type and each additional human cancer dataset, we calculated the median absolute deviation (MAD) in expression of the clock genes in non-tumor and tumor samples. We then compared the log 2 ratios of MAD between tumor and non-tumor samples to the log 2 ratios of MAD between mutant and wild-type samples from the clock gene knockout data (Fig. 5B). As expected, samples from clock gene knockouts showed widespread reductions in MAD compared to samples from wild-type mice. In contrast, human tumor samples tended to show similar or somewhat higher MAD compared to non-tumor samples. Taken together with the differential expression analysis, these results imply that clock dysregulation in human cancer is not due solely to loss of activity of one or more core clock genes.

Dysregulation of broader circadian gene expression in human lung cancer
Finally, we explored how clock dysregulation in cancer might affect the circadian transcriptome. We obtained a set of 1,292 genes that were inferred by the CYCLOPS method to be rhythmic in a dataset from healthy human lung (35) . By applying WGCNA (72) to the same dataset, we grouped the 1,292 genes into five modules according to their co-expression (Fig. 6A). Based on DAVID (73) and consistent with the analysis of Anafi et al. (35) , the five modules were enriched for genes involved in various biological processes, including angiogenesis, phosphoprotein signaling, and alternative splicing (Table S3). We then used NetRep (74) to determine the extent to which each module was preserved in non-tumor and tumor samples from five datasets of human lung cancer (two from TCGA and three from NCBI GEO). Notably, all five datasets had ≤65 non-tumor samples and the datasets from NCBI GEO had ≤91 tumor samples, too few for CYCLOPS to produce a reliable ordering. Based on this analysis, four of five modules (comprising 1,256 of 1,292 rhythmic genes) were significantly more strongly preserved in non-tumor compared to tumor samples (Fig. 6B, P ≤ 0.001 for at least two of seven preservation statistics in at least four of five datasets).
We also quantified differential expression of the 1,292 normally rhythmic genes between non-tumor and tumor samples in the five lung cancer datasets, and examined the relationship between each gene's log 2 fold-change and its phase of peak expression (acrophase) in healthy lung as inferred by CYCLOPS. Remarkably, we observed a consistent pattern, in which genes with an acrophase near tended to show the strongest reductions in expression in tumor π samples (Fig. 6C). Using Phase Set Enrichment Analysis (75) , we found that these genes were particularly enriched for involvement in G protein-coupled receptor signaling (Table S4). Taken together, these findings suggest that dysregulation of the circadian clock in human lung cancer is accompanied by systematic changes in broader circadian gene expression.

7/29
Discussion Increasing evidence has suggested that systemic disruption of the circadian clock can promote tumor development and that components of a tumor can disrupt the circadian clock. Until now, however, whether the clock is functional in primary human tumors has been unclear. Here we developed a simple method to probe clock function based on the co-expression of a small set of clock genes. By applying the method to publicly available cancer transcriptome data, we uncovered dysregulation of the circadian clock in multiple types of human cancer. Furthermore, our analysis of human lung cancer suggests that dysregulation of the clock is accompanied by large-scale changes in circadian gene expression.
Our approach for detecting a functional circadian clock relies on three principles. First, we use prior knowledge of clock genes and clock-controlled genes. Second, we account for the fact that the clock is defined not by a static condition, but by a dynamic cycle. Our approach thus exploits the co-expression of clock genes that arises from (1) different genes having rhythms with different circadian phases and (2) different samples being taken from different points in the cycle. Finally, our method does not attempt to infer an oscillatory pattern, but instead uses only the statistical correlations in expression between pairs of genes. The underlying assumption of the ΔCCD is that perturbations to the clock will alter the relative phases and/or signal-to-noise ratios of rhythms in clock gene expression and thereby alter the correlations. Although the correlation matrix only captures part of the complex relationship between genes, it is intuitive, simple to calculate, and requires relatively few samples (our results indicate that 30 in humans may be sufficient). Altogether, these principles enable our method to detect dysregulation of the clock in groups of samples whose times of day of acquisition are unknown and whose coverage of the 24-h cycle is incomplete. Consequently, the ΔCCD should be a valuable complement to methods designed to infer rhythms in omics data, such as CYCLOPS (35) .
Despite these advantages, our method does have limitations. First, a large ΔCCD does not exclude the possibility that the clock genes are still rhythmic, but instead implies that if rhythms are present, the phase relationships between genes are greatly altered relative to a normally functioning clock. Similarly, despite altered expression and co-expression levels, some of the genes rhythmic in healthy lung may still be rhythmic in lung cancer. Second, the ΔCCD is insensitive to the alignment of the circadian clock to time of day, and so cannot detect a phase difference between conditions, such as that recently observed during manic episodes of bipolar disorder (76) . This limitation, however, allowed us to readily compare clock gene co-expression between mice and humans, despite the circadian phase difference between the two species (51) . Third, the ΔCCD is invariant to the relative levels of gene expression between conditions, which is why we complemented our analysis of clock gene co-expression with analysis of differential expression and differential variability. Fourth, transcription is only one facet of the core clock mechanism, and perturbations to post-translational modification or degradation of clock proteins (if unaccompanied by changes in clock gene expression) would not be detected by our approach. Finally, because the ΔCCD relies on co-expression across samples, it does 8/29 not immediately lend itself to quantifying clock disruption in single samples. In the future, it may be possible to complement the ΔCCD and assess clock function in some datasets by directly comparing matched samples from the same patient.
In healthy tissues in vivo, the circadian clocks of individual cells are entrained and oscillating together, which allows bulk measurements to contain robust circadian signals. Consequently, the loss of a circadian signature in human tumors could result from dysfunction in either entrainment, the oscillator, or both. Dysfunction in entrainment would imply that the clocks in at least some of the cancer cells are out of sync with each other and therefore free running, i.e. ignoring zeitgeber signals. Dysfunction in the oscillator would imply that the clocks in at least some of the cancer cells are no longer "ticking" normally. Given the current data, which can be considered averaged clock gene expression from many cells, these scenarios cannot be distinguished. However, the moderate correlation between ΔCCD and tumor purity across cancer types leads us to speculate that the clocks in stromal and/or infiltrating immune cells may be operating normally. In the future, the nature and mechanisms of clock dysregulation in vivo may be unraveled through a combination of mathematical modeling (77,78) and single-cell measurements. A separate matter not addressed here is how the cancer influences circadian rhythms in the rest of the body (79) , which may be relevant for optimizing the daily timing of anticancer treatments (80) .
Based on this study alone, which is observational, it is not possible to determine whether clock dysregulation is a cancer driver or passenger. Indeed, perturbations to co-expression in cancer are not limited to clock genes or to rhythmic genes, as previous work has found changes in co-expression and connectivity across the transcriptome (81,82) . However, given the clock's established role in regulating metabolism and a recent finding that stimulation of the clock inhibits tumor growth in melanoma (26) , our findings raise the possibility that clock dysregulation and manipulation of normal circadian physiology may be a cancer driver in multiple solid tissues. On the other hand, a functional circadian clock seems to be required for growth of acute myeloid leukemia cells (83) , so further work is necessary to clarify this issue.

Conclusions
Our findings suggest that dysregulation of the circadian clock is a common means by which human cancers achieve unrestrained growth and division. Thus, restoring clock function could be a viable therapeutic strategy in a wide range of cancer types. Given recent work on the effects of natural light exposure (84) and time-restricted feeding (85,86) , such a therapeutic would not necessarily have to be pharmacological. In addition, given the practical challenges of studying circadian rhythms at the cellular level in humans, our method offers the possibility to quantify clock function in a wide range of human phenotypes using publicly available transcriptome data.

9/29
Methods Selecting the datasets We selected the datasets of circadian gene expression in mice (both for defining the reference pattern and for comparing clock gene knockouts to wild-type) to represent multiple organs, light-dark regimens, and microarray platforms. For circadian gene expression in humans, we included three datasets from blood, two from brain, and one from skin. The samples from blood and skin were obtained from living volunteers, whereas the samples from brain were obtained from postmortem donors who had died rapidly. For GSE45642 (human brain), we only included samples from control subjects (i.e., we excluded subjects with major depressive disorder). Zeitgeber times for samples from GSE56931 (human blood) were calculated as described previously (87) . For the TCGA data, we analyzed all cancer types that had at least 30 non-tumor samples (all of which also had at least 291 tumor samples). When analyzing clock gene expression in human cancer, unless otherwise noted, we included all tumor and non-tumor samples, not just those from patients from whom both non-tumor and tumor samples were collected. For details of the datasets, all of which are publicly available, see Table S1.

Processing the gene expression data
For TCGA samples, we obtained the processed RNA-seq data (in units of transcripts per million, TPM, on a gene-level basis) and the corresponding metadata (cancer type, patient ID, etc.) from GSE62944 (57) . For E-MTAB-3428, we downloaded the RNA-seq read files from the European Nucleotide Archive, used Salmon to quantify transcript-level abundances in units of TPM (88) , then used the mapping between Ensembl Transcript IDs and Entrez Gene IDs to calculate gene-level abundances.
For the remaining datasets, raw (in the case of Affymetrix) or processed microarray data were obtained from NCBI GEO and processed using MetaPredict, which maps probes to Entrez Gene IDs and performs intra-study normalization and log-transformation (89) . MetaPredict processes raw Affymetrix data using RMA and customCDFs (90,91) . As in our previous study, we used ComBat to reduce batch effects between anatomical areas in human brain and between subjects in human blood (51,92) .

Analyzing clock gene expression and co-expression
We first focused on the expression of 12 genes that are considered part of the core clock or are directly controlled by the clock and that show strong, consistently phased rhythms in a wide range of mouse organs (3,32) . We calculated times of peak expression and strengths of circadian rhythmicity of expression in wild-type and mutant mice using ZeitZeiger (32) , with three knots for the periodic smoothing splines (93) .

10/29
We quantified the relationship between expression values of pairs of genes using the Spearman correlation (Spearman's rho), which is rank-based and therefore invariant to monotonic transformations such as the logarithm and less sensitive to outliers than the Pearson correlation. Using the biweight midcorrelation, which is also robust to outliers, gave very similar results. All heatmaps of gene-gene correlations in this paper have the same mapping of correlation value to color, so they are directly visually comparable.
We calculated the reference Spearman correlation for each pair of genes (Table S2) using a fixed-effects meta-analysis (94) of the eight mouse datasets shown in Fig. 1. First, we applied the Fisher z-transformation ( ) to the correlations from each dataset. Then we calculated rctanh a a weighted average of the transformed correlations, where the weight for dataset was i n i − 3 (corresponding to the inverse variance of the transformed correlation), where is the number n i of samples in dataset . Finally, we applied the inverse transformation ( ) to the weighted i anh t average.
To quantify the similarity in clock gene expression between two groups of samples (e.g., between the mouse reference and human tumor samples), we calculated the Euclidean distance between the respective Spearman correlation vectors, which contains all values in the strictly lower (or strictly upper) triangular part of the correlation matrix. Given a reference and a dataset with samples from two conditions, we calculated the Euclidean distances between the reference and each condition, which we call the clock correlation distances (CCDs). We then calculated the difference between these two distances, which indicates how much more similar to the reference one condition is than the other and which we refer to as the delta clock correlation distance (ΔCCD). Although here we used Euclidean distance, other distance metrics could be used as well.
To evaluate the statistical significance of the ΔCCD for a given dataset, we conducted the permutation test as follows: First, we permuted the relationship between the sample labels (e.g., non-tumor or tumor) and the gene expression values and recalculated the ΔCCD 1000 times, always keeping the reference fixed. We then calculated the one-sided p-value based on the number of permutations that gave a ΔCCD greater than or equal to the observed ΔCCD, adjusted using the method of Phipson and Smyth in the statmod R package (95) . Since we used the one-sided p-value, the alternative hypothesis was that non-tumor (or wild-type) is more similar to the reference than is tumor (or mutant).
To calculate the ΔCCD for individual tumor grades, we used the clinical metadata provided in GSE62944. We analyzed all combinations of TGCA cancer type and tumor grade that included at least 50 tumor samples. In each case, we calculated the ΔCCD using all non-tumor samples of the respective cancer type.
To compare ΔCCD and tumor purity, we used published consensus purity estimates for TCGA tumor samples (66) . The estimates are based on DNA methylation, somatic copy number variation, and the expression of immune genes and stromal genes (none of which are clock genes).
We quantified differential expression between tumor and non-tumor samples and between mutant and wild-type samples using limma and voom (96,97) . As these techniques as designed for transcriptome data, we calculated differential expression using all measured genes. To ensure a fair comparison between human and mouse data, we ignored time of day information in the mouse samples. We quantified the variation in expression of clock genes in each dataset and condition using the median absolute deviation (MAD), which is less sensitive to outliers than the standard deviation.

Analyzing circadian gene expression in human lung cancer
We obtained the set of rhythmic transcripts (identified by microarray probe ID) inferred by CYCLOPS to be rhythmic in healthy human lung (35) . This set contains transcripts whose abundance was well described by a sinusoidal function of CYCLOPS phase in samples from both the Laval and GRNG sites of GSE23546 (Bonferroni-corrected P < 0.05 and peak/trough ratio > 2) and whose orthologs were rhythmic in mouse lung. We mapped the microarray probes to Entrez Gene IDs and calculated the acrophase for each gene as the circular mean of the provided acrophase for the corresponding probes. Phase values inferred by CYCLOPS are relative, so Anafi et al. adopted the convention of setting phase (in radians) to the average π acrophase of the PAR bZip transcription factors (DBP, HLF, and TEF).
Because some of the module preservation statistics calculated by NetRep use the expression matrix directly, we used ComBat (92) to merge the expression data from the Laval and Groningen sites of GSE23546. We then followed WGCNA's recommended procedure for identifying gene modules (72) . Briefly, starting with the merged expression data, we calculated the Spearman correlation matrix for the 1,292 genes, then used the "signed" method with a soft thresholding power of 12 to calculate the adjacency matrix, then calculated the topological overlap dissimilarity matrix. We used hierarchical clustering of the dissimilarity matrix and adaptive branch pruning to define modules of genes, followed by a procedure to merge closely related modules. We then used the DAVID web application (version 6.8) to discover functional categories enriched in each module (73) .
We used NetRep (74) to calculate seven module preservation statistics between healthy human lung and the non-tumor and tumor samples from five datasets of human lung cancer (LUAD and LUSC from TCGA and GSE19188, GSE19804, and GSE32863 from NCBI GEO). We excluded GSE10072, because it included expression levels for only 876 of the 1,292 genes (the other five included at least 1,199 of the 1,292 genes). When calculating module preservation, NetRep automatically removed any genes that were not measured in both healthy human lung and the group of samples to which it was being compared. For non-tumor and tumor samples from each dataset, we calculated the Spearman correlation matrix and adjacency matrix using the same procedure as for healthy human lung.

12/29
To evaluate the statistical significance of the difference in each module preservation statistic between non-tumor and tumor samples, we performed permutation testing and calculated one-sided p-values similarly to the procedure for ΔCCD, permuting the sample labels in the test dataset (non-tumor or tumor) and recalculating module preservation 1000 times (note this is different from the standard way WGCNA and NetRep perform permutations, which is to shuffle the gene labels).
Periodic smoothing splines of log 2 fold-change as a function of acrophase in healthy lung were fit using ZeitZeiger (32) . For Phase Set Enrichment Analysis (75) , we used the "canonical pathways" gene sets from MSigDB (98) and looked for gene sets with a q-value < 0.1 against a uniform distribution and vector-average value within 0.4 radians of either 0.07 or 1.06 (the π π mean phases of the peak and trough of the spline fits, respectively). No gene sets met the criteria for the former. Figures S1-S16 .  Table S1. Details of each dataset used in this study. Table S2. Reference Spearman correlations between clock genes. Table S3. Functional categories significantly enriched in circadian gene modules in human lung according to DAVID (Benjamini-Hochberg P ≤ 0.05). Table S4. Gene sets in human lung with uniform q-value < 0.1 and vector-average value within 0.4 radians of 1.06 according to Phase Set Enrichment Analysis. π Acrophase is defined to be the circular mean of the acrophases of the circadian clock-driven π PAR bZip transcription factors DBP, TEF, and HLF. Each point corresponds to one of the 1,292 genes. Each blue curve indicates a fit to a periodic smoothing spline. The circular mean of the trough of the spline fits is 1.06 . π