Age-associated cryptic transcription in mammalian stem cells is linked to permissive chromatin at cryptic promoters

Suppressing spurious cryptic transcription by a repressive intragenic chromatin state featuring trimethylated lysine 36 on histone H3 (H3K36me3) and DNA methylation is critical for maintaining self-renewal capacity in mouse embryonic stem cells. In yeast and nematodes, such cryptic transcription is elevated with age, and reducing the levels of age-associated cryptic transcription extends yeast lifespan. Whether cryptic transcription is also increased during mammalian aging is unknown. We show for the first time an age-associated elevation in cryptic transcription in several stem cell populations, including murine hematopoietic stem cells (mHSCs) and neural stem cells (NSCs) and human mesenchymal stem cells (hMSCs). Using DECAP-seq, we mapped and quantified age-associated cryptic transcription in hMSCs aged in vitro . Regions with significant age-associated cryptic transcription have a unique chromatin signature: decreased H3K36me3 and increased H3K4me1, H3K4me3, and H3K27ac with age. Furthermore, genomic regions undergoing such age-dependent chromatin changes resemble known promoter sequences and are bound by the promoter-associated protein TBP even in young cells. Hence, the more permissive chromatin state at intragenic cryptic promoters likely underlies the increase of cryptic transcription in aged mammalian stem cells.


Results
Mammalian stem cell aging models.
In this study, we focused on aging in two models: murine hematopoietic stem cells (mHSCs) and human mesenchymal stem cells (hMSCs). Transcriptional and chromatin-level changes were previously characterized in highly purified HSCs isolated from young (4mo) and aged (24mo) mice 7 ; we reanalyzed these data to look for evidence of intragenic cryptic transcription. We further used culture expanded hMSCs isolated from cord blood as an in vitro model of aging in a human stem cell population. Culture expansion is a commonly used model of MSC aging as it recapitulates many of the phenotypes observed in MSCs isolated from aged individuals 27 . We confirmed that culture expanded hMSCs have a decrease in differentiation potential; an increase in the proportion of senescent cells, as measured by -galactosidase staining; and decreased growth rate ( Figure S1). Interestingly, the reduction in differentiation potential occurs before decreased growth rate and increased -galactosidase activity; for this study, we chose to expand the cells to the point that the mock-aged cells had decreased differentiation potential, but had not yet slowed growth or increased cellular senescence.

Analysis of RNA-seq data indicates an increase of intragenic cryptic transcription with age.
Intragenic cryptic transcription is the initiation of transcription from a non-promoter region within a gene body 22,28 . This phenomenon should be reflected in RNA-seq data as an increase in mapped reads downstream of the cryptic initiation site relative to upstream thereof.
As such sites can occur throughout a gene, we developed a method to detect cryptic transcription in RNA-seq samples by examining exon-based expression levels within transcripts. Simply, if downstream exons have a higher transcripts per kilobase million (TPM) than the first (or second) exon of a transcript, as reflected by the ratio of exon-based TPMs, this indicates that cryptic transcription is occurring in the cognate transcript. When comparing between samples, an elevation in cryptic transcription can be inferred from an increased average exon-based TPM ratio in one sample vs. the other. We tested our method using RNA-seq data from murine embryonic stem cells (mESCs) that lack Setd2, which are known to have elevated cryptic transcription 18 . We detected a significant increase in the average exon-based TPM ratio in the Setd2 knockout sample vs. the control mESCs ( Figure S1), indicating that our method can detect increased cryptic transcription from RNA-seq data. Analysis of mESCs undergoing Setd2 knockdown 29 or knockout 30 and Setd2 knockout murine oocytes 31 confirmed an elevation of cryptic transcription in these systems ( Figure S1).
We applied this method to RNA-seq datasets from aged mHSCs 7 and culture-expanded hMSCs. The average exon-based TPM ratio is increased in mHSCs isolated from old mice relative to young, indicating that cryptic transcription increases with age ( Figure 1A). These ratios are shown separately for young and old samples in Figure S1. A similar phenomenon is observed in hMSCs ( Figure 1B and Figure S1). There is also a trend of transcripts with higher expression having greater increases in exon-based TPM ratios with age in both mHSCs and hMSCs ( Figure S1), suggesting that highly expressed genes are more susceptible to ageassociated increases in cryptic transcription. Additionally, the exon-based TPM ratio increases deeper into the transcript, which is consistent with sustained transcription from cryptic sites and expected in a cumulative measurement from RNA-seq. This is confirmed by the number of exons that have exon-based TPM ratios greater than 2 increasing as exon number increases ( Figure S1). Thus, the RNA-seq data from both mHSCs and hMSCs indicates that, on a global scale, cryptic transcription is elevated with age in mammalian stem cells.
We next identified transcripts that have an age-associated increase in cryptic transcription, i.e., transcripts for which the exon-based TPM ratio is highest in the old vs. young samples. Because the data are not normally distributed ( Figure S1), we ranked the log2transformed data and determined the right-tailed inflection point to identify transcripts with a significantly high increase in exon-based TPM ratio with age 32,33 ( Figure 1C). All transcripts to the right of this cut-off (199 in mHSCs and 304 in hMSCs) were considered to exhibit an ageassociated elevation of cryptic transcription. Heatmaps of the exon-based TPM ratios confirm an increase in the old samples ( Figure 1D), and a scatterplot of the old ratio vs. the young ratio on a per-transcript basis shows that the identified transcripts all lie above the line y=x ( Figure S1). As expected, metagene plots of the RNA-seq read distribution in these samples show that read density increases towards the 3' end of the transcripts vs. the 5' end ( Figure 1E). Two examples of cryptic transcripts identified by this analysis, Car11 in mHSCs and IAH1 in hMSCs, are shown in Figure 1F. The transcripts identified by our analysis are longer than average ( Figure   S1), suggesting that, along with higher expression levels, gene length is associated with elevated cryptic transcription with age.
As cryptic transcription increases with age in both mHSCs and hMSCs, we asked whether this also occurs in other adult stem cells. Neural stem cells (NSCs) are maintained in a resting state, but are activated under certain conditions 34 . We examined RNA-seq data from quiescent and activated NSCs (qNSCs and aNSCs, respectively) freshly isolated from the subventricular zone of young and old mice for evidence of an age-associated increase in cryptic transcription. Principle component analysis (PCA) showed that most gene expression changes correlate with activation state and a smaller fraction are associated with sex, while age has less of an effect (Figure 2A). However, in both males and females, aNSCs show an increased exonbased TPM ratio with age, though qNSCs do not ( Figure 2B). In males, this increase is only evident in the last exon, while in females, there is a gradual increase throughout the transcript.
When divided into expression quartiles, in aNSCs isolated from female mice, only transcripts in the fourth quartile show elevated cryptic transcription with age, suggesting a strong link between expression level and age-increased cryptic transcription. However, no such trend was observed in aNSCs from male mice ( Figure S2). We identified 237 transcripts in males and 266 in females that exhibit increased cryptic transcription with age, confirmed by heatmaps of the young and old ratios vs. the first exon ( Figure 2C) and metagene plots of RNA-seq signal ( Figure 2D); as in mHSCs and hMSCs, these transcripts tend to be longer ( Figure S2). Thus, increased cryptic transcription is also a hallmark of NSC aging, but only in activated NSCs.
We additionally looked for signs of an age-associated increase in cryptic transcription in publicly available aging and senescence RNA-seq datasets (E-GEOD-59966; E-GEOD-46486; GSE53330; E-MTAB-4879; and refs. [35][36][37][38][39][40][41][42][43][44] ). In 21 out of 25 datasets spanning a range of tissues, we detected an elevation of cryptic transcription with age ( Figure S2), similar to what was observed in the stem cells. Overall, this suggests that increased cryptic transcription is a hallmark of aging in many mammalian tissues. When we applied this method to samples from Rett and Werner syndromes 45,46 , both premature aging models, no increase in cryptic transcription was detected ( Figure S2), indicating that these models do not recapitulate this feature of aging.
Nevertheless, taken together, our data indicate that elevated cryptic transcription is a hallmark of mammalian aging.

Global analysis of the 5' ends of transcripts identifies sites of cryptic transcription initiation.
Although examination of RNA-seq data revealed that cryptic transcription increases with age, this analysis cannot identify the sites from which these cryptic transcripts initiate (cryptic transcription start sites, cTSSes). For this, we used a recently-developed protocol called DECAPseq that allows for the specific sequencing of the 5' ends of transcripts 18 . We were only able to obtain sufficient RNA to perform this analysis on hMSCs, which can be expanded to large numbers during culture. A modified peak-calling algorithm was used to identify DECAP-seq peaks in young and old hMSC samples. Most peaks cluster around annotated RefSeq TSSes (referred to as "endogenous TSSes"), as expected ( Figure 3A). Although the DECAP-seq signal is reduced downstream of endogenous TSSes, it persists throughout the gene body in both young and old samples, suggesting that intragenic cryptic transcription occurs in both young and old hMSCs. DECAP-seq peaks were considered to indicate cryptic transcription if they were greater than 2000bp distant from an endogenous TSS, a stringent cut-off that ensures no signal from annotated promoters is mis-called as cryptic and prevents chromatin signatures associated with such promoters from confounding assessment of chromatin state at cryptic promoters; subsequent analysis considers these peaks only. A comparison of these peaks revealed that 58% of the total peaks were found in both samples, while nearly 12% were unique to the young sample and almost 30% were unique to the old sample ( Figure 3B), which suggests that cryptic transcription increases with age.
We further developed a differential peak calling pipeline to identify non-endogenous TSS-associated DECAP-seq peaks with more reads in the old sample vs. the young. While only signal in the old sample ( Figure 3C); these peaks are found in both young and old samples. That the old sample has more higher peaks than the young sample suggests that cryptic transcription increases with age in hMSCs, consistent with our other analyses. A metagene plot clearly shows that these sites have higher DECAP-seq signal in the old vs. young samples ( Figure 3D), while those identified as having more signal in the young show the converse ( Figure S3). DECAP-seq signal along the whole gene is shown for both GAPDH and CAPNS1, two genes with 3 identified cTSSes between them. An enrichment of DECAP-seq signal is seen around the endogenous TSS of each gene, and several peaks can be found within the gene bodies; the intragenic peaks are clearly higher in the old samples vs. the young ( Figure 3E). We consider the 1375 loci identified by this analysis to be bona fide sites of age-associated cryptic transcription.
If the age-increased DECAP-seq peaks are sites where cryptic transcription increases with age, this should be reflected in the RNA-seq data. As expected, there is an increase in the ratio of RNA-seq reads mapping immediately up-and downstream from these sites ( Figure 3F), indicating that the increased transcription identified by DECAP-seq can be independently detected. When considering the 158 cTSSes located within introns ( Figure 3G), this trend is more obvious. The transcripts determined to have increased cryptic transcription with age by DECAP-seq are longer ( Figure 3H) and tend to be more highly expressed ( Figure 3I). There is a slight trend of higher gene expression with age in genes with an age-associated increase in cryptic transcription, though those genes with decreased cryptic transcription with age have no clear trend in expression changes ( Figure S3). We also searched for enriched transcription factor binding sites (TFBSs) in the vicinity of the age-associated cTSSes. HOMER 47 analysis identified enrichment of five motifs within 200bp of these DECAP-seq peaks ( Figure 3J and Figure S3). Changes in the abundance or binding of transcription factors that recognize these sites with age could contribute to increased Pol II recruitment to the cTSSes and therefore cryptic transcription.

Transcription-related chromatin states are largely maintained during aging.
Transcription is fundamentally regulated at the level of chromatin. It is well known that chromatin becomes more open with age in many systems (reviewed in ref. 48 ), which could contribute to increased cryptic transcription by promoting intragenic Pol II entry. We therefore performed ChIP-seq to characterize how several transcription-associated histone modifications (H3K4me1, H3K4me3, H3K27ac, and H3K36me3) and heterochromatin-associated histone marks (H3K9me3 and H3K27me3) change with age in hMSCs. Additionally, we performed a ChromHMM analysis 49 Figure 4B and Figure S4). Most of these changes occur between similar states, e.g., state 7 (enriched for H3K36me3, H3K4me1, and H3K27ac) changing to state 8 (H3K36me3 only) by losing enrichment of several histone modifications ( Figure 4C and Figure S4). Heterochromatin loss can be attributed largely to the conversion of states 1-3 to state 10, unmodified chromatin, rather than to a gain of canonically euchromatic histone modifications in previously heterochromatic regions. Overall, the most drastic changes in chromatin state during aging are a loss of constitutive heterochromatin, a gain of facultative heterochromatin, and a decrease in the overlap between H3K4me1 and H3K27ac with H3K36me3.
We also examined how chromatin state changes with age in particular genomic regions ( Figure 4D and Figure S4). State 1 is most enriched at intergenic and lamin-associated domains (LADs), but shows a drastic reduction in the enrichment with age, though in some of these regions, there is an increase in state 3 enrichment ( Figure S4). However, the particular LADs that lose state 1 (H3K9me3) are not the loci that gain state 3 (H3K27me) ( Figure 4E  There are subtle age-associated changes in histone modification enrichment in expressed genes. As seen in Figure 4G, H3K4me3 distribution does not change with age, but its enrichment around the promoter increases. While enrichment levels of H3K4me1 and H3K27ac do not appreciably change, their distribution becomes wider with age. This is reflected in Figure 4C as the fraction of state 8 that changes to state 7 with age and Figure 4D as the increase of state 7 in genic regions. Unlike the increased prevalence of promoter-associated histone modifications at actively expressed genes during aging, H3K36me3 is depleted within gene bodies ( Figure 4G). This is not a complete loss of H3K36me3, but a reduction in its enrichment. As H3K36me3 promotes the restoration of chromatin structure following Pol II transit 23 , it is possible that its decreased enrichment has effects on the chromatin that are undetectable by ChromHMM analysis. Thus, although chromatin structure at actively expressed genes is largely preserved with aging, the chromatin at these loci may be more open in old hMSCs.

The gain of an active promoter-like chromatin structure correlates with age-associated cryptic transcription.
Reduced H3K36me3 is associated with increased cryptic transcription and H3K4me3 (ref. 19 ). A chromatin environment enriched for H3K4me3 and depleted of H3K36me3 is reminiscent of the state at active endogenous promoters. We hypothesized that the chromatin structure around age-associated cTSSes might take on an active promoter-like state with age.
Indeed, H3K4me3 is specifically enriched near age-increased cTSSes in old, but not young, hMSCs ( Figure 5A), while no such increase in H3K4me3 is observed at cTSSes with decreased expression with age ( Figure S5). The pattern of H3K4me3 enrichment around the age-associated cTSSes is even comparable to what is seen at endogenous TSSes ( Figure S5). A similar trend is also seen for H3K4me1 and H3K27ac ( Figure S5). While H3K36me3 persists in the vicinity of age-increased cTSSes, it is decreased in the old samples compared to the young ( Figure 5A and Figure S5). Although H3K36me3 levels are reduced with age at age-associated cTSSes, the levels are still much higher than observed at endogenous TSSes, which likely contributes to the relatively low levels of transcription observed at these sites. Thus, in hMSCs the chromatin state around age-increased cTSSes becomes more active promoter-like with age, with a reduction in H3K36me3 and an accumulation of H3K4me3.
This observation prompted us to determine whether age-associated cTSSes have other characteristics of promoters. We therefore performed ChIP-seq for TBP and compared its enrichment at endogenous TSSes and age-increased cTSSes. As expected, over 85% of the TBP peaks were located within 2000bp of an annotated promoter ( Figure S5). Surprisingly, TBP is bound near a subset of age-associated cTSSes to a similar extent in both old and young hMSCs ( Figure 5B). Independent clustering of the TBP ChIP-seq signal further showed that the distribution of TBP enrichment is similar at endogenous and cryptic TSSes, which suggests that at least part of the transcriptional machinery recognizes select age-associated cTSSes in a similar manner to which it recognizes endogenous TSSes. The pre-initiation complex may be actively recruited to age-increased cTSSes and, as the local chromatin state becomes more open and permissive with age, transcription initiation may increase at these loci.
If age-associated cTSSes gain an active promoter-like chromatin structure in aged hMSCs, it is possible that this chromatin signature is associated with additional age-increased cTSSes that were not identified by DECAP-seq. We therefore searched for sites within gene bodies where H3K4me3 is gained and H3K36me3 enrichment decreases with age ( Figure 5C).
The focus was on H3K4me3 rather than H3K4me1 or H3K27ac as its distribution is known in mouse HSC aging 7 , allowing us to extend our model to this system. Using this method, we identified an additional 2118 putative age-associated cTSSes in hMSCs, exemplified by the site found in ATOH8 ( Figure 5D). The putative age-increased cTSS is enriched for H3K4me3 in both young and old cells, but there is a local depletion of H3K36me3 with age, as seen by a broadening of the valley in the signal track for that histone modification. The increase of DECAP-seq reads in this region in the old vs. young sample suggests an age-associated increase in cryptic transcription, though it does not rise to the level of significance required by our differential peak calling algorithm.
To confirm that the sites identified in this analysis are associated with elevated cryptic transcription, we analyzed RNA-seq and DECAP-seq data surrounding these loci. There is a significant increase in DECAP-seq signal in old hMSCs vs. young associated with these sites ( Figure S5). Likewise, the ratio of RNA-seq signal in the exon downstream of the predicted cTSS vs. the first exon of the transcript is elevated in old vs. young hMSCs ( Figure 5E).
Additionally, the putative age-increased cTSSes have higher scores in a promoter 2.0 analysis 52 than random sequence, indicating that they have promoter-like characteristics ( Figure 5F). We performed a similar analysis using ChIP-seq and RNA-seq data from aging mHSCs 7 and identified 510 putative age-associated cTSSes. As in hMSCs, intragenic sites that lost H3K36me3 and gained H3K4me3 with age have an age-associated elevation of RNA-seq reads downstream of the predicted cTSS ( Figure 5G). Likewise, these loci also have high promoter prediction scores ( Figure 5H). Taken together, these analyses suggest that the sites identified by our chromatin state analysis are bona fide age-increased cTSSes.

Discussion
While it is well established that the chromatin landscape changes with age, the impact of these changes on cells and organisms have not been exhaustively characterized. Here we show for the first time that intragenic cryptic transcription is elevated with age in mammals and link this increase to an altered chromatin state that is more permissive for transcription activation.
Fundamentally, cryptic promoters are sites from which RNA polymerase II can aberrantly initiate transcription. These sites have sequence features similar to those in endogenous TSSes ( Figure S3; Figure 5); such promoter-like sequences may be the result of random chance during evolution. Nevertheless, these cTSSes can be bound by TBP in both young and old cells ( Figure   5), which suggests that they are functioning promoters that are routinely silenced, likely due to a repressive chromatin state maintained downstream of H3K36me3 17-19 ( Figure 6). Indeed, this histone modification is reduced within gene bodies with age, and its reduction is especially pronounced around age-increased cTSSes (Figures 4 and 5). A concomitant increase in H3K4me1, H3K4me3, and H3K27ac in these regions with age results in a chromatin state similar to what is seen at active endogenous promoters ( Figure 5). Thus, as the intragenic chromatin state becomes more permissive for transcription initiation during aging, overall levels of cryptic transcription increase.
This increase in cryptic transcription occurs in a wide range of tissues during aging: in addition to hMSCs, mHSCs, and NSCs, evidence of this phenomenon is found in skin, bone, liver, and various regions of the brain (Figures 1 and 2; Figures S1 and S2). Interestingly, in mouse aNSCs, the age-associated increase in cryptic transcription has distinct characteristics in males and females ( Figure 2). Indeed, in the human dermal fibroblasts dataset analyzed in Figure   S2 (ref. 39 ), when samples are segregated by sex, only males show an elevation of cryptic transcription with age (not shown), further suggesting sex-specific differences in this aspect of aging. As cryptic transcription increases in a broad spectrum of tissues with age, it may contribute to aging pathologies throughout the body. While the direct phenotypic consequences of elevated cryptic transcription are uncharacterized in adult tissues, loss of Kdm5b, which increases cryptic transcription, impairs self-renewal of embryonic stem cells 19 . This suggests that elevated cryptic transcription may also be detrimental in adult stem cells and aging tissues in mammals, as has been found in yeast 12 . As cryptic transcription increases with age throughout the body, and is epigenetically encoded, interventions that target this phenomenon may be approachable pro-longevity treatments.
Medium was replaced every 4 days in the absence of passaging. Cells were grown to ~70% confluence and split 1:4 as needed. Cell density was below the concentration needed for reliable detection with a hemocytometer (10 5 cells/mL), so the growth curve was estimated as 2 population doublings per passage.
hMSC senescence-associated -galactosidase assay Approximately 5×10 4 hMSCs were plated in one well of a 6 well plate and incubated at 37ºC with 5% CO2 and 3% O2 for 4-6 hours. Staining was performed as described in ref. 53 .

hMSC DECAP-seq library preparation and sequencing
DECAP-seq was performed as described in ref. 18 with some modifications. The starting material was poly-A+ RNA was isolated from 100g of total RNA using the Dynabeads mRNA Purification Kit (Life Technologies, 61006). 5' phosphate groups were removed using CIP (NEB, M0290) rather than AP, and Cap-Clip acid pyrophosphatase (CellScript, C-CC15011H) was used in place of RppH to de-cap the mRNAs. Size selection of products was performed after 5' adapter ligation, rather than after 1 st strand synthesis to select against cDNAs generated solely from binding of the RT primer to the 5' adapter. As in ref. 18 , a negative control library was generated by omitting the de-capping step. Libraries were submitted to the Human Genome Sequencing Center at Baylor College of Medicine for sequencing on the Illumina HiSeq 2500 sequencing platform. All sequencing data have been deposited in the GEO database at NCBI (#GSE156409).

Murine NSC RNA-seq library preparation and sequencing
RNA-seq libraries were generated by GENEWIZ LLC. using the SMART-Seq v4 Ultra Low Input RNA Kit for cDNA synthesis and the Nextera XT DNA Library Preparation Kit.
Libraries were sequenced using a 2x125 paired end configuration on an Illumina HiSeq2500. All sequencing data have been deposited in the GEO database at NCBI (#GSE156409).

Data quality control, read trimming, and read mapping
Raw reads were trimmed to remove sequencing adaptors and low quality reads using Trim Galore version 0.4.4 with default parameters (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). RNA-seq data was mapped to the genome using HISAT2 version 2.1.0 (ref. 56 ). DECAP-seq data was mapped using STAR version 2.7 (ref. 57 ). ChIP-seq data was mapped using bowtie2 version 2.2.4 (ref. 58 ). Human genome assembly hg19 and mouse genome assembly mm10 were used as reference genomes in the mapping steps. Reads mapped to the ENCODE blacklist regions 59,60 were filtered. Multialigned reads were removed by a homemade script.

RNA-seq data analysis and global cryptic transcription event assessment
After mapping, transcript-level read count, and subsequent FPKM and TPM measurements, were performed using salmon version 1.0.0 (ref. 61 ) with following parameters: -l A -validateMappings.
An exon-level approach was used to assess cryptic transcription. The exon-level read count was performed using featureCounts version 1.5.3 (ref. 62 ) with the following parameters: -t

exon -g transcript_id -f -O -T 8 -p -C.
To eliminate the bias caused by multiple isoforms, cryptic transcript identification was performed on the major transcript of each gene, defined as the transcript with the highest TPM as determined in the transcript-level analysis. Major transcripts with length >3 kb and TPM ≥1 were used for further analysis.
The length-normalized TPM of each exon was calculated using a homemade script, and the expression ratio between the first or second exon and the remaining exons was calculated using the following formula, hereafter called relative expression value ( ): Where i is the i th exon of a transcript; L is the length of the exon; and b = 1 or 2, depending on whether the length-normalized expression of the first or second exon was used as the baseline value.
Fold change of relative expression (ratio of ratios) were calculated using the following formula: Where i is the i th exon of a transcript, O is the data of the old sample, and Y is the data of the young sample.
Global changes in cryptic transcription with age in paired old and young samples were assessed by calculating the log2-transformed for the second, third, fourth, and last exons of all major transcripts relative to their first and second exons. A log2-transformed value of >0 indicates increased cryptic transcription with age. Changes in cryptic transcription were also assessed after transcripts were divided into quantiles based on their TPM and the same analysis was performed independently for each group. For hMSCs, mHSCs, and NSCs, we additionally compared the averaged values (exons 2 through the second to last exon) in old vs. young cells.
A two-tailed Wilcoxon signed-rank test was used to determine differences in cryptic transcription between the old and young samples vs. the null hypothesis that the average log2transformed was the same in both samples. To identify specific transcripts that have increased cryptic transcription with age, we considered the exon with the maximum relative to the first exon for each major transcript; the relative expression value of this exon vs. the first exon is termed the CT score of the transcript. As these values do not have a normal distribution, even after log2-transfromation, the ratio of ratios were ranked, and transcripts with significantly large ratio of ratios were identified using the rank ordering algorithm in ROSE 32,33 .
Transcripts with a putative increase in cryptic transcription with age were further filtered by three criteria: 1) all of the exons downstream of the identified exon have a higher relative expression value in the old sample than in the young; 2) transcripts in which the first exon of a transcript has the highest were excluded; and 3) the absolute expression of the second exon in the old sample must be larger than or similar to the expression in the young sample to ensure that transcripts with few reads mapping to the 5' end of the transcript did not dominate the analysis.

DECAP-seq data analysis
Following initial read mapping, strand-specific reads were separated and analyzed independently in order to increase sensitivity. A metagene plot was constructed to confirm that the majority of reads mapped near annotated TSSes. Cryptic TSSes (cTSSes) were identified in 10bp bins across the genome by comparing sample data to the negative control. Read count assessment was performed using the csaw package version 1.20 (ref. 63 ) and the count data was normalized using the TMM method in edgeR 64 . Each bin was tested for significant differences between young or old sample and negative control using a Poisson test with the null hypothesis that for each bin, the signal in the sample is equal to the signal in the negative control.
Where p is the output p-value, x is the number of reads in a 10 bp bin of the sample data (young or old sample) and is the number of reads in the same region in the negative control data.
Neighboring bins were then merged into 100 bp windows, with the most significant bin representing the window. Adjustment for multiple comparisons were perform by calculating FDR using the Benjamini & Hochberg method 65 . cTSS peaks were defined as windows with FDR < 0.05 and fold change > 1.2 in the young or old sample vs. the negative control. Windows that overlapped with annotated TSSes and 3' UTRs were filtered out. A common cTSS was defined as a peak identified in both young and old datasets; otherwise, the cTSS was considered unique to the young or old sample.
Age-associated cTSS peaks are a subset of cTSS peaks in old sample. Age-associated cTSS peaks were defined as windows with significantly more reads than negative control (FDR < 0.05) and significantly more reads than the young sample (FDR < 0.05 and fold change > 1.2).
Significance was determined using a Poisson test using signals in the young sample as . The null hypothesis is that for each bin, signal abundance in the old sample is the same to that in the young sample. Merge of neighboring bins and multiple comparison correction were performed as described above.

Analysis of age-increased cryptic transcription
The population distribution of length and expression levels of transcripts with ageincreased cryptic transcription was compared to the major transcripts of expressed genes, as defined above. Significance was determined using a two-tailed Wilcoxon rank sum test vs. the null hypothesis that they are equal.
Motif features within 200bp of age-increased cTSSes were identified by HOMER findMotif.pl with default parameters 47 .
To assess whether transcripts that have increased cryptic transcription are associated with a change in gene expression, FPKM fold changes (old/young) of all the major transcripts were calculated and ordered by rank, from negative fold changes to positive fold changes. The distribution of the fold change rank of transcripts with age-increased cTSSes were compared to transcripts with age-decreased cTSSes, using the distribution of the rank list of all the genes as control.
RNA-seq data was used to validate the age-increased cTSSes identified in the DECAPseq analysis. RNA-seq read coverage 100 bp up-and downstream of age-associated cTSSes was counted using deeptools version 3.2.0 (ref. 66 ) with the computeMatrix function and a 10bp bin size. The sum of read counts for each bin was calculated in the young and the old sample and its ratio was computed. A two-sided Wilcoxon rank sum test vs. the null hypothesis that the ratio was equal in young and old samples was used to determine significance.
Differentially bound regions were identified by the csaw package 63 with default parameters, except that H3K9me3, H3K36me3 and H3K27me3 were performed using the get_motimal_broad_ERs model and peaks were called for the remaining datasets using get_optimal_punctate_ERs model. Differentially bound regions were identified by comparing the changes between young and old data using the csaw package 63 with H3 total and input as internal control for histone markers and TBP, respectively.

Heatmaps of read abundance of histone markers in genic regions and around
TSSes/cTSSes were generated by deeptools version 3.2.0 (ref. 66 ).
TBP ChIP-seq quality was assessed by determining the percentage of peaks located on promoter regions, defined as 1 kb up-and downstream from annotated TSSes.
TBP read abundance within 500bp of age-increased cTSSes or endogenous TSSes was calculated in 10bp bins using cTSSes or TSSes as reference and processed TBP bigwig files as input; regions without TBP signal were excluded. Three clusters were made using the k-means method based on the pattern of TBP read coverage around cTSSes or TSSes.
H3K36me3 coverage around age-increased cTSSes and endogenous TSSes was calculated using deeptools computeMatrix function 66 in 10 bp bins. Endogenous TSSes of major transcripts were used in this analysis (defined above). The sum of read depth in this region was calculated for cTSSes and TSSes in both young and old samples, and changes were assessed using a two-sided Wilcoxon rank sum test vs. the null hypothesis that the normalized read depth is equal in the young and old samples.

Chromatin state analysis
Chromatin states were identified using a 10 state model in chromHMM 49 with predefined peaks as input. In addition to pre-defined genomic regions in the package, we also assessed the enrichment of chromatin states 1kb up-and downstream of TSSes and transcription end sites (TESes), using genomic coordinates from hg19.
The genomic positions of LADs were downloaded from USCS genomic browser using human assembly hg19. Heatmaps of the LADs regions were made with the deeptools plotHeatmap function 66 using histone modification signal tracks as sample and H3 total signal tracks as control. k-means clustering was used to separate the LADs into two groups based on the signal enrichment of H3K9me3. Heatmaps of H3K9me3 and H3K27me3 were made for both young and old data with the LADs pileup in the same order.

cTSS region prediction from histone modification patterns
Putative age-associated cTSS regions in hMSCs and mHSCs were predicted by changes in H3K4me3 and H3K36me3 enrichment. Non-promoter genic regions, defined as gene body regions at least 1kb away from any endogenous TSSes, were considered putative age-increased cTSS regions if they: 1) contained significantly more H3K4me3 reads in the old sample and the regions did not overlap with H3K36me3 peak in the old sample than the young; or 2) contained significantly fewer H3K36me3 reads in the old sample than the young and the region was within 2kb of a H3K4me3 peak in the old sample. A Poisson test was used to determine whether reads were increased (for H3K4me3) or decreased (for H3K36me3) in the old sample relative to the young. Signal abundance the young sample was used as . The null hypothesis is that for each peak region, signal abundance in the old sample is the same as in the young.

Analysis of putative age-increased cTSS regions
Promoter characteristics of putative age-associated cTSS regions were analyzed by the promoter prediction algorithm Promoter -2.0 (ref. 52 ). The promoter score was defined as the number of sequences containing promoter features identified by the algorithm from 100 2kb sequences. In each iteration, 100 randomly selected promoter sequences, 100 randomly selected putative age-increased cryptic promoters and 100 randomly selected genomic sequences were analyzed by the algorithm separately. This process was repeated 100 times to get a distribution.
The sample sequences analyzed each time were distinct and samples were randomly selected without replacement. Promoter sequences were defined as 1kb up-and downstream of endogenous TSSes. Putative age-increased cryptic promoters were defined as 1kb up-and downstream of the midpoint of the identified age-increased cTSS regions. Random genomic sequences were 2kb sequences that randomly extracted from all autosomes.
Validation of increased cryptic transcription with age from putative age-       young hMSCs. Predicted age-increased cTSSes found within expressed major transcripts (n = 1056, teal) and randomly selected genic regions (n=1000, blue) were analyzed. F) Boxplot showing promoter prediction scores for endogenous TSSes (n=100, red), putative age-increased cTSSes (n=100, teal), and random DNA sequences (n=100, blue) in hMSCs. G and H) as (E) and (F), respectively, except these depict putative age-increased cTSSes identified in mHSCs (n=264). P-values for both hMSCs and mHSCs boxplots were calculated by a two-sided Wilcoxon rank-sum test against the null hypothesis that the RNA-seq ratios or promoter scores are equal in the two groups, as appropriate. In young cells, H3K36me3 (red peaks) within the gene body of actively transcribed genes serves as a scaffold for the chromatin modifiers KDM5B and DNMT3B (top panel). These enzymes remove intragenic H3K4me3 (blue peaks) and confer DNA methylation (solid black circles) at CpG residues, respectively, generating a repressive chromatin environment that maintains cTSSes in an inactive state 18,19 . During aging, H3K36me3 levels are reduced within gene bodies, which inhibits the recruitment of KDM5B and DNMT3B to these regions, resulting in accumulation of H3K4me3 and reduced DNA methylation (bottom panel). This results in a chromatin state that is more permissive for transcription initiation at cTSSes (red arrow), causing an elevation of cryptic transcription with age.