R-loops and regulatory changes in chronologically ageing fission yeast cells drive non-random patterns of genome rearrangements

Aberrant repair of DNA double-strand breaks can recombine distant pairs of chromosomal breakpoints. Such chromosomal rearrangements are a hallmark of ageing and compromise the structure and function of genomes. Rearrangements are challenging to detect in non-dividing cell populations, because they reflect individually rare, heterogeneous events. The genomic distribution of de novo rearrangements in non-dividing cells, and their dynamics during ageing, remain therefore poorly characterized. Studies of genomic instability during ageing have focussed on mitochondrial DNA, small genetic variants, or proliferating cells. To gain a better understanding of genome rearrangements during cellular ageing, we focused on a single diagnostic measure – DNA breakpoint junctions – allowing us to interrogate the changing genomic landscape in non-dividing cells of fission yeast (Schizosaccharomyces pombe). Aberrant DNA junctions that accumulated with age were associated with microhomology sequences and R-loops. Global hotspots for age-associated breakpoint formation were evident near telomeric genes and linked to remote breakpoints on the same or different chromosomes, including the mitochondrial chromosome. An unexpected mechanism of genomic instability caused more local hotspots: age-associated reduction in an RNA-binding protein could trigger R-loop formation at target loci. This finding suggests that biological processes other than transcription or replication can drive genome rearrangements. Notably, we detected similar signatures of genome rearrangements that accumulated in old brain cells of humans. These findings provide insights into the unique patterns and potential mechanisms of genome rearrangements in non-dividing cells, which can be triggered by ageing-related changes in gene-regulatory proteins.


Introduction
Cellular processes like transcription and replication can trigger DNA lesions such as doublestrand breaks (DSBs) (Gaillard and Aguilera 2016;Heyer 2015;Aguilera and García-Muse 2013;Skourti-Stathaki and Proudfoot 2014). A sensitive sequencing approach has revealed DSBs at hotspots in mouse brain cells, linked to transcribed genes with neuronal functions (Wei et al. 2016), suggesting that the physiological context can affect the landscape of DSBs. DSBs are normally repaired by homologous recombination or by non-homologous end-joining -two pathways which protect chromosomes from aberrant structural variations (Soutoglou et al. 2007;Difilippantonio et al. 2000;Haber 2016). Under certain physiological conditions, e.g. when the regular DNA-repair pathways are compromised, alternate DNA end-joining processes take over, often involving short homologous sequences (microhomologies) that are typically unmasked through DNA-end resection from the DSBs (Ottaviani et al. 2014;Decottignies 2007Decottignies , 2013. Microhomology-mediated end-joining events can link chromosomal breakpoints that are normally far apart or even on different chromosomes (Bétermier et al. 2014;Villarreal et al. 2012). Such events lead to genome rearrangements such as inversions, duplications, translocations or deletions, which can considerably affect the function of genomes. The patterns of genome rearrangements are shaped by the particular mechanisms of their formation and by the fitness effects they exert on the cell.
Ageing has been associated with both an increase in DSBs (Lombard et al. 2005; White and Vijg 2016) and a decline in the efficiency and accuracy of DNA repair (Gorbunova and Seluanov 2016;White and Vijg 2016). Accordingly, increased genomic instability and chromosomal rearrangements are well-known hallmarks of ageing (Szilard 1959;Curtis 1963;Strehler 1986;Hasty et al. 2003;López-Otín et al. 2013;Vijg and Suh 2013). Impaired non-homologous end-joining leads to accelerated ageing in human patients and mouse models, and microhomology-mediated end-joining increases with age (Vaidya et al. 2014).
Genome re-sequencing studies during ageing have been limited to mitochondrial DNA (Williams et al. 2013;Kennedy et al. 2013), small genetic variants (Gangloff et al. 2017; support from multiple sequence reads (Ye et al. 2009;Wang et al. 2011;Emde et al. 2012;Sindi et al. 2012;Rausch et al. 2012;Jiang et al. 2012;Trappe et al. 2014;Layer et al. 2014; Kimura and Koike 2015;Jeffares et al. 2017). Whilst this is useful for reducing false positives, these algorithms will only identify variations present in multiple cells in a population. To identify the rare, heterogeneous variations arising spontaneously in different non-dividing cells, we stringently filtered split reads that joined sequences from two distant genomic sites (Fig. S1) (Ellis et al. 2018). Such split reads represent potential breakpoint junctions of genome rearrangements, which lead to new sequence combinations (Fig. 1A).
We identified 776,174 such junctions from 225,554,047 total reads across eight replicate pools sampled at six time points.
Several lines of evidence indicate that these breakpoint junctions are not artefacts of sequence library preparation but represent in vivo genome rearrangements. First, fewer junctions were present within coding regions than would be expected by chance (Fig. 1B).
This bias may reflect selection against intra-genic rearrangements that disrupt gene function.
Second, modelling showed that the free DNA ends available for junction formation were not proportionally represented in the observed juxtapositions, i.e. sequences represented by higher read depth were not more likely to feature in breakpoint junctions (Fig. S2). Third, a larger age-associated increase was evident in intra-chromosomal junctions than in interchromosomal junctions, and among intra-chromosomal events, junctions joining neighbouring regions were preferred over those joining more distal regions ( Fig. S3; Note S1). We would not expect such bias from a bioinformatics artefact. Fourth, age-associated junctions were characterised by separate repair signatures at breakpoints compared to signatures suggestive of false positives, of which there were far fewer (see below and Note S1). A drawback of this approach is that the juxtaposed regions forming a junction are analysed without supporting information from other reads or breakpoints, and the exact nature and extent of the structural variations thus remains unknown.
During chronological ageing, breakpoint junctions strongly increased relative to the total number of mapped reads in each sample, particularly from Day 2 onwards (Fig. 1C).
This increase was most pronounced for junctions involving nuclear DNA only, but was also evident within mitochondrial DNA and between nuclear and mitochondrial DNA (Fig. 1C).
Work from others supports the notion that DSBs in nuclear DNA can be repaired with mitochondrial DNA (Hu et al. 2014;Ricchetti et al. 1999;Decottignies 2005). The breakpoint junctions featured different types of sequence rearrangements: single-base insertions not present at either joined region, blunt junctions directly linking two regions, or microhomologies of up to 20 bases shared between both joined regions (Fig. 1D). The blunt junctions and the junctions with single-base insertions or single-base microhomologies did not increase with age. These junctions might have been formed by a distinct mechanism before ageing and/or they could reflect artefacts (Note S1). In stark contrast, junctions featuring 2-20 bases of microhomology did markedly accumulate with age ( Fig. 1D). This signature indicates that these ageing-associated rearrangements occur by microhomologymediated end-joining. The observed size distribution, with a peak at 5-6 bases, might reflect a trade-off between the length of microhomology available near DSBs and the benefit of longer homology for end-joining repair. Interestingly, rearrangements with similar patterns of microhomology seem also to be enriched in cancer cells (Wala et al. 2017). We conclude that genome-wide rearrangements, represented by breakpoint junctions featuring microhomologies, accumulate as a function of the age of non-dividing cells.
Using motif discovery, we found seven long sequence motifs enriched at microhomology-mediated junctions (Fig. S4A). Motifs of known transcription factors from fission and budding yeast, which are shorter than the longer sequence motifs discovered, showed significant homology within these longer motifs (Fig. S4B). These transcription factors are involved in nutrient starvation and other stress responses or in cell-cycle control.
This result suggests that specific transcription factors are associated with regions near chromosomal junctions, raising the possibility that they are involved in triggering DSBs and microhomology-mediated rearrangements.

Similar patterns of genome rearrangements accumulate in old human brain cells.
Non-dividing yeast cells are a model for the post-mitotic ageing of somatic human cells such as the long-lived cells of the brain (Magrassi et al. 2013;Kaeberlein 2010). To check whether similar ageing-associated rearrangements also occur in human cells, and to validate our method in an independent system, we applied our junction calling pipeline to published sequencing data of young and old human brain tissue (Williams et al. 2013). We found a subtle increase in junctions in older brain cells ( Fig. 2A), but the coverage and sample number in this data set were low (Materials and Methods). As in fission yeast (Fig.   1D), these junctions were associated with microhomology ( Fig. 2A). The microhomologyassociated junctions in human brain cells showed a bimodal distribution: a large population featuring similar microhomology lengths to fission yeast (peaking around 4-6 bases), and a less abundant population featuring longer microhomology (median ~16 bases). Interestingly, in cancer genome sequences a transition in the probability of junction formation occurs at around 11 bases of microhomology (Wala et al. 2017); the authors suggest that this transition reflects a shift in repair mechanisms from microhomology-mediated end-joining to single-strand annealing. Notably, simulated data showed that there were fewer junctions in coding regions of human brain cells than would be expected by chance (Fig. 2B). As for fission yeast (Fig. 1B), this finding likely reflects selection against rearrangements that interfere with gene function, either through cell death or active culling of unfit cells (Merino et al. 2015). These results raise the possibility that similar patterns of ageing-related DNA rearrangements occur in both yeast and in human brain cells.
Local and global hotspots for genome rearrangements in ageing yeast cells. Junction formation over time represents a complex, multi-dimensional process: each junction is comprised of two juxtaposed sequences from any two genomic regions; independent junctions can recurrently form between the same two regions (Fig. 3A, green), or between one 'hotspot' region and different other regions (Fig. 3A, blue & red); and junction-formation can be either age-dependent (Fig. 3A, red) or not (Fig. 3A, blue). To visualize the yeast junctions and identify age-associated patterns in any rearrangements, we determined ratios of the number of junctions in the oldest cells to the corresponding number in the youngest cells. Ageing-associated junctions formed preferentially at two distinct types of hotspot: 1) those enriched for local, intra-chromosomal junctions (mostly within 20kb), and 2) those enriched for more global, inter-chromosomal junctions (Fig. 3B). The local hotspots were more abundant but less pronounced than the global hotspots (Fig. 3B). These local hotspots likely reflect spatial constraints on junction formation, with neighbouring DNA being a more likely repair substrate than distal DNA (Fig. S3). At global hotspots, on the other hand, junctions between nearby regions of DNA were under-represented, contrary to expectation and the situation at local hotspots (Fig. S3). These findings suggest that distinct mechanisms operate at global and local hotspots.
We identified three strong global hotspots featuring numerous connections with other, typically remote, sequences throughout the genome. These global hotspots were located near the right end of Chromosome II and near both ends of Chromosome III, the latter being the sites of ribosomal DNA (rDNA) (Fig. 3B). Given that these global hotspots occur in repetitive regions near chromosome ends (Toda et al. 1984;Wood et al. 2002;Mandell et al. 2004), they might simply reflect the large number of repeated sequences. However, if the copy number of these repeated sequences remained constant during ageing, the ratio of junctions between young and old cells should still reflect ageing-associated changes ( Fig.   S5A). We therefore checked for changes in the ratio of repeat copy numbers between young and old cells at global hotspots. This analysis showed that repeat sequences at hotspots did not increase, but actually decreased with age ( Fig. S5B). Thus, if anything, we underestimated the prevalence of the global hotspots as sites for age-associated junction formation. Work in other systems has shown that copy numbers of rDNA repeats decrease with age, and instability in this region is linked with ageing and longevity (Spivey et al. 2017;Kobayashi 2014;Lu et al. 2018). The decrease of rDNA copy numbers with age could be linked to junction formation in these regions: work in primate kidney cells has demonstrated differential repair of cellular DNA sequences (Zolan et al. 1982), reminiscent of the heterogeneity we observe for age-associated rearrangements. We conclude that ageingassociated rearrangements occur preferentially at either local or global hotspots.
Possible causes for global hotspots downstream of thl2. To better understand the global hotspots, we analysed the positions of junctions relative to genome annotations. The first hotspot was downstream of a tlh2 gene ( Fig. 4A; left), encoding a RecQ family DNA helicase. Copies of tlh2 reside on all four sub-telomeres of Chromosomes I and II (Mandell et al. 2004). The S. pombe reference genome assembly only includes one tlh2 gene copy, and it seems likely that the other three copies also feature hotspots. Initially discovered in E.
A re-analysis of RNA-seq data from non-dividing cells (Atkinson et al. 2018) revealed a subtle increase in tlh2 expression during chronological ageing (Fig. S6A). To test whether increased transcription at tlh2 leads to increased junction formation at this hotspot, we generated a strain that overexpresses tlh2. We then sequenced the genome of this strain, along with a wild-type control, in early stationary phase when tlh2 expression is normally low. The proportion of junctions downstream of tlh2 was higher in the tlh2 overexpression strain compared to wild-type (Fig. S6B). Thus, overexpression of tlh2 is sufficient to trigger increased breakpoint junctions, possibly reflecting rearrangements owing to transcriptional stress. Moreover, the tlh2 overexpression strain was substantially shorter-lived than wildtype cells (Fig. S6C). These results suggest that increased transcription of tlh2 and/or increased levels of the Tlh2 protein cause rearrangements that affect cell survival and longevity.
Given the subtle and variable increase in tlh2 expression based on bulk RNA-seq data Accordingly, the smFISH experiment showed that tlh2 was de-repressed in sir2 deletion cells (Fig. 4B). To test whether the tlh2 locus is prone to R-loop accumulation, we applied a DNA-RNA immunoprecipitation followed by sequencing (DRIP-seq) experiment for the profiling of R-loops (Ginno et al. 2012). Indeed, R-loops were strongly enriched downstream of tlh2 (Fig. 4C). Together these results suggest that R-loops may trigger the rearrangements at these global hotspots whose transcriptional activity depends on chromatin-remodelling factors such as Sir2. Moreover, significantly more junctions occurred in 3'-untranslated regions (UTRs) of genes than in 5'-UTRs, similar to the situation at the global hotspots. Our pipeline showed no bias towards AT-or GC-rich regions (Materials and Methods). This finding suggests that the ends of genes are particularly prone to rearrangements. Next, we identified genes whose 3'-UTRs contained more junctions than would be expected by chance, given their length (Fig. 5B).

Ageing-related changes in RNA-binding proteins and R-loops may
Functional enrichment analysis using AnGeLi (Bitton et al. 2015) showed that, of the 148 enriched genes, 17 produced transcripts that are targets of Scw1, representing a significant enrichment (FDR-corrected p <0.0001). Scw1 is an RNA-binding protein (RBP) (Hasan et al. 2014;Jin and McCollum 2003;Karagiannis et al. 2002) and an orthologue of the human polypyrimidine-tract-binding proteins RBPMS and RBPMS2. RBPMS was initially identified in an effort to find the Werner Syndrome gene (Shimamoto et al. 1996), suggesting intriguing parallels with the RecQ helicase Tlh2 at the hotspot discussed above.
Scw1 negatively regulates its target RNAs, possibly by binding to a motif in their 3'-UTRs (Hasan et al. 2014). To examine how an RBP might affect DNA rearrangements, we looked for any ageing-dependent changes in Scw1 substrate binding (Materials and Methods). These analyses suggested that Scw1 loses its affinity for some RNA targets with age ( Fig. S7A), but it does not switch binding substrate from RNA to DNA (Fig. S7B). In fact, the protein levels of Scw1 markedly decreased in ageing cells (Fig. 5C). The budding yeast orthologue of Scw1, Whi3, aggregates during replicative ageing (Schlissel et al. 2017), so it is possible that the decrease in Scw1 reflects protein aggregation. In any case, this result suggests that Scw1, rather than promoting rearrangements at its targets, is preventing them in young cells.
How might an age-associated functional loss of Scw1 trigger rearrangements? We hypothesized that R-loops might again be involved. The presence of RBPs on nascent transcripts can inhibit the formation of R-loops by preventing RNA annealing to the singlestranded DNA template (Huertas and Aguilera 2003;Aguilera 2002;Li and Manley 2005;Wahba et al. 2011). To test whether loss of Scw1 promotes R-loop formation, we quantified R-loops in scw1 deletion (scw1∆) and wild-type cells using R-loop immunostaining and fluorescence quantification. We validated the technique using RNase H mutants that are known to show increased R-loop formation as a positive control (Zhao et al. 2018) and RNase H treatment as a negative control (Fig. S8). For a physiologically relevant comparison, we used proliferating and early stationary phase cells, because older wild-type cells naturally lack Scw1 (Fig. 5C). Indeed, the scw1∆ cells contained more R-loops than wild-type cells (Fig. 5D). We applied DRIP-seq to directly test the effect of Scw1 on R-loop formation at the 227 published target genes of Scw1 (Hasan et al. 2014). This experiment showed that such R-loops slightly, but significantly increased in the absence of Scw1 (Fig.   5E). An increase remained even when the 17 Scw1 targets we identified (Fig. 5B) were not included (one sample t-test: T=-2.46, p=0.014). This finding suggests that the R-loops increased across all or most Scw1 targets. We conclude that the absence of Scw1 is sufficient for increased R-loop formation at its target genes. We propose that diminished Scw1 function in old cells leads to exposed 3'-ends of nascent target mRNAs that can reanneal with complementary sequences in their template DNA (Fig. 5F). The resulting Rloops may then trigger genome instability and associated rearrangements downstream of the coding regions of Scw1 target genes. Thus, age-associated changes in the expression of an RBP with a role in post-transcriptional gene regulation can lead to non-random junction distribution via increased genome instability at its targets.

Conclusions.
Our sensitive analyses of deeply sequenced yeast genomes uncover the rare and diverse events of chromosomal rearrangements that specifically accumulate during ageing of non-dividing cells. Our results provide evidence for widespread genome rearrangements accumulating during ageing of non-dividing fission yeast, associated with microhomology sequences near the breakpoints. Similar rearrangements are also evident in older human brain cells. Junctions indicative of rearrangements show a non-random distribution, both in terms of the regions that are joined (local DNA is often preferred to distal DNA) and the biological importance of the region (fewer junctions occur in coding regions).
These rearrangements occur in global or local hotspots and may be triggered by nonrandom patterns of DNA-RNA interactions. We find that the age-associated decline in the RNA regulatory protein Scw1 and associated R-loop formation can trigger breakpoint formation at target genes. Other RBPs that are modulated during ageing might act similarly to Scw1 in triggering genome rearrangements at other hotspots. These findings highlight that physiological changes can fuel cell-and condition-specific genome rearrangements with a non-random distribution.

Materials and Methods
Strains used in this study. The ageing pools of yeast used for the main experiment are described elsewhere (Ellis et al. 2018). Briefly, an inter-crossed population fission yeast derived from the parental strains Y0036 and DY8531 (Hu et al. 2015;Brown et al. 2011;Jeffares et al. 2015) was inoculated into eight separate 2L flasks of liquid yeast extract supplemented (YES) medium, and grown until the optical density reached a plateau.
Samples were then snap frozen in liquid nitrogen every 24hrs for the next five days. Scw1∆ and Scw1-TAP strains have been published elsewhere (Hasan et al. 2014). Samples for RIP-chip and ChIP-seq of Scw1-TAP were taken at 0, 2 and 4 days after cells reached a plateau in optical density. The Tlh2OE strain was generated for this study using a PCRbased approach (Bähler et al. 1998) to introduce an nmt1 promoter (Forsburg 1993) in front of tlh2. Experiments using this strain, and experiments for chromosomal spreads, were carried out in Edinburgh minimal medium (EMM). For any lifespan experiments in EMM, cultures were grown until the optical density of cells reached a plateau, when cells were spun down and re-suspended in EMM without glucose. All transgenic strains were confirmed by PCR. The Sir2 deletion strain was obtained from the Bioneer fission yeast deletion library (Kim et al. 2010).
DNA extraction, library preparation and sequencing. Genomic DNA was extracted using a standard phenol chloroform procedure (Murray et al. 2016). After precipitation, DNA pellets were re-suspended in TE buffer and treated with RNase A (Qiagen), before being mechanically sheared to ~200bp (Covaris AFA). Sheared DNA was passed through PCR purification columns (QIAquick, Qiagen), and the fragment size distribution was checked using a 2100 Bioanalyzer (Agilent). Libraries were prepared using NEBNext Ultra kits (NEB) according to the manufacturer's standard protocol -this procedure included dA-tailing (Note S1). After individual quantification (Qubit) and quality control (2100 Bioanalyzer), all fortyeight libraries were pooled. The pool was then sequenced using 126nt paired-end reads on the Illumina HiSeq 2500 (SickKids, Canada).  1.19). Split reads were then obtained using a simple shell script integrating Samtools (Li et al. 2009) (v1.2). Note that, when assessing mitochondria-mitochondria split reads, the circularity of the mtDNA needed to be accounted for by ignoring those that aligned to both the start and end of the reference contig. Split reads were then filtered in a custom python script. To pass filtration, both alignments of each split read had to adhere to the following criteria: a minimum length of 40bp, a mapping quality score of 60 (the maximum score given by BWA-MEM), no clipped alignments at both ends (Suzuki et al. 2011). Information on the alignments of each split read was obtained from the CIGAR strings of the initial soft-clipped read, as any information contained in subsequent hard-clipped reads is redundant. The 100bp sequencing reads from human data (Williams et al. 2013) were re-mapped to the hg38n assembly of the human genome (accessed May 2016) using the same pipeline. The original purpose of this data set was to compare somatic mutations in young and old mitochondrial DNA, and the sequenced libraries were mtDNA-enriched (Williams et al. 2013). However, there was still a considerable amount of coverage at the nuclear chromosomes, although less so than in the fission yeast data set.

Read alignment and junction filtration. After performing initial checks in FastQC
To assess the pipeline, we generated one hundred random 100bp sequences of the S. pombe reference genome using BioPython (Cock et al. 2009) and output them as separate vcf files (the file format required by Mason (Holtgrewe 2010) -see below). Each vcf file also contained a random location at which the fragment should be inserted. These fragments were then inserted into the reference genome to produce separate fasta files (using GATK (McKenna et al. 2010) and Mason (Holtgrewe 2010)). Each fasta file was used to generate 1x of simulated reads. Once all reads had been simulated, they were mixed with reads simulated using the standard reference genome (with no insertions) to give a proportion of split reads similar to that obtained in the real sequence files. The read alignment and junction filtration pipeline described above was then applied, and the number of simulated junctions that were recovered was counted. Although sensitivity was low (14/100 simulated rearrangements were recovered), there were no false positives, showing that this filtration is conservative but robust. Using a two-sample Wilcox test, a comparison between the GCcontent of all simulated junctions (N=100) and recovered junctions (N=18) showed that there was no significant bias in our pipeline toward calling junctions in AT-or GC-rich regions (W=992, p=0.5). All sequence data are available in the European Nucleotide Archive under the study accession PRJEB30570.
Modelling the expected rearrangement distribution. To calculate the expected number of DNA fusions between each chromosome in a random admixture of DNA ends, the number of reads mapping to each chromosome (all samples and timepoints considered simultaneously) was obtained using Samtools ) (v1.2) and converted to a proportion of the total number of reads. In Perl, these proportions were used to calculate the number of junctions that would be expected for each combination of chromosomes, were the DNA ends to join randomly. One thousand simulations were performed.

Measurement of microhomology at junctions. For each time point, bam files from all
repeats were merged using Samtools ( v1.2) ). Split reads were then obtained and filtered as above. Using python, junctions were first categorised as follows: those whose alignments share no sequence homology and have no non-homologous sequence between them; those whose alignments share no sequence homology and contain a non-homologous insertion between them; those whose alignments share an overlapping region of sequence homology (note that this homology is not necessarily perfect and may contain InDels). After categorising each junction, the length of any non-homologous or homologous sequence was recorded.

Motif Enrichment. For each junction classed as microhomology-mediated, we used
BioPython (Cock et al. 2009) to collect the 100bp surrounding sequence. These 100bp sequences were compiled into a multi-sequence fasta file and submitted to MEME (Bailey et al. 2009) for motif discovery. Only motifs with e-values lower than 0.05 were considered.
Using TomTom (Bailey et al. 2009), these sequences were compared to databases of known motifs for DNA-binding protein in fission and budding yeast. The most significant hits are shown in Fig. S8B.
Identification of rearrangement hotspots. Junction location files obtained after split read filtration were merged to create one combined file for each time point. A custom python script was used to divide each chromosome into 20kb windows and categorise each junction into a window based on its two alignments. The number of junctions at all windows was then output as separate chromosome-chromosome matrices for each time point. To see how much of an increase there was at each window, day five matrices were divided by their corresponding matrix at day zero (the end and start of the experiment, respectively). To quantify global hotspots, the average ratio across all windows at each 20kb bin of each chromosome was calculated. Fig. S4, the coverage at every position in the genome was obtained for each merged bam file using Bedtools (Quinlan et al. 2010) (v2.22.1). This per-base coverage was then used to obtain the median coverage at each 20kb bin. To get a score for how much each bin had changed in read depth, the median at day 5 for each bin was divided by the median at day zero. To analyse the distribution of junctions at the start and end of genes, a custom python script was used to collect CDS start and end positions based on their strand and coordinates from a bed file (see above). For each repeat, junction alignments were then collected at the 500bp surrounding these positions, and their relative positions inside or outside the gene end were recorded.

Gene enrichment.
To get the number of junctions in each 3' UTR, 3' UTR coordinates were obtained from an S. pombe gff3 file (see above) and converted into bed format. The number of junctions at each 3' UTR were then counted and plotted against the length of that UTR.
Any gene with more junctions than twice the standard deviation of all UTRs were considered recurrently rearranged. Gene enrichment analysis was performed in AnGeLi (Bitton et al. 2015).

RIP-chip experiments.
Two experimental repeats were performed for three timepoints: Days 0, 2 and 4 after cells reached stationary phase. RIP-chip of Scw1-TAP was performed as described (Hasan et al. 2014), except for the following modifications: immunoprecipitation was carried out using monoclonal antibodies against protein A (Sigma); the lysis buffer contained 10mg/ml heparin (sigma H7405), 1 mM PMSF and 1∶100 protease inhibitor cocktail (Sigma P8340); and magnetic beads containing the immunoprecipitate were resuspended in 50 µl of wash buffer containing 1 mM DTT, 1 unit/ml of SuperaseIN (Ambion 2696) and 30 units/ml of AcTev protease (Life Technologies 12575015). The solution with the beads was incubated for 1.5 h at 18°C and the supernatant recovered, and RNA extracted using PureLink RNA micro columns (Life Technologies), according to the manufacturer's instructions. The RNA was eluted from the column in 14 µl and used for labelling without amplification. For microarrays, fluorescently labelled cDNA was prepared from total RNA and immunoprecipitated RNA from the RIP-chip using the SuperScript Plus Direct cDNA Labelling System (Life Technologies) as described by the manufacturer, except for the following modifications: 10 µg of total RNA was labelled in a reaction volume of 15 µl.
We then used 0.5 µl of 10× nucleotide mix with labelled nucleotide (1/3 of the recommended amount), and added 1 µl of a home-made dNTP mix (0.5 mM dATP, 0.5 mM dCTP, 0.5 mM dGTP, 0.3 mM dTTP) to the reaction. All other components were used at the recommended concentrations. Note that these changes are essential to prevent dye-specific biases.
Labelled cDNAs were hybridised to oligonucleotide microarrays manufactured by Agilent as described (Rallis et al. 2013). Microarrays were scanned with a GenePix 4000A microarray scanner and analysed with GenePix Pro 5.0 (Molecular Devices). A dye swap was performed for all repeats. Any gene with data missing at any probe or in either repeat were not included in the analysis. Data was median-centred for analysis and any genes whose expression was highly un-correlated across repeats or did not appear at all timepoints were not considered. The RIP-chips data have been submitted to ArrayExpress under accession number E-MTAB-7618.

ChIP-seq experiments.
ChIP-seq assays were performed as described (Cotobal et al. 2015). For experiments using Scw1-TAP, ChIP-seq samples were collected at the same timepoints as for the RIP-chip experiments (see above). ChIP-seq libraries were sequenced using an Illumina MiSeq instrument. SAMtools (Li and Durbin 2009) and BED tools (Quinlan et al. 2010) were used for sequence manipulation. Peak calling was performed using GEM (Guo et al. 2012). The ChIP-seq data are available in the European Nucleotide Archive under the study accession PRJEB30570.

Single-molecule RNA Fluorescence
In Situ Hybridization. We performed smFISH, imaging and quantification on formaldehyde-fixed cells as described (Sun et al. 2020).
The rpb1 probes were labelled with Quasar 670 and the tlh2 probes with CAL Fluor Red 610. Probe sequences are provided in Table S1. Cells were mounted in ProLong Gold antifade mountant with DAPI (Molecular Probes) and imaged on a Leica TCS Sp8, using a 0 63x/1.40 oil objective. Optical z sections were acquired (z-step size 0.3 microns) for each scan to cover the depth of the cells.

DRIP-seq experiments.
To detect R-loops, we snap-froze cells exponentially growing in YES. Samples for DRIP-seq were prepared based on published protocols (Wahba et al. 2016;García-Rubio et al. 2018b;Hartono et al. 2018), with minor variations. The main difference was in the method for nucleic acid purification. Briefly, cells were thawed on ice and re-suspended in the same buffer used for ChIP-seq chromatin extraction. Cells were homogenized using a Fast-prep instrument, and chromatin was precipitated by centrifugation. Chromatin was re-suspended in Qiagen buffer G2 (from genomic extraction kit) with Proteinase K and incubated for 40 min at 55°C. Nucleic acids were purified by phenol:chloroform extraction and precipitation, digested with S1 nuclease and sonicated.
Half of the sample was treated with RNAseH to use as control. R-loop-IP was performed using Dynabeads-mouse M280 pre-coated with S9.6 antibody (Millipore MABE1095). DNA was then eluted and purified for library preparation with NEBNext Ultra kits (NEB).
For data analysis, the depth at every base was collected using SAMtools and divided by the total number of mapped reads for each sample to normalise for differences in sequencing depth. To compare the R-loop signal at Scw1 target genes between wt and scw1∆, we used a Python script to collect the mean normalised coverage for each target gene, including both those that were identified by Hasan and colleagues as upregulated in scw1∆ and those that were enriched in Scw1-TAP RIP-chip (Hasan et al. 2014  (grey) repeats. One-sided two sample Mann-Whitney U to test whether observed is less than expected (U=0, p <0.001, N=8). Note that proportion of simulated junctions in coding regions is similar to proportion of genome reported to be coding (excluding introns (Wood et al. 2002)).

(C) Number of breakpoint Junctions Per Mapped Read (JPMR) passing filter (Materials and
Methods) as a function of cellular age. Cells were cultured in rich medium until the optical density no longer increased, indicating that cells were no longer dividing, and used as day zero for the chronological ageing timecourse. Samples were taken from the cultures at 0, 1, 2, 3, 4 and 5 days. JPMR are shown relative to day zero (N=8, confidence intervaI=68%).
Ochre line: total junctions, genome wide. Grey lines from top to bottom: subset of junctions formed among fragments of nuclear DNA (nDNA); between fragments of mitochondrial (mtDNA) and nDNA; among fragments of mtDNA.  Analysis was performed on sequence data from putamen samples of neurologically normal Caucasian males (Williams et al. 2013), six of whom were young (blue; 19, 24, 28, 28, 29 and 30 year old donors) and six of whom were old (red; 67, 71, 78, 83, 85 and 89 year old donors). The 100bp reads from these samples, which were crudely enriched for mtDNA, were re-mapped to the whole human genome and filtered (Materials and methods).  (bottom), these junctions may not be the result of ageing as they could have been present earlier (top). Red: true age-associated hotspots will show an age-associated increase. Some rearrangements may occur recurrently between the same two regions (green) or one region with many others (red).

(B)
Heatmap showing the ratio of junction counts between Days 5 and 0 (T5/T0; 'old-young ratio'), in 20kb windows across the genome. An element in the heatmap depicting two regions with high levels of age-associated breakpoint formation will have a higher old-young ratio and appear darker. The average number of breakpoints at each window, genome-wide, is plotted in grey at the top. An intra-chromosomal region of Chromosome II is blown up in bottom right; local hotspots are exemplified by region II:2940000-3080000, where the oldyoung ratio exceeds 25 in many bins along the top edge of the heatmap. Global hotspots are reflected by the dark diagonal emanating from the right end of Chromosome II and both ends of Chromosome III.  which normally express Scw1, were analysed for R-loop formation in presence (wt) and absence (scw1Δ) of Scw1. Right: example images from chromosomal spreads; DNA stained with DAPI (blue) and R-loops by S9.6 antibody (Boguslawski et al. 1986) (yellow), with (+) or without (-) RNase H treatment (which removes R-loops). Scale bar: 10µm.
(E) Normalised read coverage at Scw1 target genes after DRIP-seq of scw1Δ and wt cells using the S9.6 antibody (ochre). The difference between strains is significant (one sample t-test: T=-2.52, p=0.012). Samples after the addition of RNase H are shown as a control (blue).
(F) Model for how reduced Scw1 levels might lead to genome rearrangement at its targets.
In young cells, Scw1 is present and binds its targets at their 3' UTRs. In old cells, Scw1 is absent and thus leaves nascent RNA of its targets free to anneal with template ssDNA. Alignment files were then converted to BAM format and sorted before having PCR duplicates removed in Samtools. Next, Bash commands (Awk and Grep) were used to extract split reads from these files. A custom Python script was used to filter these split reads to obtain a robust set of junctions representing two juxtaposed DNA locations, and to characterise their microhomology use across the junction.     red crosses). Right: Results from three independent repeats with wild type (WT) and tlh2 overexpression cells (tlh2OE). One-sided two sample Mann-Whitney U to test whether tlh2OE is greater than WT (U=9, p=0.04).
(C) Chronological lifespan of tlh2OE (red) compared to WT (grey). Lines show means of three repeats ± 68% confidence intervals. B: Example ChIP-seq of Scw1-TAP. Plots show normalised read depth (mean of two repeats) at three Scw1 targets (red) and three non-targets (blue) in cells aged for two days. 7 For this visualisation, we chose genes which do not overlap any other annotations and matched targets with non-targets of similar gene and 3' UTR lengths. Figure S8. R-loop immunostaining controls in exponential culture.
A: Representative images of nuclear spreads from wild-type and rnh1∆ rn1201∆ double mutant cells, with the latter known to feature increased R-loop formation (Zhao et al. 2018).
Nuclei are stained with DAPI (blue) and R-loops are detected with the S9.6 antibody (green).
To verify that the green signal comes from R-loops, control slides (right hand panels) were treated with RNAse H before adding the primary antibody. Scale bar 10 µm.

Note S1
Arguments against breakpoint junctions forming in-vitro.
In addition to the selection against junctions in coding sequences (Fig. 1B), three pieces of evidence argue against the formation of these junctions in vitro after DNA extraction. First, during sequence library preparation, dA nucleotides were incorporated at the 3' ends of DNA fragments before ligation of the sequencing adapters (with complementary dT overhangs).
We therefore expected that false positive concatemers occurring in vitro during sequence library preparation should feature insertion of an A at the breakpoint. On the other hand, genuine junctions may feature different signatures at the breakpoint, depending on how they were repaired in vivo. Whilst we found a population of single-base insertions (of which A or T insertions could reflect dA-tailing during library preparation), these junctions did not increase at all with age, and age-associated junction formation was associated with microhomology at the breakpoint (Fig. 1D). Second, modelling, based on the read depth for each body of DNA (e.g. average depth at Chromosome II), showed that the number of junctions between each type of event (e.g. mtDNA-Chromosome I) was not proportional to the relative number of free DNA ends in the library, with far fewer junctions between mitochondrial DNA and nuclear DNA than would be expected (Fig. S2). Third, the ratio of junction counts between old and young cells was higher for junctions that formed within chromosomes than those that formed between chromosomes (Fig. 3B), showing that age-associated junction formation was more prevalent within chromosomes. Furthermore, of those junctions that formed within chromosomes, nearby DNA was preferred to distant DNA (Fig. S3). This pattern echoes recent findings in the cancer genome where the frequency of junction formation between two pieces of DNA is approximately inversely proportional to the distance between them (Wala et al. 2017   Neither alignment clipped at both ends.
Junction locations. Breakpoint sequence information.