Catastrophic chromosomal restructuring during genome elimination in plants

Genome instability is associated with mitotic errors and cancer. This phenomenon can lead to deleterious rearrangements, but also to genetic novelty, and many questions regarding its genesis, fate and evolutionary role remain unanswered. Here, we describe extreme chromosomal restructuring during genome elimination, a process resulting from hybridization of Arabidopsis plants expressing different centromeric histones H3. Shattered chromosomes are formed from the genome of the haploid inducer, consistent with genomic catastrophes affecting a single, lagging chromosome compartmentalized within a micronucleus. Analysis of breakpoint junctions implicates breaks followed by repair through non-homologous end joining (NHEJ) or stalled fork repair. Furthermore, mutation of DNA Ligase 4, a factor required for NHEJ, results in enhanced haploid recovery. Lastly, the heritability and stability of a rearranged chromosome suggest a potential for enduring genomic novelty. These findings provide a tractable, natural system for investigating the causes and mechanisms of complex genomic rearrangements similar to those associated with several human disorders. DOI: http://dx.doi.org/10.7554/eLife.06516.001


Introduction
Centromeres are determined by nucleosomes containing a variant histone, centromeric histone H3 (CENH3, also known as CENP-A) (Verdaasdonk and Bloom, 2011). In the absence of endogenous CENH3, Arabidopsis thaliana mitotic and meiotic functions can be complemented by a chimeric CENH3 or by CENH3 from diverged plant species (Maheshwari et al., 2015). However, crossing these strains to wild-type individuals results in frequent loss of the chromosomes marked by the variant CENH3. Following stochastic genome elimination in the early mitotic divisions, the progeny can be haploid, aneuploid or diploid (Ravi et al., 2014). In nature, similar phenomena involve defective CENH3 loading (Sanei et al., 2011). Thus, mating of individuals that express diverged CENH3s can lead to mitotic catastrophe.
The consequences of mitotic malfunction for genome integrity can be dire (McClintock, 1984; Gordon et al., 2012). Missegregated chromosomes can lead to aneuploidy (Janssen et al., 2011). They can also lead to extensive and catastrophic restructuring through a sequence of chromosome sequestration in micronuclei, endonucleolytic damage, defective repair, and finally rescue (Crasta et al., 2012; Hatch et al., 2013; Zhang et al., 2013). The resulting structurally rearranged chromosomes may contribute to cancer or developmental syndromes (Hastings et al., 2009; Liu et al., 2011; Stephens et al., 2011; Jones and Jallepalli, 2012). Nevertheless, chromosomal rearrangements are not necessarily deleterious: some may influence fitness by altering recombination or gene dosage (Comai et al., 2003). Pathways leading to disease and to diversity may share a common mechanistic basis (Zhang et al., 2013). Genome elimination in Arabidopsis provides a previously lacking organismal system in which to investigate genome instability during mitotic catastrophes, the mechanisms involved, and their consequences.

eLife digest
The genome of an individual organism contains all the instructions needed to build and maintain that individual. Any changes to the DNA in the genome can alter the instructions that are given to cells, which can lead to cancer and other diseases. However, changes to the genome can sometimes be beneficial because they can introduce more variety into the instructions carried by different individuals, which increases their potential to adapt to changes in their environment.
In plants and animals, DNA is arranged into structures called chromosomes. Generally, an individual's genome contains two copies of each chromosome: one inherited from the mother and one from the father. However, occasionally during reproduction, all the chromosomes from one of the parents are left out of the cells of the offspring in a process called 'genome elimination'. This produces individuals that carry only half the normal number of chromosomes, known as haploids. Sometimes the process of genome elimination is disrupted. This leads to individuals that have incomplete genomes, or chromosomes that carry big rearrangements of the DNA, as if they had been shattered and put back together incorrectly.
In a small plant known as Arabidopsis thaliana, genome elimination frequently happens in the offspring of two individuals that carry different versions of a gene called centromeric histone H3 (CENH3). However, it is not clear how this works, or what roles genome elimination plays in evolution and disease.
Here, Tan et al. studied genome elimination by cross-breeding Arabidopsis plants that carried a mutant form of CENH3 with plants that have a normal version of the protein. Many of the offspring were haploid. Some of the others carried an extra copy of an entire chromosome or of a section of a chromosome. A third group had an extra copy of a chromosome that was missing some sections or had been rearranged. These 'shattered' chromosomes were always formed from chromosomes that came from the parent plant with the mutant form of CENH3.
Tan et al. also found that a protein called DNA Ligase 4, which helps reconnect broken DNA strands, is involved in repairing the breaks in these shattered chromosomes. Some of the genetic rearrangements documented in the experiments were passed on to subsequent generations of plants. This suggests that these genomic changes can be stable enough to be inherited.
The genomic rearrangements observed in the Arabidopsis plants are similar to those seen in patients with cancer and other genetic diseases. The findings of Tan et al. show that Arabidopsis plants provide a useful system for studying these genome rearrangements, which may inform efforts to treat these human diseases.

Results
We used the GFP-tailswap haploid inducer (Ravi et al., 2014) in the experimental setup illustrated in Figure 1. This strain is in the Col-0 background and carries a homozygous CENH3 null mutation. Its function is partially complemented by a chimeric CENH3 in which an N-terminal GFP fused to the H3.3-like N-terminal tail replaces the native CENH3 N-terminal tail. We crossed this strain to the polymorphic accession Ler gl1-1 to track haplotypes in the F1 progeny and obtained the expected haploid induction frequency (Ravi et al., 2014) (Figure 1). The recessive gl1-1 mutation confers trichomeless leaves in paternal Ler gl1-1 haploids, whereas it is masked in Col/Ler diploid hybrids. We sequenced 10 of the phenotypically diploid Col/Ler individuals with wild-type phenotype and performed dosage plot and single nucleotide polymorphism (SNP) analysis. All 10 were diploid, with equal (50%) contributions from the Col and Ler genomes (Figure 1-figure supplement 1). Plants from the aneuploid class exhibited multiple pleiotropic and morphological defects and had trichomes, except in the rare cases in which the GL1 locus was lost. The five recognizable primary trisomic (2n + 1) phenotypes were represented (Steinitz-Sears, 1963; Koornneef and Vanderveen, 1983):
- Chromosome 1 (Chr1) trisomics have dark green, serrated leaves and are dwarfed.
- Chr2 trisomics exhibit round leaves and are late flowering.
- Chr3 trisomics have narrow, yellow-green leaves.
- Chr4 trisomics display narrow, smaller, flat leaves.
- Chr5 trisomics display light green, narrow leaves.
However, we also observed aneuploid plants with more severe or unusual phenotypes, suggestive of other chromosomal combinations or more serious chromosomal aberrations.
Chromosome dosage analysis based on whole genome sequencing (Henry et al., 2010) (Supplementary file 1) distinguished three types of chromosomal alteration in aneuploids (Figure 2). Similar outcomes were obtained using independently derived haploid inducers expressing either GFP-tailswap (Figure 2E and Figure 2-figure supplement 1) or CENH3 from other plant species (Maheshwari et al., 2015). The most common type, numerical aneuploids, display whole-chromosome aneuploidy, as in the classical primary trisomics (Figure 2B shows an example of a numerical Chr3). In our dataset, single primary trisomics (2n + 1) account for 75% of the numerical class. Other individuals from the numerical class carried two or more extra whole chromosomes: 16% were double primary trisomics (2n + 1 + 1), 2% triple primary trisomics (2n + 1 + 1 + 1) and 3% quadruple primary trisomics (2n + 1 + 1 + 1 + 1).
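The chromosome dosage analysis behind these classes is detailed in the methods. It counts reads in fixed bins and normalizes each bin against a reference. The Python sketch below illustrates only that arithmetic. It assumes reads have already been aligned and reduced to (chromosome, 1-based position) pairs. The function names `bin_percentages` and `relative_copy_number` are hypothetical, and this is not the authors' pipeline.

```python
# Illustrative sketch only; function names are hypothetical.
from collections import Counter

BIN_SIZE = 100_000  # 100 kb bins, as stated in the methods


def bin_percentages(read_positions, bin_size=BIN_SIZE):
    """Percentage of all mapped reads that fall in each (chromosome, bin).

    read_positions: iterable of (chromosome, 1-based mapping position).
    """
    counts = Counter((chrom, (pos - 1) // bin_size) for chrom, pos in read_positions)
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


def relative_copy_number(sample_pct, reference_pct, baseline=2):
    """Divide each bin's percentage by the reference percentage and scale it
    so that the diploid background reads as `baseline` copies.

    reference_pct may hold a control individual's percentages or the per-bin
    mean across all individuals; bins with zero reference coverage are skipped.
    """
    return {
        key: baseline * sample_pct.get(key, 0.0) / ref
        for key, ref in reference_pct.items()
        if ref > 0
    }
```

The percentages are relative to each sample's total read count. Extra copies of one region therefore lower the values of all other bins somewhat, so the output is best read as relative dosage rather than as exact integers.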
Additionally, we obtained disomic Chr4 haploids (n + 1, a type of numerical aneuploidy that was not included in this analysis) as well as Chr2 or Chr3 monosomic diploids (2n − 1) at 4% frequency (Figure 2-figure supplement 2). These had not previously been described in Arabidopsis. A possible explanation is that, if they arose from meiotic defects, they would result from nullisomic gametes, which are not viable (Henry et al., 2009). Aneuploids resulting from mitotic failure are not subject to this constraint.
The second alteration type is defined by simple truncations, involving the repair of at most two double-stranded DNA breaks per chromosome (Figure 2C shows an example of a truncated Chr3). This truncated class occurred in 22% of the aneuploid population. In the third class, a single chromosome exhibited many oscillations in copy-number state, as if it had been shattered and subsequently rearranged (Figure 2D shows an example of a shattered Chr3). This shattered class occurred in 11% of the aneuploid population. Additionally, some aneuploids exhibited a combination of numerical, truncated and shattered chromosome types (Figure 2E). Copy-number alterations of Chr1, 2 and 3 occur at frequencies similar to the average copy-number alteration across all five chromosomes. Alterations of Chr4 and Chr5 are, respectively, over- and under-represented (Figure 2F). This may be explained by an uneven distribution, among chromosomes, of a few highly dosage-sensitive genes. According to this hypothesis, Chr4 would be selectively depleted of such genes.
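The three classes differ mainly in how often copy number changes along a chromosome. The sketch below is a hypothetical illustration, not the authors' criterion. It rounds per-bin relative coverage to integer states and counts state changes. One or two changes are treated as a truncation, matching "at most two double-stranded DNA breaks", and more changes as shattering. Real profiles are noisy and would need smoothing or segmentation before counting.

```python
# Hypothetical classification rule for illustration; names are not from the source.


def copy_states(relative_coverage):
    """Round per-bin relative coverage to integer copy-number states.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [round(x) for x in relative_coverage]


def count_transitions(states):
    """Number of adjacent bins whose copy-number state differs."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def classify_chromosome(relative_coverage, baseline=2):
    """Assign one chromosome's dosage profile to a class.

    no state change  -> 'euploid' (all at baseline) or 'numerical'
    one or two       -> 'truncated' (at most two breaks)
    more than two    -> 'shattered'
    """
    states = copy_states(relative_coverage)
    if not states:
        raise ValueError("empty coverage profile")
    n = count_transitions(states)
    if n == 0:
        return "euploid" if states[0] == baseline else "numerical"
    if n <= 2:
        return "truncated"
    return "shattered"
```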
Chromosomal truncations have been reported from a selfed trisomic (Huettel et al., 2008). To assess whether truncated and shattered aneuploid types could be produced from meiotic missegregation, we sequenced 96 individuals produced by a selfed Col-0 triploid. Because of the irregular meiosis, most gametes produced by triploids are aneuploid (Henry et al., 2007). Dosage analysis revealed that all were numerical aneuploids (Supplementary file 2 and Figure 2-figure supplement 3). To assess whether truncation and shattering could be the result of meiotic defects in the GFP-tailswap line, we sequenced 96 individuals from selfed GFP-tailswap and observed that 98% (n = 94) of the progeny were diploid while two individuals carried single primary trisomies of Chr2 and Chr3 respectively, representing only the numerical class of aneuploids (Supplementary file 3 and Figure 2-figure supplement 4). Based on these results, we believe that truncated and shattered aneuploid classes from our crosses reflect genomic instability associated with mitotic errors in the early embryo.
Shattered chromosomes can be recovered from all five A. thaliana chromosomes ( Figure 3A). In some cases, shattering appears to extend to two chromosomes (top panel of Figure 3A) only because the haploid inducer used carries a reciprocal Chr1/Chr4 translocation originating from the integration of GFP-tailswap T-DNAs. SNP analysis demonstrates that all duplicated (copy number 3) and triplicated (copy number 4) regions originated from the haploid inducer ( Figure 3B). Single-copy regions displaying loss of heterozygosity carry Ler alleles (i.e., wild-type), consistent with the loss of the haploid inducer haplotype.
Although aneuploids from the shattered class were often sterile, line FRAG00062 was partially fertile, which allowed us to investigate the inheritance and stability of the variant DNA. We sequenced 16 F2 progeny from FRAG00062. Two individuals displayed precisely the same shattered pattern as the F1 parent, and 14 appeared diploid (Figure 4A). Meiotic co-inheritance of all dosage-variant segments is consistent with a single, stable chromosomal unit that was formed after a catastrophe. To confirm this hypothesis, we used DNA fluorescence in situ hybridization to visualize the FRAG00062 chromosomes with Col-0-derived BAC painting probes specific for Chr1 and Chr4 (Figure 4B). Mitotic cells contained 11 chromosomes (Figure 4D). FRAG00062 came from a cross using GFP-tailswap line #11, which carried a reciprocal Chr1/4 translocation (Figure 2-figure supplement 1). This allowed us to distinguish three copies of Chr1: the haploid inducer Chr1, the Ler Chr1, and a third Chr1 with rearranged signals, which we interpret as the shattered extranumerary chromosome (Figure 4C).
Next, we sought to investigate why shattering is restricted to a single chromosome. During genome elimination crosses in other plant species, micronuclei are commonly observed (Subrahmanyam and Kasha, 1973; Gernand et al., 2005). We dissected embryos from a genome elimination cross and observed one to four micronuclei per cell (Figure 4E, F) in 81% of the embryos (n = 110). None were seen in embryos from control crosses (n = 21, p < 0.001). The presence of micronuclei suggests a sequence of events for lagging chromosomes sequestered in this way. They can be shattered by double-stranded DNA breaks, reassembled haphazardly by non-homologous end joining (NHEJ), and finally restituted into the main nucleus (Crasta et al., 2012).
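The statistical test behind p < 0.001 is not named here. Fisher's exact test is one standard choice for comparing a proportion between two small groups, and it is sketched below using only the Python standard library. The counts are an assumption. The value 89 of 110 embryos is reconstructed from the reported 81%, and 0 of 21 is used for the controls. The function name is hypothetical.

```python
# Sketch of a two-sided Fisher's exact test; the counts below are reconstructed
# from the reported percentage and are assumptions, not raw data.
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # The small relative tolerance keeps tied tables despite float rounding.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Rows: genome elimination cross, control cross; columns: with / without micronuclei.
cross_with, cross_without = 89, 110 - 89  # ~81% of 110 (assumed rounding)
control_with, control_without = 0, 21
p = fisher_exact_two_sided(cross_with, cross_without, control_with, control_without)
print(f"two-sided p = {p:.2e}")
```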
To reconstruct breakpoint junctions, we sequenced FRAG00062 to 100× coverage, extracted read pairs from the ends of duplicated and triplicated blocks, and performed de novo assembly. Thirty-eight junctions were assembled (Supplementary file 4). To demonstrate the accuracy of the de novo assembly, we tested a randomly selected subset of 12 junctions by PCR (Figure 5-figure supplement 1) followed by Sanger sequencing, and all 12 were confirmed. All reconstructed junctions were consistent with NHEJ. Each displayed one of three patterns: microhomology, observed as 2-15 bp of sequence overlap (Hastings et al., 2009); blunt fusion; or insertion of unidentified sequence (Figure 5B). We also observed inversions (fragments joined in head-to-head or tail-to-tail orientation) in 47% of the breakpoint junctions (Supplementary file 4). The size distributions of microhomology tracts and insertions are shown in Figure 5-figure supplement 2.
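Whether a junction shows microhomology, a blunt fusion or an insertion follows from where the two halves of an assembled contig align. A minimal sketch is given below. It assumes each half is described by its 1-based contig coordinates and its reference strand, for example as taken from BLASTN hits. The `Hit` class and `classify_junction` function are hypothetical.

```python
# Hypothetical junction classifier; not taken from the FRAG_project scripts.
from dataclasses import dataclass


@dataclass
class Hit:
    """One local alignment (e.g. a BLASTN HSP) of part of a junction contig."""
    qstart: int  # 1-based, inclusive start on the contig
    qend: int    # 1-based, inclusive end on the contig
    strand: str  # "+" or "-" relative to the reference


def classify_junction(left: Hit, right: Hit):
    """Return (kind, length, inverted) for the join between two contig halves."""
    if left.qstart > right.qstart:
        left, right = right, left  # order the halves along the contig
    gap = right.qstart - left.qend - 1
    if gap < 0:
        kind, length = "microhomology", -gap  # bases aligned by both halves
    elif gap == 0:
        kind, length = "blunt", 0
    else:
        kind, length = "insertion", gap  # bases matching neither half
    # Opposite strands mean a head-to-head or tail-to-tail join.
    inverted = left.strand != right.strand
    return kind, length, inverted
```

Overlapping contig coordinates mean the same bases align to both sides (microhomology). A gap between them means bases that match neither side (an insertion).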
Overall, triplicated blocks from FRAG00062 were significantly smaller than duplicated blocks (n = 23 in both cases, p < 0.001, Figure 5C). These triplications cannot easily be explained by the missegregation of a chromosome, so duplicated and triplicated blocks could have different origins. To address this question, we asked whether breakpoint junctions of the two copy-number states are differentially associated with various genomic and chromatin features:
- genes and repeated elements (Lamesch et al., 2012);
- DNA replication origins (Costas et al., 2011);
- DNase I hypersensitive sites (DHS) (Zhang et al., 2012);
- the nine non-overlapping chromatin states that partition the Arabidopsis genome (Sequeira-Mendes et al., 2014).
The results are given in Supplementary file 5. When analyzing 1000 bp windows centered on the breakpoints of duplicated blocks, we observed an enrichment in genic DNA (from the 53% background level to 70%, p < 0.01, Figure 5D,F). A subtler, but still significant, increase was observed with larger windows (10,000 bp; from the 53% background level to 62%, p < 0.01, Figure 5F). Consistently, 42% of breakpoint junctions from FRAG00062 are predicted to generate chimeric gene products (Supplementary file 4). In the same analysis, some genomic features differed in frequency between the breakpoint regions of duplicated and triplicated blocks. In particular, replication origins occupy less than 1% of the 10,000 bp windows around the borders of duplicated blocks. By contrast, they occupy almost 8% of those around the borders of triplicated blocks, compared to a genome average of 3.5% (p < 0.05, Figure 5E,G). Breakpoints flanking duplicated DNA are associated with genic DNA, and those flanking triplicated DNA with replication origins. This suggests that two distinct mechanisms contribute to restructuring of the same chromosome (Figure 6).
The first is chromothripsis, which acts through breakage and ligation (Stephens et al., 2011; Korbel and Campbell, 2013). The second is chromoanasynthesis, which acts via replication fork collapse and template switching (Hastings et al., 2009; Liu et al., 2011; Kloosterman and Cuppen, 2013).
Our in silico reconstruction suggests that NHEJ is involved in repairing breaks on the shattered chromosomes. To test this explanation, we created a haploid inducer carrying […]. Pollinating it with wild-type LIG4/LIG4 pollen (from Ler gl1-1) resulted in normal haploid induction frequencies. However, when mutant lig4-2/lig4-2 pollen was used, the frequency of haploids doubled at the expense of aneuploids and diploids (Table 1 and Figure 7). This effect was still observed when the seed parent carried the wild-type allele (Table 1). Parent-specific haploinsufficiency could result from early loss of the wild-type LIG4 allele located on the chromosome targeted for elimination, which in this case is the maternal chromosome. This result indicates that NHEJ contributes to the formation or persistence of aneuploid and diploid progeny. It also indicates that unrepaired double-stranded DNA breaks increase elimination of the haploid inducer genome, similar to observations in mouse-human hybrid genome elimination. We hypothesize that missegregated chromosomes enter a degradative pathway initiated by endonucleolytic breaks. Occasionally, such chromosomes are rescued (i.e., restituted to a haploid or diploid nucleus) through a pathway requiring NHEJ, resulting in aneuploidy. Therefore, more haploids are produced when the NHEJ pathway is impaired (Figure 6).

Discussion
Taken together, our results provide evidence that chromosome restructuring (Cai et al., 2014; Morrison et al., 2014) occurs when diverged individuals hybridize, and they identify a centromere-based mechanism for genomic instability. The phenomenon studied here depends on chimeric CENH3. However, a similar effect was observed when the haploid inducer strain expresses CENH3 of a close species (Maheshwari et al., 2015), indicating the effectiveness of both natural and artificial variation. The genesis and fate of restructured chromosomes are difficult to study in humans. In Arabidopsis, by contrast, their formation, effects, and even transmission are within experimental reach, as demonstrated by the enhancing effect of NHEJ mutants on haploid induction. Several observations suggest that catastrophic chromosomal restructuring could contribute to heritable genetic variation: the range of phenotypes, the formation of copy-number variants and of chimeric genes at junctions, and their occasional meiotic transmission.

Figure 6. The process of genome elimination and connected models for chromosomal rearrangements. Genome elimination ensues when a haploid inducer expressing a variant CENH3 protein mates with the wild type. In many cases, the chromosomes marked by the variant CENH3 missegregate in the embryo and are compartmentalized in micronuclei. DNA damage, NHEJ repair and restitution of the micronucleus to the euploid pole nucleus can result in aneuploidy or diploidy. Alternatively, shattered chromosomes result from chromothripsis and chromoanasynthesis. The former involves fragmentation and random ligation; the latter involves replication fork collapse and microhomology-mediated strand switching. As a consequence, the pulverized and reassembled chromosome forms a single unit and can be meiotically inherited. The schematics for chromothripsis and chromoanasynthesis are shown sequentially for convenience, but their order has not been determined. In addition, our results obtained using DNA ligase4-2 mutants suggest that the NHEJ pathway plays an important role in the repair of the haploid inducer chromosomes that contribute to diploid and aneuploid progeny. Consequently, when NHEJ is inhibited, haploid induction frequency increases. DOI: 10.7554/eLife.06516.017

Plant material and growth conditions
All plants were grown in Sunshine Professional Mix Peat-Lite Mix 4 (SunGro Horticulture, Agawam, MA) under a 16 h/8 h light/dark photoperiod in a growth room set at 21°C. F1 seeds from GFP-tailswap crosses were germinated on MS agar plates, and 2-week-old seedlings were transplanted into soil. The lig4-2 (SAIL_597_D10) line used is in the Col-0 background. The genotyping primers (5′ to 3′) were:
- lig4-2/LP2: GATATGACAAGCCTTGGCATGAATGT
- lig4-2/RP: AAAGTGGATGACATCTCGCTG
- LB1: GCCTTTTCAGAAATGGATAAATAGCCTTGCTTCC (left border of the SAIL T-DNA insertion)

Genomic DNA preparation, sequencing and read processing
All DNA samples were extracted from leaves using Nucleon Phytopure kits (GE Healthcare, Pittsburgh, PA). For each library, 1.5 μg of DNA was used for PCR-free library preparation with NEBNext DNA Library reagents and Nextflex-96 indexes (Bioo Scientific, Austin, TX). Two microliters of each of the 96 barcoded libraries were pooled and sequenced using the 50 bp protocol on a single lane of a HiSeq 2000 at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley. Demultiplexing was performed by the same facility. The resulting raw reads were processed with a custom Python script (Filter_N_Adapter_Trim_Batchmode.py, available from the GitHub repository: https://github.com/KorfLab/FRAG_project). The script removes reads flagged as filtered by Casava 1.8, removes adapter sequences, discards reads that contain Ns, and trims reads for quality.

Chromosome dosage analysis
For dosage plot analyses, 50 bp single reads were mapped to the TAIR10 A. thaliana reference genome sequence using BWA (Li and Durbin, 2009) with default parameters. Dosage variation was detected as previously described (Henry et al., 2010), and the procedure is described in detail at Bio-protocol (Tan et al., 2016). The reference chromosomes were partitioned into consecutive, non-overlapping 100,000 bp bins, and the percentage of reads mapping to each bin was recorded for each sample. Relative coverage was calculated by dividing the percentage obtained for each bin by one of two references: the corresponding mean percentage for all individuals, or the corresponding percentage for the control individual. The result was then scaled so that the diploid background corresponds to a copy number of 2.

SNP analysis
Positions polymorphic between Col-0 and Ler were identified with custom Python scripts, using sequencing reads from a diploid Col/Ler hybrid control, a Ler plant and a Col-0 plant. Specifically, candidate polymorphic positions were first identified if they met two conditions in the hybrid reads: coverage of at least 25 reads, and two alleles, each representing at least 40% of the allelic calls. Reads from the Col-0 and Ler parents were then used to assign alleles to the two parents. Positions were retained only if they were homozygous in both parents (one allele representing at least 97% of the allelic calls). They also had to be covered at least 6 times in the Col-0 library and at least once in the Ler library. This process identified 107,640 SNP positions (Supplementary file 6). Next, reads from each sample were mined for allele calls at these positions, and each read was assigned to one or the other parent based on the parental information. If a read matched neither allele, the genotype was reported as 'na'. Finally, genotype information was pooled into consecutive, non-overlapping 1 Mb bins to derive the percentage of Ler alleles per bin for each sample. By this measure, a Col-0/Ler diploid hybrid is expected to exhibit 50% Col-0 alleles across the genome.
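The site-filtering thresholds and the per-bin Ler percentage described above can be expressed compactly. This Python sketch makes two assumptions. First, allele counts per position are already available as dictionaries such as `{"A": 14, "G": 12}`. Second, read-level allele calls are given as (chromosome, position, base). The function names are hypothetical and do not come from the authors' scripts.

```python
# Illustrative sketch of the SNP filters and binning; names are hypothetical.
from collections import Counter


def major_allele(counts):
    """Return (allele, fraction of calls) for the most frequent allele."""
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    allele, n = max(counts.items(), key=lambda kv: kv[1])
    return allele, n / total


def informative_snp(hybrid, col, ler):
    """Apply the site filters to allele-count dicts.

    Returns (col_allele, ler_allele) if the position passes, else None.
    """
    depth = sum(hybrid.values())
    if depth < 25:  # hybrid coverage of at least 25
        return None
    common = {a for a, n in hybrid.items() if n / depth >= 0.40}
    if len(common) != 2:  # exactly two alleles, each >= 40% of calls
        return None
    if sum(col.values()) < 6 or sum(ler.values()) < 1:  # parental coverage
        return None
    col_allele, col_frac = major_allele(col)
    ler_allele, ler_frac = major_allele(ler)
    if col_frac < 0.97 or ler_frac < 0.97:  # both parents homozygous
        return None
    if {col_allele, ler_allele} != common:  # parents carry the two hybrid alleles
        return None
    return col_allele, ler_allele


def percent_ler_by_bin(read_calls, snps, bin_size=1_000_000):
    """Percentage of Ler calls per (chromosome, 1 Mb bin).

    snps maps (chromosome, position) -> (col_allele, ler_allele).
    Calls matching neither parent ('na' in the methods) are ignored.
    """
    col_n, ler_n = Counter(), Counter()
    for chrom, pos, base in read_calls:
        alleles = snps.get((chrom, pos))
        if alleles is None:
            continue
        key = (chrom, (pos - 1) // bin_size)
        if base == alleles[0]:
            col_n[key] += 1
        elif base == alleles[1]:
            ler_n[key] += 1
    return {k: 100.0 * ler_n[k] / (col_n[k] + ler_n[k]) for k in set(col_n) | set(ler_n)}
```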

Cytogenetic analysis
All analyses were carried out using chromosome spreads from young anthers. BAC contigs specific for A. thaliana chromosomes 1 and 4 were used as painting probes. BAC DNA was labeled with biotin-, digoxigenin- or Cy3-deoxyuridine triphosphate by nick translation as previously described (Lysak and Mandáková, 2013). Labeled DNA probes were pooled, hybridized to suitable chromosome spreads and visualized by fluorescence microscopy. See Supplementary file 7 for the list of BAC clones used as painting probes.

Breakpoint assembly
Breakpoints from FRAG00062 were identified using a high-resolution dosage plot with 500 bp bins. The plot was produced from 50 bp reads extracted from 100 bp paired-end sequencing reads of the FRAG00062 library, obtained on an Illumina HiSeq 2000 instrument. Blocks of duplicated or triplicated dosage were defined by visual inspection. A custom script (batch-specific-junction-search.py, available from the GitHub repository: https://github.com/KorfLab/FRAG_project) was used to extract the sequencing reads mapping within a 2000 bp region around each breakpoint. These sequences were then assembled with the PRICE genome assembler using the standard paired-end assembly setting (Ruby et al., 2013). Resulting contigs were aligned to the Arabidopsis reference genome with NCBI BLASTN. A breakpoint junction was identified when the two halves of a contig mapped discordantly to the reference genome. Primers flanking 12 randomly selected breakpoint junctions were designed with Primer3 (Li and Durbin, 2009) based on their respective de novo assembled contigs. Standard PCR procedures were used for amplification, with oligo pairs (Supplementary file 8) and GoTaq Green Mastermix (Promega Corporation, Madison, WI) on 1 ng DNA from FRAG00062 and FRAG00080 (a diploid sibling control), followed by Sanger sequencing.
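Collecting reads around each breakpoint is an interval query on sorted alignment positions. The sketch below uses binary search from Python's `bisect` module. It assumes alignments are given as (read_id, chromosome, start) tuples and selects reads by start position only. Mates, and reads that start just outside the window but extend into it, are not handled. The function name is hypothetical, and this is not `batch-specific-junction-search.py`.

```python
# Hypothetical read-selection sketch; the authors' script may work differently.
from bisect import bisect_left, bisect_right


def reads_near_breakpoints(alignments, breakpoints, window=2000):
    """Map each (chromosome, position) breakpoint to the IDs of reads whose
    alignment start lies within a window centred on it.

    alignments: iterable of (read_id, chromosome, 1-based start).
    """
    index = {}
    for rid, chrom, start in alignments:
        index.setdefault(chrom, []).append((start, rid))
    for hits in index.values():
        hits.sort()  # sort by start so binary search is valid
    starts = {chrom: [s for s, _ in hits] for chrom, hits in index.items()}

    half = window // 2
    selected = {}
    for chrom, bp in breakpoints:
        hits = index.get(chrom, [])
        pos = starts.get(chrom, [])
        lo = bisect_left(pos, bp - half)   # first start >= bp - half
        hi = bisect_right(pos, bp + half)  # one past last start <= bp + half
        selected[(chrom, bp)] = [rid for _, rid in hits[lo:hi]]
    return selected
```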

Breakpoint analysis
The A. thaliana TAIR10 genome annotation includes genomic locations for various features in Generic Feature Format Version 3 (GFF3). Files specifying genes, transposons, satellite repeats, and replication origins were downloaded from the TAIR FTP site (ftp://ftp.arabidopsis.org//Maps/gbrowse_data/TAIR10/). The GFF file containing the locations of mapped replication origins was available from a study by Costas et al. (2011). These GFF files were combined with two further datasets: mapped DHS (Zhang et al., 2012), and the nine chromatin states defined by Sequeira-Mendes et al. (2014), who combined various published epigenomic studies to partition the entire genome. Perl scripts were used to convert the DHS and chromatin state information into GFF format. These scripts, along with the resulting combined GFF file, are available from the GitHub repository: https://github.com/KorfLab/FRAG_project.
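GFF3 records consist of nine tab-separated columns: seqid, source, type, start, end, score, strand, phase and attributes. Coordinates are 1-based and inclusive. The authors used Perl scripts. The Python sketch below is a hypothetical, minimal parser that collects intervals per feature type and chromosome and merges overlaps, so that bases are not double-counted when summing coverage.

```python
# Hypothetical GFF3 reader for illustration; not the authors' Perl scripts.


def parse_gff3(lines, feature_types=None):
    """Yield (seqid, type, start, end) for GFF3 feature lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence, not features
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines
        seqid, _source, ftype, start, end = fields[:5]
        if feature_types is not None and ftype not in feature_types:
            continue
        yield seqid, ftype, int(start), int(end)


def feature_intervals(lines, feature_types=None):
    """Return {(type, seqid): [merged (start, end) intervals]}."""
    grouped = {}
    for seqid, ftype, start, end in parse_gff3(lines, feature_types):
        grouped.setdefault((ftype, seqid), []).append((start, end))
    merged = {}
    for key, ivs in grouped.items():
        ivs.sort()
        out = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= out[-1][1] + 1:  # overlapping or adjacent: extend
                out[-1][1] = max(out[-1][1], e)
            else:
                out.append([s, e])
        merged[key] = [tuple(iv) for iv in out]
    return merged
```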
The set of genome features in the combined GFF file was compared to the annotated set of duplicated and triplicated blocks. Various Perl scripts available from the above GitHub repository, along with a GFF representation of all blocks, were used to assess the enrichment of genomic features at the breakpoint regions of duplicated and triplicated blocks. Specifically, windows of either 1000 or 10,000 bp were centered on each breakpoint coordinate, and the number of bp contributed by each feature of interest was summed across all windows. We also calculated the number of bp contributed by each feature outside those windows. Enrichment ratios were then calculated by dividing the percentage of bases occupied by each feature across all breakpoint windows by the percentage of bases occupied by the same feature in the remainder of the aneuploid chromosome. P-values were determined by shuffling experiments in which the locations of the breakpoints were randomized 1000 times, and the resulting shuffled ratios were compared to the ratios observed in the real data (Supplementary file 5).
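The window-based enrichment ratio and the shuffling test can be sketched for a single chromosome as follows. Several details are assumptions, not taken from the source:
- features are pre-merged, 1-based intervals;
- shuffled breakpoints are drawn uniformly along the chromosome;
- the empirical p-value is one-sided with a +1 correction.

All function names are hypothetical.

```python
# Hypothetical single-chromosome sketch of the enrichment and shuffle test.
import random


def merge(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    out = []
    for s, e in sorted(intervals):
        if out and s <= out[-1][1] + 1:
            out[-1][1] = max(out[-1][1], e)
        else:
            out.append([s, e])
    return [tuple(iv) for iv in out]


def covered(features, start, end):
    """Feature bases inside [start, end]; features must be merged and sorted."""
    total = 0
    for s, e in features:
        if s > end:
            break
        total += max(0, min(e, end) - max(s, start) + 1)
    return total


def enrichment_ratio(breakpoints, features, chrom_len, window=1000):
    """Feature fraction in breakpoint windows divided by the fraction elsewhere."""
    half = window // 2
    windows = [(max(1, b - half), min(chrom_len, b + half - 1)) for b in breakpoints]
    # Summed per window, so bases shared by overlapping windows count twice.
    in_feat = sum(covered(features, s, e) for s, e in windows)
    in_len = sum(e - s + 1 for s, e in windows)
    union = merge(windows)  # "outside" means outside every window
    total_feat = sum(e - s + 1 for s, e in features)
    out_feat = total_feat - sum(covered(features, s, e) for s, e in union)
    out_len = chrom_len - sum(e - s + 1 for s, e in union)
    if out_feat == 0:
        return float("inf")
    return (in_feat / in_len) / (out_feat / out_len)


def shuffle_p_value(breakpoints, features, chrom_len, window=1000, n_iter=1000, seed=0):
    """Observed ratio and one-sided empirical p-value from random breakpoints."""
    rng = random.Random(seed)
    observed = enrichment_ratio(breakpoints, features, chrom_len, window)
    hits = sum(
        enrichment_ratio([rng.randint(1, chrom_len) for _ in breakpoints],
                         features, chrom_len, window) >= observed
        for _ in range(n_iter)
    )
    return observed, (hits + 1) / (n_iter + 1)
```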