Somatic LINE-1 retrotransposition in cortical neurons of Rett patients and healthy individuals

Mounting evidence supports that LINE-1 (L1) retrotransposition can occur postzygotically in healthy and diseased human tissues, contributing to genomic mosaicism in the brain and other somatic tissues of an individual. However, the genomic distribution of somatic L1Hs (human-specific LINE-1) insertions and their potential impact on carrier cells remain unclear. Here, using a PCR-based targeted bulk sequencing approach, we profiled 9,181 somatic insertions from 20 postmortem tissues from five Rett patients and their matched healthy controls. We identified and validated somatic L1Hs insertions in both cortical neurons and non-brain tissues. In Rett patients, somatic insertions were significantly depleted in exons relative to healthy controls, a depletion mainly contributed by long genes, implying that cells carrying MECP2 mutations might be defenseless against a second exonic L1Hs insertion. We observed a significant increase of somatic L1Hs insertions in the brain compared with non-brain tissues from the same individual. Compared with germline insertions, somatic insertions were less sense-depleted with respect to transcripts, indicating that they underwent weaker selective pressure on the orientation of insertion. Our observations demonstrate that somatic L1Hs insertions contribute to genomic diversity and that MECP2 dysfunction alters their genomic patterns in Rett patients.

Author Summary

Human-specific LINE-1 (L1Hs) is the most active autonomous retrotransposon family in the human genome. Mounting evidence supports that L1Hs retrotransposition occurs postzygotically in human brain cells, contributing to neuronal genomic diversity, but the extent of L1Hs-driven mosaicism in the brain is debated. In this study, we profiled genome-wide L1Hs insertions among 20 postmortem tissues from Rett patients and matched controls. We identified and validated somatic L1Hs insertions in both cortical neurons and non-brain tissues, with a higher jumping activity in the brain. We further found that MECP2 dysfunction might alter the genomic pattern of somatic L1Hs insertions in Rett patients.


The term "somatic mosaicism" describes the genomic variations that occur in the somatic cells that make up the body of an individual. These variations contribute to intra-individual genetic diversity among different cells (Campbell et al., 2015). In addition to various types of cancers, somatic mosaicisms reportedly contribute to a variety of neurological disorders, including epilepsy, neurodegeneration, and hemimegalencephaly (Poduri et al., 2013). The human-specific LINE-1 (L1Hs) retrotransposon family is the only known family of active autonomous transposons in the human genome (Hancks and Kazazian, 2012; Kazazian and Moran, 2017). L1s retrotranspose through a process called target-primed reverse transcription (TPRT), with the capacity for de novo insertion into new genomic locations in both germline and somatic cells (Cost et al., 2002; Luan et al., 1993). Mounting evidence supports that L1Hs elements, with increased copy number in the brain relative to

(Table 1): 1) read pairs with non-specific amplification signals and incorrect 3' truncation were removed based on the sequence of L1Hs 3' end (Read 2); 2) after merging paired-end reads into contigs, chimeric molecules with abnormal contig structures were identified by BLAST and filtered out; 3) reads with inconsistencies in BWA-MEM and BLAT alignments were defined as mapping errors; and 4) putative somatic insertion signals without multiple PCR duplicates or those present in different individuals were removed, as they were deemed likely to have resulted from sequencing errors. After applying these error filters, the remaining insertions were annotated with peak features to facilitate downstream analysis.

Improper alignment
We rejected reads with an alignment shorter than 30 bp or with more than 3 mismatches to the reference genome.

Chimera within L1 segment
We rejected reads with less than 95% identity (> 4 mismatches) to the L1Hs 3' end consensus sequence.

Chimera within poly-A tail
We rejected reads at risk of being chimeric (Upton et al., 2015). Each read was re-aligned to hg19 using BLAST to find the best alignments for the non-retrotransposon and retrotransposon segments. A read was removed as a putative chimera when the overlap of the two best segments was > 10 bp with A% ≥ 50%, or 6-10 bp with A% < 50%.

Subfamily filter
We rejected putative somatic insertion sites that overlapped with reference insertions of young L1 subfamilies (L1Hs and L1PA2-4).

Known non-reference filter
We rejected putative somatic insertion sites that overlapped with known non-reference L1 insertions in eul1db (Mir et al., 2015).

Misaligned reads
We rejected reads at risk of being misaligned, defined as inconsistent BWA and BLAT alignment.

Local SV
We rejected reads at risk of being derived from a nearby reference L1Hs (Upton et al., 2015). We extracted 2 kb from the reference genome extending downstream from an aligned non-retrotransposon section and aligned the full read contig against this region with BLAT to exclude genomic rearrangements.

Observed in common
We rejected putative somatic insertion sites observed in two or more individuals.

PCR duplicate
We rejected somatic insertion sites without supporting PCR duplicates.
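
The thresholds listed in the filters above can be written as simple predicates. The following is a minimal sketch for illustration only, not the pipeline used in the study: the `ReadEvidence` fields and all function names are hypothetical, A% is assumed to be measured within the overlap of the two best BLAST segments, and the minimum number of supporting PCR duplicates (2) is an assumption based on the phrase "multiple PCR duplicates".

```python
from dataclasses import dataclass


@dataclass
class ReadEvidence:
    """Per-read summary values; every field name here is hypothetical."""
    aligned_bp: int        # length of the alignment to the reference genome
    mismatches: int        # mismatches to the reference genome
    l1_mismatches: int     # mismatches to the L1Hs 3' end consensus
    overlap_bp: int        # overlap between the two best BLAST segments
    overlap_a_pct: float   # percentage of A bases (assumed: within that overlap)


def improper_alignment(r: ReadEvidence) -> bool:
    # Reject alignments shorter than 30 bp or with more than 3 mismatches.
    return r.aligned_bp < 30 or r.mismatches > 3


def chimera_within_l1(r: ReadEvidence) -> bool:
    # Reject reads with more than 4 mismatches to the L1Hs 3' end consensus.
    return r.l1_mismatches > 4


def chimera_within_polya(r: ReadEvidence) -> bool:
    # Putative chimera: overlap > 10 bp with A% >= 50, or 6-10 bp with A% < 50.
    long_a_rich = r.overlap_bp > 10 and r.overlap_a_pct >= 50
    short_a_poor = 6 <= r.overlap_bp <= 10 and r.overlap_a_pct < 50
    return long_a_rich or short_a_poor


def passes_read_filters(r: ReadEvidence) -> bool:
    # A read is kept only if none of the read-level filters rejects it.
    return not (improper_alignment(r) or chimera_within_l1(r)
                or chimera_within_polya(r))


def passes_site_filters(n_individuals: int, n_pcr_duplicates: int,
                        min_pcr_duplicates: int = 2) -> bool:
    # Reject sites seen in two or more individuals, and sites lacking
    # supporting PCR duplicates (the threshold of 2 is an assumption).
    return n_individuals < 2 and n_pcr_duplicates >= min_pcr_duplicates
```

Keeping each filter as a separate predicate makes it easy to count how many reads each one removes, which is how filter tables of this kind are usually reported.
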
Performance evaluation of the HAT-seq method using a positive control

To benchmark the performance of HAT-seq for detecting somatic L1Hs insertions, we experimentally

input, HAT-seq was able to detect somatic insertion events present in a single cell (Fig 2D and Appendix 3).

Table). These results showed that HAT-seq performed in combination with our error filters could successfully remove most artifacts and identify very low-frequency somatic insertions in bulk DNA samples.

Profiling of somatic L1Hs insertions in brain and non-brain human tissues

Next, we applied HAT-seq to 20 bulk samples obtained from postmortem neuronal (PFC neurons) and non-neuronal tissues (heart, eye, or fibroblast) from five Rett syndrome patients and five neurologically normal age-, gender-, and race-matched controls (Table 2 and S4-S7 Table). A total of 9,181 putative somatic L1Hs insertions were identified in these 20 HAT-seq libraries (S8 Table).

Owing to the rarity of each somatic insertion in the cell population and to the sensitivity limits of various analytical methods, experimental validation of somatic insertions using unamplified bulk DNA is very challenging, in particular when one of the primers is complementary to numerous homologous sequences in the human genome (Appendix 4). In theory, if a somatic insertion was unique to a single cell, it would be impossible to detect it in any replicate gDNA sample extracted from the same tissue.

To circumvent this, we performed single-copy cloning by adapting a modified version of digital nested

H and S10 Table). Four of these clonal somatic insertions were located in introns of TGM6, CNTN4, DIP2C, and DGKB; three were sense-oriented to transcripts.

Table). We confirmed this insertion was a full-length somatic L1Hs insertion with a 14 bp TSD and a cleavage site at 5'-TT/AAAG-3', similar to the consensus L1 EN motif 5'-TT/AAAA-3' (Fig 4B-D). Notably, we also validated this 5' junction by combining full-length PCR with 5' junction PCR (

In addition, we verified one fibroblast- and another heart-specific L1Hs insertion in two patients with Rett syndrome (Fig 3E-F). The heart-specific L1Hs insertion in the Rett patient (UMB#1420) was further resolved to be a highly 5' truncated L1Hs insertion (~800 bp) with a 9 bp TSD and a cleavage

Table). The poly-A tails of these two clonal somatic insertions were experimentally measured to be polymorphic, indicating that they may involve multiple mutations after the original somatic retrotransposition events (Fig 3I and S10 Table). As previously reported (Evrony et al., 2015; Grandi et al., 2013), poly-A tail was shown to be a highly mutable sequence

Table). We further quantified the allele fractions of this insertion using a custom droplet digital PCR (ddPCR) assay and found that 6.34% of fibroblasts and 2.87% of PFC neurons contained this L1Hs

Table). Our observations demonstrated that endogenous L1Hs could retrotranspose in various types of non-brain tissues during human development.

Table). Our speculation was that if an L1Hs inserted into the exonic regions, especially in important genes, of a MECP2-mutated cell, the cell would have a higher risk of death and subsequently be cleared; therefore, the observed exonic depletion of L1 insertions in Rett patients might result from negative selection acting on those "lethal" exonic insertions.
in all tissues from the same donor, we used germline insertions as an endogenous control to measure the relative copy number of genome-wide somatic insertions in the brain and non-brain tissues. We quantified the relative somatic L1Hs content by calculating the L1Hs-derived read count ratio of somatic to germline insertions using HAT-seq data of each sample (S14 Table; Table).
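
The normalization described above amounts to dividing the somatic read count by the germline read count within each sample. The sketch below illustrates that ratio under stated assumptions: the function name, the sample labels, and all read counts are hypothetical placeholders and are not values from the study.

```python
def relative_somatic_content(somatic_reads: int, germline_reads: int) -> float:
    """Ratio of L1Hs-derived reads at somatic vs. germline insertions.

    Germline insertions are present in every cell of a donor, so their read
    count serves as an internal control for library size and enrichment.
    """
    if germline_reads <= 0:
        raise ValueError("germline read count must be positive")
    return somatic_reads / germline_reads


# Hypothetical read counts (somatic, germline) for two tissues of one donor.
samples = {
    "brain": (120, 60_000),
    "non_brain": (45, 60_000),
}

ratios = {
    tissue: relative_somatic_content(somatic, germline)
    for tissue, (somatic, germline) in samples.items()
}
```

Because both tissues come from the same donor and share the same germline insertions, the two ratios can be compared directly.
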

We next characterized the genome-wide germline L1Hs insertions. HAT-seq yielded greater than 320-fold enrichment for KR, KNR, and UNK L1Hs insertions (S15 Table). On average, 814 KRs, 183 KNRs, and 10 UNKs were identified in each bulk sample (Table 2, S5-7 Table). Hierarchical clustering based on L1Hs profiles correctly paired all neuronal samples with the non-neuronal tissue samples of the same individual (Fig 6D). To experimentally validate the HAT-seq predicted germline insertions, we performed 3' PCR validation on a random subset of polymorphic insertions from among the ten individuals, including 8 sites out of 160 polymorphic KRs, 20 sites out of 451 KNRs, and 2 sites out of 48 UNKs (S7 and S16 Table). As a result, all of the assayed sites were detected in 3' PCR, with 98.4% (120/122) and 100% (168/168) sensitivity and specificity, respectively (S16

Here, we present HAT-seq, a bulk DNA sequencing method to profile genome-wide L1Hs insertions

Table).
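
The validation rates reported for the 3' PCR assay (120/122 and 168/168) follow the standard definitions of sensitivity and specificity. The short sketch below reproduces that arithmetic; the split into true/false positives and negatives (120 TP, 2 FN, 168 TN, 0 FP) is an assumption inferred from the reported fractions, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    # Fraction of truly present insertions that were detected.
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    # Fraction of truly absent insertions that were called absent.
    return true_neg / (true_neg + false_pos)


# Counts assumed from the reported fractions 120/122 and 168/168.
sens_pct = round(100 * sensitivity(120, 2), 1)
spec_pct = round(100 * specificity(168, 0), 1)
```
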

Clonally distributed insertions are prevalent in normal brain (Evrony et al., 2015). Increasing evidence

such single-cell approaches cannot achieve increased sensitivity without cost (Evrony et al., 2016).