Genomic DNA transposition induced by human PGBD5

Transposons are mobile genetic elements that are found in nearly all organisms, including humans. Mobilization of DNA transposons by transposase enzymes can cause genomic rearrangements, but our knowledge of human genes derived from transposases is limited. In this study, we find that the protein encoded by human PGBD5, the most evolutionarily conserved transposable element-derived gene in vertebrates, can induce stereotypical cut-and-paste DNA transposition in human cells. Genomic integration activity of PGBD5 requires distinct aspartic acid residues in its transposase domain, and specific DNA sequences containing inverted terminal repeats with similarity to piggyBac transposons. DNA transposition catalyzed by PGBD5 in human cells occurs genome-wide, with precise transposon excision and preference for insertion at TTAA sites. The apparent conservation of DNA transposition activity by PGBD5 suggests that genomic remodeling contributes to its biological function. DOI: http://dx.doi.org/10.7554/eLife.10565.001


Introduction
Transposons are genetic elements that are found in nearly all living organisms (1). They can contribute to the developmental and adaptive regulation of gene expression and are a major source of genetic variation that drives genome evolution (2). In humans and other mammals, they comprise about half of the nuclear genome (3). The majority of primate-specific sequences that regulate gene expression are derived from transposons (4), and transposons are a major source of structural genetic variation in human populations (5).
While the majority of genes that encode transposase enzymes tend to become catalytically inactive and their transposon substrates tend to become immobile in the course of organismal evolution, some can maintain their transposition activities (6,7). In humans, at least one hundred L1 long interspersed repeated sequences (LINEs) actively transpose in human genomes and induce structural variation (8), including somatic rearrangements in neurons that may contribute to neuronal plasticity (9). The human Transib-like transposase RAG1 catalyzes somatic recombination of the V(D)J receptor genes in lymphocytes, and is essential for adaptive immunity (10). The Mariner-derived transposase SETMAR functions in single-stranded DNA resection during DNA repair and replication in human cells (11).
Among transposase enzymes that can catalyze excision and insertion of transposon sequences, DNA transposases are distinct in their dependence only on the availability of competent genomic substrates and cellular repair enzymes that ligate and repair excision sites, as compared to retrotransposons, which require transcription of the mobilized sequences (12). Most DNA transposases utilize an RNase H-like domain with three aspartate or glutamate residues (so-called DDD or DDE motif) that catalyze magnesium-dependent hydrolysis of phosphodiester bonds and strand exchange (13-15). The IS4 transposase family, which includes piggyBac transposases, is additionally distinguished by precise excisions without modifications of the transposon flanking sequences (16). The piggyBac transposase and its transposon were originally identified as an insertion in lepidopteran Trichoplusia ni cells (17). The piggyBac transposon consists of 13-bp inverted terminal repeats (ITR) and 19-bp subterminal inverted repeats located 3 and 31 base pairs from the 5' and 3' ITRs, respectively (18). PiggyBac transposase can mobilize a variety of ITR-flanked sequences and has a preference for integration at TTAA target sites in the host genome (15, 18-23).
Members of the piggyBac superfamily of transposons have colonized a wide range of organisms (24), including a recent and likely ongoing invasion of the bat M. lucifugus (25). The human genome contains 5 paralogous genes derived from piggyBac transposases, PGBD1-5 (24, 26). PGBD1 and PGBD2 invaded the common mammalian ancestor, and PGBD3 and PGBD4 are restricted to primates, but all are contained as single coding exons, fused in frame with endogenous host genes, such as the Cockayne Syndrome B gene (CSB)-PGBD3 fusion (24, 27). Thus far, only the function of PGBD3 has been investigated in some detail. CSB-PGBD3 is capable of binding DNA, including endogenous piggyBac-like transposons in the human genome, but has no known catalytic activity, though biochemical and genetic evidence indicates that it may participate in the DNA damage response (28, 29). PGBD5 is distinct from other human piggyBac-derived genes in having been domesticated much earlier in vertebrate evolution, approximately 500 million years (My) ago, in the common ancestor of cephalochordates and vertebrates (24, 30). PGBD5 is transcribed as a multi-intronic but nonchimeric transcript predicted to encode a full-length transposase (30). Furthermore, PGBD5 expression in both human and mouse appears largely restricted to the early embryo and certain areas of the embryonic and adult brain (24, 30). These intriguing features prompted us to investigate whether human PGBD5 has retained the enzymatic capability of mobilizing DNA.

Results
Human PGBD5 contains a C-terminal RNase H-like domain that has approximately 20% sequence identity and 45% similarity to the active lepidopteran piggyBac, ciliate piggyMac, and mammalian piggyBat transposases (Fig. 1A and S1) (24, 25, 31). We reasoned that even though the ancestral transposon substrates of PGBD5 cannot be predicted due to its very ancient evolutionary origin (~500 My), preservation of its transposase activities should confer residual ability to mobilize distantly related piggyBac-like transposons. To test this hypothesis, we used a synthetic transposon reporter, PB-EF1-NEO, comprising a neomycin resistance gene flanked by T. ni piggyBac ITRs (Fig. 1B) (20, 32). We transiently transfected human embryonic kidney (HEK) 293 cells, which lack endogenous PGBD5 expression, with the PB-EF1-NEO transposon reporter plasmid in the presence of a plasmid expressing PGBD5, and assessed genomic integration of the reporter using clonogenic assays in the presence of G418 to select cells with genomic integration conferring neomycin resistance (Fig. 1C, Fig. S2). Given the absence of suitable antibodies to monitor PGBD5 expression, we expressed PGBD5 as an N-terminal fusion with the green fluorescent protein (GFP). We observed significant rates of neomycin resistance conferred by the transposon reporter in cells expressing GFP-PGBD5, but not in cells expressing control GFP or mutant GFP-PGBD5 lacking the transposase domain (Fig. 1C), despite equal expression of all transgenes (Fig. S3). The efficiency of neomycin resistance conferred by the transposon reporter with GFP-PGBD5 was approximately 4.5-fold less than that of the T. ni piggyBac-derived transposase (Fig. 1D), consistent with their evolutionary divergence. These results suggest that human PGBD5 can promote genomic integration of a piggyBac-like transposon.
If neomycin resistance conferred by the PGBD5 and the transposon reporter is due to genomic integration and DNA transposition, then this should require specific activity on the transposon ITRs. To test this hypothesis, we generated transposon reporters with mutant ITRs and assayed them for genomic integration (Fig. 1B & S4). DNA transposition by the piggyBac family transposases involves hairpin intermediates with a conserved 5'-GGGTTAACCC-3' sequence that is required for target site phosphodiester hydrolysis (15). Thus, we generated reporter plasmids lacking ITRs entirely or containing complete ITRs with 5'-ATATTAACCC-3' mutations predicted to disrupt the formation of productive hairpin intermediates (15). To enable precise quantitation of mobilization activity, we developed a quantitative genomic PCR assay using primers specific for the transposon reporter and the endogenous human TK1 gene for normalization (Fig. S5, S6). In agreement with the results of the clonogenic neomycin resistance assays, we observed efficient genomic integration of the donor transposons in cells transfected by GFP-PGBD5 as compared to the minimal signal observed in cells expressing GFP control (Fig. 1E). Deletion of transposon ITRs from the reporter reduced genomic integration to background levels (Fig. 1E). Consistent with the specific function of piggyBac family ITRs in genomic transposition, mutation of the terminal GGG sequence in the ITR significantly reduced the integration efficiency (Fig. 1E). These results indicate that specific transposon ITR sequences are required for PGBD5-mediated DNA transposition.
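The normalization of the transposon reporter signal to the endogenous TK1 gene can be illustrated with a short calculation. The following is a minimal sketch, assuming the common comparative Ct (2^-ΔΔCt) approach with a fixed amplification efficiency; the text does not state which formula was used, and the function name and the Ct values in the usage note are hypothetical.

```python
def relative_integration(ct_transposon, ct_tk1,
                         ct_transposon_control, ct_tk1_control,
                         efficiency=2.0):
    """Fold change of transposon signal relative to a control sample.

    Assumption (not stated in the text): comparative Ct method, with the
    transposon amplicon normalized to the endogenous TK1 amplicon and both
    amplicons amplifying with the same per-cycle efficiency (2.0 = doubling).
    """
    # Normalize each sample to its own TK1 signal (delta Ct).
    delta_sample = ct_transposon - ct_tk1
    delta_control = ct_transposon_control - ct_tk1_control
    # Compare the sample to the control (delta-delta Ct) and convert to fold change.
    delta_delta = delta_sample - delta_control
    return efficiency ** (-delta_delta)
```

For example, with hypothetical Ct values of 24 (transposon) and 20 (TK1) in the GFP-PGBD5 sample, and 28 and 20 in the GFP control, `relative_integration(24.0, 20.0, 28.0, 20.0)` gives a 16-fold higher normalized transposon signal in the GFP-PGBD5 sample.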
DNA transposition by piggyBac superfamily transposases is distinguished from most other DNA transposon superfamilies by the precise excision of the transposon from the donor site and preference for insertion in TTAA sites (20, 32). To determine the structure of the donor sites of transposon reporters mobilized by PGBD5, we isolated plasmid DNA from cells two days after transfection, amplified the transposon reporter using PCR, and determined its sequence using capillary Sanger sequencing (Fig. S7). Similar to the hyperactive T. ni piggyBac, cells expressing GFP-PGBD5, but not those expressing GFP control vector, exhibited robust excision of ITR-flanked transposon with apparently precise repair of the donor plasmid (Fig. 2A, 2B & S7). These results suggest that PGBD5 is an active cut-and-paste DNA transposase.
To validate chromosomal integration and determine the location and precise structure of the insertion of the reporter transposons in the human genome, we isolated genomic DNA from G418-resistant HEK293 cells following transfection with PGBD5 and PB-EF1-NEO, and amplified the genomic sites of transposon insertions using flanking-sequence exponential anchored (FLEA) PCR, a technique originally developed for high-efficiency analysis of retroviral integrations (33). We adapted FLEA-PCR for the analysis of genomic DNA transposition by using unique reporter sequence to prime polymerase extension upstream of the transposon ITR into the flanking human genome, followed by reverse linear extension using degenerate primers, and exponential amplification using specific nested primers to generate chimeric amplicons suitable for massively parallel single-molecule Illumina DNA sequencing.
To infer the mechanism of genomic integration of transposon reporters, we analyzed the sequences of the insertion loci to determine integration preferences at base pair resolution and identify potential sequence preferences. We found that transposon amplicons isolated from cells expressing GFP-PGBD5, but not those isolated from GFP control cells, were significantly enriched for TTAA sequences, as determined by sequence entropy analysis (35) (Fig. 2C). To discriminate between potential DNA transposition at TTAA target sites and alternative mechanisms of chromosomal integration, we classified genomic insertions based on target sites containing TTAA and those containing other sequence motifs (Table 1).
Like most DNA transposases, piggyBac superfamily transposases are thought to utilize a triad of aspartate or glutamate residues to catalyze phosphodiester bond hydrolysis, but the catalytic triad of aspartates previously proposed for T. ni piggyBac is apparently not conserved in the primary sequence of PGBD5 (Fig. S1) (14, 15, 24, 36).
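The insertion-site analysis described above, which scores sequence preference at each position and sorts insertions into TTAA and non-TTAA target sites, can be sketched as follows. This is a minimal illustration, assuming a generic per-position Shannon entropy over aligned, equal-length target-site sequences; the exact entropy method of reference 35 is not described in this section, and the function names and example sequences are hypothetical.

```python
import math
from collections import Counter


def position_entropy(sequences):
    """Shannon entropy (in bits) at each position of aligned sequences.

    Low entropy at a position indicates a strong base preference there;
    2 bits is the maximum for the four DNA bases.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sequences)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(abs(h))  # abs() avoids reporting -0.0
    return entropies


def classify_target_sites(target_sites, motif="TTAA"):
    """Count insertions whose target site matches the motif versus all others."""
    tally = {motif: 0, "other": 0}
    for site in target_sites:
        key = motif if site.upper() == motif else "other"
        tally[key] += 1
    return tally
```

With hypothetical target sites `["TTAA", "TTAA", "GCAT"]`, `classify_target_sites` returns two TTAA insertions and one other, and positions that are identical across all sequences receive an entropy of zero.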
Thus, we hypothesized that distinct aspartic or glutamic acid residues may be required for DNA transposition mediated by PGBD5. To test this hypothesis, we used alanine scanning mutagenesis and assessed transposition activity of GFP-PGBD5 mutants using quantitative genomic PCR (Fig. 3 & S10). This analysis indicated that simultaneous alanine mutations of D168, D194, and D386 reduced apparent transposition activity to background levels, similar to that of GFP control (Fig. 3). We confirmed that the mutant GFP-PGBD5 proteins have equivalent stability and expression as the wild-type protein in cells by immunoblotting against GFP (Fig. 3B). These results suggest that PGBD5 represents a distinct member of the piggyBac family of DNA transposases.

Discussion
Our current findings indicate that human PGBD5 is an active piggyBac transposase that can catalyze DNA transposition in human cells. DNA transposition by PGBD5 requires its C-terminal transposase domain, and depends on specific inverted terminal repeats derived from the lepidopteran piggyBac transposons (Fig. 1). DNA transposition involves trans-esterification reactions mediated by DNA hairpin intermediates (15). Consistent with the requirement of intact termini of the piggyBac, Tn10, and Mu transposons (18), elimination or mutation of the terminal GGG nucleotides from the transposon substrates also abolishes the transposition activity of PGBD5 (Fig. 1). PGBD5-induced DNA transposition is precise with preference for insertions at TTAA genomic sites (Fig. 2). Since our analysis was limited to ectopically expressed PGBD5 fused to GFP and episomal substrates derived from lepidopteran piggyBac transposons, it is possible that endogenous PGBD5 may exhibit different activities on chromatinized substrates in the human genome.
Current structure-function analysis indicates that PGBD5 requires three aspartate residues to mediate DNA transposition (Fig. 3), but its DDD domain appears to be distinct from other piggyBac transposase enzymes with respect to primary sequence (Fig. S1) (14). Thus, the three aspartate residues required for efficient DNA transposition by PGBD5 may form a catalytic triad that functions in phosphodiester bond hydrolysis, similar to the DDD motif in other piggyBac family transposases, or alternatively may contribute to other steps in the transposition reaction, such as synaptic complex formation, hairpin opening, or strand exchange (14, 15, 18). In addition, we find that alanine mutations of the three required aspartate residues in the PGBD5 transposase domain significantly reduce but do not completely eliminate genomic integration of the transposon reporters (Fig. 3). This could reflect residual catalytic activity despite these mutations, or that PGBD5 expression may affect other mechanisms of DNA integration in human cells.
The evolutionary conservation of the transposition activity of PGBD5 suggests that it may have hitherto unknown biologic functions among vertebrate organisms. DNA transposition is a major source of genetic variation that drives genome evolution, with some DNA transposases becoming extinct and others domesticated to evolve exapted functions. The evolution of transposons' activities can be highly variable, with some organisms such as Z. mays undergoing continuous genome remodeling and recent two-fold expansion through endogenous retrotransposition, Drosophila and Saccharomyces owing over half of their known spontaneous mutations to transposons, and primate species including humans exhibiting relative extinction of transposons (1).
Evolutionary conservation of transposase genes is generally interpreted as evidence of their biological function. However, these functions can undergo exaptation, with biochemical activities of transposase genes and their transposon substrates evolving to have endogenous functions other than genomic transposition per se. For example, human RAG1 is a domesticated Transib transposase that has retained its active transposase domain, and can transpose ITR-containing transposons in vitro, but catalyzes somatic recombination of immunoglobulin and T-cell receptor genes in lymphocytes across signal sequences that might be derived from related transposons (37, 38). Human SETMAR is a Mariner-derived transposase with a divergent DDN transposase domain that has retained its endonuclease but not transposition activity, and functions in double-strand DNA repair by non-homologous end joining (7). The human genome encodes over 40 other genes derived from DNA transposases (1, 3), including THAP9 that was recently found to mobilize transposons in human cells with as of yet unknown function (39). RAG1, THAP9 and PGBD5 are, to our knowledge, the only human proteins with demonstrated transposase activity.
The distinct biochemical and structural features of PGBD5 indicated by our findings are consistent with its unique evolution and function among human piggyBac-derived transposase genes (24, 30). PGBD5 exhibits deep evolutionary conservation predating the origin of vertebrates, including a preservation of genomic synteny across lancelet, lamprey, teleosts, and amniotes (30). This suggests that while PGBD5 likely derived from an autonomous mobile element, this ancestral copy was immobilized early in evolution and PGBD5 can probably no longer mobilize its own genomic locus, at least in germline cells. The human genome contains several thousand miniature inverted repeat transposable elements (MITE) with similarity to piggyBac transposons (1, 24). CSB-PGBD3 can bind to the piggyBac-derived MER85 elements in the human genome (28, 29). Similarly, it is possible that PGBD5 can act in trans to recognize and mobilize one or several related MITEs in the human genome. Recently, single-molecule maps of the human genome have predicted thousands of mobile element insertions, and the activity of PGBD5 or other endogenous transposases may explain some of these novel variants (40, 41). PGBD5 localizes to the cell nucleus, and is expressed during embryogenesis and neurogenesis, but its physiological function is not known (30).
Given that both human RAG1 and ciliate piggyMac domesticated transposases catalyze the elimination of specific genomic DNA sequences (10, 31), it is reasonable to hypothesize that PGBD5's biological function may similarly involve the excision of as of yet unknown ITR-flanked sequences in the human genome or another form of DNA recombination. Since DNA transposition by piggyBac family transposases requires substrate chromatin accessibility and DNA repair, we anticipate that additional cellular factors are required for and regulate PGBD5 functions in cells. Likewise, just as RAG1-mediated DNA recombination of immunoglobulin loci is restricted to B lymphocytes, and rearrangements of T-cell receptor genes to T lymphocytes, potential DNA rearrangements mediated by PGBD5 may be restricted to specific cell types and developmental periods. Generation of molecular diversity through DNA recombination during nervous system development has been a long-standing hypothesis (42, 43). The recent discovery of somatic retrotransposition in human neurons (44-46), combined with our finding of DNA transposition activity by human PGBD5, which is highly expressed in neurons, suggests that additional mechanisms of somatic genomic diversification may contribute to vertebrate nervous system development.
Because DNA transposition is inherently topological and orientation of transposons can affect the arrangements of reaction products (47), potential activities of PGBD5 can depend on

Reagents
All reagents were obtained from Sigma Aldrich if not otherwise specified. Synthetic oligonucleotides were obtained from Eurofins MWG Operon (Huntsville, AL, USA) and purified by HPLC. After 24 hours, transfected cells were trypsinized and re-plated for functional assays.  1. Ct values were calculated with ROX normalization using the ViiA 7 software (Applied Biosystems). We determined the quantitative accuracy of this assay by analyzing serial dilutions of the PB-EF1-NEO plasmid as a reference (Fig. S6).
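The use of a plasmid dilution series to assess quantitative accuracy can be illustrated with a standard-curve fit. The following is a minimal sketch, assuming an ordinary least-squares fit of Ct against log10 template copies; the fitting procedure used for Fig. S6 is not described here, and the function names and dilution values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(ct_values) or len(copies) < 2:
        raise ValueError("need at least two matched (copies, Ct) points")
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Interpolate template copies for a measured Ct from the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


# Hypothetical ten-fold dilution series of a reference plasmid.
dilution_copies = [1e2, 1e3, 1e4, 1e5]
dilution_ct = [33.36, 30.04, 26.72, 23.40]
curve_slope, curve_intercept = fit_standard_curve(dilution_copies, dilution_ct)
```

A slope near -3.32 cycles per ten-fold dilution corresponds to approximately one doubling per cycle, which is a common criterion for judging whether a qPCR assay is quantitative over the tested range.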

Flanking sequence exponential anchored (FLEA) PCR
To amplify genomic transposon integration sites, we modified flanking sequence exponential anchored (FLEA) PCR (33), as described in Supp. Fig. S8 (34). To exponentially amplify the purified products, beads were resuspended in a total volume of 50 μl containing 500 nM of exponential and Transposon1 primers, and the Platinum

Lentivirus production and cell transduction
Lentivirus production was carried out as described in (55

Statistical analysis
Statistical significance values were determined using two-tailed non-parametric Mann-Whitney tests for continuous variables, and two-tailed Fisher exact test for discrete variables.
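The two tests named above can be sketched in a few lines for small samples. This is a minimal illustration using only the Python standard library, assuming exact (enumeration-based) two-tailed p-values; the text does not state which software or whether exact or asymptotic p-values were used, and the function names and example data are hypothetical.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


def mann_whitney_u(x, y):
    """U statistic for x: pairs with x_i > y_j, counting ties as one half."""
    return sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)


def mann_whitney_exact_two_tailed(x, y):
    """Exact two-tailed p-value by enumerating all group assignments.

    Only practical for small samples, since the number of assignments
    grows combinatorially with sample size.
    """
    pooled = list(x) + list(y)
    n1 = len(x)
    mean_u = n1 * len(y) / 2
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n1):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(mann_whitney_u(group_x, group_y) - mean_u) >= observed - 1e-12:
            extreme += 1
    return extreme / total
```

For two completely separated groups of three measurements each, such as `[1, 2, 3]` and `[4, 5, 6]`, the exact two-tailed Mann-Whitney p-value is 2/20 = 0.1, which illustrates the smallest p-value attainable with triplicate samples.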