Human PRDM9 can bind and activate promoters, and other zinc-finger proteins associate with reduced recombination in cis

Across mammals, PRDM9 binding localizes almost all meiotic recombination hotspots. However, most PRDM9 motif sequence matches are not bound, and most PRDM9-bound loci do not become hotspots. To explore factors that affect binding and subsequent recombination outcomes, we mapped human and chimp PRDM9 binding sites in a human cell line, and measured PRDM9-induced H3K4me3 and gene expression changes. These data revealed varied DNA-binding modalities of PRDM9, and histone modifications that predict binding. At sites where PRDM9 binds, specific cis sequence motifs associated with TRIM28 recruitment, and histone modifications, predict whether recombination subsequently occurs. These results implicate the large family of KRAB-ZNF genes in consistent, localized meiotic recombination suppression. PRDM9 affects gene expression for a small number of genes including CTCFL and VCX, by binding nearby. Finally, we show that PRDM9’s DNA-binding zinc finger domain strongly impacts the formation of multimers, with a pair of highly diverged alleles multimerizing less efficiently.

Moreover, not all PRDM9 binding sites become hotspots (Baker et al., 2014; Grey et al., 2017), and the reasons for this remain unclear. In particular, apart from PRDM9 motifs themselves there are no specific DNA sequence features that have been shown to modulate recombination rate in cis in mammals, nor epigenetic modifications shown to play a causal role genome-wide.

PRDM9 has been hypothesized to play a role in meiotic gene regulation given its H3K4 trimethy-

., 2015). This large number of peaks likely results from the high expression level of PRDM9 in this system, providing sensitivity to detect even weak binding interactions. Weak PRDM9 binding interactions such as these may help to explain the ∼40% of DSB events that occur outside known hotspots in mice (Lange et al., 2016).

We compared our ChIP-seq data with a set of 18,343 published in vivo human DSB hotspot peaks from DMC1 ChIP-seq experiments in testis samples (Pratto et al., 2014). We found evidence for binding at up to 74% of DSB hotspots (at p<10⁻³) after correcting for chance overlaps, demonstrating that even in a completely different cell type and expression system, PRDM9 binds the majority of hotspots. The proportion bound in our system is greater (up to 82%) at DSB hotspots not subject to the telomere effect, which substantially increases the probability of DSB formation within roughly

with the strength of PRDM9 binding in our system (Figure 1b), and conversely the probability of overlap increases for hotter DMC1 peaks, especially in non-telomeric regions (Figure 1-S1b).

To investigate the histone methylation activity of PRDM9 and to provide an additional marker of PRDM9 binding, we also performed ChIP-seq against the H3K4me3 mark in both transfected and untransfected cells by the same method.
After subtracting sites overlapping "pre-existing" H3K4me3 peaks (those present in untransfected cells), we found that 95% of PRDM9 binding peaks show H3K4me3 following transfection (p<0.01), and this proportion increases to 100% with increasing PRDM9 binding enrichment (see Figure 1b). That

H3K36me3 deposition at bound sites (see Figure 1-S1d). Further, full-length PRDM9 preferentially binds more open chromatin, and appears to phase surrounding nucleosomes (see Figure 1-S1h), again as seen in mice (Baker et al., 2014). However, the zinc finger domain by itself appears unable to phase nucleosomes (see Figure 1-S1g).

Next, we compared enrichment values for PRDM9 and H3K4me3 in our cells with in vivo testis H3K4me3 and DMC1 enrichment values computed from published raw data (Pratto et al., 2014) (see Methods and Materials). PRDM9 enrichment in our HEK293T cells correlates with testis H3K4me3 enrichment (r = 0.50), but shows a much lower raw correlation with testis DMC1 enrichment (r = 0.21), consistent with a layer of DSB regulation occurring downstream of PRDM9 binding and H3K4me3 marking (Figure 1-S2), which we show below does indeed occur. Taken alone, the

Manuscript under review

present in A/B-only hotspots (Figure 1-S4; Pratto et al., 2014). Therefore, the B allele binds Motif 7 with greater affinity than does the A allele, demonstrating distinguishable binding preferences between these alleles, which differ at a single DNA-contacting amino acid in ZF5 (Baudat et al.,

To examine how the primary DNA sequence affects the probability of PRDM9 binding, we identified matches to each of our motifs genome-wide using FIMO (Bailey et al., 2015). Although the probability of overlapping a PRDM9 binding peak increases linearly with motif match score, even the strongest 0.1% of motif matches have only a 50% chance of overlapping a binding peak (see Figure 2-S2a).
Given that binding cannot be reliably predicted by even this multivariate motif score alone, it must be influenced by the wider sequence and chromatin contexts of each motif match.

To identify factors that predict whether any given region of the genome will be bound by PRDM9

Human PRDM9 is able to bind promoters genome-wide

A study in mice has shown that in the absence of PRDM9, DSBs localize to active promoters marked with H3K4me3. It has been suggested that PRDM9 may serve to provide alternative H3K4me3 sites to compete with and direct recombination away from promoters (Brick et al., 2012). However, our ChIP-seq data revealed that, surprisingly, of the 12,982 protein-coding genes with H3K4me3 surrounding their Transcription Start Site (TSS) in our untransfected cells (p<10⁻⁵), 81% have a PRDM9 binding peak center within 500 bp of the TSS, compared to only 6% expected by chance overlap.

Our power to detect binding at promoters is likely increased due to their overrepresentation among ChIP-seq reads (Figure 2-S2; Jain et al., 2015). However, we see no promoter ChIP-seq enrichment for the chimp PRDM9 W11a allele, which does not bind GC-rich DNA (see below; only 3% of promoters are within 500 bp of a chimp PRDM9 peak, versus 9% expected by chance overlap). Furthermore, motif identification at human PRDM9's promoter binding sites revealed the expected binding motifs at similar frequencies to non-promoter peaks, except for a 2-fold enrichment of Motif 7. Interestingly, Motif 7 is also the B-allele enriched motif, so PRDM9's promoter affinity might also differ between common human alleles. We suggest that these motifs, together with accessible chromatin, allow the observed weak but consistent PRDM9 binding to these regions (Figure 2c, Figure 2-S2), which tend to have lower mean enrichment estimates across a range of motif FIMO scores (Figure 2-S2).
Thus, the human B allele of PRDM9 can and does consistently bind to promoters, but more weakly than to non-promoter regions. A recent study of PRDM9 binding in mouse testes (Grey et al., 2017) found that mouse PRDM9 can also be present at a small number of promoter regions, but this recruitment depended on Spo11 (absent in HEK293T cells) and was hypothesized not to involve PRDM9's zinc fingers. Therefore, different alleles of PRDM9 interact

(Myers et al., 2008). The right side lists the percent of the top 1000 peaks ranked by enrichment (without further filtering) containing each motif type. Zinc finger residues at DNA-contacting positions (labeled -1, 3, 6) are illustrated below each zinc finger position, classified by polarity, charge, and presence of aromatic side chains. Zinc fingers 5 and 6 lack positively charged amino acids and contain aromatic tryptophan residues, and they coincide with a variably spaced motif region (indicated by vertical dotted lines). Motif 4 is truncated. b: H3K4me3 ChIP-seq data from PRDM9-transfected HEK293T cells (this study) and H3K4me3/DMC1 data from testes (Pratto et al., 2014) were force-called in a 1-kb window centered on each PRDM9 binding peak center (p<10⁻⁶, minimum peak separation 1000 bp) to provide a p-value for enrichment of each H3K4me3/DMC1 sample at each PRDM9 peak. In our parameterization, the enrichment value represents the fold enrichment over background, minus 1, at the base with the smallest p-value within each peak region. Peak windows with fewer than 5 input reads from cells or testes were filtered out, to improve enrichment estimates, and windows with excessive genomic coverage (in the top 0.1 percentile) or IP coverage (>500 combined fragments) were removed to avoid outliers due to mapping errors. PRDM9 peaks overlapping H3K4me3 peaks from untransfected cells were removed, leaving 37,188 peaks passing all filters.
Peaks were split into deciles according to their PRDM9 enrichment values, and the proportion of peaks with a force-called H3K4me3 or DMC1 p-value <0.05 is plotted within each decile. c: Peaks were stratified into quartiles based on increasing PRDM9 enrichment (light green to dark green) after filtering out promoters. Mean recombination rates (from the HapMap LD-based recombination map; HapMap, 2007) at each base in the 20-kb region centered on each bound motif are plotted for each quartile, with smoothing (ksmooth, bandwidth 25). d: Peak enrichment quartiles (filtered to remove promoters as in c) were separated by motif type (motifs 2, 3, and 5 were combined due to low abundance), and the mean HapMap recombination rate overlapping peak centers was plotted against median PRDM9 enrichment in each quartile, with lines of best fit added for Motif 7 versus all other motifs (left plot). Plot showing the difference in the percentage of AB-only DMC1 peaks versus AA-only DMC1 peaks (Pratto et al., 2014) containing each motif type (right plot). Error bars indicate two standard errors of the mean (left plot) or 95% bootstrap confidence intervals (right plot).

Figure 2d). This effect cannot be explained by the weaker PRDM9 enrichment that we observe at promoter peaks; for similar enrichment values (strongly bound promoters versus weakly bound non-promoters), promoter peaks have much lower recombination rates and DMC1 enrichment (see Figure 2, Figure 2-S2). Although there is widespread human PRDM9 binding to promoters, PRDM9 seems utterly unable to induce recombination at these sites; however, in the absence of PRDM9, DSBs localize to promoters in mice (Brick et al., 2012)

motif score for any of the 7 human motifs in each bin. b: As in a, but for the chimp PRDM9 W11a allele ChIP-seq dataset. "Motif Score" refers to the maximum FIMO motif score for the chimp motif (see Figure 4) in each bin.
c: Mean HapMap recombination rates are reported for promoter (pink squares) and non-promoter (red circles) human PRDM9 peaks split into quartiles of PRDM9 enrichment (filtered to not overlap repeats or occur within 15 Mb of a telomere; error bars represent two standard errors of the mean). Both median enrichment values and recombination rates are greater for non-promoter peaks, even in overlapping ranges of enrichment. d: Mean recombination rate in 20-kb windows centered on bound motifs, for promoter (pink) and non-promoter (red) peaks further filtered only to include peaks with PRDM9 enrichment values between 1 and 2 (smoothing: ksmooth bandwidth 200).

binds directly to a subset of THE1B repeat copies containing matches to its target motif (Figure 3a), in a known region of the repeat (Myers et al., 2008

test this we also examined mean H3K4me3 signals in testes in the same way (Figure 3-S2

not associate strongly with any sequence motifs outside the directly bound region, so it might act as a local "pioneer" protein at least on this background, despite results in mice (Grey et al., 2017). We then independently tested for the presence of motifs influencing recombination hotspot

to stronger/weaker PRDM9 binding. The remaining four "non-PRDM9" recombination-influencing motifs show no association whatsoever with PRDM9 binding in HEK293T cells, and map well outside the PRDM9 binding motif (Figure 3a). The strongest signal is for the motif ATCCATG (joint p=2.8×10⁻⁹ for LD-hotspots, OR=0.32; p=2.5×10⁻⁶ for DMC1 hotspots), whose presence within a THE1B repeat produces a dramatic reduction in the surrounding recombination rate at PRDM9-bound THE1B repeats (Figure 3b).

ATCCATG (Figure 3c). More surprisingly, a weak but significant increase in H3K4me3 signal (p=7.5×10⁻¹³) was also seen, even though this modification is more generally associated with active chromatin regions including promoters.
The same weak H3K4me3 peak was also seen in testes, after restricting analysis to THE1B repeats not bound by PRDM9, indicating this modification operates fully independently of PRDM9, and explaining how the H3K4me3 signal also increases with ATCCATG presence when PRDM9 does bind. This weak increase might reflect genuine partial co-occurrence of the two marks at the same locus (but possibly on different alleles, or in different cells), or in theory it could be explained by non-specificity of experimental antibodies for these two histone modifications.

We reasoned that we might more generally exploit the subtle H3K4me3 signal elevation (whatever its underlying cause) as a potential marker also of H3K9me3 elevation in germline tissues, by examining H3K4me3 in testes. We performed de novo motif finding to identify PRDM9-independent

Confirming that these motifs also predict H3K9me3 levels, we observed almost perfect positive correlation (r=0.93) between H3K4me3 signal strength in testes and H3K9me3 (as well as H3K4me3) in, for example, particular ROADMAP ESC cell-lines (Figure 3-S1d). 14 of the 18 motifs showed association with heterochromatin (p<2.5×10⁻⁸), in at least one cell type. Therefore, this represents a set of motifs for both H3K9me3 and H3K4me3, broadly observable across somatic cells and (at least for the latter mark) testes also, and so we refer to this set as non-PRDM9 H3K9me3/H3K4me3 motifs.

of one or more as-yet-unknown KRAB-ZNF protein(s). The three specific ZNF proteins also all bind sites overlapping those implicated in impacting H3K9me3/H3K4me3 and meiotic recombination, two in the same region as the TRIM28 motif, but with differing sequence specificity (Figure 3a). Thus, while not all human KRAB-ZNF proteins have yet been characterized, those that bind THE1B

are normally expressed only in spermatogenesis.
We validated expression induction at these two genes using qPCR (Figure 5).

VCX encodes a small, highly charged protein of unknown function and has been previously studied for its involvement in PRDM9-related non-homologous recombination events and X-linked

(Figure 6-S1). This is consistent with human PRDM9 binding strongly to itself, as demonstrated previously (Baker et al., 2015b).

To narrow the PRDM9 domain(s) responsible for this self-binding behavior, we split the full-length human B-allele PRDM9 cDNA into two pieces: one containing only the C-terminal Zinc Finger domain (the "ZFonly" construct), and one containing everything else (the "noZF" construct), and tagged with HA or V5 as above (illustrated in Figure 6a). We co-transfected these constructs into

array. Because the mock control lane is clean (Figure 6-S2a), this band likely reflects a real but weak self-binding capability mediated by the non-ZF portion of PRDM9. In complete contrast, we saw an intense co-IP band when co-transfecting ZFonly-HA with ZFonly-V5. Therefore, the zinc finger domain of one PRDM9 protein can bind strongly to the zinc finger domain of another, while the rest of the protein interacts more weakly.

To confirm this, we co-transfected full-length, V5-tagged human PRDM9 with either noZF-HA or ZFonly-HA. Again, only a very faint co-IP band is visible with the noZF construct, and a very intense band is visible with the ZFonly construct (Figure 6b), so the ZFonly construct is sufficient to bind and pull down the full-length construct. This finding replicated in a repeat experiment, and when reversing the direction of the IP-western experiment (Figure 6-S2b). Finally, no co-IP band is seen in a negative control where we co-transfected the noZF construct with the ZFonly construct corresponding to the other end of the protein (Figure 6b)

rather than 12.
We refer to these as Chimp-HA and Chimp-V5 (illustrated in Figure 6a). To

Chimp-V5 would be the "bait" pulled down by IP with anti-V5, and Chimp-HA and Human-HA would be the co-IP "prey" detected by western blotting with anti-HA (we replicated by reversing the tags). The results show that Chimp PRDM9 is >2-fold more efficiently pulled down, compared to Human PRDM9, by Chimp PRDM9. Conversely, Human PRDM9 is >2-fold more efficiently pulled down than Chimp PRDM9, by Human PRDM9 (Figure 6c). Thus, PRDM9 preferentially forms homo-multimers rather than hetero-multimers, at least for ZF arrays as highly diverged as Human and Chimp.

for these features, in part, and find that they differ even between humans and chimpanzees, in a manner dependent on the PRDM9 ZF-array.

The narrow widths and large number of our ChIP-seq peaks allowed us to recover no fewer

Overview of the different C-terminally tagged PRDM9 constructs used. Both an HA and a V5 version of each construct were generated for co-IP experiments. b: Barplot showing the relative intensity of western blot co-IP bands normalized to input bands (from 50 µg of total lysate protein) for each combination of bait and prey constructs. Whenever both bait and prey contain the zinc finger domain (green bars), the co-IP signal is much stronger than when either or both constructs lack a ZF domain (orange bars

and its binding is associated with different epigenetic features (Figure 2), resulting in broad-scale binding differences between the human and chimp alleles (Figure 4)

(Pratto et al., 2014; Figure 1-S2). Here we show that, even outside of telomeric regions, broad-scale effects can influence recombination outcomes independently of PRDM9 binding and local sequence context (Figure 3-S2).

One strongly negative predictor of recombination outcomes is presence within an active gene

(Figure 4c).
Given DSB suppression at promoters, nearby PRDM9 binding sites might be immune from the effects of hotspot death, which would otherwise act to abolish its binding and drive potentially deleterious mutations (potentially including any which weaken the promoter) to fixation in these regions. Indeed, speculatively, this may even explain why recombination is actively suppressed at promoters in certain organisms.

We have also demonstrated that PRDM9's ability to form multimers is mediated primarily by its zinc finger array, while two highly diverged human and chimp alleles form hetero-multimers

which we previously identified as a mechanism to explain the role of PRDM9 in fertility and speciation in mice (Davies et al., 2016). In this case, a preference for homo-multimer formation would have obvious advantages.

(Landt et al., 2012).

Sequencing reads were aligned to hg19 using BWA (v0.7.0-r313, option -q 10, Li and Durbin,

C-terminal tags
ChIP-seq datasets generated in this study. The datasets utilized in this analysis include the N-terminal YFP-tagged human construct used for most of the analysis as well as the C-terminal tagged constructs used in subsequent experiments. Columns 3 and 4 list the proportion of fragments estimated to arise from true signal genome-wide, as computed by our peak calling algorithm. Replicate 2 is assigned "n/a" when only one replicate was performed. Total peak numbers on the autosomes and on the X chromosome are listed in the second-to-last column (HEK293T cells lack a Y chromosome). The final column is an estimate of the proportion of 100-bp bins in the genome with evidence of enrichment at p<10⁻⁵.
replicates. Fragment coverage from each replicate was then computed at each position in the

Calling PRDM9 binding peaks

We developed a maximum-likelihood-based peak calling algorithm that takes as input the number

to $y(i)$, a parameter representing the coverage due to binding enrichment in ChIP replicate 1 at bin $i$. We wish to test the hypothesis that $y(i) \geq 0$ for each bin $i$.

Estimating constants

To speed up this step and to provide smoother coverage estimates, we first computed coverage values in 100-bp bins across the autosomes. One can estimate $\alpha_j$ by assuming (conservatively) that when $C_1(i) = 0$ or $C_2(i) = 0$, $y(i) = 0$. That is, one can assume that if ChIP replicate $j$ has coverage 0 at bin $i$, then any coverage in the other replicate ($j'$) arises purely from background. Thus for all $i$ such that $C_j(i) = 0$

and thus one can estimate $\alpha_{j'}$ as

(2)

Next, maximum likelihood estimation and hypothesis testing are performed across all bins (see below), and $\hat{\beta}$ is re-computed as above, using coverage means from the subset of bins with p<10⁻¹⁰, for which the ratio of coverage between the two replicates will be less affected by noise. Recall that at each position the Poisson means for coverage in each lane are (dropping the $i$ notation for succinctness)

where $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\beta}$ are constants estimated for the whole genome. To simplify calculations, we reparameterize using a new variable = ∕ and rewrite the above equations as

Given the observed coverage values $C_1$, $C_2$, and $G$, the Poisson log likelihood function can be written as

$$
\begin{aligned}
\ell &\propto -\lambda_1 + C_1 \log(\lambda_1) - \lambda_2 + C_2 \log(\lambda_2) - \lambda_G + G \log(\lambda_G) \\
&= -\hat{\alpha}_1 b - y + C_1 \log(\hat{\alpha}_1 b + y) - \hat{\alpha}_2 b - \hat{\beta} y + C_2 \log(\hat{\alpha}_2 b + \hat{\beta} y) - b + G \log(b) \\
&= -b(\hat{\alpha}_1 + \hat{\alpha}_2 + 1) - y(1 + \hat{\beta}) + C_1 \log(\hat{\alpha}_1 b + y) + C_2 \log(\hat{\alpha}_2 b + \hat{\beta} y) + G \log(b).
\end{aligned}
$$
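As a rough illustration of the per-bin log likelihood above, the following Python sketch evaluates it for one bin. It assumes Poisson means of the form alpha1*b + y for ChIP replicate 1, alpha2*b + beta*y for replicate 2, and b for the genomic input; the function and argument names (`log_likelihood`, `null_mle_background`, `alpha1`, `alpha2`, `beta`) are illustrative choices, not names taken from the study's software. Under the no-enrichment case (y = 0) the background that maximizes this expression has a closed form, which the second function computes.

```python
import math


def log_likelihood(b, y, c1, c2, g, alpha1, alpha2, beta):
    """Poisson log likelihood for one bin, up to an additive constant.

    Assumed (illustrative) means:
      ChIP replicate 1: alpha1 * b + y
      ChIP replicate 2: alpha2 * b + beta * y
      genomic input:    b
    where b > 0 is background coverage and y >= 0 is binding enrichment.
    """
    if b <= 0 or y < 0 or alpha1 <= 0 or alpha2 <= 0:
        raise ValueError("b, alpha1, alpha2 must be positive and y non-negative")
    lam1 = alpha1 * b + y
    lam2 = alpha2 * b + beta * y
    lam_g = b
    return (-lam1 + c1 * math.log(lam1)
            - lam2 + c2 * math.log(lam2)
            - lam_g + g * math.log(lam_g))


def null_mle_background(c1, c2, g, alpha1, alpha2):
    """Background b maximizing the log likelihood when y is fixed at 0.

    With y = 0 the b-dependent part is
    -b * (alpha1 + alpha2 + 1) + (c1 + c2 + g) * log(b),
    so setting the derivative to zero gives the ratio below.
    """
    return (c1 + c2 + g) / (alpha1 + alpha2 + 1.0)


# Hypothetical counts for a single bin (not data from the study).
c1, c2, g = 20, 18, 5
b0 = null_mle_background(c1, c2, g, alpha1=1.0, alpha2=1.0)
ll_null = log_likelihood(b0, 0.0, c1, c2, g, 1.0, 1.0, 1.0)
ll_enriched = log_likelihood(5.0, 15.0, c1, c2, g, 1.0, 1.0, 1.0)
```

In this made-up bin the ChIP counts far exceed the input count, so a positive enrichment term fits better than background alone; a likelihood ratio test would compare the maximized versions of these two quantities.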
Peak calling and centering

Given a likelihood ratio value $\Lambda(i)$ for each base $i$ along a chromosome, along with a p-value threshold (which is converted to a lower bound on the likelihood ratio, $T$) and $d$, a threshold on the minimum separation between peak centers, initial peak centers are found by identifying all significant bases (bases for which $\Lambda(i) > T$) that are local maxima. Specifically, each significant base is scanned to test if

At each initial peak center satisfying these criteria, a confidence interval is computed by identifying

Given observed overlaps between the sets of $A$ and $B$ peaks, we can compute the corrected overlap fraction, $s_{A/B}$, as follows. Let $s_{A/B}$ be the proportion of systematic overlaps, $c_{A/B}$ be the fraction of chance overlaps, and $t_{A/B}$ be the proportion of total overlaps. The probability of no overlap is simply the product of the complements of chance and systematic overlaps, as follows:

$$1 - t_{A/B} = (1 - c_{A/B})(1 - s_{A/B})$$

Solving for $s_{A/B}$ then yields:

$$s_{A/B} = \frac{t_{A/B} - c_{A/B}}{1 - c_{A/B}}$$

Motif finding

For each peak, a 300-bp sequence (centered on the called peak center) was extracted from the reference sequence (hg19). Ab initio motif calling was performed on sequences from the top 5,000

was also reported, along with the maximum PRDM9 motif score within each bin, as computed by FIMO software (Bailey et al., 2015). Bins were filtered to exclude those with fewer than 5 or greater than 50 overlapping Input fragments (removing the bottom 10% and top 0.1% of coverage to eliminate outlying repetitive regions or regions with poor coverage). Peaks were defined as bins with p<10⁻⁵ and enrichment >2 (∼100k bins for human), and non-peaks were defined as bins with p>0.5 and 0 enrichment (∼9.3M bins for human).

To set up a binary classification problem that could be easily modeled and interpreted, non-peaks were subsampled to an identical number as the peaks dataset and merged to serve as the input for modeling.
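The balanced subsampling step can be sketched in Python as follows. This is only an illustration of the idea under simple assumptions (bins held in memory as lists, a fixed random seed); the function name `balanced_dataset` and the 1/0 labels are hypothetical and are not taken from the study's code.

```python
import random


def balanced_dataset(peaks, non_peaks, seed=0):
    """Subsample non-peaks to match the number of peaks, then merge.

    Returns a shuffled list of (bin, label) pairs, with label 1 for
    peak bins and 0 for non-peak bins.
    """
    if len(non_peaks) < len(peaks):
        raise ValueError("need at least as many non-peaks as peaks")
    rng = random.Random(seed)
    # Draw, without replacement, as many non-peak bins as there are peaks.
    sampled = rng.sample(non_peaks, len(peaks))
    data = [(x, 1) for x in peaks] + [(x, 0) for x in sampled]
    rng.shuffle(data)
    return data


# Toy example with integer bin identifiers (not real data).
toy_peaks = list(range(10))
toy_non_peaks = list(range(100, 1000))
toy_data = balanced_dataset(toy_peaks, toy_non_peaks)
```

Balancing the classes this way means a classifier that guesses at random is correct about half the time, which keeps accuracy figures easy to interpret.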
This dataset was randomly split into five subsets, and the fifth subset was

detailed results of this conditional testing are given in Figure 3-source data 1, and were used to produce the first two rows of Figure 3a.

Identifying motifs associated with previously measured H3K4me3 signal strength in testes

A previous human study measured levels of H3K4me3 in testes (Pratto et al., 2014)

themselves and estimated effect sizes is provided in Figure 3-

cantly altering hotspot probability (Figure 3a; Figure 3-source data 1).

In particular, the most significant motif, associated with increased non-PRDM9 H3K9me3/H3K4me3,

does not alter the effect of non-PRDM9 H3K9me3/H3K4me3 motifs (Figure 3-

full resulting set of 1571 identified ROADMAP motifs and details is given in Figure 3-

recombination rates, the collection of motifs is of biological interest in itself. We grouped highly co-occurring (and typically overlapping) motifs, collapsing motifs whose correlation (in which THE1B element each motif occurred in) was >50% until no further grouping was possible. This resulted in a set of 67 distinct "summary" motif groups, whose results are summarized in Figure 3-

of these histone modifications are discussed in the main text.

Identifying motifs associated with binding of KRAB-ZNF genes, and TRIM28 recruitment, at THE1B repeats

The above approach describes a method to identify sequence motifs within all or a subset of THE1B elements influencing 0-1 hotspot status. We applied the identical approach to attempt to

and then identified motifs.
We did the same for TRIM28, a protein recruited by the KRAB domains of many KRAB-ZNF proteins, and assayed in H1 human embryonic stem cells (Imbeault et al., 2017). In the first three cases, the identified motifs cluster and could be mapped to specific regions of THE1B, shown in Figure 3a and also described below. In the case of TRIM28 the signal is expected to be a superposition of sites of binding by different KRAB-ZNF proteins, complicating interpretation; indeed we identified 16 motifs, mapping throughout THE1B elements. The top-scoring motifs were TCCCTGC and CCATGTA. These heavily overlapped 2 of the 4 motifs altering (and in both cases decreasing) the probability of hotspot occurrence, including the highly significant motif ATCCATG. Therefore, we conditioned on the latter motif occurring and repeated our motif-finding for the resulting subset of THE1B repeat elements, reasoning that such TRIM28 peaks might be bound by a single protein with a well-defined target motif. Indeed, this analysis revealed a set of 7 motifs, all within a contiguous region of length 57 bp and covering the 41 bases in bold and underlined below, mapping to the region 181-231 of the THE1B consensus sequence. The resulting extended "TRIM28" target motif is shown below. There is some spacing variability in the first half of this motif among bound copies because of the variable number of copies of "CT" found in this region. This motif incorporates and links the hotspot-influencing motifs ATCCATG and CTGCACA (highlighted in blue). Moreover, it overlaps several additional motifs associated with (increasing, red below) non-PRDM9 H3K9me3/H3K4me3. Finally, this motif is disrupted by several motifs associated with decreasing (blue below) non-PRDM9 H3K9me3/H3K4me3. These overlaps are highlighted in the above figure, which gives results for all four motifs.
As shown in the above alignment figure, we also identified two similar target motifs for binding of ZNF766 mapping to different parts of the THE1B repeat consensus. The previously unknown extended "TRIM28" motif above is therefore a recombination coldspot motif, and simultaneously a motif, including the motif "ATCCATG" and others, for TRIM28 recruitment, H3K9me3 deposition, and weaker H3K4me3 deposition, at the same locations. Moreover, it appears that binding in THE1B

Figure supplement 1. DMC1, H3K36me3, and ATAC-seq signals surrounding human PRDM9 peaks. a: A comparison of our autosomal PRDM9 peaks, called at various p-value thresholds ranging from 10⁻⁸ to 10⁻³ (minimum peak separation 250 bp), to a set of published DSB hotspots corresponding to the human A allele (from a set of 18,343 "Intersect" DMC1 hotspots found in multiple individuals, filtered to remove hotspots wider than 3 kb; Pratto et al., 2014). Hotspots were further split into subsets occurring within 15 Mb of a telomere (turquoise) or not (orange). "Overlap" requires a PRDM9 peak center to fall within a reported DMC1 hotspot interval, and overlap fractions were corrected downward to account for chance overlaps (see Methods and Materials). b: DMC1 hotspots were split into decile bins by reported DMC1 heat, and the proportion of hotspots in each bin overlapping one or more of our PRDM9 peaks is indicated (error bars represent two standard errors of the proportion). c: Profile plot showing the mean H3K4me3 enrichment (measured in HEK293T cells transfected with human PRDM9) at bound human motifs conditioned not to have any H3K4me3 enrichment in untransfected cells. Grey lines indicate 2 standard errors of the mean. (smoothing: ksmooth, bandwidth 25) d: Profile plot showing the mean H3K36me3 enrichment (measured in HEK293T cells transfected with human PRDM9) at bound human motifs conditioned not to have any H3K36me3 enrichment in untransfected cells.
Grey lines indicate 2 standard errors of the mean. NB: absolute enrichment values cannot be compared across samples. (smoothing: ksmooth, bandwidth 25) e-h: ATAC-seq profile plots surrounding a set of the ∼15,000 strongest human PRDM9 ChIP-seq peaks (filtered to require a motif match and to not overlap an annotated DNase hypersensitive site), across 4 different transfection samples. "Coverage" here refers to the frequency with which an ATAC-seq fragment center occurs at each position, such that "Nuc.-free" coverage tracks the centers of nucleosome-depleted regions, and "MonoNuc." coverage tracks the centers of single nucleosomes. Coverage values are normalized to the mean values observed between 1500 and 3000 bases away from each site, as a measure of background, and smoothed (ksmooth bandwidth = 50). The human-transfected cells show strongly phased nucleosomes centered at ∼100 bp to either side of the motif and an elevated signature of nucleosome depletion at the center (h), when compared to the three controls (e,f,g). The ZFonly result (g) suggests that the ZF domain alone is insufficient to produce this nucleosome phasing. These data also suggest that PRDM9 binding is favored in nucleosome-depleted regions.

H3K4me3 ChIP-seq data from transfected HEK293T cells (this study) and H3K4me3/DMC1 data from testes (Pratto et al., 2014) were force-called in a 1-kb window centered on each PRDM9 binding peak center (p<10⁻⁶, minimum peak separation 1000 bp) to provide an enrichment value for each H3K4me3/DMC1 sample at each PRDM9 peak. Peaks were further split into subsets occurring within 15 Mb of a telomere (turquoise) or not (orange). Pairwise comparisons plot the mean force-called enrichment value of each sample (y axis) in each enrichment decile bin of each other sample (x axis). Points are positioned at the median value of each decile and error bars represent two standard errors of the mean. Raw Pearson correlation values are printed on each plot.
All comparisons show a significant positive correlation (p<2×10⁻¹⁶). Peak windows with fewer than 5 input reads from cells or testes were filtered out, to improve enrichment estimates, and windows with excessive genomic coverage (in the top 0.1 percentile) or IP coverage (>500 combined fragments) were removed to avoid outliers due to mapping errors. PRDM9 peaks overlapping H3K4me3 peaks from untransfected cells were removed, leaving 37,188 peaks passing all filters. Interestingly, we observe an enrichment of H3K4me3 in telomeric peaks in our HEK293T cells but not in testes.
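The chance-overlap correction referred to in the Figure supplement 1 caption (see Methods and Materials) treats chance and systematic overlaps as independent, so that the probability of no overlap is the product of their complements. A minimal Python sketch of that calculation follows; the function name `corrected_overlap` and the example fractions are hypothetical and are not values from the study.

```python
def corrected_overlap(total, chance):
    """Systematic overlap fraction after removing chance overlaps.

    Assumes (1 - total) = (1 - chance) * (1 - systematic), which gives
    systematic = (total - chance) / (1 - chance).
    """
    if not 0.0 <= chance < 1.0:
        raise ValueError("chance fraction must be in [0, 1)")
    if not chance <= total <= 1.0:
        raise ValueError("total fraction must be in [chance, 1]")
    return (total - chance) / (1.0 - chance)


# Hypothetical inputs: 76% observed overlap, 20% expected by chance.
example = corrected_overlap(total=0.76, chance=0.20)
```

With no chance overlap the corrected fraction equals the observed fraction, and it falls to zero when the observed overlap is no greater than the chance expectation.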
Here we use two motif score ranges to evaluate more carefully the distribution of functional motifs. Since there should be a higher proportion of functional motifs among higher-

et al., 2015). For 0.1 percentile bins of increasing FIMO score, the proportion of motif matches occurring within 150 bp of a PRDM9 peak center is plotted (p<10⁻⁶, minimum peak separation 250 bp). Even the strongest 0.1% of motif matches are bound only 50% of the time. b: PRDM9 peaks overlapping Motif 1 (and having more than 5 input reads overlapping the peak center) were divided into those overlapping promoters (stringently, those within 1 kb of a TSS, overlapping an H3K4me3 peak in untransfected cells, and overlapping a DNase HS site; red), and non-promoters (failing those criteria and further not overlapping an H3K4me3 peak reported by any ENCODE data; see Methods and Materials; pink). Mean raw input coverage values are plotted in decile bins of FIMO score, with error bars representing ± 2 s.e.m. c,d: Same as b, but with mean sum of raw ChIP fragment coverage values in each bin (c) or mean computed enrichment values in each bin (d). Overall, promoters show greater input sequencing coverage, and thus we have greater power to detect weak binding in these regions. When corrected for this sequencing bias, promoter binding sites tend to have weaker binding enrichment for a given FIMO score. e: Mean force-called DMC1 enrichment values (Pratto et al., 2014) are reported for promoter (pink squares) and non-promoter (red circles) human PRDM9 peaks split into quartiles of PRDM9 enrichment (filtered to not overlap repeats or occur within 15 Mb of a telomere; error bars represent two standard errors of the mean). Both median PRDM9 enrichment values and DMC1 enrichment values are greater for non-promoter peaks, even in overlapping ranges of PRDM9 enrichment.
f: Mean raw DMC1 coverage in 20-kb windows centered on bound motifs, for promoter (pink) and non-promoter (red) peaks further filtered only to include peaks with PRDM9 enrichment values between 1 and 2 (smoothing: ksmooth, bandwidth 200).

Samples were split and run on two blots separately, one imaged using an anti-HA antibody (upper) and one using an anti-V5 antibody (lower). Exposure time was 4 minutes. Ladder lanes are overlaid on the left, with approximate sizes in kilodaltons noted. Lanes are labeled according to which full-length Human construct (HA or V5) was used, as well as which antibody was used for immunoprecipitation. IgG heavy chains are visible at ∼50 kDa, while the Human allele is visible as a band at ∼100 kDa with two or three smaller bands beneath it, likely representing degradation products (Grey et al., 2011; Cole et al., 2014). "-" is a shorthand label for input lanes, for which 50 µg of input chromatin was loaded in each well. The first six lanes demonstrate the specificity of the antibodies and their lack of cross-reactivity. The last two lanes show the co-IP experimental results confirming multimerization. Right: Two independent replicates were performed to confirm the formation of multimers with the full-length human constructs, using IgG mock control lanes to rule out nonspecific co-precipitation. Images were cropped to include only the PRDM9 bands. Input lane bands appear to have run lower than expected due to the use of a higher concentration of loading buffer in the IP lanes, an issue which was avoided in subsequent experiments.

Western blots illustrating co-IP results for various combinations of full-length human, noZF, and ZFonly constructs.
a: The third and fourth blots show only a very faint co-IP signal despite strong input expression of the noZF construct, indicating that the non-ZF portion of PRDM9 cannot form multimers efficiently with itself or with full-length PRDM9. The first and second blots show strong co-IP signals for the ZFonly construct, indicating that the ZF domain binds itself and binds the full-length Human construct. The fifth blot shows that the ZFonly and noZF constructs do not bind each other and confirms that multimerization is not mediated by the C-terminal tags. b: A replication of the experiment shown in the first blot above, but performing the IPs and western blots with both tag combinations. This confirms that the full-length Human construct can pull down the ZFonly construct, and that the ZFonly construct is sufficient to pull down the full-length Human construct.