The T1150A cancer mutant of the protein lysine dimethyltransferase NSD2 can introduce H3K36 trimethylation

Protein lysine methyltransferases (PKMTs) play essential roles in gene expression regulation and cancer development. Somatic mutations in PKMTs are frequently observed in cancer cells. In biochemical experiments, we show here that the NSD1 mutations Y1971C, R2017Q, and R2017L observed mostly in solid cancers are catalytically inactive suggesting that NSD1 acts as a tumor suppressor gene in these tumors. In contrast, the frequently observed T1150A in NSD2 and its T2029A counterpart in NSD1, both observed in leukemia, are hyperactive and introduce up to three methyl groups in H3K36 in biochemical and cellular assays, while wildtype NSD2 and NSD1 only introduce up to two methyl groups. In Molecular Dynamics simulations, we determined key mechanistic and structural features controlling the product specificity of this class of enzymes. Simulations with NSD2 revealed that H3K36me3 formation is possible due to an enlarged active site pocket of T1150A and loss of direct contacts of T1150 to critical residues which regulate the product specificity of NSD2. Bioinformatic analyses of published data suggested that the generation of H3K36me3 by NSD2 T1150A could alter gene regulation by antagonizing H3K27me3 finally leading to the upregulation of oncogenes.

Protein lysine methyltransferases (PKMTs) play essential roles in gene expression regulation and cancer development. Somatic mutations in PKMTs are frequently observed in cancer cells. In biochemical experiments, we show here that the NSD1 mutations Y1971C, R2017Q, and R2017L observed mostly in solid cancers are catalytically inactive suggesting that NSD1 acts as a tumor suppressor gene in these tumors. In contrast, the frequently observed T1150A in NSD2 and its T2029A counterpart in NSD1, both observed in leukemia, are hyperactive and introduce up to three methyl groups in H3K36 in biochemical and cellular assays, while wildtype NSD2 and NSD1 only introduce up to two methyl groups. In Molecular Dynamics simulations, we determined key mechanistic and structural features controlling the product specificity of this class of enzymes. Simulations with NSD2 revealed that H3K36me3 formation is possible due to an enlarged active site pocket of T1150A and loss of direct contacts of T1150 to critical residues which regulate the product specificity of NSD2. Bioinformatic analyses of published data suggested that the generation of H3K36me3 by NSD2 T1150A could alter gene regulation by antagonizing H3K27me3 finally leading to the upregulation of oncogenes.
Epigenetic modifications including DNA methylation and histone modifications are key determinants of important cellular processes and differentiation events (1,2), and perturbations in epigenetic mechanisms can lead to the initiation and progression of many diseases including cancer (3). Many different types of histone post-translational modifications have been discovered so far, including lysine and arginine methylation, lysine acetylation, threonine and serine phosphorylation, and others (4). The combination of all these histone modifications, forming the so-called histone code, directs the organization of chromatin and subsequent control of gene expression. Lysine methylation of histone N-terminal tails is one of the most abundant histone modifications which is introduced by the site-specific protein lysine methyltransferases (PKMTs) (5). Depending on the site of the methylated lysine, it could be linked either with repressive (H3K9me2/3 and H3K27me3) or active chromatin states (as in the case of H3K4me1/2/3). Histone H3 lysine 36 (H3K36) methylation is connected with several biological processes including control of gene expression, DNA repair, and recombination as well as alternative splicing (6)(7)(8). H3K36 methylation in human cells is mainly catalyzed by five different enzymes (NSD1, NSD2, NSD3, ASH1L, and SETD2). While NSD1, NSD2, NSD3, and ASH1L can introduce only up to dimethylation of H3K36 in vitro and in vivo (5,7,9,10), SETD2 is the main human enzyme that can introduce up to trimethylation of H3K36 (H3K36me3) in somatic cells (11,12). The difference in the methylation states together with their different genomic localization are key factors in the complexity of the H3K36 histone code because each methylation state of K36 encodes different biological downstream cascades (6,7,13). The H3K36me2 mark occurs at intergenic regions and promoters while the H3K36me3 mark is enriched at gene bodies of active genes (14)(15)(16)(17). H3K36me2 directly interacts with the DNA methyltransferase DNMT3A via its proline-tryptophan-tryptophan-proline (PWWP) domain which binds preferentially to H3K36me2 at intergenic regions (18)(19)(20), while H3K36me3 is bound by the PWWP domain of DNMT3B (21). H3K36me2/3 were shown to act as antagonists of H3K27 trimethylation (22,23), but this effect is more pronounced in the case of H3K36me3, as illustrated by global mass spectrometric analyses (24)(25)(26). H3K27me3 is another repressive histone mark that is deposited by the PRC2 complex and has an important role in development.
The nuclear receptor binding SET domain protein 2 (NSD2) (also known as MMSET, and WHSC1) (8) and its human paralogs NSD1 (also known as KMT3B) (27) and NSD3 (WHSC1L1) (28), are Su(var)3 to 9, Enhancer-of-zeste, Trithorax (SET) domain containing PKMTs (8,29). All NSD enzymes share a similar domain organization. Their catalytic SET domain at the C-terminal part of the enzyme comprises three subdomains, pre-SET or associated with SET (AWS), SET, and post-SET. It is responsible for the methylation of histone H3K36 using the methyl donor cofactor S-adenosyl-L-methionine (AdoMet) (8,28). Similar to some other PKMTs (e.g., Clr4 and EZH2 (30)(31)(32)(33)), NSD PKMTs adopt an autoinhibitory conformation in which an autoregulatory loop (ARL) connecting the SET and post-SET domains blocks the substrate lysine binding channel in the absence of substrate (10,34). Recently, a crystal structure of NSD2 bound to a nucleosomal substrate revealed crucial contacts of NSD2 with the DNA, the H3 tail, and histone H2A which altogether stabilize the NSD2 ARL in an active open conformation (35,36).
In addition to the catalytic SET domain, NSD enzymes contain regulatory domains including two PWWP domains which are important for binding to DNA and methylated H3K36. In the case of NSD2, the first N-terminal PWWP domain was shown to bind H3K36me2 and stabilize NSD2 binding at chromatin (37). Furthermore, the NSD enzymes contain five plant homeodomains (PHD) and an atypical (C5HCH) plant homeodomain (PHD) finger and NSD2 has a unique high mobility group motif which contributes to its nuclear localization (8,38). In the context of nucleosomal substrates, H3K36 was the specific site to be methylated by NSD2 (9). In the case of NSD1, K168 of the linker histone H1.5 was shown to be methylated in addition to H3K36 (39). Moreover, some non-histone targets were shown to be methylated by NSD1, like NFkB-p65 (K218 and K221) (40), ATRX (K1033) as well as the small nuclear RNA-binding protein U3 (K189) (39).
In agreement with the complex functions of NSD-catalyzed H3K36 dimethylation and non-histone methylation, NSD enzyme dysfunction was linked to several diseases ranging from developmental disorders to cancers (6)(7)(8). Heterozygous loss of NSD2 is responsible for a developmental disease called Wolf-Hirschhorn syndrome (WHS) (41). Haploinsufficieny of NSD1 was linked to an overgrowth syndrome called SOTOS syndrome (42). Regarding cancer involvement, NSD2 is overexpressed and acts as a major regulator of gene transcription and disease progression in multiple myeloma cells harboring t (4,14) translocations (43). Increased activity of NSD1 is a common feature in AML with the t(5;11)(q35;115) chromosomal translocation which results in a fusion of the N-terminal domains of the nucleopore 98 (NUP98) protein to the C-terminal part of NSD1 (44)(45)(46). In addition, many missense variants of NSD1 and NSD2 were observed in various types of cancers like hematological cancers (47,48), head and neck squamous cell carcinomas (49), human brain tumor cell lines (50), and lung cancers (51)(52)(53).
Unlike the loss-of-function changes caused by gene deletions, deciphering the biological effects caused by somatic cancer missense mutation is more challenging. Many of these mutations are observed in PKMTs in different types of cancers and some of them were shown to dramatically change enzyme activity, product specificity, substrate specificity, or other enzyme properties (54)(55)(56)(57). This highlights how a single amino acid exchange could mechanistically drive carcinogenesis. A frequent NSD2 missense single point mutation (E1099K) was detected in many patients with leukemia. This mutant was comprehensively characterized and shown to be hyperactive (47,55,58,59). Structural analyses revealed that it destabilizes the NSD2 ARL leading to higher activity (35,36). However, the effects of other frequent missense cancer mutants in NSD2 and its paralog NSD1 are not well understood. In particular, T1150A in NSD2 is one of the frequent somatic cancer mutants and the corresponding T1232A mutation in NSD3 has been shown before to be a driver mutant which increases cell proliferation and xenograft tumor growth (35).
In this work, we aimed to characterize the biochemical effects of the frequent somatic missense cancer mutation in NSD2 (T1150A) and its NSD1 analog (T2029A) both observed in leukemia in addition to three other somatic cancer mutations in NSD1 observed mostly in solid cancers (Y1971C, R2017Q and R2017L). We show that Y1971 and R2017 are critical residues in NSD1 and their mutation leads to a loss of enzyme activity suggesting that NSD1 acts as a tumor suppressor gene in these cancers. In contrast, our data reveal that the T1150A/T2029A mutants of NSD2 and NSD1 are hyperactive, and in biochemical and cellular assays, we observed that they introduce up to H3K36me3, while the wild type (WT) NSD2 and NSD1 enzymes can only generate H3K36me2. In Molecular Dynamics (MD) simulations, we demonstrate that H3K36me3 generation is possible due to an enlarged active site pocket of the mutant that allows AdoMet binding to an NSD2 T1150A-H3K36me2 complex in a productive conformation leading to H3K36me3 generation. Furthermore, we provide evidence that the generation of H3K36me3 leads to altered gene regulation in lymphocytic leukemia tumor cells by antagonizing H3K27me3 finally leading to the upregulation of oncogenes.

Results and discussion
Selection of NSD2 and NSD1 somatic cancer mutants to study For the selection of somatic cancer mutants in NSD2 that potentially affect the methyltransferase activity, we screened the Catalogue of somatic mutants in the cancer (COSMIC) database (60) and filtered for missense mutations in the catalytic SET domain (Fig. 1A). The most frequently observed NSD2 SET domain missense mutation is E1099K, which is followed by T1150A. Both mutations were specifically observed in patients with hematological cancers. While NSD2 E1099K has already been well characterized (35,36,47,55,58,59), very little was known about T1150A when we started our work, which is why NSD2 T1150A was selected as the main focus of this study. In addition, we were interested to compare the NSD2 data with one of its homologs, NSD1, and screened the COSMIC profile of the NSD1 SET domain as well (Fig. 1B). Indeed, the NSD1 T2029A mutation, corresponding to NSD2 T1150A, has been observed in hematological malignancies as well. Moreover, the NSD1 COSMIC profile contains additional missense mutations, and the Y1971C and R2017Q/L mutations have similar frequencies as T2029A. They were observed in different types of solid cancers like the digestive tract, endometrial, and other soft tissue carcinomas, and we included them in our study as well.
Activity analysis of the somatic cancer mutants of NSD2 and NSD1 First, we wanted to compare the methyltransferase activity of the mutants with the corresponding WT enzymes using in vitro methyltransferase assays. To this end, the GST-NSD2-SET WT and the corresponding T1150A mutant, created by site-directed mutagenesis, were overexpressed in Escherichia coli and purified from bacterial cells (Fig. 1C). The catalytic SET domains of NSD1 and NSD2 were shown before to recapitulate the fulllength enzyme in activity as well as substrate and product specificity (9,10,61). Equal concentrations of both GST-NSD2 WT and T1150A mutant were mixed in methylation buffer supplemented with radioactively labeled AdoMet with one of three different histone substrates, H3K36 unmodified peptide, H3.1 recombinant protein or recombinant H3.1 mononucleosomes. The methylation signal was detected afterward by autoradiography (Figs 1, D-F and S1). Consistently, the T1150A cancer mutant showed hyperactivity compared to WT on all three different histone substrates. Quantification of the autoradiographic signals revealed around 9-fold higher activity on the H3 peptide substrate, 4-fold on recombinant H3.1 protein, and 7-fold on recombinant mononucleosomal substrates ( Fig. 1, D-F). This finding is in agreement with a recent study which also reported hyperactivity of NSD2 T1150A (36) and the observation that overexpression of NSD1 and NSD2 and the activating NSD2 mutation E1099K are frequently observed in blood cancers as described earlier. Additionally, these results suggest that the NSD2 T1150A mutant mediates carcinogenesis through a gain-of-function mechanism.
For comparison with NSD1, we used an existing murine NSD1-SET domain WT construct (39) that shares >95% identity with the human protein. We then cloned and purified the NSD1 mutants (Fig. 1G) and tested their methyltransferase activity as well. Consistently, NSD1 T2029A also showed an . The selected mutations investigated in this study are labeled in black while the previously investigated NSD2 E1099K mutation is labeled in gray. C and G, Coomassie BB-stained SDS-polyacrylamide gels depicting equal amounts of purified GST-tagged WT and mutant proteins for NSD2 (C) and NSD1 (G). D-F and H-J, Methyltransferase assays of NSD2 (D-F) and NSD1 (H-J) cancer mutants. The methyltransferase assays were conducted using either H3K36 peptide (D and H), H3.1 recombinant protein (E and I), or H3.1 recombinant mononucleosomes (F and J) as methylation substrate and radioactively labeled AdoMet. Autoradiographic pictures of the methyltransferase assay are shown in the upper panels and the corresponding quantitative analysis is in the lower panels. The mutant activities are displayed relative to WT enzymes based on the relative band intensity (rbi). The data are expressed as means ± SEM for at least two independent replicates. NSD1, nuclear receptor binding SET domain protein 1; NSD2, nuclear receptor binding SET domain protein 2; WT, wild type.
The T1150A cancer mutant of NSD2 can introduce H3K36me3 enhanced activity compared to NSD1 WT on the different histone substrates (8.5-fold for peptide and nucleosomal substrates, and 3-fold on the protein substrate) (Fig. 1, H-J). In contrast, the other NSD1 cancer mutants (Y1971C, R2017Q and R2017L), which are mostly observed in different types of solid tumors, abolished the NSD1 activity ( Fig. 1, H-J). The R2017Q results agree with a previous study investigating this mutant in the context of SOTOS syndrome (10). The activity effects of the different NSD1 cancer mutants ranging from inactivity to hyperactivity suggest that NSD1 can act as an oncogene or tumor suppressor in a tumor type-specific manner. The NSD1 loss-of-function effect of the three cancer missense mutations discovered here is in agreement with reports describing epigenetic silencing of NSD1 by promoter hypermethylation or genetic deletions in solid tumor cell lines and patients with cancer (49)(50)(51)62). These NSD1-silenced tumors show decreased intergenic H3K36me2 and DNA hypomethylation of CpG islands and intergenic regions at affected genomic loci (27,62).
Mechanistically, the effects of the Y1971C and R2017Q/L mutations can be explained in the light of the recently published cryo-EM structure of NSD2 in complex with a nucleosome containing H3K36M. This structure shows that NSD2 Y1092 (which corresponds to NSD1 Y1971) contacts the H3K36M side chain together with two more aromatic residues (F1177 and Y1179) and an additional hydrophobic one (L1120) (36). The Y1971C mutation disrupts this aromatic cage which is needed for positioning and deprotonation of the target lysine explaining the loss of catalytic activity caused by the mutation. In an NSD1-AdoMet structure, R2017 is observed to bind to three aromatic amino acids, Tyr1870, Tyr1997, and Phe2018, through H-bonds and cation-π interactions (10). These residues are known to be essential for the activity of many SET domains containing PKMTs (10). Moreover, R1138, the NSD2 residue corresponding to NSD1 R2017, is involved in (C-H-O) hydrogen bonding between its main chain carbonyl oxygen and the C-atom of the AdoMet methyl group suggesting that it has a direct role in catalysis (61). Collectively, these findings can explain that the mutation of R2017 to leucine or glutamine abolishes the enzymatic activity of NSD1.
To investigate if there are larger differences in folding between the NSD1 and NSD2 WT enzymes and their corresponding cancer mutants, CD spectroscopy was conducted with all proteins. All spectra were similar indicating comparable secondary structure and folding between WT enzymes and their corresponding cancer mutants (Fig. S2). Collectively, these results confirm that NSD2 T1150 and its equivalent residue in NSD1, T2029, are important for controlling the methyltransferase enzymatic activity and their mutation to alanine increases the activity of both enzymes. In contrast, NSD1 residues Y1971 and R2017 are essential for enzymatic activity and the Y1971C and R2017Q/L exchanges cause a loss of function.
T1150A/T2029A change the enzyme product specificity on protein and nucleosomal substrates Since only NSD2 T1150A and its homolog NSD1 T2029A exhibited catalytic activity, we decided to characterize these hyperactive mutants in more detail. Both proteins share very high amino acid sequence identity around T1150/T2029 (113 residues identical within a region of 126 amino acid residues) (Fig. S2C). NSD1 and NSD2 are known to generate H3K36 methylation only up to the dimethylation state in vitro and in vivo (5,7,9,10), while SETD2 is the sole human enzyme to catalyze up to trimethylation of H3K36 in somatic cells (11). The change in H3K36 methylation state can result in different biological responses (6,7,13), but the exact molecular reason for the dimethyl product specificity of NSD1 and NSD2 is unknown. We were interested to investigate whether the hyperactive NSD2 T1150A cancer mutant can change the NSD2 product specificity to also generate H3K36me3. To test this hypothesis, an equal molar concentration of NSD2 WT and T1150A was mixed with recombinant H3.1 protein in methylation buffer supplemented with unlabeled AdoMet. After methylation, the samples were loaded onto an SDSpolyacrylamide gel, and the methylation state of methylated histone proteins was analyzed by Western blot using antibodies specific for either H3K36me2 or H3K36me3. Strikingly, while both NSD2 WT and T1150A mutant could catalyze the histone H3K36 dimethylation, only the NSD2 T1150A mutant was able to generate H3K36me3 (Fig. 2, A and B). We confirmed the specificity of the H3K36me3-specific antibody using SPOT peptide array binding experiments (Fig. S4A). Since the H3.1 recombinant mononucleosomes are a more physiologically relevant substrate, they were used in a similar methyltransferase assay. Again, it was observed that the NSD2 T1150A variant, but not NSD2 WT, catalyzed trimethylation of H3.1 nucleosomes (Figs. 2C and S3A). Noteworthy, the same effect was also observed using the NSD1 T2029A mutant on both the H3.1 protein and mononucleosomal substrates (Figs. 2, D-F and S3B). Moreover, and in order to rule out any possible cross-reactivity of the H3K36me3 specific antibody with H3K36me2, the methyltransferase assay was repeated with titrating the amount of the hyperactive NSD1 T2029A mutant until equal dimethylation activity was achieved in comparison with NSD1 WT (Fig. S4B). At such conditions, the H3K36me3 methylation signal was only detected with the T2029A mutant (Fig. S4C). This confirms the specificity of the H3K36me3 antibody and indicates that the detected H3K36me3 signal is indeed due to trimethylated H3K36 introduced by the T2029A mutant and not to H3K36me2 cross reactivity of the antibody. As H3K36me3 is observed under conditions of similar H3K36me2 activity of WT and T2029A, the H3K36me3 generation by T2029A is not a simple consequence of its hyperactivity but due to a real change in product specificity of the mutant.
Mass spectrometry analysis revealed the trimethylation activity of NSD1 T2029A on an H3K36 peptide substrate To confirm the trimethylation activity of the hyperactive cancer mutants, we applied mass spectrometry which is independent of the use of H3K36me3-specific antibodies. Due to the lower sensitivity of the mass spectroscopy coupled methylation assays as compared with radioactive or antibody-The T1150A cancer mutant of NSD2 can introduce H3K36me3 based methods, sufficient catalytic activity was only observed with the purified NSD1 proteins such that NSD2 T1150A could not be studied. NSD1 WT or T2029A mutant was used in equal molar concentration in a methyltransferase assay using the unmodified H3(26-44) peptide as substrate. After the reaction, the methylation states of histone peptides were identified by MALDI-TOF mass spectrometry. Due to the lower sensitivity of our mass spectrometric assay, only monomethylated H3K36 peptide was detected in the reaction with NSD1 WT (Fig. 2G). However, under the same assay conditions, T2029A was able to methylate the H3K36 peptide resulting in all methylation states (mono-, di-and trimethylation) which is in agreement with our Western blot results.
Additionally, the H1.5 K168 peptide was reported previously to be an NSD1 histone target with even stronger methylation than the H3K36 peptide in vitro (39). We, therefore, tested if NSD1 T2029A mutant can catalyze trimethylation on the H1.5 K168 peptide as well. First, the methyltransferase activity of NSD1 T2029A on H1.5 (160-176) peptide was compared to NSD1 WT enzyme in radioactive methyltransferase assay and the signal was detected by autoradiography (Fig. S5A). As observed previously with the H3K36 substrate, T2029A was hyperactive relative to NSD1 WT (Fig. S5A). Interestingly and in agreement with the H3K36 trimethylation activity of T2029A, in methylation assays coupled to mass spectrometry readout, trimethylation of the H1.5 K168 peptide was observed with T2029A but not NSD1 WT (Fig. S5, B-D). In summary, these results document a unique trimethylation activity of the NSD2 T1150A and NSD1 T2029A cancer mutants. This activity could rewrite the H3K36 methylation landscape leading to massive biological responses that may explain the carcinogenic effect of these mutations.

NSD2 T1150A shows enhanced complex formation with H3K36me2 and AdoMet compared to NSD2 WT in MD simulations
We next aimed to characterize the mechanism behind this exceptional trimethylation catalysis of the NSD2 T1150A mutant on H3K36 using MD simulations. For the correct setup of these experiments, information about the reaction mechanism was essential, in particular, whether the enzyme acts in a processive or distributive manner. Many SET-domain PKMTs were shown to exhibit a processive mode of action for the stepwise methylation of their target lysine residues (63)(64)(65)(66). This means that after one methyl transfer reaction, the cofactor product S-adenosyl-L-homocysteine (AdoHcy) dissociates and a new AdoMet molecule is bound without dissociation of the peptide substrate. To confirm this mechanism for the NSD enzymes, we conducted methylation reactions of the H3K36me1 peptide with NSD1 T2029A (the only enzyme that was active enough for this kind of analysis) and detected the product methylation states by MALDI mass spectrometry. Noteworthy, starting with H3K36me0, a clear peak of dimethylated H3K36 product and also some trimethylation was observed as mentioned before (Fig. 2G). In contrast, much lower methylation was detectable when the reaction was started under the same conditions with the H3K36me1 substrate (Fig. S6B). This result is not compatible with a distributive reaction mechanism.
In order to investigate the mechanistic basis of the difference in product specificity between NSD2 WT and T1150A, steered Molecular Dynamics (sMD) simulations were applied. In the sMD approach, external forces are used to guide bimolecular association processes (67). Thereby, reactions that otherwise would be too slow to be modeled in MD simulations are accelerated and at the same time, the conformational sampling is concentrated along a specific, predefined reaction coordinate. In a processive reaction mechanism, higher product methylation states are achieved by the release of AdoHcy after one methylation reaction followed by the binding of a new AdoMet to the enzyme-peptide complex. Consequently, the AdoMet association process to enzyme-peptide complexes already containing H3K36me1 or me2 peptides was modeled to study the potential generation of higher H3K36 methylation states up to H3K36me3. For this, NSD2-peptide complexes with the H3K36me1 and me2 peptide bound in the NSD2 peptide binding cleft and devoid of AdoMet were modeled using the cryo-EM structure of NSD2 E1099K, T1150A bound to a nucleosome and the SET domain of SETD2 complexed with H3K36M as templates (PDB 7CRO and 5V21, see also Experimental procedures and Supplementary data for details and structures). To simulate the AdoMet association process into the active site of the NSD2-peptide complexes, it was placed 27 Å above its binding pocket and a weak attractive force of 0.2 kJ/(mol × Å 2 ) was applied between the Nε-atom of lysine 36 and the methyl group C-atom of AdoMet (Fig. 3A). In order to define criteria describing a successful docking of AdoMet, the geometric requirements for a transition state (TS)-like conformation were applied, which were derived from the known S N 2 geometry of methyl group transfer reactions (68).
One hundred sMD simulations à 35 ns were performed for each NSD2 WT and T1150A and frames were recorded every 20 ps. Then, the number of docking events were monitored which fulfilled the success criteria in at least one frame. The analysis of the sMD simulations revealed that both proteins were able to establish S N 2 TS-like conformations, however, with significant differences. With the H3K36me1 substrate, the analysis revealed ten successful docking simulations for NSD2 WT and 15 for NSD2 T1150A corresponding to a nonsignificant difference (Fig. 3B). In contrast, with the H3K36me2 substrate, NSD2 WT accommodated AdoMet successfully into the binding pocket in only two out of 100 simulations, whereas NSD2 T1150A did this in 14 out of 100 simulations (p-value 3.2 × 10 −4 ) (Fig. 3B). An analysis of the RMSD of successful AdoMet association compared to already bound AdoMet showed that the positions were in good agreement (Fig. S7, A and B). An example of a successful docking event is provided in Supplementary Movie 1. In conclusion, this sMD experiment fully recapitulates the biochemical data described earlier. It identified an enhanced capability of NSD2 T1150A to accommodate AdoMet when H3K36me2 is already bound when compared to NSD2 WT which can explain its ability to generate H3K36me3. NSD2 T1150A has an increased active site volume to accommodate H3K36me2 and AdoMet The sMD experiments showed an enhanced ability of NSD2 T1150A to bind H3K36me2 and AdoMet simultaneously compared to NSD2 WT. To analyze the molecular mechanism of this effect, the volume of the active site pocket was measured during MD simulations and the contacts of the active site amino acids were examined. First, an analysis of the active site volume was carried out by simulating NSD2 complexed with AdoMet and the H3K36 peptides in different methylation states. By having AdoMet already bound, a standardized comparison between NSD2 WT and NSD2 T1150A can be made and undersampling of NSD2 WT frames with bound AdoMet was avoided. For this, 30 MD simulations à 100 ns were conducted for each NSD2 WT and T1150A. Out of this pool, 5000 randomly selected snapshots were used to calculate the active site volume around K36 for each protein complexed with H3K36me0, me1, or me2 (Fig. 3C). The analysis of the calculated volumes shows that large volumes (≥70 Å 3 ) occur more frequently for NSD2 T1150A and lower volumes (0-25 Å 3 ) occur more frequently for NSD2 WT. This effect increases with higher methylation levels of K36. For K36me2, large volumes occur 8.5-fold more often for NSD2 T1150A compared to NSD2 WT (p-value 0.015, calculated by two-tailed t test assuming equal variance based on three replicates of the analysis) (Fig. 3D). The strong elevation in this effect with higher H3K36 methylation levels suggests that the active site tends to collapse with higher methylation levels and this effect is more pronounced with WT than with the mutant. Overall, these findings clearly explain the increased capability of T1150A to accommodate AdoMet and H3K36me2 simultaneously.
Since the T1150A mutation is the only difference between the two proteins, the increased volume of the active site pocket must be a direct consequence of the mutation. MD simulations of modeled complexes of NSD2 with the H3K36me0, me1, and me2 peptides revealed contacts of T1150 to Y1092 and L1120 (Figs. 3C and Fig. S7C). Contacts were considered as established if the distance of a pair of heavy atoms from both amino acids was below 4.5 Å. An Hbond between the hydroxyl group of T1150 and the backbone amide of Y1092 was established in 15% of the simulation time (30 replicates of 100 ns) with H3K36me0 but only 5% and 7% in the case of me1 and me2 ( Fig. 3E and Supplementary data). This contact orients Y1092 and restricts the volume of the active site pocket consequently disfavoring the interaction with me1 and me2 substrates. In the case of A1150, the contact with Y1092 is much less frequent, which supports the further methylation of me1 and me2 substrates. The hydrophobic contact between the T1150 side chain methyl group and the Cδ-atoms of L1120 was observed in 53% of the simulation time (average of me0, me1, and me2) but only in 24% of the time in the case of T1150A (p-value 1.2 × 10 −14 , calculated by two-tailed t test assuming equal variance based on 30 replicates MD analysis) (Supplementary data). This interaction orients L1120 and also restricts the volume of the active site pocket. Hence, the disruption of contacts to Y1092 and L1120 is likely the reason for the enlargement of the active site volume of the T1150A mutant leading to its activity as trimethyltransferase. As mentioned previously, Y1092 and L1120 surround the H3K36 side chain, and the mutation of NSD1 Y1971 (corresponding to NSD2 Y1092) to cysteine abolished the enzyme activity.
The concept that the size of the enzyme active site pocket controls the PKMT product specificity has been developed by comparing the Rubisco large subunit methyltransferase LSMT, a lysine trimethyltransferase, with the monomethyltransferase SET7/9 (69). It had been further refined by showing that a Tyr/ Phe switch in the lysine binding pocket can control the PKMT product specificity, where tyrosine favors the smaller sized product (mono-and dimethylation) while phenylalanine switches the specificity to di-and trimethylation (70). This The T1150A cancer mutant of NSD2 can introduce H3K36me3 concept was experimentally validated for several PKMTs (13,57,(70)(71)(72)(73) and also rationalized by MD and QM simulations (74). Moreover, the G9a residue Y1067, corresponding to NSD2 Y1092, has been shown to control the dimethylation product specificity of G9a, and its mutation to phenylalanine converted G9a to a trimethyltransferase (73). The loss of the positioning of NSD2 Y1092 in the T1150A mutant may have a similar effect, which could explain the switch from dimethyltransferase to trimethyltransferase activity in the T1150A mutant. Moreover, NSD2 L1120 is the only amino acid residue in close proximity to the H3K36 side chain which is not conserved between the dimethyltransferase NSD2 and the trimethyltransferase SETD2. SETD2 contains a less bulky methionine at this place suggesting that L1120 might also have an important role in controlling the product specificity of NSD2. The loss of the contact of T1150 to L1120 in the T1150A mutant might increase the flexibility of L1120 and thereby destroy its control of the product specificity of NSD2. In summary, our findings unraveled key mechanistic features controlling the product specificity of this class of enzymes by the accurate positioning of residues surrounding the substrate H3K36 side chain.

NSD2 T1150A catalyzes trimethylation of H3K36 in human cells
Next, we asked whether the NSD2 T1150A mutant can catalyze H3K36me3 in human cells as well and most importantly if this trimethylation will be due to direct catalysis by NSD2 T1150A and not mediated indirectly by the SETD2 enzyme (the sole human enzyme responsible for H3K36me3 deposition). To address this question, we first created a SETD2 knockout HEK293 cell line using CRISPR-Cas9 genetic engineering. Following literature data (11), the SETD2 KO was validated in the derived clonal cell lines by the absence of genomic H3K36me3 compared to parental cells (Fig. S8A). Analysis of genomic DNA revealed mutations in both alleles of the SETD2 gene in one selected clone (Fig. S8, B and C). This . These cell lines were selected not to contain mutations in NSD1, NSD2, NSD3, and SETD2 and not to show overexpression of NSD enzymes. The significantly up-and downregulated genes are colored red and blue, respectively. Each dot is representing one gene probe corresponding in total to 504 significantly upregulated genes and 163 significantly downregulated genes (significant at adjusted p-value < 0.05). Non-significant probes are colored black. The analysis was conducted using the GEO2R tool and GEO57083 expression array data. On the right side, examples of upregulated oncogenes (red) and downregulated TSGs (blue) mapped to the COSMIC cancer gene census list are provided. C, Analysis of the enrichment of histone modification(s) at the differentially upregulated genes shown in panel B (Log 2 FC ≥ 2, adjusted p-value < 0.05). The analysis was performed using the Encode database with the Enrichr analysis tool. All hits were ordered according to their adjusted p-values as indicated beside each bar. The bar diagram shows H3K27me3 in the B-lymphocyte cell line GM12878 as the most significant hit (marked in red). NSD2, nuclear receptor binding SET domain protein 2.
The T1150A cancer mutant of NSD2 can introduce H3K36me3 step was followed by the expression of either NSD2 T1150A or NSD2 WT catalytic domain in the SETD2 KO cells and the investigation of genomic H3K36me3 levels. To this end, NLS-mVenus-tagged NSD2 WT or NSD2 T1150A cancer mutant were transfected into the SETD2 KO cells together with NLS-mVenus empty vector as a negative control. Transfection efficiency detected by the mVenus reporter fluorescence was analyzed by flow cytometry showing equal transfection yields (Fig. S9). Moreover, the equal expression of mVenus-tagged NSD2 WT or NSD2 T1150A was further confirmed by Western Blot using an anti-GFP antibody (Fig. 4A). Intriguingly, the overexpression of NSD2 T1150A in the SETD2 KO cell line led to a defined rise in genomic H3K36me3 levels, while cells overexpressing WT NSD2 did not show an increase in H3K36me3 when compared with untransfected cells or cells transfected with mVenus empty vector (Fig. 4A). These results were confirmed in three independent transfection series. They directly demonstrate the capability of the NSD2 T1150A SET domain to generate H3K36me3 in human cells independent of SETD2. This result strongly suggests that the corresponding full-length NSD2 T1150A mutant has the same ability as so far regions outside of the catalytic part of SET-domain PKMTs have not been shown to affect product methylation levels.
Expression of the NSD2 T1150A variant as abnormal H3K36 trimethyltransferase will lead to increased H3K36me3 levels and the introduction of H3K36me3 at aberrant genomic locations. Changes in the H3K36 methylation state are known to be associated with diverse biological outcomes because H3K36me2 and H3K36me3 exhibit distinct downstream effects on gene transcription and chromatin structure (13).

Bioinformatic analysis of gene expression changes in cancer cells containing T1150A
Next, we wanted to get insights into the possible biological effects that could be mediated by NSD2 T1150A in leukemic cancer cells. Bioinformatic analysis of publicly available gene expression datasets was performed to retrieve the change in transcriptome between leukemic cancer cell lines harboring T1150A or WT NSD2. For this, a diffuse large B cell lymphoma (DLBCL) cell line (OCILY-18) was identified in the CCLE database to carry the NSD2 T1150A mutation. OCILY-18 gene expression data were compared with four other DLBCL cell lines containing NSD2 WT revealing a differential gene expression signature with 504 upregulated genes and 163 downregulated genes ( Fig. 4B and Table S3). While these are different cell lines, they are related because all were derived from DLBCL. Control cell lines were selected not to contain mutations in NSD enzymes and SETD2 and not to show overexpression of NSD enzymes in comparison with the other control cell lines. Notably, systematic differential gene expression analyses performed between the four selected NSD2 WT (DLBCL) cancer cell lines (each one against the other three) provided no significant hits (Fig. S10). To validate that the expression differences of the OCILY-18 cell line from the controls are indeed due to the T1150A mutation, future studies will need to include more T1150A mutant cell lines or investigate WT and mutant cells with an isogenic background.
Assuming that the expression changes of the OCILY-18 cell line are related to the T1150A mutant, we further investigated them to reveal if they contain additional relevant information. The higher number of upregulated genes in the T1150A mutant cell line came in agreement with the known correlation of H3K36me3 and active gene expression. Mapping the differential upregulated and downregulated genes with the COSMIC cancer gene census list (a list of wellestablished oncogenes and tumor suppressor genes) revealed 15 differentially upregulated oncogenes and eight downregulated tumor suppressor genes (Fig. 4B). For example, HoxA9 was one of the differentially upregulated genes in the tumor cell line containing T1150A, which is a known oncogene in hematological cancers and its expression is under the control of NSD enzymes (75). Due to the lack of H3K36me2/3 ChIP-seq data for these cells, a direct correlation between alterations in these marks and gene expression changes could not be investigated.
However, there is a strong antagonism between H3K36me3 and H3K27me3, an important gene silencing modification, which is more prevalent than the antagonism of H3K36me2 and H3K27me3 as illustrated by global mass spectrometric analyses (24)(25)(26). We wondered whether competition of H3K27me3 by the NSD2 T1150A generated elevated H3K36me3 could explain the gene upregulation. To address The T1150A cancer mutant of NSD2 can introduce H3K36me3 this question, we analyzed publicly available ENCODE histone modifications ChIP-seq datasets for enriched histone modifications at the upregulated genes. Impressively, the H3K27me3 ChIP profile in the B-lymphocyte cell line GM12878 was retrieved as the top significant hit (adjusted p-value 1.32 × 10 −10 ) (Fig. 4C). Moreover, TF-ChIP seq analysis of upregulated genes revealed an enrichment of EZH2 in B-cell as the only significant hit (adjusted p-value 0.0028) (Fig. S11A). This finding suggests that these genes are silenced by H3K27me3 in normal B lymphocytes and they become upregulated by the H3K36me3 generated by NSD2 T1150A. H3K27me3 is critical to halt the self-renewal of lymphoid progenitors (76,77). Disruption of this function can result in lymphomas similar to the EZH2 inactivating mutants observed in many hematological cancers (76,78,79) and the results of the H3.3K27M oncohistone frequently observed in pediatric gliomas (80,81). Moreover, a decrease in H3K27me3 was correlated with poor outcomes in leukemic patients (82). Imbalanced high H3K36me3 level can phenocopy the EZH2 disruption by inhibiting its recruitment and activity finally leading to decreased H3K27me3 levels (83).

Conclusion
In cancer cells, numerous somatic mutations were observed in critical epigenome regulators including the H3K36 dimethyltransferases NSD1 and NSD2. To shed light on possible mechanisms by which these mutations may lead to carcinogenesis, we investigated the enzymatic properties of several somatic missense mutants of NSD2 and NSD1. We show that mutations of Y1971 and R2017 in NSD1 abrogate catalytic activity indicating that NSD1 can act as a tumor suppressor gene in some tumor types which is in line with its frequent silencing or deletion in cancers. In contrast, the frequent T1150A mutation of NSD2 and the corresponding T2029A mutant of NSD1 showed hyperactivity and an abnormal H3K36 trimethylation activity, which contrasts the WT NSD1 and NSD2 enzymes that both only function as dimethyltransferases. This novel activity was documented biochemically in vitro and human cells in a SETD2 knockout background. Using MD simulations, we investigated the mechanistic underpinnings of this atypical activity and uncovered key rules governing the product specificity of this class of enzymes by the accurate positioning of residues surrounding the H3K36 side chain. These analyses detected an enlarged active site pocket of NSD2 T1150A compared to the WT enzyme in particular in complex with the H3K36me2 substrate, which provides enough space for binding of the AdoMet cofactor allowing the conversion of bound H3K36me2 to the trimethylated product. Additional MD simulations revealed two important contacts of T1150 with Y1092 and L1120 which mediate the change in the active pocket volume between NSD2 WT and the T1150A mutant (Fig. 5A). Based on our biochemical, cellular, and computational biology date combined with analyses of published genome and expression data, we propose a model by which this mutant could aberrantly introduce H3K36me3 which competes with endogenous H3K27me3 marks finally leading to changes in gene expression and carcinogenesis (Fig. 5B). If this model could be validated in follow-up studies in cancer cells, an NSD2 inhibitor would be a therapeutic option for T1150A containing patients.

Experimental procedures
Site-directed mutagenesis, enzymes overexpression, and purification The GST-tagged expression constructs of the human NSD2 catalytic domain including the AWS, SET, and Post-SET domains (amino acids 992-1240 of UniProt No: O96028) were taken from Schuhmacher et al (84). This part is an independent folding unit as illustrated by structural studies [pdb 7E8D (36)]. A corresponding GST-tagged mouse NSD1 catalytic domain expression construct (amino acids 1701-1987, Uni-Prot No: O88491) containing an additional C1920S mutation was taken from (85). The SET domains of human and mouse NSD1 share >95% identity and none of the residues that are affected in this study or known to be catalytically relevant is different. The different NSD2 (T1150A) and NSD1 mutants (Y1971C, R2017Q/L and T2029A) were created by sitedirected mutagenesis using the megaprimer method (86). Human NSD1 residues Y1971, R2017, and T2029 correspond to mouse NSD1 residues Y1869, R1915 and T1927. The sequence of all plasmids was validated by Sanger sequencing. For the protein overexpression, the different plasmid constructs (WT and mutants) were transformed into E. coli BL21-CodonPlus (DE3) cells (Novagen). The bacterial cells were grown at 37 C until they reached an OD 600nm between 0.6 and 0.8. Afterward, 1 mM isopropyl-ß-d-thiogalactopyranoside was added to induce protein expression at 20 C overnight. The next day, the cells were harvested by centrifugation at 3800 rcf for 20 min, followed by washing once with STE buffer (10 mM Tris/HCl pH 8.0, 1 mM EDTA and 100 mM NaCl) and collection of the cell pellets by centrifugation at 4900 rcf for 25 min.
Histone octamers were prepared as described previously (87). Briefly, the pET21a expression constructs of H3.1, H4, H2A, and H2B were overexpressed in BL21-Codon Plus E. coli cells which were allowed to grow at 37 C until OD 600nm of 0.6 to 0.8 was reached. Induction of protein expression was done at 20 C shaker for 3 h after the addition of. Cells were harvested by centrifugation at 5000 rcf for 15 min, washed using STE buffer (10 mM Tris/HCl pH 8, 100 mM NaCl, 1 mM EDTA), centrifuged again at 5000 rcf for 15 min, and the cell pellets stored at −20 C.
For histone protein purification, the bacterial cell pellets were resuspended in SAU buffer (10 mM sodium acetate pH 7.5, 1 mM EDTA, 10 mM lysine, 5 mM β-mercaptoethanol, 6 M urea, 200 mM NaCl) followed by sonication (Epishear, Active Motif). The lysate was centrifuged at 40,000 rcf for 1 h and the supernatant was filtered through a 0.45 μM syringe filter (Chromafil GF/PET 45, MachereyNagel) and passed through a HiTrap SP HP (5 ml, GE Healthcare) column connected to an NGC FPLC system (BioRad), which was previously equilibrated with SAU buffer. After washing the column with SAU buffer, proteins were eluted with NaCl gradient from 200 mM to 800 mM. Fractions were collected, analyzed by SDS-PAGE, pooled according to purity and yield, dialyzed against pure water with two changes overnight, and dried in a vacuum centrifuge for storage at 4 C.
The lyophilized histone proteins were dissolved in unfolding buffer (20 mM Tris/HCl pH 7.5, 7 M guanidinium chloride, 5 mM DTT) and their concentrations were determined spectrophotometrically at OD 280nm . The proteins were mixed in a ratio of 1 (H3, H4) to 1.2 (H2A, H2B). The samples were dialyzed against refolding buffer (10 mM Tris/HCl pH 7.5, 1 mM EDTA, 2 M NaCl, 5 mM β-mercaptoethanol) overnight with one buffer change. To purify the octamers afterward, the samples were separated by size exclusion chromatography using a Superdex 200 16/60 PG column equilibrated with refolding buffer. Fractions were collected, pooled according to purity, and afterward concentrated using Amicon Ultra-4 centrifuge filters (30 kDa cutoff, Merck Millipore). The purified octamers were validated by SDS-PAGE, aliquoted, flashfrozen in liquid N 2 , and stored at −80 C.

Nucleosome reconstitution
Nucleosomes were prepared using the histone octamers and DNA fragments as described previously (87). Briefly, the Widom-601 sequence (88) was cloned into a TOPO-TA vector together with a linker sequence providing 64 bp linker DNA on the 5 0 side of the core nucleosome and 29 bp on the 3 0 side, amplified by PCR and purified. DNA and histone octamers were mixed in different ratios between equimolar and 2-fold octamer excess. The samples were then dialyzed in a Slide-A-Lyzer microdialysis devices (ThermoFisher) against high salt buffer (10 mM Tris/HCl pH 7.5, 2 M NaCl, 1 mM EDTA, 1 mM DTT), which was continuously replaced by low salt buffer (same as high salt, but with 250 mM NaCl) over 24 h. Afterward, the samples were dialyzed overnight against storage buffer (10 mM Tris/HCl pH 7.5, 1 mM EDTA, 1 mM DTT, 20% glycerol), aliquoted, flash-frozen in liquid nitrogen, and stored at −80 C. The DNA assembly with histone octamer in the nucleosome was validated by the electro-mobility gel shift assay.

Circular dichroism spectroscopy
The Circular Dichroism (CD) spectroscopy measurement was performed to investigate the secondary structure composition of the NSD proteins. Protein samples were mixed with 200 mM KCl buffer and CD spectra were recorded from 240 nm to 190 nm at 20 C for 60 cycles using a 0.1 mm cuvette in a J-815-150S CD Spectrometer (Jasco). As background signal, dialysis buffer II (20 mM Tris/HCl pH 7.2, 100 mM KCl, 0.5 mM DTT, 60% glycerol) was measured under the same conditions.
Peptide methylation assay using radioactively labeled AdoMet NSD1 and NSD2 WT and different cancer mutants (3.4 μM) were mixed with the unmodified H3 (aa 26-44) peptide (4.4 μM) or H1.5 (aa 160-176) peptide (9.8 μM) (Intavis AG, Köln, Germany) in methylation buffer (50 mM Tris/HCl HCl, pH 9, 5 mM MgCl 2 and 1 mM DTT) supplemented with 0.76 μM radioactively labeled AdoMet (PerkinElmer) for 4 h at 37 C or overnight at 25 C. The reactions were stopped by the addition of SDS-PAGE loading buffer and heating for 5 min at 95 C. Afterward, the samples were separated by Tricine-SDS-PAGE followed by the incubation of the gel in amplify NAMP100V (GE Healthcare) for 1 h on a shaker and drying of the gel for 2 h at 70 C under vacuum. The signals of the transferred radioactively labeled methyl groups were detected by autoradiography using a Hyperfilm high performance autoradiography film (GE Healthcare) at −80 C in the dark. The film was developed with an Optimax Typ TR machine after different exposure times. Quantification of scanned images was conducted with ImageJ.
Protein and nucleosome methylation assay using radioactively labeled AdoMet NSD1 and NSD2 WT and different cancer mutants (3.4 μM) were mixed with recombinant H3.1 protein (1 μg) (purchased from NEB) or recombinant H3.1 mononucleosomes in methylation buffer (50 mM Tris/HCl, pH 9, 5 mM MgCl 2 and 1 mM DTT) supplemented with 0.76 μM radioactively labeled AdoMet (PerkinElmer) for 4 h at 37 C or overnight at 25 C. The reactions were stopped by the addition of SDS-PAGE loading buffer and heating for 5 min at 95 C. Afterward, the samples were resolved by 16% SDS-PAGE and processed as described in the last chapter.

Analysis of peptide methylation by MALDI mass spectrometry
The methylation reactions were performed using the unmodified H3K36 (aa 26-44), H3K36me1 (aa 26-44), or H1.5 (aa The T1150A cancer mutant of NSD2 can introduce H3K36me3 160-176) peptide (4.5 μM) in methylation buffer (50 mM Tris/ HCl, pH 9, 5 mM MgCl 2 and 1 mM DTT) supplemented with 1 mM unlabeled AdoMet (Sigma-Aldrich) and 6.7 μM NSD1 for 4 h at 37 C. The reactions were halted by the addition of 0.1% trifluoroacetic acid (TFA). All the samples were cleaned using C18 tips (Agilent Technologies). The eluted samples were spotted onto an anchor chip plate (Bruker-Daltonics) followed by drying. Next, 1 μl of HCCA matrix (0.7 mg/ml αcyano-4 hydroxycinnamic acid dissolved in 85% acetonitrile, 0.1% TFA, 1 mM ammonium dihydrogen phosphate) was added to the dried sample spots and allowed to dry again. Afterward, the dried spots on the anchor plate were analyzed using an Autoflex Speed MALDI-TOF mass spectrometer (Bruker-Daltonics). The mass spectra were collected using the Flex control software (Bruker-Daltonics). For calibration, the peptide calibration standard (Bruker-Daltonics) with peptides ranging from 700 to 3200 Da was used. The collected spectra were analyzed with Flex analysis software (Bruker-Daltonics).

Detection of H3.1 methylation by Western blot
NSD1 and NSD2 WT and their corresponding cancer mutants (3.4 μM) were mixed with either recombinant H3.1 protein or recombinant H3.1 mononucleosomes in methylation buffer (50 mM Tris/HCl, pH 9, 5 mM MgCl 2 and 1 mM DTT) supplemented with 1 mM unlabeled AdoMet (Sigma-Aldrich) overnight (14 h) at 25 C. The reactions were stopped on the next day by the addition of SDS-PAGE loading buffer and heating for 5 min to 95 C. Afterward, the samples were resolved by 16% SDS-PAGE. Analysis was done by western blotting using the primary antibodies directed against H3K36me3 (ab9050, 1:2000) or H3K36me2 (ab9049, 1:2500) and as secondary antibody the anti-rabbit HRP conjugate (Na934v, GE Healthcare, 1:5000). The signal was detected by chemiluminescence after the addition of Pierce ECL Western Blotting substrate.
The antibodies used in the Western Blot analysis were validated by peptide array binding. Peptide arrays were synthesized with an Autosport peptide array synthesizer (Intavis AG) using the SPOT method (85,89). Four spots corresponding to four different K36 methylation states of the H3K36 peptide (aa 29-43) (unmodified-H3K36me1-H3K36me2-H3K36me3) and one additional spot with H3K36A as negative control were synthesized on the array. After blocking with 5% milk in TBST buffer, the array was incubated with the primary antibody solution for 1 h at room temperature followed by washing three times. Then, the array was kept in secondary antibody solution (anti-rabbit HRP) for 1 h. After washing again, the signal was detected by chemiluminescence after the addition of Pierce ECL Western Blotting substrate.

Preparation of SETD2 knockout HEK293 cells
As a first step, a CRISPR-Cas9 knockout of SETD2 was conducted in HEK293 cells, and single cells were isolated by cell sorting. Three gRNAs were used to target the Setd2 gene aiming to increase the probability of successful knockout as described (90) (Table S1). Single-stranded oligonucleotides encoding these sequences as forward strands were annealed to their complementary oligonucleotides to result in doublestranded DNA with 5 0 single-stranded overhangs complementary to the BbsI-restricted Cas9 vector. After that, the double-stranded oligonucleotides were ligated with pU6-(BbsI) _CBh-Cas9-T2A-mCherry plasmid backbone (provided by Ralf Kuehn, Addgene catalog number 64324) (91) in presence of BbsI HF restriction enzyme (NEB) and T4 DNA ligase (NEB) using golden gate assembly. The ligated products were transformed into XL1 blue E. coli cells by electroporation followed by isolating the plasmids from single bacterial colonies using NucleoBond Xtra Midi kit (Macherey-Nagel). The Cas9-sgRNA plasmids were validated by BbsI restriction analysis and confirmed by Sanger sequencing. Next, a mixture of the three Cas9-sgRNA plasmids (150 ng/μl) was transfected into HEK293 cells at 70% confluency in a fresh medium using FuGENE HD Transfection Reagent (Promega). Two days after transfection, mCherry-positive HEK293 cells were sorted using Sony cell sorter SH800S into 96-well plates containing Dulbecco's Modified Eagle's Medium (Sigma) supplemented with 20% fetal bovine serum, penicillin/streptomycin, and L-glutamine (Sigma). The single-cell clones were allowed to grow and expand and then selected for SETD2 knockout using H3K36me3 Western blot and Sanger sequencing. Since SETD2 is the sole human enzyme responsible for depositing H3K36me3 in HEK293 cells, the genomic H3K36me3 levels from the SETD2 knockout HEK293 cells were tested with H3K36me3-specific antibody (ab9050) and compared to the parental cells. For Sanger sequencing, genomic DNA was extracted from the parental and knockout HEK293 cells using QIAamp DNA mini kit (QIAGEN) and PCR-amplified with primers located 500 to 600 bp from the sgRNA target site. PCR products were Sanger sequenced, and sequences were aligned using the SnapGene multiple sequence alignment tool.

Transfection of HEK293 SETD2 knockout cells and flow cytometry analysis
The coding sequence of NSD2 (amino acids 992-1240, UniProt No: O96028) WT and T1150A catalytic domains tagged with nuclear localization sequence (NLS) of the SV40 Large T-antigen were cloned into the mVenus-C1 plasmid backbone (provided by Steven Vogel, Addgene plasmids no. 27794) (92) by Gibson-Assembly (NEB). NLS-NSD2-mVenus WT and T1150A plasmids were transfected into 70 % confluent HEK293 SETD2 knockout cells using FuGENE HD Transfection Reagent (Promega). NLS-mVenus empty vector was used as a negative control. After 3 days of transfection, cells were harvested and the transfection efficiency, as well as expression of the NSD2 constructs in HEK293 cells, were evaluated based on the mVenus reporter by flow cytometry (MACSQuant VYB, Miltenyi Biotec) and Western blot using an anti-GFP antibody (ThermoFisher, PA1-980A, 1:1000). Data analysis was performed using the FlowJo software (Treestar).

Cell lysis and immunoblotting
Both parental and SETD2 KO HEK293 cells were lysed for immunoblotting. For nuclear lysate enrichment, cell pellets were resuspended first in lysis buffer (10 mM Tris/HCl pH 8, 10 mM NaCl and 0.2% NP-40) supplemented with protease inhibitor cocktail (cOmplete ULTRA tablets, Mini, EDTA -free, EASYpack, Roche) and kept shaking on arotor for 30 min at 4 C. Next, the samples were centrifuged at 4 C and the supernatant was removed. A second, high salt lysis buffer (50 mM Tris/HCl pH 7.5, 1.5 mM MgCl 2 , 20% glycerol, 420 mM NaCl, 25 mM NaF, and 1 mM Na 3 Vo 4 ) supplemented with protease inhibitor cocktail was used to resuspend the nuclear pellets thoroughly with vortexing and sonification. Afterward, the samples were spun down and the supernatant was aliquoted and flash-frozen. Bradford assay was used to quantify the total protein amount of the lysate from the different samples, and accordingly, equal amounts of lysate were mixed with SDS-loading buffer and resolved by SDS-PAGE. Analysis was done by western blotting using the primary antibodies against H3K36me3 (ab9050, 1:2000), H3K36me2 (ab9049, 1:2500), H3 (ab1791, 1:10,000) or GFP (ThermoFisher, PA1-980A, 1:1000) and as secondary antibody anti-rabbit HRP (Na934v, GE Healthcare, 1:5000). The signal was detected by chemiluminescence after the addition of Pierce ECL Western Blotting-substrate.

Steered molecular dynamics simulation of AdoMet association
Steered molecular dynamics simulations were performed in OpenMM 7.5.1 (93,94) utilizing the NVIDIA CUDA GPU platform (https://docs.nvidia.com/cuda/). The systems were parameterized using the General Amber force field (GAFF) and AMBER 14 all-atom force field (95-97) if not specified otherwise. The non-bonded interactions were treated with a cut-off at 10 Å. Additionally, the Particle Mesh Ewald method (98) was used to compute long-range Coulomb interactions with a 10 Å nonbonded cut-off for the direct space interactions. Energy minimization of the system was performed until a 10 kJ/mol tolerance energy was reached. Simulations were run using a 2 fs integration time step. The Langevin integrator (99) was used to maintain the system temperature at 300 K with a friction coefficient of 1 ps −1 . The initial velocities were assigned randomly to each atom using a Maxwell-Boltzmann distribution at 300 K. A cubic water box with a 10 Å padding to the nearest solute atom was filled with water molecules using the tip4p-Ew model (100). Production runs were performed under periodic boundary conditions and trajectories were written every 10,000 steps (20 ps).
The structures of human NSD2 WT and NSD2 T1150A (amino acids 992-1221) were modeled based on the cryo-EM structure of NSD2 E1099K, T1150A in a nucleosome complex (PDB 7CRO) (35). Missing amino acids and the reverting mutations of K1099E and A1150T (as appropriate) were modeled using PyMOD 3.0 (101). The missing part of the post-SET loop (1207-1221) in PDB 7CRO was modeled based on the SET domain of SETD2 (PDB: 5V21) (102) using PyMOD 3.0, since no structure of NSD2 complexed with the H3K36 peptide and post-SET loop has been resolved. Subsequently, the histone tail of PDB 7CRO was replaced by the H3K36 peptide from PDB 5V21, and methionine 36 mutated to lysine. The target lysine 36 was then manually deprotonated as required for the S N 2 mechanism (61, 74). Methyl groups were introduced at the lysine side chain nitrogen using PyMOL (103). Parametrization of methylated lysine in the different methylation states was accomplished using AMBER 14 GAFF and ff14SB (104). AdoMet was modeled based on the coordinates of AdoHcy and parametrized using ANTE-CHAMBER from AmberTools (18.0) (105) and placed 27 Å away from the AdoMet-binding pocket as starting point for the association studies.
To equilibrate the solvent, a 5 ns pressure coupled equilibration with Monte Carlo barostat (110) was performed at a pressure of 1 atm. The C-alpha (Cα) atoms of NSD2, the peptide, and the AdoMet atoms were restrained with a force of 100 and 5 kJ/mol × Å 2 , respectively. The restraints were taken off successively, starting with the NSD2 Cα restraints, followed by a 5 ns equilibration with the peptide and AdoMet still being restrained. Subsequently, the AdoMet and peptide restraints were removed as well, followed by 0.1 ns equilibration with no restraints. A distant-dependent force of 0.2 × distance of centroid 1 (lysine 36 side chain nitrogen and its attached two hydrogen atoms) and centroid 2 (AdoMet methyl group and its attached three hydrogen atoms) (kJ/ mol)/Å 2 ) was used to pull the center of mass (COM) of centroid 1, towards the COM of centroid 2. The lysine hydrogen atoms were replaced with carbon atoms as appropriate for the Kme1, Kme2 and Kme3 simulations. Two additional weaker forces were used to guide AdoMet into a proper binding position in the AdoMet binding pocket (force2: 0.1 × distance of centroid 3 (AdoMet atoms N1, C2, N3) and centroid 4 (NSD2 L1202 atoms N, Cα, C) (kJ/mol)/ Å 2 ); force3: 0.05 × distance of centroid 5 (AdoMet atoms N0, Cα, Cβ) and centroid 6 (NSD2 F1149 atoms Cα, C, O) (kJ/ mol)/Å 2 )). For production, sMD simulations were conducted for 100 replicates à 35 ns (total simulation time 3.5 μs).
In order to define criteria describing a successful docking of AdoMet, the following geometric requirements for a TS-like The T1150A cancer mutant of NSD2 can introduce H3K36me3 conformation were derived from the known S N 2 geometry of methyl group transfer reaction (68) (Fig. S12).
(1) The distance between the lysine Nε and AdoMet methyl group C-atom is <4 Å. (2) The angle between the lysine Nε -lysine Cδ bond and the virtual bond between lysine Nε and the AdoMet methyl group C-atom is in a range of 109 ± 30 . (3) The angle between the lysine Nε-AdoMet methyl group Catom and AdoMet methyl group C-atom -AdoMet S-atom bonds is in a range of 180 ± 30 .
Data analysis was performed utilizing MDTraj (1.9.4) (111) to calculate the distances and angles necessary for the geometric criteria of an S N 2 TS-like conformation and the RMSD. All structures were visualized using PyMOL (2.4.1).

Volume estimation of the NSD2 active pocket
General simulation parameters and starting structures of NSD2 WT and T1150A complexed with the H3K36 peptide were modeled as described earlier for the sMD experiments. AdoMet was positioned in the AdoMet binding pocket based on the coordinates of AdoHcy in PDB 7CRO (35). A 5 ns pressure coupled equilibration with Monte Carlo barostat (110) was performed at a pressure of 1 atm. NSD2 and peptide Cα atoms as well as cofactor AdoMet atoms were restrained with a force of 100 and 5 kJ/mol × Å 2 , respectively. The restraints were taken off successively, starting with the Cα restraints, followed by a 5 ns equilibration with only AdoMet restrained. Subsequently, the AdoMet restraints were removed as well followed by 5 ns equilibration with no restraints. For production, 30 replicates à 100 ns were performed (total simulation time 3 μs).
Analysis of the volumes around lysine 36 was performed using POVME3 (112) with a grid spacing of 0.4 Å, a distance cut of 0.4 Å, a contiguous points criterion of three, and a convex hull exclusion. The coordinates of the inclusion spheres are: 35.30, 40.30, 34.55 and 30.00, 39.77, 34.69 each with a radius of 3.0 Å (Fig. S13). Out of the total simulation time of 3 μs, 10 % of the simulation frames were randomly chosen and the volume calculated. This process was done in triplicates for each methylation state (me0, me1, and me2) for NSD2 WT and NSD2 T1150A.

Differential gene expression analysis of NSD2 WT and T1150A
The Cancer cell line encyclopedia (https://sites. broadinstitute.org/ccle) and COSMIC (cancer.sanger.ac.uk) databases were screened for hematological cancer cell lines harboring the NSD2 T1150A mutant, which revealed a diffuse large B cell lymphoma cell line (OCILY-18) containing the mutant of interest. Four more NSD2 WT control cell lines were selected (OCILY-1, OCILY-7, OCILY-10, and DOHH2) for comparison with OCILY-18 (60,(113)(114)(115). The control cell lines were selected to have the same cancer disease subtype (diffuse large B cell lymphoma) as NSD2 T1150A cancer cell line OCILY-18 and at the same time not to carry other mutations in NSD1, NSD2, NSD3, and SETD2. Additionally, NSD2 WT-containing cell lines with overexpression of NSD enzymes in comparison to other control cell lines were not selected. Moreover, gene expression data and mutational profiles of the selected cell lines needed to be available. The GSE57083 dataset, which contains the RNA expression microarray data of all mutant and control cell lines in the same platform (GPL570), was used to retrieve the differentially expressed genes (Log 2 fold change (FC) ≥2 or ≤−2, adjusted p-value < 0.05). The analysis was done using the GEO2R tool (https://www.ncbi.nlm.nih.gov/geo/geo2r/) (116) with implemented Benjamini & Hochberg correction to obtain adjusted p-value and correction for multiple testing. The probes which are not specifically assigned to a single gene were removed.
The differentially upregulated and downregulated genes were analyzed for significantly enriched gene ontology terms and biological processes using the (Enrichr) analysis tool (https://maayanlab.cloud/Enrichr/) at false discovery rate <0.05 using Benjamini & Hochberg correction (117)(118)(119). In order to investigate the most enriched histone modification(s) and transcription factors at the differentially expressed genes, the ChIP-seq data in the ENCODE-histone modifications and ENCODE-TFs databases were analyzed using the Enrichr analysis tool (117)(118)(119). All hits were ranked according to their adjusted p-value (significance when adjusted p-value < 0.05 using Benjamini & Hochberg correction).
Statistics T-Tests were conducted with Excel using the specified settings. p-values based on binomial distributions were calculated with Excel using the Binom.dist function.

Data availability
All biochemical data generated or analyzed during this study are included in the published article and its supplementary files. Table S3