First thermostable CLIP-tag by rational design applied to an archaeal O6-alkyl-guanine-DNA-alkyl-transferase

Graphical abstract. Lessons from molecular evolution: the knowledge acquired on human MGMT is the starting point for applying rational design to other AGTs, in order to obtain alternative (in our case thermostable) homologues and extend the “SNAP-tag technology” to organisms and/or reaction conditions incompatible with the activity of the commercial protein-tags.


Introduction
Self-labelling protein-tags (SLPs) catalyse the covalent, highly specific and irreversible binding of the fluorogenic moiety of synthetic ligands designed to mimic the physiological protein substrate. Their relatively small size makes them powerful tools for a huge number of applications [1]. In 2003, the group of Prof. Kai Johnsson introduced the first engineered version of the human O6-alkyl-guanine-DNA-alkyl-transferase (hMGMT), applying a molecular evolution approach in order to: i) abolish the DNA-binding activity of the protein; ii) increase its catalytic activity toward the pseudo-substrate O6-benzylguanine (BG) and its derivatives [2–6] (Fig. 1). Because this protein covalently retains part of the BG substrate, it has enabled a huge number of biotechnological applications: in principle, any desired chemical group conjugated to the benzyl moiety can be covalently linked to the protein [2–6]. Accordingly, an impressive number of papers has appeared in the literature, leading to the commercialization of this engineered hMGMT as the "SNAP-tag" (New England Biolabs), whose applications span various disciplines, from medicine to biotechnology. Genetically fused to a protein of interest (POI), it has been proposed as a valid alternative to fluorescent proteins (FPs) and affinity tags (such as GST, MBP, His-tag, etc.), mainly in fluorescence microscopy [7] but also in protein purification/immobilization methodologies [8]. Notably, despite the disadvantage of requiring an external substrate, the SNAP-tag has proven particularly useful in some contexts, such as anaerobic conditions, where FPs fail to produce a fluorescent signal [9]. Furthermore, the high specificity of the enzyme for its substrate and the extreme versatility in producing BG-conjugates make the SNAP-tag an ideal tool for the development of new-generation biosensors [10,11].
Afterwards, the need for an SLP with substrate specificity orthogonal to that of the SNAP-tag, for in vivo multiprotein labelling, led Johnsson and co-workers to develop, again by molecular evolution, a further hMGMT variant (the CLIP-tag) that recognises O2-benzylcytosine (BC) derivatives [12] (Fig. 1). This is a notable feature, because the great majority of model organisms show no endogenous activity towards BC-derivatives [12]. Having two orthogonal SLPs, each genetically fused to its own POI, is particularly suitable for in vivo and in vitro protein-protein interaction studies, using methodologies such as Selective crosslinking of interacting proteins (S-CROSS) [13] or FRET fluorophore pairs conjugated to the BG- and BC-substrates.
Despite the efforts made to increase the general stability of the hMGMT variants [6], the SNAP-tag and the CLIP-tag can only be used in vitro under moderate reaction conditions and in vivo in "mesophilic" model organisms with an optimal temperature below 40°C. Although these conditions cover most applications, these SLPs are not suitable for studies on (hyper)thermophilic organisms and, more generally, on organisms that thrive in non-permissive environmental conditions (high pressure, extremes of pH, high ionic strength, etc.). To fill this gap, from 2012 onwards, Perugino and collaborators identified an O6-alkyl-guanine-DNA-alkyl-transferase activity in the hyperthermophilic archaeon Saccharolobus solfataricus; as expected, this protein is involved in the direct repair of DNA damaged by alkylating agents in vivo [14]. The lesson learned from the SNAP-tag, in which the amino acid residues involved in the recognition of BG-derivatives had been identified, led to the production by site-directed mutagenesis of an S. solfataricus OGT variant (named SsOGT-H5; Fig. 2 and Fig. S1), which has been fully characterized: it is unable to bind DNA but retains high catalytic activity on the fluorescein BG-derivative (SNAP-Vista® Green, hereinafter BG-FL; Fig. S2) [14]. Furthermore, SsOGT-H5 showed strong thermostability, comparable to that of the wild type, and general resistance to other chemical and physical denaturants [14]. Unlike hMGMT and its variants, the S. solfataricus OGT homologue is not sensitive to chelating agents [14], suggesting the absence of a structural zinc ion; this was later confirmed by solving the 3D structure of this archaeal protein [15]. These premises were the starting point for exploiting this mutant as an innovative SLP in "thermophilic" contexts.
Indeed, SsOGT-H5 has been used as a protein tag genetically fused to a thermostable enzyme [16], and it has been heterologously expressed in thermophilic organisms [16,17]. In particular, we purified the SsOGT-H5-lacS fusion protein at high yield and purity both by immobilized metal affinity chromatography (IMAC) and by incubating the E. coli cell-free extract at high temperature (70°C), which selectively precipitates the E. coli proteome and leaves the heterologously expressed thermostable protein in the soluble fraction [16]. Moreover, fusion with this thermostable SLP (hereinafter TS SNAP) allowed a fourfold increase in the thermostability of the alpha-carbonic anhydrase from Sulfurihydrogenibium yellowstonense [18,19]. TS SNAP was successfully expressed in the thermophilic bacterium Thermus thermophilus HB27EC and in the hyperthermophilic archaeon Sulfolobus islandicus [16,17]. In both cases, the cells were permeable to BG-FL, which made it possible to measure TS SNAP activity and demonstrated that the protein folds correctly at high temperatures in these organisms [16,17].
Using a computational and structural approach, we rationally tuned the substrate specificity of TS SNAP from BG- to BC-derivatives, in order to produce and fully characterize the first thermostable CLIP-tag to date (TS CLIP; Fig. 2 and Fig. S1). The structure-based analysis allowed us to identify the specific amino acids involved in substrate specificity in this class of enzymes and to propose a general approach for turning any O6-alkyl-guanine-DNA-alkyl-transferase into a SNAP- and a CLIP-tag variant.
Fig. 1. Scheme of the irreversible reaction of SNAP- and CLIP-tag on their respective BG- and BC-derivative substrates. Upon reaction, the benzyl moiety is covalently linked to the catalytic cysteine (white triangle). If desired chemical groups (indicated as 1 and 2) are previously conjugated to BG- and BC-, specific multiprotein labelling can be achieved with these commercial SLPs.

Results
ants, in complex with DNA or ligands, etc.), highlighting the molecular mechanisms of these proteins in direct DNA repair and substrate specificity. Spanning mesophilic to thermophilic environments, AGTs display different primary structures, yet all share a typical architecture consisting mainly of two globular domains [8,20]: a poorly conserved N-ter domain, whose function is still not well understood (it is likely involved in regulation, cooperative binding, and stability [15,21,22]), and a highly conserved C-ter domain housing all the functional elements for protein activity, i.e., the helix-turn-helix (HTH) motif responsible for DNA binding, the -V/IPCHRVV/I- consensus sequence including the catalytic cysteine, and the active site loop. The latter is mainly involved in substrate specificity and shows conformational plasticity, being annotated as a structural element that rearranges spatially during the catalytic cycle (Fig. 2) [8,21–25].
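The consensus sequence around the catalytic cysteine can be turned into a simple pattern search, for example to locate the putative active site of a candidate AGT before planning mutations. The sketch below is a hypothetical helper, not part of the authors' pipeline, and assumes that the consensus -V/IPCHRVV/I- is read as [V or I]-P-C-H-R-V-[V or I]; real AGT sequences may deviate from it.

```python
import re

# Consensus around the catalytic cysteine, as written in the text: -V/IPCHRVV/I-
# ([VI] = valine or isoleucine at the first and last positions).
AGT_ACTIVE_SITE = re.compile(r"[VI]PCHRV[VI]")


def find_active_site_motifs(protein_seq):
    """Return a list of (1-based start position, matched motif) pairs.

    The catalytic cysteine is the third residue of each hit,
    i.e. at 1-based position start + 2.
    """
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group()) for m in AGT_ACTIVE_SITE.finditer(seq)]
```

Because the motif is anchored on the invariant PCHR core, a miss does not prove that a protein is not an AGT; it only flags sequences that need a closer alignment-based look.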
The 3D structure of SsOGT revealed some peculiarities of this thermostable protein in the N-ter domain, such as the presence of an S–S bridge (C29-C31), the absence of a structural zinc ion, and the interconnection between the N-ter and C-ter domains through the D27-R133 interaction [15]. Site-directed mutagenesis of these residues demonstrated their role in the thermal stability of the protein [15]. Based on these data, we explored a structure- and computation-based pipeline to design a thermostable OGT specifically active on BC-derivatives, for biotechnological applications that require an SLP with orthogonal substrate recognition, both for molecular in vitro studies and for in vivo multiprotein labelling. Accordingly, the experimental workflow described below includes three key steps: i) primary sequence design; ii) in silico structural analysis to select the molecular determinants of thermal stability and substrate specificity; iii) purification and biochemical characterization of the engineered protein.

The Chimera CLIP mutant
The molecular evolution approach that produced the SNAP-tag and the CLIP-tag from hMGMT clearly revealed that, as expected, the mutations enhancing activity towards BG-derivatives, as well as those switching substrate specificity towards BC-derivatives, fall mainly in the C-ter domain, in particular in the H4 helix of the HTH motif and in the active site loop (Fig. 2 and Fig. S1) [6,12]. Our first attempt to produce a thermostable OGT active on BC-derivatives was therefore the construction of a chimeric gene combining the N-ter domain (including the connecting loop) of SsOGT with the C-ter domain of the CLIP-tag. To give this chimera sufficient stability, the well-known D27-R133 ion pair of SsOGT [15] was introduced by replacing a glycine residue in the CLIP-tag C-ter domain with an arginine (G160R mutation). In E. coli cells, this chimera (Chimera CLIP; Fig. 2 and Fig. S1) showed specific activity on CLIP-Cell™ TMR-Star (hereinafter BC-TMR; Fig. S2), demonstrating that the selectivity conferred by the C-ter domain of the commercial tag was not affected by the different N-ter domain from SsOGT (Fig. S3a). This very promising preliminary result, however, was offset by the failure of the purification trials, probably due to the strong instability of the chimeric protein.

The tailored gene expressing SsOGT CLIP
Subsequently, a synthetic SsOGT gene was produced in which the H4 helix and the active site loop were replaced by the corresponding structures from the CLIP-tag (the so-called "CLIP" region in Fig. S1), and the resulting protein, SsOGT CLIP, was purified (Fig. 2 and Fig. S1).
To rationalize the design of the target protein, we generated the predicted tertiary structures of SsOGT CLIP and Chimera CLIP using the Robetta server (https://robetta.bakerlab.org/) [30]. By optimal superposition of the predicted models onto the experimental structure of SsOGT (PDB ID: 4ZYE) [15], we were able to validate the potentially improved stability of SsOGT CLIP, which derives mainly, among others, from three amino acid substitutions relative to wild type SsOGT: K78E, F84W and H147L. These mutations introduce an ion pair, an H-bond and a hydrophobic contact, respectively (Fig. 3). Interestingly, two of the resulting contacts act as molecular bridges connecting the N-terminal and C-terminal domains: the side chain of W84 forms an H-bond with the main-chain carbonyl oxygen of I58, while L147 on H5 establishes a hydrophobic interaction with F53, which lies on the H1 helix.
Although the heterologous expression of this protein was quite low, a small amount was purified and tested with BG- and BC-fluorescent derivatives. As shown in Fig. S3b, a qualitative assay with fluorescent substrates revealed that TS SNAP and SsOGT CLIP are strongly specific for their respective substrates, with undetectable activity on the orthogonal ones even after prolonged incubation (see Section 5.4). Unfortunately, SsOGT CLIP was also thermally denatured and precipitated in vitro upon heat treatment above 50°C (data not shown). Inserting mesophilic fragments from the CLIP-tag into the SsOGT scaffold thus led to protein instability, both during heterologous expression and upon heating. For these reasons, we decided to preserve the primary structure of SsOGT as much as possible, targeting only a few residues of the wild type protein in order to retain the thermophilic features of the designed construct. Since the biochemical evidence was in good agreement with the comparative structural analysis, we again applied the computational approach to guide the design of the SsOGT-MC8 mutant.

Biochemical characterization of the SsOGT-MC8 mutant
Detailed information from the in silico studies on SsOGT was then used to construct a synthetic gene in which eight residues of the wild type were substituted. Four of them (S100A, R102A, M106T, K110E) were kept as in TS SNAP [16] in order to abolish any DNA-binding activity. The other substitutions (G105N, S109D, G130P, S132L) correspond to the residues typical of the CLIP-tag (Fig. S1). The only CLIP-tag mutation we excluded was the Y-to-E substitution on the H3 helix: Y90 was retained in SsOGT-MC8 to keep the active site loop anchored to the H4 helix through an H-bond with Y90 (the structural rationale for this choice is discussed below).
The expressed protein, SsOGT-MC8 (Fig. 2), was easily purified from the E. coli cell-free extract by affinity chromatography. The catalytic activity of this tag (expressed as a second-order rate constant [12,13,15,16,26]) was determined at its physiological temperature (65°C): like the previously cited engineered variants, it is active on the BC-TMR substrate, whereas no activity could be measured on the orthogonal BG-FL substrate, mirroring the behaviour of TS SNAP (Table 1). In both cases, therefore, a BC/BG ratio could not be determined. In contrast, purified CLIP-tag reacts only about 100-fold faster with BC-TMR than with BG-FL, in partial agreement with previous data [12,26], whereas the SNAP-tag is generally more specific for BG-derivatives (about 1000-fold faster; Table 1) [12,26]. Because of the residual reactivity of the CLIP-tag towards its non-cognate substrate, it is advisable to label the SNAP-tag first to minimize cross-reactions [26].
The cross-reactivity of the thermostable SLPs was therefore analysed with two different approaches. First, IC50 values were determined using fluorescent substrates in competition with customised non-fluorescent nucleobase derivatives (BG-1 and BC-2; Fig. S2 and S4) [15,27]. As shown in Table 2, all enzymes strongly prefer the fluorescent substrates over our customised compounds (Fig. S5), resulting in very high IC50 values compared with classical AGT inhibitors such as BG and Lomeguatrib [28]. Nevertheless, SsOGT-MC8 and CLIP-tag displayed the expected high orthogonal specificity, although this behaviour could be an effect of the substrate pairs used [26].
The second approach used a BG-conjugated agarose resin (SNAP-Capture Pull Down Resin, New England Biolabs), which is generally employed for the selective immobilization of the SNAP-tag. As shown in Fig. 4a, the degree of covalent protein immobilization is calculated from the total amount of protein (input, I) and the unbound protein in the flow-through (FT), following the equation in Fig. 4a. Only a specific activity on BG-derivatives (green enzymes in Fig. 4a) leads to immobilization, increasing the percentage of protein bound to the resin.
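The equation itself appears only in Fig. 4a. Assuming it expresses the bound fraction as the protein that disappears from the supernatant (our reading, not stated explicitly in the text), a plausible form is:

```latex
\text{Bound protein}\ (\%) = \frac{I - FT}{I} \times 100
```

where I and FT are the band intensities of the input and flow-through samples. For example, a flow-through band at 25 % of the input intensity would correspond to 75 % bound protein.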
Since this experiment was performed at room temperature (see Section 5.7), the SNAP-tag was, as expected, the most active on the BG-resin, followed by TS SNAP and SsOGT, confirming that the mutant is more active than the wild type at room temperature [16] (Fig. 4b). Because the inactive mutant SsOGT C119A [15] does not react with the BG-substrate on the resin, the covalent binding of BG-reactive SLPs clearly results from their specific activity (Fig. 4b).
When BC-reactive SLPs were applied to the BG-resin (red bars in the histogram of Fig. 4b), the CLIP-tag showed partial but non-negligible immobilization, whereas SsOGT-MC8 displayed much lower immobilization (Fig. 4b), probably due to the low activity of this protein-tag at room temperature. Overall, these data demonstrate that this SsOGT variant is specific for BC-derivatives and could be used in combination with orthogonal BG-derivatives with a low risk of cross-reactivity.
Finally, we tested the thermal stability of SsOGT-MC8 by Differential Scanning Fluorimetry (DSF), as described [15,16,29]. It is important to note that this technique is usually carried out from 20°C to 95°C with a ramp of 1 % (scan rate of 1 min/°C per cycle). Under these conditions, the SNAP-tag showed a Tm above 65°C, an increase in stability of ca. 17°C relative to its wild type hMGMT counterpart [6]. In the case of the thermostable OGTs from Saccharolobus solfataricus and Pyrococcus furiosus, however, the proteins were not effectively denatured and Tm values could not be determined [15,16,29]. The experimental protocol was therefore modified by increasing the scan rate to 5 min/°C per cycle while keeping the same temperature limits [15,16,29]. This prolonged heat treatment considerably reduced the Tm value of the SNAP-tag (Table 3 and Fig. S6), while SsOGT-MC8 showed a resistance to thermal denaturation almost identical to that of TS SNAP [16], which is slightly lower than that of wild type SsOGT (80°C) [16]. Thus, the mutations introduced in SsOGT to change its substrate specificity did not affect the thermal stability of the SsOGT-MC8 mutant, making it a valid alternative to the CLIP-tag under high-temperature conditions, both in vivo and in vitro.
Fig. 3. (a) The ion pair between E78 and K81; (b) the N-ter-to-C-ter contact based on the H-bond of W84 with I58; (c) the hydrophobic interaction between L147 and F53. Each panel is indicated with the corresponding letter (in red) in the cartoon representing the optimal superposition of the SsOGT CLIP and Chimera CLIP predicted structures, depicted in orange and grey, respectively.
Table 1. Catalytic activity of commercial and thermostable SLPs on BG- and BC-derivatives.
In vivo expression of TS SNAP and SsOGT-MC8 in thermophilic bacteria
The marked thermostability and substrate specificity of this new variant were tested in vivo in the thermophilic bacterium T. thermophilus HB27EC, as described for TS SNAP [16]. After overnight growth at 65°C, cells transformed with the empty pMK184 plasmid or with the constructs containing the ogt-H5 and ogt-MC8 genes were centrifuged and resuspended in 1× PBS buffer in the presence of fluorescent substrates. As previously seen for BG-FL [16], the cells are also permeable to BC-TMR (Fig. 5), suggesting that this substrate can also be used in this model organism. Although protein expression is far from optimised, a specific fluorescent signal is present only when each tag is incubated with its own substrate (1 h at 65°C), whereas the reaction with the orthogonal substrate did not give any fluorescent signal, reproducing the in vitro results obtained with purified proteins (Fig. 5). The presence of a specific BC-activity upon heterologous expression of the ogt-MC8 gene clearly demonstrates that this SLP is correctly folded and, like TS SNAP, could be employed in thermophilic model organisms as TS CLIP.
Table 2. Cross-reactivity of SLPs by the competitive inhibition method (IC50), using fluorescent derivatives as substrates and non-fluorescent ones as competitors.

The 3D structure of TS CLIP
The crystallographic studies presented here provide the first structural model of an SLP with substrate specificity towards BC-substrates (PDB ID: 8AES). TS CLIP produced crystals that diffracted to 2.8 Å resolution (Table 4), and the high-quality electron density map allowed us to assign the orientation of all amino acid side chains and to build the structural model of SsOGT-MC8. The model shows the typical fold of all AGTs, consisting of two globular domains connected by a long loop. Since 95 % of its sequence is identical to that of SsOGT, the engineered construct retains the overall molecular architecture of SsOGT, including the determinants of protein stability discussed below. In particular, the N-ter domain (residues 1-53) folds into an antiparallel β-sheet, connected to a conserved α-helix (H1) by a random-coil region that is further stabilized by the characteristic disulphide bridge between C29 and C31. As mentioned above, the main modifications with respect to wild type SsOGT occurred in the C-terminal domain, as it is directly involved in substrate recognition: it houses the catalytic C119 residue within the conserved -PCHR- signature, the "inactivated" HTH motif (responsible in the wild type protein for binding the DNA minor groove), the active site loop, and the ligand-binding pocket composed of the 'asparagine hinge' and the H4 helix.
Upon optimal superposition of TS CLIP onto the wild type protein, we observed the repositioning of discrete regions in TS CLIP; in particular, the active site loop moves by 2.6 Å from the central core of the protein towards the bulk solvent. This may be attributed to the simultaneous presence of P130, which imposes rigidity, and L132, which shows higher flexibility (B-factor = 57.35 Å²) than the wild type serine (B-factor = 26.27 Å²). As a consequence of this conformational change, the R133 side chain shifts towards H4, narrowing the upper part of the active site pocket. However, in the new conformation R133 still maintains the salt bridge with D27, whose importance for protein stability has been described (Fig. S7) [15,25,30].
The crystallographic analysis of TS CLIP allowed us to identify the molecular contacts responsible for the improved stability of this protein compared with SsOGT CLIP, confirming the reliability of the protein engineering process based on the predicted structure [31].
As shown in Fig. 6, TS CLIP gained an intrahelical ion pair between D95 and K91 and an H-bond between E78 and S96 connecting the H2 and H3 helices, both of which improve the overall stability of the enzyme; the main-chain I128-K138 H-bond was also restored, as in SsOGT, anchoring the H5 helix to the wall of the ligand-binding cavity opposite the one formed by the HTH. In addition, the tyrosine at position 90 (E90Y with respect to SsOGT CLIP) is highly relevant, because Y90 coordinates an H-bond network with V122, the main chain of P130 and K138: all these residues are located at the ligand entry site, making these interactions important functional elements for selectivity towards BC-TMR.
As this is the first crystal structure of an SLP active on BC-derivatives, we were able to describe the role of each of the mutations (G105N, S109D, G130P, S132L; Fig. S1) introduced to ensure substrate specificity, and to compare them with the corresponding mutations in the CLIP-tag [12]. In particular, N105 and D109 on the H4 helix, introduced to coordinate the cytosine moiety, and P130 on the opposite surface narrow the access to the substrate-binding pocket by steric hindrance (Fig. 7).
Indeed, comparing the active site gate of wild type SsOGT with that of TS CLIP (both crystallographic structures), each with a BC molecule optimally superposed onto the BG ligand of the SNAP-BG complex (PDB ID: 3KZZ), shows that the SsOGT-MC8 active site entry is shaped to host the BC moiety, whereas it clashes with the BG compound. Regarding the steric contribution, P130 plays a key role in reshaping the entry of the active pocket (Fig. 7), also because its conformation is stabilized by an H-bond with Y90. Besides its relevant interaction with P130, Y90 (the position mutated by Y114E in the CLIP-tag) is the only residue that we did not modify according to the CLIP-tag generation protocol [12].
Johnsson and co-workers speculated that this tyrosine forms a hydrogen bond with N3 of BG, stabilizing the developing negative charge on the guanine leaving group; consequently, they mutated it into glutamic acid to abolish BG-binding activity [12,23]. In contrast, TS CLIP shows higher substrate specificity even though it retains this tyrosine. Unlike in the CLIP-tag, in TS CLIP selectivity has also been achieved through steric hindrance resulting from the inserted mutations, which allows the binding of BC- instead of BG-derivatives.
This observation is further supported by the biochemical data, in which it was not possible to determine the binding kinetics of BG owing to its complete exclusion from the active site pocket.
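The steric-exclusion argument (BC fits the reshaped entry, BG clashes) can be made quantitative with a simple van der Waals overlap test on superposed ligand and protein coordinates. The sketch below is a generic illustration, not the analysis performed by the authors; it assumes Bondi van der Waals radii, an arbitrary 0.4 Å overlap tolerance, and atom coordinates already extracted from the superposed models.

```python
import math

# Approximate Bondi van der Waals radii (Å) for common heavy atoms.
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def find_clashes(ligand, protein, allowed_overlap=0.4):
    """Return (ligand_atom, protein_atom, overlap_in_Å) for every pair whose
    van der Waals spheres overlap by more than `allowed_overlap` Å.

    Each atom is a tuple (name, element, (x, y, z)) with coordinates in Å,
    taken from structures that are already optimally superposed.
    """
    clashes = []
    for l_name, l_el, l_xyz in ligand:
        for p_name, p_el, p_xyz in protein:
            distance = math.dist(l_xyz, p_xyz)
            overlap = VDW_RADII[l_el] + VDW_RADII[p_el] - distance
            if overlap > allowed_overlap:
                clashes.append((l_name, p_name, round(overlap, 2)))
    return clashes
```

Run once with the superposed BG ligand and once with BC against the active-site atoms of each structure; a ligand is considered sterically excluded when it produces clashes that the other does not. The tolerance is a tunable assumption, and hydrogens are ignored in this sketch.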

Conclusions and perspectives
This work, building on the lessons of the molecular evolution performed on hMGMT, demonstrates that the knowledge acquired by the group of Prof. Johnsson allowed the identification of the residues involved in the substrate specificity of this enzyme [12]. Since AGTs are evolutionarily conserved [14], this information could be used to introduce mutations rapidly and unequivocally into all known AGTs in order to develop new engineered SLPs. The only limitation of this approach is that the chosen AGT must be sensitive to benzylguanine inhibition (note that E. coli AdaC is not inhibited by BG and is therefore not active on BG-derived substrates [32,33]). As shown in Fig. S8, the "SNAP-tag technology" could potentially be used in any model organism: if the chemical-physical growth conditions are compatible with the commercial SNAP- and CLIP-tag, it is advisable to use them, given their high catalytic activity on BG- and BC-derived substrates. Otherwise, one can search for a specific BG-activity using fluorescent substrates and, building on the acquired knowledge, identify and appropriately modify the endogenous AGT to produce SNAP- and CLIP-tag homologues whose behaviour is compatible with the in vivo conditions of the organism used.
Using this methodology, and starting from an archaeal OGT that is stable at high temperatures and resistant to the most common denaturing agents [14], we engineered a pair of thermostable SLPs (TS SNAP and, for the first time, TS CLIP). These new protein-tags could extend this technology to (hyper)thermophilic organisms and to all reaction conditions that are non-permissive for the commercial SLPs.
Fig. 6. (a) Zoom-in of the network of contacts coordinated by Y90, highlighting the potential H-bonds between Y90 and the main chain of P130, V122 and the side chain of K138, respectively. (b) The H3 intra-helix ion pair between D95 and K91 and the H-bond established by S96 and E78; the latter replaces the intra-helix contact of SsOGT CLIP (orange) with an inter-helix bond in TS CLIP (raspberry).
Silica gel 60 (70-230 mesh) was used for gravity column chromatography (GCC).

DNA constructs
To obtain the E. coli expression vectors, similar procedures were applied to all proteins. In particular, the pQE-CLIP-tag and pQE-ogtMC8 plasmids were used as templates to amplify the relative genes with the Nco-NZY-Fwd/QE-Rev oligonucleotide pair (5′-AGGAGATATACCATGGCACACCATCACCATCACCATACGG-3′ / 5′-CATTACTGGATCTATCAACAGGAG-3′). The amplified fragments and the pHTP1 (NZYTech, Portugal) recipient vector were then digested with the NcoI and HindIII restriction enzymes and ligated. The ligation mixture was used to transform E. coli DH5α competent cells, and positive colonies were confirmed by DNA sequencing. pQE-ogtH5 was cloned as previously described [14]. To obtain the final vector for the heterologous expression of the SsOGT-MC8 mutant in the T. thermophilus HB27EC strain [34], the pQE-MC8 plasmid was first digested with BamHI and HindIII to recover the MC8 gene fragment, which was then subcloned into pMK-ogtH5 [16].
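A quick in silico check of the forward primer illustrates the cloning logic: the NcoI site overlaps the start codon, and the codons that follow encode an N-terminal His6 stretch, consistent with the His6-tag affinity chromatography used for purification. The sketch below is our own illustration (the helper names are hypothetical); it scans only the top strand, which is sufficient here because the NcoI, HindIII and BamHI sites are palindromic, and translates from the first ATG using the standard genetic code.

```python
# Recognition sites (5'->3') of the enzymes named in the text.
SITES = {"NcoI": "CCATGG", "HindIII": "AAGCTT", "BamHI": "GGATCC"}

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_sites(seq):
    """Return {enzyme: [0-based start positions]} for each recognition site."""
    seq = seq.upper()
    return {
        enzyme: [i for i in range(len(seq) - len(site) + 1)
                 if seq[i:i + len(site)] == site]
        for enzyme, site in SITES.items()
    }


def translate_from_first_atg(seq):
    """Translate the complete codons that follow the first ATG."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    coding = seq[start:]
    return "".join(CODON_TABLE[coding[i:i + 3]]
                   for i in range(0, len(coding) - 2, 3))


NCO_NZY_FWD = "AGGAGATATACCATGGCACACCATCACCATCACCATACGG"
QE_REV = "CATTACTGGATCTATCAACAGGAG"

print(find_sites(NCO_NZY_FWD))               # {'NcoI': [10], 'HindIII': [], 'BamHI': []}
print(translate_from_first_atg(NCO_NZY_FWD))  # MAHHHHHHT
```

Neither primer contains a HindIII site, so the HindIII site used for cloning presumably lies within the amplified pQE-derived template sequence; this is an inference from the primer sequences, not something the text states.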

Protein expression and purification
TS SNAP was purified as previously described [16]. CLIP-tag and SsOGT-MC8 proteins were expressed in E. coli BL21 Rosetta2 (DE3) cells grown at 37°C in Lysogeny Broth (LB) medium supplemented with 50 mg/L kanamycin and 30 mg/L chloramphenicol [35–37]. Protein expression was induced with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) when an OD600nm of 0.5-0.6 was reached. The biomass was collected, resuspended 1:3 (w/v) in purification buffer A (50 mM phosphate, 300 mM NaCl, pH 8.0) supplemented with 1 % Triton X-100, and stored overnight at −20°C. It was then treated on ice with lysozyme and DNase for 60 min and sonicated as described [14]. After centrifugation for 30 min at 60,000 × g, the cell extract was recovered and applied to a Protino Ni-NTA Column 1 mL (Macherey-Nagel) for His6-tag affinity chromatography, according to the procedure previously described [27]. The eluted protein fractions were collected, dialysed against phosphate-buffered saline (1× PBS: 20 mM phosphate buffer, 150 mM NaCl, pH 7.3) and checked by SDS-PAGE. To test the activity of the purified proteins, 5 µM enzyme was incubated with 10 µM of the relative substrate, BG-FL or BC-TMR (New England Biolabs, USA), in 1× Fluo Reaction Buffer (50 mM phosphate, 0.1 M NaCl, 1.0 mM DTT, pH 6.5) at 25°C for CLIP-tag and at 65°C for SsOGT-H5 and SsOGT-MC8. After stopping the reaction in 1× Laemmli buffer (95 % formamide, 20 mM EDTA, 0.05 % bromophenol blue), samples were loaded on SDS-PAGE gels and analysed by gel imaging on a VersaDoc™ 4000 system (Bio-Rad) as previously reported [14,15,22].

In vivo fluorescent assay
For the in vivo assay, T. thermophilus HB27EC cells transformed with pMK184 (used as control), pMK-ogtH5 or pMK-ogtMC8 plasmids were grown at 65°C in SC selective medium (tryptone 8 g/L, yeast extract 4 g/L, NaCl 3 g/L in mineral water, pH 7.5) supplemented with 30 mg/L kanamycin until the stationary phase (OD600nm > 1.5) [16,38]. Cell pellets from 1 mL of culture were resuspended in 0.1 mL of SC medium containing 3 µM of BG-FL or BC-TMR fluorescent substrate and incubated at 65°C for 1 h. After the reaction, cells were washed twice with 1 mL of SC medium, then denatured for 15 min at 100°C in 1× Laemmli buffer and loaded directly onto SDS-PAGE.

Substrate specificity assay by competitive IC50 inhibition method
The substrate specificity of CLIP-tag, SsOGT-H5 and SsOGT-MC8 was evaluated on the BG-N3 and BC-N3 substrates by the competitive inhibition assay described previously [15,27,29]. Increasing concentrations (0-1 mM) of the guanine or cytosine azide derivatives were added to mixtures containing fixed concentrations of the fluorescent BC-TMR substrate (5 µM) and enzyme (5 µM). The reactions were incubated for 60 min at 25°C (CLIP-tag) or 65°C (SsOGT-MC8) and then stopped by adding 1× Laemmli buffer. The samples were loaded on SDS-PAGE, and the fluorescent bands were measured by gel imaging on a VersaDoc™ 4000 system (Bio-Rad) using a green LED and a 605 nm bandpass filter. The data were then fitted with the IC50 equation [15,25]. As a control, 5 µM SsOGT-H5 was incubated at 65°C with 5 µM of the fluorescent substrate BG-FL in the presence of increasing amounts (0-1 mM) of the BG/BC azide derivatives. In this case, fluorescent bands were visualized on the VersaDoc™ 4000 system (Bio-Rad) using a blue LED and a 530 nm bandpass filter.
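The text does not spell out the IC50 equation used [15,25]. A minimal sketch, assuming the common single-site form with a Hill slope of 1 and band intensities normalised to the reaction without competitor, is shown below. The golden-section search on log10(IC50) is a dependency-free stand-in for a nonlinear least-squares routine, not the software used by the authors; the function names are hypothetical.

```python
import math


def competition_signal(conc, ic50):
    """Fraction of fluorescent signal remaining at competitor concentration
    `conc` (same units as ic50): single-site model, Hill slope 1."""
    return 1.0 / (1.0 + conc / ic50)


def fit_ic50(concs, signals, lo=1e-3, hi=1e4, iters=200):
    """Least-squares IC50 via golden-section search on log10(IC50).

    `signals` must be normalised to the no-competitor reaction (= 1.0);
    `lo` and `hi` bound the search, in the same units as `concs`.
    """
    def sse(log_ic50):
        ic50 = 10 ** log_ic50
        return sum((s - competition_signal(c, ic50)) ** 2
                   for c, s in zip(concs, signals))

    a, b = math.log10(lo), math.log10(hi)
    ratio = (math.sqrt(5) - 1) / 2
    x1, x2 = b - ratio * (b - a), a + ratio * (b - a)
    f1, f2 = sse(x1), sse(x2)
    for _ in range(iters):
        if f1 < f2:  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = sse(x1)
        else:        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = sse(x2)
    return 10 ** ((a + b) / 2)
```

Searching in log space keeps the fit well behaved across the 0-1 mM range used here; note that a competitive IC50 measured this way depends on the fluorescent-substrate concentration, so values are comparable only under identical assay conditions.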

Rate constants determination
Purified CLIP-tag, SsOGT-H5 and SsOGT-MC8 (5 µM) were incubated in 1× PBS with an excess of BG-FL or BC-TMR substrate (20 µM), at 25°C for CLIP-tag and at 65°C for the SsOGT variants, according to the method described [12,27]. Aliquots were taken at different times and the reactions were immediately stopped in 1× Laemmli buffer. The samples were then resolved by SDS-PAGE for gel-imaging analysis, and the data were fitted to a pseudo-first-order reaction model using the GraFit 5.0 software package (Erithacus Software Ltd.); second-order rate constants k (in M⁻¹ s⁻¹) were obtained by dividing the pseudo-first-order rate constant by the substrate concentration.
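The two-step calculation described above (pseudo-first-order fit, then k = k_obs/[S]) can be sketched as follows. The sketch assumes the labelled fraction follows y(t) = A(1 − exp(−k_obs·t)); for a fixed k_obs the amplitude A has a closed-form least-squares solution, so only k_obs is searched. This is a pure-Python stand-in for the GraFit fit, not its implementation; the function names and the synthetic time course (k_obs = 0.004 s⁻¹) are hypothetical, and only the 20 µM substrate concentration comes from the protocol.

```python
import math


def _best_amp_and_sse(times, signal, k_obs):
    # Model y(t) = A * (1 - exp(-k_obs * t)); for fixed k_obs it is linear in A.
    g = [1.0 - math.exp(-k_obs * t) for t in times]
    amp = sum(y * gi for y, gi in zip(signal, g)) / sum(gi * gi for gi in g)
    sse = sum((y - amp * gi) ** 2 for y, gi in zip(signal, g))
    return amp, sse


def fit_pseudo_first_order(times, signal, log_lo=-6.0, log_hi=1.0, n_grid=701):
    """Fit (A, k_obs in s^-1): log-spaced grid search, then golden-section refinement."""
    def sse_at(log_k):
        return _best_amp_and_sse(times, signal, 10.0 ** log_k)[1]

    step = (log_hi - log_lo) / (n_grid - 1)
    grid = [log_lo + i * step for i in range(n_grid)]
    best = min(range(n_grid), key=lambda i: sse_at(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, n_grid - 1)]
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):  # shrink the bracket [a, b] around the minimum
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if sse_at(c) < sse_at(d):
            b = d
        else:
            a = c
    k_fit = 10.0 ** ((a + b) / 2.0)
    return _best_amp_and_sse(times, signal, k_fit)[0], k_fit


def second_order_k(k_obs_per_s, substrate_molar):
    """k (M^-1 s^-1) = k_obs / [S]; assumes [S] stays close to its initial value."""
    return k_obs_per_s / substrate_molar


# Synthetic example: fraction of labelled enzyme vs time (s), k_obs = 0.004 s^-1.
t_s = [0, 30, 60, 120, 300, 600, 1200, 1800]
labelled = [1.0 - math.exp(-0.004 * t) for t in t_s]
amp, k_obs = fit_pseudo_first_order(t_s, labelled)
print(f"k_obs = {k_obs:.4f} s^-1, k = {second_order_k(k_obs, 20e-6):.0f} M^-1 s^-1")
```

With [S] = 20 µM = 2 × 10⁻⁵ M, a k_obs of 0.004 s⁻¹ corresponds to k = 0.004 / (2 × 10⁻⁵) = 200 M⁻¹ s⁻¹.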

Capture and protein immobilization of SLPs
80 µL of SNAP-Capture Pull Down resin (New England Biolabs; binding capacity: 1 mg pure protein/mL of bed resin) was washed with 1× PBS and incubated with 120 µL of protein at 1 mg/mL (SNAP-tag, CLIP-tag, H5, MC8 or the C119A mutant). After 1 min, 5 µL of the suspension (protein plus resin) was withdrawn, and the remainder was incubated for 16 h at room temperature, after which a second 5 µL aliquot of the suspension (protein plus resin) was withdrawn. All samples were analysed by SDS-PAGE with Coomassie staining, and the intensity of each protein band was determined by visible-light gel imaging.
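The quantities in this capture experiment can be checked with simple arithmetic. Assuming the 80 µL refers to settled bed volume, the nominal capacity is 0.08 mg, whereas 120 µL of a 1 mg/mL solution supplies 0.12 mg of protein, so at nominal capacity at most about two-thirds of the protein could be immobilized. The sketch also estimates a captured fraction from the 1 min and 16 h band intensities; this rests on a hypothetical interpretation not stated in the protocol, namely that covalently captured protein stays attached to the resin during SDS-PAGE and therefore disappears from the band. Function names and band intensities are illustrative only.

```python
def resin_capacity_mg(resin_ul: float, capacity_mg_per_ml: float = 1.0) -> float:
    """Nominal binding capacity (mg) of a given bed volume of resin."""
    return resin_ul / 1000.0 * capacity_mg_per_ml


def protein_load_mg(volume_ul: float, conc_mg_per_ml: float) -> float:
    """Mass of protein (mg) offered to the resin."""
    return volume_ul / 1000.0 * conc_mg_per_ml


def fraction_captured(band_1min: float, band_16h: float) -> float:
    """Fraction of protein lost from the band between 1 min and 16 h.

    Hypothetical assumption: captured protein remains bound to the resin during
    SDS-PAGE, so only non-captured protein contributes to the band.
    """
    if band_1min <= 0:
        raise ValueError("reference band intensity must be positive")
    return min(1.0, max(0.0, 1.0 - band_16h / band_1min))


capacity = resin_capacity_mg(80)       # 0.08 mg at nominal capacity
load = protein_load_mg(120, 1.0)       # 0.12 mg of protein offered
print(f"max capturable fraction ≈ {min(1.0, capacity / load):.2f}")
# Hypothetical band intensities (arbitrary units) from gel imaging:
print(f"captured fraction ≈ {fraction_captured(1.0e5, 4.0e4):.2f}")
```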

Thermal stability analysis by using DSF method
The stability of the OGT variants was analysed by differential scanning fluorimetry (DSF) following the protocol previously described [6,15,16,25,39]. Each condition, containing 25 µM enzyme in 1× PBS and 1× SYPRO Orange dye (Invitrogen, USA), was tested in triplicate and subjected to a 70-cycle scan from 20 to 95°C (5 min/°C per cycle) in a CFX96 Touch Real-Time PCR system (Bio-Rad). Relative fluorescence data were normalized to the maximum fluorescence value and plotted against temperature. The inflection points of the resulting sigmoidal curves (Tm values) were determined by fitting the Boltzmann equation [6,15,16,25,39].
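The normalization and Boltzmann fit described above can be sketched in pure Python, assuming the common form F(T) = bottom + (top − bottom) / (1 + exp((Tm − T)/slope)), whose inflection point is T = Tm. For fixed Tm and slope the model is linear in bottom and top, which are solved from 2 × 2 normal equations; Tm and slope are then found by a coarse grid search followed by a finer one. This is a hedged illustration, not the analysis software used with the CFX96; the helper names and the synthetic melt curve (Tm = 67.3°C) are hypothetical. Real DSF traces often decrease after the transition because of aggregation, so in practice the fit is usually restricted to the transition region, whereas this sketch fits whatever range it is given.

```python
import math


def _sigmoid(temp, tm, slope):
    # Logistic term of the Boltzmann equation; the exponent is clamped to avoid overflow.
    x = max(-700.0, min(700.0, (tm - temp) / slope))
    return 1.0 / (1.0 + math.exp(x))


def boltzmann(temp, bottom, top, tm, slope):
    """Boltzmann sigmoid: bottom + (top - bottom) / (1 + exp((Tm - T) / slope))."""
    return bottom + (top - bottom) * _sigmoid(temp, tm, slope)


def _sse_for(temps, y, tm, slope):
    # For fixed Tm and slope, F = bottom * (1 - s) + top * s is linear in
    # (bottom, top): solve the 2 x 2 normal equations and return the residual.
    s = [_sigmoid(t, tm, slope) for t in temps]
    u = [1.0 - si for si in s]
    suu = sum(a * a for a in u)
    sss = sum(b * b for b in s)
    sus = sum(a * b for a, b in zip(u, s))
    suy = sum(a * v for a, v in zip(u, y))
    ssy = sum(b * v for b, v in zip(s, y))
    det = suu * sss - sus * sus
    if abs(det) < 1e-12:
        return float("inf")
    bottom = (suy * sss - ssy * sus) / det
    top = (ssy * suu - suy * sus) / det
    return sum((v - bottom * a - top * b) ** 2 for v, a, b in zip(y, u, s))


def fit_tm(temps, fluorescence):
    """Normalise to the maximum signal, then fit Tm and slope by a two-stage grid search."""
    peak = max(fluorescence)
    y = [f / peak for f in fluorescence]

    def search(tms, slopes):
        return min((_sse_for(temps, y, tm, sl), tm, sl) for tm in tms for sl in slopes)

    lo, hi = min(temps), max(temps)
    coarse_tm = [lo + 0.5 * i for i in range(int((hi - lo) / 0.5) + 1)]
    _, tm0, sl0 = search(coarse_tm, [0.5 * i for i in range(1, 11)])
    fine_tm = [tm0 - 0.5 + 0.02 * i for i in range(51)]
    fine_sl = [max(0.2, sl0 - 0.5 + 0.05 * i) for i in range(21)]
    _, tm, slope = search(fine_tm, fine_sl)
    return tm, slope


# Synthetic melt curve, 20-95 °C in 1 °C steps, Tm = 67.3 °C (illustrative only).
temps = list(range(20, 96))
signal = [boltzmann(t, 500.0, 5000.0, 67.3, 1.5) for t in temps]
tm_fit, slope_fit = fit_tm(temps, signal)
print(f"Tm ≈ {tm_fit:.2f} °C")
```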

Crystallographic studies
For crystallization trials, the protein was concentrated to 7.5 mg/mL using Amicon Ultra-0.5 mL centrifugal filter devices (membrane MWCO = 10 kDa). Crystallization was performed by the vapour-diffusion method with a robot-assisted (Oryx4; Douglas Instruments) sitting-drop sparse-matrix strategy, using screening kits from Hampton Research and Qiagen. Optimal SsOGT-MC8 crystals grew in two weeks at 20°C in drops obtained by mixing equal volumes of protein and reservoir solution (0.2 M tri-potassium citrate, 20 % (w/v) PEG3350) to a final drop volume of 2 µL, equilibrated against 50 µL of reservoir solution. For X-ray data collection, crystals were cryoprotected in the precipitant solution supplemented with 12 % glycerol, mounted in a cryo-loop and flash-frozen in liquid nitrogen. The crystals diffracted to 2.8 Å resolution, and data were collected at a wavelength of 1.008 Å on beamline ID30-B at the European Synchrotron Radiation Facility (ESRF, Grenoble, France), equipped with a PILATUS3 6M detector with a 1000 µm Si sensor (Dectris) [40]. The diffraction data were indexed with XDS [41], which assigned the crystal to the orthorhombic space group P2₁2₁2₁ with cell dimensions a = 100.83 Å, b = 119.43 Å, c = 180.24 Å, α = β = γ = 90°. The asymmetric unit contained 10 molecules, corresponding to a solvent content of 56.9 % and a Matthews coefficient of 2.85 Å³/Da. Data processing was carried out using the CCP4 program suite [42], and the structure was solved by molecular replacement with Phaser from the PHENIX software suite [43], using the structure of wild-type SsOGT (PDB: 4ZYE) as the search model; the overall sequence identity between the two proteins is 95 %.
The resulting electron density map was of sufficient quality to allow iterative cycles of manual model building in Coot [44] and refinement with phenix.refine [43] (Table 4). The atomic coordinates and structure factors of SsOGT-MC8 have been deposited in the Protein Data Bank (http://www.rcsb.org) under accession code 8AES.
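The reported cell contents can be cross-checked with Matthews' relations: V_M = V / (Z·M), where V is the unit-cell volume, Z the number of molecules per cell and M the mass per molecule, and solvent fraction ≈ 1 − 1.23/V_M. The sketch below works backwards from the reported V_M = 2.85 Å³/Da to the mass per molecule it implies; it does not use the sequence-derived mass of the construct, which is not given here. P2₁2₁2₁ has four asymmetric units per cell, so Z = 4 × 10 = 40. The helper names are hypothetical; this is an arithmetic check, not the CCP4 procedure.

```python
def cell_volume_orthorhombic(a: float, b: float, c: float) -> float:
    """Unit-cell volume (Å^3); valid only because alpha = beta = gamma = 90°."""
    return a * b * c


def implied_mass_da(volume: float, asym_units: int, mols_per_asu: int, vm: float) -> float:
    """Mass per molecule (Da) implied by V_M = V / (Z * M), with Z molecules per cell."""
    return volume / (asym_units * mols_per_asu * vm)


def solvent_fraction(vm: float) -> float:
    """Matthews' estimate of the solvent fraction: 1 - 1.23 / V_M."""
    return 1.0 - 1.23 / vm


v_cell = cell_volume_orthorhombic(100.83, 119.43, 180.24)
# P2(1)2(1)2(1) has 4 asymmetric units per cell; 10 molecules per asymmetric unit.
mass = implied_mass_da(v_cell, asym_units=4, mols_per_asu=10, vm=2.85)
print(f"V = {v_cell:.4g} Å^3, M ≈ {mass / 1000:.1f} kDa, solvent ≈ {solvent_fraction(2.85):.1%}")
```

This gives a cell volume of about 2.17 × 10⁶ Å³, an implied mass of about 19 kDa per molecule, and a solvent fraction of about 56.8 %, close to the reported 56.9 % (small differences are expected from rounding of V_M and from the constant used).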

Tertiary structure modelling
Tertiary structure models of Chimera CLIP, SsOGT CLIP and CLIP-tag were generated with the Robetta server (https://robetta.bakerlab.org/) [31]. PROCHECK from the CCP4 program suite [44] was used to analyse the Ramachandran plots of the generated models, together with their bond angles and bond lengths.
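A Ramachandran plot places each residue by its backbone torsion angles φ (C(i−1)–N–Cα–C) and ψ (N–Cα–C–N(i+1)). The sketch below shows only how such a signed torsion angle is computed from four atomic positions, using the standard atan2 formulation with the IUPAC sign convention; it is not PROCHECK's implementation, which additionally classifies residues into allowed and disallowed regions and checks other stereochemical parameters. The helper names and toy coordinates are hypothetical and not taken from any of the models.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(p, q):
    return p[0] * q[0] + p[1] * q[1] + p[2] * q[2]


def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees, IUPAC sign convention) defined by four points.

    For residue i, phi uses C(i-1), N(i), CA(i), C(i) and
    psi uses N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (Å), not from any model: cis, +90° and trans arrangements.
base = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
for last in [(1.0, 0.0, 1.0), (0.0, 1.0, 1.0), (-1.0, 0.0, 1.0)]:
    print(round(dihedral_deg(*base, last), 1))
```

The three toy arrangements should give 0°, +90° and 180° respectively.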

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.