Biotechnological context

Monomeric Red Fluorescent Proteins (mRFPs) are widely used as genetically encodable tags for studying cellular processes. Their fluorescence arises from the autocatalytic rearrangement of three consecutive amino acids into a chromophore containing an acylimine group, which is further modified in different mRFPs. The parent protein DsRed (λexcmax = 558 nm, λemmax = 583 nm) has the disadvantages of being tetrameric, having low photostability and folding slowly. For these reasons, several monomeric variants have been engineered with improved photostability and structural stability and covering a broader range of wavelengths. Random mutagenesis (Shaner et al. 2004) yielded the mFruit series, which includes mCherry (λexcmax = 587 nm, λemmax = 610 nm), mStrawberry (λexcmax = 574 nm, λemmax = 596 nm) and mOrange (λexcmax = 548 nm, λemmax = 562 nm). Crystal structures of these proteins (Shu et al. 2006) all show the canonical β-barrel fold enclosing a central α-helix that carries the residues forming the chromophore. In mCherry (PDB code 2h5q) the β-barrel comprises 11 strands, and the chromophore is formed by the contiguous residues Met-Tyr-Gly (collectively numbered 66 in the PDB entry and in our sequence numbering). We report here the near-complete assignment of the backbone NMR chemical shifts of mCherry. The N-terminal (residues −4 to 3) and C-terminal (residues 224–231) segments are absent from the crystal structure and are dynamically disordered. These assignments will be used to investigate the photostability of the mFruits.

Methods and experiments

Protein expression and purification

A pET11a plasmid carrying the gene for mCherry from Discosoma sp. (UniProt code Q5S3G8) with an N-terminal poly-His tag (MGHHHHHHG) was transformed into E. coli BL21(DE3) cells. A single colony was grown overnight at 37 °C in Luria-Bertani medium; the cells were then pelleted, resuspended in 50 mL M9 medium and grown overnight. After centrifugation, five parallel growth flasks were prepared, each by adding 4 mL of culture to 200 mL M9 medium containing the isotope sources (1 g/L 15NH4Cl and 3 g/L 13C6-D-glucose). After 4 h of growth at 37 °C, expression was induced with 1 mM IPTG and the temperature was lowered to room temperature (26 °C). Expression was stopped after 24 h, and the protein was purified from the cell pellet on Ni-NTA columns, dialyzed against 50 mM Tris pH 8.0, 2 mM DTT, 0.5 mM EDTA, and then against 20 mM Tris, 50 mM NaCl, pH 7.0. Expression of the correct full-length protein was validated by ESI-TOF mass spectrometry, which also indicated a labelling efficiency of 97%. The protein used for NMR analysis retained the poly-His tag.
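The labelling efficiency quoted above comes from the ESI-TOF mass. One common way to estimate such a value, sketched here under simplifying assumptions, is to place the observed mass on a linear scale between the mass of the unlabelled protein and that of the fully 13C/15N-labelled protein. The text does not state how the 97% figure was calculated; the helper name `labelling_efficiency`, the atom counts and the masses below are hypothetical, and natural isotope abundance is ignored.

```python
# Mass differences between heavy and light stable isotopes, in Da
DELTA_13C = 13.0033548 - 12.0          # 13C - 12C
DELTA_15N = 15.0001089 - 14.0030740    # 15N - 14N


def labelling_efficiency(m_observed, m_unlabelled, n_carbon, n_nitrogen):
    """Estimate the fraction of C and N atoms carrying 13C/15N.

    Hypothetical helper: assumes uniform incorporation of both isotopes and
    treats m_unlabelled as the mass with no heavy isotopes (natural
    abundance is ignored).
    """
    # Mass the protein would have if every C and N atom were labelled
    m_full = m_unlabelled + n_carbon * DELTA_13C + n_nitrogen * DELTA_15N
    # Linear interpolation between the unlabelled and fully labelled masses
    return (m_observed - m_unlabelled) / (m_full - m_unlabelled)


# Illustrative numbers only (not the mCherry construct)
m_nat = 20000.0
m_full = m_nat + 1000 * DELTA_13C + 250 * DELTA_15N
m_obs = m_nat + 0.97 * (m_full - m_nat)
print(f"{labelling_efficiency(m_obs, m_nat, 1000, 250):.2f}")
```

In practice the atom counts would come from the construct sequence (including the His-tag), and a more careful estimate would correct for the natural 13C and 15N content of the unlabelled reference.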

NMR spectroscopy

The NMR sample contained 1.5 mM U-15N,13C-labelled protein in 20 mM Tris-HCl, 50 mM NaCl, pH 7.0, 5% D2O. Spectra were acquired at 303 K on a Bruker Avance III HD spectrometer operating at a 950 MHz proton frequency, equipped with z-axis gradients and a cryogenic TCI probe head. The following experiments were recorded using Bruker library pulse sequences: 2D 15N-1H HSQC and 3D HNCO, HN(CA)CO, HNCACB, HN(CO)CACB, H(NCA)NH, HN(CO)CA, H(NCOCA)NH and iHNCA, using TROSY (Pervushin et al. 1997) and BEST (Lescop et al. 2007) methodology.
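In the HNCACB / HN(CO)CACB pair listed above, each amide shows its own Cα/Cβ(i) shifts in the HNCACB and the Cα/Cβ(i−1) shifts of the preceding residue in the HN(CO)CACB, so spin systems can be chained by matching one against the other. The sketch below illustrates this generic matching step only; the `SpinSystem` class, the `predecessors` function, the 0.2/0.4 ppm tolerances and the shift values are hypothetical, and this is not the procedure implemented in NMRFAM-Sparky or necessarily the one followed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpinSystem:
    name: str
    ca: float                  # Cα(i), from HNCACB
    cb: Optional[float]        # Cβ(i), from HNCACB (None for Gly)
    ca_prev: float             # Cα(i-1), from HN(CO)CACB
    cb_prev: Optional[float]   # Cβ(i-1), from HN(CO)CACB (None if i-1 is Gly)


def predecessors(current: SpinSystem, systems: List[SpinSystem],
                 tol_ca: float = 0.2, tol_cb: float = 0.4) -> List[Tuple[str, float]]:
    """Rank spin systems whose (i) shifts match the (i-1) shifts of `current`."""
    hits = []
    for cand in systems:
        if cand is current:
            continue
        d_ca = abs(cand.ca - current.ca_prev)
        if d_ca > tol_ca:
            continue
        if cand.cb is None or current.cb_prev is None:
            # Both sides must agree that there is no Cβ (e.g. a Gly predecessor)
            if (cand.cb is None) != (current.cb_prev is None):
                continue
            d_cb = 0.0
        else:
            d_cb = abs(cand.cb - current.cb_prev)
            if d_cb > tol_cb:
                continue
        hits.append((cand.name, d_ca + d_cb))
    # Smallest total deviation first
    return sorted(hits, key=lambda hit: hit[1])


# Made-up shifts (ppm) for three spin systems
a = SpinSystem("A", ca=58.1, cb=30.2, ca_prev=45.3, cb_prev=None)
g = SpinSystem("G", ca=45.3, cb=None, ca_prev=52.0, cb_prev=19.0)
b = SpinSystem("B", ca=62.0, cb=69.5, ca_prev=58.15, cb_prev=30.1)
print(predecessors(b, [a, g, b]))   # A is the only candidate preceding B
print(predecessors(a, [a, g, b]))   # G (no Cβ) precedes A
```

Requiring Cβ to match as well as Cα is what makes the HNCACB pair more discriminating than HNCA-type data alone; the amide-to-amide H(NCA)NH and H(NCOCA)NH correlations provide an independent check of the same links.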

Data were processed with NMRPipe (Delaglio et al. 1995), and spectra were assigned with NMRFAM-Sparky (Lee et al. 2015). An offset of 46.5 Hz (half of the ~93 Hz one-bond 1J(NH) coupling) was applied to the amide 15N and 1H dimensions of all TROSY spectra to align them with the decoupled chemical shifts of the 2D 15N-1H HSQC spectrum. Chemical shifts were referenced according to the IUPAC recommendations (Markley et al. 1998).
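Because the TROSY correction is a fixed frequency offset, its size in ppm differs between the 1H and 15N dimensions. The sketch below converts the 46.5 Hz offset to ppm, assuming a nominal 1H frequency of exactly 950.0 MHz and deriving the 15N frequency from the IUPAC 15N/1H frequency ratio Ξ = 0.101329118; the function name `hz_to_ppm` is hypothetical, and the sign of the correction (which depends on the TROSY component selected) is not modelled.

```python
PROTON_MHZ = 950.0            # nominal 1H spectrometer frequency (assumed)
XI_15N = 0.101329118          # IUPAC 15N/1H frequency ratio (liquid NH3 scale)
J_NH_HZ = 93.0                # one-bond 1H-15N coupling used in the text


def hz_to_ppm(offset_hz: float, nucleus_mhz: float) -> float:
    """Convert a frequency offset in Hz to ppm (Hz / MHz = ppm)."""
    return offset_hz / nucleus_mhz


offset_hz = 0.5 * J_NH_HZ                       # TROSY shift: half of 1J(NH)
shift_1h = hz_to_ppm(offset_hz, PROTON_MHZ)
shift_15n = hz_to_ppm(offset_hz, PROTON_MHZ * XI_15N)
print(f"1H:  {shift_1h:.4f} ppm")    # 1H:  0.0489 ppm
print(f"15N: {shift_15n:.4f} ppm")   # 15N: 0.4831 ppm
```

The 15N correction is roughly ten times larger in ppm than the 1H correction because the 15N Larmor frequency is about one tenth of the proton frequency.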

Extent of assignments and data deposition

Despite the size of the construct (more than 236 residues), the 15N-1H TROSY-HSQC spectrum of mCherry is well resolved (Fig. 1), and the sensitivity of the 3D BEST-TROSY spectra was adequate for the assignment. Strong signals in the H(NCA)NH and H(NCOCA)NH spectra helped in assigning the more dynamic regions, including the termini.

Fig. 1

15N-1H TROSY-HSQC NMR spectrum of mCherry with residue-specific assignments

Residue numbering (“_Atom_chem_shift.Auth_seq_ID”) in the BMRB deposition (ID 51489) follows that of the crystal structure (PDB code 2h5q), which runs from Met(-4) to Lys231. The cyclized residues Met-Tyr-Gly are collectively numbered 66, and the sequence continues with Ser69. In the crystal structure the N-terminal region is disordered and its first residues are not observed. The NMR spectra reveal 6 N-terminal residues (Ser(-2) to Asp3) that are missing from the electron density map, and their chemical shifts confirm that this region is disordered. Similarly, the last eight residues are not observed in the crystal structure but are assigned here, and their chemical shifts likewise indicate disorder. Residues 66–76, which include the chromophore, could not be assigned, nor could residues 37, 53–54, 80–83, 130 and 221–222. The numbers of assigned nuclei are: 1H (199), 15N (199), 13Cα (211), 13Cβ (187) and 13C’ (210). Excluding the His-tag, the protein sequence contains 234 amino acids, of which 26 are Gly (which have no 13Cβ) and 12 are Pro (which lack an amide proton and therefore yield no 1H or 15N assignments in the triple-resonance experiments). The corresponding completeness values for these nuclei are 86%, 86%, 90%, 90% and 90%, respectively.
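The completeness values follow from dividing the number of assigned nuclei by the number of residues that can give each resonance. The sketch below uses only the counts stated above (234 residues, 26 Gly, 12 Pro). With these denominators the three 13C values reproduce the quoted 90%, whereas the amide (1H, 15N) ratios evaluate to about 90% (199/222); a denominator of roughly 231 residues would be needed to obtain 86%.

```python
N_RESIDUES = 234   # sequence length excluding the His-tag, as stated in the text
N_GLY = 26         # Gly: no 13Cβ
N_PRO = 12         # Pro: no amide 1H, hence no 1H/15N in HN-detected spectra

assigned = {"1H": 199, "15N": 199, "13Ca": 211, "13Cb": 187, "13C'": 210}

# Number of residues that could, in principle, yield each resonance
possible = {
    "1H": N_RESIDUES - N_PRO,
    "15N": N_RESIDUES - N_PRO,
    "13Ca": N_RESIDUES,
    "13Cb": N_RESIDUES - N_GLY,
    "13C'": N_RESIDUES,
}

# Percentage of assignable nuclei that were actually assigned
completeness = {k: 100.0 * assigned[k] / possible[k] for k in assigned}
for nucleus, pct in completeness.items():
    print(f"{nucleus:5s} {pct:5.1f}%")
```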

An initial secondary structure analysis was obtained with TALOS+ (Shen et al. 2009) and compared in Fig. 2 with the classification derived by STRIDE (Heinig and Frishman 2004) from the crystal structure. The two show good overall agreement.
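One simple way to quantify this agreement is the fraction of residues assigned to the same reduced class. The sketch below is a hypothetical illustration: it assumes both outputs have already been converted to one character per residue on a common numbering, with '-' for residues lacking a prediction, and it reduces the codes to three states using one common convention (H/G/I to helix, E/B/b to strand, anything else to coil). The function names and example strings are made up, and the TALOS+-style H/E/L codes are an assumption.

```python
# Reduce secondary-structure codes to three states (one common convention)
HELIX = set("HGI")
STRAND = set("EBb")


def reduce3(code: str) -> str:
    if code == "-":
        return "-"                  # no prediction / unassigned residue
    if code in HELIX:
        return "H"
    if code in STRAND:
        return "E"
    return "C"                      # turns, bends, loops, coil


def ss_agreement(pred: str, ref: str) -> float:
    """Fraction of residues with the same 3-state class, skipping '-' positions."""
    if len(pred) != len(ref):
        raise ValueError("strings must share the same residue numbering")
    pairs = [(reduce3(p), reduce3(r)) for p, r in zip(pred, ref)]
    pairs = [(p, r) for p, r in pairs if p != "-" and r != "-"]
    if not pairs:
        return float("nan")
    return sum(p == r for p, r in pairs) / len(pairs)


# Made-up example: TALOS+-style (H/E/L) vs STRIDE-style codes
print(ss_agreement("LLEEEE-HHHL", "TTEEECCHHGT"))
```

Residues such as the unassigned chromophore region would carry '-' in the NMR-derived string and are simply left out of the comparison.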

Fig. 2

(a) Secondary structure of mCherry predicted by TALOS+ from the present NMR chemical shift assignments. Blue regions denote β-strands and red regions α-helices. (b) Secondary structure of mCherry derived by STRIDE from the crystal structure (PDB code 2h5q). Blue regions denote β-strands and red regions α-helices

Although canonical helix and strand conformations are readily detected by programs such as TALOS+, the combination of 1H, 15N, 13Cα, 13Cβ and 13C’ chemical shifts contains more information about backbone conformation. The program CheSPI (Nielsen and Mulder 2021) can recover the eight secondary structure classes defined in the Dictionary of Protein Secondary Structure (DSSP). A CheSPI analysis of mCherry is shown in Fig. 3. The top panel (a) shows the much richer structural classification, in which colors depend on backbone geometry and structural context (see legend). The heights of the bars (CheZOD scores) reflect the dynamic information content of the chemical shifts: a value of 8 marks the border between order and disorder, and values below 3 indicate ‘random coil’ dynamic averaging. As the central panel (b) shows, the canonical secondary structure elements are well reproduced, although some regions deviate from them. The bottom panel (c) shows a summary in which coil (grey line), turn (green arc) and 3₁₀-helices (magenta squiggle) are identified in addition to α-helices (red squiggle) and β-strands (blue arrow).
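The CheZOD thresholds quoted above can be used to flag ordered and disordered stretches along the sequence, for example the termini. The sketch below applies them to a list of per-residue scores. The function names, the treatment of a score of exactly 8 as ordered, the label 'disordered' for scores between 3 and 8, and the example scores are assumptions for illustration, not CheSPI output.

```python
from itertools import groupby
from typing import List, Tuple


def classify(z: float) -> str:
    """Coarse label for a CheZOD score, using the thresholds quoted in the text."""
    if z >= 8.0:
        return "ordered"
    if z >= 3.0:
        return "disordered"        # below the order/disorder border
    return "random coil"           # fully dynamically averaged


def segments(residues: List[int], scores: List[float]) -> List[Tuple[int, int, str]]:
    """Merge runs of adjacent list entries sharing a label into (first, last, label).

    Runs are formed by list position, so a gap in residue numbering
    (e.g. unassigned residues) does not split a run.
    """
    labelled = zip(residues, (classify(z) for z in scores))
    runs = []
    for label, group in groupby(labelled, key=lambda item: item[1]):
        members = [res for res, _ in group]
        runs.append((members[0], members[-1], label))
    return runs


# Made-up scores for an N-terminal stretch
print(segments([-2, -1, 0, 1, 2, 3, 4, 5],
               [1.0, 2.0, 2.5, 4.0, 6.0, 9.0, 12.0, 13.0]))
```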

Fig. 3

CheSPI analysis of mCherry. (a) On the CheSPI color scale, well-formed strands and helices are shown in blue and red, respectively, while coil color depends on context; turns are shown in green, and disordered (‘random coil’) residues in grey. Hues change from red through orange to yellow at the C-terminal ends of helices, and to green at the ends of β-strands. For a more comprehensive explanation of the PCA underlying the CheSPI colors, see Nielsen and Mulder (2021). (b) Stacked bar plot of CheSPI populations of “extended” (blue), “helical” (red), “turn” (green) and “non-folded” (grey) local structures. (c) CheSPI DSSP-8 assignment. Cartoon of the most confident CheSPI prediction: coil (grey line), turn (green arc), 3₁₀-helix (magenta squiggle), α-helix (red squiggle), β-strand (blue arrow)