Full NMR assignment, revised structure and biosynthetic analysis for the capsular polysaccharide from Streptococcus pneumoniae serotype 15F

Upon investigation of the Streptococcus pneumoniae serotype 15F capsular polysaccharide (CPS), we discovered that its phosphorylation substituent differs from the one originally reported: it is glycerol-2-phosphate, as in the other serogroup 15 CPS, rather than 0.2 equivalents of phosphate or phosphocholine. Furthermore, we determined the locations of the two previously unassigned O-acetyl groups present in the repeating unit of the 15F CPS, and carried out full NMR assignments of the 15F and 15A CPS. Lastly, a biosynthetic analysis of serotypes 15F and 15A was performed and used to predict the structure of the recently discovered serotype 15D.


Introduction
Streptococcus pneumoniae is a major human pathogen, estimated to be responsible for over one million deaths annually [1]. It is a Gram-positive bacterium encapsulated by a polysaccharide, known as the capsular polysaccharide (CPS), which shields it from the host immune system. Many different serotypes of Streptococcus pneumoniae exist, each eliciting a distinct serological response because of its unique CPS. To date, more than 100 serotypes have been identified [2][3][4][5].
NMR spectroscopy is an important tool for the elucidation of CPS structures, as knowledge of the structures helps to understand how the biosynthesis works and how the pathogen is evolving to circumvent the high vaccine pressure. A common method for identifying the serotype of a strain isolated from a patient is PCR and subsequent genetic analysis; however, this method cannot always distinguish closely related serotypes, and consequently some strains are mistyped [4,6,7]. In the case of the recently discovered 15D, which was initially typed as 15A, closer analysis and Quellung reaction typing revealed it to be a unique serotype with an unknown CPS structure [4]. Similarly, when 7D was first encountered several years ago, the strain that produced the CPS was initially identified as 7C using Quellung reaction typing but as 7B using genetic analysis. NMR spectroscopy showed it to be a hybrid CPS with a 5:1 ratio between the CPS of 7C and 7B, arising when a 7B strain acquired a specific mutation at a crucial residue, F385L, in the glycosyltransferase wcwK, producing a bispecific transferase [7].
Serogroup 15 consists of five capsular polysaccharides, 15F, 15A, 15B, 15C and the recently described 15D [4]. At least four of the serotypes, those with published CPS structures, have a repeating unit composed of the same five monosaccharide units [8][9][10], as shown in Fig. 1. Of these four structures, 15F stands out in two ways. Firstly, unlike 15A, 15B and 15C, which are reported as having a glycerol-2-phosphate substituent on 70% of the repeats, 15F is reported as having phosphate or phosphocholine on 20% of the repeats. Secondly, 15F has two OAc groups, whose locations were previously unknown, whereas 15A and 15C have none and 15B has 0.85 OAc per repeating unit [8][9][10]. The phosphate substitution was previously questioned by Jones and Lemercinier (2005) [10], who also suggested that the 15F CPS should be further investigated. The presence of phosphate or phosphocholine in 20% of the repeating units seemed at odds both with the other serotypes in serogroup 15 and with Streptococcus CPS in general, as no other reported CPS structure contained such a substitution. Additionally, the unknown locations of the O-acetyl groups warranted further study. Furthermore, to our knowledge neither the 15F nor the 15A CPS has previously been fully assigned by NMR spectroscopy. The opportunity to improve and update the knowledge of serogroup 15 CPS and CPS biosynthesis served as the inspiration for this study.

Sample preparation
The purified CPS from serotypes 15F and 15A were produced by SSI Diagnostica, Hillerød, Denmark. They were dissolved in 600 μL D₂O (99.9%; Sigma Aldrich). Ultrasonication and de-O-acetylation of 15F were carried out as previously described [11].

NMR spectroscopy
The 15F native NMR data were recorded on a 600 MHz Bruker Avance (600.17 MHz for ¹H and 150.91 MHz for ¹³C) and the remaining NMR data were recorded on an 800 MHz Bruker Avance III (799.85 MHz for ¹H and 201.12 MHz for ¹³C) equipped with a 5 mm TCI cryoprobe. The strong teichoic acid (CWPS) phosphocholine methyl signal was used as internal reference (3.200 and 54.50 ppm for ¹H and ¹³C, respectively) [12]. All data were acquired at 313 K. The following 2D experiments were used for the structural analysis: double quantum filtered correlated spectroscopy (DQF-COSY), total correlation spectroscopy (TOCSY) with […] With the exception of CLIP-HSQC [13], all experiments were performed using standard Bruker pulse sequences. The NMR data were acquired and processed using Bruker Topspin 4.0.7.
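As a minimal sketch of the internal referencing step, assuming the shift axis is corrected by a single constant offset, the following Python snippet moves a list of shifts so that the CWPS phosphocholine methyl signal falls at its reference position. The function name and the uncorrected shift values are hypothetical placeholders for illustration, not data from this work; only the reference values of 3.200 and 54.50 ppm come from the text.

```python
# Reference positions of the CWPS phosphocholine methyl signal (from the text).
REF_1H_PPM = 3.200
REF_13C_PPM = 54.50


def reference_shifts(shifts_ppm, observed_ref_ppm, true_ref_ppm):
    """Apply the constant offset that places the reference signal at its
    expected position (assumption: one offset is valid for the whole axis)."""
    offset = true_ref_ppm - observed_ref_ppm
    return [round(shift + offset, 3) for shift in shifts_ppm]


# Hypothetical uncorrected 1H shifts; the first entry is the reference signal.
uncorrected = [3.215, 4.515]
corrected = reference_shifts(uncorrected, observed_ref_ppm=3.215,
                             true_ref_ppm=REF_1H_PPM)
print(corrected)
```

The same function can be applied to the ¹³C axis by passing `REF_13C_PPM` as `true_ref_ppm`.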

Characterization of 15F CPS
As the 15F CPS produced fairly broad lines in the NMR spectra (as can be seen in Fig. 2 and Figs. S1-S3), the sample was first sonicated to partially fragment the polysaccharide, which increases tumbling and narrows the linewidth, and then de-O-acetylated to further simplify the spectra. The spectra obtained from this sonicated and de-O-acetylated sample contained five anomeric signals not arising from the CWPS [12], which were labelled A-E in order of descending ¹³C chemical shift as shown in Fig. 3. Four of these five signals had chemical shifts, as well as ¹J(H1,C1) [14] and ³J(H1,H2) coupling constants, corresponding to the β-configuration, and one had values corresponding to the α-configuration. The sample still contained signals arising from one acetyl group as part of the CPS, which was identified as originating from a β-GlcpNAc. The assignment of the five monosaccharide units was performed using HSQC, HMBC, H2BC, HSQC-TOCSY, DQF-COSY and TOCSY (shown in Fig. 3 and Figs. S10-S14), and the glycosidic linkages were identified using HMBC and chemical shift analysis [15,16]. The resulting assignment of the sonicated and de-O-acetylated 15F CPS is shown in Table 1. HMBC correlations between monosaccharides were observed from the anomeric position of the β-Glcp A to the 3-position of the α-Galp E, which in turn had HMBC correlations from its anomeric position to the 2-position of the β-Galp D. The anomeric position of the β-Galp D had HMBC correlations to the 4-position of the β-GlcpNAc C, the anomeric position of which in turn had correlations to the 3-position of the β-Galp B. Finally, the anomeric position of the β-Galp B had HMBC correlations with the 4-position of the β-Glcp A. This is in agreement with the reported monosaccharide compositions and linkages of the CPS of serotypes 15F and 15A [8,9].
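The chain of inter-residue HMBC correlations described above can be followed programmatically to write out the backbone of the repeating unit. The Python sketch below is illustrative only: the data structure and the function name are hypothetical, the residue labels and linkage positions are those stated in the text, and the glycerol-2-phosphate and O-acetyl substituents are deliberately left out.

```python
# Inter-residue HMBC correlations from the text:
# (donor residue, acceptor residue, acceptor position).
HMBC_LINKAGES = [
    ("A", "E", 3),
    ("E", "D", 2),
    ("D", "C", 4),
    ("C", "B", 3),
    ("B", "A", 4),
]

# Residue identities as labelled in the text.
RESIDUES = {
    "A": "β-Glcp",
    "B": "β-Galp",
    "C": "β-GlcpNAc",
    "D": "β-Galp",
    "E": "α-Galp",
}


def repeating_unit(linkages, residues, start):
    """Walk the donor -> acceptor chain once around and return the
    repeating unit in condensed linear notation."""
    next_residue = {donor: acceptor for donor, acceptor, _ in linkages}
    substituted_at = {acceptor: pos for _, acceptor, pos in linkages}
    parts = []
    current = start
    for _ in range(len(residues)):
        parts.append(f"→{substituted_at[current]})-{residues[current]}-(1")
        current = next_residue[current]
    if current != start:
        raise ValueError("linkages do not close into one repeating unit")
    return "".join(parts) + "→"


print(repeating_unit(HMBC_LINKAGES, RESIDUES, start="E"))
```

Starting the walk at residue E is an arbitrary choice; any other starting residue gives a cyclic permutation of the same linear pentasaccharide repeat.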
Following this assignment, the 2D NMR data obtained on the sonicated (Figs. S4-S9) as well as native 15F CPS (Figs. S1-S3) samples were investigated. Of the five anomeric signals, only the one arising from the α-Galp E was observed to be significantly impacted by the de-O-acetylation, as the acetylation causes it to move upfield in ¹³C and downfield in ¹H chemical shift. The other four anomeric signals were not particularly affected by the acetylation. Of the remaining signals, only those arising from the α-Galp E and the β-Galp B were affected significantly by the presence of the two OAc groups, in particular the signals arising from the 6-positions of both and the 5-position of the α-Galp E. This indicates that the OAc groups are on the 6-positions of B and E, which was confirmed by HMBC correlations from the 6-positions to the carbonyl of the OAc groups. The full assignment of the native and sonicated 15F CPS can be found in Table 1. The HMBC correlations between the monosaccharide units are the same as reported above for the sonicated and de-O-acetylated 15F CPS. The revised structure of serotype 15F CPS is shown in Fig. 1.
There do not seem to be any signals that would arise from a non-phosphorylated variant of the repeating unit, which suggests that it is fully substituted with glycerol-2-phosphate. This is in line with the observations made previously for serotype 15B [10,20]. Interestingly, 15B was also initially reported as containing phosphocholine in 20% of the repeating units [21], but was later found to contain stoichiometric amounts of glycerol-2-phosphate upon full NMR assignment [10]. The presence of glycerol-2-phosphate in stoichiometric amounts is at odds with the results published almost 40 years ago [8,20], and while quite a few examples of Streptococcus pneumoniae CPS that contain phosphocholine do exist, such as serotypes 16F, 16A, 24A, 27, 28F, 28A, 32A and 32F, none of them is reported as only partially substituted [5,11,19,22,23]. The earlier observation of 20% phosphocholine could perhaps be explained by the presence of phosphocholine in the CWPS, as also suggested by Jones and Lemercinier (2005) [10,12].

Characterization of 15A CPS
To our knowledge, no full NMR assignment of the 15A CPS has been published. As it was needed for comparison with the spectra of the 15F CPS and to confirm the presence of glycerol-2-phosphate, the 15A CPS, which should be identical except for the lack of OAc groups, was also investigated. As expected, the obtained spectra (Figs. 2, 4 and S15-S16) and consequently the assignment (Table 2) were practically identical to those of the sonicated and de-O-acetylated 15F CPS, confirming that the 15A and 15F CPS differ only in the lack of O-acetylation in 15A. As observed for the 15F CPS, there was no indication in the NMR data that the glycerol-2-phosphate was not present in stoichiometric amounts, which fits with the glycerol amount reported for 15A by Venkateswaran et al. (1983) [20].
It should be noted that some studies have reported stoichiometric amounts of glycerol-2-phosphate in the repeating unit [10,20], and others 70% for the serogroup 15 CPS [8,9].
Another distinctive feature of the serogroup 15 CPS is the variable O-acetylation. The cps loci of serogroup 15 all contain a wciZ acetyltransferase gene. However, wciZ in serotypes 15A and 15C is inactive because of a frameshift, caused by deletion or insertion of TA units, that results in a truncated product [24]. WciZ in 15F shows 99.08% amino acid identity to that of 15B. Based on the CPS structure of 15B, WciZ transfers O-acetyl groups to the 2-, 3-, 4- and 6-positions of the terminal α-Gal in a ratio of 6:12:12:55 [10]. According to the 15F CPS structure determined in this paper, WciZ […] 15F has an extra wcjE gene, which can be found in the cps loci of 14 serotypes: 9V, 11A, 11D, 11F, 15F, 20, 31, 33A, 35A, 35C, 42, 43, 47A, and 47F [25,26]. According to the 15F CPS structure determined in this work, 15F WcjE mediates β-Gal 6-O-acetylation, which is consistent with the previously proposed function of WcjE [26]. Additionally, the serotype 15F cps locus contains partial rhamnose synthetic genes (rmlB and rmlD) and a truncated UDP-galactofuranose synthetic gene (glf). However, none of them serves any apparent function in CPS biosynthesis, as there is neither rhamnose nor galactofuranose in the 15F CPS structure. Lastly, it was recently reported that the new serotype 15D contains a Wzy polymerase and a WciZ O-acetyltransferase highly similar to those of 15F, but lacks a WcjE O-acetyltransferase [4]. Therefore, the 15D CPS structure can be predicted to be highly similar to the 15F CPS, but without the 6-OAc at the β-Galp, as shown in Fig. 5.
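The biosynthetic reasoning above can be summarised in a short Python sketch. It rests on two assumptions that are labelled in the code: that the 6:12:12:55 ratio for 15B is expressed in percent of repeating units, and that an active WciZ accounts for the α-Galp 6-OAc in the 15F-type structure (inferred here from the two OAc sites found in this work and the assignment of the β-Gal site to WcjE). The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the 15B ratio is given in percent of repeating units,
# keyed by the O-acetylated position of the terminal α-Gal.
WCIZ_15B_RATIO = {2: 6, 3: 12, 4: 12, 6: 55}
oac_per_repeat_15b = sum(WCIZ_15B_RATIO.values()) / 100


@dataclass(frozen=True)
class CpsLocus:
    wciz_active: bool  # False when the frameshift truncates the product
    has_wcje: bool


# Gene content as described in the text.
LOCI = {
    "15F": CpsLocus(wciz_active=True, has_wcje=True),
    "15A": CpsLocus(wciz_active=False, has_wcje=False),
    "15D": CpsLocus(wciz_active=True, has_wcje=False),
}


def predicted_o_acetylation(locus):
    """Predict O-acetyl sites on the 15F-type repeating unit."""
    sites = []
    if locus.wciz_active:
        # Assumption: WciZ is responsible for the α-Galp 6-OAc.
        sites.append("α-Galp 6-OAc")
    if locus.has_wcje:
        # From the text: WcjE mediates β-Gal 6-O-acetylation.
        sites.append("β-Galp 6-OAc")
    return sites


for serotype, locus in LOCI.items():
    print(serotype, predicted_o_acetylation(locus))
print("15B OAc per repeating unit:", oac_per_repeat_15b)
```

Under these assumptions the ratio sums to 0.85 OAc per repeating unit, matching the value quoted for 15B in the Introduction, and the 15D prediction retains only the O-acetyl group on the α-Galp.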

Conclusion
The structure of the Streptococcus pneumoniae serotype 15F CPS was fully assigned using NMR spectroscopy and revised. It turned out to be even more similar to 15A than previously described, differing only by the presence of two OAc groups. This also brings the phosphorylation of 15F in line with what is reported for the other serogroup 15 CPS. The 15A CPS was likewise fully assigned by NMR, as this had to our knowledge not been done previously and was needed to verify the 15F structural analysis. This also enabled a prediction of the CPS structure of the recently discovered serotype 15D based on the reported genetics. This study further highlights the need to use NMR spectroscopy to characterize CPS structures and understand why they differ genetically and serologically, as well as the need to fully elucidate the structures to understand the biosynthesis.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Fig. 1. The reported structures of serogroup 15 CPS. The acetyl groups of 15F have been assigned as part of this work, as their locations were previously unassigned. R is 20% phosphate or phosphocholine according to Perry et al. (1982), or stoichiometric glycerol-2-phosphate as described in this work.

Fig. 3. HSQC spectrum of the sonicated and de-O-acetylated 15F sample. The positions arising from the CPS repeating unit have been labelled as described in the text. The anomeric signals marked with * arise from the CWPS.

Fig. 4. HSQC spectrum of sonicated 15F CPS (blue/dark green) overlaid with native 15A (red/pink). The positions connected with a line are those heavily affected by O-acetylation.


Table 2
NMR assignment of 15A CPS.