A Plastic Biosynthetic Pathway for the Production of Structurally Distinct Microbial Sunscreens

Mycosporine-like amino acids (MAAs) are small, colorless, and water-soluble secondary metabolites. They have high molar extinction coefficients and a unique UV radiation absorption mechanism that make them effective sunscreens. Here we report the discovery of two structurally distinct MAAs from the lichen symbiont strain Nostoc sp. UHCC 0926. We identified these MAAs as aplysiapalythine E (C23H38N2O15) and tricore B (C34H53N4O15) using a combination of high-resolution liquid chromatography–mass spectrometry (HR-LCMS) analysis and nuclear magnetic resonance (NMR) spectroscopy. We obtained an 8.3 Mb complete genome sequence of Nostoc sp. UHCC 0926 to gain insight into the genetic basis for the biosynthesis of these two structurally distinct MAAs. We identified MAA biosynthetic genes encoded in three separate locations in the genome. The organization of the biosynthetic enzymes in Nostoc sp. UHCC 0926 necessitates a branched biosynthetic pathway to produce the two structurally distinct MAAs. We detected such discontiguous MAA biosynthetic gene clusters in 12% of the publicly available complete cyanobacterial genomes. Bioinformatic analysis of public MAA biosynthetic gene clusters suggests that they are subject to rapid evolutionary processes, resulting in highly plastic biosynthetic pathways that are responsible for the chemical diversity of this family of microbial sunscreens.
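HR-LCMS identification rests on comparing measured m/z values with values calculated from candidate molecular formulas. A minimal sketch of that calculation follows, assuming standard monoisotopic element masses and singly protonated [M+H]+ ions. The function names are illustrative, and the parser handles only plain formulas such as C23H38N2O15 (no brackets, charges, or isotope labels).

```python
import re

# Standard monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
}
PROTON_MASS = 1.00727646  # u; added once for a singly protonated [M+H]+ ion


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a plain formula such as 'C23H38N2O15'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return mass


def protonated_mz(formula: str) -> float:
    """m/z of the [M+H]+ ion (charge 1)."""
    return monoisotopic_mass(formula) + PROTON_MASS


formula = "C23H38N2O15"  # aplysiapalythine E, as given in the abstract
print(f"M = {monoisotopic_mass(formula):.4f} u, [M+H]+ m/z = {protonated_mz(formula):.4f}")
```

For C23H38N2O15 this gives M ≈ 582.2272 u and an [M+H]+ m/z of ≈ 583.2345.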


Table of Contents
Table S1. Substructures, retention times (tR), and calculated m/z values of the protonated MAA variants 1–22 from Nostoc sp. UHCC 0926. The difference between the measured and calculated m/z values (Δ, in parts per million (ppm); shown in red if |Δ| > 5 ppm) and the peak areas of the variants (relative areas in %) are also given. The structures of aminocyclohexenone (ACH) and aminocyclohexenimine (ACHI) are shown at the top of the table. The values corresponding to tricore B (18) and aplysiapalythine E (19) are in bold; these make up 17.74% and 52.08% of the total MAA intermediates, respectively. Intermediate structures included in the biosynthetic scheme (Figure 2) are highlighted with a grey background.
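The Δ column expresses the mass error relative to the calculated value. A minimal sketch, assuming Δ is taken as measured minus calculated (the sign convention is an assumption) and using the ±5 ppm threshold stated above; the m/z pair in the example is hypothetical, for illustration only.

```python
def ppm_error(measured_mz: float, calculated_mz: float) -> float:
    """Mass error in ppm, taken as (measured - calculated) / calculated * 1e6."""
    return (measured_mz - calculated_mz) / calculated_mz * 1e6


def exceeds_tolerance(delta_ppm: float, tolerance_ppm: float = 5.0) -> bool:
    """True when |delta| is larger than the tolerance (flagged red in Table S1)."""
    return abs(delta_ppm) > tolerance_ppm


# Hypothetical measured/calculated pair, for illustration only.
delta = ppm_error(measured_mz=583.2360, calculated_mz=583.2345)
print(f"delta = {delta:.2f} ppm, flagged: {exceeds_tolerance(delta)}")
```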

NMR Parameters
To assign the 1H and 13C resonances of the MAAs, a set of two-dimensional homonuclear and heteronuclear NMR experiments was collected in addition to conventional 1H presaturation and 13C{1H} experiments. Total correlation spectroscopy (TOCSY, mixing time 90 ms) and double-quantum filtered correlation spectroscopy (DQF-COSY) experiments were used to identify spin systems. Heteronuclear single quantum coherence (13C HSQC and multiplicity-edited 13C HSQC) and heteronuclear multiple bond correlation (13C HMBC) experiments were used to assign one-bond and multiple-bond 1H-13C connectivities.

The 1H experiment, with presaturation of the residual water signal during the T1 recovery delay (2 s), was collected with 32k complex points, corresponding to an acquisition time of 1.28 s. The signal was accumulated over 4 scans. The 13C{1H} spectrum was collected using 96k complex points, corresponding to an acquisition time of 1 s. The recycle delay was 3 s, during which gated 1H decoupling was applied. The two-dimensional TOCSY spectrum was collected with 200 and 4100 complex points in t1 and t2, resulting in acquisition times of 15.6 ms and 320 ms, respectively. The TOCSY spectrum was measured with an isotropic mixing time of 90 ms using a DIPSI-2 spin-lock. The number of transients was 8, and the T1 recovery delay was set to 1 s. The two-dimensional DQF-COSY spectrum was collected using 300 and 2048 complex points in t1 and t2, corresponding to acquisition times of 18.7 ms and 128 ms, respectively.
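The reported point counts and acquisition times are linked through the dwell time: with quadrature detection each complex point takes 1/SW seconds, so t_acq = N/SW and SW = N/t_acq. The sketch below back-calculates the spectral widths implied by the reported 1H, TOCSY, and DQF-COSY parameters. These widths are derived here, not reported in the text, and reading "32k" as 32 × 1024 points is an assumption.

```python
def spectral_width_hz(complex_points: int, acq_time_s: float) -> float:
    """Spectral width implied by N complex points over acq_time_s (SW = N / t_acq)."""
    return complex_points / acq_time_s


def acquisition_time_s(complex_points: int, spectral_width: float) -> float:
    """Acquisition time for N complex points at a given spectral width (t_acq = N / SW)."""
    return complex_points / spectral_width


# (experiment, dimension, complex points, acquisition time in s) as reported above.
# Assumption: "32k" means 32 * 1024 points.
reported = [
    ("1H", "t2", 32 * 1024, 1.28),
    ("TOCSY", "t1", 200, 15.6e-3),
    ("TOCSY", "t2", 4100, 320e-3),
    ("DQF-COSY", "t1", 300, 18.7e-3),
    ("DQF-COSY", "t2", 2048, 128e-3),
]
for experiment, dim, n, t_acq in reported:
    print(f"{experiment:9s} {dim}: implied SW = {spectral_width_hz(n, t_acq):8.0f} Hz")
```

For TOCSY and DQF-COSY, the widths implied by t1 and t2 agree to within about 0.3%, consistent with both dimensions spanning the same 1H range.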
The 13C HSQC experiments, with and without multiplicity editing and using a gradient-enhanced sensitivity-improvement scheme, were collected using 2 transients with 512 (400) and 2048 complex points in t1 and t2, respectively; values in parentheses refer to the experiment without multiplicity editing. These correspond to acquisition times of 7 (5.5) ms and 160 (92) ms, respectively. A T1 recovery delay of 1.5 s was used. The 13C HMBC spectrum was collected using 1024 and 4096 points in t1 and t2, respectively, corresponding to acquisition times of 11 ms and 180 ms. The signal was accumulated over 12 transients with a recycle delay of 1.18 s. The long-range transfer delay was optimized for a long-range coupling constant of 8 Hz.
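HMBC long-range delays are conventionally specified through the coupling constant they are tuned to: the delay is set to 1/(2·nJCH), which maximizes the sin(π·J·Δ) transfer term. For nJCH = 8 Hz this gives 62.5 ms. The sketch below ignores relaxation and the details of the actual pulse program, and the function names are illustrative.

```python
import math


def hmbc_long_range_delay_s(j_hz: float) -> float:
    """Evolution delay 1/(2J) that maximizes sin(pi * J * delay) for a coupling J."""
    return 1.0 / (2.0 * j_hz)


def transfer_efficiency(j_hz: float, delay_s: float) -> float:
    """Relative transfer amplitude |sin(pi * J * delay)|, relaxation ignored."""
    return abs(math.sin(math.pi * j_hz * delay_s))


delay = hmbc_long_range_delay_s(8.0)  # 0.0625 s = 62.5 ms
# Couplings away from 8 Hz are transferred less efficiently with this delay.
for j in (2.0, 4.0, 8.0, 12.0):
    print(f"J = {j:4.1f} Hz -> relative transfer {transfer_efficiency(j, delay):.2f}")
```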

Figure S2. MSE (E: elevated collision energy) spectra of the extracted ion chromatogram peaks representing MAA variants 1–11 presented in Figure S1. Dotted lines with arrowheads show the magnified ranges of the spectra, and the ×-number gives the magnification. The fragment at m/z 151.08659, marked with an arrow, is diagnostic for MAAs.

Figure S3. MSE (E: elevated collision energy) spectra of the extracted ion chromatogram peaks representing MAA variants 12–22 presented in Figure S1. Dotted lines with arrowheads show the magnified ranges of the spectra, and the ×-number gives the magnification. The fragment at m/z 151.08659, shown in the left panel, is diagnostic for MAAs.

Figure S7. Partly annotated 1H-1H COSY spectrum of the aplysiapalythine E (19) sample in D2O, with an enlargement of area B.

Figure S9. Partly annotated multiplicity-edited 1H-13C HSQC spectrum of the aplysiapalythine E (19) sample in D2O, with an enlargement of area B.

Figure S10. 1H-13C HMBC spectrum of the aplysiapalythine E (19) sample in D2O, with enlargements of areas B and C.

Figure S11. Partial 1H spectra of the acid hydrolysate of aplysiapalythine E (19) in 2 M D2SO4 (in D2O), showing that the anomeric proton signals match those of the reference compounds listed in the table (reference values from Giner et al., J Nat Prod 2016, 79, 2413-2417).

Figure S12. Product ions of aplysiapalythine E (19) from the MSE spectrum. The line with arrowheads shows the magnified range of the spectrum, and the red ×-number gives the magnification.

Figure S19. Maximum likelihood phylogenetic tree constructed using MysA amino acid sequences. The clade containing the MysA of Nostoc sp. UHCC 0926 is highlighted with an orange box. Support values are based on 1000 bootstrap replicates.

Figure S21. Maximum likelihood phylogenetic tree constructed using MysB amino acid sequences together with the additional methyltransferases identified in the MAA biosynthetic gene clusters. The clade containing the MysB of Nostoc sp. UHCC 0926 is highlighted in yellow. Support values are based on 1000 bootstrap replicates.

Figure S23. Maximum likelihood phylogenetic tree constructed using MysD amino acid sequences. The clade containing the MysD of Nostoc sp. UHCC 0926 is highlighted in blue. Support values are based on 1000 bootstrap replicates.

Figure S24. Maximum likelihood phylogenetic tree constructed from an alignment of the amino acid sequences of MysH enzymes identified within the 10 kb regions flanking the MAA biosynthetic gene clusters (BGCs). Support values are based on 1000 bootstrap replicates.

Table S3. Product ion data from the MSE spectrum of the chromatographic peak of aplysiapalythine E (19). Codes refer to the aplysiapalythine E ions marked in Figure S3. Δ = difference between the calculated (Calc) and experimental (Exp) ion masses in parts per million (ppm).

Table S4. NMR data of the 756 Da tricore B (18) sample in D2O.

Table S5. Sizes of the chromosomal and plasmid DNA in the complete genome assembly of Nostoc sp. UHCC 0926.

Table S6. BlastP hits for the MAA biosynthetic enzymes of Nostoc sp. UHCC 0926, based on max score values. (Size is given in amino acids; Sequence ID %: sequence identity percentage.)