Resolving Confusion Surrounding d‐Ala‐d‐Ala Ligase Catalysis in Cyanobacterial Mycosporine‐Like Amino Acid (MAA) Biosynthesis

Mycosporine‐like amino acids (MAAs) are natural UV‐absorbing sunscreens that evolved in cyanobacteria and algae to palliate the harmful effects of obligatory exposure to solar radiation. Multiple lines of evidence show that in cyanobacteria all MAAs are derived from mycosporine‐glycine, which is typically modified by an ATP‐dependent ligase encoded by the gene mysD. The function of the MysD ligase has been described experimentally, but the enzyme has been haphazardly named solely on the basis of sequence similarity to the d‐alanine‐d‐alanine ligase of bacterial peptidoglycan biosynthesis. Combining phylogeny with AlphaFold tertiary protein structure prediction unambiguously distinguished MysD from d‐alanine‐d‐alanine ligase. Renaming MysD as mycosporine‐glycine‐amine ligase (MG‐amine ligase), in accordance with recognised rules of enzyme nomenclature, is therefore proposed; the name reflects its relaxed specificity for several different amino acid substrates. The evolutionary and ecological context of MG‐amine ligase catalysis merits wider appreciation, especially when considering exploiting cyanobacteria for biotechnology, for example, to produce mixtures of MAAs with enhanced optical or antioxidant properties.


Naming of mysD Has Been an Ambiguous Affair
For over three decades, mycosporine-like amino acid (MAA) biosynthesis has been studied in cyanobacteria. [1,2] MAAs are photoprotective compounds produced by photosynthetic organisms living in aquatic ecosystems to mitigate the potentially harmful effects of obligatory exposure to solar UV radiation (UVR). [3] The precursor of all MAAs is 4-deoxygadusol (4-DG), which in cyanobacteria originates from two biogenic sources. [4] One route is derived from the pentose phosphate pathway, whereby 3-hydroxypyruvate and ribose-5-phosphate condense to give the starter unit sedoheptulose-7-phosphate (SH7-P). SH7-P is then converted to 4-DG by 2-epi-5-epi-valiolone synthase (encoded by mysA) and an O-methyltransferase (encoded by mysB). [5] The second route arises from the shikimic acid pathway, in which 3-deoxy-d-arabino-heptulosonate 7-phosphate (DAHP) is converted to 4-DG by 3-dehydroquinate synthase (encoded by aroB) and a SAM-dependent methyltransferase. [6] Experimental evidence indicates that in cyanobacteria both pathways converge at the methylation step (i.e., mysB). [4] In all cyanobacterial MAA biosynthetic pathways studied to date, 4-DG is next converted to mycosporine-glycine by an ATP-grasp enzyme. This enzyme is encoded by the gene mysC, which is located immediately downstream of mysA and mysB, forming a cluster with a colinear (cis) gene order. This cis architecture is always present in cyanobacteria capable of constitutive MAA biosynthesis. [7] In many cyanobacteria, mycosporine-glycine is further modified by a fourth gene within the colinear cluster, and this gene differs depending on the strain. Some species have an NRPS-type gene (mysE), [5] whilst others have a gene encoding an ATP-dependent ligase (mysD). [8] The presence of this fourth gene accounts, at least in part, for the large structural diversity of MAAs found in nature.
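The two entry routes to 4-DG and the downstream steps can be summarised as a small lookup table. The following Python sketch is illustrative only. The table encodes just the steps and genes named above. The shared demethylated intermediate is a placeholder label because it is not named in this text, and the alternative mysE branch is deliberately left out. Following each chain shows the two routes converging at the mysB methylation step.

```python
# Hypothetical sketch of the MAA pathway steps named in the text.
# The shared intermediate before methylation is a PLACEHOLDER label
# (assumption): this text does not name it.

PRECURSOR = "demethylated 4-DG precursor (placeholder)"

# metabolite -> (product, gene catalysing the step)
PATHWAY = {
    "SH7-P": (PRECURSOR, "mysA"),   # pentose phosphate route
    "DAHP": (PRECURSOR, "aroB"),    # shikimic acid route
    PRECURSOR: ("4-DG", "mysB"),    # both routes converge at methylation
    "4-DG": ("mycosporine-glycine", "mysC"),
    # Some strains use mysE here instead; only the mysD branch is modelled.
    "mycosporine-glycine": ("MAA", "mysD"),
}


def route(start):
    """Return (metabolites, genes) visited when following the chain from start."""
    metabolites, genes = [start], []
    current = start
    while current in PATHWAY:
        current, gene = PATHWAY[current]
        metabolites.append(current)
        genes.append(gene)
    return metabolites, genes


def convergence_point(start_a, start_b):
    """First metabolite shared by the routes beginning at start_a and start_b."""
    seen = set(route(start_a)[0])
    for metabolite in route(start_b)[0]:
        if metabolite in seen:
            return metabolite
    return None


if __name__ == "__main__":
    for start in ("SH7-P", "DAHP"):
        print(start, "->", " -> ".join(route(start)[1]))
    print("routes converge at:", convergence_point("SH7-P", "DAHP"))
```

Because the table is keyed by metabolite, adding a strain-specific fourth step (for example, a mysE branch) would only require an extra entry, not a change to the traversal.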
The mysD gene was first described in an MAA biosynthetic gene cluster from Nostoc punctiforme ATCC 29133. [8] In this cluster, mysD encodes an ATP-grasp-like protein with close primary amino acid sequence homology to Ddl (d-alanine-d-alanine ligase), which is involved in bacterial peptidoglycan biosynthesis. [9,10] However, the authors avoided naming the enzyme d-alanine-d-alanine ligase because the study focused solely on MAA biosynthesis. In peptidoglycan biosynthesis, Ddl is highly specific for alanine. Subsequent studies of MAA biosynthesis, particularly in cyanobacteria, have consistently and incorrectly designated the enzyme encoded by mysD (MysD) as a d-alanine-d-alanine ligase based only upon primary amino acid sequence homology to Ddl, even when alanine was not the preferred substrate incorporated into the final MAA structure. An arbitrary search of Google Scholar using the key terms "d-alanine-d-alanine ligase" and "mycosporines" revealed over 40 papers published within the last five years in which this incorrect naming appeared. Most of these MAA biosynthetic studies did not describe a reaction between mycosporine-glycine and alanine, further exacerbating this misplaced enzyme nomenclature. Even within a single cyanobacterial species, promiscuity abounds, with several amino acids potentially acting as substrates for MysD and thereby giving rise to multiple MAA products. [8,11-13] Biosynthesis of multiple products from the same biochemical pathway has been described before as a unique property that would allow adaptation to ever-changing environmental conditions. [14] Commercial development of MAAs might feasibly be achieved by exploiting the relaxed substrate specificity of MysD in fermentations that produce mixtures of MAAs with enhanced antioxidant or optical properties.
According to the enzyme nomenclature guidelines of the International Union of Biochemistry and Molecular Biology (IUBMB) and the International Union of Pure and Applied Chemistry (IUPAC), [15] an enzyme should be named according to the substrates of the reaction it catalyses (e.g., Ddl catalyses the synthesis of d-alanine-d-alanine from two molecules of d-Ala). Incorrectly designated enzyme names lead to aberrant terminology and misleading biochemistry that are perpetuated in the scientific literature. For example, MysD has previously been called "shinorine synthetase" [16] based solely on the biosynthesis of shinorine, an MAA formed by the reaction between mycosporine-glycine and serine. However, this terminology restricts the enzyme name to a single substrate (serine) when, in fact, reactions between mycosporine-glycine and at least six amino acids have been described, namely serine, threonine, alanine, cysteine, arginine and glycine. [8,11-13] This relaxed substrate specificity has been highlighted previously and presents a strong argument for renaming MysD. [11] Given the large volume of data available on the reported substrates of MysD (Scheme 1), the present naming of this enzyme in the context of MAA biosynthesis is clearly mistaken and causes considerable confusion. To represent the biochemical reactions catalysed by MysD correctly, the naming of this enzyme now needs to be resolved in accordance with IUBMB/IUPAC rules. Following these rules, we herein propose adopting the name "mycosporine-glycine-amine ligase" when annotating the protein encoded by mysD. IUPAC guidelines also state that abbreviations can form part of an enzyme name, so we also offer the short form MG-amine ligase. This new terminology clearly distinguishes MysD in MAA biosynthesis (encoded by the mysD gene of cyanobacterial clusters) from the d-Ala-d-Ala ligase (encoded by ddl) of peptidoglycan biosynthesis.
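The argument against a single-product name can be made concrete by listing the reported substrates alongside their products, as in Scheme 1. In this hypothetical Python sketch, only shinorine (serine) and porphyra-334 (threonine) are named, because those are the products this text names. Other products are left unlabelled rather than guessed. The helpers `reaction` and `coverage` are names introduced here for illustration. `coverage` shows that a serine-only name such as "shinorine synthetase" accounts for one of six reported substrates. The set difference confirms that 14 of the 20 proteinogenic amino acids are not among the reported substrates.

```python
# Hypothetical sketch: MysD substrates listed in the text and the MAA each yields.
# Only shinorine and porphyra-334 are named in this text; the other products
# are deliberately left unnamed (None) rather than guessed.

REPORTED_PRODUCTS = {
    "serine": "shinorine",
    "threonine": "porphyra-334",
    "alanine": None,
    "cysteine": None,
    "arginine": None,
    "glycine": None,
}

# The 20 standard proteinogenic amino acids.
PROTEINOGENIC = {
    "alanine", "arginine", "asparagine", "aspartic acid", "cysteine",
    "glutamine", "glutamic acid", "glycine", "histidine", "isoleucine",
    "leucine", "lysine", "methionine", "phenylalanine", "proline",
    "serine", "threonine", "tryptophan", "tyrosine", "valine",
}


def reaction(amino_acid):
    """Scheme-1-style reaction string for one reported substrate."""
    product = REPORTED_PRODUCTS[amino_acid] or f"MAA (mycosporine-glycine + {amino_acid})"
    return f"mycosporine-glycine + {amino_acid} -> {product}"


def coverage(named_substrates):
    """Fraction of reported substrates that a substrate-restricted name accounts for."""
    covered = set(named_substrates) & set(REPORTED_PRODUCTS)
    return len(covered) / len(REPORTED_PRODUCTS)


if __name__ == "__main__":
    for aa in REPORTED_PRODUCTS:
        print(reaction(aa))
    # 20 proteinogenic amino acids minus the 6 reported substrates
    print("not reported as substrates:", len(PROTEINOGENIC - set(REPORTED_PRODUCTS)))
    print("'shinorine synthetase' covers", coverage({"serine"}), "of reported substrates")
```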

The Consequence of mysD in Cyanobacterial MAA Biosynthesis
The proposed name correctly identifies the substrates of MysD as mycosporine-glycine and the amine group (-NH₂) of several amino acids. [13] Despite displaying relaxed specificity for multiple amino acid substrates, MG-amine ligase favours serine and threonine, two amino acids with polar uncharged side chains. This preference may explain why so many cyanobacterial strains produce either shinorine or porphyra-334. [17,18] An in vitro study of enzymatic parameters such as reaction time and rate of substrate consumption by MG-amine ligase from Nostoc linckia NIES-25 reported 40 % conversion of mycosporine-glycine (MG) in 8 min when threonine was supplied, yielding porphyra-334. [13] After threonine, serine was the most extensively consumed amino acid, with residual activities (< 1 % relative to the reaction with threonine) for cysteine, alanine, arginine and glycine. No product was detected when any of the remaining 14 proteinogenic amino acids was provided as substrate. The same preference for threonine (resulting predominantly in the biosynthesis of porphyra-334) was observed during constitutive and UV-induced MAA biosynthesis by Nodularia spumigena CENA596. [7] Conversely, in vivo experiments indicated that serine was the preferred substrate of MG-amine ligase, accounting for shinorine being the major MAA biosynthesised by Sphaerospermopsis torques-reginae ITEP-024 [7,19] and Microcystis aeruginosa PCC 7806. [20] The relaxed substrate specificity of MG-amine ligase may be considered an adaptive advantage, allowing MAA biosynthesis to proceed under amino acid-depleted conditions. [21]

Rafael B. Dextro has an MSc in Ecology and is currently undertaking research for his PhD describing cyanobacterial diversity in Brazilian environments. He received a Study Abroad grant to develop part of his research at King's College London.

Marli F. Fiore is Professor of Environmental Microbiology at the Centre for Nuclear Energy in Agriculture (CENA), University of São Paulo. Her research group, 'The CYANOS', is focused on exploring the taxonomy and genomics of cyanobacteria across Brazilian biomes, and the production of natural products by these bacteria. She has recently been elected a member of the International Committee on Systematics of Prokaryotes/Subcommittee on the Taxonomy of Phototrophic Bacteria.

Paul F. Long is a microbiologist by training and is currently Professor of Marine Biotechnology and Therapeutics at King's College London. His research group uses a combination of bioinformatics, field work and wet laboratory studies to discover new high value biotech products from nature, particularly from aquatic ecosystems.

Several MAA-related gene cluster architectures have been described in cyanobacteria, with the cis-mysABCD arrangement most frequent in strains capable of constitutive MAA biosynthesis. [7,8,11] Taking the cis arrangement as a prerequisite for MAA production in cyanobacteria, mining of the genomes deposited in the NCBI database found, surprisingly, that only 185 of 1568 genome sequences (11.8 %) encoded a cis-mysABCD cluster (Table S1 in the Supporting Information). In the remaining genome sequences that carried mys genes, these genes formed partial cis clusters (containing mysABC together with other mys genes such as mysE or mysH) or were scattered at different loci across the genome. This gene order arrangement was reminiscent of trans-acting enzymes in other biosynthetic pathways, for example, some modular polyketide synthases.
[22] However, recent experimental data on a small number of strains encoding trans-cluster gene arrangements unambiguously demonstrated that these strains failed to biosynthesise MAAs, even when exposed to UV radiation. [7] These data corroborated that MAA biosynthesis could not be complemented by the shikimic acid pathway, as had previously been shown in gene deletion mutants of Anabaena variabilis ATCC 29413 (which encodes a cis-mysABCD cluster). [4,6] Taken together, these observations arguably suggest that, unexpectedly, most (69.8 %) cyanobacterial strains for which draft genomes have been deposited in the NCBI database are likely MAA non-producers (without any form of mys gene cluster). The overwhelming number of cyanobacterial genomes in which the cis gene order was lost might be explained by events such as deletions induced by transposable elements [23] or by degeneration of the cluster from ancestral states. [24] Further descriptions of gene order arrangements implicated in MAA biosynthesis, supported by rigorous chemical analyses, are required to recognise the alternative photoprotective strategies that also exist in non-producing cyanobacteria.

Scheme 1. Reactions between mycosporine-glycine and various amino acids that act as substrates for mycosporine-glycine-amine ligase (MG-amine ligase, encoded by mysD).

ChemBioChem
Perspective doi.org/10.1002/cbic.202300158
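The distinction between cis, partial cis and scattered mys gene arrangements can be expressed as a simple rule over gene order. The Python sketch below is hypothetical and is not the mining pipeline used to build Table S1. It assumes each genome is supplied as a list of contigs, with each contig given as an ordered list of gene labels. It treats mysA to mysD as cis only when the four genes are adjacent, read in either orientation, and it labels any contiguous mysABC run without an adjacent mysD as partial cis. The `percent` helper reproduces the 11.8 % figure from 185 of 1568 genomes.

```python
from collections import Counter

# Hypothetical classification rules (assumptions, not the authors' pipeline).
CORE = ["mysA", "mysB", "mysC"]
FULL = CORE + ["mysD"]


def has_run(contig, genes):
    """True if genes occur as a contiguous run on the contig, in either orientation."""
    n = len(genes)
    for i in range(len(contig) - n + 1):
        window = contig[i:i + n]
        if window == genes or window == genes[::-1]:
            return True
    return False


def classify(genome):
    """Assign a genome (list of contigs, each an ordered list of gene labels)."""
    if any(has_run(contig, FULL) for contig in genome):
        return "cis-mysABCD"
    if any(has_run(contig, CORE) for contig in genome):
        return "partial cis"  # e.g. mysABC followed by mysE or mysH
    if any(gene.startswith("mys") for contig in genome for gene in contig):
        return "scattered"
    return "no mys genes"


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # Invented toy genomes, for illustration only.
    toy_genomes = {
        "g1": [["orf1", "mysA", "mysB", "mysC", "mysD", "orf2"]],
        "g2": [["mysD", "mysC", "mysB", "mysA"]],  # reverse strand
        "g3": [["mysA", "mysB", "mysC", "mysE"]],
        "g4": [["mysA", "orf1"], ["mysC"]],
        "g5": [["orf1", "orf2"]],
    }
    print(Counter(classify(g) for g in toy_genomes.values()))
    print(percent(185, 1568), "% of genomes with cis-mysABCD")
```

Checking both orientations matters because a cluster on the reverse strand appears as mysD, mysC, mysB, mysA when genes are listed by coordinate.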
The cis-mysABCD arrangement was found in only four taxonomic orders (Nostocales, Oscillatoriales, Chroococcales and Synechococcales). The greatest frequency was in Nostocales, which arguably reflects enriched taxon sampling. [21] To redress this imbalance, cyanobacteria representing other taxonomic orders found in diverse and extreme environments will need to be isolated and sequenced. [25] Nonetheless, gene and enzyme nomenclature must be consistent if sensible biochemical and ecological inferences are to be drawn. A phylogenetic tree was constructed using all copies of mysD, irrespective of gene cluster orientation (Figure S1). Representative genes from this tree were then compared in a further phylogenetic analysis with ddl (which correctly encodes the d-Ala-d-Ala ligase of peptidoglycan biosynthesis) and with genes of other ATP-grasp enzymes (rimK and gshB; Figure 1). Both phylogenies provided compelling evidence for a common ancestor of mysD that was independent of other ATP-grasp enzymes. Even mysD homologs from mysABCD clusters in the genomes of non-photosynthetic actinomycete bacteria, and a homolog present in the macroalga Porphyra umbilicalis (a known MAA producer [26]), showed sequence similarity to cyanobacterial mysD, suggestive of a common ancestor (Figure S1). The outgroup formed by the ddl, rimK and gshB gene sequences (Figure 1) displayed, as expected, an evolutionary history and functions different from those proposed for mysD. AlphaFold [27] machine-learning predictions corroborated the phylogenetic analysis and demonstrated high tertiary structural similarity between MysD homologs (Figure S2), with the same numbers and positions of α-helices and β-sheets.
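The tree-building settings are described in the Supporting Information (MEGA11) and are not reproduced here. As an illustration of the kind of pairwise comparison that underlies distance-based phylogenies, the Python sketch below computes uncorrected p-distances for a toy alignment. A p-distance is the proportion of differing aligned sites, ignoring gapped sites. The sequences and labels are invented placeholders, not real MysD or Ddl sequences, and a maximum-likelihood analysis would not use this matrix directly.

```python
from itertools import combinations

# Hypothetical toy alignment: invented fragments, NOT real MysD or Ddl sequences.
TOY_ALIGNMENT = {
    "mysD_1": "MKVLLIGG",
    "mysD_2": "MKVLIIGG",
    "ddl_1": "MRIAVLSG",
}


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: share of differing sites, skipping any site with a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # pairwise deletion of gapped sites
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differences / compared


def distance_matrix(alignment):
    """p-distances for every unordered pair of labels, keyed by sorted (label, label)."""
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a, b in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    matrix = distance_matrix(TOY_ALIGNMENT)
    for pair, d in matrix.items():
        print(pair, round(d, 3))
    print("closest pair:", min(matrix, key=matrix.get))
```

In this toy case, the two mysD-like fragments cluster together and the ddl-like fragment sits apart. This mirrors, in miniature, the separation between ingroup and outgroup that the phylogenies report.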
Comparison between the AlphaFold structures also predicted that the active sites were located within the same regions of each homolog. All MysD homologs had the same predicted C-terminal (IPR011095) and N-terminal (IPR011127) protein domains as the d-Ala-d-Ala ligase encoded by the ddl genes used as the outgroup in the phylogeny. This was an expected consequence of automatic annotation. However, a comparison between the tertiary structures of MysD and Ddl further demonstrated that these enzymes are different (Figure S3).
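One reproducible way to compare the numbers and positions of α-helices and β-strands between predicted models is to reduce each model to a per-residue secondary-structure string. In that string, H marks a helix, E marks a strand and any other character marks other residues. Such a string could come, for example, from a DSSP-style assignment run on the AlphaFold coordinates. That assignment step, the 3-residue minimum element length and the ±2-residue tolerance are all assumptions of this hypothetical Python sketch. The sketch also assumes both strings share a common residue numbering, for example after structural alignment.

```python
import re

# Hypothetical comparison of secondary-structure topology between two models.
# Input strings use H = helix, E = strand, anything else = other (assumption).


def elements(ss, min_len=3):
    """Return (type, start, end) for runs of H or E at least min_len residues long."""
    return [
        (m.group()[0], m.start(), m.end())
        for m in re.finditer(r"H+|E+", ss)
        if m.end() - m.start() >= min_len
    ]


def same_topology(ss_a, ss_b, tolerance=2):
    """True if both models have the same ordered helix/strand elements at similar positions."""
    ea, eb = elements(ss_a), elements(ss_b)
    if [kind for kind, _, _ in ea] != [kind for kind, _, _ in eb]:
        return False
    return all(
        abs(start_a - start_b) <= tolerance and abs(end_a - end_b) <= tolerance
        for (_, start_a, end_a), (_, start_b, end_b) in zip(ea, eb)
    )


if __name__ == "__main__":
    # Invented toy strings, not derived from real MysD or Ddl models.
    model_a = "CC" + "H" * 6 + "CC" + "E" * 5 + "CC" + "E" * 4 + "CC"
    model_b = "C" + "H" * 7 + "CC" + "E" * 5 + "CCC" + "E" * 4 + "C"
    print(elements(model_a))
    print("same topology:", same_topology(model_a, model_b))
```

A topology check of this kind complements, rather than replaces, whole-structure superposition scores. It only asks whether the same elements occur in the same order at roughly the same positions.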

The Upshot of Correct and Unambiguous Naming of mysD
Combining phylogeny and AlphaFold protein structure prediction unambiguously separated the enzymes previously generically named d-alanine-d-alanine ligase into two distinct structural families. The first is involved in peptidoglycan biosynthesis and catalyses a genuine condensation between two alanine molecules. The second has relaxed specificity for several different amino acid substrates and catalyses MAA biosynthesis. It therefore can no longer be considered a bona fide d-Ala-d-Ala ligase. The generic name mycosporine-glycine-amine ligase (MG-amine ligase) is proposed according to recognised rules of enzyme nomenclature, taking into account the most up-to-date knowledge of MAA biosynthesis. Unequivocal annotation of MAA biosynthetic genes, combined with precise structural elucidation and quantitative analytical chemistry, will clarify the evolutionary distinctions between producing and non-producing strains and illuminate the ecological functions of MAAs beyond photoprotection. It remains unclear why MAAs are typically produced as mixtures. Further scrutiny of enzymes such as MysD that display relaxed substrate specificity may allow MAA mixtures with enhanced photoprotective or antioxidant activities to be commercially exploited in the future.

Supporting Information
Methods describing the phylogenetic analyses of mysD, ddl, rimK and gshB using MEGA11, [28] tertiary structure prediction using AlphaFold [27] and protein domain annotation using InterPro [29] are given in the Supporting Information.