Legionella shows a diverse secondary metabolism dependent on a broad spectrum Sfp-type phosphopantetheinyl transferase

Several members of the genus Legionella cause Legionnaires’ disease, a potentially debilitating form of pneumonia. Studies frequently focus on the abundant number of virulence factors present in this genus. However, what is often overlooked is the role of secondary metabolites from Legionella. Following whole genome sequencing, we assembled and annotated the Legionella parisiensis DSM 19216 genome. Together with 14 other members of the Legionella, we performed comparative genomics and analysed the secondary metabolite potential of each strain. We found that Legionella contains a huge variety of biosynthetic gene clusters (BGCs) that are potentially making a significant number of novel natural products with undefined function. Surprisingly, only a single Sfp-like phosphopantetheinyl transferase is found in all Legionella strains analyzed that might be responsible for the activation of all carrier proteins in primary (fatty acid biosynthesis) and secondary metabolism (polyketide and non-ribosomal peptide synthesis). Using conserved active site motifs, we predict some novel compounds that are probably involved in cell-cell communication, differing to known communication systems. We identify several gene clusters, which may represent novel signaling mechanisms and demonstrate the natural product potential of Legionella.


INTRODUCTION
The genus of Legionella is relatively diverse with 58 member species, 29 of which are known to cause disease in humans (Cunha, Burillo & Bouza, 2016). Legionellosis, infection with a member of the genus, can result in a form of pneumonia known as Legionnaires' disease or the less severe, flu-like disease known as Pontiac fever. The first Legionella was identified following an outbreak of Legionnaires' disease in 1976, and The presence of PKS and NRPS is well established in all types of bacteria, for example, Streptomyces, Mycobacteria, Myxobacteria, Pseudomonas and Bacillus. Often these products are essential in a particular facet of their respective lifecycles. From Legionella, only four BGCs have been explored in detail with three secondary metabolites structurally elucidated to date ( Fig. 1) (Ahrendt et al., 2013;Shevchuk et al., 2014;Burnside et al., 2015;Johnston et al., 2016a). Legioliulin (1), a product of a trans-AT PKS cluster first identified  Figure 1 Structures of the known Legionella natural products legioliulin, legionellol and legiobactin as well as PPTase inhibitor used in this study (4-6).
in L. dumoffii, was reported originally in 2004 (Amemura-Maekawa et al., 2004) and biological activity assays failed to determine a role for the compound beyond fluorescence (Ahrendt et al., 2013). This study investigated the possibility that legioliulin is required for intracellular survival and ultimately failed to assign a biological function. On the other hand, a transposon mutagenesis library of L. pneumophila revealed a polyketide that interferes with lysosomal degradation during infection of both protozoa and macrophages (Shevchuk et al., 2014). Legiobactin (2) is a siderophore involved in iron sequestration (Cianciotto, 2007) and the unusual polyketide legionellol A (3) is involved in sliding motility and might additionally act as a surfactant (Johnston et al., 2016a). Despite all Legionella strains containing several BGCs, no further research has explored the roles of their respective products. To attempt to further explore the possibility that secondary metabolites are an important part of the Legionella lifecycle, we performed genome-wide comparisons of 15 genome sequences from Legionella, paying particular attention to the prevalence of BGCs. We explore possible structures and functions for these BGCs.

Genome sequencing, assembly and annotation
Shotgun sequencing of Legionella parisiensis DSM 19216 was performed using a Genome Sequencer FLX (Roche) by MWG Genomics (Munich). Assembly was performed using the Celera Assembler (v5.3) and quality assessed using QUAST (Gurevich et al., 2013).
Sequencing yielded a total of 290,164 reads with average read length of 353 bp. The L. parisiensis genome was assembled into a total of 226 contigs (115 1 kb) with an N50 of 65,672 bp and a predicted genomic coverage of 25. Genome annotation was performed using prokka (v1.12) (Seemann, 2014). Abricate was used to identify common antibiotic resistances (https://github.com/tseemann/abricate).

Phylogenetic analysis
Fourteen Legionella genomes were downloaded from NCBI (Table 1), their protein fasta files extracted and together with L. parisiensis, ortholog families were identified using proteinortho5 (Lechner et al., 2011). Protein singletons identified in only a single species were removed from further analysis. The presence or absence of all ortholog families were used to generate a gene content tree using the binary function associated with RAxML (Stamatakis, 2014) and the gamma model of rate heterogeneity and a random number seed for parsimony inferences. Protein sequences of the ACPs of E. coli (ACPS, WP_000986025.1) and Bacillus subtilis (Sfp-like, WP_003234549.1) were taken from the NCBI website and used to identify homologs in each species with Blastp. Protein sequences were aligned using ClustalW, and phylogenetic trees of the PPTases were created using the PhyML plugin attached to Geneious (v6.1.6) (Guindon et al., 2010). Branch formation was supported with bootstrapping (n = 1,000).
Cloning and expression of LparPPTase pCOLA_Duet1 (Novagen) was used as a vector for overproduction of the PPTase from L. parisiensis. The PPTase gene was amplified using primers Lpar_PPtase_Fw_SacI (GAGCTCGATGATCATTACCGAATTTAACCCT) and Lpar_PPtase_Rv_PstI (GTTCTGAATTAGGGGCAACGTGTCGAC) (synthesized by Sigma-Aldrich, St. Louis, MI, USA). Both the PCR product obtained and pCOLA_Duet1 were digested with SacI and PstI (Fermentas). Digestion products were separated by gel electrophoresis and desired fragments isolated with Gene JETGel extraction kit (Fermentas). Isolated fragments were ligated for 1 h at room temperature using T4-ligase (Fermentas). After ligation, E. coli DH10B was transformed with the ligation mixtures in a 1 mm cuvette by electroporation at 1,250 V, 200 and 25 mF. Cells were plated on LB-Kan agar and incubated overnight at 37 C. Colonies were picked and inoculated in LB-Kan media for plasmid extraction. The plasmids obtained were sequenced, and pCOLA_LparPPTase plasmids transferred into E. coli BL21 Star. Positive colonies were picked and cells were transformed with pUC18_indC (Brachmann et al., 2012) and grown on LB-Kan-Amp agar. Cells were grown to an OD 600 of 0.5 at 37 C at which time cultures were induced with 0.1 mM isopropyl-b-D-thiogalactopyranoside (IPTG) (Fermentas), and the cultures were incubated at 16 C overnight. Following induction of LparPPTase in pUC18_indC, cells were pelleted and resuspended in deionized water for easy visualization of the blue pigment produced by IndC.

Legionella in vivo inhibition
L. parisiensis was grown to an OD 600 of 0.1 in 200 ml AYE broth in a 96-well plate at 37 C. Putative PPTase inhibitors 4-6 (Foley et al., 2014) were then added in different concentrations, and the cells were allowed to grow for 24 h. For visualization of legioliulin production in L. parisiensis, the cells were illuminated with long-wave UV-light. The MIC of the PPTase inhibitors were tested in triplicate on L. parisiensis, L. pneumophila and L. longbeachae using the OD 600 value.

Secondary metabolite identification
Secondary metabolites were identified using antiSMASH 3.0 (Weber et al., 2015) with the optional ClusterFinder algorithm activated. The results from each genome were then aligned using Mauve (Darling et al., 2004), a BLAST based analysis program, to identify homologous clusters. Using this method, we assembled some clusters that were split across different contigs by sequence similarity, additionally taking into account the predicted substrate specificities and domain modifications from each unassembled module. The sequence for the isocyanide synthase cluster, isnAB, was taken from

Genome of L. parisiensis
Purified genomic DNA from L. parisiensis DSM 19216 was used for shotgun sequencing. Assembly using Celera (v5.3) revealed a 4,232,283 bp genome with a GC content of 37.98% and was predicted to contain 3,916 protein-coding sequences (CDS). This Whole Genome Shotgun project has been deposited at GenBank under the accession number LSOG00000000. The version described in this paper is version LSOG01000000.

Genome wide analyses
Together with the 14 other Legionella genomes (Table 1) we identified all protein ortholog families in Legionella (Table S1). The core genome of the 15 Legionella species consists of 711 coding sequences and includes a type II secretion system as well as the Dot/Icm system. The conserved type II secretion system is essential for intracellular survival and growth (Hales & Shuman, 1999;Polesky et al., 2001;Rossier, Starkenburg & Cianciotto, 2004) as well as promoting growth at low temperatures (Söderberg, Rossier & Cianciotto, 2004). The Dot/Icm system is already known to be ubiquitous in all strains (Feldman et al., 2005). The effectors secreted by this system work in concert to evade the phagosome and form the Legionella-containing vacuole allowing the bacteria to grow intracellularly (Isberg, O'Connor & Heidtman, 2009;Ensminger, 2016).
Using the program abricate, we additionally analysed the genomes for possible antibiotic resistance genes. In L. anisa, L. cherrii, L. longbeachae, L. sainthelensis and L. wadsworthii beta-lactamase resistance was identified with no other antibiotic resistance genes present. However, several multi drug efflux pumps were also found in the genomes (Table S2). Using the amino acid sequences of all annotated coding sequences from each strain, we determined ortholog families using proteinortho5 (Lechner et al., 2011). From these ortholog families, we produced a phylogeny representing the gene content based upon the presence or absence of each protein ortholog family. Following analysis of all Legionella strains and their BGCs, we constructed a map of each BGC common to more than a single species based on the protein sequence identity (Figs. 2 and 3). Bacteriocins are a class of ribosomally synthesized peptides with antibacterial properties. They are classified based on their mode of action and size (Yang et al., 2014) and are typically used to attack other bacteria competing in similar environments (or sometimes have broad-spectrum activity) but contain resistance mechanisms to avoid self-harm (Cotter, Hill & Ross, 2005). This analysis revealed that there is a range of different bacteriocins Figure 2 Legionella phylogeny based on presence or absence of ortholog families together with a summary of orthologous BGCs found in two or more Legionella species. BGCs were identified using antiSMASH (Weber et al., 2015) and nucleotide sequences were aligned using Mauve (Darling et al., 2004) to determine those that were similar. Ortholog presence was first determined using pro-teinortho5 (Lechner et al., 2011). The gene content tree was then constructed using RAxML, based on the presence or absence of each ortholog. BGCs are separated according to the class of compound produced. Cluster letters refer to those genetic schematics shown in Fig. 3 and compound numbers refer to those found in Figs. 1 and 6. A full list of BGCs can be found in Table S3. The gene cluster encoding IsnAB is not detected by antiSMASH but is a known BGC responsible for the biosynthesis of isonitrile containing compounds that are widespread in bacteria (Brady et al., 2007).
present in Legionella with all species containing at least one cluster with L. geestiana, L. oakridgensis and L. shakespearei containing bacteriocins not present in any other species (Figs. 2 and S1; Table S3).
Only a single PPTase was identified in Legionella, which activates natural product biosynthesis clusters in vitro Interestingly, the ortholog analysis identified only a single Sfp-like PPTase in all of the analyzed Legionella genomes (Fig. 4). No AcpS-like PPTase that is usually involved in fatty acid biosynthesis exists (Mofid, Finking & Marahiel, 2002). PPTases are required to post-translationally attach a 4′-phosphopantetheine arm from CoA to the serine residue contained in the T (ACP or PCP) domain and therefore are essential for fatty acid, polyketide and non-ribosomal peptide biosynthesis Stack, Neville & Doyle, 2007).
Unsurprisingly, within this group, Legionella PPTases form a distinct branch (Fig. 4). To test if the L. parisiensis Sfp-like PPTase could activate a NRPS, IndC from Photorhabdus luminescens (Brachmann et al., 2012) and the PPTase from L. parisiensis were co-produced in E. coli BL21 Star. IndC produces the blue pigment indigoidine by condensation of two glutamines. While indC is constitutively expressed in this experiment, the L. parisiensis PPTase gene expression was under control of an IPTG-inducible promoter. Addition of IPTG and consequent PPTase expression led to blue pigment production Figure 3 Representative examples of BGCs found in multiple Legionella species as identified in Fig. 2. Protein domain architecture as determined from NCBI's conserved domain database for NRPS (green) and PKS (red) encoding genes are also shown. Each circle represents an individual domain of the respective PKS or NRPS (domains not to scale). The PKS from L. pneumophila (I) contains a C-terminal condensation domain typical of those seen in NRPSs, which is also capable of polyketide chain release. Clusters O-U can be found in Fig. S1. All clusters are in Table S3.  Figure 4 Maximum likelihood phylogeny created using PhyML of PPTases identified in Legionella genomes and their relationship to a selection of PPTases from other bacteria. Scale represents amino acid substitutions per amino acid position. Bootstrapping (n = 1,000) was used to support branch formation. (Fig. S2A). As the E. coli Sfp-type PPTase, EntD, is not capable of activating IndC (Brachmann et al., 2012), any production of indigoidine must be activated by the PPTase from L. parisiensis. Harvesting and resuspension of the colored cells in water shows a bright blue pigmentation of the IPTG-induced culture (Fig. S2B).

Inhibition of legioliulin production and growth in L. parisiensis
To rule out the possibility that any PPTase was missed in this analysis, we used 2-pyridinyl-N-(4-aryl)-piperazine-1-carbothioamides (4-6), specific inhibitors of bacterial Sfp-like PPTases (Foley et al., 2014), to shut off legioliulin production. Legioliulin production and growth are closely linked. Bacterial growth was measured at OD 600 , and legioliulin production was observed under long-wave UV-light. The addition of 1 mg/mL of inhibitor 4 resulted in a total loss of legioliulin production. We then used different concentrations of 4 to determine substance effectivity (Fig. 5). Concentrations as low as 0.4 mg/mL of 4 showed an inhibition in legioliulin biosynthesis and growth. For compounds 5 and 6, initial inhibitory effects were observed at concentrations of 0.75 and 6 mg/mL, respectively. Similarly, growth inhibition was observed for L. pneumophila and L. longbeachae (Table S3).

Biosynthetic gene clusters
During the secondary metabolite analysis, we used antiSMASH to predict BGCs and extracted all those containing predicted siderophore, PKS, NRPS, lantipeptide or bacteriocin clusters. With the optional ClusterFinder algorithm activated (Cimermancic et al., 2014), we also examined all putative and saccharide-like clusters for misclassification (Table S3). Strains contained between 15 and 36 BGCs in total with NRPS clusters being the most prevalent. The most widespread PKS, NRPS and siderophore clusters found in Legionella are shown in Fig. 3 highlighting the overall synteny as well as the domain architecture of the natural product synthases.
Non-ribosomal peptide synthetase product predictions in Legionella PKS and NRPS specificity can often be predicted based upon the DNA sequence and comparisons to experimentally validated studies (Stachelhaus, Mootz & Marahiel, 1999;Challis, Ravel & Townsend, 2000;Yadav, Gokhale & Mohanty, 2003). In the case of the Stachelhaus code, conserved motifs in the adenylation (A) domain are used to predict substrate specificity. These conserved motifs and their respective specificities were confirmed by single nucleotide mutations resulting in either a loss of, or relaxation of substrate specificity (Stachelhaus, Mootz & Marahiel, 1999). Prediction using a hidden Markov model based approach is also available to predict specificities of either A domains from NRPS or acyltransferase (AT) domains from PKS and is integrated into antiSMASH (Minowa, Araki & Kanehisa, 2007;Weber et al., 2015). NRPSPredictor2, unlike the previous two methods, uses a support vector machine to predict specificities (Röttig et al., 2011). These methods formed the basis to predict the structures of the natural products produced by BGCs in Legionella. In many Legionella BGCs, the specificities for A domains involved in the activation of the correct amino acid in the NRPS could either not be predicted or showed variable results when these different algorithms were used. We therefore only attempted to predict resulting structures where a consensus among the three methods was reached. Several low molecular weight natural products produced from monomodular NRPS could be predicted assuming non-iterative use of these NRPS modules (Fig. 6).

Secondary metabolism in Legionella is under-pinned by a broad spectrum PPTase
Following the sequencing of the L. parisiensis genome, we noted the presence of 32 BGCs, as predicted by antiSMASH (Table S3). We then further investigated a selection of other Legionella strains to obtain a snapshot of the secondary metabolite potential of the genus. Through ortholog clustering we looked specifically for genes that are known to be essential in secondary metabolism.
Interestingly, this diversity in secondary metabolites gene clusters appeared to be controlled by a single Sfp-like PPTase in all Legionella strains analyzed, L.Ppt (Legionella PPTase, Fig. 4). This PPTase may therefore be capable of activating all different carrier proteins involved in polyketide and non-ribosomal peptide biosynthesis as well as fatty acid biosynthesis, a part of the primary metabolism as has been seen before (Losick & Isberg, 2006). A precedent for this has been made in Pseudomonas aeruginosa, which carries only a single broad spectrum PPTase that is active in both primary and secondary metabolism (Seidle, Couch & Parry, 2006). However, the veracity of this hypothesis is yet to be definitively determined in Legionella. Following identification of only one PPTase, we used an indigoidine production assay to confirm a role for L.Ppt from L. parisiensis in secondary metabolism. The enzyme was able to activate the NRPS IndC from P. luminescens, even though no NRPS product is known for any Legionella strain so far confirming this function. To investigate the effect on suppression of the Sfp-type PPTase, we grew L. parisiensis in the presence of Sfp-type PPTase inhibitors (Foley et al., 2014) and showed that legioliulin production, in addition to cell viability, is halted (Table S4; Fig. 5). The importance of this is that if only a single PPTase controls both primary and secondary metabolism, PPTase inhibitors may be effective as monotherapeutic drugs with multi-target effects (Silver, 2007) resulting from the loss of several functional ACP or PCP proteins, inhibiting essential fatty acid and secondary metabolite biosynthesis.  Fig. 2. Monomodular NRPS are predicted to produce modified amino acids or dipeptide derivatives that have also been identified in different fungi (Forseth et al., 2013). In a relatively rarely described phenomenon, NRPS domains may be re-used during product biosynthesis resulting in peptides longer than expected from the NRPS domain architecture. An example of such an iterative use is due to the action of the thioesterase domain which, following a single round of biosynthesis, must oligomerize the enzyme bound peptide product before release from the NRPS (Shaw-Reid et al., 1999;Bruner et al., 2002;Hoyer, Mahlert & Marahiel, 2007;Felnagle et al., 2008). Due to the relative infrequency that this happens, we assumed non-iterative use of domains for all structural predictions. Cluster B (Fig. 3) encodes a NRPS/PKS hybrid that is suggested to produce a valine elongated by a single polyketide elongation using malonyl-CoA with the resulting product, dependent on the thioesterase (TE) function, might be linear (7a) or cyclic (7b). Cluster C encodes a monomodular NRPS that is predicted to produce a N-formylated amino acid that is either reduced by the C-terminal reduction (Red) domain to the aldehyde (8a), or the alcohol (8b) that can then by cyclized nonenzymatically to form an oxazoline ring (8c). The acylated amino acid derived from cluster D can undergo similar transformation resulting in structurally related compounds (9a, 9b, 9c). Cluster G and I are very similar to C but the NRPS is terminated by a TE domain resulting again in either a linear (10a) or cyclic product (10b).
Although fatty acid biosynthesis has been questioned as a general target for antibiotic therapy (Parsons & Rock, 2011), the parallel inhibition of fatty acid, virulence factor and signaling compound biosynthesis might make PPTase inhibitors powerful antibiotics or drugs that could also work against intracellular pathogens, where fatty acid biosynthesis is essential (Yao et al., 2014).
Reconstruction of BGCs and ortholog clustering highlight the diversity of potential secondary metabolites in Legionella Only the structures of legioliulin (1), legiobactin (2) and legionellol (3) have been solved (Fig. 1) while one other PKS derived compound has been implicated in lysosomal degradation (Shevchuk et al., 2014). Legioliulin is a trans-AT PKS derived fluorophore (Fig. 3A). However, beyond fluorescence of bacterial strains containing the gene cluster, a biological function was not defined for legioliulin. This is perhaps unsurprising given that only the strains amoebic intracellular growth capabilities were tested while the species has been isolated from both environmental and clinical sources in both fluorescing and non-fluorescing forms (Igel, Helbig & Lück, 2004). The cluster of coding sequences responsible for legionellol, a hydrophilic molecule involved in lipid scaffolding, has been ascribed to a number of small discrete genes (lpg2223-41) coding for different domains in L. pneumophila (Fig. S3) (Johnston et al., 2016b).
One disadvantage with short read sequencing technologies is that long gene sequences that are prone to containing repetitive sequences may not be properly assembled. This may be the case for the PKS and NRPS gene clusters that we have examined here, as some are known to be highly repetitive such as the mycolactone PKS (Stinear et al., 2004) or the syringopeptin NRPS (Scholz-Schroeder, Soule & Gross, 2003). Although Legionella probably do not contain examples as extreme as mycolactone or syringopeptin, it is possible that the misclassified saccharide-like clusters or some of the contigs containing clusters at their respective termini are in fact collapsed BGCs due to poor assembly. Despite this, we found significant conservation of some BGCs, although this was not always reflected in the phylogenetic tree composed of all coding sequences. For example, Cluster F (Fig. 2) is present in species that appear more dissimilar with respect to their gene content. This observation may be in part explained by the amount of horizontal gene transfer that is reported to occur in this genus leading to a greater diversity of coding sequences (Gomez-Valero et al., 2011).
Cluster F, the most prevalent cluster, is a NRPS consisting of a single module containing an A, T and C domain, however it was not limited to a given clade of bacteria suggesting it is probably either dispensable for growth and survival, or it plays a more general role. Perhaps more interesting are the clusters that are exclusive to certain clades such as clusters E and K, a NRPS and type III PKS, respectively as well as clusters M and N, which are both siderophores. The apparent maintenance of these clusters in specific clades may be representative of essential functions in their particular environment. However, experimental evidence is needed to verify the veracity of this hypothesis. Siderophores are a well-known virulence factor of many bacteria and the structure of legiobactin (2) has already been elucidated in L. pneumophila (Cluster M (Burnside et al., 2015)). It is reported as having an identical structure to rhizoferrin (Drechsel et al., 1991;Burnside et al., 2015) and is essential for ferric iron uptake during infection of the lungs (Liles, Scheel & Cianciotto, 2000;Robey & Cianciotto, 2002;Allard et al., 2009;Chatfield et al., 2012).
Several Legionella strains also encode homologs of isnA and isnB that have been shown to be involved in the biosynthesis of isonitrile containing natural products that are widespread among bacteria (Brady et al., 2007). Specifically, isnA and isnB encode proteins that, together, produce an inhibitor of insect phenoloxidase that has been shown to be important in defense against host immune responses in entomopathogenic bacteria (Brady et al., 2007;Crawford et al., 2012). In Legionella, a helix-turn-helix domain protein and a cytochrome P450 oxidase are always associated with the cluster (Fig. 3L). In Pseudomonas, the isnAB cluster is part of a larger BGC and does not make the phenoloxidase inhibitor. There, the IsnAB homologs PvcA and PvcB are encoded as a part of the pyoverdine BGC where they are involved in maturation of the siderophore pyoverdine (Drake & Gulick, 2008).
Analysis of clusters B-D reveals the presence of NRPS that are clustered with genes encoding a transcriptional regulator. Although not definitive, this provides evidence supporting a role for these products as novel signaling compounds as seen in other Gram-negative bacteria Brameyer et al., 2015). If this is indeed the case, its significance lies in the fact that the bacteria occupy a relatively diverse environment and the signals may be specific for their respective niches.
In addition to the more conserved clusters found in several strains, unique clusters have been identified that are present only in individual strains (Fig. S3). Among them is another trans-AT PKS in L. cherii that might be responsible for the described red fluorescence of this strain that also gave it its name. However, the red fluorescence might also be derived from the legioliulin cluster, also encoded in this genome, when a starting unit other than cinnamic acid is used that could result in a red-shift of the resulting fluorophore. Different PKS/NRPS hybrids are encoded in L. anisa, L. parisiensis and L. longbeachae that additionally encode type I PKSs that could also be involved in the production of unusual fatty acids or lipids required for their particular niche (Fig. S3).
There are a large number of diverse and interesting BGCs in Legionella that have thus far been unexplored. Although few are conserved across species, we cannot rule out the possibility that these BGCs are providing important chemical compounds to their respective strains, whether for signaling, or otherwise. The lack of cluster conservation further reinforces the notion that this genus is a large, untapped reservoir for novel secondary metabolite discovery. Given the association of these bacteria with protozoa in the environment and the interaction of the pathogenic strains with human phagocytic cells, bioactive metabolites originating from this genus may have activity against eukaryotic targets making this an interesting area of future research.

PKS
polyketide synthase NRPS non-ribosomal peptide synthetase BGC biosynthetic gene cluster AT acyltransferase A adenylation T thiolation C condensation TE thioesterase