The plasmid-mediated evolution of the mycobacterial ESX (Type VII) secretion systems

The genome of Mycobacterium tuberculosis contains five copies of the ESX gene cluster, each encoding a dedicated protein secretion system. These ESX secretion systems have been defined as a novel Type VII secretion machinery, responsible for the secretion of proteins across the characteristic outer mycomembrane of the mycobacteria. Some of these secretion systems are involved in virulence and survival in M. tuberculosis; however, they are also present in non-pathogenic mycobacteria, and have been identified in some non-mycobacterial actinomycetes. Three components of the ESX gene cluster have also been found clustered in some gram-positive monoderm organisms and are predicted to have preceded the ESX gene cluster. This study used in silico and phylogenetic analyses to describe the evolution of the ESX gene cluster from the WXG-FtsK cluster of monoderm bacteria to the five ESX clusters present in M. tuberculosis and other slow-growing mycobacteria. The ancestral gene cluster, ESX-4, was identified in several non-mycomembrane-producing actinobacteria as well as the mycomembrane-containing Corynebacteriales, in which the ESX cluster began to evolve and diversify. A novel ESX gene cluster, ESX-4 EVOL, was identified in some non-mycobacterial actinomycetes and M. abscessus subsp. bolletii. ESX-4 EVOL contains all of the conserved components of the ESX gene cluster and appears to be a precursor of the mycobacterial ESX duplications. Between two and seven ESX gene clusters were identified in each mycobacterial species, with ESX-2 and ESX-5 specifically associated with the slow growers. The order of ESX duplication in the mycobacteria is redefined as ESX-4, ESX-3, ESX-1 and then ESX-2 and ESX-5. Plasmid-encoded precursor ESX gene clusters were identified for each of the genomic ESX-3, -1, -2 and -5 gene clusters, suggesting a novel plasmid-mediated mechanism of ESX duplication and evolution.
The effects of the various ESX gene clusters on vital biological and virulence-related functions have clearly influenced the diversification and success of the various mycobacterial species, and their evolution from the non-pathogenic, fast-growing saprophytes to the slow-growing pathogenic organisms.


Background
The genome of Mycobacterium tuberculosis contains five ESX (or ESAT-6) gene clusters, named ESX-1, -2, -3, -4 and -5, which encode the Esx and PE/PPE proteins, various ATPases, membrane proteins, the mycosin proteases and other ESX-associated proteins [1,2]. The ESX gene clusters have been the topic of extensive research following the discovery that the primary attenuating deletion of M. bovis BCG, region of difference 1 (RD1), includes part of ESX-1 [3][4][5]. The proteins encoded in each of the ESX gene clusters have been predicted to form dedicated protein secretion systems, the ESX secretion systems, which have since been defined as a Type VII secretion machinery responsible for the secretion of, amongst others, the Esx, PE and PPE proteins encoded in them, across the outer mycomembrane [6,7].
The functions of the five M. tuberculosis ESX secretion systems appear to be distinct. ESX-1 is associated with virulence in M. tuberculosis [8][9][10], where it is involved in the inhibition of T-cell responses and phagosome maturation [11,12], and assists in the escape of mycobacteria from the macrophage vacuole by ESAT-6-mediated perforation of the vacuolar membrane [13][14][15][16]. ESX-5 has also been linked to M. tuberculosis pathogenicity and is involved in modulating the host immune responses to maintain a persistent infection [15,17,18]. ESX-5 has furthermore been linked to the uptake of nutrients by increasing outer-membrane permeability in the slow-growing mycobacteria [19]. ESX-3 is essential for the in vitro growth of M. tuberculosis [20,21], and is involved in divalent cation (iron and zinc) homeostasis [22,23], and specifically iron uptake via the mycobactin iron acquisition pathway [21,24]. The functions of ESX-2 and ESX-4 remain unknown.
The ESX gene clusters occur throughout the genus Mycobacterium. A previous study has proposed the order of duplication of the ESX gene clusters to be ESX-4, -1, -3, -2 and then -5, with ESX-5 exclusively associated with the slow-growing mycobacteria [2]. The non-pathogenic, fast-growing mycobacterium, M. smegmatis, contains three of the five M. tuberculosis ESX gene clusters, ESX-1, -3 and -4 [2]. In M. smegmatis, ESX-1 has been shown to be involved in conjugal DNA transfer [25,26]. ESX-3 is also involved in iron homeostasis in M. smegmatis; however, it has not been directly linked to zinc homeostasis and is not essential in this organism [27]. Although there are distinct contrasts in the functions of these secretion systems in M. smegmatis and M. tuberculosis, the orthologous systems have been shown to share certain characteristics and to secrete both sets of substrates [25,28,29]. This suggests that the ESX secretion systems have retained conserved mechanisms, and that virulence-associated functions may have evolved subsequently, or be associated with specific substrates.
ESX gene clusters have also been identified in the genomes of closely related actinomycetes outside of the genus Mycobacterium, including Nocardia, Streptomyces and Corynebacteria [2,6]. Furthermore, genes encoding two components of the ESX secretion system, the WXG (or Esx-like) and FtsK/SpoIIIE proteins, have been found clustered in some gram-positive monoderm genera such as Bacillus, Listeria and Staphylococcus [30]. Indeed, it has been suggested that ESX secretion systems occur outside of the Mycolata (species containing a mycomembrane-like outer membrane containing mycolic acids, including Corynebacteria, Rhodococci, Nocardia and Mycobacteria) and are therefore not typically involved in trans-mycomembrane secretion [31]. This, together with the absence of an identifiable component responsible for mycomembrane translocation, or an elucidated Type VII secretory mechanism, has generated some controversy, as some suggest that these are requirements for the designation of the ESX secretion systems as distinct Type VII secretion machineries [32].
Here we investigated the presence and absence of the ESX gene clusters in the genomes of the sequenced mycobacteria and other representative species from the class Actinobacteria. The phylogenetic relationship between these and the identified WXG-FtsK clusters of certain monoderm bacteria was determined in order to define the evolutionary history of the Type VII ESX secretion systems. In addition to the five ESX gene clusters which were previously identified, ESX gene clusters were identified on plasmids within several mycobacterial species, and shown to precede the genomic ESX duplications. A model is proposed for the plasmid-mediated duplication and evolution of the ESX gene clusters.

Results and discussion
ESX gene clusters were identified from the publicly available genome sequences of 60 actinobacterial species, including 40 mycobacterial species, 11 other species from the order Corynebacteriales and 9 species selected from the orders Pseudonocardiales, Glycomycetales, Micromonosporales, Frankiales, Streptosporangiales, Catenulisporales, Streptomycetales, Propionibacteriales and Kineosporiales (Table 1). Each genome contains between one and seven ESX gene clusters. The components and arrangement of each ESX gene cluster were determined and are represented in Additional file 1, together with three WXG-FtsK clusters from Staphylococcus aureus, Listeria monocytogenes and Bacillus subtilis, identified in the literature as precursors of the ESX gene cluster [30,31]. The concatenated protein sequences of each ESX gene cluster were aligned and used to generate a phylogeny of the ESX gene clusters by maximum likelihood (ML) and distance methods (Fig. 1 and Additional file 2), with the WXG-FtsK clusters of S. aureus, L. monocytogenes and B. subtilis as the outgroup. The topology of the trees generated by the ML and distance methods was conserved, depicting five distinct clades, each incorporating one of the M. tuberculosis H37Rv ESX gene clusters 1 to 5.
ESX gene clusters were identified on plasmids in several mycobacterial species (pMFLV01 in M. gilvum, pMKMS01 and pMKMS02 in M. sp. KMS, Plasmid01 in M. sp. MCS, pMYCCH.01 and pMYCCH.02 in M. chubuense, pMYCSM01, pMYCSM02 and pMYCSM03 in M. smegmatis JS623, Plasmid 2 in M. abscessus subsp. bolletii and pMyong1 in M. yongonense). Four additional mycobacterial plasmid-encoded ESX gene clusters were previously identified by Ummels et al. (2014) [33]. The sequences of three of these, on pRAW from M. marinum E11, pMAH135 from M. avium subsp. hominissuis T135 and pMK12478 from M. kansasii ATCC12478, are publicly available and were included in the phylogenetic analyses. The plasmid-encoded ESX clusters group phylogenetically with some of the ESX gene clusters identified on contigs from the incomplete genome sequences of M. tusciae and M. parascrofulaceum and together form a subclade of each genomic ESX duplication subsequent to ESX-4 (Fig. 1). The M. parascrofulaceum and M. tusciae sequencing projects are incomplete; it was therefore not possible to determine conclusively whether the ESX gene clusters identified in these species are located on plasmids or on the chromosome. However, based on synteny and the phylogenetic clustering of these M. tusciae and M. parascrofulaceum ESX with the plasmid-encoded ESX clusters, these ESX are predicted to be encoded on plasmids, or to originate directly from plasmid DNA. Sequence alignments indicate that each contig containing a predicted plasmid-located ESX cluster shares several conserved segments, or locally collinear blocks (LCBs), with the ESX-containing plasmids from the same subclade (Additional file 3). This is particularly apparent for sequences containing the subclade of ESX-3, which consist almost entirely of four LCBs, and the subclade of ESX-5. This supports the definition of these M. tusciae and M. parascrofulaceum ESX gene clusters as plasmid ESX gene clusters. The ESX gene clusters on the plasmids and on the M. tusciae and M. parascrofulaceum contigs, which form outgroups to ESX-1, -2, -3 and -5, were named ESX-P1, -P2, -P2', -P3 and -P5, where "P" indicates the plasmid localisation of the ESX (Table 2). ESX-P1, ESX-P2, ESX-P3 and ESX-P5 form outgroups to the genomic ESX with the same numbers, and ESX-P2' branches off prior to ESX-P2. ESX-P1 to -P5 contain all of the core ESX components, including espG and espI; ESX-P1 also incorporates EspH, while EccA is absent from ESX-P2.

ESX-4
Orthologs of the ESX-4 gene cluster were identified in all of the mycolic acid producing species from the genera Mycobacterium, Gordonia, Nocardia, Rhodococcus and Corynebacterium. ESX-4 gene clusters were also identified in the 9 species from the orders Pseudonocardiales, Glycomycetales, Micromonosporales, Frankiales, Streptosporangiales, Catenulisporales, Streptomycetales, Propionibacteriales and Kineosporiales which do not have mycolic acids in their cell envelope. These organisms each contain between one and four copies of the ESX-4 gene cluster. Although the arrangement and components of this gene cluster are well conserved amongst the mycobacterial species, insertions, deletions and rearrangements are common amongst the other species.
In addition to the WXG-FtsK cluster components, ESX-4 encodes EccD, EccB and MycP, which have been suggested to be involved in a more intricate secretion mechanism to transport proteins into and across the unique and complex outer mycomembrane [34]. However, the presence of the ESX-4 cluster in various non-mycomembrane-containing actinobacteria suggests that the secretion system encoded by these gene clusters is not directly involved in mycomembrane translocation. Although the function(s) of ESX-4 have yet to be determined, the presence and maintenance of this gene cluster throughout the mycobacteria and other actinobacteria suggests that it plays an important role in bacterial metabolism. Homologs of the ESX-4 gene cluster components occur in all five ESX gene clusters and could represent the proteins required for translocation across the inner membrane. The additional components present in the subsequent ESX duplications may be involved in mycomembrane translocation, be additional substrates, assist in the translocation of additional substrates or facilitate specific mechanisms of those secretion systems.
Phylogenetically associated with the ESX-4 gene cluster is a subgroup of ESX gene clusters which include homologs of the eccA, eccE, espG, espI, pe and ppe genes, in addition to the ESX-4 components. This cluster was identified in the mycolic acid producing species N. farcinica, N. brasiliensis, N. cyriacigeorgica, T. paurometabola, S. rotundus, M. vaccae, M. fortuitum and M. abscessus subsp. bolletii. The arrangement of the genes in this cluster varies between species, but does not resemble any of the M. tuberculosis ESX gene clusters. This cluster contains all of the conserved ESX gene cluster components and appears to be an evolutionary intermediate between ESX-4 and the subsequent duplications, and is therefore named ESX-4 EVOL (ESX-4 evolved).

ESX-3
ESX-3 is present in all of the studied mycobacteria, with the exception of M. chubuense, suggesting that ESX-3 is the first ESX duplication in the mycobacterial genome. ESX-3 contains all of the ESX conserved components eccA to E, mycP, esx and pe/ppe pairs as well as espG. Although essential for in vitro growth of M. tuberculosis, ESX-3 is not essential in the fast-growing M. smegmatis [20]. ESX-3 is involved in iron homeostasis and uptake via the mycobactin pathway [24] and genetic reduction during evolution of the slow-growers may have eliminated the redundancy of ESX-3. Outside of the mycobacteria, ESX-3 was only identified in S. rotundus, suggesting that ESX-3 was inserted prior to the divergence of Segniliparus and Mycobacterium from a common ancestor. The presence of three mycobactin genes in the S. rotundus ESX-3 furthermore suggests that the association between ESX-3 and iron homeostasis may be conserved. The ancestral mycobacteria M. abscessus, M. abscessus subsp. bolletii and M. massiliense contain only ESX-4 (or ESX-4 EVOL) and ESX-3.

ESX-1
ESX-1 has been implicated in virulence, and its deletion in attenuation of the pathogenic mycobacteria [8,9]. However, its presence throughout most of the mycobacteria, including non-pathogenic and saprophytic fast-growing organisms, suggests that the primary function of this gene cluster is not virulence, and that the virulence-associated function has evolved more recently in pathogenic organisms. An additional gene cluster, identified in the non-mycobacterial actinomycetes N. brasiliensis and N. cyriacigeorgica, contains all of the components of ESX-4 EVOL, but has an operonic arrangement similar to the M. tuberculosis ESX gene clusters. This cluster forms a subgroup just outside of the mycobacterial ESX-1 clade and is therefore named ESX-1 AN (ancestral ESX-1). An ESX gene cluster with a similar arrangement was identified in T. paurometabola, but has undergone a transposition event which has resulted in the disruption of eccC and deletion of eccB. Phylogenetic clustering of this region is not consistent between algorithms, but based on synteny it is also predicted to be an ESX-1 AN cluster.

ESX-2 and ESX-5
ESX-2 and ESX-5 occur only in the slow-growing mycobacteria. ESX-2 contains all of the conserved ESX components including espG and espI in an operonic structure, while ESX-5 contains only espG, but has multiple copies of pe and ppe, and the insertion of a ferredoxin and a cyp143 gene. The function(s) of ESX-2 have not been elucidated, and although its duplication correlates evolutionarily with both the slow-growing and pathogenic phenotypes, it has been lost from some of these species (M. leprae, M. marinum, M. ulcerans subsp. liflandii and M. ulcerans). ESX-5 is the only ESX gene cluster present in all of the slow-growers but absent in all of the fast-growers, and may be the ESX gene cluster most involved in pathogenicity and the slow-growing phenotype [35]. Deletion of this region, however, does not directly increase the growth rate of M. marinum or M. tuberculosis [18,36]. ESX-5 has been implicated in immune evasion and in the secretion of the PE and PPE proteins [36,37]. Only ESX-5 contains multiple copies of the pe and ppe genes, the numbers of which vary between species, and its evolution is predicted to have preceded the expansion of these gene families [37].
M. tusciae contains an ESX cluster, ESX-2 AN (ancestral ESX-2), which contains all of the ESX-2 components and precedes both the ESX-2 and ESX-5 clades, as well as the ESX-P2', -P2 and -P5 gene clusters. M. tusciae is a slow-growing mycobacterium which, based on 16S rDNA sequencing, clusters with the fast-growing mycobacteria and is most closely related to the fast-growing mycobacteria M. farcinogenes, M. komossense and M. aichiense [38]. The correlation between the presence of an ESX-2/5-like cluster and a slow growth rate might imply that M. tusciae is an evolutionary intermediate between the fast- and slow-growing mycobacteria. The mycolic acid composition of the cell membrane of M. tusciae most closely resembles that of the M. avium complex and M. parascrofulaceum [38], suggesting that the different ESX secretion systems may have evolved with changes in the mycomembrane structure, as reflected in the role of ESX-5 in maintaining selective mycomembrane permeability in the slow-growing pathogenic M. tuberculosis and M. marinum species [19]. Investigation of the potential association between these two ESX clusters, mycomembrane structure and growth rate may provide important information regarding the evolution of the often pathogenic, slow-growing mycobacteria.

Plasmid-mediated ESX evolution
The duplication and evolution of the ESX gene clusters and their secretion systems have clearly impacted on the evolution, diversity and success of the mycobacteria. The identification of ESX gene clusters on several plasmids within the mycobacteria, and their phylogenetic association with each of the genomic ESX gene clusters, provides novel insight into the mechanism of ESX evolution, suggesting that the duplication and diversification of these clusters was plasmid-mediated. The presence of multiple plasmid copies within a single organism facilitates diversification by allowing the coevolution of various ESX clusters simultaneously. The plasmid localisation furthermore facilitates the loss of deleterious effects, while the incorporation of beneficial plasmid DNA into the genome allows permanent retention and might be selected for. We propose a model for the plasmid-mediated duplication and evolution of the ESX gene clusters (Fig. 2). Based on this model, the WXG-FtsK cluster present in the Firmicutes evolved to form the ESX-4 gene cluster, through the incorporation of eccB, eccD and mycP, during the evolution of the actinobacteria, resulting in the presence of ESX-4 in the genomes of various actinobacterial species. A copy of ESX-4 was incorporated into plasmid DNA after the divergence of the genera Corynebacterium and Rhodococcus. The additional ESX components, eccA, eccE, espG, espI, pe and ppe, were incorporated into this plasmid-located cluster (ESX-P AN), which was subsequently incorporated into the genomes of some species, including Nocardia spp., T. paurometabola, S. rotundus and M. abscessus subsp. bolletii, as ESX-4 EVOL. The variation in the arrangement and sequences of the genes in these clusters may represent independent insertions at different evolutionary time points. The presence of both ESX-4 and ESX-4 EVOL in some species implies that ESX-4 EVOL is a duplication of the ESX-4 cluster, and has not evolved directly from it.
ESX-1, -2, -3 and -5 have evolved from a single duplication of ESX-4. The presence of all of the conserved ESX components in ESX-4 EVOL suggests that it evolved from the same progenitor and that ESX-4 EVOL is an intermediate between ESX-4 and ESX-1, -2, -3 and -5. Continual evolution of this plasmid ESX gene cluster generated the operonic structure characteristic of the mycobacterial ESX gene cluster duplications. Plasmid precursors of the four duplications, ESX-P1, -P3, -P2', -P2 and -P5, have evolved simultaneously by divergence of the common plasmid ancestor, after which genome insertions generated the genomic ESX-1, -2, -3 and -5 clusters.
It appears, furthermore, that these plasmids may be able to transfer between mycobacterial species. The pRAW, pMyong1, pMK12478 and pMAH135 plasmids, which contain ESX-P5, were also shown to contain components of a Type IV secretion system and a traA/relaxase gene, which are required for conjugation of the plasmid between some slow-growing mycobacterial species [33].
Fig. 2 Model of ESX evolution based on plasmid-mediated duplication and evolution. The ancestral ESX-4 gene cluster evolved from the WXG-FtsK cluster via the incorporation of additional genes, eccB, eccD, mycP and rv3446c. ESX-4 was duplicated into plasmid DNA, into which additional ESX genes, eccA, eccE, espI, espG, pe and ppe, were incorporated. The plasmid ancestor (ESX-P AN) was reinserted into the genomes of various Mycolata, generating ESX-4 EVOL. Continuous evolution generated the operonic structure of the plasmid ESX gene cluster. Divergent evolution of the plasmid ESX generated several plasmid ESX (ESX-P1, -P3, -P2', -P2 and -P5) which were inserted into the mycobacterial genome to generate ESX-3, ESX-1, ESX-2 and ESX-5. An earlier version of ESX-P1 was inserted into the genomes of some actinomycetes as ESX-1 AN and a precursor of ESX-P2' was inserted into the M. tusciae genome as ESX-2 AN. Red arrows represent genome insertions

ESX-associated evolution of the mycobacteria
A phylogenetic analysis of the mycobacteria and related actinomycetes based on their ESX gene clusters was done using the concatenated protein sequences of all of the ESX gene clusters of each species (Fig. 3). The Mycolata have evolved from a single gram-positive monoderm ancestor into two groups: those which contain only ESX-4, ESX-4 EVOL and ESX-1 AN, the non-mycobacterial actinomycetes; and those which also contain an ESX-3 gene cluster, which, with the exception of S. rotundus, consist of the mycobacteria.
Fig. 3 The phylogeny of the mycobacteria based on ESX duplication and evolution. Maximum likelihood phylogeny describing the evolution of the mycobacteria based on the concatenated ESX gene cluster amino acid sequences from each species. ESX duplication and deletion events influenced the evolution and diversification of the mycobacteria as described in the text. Species which contain plasmid ESX gene clusters are underlined. One thousand subsets were generated for bootstrapping resampling of the data

Conclusion
The distinctive cell envelope of mycobacteria, characterised by the highly impermeable outer mycomembrane peptidoglycan-arabinogalactan-mycolic acid matrix [6], provides a protective barrier against extracellular stresses, but also presents an obstacle to the export of proteins and acquisition of nutrients. Although mycobacteria possess both Sec and Tat secretion systems, which translocate proteins across the inner membrane, the ESX, or Type VII, secretion systems are the first mechanism proposed for the secretion of proteins into and across the mycomembrane. This study explored the evolution of the mycobacterial Type VII ESX gene clusters from the WXG-FtsK cluster in S. aureus, L. monocytogenes and B. subtilis to the five ESX gene clusters in M. tuberculosis. The ancestral ESX gene cluster (ESX-4) was identified in several non-mycomembrane-producing actinobacteria as well as the non-mycobacterial Corynebacteriales. Between two and seven ESX gene clusters were identified in each mycobacterial species. A novel ESX gene cluster, ESX-4 EVOL, was identified in some non-mycobacterial mycomembrane-containing actinomycetes and M. abscessus subsp. bolletii. ESX-4 EVOL contains all of the conserved components of the ESX and appears to be a precursor of the mycobacterial ESX duplications. Plasmid-encoded precursor ESX were identified for each of the genomic ESX-3, -1, -2 and -5 gene clusters and a novel plasmid-mediated mechanism of ESX duplication and evolution is proposed. The presence and absence of the ESX gene clusters in the mycobacteria redefines the order of duplication of the ESX gene clusters in the mycobacteria as ESX-4, ESX-3, ESX-1 and then ESX-2 and ESX-5. The effects of the various ESX gene clusters on vital biological and virulence-related functions have clearly influenced the diversification and success of the various mycobacteria, and their evolution from the non-pathogenic, fast-growing saprophytes to the slow-growing pathogenic organisms.

Genome sequence data
All protein and DNA sequence information was obtained from publicly available finished and unfinished genome sequencing information. The genomes of 40 mycobacterial species, 11 other species from the order Corynebacteriales, 9 species selected from the orders Pseudonocardiales, Glycomycetales, Frankiales, Micromonosporales, Streptosporangiales, Catenulisporales, Streptomycetales, Propionibacteriales and Kineosporiales and 3 gram-positive monoderm species containing WXG-FtsK clusters (Table 1) were analysed.

Comparative genomic analyses
The M. tuberculosis H37Rv ESX protein sequences of interest were used as templates to identify orthologous ESX protein and gene sequences. BLAST similarity searches (blastn, tblastn and blastp [42]) were done using NCBI BLAST and the genome sequence databases listed in Additional file 4. Adjacent genomic regions were searched for additional ESX genes to determine the clustering and arrangement of genes; for unfinished genomes in contig format this was not always possible, and the gene cluster arrangement was assumed based on sequence identity and the anticipated arrangement. Large intergenic regions were searched for gene insertions using blastx analyses [43].
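The step of searching adjacent genomic regions for additional ESX genes can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the `Hit` record, the `group_hits` function and the `max_gap` and `min_components` thresholds are all hypothetical names and values chosen for illustration, and the input is assumed to be similarity-search hits that have already been reduced to a query name, a contig and subject coordinates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    query: str   # ESX component used as the search template, e.g. "EccC"
    contig: str  # replicon or contig on which the hit was found
    start: int   # subject start coordinate
    end: int     # subject end coordinate


def group_hits(hits, max_gap=10_000, min_components=3):
    """Group hits that lie close together on the same contig.

    max_gap and min_components are illustrative assumptions, not values
    taken from the study.
    """
    by_contig = defaultdict(list)
    for hit in hits:
        by_contig[hit.contig].append(hit)

    groups = []
    for contig_hits in by_contig.values():
        # Sort by leftmost coordinate so that strand does not matter.
        contig_hits.sort(key=lambda h: min(h.start, h.end))
        current = [contig_hits[0]]
        for hit in contig_hits[1:]:
            right_edge = max(max(h.start, h.end) for h in current)
            if min(hit.start, hit.end) - right_edge <= max_gap:
                current.append(hit)
            else:
                groups.append(current)
                current = [hit]
        groups.append(current)

    # Report only groups containing enough distinct ESX components.
    return [g for g in groups if len({h.query for h in g}) >= min_components]
```

Under these assumptions, a contig carrying neighbouring EccB, EccC and EccD hits would be reported as one candidate cluster, while an isolated hit elsewhere would be ignored. For unfinished genomes a cluster split across two contigs would be missed, which mirrors the limitation noted above.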

Phylogenetic analyses
Annotated protein sequences were obtained from the protein sequence databases. The protein sequences of conserved components of each ESX gene cluster (EccA, EccB, EccC, EccD, EccE, PE(s), PPE(s), Esx (CFP-10-like), Esx (ESAT-6-like), EspG, EspI, MycP, Rv3446c, EspH, EspJ, EspK, EspL, EspB, Cyp143 and Ferredoxin) were concatenated. Multiple sequence alignments of all concatenated ESX gene cluster protein sequences were done with Clustal W 2.0 [44,45] using the Bioedit Sequence Alignment Editor version 7.1.3.0 [46]. Similarly, multiple sequence alignments of a single sequence composed of all of the combined ESX gene cluster protein sequences, from each species, were done. Phylogenetic trees were determined by distance and maximum likelihood analyses using SeaView Version 4.4.2 [47]. Distance analysis was done using the neighbour-joining method on observed distances with 10000 bootstrap replicates. Maximum likelihood phylogenies were generated using PhyML [48] with the JTT (Jones Taylor Thornton) substitution model [49], using model-given amino acid equilibrium frequencies, specifying no invariable sites and no across-site rate variation. Nearest-neighbour interchange tree searching operations were used with a BioNJ starting tree. The WXG-FtsK cluster sequences from S. aureus, L. monocytogenes and B. subtilis were defined as the outgroup. The M. microti ESX clusters were omitted from the phylogenetic analyses as protein annotations were not available.
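The concatenation, observed-distance and bootstrap steps described above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: the analyses in this study were run in Clustal W, SeaView and PhyML, whereas the functions below (`concatenate`, `observed_distance`, `distance_matrix`, `bootstrap_alignment`) are hypothetical helpers, the component order is a shortened subset of the list above, and gap-containing columns are simply skipped when counting differences, which may differ from the handling in those programs.

```python
import random

# Fixed component order (assumption: a shortened subset of the full list).
COMPONENT_ORDER = ["EccA", "EccB", "EccC", "EccD", "EccE", "MycP"]


def concatenate(cluster, order=COMPONENT_ORDER):
    """Join the protein sequences of one cluster in a fixed order.

    Components absent from the cluster contribute nothing, so the
    concatenated sequences are unaligned and still need a multiple
    sequence alignment afterwards.
    """
    return "".join(cluster.get(name, "") for name in order)


def observed_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: ignore columns with a gap in either sequence
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(alignment):
    """Pairwise observed distances for a dict of name -> aligned sequence."""
    names = list(alignment)
    return {(p, q): observed_distance(alignment[p], alignment[q])
            for p in names for q in names}


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate by resampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}
```

In a full analysis, a tree would be built from the distance matrix of each replicate (for example by neighbour-joining) and the support for a clade taken as the fraction of replicate trees in which it appears; tree construction is left out here to keep the sketch short.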

Plasmid and contig sequence alignments
Plasmid and contig sequences were obtained from the NCBI (Additional file 4) and alignments of the plasmid and contig sequences containing each subgroup of ESX gene cluster were done using the progressiveMauve algorithm of the Mauve 2.3.1 Genome Alignment Visualisation software [50].