A novel variant of the Listeria monocytogenes type VII secretion system EssC component is associated with an Rhs toxin

The type VIIb protein secretion system (T7SSb) is found in Bacillota (firmicute) bacteria and has been shown to mediate interbacterial competition. EssC is a membrane-bound ATPase that is a critical component of the T7SSb and plays a key role in substrate recognition. Prior analysis of available genome sequences of the foodborne bacterial pathogen Listeria monocytogenes has shown that although the T7SSb was encoded as part of the core genome, EssC could be found as one of seven different sequence variants. While each sequence variant was associated with a specific suite of candidate substrate proteins encoded immediately downstream of essC, many LXG-domain proteins were encoded across multiple essC sequence variants. Here, we have extended this analysis using a diverse collection of 37 930 L. monocytogenes genomes. We have identified a rare eighth variant of EssC present in ten L. monocytogenes lineage III genomes. These genomes also encode a large toxin of the rearrangement hotspot (Rhs) repeat family adjacent to essC8, along with a probable immunity protein and three small accessory proteins. We have further identified nine novel LXG-domain proteins, and four additional chromosomal hotspots across L. monocytogenes genomes where LXG proteins can be encoded. The eight L. monocytogenes EssC variants were also found in other Listeria species, with additional novel EssC types also identified. Across the genus, species frequently encoded multiple EssC types, indicating that T7SSb diversity is a primary feature of the genus Listeria.


INTRODUCTION
In their natural environments, bacteria live in complex communities with other microbes, viruses and often higher eukaryotes. In order to co-exist, bacteria have developed numerous strategies, including competitive and co-operative mechanisms [1,2]. A major way that bacteria interact with biotic and abiotic factors is through the secretion of extracellular proteins, and to date 11 protein secretion systems have been described [3]. While most of these are unique to Gram-negative bacteria, the type VII secretion system (T7SS) is found in many monoderm Gram-positive bacteria, including staphylococci and streptococci, and in diderm Gram-positives such as mycobacteria [4,5].
There are commonalities and differences between the T7SS of mycobacteria and of Bacillota such as Staphylococcus spp., resulting in them being assigned T7SSa (actinobacteria) and T7SSb (Bacillota) [6,7]. One of the key similarities is the requirement of a membrane-bound AAA+ ATPase for secretion activity. In T7SSa, the ATPase is named EccC and is arranged as a hexamer at the centre of a 2.3 MDa membrane-bound protein complex [7][8][9]. The corresponding ATPase in T7SSb is termed EssC, and shares sequence similarity with EccC but additionally has two forkhead-associated domains at its N-terminus that are essential for activity [10,11]. A second common feature between the two systems is the presence of helical hairpin substrate proteins from the WXG100 family. These small proteins form homodimers (T7SSb) or heterodimers (T7SSa), and are secreted in a folded state [12,13]. Other substrate families appear to be unique to either of the two systems, with the T7SSa secreting PE/PPE and Esp proteins, whereas T7SSb substrates have an LXG domain or are related to YeeF proteins [14][15][16][17].
Growing evidence indicates that Bacillota employ their T7SSb for competitive interactions with other bacteria. Staphylococcus aureus secretes a DNase, EsaD, and an LXG-domain membrane depolarizing toxin, TspA, which can target closely related bacteria [17,18]. Four LXG toxins have been identified in Streptococcus intermedius, and LXG toxins have also been shown to mediate bacterial antagonism in Enterococcus faecalis and Bacillus subtilis [16,19,20,21]. However, to date, the largest number of candidate T7SSb toxins have been identified in Listeria monocytogenes [22].
L. monocytogenes is a food-borne pathogen that is found in water, soil and decomposing vegetation [23]. From here it can make its way into the food chain, where its ability to grow at refrigeration temperatures, and to resist low pH and high salt, contribute to its survival and growth in food [24]. Once ingested, it can cause numerous diseases from abortion in pregnant individuals to bacteraemia and death in up to 30 % of cases [25,26]. L. monocytogenes genomes have been classified into one of four evolutionary lineages and multiple clonal complexes (CCs), with lineages I and II most frequently associated with human disease [27,28]. Genes encoding the T7SSb are found in all sequenced genomes to date, and are located in a region of high sequence variability [29].
Our prior genomic analysis revealed that the gene encoding the T7SSb core component, EssC, is found as seven different variants across L. monocytogenes genomes. The sequence variability corresponds to the final two ATPase domains of EssC, and each EssC variant is adjacent to a variable region (termed variable region 1) that likely encodes one or more secreted toxin substrates specific to each EssC variant [22]. We found that the EssC1 variant is the most prevalent and that essC1 genomes encode a YeeF-like toxin, related to Staphylococcus aureus EsaD, while essC5, essC6 and essC7 genomes encode LXG-domain toxins at the equivalent position [22]. In addition to EssC-specific suites of substrate proteins, we also found other genomic hotspots where T7SSb toxins could be encoded, identifying 40 different LXG toxins in sequenced genomes. Collectively, this high diversity implies that the T7SSb plays a key role in bacterial antagonism.
Here, we further investigate L. monocytogenes T7SSb variability through analysis of a collection of 37 930 L. monocytogenes genomes obtained from public repositories. We have identified a rare eighth EssC variant unique to lineage III strains, which is genetically associated with a likely antibacterial toxin of the rearrangement hotspot (Rhs) repeat protein family. We also describe further LXG candidate toxin proteins across the species and four additional chromosomal hotspots where they may be encoded. This high variability in EssC type is observed in other Listeria species, suggestive of a competitive strategy that is widely utilized across the genus.

Impact Statement
Listeria monocytogenes is a soil-borne saprophytic bacterium and a food-borne pathogen of humans. Decomposing plant matter and the human gastrointestinal tract are rich in diverse microbial species, and to colonize these niches L. monocytogenes must be able to compete with other bacteria. The type VII secretion system (T7SS) of Bacillota has been shown to secrete protein toxins that target other bacteria. In this study, we have analysed a diverse collection of L. monocytogenes genome sequences to study the diversity of the Listeria T7SS and its putative effector proteins. We show that the EssC component of the L. monocytogenes T7SS is highly diverse, clustering into one of eight sequence variants. Each EssC variant is associated with a specific toxin candidate, and the EssC8 variant T7SS likely secretes a novel rearrangement hotspot (Rhs) repeat toxin. We also identify multiple new LXG-families of T7SS toxins and describe genomic hotspots where they are encoded. We find no link between EssC variants and clinical outcome. In agreement with this, analysis of EssC variability in available genomes of other Listeria species showed that all eight L. monocytogenes EssC variants are present in non-monocytogenes Listeria species.
(cgMLST) was performed using chewBBACA version 2.8.5 [32] and the 1748 allelic scheme from the Listeria PubMLST site at Institut Pasteur (https://bigsdb.pasteur.fr/listeria) [33]. Phylogenomic trees were built from cgMLST profiles using Grapetree version 1.5 with the RapidNJ algorithm [34]. The list of L. monocytogenes genomes included, multilocus sequence typing (MLST) sequence type, CC, order in cgMLST trees, accession numbers and associated metadata are provided in Table S1. Isolates were assigned as clinical when obtained from human samples, while they were assigned as non-clinical when obtained from food, factory or environmental sources. Those obtained from animal sources were not assigned as clinical or non-clinical as this could not be determined from the available information.
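The assignment rule described above can be sketched in a few lines of Python. This is only an illustration: the source labels (`human`, `food`, `factory`, `environmental`) and the function name `assign_category` are assumptions, and the metadata vocabulary actually used in Table S1 may differ.

```python
# Hypothetical source labels; the real metadata vocabulary may differ.
NON_CLINICAL_SOURCES = {"food", "factory", "environmental"}


def assign_category(source: str) -> str:
    """Return 'clinical', 'non-clinical' or 'unassigned' for an isolate source."""
    label = source.strip().lower()
    if label == "human":
        return "clinical"
    if label in NON_CLINICAL_SOURCES:
        return "non-clinical"
    # Animal isolates (and any unrecognised label) are left unassigned,
    # mirroring the rule that their status could not be determined.
    return "unassigned"
```

Under these assumptions, `assign_category("Animal")` returns `"unassigned"` rather than forcing the isolate into either category.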

Querying the L. monocytogenes genome repository for EssC and LXG toxin variants
The L. monocytogenes genome repository was queried using ABRicate version 1.0.1 (https://github.com/tseemann/abricate), using DNA sequences representing conserved and variable regions of the genes encoding the EssC1-8 proteins and the LXG proteins D, E, F, G and Omicron. All genomes were also converted into predicted ORFs using Prodigal version 2.6.3 [35], and screened using amino acid translations of the conserved and variable regions using blast+ version 2.13.0 [36].
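One plausible way to turn such a screen into a per-sequence variant call is to keep the best-scoring hit for each query. The sketch below parses standard BLAST+ tabular output (`-outfmt 6`) and is not the pipeline used in this study: the function name, the query/subject orientation and the `min_identity` cut-off are all assumptions made for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hit_per_query(tabular_text, min_identity=90.0):
    """Map each query ID to the subject ID of its highest-bitscore hit.

    min_identity is a hypothetical threshold, not a value from the study.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        if float(row["pident"]) < min_identity:
            continue  # discard weak matches
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}
```

Here each query would be a predicted ORF and each subject a reference variant sequence (for example, a hypothetical identifier such as `EssC8_variable`); swapping the orientation only requires grouping on `sseqid` instead.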

Identification of genetic hotspots and T7SS-related genes
Genomes of essC8 variant L. monocytogenes strains were directly aligned against L. monocytogenes EGDe (essC1) using progressiveMauve [41] (https://darlinglab.org/mauve/mauve.html). Genes within previously identified genetic hotspot regions were extracted, with protein accessions obtained via Batch Entrez, and homologous proteins identified from the NCBI reference proteins (refseq_protein) database for L. monocytogenes (taxid: 1639) via blast under default search settings. Identification of EssC in non-monocytogenes species was carried out with blastp under the same conditions.

Phylogenetic analysis
Amino acid sequences of EssC variants from L. monocytogenes and non-monocytogenes Listeria spp. were aligned using muscle. Phylogenetic trees were reconstructed using mega7, using the neighbour-joining algorithm, JTT matrix and pairwise deletions, with 100 bootstraps [48].

Identification of a novel variant of essC in lineage III genomes of L. monocytogenes
Previously, we had analysed the genetic arrangement of the T7SSb gene cluster across 271 L. monocytogenes genome sequences that were present in the NCBI database, resulting in the identification of seven sequence variants of EssC [22]. In this study, we have expanded this analysis to investigate the distribution of L. monocytogenes essC variants present in a genome repository that was assembled from multiple sources, comprising 37 930 genomes that encompassed all four evolutionary lineages (I-IV). Within the repository there were 17 687 sequences from lineage I genomes, 19 034 from lineage II genomes, 1133 from lineage III genomes and 76 from lineage IV genomes. These had been isolated from a diverse range of geographical locations and time periods, and from human, animal, food, factory and environmental sources (Table S1).
Genomes were phylogenetically clustered into their evolutionary lineages, annotated with their respective CCs, and annotated as clinical or non-clinical (Fig. S1). The EssC-encoding sequence was extracted from every genome and aligned against representative EssC1-EssC7 sequences to classify the T7SSb subtype of each strain. The distribution of essC genetic variants across these evolutionary lineages and CCs is summarized in Fig. 1(a), with the distribution of essC genetic variants across all genomes in the repository shown in Fig. S1. Within lineages I and II, there were clear associations of essC variant with the MLST CCs, for example in lineage I where CC1 exclusively encodes essC1, but CC2 and CC6 encode only essC2; similarly, in lineage II, CC9 encodes essC1, but CC8 encodes essC2. The CC14 in lineage II is split into three distinct groups, and this shows in their respective essC variants. The first two clusters of CC14 encode essC2 and essC1 (Fig. S1), while the third cluster of CC14 encodes essC1 and essC5. In contrast, in lineages III and IV there is much more diversity in essC types, which is likely related to the much higher genomic variability in these lineages, as well as them being less often isolated and, hence, less represented in the repository. While there were clear associations with MLST CCs, there was no link between essC variant and whether the isolate was from a clinical or non-clinical source.
Two of the 37 930 genome sequences did not contain an essC gene; however, on further inspection these genome sizes (2.512 and 2.808 Mb) were smaller than the 2.944 Mb EGDe genome, and additionally lacked numerous essential genes as well as the T7SSb locus, so likely represent incomplete sequences. Of the remaining genomes, the essC1 variant is the major subtype, with essC2 the second most abundant (Table 1), in agreement with previous analysis of a much smaller number of genomes [22]. We also noted previously that some essC1 genomes harboured a second, truncated essC encoding a protein from residue 771 onwards, but with a variable region from a different EssC. These were assigned as pseudogenes because they lack a start codon (the closest methionine is at residue 790) and the sequence coverage starts part way through one of the ATPase domains (Fig. S2). This finding was further reinforced in our larger dataset, and we noted that some essC2 and essC3 genomes also encoded a second truncated variant (Table 1). As reported previously [22], where a second partial essC copy was present, these genomes encode the predicted toxins of both essC variant types and contain a significantly larger variable region 1 compared to single copy essC genomes.
*Manual inspection of the genomic sequences of these two genomes identified a sequencing gap covering the T7SS locus and multiple essential genes; therefore, they likely represent incomplete genomes.
During our analysis, we noted that EssC sequences from a small subset of ten lineage III genomes clustered separately from the seven EssC variants we had identified previously, forming an eighth EssC variant cluster. All ten of these essC8 genomes were isolated in the USA, and no one source category could be assigned to all genomes. These ten genomes are listed in Table 2, alongside metadata regarding isolation details, and a sequence alignment of EssC8 against the other seven EssC variants is shown in Fig. S2.
Ultimately, this rare variant accounted for 0.88 % of genomes in lineage III of L. monocytogenes, and 0.026 % of all genomes examined in this analysis. Following this, we later identified a further essC8-encoding strain, the recently sequenced soil isolate L. monocytogenes FSL L7-0325, after blast analysis of the NCBI RefSeq database, which we also included in our subsequent bioinformatic analysis.
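The two prevalence figures follow directly from the lineage counts reported for the repository (17 687, 19 034, 1133 and 76 genomes for lineages I to IV) and the ten essC8 genomes found within it. The short Python check below reproduces the arithmetic; the variable names are illustrative only.

```python
# Genome counts per lineage, as reported for the repository.
LINEAGE_COUNTS = {"I": 17687, "II": 19034, "III": 1133, "IV": 76}
# essC8 genomes found in the repository (all in lineage III).
ESSC8_GENOMES = 10


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


total = sum(LINEAGE_COUNTS.values())
lineage_iii_pct = percent(ESSC8_GENOMES, LINEAGE_COUNTS["III"])
overall_pct = percent(ESSC8_GENOMES, total)

print(f"total genomes: {total}")
print(f"essC8 within lineage III: {lineage_iii_pct:.2f} %")
print(f"essC8 across all genomes: {overall_pct:.3f} %")
```

The lineage counts sum to the stated repository size of 37 930, and the rounded percentages agree with the values quoted in the text.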
In addition to EssC, the T7SSb of Bacillota comprises three additional membrane proteins, EsaA, EssA and EssB, and two globular components, EsaB and EsxA [51]. Our previous analysis had indicated that across the L. monocytogenes essC1-essC7 genomes, these components are highly conserved (approximately 97 % identity) [22]. Extending our analysis to essC8 genomes, we noted that these five core proteins were also highly similar to those encoded by the other seven variant genomes (e.g. EsxA ≥96.6 %, EsaB ≥98.8 %). By contrast, the sequence identity of EssC8 relative to the other seven EssC variants ranged from 77.53 % with EssC4 to 82.80 % with EssC1, with the variability almost exclusively encompassing the cytosolic D2 and D3 portions of the protein (Fig. S2). Submission of representative EssC sequences of each variant to Plotcon confirmed that sequence variation at the protein level is confined mainly to the C-termini of these proteins (Fig. 1b).
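For readers who wish to reproduce identity values of this kind from an alignment, the following Python sketch computes pairwise percentage identity between two rows of a multiple sequence alignment. It is an illustration under a stated assumption: columns containing a gap in either sequence are excluded from the denominator, which may not be the convention used to derive the values reported here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over ungapped aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Restricting the input to the columns covering a single domain (for example, the D2 and D3 regions) would give a per-region identity with the same function.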

essC8 variable region 1 encodes an Rhs toxin
The T7SSb locus of L. monocytogenes genomes comprises two variable regions that are separated by a cluster of five housekeeping genes (encoding predicted phosphoenolpyruvate mutase, 6-O-methylguanine DNA methyltransferase, a YjbI-superfamily protein, a 2-hydroxyacid dehydrogenase and a tRNA; Fig. 2) [22]. Variable region 1 encodes one or more candidate substrate proteins that are specific for a particular essC variant; these include a YeeF-related toxin in essC1 genomes, and LXG toxins A, B and C in essC5, essC6 and essC7 genomes, respectively [22]. Fig. 2 provides an overview of the T7SSb locus of an example of each essC variant, with homologous genes conserved across the eight genomes shaded by sequence similarity. As expected, variable region 1 is highly heterogeneous between the eight essC subtypes, since each essC strain encodes variant-specific substrate proteins (usually alongside cognate immunity proteins). However, we noted previously that genes encoding 'orphan' immunity proteins (i.e. to protect from 'non-self' toxins) are also encoded in variable numbers in almost all genomes [22]. In the particular example in Fig. 2, the predicted cognate immunity gene to the LXG-domain toxin of the essC5 strain FSL L7-0239 is also encoded in variable region 1 of the essC1, C2, C3 and C4 genomes.
The dominant genetic feature of variable region 1 for all essC8 genomes is the presence of a large gene encoding a 1656 residue protein (Figs 2, 3a and 4). The protein has a DUF6531 (PF20148) domain that precedes 16 Rhs repeat (PF05593) domains extending down its length (Fig. 3b). We subsequently renamed this protein Listeria Rhs toxin A (LrhA). Many Rhs-repeat proteins are anti-bacterial toxins that can be secreted via a number of routes, including the Gram-negative type VI secretion system and by the Sec pathway in Gram-positive bacteria, dependent upon the targeting information that is present [52,53]. Some Rhs toxins also have insecticidal activity [54]. The Rhs repeats form a filamentous 'cocoon' that encases a C-terminal toxin domain [55].
Immediately downstream of lrhA is a candidate immunity gene that we have termed lrhI. The lrhI gene encodes a 261 residue protein with a predicted transmembrane domain close to its N-terminus; no protein family could be assigned to the remaining portion of this protein. As the globular portion of LrhI is predicted to be extracellular, the target of LrhA is likely to be within the cell envelope. Consistent with our annotation of LrhI as an immunity protein, orphan copies are encoded in variable region 1 of multiple other L. monocytogenes essC types, including genomes UTK_C1-0018-E1 (essC1; LAX90_RS06725), UTK_C1-0003-E1 (essC2; LAX77_RS05315) and NRRL B-3381 (essC5; BJM68_RS12710).
Most T7SSb toxins analysed to date are encoded at loci that also contain genes for small helical proteins, some of which are from the WXG100 family and that serve as targeting factors/chaperones for toxin secretion [16,19]. Sandwiched between essC8 and lrhA are three small genes. One of these (FSL L7-0325, FTZ063_RS11910; Fig. 3b) encodes a canonical WXG100 family protein, while the other two also encode small proteins that are predicted to be almost completely alpha-helical.

Modelling of LrhA confirms predicted Rhs structural features
To explore the organization of LrhA further, including identifying any potential structural features of the N-terminal region (which has no predicted homology to any known protein domain), we divided the protein sequence into three equal sections with 100 residue overlaps and analysed them with AlphaFold multimer (Fig. S3), with each output being aligned to generate a model of the entire protein. The AlphaFold prediction clearly modelled the Rhs repeats as a column of β-sheets, with the folded C-terminus oriented within the β-sheet bundle (Fig. 4). While AlphaFold also predicted the fold of the C-terminal toxin domain, we were not able to identify any significant structural matches to this protein region using the dali or phyre2 servers.
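The splitting step can be illustrated with a short Python sketch that derives overlapping windows for a sequence of a given length. The exact coordinates used for the AlphaFold runs are not stated, so the rounding scheme and the function name below are assumptions; the sketch simply produces three near-equal windows whose neighbours share exactly 100 residues.

```python
import math


def overlapping_segments(length, n_segments=3, overlap=100):
    """Return 0-based, half-open (start, end) windows covering the sequence.

    Windows are as equal in size as integer rounding allows, and each
    consecutive pair shares exactly `overlap` residues.
    """
    seg_len = math.ceil((length + (n_segments - 1) * overlap) / n_segments)
    step = seg_len - overlap
    if step <= 0:
        raise ValueError("overlap is too large for this sequence length")
    segments = []
    for i in range(n_segments):
        start = i * step
        end = min(start + seg_len, length)  # clip the final window
        segments.append((start, end))
    return segments


# LrhA is 1656 residues long.
lrha_segments = overlapping_segments(1656)
```

With these assumptions, the 1656-residue protein is cut into windows of 619, 619 and 618 residues; slicing the sequence string with each `(start, end)` pair yields the inputs for separate predictions, and the shared 100-residue stretches can then be used to superimpose adjacent models.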
To explore any potential protein-protein interactions, the three small proteins encoded upstream of lrhA were modelled alongside the LrhA N-terminal 173 amino acids using AlphaFold multimer (Fig. S3). This resulted in all three proteins being positioned at the α-helical N-terminus of LrhA, with the α-helices aligned along a single axis, almost intercalating with one another. Based on this model, together with the current understanding of the role of these small proteins in T7 substrate secretion in other species [16,19], these proteins were assigned the term Listeria T7SSb auxiliary proteins (LxpABC). The predicted AlphaFold model of LrhA bound to the Lxp proteins is shown in Fig. 4.

Novel LXG-domain toxin, Omicron, is encoded at variable region 2 of some genomes
The second variable region found at the L. monocytogenes T7SSb locus, variable region 2, is bounded by a tRNA-lysine gene, lmot01, at the 5′ end and a hypothetical membrane protein-encoding gene, lmo0082, at the 3′ end. This variable region is generally much shorter than variable region 1, encoding a single LXG-domain toxin (either toxin D, E, F or G) [22] alongside a cognate immunity gene and sometimes additional orphan immunities. We noted that in the small dataset analysed previously, approximately 25 % of genomes had no toxin encoded at this locus [22].
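Because variable region 2 is defined by its flanking genes, its gene content can be pulled out of an ordered annotation by locating those two boundaries. The Python sketch below shows the idea; it assumes the annotation is available as a list of gene names in chromosomal order, and the function name and example gene labels are hypothetical.

```python
def genes_between(ordered_genes, left_flank="lmot01", right_flank="lmo0082"):
    """Return the genes lying strictly between two flanking genes.

    list.index raises ValueError if either flank is absent from the list.
    """
    left = ordered_genes.index(left_flank)
    right = ordered_genes.index(right_flank)
    if left > right:
        # Contig is in the opposite orientation; swap the boundaries.
        left, right = right, left
    return ordered_genes[left + 1:right]


# Hypothetical gene order for a genome encoding one toxin-immunity pair.
example_locus = ["lmo0080", "lmot01", "lxg_toxin", "immunity", "lmo0082", "lmo0083"]
variable_region_2 = genes_between(example_locus)
```

An empty result would correspond to a genome with no toxin encoded at this locus, whereas an unusually long list (as with a prophage insertion) flags an extended region worth inspecting manually.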
Analysis of our larger dataset revealed that 74.6 % of the genomes encoded an LXG-domain toxin in variable region 2 (Tables 3 and S2-S5, Figs 1a and S1); we never found more than one LXG toxin encoded in this region. This heterogeneity was also borne out in the 11 essC8 genomes, 7 of which were found to encode an LXG-domain protein. Across these seven genomes, one of three LXG-domain toxins was present. Three genomes encoded a copy of toxin G, while two encoded copies of toxin D. Surprisingly, two genomes encoded a previously unidentified toxin at this position, designated toxin Omicron (Table 3). An alignment of Omicron with toxins D-G (Fig. S4a) and Plotcon analysis (Fig. S4b) shows that it shares the same highly conserved N-terminal region (up to amino acid 343), with a divergent C-terminal toxin domain that shares structural similarity to the Tne2 NADase type VI secretion system effector [56]. A predicted immunity gene for Omicron is encoded adjacent to the toxin and is also found as an orphan immunity gene in variable region 2 of some other L. monocytogenes genomes.
Although we had not identified toxin Omicron in our previous analysis, Omicron was also encoded at low frequency in variable region 2 of other genomes in our larger dataset (Table 3). Further analysis revealed that all of these are exclusively lineage III genomes (Table S4). This prompted us to examine whether other variable region 2 toxins showed any lineage specificity. Strikingly, it is clear that LXG toxin identity clustered with EssC type across lineage and MLST CC, but that this was irrespective of whether strains were isolated from clinical or non-clinical settings (Figs 1a and S1). We noted that only toxins D or F were encoded across lineage I (with toxin F being the most frequent; Table S2, Fig. 1a) whereas lineage II genomes predominantly encode toxin D or E, with rare instances of toxin F (Table S3, Fig. 1a). Lineage III genomes showed the highest variability, with instances of genomes encoding each of the five different toxins. By contrast, the small number of lineage IV genomes in our dataset only encoded toxin H, and the majority of genomes in this lineage did not encode any LXG toxin at this region (Table S5, Fig. 1a).

Insertion of a prophage within variable region 2 of essC8 strain 13FMFO000101
During examination of the essC8 genomes, it was noted that L. monocytogenes 13FMFO000101 presented an extensively extended variable region 2 due to insertion of a 38.6 kb prophage into the tRNA gene, lmot01 (Fig. 5). While no direct identity could be assigned to the prophage, it was determined to be intact by the phage prediction phaster webserver, with 64 ORFs being identified. Interestingly, insertion of the prophage did not disrupt the LXG-domain toxin or immunity genes in this region. Furthermore, the attR arm of the prophage did not disrupt the tRNA-lysine as a result of insertion into the chromosome. None of the prophage ORFs were predicted to encode any T7SSb-related toxin or immunity genes. An overview of the ten essC8 genomes identified from the genome repository is illustrated in Fig. 5, including the highly extended variable region 2 of 13FMFO000101.

Identification of new ess-external hotspots and predicted T7SSb substrates
Previously, we identified ten chromosomal hotspots across L. monocytogenes, in addition to variable regions 1 and 2, where LXG-domain substrates could be encoded (Fig. 6), and 33 distinct LXG toxin sequences (toxins H-ϕ; Table S6) [22]. When a hotspot was 'occupied', generally only one LXG-domain-encoding gene was found (except for the hotspot at lmo1096 where two LXG proteins were encoded in a head-to-head arrangement). Additional features of hotspots are the presence of one or two genes encoding small helical hairpin proteins sharing the WXG100 family fold, along with genes for immunity proteins [22].
To examine the repertoire of LXG-domain proteins in essC8 genomes, we searched genome annotations with the term 'LXG', and also undertook blast analysis of encoded proteins using the LXG domains of previously identified L. monocytogenes toxins [22] (Table S6). From this, we found that the essC8 genomes collectively encoded a further 17 of the LXG proteins we had described previously [22], in addition to those at variable region 2. We also noted that as well as Omicron, there were a further eight LXG-domain proteins that were novel. In each case, the LXG proteins were encoded adjacent to a probable immunity gene and were preceded by one or two genes for WXG100-like proteins. Gene neighbourhood analysis showed that four of these novel toxins (Nu, ψ, Iota and ω) were encoded at known hotspot regions (Table S6). The other four LXG proteins (Kappa, ξ, Upsilon and Chi) were encoded at four new hotspots (Fig. 6, Table S6). blast searches indicated that all eight of these toxins were encoded, at varying frequencies, in other essC variant genomes.

T7SSb is encoded by non-monocytogenes Listeria spp.
A blastp search identified that EssC homologues are encoded in many species of Listeria in addition to L. monocytogenes. Batch Entrez was used to retrieve available EssC sequences of non-monocytogenes Listeria species from the RefSeq database and an alignment of these sequences was performed, with a representative of each L. monocytogenes EssC (1-8) for reference. A neighbour-joining tree is shown in Fig. 7. All eight EssC variants are represented in non-monocytogenes Listeria spp., with Listeria innocua, Listeria welshimeri and Listeria seeligeri encoding seven of the eight EssC variants including EssC8 (Table 4). Outside of EssC 1-8, other potential EssC types were detected, including four possible types primarily associated with Listeria booriae, three possible types with Listeria grayi and six other possible types with other Listeria species (Fig. 7, Table 4).

T7SSb Rhs toxin, LrhA, is encoded by L. welshimeri, L. seeligeri and 'Listeria swaminathanii'
To determine whether the Rhs toxin, LrhA, was restricted to L. monocytogenes or more broadly distributed in the genus Listeria, a blastp search was performed. We used Batch Entrez to download available accession numbers from RefSeq, and analysed these by gene neighbourhood analysis (Fig. S5). From this, it was apparent that LrhA homologues are also found in L. welshimeri, 'L. swaminathanii' and L. seeligeri (Fig. S5). Interestingly, while in L. monocytogenes, 'L. swaminathanii' and L. welshimeri LrhA is only encoded downstream of essC8, in L. seeligeri LrhA could additionally be found in association with EssC2, EssC5 and EssC7 (Fig. 8). As discussed above, a small fraction of L. monocytogenes genomes encode a second, truncated copy of EssC of a different variant. We noted that some of the L. seeligeri genomes also have truncated essC genes and that LrhA was also encoded downstream of truncated essC2, essC5, essC7 and essC8 (Fig. 8).
When we compared the sequence of LrhA encoded downstream of the different EssC variants, we noticed that the EssC2, EssC5 and EssC7-associated LrhAs were shorter than the EssC8-associated protein (Fig. S6). The truncated LrhAs in these genomes lack the helical N-terminal region and DUF6531 domain, comprising only Rhs repeats and the C-terminal toxin domain. They also lack the three genes encoding the LxpABC partner proteins. It is not clear whether any of these truncated LrhA proteins would be substrates of the T7SSb and, if so, how they would be targeted to the secretion system. We also note that although the Rhs repeat sequence is highly conserved, the toxin domain is quite divergent across these four LrhA variants. Rhs protein-encoding genes are notorious sites for recombination [57], and it is possible that these truncated lrhA genes act as a source of sequence variability for the L. seeligeri species, providing a template for diversification of the toxin domain to overcome immunity.
Fig. 6. Chromosomal hotspots where LXG-protein-encoding genes occur in L. monocytogenes genomes. Hotspots shown in blue have been identified previously [22], those in red were found following genome analysis of essC8 genomes.

DISCUSSION
In this work, we have assembled a repository of 37 930 L. monocytogenes genomes and interrogated it to explore the diversity in the T7SSb and its substrates across the species. We have shown that the core secretion component, EssC, occurs as one of eight variants, with EssC1 being the most common and EssC8 the rarest. We also show that a small subset of essC1, essC2 and essC3 genomes encode a truncated EssC of a different variant, along with the likely substrate protein(s) recognized by that specific EssC type. Four variants of EssC have previously been reported in Staphylococcus aureus [58], and recently it was shown that there are four EssC types in group B Streptococcus [59], suggesting that heterogeneity is a general feature of the T7SSb.
The EssC8 variant had not been previously described and at present is only found in a few L. monocytogenes lineage III genomes. An Rhs domain protein, LrhA, is encoded downstream of essC8 but is not found in other essC variant genomes, and we suggest that in L. monocytogenes it is a T7SSb-EssC8-specific substrate. This is further supported by the finding that LrhA is encoded adjacent to genes for small WXG100 proteins, which are known to act as chaperones and/or targeting factors for T7-secreted toxins [16,19].
The most common toxin substrates of the T7SSb harbour an LXG domain and our prior analysis had identified 40 distinct LXG proteins across L. monocytogenes genomes [22]. While we were not able to individually analyse all of the genomes in the repository, inspection of the essC8 genomes identified a further nine proteins of the LXG family. Some of these were encoded within known chromosomal hotspots, whereas four were encoded at new hotspots not previously described. Retrospective analysis indicated that these novel LXG proteins were not unique to essC8 genomes but also present in other essC variant genomes in the repository. It is likely that there are additional LXG proteins encoded within the species that may be revealed by more intensive study of the available genome sequences.
We extended our analysis to assess T7SSb variability in the genus Listeria. Although there is significantly less genome sequence information outside of L. monocytogenes, we were able to identify numerous subtypes of EssC in other Listeria spp., including examples of each of the eight L. monocytogenes EssC variants. We also identified several novel EssC variants, and for species where multiple complete genome sequences were available, we frequently found they encoded more than one EssC subtype.