Metagenome diversity illuminates the origins of pathogen effectors

ABSTRACT Recent metagenome-assembled genome (MAG) analyses have profoundly impacted Rickettsiology systematics. The discovery of basal lineages (novel families Mitibacteraceae and Athabascaceae) with predicted extracellular lifestyles exposed an evolutionary timepoint for the transition to host dependency, which seemingly occurred independent of mitochondrial evolution. Notably, these basal rickettsiae carry the Rickettsiales vir homolog (rvh) type IV secretion system and purportedly use rvh to kill congener microbes rather than parasitize host cells as described for later-evolving rickettsial pathogens. MAG analysis also substantially increased diversity for the genus Rickettsia and delineated a sister lineage (the novel genus Tisiphia) that stands to inform on the emergence of human pathogens from protist and invertebrate endosymbionts. Herein, we probed Rickettsiales MAG and genomic diversity for the distribution of Rickettsia rvh effectors to ascertain their origins. A sparse distribution of most Rickettsia rvh effectors outside of Rickettsiaceae lineages illuminates unique rvh evolution from basal extracellular species and other rickettsial families. Remarkably, nearly every effector was found in multiple divergent forms with variable architectures, indicating profound roles for gene duplication and recombination in shaping effector repertoires in Rickettsia pathogens. Lateral gene transfer plays a prominent role in shaping the rvh effector landscape, as evinced by the discovery of many effectors on plasmids and conjugative transposons, as well as pervasive effector gene exchange between Rickettsia and Legionella species. Our study exemplifies how MAGs can yield insight into pathogen effector origins, particularly how effector architectures might become tailored to the discrete host cell functions of different eukaryotic hosts. 
IMPORTANCE While rickettsioses are deadly vector-borne human diseases, factors distinguishing Rickettsia pathogens from the innumerable bevy of environmental rickettsial endosymbionts remain lacking. Recent metagenome-assembled genome (MAG) studies revealed evolutionary timepoints for rickettsial transitions to host dependency. The rvh type IV secretion system was likely repurposed from congener killing in basal extracellular species to parasitizing host cells in later-evolving pathogens. Our analysis of MAG diversity for over two dozen rvh effectors unearthed their presence in some non-pathogens. However, most effectors were found in multiple divergent forms with variable architectures, indicating that gene duplication and recombination fashioned the effector repertoires of Rickettsia pathogens. Lateral gene transfer substantially shaped pathogen effector arsenals, as evinced by the discovery of effectors on plasmids and conjugative transposons, as well as pervasive effector gene exchanges between Rickettsia and Legionella species. Our study exemplifies how MAGs yield insight into pathogen effector origins and the evolutionary processes tailoring effectors to eukaryotic host cell biology.


FIGURE S7. Phylogenomics analyses of candidate REMs (cREM-1-cREM-5).
Protein information is provided in Table S2. All alignments were done with MUSCLE (default parameters) (189), with conservation analyzed using WebLogo (190). Amino acid coloring is described in the FIG. 3 legend. Black boxes provide short names for MAGs from Davison et al. (73). (A) cREM-1 proteins are minimized in architecture relative to ancestral forms. Proteins with high similarity to RT0435 (cREM-1) are mostly conserved in Rickettsia genomes yet highly diverse (gray), sometimes duplicated and tandemly arrayed (red), and often components of larger modular proteins (cREM-1d). Inset illustrates cREM-1 similarity to tandem repeats in the scrub typhus effector OtDUB (CAM80065), which carries multiple eukaryotic-like domains (5, 124, 203) (described in FIG. 7). For brevity, alignment of a small conserved region shared across OtDUB repeat 1 (383-406), OtDUB repeat 2 (645-668), Blapp1 HJD67197, and Blapp1 HJD67198 is shown. Phylogeny estimation of 102 cREM-1 proteins indicates diversification of larger cREM-1 domain modular proteins and streamlining to a single cREM-1 protein in most Rickettsia genomes. Alignment was not masked (1544 total sites, 38.34% invariant). A maximum likelihood-based phylogeny was estimated with PhyML (185), using the Smart Model Selection (186) tool to determine the best substitution matrix and model for rates across aa sites (VT+G+F). Branch support was assessed with 1,000 pseudoreplications. Log likelihood of tree: -34728.6. (B) cREM-2 proteins diversified from an ancient gene duplication. Proteins with high similarity to RT0352 (cREM-2a, red) and RT0351 (cREM-2b, light blue) are tandemly arrayed and mostly conserved in Rickettsia genomes, yet highly diverse from ancient forms (cREM-2d). Inset illustrates the conserved central region of cREM-2 proteins. Estimated phylogeny indicates cREM-2b proteins are more divergent from cREM-2d proteins. Alignment was not masked (973 total sites, 43.4% invariant). Phylogeny estimated as described in panel A (final model LG+G). Branch support was assessed with 1,000 pseudoreplications. Log likelihood of tree: -25322.17. (C) cREM-3 proteins are highly conserved and present in other proteobacterial assemblies. These proteins are typically annotated as Pfam PF10877 (DUF2671: restricted to Rickettsia spp.). This sequence logo is for 107 non-redundant proteins obtained from searches against 'Rickettsiales', with proteins aligned using MUSCLE (default parameters). A more complete list of proteins is found in Table S2, though more sequences are likely retrievable using HMMER searches. At bottom, a pairwise alignment between R. typhi RT0206 and the most divergent subject retrieved from a BlastP search excluding 'Rickettsiales' is shown (hypothetical protein B7X02_01410 from Rhodospirillales bacterium 12-54-5, OYW13786.1). These proteins are 30% identical over the match shown. The residues noted with arrows in the sequence logo are shown over the pairwise alignment and below with a structure for R. typhi RT0206 predicted with AlphaFold (197, 198).

S2. [NOTE: The Rickettsia endosymbiont of Oedothorax gibbosus (NZ_OW370493) and Rickettsia endosymbiont of Ceutorhynchus assimilis (NZ_OU906081) assemblies were not included in our other analyses, as manuscripts supporting these assemblies were not published. We included them in this analysis due to their relevance to reproductive parasitism (RP) in Rickettsia species].
Top, architecture of the modular protein pLbAR_38, which is carried on plasmid pLbAR of Rickettsia felis str. LSU-Lb (gray inset) (92). Black triangles, proprotein convertase cleavage sites (211). We refer to this protein and its adjacent predicted antidote (not shown) as a CndA/B module (174), as the toxin encodes nuclease (CinB) and deubiquitinase (CidB) domains similar to RP toxins of certain wolbachiae that cause cytoplasmic incompatibility in arthropod hosts (168-171). We previously showed that pLbAR_38 shares similarity to a diverse assemblage of proteins from a narrow range of obligate intracellular bacteria, some of which are known reproductive parasites (5).
Here, the remaining schema shows the result of a BlastP search against the NCBI nr database using pLbAR_38 (coordinates 1-3048, which excludes the ankyrin repeats) as the query (inset, color key for alignment scores). NOTE: each subject (numbered 1-24 at right) represents a distinct protein architecture, yet in some cases multiple similar proteins can be found for closely related species and strains. Yellow stars, novel RP toxins identified since our prior report (5). Matches for wolbachiae, Cardinium species (Bacteroidetes), Diplorickettsia species (Gammaproteobacteria), and Rickettsiella species (Gammaproteobacteria) are not shown to emphasize novel findings in Rickettsiaceae. White numbers indicate % identity across significant alignments. Subjects missing internal sequence relative to pLbAR_38 (no. 4) are joined by dashed lines; subjects with large insertions relative to pLbAR_38 are adjusted accordingly (nos. 5, 14, and 15). Blurred-out regions within subjects depict sequences with no significant matches to pLbAR_38. Six proteins for the Rickettsia endosymbiont of Ceutorhynchus assimilis are boxed, with the arrow pointing to a seventh protein that is directly compared to a protein from the Rickettsia endosymbiont of Adalia bipunctata
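The alignment statistics quoted in these legends (total sites, % invariant sites, and % identity across a match) are simple column counts, and a minimal Python sketch may help readers reproduce comparable numbers from their own alignments. The function names and the toy sequences below are hypothetical and are not taken from the study. The definitions are assumptions as well: a column is counted as invariant only when every sequence carries the same residue and no gap, and identity is computed over columns where neither sequence is gapped. PhyML and BLAST may count gapped columns differently, so values need not match the reported figures exactly.

```python
def percent_invariant(alignment):
    """Percent of columns in which all sequences share one residue.

    Assumption: a column containing any gap ('-') is not counted as invariant.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    invariant = sum(
        1 for column in zip(*alignment)
        if "-" not in column and len(set(column)) == 1
    )
    return 100.0 * invariant / length


def percent_identity(seq_a, seq_b):
    """Percent identical residues between two aligned sequences.

    Assumption: columns with a gap in either sequence are excluded
    from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be of equal aligned length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not data from the study).
toy = ["MKT-LV", "MKTALV", "MRTALV"]
print(f"{percent_invariant(toy):.2f}% invariant over {len(toy[0])} sites")
print(f"{percent_identity(toy[0], toy[2]):.1f}% identity")
```

In the toy alignment, four of six columns are invariant, and the first and third sequences match at four of the five ungapped columns.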

FIGURE S6. Characteristics of Rickettsia Ankyrin Repeat 2 (RARP-2) proteins and .

(D) cREM-4 proteins harbor a conserved pentapeptide repeat (PR). Schema shows alignment of 82 non-redundant cREM-4 proteins with illustration of the PR consensus sequence at top (208). A small conserved motif (inset) was also identified. (E) cREM-5 and cREM-5p have conserved central regions that lack similarity to proteins outside of Rickettsia and Tisiphia genomes. (F) One copy of cREM-5p from the RiClec (Endosymbiont of Cimex lectularius) genome is found on a RAGE transposon. General schema and annotation of RAGE genes follow previous reports (91, 92, 133).

FIGURE S8. Discovery of novel RickA architecture. Black boxes provide short