The phage shock protein (PSP) envelope stress response: discovery of novel partners and evolutionary history

ABSTRACT
Bacterial phage shock protein (PSP) systems stabilize the bacterial cell membrane and protect against envelope stress. These systems have been associated with virulence, but despite their critical roles, PSP components are not well characterized outside Proteobacteria. Using comparative genomics and protein sequence-structure-function analyses, we systematically identified and analyzed PSP homologs, phyletic patterns, domain architectures, and gene neighborhoods. This approach underscored the evolutionary significance of the system, revealing that its core protein PspA (Snf7 in ESCRT outside bacteria) was present in the last universal common ancestor and that this ancestral functionality has since diversified into multiple novel, distinct PSP systems across life. Several novel partners of the PSP system were identified: (i) the Toastrack domain, likely facilitating assembly of sub-membrane stress-sensing and signaling complexes; (ii) the newly defined HTH-associated α-helical signaling domain-PadR-like transcriptional regulator pair system; and (iii) multiple independent associations with ATPase, CesT/Tir-like chaperone, and Band-7 domains in proteins thought to mediate sub-membrane dynamics. Our work also uncovered links between the PSP components and other domains, such as novel variants of SHOCT-like domains, suggesting roles in assembling membrane-associated complexes of proteins with disparate biochemical functions. Results are available at our interactive web app, https://jravilab.org/psp.

IMPORTANCE
Phage shock proteins (PSP) are virulence-associated systems that protect the cell membrane against stress. They have mostly been characterized in Proteobacteria and Firmicutes. We now show that a minimal PSP system was present in the last universal common ancestor and has since evolved and diversified into newly identified functional contexts. Recognizing the conservation and evolution of PSP systems across bacterial phyla contributes to our understanding of stress response mechanisms in prokaryotes. Moreover, the newly discovered PSP modularity will likely prompt new studies of lineage-specific cell envelope structures, lifestyles, and adaptation mechanisms. Finally, our results validate the use of domain architecture and genetic context for discovery in comparative genomics.

PSP evolution across the tree of life Supplementary Tables Table S1

Supplementary Text

LiaI-LiaF-TM and Toastrack domains
To characterize the LiaIGF proteins, we ran PSI-BLAST searches starting from three sub-sequences of the full-length proteins: the N-terminal TM region of LiaF, the C-terminal globular domain of LiaF (DUF2154), and the globular domain of LiaG (DUF4097). These searches were followed by structure-informed sequence alignment. The analyses revealed that LiaI and LiaG bear remarkable similarity to the N-terminal TM and C-terminal globular regions of LiaF, respectively. We found that the globular domains of these LiaG-LiaF proteins are homologous to each other and that the Pfam profiles detected in this region (DUF2154, DUF4097, and DUF2807) can be unified into a single domain, which we call "Toastrack". Toastrack adopts a single-stranded right-handed β-helix fold with a unique N-terminal 7-stranded region in which the strands are intricately intertwined (PDB: 4QRK; the Pfam model Toastrack_N, PF17115, covers only this unique N-terminal region) [Fig. 1A; Table S1]. Likewise, the homology between the four-TM (4TM) region of LiaI and the N-terminal domain of LiaF (DUF2157) led us to rename the 4TM region 'LiaI-LiaF-TM' [Fig. 1B; Table S1]. Thus, our analyses define two new domains: LiaI-LiaF-TM and Toastrack.
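The renaming above can be expressed as a normalization step applied when summarizing domain architectures. The sketch below is illustrative only: the `UNIFIED` mapping and the `architecture` helper are hypothetical, the input format of (start position, Pfam name) pairs is an assumption, and merging adjacent hits that map to the same domain is a simplifying choice that would also collapse genuine tandem repeats.

```python
# Hypothetical mapping from Pfam profile names to the unified domain names
# proposed in this section; not an official Pfam or author-provided resource.
UNIFIED = {
    "DUF2154": "Toastrack",
    "DUF4097": "Toastrack",
    "DUF2807": "Toastrack",
    "Toastrack_N": "Toastrack",  # PF17115 covers only the N-terminal region
    "DUF2157": "LiaI-LiaF-TM",
}


def architecture(hits):
    """Return an N-to-C architecture string from (start, pfam_name) hits.

    Hits are sorted by start position, renamed via UNIFIED (unknown names
    are kept as-is), and adjacent hits mapping to the same unified domain
    are merged, so Toastrack_N followed by DUF2154 yields one Toastrack.
    """
    names = []
    for _, name in sorted(hits):
        unified = UNIFIED.get(name, name)
        if not names or names[-1] != unified:
            names.append(unified)
    return "+".join(names)


# A LiaF-like protein: N-terminal DUF2157 plus C-terminal Toastrack hits.
print(architecture([(120, "DUF2154"), (5, "DUF2157"), (90, "Toastrack_N")]))
# prints LiaI-LiaF-TM+Toastrack
```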

PspM and PspN domains
PspM comprises two TM regions and no other distinct domains. PspN contains a short C-terminal domain, DUF3046, and a previously uncharacterized N-terminal domain, which we now call PspN_N. We found that DUF3046 is α-helical, with highly conserved threonine and cysteine residues that might be required for its function.
To further characterize the DUF3046 homologs, we used nucleotide sequences rather than translated open reading frames (ORFs), followed by sequence alignment analysis [Fig. 1B]. We found that the DUF3046 domain, which is widespread across Actinobacteria, is more similar to the short downstream protein Rv2738c than to the C-terminus of PspN (encoded by Rv2742c), the fourth member of the M. tuberculosis operon.
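Working from nucleotide sequences can recover short coding regions, such as DUF3046-containing segments, that may be missed when only annotated ORFs are translated. The general idea, as in TBLASTN-style searches, is to translate genomic DNA in all six reading frames before comparing it with a protein query. The sketch below is a hypothetical illustration using the standard genetic code (NCBI translation table 1); the functions `translate`, `six_frames`, and `find_motif` are illustrative names, and exact peptide matching stands in for alignment scoring, so this is not the search performed in this work.

```python
CODON_BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; codons with non-ACGT bases become 'X'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(base in CODON_BASES for base in codon):
            idx = (16 * CODON_BASES.index(codon[0])
                   + 4 * CODON_BASES.index(codon[1])
                   + CODON_BASES.index(codon[2]))
            protein.append(AMINO_ACIDS[idx])
        else:
            protein.append("X")
    return "".join(protein)


def six_frames(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def find_motif(dna, motif):
    """List frames whose translation contains the peptide motif exactly."""
    return [frame for frame, prot in six_frames(dna).items() if motif in prot]


frames = six_frames("ATGGCC")
print(frames["+1"], frames["-1"])  # prints MA GH
```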

PspAA and PspAB domains
The PspA neighborhood analysis identified a new component near PspA: a protein containing a novel trihelical domain (with absolutely conserved R and D residues) present in Euryarchaeota, Thaumarchaeota, Actinobacteria, Chloroflexi, Firmicutes, and a few Alpha- and Gammaproteobacteria. This protein occurs in a two-gene cluster with PspA [Fig. 3; Table S3; e.g., MA_1460; AAM04874.

Novel PSP associations
2.1 PspA/Snf7 domain architectures
A very small fraction of PspA homologs shows variation in domain architecture (i.e., proteins in which PspA is fused to other domains rather than occurring alone). For example, cyanobacterial PspA homologs show some interesting variations: a few have dyads or triads of PspA, either as repeated domains within a polypeptide or as predicted operons with multiple copies of PspA-containing genes [Fig. 3; e.g., BAG06017.1], while others carry an additional N-terminal hydrolase domain of the NlpC/P60 superfamily that is predicted to catalyze the modification of phosphatidylcholine, thus altering membrane composition [Fig. 3; Table S3; AFZ52345.1; (1)]. We also find a novel fusion of PspA with PspAA in Actinobacteria (ACU53894.1, Acidimicrobium; PspAA is defined in the PspAA and PspAB domains section). Similar to the PspA homologs, a search for the related Snf7 superfamily revealed minimal variation in domain architecture, with occasional fusions (<5%) found only in eukaryotes [Fig. 2B]. Some Actinobacteria, such as Mycobacteroides abscessus, have an Snf7 homolog [CAM62382.1; Fig. 2B] fused to a member of the RND transporter family. The RND transporter, which transports lipids and fatty acids, is flanked by two genes encoding the Mycobacterium-specific TM protein with a C-terminal cysteine-rich domain (2).
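Statements such as "a very small fraction of PspA homologs" or "occasional fusions (<5%)" amount to tallying architectures across a homolog set. A minimal sketch, assuming each homolog has already been reduced to an architecture string with domains joined by `+`; the function name, the accessions, and the `NlpC_P60` label below are hypothetical placeholders, not data from this study.

```python
from collections import Counter


def summarize_architectures(architectures, canonical="PspA"):
    """Tally architectures and the fraction that differ from the canonical one.

    architectures: dict mapping protein accession -> architecture string,
    domains in N-to-C order joined by '+', e.g. "NlpC_P60+PspA".
    Anything other than the canonical single domain (fusions and internal
    repeats such as "PspA+PspA") counts as a variant.
    """
    counts = Counter(architectures.values())
    if not architectures:
        return counts, 0.0
    variants = sum(n for arch, n in counts.items() if arch != canonical)
    return counts, variants / len(architectures)


# Hypothetical toy input; accessions are placeholders.
example = {
    "prot1": "PspA",
    "prot2": "PspA",
    "prot3": "PspA+PspA",
    "prot4": "NlpC_P60+PspA",
}
counts, fraction = summarize_architectures(example)
print(counts.most_common(1), fraction)  # prints [('PspA', 2)] 0.5
```

Treating internal repeats as variants mirrors the text, which counts PspA dyads and triads among the architectural variations.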

Vps4 and AAA+ ATPases
One or more copies of an snf7 gene [e.g., OLS27540.1; Table S3] and a gene for the VPS4-like AAA+ ATPase (with an N-terminal MIT domain and a C-terminal oligomerization domain; Table S2) are known to occur together in Archaea, where they define the core of an ESCRT complex (3). However, we observed some diversity between archaeal lineages. For example, the Asgardarchaeota have a genomic context most similar to that of eukaryotes, composed of the Vps4 AAA+ ATPase and Snf7-encoding genes along with an ESCRT-II gene that encodes a protein with multiple winged helix-turn-helix (wHTH) domains (4). In Crenarchaeota, Snf7 and the Vps4 AAA+ ATPase are encoded in a distinct three-gene operon that also contains a gene coding for a CdvA-like coiled-coil protein with an N-terminal PRC-barrel domain implicated in archaeal cell division (5). In this case, the Snf7 domain is fused to a C-terminal wHTH domain, which might play a role equivalent to that of the ESCRT-II wHTH domain. These operons may be further extended with additional copies of snf7 genes and other genes coding for a TM protein and an ABC ATPase. We also observed that a related VPS4-like AAA+ ATPase was transferred from Archaea to Bacteria and is found in Cyanobacteria, Bacteroidetes, Verrucomicrobia, Nitrospirae, and Planctomycetes (e.g., ACB74714.1, Opitutus; Table S3). In these operons, the snf7 gene is replaced by an unrelated gene coding for a larger protein with TPR repeats followed by a 6TM domain, again suggesting a membrane-proximal complex.
Our analysis also showed that bacterial PspA (e.g., AEY64321.1, Clostridium; Table S3) might occur with a distinct AAA+ ATPase in various bacterial clades. This ATPase (e.g., AEY64320.1, Clostridium; CKH37208.1, Mycolicibacterium) has two AAA+ ATPase domains in the same polypeptide, with the N-terminal copy being inactive. This gene dyad also occurs with either a previously unidentified membrane-anchored protein with a divergent Snf7 domain (OGG56892.1; Table S3) or other coiled-coil or α-helical domain-containing proteins. Both PspA and the membrane-associated Snf7, along with the AAA+ ATPase, may occur in longer operons with other genes coding for an ABC-ATPase, an ABC TM permease, and a solute-binding protein with PBPB and OmpA domains [e.g., OGG56892.1; Table S3].

PspA with PspM or Thioredoxin
The association of ClgR-HTH with PspAM is also confined to this RsmP family, suggesting that these are also determinants of the rod-shaped morphology of the cell. PspN occurs in the immediate ClgR-HTH-PspAM operon (containing ClgR, PspA, and PspM) only in a few mycobacteria (CCP45543.1, M. tuberculosis H37Rv), where it has an N-terminal PspN_N domain (see PspM and PspN domains) and a C-terminal DUF3046. The remaining ClgR-HTH-PspAM operons lack the fused PspN_N-DUF3046 protein and instead contain only the ancestral DUF3046, located three genes downstream (ABK71106.1, Mycolicibacterium smegmatis). The duplicated DUF3046 domain forms the intact ClgR-HTH-PspAMN operon only in the M. tuberculosis complex (6,7). The presence of the same thioredoxin family with a different PspA family (typically, two copies) in Cyanobacteria suggests that the thioredoxin homolog performs a similar redox activity to control PspA [AFZ14666.1, Crinalium; Fig. S3].

Novel contexts containing Toastrack
Toastrack and TM domains
In most homologs, Toastrack domains are fused to N-terminal single- or multi-TM domains such as PspC, LiaI-LiaF-TM, HAAS, and SHOCT, strongly suggesting that the Toastrack domains are predominantly intracellular with N-terminal membrane tethers [Fig. 4, 5; Table S4]. In Cyanobacteria, we find variable multidomain proteins with an N-terminal TM anchor followed by a region containing the Toastrack domain flanked by an immunoglobulin (Ig) domain and one or more catalytic domains, such as a fringe-like glycosyltransferase or a caspase-like thiol peptidase [Fig. 4; Tables S2 and S4; AFY83227.1, Oscillatoria]. Further, in several architectures, the N-terminal TM regions fused to the Toastrack domain are replaced by at least two variants of the bihelical SHOCT domain (e.g., Bacillus subtilis yvlB, CAB15517.1) [Fig. 4, 5; Tables S1 and S4]. We call these variants SHOCT-like domains to distinguish them from the classical SHOCT domain; they include one domain partly detected by the Pfam DUF1707 model (8) and another that has not been detected by any published profile. SHOCT and related domains are fused to disparate domains and are typically found at the N- or C-termini of proteins.

Toastrack and transcription factors
We also discovered several conserved genomic contexts containing Toastrack, with likely roles in membrane-linked stress response. The first, found across diverse bacterial lineages, contains a core of four genes coding for (i) a sigma factor; (ii) a receptor-like single-TM protein with an intracellular anti-sigma-factor zinc finger (zf-HC2, PF13490 in Pfam) and extracellular HEAT repeats; (iii) one or two membrane-anchored Toastrack-containing proteins (AFK03672.1, Emticicia; Table S4); and (iv) a previously uncharacterized protein with hits to the Pfam model DUF2089. We found that the DUF2089 model can be divided into an N-terminal ZnR, a central HTH, and a C-terminal SHOCT-like domain (ADE70705.1, Bacillus) [Fig. 4; Tables S2 and S4]. In a few of these operons, the membrane anchor of the Toastrack domain is a LiaI-LiaF-TM domain [Fig. 5; Table S4]. Variants of this system include additional genes coding for a protein with a LiaI-LiaF-TM domain fused to an N-terminal B-box domain (e.g., ACO33311.1, Acidobacterium) or a PspC protein (e.g., OGF50123.1, Candidatus Firestonebacteria) [Fig. 5; Table S4]. We propose that this system functions similarly to the classical lia operon, transducing membrane-associated signals into a transcriptional output that affects a wide range of genes via the sigma factor.
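Detecting a conserved context such as this four-gene core can be framed as scanning the ordered genes of each contig for short runs that jointly carry all required domains. A minimal sketch under stated assumptions: genes are already annotated with domain labels, `Sigma` is a hypothetical label for the sigma-factor domain, co-directional adjacent genes approximate an operon, and the `Gene` class and `find_neighborhoods` function are illustrative names, not the pipeline used for this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    accession: str
    strand: str         # "+" or "-"
    domains: frozenset  # domain labels assigned by an earlier annotation step


def find_neighborhoods(genes, required, max_genes, same_strand=True):
    """Return (start, end) index pairs of runs of at most max_genes
    consecutive genes that together carry every domain in `required`.

    `genes` must be in genomic order along one contig. For each start
    position only the shortest qualifying run is reported. With
    same_strand=True, a run stops at the first strand change.
    """
    required = set(required)
    hits = []
    for start in range(len(genes)):
        seen = set()
        for end in range(start, min(start + max_genes, len(genes))):
            if same_strand and genes[end].strand != genes[start].strand:
                break
            seen |= genes[end].domains
            if required <= seen:
                hits.append((start, end))
                break
    return hits


# Hypothetical contig; accessions and labels are placeholders.
contig = [
    Gene("g0", "-", frozenset({"PspA"})),
    Gene("g1", "+", frozenset({"Sigma"})),
    Gene("g2", "+", frozenset({"zf-HC2", "HEAT"})),
    Gene("g3", "+", frozenset({"LiaI-LiaF-TM", "Toastrack"})),
    Gene("g4", "+", frozenset({"DUF2089"})),
    Gene("g5", "-", frozenset({"ABC_ATPase"})),
]
core = {"Sigma", "zf-HC2", "Toastrack", "DUF2089"}
print(find_neighborhoods(contig, core, max_genes=5))  # prints [(1, 4)]
```

Requiring a shared strand is a crude operon proxy; real analyses usually also consider intergenic distances.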
Similarly, an operon observed predominantly in various Proteobacteria and Bacteroidetes couples a gene for a membrane-anchored Toastrack protein (typified by AAM36414.1, Xanthomonas) with genes coding for an ABC-ATPase, a permease subunit, and a GNTR-HTH transcription factor with a distinct C-terminal α-helical domain and another [Fig. 5; Tables S2 and S4]. These operons also code for a previously uncharacterized protein matching the Pfam DUF2884 model. We show that these proteins are membrane-associated lipoproteins (e.g., AJI31452.1, Bacillus), which might function as extracellular solute-binding partners for the ABC-ATPase and permease components. A comparable operon found in Actinobacteria replaces the GNTR-HTH transcription factor with a ribbon-helix-helix (RHH) domain protein. In some Actinobacteria, the Toastrack domain encoded by the operon is fused to a SHOCT-like domain and is encoded adjacent to genes specifying a two-component system (CCP43715.1, Mycobacterium) or a transport operon (CAB88834.1, Streptomyces) [Fig. 5; Table S4]. These Toastrack-containing operons are likely to couple transcriptional regulation to the sensing of membrane-proximal signals and transport [Fig. 5; Table S4]. The GNTR-HTH and RHH proteins encoded by these operons likely function as transcriptional regulators analogous to the PspF and ClgR transcription factors of classical PSP systems.

Figure S2. PspA maximum-likelihood gene tree with bootstrap values, inferred with RAxML-NG.

Table S3:
Representative PspA/Snf7 homologs: gene, lineage information, and genomic contexts, grouped by domain architecture.

Table S4:
Representative homologs of PSP cognate partner domains: gene, lineage information, and genomic contexts, grouped by domain architecture. More information is available on our web app.

Table S2 ;
web app]. This predicted operon occasionally contains a third gene coding for a SHOCT-like bihelical domain-containing protein in various bacterial and archaeal lineages [Fig. 3; In a few bacterial and archaeal lineages, the PspA-PspAA dyad co-occurs with another dyad comprising a membrane-associated metallopeptidase and a protein with a novel domain, which we termed PspAB (for PspA-associated domain B; AAZ55047.1, Tfu_1009, Thermobifida) [Fig. 3;