Functional and structural characterization of a novel putative cysteine protease cell wall-modifying multi-domain enzyme selected from a microbial metagenome

A current focus of metagenomics is to interpret collected genomic data and transform them into biological information. By combining structural, functional and genomic data, we assessed a novel bacterial protein selected from a carbohydrate-related activity screen of a microbial metagenomic library from the gut of Capra hircus (domestic goat). This uncharacterized protein was predicted to be a bacterial cell wall-modifying enzyme (CWME) and was shown to contain four domains: an N-terminal domain, a cysteine protease domain, a peptidoglycan-binding domain and a bacterial SH3 domain. We cloned, expressed and purified this putative cysteine protease (PCP), which exhibited autoproteolytic activity and was inhibited by protease inhibitors. We also observed cell wall hydrolytic activity and ampicillin binding, a characteristic of most bacterial CWMEs. Fluorimetric binding analysis yielded a binding constant (Kb) of 1.8 × 10⁵ M⁻¹ for ampicillin. Small-angle X-ray scattering (SAXS) showed a maximum particle dimension (Dmax) of 95 Å and a real-space radius of gyration (Rg) of 28.35 Å. The elongated molecular envelope is consistent with the size estimated by dynamic light scattering (DLS). Furthermore, homology modeling combined with SAXS allowed us to build a model that explains the stability and secondary structure changes observed by circular dichroism (CD). In short, we report a novel cell wall-modifying autoproteolytic PCP and provide insight into its biochemical, biophysical and structural features.
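The abstract does not state how Kb was extracted from the fluorescence data. One common treatment of intrinsic fluorescence quenching by a ligand uses the double-logarithmic relation log[(F0 − F)/F] = log Kb + n log[Q], where F0 and F are the intensities without and with ligand at concentration [Q], and n is the number of binding sites. The sketch below assumes that model. The function name fit_binding_constant and the noise-free titration are hypothetical illustrations, not the authors' method or data.

```python
import numpy as np


def fit_binding_constant(conc_m, f0, f):
    """Fit log10((F0 - F)/F) = log10(Kb) + n*log10([Q]) by linear least squares.

    Assumes the double-logarithmic quenching model. conc_m is treated as the
    free ligand concentration (mol/L), which is only an approximation that
    holds when the ligand is in large excess over the protein.
    Returns (Kb in M^-1, n).
    """
    conc_m = np.asarray(conc_m, dtype=float)
    f = np.asarray(f, dtype=float)
    x = np.log10(conc_m)
    y = np.log10((f0 - f) / f)
    n, log_kb = np.polyfit(x, y, 1)  # slope = n, intercept = log10(Kb)
    return 10.0 ** log_kb, n


# Hypothetical noise-free titration generated with Kb = 1.8e5 M^-1 and n = 1
kb_true, n_true, f0 = 1.8e5, 1.0, 100.0
conc = np.linspace(1e-6, 2e-5, 10)  # ligand concentrations (M)
f_obs = f0 / (1.0 + kb_true * conc ** n_true)  # quenched intensities

kb_fit, n_fit = fit_binding_constant(conc, f0, f_obs)
print(f"Kb = {kb_fit:.3g} M^-1, n = {n_fit:.2f}")
```

Before such a fit, real titration data would normally need corrections, for example for dilution and inner-filter effects.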


Finding domain 1
InterPro and ThreaDom domain predictions indicated that PCP has three domains. However, when each domain was submitted individually to the LOMETS server, the threading alignments showed very low homology between the first 48 residues of domain 1 and the PDB templates. The best templates, by contrast, clearly aligned to conserved regions, especially those that MEROPS predicts to lie near the catalytic residues. One of these templates, RipA (PDB accession 3NE0), has a small N-terminal domain, tightly bound to its NlpC/P60 domain, that was shown to physically block the catalytic cleft. This small domain belongs to the PB015164 family and shares no homology with the N-terminal region of PCP domain 1. Even so, its presence in RipA led us to hypothesize that the 48 unaligned residues of PCP form a separate subdomain. A BLAST search with this N-terminal region confirmed the hypothesis: at least seven proteins (e-values < 5e-04) carry homologues of this region at the C-terminus of their NlpC/P60 domains (rather than at the N-terminus), corroborating that it is a distinct domain (Fig. 3A). When submitted individually to the LOMETS server, this subdomain showed 20% identity to the LCI domain of Bacillus subtilis, an antimicrobial protein (PDB accession 2B9K), and 26% identity to the C-terminal region of a putative sensor histidine kinase domain (SHK) from Clostridium symbiosum (PDB accession 3FN2). Remarkably, although they share only 16% identity with each other, the LCI domain and the C-terminal region of the SHK have very similar folds (Fig. S4): a β-sheet of three antiparallel strands in a β1-β3-β2 topology.
The only difference is a small fourth strand (β1/2) in the loop between β1 and β2 of the LCI domain (Fig. S4). We nonetheless call this the LCI fold, since the LCI domain is the smallest structure in the PDB that adopts it. We conclude that PCP in fact has four domains, with the "new" domain 1 comprising residues 1 to 48.
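Percent identity values such as the 20%, 26% and 16% quoted above depend on the alignment and on the denominator used: tools differ in whether they divide by aligned positions, full alignment length or the shorter sequence. The sketch below assumes identity is computed over gap-free column pairs of a pre-computed pairwise alignment. The function name and toy fragments are hypothetical and are not PCP, LCI or SHK sequences.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two sequences taken from a pairwise alignment.

    Only columns where neither sequence has a gap are counted. This is one of
    several conventions; others divide by full alignment length or by the
    length of the shorter ungapped sequence.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# Hypothetical aligned fragments: 6 identities over 7 gap-free columns
print(round(percent_identity("MKV-LSTAG", "MRVALS-AG"), 1))
```

Because the denominator convention shifts the result, identity values from different servers are best compared only when computed the same way.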

Interdomain interactions
When we analyze domains 1 and 2 in PCP homologues in which domain 1 lies upstream of domain 2, some residue pairs show signs of coevolution. The hypothesis connecting correlated substitution patterns with residue-residue contacts is simple: if two residues of a protein, or of a pair of interacting proteins, are in contact, a destabilizing substitution at one position is expected to be compensated over evolutionary time by a substitution at the other, so that the pair maintains an attractive interaction¹. The most prominent such pair is Arg45/Asp152, which is either conserved at both positions (32 of 41 sequences) or substituted at both (9 of 41 sequences).
An example of the mutated Arg45/Asp152 pair is shown in Fig. S6. This co-variation suggests that the two residues may form a salt bridge between domain 1 and domain 2.
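The pattern in which a residue pair is either conserved at both positions or substituted at both can be checked by tabulating each aligned sequence by whether it keeps the reference residue at each position. The sketch below assumes a gapped multiple alignment given as equal-length strings with 0-based column indices. Mapping residue numbers such as 45 and 152 onto alignment columns would require PCP's gapped row in the alignment. The alignment, column choices and function names are hypothetical. The sketch also computes the mutual information between the two columns, a simple covariation score. More refined scores, such as APC-corrected mutual information or direct coupling analysis, exist and are not shown.

```python
from collections import Counter
from math import log2


def co_substitution_table(msa, col_i, col_j, ref_i, ref_j):
    """Count sequences by whether columns i and j keep the reference residues.

    Returns a Counter keyed by (kept_i, kept_j) booleans, so (True, True)
    means both positions are conserved and (False, False) both substituted.
    """
    return Counter((seq[col_i] == ref_i, seq[col_j] == ref_j) for seq in msa)


def mutual_information(msa, col_i, col_j):
    """Mutual information (in bits) between the residues in two MSA columns."""
    n = len(msa)
    pi = Counter(seq[col_i] for seq in msa)
    pj = Counter(seq[col_j] for seq in msa)
    pij = Counter((seq[col_i], seq[col_j]) for seq in msa)
    return sum((c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())


# Hypothetical 5-sequence alignment: column 1 stands in for Arg45,
# column 3 for Asp152
msa = ["ARGDK", "ARGDK", "ARGDK", "AQGSK", "AQGSK"]
table = co_substitution_table(msa, 1, 3, "R", "D")
mi = mutual_information(msa, 1, 3)
print(table)
print(round(mi, 3))
```

In this toy alignment the two columns vary in perfect lockstep, so the mutual information equals the entropy of either column. Mixed patterns, where only one position is substituted, would lower it.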
Surface-exposed hydrophobic patches can indicate where interdomain interfaces are likely to form, as well as where ligands and substrates may bind. This may be the case for the hydrophobic cluster spanning residues 50-70, which the SAXS fit predicted to interact with domain 3.

Supplementary
Figure S6. Alignment of the domain 1 and domain 2 regions with homologous sequences. Concomitantly mutated Arg45 and Asp152 residues are shown in black boxes and suggest possible interaction sites between domain 1 and domain 2.