Sequence-Specific Intramembrane Proteolysis: Identification of a Recognition Motif in Rhomboid Substrates

Summary
Members of the widespread rhomboid family of intramembrane proteases cleave transmembrane domain (TMD) proteins to regulate processes as diverse as EGF receptor signaling, mitochondrial dynamics, and invasion by apicomplexan parasites. However, lack of information about their substrates means that the biological role of most rhomboids remains obscure. Knowledge of how rhomboids recognize their substrates would illuminate their mechanism and might also allow substrate prediction. Previous work has suggested that rhomboid substrates are specified by helical instability in their TMD. Here we demonstrate that rhomboids instead primarily recognize a specific sequence surrounding the cleavage site. This recognition motif is necessary for substrate cleavage, it determines the cleavage site, and it is more strictly required than TM helix-destabilizing residues. Our work demonstrates that intramembrane proteases can be sequence specific and that genome-wide substrate prediction based on their recognition motifs is feasible.

algorithm (Sharpe et al., unpublished) was used to refine the transmembrane boundaries of proteins predicted to have type I or type III topology (i.e. periplasmic N-terminus). The approximate edges from Phobius were indented by 4 amino acids at both ends and a window of five residues was used to scan for mean hydrophobicity using the Goldman-Engelman-Steitz hydrophobicity scale (Engelman et al., 1986) which is the most appropriate hydrophobicity scale for amino acids in single pass transmembrane regions (Koehler et al., 2009). Transmembrane domain boundaries were then defined by a window average hydrophobicity of greater than -0.94 kcal/mol or by individual residue hydrophobicity of greater than 8.0 kcal/mol.
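The boundary refinement described above can be sketched in code. This is only one possible reading of the text, since the original algorithm is unpublished: the function name `refine_tmd`, the outward-extension strategy, and the 0-based inclusive coordinates are assumptions made for illustration, and `TOY_SCALE` is an illustrative three-residue stand-in for the full Goldman-Engelman-Steitz scale, which would have to be supplied from the cited source.

```python
# Illustrative sketch only; not the authors' implementation.
# TOY_SCALE is a hypothetical three-residue stand-in for the full scale (kcal/mol).
TOY_SCALE = {"L": -2.8, "S": -0.6, "K": 8.8}

def refine_tmd(seq, start, end, scale, indent=4, window=5,
               mean_cut=-0.94, residue_cut=8.0):
    """Refine approximate TMD edges (0-based, inclusive).

    The edges are first indented by `indent` residues, then each edge is
    extended outward one residue at a time. Extension stops when the
    five-residue window at the edge has a mean above `mean_cut` or the
    candidate residue itself scores above `residue_cut`.
    """
    def mean(win):
        return sum(scale[a] for a in win) / len(win)

    left, right = start + indent, end - indent

    # Extend the N-terminal edge; the window starts at the candidate residue.
    while left > 0:
        cand = left - 1
        if scale[seq[cand]] > residue_cut or mean(seq[cand:cand + window]) > mean_cut:
            break
        left = cand

    # Extend the C-terminal edge; the window ends at the candidate residue.
    while right < len(seq) - 1:
        cand = right + 1
        win = seq[max(0, cand - window + 1):cand + 1]
        if scale[seq[cand]] > residue_cut or mean(win) > mean_cut:
            break
        right = cand

    return left, right

# Toy sequence: charged flanks, two serines, a 20-residue leucine core.
toy_seq = "KKK" + "SS" + "L" * 20 + "SS" + "KKK"
refined = refine_tmd(toy_seq, 5, 24, TOY_SCALE)
```

In this toy case the refined edges stop at the lysines, whose individual scores exceed the 8.0 kcal/mol cutoff.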
To identify proteins that contained the rhomboid recognition motif, the specificity matrix (Fig. 4B) was used to derive a regular expression describing the recognition motif as follows. The P4, P1 and P2' positions were allowed to be occupied only by amino acids that, in the context of TatA, permitted 51-100% cleavage efficiency in comparison to the wild type (denoted by white and light grey squares in Fig. 4C). All other positions, that is, P5, P3, P2, and P1', were allowed to be occupied by any amino acid except those that decreased cleavage efficiency to 0-25% of wild type TatA (black squares in Fig. 4B), unless they occurred in a known substrate (that is, glutamate in P3 of Spitz, and aspartate in P1' of LacYTM2). Next, the putative type I and III proteins were scanned for the regular expression within a sequence window such that the P2' position of the motif (Fig. 6A) was up to 10 amino acids upstream of the N-terminal TMD boundary and greater than 16 residues upstream of the C-terminal boundary of the TMD (analysis range P). Motif quality in these hits was scored by summing up P4, P1, and P2' contributions following this arbitrary
Putative type I or type III proteins that did not have any motif in a sequence window defined such that the P2' position was no more than 10 amino acids upstream of the N-terminal TMD boundary and no more than 5 amino acids downstream of the C-terminal TMD boundary (analysis range N) were predicted not to be AarA substrates. Proteins that contained the motif downstream of analysis range P were excluded from further analysis because their motifs were deemed topologically unlikely to be accessible to rhomboid cleavage when embedded in the lipid bilayer.
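The motif search over analysis range P can be illustrated with a short sketch. The residue sets below are hypothetical placeholders, not the sets read from the specificity matrix, and the names `build_motif_regex` and `scan_range_p`, the 0-based inclusive coordinates, and the exact handling of the window limits are assumptions made for illustration.

```python
import re

# Motif positions in sequence order (a heptapeptide, P5 to P2').
POSITIONS = ["P5", "P4", "P3", "P2", "P1", "P1'", "P2'"]

# HYPOTHETICAL residue sets; the real ones derive from the specificity matrix.
ALLOWED = {"P4": "ILMFV", "P1": "AGS", "P2'": "ILMFV"}    # placeholder values
EXCLUDED = {"P5": "P", "P3": "P", "P2": "P", "P1'": "P"}  # placeholder values

def build_motif_regex(allowed, excluded):
    """Build a regex with allow-lists at P4, P1, P2' and deny-lists elsewhere."""
    classes = []
    for pos in POSITIONS:
        if pos in allowed:
            classes.append("[" + allowed[pos] + "]")
        elif excluded.get(pos):
            classes.append("[^" + excluded[pos] + "]")
        else:
            classes.append(".")  # nothing excluded at this position
    # A lookahead lets overlapping heptapeptides be reported as well.
    return re.compile("(?=(" + "".join(classes) + "))")

def scan_range_p(seq, tmd_start, tmd_end, motif):
    """Return (start, heptapeptide) hits whose P2' lies in analysis range P.

    tmd_start and tmd_end are 0-based inclusive TMD boundaries. P2' may lie
    up to 10 residues before the N-terminal boundary and must lie more than
    16 residues before the C-terminal boundary.
    """
    hits = []
    for m in motif.finditer(seq):
        p2prime = m.start() + 6  # index of the seventh residue of the match
        if tmd_start - 10 <= p2prime and tmd_end - p2prime > 16:
            hits.append((m.start(), m.group(1)))
    return hits

motif = build_motif_regex(ALLOWED, EXCLUDED)
toy_seq = "DDDDD" + "DIDDADL" + "D" * 30
hits = scan_range_p(toy_seq, 12, 40, motif)
```

The lookahead is a design choice of this sketch: two juxtaposed motifs, such as those described for Spitz in the Supplemental Results, would otherwise be missed by a non-overlapping search.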

Supplemental Results
Newly generated or unmasked secondary recognition motifs explain the apparent exceptions in motif recognition by AarA.
As described in the main text, the P2' mutant of Gurken (I247G) and P1 and P4 mutants of Spitz (A138F and L135G, respectively) were unexpectedly cleaved at almost wild type levels by AarA. To examine the products in more detail, we introduced these three mutations into the corresponding chimeric MBP/Trx fusion proteins (Fig. 1A), and determined cleavage sites by AarA in vitro using N-terminal sequencing and mass spectrometry (Fig. 5C).
Specifically, the P2' mutation of Gurken (I247G) led to cleavage between G247 and V248 and generated a new recognition site with M244, G247, and F249 in P4, P1, and P2' positions, respectively. Similarly, the P1 (A138F) mutation in Spitz generated a new recognition motif with the F138 in P4 position resulting in cleavage between A141 and S142.
Knocking out the original P4 residue in Spitz by the L135G mutation shifted the cleavage site to between G143 and A144, revealing a secondary, normally cryptic recognition motif that contains permissive P1 (G143), P4 (I140) and P2' (M145) residues (Fig. 5C). The double P4/P2' mutant of Spitz (L135G/I140G) is uncleavable because the P4 residues of both recognition motifs are disabled. The presence of two juxtaposed recognition motifs in Spitz was also confirmed in another way: individual mutation of the primary (A138P) or secondary (G143P) P1 residue in Spitz allows cleavage within the other, non-mutated motif, while the double mutant (A138P/G143P) is uncleavable (Fig. 5D). Thus, the three exceptions were only apparent; they in fact strongly support the overall functional conservation and significance of the TatA-like recognition motif in a diverse set of substrates.

Minor differences in motif recognition by GlpG and YqgP
As with AarA, there were three substrate mutants that appeared partially to contravene the recognition motif requirement. Gurken P2' mutant I247G was cleaved at wild type levels by GlpG, and LacYTM2 and Gurken P1 mutants (S43F and A245F, respectively) were cleaved with moderate efficiency by YqgP. We therefore introduced these three mutations into the corresponding MBP/Trx fusion proteins and determined the cleavage sites by N-terminal sequencing of the in vitro reaction products (Fig. 5E). The P2' mutant of Gurken (I247G) is cleaved by GlpG at its original site, indicating that GlpG, while broadly conforming to the substrate motif, has slightly different preferences at the P2' position (which is less tightly constrained than the P4 or P1 positions even for AarA). In addition, YqgP recognises a secondary, normally hidden, recognition site in LacYTM2 that is perfectly stereotypical (Fig. 5E). Although this is the major cleavage site in the P1 mutant of LacYTM2 (S43F), minor cleavage at the mutated original site still occurs. More surprisingly, the P1 mutant of Gurken (A245F) is cleaved at the mutated recognition site with moderate efficiency by YqgP, even though phenylalanine in the P1 position has proven highly deleterious in other cases. Overall, these results indicate that the subsite preferences of GlpG and YqgP for the critical motif positions may be slightly different from those of AarA, or that their preferences may be influenced by the sequence context of the recognition motif.

Putative recognition motifs in published rhomboid substrates
Our analysis of rhomboid specificity suggests that those rhomboids that can cleave the model substrates Spitz, Gurken, TatA and LacYTM2 all specifically recognise a sequence motif that we defined by mutagenic analysis of TatA. This conclusion is consistent with published data on other rhomboids and their substrates. For example, mouse RHBDL2 (which can cleave Spitz) was shown to cleave mouse thrombomodulin in a cell-based assay and the cleavage site position has been mapped approximately between proline 508 and leucine 528 (Lohi et al., 2004). Consistent with this, we find at least one stereotypical recognition motif in this region (putative P1 residues will be displayed in bold red, while P4 and P2' in bold blue, and TMD will be underlined): PP 508 AVGLVHSGLLIGISIASLCL 528 VVALLALLCHLRKKQ.
In addition, human RHBDL2 cleaves human EphrinB3 roughly between proline 226 and cysteine 250 (Pascall and Brown, 2004) and there is at least one stronger and one weaker putative recognition motif in this range: SMP 226 AVAGAAGGLALLLLGVAGAGGAMC 250 WRRR.
It remains to be tested whether and in which of these predicted recognition motifs RHBDL2 cleaves.
It seems that even some rhomboids with apparently distinct substrate specificity might require a stereotypical TatA-like recognition motif. The recently identified Entamoeba histolytica rhomboid 1 (EhROM1) was shown to cleave an E. histolytica surface lectin EHI_044650 in a cell-based overexpression assay (Baxt et al., 2008). Strikingly, cleavage of EHI_044650 is blocked by a glycine to valine mutation that occurs in a putative P1 position within a potential stereotypical recognition motif: QDVDNTAAIAAGTTVAVVVAVIVVVMVIIAIGIKQTV. This is intriguing since EhROM1 does not cleave Spitz and it was suggested to have different substrate specificity from D. melanogaster Rhomboid-1 or bacterial rhomboids (Baxt et al., 2008), but it seems that it might recognise a variant of the TatA-like recognition motif.
Finally, we can reconcile our data on Spitz cleavage sites with those published by others. Baker et al., who used Spitz TMD transplanted into C100-Flag (Urban and Wolfe, 2005), originally a gamma-secretase substrate (Li et al., 2000), detected by mass spectrometry two in vitro cleavages by GlpG between (in Spitz numbering) A141-S142 and G143-A144 (Baker et al., 2007). In contrast, here we report cleavage of wild type Spitz sequence by three different bacterial rhomboids and by Drosophila Rhomboid-1 primarily between amino acids A138-S139. However, we also identify a secondary recognition motif with a cleavage site between G143-A144, which is employed, albeit less efficiently, when the primary motif is mutated (Fig. 5C). This secondary motif is preserved and indeed cleaved by GlpG in C100Spitz-Flag, presumably because the primary motif had been disrupted by the transplantation of Spitz TMD into the C100-Flag context (C100-Flag in uppercase, Spitz in lowercase with Spitz numbering used, secondary recognition motif highlighted): DVGSNKa 138 siasgavggvviatvivitlvmlKKK. Note that the residue that would correspond to P4 in the primary Spitz motif is a serine (bold italicized) in C100Spitz-Flag, which we found non-permissive in this position. In summary, disruption of the primary recognition motif in C100Spitz-Flag leads to cleavages at normally less favoured sites. This is similar to what we have observed in the i7 linker insertion mutant of TatA: during overdigestion of i7 by AarA in vitro, susceptible P1-P1' sites in the linker, lacking stereotypic P4 and P2' residues, can eventually be cleaved, albeit considerably more slowly than the true recognition motif (Fig. 3B, lower graph).
The N-termini of individual species were inferred from their molecular mass, as indicated.

Figure S3. Protein sequences of the predicted substrates and non-substrates
All candidate substrates of AarA that we predicted in the P. stuartii genome were ranked according to their motif quality. The top fifteen candidate substrates and fifteen predicted non-substrates were selected and their TMDs and surrounding juxtamembrane regions were amplified from genomic DNA, in vitro translated and radiolabelled. Sequences of the protein fragments that were thus generated and tested for cleavage by AarA are shown aligned by the C-terminus of their predicted transmembrane domain (underlined). The number of amino acids that precede each tested fragment in the corresponding full-length protein is denoted by a number at the N-terminus of each sequence. The heptapeptide regions corresponding to the regular expression that encompasses the identified recognition motifs are highlighted in bold within each sequence and also listed separately in square brackets with individual motif quality scores indicated. Each protein is further identified by its NCBI accession and GI numbers, and its predicted topology. For example, 'o85-106i' indicates a periplasmic N-terminus, a transmembrane domain spanning residues 85 to 106, and an intracellular C-terminus; 'n10-21c29/30o' signifies a signal peptide with a predicted signal peptidase cleavage site between amino acids 29 and 30. Proteins are listed in the order in which they appear in Fig. 6C; those that were cleaved by AarA are marked with an asterisk.
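The topology strings used in this legend can be read programmatically. The following is a minimal sketch based only on the two examples given above; the function name `parse_topology` and the returned dictionary layout are assumptions made for illustration, and the sketch does not attempt to interpret the `n10-21` part of the signal peptide annotation.

```python
import re

def parse_topology(topology):
    """Parse a topology string such as 'o85-106i' or 'n10-21c29/30o'.

    Returns a dict with:
      signal_cleavage: (last, first) residues flanking the predicted signal
                       peptidase cleavage site, or None if no signal peptide
      tmds:            list of (side_before, start, end); side_before is
                       'o' (periplasmic) or 'i' (intracellular)
      c_terminus:      side of the C-terminus, 'o' or 'i'
    """
    sp = re.match(r"n(\d+)-(\d+)c(\d+)/(\d+)", topology)
    cleavage = (int(sp.group(3)), int(sp.group(4))) if sp else None
    rest = topology[sp.end():] if sp else topology
    tmds = [(side, int(a), int(b))
            for side, a, b in re.findall(r"([io])(\d+)-(\d+)", rest)]
    return {
        "signal_cleavage": cleavage,
        "tmds": tmds,
        "c_terminus": rest[-1] if rest else None,
    }

type_three = parse_topology("o85-106i")
with_signal = parse_topology("n10-21c29/30o")
```

For 'o85-106i' the sketch reports one transmembrane domain spanning residues 85 to 106, preceded by a periplasmic segment and followed by an intracellular C-terminus, matching the description in the legend.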