Topology-Driven Discovery of Transmembrane Protein S-Palmitoylation

Protein S-palmitoylation is a reversible lipophilic posttranslational modification that regulates a diverse array of signaling pathways. Within transmembrane proteins (TMPs), S-palmitoylation is implicated in conditions ranging from inflammatory disorders to respiratory viral infections. Many small-scale experiments have observed S-palmitoylation at juxtamembrane Cys residues. However, most large-scale S-palmitoyl discovery efforts rely on trypsin-based proteomics, in which hydrophobic juxtamembrane regions are likely underrepresented. Machine learning, by virtue of its freedom from experimental constraints, is particularly well suited to address this discovery gap surrounding TMP S-palmitoylation. Utilizing a UniProt-derived feature set, a gradient boosted machine learning tool (TopoPalmTree) was constructed and applied to a holdout dataset of viral S-palmitoylated proteins. Upon application to the mouse TMP proteome, 1591 putative S-palmitoyl sites (i.e., not listed in SwissPalm or UniProt) were identified. Two lung-expressed S-palmitoyl candidates (synaptobrevin Vamp5 and water channel Aquaporin-5) were experimentally assessed. Finally, TopoPalmTree was used for rational design of an S-palmitoyl site on KDEL-Receptor 2. This readily interpretable model aligns the innumerable small-scale experiments observing juxtamembrane S-palmitoylation into a proteomic tool for TMP S-palmitoyl discovery and design, thus facilitating future investigations of this important modification.


SUPPLEMENTARY METHODS
Synthesis of pyridyl disulfide sepharose for Acyl-RAC. All reactions are conducted in PDS buffer (100 mM HEPES + 2 mM EDTA, pH 7.8) unless otherwise indicated. A 3 mL slurry of NHS-Sepharose was washed three times with PDS buffer in a 10 mL filter column, followed by addition of 100 mM cystamine in PDS buffer to fill the 10 mL column.

Figure S1. Summary statistics of the training dataset. (A) Types of TMPs represented in the training data. Total numbers of Cys sites are grouped based on whether they reside on a multi-pass, Type I/III, or Type II/IV TMP (based on orientation of the N-terminus relative to the lipid bilayer). (B) Distribution of S-palmitoyl (magenta) and non-S-palmitoyl (gray) Cys sites based on topological location. Shown above each column is the percentage of Cys sites that are S-palmitoylated within each type.

Figure S2. Hydrophobicity and hydrophobicity gradients as examples of the TopoPalmTree feature set. Hydrophobicity values are derived from the window sequence (5 amino acids flanking on each side of the given Cys residue). Total hydrophobicity is calculated for the entire window sequence whereas the hydrophobicity gradient is the difference in hydrophobicity (C-terminal minus N-terminal window). For total hydrophobicity, positive values indicate increased overall hydrophobicity surrounding the Cys residue. For hydrophobicity gradient, more positive values indicate increasing hydrophobicity from the N- to C-terminal direction across the given Cys residue. Shown are (A) total hydrophobicity and (B) hydrophobicity gradient for transmembrane regions, as well as (C) total hydrophobicity and (D) hydrophobicity gradient for cytoplasmic regions. The black bars represent mean values.
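
The two window features described in Figure S2 can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the caption: the Kyte-Doolittle hydropathy scale stands in for whichever scale was actually used, hydrophobicity is taken as a plain sum over residues, the central Cys is excluded from the total, and the function names are hypothetical. It is not the TopoPalmTree implementation.

```python
# Kyte-Doolittle hydropathy values. ASSUMPTION: the source does not name
# the hydrophobicity scale, so this one is used purely for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

FLANK = 5  # residues on each side of the Cys, as stated in the caption


def flanking_windows(seq, cys_index, flank=FLANK):
    """Return the (N-terminal, C-terminal) windows around a Cys site.

    Windows are truncated when the site lies near a sequence end.
    """
    if seq[cys_index] != "C":
        raise ValueError("cys_index must point at a Cys residue")
    n_window = seq[max(0, cys_index - flank):cys_index]
    c_window = seq[cys_index + 1:cys_index + 1 + flank]
    return n_window, c_window


def hydrophobicity(window):
    """Sum hydropathy over a window; unknown residues contribute 0."""
    return sum(KD.get(residue, 0.0) for residue in window)


def hydrophobicity_features(seq, cys_index):
    """Total hydrophobicity and gradient (C-terminal minus N-terminal)."""
    n_window, c_window = flanking_windows(seq, cys_index)
    h_n = hydrophobicity(n_window)
    h_c = hydrophobicity(c_window)
    # ASSUMPTION: the central Cys itself is left out of the total.
    return {"total": h_n + h_c, "gradient": h_c - h_n}


if __name__ == "__main__":
    # Made-up sequence: charged N-terminal flank, hydrophobic C-terminal flank.
    print(hydrophobicity_features("KKRRSCLLVIA", 5))
```

For this made-up sequence the total is close to zero (about 0.5) while the gradient is strongly positive (about 35.7), which shows how the gradient captures a polar-to-hydrophobic transition across the Cys that the total alone would miss.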

Figure S3. Window Cys score and Asn scores as examples of amino acid-based features within the training dataset. (A) Window sequence scoring system for adjacent Cys residues, giving priority to closer Cys residues. Cys residues immediately flanking the Cys of interest (center) are awarded 4 points. Cys residues in the 2nd flanking positions are given 2 points, and 1 point is given for any Cys within flanking positions 3 through 5 in either direction. (B) The values are then summed to achieve a total score for the N- and C-terminal windows, with mean values shown based on S-palmitoyl status. (C) Asparagine scoring system providing one point for every Asn residue within the N- and C-terminal windows. (D) Mean total Asn scores based on S-palmitoyl status.
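
The scoring rules in Figure S3 can be sketched in Python as follows. The point values (4, 2, 1) and the one-point-per-Asn rule come from the caption; the function name, the feature names (`cys_n`, `cys_c`, `asn_n`, `asn_c`), and the handling of sites near a sequence end are assumptions made for illustration only.

```python
# Points awarded to a flanking Cys by its distance from the Cys of interest:
# position 1 -> 4 points, position 2 -> 2 points, positions 3-5 -> 1 point.
CYS_POINTS = {1: 4, 2: 2, 3: 1, 4: 1, 5: 1}


def window_scores(seq, cys_index):
    """Score the N- and C-terminal windows around one Cys site.

    Returns Cys scores (distance-weighted) and Asn scores (one point per
    Asn) for each window. Feature names are hypothetical; the _n and _c
    suffixes denote the N- and C-terminal windows.
    """
    if seq[cys_index] != "C":
        raise ValueError("cys_index must point at a Cys residue")
    scores = {"cys_n": 0, "cys_c": 0, "asn_n": 0, "asn_c": 0}
    for distance, points in CYS_POINTS.items():
        for side, pos in (("n", cys_index - distance),
                          ("c", cys_index + distance)):
            # ASSUMPTION: positions beyond the sequence ends score nothing.
            if not 0 <= pos < len(seq):
                continue
            residue = seq[pos]
            if residue == "C":
                scores["cys_" + side] += points
            elif residue == "N":
                scores["asn_" + side] += 1
    return scores


if __name__ == "__main__":
    # Made-up sequence; the Cys of interest is at index 5.
    print(window_scores("ANCLCCNGCNA", 5))
```

In this made-up example the N-terminal window holds a Cys at position 1 (4 points) and position 3 (1 point) plus one Asn, and the C-terminal window holds a Cys at position 3 (1 point) plus two Asn residues, giving `cys_n = 5`, `cys_c = 1`, `asn_n = 1`, `asn_c = 2`.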

Figure S4. Sequence logos based on S-palmitoyl status within the training dataset. Sequence logos were generated with package ggseqlogo. For each logo, the central empty space represents the Cys site of interest with N- and C-terminal flanking residue probabilities on the left and right, respectively. The y-axes represent probability with each position summing to 1. Logos are separated by column for S-palmitoyl status and row for topological locations.
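
The per-position probabilities that a probability-scaled sequence logo displays can be sketched as below. The logos themselves were generated with the R package ggseqlogo; this Python function only illustrates the underlying calculation, and its name and input format (equal-length flanking windows with the central Cys removed) are assumptions.

```python
from collections import Counter


def position_probabilities(windows):
    """Return, for each window position, a dict of residue -> probability.

    `windows` is a list of equal-length strings (for example, 5 N-terminal
    plus 5 C-terminal flanking residues, central Cys omitted). The
    probabilities at each position sum to 1.
    """
    if not windows:
        raise ValueError("at least one window is required")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        total = sum(counts.values())
        matrix.append({res: n / total for res, n in counts.items()})
    return matrix


if __name__ == "__main__":
    # Two made-up flanking windows of 10 residues each.
    for pos, column in enumerate(position_probabilities(
            ["KKRRSLLVIA", "KRRKSLLVLA"])):
        print(pos, column)
```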

Figure S5. Feature importance plot for TopoPalmTree. The varImp function from the caret package was used to examine feature importance for the top 20 features. Importance is determined by reduction in the Gini index. The highest scoring feature indicates whether the Cys site is cytoplasmic. Features ending with _n and _c represent the N- and C-terminal windows, respectively, for the indicated feature.

Resource Table. Columns: REAGENT or RESOURCE; SOURCE; IDENTIFIER. Section: Antibodies (dilution)
The tube is rotated for 4 hours at room temperature, washed at least 4 times with PDS buffer, washed 4 times with PDS buffer containing 20 mM DTT, then washed another 4 times with PDS buffer without DTT. The slurry is washed 4 times with MeOH, then 100 mM 2-pyridyl disulfide (also known as Aldrithiol-2) in MeOH is added to fill the column and the reaction is rotated at room temperature for 15 minutes. The slurry is washed twice with 100 mM 2-pyridyl disulfide in MeOH then

Reagents for synthesis of thiopropyl sepharose (TPS)
Sequences are all for murine genes. Fw = forward. Rv = reverse. Tm = melting temperature. Ta = annealing temperature used for PCR. For all reactions, Q5 polymerase was used except for Vamp5 cloning into pCMV-EGFP, which was performed with DreamTaq.