A Folding Pathway-Dependent Score to Recognize Membrane Proteins

While various approaches exist to study protein localization, it is still a challenge to predict where proteins localize. Here, we consider a mechanistic viewpoint for membrane localization. Taking into account the steps for the folding pathway of α-helical membrane proteins and relating biophysical parameters to each of these steps, we create a score capable of predicting the propensity for membrane localization and call it FP3mem. This score is driven from the principal component analysis (PCA) of the biophysical parameters related to membrane localization. FP3mem allows us to rationalize the colocalization of a number of channel proteins with the Cav1.2 channel by their fewer propensities for membrane localization.


Introduction
Calcium influx plays a significant role in controlling a variety of cellular functions, and is mainly carried out by voltage-gated Ca 2+ channels [1,2]. Voltage-gated L-type Ca 2+ channels (LTCCs) are involved in the regulation of muscle contraction, hormone secretion, neural excitability, gene expression and neurotransmitter release. LTCC channels consist of four isoforms: Ca v 1.1, Ca v 1.2, Ca v 1.3 and Ca v 1.4, of which Ca v 1.2 and Ca v 1.3 are more distributed and localize in diverse tissues [3][4][5][6][7][8][9][10][11]. Ca v 1.2 makes up at least 75-80% of the LTCCs of the brain [12][13][14][15]. Many types of channels and receptors correlate functionally and spatially with Ca v channels.
Interestingly, the SK channels are specifically coupled to and activated by the Ca 2+ channels, including LTCCs [16,34]. During the depolarization periods, the LTCC channels mediate the Ca 2+ influx. The subsequent binding of calcium to calmodulin leads to the conformational change and opening of the SK channels that causes the efflux of potassium ions. Thus, a close physical and functional relationship exists between these two types of channels. Lu et al. were the first who indicated the coupling of LTCC and SK channels via cytoskeleton proteins [35]. They demonstrated that the SK2 and Ca v 1.2 or Ca v 1.3 channels are linked via an important component of the actin cytoskeleton, a-actinin2.
Existing methods for the scaling of membrane localization propensity use algorithms such as hidden Markov models (HMMs) and supported vector machines (SVMs) to recognize protein sequences that have the potential to sub-localize within the membrane [43]. However, these methods do not consider the causal folding pathway involved in recognition. We introduce the Folding Pathway-based Protein Propensity for membrane (FP 3 mem) score that is tightly associated with the tendency of proteins for being a-helical plasma membrane proteins. We use this score for interpreting the colocalization of the Ca v 1.2 channel with the rat SK (rSK) channels, and with the AMPAR and NMDAR receptors in Eubacteria and Archea taxa and in vertebrate classes including Fishes, Amphibia, Aves and Mammalia. Our data characterized the Ca v 1.2 as having a high propensity for localization within the plasma membrane together with other willing channels, which supports the hypothesis that the Ca v 1.2 is an anchor for the membrane proteins in its close proximity.
We calculate the thermodynamic, biophysical, and structural parameters DCp (change in specific heat), DCp(hyd) (change in hydration specific heat), DG(hyd) (change in Gibbs energy of hydration), DG(oct) (change in free energy of transfer from water to octanol), DG(wif) (change in free energy of transfer from water to POPC interface), DDG(a-helix), GG4Br, DH(hyd) (change in enthalpy of hydration) and kProt for the sequences obtained from the BLAST. We consider DCp, DG(hyd) and DH(hyd) as parameters characterizing protein properties in the water phase. DG(oct), DG(wif) and DDG(a-helix) have a role in the transition of proteins from the aqueous phase to the lipid phase. Finally, DCp(hyd), GG4Br, and kProt explain the behavior of proteins in the lipid phase.
We perform this calculation using the Hamid, Ali akbar, Maryam Data Analyser Machine (HAMDAM) software (freely available upon request). We calculate the hydration (hyd) parameters DCp(hyd), DG(hyd) and DH(hyd) of each sequence using the following equations [45][46][47]: Where DX refers to the change in X from the native state to the unfolded state, DF(hyd) represents each of the three parameters, j is the residue position, ASA stands for the accessible surface area, and n represents the total number of residues in each sequence. We obtain DCp from the following equation [48]: In order to calculate the DG(oct) [49], DG(wif) [50,51], DDG(ahelix) [52], GG4Br [53], and kProt [54] (all indicated with a ''W'' after the parameter name in figures), we employ the Sliding Window Recognizer (SWR) procedure [55]. This procedure reads the protein sequence within a window of a given number of residues and computes the parameters for the amino acids within that window, then slides forward one residue and repeats the process. We choose a window of 10 residues and calculate the parameter average for each window. Then we report the average of averages over all windows. In the case of the DDG(a-helix) parameter, although proline residues are considered helix breakers, their behavior differs in membrane proteins [56], which led us to consider this amino acid as a helix maker within this subset of proteins. For calculation of the GG4Br parameter, the number of GXXXG[I/V] motifs are counted in each window. We perform Anova and PCA analysis using the free software PSPP (http://www.gnu.org/software/pspp).
To produce alkaline phosphatase (APHO)18A3L, APHO16A5L and APHO14A7L sequences, three peptide constructs generate with the 18A3L, 16A5L and 14A7L amino acids compositions. In order to consider different sequences for each of the three amino acid compositions, we generate 2000 random sequences for each peptide and insert them to the corresponding site in alkaline phosphatase.

Results and Discussion
Other studies have previously pointed out the association of Ca v with SK channels in membranes [16,34,35]. Lu et al. demonstrated the connection of these two channels via the a-actinin protein [35]. On the other hand, the ion conductance through the membrane and the localization within the membrane of the SK channel was disrupted in Ca v null mutant mouse. The authors suggested that the Ca v channel could act as an anchor for the SK channel at the membrane. For integral membrane proteins containing transmembrane region(s), an essential requirement for functionality is localization within the membrane. Our goal is to quantify the tendency of Ca v and SK channels for localizing within the membrane.
White and Wimley mentioned that the folding process of membrane proteins could be divided into four steps, including partitioning, insertion, folding, and association [49]. We consider several biophysical parameters for each of these steps. Two parameters (DG(oct) and DG(wif)) were previously provided for two of these steps [49,51]. We consider other parameters related to each step in our computation ( Figure 1). As a control, we compute these parameters for the non-membrane protein alkaline phosphatase (APHO). We also calculate parameters for three varieties of alkaline phosphatase: APHO18A3L, APHO16A5L and APHO14A7L. These alkaline phosphatases localize in the E-coli inner membrane with the aid of three inserted peptides. Each peptide is composed of only two types of amino acids, Alanine (A) and Leucine (L). The experimentally determined tendency of alkaline phosphatase for localizing in the membrane is in direct correlation with the number of leucine residues in the inserted peptides [57,58].

The partitioning step
The ''partitioning'' step, the partitioning of proteins between lipid and water phases in the lipid-water interface, can be described by the DCp, DH(hyd), DG(hyd), and DG(wif) parameters. A membrane protein should not have a stable fold before insertion into the membrane. This property is specified by the protein heat capacity DCp. A more positive DCp indicates lower stability, and thus a lower propensity to be in the folded state in the water phase [55]. The DCp of the alkaline phosphatases (the reference proteins, ''Ref'') that contain inserted peptide is more positive than the DCp of the alkaline phosphatase (Figure 2A), representing a difference between the primitive forms of life (Archea and Eubacteria) and vertebrates. In vertebrates, the DCp is lower and thus the propensity for folding in the water phase is higher than in bacteria. Therefore, for prokaryotes, the partitioning parameter is more favorable for membrane localization when compared to vertebrates. Predictably, the a-actinin homologous proteins have fewer propensities for unfolding in water with respect to channels.
Another parameter involved in the partitioning step is the hydration enthalpy change DH(hyd). This parameter is a scale of the hydrophilic interaction of the unfolded state. A more negative DH(hyd) indicates a higher tendency of the protein to be in the unfolded state in water [45,46]. In the case of the reference proteins, because the three types of peptides are composed solely of leucine and alanine residues, the amount of hydrophilic interactions is reduced ( Figure 2B). The a-actinin homologous proteins have a dramatically higher tendency for unfolding in Mammalia than other organisms. For the rSK channel homologous proteins, the tendency of unfolding in water is higher in vertebrates than in bacteria. This tendency indicates that according to this partitioning parameter, in comparison to the prokaryotic protein, the vertebrate protein is far from folded state in water. This favors folding of the vertebrate protein in non-  aqueous environment. It may be presumed that there is a discrepancy between the DCpand DH(hyd)-derived partitioning parameters in each group of proteins(Panels 2-A and 2-B). However, as the DCp is generally determined by nonpolar residues and the DH(hyd) parameter by polar residues, the difference between the ratio of polar to nonpolar residues is the root of this apparent dissimilarity. Each of these parameters is weighted later.
An additional parameter affecting the partitioning step can be the hydration free energy change DG(hyd). A membrane protein must dehydrate before entering the plasma membrane. Since the DG(hyd) is a scale of the propensity for hydration, the more positive this parameter, the easier is the dehydration process [45,59]. The propensity of rSK channel homologous proteins for dehydration is less in vertebrates than in bacteria ( Figure 2C), indicating that the partitioning affected by this parameter happens Figure 3. The insertion step parameters. The changes in parameters describing the insertion stage are shown for bacterial and vertebrate proteins. The vertical axes are the mean value of the corresponding parameter average for the window that slides along the protein sequence. A) The variation of DG(oct) is illustrated, which is a whole residue hydrophobicity scale and a sign of the protein membrane propensity. B) The alpha helix propensity is a measure of the tendency to form alpha helix, which is characterized by DDG(a-helix more difficult in vertebrate classes than in Archea and Eubacteria. Among all studied proteins, mammalian a-actinin homologous proteins have the least propensity for dehydration ( Figure 2C).
The last parameter that we incorporate, influencing the partitioning of proteins between the water and lipid phases is the DG(wif), was introduced by White et al. [51]. This parameter represents the free energy change for the transfer of the protein in the unfolded state from the bulk water to the lipid-water interface. Nonpolar interactions with water and electrostatic interactions with the lipid head groups are the important interactions taking part in this process [49,60]. The more negative the DG(wif), the higher is the affinity of the unfolded state to enter the interface [61]. This parameter is small for the rSK homologous proteins in the taxa and classes other than Mammalia ( Figure 2D). Thus, the propensity for entering the lipid-water interface is low in mammalian rSK channels but not in the mammalian Ca v channels. However, when we consider all four parameters involved in the partitioning step, we cannot simply conclude which of the two types of channels is more efficient in this process.

The insertion step
The second step in the folding of membrane proteins is the ''insertion'' of the protein into the membrane. In this step, a protein enters the lipid phase from the lipid-water interface. Two parameters are associated with this step: the DG(oct) and the DDG(a-helix) [49,52,62]. In their study, the White group utilized octanol-saturated water as the lipid-like phase and introduced the DG(oct) parameter. More negative values of the DG(oct) correlate with a higher tendency of the protein for transferring to the lipid phase. For channels, this insertion parameter is more favorable in prokaryotes than in vertebrates ( Figure 3A). We infer that the transition from the interface to the lipid phase acts as an obstacle for vertebrate channels in the process of membrane localization.
In the interface, where the important step of insertion into the membrane takes place, formation of disordered structures is more probable than formation of helical structures [60,63]. Therefore, protein structures are more likely to become a-helical after insertion into the membrane because of the low membrane dielectric constant [49,64,65]. Because the hydrophobic core of the membrane has a high affinity for exposed hydrophobic groups of proteins [66], if an ahelix is stable in the water phase it would not form hydrophobic interactions with the membrane core. The DDG(a-helix) specifies the propensity for the formation of a stable a-helix structure in the water phase. More positive values of this parameter correlate with a lower propensity for a-helix formation in water, and thus are more favorable for the insertion of the protein into the membrane. For rSK channel homologous proteins, this parameter decreases in Mammalia ( Figure 3B) and disturbs the insertion step.
These parameters do not have the same effects in the localization of each evolutionary class of protein within the plasma membrane (Figures 2 and 3). While some parameters support the membrane localization of the homologous sequences of one protein, others impede this process. Therefore, all parameters should be weighted accordingly when calculating the membrane localization score.

Folding and association steps
Based on the four-step model, a protein obtains its final folded state in the membrane and, if necessary, gains its final function by Figure 4. The folding and association stages parameters. The changes in parameters describing the folding and association stages are shown for bacteria and vertebrates proteins. In the A and C panels, the vertical axes are the mean value of the corresponding parameter average for the window that slides along the protein sequence. A) The kProt changes are shown in this panel. kProt is a factor to elucidate a-helix membrane protein topology. B) The specific heat capacity of hydration for the channels, references and actinin is shown in this board. DCp(hyd) points out the hydrophobic patches. C) The occurrence number of the GG4Br motif in windows is defined as a scale for helix packing. Ref  association with other subunits. An important feature of alpha helical membrane protein folding is protein topology, which can be either single-span or multi-span, represented by the parameter kProt [54]. More negative kProt values correlate with higher protein tendency for becoming multi-span. Prokaryotes show lower tendency than vertebrates for multi-span topology ( Figure 4A). In the case of rSK channel homologous proteins, the tendency for becoming multi-span is less in mammals than in other vertebrates. The definition of kProt parameter is founded on the properties of membrane proteins. Therefore, this parameter does not provide information about the topology of non-membrane proteins including alkaline phosphatase and a-actinin homologous proteins ( Figure 4A).
Due to the low dielectric constant of the membrane, hydrogen bond rich structures such as a-helices are more probable in membrane proteins. In order to attain more stability and generate a specific function, the a-helices pack together in a manner such that a stable helix can stabilize an adjacent unstable helix [65,67,68]. The packing of a-helices is caused by two factors: superficial hydrophobic patches working as glue, and spatial fitting of the a-helices similar to lock and key model. The DCp(hyd) parameter represents superficial hydrophobic patches. A more positive DCp(hyd) indicates a more exposed hydrophobic patch [45]. The combined surface area of superficial hydrophobic patches, which is a scale for the association of membrane a-helices, is high for the rSK channel homologous proteins in comparison to the a-actinin homologous proteins ( Figure 4B). Furthermore, the surface area of superficial hydrophobic patches, and therefore the tendency for association indicated by this factor, decreases from prokaryote to vertebrate organisms, especially in the rSK channel homologous proteins ( Figure 4B). The second packing factor, the spatial fitting of membrane a-helices, can be quantified by measuring the frequency of the GXXXG[I/V] motif in the proteins using the GG4Br parameter [53,69]. For rSK channel homologous proteins, the frequency of the GXXXG[I/V] motif is much higher in Mammalia as compared to other classes ( Figure 4C). We conclude that in the mammalian rSK channels, the spatial fitting of a-helices plays a more significant role in packing than the superficial hydrophobic patches.

The FP 3 mem score
Not all parameters involved in protein localization within the plasma membrane change in the same functional direction over evolution (Figures 2, 3 and 4). Hence, we create a parameter that in addition to including all significant parameters previously discussed, is able to determine the tendency of localization of proteins within the membrane. This parameter can also be used as a scale for the comparison of membrane localization between proteins of interest. In order to fulfill this purpose, all parameters that participate in membrane localization should be weighted according to their contributions. We use principal component analysis (PCA) to obtain proper weights for each parameter [70]. We consider four principal components (PC) and utilize the proposed correlation coefficients for each parameter in each PC for constructing a factor representing the tendency of proteins for membrane localization. We name this factor the Folding Pathway-based Protein Propensity for mem- Specificity 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 PPV 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 Accuracy 0.9 0.8 0.9 0.8 0.9 0.9 0.9 0.9 0.9 0.9  To evaluate the FP 3 mem efficiency, we calculated the statistical parameters for the eSLDB [79]. This database contains the whole proteome of many eukaryotes. FP 3 mem identifies the transmembrane proteins of human, yeast and nematode from the database with the indicated efficiencies. doi:10.1371/journal.pone.0016778.t003 brane (FP 3 mem). The HAMDAM software calculates the FP 3 mem based on the following formula:

MCC
In this equation, the set of x i represents the nine parameters (i = 1 to 9) for the rSK and Ca v 1.2 channel homologous proteins. Here, a, b, c, and d correspond to the correlation coefficients of each parameter in PC1, PC2, PC3 and PC4 respectively. More details are supplied in Table S2. We calculate the FP 3 mem for all sequences in the TMA dataset [71], which contains 273 transmembrane a-helix-containing sequences from PDB structures, as well as for the sequences of a set of human soluble proteins, documented as cell fraction, obtained from Uniprot. These sets serve as references for membrane proteins and non-membrane proteins, respectively. In order to define a cutoff for FP 3 mem values that discriminates membrane proteins from non-membrane proteins, we evaluate the following parameters [72,73]

F1 score~2
Sensitivity|Accuracy SensitivityzAccuracy The FP 3 mem efficiency is compared with other methods on the same datasets that are common for predicting protein membrane localization. Sen and Spe are the abbreviation of sensitivity and specificity respectively. doi:10.1371/journal.pone.0016778.t004 Based on these results and the corresponding ROC curve ( Figure  S1), we set our cut-off at FP 3 mem = 231, and consider the proteins with FP 3 mems values higher than 231 as membrane proteins.
In order to evaluate the FP 3 mem capability for discriminating membrane proteins, we calculate the abovementioned statistical parameters for several datasets (Table S3). One of these datasets, the Moller dataset, includes three levels of trust to SWISSPROT transmembrane annotation: A, B, and C [74] ( Table 1). The accuracy of FP 3 mem is best for the B dataset, in which membrane localization of dataset members has been proven by experimental evidence. If we omit mitochondrial membrane proteins from the A dataset, in which protein structures have been determined by x-ray crystallography, the accuracy for the A dataset increases (Table 1). We conclude that in mitochondria, the process of membrane localization may not be similar to the four-step membrane protein folding process (Figure 1).
We take another dataset from the MPtopo database (Table 1). Based on whether the protein helix bundles are determined from three dimensional structure or by biochemical experimental methods, Jayasinghe et al. have divided the MPtopo database into 3D and 1D datasets respectively [75]. When we omit mitochondrial membrane proteins from the 3D dataset, the accuracy increases. The lipid context of mitochondrial membrane proteins is different from that of proteins in the plasma membrane [76]. This difference may cause a different pathway of membrane protein folding.
The last dataset that we consider is that of alpha, which is taken from the July 9, 2010 version of the PDBTM [77]. The non-redundant alpha dataset consists of all a-helical transmembrane proteins in the PDB. The calculated statistical factors are also near to one for this dataset, which confirms the accuracy of the FP 3 mem score in distinguishing a-helical transmembrane proteins.
We calculate FP 3 mem for the membrane proteins used in training the PSORTb 3.0 predictor algorithm [78] (Table 2). FP 3 mem has a high efficiency in recognizing prokaryotic membrane proteins. The eSLDB database annotates the eukaryotic proteomes of various organisms based on their cellular localizations [79]. We compute the FP 3 mem score for a group of human, nematode (Caenorhabditis elegans), and yeast (Sacharomyces cerevisiae) transmembrane proteins that have been experimentally annotated (Table 3). Statistical parameters again confirm the efficiency of FP 3 mem in distinguishing membrane proteins.
There are several methods for determining the localization of proteins in different regions of the cell, including the plasma membrane. Some commonly used methods include CELLO, which utilizes only the primary structure of proteins [80], pTARGET, which utilizes the amino acid and domain compositions [81], ProteomeAnalyst, which uses the homology of the sequences [82], WoLFPSORT, which makes use of the amino acid composition and the sequence homology [83], and MultiLoc, which employs signal sequences, motifs, and amino acid compositions [84]. Teasdale compared the capacity of these methods for determining the localization of proteins of two datasets, LOC2145 and SP3763 [85]. We calculate the sensitivities and specificities of FP 3 mem and other methods in distinguishing the membrane proteins of these two datasets (Table 4). FP 3 mem has the highest specificities as well as reasonable sensitivities. Binary predictors encounter a common problem of disadvantaged specificities despite good sensitivities [86]. However, FP 3 mem possesses high specificities. Our method, which is based on biophysical parameters of the membrane protein folding pathway, in this regard outperforms the existing methods.
In order to enter membrane, proteins pass different stages. The superiority of FP 3 mem with respect to other methods comes from the fact that we relate the folding pathway stages of membrane proteins to the representative physical parameters and do predictions with a mechanistical viewpoint which was absent in previous sequence-based methods.
The abovementioned databases and methods provide information about whether a protein localizes within the plasma membrane, but cannot resolve the membrane association (the propensities for membrane localization) of two transmembrane proteins. A probable reason for the co-localization of physically interacting membrane proteins is that a protein with a high membrane association can compensate the low membrane association of its partner. Hence, we examine the capacity of FP 3 mem in quantifying the membrane association of proteins using the alkaline phosphatase variants whose membrane associations were determined empirically [57,58]. The relationship between the FP 3 mem and membrane association of these variants is direct and non-linear ( Figure 6). Therefore, FP 3 mem is not only capable of recognizing a-helical transmembrane proteins with a high efficiency, but also can be a scale for membrane association propensity.

The membrane proteins co-localized with Ca v channel
We use the FP 3 mem score to study the rSKs-a-actinin-Ca v 1.2 protein system in various organisms (Figure 7). The FP 3 mems of aactinin homologous proteins are similar to non-membrane proteins in all evolutionary branches. FP 3 mem values are smaller than cut-off and equivalent to zero membrane association in all branches (Figures 5, 6). We consider rSK channel homologous proteins as membrane proteins only in prokaryotes. However, for Ca v 1.2 channel homologous proteins, in addition to prokaryotes, Fishes and to a less degree Mammalia show higher propensities toward localization in the membrane. This difference in the membrane association of mammalian rSK and Ca v 1.2 channels may be the reason for the observed fading of SK2 channel presence in the plasma membrane in the absence of Ca v [35] ( Figure S2). Because of their high FP 3 mem, we hypothesize that the Ca v 1.2 channels assist in the membrane localization of SK channels in Mammalia and Fishes.
Supporting data exist for the presence of Ca v 1.2 channel in complexes containing AMPAR or NMDAR glutamate receptors [38]. Contrary to rSK channels, none of these receptors depends directly upon the Ca 2+ influx for activation. Hence, we hypothesize that the reason they accompany the Ca v channel is to localize within the membrane, and that this membrane localization does not occur in the absence of Ca v . In order to test this hypothesis, we calculate FP 3 mem for the homologous proteins of these receptors in prokaryotes and vertebrates ( Figure 8). We observe that the FP 3 mem of Ca v 1.2 homologous proteins is higher than the FP 3 mem of the AMPAR and NMDAR homologous proteins, especially in Fishes and Mammalia. This observation supports the proposed hypothesis that these receptors couple the Ca v 1.2 channel with the aim of localizing within the membrane.

Conclusion
By creating a score (FP 3 mem) encompassing the biophysical parameters involved in the folding of a-helical transmembrane proteins, we provide a scale for measuring the propensity of protein sequences for localization within the plasma membrane. This parameter distinguishes membrane proteins from non-membrane proteins in various datasets, and powerfully competes with other methods. Furthermore, FP 3 mem quantifies a protein's propensity for becoming an a-helical transmembrane protein. We suggest the difference in this propensity as an underlying reason for the colocalization of SK channels, as well as AMDAR and NMDAR receptors, with the Ca v 1.2 calcium channel. Figure S1 The ROC curve. The sensitivity is plotted against 1specificity. The bold black filled circle is the cutoff point.