Assessment of Prokaryotic Signal Peptides for Secretion of Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) in E. coli: An in silico Approach

Extracellular secretion of recombinant proteins in E. coli has several advantages, including proper folding and biological activity of the protein, absence of inclusion bodies, and simpler purification. However, a suitable signal peptide for secretion of a recombinant protein is selected mainly by trial and error, which is time-consuming and costly. The aim of this study was the in silico evaluation of common signal peptides in order to select appropriate signal peptides for secretion of TRAIL in E. coli. The SignalP server was used to predict whether each sequence fused to TRAIL can act as a signal peptide and to identify its cleavage site for signal peptidase. The physicochemical properties of the signal peptides and the solubility of TRAIL fused to each signal peptide were calculated with ProtParam and SOLpro, respectively. Of the 26 signal peptides studied, 18 were predicted to function as signal peptides when fused to TRAIL. Among these, SfmC, OmpC, DsbA, and PhoA were good candidates for secretion of TRAIL in E. coli.

Tumor necrosis factor (TNF)-related apoptosis-inducing ligand (TRAIL) is a type II membrane protein belonging to the TNF superfamily. It has 243 amino acids and becomes soluble after enzymatic digestion 1. Its potential to induce apoptosis exclusively in cancer cells without affecting healthy cells has made this ligand a suitable option in clinical studies against various cancers 2.
Although different hosts have been used, such as the yeast Pichia pastoris 3, Sf9 insect cells 4, and CHO cells 5, E. coli is the best expression system for producing TRAIL, because the protein does not undergo post-translational modifications such as glycosylation and has no disulfide bonds. Compared to other expression systems, E. coli offers a number of advantages including safety, simplicity, low cost, and well-characterized genetics and biochemistry 6. However, despite these benefits, expression of proteins in the cytoplasm is associated with problems such as improper folding, formation of inclusion bodies, difficult purification of the desired protein, and proteolytic degradation 7. These problems can be overcome by transferring the expressed protein to the periplasmic space with a suitable signal peptide. Secretory expression of a protein in E. coli has several advantages, including correct folding and proper biological activity owing to the presence of different chaperones in the periplasmic space, fewer and simpler purification steps, and reduced proteolytic degradation of the protein of interest 8,9.
A signal peptide is a sequence of 5-30 amino acids at the N-terminus of secretory proteins that directs extracellular or intra-compartment transfer of the protein through interaction with secretory factors in biological membranes (10). Despite differences between prokaryotic and eukaryotic signal peptides, all signal peptides share a common structure: a positively charged region at the N-terminus (n-region), a central region of hydrophobic amino acids (h-region), and a region of neutral and polar amino acids at the C-terminus (c-region) 11. The n- and h-regions appear to interact with the negatively charged and non-polar regions of membrane phospholipids, respectively. The c-region contains a recognition sequence for signal peptidase, which cleaves the signal peptide from the N-terminus of the secretory protein after its transfer across the membrane 12.
Selection of a suitable signal peptide is an important prerequisite for efficient secretion of recombinant proteins. However, no definite rules exist so far for selecting a signal peptide that leads to efficient secretion of a recombinant protein into the extracellular space. Therefore, signal peptides for secretion of different proteins in E. coli are selected mostly by trial and error 9. OmpA, PhoA, and PelB are among the signal peptides most commonly used for secretion of proteins in E. coli 13. Because trial and error is time-consuming and costly, alternative methods are needed. One appropriate approach is to use validated servers for in silico analysis of the physicochemical properties of different signal peptides fused to a target protein 14. The present study aimed to evaluate and compare important features of known signal peptides in order to find theoretically appropriate signal peptides for production of secretory TRAIL in E. coli.

METHODOLOGY
First, the amino acid sequences of 26 known signal peptides commonly used for protein secretion in E. coli were extracted from the ExPASy database (www.expasy.com) (Table 1). Each signal peptide sequence was then added to the N-terminus of the TRAIL amino acid sequence, and characteristics such as the signal peptide cleavage site, the physicochemical properties of the signal peptide, and the solubility of the protein were evaluated. Finally, the results of the signal peptide analyses were compared and the most appropriate signal peptides for secretory production of TRAIL in E. coli were selected.
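The fusion step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: the function names `build_fusions` and `to_fasta` are hypothetical, the two signal peptide sequences are the commonly reported OmpA and PelB sequences and should be checked against Table 1, and the target sequence is a placeholder rather than the real TRAIL sequence.

```python
from typing import Dict


def build_fusions(signal_peptides: Dict[str, str], target: str) -> Dict[str, str]:
    """Prepend each signal peptide to the N-terminus of the target protein."""
    target = target.strip().upper()
    return {
        name: sp.strip().upper() + target
        for name, sp in signal_peptides.items()
    }


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format named sequences as FASTA text for submission to a web server."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Assumption: commonly reported sequences; verify against Table 1 before use.
signal_peptides = {
    "OmpA": "MKKTAIAIAVALAGFATVAQA",
    "PelB": "MKYLLPTAAAGLLLLAAQPAMA",
}

# Placeholder only: substitute the TRAIL sequence actually being studied.
target_placeholder = "TARGETSEQ"

fusions = build_fusions(signal_peptides, target_placeholder)
print(to_fasta(fusions))
```

The resulting FASTA text is the kind of input that sequence-based servers generally accept, so one fused record per signal peptide can be analyzed in the same way.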

Prediction of signal peptide cleavage site
Among the computational tools available for predicting signal peptides and their cleavage sites by signal peptidase, SignalP is one of the best and most reliable, with 85% accuracy in predicting a potential signal peptide. The SignalP server (http://www.cbs.dtu.dk/services/SignalP/) predicts the presence of a signal peptide and its signal peptidase cleavage site using a neural network method 15.

Analysis of physicochemical properties of signal peptides
Using the ProtParam tool (http://web.expasy.org/protparam/), the physicochemical properties of the signal peptide sequences, including amino acid composition, number of positively and negatively charged residues, molecular weight, pI, aliphatic index, grand average of hydropathy (GRAVY), and instability index, were analyzed in silico 16.
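Two of these parameters, GRAVY and the aliphatic index, are simple enough to reproduce offline. The sketch below is an illustration under stated assumptions, not ProtParam itself: it uses the Kyte-Doolittle hydropathy scale for GRAVY and the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The function names are hypothetical, and the OmpA sequence is the commonly reported one and should be checked against Table 1.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


# Assumption: commonly reported OmpA signal peptide; verify against Table 1.
ompa = "MKKTAIAIAVALAGFATVAQA"
print(f"GRAVY = {gravy(ompa):.3f}, aliphatic index = {aliphatic_index(ompa):.2f}")
```

Higher values of either parameter indicate a more hydrophobic signal peptide, which is how the two parameters are used in this study.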

Protein solubility prediction
Using the SOLpro server (http://scratch.proteomics.ics.uci.edu/), the solubility of the protein upon overexpression in E. coli was predicted; the overall accuracy of this tool is more than 74% 17.

Selection of potential signal peptides
The SignalP server was used to evaluate whether each sequence can function as a signal peptide when fused to the TRAIL sequence. Table 2 shows the SignalP output, including the C, S, Y, S-mean, and D scores, the signal peptidase cleavage site, and the regions of each signal peptide (n, h, c). Potential signal peptides were selected using a discrimination score (D-score) cutoff of 0.5. The signal peptides Bla, gIII, LPP, npr, OmpT, TolB, TorA, and TorT were excluded because their D-scores were lower than 0.5, and further analyses were performed on the remaining 18 signal peptides.
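The D-score filter can be expressed as a small helper. This is a minimal sketch that assumes the D-scores have already been copied from the SignalP output into a dictionary. The function name `split_by_d_score` is hypothetical, and the names and scores in the example are invented placeholders, not values from Table 2.

```python
from typing import Dict, List, Tuple

D_CUTOFF = 0.5  # cutoff used in this study


def split_by_d_score(
    d_scores: Dict[str, float], cutoff: float = D_CUTOFF
) -> Tuple[List[str], List[str]]:
    """Return (kept, excluded) names; a peptide is kept only if D > cutoff."""
    kept = sorted(name for name, d in d_scores.items() if d > cutoff)
    excluded = sorted(name for name, d in d_scores.items() if d <= cutoff)
    return kept, excluded


# Hypothetical placeholder scores, for illustration only.
example_scores = {"SP_A": 0.82, "SP_B": 0.41, "SP_C": 0.67}

kept, excluded = split_by_d_score(example_scores)
print("kept:", kept)
print("excluded:", excluded)
```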

Analysis of physicochemical properties of signal peptides
The physicochemical properties of the signal peptides were evaluated both alone and fused to the TRAIL amino acid sequence using the ProtParam computational tool (Table 3). The positive charge of the n-region was +2 in most signal peptides, +1 in the PelB signal peptide, +3 in the LivK, MalE, and Pac signal peptides, and +4 in the Endoxylanase signal peptide. Signal peptide hydrophobicity was assessed using the GRAVY and aliphatic index parameters. GRAVY is the mean hydropathy index of the amino acids in a signal peptide. As seen in Table 3, the SfmC, OmpC, and DsbA signal peptides had the highest GRAVY and the LivK and Pac signal peptides the lowest. The aliphatic index, which reflects the relative volume occupied by aliphatic side chains, was greatest in the SfmC and OmpC signal peptides and lowest in the LivK and Endoxylanase signal peptides. The stability of TRAIL fused to each signal peptide was evaluated with the instability index, for which values below 40 indicate a stable protein and values above 40 an unstable one. According to the results, TRAIL fused to every signal peptide was predicted to be stable.
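The n-region charge and the stability threshold discussed above can be illustrated with a short Python sketch. It assumes that the charge is obtained by counting Lys and Arg as +1 and Asp and Glu as -1, and that the n-region boundaries come from the SignalP output. The function names are hypothetical, and the example n-region is a placeholder rather than a value from Table 2.

```python
def net_charge(region: str) -> int:
    """Net charge of a peptide region: (Lys + Arg) minus (Asp + Glu)."""
    region = region.upper()
    positive = sum(region.count(aa) for aa in "KR")
    negative = sum(region.count(aa) for aa in "DE")
    return positive - negative


def is_predicted_stable(instability_index: float, threshold: float = 40.0) -> bool:
    """Apply the instability index rule: values below the threshold mean stable."""
    return instability_index < threshold


# Placeholder n-region for illustration; real boundaries come from SignalP.
n_region = "MKK"
print("n-region charge:", net_charge(n_region))
print("stable at 35.2:", is_predicted_stable(35.2))
```

Because charged residues can also occur outside the n-region, passing only the n-region rather than the whole signal peptide keeps the count comparable with the values reported in Table 3.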

Solubility of various expressed proteins
Several servers are available for predicting protein solubility upon overexpression, and SOLpro is one of the best and most reliable of them. The data indicated that overexpression of TRAIL fused to any of the signal peptides is predicted to yield an insoluble protein (Table 3).

DISCUSSION AND CONCLUSION
Targeted therapy has become an active area of research in cancer treatment 18. TRAIL is an important factor in this field because it selectively induces apoptosis in cancer cells, and many studies are being performed in this regard 19.
Although E. coli is the best host for production of TRAIL owing to the relatively simple structure of this protein, intracellular production yields an insoluble form that requires several refolding steps to regain biological activity 20. Transfer of the protein into the periplasmic space is an important way to reduce this problem. However, secretion of a protein into the periplasmic space is a highly coordinated process involving different cell components, and selection of an appropriate signal peptide is one of the important factors in this regard. Since random selection of a signal peptide is time-consuming and costly, it is reasonable to use predictive methods in order to save time and money. The use of powerful bioinformatics tools to predict physiological phenomena has become widespread in different fields of biology 21. Accordingly, different servers were used in this study to evaluate the physicochemical properties of 26 common signal peptides and their impact on the secretion of TRAIL in E. coli. The signal peptides were evaluated based on characteristics such as net positive charge, theoretical pI, molecular weight, and hydrophobicity, as well as the stability of TRAIL fused to each of them.
Prediction of the exact site of cleavage by signal peptidase is important for evaluating a secretory construct. Using the SignalP server, it was found that the Bla, gIII, LPP, npr, OmpT, TolB, TorA, and TorT signal peptides fused to the TRAIL sequence are not predicted to be cleaved properly, and they were therefore excluded from the study.
The structure of a signal peptide has an enormous effect on its efficiency 22. Positively charged amino acids are important elements of the n-region, and replacing them with negatively charged or uncharged amino acids sharply decreases the performance of the signal peptide in secretion of the desired protein. These amino acids appear to be necessary for interaction of the signal peptide with the negatively charged phospholipids of the cell membrane. Most signal peptides in the present study had two positive charges in the n-region. Hydrophobicity is another important factor in signal peptide performance: the higher the hydrophobicity and the longer the h-region, the greater the efficiency of the signal peptide. The hydrophobic amino acids are required for interaction of the signal peptide with the hydrophobic core of the membrane. The hydrophobicity of the signal peptides was evaluated by the aliphatic index and GRAVY, and according to Table 3, the highest hydrophobicity was seen in the SfmC, OmpC, and DsbA signal peptides. Efficient secretion of a protein also depends greatly on the efficiency with which signal peptidase cleaves the signal peptide. The presence of an AXA motif in the c-region, immediately upstream of the cleavage site, is crucial for cleavage. In this motif, positions 1 and 3 (the -3 and -1 positions relative to the cleavage site) are occupied by small, neutral amino acids such as alanine, glycine, and serine, whereas position 2 can hold bulky amino acids. As shown in Table 2, this motif is seen in all signal peptides.
Among the various physicochemical factors associated with signal peptides, the positive charge of the n-region and the length and hydrophobicity of the h-region are the most important 22. The positive charge was +2 in most signal peptides, and the D-scores did not differ significantly between them. Therefore, the length and hydrophobicity of the h-region were the basis for selecting a suitable signal peptide. Accordingly, the SfmC, OmpC, DsbA, and PhoA signal peptides are predicted to have the greatest effect on the secretion of TRAIL, and the Endoxylanase, LivK, L-Asparaginase, and Pac signal peptides the smallest. Since, as far as we know, no studies have evaluated TRAIL secretion with signal peptides such as SfmC, OmpC, and DsbA, these signal peptides can be good candidates for such studies. The results of this study are consistent with in silico studies on secretion of growth hormone in E. coli 23. Although the PhoA signal peptide proposed in this study is consistent with a reported experimental test 24, other experiments with the OmpA signal peptide, which was not a candidate in this study, also gave good results 25. Thus, despite the advantages of bioinformatics tools for selecting a signal peptide, practical evaluation of the selected signal peptides is also required.
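The AXA motif rule described in the discussion lends itself to a simple check on the last three residues of a signal peptide. The following is a minimal sketch, assuming the signal peptide string ends exactly at the predicted cleavage site and restricting the allowed residues to the three named in the text (alanine, glycine, serine). The function name `has_axa_motif` is hypothetical, and the PelB sequence is the commonly reported one and should be checked against Table 1.

```python
SMALL_NEUTRAL = frozenset("AGS")  # residues named in the text for positions 1 and 3


def has_axa_motif(signal_peptide: str) -> bool:
    """Check the last three residues for a small-X-small pattern.

    The final residue is taken as position -1 and the third from the end
    as position -3 relative to the cleavage site; position -2 is unrestricted.
    """
    sp = signal_peptide.upper()
    if len(sp) < 3:
        return False
    return sp[-3] in SMALL_NEUTRAL and sp[-1] in SMALL_NEUTRAL


# Assumption: commonly reported PelB signal peptide; verify against Table 1.
pelb = "MKYLLPTAAAGLLLLAAQPAMA"
print("PelB ends with", pelb[-3:], "-> motif present:", has_axa_motif(pelb))
```

A check like this only confirms the sequence pattern; whether cleavage actually occurs at that site still depends on the SignalP prediction and on experimental confirmation.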

Table 1. Amino acid sequences of the signal peptides

Table 2. Sequence analysis of the signal peptides by SignalP

Table 3. Physicochemical properties of the signal peptides determined with ProtParam and SOLpro