Design, Optimization, and Structure Prediction of the Chimeric Protein CTB-IpaD for Expression in E. coli

1 Young Researchers and Elite Club, Andimeshk Branch, Islamic Azad University, Andimeshk, Iran.
2 Department of Cardiology, Faculty of Medicine, Dezful University of Medical Sciences, Dezful, Iran.
3 Department of Drug Biotechnology, Faculty of Pharmacy, Tabriz University of Medical Sciences, Tabriz, Iran.
4 Department of Medical Biotechnology, Faculty of Advanced Technologies in Medicine, Tehran University of Medical Sciences, Tehran, Iran.
5 Department of Microbiology, Islamic Azad University, Damghan Branch, Damghan, Iran.
6 Department of Marine Biology, Faculty of Marine Science, Chabahar Marine University, Chabahar, Iran.

Shigella and Escherichia are members of the family Enterobacteriaceae and are genetically similar [1]. The genus Shigella, first isolated in 1894, is named after the Japanese bacteriologist Kiyoshi Shiga [2]. The bacterium is an intracellular pathogen, and as few as ten organisms in water or food can cause symptoms [3].
Humans are the natural host, and infection is transmitted by the fecal-oral route through contaminated food and water. Diarrhea with abdominal pain and generalized muscle aches are the most common early symptoms. Symptoms of shigellosis appear 24-48 hours after eating contaminated food, and the enterotoxin acts on intestinal epithelial cells to cause loss of water and electrolytes [4].
Each year, more than 164 million infections and 1.1 million deaths are reported worldwide, and 69% of the victims are children under 5 years of age [5]. Reported prevalence rates differ between studies: studies in Tehran found prevalence rates of shigellosis of 1% and 2%, whereas a 3-year study in Qazvin found 5.2% [6][7][8].
Entry of Shigella into epithelial cells depends on a number of bacterial surface proteins. The important ones are encoded by the invasion plasmid and include IpaA, IpaB, IpaC, IpaD, and IpaH. IpaB, IpaC, and IpaD have been proposed as Shigella vaccine candidates [9]. IpaD plays an important role during invasion and causes IpaB and IpaC to reach the cell surface more quickly for pathogenesis and invasion [10][11].
IpaD is one of the most important virulence factors of Shigella species. With a molecular weight of 37 kDa, IpaD is present on the bacterial surface and, together with IpaB, precedes the onset of invasion. IpaD has a dumbbell-shaped structure: its C-terminal region connects it to the bacterial type III secretion system, and its N-terminal region interacts directly with the environment [12][13].
Many studies have shown that a specific antibody against IpaD can prevent entry of the bacteria into the host cell. For this reason, IpaD has become a major vaccine antigen [14][15][16].
Today, the cholera toxin B subunit (CTB) is known as a strong immunoadjuvant for mucosal immunity [17][18]. Because it is an effective carrier that promotes both systemic and mucosal antibody responses to conjugated antigens, it has been coupled to antigens both chemically and genetically. Dendritic cells activated by CTB increase their expression of major histocompatibility complex molecules, simultaneously express co-stimulatory molecules such as CD80 and CD86 on their surface, and secrete proinflammatory cytokines [19][20][21]. A chimeric protein consisting of CTB and IpaD therefore has two advantages: first, CTB acts as an adjuvant that increases the immunogenicity of IpaD; second, the protein can raise immunity against both infectious agents. The aim of this study is to design a gene construct consisting of IpaD and CTB and to characterize it with bioinformatics tools.

Sequences, databases and construct design
The IpaD and CTB sequences, with accession numbers YP_406165 and AAC34728 respectively, were obtained from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) and saved in FASTA format. Signal peptides and sequences not essential for the immunological response were removed.
A linker is required to maintain the protein structure and to prevent domain interactions. The sequences were fused with an (EAAAK)4 helical linker so that the best epitopes of the chimeric antigen remain exposed. For expression in E. coli, codon usage was optimized with the OptimumGene™ algorithm.
The nucleotide sequences were translated with the ExPASy Translate tool (http://web.expasy.org/translate), and the antigenicity of the proteins was predicted with the VaxiJen server (http://www.darrenflower.info/VaxiJen), which uses an alignment-independent method based on the physicochemical properties of proteins. The combination of the genes was constructed on the basis of the highest immunogenicity scores.
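The fusion step can be sketched in a few lines of Python. This is only an illustration: the function name, the domain order, and the short placeholder peptides are assumptions, and the real inputs would be the trimmed CTB and IpaD sequences.

```python
LINKER_UNIT = "EAAAK"  # repeat unit of the linker named in the text


def build_chimera(first_domain: str, second_domain: str, repeats: int = 4) -> str:
    """Return first_domain + (EAAAK)n + second_domain as one protein string."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    linker = LINKER_UNIT * repeats
    return first_domain.upper() + linker + second_domain.upper()


# Placeholder peptides (hypothetical, not real CTB or IpaD fragments).
chimera = build_chimera("MTPQN", "SGLVK")
print(chimera)
```

Keeping the repeat count as a parameter makes it easy to compare constructs with different linker lengths before choosing one.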
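The translation step can also be reproduced locally. The sketch below builds the standard genetic code table and translates a DNA string; the helper `same_protein` is a hypothetical convenience for checking that two coding sequences are synonymous, and it is not part of the ExPASy tool.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1; a trailing partial codon is ignored."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def same_protein(dna_a: str, dna_b: str) -> bool:
    """True when both coding sequences encode the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


# Two synonymous encodings of the linker unit EAAAK.
print(translate("GAAGCGGCGGCGAAA"), same_protein("GAAGCGGCGGCGAAA", "GAGGCTGCAGCCAAG"))
```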

The physico-chemical parameters of proteins
The physico-chemical parameters of the proteins were analyzed with ProtParam (http://web.expasy.org/protparam/) from ExPASy (Swiss Institute of Bioinformatics) to obtain the molecular weight, amino acid composition, number of positively and negatively charged residues, biochemical characteristics, half-life, and protein instability index (PII).
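Several of these quantities follow directly from the sequence. The sketch below is not ProtParam itself: it assumes average residue masses, counts Asp and Glu as negatively charged and Arg and Lys as positively charged, and uses a hypothetical function name.

```python
from collections import Counter

# Average residue masses in Da (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free termini


def basic_properties(protein: str) -> dict:
    """Length, composition, charged-residue counts, and average molecular weight."""
    counts = Counter(protein.upper())
    unknown = set(counts) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    mass = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items()) + WATER
    return {
        "length": sum(counts.values()),
        "composition": dict(counts),
        "negative": counts["D"] + counts["E"],
        "positive": counts["R"] + counts["K"],
        "molecular_weight": round(mass, 2),
    }


print(basic_properties("EAAAK"))
```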

Prediction of RNA secondary structure
The secondary structure of the mRNA was evaluated with the RNAalifold program after codon optimization. The program takes a multiple sequence alignment as input, analyzes patterns of covarying (compensatory) changes, and builds a scoring matrix that combines the minimum free energy with this covariation information. Dynamic programming is then used to select the structure with minimal energy for the complete set of aligned RNA sequences.
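To illustrate the dynamic-programming idea only, the sketch below uses a deliberately simplified stand-in: a Nussinov-style recursion that maximizes the number of base pairs in a single sequence. It does not minimize free energy and does not use an alignment, so it is not the RNAalifold algorithm; the function name and the minimum loop length of 3 are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least min_loop unpaired bases per hairpin."""
    seq = rna.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: position j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))
```

Energy-based tools replace the "+1 per pair" score with stacking and loop energies, but the table is filled in the same order, from short subsequences to the full sequence.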

Prediction of protein secondary structure
Protein secondary structure was predicted with SOPMA, a server that applies the self-optimized prediction method (SOPM) to improve prediction accuracy.

Prediction of protein tertiary structure
Protein tertiary structure was predicted with the online I-TASSER server (http://zhanglab.ccmb.med.umich.edu/I-TASSER/), which combines threading with ab initio modeling.

Prediction of antigenic B-cell epitopes
The ABCpred server (http://www.imtech.res.in/raghava/abcpred) was used to predict B-cell epitopes in the chimeric protein sequence. This server makes its predictions with a recurrent neural network (RNN).
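Linear epitope predictors of this kind typically score fixed-length windows of the sequence and report those above a threshold. The sketch below shows only that windowing and filtering logic. The scoring function is a toy placeholder and not ABCpred's neural network, and the window length of 16 and threshold of 0.51 are assumed values used for illustration.

```python
from typing import Callable, List, Tuple


def candidate_epitopes(
    protein: str,
    score: Callable[[str], float],
    window: int = 16,        # assumed window length
    threshold: float = 0.51,  # assumed score cutoff
) -> List[Tuple[int, str, float]]:
    """Return (1-based start, peptide, score) for windows at or above threshold, best first."""
    hits = []
    for start in range(len(protein) - window + 1):
        peptide = protein[start:start + window]
        value = score(peptide)
        if value >= threshold:
            hits.append((start + 1, peptide, value))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def toy_score(peptide: str) -> float:
    """Hypothetical scorer: fraction of charged residues (stand-in for a trained model)."""
    return sum(aa in "DEKR" for aa in peptide) / len(peptide)


print(candidate_epitopes("AAAAKKKKAAAA", toy_score, window=4, threshold=1.0))
```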

Designing and optimizing
Optimization was performed in two steps: optimization of codon bias and an increase in GC content. Together these raised the codon adaptation index (CAI) from 22% to 100%, a significant improvement in codon usage, which is correlated with the level of gene expression; the improved codon usage frequency is therefore expected to improve protein expression levels in E. coli (Fig. 1). BLASTX analysis confirmed that optimization of the gene sequence did not change the chimeric protein sequence.
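For readers unfamiliar with the CAI, it is the geometric mean of the relative adaptiveness of each codon, where a codon's relative adaptiveness is its usage frequency divided by that of the most used synonymous codon. The sketch below assumes a tiny, purely illustrative usage table (the frequencies are hypothetical, not a real E. coli table) and is not the OptimumGene™ algorithm.

```python
import math

# Illustrative codon frequencies per amino acid (hypothetical values).
TOY_USAGE = {
    "K": {"AAA": 0.75, "AAG": 0.25},
    "E": {"GAA": 0.70, "GAG": 0.30},
    "A": {"GCG": 0.35, "GCC": 0.27, "GCA": 0.21, "GCT": 0.17},
}


def relative_adaptiveness(usage: dict) -> dict:
    """Map each codon to its frequency divided by the best synonymous codon's frequency."""
    weights = {}
    for codons in usage.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / top
    return weights


def cai(dna: str, weights: dict) -> float:
    """Geometric mean of codon weights; codons absent from the table are skipped."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else 0.0


def gc_content(dna: str) -> float:
    """Percentage of G and C bases in the sequence."""
    dna = dna.upper()
    return 100.0 * sum(base in "GC" for base in dna) / len(dna) if dna else 0.0


weights = relative_adaptiveness(TOY_USAGE)
print(cai("GAAGCGAAA", weights), cai("GAGGCTAAG", weights), gc_content("GAAGCG"))
```

A sequence built only from the most used codon for each amino acid reaches a CAI of 1 (100%), which is the situation the optimization result describes.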

Physicochemical properties
The biochemical characteristics of the protein were analyzed with ProtParam. The calculated molecular weight and isoelectric point of the target protein were 25,647 Da and 6.69, respectively, with molecular formula C1126H1818N312O356S7. The instability index estimates stability in vitro; proteins with an index below 40 are classified as stable. The instability index of the protein was 36.71, so it is predicted to be stable. Because of the presence of a conserved amino acid (threonine), the in vivo half-life of the protein in E. coli was estimated to be more than 10 hours. The aliphatic index, defined as the relative volume of a protein occupied by aliphatic side chains, is a measure of thermostability; it was 85.41 for the designed protein. The antigenicity score predicted by VaxiJen was 0.5011.
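The reported molecular formula and molecular weight can be cross-checked against each other. The sketch below sums standard atomic weights over the formula and also shows the usual aliphatic index formula (mole percent of Ala, plus 2.9 times that of Val, plus 3.9 times that of Ile and Leu combined). The function names are hypothetical, and the aliphatic index is demonstrated on a toy peptide because the full chimeric sequence is not reproduced here.

```python
import re

# Standard atomic weights in Da.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.06}


def formula_mass(formula: str) -> float:
    """Sum atomic weights over a formula such as 'C2H6O'."""
    parts = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return sum(ATOMIC_WEIGHT[element] * int(count or 1) for element, count in parts)


def aliphatic_index(protein: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu."""
    protein = protein.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * protein.count(aa) / len(protein)

    return mole_percent("A") + 2.9 * mole_percent("V") + 3.9 * (
        mole_percent("I") + mole_percent("L")
    )


def is_stable(instability_index: float) -> bool:
    """Apply the cutoff quoted in the text: an index below 40 is classed as stable."""
    return instability_index < 40


mass = formula_mass("C1126H1818N312O356S7")
print(round(mass, 2), is_stable(36.71), aliphatic_index("AVIL"))
```

With these atomic weights the formula mass comes to about 25,647.2 Da, consistent with the molecular weight reported above.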

Prediction of RNA secondary structure
The secondary structure of the 5' end of the mRNA was predicted for the chimeric gene after optimization (Fig. 2). The minimum free energy of this structure was calculated to be -421 kcal/mol.

Secondary structure prediction of chimeric protein
The secondary structure predicted with SOPMA showed that the chimeric protein consists of 67.5% α-helix, 9.5% extended strand, 4.3% β-turn, and 18.6% random coil.

3D structure prediction of chimeric protein
Three-dimensional (3D) structure of

Linear B cell epitope prediction
The presence of B-cell epitopes is often the determining factor for the immunogenicity of a protein. Linear B-cell epitopes in the chimeric protein were predicted with the ABCpred server, and the results are shown in Table 2.

DISCUSSION
Despite great efforts in the past, no effective vaccine is available against Shigella. Children in developing communities are the highest-risk group for the disease. It has previously been reported that Shigella proteins such as IpaB and IpaD are highly immunogenic and effective against shigellosis [22].
With this in mind, IpaD epitopes were sought; which IpaD epitopes are dominant and accessible depends on the function of the region in which they lie.
If the activity of this region is blocked, the ability of Shigella to invade is suppressed. The results show that IpaD, and especially its N-terminal region, is important for the entry of Shigella into host cells. Altogether, the N-terminal region of IpaD is a vaccine candidate against shigellosis.
Many studies have found that the immunoadjuvant properties of CTB from Vibrio cholerae boost the immune response [19][20]. Recently, chimeric proteins have been the focus of much research on vaccine candidate design. A chimeric protein that comprises subunits, a linker, and sequences with adjuvant properties can improve the immunogenic properties of recombinant proteins. Here the two subunits of the protein are connected by an appropriate linker. In the (EAAAK)4 linker, electrostatic interactions between glutamic acid and lysine form salt bridges; together with the chemical nature of the neighboring residues and the positioning of the charged groups, these give a stable helical structure that prevents interaction between the domains of the chimeric protein [23]. In the study by Chen et al., the number of linker repeats ranged from 1 to 5, and analysis showed that 4 or 5 repeats are particularly effective at preventing domain interaction [24].
Because the IpaD and CTB gene sequences are rich in rare codons and in adenine and thymine bases, the native genes are unlikely to be expressed well in BL21 (DE3); Rosetta (DE3), a BL21 derivative, is designed to enhance the expression of proteins whose genes contain rare codons at high frequency. For this reason, the sequence of the chimeric gene was optimized according to the codon usage pattern for high expression in E. coli. To investigate the stability of the RNA structure, pseudo-loop structures and ΔG were evaluated for each RNA structure. The secondary and three-dimensional structures of the protein were also predicted. Finally, linear epitopes were predicted using the protein sequence as input.

CONCLUSION
With bioinformatics tools, a construct was designed that is expected to be expressed at high levels in an E. coli host. The protein derived from this construct could be a candidate vaccine against shigellosis.

Table 1. Results of sequence optimization

Table 2. Linear B cell epitopes prediction of chimeric protein by ABCpred server