In silico comparative proteomic analysis of enzymes involved in fatty acid biosynthesis in castor bean (Ricinus communis) and soybean (Glycine max)

* Correspondence: qudsia@ciitsahiwal.edu.pk


Introduction
Plant oils are very important in our lives. Fatty acids (FA) play a vital role in human nutrition and therapeutics. Besides their nutritional importance, FA are used in various industrial products, e.g., soaps, detergents, lubricants, varnishes, paints, ink, and cosmetics. Thus, an ever-expanding and lucrative market for nutritional and industrial uses is available for oilseed crops. Vegetable oil utilization is expected to double by 2040 (FAO, 2015).
Biodiesel is an important product of plant oils. Alternative energy resources, like biodiesel, are now of increasing importance (Haas, 2005). More than 100 years ago, a brilliant inventor, Dr Rudolph Diesel, designed a diesel engine to run on plant oil. He used peanut oil as a fuel in his engines at the Paris exhibition (Demirbas, 2003). Biodiesel obtained from vegetable oils such as rapeseed, soybean, and sunflower is important for several reasons. It can replace diesel oil in internal combustion engines and boilers without major adjustments, with a nominal decrease in performance, negligible sulfate emissions, and lower emission of pollutants than diesel oil. Biodiesel is made from renewable biomass, mainly by alkali-catalyzed transesterification of triacylglycerols (TAGs) from plant oils (Ma and Hanna, 1999). Efforts have been made in different countries to introduce and promote biodiesel (Carraretto et al., 2004).
Initially, plant oil fuels were not popular among the public because they were more expensive than petroleum fuel. However, an increase in petroleum prices and uncertainties about its availability increased the need for biodiesel (Demirbas, 2002). Vegetable oils are also known as triglycerides because triglycerides account for 98% of their total fatty acid content (Barnwal and Sharma, 2005). The use of biodiesel has grown dramatically during the last few years. Therefore, it is desirable to develop oilseed plants with an increased oil content to cope with the tremendous demand for seed oils.
Abstract: Plant oils are very important for domestic and industrial use. Biodiesel can be obtained from plant seed oil. Biodiesel is currently popular and in demand due to the high cost of petroleum and to avoid pollution. It is time to increase plant seed oil production and conduct research to find ways of enhancing its production. We studied two species of oil seed plants, i.e. Ricinus communis and Glycine max, with varying amounts of oil content. Proteins from six categories of enzymes involved in fatty acid biosynthesis were selected for study. The 3D structures were predicted using different structure prediction tools. The structures were validated and selected on the basis of quality factors. The pairs of proteins were compared by pairwise sequence alignment using Clustal W and structural superposition by Chimera Matchmaker. The physiochemical properties were studied by PROTPARAM. In R. communis, eighteen structures were selected from I Tasser, thirteen from Swiss Model, and two from Raptorx. In G. max, twenty structures were selected from I Tasser, nine from Swiss Model, and four from Raptorx. The highest percent identity in pairwise sequence alignment was observed between the two species for biotin carboxylase. Biotin carrier was least identical between these two species. Monogalactosyldiacylglycerol desaturase (FAD5) showed the highest percentage of structural identity between the two species while ER phosphatidate phosphatase was least identical. Eight proteins in both species had an instability index below 40. Eight proteins in R. communis and five in G. max were acidic in nature. Fourteen proteins in R. communis and seventeen in G. max were hydrophobic. The aliphatic index of all proteins was above 50, which confers good thermal stability.
Five fatty acids, palmitate, linoleate, stearate, linolenate, and oleate, are the main constituents of plant oils. The physiochemical properties of these fatty acids are different due to variations in the number of double bonds and acyl chain length. Different biochemical pathways involved in plant oil biosynthesis consist of rate-limiting enzyme systems (Bates et al., 2013).
Acetyl CoA carboxylase (ACCase) plays a crucial role in the regulation of fatty acid synthesis. It is involved in the conversion of acetyl Co-A to malonyl Co-A, the most important step in fatty acid biosynthesis. Glycerol-3-phosphate acyltransferase (GPAT) acylates the sn-1 position in the glycerol backbone and produces lysophosphatidic acid (LPA). Another important enzyme in the pathway is lysophosphatidic acid acyltransferase (LPAAT), which acylates the sn-2 position to synthesize phosphatidic acid (PA), which in turn is converted into diacylglycerol (DAG) by the enzyme phosphatidic acid phosphatase (PAP).
An acyltransferase, diacylglycerol acyltransferase (DGAT), converts DAG to TAG by using acyl-CoA as a substrate (Thelen and Ohlrogge, 2002; Bates et al., 2013). Fatty acids are mainly stored as TAGs in seeds. In addition to TAGs, fatty acids are also present as wax esters, e.g., in jojoba fruit (Simmondsia chinensis). Different enzymes are involved in the TAG synthesis pathway.
Plant breeders and metabolic engineers have been trying to enhance seed oil production for many years. It has been observed that even a small increase in plant seed oil yield per hectare increases the crop's value by more than 1 billion USD. How can we achieve this? Manipulation of biosynthetic pathways offers a number of exciting opportunities for plant biologists to redesign plant metabolism toward production of specific enzymes and coenzymes. Information about the role and function of these enzymes can help scientists to genetically modify oilseed crops to obtain a high quantity of seed oil. Targeting ACCase in plastids resulted in a 5% increase in oil content in rapeseed (Roesler et al., 1997). Overexpression of genes DGAT, DGAT2A, and DGAT1-2 in Arabidopsis, soybean, and maize increased their overall oil contents (Zheng et al., 2008).
There are many reports of gene manipulation to increase seed oil (Weselake et al., 2009). Few protein-level studies of enzymes involved in plant oil biosynthesis pathways have been conducted. Most of the proteins involved in this pathway are uncharacterized and their 3D structures are not available. Millions of protein sequences are available, but structures are known for only a limited number of them. Structure prediction is therefore of primary importance in protein-level studies of FA biosynthesis, because predicted 3D structures can fill this gap. Protein structure prediction involves generating 3D models of proteins from amino acid sequences using computer algorithms. After structure generation, human involvement is also required for the selection of the best structure on the basis of quality factors (Murzin and Bateman, 2001; Ginalski and Rychlewski, 2003). FA seed oil composition varies significantly between and within species. The fuel properties of biodiesel derived from a mixture of fatty acids depend on its composition in seed oil. By changing the fatty acid profile through genetic engineering, we can improve the fuel properties of biodiesel (Harris, 2012).
G. max is widely used in biodiesel production and has high oleic acid contents (22%-34%), but its overall oil contents are low (12%-20%) (Sangwan et al., 1986). R. communis is a wild plant and can be grown in harsh environments (Jumat et al., 2010). It has high oil contents, i.e. 45%-50% (Ramos et al., 1984), but is still unpopular in biodiesel production. We selected these two plant species because of their contrasting oil contents and potential use for biodiesel production. In the present study we compared six categories of enzymes involved in the fatty acid biosynthesis pathway. Comparison was performed at the amino acid sequence and protein structure levels to find the variations and similarities between these two plant species at the protein level.

Materials and methods
Thirty-three enzymes involved in fatty acid biosynthesis from six different categories, i.e. ACCase, elongase, desaturase, thioesterase, TAG synthase, and oil body proteins, in Ricinus communis and Glycine max were selected for this study. Different bioinformatics tools and databases were used, from amino acid sequence retrieval to protein structure stability analysis.

Sequence retrieval
Primary amino acid sequences of enzymes involved in fatty acid biosynthesis and storage in R. communis and G. max were retrieved from the plant database Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html).

Protein 3D structure prediction
The protein structures were predicted with four well-known structure prediction tools: I Tasser, Swiss Model, Modeller, and Raptorx. I Tasser is a freely available online protein structure modeling tool (https://zhanglab.ccmb.med.umich.edu/I-TASSER/). Swiss Model is an automated tool for generating 3D models of proteins from amino acid sequences through homology modeling. It is user-friendly and accessible via the ExPASy web server (Arnold et al., 2006). Raptorx predicts 3D models of proteins by homology modeling. The Raptorx web server is available at http://raptorx.uchicago.edu (Peng and Xu, 2011). Modeller is a computer program used for comparative protein modeling. Input for Modeller is the alignment of a sequence to be modeled with the template, the pdb file of the template, and a simple script file. Without any human interaction, Modeller generates a model of the target protein containing all nonhydrogen atoms (Eswar et al., 2008).

Protein structure refinement and validation
The best structures from the predicted models were selected with the help of structure evaluation tools, i.e. Rampage, Verify 3D, and ERRAT. ERRAT was used to evaluate protein structures. It is freely available on the UCLA-DOE server (Mahgoub and Bolad, 2013). Rampage (Ramachandran plot) displays the main chain conformation angles of the polypeptide chain (Gopalakrishnan et al., 2007). Verify 3D in CASP was used to identify well and poorly folded parts (Kosinski et al., 2003).
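The main chain conformation angles shown in a Ramachandran plot are dihedral (torsion) angles defined by four consecutive backbone atoms. The following Python function is a minimal sketch of how such an angle can be computed from atomic coordinates. It is not the Rampage implementation; the function name `dihedral` and the toy coordinates are illustrative assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For the backbone angle phi the points would be C(i-1), N(i), CA(i), C(i);
    for psi they would be N(i), CA(i), C(i), N(i+1).
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical, not taken from any model in this study)
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
```

In a validation tool, the phi and psi angles of each residue are then compared with the favored and allowed regions of the Ramachandran plot.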

Pairwise sequence alignment
Pairs of the sequences of the same enzyme from R. communis and G. max were aligned using Clustal W. Percent similarity and percent identity were noted to find the similarities and differences between the pairs.
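As an illustration of how percent identity and percent similarity can be derived from a pairwise alignment, the following Python sketch counts identical and similar residue pairs over the alignment length. It is not the Clustal W implementation. The function name, the choice of the full alignment length as denominator, and the residue groups used to define similarity are assumptions made for this example; alignment programs normally define similarity through a substitution matrix.

```python
# Hypothetical groups of residues treated as "similar" (an assumption for this sketch)
SIMILAR_GROUPS = ("ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG")

def alignment_stats(aln_a, aln_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both strings must have the same length and use '-' for gaps.
    Percentages are taken over the full alignment length, gaps included.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = 0
    similar = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length only
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    length = len(aln_a)
    return 100.0 * identical / length, 100.0 * similar / length

# Toy alignment (hypothetical sequences, not from this study)
identity, similarity = alignment_stats("MKV-LA", "MRVQLG")
```

Because similar pairs include identical pairs, percent similarity is always at least as high as percent identity, as in the values reported in this study.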

Structural comparison
The predicted structures of the same enzyme from R. communis and G. max were superposed to obtain structural alignment by using UCSF Chimera 1.11 Matchmaker. The files of superposed pairs were retrieved and saved as pdb extensions. Percent identity and root mean square deviation (RMSD) values for each superposed pair were also calculated to determine the similarities and differences between the pairs. UCSF Chimera is a molecular modeling package that provides interactive visualization and analysis of density maps. This software contains a large collection of interactive methods (Goddard et al., 2007).
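The RMSD of a superposed pair can be illustrated with a short Python sketch. It assumes that the two structures have already been superposed and that the matched atoms (for example, C-alpha atoms) are given in the same order. It does not perform the fitting step that Chimera Matchmaker carries out, and the function name and coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Toy coordinates in angstroms (hypothetical): every atom is displaced by 1 along z
value = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
             [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])
```

A lower RMSD indicates that the matched atoms of the two structures lie closer together after superposition.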

Comparison of physiochemical properties
The instability index based on the weight value was calculated using the ExPASy ProtParam server.ProtParam is a tool that provides information about the physiochemical properties of protein sequences.The computed parameters include the molecular weight, theoretical isoelectric point (pI), amino acid composition, atomic composition, instability index, and molecular formula (Gasteiger et al., 2005).

Results

Protein 3D structure prediction
Protein structures for 66 proteins of six categories of enzymes used in fatty acid biosynthesis from Ricinus communis and Glycine max were predicted using Modeller, Raptorx, I Tasser, and Swiss Model. The structures with high quality factors were selected (Tables 1-6). Thirty-three models were predicted for R. communis. Eighteen models were selected from I Tasser, thirteen from Swiss Model, and only two from Raptorx (Table 7). Thirty-three protein structure models were predicted for G. max. Twenty were selected from I Tasser, nine from Swiss Model, and four from Raptorx (Table 8).

Pairwise sequence alignment
All six enzymes were compared by pairwise sequence alignment (Table 9). Percent identity and percent similarity were observed. In ACCase the highest percentage of identity (85.0%) and percentage of similarity (89.2%) were found for biotin carboxylase. Lowest percent identity (11.5%) and percent similarity (20.4%) were found for biotin carrier. An overall good rate of similarity was found for elongase enzyme. Highest percent identity (76.1%) and percent similarity (83.6%) were observed for 3-ketoacyl-Co-A reductase. Lowest identity (32.6%) and similarity (47.11%) were observed for beta-ketoacyl ACP synthase II. The remainder showed more than 50% identity, except beta-ketoacyl ACP synthase I. For desaturase, eight pairs displayed good similarity, ranging from 45.3% to 76.3%, except stearoyl-ACP desaturase. Two pairs of amino acid sequences for thioesterase were compared. Percent identity between the two species for palmitoyl-ACP thioesterase (FatB) was 78.3% while for acyl-ACP thioesterase (FatA) it was 12.7%. Eight pairs of amino acid sequences were compared for TAG synthesis enzymes. Four pairs (LPAT, DGAT 2, diacylglycerol cholinephosphotransferase, and DGAT 1) had percent identity of 53%-68.3%, and that for the remaining four pairs was less than 12%. Oil body protein had percent similarity of 68.5% and 81.8% for DGD1.

Structural comparison
Structural comparison of six enzymes was performed by Chimera Matchmaker. The structures of six enzymes of R. communis and G. max were superposed to detect variations and similarities at the structural level. RMSD values were also calculated (Table 10). Three pairs of enzymes from the ACCase category were compared. Only one enzyme, biotin carboxylase, had 80.15% structural identity. Biotin carrier was least identical (2.6%) between the two species. Ten pairs of structures were superposed for the elongase category and an overall good percent identity was recorded. Seven pairs showed a percent identity that ranged between 79.5% and 89.6%. The remaining three pairs were 54.2% to 72.33% identical. In case of desaturase, the highest percent identity (82.22%) was found between the monogalactosyldiacylglycerol desaturase (FAD5) of these two species. The structures of linoleate desaturase (FAD7) were least (1.55%) identical. Two pairs of stearoyl-ACP desaturase were 60% to 62% identical, while the remaining pairs were less than 36% structurally identical. Among thioesterase pairs, the highest percent identity (46.39%) was found for palmitoyl-ACP thioesterase (FatB), while acyl-ACP thioesterase (FatA) showed very poor percent identity (1.40%). Overall percent identity between the structures of the TAG synthesis category was not very good. The highest value (47.86%) was recorded for diacylglycerol acyltransferase (DGAT 2), followed by 36.6%

Table 1. Predicted structures of ACCase enzyme group involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

Table 2. Predicted structures of elongase enzyme group involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

Comparison of physiochemical properties
Physiochemical properties (molecular weight, atomic composition, amino acid composition, pI value, formula) of enzymes were computed with ProtParam (Tables 11 and 12). The instability index was calculated for each protein; a value below 40 is the standard threshold for a protein to be considered stable (Roy et al., 2011). In the ACCase enzyme category, two proteins of R. communis were stable and one was unstable, while in G. max only one protein out of three was stable. In the elongase enzyme category, seven proteins of R. communis and four of G. max were stable. For the desaturase category, five proteins in R. communis and seven in G. max were stable. Two pairs of thioesterase were compared. All proteins in R. communis and one in G. max were unstable. In the TAG synthesis category, four proteins in R. communis and three in G. max were stable. In oil body proteins, all proteins from R. communis were unstable and only one was unstable in G. max.
Table 4. Predicted structures of thioesterase enzyme group involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

Table 5. Predicted structures of TAG synthase enzyme group involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

The pI is the pH at which charges are present on the surface of a protein but the net charge is zero. A pI value of 7 is neutral. Proteins having pI values below 7 are acidic and above 7 are basic. In the ACCase category, biotin carrier (5.78) in R. communis was acidic in nature. In the elongase category, beta-ketoacyl ACP synthase III (6.46) of R. communis was acidic while beta-ketoacyl ACP synthase I (6.14) and beta-ketoacyl ACP synthase III (5.75) of G. max were acidic. In the desaturase category, stearoyl-ACP desaturase was acidic in both species. Acyl-ACP thioesterase (FatA) (6.37) and palmitoyl-ACP thioesterase (FatB) (6.56) were acidic in R. communis but in G. max only palmitoyl-ACP thioesterase (FatB) (6.48) was acidic. Lysophosphatidic acid acyltransferase (LPAT) of the TAG synthesis category was acidic (5.56) in R. communis. No acidic protein was found in oil body proteins.
GRAVY shows the hydropathicity value of proteins. It can be calculated as the sum of hydropathy values of all amino acids divided by the number of residues in the sequence. Its value varies in a range of ±2. A low GRAVY value means that the protein is more hydrophilic in nature and vice versa (Kyte and Doolittle, 1983). A negative score shows that it trends towards hydrophilicity, while a positive score represents hydrophobicity (Roy et al., 2011).
In the ACCase category of enzymes, for both species, all proteins were hydrophilic.For the elongase category, five proteins of R. communis showed hydrophobicity ranging from 0.014 to 0.268 and hydrophilicity from -0.023 to -0.227.In G. max, eight out of ten proteins were hydrophobic (0.026 to 0.144).Seven proteins out of eight were hydrophilic in R. communis and eight were hydrophilic in G. max for the desaturase category of enzymes.Thioesterase was hydrophilic in both species.The TAG synthesis category of enzymes was hydrophobic in both species.Digalactosyldiacylglycerol synthase (DGD1) was hydrophilic and ER phosphatidate phosphatase was hydrophobic in R. communis and G. max.
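The GRAVY score described above (sum of hydropathy values divided by the number of residues) can be sketched in a few lines of Python using the Kyte and Doolittle (1983) hydropathy scale. This is only an illustration of the definition and not the ProtParam code; the function name `gravy` and the example peptide are assumptions.

```python
# Kyte and Doolittle (1983) hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value over all residues."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    # A KeyError is raised for any character outside the 20 standard residues
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence) / len(sequence)

# Hypothetical dipeptide: alanine (1.8) and arginine (-4.5)
score = gravy("AR")
nature = "hydrophobic" if score > 0 else "hydrophilic"
```

A positive score classifies a protein as hydrophobic and a negative score as hydrophilic, which is the criterion applied to the enzyme categories in this section.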
The aliphatic index (API) is defined as the relative volume occupied by the aliphatic side chain in a protein.
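The aliphatic index is commonly computed with the formula of Ikai (1980), AI = X(Ala) + 2.9 × X(Val) + 3.9 × [X(Ile) + X(Leu)], where X is the mole percent of each residue in the sequence. The Python sketch below applies this formula. The function name and the example sequence are illustrative assumptions, and the code is not taken from ProtParam.

```python
def aliphatic_index(sequence):
    """Aliphatic index from mole percentages of Ala, Val, Ile, and Leu (Ikai, 1980)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    # Mole percent of each aliphatic residue
    x_ala = 100.0 * sequence.count("A") / n
    x_val = 100.0 * sequence.count("V") / n
    x_ile = 100.0 * sequence.count("I") / n
    x_leu = 100.0 * sequence.count("L") / n
    # 2.9 and 3.9 are the relative side-chain volumes of Val and of Ile/Leu versus Ala
    return x_ala + 2.9 * x_val + 3.9 * (x_ile + x_leu)

# Hypothetical tetrapeptide with 25 mol% each of Ala, Val, Ile, and Leu
index = aliphatic_index("AVIL")
```

Because the index is a weighted sum of mole percentages, values above 100 are possible for sequences rich in valine, isoleucine, and leucine.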

Discussion
In general, plant oil biosynthesis mostly follows common biosynthetic pathways for fatty acids in the plastid as well as TAG in the endoplasmic reticulum (ER) and a small proportion of oil accumulates in oil bodies. However, there are significant differences in content and composition of seed oil in different plant species. Using comparative proteomics, we attempted to understand the effect of change in protein structure and sequential differences in oil contents in different plant species. In this study, 33 enzymes involved in biosynthesis and accumulation of seed oil were compared at protein sequence and structure levels in two oil seed plant species, Glycine max and Ricinus communis. The selected enzymes corresponded to six different categories: ACCase, desaturase, elongase, thioesterase, TAG synthesis, and oil body proteins.
The first step in a proteomic study is to have the 3D structure of the protein. The structures of the selected proteins were not available. We predicted the structures by using four structure prediction tools and high quality structures were selected for further studies. Most of the structures were selected from I Tasser. No good quality structure was predicted by Modeller. The structure prediction from Modeller requires templates from the Protein Data Bank (PDB) to execute homology modeling (Eswar et al., 2008). Related structures of the query proteins were not available in PDB, so high quality structures could not be predicted. I Tasser is a protein modeling tool that uses a hierarchical approach based on enhanced profile threading alignment of secondary structures (Wu and Zhang, 2007). It predicts high quality structures for distant homologs (Zhang, 2008). A large number of high quality structures were predicted and selected from I Tasser, i.e. 18 for R. communis and 20 for G. max.
Oil biosynthesis is generally considered as the production of fatty acids. Fatty acid biosynthesis is regulated by ACCase. The fatty acid contents were lowered in transgenic seeds with the reduction of the activity of ACCase. In this study, three pairs of amino acid sequences and structures for ACCase enzymes from G. max and R. communis were compared. Only one pair showed a good percentage of similarity, while one exhibited very poor similarity. This might explain the difference in oil content between the two species. Sharma and Chuhan (2012) found microsynteny of genes for ACCase of R. communis and G. max with respect to Arabidopsis thaliana. They found a 3-bp deletion in the 8th and 26th exons of R. communis and a 3-bp insertion in the 29th and 31st exons in G. max with respect to A. thaliana. Roesler et al. (1997) and Yang et al. (2010) also concluded that variations in genes of ACCase may lead to oil content variation in plant species. Oil content in maize was increased up to 1.3% by introducing SNPs in genes related to ACCase. In our study, this may be the reason for variations in oil content between the two species.
Thioesterase regulates the fatty acid chain length (Jones et al., 1995). Genetic engineering of FatB in A. thaliana and Brassica napus resulted in a 5% increase of palmitic acid content (Dormann et al., 2000). Downregulation of FatB resulted in a lower level of saturated fatty acids (Buhr et al., 2002). Variations in the palmitic acid contents of seed oil in different plant species were found to be due to variations in the FatB gene (Cardinal et al., 2007).
Palmitic acid content in R. communis is 2% (Akbar et al., 2009), while in G. max it is 7%-11% (Kinney, 1997). Sharma and Chuhan (2012) concluded that the variation in palmitic acid content in G. max and R. communis might be due to deletion of the first exon of the FatB gene. In this study, a good percentage of identity (78.3%) was found for FatB between these two species at the amino acid sequence level, but the structural identity was less than 50% (46.39%). Our results agree with previous findings on gene-level study.
A high level of oleic acids and lower level of saturated fatty acids in G. max were induced by downregulation of the FAD2 and FatB genes.Sharma and Chuhan (2012) found that insertion and deletion in FAD2 may be the reason for high oleate and linoleate levels in G. max.In this study, FAD2 showed 30.2% structural identity between the two species.This might explain the variation in oil content between G. max and R. communis and confirms previous findings of genetic level studies.
In almost all plant species, the chemical form in which oil is stored is TAG (Ohlrogge and Browse, 1995). DGAT and LPAT are very important in the TAG synthesis pathway (Eastmond et al., 2011). DGAT catalyzes a rate-limiting step in storage oil biosynthesis (Saha et al., 2006). DGAT1 is expressed in seeds and pollens and plays a major role in seed oil biosynthesis (Zhang et al., 2009). DGAT2 is involved in the synthesis of unusual fatty acids in storage oil (Burgal et al., 2008; Durrett et al., 2010). In our study, TAG synthesis enzymes showed great variations between species at the amino acid sequence and structure levels. DGAT1 can be a potential target for genetic modifications in seed oil plants (Weselake, 2005; Lung and Weselake, 2006). We found variations in both species for DGAT and LPAT at amino acid sequence and structure levels. LPAT was found to be 2.0% to 53.9% identical in pairwise amino acid sequence alignment while the percent structural identity was very poor, ranging between 0.52% and 18.35%. In this study, we found variation at the protein sequence and structure levels between G. max and R. communis for the enzymes involved in fatty acid biosynthesis.
Acyl carrier proteins (ACPs) play a vital role in transportation of starting and intermediate materials throughout the fatty acid biosynthetic pathway (Crosby and Crump, 2012). In this study, more ACP-related proteins were unstable in G. max than in R. communis, which may lead to low oil contents in G. max. Moreover, more proteins in ACCase and TAG synthesis were unstable in G. max than in R. communis. An overall good API was observed in the proteins, especially in the TAG synthesis category. This shows good thermostability of the proteins (Ikai, 1980). TAG synthesis enzymes were hydrophobic, which helps them to avoid water and better interact with counter proteins (Zing et al., 2007).
Soybean (Glycine max) is widely used in biodiesel production (Okullo et al., 2012) by esterification of oleic acid and transesterification of TAG (Bokade and Yadav, 2009; Brahmkhatri and Patel, 2011). ACP, ACCase, and TAG-related proteins are unstable in G. max, which may be the reason for its low oil contents. If the stability of these proteins can be enhanced by in vitro techniques, it will improve the quantity and quality of oil contents in G. max. On the other hand, castor bean (R. communis) has high oil contents but very low oleic acid contents. Genetic engineering of desaturase (FAD2) and thioesterase (FatB) by some insertions or deletions may enhance the oleic acid quantity (Sharma and Chuhan, 2012) in R. communis, which would then become an inexpensive source for biodiesel production. A very important limitation for biofuel is its cost of production because the oil crops used require suitable environmental conditions and proper nutrition levels. The cost of production can be curtailed in two ways, either by enhancing the oil quantity in G. max or improving the quality of oil in R. communis.
In the future a study of the coexpression of genes and protein-protein interactions in this pathway would be useful in biofuel production strategies.
An overall good aliphatic index (>50) was recorded for all the enzymes in both species. R. communis had a score of 75.52 to 89.45 for ACCase while it was 87.98 to 93.29 in G. max. For the elongase category, the API ranged from 76.60 to 102.61 in R. communis, while it was 80.09 to 96.37 in G. max. API for thioesterase ranged from 80.04 to 84.70 (R. communis) and 82.24 to 83.18 (G. max). The highest API score was observed in the TAG synthesis category. In R. communis, it ranged from 94.28 to 108.19 and in G. max from 95.84 to 106.4. Oil body protein API values ranged from 64.49 to 91.72 (R. communis) and 77.23 to 106.01 (G. max).

Table 6. Predicted structures of oil body enzyme group involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

Table 7. Selected protein 3D models, on the basis of quality factors, for the enzyme categories involved in fatty acid biosynthesis in Ricinus communis.

Table 8. Selected protein 3D models, on the basis of quality factors, for the enzyme categories involved in fatty acid biosynthesis in Glycine max.

Table 9. Pairwise alignment results for the enzyme categories involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

Table 10. Structure alignment results for the enzyme categories involved in fatty acid biosynthesis in Ricinus communis and Glycine max.

Table 11. Physiochemical properties of enzymes involved in fatty acid biosynthesis in Ricinus communis.

Table 12. Physiochemical properties of enzymes involved in fatty acid biosynthesis in Glycine max.