Computational identification of six salt stress responsive genes in Cynara scolymus L.

Cynara scolymus L. is a perennial plant belonging to the family Asteraceae. It has many traditional and medicinal uses and is also consumed as a vegetable in many countries. The plant thrives in saline areas and is considered moderately salt tolerant, but very little is known about its salt stress tolerance mechanism. In this study, six salt stress responsive genes (four up-regulated, i.e. CSD1, RCI3, AFB3 and HVA22A, and two down-regulated, i.e. CSD2 and MSD1) were identified in Cynara scolymus L. using a homology search approach with various bioinformatics tools and resources (BLASTn, BLASTp, ORF finder, CD search, Clustal Omega, ProtParam and Phyre2). Furthermore, the secondary and tertiary structures of the proteins encoded by the reference and target genes were predicted and compared for confidence and structural similarity. All selected salt stress responsive genes were predicted in Cynara scolymus through computational analysis of 36,323 ESTs. The EST-based identification of these genes confirms their expression, and these findings will help in understanding the salt stress tolerance mechanism of this important plant.

Introduction
Cynara scolymus L. (C. scolymus) is a perennial plant commonly known as globe artichoke and belongs to the family Asteraceae. It is a rich source of minerals, fiber and phenolics [1,2]. Its leaf extract is used for antimicrobial, hepatoprotective and cholesterol-reducing purposes. The edible parts of C. scolymus are the tender inner leaves and the receptacle, known as the "heart" [3]. It plays an important role in the daily human diet, mainly in the Mediterranean region [4]. C. scolymus is low in fat and rich in minerals such as potassium (K), sodium (Na) and phosphorus (P), as well as vitamin C, polyphenols and flavones [3,5]. Many other Cynara species are grown in different areas of the world, i.e. in the U.S. (principally California), South America, North Africa, the Near East and China [4,5].

The condition in which plants grow in a higher than normal concentration of salt is called salinity or salt stress. Salt stress is one of the main abiotic stresses faced by plants and affects their metabolic and physiological responses. Plants perceive the stress signals and pass them to the cellular machinery, and adaptation is then achieved through the regulation of gene expression [6]. C. scolymus L. grows under high salinity and drought conditions [7]. Since it is a medicinal plant, researchers have also studied the effect of salinity on its phenolic composition and antioxidant activity [8]. Salt

Deduction of secondary and tertiary protein structures
The physical properties of the proteins deduced from the target C. scolymus ESTs and the Arabidopsis thaliana reference genes, including theoretical pI, instability index, aliphatic index and GRAVY (grand average of hydropathicity), were obtained using the online tool ProtParam on the ExPASy server [13]. The secondary and tertiary structures were obtained from the Phyre2 online protein analysis server [14]. The deduced secondary and tertiary structures were compared for confidence and structural similarity.
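As an illustration of two of the properties named above, the following minimal Python sketch computes GRAVY (the mean Kyte-Doolittle hydropathy per residue) and the aliphatic index (based on the mole percentages of Ala, Val, Ile and Leu) directly from a protein sequence. It is not the ProtParam implementation and was not part of this study's workflow; the function names and the example peptide are hypothetical and chosen only for illustration.

```python
# Sketch only: reimplements two standard sequence-derived indices.
# The function names and the example peptide are hypothetical.

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percent of each residue in the sequence."""
    seq = seq.upper()

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    peptide = "MALKVIGSTA"  # hypothetical peptide, not from the study
    print(f"GRAVY: {gravy(peptide):.3f}")
    print(f"Aliphatic index: {aliphatic_index(peptide):.1f}")
```

A positive GRAVY value indicates an overall hydrophobic protein and a negative value a hydrophilic one, which is how the polarity of target and reference proteins can be compared.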

Results
Identification of new conserved salt stress genes in C. scolymus
All selected salt stress responsive genes were identified in Cynara scolymus through computational analysis of 36,323 ESTs. The homology search for each gene showed lower query coverage and identity at the nucleotide level (BLASTn) and comparatively higher query coverage and identity at the protein level (BLASTp). Query coverage of the nucleotide and protein sequences of CSD1, RCI3, AFB3, HVA22A, CSD2 and MSD1 is shown in Fig. 1a & b.

Identification of similar conserved domains in predicted genes
CD search results for the newly predicted genes in Cynara scolymus showed that all predicted genes share similar conserved domains with their corresponding reference genes. The similarities of the conserved domains are shown in Fig. 2a to f.

Phylogenetic analysis and Percent Identity Matrix of MSD1 genes
Phylogenetic analysis of the deduced amino acid sequence of the MSD1 gene and its orthologs in Camelina sativa, Papaver somniferum and Arabidopsis thaliana showed that the MSD1 gene of Cynara scolymus is closer to that of Papaver somniferum than to those of Camelina sativa and Arabidopsis thaliana (Fig. 3a & b). The Percent Identity Matrix (as predicted by Clustal Omega) for the MSD1 gene of each plant is shown in Table 2.
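To make the idea of a percent identity matrix concrete, the sketch below computes pairwise identities from sequences that are already aligned (gaps written as "-"). It assumes that identity is counted only over columns in which neither sequence has a gap; the exact definition used by Clustal Omega may differ. The function names and the toy alignment are hypothetical and are not the MSD1 sequences analysed here.

```python
# Sketch only: pairwise percent identity from pre-aligned sequences.
# Assumption: columns with a gap in either sequence are skipped.
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over gap-free aligned columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: dict) -> dict:
    """Symmetric matrix (dict of dicts) of pairwise percent identities."""
    matrix = {name: {name: 100.0} for name in aligned}
    for p, q in combinations(aligned, 2):
        value = percent_identity(aligned[p], aligned[q])
        matrix[p][q] = matrix[q][p] = value
    return matrix


if __name__ == "__main__":
    # Hypothetical toy alignment used only to exercise the functions.
    toy = {
        "species_A": "MKT-AYLG",
        "species_B": "MKSQAYLG",
        "species_C": "MRSQAFLG",
    }
    names = list(toy)
    for name, row in identity_matrix(toy).items():
        print(name, [round(row[other], 2) for other in names])
```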

Deduction of secondary and tertiary structures of proteins
In the current analysis, the deduced proteins also showed similar physical properties, including molecular stability, aliphatic index and GRAVY. The results in Table 3 show that the proteins encoded by the C. scolymus target ESTs are homologous to those of the respective Arabidopsis reference genes. All deduced proteins of target ESTs except GE578194.1 showed stability or instability similar to that of their respective Arabidopsis reference counterparts. Aliphatic index values deviated within ±11 of the reference protein values, indicating a comparable contribution of the aliphatic side chains to the stability of the protein molecules. All target EST deduced proteins showed polarity comparable to that of the reference proteins except for the EST sequence GE583634.1, which gave a very low positive GRAVY value. The deduced secondary and tertiary structures of target ESTs selected from Cynara scolymus were comparable to the predicted structures obtained from the respective Arabidopsis reference genes. The three-dimensional models for GE578194.1, GE594969.1, GE583634.1 and GE599954.1 were predicted with 100% confidence by Phyre2, with query coverage ranging from 88 to 99%. A comparison of the deduced tertiary structures of Cynara scolymus and Arabidopsis thaliana is given in Fig. 4a-l.

Figure 4(a-l). Tertiary structure analogues obtained by Phyre2 online analysis server from deduced proteins of Arabidopsis thaliana reference genes and Cynara scolymus target ESTs.

20]. To understand the transcriptional profiling of genes whose expression levels change in response to salt stress, different EST/cDNA collections have been used in plants [21]. Protein sequence query coverage is greater than nucleotide sequence query coverage because of codon degeneracy: DNA alignment does not take into account the redundancy of amino acid codons.
Furthermore, DNA alignment does not consider that some amino acids are structurally more similar to each other than to others and play similar functional roles in the protein. For example, isoleucine and valine both have hydrophobic side chains and differ only by one additional carbon in isoleucine. A mutation from one to the other is less likely to change the protein structure substantially than some other mutations would, but a DNA sequence alignment treats this mutation the same as any other. Therefore, both approaches were adopted in this research. The higher query coverage with BLASTp than with BLASTn for all six genes is consistent with this expectation. This research also investigated the presence of conserved domains in the reference genes (Arabidopsis thaliana) and their predicted orthologs (Cynara scolymus L.). The presence of the same conserved domains in all predicted genes suggests similar function. All the predicted genes share similar superfamilies and specific hits with their reference genes, which gives a high confidence level for the inferred function of the protein query sequence. The amino acid sequence of a protein is known to determine the secondary and tertiary structure of the protein molecule [22].
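The effect of codon degeneracy described above can be illustrated with a small Python sketch: two coding sequences that differ only at synonymous positions show reduced nucleotide identity but translate to the same protein. The two DNA fragments and the function names are hypothetical, and the simple position-by-position identity used here stands in for a real alignment score; it is not how BLAST computes identity or query coverage.

```python
# Sketch only: synonymous codon changes lower DNA identity while
# leaving the encoded protein unchanged. Sequences are hypothetical.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1, ignoring a partial last codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    dna_a = "ATGGCTCTGAAAGTT"  # hypothetical fragment
    dna_b = "ATGGCATTAAAGGTC"  # same protein, synonymous codons
    print("Nucleotide identity:", round(identity(dna_a, dna_b), 1))
    print("Protein identity:",
          round(identity(translate(dna_a), translate(dna_b)), 1))
```

A conservative substitution such as isoleucine to valine (for example ATT to GTT) would lower both identities, but a protein-level scoring matrix would penalise it less than a non-conservative change, which a nucleotide comparison cannot do.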

Figure 3(a). Alignment of Cynara scolymus MSD1 gene sequence with Arabidopsis thaliana, Camelina sativa and Papaver somniferum. Below the protein sequences is a key denoting conserved sequence (*), conservative mutations (:), semi-conservative mutations (.), and nonconservative mutations ( ). Alignment was generated by Clustal O (1.2.4). (b) Cladogram of Cynara scolymus MSD1 gene sequence with Arabidopsis thaliana, Camelina sativa and Papaver somniferum, showing that on the basis of MSD1 gene sequence, Cynara scolymus is closer to Papaver somniferum than to Arabidopsis thaliana and Camelina sativa.
In the current analysis, the deduced secondary protein structures also show comparable physical properties, including molecular stability, aliphatic index and GRAVY, which supports the inference above. This is also evident from the presence of transmembrane helices in the Arabidopsis reference genes RCI3 and HVA22A and their target ESTs in Cynara scolymus (accession nos. GE578194.1 and GE589804.1). It is known that genes for proteins with structural and functional similarity have similar secondary structures, leading to comparable three-dimensional tertiary structures. This is relevant to the similarity in their functional properties. Moreover, the tertiary structure analogs of Cynara scolymus EST accessions GE599954.1 (superoxide dismutase), GE578194.1 (heme-dependent peroxidase), GE583634.1 (transport inhibitor response 1 protein) and GE594969.1 (molecule superoxide dismutase) showed 100% model confidence, as did the analogs of their respective Arabidopsis reference genes. These similarities lead to the conclusion that the targeted ESTs possess similar structure and function to their reference genes, indicating the possibility of their orthologous origin [23]. Many scientists are pursuing similar in silico (bioinformatics) approaches to identify genes related to stress tolerance in plants from the available pool of plant genome sequences [24,25]. Such approaches save researchers time and funds. Given the above findings, it is plausible that Arabidopsis conserved domains can be used to predict orthologs in Cynara scolymus, which can in turn be used to identify potential genes for salt tolerance in plants.

Conclusion
It is predicted that the salt stress responsive genes CSD1, MSD1, CSD2, AFB3, RCI3 and HVA22A are present in C. scolymus and may help this plant tolerate salinity. The bioinformatic identification of salt stress responsive genes in C. scolymus and their molecular annotation will contribute to understanding the acclimation of this plant to saline conditions.