Codon Usage Bias and RNA Secondary Structures Analysis for Virus Resistant Genes in Arabidopsis thaliana and Oryza sativa Muhammad

This study examines the relationship between codon usage bias (CUB) and RNA secondary structure (RSS) in virus resistant (biotic stress) genes and housekeeping genes of Arabidopsis thaliana (ath) and Oryza sativa (osa). A total of seventeen genes (thirteen virus resistant and four housekeeping genes) were subjected to CUB and RSS analysis. The CUB analysis revealed that nine of the thirteen virus resistant genes showed more than 50% similarity in codon usage between Arabidopsis thaliana and Oryza sativa, whereas only one of the four housekeeping genes, ACT8, showed more than 50% similarity. RNA secondary structure prediction with the Mfold algorithm revealed that six virus resistant genes showed a significant difference in minimum free energy, two in the number of loops and one in the number of stems. None of the housekeeping genes showed a significant difference in minimum free energy, number of loops or number of stems. It can be concluded that, on the basis of RNA secondary structure, virus resistant genes show mixed behavior while housekeeping genes show uniform behavior. CUB analysis may help in engineering virus resistant crops by adjusting the usage of specific codons.


Introduction
CUB refers to the nonrandom usage of synonymous codons to encode a specific amino acid (AA). Frequently used codons are termed optimal or major codons, whereas less frequently used ones are termed non-optimal or minor codons. Non-optimal codons usually correspond to less abundant tRNAs than optimal codons [1,2,3]. Different organisms show a particular preference for one of the several codons that encode the same amino acid. How these preferences arise is a much debated area of molecular evolution. It is generally acknowledged that codon preferences reflect a balance between mutational biases and natural selection for translational optimization [4]. In many unicellular organisms, invertebrates and plants, synonymous codon usage bias results from a coadaptation between codon usage and tRNA abundance that optimizes the efficiency of protein synthesis. However, it remains unclear whether natural selection acts at the level of the speed or the accuracy of mRNA translation. Codon usage can improve the fidelity of protein synthesis in multicellular species [5].

The consequences of biotic stress are poorly understood because its application is difficult to control and its physiological consequences are highly variable. Many plant viruses are recognized on the basis of leaf symptoms that depend on localized changes to chloroplast structure and function. Virus infections may have greater effects on fitness and competitive ability in low-nitrogen (N), high-light environments than in shaded, high-nutrient conditions. Other ecological implications of these observations have also been found to contribute to the susceptibility of plants to viral infection [6].

The first phase of resistance is induced by the recognition of pathogen-associated molecular patterns (PAMPs) by plant cell surface pattern recognition receptors, which initiates PAMP-triggered immunity that usually halts the infection of pathogens before invasion [7,8]. The next phase of plant resistance, resistance (R)-mediated resistance or effector-triggered immunity, is induced by the direct or indirect recognition of pathogen effector proteins by plant R proteins, which are typically nucleotide binding site-Leu-rich repeat (NB-LRR) proteins [7,8]. Effector-triggered immunity usually induces a hypersensitive response (HR) with localized cell death and defense gene expression that suppresses the growth and spread of pathogens post-entry [7,9]. Resistance to plant viruses can be divided into multiple stages [10]. The primary stage is cellular-level resistance, which occurs immediately after entry of the virus into plant cells; it is called extreme resistance and inhibits viral accumulation in the initially invaded cells [10,11]. More than a dozen dominant genes responsible for resistance to plant viruses have been isolated, most of which are NB-LRR-type R genes [12]. RTM1 was isolated as the first lectin gene responsible for resistance to a potyvirus [13], but the importance of lectins in plant immunity to viruses has been debated for more than a decade. The lectin gene JAX1 targets potexviruses, which are distantly related to potyviruses. JAX1 also exhibits a level of resistance different from that of RTM1. Interestingly, of the 14 natural recessive resistance genes against plant viruses that have been cloned from diverse plant species thus far, 12 encode the eukaryotic translation initiation factor 4E (eIF4E) or its isoform eIF(iso)4E [14].

In this study, 13 virus resistant genes, i.e., Resistance to yellow strain of cucumber mosaic virus 1 (RCY1), Restricted TEV movement 1 (RTM1) [18], loss of susceptibility to Potyvirus 1 (LSP1) [19], Suppressor of gene silencing 2 (SGS2) [20], Suppressor of gene silencing 3 (SGS3) [20], eukaryotic initiation factor 4E1 (At-eIF4E1) [21], eukaryotic initiation factor (iso)4E (At-eIF(iso)4E) [22], Cucumovirus multiplication 1 (cum1) [23], Cucumovirus multiplication 2 (cum2) [23], Restricted TEV movement 2 (RTM2) [24] and hypersensitive response (HRT) [25], and four housekeeping genes, Actin8 (ACT8) [26], Polyubiquitin 10 (UBQ10) [26], tubulin beta 2 (TUB2) [26] and Elongation factor-1 alpha (EF-1 α) [26], were selected from Arabidopsis thaliana and Oryza sativa and subjected to CUB analysis using bioinformatics tools. The aim of the study is to find the correlation between CUB and virus stress, along with the housekeeping genes, in Arabidopsis thaliana and Oryza sativa.

The secondary structure of a nucleic acid molecule refers to the base pairing interactions within a single molecule or set of interacting molecules, and can be represented as a list of the bases that are paired in the molecule [27]. The secondary structures of biological DNAs and RNAs tend to differ: biological DNA mostly exists as fully base-paired double helices, while biological RNA is single stranded and often forms complicated base-pairing interactions owing to its increased ability to form hydrogen bonds, which stems from the extra hydroxyl group in the ribose sugar.

[26], TUB2 [26], UBQ10 [26] and EF-1 α [26]) were identified in Arabidopsis thaliana and Oryza sativa through literature survey.

Sequence Retrieval
Thirteen virus resistant genes (RCY1, EDS1, EDS5, RTM1, LSP1, SGS2, SGS3, At-eIF4E1, At-eIF(iso)4E, cum1, cum2, RTM2, HRT) and four housekeeping genes (ACT8, TUB2, UBQ10 and EF-1 α) of Arabidopsis thaliana and Oryza sativa were retrieved from the Nucleotide database and GenBank of NCBI using the accession numbers of the genes. The FASTA sequences of the genes were saved and used for further analysis of RNA secondary structures with the Mfold algorithm.

Open Reading Frame (ORF) Analysis
ORF Finder is a graphical analysis tool used to find all open reading frames (ORFs) of a nucleotide sequence. The ORFs of all virus resistant and housekeeping genes were identified with ORF Finder, which plays an important role in determining the protein sequence encoded by a nucleotide sequence. The FASTA sequences of the virus resistant and housekeeping genes of Arabidopsis thaliana and Oryza sativa were submitted to the ORF Finder of NCBI, and the longest frames were selected for further analysis.

Secondary Structure Analysis
Secondary structures of the RNAs were predicted with Mfold [28] and then analyzed: the numbers of loops, bulges, junctions, helices and stems and the minimum free energy of each RNA secondary structure were recorded.
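As a rough illustration of how stem and loop counts of a predicted structure could be tallied, the Python sketch below counts stems and hairpin loops from a dot-bracket string. It assumes the predicted structure has already been exported in dot-bracket notation (Mfold itself is not called here), it counts only hairpin loops rather than every loop type, and the function names are hypothetical rather than part of any tool used in the study.

```python
def parse_pairs(structure):
    """Return the sorted (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def count_stems(pairs):
    """Count stems, taken here as maximal runs of directly stacked pairs."""
    pair_set = set(pairs)
    # A pair opens a new stem when no pair is stacked directly outside it.
    return sum(1 for i, j in pairs if (i - 1, j + 1) not in pair_set)


def count_hairpin_loops(structure, pairs):
    """Count hairpin loops: pairs that enclose only unpaired bases."""
    return sum(1 for i, j in pairs if set(structure[i + 1:j]) <= {"."})


# Hypothetical toy structure: one outer stem enclosing two hairpins.
toy = "((..((...))..((...))..))"
toy_pairs = parse_pairs(toy)
print(count_stems(toy_pairs), count_hairpin_loops(toy, toy_pairs))
```

A stem is defined here purely by stacking, so a bulge or interior loop splits one helix into two stems; a different convention would give different counts.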

Codon Usage Bias Analysis
For the analysis of codon usage, the FASTA sequences of the genes were submitted to the codon usage analysis program at www.geneinfinity.org, which reports the usage of the codons for each amino acid. For each amino acid, the codon with the largest value was selected and compared between Ath and Osa. This is the key step of the research, and the results obtained in this step were saved for further analysis. The relative synonymous codon usage (RSCU) value of each codon in a sequence was determined using the following formula: RSCU = (number of occurrences of the codon / total number of occurrences of all synonymous codons for the same amino acid) × number of synonymous codons for that amino acid.
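The RSCU calculation and the comparison of most-used codons between two species can be sketched in Python as follows. This is a minimal illustration, not the program used in the study: it assumes the standard genetic code and the usual RSCU definition (observed count divided by the count expected under uniform use of synonymous codons), the function names are hypothetical, and ties between codons are broken by table order, which is an arbitrary choice.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid present in the sequence."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in synonyms:
            values[c] = counts[c] * len(synonyms) / total
    return values


def preferred_codons(values):
    """Return {amino acid: codon with the largest RSCU}."""
    best = {}
    for codon, value in values.items():
        aa = CODON_TABLE[codon]
        if aa not in best or value > values[best[aa]]:
            best[aa] = codon
    return best


def preference_similarity(cds_a, cds_b):
    """Percentage of shared amino acids with the same preferred codon."""
    pa, pb = preferred_codons(rscu(cds_a)), preferred_codons(rscu(cds_b))
    shared = set(pa) & set(pb)
    if not shared:
        return 0.0
    return 100.0 * sum(pa[aa] == pb[aa] for aa in shared) / len(shared)


# Hypothetical toy sequences, not genes from the study.
print(rscu("GCTGCTGCCGCA")["GCT"])
print(preference_similarity("ATGGCTGCTAAA", "ATGGCCGCCAAA"))
```

Whether single-codon amino acids (methionine, tryptophan) should count toward the similarity percentage is a design choice; this sketch includes them.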

Secondary Structure Analysis
The RNA secondary structures of the virus resistant and housekeeping genes were analyzed (Table 4) in terms of the number of loops, the number of stems and the minimum free energy (MFE). Six of the virus resistant genes (RCY1, EDS1, LSP1, CUM1, RTM2 and HRT) show a significant difference in minimum free energy, two (CUM1 and RTM2) show a significant difference in the number of loops and only one (RTM2) shows a significant difference in the number of stems. Among the housekeeping genes, none showed a significant difference in minimum free energy, number of loops or number of stems. It can be concluded from these results that, on the basis of RNA secondary structure, the virus resistant genes show mixed behavior and the housekeeping genes show uniform behavior in Arabidopsis thaliana and Oryza sativa.

Discussions
The study is based upon codon usage bias in Arabidopsis thaliana and Oryza sativa, analyzed through bioinformatics tools. Bioinformatics is the part of molecular biology that involves working with biological data, typically using computers, with the goal of enabling and accelerating biological research. Bioinformatics spans a wide range of activities: data capture and automated recording of experimental results; data storage and access, using a multitude of databases and query tools; data analysis; and visualization of raw data and analytical results. Bioinformatics is also applied in the creation and

Conclusion
Codon usage analysis was performed for thirteen virus resistant and four housekeeping genes. Of the thirteen virus resistant genes, 9 show more than 50% similarity in codon usage between Arabidopsis thaliana and Oryza sativa and the remaining 4 show less than 50% similarity. Similarly, of the four housekeeping genes, only one shows more than 50% similarity and the remaining 3 show less than 50% similarity. The findings of this research provide a platform for engineering virus resistant crops.

Table 1 reveals that for the RCY1 gene, Arabidopsis thaliana (Ath) prefers frame +1 and Oryza sativa (Osa) prefers frame +3, with frame lengths of 1086 and 1284 base pairs (bp) and 361 and 427 amino acids (A.A), respectively. For EDS1, the preferred frames are +1 and +2, and the frame lengths are 1548 and 1866 bp and 515 and 621 A.A. EDS5 prefers the same frame, +2, in both plants, and the frame lengths in bp and A.A are 1632, 414. RTM1 prefers frames +2 and +1, with frame lengths of 525 and 483 bp and 174 and 160 A.A. LSP1 prefers frame +2 in both plants, with frame lengths of 504 and 621 bp and 167 and 206 A.A in Ath and Osa, respectively. SGS2 prefers frame +1 in both cases, with frame lengths of 2805 and 3657 bp and 934 and 1218 A.A, respectively. For SGS3, Ath prefers frame +2 and Osa prefers frame +3; the frame lengths are 1878 and 1830 bp and 625 and 609 A.A. eIF4E prefers frame +3, with frame lengths of 708 and 684 bp and 235 and 227 A.A, respectively. For eIF(iso)4E, the two plants prefer frames +1 and +2, with frame lengths of 366 and 621 bp and 121 and 206 A.A, respectively. The CUM1 gene prefers frames +1 and -2 in Ath and Osa, with frame lengths of 708 and 360 bp and 235 and 119 A.A. For the CUM2 gene, Ath and Osa prefer frames +2 and +3, with frame lengths of 5178 and 2445 bp and 1725 and 814 A.A, respectively. For the RTM2 gene, both plants prefer frame +3; the frame length is 1101 bp in Ath and 1017 bp in Osa, corresponding to 366 and 338 A.A, respectively. For the HRT gene, Ath prefers frame +1 and Osa prefers frame +2, with frame lengths of 1683 and 2700 bp and 560 and 899 A.A, respectively. The gene lengths of the Ath genes are 1582, 2036, 2031, 634, 1293, 4013, 2145, 942, 1474, 898, 5456, 1209, 5996 and those of the Osa genes are 1841, 2140, 7337, 3606, 1124, 2027, 958, 3932, 2177, 750, 993, 3227, 2942, 1432 and 3454 base pairs, respectively.

ORF analysis was also performed for the four housekeeping genes (Table 1). For the ACT8 gene, both Ath and Osa prefer frame +1, with a frame length of 1134 bp and 377 A.A. For the UBQ10 gene, Ath and Osa prefer frames +3 and +1, with frame lengths of 1146 and 1209 bp and 381 and 402 A.A, respectively. For the TUB2 gene, Ath prefers frame +2 and Osa prefers frame +1, with frame lengths of 1353 and 1344 bp and 450 and 447 A.A, respectively. For the EF-1 α gene, Ath and Osa prefer frames +3 and +2, with frame lengths of 1350 and 1344 bp and 449 and 447 A.A, respectively. The gene lengths of the Ath housekeeping genes are 1707, 1270, 1726, 1590 and those of the Osa genes are 1608, 1609, 1799 and 1755 base pairs, respectively.
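Frame and length figures of this kind can be reproduced in principle by a six-frame scan for the longest ORF. The Python sketch below is an illustration under stated assumptions, not the NCBI ORF Finder itself: it accepts only ATG as a start codon, uses the three standard stop codons, counts the stop codon in the length in bp, and numbers the minus-strand frames -1 to -3 by offset on the reverse complement, which may differ from the numbering a given tool reports. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, offset):
    """Yield (start, end) of each ATG-to-stop ORF in one reading frame."""
    start = None
    for pos in range(offset, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if start is None and codon == "ATG":
            start = pos
        elif start is not None and codon in STOPS:
            yield start, pos + 3  # end index includes the stop codon
            start = None


def longest_orf(seq):
    """Return frame, bp length and amino acid length of the longest ORF."""
    seq = seq.upper().replace("U", "T")
    best = None
    for sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            for start, end in orfs_in_frame(strand, offset):
                length_bp = end - start
                if best is None or length_bp > best["length_bp"]:
                    best = {
                        "frame": sign * (offset + 1),
                        "length_bp": length_bp,
                        # the stop codon does not encode an amino acid
                        "length_aa": length_bp // 3 - 1,
                    }
    return best


# Hypothetical toy sequence, not a gene from the study.
print(longest_orf("CATGAAATTTGGGTAACC"))
```

Under this convention the amino acid length equals the bp length divided by three, minus one for the stop codon, which matches pairs such as 1086 bp and 361 A.A reported for RCY1 in Ath.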

Table: Tubulin beta-2 (TUB2). Columns: Plant Name, Accession #, MFE (kcal/mol), Number of Loops, Number of Stems.
[39] is the observed number of codon occurrences divided by the number expected if synonymous codons were used uniformly. Second, the relative merits of different codons can be assessed from the viewpoint of translational efficiency [34]. The CUB analysis in this research was done by calculating relative synonymous codon usage (RSCU) values. A notable result was obtained for the RTM1 gene, in which none of the four alanine codons is used. Although synonymous codons encode the same amino acid, they are not used randomly and some are used more frequently than others; methionine and tryptophan are exceptions, each being encoded by a single codon (ATG and TGG, respectively). Mukhopadhyay [35] reported similar CUB analyses for tissue-specific and housekeeping genes in A. thaliana and Oryza sativa. The variation pattern of synonymous codon usage has been well studied across the genes of Methanococcus maripaludis, the most extensively studied archaeon, which was isolated from salt marsh sediment [36]. The use of alternative synonymous codons arising from the divergence of coding sequences has been well studied in the closely related organisms Arabidopsis thaliana and Brassica rapa ssp. pekinensis [37]. Codon usage bias under abiotic stress, i.e., in salt resistant genes of Arabidopsis thaliana and Oryza sativa, has been studied [38], but it has not yet been studied under biotic stress, which is the main theme of this research. The present study is based on codon usage bias under biotic stress caused by viruses. A total of thirteen (13) virus resistant genes (RCY1, EDS1, EDS5, RTM1, LSP1, SGS2, SGS3, eIF4E, eIF(iso)4E, CUM1, CUM2, RTM2 and HRT) and four housekeeping genes (ACT8, TUB2, UBQ10 and EF-1 α) were identified in Arabidopsis thaliana and Oryza sativa. The four housekeeping genes, i.e., ACT8, TUB2, UBQ10 and EF-1 α, have also been reported [39]. In the same way, genes such as RCY1, EDS1, EDS5, RTM1, LSP1, SGS2, SGS3, eIF4E, eIF(iso)4E, CUM1, CUM2, RTM2 and HRT, which show resistance to stress caused by virus infection in Arabidopsis thaliana and Oryza sativa, are also reported by [