Predictive structure and protein-ligand interface of a novel lectin from rice bean (Vigna umbellata)


Lectins are an important group of multivalent glycoproteins that selectively recognize and precipitate glycoconjugates. Although lectins have been reported from diverse biological sources, legume lectins are the best-characterized family of plant lectins. We cloned and sequenced the 843 bp open reading frame (ORF) of a lectin (RbL) from immature rice bean (Vigna umbellata) seeds and report the in silico analysis of the deduced 280-amino-acid lectin precursor. BlastP analysis revealed more than 90% sequence similarity of the RbL protein with Vigna angularis and Vigna aconitifolia lectins. ProtParam analysis indicated that RbL is acidic, stable and hydrophobic. A template-based 3D structure of RbL was modeled with the I-TASSER tool and validated as a good-quality model. Structural analysis revealed a β-sandwich (jelly roll fold or lectin fold) in the modeled RbL structure. RbL was functionally annotated as a plant defense protein. Molecular docking was performed to analyze the interactions of RbL with predicted ligands (N-acetyl-D-glucosamine, β-galactose, lactose and adenine) and two selected ligands (glucose and mannose). Molecular dynamics (MD) simulations of RbL-ligand complexes confirmed robust hydrogen bonding between the ligands and RbL. The information generated in this study should be useful for exploring RbL for biomedical and biotechnological applications.


Introduction
Lectins constitute an important group of glycoproteins that bind and precipitate glycoconjugates selectively and reversibly. These proteins are also known as hemagglutinins because they bind selectively to cell-surface receptors on erythrocytes, causing the cells to agglutinate (Tsaneva and Van Damme 2020). This property distinguishes lectins from other glycoproteins and is the easiest way to identify lectins from different biological sources. Lectins have been identified in all three domains of life, viz., eukaryotes, bacteria and archaea. Among the different groups of lectins, plant lectins are the most thoroughly investigated. More than a century ago, Stillmark (1888) gave the first description of lectins while investigating the effect of castor bean (Ricinus communis L.) extract on red blood cells. Since this first insight, numerous plant lectins have been investigated for their biochemical and biological properties (Vandenborre et al. 2011).
Structurally, lectins have been classified into three categories, viz., merolectins, hololectins and chimerolectins (Peumans and Van Damme 1995). They are also classified into different groups according to their carbohydrate-binding specificity (Van Damme et al. 1998). Based on the sequence and structure of the lectin motif, plant lectins have also been categorized into twelve families (Van Damme et al. 2008). Carbohydrate-binding specificity enables lectins to exert multiple biological activities, such as mitogenic, antibacterial, antifungal, antiviral, anticancer and insecticidal activities (Gautam et al. 2018), and to participate in symbiotic interactions between host plants and microbes (De Hoff et al. 2009), plant hormone-mediated growth and development (Lannoo et al. 2007; Lannoo and Van Damme 2010) and signaling (Jin et al. 2009).
Though lectins are ubiquitously distributed and associated with all life forms, the most investigated plant lectins belong to the family Leguminosae (Cavada et al. 2020), and most of them have been purified from mature seeds, where they comprise up to 10% of total storage proteins (Laija et al. 2010). Apart from seeds, lectins have also been identified in vegetative tissues in lower amounts. In seeds, lectins are synthesized as preprecursor molecules, sequestered in protein bodies during seed development and broken down during seed germination to provide essential amino acids (Van Damme et al. 2008). The inactive preprecursor polypeptide is converted into the mature protein through distinct proteolytic processes such as C-terminal trimming.
Despite high structural and sequence similarities, legume lectins show variability in carbohydrate specificity and quaternary structure. Legume lectins are built from protomers of 250 to 300 amino acid residues (~30 kDa). Their secondary structure contains a high proportion of β-sheets and β-turns and a negligible amount of α-helix (Swamy et al. 1985); they are therefore considered β-sheet proteins. The 3D structure of legume lectins consists of a flat six-stranded anti-parallel β-sheet (back face) and a curved seven-stranded anti-parallel β-sheet (front face), interconnected by loops. This dome-like β-sandwich, related to the jelly roll fold and also known as the lectin fold, has two hydrophobic cores: one between the front and back β-sheets and one between the curled loop and the front β-sheet (Lagarda-Dias et al. 2017). The concavity of the front β-sheet and the second hydrophobic core are well suited to binding carbohydrate moieties. The quaternary structure of legume lectins is oligomeric, with monomers assembled as homodimers (canonical legume lectin dimer) or homotetramers (dimers of dimers). Heterotetrameric structures have also been observed. Different types of quaternary structure [canonical, ECorL-type, GS4-type, DBL-type, ConA-type, PNA-type, GS1-type, DB58-type and Arcelin-5-type (monomeric)] are known for legume lectins (Brinda et al. 2005).
Legume lectins require divalent cations, particularly Ca2+ and Mn2+, for their activity, as these ions are essential for maintaining the stability of the carbohydrate-binding site. The metal-binding site, formed by the side chains of glutamic acid, aspartic acid and histidine residues, is highly conserved in legume lectins (Rini 1995). The carbohydrate- and metal-binding sites are located close to each other at the top of the front sheet in the legume lectin structure (Cummings et al. 2017). Four loops, A, B, C and D, form the carbohydrate-binding site (Sharma and Surolia 1997). The diversity in carbohydrate specificity of legume lectins is mainly due to variability in the conformation and size of loop D; to some extent, carbohydrate-binding variability is also determined by loop C (Young and Oomen 1992). This variability in carbohydrate-binding specificity makes legume lectins valuable tools in glycome research (Coelho et al. 2017), and legume lectins are among the best model systems for studying protein-sugar interactions.
With the development of advanced computational tools, the analysis of large amounts of genomic and proteomic data has become easier and less time consuming. Critical analysis of the information embedded in protein structures is essential to characterize their physicochemical properties and functions. Although reports on the structural and functional characterization of various legume lectins are available (Pinto et al. 2008; Moreira et al. 2013; Li et al. 2014; Filho et al. 2017; Osman and Konozy 2017), no such report is available on lectin from rice bean [Vigna umbellata (Thunb.) Ohwi and Ohashi]. Herein, we present the in silico analysis of the RbL protein for its structural and functional characterization. Molecular docking and MD simulations of RbL protein-ligand complexes were also performed to study RbL protein-ligand interactions.

Materials and Methods
In the present investigation, the RbL ORF sequence, which had been cloned and sequenced from immature rice bean seeds using a degenerate primer set, was translated into a 280-amino-acid polypeptide with the ExPASy Translate tool (http://web.expasy.org/translate/) (Gasteiger et al. 2005). The deduced amino acid sequence was analyzed with different computational tools for structural and functional characterization of the RbL protein.
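The ORF-to-protein step is simple codon arithmetic: an 843 bp ORF contains 281 codons, and if the final codon is a stop codon, translation yields the 280 residues reported here. A minimal Python sketch of that step, assuming the standard genetic code and reading frame 1 (it illustrates the logic only and is not the ExPASy Translate implementation), is:

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# the bases in T, C, A, G order at the first, second and third positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(orf: str) -> str:
    """Translate an ORF in frame 1, stopping at the first stop codon ('*')."""
    seq = orf.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    protein = []
    for pos in range(0, len(seq), 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Synthetic 843 nt ORF (start codon, 279 Ala codons, stop codon); not the RbL ORF.
demo_orf = "ATG" + "GCT" * 279 + "TAA"
print(len(demo_orf), len(translate_orf(demo_orf)))
```

The synthetic sequence is a placeholder chosen only to reproduce the 843 bp / 280 residue arithmetic; the actual RbL ORF is available under GenBank accession MT043160.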

Primary and secondary structure analysis
Physicochemical parameters of the deduced RbL sequence were analyzed using the ProtParam server (http://web.expasy.org/protparam) (Gasteiger et al. 2005). The PHD web server (Rost 1996) was used for secondary structure analysis of the RbL protein. The PredictProtein (https://www.predictprotein.org) web server (Rost et al. 2004) was used to predict putative functions of the RbL protein based on gene ontology.
Conserved motifs were analyzed using the MEME Suite tool (Bailey et al. 2009) with the following parameters: maximum number of motifs = 7; minimum motif width = 6; and maximum motif width = 50.
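For readers reproducing the motif search with a local MEME installation rather than the web server, the stated parameters map onto MEME command-line options roughly as follows. This is a sketch: the input file name and output directory are hypothetical, and all options not mentioned in the text (for example, the motif site distribution) are left at MEME's defaults, which may differ from the settings the authors used.

```python
import subprocess


def build_meme_command(fasta_path: str, out_dir: str = "meme_out") -> list[str]:
    """Assemble a MEME command line carrying the motif parameters stated in the text."""
    return [
        "meme", fasta_path,
        "-protein",        # input sequences are amino acid sequences
        "-oc", out_dir,    # output directory (overwritten if it already exists)
        "-nmotifs", "7",   # maximum number of motifs
        "-minw", "6",      # minimum motif width
        "-maxw", "50",     # maximum motif width
    ]


def run_meme(fasta_path: str) -> None:
    """Run MEME; requires a local MEME Suite installation on PATH."""
    subprocess.run(build_meme_command(fasta_path), check=True)


# Hypothetical input file holding RbL and the other legume lectin sequences.
print(" ".join(build_meme_command("legume_lectins.fasta")))
```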
The NCBI Conserved Domain Database (CDD) online tool was used to analyze the protein domains encoded by the gene (Marchler-Bauer et al. 2015). The Kyte and Doolittle (1982) method in ProtScale (http://web.expasy.org/protscale) was used for hydrophobicity analysis of the RbL protein, and the SignalP 3.0 server was used to predict signal peptide cleavage sites (Bendtsen et al. 2005). Putative N-glycosylation sites were predicted with the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc/) (Gupta and Brunak 2002).
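The ProtScale hydropathy profile is a sliding-window average of per-residue Kyte-Doolittle values. The sketch below assumes an unweighted window of 9 residues; the window size used in this study is not reported.

```python
# Kyte-Doolittle (1982) hydropathy value for each of the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 9) -> list[float]:
    """Unweighted sliding-window mean of Kyte-Doolittle values.

    One value is returned per full window and belongs to the window's central
    residue, so the first value refers to residue window // 2 + 1 (1-based).
    Positive values indicate hydrophobic stretches.
    """
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    values = [KD[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```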
Sequence alignment, homology modeling, binding site prediction and molecular docking analysis
The RbL protein sequence was aligned with other legume lectin sequences using the Clustal W algorithm and formatted with ESPript 2.2 (Gouet et al. 2003). The RbL protein structure was modeled using the I-TASSER modeling server (Roy et al. 2010). Model validation was done with the Zlab server (Anderson et al. 2005). The ProSA-web interactive server was used to calculate the Z-score in order to recognize errors in the RbL protein structure (Wiederstein and Sippl 2007). The GalaxySite server (http://galaxy.seoklab.org) (Heo et al. 2014) was used to predict binding ligands for the RbL protein. Because no binding-site information was available, the COACH meta-server was used to predict protein-ligand binding sites in the modeled RbL structure. COACH predicts ligand-binding sites using two complementary approaches, S-SITE and TM-SITE (Yang et al. 2013a, b). Molecular docking was performed with CDOCKER, a CHARMm-based docking engine in Discovery Studio (Wu et al. 2003). The receptor is held rigid, while the ligands are flexible during docking. For each ligand, the interaction energy and CHARMm energy represent binding affinity (Brooks et al. 1983). Water molecules are usually removed in semi-flexible and rigid docking because they may interfere with the protein-ligand complex structure; accordingly, water molecules were removed and hydrogen atoms were added to the protein. The RbL binding site sphere was defined as the region within a radius of 10 Å of the ligand's geometric centroid. The structures of the identified hits were prepared and docked into the RbL binding pocket. Multiple poses were generated for each selected molecule and ranked by -CDOCKER interaction energy. 2D interaction diagrams were generated with the PLIP server (Salentin et al. 2015).
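The 10 Å binding-site sphere is a purely geometric selection around the ligand's centroid. The sketch below reproduces that selection for plain coordinate lists; it is not the Discovery Studio implementation, and the atom identifiers and coordinates in the test are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(points: Sequence[Coord]) -> Coord:
    """Geometric (unweighted) centroid of a set of 3D coordinates."""
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def atoms_in_site_sphere(protein_atoms: dict[str, Coord],
                         ligand_atoms: Sequence[Coord],
                         radius: float = 10.0) -> list[str]:
    """IDs of protein atoms lying within `radius` (in Å) of the ligand centroid."""
    center = centroid(ligand_atoms)
    return [atom_id for atom_id, xyz in protein_atoms.items()
            if math.dist(xyz, center) <= radius]
```

Using the unweighted centroid, rather than a mass-weighted center, follows the text's wording ("geometric centroid").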
MD simulations were conducted to investigate the binding stability of the docked complexes of the RbL protein with adenine, glucose, β-galactose, lactose, mannose and N-acetyl-D-glucosamine, using GROMACS 4.6.5 with the GROMOS96 43a1 force field (Hess et al. 1997; Van Der Spoel et al. 2005; Abraham et al. 2015). The PRODRG server was used to generate ligand topologies compatible with GROMACS (Schüttelkopf et al. 2004). The SPC water model was used and a cuboid box was defined. Periodic boundary conditions were applied in all directions, and Na+ and Cl− ions were added to neutralize the system. Energy minimization by 50,000 steps of steepest descent was carried out to remove steric clashes, followed by 1000 ps of NPT equilibration. The particle mesh Ewald method was used to compute long-range electrostatic interactions, and bond lengths were constrained with the LINCS algorithm (Hess et al. 1997). The Parrinello-Rahman barostat and V-rescale thermostat were used to maintain constant pressure (1 bar) and temperature (300 K). A 10 ns production run was conducted and the coordinates were saved. The resulting trajectories were analyzed by computing the root mean square deviation (RMSD), root mean square fluctuation (RMSF) and number of hydrogen bonds (H-bonds) for every frame.
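The production settings above can be collected into a GROMACS .mdp parameter file. The sketch writes the options the text names plus a few that any such run needs; the 2 fs time step, coupling time constants, compressibility, temperature-coupling groups and constraint scope are assumptions rather than reported values, and a working file would also need neighbor-search and cut-off settings. With a 2 fs step, the 10 ns run corresponds to 5,000,000 steps.

```python
def production_mdp(length_ns: float = 10.0, dt_ps: float = 0.002,
                   temp_k: float = 300.0, pressure_bar: float = 1.0) -> str:
    """Return a minimal (incomplete) GROMACS production .mdp as text.

    Assumed values, not reported in the study: dt_ps, tau_t, tau_p,
    compressibility, tc-grps and the constraint scope.
    """
    nsteps = round(length_ns * 1000 / dt_ps)  # 1 ns = 1000 ps
    options = {
        "integrator": "md",
        "dt": dt_ps,
        "nsteps": nsteps,
        "pbc": "xyz",                       # periodic boundaries in all directions
        "coulombtype": "PME",               # particle mesh Ewald electrostatics
        "constraints": "h-bonds",
        "constraint_algorithm": "lincs",    # LINCS bond-length constraints
        "tcoupl": "V-rescale",              # thermostat named in the text
        "tc-grps": "Protein Non-Protein",
        "tau_t": "0.1 0.1",
        "ref_t": f"{temp_k} {temp_k}",
        "pcoupl": "Parrinello-Rahman",      # barostat named in the text
        "tau_p": 2.0,
        "ref_p": pressure_bar,
        "compressibility": 4.5e-5,
    }
    return "\n".join(f"{key} = {value}" for key, value in options.items()) + "\n"


print(production_mdp())
```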

Results and Discussion
We cloned and sequenced the RbL ORF of 843 bp, encoding 280 amino acids, from immature rice bean seeds using a degenerate primer set. The nucleotide sequence of the RbL cDNA has been submitted to the GenBank database under accession No. MT043160. Herein, we report the in silico analysis of this novel lectin precursor from rice bean (RbL). The results are discussed under the following subheadings.
Primary and secondary structure analysis of RbL protein
ProtParam analysis revealed that the RbL protein is composed of 280 amino acids (Supplementary Figure 1), of which serine was the most abundant (14.60%), followed by leucine (10.40%), threonine (7.50%) and glycine (7.10%) (Figure-1). The numbers of positively and negatively charged residues in RbL were 21 and 18, respectively. Similar to Erythrina lectin (ECorL) (Osman and Konozy 2017) and winged bean lectin (WBL) (Kortt 1985), RbL had a high proportion of hydroxylic, aliphatic and acidic amino acids, low methionine content and no cysteine residues. In some lectins, such as lima bean (Phaseolus lunatus) lectin, cysteine residues are essential for both carbohydrate- and metal-ion-binding activities (Roberts and Goldstein 1984), and several cysteine residues are conserved in the hevein lectin domain to maintain structure and folding (Aboitiz et al. 2004). BlastP analysis of the deduced RbL sequence showed 97.14% and 91.79% similarity with adzuki bean (Vigna angularis) and moth bean (Vigna aconitifolia) lectins, respectively. Knowledge of the isoelectric point (pI) of a protein is important for assessing its solubility, subcellular localization and interactions. Cytoplasmic proteins have a pI lower than physiological pH, while nuclear proteins have a pI higher than physiological pH (Nandi et al. 2005). The theoretical pI of RbL was 5.76, indicating that the protein is acidic and carries a net negative charge at physiological pH. The molecular weight of the deduced RbL protein was estimated to be 29833.57 Da. The half-life is a prediction of the time taken for half of the protein to disappear after its synthesis in the cell. ProtParam relies on the "N-end rule", which relates the half-life of a protein to the identity of its N-terminal residue (Varshavsky 2011). The estimated half-life of RbL was 30 h.
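Amino acid composition and charged-residue counts of the kind ProtParam reports can be computed directly from the deduced sequence. As a consistency check, 14.6% serine in a 280-residue protein corresponds to about 41 serine residues. The sketch follows ProtParam's convention of counting Arg + Lys as positive and Asp + Glu as negative; the test sequences are toy examples, not RbL.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Percentage occurrence of each residue, most abundant first, rounded to 0.1%."""
    s = seq.upper()
    counts = Counter(s)
    return {aa: round(100 * n / len(s), 1) for aa, n in counts.most_common()}


def charged_residues(seq: str) -> tuple[int, int]:
    """Return (positive, negative) counts: positive = Arg + Lys, negative = Asp + Glu."""
    s = seq.upper()
    return s.count("R") + s.count("K"), s.count("D") + s.count("E")
```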
The instability index of 34.99 (below the threshold of 40) indicated that RbL is stable. The high aliphatic index (93.99) and positive GRAVY value (0.145) suggested thermal stability and a hydrophobic nature of the putative RbL protein. Approximately 50% of the amino acids had a positive hydrophobicity score (>0), which also supports the hydrophobic nature of RbL (Figure-2). Proteins with hydrophobicity scores < 0 are more likely to be globular (hydrophilic), whereas those with scores > 0 are more likely to be membrane proteins (hydrophobic) (Magdeldin et al. 2012).
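The aliphatic index and GRAVY follow simple formulas: the aliphatic index is X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue, and GRAVY is the mean Kyte-Doolittle hydropathy over all residues. The instability index requires a 400-entry dipeptide weight table, so this sketch only applies the < 40 cut-off to a precomputed value.

```python
# Kyte-Doolittle hydropathy values, as used for GRAVY.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    s = seq.upper()

    def mole_pct(aa: str) -> float:
        return 100 * s.count(aa) / len(s)

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    s = seq.upper()
    return sum(KD[aa] for aa in s) / len(s)


def is_stable(instability_index: float) -> bool:
    """ProtParam cut-off: an instability index below 40 predicts a stable protein."""
    return instability_index < 40
```

Applied to the reported value, `is_stable(34.99)` is `True`, matching the stability call in the text.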
A signal peptide of 15 to 20 amino acids is usually present at the N-terminus of newly synthesized proteins. The signal peptide determines whether a protein enters the secretory pathway and is transported to reside inside certain organelles (plant vacuole or protein bodies), secreted from the cell, or inserted into cellular membranes (Tsaneva and Van Damme 2020). The signal peptide directs proteins into the ER lumen, ensuring their entry into the secretory pathway (Van Damme et al. 2008), and during co-translational processing it is cleaved off by a signal peptidase in the ER lumen. Most constitutively expressed lectins, including legume lectins, are synthesized as precursors with a signal peptide and are destined for the plant vacuole or extracellular space. In contrast, inducible lectins are synthesized without a signal peptide and thus reside in the nucleus or cytoplasm (Lannoo and Van Damme 2010).
The SignalP 3.0 server predicted two signal peptide cleavage sites in RbL using two algorithms: the hidden Markov model (HMM) and neural network (NN) algorithms predicted signal peptides of 24 and 26 amino acids, respectively (Supplementary Figure 2a and 2b). Signal peptide cleavage sites in Vigna aconitifolia (A24-A25) and Vigna unguiculata (A26-A27) lectin sequences have also been reported by Filho et al. (2017). The presence of a signal peptide in RbL indicates that the protein enters the endomembrane system and is transported to cellular compartments such as the plant vacuole for storage.
The signal peptide, through its hydrophobic core (α-helical region), interacts with the hydrophobic interior of the membrane and initiates transport of the protein across it. A 13-amino-acid (residues 8 to 21) hydrophobic region in the RbL signal peptide was predicted with the Phobius server (http://phobius.sbc.su.se/) (Käll et al. 2007). The lectins of Vigna aconitifolia and Vigna unguiculata have been reported to localize to the plasma membrane, with stress response predicted as one of their major biological functions (Filho et al. 2017). Because both legume lectins show high sequence similarity with RbL, similar functions may be attributed to RbL. Moreover, the presence of a conserved lectin receptor kinase (LecRK) domain suggests a putative role for RbL in plant defense. Domain mapping of the RbL sequence with the NCBI CDD online tool revealed the conserved lectin_legume_LecRK_Arcelin_ConA domain (domain architecture ID 10160902) spanning residues 30 to 260. In addition, dimerization and tetramerization domains were highly conserved in RbL, indicating that the protein is likely multimeric (Supplementary Figure 3a).
The identification of motifs in large genomic sequences is cumbersome; however, it becomes easier when motifs are large and have high sequence similarity (Ujinwal et al. 2019). Conserved motifs in legume lectins, including RbL, were analyzed with the MEME Suite tool (meme-suite.org/tools/meme). MEME searches for repeated, ungapped sequence patterns in the provided sequences and iteratively determines the width and number of occurrences of each motif so as to minimize the motif's E-value. A lower E-value indicates that an equally well-conserved motif is less likely to be found by chance (Unger and Sussman 1993). The maximum number of motifs to be identified was set to 7. Of the 7 identified motifs, four were more conserved and putatively functional than the others; these four showed low E-values and differed in length from the non-functional motifs (Supplementary Figure 3b).
Legume lectins are predominantly β-sheet proteins, containing a large proportion of β-pleated sheets and a negligible amount of α-helix (Swamy et al. 1985). The PHD neural network server predicted 39.64% β-sheet, 51.43% random coil and 8.93% α-helix in the secondary structure of RbL. Since β-strands dominate over α-helix in its secondary structure, RbL belongs to the class of β-proteins.
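Assuming every one of the 280 residues received one of three PHD states, the reported percentages back-calculate to 25 helix, 111 strand and 144 coil residues. The sketch computes the percentages from a three-state string (H = helix, E = strand, L = other); the string in the test is synthetic, not the PHD prediction.

```python
from collections import Counter


def ss_fractions(states: str) -> dict[str, float]:
    """Percentage of helix (H), strand (E) and other/loop (L) states, to 0.01%."""
    counts = Counter(states)
    n = len(states)
    return {state: round(100 * counts[state] / n, 2) for state in "HEL"}
```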

Multiple sequence analysis of RbL protein
To gain better insight into sequence conservation, the deduced RbL sequence was aligned with other legume lectin sequences using the ClustalW algorithm and curated with the ESPript 2.2 server (Figure-3). The highest sequence similarity of RbL was observed with Vigna angularis, Vigna aconitifolia and Vigna unguiculata lectins. Genetically, rice bean also shares similarities with adzuki bean in evolutionary pattern and in pod and seed characteristics, and similar electrophoretic patterns also reveal a close relationship between the two species (Chandel et al. 1988). The NetNGlyc 1.0 server predicted four N-glycosylation sites in RbL; N-glycosylation contributes to the correct folding of soluble and membrane-bound proteins (Haweker et al. 2010). These glycosylation sites may increase the stability of RbL by enhancing inter-subunit interactions (Mitra et al. 2006).
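NetNGlyc scores Asn residues in N-X-S/T sequons (X not Pro) with neural networks. A plain sequon scan, sketched below, lists only the candidate positions that such a predictor then evaluates, so it may report more sites than the four predicted for RbL. The test sequence is a toy example.

```python
import re

# N-X-S/T sequon with X any residue except Pro. The zero-width lookahead lets
# overlapping sequons (e.g. "NNT") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq: str) -> list[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]
```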
Multiple sequence alignment of RbL and other legume lectins revealed conserved amino acid sequences in the carbohydrate-binding loops, except loop D. Variability in loop D is responsible for the different carbohydrate specificities of legume lectins (Cummings et al. 2017). The alignment also showed conservation of the residues responsible for metal ion binding (denoted by $) and monosaccharide binding (denoted by #) in legume lectins. The N-terminal (after the signal peptide) and C-terminal sequences of the deduced RbL sequence were also identified from the alignment.
Prosite analysis (https://prosite.expasy.org/) (Gasteiger et al. 2005) revealed two legume lectin signature motifs in the deduced RbL sequence: a β-chain signature corresponding to the N-terminal region (its aspartate is conserved for metal binding) and an α-chain signature corresponding to the C-terminal region. The presence of these signatures suggests that RbL is probably a two-chain lectin, processed post-translationally during maturation in the plant vacuole, whose two chains interact to form a dimer, as has also been observed for the DBL, PHA-L, PHA-E and SBA lectins. The specific signature motifs of legume lectins determine and differentiate their dimerization state (Van Damme et al. 2008).
Multiple sequence alignment also revealed a conserved leucine residue that serves as a cleavage signal for removal of the C-terminal peptide. The C-terminal cleavage site in the RbL sequence was identified using three rules (Moreira et al. 2013): (i) the first amino acid of the excised peptide is small or hydrophobic; (ii) cleavage occurs after an acidic, polar or hydrophobic residue, but not after a basic residue; and (iii) the cleavage site is located 5 to 8 residues after a conserved leucine residue. Based on the alignment of legume lectin sequences, the C-terminal cleavage site in the deduced RbL sequence lies between Asp270 and Leu271, and the presence of two leucine residues may further confer stability to the C-terminal α-helix. A C-terminal α-helix has also been observed in the structures of DBL, SBA and BVL-II (Moreira et al. 2013). Further, the hydrophobic nature of the C-terminal peptide in the deduced RbL sequence suggests vacuolar targeting (Chrispeels and Raikhel 1992). The C-terminal α-helix confers stability to the dimeric and tetrameric structures of legume lectins by being sandwiched between the β-sheets of two lectin monomers.
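The three cleavage rules can be expressed as a filter over positions downstream of the conserved leucine. The sketch below is one reading of the rules under stated assumptions: the residue classes (small, hydrophobic, basic) are assumed, rule (ii) is simplified to excluding a basic residue before the scissile bond, and rule (iii) is interpreted as the last retained residue lying 5 to 8 positions after the leucine. The test sequence is a toy example, not RbL.

```python
# Assumed residue classes; other reasonable definitions exist.
SMALL = set("GASCT")
HYDROPHOBIC = set("AVLIMFWC")
BASIC = set("KRH")


def candidate_c_terminal_sites(seq: str, conserved_leu: int) -> list[int]:
    """Candidate C-terminal cleavage sites downstream of a conserved Leu.

    `conserved_leu` is a 1-based position. A returned site k means cleavage
    between residues k and k + 1 (1-based), i.e. residue k is the last kept.
    """
    s = seq.upper()
    if s[conserved_leu - 1] != "L":
        raise ValueError("position does not hold a leucine")
    sites = []
    for offset in range(5, 9):           # rule (iii): 5 to 8 residues downstream
        k = conserved_leu + offset       # 1-based position of last retained residue
        if k >= len(s):                  # no residue left to excise
            break
        before, after = s[k - 1], s[k]
        if before in BASIC:              # rule (ii), simplified: not after a basic residue
            continue
        if after in SMALL or after in HYDROPHOBIC:  # rule (i): first excised residue
            sites.append(k)
    return sites
```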
Homology modeling, prediction of ligand binding sites and molecular docking analysis of RbL protein
Homology modeling provides insight into the structure-function relationship of a target protein by building a 3D structure from a template protein whose 3D structure has been determined experimentally. It produces high-quality models when the similarity between target and template is high, whereas sequences sharing less than 20% identity with the template can yield incorrect structures. A 3D model of RbL was generated using the I-TASSER server. The top-hit template was PHA-E lectin (Phaseolus vulgaris lectin; PDB ID: 5AVA), with 80.71% sequence similarity and a coverage of 0.93. The server modeled all 280 residues of RbL by iterative threading assembly refinement (Figure 4a). I-TASSER modeling starts from templates identified in the Protein Data Bank by LOMETS, a meta-threading approach that combines multiple threading programs. The estimated TM-score and RMSD of the best model were 0.66 ± 0.13 and 6.90 ± 4.10 Å, respectively, with a C-score of -0.41. The C-score typically ranges from -5 to 2, with higher values indicating higher confidence in the model. In the model validation plot, black, dark grey, grey and light grey regions denote highly favored conformations (Δ ≥ -2); white with a black grid denotes favored conformations (-2 > Δ ≥ -4); and white with a grey grid denotes unreliable conformations (Δ < -4). Highly favored conformations were observed for 209 residues (84.615%, green crosses), favored conformations for 24 residues (9.717%, brown triangles) and non-favored conformations for 14 residues (5.668%, red circles) (Figure 4b).
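The three residue counts sum to 247, so the reported percentages are relative to 247 evaluated residues rather than the 280 residues in the model; which residues the validation plot excluded is not stated. The arithmetic can be checked with:

```python
def region_percentages(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-region residue counts into percentages of all evaluated residues."""
    total = sum(counts.values())
    return {region: round(100 * n / total, 3) for region, n in counts.items()}


reported = {"highly_favored": 209, "favored": 24, "non_favored": 14}
print(sum(reported.values()), region_percentages(reported))
```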
Similar to other legume lectins, the dome-like structure known as the lectin fold, jelly roll fold or β-sandwich was identified in RbL. The structure showed a flat six-stranded antiparallel β-sheet forming the back face and a curved seven-stranded antiparallel β-sheet forming the front face. RbL consists of 16 β-strands connected by turns and loops, which account for about 50% of the total residues. Since the 3D model of RbL agrees with the conserved legume lectin structure (β-sandwich model), RbL is expected to have two hydrophobic cores: (i) between the front and back sheets and (ii) between the front sheet and three loops, which form the metal-binding site and carbohydrate recognition domain.
The three-dimensional structure of RbL was visualized with UCSF Chimera (Pettersen et al. 2004) to identify the characteristic carbohydrate-binding loops. The ProSA-web (Protein Structure Analysis web) server computes a Z-score that compares the overall quality score of the input structure with the scores of experimentally determined proteins deposited in the Protein Data Bank (PDB) (Berman et al. 2006). A positive Z-score indicates a problematic structure, whereas zero and negative scores indicate a reliable structure (Filho et al. 2017). The server computed a Z-score of -7.10 for the RbL structure (large black dot in Supplementary Figure 4), which was within the acceptable range (−10 to 10) (Yadav et al. 2012). Moreover, the Z-score of the RbL structure was close to that of the Vigna unguiculata lectin structure (-7.64) (Filho et al. 2017) and that of the template protein (−6.52), suggesting that the predicted RbL model is reliable and shares close structural similarity with the template.
Structural analysis of RbL revealed four loops (A, B, C and D), connected by turns and loops, which are responsible for the carbohydrate-binding specificity of legume lectins (Figure 5a). The conserved triad (Asp-Asn-Gly/Arg) responsible for monosaccharide binding through hydrogen bonding was also identified. Further, an aromatic residue (tyrosine), required for stacking against the non-polar face of the sugar, was identified in loop C (Cummings et al. 2017). The fold of RbL, composed mainly of β-sheets interlinked by turns and loops, appears to create a rigid and strong structural scaffold. In legume lectins, changes in the orientation of β-sheets confer rigidity to the structure and provide protection from proteolytic degradation (Van Damme et al. 1998).
Since metal ions are required to stabilize the carbohydrate-binding site, the Ca2+-binding site formed by the side chains of Glu150, Asp152, Asp160 and His165 in the RbL structure was also predicted (Figure 5b). A cis-peptide bond between Ala111 and Asp112, which is essential for the stability of the carbohydrate-binding site in legume lectins (Sharon and Lis 2002), was also identified in the deduced RbL sequence.
Proteins perform their biological functions by interacting with ligands, so the identification of ligand-binding sites is of paramount importance for characterizing a protein's biological function (Heo et al. 2014). Because in vivo and in vitro elucidation of ligand-binding sites in a protein 3D structure is laborious, time consuming and costly, in silico prediction with bio-computational tools is a suitable alternative (Roche et al. 2010). The GalaxySite web server predicted β-galactose, N-acetyl glucosamine, lactose and adenine as putative binding ligands for RbL (Table-1). We used the COACH server and the CDOCKER engine for binding-site prediction and docking analysis of RbL with these ligands and two additionally selected ligands, glucose and mannose. The structures of N-acetyl-D-glucosamine, lactose, β-galactose, mannose, glucose and adenine (PubChem IDs 439174, 6134, 439353, 18950, 5793 and 190, respectively) were retrieved from PubChem in SDF format. Ligand interaction energies were recorded as -CDOCKER interaction energies from the docking report (Table 2).
Analysis of the interactions between RbL and the selected ligands (β-galactose, N-acetyl-D-glucosamine, glucose, mannose, lactose and adenine) showed that residues Asp112, Ala129, Asn156 and Leu241 form hydrogen bonds with the hydroxyl groups of all selected ligands except adenine (Figure 6). All the docked poses were used as starting structures for the MD simulations, which were analyzed based on RMSD, RMSF and H-bonds. RMSD values of the protein-ligand complexes described structural changes from 0 to 10 ns; the overall RMSD values ranged from 0.20 to 4.50 nm and reached a steady state after 4 ns of simulation (Figure 7a). RMSF describes the fluctuation of each atom throughout the simulation. RMSF values calculated for the RbL protein-ligand complexes showed that binding-site residues exhibited small fluctuations, with average RMSF values of 0.10 to 0.26 nm (Figure 7b).
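The RMSD and RMSF analyses reduce to simple averages over coordinates. The NumPy sketch below assumes the trajectory has already been least-squares fitted to the reference structure (GROMACS analysis tools typically perform this fit), and it omits mass weighting and atom selection; the toy arrays in the test are not simulation data.

```python
import numpy as np


def rmsd(traj: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Per-frame RMSD of a pre-fitted trajectory, in the units of the input.

    traj has shape (n_frames, n_atoms, 3); ref has shape (n_atoms, 3).
    """
    return np.sqrt(((traj - ref) ** 2).sum(axis=2).mean(axis=1))


def rmsf(traj: np.ndarray) -> np.ndarray:
    """Per-atom RMSF about the time-averaged position; returns shape (n_atoms,)."""
    mean_pos = traj.mean(axis=0)
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))
```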
Intermolecular H-bonding between protein and ligand plays a crucial role in stabilizing protein-ligand complexes. The stability of the hydrogen-bond network between RbL and the selected ligands was measured during the 10 ns simulation run, and the number of H-bonds in each complex versus time is shown in Figure 8. An average of 4 to 6 H-bonds was maintained throughout the simulation in the RbL complexes with β-galactose, N-acetyl-D-glucosamine, glucose, mannose, lactose and adenine, indicating that the selected ligands form stable and robust hydrogen bonds with RbL.
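The H-bond counts plotted over time come from a geometric criterion applied frame by frame. The sketch assumes the default criterion of the GROMACS hydrogen-bond analysis tool (donor-acceptor distance ≤ 0.35 nm and hydrogen-donor-acceptor angle ≤ 30°); the study does not report the cut-offs it used, and the coordinates in the test are toy values in nm.

```python
import numpy as np


def is_hbond(donor, hydrogen, acceptor,
             r_max: float = 0.35, angle_max_deg: float = 30.0) -> bool:
    """Geometric H-bond test: D-A distance <= r_max (nm) and H-D-A angle <= angle_max_deg."""
    donor, hydrogen, acceptor = (np.asarray(x, dtype=float)
                                 for x in (donor, hydrogen, acceptor))
    d_a = acceptor - donor
    d_h = hydrogen - donor
    dist = np.linalg.norm(d_a)
    if dist > r_max:
        return False
    cos_angle = np.dot(d_h, d_a) / (np.linalg.norm(d_h) * dist)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle <= angle_max_deg)


def count_hbonds(frame_triplets) -> int:
    """Number of (donor, hydrogen, acceptor) triplets in one frame meeting the criterion."""
    return int(sum(is_hbond(d, h, a) for d, h, a in frame_triplets))
```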
The PredictProtein web server revealed carbohydrate binding as one of the important molecular functions of RbL (Table 3). The analysis also revealed that RbL functions in cellular processes, which probably reflects its role as a storage protein providing essential amino acids during seed germination. The Argot2.5 web server also annotated RbL with various GO terms, among which the most significant were GO:0030246 (carbohydrate binding); GO:0046872 (metal ion binding); GO:0006952 (defense response); GO:0043086 (negative regulation of catalytic activity); GO:0050832 and GO:0042742 (defense response to fungus and bacterium, respectively); and GO:0031640 (killing of cells of other organisms) (Supplementary Figure 5a and 5b).
Computational analysis of the RbL structure also revealed an adenine-binding site, which suggests a function in the storage of adenine-derived plant hormones in seeds and a role during stress, since extracellular ATP (of which adenine is a structural component) has been recognized as a central signaling molecule in the plant defense response (Delatorre et al. 2007; Cao et al. 2014; Choi et al. 2014). The interaction of RbL with ATP could be related to the presence of the conserved receptor kinase domain in RbL. Moreover, because RbL is predicted to bind N-acetyl-D-glucosamine, the monomer of chitin, RbL may have entomotoxic properties. Taken together, these results support a role for RbL in plant defense.

Conclusions
For the first time, we carried out an in silico analysis of a novel lectin precursor from rice bean, yielding valuable information on the structural and functional properties of the RbL protein. ProtParam analysis revealed the hydrophobic, acidic and stable nature of RbL, which has two putative signal peptide cleavage sites and a C-terminal domain. RbL was identified as a β-sheet protein owing to the dominance of β-sheets and the presence of the jelly roll fold (β-sandwich structure) in its 3D structure. Comparative sequence analysis showed high similarity between the primary structure of RbL and those of other legume lectins. Molecular docking revealed the potential of N-acetyl-D-glucosamine, β-galactose, glucose, mannose, lactose and adenine to bind in the binding pocket of RbL. Furthermore, molecular docking and dynamics simulations confirmed hydrogen-bond interactions consistent with favorable binding of the ligands at their putative binding site in RbL. Gene ontology analysis indicated that RbL has a significant role in plant defense, and the presence of an adenine-binding site also supports this role. This novel information on RbL should be valuable for future research on its use in biomedical and biotechnological applications such as glycan profiling, disease research and plant protection.

Declarations
Ethical Approval: Not applicable
Consent to Participate: Granted
Consent to Publish: Consent is granted for publishing the paper in AB&B