Sequence-Structure Comparative and Network-Based Prediction of Drought Gene Candidate Regulator in Elaeis guineensis

Drought poses a significant threat to global food security, particularly for crops such as oil palm. Selecting genes for genome editing to enhance drought tolerance is challenging. To ensure that the target gene is chosen correctly and yields the desired trait, a pilot study is necessary to determine the target gene for knockout. Two drought-related genes, AtBRL3 and AtOST2, were examined in this context. Their sequences were aligned against the Elaeis guineensis genome, and their neighbouring proteins and gene ontology were analysed to identify potential targets for genome editing. AtBRL3, identified as BRL1 (XP_010913986.1) in E. guineensis, exhibited 58.48% identity and 100% query cover. It interacts with 12 nodes, including BIR1, BRI1, and AT2G20050, which are crucial for signalling pathways and cellular responses. Molecular function analysis revealed kinase activity. AtOST2 showed high similarity to plasma membrane ATPase/HA1 (XP_010913679.1) in E. guineensis, with 87.46% identity and 100% query cover. It correlated with 14 genes associated with ABA stimulus, stomatal movement, and hormone response. EgBRL1 and EgHA1, resembling AtBRL3 and AtOST2, respectively, emerge as promising targets for developing drought-tolerant oil palm cultivars through gene editing. Nonetheless, further validation through in vitro gRNA target selection and in vivo transformation of OST2/BRL3-containing plasmids into oil palm calluses is indispensable to demonstrate their efficacy in conferring novel drought resistance traits.

Oil palm needs to be more drought-tolerant. According to Adam et al. (2011), the stress-induced production of male inflorescences and the abortion of female flowers or fruit bunches both affect bunch production. Conventionally, palms have been selected for drought resistance through multilocation progeny trials in both dry and wet environments. Conventional breeding is a process of selective breeding in which crops are chosen for their superior performance. The best-known traditional breeding techniques include hybridization, recurrent selection, mass selection, backcross breeding, and pure-line selection. Conventional breeding requires more time and relies heavily on a plant's genotype. However, various extrinsic factors also affect a plant's phenotype, so selection based on phenotypic expression is mostly inaccurate. As a result, breeders began incorporating numerous biological disciplines into plant breeding and created modern breeding techniques. High-throughput phenotyping, genomic selection, marker-assisted breeding, and CRISPR-Cas9 are some of the most popular modern breeding techniques (Lamichhane & Sapana 2022). Our group has been harnessing CRISPR/Cas9 technology to improve oil palm genetics for favourable traits, such as Ganoderma tolerance, using a transcriptomic approach (Putranto et al. 2019). To ensure that the target gene is chosen correctly and yields the desired trait, a pilot study is necessary to determine the target gene for knockout. Thus, before designing sgRNAs for further CRISPR experiments, we used a bioinformatics approach to select precise gene targets.
Based on the literature, two genes, OST2 and BRL3, were chosen as candidates. In Arabidopsis, OPEN STOMATA 2 (OST2), also known as AHA1, is a key plasma membrane H+-ATPase involved in the stomatal response (Merlot et al. 2007). Plant cells create proton gradients through the action of plasma membrane proton (H+)-ATPases, which activate a variety of secondary transporters that facilitate the uptake of ions and metabolites (Palmgren 2001; Osakabe et al. 2014). The eleven Arabidopsis plasma membrane H+-ATPases, AHA1-AHA11 (Baxter et al. 2003), are made up of transmembrane domains with ten helices, which include phosphorylation and nucleotide-binding sites, as well as cytoplasmic N- and C-terminal domains (Pedersen et al. 2007). The C-terminus is the primary regulatory domain involved in H+-ATPase inhibition; phosphorylation in this region and subsequent interaction with 14-3-3 proteins regulate activation (Svennelid et al. 1999). Two dominant mutations at the ost2 locus have been reported to cause constitutive activation of the proton pump, which abolishes stomatal responses to abscisic acid (ABA) (Pedersen et al. 2007). Additionally, ost2_crispr mutants exhibited a markedly higher degree of stomatal closure together with lower transpirational water loss when the stomatal response was evaluated under ABA-induced conditions. These results showed that a CRISPR/Cas9-induced mutation at the OST2 locus improved stomatal responsiveness, which in turn promoted drought tolerance (Joshi et al. 2020).
BRL3, in turn, is a member of the BR-INSENSITIVE 1 (BRI1) family of leucine-rich repeat (LRR) receptor-like kinases (RLKs), which bind brassinosteroid (BR) hormones directly (Li et al. 1997; Wang et al. 2001; Kinoshita et al. 2005; Hothorn et al. 2011; She et al. 2011). Early BR signalling events (Gou et al. 2012) depend on the interaction of BRI1 with the co-receptor BRI1 ASSOCIATED RECEPTOR KINASE 1 (BAK1), which is triggered by ligand perception. This BRI1-BAK1 heterodimerization starts a signalling cascade of phosphorylation events. The BRI1-EMS-SUPPRESSOR1 (BES1) and BRASSINAZOLE RESISTANT1 (BZR1) transcription factors (Yin et al. 2002; Wang et al. 2002; He et al. 2002) are primarily responsible for controlling the expression of certain BR-regulated genes. Although BRs influence several developmental and environmental stress responses in plants, the precise function of BRs under stress is still debated. Overexpressing the BR biosynthesis enzyme DWF4 and applying BRs exogenously both improve a plant's ability to withstand drought stress, yet suppressing the BRI1 receptor also produces drought-resistant phenotypes (Feng et al. 2015; Ye et al. 2017). Interestingly, interaction between the BR and ABA pathways upstream of the BRASSINOSTEROID-INSENSITIVE 2 (BIN2) kinase has been reported (Zhang et al. 2009; Gui et al. 2016): ABA signalling suppresses the BR signalling pathway following BR perception. Under drought stress, shoots of BRL3-overexpressing (BRL3ox) plants showed higher concentrations of proline, GABA, and tyrosine, whereas the most prevalent metabolites in BRL3ox roots during the stress time course were trehalose, sucrose, myo-inositol, raffinose, and proline. Notably, all of these metabolites have previously been linked to drought tolerance (Fàbregas et al. 2018).
Since both genes have historically been studied in the model plant Arabidopsis, we aligned the reference genes against the oil palm (E. guineensis) genome database and performed protein modelling with validation by Ramachandran plot. A network analysis was also performed to support the choice of target genes.

MATERIALS AND METHODS

Protein sequence collection and modelling
The AtBRL3 and AtOST2 sequences were retrieved from the UniProt database under IDs Q9LJF3 and A0A1P8AYX4, respectively. To select the best protein model of BRL3/OST2 from Arabidopsis thaliana and Elaeis guineensis, we used the Robetta prediction server with the RoseTTAFold option. Robetta's accuracy relies primarily on the presence of homologous sequences in the PDB, UniProt, and Uniclust sequence databases. In the supplementary information of the RosettaCM publication (Baek et al. 2021), a predicted confidence value that accounts for this is given for comparative modelling domains and was found to correspond with the actual GDT to native. The predicted models were then validated, and their secondary structures compared, using Ramachandran plots generated by PROCHECK (https://www.ebi.ac.uk/thornton-srv/software/PROCHECK/).

Alignment and phylogenetic tree construction
The protein sequences of AtBRL3 and AtOST2 were globally aligned against the E. guineensis protein database using the BLASTP program at NCBI (https://blast.ncbi.nlm.nih.gov/). The retrieved hits were used to build a phylogenetic tree. The neighbour-joining (NJ) method with 1000 bootstrap replicates was used because of the high similarity scores between the proteins.
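The bootstrap step mentioned above can be sketched as follows. This is a minimal illustration, assuming the aligned sequences are available as equal-length strings; the function names and the toy alignment are hypothetical and are not part of the original analysis. Each replicate would then be passed to a neighbour-joining implementation (not shown), and the share of replicate trees containing a given cluster would be reported as its bootstrap support.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: alignment columns sampled with replacement."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns, with replacement.
    picks = [rng.randrange(length) for _ in range(length)]
    # Apply the same column picks to every sequence so columns stay intact.
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments from a fixed random seed."""
    rng = random.Random(seed)
    return [bootstrap_columns(alignment, rng) for _ in range(n_replicates)]


# Toy aligned sequences (hypothetical, NOT the oil palm data).
toy_alignment = ["MKT-AYIA", "MKTSAYLA", "MRT-AYIA"]
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000, seed=42)
```

Fixing the seed makes the resampling reproducible, which helps when bootstrap values are compared between runs.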

Networking analysis
To understand the neighbouring protein interactions of AtBRL3 and AtOST2, network and gene ontology (GO) analyses were performed with String-db (https://string-db.org/) (Szklarczyk et al. 2023), and the network was visualized with Cytoscape (Shannon et al. 2003).
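Once a network has been exported as an edge list, the first neighbours of a protein of interest can be listed with a few lines of code. The sketch below assumes a simple list of undirected protein pairs; the edges use protein names that this paper reports as connected to BRL3, but they are illustrative only and do not reproduce the actual String-db export. The helper names are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def first_neighbours(adjacency, node):
    """Return the sorted direct interaction partners of a node."""
    return sorted(adjacency.get(node, set()))


# Illustrative edges only (assumption): not the real String-db output.
edges = [
    ("BRL3", "BAK1"),
    ("BRL3", "BIR1"),
    ("BRL3", "SERK4"),
    ("BRL3", "BRI1"),
    ("BRI1", "BAK1"),
]
adjacency = build_adjacency(edges)
brl3_partners = first_neighbours(adjacency, "BRL3")
brl3_degree = len(brl3_partners)
```

A protein that is absent from the edge list simply returns an empty partner list, which corresponds to a node with no connecting line in the network view.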
To support the data, six similar EgBRL1 proteins were then analysed to estimate their phylogenetic relationship with AtBRL3 (Figure 2). Broadly, AtBRL3 was closest to XP_010928900.1 and XP_010913986.1, in line with the high similarity percentages from BLASTP. Accordingly, XP_010913986.1, with 100% query cover, was the best candidate for the closest protein to AtBRL3 based on sequence comparison.
As a general rule of thumb, two protein sequences are said to be homologous if they share more than 30% identity over their total lengths (far greater identity is seen by chance in short alignments), although the 30% criterion misses many readily detectable homologs. Taken together, the AtBRL3 and AtOST2 homologs identified here pass this cut-off and are considered to have high homology. In addition, protein sequences were used for the search because protein (and translated-DNA) similarity searches are more sensitive than DNA:DNA searches. The evolutionary look-back time of DNA:DNA alignments is 5 to 10 times shorter than that of protein:protein or translated (protein:DNA) alignments. After more than 200-400 million years of divergence, DNA:DNA alignments rarely detect homology, whereas protein:protein alignments frequently detect similarity between sequences that last shared a common ancestor more than 2.5 billion years ago (Pearson 2013).
The AtOST2 phylogenetic tree shows two distinct clusters, with bootstrap scores of 99.8 and 64.4 (Figure 4). AtOST2 falls in the larger cluster together with other EgOST proteins. Within the tree, XP_010913679.1 is the closest to AtOST2 by genetic distance, consistent with its highest BLASTP score (100% query cover and 87.46% identity).
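The 30% rule of thumb cited from Pearson (2013) can be illustrated with a short calculation. The sketch below assumes a pairwise alignment given as two equal-length strings with "-" for gaps, computes percent identity over the alignment length, and uses a simplified query cover (aligned query residues divided by query length). The function names and the toy sequences are hypothetical; only the two identity values in `reported` are taken from this study.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical, non-gap positions as a percentage of the alignment length."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def query_cover(aligned_query, query_length):
    """Simplified query cover: aligned query residues over full query length."""
    aligned_residues = sum(1 for q in aligned_query if q != "-")
    return 100.0 * aligned_residues / query_length


def passes_rule_of_thumb(identity_pct, threshold_pct=30.0):
    """True when identity exceeds the 30% full-length rule of thumb."""
    return identity_pct > threshold_pct


# Toy alignment (hypothetical sequences, for illustration only).
toy_query = "MKTAYIAKQR"
toy_subject = "MKSAYLAKQR"
toy_identity = percent_identity(toy_query, toy_subject)

# Identity values reported in this study for the two best hits.
reported = {
    "AtBRL3 vs XP_010913986.1": 58.48,
    "AtOST2 vs XP_010913679.1": 87.46,
}
flags = {pair: passes_rule_of_thumb(value) for pair, value in reported.items()}
```

Because the threshold only makes sense for full-length alignments, the identity value should be read together with the query cover rather than on its own.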

Protein structure comparison of BRL3/OST2 in A. thaliana and E. guineensis
To compare the protein structures of BRL3/OST2, protein modelling based on an ab initio approach was performed. The sequence of AtBRL3 (ID Q9LJF3) was modelled with RoseTTAFold from residue 1 to 1164, with a confidence score of 0.76 (Figure 5a). Likewise, the sequence of XP_010913986.1 (EgBRL1) was modelled into a 3D structure, with a confidence score of 0.75 (Figure 6a). The protein models were then validated by Ramachandran plot (Figure 5b, 6b, Table 1).
The AtOST2 protein (ID A0A1P8AYX4) and XP_010913679.1 (EgHA1) were also modelled with RoseTTAFold using the Robetta tools. AtOST2 was modelled with a confidence score of 0.76 (Figure 7a), while EgHA1 was modelled with a confidence score of 0.81 (Figure 8a).
Next, Ramachandran analysis was performed to validate the protein modelling results (Table 1). The Ramachandran plot, which maps pairs of torsion angles of the polypeptide backbone against the background of the "allowed" or predicted values, is one of the most helpful techniques for validating protein structures. The allowed regions of the Ramachandran plot differ substantially for glycine and, to a lesser extent, for other amino acids (Wlodawer 2017). The data showed that AtBRL3 and XP_010913986.1 have 82.7% and 82.1% of residues in the most favoured regions, respectively. The models of AtOST2 and XP_010913679.1 showed even higher percentages, 94.2% and 94.9%, respectively. In terms of secondary structure, AtBRL3 consists mostly of helices at the N-terminus, followed by coil mixed with strands and helices in some parts up to the C-terminus. This secondary structure is somewhat similar to that of the XP_010928900.1 protein model. In more detail, the helices of XP_010913986.1 begin at the start of the model (AA 2-20 and 26-38), making the coil region longer than in AtBRL3 (Figure 9a and b). Moreover, the templates available for modelling AtBRL3 and XP_010928900.1 differ.
In contrast, AtOST2 and XP_010913679.1 show clearly distinct secondary structures. AtOST2 consists of helix in the region AA 1-61, interrupted by a coil at AA 21-27 (Figure 10a). This contrasts with the model of XP_010913679.1, which shows a repeating coil-helix-coil-helix pattern throughout the sequence (Figure 10b).
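The backbone torsion angles that a Ramachandran plot displays can be computed directly from atomic coordinates. The following is a minimal sketch, assuming each atom is given as an (x, y, z) tuple; it computes a signed dihedral angle and applies it to the standard definitions of phi (C of the previous residue, N, CA, C) and psi (N, CA, C, N of the next residue). It does not reproduce PROCHECK's region boundaries, and the function names and toy coordinates are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points, in (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond so projections onto it are simple dot products.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions of one residue from five backbone atom positions."""
    phi = dihedral_degrees(c_prev, n, ca, c)
    psi = dihedral_degrees(n, ca, c, n_next)
    return phi, psi


# Toy coordinates (assumption): chosen for easy checking, not from any model.
example_phi, example_psi = phi_psi(
    (1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2)
)
```

Applying `phi_psi` to every residue of a model yields the point cloud that a validation tool then compares against its favoured and allowed regions.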

Networking analysis
To explore the neighbouring genes of AtBRL3 and AtOST2 at the cellular level and analyse the associated gene ontology, we performed a network analysis. When AtBRL3 and AtOST2 were placed in a network with genes related to proliferation and growth, such as AUX1, ARF15, TIR1, SKP1, and ILR1, and with oil biosynthesis-related genes such as FAD3, KASI, WRI1, and FATB, no lines connected them to these genes (Figure 11a & 12a). However, a different data-mining approach showed that AtBRL3 alone is connected to several proteins related to pathogen defence and growth signalling, such as BAK1, BIR1, SERK4, and BRI1 (Figure 11b). A null allele of BAK1 exhibits a semi-dwarf phenotype and decreased sensitivity to brassinosteroids (BRs), whereas overexpression of BAK1 causes elongated organ phenotypes. BRI1 interacts with the serine/threonine protein kinase BAK1 both in vitro and in vivo.
A severe dwarf phenotype similar to that of null bri1 alleles is produced by expression of a dominant-negative mutant allele of BAK1. These findings show that BAK1 is part of the BR signalling system (Li et al. 2002). Numerous studies have shown that BAK1 plays an essential role in regulating development-related programmed cell death (dPCD). For instance, silencing GhBAK1 in cotton (Gossypium hirsutum) results in high levels of cell death and increased ROS production, indicating that BAK1 controls cell death in a way that is conserved across a wide range of plant species (Gao et al. 2013, 2019). The network data lead to the hypothesis that the BRL3 gene is important for plant survival. Taken together, it is crucial that BRL3, as a target gene for editing towards drought-tolerant oil palm, be kept or upregulated in the system. In Arabidopsis, overexpression of BRL3, a vascular-enriched member of the brassinosteroid receptor family, can increase resistance to drought stress. Overexpression of the BRL3 receptor confers drought tolerance without hindering overall development, in contrast to loss-of-function mutations in the widely expressed BRI1 receptor, which confer drought resistance at the expense of growth (Fàbregas et al. 2018). Thus, BRL3 activation is one alternative for generating drought-tolerant oil palm.
Consistent with the network results, AtOST2 is also associated with proteins related to plant adaptation to abiotic stress, such as HAB1, ABI5, PP2CA, and PYL13 (Figure 12b). HAB1, one of the main clade A type 2C protein phosphatases (PP2Cs), is a negative regulator of ABA signalling in Arabidopsis. Previous research suggests that PYL5 is a nuclear and cytosolic ABA receptor that directly inhibits clade A PP2Cs to activate ABA signalling. Furthermore, PYL5-mediated suppression of clade A PP2Cs can be used to achieve increased resistance to drought (Santiago et al. 2009). In the presence of ABA and abiotic stressors, the basic leucine zipper transcription factor ABA INSENSITIVE 5 (ABI5) is essential for controlling seed germination and early seedling growth. ABI5 controls the expression of genes that carry the ABSCISIC ACID RESPONSE ELEMENT (ABRE) motif in their promoter regions, contributing to the core ABA signalling module made up of PYR/PYL/RCAR receptors, PP2C phosphatases, and SnRK2 kinases. Stress adaptation genes, such as those encoding LEA proteins, are among the regulated targets (Skubacz et al. 2016). However, the relationship between OST2 and ABI5 remains unclear, especially whether it is inhibitory or activating. Based on previous research (Osakabe et al. 2016), OST2 is one of the candidate target genes for knockout by genome editing to develop drought-tolerant oil palm.

CONCLUSION
In terms of sequence comparison, BRL3 and OST2 are promising targets for gene editing to generate drought-tolerant oil palm. This is supported by the high alignment similarity of both genes between A. thaliana and E. guineensis and by the favourable comparison of the modelled protein structures. In addition, functional protein prediction via network analysis shows that BRL3 and OST2 play important roles in drought tolerance, via gene activation and/or inhibition. However, to prove the effectiveness of both genes in generating new drought-tolerant oil palm varieties, in vitro (gRNA target selection) and in vivo (transformation of plasmids containing the OST2/BRL3 gene into oil palm callus) approaches are necessary.

AUTHORS' CONTRIBUTION
G.W.P. and R.A.P. designed the research and supervised the entire process. G.W.P., A.A.A., L.D.M., and Y.S. collected and analysed the data. I.R., H.M., and E.Y. wrote the manuscript.

Figure 1. Global alignment of AtBRL3 to E. guineensis protein database with query cover and similarity information (A), visualization of multiple alignment of six similar proteins EgBRL1 and AtBRL3 (B), detail of alignment data (C).

Figure 3. Global alignment of AtOST2 to E. guineensis protein database with query cover and similarity information (A), visualization of multiple alignment from twelve similar plasma membrane ATPase proteins from E. guineensis and AtOST2 (B), detail of alignment data (C).

Figure 2. Phylogenetic construction of AtBRL3 and six similar proteins in E. guineensis using Neighbour joining approach with 1000 bootstrap.

Figure 4. Phylogenetic construction of AtOST2 and twelve similar proteins in E. guineensis using Neighbour joining approach with 1000 bootstrap.

Figure 11.String-db analysis of AtBRL3 and proteins related to proliferation, growth, and oil biosynthesis (a), Network analysis of BRL3 by Cytoscape (b).

Figure 12. String-db analysis of AtOST2/HA1 and proteins related to proliferation, growth, and oil biosynthesis (a), Network analysis of AtOST2 by Cytoscape (b).