Exploring Molecular Docking algorithm for Lung Cancer Drug Discovery – A Case Study with RET Protein

Non-small cell lung cancer (NSCLC) is one of the leading causes of cancer-related deaths across the globe. Alterations in the RET protein account for 1.33% of all NSCLC cases. Commonly occurring RET fusion partners include KIF5B, CCDC6, NCOA4, and TRIM33. Numerous multikinase inhibitors are active against rearranged RET. However, mutations in the RET fusion protein can lead to drug resistance in NSCLC. In this context, molecular docking algorithms are important tools for supporting drug discovery pipelines. However, the large number of algorithms available in the literature makes it difficult for researchers to choose one and proceed with drug discovery. Thus, the present study focuses on finding the best docking algorithm among ArgusLab, PatchDock, AutoDock 4.0 and AutoDock Vina for drug discovery against RET fusion cancers, using Pearson's correlation coefficient. We believe that our study will be a valuable source of information for further computational studies on RET fusion cancer, both mutant and wild type.


INTRODUCTION
Cancer is still a major cause of the ongoing health crisis throughout the world. Lung cancer, by itself, accounts for a significant portion of cancer-associated deaths across the world. According to an estimate from 2017, it had a higher mortality rate in the US than breast, colorectal, and brain cancers combined. Model-based estimates project that lung cancer accounts for 12.66% of the 1,806,590 new cancer cases and 22.37% of the 606,520 cancer deaths in the US (Siegel et al., 2020). Of the two broad types of lung cancer, about 86% of cases are of the non-small cell type. Oncogenic driver mutations, occurring at frequencies of 2-25% (Kadota et al., 2016) in genes encoding VEGF, EGFR, ALK, KRAS, BRAF, MET, HER2, RET, ROS1 and PIK3CA, result in NSCLC (D'Arcangelo et al., 2013; Oxnard et al., 2013).
The receptor tyrosine kinase RET is encoded by the Rearranged during Transfection (RET) proto-oncogene (10q11.2). The RET protein is made up of three domains: an extracellular domain comprising four cadherin-like repeats, a cysteine-rich region and a calcium-binding site; a transmembrane domain; and an intracellular tyrosine kinase domain. Proteins of the GDNF family act as ligands for RET. The multimeric complex formed after ligand-receptor interaction activates the kinase domain, causing autophosphorylation of the intracellular domain followed by the activation of several signaling pathways, including MAPK, PI3K/AKT, JNK and ERK (Phay and Shah, 2010).
Genetic aberrations in the RET protein can deregulate receptor tyrosine kinase (RTK) signaling and thereby cause malignancy. Apart from point mutations, chromosomal rearrangements in the RET gene account for 1-2% of NSCLC cases (Kohno et al., 2012; Ju et al., 2012). The commonly recognized RET fusion partners include KIF5B, CCDC6, KIAA1217, KIAA1468, TRIM33, FRMD4A, CUX1, and NCOA4. Of these, the most common are KIF5B (kinesin family member 5B) and CCDC6 (coiled-coil domain containing 6) (Wade and Iams, 2018).
These fusions cause overexpression of a protein that contains the RET kinase domain, which induces transformation. They are seen in tumors that rarely harbor mutations in the common drivers such as ALK, EGFR, and KRAS (Platt et al., 2015) and are considered to be a "novel driver molecular" of lung adenocarcinoma in patients without the aforementioned common mutations (Wang et al., 2019). Many tyrosine kinase inhibitors (TKIs) with anti-RET activity are available. Some are broad-spectrum drugs and tend to be less potent. Their activity against VEGFR kinases causes a higher rate of grade 3 and 4 toxicities, and the resulting dose reductions and discontinuations limit the efficacy of these TKIs (Ferrara et al., 2018). Drug resistance acquired by the drug targets makes it necessary to find novel drugs that can overcome the limits of the current targeted therapies. For instance, the protein studied here, RET-CCDC6, responded to vandetanib treatment until a secondary mutation, S904F, rendered vandetanib ineffective (Nakaoku et al., 2018).
Biocomputational technology and rational drug design have made it easier to discover novel drugs and to predict their effects on a target at lower expense and in less time than wet-lab experiments, especially in structure-based drug design. Molecular docking has become an essential tool in pharmaceutical research for predicting the binding affinity of a docked ligand to a protein via simplified free energy calculations (Tangyuenyongwatana and Jongkon, 2016). The docking algorithm predicts the optimized conformation, that is, the one with the lowest binding free energy (Dar and Mir, 2017).
Commonly used docking software includes SymmDock, PatchDock, ArgusLab, FlexX, GOLD, DOCK, HADDOCK, AutoDock, SEED, and SCORE. PatchDock, ArgusLab, and AutoDock are among the most widely used packages, with AutoDock being the most cited and most widely used free docking tool. With so many software packages available, choosing the best algorithm is a difficult task. Comparison and correlation studies across algorithms help identify the one whose predictions lie closest to in vitro results. Our objective is to propose the best docking algorithm for NSCLC drug discovery against the RET-CCDC6 fusion protein among PatchDock, ArgusLab 4.1, AutoDock 4.2, and AutoDock Vina.

Data Set
For this study, the rearranged RET-CCDC6 fusion protein was selected and its structure was obtained from the Protein Data Bank (PDB ID: 6NJA) (Terzyan et al., 2019). Likewise, the structure of the mutant RET-CCDC6 fusion protein (S904F) was retrieved from the PDB (PDB ID: 6FEK) (Nakaoku et al., 2018).
The 3D structures of the ligands/small molecules were downloaded from the PubChem database in SDF format. These were then converted to PDB format with the OpenBabel GUI software.
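The format conversion can also be scripted when many ligands are involved. The following is a minimal sketch, assuming the Open Babel command-line tool `obabel` is installed and on the PATH (the study itself used the OpenBabel GUI); the helper names and the file name are hypothetical.

```python
import subprocess
from pathlib import Path


def build_obabel_command(sdf_path):
    """Return the obabel command that converts an SDF file to PDB."""
    sdf = Path(sdf_path)
    pdb = sdf.with_suffix(".pdb")
    # 'obabel <input> -O <output>' infers both formats from the extensions.
    return ["obabel", str(sdf), "-O", str(pdb)]


def convert_sdf_to_pdb(sdf_path):
    """Run the conversion; raises CalledProcessError if obabel fails."""
    subprocess.run(build_obabel_command(sdf_path), check=True)


if __name__ == "__main__":
    # Hypothetical ligand file downloaded from PubChem.
    print(" ".join(build_obabel_command("vandetanib.sdf")))
```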

PatchDock
The receptor (wild-type or mutant RET fusion protein) and the ligand molecule (drug or RET kinase inhibitor) were uploaded to the server in PDB format, and the best binding scores returned by the server were recorded (Aruleba et al., 2018).

ArgusLab 4.1
The crystal structures of the wild-type (PDB ID: 6NJA) and mutant (PDB ID: 6FEK) RET fusion proteins were loaded into ArgusLab 4.1, and the binding site was defined by choosing the "Make binding site for this protein" option. The ligands were then selected and docked in the generated grid on each protein (Baskaran and Ramachandran, 2012). The run yielded the binding energies of the different protein-ligand complexes for both the mutant and the wild-type RET fusion protein.

AutoDock 4.2
AutoDock is the most common molecular modelling software used to study protein-ligand interactions (Song et al., 2017). The proteins (6NJA and 6FEK) were loaded into AutoDock, and the necessary charges and hydrogen atoms were added. The size and position of the grid box were specified, and the prepared ligands were then docked to the specified binding site. The free energy of binding (kcal/mol) and the estimated inhibition constant (Ki) were obtained after successful completion of docking.
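The two reported quantities are linked by the standard thermodynamic relation ΔG = RT ln(Ki). The sketch below converts a binding free energy into an inhibition constant under that relation, assuming a temperature of 298.15 K; the function name and the example energy are illustrative and are not taken from the study.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.98719e-3


def estimated_ki(delta_g_kcal, temperature_k=298.15):
    """Return Ki in mol/L from a binding free energy in kcal/mol.

    Uses delta_G = R * T * ln(Ki), so Ki = exp(delta_G / (R * T)).
    More negative energies give smaller (stronger) inhibition constants.
    """
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical binding energy, for illustration only.
    ki = estimated_ki(-8.0)
    print(f"Ki = {ki * 1e6:.2f} uM")
```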

AutoDock Vina
AutoDock Vina is a more efficient successor to AutoDock. With this next-generation molecular modeling software, the proteins and ligands are prepared in the same way as for AutoDock, and docking is then run from the command prompt by invoking Vina (Trott and Olson, 2009). AutoDock Vina reports the RMSD values and the affinity (in kcal/mol) of the best binding pose of a ligand on the fusion protein.
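A command-line Vina run is typically driven by a small configuration file and writes the ranked poses to a PDBQT file whose header lines begin with `REMARK VINA RESULT:`. The sketch below writes such a configuration and reads back the best affinity. It assumes the receptor and ligand have already been converted to PDBQT; the file names and grid box values are placeholders, not the settings used in the study.

```python
def vina_config(receptor, ligand, center, size, out, exhaustiveness=8):
    """Return the text of a Vina configuration file.

    center and size are (x, y, z) tuples in Angstrom describing the grid box.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {sx}",
        f"size_y = {sy}",
        f"size_z = {sz}",
        f"exhaustiveness = {exhaustiveness}",
        f"out = {out}",
    ]
    return "\n".join(lines) + "\n"


def best_affinity(pdbqt_text):
    """Return the lowest affinity (kcal/mol) found in Vina output, or None."""
    affinities = []
    for line in pdbqt_text.splitlines():
        if line.startswith("REMARK VINA RESULT:"):
            # Fields after the tag: affinity, RMSD lower bound, RMSD upper bound.
            affinities.append(float(line.split()[3]))
    return min(affinities) if affinities else None


if __name__ == "__main__":
    # Placeholder file names and box; replace with the real binding site.
    text = vina_config("6NJA.pdbqt", "ligand.pdbqt",
                       center=(1.0, 2.0, 3.0), size=(20, 20, 20),
                       out="ligand_out.pdbqt")
    print(text)
```

With the configuration saved as, say, `conf.txt`, the run itself is started with `vina --config conf.txt`.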

Statistical Validation
Pearson's correlation coefficient (r) is a measure of the strength of the linear association between two variables. The first variable is the IC50 value of each small molecule, taken from the literature. These values were correlated with the binding scores and energies obtained from each of the four algorithms separately, for both the wild-type and the mutant RET fusion protein (Tanchuk et al., 2016). The value of r is determined using the formula:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}} $$

Here, x stands for the IC50 (nM) values of the different drugs against the RET-CCDC6 fusion, y stands for the docking scores/energies of the respective drug molecules, n is the number of drug molecules, and r is Pearson's correlation coefficient.
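As an illustration, Pearson's r can be computed directly from paired lists of IC50 values and docking energies. The sketch below uses the raw-sum form of the coefficient with x, y and n as defined above; the function name is hypothetical and the numbers in the usage example are made up for demonstration, not taken from the study.

```python
import math


def pearson_r(x, y):
    """Return Pearson's correlation coefficient for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


if __name__ == "__main__":
    # Made-up values: IC50 in nM and docking energies in kcal/mol.
    ic50 = [4.0, 20.0, 50.0, 130.0]
    energy = [-10.2, -9.1, -8.7, -7.4]
    print(round(pearson_r(ic50, energy), 3))
```

A positive r here means that ligands with lower IC50 values (more potent) also receive lower, more favourable docking energies.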

Molecular docking scores and energies
The docking scores/energies (kcal/mol) generated for all the multikinase inhibitors docked against the target proteins, 6NJA and 6FEK, are presented in Table 2 for PatchDock, Table 3 for ArgusLab 4.01, Table 4 for AutoDock 4.2 and Table 5 for AutoDock Vina. A view of the docked complex of one of the reference molecules (AD80) with the RET fusion protein, obtained using AutoDock 4.2, is shown in Figure 1.

Pearson's Correlation Coefficient
The efficacy of the four algorithms was then assessed by computing the Pearson correlation coefficient (r) between the IC50 values of the drugs and the respective docking scores/energies. A higher positive value signifies a positive correlation, which is necessary for an algorithm to be deemed workable. Table 6 gives the r values for the wild-type and mutant RET-CCDC6 fusion proteins for the different algorithms. The r values obtained with AutoDock Vina (0.41 for 6NJA and 0.45 for 6FEK) signify a positive correlation in both cases and are higher than the r values obtained for 6NJA and 6FEK with the other algorithms.

CONCLUSIONS
RET-CCDC6 fusion in lung cancer is a topic of increasing interest because of the drug resistance that develops in patients who carry it. In the present study, the efficacy of different docking algorithms available in the literature was explored against the RET-CCDC6 protein, using the Pearson correlation coefficient as the measure of each algorithm's efficacy. Of the four, AutoDock Vina was the only algorithm that gave a positive Pearson's correlation coefficient (r) for both the wild-type and the mutant RET fusion protein. Therefore, we conclude that AutoDock Vina is the method of choice for drug discovery studies, especially against the RET protein in lung cancer.