In Silico Drug Design of Anti-Breast Cancer Agents

Cancer is a condition marked by abnormal cell proliferation that has the potential to invade or indicate other health issues. Human beings are affected by more than 100 different types of cancer. Some cancer promotes rapid cell proliferation, whereas others cause cells to divide and develop more slowly. Some cancers, such as leukemia, produce visible tumors, while others, such as breast cancer, do not. In this work, in silico investigations were carried out to investigate the binding mechanisms of four major analogs, which are marine sesquiterpene, sesquiterpene lactone, heteroaromatic chalcones, and benzothiophene against the target estrogen receptor-α for targeting breast cancer using Schrödinger suite 2021-4. The Glide module handled the molecular docking experiments, the QikProp module handled the ADMET screening, and the Prime MM-GB/SA module determined the binding energy of the ligands. The benzothiophene analog BT_ER_15f (G-score −15.922 Kcal/mol) showed the best binding activity against the target protein estrogen receptor-α when compared with the standard drug tamoxifen which has a docking score of −13.560 Kcal/mol. TRP383 (tryptophan) has the highest interaction time with the ligand, and hence it could act for a long time. Based on in silico investigations, the benzothiophene analog BT_ER_15f significantly binds with the active site of the target protein estrogen receptor-α. Similar to the outcomes of molecular docking, the target and ligand complex interaction motif established a high affinity of lead candidates in a dynamic system. This study shows that estrogen receptor-α targets inhibitors with better potential and low toxicity when compared to the existing market drugs, which can be made from a benzothiophene derivative. It may result in considerable activity and be applied to more research on breast cancer.


Introduction
Breast cancer is defined as a malignant tumor that starts in the cells of the breast. The type of breast cancer is determined by which cells in the breast becomes cancerous [1]. There are numerous locations in the breast where breast cancer can begin. Breasts primarily consist of lobules, ducts, and connective tissue. The milk travels through the ducts, which are tubes, from the breast to the nipple [2]. Connective tissue, which is made up of fibrous and fatty tissue, holds everything together. Usually, ducts or lobules are places where breast cancer develops [3]. Blood and lymphatic vessels are two ways that breast cancer can spread to other body areas. Metastasis refers to the spread of breast cancer to other bodily regions [4]. Cancer is a disorder wherein some body cells proliferate out of control and spread to other body regions [5]. In any one of the billions of cells that make up the human body, cancer can start almost anywhere. With more than 10 million deaths from cancer in the previous year, it is the leading cause of mortality worldwide. In all regions of India, the incidence of breast cancer has been rising by 0.5% to 2% annually in all age

Results and Discussion
The results are summarized in Tables 1-3 and Figures 1-14. The observation showed that the chemical makeup of the substituents had a significant impact on the compounds' ability to inhibit breast cancer. The chemical structures of benzothiophene derivatives are given in Figure 1. The marine sesquiterpene analogs, heteroaromatic chalcones analogs, and sesquiterpene lactone analogs which have been tested are given in .

Molecular Docking Studies
For the purpose of assessing the compounds' binding affinities at atomic levels, the ligands were docked to the active sites of proteins using the molecular docking program Glide module of Schrodinger suite 2021-4. To ascertain the inhibitory action of the developed analogs, they were docked to the breast cancer target (PDB ID: 2IOG). It is amply established that when compared to the standard drug tamoxifen, the compound BT_ER_15f has the highest Glide G-score (−16.14). BT_ER_15f represents BT-benzothiophene analog; ER is the target, and 15f is the compound code as given in Figure 1. The docking score and Glide G-score are given in Table 1 below which shows the best binding pose of the top 60 compounds. Figure 5 below represents the 2D and 3D docked poses of compound BT_ER_15f. The other 2D and 3D docked poses of the top 10 compounds are given in Figure S1a-j in the Supplementary Data.

Molecular Docking Studies
For the purpose of assessing the compounds' binding affinities at atomic levels, the ligands were docked to the active sites of proteins using the molecular docking program Glide module of Schrodinger suite 2021-4. To ascertain the inhibitory action of the developed analogs, they were docked to the breast cancer target (PDB ID: 2IOG). It is amply established that when compared to the standard drug tamoxifen, the compound BT_ER_15f has the highest Glide G-score (−16.14). BT_ER_15f represents BT-benzothiophene analog; ER is the target, and 15f is the compound code as given in Figure 1. The docking score and Glide G-score are given in Table 1 below which shows the best binding pose of the top 60 compounds. Figure 5 below represents the 2D and 3D docked poses of compound BT_ER_15f. The other 2D and 3D docked poses of the top 10 compounds are given in Figure S1a-j in the Supplementary Data.
The lipophilic evidence of the aromatic moieties is what mostly causes the Glide scores to increase. The amino acid residues such as ASP351, GLU 380, and GLU419 form a negative charge around the ligand (BT_ER_15f_), and LYS 531 forms a positive charge around the ligand.
The discovered binding modes demonstrated that the ligand (BT_ER_15f) created connections with various residues LEU391 to LEU428 surrounding the active pocket through hydrogen bonds, hydrophobic interactions, and other mechanisms. The 2D and 3D interaction diagram of BT_ER_15f with protein 2IOG is given in Figure 5.
The amino group of the ligand (BT_ER_15f) binds to the active pocket with the amino acid residues TRP383 and ASP351.

Binding Free Energy Calculation Using MM/GBSA
Additionally, molecular docking was evaluated using MM/GBSA free restricting vitality, which is identified for breast cancer (PDB ID: 2IOG) target using a post-scoring approach, and the results are displayed in Table 2. The free energy of binding for the group of ligands was calculated using the Prime molecular mechanics-generalized Born surface area (MM/GBSA) of Schrödinger 2021-4 suite. The OPLS4 force field was used to minimize energy from the post-docked ligand-receptor complex with generalized-Born/surface area (MM/GBSA).
Because of the significant negative values produced by all compounds in the MM/GBSA experiment, the energies that showed strong ligand binding in the binding pocket of 2IOG are van der Waals energy (MMGBA dG Bind vdW) and non-polar solvation (MMGBA dG Bind Lipo). Other energies, such as covalent energy (MMGBA dG Bind Covalent) and electrostatic solvation (∆GSolv), do not favor receptor binding. Moreover, greater negative values of MMGBA dG Bind vdW and MMGBA dG Bind Lipo demonstrate extraordinary hydrophobic interaction with 2IOG and ligands.
According to the findings of the MM/GBSA research, the DG bind values for considerably active compounds were found to be in the range of −15.33 to −84.12 kcal/mol. Additionally, dGvdW values, dG lipophilic values, and the energies are favorably contributing to the total binding energy [17]. BT_ER_15f, which has the highest docking score, exhibited excellent DG bind values of −70.59 kcal/mol.

ADMET Studies
ADMET features were predicted using the Schrödinger suite 2021-4's Qikprop module. Properties such as molecular weight, dipole, hydrogen bond donor, hydrogen bond acceptor, log P o/w, and Lipinski's rule of five are identified and mentioned in Table 3 below.
According to Lipinski's rule of five, the molecule's molecular weight should be ≤500, the partition coefficient should be ≤5, and the number of hydrogen bond donors and acceptors should be ≤5 and ≤10, respectively. All of these qualities, together with molecular flexibility, are thought to be important drivers of oral bioavailability. The BT_ER_15f ligand possesses a molecular weight of 561.67, a dipole moment of 5.605, an estimated number of hydrogen bonds that would be donated by the solute to water molecules in an aqueous solution is 3, and an estimated number of hydrogen bonds that would be accepted by the solute from water molecules in an aqueous solution is 7.5. With fewer exceptions, the obtained ADMET attributes are within the suggested ranges.
The number of H-bond donors is in the range of 0-2; the number of H-bond acceptors is in the range of 2-9.7. The number of violations of Lipinski's rule of five is 0-2.

Pharmacophore Modeling
A pharmacophore model is a theory that explains how a group of compounds that bind to the same biological target exhibit the biological behaviors that have been observed [18]. The electron-withdrawing group, hydrogen bond donor, and hydrophobic top-active compounds are given below in Figure 6. The pharmacophore models were created using the Phase module of the Schrödinger suite 2021-4. The default set of six chemical properties of Phase was used to build pharmacophore sites for these compounds: hydrogen bond acceptor (A), hydrogen bond donor (D), hydrophobic (H) negative ionizable (N), positive ionizable (P), and aromatic ring (R). The distance and angles between different AAHHH.3 sites are shown in Figure 7a,b. AAHHH.3 represents that two hydrogen bond acceptors and three hydrophobic groups are essential for the activity. All ligands had their fitness scores evaluated using the AAHHH.3 model. A scatter plot analysis was also used to uncover discrete vital pharmacophoric requirements at spatial structure areas. The blue cubes around the ligand represented a favorable position for group substitution, whereas the red cubes showed a non-favorable position in Figure 6a-c for the top four ligands of this study.

3D-QSAR Results
The atom-based QSAR module of Schrodinger suite 2021-4 was used to create the 3D-QSAR models for ER-alpha. Pharmacophore-based alignment of the ligands was taken into consideration in order to produce a statistically meaningful and highly predictive 3D-QSAR model [19]. Both the training and test sets of molecules had their prediction ability examined. Additionally, the default settings were applied, and a maximum of 2000 conformers and 15 conformations per rotatable bond were produced. Using vector, volume, site, survival, and survival in actives scores, the generated hypotheses were graded and ranked. Five places were determined to be common to all compounds in the hypothesis. A 3D-QSAR model was then developed using partial least squares (PLS) regression statistics. ionizable (P), and aromatic ring (R). The distance and angles between different AAHHH.3 sites are shown in Figure 7a,b. AAHHH.3 represents that two hydrogen bond acceptors and three hydrophobic groups are essential for the activity. All ligands had their fitness scores evaluated using the AAHHH.3 model. A scatter plot analysis was also used to uncover discrete vital pharmacophoric requirements at spatial structure areas. The blue cubes around the ligand represented a favorable position for group substitution, whereas the red cubes showed a non-favorable position in Figure 6a-c for the top four ligands of this study.  ionizable (P), and aromatic ring (R). The distance and angles between different AA sites are shown in Figure 7a,b. AAHHH.3 represents that two hydrogen bond a and three hydrophobic groups are essential for the activity. All ligands had the scores evaluated using the AAHHH.3 model. A scatter plot analysis was also use cover discrete vital pharmacophoric requirements at spatial structure areas. The bes around the ligand represented a favorable position for group substitution, the red cubes showed a non-favorable position in Figure 6a-c for the top four lig this study.  The formula for the test set: The green dots in Figure 8a,b represent ligands of the test set and training set. The ligands must be near the linear progression curve. The scatter plot with the XY-axis of the actual correlation with the predicted pIC50 is represented in Figure 8a,b for the test and training set compounds.
ranked. Five places were determined to be common to all compounds in the hypothesis. A 3D-QSAR model was then developed using partial least squares (PLS) regression statistics.
The formula for the test set: y = 0.58x + 2.29 (R 2 = 0.83) The green dots in Figure 8a,b represent ligands of the test set and training set. The ligands must be near the linear progression curve. The scatter plot with the XY-axis of the actual correlation with the predicted pIC50 is represented in Figure 8a,b for the test and training set compounds.

MD Simulation
The MD simulation is used to estimate macromolecule mechanics, and it is based on classical mechanics and the application of Newton's equation of motion to compute the speed and location of each atom in the investigated system. As a result, MD undertakes a more thorough structural investigation than docking, resulting in a more realistic depiction of protein motion [20].
Using a 100 ns MD, the stability of the docked BT ER 15f/2IOG complex was examined. Using the Desmond module of Schrödinger 2021-4, the complex in the explicit solvent system with the OPLS4 force field was investigated. The BT_ER_15f compound interacts with the protein residues as shown in Figure 9. The interaction fraction of each amino acid is given in Figure 10.

MD Simulation
The MD simulation is used to estimate macromolecule mechanics, and it is based on classical mechanics and the application of Newton's equation of motion to compute the speed and location of each atom in the investigated system. As a result, MD undertakes a more thorough structural investigation than docking, resulting in a more realistic depiction of protein motion [20].
Using a 100 ns MD, the stability of the docked BT ER 15f/2IOG complex was examined. Using the Desmond module of Schrödinger 2021-4, the complex in the explicit solvent system with the OPLS4 force field was investigated. The BT_ER_15f compound interacts with the protein residues as shown in Figure 9. The interaction fraction of each amino acid is given in Figure 10.    MD of standard tamoxifen was also performed, and it was found that the amino acids ALA350 and PHE404 have the highest interaction time. It is represented in Figures 11 and  12. Amino acid residue ALA350 has a continuous interaction time. The RMSD value from MD of standard tamoxifen was also performed, and it was found that the amino acids ALA350 and PHE404 have the highest interaction time. It is represented in Figures 11 and 12. Amino acid residue ALA350 has a continuous interaction time. The RMSD value from the resulting trajectory analysis was in the range of 1.0 to 3.0. Green vertical bars in Figure 13 indicate protein residues that interact with the ligand, and the interactions between residues 100 and 130 showed the largest changes up to 2.4 Å. Through the formation of hydrophobic contacts with TRP383, ALA 350, PHE 404, and LEU 387, the molecule was positioned in the active pocket.
At 37 ns, higher ligand RMSD fluctuations (up to 2.7 Å) were noted, given in Figure 14. Stable hydrophobic interactions with ALA 350, LEU 387, and PHE 404 were noted during the simulation. Utilizing measures of root-mean-square fluctuations, the flexibility of residues on ligand bindings was examined. To comprehend the molecular insights involved in the binding of TRP383 in the active pocket of protein target 2IOG, a 100 ns molecular dynamic simulation was conducted.
It could be noted from Figure 15 that the deviation in the displacement of atoms is larger compared to BT_ER_15f. Thus BT_ER_15f, which has the best fit in the binding pocket, is best compared to the market available drug tamoxifen. the resulting trajectory analysis was in the range of 1.0 to 3.0. Green vertical bars in Figure  13 indicate protein residues that interact with the ligand, and the interactions between residues 100 and 130 showed the largest changes up to 2.4 Å. Through the formation of hydrophobic contacts with TRP383, ALA 350, PHE 404, and LEU 387, the molecule was positioned in the active pocket.   the resulting trajectory analysis was in the range of 1.0 to 3.0. Green vertical bars in Figure  13 indicate protein residues that interact with the ligand, and the interactions between residues 100 and 130 showed the largest changes up to 2.4 Å. Through the formation of hydrophobic contacts with TRP383, ALA 350, PHE 404, and LEU 387, the molecule was positioned in the active pocket.   The interaction time of each amino acid is given in Figure 13. It could be noted that the interaction times of amino acid TRP383 were greater than all other amino acids. Amino acid ASN 532 interaction was steady for the first 60 nanoseconds (ns), and then interaction was lost. Again, interaction occurs from 80 to 100 ns. Figure 16 provides the ligand (BT_ER_15f) characteristics such as ligand RMSD, radius of gyration (rGyr), intramolecular hydrogen bonds (intraHB), molecular surface area (MolSA), solvent accessible surface area (SASA), and polar surface area (PSA). The ligand and protein root-mean-square fluctuation is shown in Figure 17a,b, and it is important for describing local variations throughout the protein chain. RMSF is a measure of the displacement of a particular atom or group of atoms relative to the reference structure averaged over the number of atoms. RMSD is useful for the analysis of time-dependent motions of the structure.
L-RMSF (local root-mean-square fluctuation) and P-RMSF (protein root-mean-square Fluctuation) are both measures of a protein molecule's flexibility or mobility. The average deviation or fluctuation in the position of each atom in a protein molecule from its average position in a given simulation or experimental data is measured as L-RMSF. It is calculated for a specific region or residue in a protein rather than the entire protein, and it is commonly used to identify flexible or disordered regions of a protein that are important for its function. P-RMSF, on the other hand, is the average RMSF value calculated for all the atoms in a protein molecule. It is used to quantify the protein's overall flexibility or mobility and can aid in identifying regions that are relatively stable or flexible. P-RMSF can also be used to compare the flexibility of various proteins or conformations of the same protein. Both L-RMSF and P-RMSF are key techniques in the study of protein structure and function because they give insight into the dynamic features of proteins that are vital for their biological activity. At 37 ns, higher ligand RMSD fluctuations (up to 2.7 Å) were noted, given in Figure  14. Stable hydrophobic interactions with ALA 350, LEU 387, and PHE 404 were noted during the simulation. Utilizing measures of root-mean-square fluctuations, the flexibility of residues on ligand bindings was examined. To comprehend the molecular insights involved in the binding of TRP383 in the active pocket of protein target 2IOG, a 100 ns molecular dynamic simulation was conducted.
It could be noted from Figure 15 that the deviation in the displacement of atoms is larger compared to BT_ER_15f. Thus BT_ER_15f, which has the best fit in the binding pocket, is best compared to the market available drug tamoxifen.    The interaction time of each amino acid is given in Figure 13. It could be noted that the interaction times of amino acid TRP383 were greater than all other amino acids. Amino acid ASN 532 interaction was steady for the first 60 nanoseconds (ns), and then interaction was lost. Again, interaction occurs from 80 to 100 ns. Figure 16 provides the ligand (BT_ER_15f) characteristics such as ligand RMSD, radius of gyration (rGyr), intramolecular hydrogen bonds (intraHB), molecular surface area (MolSA), solvent accessible surface area (SASA), and polar surface area (PSA). The ligand and protein root-mean-square fluctuation is shown in Figure 17a,b, and it is important for describing local variations throughout the protein chain. RMSF is a measure of the displacement of a particular atom or group of atoms relative to the reference structure averaged over the number of atoms. RMSD is useful for the analysis of time-dependent motions of the structure.

Docking Studies
Docking studies were carried out mainly for four analogs which are marine sesquiterpene [21], sesquiterpene lactone analogs [22,23], heteroaromatic chalcones [24,25], and benzothiophene analogs [26] which were obtained from literature studies. The 3D crystal structure of the breast cancer protein 2IOG was previously co-crystallized with the N-[(1R)-3-(4hydroxyphenyl)-1-methylpropyl]-2-[2-phenyl-6-(2-piperidin-1-ylethoxy)-1H-indol-3-yl] acetamide. From the Protein Data Bank, the protein PDB ID 2IOG (resolution 1.6 Å) was retrieved. Arpita Roy published a paper on in silico investigation of agonists for proteins involved in breast cancer using the same target 2IOG [27]. The protein was optimized using the epic module of the Schrödinger suite 2021-4's protein preparation wizard. By adjusting bond ordering, adding hydrogen atoms, and eliminating water molecules longer than 5 Å, the protein was optimized using the protein preparation wizard. Missing chains were then added using the Prime module of the Schrödinger suite 2021-4. The RMSD of the crystallographic heavy atoms was held at 0.30 for the OPLS4 molecular force field, which was used to minimize the protein. To pinpoint the centroid of the active site, a grid box was created. Using the Glide module of the Schrödinger suite 2021-4, all the compounds were docked into the catalytic pocket of the target protein 2IOG [28]. Significant Glide scores indicate ligands with higher 2IOG binding affinities [29,30]. Hussein highlighted in his work that the compound CH4 (chalcone) exhibited a binding energy of −10.83 kcal/mol against the target protein 2IOG [31], which possesses anti-breast cancer activity.

MM/GBSA Binding Free Energy Calculation
The precise determination of binding free energy plays a very essential role among the several strategies that may be used to analyze the ligand-receptor interaction [32,33]. Using the Prime molecular mechanics-generalized Born surface area (MM/GBSA) of Schrödinger 2021-4, post-docking energy minimization calculations were carried out to determine the free energy of binding for the collection of ligands in complex with a receptor [34]. The Poisson-Boltzmann surface area (MM/PBSA) and molecular mechanics-generalized Born surface area (MM/GBSA) are arguably very popular methods for binding free energy predictions [35]. Imaobong Etti published that the Artocarpus species has good anticancerous properties [36]. He and his co-workers found that Artonin E possesses the best drug-likeness using the Prime module of the Schrodinger software 2021-4 against the target protein 2IOG.

In Silico Predicted ADMET Properties
By identifying the most promising candidates for development and eliminating those with a low chance of success, early assessment of ADME-Tox characteristics can reduce the time and expense of screening and testing. The regulatory authorities are now very interested in the practical application of in silico methodologies for predicting preclinical toxicological endpoints, clinical side effects, and ADME features of new chemical entities [37]. ChemAxon properties such as molecular weight, total polar surface area (TPSA), hydrogen bond acceptor and donor count, log P, log D, log S, molar volume, and dissociation constant (KD), as well as the number of violations of Lipinski's rule of five, van der Waals volume, and other properties were used to predict the physically and pharmacokinetically significant descriptors for the top hits by using the Qikprop module of Schrodinger suite 2021-4. Table 3 displays these outcomes.

Pharmacophore Modeling
An explanation for the pharmacological effects of a collection of substances that bind to the same biological target is known as a pharmacophore model [38]. "An ensemble of steric and electronic features necessary to produce optimal supramolecular interactions with a given biological target" is a pharmacophore model [39]. A pharmacophore model can then be used to query the 3D chemical library to look for potential ligands, which are referred to as "pharmacophore-based virtual screening," depending on whether the approach was ligand-or structure-based virtual screening (VS) [40]. The pharmacophore model was created using the Phase module of the Schrodinger suite 2021-4. The common pharmacophore AAHHH.3 found from our work can be used for further high-throughput screening to screen a large database [41]. Tien-Yi-Hou and his co-workers performed work on estrogen receptor-α ligand binding through pharmacophore modeling and concluded few pharmacophore models active against breast cancer.

QSAR-Quantitative Structure Activity Relationship
The application of force field calculations requiring three-dimensional structures of a given collection of small molecules with known activities is referred to as 3D-QSAR. 3D-QSAR is an extension of classical QSAR that uses robust statistical analysis such as PLS, G/PLS, and AN to explain the three-dimensional features of ligands and predict their biological activity [42].
A computational modeling method known as the quantitative structure-activity relationship (QSAR) helps researchers connect the structural characteristics of chemical compounds with their biological functions. Drug development requires QSAR modeling [43].

Molecular Dynamics
A technique for simulating the physical motions of atoms and molecules is called molecular dynamics (MD) [44]. For a predetermined period of time, the atoms and molecules are allowed to interact, giving insight into the dynamic "evolution" of the system. Molecular dynamics is a method for computing the time evolution of a group of interacting atoms using Newton's equations of motion. In the thermodynamic process of protein-ligand interaction, a tiny molecule's solvation free energy acts as a stand-in for the ligand's desolvation [45]. MD of the highest Glide score compound was performed in this work using the Desmond module of Schrodinger suite 2021-4. From these geometric requirements around the ligand, BT_ER_15f was identified.

Conclusions
In the realm of drug design and discovery, integrated methodologies of QSAR and molecular docking-based prediction have been successfully used in a number of statistically supported examples. The current research on benzothiophene analogs, specifically BT_ER_15f, using molecular docking and QSAR demonstrated that it has a sizable anticancer effect against the target 2IOG.
From the docking study, the benzothiophene derivative demonstrated better arrangement at the dynamic site. The current investigation aided in identifying the key compounds and their beneficial effects. In subsequent analysis using in vitro and in vivo techniques, it could be optimized as a drug to treat breast cancer. According to the findings, the compound BT_ER_15f, a benzothiophene derivative, exhibits strong anti-breast cancer action and is useful for future research.
The pharmacokinetics and drug-likeness studies revealed that the ligand BT_ER_15f could be the best drug candidate against breast cancer.
In the future, this study will be a reliable resource for achieving further benzothiophene derivatives through innovative structural modifications in benzothiophene derivatives that are being widely researched. These findings provide compelling support for novel studies that involve developing more methodological frameworks to investigate molecular facets of their anti-cancer action.

Data Availability Statement:
The data supporting reported results can be available with corresponding author Kalirajan Rajagopal which will be shared on request by mail.