Identification of Lead Compounds against Scm (fms10) in Enterococcus faecium Using Computer Aided Drug Designing

(1) Background: Enterococcus faecium DO is an environmental, mesophilic, facultative, Gram-positive microorganism found in multiple habitats. It is responsible for many diseases in humans. The fight against infectious diseases is hampered by the development of multiple drug resistance in E. faecium. The focus of this research work is to identify a novel compound against this pathogen using bioinformatics tools and technology. (2) Methods: We screened the proteome (accession No. PRJNA55353) information from the genome database of the National Center for Biotechnology Information (NCBI) and suggested a potential drug target. I-TASSER was used to predict the three-dimensional structure of the protein, and the structure was optimized and minimized with different tools. PubChem and ChEBI were used to retrieve the inhibitors. Pharmacophore modeling and virtual screening were performed to identify novel compounds. Binding interactions of the compounds with the target protein were checked using LigPlot. pkCSM, SwissADME, and ProTox-II were used to assess absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. (3) Results: The selected novel compounds have improved absorption and better ADMET properties. Based on our results, the identified inhibitor ZINC48942 targets the receptor and may inhibit infection-related activity in E. faecium. This research work will be beneficial for the scientific community and could aid in the design of a new drug against E. faecium infections. (4) Conclusions: The novel compounds were observed to be potential inhibitors with more efficacy and fewer side effects. This research work will help researchers in testing and identifying chemicals useful against E. faecium.


Introduction
In the early 1900s, Enterococcus faecalis and Enterococcus faecium were identified and isolated. In human beings, these are the most abundant species, comprising up to one percent of the intestinal microbiota [1]. Billions of dollars are spent to treat diseases caused by E. faecium (neonatal meningitis, urinary tract infections, surgical wound infections, nosocomial bacteremia, and catheter-related infections), and this pathogen kills roughly two million [...] new antimicrobial compounds against E. faecium, to inhibit the pathogen's functionality by targeting a bacterial protein and to predict lead compounds against Scm (Fms10) of E. faecium.

Materials and Methods
The flowchart of the adopted computer-aided drug design (CADD) is given in Figure 1. The details of the methodology are as follows.
Life 2021, 11, x FOR PEER REVIEW

Target Identification
E. faecium contains one chromosome and three plasmids with 3209 genes, which translate into 3114 proteins, and the guanine-cytosine (GC) content is 37.8% [12]. Four of the 22 surface proteins of E. faecium act as virulence factors. Recombinant Scm65 (A- and B-domains) and Scm36 (A-domain) were identified as potential drug targets against infectious diseases caused by Enterococcus faecium, as revealed by functional annotation using various computational tools and techniques [13]. For virulence factor analysis, the Virulence Factor Database (VFDB) was used [14].

Protein Selection and Structural Refinement
Protein sequences were retrieved in FASTA format from the UniProt (Universal Protein Resource) database (http://www.uniprot.org/) [15]. Expasy ProtParam and CFSSP (http://www.biogem.org/tool/chou-fasman/) were used to calculate the physicochemical properties of the protein and to predict its secondary structure, respectively [16]. This information also helps in the prediction of protein functions [17]. The structure of the target protein was not available in the PDB, and its sequence did not fulfill the requirements of homology modeling; we therefore used threading-based modeling with the online server I-TASSER [18]. We used UCSF Chimera for protein visualization [19].
For structure refinement, we used two online servers, GalaxyWeb and 3Drefine [20,21]. Rampage was used to generate the Ramachandran plot, which shows whether residues lie in favored, allowed, or outlier regions [22]. The QMEAN Z-score was used to evaluate the theoretical protein models [23]. PROVE (PROtein Volume Evaluation) was used for structure validation [24]. VERIFY3D was used to verify the final model of the predicted protein [25].

Protein Properties
InterPro and PRED-LIPO, a Hidden Markov Model-based method, were used to classify the protein domains and to predict lipoprotein signal peptides, respectively [26,27]. Transmembrane helices were predicted using the TMHMM server v. 2.0 [28]. PSORTb was used for the prediction of prokaryotic localization sites [29]. Binding and active sites were predicted using COFACTOR and CASTp [30,31].

Selection and Retrieval of Ligands
Overall, 211 chemical compounds from PubChem, ChEBI, and a literature survey were considered as ligands on the basis of their biological activities. Their 2D chemical structures were retrieved and converted to 3D using Discovery Studio.

Docking Analysis
Docking was performed with AutoDock Vina, and the results were ranked by binding affinity. PyMOL and Discovery Studio were used to analyze the protein-ligand complexes in order to understand the interactions between receptor and inhibitor, along with the binding sites of the target protein.
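The ranking step can be illustrated with a minimal sketch. It assumes that each docking run has already been reduced to one best score per ligand, and that lower (more negative) Vina scores mean stronger predicted binding. The names DockingResult and top_hits, and all scores shown, are hypothetical and are not taken from this study.

```python
from typing import NamedTuple


class DockingResult(NamedTuple):
    ligand_id: str
    affinity_kcal_mol: float  # more negative = stronger predicted binding


def top_hits(results: list[DockingResult], n: int = 10) -> list[DockingResult]:
    """Return the n ligands with the lowest (most negative) binding energy."""
    return sorted(results, key=lambda r: r.affinity_kcal_mol)[:n]


# Hypothetical scores for illustration only; not values from this study.
example = [
    DockingResult("ligand_A", -6.1),
    DockingResult("ligand_B", -8.4),
    DockingResult("ligand_C", -7.2),
]

best = top_hits(example, n=2)
print([r.ligand_id for r in best])
```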

Pharmacophore Generation
The selected compounds with a wide range of structural diversity and activity were aligned. A pharmacophore model was generated to merge all the features of selected compounds. The pharmacophore of the top 10 inhibitors against targeted Scm (Fms10) was generated using LigandScout. Virtual screening was performed against the ZINC database to find the inhibitors which can inhibit adherence activity [32,33].

Docking of Novel Compounds
All hits identified by pharmacophore-based virtual screening were sorted according to their pharmacophore-fit score, and the top 100 compounds were selected and filtered using two rules, the rule of five and the Veber rule. The top 15 compounds were then selected. These compounds were docked with the receptor using AutoDock Vina and evaluated for binding energies and protein-ligand interactions. PyMOL was used to build the receptor-ligand complex files, and LigPlot was used to identify the interactions.
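The two filters can be sketched as simple threshold checks on precomputed descriptors. This is a minimal illustration and not the authors' implementation: it assumes the standard thresholds (rule of five: molecular weight ≤ 500, logP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10; Veber rule: polar surface area ≤ 140 Å², rotatable bonds ≤ 10) and assumes that a compound with any violation is rejected. The Descriptors class and the example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    name: str
    mol_weight: float          # Da
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int
    polar_surface_area: float  # square angstroms
    rotatable_bonds: int


def passes_rule_of_five(d: Descriptors) -> bool:
    """Lipinski's rule of five, applied strictly (no violations allowed)."""
    return (
        d.mol_weight <= 500
        and d.logp <= 5
        and d.h_bond_donors <= 5
        and d.h_bond_acceptors <= 10
    )


def passes_veber(d: Descriptors) -> bool:
    """Veber rule: limits on polar surface area and rotatable bonds."""
    return d.polar_surface_area <= 140 and d.rotatable_bonds <= 10


def filter_drug_like(compounds: list[Descriptors]) -> list[Descriptors]:
    """Keep only compounds that satisfy both rules."""
    return [c for c in compounds if passes_rule_of_five(c) and passes_veber(c)]


# Hypothetical descriptor values for illustration only.
candidates = [
    Descriptors("cmpd_1", 342.4, 2.1, 2, 5, 86.0, 4),    # passes both rules
    Descriptors("cmpd_2", 612.7, 3.0, 3, 8, 120.0, 7),   # molecular weight too high
    Descriptors("cmpd_3", 410.5, 4.2, 1, 6, 155.0, 12),  # fails the Veber rule
]

print([c.name for c in filter_drug_like(candidates)])
```

In practice the descriptor values would come from a cheminformatics toolkit or from the ADMET servers; here they are supplied by hand to keep the sketch self-contained.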

Toxicity Analysis
After a thorough analysis of the docking results, drug-likeness and toxicity characteristics were assessed with pkCSM [34], ProTox-II [35], and SwissADME [36], which are reported to be useful tools for calculating important drug-like descriptors, such as absorption, distribution, metabolism, excretion, and toxicity (ADMET), and for predicting lead-likeness with respect to mutagenicity and carcinogenicity.

Lead Identification
The most active inhibitors were identified based on docking score, ligand-protein interactions, and toxicity analysis, including molecular weight (MW), hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), partition coefficient (logP), polar surface area (PSA), rotatable bonds, rings, blood-brain barrier permeability, and Ames toxicity. The compounds showing the lowest binding energy, high lead-likeness, and the best interactions were selected as potential inhibitors of Scm (Fms10).

Molecular Dynamics Simulation
Molecular dynamics simulations were performed for 50 nanoseconds using Desmond, a package of Schrödinger LLC [37]. The initial structures of the protein-ligand complexes for molecular dynamics simulation were obtained from the docking studies. Molecular docking provides a prediction of the ligand binding status under static conditions, so simulations were carried out to predict the ligand binding status in a physiological environment. The protein-ligand complexes were preprocessed using the Protein Preparation Wizard of Maestro, which also included optimization and minimization of the complexes. All systems were prepared with the System Builder tool. TIP3P (Transferable Intermolecular Interaction Potential 3 Points) was selected as the solvent model, with an orthorhombic box. The OPLS_2005 force field was used in the simulation [38]. The models were neutralized by adding counter ions where needed. To mimic physiological conditions, 0.15 M salt (NaCl) was added. The NPT ensemble (isothermal-isobaric: moles (N), pressure (P), and temperature (T) are conserved) at 300 K and 1 atm was selected for the complete simulation. The models were relaxed before the simulation. The trajectories were saved every 50 ps for analysis, and the stability of the simulations was evaluated by calculating the root mean square deviation (RMSD) of the protein and ligand over time.
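For reference, the RMSD used as the stability measure can be sketched as follows. With frames saved every 50 ps over 50 ns, the trajectory contains on the order of 1000 frames, and each frame is compared with a reference structure. This is a minimal sketch and not Desmond's implementation: it assumes the frame has already been superimposed on the reference and that atoms are listed in the same order. The function name and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(frame: list[Coord], reference: list[Coord]) -> float:
    """Root mean square deviation between two pre-aligned coordinate sets.

    RMSD = sqrt( (1/N) * sum_i |r_i - r_ref_i|^2 ), in the units of the input.
    """
    if not frame or len(frame) != len(reference):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


# Hypothetical two-atom example: every atom is shifted by 1 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(frame, reference))
```

Applying such a function to every saved frame gives the RMSD-versus-time curve from which stabilization is judged.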

Results and Discussion
The amino acid sequence of the target protein of Enterococcus faecium was retrieved from the UniProt database (I3U5K9). The structure of the protein was generated by I-TASSER and is shown in Figure 2. The overall quality of the structure was 90.1% according to Rampage, as reported in Table 1 and Figure 3. The predicted structure contains 24 helices, 36 sheets, and 40 coils. Scm (Fms10) of E. faecium DO has a total of 14 B-repeats, of which 13 are beta-repeats of 19 residues in length and 1 is a partial repeat of 10 residues; the protein also contains a signal peptide. The subcellular localization in Figure 4A shows that a major part of the protein lies in the cell wall region. Domains and functional sites, and lipoprotein signal peptides, of the target protein are shown in Figure 4B,C, respectively.
The Hidden Markov Model method was used for the prediction of lipoprotein signal peptides of Gram-positive bacteria. Transmembrane helices were predicted using the TMHMM server v. 2.0. The Exp number reported by TMHMM is the expected number of amino acids in transmembrane helices (TMHs). If this number is larger than 18, the protein is very likely to be a transmembrane protein or to have a signal peptide. The total probability of N-in is the total probability that the N-terminus is on the cytoplasmic side of the membrane.
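The threshold on the expected number of amino acids in TMHs can be applied programmatically, as in the sketch below. It assumes TMHMM's one-line-per-protein short output, in which tab-separated fields carry key=value pairs such as ExpAA and PredHel; that layout should be checked against the actual server output. The example line and its values are hypothetical and are not results for Scm (Fms10).

```python
def parse_tmhmm_short(line: str) -> dict[str, str]:
    """Parse one tab-separated line of key=value fields into a dictionary."""
    fields = line.strip().split("\t")
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, _, value = field.partition("=")
        record[key] = value
    return record


def likely_tm_or_signal_peptide(record: dict[str, str], threshold: float = 18.0) -> bool:
    """Apply the rule: expected number of amino acids in TMHs larger than 18."""
    return float(record["ExpAA"]) > threshold


# Hypothetical output line for illustration only.
example_line = (
    "hypothetical_protein\tlen=300\tExpAA=22.50\tFirst60=20.10"
    "\tPredHel=1\tTopology=i7-29o"
)

record = parse_tmhmm_short(example_line)
print(record["ExpAA"], likely_tm_or_signal_peptide(record))
```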
For the bacterial protein, the expected number of amino acids in TMHs was more than 18, and almost all graphs showed the same regions protruding into and outside the cell and the same transmembrane area (see Figure 4D).
There were 211 inhibitors retrieved against collagen-binding MSCRAMM Scm (Fms10) from PubChem, ChEBI, and the literature. All of these were selected based on their inhibitory effect on collagen-binding MSCRAMM Scm (Fms10), which is involved in multi-drug resistance. Among the 211 compounds, 161 failed Lipinski and Veber filtering (for example, molecular weight > 500, logP > 5, H-bond donors > 5, H-bond acceptors > 10, PSA > 140 Å², or rotatable bonds (RB) > 10).
The selected inhibitors were docked with collagen-binding MSCRAMM Scm (Fms10), and the 10 ligands with the best binding affinities were chosen (Table 2). These inhibitors were analyzed with LigPlot to determine the amino acids involved in protein-ligand binding interactions.
Based on binding energies, the top 10 inhibitors were chosen to generate a pharmacophore model. The merged pharmacophore model, with matching features such as hydrogen bond donors, hydrogen bond acceptors, and aromatic rings, is shown in Figure 5.
After pharmacophore modeling, virtual screening of the ZINC library was performed with LigandScout to identify compounds with features like those of the pharmacophore model. Out of 30,737 compounds, 21,331 hits (hit rate: 70%) were identified. Based on the pharmacophore-fit score, the top 100 compounds were selected, and the rule of five and the Veber rule were applied again. After applying both rules, 15 novel compounds were selected for molecular docking with the target protein. Molecular docking of Scm (Fms10) with the selected novel compounds was conducted using AutoDock Vina (Table 3).
After docking with AutoDock Vina, the ADMET properties of the novel compounds were determined, and one compound (ZINC48942) was identified as the most active of all molecules after toxicity analysis. Its properties and interactions are shown in Table 4 and Figure 6.
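The reported hit rate can be checked with a short calculation, using the hit and library counts stated above; the function name is hypothetical. The exact ratio is about 69.4%, which is consistent with the rounded value of 70%.

```python
def hit_rate(hits: int, screened: int) -> float:
    """Percentage of screened compounds that matched the pharmacophore."""
    if screened <= 0:
        raise ValueError("number of screened compounds must be positive")
    return 100.0 * hits / screened


# Counts taken from the virtual screening result reported in the text.
print(round(hit_rate(21331, 30737), 1))
```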
The Desmond simulation trajectories were analyzed. Root mean square deviation (RMSD), root mean square fluctuation (RMSF), and protein-ligand contacts were calculated from the MD trajectories. Figure 7 shows the evolution of RMSD values over time for the backbone atoms of the ligand-bound protein. The RMSD plot indicates that the complex reaches stability at 10 ns. From then on, an average RMSD value of 2.2 Å persists up to 50 ns, and changes in RMSD remain within 2.2 Å for the rest of the simulation, which is quite acceptable for small, predicted proteins. After stabilization, the ligand-fit-to-protein RMSD fluctuates within 1.0 Å. These results indicate that the ligand remains stably bound to the binding site of the receptor during the simulation period.
Figure 9. Protein secondary structure element distribution by residue index throughout the protein structure. Red columns indicate alpha helices, and blue columns indicate beta-strands.
Most of the important protein-ligand interactions determined with MD are hydrogen bonds and hydrophobic interactions, as depicted in Figure 10. THR_421, SER_22, and THR_423 are the most important residues in terms of H-bonds. The stacked bar charts are normalized over the course of the trajectory: for example, a value of 1.0 means that the specific interaction was maintained for 100% of the simulation time. Values over 1.0 are possible because some protein residues may make multiple contacts of the same subtype with the ligand.
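The normalization of the contact histogram can be illustrated with a minimal sketch: each contact is counted over all frames and divided by the number of frames, so a residue that makes two contacts of the same type in one frame can exceed 1.0. This is an assumed reading of the normalization described above, not Desmond's code, and the frames below are hypothetical.

```python
from collections import Counter

Contact = tuple[str, str]  # (residue label, interaction type)


def interaction_fractions(frames: list[list[Contact]]) -> dict[Contact, float]:
    """Contacts per frame, averaged over the trajectory.

    A value of 1.0 means one contact in every frame on average; values
    above 1.0 arise when a residue makes several contacts of the same
    type with the ligand in a single frame.
    """
    if not frames:
        raise ValueError("trajectory must contain at least one frame")
    counts = Counter(contact for frame in frames for contact in frame)
    return {contact: n / len(frames) for contact, n in counts.items()}


# Hypothetical four-frame trajectory for illustration only.
frames = [
    [("THR_421", "H-bond"), ("THR_421", "H-bond"), ("SER_22", "H-bond")],
    [("THR_421", "H-bond")],
    [("THR_421", "H-bond"), ("THR_421", "H-bond")],
    [("THR_421", "H-bond"), ("SER_22", "H-bond")],
]

for contact, fraction in interaction_fractions(frames).items():
    print(contact, fraction)
```

In this toy trajectory THR_421 averages 1.5 hydrogen-bond contacts per frame, while SER_22 is in contact in half of the frames.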

Conclusions
Several multidisciplinary methods have gained research attention as ways to reduce the time and cost of the drug development process. The motivation of this research work was to find target proteins and then select inhibitors for infectious Enterococcus faecium strains. From the ZINC database, we selected chemical compounds that inhibit the effect of the Scm (Fms10) protein. Pharmacophore modeling with virtual screening and docking analysis helped to separate out the compounds having the lowest binding energy with the target protein. The identified inhibitor ZINC48942 targets the receptor and can inhibit adherence and the spread of infection in E. faecium. We conclude that this compound could be used as a lead compound to develop a drug that can selectively act against E. faecium infections without interfering with the activities of the human proteasome. These findings will be beneficial for the scientific community and could aid in the design of a new drug against E. faecium infections.

Conflicts of Interest:
The authors declare no conflict of interest.

Figure 10. Protein-ligand contact histogram.
