Discovery of new TLR7 agonists by a combination of statistical learning-based QSAR, virtual screening, and molecular dynamics

Search for new antiviral medications has surged in the past two years due to the COVID-19 crisis. Toll-like receptor 7 (TLR7) is among one of the most important TLR proteins of innate immunity that is responsible for broad antiviral response and immune system control. TLR7 agonists, as both vaccine adjuvants and immune response modulators, are among the top drug candidates for not only our contemporary viral pandemic but also other diseases. The agonists of TLR7 have been utilized as vaccine adjuvants and antiviral agents. In this study, we hybridized a statistical learning-based QSAR model with molecular docking and molecular dynamics simulation to extract new antiviral drugs by drug repurposing of the DrugBank database. First, we manually curated a dataset consisting of TLR7 agonists. The molecular descriptors of these compounds were extracted, and feature engineering was done to restrict the number of features to 45. We applied a statistically inspired modification of the partial least squares (SIMPLS) method to build our QSAR model. In the next stage, the DrugBank database was virtually screened structurally using molecular docking, and the top compounds for the guanosine binding site of TLR were identified. The result of molecular docking was again screened by the ligand-based approach of QSAR to eliminate compounds that do not display strong EC50 values by the previously trained model. We then subjected the final results to molecular dynamics simulation and compared our compounds with imiquimod (an FDA-approved TLR7 agonist) and compound 1 (the most active compound against TLR7 in vitro, EC50 = 0.2 nM). Our results evidently demonstrate that cephalosporins and nucleotide analogues (especially acyclic nucleotide analogues such as adefovir and cidofovir) are computationally potent agonists of TLR7. We finally reviewed some publications about cephalosporins that, just like pieces of a puzzle, completed our conclusion.


Introduction
TLR7, as a member of the Toll-like receptor (TLR) family, is located in the endosomal compartment. TLR7 is principally expressed in plasmacytoid dendritic cells and B cells, but a low level of its expression can be observed in non-immune cells such as hepatocytes, epithelial cells, and keratinocytes. Stimulation of TLR7 in dendritic cells causes the production of various interleukins like IL-12 and type I interferons (IFNs) by these cells. IL-12 enhances the cytotoxic activity of T lymphocytes and natural killer cells [1]. TLR7 receptor is able to recognize viral single-stranded viral RNA (ssRNA) as well as synthetic tricyclics belonging to the imidazoquinoline series by using two different pockets [2].
The pathway triggered in TLR activation is either mediated by myeloid differentiation primary response gene 88 (MyD88) or TIRdomain-containing adapter-inducing interferon-β (TRIF). TLR7 carries its signals via the activation of MyD88, and its activation is confirmed to be involved in 1-the production of many cytokines such as IL-29 (implicated in the immune response against pathogens), 2-the upregulation of anti-tumor proteins like tissue inhibitor of metalloproteinases 1 (TIMP1) and phosphatidylinositol 3,4,5-trisphosphate 3phosphatase and dual-specificity protein phosphatase (PTEN), and 3-the marginal suppression of oncogenes like vascular endothelial growth factor (VEGF) [3]. The development of antiviral and antibacterial compounds, adjuvants in vaccines, and new drugs for cancer therapy, allergy, and asthma have been inspired by the TLR7 mechanism of action [4]. Upon activation of the TLR7 pathway, a variety of transcriptional factors and interferon regulatory factors are recruited, and various cytokines will be released, including the type I interferons (namely IFN-α and IFN-β), which act in both paracrine and autocrine manners on infected and uninfected cells to generate antiviral immunity. The pathway of IFN-I activation finally leads to the activation of interferon-stimulated genes (ISGs), which are a broad set of proteins involved in versatile aspects of response to viral pathogens. For example, protein kinase R, activated by such ISGs, inhibits viral protein synthesis, or the 2 ′ ,5 ′ -oligoadenylate synthetase family degrades viral RNA [5].
Imiquimod and its derivatives resiquimod and gardiquimod are among the most studied TLR7 agonists (Fig. 1). Imiquimod (compound 9) is a tricyclic nitrogen molecule belonging to the imidazo [4,5-c] quinoline series, which indirectly inhibits viral replications of certain viruses such as herpes simplex virus type 2 (HSV-2), Sendai virus, human papillomavirus (HPV), Rift Valley fever virus (RVFV), hepatitis C virus (HCV), Banzi virus, poxvirus (molluscum contagiosum disease) by induction of IFN-α [6]. Typically, imidazoquinolines like resiquimod can activate TLR7 and TLR8 simultaneously [7], but some of them, like imiquimod, can act as a TLR7-specific agonist [8]. Aldara® (5% cream) and Zyclara® (3.75% cream) are two commercially available FDA-approved imiquimod-containing products for the treatment of actinic keratosis, superficial basal cell carcinoma, external genital, and perianal warts in people of 12 years of age or older [9]. Antiviral activity of imidazoquinoline is perhaps their most notable studied activity, but their therapeutic significance is not limited to that. Gardiquimod has been employed in cancer therapy. Inhibition of cell proliferation, promotion of apoptosis, and suppression of metastasis are suggested as possible mechanisms that this drug exerts its anti-tumor activity [14]. Imiquimod was shown to prevent the growth of cutaneous breast cancer by a CD8 + T cell-dependent mechanism [15]. Resiquimod has been illustrated to have tumor regression activities by impeding the suppression that is mediated by tumor-associated macrophages (TAMs) and myeloid-derived suppressor cells (MDSC) in the tumor microenvironment [16]. The reversal of suppressive actions of regulatory T cells (Treg) and stimulation of natural killer cells are two other pathways by which TLR7 activation exert anticancer effects [17]. Though little is known about how TLR7 can recognize and develop an immune response against bacterial RNA, its activity against bacterial recognition in conventional dendritic cells and subsequent activation of T helper type 1 (Th1) responses is established [18].
Due to the high similarity in function and structure, TLR7 and TLR8 are studied together most of the time. Pyrimidine and purine derivatives, (hetero)bicyclic compounds, (hetero)tricyclic compounds, macromolecular RNA or DNA have demonstrated TLR7/8 agonistic activity. The last group (DNA and RNA oligomers) seems less promising than others because TLR7 agonists will be ineffective unless sufficiently delivered to endosomal compartments, where TLR7 resides. Compound 4 ( Fig. 1), as an 8-hydroxyadenine derivative, demonstrated an IFNinducing activity of 10 times higher than imiquimod and with better oral tolerance. Guanosine analogues such as loxoribine activate immune cells exclusively via TLR7. In vitro studies of loxoribine efficacy in mice display substantial inhibition of B16 melanoma lung tumor metastasis [4].
Computer-aided drug design (CADD) methods have facilitated the discovery of new compounds without recourse to expensive synthetic tools and various pharmacological tests. This article consists of different phases of computational methods, aiming to verify each other in a stepby-step fashion.  50 values and some other well-known TLR7 agonists [10]. Also, the EC 50 values for imiquimod [11], resiquimod [12], and gardiquimod [11] have been deduced from the references in the BindingDB database [13].

Overview
In the first phase, 256 TLR7 agonists were collected from a number of articles. In the next phase, molecular descriptors of these compounds were calculated with the MOE software. The prepared data must have been curated in a way to improve the quantitative structure-activity relationship (QSAR) model quality. Not all of the descriptors could be implemented with reliable predictability for the final model. Thus, redundant descriptors had to be excluded. Our feature selection method for this process was forward and backward elimination. We tested different methods to this aim, but the best method was found to be R Square. Statistical learning procedures were used to create the relationship between molecular descriptors and the potency of each compound to generate a robust QSAR model. Docking-based virtual screening as the other computational method was implemented to suggest new repurposed drugs from the DrugBank database capable of agonizing the TLR7 receptor. In the next step, the statistical-based model was used to filter out the best-predicted compounds in the virtual screening step to yield the top compounds. It is essential to apply a statistical algorithm to confirm data prepared by virtual screening as a dual-validating procedure. At the final phase, molecular dynamics was utilized as the ultimate computational verifying tool to unravel the real binding residency and the interactions of the ligand molecules with the TLR7 receptor and to determine the correctness and veracity of the virtual screening and machine-learning processes. This work-flow is depicted in Fig. 2, and each procedure will be elaborated in the next parts in further detail.

Dataset collection and descriptors calculation
A dataset of 256 TLR7 ligands with their EC 50 and structures was generated from the literature [10,[19][20][21][22][23][24][25][26][27]. This dataset was generated using a set of search criteria in Scopus database using these three keywords "TLR7" AND "Agonist" AND "EC 50 ". A total of 13 results have been obtained in the Scopus, which were manually filtered for the studies that used the design/discovery purposes of novel compounds. In these selected studies, some compounds had displayed no or very mild agonistic activity on TLR7, which were also included in the dataset as decoys. Random number generation of calculators was used to assign a number to the EC 50 of such values. Then the structures were imported manually in the Molecular Operating Environment (MOE) software and prepared by the QuickPrep module. In the structure preparation, Pro-tonate3D was exploited for the protonation of the required atoms. The 3D structure preparation was performed to reach a gradient of less than 0.001 kcal/mol/Å. After preparation, a dataset of these molecules was built along with their EC 50 values. Some of the EC 50 had no activation for concentrations less than a specified range. We used a random number generator (in that range) to create random EC 50 values since having diversity in the range of EC 50 values by data augmentation has gained much attention due to its undeniable improvement in the model accuracy.
In the MOE software (2015.10 version), physicochemical descriptors can be computed, including 2D, i3D, and x3D descriptors. Many of the descriptors with constant 0 or 1 for all molecules manually deleted (expert opinion) and 135 remained or with irrelevant values that could not be used in the QSAR model was removed. Descriptors that were used for building the model were further screened in the feature selection.

Feature selection
For the best combination of descriptors, the forward and backward selection method was used. The EC 50 was converted to a negative logarithmic scale, and all features (descriptors) were normalized. The Pvalue threshold (0.25) was used as the stopping rule.
The following statistics have been applied to the feature selection model: RMSE (root mean square error), AICc (Akaike information criterion, corrected), BIC (Bayesian information criterion), and R 2 (coefficient of determination). Both the AIC and BIC combine absolute fit by considering model parsimony. In other words, they penalize by adding parameters to the model, but they achieve this by different means. Of the two, the BIC penalizes by adding parameters to the model more strongly than the AIC. For both of these parameters, a more negative value represents a better fit and will be chosen as the final model [28].

Partial least squares regression (PLSR) model
In this study, the statistically inspired modification of the partial least squares (SIMPLS) method was used to predict EC 50 based on the descriptors. Partial least squares (PLS) is known as a multivariate machine learning or statistical learning algorithm that succeeds in circumstances where the application of ordinary least squares does not produce satisfactory results, such as highly correlated X variables, several Y (dependent) variables, and lots of X variables [29]. The SIMPLS method is suitable for large-scale datasets with collinearity, such as QSAR data. PLSR is an expeditious method because it converts complex matrix calculations into simple regressions. PLSR works by discovering a linear regression model by projecting the predicted variables and the response variables into a new lower-dimensional space, similar to principal component analysis (PCA) that regulates the collinearity among the variables. PLSR exploits the correlations between the Xs and the Ys to reveal the underlying latent structure and to identify the principal components (PCs) that explain the highest covariance between the predictors and the response. Dataset centering and scaling were performed on the dataset, which means the data matrix has zero mean and unit variance.
In this step, the following statistical values were calculated: root mean PRESS (root mean predicted residual sum of squares), R 2 , and Q 2 (the cross-validated coefficient of determination), and VIP (variable importance projection).

Model evaluation
Leave one out cross-validation (LOOCV) was used for determining the optimal number of factors to extract. All the processes mentioned above were done using JMP Statistical Discovery Software 13. Some Python libraries were used for other analyses (Seaborn and Matplotlib for graph drawings and NumPy and Pandas for numerical computations and analyses).

Root mean predicted residual sum of squares (PRESS)
The root mean PRESS depends on the cross-validation process. The PRESS residuals are defined as e(i) = y i -y ' I where y'i is the predicted value for the ith test compound, y i is its observed value, and n is the total number of objects (ligands) in the entire data set. It is noteworthy to state that in these equations, y generally refers to the biological value (EC 50 ), and x refers to the physicochemical features in the QSAR model.
The process is repeated for all n observations, and the PRESS statistic is computed as follows: where the notation i/i indicates that the model predicts the response estimated when the ith sample is left out from the training set.

Cross-validated coefficient of determination (Q 2 )
Q 2 is calculated as follows: where yi is the calculated average of the observed values, and SSY is the sum of squares for Y averaged across all responses and based on the validation set observations.

Molecular docking-based virtual screening
Virtual screening is a computationally intensive method utilized for the identification of potent ligands from a training set of compounds extracted from a database. Hence, the full database of DrugBank was extracted and employed for this purpose.
Extra precision (XP) molecular docking technique was performed to determine the best structures due to their docking energies and the molecular interactions with TLR7. The prepared training set was imported to Glide of Schrodinger, and the energy values (XP GlideScores (XP GScore)) were computed using the Optimized Potentials for Liquid Simulations 2005 (OPLS_2005) force-field. Glide was enabled to sample flexible ligand structures based on considering ring conformations and nitrogen inversion of ligands. Epik state penalties of stable ligands were calculated at a pH of 7.2 (intracellular pH) and added to the GlideScores [30,31].
Finally, the structures with the most negative docking scores were passed to the QSAR model evaluations. The top ten compounds, which displayed the lowest EC 50 values on the statistical learning-based QSAR model, were further assessed for their interactions with TLR7 by molecular dynamics studies. The recent structure of TLR7 (5ZSF from PDB database, from Macaca mulatta) was used for this study. TLR7 is known to harbor two spatially distinct binding sites based on the study of Zhang et al. [32]. The first binding site, which is conserved in TLR7 and TLR8 structures, recognizes small molecules and is critical for their activation, and the second pose recognizes single-stranded RNA molecules (ssRNA). In that study, the authors also stated that the first site preferentially binds to guanosine, whereas the second site typically binds to uridine moieties in ssRNA structures. In this study, the first binding site was used since it has been evolutionary designed to accommodate small molecules (guanosine analogues). This binding site is located a bit down at the intersection of two subunits of TLR7. Lys432 was used at the center of the grid generation with a box of 25*25*25 Å size.

Molecular dynamics (MD)
Molecular dynamics (MD) is a widely used computational method for the understanding of the interactions between various molecules under desired circumstances [33,34]. MD studies were used in this study to verify if the ligand is able to reside in the binding pocket of the TLR7 for an acceptable period of time with satisfactory binding properties or not. In other words, MD acted as a verification part for the virtual screening that was achieved using the QSAR and molecular docking section.
MD studies were performed using the Desmond package (D.E. Shaw research [35]) for a period of 20 ns for each (top eight) ligand. OPLS_2005 force-field was applied for the simulation, with the SPC water model and a concentration of 0.15 M of NaCl. Steepest descent minimization and the Limited-memory Broyden--Fletcher-Goldfarb-Shanno (LBFGS) method of energy minimization were used to converge the energy of the system to a gradient of 1 kcal/mol/Å with a maximum iteration of 2000. MD simulation was set in NPT (constant atom numbers, pressure (1.01325 bar), and constant temperature (310 K)) ensemble, and before running the MD simulation, simulated annealing was performed. In simulated annealing, the temperature of the system was enhanced to 400 K to remove the non-selective interactions for 0.5 ns and returned to the normal 310 K afterward. Additionally, the SHAKE algorithm was exploited to impose a constraint on the geometry of water molecules and heavy atom bond lengths with hydrogen to accelerate the calculations with acceptable precision. The Nose-Hoover chain and Martyna-Tobias-Klein approach were employed as the thermostat and barostat adjustment methods with 1.0 ps and 2.0 ps intervals using isotropic coupling style, respectively. The summation of long-range electrostatic forces was accomplished by the famous Particle Mesh Ewald (PME) approach. A 2 fs and 6 fs Reversible Reference System Propagator Algorithm (RESPA) integrator time-step was exploited for the calculation of near and far range forces. A cut-off radius of 9.0 Å was maintained for the calculation of the Coulombic forces. The Root Mean Square Deviation (RMSD), Root Mean Square Fluctuation (RMSF) of proteins and ligands, and also ligands' torsional profile were monitored in reference to the first frame of the simulation. Interactions lasting more than 15% of the time of simulation were documented in the final results.

Feature selection
Of the 135 descriptors, 45 features were selected. Several parameters were calculated for this step, which are depicted in Table 1 below.

SIMPLS QSAR model
Using the descriptors selected in the previous step, the SIMPLS model was constructed. In Fig. 3, we show the coefficients of essential descriptors in the SIMPLS model. The descriptors employed in our model are depicted in Fig. 4A. The model was able to provide satisfactory results ( Fig. 4B and C), as indicated by the good regression coefficient (R 2 = 0.8054) and cross-validation regression coefficient (Q 2 = 0.5856) values ( Table 2). The minimum root mean PRESS was 0.6534, and the minimizing number of factors was 14 (Fig. 4D). The maximum number of factors was set to 15, and 14 gave us the lowest minimizing factor value. Thirty-two descriptors could get a VIP value greater than 0.8, suggesting that they have played an important role in predicting EC 50 . The model provides a regression equation of the descriptors that is computationally efficient with desirable accuracy.
GCUT_SLOGP, reactive, SMR_VSA0, PEOE_VSA+1, and PEOE_VSA-3 were the most important descriptors. The GCUT descriptors exploit the atomic contribution to logP (using the Wildman and Crippen SlogP method) instead of partial charge. Reactive is an indicator of the presence of reactive groups. VSA is an abbreviation for van der Waals surface area, and SMR denotes the sum of v i such that R i is in [0,0.11], in which R i denotes the contribution to Molar Refractivity for atom i, and v i refers to the van der Waals surface area (Å 2 ) of atom i. The Partial Equalization of Orbital Electronegativities (PEOE) is a method for the calculation of atomic partial charges, and the charge is transferred between bonded atoms until equilibrium [36]. We can roughly conclude that features related to solubility and energy values were the most relevant features in our model.
Variable importance in projection (VIP) score estimates the importance of each variable in the projection used in the model (Fig. 5). The Python code of the SIMPLS model is attached to the supplementary materials.
The correlation matrix between the features and the target value (1/ Log(EC 50 )) is illustrated by Seaborn and Matplotlib (Fig. 6). In this figure lighter colors indicate less linear (Pearson) correlation whereas stronger colors indicate more robust linear correlations. We used this plot to decrease the amount of multicollinearity and to assess whether the contribution of each descriptor is sufficient enough to the accuracy of the final model.

Molecular docking-based virtual screening
The results of virtual screening by molecular docking identified the top compounds capable of binding to TLR7 at is guanosine binding site (with Lys432 as one of the most important residues). At this step, 8665 compounds were docked at the receptor site (the ligands that were too unstable in the binding site are therefore not counted in this number), and the top 100 were selected for the next step of QSAR-based verification. The most potent compound identified by molecular docking was 2 ′ deoxy-Thymidine-5 ′ -Diphospho-Alpha-D-Glucose with an XP GScore of − 16.24. It is no surprise that a nucleotide-linked sugar has the best docking score since this binding pocket is mainly designed to bind to guanosine. Several other nucleotides/nucleosides or their analogues also displayed amazingly negative energy values in docking. For example, 1-Phosphomethylphosphonic Acid-Guanylate Ester, 2-Phosphoaminophosphonic Acid Guanylate Ester, 3-Thymidine-5 ′ -Diphospho-Beta-D-Xylose were among the second to fourth top compounds in terms of their binding energies. Denufosol, a drug consisting of two conjoined nucleotides used for cystic fibrosis [38], ranked fifth in this category ( Table 3). The full list of the virtual screening results is provided in the supplementary materials.
Since nucleotides have been studied in various studies and their analogues can interfere with many pathways in viral pathogenesis, this study focused on other possible ligands for agonizing the activity of TLR7.
One of the most interesting and prominent results of this study is that cephalosporins like ceftiofur, cefotiam, ceftriaxone, cefdinir, cefditoren, and cefpodoxime displayed stronger inhibitory activity than previously recognized inhibitors like imiquimod and resiquimod. Typically, cephalosporins are β-lactam antibacterial agents used to combat a range of infections from gram-positive to gram-negative bacteria. Historically, they were derived from the fungus Cephalosporium sp., and by covalently inhibiting penicillin-binding proteins required for cell wall synthesis like peptidoglycan transpeptidase, they exert their pharmacological activity [39,40].
Surprisingly, there are some reports in the literature that mention the confirmed antiviral activity of cephalosporins. In a study dated back to 1988 by Swiss researchers, it was demonstrated that some cephalosporin derivatives were capable of inhibiting two DNA-containing viruses, namely vaccinia virus and herpes simplex I (HSV-I) but not RNAcontaining viruses (and also no activity on normal cells) [41]. It is very likely now that some of the effects observed for those derivatives were related to TLR pathways (essentially TLR7 and TLR8). Cephalosporins have also been documented as potential inhibitory compounds against nucleocapsid protein C-terminal domain (N-CTD) of SARS-CoV-2, with ceftriaxone indicating the strongest inhibitory potential [42]. Ceftriaxone has been tested for its inhibitory activity against novel human coronavirus (SARS-CoV-2), and it displayed 16.14 μM inhibition for this virus, and a stronger inhibition was also reported for cefoperazone (12.36 μM) [43]. There seems to be a loose connection between these inhibitory values and their corresponding inhibitory values for TLR7 (-12.614 XP GScore for ceftriaxone and − 7.825 XP GScore for cefoperazone). In an in silico study, cefotiam was found among the top potential compounds in inhibiting SARS-CoV-2 M pro [44]. Hobi et al. indicated that some derivatives or degradation products of ceftazidime could have an anti-HIV-1 effect irrelevant to CD4/gp120 interaction [45]. In another study by Pomorska-Mól et al., researchers found that ceftiofur distorts the humoral and cellular immune response to vaccines, as expected for a TLR agonist [46]. The amount of previous literature data that are coherent with our findings do not end here. Accumulating evidence points out that cephalosporins could be linked with some forms of autoimmunity. For example, acute and fatal incidences of cephalosporin-induced autoimmune hemolytic anemia have been reported, but the underlying mechanism has not been established [47][48][49][50][51]. Ceftriaxone, especially, has been reported to cause other types of autoimmune reactions like acute autoimmune hepatitis, which could lead to fulminant hepatic failure [52]. Ceftriaxone and cefepime have been linked to drug-induced autoimmune systemic lupus erythematosus (SLE) [53,54]. Cephalosporins and penicillins have been generally identified as drugs that could  exacerbate the progression of SLE [55]. In parallel with these findings, we know that some TLR agonists could be used as artificial producers of autoimmune reactions in animal models. In fact, TLR4 and TLR9 ligands are frequently used to induce autoimmune diseases or idiosyncratic adverse drug reactions in animal models because they stimulate inflammation and activate the innate immune system [56]. Even topical usage of TLR7 agonists has been exploited to induce lupus-like diseases in mice [57]. TLR7 antagonists have also been suggested as possible treatment strategies for autoimmune diseases [2]. These pieces of information strongly underscore the role of cephalosporins in TLR agonism and autoimmune diseases, which is now computationally confirmed by our study and has also been reported in many in vitro and in vivo conditions discussed earlier. It is likely that activation of several different TLRs could contribute to such effects observed for cephalosporins. This useful TLR agonism of cephalosporins can be cautiously used to treat viral diseases, especially when a bacterial cause is additionally suspected.
To add extra importance to the fact of how much TLR7 can exhibit antiviral properties, one should look at the paper by Bam and colleagues. They demonstrated that GS-9620, a TLR7 agonist, results in potent inhibition of HIV-1 infection in blood mononuclear cells [5]. Furthermore, GS-9620 has been shown to reduce liver and serum hepatitis B virus (HBV) DNA, HBeAg, and HBsAg concentrations in chronically infected chimpanzees [58]. Interestingly, resiquimod has been found to display antiviral activity against HIV replication in monocytes [59]. Among the compounds in the shortlisted series of potential TLR7 agonists, valtorcitabine has shown remarkable suppression of serum HBV DNA and has been well-tolerated in patients with chronic HBV [60]. Gemcitabine, a nucleoside analog used as an anticancer medication, has demonstrated broad-spectrum antiviral activity and can suppress enterovirus infections through innate immunity [61].

QSAR-based verification of virtual screening
The shortlisted compounds of virtual screening with the lowest negative binding energy (XP GScore of more negative than − 10) were subjected to QSAR-based verification (note that nucleotide analogues were knowingly ignored, as stated earlier). In this step, the top compounds with agonistic activity on TLR7 were identified. Since molecular docking can only tell us about the binding and not necessarily the agonistic or antagonistic effects, this verification can enhance the accuracy of finding the right compounds with demonstrable agonistic activity (and not merely strong binding affinities). The best compounds of this section were ceftiofur, cefotiam, ceftriaxone, cefdinir, cidofovir, gemcitabine, cefditoren, valtorcitabine, cytochlor, cefpodoxime, abacavir hydroxyacetate, adefovir, and D-Eritadenine, 5-[4-Tert-Butylphenylsulfanyl]-2,4-Quinazolinediamine (DB01958). The first eight compounds were tested in an additional step using MD simulation to ensure if they reveal considerable binding residency in the binding pocket of TLR7. Table 4 presents the QSAR results of the verification parts for the top ten compounds.

Molecular dynamics (MD)
The MD was performed for the strongest TLR7 agonist (Fig. 1,  compound 1), imiquimod, and 6 other compounds identified in this study (three of them were cephalosporins, including ceftriaxone, cefditoren, and cefpodoxime plus 3 other compounds, namely cidofovir, adefovir, and D-Eritadenine). In all cases, the ligand remained stable in the binding pocket attached to TLR7, but there were some nuances between each ligand molecule (their binding orientation and affinity), which will be discussed in extensive detail.
The most impactful results were seen for adefovir and cefditoren, and we limit the discussion here for these two compounds compared with imiquimod and compound 1 (for a full description of the results for each  compound, please visit the supplementary material; cefpodoxime, D-Eritadenine, ceftriaxone, and cidofovir were the other four top compounds). The first fact when analyzing the results should be that we exploited the 5ZSF crystal structure from PDB that is TLR7 cocrystallized with imiquimod. Therefore, the RMSD diagram is considerably lower for imiquimod than other ligands (Fig. 7A). This should not give a false impression that other compounds are weak inhibitors of TLR7. Indeed, the MD results of compound 1 strongly support this concept (we expected to see lower fluctuations in the RMSD diagram of this compound compared with imiquimod). In line with this observation, compound 1 was set as the true reference to evaluate the binding fluctuations of different drug ligands in the guanosine binding pocket of TLR7 (Fig. 7B). Adefovir and cefditoren displayed the most promising results in MD simulation among all compounds studied in this research work. After stabilization in the binding site (after 7.5 ns), adefovir exhibited less than 0.1 Å, which is quite remarkable and even superior to compound 1 (Fig. 7C). Cefditoren seemed to have been stabilized even faster but displayed around 0.15 Å fluctuations, again superior to Fig. 6. The correlation matrix between the 45 features and target values. The more the colors are dispersed, the more the model lacks multicollinearity, and the more robust the model will become. The color (value) of the first row can be an indicator (beside VIP) of the contribution of the descriptor in the prediction of EC 50 (this plot is drawn using Seaborn). For a full list of abbreviations of this diagram, please visit the supplementary material of Bernal et al. [37]. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Table 3
The list of some of the compounds with strong binding affinities to TLR7 (less than − 10 XP GScores). Nucleotide analogues are intentionally limited to the top five compounds (rows 1 to 5). Cephalosporins displayed large (absolute) values for XP GScore, indicating high potential for their application as TLR7 agonists.
Name of the compound XP GScore  compound 1 (Fig. 7D). RMSF study of the protein for various ligands, in general, appeared to be similar for the compounds of this study with some subtle differences. Interestingly, cefditoren indicated lower RMSF values (even lower than imiquimod as the co-crystallized ligand of TLR7), which signifies a very strong binding orientation for this ligand in the TLR7 pocket (visit supplementary material, protein RMSF). Based on the RMSF diagram of the protein, it can also be inferred that the residues engaged in Fig. 7. RMSD diagrams of the imiquimod (FDA-approved drug that acts by agonizing TLR7), compound 1 (the most potent TLR7 agonist with 0.2 nM EC 50 ), and two of the most potent inhibitors discovered in this study, i.e., adefovir and cefditoren. RMSD values have been calculated for both Cα of TLR7 and ligands compared with their first frame. interaction with the ligands are the same for adefovir, cefditoren, and compound 1, but a bit different from imiquimod. Imiquimod did not display any interactions with the ligand in the residues around 260 to 390, whereas the three others indicated such interactions. Thr586 was one of the most prominent residues for the interaction of all compounds, highlighting its importance in the guanosine binding site of TLR7. Asp555 was another highly important residue in the guanosine binding site, having interactions with all four compounds. Phe408 displayed pipi stacking bonds with various rings in the ligands for all compounds. Lys432 that was used as the center of our grid for the molecular docking, represented hydrogen bonds with adefovir and compound 1 (Fig. 8).
Analyses of protein secondary structure during the simulation unraveled that cefditoren tends to fold the structure of the protein more than the other three structures at about 0.5% more compared with imiquimod and adefovir and 0.3% more than compound 1. It is not hard to suppose a possible link between such folding stability and ligands' binding residency (see supplementary material, protein secondary structure).
In summary, we want to lay emphasis on the important but possibly neglected role of cephalosporins in increasing the immune response by TLR activation. Cefditoren, which is our most potent cephalosporin capable of interacting with TLR7, has been reported to rarely cause drug reaction with eosinophilia and systemic symptoms (DRESS) [62]. Such reports are in keeping with our results that cephalosporins are strong modulators of TLR pathways in the cells.

Conclusion
TLR7 agonists are increasingly being heralded as immunostimulatory molecules, and their usage can encompass antiviral agents to adjuvants in vaccine formulations. We performed three independent computational approaches in a step-wise verifiable manner to reach the top compounds via a drug repurposing or repositioning approach.
First, we built a dataset based on previously characterized TLR7 agonists. Their physicochemical descriptors have been calculated, and after feature selection, a statistical learning method (SIMPLS) was employed to generate the QSAR model. Thus, the first computational model was constructed using QSAR. We then accomplished a structurebased virtual screening using molecular docking to rank the top compounds that were capable of interacting with the guanosine binding pocket of TLR7. At this step, nucleotide analogues were found to have the most potent binding affinity to TLR7. Some cephalosporins ranked very high in our virtual screening, and we decided to study further whether these compounds are suitable TLR7 binders or not.
Second, we used our previously trained QSAR model to act as a ligand-based screening for the results of structure-based virtual screening by molecular docking. Some previously identified antiviral agents, namely cidofovir and adefovir, in addition to cephalosporins, exhibited remarkable predicted EC 50 values, more negative than imiquimod.
Last, we did an MD simulation to understand whether the ligand-TLR7 complex remains stable until a decent time or not. Cefditoren of cephalosporins and adefovir from antivirals represented the best binding properties and are suggested as very potent TLR7-binders. The astute reader should remember that since the QSAR model was trained based on the agonists (and not merely binders to TLR7), our model has high reliability in terms of functionality (acting as agonist/antagonist versus only binding to the binding site of TLR7).
Our literature surveys were further confirmed our findings that cephalosporins modulate the immune system. There are plenty of studies (enumerated in the discussion) indicating the possible relationship of cephalosporins in drug-induced autoimmunity (which is expected for TLR7 agonists as well). Our findings could shed light on the smarter usage of cephalosporins and their possible repurposing approaches in viral diseases, especially when the immune system becomes compromised or weakened, e.g., in acquired immunodeficiency syndrome (AIDS).
In short, cefditoren and adefovir demonstrated prominent in silico binding affinity to TLR7, even stronger than imiquimod and compound 1 (the most potent in vitro TLR7 agonist to date). Cefpodoxime, D-Eritadenine, ceftriaxone, and cidofovir were among the next candidates for strong TLR7 agonistic activity. Future in vitro and in vivo studies would uncover how precisely our three-combined (QSAR-docking-MD) platform has performed.

Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi. org/10.1016/j.imu.2021.100787.