QSAR and Structure Based Molecular Docking of Angelicin Compounds: an Attempt to Drug Design against Swine Influenza

Swine influenza is one of the viral family members of Orthomyxoviridae. It contains ribonucleic acid (RNA) as genetic component and has three serological types including A, B and C respectively of which A subtype is a very virulent and infectious which can attack to human [1]. This RNA virus contains hemagglutinin (HA) and neuraminidase (NA) in the cell surface glycoproteins which produces virulent effect. A number of 16 HA (H1 to H16) and 9 NA (N1 to N9) are known in viral glycoproteins. Therefore, a total number of 16 × 9=144 distinct progenies could be produced by combinatorial combination of different types of HA and NA. These progenies may be exemplified by H1N1, H1N2.... , H2N1, H2N2...H3N2....H3N3.. and so on. Currently, H1, H2 and H3 in combination with N1 and N2 are most frequently dormant strains in human. Remaining subtypes are to be zoonotic, causing disease mainly in fowl and nonhuman primates [24]. Swine influenza is originated in swine and easily picked up by wild aquatic birds and other animal species like birds, pigs, ferret, horses, seals, whales, mink, giant anteaters, cats and dogs in which infection is largely intestinal, waterborne and asymptomatic. Then the viral strains emit to the roaming environment of these animals via nasal secretions, saliva, cough, tears and intestinal diarrheal materials. The reasserted strains may infect to the humans adjacent proximity of these animals and cross-infection occurs by transmitting of viral genetic mutant drift [5] between humans and these animals. This may produce genetic reassortment of the viral strains which become very dangerous. This genetically mutated drift can be speeded around the world and killed almost 100 million people in 1918 in Spain due to H1N1 whereas H2N2 engulfed a number of 4 million people in 1957 in Asia. H3N3 caused death of about one million people in Hong Kong in 1968, and in 2007 pathogenicity of H5N1 Avian influenza strain caused global Health threat. Currently in 2009, swine influenza (H1N1) outbreak in Mexico and other parts of the world has led to issuances of pandemic alertness by the WHO [6,7]. According to WHO approximately 526,060 cases of pandemic H1N1/09 infection and at least 6770 deaths were reported all over the world by 15 November 2009. This has suddenly arisen in North America and spread rapidly in Europe, Asia and South Africa via human to human transmission within a very short span [8]. As the devastating impact of swine influenza is enormous, a renewed drug discovery effort worldwide is essential to counteract the disease more efficiently. From the literature search it was shown that there is no specific chemotherapeutic agent against swine influenza caused by H1N1 viable strain. Therefore researchers have been trying to develop anti swine influenza chemotherapeutics which are able to combat against swine influenza pandemic.


Introduction
Swine influenza is one of the viral family members of Orthomyxoviridae. It contains ribonucleic acid (RNA) as genetic component and has three serological types including A, B and C respectively of which A subtype is a very virulent and infectious which can attack to human [1]. This RNA virus contains hemagglutinin (HA) and neuraminidase (NA) in the cell surface glycoproteins which produces virulent effect. A number of 16 HA (H1 to H16) and 9 NA (N1 to N9) are known in viral glycoproteins. Therefore, a total number of 16 × 9=144 distinct progenies could be produced by combinatorial combination of different types of HA and NA. These progenies may be exemplified by H1N1, H1N2…. , H2N1, H2N2…H3N2….H3N3.. and so on. Currently, H1, H2 and H3 in combination with N1 and N2 are most frequently dormant strains in human. Remaining subtypes are to be zoonotic, causing disease mainly in fowl and nonhuman primates [2][3][4]. Swine influenza is originated in swine and easily picked up by wild aquatic birds and other animal species like birds, pigs, ferret, horses, seals, whales, mink, giant anteaters, cats and dogs in which infection is largely intestinal, waterborne and asymptomatic. Then the viral strains emit to the roaming environment of these animals via nasal secretions, saliva, cough, tears and intestinal diarrheal materials. The reasserted strains may infect to the humans adjacent proximity of these animals and cross-infection occurs by transmitting of viral genetic mutant drift [5] between humans and these animals. This may produce genetic reassortment of the viral strains which become very dangerous. This genetically mutated drift can be speeded around the world and killed almost 100 million people in 1918 in Spain due to H1N1 whereas H2N2 engulfed a number of 4 million people in 1957 in Asia. H3N3 caused death of about one million people in Hong Kong in 1968, and in 2007 pathogenicity of H5N1 Avian influenza strain caused global Health threat. Currently in 2009, swine influenza (H1N1) outbreak in Mexico and other parts of the world has led to issuances of pandemic alertness by the WHO [6,7]. According to WHO approximately 526,060 cases of pandemic H1N1/09 infection and at least 6770 deaths were reported all over the world by 15 November 2009. This has suddenly arisen in North America and spread rapidly in Europe, Asia and South Africa via human to human transmission within a very short span [8]. As the devastating impact of swine influenza is enormous, a renewed drug discovery effort worldwide is essential to counteract the disease more efficiently. From the literature search it was shown that there is no specific chemotherapeutic agent against swine influenza caused by H1N1 viable strain. Therefore researchers have been trying to develop anti swine influenza chemotherapeutics which are able to combat against swine influenza pandemic.
A series of 1H-1,2,3-triazole-4-carboxamide compounds were synthesized by Cheng et al. [9]. These compounds showed potent biological activity against various strain of H3N2 and H1N1 influenza virus as well as H5N1 (RG14) viral strain by inhibiting the nuclear polymerization. Lepri et al. synthesized a number of 2-aminothiophene-3-carboxamide derivatives and evaluated their biological activities against PA-PB1 interaction complex using Enzyme Linked Immuno Sorbent Assay (ELISA) technique. It was found that the viral polymerase consists of hetero trimetric complex consisting of PB1, PB2 and PA sub units which are essential for transcription and replication of viral m-RNA [10]. High throughput screening confirmed that the angular furocoumarins which is also called as angelicin can produce potent anti-viral activities. Further, Yeh and co-workers synthesized a number of angelicin derivatives and tested their biological activities against influenza A (H1N1) virus induced cytopathic effect in Madin-Darby canine kidney cell culture in low micro molar range. Some of these compounds were also found to be highly active against influenza A (H3N2) and influenza B viral strains. These compounds can produce its biological activities by inhibiting neuraminidase containing viral ribonucleoprotein (vRNP) complex [11]. Structure activity relationship studies were performed for these compounds against influenza A/WSN/33 (H1N1) strain. To explore biochemical mechanism of angelicin compounds at a molecular level it has to carry out quantitative structure activity relationship (QSAR) utilizing structural properties but such type of QSAR modeling for these derivatives are not yet done. Quantitative structure activity relationship is an interdisciplinary emerging research area with many interesting and practical applications to environmental protection, new drug discovery, and understanding the molecular basis of property/ bioactivity/ toxicity of chemicals. In particular, various classes of graph invariants and molecular descriptors are being increasingly used in these methods [12][13]. Although studies in structure-activity relationships go back to antiquity since the times of Crum-Brown and Fraser [14], it is only in the recent times that one witnesses a vigorous study of it as an interdisciplinary area. Of all recent studies in quantitative structure activity relationship modeling which are interwoven with those of medicinal chemistry, of pharmacokinetics, drug design etc., the seminal account of progress in the subject for the application of chemoinformatics and prediction of biological activities may be found in the book written by Leach and Gillet [15] entitled "An Introduction to Chemoinformatics". One has to mention precursors to this book Waterbeemd [16], Gasteiger [17] and Roy et al. [18] which really grew out for chemometric methods in molecular design and strategy for QSAR development for many QSAR applications requiring the predictions of the property of the molecules and use of these predictions to evaluate, screen and help priorities of the synthesis of these candidates [15][16][17][18].
Therefore, it is our aim in the present study to formulate QSAR models under the frame work of various sets of descriptor including topological, three dimensional (3D), constitutional, functional group and atom center fragment indices, which are computed solely from the structure of angilicin compounds utilizing genetic algorithm-multiple linear regression (GA-MLR) methods which can predict essential structural invention responsible for producing biological activity. Finally, mode of actions of these derivatives was explored by molecular docking of the active ligand towards neuraminidase target and that docking model could be applied for the design, screening and synthesis of coumarin compounds supposed to have potent antiviral activities.

Biological activity data
Linear furobenzopyrones are called as psoralen having reported to induce bifunctional photodamage to the DNA of the cutaneous cells in a selective way, thus inhibiting DNA functions and result in inter-strand cross linkage and produce anti-viral activities. Angular furobenzopyrone or fused furocoumarinin in which the furan ring is attached to the 6,7 positions of the coumarin ring is called as angelicin.
In the present study, a series of 53 angelicin derivatives (Table 1) have been considered from the published literature. Various group substitutions including R=methyl, ethyl; R 1 =phenyl, methoxy phenyl; R 2 =Benzoyl have been incorporated in the given angelicin nucleus to generate a number of congeners [11]. Angelicin derivatives are capable to inhibit the virus-induced cytopathic effect (CPE) on MDCK (Madin-Darby canine kidney) cells and the results expressed as IC50 (concentration required to reduce CPE by 50% relative to the virus control). Activities in terms of IC50 were converted into pIC50 which was considered as dependent variables for further QSAR modeling.

Structure optimization and descriptor calculation
2D angelicin structures were drawn using Chemdraw [19]. The drawn structures were then converted into three-dimensional (3D) modules, and the 3D geometries of all compounds were fully optimized using MM2 force field using a value of 0.01 as dielectric constant considering Chem3D ultra. These energetically minimized stable confirmations were then taken into consideration for the computation of theoretical structural descriptor for further QSAR modeling. Theoretical molecular descriptors are the numerical representation of molecule, achieved by applying the principles of graph theory to molecular structure. It encodes molecular architecture and quantifies such aspects of molecular structure as size, shape, symmetry, complexity, branching, cyclist, stereo electronic character, etc. Structural descriptors can be categorized as topological, 3D, atom centered fragments, constitutional and functional groups respectively. In the present work, DRAGON software [20,21] is used for the computation of theoretical molecular descriptors. A number of 436 topological descriptors were calculated and after elimination, a number of 344 are remaining. A total number of 596 three dimensional (3D) descriptors, useful for our purpose, were calculated via DRAGON software, and before model development, these were reduced to 487. A total number of 131 Constitutional, Functional group and atomcentered fragments descriptors, useful for our purpose, were calculated and before model development, these were reduced to 108, and The reduction in the descriptors was due to keeping a constant value for, or nearly all, of the compounds, and for those that perfectly correlated (r=1.0) with other descriptors. The reduced sets of descriptors were then treated by genetic algorithm-multiple linear regressions (GA-MLR) algorithm for developing QSAR models. Table S1 represents different classes of molecular descriptors along with their symbols.

Statistical analysis by GA-MLR
The reduced descriptor data of angelicin still contain a huge number of predictors which were then treated by genetic algorithm (GA) for selection of features [22] having significant impact on the anti H1N1 activity. Genetic algorithm is a random optimization tool useful for searching large probability space based on the principle of generation of new chromosome by mutation and crossover of parent gene. In this method gene is encoded by descriptor and each chromosome consists of combination of gene representing population consisting of combination of molecular descriptor. The fitness of each chromosome, which reflects the quality of the solution, is evaluated. The next step is the reproduction, which indicates creation of new chromosome (offspring) by selecting existing chromosomes through cross-over and mutation. The fitness of new chromosome is now evaluated and the cycle is repeated until optimal solutions or optimal fitness is attained [23]. Multivariate regression analysis (MRA), one of the oldest data reduction methodologies, continues to be widely used in QSAR, as it does not impose any restriction on the type and number of graphical invariants used in structure-property-activity studies. The ultimate goal of QSAR-based drug design is to find out which structural properties confer the drug highest potency or lowest toxicity. The drug's potency is here a dependent variable, and the structural properties, also called molecular descriptors, are the independent variables. The experimental signal that measures the potency could be, for example, the binding affinity of a drug candidate to its target protein. GA-MLR has been incorporated in Nanobridges software [24] which is used in the present study to formulate QSAR models of angelicin against H1N1 swine influenza. For the QSAR modeling, training and test set approaches has been incorporated. Training set was used to build the QSAR model whereas test set molecules activity is predicted by the developed training model. In the present study 70% of the 53 molecule were considered as training set while rest 30% was taken as test set compounds as indicated by asterisk in Table 1. The division of training and test data has been carried out by Kennard stone method [25] for model validation. All representative points of the test set has been resembled to those in the training set in the multidimensional vector space as well as the representative points of the training set must be well distributed within the whole area occupied by the entire dataset

Model validation
The quality of each model is denoted by R 2 (R is the square root of multiple R-square for regression), Q 2 Loo (cross-validated r 2 ) values for the training set, an external validation was performed by calculating predictive R 2 (R 2 pred ) and the standard error of estimation, SEE, represents standard deviation which is measured by the error mean square, which expresses the variation of the residuals or the variation about the regression line. Thus standard deviation is an absolute measure of quality of fit and should have a low value for the regression to be significant [26].

R 2 and Q 2
Loo of a model are calculated by Where Y obs , Y calc and Y pred denote observed, calculated and predicted activity values, respectively, and Ῡ indicates mean activity value of training molecules. Q2 Loo denotes predictive statistics which should be greater than 0.5. The validated QSAR's can identify the most significant contribution of the descriptor data modeled.
External validation of the model is carried out by calculating Predicted R 2.
where, Y pred test and Y test indicate predicted and observed activity values respectively of the test set compounds and Ῡ training indicates mean of observed activity values of the training set. For a predictive QSAR model, the value of R 2 pred should be more than 0.5 [27].

Structure based docking study of angelicin
In silico molecular docking is one of the sophisticated computational structure based drug design methods to study the ligand-receptor interaction which may produce energetically stable geometry of ligand-receptor complex having minimal interaction energy represented by different scoring functions such as dock score, piecewise linear potential score, potential of mean force score, and steric and electrostatic score. This score is used to predict the bind ing affinity of a ligand toward receptor. This utility allows screening a set of compounds for lead optimization [28,29]. Present molecular docking study has performed by using ArgusLab 4.0.1 dock engine [30]. It is free software which considers ascore as scoring function. The docking engine is allowed full flexibility of the ligand in side ridge active of the target receptor [31]. In the present study, all the 53 compounds have been docked inside the target cavity of neuraminidase. Compound 32 entitled as 4-Methyl-9-phenyl-8-(thiophene-2-carbonyl)-furo [2,3-h] chromen-2-one represent highest active lead compounds in angelicin series and considered as an active ligand to predict the mode of binding towards the target cavity. As per the experimental evidence described by Yeh et al., the optimized lead 4-methyl-9-phenyl-8-(thiophene-2-carbonyl)-furo[2,3-h]chromen-2-one (compound number 32, IC50=70 nM (Table 1)) showed many-fold enhanced activity compared to the high throughput screening (HTS) compound 1. Also, compound 32 was found effective in case of influenza A (H3N2) and influenza B virus strains similar to approved anti-influenza drug zanamivir. This compound can produce its biochemical mechanism by inhibition of vRNP (viral-Ribonucleo protein). Neuraminidase H1N1 viral enzyme is responsible for releasing dangerous progeny particle after attachment to the host cell. Therefore, the crystal structure of neuraminidase was considered as a receptor in the presence study. It is worth to mention that structure of neuraminidase-ligand complex is known and one of the PDB Codes of Neuraminidase (PDB ID:3B7E) [32] was downloaded from the Brookhaven Protein Databank (http:// www.rcsb.org). Then ligand receptor study was carried out by the "Dock ligand tool" of ArgusLab software. All water molecules and HET bound molecules except 1000 GOL which has been co-crystallized were deleted to prepare the target. The target contains a number of binding sites. Binding site of co-crystalized ligand 1000 GOL carries largest surface area and were treated as a reference to make the binding site for the ligand X-ray group. The generated binding cavity consists of all active residues having at least one atom within 3.5 Å from any atom in the co-crystallized ligand X-ray group. Comparative docking analyses of highly active ligands such 32, 53 and 52, intermediate active ligands including 1, 2 and 3 and lower active compounds such 10 and 37 towards target have been well discussed in next section.

QSAR modeling
A number of QSAR models have been generated for angilicin compounds utilizing different sets of computed descriptor including topological, 3D, constitutional, functional group, atom centered fragments. Out of these models, results of four QSARs based on topological, 3D, topological coupled with 3D and combination of topological, constitutional, functional groups, and atom centered fragments have been discussed in terms of R 2 , Q 2 Loo , and predictive R 2 . The QSAR results have been summarized in Table 2.

R 2 represent explained variance and Q 2
Loo represent internal validation predictive statistic whereas R 2 pred indicates external validation parameter of the develop QSAR models. 3D descriptor alone does not produce satisfactory inhibition of H1N1. Topological descriptor based model can produce 73.20% and 56.10% explained and predictive statistic variances of the H1N1 inhibitory activity of the training compounds while topological along with 3D descriptor based model can produce improved inhibition in terms of 79.80% and 72.70% explained and predictive statistic variances of the H1N1 inhibitory activity of the training compounds. Combination of topological, constitutional, functional groups, and atom centered fragments based model can produce 78.20% and 72.70% explained and predictive statistic variances of the H1N1 inhibitory activity of the training compounds. It was found that out of four QSAR models topological coupled with 3D descriptor based model can produce maximum external predictability in terms of 38.70% of the activity prediction. Topological as well as combination of topological, atom centered, constitutional, and functional group based model may produce all most equal test set predictability given as 30.00%. From comparative statistical validation parameters of four QSAR  models, it was observed that topological coupled with 3D descriptor based model gives highest impact and acceptable internal validation whereas external predictability in terms of predicted R 2 is found as beyond 0.5. Quality of the models can be improved by applying MAE (mean absolute error)-based criteria [33] and determining applicability domain of a (Q)SAR [34]. Therefore, further attempt has been made to improve the quality of this topological coupled with 3D descriptor based model by detecting outlier compound which is out of fit in the applicability domain (AD) of the response data. "The applicability domain of a QSAR is the physico-chemical, structural, or biological space, knowledge or information on which the training set of the model has been developed, and for which it is applicable to make predictions for new compounds. The applicability domain of a (Q)SAR should be described in terms of the most relevant parameters, i.e., usually those that are descriptors of the model. Ideally the (Q)SAR should only be used to make predictions within that domain by interpolation not extrapolation" [34]. NanoBridges software incorporated "AD using Standardization approach" tool which was used in the present training and test data to identify outlier compounds are located outside the applicability domain of the built QSAR model [35,36]. It was shown that training molecule numbers 2 and 48 as well as test molecule numbers including 25 and 51 were predicted as outliers which do not belong to the zone of applicability domain. In a next attempt, compound numbers 2, 48, 25, and 51 have been omitted from the topological coupled with 3D descriptor data set and again QSAR was modeled which follows as Validation parameters of this model are quite acceptable and external predictability in terms of predictive R 2 has increased to 0.518. Therefore, this model could be further used to predict biological activity of the newly designed congeneric compounds. Significant parameters included in models (4) and (5) were interpreted in Table  3. Mechanistic interpretation of the modeled descriptors including EEig11r, EEig13r, GATS5p and BEHp3 having positive contribution towards H1N1 inhibition is due to of resonance effect and atomic polarizabilities. Descriptor including TIC1, TIC3, IVDE, Yindex, and IC3 are 3D information indices encoding size, shape, and symmetry and atom distribution of the inhibitors. Functional groups having higher van der Waal volumes are favorable for the activity which has been captured as 2D autocorrelation indices including MATS5v and H7v as well as functional group index like nR=Ct. Much more increase in the molecular volume, size and mass may produce detrimental effect on the biological activity which is given by Moran autocorrelation of lag 7 weighted by mass (MATS7m) with negative coefficient. These significant predictors could be considered for further generation of highly active angelicin congeners. So the study in this direction can increase the searching of hit rates and decrease in cost of drug design and development prior to experiment by producing potent chemotherapeutics against swine influenza.

Angelicin ligand-neuraminidase vRNP target docking
Docking analyses of angelicin-neuraminidase vRNP target revealed that angelicin core can produce Aromatic π-stacking, groups substituted in 8 th , 9 th and R are responsible for producing hydrophobic interactions whereas O-atom of angular furan can produce hydrogen bonding. The opti mized ligands were docked into active binding cavity of neuraminidase target considering grid resolution (angle) of 0.4 degrees as default value. ArgusLab allows free rotation of the ligand inside the cavity so as to generate a number of 150 poses. The best complex pose with minimal interaction energy of has been taken into consideration for better explanation of mode of interaction between the ligand and active amino acid residues of the receptor protein. Comparative best docking complex pose of some highly active, intermediate active and lower active ligands were analyzed in Table 4. It was found that O-atom of angular furan of intermediate active and lower active compounds do not produce any hydrogen bonding whereas highly active ligand can produce same with TRP 760 which is responsible for the inhibition of Swine influenza viral ribonucleoprotein.
To predict the mode of binding of highly active ligand, interaction patterns of compound 32 has been explained in Figure 1. From the docking study ( Figure 1) it was shown 4-methyl group can produce hydrophobic interaction with TYR 738. The fused benzene ring is surrounded by hydrophobic and aromatic residues including TRP 760. This moiety may contribute aromatic pi (π) stacking interaction. At which electron delocalization may be produce and may enhance the H1N1 ribonucleo protein inhibitory activity. The study in this way may direct to design active congeneric compounds for the treatment of against swine influenza.

Conclusion
From Table 3, it was predicted that the modeled descriptors including EEig11r, EEig13r, GATS5p and BEHp3 contribute resonance effect and atomic polarizabilities. Therefore it was concluded that substitution of electron donating functional groups containing hetero atoms in the angelicin may increase the resonance integrals and polarizability to increase the inhibition of H1N1 neuraminidase. This is in good agreement with the docking findings where fused benzene ring is surrounded by hydrophobic and aromatic residues including TRP 760 that may contribute aromatic pi (π) stacking interaction. The modeled descriptors including MATS5v and H7v represented three dimensional van der Waals volumes. Therefore, the functional groups with higher van der Waals volume are responsible for interacting H1N1 viral protein amino acid in the active binding cavity which may produce the biological activity. The capture of 3D information indices indicating size, shape, and symmetry and atom distribution has been proved by the significant descriptor including TIC1, TIC3, IVDE, Yindex, and IC3 respectively. Increase in the value of these descriptors may increase the H1N1 viral protein inhibition. Presence of nR=Ct functional group may increase size of the molecule and increase in number of aliphatic tertiary C may contribute hydrophobic interaction in cleft of H1N1 neuraminidase. This model may provide clues for future drug design to combat against swine influenza.

Future scope: Docking based screening model
Angelicin can be fragmented by utilizing synthon approach [37] proposed by E. J. Corey who introduced the concept of a synthon in retrosynthetic analysis. By fragmentation of angelicin core ( Figure  2), 7-hydroxy-4-methyl-coumarin template has been produced which was docked inside the same neuraminidase target cavity and energy of ligand receptor interaction was found as -6.242 kcal/mol.
The study of interaction pattern of 7-hydroxy-4-methyl-coumarin along with 3B7E is given in Figure 3. From this model it is clear that 4-methyl has interacted with TYR 738 and ILE 480 as well as TRP 743. Fused benzene ring has interacted with TRP 760, TRP 743 by aromatic pi-stacking interaction. So there is a similarity between mode of binding of these two docked models which showed TYR 738 and TRP 760 as common interactive residues. Therefore, these docking models could be used for the screening and designing of potent 7-hydroxy-4-methylcoumarin congeneric compounds supposed to have antiviral activity.