Computational identification and binding analysis of orphan human cytochrome P450 4X1 enzyme with substrates

Cytochrome P450s (CYPs) are important heme-containing proteins, well known for their monooxygenase reaction. The human cytochrome P450 4X1 (CYP4X1) is categorized as “orphan” CYP because of its unknown function. In recent studies it is found that this enzyme is expressed in neurovascular functions of the brain. Also, various studies have found the expression and activity of orphan human cytochrome P450 4X1 in cancer. It is found to be a potential drug target for cancer therapy. However, three-dimensional structure, the active site topology and substrate specificity of CYP4X1 remain unclear. In the present study, the three-dimensional structure of orphan human cytochrome P450 4X1 was generated by homology modeling using Modeller 9v8. The generated structure was accessed for geometrical errors and energy stability using PROCHECK, VERFIY 3D and PROSA. A molecular docking analysis was carried out against substrates arachidonic acid and anandamide and the docked substrates were predicted for drug-likeness, ADME-Tox parameters and biological spectrum activity. The three-dimensional model of orphan human cytochrome P450 4X1 was generated and assessed with various structural validation programmes. Docking of orphan human cytochrome P450 4X1 with arachidonic acid revealed that TYR 112, ALA 126, ILE 222, ILE 223, THR 312, LEU 315, ALA 316, ASP 319, THR 320, PHE 491 and ILE 492 residues were actively participating in the interaction, while docking of CYP4X1 with anandamide showed that TYR 112, GLN 114, PRO 118, ALA 126, ILE 222, ILE 223, SER 251, LEU 315, ALA 316 and PHE 491 key residues were involved in strong interaction. From this study, several key residues were identified to be responsible for the binding of arachidonic acid and anandamide with orphan human cytochrome P450 4X1. Both substrates obeyed Lipinski rule of five in drug-likeness test and biological spectrum prediction showed anticarcinogenic activity. Compared to anandamide, arachidonic acid showed strong interaction with cytochrome P450 4X1 and also less health effect in certain human system in ADME-Tox prediction. These findings provide useful information on the biological role and structure-based drug design of orphan human cytochrome P450 4X1.

CYPs. The human CYP4X1 gene is located within the P450 ABXZ gene cluster on chromosome 1. The gene has 12 exons and the predicted protein has 509 amino acids [6]. The tissue distribution of human CYP4X1 is reported to be predominantly found in adult human skeletal muscle, trachea, and aorta [7]. Recent report suggests that CYP4X1 is expressed in brain and in the liver [8,9]. It is involved in drug metabolism and synthesis of cholesterol, steroids and other lipids. Members of the cytochrome P450 4F subfamily are known to primarily oxidize endogenous compounds, for example, fatty acids and arachidonic acid derivatives [10]. The metabolic capabilities of CYP4X1 are largely unknown, yet a recent study has identified arachidonic acid derivatives to have been implicated in a large number of physiologically important processes. A number of P450s, primarily from subfamilies 2C, 2J, 4A and 4F, are known to oxidize arachidonic acid which has been implicated as important signaling mediator. The arachidonic acid derivative anandamide (arachidonoyl ethanol amide) is a natural endocannabinoid found in most human tissues, and acts as an important signaling mediator in neurological, immune and cardiovascular functions. Anandamide (arachidonoyl ethanol amide) has emerged as an important signaling molecule in the neurovascular cascade [11].
Human CYP4X1 amino acid sequence has been identified recently but the three-dimensional structure of this protein is not yet known. Earlier experimental studies of CYP4X1 proposed that arachidonic acid and its derivative anandamide can act as possible substrates [12]. However, to date information on the structure and ligand binding site is not available for CYP4X1. Through homology modeling it is possible to generate realistic model comparable to experimental structures and through docking studies substrate binding energies and important key residues involved in substrate binding can be found. Many Computer-Aided Drug Design (CADD) methodologies have been carried out previously for finding suitable drug target [13][14][15][16][17][18][19]. In the present work, three-dimensional model of CYP4X1 was constructed using homology modeling and energy minimization was done to refine the model. After that, arachidonic acid and anandamide were docked into the active sites of the CYP4X1 model. The interaction between CYP4X1 and substrates helped in finding energetically favorable binding sites and the key residues responsible for substrate specificity.

Sequence retrieval
The sequence of human CYP4X1 protein in FASTA format was retrieved from Uniprot Knowledge base (http:// www.uniprot.org/) of accession number Q8N118.

Sequence alignment and homology modeling
For the template selection, PSI-BLAST search was used against Protein Data Bank (PDB) and top ranked six  templates (1TQN, 3CZH, 2HI4, 3NA0, 3K9V, 3E4E) were selected for the model building. The templates and target sequence were aligned by using Clustal Omega [20] with default parameters and observed for conserved sequence. Further, the aligned sequence was used as the input to generate homology model of CYP4X1 using Modeller 9v5 [21]. The coordinates for heme were obtained from the template 1TQN and positioned as in the template.

Energy minimization and structural validation
The constructed CYP4X1 model was further refined by energy minimization using YASARA package [22], and the resulting model was subjected to structural quality assessment. PROCHECK and VERIFY 3D were used for geometric evaluation. The PROSA program was used to assess the energy of residue-residue interaction using a distance-based pair potential and the energy was transformed to a score called Z-score. Residues with negative Z-score indicate reasonable side-chain interactions.

Binding site analysis
ConSurf was used for identification of the functional regions in the protein. The degree of conservation of the amino acid sites among homologues protein with similar sequences was estimated. The conservation scores were depicted onto the molecular surface of the orphan human cytochrome P450 4X1 to reveal the patches with highly conserved residues that are often important for biological function.

Ligand optimization
The possible substrates like arachidonic acid and its derivative anandamide were downloaded from the PubChem in Structure Data Format (SDF). Conversion of SDF to Protein Data Bank (PDB) format was carried out using Open Babel program [23]. The MMFF94 force field was used for energy minimization of ligand molecules [24]. Gasteiger partial charges were added to the ligand atoms, non-polar hydrogen atoms were merged and rotatable bonds were defined. Ligand geometries and electric properties were calculated using MOPAC2009 [25].

Molecular docking
Docking calculations were carried out using DockingServer [26] to compute the free energy of binding on protein model. Essential hydrogen atoms, Kollman united atom type charges and solvation parameters were added with the aid of Auto Dock tools [27]. Affinity (grid) maps of 60X60X60Å grid points and 0.375 Å spacing were generated using the Auto grid program. Auto Dock parameter set and distance dependent dielectric functions were used in the calculation of the Van der Waals and the electrostatic terms respectively. Docking simulations were performed using the Lamarckian Genetic Algorithm (LGA) and the Solis and Wets local search method. Initial position, orientation and torsions of the ligand molecules were set randomly and all rotatable torsions were released during docking. Each docking experiment was derived from 10 different runs that were set to terminate after a maximum of 250000 energy evaluations. During the search population size of 150, translational step of 0.2 Å and quaternion and torsion steps of 5 were applied.
Prediction of drug-likeness, ADME-Tox and biological spectrum activity The substrates arachidonic acid and anandamide were subjected drug-likeness prediction using Lipinski rule of five, toxicity prediction using ADME-Tox and also biological activity prediction.

Sequence analysis
The amino acid sequence of cytochrome P450 4X1 was retrieved from UniProt database and subjected to transmembrane prediction using PHD and TMHMM 2.0 servers. The trans-membrane domain was predicted between 13-30 residues. The amino acid sequence of CYP4X1 was queried against Protein Data Bank (PDB) using PSI-Blast with five iterations. PSI-Blast analysis showed that the alignment between target and templates was below 30%. Under this circumstance, use of the multiple templates could provide more reliable structure modeling. Hence, best six templates (1TQN, 3CZH, 2HI4, 3NA0, 3K9V, 3E4E) were selected and the selected template sequence and target sequence (CYP4X1) were subjected to multiple alignments using Clustal Omega with default parameters. The alignment between targets and templates were examined for sequence conservation and signature motifs. The structural motifs were WxxxR in the C helix, ExxR in the K helix and essential one FxxGxxxCxG in L helix. Characteristic motif for the CYP super family, which includes conserved cystine residue that ligates to the Fe of the heme was observed (Figure 1). Several signature motifs were conserved for CYP4X1 which is in agreement with previous findings [15,28]. The conservation of sequence and signature motifs elucidates that model based on this alignment is reliable. In the available template, the first 20 residues of the N-terminal were deleted in order to facilitate its crystallization. Since the N-terminus does not affect the substrate binding, the corresponding first 20 residues were not modeled in the CYP4X1 model.

Structure analysis
The alignment between templates and target sequence were submitted to Modeller for the construction of homology model of CYP4X1. The loops were modeled through the server ModLoop and subjected to energy minimization using YASARA package. Energy minimization was performed by YAMBER force field implemented in YASARA package to get optimized model structure having the initial energy of 72759.7 kJ/mol to a final energy of −247540.1 kJ/ mol. The energy minimized homology model of CYP4X1 was subjected to several structural validation programs like PROCHECK, VERIFY 3D, QMEAN, PROSA, and PROVE. The PROCHECK [29] analysis based on Ramachandran plot provides an idea on the stereo chemical quality of the protein model. It highlights regions of the proteins which appear to have unusual geometry and provides an overall assessment of the structure. The PROCHECK evaluation showed the residues in most favored regions as 81.9%, residues in additional allowed regions as 14.8%, residues in generously allowed regions as 2.6% and residues in  disallowed regions as 0.7% (Figure 2A). For a good quality model, the residues located in the core and allowed regions should be over 90% which is the case for the model presented here (since 81.4% + 14.8% = 96.2%). In order to check the native protein folding energy of the model, PROSA II [30] analysis was made and mean force potentials and averaged over all residues in the structure were determined. The PROSA II analysis showed Z-score of −8.33 ( Figure 2B), and PROVE [31] analysis showed average Z-score of 0.97 and Z-score RMS value of 23.24. QMEAN [32] is a scoring function of a linear combination of six structural descriptors with the score ranging between 0 and 1 and higher value reflecting a better quality of input model. The QMEAN score of the present model is 0.517 ( Figure 2C). From the results of entire structural validation program it is inferred that the homology modeled protein is reliable for further study.
The final model was deposited in Protein Model Database (PMDB) [33] and it is available under ID: PM0079202. The overall topology of homology model of CYP4X1 consists of twelve helices, namely, A-L, five antiparallel beta sheets and their connecting loops. The heme is positioned between two structurally conserved helices, I and L ( Figure 3A). Highly conserved theronine residue is located in the middle of helix I as seen in many available mammalian CYP structures. This conserved residue has been suggested to participate in the proton delivery and play an important role in the dioxygen bond cleavage during catalytic cycle [34].

Binding site prediction
ConSurf [35] was used to characterize the functional regions in the protein. It identifies by considering the degree of conservation of the amino acid sites among their sequence homologues. The conservation grades were projected on the molecular surface of the CYP4X1 protein to reveal the patches with highly conserved residues that are often important for biological function. The surface residues with the most conserved amino acids are shown colored in bordeaux, residues with average conservation in white and variable amino acids in turquoise in the protein structure in Figure 3B.
Predicting ligand binding sites can reduce the conformational space of docking and also can provide insights into their molecular functions.

Molecular docking
In order to understand enzyme-substrate interaction and to determine the key residues responsible for interaction, the model of CYP4X1 was docked with selected substrates using DockingServer. DockingServer integrates a number of popular software like Auto Dock and MOPAC for accurate ligand geometry optimization, energy minimization, charge calculation, docking calculation and protein-ligand complex representation. The calculated free energy of binding of CYP4X1 with arachidonic acid ( Figure 4A) and anandamide ( Figure 4B) were −7.76 and −5.76 kcal/mol respectively ( Table 1). The negative and low value of free energy of binding indicates a strong favorable bond between CYP4X1 and arachidonic acid in most favorable conformations. Docking of CYP4X1 with arachidonic acid revealed that major residues involved were TYR 112, ALA 126, ILE 222, ILE 223, THR 312, LEU 315, ALA 316, ASP 319,    Figure 5B). The common residues in both arachidonic acid and anandamide were TYR 112, ILE 223, LEU 315, ALA 316 and PHE 491 ( Table 2).

Evaluation of drug-likeness and ADME-Tox
Both arachidonic acid and anandamide were evaluated for drug-likeness using Lipinski rule of five [36]. The 'rule of five' imposes limitation on the logP (the logarithm of octanol/water partition coefficient), molecular weight and the number of hydrogen bond acceptors and donors. The rule states that most 'drug-like' molecules have logP <5, molecular weight <500, number of hydrogen bond acceptors <10 and number of hydrogen bond donors <5. Molecules violating more than one of these rules may have problems with bioavailability. Both arachidonic acid and anandamide do not violate the Lipinski rule of five parameters to be an orally active compound (Table 3). Moreover, according to Freitas [37], the oral bioavailability is inversely proportional to topological polar surface area (TPSA). In this study, arachidonic acid has lower topological surface area value than anandamide suggesting arachidonic acid has better oral bioavailability than anandamide. Analysis of pharmacological parameters using ADME-Tox evaluation revealed that both arachidonic acid and anandamide have comparable values on specific parameters related to absorption, distribution, metabolism, excretion and toxicity. But arachidonic acid has less health effects on human than anandamide (Table 4).

Biological spectrum predictions
Computer software PASS [38,39] predicts simultaneously several hundreds of biological activities depending upon the chemical structures of compounds such as  Calculates the apparent volume of distribution for a compound in L kg-1. f Test for assessing mutagenic properties of the compounds. g Estimates probability of blood, gastrointestinal system, kidney, liver and lung effect at therapeutic dose range. the predicted activity spectrum giving probable activity (Pa) and probable inactivity (Pi). Prediction of this spectrum by PASS is based on SAR analysis of the training set containing more than 35,000 compounds, which correlates with more than 500 kinds of biological activities. Pa and Pi values are independent and their values can vary from 0 to 1. The more the Pa value, the less is the probability of false positives in the set of compounds selected for biological testing. For example, if one selects compounds for which a particular activity has Pa ≥ 0.9, the expected probability to find inactive compounds in the selected set is very low, but about 90% of active compounds are missed. If compounds with Pa ≥ 0.8 are chosen, the probability to find inactive compounds is also low, but about 80% of active compounds are missed. By default, in PASS, value with Pa = Pi is chosen as a threshold and all compounds with Pa > Pi are suggested to be active. Another criterion for selection is the novelty of compounds. If, Pa value is high, sometimes one may find close analogues of known biologically active substances among the tested compounds. For example, if, Pa > 0.7, the chance to find the activity in experiment is high, but in some cases the compound may occur to be the close analogue of known pharmaceutical agents. If, 0.5 < Pa < 0.7 the chance to find the activity in experiment is less, but the compound is not so similar to the known pharmaceutical agents. If, Pa < 0.5 the chance to find the activity in experiment is even less, but if it is confirmed the compound may occur to be a new chemical entity. Results of prediction of the biological spectrum of substrates estimated when Pa > 0.5 is shown in Table 5. Both the substrates show similar biological spectrum for anticarcinogenic activity.

Conclusion
The orphan human cytochrome P450 4X1 has been suggested to be a potential drug target for cancer therapy. Lack of the structural information about this enzyme hinders the detailed characterization of its biological functions and its applications in structure based design. For this reason, the three-dimensional model of orphan human cytochrome P450 4X1 was constructed using homology modeling. To provide useful information to characterize the enzyme's function, two known substrates, arachidonic acid and anandamide were docked into the active sites and then refined by energy minimization to determine favorable binding modes.
Several key residues TYR 112, ILE 223, LEU 315, ALA 316 and PHE 491 were identified to have involved in binding of arachidonic acid and anandamide. These key residues are expected to affect the catalytic activity and can be used as candidates for further mutagenesis studies. Both the substrates does not violate Lipinski rule of five in drug-likeness test, while in ADME-Tox prediction, arachidonic acid shows less health effects on cardiovascular system, gastrointestinal system and kidney of human when compared to anandamide. In biological activity spectrum predictions, both arachidonic acid and anandamide show anticarcinogenic activity. However, further in-vivo validation and conformation of the present finding is required. The results of this study will be useful for structure-based drug design of orphan human cytochrome P450 4X1.