Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins

Jayaram, B; Dhingra, Priyanka; Mishra, Avinash; Kaushik, Rahul; Mukherjee, Goutam; Singh, Ankita; Shekhar, Shashank

doi:10.1186/1471-2105-15-S16-S7

Volume 15 Supplement 16

Thirteenth International Conference on Bioinformatics (InCoB2014): Bioinformatics

Research
Open access
Published: 08 December 2014

BMC Bioinformatics volume 15, Article number: S7 (2014)

Abstract

Background

The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. The success of structure-based drug discovery hinges critically on the availability of structures. Despite significant progress in experimental protein structure determination, the sequence-structure gap is continually widening. Data-driven homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarity. For query sequences with dwindling similarity, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures, with advancing drug design attempts as one of the goals.

Results

The Bhageerath-H web server was validated on 75 CASP10 targets. It produced models with TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 59% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution.

Conclusion

Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment.

Background

"The native conformation of a protein is determined by the totality of interatomic interactions and hence, by the amino acid sequence, in a given environment" (Nobel Lecture, Christian B. Anfinsen, December 11, 1972). According to Anfinsen's protein folding hypothesis, a protein's native structure is determined by its amino acid sequence, which drives the protein into its minimum Gibbs energy state [1]. This hypothesis evolved into a basic tenet of protein structure prediction algorithms (PSPAs). However, limited understanding of the net balance of forces involved in protein folding creates deficiencies in the various proposed PSPAs. One of the early efforts to solve the protein folding problem was driven by thermodynamic calculations, which incorporate search algorithms to find the conformation that corresponds to minimum free energy [2]. Here the large number of degrees of freedom of a protein gives rise to innumerable conformations, an enumeration of which is practically impossible. Despite this, proteins fold rapidly into their native structure on millisecond to second time scales, implying that a brute force enumeration of all possible conformations may not be required, as implicit in Levinthal's paradox [3]. The sequence introduces local structural bias, narrows down the accessible conformational space and introduces local as well as long range interactions, which suggests a halfway solution to the paradox [4–7]. As a result, PSPAs need two key components: (a) a rapid computational algorithm for protein conformational search and (b) an accurate scoring function to capture the best available conformation. The first component involves the use of different physics based as well as knowledge based approaches for extensive sampling of the vast conformational space [8, 9].
Physics based sampling methods include use of Monte Carlo (MC) methods [10–15], Genetic algorithms [16], molecular dynamics simulations (MD) [17, 18], simulated annealing [19, 20], replica-exchange MC or MD and local enhanced sampling [21–23]. Knowledge based methods use information from the solved protein structures and knowledge based potentials for sampling protein conformational space [24].

Homology modeling [25–30] and fold recognition/threading methods [31–35] are knowledge based approaches, which are routinely used to generate reliable models for proteins whose overall fold topology is similar to an available template in the protein databases. Query proteins with no sequence or structural similarity to known structures are modeled from scratch using physics based/ab initio approaches. The success of ab initio or physics-based sampling methods is limited by the lack of accurate energy functions [36, 37], heavy computational requirements, force field errors [38–40] and protein size, while knowledge based approaches are limited by sequence similarity and evolutionary relationships [41–43]. A popular trend in protein conformational sampling is the fragment assembly method, which uses parts of known proteins or protein fragments to generate a structure of the target. After conformational sampling, the next immediate concern is to capture the best available structure by means of a scoring function [44–56]. These functions combine chemical, physical, geometrical and energetic constraints to capture native or near native models [57, 58].

A thorough literature survey reveals that the available protein structure prediction algorithms are based on methods such as (a) homology modeling, (b) fold recognition, (c) ab initio and (d) hybrid [59, 60]. Different software/tools based on these computational approaches are available in the public domain and are evaluated every two years in the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments [61]. Recent CASP experiments have shown significant progress by hybrid approaches, which combine homology and ab initio modeling with atomic level model refinements for protein structure prediction [62]. This article describes Bhageerath-H, a homology/ab initio hybrid software for predicting the tertiary structure of monomeric proteins. Bhageerath-H makes use of the Bhageerath-H Strgen algorithm [63] for extensive sampling of the protein fold space and generates a large basket of decoys containing near-native protein conformations, which are further supplemented by a chemical logic based alignment scheme and then clustered to eliminate redundant structures. These are then screened by a physico-chemical scoring metric (pcSM) and assessed for their quality. The selected models are refined via a quantum mechanics based loop bond angle optimization method, which drives the selected models closer to the native topology. The Bhageerath-H automated pipeline is freely available to the scientific community across the world via http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp.

Methodology

The Bhageerath-H software suite for protein tertiary structure prediction narrows down the conformational search space and predicts five probable near native candidate structures for an input amino acid sequence. The software comprises six computational modules, which work in tandem and together form an automated pipeline. Figure 1 shows a diagrammatic representation of the Bhageerath-H software suite. The following sections discuss each module of the automated pipeline.

(A) Bhageerath-H Strgen for candidate structures

The first step in the pipeline is the generation of a large pool of full length decoys. In the proposed protein structure prediction pipeline, the Bhageerath-H Strgen algorithm for protein conformational sampling [63] is the first module. The module takes a protein amino acid sequence as input and provides a large pool of decoys as output. A revised and improved version of the structure generation algorithm is incorporated in the Bhageerath-H software suite. Bhageerath-H Strgen makes use of current sequence and structural database knowledge along with Bhageerath ab initio folding [64, 65] in order to effectively search the fold space for an input protein sequence. It starts with the amino acid sequence, followed by secondary structure prediction and a BLAST [66] search for sequence based homologs. In addition, it searches for distant analogs and structural homologs using tools such as pGenthreader [67, 68], ffas [69, 70], spark-x [71] and HHSearch [72]. A new addition to this methodology is a chemical logic based [73] procedure for template selection followed by alignment generation. It utilizes amino acid chemical properties such as hydrogen bond donor, conformational flexibility, and shape and size of side chains to generate an amino acid substitution scoring matrix. This scoring matrix is used for template selection as well as template-target alignment generation. The matrix helps in selecting distant homologs, which are generally missed during a normal database search. The templates and template-target alignments are used for modeling fragments of varying length via Modeller [74, 75]. Modeled fragments are then screened for missing links with no available templates. These missing stretches are generated using the Bhageerath ab initio modeling method [65, 76, 77]. All the incomplete protein fragments are patched in order to generate full-length models, which are energy scored, and the 5 lowest energy decoys are sent for Bhageerath ab initio loop sampling.
The newly sampled structures are added to the growing pool of full length protein decoys. The output of the first step is a large pool of protein decoys. The average size of the decoy pool is on the order of 10⁴-10⁵ structures.

The Bhageerath-H Strgen module includes locally installed copies of Psipred, BLAST, PFAM [78], SCOP [79, 80], nr [81], the pdb database [82] (http://www.pdb.org/pdb/home/home.do), HHSearch, Spark-X, pGenthreader, ffas and modeller. The scalable Bhageerath-H Strgen algorithm is currently configured to utilize 64 processors of a Linux cluster. Programs are written in C++ with MPI and involve the use of Linux shell scripting. The average time taken for a Bhageerath-H Strgen run is 1-2 hours. This first module of the Bhageerath-H pipeline generates a large pool of decoys, which needs to be further filtered, processed and refined. We would like to note that the Bhageerath-H software is not limited to Bhageerath-H Strgen, an already published algorithm. Bhageerath-H Strgen is a protein decoy generation program, which is the first module here. After protein decoy generation, protein decoy selection and refinement are the other two very important steps in a protein structure prediction pipeline. In the Bhageerath-H software, modules 2-5 are dedicated to decoy clustering, selection and refinement, which are not included in Bhageerath-H Strgen. Output from this module is submitted for clustering in the next step.

(B) Clustering

Recurring structural models sampled in the previous step are clustered using the K-means clustering algorithm. The main aim of this step is to retain a single representative structure of each unique topology. The kclust tool of the MMTSB toolkit [83] is used to perform clustering. The tool requires a list of protein decoys to cluster. The following command was executed:

kclust -mode rmsd -cdist -heavy -lsqfit -radius 1.0 -maxerr 1 pdblist > cluster_file

This command outputs a cluster file, which contains the centroid in pdb format along with the members of each cluster and the root mean square deviation (rmsd) of each member from the centroid. The centroids themselves are mathematical constructs and convey no information on their own, so the rmsd information is used to pick the member with the lowest rmsd from each cluster [83]. To reduce run time, clustering is performed in parallel. The output of K-means clustering is a set of decoys that are unique, non-recurring and contain near-native structural models. This set of decoys is submitted for physico-chemical scoring in the third step.
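The representative-selection rule described above (one lowest-rmsd member per cluster) can be sketched in a few lines of Python. This is only an illustration: it does not parse the actual kclust output file, and the record layout `(cluster_id, decoy_name, rmsd_to_centroid)` and the decoy file names are assumptions made for the example.

```python
def pick_representatives(members):
    """Return {cluster_id: decoy_name} keeping the member closest to each centroid.

    `members` is an iterable of (cluster_id, decoy_name, rmsd_to_centroid)
    tuples. This layout is hypothetical, not the kclust file format.
    """
    best = {}
    for cluster_id, decoy, rmsd in members:
        # Keep the decoy with the smallest rmsd seen so far for this cluster.
        if cluster_id not in best or rmsd < best[cluster_id][1]:
            best[cluster_id] = (decoy, rmsd)
    return {cluster_id: decoy for cluster_id, (decoy, _) in best.items()}


# Made-up example data: two clusters with two members each.
members = [
    (1, "decoy_0001.pdb", 0.42),
    (1, "decoy_0017.pdb", 0.31),
    (2, "decoy_0203.pdb", 0.77),
    (2, "decoy_0150.pdb", 0.90),
]
representatives = pick_representatives(members)
```

Because each cluster contributes exactly one structure, the size of the decoy pool after this step equals the number of clusters.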

(C) Scoring based on a physico-chemical metric

The third step in the Bhageerath-H pathway involves the use of a robust metric that combines chemical, physical, geometrical and energetic constraints known to show universalities among native protein structures. The physico-chemical scoring metric (pcSM) consists of different parameters, which include (a) P: Secondary structure penalty, (b) M: Euclidean distance, (c) A1-A4: Surface areas and (d) E: Empirical potential energy functions. The scoring function calculates a final cumulative score (CS), which comprises each of these parameters.

CS = c_{A1} A1 + c_{A2} A2 + c_{A3} A3 + c_{A4} A4 + c_{p} \max(P_{H}, P_{S}) + c_{M1} M1

where A1 is the fractional area of exposed non-polar residues, A2 is the fractional area of the exposed non-polar part of residues, A3 is the weighted exposed area, A4 is the total surface area, P_H and P_S are the secondary structure penalties for helix and sheet respectively, and M1 is the Euclidean distance. The prefix "c" for each parameter in the above equation refers to its optimized coefficient: c_{A1} = 10, c_{A2} = 0.1, c_{A3} = 0.00001, c_{A4} = 0.001, c_{M1} = 0.001, and c_p = 0.15 for P_H and 0.21 for P_S.
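As a rough illustration of how the cumulative score combines its terms, the following Python sketch evaluates CS with the coefficients listed above. The text does not spell out how the two penalty coefficients enter the max term, so the sketch assumes the penalty contribution is max(0.15 P_H, 0.21 P_S); this is one possible reading, not a confirmed detail of pcSM. The input values are invented for the example.

```python
# Coefficients as quoted in the text.
COEFF = {"A1": 10.0, "A2": 0.1, "A3": 0.00001, "A4": 0.001, "M1": 0.001}
C_PH = 0.15  # coefficient applied to the helix penalty P_H
C_PS = 0.21  # coefficient applied to the sheet penalty P_S


def cumulative_score(a1, a2, a3, a4, p_h, p_s, m1):
    """Weighted sum of the seven pcSM parameters (sketch only)."""
    area_terms = (COEFF["A1"] * a1 + COEFF["A2"] * a2
                  + COEFF["A3"] * a3 + COEFF["A4"] * a4)
    # ASSUMPTION: each penalty is weighted by its own coefficient before
    # taking the maximum; the source does not state this explicitly.
    penalty_term = max(C_PH * p_h, C_PS * p_s)
    return area_terms + penalty_term + COEFF["M1"] * m1


# Invented parameter values for a single decoy.
cs = cumulative_score(a1=0.2, a2=0.3, a3=1000.0, a4=5000.0,
                      p_h=1.0, p_s=2.0, m1=100.0)
```

The widely different coefficient magnitudes suggest that the raw parameters live on very different scales, so the weights mainly bring them into comparable ranges.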

To obtain the top 10 structures, each of the seven parameters is evaluated for all the clustered decoys and a short energy minimization is performed to remove steric clashes. For the given input decoy pool, pcSM outputs the top 10 ranked native-like candidate structures. The pcSM algorithm runs in parallel and utilizes 64 processors. On average, scoring takes 2 to 3 hours. The top 10 pcSM ranked models are submitted for protein structure analysis and validation in the next step.

(D) Protein Structure Analysis and Validation (PROTSAV) based ranking

PROTSAV is a protein structure quality assessment meta-server (manuscript under preparation). Currently, it comprises six tools namely Procheck [84], Verify-3D [85], ERRAT [86], Naccess [87], PROSA [88] and dDFIRE [89], for quality assessment of protein structures. PROTSAV generates an overall protein quality score, which is a summation of scores predicted by individual modules. High PROTSAV values reflect poor structure quality of query protein and low values close to zero represent good quality of query protein structure. Run time for this module is 40-45 seconds. In this step, pcSM selected top 10 protein models are analysed and ranked. The top ranked model is submitted for QM based loop bond angle refinement in the next step.
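The ranking rule described above (overall score as a sum of per-tool scores, with values near zero indicating better quality) can be sketched as follows. How PROTSAV normalises the individual tool outputs is not described here, so the sketch assumes each tool has already been converted to a non-negative value where larger means worse; the model names and numbers are made up.

```python
def overall_quality_score(tool_scores):
    """Sum per-tool scores; lower totals indicate better structure quality."""
    return sum(tool_scores.values())


def rank_models(models):
    """Return model names ordered from best (lowest total) to worst."""
    return sorted(models, key=lambda name: overall_quality_score(models[name]))


# ASSUMPTION: scores are pre-normalised so that 0 is ideal for every tool.
models = {
    "model_a": {"Procheck": 0.30, "Verify-3D": 0.20, "ERRAT": 0.25,
                "Naccess": 0.10, "PROSA": 0.40, "dDFIRE": 0.35},
    "model_b": {"Procheck": 0.10, "Verify-3D": 0.15, "ERRAT": 0.05,
                "Naccess": 0.10, "PROSA": 0.20, "dDFIRE": 0.10},
}
ranking = rank_models(models)
top_model = ranking[0]
```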

(E) Quantum mechanics (PM6) based loop bond angle optimization

Quantum mechanics (PM6) based loop bond angle optimization (manuscript in preparation) takes the topmost PROTSAV selected model as input, optimizes loop bond angles and performs ab initio loop sampling [66]. The small pool of decoys generated in the process is side chain optimized using Scwrl4, a program for prediction of protein side chain conformations [90]. Scwrl4 uses the latest backbone-dependent rotamer library to provide rotamer frequencies, dihedral angles and variances. The side chain optimized decoys are further energy minimized (steepest descent, SD = 500 steps; conjugate gradient, CG = 500 steps) using the sander module of the AMBER10 software [91].

The decoys generated by this refinement, once optimized and energy minimized, are scored using pcSM and the top 10 ranked QM refined models are passed to the next step.

(F) Final ranking

The input to this step is the top 10 pcSM ranked QM refined models from step (E) and the top 5 PROTSAV ranked models from step (D). The PROTSAV ranked models are side chain optimized and energy minimized before final ranking. The selected 15 models are re-ranked using pcSM and the top 5 are given to the user as output.
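The pooling and re-ranking in this step can be sketched as below. The scoring function is passed in as a callable because the actual pcSM implementation is not reproduced here, and whether lower or higher cumulative scores are preferred is not stated in the text, so the direction is left as a parameter. All model names and scores are placeholders.

```python
def final_ranking(qm_refined, protsav_ranked, score, keep=5, lower_is_better=True):
    """Pool both model sets, re-rank with `score`, and return the top `keep`."""
    pool = list(qm_refined) + list(protsav_ranked)
    ranked = sorted(pool, key=score, reverse=not lower_is_better)
    return ranked[:keep]


# Placeholder data: 10 QM refined models and 5 PROTSAV ranked models.
qm_models = [f"qm_{i}" for i in range(10)]
ps_models = [f"ps_{i}" for i in range(5)]
# Hypothetical scores standing in for pcSM output.
scores = {name: float(i) for i, name in enumerate(qm_models)}
scores.update({name: i + 0.5 for i, name in enumerate(ps_models)})

final_five = final_ranking(qm_models, ps_models, score=scores.get)
```

Including the unrefined PROTSAV models in the pool means a refinement that makes a model worse cannot displace a good starting structure from the final five.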

The Bhageerath-H protocol is a careful combination of different algorithms, which are configured to work in tandem. Starting from Bhageerath-H Strgen, followed by clustering, pcSM scoring, PROTSAV and QM refinement, each module has its own role in providing the user with near-native candidate structures as final output. The software takes a protein amino acid sequence as input and provides the user with five native-like candidate structures as output. Figure 2 shows the flow chart of the Bhageerath-H software suite.

Results and Discussion

Validation of Bhageerath-H software suite

The Bhageerath-H automated pipeline was thoroughly tested and validated on the benchmark CASP10 dataset. Each CASP experiment reveals the state of the art in the field of protein structure prediction. A total of 75 CASP10 targets of varying size and complexity were considered here for the analysis. To begin the assessment, CASP-like conditions were mimicked, which means that the native and near-native homologs were excluded during structure prediction. Any template released later than the first CASP10 server target, i.e. 5 May 2012, was not considered. For structure assessment an automated pipeline was developed. For each CASP10 target, the sequence was extracted from the native structure. The predicted structure sequence and the native sequence were then aligned using ClustalW [92]. Residues with missing coordinates were removed from the predictions in order to make the sequences of the two structures match exactly. The native and the final five Bhageerath-H models were compared based on the widely used criteria of Cα root mean square deviation (Cα RMSD) and template modeling score (TM-score). Cα RMSD is a global indicator of structural identity, while TM-score identifies local substructures and evaluates local identity. TM-score is considered a quantitative measure for classification of protein topology. A TM-score > 0.5 signifies that a protein pair shares the same fold, whereas pairs with a TM-score < 0.5 mostly do not share the same fold, and a TM-score of 0.17 indicates a random prediction [93, 94].
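For readers unfamiliar with the two metrics, a minimal Python sketch is given below. It assumes the model has already been superimposed on the native and that the Cα atoms are paired one to one; the real TM-score program additionally searches for the superposition that maximises the score, which is not done here. The d0 expression is the standard length-dependent scale of the TM-score, and the sketch is restricted to targets of at least 19 residues so that d0 stays positive.

```python
import math


def ca_rmsd(native, model):
    """Calpha RMSD between two equally long lists of (x, y, z) coordinates.

    ASSUMPTION: the coordinates are already optimally superimposed.
    """
    if len(native) != len(model) or not native:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((a - b) ** 2
                  for p, q in zip(native, model)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(native))


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the distance (in Angstrom) between each aligned
    Calpha pair; `target_length` is the length of the native protein.
    """
    if target_length < 19:
        raise ValueError("this sketch requires target_length >= 19")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Toy inputs: one displaced atom out of two, and a perfect 100-residue match.
example_rmsd = ca_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                       [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)])
perfect_tm = tm_score([0.0] * 100, 100)
```

Because every residue pair contributes at most 1/L to the TM-score, a few badly placed residues lower it only slightly, whereas the same residues can dominate the RMSD.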

(A) Bhageerath-H performance on 75 CASP10 targets

Bhageerath-H was validated on 75 CASP10 targets. Cα RMSDs and TM-scores of the final five Bhageerath-H predictions with respect to the native were calculated. In 68 out of 75 systems, i.e. in 91% of the cases, a Bhageerath-H predicted model has a TM-score ≥0.5, while in 44 targets, i.e. in 59% of the cases, Bhageerath-H was able to predict a model in the top 5 with a Cα RMSD from the native ≤5.0Å (Additional File 1). Figure 3 shows the TM-score distribution and Figure 4 shows the Cα RMSD distribution of all the 75 targets.

Comparison of Bhageerath-H performance with BAKER-ROSETTA, Quark and MULTICOM-CLUSTER

For comparative analyses, we considered three state-of-the-art servers for protein tertiary structure prediction. Predictions submitted by BAKER-ROSETTA [95], Quark [96] and MULTICOM-CLUSTER [97] during CASP10 [62] experiment were used. Their submitted five predictions were downloaded from the CASP10 website http://www.predictioncenter.org/casp10/index.cgi and analyzed using the automated evaluation pipeline described above. The minimum RMSD obtained among the five submitted models was considered. In 36 cases, BAKER-ROSETTA server submitted a model among five predictions having Cα RMSD from the native ≤5.0Å. Quark submitted 40 predictions among 75 under the Cα RMSD cutoff of 5.0Å, whereas MULTICOM-CLUSTER succeeded in 33 cases. In comparison to these three servers, Bhageerath-H server was successful in 44 cases i.e. in 59% of the cases, this server was able to propose a model in top 5 having a Cα RMSD from the native ≤5.0Å (Figure 5).
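The success counts quoted above translate into percentages as in the short Python calculation below, which uses only the numbers reported in the text (targets out of 75 with a top-5 model at Cα RMSD ≤5.0Å).

```python
TOTAL_TARGETS = 75

# Number of targets with a top-5 model within 5.0 Angstrom Calpha RMSD.
hits = {
    "Bhageerath-H": 44,
    "Quark": 40,
    "BAKER-ROSETTA": 36,
    "MULTICOM-CLUSTER": 33,
}

# Success rate per server, rounded to the nearest whole percent.
success_rate = {server: round(100 * n / TOTAL_TARGETS)
                for server, n in hits.items()}

for server, pct in success_rate.items():
    print(f"{server}: {hits[server]}/{TOTAL_TARGETS} = {pct}%")
```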

CASP organizers assign a unique target id to each protein fielded in the CASP experiment. While validating and comparing performance of Bhageerath-H software on 75 CASP10 targets, we have closely analyzed some of the CASP10 target proteins in which Bhageerath-H outperformed other three servers under consideration. A brief description of the biological role of the targets T0655, T0672, T0675, T0700, T0716, T0736, T0747, T0755, T0669, T0713, T0686, T0724 is given in Additional File 2.

For targets T0655, T0672, T0675, T0700, T0716, T0736, T0747 and T0755, Bhageerath-H outperformed the BAKER-ROSETTA server: it predicted a structure in the top 5 within the defined Cα RMSD cutoff (≤5.0 Å). Compared with Quark, Bhageerath-H did better in 6 cases (T0669, T0672, T0675, T0685, T0716, T0747), while it was successful in 11 cases when compared with MULTICOM-CLUSTER. For targets T0655, T0672, T0675, T0716 and T0747, Bhageerath-H achieved higher prediction accuracy than all three servers (Figure 6).

A close inspection of the reasons for the better performance of Bhageerath-H revealed that for targets such as T0675, T0672, T0669, T0716, T0736 and T0700, it was the Bhageerath-H Strgen patching module together with ab initio loop sampling that generated a low RMSD near-native structure. In systems T0655 and T0747, the low RMSD sampled structure is due to the amino acid chemical logic based scoring matrix. The amino acid substitution scoring matrix is a new addition to the Bhageerath-H Strgen methodology and performs a very thorough search of the database for homologs based on amino acid chemical properties. This matrix helped in template search and alignment generation, especially in targets T0655 and T0747, where most other servers failed to predict a low RMSD structure. It identified correct templates and generated better target-template alignments, which resulted in high quality near-native structural models for proteins with low sequence similarity. In cases where a full length template is unavailable, the matrix helped in generating high quality alignments for short sequence fragments. Besides the amino acid chemical logic based scoring matrix, the major contributor to the better performance of the Bhageerath-H software is ab initio loop sampling. Loops are the most flexible parts of a protein structure and are involved in molecular recognition. Correct modeling of loops has always been a challenge. The ab initio loop sampling module helped in systematic and thorough sampling of the loop conformation space and generated low RMSD models. The CASP10 targets where Bhageerath-H outperformed other participating servers were mainly modeled through chemical logic and ab initio loop sampling.

Other than above specified targets, Bhageerath-H's performance is noteworthy for targets T0713, T0686 and T0724 when compared to the other three servers under consideration. Though high quality Bhageerath-H models were not predicted, these targets need special attention and discussion. These three targets are described below as case studies for illustration of Bhageerath-H performance.

(i) Target T0713: This target is a hypothetical protein from Eubacterium ventriosum (PDB: 4H09) with 739 amino acid residues. It has four leucine-rich repeat domains, which take a solenoid shape in the protein structure. These domains help the protein to interact with its complementary protein partner. Bhageerath-H sampled a lowest RMSD structure of 8.91Å in the pool of trial structures. After clustering and pcSM decoy selection, the lowest RMSD model in the top 10 was 9.80Å. The topmost PROTSAV selected model was passed to QM based structure refinement, which refined the input model and generated a decoy in the small pool with a 6.61Å Cα RMSD from the native. The bond angle optimization assisted in better conformational sampling and produced a lower RMSD decoy, which was picked by pcSM during the final five ranking. Bhageerath-H successfully modeled and picked a structure in the top five with a leucine-rich repeat domain similar to the native structure. The horseshoe shape formed by the domain reflects its biological activity.

(ii) Target T0686: This target is a sporozoite surface protein of Plasmodium vivax, one of the causative agents of malaria. It is also called TRAP (thrombospondin-related anonymous protein), which mediates the invasion of mosquito and vertebrate host cells in malaria. The TRAP protein has two functional domains, (i) TSP (thrombospondin type I) and (ii) VWA (von Willebrand factor type A), that are responsible for cell adhesion. Bhageerath-H Strgen generated a 7.41Å RMSD structure, which was retained post clustering. pcSM and PROTSAV picked an 8.13Å structure, which was submitted for QM based refinement. The final lowest RMSD model in the top 5 is 7.75Å, which is a much better prediction in comparison to the other server predictions. The model structure closely superimposes with the VWA domain of the native crystal structure (PDB: 4QHO), while there are a few anomalies in the TSP domain. The VWA domain is mainly responsible for the protein's biological activity and covers a stretch of ~180 amino acids. TSP is a shorter domain (~40 amino acids). The final ranked Bhageerath-H modelled structure missed an extended β-sheet, which resulted in a high RMSD of the prediction from the native.

(iii) Target T0724: This target is a hypothetical uncharacterized protein from Bacteroides vulgatus (PDB: 4FMR). It has only one characterized functional domain, i.e. a DNA binding domain. QM based structure refinement assisted in better conformational sampling and in generating a near-native decoy. A brief biological description of the studied targets is given in Additional File 2.

In a nutshell, the ability of Bhageerath-H to predict lower RMSD near-native models rests on the following. First, the sampling is exhaustive: Bhageerath-H Strgen and the newly developed amino acid chemical logic based scoring matrix help in a thorough search of template and protein conformational space, ensuring generation of near-native models in most instances. Second, the pcSM scoring function picks these native-like candidates with 93% accuracy. Apart from these two major modules, the PROTSAV structure analysis ranks the models and submits the top one for QM refinement. Finally, the QM based refinement protocol goes one step further and improves prediction accuracy.

(B) Assessment of individual modules of Bhageerath-H pipeline

To comprehend the potential of the individual modules of the Bhageerath-H automated pipeline, we further analyzed 7 targets where Bhageerath-H outperformed all three servers. Table 1 details the output of the individual modules of Bhageerath-H, i.e. Bhageerath-H Strgen, clustering, pcSM scoring, PROTSAV ranking and final output. Column 3 contains the result of module 1, Bhageerath-H Strgen: the lowest Cα RMSD sampled in the decoy pool. Column 4 shows the size of the decoy pool. Column 5 gives the Cα RMSD result for module 2, clustering: the lowest RMSD structure in the decoy pool after clustering. Column 6 gives the size of the decoy pool post clustering. Column 7 contains the result for module 3, pcSM scoring: the lowest Cα RMSD among the top 10 pcSM ranked decoys. Column 8 has the result of module 4, PROTSAV ranking: the Cα RMSD of the topmost PROTSAV ranked model. The last column has the final prediction result of the Bhageerath-H pipeline: the lowest Cα RMSD among the final five Bhageerath-H predictions for the given target.

Table 1 Assessment of individual modules of Bhageerath-H pipeline for 7 CASP10 targets.

As discussed earlier, the backbone of any protein tertiary structure prediction software/tool is its protein conformational sampling module. Unless a near-native decoy is sampled/generated, it is impossible to attain high prediction accuracy. In Table 1, for all 7 targets, near-native decoys (Cα RMSD ≤5.0Å) were present in the Bhageerath-H Strgen sampled decoy pool. These decoys were retained after K-means clustering. While filtering bad decoys from good ones, it is extremely important to retain the sampled near-native decoys in the smaller basket. As can be seen from Table 1, clustering was able to reduce the basket size while retaining good structures. The second major module of the prediction pipeline is scoring. The pcSM scoring function successfully picked the best decoys in the top 10, except in the case of T0655. PROTSAV further assisted in ranking the best model (lowest Cα RMSD) as the topmost model in 5 cases. In 2 cases the lowest RMSD sampled decoy was missed in the final ranking, but a model with Cα RMSD ≤5 Å was still selected in the final predicted output. The last column shows the final prediction results of the Bhageerath-H pipeline. Figure 7(a1-a7) shows the superimposition of the lowest Cα RMSD Bhageerath-H predicted models on the corresponding natives.

(C) Quality assessment of Bhageerath-H predictions

Finally, the quality of Bhageerath-H predictions was assessed based on Molprobity score [98]. Molprobity score evaluates the stereochemistry of input structure. Online Molprobity server http://molprobity.biochem.duke.edu was used for score calculation. Additional File 3 shows the Molprobity score of the best Bhageerath-H predictions. Best refers to the lowest Cα RMSD in the final five Bhageerath-H predictions. The average Molprobity score is 1.94 for 75 predictions.

Bhageerath-H web server

The Bhageerath-H automated pipeline is available to the scientific community as a freely accessible web server at http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp. The web server takes the amino acid sequence of the query protein as input. The processed results are sent to users at the email address they provide. Each submitted job is assigned a unique Jobid, which can be used to check job status. The server provides an option for specifying templates: a user can opt either for automatic template searching or for user defined templates. With automatic template searching, the software itself searches for the best templates and uses the hybrid approach to predict the tertiary structure. With user defined templates, the user is required to input the template information, i.e. the template's pdb-id and chain id, and structures based on the defined templates are given to the user as output. A complete Bhageerath-H run takes approximately 5-6 hours depending on the size of the protein. The software runs on a 35 node Quad-Core AMD Opteron(tm) Processor 2380 based cluster on the CentOS platform over an InfiniBand QDR backbone. Bhageerath-H receives at least 10-20 jobs every day from across the world. Bhageerath-H is participating in the CASP11 competition (1st May 2014 - 16th July 2014). Figure 8 shows a screenshot of the Bhageerath-H web server.

Conclusions

We have developed Bhageerath-H, an automated pipeline for protein tertiary structure prediction, and made it available as a freely accessible web server at http://www.scfbio-iitd.res.in/bhageerath/bhageerath_h.jsp. The pipeline comprises six different modules: Bhageerath-H Strgen for decoy generation, K-means clustering, pcSM for decoy selection, PROTSAV for structure validation, QM (PM6) based loop refinement and final ranking. Together, these modules push the prediction accuracy to higher limits. The Bhageerath-H server was validated on 75 CASP10 targets and the results show that the methodology is effective in predicting good structures for proteins with varying sequence and structural similarities. Comparison with some of the existing software demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. A critical analysis of the targets where Bhageerath-H was unsuccessful in predicting low RMSD structures highlights the areas for improvement. These include better secondary structure prediction, better alignment strategies, improved ab initio modeling for sampling new folds, and better refinement strategies. We are currently working on these areas, especially for targets with very low sequence similarity. The current version of Bhageerath-H has already taken the structure prediction field beyond CASP10. This improved methodology is fielded in the ongoing CASP11 experiment.

Several proteins exhibit partial or complete instability in their structures. These proteins are classified as intrinsically disordered proteins (IDPs). Bhageerath-H is a homology and ab initio hybrid method for modeling structures of monomeric proteins. The current web-enabled version of the protocol is not specifically programmed to model structures of IDPs. Rather, the ab initio loop modeling section of the first module as well as the QM (PM6) method for loop bond angle refinement attempt to sample the conformation space of long loop stretches/disordered regions.

To summarize, data driven homology based computational methods have in recent years proved successful in predicting tertiary structures for sequences with high sequence similarity. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. To overcome these limitations while pushing the frontiers of protein structure prediction, we have proposed the Bhageerath-H algorithm. The proposed algorithm finds applications in protein structure/function prediction, active-site directed drug design, the study of protein-protein interactions, and protein design and engineering. In the absence of an experimental protein structure, computational tertiary structure models help to probe the biological functions of proteins.

Abbreviations

CASP: Critical Assessment of protein Structure Prediction.
RMSD: Root mean square deviation.
Cα RMSD: C-alpha root mean square deviation.
TM-Score: Template modeling score.
PSPA: Protein structure prediction algorithms.
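
The two accuracy measures abbreviated above can be made concrete with a short sketch. It assumes that the model and native Cα coordinates are already paired residue by residue and already superimposed; a full TM-score calculation also searches for the superposition that maximizes the score, which is not done here. The function names are hypothetical, and the 0.5 Å floor on the distance scale d0 for short chains is an assumption of this sketch.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_distances(model: Sequence[Coord], native: Sequence[Coord]) -> List[float]:
    """Per-residue distances between paired, pre-superimposed C-alpha atoms."""
    if not model or len(model) != len(native):
        raise ValueError("model and native must be non-empty and of equal length")
    return [math.dist(a, b) for a, b in zip(model, native)]


def ca_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """C-alpha RMSD: square root of the mean squared per-residue distance."""
    d = ca_distances(model, native)
    return math.sqrt(sum(x * x for x in d) / len(d))


def tm_score(model: Sequence[Coord], native: Sequence[Coord],
             target_length: Optional[int] = None) -> float:
    """TM-score for one fixed superposition, normalized by the target length."""
    d = ca_distances(model, native)
    length = target_length if target_length is not None else len(native)
    # Length-dependent distance scale; the 0.5 floor for short chains is an assumption.
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8 if length > 15 else 0.5
    d0 = max(d0, 0.5)
    return sum(1.0 / (1.0 + (x / d0) ** 2) for x in d) / length


if __name__ == "__main__":
    # Toy data: a straight 100-residue chain and a copy shifted by 3 Å along x.
    native = [(float(i), 0.0, 0.0) for i in range(100)]
    model = [(x + 3.0, y, z) for x, y, z in native]
    print(ca_rmsd(model, native), tm_score(model, native))
```

Because each residue contributes at most 1/L to the TM-score, a few badly placed loop residues lower it only slightly, whereas the same residues can dominate the Cα RMSD. This is one reason for reporting both measures.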

Acknowledgements

Programme support to the Supercomputing Facility for Bioinformatics & Computational Biology (SCFBio), IIT Delhi, from the Department of Biotechnology, Govt. of India, and the Indian Council of Medical Research is gratefully acknowledged. RK is a recipient of a Senior Research Fellowship from the Council of Scientific and Industrial Research (CSIR), India.

Declarations

The publication charges of this article were funded by Professional Development Fund, IIT Delhi, India.

This article has been published as part of BMC Bioinformatics Volume 15 Supplement 16, 2014: Thirteenth International Conference on Bioinformatics (InCoB2014): Bioinformatics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcbioinformatics/supplements/15/S16.

Author information

Authors and Affiliations

Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
B Jayaram, Priyanka Dhingra & Goutam Mukherjee
Supercomputing Facility for Bioinformatics & Computational Biology, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
B Jayaram, Priyanka Dhingra, Avinash Mishra, Rahul Kaushik, Goutam Mukherjee, Ankita Singh & Shashank Shekhar
Kusuma School of Biological Sciences, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
B Jayaram, Avinash Mishra & Rahul Kaushik

Corresponding author

Correspondence to B Jayaram.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

PD developed Bhageerath-H Strgen; AM developed pcSM; RK and AS developed PROTSAV; PD and GM developed the quantum mechanics (PM6) based loop bond angle refinement. BJ supervised the above projects. PD, AM and RK analyzed and validated the server. BJ and PD wrote the manuscript. PD and SS web-enabled the software.

Electronic supplementary material

12859_2014_6740_MOESM1_ESM.pdf

Additional File 1: Cα RMSD and TM-Score of best Bhageerath-H prediction. Best refers to lowest Cα RMSD predicted model in final five Bhageerath-H predictions. (PDF 248 KB)

12859_2014_6740_MOESM2_ESM.pdf

Additional File 2: Description of biological and structural relevance of CASP10 Targets (T0655, T0672, T0675, T0700, T0716, T0736, T0747, T0755, T0669, T0713, T0686, T0724). (PDF 82 KB)

Additional File 3: Molprobity score of the best Bhageerath-H prediction. (PDF 27 KB)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Jayaram, B., Dhingra, P., Mishra, A. et al. Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins. BMC Bioinformatics 15 (Suppl 16), S7 (2014). https://doi.org/10.1186/1471-2105-15-S16-S7

Published: 08 December 2014
DOI: https://doi.org/10.1186/1471-2105-15-S16-S7
