Abstract
Homology modeling (HM) plays an important role in drug discovery. An HM analysis predicts a 3D structural model from a biological sequence in order to support the discovery of new drugs. Executing an HM analysis at large scale raises several problems: multiple software packages have to be evaluated, the parallel execution has to be managed, and the results have to be analyzed, e.g. by manually browsing all results to find which program produced which structure with good quality. A Scientific Workflow Management System (SWfMS) with parallelism and provenance support can aid large-scale HM executions by addressing the result analysis. However, before the HM workflow is submitted for execution, it has to be specified along with its several alternatives (also called variants), which is the problem considered in this paper. Managing HM workflow variants is a complex task, even with the help of a SWfMS. In this paper, we propose SciSamma (Structural Approach and Molecular Modeling Analyses), an abstract representation of HM workflows inspired by the concept of software product lines (SPL). SciSamma models HM workflow variants so that they can be executed with parallel processing in the cloud using the SciCumulus SWfMS. We evaluated SciSamma with two common variants using 100 protease enzymes from protozoan genomes. Both variants scaled, with performance improvements (execution time dropped from 8 h to 27 min using 32 Amazon large virtual machines). Provenance queries showed that the two workflow variants produced biological results of the same quality, while the difference in execution time between them was around 40%.
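The reported drop from 8 h to 27 min on 32 virtual machines can be turned into a speedup and a parallel efficiency figure. The sketch below is only illustrative: it assumes the 8 h figure is the baseline for the same workload on a single virtual machine, which the abstract does not state, and the function names are hypothetical.

```python
def speedup(baseline_minutes: float, parallel_minutes: float) -> float:
    """Ratio of baseline execution time to parallel execution time."""
    return baseline_minutes / parallel_minutes


def efficiency(baseline_minutes: float, parallel_minutes: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 would be ideal scaling)."""
    return speedup(baseline_minutes, parallel_minutes) / workers


# Figures taken from the abstract. Treating 8 h as a single-VM baseline
# is an assumption made here for illustration only.
BASELINE_MINUTES = 8 * 60   # 8 h
PARALLEL_MINUTES = 27       # 27 min
VIRTUAL_MACHINES = 32

s = speedup(BASELINE_MINUTES, PARALLEL_MINUTES)
e = efficiency(BASELINE_MINUTES, PARALLEL_MINUTES, VIRTUAL_MACHINES)
print(f"speedup: {s:.1f}x, efficiency: {e:.2f}")
# prints: speedup: 17.8x, efficiency: 0.56
```

Under that assumption, the run is roughly 17.8 times faster at about 56% parallel efficiency. If the 8 h baseline was measured on a different configuration, the efficiency figure changes accordingly.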
This work was partially sponsored by FAPERJ and CNPq.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ocaña, K.A.C.S., de Oliveira, D., Silva, V., Benza, S., Mattoso, M. (2015). Exploiting the Parallel Execution of Homology Workflow Alternatives in HPC Compute Clouds. In: Toumani, F., et al. Service-Oriented Computing - ICSOC 2014 Workshops. Lecture Notes in Computer Science, vol 8954. Springer, Cham. https://doi.org/10.1007/978-3-319-22885-3_29
DOI: https://doi.org/10.1007/978-3-319-22885-3_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22884-6
Online ISBN: 978-3-319-22885-3
eBook Packages: Computer Science, Computer Science (R0)