Exploiting the Parallel Execution of Homology Workflow Alternatives in HPC Compute Clouds

  • Conference paper
Service-Oriented Computing - ICSOC 2014 Workshops

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8954)

Abstract

Homology modeling (HM) plays an important role in drug discovery. An HM analysis predicts a 3D model from a biological sequence in order to discover new drugs. Executing an HM analysis at large scale raises several problems, such as the multiple software packages to be evaluated, the management of the parallel execution, and the analysis of results, e.g. manually browsing all results to find which structure was derived from which program with good quality. A Scientific Workflow Management System (SWfMS) with parallelism and provenance support can aid large-scale HM executions by addressing the result analysis. However, before the HM workflow is submitted for execution, it has to be specified along with its several alternatives (also called variants), as considered in this paper. Managing HM workflow variants is a complex task even with the help of a SWfMS. In this paper, we propose SciSamma (Structural Approach and Molecular Modeling Analyses), an abstract representation of HM workflows inspired by the concept of software product lines (SPL). SciSamma models HM workflow variants for parallel execution in the cloud using the SciCumulus SWfMS. We evaluated SciSamma with two common variants using 100 protease enzymes of protozoan genomes. Both variants scaled, with performance improvements (execution time dropped from 8 h to 27 min using 32 Amazon large virtual machines). Provenance queries over the two workflow variants showed that they produce biological results of the same quality, but their execution times differed by around 40%.
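
The reported drop from 8 h to 27 min on 32 virtual machines can be turned into a speedup and a parallel efficiency figure. The sketch below does that arithmetic. It assumes that the 8 h figure is the baseline against which the 32-VM run is compared and that each virtual machine counts as one worker; the abstract states neither, so the efficiency value in particular is illustrative only. The function names are hypothetical and are not part of SciSamma or SciCumulus.

```python
def speedup(baseline_minutes: float, parallel_minutes: float) -> float:
    """Ratio of baseline execution time to parallel execution time."""
    return baseline_minutes / parallel_minutes


def efficiency(speedup_value: float, workers: int) -> float:
    """Speedup divided by the number of workers (1.0 would be ideal)."""
    return speedup_value / workers


# Figures taken from the abstract; their interpretation is an assumption.
BASELINE_MIN = 8 * 60   # 8 h, assumed to be the baseline run
PARALLEL_MIN = 27       # 27 min with 32 virtual machines
VMS = 32                # each VM treated as one worker (assumption)

s = speedup(BASELINE_MIN, PARALLEL_MIN)
e = efficiency(s, VMS)
print(f"speedup ~ {s:.1f}x, efficiency per VM ~ {e:.2f}")
```

Under these assumptions the run is roughly 17.8 times faster, which corresponds to an efficiency of about 0.56 per virtual machine.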

This work was partially sponsored by FAPERJ and CNPq.

Notes

  1. http://oodt.apache.org/components/maven/workflow/

Author information

Corresponding author

Correspondence to Kary A. C. S. Ocaña.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ocaña, K.A.C.S., de Oliveira, D., Silva, V., Benza, S., Mattoso, M. (2015). Exploiting the Parallel Execution of Homology Workflow Alternatives in HPC Compute Clouds. In: Toumani, F., et al. Service-Oriented Computing - ICSOC 2014 Workshops. Lecture Notes in Computer Science, vol 8954. Springer, Cham. https://doi.org/10.1007/978-3-319-22885-3_29

  • DOI: https://doi.org/10.1007/978-3-319-22885-3_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22884-6

  • Online ISBN: 978-3-319-22885-3

  • eBook Packages: Computer Science, Computer Science (R0)
